Transformers

4.6.0

Transformers aren't just for text - they can handle a huge range of input types, and there's been a flurry of papers and new models in the last few months applying them to vision tasks that had traditionally been dominated by convolutional networks. With this release, we're delighted to announce that several state-of-the-art pretrained vision and multimodal text+vision transformer models are now accessible in the huggingface/transformers repo. Give them a try!

ViT (NielsRogge)

Two new models are released as part of the ViT implementation: `ViTModel` and `ViTForImageClassification`, in PyTorch.

ViT is a transformer-based model obtaining state-of-the-art results on image classification tasks. It is the first work to successfully train a Transformer encoder on ImageNet, attaining very good results compared to familiar convolutional architectures.

The Vision Transformer (ViT) model was proposed in [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=vit
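
For example, image classification with the new classes looks roughly like this (a minimal sketch; the `google/vit-base-patch16-224` checkpoint and the sample image URL are placeholders you can swap for any compatible ones):

```python
from PIL import Image
import requests
from transformers import ViTFeatureExtractor, ViTForImageClassification

# Load an example image (any RGB image works).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

# The feature extractor resizes and normalizes the image into pixel values.
inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```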

DeiT (NielsRogge)

Three new models are released as part of the DeiT implementation: `DeiTModel`, `DeiTForImageClassification` and `DeiTForImageClassificationWithTeacher`, in PyTorch.

DeiT is an image transformer model similar to ViT. DeiT (data-efficient image transformers) models are more efficiently trained transformers for image classification, requiring far less data and far fewer computing resources than the original ViT models.

The DeiT model was proposed in [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deit

- Add DeiT (PyTorch) 11056 (NielsRogge)

CLIP (patil-suraj)

Three new models are released as part of the CLIP implementation: `CLIPModel`, `CLIPVisionModel` and `CLIPTextModel`, in PyTorch.

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and GPT-3.

The CLIP model was proposed in [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=clip

- CLIP 11445 (patil-suraj)
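
A minimal sketch of zero-shot image scoring with the new classes, assuming the accompanying `CLIPProcessor` and the public `openai/clip-vit-base-patch32` checkpoint:

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Score the image against two candidate captions.
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity as probabilities
print(probs)
```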

BigBirdPegasus (vasudevgupta7)

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it has been shown that applying sparse, global, and random attention approximates full attention while being computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context, BigBird has shown improved performance on various long-document NLP tasks, such as question answering and summarization, compared to BERT or RoBERTa.

The BigBird model was proposed in [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others.

- Add BigBirdPegasus 10991 (vasudevgupta7)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=bigbird_pegasus
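
A hedged sketch of long-document summarization with one of the released checkpoints (`google/bigbird-pegasus-large-arxiv`); the input text below is a placeholder:

```python
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

long_document = "Replace this placeholder with the full text of a long article."
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=4096)

summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```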

LUKE (NielsRogge, ikuyamada)

LUKE is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps improve performance on various downstream tasks involving reasoning about entities such as named entity recognition, extractive and cloze-style question answering, entity typing, and relation classification.

The LUKE model was proposed in [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto.

- Add LUKE 11223 (NielsRogge, ikuyamada)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=luke
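
A short, hedged example of the entity-aware API: `LukeTokenizer` takes character-level `entity_spans` alongside the text, and the model returns both word and entity representations (`studio-ousia/luke-base` is one of the released checkpoints):

```python
from transformers import LukeModel, LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character spans of "Beyoncé" and "Los Angeles"

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)

word_repr = outputs.last_hidden_state           # contextualized word tokens
entity_repr = outputs.entity_last_hidden_state  # contextualized entity tokens
```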

Megatron (jdemouth)

The MegatronBERT model is added to the library, giving access to the 345M-parameter variants.

It comes with nine different models: `MegatronBertModel`, `MegatronBertForMaskedLM`, `MegatronBertForCausalLM`, `MegatronBertForNextSentencePrediction`, `MegatronBertForPreTraining`, `MegatronBertForSequenceClassification`, `MegatronBertForMultipleChoice`, `MegatronBertForTokenClassification` and `MegatronBertForQuestionAnswering`, in PyTorch.

The MegatronBERT model was proposed in [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.

- Add nvidia megatron models 10911 (jdemouth)

Hub integration in Transformers

The Hugging Face Hub is now better integrated within `transformers`, through two new features:
- Models, configurations and tokenizers now have a `push_to_hub` method to automatically push their state to the hub.
- The `Trainer` can now automatically push its underlying model, configuration and tokenizer in a similar fashion. Additionally, it is able to create a draft model card on the fly with the training hyperparameters and evaluation results.

- Auto modelcard 11599 (sgugger)
- Trainer push to hub 11328 (sgugger)
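
A hedged sketch of the new `push_to_hub` workflow; the repository name is a placeholder and authentication (e.g. via `transformers-cli login`) is assumed:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# ... fine-tune the model ...

# Push the weights/config and the tokenizer files to a Hub repo named
# "my-finetuned-model" under the authenticated user's namespace.
model.push_to_hub("my-finetuned-model")
tokenizer.push_to_hub("my-finetuned-model")
```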

DeepSpeed ZeRO Stage 3 & ZeRO-Infinity

The `Trainer` now integrates two additional stages of ZeRO: ZeRO stage 3 for parameter partitioning, and ZeRO-Infinity, which extends CPU offload with NVMe offload.

- [DeepSpeed] ZeRO Stage 3 10753 (stas00) [release notes](https://github.com/huggingface/transformers/issues/11044)
- [Deepspeed] ZeRO-Infinity integration plus config revamp 11418 (stas00) [release notes](https://github.com/huggingface/transformers/issues/11464)
- Please read both release notes for configuration file changes


Flax

Flax support is getting more robust, with model code stabilizing and new models being added to the library.

- [FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py 11470 (patrickvonplaten)
- [Flax] Add Electra models 11426 (CoderPat)
- Adds Flax BERT finetuning example on GLUE 11564 (marcvanzee)
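
A minimal, hedged sketch of running one of the Flax models, assuming Flax weights are available for the checkpoint (or can be converted from the PyTorch ones):

```python
from transformers import BertTokenizerFast, FlaxBertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = FlaxBertModel.from_pretrained("bert-base-cased")

# Flax models consume NumPy arrays rather than torch tensors.
inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```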

TensorFlow

We welcome Rocketknight1 as a TensorFlow contributor. This version includes a brand new TensorFlow example based on Keras, which will be followed by examples covering most tasks.
Additionally, more TensorFlow setups are covered by adding support for AMD-based GPUs and M1 Macs.

- Merge new TF example script 11360 (Rocketknight1)
- Update TF text classification example 11496 (Rocketknight1)
- run_text_classification.py fix 11660 (Rocketknight1)
- Accept tensorflow-rocm package when checking TF availability 11595 (mvsjober)
- Add MacOS TF version 11674 (jplu)

Pipelines

Two new pipelines are added:

- Adding `AutomaticSpeechRecognitionPipeline`. 11337 (Narsil)
- Add the ImageClassificationPipeline 11598 (LysandreJik)
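
A hedged sketch of both pipelines; the checkpoints and file paths are placeholders, and depending on your setup you may need to pass a `feature_extractor` explicitly (speech recognition additionally needs ffmpeg or raw waveforms):

```python
from transformers import pipeline

# Image classification, built on the new vision models.
image_classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
print(image_classifier("path/to/image.jpg"))

# Automatic speech recognition over a local audio file.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
print(asr("path/to/audio.wav"))
```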

Notebooks

- [Community notebooks] Add Wav2Vec notebook for creating captions for YT Clips 11142 (Muennighoff)
- add bigbird-pegasus evaluation notebook 11654 (vasudevgupta7)
- Vit notebooks + vit/deit fixes 11309 (NielsRogge)

General improvements and bugfixes

- [doc] gpt-neo 11098 (stas00)
- Auto feature extractor 11097 (sgugger)
- accelerate question answering examples with no trainer 11091 (theainerd)
- dead link fixed 11103 (cronoik)
- GPTNeo: handle padded wte (11078) 11079 (leogao2)
- fix: The 'warn' method is deprecated 11105 (stas00)
- [examples] fix white space 11099 (stas00)
- Dummies multi backend 11100 (sgugger)
- Some styling of the training table in Notebooks 11118 (sgugger)
- Adds a note to resize the token embedding matrix when adding special … 11120 (LysandreJik)
- [BigBird] fix bigbird slow tests 11109 (vasudevgupta7)
- [versions] handle version requirement ranges 11110 (stas00)
- Adds use_auth_token with pipelines 11123 (philschmid)
- Fix and refactor check_repo 11127 (sgugger)
- Fix typing error in Trainer class (prediction_step) 11138 (jannisborn)
- Typo fix of the name of BertLMHeadModel in BERT doc 11133 (forest1988)
- [run_clm] clarify why we get the tokenizer warning on long input 11145 (stas00)
- [trainer] solve "scheduler before optimizer step" warning 11144 (stas00)
- Add fairscale and deepspeed back to the CI 11147 (LysandreJik)
- Updates SageMaker docs for updating DLCs 11140 (philschmid)
- Don't duplicate logs in TensorBoard and handle --use_env 11141 (sgugger)
- Run mlm pad to multiple for fp16 11128 (ak314)
- [tests] relocate core integration tests 11146 (stas00)
- [setup] extras[docs] must include 'all' 11148 (stas00)
- Add support for multiple models for one config in auto classes 11150 (sgugger)
- [setup] make fairscale and deepspeed setup extras 11151 (stas00)
- typo 11152 (stas00)
- Fix LogitsProcessor documentation 11130 (k-tahiro)
- Correct typographical error in README.md 11161 (Seyviour)
- Make `get_special_tokens_mask` consider all tokens 11163 (sgugger)
- Add a special tokenizer for CPM model 11068 (JetRunner)
- [examples/translation] support mBART-50 and M2M100 fine-tuning 11170 (patil-suraj)
- [examples run_clm] fix _LazyModule hasher error 11168 (stas00)
- added json dump and extraction of train run time 11167 (philschmid)
- Minor typos fixed 11182 (cronoik)
- model_path should be ignored as the checkpoint path 11157 (tsuchm)
- Added documentation for data collator. 10941 (fghuman)
- Fix typo 11188 (tma15)
- Replaced `which` with `who` 11183 (cronoik)
- Import torch.utils.checkpoint in ProphetNet 11214 (LysandreJik)
- Sagemaker test docs update for framework upgrade 11206 (philschmid)
- Use MSELoss with single class label in (M)BartForSequenceClassification 11178 (calpt)
- wav2vec2 converter: create the proper vocab.json while converting fairseq wav2vec2 finetuned model 11041 (cceyda)
- Add Matt as the TensorFlow reference 11212 (LysandreJik)
- Fix GPT-2 warnings 11213 (LysandreJik)
- fix docs for decoder_input_ids 11221 (patil-suraj)
- Add documentation for BertJapanese 11219 (forest1988)
- Replace error by warning when loading an architecture in another 11207 (sgugger)
- Refactor GPT2 11225 (patil-suraj)
- Doc check: a bit of clean up 11224 (sgugger)
- added cache_dir=model_args.cache_dir to all example with cache_dir arg 11220 (philschmid)
- Avoid using no_sync on SageMaker DP 11229 (sgugger)
- Indent code block in the documentation 11233 (sgugger)
- Run CI on deepspeed and fairscale 11172 (LysandreJik)
- [Deepspeed] zero3 tests band aid 11235 (stas00)
- Wav2Vec2 CommonVoice training - Save the processor before training starts 10910 (Nithin-Holla)
- Make "embeddings" plural in warning message within tokenization_utils_base 11228 (jstremme)
- Stale bot updated 10562 (LysandreJik)
- Close open files to suppress ResourceWarning 11240 (parakalan)
- Fix dimention misspellings. 11238 (odellus)
- Add prefix to examples in model_doc rst 11226 (forest1988)
- [troubleshooting] add 2 points of reference to the offline mode 11236 (stas00)
- Fix 10128 11248 (sgugger)
- [deepspeed] test on one node 2 gpus max 11237 (stas00)
- Trainer iterable dataset 11254 (sgugger)
- Adding pipeline task aliases. 11247 (Narsil)
- Support for set_epoch in IterableDataset 11258 (sgugger)
- Tokenizer fast save 11234 (sgugger)
- update dependency_versions_table 11273 (stas00)
- Workflow fixes 11270 (LysandreJik)
- Enabling multilingual models for translation pipelines. 10536 (Narsil)
- Trainer support for IterableDataset for evaluation and predict 11286 (sgugger)
- move device statements outside if statements 11292 (e-yi)
- modify double considering special tokens in `language_modeling.py` 11275 (taepd)
- [Trainer] fix the placement on device with fp16_full_eval 11322 (stas00)
- [Trainer] Add a progress bar for batches skipped 11324 (sgugger)
- Load checkpoint without re-creating the model 11318 (sgugger)
- Added translation example script 11196 (rajvi-k)
- [Generate] Remove outdated code 11331 (patrickvonplaten)
- [GPTNeo] create local attention mask ones 11335 (patil-suraj)
- Update to use datasets remove_cloumns method 11343 (sgugger)
- Add an error message for Reformer w/ .backward() 11117 (forest1988)
- Removed `max_length` from being mandatory within `generate`. 11314 (Narsil)
- Honor contributors to models 11329 (sgugger)
- [deepspeed] fix resume from checkpoint 11352 (stas00)
- Examples reorg 11350 (sgugger)
- Extract metric_key_prefix during NotebookProgressCallback.on_evaluate 11347 (lewtun)
- [testing doc] bring doc up to date 11359 (stas00)
- Remove boiler plate code 11340 (patrickvonplaten)
- Move old TF text classification script to legacy 11361 (Rocketknight1)
- [contributing doc] explain/link to good first issue 11346 (stas00)
- Fix token_type_ids error for big_bird model. 11355 (wlhgtc)
- [Wav2Vec2] Fix special tokens for Wav2Vec2 tokenizer 11349 (patrickvonplaten)
- [Flax] Correct typo 11374 (patrickvonplaten)
- [run_translation.py] fix typo 11372 (johnson7788)
- Add space 11373 (tma15)
- Correctly cast num_train_epochs to int 11379 (Rocketknight1)
- Fix typo 11369 (penut85420)
- Fix Trainer with remove_unused_columns=False 11382 (sgugger)
- [Flax] Big FlaxBert Refactor 11364 (patrickvonplaten)
- [Flax] Typo 11393 (patrickvonplaten)
- [Flax] Correct Flax <=> PyTorch conversion 11394 (patrickvonplaten)
- Fix small typo in text 11396 (maksym-del)
- Fix typos in README for text-classification 11391 (yoshitomo-matsubara)
- [Blenderbot] Integration Test should be slow 11395 (patrickvonplaten)
- Fixed trainer total_flos relaoding in distributed mode 11383 (TevenLeScao)
- [Wav2Vec2] Correct conversion script 11400 (patrickvonplaten)
- added support for exporting of T5 models to onnx with past_key_values. 10651 (Ki6an)
- Fixing bug in generation 11297 (nicola-decao)
- Fix cross-attention head mask for Torch encoder-decoder models 10605 (stancld)
- Default to accuracy metric in run_glue_no_trainer 11405 (sgugger)
- Enable option for subword regularization in `XLMRobertaTokenizer` 11149 (PhilipMay)
- wrong parentclass in documentation 11410 (cronoik)
- EncoderDecoderConfigs should not create new objects 11300 (cronoik)
- Updating checkpoint for GPT2ForSequenceClassification 11334 11434 (abiolaTresor)
- [BigBird] enable BigBirdForQuestionAnswering to return pooler output 11439 (vasudevgupta7)
- Upgrade Black to version 21.4b0 11442 (patrickvonplaten)
- TF BART models - Add `cross_attentions` to model output and fix cross-attention head masking 10699 (stancld)
- Add basic support for FP16 in SageMaker model parallelism 11407 (sgugger)
- Fix link to the TPU launcher script in the pytorch examples 11427 (amineabdaoui)
- Typo fixes 11432 (LSinev)
- Pass along seed to DistributedSampler 11406 (sgugger)
- Clarify description of the is_split_into_words argument 11449 (kstathou)
- [docs] fix invalid class name 11438 (stas00)
- [Makefile] make sure to test against the local checkout 11437 (stas00)
- Give each hub test a different repo name 11453 (sgugger)
- [Examples] Fixes inconsistency around eval vs val and predict vs test 11380 (bhadreshpsavani)
- Variable Correction for Consistency in Distillation Example 11444 (jaimeenahn)
- Remove max length beam scorer 11378 (GeetDsa)
- update QuickTour docs to reflect model output object 11462 (hamelsmu)
- Finish Making Quick Tour respect the model object 11467 (hamelsmu)
- fix docs for decoder_input_ids 11466 (patil-suraj)
- Update min versions in README and add Flax 11472 (sgugger)
- Update `PreTrainedTokenizerBase` to check/handle batch length for `text_pair` parameter 11486 (hamelsmu)
- [Docs] remove paragraph on CI from installation instructions 11493 (hamelsmu)
- [Flax] Add docstrings & model outputs 11498 (patrickvonplaten)
- Reformat to make code clearer in tokenizer call 11497 (sgugger)
- solved coefficient issue for the TF version of gelu_fast 11514 (michaelbenayoun)
- Split checkpoint from model_name_or_path in examples 11492 (sgugger)
- Pin HuggingFace Hub dependency 11502 (LysandreJik)
- correct incorrect dimension comment in Longformer model 11494 (fredo838)
- Fix `sp_model_kwargs` param missing at unpickle in `XLMRobertaTokenizer` 11430 (PhilipMay)
- [Master] Make style 11520 (patrickvonplaten)
- Update README.md 11489 (mrm8488)
- T5 Gradient Checkpointing 11353 (ceshine)
- Implement Fast Tokenization for Deberta 11387 (ShubhamSanghvi)
- Accepts BatchEncoding in LengthGroupedSampler 11431 (tma15)
- Fix do_eval default value in training_args.py 11511 (bonniehyeon)
- [examples, translation/summerization] resize token embeds 11524 (patil-suraj)
- Run model templates on master 11527 (LysandreJik)
- [Examples] Added support for test-file in QA examples with no trainer 11510 (bhadreshpsavani)
- Add Stas and Suraj as authors 11526 (sgugger)
- Improve task summary docs 11513 (hamelsmu)
- [debug utils] activation/weights underflow/overflow detector 11274 (stas00)
- [DeepSpeed] fp32 support 11499 (stas00)
- Fix examples in M2M100 docstrings 11540 (lewtun)
- [Flax BERT/Roberta] few small fixes 11558 (patil-suraj)
- [Wav2Vec2] Fix convert 11562 (patrickvonplaten)
- Remove `datasets` submodule. 11563 (LysandreJik)
- fix the mlm longformer example by changing [MASK] to <mask> 11559 (fredo838)
- [Wav2vec2] Fixed tokenization mistakes while adding single-char tokens to tokenizer 11538 (Muktan)
- Fix metric computation in `run_glue_no_trainer` 11569 (sgugger)
- Fixes a useless warning in `generate`. 11566 (Narsil)
- Fix checkpointing in SageMaker MP 11481 (sgugger)
- Update training tutorial 11533 (sgugger)
- [Deepspeed] fix resize_token_embeddings 11572 (stas00)
- Add multi-class, multi-label and regression to transformers 11012 (abhi1thakur)
- add importlib_metadata as dependency as it is required for py<3.8 11490 (cdeepali)
- Enable added tokens 11325 (LysandreJik)
- Make quality scripts work when one backend is missing. 11573 (sgugger)
- Removes SageMakerTrainer code but keeps class as wrapper 11587 (philschmid)
- Reproducible checkpoint 11582 (sgugger)
- [trainer] document resume randomness 11588 (stas00)
- [template runner CI] copies need to be fixed too 11585 (stas00)
- add importlib_metadata and huggingface_hub as dependency in the conda recipe 11591 (cdeepali)
- Pytorch - Lazy initialization of models 11471 (patrickvonplaten)
- fix head_mask for albert encoder part(`AlbertTransformer`) 11596 (baeseongsu)
- Fix Python version 11607 (LysandreJik)
- fix typo in command 11605 (vipulraheja)
- Fix typo in docstring 11611 (eldarkurtic)
- Re-styling in seq2seq attention 11613 (sgugger)
- [Lazy init] Fix edge cases 11615 (patrickvonplaten)
- [cuda ext tests] fixing tests 11619 (stas00)
- Fix RNG saves in distributed mode. 11620 (sgugger)
- Fix comment in run_clm_no_trainer.py 11624 (cccntu)
- make fix copy 11627 (patrickvonplaten)
- Reduce to 1 worker and set timeout for GPU TF tests 11633 (LysandreJik)
- [self-push CI] sync with self-scheduled 11637 (stas00)
- [examples] fix sys.path in conftest.py 11636 (stas00)
- [Examples] Check key exists in datasets first 11503 (oToToT)
- [Examples] Fix invalid links after reorg 11650 (oToToT)
- Update code example 11631 (NielsRogge)
- Add missing git dependency for RAG example 11634 (lhoestq)
- updated user permissions based on umask 11119 (bhavitvyamalik)
- Big Bird Fast Tokenizer implementation 11075 (tanmaylaud)
- Save scaler state dict when checkpointing 11663 (sgugger)
- [BigBird Pegasus] Add config to auto tokenizer 11667 (patrickvonplaten)
- Fixes NoneType exception when topk is larger than one coupled with a small context in the Question-Answering pipeline 11628 (psorianom)
- Add --text_column to run_summarization_no_trainer 11673 (cccntu)
- Fix docstring of description about input_ids 11672 (nxznm)
- Grammar and style edits for the frontpage README 11679 (Rocketknight1)
- Fix TF Roberta for mixed precision training 11675 (jplu)
- Test checkpointing 11682 (sgugger)
- Fix clip docs 11694 (patil-suraj)
- [Flax] Updates README and fixes bug 11701 (marcvanzee)
- remove defaults to None if optional 11703 (PhilipMay)

4.5.1

- Fix `pipeline` when used with private models (11123)
- Fix loading an architecture in an other (11207)

4.5.0

BigBird (vasudevgupta7)

Seven new models are released as part of the BigBird implementation: `BigBirdModel`, `BigBirdForPreTraining`, `BigBirdForMaskedLM`, `BigBirdForCausalLM`, `BigBirdForSequenceClassification`, `BigBirdForMultipleChoice`, `BigBirdForQuestionAnswering` in PyTorch.

BigBird is a sparse-attention-based transformer which extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse attention, BigBird also applies global attention as well as random attention to the input sequence.

The BigBird model was proposed in [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.

It is released with an accompanying blog post: [Understanding BigBird's Block Sparse Attention](https://huggingface.co/blog/big-bird)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=big_bird

- BigBird 10183 (vasudevgupta7)
- [BigBird] Fix big bird gpu test 10967 (patrickvonplaten)
- [Notebook] add BigBird trivia qa notebook 10995 (patrickvonplaten)
- [Docs] Add blog to BigBird docs 10997 (patrickvonplaten)
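
A hedged sketch of encoding a long input with BigBird's block sparse attention (`google/bigbird-roberta-base` is one of the released checkpoints; `attention_type` defaults to block sparse and is shown only for clarity):

```python
from transformers import BigBirdModel, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="block_sparse")

# A deliberately long (dummy) input to exercise the 4096-token context.
long_text = " ".join(["BigBird handles long documents."] * 400)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```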

GPT Neo (patil-suraj)

Two new models are released as part of the GPT Neo implementation: `GPTNeoModel`, `GPTNeoForCausalLM` in PyTorch.

GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. EleutherAI's primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public.

The implementation within Transformers is a GPT2-like causal language model trained on the Pile dataset.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gpt_neo

- GPT Neo 10848 (patil-suraj)
- GPT Neo few fixes 10968 (patil-suraj)
- GPT Neo configuration needs to be set to use GPT2 tokenizer 10992 (LysandreJik)
- [GPT Neo] fix example in config 10993 (patil-suraj)
- GPT Neo cleanup 10985 (patil-suraj )
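
A short, hedged generation example; GPT Neo reuses the GPT-2 tokenizer, and `EleutherAI/gpt-neo-1.3B` is one of the available checkpoints:

```python
from transformers import GPT2Tokenizer, GPTNeoForCausalLM

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

input_ids = tokenizer("In a shocking finding, scientists discovered", return_tensors="pt").input_ids
generated = model.generate(input_ids, do_sample=True, max_length=50, temperature=0.9)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```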

Examples

Features have been added to some examples, and additional examples have been added.

Raw training loop examples

Based on the [`accelerate`](https://github.com/huggingface/accelerate) library, examples that completely expose the training loop are now part of the library, making them easy to customize if you want to try a new research idea. A minimal sketch of the pattern is shown after the list below.

- Expand a bit the presentation of examples 10799 (sgugger)
- Add `examples/multiple-choice/run_swag_no_trainer.py` 10934 (stancld)
- Update the example template for a no Trainer option 10865 (sgugger)
- Add `examples/run_ner_no_trainer.py` 10902 (stancld)
- Add `examples/language_modeling/run_mlm_no_trainer.py` 11001 (hemildesai)
- Add `examples/language_modeling/run_clm_no_trainer.py` 11026 (hemildesai)
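
Below is a minimal, hedged outline of the raw training loop these scripts follow; a toy dataloader stands in for the real `datasets`-based preprocessing:

```python
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from transformers import AdamW, AutoModelForSequenceClassification

accelerator = Accelerator()
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
optimizer = AdamW(model.parameters(), lr=5e-5)

# Tiny dummy dataset so the sketch runs end to end.
examples = [
    {
        "input_ids": torch.randint(0, 1000, (16,)),
        "attention_mask": torch.ones(16, dtype=torch.long),
        "labels": torch.tensor(0),
    }
    for _ in range(8)
]
train_dataloader = DataLoader(examples, batch_size=4)

# accelerate handles device placement and distributed wrapping.
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for batch in train_dataloader:
    loss = model(**batch).loss
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```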

Standardize examples with Trainer

Thanks to the amazing contributions of bhadreshpsavani, all examples using the Trainer are now standardized: they all support the predict stage and return/save metrics in the same fashion.

- [Example] Updating Question Answering examples for Predict Stage 10792 (bhadreshpsavani)
- [Examples] Added predict stage and Updated Example Template 10868 (bhadreshpsavani)
- [Example] Fixed finename for Saving null_odds in the evaluation stage in QA Examples 10939 (bhadreshpsavani)
- [trainer] Fixes Typo in Predict Method of Trainer 10861 (bhadreshpsavani)

Trainer & SageMaker Model Parallelism

The `Trainer` now supports SageMaker model parallelism out of the box; the old `SageMakerTrainer` is deprecated as a consequence and will be removed in version 5.

- Merge trainers 10975 (sgugger)
- added new notebook and merge of trainer 11015 (philschmid)

FLAX

FLAX support has been widened to cover all model heads of the BERT architecture, alongside a general conversion script for using PyTorch checkpoints in FLAX.

Auto models now have a FLAX implementation.

- [Flax] Add general conversion script 10809 (patrickvonplaten)
- [Flax] Add other BERT classes 10977 (patrickvonplaten)
- Refactor AutoModel classes and add Flax Auto classes 11027 (sgugger)

General improvements and bugfixes

- Patches the full import failure and adds a test 10750 (LysandreJik)
- Patches full import failure when sentencepiece is not installed 10752 (LysandreJik)
- [Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed 10464 (cli99)
- Fix ProphetNet Flaky Test 10771 (patrickvonplaten)
- [doc] [testing] extend the pytest -k section with more examples 10761 (stas00)
- Wav2Vec2 - fix flaky test 10773 (patrickvonplaten)
- [DeepSpeed] simplify init 10762 (stas00)
- [DeepSpeed] improve checkpoint loading code plus tests 10760 (stas00)
- [trainer] make failure to find a resume checkpoint fatal + tests 10777 (stas00)
- [Issue template] need to update/extend who to tag 10728 (stas00)
- [examples] document resuming 10776 (stas00)
- Check copies blackify 10775 (sgugger)
- Smmp batch not divisible by microbatches fix 10778 (mansimane)
- Add support for detecting intel-tensorflow version 10781 (mfuntowicz)
- wav2vec2: support datasets other than LibriSpeech 10581 (elgeish)
- add run_common_voice script 10767 (patil-suraj)
- Fix bug in input check for LengthGroupSampler 10783 (thominj)
- [file_utils] do not gobble certain kinds of requests.ConnectionError 10235 (julien-c)
- from_pretrained: check that the pretrained model is for the right model architecture 10586 (vimarshc)
- [examples/seq2seq/README.md] fix t5 examples 10734 (stas00)
- Fix distributed evaluation 10795 (sgugger)
- Add XLSR-Wav2Vec2 Fine-Tuning README.md 10786 (patrickvonplaten)
- addressing vulnerability report in research project deps 10802 (stas00)
- fix backend tokenizer args override: key mismatch 10686 (theo-m)
- [XLSR-Wav2Vec2 Info doc] Add a couple of lines 10806 (patrickvonplaten)
- Add transformers id to hub requests 10811 (philschmid)
- wav2vec doc tweaks 10808 (julien-c)
- Sort init import 10801 (sgugger)
- [wav2vec sprint doc] add doc for Local machine 10828 (patil-suraj)
- Add new community notebook - wav2vec2 with GPT 10794 (voidful)
- [Wav2Vec2] Small improvements for wav2vec2 info script 10829 (patrickvonplaten)
- [Wav2Vec2] Small tab fix 10846 (patrickvonplaten)
- Fix: typo in FINE_TUNE_XLSR_WAV2VEC2.md 10849 (qqhann)
- Bump jinja2 from 2.11.2 to 2.11.3 in /examples/research_projects/lxmert 10818 (dependabot[bot])
- [vulnerability] in example deps fix 10817 (stas00)
- Correct AutoConfig call docstrings 10822 (Sebelino)
- [makefile] autogenerate target 10814 (stas00)
- Fix on_step_begin and on_step_end Callback Sequencing 10839 (siddk)
- feat(wandb): logging and configuration improvements 10826 (borisdayma)
- Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases 10823 (ruanchaves)
- Use DataCollatorForSeq2Seq in run_summarization in all cases 10856 (elsanns)
- [Generate] Add save mode logits processor to remove nans and infs if necessary 10769 (patrickvonplaten)
- Make convert_to_onnx runable as script again 10857 (sgugger)
- [trainer] fix nan in full-fp16 label_smoothing eval 10815 (stas00)
- Fix p_mask cls token masking in question-answering pipeline 10863 (mmaslankowska-neurosys)
- Amazon SageMaker Documentation 10867 (philschmid)
- [file_utils] import refactor 10859 (stas00)
- Fixed confusing order of args in generate() docstring 10862 (RafaelWO)
- Sm trainer smp init fix 10870 (philschmid)
- Fix test_trainer_distributed 10875 (sgugger)
- Add new notebook links in the docs 10876 (sgugger)
- error type of tokenizer in __init__ definition 10879 (ZhengZixiang)
- [Community notebooks] Add notebook for fine-tuning Bart with Trainer in two langs 10883 (elsanns)
- Fix overflowing bad word ids 10889 (LysandreJik)
- Remove version warning in pretrained BART models 10890 (sgugger)
- Update Training Arguments Documentation: ignore_skip_data -> ignore_data_skip 10891 (siddk)
- run_glue_no_trainer: datasets -> raw_datasets 10898 (jethrokuan)
- updates sagemaker documentation 10899 (philschmid)
- Fix comment in modeling_t5.py 10886 (lexhuismans)
- Rename NLP library to Datasets library 10920 (tomy0000000)
- [vulnerability] fix dependency 10914 (stas00)
- Add ImageFeatureExtractionMixin 10905 (sgugger)
- Return global attentions (see 7514) 10906 (gui11aume)
- Updated colab links in readme of examples 10932 (WybeKoper)
- Fix initializing BertJapaneseTokenizer with AutoTokenizers 10936 (singletongue)
- Instantiate model only once in pipeline 10888 (sgugger)
- Use pre-computed lengths, if available, when grouping by length 10953 (pcuenca)
- [trainer metrics] fix cpu mem metrics; reformat runtime metric 10937 (stas00)
- [vulnerability] dep fix 10954 (stas00)
- Fixes in the templates 10951 (sgugger)
- Sagemaker test 10925 (philschmid)
- Fix summarization notebook link 10959 (philschmid)
- improved sagemaker documentation for git_config and examples 10966 (philschmid)
- Fixed a bug where the `pipeline.framework` would actually contain a fully qualified model. 10970 (Narsil)
- added py7zr 10971 (philschmid)
- fix md file to avoid evaluation crash 10962 (ydshieh)
- Fixed some typos and removed legacy url 10989 (WybeKoper)
- Sagemaker test fix 10987 (philschmid)
- Fix the checkpoint for I-BERT 10994 (LysandreJik)
- Add more metadata to the user agent 10972 (sgugger)
- Enforce string-formatting with f-strings 10980 (sgugger)
- In the group by length documentation length is misspelled as legnth 11000 (JohnnyC08)
- Fix Adafactor documentation (recommend correct settings) 10526 (jsrozner)
- Improve the speed of adding tokens from added_tokens.json 10780 (cchen-dialpad)
- Add Vision Transformer and ViTFeatureExtractor 10950 (NielsRogge)
- DebertaTokenizer Rework closes 10258 10703 (cronoik)
- [doc] no more bucket 10793 (julien-c)
- Layout lm tf 2 10636 (atahmasb)
- fixed typo: logging instead of logger 11025 (versis)
- Add a script to check inits are consistent 11024 (sgugger)
- fix incorrect case for s|Pretrained|PreTrained| 11048 (stas00)
- [doc] fix code-block rendering 11053 (erensahin)
- Pin docutils 11062 (LysandreJik)
- Remove unnecessary space 11060 (LysandreJik)
- Some models have no tokenizers 11064 (LysandreJik)
- Documentation about loading a fast tokenizer within Transformers 11029 (LysandreJik)
- Add example for registering callbacks with trainers 10928 (amalad)
- Replace pkg_resources with importlib_metadata 11061 (konstin)
- Add center_crop to ImageFeatureExtractionMixin 11066 (sgugger)
- Document common config attributes 11070 (sgugger)
- Fix distributed gather for tuples of tensors of varying sizes 11071 (sgugger)
- Make a base init in FeatureExtractionMixin 11074 (sgugger)
- Add Readme for language modeling scripts with custom training loop and accelerate 11073 (hemildesai)
- HF emoji unicode doesn't work in console 11081 (stas00)
- added social thumbnail for docs 11083 (philschmid)
- added new merged Trainer test 11090 (philschmid)

4.46.1

This is mostly for `fx` and `onnx` issues!

- Fix regression loading dtype 34409 by SunMarc
- LLaVa: latency issues 34460 by zucchini-nlp
- Fix pix2struct 34374 by IlyasMoutawwakil
- Fix onnx non-exposable inplace aten op 34376 by IlyasMoutawwakil
- Fix torch.fx issue related to the new `loss_kwargs` keyword argument 34380 by michaelbenayoun

4.4.2

- Add support for detecting intel-tensorflow version
- Fix distributed evaluation on SageMaker

4.4.0

SpeechToText

Two new models are released as part of the S2T implementation: `Speech2TextModel` and `Speech2TextForConditionalGeneration`, in PyTorch.

Speech2Text is a speech model that accepts a float tensor of log-mel filter-bank features extracted from the speech signal. It’s a transformer-based seq2seq model, so the transcripts/translations are generated autoregressively.

The Speech2Text model was proposed in [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=speech_to_text

- Speech2TextTransformer 10175 (patil-suraj)
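
A hedged transcription sketch (feature extraction relies on torchaudio; the silent waveform below is only a stand-in for real 16 kHz audio samples):

```python
import numpy as np
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")

speech = np.zeros(16_000, dtype=np.float32)  # one second of silence as a placeholder waveform

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```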

M2M100

Two new models are released as part of the M2M100 implementation: `M2M100Model` and `M2M100ForConditionalGeneration`, in PyTorch.

M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks.

The M2M100 model was proposed in [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=m2m_100

- Add m2m100 10236 (patil-suraj)
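
A hedged many-to-many translation sketch with the `facebook/m2m100_418M` checkpoint: set the source language on the tokenizer and force the target language as the first generated token:

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "hi"
encoded = tokenizer("जीवन एक चॉकलेट बॉक्स की तरह है।", return_tensors="pt")

# Force French as the target language.
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```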

I-BERT

Six new models are released as part of the I-BERT implementation: `IBertModel`, `IBertForMaskedLM`, `IBertForSequenceClassification`, `IBertForMultipleChoice`, `IBertForTokenClassification` and `IBertForQuestionAnswering`, in PyTorch.

I-BERT is a quantized version of RoBERTa running inference up to four times faster.

The I-BERT framework in PyTorch makes it possible to identify the best parameters for quantization. Once the model is exported to a framework that supports int8 execution (such as TensorRT), a speedup of up to 4x is achievable, with no loss in performance thanks to the parameter search.

The I-BERT model was proposed in [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=ibert

- I-BERT model support 10153 (kssteven418)
- [IBert] Correct link to paper 10445 (patrickvonplaten)
- Add I-BERT to README 10462 (LysandreJik)
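
A hedged sketch of using the encoder like any RoBERTa model (`kssteven/ibert-roberta-base` is one of the available checkpoints; setting `quant_mode=True` in the configuration switches to integer-only inference after quantization-aware fine-tuning):

```python
from transformers import AutoTokenizer, IBertModel

tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = IBertModel.from_pretrained("kssteven/ibert-roberta-base")  # quant_mode=True enables integer-only mode

inputs = tokenizer("Integer-only quantization keeps inference in int8/int32 arithmetic.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```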

mBART-50

MBart-50 is created using the original mbart-large-cc25 checkpoint by extending its embedding layers with randomly initialized vectors for an extra set of 25 language tokens and then pretrained on 50 languages.

The MBart model was presented in [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=mbart-50

- Add mBART-50 10154 (patil-suraj)
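
A hedged translation sketch with the many-to-many checkpoint; the source language is set on the tokenizer and the target language is forced via `forced_bos_token_id`:

```python
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

tokenizer.src_lang = "hi_IN"
encoded = tokenizer("संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है", return_tensors="pt")

generated = model.generate(**encoded, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```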

DeBERTa-v2

Five new models are released as part of the DeBERTa-v2 implementation: `DebertaV2Model`, `DebertaV2ForMaskedLM`, `DebertaV2ForSequenceClassification`, `DebertaV2ForTokenClassification` and `DebertaV2ForQuestionAnswering`, in PyTorch.

The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.

It builds on RoBERTa with disentangled attention and an enhanced mask decoder, trained with half of the data used in RoBERTa.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deberta-v2

- Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… 10018 (BigBird01)
- DeBERTa-v2 fixes 10328 (LysandreJik)

Wav2Vec2

XLSR-Wav2Vec2

The XLSR-Wav2Vec2 model was proposed in [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.

The checkpoint corresponding to that model is added to the model hub: [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)

- [XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models 10648 (patrickvonplaten)

Training script

A fine-tuning script showcasing how the Wav2Vec2 model can be trained has been added.

- Add Fine-Tuning for Wav2Vec2 10145 (patrickvonplaten)

Further improvements

The Wav2Vec2 architecture becomes more stable as several changes are made to it. This release introduces feature extractors and feature processors as the pre-processing components for multi-modal speech models.

- Deprecate Wav2Vec2ForMaskedLM and add Wav2Vec2ForCTC 10089 (patrickvonplaten)
- Fix example in Wav2Vec2 documentation 10096 (abhishekkrthakur)
- [Wav2Vec2] Remove unused config 10457 (patrickvonplaten)
- [Wav2Vec2FeatureExtractor] smal fixes 10455 (patil-suraj)
- [Wav2Vec2] Improve Tokenizer & Model for batched inference 10117 (patrickvonplaten)
- [PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer 10324 (patrickvonplaten)
- [Wav2Vec2 Example Script] Typo 10547 (patrickvonplaten)
- [Wav2Vec2] Make wav2vec2 test deterministic 10714 (patrickvonplaten)
- [Wav2Vec2] Fix documentation inaccuracy 10694 (MikeG112)
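
A hedged sketch of the resulting processor/CTC workflow (the silent waveform is a placeholder for real 16 kHz audio):

```python
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.zeros(16_000, dtype=np.float32)  # placeholder waveform

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```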

AMP & XLA Support for TensorFlow models

Most of the TensorFlow models are now compatible with automatic mixed precision and have XLA support.

- Add AMP for TF Albert 10141 (jplu)
- Unlock XLA test for TF ConvBert 10207 (jplu)
- Making TF BART-like models XLA and AMP compliant 10191 (jplu)
- Making TF XLM-like models XLA and AMP compliant 10211 (jplu)
- Make TF CTRL compliant with XLA and AMP 10209 (jplu)
- Making TF GPT2 compliant with XLA and AMP 10230 (jplu)
- Making TF Funnel compliant with AMP 10216 (jplu)
- Making TF Lxmert model compliant with AMP 10257 (jplu)
- Making TF MobileBert model compliant with AMP 10259 (jplu)
- Making TF MPNet model compliant with XLA 10260 (jplu)
- Making TF T5 model compliant with AMP and XLA 10262 (jplu)
- Making TF TransfoXL model compliant with AMP 10264 (jplu)
- Making TF OpenAI GPT model compliant with AMP and XLA 10261 (jplu)
- Rework the AMP for TF XLNet 10274 (jplu)
- Making TF Longformer-like models compliant with AMP 10233 (jplu)

SageMaker Trainer for model parallelism

We are rolling out experimental support for model parallelism on SageMaker with a new `SageMakerTrainer` that can be used in place of the regular `Trainer`. This is a temporary class that will be removed in a future version; the end goal is to have `Trainer` support this feature out of the box.

- Add SageMakerTrainer for model paralellism 10122 (sgugger)
- Extend trainer logging for sm 10633 (philschmid)
- Sagemaker Model Parallel tensoboard writing fix 10403 (mansimane)
- Multiple fixes in SageMakerTrainer 10687 (sgugger)
- Add DistributedSamplerWithLoop 10746 (sgugger)

General improvements and bugfixes

- [trainer] deepspeed bug fixes and tests 10039 (stas00)
- Removing run_pl_glue.py from text classification docs, include run_xnli.py & run_tf_text_classification.py 10066 (cbjuan)
- remove token_type_ids from TokenizerBertGeneration output 10070 (sadakmed)
- [deepspeed tests] transition to new tests dir 10080 (stas00)
- Added integration tests for Pytorch implementation of the ELECTRA model 10073 (spatil6)
- Fix naming in TF MobileBERT 10095 (jplu)
- [examples/s2s] add test set predictions 10085 (patil-suraj)
- Logging propagation 10092 (LysandreJik)
- Fix some edge cases in report_to and add deprecation warnings 10100 (sgugger)
- Add head_mask and decoder_head_mask to TF LED 9988 (stancld)
- Replace strided slice with tf.expand_dims 10078 (jplu)
- Fix Faiss Import 10103 (patrickvonplaten)
- [RAG] fix generate 10094 (patil-suraj)
- Fix TFConvBertModelIntegrationTest::test_inference_masked_lm Test 10104 (abhishekkrthakur)
- doc: update W&B related doc 10086 (borisdayma)
- Remove speed metrics from default compute objective [WIP] 10107 (shiva-z)
- Fix tokenizers training in notebooks 10110 (n1t0)
- [DeepSpeed docs] new information 9610 (stas00)
- [CI] build docs faster 10115 (stas00)
- [scheduled github CI] add deepspeed fairscale deps 10116 (stas00)
- Line endings should be LF across repo and not CRLF 10119 (LysandreJik)
- Fix TF LED/Longformer attentions computation 10007 (jplu)
- remove adjust_logits_during_generation method 10087 (patil-suraj)
- [DeepSpeed] restore memory for evaluation 10114 (stas00)
- Update run_xnli.py to use Datasets library 9829 (Qbiwan)
- Add new community notebook - Blenderbot 10126 (lordtt13)
- [DeepSpeed in notebooks] Jupyter + Colab 10130 (stas00)
- [examples/run_s2s] remove task_specific_params and update rouge computation 10133 (patil-suraj)
- Fix typo in GPT2DoubleHeadsModel docs 10148 (M-Salti)
- [hf_api] delete deprecated methods and tests 10159 (julien-c)
- Revert propagation 10171 (LysandreJik)
- Conversion from slow to fast for BPE spm vocabs contained an error. 10120 (Narsil)
- Fix typo in comments 10157 (mrm8488)
- Fix typo in comment 10156 (mrm8488)
- [Doc] Fix version control in internal pages 10124 (sgugger)
- [t5 tokenizer] add info logs 9897 (stas00)
- Fix v2 model loading issue 10129 (BigBird01)
- Fix datasets set_format 10178 (sgugger)
- Fixing NER pipeline for list inputs. 10184 (Narsil)
- Add new model to labels that should not stale 10187 (LysandreJik)
- Check TF ops for ONNX compliance 10025 (jplu)
- [RAG] fix tokenizer 10167 (patil-suraj)
- Fix TF template 10189 (jplu)
- fix run_seq2seq.py; porting trainer tests to it 10162 (stas00)
- Specify dataset dtype 10195 (LysandreJik)
- [CI] make the examples sub-group of tests run always 10196 (stas00)
- [WIP][examples/seq2seq] move old s2s scripts to legacy 10136 (patil-suraj)
- set tgt_lang of MBart Tokenizer for summarization 10205 (HeroadZ)
- Store FLOS as floats to avoid overflow. 10213 (sgugger)
- Fix add_token_positions in custom datasets tutorial 10217 (joeddav)
- [trainer] fix ignored columns logger 10219 (stas00)
- Factor out methods 10215 (LysandreJik)
- Fix head masking for TFT5 models 9877 (stancld)
- [CI] 2 fixes 10248 (stas00)
- [trainer] refactor place_model_on_device logic, add deepspeed 10243 (stas00)
- [Trainer] doc update 10241 (stas00)
- Reduce the time spent for the TF slow tests 10152 (jplu)
- Introduce warmup_ratio training argument 10229 (tanmay17061)
- [Trainer] memory tracker metrics 10225 (stas00)
- Script for distilling zero-shot classifier to more efficient student 10244 (joeddav)
- [test] fix func signature 10271 (stas00)
- [trainer] implement support for full fp16 in evaluation/predict 10268 (stas00)
- [ISSUES.md] propose using google colab to reproduce problems 10270 (stas00)
- Introduce logging_strategy training argument 10267 (tanmay17061)
- [CI] Kill any run-away pytest processes 10281 (stas00)
- Patch zero shot distillation script cuda issue 10284 (joeddav)
- Move the TF NER example 10276 (jplu)
- Fix example links in the task summary 10291 (sgugger)
- fixes 10303 10304 (cronoik)
- [ci] don't fail when there are no zombies 10308 (stas00)
- fix typo in conversion script 10316 (tagucci)
- Add note to resize token embeddings matrix when adding new tokens to voc 10331 (LysandreJik)
- Deprecate prepare_seq2seq_batch 10287 (sgugger)
- [examples/seq2seq] defensive programming + expand/correct README 10295 (stas00)
- [Trainer] implement gradient_accumulation_steps support in DeepSpeed integration 10310 (stas00)
- Loading from last checkpoint functionality in Trainer.train 10334 (tanmay17061)
- [trainer] add Trainer methods for metrics logging and saving 10266 (stas00)
- Fix evaluation with label smoothing in Trainer 10338 (sgugger)
- Fix broken examples/seq2seq/README.md markdown 10344 (Wikidepia)
- [bert-base-german-cased] use model repo, not external bucket 10353 (julien-c)
- [Trainer/Deepspeed] handle get_last_lr() before first step() 10362 (stas00)
- ConvBERT fix torch <> tf weights conversion 10314 (abhishekkrthakur)

- fix deprecated reference `tokenizer.max_len` in glue.py 10220 (poedator)
- [trainer] move secondary methods into a separate file 10363 (stas00)
- Run GA on every push even on forks 10383 (LysandreJik)
- GA: only run model templates once 10388 (LysandreJik)
- Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding 10200 (mingruimingrui)
- Remove unused variable in example for Q&A 10392 (abhishekkrthakur)
- Ignore unexpected weights from PT conversion 10397 (LysandreJik)
- Add support for ZeRO-2/3 and ZeRO-offload in fairscale 10354 (sgugger)
- Fix None in add_token_positions - issue 10210 10374 (andreabac3)
- Make Barthez tokenizer tests a bit faster 10399 (sgugger)
- Fix run_glue evaluation when model has a label correspondence 10401 (sgugger)
- [ci, flax] non-existing models are unlikely to pass tests 10409 (julien-c)

- [LED] Correct Docs 10419 (patrickvonplaten)
- Add Ray Tune hyperparameter search integration test 10414 (krfricke)
- Ray Tune Integration Bug Fixes 10406 (amogkam)
- [examples] better model example 10427 (stas00)
- Fix conda-build 10431 (LysandreJik)
- [run_seq2seq.py] restore functionality: saving to test_generations.txt 10428 (stas00)
- updated logging and saving metrics 10436 (bhadreshpsavani)
- Introduce save_strategy training argument 10286 (tanmay17061)
- Adds terms to Glossary 10443 (darigovresearch)
- Fixes compatibility bug when using grouped beam search and constrained decoding together 10475 (mnschmit)
- Generate can return cross-attention weights too 10493 (Mehrad0711)
- Fix typos 10489 (WybeKoper)
- [T5] Fix speed degradation bug t5 10496 (patrickvonplaten)
- feat(docs): navigate with left/right arrow keys 10481 (ydcjeff)
- Refactor checkpoint name in BERT and MobileBERT 10424 (sgugger)
- remap MODEL_FOR_QUESTION_ANSWERING_MAPPING classes to names auto-generated file 10487 (stas00)
- Fix the bug in constructing the all_hidden_states of DeBERTa v2 10466 (felixgwu)
- Smp grad accum 10488 (sgugger)
- Remove unsupported methods from ModelOutput doc 10505 (sgugger)
- Not always consider a local model a checkpoint in run_glue 10517 (sgugger)
- Removes overwrites for output_dir 10521 (philschmid)
- Rework TPU checkpointing in Trainer 10504 (sgugger)
- [ProphetNet] Bart-like Refactor 10501 (patrickvonplaten)
- Fix example of custom Trainer to reflect signature of compute_loss 10537 (lewtun)
- Fixing conversation test for torch 1.8 10545 (Narsil)
- Fix torch 1.8.0 segmentation fault 10546 (LysandreJik)
- Fixed dead link in Trainer documentation 10554 (jwa018)
- Typo correction. 10531 (cliang1453)
- Fix embeddings for PyTorch 1.8 10549 (sgugger)
- Stale Bot 10509 (LysandreJik)
- Refactoring checkpoint names for multiple models 10527 (danielpatrickhug)
- offline mode for firewalled envs 10407 (stas00)
- fix tf doc bug 10570 (Sniper970119)
- [run_seq2seq] fix nltk lookup 10585 (stas00)
- Fix typo in docstring for pipeline 10591 (silvershine157)
- wrong model used for BART Summarization example 10582 (orena1)
- [M2M100] fix positional embeddings 10590 (patil-suraj)
- Enable torch 1.8.0 on GPU CI 10593 (LysandreJik)
- tokenization_marian.py: use current_spm for decoding 10357 (Mehrad0711)
- [trainer] fix double wrapping + test 10583 (stas00)
- Fix version control with anchors 10595 (sgugger)
- offline mode for firewalled envs (part 2) 10569 (stas00)
- [examples tests] various fixes 10584 (stas00)
- Added max_sample_ arguments 10551 (bhadreshpsavani)
- [examples tests on multigpu] resolving require_torch_non_multi_gpu_but_fix_me 10561 (stas00)
- Check layer types for Optimizer construction 10598 (sgugger)
- Speedup tf tests 10601 (LysandreJik)
- [docs] How to solve "Title level inconsistent" sphinx error 10600 (stas00)
- [FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor 10594 (patrickvonplaten)
- fix flaky m2m100 test 10604 (patil-suraj)
- [examples template] added max_sample args and metrics changes 10602 (bhadreshpsavani)
- Fairscale FSDP fix model save 10596 (sgugger)
- Fix tests of TrainerCallback 10615 (sgugger)
- Fixes an issue in `text-classification` where MNLI eval/test datasets are not being preprocessed. 10621 (allenwang28)
- [M2M100] remove final_logits_bias 10606 (patil-suraj)
- Add new GLUE example with no Trainer. 10555 (sgugger)
- Copy tokenizer files in each of their repo 10624 (sgugger)
- Document Trainer limitation on custom models 10635 (sgugger)
- Fix Longformer tokenizer filename 10653 (LysandreJik)
- Update README.md 10647 (Arvid-pku)
- Ensure metric results are JSON-serializable 10632 (sgugger)
- S2S + M2M100 should be available in tokenization_auto 10657 (LysandreJik)
- Remove special treatment for custom vocab files 10637 (sgugger)
- [S2T] fix example in docs 10667 (patil-suraj)
- W2v2 test require torch 10665 (LysandreJik)
- Fix Marian/TFMarian tokenization tests 10661 (LysandreJik)
- Fixes Pegasus tokenization tests 10671 (LysandreJik)
- Onnx fix test 10663 (mfuntowicz)
- Fix integration slow tests 10670 (sgugger)
- Specify minimum version for sacrebleu 10662 (LysandreJik)
- Add DeBERTa to MODEL_FOR_PRETRAINING_MAPPING 10668 (jeswan)
- Fix broken link 10656 (WybeKoper)
- fix typing error for HfArgumentParser for Optional[bool] 10672 (bfineran)
- MT5 integration test: adjust loss difference 10669 (LysandreJik)
- Adding new parameter to `generate`: `max_time`. 9846 (Narsil)
- TensorFlow tests: having from_pt set to True requires torch to be installed. 10664 (LysandreJik)
- Add auto_wrap option in fairscale integration 10673 (sgugger)
- fix: 10628 expanduser path in TrainingArguments 10660 (PaulLerner)
- Pass encoder outputs into GenerationMixin 10599 (ymfa)
- [wip] [deepspeed] AdamW is now supported by default 9624 (stas00)
- [Tests] RAG 10679 (patrickvonplaten)
- enable loading Mbart50Tokenizer with AutoTokenizer 10690 (patil-suraj)
- Wrong link to super class 10709 (cronoik)
- Distributed barrier before loading model 10685 (sgugger)
- GPT2DoubleHeadsModel made parallelizable 10658 (ishalyminov)
- split seq2seq script into summarization & translation 10611 (theo-m)
- Adding required flags to non-default arguments in hf_argparser 10688 (Craigacp)
- Fix backward compatibility with EvaluationStrategy 10718 (sgugger)
- Tests run on Docker 10681 (LysandreJik)
- Rename zero-shot pipeline multi_class argument 10727 (joeddav)
- Add minimum version check in examples 10724 (sgugger)
- independent training / eval with local files 10710 (riklopfer)
- Flax testing should not run the full torch test suite 10725 (patrickvonplaten)
