New models
Nyströmformer
The Nyströmformer model was proposed in [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.
The Nyströmformer model overcomes the quadratic complexity of self-attention with respect to the input sequence length by adapting the Nyström method to approximate standard self-attention, which makes inputs of thousands of tokens practical.
* Add Nystromformer by novice03 in https://github.com/huggingface/transformers/pull/14659
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=nystromformer
REALM
The REALM model was proposed in [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
It is a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions.
* Add REALM by qqaatw in https://github.com/huggingface/transformers/pull/13292
* Add FastTokenizer to REALM by qqaatw in https://github.com/huggingface/transformers/pull/15211
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=realm
ViTMAE
The ViTMAE model was proposed in [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377v2) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
The paper shows that, by pre-training a Vision Transformer (ViT) to reconstruct pixel values for masked patches, one can get results after fine-tuning that outperform supervised pre-training.
* Add MAE by NielsRogge in https://github.com/huggingface/transformers/pull/15120
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vit_mae
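The masked-patch pre-training described above can be illustrated with a minimal sketch of the random masking step. This is not the library's implementation: the function name `random_patch_mask` is hypothetical, and the 75% mask ratio and the 14x14 patch grid (196 patches) are assumptions taken from the paper's default setup.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (kept, masked) for one image.

    Hypothetical helper for illustration only; mask_ratio=0.75 is an
    assumed default, not something stated in these release notes.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_keep = int(num_patches * (1 - mask_ratio))
    kept = sorted(indices[:num_keep])     # patches the encoder sees
    masked = sorted(indices[num_keep:])   # patches whose pixels are reconstructed
    return kept, masked


# 14 x 14 = 196 patches (assumed: 224 px image, 16 px patches).
kept, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(kept), len(masked))  # 49 147
```

Only the kept patches would be passed to the encoder, which is what makes pre-training cheap relative to processing the full image.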
ViLT
The ViLT model was proposed in [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
ViLT incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for Vision-and-Language Pre-training (VLP).
* Add ViLT by NielsRogge in https://github.com/huggingface/transformers/pull/14895
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vilt
Swin Transformer
The Swin Transformer was proposed in [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
The Swin Transformer serves as a general-purpose backbone for computer vision. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size.
* Add Swin Transformer by novice03 in https://github.com/huggingface/transformers/pull/15085
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=swin
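To make the linear-complexity claim concrete, here is a small worked calculation. It assumes the cost formulas from the Swin Transformer paper's complexity analysis for an `h x w` patch grid with channel dimension `C` and window size `M`; the values `C = 96` and `M = 7` are illustrative assumptions, and the function names are hypothetical.

```python
def global_attention_flops(h, w, c):
    """Approximate cost of global self-attention over h*w patches.

    4*h*w*C^2 for the projections plus 2*(h*w)^2*C for the attention
    itself: quadratic in the number of patches.
    """
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_attention_flops(h, w, c, m):
    """Approximate cost of attention restricted to M x M local windows.

    The attention term becomes 2*M^2*h*w*C: linear in the number of patches.
    """
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# Doubling the side length quadruples the number of patches.
for side in (56, 112, 224):
    g = global_attention_flops(side, side, 96)
    wnd = window_attention_flops(side, side, 96, 7)
    print(f"{side}x{side}: global={g:.3e} windowed={wnd:.3e}")
```

Each time the grid side doubles, the windowed cost grows by exactly 4x (in step with the patch count), while the global cost grows by considerably more.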
YOSO
The YOSO model was proposed in [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
YOSO approximates standard softmax self-attention via a Bernoulli sampling scheme based on Locality Sensitive Hashing (LSH). In principle, all the Bernoulli random variables can be sampled with a single hash.
* Add YOSO by novice03 in https://github.com/huggingface/transformers/pull/15091
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=yoso
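The link between LSH and Bernoulli sampling can be sketched with a toy example. It assumes random-hyperplane hashing, where two vectors agree on `tau` hash bits with probability `(1 - angle / pi) ** tau`; whether two vectors collide is then a single Bernoulli draw. This is an illustration of the idea only, not YOSO's implementation, and all names below are hypothetical.

```python
import math
import random


def collision_probability(q, k, tau):
    """Probability that q and k agree on all tau random-hyperplane hash bits."""
    dot = sum(a * b for a, b in zip(q, k))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in k))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return (1.0 - math.acos(cos) / math.pi) ** tau


def sample_collision(q, k, tau, rng):
    """One Bernoulli draw: hash q and k with tau random hyperplanes and compare."""
    for _ in range(tau):
        plane = [rng.gauss(0.0, 1.0) for _ in q]
        bit_q = sum(p * a for p, a in zip(plane, q)) >= 0
        bit_k = sum(p * b for p, b in zip(plane, k)) >= 0
        if bit_q != bit_k:
            return False
    return True


rng = random.Random(0)
q, k, tau = [1.0, 0.0], [0.6, 0.8], 4
trials = 20000
empirical = sum(sample_collision(q, k, tau, rng) for _ in range(trials)) / trials
print(empirical, collision_probability(q, k, tau))
```

The empirical collision rate should land close to the analytic probability, which is larger for vectors that point in similar directions.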
Add model like
To help contributors add new models to Transformers more easily, there is a new command that clones an existing model and sets up the various hooks in the library, so that you only have to write the changes needed in the modeling file. Just run `transformers-cli add-new-model-like` and fill in the questionnaire!
* Add model like by sgugger in https://github.com/huggingface/transformers/pull/14992
Training scripts
New training scripts were introduced: one for speech seq2seq models and one for image pre-training that leverages the ViTMAE models.
An image captioning example in Flax was also added to the library.
* Add Speech Seq2Seq Training script by patrickvonplaten in https://github.com/huggingface/transformers/pull/14792
* [ViTMAE] Add image pretraining script by NielsRogge in https://github.com/huggingface/transformers/pull/15242
* Add Flax image captioning example by ydshieh in https://github.com/huggingface/transformers/pull/14864
Pipelines
Added support for long audio files in the `automatic-speech-recognition` (ASR) pipeline, as well as support for audio models with a language model (LM), which reduces the word error rate (WER) on many tasks. [See the blog post](https://huggingface.co/blog/wav2vec2-with-ngram).
Arguments and framework support also continue to become more consistent across all pipelines.
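For readers unfamiliar with the metric: WER (word error rate) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, so lower is better. The following is a minimal sketch of that definition; the function name is hypothetical and it is not the evaluation code used in the blog post.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```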
* Large audio chunking for the existing ASR pipeline by anton-l in https://github.com/huggingface/transformers/pull/14896
* Enabling `TF` on `image-classification` pipeline. by Narsil in https://github.com/huggingface/transformers/pull/15030
* Pipeline ASR with LM. by Narsil in https://github.com/huggingface/transformers/pull/15071
* ChunkPipeline: `batch_size` enabled on `zero-cls` and `qa` pipelines. by Narsil in https://github.com/huggingface/transformers/pull/14225
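The idea behind chunking long audio can be sketched as follows: the waveform is cut into fixed-length windows that overlap, so each window can be transcribed on its own and the overlaps give context at the boundaries. This is a simplified illustration, not the pipeline's actual code; the function name, the 16 kHz sampling rate, and the chunk and overlap lengths are assumptions chosen for the example.

```python
def chunk_ranges(num_samples, chunk_len, overlap):
    """Return (start, end) sample ranges that cover the audio with overlap."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be non-negative and smaller than chunk_len")

    step = chunk_len - overlap
    ranges = []
    start = 0
    while start < num_samples:
        end = min(start + chunk_len, num_samples)
        ranges.append((start, end))
        if end == num_samples:
            break  # the last chunk reached the end of the audio
        start += step
    return ranges


sampling_rate = 16_000  # assumed sampling rate
# 25 s of audio, 10 s chunks, 2 s of overlap between neighbours.
print(chunk_ranges(25 * sampling_rate, 10 * sampling_rate, 2 * sampling_rate))
# [(0, 160000), (128000, 288000), (256000, 400000)]
```

In the pipeline itself, the chunk length is controlled through the `chunk_length_s` argument mentioned in the changelog below.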
PyTorch improvements
The ELECTRA model can now be used as a decoder, enabling an ELECTRA encoder-decoder model.
* Add `ElectraForCausalLM` -> Enable Electra encoder-decoder model by stancld in https://github.com/huggingface/transformers/pull/14729
TensorFlow improvements
A Keras metric callback (`KerasMetricCallback`) is added for TensorFlow training.
* Keras metric callback by Rocketknight1 and merveenoyan in https://github.com/huggingface/transformers/pull/14867
The vision encoder decoder model can now be used in TensorFlow.
* Add TFVisionEncoderDecoderModel by ydshieh in https://github.com/huggingface/transformers/pull/14148
CLIP gets ported to TensorFlow.
* Add TFCLIPModel by ydshieh in https://github.com/huggingface/transformers/pull/13967
Flax improvements
RoFormer gets ported to Flax.
* Add Flax RoFormer by stancld in https://github.com/huggingface/transformers/pull/15005
Deprecations
* Deprecates AdamW and adds `--optim` by manuelciosici in https://github.com/huggingface/transformers/pull/14744
Documentation
The documentation has been fully migrated to Markdown. If you are making a contribution, make sure to read the updated guide on [how to write good docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification).
* Convert rst files by sgugger in https://github.com/huggingface/transformers/pull/14888
* Doc styler v2 by sgugger in https://github.com/huggingface/transformers/pull/14950
* Convert last rst file by sgugger in https://github.com/huggingface/transformers/pull/14952
* Doc styler examples by sgugger in https://github.com/huggingface/transformers/pull/14953
* [doc] consistent True/False/None default format by stas00 in https://github.com/huggingface/transformers/pull/14951
* [doc] :obj: hunt by stas00 in https://github.com/huggingface/transformers/pull/14954
* [doc] :class: hunt by stas00 in https://github.com/huggingface/transformers/pull/14955
Bugfixes and improvements
* Fix installation instructions for BART ONNX example by lewtun in https://github.com/huggingface/transformers/pull/14885
* Fix doc examples: ... takes no keyword arguments by ydshieh in https://github.com/huggingface/transformers/pull/14701
* Fix `AttributeError` from `PreTrainedTokenizerFast.decoder` by aphedges in https://github.com/huggingface/transformers/pull/14691
* Add 'with torch.no_grad()' to ALBERT integration test forward pass by henholm in https://github.com/huggingface/transformers/pull/14808
* Add ONNX support for MarianMT models by lewtun in https://github.com/huggingface/transformers/pull/14586
* add custom stopping criteria to human eval script by lvwerra in https://github.com/huggingface/transformers/pull/14897
* Set `run_name` in MLflowCallback by YangDong2002 in https://github.com/huggingface/transformers/pull/14894
* [AutoTokenizer] Fix incorrect from pretrained by patrickvonplaten in https://github.com/huggingface/transformers/pull/14900
* [Tests] Update speech diarization and WavLM tolerances by anton-l in https://github.com/huggingface/transformers/pull/14902
* [doc] post-porting by stas00 in https://github.com/huggingface/transformers/pull/14890
* [Generate] Remove attention_mask and integrate model_main_input_name by patrickvonplaten in https://github.com/huggingface/transformers/pull/14856
* Fix failing GPU trainer tests by sgugger in https://github.com/huggingface/transformers/pull/14903
* Better logic for getting tokenizer config in AutoTokenizer by sgugger in https://github.com/huggingface/transformers/pull/14906
* [doc] install - add link to jax installation by stas00 in https://github.com/huggingface/transformers/pull/14912
* [WavLM] fix wavlm docs by patrickvonplaten in https://github.com/huggingface/transformers/pull/14910
* Fix Perceiver docs by Sanster in https://github.com/huggingface/transformers/pull/14917
* fix to issue 14833 in data_collator - consider no labels by kleinay in https://github.com/huggingface/transformers/pull/14930
* Fix duplicate call to save_checkpoint when using deepspeed by MihaiBalint in https://github.com/huggingface/transformers/pull/14946
* [WavLM] give model more precision tolerance in tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/14958
* [Speech Recognition Examples] Update README.md by patrickvonplaten in https://github.com/huggingface/transformers/pull/14965
* [Tests] Speed up tokenizer tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/14964
* [Wav2Vec2] Rename model's feature extractor to feature encoder by patrickvonplaten in https://github.com/huggingface/transformers/pull/14959
* Replace assertion with exception by jaketae in https://github.com/huggingface/transformers/pull/14970
* remove absl workaround as it's no longer needed by stas00 in https://github.com/huggingface/transformers/pull/14909
* Fixing a pathological case for slow tokenizers by Narsil in https://github.com/huggingface/transformers/pull/14981
* [AutoProcessor] Correct AutoProcessor and automatically add processor… by patrickvonplaten in https://github.com/huggingface/transformers/pull/14881
* [Generate] correct encoder_outputs are passed without attention_mask by patrickvonplaten in https://github.com/huggingface/transformers/pull/14980
* Adding `num_return_sequences` support for text2text generation. by Narsil in https://github.com/huggingface/transformers/pull/14988
* Enabling `tokenizers` upgrade. by Narsil in https://github.com/huggingface/transformers/pull/14941
* Allow training to resume even if RNG states are not properly loaded by sgugger in https://github.com/huggingface/transformers/pull/14994
* Map model_type and doc pages names by sgugger in https://github.com/huggingface/transformers/pull/14944
* Fixing t2t pipelines lists outputs. by Narsil in https://github.com/huggingface/transformers/pull/15008
* Improve truncation_side by Narsil in https://github.com/huggingface/transformers/pull/14947
* Fix doc examples: name 'torch' is not defined by ydshieh in https://github.com/huggingface/transformers/pull/15016
* [Tests] Correct Wav2Vec2 & WavLM tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15015
* [doc] Update parallelism.mdx by hyunwoongko in https://github.com/huggingface/transformers/pull/15013
* Fix Code block speech pretraining example by flozi00 in https://github.com/huggingface/transformers/pull/14983
* Fix a little typo by milyiyo in https://github.com/huggingface/transformers/pull/15002
* Hotfix `chunk_length_s` instead of `_ms`. by Narsil in https://github.com/huggingface/transformers/pull/15029
* [doc] Update parallelism.mdx by hyunwoongko in https://github.com/huggingface/transformers/pull/15018
* [megatron convert] PYTHONPATH requirements by stas00 in https://github.com/huggingface/transformers/pull/14956
* Fix doc example: mask_time_indices (numpy) has no attribute 'to' by ydshieh in https://github.com/huggingface/transformers/pull/15033
* Adding QoL for `batch_size` arg (like others enabled everywhere). by Narsil in https://github.com/huggingface/transformers/pull/15027
* [CLIP] Fix PT test by patrickvonplaten in https://github.com/huggingface/transformers/pull/15041
* [SpeechEncoderDecoder] Fix from pretrained by patrickvonplaten in https://github.com/huggingface/transformers/pull/15043
* [CLIP] Fix TF test by patil-suraj in https://github.com/huggingface/transformers/pull/15042
* Wrap Roberta integration test forward passes with torch.no_grad() by mattchurgin in https://github.com/huggingface/transformers/pull/15037
* Add Detectron2 to Github actions by NielsRogge in https://github.com/huggingface/transformers/pull/15053
* Remove old asserts. by Narsil in https://github.com/huggingface/transformers/pull/15012
* Add 'with torch.no_grad()' to BertGeneration integration test forward passes by itsTurner in https://github.com/huggingface/transformers/pull/14963
* Update run_speech_recognition_seq2seq.py (max_eval_samples instead of train_samples) by flozi00 in https://github.com/huggingface/transformers/pull/14967
* [VisionTextDualEncoder] Fix doc example by ydshieh in https://github.com/huggingface/transformers/pull/15057
* Resubmit changes after rebase to master by kct22aws in https://github.com/huggingface/transformers/pull/14982
* [Fix doc examples] missing from_pretrained by ydshieh in https://github.com/huggingface/transformers/pull/15044
* [VisionTextDualEncoder] Add token_type_ids param by ydshieh in https://github.com/huggingface/transformers/pull/15073
* Fix convert for newer megatron-lm bert model by yoquankara in https://github.com/huggingface/transformers/pull/14082
* [Wav2Vec2 Speech Event] Add speech event v2 by patrickvonplaten in https://github.com/huggingface/transformers/pull/15083
* fix model table cell text alignment by ydshieh in https://github.com/huggingface/transformers/pull/14999
* Update check_repo.py by kamalkraj in https://github.com/huggingface/transformers/pull/15014
* Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x by cody-moveworks in https://github.com/huggingface/transformers/pull/15019
* Change assignee for tokenizers by LysandreJik in https://github.com/huggingface/transformers/pull/15088
* support the trocr small models by liminghao1630 in https://github.com/huggingface/transformers/pull/14893
* [Fix doc example] RagModel by ydshieh in https://github.com/huggingface/transformers/pull/15076
* Model summary doc page horizontal banners by mishig25 in https://github.com/huggingface/transformers/pull/15058
* Use tqdm.auto in Pipeline docs by bryant1410 in https://github.com/huggingface/transformers/pull/14920
* [doc] normalize HF Transformers string by stas00 in https://github.com/huggingface/transformers/pull/15023
* Happy New Year! by sgugger in https://github.com/huggingface/transformers/pull/15094
* [DOC] fix doc examples for bart-like models by patil-suraj in https://github.com/huggingface/transformers/pull/15093
* [performance doc] Power and Cooling by stas00 in https://github.com/huggingface/transformers/pull/14935
* Add test to check reported training loss by sgugger in https://github.com/huggingface/transformers/pull/15096
* Take gradient accumulation into account when defining samplers by sgugger in https://github.com/huggingface/transformers/pull/15095
* [Fix doc example] Speech2TextForConditionalGeneration by ydshieh in https://github.com/huggingface/transformers/pull/15092
* Fix cookiecutter by NielsRogge in https://github.com/huggingface/transformers/pull/15100
* [Wav2Vec2ProcessorWithLM] improve decoder download by patrickvonplaten in https://github.com/huggingface/transformers/pull/15040
* Adds IBERT to models exportable with ONNX by MaximovaIrina in https://github.com/huggingface/transformers/pull/14868
* change metric_key_prefix in seq2seq_trainer.py by JejuWayfarer in https://github.com/huggingface/transformers/pull/15099
* Print out durations of all scheduled tests by LysandreJik in https://github.com/huggingface/transformers/pull/15102
* Fix failing W2V2 test by LysandreJik in https://github.com/huggingface/transformers/pull/15104
* Doc styler tip by sgugger in https://github.com/huggingface/transformers/pull/15105
* Update ONNX docs by lewtun in https://github.com/huggingface/transformers/pull/14904
* Fix saving FlaubertTokenizer configs by vmaryasin in https://github.com/huggingface/transformers/pull/14991
* Update TF test_step to match train_step by Rocketknight1 in https://github.com/huggingface/transformers/pull/15111
* use block_size instead of max_seq_length in tf run_clm example by riklopfer in https://github.com/huggingface/transformers/pull/15036
* fix: switch from slow to generic tokenizer class by lvwerra in https://github.com/huggingface/transformers/pull/15122
* Fix TFEncoderDecoder labels handling 14357 by ydshieh in https://github.com/huggingface/transformers/pull/15001
* Add ONNX configuration classes to docs by lewtun in https://github.com/huggingface/transformers/pull/15121
* Add `with torch.no_grad()` to DistilBERT integration test forward pass by jaketae in https://github.com/huggingface/transformers/pull/14979
* mBART support for run_summarization.py by banda-larga in https://github.com/huggingface/transformers/pull/15125
* doc-builder -> doc-build by LysandreJik in https://github.com/huggingface/transformers/pull/15134
* [Fix doc example] - ProphetNetDecoder by ydshieh in https://github.com/huggingface/transformers/pull/15124
* [examples/flax/language-modeling] set loglevel by stas00 in https://github.com/huggingface/transformers/pull/15129
* Update model_sharing.mdx by carlos-aguayo in https://github.com/huggingface/transformers/pull/15142
* Enable AMP for xla:gpu device in trainer class by ymwangg in https://github.com/huggingface/transformers/pull/15022
* [deepspeed tests] fix summarization by stas00 in https://github.com/huggingface/transformers/pull/15149
* Check the repo consistency in model templates test by sgugger in https://github.com/huggingface/transformers/pull/15141
* Add TF glu activation function by gante in https://github.com/huggingface/transformers/pull/15146
* Make sure all submodules are properly registered by sgugger in https://github.com/huggingface/transformers/pull/15144
* [Fix doc example] - OpenAIGPTDoubleHeadsModel by ydshieh in https://github.com/huggingface/transformers/pull/15143
* fix BertTokenizerFast `tokenize_chinese_chars` arg by SaulLu in https://github.com/huggingface/transformers/pull/15158
* Fix typo in test_configuration_common.py by novice03 in https://github.com/huggingface/transformers/pull/15160
* Add "open in hf spaces" gradio button issue 73 by AK391 in https://github.com/huggingface/transformers/pull/15106
* TF Bert inference - support `np.ndarray` optional arguments by gante in https://github.com/huggingface/transformers/pull/15074
* Fixing flaky test (hopefully). by Narsil in https://github.com/huggingface/transformers/pull/15154
* Better dummies by sgugger in https://github.com/huggingface/transformers/pull/15148
* Update from keras2onnx to tf2onnx by gante in https://github.com/huggingface/transformers/pull/15162
* [doc] performance: Efficient Software Prebuilds by stas00 in https://github.com/huggingface/transformers/pull/15147
* [Speech models] Disable non-existing chunking in tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15163
* Added forward pass of test_inference_image_classification_head by MrinalTyagi in https://github.com/huggingface/transformers/pull/14777
* Fix dtype issue in TF BART by Rocketknight1 in https://github.com/huggingface/transformers/pull/15178
* [doc] new MoE paper by stas00 in https://github.com/huggingface/transformers/pull/15184
* Mark bad tokenizers version by sgugger in https://github.com/huggingface/transformers/pull/15188
* [Fix doc example] UniSpeechSatForPreTraining by ydshieh in https://github.com/huggingface/transformers/pull/15152
* `is_ctc` needs to be updated to `self.type == "ctc"`. by Narsil in https://github.com/huggingface/transformers/pull/15194
* [Fix doc example] TFRagModel by ydshieh in https://github.com/huggingface/transformers/pull/15187
* Error when code examples are improperly closed by sgugger in https://github.com/huggingface/transformers/pull/15186
* Fix deprecation warnings for int div by sgugger in https://github.com/huggingface/transformers/pull/15180
* Copies and docstring styling by sgugger in https://github.com/huggingface/transformers/pull/15202
* [ASR pipeline] correct with lm pipeline by patrickvonplaten in https://github.com/huggingface/transformers/pull/15200
* Remove dependency to quiet Dependabot by sgugger in https://github.com/huggingface/transformers/pull/15205
* Ignore empty subfolders when identifying submodules by sgugger in https://github.com/huggingface/transformers/pull/15204
* [MBartTokenizer] remove dep on xlm-roberta tokenizer by patil-suraj in https://github.com/huggingface/transformers/pull/15201
* fix: 14486 do not use BertPooler in DPR by PaulLerner in https://github.com/huggingface/transformers/pull/15068
* [Fix doc example] Wrong checkpoint name by ydshieh in https://github.com/huggingface/transformers/pull/15079
* [Robust Speech Event] Add guides by patrickvonplaten in https://github.com/huggingface/transformers/pull/15155
* Enable tqdm toggling by jaketae in https://github.com/huggingface/transformers/pull/15167
* [FLAX] glue training example refactor by kamalkraj in https://github.com/huggingface/transformers/pull/13815
* Rename compute_loss in TF models by Rocketknight1 in https://github.com/huggingface/transformers/pull/15207
* Build dev documentation by LysandreJik in https://github.com/huggingface/transformers/pull/15210
* [Fix doc example] TFFunnelTokenizer' is not defined by ydshieh in https://github.com/huggingface/transformers/pull/15225
* Correct Speech Event Readme by patrickvonplaten in https://github.com/huggingface/transformers/pull/15226
* [ViTMAE] Various fixes by NielsRogge in https://github.com/huggingface/transformers/pull/15221
* [Speech Event] Fix speech event readme by patil-suraj in https://github.com/huggingface/transformers/pull/15227
* Fix typo in BERT tokenization file by qqaatw in https://github.com/huggingface/transformers/pull/15228
* Fix PR number by LysandreJik in https://github.com/huggingface/transformers/pull/15231
* Adapt Common Voice Talk Title and Abstract by patrickvonplaten in https://github.com/huggingface/transformers/pull/15233
* Update Trainer code example by NielsRogge in https://github.com/huggingface/transformers/pull/15070
* Make chunking smartly (long files) work on asr ctc_with_lm. by Narsil in https://github.com/huggingface/transformers/pull/15219
* Fix usage of additional kwargs in `from_encoder_decoder_pretrained` in encoder-decoder models by jsnfly in https://github.com/huggingface/transformers/pull/15056
* Update README.md by anton-l in https://github.com/huggingface/transformers/pull/15239
* Update README.md by anton-l in https://github.com/huggingface/transformers/pull/15246
* Update pipelines.mdx by kamalkraj in https://github.com/huggingface/transformers/pull/15243
* [Fix doc example] missing import by ydshieh in https://github.com/huggingface/transformers/pull/15240
* Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels by Rocketknight1 in https://github.com/huggingface/transformers/pull/15234
* Make sure to raise NotImplementedError with correct method name by kumapo in https://github.com/huggingface/transformers/pull/15253
* Fix crash when logs are empty because Keras has wiped them out of spite by Rocketknight1 in https://github.com/huggingface/transformers/pull/15258
* Tentative workflow improvement by LysandreJik in https://github.com/huggingface/transformers/pull/15255
* Fix code examples by NielsRogge in https://github.com/huggingface/transformers/pull/15257
* Adds missing module_specs for usages of _LazyModule by jkuball in https://github.com/huggingface/transformers/pull/15230
* Prepare ONNX export for torch v1.11 by lewtun in https://github.com/huggingface/transformers/pull/15270
* Fix by novice03 in https://github.com/huggingface/transformers/pull/15276
* Move BART + ONNX example to research_projects by lewtun in https://github.com/huggingface/transformers/pull/15271
* Specify providers explicitly in ORT session initialization by wangyems in https://github.com/huggingface/transformers/pull/15235
* Fixes Benchmark example link by evandrosks in https://github.com/huggingface/transformers/pull/15278
* [Robust Speech Challenge] Add timeline by patrickvonplaten in https://github.com/huggingface/transformers/pull/15274
* [Fix doc example] TFLayoutLMForTokenClassification: missing import tf by ydshieh in https://github.com/huggingface/transformers/pull/15268
* [Wav2Vec2ProcessorWithLM] improve multi processing by patrickvonplaten in https://github.com/huggingface/transformers/pull/15247
* Refine errors for pretrained objects by sgugger in https://github.com/huggingface/transformers/pull/15261
* [PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15272
* Update eval.py by patrickvonplaten in https://github.com/huggingface/transformers/pull/15310
* Update CONTRIBUTING.md by kamalkraj in https://github.com/huggingface/transformers/pull/15290
* Fix a typo in tag addition by sgugger in https://github.com/huggingface/transformers/pull/15286
* Remove old debug code leftover. by Narsil in https://github.com/huggingface/transformers/pull/15306
* [Fix doc example] fix missing import jnp by ydshieh in https://github.com/huggingface/transformers/pull/15291
* [LayoutLMV2 Tests] Make sure input is on GPU by patrickvonplaten in https://github.com/huggingface/transformers/pull/15314
* Replace NystromformerTokenizer with AutoTokenizer by novice03 in https://github.com/huggingface/transformers/pull/15312
* [Beam Search] Correct returned beam scores by patrickvonplaten in https://github.com/huggingface/transformers/pull/14654
* [Examples] Correct run ner label2id for fine-tuned models by patrickvonplaten in https://github.com/huggingface/transformers/pull/15017
* Avoid using get_list_of_files by sgugger in https://github.com/huggingface/transformers/pull/15287
* [Tests] Fix test by NielsRogge in https://github.com/huggingface/transformers/pull/15324
* Add 🤗 Accelerate tutorial by stevhliu in https://github.com/huggingface/transformers/pull/15263
* Added missing code in exemplary notebook - custom datasets fine-tuning by Pawloch247 in https://github.com/huggingface/transformers/pull/15300
* Fix encoder-decoder models when labels is passed by ydshieh in https://github.com/huggingface/transformers/pull/15172
* Fix table formatting in SegFormer docs by deppen8 in https://github.com/huggingface/transformers/pull/15337
* Fix deepspeed docs by ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15346
* Fix 'eval_split_name' described as defaulting to 'train' by FremyCompany in https://github.com/huggingface/transformers/pull/15348
* Update doc writing guide by sgugger in https://github.com/huggingface/transformers/pull/15350
* Add YOSO by novice03 in https://github.com/huggingface/transformers/pull/15091
* [docs] post-PR merge fix by stas00 in https://github.com/huggingface/transformers/pull/15355
* Fix YosoConfig doc by sgugger in https://github.com/huggingface/transformers/pull/15353
* [DocTests Speech] Add doc tests for all speech models by patrickvonplaten in https://github.com/huggingface/transformers/pull/15031
* Push to hub save by sgugger in https://github.com/huggingface/transformers/pull/15327
* Fix KerasMetricCallback prediction with generate() and inference of column names by Rocketknight1 in https://github.com/huggingface/transformers/pull/15351
* Add a device argument to the eval script by anton-l in https://github.com/huggingface/transformers/pull/15371
* improve saving strategy of sentencepiece tokenizer by SaulLu in https://github.com/huggingface/transformers/pull/15328
* Implement fixes for TrainingArguments doc by sgugger in https://github.com/huggingface/transformers/pull/15370
* Super-small fix stops us confusing Keras console logging by modifying… by Rocketknight1 in https://github.com/huggingface/transformers/pull/15373
* Add proper documentation for Keras callbacks by sgugger in https://github.com/huggingface/transformers/pull/15374
* Example script for PushToHubCallback by Rocketknight1 in https://github.com/huggingface/transformers/pull/15375
Impressive community contributors
The community contributors below have significantly contributed to the v4.16.0 release. Thank you!
- novice03, for contributing Nyströmformer, Swin Transformer and YOSO
- qqaatw, for contributing REALM
- stancld, for adding support for ELECTRA as a decoder, and porting RoFormer to Flax
- ydshieh, for a myriad of documentation fixes, the port of CLIP to TensorFlow, the addition of the TensorFlow vision encoder-decoder model, and the contribution of an image captioning example in Flax.
New Contributors
* YangDong2002 made their first contribution in https://github.com/huggingface/transformers/pull/14894
* Sanster made their first contribution in https://github.com/huggingface/transformers/pull/14917
* kleinay made their first contribution in https://github.com/huggingface/transformers/pull/14930
* MihaiBalint made their first contribution in https://github.com/huggingface/transformers/pull/14946
* milyiyo made their first contribution in https://github.com/huggingface/transformers/pull/15002
* mattchurgin made their first contribution in https://github.com/huggingface/transformers/pull/15037
* itsTurner made their first contribution in https://github.com/huggingface/transformers/pull/14963
* kct22aws made their first contribution in https://github.com/huggingface/transformers/pull/14982
* yoquankara made their first contribution in https://github.com/huggingface/transformers/pull/14082
* cody-moveworks made their first contribution in https://github.com/huggingface/transformers/pull/15019
* MaximovaIrina made their first contribution in https://github.com/huggingface/transformers/pull/14868
* JejuWayfarer made their first contribution in https://github.com/huggingface/transformers/pull/15099
* novice03 made their first contribution in https://github.com/huggingface/transformers/pull/14659
* banda-larga made their first contribution in https://github.com/huggingface/transformers/pull/15125
* manuelciosici made their first contribution in https://github.com/huggingface/transformers/pull/14744
* carlos-aguayo made their first contribution in https://github.com/huggingface/transformers/pull/15142
* gante made their first contribution in https://github.com/huggingface/transformers/pull/15146
* AK391 made their first contribution in https://github.com/huggingface/transformers/pull/15106
* MrinalTyagi made their first contribution in https://github.com/huggingface/transformers/pull/14777
* jsnfly made their first contribution in https://github.com/huggingface/transformers/pull/15056
* jkuball made their first contribution in https://github.com/huggingface/transformers/pull/15230
* wangyems made their first contribution in https://github.com/huggingface/transformers/pull/15235
* evandrosks made their first contribution in https://github.com/huggingface/transformers/pull/15278
* Pawloch247 made their first contribution in https://github.com/huggingface/transformers/pull/15300
* deppen8 made their first contribution in https://github.com/huggingface/transformers/pull/15337
* ngoquanghuy99 made their first contribution in https://github.com/huggingface/transformers/pull/15346
**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.15.0...v4.16.0