Transformers

Latest version: v4.46.2


4.23.1

Not secure
Fixes a revert, introduced by mistake, that broke the `"automatic-speech-recognition"` pipeline for Whisper.

- Fix whisper for pipeline by ArthurZucker in 19482

4.23.0

Not secure
Whisper

The Whisper model was proposed in [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting, in multiple languages.

* Add WhisperModel to transformers by ArthurZucker in 19166
* Add TF whisper by amyeroberts in 19378

Deformable DETR

The Deformable DETR model was proposed in [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original [DETR](https://huggingface.co/docs/transformers/model_doc/detr) by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.

* Add Deformable DETR by NielsRogge in 17281
* [fix] Add DeformableDetrFeatureExtractor by NielsRogge in 19140

Conditional DETR

The Conditional DETR model was proposed in [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than [DETR](https://huggingface.co/docs/transformers/model_doc/detr).

* Add support for conditional detr by DeppMeng in 18948
* Improve conditional detr docs by NielsRogge in 19154

Time Series Transformer

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

:warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs, or slight breaking changes to fix them in the future. If you see something strange, file a [GitHub issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title).

* time series forecasting model by kashif in 17965
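
The difference between teacher-forced training and autoregressive sampling described above can be sketched in a few lines. This is a toy illustration only: `toy_step` is a hypothetical stand-in for a decoder step (it just extrapolates the last trend) and is not part of the `transformers` API.

```python
import random


def toy_step(history):
    """Hypothetical decoder step: returns (mean, scale) for the next value."""
    mean = history[-1] + (history[-1] - history[-2])  # continue the last trend
    return mean, 0.1


def teacher_forced_predictions(series):
    """Training-style pass: every step conditions on the ground-truth past."""
    return [toy_step(series[:t])[0] for t in range(2, len(series))]


def autoregressive_sample(context, horizon, rng):
    """Inference-style pass: each sampled value is fed back as input."""
    history = list(context)
    for _ in range(horizon):
        mean, scale = toy_step(history)
        history.append(rng.gauss(mean, scale))
    return history[len(context):]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = teacher_forced_predictions(series)
samples = autoregressive_sample(series[:3], horizon=2, rng=random.Random(0))
```

In the teacher-forced pass, a prediction error at one step never contaminates the next step's input. In the sampling loop it does, which is why sampled trajectories can drift.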

Masked Siamese Networks

The ViTMSN model was proposed in [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

MSN (masked siamese networks) consists of a joint-embedding architecture to match the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.

* MSN (Masked Siamese Networks) for ViT by sayakpaul in 18815

MarkupLM

The MarkupLM model was proposed in [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to [LayoutLM](https://huggingface.co/docs/transformers/main/en/model_doc/layoutlm).

The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks: [WebSRC](https://x-lance.github.io/WebSRC/) and [SWDE](https://www.researchgate.net/publication/221299838_From_one_tree_to_a_forest_a_unified_solution_for_structured_web_data_extraction).

* Add MarkupLM by NielsRogge in 19198

Security & safety

We explore a new serialization format not using Pickle that we can then leverage in the three frameworks we support: PyTorch, TensorFlow, and JAX. We leverage the [safetensors](https://github.com/huggingface/safetensors) library for that.

Support is for PyTorch models only at this stage, and still experimental.

* Poc to use safetensors by sgugger in 19175
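
To illustrate why a pickle-free format is safer to load, here is a minimal sketch of a layout loosely modeled on the one the safetensors project describes: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. It is a simplified illustration, not the library's implementation. The function names `save_tensors` and `load_tensors` are hypothetical, and only flat float32 lists are handled.

```python
import json
import struct


def save_tensors(tensors):
    """Serialize {name: list of floats} as: header length, JSON header, raw data."""
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_tensors(blob):
    """Parse the bytes back. Only JSON and fixed-width numbers are decoded."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    restored = {}
    for name, meta in header.items():
        start, end = meta["data_offsets"]
        count = (end - start) // 4  # 4 bytes per float32
        restored[name] = list(struct.unpack(f"<{count}f", data[start:end]))
    return restored


blob = save_tensors({"weight": [0.5, -1.25, 2.0], "bias": [0.0]})
restored = load_tensors(blob)
```

Unlike unpickling, loading here never executes code stored in the file: the reader only parses JSON metadata and copies numeric bytes.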

Computer vision post-processing methods overhaul

The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments and outputs.
:warning: The existing methods that are superseded by the introduced methods `post_process_object_detection`, `post_process_semantic_segmentation`, `post_process_instance_segmentation`, `post_process_panoptic_segmentation` are now deprecated.

* Improve DETR post-processing methods by alaradirik in 19205
* Beit postprocessing by alaradirik in 19099
* Fix BeitFeatureExtractor postprocessing by alaradirik in 19119
* Add post_process_semantic_segmentation method to SegFormer by alaradirik in 19072
* Add post_process_semantic_segmentation method to DPTFeatureExtractor by alaradirik in 19107
* Add semantic segmentation post-processing method to MobileViT by alaradirik in 19105
* Detr preprocessor fix by alaradirik in 19007
* Improve and fix ImageSegmentationPipeline by alaradirik in 19367
* Restructure DETR post-processing, return prediction scores by alaradirik in 19262
* Maskformer post-processing fixes and improvements by alaradirik in 19172
* Fix MaskFormer failing postprocess tests by alaradirik in 19354
* Fix DETR segmentation postprocessing output by alaradirik in 19363
* fix docs example, add object_detection to DETR docs by alaradirik in 19377

🚨 Breaking changes

The following changes are bugfixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what changed.

Breaking change for ViT parameter initialization

* 🚨🚨🚨 Fix ViT parameter initialization by alaradirik in 19341

Breaking change for the `top_p` argument of the `TopPLogitsWarper` of the `generate` method.

* 🚨🚨🚨 Optimize Top P Sampler and fix edge case by ekagra-ranjan in 18984
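
For readers unfamiliar with top-p (nucleus) filtering, the following is a minimal pure-Python sketch of the general technique. It is not the library's `TopPLogitsWarper`, and it does not reproduce the specific edge case addressed by the PR above. The function name `top_p_filter` and its parameters are hypothetical.

```python
import math


def top_p_filter(logits, top_p, min_tokens_to_keep=1):
    """Keep the smallest set of most-likely tokens whose probability mass
    reaches top_p; set every other logit to -inf."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely, accumulating probability mass.
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for rank, index in enumerate(order):
        if rank >= min_tokens_to_keep and cumulative >= top_p:
            break
        keep.add(index)
        cumulative += probs[index]

    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
filtered = top_p_filter(logits, top_p=0.7)
```

With `top_p=0.7`, the two most likely tokens (mass 0.8) are kept and the rest are masked. Sampling then happens only over the surviving logits.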

Model head additions

OPT and BLOOM now have question answering heads available.

* Add `OPTForQuestionAnswering` by clementapa in 19402
* Add `BloomForQuestionAnswering` by younesbelkada in 19310

Pipelines

There is now a zero-shot object detection pipeline.

* Add ZeroShotObjectDetectionPipeline by sahamrit in 18445

TensorFlow architectures

The GroupViT model is now available in TensorFlow.

* [TensorFlow] Adding GroupViT by ariG23498 in 18020

Bugfixes and improvements

* Fix a broken link for deepspeed ZeRO inference in the docs by nijkah in 19001
* [doc] debug: fix import by stas00 in 19042
* [bnb] Small improvements on utils by younesbelkada in 18646
* Update image segmentation pipeline test by amyeroberts in 18731
* Fix `test_save_load` for `TFViTMAEModelTest` by ydshieh in 19040
* Pin minimum PyTorch version for BLOOM ONNX export by lewtun in 19046
* Update serving signatures and make sure we actually use them by Rocketknight1 in 19034
* Move cache: expand error message by sgugger in 19051
* Fixing OPT fast tokenizer option. by Narsil in 18753
* Fix custom tokenizers test by sgugger in 19052
* Run `torchdynamo` tests by ydshieh in 19056
* [fix] Add DeformableDetrFeatureExtractor by NielsRogge in 19140
* fix arg name in BLOOM testing and remove unused arg document by shijie-wu in 18843
* Adds package and requirement spec output to version check exception by colindean in 18702
* fix `use_cache` by younesbelkada in 19060
* FX support for ConvNext, Wav2Vec2 and ResNet by michaelbenayoun in 19053
* [doc] Fix link in PreTrainedModel documentation by tomaarsen in 19065
* Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input by jimypbr in 18746
* Organize test jobs by sgugger in 19058
* Automatically tag CLIP repos as zero-shot-image-classification by osanseviero in 19064
* Fix `LeViT` checkpoint by ydshieh in 19069
* TF: tests for (de)serializable models with resized tokens by gante in 19013
* Add type hints for PyTorch UniSpeech, MPNet and Nystromformer by daspartho in 19039
* replace logger.warn by logger.warning by fxmarty in 19068
* Fix tokenizer load from one file by sgugger in 19073
* Note about developer mode by LysandreJik in 19075
* german autoclass by flozi00 in 19049
* Add tests for legacy load by url and fix bugs by sgugger in 19078
* Add runner availability check by ydshieh in 19054
* fix working dir by ydshieh in 19101
* Added type hints for TFConvBertModel by kishore-s-15 in 19088
* Added Type hints for VIT MAE by kishore-s-15 in 19085
* Add type hints for TF MPNet models by kishore-s-15 in 19089
* Added type hints to ResNetForImageClassification by kishore-s-15 in 19084
* added type hints by daspartho in 19076
* Improve vision models docs by NielsRogge in 19103
* correct spelling in README by flozi00 in 19092
* Don't warn of move if cache is empty by sgugger in 19109
* HPO: keep the original logic if there's only one process, pass the trial to trainer by sywangyi in 19096
* Add documentation of Trainer.create_model_card by sgugger in 19110
* Added type hints for YolosForObjectDetection by kishore-s-15 in 19086
* Fix the wrong schedule by ydshieh in 19117
* Change document question answering pipeline to always return an array by ankrgyl in 19071
* german processing by flozi00 in 19121
* Fix: update ltp word segmentation call in mlm_wwm by xyh1756 in 19047
* Add a missing space in a script arg documentation by bryant1410 in 19113
* Skip `test_export_to_onnx` for `LongT5` if `torch` < 1.11 by ydshieh in 19122
* Fix GLUE MNLI when using `max_eval_samples` by lvwerra in 18722
* [BugFix] Fix fsdp option on shard_grad_op. by ZHUI in 19131
* Fix FlaxPretTrainedModel pt weights check by mishig25 in 19133
* suppoer deps from github by lhoestq in 19141
* Fix dummy creation for multi-frameworks objects by sgugger in 19144
* Allowing users to use the latest `tokenizers` release ! by Narsil in 19139
* Add some tests for check_dummies by sgugger in 19146
* Fixed typo in generation_utils.py by nbalepur in 19145
* Add `accelerate` support for ViLT by younesbelkada in 18683
* TF: check embeddings range by gante in 19102
* Reduce LR for TF MLM example test by Rocketknight1 in 19156
* update perf_train_cpu_many doc by sywangyi in 19151
* fix: ckpt paths. by sayakpaul in 19159
* Fix TrainingArguments documentation by sgugger in 19162
* fix HPO DDP GPU problem by sywangyi in 19168
* [WIP] Trainer supporting evaluation on multiple datasets by timbmg in 19158
* Add doctests to Perceiver examples by stevenmanton in 19129
* Add offline runners info in the Slack report by ydshieh in 19169
* Fix incorrect comments about atten mask for pytorch backend by lygztq in 18728
* Fixed type hint for pipelines/check_task by Fei-Wang in 19150
* Update run_clip.py by enze5088 in 19130
* german training, accelerate and model sharing by flozi00 in 19171
* Separate Push CI images from Scheduled CI by ydshieh in 19170
* Remove pos arg from Perceiver's Pre/Postprocessors by aielawady in 18602
* Use `assertAlmostEqual` in `BloomEmbeddingTest.test_logits` by ydshieh in 19200
* Move the model type check by ankrgyl in 19027
* Use repo_type instead of deprecated datasets repo IDs by sgugger in 19202
* Updated hf_argparser.py by IMvision12 in 19188
* Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor by ydshieh in 19203
* Fix cached_file in offline mode for cached non-existing files by sgugger in 19206
* Remove unused `cur_len` in generation_utils.py by ekagra-ranjan in 18874
* add wav2vec2_alignment by arijitx in 16782
* add doc for hyperparameter search by sywangyi in 19192
* Add a use_parallel_residual argument to control the residual computing way by NinedayWang in 18695
* translated add_new_pipeline by nickprock in 19215
* More tests for regression in cached non existence by sgugger in 19216
* Use `math.pi` instead of `torch.pi` in `MaskFormer` by ydshieh in 19201
* Added tests for yaml and json parser by IMvision12 in 19219
* Fix small use_cache typo in the docs by ankrgyl in 19191
* Generate: add warning when left padding should be used by gante in 19067
* Fix deprecation warning for return_all_scores by ogabrielluiz in 19217
* Fix doctest for `TFDeiTForImageClassification` by ydshieh in 19173
* Document and validate typical_p in generation by mapmeld in 19128
* Fix trainer seq2seq qa.py evaluate log and ft script by iamtatsuki05 in 19208
* Fix cache names in CircleCI jobs by ydshieh in 19223
* Move AutoClasses under Main Classes by stevhliu in 19163
* Focus doc around preprocessing classes by stevhliu in 18768
* Fix confusing working directory in Push CI by ydshieh in 19234
* XGLM - Fix Softmax NaNs when using FP16 by gsarti in 18057
* Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+ by michaelbenayoun in 19233
* Fix `m2m_100.mdx` doc example missing `labels` by Mustapha-AJEGHRIR in 19149
* Fix opt softmax small nit by younesbelkada in 19243
* Use `hf_raise_for_status` instead of deprecated `_raise_for_status` by Wauplin in 19244
* Fix TrainingArgs argument serialization by atturaioe in 19239
* Fix test fetching for examples by sgugger in 19237
* Cast TF generate() inputs by Rocketknight1 in 19232
* Skip pipeline tests by sgugger in 19248
* Add job names in Past CI artifacts by ydshieh in 19235
* Update Past CI report script by ydshieh in 19228
* [Wav2Vec2] Fix None loss in doc examples by rbsteinm in 19218
* Catch `HFValidationError` in `TrainingSummary` by ydshieh in 19252
* Add expected output to the sample code for `ViTMSNForImageClassification` by sayakpaul in 19183
* Add stop sequence to text generation pipeline by KMFODA in 18444
* Add notebooks by JingyaHuang in 19259
* Add `beautifulsoup4` to the dependency list by ydshieh in 19253
* Fix Encoder-Decoder testing issue about repo. names by ydshieh in 19250
* Fix cached lookup filepath on windows for hub by kjerk in 19178
* Docs - Guide to add a new TensorFlow model by gante in 19256
* Update no_trainer script for summarization by divyanshugit in 19277
* Don't automatically add bug label by sgugger in 19302
* Breakup export guide by stevhliu in 19271
* Update Protobuf dependency version to fix known vulnerability by qthequartermasterman in 19247
* Update README.md by ShubhamJagtap2000 in 19309
* [Docs] Fix link by patrickvonplaten in 19313
* Fix for sequence regression fit() in TF by Rocketknight1 in 19316
* Added Type hints for LED TF by IMvision12 in 19315
* Added type hints for TF: rag model by debjit-bw in 19284
* alter retrived to retrieved by gouqi666 in 18863
* ci(stale.yml): upgrade actions/setup-python to v4 by oscard0m in 19281
* ci(workflows): update actions/checkout to v3 by oscard0m in 19280
* wrap forward passes with torch.no_grad() by daspartho in 19279
* wrap forward passes with torch.no_grad() by daspartho in 19278
* wrap forward passes with torch.no_grad() by daspartho in 19274
* wrap forward passes with torch.no_grad() by daspartho in 19273
* Removing BertConfig inheritance from LayoutLMConfig by arnaudstiegler in 19307
* docker-build: Update actions/checkout to v3 by Sushrut1101 in 19288
* Clamping hidden state values to allow FP16 by SSamDav in 19229
* Remove interdependency from OpenAI tokenizer by E-Aho in 19327
* removing XLMConfig inheritance from FlaubertConfig by D3xter1922 in 19326
* Removed interdependency of BERT's Tokenizer in tokenization of prophetnet by divyanshugit in 19331
* Remove bert interdependency from clip tokenizer by shyamsn97 in 19332
* [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer by D3xter1922 in 19330
* Making camembert independent from roberta, clean by Mustapha-AJEGHRIR in 19337
* Add sudachi and jumanpp tokenizers for bert_japanese by r-terada in 19043
* Frees LongformerTokenizer of the Roberta dependency by srhrshr in 19346
* Change `BloomConfig` docstring by younesbelkada in 19336
* Test failing test while we resolve the issue. by sgugger in 19355
* Call _set_save_spec() when creating TF models by Rocketknight1 in 19321
* correct typos in README by paulaxisabel in 19304
* Removes Roberta and Bert config dependencies from Longformer by srhrshr in 19343
* Fix gather for metrics by muellerzr in 19360
* Fix pipeline tests for Roberta-like tokenizers by sgugger in 19365
* Change link of repojacking vulnerable link by Ilaygoldman in 19393
* Making `ConvBert Tokenizer` independent from `bert Tokenizer` by IMvision12 in 19347
* Fix gather for metrics by muellerzr in 19389
* Added Type hints for XLM TF by IMvision12 in 19333
* add ONNX support for swin transformer by bibhabasumohapatra in 19390
* removes prophet config dependencies from xlm-prophet by srhrshr in 19400
* Added type hints for TF: TransfoXL by thliang01 in 19380
* HF <-> megatron checkpoint reshaping and conversion for GPT by pacman100 in 19317
* Remove unneded words from audio-related feature extractors by osanseviero in 19405
* edit: cast attention_mask to long in DataCollatorCTCWithPadding by ddobokki in 19369
* Copy BertTokenizer dependency into retribert tokenizer by Davidy22 in 19371
* Export TensorFlow models to ONNX with dynamic input shapes by dwyatte in 19255
* update attention mask handling by ArthurZucker in 19385
* Remove dependency of Bert from Squeezebert tokenizer by rchan26 in 19403
* Removed Bert and XML Dependency from Herbert by harry7337 in 19410
* Clip device map by patrickvonplaten in 19409
* Remove Dependency between Bart and LED (slow/fast) by Infrared1029 in 19408
* Removed `Bert` interdependency in `tokenization_electra.py` by OtherHorizon in 19356
* Make `Camembert` TF version independent from `Roberta` by Mustapha-AJEGHRIR in 19364
* Removed Bert dependency from BertGeneration code base. by Threepointone4 in 19370
* Rework pipeline tests by sgugger in 19366
* Fix `ViTMSNForImageClassification` doctest by ydshieh in 19275
* Skip `BloomEmbeddingTest.test_embeddings` for PyTorch < 1.10 by ydshieh in 19261
* remove RobertaConfig inheritance from MarkupLMConfig by D3xter1922 in 19404
* Backtick fixed (paragraph 68) by kant in 19440
* Fixed duplicated line (paragraph 83) Documentation: sgugger by kant in 19436
* fix marianMT convertion to onnx by kventinel in 19287
* Fix typo in image-classification/README.md by zhawe01 in 19424
* Stop relying on huggingface_hub's private methods by LysandreJik in 19392
* Add onnx support for VisionEncoderDecoder by mht-sharma in 19254
* Remove dependency of Roberta in Blenderbot by rchan26 in 19411
* fix: renamed variable name by ariG23498 in 18850
* Fix the error message in run_t5_mlm_flax.py by yangky11 in 19282
* Add Italian translation for `add_new_model.mdx` by Steboss89 in 18713
* Fix momentum and epsilon values by amyeroberts in 19454
* Generate: corrected exponential_decay_length_penalty type hint by ShivangMishra in 19376
* Fix misspelled word in docstring by Bearnardd in 19415
* Fixed a non-working hyperlink in the README.md file by MikailINTech in 19434
* fix by ydshieh in 19469
* wrap forward passes with torch.no_grad() by daspartho in 19439
* wrap forward passes with torch.no_grad() by daspartho in 19438
* wrap forward passes with torch.no_grad() by daspartho in 19416
* wrap forward passes with torch.no_grad() by daspartho in 19414
* wrap forward passes with torch.no_grad() by daspartho in 19413
* wrap forward passes with torch.no_grad() by daspartho in 19412

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* flozi00
  * german autoclass (19049)
  * correct spelling in README (19092)
  * german processing (19121)
  * german training, accelerate and model sharing (19171)
* DeppMeng
  * Add support for conditional detr (18948)
* sayakpaul
  * MSN (Masked Siamese Networks) for ViT (18815)
  * fix: ckpt paths. (19159)
  * Add expected output to the sample code for `ViTMSNForImageClassification` (19183)
* IMvision12
  * Updated hf_argparser.py (19188)
  * Added tests for yaml and json parser (19219)
  * Added Type hints for LED TF (19315)
  * Making `ConvBert Tokenizer` independent from `bert Tokenizer` (19347)
  * Added Type hints for XLM TF (19333)
* ariG23498
  * [TensorFlow] Adding GroupViT (18020)
  * fix: renamed variable name (18850)
* Mustapha-AJEGHRIR
  * Fix `m2m_100.mdx` doc example missing `labels` (19149)
  * Making camembert independent from roberta, clean (19337)
  * Make `Camembert` TF version independent from `Roberta` (19364)
* D3xter1922
  * removing XLMConfig inheritance from FlaubertConfig (19326)
  * [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (19330)
  * remove RobertaConfig inheritance from MarkupLMConfig (19404)
* srhrshr
  * Frees LongformerTokenizer of the Roberta dependency (19346)
  * Removes Roberta and Bert config dependencies from Longformer (19343)
  * removes prophet config dependencies from xlm-prophet (19400)
* sahamrit
  * [WIP] Add ZeroShotObjectDetectionPipeline (18445) (18930)
* Davidy22
  * Copy BertTokenizer dependency into retribert tokenizer (19371)
* rchan26
  * Remove dependency of Bert from Squeezebert tokenizer (19403)
  * Remove dependency of Roberta in Blenderbot (19411)
* harry7337
  * Removed Bert and XML Dependency from Herbert (19410)
* Infrared1029
  * Remove Dependency between Bart and LED (slow/fast) (19408)
* Steboss89
  * Add Italian translation for `add_new_model.mdx` (18713)

4.22.2

Not secure
Fixes a bug where a cached tokenizer/model was not accessible anymore offline (either forcing offline mode or because of an internet issue).

- More tests for regression in cached non existence by sgugger in 19216
- Fix cached_file in offline mode for cached non-existing files by sgugger in 19206
- Don't warn of move if cache is empty by sgugger in 19109

4.22.1

Not secure
Patch release for the following PRs:

- [Add tests for legacy load by url and fix bugs](https://github.com/huggingface/transformers/commit/654c584f388ac160db83071d751e9dead4887d82) ([#19078](https://github.com/huggingface/transformers/pull/19078))
- [Note about developer mode](https://github.com/huggingface/transformers/commit/6d034d58c583dcf4299c8a34f949ace046ac0208) ([#19075](https://github.com/huggingface/transformers/pull/19075))
- [Fix tokenizer load from one file](https://github.com/huggingface/transformers/commit/af20bbb3188a6ffeaa126fa5118c9cabb529c26a) ([#19073](https://github.com/huggingface/transformers/pull/19073))
- [Fixing OPT fast tokenizer option.](https://github.com/huggingface/transformers/commit/1504b5311a3ee62bd820ac31b4ec2feffb2845f3) ([#18753](https://github.com/huggingface/transformers/pull/18753))
- [Move cache: expand error message](https://github.com/huggingface/transformers/commit/defd039bae9f44f6c7a847ed8f5d3609f6667540) ([#19051](https://github.com/huggingface/transformers/pull/19051))

4.22.0

Not secure
Swin Transformer v2

The Swin Transformer V2 model was proposed in [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

Swin Transformer v2 improves the original [Swin Transformer](https://huggingface.co/docs/transformers/main/en/model_doc/swin) using 3 main techniques: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.

* Add swin transformer v2 by nandwalritik in 17469

VideoMAE

The VideoMAE model was proposed in [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked autoencoders ([MAE](https://huggingface.co/docs/transformers/main/en/model_doc/vit_mae)) to video, claiming state-of-the-art performance on several video classification benchmarks.

VideoMAE is an extension of [ViTMAE](https://huggingface.co/docs/transformers/main/en/model_doc/vit_mae) for video.

* Add VideoMAE by NielsRogge in 17821

Donut

The Donut model was proposed in [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

* Add Donut by NielsRogge in 18488

Pegasus-X

The PEGASUS-X model was proposed in [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao and Peter J. Liu.

PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.

* PEGASUS-X by zphang in 18551

X-CLIP

The X-CLIP model was proposed in [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of [CLIP](https://huggingface.co/docs/transformers/main/en/model_doc/clip) for video. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

X-CLIP is a minimal extension of CLIP for video-language understanding.

* Add X-CLIP by NielsRogge in 18852

ERNIE

ERNIE is a series of powerful models proposed by Baidu that perform especially well on Chinese tasks. The series includes [ERNIE1.0](https://arxiv.org/abs/1904.09223), [ERNIE2.0](https://ojs.aaai.org/index.php/AAAI/article/view/6428), [ERNIE3.0](https://arxiv.org/abs/2107.02137), [ERNIE-Gram](https://arxiv.org/abs/2010.12148), [ERNIE-health](https://arxiv.org/abs/2110.07244), etc.
These models are contributed by nghuyong and the official code can be found in PaddleNLP (in PaddlePaddle).

* ERNIE-2.0 and ERNIE-3.0 models by nghuyong in 18686

TensorFlow models

MobileViT and LayoutLMv3 are now available in TensorFlow.

* TensorFlow MobileViT by sayakpaul in 18555
* [LayoutLMv3] Add TensorFlow implementation by ChrisFugl in 18678

New task-specific architectures

A new question answering head was added for the LayoutLM model.

* Add LayoutLMForQuestionAnswering model by ankrgyl in 18407

New pipelines

Two new pipelines are available in `transformers`: a document question answering pipeline, as well as an image to text generation pipeline.

* Add DocumentQuestionAnswering pipeline by ankrgyl in 18414
* Add Image To Text Generation pipeline by OlivierDehaene in 18821

M1 support

Pipelines and the Trainer in `transformers` now support Mac M1 through the PyTorch `mps` device.

* `pipeline` support for `device="mps"` (or any other string) by julien-c in 18494
* mac m1 `mps` integration by pacman100 in 18598

Backend version compatibility

Starting from version v4.22.0, we officially support PyTorch and TensorFlow versions that were released up to two years ago.
Versions more than two years old will not be supported going forward.

We're making this change as we begin actively testing transformers compatibility on older versions.
This project can be followed [here](https://github.com/huggingface/transformers/issues/18817).

* PyTorch >= 1.7.0 and TensorFlow >= 2.4.0 by sgugger in 19016

Generate method updates

The `generate` method now starts enforcing stronger validation in order to ensure proper usage.

* Generate: validate `model_kwargs` (and catch typos in generate arguments) by gante in 18261
* Generate: validate `model_kwargs` on TF (and catch typos in generate arguments) by gante in 18651
* Generate: add model class validation by gante in 18902
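
The kind of check described above can be sketched with the standard library alone. The following is an illustrative assumption of how unused keyword arguments might be detected, not the actual validation code in `generate`. Both `validate_model_kwargs` and `toy_forward` are hypothetical names.

```python
import inspect


def validate_model_kwargs(forward_fn, model_kwargs):
    """Raise ValueError if a keyword argument is not accepted by forward_fn."""
    params = inspect.signature(forward_fn).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    unused = [] if accepts_any else [k for k in model_kwargs if k not in params]
    if unused:
        raise ValueError(f"These model_kwargs are not used by the model: {unused}")


def toy_forward(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model's forward method."""
    return input_ids


# A correctly spelled argument passes silently.
validate_model_kwargs(toy_forward, {"attention_mask": [1, 1]})

# A typo is reported instead of being silently ignored.
try:
    validate_model_kwargs(toy_forward, {"attention_maks": [1, 1]})
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Failing early like this turns a silent no-op (a misspelled argument that is never used) into an explicit error.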

API changes

The `as_target_tokenizer` and `as_target_processor` context managers have been deprecated. The new API is to use the call method of the tokenizer/processor with keyword arguments. For instance:

    with tokenizer.as_target_tokenizer():
        encoded_labels = tokenizer(labels, padding=True)

becomes

    encoded_labels = tokenizer(text_target=labels, padding=True)


* Replace `as_target` context managers by direct calls by sgugger in 18325

Bits and bytes integration

Bits and bytes is now integrated within transformers. This feature can reduce the size of large models by up to a factor of 2, with low loss in precision.

* Supporting seq2seq models for `bitsandbytes` integration by younesbelkada in 18579
* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models by younesbelkada in 17901
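
To give an intuition for where the size reduction comes from, here is a minimal sketch of absmax 8-bit quantization for a single row of weights. It is a deliberately simplified illustration and not the `Linear8bitLt` algorithm used by `bitsandbytes`. The helper names `quantize_absmax` and `dequantize` are hypothetical.

```python
import struct


def quantize_absmax(row):
    """Map floats to integers in [-127, 127] plus one shared float scale."""
    scale = max(abs(x) for x in row) / 127 or 1.0  # avoid a zero scale
    return [round(x / scale) for x in row], scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


row = [0.02, -1.27, 0.64, 0.0]
quantized, scale = quantize_absmax(row)
approx = dequantize(quantized, scale)

# Storage cost: one byte per int8 value versus two bytes per float16 value.
int8_bytes = len(struct.pack(f"<{len(quantized)}b", *quantized))
fp16_bytes = len(struct.pack(f"<{len(row)}e", *row))
```

Each weight takes one byte instead of two, and the reconstruction error per weight is bounded by half the scale.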

Large model support

Models that have sharded checkpoints in PyTorch can be loaded in Flax.

* Load sharded pt to flax by ArthurZucker in 18419

TensorFlow improvements

The TensorFlow examples have been rewritten to support all recent features developed in the past months.

* TF Examples Rewrite by Rocketknight1 in 18451

DeBERTa-v2 is now trainable with XLA.

* TF: XLA-trainable DeBERTa v2 by gante in 18546

Documentation changes

* Split model list on modality by stevhliu in 18328

Improvements and bugfixes

* sentencepiece shouldn't be required for the fast LayoutXLM tokenizer by LysandreJik in 18320
* Fix sacremoses sof dependency for Transformers XL by sgugger in 18321
* Owlvit test fixes by alaradirik in 18303
* [Flax] Fix incomplete batches in example scripts by sanchit-gandhi in 17863
* start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … by sywangyi in 18229
* Update feature extractor docs by stevhliu in 18324
* fixed typo by banda-larga in 18331
* updated translation by banda-larga in 18333
* Updated _toctree.yml by nickprock in 18337
* Update automatic_speech_recognition.py by bofenghuang in 18339
* Fix codeparrot deduplication - ignore whitespaces by loubnabnl in 18023
* Remove Flax OPT from doctest for now by ydshieh in 18338
* Include tensorflow-aarch64 as a candidate by ankrgyl in 18345
* [BLOOM] Deprecate `position_ids` by thomasw21 in 18342
* Migrate metric to Evaluate library for tensorflow examples by VijayKalmath in 18327
* Migrate metrics used in flax examples to Evaluate by VijayKalmath in 18348
* [Docs] Fix Speech Encoder Decoder doc sample by sanchit-gandhi in 18346
* Fix OwlViT torchscript tests by ydshieh in 18347
* Fix some doctests by ydshieh in 18359
* [FX] Symbolic trace for Bloom by michaelbenayoun in 18356
* Fix TFSegformerForSemanticSegmentation doctest by ydshieh in 18362
* fix FSDP ShardedGradScaler by pacman100 in 18358
* Migrate metric to Evaluate in Pytorch examples by atturaioe in 18369
* Correct the spelling of bleu metric by ToluClassics in 18375
* Remove pt-like calls on tf tensor by amyeroberts in 18393
* Fix from_pretrained kwargs passing by YouJiacheng in 18387
* Add a check regarding the number of occurrences of by ydshieh in 18389
* Add evaluate to test dependencies by sgugger in 18396
* Fix OPT doc tests by ArthurZucker in 18365
* Fix doc tests by NielsRogge in 18397
* Add balanced strategies for device_map in from_pretrained by sgugger in 18349
* Fix docs by NielsRogge in 18399
* Adding fine-tuning models to LUKE by ikuyamada in 18353
* Fix ROUGE add example check and update README by sgugger in 18398
* Add Flax BART pretraining script by duongna21 in 18297
* Rewrite push_to_hub to use upload_files by sgugger in 18366
* Layoutlmv2 tesseractconfig by kelvinAI in 17733
* fix: create a copy for tokenizer object by YBooks in 18408
* Fix uninitialized parameter in conformer relative attention. by PiotrDabkowski in 18368
* Fix the hub user name in a longformer doctest checkpoint by ydshieh in 18418
* Change audio kwarg to images in TROCR processor by ydshieh in 18421
* update maskformer docs by alaradirik in 18423
* Fix `test_load_default_pipelines_tf` test error by ydshieh in 18422
* fix run_clip README by ydshieh in 18332
* Improve `generate` docstring by JoaoLages in 18198
* Accept `trust_remote_code` and ignore it in `PreTrainedModel.from_pretrained` by ydshieh in 18428
* Update pipeline word heuristic to work with whitespace in token offsets by davidbenton in 18402
* Add programming languages by cakiki in 18434
* fixing error when using sharded ddp by pacman100 in 18435
* Update _toctree.yml by stevhliu in 18440
* support ONNX export of XDropout in deberta{,_v2} and sew_d by garymm in 17502
* Add Spanish translation of run_scripts.mdx by donelianc in 18415
* Update no trainer scripts for language modeling and image classification examples by nandwalritik in 18443
* Update pinned hub version by osanseviero in 18448
* Fix failing tests for XLA generation in TF by dsuess in 18298
* add zero-shot obj detection notebook to docs by alaradirik in 18453
* fix: keras fit tests for segformer tf and minor refactors. by sayakpaul in 18412
* Fix torch version comparisons by LSinev in 18460
* [BLOOM] Clean modeling code by thomasw21 in 18344
* change shape to support dynamic batch input in tf.function XLA generate for tf serving by nlpcat in 18372
* HFTracer.trace can now take callables and torch.nn.Module by michaelbenayoun in 18457
* Update no trainer scripts for multiple-choice by kiansierra in 18468
* Fix load of model checkpoints in the Trainer by sgugger in 18470
* Add FX support for torch.baddbmm and torch.Tensor.baddbmm by thomasw21 in 18363
* Add machine type in the artifact of Examples directory job by ydshieh in 18459
* Update no trainer examples for QA and Semantic Segmentation by kiansierra in 18474
* Add `TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING` by ydshieh in 18469
* Fixing issue where generic model types wouldn't load properly with the pipeline by Narsil in 18392
* Fix TFSwinSelfAttention to have relative position index as non-trainable weight by harrydrippin in 18226
* Refactor `TFSwinLayer` to increase serving compatibility by harrydrippin in 18352
* Add TF prefix to TF-Res test class by ydshieh in 18481
* Remove py.typed by sgugger in 18485
* Fix pipeline tests by sgugger in 18487
* Use new huggingface_hub tools for download models by sgugger in 18438
* Fix `test_dbmdz_english` by updating expected values by ydshieh in 18482
* Move cache folder to huggingface/hub for consistency with hf_hub by sgugger in 18492
* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` by ydshieh in 18484
* disable Onnx test for google/long-t5-tglobal-base by ydshieh in 18454
* Typo reported by Joel Grus on TWTR by julien-c in 18493
* Just re-reading the whole doc every couple of months 😬 by julien-c in 18489
* `transformers-cli login` => `huggingface-cli login` by julien-c in 18490
* Add seed setting to image classification example by regisss in 18519
* [DX fix] Fixing QA pipeline streaming a dataset. by Narsil in 18516
* Clean up hub by sgugger in 18497
* update fsdp docs by pacman100 in 18521
* Fix compatibility with 1.12 by sgugger in 17925
* Specify en in doc-builder README example by ankrgyl in 18526
* New cache fixes: add safeguard before looking in folders by sgugger in 18522
* unpin resampy by ydshieh in 18527
* ✨ update to use interlibrary links instead of Markdown by stevhliu in 18500
* Add example of multimodal usage to pipeline tutorial by stevhliu in 18498
* [VideoMAE] Add model to doc tests by NielsRogge in 18523
* Update perf_train_gpu_one.mdx by mishig25 in 18532
* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper by Rasmusafj in 18473
* Add Spanish translation of converting_tensorflow_models.mdx by donelianc in 18512
* Spanish translation of summarization.mdx by AguilaCudicio in 15947
* Let's not cast them all by younesbelkada in 18471
* fix: data2vec-vision Onnx ready-made configuration. by NikeNano in 18427
* Add mt5 onnx config by ChainYo in 18394
* Minor update of `run_call_with_unpacked_inputs` by ydshieh in 18541
* BART - Fix attention mask device issue on copied models by younesbelkada in 18540
* Adding a new `align_to_words` param to qa pipeline. by Narsil in 18010
* 📝 update metric with evaluate by stevhliu in 18535
* Restore _init_weights value in no_init_weights by YouJiacheng in 18504
* 📝 update documentation build section by stevhliu in 18548
* Preserve hub-related kwargs in AutoModel.from_pretrained by sgugger in 18545
* Use commit hash to look in cache instead of calling head by sgugger in 18534
* Update philosophy to include other preprocessing classes by stevhliu in 18550
* Properly move cache when it is not in default path by sgugger in 18563
* Adds CLIP to models exportable with ONNX by unography in 18515
* raise atol for MT5OnnxConfig by ydshieh in 18560
* fix string by mrwyattii in 18568
* Segformer TF: fix output size in documentation by joihn in 18572
* Fix resizing bug in OWL-ViT by alaradirik in 18573
* Fix LayoutLMv3 documentation by pocca2048 in 17932
* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training by donebydan in 18486
* german docs translation by flozi00 in 18544
* Deberta V2: Fix critical trace warnings to allow ONNX export by iiLaurens in 18272
* [FX] _generate_dummy_input supports audio-classification models for labels by michaelbenayoun in 18580
* Fix docstrings with last version of hf-doc-builder styler by sgugger in 18581
* fix owlvit tests, update docstring examples by alaradirik in 18586
* Return the permuted hidden states if return_dict=True by amyeroberts in 18578
* Add type hints for ViLT models by donelianc in 18577
* update doc for perf_train_cpu_many, add intel mpi introduction by sywangyi in 18576
* typos by stas00 in 18594
* FSDP bug fix for `load_state_dict` by pacman100 in 18596
* Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` by ydshieh in 18600
* Fix URLs by NielsRogge in 18604
* Update BLOOM parameter counts by Muennighoff in 18531
* [doc] fix anchors by stas00 in 18591
* [fsmt] deal with -100 indices in decoder ids by stas00 in 18592
* small change by younesbelkada in 18584
* Flax Remat for LongT5 by KMFODA in 17994
* Change scheduled CIs to use torch 1.12.1 by ydshieh in 18644
* Add checks for some workflow jobs by ydshieh in 18583
* TF: Fix generation repetition penalty with XLA by gante in 18648
* Update longt5.mdx by flozi00 in 18634
* Update run_translation_no_trainer.py by zhoutang776 in 18637
* [bnb] Minor modifications by younesbelkada in 18631
* Examples: add Bloom support for token classification by stefan-it in 18632
* Fix Yolos ONNX export test by ydshieh in 18606
* Fix matmul inputs dtype by JingyaHuang in 18585
* Update feature extractor methods to enable type cast before normalize by amyeroberts in 18499
* Allow users to force TF availability by Rocketknight1 in 18650
* [LongT5] Correct docs long t5 by patrickvonplaten in 18669
* Generate: validate model_kwargs on FLAX (and catch typos in generate arguments) by gante in 18653
* Pin `detectron2` for CircleCI tests by ydshieh in 18680
* Rename method to avoid clash with property by amyeroberts in 18677
* Rename second input dimension from "sequence" to "num_channels" for CV models by regisss in 17976
* Fix repo consistency by lewtun in 18682
* Fix breaking change in `onnxruntime` for ONNX quantization by severinsimmler in 18336
* Add evaluate to examples requirements by muellerzr in 18666
* [bnb] Move documentation by younesbelkada in 18671
* Add an examples folder for code downstream tasks by loubnabnl in 18679
* `model.tie_weights()` should be applied after `accelerator.prepare()` by Gladiator07 in 18676
* Generate: add missing `**model_kwargs` in sample tests by gante in 18696
* Temp fix for broken detectron2 import by patrickvonplaten in 18699
* [Hotfix] pin detectron2 5aeb252 to avoid test fix by ydshieh in 18701
* Fix Data2VecVision ONNX test by ydshieh in 18587
* Add missing tokenizer tests - Longformer by tgadeliya in 17677
* remove check for main process for trackers initialization by Gladiator07 in 18706
* Unpin detectron2 by ydshieh in 18727
* Removing warning of model type for `microsoft/tapex-base-finetuned-wtq` by Narsil in 18711
* improve `add_tokens` docstring by SaulLu in 18687
* CLI: Don't check the model head when there is no model head by gante in 18733
* Update perf_infer_gpu_many.mdx by mishig25 in 18744
* Add minor doc-string change to include hp_name param in hyperparameter_search by constantin-huetterer in 18700
* fix pipeline_tutorial.mdx doctest by ydshieh in 18717
* Add TF implementation of `XGLMModel` by stancld in 16543
* fixed docstring typos by JadeKim042386 in 18739
* add warning to let the user know that the `__call__` method is faster than `encode` + `pad` for a fast tokenizer by SaulLu in 18693
* examples/run_summarization_no_trainer: fixed incorrect param to hasattr by rahular in 18720
* Add ONNX support for Longformer by deutschmn in 17176
* Determine framework automatically before ONNX export by rachthree in 18615
* streamlining 'checkpointing_steps' parsing by rahular in 18755
* CLI: Improved error control and updated hub requirement by gante in 18752
* [VisionEncoderDecoder] Add gradient checkpointing by patrickvonplaten in 18697
* [Wav2vec2 + LM Test] Improve wav2vec2 with lm tests and make torch version dependent for now by patrickvonplaten in 18749
* Fix incomplete outputs of FlaxBert by duongna21 in 18772
* Fix broken DeepSpeed documentation link by philschmid in 18783
* fix missing block when there is no failure by ydshieh in 18775
* fix a possible typo in auto feature extraction by fcakyon in 18779
* Fix memory leak issue in `torch_fx` tests by ydshieh in 18547
* Fix mock in `test_cached_files_are_used_when_internet_is_down` by Wauplin in 18804
* Add SegFormer and ViLT links by NielsRogge in 18808
* send model to the correct device by ydshieh in 18800
* Revert to and safely handle flag in owlvit config by amyeroberts in 18750
* Add docstring for BartForCausalLM by ekagra-ranjan in 18795
* up by qqaatw in 18805
* [Swin, Swinv2] Fix attn_mask dtype by NielsRogge in 18803
* Run tests if skip condition not met by amyeroberts in 18764
* Remove ViltForQuestionAnswering from check_repo by NielsRogge in 18762
* Adds OWLViT to models exportable with ONNX by unography in 18588
* Adds GroupViT to models exportable with ONNX by unography in 18628
* LayoutXLMProcessor: ensure 1-to-1 mapping between samples and images, and add test for it by anthony2261 in 18774
* Added Docstrings for Deberta and DebertaV2 [PyTorch] by Tegzes in 18610
* Improving the documentation for "word", within the pipeline. by Narsil in 18763
* Disable nightly CI temporarily by ydshieh in 18820
* Pin max tf version by gante in 18818
* Fix cost condition in DetrHungarianMatcher and YolosHungarianMatcher to allow zero-cost by kongzii in 18647
* oob performance improvement for cpu DDP by sywangyi in 18595
* Warn on TPUs when the custom optimizer and model device are not the same by muellerzr in 18668
* Update location identification by LysandreJik in 18834
* fix bug: register_for_auto_class should be defined on TFPreTrainedModel instead of TFSequenceSummary by azonti in 18607
* [DETR] Add num_channels attribute by NielsRogge in 18714
* Pin fsspec by sgugger in 18837
* Improve GPT2 doc by ekagra-ranjan in 18787
* Add an option to `HfArgumentParser.parse_{dict,json_file}` to raise an Exception when there are extra keys by FelixSchneiderZoom in 18692
* Improve Text Generation doc by ekagra-ranjan in 18788
* Add SegFormer ONNX support by NielsRogge in 18006
* Add security warning about the from_pretrained() method by lewtun in 18801
* Owlvit memory leak fix by alaradirik in 18734
* Create pipeline_tutorial.mdx german docs by flozi00 in 18625
* Unpin fsspec by albertvillanova in 18846
* Delete `state_dict` to release memory as early as possible by ydshieh in 18832
* Generate: smaller TF serving test by gante in 18840
* add a script to get time info. from GA workflow jobs by ydshieh in 18822
* Pin rouge_score by albertvillanova in 18247
* Minor typo in prose of model outputs documentation. by pcuenca in 18848
* reflect max_new_tokens in `Seq2SeqTrainer` by kumapo in 18786
* Adds timeout argument to training_args to avoid socket timeouts in DDP by gugarosa in 18562
* Cache results of is_torch_tpu_available() by comaniac in 18777
* Tie weights after preparing the model in run_clm by sgugger in 18855
* Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests by ankrgyl in 18854
* Split docs on modality by stevhliu in 18205
* if learning rate is a tensor, get item (float) by kmckiern in 18861
* Fix naming issue with ImageToText pipeline by OlivierDehaene in 18864
* [LayoutLM] Add clarification to docs by NielsRogge in 18716
* Add OWL-ViT to the appropriate section by NielsRogge in 18867
* Clean up utils.hub using the latest from hf_hub by sgugger in 18857
* pin Slack SDK to 3.18.1 to avoid failing issue by ydshieh in 18869
* Fix number of examples for iterable datasets in multiprocessing by sgugger in 18856
* postpone bnb load until it's needed by stas00 in 18859
* A script to download artifacts and perform CI error statistics by ydshieh in 18865
* Remove cached torch_extensions on CI runners by ydshieh in 18868
* Update docs landing page by stevhliu in 18590
* Finetune guide for semantic segmentation by stevhliu in 18640
* Add Trainer to quicktour by stevhliu in 18723
* TF: TFMarianMTModel final logits bias as a layer by gante in 18833
* Mention TF and Flax checkpoints by LysandreJik in 18894
* Correct naming pegasus x by patrickvonplaten in 18896
* Update perf_train_gpu_one.mdx by thepurpleowl in 18442
* Add type hints to XLM-Roberta-XL models by asofiaoliveira in 18475
* Update Chinese documentation by zkep in 18893
* Generate: get the correct beam index on eos token by gante in 18851
* Mask t5 relative position bias when head pruned by hadaev8 in 17968
* updating gather function with gather_for_metrics in run_wav2vec2_pretraining by arun99481 in 18877
* Fix decode_input_ids to bare T5Model and improve doc by ekagra-ranjan in 18791
* Fix `test_tf_encode_plus_sent_to_model` for `LayoutLMv3` by ydshieh in 18898
* fixes bugs to handle non-dict output by alaradirik in 18897
* Further reduce the number of calls to head for cached objects by sgugger in 18871
* unpin slack_sdk version by ydshieh in 18901
* Fix incorrect size of input for 1st strided window length in `Perplexity of fixed-length models` by ekagra-ranjan in 18906
* [VideoMAE] Improve code examples by NielsRogge in 18919
* Add checks for more workflow jobs by ydshieh in 18905
* Accelerator end training by nbroad1881 in 18910
* update the train_batch_size in case HPO change batch_size_per_device by sywangyi in 18918
* Update TF fine-tuning docs by Rocketknight1 in 18654
* TF: final bias as a layer in seq2seq models (replicate TFMarian fix) by gante in 18903
* remove `_create_and_check_torch_fx_tracing` in specific test files by ydshieh in 18667
* [DeepSpeed ZeRO3] Fix performance degradation in sharded models by tjruwase in 18911
* pin TF 2.9.1 for self-hosted CIs by ydshieh in 18925
* Fix XLA fp16 and bf16 error checking by ymwangg in 18913
* Starts on a list of external deps required for dev by colindean in 18929
* Add image height and width to ONNX dynamic axes by lewtun in 18915
* Skip some doctests in quicktour by stevhliu in 18927
* Fix LayoutXLM wrong link in README by Devlee247 in 18932
* Update translation requests contact by NimaBoscarino in 18941
* [JAX] Replace all jax.tree_* calls with jax.tree_util.tree_* by sanchit-gandhi in 18361
* Neptune.ai integration improvements by Raalsky in 18934
* Generate: Simplify is_pad_token_not_equal_to_eos_token_id by ekagra-ranjan in 18933
* Fix train_step, test_step and tests for CLIP by Rocketknight1 in 18684
* Exit early in load if no weights are in the sharded state dict by sgugger in 18937
* update black target version by BramVanroy in 18955
* RFC: Replace custom TF embeddings by Keras embeddings by gante in 18939
* TF: unpin maximum TF version by gante in 18917
* Revert "TF: unpin maximum TF version (18917)" by sgugger
* remove unused activation dropout by shijie-wu in 18842
* add DDP HPO support for sigopt by sywangyi in 18931
* Remove `decoder_position_ids` from `check_decoder_model_past_large_inputs` by ydshieh in 18980
* create Past CI results as tables for GitHub issue by ydshieh in 18953
* Remove dropout in embedding layer of OPT by shijie-wu in 18845
* Fix TF start docstrings by Rocketknight1 in 18991
* Align try_to_load_from_cache with huggingface_hub by sgugger in 18966
* Fix tflongformer int dtype by Rocketknight1 in 18907
* TF: correct TFBart embeddings weights name when load_weight_prefix is passed by gante in 18993
* fix checkpoint name for wav2vec2 conformer by ydshieh in 18994
* added type hints by daspartho in 18996
* TF: TF 2.10 unpin + related onnx test skips by gante in 18995
* Fixed typo by tnusser in 18921
* Removed issue in wav2vec link by chrisemezue in 18945
* Fix MaskFormerFeatureExtractor instance segmentation preprocessing bug by alaradirik in 18997
* Add type hints for M2M by daspartho in 18998
* Fix tokenizer for XLMRobertaXL by ydshieh in 19004
* Update default revision for document-question-answering by ankrgyl in 18938
* Fixed bug which caused overwrite_cache to always be True by rahular in 19000
* add DDP HPO support for optuna by sywangyi in 19002
* add missing `require_tf` for `TFOPTGenerationTest` by ydshieh in 19010
* Re-add support for single url files in objects download by sgugger in 19014

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* nandwalritik
    * Add swin transformer v2 (17469)
    * Update no trainer scripts for language modeling and image classification examples (18443)
* ankrgyl
    * Include tensorflow-aarch64 as a candidate (18345)
    * Specify en in doc-builder README example (18526)
    * Add LayoutLMForQuestionAnswering model (18407)
    * Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests (18854)
    * Add DocumentQuestionAnswering pipeline (18414)
    * Update default revision for document-question-answering (18938)
* ikuyamada
    * Adding fine-tuning models to LUKE (18353)
* duongna21
    * Add Flax BART pretraining script (18297)
    * Fix incomplete outputs of FlaxBert (18772)
* donelianc
    * Add Spanish translation of run_scripts.mdx (18415)
    * Add Spanish translation of converting_tensorflow_models.mdx (18512)
    * Add type hints for ViLT models (18577)
* sayakpaul
    * fix: keras fit tests for segformer tf and minor refactors. (18412)
    * TensorFlow MobileViT (18555)
* flozi00
    * german docs translation (18544)
    * Update longt5.mdx (18634)
    * Create pipeline_tutorial.mdx german docs (18625)
* stancld
    * Add TF implementation of `XGLMModel` (16543)
* ChrisFugl
    * [LayoutLMv3] Add TensorFlow implementation (18678)
* zphang
    * PEGASUS-X (18551)
* nghuyong
    * add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (18686)

4.21.3

Not secure
Patch release to add a disclaimer about torch models hosted on the Hub. See [autoclass tutorial](https://huggingface.co/docs/transformers/v4.21.3/en/autoclass_tutorial#automodel).
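
The disclaimer text is not reproduced on this page. Assuming it concerns PyTorch checkpoints being loaded through `pickle`, the minimal sketch below shows why unpickling untrusted data is risky: a pickle stream can name any importable callable and have it run at load time. The `Payload` class is hypothetical and deliberately harmless (it only evaluates `6 * 7`).

```python
import pickle


class Payload:
    """Hypothetical object whose pickle stream calls a function when loaded."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's state.
        # A malicious file could name any importable callable here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading runs the stored call: the result is eval's return value,
# not a Payload instance.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The practical takeaway is to load weights only from sources you trust, since the call happens during deserialization, before any model code is inspected.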
