Transformers

Latest version: v4.50.3

4.25.1

Not secure
PyTorch 2.0 stack support

We are very excited by the newly announced PyTorch 2.0 stack. You can enable `torch.compile` on any of our models, and get support with the `Trainer` (and in all our PyTorch examples) by using the `torchdynamo` training argument. For instance, just add `--torchdynamo inductor` when launching those examples from the command line.

This API is still experimental and may be subject to changes as the PyTorch 2.0 stack matures.

Note that to get the best performance, we recommend:
- using an Ampere GPU (or more recent)
- sticking to fixed shapes for now (so use `--pad_to_max_length` in our examples)
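
To make the fixed-shape recommendation concrete, here is a minimal pure-Python sketch that contrasts dynamic padding with padding to a fixed maximum length. It is not the `Trainer` implementation: the helper names `pad_batch` and `batch_shape`, the `pad_id=0` default, and the toy batches are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def pad_batch(
    batch: List[List[int]], max_length: Optional[int] = None, pad_id: int = 0
) -> List[List[int]]:
    """Pad token-id sequences to a common length (hypothetical helper).

    With max_length=None the batch is padded to its longest sequence, so the
    shape changes from batch to batch. With a fixed max_length every batch
    comes out with the same shape.
    """
    target = max_length if max_length is not None else max(len(seq) for seq in batch)
    padded = []
    for seq in batch:
        if len(seq) > target:
            raise ValueError("sequence is longer than max_length; truncate it first")
        padded.append(seq + [pad_id] * (target - len(seq)))
    return padded


def batch_shape(batch: List[List[int]]) -> Tuple[int, int]:
    """Return (batch_size, sequence_length) of a padded batch."""
    return (len(batch), len(batch[0]))


# Two toy batches whose longest sequences differ in length.
batches = [[[5, 6, 7], [8]], [[1, 2, 3, 4, 5], [9, 9]]]

dynamic_shapes = {batch_shape(pad_batch(b)) for b in batches}
fixed_shapes = {batch_shape(pad_batch(b, max_length=8)) for b in batches}

print(dynamic_shapes)  # one shape per distinct longest sequence
print(fixed_shapes)    # a single shape shared by all batches
```

With dynamic padding the two batches end up with different sequence lengths, while the fixed `max_length` gives every batch the same shape.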

* Repurpose torchdynamo training args towards torch._dynamo by sgugger in 20498

Audio Spectrogram Transformer

The Audio Spectrogram Transformer model was proposed in [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. The Audio Spectrogram Transformer applies a [Vision Transformer](https://huggingface.co/docs/transformers/main/en/model_doc/vit) to audio, by turning audio into an image (spectrogram). The model obtains state-of-the-art results for audio classification.

* Add Audio Spectogram Transformer by NielsRogge in 19981

Jukebox

The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. It introduces a generative music model that can produce minute-long samples, which can be conditioned on an artist, genres, and lyrics.

* Add Jukebox model (replaces 16875) by ArthurZucker in 17826

Switch Transformers

The SwitchTransformers model was proposed in [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer.

It is the first MoE model supported in `transformers`, with the largest checkpoint currently available containing 1T parameters.
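
As a rough illustration of what a sparse Mixture-of-Experts (MoE) layer does, the sketch below sends a token to the single expert with the highest router probability and scales that expert's output by the probability, which is the top-1 routing described in the Switch Transformers paper. This is plain Python on toy numbers, not the `transformers` implementation: `softmax`, `route_top1`, the scalar "token", and the lambda experts are all hypothetical names and values.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(
    token: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
) -> Tuple[int, float, float]:
    """Route one token to its highest-probability expert (top-1 routing).

    Returns the chosen expert index, the gate value (router probability of
    that expert), and the gated expert output.
    """
    probs = softmax(router_logits)
    index = max(range(len(probs)), key=lambda i: probs[i])
    gate = probs[index]
    return index, gate, gate * experts[index](token)


# Toy experts standing in for feed-forward sub-networks (assumption).
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]

index, gate, output = route_top1(3.0, [0.1, 2.0, -1.0], experts)
print(index, round(gate, 3), round(output, 3))
```

Only one expert runs per token, so the compute per token stays roughly constant even as the number of experts (and therefore parameters) grows.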

* Add Switch transformers by younesbelkada and ArthurZucker in 19323

RoCBert

The RoCBert model was proposed in [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. It’s a pretrained Chinese language model that is robust under various forms of adversarial attacks.

* Add RocBert by sww9370 in 20013

CLIPSeg

The CLIPSeg model was proposed in [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker. CLIPSeg adds a minimal decoder on top of a frozen [CLIP](https://huggingface.co/docs/transformers/main/en/model_doc/clip) model for zero- and one-shot image segmentation.

* Add CLIPSeg by NielsRogge in 20066

NAT and DiNAT

NAT

NAT was proposed in [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.

It is a hierarchical vision transformer based on Neighborhood Attention, a sliding-window self attention pattern.
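
The sliding-window pattern can be sketched in one dimension: each position attends to a fixed number of nearest positions, and near the borders the window is shifted inward so that it keeps its full size. The code below is a simplified illustration under those assumptions (1D sequence, odd kernel size). The function name `neighborhood_indices` is hypothetical, and the real model works on 2D feature maps through a dedicated kernel library.

```python
from typing import List


def neighborhood_indices(length: int, kernel_size: int) -> List[List[int]]:
    """For each query position, list the key positions it attends to.

    Simplified 1D sketch: the window is centered on the query where possible
    and clamped at the borders so it always contains kernel_size positions.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the window can be centered")
    if kernel_size > length:
        raise ValueError("kernel_size must not exceed the sequence length")

    half = kernel_size // 2
    windows = []
    for query in range(length):
        # Clamp the window start so the window stays inside [0, length).
        start = min(max(query - half, 0), length - kernel_size)
        windows.append(list(range(start, start + kernel_size)))
    return windows


for query, keys in enumerate(neighborhood_indices(length=6, kernel_size=3)):
    print(query, keys)
```

Interior positions get a window centered on themselves, while the first and last positions reuse the window of their inner neighbor instead of padding.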

DiNAT

DiNAT was proposed in [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi.

It extends [NAT](https://huggingface.co/docs/transformers/main/en/model_doc/nat) by adding a Dilated Neighborhood Attention pattern to capture global context, and shows significant performance improvements over it.

* Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models by alihassanijr in 20219

MobileNetV2

The MobileNetV2 model was proposed in [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.

* add MobileNetV2 model by hollance in 17845

MobileNetV1

The MobileNetV1 model was proposed in [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.

* add MobileNetV1 model by hollance in 17799

Image processors

Image processors replace feature extractors as the processing class for computer vision models.

Important changes:
* `size` parameter is now a dictionary of `{"height": h, "width": w}`, `{"shortest_edge": s}`, or `{"shortest_edge": s, "longest_edge": l}` instead of an int or tuple.
* Addition of `data_format` flag. You can now specify if you want your images to be returned in `"channels_first"` - NCHW - or `"channels_last"` - NHWC - format.
* Processing flags e.g. `do_resize` can be passed directly to the `preprocess` method instead of modifying the class attribute: `image_processor([image_1, image_2], do_resize=False, return_tensors="pt", data_format="channels_last")`
* Leaving `return_tensors` unset will return a list of numpy arrays.

The classes are backwards compatible and can be created using existing feature extractor configurations - with the `size` parameter converted.
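
To give a feel for what converting a legacy `size` value could look like, here is a small pure-Python sketch. It is not the library's conversion code: the function name `legacy_size_to_dict`, the `default_to_square` switch, and the assumption that a tuple is ordered `(height, width)` are all illustrative choices.

```python
from typing import Dict, Tuple, Union

LegacySize = Union[int, Tuple[int, int], Dict[str, int]]


def legacy_size_to_dict(size: LegacySize, default_to_square: bool = True) -> Dict[str, int]:
    """Convert an int or tuple `size` into the dictionary form (hypothetical helper).

    Assumptions: a tuple is ordered (height, width); an int means a square
    output when default_to_square is True and the shortest edge otherwise.
    """
    if isinstance(size, dict):
        # Already in the new format; return a copy.
        return dict(size)
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size value: {size!r}")


print(legacy_size_to_dict(224))
print(legacy_size_to_dict((480, 640)))
print(legacy_size_to_dict(256, default_to_square=False))
```

Whether an int should map to a square or to a shortest edge depends on how the original feature extractor interpreted it, which is why the sketch exposes that choice as a parameter.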

* Add Image Processors by amyeroberts in 19796
* Add Donut image processor by amyeroberts in 20425
* Add segmentation + object detection image processors by amyeroberts in 20160
* AutoImageProcessor by amyeroberts in 20111

Backbone for computer vision models

We're adding support for a general `AutoBackbone` class, which turns any vision model (like ConvNeXt, Swin Transformer) into a backbone to be used with frameworks like DETR and Mask R-CNN. The design is in early stages and we welcome feedback.

* Add AutoBackbone + ResNetBackbone by NielsRogge in 20229
* Improve backbone by NielsRogge in 20380
* [AutoBackbone] Improve API by NielsRogge in 20407

Support for `safetensors` offloading

If the model you are using has a `safetensors` checkpoint and you have the library installed, offload to disk will take advantage of this to be more memory efficient and roughly 33% faster.

* Safetensors offload by sgugger in 20321

Contrastive search in the `generate` method

* Generate: TF contrastive search with XLA support by gante in 20050
* Generate: contrastive search with full optional outputs by gante in 19963

Breaking changes

* 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` by beneyal in 15775

Bugfixes and improvements

* add dataset by stevhliu in 20005
* Add BERT resources by stevhliu in 19852
* Add LayoutLMv3 resource by stevhliu in 19932
* fix typo by stevhliu in 20006
* Update object detection pipeline to use post_process_object_detection methods by alaradirik in 20004
* clean up vision/text config dict arguments by ydshieh in 19954
* make sentencepiece import conditional in bertjapanesetokenizer by ripose-jp in 20012
* Fix gradient checkpoint test in encoder-decoder by ydshieh in 20017
* Quality by sgugger in 20002
* Update auto processor to check image processor created by amyeroberts in 20021
* [Doctest] Add configuration_deberta_v2.py by Saad135 in 19995
* Improve model tester by ydshieh in 19984
* Fix doctest by ydshieh in 20023
* Show installed libraries and their versions in CI jobs by ydshieh in 20026
* reorganize glossary by stevhliu in 20010
* Now supporting pathlike in pipelines too. by Narsil in 20030
* Add **kwargs by amyeroberts in 20037
* Fix some doctests after PR 15775 by ydshieh in 20036
* [Doctest] Add configuration_camembert.py by Saad135 in 20039
* [Whisper Tokenizer] Make more user-friendly by sanchit-gandhi in 19921
* [FuturWarning] Add futur warning for LEDForSequenceClassification by ArthurZucker in 19066
* fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc by sywangyi in 19891
* Update esmfold conversion script by Rocketknight1 in 20028
* Fixed torch.finfo issue with torch.fx by michaelbenayoun in 20040
* Only resize embeddings when necessary by sgugger in 20043
* Speed up TF token classification postprocessing by converting complete tensors to numpy by deutschmn in 19976
* Fix ESM LM head test by Rocketknight1 in 20045
* Update README.md by bofenghuang in 20063
* fix `tokenizer_type` to avoid error when loading checkpoint back by pacman100 in 20062
* [Trainer] Fix model name in push_to_hub by sanchit-gandhi in 20064
* PoolformerImageProcessor defaults to match previous FE by amyeroberts in 20048
* change constant torch.tensor to torch.full by MerHS in 20061
* Update READMEs for ESMFold and add notebooks by Rocketknight1 in 20067
* Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 by jordiclive in 20068
* Allow passing arguments to model testers for CLIP-like models by ydshieh in 20044
* Show installed libraries and their versions in GA jobs by ydshieh in 20069
* Update defaults and logic to match old FE by amyeroberts in 20065
* Update modeling_tf_utils.py by cakiki in 20076
* Update hub.py by cakiki in 20075
* [Doctest] Add configuration_dpr.py by Saad135 in 20080
* Removing RobertaConfig inheritance from CamembertConfig by Saad135 in 20059
* Skip 2 tests in `VisionTextDualEncoderProcessorTest` by ydshieh in 20098
* Replace unsupported facebookresearch/bitsandbytes by tomaarsen in 20093
* docs: Resolve many typos in the English docs by tomaarsen in 20088
* use huggingface_hub.model_inifo() to get pipline_tag by y-tag in 20077
* Fix `generate_dummy_inputs` for `ImageGPTOnnxConfig` by ydshieh in 20103
* docs: Fixed variables in f-strings by tomaarsen in 20087
* Add new terms to the glossary by stevhliu in 20051
* Replace awkward timm link with the expected one by tomaarsen in 20109
* Fix AutoTokenizer with subfolder passed by sgugger in 20110
* [Audio Processor] Only pass sr to feat extractor by sanchit-gandhi in 20022
* Update github pr docs actions by mishig25 in 20125
* Adapt has_labels test when no labels were found by sgugger in 20113
* Improve tiny model creation script by ydshieh in 20119
* Remove BertConfig inheritance from RobertaConfig by Saad135 in 20124
* [Swin] Add Swin SimMIM checkpoints by NielsRogge in 20034
* Update `CLIPSegModelTester` by ydshieh in 20134
* Update SwinForMaskedImageModeling doctest values by amyeroberts in 20139
* Attempting to test automatically the `_keys_to_ignore`. by Narsil in 20042
* Generate: move generation_*.py src files into generation/*.py by gante in 20096
* add cv + audio labels by stevhliu in 20114
* Update VisionEncoderDecoder to use an image processor by amyeroberts in 20137
* [CLIPSeg] Add resources by NielsRogge in 20118
* Make DummyObject more robust by mariosasko in 20146
* Add `RoCBertTokenizer` to `TOKENIZER_MAPPING_NAMES` by ydshieh in 20141
* Adding support for LayoutLMvX variants for `object-detection`. by Narsil in 20143
* Add doc tests by NielsRogge in 20158
* doc comment fix: Args was in wrong place by hollance in 20164
* Update `OnnxConfig.generate_dummy_inputs` to check `ImageProcessingMixin` by ydshieh in 20157
* Generate: fix TF doctests by gante in 20159
* Fix arg names for our models by Rocketknight1 in 20166
* [processor] Add 'model input names' property by sanchit-gandhi in 20117
* Fix object-detection bug (height, width inversion). by Narsil in 20167
* [OWL-ViT] Make model consistent with CLIP by NielsRogge in 20144
* Fix type - update any PIL.Image.Resampling by amyeroberts in 20172
* Fix tapas scatter by Bearnardd in 20149
* Update README.md by code-with-rajeev in 19530
* Proposal Remove the weird `inspect` in ASR pipeline and make WhisperEncoder just nice to use. by Narsil in 19571
* Pytorch type hints by IMvision12 in 20112
* Generate: TF sample doctest result update by gante in 20208
* [ROC_BERT] Make CI happy by younesbelkada in 20175
* add _keys_to_ignore_on_load_unexpected = [r"pooler"] by ArthurZucker in 20210
* docs: translated index page to korean by wonhyeongseo in 20180
* feat: add i18n issue template by wonhyeongseo in 20199
* [Examples] Generalise Seq2Seq ASR to handle Whisper by sanchit-gandhi in 19519
* mark `test_save_load_fast_init_from_base` as `is_flaky` by ydshieh in 20200
* Update README.md by Nietism in 20188
* Downgrade log warning -> info by amyeroberts in 20202
* Generate: add Bloom fixes for contrastive search by gante in 20213
* Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. by Narsil in 20104
* [docs] set overflowing image width to auto-scale by wonhyeongseo in 20197
* Update tokenizer_summary.mdx by bofenghuang in 20135
* Make `ImageSegmentationPipelineTests` less flaky by ydshieh in 20147
* update relative positional embedding by ArthurZucker in 20203
* [WHISPER] Update modeling tests by ArthurZucker in 20162
* Add `accelerate` support for `ViT` family by younesbelkada in 20174
* Add param_name to size_dict logs & tidy by amyeroberts in 20205
* Add object detection + segmentation transforms by amyeroberts in 20003
* Typo on doctring in ElectraTokenizer by FacerAin in 20192
* Remove `authorized_missing_keys`in favor of _keys_to_ignore_on_load_missing by ArthurZucker in 20228
* Add missing ESM autoclass by Rocketknight1 in 20177
* fix device issue by ydshieh in 20227
* fixed spelling error in testing.mdx by kasmith11 in 20220
* Fix `run_clip.py` by ydshieh in 20234
* Fix docstring of CLIPTokenizer(Fast) by TilmannR in 20233
* Fix MaskformerFeatureExtractor by NielsRogge in 20100
* New logging support to "Trainer" Class (ClearML Logger) by skinan in 20184
* Enable PyTorch 1.13 by sgugger in 20168
* [CLIP] allow loading projection layer in vision and text model by patil-suraj in 18962
* Slightly alter Keras dummy loss by Rocketknight1 in 20232
* Add to DeBERTa resources by Saad135 in 20155
* Add clip resources to the transformers documentation by ambujpawar in 20190
* Update reqs to include min gather_for_metrics Accelerate version by muellerzr in 20242
* Allow trainer to return eval. loss for CLIP-like models by ydshieh in 20214
* Adds image-guided object detection support to OWL-ViT by alaradirik in 20136
* Adding `audio-classification` example in the doc. by Narsil in 20235
* Updating the doctest for conversational. by Narsil in 20236
* Adding doctest for `fill-mask` pipeline. by Narsil in 20241
* Adding doctest for `feature-extraction`. by Narsil in 20240
* Adding ASR pipeline example. by Narsil in 20226
* Adding doctest for document-question-answering by Narsil in 20239
* Adding an example for `depth-estimation` pipeline. by Narsil in 20237
* Complete doc migration by mishig25 in 20267
* Fix result saving errors of pytorch examples by li-plus in 20276
* Adding a doctest for `table-question-answering` pipeline. by Narsil in 20260
* Adding doctest for `image-segmentation` pipeline. by Narsil in 20256
* Adding doctest for `text2text-generation` pipeline. by Narsil in 20261
* Adding doctest for `text-generation` pipeline. by Narsil in 20264
* Add TF protein notebook to notebooks doc by Rocketknight1 in 20271
* Rephrasing the link. by Narsil in 20253
* Add Chinese-CLIP implementation by yangapku in 20368
* Adding doctest example for `image-classification` pipeline. by Narsil in 20254
* Adding doctest for `zero-shot-image-classification` pipeline. by Narsil in 20272
* Adding doctest for `zero-shot-classification` pipeline. by Narsil in 20268
* Adding doctest for `visual-question-answering` pipeline. by Narsil in 20266
* Adding doctest for `text-classification` pipeline. by Narsil in 20262
* Adding doctest for `question-answering` pipeline. by Narsil in 20259
* [Docs] Add resources of OpenAI GPT by shogohida in 20084
* Adding doctest for `image-to-text` pipeline. by Narsil in 20257
* Adding doctest for `token-classification` pipeline. by Narsil in 20265
* remaining pytorch type hints by IMvision12 in 20217
* Data collator for token classification pads labels column when receives pytorch tensors by markovalexander in 20244
* [Doctest] Add configuration_deformable_detr.py by Saad135 in 20273
* Fix summarization script by muellerzr in 20286
* [DOCTEST] Fix the documentation of RoCBert by ArthurZucker in 20142
* [bnb] Let's warn users when saving 8-bit models by younesbelkada in 20282
* Adding `zero-shot-object-detection` pipeline doctest. by Narsil in 20274
* Adding doctest for `object-detection` pipeline. by Narsil in 20258
* Image transforms functionality used instead by amyeroberts in 20278
* TF: add test for `PushToHubCallback` by gante in 20231
* Generate: general TF XLA constrastive search are now slow tests by gante in 20277
* Fixing the doctests failures. by Narsil in 20294
* set the default cache_enable to True, aligned with the default value in pytorch cpu/cuda amp autocast by sywangyi in 20289
* Add docstrings for canine model by raghavanone in 19457
* Add missing report button for Example test by ydshieh in 20293
* refactor test by younesbelkada in 20300
* [Tiny model creation] deal with `ImageProcessor` by ydshieh in 20298
* Fix blender bot missleading doc by ArthurZucker in 20301
* remove two tokens that should not be suppressed by ArthurZucker in 20302
* [ASR Examples] Update README for Whisper by sanchit-gandhi in 20230
* Add padding image transformation by amyeroberts in 19838
* Pin TensorFlow by sgugger in 20313
* Add AnyPrecisionAdamW optimizer by atturaioe in 18961
* [Proposal] Breaking change `zero-shot-object-detection` for improved consistency. by Narsil in 20280
* Fix flakey test with seed by muellerzr in 20318
* Pin TF 2.10.1 for Push CI by ydshieh in 20319
* Remove double brackets by stevhliu in 20307
* TF: future proof our keras imports by gante in 20317
* organize pipelines by modality by stevhliu in 20306
* Fix torch device issues by ydshieh in 20304
* Generate: add generation config class by gante in 20218
* translate zh quicktour(20095) by bfss in 20181
* Add Spanish translation of serialization.mdx by donelianc in 20245
* Add LayerScale to NAT/DiNAT by alihassanijr in 20325
* [Switch Transformers] Fix failing slow test by younesbelkada in 20346
* fix: "BigSicence" typo in docs by rajrajhans in 20331
* Generate: `model_kwargs` can also be an input to `prepare_inputs_for_generation` by gante in 20353
* Update Special Language Tokens for PLBART by jordiclive in 19980
* Add resources by NielsRogge in 20296
* Enhance HfArgumentParser functionality and ease of use by konstantinjdobler in 20323
* Add inference section to task guides by stevhliu in 18781
* Fix toctree for Section 3 in Spanish Documentation by donelianc in 20360
* Generate: shorter XLA contrastive search tests by gante in 20354
* revert `keys_to_ignore` for M2M100 by younesbelkada in 20381
* add `accelerate` support for `ESM` by younesbelkada in 20379
* Fix nightly runs by sgugger in 20352
* Optimizes DonutProcessor token2json method for speed by michaelnation26 in 20283
* Indicate better minimal version of PyTorch in big model inference by sgugger in 20385
* Fix longformer onnx broken export by fxmarty in 20292
* Use tiny models for ONNX tests - text modality by lewtun in 20333
* [ESM] fix `accelerate` tests for esmfold by younesbelkada in 20387
* Generate: fix plbart generation tests by gante in 20391
* [bloom] convert script tweaks by stas00 in 18593
* Fix doctest file path by ydshieh in 20400
* [Image Transformers] to_pil fix float edge cases by patrickvonplaten in 20406
* make daily CI happy by younesbelkada in 20410
* fix nasty `bnb` bug by younesbelkada in 20408
* change the way sentinel tokens can retrived by raghavanone in 20373
* [BNB] Throw `ValueError` when trying to cast or assign by younesbelkada in 20409
* Use updated `model_max_length` when saving tokenizers by ydshieh in 20401
* Add Spanish translation of pr_checks.mdx by donelianc in 20339
* fix device in longformer onnx path by fxmarty in 20419
* Fix ModelOutput instantiation when there is only one tuple by sgugger in 20416
* `accelerate` support for `OwlViT` by younesbelkada in 20411
* [AnyPrecisionAdamW] test fix by stas00 in 20454
* fix `word_to_tokens` docstring format by SaulLu in 20450
* Fix typo in FSMT Tokenizer by kamalkraj in 20456
* Fix device issues in `CLIPSegModelIntegrationTest` by ydshieh in 20467
* Fix links for `contrastive_loss` by ydshieh in 20455
* Fix doctests for audio models by ydshieh in 20468
* Fix ESM checkpoints for tests by Rocketknight1 in 20436
* More TF int dtype fixes by Rocketknight1 in 20384
* make tensors in function build_relative_position created on proper device instead of always on cpu by qq775294390 in 20434
* update cpu related doc by sywangyi in 20444
* with pytorch cpu only version. without --no_cuda, using --bf16 will trigger error like "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0" by sywangyi in 20445
* [CLIPTokenizer] Improve warning by patrickvonplaten in 20458
* Replace assertions with value errors on distilbert model by JuheonChu in 20463
* [Doctest] Add configuration_fsmt.py by sha016 in 19936
* Replace assertion with ValueError exceptions in run_image_captioning_flax.py by katiele47 in 20365
* [FLAX] Add dtype to embedding for bert/bart/opt/t5 by merrymercy in 20340
* fix both failing RoCBert tests by ArthurZucker in 20469
* Include image processor in add-new-model-like by amyeroberts in 20439
* chore: add link to the video cls notebook. by sayakpaul in 20386
* add timeout option for deepspeed engine by henghuiz in 20443
* [Maskformer] Add MaskFormerSwin backbone by NielsRogge in 20344
* Extract warnings from CI artifacts by ydshieh in 20474
* Add Donut image processor by amyeroberts in 20425
* Fix torch meshgrid warnings by fxmarty in 20475
* Fix init import_structure sorting by sgugger in 20477
* extract warnings in GH workflows by ydshieh in 20487
* add in layer gpt2 tokenizer by piEsposito in 20421
* Replace assert statements with raise exceptions by miyu386 in 20478
* fixed small typo by sandeepgadhwal in 20490
* Fix documentation code to import facebook/detr-resnet-50 model by JuanFKurucz in 20491
* Fix disk offload for full safetensors checkpoints by sgugger in 20497
* [modelcard] Check for IterableDataset by sanchit-gandhi in 20495
* [modelcard] Set model name if empty by sanchit-gandhi in 20496
* Add segmentation + object detection image processors by amyeroberts in 20160
* remove `attention_mask` truncation in whisper by ydshieh in 20488
* Make `add_special_tokens` more clear by ydshieh in 20424
* [OPT/Galactica] Load large `galactica` models by younesbelkada in 20390
* Support extraction of both train and eval XLA graphs by jeffhataws in 20492
* fix ipex+fp32 jit trace error in ipex 1.13 by sywangyi in 20504
* Expected output for the test changed by ArthurZucker in 20493
* Fix TF nightly tests by Rocketknight1 in 20507
* Update doc examples feature extractor -> image processor by amyeroberts in 20501
* Fix Typo in Docs for GPU by julianpollmann in 20509
* Fix minimum version for device_map by sgugger in 20489
* Update `AutomaticSpeechRecognitionPipeline` doc example by ydshieh in 20512
* Add `natten` for CI by ydshieh in 20511
* Fix Data2VecTextForCasualLM example code documentation by JuanFKurucz in 20510
* Add some warning for Dynamo and enable TF32 when it's set by sgugger in 20515
* [modelcard] Update dataset tags by sanchit-gandhi in 20506
* Change Doctests CI launch time by ydshieh in 20523
* Fix `PLBart` doctest by ydshieh in 20527
* Fix `ConditionalDetrForSegmentation` doc example by ydshieh in 20531
* add doc for by younesbelkada in 20525
* Update `ZeroShotObjectDetectionPipeline` doc example by ydshieh in 20528
* update post_process_image_guided_detection by fcakyon in 20521
* QnA example: add speed metric by sywangyi in 20522
* Fix doctest by NielsRogge in 20534
* Fix Hubert models in TFHubertModel and TFHubertForCTC documentation code by JuanFKurucz in 20516
* Fix link in pipeline device map by stevhliu in 20517

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* sww9370
  * Add RocBert (20013)
* IMvision12
  * Pytorch type hints (20112)
  * remaining pytorch type hints (20217)
* alihassanijr
  * Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models (20219)
  * Add LayerScale to NAT/DiNAT (20325)
* bfss
  * translate zh quicktour(20095) (20181)
* donelianc
  * Add Spanish translation of serialization.mdx (20245)
  * Fix toctree for Section 3 in Spanish Documentation (20360)
  * Add Spanish translation of pr_checks.mdx (20339)
* yangapku
  * Add Chinese-CLIP implementation (20368)

4.24.0

Not secure
ESM-2/ESMFold

ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, [from 8 million parameters up to a huge 15 billion parameter model](https://huggingface.co/models?other=esm).

ESMFold is a state-of-the-art single sequence protein folding model which produces high accuracy predictions significantly faster. Unlike previous protein folding tools like AlphaFold2 and `openfold`, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without requiring any external databases or search/alignment tools to be present at inference time. This hugely reduces the time and compute requirements for folding.

Transformer protein language models were introduced in the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

ESMFold was introduced in the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

* Add ESMFold by Rocketknight1 in 19977
* TF port of ESM by Rocketknight1 in 19587

LiLT

LiLT makes it possible to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, enabling [LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)-like document understanding for many languages.

It was proposed in [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding.

* Add LiLT by NielsRogge in 19450

Flan-T5

FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

It was released in the paper [Scaling Instruction-Finetuned Language Models](https://arxiv.org/pdf/2210.11416.pdf) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

* Add `flan-t5` documentation page by younesbelkada in 19892

Table Transformer

Table Transformer is a model that can perform table extraction and table structure recognition from unstructured documents based on the DETR architecture.

It was proposed in [PubTables-1M: Towards comprehensive table extraction from unstructured documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.

* Add table transformer [v2] by NielsRogge in 19614

Contrastive search decoding

Contrastive search decoding is a new state-of-the-art generation method that aims to reduce the repetitive patterns into which generation models often fall.

It was introduced in [A Contrastive Framework for Neural Text Generation](https://arxiv.org/abs/2202.06417) by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.
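
The selection rule from the paper can be sketched for a single decoding step: each top-k candidate is scored by its model probability minus a degeneration penalty, where the penalty is the maximum cosine similarity between the candidate's hidden state and the hidden states of the tokens generated so far. The code below is a toy pure-Python illustration, not the `generate` implementation. The names `cosine` and `contrastive_step`, the 2-dimensional hidden states, and the probabilities are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float, Sequence[float]]  # (token, probability, hidden state)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_step(
    candidates: List[Candidate],
    context_states: List[Sequence[float]],
    alpha: float = 0.6,
) -> str:
    """Pick the next token among the top-k candidates.

    score = (1 - alpha) * probability - alpha * max cosine similarity to context
    """
    best_token, best_score = "", -math.inf
    for token, prob, hidden in candidates:
        penalty = max(cosine(hidden, h) for h in context_states)
        score = (1.0 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy setup: "the" is more probable but its hidden state duplicates the context.
context_states = [[1.0, 0.0]]
candidates = [("the", 0.6, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]

choice_contrastive = contrastive_step(candidates, context_states, alpha=0.6)
choice_greedy = contrastive_step(candidates, context_states, alpha=0.0)
print(choice_contrastive, choice_greedy)
```

With `alpha=0.0` the rule reduces to picking the most probable candidate, while a larger `alpha` favors candidates whose representation differs from what has already been generated.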

* Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by gmftbyGMFTBY in 19477

Safety and security

We continue to explore the new serialization format not using Pickle via the [safetensors](https://github.com/huggingface/safetensors) library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

* Safetensors tf by sgugger in 19900

🚨 Breaking changes

The following changes are bugfixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

* 🚨🚨🚨 TF: Remove `TFWrappedEmbeddings` (breaking: TF embedding initialization updated for encoder-decoder models) by gante in 19263
* 🚨🚨🚨 [Breaking change] Deformable DETR intermediate representations by Narsil in 19678

Bugfixes and improvements

* Enabling custom TF signature draft by dimitreOliveira in 19249
* Fix whisper for `pipeline` by ArthurZucker in 19482
* Extend `nested_XXX` functions to mappings/dicts. by Guillem96 in 19455
* Syntax issues (lines 126, 203) by kant in 19444
* CLI: add import protection to datasets by gante in 19470
* Fix `TFGroupViT` CI by ydshieh in 19461
* Fix doctests for `DeiT` and `TFGroupViT` by ydshieh in 19466
* Update `WhisperModelIntegrationTests.test_large_batched_generation` by ydshieh in 19472
* [Swin] Replace hard-coded batch size to enable dynamic ONNX export by lewtun in 19475
* TF: TFBart embedding initialization by gante in 19460
* Make LayoutLM tokenizers independent from BertTokenizer by arnaudstiegler in 19351
* Make `XLMRoberta` model and config independent from `Roberta` by asofiaoliveira in 19359
* Fix `get_embedding` dtype at init. time by ydshieh in 19473
* Decouples `XLMProphet` model from `Prophet` by srhrshr in 19406
* Implement multiple span support for DocumentQuestionAnswering by ankrgyl in 19204
* Add warning in `generate` & `device_map=auto` & half precision models by younesbelkada in 19468
* Update TF whisper doc tests by amyeroberts in 19484
* Make bert_japanese and cpm independent of their inherited modules by Davidy22 in 19431
* Added tokenize keyword arguments to feature extraction pipeline by quancore in 19382
* Adding the README_es.md and reference to it in the others files readme by Oussamaosman02 in 19427
* [CvT] Tensorflow implementation by mathieujouffroy in 18597
* `python3` instead of `python` in push CI setup job by ydshieh in 19492
* Update PT to TF CLI for audio models by amyeroberts in 19465
* New by IMvision12 in 19481
* Fix `OPTForQuestionAnswering` doctest by ydshieh in 19479
* Use a dynamic configuration for circleCI tests by sgugger in 19325
* Add multi-node conditions in trainer_qa.py and trainer_seq2seq.py by regisss in 19502
* update doc for perf_train_cpu_many by sywangyi in 19506
* Avoid Push CI failing to report due to many commits being merged by ydshieh in 19496
* [Doctest] Add `configuration_bert.py` to doctest by ydshieh in 19485
* Fix whisper doc by ArthurZucker in 19518
* Syntax issue (line 497, 526) Documentation by kant in 19442
* Fix pytorch seq2seq qa by FilipposVentirozos in 19258
* Add depth estimation pipeline by nandwalritik in 18618
* Adding links to pipelines parameters documentation by AndreaSottana in 19227
* fix MarkupLMProcessor option flag by davanstrien in 19526
* [Doctest] Bart configuration update by imarekkus in 19524
* Remove roberta dependency from longformer fast tokenizer by sirmammingtonham in 19501
* made tokenization_roformer independent of bert by naveennamani in 19426
* Remove bert fast dependency from electra by Threepointone4 in 19520
* [Examples] Fix typos in run speech recognition seq2seq by sanchit-gandhi in 19514
* [X-CLIP] Fix doc tests by NielsRogge in 19523
* Update Marian config default vocabulary size by gante in 19464
* Make `MobileBert` tokenizers independent from `Bert` by 501Good in 19531
* [Whisper] Fix gradient checkpointing by sanchit-gandhi in 19538
* Syntax issues (paragraphs 122, 130, 147, 155) Documentation: sgugger by kant in 19437
* using trunc_normal for weight init & cls_token by mathieujouffroy in 19486
* Remove `MarkupLMForMaskedLM` from `MODEL_WITH_LM_HEAD_MAPPING_NAMES` by ydshieh in 19534
* Image transforms library by amyeroberts in 18520
* Add a decorator for flaky tests by sgugger in 19498
* [Doctest] Add `configuration_yolos.py` by daspartho in 19539
* Albert config update by imarekkus in 19541
* [Doctest] Add `configuration_whisper.py` by daspartho in 19540
* Throw an error if `getattribute_from_module` can't find anything by ydshieh in 19535
* [Doctest] Beit Config for doctest by daspartho in 19542
* Create the arange tensor on device for enabling CUDA-Graph for Clip Encoder by RezaYazdaniAminabadi in 19503
* [Doctest] GPT2 Config for doctest by daspartho in 19549
* Build Push CI images also on a daily basis by ydshieh in 19532
* Fix checkpoint used in `MarkupLMConfig` by ydshieh in 19547
* add a note to whisper docs clarifying support of long-form decoding by akashmjn in 19497
* [Whisper] Freeze params of encoder by sanchit-gandhi in 19527
* [Doctest] Fixing the Doctest for imageGPT config by RamitPahwa in 19556
* [Doctest] Fixing mobile bert configuration doctest by RamitPahwa in 19557
* [Doctest] Fixing doctest bert_generation configuration by Threepointone4 in 19558
* [Doctest] DeiT Config for doctest by daspartho in 19560
* [Doctest] Reformer Config for doctest by daspartho in 19562
* [Doctest] RoBERTa Config for doctest by daspartho in 19563
* [Doctest] Add `configuration_vit.py` by daspartho in 19561
* [Doctest] bloom config update by imarekkus in 19566
* [Re-submit] Compute true loss Flax examples by duongna21 in 19504
* Fix fairseq wav2vec2-xls-r pretrained weights conversion scripts by heatz123 in 19508
* [Doctest] CTRL config by imarekkus in 19574
* [Doctest] Add configuration_canine.py by IzicTemi in 19575
* [Doctests] Config files for `ViTMAE` and `YOSO` by grgkaran03 in 19567
* Added type hints to `DebertaV2ForMultipleChoice` Pytorch by IMvision12 in 19536
* [WIP] Add type hints for Lxmert (TF) by elusenji in 19441
* [Doctests] add `configuration_blenderbot.py` by grgkaran03 in 19577
* [Doctest] adds trajectory_transformer config to Docs test by SD-13 in 19586
* [Doctests] add `configuration_blenderbot_small.py` by grgkaran03 in 19589
* [Doctest] Swin V2 Config for doctest by daspartho in 19595
* [Doctest] Swin Config for doctest by daspartho in 19594
* [Doctest] SEW Config for doctest by daspartho in 19597
* [Doctest] UniSpeech Config for doctest by daspartho in 19596
* [Doctest] SEW-D Config for doctest by daspartho in 19598
* [Doctest] fix doc test for megatron bert by RamitPahwa in 19600
* Adding type hints for TFXLnet by thliang01 in 19344
* [Doctest] Add `configuration_bigbird_pegasus.py` and `configuration_big_bird.py` by Xabilahu in 19606
* Cast masks to np.uint8 before converting to PIL.Image.Image by amyeroberts in 19616
* [Whisper] Don't return attention mask in feat extractor by sanchit-gandhi in 19521
* [Time Series Transformer] Add doc tests by NielsRogge in 19607
* fix BLOOM ONNX config by NouamaneTazi in 19573
* Fix `test_tf_encode_plus_sent_to_model` for `TAPAS` by ydshieh in 19559
* Allow usage of TF Text BertTokenizer on TFBertTokenizer to make it servable on TF Serving by piEsposito in 19590
* add gloo backend support for CPU DDP by sywangyi in 19555
* Fix `ImageToTextPipelineTests.test_small_model_tf` by ydshieh in 19565
* Fix `FlaubertTokenizer` by ydshieh in 19552
* Visual Bert config for doctest by ztjhz in 19605
* GPTTokenizer dependency removed from deberta class by RamitPahwa in 19551
* xlm roberta config for doctest by ztjhz in 19609
* Ernie config for doctest by ztjhz in 19611
* xlm roberta xl config for doctest by ztjhz in 19610
* fix: small error by 0xflotus in 19612
* Improve error messaging for ASR pipeline. by Narsil in 19570
* [Doctest] LeViT Config for doctest by daspartho in 19622
* [Doctest] DistilBERT Config for doctest by daspartho in 19621
* [Whisper] Fix gradient checkpointing (again!) by sanchit-gandhi in 19548
* [Doctest] Add `configuration_resnet.py` by daspartho in 19620
* Fix whisper doc by ArthurZucker in 19608
* Sharding fails in TF when absolute scope was modified if `.` in layer name by ArthurZucker in 19124
* [Doctest] Add configuration_vision_text_dual_encoder.py by SD-13 in 19580
* [Doctest] Add configuration_vision_encoder_decoder.py by SD-13 in 19583
* [Doctest] Add configuration_time_series_transformer.py by SD-13 in 19582
* Tokenizer from_pretrained should not use local files named like tokenizer files by sgugger in 19626
* [Doctest] CodeGen config for doctest by AymenBer99 in 19633
* [Doctest] Add `configuration_data2vec_text.py` by daspartho in 19636
* [Doctest] Conditional DETR config for doctest by AymenBer99 in 19641
* [Doctest] XLNet config for doctest by AymenBer99 in 19649
* [Doctest] Add `configuration_trocr.py` by thliang01 in 19658
* Add doctest info in testing.mdx by ArthurZucker in 19623
* Add pillow to layoutlmv3 example requirements.txt by Spacefish in 19663
* add return types for tf gptj, xlm, and xlnet by sirmammingtonham in 19638
* Fix pipeline predict transform methods by s-udhaya in 19657
* Type hints MCTCT by rchan26 in 19618
* added type hints for Yolos Pytorch model by WhiteWolf47 in 19545
* A few CI fixes for `DocumentQuestionAnsweringPipeline` by ankrgyl in 19584
* Removed Bert interdependency from Funnel transformer by mukesh663 in 19655
* fix warnings in deberta by sanderland in 19458
* word replacement line 231 by shreem-123 in 19662
* [Doctest] Add configuration_transfo_xl.py by thliang01 in 19651
* Update perf_train_gpu_one.mdx by cakiki in 19676
* object-detection instead of object_detection by Spacefish in 19677
* add return_tensor parameter for feature extraction by ajsanjoaquin in 19257
* Fix code examples of DETR and YOLOS by NielsRogge in 19669
* Revert "add return_tensor parameter for feature extraction (19257)" by sgugger
* Fixed the docstring and type hint for forced_decoder_ids option in Ge… by koreyou in 19640
* Add normalize to image transforms module by amyeroberts in 19544
* [Doctest] Data2VecAudio Config for doctest by daspartho in 19635
* Update ESM checkpoints to point to `facebook/` by Rocketknight1 in 19675
* Removed XLMModel inheritance from FlaubertModel(torch+tf) by D3xter1922 in 19432
* [Examples] make default preprocessing_num_workers=1 by Yang-YiFan in 19684
* [Doctest] Add configuration_convbert.py by AymenBer99 in 19643
* [Doctest] Add configuration_realm.py by ak04p in 19646
* Update CONTRIBUTING.md by shreem-123 in 19689
* [Doctest] Add `configuration_data2vec_vision.py` by daspartho in 19637
* Fix some CI torch device issues for PyTorch 1.13 by ydshieh in 19681
* Fix checkpoint used in `VisualBertConfig` doc example by ydshieh in 19692
* Fix dtype in randomly initialized head by sgugger in 19690
* fix tests by ArthurZucker in 19670
* fix test whisper with new max length by ArthurZucker in 19668
* check decoder_inputs_embeds is None before shifting labels by ArthurZucker in 19671
* Fix docs by NielsRogge in 19687
* update documentation by ArthurZucker in 19706
* Improve DETR models by NielsRogge in 19644
* Small fixes for TF-ESM1b and ESM-1b weight conversions by Rocketknight1 in 19683
* Fix typo in perf docs by cakiki in 19705
* Fix redundant normalization of OWL-ViT text embeddings by alaradirik in 19712
* Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode by falcaopetri in 18351
* [Doctest] CVT config for doctest by AymenBer99 in 19695
* [Doctest] Add configuration_wav2vec2.py to documentation_tests.py by juancopi81 in 19698
* Fixed pegasus config doctest by mukesh663 in 19722
* fix seq2seqtrainer predict without labels by IvanSedykh in 19721
* add return_tensors parameter for feature_extraction 2 by Narsil in 19707
* Improving `image-segmentation` pipeline tests. by Narsil in 19710
* [Doctest] Adding config files for convnext by soma2000-lang in 19717
* [Doctest] Fixing doctest `configuration_pegasus_x.py` by mukesh663 in 19725
* Specify TF framework in TF-related pipeline tests by ydshieh in 19719
* Add docs by NielsRogge in 19729
* Fix activations being all the same module by sgugger in 19728
* add `accelerate` support for `Whisper` by younesbelkada in 19697
* Clean up deprecation warnings by Davidy22 in 19654
* Repo utils test by sgugger in 19696
* Add decorator to flaky test by amyeroberts in 19674
* [Doctest] Add doctest for `FlavaConfig` and `FNetConfig` by ndrohith09 in 19724
* Update contribution guide by stevhliu in 19700
* [Doctest] Add wav2vec2_conformer for doctest by juancopi81 in 19734
* [Doctest] XLM Config for doctest by AymenBer99 in 19685
* [Doctest] Add `configuration_clip.py` by daspartho in 19647
* [Doctest] GPTNeoConfig , GPTNeoXConfig , GPTNeoXJapaneseConfig by ndrohith09 in 19741
* Update modeling_markuplm.py by IMvision12 in 19723
* Fix issue 19300 by raghavanone in 19483
* [Doctest] Add `configuration_wavlm.py` by juancopi81 in 19749
* Specify TF framework explicitly in more pipeline tests by ydshieh in 19748
* Fix cache version file creation by sgugger in 19750
* Image transforms add center crop by amyeroberts in 19718
* [Doctest] Add `configuration_decision_transformer.py` by Xabilahu in 19751
* [Doctest] Add `configuration_detr.py` by Xabilahu in 19752
* Fixed spacing errors by shreya24ag in 19754
* All broken links were fixed in contributing file by mdfaizanahmed786 in 19760
* [Doctest] SpeechToTextTransformer Config for doctest by daspartho in 19757
* [Doctest] SqueezeBERT Config for doctest by daspartho in 19758
* [Doctest] SpeechToTextTransformer2 Config for doctest by daspartho in 19756
* [Doctest] OpenAIGPTConfig and OPTConfig by ndrohith09 in 19763
* `image-segmentation` pipeline: re-enable `small_model_pt` test. by Narsil in 19716
* Update modeling_layoutlmv3.py by IMvision12 in 19753
* adding key pair dataset by rohit1998 in 19765
* Fix exception thrown using MishActivation by chinoll in 19739
* [FLAX] Add dtype to embedding for gpt2 model by merrymercy in 18462
* TF: sample generation compatible with XLA and dynamic batch sizes by gante in 19773
* Install tf2onnx dev version by ydshieh in 19755
* Fix docker image build by ydshieh in 19759
* PT <-> TF for composite models by ydshieh in 19732
* Add warning about restarting runtime to import errors by Rocketknight1 in 19774
* Added support for multivariate independent emission heads by kashif in 19453
* Update `ImageToTextPipelineTests.test_small_model_tf` by ydshieh in 19785
* Make public versions of private tensor utils by sgugger in 19775
* Update training.mdx by ftorres16 in 19791
* [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. by davialvb in 19779
* Add sentencepiece to BertJapaneseTokenizer by conan1024hao in 19769
* Fix CTRL `test_torchscript_xxx` CI by updating `_create_and_check_torchscript` by ydshieh in 19786
* Fix nightly test setup by sgugger in 19792
* Fix image segmentation pipeline errors, resolve backward compatibility issues by alaradirik in 19768
* Fix error/typo in docstring of TokenClassificationPipeline by pchr8 in 19798
* Use None to detect if truncation was unset by sgugger in 19794
* Generate: contrastive search test updates by gante in 19787
* Run some TF Whisper tests in subprocesses to avoid GPU OOM by ydshieh in 19772
* Added translation of run_scripts.mdx to Portuguese Issue 16824 by davialvb in 19800
* Generate: minor docstring fix by gante in 19801
* [Doctest] `MaskFormerConfig` doctest by sha016 in 19817
* [Doctest] Add `configuration_plbart.py` by ayaka14732 in 19809
* [Doctest] Add `configuration_poolformer.py` by ayaka14732 in 19808
* [Doctest] Add `configuration_electra.py` by ayaka14732 in 19807
* [Doctest] Add `configuration_nezha.py` by ayaka14732 in 19810
* Display the number of trainable parameters when launching a training by regisss in 19835
* replace reference to Datasets in metrics deprecation with Evaluate by angus-lherrou in 19812
* Fix OOM in Config doctest by ydshieh in 19840
* fix broken links in testing.mdx by XFFXFF in 19820
* fix image2text args forwarding by kventinel in 19648
* Added translation of converting_tensorflow_models.mdx to Portuguese Issue 16824 by davialvb in 19824
* Fix nightly CircleCI by ydshieh in 19837
* fixed typo in fp16 training section for perf_train_gpu_one by dsingal0 in 19736
* Update `LEDModelIntegrationTests` expected values by ydshieh in 19841
* Improve check copies by kventinel in 19829
* Fix doctest for `MarkupLM` by ydshieh in 19845
* add small updates only by stevhliu in 19847
* Refactor conversion function by sgugger in 19799
* Spanish translation of multiple_choice.mdx, question_answering.mdx. by alceballosa in 19821
* Fix doctest for `GenerationMixin.contrastive_search` by ydshieh in 19863
* Add missing lang tokens in M2M100Tokenizer.get_vocab by guillaumekln in 18416
* Added translation of serialization.mdx to Portuguese Issue 16824 by davialvb in 19869
* Generate: contrastive search cosmetic tweaks by gante in 19871
* [Past CI] Vilt only supports PT >= v1.10 by LysandreJik in 19851
* Fix incorrect model<->tokenizer mapping in tokenization testing by ydshieh in 19872
* Update doc for revision and token by sgugger in 19793
* Factored out some code in the `image-segmentation` pipeline. by Narsil in 19727
* [DOCTEST] Config doctest for `MCTCT`, `MBart` and `LayoutLM` by Revanth2002 in 19889
* Fix LR by regisss in 19875
* Correct README image text by KayleeDavisGitHub in 19883
* No conv bn folding in ipex to avoid warning by sanderland in 19870
* Add missing information on token_type_ids for roberta model by raghavanone in 19766
* Change the import of kenlm from github to pypi by raghavanone in 19770
* Update `max_diff` in `test_save_load_fast_init_to_base` by ydshieh in 19849
* Allow flax subfolder by patrickvonplaten in 19902
* `accelerate` support for `RoBERTa` family by younesbelkada in 19906
* Add checkpoint links in a few config classes by ydshieh in 19910
* Generate: contrastive search uses existing abstractions and conventions by gante in 19896
* Convert None logits processor/stopping criteria to empty list. by ccmaymay in 19880
* Some fixes regarding auto mappings and test class names by ydshieh in 19923
* Fix bug in Wav2Vec2's GPU tests by falcaopetri in 19803
* Fix warning when collating list of numpy arrays by sgugger in 19846
* Add type hints to TFPegasusModel by EdAbati in 19858
* Remove embarrassing debug print() in save_pretrained by Rocketknight1 in 19922
* Add `accelerate` support for M2M100 by younesbelkada in 19912
* Add RoBERTa resources by stevhliu in 19911
* Add T5 resources by stevhliu in 19878
* Add BLOOM resources by stevhliu in 19881
* Add GPT2 resources by stevhliu in 19879
* Let inputs of fast tokenizers be tuples as well as lists by sgugger in 19898
* Add `accelerate` support for BART-like models by younesbelkada in 19927
* Create dummy models by ydshieh in 19901
* Support segformer fx by dwlim-nota in 19924
* Use self._trial to generate trial_name for Trainer. by reyoung in 19874
* Add Onnx Config for ImageGPT by RaghavPrabhakar66 in 19868
* Update Code of Conduct to Contributor Covenant v2.1 by pankali in 19935
* add resources for bart by stevhliu in 19928
* add resources for distilbert by stevhliu in 19930
* Add wav2vec2 resources by stevhliu in 19931
* [Conditional, Deformable DETR] Add postprocessing methods by NielsRogge in 19709
* Fix ONNX tests for ONNX Runtime v1.13.1 by lewtun in 19950
* donut -> donut-swin by ydshieh in 19920
* [Doctest] Add configuration_deberta.py by Saad135 in 19968
* gradient checkpointing for GPT-NeoX by chiaolun in 19946
* [modelcard] Update for ASR by sanchit-gandhi in 19985
* [ASR] Update 'tasks' for model card by sanchit-gandhi in 19986
* Transformers documentation translation to Italian 17459 by draperkm in 19988
* Pin torch to < 1.13 temporarily by ydshieh in 19989
* Add support for gradient checkpointing by NielsRogge in 19990

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* arnaudstiegler
* Make LayoutLM tokenizers independent from BertTokenizer (19351)
* asofiaoliveira
* Make `XLMRoberta` model and config independent from `Roberta` (19359)
* srhrshr
* Decouples `XLMProphet` model from `Prophet` (19406)
* Davidy22
* Make bert_japanese and cpm independent of their inherited modules (19431)
* Clean up deprecation warnings (19654)
* mathieujouffroy
* [CvT] Tensorflow implementation (18597)
* using trunc_normal for weight init & cls_token (19486)
* IMvision12
* New (19481)
* Added type hints to `DebertaV2ForMultipleChoice` Pytorch (19536)
* Update modeling_markuplm.py (19723)
* Update modeling_layoutlmv3.py (19753)
* 501Good
* Make `MobileBert` tokenizers independent from `Bert` (19531)
* mukesh663
* Removed Bert interdependency from Funnel transformer (19655)
* Fixed pegasus config doctest (19722)
* [Doctest] Fixing doctest `configuration_pegasus_x.py` (19725)
* D3xter1922
* Removed XLMModel inheritance from FlaubertModel(torch+tf) (19432)
* falcaopetri
* Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (18351)
* Fix bug in Wav2Vec2's GPU tests (19803)
* gmftbyGMFTBY
* Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (19477)
* davialvb
* [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. (19779)
* Added translation of run_scripts.mdx to Portuguese Issue 16824 (19800)
* Added translation of converting_tensorflow_models.mdx to Portuguese Issue 16824 (19824)
* Added translation of serialization.mdx to Portuguese Issue 16824 (19869)
* alceballosa
* Spanish translation of multiple_choice.mdx, question_answering.mdx. (19821)

4.23.1

Not secure
Fixes a revert, introduced by mistake, that broke the `"automatic-speech-recognition"` pipeline for Whisper.

- Fix whisper for pipeline by ArthurZucker in 19482

4.23.0

Not secure
Whisper

The Whisper model was proposed in [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting, in multiple languages.

* Add WhisperModel to transformers by ArthurZucker in 19166
* Add TF whisper by amyeroberts in 19378

Deformable DETR

The Deformable DETR model was proposed in [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original [DETR](https://huggingface.co/docs/transformers/model_doc/detr) by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.

* Add Deformable DETR by NielsRogge in 17281
* [fix] Add DeformableDetrFeatureExtractor by NielsRogge in 19140

Conditional DETR

The Conditional DETR model was proposed in [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than [DETR](https://huggingface.co/docs/transformers/model_doc/detr).

* Add support for conditional detr by DeppMeng in 18948
* Improve conditional detr docs by NielsRogge in 19154

Time Series Transformer

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

:warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs, and fixing them may require slight breaking changes in the future. If you see something strange, file a [GitHub Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title).

* time series forecasting model by kashif in 17965
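
The difference between teacher forcing and autoregressive generation described above can be sketched with a toy forecaster. This is only an illustration under stated assumptions: `toy_step`, `teacher_forced_predictions`, and `autoregressive_forecast` are hypothetical names, the "model" is a two-value moving average rather than a Transformer, and it is deterministic, whereas the real model generates samples.

```python
def toy_step(history):
    """Hypothetical one-step forecaster: mean of the last two observed values."""
    window = history[-2:]
    return sum(window) / len(window)


def teacher_forced_predictions(series):
    """Training-style pass: every prediction is conditioned on the TRUE past."""
    return [toy_step(series[:t]) for t in range(1, len(series))]


def autoregressive_forecast(context, horizon):
    """Inference-style pass: each prediction is appended to the history
    and becomes an input for the next time step."""
    history = list(context)
    forecast = []
    for _ in range(horizon):
        next_value = toy_step(history)
        forecast.append(next_value)
        history.append(next_value)
    return forecast


series = [1.0, 2.0, 3.0, 4.0]
tf_preds = teacher_forced_predictions(series)                # uses ground truth at each step
ar_preds = autoregressive_forecast(series[:2], horizon=2)    # uses its own outputs
```

With teacher forcing, an error at one step does not propagate because the next step sees the true value; in the autoregressive loop, each output feeds the next step, so errors can compound over the forecast horizon.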

Masked Siamese Networks

The ViTMSN model was proposed in [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

MSN (masked siamese networks) consists of a joint-embedding architecture to match the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.

* MSN (Masked Siamese Networks) for ViT by sayakpaul in 18815

MarkupLM

The MarkupLM model was proposed in [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to [LayoutLM](https://huggingface.co/docs/transformers/main/en/model_doc/layoutlm).

The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks: [WebSRC](https://x-lance.github.io/WebSRC/) and [SWDE](https://www.researchgate.net/publication/221299838_From_one_tree_to_a_forest_a_unified_solution_for_structured_web_data_extraction).

* Add MarkupLM by NielsRogge in 19198

Security & safety

We explore a new serialization format not using Pickle that we can then leverage in the three frameworks we support: PyTorch, TensorFlow, and JAX. We leverage the [safetensors](https://github.com/huggingface/safetensors) library for that.

Support is for PyTorch models only at this stage, and still experimental.

* Poc to use safetensors by sgugger in 19175
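
To make the idea of a pickle-free format concrete, here is a minimal sketch using only the standard library. It is loosely modeled on the layout the safetensors project documents (a length-prefixed JSON header followed by raw tensor bytes), but it is a toy: `save_tensors` and `load_tensors` are hypothetical helpers, tensors are flat lists of floats, and none of this is the safetensors API.

```python
import json
import struct


def save_tensors(tensors):
    """Serialize {name: [float, ...]} as: 8-byte header length, JSON header, raw data."""
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_tensors(blob):
    """Parse the header as plain JSON, then slice raw bytes; no code is executed."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    tensors = {}
    for name, meta in header.items():
        start, end = meta["data_offsets"]
        count = (end - start) // 4  # 4 bytes per float32
        tensors[name] = list(struct.unpack(f"<{count}f", data[start:end]))
    return tensors


blob = save_tensors({"weight": [0.5, -1.0, 2.0], "bias": [0.25]})
restored = load_tensors(blob)
```

The point of the design is that loading only parses JSON and copies bytes, whereas unpickling a file can run arbitrary code chosen by whoever produced it.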

Computer vision post-processing methods overhaul

The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments and outputs.
:warning: The existing methods that are superseded by the introduced methods `post_process_object_detection`, `post_process_semantic_segmentation`, `post_process_instance_segmentation`, `post_process_panoptic_segmentation` are now deprecated.

* Improve DETR post-processing methods by alaradirik in 19205
* Beit postprocessing by alaradirik in 19099
* Fix BeitFeatureExtractor postprocessing by alaradirik in 19119
* Add post_process_semantic_segmentation method to SegFormer by alaradirik in 19072
* Add post_process_semantic_segmentation method to DPTFeatureExtractor by alaradirik in 19107
* Add semantic segmentation post-processing method to MobileViT by alaradirik in 19105
* Detr preprocessor fix by alaradirik in 19007
* Improve and fix ImageSegmentationPipeline by alaradirik in 19367
* Restructure DETR post-processing, return prediction scores by alaradirik in 19262
* Maskformer post-processing fixes and improvements by alaradirik in 19172
* Fix MaskFormer failing postprocess tests by alaradirik in 19354
* Fix DETR segmentation postprocessing output by alaradirik in 19363
* fix docs example, add object_detection to DETR docs by alaradirik in 19377
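
As a rough illustration of what an object detection post-processing step does, the sketch below filters predictions by score and converts normalized center-format boxes to absolute corner coordinates. The function names, argument names, and the default threshold are assumptions made for this example; it does not reproduce the library's `post_process_object_detection` signature or return types.

```python
def center_to_corners(box):
    """Convert (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def post_process_detections(scores, labels, boxes, image_size, threshold=0.5):
    """Hypothetical helper: keep confident predictions and rescale their boxes.

    `boxes` are assumed to be normalized to [0, 1]; `image_size` is (height, width).
    """
    height, width = image_size
    results = []
    for score, label, box in zip(scores, labels, boxes):
        if score <= threshold:
            continue  # drop low-confidence predictions
        x0, y0, x1, y1 = center_to_corners(box)
        results.append({
            "score": score,
            "label": label,
            "box": (x0 * width, y0 * height, x1 * width, y1 * height),
        })
    return results


detections = post_process_detections(
    scores=[0.9, 0.2],
    labels=["cat", "dog"],
    boxes=[(0.5, 0.5, 0.5, 0.25), (0.25, 0.25, 0.5, 0.5)],
    image_size=(400, 800),
)
```

Returning one consistent structure (score, label, box) regardless of the model is the kind of uniformity the overhaul aims for; the exact output keys used by the library should be checked in its documentation.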

🚨 Breaking changes

The following changes are bugfixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

Breaking change for ViT parameter initialization

* 🚨🚨🚨 Fix ViT parameter initialization by alaradirik in 19341

Breaking change for the `top_p` argument of the `TopPLogitsWarper` of the `generate` method.

* 🚨🚨🚨 Optimize Top P Sampler and fix edge case by ekagra-ranjan in 18984
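
For readers unfamiliar with top-p (nucleus) filtering, the following is a minimal pure-Python sketch of the general technique: keep the smallest set of most probable tokens whose cumulative probability reaches `top_p`. It is not the library's `TopPLogitsWarper` implementation, and the boundary handling shown here (including the token that crosses the threshold, and rejecting values outside (0, 1]) is an assumption of this sketch; see the PR for the exact behavior after the change.

```python
def top_p_filter(probs, top_p, min_tokens_to_keep=1):
    """Return the indices kept by nucleus filtering, most probable first."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        # Stop once enough probability mass (and enough tokens) has been collected.
        if cumulative >= top_p and len(kept) >= min_tokens_to_keep:
            break
        kept.append(index)
        cumulative += probs[index]
    return kept


def renormalize(probs, kept):
    """Rescale the kept probabilities so they sum to 1 before sampling."""
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.5, 0.25, 0.125, 0.125]
kept = top_p_filter(probs, top_p=0.7)
sampling_distribution = renormalize(probs, kept)
```

Because results depend on how the boundary token is treated, even a small change to the comparison can alter which tokens survive, which is why a fix in this area is flagged as breaking.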

Model head additions

OPT and BLOOM now have question answering heads available.

* Add `OPTForQuestionAnswering` by clementapa in 19402
* Add `BloomForQuestionAnswering` by younesbelkada in 19310

Pipelines

There is now a zero-shot object detection pipeline.

* Add ZeroShotObjectDetectionPipeline by sahamrit in 18445

TensorFlow architectures

The GroupViT model is now available in TensorFlow.

* [TensorFlow] Adding GroupViT by ariG23498 in 18020

Bugfixes and improvements

* Fix a broken link for deepspeed ZeRO inference in the docs by nijkah in 19001
* [doc] debug: fix import by stas00 in 19042
* [bnb] Small improvements on utils by younesbelkada in 18646
* Update image segmentation pipeline test by amyeroberts in 18731
* Fix `test_save_load` for `TFViTMAEModelTest` by ydshieh in 19040
* Pin minimum PyTorch version for BLOOM ONNX export by lewtun in 19046
* Update serving signatures and make sure we actually use them by Rocketknight1 in 19034
* Move cache: expand error message by sgugger in 19051
* Fixing OPT fast tokenizer option. by Narsil in 18753
* Fix custom tokenizers test by sgugger in 19052
* Run `torchdynamo` tests by ydshieh in 19056
* [fix] Add DeformableDetrFeatureExtractor by NielsRogge in 19140
* fix arg name in BLOOM testing and remove unused arg document by shijie-wu in 18843
* Adds package and requirement spec output to version check exception by colindean in 18702
* fix `use_cache` by younesbelkada in 19060
* FX support for ConvNext, Wav2Vec2 and ResNet by michaelbenayoun in 19053
* [doc] Fix link in PreTrainedModel documentation by tomaarsen in 19065
* Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input by jimypbr in 18746
* Organize test jobs by sgugger in 19058
* Automatically tag CLIP repos as zero-shot-image-classification by osanseviero in 19064
* Fix `LeViT` checkpoint by ydshieh in 19069
* TF: tests for (de)serializable models with resized tokens by gante in 19013
* Add type hints for PyTorch UniSpeech, MPNet and Nystromformer by daspartho in 19039
* replace logger.warn by logger.warning by fxmarty in 19068
* Fix tokenizer load from one file by sgugger in 19073
* Note about developer mode by LysandreJik in 19075
* german autoclass by flozi00 in 19049
* Add tests for legacy load by url and fix bugs by sgugger in 19078
* Add runner availability check by ydshieh in 19054
* fix working dir by ydshieh in 19101
* Added type hints for TFConvBertModel by kishore-s-15 in 19088
* Added Type hints for VIT MAE by kishore-s-15 in 19085
* Add type hints for TF MPNet models by kishore-s-15 in 19089
* Added type hints to ResNetForImageClassification by kishore-s-15 in 19084
* added type hints by daspartho in 19076
* Improve vision models docs by NielsRogge in 19103
* correct spelling in README by flozi00 in 19092
* Don't warn of move if cache is empty by sgugger in 19109
* HPO: keep the original logic if there's only one process, pass the trial to trainer by sywangyi in 19096
* Add documentation of Trainer.create_model_card by sgugger in 19110
* Added type hints for YolosForObjectDetection by kishore-s-15 in 19086
* Fix the wrong schedule by ydshieh in 19117
* Change document question answering pipeline to always return an array by ankrgyl in 19071
* german processing by flozi00 in 19121
* Fix: update ltp word segmentation call in mlm_wwm by xyh1756 in 19047
* Add a missing space in a script arg documentation by bryant1410 in 19113
* Skip `test_export_to_onnx` for `LongT5` if `torch` < 1.11 by ydshieh in 19122
* Fix GLUE MNLI when using `max_eval_samples` by lvwerra in 18722
* [BugFix] Fix fsdp option on shard_grad_op. by ZHUI in 19131
* Fix FlaxPreTrainedModel pt weights check by mishig25 in 19133
* support deps from GitHub by lhoestq in 19141
* Fix dummy creation for multi-frameworks objects by sgugger in 19144
* Allowing users to use the latest `tokenizers` release ! by Narsil in 19139
* Add some tests for check_dummies by sgugger in 19146
* Fixed typo in generation_utils.py by nbalepur in 19145
* Add `accelerate` support for ViLT by younesbelkada in 18683
* TF: check embeddings range by gante in 19102
* Reduce LR for TF MLM example test by Rocketknight1 in 19156
* update perf_train_cpu_many doc by sywangyi in 19151
* fix: ckpt paths. by sayakpaul in 19159
* Fix TrainingArguments documentation by sgugger in 19162
* fix HPO DDP GPU problem by sywangyi in 19168
* [WIP] Trainer supporting evaluation on multiple datasets by timbmg in 19158
* Add doctests to Perceiver examples by stevenmanton in 19129
* Add offline runners info in the Slack report by ydshieh in 19169
* Fix incorrect comments about atten mask for pytorch backend by lygztq in 18728
* Fixed type hint for pipelines/check_task by Fei-Wang in 19150
* Update run_clip.py by enze5088 in 19130
* german training, accelerate and model sharing by flozi00 in 19171
* Separate Push CI images from Scheduled CI by ydshieh in 19170
* Remove pos arg from Perceiver's Pre/Postprocessors by aielawady in 18602
* Use `assertAlmostEqual` in `BloomEmbeddingTest.test_logits` by ydshieh in 19200
* Move the model type check by ankrgyl in 19027
* Use repo_type instead of deprecated datasets repo IDs by sgugger in 19202
* Updated hf_argparser.py by IMvision12 in 19188
* Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor by ydshieh in 19203
* Fix cached_file in offline mode for cached non-existing files by sgugger in 19206
* Remove unused `cur_len` in generation_utils.py by ekagra-ranjan in 18874
* add wav2vec2_alignment by arijitx in 16782
* add doc for hyperparameter search by sywangyi in 19192
* Add a use_parallel_residual argument to control the residual computing way by NinedayWang in 18695
* translated add_new_pipeline by nickprock in 19215
* More tests for regression in cached non existence by sgugger in 19216
* Use `math.pi` instead of `torch.pi` in `MaskFormer` by ydshieh in 19201
* Added tests for yaml and json parser by IMvision12 in 19219
* Fix small use_cache typo in the docs by ankrgyl in 19191
* Generate: add warning when left padding should be used by gante in 19067
* Fix deprecation warning for return_all_scores by ogabrielluiz in 19217
* Fix doctest for `TFDeiTForImageClassification` by ydshieh in 19173
* Document and validate typical_p in generation by mapmeld in 19128
* Fix trainer seq2seq qa.py evaluate log and ft script by iamtatsuki05 in 19208
* Fix cache names in CircleCI jobs by ydshieh in 19223
* Move AutoClasses under Main Classes by stevhliu in 19163
* Focus doc around preprocessing classes by stevhliu in 18768
* Fix confusing working directory in Push CI by ydshieh in 19234
* XGLM - Fix Softmax NaNs when using FP16 by gsarti in 18057
* Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+ by michaelbenayoun in 19233
* Fix `m2m_100.mdx` doc example missing `labels` by Mustapha-AJEGHRIR in 19149
* Fix opt softmax small nit by younesbelkada in 19243
* Use `hf_raise_for_status` instead of deprecated `_raise_for_status` by Wauplin in 19244
* Fix TrainingArgs argument serialization by atturaioe in 19239
* Fix test fetching for examples by sgugger in 19237
* Cast TF generate() inputs by Rocketknight1 in 19232
* Skip pipeline tests by sgugger in 19248
* Add job names in Past CI artifacts by ydshieh in 19235
* Update Past CI report script by ydshieh in 19228
* [Wav2Vec2] Fix None loss in doc examples by rbsteinm in 19218
* Catch `HFValidationError` in `TrainingSummary` by ydshieh in 19252
* Add expected output to the sample code for `ViTMSNForImageClassification` by sayakpaul in 19183
* Add stop sequence to text generation pipeline by KMFODA in 18444
* Add notebooks by JingyaHuang in 19259
* Add `beautifulsoup4` to the dependency list by ydshieh in 19253
* Fix Encoder-Decoder testing issue about repo. names by ydshieh in 19250
* Fix cached lookup filepath on windows for hub by kjerk in 19178
* Docs - Guide to add a new TensorFlow model by gante in 19256
* Update no_trainer script for summarization by divyanshugit in 19277
* Don't automatically add bug label by sgugger in 19302
* Breakup export guide by stevhliu in 19271
* Update Protobuf dependency version to fix known vulnerability by qthequartermasterman in 19247
* Update README.md by ShubhamJagtap2000 in 19309
* [Docs] Fix link by patrickvonplaten in 19313
* Fix for sequence regression fit() in TF by Rocketknight1 in 19316
* Added Type hints for LED TF by IMvision12 in 19315
* Added type hints for TF: rag model by debjit-bw in 19284
* alter retrived to retrieved by gouqi666 in 18863
* ci(stale.yml): upgrade actions/setup-python to v4 by oscard0m in 19281
* ci(workflows): update actions/checkout to v3 by oscard0m in 19280
* wrap forward passes with torch.no_grad() by daspartho in 19279
* wrap forward passes with torch.no_grad() by daspartho in 19278
* wrap forward passes with torch.no_grad() by daspartho in 19274
* wrap forward passes with torch.no_grad() by daspartho in 19273
* Removing BertConfig inheritance from LayoutLMConfig by arnaudstiegler in 19307
* docker-build: Update actions/checkout to v3 by Sushrut1101 in 19288
* Clamping hidden state values to allow FP16 by SSamDav in 19229
* Remove interdependency from OpenAI tokenizer by E-Aho in 19327
* removing XLMConfig inheritance from FlaubertConfig by D3xter1922 in 19326
* Removed interdependency of BERT's Tokenizer in tokenization of prophetnet by divyanshugit in 19331
* Remove bert interdependency from clip tokenizer by shyamsn97 in 19332
* [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer by D3xter1922 in 19330
* Making camembert independent from roberta, clean by Mustapha-AJEGHRIR in 19337
* Add sudachi and jumanpp tokenizers for bert_japanese by r-terada in 19043
* Frees LongformerTokenizer of the Roberta dependency by srhrshr in 19346
* Change `BloomConfig` docstring by younesbelkada in 19336
* Test failing test while we resolve the issue. by sgugger in 19355
* Call _set_save_spec() when creating TF models by Rocketknight1 in 19321
* correct typos in README by paulaxisabel in 19304
* Removes Roberta and Bert config dependencies from Longformer by srhrshr in 19343
* Fix gather for metrics by muellerzr in 19360
* Fix pipeline tests for Roberta-like tokenizers by sgugger in 19365
* Change link of repojacking vulnerable link by Ilaygoldman in 19393
* Making `ConvBert Tokenizer` independent from `bert Tokenizer` by IMvision12 in 19347
* Fix gather for metrics by muellerzr in 19389
* Added Type hints for XLM TF by IMvision12 in 19333
* add ONNX support for swin transformer by bibhabasumohapatra in 19390
* removes prophet config dependencies from xlm-prophet by srhrshr in 19400
* Added type hints for TF: TransfoXL by thliang01 in 19380
* HF <-> megatron checkpoint reshaping and conversion for GPT by pacman100 in 19317
* Remove unneded words from audio-related feature extractors by osanseviero in 19405
* edit: cast attention_mask to long in DataCollatorCTCWithPadding by ddobokki in 19369
* Copy BertTokenizer dependency into retribert tokenizer by Davidy22 in 19371
* Export TensorFlow models to ONNX with dynamic input shapes by dwyatte in 19255
* update attention mask handling by ArthurZucker in 19385
* Remove dependency of Bert from Squeezebert tokenizer by rchan26 in 19403
* Removed Bert and XML Dependency from Herbert by harry7337 in 19410
* Clip device map by patrickvonplaten in 19409
* Remove Dependency between Bart and LED (slow/fast) by Infrared1029 in 19408
* Removed `Bert` interdependency in `tokenization_electra.py` by OtherHorizon in 19356
* Make `Camembert` TF version independent from `Roberta` by Mustapha-AJEGHRIR in 19364
* Removed Bert dependency from BertGeneration code base. by Threepointone4 in 19370
* Rework pipeline tests by sgugger in 19366
* Fix `ViTMSNForImageClassification` doctest by ydshieh in 19275
* Skip `BloomEmbeddingTest.test_embeddings` for PyTorch < 1.10 by ydshieh in 19261
* remove RobertaConfig inheritance from MarkupLMConfig by D3xter1922 in 19404
* Backtick fixed (paragraph 68) by kant in 19440
* Fixed duplicated line (paragraph 83) Documentation: sgugger by kant in 19436
* fix marianMT convertion to onnx by kventinel in 19287
* Fix typo in image-classification/README.md by zhawe01 in 19424
* Stop relying on huggingface_hub's private methods by LysandreJik in 19392
* Add onnx support for VisionEncoderDecoder by mht-sharma in 19254
* Remove dependency of Roberta in Blenderbot by rchan26 in 19411
* fix: renamed variable name by ariG23498 in 18850
* Fix the error message in run_t5_mlm_flax.py by yangky11 in 19282
* Add Italian translation for `add_new_model.mdx` by Steboss89 in 18713
* Fix momentum and epsilon values by amyeroberts in 19454
* Generate: corrected exponential_decay_length_penalty type hint by ShivangMishra in 19376
* Fix misspelled word in docstring by Bearnardd in 19415
* Fixed a non-working hyperlink in the README.md file by MikailINTech in 19434
* fix by ydshieh in 19469
* wrap forward passes with torch.no_grad() by daspartho in 19439
* wrap forward passes with torch.no_grad() by daspartho in 19438
* wrap forward passes with torch.no_grad() by daspartho in 19416
* wrap forward passes with torch.no_grad() by daspartho in 19414
* wrap forward passes with torch.no_grad() by daspartho in 19413
* wrap forward passes with torch.no_grad() by daspartho in 19412

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* flozi00
  * german autoclass (19049)
  * correct spelling in README (19092)
  * german processing (19121)
  * german training, accelerate and model sharing (19171)
* DeppMeng
  * Add support for conditional detr (18948)
* sayakpaul
  * MSN (Masked Siamese Networks) for ViT (18815)
  * fix: ckpt paths. (19159)
  * Add expected output to the sample code for `ViTMSNForImageClassification` (19183)
* IMvision12
  * Updated hf_argparser.py (19188)
  * Added tests for yaml and json parser (19219)
  * Added Type hints for LED TF (19315)
  * Making `ConvBert Tokenizer` independent from `bert Tokenizer` (19347)
  * Added Type hints for XLM TF (19333)
* ariG23498
  * [TensorFlow] Adding GroupViT (18020)
  * fix: renamed variable name (18850)
* Mustapha-AJEGHRIR
  * Fix `m2m_100.mdx` doc example missing `labels` (19149)
  * Making camembert independent from roberta, clean (19337)
  * Make `Camembert` TF version independent from `Roberta` (19364)
* D3xter1922
  * removing XLMConfig inheritance from FlaubertConfig (19326)
  * [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (19330)
  * remove RobertaConfig inheritance from MarkupLMConfig (19404)
* srhrshr
  * Frees LongformerTokenizer of the Roberta dependency (19346)
  * Removes Roberta and Bert config dependencies from Longformer (19343)
  * removes prophet config dependencies from xlm-prophet (19400)
* sahamrit
  * [WIP] Add ZeroShotObjectDetectionPipeline (18445) (18930)
* Davidy22
  * Copy BertTokenizer dependency into retribert tokenizer (19371)
* rchan26
  * Remove dependency of Bert from Squeezebert tokenizer (19403)
  * Remove dependency of Roberta in Blenderbot (19411)
* harry7337
  * Removed Bert and XML Dependency from Herbert (19410)
* Infrared1029
  * Remove Dependency between Bart and LED (slow/fast) (19408)
* Steboss89
  * Add Italian translation for `add_new_model.mdx` (18713)

4.22.2

Not secure
Fixes a bug where a cached tokenizer or model could no longer be loaded offline (whether offline mode was forced or the internet connection was unavailable).

- More tests for regression in cached non existence by sgugger in 19216
- Fix cached_file in offline mode for cached non-existing files by sgugger in 19206
- Don't warn of move if cache is empty by sgugger in 19109
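
For context, the offline scenario this patch concerns can be sketched as follows. This is a minimal illustration rather than code from the release: it assumes the documented `TRANSFORMERS_OFFLINE` environment variable and the `local_files_only` argument of `from_pretrained`, and the helper name `load_cached_tokenizer` is hypothetical.

```python
import os

# Force offline mode. Set this before `transformers` is imported so the
# library does not attempt any network request.
os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_cached_tokenizer(name):
    """Hypothetical helper: load a tokenizer from the local cache only."""
    # Imported lazily so the environment variable above is already in place.
    from transformers import AutoTokenizer

    # `local_files_only=True` restricts the lookup to files already cached;
    # an error is raised if the files were never downloaded.
    return AutoTokenizer.from_pretrained(name, local_files_only=True)
```

Calling `load_cached_tokenizer` with a model identifier that was downloaded earlier should then resolve entirely from the cache. An identifier that was never downloaded is expected to raise an error rather than trigger a download.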

4.22.1

Not secure
Patch release for the following PRs:

- [Add tests for legacy load by url and fix bugs](https://github.com/huggingface/transformers/commit/654c584f388ac160db83071d751e9dead4887d82) ([#19078](https://github.com/huggingface/transformers/pull/19078))
- [Note about developer mode](https://github.com/huggingface/transformers/commit/6d034d58c583dcf4299c8a34f949ace046ac0208) ([#19075](https://github.com/huggingface/transformers/pull/19075))
- [Fix tokenizer load from one file](https://github.com/huggingface/transformers/commit/af20bbb3188a6ffeaa126fa5118c9cabb529c26a) ([#19073](https://github.com/huggingface/transformers/pull/19073))
- [Fixing OPT fast tokenizer option.](https://github.com/huggingface/transformers/commit/1504b5311a3ee62bd820ac31b4ec2feffb2845f3) ([#18753](https://github.com/huggingface/transformers/pull/18753))
- [Move cache: expand error message](https://github.com/huggingface/transformers/commit/defd039bae9f44f6c7a847ed8f5d3609f6667540) ([#19051](https://github.com/huggingface/transformers/pull/19051))
