Transformers

4.27.1

Patches two unwanted breaking changes that were released in v4.27.0:
- Revert 22152 MaskedImageCompletionOutput changes (22187)
- Regression pipeline device (22190)

4.27.0

BridgeTower

The goal of this model is to build a bridge between each uni-modal encoder and the cross-modal encoder to enable comprehensive and detailed interaction at each layer of the cross-modal encoder, thus achieving remarkable performance on various downstream tasks with almost negligible additional parameters and computational costs.

* Add BridgeTower model by abhiwand in 20775
* Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval by abhiwand in 21684
* [WIP] Add BridgeTowerForContrastiveLearning by abhiwand in 21964

Whisper speedup

The Whisper model was integrated a few releases ago. This release offers significant performance optimizations when generating with timestamps. This was made possible by rewriting Whisper's `generate()` function, which now uses the `generation_config`, and by implementing batched timestamp prediction. The `language` and `task` can now also be set when calling `generate()`. For more details about this refactoring, check out [this colab](https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor#scrollTo=Ca4YYdtATxzo).
Notably, Whisper is now also supported in `Flax` 🚀 thanks to andyehrenberg! More Whisper-related commits:

* [Whisper] Refactor whisper by ArthurZucker in 21252
* [WHISPER] Small patch by ArthurZucker in 21307
* [Whisper] another patch by ArthurZucker in 21324
* add flax whisper implementation by andyehrenberg in 20479
* Add WhisperTokenizerFast by jonatanklosko in 21222
* Remove CLI spams with Whisper FeatureExtractor by qmeeus in 21267
* Update document of WhisperDecoderLayer by ling0322 in 21621
* [WhisperModel] fix bug in reshaping labels by jonatasgrosman in 21653
* [Whisper] Add SpecAugment by bofenghuang in 21298
* Fix-ci-whisper by ArthurZucker in 21767
* Fix `WhisperModelTest` by ydshieh in 21883
* [Whisper] Add rescaling function with `do_normalize` by ArthurZucker in 21263
* Refactor whisper asr pipeline to include language too. by Narsil in 21427
* Update `model_split_percents` for `WhisperModelTest` by ydshieh in 21922
* [Whisper] Fix feature normalization in `WhisperFeatureExtractor` by bofenghuang in 21938
* [Whisper] Add model for audio classification by sanchit-gandhi in 21754
* fixes the gradient checkpointing of whisper by soma2000-lang in 22019
* Skip 3 tests for `WhisperEncoderModelTest` by ydshieh in 22060
* [Whisper] Remove embed_tokens from encoder docstring by sanchit-gandhi in 21996
* [`Whiper`] add `get_input_embeddings` to `WhisperForAudioClassification` by younesbelkada in 22133
* [🛠️] Fix-whisper-breaking-changes by ArthurZucker in 21965
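Whisper predicts timestamps as special tokens interleaved with the text tokens. As a rough illustration (not the `transformers` implementation), the sketch below groups text tokens between pairs of timestamp tokens and converts those tokens to seconds. It assumes timestamps are quantized to 0.02 s steps and that `timestamp_begin` is the id of the `<|0.00|>` token; both are passed in as parameters because the actual ids depend on the tokenizer.

```python
from typing import List, Tuple


def timestamp_tokens_to_segments(
    token_ids: List[int],
    timestamp_begin: int,
    time_precision: float = 0.02,
) -> List[Tuple[float, float, List[int]]]:
    """Group text tokens between pairs of timestamp tokens.

    Assumption: ids >= timestamp_begin are timestamp tokens, and
    id == timestamp_begin + k encodes k * time_precision seconds.
    Returns (start_seconds, end_seconds, text_token_ids) triples.
    """
    segments: List[Tuple[float, float, List[int]]] = []
    start = None
    text: List[int] = []
    for tok in token_ids:
        if tok >= timestamp_begin:
            t = round((tok - timestamp_begin) * time_precision, 2)
            if start is None:
                start = t  # opening timestamp of a segment
            else:
                segments.append((start, t, text))  # closing timestamp ends it
                start, text = None, []
        elif start is not None:
            text.append(tok)  # text token inside an open segment
    return segments


# Hypothetical ids: 1000 stands for <|0.00|>, 1050 for <|1.00|>, 1100 for <|2.00|>.
print(timestamp_tokens_to_segments([1000, 5, 6, 1050, 1050, 7, 1100], timestamp_begin=1000))
```

Text tokens that appear outside an open segment are ignored here for simplicity.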

DETA

DETA (short for Detection Transformers with Assignment) improves [Deformable DETR](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/deformable_detr) by replacing the one-to-one bipartite Hungarian matching loss with one-to-many label assignments used in traditional detectors with non-maximum suppression (NMS). This leads to significant gains of up to 2.5 mAP.

* Add DETA by NielsRogge in 20983
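Because one-to-many assignment lets several predictions match the same object, duplicate detections are removed with non-maximum suppression (NMS). The following is a generic greedy NMS sketch in plain Python, with boxes as `(x1, y1, x2, y2)` tuples and an assumed IoU threshold of 0.5; it illustrates the technique and is not DETA's actual post-processing code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # the second box overlaps the first (IoU 0.81) and is dropped
```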

SpeechT5

The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder.

* add SpeechT5 model by hollance in 18922

XLM-V

XLM-V is a multilingual language model with a one-million-token vocabulary, trained on 2.5TB of data from Common Crawl (the same data as XLM-R).

* Add XLM-V to Model Doc by stefan-it in 21498

BLIP-2

BLIP-2 leverages frozen pre-trained image encoders and large language models (LLMs) by training a lightweight, 12-layer Transformer encoder in between them, achieving state-of-the-art performance on various vision-language tasks. Most notably, BLIP-2 improves upon [Flamingo](https://arxiv.org/abs/2204.14198), an 80 billion parameter model, by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters.

* Add BLIP-2 by NielsRogge in 21441

X-MOD

X-MOD extends multilingual masked language models like [XLM-R](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/xlm-roberta) to include language-specific modular components (language adapters) during pre-training. For fine-tuning, the language adapters in each transformer layer are frozen.

* Add X-MOD by jvamvas in 20939

Ernie-M

ERNIE-M is a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance.

* Add Ernie-M Model to huggingface by susnato in 21349

TVLT

The Textless Vision-Language Transformer (TVLT) is a model that uses raw visual and audio inputs for vision-and-language representation learning, without using text-specific modules such as tokenization or automatic speech recognition (ASR). It can perform various audiovisual and vision-language tasks like retrieval, question answering, etc.

* Add TVLT by zinengtang in 20725

CLAP

CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on a variety of (audio, text) pairs. It can be instructed to predict the most relevant text snippet, given an audio input, without directly optimizing for the task. The CLAP model uses a Swin Transformer to get audio features from a log-Mel spectrogram input, and a RoBERTa model to get text features. Both the text and audio features are then projected to a latent space with identical dimensions. The dot product between the projected audio and text features is then used as a similarity score.

* [CLAP] Add CLAP to the library by ArthurZucker in 21370
* [`CLAP`] Fix few broken things by younesbelkada in 21670
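The scoring step described above can be sketched in plain Python. Both embeddings are assumed to be already projected to the same dimension; they are L2-normalized here (a common choice in CLIP-style models, assumed rather than taken from this release note), and the dot products are turned into probabilities over candidate texts with a softmax. The `logit_scale` default of 10.0 is an illustrative assumption, not CLAP's learned value.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_texts(
    audio_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    logit_scale: float = 10.0,
) -> List[float]:
    """Return a softmax distribution over candidate texts for one audio clip."""
    a = l2_normalize(audio_emb)
    logits = []
    for t in text_embs:
        t_norm = l2_normalize(t)
        # Dot product of normalized vectors = cosine similarity.
        logits.append(logit_scale * sum(x * y for x, y in zip(a, t_norm)))
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = rank_texts([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
print(probs)  # the first candidate text gets almost all of the probability mass
```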

GPTSAN

GPTSAN is a Japanese language model using Switch Transformer. It has the same structure as the model introduced as Prefix LM in the T5 paper, and supports both Text Generation and Masked Language Modeling tasks. These basic tasks can similarly be fine-tuned for translation or summarization.

* add GPTSAN model (reopen) by tanreinama in 21291
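In a Prefix LM, tokens in the prefix attend to each other bidirectionally, while the tokens after the prefix attend causally (to the whole prefix and to earlier generated tokens). A minimal sketch of such an attention mask, with a hypothetical `prefix_len` argument and 1 meaning "may attend"; this illustrates the masking pattern and is not GPTSAN's actual code.

```python
from typing import List


def prefix_lm_mask(seq_len: int, prefix_len: int) -> List[List[int]]:
    """mask[i][j] == 1 if position i may attend to position j."""
    mask = []
    for i in range(seq_len):
        # Every position sees the full prefix; beyond the prefix, only
        # positions up to and including itself (causal attention).
        mask.append([1 if j < prefix_len or j <= i else 0 for j in range(seq_len)])
    return mask


for row in prefix_lm_mask(4, 2):
    print(row)
```

With `prefix_len=0` the mask reduces to an ordinary causal (lower-triangular) mask.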

EfficientNet

EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.

* Add EfficientNet by alaradirik in 21563

ALIGN

ALIGN is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image classification. ALIGN features a dual-encoder architecture with [EfficientNet](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/efficientnet) as its vision encoder and [BERT](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/bert) as its text encoder, and learns to align visual and text representations with contrastive learning. Unlike previous work, ALIGN leverages a massive noisy dataset and shows that the scale of the corpus can be used to achieve SOTA representations with a simple recipe.

* Add ALIGN to transformers by alaradirik in 21741
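Contrastive alignment of this kind can be illustrated with a symmetric, InfoNCE-style loss over a batch of paired embeddings: each image should score highest against its own caption and vice versa. The sketch below is plain Python, assumes the embeddings are already L2-normalized, and uses a fixed temperature of 0.07 as an assumption (ALIGN learns its temperature); it is not ALIGN's training code.

```python
import math
from typing import Sequence


def _log_softmax_at(logits: Sequence[float], idx: int) -> float:
    """log(softmax(logits)[idx]) computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[idx] - log_sum


def contrastive_loss(
    img: Sequence[Sequence[float]],
    txt: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Symmetric contrastive loss; img[i] and txt[i] form a matching pair."""
    n = len(img)
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature for j in range(n)] for i in range(n)]
    # Image-to-text: row i should put its mass on column i.
    i2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image: column j should put its mass on row j.
    t2i = -sum(_log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2t + t2i) / 2


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)  # matched pairs give a much lower loss
```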

Informer

Informer is a method for long-sequence time-series forecasting. It introduces a Probabilistic Attention mechanism that selects the “active” queries rather than the “lazy” ones, yielding a sparse Transformer that mitigates the quadratic compute and memory requirements of vanilla attention.

* [Time-Series] informer model by elisim in 21099
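In the Informer paper, a query counts as "active" when its attention distribution is far from uniform, measured as the maximum minus the mean of its scaled dot products with the keys. The plain-Python sketch below scores every query that way, gives the top `u` queries full softmax attention, and lets the lazy queries output the mean of the values (the unmasked case). It computes the measure against all keys, whereas the paper approximates it with a sample of keys, and `u` is a free parameter here; it is an illustration, not the `transformers` implementation.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _scores(q: Sequence[float], keys: Matrix) -> List[float]:
    d = len(q)
    return [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]


def sparsity(q: Sequence[float], keys: Matrix) -> float:
    """Max-minus-mean of scaled dot products: large means an 'active' query."""
    s = _scores(q, keys)
    return max(s) - sum(s) / len(s)


def probsparse_attention(queries: Matrix, keys: Matrix, values: Matrix, u: int) -> List[List[float]]:
    """Full attention for the top-u queries; lazy queries output mean(values)."""
    ranked = sorted(range(len(queries)), key=lambda i: sparsity(queries[i], keys), reverse=True)
    active = set(ranked[:u])
    dim_v = len(values[0])
    mean_v = [sum(v[c] for v in values) / len(values) for c in range(dim_v)]
    out = []
    for i, q in enumerate(queries):
        if i not in active:
            out.append(list(mean_v))
            continue
        s = _scores(q, keys)
        m = max(s)
        w = [math.exp(x - m) for x in s]  # stable softmax weights
        z = sum(w)
        out.append([sum(w[j] / z * values[j][c] for j in range(len(values))) for c in range(dim_v)])
    return out


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(probsparse_attention([[10.0, 0.0], [0.0, 0.0]], keys, values, u=1))
```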

API updates and improvements

Safetensors

`safetensors` is a safe serialization format for tensors, which has been supported as a first-class citizen in `transformers` for the past few versions.

This change enables explicitly forcing the `from_pretrained` method to use or not to use `safetensors`. This unlocks a few use-cases, notably the possibility to enforce loading *only* from this format, limiting security risks.

Example of usage:

```python
from transformers import AutoModel

# As of version v4.27.0, this loads the `pytorch_model.bin` by default if `safetensors` is not installed.
# It loads the `model.safetensors` file if `safetensors` is installed.
model = AutoModel.from_pretrained('bert-base-cased')

# This forces the load from the `model.safetensors` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=True)

# This forces the load from the `pytorch_model.bin` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=False)
```

* [Safetensors] Add explicit flag to from pretrained by patrickvonplaten in 22083

Variant

This PR adds a `variant` keyword argument to the PyTorch `from_pretrained` and `save_pretrained` methods so that multiple weight variants can be saved in the same model repo.

Example of usage with the model hosted in [this folder on the Hub](https://huggingface.co/huggingface/the-no-branch-repo/tree/main/text_encoder):

```python
from transformers import CLIPTextModel

path = "huggingface/the-no-branch-repo"  # or ./text_encoder if local

# Loads the `fp16` variant. This loads the `pytorch_model.fp16.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder", variant="fp16")

# This loads the no-variant checkpoint, loading the `pytorch_model.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder")
```

* Add variant to transformers by patrickvonplaten in 21332
* [Variant] Make sure variant files are not incorrectly deleted by patrickvonplaten in 21562
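The file-naming convention in the variant example (`pytorch_model.bin` becomes `pytorch_model.fp16.bin`) can be expressed as a small helper. `add_variant` is a hypothetical name used for illustration; it reproduces the file names shown in that example and is not presented as the library's internal code.

```python
from typing import Optional


def add_variant(weights_name: str, variant: Optional[str] = None) -> str:
    """Insert the variant name before the file extension (hypothetical helper)."""
    if not variant:
        return weights_name  # no variant: keep the default file name
    if "." not in weights_name:
        return f"{weights_name}.{variant}"
    stem, _, ext = weights_name.rpartition(".")
    return f"{stem}.{variant}.{ext}"


print(add_variant("pytorch_model.bin", "fp16"))  # pytorch_model.fp16.bin
```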

bitsandbytes

The `bitsandbytes` integration has been overhauled and now offers a new configuration class: `BitsAndBytesConfig`.

Read more about it in the [documentation](https://huggingface.co/docs/transformers/v4.27.0/en/main_classes/quantization#bitsandbytes-integration).

* [`bnb`] Introducing `BitsAndBytesConfig` by younesbelkada in 21579
* [`bnb`] fix `bnb` decoders bug by younesbelkada in 21688

FSDP

This PR enables the user to make use of the [PyTorch/XLA implementation of FSDP](https://github.com/pytorch/xla/tree/master/torch_xla/distributed/fsdp), including the newly added [auto-wrap feature](https://github.com/pytorch/xla/pull/4318). Four arguments have been added to `training_args.py` to facilitate this functionality:

- `xla_fsdp`: this flag is a string containing the location of a `.json` file which specifies the FSDP arguments the user wants to use when wrapping their model.
- `xla_fsdp_min_num_params`: this flag is an int which will set a size-based automatic wrapping policy which automatically FSDP wraps any module with at least `xla_fsdp_min_num_params` many parameters.
- `xla_fsdp_transformer_layer_cls_to_wrap`: this flag is a list of (case-sensitive) strings which will set a layer-class-based automatic wrapping policy which automatically FSDP wraps any module whose name matches one of the listed strings.
- `xla_fsdp_grad_ckpt`: this flag is a bool which determines whether gradient checkpointing is enabled for the automatically wrapped layers.

* Enable PyTorch/XLA Fully Sharded Data Parallel (FSDP) by AlexWertheim in 21406
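To make the two automatic wrapping policies concrete, here is a plain-Python sketch over a hypothetical module tree (`Module` and `auto_wrap` are illustrative names, not part of `transformers` or PyTorch/XLA). It follows the usual bottom-up approach in which parameters already inside a wrapped child no longer count toward the parent's size, and matches layer classes case-sensitively by class name. Treat it as an approximation of the policies described above rather than their implementation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Module:
    """Hypothetical stand-in for an nn.Module: name, class name, own params, children."""
    name: str
    cls_name: str
    own_params: int
    children: List["Module"] = field(default_factory=list)


def auto_wrap(
    module: Module,
    min_num_params: int = 0,
    layer_cls_to_wrap: Sequence[str] = (),
) -> Tuple[List[str], int]:
    """Return (names of wrapped modules, params of `module` left unwrapped)."""
    wrapped: List[str] = []
    remaining = module.own_params
    for child in module.children:  # wrap children first (post-order)
        child_wrapped, child_remaining = auto_wrap(child, min_num_params, layer_cls_to_wrap)
        wrapped.extend(child_wrapped)
        remaining += child_remaining
    by_class = module.cls_name in layer_cls_to_wrap  # case-sensitive match
    by_size = min_num_params > 0 and remaining >= min_num_params
    if by_class or by_size:
        wrapped.append(module.name)
        return wrapped, 0  # everything below is now inside this wrapped unit
    return wrapped, remaining


model = Module("model", "Model", 10, [
    Module("layer.0", "Block", 100),
    Module("layer.1", "Block", 100),
    Module("head", "Linear", 5),
])
print(auto_wrap(model, min_num_params=50))          # size-based policy
print(auto_wrap(model, layer_cls_to_wrap=["Block"]))  # layer-class-based policy
```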

Breaking changes

*Generate*

This PR standardizes beam search behavior across all three frameworks through `early_stopping`. PyTorch is unchanged, but TensorFlow and Flax users will see a significant speedup if they keep the default generation parameters.

There are, however, minor differences in the outputs of the `.generate` method with beam search on TensorFlow and Flax. These differences should be very small and come with significant speedups, but in case they break your workflow, we recommend downgrading to a previous version and letting us know in a GitHub issue so that we can investigate what is going on.

* 🚨🚨 Generate: standardize beam search behavior across frameworks by gante in 21368
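For context, `early_stopping` controls when the set of finished beam-search hypotheses for an input is considered done. The sketch below is a simplified, single-input model of the two boolean settings: `True` stops as soon as `num_beams` hypotheses are finished, while `False` applies a heuristic that stops once the best running beam, scored at the current length, can no longer beat the worst finished hypothesis. The function name and arguments are hypothetical; this is an illustration, not the code of any of the three frameworks.

```python
from typing import List


def beam_search_is_done(
    finished_scores: List[float],
    num_beams: int,
    best_running_sum_logprobs: float,
    cur_len: int,
    early_stopping: bool,
    length_penalty: float = 1.0,
) -> bool:
    """finished_scores: length-normalized scores of completed hypotheses."""
    if len(finished_scores) < num_beams:
        return False  # not enough finished hypotheses yet
    if early_stopping:
        return True  # stop as soon as num_beams hypotheses are finished
    # Heuristic: compare the best running beam, normalized at the current
    # length, against the worst finished hypothesis.
    best_possible = best_running_sum_logprobs / (cur_len ** length_penalty)
    return min(finished_scores) >= best_possible


print(beam_search_is_done([-1.0, -2.0], 2, -5.0, 10, early_stopping=False))   # False: keep searching
print(beam_search_is_done([-1.0, -2.0], 2, -30.0, 10, early_stopping=False))  # True: cannot improve
```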

*Single model initialization*

Model initialization had problems that led to incoherent initialization across models and across initialization techniques. This is technically a bugfix, but as it may result in your models being initialized with different values, we think it best to highlight it here.

* 🚨🚨🚨 Enforce single model initialization by sgugger in 21431

Deprecations

This PR deprecates the `parallelize` API, which was replaced by `accelerate` months ago. We recommend loading the model with the `device_map` argument set to `"balanced"` to obtain the previous behavior.

Setting your own `device_map` is still permitted, but it needs to be a dictionary from module name to device, for example:

```py
device_map = {'h.0': 0, 'h.1': 1, ...}
```


* Deprecate parallelize API by sgugger in 21448
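A hand-written map in the style of the `device_map` example can also be generated. The helper below is a hypothetical sketch that spreads `num_layers` blocks named `h.0`, `h.1`, and so on evenly and contiguously over `num_devices` GPUs. A real model also has embeddings and heads that need a device, and `device_map="balanced"` takes module sizes into account, so this is only an approximation for illustration.

```python
from typing import Dict


def balanced_device_map(num_layers: int, num_devices: int, prefix: str = "h") -> Dict[str, int]:
    """Assign contiguous runs of layers to devices, as evenly as possible (hypothetical helper)."""
    return {f"{prefix}.{i}": i * num_devices // num_layers for i in range(num_layers)}


print(balanced_device_map(4, 2))  # {'h.0': 0, 'h.1': 0, 'h.2': 1, 'h.3': 1}
```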

Pipelines

A new pipeline for zero-shot audio classification has been added to the library.

* [Pipeline] Add zero shot audio classification pipeline by ArthurZucker in 21600

Documentation

The task and model summaries have been refactored to take into account the larger number of tasks and models we now have.

* Update task summary by stevhliu in 21067
* Refactor model summary by stevhliu in 21408

Bugfixes and improvements

* [`t5`] Fix T5 inference in `float16` + `bnb` error by younesbelkada in 21281
* [examples/deepspeed] fix renamed api by stas00 in 21283
* [GenerationConfig] add additional kwargs handling by ArthurZucker in 21269
* [W2V2 with LM] Fix decoder test with params by sanchit-gandhi in 21277
* Fix `TrainingArguments.label_names` docs to reflect the correct default value behaviour by fredtcaroli in 21288
* Update expected values for doctest by stevhliu in 21284
* [GIT] Add test for batched generation by NielsRogge in 21282
* Supporting `ImageProcessor` in place of `FeatureExtractor` for pipelines by Narsil in 20851
* [Mask2Former] Add doc tests by NielsRogge in 21232
* Moving to cleaner tokenizer version or `oneformer`. by Narsil in 21292
* Fix `EfficientFormer` by ydshieh in 21294
* [Hubert] Fix Hubert processing auto by younesbelkada in 21299
* Update `OneFormerModelIntegrationTest` expected values by ydshieh in 21295
* [Doctest] Fix `Blenderbot` doctest by younesbelkada in 21297
* Documentation code sample fixes by MKhalusova in 21302
* [CI-Daily] replace `past` in prepare inputs for generation by ArthurZucker in 21296
* Small fix to ExponentialDecayLengthPenalty docstring by njhill in 21308
* Accept batched tensor of images as input to image processor by amyeroberts in 21144
* Use `model_class.__name__` and compare against `XXX_MAPPING_NAMES` by ydshieh in 21304
* Fix 2 paths in the doctest list by ydshieh in 21314
* [i18n-KO] Translated quicktour page to Korean by wonhyeongseo in 20946
* Small QoL for qa. by Narsil in 21316
* check paths in `utils/documentation_tests.txt` by ydshieh in 21315
* Fix `TFEncoderDecoder` tests by ydshieh in 21301
* Generate: better `compute_transition_scores` examples by gante in 21323
* [Doctest] Fix `Perceiver` doctest by younesbelkada in 21318
* Update Hebrew language code to he per IANA registry by altryne in 21310
* Fix M2M100 positional embedding creation for ONNX by michaelbenayoun in 21328
* Fix `RobertaPreLayerNorm` doctest by ydshieh in 21337
* Little cleanup: let huggingface_hub manage token retrieval by Wauplin in 21333
* Automated compatible models list for task guides by MKhalusova in 21338
* Fix `GitModelIntegrationTest.test_batched_generation` device issue by ydshieh in 21362
* Pipeline testing - using tiny models on Hub by ydshieh in 20426
* fix the issue that the output dict of jit model could not get [0] by sywangyi in 21354
* Corrected by HsiangNianian in 21350
* Remove duplicate declarations in dummy inputs for TFLongformer by peakji in 21352
* Fix DETR tests after 21144 by amyeroberts in 21365
* Add cPython files in build by sgugger in 21372
* Generate: Relaxed `max_length` and `max_new_tokens` coexistence by gante in 21347
* Fixes path for Graphormer checkpoint by clefourrier in 21367
* Adding resource section to GPT-J docs by adit299 in 21270
* translate index to zh by bfss in 20095
* [`run_(clm|mlm).py` examples] add streaming dataset support by stas00 in 21343
* Template for framework-agnostic tests by gante in 21348
* Cleanup the usage of `layer_norm_eps` in some models by ydshieh in 21336
* Do not log the generation config for each prediction step in TrainerSeq2Seq by regisss in 21385
* [Docs] Minor fixes by NielsRogge in 21383
* Simplify column_names in run_clm/mlm by lhoestq in 21382
* Add support of backward_prefetch and forward_prefetch by raghavanone in 21237
* Remove more unused attributes in config classes by ydshieh in 21327
* Generate: fix TF XLA tests on models with `max_position_embeddings` or `max_target_positions` by gante in 21389
* Update `Graphormer` and fix its `torchscript` test failures by ydshieh in 21380
* Moved LiLT under multimodal models in TOC by MKhalusova in 21393
* Fix the issue of using only inputs_embeds in convbert model by raghavanone in 21398
* Skip batches fast with accelerate by sgugger in 21390
* Added DagshubCallback by jinensetpal in 21404
* Add TF image classification example script by amyeroberts in 19956
* Generate: decoder-only models can generate with `inputs_embeds` by gante in 21405
* Use torch `1.13.1` in push/schedule CI by ydshieh in 21421
* Fix image_processor_class bug by shikhartuli in 21410
* Add distinct section names for PyTorch and TF by Rocketknight1 in 21422
* Add the GeLU activation from pytorch with the tanh approximation by jlamypoirier in 21345
* Fix Graphormer test suite by clefourrier in 21419
* [`bnb`] Fine-tuning HF 8-bit models by younesbelkada in 21290
* Allow to add more information in `is_flaky` by ydshieh in 21426
* Fix some pipeline tests by ydshieh in 21401
* Fix task guide formatting by stevhliu in 21409
* Fixes bug in the creation of ExponentialDecayLengthPenalty by jorgemcgomes in 21423
* Add `inputs_embeds` support for `.generate()` with BLOOM models by akreal in 21430
* Remove more unused attributes in config classes by ydshieh in 21392
* Added model resources for LayoutLM Issue19848 by avisinghal6 in 21377
* Fix device issue in a `ConvBertModelTest` test by ydshieh in 21438
* do not scale gradient in bf16 mode by kashif in 21428
* exclude deleted files in the fixup script by dtuit in 21436
* Add tutorial doc for TF + TPU by Rocketknight1 in 21429
* For IterableDataset, return DataLoader using self._train_batch_size. … by agossard in 21447
* Avoid flaky generation sampling tests by ydshieh in 21445
* Fix `SpeechT5ForSpeechToSpeechIntegrationTests` device issue by ydshieh in 21460
* Add perf numbers for perf_train_cpu by jianan-gu in 20974
* Added documentation for DagsHubCallback by jinensetpal in 21452
* Fix `PushToHubCallback` import in Share a model docs by ireneisdoomed in 21457
* Add VQGAN-CLIP research project by ErwannMillon in 21329
* Fixed RAG script which was failing on dummy example by kaustubhdhole in 21416
* make SpeechT5 doc examples deterministic by hollance in 21470
* Generate: TF can now accept custom logits processors by gante in 21454
* Removing `more_itertools` dependency. by Narsil in 21473
* [examples] improve block_size warning message by stas00 in 21463
* [i18n-fr] Translate index page to French by NoB0 in 21458
* OPT: BLIP2-ready `prepare_inputs_for_generation` by gante in 21477
* Add tips for generation with Int8 models by lewtun in 21424
* Update quality tooling for formatting by sgugger in 21480
* Fix epoch number when resuming training by sgugger in 21478
* [CI ] Remove `past` in favor of `pat_key_values` by ArthurZucker in 21443
* Generate: TF can now generate from embeddings in encoder-decoder models by gante in 21475
* [`Doc`] Fix int8 docs by younesbelkada in 21487
* changed "ot" to "to" by Iulian277 in 21488
* :pen: fix typo in pytorch semantic segmentation readme by jvdd in 21492
* Typos/fixes to link syntax by Rocketknight1 in 21450
* Sanity check the type of id2label and label2id arguments of from_pretrained for TokenClassification models by raghavanone in 21490
* [OPT] Adds `GPT2TokenizerFast` to the list of tokenizer to use for OPT. by ArthurZucker in 20823
* A new test to check config attributes being used by ydshieh in 21453
* Add limit_all_gathers option to fsdp_config and fix forward_prefetch bug by raghavanone in 21489
* Cleanup quality by sgugger in 21493
* [tokenizer] sanitize saved config by stas00 in 21483
* Add inverse sqrt learning rate scheduler by Sager611 in 21495
* Check for mapping/dict in distributed_concat function by prajwal967 in 21500
* Fix import in Accelerate for find_exec_bs by sgugger in 21501
* Wrap RemBert integration test forward passes with torch.no_grad() by katiele47 in 21503
* Exclude the madeup words from M2M100Tokenizer.vocab_size by guillaumekln in 20976
* [Doc] Minor URL fixes in PyTorch Text Classification Readme by stefan-it in 21511
* Generate: TF `compute_transition_scores` by gante in 21341
* no more dummies for speech processors by hollance in 21517
* Update OPT conversion script to work for OPT-IML by thomasw21 in 21519
* [tests] add missing `report_to none` by stas00 in 21505
* Fixing backward compatiblity `image_processor` in pipeline. by Narsil in 21513
* Fix multiple `eos_token_id`s in model.generate(...) by tokestermw in 21461
* Add `__len__` method to `_LazyAutoMapping` by ydshieh in 21522
* Generate: make TF `.generate()` signature == PT `.generate()` signature by gante in 21525
* Generate: TF `.generate()` can now be exported with dynamic length by gante in 21474
* Fix missing unfinished_sequences by tokestermw in 21529
* Fix ClearML Integration to run in ClearML pipelines and external Tasks. by thepycoder in 21531
* Tag tests as slow ⌛ by gante in 21537
* fix typo in run_speech_recognition_ctc.py by 21jun in 21528
* Fix inclusion of non py files in package by sgugger in 21546
* Fix from_pretrained API with config and state_dict by sgugger in 21542
* Added with torch.no_grad() to XLM-Roberta integration test by katiele47 in 21547
* [`pipeline`] A simple fix for half-precision & 8bit models by younesbelkada in 21479
* Added with torch.no_grad() to Camembert integration test by katiele47 in 21544
* adding a tip for deepspeed integration in multi-node environment by izapolsk in 21459
* Fix stuff related to the causal_mask in CodeGen. by GeneZC in 21527
* Replace inefficient torch.sqrt taking scalar input with numpy.sqrt by FindHao in 21496
* Add _mp_fn to run_mae.py for XLA testing by steventk-g in 21551
* [Tests] Improve flax test_attention_outputs by Shubhamai in 21486
* [from_pretrained] extend `torch_dtype="auto"` to look up `config.torch_dtype` first, expand docs by stas00 in 21524
* [Tasks] Adds image captioning by sayakpaul in 21512
* Goodbye to Blip-2 doctests by ydshieh in 21566
* [deepspeed] deal with models w/o `config.hidden_size` by stas00 in 21504
* improving contributing tests section by Shubhamai in 21569
* Replace input_values_processing with unpack_inputs by amyeroberts in 21502
* Added timesformer configuration by AdiaWu in 21446
* Remove more unused attributes in config classes by ydshieh in 21543
* [`Blip2`] Add int8 support for `blip2-flan-t5-xxl` by younesbelkada in 21574
* Generate: TF supports multiple eos tokens by gante in 21571
* Add: document question answering task guide by MKhalusova in 21518
* CI: skip failing TF hubert test by gante in 21601
* Remove trailing 'extractive' word from en documentation by tpaviot in 21594
* [MINOR] Fix link in timeseries transformer docs by cakiki in 21602
* Add `inputs_embeds` support when generating with GPT-J by dimitry12 in 21575
* Generate: Fix flaky indexing error in `test_constrained_beam_search_generate_dict_output` by gante in 21561
* [`bnb`] Let's make the daily CI green 🍏 by younesbelkada in 21597
* annotated TFvisionEncoderDecoder input type hints by miyu386 in 21432
* Correct Markdown bullets indentation by wangkuiyi in 21583
* Add missing arguemtn to run_clip.py by WarrenGreen in 21588
* Fix Blip-2 CI by ydshieh in 21595
* Generate: correct default model input creation for decoder-only models by gante in 21580
* [i18n-fr] Translate quicktour page to French by NoB0 in 21589
* Update setup.py by stas00 in 21584
* [deepspeed] performance docs by stas00 in 21573
* Clarify available pipelines in quicktour by stevhliu in 21607
* Fix env. variable type issue in testing by ydshieh in 21609
* Fix TF CTC tests by gante in 21606
* Add in big model inference to issue template by muellerzr in 21611
* Enable `requires_grad` on input embedding to train on top of frozen layers by younesbelkada in 21598
* Generate: filter encoder inputs when its signature does not accept wildcards by gante in 21603
* Generate: input expansion for any model input by gante in 21624
* Final cleanup of TOKENIZER_FOR_DOC by sgugger in 21565
* Remove Niels from templates by sgugger in 21564
* Fix generation config for empty state dict by sgugger in 21630
* Removes duplicate computations in DETR post processing by eclique in 21592
* Fix typo in documentation. by mmcdermott in 21632
* Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) by BenoitDalFerro in 21627
* Fix typo in QA task guide by stevhliu in 21608
* fix: Race Condition when using Sagemaker Checkpointing and Model Repository by DougTrajano in 21614
* Remove extra "`max_length` is reached." from InfNaNLogitsProcessor documentation by mmcdermott in 21634
* Fix Blip-2 CI again by ydshieh in 21637
* Skip wav2vec2 hubert high mem tests by amyeroberts in 21643
* Fix passing kwargs to TFBertTokenizer by balvisio in 21619
* Skipping more high mem tests - Wav2Vec2 Hubert by amyeroberts in 21647
* Pass parent exception as context exception to provide clearer stack trace by balvisio in 21636
* Generate: PT Dynamo without graph breaks in the main greedy/sample loop by gante in 21648
* Update deprecated load_module by sgugger in 21651
* Fix typos in contrastive-image-text example README by regisss in 21665
* [WIP] Move X-MOD models to facebook organization by jvamvas in 21640
* refactor: Make direct_transformers_import util by connor-henderson in 21652
* [bloom] gradient_checkpointing fix by stas00 in 21655
* Add OPT resources to the transformers documentation by alissadb in 21625
* Adapt PerceiverIO Multimodal class to work with arbitrary modalities by stevenmanton in 20054
* Fix multi-gpu training error for LayoutLMv2 by akkikiki in 21675
* Generate: eta sampling numerical stability by gante in 21676
* [`ImageProcessor`] Refactor default `mean` & `std` to `OPENAI_CLIP_MEAN` & `OPENAI_CLIP_STD` by younesbelkada in 21425
* [`BLIP`] update blip path on slow tests by younesbelkada in 21476
* Fix dynamic module import error by ydshieh in 21646
* Fix for non-contiguous label tensors in VisonEncoderDecoder by morganmcg1 in 21582
* Fix-rag-finetune-project-requirement by ArthurZucker in 21697
* Pass along revision in dynamic code fetch by sgugger in 21698
* Fix axial positional encoding calculations for reformer.mdx by ijindal in 21649
* remove position ids and token type ids from forward args in docstring by ArthurZucker in 21701
* Fix typo in `PROCESSOR_MAPPING_NAMES` and add tests by ydshieh in 21703
* Fix `get_class_in_module` by ydshieh in 21709
* Fix TVLT (torch device issue) by ydshieh in 21710
* Adding task guides to resources by MKhalusova in 21704
* Adding type hints to call() functions in this file by mollerup23 in 21548
* Time series transformer: input projection and Std scaler by kashif in 21020
* Apply ruff flake8-comprehensions by Skylion007 in 21694
* [`MBart`] Fix cross attention mask check by younesbelkada in 21730
* Respect documentation on passive log level by sgugger in 21700
* Remove `gptsan_japanese` from doctest list to avoid GPU OOM by ydshieh in 21722
* Change doc example for `BigBirdForQuestionAnswering` by ydshieh in 21723
* Fix `ErnieMEmbeddings` device issue by ydshieh in 21726
* Fix `GPTSanJapaneseModel` by ydshieh in 21731
* [SpeechT5HifiGan] Handle batched inputs by sanchit-gandhi in 21702
* Fix to KerasMetricCallback when the model returns unstructured output by Rocketknight1 in 21727
* Added "Open in Colab" to task guides by MKhalusova in 21729
* typos in french documentation by tpaviot in 21750
* Make ImageProcessorMixin compatible with subfolder kwarg by Abhinay1997 in 21725
* Update doctest GH workflow file by ydshieh in 21744
* Fix 2 quicktour file doctest by ydshieh in 21742
* [`GPTNeo`] Fix gradient checkpointing bug by younesbelkada in 21733
* Generate: Fix GIT batched captioning by gante in 21738
* Added Type Hints for modeling_tf_encoder_decoder.py by Batese2001 in 21673
* Auto api Value Error addition to Troubleshoot by MKhalusova in 21708
* [deepspeed tests] fix issues introduced by 21700 by stas00 in 21769
* Graphormer fix by clefourrier in 21699
* fix: Change is_last chunk calc and add conditional break in chunk_iter by connor-henderson in 21612
* [Flax] adding support for batch norm layers by Shubhamai in 21581
* [Examples] Generalise run audio classification for log-mel models by sanchit-gandhi in 21756
* Different behavior in DistilBERT when using "inputs_embeds" by ArthurZucker in 21752
* [Flax] Fix erroneous kwargs being passed to generate config by sanchit-gandhi in 21765
* Generate - update cookie cutters to not initialize cache with training and gradient checkpointing by gante in 21759
* [time series] updated expected values for integration test. by kashif in 21762
* [GPT2, ProphetNet] Fix gradient checkpointing bug by yhl48 in 21772
* [SpeechT5] Fix HiFiGAN tests by sanchit-gandhi in 21788
* Fix resume_from_checkpoint for deepspeed by mosheber in 21735
* [examples/summarization] deal with `max_length` and `num_beams` by bofenghuang in 21740
* Fix type in gpt2 config docstring by WeberJulian in 21782
* Fix en documentation typos by tpaviot in 21799
* [FX tracer] Make `concrete_args` from outside available by lygztq in 21775
* [torch] remove deprecated uint8 in favor of bool by ArthurZucker in 21384
* [`tests`] add `accelerate` marker by younesbelkada in 21743
* Fix PyTorch Perceiver `PerceiverFourierPositionEncoding` with fp16 by fxmarty in 21787
* Fix nn.init.trunc_normal_ call on torch.float16 data by fxmarty in 21789
* Fix gradient checkpointing bug in gptneox by KMFODA in 21815
* Inheritance-based framework detection by gante in 21784
* Fix quality with `ruff==0.0.253` by ydshieh in 21828
* introduce `logger.warning_once` and use it for grad checkpointing code by stas00 in 21804
* Rename `MobileViTModelTest` to `TFMobileViTModelTest` by ydshieh in 21825
* Fix gradient checkpointing bug BioGpt by saswatmeher in 21844
* check for None forced tokens by andyehrenberg in 21793
* Fix gradient checkpointing bug in git by KMFODA in 21818
* Fix gradient checkpointing imagegpt by KMFODA in 21816
* Fix tf random token masking probability in data collator by anruijian in 21834
* [`T5`] Fix torchquant issue by younesbelkada in 21843
* [`Blip2`] Add `Blip2Model` by younesbelkada in 21817
* Fix the issue of blip model returning loss even when the label is not provided. by raghavanone in 21811
* [GPTJ] Fix gradient checkpointing bug by krypticmouse in 21794
* Add: task guide for zero shot object detection by MKhalusova in 21829
* Make Slack CI reporting stronger by ydshieh in 21823
* [`Blip2`] Fix Blip-2 multi gpu by younesbelkada in 21707
* 🔥Rework pipeline testing by removing `PipelineTestCaseMeta` 🚀 by ydshieh in 21516
* Improve TF weight loading, especially PT crossloading by Rocketknight1 in 21792
* Fix flaky test for log level by sgugger in 21776
* prepare for "__floordiv__ is deprecated and its behavior will change in a future version of pytorch" by ArthurZucker in 20211
* [ConvBert] Fix 21523 by ArthurZucker in 21849
* Flax beam search fix by andyehrenberg in 21857
* Fix gradient checkpointing bug Bart by saswatmeher in 21866
* [deepspeed] check whether model is NLP one instead of counting on input type by izapolsk in 21800
* Change the way tensor is reshaped in BartAttention (from .view to .reshape) by raghavanone in 21860
* Italian translation of community.mdx by lorenzobalzani in 21871
* [`Blip`] Fix blip doctest by younesbelkada in 21868
* Removed BLIP mention from the troubleshooting guide by MKhalusova in 21872
* update FSDP and add XLA-FSDP documentation by pacman100 in 21812
* [doc] deepspeed tests by stas00 in 21859
* Add an utility file to get information from test files by ydshieh in 21856
* Add check for different embedding types in examples by Rocketknight1 in 21881
* Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights by twaka in 21879
* Fix Gradient checkpointing bug BigBird by saswatmeher in 21882
* Fix `test_load_default_pipelines_pt` for `ClapModel` by ydshieh in 21886
* fix checkpoint by ArthurZucker in 21874
* [Refactor] Relative imports wherever we can by ArthurZucker in 21880
* [ZAC] fix ci daily by ArthurZucker in 21893
* Use PyAV instead of Decord in examples by amyeroberts in 21572
* Add `inputs_embeds` functionality when generating with BioGPT by sidkiblawi in 21889
* [T5 doc] Fix confusing documentation about `d_kv` by ArthurZucker in 21896
* fix typo in Bart's attention by kashif in 21898
* [GPT-J] add deprecation warning by ArthurZucker in 21869
* fsdp bf16 enable autocast by pacman100 in 21847
* Fix gradient checkpointing bug LED by KMFODA in 21840
* Fix gradient checkpointing bug M2M 100 by KMFODA in 21841
* Fix gradient checkpointing bug marian by KMFODA in 21842
* Mark pipeline tests to skip them easily by sgugger in 21887
* Clean up auto mapping names by ydshieh in 21903
* Prophetnet batch dimension inversion fix by kiansierra in 21870
* Make schedulers picklable by making lr_lambda fns global by connor-henderson in 21768
* Add Blip and Blip2 for pipeline tests by ydshieh in 21904
* Temporarily skip 3 tests in `BridgeTowerModelTest` by ydshieh in 21908
* Faster zero shot image by Narsil in 21897
* [time series] Add Time series inputs tests by kashif in 21846
* Avoid modeling tests run in pipeline CI jobs by ydshieh in 21911
* Fix doctests for TFVisionTextDualEncoder by Rocketknight1 in 21910
* faster forward following what is done for images by ArthurZucker in 21906
* Fix gradient checkpointing bug in MBart by KMFODA in 21918
* Fix gradient checkpointing bug in mvp by KMFODA in 21920
* Fix gradient checkpointing megatron bert by KMFODA in 21921
* Use large VM for `repo_utils_job` by ydshieh in 21928
* Cleanup more auto mapping names by ydshieh in 21909
* feat: filter try/except when looking at custom code by zanussbaum in 21914
* Fix `AlignModelTest` tests by ydshieh in 21923
* Avoid failure in `check_repo.py` due to missing backends by ydshieh in 21930
* Fix wrong documentation about DataCollator padding defaults by substanc3-dev in 21919
* [Flan-UL2] Add-flan-ul2 by ArthurZucker in 21929
* Update README logo by gary149 in 21933
* [CLAP] Support batched inputs for CLAP. Fixes pipeline issues by ArthurZucker in 21931
* Fix gradient checkpointing bug in OPT by KMFODA in 21943
* Fix gradient checkpointing bug in Pegasus by KMFODA in 21944
* Fix gradient checkpointing bug in Rembert by KMFODA in 21945
* Fix gradient checkpointing bug in Roformer by KMFODA in 21946
* Fixed gradient_checkpointing/use_cache bug in blenderbot by Batese2001 in 21833
* Update expected values in `XLMProphetNetModelIntegrationTest` by ydshieh in 21957
* [CI] Fix ci by ArthurZucker in 21940
* Disable DDP for neuron by aws-sangeetha in 21953
* Fix bert issue by saswatmeher in 21963
* [Generate] Fix gradient_checkpointing and use_cache bug for BLOOM by asrimanth in 21956
* Add missing parameter definition in layoutlm config by Atomnp in 21960
* Use larger atol in `torch.allclose` for some tests by ydshieh in 21966
* Add TF contrastive image text finetuning example by Rocketknight1 in 21939
* Update expected values for `test_xglm_sample` by ydshieh in 21975
* Fix gradient checkpointing bug in BigBird Pegasus by KMFODA in 21976
* Fix gradient checkpointing bug in Blenderbot Small by KMFODA in 21977
* Fix gradient checkpointing bug in BlipText by KMFODA in 21978
* Fix gradient checkpointing bug in Codegen by KMFODA in 21979
* Fix gradient checkpointing bug in ESM by KMFODA in 21980
* docs: improve clarity for language modeling by pdhall99 in 21952
* Update `Jukebox` tests by ydshieh in 21984
* Add check before int casting for PIL conversion by amyeroberts in 21969
* Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens by eladsegal in 21959
* [DETR, YOLOS] Fix device bug by NielsRogge in 21974
* Remove unneeded casts to bool by regisss in 21983
* Update `notification_service.py` by ydshieh in 21992
* Skip `test_multi_gpu_data_parallel_forward` for some model tests by ydshieh in 21991
* Stop requiring Torch for our TF examples! by Rocketknight1 in 21997
* [TF] Fix creating a PR while pushing in TF framework by ArthurZucker in 21968
* [DETR and friends] Remove is_timm_available by NielsRogge in 21814
* Update tiny model creation script and some others files by ydshieh in 22006
* Generate - add 1 to cur_len to make up the new beam length by jimmieliu in 21993
* VideoMAE doctest - use valid dummy pixel values by amyeroberts in 22022
* update: bertology paper by QiushiSun in 22012
* Update `AudioClassificationPipelineTests::test_small_model_pt` for PT 2.0.0 by ydshieh in 22023
* [`bnb`] Fix bnb error message by younesbelkada in 22026
* Fix test for torchneuroncore in Trainer by sgugger in 22028
* Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline by anruijian in 22031
* [examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py by bofenghuang in 21942
* Avoid `text_config_dict` and `vision_config_dict` being saved for CLIP-like models by ydshieh in 22035
* Mark all `BridgeTower` tests slow for now by ydshieh in 22039
* Bug fix: token classification pipeline while passing offset_mapping by cceyda in 22034
* Update ALIGN docs by alaradirik in 22025
* [21737][T5]: Fix gradient checkpoint bug by nipunjindal in 22036
* Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it by shaun-scale in 22045
* Can't install tf2 on M1 Chip by default by shaun-scale in 22046
* Remove set_access_token usage + fail tests if FutureWarning by Wauplin in 22051
* Show the number of `huggingface_hub` warnings in CI report by ydshieh in 22054
* Return analysis for hyperparameter_search with Ray backend by anruijian in 22040
* pt-to-tf model architecture override by Rocketknight1 in 22055
* rm $ symbol from code block from contributing.md by kamalkraj in 22057
* [deepspeed] offload + non-cpuadam optimizer exception by stas00 in 22043
* Edit the docstring of `image_processing_donut` to match code by vermouthmjl in 22033
* Add setters by type of args to TrainingArguments by sgugger in 21570
* Update tiny model creation script by ydshieh in 22058
* Fix case when using --gradient_accumulation_steps with DDP disabled. by aws-sangeetha in 22007
* Add a progress bar for the total download of shards by sgugger in 22062
* Fix gradient checkpointing bug in Speech2Text by KMFODA in 22079
* Fix gradient checkpointing bug in switch transformer by KMFODA in 22081
* [GPT2] Propose fix for 21080 by ArthurZucker in 21853
* Fix small typo in flan-ul2.mdx by kevin51jiang in 22068
* Generate - Fix broken documentation links by gante in 22078
* Fix gradient checkpointing bug in Speecht5 by KMFODA in 22080
* Fix hint in src/transformers/modeling_utils.py by J-shang in 22074
* handle numpy inputs in whole word mask data collator by dwyatte in 22032
* GPT-J specific half precision on CPU note by MKhalusova in 22086
* Fix imports of TF MobileViT by sgugger in 22065
* Revert "[GPT2] Propose fix for 21080" by ydshieh in 22093
* Add AutoModelForZeroShotImageClassification by alaradirik in 22087
* add new model of MGP-STR by wdp-007 in 21418
* Add pr_checks.mdx Italian translation by alexcalabrese in 17459
* Fix gradient checkpointing bug in xglm by KMFODA in 22127
* Add TFVisionTextDualEncoder by Rocketknight1 in 21873
* Fix gradient checkpointing bug in Trajectory Transformer by KMFODA in 22125
* Fix gradient checkpointing bug in xlm_roberta_xl by KMFODA in 22128
* Added big_models.mdx italian translation 17600 by nickprock in 22115
* [`Blip2`] skip accelerate test by younesbelkada in 22124
* Fix gradient checkpointing bug in xmod by KMFODA in 22129
* Fix gradient checkpointing bug in LongT5 by KMFODA in 22130
* Fix gradient checkpointing bug in trocr by KMFODA in 22126
* Zero-shot image classification task guide by MKhalusova in 22132
* Fix doc link for MGP-STR by sgugger in 22138
* Adding Type Hints to TF_Pegasus model by mollerup23 in 21941
* Add a new script to check model testers' config by ydshieh in 22063
* Update configuration_align.py (projected_dim=640) by bishmdl76 in 22139
* Trainer: let generate pick its inputs by gante in 22108
* Enforce same behavior as PyTorch 2.0 for older versions by sgugger in 22136
* [trainer] fix bug in grad accum with multiple epochs by stas00 in 22098
* [deepspeed docs] Activation Checkpointing by stas00 in 22099
* Remove backend check for torch.compile by sgugger in 22140
* Prepare daily CI for torch 2.0.0 by ydshieh in 22135
* docs: New terms and updates to glossary by MichaelRipa in 21982
* Move `is_pipeline_test_to_skip` to specific model test classes by ydshieh in 21999
* Add ConvNeXT V2 by alaradirik in 21679
* Update 2 doctest expected values for torch 2.0.0 by ydshieh in 22148
* Translation Italian: perf_train_cpu and perf_train_cpu_many by nickprock in 22151
* Fix big model inference for T5 models in float16 by sgugger in 22095
* Create MaskedImageCompletionOutput and fix ViT docs by alaradirik in 22152
* to_pil - don't rescale if int and in range 0-255 by amyeroberts in 22158
* [trainer] add `--optim adamw_torch_fused` for pt-2.0+ by stas00 in 22144
* Revert "Enforce same behavior as PyTorch 2.0 for older versions" by sgugger in 22163

Significant community contributions

The following contributors have made significant changes to the library since the last release:

* abhiwand
  * Add BridgeTower model (20775)
  * Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval (21684)
  * [WIP] Add BridgeTowerForContrastiveLearning (21964)
* wonhyeongseo
  * [i18n-KO] Translated quicktour page to Korean (20946)
* ErwannMillon
  * Add VQGAN-CLIP research project (21329)
* NoB0
  * [i18n-fr] Translate index page to French (21458)
  * [i18n-fr] Translate quicktour page to French (21589)
* jvamvas
  * Add X-MOD (20939)
  * [WIP] Move X-MOD models to facebook organization (21640)
* susnato
  * Add Ernie-M Model to huggingface (21349)
* zinengtang
  * Add TVLT (20725)
* andyehrenberg
  * add flax whisper implementation (20479)
  * check for None forced tokens (21793)
  * Flax beam search fix (21857)
* tanreinama
  * add GPTSAN model (reopen) (21291)
* jonatanklosko
  * Add WhisperTokenizerFast (21222)
* Skylion007
  * Apply ruff flake8-comprehensions (21694)
* kiansierra
  * Prophetnet batch dimension inversion fix (21870)
* elisim
  * [Time-Series] informer model (21099)
* wdp-007
  * add new model of MGP-STR (21418)

4.26.1

Not secure
* ESM openfold_utils type hints by ringohoffman in 20544
* Add cPython files in build by sgugger in 21372
* Fix T5 inference in float16 + bnb error by younesbelkada in 21281
* Fix import in Accelerate for find_exec_bs by sgugger in 21501
* Fix inclusion of non py files in package by sgugger in 21546

4.26.0

Not secure
`GenerationConfig`

The `generate` method has multiple arguments whose defaults used to be stored in the model config. These are now decoupled into a separate generation config, which makes it easier to store different sets of parameters for a given model, each with its own generation strategy. While we will keep supporting generate arguments in the model configuration for the foreseeable future, it is now recommended to use a generation config. You can learn more about its uses [here](https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) and read its documentation [here](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig).

* Generate: use `GenerationConfig` as the basis for `.generate()` parametrization by gante in 20388
* Generate: TF uses `GenerationConfig` as the basis for `.generate()` parametrization by gante in 20994
* Generate: FLAX uses `GenerationConfig` as the basis for `.generate()` parametrization by gante in 21007
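As a minimal sketch (not taken from the release itself), the snippet below builds two decoding strategies as standalone `GenerationConfig` objects, saves them side by side in one directory, and reloads them. The parameter values and the file name `creative_config.json` are illustrative assumptions; only the `GenerationConfig` class, `save_pretrained`, and `from_pretrained` (with its `config_file_name` argument) come from the library. No model checkpoint is downloaded.

```python
import tempfile

from transformers import GenerationConfig

# Two decoding strategies for the same model, kept outside the model config.
greedy = GenerationConfig(max_new_tokens=32, do_sample=False)
creative = GenerationConfig(max_new_tokens=64, do_sample=True, top_k=50, temperature=0.7)

with tempfile.TemporaryDirectory() as save_dir:
    # Saved under the default file name, generation_config.json.
    greedy.save_pretrained(save_dir)
    # A second strategy stored next to it under a custom (illustrative) file name.
    creative.save_pretrained(save_dir, config_file_name="creative_config.json")

    # Reload each strategy from the same directory.
    loaded_greedy = GenerationConfig.from_pretrained(save_dir)
    loaded_creative = GenerationConfig.from_pretrained(save_dir, "creative_config.json")
```

With a loaded model, either object can then be passed as `model.generate(**inputs, generation_config=loaded_creative)` instead of repeating the individual arguments on every call.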

`ImageProcessor`

In the vision integration, all feature extractor classes have been deprecated and renamed to `ImageProcessor`. The old feature extractors will be fully removed in version 5 of Transformers, and new vision models will only implement the `ImageProcessor` class, so be sure to switch your code to the new name sooner rather than later!

* Add deprecation warning when image FE instantiated by amyeroberts in 20427
* Vision processors - replace FE with IPs by amyeroberts in 20590
* Replace FE references by amyeroberts in 20702
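For ViT, the migration amounts to swapping `ViTFeatureExtractor` for `ViTImageProcessor`. The sketch below is a hedged illustration: it instantiates the processor with its default settings (so no checkpoint is downloaded) and feeds it a random dummy image. It assumes the vision extras (Pillow) are installed; in practice you would more likely call `ViTImageProcessor.from_pretrained(...)` on a checkpoint.

```python
import numpy as np

# Before 4.26: from transformers import ViTFeatureExtractor
from transformers import ViTImageProcessor

# Default settings: resize to 224x224, rescale to [0, 1], then normalize.
processor = ViTImageProcessor()

# A dummy RGB image in channels-last layout (height, width, channels).
image = np.random.randint(0, 256, size=(64, 96, 3), dtype=np.uint8)

# The processor returns a batch in channels-first layout, ready for the model.
inputs = processor(images=image, return_tensors="np")
pixel_values = inputs["pixel_values"]
```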

New models

AltCLIP

AltCLIP is a variant of CLIP obtained by replacing the text encoder with a pretrained multilingual text encoder (XLM-RoBERTa). It performs very close to CLIP on almost all tasks and extends the original CLIP’s capabilities to multilingual understanding.

* Add AltCLIP by jongjyh in 20446

BLIP

BLIP is a model that is able to perform various multi-modal tasks including visual question answering, image-text retrieval (image-text matching) and image captioning.

* Add BLIP by younesbelkada in 20716

BioGPT

BioGPT is a domain-specific generative pre-trained Transformer language model for biomedical text generation and mining. BioGPT follows the Transformer language model backbone and is pre-trained from scratch on 15M PubMed abstracts.

* Add BioGPT by kamalkraj in 20420

BiT

BiT is a simple recipe for scaling up pre-training of [ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)-like architectures (specifically, ResNetv2). The method results in significant improvements for transfer learning.

* Add BiT + ViT hybrid by NielsRogge in 20550

EfficientFormer

EfficientFormer proposes a dimension-consistent pure transformer that can run on mobile devices for vision tasks such as image classification, as well as dense prediction tasks like object detection and semantic segmentation.

* Efficientformer by Bearnardd in 20459

GIT

GIT is a decoder-only Transformer that leverages [CLIP](https://huggingface.co/docs/transformers/model_doc/clip)’s vision encoder to condition the model on vision inputs besides text. The model obtains state-of-the-art results on image captioning and visual question answering benchmarks.

* Add GIT (GenerativeImage2Text) by NielsRogge in 20295

GPT-sw3

GPT-Sw3 is a collection of large decoder-only pretrained transformer language models that were developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. GPT-Sw3 has been trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model was pretrained using a causal language modeling (CLM) objective utilizing the NeMo Megatron GPT implementation.

* Add gpt-sw3 model to transformers by ekgren in 20209

Graphormer

Graphormer is a Graph Transformer model, modified to allow computations on graphs instead of text sequences by generating embeddings and features of interest during preprocessing and collation, and then using a modified attention mechanism.

* Graphormer model for Graph Classification by clefourrier in 20968

Mask2Former

Mask2Former is a unified framework for panoptic, instance and semantic segmentation and features significant performance and efficiency improvements over [MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer).

* Add Mask2Former by alaradirik and shivalikasingh95 in 20792

OneFormer

OneFormer is a universal image segmentation framework that can be trained on a single panoptic dataset to perform semantic, instance, and panoptic segmentation tasks. OneFormer uses a task token to condition the model on the task in focus, making the architecture task-guided for training, and task-dynamic for inference.

* Add OneFormer Model by praeclarumjj3 in 20577

Roberta prelayernorm

The RoBERTa-PreLayerNorm model is identical to RoBERTa but uses the --encoder-normalize-before flag in [fairseq](https://fairseq.readthedocs.io/).

* Implement Roberta PreLayerNorm by AndreasMadsen in 20305
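The flag changes where LayerNorm sits in each Transformer block. Below is a minimal NumPy sketch of the two orderings, assuming the usual reading of `--encoder-normalize-before`: LayerNorm is applied to the input of each sublayer instead of after the residual addition. The `sublayer` function is a hypothetical stand-in for attention or the feed-forward network, and LayerNorm's learnable scale and bias are omitted; this illustrates the idea and is not the library's implementation.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize over the hidden dimension; learnable scale/bias omitted for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def post_ln_block(x, sublayer):
    # Original BERT/RoBERTa ordering: residual addition first, then LayerNorm.
    return layer_norm(x + sublayer(x))


def pre_ln_block(x, sublayer):
    # "Normalize before" ordering: LayerNorm on the sublayer input only,
    # leaving the residual path un-normalized.
    return x + sublayer(layer_norm(x))


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) / np.sqrt(8)


def sublayer(h):
    # Hypothetical stand-in for attention or the feed-forward network.
    return np.maximum(h @ W, 0.0)


x = rng.normal(size=(2, 4, 8))  # (batch, seq_len, hidden)
post = post_ln_block(x, sublayer)
pre = pre_ln_block(x, sublayer)
```

The post-LN output is always normalized, while the pre-LN output keeps an identity path from input to output, which is the property usually credited with more stable training of deep stacks.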

Swin2SR

Swin2SR improves the [SwinIR](https://github.com/JingyunLiang/SwinIR/) model by incorporating [Swin Transformer v2](https://huggingface.co/docs/transformers/model_doc/swinv2) layers, which mitigate issues such as training instability, resolution gaps between pre-training and fine-tuning, and data hunger.

* Add Swin2SR by NielsRogge in 19784

TimeSformer

TimeSformer is the first video transformer. It inspired many transformer-based video understanding and classification papers.

* [New Model] Add TimeSformer model by fcakyon in 18908

UPerNet

UPerNet is a general framework to effectively segment a wide range of concepts from images, leveraging any vision backbone like [ConvNeXt](https://huggingface.co/docs/transformers/model_doc/convnext) or [Swin](https://huggingface.co/docs/transformers/model_doc/swin).

* Add UperNet by NielsRogge in 20648

ViT Hybrid

ViT hybrid is a slight variant of the [plain Vision Transformer](https://huggingface.co/docs/transformers/model_doc/vit) that leverages a convolutional backbone (specifically, [BiT](https://huggingface.co/docs/transformers/model_doc/bit)) whose features are used as the initial “tokens” for the Transformer. It was the first architecture to attain results similar to familiar convolutional architectures.

* Add BiT + ViT hybrid by NielsRogge in 20550
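To make the “features as tokens” idea concrete, here is an illustrative NumPy sketch (not the library's code) of how a CNN feature map can be flattened into a token sequence: every spatial position becomes one token, which is equivalent to using 1x1 patches, and a linear projection maps the channels to the Transformer's hidden size. The sizes below (1024 channels, a 14x14 grid, hidden size 768) are hypothetical, and the class token and position embeddings are omitted.

```python
import numpy as np


def feature_map_to_tokens(feature_map, projection):
    """Turn a CNN feature map (batch, channels, height, width) into tokens.

    Each spatial position becomes one token, then gets linearly projected
    to the Transformer hidden size.
    """
    batch, channels, height, width = feature_map.shape
    tokens = feature_map.reshape(batch, channels, height * width)  # (B, C, N), row-major over H, W
    tokens = tokens.transpose(0, 2, 1)                              # (B, N, C)
    return tokens @ projection                                      # (B, N, hidden)


rng = np.random.default_rng(0)
# Hypothetical sizes: backbone output with 1024 channels on a 14x14 grid, hidden size 768.
features = rng.normal(size=(2, 1024, 14, 14))
projection = rng.normal(size=(1024, 768)) * 0.02
tokens = feature_map_to_tokens(features, projection)
```

Token `i * width + j` corresponds to grid position `(i, j)`, so the resulting sequence of 196 tokens can be processed by a standard ViT encoder exactly like patch embeddings.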

Backbones

Slightly breaking the one-model-per-file policy, we introduce backbones (mainly for vision models) that can be reused in more complex models such as DETR, MaskFormer and Mask2Former.

* [NAT, DiNAT] Add backbone class by NielsRogge in 20654
* Add Swin backbone by NielsRogge in 20769
* [DETR and friends] Use AutoBackbone as alternative to timm by NielsRogge in 20833
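A hedged sketch of what a backbone provides: the snippet builds a randomly initialized `ResNetBackbone` from a default `ResNetConfig` and asks it, through `out_features`, for the feature maps of two intermediate stages, which is the kind of multi-scale output that detection and segmentation heads consume. The choice of stages and input size is an assumption made for illustration. With pretrained weights, `AutoBackbone.from_pretrained` on a checkpoint is the more typical entry point. PyTorch is required.

```python
import torch
from transformers import ResNetBackbone, ResNetConfig

# Default ResNet config (randomly initialized); expose two intermediate stages.
config = ResNetConfig(out_features=["stage2", "stage4"])
backbone = ResNetBackbone(config)
backbone.eval()

pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = backbone(pixel_values)

# One feature map per requested stage, in the order given by out_features.
feature_maps = outputs.feature_maps
```

With the default configuration, `stage2` yields a 512-channel map at 1/8 of the input resolution and `stage4` a 2048-channel map at 1/32, so a downstream head can combine coarse and fine features without knowing anything else about ResNet.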

Bugfixes and improvements

* fix cuda OOM by using single Prior by ArthurZucker in 20486
* Add ESM contact prediction by Rocketknight1 in 20535
* flan-t5.mdx: fix link to large model by szhublox in 20555
* Fix torch device issues by ydshieh in 20584
* Fix flax GPT-J-6B linking model in tests by JuanFKurucz in 20556
* [Vision] fix small nit on `BeitDropPath` layers by younesbelkada in 20587
* Install `natten` with CUDA version by ydshieh in 20546
* Add entries to `FEATURE_EXTRACTOR_MAPPING_NAMES` by ydshieh in 20551
* Cleanup some config attributes by ydshieh in 20554
* [Whisper] Move decoder id method to tokenizer by sanchit-gandhi in 20589
* Add `require_torch` to 2 pipeline tests by ydshieh in 20585
* Install `tensorflow_probability` for TF pipeline CI by ydshieh in 20586
* Ci-whisper-asr by ArthurZucker in 20588
* cross platform from_pretrained by ArthurZucker in 20538
* Make convert_to_onnx runable as script again by mcernusca in 20009
* ESM openfold_utils type hints by ringohoffman in 20544
* Add RemBERT ONNX config by hchings in 20520
* Fix link to Swin Model contributor novice03 by JuanFKurucz in 20557
* Fix link to swin transformers v2 microsoft model by JuanFKurucz in 20558
* Fix link to table transformer detection microsoft model by JuanFKurucz in 20560
* clean up unused `classifier_dropout` in config by ydshieh in 20596
* Fix whisper and speech to text doc by ArthurZucker in 20595
* Replace `set-output` by `$GITHUB_OUTPUT` by ydshieh in 20547
* [Vision] `.to` function for ImageProcessors by younesbelkada in 20536
* [Whisper] Fix decoder ids methods by sanchit-gandhi in 20599
* Add-whisper-conversion by ArthurZucker in 20600
* README in Hindi 🇮🇳 by pacman100 in 20097
* Fix code sample in preprocess by stevhliu in 20561
* Split autoclasses on modality by stevhliu in 20559
* Fix test for file not found by sgugger in 20604
* Rework the pipeline tutorial by Narsil in 20437
* Documentation fixes by SamuelzXu in 20607
* Adding anchor links to Hindi README by pacman100 in 20606
* exclude jit time from the speed metric calculation of evaluation and prediction by sywangyi in 20553
* Check if docstring is None before formatting it by xxyzz in 20592
* updating T5 and BART models to support Prefix Tuning by pacman100 in 20601
* Fix `AutomaticSpeechRecognitionPipelineTests.run_pipeline_test` by ydshieh in 20597
* Ci-jukebox by ArthurZucker in 20613
* Update some GH action versions by ydshieh in 20537
* Fix dtype of weights in from_pretrained when device_map is set by sgugger in 20602
* add missing is_decoder param by stevhliu in 20631
* Fix link to speech encoder decoder model in speech recognition readme by JuanFKurucz in 20633
* Fix `natten` installation in docker file by ydshieh in 20632
* Clip floating point constants to bf16 range to avoid inf conversion by aws-sangeetha in 20605
* Pin TensorFlow to the next release by sgugger in 20635
* [MaskFormer] Add support for ResNet backbone by NielsRogge in 20483
* [Trainer] add error when passing `8bit`models by younesbelkada in 20651
* [`ViTHybrid`] + [`BiT`] cleaner `__init__` by younesbelkada in 20649
* Update summarization `run_pipeline_test` by ydshieh in 20623
* pin TF 2.11 in docker files by ydshieh in 20642
* Speed up git-lfs detection on error by xloem in 20641
* Updated Trainer args typing by julianmack in 20655
* Add `dpt-hybrid` support by younesbelkada in 20645
* [Whisper] Fix forced decoder ids by sanchit-gandhi in 20652
* Add TFBartForSequenceClassification by uglyboxer in 20570
* run_speech_recognition_seq2seq.py: add cache_dir param to dataset by eschmidbauer in 20540
* [`BiT`] Small patch fix by younesbelkada in 20657
* Fix gpt2 fp16 training when tracing is enabled by JingyaHuang in 20656
* Fix load from PT-formatted checkpoint in composite TF models by sgugger in 20661
* Update the list of contributors to reflect current organization by sgugger in 20603
* Fix expected values for TF-ESM tests by Rocketknight1 in 20680
* Add `BackboneMixin` by ydshieh in 20660
* Migrate torchdynamo to torch.compile by sgugger in 20634
* Whitelist Transformers private method in DummyObject by sgugger in 20681
* [`ViTHybrid`] Fix `accelerate` slow tests by younesbelkada in 20679
* Enable bf16 option for XLA devices by jeffhataws in 20684
* Fix CIs for PyTorch 1.13 by ydshieh in 20686
* Fix donut image processor by amyeroberts in 20625
* Added missing `test_tokenization_led` by IMvision12 in 20568
* Add video classification pipeline by nateraw in 20151
* [Backbones] Improve out features by NielsRogge in 20675
* Change transformers.onnx to use optimum.exporters.onnx by michaelbenayoun in 20529
* skip `test_multi_gpu_data_parallel_forward` for `MaskFormerSwinModelTest` by ydshieh in 20688
* [`ViTHybrid`] fix last `accelerate` slow test by younesbelkada in 20705
* Fix rendering issue in quicktour by sgugger in 20708
* Made LUKE Tokenizer independent from RoBERTa by salvo96 in 20720
* Spanish translation of asr.mdx and add_new_pipeline.mdx by alceballosa in 20569
* Add `accelerate` support for LongT5 models by pszemraj in 20341
* Fix `AutoModelTest.test_model_from_pretrained` by ydshieh in 20730
* Adding ValueError when incompatible parameters are used. by Narsil in 20729
* Add type hints for Whisper models by donelianc in 20396
* Very small edit to change name to OpenAI GPT by stanleycai95 in 20722
* fsdp fix by pacman100 in 20719
* Spanish translation of the file debugging.mdx by SimplyJuanjo in 20566
* Convert tokenizer outputs for Keras in doc example by Rocketknight1 in 20732
* Clarify return_tensor and return_text parameters by stevhliu in 20662
* Add vision requirement to image transforms by amyeroberts in 20712
* Add a progress bar for large model loading by sgugger in 20713
* Disambiguate test for required_input in tokenization base file. by sgugger in 20731
* Add decorator for flaky Donut tests by amyeroberts in 20739
* rename `layoutlm_job` to `exotic_models_job` by ydshieh in 20736
* Update CI to torch 1.13.0 by ydshieh in 20687
* Add `keep_in_fp32_modules` support by younesbelkada in 20683
* Change a logic in pipeline test regarding TF by ydshieh in 20710
* Fix AdamWeightDecay for TF 2.11 by Rocketknight1 in 20735
* in the resize() function in image_transforms.py, the line 267: by dhansmair in 20728
* Add docs xlm roberta by hazrulakmal in 20742
* Fixing the pipeline tutorial test by Narsil in 20746
* Uninstall `torch_tensorrt` in `DeepSpeed` CI image for now by ydshieh in 20758
* Remove image_transforms functions from init by amyeroberts in 20704
* Fix missing `()` in some usage of `is_flaky` by ydshieh in 20749
* [Tests] Improve test_attention_outputs by NielsRogge in 20701
* Fix attribute error problem by casuallyName in 20765
* [CI-Test] Fixes but also skips the mT5 tests by ArthurZucker in 20755
* Replaces xxx_required with requires_backends by amyeroberts in 20715
* Install `torch-tensorrt 1.3.0` for DeepSpeed CI by ydshieh in 20764
* Even more validation. by Narsil in 20762
* Install vision for TF pipeline tests by ydshieh in 20771
* Patch for FlanT5-XXL 8bit support by larsmennen in 20760
* [Pipeline] fix failing bloom `pipeline` test by younesbelkada in 20778
* Fixing object detection with `layoutlm` by Narsil in 20776
* Install video dependency for pipeline CI by ydshieh in 20777
* Move convert_to_rgb to image_transforms module by amyeroberts in 20784
* Recompile `apex` in `DeepSpeed` CI image by ydshieh in 20788
* [Pipeline] skip feature extraction test if in `IMAGE_PROCESSOR_MAPPING` by younesbelkada in 20790
* Fix object detection2 by Narsil in 20798
* Stop calling expand_1d on newer TF versions by Rocketknight1 in 20786
* Add Universal Segmentation class + mapping by NielsRogge in 20766
* Install `sentencepiece` in `DeepSpeed` CI image by ydshieh in 20795
* lazy import torch._softmax_backward_data for better compatibility by daquexian in 20796
* [`Vision`] [Refactor] Initialize weights on the correct place by younesbelkada in 20803
* Vilt - use image_transforms pad by amyeroberts in 20780
* [clip] fix error message by stas00 in 20818
* Add model resources for ViT by stanleycai95 in 20723
* fix typo output not ouput in bitsandbytes trainer test by Thomas-MMJ in 20839
* Fix tiny typo by fzyzcjy in 20841
* Remove unused `max_position_embeddings` in config classes by ydshieh in 20836
* [mBART] fix erroneous italics in docstring by sanchit-gandhi in 20835
* TF AdamWeightDecay fix for 2.11 by Rocketknight1 in 20848
* remove unused `use_cache` in config classes by ydshieh in 20844
* [SegFormer] Add support for segmentation masks with one label by NielsRogge in 20279
* Clarify `use_fast` parameter in docstring by stevhliu in 20840
* [S2T, Whisper] Add copied from statements by sanchit-gandhi in 20787
* Embed circle packing chart for model summary by stevhliu in 20791
* [Swin2SR] Add doc tests by NielsRogge in 20829
* [Examples] Update big table by NielsRogge in 20845
* Use `config.num_channels` in CLIP-like modeling files by ydshieh in 20857
* fix past_key_values in GPTNeoXForCausalLM.prepare_inputs_for_generation by ValeKnappich in 20621
* Add visual prompt to processor of CLIPSeg model by idilsulo in 20816
* Adding `evaluate` to the list of libraries required in generated notebooks by MKhalusova in 20850
* Fix past CI by skipping `LevitModelTest.test_problem_types` by ydshieh in 20859
* Fix whisper export by mht-sharma in 20800
* Fix doctest by ArthurZucker in 20843
* Add-warning-tokenizer by ArthurZucker in 20826
* Update `HubertModelIntegrationTest.test_inference_keyword_spotting` by ydshieh in 20863
* Generate: post-generate config doctest fix by gante in 20804
* change strings to f-strings in image_processing_utils.py by dhansmair in 20865
* [`FSMT`] Make it compatible with `xxxForConditionalGeneration` models by younesbelkada in 20825
* [`MobileNet-v2`] Fix ONNX typo by younesbelkada in 20860
* having new model entries in Hindi for Hindi README by pacman100 in 20869
* Add Onnx Config for PoolFormer by BakingBrains in 20868
* Adding support for `fp16` for asr pipeline. by Narsil in 20864
* Add script to convert T5X T5 (v1.0 and v1.1) checkpoints to PyTorch by bastings in 20801
* Add japanese translation of template by younesbelkada in 20870
* [RobertaPreLayernom] Fixes the CI daily test by ArthurZucker in 20886
* Fixes typo in the help text for --max_length by makrai in 20883
* typo fix by nathan-barry in 20891
* [ `T5`] fix fp16 loading issue by younesbelkada in 20878
* Update flan-t5 original model link by kamalkraj in 20897
* fix docs typos in "add_new_model" by elisim in 20900
* [Past CI] 🔥 Leave Past CI failures in the past 🔥 by ydshieh in 20861
* Avoid collisions in writing metrics via 2 APIs - azureml + mlflow by akshaya-a in 20837
* Generate: correctly detect default max length by gante in 20911
* Remove non-breaking spaces by aphedges in 20929
* Load the state dict on CPU to prevent unnecessary GPU memory surge by HarshTrivedi in 20920
* Fix FP16 inference in TextGenerationPipeline by bofenghuang in 20913
* Remove Bert tokenizer dependency from DistillBert (slow/fast) tokenizers by IvanLauLinTiong in 20933
* Adds type checking to PreTrainedConfig. by mmcdermott in 20926
* Fix error message in `WhisperFeatureExtractor` by bofenghuang in 20936
* Fixing DistilBert error message by SamuelzXu in 20945
* [trainer: `distributed_concat`] ensure `all_gather`'s inputs are contiguous by stas00 in 20951
* Add generate kwargs to `AutomaticSpeechRecognitionPipeline` by bofenghuang in 20952
* update pyknp to rhoknp by conan1024hao in 20890
* Generate: TF XLA beam sample by gante in 20927
* Fix T5 docstring by IvanLauLinTiong in 20957
* Generate: delete unused TF `_reorder_cache` by gante in 20964
* `MinNewTokensLengthLogitsProcessor` for `.generate` method 20814 by kotikkonstantin in 20892
* Fix post_process_object_detection method descriptions by alaradirik in 20977
* Remove more unused attributes in config classes by ydshieh in 20858
* [run_clm example] add torch_dtype option for model load. by sywangyi in 20971
* Fix valid ratio for Deformable Detr by long8v in 20958
* Enable `decoder_attention_mask` in `generate` function by samuelpullely in 20726
* Ignore errors when deleting old checkpoints in trainer by akrogager in 20984
* Avoid CI runs under users' own CircleCI personal account by ydshieh in 20981
* Fix for LXMERT by ydshieh in 20986
* Improve OWL-ViT postprocessing by alaradirik in 20980
* Fix race condition on cleaning checkpoints when save_total_limit set to 1 by radcheb in 20989
* Add custom stop token ids for generation by tokestermw in 20727
* update template by ArthurZucker in 20885
* Add: doc page for the object detection task by MKhalusova in 20925
* auxiliary_loss works for Deformable Detr by long8v in 20959
* Update image processor parameters if creating with kwargs by amyeroberts in 20866
* Fix bug in segmentation postprocessing by alaradirik in 20198
* Don't call deprecated method by amyeroberts in 20904
* Fix model hub link by idilsulo in 20998
* Refactor the function get_results by milyiyo in 20999
* Update bug report template by stevhliu in 21004
* Remove T5 dependency from mT5 model by SD-13 in 20949
* Update PR template by stevhliu in 21006
* add: task guide on video classification model fine-tuning. by sayakpaul in 20827
* Generate: Fix CI related to 20727 by gante in 21003
* Fix (DeepSpeed) docker image build issue by ydshieh in 21002
* Fix callback docstrings by stevhliu in 21005
* Generate: post-generate config TF doctest fix by gante in 21018
* Generate: FLAX infers pad token in its absence and has functional example by gante in 21009
* [`BLIP`] Fix daily CI failing test by younesbelkada in 20877
* Make sure dynamic objects can be saved and reloaded by sgugger in 21008
* [CLIPSeg] Fix integration test by NielsRogge in 20995
* Added mask_time_prob and mask_time_length arguments to wav2vec2 pretraining script by mpierrau in 20985
* [NumPy] Remove references to deprecated NumPy type aliases by hvaara in 21022
* Fix arguments passed to predict function in QA Seq2seq training script by Observer46 in 21026
* Support turning off the model uploading in ClearML by david1542 in 20969
* fix parameter name in docstring by cceyda in 21032
* fix levit timm conversion file by Bearnardd in 20938
* fix typo by kaisugi in 21042
* fix typo by sabaul in 21048
* Replace `past` with `past_key_values` by ArthurZucker in 20944
* Fix warning for MCTC model by sgugger in 21049
* remove flax file from `documentation_tests.txt` by ydshieh in 21036
* Patch-past-refactor by ArthurZucker in 21050
* Make the attention_head_size in distilbert an object attribute by KarlFelixJoehnk in 20970
* feature: update wandb callback to upload checkpoints by parambharat in 21035
* Fix header level by stevhliu in 21072
* Update docstring for CLIPConfig by yingzha in 21066
* fix typo in comment by soulseen in 21088
* Optimize inference only mode memory if ipex is used by sywangyi in 21083
* Fixed issue 21039 by susnato in 21062
* Remove more unused attributes in config classes by ydshieh in 21000
* [bnb optim] fixing test by stas00 in 21030
* Fix past CI by ydshieh in 20967
* Fix `torchscript` tests for `AltCLIP` by ydshieh in 21102
* [Tokenizers] Fix a small typo by ArthurZucker in 21104
* Update task summary part 1 by stevhliu in 21014
* Add Spanish translation to community.mdx by shogohida in 21055
* Rework automatic code samples in docstrings by sgugger in 20757
* [CI-doc-daily] Remove RobertaPreLayernorm random tests by ArthurZucker in 20992
* Use raw string for regex in tokenization_t5_fast.py by odashi in 21125
* Fixed typo in docstring by tkburis in 21115
* [VideoMAE] Fix docstring by NielsRogge in 21111
* [LongT5] Remove duplicate encoder_attention_mask default value check by guillaume-be in 21124
* Add `min_new_tokens` argument in generate() (implementation based on `MinNewTokensLengthLogitsProcessor`) by silverriver in 21044
* Fixing batching pipelines on single items for ChunkPipeline by Narsil in 21132
* Fixed issue 21053 by susnato in 21065
* Fix `RealmModelIntegrationTest.test_inference_open_qa` by ydshieh in 21136
* Added clefourrier as ref point for graph models in bug reports by clefourrier in 21139
* Update `TFTapasEmbeddings` by ydshieh in 21107
* [GIT] Fix training by NielsRogge in 21133
* Fixes to TF collators by Rocketknight1 in 21143
* TF: serializable hubert by gante in 20966
* Generate: TF contrastive search must pop `use_cache` from `model_kwargs` by gante in 21149
* Rename test_feature_extraction files by amyeroberts in 21140
* Small simplification to TopKLogitsWarper by njhill in 21130
* feat: add standalone guide on XLA support. by sayakpaul in 21141
* Clarify and add missing typical_p argument docstring. by shermansiu in 21095
* Fixing offline mode for pipeline (when inferring task). by Narsil in 21113
* Whisper Timestamp processor and prediction by ArthurZucker in 20620
* Add batch of resources by NielsRogge in 20647
* Change variable name to prevent shadowing by sayakpaul in 21153
* CLI: update hub PR URL by gante in 21154
* Add resources by NielsRogge in 20872
* Add: tensorflow example for image classification task guide by MKhalusova in 21038
* Add: An introductory guide for text generation by MKhalusova in 21090
* Refactoring of the text generate API docs by MKhalusova in 21112
* Add Epsilon- and Eta-Sampling by shermansiu in 21121
* Fixed num_channels!=3 normalization training by layjain in 20630
* 🌐 [i18n-KO] Translated `installation.mdx` to Korean by wonhyeongseo in 20948
* Add Japanese translation to multilingual.mdx by shogohida in 21084
* Make `test_save_pretrained_signatures` slow test by ydshieh in 21105
* `blip` support for training by younesbelkada in 21021
* Remove Roberta Dependencies from XLM Roberta Flax and Tensorflow models by SamuelzXu in 21047
* Fix typos in documentation by jordimas in 21160
* OPT: Fix batched generation with FLAX by gante in 21150
* Fix git model for generate with beam search. by PeterL1n in 21071
* fix the issue that the output dict of jit model could not get [:2] by sywangyi in 21146
* using raw string for regex to search <extra_id> by pfliu-nlp in 21162
* Fix doctest CI by ydshieh in 21166
* Adapt repository creation to latest hf_hub by sgugger in 21158
* Add AWS Neuron torchrun support by jeffhataws in 20806
* Rewrite a couple of lines in the TF XLA doc by Rocketknight1 in 21177
* [issues template] update deepspeed owners by stas00 in 21027
* Fix `Mask2FormerForUniversalSegmentation` by ydshieh in 21175
* Update year 2020 to 2023 in one file by ydshieh in 21190
* workaround documentation rendering bug by hollance in 21189
* Fix device issue in `UperNetModelIntegrationTest` by ydshieh in 21192
* Updates to computer vision section of the Preprocess doc by MKhalusova in 21181
* Rename GLPN image processor tests by amyeroberts in 21194
* Update examples with image processors by amyeroberts in 21155
* hertz is already per second by hollance in 21188
* [Whisper] Fix timestamp processor by ArthurZucker in 21187
* Add hallucination filter by KMFODA in 18675
* [`CVT`] Fix module initialization issue by younesbelkada in 21193
* Flax dtype-dependent numerical masking by gante in 21197
* Add Japanese translation index.mdx by kambehmw in 21186
* Add disclaimer for necessary fake models by sgugger in 21178
* Enabling live `automatic-speech-recognition` asr for Whisper. by Narsil in 21196
* [Whisper] Fix pipeline after timestamp merges by ArthurZucker in 21198
* Update modeling doc strings FE -> IP by amyeroberts in 21106
* Generate: documented function to compute the transition scores by gante in 21191
* deleted references of self.vocab_size and self.type_vocab_size for multiple models [TF implementation] by susnato in 21164
* Update `huggingface_hub` version by ydshieh in 21212
* Fix `CONFIG_ARCHIVE_MAP_MAPPING_NAMES` by ydshieh in 21207
* Fix `GPTJ` doctest by ydshieh in 21213
* Declare __len__ method in PreTrainedTokenizerBase by thomasw21 in 21210
* Fix code example in training tutorial by stevhliu in 21201
* Make `parallelism` for CircleCI jobs work - but keep it `1` for now by ydshieh in 21157
* Fix OneFormer Docstrings by praeclarumjj3 in 21215
* Fix task summary doctest by stevhliu in 21200
* Remove all hf-internal-testing checkpoints that can be removed by sgugger in 21199
* Microphone live inference catching up when inference is too slow (whisper). by Narsil in 21219
* [`BLIP`] fix docstring for `BlipTextxxx` by younesbelkada in 21224
* Skip failing test for now by sgugger in 21226
* [`BLIP`] fix doctest by younesbelkada in 21217
* Generate: precision fix in compute_transition_scores doctests by gante in 21251
* Extend Script to enable conversion of Encoder Only T5x Models to Pytorch by ToluClassics in 20907
* Add test_image_processing_common.py by amyeroberts in 20785
* [GIT] Convert more checkpoints by NielsRogge in 21245
* Optimize by not computing gradients for parameters set to requires_grad=False by raghavanone in 21236
* Fix reformer CI by ydshieh in 21254
* Add Japanese translation installation.mdx by kambehmw in 21241
* Add scikit-learn dependency to train language-modeling by mostafaelhoushi in 21229
* Add missing checkpoint for doctest by amyeroberts in 21258
* Generate: save generation config with the models' `.save_pretrained()` by gante in 21264
* Replace reduce_labels with do_reduce_labels by amyeroberts in 21218
* Update tests: replace feature extractor tests with image processor by amyeroberts in 20768
* Notebook examples grouping and update by MKhalusova in 21265
* Add: TensorFlow example for semantic segmentation task guide by MKhalusova in 21223
* [ci-daily] Fix pipeline tests by ArthurZucker in 21257
* Add class properties with warnings by amyeroberts in 21195
* [Whisper] fix all issues with unk token by ArthurZucker in 21250
* Supported pipeline tasks update by MKhalusova in 21268
* Models docstring by sgugger in 21225
* Hotfix remove tuple for git config image processor. by Narsil in 21278
* Fix MaskFormerImageProcessor.post_process_instance_segmentation by alaradirik in 21256

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* fcakyon
  * [New Model] Add TimeSformer model (18908)
* kamalkraj
  * Add BioGPT (20420)
  * Update flan-t5 original model link (20897)
* ringohoffman
  * ESM openfold_utils type hints (20544)
* SamuelzXu
  * Documentation fixes (20607)
  * Fixing DistilBert error message (20945)
  * Remove Roberta Dependencies from XLM Roberta Flax and Tensorflow models (21047)
* alceballosa
  * Spanish translation of asr.mdx and add_new_pipeline.mdx (20569)
* ekgren
  * Add gpt-sw3 model to transformers (20209)
* AndreasMadsen
  * Implement Roberta PreLayerNorm (20305)
* IvanLauLinTiong
  * Remove Bert tokenizer dependency from DistillBert (slow/fast) tokenizers (20933)
  * Fix T5 docstring (20957)
* jongjyh
  * Add AltCLIP (20446)
* SD-13
  * Remove T5 dependency from mT5 model (20949)
* Bearnardd
  * fix levit timm conversion file (20938)
  * Efficientformer (20459)
* praeclarumjj3
  * Add OneFormer Model (20577)
  * Fix OneFormer Docstrings (21215)
* kambehmw
  * Add Japanese translation index.mdx (21186)
  * Add Japanese translation installation.mdx (21241)

4.25.1

Not secure
PyTorch 2.0 stack support

We are very excited by the newly announced PyTorch 2.0 stack. You can enable `torch.compile` on any of our models, and get support with the `Trainer` (and in all our PyTorch examples) by using the `torchdynamo` training argument. For instance, just add `--torchdynamo inductor` when launching those examples from the command line.

This API is still experimental and may be subject to changes as the PyTorch 2.0 stack matures.

Note that to get the best performance, we recommend:
- using an Ampere GPU (or more recent)
- sticking to fixed shapes for now (so use `--pad_to_max_length` in our examples)
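To illustrate the fixed-shape advice: a compiled graph is specialized to the input shapes it has seen, so batches of varying length can trigger recompilation. The minimal sketch below uses a hypothetical `pad_to_fixed_length` helper (not part of `transformers`) to show what padding every batch to one length amounts to, which is the effect `--pad_to_max_length` aims for.

```python
def pad_to_fixed_length(batch, max_length, pad_id=0):
    """Pad (or truncate) each sequence of token IDs to exactly max_length.

    Hypothetical helper. Returns padded IDs plus an attention mask
    (1 = real token, 0 = padding), so every batch has the same shape.
    """
    input_ids, attention_mask = [], []
    for seq in batch:
        seq = list(seq[:max_length])  # truncate sequences that are too long
        n_pad = max_length - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_to_fixed_length([[101, 7592, 102], [101, 102]], max_length=5)
print(ids)   # [[101, 7592, 102, 0, 0], [101, 102, 0, 0, 0]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```

Whatever the input lengths, the output is always `len(batch) × max_length`, so the compiled model keeps seeing the same shape.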

* Repurpose torchdynamo training args towards torch._dynamo by sgugger in 20498

Audio Spectrogram Transformer

The Audio Spectrogram Transformer model was proposed in [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. The Audio Spectrogram Transformer applies a [Vision Transformer](https://huggingface.co/docs/transformers/main/en/model_doc/vit) to audio, by turning audio into an image (spectrogram). The model obtains state-of-the-art results for audio classification.
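As a worked example of the "audio as image" idea: AST slices a log-mel spectrogram into overlapping 16×16 patches, the way ViT slices an image. The sketch below counts patches along each axis. The defaults (128 mel bins, 1024 frames, patch size 16, stride 10 on both axes) are assumptions chosen to match the paper's setup; check `ASTConfig` for the values a given checkpoint actually uses.

```python
def conv_output_length(size, patch, stride):
    """Number of patch positions along one axis (no padding)."""
    return (size - patch) // stride + 1


def ast_patch_grid(num_mel_bins=128, num_frames=1024, patch_size=16,
                   frequency_stride=10, time_stride=10):
    """Return (frequency_patches, time_patches, total_patches) for a spectrogram."""
    f = conv_output_length(num_mel_bins, patch_size, frequency_stride)
    t = conv_output_length(num_frames, patch_size, time_stride)
    return f, t, f * t


print(ast_patch_grid())                                      # (12, 101, 1212)
print(ast_patch_grid(frequency_stride=16, time_stride=16))   # (8, 64, 512)
```

Because the stride (10) is smaller than the patch size (16), neighboring patches overlap, which is why the default grid yields 1212 patches rather than the 512 a non-overlapping split would give.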

* Add Audio Spectrogram Transformer by NielsRogge in 19981

Jukebox

The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. It introduces a generative music model which can produce minute-long samples that can be conditioned on an artist, genres, and lyrics.

* Add Jukebox model (replaces 16875) by ArthurZucker in 17826

Switch Transformers

The SwitchTransformers model was proposed in [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer.

It is the first Mixture-of-Experts (MoE) model supported in `transformers`, and the largest checkpoint currently available contains 1T parameters.
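To make "MoE" concrete: a Switch layer replaces a single feed-forward block with several expert blocks and sends each token to exactly one of them (top-1 routing), picked by a softmax router. The stdlib sketch below, built around a hypothetical `switch_route` function, shows only the routing decision with a simple per-expert capacity (tokens per batch divided by the number of experts, times a capacity factor, rounded down here as an assumption). It is not the `transformers` implementation, which also adds an auxiliary load-balancing loss and works on batched tensors.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def switch_route(router_logits, capacity_factor=1.0):
    """Top-1 ("switch") routing sketch.

    router_logits: one list of per-expert logits for each token.
    Returns, per token, (expert_index, gate_probability), or None when the
    chosen expert is already full. The expert's output would be scaled by
    gate_probability; a dropped token only flows through the residual path.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    capacity = max(1, int(capacity_factor * num_tokens / num_experts))
    load = [0] * num_experts
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        expert = max(range(num_experts), key=probs.__getitem__)  # top-1 expert
        if load[expert] < capacity:
            load[expert] += 1
            assignments.append((expert, probs[expert]))
        else:
            assignments.append(None)  # over capacity: token skips the experts
    return assignments


routes = switch_route([[2.0, 0.0], [3.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print([r[0] if r else None for r in routes])  # [0, 0, None, 1]
```

With 4 tokens and 2 experts the capacity is 2, so the third token that prefers expert 0 is dropped while the last token still reaches expert 1.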

* Add Switch transformers by younesbelkada and ArthurZucker in 19323

RoCBert

The RoCBert model was proposed in [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. It’s a pretrained Chinese language model that is robust under various forms of adversarial attacks.

* Add RocBert by sww9370 in 20013

CLIPSeg

The CLIPSeg model was proposed in [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker. CLIPSeg adds a minimal decoder on top of a frozen [CLIP](https://huggingface.co/docs/transformers/main/en/model_doc/clip) model for zero- and one-shot image segmentation.

* Add CLIPSeg by NielsRogge in 20066

NAT and DiNAT

NAT

NAT was proposed in [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.

It is a hierarchical vision transformer based on Neighborhood Attention, a sliding-window self-attention pattern.

DiNAT

DiNAT was proposed in [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi.

It extends [NAT](https://huggingface.co/docs/transformers/main/en/model_doc/nat) by adding a Dilated Neighborhood Attention pattern to capture global context, and shows significant performance improvements over it.
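A 1-D sketch of the two attention patterns: each query attends only to its `kernel_size` nearest keys, and near the borders the window is shifted inward rather than zero-padded, so every query still sees exactly `kernel_size` keys. Passing `dilation > 1` spreads the neighbors `dilation` positions apart, as in DiNAT; the border rule used for that case (applying the same rule inside each interleaved group) is an assumption of this sketch. The helper names are hypothetical, and the real models work on 2-D feature maps with multiple heads and relative positional biases, backed by the `natten` package.

```python
import math


def neighborhood(i, length, kernel_size, dilation=1):
    """Indices that query position i attends to (kernel_size assumed odd).

    The window is shifted at the borders instead of padded. With dilation > 1,
    positions are split into `dilation` interleaved groups and the same rule
    is applied inside the group that contains i.
    """
    radius = kernel_size // 2
    group = i % dilation                             # interleaved group of i
    group_len = len(range(group, length, dilation))  # tokens in that group
    if group_len < kernel_size:
        raise ValueError("group is shorter than the kernel")
    j = i // dilation                                # position of i inside its group
    start = min(max(j - radius, 0), group_len - kernel_size)
    return [group + dilation * (start + m) for m in range(kernel_size)]


def neighborhood_attention_1d(q, k, v, kernel_size, dilation=1):
    """Single-head attention over scalar features, restricted to each neighborhood."""
    out = []
    for i in range(len(q)):
        idx = neighborhood(i, len(q), kernel_size, dilation)
        scores = [q[i] * k[n] for n in idx]  # dot products with neighbors only
        m = max(scores)                      # subtract the max for stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v[n] for w, n in zip(weights, idx)) / total)
    return out


print(neighborhood(0, 7, 3))              # [0, 1, 2]
print(neighborhood(6, 8, 3, dilation=2))  # [2, 4, 6]
out = neighborhood_attention_1d([1.0] * 7, [0.0] * 7,
                                [float(x) for x in range(7)], kernel_size=3)
print(out)  # [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
```

With all keys set to zero the attention weights are uniform, so each output is just the mean of its neighborhood's values; the repeated values at both ends come from the shifted border windows.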

* Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models by alihassanijr in 20219

MobileNetV2

The MobileNetV2 model was proposed in [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.

* add MobileNetV2 model by hollance in 17845

MobileNetV1

The MobileNetV1 model was proposed in [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.

* add MobileNetV1 model by hollance in 17799

Image processors

Image processors replace feature extractors as the processing class for computer vision models.

Important changes:
* The `size` parameter is now a dictionary of the form `{"height": h, "width": w}`, `{"shortest_edge": s}`, or `{"shortest_edge": s, "longest_edge": l}` instead of an int or tuple.
* Addition of `data_format` flag. You can now specify if you want your images to be returned in `"channels_first"` - NCHW - or `"channels_last"` - NHWC - format.
* Processing flags e.g. `do_resize` can be passed directly to the `preprocess` method instead of modifying the class attribute: `image_processor([image_1, image_2], do_resize=False, return_tensors="pt", data_format="channels_last")`
* Leaving `return_tensors` unset will return a list of numpy arrays.

The classes are backwards compatible and can be created using existing feature extractor configurations - with the `size` parameter converted.
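A minimal sketch of the `size` and `data_format` conventions above, using hypothetical helpers (`legacy_size_to_dict`, `to_channels_last`) that only mirror the documented behavior and are not library functions. It assumes an old-style tuple is ordered `(height, width)` and that a bare int means a square resize unless told otherwise.

```python
def legacy_size_to_dict(size, default_to_square=True):
    """Convert an old-style `size` (int or (height, width) pair) to a dict.

    Hypothetical helper; dicts are passed through unchanged.
    """
    if isinstance(size, dict):
        return size
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size  # assumed (height, width) order
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size: {size!r}")


def to_channels_last(image):
    """Reorder a nested-list image from (C, H, W) "channels_first" to (H, W, C)."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    return [[[image[c][h][w] for c in range(channels)] for w in range(width)]
            for h in range(height)]


print(legacy_size_to_dict(224))                           # {'height': 224, 'width': 224}
print(legacy_size_to_dict(224, default_to_square=False))  # {'shortest_edge': 224}
print(to_channels_last([[[1, 2]], [[3, 4]], [[5, 6]]]))   # [[[1, 3, 5], [2, 4, 6]]]
```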

* Add Image Processors by amyeroberts in 19796
* Add Donut image processor by amyeroberts in 20425
* Add segmentation + object detection image processors by amyeroberts in 20160
* AutoImageProcessor by amyeroberts in 20111

Backbone for computer vision models

We're adding support for a general `AutoBackbone` class, which turns any vision model (like ConvNeXt, Swin Transformer) into a backbone to be used with frameworks like DETR and Mask R-CNN. The design is in early stages and we welcome feedback.

* Add AutoBackbone + ResNetBackbone by NielsRogge in 20229
* Improve backbone by NielsRogge in 20380
* [AutoBackbone] Improve API by NielsRogge in 20407

Support for `safetensors` offloading

If the model you are using has a `safetensors` checkpoint and you have the library installed, offloading to disk will take advantage of it to be more memory-efficient and roughly 33% faster.
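One property of the `safetensors` format that suits disk offloading is its layout: an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape and byte offsets, then the raw tensor bytes. Because every tensor sits at a known byte range, a single weight can be read without loading the rest of the file. The stdlib sketch below (hypothetical helpers, 1-D float32 tensors only) writes such a file and memory-maps it to read one tensor; it illustrates the format, not how `transformers` or the `safetensors` library implement offloading.

```python
import json
import mmap
import os
import struct
import tempfile


def write_safetensors_f32(path, tensors):
    """Write {name: list[float]} as 1-D F32 tensors in the safetensors layout."""
    header, blobs, offset = {}, [], 0
    for name, values in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)
        header[name] = {"dtype": "F32", "shape": [len(values)],
                        "data_offsets": [offset, offset + len(data)]}
        blobs.append(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header size
        f.write(header_bytes)
        for data in blobs:
            f.write(data)


def read_tensor_lazily(path, name):
    """Read one F32 tensor by memory-mapping the file and slicing its byte range."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (header_len,) = struct.unpack("<Q", mm[:8])
        header = json.loads(mm[8:8 + header_len])
        begin, end = header[name]["data_offsets"]
        base = 8 + header_len               # offsets are relative to the data section
        raw = mm[base + begin:base + end]   # only this tensor's bytes are copied
        return list(struct.unpack(f"<{(end - begin) // 4}f", raw))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.safetensors")
    write_safetensors_f32(path, {"a": [1.0, 2.0], "b": [0.5, -3.0, 4.25]})
    restored = read_tensor_lazily(path, "b")
print(restored)  # [0.5, -3.0, 4.25]
```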

* Safetensors offload by sgugger in 20321

Contrastive search in the `generate` method

* Generate: TF contrastive search with XLA support by gante in 20050
* Generate: contrastive search with full optional outputs by gante in 19963

Breaking changes

* 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` by beneyal in 15775

Bugfixes and improvements

* add dataset by stevhliu in 20005
* Add BERT resources by stevhliu in 19852
* Add LayoutLMv3 resource by stevhliu in 19932
* fix typo by stevhliu in 20006
* Update object detection pipeline to use post_process_object_detection methods by alaradirik in 20004
* clean up vision/text config dict arguments by ydshieh in 19954
* make sentencepiece import conditional in bertjapanesetokenizer by ripose-jp in 20012
* Fix gradient checkpoint test in encoder-decoder by ydshieh in 20017
* Quality by sgugger in 20002
* Update auto processor to check image processor created by amyeroberts in 20021
* [Doctest] Add configuration_deberta_v2.py by Saad135 in 19995
* Improve model tester by ydshieh in 19984
* Fix doctest by ydshieh in 20023
* Show installed libraries and their versions in CI jobs by ydshieh in 20026
* reorganize glossary by stevhliu in 20010
* Now supporting pathlike in pipelines too. by Narsil in 20030
* Add **kwargs by amyeroberts in 20037
* Fix some doctests after PR 15775 by ydshieh in 20036
* [Doctest] Add configuration_camembert.py by Saad135 in 20039
* [Whisper Tokenizer] Make more user-friendly by sanchit-gandhi in 19921
* [FutureWarning] Add future warning for LEDForSequenceClassification by ArthurZucker in 19066
* fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc by sywangyi in 19891
* Update esmfold conversion script by Rocketknight1 in 20028
* Fixed torch.finfo issue with torch.fx by michaelbenayoun in 20040
* Only resize embeddings when necessary by sgugger in 20043
* Speed up TF token classification postprocessing by converting complete tensors to numpy by deutschmn in 19976
* Fix ESM LM head test by Rocketknight1 in 20045
* Update README.md by bofenghuang in 20063
* fix `tokenizer_type` to avoid error when loading checkpoint back by pacman100 in 20062
* [Trainer] Fix model name in push_to_hub by sanchit-gandhi in 20064
* PoolformerImageProcessor defaults to match previous FE by amyeroberts in 20048
* change constant torch.tensor to torch.full by MerHS in 20061
* Update READMEs for ESMFold and add notebooks by Rocketknight1 in 20067
* Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 by jordiclive in 20068
* Allow passing arguments to model testers for CLIP-like models by ydshieh in 20044
* Show installed libraries and their versions in GA jobs by ydshieh in 20069
* Update defaults and logic to match old FE by amyeroberts in 20065
* Update modeling_tf_utils.py by cakiki in 20076
* Update hub.py by cakiki in 20075
* [Doctest] Add configuration_dpr.py by Saad135 in 20080
* Removing RobertaConfig inheritance from CamembertConfig by Saad135 in 20059
* Skip 2 tests in `VisionTextDualEncoderProcessorTest` by ydshieh in 20098
* Replace unsupported facebookresearch/bitsandbytes by tomaarsen in 20093
* docs: Resolve many typos in the English docs by tomaarsen in 20088
* use huggingface_hub.model_info() to get pipeline_tag by y-tag in 20077
* Fix `generate_dummy_inputs` for `ImageGPTOnnxConfig` by ydshieh in 20103
* docs: Fixed variables in f-strings by tomaarsen in 20087
* Add new terms to the glossary by stevhliu in 20051
* Replace awkward timm link with the expected one by tomaarsen in 20109
* Fix AutoTokenizer with subfolder passed by sgugger in 20110
* [Audio Processor] Only pass sr to feat extractor by sanchit-gandhi in 20022
* Update github pr docs actions by mishig25 in 20125
* Adapt has_labels test when no labels were found by sgugger in 20113
* Improve tiny model creation script by ydshieh in 20119
* Remove BertConfig inheritance from RobertaConfig by Saad135 in 20124
* [Swin] Add Swin SimMIM checkpoints by NielsRogge in 20034
* Update `CLIPSegModelTester` by ydshieh in 20134
* Update SwinForMaskedImageModeling doctest values by amyeroberts in 20139
* Attempting to test automatically the `_keys_to_ignore`. by Narsil in 20042
* Generate: move generation_*.py src files into generation/*.py by gante in 20096
* add cv + audio labels by stevhliu in 20114
* Update VisionEncoderDecoder to use an image processor by amyeroberts in 20137
* [CLIPSeg] Add resources by NielsRogge in 20118
* Make DummyObject more robust by mariosasko in 20146
* Add `RoCBertTokenizer` to `TOKENIZER_MAPPING_NAMES` by ydshieh in 20141
* Adding support for LayoutLMvX variants for `object-detection`. by Narsil in 20143
* Add doc tests by NielsRogge in 20158
* doc comment fix: Args was in wrong place by hollance in 20164
* Update `OnnxConfig.generate_dummy_inputs` to check `ImageProcessingMixin` by ydshieh in 20157
* Generate: fix TF doctests by gante in 20159
* Fix arg names for our models by Rocketknight1 in 20166
* [processor] Add 'model input names' property by sanchit-gandhi in 20117
* Fix object-detection bug (height, width inversion). by Narsil in 20167
* [OWL-ViT] Make model consistent with CLIP by NielsRogge in 20144
* Fix type - update any PIL.Image.Resampling by amyeroberts in 20172
* Fix tapas scatter by Bearnardd in 20149
* Update README.md by code-with-rajeev in 19530
* Proposal Remove the weird `inspect` in ASR pipeline and make WhisperEncoder just nice to use. by Narsil in 19571
* Pytorch type hints by IMvision12 in 20112
* Generate: TF sample doctest result update by gante in 20208
* [ROC_BERT] Make CI happy by younesbelkada in 20175
* add _keys_to_ignore_on_load_unexpected = [r"pooler"] by ArthurZucker in 20210
* docs: translated index page to korean by wonhyeongseo in 20180
* feat: add i18n issue template by wonhyeongseo in 20199
* [Examples] Generalise Seq2Seq ASR to handle Whisper by sanchit-gandhi in 19519
* mark `test_save_load_fast_init_from_base` as `is_flaky` by ydshieh in 20200
* Update README.md by Nietism in 20188
* Downgrade log warning -> info by amyeroberts in 20202
* Generate: add Bloom fixes for contrastive search by gante in 20213
* Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. by Narsil in 20104
* [docs] set overflowing image width to auto-scale by wonhyeongseo in 20197
* Update tokenizer_summary.mdx by bofenghuang in 20135
* Make `ImageSegmentationPipelineTests` less flaky by ydshieh in 20147
* update relative positional embedding by ArthurZucker in 20203
* [WHISPER] Update modeling tests by ArthurZucker in 20162
* Add `accelerate` support for `ViT` family by younesbelkada in 20174
* Add param_name to size_dict logs & tidy by amyeroberts in 20205
* Add object detection + segmentation transforms by amyeroberts in 20003
* Typo on docstring in ElectraTokenizer by FacerAin in 20192
* Remove `authorized_missing_keys`in favor of _keys_to_ignore_on_load_missing by ArthurZucker in 20228
* Add missing ESM autoclass by Rocketknight1 in 20177
* fix device issue by ydshieh in 20227
* fixed spelling error in testing.mdx by kasmith11 in 20220
* Fix `run_clip.py` by ydshieh in 20234
* Fix docstring of CLIPTokenizer(Fast) by TilmannR in 20233
* Fix MaskformerFeatureExtractor by NielsRogge in 20100
* New logging support to "Trainer" Class (ClearML Logger) by skinan in 20184
* Enable PyTorch 1.13 by sgugger in 20168
* [CLIP] allow loading projection layer in vision and text model by patil-suraj in 18962
* Slightly alter Keras dummy loss by Rocketknight1 in 20232
* Add to DeBERTa resources by Saad135 in 20155
* Add clip resources to the transformers documentation by ambujpawar in 20190
* Update reqs to include min gather_for_metrics Accelerate version by muellerzr in 20242
* Allow trainer to return eval. loss for CLIP-like models by ydshieh in 20214
* Adds image-guided object detection support to OWL-ViT by alaradirik in 20136
* Adding `audio-classification` example in the doc. by Narsil in 20235
* Updating the doctest for conversational. by Narsil in 20236
* Adding doctest for `fill-mask` pipeline. by Narsil in 20241
* Adding doctest for `feature-extraction`. by Narsil in 20240
* Adding ASR pipeline example. by Narsil in 20226
* Adding doctest for document-question-answering by Narsil in 20239
* Adding an example for `depth-estimation` pipeline. by Narsil in 20237
* Complete doc migration by mishig25 in 20267
* Fix result saving errors of pytorch examples by li-plus in 20276
* Adding a doctest for `table-question-answering` pipeline. by Narsil in 20260
* Adding doctest for `image-segmentation` pipeline. by Narsil in 20256
* Adding doctest for `text2text-generation` pipeline. by Narsil in 20261
* Adding doctest for `text-generation` pipeline. by Narsil in 20264
* Add TF protein notebook to notebooks doc by Rocketknight1 in 20271
* Rephrasing the link. by Narsil in 20253
* Add Chinese-CLIP implementation by yangapku in 20368
* Adding doctest example for `image-classification` pipeline. by Narsil in 20254
* Adding doctest for `zero-shot-image-classification` pipeline. by Narsil in 20272
* Adding doctest for `zero-shot-classification` pipeline. by Narsil in 20268
* Adding doctest for `visual-question-answering` pipeline. by Narsil in 20266
* Adding doctest for `text-classification` pipeline. by Narsil in 20262
* Adding doctest for `question-answering` pipeline. by Narsil in 20259
* [Docs] Add resources of OpenAI GPT by shogohida in 20084
* Adding doctest for `image-to-text` pipeline. by Narsil in 20257
* Adding doctest for `token-classification` pipeline. by Narsil in 20265
* remaining pytorch type hints by IMvision12 in 20217
* Data collator for token classification pads labels column when receives pytorch tensors by markovalexander in 20244
* [Doctest] Add configuration_deformable_detr.py by Saad135 in 20273
* Fix summarization script by muellerzr in 20286
* [DOCTEST] Fix the documentation of RoCBert by ArthurZucker in 20142
* [bnb] Let's warn users when saving 8-bit models by younesbelkada in 20282
* Adding `zero-shot-object-detection` pipeline doctest. by Narsil in 20274
* Adding doctest for `object-detection` pipeline. by Narsil in 20258
* Image transforms functionality used instead by amyeroberts in 20278
* TF: add test for `PushToHubCallback` by gante in 20231
* Generate: general TF XLA contrastive search are now slow tests by gante in 20277
* Fixing the doctests failures. by Narsil in 20294
* set the default cache_enable to True, aligned with the default value in pytorch cpu/cuda amp autocast by sywangyi in 20289
* Add docstrings for canine model by raghavanone in 19457
* Add missing report button for Example test by ydshieh in 20293
* refactor test by younesbelkada in 20300
* [Tiny model creation] deal with `ImageProcessor` by ydshieh in 20298
* Fix blender bot misleading doc by ArthurZucker in 20301
* remove two tokens that should not be suppressed by ArthurZucker in 20302
* [ASR Examples] Update README for Whisper by sanchit-gandhi in 20230
* Add padding image transformation by amyeroberts in 19838
* Pin TensorFlow by sgugger in 20313
* Add AnyPrecisionAdamW optimizer by atturaioe in 18961
* [Proposal] Breaking change `zero-shot-object-detection` for improved consistency. by Narsil in 20280
* Fix flakey test with seed by muellerzr in 20318
* Pin TF 2.10.1 for Push CI by ydshieh in 20319
* Remove double brackets by stevhliu in 20307
* TF: future proof our keras imports by gante in 20317
* organize pipelines by modality by stevhliu in 20306
* Fix torch device issues by ydshieh in 20304
* Generate: add generation config class by gante in 20218
* translate zh quicktour (20095) by bfss in 20181
* Add Spanish translation of serialization.mdx by donelianc in 20245
* Add LayerScale to NAT/DiNAT by alihassanijr in 20325
* [Switch Transformers] Fix failing slow test by younesbelkada in 20346
* fix: "BigSicence" typo in docs by rajrajhans in 20331
* Generate: `model_kwargs` can also be an input to `prepare_inputs_for_generation` by gante in 20353
* Update Special Language Tokens for PLBART by jordiclive in 19980
* Add resources by NielsRogge in 20296
* Enhance HfArgumentParser functionality and ease of use by konstantinjdobler in 20323
* Add inference section to task guides by stevhliu in 18781
* Fix toctree for Section 3 in Spanish Documentation by donelianc in 20360
* Generate: shorter XLA contrastive search tests by gante in 20354
* revert `keys_to_ignore` for M2M100 by younesbelkada in 20381
* add `accelerate` support for `ESM` by younesbelkada in 20379
* Fix nightly runs by sgugger in 20352
* Optimizes DonutProcessor token2json method for speed by michaelnation26 in 20283
* Indicate better minimal version of PyTorch in big model inference by sgugger in 20385
* Fix longformer onnx broken export by fxmarty in 20292
* Use tiny models for ONNX tests - text modality by lewtun in 20333
* [ESM] fix `accelerate` tests for esmfold by younesbelkada in 20387
* Generate: fix plbart generation tests by gante in 20391
* [bloom] convert script tweaks by stas00 in 18593
* Fix doctest file path by ydshieh in 20400
* [Image Transformers] to_pil fix float edge cases by patrickvonplaten in 20406
* make daily CI happy by younesbelkada in 20410
* fix nasty `bnb` bug by younesbelkada in 20408
* change the way sentinel tokens can be retrieved by raghavanone in 20373
* [BNB] Throw `ValueError` when trying to cast or assign by younesbelkada in 20409
* Use updated `model_max_length` when saving tokenizers by ydshieh in 20401
* Add Spanish translation of pr_checks.mdx by donelianc in 20339
* fix device in longformer onnx path by fxmarty in 20419
* Fix ModelOutput instantiation when there is only one tuple by sgugger in 20416
* `accelerate` support for `OwlViT` by younesbelkada in 20411
* [AnyPrecisionAdamW] test fix by stas00 in 20454
* fix `word_to_tokens` docstring format by SaulLu in 20450
* Fix typo in FSMT Tokenizer by kamalkraj in 20456
* Fix device issues in `CLIPSegModelIntegrationTest` by ydshieh in 20467
* Fix links for `contrastive_loss` by ydshieh in 20455
* Fix doctests for audio models by ydshieh in 20468
* Fix ESM checkpoints for tests by Rocketknight1 in 20436
* More TF int dtype fixes by Rocketknight1 in 20384
* make tensors in function build_relative_position created on proper device instead of always on cpu by qq775294390 in 20434
* update cpu related doc by sywangyi in 20444
* with pytorch cpu only version. without --no_cuda, using --bf16 will trigger error like "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0" by sywangyi in 20445
* [CLIPTokenizer] Improve warning by patrickvonplaten in 20458
* Replace assertions with value errors on distilbert model by JuheonChu in 20463
* [Doctest] Add configuration_fsmt.py by sha016 in 19936
* Replace assertion with ValueError exceptions in run_image_captioning_flax.py by katiele47 in 20365
* [FLAX] Add dtype to embedding for bert/bart/opt/t5 by merrymercy in 20340
* fix both failing RoCBert tests by ArthurZucker in 20469
* Include image processor in add-new-model-like by amyeroberts in 20439
* chore: add link to the video cls notebook. by sayakpaul in 20386
* add timeout option for deepspeed engine by henghuiz in 20443
* [Maskformer] Add MaskFormerSwin backbone by NielsRogge in 20344
* Extract warnings from CI artifacts by ydshieh in 20474
* Add Donut image processor by amyeroberts in 20425
* Fix torch meshgrid warnings by fxmarty in 20475
* Fix init import_structure sorting by sgugger in 20477
* extract warnings in GH workflows by ydshieh in 20487
* add in layer gpt2 tokenizer by piEsposito in 20421
* Replace assert statements with raise exceptions by miyu386 in 20478
* fixed small typo by sandeepgadhwal in 20490
* Fix documentation code to import facebook/detr-resnet-50 model by JuanFKurucz in 20491
* Fix disk offload for full safetensors checkpoints by sgugger in 20497
* [modelcard] Check for IterableDataset by sanchit-gandhi in 20495
* [modelcard] Set model name if empty by sanchit-gandhi in 20496
* Add segmentation + object detection image processors by amyeroberts in 20160
* remove `attention_mask` truncation in whisper by ydshieh in 20488
* Make `add_special_tokens` more clear by ydshieh in 20424
* [OPT/Galactica] Load large `galactica` models by younesbelkada in 20390
* Support extraction of both train and eval XLA graphs by jeffhataws in 20492
* fix ipex+fp32 jit trace error in ipex 1.13 by sywangyi in 20504
* Expected output for the test changed by ArthurZucker in 20493
* Fix TF nightly tests by Rocketknight1 in 20507
* Update doc examples feature extractor -> image processor by amyeroberts in 20501
* Fix Typo in Docs for GPU by julianpollmann in 20509
* Fix minimum version for device_map by sgugger in 20489
* Update `AutomaticSpeechRecognitionPipeline` doc example by ydshieh in 20512
* Add `natten` for CI by ydshieh in 20511
* Fix Data2VecTextForCausalLM example code documentation by JuanFKurucz in 20510
* Add some warning for Dynamo and enable TF32 when it's set by sgugger in 20515
* [modelcard] Update dataset tags by sanchit-gandhi in 20506
* Change Doctests CI launch time by ydshieh in 20523
* Fix `PLBart` doctest by ydshieh in 20527
* Fix `ConditionalDetrForSegmentation` doc example by ydshieh in 20531
* add doc for by younesbelkada in 20525
* Update `ZeroShotObjectDetectionPipeline` doc example by ydshieh in 20528
* update post_process_image_guided_detection by fcakyon in 20521
* QnA example: add speed metric by sywangyi in 20522
* Fix doctest by NielsRogge in 20534
* Fix Hubert models in TFHubertModel and TFHubertForCTC documentation code by JuanFKurucz in 20516
* Fix link in pipeline device map by stevhliu in 20517

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* sww9370
  * Add RocBert (20013)
* IMvision12
  * Pytorch type hints (20112)
  * remaining pytorch type hints (20217)
* alihassanijr
  * Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models (20219)
  * Add LayerScale to NAT/DiNAT (20325)
* bfss
  * translate zh quicktour(20095) (20181)
* donelianc
  * Add Spanish translation of serialization.mdx (20245)
  * Fix toctree for Section 3 in Spanish Documentation (20360)
  * Add Spanish translation of pr_checks.mdx (20339)
* yangapku
  * Add Chinese-CLIP implementation (20368)

4.24.0

Not secure
ESM-2/ESMFold

ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, [from 8 million parameters up to a huge 15 billion parameter model](https://huggingface.co/models?other=esm).
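ESM-2's masked language modeling objective follows the BERT recipe: some residues are hidden and the model is trained to recover them. The sketch below shows only that input-corruption step in plain Python, so it runs without `transformers`. The `mask_protein` helper is hypothetical, and the 15% selection rate and 80/10/10 replacement split are the common BERT convention, assumed here rather than taken from the ESM training code.

```python
import random

# The 20 standard amino acids, used as the "random replacement" vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"


def mask_protein(sequence, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) for masked language modeling.

    labels[i] holds the original residue at positions selected for
    prediction and None elsewhere (those positions are not scored).
    Assumed BERT-style split of the selected positions: 80% become <mask>,
    10% become a random residue, 10% keep the original residue.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    labels = [None] * len(tokens)
    for i, residue in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = residue
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)
        # else: keep the original residue, but still predict it
    return tokens, labels


tokens, labels = mask_protein("MKTAYIAKQRQISFVKSHFSRQ", mask_prob=0.3)
print(tokens)
print(labels)
```

Only positions with a non-`None` label contribute to the loss; the remaining residues act as context for the prediction.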

ESMFold is a state-of-the-art single-sequence protein folding model that produces high-accuracy predictions significantly faster than MSA-based tools. Unlike previous protein folding tools such as AlphaFold2 and `openfold`, ESMFold uses a pretrained protein language model to generate token embeddings that serve as input to the folding model, so it does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without any external databases or search/alignment tools at inference time. This hugely reduces the time and compute required for folding.

Transformer protein language models were introduced in the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

ESMFold was introduced in the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

* Add ESMFold by Rocketknight1 in 19977
* TF port of ESM by Rocketknight1 in 19587

LiLT

LiLT makes it possible to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, enabling [LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)-like document understanding for many languages.

It was proposed in [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding.

* Add LiLT by NielsRogge in 19450
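Like LayoutLM, LiLT takes a bounding box for every token alongside the token ids, so document preprocessing has to turn OCR word boxes into per-token, normalized coordinates. A minimal plain-Python sketch of that step follows. The helpers are hypothetical, the 0-1000 grid is the LayoutLM convention (assumed here, so check the config of the checkpoint you use), and the toy tokenizer only stands in for a real RoBERTa tokenizer.

```python
def normalize_box(box, width, height, scale=1000):
    """Map a pixel-space (x0, y0, x1, y1) box onto a 0..scale integer grid."""
    x0, y0, x1, y1 = box
    if not (0 <= x0 <= x1 <= width and 0 <= y0 <= y1 <= height):
        raise ValueError(f"box {box} does not fit on a {width}x{height} page")
    return [
        int(scale * x0 / width),
        int(scale * y0 / height),
        int(scale * x1 / width),
        int(scale * y1 / height),
    ]


def boxes_for_tokens(words, word_boxes, tokenize):
    """Repeat each word's box for every sub-word token the word is split into."""
    tokens, boxes = [], []
    for word, box in zip(words, word_boxes):
        pieces = tokenize(word)
        tokens.extend(pieces)
        boxes.extend([box] * len(pieces))
    return tokens, boxes


def toy_tokenize(word):
    """Hypothetical stand-in tokenizer: split words longer than 3 characters."""
    return [word[:3], word[3:]] if len(word) > 3 else [word]


page_w, page_h = 850, 1100  # hypothetical page size in pixels
words = ["Invoice", "No"]
pixel_boxes = [(85, 110, 170, 132), (180, 110, 200, 132)]
norm = [normalize_box(b, page_w, page_h) for b in pixel_boxes]
tokens, boxes = boxes_for_tokens(words, norm, toy_tokenize)
print(tokens)    # → ['Inv', 'oice', 'No']
print(boxes[0])  # → [100, 100, 200, 120]
```

Repeating the word box for each sub-word piece keeps the text and layout sequences the same length, which is what a per-token `bbox` input requires.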

Flan-T5

FLAN-T5 is an enhanced version of T5 that has been instruction-finetuned on a mixture of tasks.

It was released in the paper [Scaling Instruction-Finetuned Language Models](https://arxiv.org/pdf/2210.11416.pdf) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

* Add `flan-t5` documentation page by younesbelkada in 19892

Table Transformer

Table Transformer is a model based on the DETR architecture that can perform table extraction and table structure recognition on unstructured documents.

It was proposed in [PubTables-1M: Towards comprehensive table extraction from unstructured documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.

* Add table transformer [v2] by NielsRogge in 19614
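Because Table Transformer builds on DETR, its raw output is a fixed set of query predictions: class logits ending with an extra "no object" class, plus one box per query in normalized center format. The plain-Python sketch below illustrates DETR-style post-processing (softmax, drop the no-object column, threshold, convert boxes to pixel corners). It is not the library's implementation, and the threshold, labels, and inputs are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def center_to_corners(box, image_width, image_height):
    """Normalized (cx, cy, w, h) -> pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def post_process(class_logits, boxes, image_width, image_height, threshold=0.7):
    """Turn per-query predictions into (label, score, box) detections.

    class_logits[q] holds one logit per real class followed by a final
    "no object" logit; boxes[q] is the matching normalized (cx, cy, w, h) box.
    """
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)[:-1]  # drop the "no object" probability
        score = max(probs)
        if score < threshold:
            continue  # this query found nothing confident
        label = probs.index(score)
        detections.append((label, score, center_to_corners(box, image_width, image_height)))
    return detections


# Two hypothetical queries on a 1000x800 page image: one confident, one "no object".
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
boxes = [(0.5, 0.5, 0.5, 0.25), (0.1, 0.1, 0.1, 0.1)]
print(post_process(logits, boxes, 1000, 800))
```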

Contrastive search decoding

Contrastive search decoding is a new state-of-the-art generation method that aims to reduce the repetitive patterns that generation models often fall into.

It was introduced in [A Contrastive Framework for Neural Text Generation](https://arxiv.org/abs/2202.06417) by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

* Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by gmftbyGMFTBY in 19477
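At each step, contrastive search takes the top-k candidate tokens and picks the one that balances model confidence against a degeneration penalty: the maximum cosine similarity between the candidate's hidden state and those of the previous tokens. In `transformers` it is enabled by passing `penalty_alpha` and `top_k` to `generate()`. The plain-Python sketch below reimplements only the per-step selection rule from the paper on hypothetical toy inputs; it is not the library's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_step(candidates, context_hidden, alpha=0.6):
    """Select the next token from the top-k candidates.

    candidates: (token, probability, hidden_state) for each top-k token.
    context_hidden: hidden states of the tokens already in the sequence
    (must be non-empty).
    score = (1 - alpha) * probability - alpha * max similarity to context
    """
    def score(candidate):
        _, prob, hidden = candidate
        degeneration_penalty = max(cosine(hidden, h) for h in context_hidden)
        return (1 - alpha) * prob - alpha * degeneration_penalty

    return max(candidates, key=score)[0]


context = [[1.0, 0.0], [0.9, 0.1]]
top_k = [("the", 0.5, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]
print(contrastive_step(top_k, context, alpha=0.6))  # → cat
print(contrastive_step(top_k, context, alpha=0.0))  # → the
```

With `alpha=0` the rule collapses to greedy decoding, so the degeneration penalty is the part that steers generation away from tokens whose representations repeat the existing context.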

Safety and security

We continue to explore the new Pickle-free serialization format provided by the [safetensors](https://github.com/huggingface/safetensors) library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

* Safetensors tf by sgugger in 19900
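The safety gain comes from the file layout: a safetensors file is an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape, and byte offsets, and then the raw tensor bytes, so loading never runs arbitrary code the way unpickling can. The sketch below writes and reads that layout for float32 data using only the standard library. `save_f32` and `load_f32` are hypothetical helpers; the real `safetensors` library adds validation and other details omitted here, so use it for actual checkpoints.

```python
import json
import math
import struct
import sys
from array import array


def save_f32(tensors):
    """Serialize {name: (shape, flat_values)} float32 tensors to bytes."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        if math.prod(shape) != len(values):
            raise ValueError(f"{name}: shape {shape} does not match {len(values)} values")
        data = array("f", values)  # C float, i.e. float32 on common platforms
        if sys.byteorder != "little":
            data.byteswap()  # tensor bytes are stored little-endian
        raw = data.tobytes()
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian unsigned header length, JSON header, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_f32(blob):
    """Parse bytes from save_f32; only JSON and raw numbers are decoded."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    data = blob[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue  # optional free-form string metadata
        if info["dtype"] != "F32":
            raise ValueError(f"{name}: unsupported dtype {info['dtype']}")
        begin, end = info["data_offsets"]
        values = array("f")
        values.frombytes(data[begin:end])
        if sys.byteorder != "little":
            values.byteswap()
        tensors[name] = (tuple(info["shape"]), values.tolist())
    return tensors


blob = save_f32({"weight": ((2, 2), [1.0, 2.5, -0.5, 0.25]), "bias": ((2,), [0.0, 3.0])})
print(load_f32(blob))
```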

🚨 Breaking changes

The following changes are bug fixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you look at the PRs to understand exactly what changed.

* 🚨🚨🚨 TF: Remove `TFWrappedEmbeddings` (breaking: TF embedding initialization updated for encoder-decoder models) by gante in 19263
* 🚨🚨🚨 [Breaking change] Deformable DETR intermediate representations by Narsil in 19678

Bugfixes and improvements

* Enabling custom TF signature draft by dimitreOliveira in 19249
* Fix whisper for `pipeline` by ArthurZucker in 19482
* Extend `nested_XXX` functions to mappings/dicts. by Guillem96 in 19455
* Syntax issues (lines 126, 203) by kant in 19444
* CLI: add import protection to datasets by gante in 19470
* Fix `TFGroupViT` CI by ydshieh in 19461
* Fix doctests for `DeiT` and `TFGroupViT` by ydshieh in 19466
* Update `WhisperModelIntegrationTests.test_large_batched_generation` by ydshieh in 19472
* [Swin] Replace hard-coded batch size to enable dynamic ONNX export by lewtun in 19475
* TF: TFBart embedding initialization by gante in 19460
* Make LayoutLM tokenizers independent from BertTokenizer by arnaudstiegler in 19351
* Make `XLMRoberta` model and config independent from `Roberta` by asofiaoliveira in 19359
* Fix `get_embedding` dtype at init. time by ydshieh in 19473
* Decouples `XLMProphet` model from `Prophet` by srhrshr in 19406
* Implement multiple span support for DocumentQuestionAnswering by ankrgyl in 19204
* Add warning in `generate` & `device_map=auto` & half precision models by younesbelkada in 19468
* Update TF whisper doc tests by amyeroberts in 19484
* Make bert_japanese and cpm independent of their inherited modules by Davidy22 in 19431
* Added tokenize keyword arguments to feature extraction pipeline by quancore in 19382
* Adding README_es.md and a reference to it in the other README files by Oussamaosman02 in 19427
* [CvT] Tensorflow implementation by mathieujouffroy in 18597
* `python3` instead of `python` in push CI setup job by ydshieh in 19492
* Update PT to TF CLI for audio models by amyeroberts in 19465
* New by IMvision12 in 19481
* Fix `OPTForQuestionAnswering` doctest by ydshieh in 19479
* Use a dynamic configuration for circleCI tests by sgugger in 19325
* Add multi-node conditions in trainer_qa.py and trainer_seq2seq.py by regisss in 19502
* update doc for perf_train_cpu_many by sywangyi in 19506
* Avoid Push CI failing to report due to many commits being merged by ydshieh in 19496
* [Doctest] Add `configuration_bert.py` to doctest by ydshieh in 19485
* Fix whisper doc by ArthurZucker in 19518
* Syntax issue (line 497, 526) Documentation by kant in 19442
* Fix pytorch seq2seq qa by FilipposVentirozos in 19258
* Add depth estimation pipeline by nandwalritik in 18618
* Adding links to pipelines parameters documentation by AndreaSottana in 19227
* fix MarkupLMProcessor option flag by davanstrien in 19526
* [Doctest] Bart configuration update by imarekkus in 19524
* Remove roberta dependency from longformer fast tokenizer by sirmammingtonham in 19501
* made tokenization_roformer independent of bert by naveennamani in 19426
* Remove bert fast dependency from electra by Threepointone4 in 19520
* [Examples] Fix typos in run speech recognition seq2seq by sanchit-gandhi in 19514
* [X-CLIP] Fix doc tests by NielsRogge in 19523
* Update Marian config default vocabulary size by gante in 19464
* Make `MobileBert` tokenizers independent from `Bert` by 501Good in 19531
* [Whisper] Fix gradient checkpointing by sanchit-gandhi in 19538
* Syntax issues (paragraphs 122, 130, 147, 155) Documentation: sgugger by kant in 19437
* using trunc_normal for weight init & cls_token by mathieujouffroy in 19486
* Remove `MarkupLMForMaskedLM` from `MODEL_WITH_LM_HEAD_MAPPING_NAMES` by ydshieh in 19534
* Image transforms library by amyeroberts in 18520
* Add a decorator for flaky tests by sgugger in 19498
* [Doctest] Add `configuration_yolos.py` by daspartho in 19539
* Albert config update by imarekkus in 19541
* [Doctest] `Add configuration_whisper.py` by daspartho in 19540
* Throw an error if `getattribute_from_module` can't find anything by ydshieh in 19535
* [Doctest] Beit Config for doctest by daspartho in 19542
* Create the arange tensor on device for enabling CUDA-Graph for Clip Encoder by RezaYazdaniAminabadi in 19503
* [Doctest] GPT2 Config for doctest by daspartho in 19549
* Build Push CI images also in a daily basis by ydshieh in 19532
* Fix checkpoint used in `MarkupLMConfig` by ydshieh in 19547
* add a note to whisper docs clarifying support of long-form decoding by akashmjn in 19497
* [Whisper] Freeze params of encoder by sanchit-gandhi in 19527
* [Doctest] Fixing the Doctest for imageGPT config by RamitPahwa in 19556
* [Doctest] Fixing mobile bert configuration doctest by RamitPahwa in 19557
* [Doctest] Fixing doctest bert_generation configuration by Threepointone4 in 19558
* [Doctest] DeiT Config for doctest by daspartho in 19560
* [Doctest] Reformer Config for doctest by daspartho in 19562
* [Doctest] RoBERTa Config for doctest by daspartho in 19563
* [Doctest] Add `configuration_vit.py` by daspartho in 19561
* [Doctest] bloom config update by imarekkus in 19566
* [Re-submit] Compute true loss Flax examples by duongna21 in 19504
* Fix fairseq wav2vec2-xls-r pretrained weights conversion scripts by heatz123 in 19508
* [Doctest] CTRL config by imarekkus in 19574
* [Doctest] Add configuration_canine.py by IzicTemi in 19575
* [Doctests] Config files for `ViTMAE` and `YOSO` by grgkaran03 in 19567
* Added type hints to `DebertaV2ForMultipleChoice` Pytorch by IMvision12 in 19536
* [WIP] Add type hints for Lxmert (TF) by elusenji in 19441
* [Doctests] add `configuration_blenderbot.py` by grgkaran03 in 19577
* [Doctest] adds trajectory_transformer config to Docs test by SD-13 in 19586
* [Doctests] add `configuration_blenderbot_small.py` by grgkaran03 in 19589
* [Doctest] Swin V2 Config for doctest by daspartho in 19595
* [Doctest] Swin Config for doctest by daspartho in 19594
* [Doctest] SEW Config for doctest by daspartho in 19597
* [Doctest] UniSpeech Config for doctest by daspartho in 19596
* [Doctest] SEW-D Config for doctest by daspartho in 19598
* [Doctest] fix doc test for megatron bert by RamitPahwa in 19600
* Adding type hints for TFXLnet by thliang01 in 19344
* [Doctest] Add `configuration_bigbird_pegasus.py` and `configuration_big_bird.py` by Xabilahu in 19606
* Cast masks to np.uint8 before converting to PIL.Image.Image by amyeroberts in 19616
* [Whisper] Don't return attention mask in feat extractor by sanchit-gandhi in 19521
* [Time Series Transformer] Add doc tests by NielsRogge in 19607
* fix BLOOM ONNX config by NouamaneTazi in 19573
* Fix `test_tf_encode_plus_sent_to_model` for `TAPAS` by ydshieh in 19559
* Allow usage of TF Text BertTokenizer on TFBertTokenizer to make it servable on TF Serving by piEsposito in 19590
* add gloo backend support for CPU DDP by sywangyi in 19555
* Fix `ImageToTextPipelineTests.test_small_model_tf` by ydshieh in 19565
* Fix `FlaubertTokenizer` by ydshieh in 19552
* Visual Bert config for doctest by ztjhz in 19605
* GPTTokenizer dependency removed from deberta class by RamitPahwa in 19551
* xlm roberta config for doctest by ztjhz in 19609
* Ernie config for doctest by ztjhz in 19611
* xlm roberta xl config for doctest by ztjhz in 19610
* fix: small error by 0xflotus in 19612
* Improve error messaging for ASR pipeline. by Narsil in 19570
* [Doctest] LeViT Config for doctest by daspartho in 19622
* [Doctest] DistilBERT Config for doctest by daspartho in 19621
* [Whisper] Fix gradient checkpointing (again!) by sanchit-gandhi in 19548
* [Doctest] Add `configuration_resnet.py` by daspartho in 19620
* Fix whisper doc by ArthurZucker in 19608
* Sharding fails in TF when absolute scope was modified if `.` in layer name by ArthurZucker in 19124
* [Doctest] Add configuration_vision_text_dual_encoder.py by SD-13 in 19580
* [Doctest] Add configuration_vision_encoder_decoder.py by SD-13 in 19583
* [Doctest] Add configuration_time_series_transformer.py by SD-13 in 19582
* Tokenizer from_pretrained should not use local files named like tokenizer files by sgugger in 19626
* [Doctest] CodeGen config for doctest by AymenBer99 in 19633
* [Doctest] Add `configuration_data2vec_text.py` by daspartho in 19636
* [Doctest] Conditional DETR config for doctest by AymenBer99 in 19641
* [Doctest] XLNet config for doctest by AymenBer99 in 19649
* [Doctest] Add `configuration_trocr.py` by thliang01 in 19658
* Add doctest info in testingmdx by ArthurZucker in 19623
* Add pillow to layoutlmv3 example requirements.txt by Spacefish in 19663
* add return types for tf gptj, xlm, and xlnet by sirmammingtonham in 19638
* Fix pipeline predict transform methods by s-udhaya in 19657
* Type hints MCTCT by rchan26 in 19618
* added type hints for Yolos Pytorch model by WhiteWolf47 in 19545
* A few CI fixes for `DocumentQuestionAnsweringPipeline` by ankrgyl in 19584
* Removed Bert interdependency from Funnel transformer by mukesh663 in 19655
* fix warnings in deberta by sanderland in 19458
* word replacement line 231 by shreem-123 in 19662
* [Doctest] Add configuration_transfo_xl.py by thliang01 in 19651
* Update perf_train_gpu_one.mdx by cakiki in 19676
* object-detection instead of object_detection by Spacefish in 19677
* add return_tensor parameter for feature extraction by ajsanjoaquin in 19257
* Fix code examples of DETR and YOLOS by NielsRogge in 19669
* Revert "add return_tensor parameter for feature extraction by sgugger in 19257)"
* Fixed the docstring and type hint for forced_decoder_ids option in Ge… by koreyou in 19640
* Add normalize to image transforms module by amyeroberts in 19544
* [Doctest] Data2VecAudio Config for doctest by daspartho in 19635
* Update ESM checkpoints to point to `facebook/` by Rocketknight1 in 19675
* Removed XLMModel inheritance from FlaubertModel(torch+tf) by D3xter1922 in 19432
* [Examples] make default preprocessing_num_workers=1 by Yang-YiFan in 19684
* [Doctest] Add configuration_convbert.py by AymenBer99 in 19643
* [Doctest] Add configuration_realm.py by ak04p in 19646
* Update CONTRIBUTING.md by shreem-123 in 19689
* [Doctest] Add `configuration_data2vec_vision.py` by daspartho in 19637
* Fix some CI torch device issues for PyTorch 1.13 by ydshieh in 19681
* Fix checkpoint used in `VisualBertConfig` doc example by ydshieh in 19692
* Fix dtype in randomly initialized head by sgugger in 19690
* fix tests by ArthurZucker in 19670
* fix test whisper with new max length by ArthurZucker in 19668
* check decoder_inputs_embeds is None before shifting labels by ArthurZucker in 19671
* Fix docs by NielsRogge in 19687
* update documentation by ArthurZucker in 19706
* Improve DETR models by NielsRogge in 19644
* Small fixes for TF-ESM1b and ESM-1b weight conversions by Rocketknight1 in 19683
* Fix typo in perf docs by cakiki in 19705
* Fix redundant normalization of OWL-ViT text embeddings by alaradirik in 19712
* Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode by falcaopetri in 18351
* [Doctest] CVT config for doctest by AymenBer99 in 19695
* [Doctest] Add configuration_wav2vec2.py to documentation_tests.py by juancopi81 in 19698
* Fixed pegasus config doctest by mukesh663 in 19722
* fix seq2seqtrainer predict without labels by IvanSedykh in 19721
* add return_tensors parameter for feature_extraction 2 by Narsil in 19707
* Improving `image-segmentation` pipeline tests. by Narsil in 19710
* [Doctest] Adding config files for convnext by soma2000-lang in 19717
* [Doctest] Fixing doctest `configuration_pegasus_x.py` by mukesh663 in 19725
* Specify TF framework in TF-related pipeline tests by ydshieh in 19719
* Add docs by NielsRogge in 19729
* Fix activations being all the same module by sgugger in 19728
* add `accelerate` support for `Whisper` by younesbelkada in 19697
* Clean up deprecation warnings by Davidy22 in 19654
* Repo utils test by sgugger in 19696
* Add decorator to flaky test by amyeroberts in 19674
* [Doctest] Add doctest for `FlavaConfig` and `FNetConfig` by ndrohith09 in 19724
* Update contribution guide by stevhliu in 19700
* [Doctest] Add wav2vec2_conformer for doctest by juancopi81 in 19734
* [Doctest] XLM Config for doctest by AymenBer99 in 19685
* [Doctest] Add `configuration_clip.py` by daspartho in 19647
* [Doctest] GPTNeoConfig , GPTNeoXConfig , GPTNeoXJapaneseConfig by ndrohith09 in 19741
* Update modeling_markuplm.py by IMvision12 in 19723
* Fix issue 19300 by raghavanone in 19483
* [Doctest] Add `configuration_wavlm.py` by juancopi81 in 19749
* Specify TF framework explicitly in more pipeline tests by ydshieh in 19748
* Fix cache version file creation by sgugger in 19750
* Image transforms add center crop by amyeroberts in 19718
* [Doctest] Add `configuration_decision_transformer.py` by Xabilahu in 19751
* [Doctest] Add `configuration_detr.py` by Xabilahu in 19752
* Fixed spacing errors by shreya24ag in 19754
* All broken links were fixed in contributing file by mdfaizanahmed786 in 19760
* [Doctest] SpeechToTextTransformer Config for doctest by daspartho in 19757
* [Doctest] SqueezeBERT Config for doctest by daspartho in 19758
* [Doctest] SpeechToTextTransformer2 Config for doctest by daspartho in 19756
* [Doctest] OpenAIGPTConfig and OPTConfig by ndrohith09 in 19763
* `image-segmentation` pipeline: re-enable `small_model_pt` test. by Narsil in 19716
* Update modeling_layoutlmv3.py by IMvision12 in 19753
* adding key pair dataset by rohit1998 in 19765
* Fix exception thrown using MishActivation by chinoll in 19739
* [FLAX] Add dtype to embedding for gpt2 model by merrymercy in 18462
* TF: sample generation compatible with XLA and dynamic batch sizes by gante in 19773
* Install tf2onnx dev version by ydshieh in 19755
* Fix docker image build by ydshieh in 19759
* PT <-> TF for composite models by ydshieh in 19732
* Add warning about restarting runtime to import errors by Rocketknight1 in 19774
* Added support for multivariate independent emission heads by kashif in 19453
* Update `ImageToTextPipelineTests.test_small_model_tf` by ydshieh in 19785
* Make public versions of private tensor utils by sgugger in 19775
* Update training.mdx by ftorres16 in 19791
* [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. by davialvb in 19779
* Add sentencepiece to BertJapaneseTokenizer by conan1024hao in 19769
* Fix CTRL `test_torchscrip_xxx` CI by updating `_create_and_check_torchscript` by ydshieh in 19786
* Fix nightly test setup by sgugger in 19792
* Fix image segmentation pipeline errors, resolve backward compatibility issues by alaradirik in 19768
* Fix error/typo in docstring of TokenClassificationPipeline by pchr8 in 19798
* Use None to detect if truncation was unset by sgugger in 19794
* Generate: contrastive search test updates by gante in 19787
* Run some TF Whisper tests in subprocesses to avoid GPU OOM by ydshieh in 19772
* Added translation of run_scripts.mdx to Portuguese Issue 16824 by davialvb in 19800
* Generate: minor docstring fix by gante in 19801
* [Doctest] `MaskFormerConfig` doctest by sha016 in 19817
* [Doctest] Add `configuration_plbart.py` by ayaka14732 in 19809
* [Doctest] Add `configuration_poolformer.py` by ayaka14732 in 19808
* [Doctest] Add `configuration_electra.py` by ayaka14732 in 19807
* [Doctest] Add `configuration_nezha.py` by ayaka14732 in 19810
* Display the number of trainable parameters when launching a training by regisss in 19835
* replace reference to Datasets in metrics deprecation with Evaluate by angus-lherrou in 19812
* Fix OOM in Config doctest by ydshieh in 19840
* fix broken links in testing.mdx by XFFXFF in 19820
* fix image2test args forwarding by kventinel in 19648
* Added translation of converting_tensorflow_models.mdx to Portuguese Issue 16824 by davialvb in 19824
* Fix nightly CircleCI by ydshieh in 19837
* fixed typo in fp16 training section for perf_train_gpu_one by dsingal0 in 19736
* Update `LEDModelIntegrationTests` expected values by ydshieh in 19841
* Improve check copies by kventinel in 19829
* Fix doctest for `MarkupLM` by ydshieh in 19845
* add small updates only by stevhliu in 19847
* Refactor conversion function by sgugger in 19799
* Spanish translation of multiple_choice.mdx, question_answering.mdx. by alceballosa in 19821
* Fix doctest for `GenerationMixin.contrastive_search` by ydshieh in 19863
* Add missing lang tokens in M2M100Tokenizer.get_vocab by guillaumekln in 18416
* Added translation of serialization.mdx to Portuguese Issue 16824 by davialvb in 19869
* Generate: contrastive search cosmetic tweaks by gante in 19871
* [Past CI] Vilt only supports PT >= v1.10 by LysandreJik in 19851
* Fix incorrect model<->tokenizer mapping in tokenization testing by ydshieh in 19872
* Update doc for revision and token by sgugger in 19793
* Factored out some code in the `image-segmentation` pipeline. by Narsil in 19727
* [DOCTEST] Config doctest for `MCTCT`, `MBart` and `LayoutLM` by Revanth2002 in 19889
* Fix LR by regisss in 19875
* Correct README image text by KayleeDavisGitHub in 19883
* No conv bn folding in ipex to avoid warning by sanderland in 19870
* Add missing information on token_type_ids for roberta model by raghavanone in 19766
* Change the import of kenlm from github to pypi by raghavanone in 19770
* Update `max_diff` in `test_save_load_fast_init_to_base` by ydshieh in 19849
* Allow flax subfolder by patrickvonplaten in 19902
* `accelerate` support for `RoBERTa` family by younesbelkada in 19906
* Add checkpoint links in a few config classes by ydshieh in 19910
* Generate: contrastive search uses existing abstractions and conventions by gante in 19896
* Convert None logits processor/stopping criteria to empty list. by ccmaymay in 19880
* Some fixes regarding auto mappings and test class names by ydshieh in 19923
* Fix bug in Wav2Vec2's GPU tests by falcaopetri in 19803
* Fix warning when collating list of numpy arrays by sgugger in 19846
* Add type hints to TFPegasusModel by EdAbati in 19858
* Remove embarrassing debug print() in save_pretrained by Rocketknight1 in 19922
* Add `accelerate` support for M2M100 by younesbelkada in 19912
* Add RoBERTa resources by stevhliu in 19911
* Add T5 resources by stevhliu in 19878
* Add BLOOM resources by stevhliu in 19881
* Add GPT2 resources by stevhliu in 19879
* Let inputs of fast tokenizers be tuples as well as lists by sgugger in 19898
* Add `accelerate` support for BART-like models by younesbelkada in 19927
* Create dummy models by ydshieh in 19901
* Support segformer fx by dwlim-nota in 19924
* Use self._trial to generate trial_name for Trainer. by reyoung in 19874
* Add Onnx Config for ImageGPT by RaghavPrabhakar66 in 19868
* Update Code of Conduct to Contributor Covenant v2.1 by pankali in 19935
* add resources for bart by stevhliu in 19928
* add resources for distilbert by stevhliu in 19930
* Add wav2vec2 resources by stevhliu in 19931
* [Conditional, Deformable DETR] Add postprocessing methods by NielsRogge in 19709
* Fix ONNX tests for ONNX Runtime v1.13.1 by lewtun in 19950
* donut -> donut-swin by ydshieh in 19920
* [Doctest] Add configuration_deberta.py by Saad135 in 19968
* gradient checkpointing for GPT-NeoX by chiaolun in 19946
* [modelcard] Update for ASR by sanchit-gandhi in 19985
* [ASR] Update 'tasks' for model card by sanchit-gandhi in 19986
* Transformers documentation translation to Italian 17459 by draperkm in 19988
* Pin torch to < 1.13 temporarily by ydshieh in 19989
* Add support for gradient checkpointing by NielsRogge in 19990

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* arnaudstiegler
  * Make LayoutLM tokenizers independent from BertTokenizer (19351)
* asofiaoliveira
  * Make `XLMRoberta` model and config independent from `Roberta` (19359)
* srhrshr
  * Decouples `XLMProphet` model from `Prophet` (19406)
* Davidy22
  * Make bert_japanese and cpm independent of their inherited modules (19431)
  * Clean up deprecation warnings (19654)
* mathieujouffroy
  * [CvT] Tensorflow implementation (18597)
  * using trunc_normal for weight init & cls_token (19486)
* IMvision12
  * New (19481)
  * Added type hints to `DebertaV2ForMultipleChoice` Pytorch (19536)
  * Update modeling_markuplm.py (19723)
  * Update modeling_layoutlmv3.py (19753)
* 501Good
  * Make `MobileBert` tokenizers independent from `Bert` (19531)
* mukesh663
  * Removed Bert interdependency from Funnel transformer (19655)
  * Fixed pegasus config doctest (19722)
  * [Doctest] Fixing doctest `configuration_pegasus_x.py` (19725)
* D3xter1922
  * Removed XLMModel inheritance from FlaubertModel(torch+tf) (19432)
* falcaopetri
  * Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (18351)
  * Fix bug in Wav2Vec2's GPU tests (19803)
* gmftbyGMFTBY
  * Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (19477)
* davialvb
  * [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. (19779)
  * Added translation of run_scripts.mdx to Portuguese Issue 16824 (19800)
  * Added translation of converting_tensorflow_models.mdx to Portuguese Issue 16824 (19824)
  * Added translation of serialization.mdx to Portuguese Issue 16824 (19869)
* alceballosa
  * Spanish translation of multiple_choice.mdx, question_answering.mdx. (19821)
