Transformers

Latest version: v4.49.0

Safety actively analyzes 715033 Python packages for vulnerabilities to keep your Python projects secure.

Page 9 of 32

4.35.0

Not secure

New models

Distil-Whisper

Distil-Whisper is a distilled version of Whisper that is 6 times faster, 49% smaller, and performs within 1% word error rate (WER) on out-of-distribution data. It was proposed in the paper [Robust Knowledge Distillation via Large-Scale Pseudo Labelling](https://arxiv.org/abs/2311.00430).

Distil-Whisper copies the entire encoder from Whisper, meaning it retains Whisper's robustness to different audio conditions. It only copies 2 decoder layers, which significantly reduces the time taken to auto-regressively generate text tokens:

<img src="https://huggingface.co/datasets/distil-whisper/figures/resolve/main/architecture.png" width="800">

Distil-Whisper is MIT licensed and directly available in the Transformers library with chunked long-form inference, Flash Attention 2 support, and Speculative Decoding. For details on using the model, refer to the [following instructions](https://github.com/huggingface/distil-whisper#1-usage).

Joint work from sanchit-gandhi, patrickvonplaten and srush.

* [Assistant Generation] Improve Encoder Decoder by patrickvonplaten in 26701
* [WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding by patrickvonplaten in 27195
* [Whisper, Bart, MBart] Add Flash Attention 2 by patrickvonplaten in 27203

Fuyu

![image](https://huggingface.co/adept/fuyu-8b/resolve/main/architecture.png)

The Fuyu model was created by [ADEPT](https://www.adept.ai/blog/fuyu-8b), and authored by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar.

The authors introduced Fuyu-8B, a decoder-only multimodal model based on the classic transformers architecture, with query and key normalization. A linear encoder is added to create multimodal embeddings from image inputs.

By treating image tokens like text tokens and using a special image-newline character, the model knows when an image line ends. Image positional embeddings are removed. This avoids the need for different training phases for various image resolutions. With 8 billion parameters and licensed under CC-BY-NC, Fuyu-8B is notable for its ability to handle both text and images, its impressive context size of 16K, and its overall performance.

Joint work from molbap, pcuenca, amyeroberts, ArthurZucker

* Add fuyu model by molbap in 26911
* Fuyu: improve image processing by molbap in 27007

SeamlessM4T

![image](https://scontent-zrh1-1.xx.fbcdn.net/v/t39.2365-6/369889300_946056619819708_693331134612217694_n.jpg?_nc_cat=106&ccb=1-7&_nc_sid=e280be&_nc_ohc=HCeS3JOLmWMAX-5yppi&_nc_oc=AQlL8xNHXYCqfDCYJNa-43OYlM0gl5rEtWyqyx90wk49R8gJOA-vSEGpe4aI8h_-vpk&_nc_ht=scontent-zrh1-1.xx&oh=00_AfCuuW_IowORiJSdrLkDJ4NhAliqoCVQK3srt5r0lABoqg&oe=655DF560)

The SeamlessM4T model was proposed in [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team from Meta AI.

SeamlessM4T is a collection of models designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.

SeamlessM4T enables multiple tasks without relying on separate models:

- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)

[SeamlessM4TModel](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) can perform all the above tasks, but each task also has its own dedicated sub-model.

* Add Seamless M4T model by ylacombe in 25693

Kosmos-2

The KOSMOS-2 model was proposed in [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.

<div style="text-align: center">
<img src="https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/annotated_snowman.jpg" width="500" height="400" >
</div>

KOSMOS-2 is a Transformer-based causal language model and is trained using the next-word prediction task on a web-scale dataset of grounded image-text pairs [GRIT](https://huggingface.co/datasets/zzliang/GRIT). The spatial coordinates of the bounding boxes in the dataset are converted to a sequence of location tokens, which are appended to their respective entity text spans (for example, a snowman followed by <patch_index_0044><patch_index_0863>). The data format is similar to “hyperlinks” that connect the object regions in an image to their text span in the corresponding caption.

* Add `Kosmos-2` model by ydshieh in 24709

Owl-v2

OWLv2 was proposed in [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby. OWLv2 scales up [OWL-ViT](https://huggingface.co/docs/transformers/main/en/model_doc/owlvit) using self-training, which uses an existing detector to generate pseudo-box annotations on image-text pairs. This results in large gains over the previous state-of-the-art for zero-shot object detection.

* Add OWLv2, bis by NielsRogge in 26668

🚨🚨🚨 Safetensors by default for `torch` serialization 🚨🚨🚨

Version v4.35.0 now puts `safetensors` serialization by default. This is a significant change targeted at making users of the Hugging Face Hub, `transformers`, and any downstream library leveraging it safer.

The [`safetensors`](https://github.com/huggingface/safetensors) library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).

It was already the **default loading mechanism** since v4.30.0 and would therefore already default to loading `model.safetensors` files instead of `pytorch_model.bin` if these were present in the repository.

With v4.35.0, any call to `save_pretrained` for torch models will now save a `safetensors` file. This `safetensors` file is in the PyTorch format, but can be loaded in TensorFlow and Flax models alike.

⚠️ If you run into any issues with this, please let us know ASAP in the issues so that we may help you. Namely, the following errors may indicate something is up:
- Loading a `safetensors` file and having a warning mentioning missing weights unexpectedly
- Obtaining completely wrong/random results at inference after loading a pretrained model that you have saved in `safetensors`

If you wish to continue saving files in the `.bin` format, you can do so by specifying `safe_serialization=False` in all your `save_pretrained` calls.

* Safetensors serialization by default by LysandreJik in 27064

Chat templates

Chat templates have been expanded with the addition of the `add_generation_prompt` argument to `apply_chat_template()`. This has also enabled us to rework the [ConversationalPipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.ConversationalPipeline) class to use chat templates. Any model with a chat template is now automatically usable through `ConversationalPipeline`.

* Add add_generation_prompt argument to apply_chat_template by Rocketknight1 in 26573
* Conversation pipeline fixes by Rocketknight1 in 26795

Guides

Two new guides on LLMs were added the library:

* [docs] LLM prompting guide by MKhalusova in 26274
* [docs] Optimizing LLMs by patrickvonplaten in 26058

Quantization

Exllama-v2 integration

Exllama-v2 provides better GPTQ kernel for higher throughput and lower latency for GPTQ models. The original [code](https://github.com/turboderp/exllamav2) can be found here.

* add exllamav2 arg by SunMarc in 26437
* Add exllamav2 better by SunMarc in 27111

You will need the latest versions of `optimum` and `auto-gptq`. Read more about the integration [here](https://huggingface.co/docs/transformers/main/en/main_classes/quantization#exllama-kernels-for-faster-inference).

AWQ integration

AWQ is a new and popular quantization scheme, already used in various libraries such as TGI, vllm, etc. and known to be faster than GPTQ models according to some benchmarks. The original code can be found [here](https://github.com/mit-han-lab/llm-awq/) and [here](https://arxiv.org/abs/2306.00978) you can read more about the original paper.

![Screenshot 2023-10-24 at 17 56 56](https://github.com/huggingface/transformers/assets/49240599/9c956933-f1fa-45b1-904e-884249074cd0)

We support AWQ inference with original kernels as well as kernels provided through [`autoawq`](https://github.com/casper-hansen/AutoAWQ/) package that you can simply install with `pip install autoawq`.

* [`core` / `Quantization` ] AWQ integration by younesbelkada in 27045

We also provide an example script on [how to push quantized weights on the hub on the original repository](https://github.com/mit-han-lab/llm-awq/blob/main/examples/convert_to_hf.py).

Read more about the benchmarks and the integration [here](https://huggingface.co/docs/transformers/main/en/main_classes/quantization#awq-integration)

GPTQ on CPU !

You can now run GPTQ models on CPU using the latest version of `auto-gptq` thanks to vivekkhandelwal1 !

* Add support for loading GPTQ models on CPU by vivekkhandelwal1 in 26719

Attention mask refactor

We refactored the attention mask logic for major models in transformers. For instance, we removed `padding_mask` argument which was ambiguous for some users

* Remove ambiguous `padding_mask` and instead use a 2D->4D Attn Mask Mapper by patrickvonplaten in 26792
* [Attention Mask] Refactor all encoder-decoder attention mask by patrickvonplaten in 27086

Flash Attention 2 for more models + quantization fine-tuning bug fix

`Gpt-bigcode` (starcoder), whisper, Bart and MBart now supports FA-2 ! Use it by simply passing `use_flash_attention_2=True` to `from_pretrained`. Some bugfixes with respect to mixed precision training with FA2 have been also addressed.

* Add flash attention for `gpt_bigcode` by susnato in 26479
* [`FA2`] Fix flash attention 2 fine-tuning with Falcon by younesbelkada in 26852
* [Whisper, Bart, MBart] Add Flash Attention 2 by patrickvonplaten in 27203

A bugfix with respect to fine-tuning with FA-2 in bfloat16 was addressed. You should now smoothly fine-tune FA-2 models in bfloat16 using quantized base models.

* 🚨🚨🚨 [`Quantization`] Store the original dtype in the config as a private attribute 🚨🚨🚨 by younesbelkada in 26761
* [`FA-2`] Final fix for FA2 dtype by younesbelkada in 26846

Neftune

NEFTune is a new technique to boost Supervised Fine-tuning performance by adding random noise on the embedding vector. Read more about it on the original paper [here](https://arxiv.org/abs/2310.05914)

![Screenshot 2023-10-24 at 17 56 56](https://user-images.githubusercontent.com/49240599/274947323-81995c1d-16c3-4cfc-9fbc-0701a9a87531.png)

We propose a very simple API for users to benefit from this technique, simply pass a valid `neftune_noise_alpha` parameter to `TrainingArguments`

Read more about the API [here](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#boost-your-fine-tuning-performances-using-neftune)

* [FEAT] Add Neftune into transformers Trainer by younesbelkada in 27141

Gradient checkpointing refactor

We have refactored the gradient checkpointing API so that users can pass keyword arguments supported by `torch.utils.checkpoint.checkpoint` directly through `gradient_checkpointing_kwargs` when calling `gradient_checkpointing_enable()`, e.g.

python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})

`gradient_checkpointing_kwargs` is also supported with `Trainer` through `TrainingArguments`.

* [`Trainer` / `GC`] Add `gradient_checkpointing_kwargs` in trainer and training arguments by younesbelkada in 27068
* [`core`] Refactor of `gradient_checkpointing` by younesbelkada in 27020
* [`core`/ `GC` / `tests`] Stronger GC tests by younesbelkada in 27124
* Fix import of torch.utils.checkpoint by NielsRogge in 27155

The refactor should be totally backward compatible with previous behaviour. For superusers, you can still use the attribute `gradient_checkpointing` on model's submodules to control the activation / deactivation of gradient_checkpointing.

Breaking changes

* 🚨🚨🚨 [`Quantization`] Store the original dtype in the config as a private attribute 🚨🚨🚨 by younesbelkada in 26761
* 🚨🚨 Generate: change order of ops in beam sample to avoid nans by gante in 26843
* 🚨🚨 Raise error when no speaker embeddings in speecht5._generate_speech by ylacombe in 26418

Bugfixes and improvements

* [`Nougat`] from transformers import * by ArthurZucker in 26562
* [Whisper] Allow basic text normalization by sanchit-gandhi in 26149
* 🌐 [i18n-KO] Translated `semantic_segmentation.md` to Korean by jungnerd in 26515
* [Tokenizers] Skip tests temporarily by LysandreJik in 26574
* docs: feat: add clip notebook resources from OSSCA community by junejae in 26505
* Extend Trainer to enable Ascend NPU to use the fused Adamw optimizer when training by statelesshz in 26194
* feat: add trainer label to wandb run upon initialization by parambharat in 26466
* Docstring check by sgugger in 26052
* refactor: change default block_size by pphuc25 in 26229
* [Mistral] Update config docstring by sanchit-gandhi in 26593
* Add Copied from statements to audio feature extractors that use the floats_list function by dg845 in 26581
* Fix embarrassing typo in the doc chat template! by Rocketknight1 in 26596
* Fix encoder->decoder typo bug in convert_t5x_checkpoint_to_pytorch.py by soyoung97 in 26587
* skip flaky hub tests by ArthurZucker in 26594
* Update mistral.md to update 404 link by Galland in 26590
* [Wav2Vec2] Fix tokenizer set lang by sanchit-gandhi in 26349
* add zh translation for installation by yyLeaves in 26084
* [ `NougatProcessor`] Fix the default channel by ArthurZucker in 26608
* [`GPTNeoX`] Faster rotary embedding for GPTNeoX (based on llama changes) by ArthurZucker in 25830
* [Falcon] Set `use_cache=False` before creating `presents` which relies on `use_cache` by yundai424 in 26328
* Fix failing tests on `main` due to torch 2.1 by ydshieh in 26607
* Make `ModelOutput` serializable by cbensimon in 26493
* [`core`] fix silent bug `keep_in_fp32` modules by younesbelkada in 26589
* 26566 swin2 sr allow in out channels by marvingabler in 26568
* Don't close ClearML task if it was created externally by eugen-ajechiloae-clearml in 26614
* Fix `transformers-pytorch-gpu` docker build by ydshieh in 26615
* [docs] Update to scripts building index.md by MKhalusova in 26546
* Don't install `pytorch-quantization` in Doc Builder docker file by ydshieh in 26622
* Remove unnecessary `view`s of `position_ids` by ramiro050 in 26059
* Fixed inconsistency in several fast tokenizers by Towdo in 26561
* Update tokenization_code_llama_fast.py by andyl98 in 26576
* Remove unnecessary unsqueeze - squeeze in rotary positional embedding by fxmarty in 26162
* Update chat template docs with more tips on writing a template by Rocketknight1 in 26625
* fix RoPE t range issue for fp16 by rui-ren in 26602
* Fix failing `MusicgenTest .test_pipeline_text_to_audio` by ydshieh in 26586
* remove SharedDDP as it is deprecated by statelesshz in 25702
* [`LlamaTokenizerFast`] Adds edge cases for the template processor by ArthurZucker in 26606
* [docstring] Fix docstring for `AlbertConfig` by ydshieh in 26636
* docs(zh): review and punctuation & space fix by wfjsw in 26627
* [DINOv2] Convert more checkpoints by NielsRogge in 26177
* Fixed malapropism error by Zhreyu in 26660
* fix links in README.md for the GPT, GPT-2, and Llama2 Models by dcarpintero in 26640
* Avoid CI OOM by ydshieh in 26639
* fix typos in idefics.md by dribnet in 26648
* [docstring] Fix docstring CLIP configs by isaac-chung in 26677
* [docstring] Fix docstring for `CLIPImageProcessor` by isaac-chung in 26676
* [docstring] Fix docstring for DonutImageProcessor by abzdel in 26641
* Fix stale bot by LysandreJik in 26692
* [docstring] Fix docstrings for `CLIP` by isaac-chung in 26691
* Control first downsample stride in ResNet by jiqing-feng in 26374
* Fix Typo: table in deepspeed.md by Pairshoe in 26705
* [docstring] Fix docstring for `LlamaConfig` by pavaris-pm in 26685
* fix a typo in flax T5 attention - attention_mask variable is misnamed by giganttheo in 26663
* Fix source_prefix default value by jheitmann in 26654
* [JAX] Replace uses of `jnp.array` in types with `jnp.ndarray`. by hvaara in 26703
* Make Whisper Encoder's sinusoidal PE non-trainable by default by gau-nernst in 26032
* In assisted decoding, pass model_kwargs to model's forward call (fix prepare_input_for_generation in all models) by sinking-point in 25242
* Update docs to explain disabling callbacks using report_to by nebrelbug in 26155
* `Copied from` for test files by ydshieh in 26713
* [docstring] `SwinModel` docstring fix by shivanandmn in 26679
* fix the model card issue as `use_cuda_amp` is no more available by pacman100 in 26731
* Fix stale bot for locked issues by LysandreJik in 26711
* Fix checkpoint path in `no_trainer` scripts by muellerzr in 26733
* Update docker files to use `torch==2.1.0` by ydshieh in 26735
* Revert 20715 by ydshieh in 26734
* [docstring] Fix docstring for `LlamaTokenizer` and `LlamaTokenizerFast` by minhoryang in 26669
* [docstring] Fix docstring for `CodeLlamaTokenizer` by Bojun-Feng in 26709
* add japanese documentation by rajveer43 in 26138
* Translated the accelerate.md file of the documentation to Chinese by liteli1987gmail in 26161
* Fix doctest for `Blip2ForConditionalGeneration` by ydshieh in 26737
* Add many missing spaces in adjacent strings by tomaarsen in 26751
* Warnings controlled by logger level by LysandreJik in 26527
* Fix `PersimmonIntegrationTest` OOM by ydshieh in 26750
* Fix `MistralIntegrationTest` OOM by ydshieh in 26754
* Fix backward compatibility of Conversation by wdhorton in 26741
* [docstring] Fix `UniSpeech`, `UniSpeechSat`, `Wav2Vec2ForCTC` by gizemt in 26664
* [docstring] Update `GPT2` and `Whisper` by McDonnellJoseph in 26642
* [docstring] Fix docstring for 'BertGenerationConfig' by AdwaitSalankar in 26661
* Fix `PerceiverModelIntegrationTest::test_inference_masked_lm` by ydshieh in 26760
* chore: fix typos by afuetterer in 26756
* [`core`] Fix fa-2 import by younesbelkada in 26785
* Skip `TrainerIntegrationFSDP::test_basic_run_with_cpu_offload` if `torch < 2.1` by ydshieh in 26764
* 🌐 [i18n-KO] Translated `big_models.md` to Korean by wonhyeongseo in 26245
* Update expect outputs of `IdeficsProcessorTest.test_tokenizer_padding` by ydshieh in 26779
* [docstring] Fix docstring for `RwkvConfig` by Bojun-Feng in 26782
* Fix num. of minimal calls to the Hub with peft for pipeline by ydshieh in 26385
* [docstring] fix docstring `DPRConfig` by AVAniketh0905 in 26674
* Disable default system prompt for LLaMA by Rocketknight1 in 26765
* Fix Falcon generation test by Rocketknight1 in 26770
* Fixed KeyError for Mistral by MatteoRaso in 26682
* [`Flava`] Fix flava doc by younesbelkada in 26789
* Add CLIP resources by eenzeenee in 26534
* translation brazilian portuguese by alvarorichard in 26769
* Fixed typos by Zhreyu in 26810
* [docstring] Fix docstring for `CanineConfig` by Sparty in 26771
* Add Japanese translation by shinshin86 in 26799
* [docstring] Fix docstring for `CodeLlamaTokenizerFast` by Bojun-Feng in 26666
* Image-to-Image Task Guide by merveenoyan in 26595
* Make fsdp ram efficient loading optional by pacman100 in 26631
* fix resume_from_checkpoint bug by Jintao-Huang in 26739
* [OWL-ViT, OWLv2] Add resources by NielsRogge in 26822
* Llama tokenizer: remove space in template comment by pcuenca in 26788
* Better way to run AMD CI with different flavors by ydshieh in 26634
* [docstring] Fix bert generation tokenizer by przemL in 26820
* Conversation pipeline fixes by Rocketknight1 in 26795
* Fix Mistral OOM again by ydshieh in 26847
* Chore: Typo fixed in multiple files of docs/source/en/model_doc by SusheelThapa in 26833
* fix: when window_size is passes as array by dotneet in 26800
* Update logits_process.py docstrings to clarify penalty and reward cases (attempt 2) by larekrow in 26784
* [docstring] Fix docstring for LukeConfig by louietouie in 26858
* Fixed a typo in mistral.md by DTennant in 26879*
* Translating `en/internal` folder docs to Japanese 🇯🇵 by rajveer43 in 26747
* Fix TensorFlow pakage check by jayfurmanek in 26842
* Generate: improve docstrings for custom stopping criteria by gante in 26863
* Knowledge distillation for vision guide by merveenoyan in 25619
* Fix Seq2seqTrainer decoder attention mask by Rocketknight1 in 26841
* [`Tokenizer`] Fix slow and fast serialization by ArthurZucker in 26570
* Emergency PR to skip conversational tests to fix CI by Rocketknight1 in 26906
* Add default template warning by Rocketknight1 in 26637
* Refactor code part in documentation translated to japanese by rajveer43 in 26900
* [i18n-ZH] Translated fast_tokenizers.md to Chinese by yyLeaves in 26910
* [`FA-2`] Revert suggestion that broke FA2 fine-tuning with quantized models by younesbelkada in 26916
* [docstring] Fix docstring for `ChineseCLIP` by Sparty in 26880
* [Docs] Make sure important decode and generate method are nicely displayed in Whisper docs by patrickvonplaten in 26927
* Fix and re-enable ConversationalPipeline tests by Rocketknight1 in 26907
* [docstring] Fix docstrings for `CodeGen` by daniilgaltsev in 26821
* Fix license by MedAymenF in 26931
* Pin Keras for now by Rocketknight1 in 26904
* [`FA-2` / `Mistral`] Supprot fa-2 + right padding + forward by younesbelkada in 26912
* Generate: update basic llm tutorial by gante in 26937
* Corrected modalities description in README_ru.md by letohx in 26913
* [docstring] Fix docstring for speech-to-text config by R055A in 26883
* fix set_transform link docs by diegulio in 26856
* Fix Fuyu image scaling bug by pcuenca in 26918
* Update README_hd.md by biswabaibhab007 in 26872
* Added Telugu [te] translations by hakunamatata1997 in 26828
* fix logit-to-multi-hot conversion in example by ranchlai in 26936
* Limit to inferior fsspec version by LysandreJik in 27010
* python falcon doc-string example typo by SoyGema in 26995
* skip two tests by ArthurZucker in 27013
* Nits in Llama2 docstring by osanseviero in 26996
* Change default `max_shard_size` to smaller value by younesbelkada in 26942
* [`NLLB-MoE`] Fix NLLB MoE 4bit inference by younesbelkada in 27012
* [`SeamlessM4T`] fix copies with NLLB MoE int8 by ArthurZucker in 27018
* small typos found by rafaelpadilla in 26988
* Remove token_type_ids from default TF GPT-2 signature by Rocketknight1 in 26962
* Translate `pipeline_tutorial.md` to chinese by jiaqiw09 in 26954
* 🌐 [i18n-ZH] Translate multilingual into Chinese by yyLeaves in 26935
* translate `preprocessing.md` to Chinese by jiaqiw09 in 26955
* Bugfix device map detr model by pedrogengo in 26849
* Fix little typo by mertyyanik in 27028
* 🌐 [i18n-ZH] Translate create_a_model.md into Chinese by yyLeaves in 27026
* Fix key dtype in GPTJ and CodeGen by fxmarty in 26836
* Register ModelOutput as supported torch pytree nodes by XuehaiPan in 26618
* Add `default_to_square_for_size` to `CLIPImageProcessor` by ydshieh in 26965
* Add descriptive docstring to WhisperTimeStampLogitsProcessor by jprivera44 in 25642
* Normalize only if needed by mjamroz in 26049
* [`TFxxxxForSequenceClassifciation`] Fix the eager mode after 25085 by ArthurZucker in 25751
* Safe import of rgb_to_id from FE modules by amyeroberts in 27037
* add info on TRL docs by lvwerra in 27024
* Add fuyu device map by SunMarc in 26949
* Device agnostic testing by vvvm23 in 25870
* Fix config silent copy in from_pretrained by patrickvonplaten in 27043
* [docs] Performance docs refactor p.2 by MKhalusova in 26791
* Add a default decoder_attention_mask for EncoderDecoderModel during training by hackyon in 26752
* Fix RoPE config validation for FalconConfig + various config typos by tomaarsen in 26929
* Skip-test by ArthurZucker in 27062
* Fix TypicalLogitsWarper tensor OOB indexing edge case by njhill in 26579
* [docstring] fix incorrect llama docstring: encoder -> decoder by ztjhz in 27071
* [DOCS] minor fixes in README.md by Akash190104 in 27048
* [`docs`] Add `MaskGenerationPipeline` in docs by younesbelkada in 27063
* 🌐 [i18n-ZH] Translate custom_models.md into Chinese by yyLeaves in 27065
* Hindi translation of pipeline_tutorial.md by AaryaBalwadkar in 26837
* Handle unsharded Llama2 model types in conversion script by coreyhu in 27069
* Bring back `set_epoch` for Accelerate-based dataloaders by muellerzr in 26850
* Bump`flash_attn` version to `2.1` by younesbelkada in 27079
* Remove unneeded prints in modeling_gpt_neox.py by younesbelkada in 27080
* Add-support for commit description by ArthurZucker in 26704
* [Llama FA2] Re-add _expand_attention_mask and clean a couple things by patrickvonplaten in 27074
* Correct docstrings and a typo in comments by lewis-yeung in 27047
* Save TB logs as part of push_to_hub by muellerzr in 27022
* Added huggingface emoji instead of the markdown format by shettyvarshaa in 27091
* [`T5Tokenizer`] Fix fast and extra tokens by ArthurZucker in 27085
* Revert "add exllamav2 arg" by ArthurZucker in 27102
* Add early stopping for Bark generation via logits processor by isaac-chung in 26675
* Provide alternative when warning on use_auth_token by Wauplin in 27105
* Fix no split modules underlying modules by SunMarc in 27090
* [`core`/ `gradient_checkpointing`] Refactor GC - part 2 by younesbelkada in 27073
* fix detr device map by SunMarc in 27089
* Added Telugu [te] translation for README.md in main by hakunamatata1997 in 27077
* translate transformers_agents.md to Chinese by jiaqiw09 in 27046
* Fix docstring and type hint for resize by daniilgaltsev in 27104
* [Typo fix] flag config in WANDB by SoyGema in 27130
* Fix slack report failing for doctest by ydshieh in 27042
* [`FA2`/ `Mistral`] Revert previous behavior with right padding + forward by younesbelkada in 27125
* Fix data2vec-audio note about attention mask by gau-nernst in 27116
* remove the obsolete code related to fairscale FSDP by statelesshz in 26651
* Fix some tests using `"common_voice"` by ydshieh in 27147
* [`tests` / `Quantization`] Fix bnb test by younesbelkada in 27145
* make tests of pytorch_example device agnostic by statelesshz in 27081
* Remove some Kosmos-2 `copied from` by ydshieh in 27149
* 🌐 [i18n-ZH] Translate serialization.md into Chinese by yyLeaves in 27076
* Translating `en/main_classes` folder docs to Japanese 🇯🇵 by rajveer43 in 26894
* Device agnostic trainer testing by statelesshz in 27131
* Fix: typos in README.md by THEFZNKHAN in 27154
* [KOSMOS-2] Update docs by NielsRogge in 27157
* deprecate function `get_default_device` in `tools/base.py` by statelesshz in 26774
* Remove broken links to s-JoL/Open-Llama by CSRessel in 27164
* [docstring] Fix docstring for AltCLIPTextConfig, AltCLIPVisionConfig and AltCLIPConfig by AksharGoyal in 27128
* [doctring] Fix docstring for BlipTextConfig, BlipVisionConfig by Hangsiin in 27173
* Disable CI runner check by ydshieh in 27170
* fix: Fix typical_p behaviour broken in recent change by njhill in 27165
* Trigger CI if `tiny_model_summary.json` is modified by ydshieh in 27175
* Shorten the conversation tests for speed + fixing position overflows by Rocketknight1 in 26960
* device agnostic pipelines testing by statelesshz in 27129
* Backward compatibility fix for the Conversation class by Rocketknight1 in 27176
* [`Quantization` / `tests` ] Fix bnb MPT test by younesbelkada in 27178
* Fix dropout in `StarCoder` by susnato in 27182
* translate traning.md to chinese by jiaqiw09 in 27122
* [docs] Update CPU/GPU inference docs by stevhliu in 26881
* device agnostic models testing by statelesshz in 27146
* Unify warning styles for better readability by oneonlee in 27184
* 🌐 [i18n-ZH] Translate tflite.md into Chinese by yyLeaves in 27134
* device agnostic fsdp testing by statelesshz in 27120
* Fix docstring get maskformer resize output image size by wesleylp in 27196
* Fix the typos and grammar mistakes in CONTRIBUTING.md. by THEFZNKHAN in 27193
* Fixing docstring in get_resize_output_image_size function by wesleylp in 27191
* added unsqueeze_dim to apply_rotary_pos_emb by ShashankMosaicML in 27117
* Added cache_block_outputs option to enable GPTQ for non-regular models by AlexKoff88 in 27032
* Add TensorFlow implementation of ConvNeXTv2 by neggles in 25558
* Fix docstring in get_oneformer_resize_output_image_size func by wesleylp in 27207
* improving TimmBackbone to support FrozenBatchNorm2d by rafaelpadilla in 27160
* Translate task summary to chinese by jiaqiw09 in 27180
* Fix CPU offload + disk offload tests by LysandreJik in 27204
* Enable split_batches through TrainingArguments by muellerzr in 26798
* support bf16 by etemadiamd in 25879
* Reproducible checkpoint for npu by statelesshz in 27208
* [`core` / `Quantization`] Fix for 8bit serialization tests by younesbelkada in 27234

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* jungnerd
* 🌐 [i18n-KO] Translated `semantic_segmentation.md` to Korean (26515)
* statelesshz
* Extend Trainer to enable Ascend NPU to use the fused Adamw optimizer when training (26194)
* remove SharedDDP as it is deprecated (25702)
* remove the obsolete code related to fairscale FSDP (26651)
* make tests of pytorch_example device agnostic (27081)
* Device agnostic trainer testing (27131)
* deprecate function `get_default_device` in `tools/base.py` (26774)
* device agnostic pipelines testing (27129)
* device agnostic models testing (27146)
* device agnostic fsdp testing (27120)
* Reproducible checkpoint for npu (27208)
* sgugger
* Docstring check (26052)
* yyLeaves
* add zh translation for installation (26084)
* [i18n-ZH] Translated fast_tokenizers.md to Chinese (26910)
* 🌐 [i18n-ZH] Translate multilingual into Chinese (26935)
* 🌐 [i18n-ZH] Translate create_a_model.md into Chinese (27026)
* 🌐 [i18n-ZH] Translate custom_models.md into Chinese (27065)
* 🌐 [i18n-ZH] Translate serialization.md into Chinese (27076)
* 🌐 [i18n-ZH] Translate tflite.md into Chinese (27134)
* sinking-point
* In assisted decoding, pass model_kwargs to model's forward call (fix prepare_input_for_generation in all models) (25242)
* rajveer43
* add japanese documentation (26138)
* Translating `en/internal` folder docs to Japanese 🇯🇵 (26747)
* Refactor code part in documentation translated to japanese (26900)
* Translating `en/main_classes` folder docs to Japanese 🇯🇵 (26894)
* alvarorichard
* translation brazilian portuguese (26769)
* hakunamatata1997
* Added Telugu [te] translations (26828)
* Added Telugu [te] translation for README.md in main (27077)
* jiaqiw09
* Translate `pipeline_tutorial.md` to chinese (26954)
* translate `preprocessing.md` to Chinese (26955)
* translate transformers_agents.md to Chinese (27046)
* translate traning.md to chinese (27122)
* Translate task summary to chinese (27180)
* neggles
* Add TensorFlow implementation of ConvNeXTv2 (25558)

4.34.1

Not secure

A patch release was made for the following three commits:
- Add add_generation_prompt argument to apply_chat_template (https://github.com/huggingface/transformers/pull/26573)
- Fix backward compatibility of Conversation (https://github.com/huggingface/transformers/pull/26741)
- [Tokenizer] Fix slow and fast serialization (https://github.com/huggingface/transformers/pull/26570)

4.34.0

Not secure

New models

Mistral

Mistral-7B-v0.1 is a decoder-based LM with the following architectural choices:

- Sliding Window Attention - Trained with 8k context length and fixed cache size, with a theoretical attention span of 128K tokens
- GQA (Grouped Query Attention) - allowing faster inference and lower cache size.
- Byte-fallback BPE tokenizer - ensures that characters are never mapped to out-of-vocabulary tokens.

* [Mistral] Mistral-7B-v0.1 support by Bam4d in 26447

Persimmon

The authors introduced Persimmon-8B, a decoder model based on the classic transformers architecture, with query and key normalization. Persimmon-8B is a fully permissively licensed model with approximately 8 billion parameters, released under the Apache license. Some of the key attributes of Persimmon-8B are long context size (16K), performance, and capabilities for multimodal extensions.

* [`Persimmon`] Add support for persimmon by ArthurZucker in 26042

BROS

BROS stands for BERT Relying On Spatiality. It is an encoder-only Transformer model that takes a sequence of tokens and their bounding boxes as inputs and outputs a sequence of hidden states. BROS encode relative spatial information instead of using absolute spatial information.

* Add BROS by jinhopark8345 in 23190

ViTMatte

ViTMatte leverages plain [Vision Transformers](https://huggingface.co/docs/transformers/main/en/model_doc/vit) for the task of image matting, which is the process of accurately estimating the foreground object in images and videos.

* Add ViTMatte by NielsRogge in 25843

Nougat

Nougat uses the same architecture as [Donut](https://huggingface.co/docs/transformers/main/en/model_doc/donut), meaning an image Transformer encoder and an autoregressive text Transformer decoder to translate scientific PDFs to markdown, enabling easier access to them.

* Add Nougat by NielsRogge and molbap in 25942

Prompt templating

We've added a new [template](https://huggingface.co/docs/transformers/main/chat_templating) feature for chat models. This allows the formatting that a chat model was trained with to be saved with the model, ensuring that users can exactly reproduce that formatting when they want to fine-tune the model or use it for inference. For more information, see [our template documentation](https://huggingface.co/docs/transformers/main/chat_templating).

* Overhaul Conversation class and prompt templating by Rocketknight1 in 25323

🚨🚨 Tokenizer refactor

* [`Tokenizer`] attemp to fix add_token issues by ArthurZucker in 23909
* Nit-added-tokens by ArthurZucker in 26538 adds some fix to 23909 .

🚨Workflow Changes 🚨:

These are not breaking changes per se but rather bugfixes. However, we understand that this may result in some workflow changes so we highlight them below.

- unique_no_split_tokens attribute removed and not used in the internal logic
- sanitize_special_tokens() follows a deprecation cycle and does nothing
- All attributes in SPECIAL_TOKENS_ATTRIBUTES are stored as AddedTokens and no strings.
- loading a slow from a fast or a fast from a slow will no longer raise and error if the tokens added don't have the correct index. This is because they will always be added following the order of the added_tokens but will correct mistakes in the saved vocabulary if there are any. (And there are a lot in old format tokenizers)
- the length of a tokenizer is now max(set(self.get_vocab().keys())) accounting for holes in the vocab. The vocab_size no longer takes into account the added vocab for most of the tokenizers (as it should not). Mostly breaking for T5
- Adding a token using tokenizer.add_tokens([AddedToken("hey", rstrip=False, normalized=True)]) now takes into account rstrip, lstrip, normalized information.
- added_tokens_decoder holds AddedToken, not strings.
- add_tokens() for both fast and slow will always be updated if the token is already part of the vocab, allowing for custom stripping.
- initializing a tokenizer form scratch will now add missing special tokens to the vocab.
- stripping is not always done for special tokens! 🚨 Only if the AddedToken has lstrip=True and rstrip=True
- fairseq_ids_to_tokens attribute removed for Barthez (was not used)

➕ Most visible features:
- printing a tokenizer now shows `tokenizer.added_tokens_decoder` for both fast and slow tokenizers. Moreover, additional tokens that were already part of the initial vocab are also found there.
- faster `from_pretrained`, faster `add_tokens` because special and non special can be mixed together and the trie is not always rebuilt.
- faster encode/decode with caching mechanism for `added_tokens_decoder/encoder`.
- information is fully saved in the `tokenizer_config.json`

**For any issues relating to this, make sure to open a new issue and ping ArthurZucker.**

Flash Attention 2

FA2 support added to transformers for most popular architectures (llama, mistral, falcon) architectures actively being contributed in this issue (https://github.com/huggingface/transformers/issues/26350). Simply pass `use_flash_attention_2=True` when calling `from_pretrained`

In the future, PyTorch will support Flash Attention 2 through [`torch.scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html), users would be able to benefit from both (transformers core & transformers + SDPA) implementations of Flash Attention-2 with simple changes (`model.to_bettertransformer()` and force-dispatch the SDPA kernel to FA-2 in the case of SDPA)

* [`core` ] Integrate Flash attention 2 in most used models by younesbelkada in 25598

For our future plans regarding integrating F.sdpa from PyTorch in core transformers, see here: https://github.com/huggingface/transformers/issues/26557

Lazy import structure

Support for lazy loading integration libraries has been added. This will drastically speed up importing `transformers` and related object from the library.

Example before this change:

2023-09-11 11:07:52.010179: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
python3 -c "from transformers import CLIPTextModel" 3.31s user 3.06s system 220% cpu 2.893 total

After this change:

python3 -c "from transformers import CLIPTextModel" 1.70s user 1.49s system 220% cpu 1.447 total

* [Core] Add lazy import structure to imports by patrickvonplaten in 26090

Bugfixes and improvements

* Fix typo by susnato in 25966
* Fix Detr CI by ydshieh in 25972
* Fix `test_load_img_url_timeout` by ydshieh in 25976
* nn.Identity is not required to be compatible with PyTorch < 1.1.0 as the minimum PyTorch version we currently support is 1.10.0 by statelesshz in 25974
* Add `Pop2Piano` space demo. by susnato in 25975
* fix typo by kai01ai in 25981
* Use main in conversion script by ydshieh in 25973
* [doc] Always call it Agents for consistency by julien-c in 25958
* Update RAG README.md with correct path to examples/seq2seq by tleyden in 25953
* Update training_args.py to remove the runtime error by sahel-sh in 25920
* Trainer: delegate default generation values to `generation_config` by gante in 25987
* Show failed tests on CircleCI layout in a better way by ydshieh in 25895
* Patch with accelerate xpu by abhilash1910 in 25714
* PegasusX add _no_split_modules by andreeahedes in 25933
* Add TFDebertaV2ForMultipleChoice by raghavanone in 25932
* deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler by pacman100 in 25863
* [Wav2Vec2 Conformer] Fix inference float16 by sanchit-gandhi in 25985
* Add LLaMA resources by eenzeenee in 25859
* [`CI`] Fix red CI and ERROR failed should show by ArthurZucker in 25995
* [`VITS`] tokenizer integration test: fix revision did not exist by ArthurZucker in 25996
* Fix Mega chunking error when using decoder-only model by tanaymeh in 25765
* save space when converting hf model to megatron model. by flower-with-safe in 25950
* Update README.md by NinoRisteski in 26003
* Falcon: fix revision propagation by LysandreJik in 26006
* TF-OPT attention mask fixes by Rocketknight1 in 25238
* Fix small typo README.md by zspo in 25934
* 🌐[i18n-KO] Translated `llm_tutorial.md` to Korean by harheem in 25791
* Remove Falcon from undocumented list by Rocketknight1 in 26008
* modify context length for GPTQ + version bump by SunMarc in 25899
* Fix err with FSDP by muellerzr in 25991
* fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 by kai01ai in 26024
* Fix CircleCI config by ydshieh in 26023
* Add `tgs` speed metrics by CokeDong in 25858
* [VITS] Fix nightly tests by sanchit-gandhi in 25986
* Added HerBERT to README.md by Muskan011 in 26020
* Fix vilt config docstring parameter to match value in init by raghavanone in 26017
* Punctuation fix by kwonmha in 26025
* Try to fix training Loss inconsistent after resume from old checkpoint by dumpmemory in 25872
* Fix Dropout Implementation in Graphormer by alexanderkrauck in 24817
* Update missing docs on `activation_dropout` and fix DropOut docs for SEW-D by gau-nernst in 26031
* Skip warning if tracing with dynamo by angelayi in 25581
* 🌐 [i18n-KO] Translated `llama.md` to Korean by harheem in 26044
* [`CodeLlamaTokenizerFast`] Fix fix `set_infilling_processor` to properly reset by ArthurZucker in 26041
* [`CITests`] skip failing tests until 26054 is merged by ArthurZucker in 26063
* only main process should call _save on deepspeed zero3 by zjjMaiMai in 25959
* docs: update link huggingface map by pphuc25 in 26077
* docs: add space to docs by pphuc25 in 26067
* [`core`] Import tensorflow inside relevant methods in `trainer_utils` by younesbelkada in 26106
* Generate: legacy mode is only triggered when `generation_config` is untouched by gante in 25962
* Update logits_process.py docstrings by larekrow in 25971
* Fix ExponentialDecayLengthPenalty negative logits issue by pokjay in 25594
* 🌐 [i18n-KO] Translated `llama2.md` to Korean by mjk0618 in 26047
* [docs] Updates to TTS task guide with regards to the new TTS pipeline by MKhalusova in 26095
* 🌐 [i18n-KO] Translated `contributing.md` to Korean by mjk0618 in 25877
* enable optuna multi-objectives feature by sywangyi in 25969
* chore: correct update_step and correct gradient_accumulation_steps by pphuc25 in 26068
* Text2text pipeline: don't parameterize from the config by gante in 26118
* Fix `MarianTokenizer` to remove metaspace character in `decode` by tanaymeh in 26091
* safeguard torch distributed check by pacman100 in 26056
* fix the deepspeed tests by pacman100 in 26021
* Fix AutoTokenizer docstring typo by amyeroberts in 26117
* [`core`] fix 4bit `num_parameters` by younesbelkada in 26132
* Add missing space in generation/utils.py by jbochi in 26121
* Update spectrogram and waveform model mapping for TTS/A pipeline by Vaibhavs10 in 26114
* [`RWKV`] Final fix RWMV 4bit by younesbelkada in 26134
* docs: feat: add llama2 notebook resources from OSSCA community by junejae in 26076
* Generate: ignore warning when `generation_config.max_length` is set to `None` by gante in 26147
* Fix `test_finetune_bert2bert` by ydshieh in 25984
* Falcon: batched generation by gante in 26137
* Fix `beam_scores` shape when token scores shape changes after `logits_processor` by BakerBunker in 25980
* Update training_args.py - addition of self.distributed_state when using XPU by Serizao in 25999
* [docs] last hidden state vs hidden_states[-1] by MKhalusova in 26142
* Flex xpu bug fix by abhilash1910 in 26135
* Add missing Maskformer dataclass decorator, add dataclass check in ModelOutput for subclasses by rachthree in 25638
* Fix eval accumulation when `accelerate` > 0.20.3 by sam-scale in 26060
* [Whisper Tokenizer] Encode timestamps by sanchit-gandhi in 26054
* [`PEFT`] Fix PEFT + gradient checkpointing by younesbelkada in 25846
* [MusicGen] Add streamer to generate by sanchit-gandhi in 25320
* Fix beam search when using model parallel by pfldy2850 in 24969
* [MusicGen] Add sampling rate to config by sanchit-gandhi in 26136
* [Whisper] Fix word-level timestamps for audio < 30 seconds by xenova in 25607
* [BLIP-2] Improve conversion script by NielsRogge in 24854
* IDEFICS: allow interpolation of vision's pos embeddings by leot13 in 26029
* [TTA Pipeline] Test MusicGen and VITS by sanchit-gandhi in 26146
* Tweaks to Chat Templates docs by Rocketknight1 in 26168
* [Whisper] Check length of prompt + max new tokens by sanchit-gandhi in 26164
* Update notebook.py to support multi eval datasets by matrix1001 in 25796
* Fix pad to multiple of by ArthurZucker in 25732
* [docs] IDEFICS guide and task guides restructure by MKhalusova in 26035
* [PEFT] Allow PEFT model dict to be loaded by patrickvonplaten in 25721
* No doctest for `convert_bros_to_pytorch.py` by ydshieh in 26212
* Remove `utils/documentation_tests.txt` by ydshieh in 26213
* moved `ctrl` to `Salesforce/ctrl` by julien-c in 26183
* Fix ConversationalPipeline tests by Rocketknight1 in 26217
* [FSMT] Fix non-shared weights by LysandreJik in 26187
* refactor decay_parameters production into its own function by shijie-wu in 26152
* refactor: change default block_size in block size > max position embeddings by pphuc25 in 26069
* [Wav2Vec2-Conf / LLaMA] Style fix by sanchit-gandhi in 26188
* [Permisson] Style fix by sanchit-gandhi in 26228
* [Check] Fix config docstring by sanchit-gandhi in 26222
* 🌐 [i18n-KO] Translated `whisper.md` to Korean by nuatmochoi in 26002
* Create the return value on device to avoid unnecessary copying from CPU by mksit in 26151
* [AutoBackbone] Add test by NielsRogge in 26094
* Update README.md by NinoRisteski in 26198
* Update add_new_pipeline.md by NinoRisteski in 26197
* [docs] Fix model reference in zero shot image classification example by Aleksandar1932 in 26206
* Fix the gitlab user mention in issue templates to the correct user by muellerz in 26237
* Fix some docstring in image processors by ydshieh in 26235
* Fix gated repo tests by Wauplin in 26257
* Fix `Error` not captured in PR doctesting by ydshieh in 26215
* DeepSpeed ZeRO-3 handling when resizing embedding layers by pacman100 in 26259
* [FIX] resize_token_embeddings by passaglia in 26102
* FSDP tests and checkpointing fixes by pacman100 in 26180
* fix name error when accelerate is not available by pacman100 in 26278
* Update bros checkpoint by jinhopark8345 in 26277
* Integrate AMD GPU in CI/CD environment by mfuntowicz in 26007
* Rewrite for custom code warning messages by Rocketknight1 in 26291
* fix deepspeed available detection by fxmarty in 26252
* add bbox input validation by jinhopark8345 in 26294
* include changes from llama by ArthurZucker in 26260
* [`Trainer`] Refactor trainer + bnb logic by younesbelkada in 26248
* add custom RMSNorm to `ALL_LAYERNORM_LAYERS` by shijie-wu in 26227
* Keep relevant weights in fp32 when `model._keep_in_fp32_modules` is set even when `accelerate` is not installed by fxmarty in 26225
* Fix FSMT weight sharing by LysandreJik in 26292
* update hf hub dependency to be compatible with the new tokenizers by ArthurZucker in 26301
* Porting the torchaudio kaldi fbank implementation to audio_utils by ylacombe in 26182
* More error message fixup, plus some linebreaks! by Rocketknight1 in 26296
* [QUICK FIX LINK] Update trainer.py by SoyGema in 26293
* Use CircleCI `store_test_results` by ydshieh in 26223
* Fix doctest CI by ydshieh in 26324
* [doc] fixed indices in obj detection example by MKhalusova in 26343
* [TTA Pipeline] Fix MusicGen test by sanchit-gandhi in 26348
* Add image to image pipeline by LeviVasconcelos in 25393
* feat: adding num_proc to load_dataset by pphuc25 in 26326
* Fixed unclosed p tags by HanSeokhyeon in 26240
* Update add_new_model.md by NinoRisteski in 26365
* Fix MusicGen logging error by osanseviero in 26370
* [docs] removed MaskFormerSwin and TimmBackbone from the table on index.md by MKhalusova in 26347
* Update tiny model information and pipeline tests by ydshieh in 26285
* Add Russian localization for README by qweme32 in 26208
* 🌐 [i18n-KO] Translated `audio_classification.mdx` to Korean by gabrielwithappy in 26200
* [ViTMatte] Add resources by NielsRogge in 26317
* Deleted duplicate sentence by titi-devv in 26394
* added support for gradient checkpointing in ESM models by sanjeevk-os in 26386
* Fix DeepSpeed issue with Idefics by HugoLaurencon in 26393
* Add torch `RMSProp` optimizer by natolambert in 26425
* Fix padding for IDEFICS by shauray8 in 26396
* Update semantic_segmentation.md by zekaouinoureddine in 26419
* Fixing tokenizer when `transformers` is installed without `tokenizers` by urialon in 26236
* [`FA` / `tests`] Add use_cache tests for FA models by younesbelkada in 26415
* add bf16 mixed precision support for NPU by statelesshz in 26163
* [`PEFT`] Fix PEFT multi adapters support by younesbelkada in 26407
* Fix failing doctest by LysandreJik in 26450
* Update `runs-on` in workflow files by ydshieh in 26435
* [i18n-DE] Complete first toc chapter by flozi00 in 26311
* 🌐 [i18n-KO] Translated `debugging.md` to Korean by wonhyeongseo in 26246
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean by wonhyeongseo in 26244
* optimize VRAM for calculating pos_bias in LayoutLM v2, v3 by NormXU in 26139
* Fix `cos_sin` device issue in Falcon model by ydshieh in 26448
* docs: change assert to raise and some small docs by pphuc25 in 26232
* change mention of decoder_input_ids to input_ids and same with decode_inputs_embeds by tmabraham in 26406
* [VITS] Fix speaker_embed device mismatch by fakhirali in 26115
* [`PEFT`] introducing `adapter_kwargs` for loading adapters from different Hub location (`subfolder`, `revision`) than the base model by younesbelkada in 26270
* Do not warn about unexpected decoder weights when loading T5EncoderModel and LongT5EncoderModel by fleonce in 26211
* fix_mbart_tied_weights by SunMarc in 26422
* Esm checkpointing by Amelie-Schreiber in 26454
* [Whisper Tokenizer] Make decoding faster after adding timestamps by sanchit-gandhi in 26299
* [docs] Update offline mode docs by stevhliu in 26478
* [docs] navigation improvement between text gen pipelines and text gen params by MKhalusova in 26477
* Skip 2 failing persimmon pipeline tests for now by ydshieh in 26485
* Avoid all-zeor attnetion mask used in testing by ydshieh in 26469
* [Flax Examples] Seq2Seq ASR Fine-Tuning Script by sanchit-gandhi in 21764
* [ASR Pipe] Improve docs and error messages by sanchit-gandhi in 26476
* Revert falcon exception by LysandreJik in 26472
* Fix num_heads in _upad_input by fs4r in 26490
* Fix requests connection error during modelcard creation by jphme in 26518
* Fix issue of canine forward requiring input_ids anyway by marcmk6 in 26290
* Fix broken link to video classification task by HelgeS in 26487
* [`PEFT`] Pass token when calling `find_adapter_config` by younesbelkada in 26488
* [`core`/ `auto` ] Fix bnb test with code revision + bug with code revision by younesbelkada in 26431
* Fix model integration ci by ArthurZucker in 26322
* [`PEFT`] Protect `adapter_kwargs` check by younesbelkada in 26537
* Remove-warns by ArthurZucker in 26483
* [Doctest] Add configuration_roformer.py by Adithya4720 in 26530
* Code-llama-nit by ArthurZucker in 26300
* add build_inputs_with_special_tokens to LlamaFast by ArthurZucker in 26297
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean by wonhyeongseo in 26243
* [i18n-DE] contribute chapter by flozi00 in 26481
* [RFC, Logging] Change warning to info by patrickvonplaten in 26545
* Add tokenizer kwargs to fill mask pipeline. by nmcahill in 26234
* [Wav2Vec2 and Co] Update init tests for PT 2.1 by sanchit-gandhi in 26494
* [AMD] Add initial version for run_tests_multi_gpu by mfuntowicz in 26346
* [Doctest] Add `configuration_encoder_decoder.py` by SrijanSahaySrivastava in 26519
* [InternLM] Add support for InternLM by Rocketknight1 in 26302

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* jinhopark8345
* Add BROS (23190)
* Update bros checkpoint (26277)
* add bbox input validation (26294)
* qweme32
* Add Russian localization for README (26208)
* Bam4d
* [Mistral] Mistral-7B-v0.1 support (26447)
* flozi00
* [i18n-DE] Complete first toc chapter (26311)
* [i18n-DE] contribute chapter (26481)
* wonhyeongseo
* 🌐 [i18n-KO] Translated `debugging.md` to Korean (26246)
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean (26244)
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean (26243)

4.33.3

Not secure

A patch release was made for the following three commits:

- DeepSpeed ZeRO-3 handling when resizing embedding layers (26259)
- [doc] Always call it Agents for consistency (25958)
- deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler (25863)

4.33.2

Not secure

A patch release was done for these two commits:

- Fix pad to multiple of (25732)
- fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 (26024)

4.33.1

Not secure

Falcon

Falcon is a class of causal decoder-only models built by [TII](https://www.tii.ae/). The largest Falcon checkpoints have been trained on >=1T tokens of text, with a particular emphasis on the [RefinedWeb](https://arxiv.org/abs/2306.01116) corpus. They are made available under the Apache 2.0 license.

Falcon’s architecture is modern and optimized for inference, with multi-query attention and support for efficient attention variants like FlashAttention. Both ‘base’ models trained only as causal language models as well as ‘instruct’ models that have received further fine-tuning are available.

* Falcon port 24523 by Rocketknight1
* Falcon: Add RoPE scaling by gante in 25878
* Add proper Falcon docs and conversion script by Rocketknight1 in 25954
* Put Falcon back by LysandreJik in 25960
* [`Falcon`] Remove SDPA for falcon to support earlier versions of PyTorch (< 2.0) by younesbelkada in 25947

Code Llama

Code Llama, is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.

* [`CodeLlama`] Add support for `CodeLlama` by ArthurZucker in 25740
* [`CodeLlama`] Fix CI by ArthurZucker in 25890

ViTDet

ViTDet reuses the ViT model architecture, adapted to object detection.

* Add ViTDet by NielsRogge in 25524

DINO v2

DINO v2 is the next iteration of the DINO model. It is added as a backbone class, allowing it to be re-used in downstream models.

* [DINOv2] Add backbone class by NielsRogge in 25520

VITS

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) comprised of a posterior encoder, decoder, and conditional prior.

* add VITS model by hollance in 24085

Breaking changes:
* 🚨🚨🚨 [`Refactor`] Move third-party related utility files into `integrations/` folder 🚨🚨🚨 by younesbelkada in 25599

Moves all third party libs (outside HF ecosystem) related utility files inside `integrations/` instead of having them in `transformers` directly.

In order to get the previous usage you should be changing your call to the following:
diff
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations import HfDeepSpeedConfig

Bugfixes and improvements

* [DOCS] MusicGen Docs Update by xNul in 25510
* [MINOR:TYPO] by cakiki in 25646
* Pass the proper token to PEFT integration in auto classes by sgugger in 25649
* Put IDEFICS in the right section of the doc by sgugger in 25650
* TF 2.14 compatibility by Rocketknight1 in 25630
* Fix bloom add prefix space by ArthurZucker in 25652
* removing unnecesssary extra parameter by rafaelpadilla in 25643
* Adds `TRANSFORMERS_TEST_BACKEND` by vvvm23 in 25655
* stringify config by AleksanderWWW in 25637
* Add input_embeds functionality to gpt_neo Causal LM by gaasher in 25659
* Update doc toctree by ydshieh in 25661
* Add Llama2 resources by wonhyeongseo in 25531
* [`SPM`] Patch `spm` Llama and T5 by ArthurZucker in 25656
* [`GPTNeo`] Add input_embeds functionality to gpt_neo Causal LM by ArthurZucker in 25664
* fix wrong path in some doc by ydshieh in 25658
* Remove `utils/documentation_tests.txt` by ydshieh in 25680
* Prevent Dynamo graph fragmentation in GPTNeoX with torch.baddbmm fix by norabelrose in 24941
* ⚠️ [CLAP] Fix dtype of logit scales in init by sanchit-gandhi in 25682
* Sets the stalebot to 10 AM CEST by LysandreJik in 25678
* Fix `pad_token` check condition by ydshieh in 25685
* [DOCS] Added docstring example for EpsilonLogitsWarper 24783 by sanjeevk-os in 25378
* correct resume training steps number in progress bar by pphuc25 in 25691
* Generate: general test for decoder-only generation from `inputs_embeds` by gante in 25687
* Fix typo in `configuration_gpt2.py` by susnato in 25676
* fix ram efficient fsdp init by pacman100 in 25686
* [`LlamaTokenizer`] make unk_token_length a property by ArthurZucker in 25689
* Update list of persons to tag by sgugger in 25708
* docs: Resolve typos in warning text by tomaarsen in 25711
* Fix failing `test_batch_generation` for bloom by ydshieh in 25718
* [`PEFT`] Fix peft version by younesbelkada in 25710
* Fix number of minimal calls to the Hub with peft integration by sgugger in 25715
* [`AutoGPTQ`] Add correct installation of GPTQ library + fix slow tests by younesbelkada in 25713
* Generate: nudge towards `do_sample=False` when `temperature=0.0` by gante in 25722
* [`from_pretrained`] Simpler code for peft by ArthurZucker in 25726
* [idefics] idefics-9b test use 4bit quant by stas00 in 25734
* ImageProcessor - check if input pixel values between 0-255 by amyeroberts in 25688
* [`from_pretrained`] Fix failing PEFT tests by younesbelkada in 25733
* [ASR Pipe Test] Fix CTC timestamps error message by sanchit-gandhi in 25727
* 🌐 [i18n-KO] Translated `visual_question_answering.md` to Korean by wonhyeongseo in 25679
* [`PEFT`] Fix PeftConfig save pretrained when calling `add_adapter` by younesbelkada in 25738
* fixed typo in speech encoder decoder doc by asusevski in 25745
* Add FlaxCLIPTextModelWithProjection by pcuenca in 25254
* Generate: add missing logits processors docs by gante in 25653
* [DOCS] Add example for HammingDiversityLogitsProcessor by jessthebp in 25481
* Generate: logits processors are doctested and fix broken doctests by gante in 25692
* [CLAP] Fix logit scales dtype for fp16 by sanchit-gandhi in 25754
* [`Sentencepiece`] make sure `legacy` do not require `protobuf` by ArthurZucker in 25684
* fix encoder hook by SunMarc in 25735
* Docs: fix indentation in `HammingDiversityLogitsProcessor` by gante in 25756
* Add type hints for several pytorch models (batch-3) by nablabits in 25705
* Correct attention mask dtype for Flax GPT2 by liutianlin0121 in 25636
* fix a typo in docsting by statelesshz in 25759
* [idefics] small fixes by stas00 in 25764
* Add docstrings and fix VIVIT examples by Geometrein in 25628
* [`LlamaFamiliy`] add a tip about dtype by ArthurZucker in 25794
* Add type hints for several pytorch models (batch-2) by nablabits in 25557
* Add type hints for pytorch models (final batch) by nablabits in 25750
* Add type hints for several pytorch models (batch-4) by nablabits in 25749
* [idefics] fix vision's `hidden_act` by stas00 in 25787
* Arde/fsdp activation checkpointing by arde171 in 25771
* Fix incorrect Boolean value in deepspeed example by tmm1 in 25788
* fixing name position_embeddings to object_queries by Lorenzobattistela in 24652
* Resolving Attribute error when using the FSDP ram efficient feature by pacman100 in 25820
* [`Docs`] More clarifications on BT + FA by younesbelkada in 25823
* fix register by zspo in 25779
* Minor wording changes for Code Llama by osanseviero in 25815
* [`LlamaTokenizer`] `tokenize` nits. by ArthurZucker in 25793
* fix warning trigger for embed_positions when loading xglm by MattYoon in 25798
* 🌐 [i18n-KO] Translated peft.md to Korean by nuatmochoi in 25706
* 🌐 [i18n-KO] `model_memory_anatomy.md` to Korean by mjk0618 in 25755
* Error with checking args.eval_accumulation_steps to gather tensors by chaumng in 25819
* Tests: detect lines removed from "utils/not_doctested.txt" and doctest ALL generation files by gante in 25763
* 🌐 [i18n-KO] Translated `add_new_pipeline.md` to Korean by heuristicwave in 25498
* 🌐 [i18n-KO] Translated `community.md` to Korean by sim-so in 25674
* 🤦update warning to If you want to use the new behaviour, set `legacy=… by ArthurZucker in 25833
* update remaining `Pop2Piano` checkpoints by susnato in 25827
* [AutoTokenizer] Add data2vec to mapping by sanchit-gandhi in 25835
* MaskFormer,Mask2former - reduce memory load by amyeroberts in 25741
* Support loading base64 images in pipelines by InventivetalentDev in 25633
* Update README.md by NinoRisteski in 25834
* Generate: models with custom `generate()` return `True` in `can_generate()` by gante in 25838
* Update README.md by NinoRisteski in 25832
* minor typo fix in PeftAdapterMixin docs by tmm1 in 25829
* Add flax installation in daily doctest workflow by ydshieh in 25860
* Add Blip2 model in VQA pipeline by jpizarrom in 25532
* Remote tools are turned off by LysandreJik in 25867
* Fix imports by ydshieh in 25869
* fix max_memory for bnb by SunMarc in 25842
* Docs: fix example failing doctest in `generation_strategies.md ` by gante in 25874
* pin pandas==2.0.3 by ydshieh in 25875
* Reduce CI output by ydshieh in 25876
* [ViTDet] Fix doc tests by NielsRogge in 25880
* For xla tensors, use an alternative way to get a unique id by qihqi in 25802
* fix ds z3 checkpointing when `stage3_gather_16bit_weights_on_model_save=False` by pacman100 in 25817
* Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer by veezbo in 25807
* [`TokenizerFast`] `can_save_slow_tokenizer` as a property for when `vocab_file`'s folder was removed by ArthurZucker in 25626
* Save image_processor while saving pipeline (ImageSegmentationPipeline) by raghavanone in 25884
* [`InstructBlip`] FINAL Fix instructblip test by younesbelkada in 25887
* Add type hints for tf models batch 1 by nablabits in 25853
* Update `setup.py` by ydshieh in 25893
* Smarter check for `is_tensor` by sgugger in 25871
* remove torch_dtype override by SunMarc in 25894
* fix FSDP model resume optimizer & scheduler by pkumc in 25852
* Better error message for pipeline loading by ydshieh in 25912
* Remove broken docs for MusicGen by osanseviero in 25905
* Revert frozen training arguments by muellerzr in 25903
* [VITS] Add to TTA pipeline by sanchit-gandhi in 25906
* [MMS] Update docs with HF TTS implementation by sanchit-gandhi in 25907
* [VITS] Only trigger tokenizer warning for uroman by sanchit-gandhi in 25915
* Update-llama-code by ArthurZucker in 25826
* Update model_memory_anatomy.md by NinoRisteski in 25896
* Skip offload tests for `ViTDet` by ydshieh in 25913
* Fix typos by omahs in 25936
* Update community.md by NinoRisteski in 25928
* Update autoclass_tutorial.md by NinoRisteski in 25929
* Update README.md by NinoRisteski in 25941
* [MMS] Fix pip install in docs by sanchit-gandhi in 25949
* [VITS] Handle deprecated weight norm by sanchit-gandhi in 25946
* Import deepspeed utilities from integrations by osanseviero in 25919
* Update README.md by NinoRisteski in 25922
* [VITS] Fix init test by sanchit-gandhi in 25945
* Fix failing test by LysandreJik in 25963
* Fix smart check by ydshieh in 25955
* Add type hints for tf models final batch by nablabits in 25883

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* nablabits
* Add type hints for several pytorch models (batch-3) (25705)
* Add type hints for several pytorch models (batch-2) (25557)
* Add type hints for pytorch models (final batch) (25750)
* Add type hints for several pytorch models (batch-4) (25749)
* Add type hints for tf models batch 1 (25853)
* Add type hints for tf models final batch (25883)
* Lorenzobattistela
* fixing name position_embeddings to object_queries (24652)
* hollance
* add VITS model (24085)

Page 9 of 32

Releases

Has known vulnerabilities

Previous Next

Transformers

Page 9 of 32

4.35.0

4.34.1

4.34.0

4.33.3

4.33.2

4.33.1

Page 9 of 32

Links

Releases