Transformers

Latest version: v4.48.3


4.34.1

A patch release was made for the following three commits:
- Add add_generation_prompt argument to apply_chat_template (https://github.com/huggingface/transformers/pull/26573)
- Fix backward compatibility of Conversation (https://github.com/huggingface/transformers/pull/26741)
- [Tokenizer] Fix slow and fast serialization (https://github.com/huggingface/transformers/pull/26570)

4.34.0

New models

Mistral

Mistral-7B-v0.1 is a decoder-based LM with the following architectural choices:

- Sliding Window Attention - Trained with 8k context length and fixed cache size, with a theoretical attention span of 128K tokens
- GQA (Grouped Query Attention) - allowing faster inference and lower cache size.
- Byte-fallback BPE tokenizer - ensures that characters are never mapped to out-of-vocabulary tokens.

* [Mistral] Mistral-7B-v0.1 support by Bam4d in 26447
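As a quick illustration (a minimal sketch, not taken from the release notes; the generation settings and the use of `device_map="auto"`, which requires `accelerate`, are assumptions), the checkpoint loads through the usual auto classes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B model within a single GPU
    device_map="auto",          # assumption: accelerate is installed
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```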

Persimmon

The authors introduced Persimmon-8B, a decoder model based on the classic transformers architecture, with query and key normalization. Persimmon-8B is a fully permissively licensed model with approximately 8 billion parameters, released under the Apache license. Some of the key attributes of Persimmon-8B are long context size (16K), performance, and capabilities for multimodal extensions.

* [`Persimmon`] Add support for persimmon by ArthurZucker in 26042

BROS

BROS stands for BERT Relying On Spatiality. It is an encoder-only Transformer model that takes a sequence of tokens and their bounding boxes as inputs and outputs a sequence of hidden states. BROS encodes relative spatial information instead of using absolute spatial information.

* Add BROS by jinhopark8345 in 23190

ViTMatte

ViTMatte leverages plain [Vision Transformers](https://huggingface.co/docs/transformers/main/en/model_doc/vit) for the task of image matting, which is the process of accurately estimating the foreground object in images and videos.

* Add ViTMatte by NielsRogge in 25843

Nougat

Nougat uses the same architecture as [Donut](https://huggingface.co/docs/transformers/main/en/model_doc/donut), meaning an image Transformer encoder and an autoregressive text Transformer decoder to translate scientific PDFs to markdown, enabling easier access to them.

* Add Nougat by NielsRogge and molbap in 25942
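A minimal sketch of the intended usage (the `facebook/nougat-base` checkpoint name and the input image are assumptions, not taken from the release notes):

```python
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

# One PDF page rendered to an RGB image (path is a placeholder).
page = Image.open("paper_page.png").convert("RGB")
pixel_values = processor(images=page, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values, max_new_tokens=512)
markdown = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(markdown)
```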

Prompt templating

We've added a new [template](https://huggingface.co/docs/transformers/main/chat_templating) feature for chat models. This allows the formatting that a chat model was trained with to be saved with the model, ensuring that users can exactly reproduce that formatting when they want to fine-tune the model or use it for inference. For more information, see [our template documentation](https://huggingface.co/docs/transformers/main/chat_templating).

* Overhaul Conversation class and prompt templating by Rocketknight1 in 25323
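For example (a minimal sketch; the checkpoint name is an assumption, and any chat model whose tokenizer ships a template works the same way):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "Doing great. How can I help you today?"},
    {"role": "user", "content": "Tell me a joke."},
]

# Render the conversation exactly as the model saw it during training,
# appending the tokens that prompt the assistant's next reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```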

🚨🚨 Tokenizer refactor

* [`Tokenizer`] attemp to fix add_token issues by ArthurZucker in 23909
* Nit-added-tokens by ArthurZucker in 26538 adds some fixes to 23909.

🚨Workflow Changes 🚨:

These are not breaking changes per se but rather bugfixes. However, we understand that this may result in some workflow changes so we highlight them below.

- The unique_no_split_tokens attribute was removed and is no longer used in the internal logic
- sanitize_special_tokens() follows a deprecation cycle and does nothing
- All attributes in SPECIAL_TOKENS_ATTRIBUTES are stored as AddedTokens, not strings.
- Loading a slow tokenizer from a fast one (or a fast from a slow) will no longer raise an error if the added tokens don't have the correct index. This is because they will always be added following the order of the added_tokens, but mistakes in the saved vocabulary will be corrected if there are any (and there are a lot in old-format tokenizers).
- The length of a tokenizer is now max(set(self.get_vocab().keys())), accounting for holes in the vocab. vocab_size no longer takes the added vocab into account for most tokenizers (as it should not). Mostly breaking for T5.
- Adding a token using tokenizer.add_tokens([AddedToken("hey", rstrip=False, normalized=True)]) now takes the rstrip, lstrip, and normalized information into account (see the sketch after this list).
- added_tokens_decoder holds AddedToken objects, not strings.
- add_tokens() for both fast and slow will always update a token that is already part of the vocab, allowing for custom stripping.
- Initializing a tokenizer from scratch will now add missing special tokens to the vocab.
- Stripping is not always done for special tokens! 🚨 It only happens if the AddedToken has lstrip=True and rstrip=True.
- The fairseq_ids_to_tokens attribute was removed for Barthez (it was not used)
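A minimal sketch of the new `AddedToken` behaviour described above (the `bert-base-uncased` checkpoint is an assumption used only for illustration):

```python
from transformers import AddedToken, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The rstrip/lstrip/normalized flags are now honoured when adding tokens.
tokenizer.add_tokens([AddedToken("hey", rstrip=False, normalized=True)])

# added_tokens_decoder now holds AddedToken objects, not plain strings.
print(tokenizer.added_tokens_decoder)
print(tokenizer.tokenize("hey there"))
```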

➕ Most visible features:
- printing a tokenizer now shows `tokenizer.added_tokens_decoder` for both fast and slow tokenizers. Moreover, additional tokens that were already part of the initial vocab are also found there.
- faster `from_pretrained`, faster `add_tokens` because special and non special can be mixed together and the trie is not always rebuilt.
- faster encode/decode with caching mechanism for `added_tokens_decoder/encoder`.
- information is fully saved in the `tokenizer_config.json`

**For any issues relating to this, make sure to open a new issue and ping ArthurZucker.**

Flash Attention 2

FA2 support has been added to transformers for the most popular architectures (llama, mistral, falcon); additional architectures are actively being contributed via this issue (https://github.com/huggingface/transformers/issues/26350). Simply pass `use_flash_attention_2=True` when calling `from_pretrained`.
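For example (a minimal sketch; the checkpoint name and dtype are assumptions, and Flash Attention 2 additionally requires a supported GPU and the `flash-attn` package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # any supported architecture (llama, mistral, falcon)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,     # FA2 kernels run in fp16/bf16
    use_flash_attention_2=True,    # opt in to the Flash Attention 2 code path
    device_map="auto",             # assumption: accelerate is installed
)
```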

In the future, PyTorch will support Flash Attention 2 through [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html). Users will then be able to benefit from both implementations of Flash Attention-2 (transformers core and transformers + SDPA) with simple changes (`model.to_bettertransformer()` and force-dispatching the SDPA kernel to FA-2 in the SDPA case).

* [`core` ] Integrate Flash attention 2 in most used models by younesbelkada in 25598

For our future plans regarding integrating F.sdpa from PyTorch in core transformers, see here: https://github.com/huggingface/transformers/issues/26557

Lazy import structure

Support for lazy loading integration libraries has been added. This will drastically speed up importing `transformers` and related objects from the library.

Example before this change:

```
2023-09-11 11:07:52.010179: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
python3 -c "from transformers import CLIPTextModel"  3.31s user 3.06s system 220% cpu 2.893 total
```

After this change:

```
python3 -c "from transformers import CLIPTextModel"  1.70s user 1.49s system 220% cpu 1.447 total
```


* [Core] Add lazy import structure to imports by patrickvonplaten in 26090

Bugfixes and improvements

* Fix typo by susnato in 25966
* Fix Detr CI by ydshieh in 25972
* Fix `test_load_img_url_timeout` by ydshieh in 25976
* nn.Identity is not required to be compatible with PyTorch < 1.1.0 as the minimum PyTorch version we currently support is 1.10.0 by statelesshz in 25974
* Add `Pop2Piano` space demo. by susnato in 25975
* fix typo by kai01ai in 25981
* Use main in conversion script by ydshieh in 25973
* [doc] Always call it Agents for consistency by julien-c in 25958
* Update RAG README.md with correct path to examples/seq2seq by tleyden in 25953
* Update training_args.py to remove the runtime error by sahel-sh in 25920
* Trainer: delegate default generation values to `generation_config` by gante in 25987
* Show failed tests on CircleCI layout in a better way by ydshieh in 25895
* Patch with accelerate xpu by abhilash1910 in 25714
* PegasusX add _no_split_modules by andreeahedes in 25933
* Add TFDebertaV2ForMultipleChoice by raghavanone in 25932
* deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler by pacman100 in 25863
* [Wav2Vec2 Conformer] Fix inference float16 by sanchit-gandhi in 25985
* Add LLaMA resources by eenzeenee in 25859
* [`CI`] Fix red CI and ERROR failed should show by ArthurZucker in 25995
* [`VITS`] tokenizer integration test: fix revision did not exist by ArthurZucker in 25996
* Fix Mega chunking error when using decoder-only model by tanaymeh in 25765
* save space when converting hf model to megatron model. by flower-with-safe in 25950
* Update README.md by NinoRisteski in 26003
* Falcon: fix revision propagation by LysandreJik in 26006
* TF-OPT attention mask fixes by Rocketknight1 in 25238
* Fix small typo README.md by zspo in 25934
* 🌐[i18n-KO] Translated `llm_tutorial.md` to Korean by harheem in 25791
* Remove Falcon from undocumented list by Rocketknight1 in 26008
* modify context length for GPTQ + version bump by SunMarc in 25899
* Fix err with FSDP by muellerzr in 25991
* fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 by kai01ai in 26024
* Fix CircleCI config by ydshieh in 26023
* Add `tgs` speed metrics by CokeDong in 25858
* [VITS] Fix nightly tests by sanchit-gandhi in 25986
* Added HerBERT to README.md by Muskan011 in 26020
* Fix vilt config docstring parameter to match value in init by raghavanone in 26017
* Punctuation fix by kwonmha in 26025
* Try to fix training Loss inconsistent after resume from old checkpoint by dumpmemory in 25872
* Fix Dropout Implementation in Graphormer by alexanderkrauck in 24817
* Update missing docs on `activation_dropout` and fix DropOut docs for SEW-D by gau-nernst in 26031
* Skip warning if tracing with dynamo by angelayi in 25581
* 🌐 [i18n-KO] Translated `llama.md` to Korean by harheem in 26044
* [`CodeLlamaTokenizerFast`] Fix fix `set_infilling_processor` to properly reset by ArthurZucker in 26041
* [`CITests`] skip failing tests until 26054 is merged by ArthurZucker in 26063
* only main process should call _save on deepspeed zero3 by zjjMaiMai in 25959
* docs: update link huggingface map by pphuc25 in 26077
* docs: add space to docs by pphuc25 in 26067
* [`core`] Import tensorflow inside relevant methods in `trainer_utils` by younesbelkada in 26106
* Generate: legacy mode is only triggered when `generation_config` is untouched by gante in 25962
* Update logits_process.py docstrings by larekrow in 25971
* Fix ExponentialDecayLengthPenalty negative logits issue by pokjay in 25594
* 🌐 [i18n-KO] Translated `llama2.md` to Korean by mjk0618 in 26047
* [docs] Updates to TTS task guide with regards to the new TTS pipeline by MKhalusova in 26095
* 🌐 [i18n-KO] Translated `contributing.md` to Korean by mjk0618 in 25877
* enable optuna multi-objectives feature by sywangyi in 25969
* chore: correct update_step and correct gradient_accumulation_steps by pphuc25 in 26068
* Text2text pipeline: don't parameterize from the config by gante in 26118
* Fix `MarianTokenizer` to remove metaspace character in `decode` by tanaymeh in 26091
* safeguard torch distributed check by pacman100 in 26056
* fix the deepspeed tests by pacman100 in 26021
* Fix AutoTokenizer docstring typo by amyeroberts in 26117
* [`core`] fix 4bit `num_parameters` by younesbelkada in 26132
* Add missing space in generation/utils.py by jbochi in 26121
* Update spectrogram and waveform model mapping for TTS/A pipeline by Vaibhavs10 in 26114
* [`RWKV`] Final fix RWMV 4bit by younesbelkada in 26134
* docs: feat: add llama2 notebook resources from OSSCA community by junejae in 26076
* Generate: ignore warning when `generation_config.max_length` is set to `None` by gante in 26147
* Fix `test_finetune_bert2bert` by ydshieh in 25984
* Falcon: batched generation by gante in 26137
* Fix `beam_scores` shape when token scores shape changes after `logits_processor` by BakerBunker in 25980
* Update training_args.py - addition of self.distributed_state when using XPU by Serizao in 25999
* [docs] last hidden state vs hidden_states[-1] by MKhalusova in 26142
* Flex xpu bug fix by abhilash1910 in 26135
* Add missing Maskformer dataclass decorator, add dataclass check in ModelOutput for subclasses by rachthree in 25638
* Fix eval accumulation when `accelerate` > 0.20.3 by sam-scale in 26060
* [Whisper Tokenizer] Encode timestamps by sanchit-gandhi in 26054
* [`PEFT`] Fix PEFT + gradient checkpointing by younesbelkada in 25846
* [MusicGen] Add streamer to generate by sanchit-gandhi in 25320
* Fix beam search when using model parallel by pfldy2850 in 24969
* [MusicGen] Add sampling rate to config by sanchit-gandhi in 26136
* [Whisper] Fix word-level timestamps for audio < 30 seconds by xenova in 25607
* [BLIP-2] Improve conversion script by NielsRogge in 24854
* IDEFICS: allow interpolation of vision's pos embeddings by leot13 in 26029
* [TTA Pipeline] Test MusicGen and VITS by sanchit-gandhi in 26146
* Tweaks to Chat Templates docs by Rocketknight1 in 26168
* [Whisper] Check length of prompt + max new tokens by sanchit-gandhi in 26164
* Update notebook.py to support multi eval datasets by matrix1001 in 25796
* Fix pad to multiple of by ArthurZucker in 25732
* [docs] IDEFICS guide and task guides restructure by MKhalusova in 26035
* [PEFT] Allow PEFT model dict to be loaded by patrickvonplaten in 25721
* No doctest for `convert_bros_to_pytorch.py` by ydshieh in 26212
* Remove `utils/documentation_tests.txt` by ydshieh in 26213
* moved `ctrl` to `Salesforce/ctrl` by julien-c in 26183
* Fix ConversationalPipeline tests by Rocketknight1 in 26217
* [FSMT] Fix non-shared weights by LysandreJik in 26187
* refactor decay_parameters production into its own function by shijie-wu in 26152
* refactor: change default block_size in block size > max position embeddings by pphuc25 in 26069
* [Wav2Vec2-Conf / LLaMA] Style fix by sanchit-gandhi in 26188
* [Permisson] Style fix by sanchit-gandhi in 26228
* [Check] Fix config docstring by sanchit-gandhi in 26222
* 🌐 [i18n-KO] Translated `whisper.md` to Korean by nuatmochoi in 26002
* Create the return value on device to avoid unnecessary copying from CPU by mksit in 26151
* [AutoBackbone] Add test by NielsRogge in 26094
* Update README.md by NinoRisteski in 26198
* Update add_new_pipeline.md by NinoRisteski in 26197
* [docs] Fix model reference in zero shot image classification example by Aleksandar1932 in 26206
* Fix the gitlab user mention in issue templates to the correct user by muellerz in 26237
* Fix some docstring in image processors by ydshieh in 26235
* Fix gated repo tests by Wauplin in 26257
* Fix `Error` not captured in PR doctesting by ydshieh in 26215
* DeepSpeed ZeRO-3 handling when resizing embedding layers by pacman100 in 26259
* [FIX] resize_token_embeddings by passaglia in 26102
* FSDP tests and checkpointing fixes by pacman100 in 26180
* fix name error when accelerate is not available by pacman100 in 26278
* Update bros checkpoint by jinhopark8345 in 26277
* Integrate AMD GPU in CI/CD environment by mfuntowicz in 26007
* Rewrite for custom code warning messages by Rocketknight1 in 26291
* fix deepspeed available detection by fxmarty in 26252
* add bbox input validation by jinhopark8345 in 26294
* include changes from llama by ArthurZucker in 26260
* [`Trainer`] Refactor trainer + bnb logic by younesbelkada in 26248
* add custom RMSNorm to `ALL_LAYERNORM_LAYERS` by shijie-wu in 26227
* Keep relevant weights in fp32 when `model._keep_in_fp32_modules` is set even when `accelerate` is not installed by fxmarty in 26225
* Fix FSMT weight sharing by LysandreJik in 26292
* update hf hub dependency to be compatible with the new tokenizers by ArthurZucker in 26301
* Porting the torchaudio kaldi fbank implementation to audio_utils by ylacombe in 26182
* More error message fixup, plus some linebreaks! by Rocketknight1 in 26296
* [QUICK FIX LINK] Update trainer.py by SoyGema in 26293
* Use CircleCI `store_test_results` by ydshieh in 26223
* Fix doctest CI by ydshieh in 26324
* [doc] fixed indices in obj detection example by MKhalusova in 26343
* [TTA Pipeline] Fix MusicGen test by sanchit-gandhi in 26348
* Add image to image pipeline by LeviVasconcelos in 25393
* feat: adding num_proc to load_dataset by pphuc25 in 26326
* Fixed unclosed p tags by HanSeokhyeon in 26240
* Update add_new_model.md by NinoRisteski in 26365
* Fix MusicGen logging error by osanseviero in 26370
* [docs] removed MaskFormerSwin and TimmBackbone from the table on index.md by MKhalusova in 26347
* Update tiny model information and pipeline tests by ydshieh in 26285
* Add Russian localization for README by qweme32 in 26208
* 🌐 [i18n-KO] Translated `audio_classification.mdx` to Korean by gabrielwithappy in 26200
* [ViTMatte] Add resources by NielsRogge in 26317
* Deleted duplicate sentence by titi-devv in 26394
* added support for gradient checkpointing in ESM models by sanjeevk-os in 26386
* Fix DeepSpeed issue with Idefics by HugoLaurencon in 26393
* Add torch `RMSProp` optimizer by natolambert in 26425
* Fix padding for IDEFICS by shauray8 in 26396
* Update semantic_segmentation.md by zekaouinoureddine in 26419
* Fixing tokenizer when `transformers` is installed without `tokenizers` by urialon in 26236
* [`FA` / `tests`] Add use_cache tests for FA models by younesbelkada in 26415
* add bf16 mixed precision support for NPU by statelesshz in 26163
* [`PEFT`] Fix PEFT multi adapters support by younesbelkada in 26407
* Fix failing doctest by LysandreJik in 26450
* Update `runs-on` in workflow files by ydshieh in 26435
* [i18n-DE] Complete first toc chapter by flozi00 in 26311
* 🌐 [i18n-KO] Translated `debugging.md` to Korean by wonhyeongseo in 26246
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean by wonhyeongseo in 26244
* optimize VRAM for calculating pos_bias in LayoutLM v2, v3 by NormXU in 26139
* Fix `cos_sin` device issue in Falcon model by ydshieh in 26448
* docs: change assert to raise and some small docs by pphuc25 in 26232
* change mention of decoder_input_ids to input_ids and same with decode_inputs_embeds by tmabraham in 26406
* [VITS] Fix speaker_embed device mismatch by fakhirali in 26115
* [`PEFT`] introducing `adapter_kwargs` for loading adapters from different Hub location (`subfolder`, `revision`) than the base model by younesbelkada in 26270
* Do not warn about unexpected decoder weights when loading T5EncoderModel and LongT5EncoderModel by fleonce in 26211
* fix_mbart_tied_weights by SunMarc in 26422
* Esm checkpointing by Amelie-Schreiber in 26454
* [Whisper Tokenizer] Make decoding faster after adding timestamps by sanchit-gandhi in 26299
* [docs] Update offline mode docs by stevhliu in 26478
* [docs] navigation improvement between text gen pipelines and text gen params by MKhalusova in 26477
* Skip 2 failing persimmon pipeline tests for now by ydshieh in 26485
* Avoid all-zeor attnetion mask used in testing by ydshieh in 26469
* [Flax Examples] Seq2Seq ASR Fine-Tuning Script by sanchit-gandhi in 21764
* [ASR Pipe] Improve docs and error messages by sanchit-gandhi in 26476
* Revert falcon exception by LysandreJik in 26472
* Fix num_heads in _upad_input by fs4r in 26490
* Fix requests connection error during modelcard creation by jphme in 26518
* Fix issue of canine forward requiring input_ids anyway by marcmk6 in 26290
* Fix broken link to video classification task by HelgeS in 26487
* [`PEFT`] Pass token when calling `find_adapter_config` by younesbelkada in 26488
* [`core`/ `auto` ] Fix bnb test with code revision + bug with code revision by younesbelkada in 26431
* Fix model integration ci by ArthurZucker in 26322
* [`PEFT`] Protect `adapter_kwargs` check by younesbelkada in 26537
* Remove-warns by ArthurZucker in 26483
* [Doctest] Add configuration_roformer.py by Adithya4720 in 26530
* Code-llama-nit by ArthurZucker in 26300
* add build_inputs_with_special_tokens to LlamaFast by ArthurZucker in 26297
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean by wonhyeongseo in 26243
* [i18n-DE] contribute chapter by flozi00 in 26481
* [RFC, Logging] Change warning to info by patrickvonplaten in 26545
* Add tokenizer kwargs to fill mask pipeline. by nmcahill in 26234
* [Wav2Vec2 and Co] Update init tests for PT 2.1 by sanchit-gandhi in 26494
* [AMD] Add initial version for run_tests_multi_gpu by mfuntowicz in 26346
* [Doctest] Add `configuration_encoder_decoder.py` by SrijanSahaySrivastava in 26519
* [InternLM] Add support for InternLM by Rocketknight1 in 26302


Significant community contributions

The following contributors have made significant changes to the library over the last release:

* jinhopark8345
* Add BROS (23190)
* Update bros checkpoint (26277)
* add bbox input validation (26294)
* qweme32
* Add Russian localization for README (26208)
* Bam4d
* [Mistral] Mistral-7B-v0.1 support (26447)
* flozi00
* [i18n-DE] Complete first toc chapter (26311)
* [i18n-DE] contribute chapter (26481)
* wonhyeongseo
* 🌐 [i18n-KO] Translated `debugging.md` to Korean (26246)
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean (26244)
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean (26243)

4.33.3

A patch release was made for the following three commits:

- DeepSpeed ZeRO-3 handling when resizing embedding layers (26259)
- [doc] Always call it Agents for consistency (25958)
- deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler (25863)

4.33.2

A patch release was done for these two commits:

- Fix pad to multiple of (25732)
- fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 (26024)

4.33.1

Falcon

Falcon is a class of causal decoder-only models built by [TII](https://www.tii.ae/). The largest Falcon checkpoints have been trained on >=1T tokens of text, with a particular emphasis on the [RefinedWeb](https://arxiv.org/abs/2306.01116) corpus. They are made available under the Apache 2.0 license.

Falcon’s architecture is modern and optimized for inference, with multi-query attention and support for efficient attention variants like FlashAttention. Both ‘base’ models, trained only as causal language models, and ‘instruct’ models, which have received further fine-tuning, are available.

* Falcon port 24523 by Rocketknight1
* Falcon: Add RoPE scaling by gante in 25878
* Add proper Falcon docs and conversion script by Rocketknight1 in 25954
* Put Falcon back by LysandreJik in 25960
* [`Falcon`] Remove SDPA for falcon to support earlier versions of PyTorch (< 2.0) by younesbelkada in 25947
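A minimal sketch of loading one of the ported checkpoints (the `tiiuae/falcon-7b` checkpoint name, dtype, and generation settings are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The Falcon models were trained on", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```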

Code Llama

Code Llama is a family of large language models for code, based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.

* [`CodeLlama`] Add support for `CodeLlama` by ArthurZucker in 25740
* [`CodeLlama`] Fix CI by ArthurZucker in 25890
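A minimal sketch of plain code completion with one of the checkpoints (the `codellama/CodeLlama-7b-hf` checkpoint name and the generation settings are assumptions; infilling and instruction following are covered in the model documentation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```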

ViTDet

ViTDet reuses the ViT model architecture, adapted to object detection.

* Add ViTDet by NielsRogge in 25524

DINO v2

DINO v2 is the next iteration of the DINO model. It is added as a backbone class, allowing it to be re-used in downstream models.

* [DINOv2] Add backbone class by NielsRogge in 25520

VITS

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) comprised of a posterior encoder, decoder, and conditional prior.

* add VITS model by hollance in 24085
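A minimal sketch of end-to-end speech synthesis with the new model class (the `facebook/mms-tts-eng` checkpoint name is an assumption):

```python
import torch
from transformers import VitsModel, VitsTokenizer

tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer("Hello, this is a test of text to speech.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

waveform = outputs.waveform[0]                 # 1-D float tensor of audio samples
sampling_rate = model.config.sampling_rate     # needed to play or save the audio
```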

Breaking changes:
* 🚨🚨🚨 [`Refactor`] Move third-party related utility files into `integrations/` folder 🚨🚨🚨 by younesbelkada in 25599

This moves all utility files related to third-party libraries (outside the HF ecosystem) into `integrations/` instead of keeping them directly in `transformers`.

To keep the previous behaviour, update your import as follows:

```diff
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations import HfDeepSpeedConfig
```


Bugfixes and improvements

* [DOCS] MusicGen Docs Update by xNul in 25510
* [MINOR:TYPO] by cakiki in 25646
* Pass the proper token to PEFT integration in auto classes by sgugger in 25649
* Put IDEFICS in the right section of the doc by sgugger in 25650
* TF 2.14 compatibility by Rocketknight1 in 25630
* Fix bloom add prefix space by ArthurZucker in 25652
* removing unnecesssary extra parameter by rafaelpadilla in 25643
* Adds `TRANSFORMERS_TEST_BACKEND` by vvvm23 in 25655
* stringify config by AleksanderWWW in 25637
* Add input_embeds functionality to gpt_neo Causal LM by gaasher in 25659
* Update doc toctree by ydshieh in 25661
* Add Llama2 resources by wonhyeongseo in 25531
* [`SPM`] Patch `spm` Llama and T5 by ArthurZucker in 25656
* [`GPTNeo`] Add input_embeds functionality to gpt_neo Causal LM by ArthurZucker in 25664
* fix wrong path in some doc by ydshieh in 25658
* Remove `utils/documentation_tests.txt` by ydshieh in 25680
* Prevent Dynamo graph fragmentation in GPTNeoX with torch.baddbmm fix by norabelrose in 24941
* ⚠️ [CLAP] Fix dtype of logit scales in init by sanchit-gandhi in 25682
* Sets the stalebot to 10 AM CEST by LysandreJik in 25678
* Fix `pad_token` check condition by ydshieh in 25685
* [DOCS] Added docstring example for EpsilonLogitsWarper 24783 by sanjeevk-os in 25378
* correct resume training steps number in progress bar by pphuc25 in 25691
* Generate: general test for decoder-only generation from `inputs_embeds` by gante in 25687
* Fix typo in `configuration_gpt2.py` by susnato in 25676
* fix ram efficient fsdp init by pacman100 in 25686
* [`LlamaTokenizer`] make unk_token_length a property by ArthurZucker in 25689
* Update list of persons to tag by sgugger in 25708
* docs: Resolve typos in warning text by tomaarsen in 25711
* Fix failing `test_batch_generation` for bloom by ydshieh in 25718
* [`PEFT`] Fix peft version by younesbelkada in 25710
* Fix number of minimal calls to the Hub with peft integration by sgugger in 25715
* [`AutoGPTQ`] Add correct installation of GPTQ library + fix slow tests by younesbelkada in 25713
* Generate: nudge towards `do_sample=False` when `temperature=0.0` by gante in 25722
* [`from_pretrained`] Simpler code for peft by ArthurZucker in 25726
* [idefics] idefics-9b test use 4bit quant by stas00 in 25734
* ImageProcessor - check if input pixel values between 0-255 by amyeroberts in 25688
* [`from_pretrained`] Fix failing PEFT tests by younesbelkada in 25733
* [ASR Pipe Test] Fix CTC timestamps error message by sanchit-gandhi in 25727
* 🌐 [i18n-KO] Translated `visual_question_answering.md` to Korean by wonhyeongseo in 25679
* [`PEFT`] Fix PeftConfig save pretrained when calling `add_adapter` by younesbelkada in 25738
* fixed typo in speech encoder decoder doc by asusevski in 25745
* Add FlaxCLIPTextModelWithProjection by pcuenca in 25254
* Generate: add missing logits processors docs by gante in 25653
* [DOCS] Add example for HammingDiversityLogitsProcessor by jessthebp in 25481
* Generate: logits processors are doctested and fix broken doctests by gante in 25692
* [CLAP] Fix logit scales dtype for fp16 by sanchit-gandhi in 25754
* [`Sentencepiece`] make sure `legacy` do not require `protobuf` by ArthurZucker in 25684
* fix encoder hook by SunMarc in 25735
* Docs: fix indentation in `HammingDiversityLogitsProcessor` by gante in 25756
* Add type hints for several pytorch models (batch-3) by nablabits in 25705
* Correct attention mask dtype for Flax GPT2 by liutianlin0121 in 25636
* fix a typo in docsting by statelesshz in 25759
* [idefics] small fixes by stas00 in 25764
* Add docstrings and fix VIVIT examples by Geometrein in 25628
* [`LlamaFamiliy`] add a tip about dtype by ArthurZucker in 25794
* Add type hints for several pytorch models (batch-2) by nablabits in 25557
* Add type hints for pytorch models (final batch) by nablabits in 25750
* Add type hints for several pytorch models (batch-4) by nablabits in 25749
* [idefics] fix vision's `hidden_act` by stas00 in 25787
* Arde/fsdp activation checkpointing by arde171 in 25771
* Fix incorrect Boolean value in deepspeed example by tmm1 in 25788
* fixing name position_embeddings to object_queries by Lorenzobattistela in 24652
* Resolving Attribute error when using the FSDP ram efficient feature by pacman100 in 25820
* [`Docs`] More clarifications on BT + FA by younesbelkada in 25823
* fix register by zspo in 25779
* Minor wording changes for Code Llama by osanseviero in 25815
* [`LlamaTokenizer`] `tokenize` nits. by ArthurZucker in 25793
* fix warning trigger for embed_positions when loading xglm by MattYoon in 25798
* 🌐 [i18n-KO] Translated peft.md to Korean by nuatmochoi in 25706
* 🌐 [i18n-KO] `model_memory_anatomy.md` to Korean by mjk0618 in 25755
* Error with checking args.eval_accumulation_steps to gather tensors by chaumng in 25819
* Tests: detect lines removed from "utils/not_doctested.txt" and doctest ALL generation files by gante in 25763
* 🌐 [i18n-KO] Translated `add_new_pipeline.md` to Korean by heuristicwave in 25498
* 🌐 [i18n-KO] Translated `community.md` to Korean by sim-so in 25674
* 🤦update warning to If you want to use the new behaviour, set `legacy=… by ArthurZucker in 25833
* update remaining `Pop2Piano` checkpoints by susnato in 25827
* [AutoTokenizer] Add data2vec to mapping by sanchit-gandhi in 25835
* MaskFormer,Mask2former - reduce memory load by amyeroberts in 25741
* Support loading base64 images in pipelines by InventivetalentDev in 25633
* Update README.md by NinoRisteski in 25834
* Generate: models with custom `generate()` return `True` in `can_generate()` by gante in 25838
* Update README.md by NinoRisteski in 25832
* minor typo fix in PeftAdapterMixin docs by tmm1 in 25829
* Add flax installation in daily doctest workflow by ydshieh in 25860
* Add Blip2 model in VQA pipeline by jpizarrom in 25532
* Remote tools are turned off by LysandreJik in 25867
* Fix imports by ydshieh in 25869
* fix max_memory for bnb by SunMarc in 25842
* Docs: fix example failing doctest in `generation_strategies.md ` by gante in 25874
* pin pandas==2.0.3 by ydshieh in 25875
* Reduce CI output by ydshieh in 25876
* [ViTDet] Fix doc tests by NielsRogge in 25880
* For xla tensors, use an alternative way to get a unique id by qihqi in 25802
* fix ds z3 checkpointing when `stage3_gather_16bit_weights_on_model_save=False` by pacman100 in 25817
* Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer by veezbo in 25807
* [`TokenizerFast`] `can_save_slow_tokenizer` as a property for when `vocab_file`'s folder was removed by ArthurZucker in 25626
* Save image_processor while saving pipeline (ImageSegmentationPipeline) by raghavanone in 25884
* [`InstructBlip`] FINAL Fix instructblip test by younesbelkada in 25887
* Add type hints for tf models batch 1 by nablabits in 25853
* Update `setup.py` by ydshieh in 25893
* Smarter check for `is_tensor` by sgugger in 25871
* remove torch_dtype override by SunMarc in 25894
* fix FSDP model resume optimizer & scheduler by pkumc in 25852
* Better error message for pipeline loading by ydshieh in 25912
* Remove broken docs for MusicGen by osanseviero in 25905
* Revert frozen training arguments by muellerzr in 25903
* [VITS] Add to TTA pipeline by sanchit-gandhi in 25906
* [MMS] Update docs with HF TTS implementation by sanchit-gandhi in 25907
* [VITS] Only trigger tokenizer warning for uroman by sanchit-gandhi in 25915
* Update-llama-code by ArthurZucker in 25826
* Update model_memory_anatomy.md by NinoRisteski in 25896
* Skip offload tests for `ViTDet` by ydshieh in 25913
* Fix typos by omahs in 25936
* Update community.md by NinoRisteski in 25928
* Update autoclass_tutorial.md by NinoRisteski in 25929
* Update README.md by NinoRisteski in 25941
* [MMS] Fix pip install in docs by sanchit-gandhi in 25949
* [VITS] Handle deprecated weight norm by sanchit-gandhi in 25946
* Import deepspeed utilities from integrations by osanseviero in 25919
* Update README.md by NinoRisteski in 25922
* [VITS] Fix init test by sanchit-gandhi in 25945
* Fix failing test by LysandreJik in 25963
* Fix smart check by ydshieh in 25955
* Add type hints for tf models final batch by nablabits in 25883


Significant community contributions

The following contributors have made significant changes to the library over the last release:

* nablabits
* Add type hints for several pytorch models (batch-3) (25705)
* Add type hints for several pytorch models (batch-2) (25557)
* Add type hints for pytorch models (final batch) (25750)
* Add type hints for several pytorch models (batch-4) (25749)
* Add type hints for tf models batch 1 (25853)
* Add type hints for tf models final batch (25883)
* Lorenzobattistela
* fixing name position_embeddings to object_queries (24652)
* hollance
* add VITS model (24085)

4.32.1

Patch release including several patches from v4.31.0, listed below:

- Put IDEFICS in the right section of the doc (25650)
- removing unnecesssary extra parameter (25643)
- [SPM] Patch spm Llama and T5 (25656)
- Fix bloom add prefix space (25652)
- Generate: add missing logits processors docs (25653)
- [idefics] small fixes (25764)
