Transformers

4.34.0

New models

Mistral

Mistral-7B-v0.1 is a decoder-based LM with the following architectural choices:

- Sliding Window Attention - trained with an 8k context length and a fixed cache size, with a theoretical attention span of 128K tokens.
- GQA (Grouped Query Attention) - allows faster inference and a smaller cache size.
- Byte-fallback BPE tokenizer - ensures that characters are never mapped to out-of-vocabulary tokens.

* [Mistral] Mistral-7B-v0.1 support by Bam4d in 26447
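
Below is a minimal usage sketch, assuming the `mistralai/Mistral-7B-v0.1` checkpoint on the Hub and a GPU with enough memory; the prompt and generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```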

Persimmon

The authors introduced Persimmon-8B, a decoder model based on the classic transformers architecture, with query and key normalization. Persimmon-8B is a permissively licensed model with approximately 8 billion parameters, released under the Apache license. Key attributes of Persimmon-8B are its long context size (16K), performance, and capabilities for multimodal extensions.

* [`Persimmon`] Add support for persimmon by ArthurZucker in 26042

BROS

BROS stands for BERT Relying On Spatiality. It is an encoder-only Transformer model that takes a sequence of tokens and their bounding boxes as inputs and outputs a sequence of hidden states. BROS encodes relative spatial information instead of using absolute spatial information.

* Add BROS by jinhopark8345 in 23190

ViTMatte

ViTMatte leverages plain [Vision Transformers](https://huggingface.co/docs/transformers/main/en/model_doc/vit) for the task of image matting, which is the process of accurately estimating the foreground object in images and videos.

* Add ViTMatte by NielsRogge in 25843

Nougat

Nougat uses the same architecture as [Donut](https://huggingface.co/docs/transformers/main/en/model_doc/donut), meaning an image Transformer encoder and an autoregressive text Transformer decoder to translate scientific PDFs to markdown, enabling easier access to them.

* Add Nougat by NielsRogge and molbap in 25942
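
As a rough sketch of the workflow (assuming the `facebook/nougat-base` checkpoint and a rasterized PDF page as a PIL image; the file name below is hypothetical):

```python
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

page = Image.open("paper_page.png").convert("RGB")  # hypothetical input image
pixel_values = processor(page, return_tensors="pt").pixel_values
outputs = model.generate(pixel_values, max_new_tokens=512)
markdown = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(markdown)
```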

Prompt templating

We've added a new [template](https://huggingface.co/docs/transformers/main/chat_templating) feature for chat models. This allows the formatting that a chat model was trained with to be saved with the model, ensuring that users can exactly reproduce that formatting when they want to fine-tune the model or use it for inference. For more information, see [our template documentation](https://huggingface.co/docs/transformers/main/chat_templating).

* Overhaul Conversation class and prompt templating by Rocketknight1 in 25323
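
A minimal sketch of the new API, assuming a chat model whose tokenizer ships a template (the checkpoint below is an arbitrary choice):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
]
# Renders the conversation with the exact formatting the model was trained on.
prompt = tokenizer.apply_chat_template(chat, tokenize=False)
print(prompt)
```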

🚨🚨 Tokenizer refactor

* [`Tokenizer`] attemp to fix add_token issues by ArthurZucker in 23909
* Nit-added-tokens by ArthurZucker in 26538 adds some fixes to 23909.

🚨 Workflow Changes 🚨:

These are not breaking changes per se but rather bugfixes. However, we understand that this may result in some workflow changes so we highlight them below.

- the `unique_no_split_tokens` attribute was removed and is no longer used in the internal logic
- `sanitize_special_tokens()` now follows a deprecation cycle and does nothing
- all attributes in `SPECIAL_TOKENS_ATTRIBUTES` are stored as `AddedToken`s rather than strings
- loading a slow tokenizer from a fast one (or a fast from a slow) will no longer raise an error if the added tokens don't have the correct index, because they are always added following the order of `added_tokens`; mistakes in the saved vocabulary are corrected if there are any (and there are a lot in old-format tokenizers)
- the length of a tokenizer is now `max(set(self.get_vocab().keys()))`, accounting for holes in the vocab. `vocab_size` no longer takes the added vocab into account for most tokenizers (as it should not); this is mostly breaking for T5
- adding a token using `tokenizer.add_tokens([AddedToken("hey", rstrip=False, normalized=True)])` now takes the `rstrip`, `lstrip`, and `normalized` information into account (see the sketch after this list)
- `added_tokens_decoder` holds `AddedToken` objects, not strings
- `add_tokens()` for both fast and slow tokenizers will always update a token that is already part of the vocab, allowing for custom stripping
- initializing a tokenizer from scratch will now add missing special tokens to the vocab
- stripping is not always done for special tokens! 🚨 Only if the `AddedToken` has `lstrip=True` and `rstrip=True`
- the `fairseq_ids_to_tokens` attribute was removed for Barthez (it was not used)
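
A minimal sketch of the behaviour described above (the checkpoint is chosen arbitrarily):

```python
from transformers import AddedToken, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# lstrip/rstrip/normalized are now taken into account instead of being dropped
tokenizer.add_tokens([AddedToken("hey", rstrip=False, lstrip=False, normalized=True)])
print(tokenizer.added_tokens_decoder)  # ids mapped to AddedToken objects, not plain strings
```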

➕ Most visible features:
- printing a tokenizer now shows `tokenizer.added_tokens_decoder` for both fast and slow tokenizers. Moreover, additional tokens that were already part of the initial vocab are also found there.
- faster `from_pretrained`, faster `add_tokens` because special and non special can be mixed together and the trie is not always rebuilt.
- faster encode/decode with caching mechanism for `added_tokens_decoder/encoder`.
- information is fully saved in the `tokenizer_config.json`

**For any issues relating to this, make sure to open a new issue and ping ArthurZucker.**

Flash Attention 2

FA2 support has been added to transformers for the most popular architectures (Llama, Mistral, Falcon); further architectures are actively being contributed in this issue (https://github.com/huggingface/transformers/issues/26350). Simply pass `use_flash_attention_2=True` when calling `from_pretrained`, as sketched below.
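
For example, a sketch of loading Llama 2 with the FA2 backend (assumes the `flash-attn` package is installed, a supported GPU, and half precision):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # gated checkpoint, used here for illustration
    torch_dtype=torch.float16,
    use_flash_attention_2=True,
)
```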

In the future, PyTorch will support Flash Attention 2 through [`torch.scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html); users will then be able to benefit from both implementations of Flash Attention 2 (transformers core, and transformers + SDPA) with simple changes (`model.to_bettertransformer()` and force-dispatching the SDPA kernel to FA-2 in the SDPA case).

* [`core`] Integrate Flash attention 2 in most used models by younesbelkada in 25598

For our future plans regarding integrating F.sdpa from PyTorch in core transformers, see here: https://github.com/huggingface/transformers/issues/26557

Lazy import structure

Support for lazy loading of integration libraries has been added. This drastically speeds up importing `transformers` and related objects from the library.

Example before this change:

```
2023-09-11 11:07:52.010179: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
python3 -c "from transformers import CLIPTextModel" 3.31s user 3.06s system 220% cpu 2.893 total
```
After this change:

```
python3 -c "from transformers import CLIPTextModel" 1.70s user 1.49s system 220% cpu 1.447 total
```

* [Core] Add lazy import structure to imports by patrickvonplaten in 26090

Bugfixes and improvements

* Fix typo by susnato in 25966
* Fix Detr CI by ydshieh in 25972
* Fix `test_load_img_url_timeout` by ydshieh in 25976
* nn.Identity is not required to be compatible with PyTorch < 1.1.0 as the minimum PyTorch version we currently support is 1.10.0 by statelesshz in 25974
* Add `Pop2Piano` space demo. by susnato in 25975
* fix typo by kai01ai in 25981
* Use main in conversion script by ydshieh in 25973
* [doc] Always call it Agents for consistency by julien-c in 25958
* Update RAG README.md with correct path to examples/seq2seq by tleyden in 25953
* Update training_args.py to remove the runtime error by sahel-sh in 25920
* Trainer: delegate default generation values to `generation_config` by gante in 25987
* Show failed tests on CircleCI layout in a better way by ydshieh in 25895
* Patch with accelerate xpu by abhilash1910 in 25714
* PegasusX add _no_split_modules by andreeahedes in 25933
* Add TFDebertaV2ForMultipleChoice by raghavanone in 25932
* deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler by pacman100 in 25863
* [Wav2Vec2 Conformer] Fix inference float16 by sanchit-gandhi in 25985
* Add LLaMA resources by eenzeenee in 25859
* [`CI`] Fix red CI and ERROR failed should show by ArthurZucker in 25995
* [`VITS`] tokenizer integration test: fix revision did not exist by ArthurZucker in 25996
* Fix Mega chunking error when using decoder-only model by tanaymeh in 25765
* save space when converting hf model to megatron model. by flower-with-safe in 25950
* Update README.md by NinoRisteski in 26003
* Falcon: fix revision propagation by LysandreJik in 26006
* TF-OPT attention mask fixes by Rocketknight1 in 25238
* Fix small typo README.md by zspo in 25934
* 🌐[i18n-KO] Translated `llm_tutorial.md` to Korean by harheem in 25791
* Remove Falcon from undocumented list by Rocketknight1 in 26008
* modify context length for GPTQ + version bump by SunMarc in 25899
* Fix err with FSDP by muellerzr in 25991
* fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 by kai01ai in 26024
* Fix CircleCI config by ydshieh in 26023
* Add `tgs` speed metrics by CokeDong in 25858
* [VITS] Fix nightly tests by sanchit-gandhi in 25986
* Added HerBERT to README.md by Muskan011 in 26020
* Fix vilt config docstring parameter to match value in init by raghavanone in 26017
* Punctuation fix by kwonmha in 26025
* Try to fix training Loss inconsistent after resume from old checkpoint by dumpmemory in 25872
* Fix Dropout Implementation in Graphormer by alexanderkrauck in 24817
* Update missing docs on `activation_dropout` and fix DropOut docs for SEW-D by gau-nernst in 26031
* Skip warning if tracing with dynamo by angelayi in 25581
* 🌐 [i18n-KO] Translated `llama.md` to Korean by harheem in 26044
* [`CodeLlamaTokenizerFast`] Fix fix `set_infilling_processor` to properly reset by ArthurZucker in 26041
* [`CITests`] skip failing tests until 26054 is merged by ArthurZucker in 26063
* only main process should call _save on deepspeed zero3 by zjjMaiMai in 25959
* docs: update link huggingface map by pphuc25 in 26077
* docs: add space to docs by pphuc25 in 26067
* [`core`] Import tensorflow inside relevant methods in `trainer_utils` by younesbelkada in 26106
* Generate: legacy mode is only triggered when `generation_config` is untouched by gante in 25962
* Update logits_process.py docstrings by larekrow in 25971
* Fix ExponentialDecayLengthPenalty negative logits issue by pokjay in 25594
* 🌐 [i18n-KO] Translated `llama2.md` to Korean by mjk0618 in 26047
* [docs] Updates to TTS task guide with regards to the new TTS pipeline by MKhalusova in 26095
* 🌐 [i18n-KO] Translated `contributing.md` to Korean by mjk0618 in 25877
* enable optuna multi-objectives feature by sywangyi in 25969
* chore: correct update_step and correct gradient_accumulation_steps by pphuc25 in 26068
* Text2text pipeline: don't parameterize from the config by gante in 26118
* Fix `MarianTokenizer` to remove metaspace character in `decode` by tanaymeh in 26091
* safeguard torch distributed check by pacman100 in 26056
* fix the deepspeed tests by pacman100 in 26021
* Fix AutoTokenizer docstring typo by amyeroberts in 26117
* [`core`] fix 4bit `num_parameters` by younesbelkada in 26132
* Add missing space in generation/utils.py by jbochi in 26121
* Update spectrogram and waveform model mapping for TTS/A pipeline by Vaibhavs10 in 26114
* [`RWKV`] Final fix RWMV 4bit by younesbelkada in 26134
* docs: feat: add llama2 notebook resources from OSSCA community by junejae in 26076
* Generate: ignore warning when `generation_config.max_length` is set to `None` by gante in 26147
* Fix `test_finetune_bert2bert` by ydshieh in 25984
* Falcon: batched generation by gante in 26137
* Fix `beam_scores` shape when token scores shape changes after `logits_processor` by BakerBunker in 25980
* Update training_args.py - addition of self.distributed_state when using XPU by Serizao in 25999
* [docs] last hidden state vs hidden_states[-1] by MKhalusova in 26142
* Flex xpu bug fix by abhilash1910 in 26135
* Add missing Maskformer dataclass decorator, add dataclass check in ModelOutput for subclasses by rachthree in 25638
* Fix eval accumulation when `accelerate` > 0.20.3 by sam-scale in 26060
* [Whisper Tokenizer] Encode timestamps by sanchit-gandhi in 26054
* [`PEFT`] Fix PEFT + gradient checkpointing by younesbelkada in 25846
* [MusicGen] Add streamer to generate by sanchit-gandhi in 25320
* Fix beam search when using model parallel by pfldy2850 in 24969
* [MusicGen] Add sampling rate to config by sanchit-gandhi in 26136
* [Whisper] Fix word-level timestamps for audio < 30 seconds by xenova in 25607
* [BLIP-2] Improve conversion script by NielsRogge in 24854
* IDEFICS: allow interpolation of vision's pos embeddings by leot13 in 26029
* [TTA Pipeline] Test MusicGen and VITS by sanchit-gandhi in 26146
* Tweaks to Chat Templates docs by Rocketknight1 in 26168
* [Whisper] Check length of prompt + max new tokens by sanchit-gandhi in 26164
* Update notebook.py to support multi eval datasets by matrix1001 in 25796
* Fix pad to multiple of by ArthurZucker in 25732
* [docs] IDEFICS guide and task guides restructure by MKhalusova in 26035
* [PEFT] Allow PEFT model dict to be loaded by patrickvonplaten in 25721
* No doctest for `convert_bros_to_pytorch.py` by ydshieh in 26212
* Remove `utils/documentation_tests.txt` by ydshieh in 26213
* moved `ctrl` to `Salesforce/ctrl` by julien-c in 26183
* Fix ConversationalPipeline tests by Rocketknight1 in 26217
* [FSMT] Fix non-shared weights by LysandreJik in 26187
* refactor decay_parameters production into its own function by shijie-wu in 26152
* refactor: change default block_size in block size > max position embeddings by pphuc25 in 26069
* [Wav2Vec2-Conf / LLaMA] Style fix by sanchit-gandhi in 26188
* [Permisson] Style fix by sanchit-gandhi in 26228
* [Check] Fix config docstring by sanchit-gandhi in 26222
* 🌐 [i18n-KO] Translated `whisper.md` to Korean by nuatmochoi in 26002
* Create the return value on device to avoid unnecessary copying from CPU by mksit in 26151
* [AutoBackbone] Add test by NielsRogge in 26094
* Update README.md by NinoRisteski in 26198
* Update add_new_pipeline.md by NinoRisteski in 26197
* [docs] Fix model reference in zero shot image classification example by Aleksandar1932 in 26206
* Fix the gitlab user mention in issue templates to the correct user by muellerz in 26237
* Fix some docstring in image processors by ydshieh in 26235
* Fix gated repo tests by Wauplin in 26257
* Fix `Error` not captured in PR doctesting by ydshieh in 26215
* DeepSpeed ZeRO-3 handling when resizing embedding layers by pacman100 in 26259
* [FIX] resize_token_embeddings by passaglia in 26102
* FSDP tests and checkpointing fixes by pacman100 in 26180
* fix name error when accelerate is not available by pacman100 in 26278
* Update bros checkpoint by jinhopark8345 in 26277
* Integrate AMD GPU in CI/CD environment by mfuntowicz in 26007
* Rewrite for custom code warning messages by Rocketknight1 in 26291
* fix deepspeed available detection by fxmarty in 26252
* add bbox input validation by jinhopark8345 in 26294
* include changes from llama by ArthurZucker in 26260
* [`Trainer`] Refactor trainer + bnb logic by younesbelkada in 26248
* add custom RMSNorm to `ALL_LAYERNORM_LAYERS` by shijie-wu in 26227
* Keep relevant weights in fp32 when `model._keep_in_fp32_modules` is set even when `accelerate` is not installed by fxmarty in 26225
* Fix FSMT weight sharing by LysandreJik in 26292
* update hf hub dependency to be compatible with the new tokenizers by ArthurZucker in 26301
* Porting the torchaudio kaldi fbank implementation to audio_utils by ylacombe in 26182
* More error message fixup, plus some linebreaks! by Rocketknight1 in 26296
* [QUICK FIX LINK] Update trainer.py by SoyGema in 26293
* Use CircleCI `store_test_results` by ydshieh in 26223
* Fix doctest CI by ydshieh in 26324
* [doc] fixed indices in obj detection example by MKhalusova in 26343
* [TTA Pipeline] Fix MusicGen test by sanchit-gandhi in 26348
* Add image to image pipeline by LeviVasconcelos in 25393
* feat: adding num_proc to load_dataset by pphuc25 in 26326
* Fixed unclosed p tags by HanSeokhyeon in 26240
* Update add_new_model.md by NinoRisteski in 26365
* Fix MusicGen logging error by osanseviero in 26370
* [docs] removed MaskFormerSwin and TimmBackbone from the table on index.md by MKhalusova in 26347
* Update tiny model information and pipeline tests by ydshieh in 26285
* Add Russian localization for README by qweme32 in 26208
* 🌐 [i18n-KO] Translated `audio_classification.mdx` to Korean by gabrielwithappy in 26200
* [ViTMatte] Add resources by NielsRogge in 26317
* Deleted duplicate sentence by titi-devv in 26394
* added support for gradient checkpointing in ESM models by sanjeevk-os in 26386
* Fix DeepSpeed issue with Idefics by HugoLaurencon in 26393
* Add torch `RMSProp` optimizer by natolambert in 26425
* Fix padding for IDEFICS by shauray8 in 26396
* Update semantic_segmentation.md by zekaouinoureddine in 26419
* Fixing tokenizer when `transformers` is installed without `tokenizers` by urialon in 26236
* [`FA` / `tests`] Add use_cache tests for FA models by younesbelkada in 26415
* add bf16 mixed precision support for NPU by statelesshz in 26163
* [`PEFT`] Fix PEFT multi adapters support by younesbelkada in 26407
* Fix failing doctest by LysandreJik in 26450
* Update `runs-on` in workflow files by ydshieh in 26435
* [i18n-DE] Complete first toc chapter by flozi00 in 26311
* 🌐 [i18n-KO] Translated `debugging.md` to Korean by wonhyeongseo in 26246
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean by wonhyeongseo in 26244
* optimize VRAM for calculating pos_bias in LayoutLM v2, v3 by NormXU in 26139
* Fix `cos_sin` device issue in Falcon model by ydshieh in 26448
* docs: change assert to raise and some small docs by pphuc25 in 26232
* change mention of decoder_input_ids to input_ids and same with decode_inputs_embeds by tmabraham in 26406
* [VITS] Fix speaker_embed device mismatch by fakhirali in 26115
* [`PEFT`] introducing `adapter_kwargs` for loading adapters from different Hub location (`subfolder`, `revision`) than the base model by younesbelkada in 26270
* Do not warn about unexpected decoder weights when loading T5EncoderModel and LongT5EncoderModel by fleonce in 26211
* fix_mbart_tied_weights by SunMarc in 26422
* Esm checkpointing by Amelie-Schreiber in 26454
* [Whisper Tokenizer] Make decoding faster after adding timestamps by sanchit-gandhi in 26299
* [docs] Update offline mode docs by stevhliu in 26478
* [docs] navigation improvement between text gen pipelines and text gen params by MKhalusova in 26477
* Skip 2 failing persimmon pipeline tests for now by ydshieh in 26485
* Avoid all-zeor attnetion mask used in testing by ydshieh in 26469
* [Flax Examples] Seq2Seq ASR Fine-Tuning Script by sanchit-gandhi in 21764
* [ASR Pipe] Improve docs and error messages by sanchit-gandhi in 26476
* Revert falcon exception by LysandreJik in 26472
* Fix num_heads in _upad_input by fs4r in 26490
* Fix requests connection error during modelcard creation by jphme in 26518
* Fix issue of canine forward requiring input_ids anyway by marcmk6 in 26290
* Fix broken link to video classification task by HelgeS in 26487
* [`PEFT`] Pass token when calling `find_adapter_config` by younesbelkada in 26488
* [`core`/ `auto` ] Fix bnb test with code revision + bug with code revision by younesbelkada in 26431
* Fix model integration ci by ArthurZucker in 26322
* [`PEFT`] Protect `adapter_kwargs` check by younesbelkada in 26537
* Remove-warns by ArthurZucker in 26483
* [Doctest] Add configuration_roformer.py by Adithya4720 in 26530
* Code-llama-nit by ArthurZucker in 26300
* add build_inputs_with_special_tokens to LlamaFast by ArthurZucker in 26297
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean by wonhyeongseo in 26243
* [i18n-DE] contribute chapter by flozi00 in 26481
* [RFC, Logging] Change warning to info by patrickvonplaten in 26545
* Add tokenizer kwargs to fill mask pipeline. by nmcahill in 26234
* [Wav2Vec2 and Co] Update init tests for PT 2.1 by sanchit-gandhi in 26494
* [AMD] Add initial version for run_tests_multi_gpu by mfuntowicz in 26346
* [Doctest] Add `configuration_encoder_decoder.py` by SrijanSahaySrivastava in 26519
* [InternLM] Add support for InternLM by Rocketknight1 in 26302


Significant community contributions

The following contributors have made significant changes to the library over the last release:

* jinhopark8345
* Add BROS (23190)
* Update bros checkpoint (26277)
* add bbox input validation (26294)
* qweme32
* Add Russian localization for README (26208)
* Bam4d
* [Mistral] Mistral-7B-v0.1 support (26447)
* flozi00
* [i18n-DE] Complete first toc chapter (26311)
* [i18n-DE] contribute chapter (26481)
* wonhyeongseo
* 🌐 [i18n-KO] Translated `debugging.md` to Korean (26246)
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean (26244)
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean (26243)

4.33.3

A patch release was made for the following three commits:

- DeepSpeed ZeRO-3 handling when resizing embedding layers (26259)
- [doc] Always call it Agents for consistency (25958)
- deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler (25863)

4.33.2

A patch release was done for these two commits:

- Fix pad to multiple of (25732)
- fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 (26024)

4.33.1

Falcon

Falcon is a class of causal decoder-only models built by [TII](https://www.tii.ae/). The largest Falcon checkpoints have been trained on >=1T tokens of text, with a particular emphasis on the [RefinedWeb](https://arxiv.org/abs/2306.01116) corpus. They are made available under the Apache 2.0 license.

Falcon's architecture is modern and optimized for inference, with multi-query attention and support for efficient attention variants like FlashAttention. Both 'base' models, trained only as causal language models, and 'instruct' models, which have received further fine-tuning, are available.
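
A minimal sketch, assuming the `tiiuae/falcon-7b-instruct` checkpoint and a GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Write a haiku about open-source AI.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```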

* Falcon port 24523 by Rocketknight1
* Falcon: Add RoPE scaling by gante in 25878
* Add proper Falcon docs and conversion script by Rocketknight1 in 25954
* Put Falcon back by LysandreJik in 25960
* [`Falcon`] Remove SDPA for falcon to support earlier versions of PyTorch (< 2.0) by younesbelkada in 25947

Code Llama

Code Llama is a family of large language models for code, based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.
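
A sketch of the infilling capability (assuming the `codellama/CodeLlama-7b-hf` checkpoint; `<FILL_ME>` is the tokenizer's default fill token):

```python
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

tokenizer = CodeLlamaTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated_ids = model.generate(input_ids, max_new_tokens=128)
# decode only the newly generated tokens, i.e. the infilled span
filling = tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(prompt.replace("<FILL_ME>", filling))
```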

* [`CodeLlama`] Add support for `CodeLlama` by ArthurZucker in 25740
* [`CodeLlama`] Fix CI by ArthurZucker in 25890

ViTDet

ViTDet reuses the ViT model architecture, adapted to object detection.

* Add ViTDet by NielsRogge in 25524

DINO v2

DINO v2 is the next iteration of the DINO model. It is added as a backbone class, allowing it to be re-used in downstream models.

* [DINOv2] Add backbone class by NielsRogge in 25520

VITS

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) composed of a posterior encoder, a decoder, and a conditional prior.
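
A short sketch, assuming the MMS English checkpoint `facebook/mms-tts-eng`:

```python
import torch
from transformers import AutoTokenizer, VitsModel

model = VitsModel.from_pretrained("facebook/mms-tts-eng")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer("Hello, it is nice to meet you", return_tensors="pt")
with torch.no_grad():
    waveform = model(**inputs).waveform  # (batch_size, num_samples) at model.config.sampling_rate
```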

* add VITS model by hollance in 24085

Breaking changes:
* 🚨🚨🚨 [`Refactor`] Move third-party related utility files into `integrations/` folder 🚨🚨🚨 by younesbelkada in 25599

This moves all utility files related to third-party libraries (outside the HF ecosystem) inside `integrations/` instead of having them in `transformers` directly.

In order to keep the previous usage, change your import as follows:

```diff
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations import HfDeepSpeedConfig
```


Bugfixes and improvements

* [DOCS] MusicGen Docs Update by xNul in 25510
* [MINOR:TYPO] by cakiki in 25646
* Pass the proper token to PEFT integration in auto classes by sgugger in 25649
* Put IDEFICS in the right section of the doc by sgugger in 25650
* TF 2.14 compatibility by Rocketknight1 in 25630
* Fix bloom add prefix space by ArthurZucker in 25652
* removing unnecesssary extra parameter by rafaelpadilla in 25643
* Adds `TRANSFORMERS_TEST_BACKEND` by vvvm23 in 25655
* stringify config by AleksanderWWW in 25637
* Add input_embeds functionality to gpt_neo Causal LM by gaasher in 25659
* Update doc toctree by ydshieh in 25661
* Add Llama2 resources by wonhyeongseo in 25531
* [`SPM`] Patch `spm` Llama and T5 by ArthurZucker in 25656
* [`GPTNeo`] Add input_embeds functionality to gpt_neo Causal LM by ArthurZucker in 25664
* fix wrong path in some doc by ydshieh in 25658
* Remove `utils/documentation_tests.txt` by ydshieh in 25680
* Prevent Dynamo graph fragmentation in GPTNeoX with torch.baddbmm fix by norabelrose in 24941
* ⚠️ [CLAP] Fix dtype of logit scales in init by sanchit-gandhi in 25682
* Sets the stalebot to 10 AM CEST by LysandreJik in 25678
* Fix `pad_token` check condition by ydshieh in 25685
* [DOCS] Added docstring example for EpsilonLogitsWarper 24783 by sanjeevk-os in 25378
* correct resume training steps number in progress bar by pphuc25 in 25691
* Generate: general test for decoder-only generation from `inputs_embeds` by gante in 25687
* Fix typo in `configuration_gpt2.py` by susnato in 25676
* fix ram efficient fsdp init by pacman100 in 25686
* [`LlamaTokenizer`] make unk_token_length a property by ArthurZucker in 25689
* Update list of persons to tag by sgugger in 25708
* docs: Resolve typos in warning text by tomaarsen in 25711
* Fix failing `test_batch_generation` for bloom by ydshieh in 25718
* [`PEFT`] Fix peft version by younesbelkada in 25710
* Fix number of minimal calls to the Hub with peft integration by sgugger in 25715
* [`AutoGPTQ`] Add correct installation of GPTQ library + fix slow tests by younesbelkada in 25713
* Generate: nudge towards `do_sample=False` when `temperature=0.0` by gante in 25722
* [`from_pretrained`] Simpler code for peft by ArthurZucker in 25726
* [idefics] idefics-9b test use 4bit quant by stas00 in 25734
* ImageProcessor - check if input pixel values between 0-255 by amyeroberts in 25688
* [`from_pretrained`] Fix failing PEFT tests by younesbelkada in 25733
* [ASR Pipe Test] Fix CTC timestamps error message by sanchit-gandhi in 25727
* 🌐 [i18n-KO] Translated `visual_question_answering.md` to Korean by wonhyeongseo in 25679
* [`PEFT`] Fix PeftConfig save pretrained when calling `add_adapter` by younesbelkada in 25738
* fixed typo in speech encoder decoder doc by asusevski in 25745
* Add FlaxCLIPTextModelWithProjection by pcuenca in 25254
* Generate: add missing logits processors docs by gante in 25653
* [DOCS] Add example for HammingDiversityLogitsProcessor by jessthebp in 25481
* Generate: logits processors are doctested and fix broken doctests by gante in 25692
* [CLAP] Fix logit scales dtype for fp16 by sanchit-gandhi in 25754
* [`Sentencepiece`] make sure `legacy` do not require `protobuf` by ArthurZucker in 25684
* fix encoder hook by SunMarc in 25735
* Docs: fix indentation in `HammingDiversityLogitsProcessor` by gante in 25756
* Add type hints for several pytorch models (batch-3) by nablabits in 25705
* Correct attention mask dtype for Flax GPT2 by liutianlin0121 in 25636
* fix a typo in docsting by statelesshz in 25759
* [idefics] small fixes by stas00 in 25764
* Add docstrings and fix VIVIT examples by Geometrein in 25628
* [`LlamaFamiliy`] add a tip about dtype by ArthurZucker in 25794
* Add type hints for several pytorch models (batch-2) by nablabits in 25557
* Add type hints for pytorch models (final batch) by nablabits in 25750
* Add type hints for several pytorch models (batch-4) by nablabits in 25749
* [idefics] fix vision's `hidden_act` by stas00 in 25787
* Arde/fsdp activation checkpointing by arde171 in 25771
* Fix incorrect Boolean value in deepspeed example by tmm1 in 25788
* fixing name position_embeddings to object_queries by Lorenzobattistela in 24652
* Resolving Attribute error when using the FSDP ram efficient feature by pacman100 in 25820
* [`Docs`] More clarifications on BT + FA by younesbelkada in 25823
* fix register by zspo in 25779
* Minor wording changes for Code Llama by osanseviero in 25815
* [`LlamaTokenizer`] `tokenize` nits. by ArthurZucker in 25793
* fix warning trigger for embed_positions when loading xglm by MattYoon in 25798
* 🌐 [i18n-KO] Translated peft.md to Korean by nuatmochoi in 25706
* 🌐 [i18n-KO] `model_memory_anatomy.md` to Korean by mjk0618 in 25755
* Error with checking args.eval_accumulation_steps to gather tensors by chaumng in 25819
* Tests: detect lines removed from "utils/not_doctested.txt" and doctest ALL generation files by gante in 25763
* 🌐 [i18n-KO] Translated `add_new_pipeline.md` to Korean by heuristicwave in 25498
* 🌐 [i18n-KO] Translated `community.md` to Korean by sim-so in 25674
* 🤦 update warning to If you want to use the new behaviour, set `legacy=… by ArthurZucker in 25833
* update remaining `Pop2Piano` checkpoints by susnato in 25827
* [AutoTokenizer] Add data2vec to mapping by sanchit-gandhi in 25835
* MaskFormer,Mask2former - reduce memory load by amyeroberts in 25741
* Support loading base64 images in pipelines by InventivetalentDev in 25633
* Update README.md by NinoRisteski in 25834
* Generate: models with custom `generate()` return `True` in `can_generate()` by gante in 25838
* Update README.md by NinoRisteski in 25832
* minor typo fix in PeftAdapterMixin docs by tmm1 in 25829
* Add flax installation in daily doctest workflow by ydshieh in 25860
* Add Blip2 model in VQA pipeline by jpizarrom in 25532
* Remote tools are turned off by LysandreJik in 25867
* Fix imports by ydshieh in 25869
* fix max_memory for bnb by SunMarc in 25842
* Docs: fix example failing doctest in `generation_strategies.md ` by gante in 25874
* pin pandas==2.0.3 by ydshieh in 25875
* Reduce CI output by ydshieh in 25876
* [ViTDet] Fix doc tests by NielsRogge in 25880
* For xla tensors, use an alternative way to get a unique id by qihqi in 25802
* fix ds z3 checkpointing when `stage3_gather_16bit_weights_on_model_save=False` by pacman100 in 25817
* Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer by veezbo in 25807
* [`TokenizerFast`] `can_save_slow_tokenizer` as a property for when `vocab_file`'s folder was removed by ArthurZucker in 25626
* Save image_processor while saving pipeline (ImageSegmentationPipeline) by raghavanone in 25884
* [`InstructBlip`] FINAL Fix instructblip test by younesbelkada in 25887
* Add type hints for tf models batch 1 by nablabits in 25853
* Update `setup.py` by ydshieh in 25893
* Smarter check for `is_tensor` by sgugger in 25871
* remove torch_dtype override by SunMarc in 25894
* fix FSDP model resume optimizer & scheduler by pkumc in 25852
* Better error message for pipeline loading by ydshieh in 25912
* Remove broken docs for MusicGen by osanseviero in 25905
* Revert frozen training arguments by muellerzr in 25903
* [VITS] Add to TTA pipeline by sanchit-gandhi in 25906
* [MMS] Update docs with HF TTS implementation by sanchit-gandhi in 25907
* [VITS] Only trigger tokenizer warning for uroman by sanchit-gandhi in 25915
* Update-llama-code by ArthurZucker in 25826
* Update model_memory_anatomy.md by NinoRisteski in 25896
* Skip offload tests for `ViTDet` by ydshieh in 25913
* Fix typos by omahs in 25936
* Update community.md by NinoRisteski in 25928
* Update autoclass_tutorial.md by NinoRisteski in 25929
* Update README.md by NinoRisteski in 25941
* [MMS] Fix pip install in docs by sanchit-gandhi in 25949
* [VITS] Handle deprecated weight norm by sanchit-gandhi in 25946
* Import deepspeed utilities from integrations by osanseviero in 25919
* Update README.md by NinoRisteski in 25922
* [VITS] Fix init test by sanchit-gandhi in 25945
* Fix failing test by LysandreJik in 25963
* Fix smart check by ydshieh in 25955
* Add type hints for tf models final batch by nablabits in 25883


Significant community contributions

The following contributors have made significant changes to the library over the last release:

* nablabits
* Add type hints for several pytorch models (batch-3) (25705)
* Add type hints for several pytorch models (batch-2) (25557)
* Add type hints for pytorch models (final batch) (25750)
* Add type hints for several pytorch models (batch-4) (25749)
* Add type hints for tf models batch 1 (25853)
* Add type hints for tf models final batch (25883)
* Lorenzobattistela
* fixing name position_embeddings to object_queries (24652)
* hollance
* add VITS model (24085)

4.32.1

Patch release including several patches from v4.31.0, listed below:

- Put IDEFICS in the right section of the doc (25650)
- removing unnecesssary extra parameter (25643)
- [SPM] Patch spm Llama and T5 (25656)
- Fix bloom add prefix space (25652)
- Generate: add missing logits processors docs (25653)
- [idefics] small fixes (25764)

4.32.0

IDEFICS

The IDEFICS model was proposed in [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh

IDEFICS is the first open state-of-the-art visual language model at the 80B scale!

The model accepts arbitrary sequences of images and text and produces text, similarly to a multimodal ChatGPT.

Blogpost: [hf.co/blog/idefics](http://huggingface.co/blog/idefics)
Playground: [HuggingFaceM4/idefics_playground](http://huggingface.co/spaces/HuggingFaceM4/idefics_playground)

![image](https://github.com/huggingface/transformers/assets/30755778/a69feb0c-34ea-45f7-9d31-9e1162247d7e)

* new model: IDEFICS via HuggingFaceM4 by stas00 in 24796

MPT

MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.

* [`MPT`] Add MosaicML's `MPT` model to transformers by ArthurZucker & younesbelkada in 24629

GPTQ Integration

GPTQ quantization is now supported in Transformers, through the `optimum` library. The backend relies on the [auto_gptq](https://github.com/PanQiWei/AutoGPTQ) library, from which we use the `GPTQ` and `QuantLinear` classes.

See below for an example of the API, quantizing a model using the new `GPTQConfig` configuration utility.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_name = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
```

Most models under the [TheBloke](https://huggingface.co/TheBloke) namespace with the suffix `GPTQ` should be supported. For example, to load the GPTQ-quantized model `TheBloke/Llama-2-13B-chat-GPTQ`, simply run (after installing the latest optimum and auto-gptq libraries):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```


For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration

* GPTQ integration by SunMarc in 25062

Pipelines

A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the three text-to-audio models integrated into `transformers`: `SpeechT5ForTextToSpeech`, `MusicGen` and `Bark`.

See below for an example:
```py
from transformers import pipeline

pipe = pipeline(model="suno/bark")
output = pipe("Hey it's HuggingFace on the phone!")

audio = output["audio"]
sampling_rate = output["sampling_rate"]
```


* Add Text-To-Speech pipeline by ylacombe in 24952

Classifier-Free Guidance decoding

Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in [this paper](https://arxiv.org/abs/2306.17806). With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its [docs](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.UnbatchedClassifierFreeGuidanceLogitsProcessor) for usage instructions.
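
A minimal sketch with a small model (`gpt2` chosen arbitrarily); `guidance_scale > 1` increases adherence to the prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today, a dragon flew over Paris, France,", return_tensors="pt")
outputs = model.generate(**inputs, guidance_scale=1.5, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```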

* add CFG for .generate() by Vermeille in 24654

Task guides

A new task guide going into Visual Question Answering has been added to Transformers.

* VQA task guide by MKhalusova in 25244

Model deprecation

We continue the deprecation of models that was introduced in https://github.com/huggingface/transformers/pull/24787.

By deprecating, we indicate that we will stop maintaining such models, but there is no intention of actually removing them and breaking support (they might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to preserve backward compatibility). The main point is that we stop testing those models. Model usage drives this choice, which aims to ease the burden on our CI so that it can focus on more critical aspects of the library.

* Deprecate unused OpenLlama architecture by tomaarsen in 24922

Translation Efforts

There are ongoing efforts to translate the transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated as it further lowers the barrier of entry to ML and Transformers.

If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.

* 🌐 [i18n-KO] Translated`tasks/document_question_answering.md` to Korean by jungnerd in 24588
* 🌐 [i18n-KO] Fixed Korean and English `quicktour.md` by wonhyeongseo in 24664
* 🌐 [i18n-KO] Updated Korean `serialization.md` by wonhyeongseo in 24686
* 🌐 [i18n-KO] Translated performance.md to Korean by augustinLib in 24883
* 🌐 [i18n-KO] Translated `testing.md` to Korean by Sunmin0520 in 24900
* 🌐 [i18n-KO] Translated `perf_train_cpu.md` to Korean by seank021 in 24911
* 🌐 [i18n-KO] Translated `<tf_xla>.md` to Korean by 54data in 24904
* 🌐 [i18n-KO] Translated `perf_hardware.md` to Korean by augustinLib in 24966
* 🌐 [i18n-KO] Translated `hpo_train.md` to Korean by harheem in 24968
* 🌐 [i18n-KO] Translated `perf_infer_cpu.md` to Korean by junejae in 24920
* 🌐 [i18n-KO] Translated pipeline_webserver.md to Korean by kihoon71 in 24828
* 🌐 [i18n-KO] Translated `transformers_agents.md` to Korean by sim-so in 24881
* 🌐 [i18n-KO] Translated `perf_infer_gpu_many.md` to Korean by heuristicwave in 24943
* 🌐 [i18n-KO] Translated `perf_infer_gpu_one.md` to Korean by eenzeenee in 24978
* 🌐 [i18n-KO] Translated `add_tensorflow_model.md` to Korean by keonju2 in 25017
* 🌐 [i18n-KO] Translated `perf_train_cpu_many.md` to Korean by nuatmochoi in 24923
* 🌐 [i18n-KO] Translated `add_new_model.md` to Korean by mjk0618 in 24957
* 🌐 [i18n-KO] Translated `model_summary.md` to Korean by 0525hhgus in 24625
* 🌐 [i18n-KO] Translated `philosophy.md` to Korean by TaeYupNoh in 25010
* 🌐 [i18n-KO] Translated `perf_train_tpu_tf.md` to Korean by 0525hhgus in 25433
* 🌐 [i18n-KO] Translated docs: ko: pr_checks.md to Korean by sronger in 24987

Explicit input data format for image processing

Addition of an `input_data_format` argument to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing images with a non-standard number of channels (e.g. 4) and removes errors that occurred when the data format was inferred but the channel dimension was ambiguous.

```python
import numpy as np
from transformers import ViTImageProcessor

# a 4-channel image in (num_channels, height, width) format
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
```


* Input data format by amyeroberts in 25464
* Add input_data_format argument, image transforms by amyeroberts in 25462

Documentation clarification about efficient inference through `torch.scaled_dot_product_attention` & Flash Attention

Users are often not aware that it is possible to force `torch.scaled_dot_product_attention` to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.

* [Docs / BetterTransformer ] Added more details about flash attention + SDPA : https://github.com/huggingface/transformers/pull/25265

In a nutshell, one can just run:

```diff
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")

# convert the model to BetterTransformer
model.to_bettertransformer()

input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

+ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```


to enable Flash Attention in their model. However, this feature does not support padding yet.

FSDP and DeepSpeed Changes

Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.
Users no longer have to pass `fsdp_transformer_layer_cls_to_wrap`, as the code now uses `_no_split_modules` by default, which is available for most popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + Trainer.

* add util for ram efficient loading of model when using fsdp by pacman100 in 25107
* fix fsdp checkpointing issues by pacman100 in 24926
* fsdp fixes and enhancements by pacman100 in 24980
* fix deepspeed load best model at end when the model gets sharded by pacman100 in 25057
* resolving zero3 init when using accelerate config with Trainer by pacman100 in 25227
* fix z3 init when using accelerate launcher by pacman100 in 25589

Breaking changes

Default optimizer in the `Trainer` class

The default optimizer in the `Trainer` class has been updated to `adamw_torch` rather than our own `adamw_hf`, as the official Torch optimizer is more robust and fixes some issues.

In order to keep the old behavior, ensure that you pass "adamw_hf" as the `optim` value in your `TrainingArguments`.
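
For example (a minimal sketch):

```python
from transformers import TrainingArguments

# keep the previous default optimizer
args = TrainingArguments(output_dir="out", optim="adamw_hf")
```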

* 🚨🚨🚨Change default from `adamw_hf` to `adamw_torch` 🚨🚨🚨 by muellerzr in 25109

ViVit and EfficientNet rescale bugfix

There was an issue with how pixel values were rescaled in ViVit and EfficientNet. These have been fixed, but the fix results in different model outputs for both of these models. To understand the change and see what needs to be done to obtain the previous results, please take a look at the following PRs.

* 🚨🚨🚨 Fix rescale ViVit Efficientnet by amyeroberts in 25174
* 🚨🚨🚨 Vivit update default rescale_factor value by amyeroberts in 25547

Removing softmax for the image classification EfficientNet class

The `EfficientNetForImageClassification` model class did not follow conventions and applied a softmax to the model logits. This was removed so that it respects the convention set by other models.

In order to obtain the previous results, pass the model logits through a softmax, as sketched below.
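
A sketch of recovering the previous behaviour, assuming the `google/efficientnet-b0` checkpoint and any RGB input image:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, EfficientNetForImageClassification

processor = AutoImageProcessor.from_pretrained("google/efficientnet-b0")
model = EfficientNetForImageClassification.from_pretrained("google/efficientnet-b0")

image = Image.new("RGB", (224, 224))  # placeholder image for illustration
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)  # the softmax that was previously applied inside the model
```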

* 🚨🚨🚨 Remove softmax for EfficientNetForImageClassification 🚨🚨🚨 by amyeroberts in 25501

Bug fixes with SPM models

Some SPM models had issues with their handling of added tokens. Namely, `Llama` and `T5`, among others, were behaving incorrectly. These have been updated in https://github.com/huggingface/transformers/pull/25224.

An option to obtain the previous behavior was added through the `legacy` flag, as explained in the PR linked above and sketched below.
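
For example (a sketch; the checkpoint is an arbitrary Llama conversion):

```python
from transformers import LlamaTokenizer

# opt back into the previous (buggy) behavior
tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b", legacy=True)
```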

* 🚨🚨🚨 [`SPM`] Finish fix spm models 🚨🚨🚨 by ArthurZucker in 25224

Bugfixes and improvements

* Disable ipex env var if false by muellerzr in 24885
* Check for accelerate env var when doing CPU only by muellerzr in 24890
* Avoid some pipeline tasks to use `use_cache=True` by ydshieh in 24893
* Update tested versions in READMEs by EliahKagan in 24895
* Fix `test_model_parallelism` for `FalconModel` by ydshieh in 24914
* Fixed issue where ACCELERATE_USE_CPU="False" results in bool(True) by madhavajay in 24907
* fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST by 21jun in 24902
* Fix minor llama2.md model doc typos by tmc in 24909
* [`Llama2`] replace `self.pretraining_tp` with `self.config.pretraining_tp` by younesbelkada in 24906
* [doc] `image_processing_vilt.py` wrong default documented by stas00 in 24931
* Add multi-label text classification support to pytorch example by ranchlai in 24770
* replace no_cuda with use_cpu in test_pytorch_examples by statelesshz in 24944
* Generate: sequence bias can handle same terminations by gante in 24822
* Update processing_vision_text_dual_encoder.py by premsa in 24950
* Fix `main_input_name` in `src/transformers/keras_callbacks.py` by ydshieh in 24916
* [DOCS] Example for `LogitsProcessor` class by shauray8 in 24848
* fix type annotations for arguments in training_args by shauray8 in 24550
* [`RWKV`] Add Gradient Checkpointing support for RWKV by younesbelkada in 24955
* Change logic for logging in the examples by muellerzr in 24956
* Contrastive Search peak memory reduction by blbadger in 24120
* Fallback for missing attribute `Parameter.ds_numel` by apoorvkh in 24942
* fix fsdp checkpointing issues by pacman100 in 24926
* fix: cast input pixels to appropriate dtype for image_to_text pipelines by JimAllanson in 24947
* fsdp fixes and enhancements by pacman100 in 24980
* Fix missing spaces in system prompt of Llama2 tokenizer by chenjoya in 24930
* [`LlamaConfig`] Nit: pad token should be None by default by ArthurZucker in 24958
* Remove tokenizers from the doc table by sgugger in 24963
* Avoid importing all models when instantiating a pipeline by sgugger in 24960
* Fix type annotation for deepspeed training arg by sgugger in 24988
* Use main_input_name for include_inputs_for_metrics by sgugger in 24993
* Fix `llama` tokenization doctest by ydshieh in 24990
* [`bnb`] Add simple check for bnb import by younesbelkada in 24995
* [`Llama`] remove persistent `inv_freq` tensor by ArthurZucker in 24998
* improve from_pretrained for zero3 multi gpus mode by 1ytic in 24964
* Move template doc file to md by sgugger in 25004
* [check_config_docstrings.py] improve diagnostics by stas00 in 25012
* [`logging.py`] set default `stderr` path if `None` by ArthurZucker in 25033
* fix(integrations): store serialized `TrainingArgs` to `wandb.config` without sanitization. by parambharat in 25035
* [docs] Performance docs tidy up, part 1 by MKhalusova in 23963
* Support GatedRepoError + use raise from by Wauplin in 25034
* Better handling missing SYS in llama conversation tokenizer by ichernev in 24997
* Add dispatch_batches to training arguments by muellerzr in 25038
* Fix typo in LlamaTokenizerFast docstring example by sbrunk in 25018
* Make more test models smaller by sgugger in 25005
* Pvt model by Xrenya in 24720
* compute_loss in trainer failing to label shift for PEFT model when label smoothing enabled. by njbrake in 25044
* [`8bit`] Fix 8bit corner case with Blip2 8bit by younesbelkada in 25047
* Better error message when signal is not supported on OS by sgugger in 25049
* [`RWKV`] Add note in doc on `RwkvStoppingCriteria` by ArthurZucker in 25055
* Generate - add beam indices output in contrained beam search by gante in 25042
* [Docs] fix rope_scaling doc string by kashif in 25072
* Fix last models for common tests that are too big. by sgugger in 25058
* fix: add TOC anchor link by eenzeenee in 25066
* Set `TF32` flag for PyTorch cuDNN backend by XuehaiPan in 25075
* Fix broken link in README_hd.md by susnato in 25067
* replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size` in readme of multiple-choice task by statelesshz in 25078
* [`generate`] Only warn users if the `generation_config`'s `max_length` is set to the default value by ArthurZucker in 25030
* Fix: repeat per sample for SAM image embeddings by xk-huang in 25074
* [DOCS] add example NoBadWordsLogitsProcessor by SoyGema in 25046
* Allow generic composite models to pass more kwargs by ydshieh in 24927
* [ `ForSequenceClassification`] Support `left` padding by ArthurZucker in 24979
* [`TF`] Also apply patch to support left padding by ArthurZucker in 25085
* Edit err message and comment in `test_model_is_small` by connor-henderson in 25087
* [ `PreTrainedTokenizerFast`] Keep properties from fast tokenizer by ArthurZucker in 25053
* Hotfix for failing `MusicgenForConditionalGeneration` tests by ydshieh in 25091
* [`T5`, `MT5`, `UMT5`] Add [T5, MT5, UMT5]ForSequenceClassification by sjrl in 24726
* Fix doctest by ydshieh in 25031
* fix tied_params for meta tensor by SunMarc in 25101
* documentation for llama2 models by shauray8 in 25102
* Fix `PvtModelIntegrationTest::test_inference_fp16` by ydshieh in 25106
* Add descriptive docstring to TemperatureLogitsWarper by nablabits in 24892
* fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is … by liucw2012 in 24772
* update `use_auth_token` -> `token` by ydshieh in 25083
* Fix past CI after 24334 by ydshieh in 25113
* Move common image processing methods to BaseImageProcessor by amyeroberts in 25089
* Fix ViT docstring regarding default dropout values. by ebezzam in 25118
* MaskFormer - enable return_dict in order to compile by amyeroberts in 25052
* Move center_crop to BaseImageProcessor by amyeroberts in 25122
* fix deepspeed load best model at end when the model gets sharded by pacman100 in 25057
* fix delete all checkpoints when save_total_limit is set to 1 by Pbihao in 25136
* [`T5/LlamaTokenizer`] default legacy to `None` to not always warn by ArthurZucker in 25131
* Clarify 4/8 bit loading log message by BramVanroy in 25134
* [`MptConfig`] support from pretrained args by ArthurZucker in 25116
* Add offload support to Bark by ylacombe in 25037
* More `token` things by ydshieh in 25146
* Add bloom flax by sanchit-gandhi in 25094
* Add new model in doc table of content by sgugger in 25148
* Fix `.push_to_hub` and cleanup `get_full_repo_name` usage by Wauplin in 25120
* Add test when downloading from gated repo by Wauplin in 25039
* override .cuda() to check if model is already quantized by ranchlai in 25166
* Represent query_length in a different way to solve jit issue by jiqing-feng in 25164
* make run_generation more generic for other devices by statelesshz in 25133
* added compiled model support for inference by markovalexander in 25124
* Update `use_auth_token` -> `token` in example scripts by ydshieh in 25167
* [`Mpt`] Fix mpt slow test by younesbelkada in 25170
* [`InstructBlip`] Fix instructblip slow test by younesbelkada in 25171
* Fix beam search to sample at least 1 non eos token by yonigottesman in 25103
* [MusicGen] Fix integration tests by sanchit-gandhi in 25169
* Musicgen: CFG is manually added by gante in 25173
* Better error message in `_prepare_output_docstrings` by ydshieh in 25202
* [`PreTrainedModel`] Wrap `cuda` and `to` method correctly by younesbelkada in 25206
* Fix `all_model_classes` in `FlaxBloomGenerationTest` by ydshieh in 25211
* [quantization.md] fix by stas00 in 25190
* [`pipeline`] revisit device check for pipeline by younesbelkada in 25207
* Update tiny model info. and pipeline testing by ydshieh in 25213
* Fix docker image build failure by ydshieh in 25214
* make build_mpt_alibi_tensor a method of MptModel so that deepspeed co… by sywangyi in 25193
* [`Pix2Struct`] Fix pix2struct cross attention by younesbelkada in 25200
* [`Docs`/`quantization`] Clearer explanation on how things works under the hood. + remove outdated info by younesbelkada in 25216
* [`MPT`] Add `require_bitsandbytes` on MPT integration tests by younesbelkada in 25201
* [`Detr`] Fix detr BatchNorm replacement issue by younesbelkada in 25230
* Move rescale dtype recasting to match torchvision ToTensor by amyeroberts in 25229
* Fix set of model parallel in the Trainer when no GPUs are available by sgugger in 25239
* fix get_keys_to_not_convert() to return correct modules for full precision inference by ranchlai in 25105
* add pathname and line number to logging formatter in debug mode by ranchlai in 25203
* Add `token` arugment in example scripts by ydshieh in 25172
* resolving zero3 init when using accelerate config with Trainer by pacman100 in 25227
* Update rescale tests - cast to float after rescaling to reflect 25229 by amyeroberts in 25259
* Fix some bugs for two stage training of deformable detr by jypjypjypjyp in 25045
* [DOCS] Add example and modified docs of EtaLogitsWarper by ashishthomaschempolil in 25125
* Fix return_dict_in_generate bug in InstructBlip generate function by eohomegrownapps in 25246
* Remove `pytest_options={"rA": None}` in CI by ydshieh in 25263
* recommend DeepSpeed's Argument Parsing documentation by BurnzZ in 25268
* [MMS] Fix mms by patrickvonplaten in 25267
* CI with `num_hidden_layers=2` 🚀🚀🚀 by ydshieh in 25266
* CI with `pytest_num_workers=8` for torch/tf jobs by ydshieh in 25274
* Docs: Update list of `report_to` logging integrations in docstring by tomaarsen in 25281
* Update InstructBLIP & Align values after rescale update by amyeroberts in 25209
* Docs: separate generate section by gante in 25235
* Update bark doc by ylacombe in 25234
* add generate method to SpeechT5ForTextToSpeech by ylacombe in 25233
* Add timeout parameter to load_image function by rolisz in 25184
* [JAX] Bump min version by sanchit-gandhi in 25286
* [small] llama2.md typo by H-Huang in 25295
* Fix typo: Roberta -> RoBERTa by MrGeislinger in 25302
* Move usage of deprecated logging.warn to logging.warning by PeterJCLaw in 25310
* Give more memory in test_disk_offload by sgugger in 25315
* Generate: get generation mode as an enum by gante in 25292
* Add offline mode for agents by sgugger in 25226
* Deal with nested configs better in base class by sgugger in 25237
* Document check copies by sgugger in 25291
* Make `bark` could have tiny model by ydshieh in 25290
* Document toc check and doctest check scripts by sgugger in 25319
* [Whisper] Better error message for outdated generation config by sanchit-gandhi in 25298
* Remove jnp.DeviceArray since it is deprecated. by mariecwhite in 24875
* Update TF pin in docker image by ydshieh in 25343
* Generalize CFG to allow for positive prompts by oobabooga in 25339
* Loosen output shape restrictions on GPT-style models by calpt in 25188
* Allow `trust_remote_code` in example scripts by Jackmin801 in 25248
* Generate: remove Marian hack by gante in 25294
* Fix more offload edge cases by ydshieh in 25342
* Migrate Trainer from `Repository` to `upload_folder` by sgugger in 25095
* Adding more information in help parser on train_file and validation_file by pphuc25 in 25324
* [DOCS] Add `NoRepeatNGramLogitsProcessor` Example for `LogitsProcessor` class by Rishab26 in 25186
* Docs: Added benchmarks for `torch.compile()`Β for vision models by merveenoyan in 24748
* Add mask2former fp16 support by pedrohml in 25093
* [DOCS] Add descriptive docstring to MinNewTokensLength by nablabits in 25196
* Register ModelOutput subclasses as supported torch.utils._pytree nodes by ringohoffman in 25358
* Fix `test_model_parallelism` by ydshieh in 25359
* Add warning for missing attention mask when pad tokens are detected by hackyon in 25345
* [ASR Pipeline] Clarify return timestamps by sanchit-gandhi in 25344
* MaskFormer, Mask2Former - replace einsum for tracing by amyeroberts in 25297
* Load state in else by muellerzr in 25318
* Fix `token` in example template by ydshieh in 25351
* Enable tests to run on third-party devcies by statelesshz in 25327
* Fix `torch_job` worker(s) crashing by ydshieh in 25374
* Generate: add config-level validation by gante in 25381
* Fix missing usage of `token` by ydshieh in 25382
* Use small config for `OneFormerModelTest.test_model_with_labels` by ydshieh in 25383
* Add copied from for image processor methods by amyeroberts in 25121
* change version by SunMarc in 25387
* [DOCS] Add example for `TopPLogitsWarper` by chiral-carbon in 25361
* 16059 - Add missing type hints for ASTModel by nablabits in 25364
* rm useless condition since the previous condition contains it. by jiqing-feng in 25403
* Fix path for dynamic module creation by sgugger in 25402
* YOLOS - Revert default return_pixel_mask value by amyeroberts in 25404
* Docs: introduction to generation with LLMs by gante in 25240
* Generate: length validation by gante in 25384
* Improve training args by statelesshz in 25401
* Generate: generation config validation fixes in docs by gante in 25405
* 16059 - Add extra type hints for AltCLIPModel by nablabits in 25399
* Generate: lower severity of parameterization checks by gante in 25407
* Update Bark generation configs and tests by ylacombe in 25409
* aligned sample_beam output selection with beam_search by hukuda222 in 25375
* Enable passing number of channels when inferring data format by amyeroberts in 25412
* Bark: flexible generation config overload by gante in 25414
* [DINOv2] Update pooler output by NielsRogge in 25392
* Doc checks by sgugger in 25408
* Generation: strict generation config validation at save time by gante in 25411
* [WavLM] Fix Arxiv link and authors by sanchit-gandhi in 25415
* Generate: Load generation config when `device_map` is passed by gante in 25413
* Fix rendering for `torch.compile()` docs by merveenoyan in 25432
* Add `examples` to tests to run when `setup.py` is modified by ydshieh in 25437
* Fix issue with ratio evaluation steps and auto find batch size by muellerzr in 25436
* docs: add LLaMA-Efficient-Tuning to awesome-transformers by statelesshz in 25441
* Fix for 25437 by ydshieh in 25454
* Refactor image processor testers by amyeroberts in 25450
* Switch Transformers: remove overwritten beam sample test by gante in 25458
* Reuse the cache created for latest `main` on PRs/branches if `setup.py` is not modified by ydshieh in 25445
* Update run_translation.py broken link example Pytoch by SoyGema in 25461
* Add input_data_format argument, image transforms by amyeroberts in 25462
* Mark flaky tests by amyeroberts in 25463
* Revert "Reuse the cache created for latest `main` on PRs/branches" by ydshieh in 25466
* import required torch and numpy libraries by eze1376 in 25483
* fix : escape key of start_token from special characters before search end_token in token2json function of DonutProcessor by nour-elkamel in 25472
* Remove logging code in TF Longformer that fails to compile by Rocketknight1 in 25496
* Add type hints to Blip2QFormer, BigBirdForQA and ConditionalDetr family models by nablabits in 25488
* Set can_generate for SpeechT5ForTextToSpeech by ylacombe in 25493
* MaskFormer post_process_instance_segmentation bug fix convert out side of loop by amyeroberts in 25497
* fix gptq nits by SunMarc in 25500
* Conditional DETR type hint fix by Rocketknight1 in 25505
* Check for case where `auxiliary_head` is `None` in `UperNetPreTrainedModel` by mmurray in 25514
* add __repr__ to the BitsAndBytesConfig class by ranchlai in 25517
* Make training args fully immutable by muellerzr in 25435
* Use dynamic past key-values shape in TF-Whisper by Rocketknight1 in 25523
* [TYPO] fix typo/format in quicktour.md by lishukan in 25519
* Fix nested configs of Jukebox by sgugger in 25533
* Marian: post-hack-fix correction by gante in 25459
* Document the test fetcher by sgugger in 25521
* Generate: fix default max length warning by gante in 25539
* fix vit hybrid test by SunMarc in 25543
* Fix `MaskFormerModelIntegrationTest` OOM by ydshieh in 25544
* More frozen args by muellerzr in 25540
* Input data format by amyeroberts in 25464
* [ASR Pipeline] Fix init with timestamps by sanchit-gandhi in 25438
* More utils doc by sgugger in 25457
* Update trainer.py by yundai424 in 25553
* Add documentation to dynamic module utils by sgugger in 25534
* Fix MPT CI by ydshieh in 25548
* Fix `torch.fx` tests on nightly CI by ydshieh in 25549
* YOLOS - reset default return_pixel_mask value by amyeroberts in 25559
* Skip `test_onnx_runtime_optimize` for now by ydshieh in 25560
* [`Docs`] Fix un-rendered images by younesbelkada in 25561
* Adds `TRANSFORMERS_TEST_DEVICE` by vvvm23 in 25506
* Skip `test_beam_search_xla_generate_simple` for `T5` by ydshieh in 25566
* [`resize_embedding`] Introduce `pad_to_multiple_of` and guidance by ArthurZucker in 25088
* [`SwitchTransformers`] Remove unused module by ArthurZucker in 25427
* Inconsistency in PreTrainedModel.resize_token_embeddings When ZeRO3 Is Enabled by sinamoeini in 25394
* [`NllbMoe`] Update code to properly support loss computation by ArthurZucker in 25429
* [`Tests`] Fix failing 8bit test by younesbelkada in 25564
* Revert "change version by SunMarc in 25387)"
* add util for ram efficient loading of model when using fsdp by pacman100 in 25107
* Skip `test_contrastive_generate` for `TFXLNet` by ydshieh in 25574
* add warning for 8bit optimizers by SunMarc in 25575
* Fix typo in example code by amelietamreymond in 25583
* Suggestions on Pipeline_webserver by kihoon71 in 25570
* [`Docs` / `BetterTransformer` ] Added more details about flash attention + SDPA by younesbelkada in 25265
* Added missing parenthesis in call to is_fsdp_enabled by marma in 25585
* Replaces calls to `.cuda` with `.to(torch_device)` in tests by vvvm23 in 25571
* [`split_special_tokens`] Add support for `split_special_tokens` argument to encode by ArthurZucker in 25081
* [`Llama`] remove prompt and fix prefix finetuning by ArthurZucker in 25565
* [Time series Informer] fix dtype of cumsum by kashif in 25431
* fix z3 init when using accelerate launcher by pacman100 in 25589
* [`TokenizerFast`] Fix setting prefix space in __init__ by ArthurZucker in 25563
* Make TTS automodels importable by osanseviero in 25595
* reattach hooks when using `resize_token_embeddings` by SunMarc in 25596
* Ignore all exceptions from signal in dynamic code by sgugger in 25623
* Fix PEFT integration failures on nightly CI by younesbelkada in 25624
* Run doctest for new files by ydshieh in 25588
* Fix test_modeling_mpt typo in model id by JuanFKurucz in 25606

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* ranchlai
* Add multi-label text classification support to pytorch example (24770)
* override .cuda() to check if model is already quantized (25166)
* fix get_keys_to_not_convert() to return correct modules for full precision inference (25105)
* add pathname and line number to logging formatter in debug mode (25203)
* add __repr__ to the BitsAndBytesConfig class (25517)
* wonhyeongseo
* 🌐 [i18n-KO] Fixed Korean and English `quicktour.md` (24664)
* 🌐 [i18n-KO] Updated Korean `serialization.md` (24686)
* Sunmin0520
* 🌐 [i18n-KO] Translated `testing.md` to Korean (24900)
* Xrenya
* Pvt model (24720)
* susnato
* Fix broken link in README_hd.md (25067)
* Add Pop2Piano (21785)
* sjrl
* [`T5`, `MT5`, `UMT5`] Add [T5, MT5, UMT5]ForSequenceClassification (24726)
* Jackmin801
* Allow `trust_remote_code` in example scripts (25248)
* mjk0618
* 🌐 [i18n-KO] Translated `add_new_model.md` to Korean (24957)
