Transformers

4.32.0

IDEFICS

The IDEFICS model was proposed in [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh

IDEFICS is the first open state-of-the-art visual language model at the 80B scale!

The model accepts arbitrary sequences of images and text and produces text, similar to a multimodal ChatGPT.

Blogpost: [hf.co/blog/idefics](http://huggingface.co/blog/idefics)
Playground: [HuggingFaceM4/idefics_playground](http://huggingface.co/spaces/HuggingFaceM4/idefics_playground)

![image](https://github.com/huggingface/transformers/assets/30755778/a69feb0c-34ea-45f7-9d31-9e1162247d7e)
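
For orientation, here is a minimal sketch of the interleaved image-text interface (the checkpoint, image URL and prompt are illustrative; see the blogpost above for the full API):

```python
from transformers import AutoProcessor, IdeficsForVisionText2Text

checkpoint = "HuggingFaceM4/idefics-9b"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(checkpoint)

# a prompt is an arbitrary interleaving of text and images (URLs or PIL images)
prompts = [["http://images.cocodataset.org/val2017/000000039769.jpg", "In this picture, I can see"]]
inputs = processor(prompts, return_tensors="pt")

generated_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```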

* new model: IDEFICS via HuggingFaceM4 by stas00 in 24796

MPT

MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.

* [`MPT`] Add MosaicML's `MPT` model to transformers by ArthurZucker & younesbelkada in 24629

GPTQ Integration

GPTQ quantization is now supported in Transformers, through the `optimum` library. The backend relies on the [auto_gptq](https://github.com/PanQiWei/AutoGPTQ) library, from which we use the `GPTQ` and `QuantLinear` classes.

See below for an example of the API, quantizing a model using the new `GPTQConfig` configuration utility.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_name = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# also works with device_map (CPU offload is supported, but disk offload is not)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
```


Most models under the [TheBloke](https://huggingface.co/TheBloke) namespace with the suffix `GPTQ` should be supported. For example, to load the GPTQ-quantized model `TheBloke/Llama-2-13B-chat-GPTQ`, simply run (after installing the latest `optimum` and `auto-gptq` libraries):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```


For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration

* GPTQ integration by SunMarc in 25062

Pipelines

A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the 3 text-to-audio models integrated into `transformers`: `SpeechT5ForTextToSpeech`, `MusicGen` and `Bark`.

See below for an example:
```py
from transformers import pipeline

synthesizer = pipeline(model="suno/bark")
output = synthesizer("Hey it's HuggingFace on the phone!")

audio = output["audio"]
sampling_rate = output["sampling_rate"]
```



* Add Text-To-Speech pipeline by ylacombe in 24952

Classifier-Free Guidance decoding

Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in [this paper](https://arxiv.org/abs/2306.17806). With this technique, you can increase prompt adherence in generation. You can also provide negative prompts to steer generation away from specific directions. See its [docs](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.UnbatchedClassifierFreeGuidanceLogitsProcessor) for usage instructions.
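
As a brief illustration (the model and guidance scale here are arbitrary choices), enabling CFG is a matter of passing `guidance_scale` to `generate()`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer(["Today, a dragon flew over Paris, France,"], return_tensors="pt")

# guidance_scale > 1 increases adherence to the prompt; 1 disables CFG
outputs = model.generate(**inputs, guidance_scale=1.5, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```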

* add CFG for .generate() by Vermeille in 24654

Task guides

A new task guide covering Visual Question Answering has been added to Transformers.
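
As a quick, hedged example of the task itself (the image URL and question are illustrative; the pipeline falls back to a default VQA checkpoint, and you can pass `model=...` to pin one):

```python
from transformers import pipeline

vqa = pipeline("visual-question-answering")
preds = vqa(
    image="http://images.cocodataset.org/val2017/000000039769.jpg",
    question="How many cats are there?",
)
print(preds)
```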

* VQA task guide by MKhalusova in 25244

Model deprecation

We continue the deprecation of models that was introduced in https://github.com/huggingface/transformers/pull/24787.

By deprecating, we indicate that we will stop maintaining such models, but there is no intention of actually removing them and breaking support (they might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to preserve backward compatibility). The main point is that we stop testing those models. This choice is driven by how much the models are used and aims to ease the burden on our CI so that it can focus on more critical aspects of the library.

* Deprecate unused OpenLlama architecture by tomaarsen in 24922

Translation Efforts

There are ongoing efforts to translate the Transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated as it further lowers the barrier of entry to ML and to Transformers.

If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.

* 🌐 [i18n-KO] Translated`tasks/document_question_answering.md` to Korean by jungnerd in 24588
* 🌐 [i18n-KO] Fixed Korean and English `quicktour.md` by wonhyeongseo in 24664
* 🌐 [i18n-KO] Updated Korean `serialization.md` by wonhyeongseo in 24686
* 🌐 [i18n-KO] Translated performance.md to Korean by augustinLib in 24883
* 🌐 [i18n-KO] Translated `testing.md` to Korean by Sunmin0520 in 24900
* 🌐 [i18n-KO] Translated `perf_train_cpu.md` to Korean by seank021 in 24911
* 🌐 [i18n-KO] Translated `<tf_xla>.md` to Korean by 54data in 24904
* 🌐 [i18n-KO] Translated `perf_hardware.md` to Korean by augustinLib in 24966
* 🌐 [i18n-KO] Translated `hpo_train.md` to Korean by harheem in 24968
* 🌐 [i18n-KO] Translated `perf_infer_cpu.md` to Korean by junejae in 24920
* 🌐 [i18n-KO] Translated pipeline_webserver.md to Korean by kihoon71 in 24828
* 🌐 [i18n-KO] Translated `transformers_agents.md` to Korean by sim-so in 24881
* 🌐 [i18n-KO] Translated `perf_infer_gpu_many.md` to Korean by heuristicwave in 24943
* 🌐 [i18n-KO] Translated `perf_infer_gpu_one.md` to Korean by eenzeenee in 24978
* 🌐 [i18n-KO] Translated `add_tensorflow_model.md` to Korean by keonju2 in 25017
* 🌐 [i18n-KO] Translated `perf_train_cpu_many.md` to Korean by nuatmochoi in 24923
* 🌐 [i18n-KO] Translated `add_new_model.md` to Korean by mjk0618 in 24957
* 🌐 [i18n-KO] Translated `model_summary.md` to Korean by 0525hhgus in 24625
* 🌐 [i18n-KO] Translated `philosophy.md` to Korean by TaeYupNoh in 25010
* 🌐 [i18n-KO] Translated `perf_train_tpu_tf.md` to Korean by 0525hhgus in 25433
* 🌐 [i18n-KO] Translated docs: ko: pr_checks.md to Korean by sronger in 24987

Explicit input data format for image processing

Addition of an `input_data_format` argument to image transforms and `ImageProcessor` methods, allowing the user to explicitly set the data format of the images being processed. This enables processing of images with a non-standard number of channels (e.g. 4), and removes errors that occurred when the data format was inferred but the channel dimension was ambiguous.

```python
import numpy as np
from transformers import ViTImageProcessor

# a 4-channel image in channels-first format (4 channels, 6x3 pixels)
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
```


* Input data format by amyeroberts in 25464
* Add input_data_format argument, image transforms by amyeroberts in 25462

Documentation clarification about efficient inference through `torch.scaled_dot_product_attention` & Flash Attention

Many users are not aware that it is possible to force `torch.scaled_dot_product_attention` to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.

* [Docs / BetterTransformer ] Added more details about flash attention + SDPA : https://github.com/huggingface/transformers/pull/25265

In a nutshell, one can just run:

```diff
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")

# convert the model to BetterTransformer
model.to_bettertransformer()

input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

+ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```


to enable Flash Attention in their model. However, this feature does not support padding yet.

FSDP and DeepSpeed Changes

Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.
Users also no longer have to pass `fsdp_transformer_layer_cls_to_wrap`, as the code now uses `_no_split_modules` by default, which is available for most of the popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + `Trainer`.
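
As a rough sketch (argument values are illustrative, not a recommended configuration, and this would typically be launched via `accelerate launch` or `torchrun`), enabling FSDP through `TrainingArguments` no longer requires naming the layer class to wrap:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    # fsdp_transformer_layer_cls_to_wrap is no longer needed:
    # the model's _no_split_modules is used by default
    fsdp="full_shard auto_wrap",
)
```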

* add util for ram efficient loading of model when using fsdp by pacman100 in 25107
* fix fsdp checkpointing issues by pacman100 in 24926
* fsdp fixes and enhancements by pacman100 in 24980
* fix deepspeed load best model at end when the model gets sharded by pacman100 in 25057
* resolving zero3 init when using accelerate config with Trainer by pacman100 in 25227
* fix z3 init when using accelerate launcher by pacman100 in 25589

Breaking changes

Default optimizer in the `Trainer` class

The default optimizer in the `Trainer` class has been updated to `adamw_torch` rather than our own `adamw_hf`, as the official PyTorch optimizer is more robust and fixes some issues.

In order to keep the old behavior, pass `"adamw_hf"` as the `optim` value in your `TrainingArguments`.
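
For example (`output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# restore the pre-4.32 default optimizer
args = TrainingArguments(output_dir="out", optim="adamw_hf")
```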

* 🚨🚨🚨Change default from `adamw_hf` to `adamw_torch` 🚨🚨🚨 by muellerzr in 25109

ViViT and EfficientNet rescale bugfix

There was an issue with how pixel values were rescaled for ViViT and EfficientNet. This has been fixed, but results in different model outputs for both of these models. To understand the change and see what needs to be done to obtain previous results, please take a look at the following PRs.

* 🚨🚨🚨 Fix rescale ViVit Efficientnet by amyeroberts in 25174
* 🚨🚨🚨 Vivit update default rescale_factor value by amyeroberts in 25547

Removing softmax for the image classification EfficientNet class

The `EfficientNetForImageClassification` model class did not follow conventions and added a softmax to the model logits. This was removed so that it respects the convention set by other models.

In order to obtain previous results, pass the model logits through a softmax.
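
For example (checkpoint and image are illustrative):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, EfficientNetForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("google/efficientnet-b7")
model = EfficientNetForImageClassification.from_pretrained("google/efficientnet-b7")

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
# apply the softmax yourself to recover the pre-4.32 outputs
probs = torch.softmax(logits, dim=-1)
```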

* 🚨🚨🚨 Remove softmax for EfficientNetForImageClassification 🚨🚨🚨 by amyeroberts in 25501

Bug fixes with SPM models

Some SPM models had issues with their handling of added tokens. Namely, the `Llama` and `T5` tokenizers, among others, were behaving incorrectly. These have been updated in https://github.com/huggingface/transformers/pull/25224.

An option to obtain the previous behavior was added through the `legacy` flag, as explained in the PR linked above.
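
For instance (the checkpoint is illustrative):

```python
from transformers import LlamaTokenizer

# legacy=True opts back into the previous (incorrect) added-token handling
tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b", legacy=True)
```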

* 🚨🚨🚨 [`SPM`] Finish fix spm models 🚨🚨🚨 by ArthurZucker in 25224

Bugfixes and improvements

* Disable ipex env var if false by muellerzr in 24885
* Check for accelerate env var when doing CPU only by muellerzr in 24890
* Avoid some pipeline tasks to use `use_cache=True` by ydshieh in 24893
* Update tested versions in READMEs by EliahKagan in 24895
* Fix `test_model_parallelism` for `FalconModel` by ydshieh in 24914
* Fixed issue where ACCELERATE_USE_CPU="False" results in bool(True) by madhavajay in 24907
* fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST by 21jun in 24902
* Fix minor llama2.md model doc typos by tmc in 24909
* [`Llama2`] replace `self.pretraining_tp` with `self.config.pretraining_tp` by younesbelkada in 24906
* [doc] `image_processing_vilt.py` wrong default documented by stas00 in 24931
* Add multi-label text classification support to pytorch example by ranchlai in 24770
* replace no_cuda with use_cpu in test_pytorch_examples by statelesshz in 24944
* Generate: sequence bias can handle same terminations by gante in 24822
* Update processing_vision_text_dual_encoder.py by premsa in 24950
* Fix `main_input_name` in `src/transformers/keras_callbacks.py` by ydshieh in 24916
* [DOCS] Example for `LogitsProcessor` class by shauray8 in 24848
* fix type annotations for arguments in training_args by shauray8 in 24550
* [`RWKV`] Add Gradient Checkpointing support for RWKV by younesbelkada in 24955
* Change logic for logging in the examples by muellerzr in 24956
* Contrastive Search peak memory reduction by blbadger in 24120
* Fallback for missing attribute `Parameter.ds_numel` by apoorvkh in 24942
* fix fsdp checkpointing issues by pacman100 in 24926
* fix: cast input pixels to appropriate dtype for image_to_text pipelines by JimAllanson in 24947
* fsdp fixes and enhancements by pacman100 in 24980
* Fix missing spaces in system prompt of Llama2 tokenizer by chenjoya in 24930
* [`LlamaConfig`] Nit: pad token should be None by default by ArthurZucker in 24958
* Remove tokenizers from the doc table by sgugger in 24963
* Avoid importing all models when instantiating a pipeline by sgugger in 24960
* Fix type annotation for deepspeed training arg by sgugger in 24988
* Use main_input_name for include_inputs_for_metrics by sgugger in 24993
* Fix `llama` tokenization doctest by ydshieh in 24990
* [`bnb`] Add simple check for bnb import by younesbelkada in 24995
* [`Llama`] remove persistent `inv_freq` tensor by ArthurZucker in 24998
* improve from_pretrained for zero3 multi gpus mode by 1ytic in 24964
* Move template doc file to md by sgugger in 25004
* [check_config_docstrings.py] improve diagnostics by stas00 in 25012
* [`logging.py`] set default `stderr` path if `None` by ArthurZucker in 25033
* fix(integrations): store serialized `TrainingArgs` to `wandb.config` without sanitization. by parambharat in 25035
* [docs] Performance docs tidy up, part 1 by MKhalusova in 23963
* Support GatedRepoError + use raise from by Wauplin in 25034
* Better handling missing SYS in llama conversation tokenizer by ichernev in 24997
* Add dispatch_batches to training arguments by muellerzr in 25038
* Fix typo in LlamaTokenizerFast docstring example by sbrunk in 25018
* Make more test models smaller by sgugger in 25005
* Pvt model by Xrenya in 24720
* compute_loss in trainer failing to label shift for PEFT model when label smoothing enabled. by njbrake in 25044
* [`8bit`] Fix 8bit corner case with Blip2 8bit by younesbelkada in 25047
* Better error message when signal is not supported on OS by sgugger in 25049
* [`RWKV`] Add note in doc on `RwkvStoppingCriteria` by ArthurZucker in 25055
* Generate - add beam indices output in contrained beam search by gante in 25042
* [Docs] fix rope_scaling doc string by kashif in 25072
* Fix last models for common tests that are too big. by sgugger in 25058
* fix: add TOC anchor link by eenzeenee in 25066
* Set `TF32` flag for PyTorch cuDNN backend by XuehaiPan in 25075
* Fix broken link in README_hd.md by susnato in 25067
* replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size` in readme of multiple-choice task by statelesshz in 25078
* [`generate`] Only warn users if the `generation_config`'s `max_length` is set to the default value by ArthurZucker in 25030
* Fix: repeat per sample for SAM image embeddings by xk-huang in 25074
* [DOCS] add example NoBadWordsLogitsProcessor by SoyGema in 25046
* Allow generic composite models to pass more kwargs by ydshieh in 24927
* [ `ForSequenceClassification`] Support `left` padding by ArthurZucker in 24979
* [`TF`] Also apply patch to support left padding by ArthurZucker in 25085
* Edit err message and comment in `test_model_is_small` by connor-henderson in 25087
* [ `PreTrainedTokenizerFast`] Keep properties from fast tokenizer by ArthurZucker in 25053
* Hotfix for failing `MusicgenForConditionalGeneration` tests by ydshieh in 25091
* [`T5`, `MT5`, `UMT5`] Add [T5, MT5, UMT5]ForSequenceClassification by sjrl in 24726
* Fix doctest by ydshieh in 25031
* fix tied_params for meta tensor by SunMarc in 25101
* documentation for llama2 models by shauray8 in 25102
* Fix `PvtModelIntegrationTest::test_inference_fp16` by ydshieh in 25106
* Add descriptive docstring to TemperatureLogitsWarper by nablabits in 24892
* fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is … by liucw2012 in 24772
* update `use_auth_token` -> `token` by ydshieh in 25083
* Fix past CI after 24334 by ydshieh in 25113
* Move common image processing methods to BaseImageProcessor by amyeroberts in 25089
* Fix ViT docstring regarding default dropout values. by ebezzam in 25118
* MaskFormer - enable return_dict in order to compile by amyeroberts in 25052
* Move center_crop to BaseImageProcessor by amyeroberts in 25122
* fix deepspeed load best model at end when the model gets sharded by pacman100 in 25057
* fix delete all checkpoints when save_total_limit is set to 1 by Pbihao in 25136
* [`T5/LlamaTokenizer`] default legacy to `None` to not always warn by ArthurZucker in 25131
* Clarify 4/8 bit loading log message by BramVanroy in 25134
* [`MptConfig`] support from pretrained args by ArthurZucker in 25116
* Add offload support to Bark by ylacombe in 25037
* More `token` things by ydshieh in 25146
* Add bloom flax by sanchit-gandhi in 25094
* Add new model in doc table of content by sgugger in 25148
* Fix `.push_to_hub` and cleanup `get_full_repo_name` usage by Wauplin in 25120
* Add test when downloading from gated repo by Wauplin in 25039
* override .cuda() to check if model is already quantized by ranchlai in 25166
* Represent query_length in a different way to solve jit issue by jiqing-feng in 25164
* make run_generation more generic for other devices by statelesshz in 25133
* added compiled model support for inference by markovalexander in 25124
* Update `use_auth_token` -> `token` in example scripts by ydshieh in 25167
* [`Mpt`] Fix mpt slow test by younesbelkada in 25170
* [`InstructBlip`] Fix instructblip slow test by younesbelkada in 25171
* Fix beam search to sample at least 1 non eos token by yonigottesman in 25103
* [MusicGen] Fix integration tests by sanchit-gandhi in 25169
* Musicgen: CFG is manually added by gante in 25173
* Better error message in `_prepare_output_docstrings` by ydshieh in 25202
* [`PreTrainedModel`] Wrap `cuda` and `to` method correctly by younesbelkada in 25206
* Fix `all_model_classes` in `FlaxBloomGenerationTest` by ydshieh in 25211
* [quantization.md] fix by stas00 in 25190
* [`pipeline`] revisit device check for pipeline by younesbelkada in 25207
* Update tiny model info. and pipeline testing by ydshieh in 25213
* Fix docker image build failure by ydshieh in 25214
* make build_mpt_alibi_tensor a method of MptModel so that deepspeed co… by sywangyi in 25193
* [`Pix2Struct`] Fix pix2struct cross attention by younesbelkada in 25200
* [`Docs`/`quantization`] Clearer explanation on how things works under the hood. + remove outdated info by younesbelkada in 25216
* [`MPT`] Add `require_bitsandbytes` on MPT integration tests by younesbelkada in 25201
* [`Detr`] Fix detr BatchNorm replacement issue by younesbelkada in 25230
* Move rescale dtype recasting to match torchvision ToTensor by amyeroberts in 25229
* Fix set of model parallel in the Trainer when no GPUs are available by sgugger in 25239
* fix get_keys_to_not_convert() to return correct modules for full precision inference by ranchlai in 25105
* add pathname and line number to logging formatter in debug mode by ranchlai in 25203
* Add `token` arugment in example scripts by ydshieh in 25172
* resolving zero3 init when using accelerate config with Trainer by pacman100 in 25227
* Update rescale tests - cast to float after rescaling to reflect 25229 by amyeroberts in 25259
* Fix some bugs for two stage training of deformable detr by jypjypjypjyp in 25045
* [DOCS] Add example and modified docs of EtaLogitsWarper by ashishthomaschempolil in 25125
* Fix return_dict_in_generate bug in InstructBlip generate function by eohomegrownapps in 25246
* Remove `pytest_options={"rA": None}` in CI by ydshieh in 25263
* recommend DeepSpeed's Argument Parsing documentation by BurnzZ in 25268
* [MMS] Fix mms by patrickvonplaten in 25267
* CI with `num_hidden_layers=2` 🚀🚀🚀 by ydshieh in 25266
* CI with `pytest_num_workers=8` for torch/tf jobs by ydshieh in 25274
* Docs: Update list of `report_to` logging integrations in docstring by tomaarsen in 25281
* Update InstructBLIP & Align values after rescale update by amyeroberts in 25209
* Docs: separate generate section by gante in 25235
* Update bark doc by ylacombe in 25234
* add generate method to SpeechT5ForTextToSpeech by ylacombe in 25233
* Add timeout parameter to load_image function by rolisz in 25184
* [JAX] Bump min version by sanchit-gandhi in 25286
* [small] llama2.md typo by H-Huang in 25295
* Fix typo: Roberta -> RoBERTa by MrGeislinger in 25302
* Move usage of deprecated logging.warn to logging.warning by PeterJCLaw in 25310
* Give more memory in test_disk_offload by sgugger in 25315
* Generate: get generation mode as an enum by gante in 25292
* Add offline mode for agents by sgugger in 25226
* Deal with nested configs better in base class by sgugger in 25237
* Document check copies by sgugger in 25291
* Make `bark` could have tiny model by ydshieh in 25290
* Document toc check and doctest check scripts by sgugger in 25319
* [Whisper] Better error message for outdated generation config by sanchit-gandhi in 25298
* Remove jnp.DeviceArray since it is deprecated. by mariecwhite in 24875
* Update TF pin in docker image by ydshieh in 25343
* Generalize CFG to allow for positive prompts by oobabooga in 25339
* Loosen output shape restrictions on GPT-style models by calpt in 25188
* Allow `trust_remote_code` in example scripts by Jackmin801 in 25248
* Generate: remove Marian hack by gante in 25294
* Fix more offload edge cases by ydshieh in 25342
* Migrate Trainer from `Repository` to `upload_folder` by sgugger in 25095
* Adding more information in help parser on train_file and validation_file by pphuc25 in 25324
* [DOCS] Add `NoRepeatNGramLogitsProcessor` Example for `LogitsProcessor` class by Rishab26 in 25186
* Docs: Added benchmarks for `torch.compile()` for vision models by merveenoyan in 24748
* Add mask2former fp16 support by pedrohml in 25093
* [DOCS] Add descriptive docstring to MinNewTokensLength by nablabits in 25196
* Register ModelOutput subclasses as supported torch.utils._pytree nodes by ringohoffman in 25358
* Fix `test_model_parallelism` by ydshieh in 25359
* Add warning for missing attention mask when pad tokens are detected by hackyon in 25345
* [ASR Pipeline] Clarify return timestamps by sanchit-gandhi in 25344
* MaskFormer, Mask2Former - replace einsum for tracing by amyeroberts in 25297
* Load state in else by muellerzr in 25318
* Fix `token` in example template by ydshieh in 25351
* Enable tests to run on third-party devcies by statelesshz in 25327
* Fix `torch_job` worker(s) crashing by ydshieh in 25374
* Generate: add config-level validation by gante in 25381
* Fix missing usage of `token` by ydshieh in 25382
* Use small config for `OneFormerModelTest.test_model_with_labels` by ydshieh in 25383
* Add copied from for image processor methods by amyeroberts in 25121
* change version by SunMarc in 25387
* [DOCS] Add example for `TopPLogitsWarper` by chiral-carbon in 25361
* 16059 - Add missing type hints for ASTModel by nablabits in 25364
* rm useless condition since the previous condition contains it. by jiqing-feng in 25403
* Fix path for dynamic module creation by sgugger in 25402
* YOLOS - Revert default return_pixel_mask value by amyeroberts in 25404
* Docs: introduction to generation with LLMs by gante in 25240
* Generate: length validation by gante in 25384
* Improve training args by statelesshz in 25401
* Generate: generation config validation fixes in docs by gante in 25405
* 16059 - Add extra type hints for AltCLIPModel by nablabits in 25399
* Generate: lower severity of parameterization checks by gante in 25407
* Update Bark generation configs and tests by ylacombe in 25409
* aligned sample_beam output selection with beam_search by hukuda222 in 25375
* Enable passing number of channels when inferring data format by amyeroberts in 25412
* Bark: flexible generation config overload by gante in 25414
* [DINOv2] Update pooler output by NielsRogge in 25392
* Doc checks by sgugger in 25408
* Generation: strict generation config validation at save time by gante in 25411
* [WavLM] Fix Arxiv link and authors by sanchit-gandhi in 25415
* Generate: Load generation config when `device_map` is passed by gante in 25413
* Fix rendering for `torch.compile()` docs by merveenoyan in 25432
* Add `examples` to tests to run when `setup.py` is modified by ydshieh in 25437
* Fix issue with ratio evaluation steps and auto find batch size by muellerzr in 25436
* docs: add LLaMA-Efficient-Tuning to awesome-transformers by statelesshz in 25441
* Fix for 25437 by ydshieh in 25454
* Refactor image processor testers by amyeroberts in 25450
* Switch Transformers: remove overwritten beam sample test by gante in 25458
* Reuse the cache created for latest `main` on PRs/branches if `setup.py` is not modified by ydshieh in 25445
* Update run_translation.py broken link example Pytoch by SoyGema in 25461
* Add input_data_format argument, image transforms by amyeroberts in 25462
* Mark flaky tests by amyeroberts in 25463
* Revert "Reuse the cache created for latest `main` on PRs/branches" by ydshieh in 25466
* import required torch and numpy libraries by eze1376 in 25483
* fix : escape key of start_token from special characters before search end_token in token2json function of DonutProcessor by nour-elkamel in 25472
* Remove logging code in TF Longformer that fails to compile by Rocketknight1 in 25496
* Add type hints to Blip2QFormer, BigBirdForQA and ConditionalDetr family models by nablabits in 25488
* Set can_generate for SpeechT5ForTextToSpeech by ylacombe in 25493
* MaskFormer post_process_instance_segmentation bug fix convert out side of loop by amyeroberts in 25497
* fix gptq nits by SunMarc in 25500
* Conditional DETR type hint fix by Rocketknight1 in 25505
* Check for case where `auxiliary_head` is `None` in `UperNetPreTrainedModel` by mmurray in 25514
* add __repr__ to the BitsAndBytesConfig class by ranchlai in 25517
* Make training args fully immutable by muellerzr in 25435
* Use dynamic past key-values shape in TF-Whisper by Rocketknight1 in 25523
* [TYPO] fix typo/format in quicktour.md by lishukan in 25519
* Fix nested configs of Jukebox by sgugger in 25533
* Marian: post-hack-fix correction by gante in 25459
* Document the test fetcher by sgugger in 25521
* Generate: fix default max length warning by gante in 25539
* fix vit hybrid test by SunMarc in 25543
* Fix `MaskFormerModelIntegrationTest` OOM by ydshieh in 25544
* More frozen args by muellerzr in 25540
* Input data format by amyeroberts in 25464
* [ASR Pipeline] Fix init with timestamps by sanchit-gandhi in 25438
* More utils doc by sgugger in 25457
* Update trainer.py by yundai424 in 25553
* Add documentation to dynamic module utils by sgugger in 25534
* Fix MPT CI by ydshieh in 25548
* Fix `torch.fx` tests on nightly CI by ydshieh in 25549
* YOLOS - reset default return_pixel_mask value by amyeroberts in 25559
* Skip `test_onnx_runtime_optimize` for now by ydshieh in 25560
* [`Docs`] Fix un-rendered images by younesbelkada in 25561
* Adds `TRANSFORMERS_TEST_DEVICE` by vvvm23 in 25506
* Skip `test_beam_search_xla_generate_simple` for `T5` by ydshieh in 25566
* [`resize_embedding`] Introduce `pad_to_multiple_of` and guidance by ArthurZucker in 25088
* [`SwitchTransformers`] Remove unused module by ArthurZucker in 25427
* Inconsistency in PreTrainedModel.resize_token_embeddings When ZeRO3 Is Enabled by sinamoeini in 25394
* [`NllbMoe`] Update code to properly support loss computation by ArthurZucker in 25429
* [`Tests`] Fix failing 8bit test by younesbelkada in 25564
* Revert "change version by SunMarc in 25387"
* add util for ram efficient loading of model when using fsdp by pacman100 in 25107
* Skip `test_contrastive_generate` for `TFXLNet` by ydshieh in 25574
* add warning for 8bit optimizers by SunMarc in 25575
* Fix typo in example code by amelietamreymond in 25583
* Suggestions on Pipeline_webserver by kihoon71 in 25570
* [`Docs` / `BetterTransformer` ] Added more details about flash attention + SDPA by younesbelkada in 25265
* Added missing parenthesis in call to is_fsdp_enabled by marma in 25585
* Replaces calls to `.cuda` with `.to(torch_device)` in tests by vvvm23 in 25571
* [`split_special_tokens`] Add support for `split_special_tokens` argument to encode by ArthurZucker in 25081
* [`Llama`] remove prompt and fix prefix finetuning by ArthurZucker in 25565
* [Time series Informer] fix dtype of cumsum by kashif in 25431
* fix z3 init when using accelerate launcher by pacman100 in 25589
* [`TokenizerFast`] Fix setting prefix space in __init__ by ArthurZucker in 25563
* Make TTS automodels importable by osanseviero in 25595
* reattach hooks when using `resize_token_embeddings` by SunMarc in 25596
* Ignore all exceptions from signal in dynamic code by sgugger in 25623
* Fix PEFT integration failures on nightly CI by younesbelkada in 25624
* Run doctest for new files by ydshieh in 25588
* Fix test_modeling_mpt typo in model id by JuanFKurucz in 25606

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* ranchlai
* Add multi-label text classification support to pytorch example (24770)
* override .cuda() to check if model is already quantized (25166)
* fix get_keys_to_not_convert() to return correct modules for full precision inference (25105)
* add pathname and line number to logging formatter in debug mode (25203)
* add __repr__ to the BitsAndBytesConfig class (25517)
* wonhyeongseo
* 🌐 [i18n-KO] Fixed Korean and English `quicktour.md` (24664)
* 🌐 [i18n-KO] Updated Korean `serialization.md` (24686)
* Sunmin0520
* 🌐 [i18n-KO] Translated `testing.md` to Korean (24900)
* Xrenya
* Pvt model (24720)
* susnato
* Fix broken link in README_hd.md (25067)
* Add Pop2Piano (21785)
* sjrl
* [`T5`, `MT5`, `UMT5`] Add [T5, MT5, UMT5]ForSequenceClassification (24726)
* Jackmin801
* Allow `trust_remote_code` in example scripts (25248)
* mjk0618
* 🌐 [i18n-KO] Translated `add_new_model.md` to Korean (24957)

4.31.0

New models

Llama v2

Llama 2 was proposed in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) by Hugo Touvron et al. It builds upon the Llama architecture, adding Grouped Query Attention for efficient inference.
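
Loading follows the usual auto-class pattern; a minimal sketch (the checkpoint is gated and requires accepting Meta's license on the Hub):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```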

* Add support for Llama 2 by ArthurZucker in 24891

Musicgen

The MusicGen model was proposed in the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.

MusicGen is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.

Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or via upsampling). Instead, it generates all the codebooks in a single forward pass.
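
A minimal text-conditional generation sketch (the checkpoint and prompt are illustrative):

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(text=["80s pop track with bassy drums and synth"], padding=True, return_tensors="pt")
# returns raw audio; the sampling rate lives in model.config.audio_encoder.sampling_rate
audio_values = model.generate(**inputs, max_new_tokens=256)
```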

* Add Musicgen by sanchit-gandhi in 24109

Bark

Bark is a transformer-based text-to-speech model proposed by Suno AI in [suno-ai/bark](https://github.com/suno-ai/bark).
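
A minimal sketch, assuming the `suno/bark-small` checkpoint and an English voice preset:

```python
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

inputs = processor("Hello, my dog is cute", voice_preset="v2/en_speaker_6")
audio_array = model.generate(**inputs)
```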

* Add bark by ylacombe in 24086

MMS

The MMS model was proposed in [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

* Add MMS CTC Fine-Tuning by patrickvonplaten in 24281

EnCodec

The EnCodec neural codec model was proposed in [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
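
A hedged round-trip sketch (the checkpoint and the dummy dataset are illustrative):

```python
from datasets import load_dataset
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
raw_audio = ds[0]["audio"]["array"]

inputs = processor(raw_audio=raw_audio, sampling_rate=processor.sampling_rate, return_tensors="pt")
# compress to discrete codes, then reconstruct the waveform
encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
decoded = model.decode(encoded.audio_codes, encoded.audio_scales, inputs["padding_mask"])[0]
```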

* Add EnCodec model by hollance in 23655

InstructBLIP

The InstructBLIP model was proposed in [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2) architecture for visual instruction tuning.

* Add InstructBLIP by NielsRogge in 23460

Umt5

The UMT5 model was proposed in [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.

* [`Umt5`] Add google's umt5 to `transformers` by ArthurZucker in 24477

MRA

The MRA model was proposed in [Multi Resolution Analysis (MRA) for Approximate Self-Attention](https://arxiv.org/abs/2207.10284) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.

* Add Multi Resolution Analysis (MRA) by novice03 in 24513

ViViT

The ViViT model was proposed in [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful pure-transformer-based models for video understanding.

* Add ViViT by jegork in 22518

Python 3.7

Python 3.7 reached end-of-life on June 27, 2023 and is no longer supported by the Python Software Foundation; the last version of Transformers to support it was 4.30.x.

* ⚠️ Time to say goodbye to py37 by ydshieh in 24091

4.30.2

- Fix push to hub by NielsRogge in 24187
- Fix how we detect the TF package by Rocketknight1 in 24255

4.30.1

- Fix bnb config json serialization in 24137 by younesbelkada
- Correctly build models and import call_context for older TF versions in 24138 by Rocketknight1
- Fix bugs with trainer in 24134 by pacman100

4.30.0

100k

Transformers has just reached 100k stars on GitHub, and to celebrate we wanted to highlight 100 projects in the vicinity of `transformers` and we have decided to create an [awesome-transformers](https://github.com/huggingface/transformers/blob/main/awesome-transformers.md) page to do just that.

We accept PRs to add projects to the list!

* Top 100 by LysandreJik in 22912
* Add LlamaIndex to awesome-transformers.md by ravi03071991 in 23484
* add cleanlab to awesome-transformers tools list by jwmueller in 23440

4-bit quantization and QLoRA

By leveraging the `bitsandbytes` library by TimDettmers, we add 4-bit support to `transformers` models!
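
At its simplest (the model choice is illustrative; this requires `bitsandbytes` and a CUDA GPU):

```python
from transformers import AutoModelForCausalLM

# weights are quantized to 4-bit on the fly at load time
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_4bit=True,
    device_map="auto",
)
```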

* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by TimDettmers in 23479

Agents

The Agents framework has been improved and continues to be stabilized. Among bug fixes, here are the important new features that were added:
- Local agent capabilities, to load a generative model directly from `transformers` instead of relying on APIs (see the sketch after this list).
- Prompts are now hosted on the Hub, which means that anyone can fork the prompts and update them with their own, letting other community contributors re-use them.
- An `AzureOpenAiAgent` class to support Azure OpenAI agents.
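
A hedged sketch of the local-agent feature (the checkpoint is illustrative; `from_pretrained` forwards kwargs such as `device_map` to the underlying model):

```python
from transformers import LocalAgent

agent = LocalAgent.from_pretrained("bigcode/starcoder", device_map="auto")
agent.run("Draw me a picture of rivers and lakes.")
```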

* Add local agent by sgugger in 23438
* Enable prompts on the Hub by sgugger in 23662
* Add AzureOpenAiAgent by sgugger in 24058

Safetensors

The `safetensors` library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).

It has now become a core dependency of `transformers`.
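
Opting in at save time is a one-liner (the model and path are illustrative):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
# serialize weights as model.safetensors instead of pytorch_model.bin
model.save_pretrained("./bert-safetensors", safe_serialization=True)
```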

* Making `safetensors` a core dependency. by Narsil in 23254

New models

Swiftformer

The SwiftFormer paper introduces a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. A series of models called 'SwiftFormer' is built based on this, achieving state-of-the-art performance in terms of both accuracy and mobile inference speed. Even their small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2× faster than MobileViT-v2.

* Add swiftformer by shehanmunasinghe in 22686

Autoformer

This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.

* [Time-Series] Autoformer model by elisim in 21891

MobileViTv2

MobileViTV2 is the second version of MobileViT, constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.

* Add MobileViTv2 by shehanmunasinghe in 22820

PerSAM

PerSAM proposes a minimal modification to [SAM](https://huggingface.co/docs/transformers/model_doc/sam) to allow DreamBooth-like personalization, enabling segmentation of concepts in new images using just one example.

* Add PerSAM [bis] by NielsRogge in 23659

Timm backbone

We add support for loading `timm` weights within the `AutoBackbone` API in `transformers`. `timm` models can be instantiated through the `TimmBackbone` class, and then used with any vision model that needs a backbone.

* Add TimmBackbone model by amyeroberts in 22619

Image to text pipeline conditional support

We add conditional text generation to the image-to-text pipeline, allowing the model to continue generating from an initial text prompt according to an image.
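
A hedged sketch (the model, image URL and prompt are illustrative), where the generated caption continues the given prompt:

```python
from transformers import pipeline

captioner = pipeline("image-to-text", model="microsoft/git-base-coco")
out = captioner(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    prompt="a photo of",
)
print(out)
```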

* [image-to-text pipeline] Add conditional text support + GIT by NielsRogge in 23362

TensorFlow implementations

* Add TensorFlow implementation of EfficientFormer by D-Roberts in 22620

Accelerate Migration

A major rework of the internals of the `Trainer` is underway, leveraging `accelerate` instead of redefining the same logic in `transformers`. This should unify both frameworks and lead to increased interoperability and more efficient development.

* Smangrul/accelerate mp integrate by pacman100 in 23148
* Smangrul/accelerate ddp integrate by pacman100 in 23151
* fix trainer slow tests related to hyperparam search by pacman100 in 24011
* remove the extra `accelerator.prepare` by pacman100 in 23914
* move fsdp handling to accelerate by pacman100 in 23158
* shift torch dynamo handling to accelerate by pacman100 in 23168
* accelerate deepspeed and gradient accumulation integrate by pacman100 in 23236
* fix executable batch size issue by pacman100 in 24067
* fix accelerator prepare during eval only mode by pacman100 in 24014
* reset accelerate env variables after each test by pacman100 in 24107
* Fix translation no_trainer by muellerzr in 23407
* Update error message when Accelerate isn't installed by muellerzr in 23373
* Fix parallel mode check by muellerzr in 23409
* Muellerzr fix deepspeed by muellerzr in 23657
* Update all no_trainer with skip_first_batches by muellerzr in 23664
* Fix sagemaker DP/MP by muellerzr in 23681
* Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value seperately. by muellerzr in 23800
* Up pinned accelerate version by muellerzr in 24089
* Move import check to before state reset by muellerzr in 23906
* Upgrade safetensors version by muellerzr in 23911
* Act on deprecations in Accelerate no_trainer examples by muellerzr in 24053
* Oops, missed one by muellerzr in 24054

Bugfixes and improvements

* chore: allow protobuf 3.20.3 requirement by jose-turintech in 22759
* Fix link displayed for custom tools by sgugger in 23274
* Remove missplaced test file by sgugger in 23275
* Bring back the PR `Refactor doctests + add CI` to `main` by ydshieh in 23271
* [`gpt`] Gpt2 fix half precision causal mask by younesbelkada in 23256
* Temporary tolerance fix for flaky whipser PT-TF equiv. test by amyeroberts in 23257
* Add `top_k` argument to post-process of conditional/deformable-DETR by CreatlV in 22787
* `transformers-cli` -> `huggingface-cli` by AlpinDale in 23276
* Temporarily increase tol for PT-FLAX whisper tests by amyeroberts in 23288
* Added missing " in CHAT_PROMPT_TEMPLATE by galatolofederico in 23287
* Update custom_tools.mdx: fix link by mishig25 in 23292
* Update transformers_agents.mdx by mishig25 in 23289
* Convert numpy arrays to lists before saving the evaluation metrics as json by harisankar95 in 23268
* Fix doctest files fetch issue by ydshieh in 23277
* skip `test_run_squad_no_trainer` for now by ydshieh in 23302
* Better check for packages availability by apbard in 23163
* Add gradient_checkpointing parameter to FlaxWhisperEncoder by raghavanone in 23300
* Agents extras by LysandreJik in 23301
* Fix broken links in the agent docs by sgugger in 23297
* Fix typo in gradio-tools docs by freddyaboulton in 23305
* Fix image segmentation tool test by sgugger in 23306
* unpin tf prob by ydshieh in 23293
* Revert "search buffers for dtype" by sgugger in 23308
* Remove `LanguageIdentificationTool` in `__init__.py` as we don't have it yet by ydshieh in 23326
* Fix docker image (caused by `tensorflow_text`) by ydshieh in 23321
* Compute the mask in-place, with less memory reads, and on CUDA on `XLNetLMHeadModel` by lezcano in 23332
* Only add files with modification outside doc blocks by ydshieh in 23327
* [docs] Fix Agents and Tools docstring by stevhliu in 23313
* OR am I crazy? by hwuebben in 23295
* Handle padding warning in generation when using `inputs_embeds` by zrthxn in 23131
* replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by susnato in 23273
* Use cu118 with cudnn >= 8.6 in docker file by ydshieh in 23339
* Removing one of the twice defined position_embeddings in LongFormer by GregorySenay in 23343
* Fix issue introduced in PR 23163 by ydshieh in 23363
* Typo suggestion by richardachen in 23360
* Fix some `is_xxx_available` by ydshieh in 23365
* Fix `BigBirdForMaskedLM` doctest by ydshieh in 23369
* Fix `OwlViTForObjectDetection.image_guided_detection` doc example by ydshieh in 23370
* Revert "Only add files with modification outside doc blocks" by ydshieh in 23371
* [Bugfix] `OPTDecoderLayer` does not return attentions when `gradient_checkpointing` and `training` is enabled. by gmlwns2000 in 23367
* Skip failing `AlignModelTest::test_multi_gpu_data_parallel_forward` by ydshieh in 23374
* Fix test typos - audio feature extractors by LWprogramming in 23310
* Added type hints for `Graphormer` pytorch version by dewasahu2003 in 23073
* Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by gojiteji in 23356
* Use `mkstemp` to replace deprecated `mktemp` by ready-research in 23372
* Fix `RwkvModel` by ydshieh in 23392
* Update `test_batched_inference_image_captioning_conditioned` by ydshieh in 23391
* OPT/BioGPT: Improved attention mask shape exception by gante in 23270
* Fix chat prompt in HFAgent by IvanSedykh in 23335
* 🌐 [i18n-KO] Translated `asr.mdx` to Korean by sim-so in 23106
* Minor fixes in transformers-tools by Wauplin in 23364
* [`Pix2Struct`] Add conditional generation on docstring example by younesbelkada in 23399
* Generate: faster `can_generate` check on TF and Flax by gante in 23398
* [AutoModel] fix `torch_dtype=auto` in `from_pretrained` by stas00 in 23379
* Docs: add link to assisted generation blog post by gante in 23397
* Build with non Python files by sgugger in 23405
* Generate: add test to check KV format by gante in 23403
* Replace appends with list comprehension. by ttsugriy in 23359
* Fix smdistributed check by sgugger in 23414
* Why crash the whole run when HFHub gives a 50x error? by ropoctl in 23320
* Run doctest (in PRs) only when some doc example(s) are modified by ydshieh in 23387
* Update `ConvNextV2ModelIntegrationTest::test_inference_image_classification_head` by ydshieh in 23402
* Fix a typo in HfAgent docstring. by ttsugriy in 23420
* Use dict.items to avoid unnecessary lookups. by ttsugriy in 23415
* Update 3 docker files to use cu118 by ydshieh in 23406
* [`SAM`] fix sam slow test by younesbelkada in 23376
* Return early once stop token is found. by ttsugriy in 23421
* [Reland] search model buffers for dtype as the last resort by cyyever in 23319
* Add Missing tokenization test [electra] by IMvision12 in 22997
* Small fixes and link in the README by LysandreJik in 23428
* TF: embeddings out of bounds check factored into function by gante in 23427
* Update Bigbird Pegasus tests by ydshieh in 23431
* Encoder-Decoder: add informative exception when the decoder is not compatible by gante in 23426
* Remove hardcoded prints in Trainer by hugoabonizio in 23432
* Fix device issue in `SwiftFormerModelIntegrationTest::test_inference_image_classification_head` by ydshieh in 23435
* Generate: skip left-padding tests on old models by gante in 23437
* remove unnecessary print in gpt neox sequence classifier by cfhammill in 23433
* 🌐 [i18n-KO] Translated `tasks/zero_shot_object_detection.mdx` to Korean by HanNayeoniee in 23430
* Fix (skip) a pipeline test for `RwkvModel` by ydshieh in 23444
* Fix DecisionTransformerConfig doctring by joaoareis in 23450
* TF: GPT2 with native embedding layers by gante in 23436
* Make `RwkvModel` accept `attention_mask` but discard it internally by ydshieh in 23442
* Less flaky `test_assisted_decoding_matches_greedy_search` by ydshieh in 23451
* Update tiny models and pipeline tests by ydshieh in 23446
* Properly guard PyTorch stuff by sgugger in 23452
* Add an option to log result from the Agent by sgugger in 23454
* Clean up CUDA kernels by sgugger in 23455
* fix bug in group_texts function, that was inserting short batches by BodaSadalla98 in 23429
* feat: Whisper prompting by connor-henderson in 22496
* README: Fix affiliation for MEGA by julien-c in 23394
* Remove .data usages in optimizations.py by alanwaketan in 23417
* TF port of the Segment Anything Model (SAM) by Rocketknight1 in 22970
* [`RWKV`] Rwkv fix for 8bit inference by younesbelkada in 23468
* Use config to set name and description if not present by sgugger in 23473
* Fix `transformers`' DeepSpeed CI job by ydshieh in 23463
* Fix PretrainedConfig `min_length` docstring by joaoareis in 23471
* Fix: Change tensors to integers for torch.dynamo and torch.compile compatibility by loevlie in 23475
* [`Blip`] Remove redundant shift right by younesbelkada in 23153
* Fix DeepSpeed stuff in the nightly CI by ydshieh in 23478
* Fix confusing `transformers` installation in CI by ydshieh in 23465
* Fix `tests/repo_utils/test_get_test_info.py` by ydshieh in 23485
* Debug example code for MegaForCausalLM by Tylersuard in 23382
* Remove erroneous `img` closing tag by xenova in 23646
* Fix tensor device while attention_mask is not None by zspo in 23538
* Fix accelerate logger bug by younesbelkada in 23650
* Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory by TimDettmers in 23535
* Fix wav2vec2 is_batched check to include 2-D numpy arrays by LWprogramming in 23223
* changing the requirements to a cpu torch version that works by sshahrokhi in 23483
* Fix SAM tests and use smaller checkpoints by Rocketknight1 in 23656
* Update workflow files by ydshieh in 23658
* small fix to remove unused eos in processor when it's not used. by Narsil in 23408
* Fix typo in a parameter name for open llama model by aaalexlit in 23637
* Fix PyTorch SAM tests by ydshieh in 23682
* 🌐 [i18n-KO] Translated `tasks/monocular_depth_estimation.mdx` to Korean by HanNayeoniee in 23621
* Fix a `BridgeTower` test by ydshieh in 23694
* [`SAM`] Fixes pipeline and adds a dummy pipeline test by younesbelkada in 23684
* TF version compatibility fixes by Rocketknight1 in 23663
* [`Blip`] Fix blip doctest by younesbelkada in 23698
* is_batched fix for remaining 2-D numpy arrays by LWprogramming in 23309
* Skip `TFCvtModelTest::test_keras_fit_mixed_precision` for now by ydshieh in 23699
* fix: load_best_model_at_end error when load_in_8bit is True by dkqkxx in 23443
* Fix some docs what layerdrop does by zspo in 23691
* add GPTJ/bloom/llama/opt into model list and enhance the jit support by sywangyi in 23291

* Paged Optimizer + Lion Optimizer for Trainer by TimDettmers in 23217
* Export to ONNX doc refocused on using optimum, added tflite by MKhalusova in 23434
* fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT by uchuhimo in 23683
* fix gptj could not jit.trace in GPU by sywangyi in 23317
* Better TF docstring types by Rocketknight1 in 23477
* Minor awesome-transformers.md fixes by pagarsky in 23453
* TF SAM memory reduction by Rocketknight1 in 23732
* fix: delete duplicate sentences in `document_question_answering.mdx` by jungnerd in 23735
* fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation by connor-henderson in 23724
* Overhaul TF serving signatures + dummy inputs by Rocketknight1 in 23234
* [Whisper] Reduce batch size in tests by sanchit-gandhi in 23736
* Fix the regex in `get_imports` to support multiline try blocks and excepts with specific exception types by dakinggg in 23725
* Remove the last few TF serving sigs by Rocketknight1 in 23738
* Fix `pip install --upgrade accelerate` command in modeling_utils.py by tloen in 23747
* Fix psuh_to_hub in Trainer when nothing needs pushing by sgugger in 23751
* Revamp test selection for the example tests by sgugger in 23737
* [LongFormer] code nits, removed unused parameters by ArthurZucker in 23749
* Fix is_ninja_available() by niltok in 23752
* [`Nllb-Moe`] Fix nllb moe accelerate issue by younesbelkada in 23758
* [OPT] Doc nit, using fast is fine by ArthurZucker in 23789
* Fix RWKV backward on GPU by sgugger in 23774
* Update trainer.mdx class_weights example by amitportnoy in 23787
* no_cuda does not take effect in non distributed environment by sywangyi in 23795
* Fix no such file or directory error by RissyRan in 23783
* Enable code-specific revision for code on the Hub by sgugger in 23799
* add type hint in pipeline model argument by y3sar in 23740
* TF SAM shape flexibility fixes by Rocketknight1 in 23842
* fix Whisper tests on GPU by hollance in 23753
* 🌐 [i18n-KO] Translated `fast_tokenizers.mdx` to Korean by KIHOON71 in 22956
* [i18n-KO] Translated video_classification.mdx to Korean by KIHOON71 in 23026
* 🌐 [i18n-KO] Translated `troubleshooting.mdx` to Korean by 0525hhgus in 23166
* Adds a FlyteCallback by peridotml in 23759
* Update collating_graphormer.py by clefourrier in 23862
* [LlamaTokenizerFast] nit update `post_processor` on the fly by ArthurZucker in 23855
* 23388 Issue: Update RoBERTa configuration by vijethmoudgalya in 23863
* [from_pretrained] imporve the error message when `_no_split_modules` is not defined by ArthurZucker in 23861
* Editing issue with pickle def with lambda function by Natyren in 23869
* Adds AutoProcessor.from_pretrained support for MCTCTProcessor by Ubadub in 23856
* 🌐 [i18n-KO] Translated `pad_truncation.mdx` to Korean by sim-so in 23823
* Fix bug leading to missing token in GPTSanJapaneseTokenizer by passaglia in 23883
* Fix last instances of kbit -> quantized by sgugger in 23797
* fix(configuration_llama): add `keys_to_ignore_at_inference` to `LlamaConfig` by calico-1226 in 23891
* Fix Trainer when model is loaded on a different GPU by sgugger in 23792
* Support shared tensors by thomasw21 in 23871
* ensure banned_mask and indices in same device by cauyxy in 23901
* Unpin numba by sanchit-gandhi in 23162
* [`bnb`] add warning when no linear by younesbelkada in 23894
* fix: Replace `add_prefix_space` in `get_prompt_ids` with manual space for FastTokenizer compatibility by connor-henderson in 23796
* [`RWKV`] Fix RWKV 4bit by younesbelkada in 23910
* add conditional statement for auxiliary loss calculation by harisankar95 in 23899
* Raise error if loss can't be calculated - ViT MIM by amyeroberts in 23872
* Empty circleci config by sgugger in 23913
* Bug fix - flip_channel_order for channels first images by amyeroberts in 23701
* Re-enable squad test by sgugger in 23912
* Update the update metadata job to use upload_folder by sgugger in 23917
* [PushToHub] Make it possible to upload folders by NielsRogge in 23920
* Skip device placement for past key values in decoder models by sgugger in 23919
* [Flax Whisper] Update decode docstring by sanchit-gandhi in 23908
* Effectively allow `encoder_outputs` input to be a tuple in pix2struct by fxmarty in 23932
* Fix doc string nits by sheonhan in 23929
* Pin rhoknp by sgugger in 23937
* rename DocumentQuestionAnsweringTool parameter input to match docstring by Adam-D-Lewis in 23939
* Update stale.yml to use HuggingFaceBot by LysandreJik in 23941
* Make TF ESM inv_freq non-trainable like PyTorch by Rocketknight1 in 23940
* Revert "Update stale.yml to use HuggingFaceBot" by LysandreJik in 23943
* 23675 Registering Malay language by soongbren in 23689
* Modify device_map behavior when loading a model using from_pretrained by SunMarc in 23922
* use _make_causal_mask in clip/vit models by kashif in 23942
* Fix `ReduceLROnPlateau` object has no attribute 'get_last_lr' by wasupandceacar in 23944
* [MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 by patrickvonplaten in 23813
* add new mms functions to doc by patrickvonplaten in 23954
* 🌐 [i18n-KO] Translated object_detection.mdx to Korean by KIHOON71 in 23164
* Trainer: fixed evaluate raising `KeyError` for ReduceLROnPlateau by claudius-kienle in 23952
* [Whisper Tokenizer] Skip special tokens when decoding with timestamps by sanchit-gandhi in 23945
* Add an option to reduce compile() console spam by Rocketknight1 in 23938
* Added time-series blogs to the models by elisim in 23857
* Fix typo in doc comment of BitsAndBytesConfig by ledyba in 23978
* Skip `test_multi_gpu_data_parallel_forward` for `MobileViTV2ModelTest` by ydshieh in 24017
* Update README.md by ydshieh in 24022
* Auto tokenizer registration by Bearnardd in 23965
* expose safe_serialization argument in the pipeline API by yessenzhar in 23775
* Pix2Struct: fix wrong broadcast axis of attention mask in visual encoder by affjljoo3581 in 23976
* TensorBoard callback no longer adds hparams by bri25yu in 23999
* 🌐 [i18n-KO] Translated `tasks_explained.mdx` to Korean by 0525hhgus in 23844
* Fix `MobileViTV2` checkpoint name by ydshieh in 24018
* Pin `deepspeed` to `0.9.2` for now by ydshieh in 24024
* 🌐 [i18n-KO] Translated `language-modeling.mdx` by wonhyeongseo in 23969
* 🌐 [i18n-KO] Translated `bertology.mdx` to Korean by wonhyeongseo in 23968
* Add check for tied parameters by SunMarc in 24029
* Fixing single candidate_label return. by Narsil in 24023
* Use TruncatedNormal from Keras initializers by hvaara in 24036
* Prevent ZeroDivisionError on `trainer.evaluate` if model and dataset are tiny by tomaarsen in 24049
* Modification of one text example file should trigger said test by sgugger in 24051
* Tiny fix for `check_self_hosted_runner.py` by ydshieh in 24052
* Reduce memory usage in TF building by Rocketknight1 in 24046
* Move TF building to an actual build() method by Rocketknight1 in 23760
* Use new parametrization based weight norm if available by ezyang in 24030
* bring back `filtered_test_list_cross_tests.txt` by ydshieh in 24055
* Fix device placement for model-parallelism in generate for encoder/de… by sgugger in 24025
* Remote code improvements by sgugger in 23959
* Generate: increase left-padding test atol by gante in 23448
* [Wav2Vec2] Fix torch srcipt by patrickvonplaten in 24062
* Add support for non-rust implemented tokenization for `__getitem__` method. by jacklanda in 24039
* Support PEFT models when saving the model using trainer by younesbelkada in 24073
* [`Hub`] Add `safe_serialization` in push_to_hub by younesbelkada in 24074
* Fix `is_optimum_neuron_available` by michaelbenayoun in 23961
* [`bnb`] Fix bnb skip modules by younesbelkada in 24043
* Be nice to TF by ydshieh in 24076
* Make the TF dummies even smaller by Rocketknight1 in 24071
* [doc build] Use secrets by mishig25 in 24079
* Fix expected value in tests of the test fetcher by sgugger in 24077
* Update delete_doc_comment_trigger.yml by mishig25 in 24084
* Do not prepare lr scheduler as it as the right number of steps by sgugger in 24088
* Fix a tiny typo in `WhisperForConditionalGeneration::generate` docstring by sadra-barikbin in 24045
* [`Trainer`] Correct behavior of `_load_best_model` for PEFT models by younesbelkada in 24103

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* shehanmunasinghe
* Add swiftformer (22686)
* Add MobileViTv2 (22820)
* TimDettmers
* Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory (23535)
* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (23479)
* Paged Optimizer + Lion Optimizer for Trainer (23217)
* elisim
* [Time-Series] Autoformer model (21891)
* Added time-series blogs to the models (23857)
* KIHOON71
* 🌐 [i18n-KO] Translated `fast_tokenizers.mdx` to Korean (22956)
* [i18n-KO] Translated video_classification.mdx to Korean (23026)
* 🌐 [i18n-KO] Translated object_detection.mdx to Korean (23164)
* D-Roberts
* Add TensorFlow implementation of EfficientFormer (22620)
* soongbren
* 23675 Registering Malay language (23689)

4.29.2

Fixes the package so non-Python files (like CUDA kernels) are properly included.
