Transformers


4.41.0

v4.41.0 introduces a significant refactor of the Agents framework.

With this release, we allow you to build state-of-the-art agent systems, including the ReAct Code Agent that writes its actions as code in ReAct iterations, following the insights from [Wang et al., 2024](https://huggingface.co/papers/2402.01030).

Just install with `pip install "transformers[agents]"`. Then you're good to go!

```py
from transformers import ReactCodeAgent

agent = ReactCodeAgent(tools=[])

code = """
list=[0, 1, 2]

for i in range(4):
    print(list(i))
"""

corrected_code = agent.run(
    "I have some code that creates a bug: please debug it and return the final code",
    code=code,
)
```


Quantization

New quant methods

This release adds support for two new quantization methods contributed by the community: HQQ and EETQ. Read more about how to quantize any Transformers model using HQQ and EETQ in the [dedicated documentation section](https://huggingface.co/docs/transformers/quantization); a minimal usage sketch follows the PR links below.

* Add HQQ quantization support by mobicham in https://github.com/huggingface/transformers/pull/29637
* [FEAT]: EETQ quantizer support by dtlzhuangz in https://github.com/huggingface/transformers/pull/30262
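
The sketch below shows how a model might be loaded with HQQ quantization. `HqqConfig` ships with this release, while the checkpoint and the `nbits`/`group_size` settings are illustrative assumptions rather than prescribed values.

```python
from transformers import AutoModelForCausalLM, HqqConfig

# Illustrative HQQ settings; assumes a CUDA device is available
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # illustrative checkpoint
    device_map="cuda",
    quantization_config=quant_config,
)
```

EETQ follows the same pattern through its own config class (`EetqConfig`).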

`dequantize` API for bitsandbytes models

Models that have been loaded with bitsandbytes can now be dequantized through the `dequantize` API (e.g. to merge adapter weights).

* FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models by younesbelkada in https://github.com/huggingface/transformers/pull/30806

API-wise, you can achieve that with the following:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

model_id = "facebook/opt-125m"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.dequantize()

text = tokenizer("Hello my name is", return_tensors="pt").to(0)

out = model.generate(**text)
print(tokenizer.decode(out[0]))
```


Generation updates

* Add Watermarking LogitsProcessor and WatermarkDetector by zucchini-nlp in https://github.com/huggingface/transformers/pull/29676
* Cache: Static cache as a standalone object by gante in https://github.com/huggingface/transformers/pull/30476
* Generate: add `min_p` sampling by gante in https://github.com/huggingface/transformers/pull/30639
* Make `Gemma` work with `torch.compile` by ydshieh in https://github.com/huggingface/transformers/pull/30775
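
As a quick illustration of the new `min_p` sampling option listed above, here is a minimal sketch; the checkpoint and the threshold value are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello my name is", return_tensors="pt")
# min_p keeps only tokens whose probability is at least min_p times that of the most likely token
out = model.generate(**inputs, do_sample=True, min_p=0.08, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```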

SDPA support

* [`BERT`] Add support for sdpa by hackyon in https://github.com/huggingface/transformers/pull/28802
* Add sdpa and fa2 the Wav2vec2 family. by kamilakesbi in https://github.com/huggingface/transformers/pull/30121
* add sdpa to ViT [follow up of 29325] by hyenal in https://github.com/huggingface/transformers/pull/30555
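
Requesting SDPA attention for one of the newly supported architectures looks roughly like the sketch below; the checkpoint is illustrative and `attn_implementation="sdpa"` assumes a recent PyTorch.

```python
from transformers import AutoModel

# BERT gains SDPA support in this release; load it with the SDPA attention backend
model = AutoModel.from_pretrained(
    "bert-base-uncased",  # illustrative checkpoint
    attn_implementation="sdpa",
)
```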

Improved Object Detection

Addition of a fine-tuning script for object detection models

* Fix YOLOS image processor resizing by qubvel in https://github.com/huggingface/transformers/pull/30436
* Add examples for detection models finetuning by qubvel in https://github.com/huggingface/transformers/pull/30422
* Add installation of examples requirements in CI by qubvel in https://github.com/huggingface/transformers/pull/30708
* Update object detection guide by qubvel in https://github.com/huggingface/transformers/pull/30683

Interpolation of embeddings for vision models

This release adds interpolation of position embeddings, which enables predictions from pretrained models on input images of sizes different from those the model was originally trained on. Simply pass `interpolate_pos_encoding=True` when calling the model.

Added for: BLIP, BLIP 2, InstructBLIP, SigLIP, ViViT

```py
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

image = Image.open(requests.get("https://huggingface.co/hf-internal-testing/blip-test-image/resolve/main/demo.jpg", stream=True).raw)
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    torch_dtype=torch.float16
).to("cuda")
inputs = processor(images=image, size={"height": 500, "width": 500}, return_tensors="pt").to("cuda", torch.float16)

predictions = model.generate(**inputs, interpolate_pos_encoding=True)
# Generated text: "a woman and dog on the beach"
generated_text = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
```


* Blip dynamic input resolution by zafstojano in https://github.com/huggingface/transformers/pull/30722
* Add dynamic resolution input/interpolate position embedding to SigLIP by davidgxue in https://github.com/huggingface/transformers/pull/30719
* Enable dynamic resolution for vivit by jla524 in https://github.com/huggingface/transformers/pull/30630


🚨 might be breaking
* 🚨🚨🚨Deprecate `evaluation_strategy` to `eval_strategy`🚨🚨🚨 by muellerzr in https://github.com/huggingface/transformers/pull/30190
* 🚨 Add training compatibility for Musicgen-like models by ylacombe in https://github.com/huggingface/transformers/pull/29802
* 🚨 Update image_processing_vitmatte.py by rb-synth in https://github.com/huggingface/transformers/pull/30566
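
For the `evaluation_strategy` → `eval_strategy` rename above, a minimal migration sketch (the argument values are illustrative):

```python
from transformers import TrainingArguments

# Before v4.41 (now deprecated): TrainingArguments(..., evaluation_strategy="epoch")
# From v4.41 on, the argument is named eval_strategy
args = TrainingArguments(output_dir="out", eval_strategy="epoch")
```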

Cleanups
* Remove task guides auto-update in favor of links towards task pages by LysandreJik in https://github.com/huggingface/transformers/pull/30429
* Remove add-new-model in favor of add-new-model-like by LysandreJik in https://github.com/huggingface/transformers/pull/30424
* Remove mentions of models in the READMEs and link to the documentation page in which they are featured. by LysandreJik in https://github.com/huggingface/transformers/pull/30420

Not breaking but important for Llama tokenizers
* [`LlamaTokenizerFast`] Refactor default llama by ArthurZucker in https://github.com/huggingface/transformers/pull/28881


Fixes

* Fix missing `prev_ci_results` by ydshieh in https://github.com/huggingface/transformers/pull/30313
* Fix: remove `pad token id` in pipeline forward arguments by zucchini-nlp in https://github.com/huggingface/transformers/pull/30285
* fix Parameter dtype in audio models by ylacombe in https://github.com/huggingface/transformers/pull/30310
* disable use_cache if using gradient checkpointing by chenzizhao in https://github.com/huggingface/transformers/pull/30320
* Fix test transposing image with EXIF Orientation tag by albertvillanova in https://github.com/huggingface/transformers/pull/30319
* Avoid `jnp` import in `utils/generic.py` by ydshieh in https://github.com/huggingface/transformers/pull/30322
* Fix `AssertionError` in clip conversion script by ydshieh in https://github.com/huggingface/transformers/pull/30321
* [UDOP] Add special tokens to tokenizer by NielsRogge in https://github.com/huggingface/transformers/pull/29594
* Enable multi-device for some models by jla524 in https://github.com/huggingface/transformers/pull/30207
* feat: Upgrade Weights & Biases callback by parambharat in https://github.com/huggingface/transformers/pull/30135
* [Feature Extractors] Fix kwargs to pre-trained by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30260
* Pipeline: fix `pad_token_id` again by zucchini-nlp in https://github.com/huggingface/transformers/pull/30338
* [Whisper] Fix slow tests by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30152
* parallel job limit for doctest by ydshieh in https://github.com/huggingface/transformers/pull/30342
* Transformers Metadata by LysandreJik in https://github.com/huggingface/transformers/pull/30344
* Deprecate default chat templates by Rocketknight1 in https://github.com/huggingface/transformers/pull/30346
* Restore casting of masked_spec_embed by ylacombe in https://github.com/huggingface/transformers/pull/30336
* Update unwrap from accelerate by SunMarc in https://github.com/huggingface/transformers/pull/29933
* Do not remove half seq length in generation tests by zucchini-nlp in https://github.com/huggingface/transformers/pull/30016
* Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained by hiyouga in https://github.com/huggingface/transformers/pull/30299
* Add TF swiftformer by joaocmd in https://github.com/huggingface/transformers/pull/23342
* [Grounding DINO] Add resources by NielsRogge in https://github.com/huggingface/transformers/pull/30232
* Nits for model docs by merveenoyan in https://github.com/huggingface/transformers/pull/29795
* Enable multi-device for more models by jla524 in https://github.com/huggingface/transformers/pull/30379
* GenerationConfig: warn if pad token is negative by zucchini-nlp in https://github.com/huggingface/transformers/pull/30187
* Add FSDP config for CPU RAM efficient loading through accelerate by helloworld1 in https://github.com/huggingface/transformers/pull/30002
* `Llama` family, fix `use_cache=False` generation by ArthurZucker in https://github.com/huggingface/transformers/pull/30380
* Update docstrings for text generation pipeline by Rocketknight1 in https://github.com/huggingface/transformers/pull/30343
* Terminator strings for generate() by Rocketknight1 in https://github.com/huggingface/transformers/pull/28932
* Fix layerwise GaLore optimizer hard to converge with warmup scheduler by hiyouga in https://github.com/huggingface/transformers/pull/30372
* Jamba: fix left-padding test by gante in https://github.com/huggingface/transformers/pull/30389
* Fix DETA save_pretrained by qubvel in https://github.com/huggingface/transformers/pull/30326
* FIX / PEFT: Pass device correctly to peft by younesbelkada in https://github.com/huggingface/transformers/pull/30397
* [docs] LLM inference by stevhliu in https://github.com/huggingface/transformers/pull/29791
* show `-rs` to show skip reasons by ArthurZucker in https://github.com/huggingface/transformers/pull/30318
* Add inputs embeds in generation by zucchini-nlp in https://github.com/huggingface/transformers/pull/30269
* [Grounding DINO] Add support for cross-attention in GroundingDinoMultiHeadAttention by EduardoPach in https://github.com/huggingface/transformers/pull/30364
* remove redundant logging from longformer by riklopfer in https://github.com/huggingface/transformers/pull/30365
* fix: link to HF repo/tree/revision when a file is missing by mapmeld in https://github.com/huggingface/transformers/pull/30406
* [tests] add `require_torch_sdpa` for test that needs sdpa support by faaany in https://github.com/huggingface/transformers/pull/30408
* Jax: scipy version pin by gante in https://github.com/huggingface/transformers/pull/30402
* Fix on "cache position" for assisted generation by zucchini-nlp in https://github.com/huggingface/transformers/pull/30068
* fix for itemsize => element_size() for torch backwards compat by winglian in https://github.com/huggingface/transformers/pull/30133
* Make EosTokenCriteria compatible with mps by pcuenca in https://github.com/huggingface/transformers/pull/30376
* FIX: re-add bnb on docker image by younesbelkada in https://github.com/huggingface/transformers/pull/30427
* Fix LayoutLMv2 init issue and doctest by ydshieh in https://github.com/huggingface/transformers/pull/30278
* Remove old TF port docs by Rocketknight1 in https://github.com/huggingface/transformers/pull/30426
* Rename torch.run to torchrun by steven-basart in https://github.com/huggingface/transformers/pull/30405
* Fix use_cache for xla fsdp by alanwaketan in https://github.com/huggingface/transformers/pull/30353
* [`LlamaTokenizerFast`] Refactor default llama by ArthurZucker in https://github.com/huggingface/transformers/pull/28881
* New model PR needs green (slow tests) CI by ydshieh in https://github.com/huggingface/transformers/pull/30341
* Add llama3 by ArthurZucker in https://github.com/huggingface/transformers/pull/30334
* [`Llava`] + CIs fix red cis and llava integration tests by ArthurZucker in https://github.com/huggingface/transformers/pull/30440
* [tests] make test device-agnostic by faaany in https://github.com/huggingface/transformers/pull/30444
* fix uncaught init of linear layer in clip's/siglip's for image classification models by vasqu in https://github.com/huggingface/transformers/pull/30435
* fix jamba slow foward for multi-gpu by SunMarc in https://github.com/huggingface/transformers/pull/30418
* [SegGPT] Fix loss calculation by EduardoPach in https://github.com/huggingface/transformers/pull/30421
* Add `paths` filter to avoid the chance of being triggered by ydshieh in https://github.com/huggingface/transformers/pull/30453
* Fix wrong indent in `utils/check_if_new_model_added.py` by ydshieh in https://github.com/huggingface/transformers/pull/30456
* [`research_project`] Most of the security issues come from this requirement.txt by ArthurZucker in https://github.com/huggingface/transformers/pull/29977
* Neuron: When save_safetensor=False, no need to move model to CPU by jeffhataws in https://github.com/huggingface/transformers/pull/29703
* Enable fp16 on CPU by muellerzr in https://github.com/huggingface/transformers/pull/30459
* Non blocking support to torch DL's by muellerzr in https://github.com/huggingface/transformers/pull/30465
* consistent job / pytest report / artifact name correspondence by ydshieh in https://github.com/huggingface/transformers/pull/30392
* Workflow / ENH: Add SSH into our runners workflow by younesbelkada in https://github.com/huggingface/transformers/pull/30425
* FIX / Workflow: Change tailscale trigger condition by younesbelkada in https://github.com/huggingface/transformers/pull/30471
* FIX / Workflow: Fix SSH workflow bug by younesbelkada in https://github.com/huggingface/transformers/pull/30474
* [fix codellama conversion] by ArthurZucker in https://github.com/huggingface/transformers/pull/30472
* Script for finding candidate models for deprecation by amyeroberts in https://github.com/huggingface/transformers/pull/29686
* Fix SigLip classification doctest by amyeroberts in https://github.com/huggingface/transformers/pull/30475
* Don't run fp16 MusicGen tests on CPU by amyeroberts in https://github.com/huggingface/transformers/pull/30466
* Prevent crash with `WandbCallback` with third parties by tomaarsen in https://github.com/huggingface/transformers/pull/30477
* Add WSD scheduler by visheratin in https://github.com/huggingface/transformers/pull/30231
* Fix Issue 29817 Video Classification Task Guide Using Undeclared Variables by manju-rangam in https://github.com/huggingface/transformers/pull/30457
* Make accelerate install non-torch dependent by muellerzr in https://github.com/huggingface/transformers/pull/30463
* Introduce Stateful Callbacks by muellerzr in https://github.com/huggingface/transformers/pull/29666
* Fix Llava for 0-embeddings by zucchini-nlp in https://github.com/huggingface/transformers/pull/30473
* Do not use deprecated `SourceFileLoader.load_module()` in dynamic module loading by XuehaiPan in https://github.com/huggingface/transformers/pull/30370
* Add sidebar tutorial for chat models by Rocketknight1 in https://github.com/huggingface/transformers/pull/30401
* Quantization: `HfQuantizer` quant method update by younesbelkada in https://github.com/huggingface/transformers/pull/30484
* [docs] Spanish translation of pipeline_tutorial.md by aaronjimv in https://github.com/huggingface/transformers/pull/30252
* FEAT: PEFT support for EETQ by younesbelkada in https://github.com/huggingface/transformers/pull/30449
* Fix the `bitsandbytes` error formatting ("Some modules are dispatched on ...") by kyo-takano in https://github.com/huggingface/transformers/pull/30494
* Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types by mgoin in https://github.com/huggingface/transformers/pull/30488
* Use the Keras set_random_seed in tests by Rocketknight1 in https://github.com/huggingface/transformers/pull/30504
* Remove skipping logic now that set_epoch exists by muellerzr in https://github.com/huggingface/transformers/pull/30501
* [`DETR`] Remove timm hardcoded logic in modeling files by amyeroberts in https://github.com/huggingface/transformers/pull/29038
* [examples] update whisper fine-tuning by sanchit-gandhi in https://github.com/huggingface/transformers/pull/29938
* Fix GroundingDINO, DPR after BERT SDPA update by amyeroberts in https://github.com/huggingface/transformers/pull/30506
* load_image - decode b64encode and encodebytes strings by amyeroberts in https://github.com/huggingface/transformers/pull/30192
* [SegGPT] Fix seggpt image processor by EduardoPach in https://github.com/huggingface/transformers/pull/29550
* Fix link in dbrx.md by eitanturok in https://github.com/huggingface/transformers/pull/30509
* Allow boolean FSDP options in fsdp_config by helloworld1 in https://github.com/huggingface/transformers/pull/30439
* Pass attn_implementation when using AutoXXX.from_config by amyeroberts in https://github.com/huggingface/transformers/pull/30507
* Fix broken link to Transformers notebooks by clinty in https://github.com/huggingface/transformers/pull/30512
* Update runner tag for PR slow CI by ydshieh in https://github.com/huggingface/transformers/pull/30535
* Fix repo. fetch/checkout in PR slow CI job by ydshieh in https://github.com/huggingface/transformers/pull/30537
* Reenable SDPA's FA2 During Training with torch.compile by warner-benjamin in https://github.com/huggingface/transformers/pull/30442
* Include safetensors as part of `_load_best_model` by muellerzr in https://github.com/huggingface/transformers/pull/30553
* Pass `use_cache` in kwargs for GPTNeoX by zucchini-nlp in https://github.com/huggingface/transformers/pull/30538
* Enable multi-device for more models by jla524 in https://github.com/huggingface/transformers/pull/30409
* Generate: update links on LLM tutorial doc by gante in https://github.com/huggingface/transformers/pull/30550
* DBRX: make fixup by gante in https://github.com/huggingface/transformers/pull/30578
* Fix seq2seq collator padding by vasqu in https://github.com/huggingface/transformers/pull/30556
* BlipModel: get_multimodal_features method by XavierSpycy in https://github.com/huggingface/transformers/pull/30438
* Add chat templating support for KeyDataset in text-generation pipeline by DarshanDeshpande in https://github.com/huggingface/transformers/pull/30558
* Fix generation doctests by zucchini-nlp in https://github.com/huggingface/transformers/pull/30263
* General PR slow CI by ydshieh in https://github.com/huggingface/transformers/pull/30540
* Remove `use_square_size` after loading by ydshieh in https://github.com/huggingface/transformers/pull/30567
* Use text config's vocab size in testing models by zucchini-nlp in https://github.com/huggingface/transformers/pull/30568
* Encoder-decoder models: move embedding scale to nn.Module by zucchini-nlp in https://github.com/huggingface/transformers/pull/30410
* Fix Marian model conversion by zucchini-nlp in https://github.com/huggingface/transformers/pull/30173
* Refactor default chat template warnings by Rocketknight1 in https://github.com/huggingface/transformers/pull/30551
* Fix QA example by Rocketknight1 in https://github.com/huggingface/transformers/pull/30580
* remove jax example by ArthurZucker in https://github.com/huggingface/transformers/pull/30498
* Fix canonical model --model_type in examples by amyeroberts in https://github.com/huggingface/transformers/pull/30480
* Gemma: update activation warning by pcuenca in https://github.com/huggingface/transformers/pull/29995
* Bump gitpython from 3.1.32 to 3.1.41 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30587
* Fix image segmentation example - don't reopen image by amyeroberts in https://github.com/huggingface/transformers/pull/30481
* Improve object detection task guideline by NielsRogge in https://github.com/huggingface/transformers/pull/29967
* Generate: remove deprecated public decoding functions and streamline logic 🧼 by gante in https://github.com/huggingface/transformers/pull/29956
* Fix llava half precision and autocast issues by frasermince in https://github.com/huggingface/transformers/pull/29721
* Fix: failing CI after 30568 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30599
* Fix for Neuron by michaelbenayoun in https://github.com/huggingface/transformers/pull/30259
* Fix memory leak with CTC training script on Chinese languages by lucky-bai in https://github.com/huggingface/transformers/pull/30358
* Fix copies for DBRX - neuron fix by amyeroberts in https://github.com/huggingface/transformers/pull/30610
* fix:missing `output_router_logits` in SwitchTransformers by lausannel in https://github.com/huggingface/transformers/pull/30573
* Use `contiguous()` in clip checkpoint conversion script by ydshieh in https://github.com/huggingface/transformers/pull/30613
* phi3 chat_template does not support system role by amitportnoy in https://github.com/huggingface/transformers/pull/30606
* Docs: fix `generate`-related rendering issues by gante in https://github.com/huggingface/transformers/pull/30600
* Docs: add missing `StoppingCriteria` autodocs by gante in https://github.com/huggingface/transformers/pull/30617
* Generate: fix `SinkCache` on Llama models by gante in https://github.com/huggingface/transformers/pull/30581
* Fix FX tracing issues for Llama by michaelbenayoun in https://github.com/huggingface/transformers/pull/30619
* Output `None` as attention when layer is skipped by jonghwanhyeon in https://github.com/huggingface/transformers/pull/30597
* Fix CI after 30410 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30612
* add mlp bias for llama models by mayank31398 in https://github.com/huggingface/transformers/pull/30031
* Fix W&B run name by qubvel in https://github.com/huggingface/transformers/pull/30462
* HQQ: PEFT support for HQQ by younesbelkada in https://github.com/huggingface/transformers/pull/30632
* Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters by yting27 in https://github.com/huggingface/transformers/pull/30362
* Avoid duplication in PR slow CI model list by ydshieh in https://github.com/huggingface/transformers/pull/30634
* [`CI update`] Try to use dockers and no cache by ArthurZucker in https://github.com/huggingface/transformers/pull/29202
* Check if the current compiled version of pytorch supports MPS by jiaqianjing in https://github.com/huggingface/transformers/pull/30664
* Hotfix-change-ci by ArthurZucker in https://github.com/huggingface/transformers/pull/30669
* Quantization / HQQ: Fix HQQ tests on our runner by younesbelkada in https://github.com/huggingface/transformers/pull/30668
* Fix llava next tie_word_embeddings config by SunMarc in https://github.com/huggingface/transformers/pull/30640
* Trainer._load_from_checkpoint - support loading multiple Peft adapters by claralp in https://github.com/huggingface/transformers/pull/30505
* Trainer - add cache clearing and the option for batched eval metrics computation by FoamoftheSea in https://github.com/huggingface/transformers/pull/28769
* Fix typo: llama3.md by mimbres in https://github.com/huggingface/transformers/pull/30653
* Respect `resume_download` deprecation by Wauplin in https://github.com/huggingface/transformers/pull/30620
* top-k instead of top-p in MixtralConfig docstring by sorgfresser in https://github.com/huggingface/transformers/pull/30687
* Bump jinja2 from 3.1.3 to 3.1.4 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30680
* Bump werkzeug from 3.0.1 to 3.0.3 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30679
* Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True by hackyon in https://github.com/huggingface/transformers/pull/29024
* Fix `cache_position` initialisation for generation with `use_cache=False` by nurlanov-zh in https://github.com/huggingface/transformers/pull/30485
* Word-level timestamps broken for short-form audio by kamilakesbi in https://github.com/huggingface/transformers/pull/30325
* Updated docs of `forward` in `Idefics2ForConditionalGeneration` with correct `ignore_index` value by zafstojano in https://github.com/huggingface/transformers/pull/30678
* Bump tqdm from 4.63.0 to 4.66.3 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30646
* Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/visual_bert by dependabot in https://github.com/huggingface/transformers/pull/30645
* Reboot Agents by aymeric-roucher in https://github.com/huggingface/transformers/pull/30387
* Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/lxmert by dependabot in https://github.com/huggingface/transformers/pull/30644
* Separate tokenizer tests by ArthurZucker in https://github.com/huggingface/transformers/pull/30675
* Update `workflow_id` in `utils/get_previous_daily_ci.py` by ydshieh in https://github.com/huggingface/transformers/pull/30695
* Rename artifact name `prev_ci_results` to `ci_results` by ydshieh in https://github.com/huggingface/transformers/pull/30697
* Add safetensors to model not found error msg for default use_safetensors value by davidgxue in https://github.com/huggingface/transformers/pull/30602
* Pin deepspeed by muellerzr in https://github.com/huggingface/transformers/pull/30701
* Patch CLIP image preprocessor by rootonchair in https://github.com/huggingface/transformers/pull/30698
* [BitsandBytes] Verify if GPU is available by NielsRogge in https://github.com/huggingface/transformers/pull/30533
* Llava: remove dummy labels by zucchini-nlp in https://github.com/huggingface/transformers/pull/30706
* Immutability for data collators by vasqu in https://github.com/huggingface/transformers/pull/30603
* Cache: models return input cache type by gante in https://github.com/huggingface/transformers/pull/30716
* Removal of deprecated maps by LysandreJik in https://github.com/huggingface/transformers/pull/30576
* Fix image post-processing for OWLv2 by jla524 in https://github.com/huggingface/transformers/pull/30686
* KV cache is no longer a model attribute by zucchini-nlp in https://github.com/huggingface/transformers/pull/30730
* Generate: consistently handle special tokens as tensors by gante in https://github.com/huggingface/transformers/pull/30624
* Update CodeLlama references by osanseviero in https://github.com/huggingface/transformers/pull/30218
* [docs] Update es/pipeline_tutorial.md by aaronjimv in https://github.com/huggingface/transformers/pull/30684
* Update llama3.md, fix typo by mimbres in https://github.com/huggingface/transformers/pull/30739
* mlp_only_layers is more flexible than decoder_sparse_step by eigen2017 in https://github.com/huggingface/transformers/pull/30552
* PEFT / Trainer: Make use of `model.active_adapters()` instead of deprecated `model.active_adapter` whenever possible by younesbelkada in https://github.com/huggingface/transformers/pull/30738
* [docs] Update link in es/pipeline_webserver.md by aaronjimv in https://github.com/huggingface/transformers/pull/30745
* hqq - fix weight check in check_quantized_param by mobicham in https://github.com/huggingface/transformers/pull/30748
* [awq] replace scale when we have GELU by SunMarc in https://github.com/huggingface/transformers/pull/30074
* Workflow: Replace `actions/post-slack` with centrally defined workflow by younesbelkada in https://github.com/huggingface/transformers/pull/30737
* [GroundingDino] Adding ms_deform_attn kernels by EduardoPach in https://github.com/huggingface/transformers/pull/30768
* Llama: fix custom 4D masks, v2 by poedator in https://github.com/huggingface/transformers/pull/30348
* Generation / FIX: Fix multi-device generation by younesbelkada in https://github.com/huggingface/transformers/pull/30746
* Qwen: incorrect setup flag by gante in https://github.com/huggingface/transformers/pull/30776
* enable Pipeline to get device from model by faaany in https://github.com/huggingface/transformers/pull/30534
* [Object detection pipeline] Lower threshold by NielsRogge in https://github.com/huggingface/transformers/pull/30710
* Generate: remove near-duplicate sample/greedy copy by gante in https://github.com/huggingface/transformers/pull/30773
* Port IDEFICS to tensorflow by a8nova in https://github.com/huggingface/transformers/pull/26870
* Generate: assistant should be greedy in assisted decoding by gante in https://github.com/huggingface/transformers/pull/30778
* Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) by ydshieh in https://github.com/huggingface/transformers/pull/30699
* Deprecate models script by amyeroberts in https://github.com/huggingface/transformers/pull/30184
* skip low_cpu_mem_usage tests by SunMarc in https://github.com/huggingface/transformers/pull/30782
* CI: update to ROCm 6.0.2 and test MI300 by fxmarty in https://github.com/huggingface/transformers/pull/30266
* Fix OWLv2 Doc by jla524 in https://github.com/huggingface/transformers/pull/30794
* Fix cache type in Idefics2 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30729
* PEFT: Access active_adapters as a property in Trainer by pashminacameron in https://github.com/huggingface/transformers/pull/30790
* CI: more models wo cache support by gante in https://github.com/huggingface/transformers/pull/30780
* Deprecate TF weight conversion since we have full Safetensors support now by Rocketknight1 in https://github.com/huggingface/transformers/pull/30786
* [T5] Adding `model_parallel = False` to `T5ForTokenClassification` and `MT5ForTokenClassification` by retarfi in https://github.com/huggingface/transformers/pull/30763
* Added the necessay import of module by ankur0904 in https://github.com/huggingface/transformers/pull/30804
* Add support for custom checkpoints in MusicGen by jla524 in https://github.com/huggingface/transformers/pull/30011
* Add missing dependencies in image classification example by jla524 in https://github.com/huggingface/transformers/pull/30820
* Support mixed-language batches in `WhisperGenerationMixin` by cifkao in https://github.com/huggingface/transformers/pull/29688
* Remove unused module DETR based models by conditionedstimulus in https://github.com/huggingface/transformers/pull/30823
* Jamba - Skip 4d custom attention mask test by amyeroberts in https://github.com/huggingface/transformers/pull/30826
* Missing `Optional` in typing. by xkszltl in https://github.com/huggingface/transformers/pull/30821
* Update ds_config_zero3.json by pacman100 in https://github.com/huggingface/transformers/pull/30829
* Better llava next. by nxphi47 in https://github.com/huggingface/transformers/pull/29850
* Deprecate models script - correctly set the model name for the doc file by amyeroberts in https://github.com/huggingface/transformers/pull/30785
* Use `torch 2.3` for CI by ydshieh in https://github.com/huggingface/transformers/pull/30837
* Fix llama model sdpa attention forward function masking bug when output_attentions=True by Aladoro in https://github.com/huggingface/transformers/pull/30652
* [LLaVa-NeXT] Small fixes by NielsRogge in https://github.com/huggingface/transformers/pull/30841
* [Idefics2] Improve docs, add resources by NielsRogge in https://github.com/huggingface/transformers/pull/30717
* Cache: add new flag to distinguish models that `Cache` but not static cache by gante in https://github.com/huggingface/transformers/pull/30800
* Disable the FA backend for SDPA on AMD GPUs by mht-sharma in https://github.com/huggingface/transformers/pull/30850
* Video-LLaVa: Fix docs by zucchini-nlp in https://github.com/huggingface/transformers/pull/30855
* Docs: update example with assisted generation + sample by gante in https://github.com/huggingface/transformers/pull/30853
* TST / Quantization: Reverting to torch==2.2.1 by younesbelkada in https://github.com/huggingface/transformers/pull/30866
* Fix VideoLlava imports by amyeroberts in https://github.com/huggingface/transformers/pull/30867
* TEST: Add llama logits tests by younesbelkada in https://github.com/huggingface/transformers/pull/30835
* Remove deprecated logic and warnings by amyeroberts in https://github.com/huggingface/transformers/pull/30743
* Enable device map by darshana1406 in https://github.com/huggingface/transformers/pull/30870
* Fix dependencies for image classification example by jla524 in https://github.com/huggingface/transformers/pull/30842
* [whisper] fix multilingual fine-tuning by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30865
* update release script by ArthurZucker in https://github.com/huggingface/transformers/pull/30880

New Contributors
* joaocmd made their first contribution in https://github.com/huggingface/transformers/pull/23342
* kamilakesbi made their first contribution in https://github.com/huggingface/transformers/pull/30121
* dtlzhuangz made their first contribution in https://github.com/huggingface/transformers/pull/30262
* steven-basart made their first contribution in https://github.com/huggingface/transformers/pull/30405
* manju-rangam made their first contribution in https://github.com/huggingface/transformers/pull/30457
* kyo-takano made their first contribution in https://github.com/huggingface/transformers/pull/30494
* mgoin made their first contribution in https://github.com/huggingface/transformers/pull/30488
* eitanturok made their first contribution in https://github.com/huggingface/transformers/pull/30509
* clinty made their first contribution in https://github.com/huggingface/transformers/pull/30512
* warner-benjamin made their first contribution in https://github.com/huggingface/transformers/pull/30442
* XavierSpycy made their first contribution in https://github.com/huggingface/transformers/pull/30438
* DarshanDeshpande made their first contribution in https://github.com/huggingface/transformers/pull/30558
* frasermince made their first contribution in https://github.com/huggingface/transformers/pull/29721
* lucky-bai made their first contribution in https://github.com/huggingface/transformers/pull/30358
* rb-synth made their first contribution in https://github.com/huggingface/transformers/pull/30566
* lausannel made their first contribution in https://github.com/huggingface/transformers/pull/30573
* jonghwanhyeon made their first contribution in https://github.com/huggingface/transformers/pull/30597
* mobicham made their first contribution in https://github.com/huggingface/transformers/pull/29637
* yting27 made their first contribution in https://github.com/huggingface/transformers/pull/30362
* jiaqianjing made their first contribution in https://github.com/huggingface/transformers/pull/30664
* claralp made their first contribution in https://github.com/huggingface/transformers/pull/30505
* mimbres made their first contribution in https://github.com/huggingface/transformers/pull/30653
* sorgfresser made their first contribution in https://github.com/huggingface/transformers/pull/30687
* nurlanov-zh made their first contribution in https://github.com/huggingface/transformers/pull/30485
* zafstojano made their first contribution in https://github.com/huggingface/transformers/pull/30678
* davidgxue made their first contribution in https://github.com/huggingface/transformers/pull/30602
* rootonchair made their first contribution in https://github.com/huggingface/transformers/pull/30698
* eigen2017 made their first contribution in https://github.com/huggingface/transformers/pull/30552
* Nilabhra made their first contribution in https://github.com/huggingface/transformers/pull/30771
* a8nova made their first contribution in https://github.com/huggingface/transformers/pull/26870
* pashminacameron made their first contribution in https://github.com/huggingface/transformers/pull/30790
* retarfi made their first contribution in https://github.com/huggingface/transformers/pull/30763
* yikangshen made their first contribution in https://github.com/huggingface/transformers/pull/30005
* ankur0904 made their first contribution in https://github.com/huggingface/transformers/pull/30804
* conditionedstimulus made their first contribution in https://github.com/huggingface/transformers/pull/30823
* nxphi47 made their first contribution in https://github.com/huggingface/transformers/pull/29850
* Aladoro made their first contribution in https://github.com/huggingface/transformers/pull/30652
* hyenal made their first contribution in https://github.com/huggingface/transformers/pull/30555
* darshana1406 made their first contribution in https://github.com/huggingface/transformers/pull/30870

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.40.2...v4.41.0

2.0.0

Name change: welcome 🤗 Transformers

Following the extension to TensorFlow 2.0, `pytorch-transformers` => `transformers`

Install with `pip install transformers`

Also, note that PyTorch is **no longer in the requirements so don't forget to install TensorFlow 2.0 and/or PyTorch** to be able to use (and load) the models.

TensorFlow 2.0 - PyTorch

All the PyTorch `nn.Module` classes now have their counterparts in TensorFlow 2.0 as `tf.keras.Model` classes. TensorFlow 2.0 classes have the same names as their PyTorch counterparts, prefixed with `TF`.

The interoperability between TensorFlow and PyTorch is actually **a lot deeper** than what is usually meant when talking about libraries with multiple backends:
- each model (not just the static computation graph) can be seamlessly moved between the two frameworks during its lifetime for training/evaluation/usage (`from_pretrained` can load weights saved from either framework),
- an example is given in the quick tour on TF 2.0 and PyTorch in the README, in which a model is trained with `keras.fit` before being loaded in PyTorch for quick debugging/inspection (see the sketch below).
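
A minimal sketch of that round trip, assuming a BERT checkpoint (the model class and paths are illustrative):

```python
from transformers import BertForSequenceClassification, TFBertForSequenceClassification

# Fine-tune in TensorFlow 2.0 (e.g. with keras.fit), then save
tf_model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
# ... tf_model.fit(...) ...
tf_model.save_pretrained("./my-finetuned-bert")

# Reload the same weights in PyTorch for quick debugging/inspection
pt_model = BertForSequenceClassification.from_pretrained("./my-finetuned-bert", from_tf=True)
```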

Remaining unsupported operations in TF 2.0 (to be added later):
- resizing input embeddings to add new tokens
- pruning model heads

TPU support
Training on TPU using the free TPUs provided in the TensorFlow Research Cloud (TFRC) program is possible, but requires implementing a custom training loop (not possible with `keras.fit` at the moment).
We will add an example of such a custom training loop soon.

Improved tokenizers

Tokenizers have been improved to provide an extended encoding method, `encode_plus`, and additional arguments to `encode`. Please refer to the doc for detailed usage of the new options.
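
A minimal sketch of `encode_plus` with a BERT tokenizer (the checkpoint and inputs are illustrative):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encodes a sequence pair and adds the special tokens ([CLS]/[SEP] for BERT) in one call
encoded = tokenizer.encode_plus("Hello, world!", "How are you?", add_special_tokens=True)
print(encoded["input_ids"])
print(encoded["token_type_ids"])  # 0s for the first sequence, 1s for the second
```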

Breaking changes

Positional order of some model keyword inputs changed (better TorchScript support)

To be able to better use TorchScript both on CPU and GPU (see 1010, 1204 and 1195), the specific order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.

If you used to call the models with keyword names for keyword arguments, e.g. `model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any breaking change.

If you used to call the models with positional inputs for keyword arguments, e.g. `model(input_ids, attention_mask, token_type_ids)`, you should double-check the exact order of the input arguments.

Dependency requirements have changed

PyTorch is no longer in the requirements so don't forget to install TensorFlow 2.0 and/or PyTorch to be able to use (and load) the models.

Renamed method

The method `add_special_tokens_sentence_pair` has been renamed to the more appropriate name `add_special_tokens_sequence_pair`.
The same holds true for the method `add_special_tokens_single_sentence` which has been changed to `add_special_tokens_single_sequence`.

Community additions/bug-fixes/improvements
- new German model (Timoeller)
- new script for MultipleChoice training (SWAG, RocStories...) (erenup)
- better fp16 support (ziliwang and bryant1410)
- fix evaluation in run_lm_finetuning (SKRohit)
- fix LM finetuning to prevent crashing on `assert len(tokens_b) >= 1` (searchivarius)
- Various doc and docstring fixes (sshleifer, Maxpa1n, mattolson93, t080)

4.37.0

The last version to support PyTorch 1.10 was 4.36.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.11 and up, we do not support PyTorch 1.10 for v4.37 (i.e. we don't run the tests against torch 1.10).

* Byebye torch 1.10 by ydshieh in 28207

Model tagging

You can now add custom tags to your model before pushing it to the Hub! This enables you to filter models that contain that tag on the Hub with a simple URL filter. For example, if you want to filter models that have the `trl` tag, you can search: https://huggingface.co/models?other=trl&sort=created

* [`core`/ FEAT] Add the possibility to push custom tags using `PreTrainedModel` itself by younesbelkada in 28405 - e.g.

```python
from transformers import AutoModelForCausalLM

model_name = "HuggingFaceM4/tiny-random-LlamaForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_name)

model.add_model_tags(["tag-test"])
model.push_to_hub("llama-tagged")
```


Bugfixes and improvements

* Fix PatchTSMixer Docstrings by vijaye12 in 27943
* use logger.warning_once to avoid massive outputs by ranchlai in 27428
* Docs for AutoBackbone & Backbone by merveenoyan in 27456
* Fix test for auto_find_batch_size on multi-GPU by muellerzr in 27947
* Update import message by NielsRogge in 27946
* Fix parameter count in readme for mixtral 45b by CyberTimon in 27945
* In PreTrainedTokenizerBase add missing word in error message by petergtz in 27949
* Fix AMD scheduled CI not triggered by ydshieh in 27951
* Add deepspeed test to amd scheduled CI by echarlaix in 27633
* Fix a couple of typos and add an illustrative test by rjenc29 in 26941
* fix bug in mask2former: cost matrix is infeasible by xuchenhao001 in 27897
* Fix for stochastic depth decay rule in the TimeSformer implementation by atawari in 27875
* fix no sequence length models error by AdamLouly in 27522
* [`Mixtral`] Change mistral op order by younesbelkada in 27955
* Update bounding box format everywhere by NielsRogge in 27944
* Support PeftModel signature inspect by dancingpipi in 27865
* fixed typos (issue 27919) by asusevski in 27920
* Hot-fix-mixstral-loss by ArthurZucker in 27948
* Fix link in README.md of Image Captioning by saswatmeher in 27969
* Better key error for AutoConfig by Rocketknight1 in 27976
* [doc] fix typo by stas00 in 27981
* fix typo in dvclive callback by dberenbaum in 27983
* [`Tokenizer Serialization`] Fix the broken serialisation by ArthurZucker in 27099
* [`Whisper`] raise better errors by ArthurZucker in 27971
* Fix PatchTSMixer slow tests by ajati in 27997
* [`CI slow`] Fix expected values by ArthurZucker in 27999
* Fix bug with rotating checkpoints by muellerzr in 28009
* [Doc] Spanish translation of glossary.md by aaronjimv in 27958
* Add model_docs from cpmant.md to derformable_detr.md by rajveer43 in 27884
* well well well by ArthurZucker in 28011
* [`SeamlessM4TTokenizer`] Safe import by ArthurZucker in 28026
* [`core` / `modeling`] Fix training bug with PEFT + GC by younesbelkada in 28031
* Fix AMD push CI not triggered by ydshieh in 28029
* SeamlessM4T: `test_retain_grad_hidden_states_attentions` is flaky by gante in 28035
* Fix languages covered by M4Tv2 by ylacombe in 28019
* Fixed spelling error in T5 tokenizer warning message (s/thouroughly/t… by jeddobson in 28014
* Generate: Mistral/Mixtral FA2 cache fix when going beyond the context window by gante in 28037
* [Seamless] Fix links in docs by sanchit-gandhi in 27905
* Remove warning when Annotion enum is created by amyeroberts in 28048
* [`FA-2`] Fix fa-2 issue when passing `config` to `from_pretrained` by younesbelkada in 28043
* [`Modeling` / `Mixtral`] Fix GC + PEFT issues with Mixtral by younesbelkada in 28061
* [Flax BERT] Update deprecated 'split' method by sanchit-gandhi in 28012
* [Flax LLaMA] Fix attn dropout by sanchit-gandhi in 28059
* Remove SpeechT5 deprecated argument by ylacombe in 28062
* doc: Correct spelling mistake by caiyili in 28064
* [`Mixtral`] update conversion script to reflect new changes by younesbelkada in 28068
* Skip M4T `test_retain_grad_hidden_states_attentions` by ylacombe in 28060
* [LLaVa] Add past_key_values to _skip_keys_device_placement to fix multi-GPU dispatch by aismlv in 28051
* Make GPT2 traceable in meta state by kwen2501 in 28054
* Fix bug for checkpoint saving on multi node training setting by dumpmemory in 28078
* Update fixtures-image-utils by lhoestq in 28080
* Fix `low_cpu_mem_usage` Flag Conflict with DeepSpeed Zero 3 in `from_pretrained` for Models with `keep_in_fp32_modules`" by kotarotanahashi in 27762
* Fix wrong examples in llava usage. by Lyken17 in 28020
* [docs] Trainer by stevhliu in 27986
* [docs] MPS by stevhliu in 28016
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT by pacman100 in 27891
* Fix the deprecation warning of _torch_pytree._register_pytree_node by cyyever in 27803
* Spelling correction by saeneas in 28110
* in peft finetune, only the trainable parameters need to be saved by sywangyi in 27825
* fix ConversationalPipeline docstring by not-lain in 28091
* Disable jitter noise during evaluation in SwitchTransformers by DaizeDong in 28077
* Remove warning if `DISABLE_TELEMETRY` is used by Wauplin in 28113
* Fix indentation error - semantic_segmentation.md by rajveer43 in 28117
* [docs] General doc fixes by stevhliu in 28087
* Fix a typo in tokenizer documentation by mssalvatore in 28118
* [Doc] Fix token link in What 🤗 Transformers can do by aaronjimv in 28123
* When save a model on TPU, make a copy to be moved to CPU by qihqi in 27993
* Update split string in doctest to reflect 28087 by amyeroberts in 28135
* [`Mixtral`] Fix loss + nits by ArthurZucker in 28115
* Update modeling_utils.py by mzelling in 28127
* [docs] Fix mistral link in mixtral.md by aaronjimv in 28143
* Remove deprecated CPU dockerfiles by ashahba in 28149
* Fix FA2 integration by pacman100 in 28142
* [gpt-neox] Add attention_bias config to support model trained without attention biases by dalgarak in 28126
* move code to Trainer.evaluate to enable use of that function with multiple datasets by peter-sk in 27844
* Fix weights not properly initialized due to shape mismatch by ydshieh in 28122
* Avoid unnecessary warnings when loading `CLIPConfig` by ydshieh in 28108
* Update FA2 exception msg to point to hub discussions by amyeroberts in 28161
* Align backbone stage selection with out_indices & out_features by amyeroberts in 27606
* [docs] Trainer docs by stevhliu in 28145
* Fix yolos resizing by amyeroberts in 27663
* disable test_retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest by dwyatte in 28169
* Fix `input_embeds` docstring in encoder-decoder architectures by gante in 28168
* [Whisper] Use torch for stft if available by sanchit-gandhi in 26119
* Fix slow backbone tests - out_indices must match stage name ordering by amyeroberts in 28186
* Update YOLOS slow test values by amyeroberts in 28187
* Update `docs/source/en/perf_infer_gpu_one.md` by ydshieh in 28198
* Fix ONNX export for causal LM sequence classifiers by removing reverse indexing by dwyatte in 28144
* Add Swinv2 backbone by NielsRogge in 27742
* Fix: [SeamlessM4T - S2TT] Bug in batch loading of audio in torch.Tensor format in the SeamlessM4TFeatureExtractor class by nicholasneo78 in 27914
* Bug: `training_args.py` fix missing import with accelerate with version `accelerate==0.20.1` by michaelfeil in 28171
* Fix the check of models supporting FA/SDPA not run by ydshieh in 28202
* Drop `feature_extractor_type` when loading an image processor file by ydshieh in 28195
* [Whisper] Fix word-level timestamps with bs>1 or num_beams>1 by ylacombe in 28114
* Fixing visualization code for object detection to support both types of bounding box. by Anindyadeep in 27842
* update the logger message with accordant weights_file_name by izyForever in 28181
* [`Llava`] Fix llava index errors by younesbelkada in 28032
* fix FA2 when using quantization by pacman100 in 28203
* small typo by stas00 in 28229
* Update docs around mixing hf scheduler with deepspeed optimizer by dwyatte in 28223
* Fix trainer saving safetensors: metadata is None by hiyouga in 28219
* fix bug:divide by zero in _maybe_log_save_evaluate() by frankenliu in 28251
* [Whisper] Fix errors with MPS backend introduced by new code on word-level timestamps computation by ercaronte in 28288
* Remove fast tokenization warning in Data Collators by dbuos in 28213
* fix documentation for zero_shot_object_detection by not-lain in 28267
* Remove token_type_ids from model_input_names (like 24788) by Apsod in 28325
* Translate contributing.md into Chinese by Mayfsz in 28243
* [docs] Sort es/toctree.yml | Translate performance.md by aaronjimv in 28262
* Fix error in M4T feature extractor by ylacombe in 28340
* README: install transformers from conda-forge channel by kevherro in 28313
* Don't check the device when device_map=auto by yuanwu2017 in 28351
* Fix pos_mask application and update tests accordingly by ferjorosa in 27892
* fix FA2 when using quantization for remaining models by susnato in 28341
* Update VITS modeling to enable ONNX export by echarlaix in 28141
* chore: Fix typo s/exclusivelly/exclusively/ by hugo-syn in 28361
* Enhancing Code Readability and Maintainability with Simplified Activation Function Selection. by hi-sushanta in 28349
* Fix building alibi tensor when num_heads is not a power of 2 by abuelnasr0 in 28380
* remove two deprecated function by statelesshz in 28220
* Bugfix / ffmpeg input device (mic) not working on Windows by Teapack1 in 27051
* [AttentionMaskConverter] fix sdpa unmask unattended by zspo in 28369
* Remove shell=True from subprocess.Popen to Mitigate Security Risk by avimanyu786 in 28299
* Add segmentation map processing to SAM Image Processor by rwood-97 in 27463
* update warning for image processor loading by ydshieh in 28209
* Fix initialization for missing parameters in `from_pretrained` under ZeRO-3 by XuehaiPan in 28245
* Fix `_merge_input_ids_with_image_features` for llava model by VictorSanh in 28333
* Use mmap option to load_state_dict by weimingzha0 in 28331
* [BUG] BarkEosPrioritizerLogitsProcessor eos_token_id use list, tensor size mismatch by inkinworld in 28201
* Skip now failing test in the Trainer tests by muellerzr in 28421
* Support `DeepSpeed` when using auto find batch size by muellerzr in 28088
* Fix number of models in README.md by prasatee in 28430
* CI: limit natten version by gante in 28432
* Fix for checkpoint rename race condition by tblattner in 28364
* Fix load correct tokenizer in Mixtral model documentation by JuanFKurucz in 28437
* [docstring] Fix docstring for ErnieConfig, ErnieMConfig by Sparty in 27029
* [Whisper] Fix slow test by patrickvonplaten in 28407
* Assitant model may on a different device by jiqing-feng in 27995
* Enable multi-label image classification in pipeline by amyeroberts in 28433
* Optimize the speed of the truncate_sequences function. by ikkvix in 28263
* Use python 3.10 for docbuild by ydshieh in 28399
* Fix docker file by ydshieh in 28452
* Set `cache_dir` for `evaluate.load()` in example scripts by aphedges in 28422
* Optionally preprocess segmentation maps for MobileViT by harisankar95 in 28420
* Correctly resolve trust_remote_code=None for AutoTokenizer by Rocketknight1 in 28419
* Fix load balancing loss func for mixtral by liangxuZhang in 28256
* Doc by jiqing-feng in 28431
* Fix docstring checker issues with PIL enums by Rocketknight1 in 28450
* Fix broken link on page by keenranger in 28451
* Mark two logger tests as flaky by amyeroberts in 28458
* Update metadata loading for oneformer by amyeroberts in 28398
* Fix torch.ones usage in xlnet by sungho-ham in 28471
* Generate: deprecate old public functions by gante in 28478
* Docs: add model paths by gante in 28475
* Generate: refuse to save bad generation config files by gante in 28477
* TF: purge `TFTrainer` by gante in 28483
* Fix docstrings and update docstring checker error message by Rocketknight1 in 28460
* Change progress logging to once across all nodes by siddartha-RE in 28373
* Generate: fix candidate device placement by gante in 28493
* Fix paths to AI Sweden Models reference and model loading by JuanFKurucz in 28423
* [`chore`] Update warning text, a word was missing by tomaarsen in 28017
* Don't set `finetuned_from` if it is a local path by ydshieh in 28482
* Add the XPU device check for pipeline mode by yuanwu2017 in 28326
* Tokenizer kwargs in textgeneration pipe by thedamnedrhino in 28362
* [GPTQ] Fix test by SunMarc in 28018
* Fixed minor typos by rishit5 in 28489
* Add a use_safetensors arg to TFPreTrainedModel.from_pretrained() by Rocketknight1 in 28511
* Generate: consolidate output classes by gante in 28494
* fix: sampling in flax keeps EOS by borisdayma in 28378
* improve dev setup comments and hints by 4imothy in 28495
* SiLU activation wrapper for safe importing by amyeroberts in 28509
* Remove `task` arg in `load_dataset` in image-classification example by regisss in 28408
* Improving Training Performance and Scalability Documentation by HamzaFB in 28497
* Fix mismatching loading in from_pretrained with/without accelerate by fxmarty in 28414
* Fix/speecht5 bug by NimaYaqmuri in 28481
* [ `TokenizationUtils`] Fix `add_special_tokens` when the token is already there by ArthurZucker in 28520
* [`TokenizationRoformerFast`] Fix the save and loading by ArthurZucker in 28527
* [`SpeechT5Tokenization`] Add copied from and fix the `convert_tokens_to_string` to match the fast decoding scheme by ArthurZucker in 28522
* Clearer error for SDPA when explicitely requested by fxmarty in 28006
* Add is_model_supported for fx by inisis in 28521
* Config: warning when saving generation kwargs in the model config by gante in 28514
* [Makefile] Exclude research projects from format by patrickvonplaten in 28551
* symbolic_trace: add past_key_values, llama, sdpa support by fxmarty in 28447
* Allow to train dinov2 with different dtypes like bf16 by StarCycle in 28504
* Fix Switch Transformers When sparse_step = 1 by agemagician in 28564
* Save `Processor` by ydshieh in 27761
* Use `weights_only` only if torch >= 1.13 by ydshieh in 28506
* [`Core Tokenization`] Support a fix for spm fast models by ArthurZucker in 26678
* Use `LoggingLevel` context manager in 3 tests by ydshieh in 28575
* Fix the documentation checkpoint for xlm-roberta-xl by jeremyfowers in 28567
* [ASR Pipe] Update init to set model type and subsequently call parent init method by sanchit-gandhi in 28486
* [Whisper Tok] Move token ids to CPU when computing offsets by sanchit-gandhi in 28485
* [Whisper] Fix audio classification with weighted layer sum by sanchit-gandhi in 28563
* Making CTC training example more general by ylacombe in 28582
* Don't save `processor_config.json` if a processor has no extra attribute by ydshieh in 28584
* Fix wrong xpu device in DistributedType.MULTI_XPU mode by faaany in 28386
* [GPTNeoX] Fix BC issue with 4.36 by ArthurZucker in 28602

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* aaronjimv
    * [Doc] Spanish translation of glossary.md (27958)
    * [Doc] Fix token link in What 🤗 Transformers can do (28123)
    * [docs] Fix mistral link in mixtral.md (28143)
    * [docs] Sort es/toctree.yml | Translate performance.md (28262)
* rajveer43
    * Add model_docs from cpmant.md to derformable_detr.md (27884)
    * Fix indentation error - semantic_segmentation.md (28117)
* poedator
    * 4D `attention_mask` support (27539)
    * [bnb] Let's make serialization of 4bit models possible (26037)
* connor-henderson
    * Add FastSpeech2Conformer (23439)
* JustinLin610
    * Add qwen2 (28436)
* SangbumChoi
    * enable training mask2former and maskformer for transformers trainer by SangbumChoi in 28277
    * [DETA] Improvement and Sync from DETA especially for training by SangbumChoi in 27990
    * fix auxiliary loss training in DetrSegmentation by SangbumChoi in 28354


4.31.0

The last version to support PyTorch 1.9 was 4.30.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.10 and up, we do not support PyTorch 1.9 for v4.31 and up.

* Byebye pytorch 1.9 by ydshieh in 24080

RoPE scaling

This PR adds RoPE scaling to the LLaMA and GPTNeoX families of models. It allows us to extrapolate beyond the original maximum sequence length (e.g. 2048 tokens on LLaMA) without fine-tuning. It offers two strategies (a usage sketch follows the PR link below):
- Linear scaling
- Dynamic NTK scaling

* Llama/GPTNeoX: add RoPE scaling by gante in 24653
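
A minimal sketch of enabling RoPE scaling at load time; the checkpoint and the scaling factor are illustrative.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative LLaMA checkpoint
    rope_scaling={"type": "linear", "factor": 2.0},  # or {"type": "dynamic", "factor": 2.0}
)
```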

Agents

Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string) that either points to a file on disk or to the object's content. This should make interaction with text-based systems much simpler.

* Tool types by LysandreJik in 24032

Tied weights load

Models with potentially tied weights dropped some keys from the state dict even when the weights were not tied. This has now been fixed and, more generally, the whole experience of loading a model with a state dict that doesn't match exactly should be improved in this release.

* Tied weights load by sgugger in 24310
* Clean load keys by sgugger in 24505

Whisper word-level timestamps

This PR adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.

* add word-level timestamps to Whisper by hollance in 23205
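
For instance, word-level timestamps can be requested through the ASR pipeline; a minimal sketch, assuming `openai/whisper-tiny` and a local `audio.mp3` file:

python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# return_timestamps="word" enables the cross-attention / dynamic time warping path.
result = pipe("audio.mp3", return_timestamps="word")
print(result["chunks"])  # each chunk holds a word and its (start, end) timestamps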

Auto model addition

A new auto model is added, `AutoModelForTextEncoding`. It is to be used when you want to extract the text encoder from an encoder-decoder architecture.

* [AutoModel] Add AutoModelForTextEncoding by sanchit-gandhi in 24305
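
A short sketch using `t5-small` to illustrate the idea:

python
from transformers import AutoModelForTextEncoding, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
encoder = AutoModelForTextEncoding.from_pretrained("t5-small")  # returns only the T5 encoder

inputs = tokenizer("Studies have shown that owning a dog is good for you", return_tensors="pt")
last_hidden_state = encoder(**inputs).last_hidden_state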

Model deprecation

Transformers is growing quickly and, to ease the maintenance burden on our side, we have decided to deprecate models that are not widely used. Those models will never actually disappear from the library, but we will stop testing them or accepting PRs modifying them.
The criterion used to identify models to deprecate was fewer than 1,000 unique downloads in the last 30 days, for models that are at least one year old. The list of deprecated models is:

- BORT
- M-CTC-T
- MMBT
- RetriBERT
- TAPEX
- Trajectory Transformer
- VAN

* Deprecate models by sgugger in 24787

Breaking changes

Fixes an issue with stripped spaces for the T5 family tokenizers. If this negatively impacts inference or training with your models, please let us know by opening an issue.

* ⚠️⚠️[`T5Tokenize`] Fix T5 family tokenizers⚠️⚠️ by ArthurZucker in 24565
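
The corrected behaviour can be opted into via the `legacy` flag that ships alongside this fix (a sketch, assuming `t5-base`):

python
from transformers import T5Tokenizer

# legacy=False opts into the corrected handling of spaces around special tokens;
# leaving it unset keeps the previous behaviour and emits a warning.
tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=False)
print(tokenizer.encode("Hello <extra_id_0>."))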

Bugfixes and improvements

* add trust_remote_code option to CLI download cmd by radames in 24097
* Fix typo in Llama docstrings by Kh4L in 24020
* Avoid `GPT-2` daily CI job OOM (in TF tests) by ydshieh in 24106
* [Lllama] Update tokenization code to ensure parsing of the special tokens [core] by ArthurZucker in 24042
* PLAM => PaLM by xingener in 24129
* [`bnb`] Fix bnb config json serialization by younesbelkada in 24137
* Correctly build models and import call_context for older TF versions by Rocketknight1 in 24138
* Generate: PT's `top_p` enforces `min_tokens_to_keep` when it is `1` by gante in 24111
* fix bugs with trainer by pacman100 in 24134
* Fix TF Rag OOM issue by ydshieh in 24122
* Fix SAM OOM issue on CI by ydshieh in 24125
* Fix XGLM OOM on CI by ydshieh in 24123
* [`SAM`] Fix sam slow test by younesbelkada in 24140
* [lamaTokenizerFast] Update documentation by ArthurZucker in 24132
* [BlenderBotSmall] Update doc example by ArthurZucker in 24092
* Fix Pipeline CI OOM issue by ydshieh in 24124
* [documentation] grammatical fixes in image_classification.mdx by LiamSwayne in 24141
* Fix typo in streamers.py by freddiev4 in 24144
* [tests] fix bitsandbytes import issue by stas00 in 24151
* Avoid OOM in doctest CI by ydshieh in 24139
* Fix `Wav2Vec2` CI OOM by ydshieh in 24190
* Fix push to hub by NielsRogge in 24187
* Change ProgressCallback to use dynamic_ncols=True by gmlwns2000 in 24101
* [i18n]Translated "attention.mdx" to korean by kihoon71 in 23878
* Generate: force caching on the main model, in assisted generation by gante in 24177
* Fix device issue in `OpenLlamaModelTest::test_model_parallelism` by ydshieh in 24195
* Update `GPTNeoXLanguageGenerationTest` by ydshieh in 24193
* typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by zsj9509 in 24184
* Generate: detect special architectures when loaded from PEFT by gante in 24198
* 🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by kihoon71 in 23977
* 🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by muellerzr in 24028
* Fix `_load_pretrained_model` by SunMarc in 24200
* Fix steps bugs in no trainer examples by Ethan-yt in 24197
* Skip RWKV test in past CI by ydshieh in 24204
* Remove unnecessary aten::to overhead in llama by fxmarty in 24203
* Update `WhisperForAudioClassification` doc example by ydshieh in 24188
* Finish dataloader integration by muellerzr in 24201
* Add the number of `model` test failures to slack CI report by ydshieh in 24207
* fix: TextIteratorStreamer cannot work with pipeline by yuanwu2017 in 23641
* Update `(TF)SamModelIntegrationTest` by ydshieh in 24199
* Improving error message when using `use_safetensors=True`. by Narsil in 24232
* Safely import pytest in testing_utils.py by amyeroberts in 24241
* fix overflow when training mDeberta in fp16 by sjrl in 24116
* deprecate `use_mps_device` by pacman100 in 24239
* Tied params cleanup by sgugger in 24211
* [Time Series] use mean scaler when scaling is a boolean True by kashif in 24237
* TF: standardize `test_model_common_attributes` for language models by gante in 23457
* Generate: GenerationConfig can overwrite attributes at from_pretrained time by gante in 24238
* Add `torch >=1.12` requirement for `Tapas` by ydshieh in 24251
* Update urls in warnings for rich rendering by IvanReznikov in 24136
* Fix how we detect the TF package by Rocketknight1 in 24255
* Stop storing references to bound methods via tf.function by Rocketknight1 in 24146
* Skip `GPT-J` fx tests for torch < 1.12 by ydshieh in 24256
* docs wrt using accelerate launcher with trainer by pacman100 in 24250
* update FSDP save and load logic by pacman100 in 24249
* Fix URL in comment for contrastive loss function by taepd in 24271
* QA doc: import torch before it is used by ByronHsu in 24228
* Skip some `TQAPipelineTests` tests in past CI by ydshieh in 24267
* TF: CTRL with native embedding layers by gante in 23456
* Adapt Wav2Vec2 conversion for MMS lang identification by patrickvonplaten in 24234
* Update check of core deps by sgugger in 24277
* `Pix2StructImageProcessor` requires `torch>=1.11.0` by ydshieh in 24270
* Fix Debertav2 embed_proj by WissamAntoun in 24205
* Clean up old Accelerate checks by sgugger in 24279
* Fix bug in slow tokenizer conversion, make it a lot faster by stephantul in 24266
* Fix `check_config_attributes`: check all configuration classes by ydshieh in 24231
* Fix LLaMa beam search when using parallelize by FeiWang96 in 24224
* remove unused is_decoder parameter in DetrAttention by JayL0321 in 24226
* Split common test from core tests by sgugger in 24284
* [fix] bug in BatchEncoding.__getitem__ by flybird1111 in 24293
* Fix image segmentation tool bug by amyeroberts in 23897
* [Docs] Improve docs for MMS loading of other languages by patrickvonplaten in 24292
* Update README_zh-hans.md by CooperFu in 24181
* deepspeed init during eval fix by pacman100 in 24298
* [EnCodec] Changes for 32kHz ckpt by sanchit-gandhi in 24296
* [Docs] Fix the paper URL for MMS model by hitchhicker in 24302
* Update tokenizer_summary.mdx (grammar) by belladoreai in 24286
* Beam search type by jprivera44 in 24288
* Make `can_generate` as class method by ydshieh in 24299
* Update test versions on README.md by sqali in 24307
* [`SwitchTransformers`] Fix return values by ArthurZucker in 24300
* Fix functional TF Whisper and modernize tests by Rocketknight1 in 24301
* Big TF test cleanup by Rocketknight1 in 24282
* Fix ner average grouping with no groups by Narsil in 24319
* Fix ImageGPT doc example by amyeroberts in 24317
* Add test for proper TF input signatures by Rocketknight1 in 24320
* Adding ddp_broadcast_buffers argument to Trainer by TevenLeScao in 24326
* error bug on saving distributed optim state when using data parallel by xshaun in 24108
* 🌐 [i18n-KO] Fixed `tutorial/preprocessing.mdx` by sim-so in 24156
* pin `apex` to a speicifc commit (for DeepSpeed CI docker image) by ydshieh in 24351
* byebye Hub connection timeout by ydshieh in 24350
* Clean up disk sapce during docker image build for `transformers-pytorch-gpu` by ydshieh in 24346
* Fix `KerasMetricCallback`: pass `generate_kwargs` even if `use_xla_generation` is False by Kripner in 24333
* Fix device issue in `SwitchTransformers` by ydshieh in 24352
* Update MMS integration docs by vineelpratap in 24311
* Make `AutoFormer` work with previous torch version by ydshieh in 24357
* Fix ImageGPT doctest by amyeroberts in 24353
* Fix link to documentation in Install from Source by SoyGema in 24336
* docs: add BentoML to awesome-transformers by aarnphm in 24344
* [Doc Fix] Fix model name path in the transformers doc for AutoClasses by riteshghorse in 24329
* Fix the order in `GPTNeo`'s docstring by qgallouedec in 24358
* Respect explicitly set framework parameter in pipeline by denis-ismailaj in 24322
* Allow passing kwargs through to TFBertTokenizer by Rocketknight1 in 24324
* Fix resuming PeftModel checkpoints in Trainer by llohann-speranca in 24274
* TensorFlow CI fixes by Rocketknight1 in 24360
* Update tiny models for pipeline testing. by ydshieh in 24364
* [modelcard] add audio classification to task list by sanchit-gandhi in 24363
* [Whisper] Make tests faster by sanchit-gandhi in 24105
* Rename test to be more accurate by sgugger in 24374
* Add a check in `ImageToTextPipeline._forward` by ydshieh in 24373
* [Tokenizer doc] Clarification about `add_prefix_space` by ArthurZucker in 24368
* style: add BitsAndBytesConfig __repr__ function by aarnphm in 24331
* Better test name and enable pipeline test for `pix2struct` by ydshieh in 24377
* Skip a tapas (tokenization) test in past CI by ydshieh in 24378
* [Whisper Docs] Nits by ArthurZucker in 24367
* [GPTNeoX] Nit in config by ArthurZucker in 24349
* [Wav2Vec2 - MMS] Correct directly loading adapters weights by patrickvonplaten in 24335
* Migrate doc files to Markdown. by sgugger in 24376
* Update deprecated torch.ger by kit1980 in 24387
* [docs] Fix NLLB-MoE links by stevhliu in 24388
* Add `ffmpeg` for `doc_test_job` on CircleCI by ydshieh in 24397
* byebye Hub connection timeout - Recast by ydshieh in 24399
* fix type annotation for debug arg by Bearnardd in 24033
* [Trainer] Fix optimizer step on PyTorch TPU by cowanmeg in 24389
* Fix gradient checkpointing + fp16 autocast for most models by younesbelkada in 24247
* Clean up dist import by muellerzr in 24402
* Check auto mappings could be imported via `from transformers` by ydshieh in 24400
* Remove redundant code from TrainingArgs by muellerzr in 24401
* Explicit arguments in `from_pretrained` by ydshieh in 24306
* [ASR pipeline] Check for torchaudio by sanchit-gandhi in 23953
* TF safetensors reduced mem usage by Rocketknight1 in 24404
* Skip `test_conditional_generation_pt_pix2struct` in Past CI (torch < 1.11) by ydshieh in 24417
* [`bnb`] Fix bnb serialization issue with new release by younesbelkada in 24416
* Revert "Fix gradient checkpointing + fp16 autocast for most models" by younesbelkada in 24420
* Fix `save_cache` version in `config.yml` by ydshieh in 24419
* Update RayTune doc link for Hyperparameter tuning by JoshuaEPSamuel in 24422
* TF CI fix for Segformer by Rocketknight1 in 24426
* Refactor hyperparameter search backends by alexmojaki in 24384
* Clarify batch size displayed when using DataParallel by sgugger in 24430
* Save `site-packages` as cache in CircleCI job by ydshieh in 24424
* [llama] Fix comments in weights converter by weimingzha0 in 24436
* [`Trainer`] Fix `.to` call on 4bit models by younesbelkada in 24444
* fix the grad_acc issue at epoch boundaries by pacman100 in 24415
* Replace python random with torch.rand to enable dynamo.export by BowenBao in 24434
* Fix typo by siryuon in 24440
* Fix some `TFWhisperModelIntegrationTests` by ydshieh in 24428
* fixes issue when saving fsdp via accelerate's FSDP plugin by pacman100 in 24446
* Allow dict input for audio classification pipeline by sanchit-gandhi in 23445
* Update `JukeboxConfig.from_pretrained` by ydshieh in 24443
* Improved keras imports by Rocketknight1 in 24448
* add missing alignment_heads to Whisper integration test by hollance in 24487
* Fix tpu_metrics_debug by cowanmeg in 24452
* Update AlbertModel type annotation by amyeroberts in 24450
* [`pipeline`] Fix str device issue by younesbelkada in 24396
* when resume from peft checkpoint, the model should be trainable by sywangyi in 24463
* deepspeed z1/z2 state dict fix by pacman100 in 24489
* Update `InstructBlipModelIntegrationTest` by ydshieh in 24490
* Update token_classification.md by condor-cp in 24484
* Add support for for loops in python interpreter by sgugger in 24429
* [`InstructBlip`] Add accelerate support for instructblip by younesbelkada in 24488
* Compute `dropout_probability` only in training mode by ydshieh in 24486
* Fix 'local_rank' AttiributeError in Trainer class by mocobeta in 24297
* Compute `dropout_probability` only in training mode (SpeechT5) by ydshieh in 24498
* Fix link in utils by SoyGema in 24501
* 🚨🚨 Fix group beam search by hukuda222 in 24407
* Generate: `group_beam_search` requires `diversity_penalty>0.0` by gante in 24456
* Generate: `min_tokens_to_keep` has to be `>= 1` by gante in 24453
* Fix TypeError: Object of type int64 is not JSON serializable by xiaoli in 24340
* Fix poor past ci by ydshieh in 24485
* 🌐 [i18n-KO] Translated `tflite.mdx` to Korean by 0525hhgus in 24435
* use accelerate autocast in jit eval path, since mix precision logic is… by sywangyi in 24460
* Update hyperparameter_search.py by pacman100 in 24515
* [`T5`] Add T5ForQuestionAnswering and MT5ForQuestionAnswering by sjrl in 24481
* set model to training mode before accelerate.prepare by sywangyi in 24520
* Update `huggingface_hub` commit sha by ydshieh in 24527
* Find module name in an OS-agnostic fashion by sgugger in 24526
* Fix LR scheduler based on bs from auto bs finder by muellerzr in 24521
* [Mask2Former] Remove SwinConfig by NielsRogge in 24259
* Allow backbones not in backbones_supported - Maskformer Mask2Former by amyeroberts in 24532
* Fix Typo by tony9402 in 24530
* Finishing tidying keys to ignore on load by sgugger in 24535
* Add bitsandbytes support for gpt2 models by DarioSucic in 24504
* ⚠️ Time to say goodbye to py37 by ydshieh in 24091
* Unpin DeepSpeed and require DS >= 0.9.3 by ydshieh in 24541
* Allow for warn_only selection in enable_full_determinism by Frank995 in 24496
* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments by mryab in 24549
* Update PT/TF weight conversion after 24030 by ydshieh in 24547
* Update `EncodecIntegrationTest` by ydshieh in 24553
* [`gpt2-int8`] Add gpt2-xl int8 test by younesbelkada in 24543
* Fix processor __init__ bug if image processor undefined by amyeroberts in 24554
* [`InstructBlip`] Add instruct blip int8 test by younesbelkada in 24555
* Update PT/Flax weight conversion after 24030 by ydshieh in 24556
* Make PT/Flax tests could be run on GPU by ydshieh in 24557
* Update masked_language_modeling.md by condor-cp in 24560
* Fixed OwlViTModel inplace operations by pasqualedem in 24529
* Update old existing feature extractor references by amyeroberts in 24552
* Fix Typo by tony9402 in 24559
* Fix annotations by tony9402 in 24571
* Docs: 4 bit doc corrections by gante in 24572
* Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" by sgugger in 24574
* Update some torchscript tests after 24505 by ydshieh in 24566
* Removal of deprecated vision methods and specify deprecation versions by amyeroberts in 24570
* Fix ESM models buffers by sgugger in 24576
* Check all objects are equally in the main `__init__` file by ydshieh in 24573
* Fix annotations by tony9402 in 24582
* fix peft ckpts not being pushed to hub by pacman100 in 24578
* Udate link to RunHouse hardware setup documentation. by BioGeek in 24590
* Show a warning for missing attention masks when pad_token_id is not None by hackyon in 24510
* Make (TF) CI faster (test only a subset of model classes) by ydshieh in 24592
* Speed up TF tests by reducing hidden layer counts by Rocketknight1 in 24595
* [several models] improve readability by stas00 in 24585
* Use protobuf 4 by ydshieh in 24599
* Limit Pydantic to V1 in dependencies by lig in 24596
* 🌐 [i18n-KO] Translated `perplexity.mdx` to Korean by HanNayeoniee in 23850
* [Time-Series] Added blog-post to tips by elisim in 24482
* Pin `Pillow` for now by ydshieh in 24633
* Fix loading dataset docs link in run_translation.py example by SoyGema in 24594
* Generate: multi-device support for contrastive search by gante in 24635
* Generate: force cache with `inputs_embeds` forwarding by gante in 24639
* precompiled_charsmap checking before adding to the normalizers' list for XLNetTokenizerFast conversion. by shahad-mahmud in 24618
* Fix audio feature extractor deps by sanchit-gandhi in 24636
* llama fp16 torch.max bug fix by prathikr in 24561
* documentation_tests.txt - sort filenames alphabetically by amyeroberts in 24647
* Update warning messages reffering to post_process_object_detection by rafaelpadilla in 24649
* Add `finetuned_from` property in the autogenerated model card by sgugger in 24528
* Make warning disappear for remote code in pipelines by sgugger in 24603
* Fix `EncodecModelTest::test_multi_gpu_data_parallel_forward` by ydshieh in 24663
* Fix `VisionTextDualEncoderIntegrationTest` by ydshieh in 24661
* Add `is_torch_mps_available` function to utils by NripeshN in 24660
* Unpin `huggingface_hub` by ydshieh in 24667
* Fix model referenced and results in documentation. Model mentioned was inaccessible by rafaelpadilla in 24609
* Add Nucleotide Transformer notebooks and restructure notebook list by Rocketknight1 in 24669
* LlamaTokenizer should be picklable by icyblade in 24681
* Add dropouts to GPT-NeoX by ZHAOTING in 24680
* DeepSpeed/FSDP ckpt saving utils fixes and FSDP training args fixes by pacman100 in 24591
* Avoid import `sentencepiece_model_pb2` in `utils.__init__.py` by ydshieh in 24689
* Fix integration with Accelerate and failing test by muellerzr in 24691
* [`MT5`] Fix CONFIG_MAPPING issue leading it to load umt5 class by ArthurZucker in 24678
* Fix flaky `test_for_warning_if_padding_and_no_attention_mask` by ydshieh in 24706
* Whisper: fix prompted max length by gante in 24666
* Enable `conversational` pipeline for `GPTSw3Tokenizer` by saattrupdan in 24648
* [`T5`] Adding model_parallel = False to `T5ForQuestionAnswering` and `MT5ForQuestionAnswering` by sjrl in 24684
* Docs: change some `input_ids` doc reference from `BertTokenizer` to `AutoTokenizer` by gante in 24730
* add link to accelerate doc by SunMarc in 24601
* [Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valide for beginning of words by ArthurZucker in 24622
* Fix typo in LocalAgent by jamartin9 in 24736
* fix: Text splitting in the BasicTokenizer by connor-henderson in 22280
* Docs: add `kwargs` type to fix formatting by gante in 24733
* add gradient checkpointing for distilbert by jordane95 in 24719
* Skip keys not in the state dict when finding mismatched weights by sgugger in 24749
* Fix non-deterministic Megatron-LM checkpoint name by janEbert in 24674
* [InstructBLIP] Fix bos token of LLaMa checkpoints by NielsRogge in 24492
* Skip some slow tests for doctesting in PRs (Circle)CI by ydshieh in 24753
* Fix lr scheduler not being reset on reruns by muellerzr in 24758
* :bug: Handle empty gen_kwargs for seq2seq trainer prediction_step function by gkumbhat in 24759
* Allow existing configs to be registered by sgugger in 24760
* Unpin protobuf in docker file (for daily CI) by ydshieh in 24761
* Fix eval_accumulation_steps leading to incorrect metrics by muellerzr in 24756
* Add MobileVitV2 to doctests by amyeroberts in 24771
* Docs: Update logit processors __call__ docs by gante in 24729
* Replacement of 20 asserts with exceptions by Baukebrenninkmeijer in 24757
* Update default values of bos/eos token ids in `CLIPTextConfig` by ydshieh in 24773
* Fix pad across processes dim in trainer and not being able to set the timeout by muellerzr in 24775
* gpt-bigcode: avoid `zero_` to support Core ML by pcuenca in 24755
* Remove WWT from README by LysandreJik in 24672
* Rm duplicate pad_across_processes by muellerzr in 24780
* Revert "Unpin protobuf in docker file (for daily CI)" by ydshieh in 24800
* Removing unnecessary `device=device` in modeling_llama.py by Liyang90 in 24696
* [fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" by SeongBeomLEE in 24769
* [DOC] Clarify relationshi load_best_model_at_end and save_total_limit by BramVanroy in 24614
* Upgrade jax/jaxlib/flax pin versions by ydshieh in 24791
* Fix MobileVitV2 doctest checkpoint by amyeroberts in 24805
* Skip torchscript tests for `MusicgenForConditionalGeneration` by ydshieh in 24782
* Generate: add SequenceBiasLogitsProcessor by gante in 24334
* Add accelerate version in transformers-cli env by amyeroberts in 24806
* Fix typo 'submosules' by dymil in 24809
* Remove Falcon docs for the release until TGI is ready by Rocketknight1 in 24808
* Update setup.py to be compatible with pipenv by georgiemathews in 24789
* Use _BaseAutoModelClass's register method by fadynakhla in 24810
* Run hub tests by sgugger in 24807
* Copy code when using local trust remote code by sgugger in 24785
* Fixing double `use_auth_token.pop` (preventing private models from being visible). by Narsil in 24812
* set correct model input names for gptsw3tokenizer by DarioSucic in 24788
* Check models used for common tests are small by sgugger in 24824
* [🔗 Docs] Fixed Incorrect Migration Link by kadirnar in 24793
* deprecate `sharded_ddp` training argument by statelesshz in 24825
* 🌐 [i18n-KO] Translated `custom_tools.mdx` to Korean by sim-so in 24580
* Remove unused code in GPT-Neo by namespace-Pt in 24826
* Add Multimodal heading and Document question answering in task_summary.mdx by y3sar in 23318
* Fix `is_vision_available` by ydshieh in 24853
* Fix comments for `_merge_heads` by bofenghuang in 24855
* fix broken links in READMEs by younesbelkada in 24861
* Add TAPEX to the list of deprecated models by sgugger in 24859
* Fix token pass by sgugger in 24862

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* hollance
* [WIP] add EnCodec model (23655)
* add word-level timestamps to Whisper (23205)
* add missing alignment_heads to Whisper integration test (24487)
* sim-so
* 🌐 [i18n-KO] Fixed `tutorial/preprocessing.mdx` (24156)
* 🌐 [i18n-KO] Translated `custom_tools.mdx` to Korean (24580)
* novice03
* Add Multi Resolution Analysis (MRA) (New PR) (24513)
* jegork
* Add ViViT (22518)

LLaVA-NeXT

LLaVA-NeXT is the next version of LLaVA; it includes better support for non-padded images, improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks.

Compared with LLaVA-1.5, LLaVA-NeXT has several improvements:
- Increased input image resolution to 4x more pixels, allowing it to grasp more visual details. It supports three aspect ratios, up to 672x672, 336x1344, and 1344x336 resolution.
- Better visual reasoning and OCR capability with an improved visual instruction tuning data mixture.
- Better visual conversation for more scenarios, covering different applications.
- Better world knowledge and logical reasoning.
- Along with performance improvements, LLaVA-NeXT maintains the minimalist design and data efficiency of LLaVA-1.5. It re-uses the pretrained connector of LLaVA-1.5, and still uses less than 1M visual instruction tuning samples. The largest 34B variant finishes training in ~1 day with 32 A100s.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llava_next_overview.png"
alt="drawing" width="600"/>

<small> LLaVa-NeXT incorporates a higher input resolution by encoding various patches of the input image. Taken from the <a href="https://arxiv.org/abs/2310.03744">original paper.</a> </small>
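
A minimal usage sketch, assuming the `llava-hf/llava-v1.6-mistral-7b-hf` checkpoint and its Mistral-style prompt format:

python
import requests
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))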

MusicGen Melody

The MusicGen Melody model was proposed in [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.

MusicGen Melody is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.

Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.

* Add MusicGen Melody by ylacombe in 28819
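
A short text-conditioned generation sketch, assuming the `facebook/musicgen-melody` checkpoint:

python
from transformers import AutoProcessor, MusicgenMelodyForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-melody")
model = MusicgenMelodyForConditionalGeneration.from_pretrained("facebook/musicgen-melody")

# Text-only conditioning; an audio prompt can additionally be passed to the processor.
inputs = processor(text=["80s pop track with bassy drums and synth"], padding=True, return_tensors="pt")
audio_values = model.generate(**inputs, max_new_tokens=256)  # decoded waveform tokens, roughly 5 seconds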

PvT-v2

The PVTv2 model was proposed in [PVT v2: Improved Baselines with Pyramid Vision Transformer](https://arxiv.org/abs/2106.13797) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. As an improved variant of PVT, it eschews position embeddings, relying instead on positional information encoded through zero-padding and overlapping patch embeddings. This lack of reliance on position embeddings simplifies the architecture, and enables running inference at any resolution without needing to interpolate them.

* Add PvT-v2 Model by FoamoftheSea in 26812
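
A minimal image-classification sketch; the checkpoint name below (`OpenGVLab/pvt_v2_b0`) is an assumption, and any PVTv2 classification checkpoint on the Hub should work the same way:

python
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoint = "OpenGVLab/pvt_v2_b0"  # hypothetical checkpoint name
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = image_processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])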

UDOP

The UDOP model was proposed in [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal. UDOP adopts an encoder-decoder Transformer architecture based on [T5](https://huggingface.co/docs/transformers/main/en/model_doc/t5) for document AI tasks like document image classification, document parsing and document visual question answering.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/udop_architecture.jpg"
alt="drawing" width="600"/>

<small> UDOP architecture. Taken from the <a href="https://arxiv.org/abs/2212.02623">original paper.</a> </small>

* Add UDOP by NielsRogge in 22940

Mamba

This model introduces a new architectural paradigm based on state-space models, rather than the attention mechanism used in Transformer models.
The checkpoints are compatible with the original ones.

* [`Add Mamba`] Adds support for the `Mamba` models by ArthurZucker in 28094
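
A minimal generation sketch, assuming the converted `state-spaces/mamba-130m-hf` checkpoint:

python
from transformers import AutoTokenizer, MambaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))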

StarCoder2

StarCoder2 is a family of open LLMs for code and comes in 3 different sizes with 3B, 7B and 15B parameters. The flagship StarCoder2-15B model is trained on over 4 trillion tokens and 600+ programming languages from The Stack v2. All models use Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and were trained using the Fill-in-the-Middle objective.

* Starcoder2 model - bis by RaymondLi0 in 29215
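
A minimal code-completion sketch, assuming the `bigcode/starcoder2-3b` checkpoint:

python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0]))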

SegGPT

The SegGPT model was proposed in [SegGPT: Segmenting Everything In Context](https://arxiv.org/abs/2304.03284) by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang. SegGPT employs a decoder-only Transformer that can generate a segmentation mask given an input image, a prompt image and its corresponding prompt mask. The model achieves remarkable one-shot results with 56.1 mIoU on COCO-20 and 85.6 mIoU on FSS-1000.

* Adding SegGPT by EduardoPach in 27735

Galore optimizer

![image](https://cdn-uploads.huggingface.co/production/uploads/61f4d468587c793cdf55b4dd/RPcpdcYkoUR8PwkTvjYJ0.png)

With [Galore](https://huggingface.co/papers/2403.03507), you can pre-train large models on consumer-grade hardware, making LLM pre-training much more accessible to anyone in the community.

> Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallel, checkpointing, or offloading strategies.

Galore is based on a low-rank approximation of the gradients and can be used out of the box for any model.

Below is a simple snippet that demonstrates how to pre-train `mistralai/Mistral-7B-v0.1` on the IMDB dataset:

python
import torch
import datasets
from transformers import TrainingArguments, AutoConfig, AutoTokenizer, AutoModelForCausalLM
import trl

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-galore",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",
    optim_target_modules=["attn", "mlp"],
)

model_id = "mistralai/Mistral-7B-v0.1"

config = AutoConfig.from_pretrained(model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_config(config).to(0)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
)

trainer.train()


Quantization

Quanto integration

Quanto has been integrated with transformers! You can apply simple quantization algorithms with a few lines of code and minimal changes. Quanto is also compatible with `torch.compile`.

Check out [the announcement blogpost](https://huggingface.co/blog/quanto-introduction) for more details

* [Quantization] Quanto quantizer by SunMarc in 29023
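
A minimal sketch of the integration, assuming `facebook/opt-125m` as the model to quantize:

python
from transformers import AutoModelForCausalLM, QuantoConfig

quantization_config = QuantoConfig(weights="int8")  # int8 weight quantization
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quantization_config,
    device_map="auto",
)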

Exllama 🤝 AWQ

Exllama kernels and AWQ are now combined for faster AWQ inference. Check out the relevant documentation section for more details on how to use Exllama + AWQ.

* Exllama kernels support for AWQ models by IlyasMoutawwakil in 28634
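
A rough sketch, assuming an AWQ checkpoint such as `TheBloke/Mistral-7B-Instruct-v0.2-AWQ` and the `version="exllama"` option on `AwqConfig`:

python
from transformers import AutoModelForCausalLM, AwqConfig

# version="exllama" switches the AWQ backend to the ExLlama kernels.
quantization_config = AwqConfig(version="exllama")
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    quantization_config=quantization_config,
    device_map="auto",
)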

MLX Support

Allow models saved or fine-tuned with Apple’s [MLX framework](https://github.com/ml-explore/mlx) to be loaded in transformers (as long as the model parameters use the same names), and improve tensor interoperability. This leverages MLX's adoption of [safetensors](https://huggingface.co/docs/safetensors/en/index) as their checkpoint format.

* Add mlx support to BatchEncoding.convert_to_tensors by Y4hL in 29406
* Add support for metadata format MLX by alexweberk in 29335
* Typo in mlx tensor support by pcuenca in 29509
* Experimental loading of MLX files by pcuenca in 29511
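
A small sketch of the tensor interoperability side, assuming the `mlx` package is installed (Apple silicon) and using `facebook/opt-125m` as a placeholder:

python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# "mlx" is now accepted as a tensor type for BatchEncoding.
batch = tokenizer("Hello from MLX", return_tensors="mlx")
print(type(batch["input_ids"]))  # mlx.core.array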

Highlighted improvements

Notable memory reduction in Gemma/LLaMa by changing the causal mask buffer type from int64 to boolean.

* Use `torch.bool` instead of `torch.int64` for non-persistant causal mask buffer by fxmarty in 29241

Remote code improvements

* Allow remote code repo names to contain "." by Rocketknight1 in 29175
* simplify get_class_in_module and fix for paths containing a dot by cebtenzzre in 29262

Breaking changes

The PRs below introduced slightly breaking changes that we believed were necessary for the repository; if these seem to impact your usage of transformers, we recommend checking out the PR descriptions to get more insight into how to leverage the new behavior.

* 🚨🚨[Whisper Tok] Update integration test by sanchit-gandhi in 29368
* 🚨 Fully revert atomic checkpointing 🚨 by muellerzr in 29370
* [BC 4.37 -> 4.38] for Llama family, memory and speed 29753 (causal mask is no longer a registered buffer)

Fixes and improvements

* FIX [`Gemma`] Fix bad rebase with transformers main by younesbelkada in 29170
* Add training version check for AQLM quantizer. by BlackSamorez in 29142
* [Gemma] Fix eager attention by sanchit-gandhi in 29187
* [Mistral, Mixtral] Improve docs by NielsRogge in 29084
* Fix `torch.compile` with `fullgraph=True` when `attention_mask` input is used by fxmarty in 29211
* fix(mlflow): check mlflow version to use the synchronous flag by cchen-dialpad in 29195
* Fix missing translation in README_ru by strikoder in 29054
* Improve _update_causal_mask performance by alessandropalla in 29210
* [`Doc`] update model doc qwen2 by ArthurZucker in 29238
* Use torch 2.2 for daily CI (model tests) by ydshieh in 29208
* Cache `is_vision_available` result by bmuskalla in 29280
* Use `DS_DISABLE_NINJA=1` by ydshieh in 29290
* Add `non_device_test` pytest mark to filter out non-device tests by fxmarty in 29213
* Add feature extraction mapping for automatic metadata update by merveenoyan in 28944
* Generate: v4.38 removals and related updates by gante in 29171
* Track each row separately for stopping criteria by zucchini-nlp in 29116
* [docs] Spanish translation of tasks_explained.md by aaronjimv in 29224
* [i18n-zh] Translated torchscript.md into Chinese by windsonsea in 29234
* 🌐 [i18n-ZH] Translate chat_templating.md into Chinese by shibing624 in 28790
* [i18n-vi] Translate README.md to Vietnamese by hoangsvit in 29229
* [i18n-zh] Translated task/asr.md into Chinese by windsonsea in 29233
* Fixed Deformable Detr typo when loading cuda kernels for MSDA by EduardoPach in 29294
* GenerationConfig validate both constraints and force_words_ids by FredericOdermatt in 29163
* Add generate kwargs to VQA pipeline by regisss in 29134
* Cleaner Cache `dtype` and `device` extraction for CUDA graph generation for quantizers compatibility by BlackSamorez in 29079
* Image Feature Extraction docs by merveenoyan in 28973
* Fix `attn_implementation` documentation by fxmarty in 29295
* [tests] enable benchmark unit tests on XPU by faaany in 29284
* Use torch 2.2 for deepspeed CI by ydshieh in 29246
* Add compatibility with skip_memory_metrics for mps device by SunMarc in 29264
* Token level timestamps for long-form generation in Whisper by zucchini-nlp in 29148
* Fix a few typos in `GenerationMixin`'s docstring by sadra-barikbin in 29277
* [i18n-zh] Translate fsdp.md into Chinese by windsonsea in 29305
* FIX [`Gemma` / `CI`] Make sure our runners have access to the model by younesbelkada in 29242
* Remove numpy usage from owlvit by fxmarty in 29326
* [`require_read_token`] fix typo by ArthurZucker in 29345
* [`T5 and Llama Tokenizer`] remove warning by ArthurZucker in 29346
* [`Llama ROPE`] Fix torch export but also slow downs in forward by ArthurZucker in 29198
* Disable Mixtral `output_router_logits` during inference by LeonardoEmili in 29249
* Idefics: generate fix by gante in 29320
* RoPE loses precision for Llama / Gemma + Gemma logits.float() by danielhanchen in 29285
* check if position_ids exists before using it by jiqing-feng in 29306
* [CI] Quantization workflow by SunMarc in 29046
* Better SDPA unmasking implementation by fxmarty in 29318
* [i18n-zh] Sync source/zh/index.md by windsonsea in 29331
* FIX [`CI` / `starcoder2`] Change starcoder2 path to correct one for slow tests by younesbelkada in 29359
* FIX [`CI`]: Fix failing tests for peft integration by younesbelkada in 29330
* FIX [`CI`] `require_read_token` in the llama FA2 test by younesbelkada in 29361
* Avoid using uncessary `get_values(MODEL_MAPPING)` by ydshieh in 29362
* Patch YOLOS and others by NielsRogge in 29353
* Fix require_read_token in tests by Wauplin in 29367
* Expose `offload_buffers` parameter of `accelerate` to `PreTrainedModel.from_pretrained` method by notsyncing in 28755
* Fix Base Model Name of LlamaForQuestionAnswering by lenglaender in 29258
* FIX [`quantization` / `ESM`] Fix ESM 8bit / 4bit with bitsandbytes by younesbelkada in 29329
* [`Llama + AWQ`] fix `prepare_inputs_for_generation` 🫠 by ArthurZucker in 29381
* [`YOLOS`] Fix - return padded annotations by amyeroberts in 29300
* Support subfolder with `AutoProcessor` by JingyaHuang in 29169
* Fix llama + gemma accelete tests by SunMarc in 29380
* Fix deprecated arg issue by muellerzr in 29372
* Correct zero division error in inverse sqrt scheduler by DavidAfonsoValente in 28982
* [tests] enable automatic speech recognition pipeline tests on XPU by faaany in 29308
* update path to hub files in the error message by poedator in 29369
* [Mixtral] Fixes attention masking in the loss by DesmonDay in 29363
* Workaround for 27758 to avoid ZeroDivisionError by tleyden in 28756
* Convert SlimSAM checkpoints by NielsRogge in 28379
* Fix: Fixed the previous tracking URI setting logic to prevent clashes with original MLflow code. by seanswyi in 29096
* Fix OneFormer `post_process_instance_segmentation` for panoptic tasks by nickthegroot in 29304
* Fix grad_norm unserializable tensor log failure by svenschultze in 29212
* Avoid edge case in audio utils by ylacombe in 28836
* DeformableDETR support bfloat16 by DonggeunYu in 29232
* [Docs] Spanish Translation -Torchscript md & Trainer md by njackman-2344 in 29310
* FIX [`Generation`] Fix some issues when running the MaxLength criteria on CPU by younesbelkada in 29317
* Fix max length for BLIP generation by zucchini-nlp in 29296
* [docs] Update starcoder2 paper link by xenova in 29418
* [tests] enable test_pipeline_accelerate_top_p on XPU by faaany in 29309
* [`UdopTokenizer`] Fix post merge imports by ArthurZucker in 29451
* more fix by ArthurZucker (direct commit on main)
* Revert-commit 0d52f9f582efb82a12e8d9162b43a01b1aa0200f by ArthurZucker in 29455
* [`Udop imports`] Processor tests were not run. by ArthurZucker in 29456
* Generate: inner decoding methods are no longer public by gante in 29437
* Fix bug with passing capture_* args to neptune callback by AleksanderWWW in 29041
* Update pytest `import_path` location by loadams in 29154
* Automatic safetensors conversion when lacking these files by LysandreJik in 29390
* [i18n-zh] Translate add_new_pipeline.md into Chinese by windsonsea in 29432
* 🌐 [i18n-KO] Translated generation_strategies.md to Korean by AI4Harmony in 29086
* [FIX] `offload_weight()` takes from 3 to 4 positional arguments but 5 were given by faaany in 29457
* [`Docs` / `Awq`] Add docs on exllamav2 + AWQ by younesbelkada in 29474
* [`docs`] Add starcoder2 docs by younesbelkada in 29454
* Fix TrainingArguments regression with torch <2.0.0 for dataloader_prefetch_factor by ringohoffman in 29447
* Generate: add tests for caches with `pad_to_multiple_of` by gante in 29462
* Generate: get generation mode from the generation config instance 🧼 by gante in 29441
* Avoid dummy token in PLD to optimize performance by ofirzaf in 29445
* Fix test failure on DeepSpeed by muellerzr in 29444
* Generate: torch.compile-ready generation config preparation by gante in 29443
* added the max_matching_ngram_size to GenerationConfig by mosheber in 29131
* Fix `TextGenerationPipeline.__call__` docstring by alvarobartt in 29491
* Substantially reduce memory usage in _update_causal_mask for large batches by using .expand instead of .repeat [needs tests+sanity check] by nqgl in 29413
* Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS device by currybab in 29439
* Enable BLIP for auto VQA by regisss in 29499
* v4.39 deprecations 🧼 by gante in 29492
* Revert "Automatic safetensors conversion when lacking these files by LysandreJik in 2…
* fix: Avoid error when fsdp_config is missing xla_fsdp_v2 by ashokponkumar in 29480
* Flava multimodal add attention mask by zucchini-nlp in 29446
* test_generation_config_is_loaded_with_model - fall back to pytorch model for now by amyeroberts in 29521
* Set `inputs` as kwarg in `TextClassificationPipeline` by alvarobartt in 29495
* Fix `VisionEncoderDecoder` Positional Arg by nickthegroot in 29497
* Generate: left-padding test, revisited by gante in 29515
* [tests] add the missing `require_sacremoses` decorator by faaany in 29504
* fix image-to-text batch incorrect output issue by sywangyi in 29342
* Typo fix in error message by clefourrier in 29535
* [tests] use `torch_device` instead of `auto` for model testing by faaany in 29531
* StableLM: Fix dropout argument type error by liangjs in 29236
* Make sliding window size inclusive in eager attention by jonatanklosko in 29519
* fix typos in FSDP config parsing logic in `TrainingArguments` by yundai424 in 29189
* Fix WhisperNoSpeechDetection when input is full silence by ylacombe in 29065
* [tests] use the correct `n_gpu` in `TrainerIntegrationTest::test_train_and_eval_dataloaders` for XPU by faaany in 29307
* Fix eval thread fork bomb by muellerzr in 29538
* feat: use `warning_advice` for tensorflow warning by winstxnhdw in 29540
* [`Mamba doc`] Post merge updates by ArthurZucker in 29472
* [`Docs`] fixed minor typo by j-gc in 29555
* Add Fill-in-the-middle training objective example - PyTorch by tanaymeh in 27464
* Bark model Flash Attention 2 Enabling to pass on check_device_map parameter to super() by damithsenanayake in 29357
* Make torch xla available on GPU by yitongh in 29334
* [Docs] Fix FastSpeech2Conformer model doc links by khipp in 29574
* Don't use a subset in test fetcher if on `main` branch by ydshieh in 28816
* fix error: TypeError: Object of type Tensor is not JSON serializable … by yuanzhoulvpi2017 in 29568
* Add missing localized READMEs to the copies check by khipp in 29575
* Fixed broken link by amritgupta98 in 29558
* Tiny improvement for doc by fzyzcjy in 29581
* Fix Fuyu doc typos by zucchini-nlp in 29601
* Fix minor typo: softare => software by DriesVerachtert in 29602
* Stop passing None to compile() in TF examples by Rocketknight1 in 29597
* Fix typo (determine) by koayon in 29606
* Implemented add_pooling_layer arg to TFBertModel by tomigee in 29603
* Update legacy Repository usage in various example files by Hvanderwilk in 29085
* Set env var to hold Keras at Keras 2 by Rocketknight1 in 29598
* Update flava tests by ydshieh in 29611
* Fix typo ; Update quantization.md by furkanakkurt1335 in 29615
* Add tests for batching support by zucchini-nlp in 29297
* Fix: handle logging of scalars in Weights & Biases summary by parambharat in 29612
* Examples: check `max_position_embeddings` in the translation example by gante in 29600
* [`Gemma`] Supports converting directly in half-precision by younesbelkada in 29529
* [Flash Attention 2] Add flash attention 2 for GPT-J by bytebarde in 28295
* Core: Fix copies on main by younesbelkada in 29624
* [Whisper] Deprecate forced ids for v4.39 by sanchit-gandhi in 29485
* Warn about tool use by LysandreJik in 29628
* Adds pretrained IDs directly in the tests by LysandreJik in 29534
* [generate] deprecate forced ids processor by sanchit-gandhi in 29487
* Fix minor typo: infenrece => inference by DriesVerachtert in 29621
* [`MaskFormer`, `Mask2Former`] Use einsum where possible by amyeroberts in 29544
* Llama: allow custom 4d masks by gante in 29618
* [PyTorch/XLA] Fix extra TPU compilations introduced by recent changes by alanwaketan in 29158
* [docs] Spanish translate chat_templating.md & yml addition by njackman-2344 in 29559
* Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by pacman100 in 29587
* [`Mask2Former`] Move normalization for numerical stability by amyeroberts in 29542
* [tests] make `test_trainer_log_level_replica` to run on accelerators with more than 2 devices by faaany in 29609
* Refactor TFP call to just sigmoid() by Rocketknight1 in 29641
* Fix batching tests for new models (Mamba and SegGPT) by zucchini-nlp in 29633
* Fix `multi_gpu_data_parallel_forward` for `MusicgenTest` by ydshieh in 29632
* [docs] Remove broken ChatML format link from chat_templating.md by aaronjimv in 29643
* Add newly added PVTv2 model to all README files. by robinverduijn in 29647
* [`PEFT`] Fix `save_pretrained` to make sure adapters weights are also saved on TPU by shub-kris in 29388
* Fix TPU checkpointing inside Trainer by shub-kris in 29657
* Add `dataset_revision` argument to `RagConfig` by ydshieh in 29610
* Fix PVT v2 tests by ydshieh in 29660
* Generate: handle `cache_position` update in `generate` by gante in 29467
* Allow apply_chat_template to pass kwargs to the template and support a dict of templates by Rocketknight1 in 29658
* Inaccurate code example within inline code-documentation by MysteryManav in 29661
* Extend import utils to cover "editable" torch versions by bhack in 29000
* Trainer: fail early in the presence of an unsavable `generation_config` by gante in 29675
* Pipeline: use tokenizer pad token at generation time if the model pad token is unset. by gante in 29614
* [tests] remove deprecated tests for model loading by faaany in 29450
* Fix AutoformerForPrediction example code by m-torhan in 29639
* [tests] ensure device-required software is available in the testing environment before testing by faaany in 29477
* Fix wrong condition used in `filter_models` by ydshieh in 29673
* fix: typos by testwill in 29653
* Rename `glue` to `nyu-mll/glue` by lhoestq in 29679
* Generate: replace breaks by a loop condition by gante in 29662
* [FIX] Fix speech2test modeling tests by ylacombe in 29672
* Revert "Fix wrong condition used in `filter_models`" by ydshieh in 29682
* [docs] Spanish translation of attention.md by aaronjimv in 29681
* CI / generate: batch size computation compatible with all models by gante in 29671
* Fix `filter_models` by ydshieh in 29710
* FIX [`bnb`] Make `unexpected_keys` optional by younesbelkada in 29420
* Update the pipeline tutorial to include `gradio.Interface.from_pipeline` by abidlabs in 29684
* Use logging.warning instead of warnings.warn in pipeline.__call__ by tokestermw in 29717

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* windsonsea
* [i18n-zh] Translated torchscript.md into Chinese (29234)
* [i18n-zh] Translated task/asr.md into Chinese (29233)
* [i18n-zh] Translate fsdp.md into Chinese (29305)
* [i18n-zh] Sync source/zh/index.md (29331)
* [i18n-zh] Translate add_new_pipeline.md into Chinese (29432)
* hoangsvit
* [i18n-vi] Translate README.md to Vietnamese (29229)
* EduardoPach
* Fixed Deformable Detr typo when loading cuda kernels for MSDA (29294)
* Adding SegGPT (27735)
* RaymondLi0
* Starcoder2 model - bis (29215)
* njackman-2344
* [Docs] Spanish Translation -Torchscript md & Trainer md (29310)
* [docs] Spanish translate chat_templating.md & yml addition (29559)
* tanaymeh
* Add Fill-in-the-middle training objective example - PyTorch (27464)
* Hvanderwilk
* Update legacy Repository usage in various example files (29085)
* FoamoftheSea
* Add PvT-v2 Model (26812)
* saurabhdash2512
* Cohere Model Release (29622)
