Transformers


2.2.0

New model architectures: ALBERT, CamemBERT, GPT2-XL, DistilRoberta

Four new models have been added in v2.2.0

- ALBERT (Pytorch & TF) (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
- CamemBERT (Pytorch) (from Facebook AI Research, INRIA, and La Sorbonne Université), as the first large-scale Transformer language model trained on French text. Released alongside the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah, and Benoît Sagot. It was added by louismartin with the help of julien-c.
- DistilRoberta (Pytorch & TF) from VictorSanh as the third distilled model after DistilBERT and DistilGPT-2.
- GPT-2 XL (Pytorch & TF) as the last GPT-2 checkpoint released by OpenAI

Encoder-Decoder architectures

We welcome the possibility to create fully seq2seq models by incorporating Encoder-Decoder architectures using a `PreTrainedEncoderDecoder` class that can be initialized from pre-trained models. The base BERT class has been modified so that it may behave as a decoder.

Furthermore, a `Model2Model` class that simplifies the definition of an encoder-decoder when both encoder and decoder are based on the same model has been added. rlouf
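
For illustration, a minimal sketch of the new classes (assuming the v2.2.0 `from_pretrained` signatures; the checkpoint names are only examples):

```python
from transformers import PreTrainedEncoderDecoder, Model2Model

# Encoder and decoder initialized from two pre-trained checkpoints
seq2seq = PreTrainedEncoderDecoder.from_pretrained("bert-base-uncased", "bert-base-uncased")

# Shortcut when encoder and decoder are based on the same model
model = Model2Model.from_pretrained("bert-base-uncased")
```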

Benchmarks and performance improvements

Work by tlkh and LysandreJik benchmarks the library's models with different technologies: with TensorFlow and Pytorch, with mixed precision (AMP and FP-16) and with model tracing (Torchscript and XLA). A new section was created in the documentation: [benchmarks](https://huggingface.co/transformers/benchmarks.html), pointing to Google sheets with the results.

Breaking changes

__Tokenizers now add special tokens by default.__ LysandreJik
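
For example (a minimal sketch; `bert-base-uncased` is only an illustrative checkpoint):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# [CLS] ... [SEP] are now inserted automatically
with_special = tokenizer.encode("Hello world")

# Opt out to recover the previous behaviour
without_special = tokenizer.encode("Hello world", add_special_tokens=False)
```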

New model templates

Model templates to ease the addition of new models to the library have been added. thomwolf

Inputs Embeddings

A new input has been added to all models' `forward` (for Pytorch) and `call` (for TensorFlow) methods. These `inputs_embeds` are a directly embedded representation of the inputs. This is useful as it gives more control over how `input_ids` indices are converted into associated vectors than the model's internal embedding lookup matrix. julien-c
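
A minimal sketch of passing `inputs_embeds` instead of `input_ids` (illustrative checkpoint, using the embedding getter described in the next section):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

input_ids = torch.tensor([tokenizer.encode("Hello world")])

# Compute the embeddings ourselves instead of letting the model look them up
inputs_embeds = model.get_input_embeddings()(input_ids)

outputs = model(inputs_embeds=inputs_embeds)
```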

Getters and setters for input and output embeddings

A new API for the input and output embeddings is available. These methods are model-independent and allow easy acquisition/modification of the models' embeddings. thomwolf
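
A minimal sketch, assuming the method names `get_input_embeddings`, `set_input_embeddings` and `get_output_embeddings`:

```python
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

word_embeddings = model.get_input_embeddings()  # torch.nn.Embedding
lm_head = model.get_output_embeddings()         # output projection tied to the embeddings

# The setter accepts any compatible embedding module
model.set_input_embeddings(word_embeddings)
```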

Additional architectures

New model architectures are available, namely `DistilBertForTokenClassification` and `CamembertForTokenClassification`. stefan-it

Community additions/bug-fixes/improvements

- The Fairseq RoBERTa model conversion script has been patched. louismartin
- einsum now runs in FP-16 in the library's examples slayton58
- In-depth work on the squad script for XLNet to reproduce the original paper's results hlums
- Additional improvements on the run_squad script by WilliamTambellini, orena1
- The run_generation script has seen several improvements by leo-du
- The RoBERTa TensorFlow model has been patched for several use cases: TPU and keras.fit LysandreJik
- The documentation is now versioned, links are available on the github readme LysandreJik
- The run_ner script has seen several improvements mmaybeno, oneraghavan, manansanghi
- The run_tf_glue script now works for all GLUE tasks LysandreJik
- The run_lm_finetuning script now correctly evaluates perplexity on MLM tasks altsoph
- An issue related to the XLM TensorFlow implementation's training has been fixed tlkh
- run_bertology has been updated to be closer to the run_glue example adrianbg
- Fixed added special tokens in decoded sequences LysandreJik
- Several performance improvements have been done to the tokenizers iedmrc
- A memory leak has been identified and patched in the library's schedulers rlouf
- Correct warning when encoding a sequence too long while specifying a maximum length LysandreJik
- Resizing the token embeddings now works as expected in the run_lm_finetuning script iedmrc
- The difference in versions between Pypi/source in order to run the examples has been clarified rlouf

2.1.1

New model architectures: CTRL, DistilGPT-2

Two new models have been added since release 2.0.

- CTRL (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858), by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher. This model has been added to the library by keskarnitish with the help of thomwolf.
- DistilGPT-2 (from HuggingFace), as the second distilled model after DistilBERT in version 1.2.0. Released alongside the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108)

Distillation

Several updates have been made to the distillation script, including the possibility to distill GPT-2 and to distill on the SQuAD task. By VictorSanh.

Pytorch TPU support

The `run_glue.py` example script can now run on a Pytorch TPU.

Updates to example scripts

Several example scripts have been improved and refactored to use the full potential of the new tokenizer functions:

- `run_multiple_choice.py` has been refactored to include `encode_plus` by julien-c and erenup
- `run_lm_finetuning.py` has been improved with the help of dennymarcels, jinoobaek-qz and LysandreJik
- `run_glue.py` has been improved with the help of brian41005

QOL enhancements on the tokenizer

Enhancements have been made to the tokenizers. Two new methods have been added: `get_special_tokens_mask` and `truncate_sequences`.

The former returns a mask indicating which tokens are special tokens in a token list and which are tokens from the initial sequences. The latter truncates sequences according to a strategy.

Both of those methods are called by the `encode_plus` method, which itself is called by the `encode` method. `encode_plus` now returns a larger dictionary which holds information about the special tokens, as well as the overflowing tokens.

Thanks to julien-c, thomwolf, and LysandreJik for these additions.
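
For example (a minimal sketch; the exact set of returned keys depends on the arguments passed):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode_plus builds the full model input for a sequence pair
encoded = tokenizer.encode_plus("How are you?", "I am fine, thanks.", add_special_tokens=True)
print(encoded["input_ids"])       # ids with the special tokens already inserted
print(encoded["token_type_ids"])  # 0 for the first sequence, 1 for the second

# get_special_tokens_mask: 1 marks a special token, 0 a token from the original sequences
ids_a = tokenizer.encode("How are you?", add_special_tokens=False)
ids_b = tokenizer.encode("I am fine, thanks.", add_special_tokens=False)
mask = tokenizer.get_special_tokens_mask(ids_a, ids_b)
```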

New German BERT models

- Support for new German BERT models (cased and uncased) from stefan-it dbmdz

Breaking changes

- The two methods `add_special_tokens_single_sequence` and `add_special_tokens_sequence_pair` have been removed. They have been replaced by the single method `build_inputs_with_special_tokens` which has a more comprehensible name and manages both sequence singletons and pairs.

- The boolean parameter `truncate_first_sequence` has been removed in tokenizers' `encode` and `encode_plus` methods, being replaced by a strategy in the form of a string: 'longest_first', 'only_second', 'only_first' or 'do_not_truncate' are accepted strategies.

- When the `encode` or `encode_plus` methods are called with a specified `max_length`, the sequences will now always be truncated or throw an error if overflowing.
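
For example, a minimal sketch of the new call pattern (assuming the strategy is passed through the `truncation_strategy` keyword):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer.encode_plus(
    "A fairly long question about the library?",
    "An even longer answer that would not fit within the budget below.",
    add_special_tokens=True,
    max_length=16,
    truncation_strategy="only_second",  # 'longest_first', 'only_first', 'do_not_truncate' also accepted
)
```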

Guidelines and requirements

New contributing guidelines have been added, alongside library development requirements by rlouf, the newest member of the HuggingFace team.

Community additions/bug-fixes/improvements

- GLUE Processors have been refactored to handle inputs for all tasks coming from the `tensorflow_datasets`. This work has been done by agrinh and philipp-eisen.
- The padding_idx is now correctly initialized to 1 in randomly initialized RoBERTa models. ikuyamada
- The documentation CSS has been adapted to work on older browsers. TimYagan
- An addition concerning the management of hidden states has been added to the README by BramVanroy.
- Integration of TF 2.0 models with other Keras modules thomwolf
- Past values can be opted-out thomwolf

2.1.0


4.41.0

v4.41.0 introduces a significant refactor of the Agents framework.

With this release, we allow you to build state-of-the-art agent systems, including the React Code Agent that writes its actions as code in ReAct iterations, following the insights from [Wang et al., 2024](https://huggingface.co/papers/2402.01030)

Just install with `pip install "transformers[agents]"`. Then you're good to go!

```py
from transformers import ReactCodeAgent

agent = ReactCodeAgent(tools=[])

code = """
list=[0, 1, 2]

for i in range(4):
    print(list(i))
"""

corrected_code = agent.run(
    "I have some code that creates a bug: please debug it and return the final code",
    code=code,
)
```


Quantization

New quant methods

In this release we support new quantization methods: HQQ & EETQ contributed by the community. Read more about how to quantize any transformers model using HQQ & EETQ in the [dedicated documentation section](https://huggingface.co/docs/transformers/quantization)

* Add HQQ quantization support by mobicham in https://github.com/huggingface/transformers/pull/29637
* [FEAT]: EETQ quantizer support by dtlzhuangz in https://github.com/huggingface/transformers/pull/30262
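
As a minimal sketch of on-the-fly HQQ quantization at load time (the checkpoint and `HqqConfig` arguments are only illustrative, and the `hqq` package must be installed; EETQ follows the same pattern with `EetqConfig`):

```python
import torch
from transformers import AutoModelForCausalLM, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)
```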

`dequantize` API for bitsandbytes models

In case you want to dequantize models that have been loaded with bitsandbytes, this is now possible through the `dequantize` API (e.g. to merge adapter weights)

* FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models by younesbelkada in https://github.com/huggingface/transformers/pull/30806

API-wise, you can achieve that with the following:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

model_id = "facebook/opt-125m"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.dequantize()

text = tokenizer("Hello my name is", return_tensors="pt").to(0)

out = model.generate(**text)
print(tokenizer.decode(out[0]))
```


Generation updates

* Add Watermarking LogitsProcessor and WatermarkDetector by zucchini-nlp in https://github.com/huggingface/transformers/pull/29676
* Cache: Static cache as a standalone object by gante in https://github.com/huggingface/transformers/pull/30476
* Generate: add `min_p` sampling by gante in https://github.com/huggingface/transformers/pull/30639 (see the sketch after this list)
* Make `Gemma` work with `torch.compile` by ydshieh in https://github.com/huggingface/transformers/pull/30775
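
As a minimal sketch of the new `min_p` sampling option (illustrative checkpoint and values):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Hello my name is", return_tensors="pt")

# min_p keeps only tokens whose probability is at least min_p times that of the most likely token
out = model.generate(**inputs, do_sample=True, min_p=0.1, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```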

SDPA support

* [`BERT`] Add support for sdpa by hackyon in https://github.com/huggingface/transformers/pull/28802
* Add sdpa and fa2 the Wav2vec2 family. by kamilakesbi in https://github.com/huggingface/transformers/pull/30121
* add sdpa to ViT [follow up of 29325] by hyenal in https://github.com/huggingface/transformers/pull/30555
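
For example, opting into SDPA explicitly on one of the newly supported architectures (a minimal sketch):

```python
from transformers import AutoModel

# Use PyTorch's scaled_dot_product_attention kernels for BERT
model = AutoModel.from_pretrained("bert-base-uncased", attn_implementation="sdpa")
```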

Improved Object Detection

Addition of fine-tuning script for object detection models

* Fix YOLOS image processor resizing by qubvel in https://github.com/huggingface/transformers/pull/30436
* Add examples for detection models finetuning by qubvel in https://github.com/huggingface/transformers/pull/30422
* Add installation of examples requirements in CI by qubvel in https://github.com/huggingface/transformers/pull/30708
* Update object detection guide by qubvel in https://github.com/huggingface/transformers/pull/30683

Interpolation of embeddings for vision models

Add interpolation of embeddings. This enables predictions from pretrained models on input images of sizes different from those the model was originally trained on. Simply pass `interpolate_pos_encoding=True` when calling the model.

Added for: BLIP, BLIP 2, InstructBLIP, SigLIP, ViViT

```py
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

image = Image.open(requests.get("https://huggingface.co/hf-internal-testing/blip-test-image/resolve/main/demo.jpg", stream=True).raw)
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    torch_dtype=torch.float16
).to("cuda")
inputs = processor(images=image, size={"height": 500, "width": 500}, return_tensors="pt").to("cuda")

predictions = model.generate(**inputs, interpolate_pos_encoding=True)
generated_text = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
# Generated text: "a woman and dog on the beach"
```


* Blip dynamic input resolution by zafstojano in https://github.com/huggingface/transformers/pull/30722
* Add dynamic resolution input/interpolate position embedding to SigLIP by davidgxue in https://github.com/huggingface/transformers/pull/30719
* Enable dynamic resolution for vivit by jla524 in https://github.com/huggingface/transformers/pull/30630


🚨 might be breaking
* 🚨🚨🚨Deprecate `evaluation_strategy` to `eval_strategy`🚨🚨🚨 by muellerzr in https://github.com/huggingface/transformers/pull/30190 (see the sketch after this list)
* 🚨 Add training compatibility for Musicgen-like models by ylacombe in https://github.com/huggingface/transformers/pull/29802
* 🚨 Update image_processing_vitmatte.py by rb-synth in https://github.com/huggingface/transformers/pull/30566
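
For example, the new spelling in `TrainingArguments` (a minimal sketch; the old `evaluation_strategy` name is now deprecated):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",
    eval_steps=500,
)
```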

Cleanups
* Remove task guides auto-update in favor of links towards task pages by LysandreJik in https://github.com/huggingface/transformers/pull/30429
* Remove add-new-model in favor of add-new-model-like by LysandreJik in https://github.com/huggingface/transformers/pull/30424
* Remove mentions of models in the READMEs and link to the documentation page in which they are featured. by LysandreJik in https://github.com/huggingface/transformers/pull/30420

Not breaking but important for Llama tokenizers
* [`LlamaTokenizerFast`] Refactor default llama by ArthurZucker in https://github.com/huggingface/transformers/pull/28881


Fixes

* Fix missing `prev_ci_results` by ydshieh in https://github.com/huggingface/transformers/pull/30313
* Fix: remove `pad token id` in pipeline forward arguments by zucchini-nlp in https://github.com/huggingface/transformers/pull/30285
* fix Parameter dtype in audio models by ylacombe in https://github.com/huggingface/transformers/pull/30310
* disable use_cache if using gradient checkpointing by chenzizhao in https://github.com/huggingface/transformers/pull/30320
* Fix test transposing image with EXIF Orientation tag by albertvillanova in https://github.com/huggingface/transformers/pull/30319
* Avoid `jnp` import in `utils/generic.py` by ydshieh in https://github.com/huggingface/transformers/pull/30322
* Fix `AssertionError` in clip conversion script by ydshieh in https://github.com/huggingface/transformers/pull/30321
* [UDOP] Add special tokens to tokenizer by NielsRogge in https://github.com/huggingface/transformers/pull/29594
* Enable multi-device for some models by jla524 in https://github.com/huggingface/transformers/pull/30207
* feat: Upgrade Weights & Biases callback by parambharat in https://github.com/huggingface/transformers/pull/30135
* [Feature Extractors] Fix kwargs to pre-trained by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30260
* Pipeline: fix `pad_token_id` again by zucchini-nlp in https://github.com/huggingface/transformers/pull/30338
* [Whisper] Fix slow tests by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30152
* parallel job limit for doctest by ydshieh in https://github.com/huggingface/transformers/pull/30342
* Transformers Metadata by LysandreJik in https://github.com/huggingface/transformers/pull/30344
* Deprecate default chat templates by Rocketknight1 in https://github.com/huggingface/transformers/pull/30346
* Restore casting of masked_spec_embed by ylacombe in https://github.com/huggingface/transformers/pull/30336
* Update unwrap from accelerate by SunMarc in https://github.com/huggingface/transformers/pull/29933
* Do not remove half seq length in generation tests by zucchini-nlp in https://github.com/huggingface/transformers/pull/30016
* Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained by hiyouga in https://github.com/huggingface/transformers/pull/30299
* Add TF swiftformer by joaocmd in https://github.com/huggingface/transformers/pull/23342
* [Grounding DINO] Add resources by NielsRogge in https://github.com/huggingface/transformers/pull/30232
* Nits for model docs by merveenoyan in https://github.com/huggingface/transformers/pull/29795
* Enable multi-device for more models by jla524 in https://github.com/huggingface/transformers/pull/30379
* GenerationConfig: warn if pad token is negative by zucchini-nlp in https://github.com/huggingface/transformers/pull/30187
* Add FSDP config for CPU RAM efficient loading through accelerate by helloworld1 in https://github.com/huggingface/transformers/pull/30002
* `Llama` family, fix `use_cache=False` generation by ArthurZucker in https://github.com/huggingface/transformers/pull/30380
* Update docstrings for text generation pipeline by Rocketknight1 in https://github.com/huggingface/transformers/pull/30343
* Terminator strings for generate() by Rocketknight1 in https://github.com/huggingface/transformers/pull/28932
* Fix layerwise GaLore optimizer hard to converge with warmup scheduler by hiyouga in https://github.com/huggingface/transformers/pull/30372
* Jamba: fix left-padding test by gante in https://github.com/huggingface/transformers/pull/30389
* Fix DETA save_pretrained by qubvel in https://github.com/huggingface/transformers/pull/30326
* FIX / PEFT: Pass device correctly to peft by younesbelkada in https://github.com/huggingface/transformers/pull/30397
* [docs] LLM inference by stevhliu in https://github.com/huggingface/transformers/pull/29791
* show `-rs` to show skip reasons by ArthurZucker in https://github.com/huggingface/transformers/pull/30318
* Add inputs embeds in generation by zucchini-nlp in https://github.com/huggingface/transformers/pull/30269
* [Grounding DINO] Add support for cross-attention in GroundingDinoMultiHeadAttention by EduardoPach in https://github.com/huggingface/transformers/pull/30364
* remove redundant logging from longformer by riklopfer in https://github.com/huggingface/transformers/pull/30365
* fix: link to HF repo/tree/revision when a file is missing by mapmeld in https://github.com/huggingface/transformers/pull/30406
* [tests] add `require_torch_sdpa` for test that needs sdpa support by faaany in https://github.com/huggingface/transformers/pull/30408
* Jax: scipy version pin by gante in https://github.com/huggingface/transformers/pull/30402
* Fix on "cache position" for assisted generation by zucchini-nlp in https://github.com/huggingface/transformers/pull/30068
* fix for itemsize => element_size() for torch backwards compat by winglian in https://github.com/huggingface/transformers/pull/30133
* Make EosTokenCriteria compatible with mps by pcuenca in https://github.com/huggingface/transformers/pull/30376
* FIX: re-add bnb on docker image by younesbelkada in https://github.com/huggingface/transformers/pull/30427
* Fix LayoutLMv2 init issue and doctest by ydshieh in https://github.com/huggingface/transformers/pull/30278
* Remove old TF port docs by Rocketknight1 in https://github.com/huggingface/transformers/pull/30426
* Rename torch.run to torchrun by steven-basart in https://github.com/huggingface/transformers/pull/30405
* Fix use_cache for xla fsdp by alanwaketan in https://github.com/huggingface/transformers/pull/30353
* [`LlamaTokenizerFast`] Refactor default llama by ArthurZucker in https://github.com/huggingface/transformers/pull/28881
* New model PR needs green (slow tests) CI by ydshieh in https://github.com/huggingface/transformers/pull/30341
* Add llama3 by ArthurZucker in https://github.com/huggingface/transformers/pull/30334
* [`Llava`] + CIs fix red cis and llava integration tests by ArthurZucker in https://github.com/huggingface/transformers/pull/30440
* [tests] make test device-agnostic by faaany in https://github.com/huggingface/transformers/pull/30444
* fix uncaught init of linear layer in clip's/siglip's for image classification models by vasqu in https://github.com/huggingface/transformers/pull/30435
* fix jamba slow foward for multi-gpu by SunMarc in https://github.com/huggingface/transformers/pull/30418
* [SegGPT] Fix loss calculation by EduardoPach in https://github.com/huggingface/transformers/pull/30421
* Add `paths` filter to avoid the chance of being triggered by ydshieh in https://github.com/huggingface/transformers/pull/30453
* Fix wrong indent in `utils/check_if_new_model_added.py` by ydshieh in https://github.com/huggingface/transformers/pull/30456
* [`research_project`] Most of the security issues come from this requirement.txt by ArthurZucker in https://github.com/huggingface/transformers/pull/29977
* Neuron: When save_safetensor=False, no need to move model to CPU by jeffhataws in https://github.com/huggingface/transformers/pull/29703
* Enable fp16 on CPU by muellerzr in https://github.com/huggingface/transformers/pull/30459
* Non blocking support to torch DL's by muellerzr in https://github.com/huggingface/transformers/pull/30465
* consistent job / pytest report / artifact name correspondence by ydshieh in https://github.com/huggingface/transformers/pull/30392
* Workflow / ENH: Add SSH into our runners workflow by younesbelkada in https://github.com/huggingface/transformers/pull/30425
* FIX / Workflow: Change tailscale trigger condition by younesbelkada in https://github.com/huggingface/transformers/pull/30471
* FIX / Workflow: Fix SSH workflow bug by younesbelkada in https://github.com/huggingface/transformers/pull/30474
* [fix codellama conversion] by ArthurZucker in https://github.com/huggingface/transformers/pull/30472
* Script for finding candidate models for deprecation by amyeroberts in https://github.com/huggingface/transformers/pull/29686
* Fix SigLip classification doctest by amyeroberts in https://github.com/huggingface/transformers/pull/30475
* Don't run fp16 MusicGen tests on CPU by amyeroberts in https://github.com/huggingface/transformers/pull/30466
* Prevent crash with `WandbCallback` with third parties by tomaarsen in https://github.com/huggingface/transformers/pull/30477
* Add WSD scheduler by visheratin in https://github.com/huggingface/transformers/pull/30231
* Fix Issue 29817 Video Classification Task Guide Using Undeclared Variables by manju-rangam in https://github.com/huggingface/transformers/pull/30457
* Make accelerate install non-torch dependent by muellerzr in https://github.com/huggingface/transformers/pull/30463
* Introduce Stateful Callbacks by muellerzr in https://github.com/huggingface/transformers/pull/29666
* Fix Llava for 0-embeddings by zucchini-nlp in https://github.com/huggingface/transformers/pull/30473
* Do not use deprecated `SourceFileLoader.load_module()` in dynamic module loading by XuehaiPan in https://github.com/huggingface/transformers/pull/30370
* Add sidebar tutorial for chat models by Rocketknight1 in https://github.com/huggingface/transformers/pull/30401
* Quantization: `HfQuantizer` quant method update by younesbelkada in https://github.com/huggingface/transformers/pull/30484
* [docs] Spanish translation of pipeline_tutorial.md by aaronjimv in https://github.com/huggingface/transformers/pull/30252
* FEAT: PEFT support for EETQ by younesbelkada in https://github.com/huggingface/transformers/pull/30449
* Fix the `bitsandbytes` error formatting ("Some modules are dispatched on ...") by kyo-takano in https://github.com/huggingface/transformers/pull/30494
* Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types by mgoin in https://github.com/huggingface/transformers/pull/30488
* Use the Keras set_random_seed in tests by Rocketknight1 in https://github.com/huggingface/transformers/pull/30504
* Remove skipping logic now that set_epoch exists by muellerzr in https://github.com/huggingface/transformers/pull/30501
* [`DETR`] Remove timm hardcoded logic in modeling files by amyeroberts in https://github.com/huggingface/transformers/pull/29038
* [examples] update whisper fine-tuning by sanchit-gandhi in https://github.com/huggingface/transformers/pull/29938
* Fix GroundingDINO, DPR after BERT SDPA update by amyeroberts in https://github.com/huggingface/transformers/pull/30506
* load_image - decode b64encode and encodebytes strings by amyeroberts in https://github.com/huggingface/transformers/pull/30192
* [SegGPT] Fix seggpt image processor by EduardoPach in https://github.com/huggingface/transformers/pull/29550
* Fix link in dbrx.md by eitanturok in https://github.com/huggingface/transformers/pull/30509
* Allow boolean FSDP options in fsdp_config by helloworld1 in https://github.com/huggingface/transformers/pull/30439
* Pass attn_implementation when using AutoXXX.from_config by amyeroberts in https://github.com/huggingface/transformers/pull/30507
* Fix broken link to Transformers notebooks by clinty in https://github.com/huggingface/transformers/pull/30512
* Update runner tag for PR slow CI by ydshieh in https://github.com/huggingface/transformers/pull/30535
* Fix repo. fetch/checkout in PR slow CI job by ydshieh in https://github.com/huggingface/transformers/pull/30537
* Reenable SDPA's FA2 During Training with torch.compile by warner-benjamin in https://github.com/huggingface/transformers/pull/30442
* Include safetensors as part of `_load_best_model` by muellerzr in https://github.com/huggingface/transformers/pull/30553
* Pass `use_cache` in kwargs for GPTNeoX by zucchini-nlp in https://github.com/huggingface/transformers/pull/30538
* Enable multi-device for more models by jla524 in https://github.com/huggingface/transformers/pull/30409
* Generate: update links on LLM tutorial doc by gante in https://github.com/huggingface/transformers/pull/30550
* DBRX: make fixup by gante in https://github.com/huggingface/transformers/pull/30578
* Fix seq2seq collator padding by vasqu in https://github.com/huggingface/transformers/pull/30556
* BlipModel: get_multimodal_features method by XavierSpycy in https://github.com/huggingface/transformers/pull/30438
* Add chat templating support for KeyDataset in text-generation pipeline by DarshanDeshpande in https://github.com/huggingface/transformers/pull/30558
* Fix generation doctests by zucchini-nlp in https://github.com/huggingface/transformers/pull/30263
* General PR slow CI by ydshieh in https://github.com/huggingface/transformers/pull/30540
* Remove `use_square_size` after loading by ydshieh in https://github.com/huggingface/transformers/pull/30567
* Use text config's vocab size in testing models by zucchini-nlp in https://github.com/huggingface/transformers/pull/30568
* Encoder-decoder models: move embedding scale to nn.Module by zucchini-nlp in https://github.com/huggingface/transformers/pull/30410
* Fix Marian model conversion by zucchini-nlp in https://github.com/huggingface/transformers/pull/30173
* Refactor default chat template warnings by Rocketknight1 in https://github.com/huggingface/transformers/pull/30551
* Fix QA example by Rocketknight1 in https://github.com/huggingface/transformers/pull/30580
* remove jax example by ArthurZucker in https://github.com/huggingface/transformers/pull/30498
* Fix canonical model --model_type in examples by amyeroberts in https://github.com/huggingface/transformers/pull/30480
* Gemma: update activation warning by pcuenca in https://github.com/huggingface/transformers/pull/29995
* Bump gitpython from 3.1.32 to 3.1.41 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30587
* Fix image segmentation example - don't reopen image by amyeroberts in https://github.com/huggingface/transformers/pull/30481
* Improve object detection task guideline by NielsRogge in https://github.com/huggingface/transformers/pull/29967
* Generate: remove deprecated public decoding functions and streamline logic 🧼 by gante in https://github.com/huggingface/transformers/pull/29956
* Fix llava half precision and autocast issues by frasermince in https://github.com/huggingface/transformers/pull/29721
* Fix: failing CI after 30568 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30599
* Fix for Neuron by michaelbenayoun in https://github.com/huggingface/transformers/pull/30259
* Fix memory leak with CTC training script on Chinese languages by lucky-bai in https://github.com/huggingface/transformers/pull/30358
* Fix copies for DBRX - neuron fix by amyeroberts in https://github.com/huggingface/transformers/pull/30610
* fix:missing `output_router_logits` in SwitchTransformers by lausannel in https://github.com/huggingface/transformers/pull/30573
* Use `contiguous()` in clip checkpoint conversion script by ydshieh in https://github.com/huggingface/transformers/pull/30613
* phi3 chat_template does not support system role by amitportnoy in https://github.com/huggingface/transformers/pull/30606
* Docs: fix `generate`-related rendering issues by gante in https://github.com/huggingface/transformers/pull/30600
* Docs: add missing `StoppingCriteria` autodocs by gante in https://github.com/huggingface/transformers/pull/30617
* Generate: fix `SinkCache` on Llama models by gante in https://github.com/huggingface/transformers/pull/30581
* Fix FX tracing issues for Llama by michaelbenayoun in https://github.com/huggingface/transformers/pull/30619
* Output `None` as attention when layer is skipped by jonghwanhyeon in https://github.com/huggingface/transformers/pull/30597
* Fix CI after 30410 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30612
* add mlp bias for llama models by mayank31398 in https://github.com/huggingface/transformers/pull/30031
* Fix W&B run name by qubvel in https://github.com/huggingface/transformers/pull/30462
* HQQ: PEFT support for HQQ by younesbelkada in https://github.com/huggingface/transformers/pull/30632
* Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters by yting27 in https://github.com/huggingface/transformers/pull/30362
* Avoid duplication in PR slow CI model list by ydshieh in https://github.com/huggingface/transformers/pull/30634
* [`CI update`] Try to use dockers and no cache by ArthurZucker in https://github.com/huggingface/transformers/pull/29202
* Check if the current compiled version of pytorch supports MPS by jiaqianjing in https://github.com/huggingface/transformers/pull/30664
* Hotfix-change-ci by ArthurZucker in https://github.com/huggingface/transformers/pull/30669
* Quantization / HQQ: Fix HQQ tests on our runner by younesbelkada in https://github.com/huggingface/transformers/pull/30668
* Fix llava next tie_word_embeddings config by SunMarc in https://github.com/huggingface/transformers/pull/30640
* Trainer._load_from_checkpoint - support loading multiple Peft adapters by claralp in https://github.com/huggingface/transformers/pull/30505
* Trainer - add cache clearing and the option for batched eval metrics computation by FoamoftheSea in https://github.com/huggingface/transformers/pull/28769
* Fix typo: llama3.md by mimbres in https://github.com/huggingface/transformers/pull/30653
* Respect `resume_download` deprecation by Wauplin in https://github.com/huggingface/transformers/pull/30620
* top-k instead of top-p in MixtralConfig docstring by sorgfresser in https://github.com/huggingface/transformers/pull/30687
* Bump jinja2 from 3.1.3 to 3.1.4 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30680
* Bump werkzeug from 3.0.1 to 3.0.3 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30679
* Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True by hackyon in https://github.com/huggingface/transformers/pull/29024
* Fix `cache_position` initialisation for generation with `use_cache=False` by nurlanov-zh in https://github.com/huggingface/transformers/pull/30485
* Word-level timestamps broken for short-form audio by kamilakesbi in https://github.com/huggingface/transformers/pull/30325
* Updated docs of `forward` in `Idefics2ForConditionalGeneration` with correct `ignore_index` value by zafstojano in https://github.com/huggingface/transformers/pull/30678
* Bump tqdm from 4.63.0 to 4.66.3 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30646
* Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/visual_bert by dependabot in https://github.com/huggingface/transformers/pull/30645
* Reboot Agents by aymeric-roucher in https://github.com/huggingface/transformers/pull/30387
* Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/lxmert by dependabot in https://github.com/huggingface/transformers/pull/30644
* Separate tokenizer tests by ArthurZucker in https://github.com/huggingface/transformers/pull/30675
* Update `workflow_id` in `utils/get_previous_daily_ci.py` by ydshieh in https://github.com/huggingface/transformers/pull/30695
* Rename artifact name `prev_ci_results` to `ci_results` by ydshieh in https://github.com/huggingface/transformers/pull/30697
* Add safetensors to model not found error msg for default use_safetensors value by davidgxue in https://github.com/huggingface/transformers/pull/30602
* Pin deepspeed by muellerzr in https://github.com/huggingface/transformers/pull/30701
* Patch CLIP image preprocessor by rootonchair in https://github.com/huggingface/transformers/pull/30698
* [BitsandBytes] Verify if GPU is available by NielsRogge in https://github.com/huggingface/transformers/pull/30533
* Llava: remove dummy labels by zucchini-nlp in https://github.com/huggingface/transformers/pull/30706
* Immutability for data collators by vasqu in https://github.com/huggingface/transformers/pull/30603
* Cache: models return input cache type by gante in https://github.com/huggingface/transformers/pull/30716
* Removal of deprecated maps by LysandreJik in https://github.com/huggingface/transformers/pull/30576
* Fix image post-processing for OWLv2 by jla524 in https://github.com/huggingface/transformers/pull/30686
* KV cache is no longer a model attribute by zucchini-nlp in https://github.com/huggingface/transformers/pull/30730
* Generate: consistently handle special tokens as tensors by gante in https://github.com/huggingface/transformers/pull/30624
* Update CodeLlama references by osanseviero in https://github.com/huggingface/transformers/pull/30218
* [docs] Update es/pipeline_tutorial.md by aaronjimv in https://github.com/huggingface/transformers/pull/30684
* Update llama3.md, fix typo by mimbres in https://github.com/huggingface/transformers/pull/30739
* mlp_only_layers is more flexible than decoder_sparse_step by eigen2017 in https://github.com/huggingface/transformers/pull/30552
* PEFT / Trainer: Make use of `model.active_adapters()` instead of deprecated `model.active_adapter` whenever possible by younesbelkada in https://github.com/huggingface/transformers/pull/30738
* [docs] Update link in es/pipeline_webserver.md by aaronjimv in https://github.com/huggingface/transformers/pull/30745
* hqq - fix weight check in check_quantized_param by mobicham in https://github.com/huggingface/transformers/pull/30748
* [awq] replace scale when we have GELU by SunMarc in https://github.com/huggingface/transformers/pull/30074
* Workflow: Replace `actions/post-slack` with centrally defined workflow by younesbelkada in https://github.com/huggingface/transformers/pull/30737
* [GroundingDino] Adding ms_deform_attn kernels by EduardoPach in https://github.com/huggingface/transformers/pull/30768
* Llama: fix custom 4D masks, v2 by poedator in https://github.com/huggingface/transformers/pull/30348
* Generation / FIX: Fix multi-device generation by younesbelkada in https://github.com/huggingface/transformers/pull/30746
* Qwen: incorrect setup flag by gante in https://github.com/huggingface/transformers/pull/30776
* enable Pipeline to get device from model by faaany in https://github.com/huggingface/transformers/pull/30534
* [Object detection pipeline] Lower threshold by NielsRogge in https://github.com/huggingface/transformers/pull/30710
* Generate: remove near-duplicate sample/greedy copy by gante in https://github.com/huggingface/transformers/pull/30773
* Port IDEFICS to tensorflow by a8nova in https://github.com/huggingface/transformers/pull/26870
* Generate: assistant should be greedy in assisted decoding by gante in https://github.com/huggingface/transformers/pull/30778
* Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) by ydshieh in https://github.com/huggingface/transformers/pull/30699
* Deprecate models script by amyeroberts in https://github.com/huggingface/transformers/pull/30184
* skip low_cpu_mem_usage tests by SunMarc in https://github.com/huggingface/transformers/pull/30782
* CI: update to ROCm 6.0.2 and test MI300 by fxmarty in https://github.com/huggingface/transformers/pull/30266
* Fix OWLv2 Doc by jla524 in https://github.com/huggingface/transformers/pull/30794
* Fix cache type in Idefics2 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30729
* PEFT: Access active_adapters as a property in Trainer by pashminacameron in https://github.com/huggingface/transformers/pull/30790
* CI: more models wo cache support by gante in https://github.com/huggingface/transformers/pull/30780
* Deprecate TF weight conversion since we have full Safetensors support now by Rocketknight1 in https://github.com/huggingface/transformers/pull/30786
* [T5] Adding `model_parallel = False` to `T5ForTokenClassification` and `MT5ForTokenClassification` by retarfi in https://github.com/huggingface/transformers/pull/30763
* Added the necessay import of module by ankur0904 in https://github.com/huggingface/transformers/pull/30804
* Add support for custom checkpoints in MusicGen by jla524 in https://github.com/huggingface/transformers/pull/30011
* Add missing dependencies in image classification example by jla524 in https://github.com/huggingface/transformers/pull/30820
* Support mixed-language batches in `WhisperGenerationMixin` by cifkao in https://github.com/huggingface/transformers/pull/29688
* Remove unused module DETR based models by conditionedstimulus in https://github.com/huggingface/transformers/pull/30823
* Jamba - Skip 4d custom attention mask test by amyeroberts in https://github.com/huggingface/transformers/pull/30826
* Missing `Optional` in typing. by xkszltl in https://github.com/huggingface/transformers/pull/30821
* Update ds_config_zero3.json by pacman100 in https://github.com/huggingface/transformers/pull/30829
* Better llava next. by nxphi47 in https://github.com/huggingface/transformers/pull/29850
* Deprecate models script - correctly set the model name for the doc file by amyeroberts in https://github.com/huggingface/transformers/pull/30785
* Use `torch 2.3` for CI by ydshieh in https://github.com/huggingface/transformers/pull/30837
* Fix llama model sdpa attention forward function masking bug when output_attentions=True by Aladoro in https://github.com/huggingface/transformers/pull/30652
* [LLaVa-NeXT] Small fixes by NielsRogge in https://github.com/huggingface/transformers/pull/30841
* [Idefics2] Improve docs, add resources by NielsRogge in https://github.com/huggingface/transformers/pull/30717
* Cache: add new flag to distinguish models that `Cache` but not static cache by gante in https://github.com/huggingface/transformers/pull/30800
* Disable the FA backend for SDPA on AMD GPUs by mht-sharma in https://github.com/huggingface/transformers/pull/30850
* Video-LLaVa: Fix docs by zucchini-nlp in https://github.com/huggingface/transformers/pull/30855
* Docs: update example with assisted generation + sample by gante in https://github.com/huggingface/transformers/pull/30853
* TST / Quantization: Reverting to torch==2.2.1 by younesbelkada in https://github.com/huggingface/transformers/pull/30866
* Fix VideoLlava imports by amyeroberts in https://github.com/huggingface/transformers/pull/30867
* TEST: Add llama logits tests by younesbelkada in https://github.com/huggingface/transformers/pull/30835
* Remove deprecated logic and warnings by amyeroberts in https://github.com/huggingface/transformers/pull/30743
* Enable device map by darshana1406 in https://github.com/huggingface/transformers/pull/30870
* Fix dependencies for image classification example by jla524 in https://github.com/huggingface/transformers/pull/30842
* [whisper] fix multilingual fine-tuning by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30865
* update release script by ArthurZucker in https://github.com/huggingface/transformers/pull/30880

New Contributors
* joaocmd made their first contribution in https://github.com/huggingface/transformers/pull/23342
* kamilakesbi made their first contribution in https://github.com/huggingface/transformers/pull/30121
* dtlzhuangz made their first contribution in https://github.com/huggingface/transformers/pull/30262
* steven-basart made their first contribution in https://github.com/huggingface/transformers/pull/30405
* manju-rangam made their first contribution in https://github.com/huggingface/transformers/pull/30457
* kyo-takano made their first contribution in https://github.com/huggingface/transformers/pull/30494
* mgoin made their first contribution in https://github.com/huggingface/transformers/pull/30488
* eitanturok made their first contribution in https://github.com/huggingface/transformers/pull/30509
* clinty made their first contribution in https://github.com/huggingface/transformers/pull/30512
* warner-benjamin made their first contribution in https://github.com/huggingface/transformers/pull/30442
* XavierSpycy made their first contribution in https://github.com/huggingface/transformers/pull/30438
* DarshanDeshpande made their first contribution in https://github.com/huggingface/transformers/pull/30558
* frasermince made their first contribution in https://github.com/huggingface/transformers/pull/29721
* lucky-bai made their first contribution in https://github.com/huggingface/transformers/pull/30358
* rb-synth made their first contribution in https://github.com/huggingface/transformers/pull/30566
* lausannel made their first contribution in https://github.com/huggingface/transformers/pull/30573
* jonghwanhyeon made their first contribution in https://github.com/huggingface/transformers/pull/30597
* mobicham made their first contribution in https://github.com/huggingface/transformers/pull/29637
* yting27 made their first contribution in https://github.com/huggingface/transformers/pull/30362
* jiaqianjing made their first contribution in https://github.com/huggingface/transformers/pull/30664
* claralp made their first contribution in https://github.com/huggingface/transformers/pull/30505
* mimbres made their first contribution in https://github.com/huggingface/transformers/pull/30653
* sorgfresser made their first contribution in https://github.com/huggingface/transformers/pull/30687
* nurlanov-zh made their first contribution in https://github.com/huggingface/transformers/pull/30485
* zafstojano made their first contribution in https://github.com/huggingface/transformers/pull/30678
* davidgxue made their first contribution in https://github.com/huggingface/transformers/pull/30602
* rootonchair made their first contribution in https://github.com/huggingface/transformers/pull/30698
* eigen2017 made their first contribution in https://github.com/huggingface/transformers/pull/30552
* Nilabhra made their first contribution in https://github.com/huggingface/transformers/pull/30771
* a8nova made their first contribution in https://github.com/huggingface/transformers/pull/26870
* pashminacameron made their first contribution in https://github.com/huggingface/transformers/pull/30790
* retarfi made their first contribution in https://github.com/huggingface/transformers/pull/30763
* yikangshen made their first contribution in https://github.com/huggingface/transformers/pull/30005
* ankur0904 made their first contribution in https://github.com/huggingface/transformers/pull/30804
* conditionedstimulus made their first contribution in https://github.com/huggingface/transformers/pull/30823
* nxphi47 made their first contribution in https://github.com/huggingface/transformers/pull/29850
* Aladoro made their first contribution in https://github.com/huggingface/transformers/pull/30652
* hyenal made their first contribution in https://github.com/huggingface/transformers/pull/30555
* darshana1406 made their first contribution in https://github.com/huggingface/transformers/pull/30870

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.40.2...v4.41.0

2.0.0

Name change: welcome 🤗 Transformers

Following the extension to TensorFlow 2.0, `pytorch-transformers` => `transformers`

Install with `pip install transformers`

Also, note that PyTorch is **no longer in the requirements so don't forget to install TensorFlow 2.0 and/or PyTorch** to be able to use (and load) the models.

TensorFlow 2.0 - PyTorch

All the PyTorch `nn.Module` classes now have their counterpart in TensorFlow 2.0 as `tf.keras.Model` classes. TensorFlow 2.0 classes have the same name as their PyTorch counterparts prefixed with `TF`.

The interoperability between TensorFlow and PyTorch is actually **a lot deeper** than what is usually meant when talking about libraries with multiple backends:
- each model (not just the static computation graph) can be seamlessly moved from one framework to the other during the lifetime of the model for training/evaluation/usage (`from_pretrained` can load weights saved from models saved in one or the other framework),
- an example is given in the quick-tour on TF 2.0 and PyTorch in the readme in which a model is trained using keras.fit before being opened in PyTorch for quick debugging/inspection.
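
A minimal sketch of loading the same weights with both classes (illustrative checkpoint and local path):

```python
from transformers import BertModel, TFBertModel

# Save weights with the PyTorch class...
BertModel.from_pretrained("bert-base-uncased").save_pretrained("./bert-pt")

# ...and reload the very same weights with the TF 2.0 counterpart
tf_model = TFBertModel.from_pretrained("./bert-pt", from_pt=True)
```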

Remaining unsupported operations in TF 2.0 (to be added later):
- resizing input embeddings to add new tokens
- pruning model heads

TPU support

Training on TPU using free TPUs provided in the TensorFlow Research Cloud (TFRC) program is possible but requires implementing a custom training loop (not possible with keras.fit at the moment). We will add an example of such a custom training loop soon.

Improved tokenizers

Tokenizers have been improved to provide the extended encoding method `encode_plus` and additional arguments to `encode`. Please refer to the doc for detailed usage of the new options.

Breaking changes

Positional order of some model keywords inputs changed (better TorchScript support)

To be able to better use Torchscript both on CPU and GPU (see 1010, 1204 and 1195), the specific order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.

If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any breaking change.

If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you should double-check the exact order of input arguments.

Dependency requirements have changed

PyTorch is no longer in the requirements so don't forget to install TensorFlow 2.0 and/or PyTorch to be able to use (and load) the models.

Renamed method

The method `add_special_tokens_sentence_pair` has been renamed to the more appropriate name `add_special_tokens_sequence_pair`.
The same holds true for the method `add_special_tokens_single_sentence` which has been changed to `add_special_tokens_single_sequence`.

Community additions/bug-fixes/improvements
- new German model (Timoeller)
- new script for MultipleChoice training (SWAG, RocStories...) (erenup)
- better fp16 support (ziliwang and bryant1410)
- fix evaluation in run_lm_finetuning (SKRohit)
- fix LM finetuning to prevent crashing on assert len(tokens_b)>=1 (searchivarius)
- Various doc and docstring fixes (sshleifer, Maxpa1n, mattolson93, t080)

4.37.0

The last version to support PyTorch 1.10 was 4.36.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.11 and up, we do not support PyTorch 1.10 for v4.37 (i.e. we don't run the tests against torch 1.10).

* Byebye torch 1.10 by ydshieh in 28207

Model tagging

You can now add custom tags into your model before pushing it to the Hub! This enables you to filter models that contain that tag on the Hub with a simple URL filter. For example, if you want to filter models that have the `trl` tag you can search: https://huggingface.co/models?other=trl&sort=created

* [`core`/ FEAT] Add the possibility to push custom tags using `PreTrainedModel` itself by younesbelkada in 28405 - e.g.

```python
from transformers import AutoModelForCausalLM

model_name = "HuggingFaceM4/tiny-random-LlamaForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_name)

model.add_model_tags(["tag-test"])
model.push_to_hub("llama-tagged")
```


Bugfixes and improvements

* Fix PatchTSMixer Docstrings by vijaye12 in 27943
* use logger.warning_once to avoid massive outputs by ranchlai in 27428
* Docs for AutoBackbone & Backbone by merveenoyan in 27456
* Fix test for auto_find_batch_size on multi-GPU by muellerzr in 27947
* Update import message by NielsRogge in 27946
* Fix parameter count in readme for mixtral 45b by CyberTimon in 27945
* In PreTrainedTokenizerBase add missing word in error message by petergtz in 27949
* Fix AMD scheduled CI not triggered by ydshieh in 27951
* Add deepspeed test to amd scheduled CI by echarlaix in 27633
* Fix a couple of typos and add an illustrative test by rjenc29 in 26941
* fix bug in mask2former: cost matrix is infeasible by xuchenhao001 in 27897
* Fix for stochastic depth decay rule in the TimeSformer implementation by atawari in 27875
* fix no sequence length models error by AdamLouly in 27522
* [`Mixtral`] Change mistral op order by younesbelkada in 27955
* Update bounding box format everywhere by NielsRogge in 27944
* Support PeftModel signature inspect by dancingpipi in 27865
* fixed typos (issue 27919) by asusevski in 27920
* Hot-fix-mixstral-loss by ArthurZucker in 27948
* Fix link in README.md of Image Captioning by saswatmeher in 27969
* Better key error for AutoConfig by Rocketknight1 in 27976
* [doc] fix typo by stas00 in 27981
* fix typo in dvclive callback by dberenbaum in 27983
* [`Tokenizer Serialization`] Fix the broken serialisation by ArthurZucker in 27099
* [`Whisper`] raise better errors by ArthurZucker in 27971
* Fix PatchTSMixer slow tests by ajati in 27997
* [`CI slow`] Fix expected values by ArthurZucker in 27999
* Fix bug with rotating checkpoints by muellerzr in 28009
* [Doc] Spanish translation of glossary.md by aaronjimv in 27958
* Add model_docs from cpmant.md to derformable_detr.md by rajveer43 in 27884
* well well well by ArthurZucker in 28011
* [`SeamlessM4TTokenizer`] Safe import by ArthurZucker in 28026
* [`core` / `modeling`] Fix training bug with PEFT + GC by younesbelkada in 28031
* Fix AMD push CI not triggered by ydshieh in 28029
* SeamlessM4T: `test_retain_grad_hidden_states_attentions` is flaky by gante in 28035
* Fix languages covered by M4Tv2 by ylacombe in 28019
* Fixed spelling error in T5 tokenizer warning message (s/thouroughly/t… by jeddobson in 28014
* Generate: Mistral/Mixtral FA2 cache fix when going beyond the context window by gante in 28037
* [Seamless] Fix links in docs by sanchit-gandhi in 27905
* Remove warning when Annotion enum is created by amyeroberts in 28048
* [`FA-2`] Fix fa-2 issue when passing `config` to `from_pretrained` by younesbelkada in 28043
* [`Modeling` / `Mixtral`] Fix GC + PEFT issues with Mixtral by younesbelkada in 28061
* [Flax BERT] Update deprecated 'split' method by sanchit-gandhi in 28012
* [Flax LLaMA] Fix attn dropout by sanchit-gandhi in 28059
* Remove SpeechT5 deprecated argument by ylacombe in 28062
* doc: Correct spelling mistake by caiyili in 28064
* [`Mixtral`] update conversion script to reflect new changes by younesbelkada in 28068
* Skip M4T `test_retain_grad_hidden_states_attentions` by ylacombe in 28060
* [LLaVa] Add past_key_values to _skip_keys_device_placement to fix multi-GPU dispatch by aismlv in 28051
* Make GPT2 traceable in meta state by kwen2501 in 28054
* Fix bug for checkpoint saving on multi node training setting by dumpmemory in 28078
* Update fixtures-image-utils by lhoestq in 28080
* Fix `low_cpu_mem_usage` Flag Conflict with DeepSpeed Zero 3 in `from_pretrained` for Models with `keep_in_fp32_modules`" by kotarotanahashi in 27762
* Fix wrong examples in llava usage. by Lyken17 in 28020
* [docs] Trainer by stevhliu in 27986
* [docs] MPS by stevhliu in 28016
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT by pacman100 in 27891
* Fix the deprecation warning of _torch_pytree._register_pytree_node by cyyever in 27803
* Spelling correction by saeneas in 28110
* in peft finetune, only the trainable parameters need to be saved by sywangyi in 27825
* fix ConversationalPipeline docstring by not-lain in 28091
* Disable jitter noise during evaluation in SwitchTransformers by DaizeDong in 28077
* Remove warning if `DISABLE_TELEMETRY` is used by Wauplin in 28113
* Fix indentation error - semantic_segmentation.md by rajveer43 in 28117
* [docs] General doc fixes by stevhliu in 28087
* Fix a typo in tokenizer documentation by mssalvatore in 28118
* [Doc] Fix token link in What 🤗 Transformers can do by aaronjimv in 28123
* When save a model on TPU, make a copy to be moved to CPU by qihqi in 27993
* Update split string in doctest to reflect 28087 by amyeroberts in 28135
* [`Mixtral`] Fix loss + nits by ArthurZucker in 28115
* Update modeling_utils.py by mzelling in 28127
* [docs] Fix mistral link in mixtral.md by aaronjimv in 28143
* Remove deprecated CPU dockerfiles by ashahba in 28149
* Fix FA2 integration by pacman100 in 28142
* [gpt-neox] Add attention_bias config to support model trained without attention biases by dalgarak in 28126
* move code to Trainer.evaluate to enable use of that function with multiple datasets by peter-sk in 27844
* Fix weights not properly initialized due to shape mismatch by ydshieh in 28122
* Avoid unnecessary warnings when loading `CLIPConfig` by ydshieh in 28108
* Update FA2 exception msg to point to hub discussions by amyeroberts in 28161
* Align backbone stage selection with out_indices & out_features by amyeroberts in 27606
* [docs] Trainer docs by stevhliu in 28145
* Fix yolos resizing by amyeroberts in 27663
* disable test_retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest by dwyatte in 28169
* Fix `input_embeds` docstring in encoder-decoder architectures by gante in 28168
* [Whisper] Use torch for stft if available by sanchit-gandhi in 26119
* Fix slow backbone tests - out_indices must match stage name ordering by amyeroberts in 28186
* Update YOLOS slow test values by amyeroberts in 28187
* Update `docs/source/en/perf_infer_gpu_one.md` by ydshieh in 28198
* Fix ONNX export for causal LM sequence classifiers by removing reverse indexing by dwyatte in 28144
* Add Swinv2 backbone by NielsRogge in 27742
* Fix: [SeamlessM4T - S2TT] Bug in batch loading of audio in torch.Tensor format in the SeamlessM4TFeatureExtractor class by nicholasneo78 in 27914
* Bug: `training_args.py` fix missing import with accelerate with version `accelerate==0.20.1` by michaelfeil in 28171
* Fix the check of models supporting FA/SDPA not run by ydshieh in 28202
* Drop `feature_extractor_type` when loading an image processor file by ydshieh in 28195
* [Whisper] Fix word-level timestamps with bs>1 or num_beams>1 by ylacombe in 28114
* Fixing visualization code for object detection to support both types of bounding box. by Anindyadeep in 27842
* update the logger message with accordant weights_file_name by izyForever in 28181
* [`Llava`] Fix llava index errors by younesbelkada in 28032
* fix FA2 when using quantization by pacman100 in 28203
* small typo by stas00 in 28229
* Update docs around mixing hf scheduler with deepspeed optimizer by dwyatte in 28223
* Fix trainer saving safetensors: metadata is None by hiyouga in 28219
* fix bug:divide by zero in _maybe_log_save_evaluate() by frankenliu in 28251
* [Whisper] Fix errors with MPS backend introduced by new code on word-level timestamps computation by ercaronte in 28288
* Remove fast tokenization warning in Data Collators by dbuos in 28213
* fix documentation for zero_shot_object_detection by not-lain in 28267
* Remove token_type_ids from model_input_names (like 24788) by Apsod in 28325
* Translate contributing.md into Chinese by Mayfsz in 28243
* [docs] Sort es/toctree.yml | Translate performance.md by aaronjimv in 28262
* Fix error in M4T feature extractor by ylacombe in 28340
* README: install transformers from conda-forge channel by kevherro in 28313
* Don't check the device when device_map=auto by yuanwu2017 in 28351
* Fix pos_mask application and update tests accordingly by ferjorosa in 27892
* fix FA2 when using quantization for remaining models by susnato in 28341
* Update VITS modeling to enable ONNX export by echarlaix in 28141
* chore: Fix typo s/exclusivelly/exclusively/ by hugo-syn in 28361
* Enhancing Code Readability and Maintainability with Simplified Activation Function Selection. by hi-sushanta in 28349
* Fix building alibi tensor when num_heads is not a power of 2 by abuelnasr0 in 28380
* remove two deprecated function by statelesshz in 28220
* Bugfix / ffmpeg input device (mic) not working on Windows by Teapack1 in 27051
* [AttentionMaskConverter] fix sdpa unmask unattended by zspo in 28369
* Remove shell=True from subprocess.Popen to Mitigate Security Risk by avimanyu786 in 28299
* Add segmentation map processing to SAM Image Processor by rwood-97 in 27463
* update warning for image processor loading by ydshieh in 28209
* Fix initialization for missing parameters in `from_pretrained` under ZeRO-3 by XuehaiPan in 28245
* Fix `_merge_input_ids_with_image_features` for llava model by VictorSanh in 28333
* Use mmap option to load_state_dict by weimingzha0 in 28331
* [BUG] `BarkEosPrioritizerLogitsProcessor`: using a list for `eos_token_id` causes a tensor size mismatch by inkinworld in 28201
* Skip now failing test in the Trainer tests by muellerzr in 28421
* Support `DeepSpeed` when using auto find batch size by muellerzr in 28088
* Fix number of models in README.md by prasatee in 28430
* CI: limit natten version by gante in 28432
* Fix for checkpoint rename race condition by tblattner in 28364
* Fix load correct tokenizer in Mixtral model documentation by JuanFKurucz in 28437
* [docstring] Fix docstring for ErnieConfig, ErnieMConfig by Sparty in 27029
* [Whisper] Fix slow test by patrickvonplaten in 28407
* Assistant model may be on a different device by jiqing-feng in 27995
* Enable multi-label image classification in pipeline by amyeroberts in 28433
* Optimize the speed of the truncate_sequences function. by ikkvix in 28263
* Use python 3.10 for docbuild by ydshieh in 28399
* Fix docker file by ydshieh in 28452
* Set `cache_dir` for `evaluate.load()` in example scripts by aphedges in 28422
* Optionally preprocess segmentation maps for MobileViT by harisankar95 in 28420
* Correctly resolve trust_remote_code=None for AutoTokenizer by Rocketknight1 in 28419
* Fix load balancing loss func for mixtral by liangxuZhang in 28256
* Doc by jiqing-feng in 28431
* Fix docstring checker issues with PIL enums by Rocketknight1 in 28450
* Fix broken link on page by keenranger in 28451
* Mark two logger tests as flaky by amyeroberts in 28458
* Update metadata loading for oneformer by amyeroberts in 28398
* Fix torch.ones usage in xlnet by sungho-ham in 28471
* Generate: deprecate old public functions by gante in 28478
* Docs: add model paths by gante in 28475
* Generate: refuse to save bad generation config files by gante in 28477
* TF: purge `TFTrainer` by gante in 28483
* Fix docstrings and update docstring checker error message by Rocketknight1 in 28460
* Change progress logging to once across all nodes by siddartha-RE in 28373
* Generate: fix candidate device placement by gante in 28493
* Fix paths to AI Sweden Models reference and model loading by JuanFKurucz in 28423
* [`chore`] Update warning text, a word was missing by tomaarsen in 28017
* Don't set `finetuned_from` if it is a local path by ydshieh in 28482
* Add the XPU device check for pipeline mode by yuanwu2017 in 28326
* Tokenizer kwargs in text-generation pipeline by thedamnedrhino in 28362
* [GPTQ] Fix test by SunMarc in 28018
* Fixed minor typos by rishit5 in 28489
* Add a use_safetensors arg to TFPreTrainedModel.from_pretrained() by Rocketknight1 in 28511 (see the loading sketch after this list)
* Generate: consolidate output classes by gante in 28494
* fix: sampling in flax keeps EOS by borisdayma in 28378
* improve dev setup comments and hints by 4imothy in 28495
* SiLU activation wrapper for safe importing by amyeroberts in 28509
* Remove `task` arg in `load_dataset` in image-classification example by regisss in 28408
* Improving Training Performance and Scalability Documentation by HamzaFB in 28497
* Fix mismatching loading in from_pretrained with/without accelerate by fxmarty in 28414
* Fix/speecht5 bug by NimaYaqmuri in 28481
* [`TokenizationUtils`] Fix `add_special_tokens` when the token is already there by ArthurZucker in 28520
* [`TokenizationRoformerFast`] Fix the save and loading by ArthurZucker in 28527
* [`SpeechT5Tokenization`] Add copied from and fix the `convert_tokens_to_string` to match the fast decoding scheme by ArthurZucker in 28522
* Clearer error for SDPA when explicitly requested by fxmarty in 28006
* Add is_model_supported for fx by inisis in 28521
* Config: warning when saving generation kwargs in the model config by gante in 28514
* [Makefile] Exclude research projects from format by patrickvonplaten in 28551
* symbolic_trace: add past_key_values, llama, sdpa support by fxmarty in 28447
* Allow training dinov2 with different dtypes like bf16 by StarCycle in 28504
* Fix Switch Transformers when sparse_step = 1 by agemagician in 28564
* Save `Processor` by ydshieh in 27761
* Use `weights_only` only if torch >= 1.13 by ydshieh in 28506
* [`Core Tokenization`] Support a fix for spm fast models by ArthurZucker in 26678
* Use `LoggingLevel` context manager in 3 tests by ydshieh in 28575
* Fix the documentation checkpoint for xlm-roberta-xl by jeremyfowers in 28567
* [ASR Pipe] Update init to set model type and subsequently call parent init method by sanchit-gandhi in 28486
* [Whisper Tok] Move token ids to CPU when computing offsets by sanchit-gandhi in 28485
* [Whisper] Fix audio classification with weighted layer sum by sanchit-gandhi in 28563
* Making CTC training example more general by ylacombe in 28582
* Don't save `processor_config.json` if a processor has no extra attribute by ydshieh in 28584
* Fix wrong xpu device in DistributedType.MULTI_XPU mode by faaany in 28386
* [GPTNeoX] Fix BC issue with 4.36 by ArthurZucker in 28602
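
To illustrate the GPT-NeoX change referenced above (28126), here is a minimal sketch, assuming a transformers version in which `GPTNeoXConfig` exposes the new `attention_bias` flag; the tiny model dimensions are illustrative only and do not correspond to any released checkpoint.

```python
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

# Minimal sketch: build a small GPT-NeoX model whose attention layers carry no bias terms,
# matching checkpoints that were trained without attention biases.
config = GPTNeoXConfig(
    vocab_size=1024,        # illustrative toy sizes, not a real checkpoint
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=256,
    attention_bias=False,   # the option added in 28126
)
model = GPTNeoXForCausalLM(config)
```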
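Similarly, the new `use_safetensors` argument on `TFPreTrainedModel.from_pretrained()` (28511) can be exercised roughly as below; this sketch assumes TensorFlow is installed, network access is available, and safetensors weights exist for the example checkpoint.

```python
from transformers import TFAutoModel

# Sketch: explicitly request safetensors weights when loading a TensorFlow model.
# "bert-base-uncased" is only an example checkpoint; if a repo has no safetensors
# weights, loading with use_safetensors=True is expected to fail rather than
# silently fall back to the legacy .h5 weights.
model = TFAutoModel.from_pretrained("bert-base-uncased", use_safetensors=True)
```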

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* aaronjimv
* [Doc] Spanish translation of glossary.md (27958)
* [Doc] Fix token link in What 🤗 Transformers can do (28123)
* [docs] Fix mistral link in mixtral.md (28143)
* [docs] Sort es/toctree.yml | Translate performance.md (28262)
* rajveer43
* Add model_docs from cpmant.md to deformable_detr.md (27884)
* Fix indentation error - semantic_segmentation.md (28117)
* poedator
* 4D `attention_mask` support (27539); see the sketch after this list
* [bnb] Let's make serialization of 4bit models possible (26037)
* connor-henderson
* Add FastSpeech2Conformer (23439)
* JustinLin610
* Add qwen2 (28436)
* SangbumChoi
* enable training mask2former and maskformer for transformers trainer (28277)
* [DETA] Improvement and Sync from DETA especially for training (27990)
* fix auxiliary loss training in DetrSegmentation (28354)
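
As a rough illustration of the 4D `attention_mask` support listed above (27539), here is a hedged sketch; the tiny randomly initialised Llama configuration, the eager attention setting, and the hand-built causal mask are all assumptions made to keep the example self-contained.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Hedged sketch: pass a custom 4D attention mask of shape
# (batch_size, 1, query_length, key_length) directly to the forward pass.
config = LlamaConfig(
    vocab_size=1000, hidden_size=64, intermediate_size=128,
    num_hidden_layers=2, num_attention_heads=4, num_key_value_heads=4,
    attn_implementation="eager",  # stay on the eager path for this illustration
)
model = LlamaForCausalLM(config)

input_ids = torch.randint(0, config.vocab_size, (1, 5))
seq_len = input_ids.shape[1]

# Hand-built lower-triangular (causal) mask; 1.0 = attend, 0.0 = masked.
mask_4d = torch.tril(torch.ones(1, 1, seq_len, seq_len))

outputs = model(input_ids=input_ids, attention_mask=mask_4d)
print(outputs.logits.shape)  # (1, 5, vocab_size)
```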
