Transformers


4.43.2

- Fix float8_e4m3fn in modeling_utils (32193)
- Fix resize embedding with Deepspeed (32192)
- let's not warn when someone is running a forward (32176)
- RoPE: relaxed rope validation (32182)

4.43.1

- fix (32162)

4.43.0

Llama

The Llama 3.1 models are released by Meta and come in three flavours: 8B, 70B, and 405B.

To get an overview of Llama 3.1, please visit the [Hugging Face announcement blog post](https://huggingface.co/blog/llama31).

We release a [repository of llama recipes](https://github.com/huggingface/huggingface-llama-recipes) to showcase usage for inference as well as full and partial fine-tuning of the different variants.

![image](https://github.com/user-attachments/assets/4b5bf1e0-647c-428d-8f88-691bc343c53d)
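
As a quick, hedged illustration (the gated `meta-llama/Meta-Llama-3.1-8B-Instruct` repository id is assumed; adapt it to the variant you have access to), loading and prompting a Llama 3.1 checkpoint looks like any other Llama model:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Repository id assumed here; gated checkpoints require accepting Meta's license on the Hub.
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Build a chat-style prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Give me one fun fact about llamas."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```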

Chameleon

The Chameleon model was proposed in [Chameleon: Mixed-Modal Early-Fusion Foundation Models](https://arxiv.org/abs/2405.09818v1) by the META AI Chameleon Team. Chameleon is a Vision-Language Model that uses vector quantization to tokenize images, which enables the model to generate multimodal output. The model takes images and text as input, including in an interleaved format, and generates textual responses.

* Chameleon: add model by zucchini-nlp in 31534
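
A hedged usage sketch, assuming the `facebook/chameleon-7b` checkpoint and the `<image>` placeholder convention from the model docs:

```py
import requests
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

# Checkpoint id and prompt format are assumptions; check the model card for the exact usage.
model_id = "facebook/chameleon-7b"
processor = ChameleonProcessor.from_pretrained(model_id)
model = ChameleonForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
prompt = "What do you see in this image?<image>"  # <image> marks where the image tokens are inserted

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))
```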

ZoeDepth

The ZoeDepth model was proposed in [ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth](https://arxiv.org/abs/2302.12288) by Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller. ZoeDepth extends the [DPT](https://huggingface.co/docs/transformers/main/en/model_doc/dpt) framework for metric (also called absolute) depth estimation. ZoeDepth is pre-trained on 12 datasets using relative depth and fine-tuned on two domains (NYU and KITTI) using metric depth. A lightweight head is used with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier.

* Add ZoeDepth by NielsRogge in 30136
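
A minimal sketch using the depth-estimation pipeline; the `Intel/zoedepth-nyu-kitti` checkpoint id is an assumption, so check the ZoeDepth docs for the published checkpoints:

```py
import requests
from PIL import Image
from transformers import pipeline

# Checkpoint id is an assumption; ZoeDepth checkpoints fine-tuned on NYU, KITTI, or both exist on the Hub.
depth_estimator = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

result = depth_estimator(image)
result["depth"].save("depth.png")        # the pipeline returns a PIL image of the predicted depth map
print(result["predicted_depth"].shape)   # raw depth tensor
```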

Hiera

Hiera was proposed in [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/abs/2306.00989) by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, and Christoph Feichtenhofer.

The paper introduces “Hiera,” a hierarchical Vision Transformer that simplifies the architecture of modern hierarchical vision transformers by removing unnecessary components without compromising on accuracy or efficiency. Unlike traditional transformers that add complex vision-specific components to improve supervised classification performance, Hiera demonstrates that such additions, often termed “bells-and-whistles,” are not essential for high accuracy. By leveraging a strong visual pretext task (MAE) for pretraining, Hiera retains simplicity and achieves superior accuracy and speed both in inference and training across various image and video recognition tasks. The approach suggests that spatial biases required for vision tasks can be effectively learned through proper pretraining, eliminating the need for added architectural complexity.

* Adding hiera by Namangarg110 in 30356
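
A hedged image-classification sketch; the checkpoint id below is an assumption, so refer to the Hiera docs for the checkpoints actually released:

```py
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, HieraForImageClassification

# Checkpoint id is an assumption; see the Hiera model docs for the published checkpoints.
model_id = "facebook/hiera-tiny-224-in1k-hf"
processor = AutoImageProcessor.from_pretrained(model_id)
model = HieraForImageClassification.from_pretrained(model_id)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```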

Agents

Our ReactAgent has a specific way to return its final output: it calls the tool final_answer, added to the user-defined toolbox upon agent initialization, with the answer as the tool argument. We found that even for a one-shot agent like CodeAgent, using a specific final_answer tool helps the llm_engine figure out what to return, so we generalized the final_answer tool to all agents; see the sketch after the PR link below.

* Adds final answer tool for all agents by aymeric-roucher in 31703
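
A minimal sketch of the resulting behavior, assuming the `transformers.agents` API of this release with `HfEngine` and `ReactCodeAgent` (the engine class and model id are assumptions; any `llm_engine` callable works):

```py
from transformers.agents import HfEngine, ReactCodeAgent

# HfEngine wraps a chat model served through the Hugging Face Inference API
# (engine class and model id are assumptions; adapt to the agents docs of your version).
llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")

# The final_answer tool is added to the toolbox automatically at initialization,
# so the agent always has an explicit way to return its result.
agent = ReactCodeAgent(tools=[], llm_engine=llm_engine)
result = agent.run("How many seconds are there in a leap year?")
print(result)
```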

Now if your code-based agent (like ReactCodeAgent) defines a function at step 1, it will remember the function definition indefinitely. This means your agent can create its own tools for later re-use!

* Code agent: allow function persistence between steps by aymeric-roucher in 31769

This is a transformative PR: it lets the agent periodically run a dedicated step to plan its actions in advance. It is activated by setting an integer planning_interval upon agent initialization. At step 0, an initial plan is produced; at later steps (steps 3, 6, 9 if you set planning_interval=3), the agent updates this plan based on the history of previous steps. More detail soon! A sketch follows the PR link below.

* Agents planning by aymeric-roucher in 31702
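
A hedged sketch of how planning is enabled (same API assumptions as in the previous snippet):

```py
from transformers.agents import HfEngine, ReactCodeAgent

# planning_interval=3 means: plan at step 0, then refresh the plan every 3 steps (steps 3, 6, 9, ...).
# Engine class and model id are assumptions; any llm_engine callable works.
agent = ReactCodeAgent(
    tools=[],
    llm_engine=HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct"),
    planning_interval=3,
)

agent.run("Find the three largest prime factors of 2024 and report their sum.")
```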

Notable changes to the codebase

A significant RoPE refactor was done to make it model agnostic and more easily adaptable to any architecture.
It is only applied to Llama for now but will be applied to all models using RoPE over the coming days.

* Llama: RoPE refactor by gante in 32135
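
A hedged sketch of what the model-agnostic interface looks like from the config side, assuming the refactored `rope_scaling` dict with a `rope_type` key (older configs used `type`) and taking "linear" scaling as an illustrative variant:

```py
from transformers import AutoConfig, AutoModelForCausalLM

# Assumption: after the refactor, the RoPE variant is selected via the `rope_scaling` dict
# in the config; "linear" scaling with factor 2.0 is just an illustrative choice.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")
config.rope_scaling = {"rope_type": "linear", "factor": 2.0}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", config=config)
```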

Breaking changes

TextGenerationPipeline and tokenizer kwargs

🚨🚨 This PR changes `TextGenerationPipeline` to rely on the tokenizer's defaults when these flags are unset. Previously, some models used through the pipeline did not add a `<bos>` token by default, which (negatively) impacted their performance; they now follow the tokenizer's defaults. In practice, this is a breaking change.

Example of a script whose behavior changes as a result of this PR:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it", torch_dtype=torch.bfloat16, device_map="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Foo bar"))
```


* 🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs by gante in 31747


Bugfixes and improvements

* Fix post gemma merge by ArthurZucker in 31660
* Fix float out of range in owlvit and owlv2 when using FP16 or lower precision by aliencaocao in 31657
* [docs] Llama3 by stevhliu in 31662
* [HybridCache] Fix `get_seq_length` method by sanchit-gandhi in 31661
* don't zero out the attention_mask when using sliding window with flash attention by winglian in 31670
* Fix Gemma2 4d attention mask by hiyouga in 31674
* Fix return_dict in encodec by jla524 in 31646
* add gather_use_object arguments by SangbumChoi in 31514
* Gemma capping is a must for big models by ArthurZucker in 31698
* Add French version of run scripts tutorial by jadechoghari in 31483
* dependencies: `keras-nlp<0.14` pin by gante in 31684
* remove incorrect urls pointing to the llava repository by BiliBraker in 31107
* Move some test files (`tets/test_xxx_utils.py`) to `tests/utils` by ydshieh in 31730
* Fix mistral ONNX export by fxmarty in 31696
* [whisper] static kv cache by sanchit-gandhi in 31166
* Make tool JSON schemas consistent by Rocketknight1 in 31756
* Fix documentation for Gemma2. by jbornschein in 31682
* fix assisted decoding by jiqing-feng in 31401
* Requires for torch.tensor before casting by echarlaix in 31755
* handle (processor_class, None) returned by ModelPatterns by molbap in 31753
* Gemma 2: Update slow tests by gante in 31759
* Add ignore_errors=True to trainer.py rmtree in _inner_training_loop by njbrake in 31668
* [fix bug] logits's shape different from label's shape in preprocess_logits_for_metrics by wiserxin in 31447
* Fix RT-DETR cache for generate_anchors by qubvel in 31671
* Fix RT-DETR weights initialization by qubvel in 31724
* `pytest_num_workers=4` for some CircleCI jobs by ydshieh in 31764
* Fix Gemma2 types by hiyouga in 31779
* Add torch_empty_cache_steps to TrainingArguments by aliencaocao in 31546
* Fix ClapProcessor to merge feature_extractor output into the returned BatchEncoding by mxkopy in 31767
* Fix serialization for offloaded model by SunMarc in 31727
* Make tensor device correct when ACCELERATE_TORCH_DEVICE is defined by kiszk in 31751
* Exclude torch.compile time from metrics computation by zxd1997066 in 31443
* Update CometCallback to allow reusing of the running experiment by Lothiraldan in 31366
* Fix gemma tests by ydshieh in 31794
* Add training support for SigLIP by aliencaocao in 31495
* Repeating an important warning in the chat template docs by Rocketknight1 in 31796
* Allow FP16 or other precision inference for Pipelines by aliencaocao in 31342
* Fix galore lr display with schedulers by vasqu in 31710
* Fix Wav2Vec2 Fairseq conversion (weight norm state dict keys) by gau-nernst in 31714
* Depth Anything: update conversion script for V2 by pcuenca in 31522
* Fix Seq2SeqTrainer crash when BatchEncoding data is None by iohub in 31418
* Bump certifi from 2023.7.22 to 2024.7.4 in /examples/research_projects/decision_transformer by dependabot[bot] in 31813
* Add FA2 and `sdpa` support for SigLIP by qubvel in 31499
* Bump transformers from 4.26.1 to 4.38.0 in /examples/tensorflow/language-modeling-tpu by dependabot[bot] in 31837
* Bump certifi from 2023.7.22 to 2024.7.4 in /examples/research_projects/lxmert by dependabot[bot] in 31838
* Fix typos by omahs in 31819
* transformers.fx.symbolic_trace supports inputs_embeds by fxmarty in 31574
* Avoid failure `TFBlipModelTest::test_pipeline_image_to_text` by ydshieh in 31827
* Fix incorrect accelerator device handling for MPS in `TrainingArguments` by andstor in 31812
* Mamba & RecurrentGemma: enable strict signature by gante in 31549
* Deprecate `vocab_size` in other two VLMs by zucchini-nlp in 31681
* FX symbolic_trace: do not test decoder_inputs_embeds by fxmarty in 31840
* [Grounding DINO] Add processor to auto mapping by NielsRogge in 31845
* chore: remove duplicate words by hattizai in 31853
* save_pretrained: use tqdm when saving checkpoint shards from offloaded params by kallewoof in 31856
* Test loading generation config with safetensor weights by gante in 31550
* docs: typo in tf qa example by chen-keinan in 31864
* Generate: Add new decoding strategy "DoLa" in `.generate()` by voidism in 29619
* Fix `_init_weights` for `ResNetPreTrainedModel` by ydshieh in 31851
* Update depth estimation task guide by merveenoyan in 31860
* Bump zipp from 3.7.0 to 3.19.1 in /examples/research_projects/decision_transformer by dependabot[bot] in 31871
* Add return type annotation to PreTrainedModel.from_pretrained by mauvilsa in 31869
* Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" by ydshieh in 31868
* Bump certifi from 2023.7.22 to 2024.7.4 in /examples/research_projects/visual_bert by dependabot[bot] in 31872
* add warning when using gradient_checkpointing with FSDP full shard by yundai424 in 31578
* Add conversion for interleave llava by zucchini-nlp in 31858
* remove duplicate words in msg by yukionfire in 31876
* Fix file type checks in data splits for contrastive training example script by npyoung in 31720
* Fix failed tests in 31851 by ydshieh in 31879
* fix: Removed `duplicate` field definitions in some classes by Sai-Suraj-27 in 31888
* Push sharded checkpoint to hub when `push_to_hub=True` in `TrainingArguments` by SunMarc in 31808
* [RT-DETR] Add resources by NielsRogge in 31815
* Modify `warnings` in a `with` block to avoid flaky tests by ydshieh in 31893
* Add a condition for nested_detach by haikuoxin in 31855
* InstructBlipVideo: Update docstring by zucchini-nlp in 31886
* Fixes to alternating SWA layers in Gemma2 by turboderp in 31775
* Processor accepts any kwargs by zucchini-nlp in 31889
* [`ConvertSlow`] make sure the order is preserved for addedtokens by ArthurZucker in 31902
* [`Gemma2`] Support FA2 softcapping by ArthurZucker in 31887
* Fix missing methods for Fuyu by Isotr0py in 31880
* fix: Fixed the `1st argument` name in classmethods by Sai-Suraj-27 in 31907
* add gather_use_object arguments II by SangbumChoi in 31799
* Add warning message for beta and gamma parameters by OmarManzoor in 31654
* Fix fx tests with inputs_embeds by fxmarty in 31862
* Refactor flash attention implementation in transformers by ArthurZucker in 31446
* Generate: fix `SlidingWindowCache.reset()` by gante in 31917
* 🚨 fix(SigLip): remove spurious exclusion of first vision output token by transmissions11 in 30952
* Allow `Trainer.get_optimizer_cls_and_kwargs` to be overridden by apoorvkh in 31875
* [Bug Fix] fix qa pipeline tensor to numpy by jiqing-feng in 31585
* Docker: TF pin on the consistency job by gante in 31928
* fix prompt strip to support tensors and np arrays by AvivSham in 27818
* Fix `GenerationMixin.generate` compatibility with pytorch profiler by fxmarty in 31935
* Generate: remove deprecated code due to `Cache` and `cache_position` being default by gante in 31898
* Generate: v4.42 deprecations 🧹🧹 by gante in 31956
* Whisper: move to tensor cpu before converting to np array at decode time by gante in 31954
* fix: Removed a wrong key-word argument in `sigmoid_focal_loss()` function call by Sai-Suraj-27 in 31951
* Generate: handle `logits_warper` update in models with custom generate fn by gante in 31957
* fix: Fixed the arguments in `create_repo()` function call by Sai-Suraj-27 in 31947
* Notify new docker images built for circleci by ydshieh in 31701
* Avoid race condition by ydshieh in 31973
* Masking: remove flakiness from test by gante in 31939
* Generate: doc nits by gante in 31982
* Fix the incorrect permutation of gguf by PenutChen in 31788
* Cambricon MLUs support SDPA and flash_attn by huismiling in 31102
* Speedup model init on CPU (by 10x+ for llama-3-8B as one example) by muellerzr in 31771
* [tests] fix deepspeed zero3 config for `test_stage3_nvme_offload` by faaany in 31881
* Fix bad test about slower init by muellerzr in 32002
* Tests: remove cuda versions when the result is the same 🧹🧹 by gante in 31955
* Bug report update by gante in 31983
* add flash-attn deterministic option to flash-attn>=2.4.1 by junrae6454 in 31961
* fix: Fixed incorrect dictionary assignment in `src/transformers/__init__.py` by Sai-Suraj-27 in 31993
* Bug report update -- round 2 by gante in 32006
* Fix gather when collecting 'num_input_tokens_seen' by CodeCreator in 31974
* Fix if else and *actually* enable superfast init by muellerzr in 32007
* SpeechEncoderDecoder doesn't support param buffer assignments by muellerzr in 32009
* Fix tests skip by qubvel in 32012
* Fixed `log messages` that are resulting in TypeError due to too many arguments by Sai-Suraj-27 in 32017
* Fix typo in classification function selection logic to improve code consistency by moses in 32031
* doc: fix broken BEiT and DiNAT model links on Backbone page by dvrogozh in 32029
* Pass missing arguments to `SeamlessM4Tv2ConformerEncoderLayer.forward()` when gradient checkpointing is enabled by anferico in 31945
* Add language to word timestamps for Whisper by robinderat in 31572
* Add `sdpa` and FA2 for CLIP by qubvel in 31940
* unpin `numpy<2.0` by ydshieh in 32018
* Chameleon: minor fixes after shipping by zucchini-nlp in 32037
* Bump scikit-learn from 1.0.2 to 1.5.0 in /examples/research_projects/decision_transformer by dependabot[bot] in 31458
* Bump scikit-learn from 1.1.2 to 1.5.0 in /examples/research_projects/codeparrot/examples by dependabot[bot] in 32052
* [mistral] Support passing `head_dim` through config (and do not require `head_dim * num_heads == hidden_size`) by xenova in 32050
* Add torch.compile Support For Mamba by zhenglongjiepheonix in 31247
* fix: Removed `duplicate entries` in a dictionary by Sai-Suraj-27 in 32041
* docs: Fixed 2 links in the docs along with some minor fixes by Sai-Suraj-27 in 32058
* Llava: add default chat templates by zucchini-nlp in 31691
* [Chameleon, Hiera] Improve docs by NielsRogge in 32038
* Incorrect Whisper long-form decoding timestamps by kamilakesbi in 32003
* [mistral] Fix FA2 attention reshape for Mistral Nemo by xenova in 32065
* VideoLLaVa: fix chat format in docs by zucchini-nlp in 32083
* Fix progress callback deepcopy by fozziethebeat in 32070
* Fixes to chameleon docs by merveenoyan in 32078
* Add image-text-to-text task guide by merveenoyan in 31777
* Support generating with fallback for short form audio in Whisper by kamilakesbi in 30984
* Disable quick init for deepspeed by muellerzr in 32066
* Chameleon: not supported with fast load by zucchini-nlp in 32091
* Fix tests after `huggingface_hub` 0.24 by Wauplin in 32054
* Fix shard order by b-chu in 32023
* Generate: store special token tensors under a unique variable name by gante in 31980
* fix: Replaced deprecated `mktemp()` function by Sai-Suraj-27 in 32123
* Mention model_info.id instead of model_info.modelId by Wauplin in 32106
* [generate] fix eos/pad id check on mps devices by sanchit-gandhi in 31695
* Fix failing test with race condition by Rocketknight1 in 32140
* Update `ko/_toctree.yml` and remove `custom_tools.md` to reflect latest changes by jungnerd in 31969
* fix: Fixed raising `TypeError` instead of `ValueError` for invalid type by Sai-Suraj-27 in 32111
* [RoBERTa] Minor clarifications to model doc by bt2513 in 31949
* Return assistant generated tokens mask in apply_chat_template by yonigottesman in 30650
* Don't default to other weights file when use_safetensors=True by amyeroberts in 31874
* set warning level to info for special tokens have been added by ArthurZucker in 32138
* Add new quant method by SunMarc in 32047
* Add llama3-llava-next-8b to llava_next conversion script by jamt9000 in 31395
* LLaVaNeXT: pad on right if training by zucchini-nlp in 32134
* Remove `trust_remote_code` when loading Libri Dummy by sanchit-gandhi in 31748
* [modelling] remove un-necessary transpose for fa2 attention by sanchit-gandhi in 31749
* Fix mask creations of `GPTNeoX` and `GPT2` by vasqu in 31944
* Add method to retrieve used chat template by KonradSzafer in 32032
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods by mig-mfreitas in 30910
* Disable quick init for TapasPreTrainedModel by daniellok-db in 32149
* Modify resize_token_embeddings to ensure output type is same as input by bayllama in 31979
* gguf conversion add_prefix_space=None for llama3 by itazap in 31937
* Fix flash attention speed issue by Cyrilvallez in 32028
* Fix video batching to videollava by merveenoyan in 32139
* Added mamba.py backend by alxndrTL in 30139
* Rename Phi-3 rope scaling type by garg-amit in 31436
* Revert "Incorrect Whisper long-form decoding timestamps " by sanchit-gandhi in 32148
* Fix typing to be compatible with later py versions by amyeroberts in 32155
* feat(cache): StaticCache uses index_copy_ to avoid useless copy by tengomucho in 31857
* Added additional kwarg for successful running of optuna hyperparameter search by DeF0017 in 31924
* Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs by RhuiDih in 31629

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* aliencaocao
* Fix float out of range in owlvit and owlv2 when using FP16 or lower precision (31657)
* Add torch_empty_cache_steps to TrainingArguments (31546)
* Add training support for SigLIP (31495)
* Allow FP16 or other precision inference for Pipelines (31342)
* voidism
* Generate: Add new decoding strategy "DoLa" in `.generate()` (29619)
* Namangarg110
* Adding hiera (30356)

4.42.4

Mostly Gemma2 support for FA2 softcapping! But also a fix for the sliding window with long contexts and other small fixes.


* [Gemma2] Support FA2 softcapping (31887) by ArthurZucker
* [ConvertSlow] make sure the order is preserved for addedtokens (31902) by ArthurZucker
* Fixes to alternating SWA layers in Gemma2 (31775) by turboderp
* Requires for torch.tensor before casting (31755) by echarlaix

I was off last week and could not get this out earlier; thanks all for your patience 🥳

4.42.3

Make sure we have attention softcapping for the "eager" Gemma2 model

After experimenting, we noticed that softcapping is a must, mostly for the 27B model. So we are adding it back (it should have been there, but an error on my side made it disappear). Sorry all! 😭

- Gemma capping is a must for big models (31698)

4.42.2

Patch release

Thanks to our 2 contributors for their prompt fixes, which mostly apply to training and FA2!

- Fix Gemma2 4d attention mask (31674) by hiyouga
- don't zero out the attention_mask when using sliding window with flash attention (31670) by winglian
