Transformers


4.44.2

Patch release v4.44.2: mostly two regressions that were not caught, affecting Jamba and processors!

- Fix: Jamba cache fails to use torch.nn.Module (32894) by xgal
- Fix: No need to dtype A in Jamba (32924) by xgal
- Fix: Regression on Processor.save_pretrained caused by 31691 (32921) by leloykun

4.44.1

Here are the different fixes: mostly Gemma2 context-length issues, generation issues, and small nits here and there.

- is_torchdynamo_compiling -- cast a wide exception net (32476) by gante
- Revert "fixes to properly shard FSDP across cpu and meta for cpu_effcient_loading for prequantized 4bit (32276)" (32477) by gante and matthewdouglas
- Gemma2: fix FA2 generation (32553) by zucchini-nlp
- Fix: FA2 with packed training (32487) by zucchini-nlp
- Fix sliding window attention used in Gemma2FlashAttention2 (32522) by brcps12
- Automatically add transformers tag to the modelcard (32623) by LysandreJik
- add back the position ids (32554) by ArthurZucker
- Use head_dim if in config for RoPE (32495) by suiyoubi and ArthurZucker
- Revert PR 32299, flag users when Zero-3 was missed (32851) by muellerzr
- fix multi-gpu with static cache (32543) by SunMarc
- Reduce the error log when using core models that need their weights r… (32656) by muellerzr
- Fix VLM generation issues (32836) by zucchini-nlp
- Fix generate with inputs_embeds as input (32493) (this PR includes some cherry-picked commits)

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.44.0...v4.44.1

4.44.0

This release comes a bit early in our cycle because we wanted to ship important and requested models along with improved performance for everyone!

All of these are included, with examples, in the awesome https://github.com/huggingface/local-gemma repository! 🎈 It shares examples of what is now possible with all the shipped features. Kudos to gante, sanchit-gandhi, and xenova!

💥 End-to-end generation compile
*Generate: end-to-end compilation 30788* by gante: `model.generate` now supports compiling! There are a few limitations, but here is a small snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import copy

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

# compile generate
compiled_generate = torch.compile(model.generate, fullgraph=True, mode="reduce-overhead")

# compiled generate does NOT accept parameterization except a) model inputs b) a generation config
generation_config = copy.deepcopy(model.generation_config)
generation_config.pad_token_id = model.config.eos_token_id

model_inputs = tokenizer(["Write a poem about the market crashing in summer"], return_tensors="pt")
model_inputs = model_inputs.to(model.device)
output_compiled = compiled_generate(**model_inputs, generation_config=generation_config)
print(output_compiled)
```



⚡ 3 to 5x compile speedup (compilation time 👀 not runtime)
*3-5x faster torch.compile forward compilation for autoregressive decoder models 32227* by fxmarty.
As documented on the PR, this makes the whole generation a lot faster when you re-use the cache!
You can see this when you run `model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)`; a minimal sketch of the full pattern follows.
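This sketch is our illustration, not code from the PR; the checkpoint name and generation options are assumptions.

```python
# Minimal sketch: compile the forward once, then decode with a static cache so
# tensor shapes stay fixed and the compiled graph is reused at every step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Meta-Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The market crashed because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```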

🪶 Offloaded KV cache: offload the cache to CPU when you are GPU poooooor 🚀
*Offloaded KV Cache 31325* by n17s: you just have to set `cache_implementation="offloaded"` when calling `generate`, or use a `GenerationConfig`:
```python
from transformers import GenerationConfig

gen_config = GenerationConfig(
    cache_implementation="offloaded",
    # other generation options, for example:
    num_beams=4,
    num_beam_groups=2,
    num_return_sequences=4,
    diversity_penalty=1.0,
    max_new_tokens=50,
    early_stopping=True,
)
outputs = model.generate(inputs["input_ids"], generation_config=gen_config)
```
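Since generation kwargs passed to `generate` are merged into the generation config, the following shorthand should be equivalent (a small sketch, not taken from the PR):

```python
# Sketch: pass cache_implementation directly as a generate() kwarg; it is
# absorbed into the generation config exactly like the explicit version above.
outputs = model.generate(inputs["input_ids"], cache_implementation="offloaded", max_new_tokens=50)
```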


📦 Torch export for static cache
The `pytorch` team gave us a great gift: you can now use `torch.export` in a way that is directly compatible with [Executorch](https://pytorch.org/executorch/main/index.html)! Find examples [here](https://github.com/huggingface/transformers/pull/31706). If `torch.export` is new to you, a toy sketch of the API follows the PR link below.

* Make static cache compatible with torch.export 32168 by guangy10
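The sketch below is plain PyTorch on a toy module, just to show what `torch.export` does; the transformers-specific static-cache wiring lives in the PRs linked above.

```python
# torch.export traces a module ahead of time into a portable ExportedProgram
# that runtimes such as Executorch can consume. Toy module, not transformers code.
import torch


class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(16, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))


exported = torch.export.export(Block(), (torch.randn(2, 16),))
print(exported)
```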

This also unlocks support for prompt reuse:
```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

device = "cuda"
ckpt = "meta-llama/Meta-Llama-3.1-8B-Instruct"

INITIAL_PROMPT = "From now on, you are going to answer all my questions with historical details. Make sure to always add a bit of french here and there, for style."

model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# Run the shared prefix once and keep its KV cache around.
prompt_cache = DynamicCache()
inputs = tokenizer(INITIAL_PROMPT, return_tensors="pt").to(device)
prompt_cache = model(**inputs, past_key_values=prompt_cache).past_key_values

prompt = "Why are french people obsessed with french?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to(device)
past_key_values = copy.deepcopy(prompt_cache)
outputs = model.generate(**new_inputs, past_key_values=past_key_values, max_new_tokens=20)
response = tokenizer.batch_decode(outputs)[0]
print(response)

prompt = "What is the best city to swim in?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to(device)
outputs = model.generate(**new_inputs, past_key_values=copy.deepcopy(prompt_cache), max_new_tokens=20)
response = tokenizer.batch_decode(outputs)[0]
print(response)
```


Gemma2: assisted decoding
*Gemma 2: support assisted generation 32357* by gante

We now have a 2B Gemma 2 model -- a perfect sidekick for the 27B with assisted generation. We've enabled assisted generation in Gemma 2, with a caveat: assisted generation currently requires the use of a windowless cache (as opposed to Gemma 2's default cache), so you might observe some output mismatch on long sequences. Read more about it [here](https://huggingface.co/blog/gemma-july-update#assisted-generation).

```python
# transformers assisted generation reference:
# https://huggingface.co/docs/transformers/main/en/llm_optims#speculative-decoding
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# we DON’T recommend using the 9b model with the 2b model as its assistant
assistant_model_name = 'google/gemma-2-2b-it'
reference_model_name = 'google/gemma-2-27b-it'

tokenizer = AutoTokenizer.from_pretrained(reference_model_name)
model = AutoModelForCausalLM.from_pretrained(
    reference_model_name, device_map='auto', torch_dtype=torch.bfloat16
)
assistant_model = AutoModelForCausalLM.from_pretrained(
    assistant_model_name, device_map='auto', torch_dtype=torch.bfloat16
)

model_inputs = tokenizer("Einstein's theory of relativity states", return_tensors="pt").to(model.device)
generation_options = {
    "assistant_model": assistant_model,
    "do_sample": True,
    "temperature": 0.7,
    "max_new_tokens": 64,
}

outputs = model.generate(**model_inputs, **generation_options)
tokenizer.batch_decode(outputs, skip_special_tokens=True)
```


Nemotron support
![image](https://github.com/user-attachments/assets/512d3fbe-909b-4e45-9927-cab78e0f522a)
> Nemotron-4-340B-Instruct is a large language model (LLM) that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. It is a fine-tuned version of the Nemotron-4-340B-Base model, optimized for English-based single and multi-turn chat use-cases. It supports a context length of 4,096 tokens.

The conversion script should be able to cover Minitron and Nemotron, thanks and kudos to suiyoubi. See:
* Add Nemotron HF Support 31699


Codestral support
![image](https://github.com/user-attachments/assets/2827f950-f6c5-4fb8-8569-e8008aa79651)
> Codestral is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. It also performs well on more specific ones like Swift and Fortran. This broad language base ensures Codestral can assist developers in various coding environments and projects.

Codestral saves developers time and effort: it can complete coding functions, write tests, and complete any partial code using a fill-in-the-middle mechanism. Interacting with Codestral will help level up the developer’s coding game and reduce the risk of errors and bugs.

It's a Mamba2 architecture. It was a bit of a pain to remove all the einops, but we hope we made it better for everyone! A quick, hedged usage sketch follows the PR link below.

* Add codestral mamba2 32080 by molbap and vasqu
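The checkpoint id in this sketch is our assumption of the Codestral Mamba release on the Hub; adjust it to the actual repository name.

```python
# Hedged sketch: Codestral Mamba loads like any other causal LM via Auto classes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mamba-Codestral-7B-v0.1"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```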

Breaking changes:
We removed the default chat templates **from the code**; they should all be on the Hub! A short sketch of what this means in practice follows the PR link below.
* 🚨 No more default chat templates 31733 by Rocketknight1
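Concretely, a tokenizer without a `chat_template` of its own no longer falls back to a hardcoded class default, so you have to set one explicitly. A minimal sketch (the Jinja template here is just an example):

```python
# Sketch: tokenizers with no chat_template no longer inherit a class default.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # ships without a chat template
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)

chat = [{"role": "user", "content": "Hello!"}]
print(tokenizer.apply_chat_template(chat, tokenize=False))
```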

Long-form decoding for whisper, even faster:
Our great sanchit-gandhi worked on porting the recent compile upgrades to long-form decoding in the PR below; a hedged usage sketch follows it.
* [whisper] compile compatibility with long-form decoding 31772
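All names here are assumptions chosen so the snippet is self-contained (a tiny checkpoint and synthetic audio); the pattern is what matters.

```python
# Hedged sketch: long-form (>30 s) transcription with a torch.compile'd forward.
import numpy as np
import torch
from transformers import AutoProcessor, WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
processor = AutoProcessor.from_pretrained("openai/whisper-tiny")

model.forward = torch.compile(model.forward)

audio = np.random.randn(16_000 * 60).astype(np.float32)  # fake 60 s at 16 kHz
inputs = processor(
    audio,
    sampling_rate=16_000,
    return_tensors="pt",
    truncation=False,  # keep the full >30 s feature sequence
    padding="longest",
    return_attention_mask=True,
)
# return_timestamps=True enables the sequential long-form decoding path.
outputs = model.generate(**inputs, return_timestamps=True)
print(processor.batch_decode(outputs, skip_special_tokens=True))
```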




What's Changed
* Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs by RhuiDih in https://github.com/huggingface/transformers/pull/31629
* Updated `ruff` to the latest version by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/31926
* fix by gante in https://github.com/huggingface/transformers/pull/32162
* fix: Fixed an if condition that is always evaluating to true by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32160
* [docs] change temperature to a positive value by faaany in https://github.com/huggingface/transformers/pull/32077
* adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer by rohitdwivedula in https://github.com/huggingface/transformers/pull/32171
* fix: default value reflects the runtime environment variables rather than the ones present at import time. by junrae6454 in https://github.com/huggingface/transformers/pull/32153
* Update qwen2.md by ArtificialZeng in https://github.com/huggingface/transformers/pull/32108
* Remove conversational pipeline tests by amyeroberts in https://github.com/huggingface/transformers/pull/32099
* RoPE: relaxed rope validation by gante in https://github.com/huggingface/transformers/pull/32182
* let's not warn when someone is running a forward by ArthurZucker in https://github.com/huggingface/transformers/pull/32176
* Fix resize embedding with Deepspeed by zucchini-nlp in https://github.com/huggingface/transformers/pull/32192
* Fix float8_e4m3fn in modeling_utils by SunMarc in https://github.com/huggingface/transformers/pull/32193
* Support dequantizing GGUF FP16 format by PenutChen in https://github.com/huggingface/transformers/pull/31783
* 🚨 No more default chat templates by Rocketknight1 in https://github.com/huggingface/transformers/pull/31733
* fix: Replaced deprecated `unittest method` with the correct one by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32198
* [whisper] fix short-form output type by sanchit-gandhi in https://github.com/huggingface/transformers/pull/32178
* remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 by statelesshz in https://github.com/huggingface/transformers/pull/32210
* Update question_answering.py by avlewis in https://github.com/huggingface/transformers/pull/32208
* [BigBird Pegasus] set _supports_param_buffer_assignment to False by kashif in https://github.com/huggingface/transformers/pull/32222
* [warnings] fix E721 warnings by kashif in https://github.com/huggingface/transformers/pull/32223
* Follow up for 31973 by ydshieh in https://github.com/huggingface/transformers/pull/32025
* translate philosophy.md to chinese by statelesshz in https://github.com/huggingface/transformers/pull/32177
* Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. Default to using the currently active microphone on Mac by jrhe in https://github.com/huggingface/transformers/pull/31846
* Fix code snippet for Grounding DINO by qubvel in https://github.com/huggingface/transformers/pull/32229
* Generation: stop at `eos` for assisted decoding by zucchini-nlp in https://github.com/huggingface/transformers/pull/31301
* Llava: generate without images by zucchini-nlp in https://github.com/huggingface/transformers/pull/32183
* Resize embeds with DeepSpeed by zucchini-nlp in https://github.com/huggingface/transformers/pull/32214
* don't log base model architecture in wandb if log model is false by joaonadkarni in https://github.com/huggingface/transformers/pull/32143
* Refactor: Removed un-necessary `object` base class by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32230
* Adds: extra_repr for RMSNorm layers in most models by rohitdwivedula in https://github.com/huggingface/transformers/pull/32204
* Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2 by catalys1 in https://github.com/huggingface/transformers/pull/31934
* [tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` by faaany in https://github.com/huggingface/transformers/pull/32039
* Flash-Attn: fix generation when no attention mask or no padding by zucchini-nlp in https://github.com/huggingface/transformers/pull/32241
* More flexible trigger condition by ydshieh in https://github.com/huggingface/transformers/pull/32251
* Llama 3.1: replace for loop by tensor ops at inv_freq initialization by gante in https://github.com/huggingface/transformers/pull/32244
* 🚨 Bloom support for cache class by zucchini-nlp in https://github.com/huggingface/transformers/pull/31445
* Upload new model failure report to Hub by ydshieh in https://github.com/huggingface/transformers/pull/32264
* Optimize t5 tokenize logic to avoid redundant calls by leejet in https://github.com/huggingface/transformers/pull/32270
* fix: Fixed wrong argument passed to `convert_blip_checkpoint` function call by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32262
* Repo: remove exceptions in `check_docstrings` by gante in https://github.com/huggingface/transformers/pull/32259
* make `p_mask` a numpy array before passing to `select_starts_ends` by faaany in https://github.com/huggingface/transformers/pull/32076
* fix(docs): Fixed a link in docs by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32274
* Generate: end-to-end compilation by gante in https://github.com/huggingface/transformers/pull/30788
* Whisper tokenizer word level timestamps by kamilakesbi in https://github.com/huggingface/transformers/pull/32197
* [pipeline] fix padding for 1-d tensors by sanchit-gandhi in https://github.com/huggingface/transformers/pull/31776
* Make static cache compatible with torch.export by guangy10 in https://github.com/huggingface/transformers/pull/32168
* Add stream messages from agent run for gradio chatbot by aymeric-roucher in https://github.com/huggingface/transformers/pull/32142
* use torch 2.4 in 2 CI jobs by ydshieh in https://github.com/huggingface/transformers/pull/32302
* Docs: fix GaLore optimizer code example by gil2rok in https://github.com/huggingface/transformers/pull/32249
* Fix GGUF dequantize for `gguf==0.9.1` by Isotr0py in https://github.com/huggingface/transformers/pull/32298
* Cast epochs_trained to int when resuming training by teddy-f-47 in https://github.com/huggingface/transformers/pull/32286
* feat(ci): set `fetch-depth: 0` in trufflehog checkout step by McPatate in https://github.com/huggingface/transformers/pull/31663
* Fix M4T for ASR pipeline by ylacombe in https://github.com/huggingface/transformers/pull/32296
* Docs: formatting nits by gante in https://github.com/huggingface/transformers/pull/32247
* Alternative agent plan by plaggy in https://github.com/huggingface/transformers/pull/32295
* fix: Added missing raise keyword for few exceptions by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32333
* fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit by winglian in https://github.com/huggingface/transformers/pull/32276
* fixes 32329 : The Torch code is correct - to get an average of 10% o… by fkrasnov2 in https://github.com/huggingface/transformers/pull/32335
* Repo checks: skip docstring checks if not in the diff by gante in https://github.com/huggingface/transformers/pull/32328
* Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process by xenova in https://github.com/huggingface/transformers/pull/32191
* LLaVA-NeXT: fix anyres shapes by zucchini-nlp in https://github.com/huggingface/transformers/pull/32314
* Gemma2 and flash-attention by zucchini-nlp in https://github.com/huggingface/transformers/pull/32188
* Llama 3.1: Fix incorrect `inv_freq` assignment by gante in https://github.com/huggingface/transformers/pull/32330
* [Idefics2] - Fix FA2 call for Perceiver layer by amyeroberts in https://github.com/huggingface/transformers/pull/32275
* Gemma 2: support assisted generation by gante in https://github.com/huggingface/transformers/pull/32357
* Fix error when streaming to gradio with non-string tool arguments by aymeric-roucher in https://github.com/huggingface/transformers/pull/32360
* >3-5x faster torch.compile forward compilation for autoregressive decoder models by fxmarty in https://github.com/huggingface/transformers/pull/32227
* fix: Fixed `staticmethods` with self as first argument by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32361
* fix: warmup_steps check for training_args by Ricardo-L-C in https://github.com/huggingface/transformers/pull/32236
* LLaVa: add cache class attribute by zucchini-nlp in https://github.com/huggingface/transformers/pull/32278
* [enc-dec cache] fix bug in indexing by sanchit-gandhi in https://github.com/huggingface/transformers/pull/32370
* [whisper] compile compatibility with long-form decoding by sanchit-gandhi in https://github.com/huggingface/transformers/pull/31772
* Remove size check between attn_weights and kv_seq_len for phi3 by helunwencser in https://github.com/huggingface/transformers/pull/32339
* add missing attribute _supports_param_buffer_assignment for gpt-j. by nv-guomingz in https://github.com/huggingface/transformers/pull/32359
* Check device map for saving tokenizer config on TPU (fix for issue 31971) by ayukh in https://github.com/huggingface/transformers/pull/32043
* update clean_up_tokenization_spaces warning by itazap in https://github.com/huggingface/transformers/pull/32371
* Empty list in defaults for LLaMA special tokens during weights conversion by ViktorooReps in https://github.com/huggingface/transformers/pull/32342
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase by OmarManzoor in https://github.com/huggingface/transformers/pull/31233
* Offloaded KV Cache by n17s in https://github.com/huggingface/transformers/pull/31325
* Docker: add `speech` dep to the consistency docker image by gante in https://github.com/huggingface/transformers/pull/32374
* Fixed Hybrid Cache Shape Initialization. by OsamaS99 in https://github.com/huggingface/transformers/pull/32163
* Yell at the user if zero-3 init wasn't performed, but expected to have been done by muellerzr in https://github.com/huggingface/transformers/pull/32299
* Update docs by zucchini-nlp in https://github.com/huggingface/transformers/pull/32368
* RoPE: Add numerical tests ✨ by gante in https://github.com/huggingface/transformers/pull/32380
* [generate] only require an attention mask for mps with torch<2.4 by sanchit-gandhi in https://github.com/huggingface/transformers/pull/32367
* fix: (issue 32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. by fshp971 in https://github.com/huggingface/transformers/pull/32157
* MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. by Luke20000429 in https://github.com/huggingface/transformers/pull/31500
* Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/32393
* fix: SeamlessM4TFeatureExtractor stride remainder by TechInterMezzo in https://github.com/huggingface/transformers/pull/32088
* Phi3 tests: fix typing for Python 3.8 by zucchini-nlp in https://github.com/huggingface/transformers/pull/32388
* 32184 save total_vocab_size by itazap in https://github.com/huggingface/transformers/pull/32240
* add values for neftune by nbroad1881 in https://github.com/huggingface/transformers/pull/32399
* Fix documentation references to google/bit-50 model by JuanFKurucz in https://github.com/huggingface/transformers/pull/32407
* Persist embedding type of BART and mBART models after resize by AbdiHaryadi in https://github.com/huggingface/transformers/pull/32242
* fix: Updated `test_embeded_special_tokens` for luke and mluke models by Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/32413
* Respect the config's attn_implementation if set by amyeroberts in https://github.com/huggingface/transformers/pull/32383
* Fix documentation links and code reference to model llava-next by JuanFKurucz in https://github.com/huggingface/transformers/pull/32434
* Cache: create docs by zucchini-nlp in https://github.com/huggingface/transformers/pull/32150
* Llava: fix checkpoint_doc by RUFFY-369 in https://github.com/huggingface/transformers/pull/32458
* add the missing flash attention test marker by faaany in https://github.com/huggingface/transformers/pull/32419
* Update kwargs validation for `preprocess` with decorator by qubvel in https://github.com/huggingface/transformers/pull/32024
* Fix get large model config for Switch Transformer encoder only tester by JuanFKurucz in https://github.com/huggingface/transformers/pull/32438
* Dependencies: fix typo by gante in https://github.com/huggingface/transformers/pull/32389
* Add Nemotron HF Support by suiyoubi in https://github.com/huggingface/transformers/pull/31699
* Generate: fix end to end compilation by gante in https://github.com/huggingface/transformers/pull/32465
* Add codestral mamba2 by molbap in https://github.com/huggingface/transformers/pull/32080

New Contributors
* RhuiDih made their first contribution in https://github.com/huggingface/transformers/pull/31629
* rohitdwivedula made their first contribution in https://github.com/huggingface/transformers/pull/32171
* ArtificialZeng made their first contribution in https://github.com/huggingface/transformers/pull/32108
* avlewis made their first contribution in https://github.com/huggingface/transformers/pull/32208
* jrhe made their first contribution in https://github.com/huggingface/transformers/pull/31846
* joaonadkarni made their first contribution in https://github.com/huggingface/transformers/pull/32143
* catalys1 made their first contribution in https://github.com/huggingface/transformers/pull/31934
* leejet made their first contribution in https://github.com/huggingface/transformers/pull/32270
* guangy10 made their first contribution in https://github.com/huggingface/transformers/pull/32168
* gil2rok made their first contribution in https://github.com/huggingface/transformers/pull/32249
* teddy-f-47 made their first contribution in https://github.com/huggingface/transformers/pull/32286
* plaggy made their first contribution in https://github.com/huggingface/transformers/pull/32295
* fkrasnov2 made their first contribution in https://github.com/huggingface/transformers/pull/32335
* helunwencser made their first contribution in https://github.com/huggingface/transformers/pull/32339
* nv-guomingz made their first contribution in https://github.com/huggingface/transformers/pull/32359
* ayukh made their first contribution in https://github.com/huggingface/transformers/pull/32043
* n17s made their first contribution in https://github.com/huggingface/transformers/pull/31325
* OsamaS99 made their first contribution in https://github.com/huggingface/transformers/pull/32163
* fshp971 made their first contribution in https://github.com/huggingface/transformers/pull/32157
* Luke20000429 made their first contribution in https://github.com/huggingface/transformers/pull/31500
* TechInterMezzo made their first contribution in https://github.com/huggingface/transformers/pull/32088
* AbdiHaryadi made their first contribution in https://github.com/huggingface/transformers/pull/32242
* RUFFY-369 made their first contribution in https://github.com/huggingface/transformers/pull/32458
* suiyoubi made their first contribution in https://github.com/huggingface/transformers/pull/31699

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.43.4...v4.44.0

4.43.4

There was a mix-up; the DeepSpeed issues are now properly fixed with:
- Resize embeds with DeepSpeed https://github.com/huggingface/transformers/pull/32214

🤗 Enjoy the holidays!

4.43.3

We still saw some bugs, so zucchini-nlp added:
- ~~Resize embeds with DeepSpeed 32214~~ (shipped in v4.43.4 instead)
- don't log base model architecture in wandb if log model is false 32143


Other fixes:
- [whisper] fix short-form output type 32178 by sanchit-gandhi, which fixes the temperature fallback for short audio!
- [BigBird Pegasus] set _supports_param_buffer_assignment to False 32222 by kashif; this is mostly related to the new super-fast init, which some models need to opt out of. If you see weird behavior, look for that 😉

4.43.2

- Fix float8_e4m3fn in modeling_utils (32193)
- Fix resize embedding with Deepspeed (32192)
- let's not warn when someone is running a forward (32176)
- RoPE: relaxed rope validation (32182)
