Transformers

Latest version: v4.50.3

Safety actively analyzes 723158 Python packages for vulnerabilities to keep your Python projects secure.

Page 6 of 33

4.42.2

Not secure

Patch release

Thanks to our 2 contributors for their prompt fixing mostly applies for training and FA2!

- Fix Gemma2 4d attention mask (31674) by hiyouga
- don't zero out the attention_mask when using sliding window with flash attention (31670) by winglian

4.42.1

Not secure

Patch release for commit:

- [HybridCache] Fix get_seq_length method (31661)

4.42.0

Not secure

New model additions

Gemma-2

The Gemma2 model was proposed in [Gemma2: Open Models Based on Gemini Technology and Research](https://blog.google/technology/developers/Gemma2-open-models/) by Gemma2 Team, Google.
Gemma2 models are trained on 6T tokens, and released with 2 versions, 2b and 7b.

The abstract from the paper is the following:

*This work introduces Gemma2, a new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma2 outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of our model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations*

![image](https://github.com/huggingface/transformers/assets/30755778/798b25f4-485a-4b60-abe5-af612def209b)

* Add gemma 2 by ArthurZucker in 31659

RTDETR

The RT-DETR model was proposed in [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu.

RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them.

![image](https://github.com/huggingface/transformers/assets/30755778/78b096d4-2686-41cb-9fdd-1cd517722fd3)

* New model support RTDETR by SangbumChoi in 29077

InstructBlip

The InstructBLIP model was proposed in [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip2) architecture for visual instruction tuning.

InstructBLIP uses the same architecture as [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip2) with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.

![image](https://github.com/huggingface/transformers/assets/30755778/fd6997aa-d299-4d14-9eab-c3f16309bae9)

* Add video modality for InstrucBLIP by zucchini-nlp in 30182

LlaVa NeXT Video

The LLaVa-NeXT-Video model was proposed in [LLaVA-NeXT: A Strong Zero-shot Video Understanding Model](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/) by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon [LLaVa-NeXT](https://huggingface.co/docs/transformers/main/en/model_doc/llava_next) by fine-tuning on a mix if video and image dataset thus increasing the model’s performance on videos.

[LLaVA-NeXT](https://huggingface.co/docs/transformers/main/en/model_doc/llava_next) surprisingly has strong performance in understanding video content in zero-shot fashion with the AnyRes technique that it uses. The AnyRes technique naturally represents a high-resolution image into multiple images. This technique is naturally generalizable to represent videos because videos can be considered as a set of frames (similar to a set of images in LLaVa-NeXT). The current version of LLaVA-NeXT makes use of AnyRes and trains with supervised fine-tuning (SFT) on top of LLaVA-Next on video data to achieves better video understanding capabilities.The model is a current SOTA among open-source models on [VideoMME bench](https://arxiv.org/abs/2405.21075).

* Add LLaVa NeXT Video by zucchini-nlp in 31252

New model adder

A very significant change makes its way within the `transformers` codebase, introducing a new way to add models to `transformers`. We recommend reading the description of the PR below, but here is the gist of it:

> The diff_converter tool is here to replace our old Copied from statements, while keeping our core transformers philosophy:
>
> - single model single file
> - explicit code
> - standardization of modeling code
> - readable and educative code
> - simple code
> - least amount of modularity
>
> This additionally unlocks the ability to very quickly see the differences between new architectures that get developed. While many architectures are similar, the "single model, single file" policy can obfuscate the changes. With this diff converter, we want to make the changes between architectures very explicit.

* Diff converter v2 by ArthurZucker in 30868

Tool-use and RAG model support

We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the **Nous-Hermes**, **Command-R** and **Mistral/Mixtral** model families for support in the very near future. Please see the updated [chat template docs](https://huggingface.co/docs/transformers/main/chat_templating) for more information.

If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the [Hugging Face Discord server](https://hf.co/join/discord). Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved.

* Chat Template support for function calling and RAG by Rocketknight1 in 30621

GGUF support

We further the support of GGUF files to offer fine-tuning within the python/HF ecosystem, before converting them back to the GGUF/GGML/llama.cpp libraries.

* Add Qwen2 GGUF loading support by Isotr0py in 31175
* GGUF: Fix llama 3 GGUF by younesbelkada in 31358
* Fix llama gguf converter by SunMarc in 31575

Trainer improvements

A new optimizer is added in the `Trainer`.

* FEAT / Trainer: LOMO optimizer support by younesbelkada in 30178

Quantization improvements

Several improvements are done related to quantization: a new cache (the quantized KV cache) is added, offering the ability to convert the cache of generative models, further reducing the memory requirements.

Additionally, the documentation related to quantization is entirely redone with the aim of helping users choose which is the best quantization method.

* Quantized KV Cache by zucchini-nlp in 30483
* Docs / Quantization: refactor quantization documentation by younesbelkada in 30942

Examples

New instance segmentation examples are added by qubvel

* Instance segmentation examples by qubvel in 31084

Notable improvements

As a notable improvement to the HF vision models that leverage backbones, we enable leveraging HF pretrained model weights as backbones, with the following API:

py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
model = MaskFormerForInstanceSegmentation(config)

* Enable HF pretrained backbones by amyeroberts in 31145

Additionally, we thank Cyrilvallez for diving into our `generate` method and greatly reducing the memory requirements.

* Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 by Cyrilvallez in 30536

Breaking changes

Remove ConversationalPipeline and Conversation object

Both the ConversationalPipeline and the Conversation object have been deprecated for a while, and are due for removal in 4.42, which is the upcoming version.

The `TextGenerationPipeline` is recommended for this use-case, and now accepts inputs in the form of the OpenAI API.

* 🚨 Remove ConversationalPipeline and Conversation object by Rocketknight1 in 31165

Remove an accidental duplicate softmax application in FLAVA's attention

Removes duplicate softmax application in FLAVA attention. Likely to have a small change on the outputs but flagging with 🚨 as it will change a bit.

* 🚨 FLAVA: Remove double softmax by amyeroberts in 31322

Idefics2's `ignore_index` attribute of the loss is updated to `-100`

* 🚨 [Idefics2] Update ignore index by NielsRogge in 30898

out_indices from `timm` being updated

Recent updates to timm changed the type of the attribute `model.feature_info.out_indices`. Previously, `out_indices` would reflect the input type of `out_indices` on the `create_model` call i.e. either `tuple` or `list`. Now, this value is always a tuple.

As list are more useful and consistent for us -- we cannot save tuples in configs, they must be converted to lists first -- we instead choose to cast `out_indices` to always be a list.

This has the possibility of being a slight breaking change if users are creating models and relying on `out_indices` on being a tuple. As this property only happens when a new model is created, and not if it's saved and reloaded (because of the config), then I think this has a low chance of having much of an impact.

* 🚨 out_indices always a list by amyeroberts in 30941

datasets referenced in the quantization config get updated to remove references to datasets with restrictive licenses.

* 🚨 Remove dataset with restrictive license by echarlaix in 31452

Bugfixes and improvements

* Add fixed resize and pad strategy for object detection by qubvel in 30742
* Enable dynamic resolution input for Swin Transformer and variants by the-neural-networker in 30656
* Add TokenClassification for Mistral, Mixtral and Qwen2 by josephenguehard in 29878
* FIX / Quantization: Fix Dockerfile build by younesbelkada in 30890
* Add support for torch.compile dynamic shapes by warner-benjamin in 30560
* LLaVa-Next: Update docs with batched inference by zucchini-nlp in 30857
* DeformableDETR two stage support bfloat16 by DonggeunYu in 30907
* add return_token_timestamps to WhisperProcessor by kamilakesbi in 30812
* Fix num_hidden_layers in initialization of new model in Mamba by SrGonao in 30403
* separate kwargs in processor (similar to 30193) by Eric2i in 30905
* fix for custom pipeline configuration by not-lain in 29004
* Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM by ylacombe in 28706
* Fix a shape annotation and typos in `mamba` slow forward by vasqu in 30691
* `tokenizer_class = "AutoTokenizer"` Llava Family by ArthurZucker in 30912
* Introduce configured_state arg for accelerator_config by muellerzr in 29781
* Add torch.compile for Mistral by zhenglongjiepheonix in 30642
* [docs] Spanish translation of model_memory_anatomy.md by aaronjimv in 30885
* FIX / TST: Fix expected results on Mistral slow test (A10) by younesbelkada in 30909
* PaliGemma - fix processor with no input text by hiyouga in 30916
* CI: AMD MI300 tests fix by mht-sharma in 30797
* Enforce saving at end of training if saving option chosen by muellerzr in 30160
* fix: center_crop occasionally outputs off-by-one dimension matrix by mattlbeck in 30934
* [Benchmark] Reuse `optimum-benchmark` by ydshieh in 30615
* TST / Workflows: Get slack notifications for docker image build by younesbelkada in 30891
* Fix swin embeddings interpolation by amyeroberts in 30936
* Fix inhomogeneous shape error in example by Zantares in 30434
* update ruff version by ArthurZucker in 30932
* Update build ci image [push-ci-image] by ArthurZucker in 30933)
* Update video-llava docs by zucchini-nlp in 30935
* Fix low cpu mem usage tests by SunMarc in 30808
* [doc] Add references to the fine-tuning blog and distil-whisper to Whisper. by Vaibhavs10 in 30938
* Avoid extra chunk in speech recognition by jonatanklosko in 29539
* [whisper] only trigger forced ids warning once by sanchit-gandhi in 30966
* Paligemma - fix slow tests, add bf16 and f16 slow tests by molbap in 30851
* Finally fix the missing new model failure CI report by ydshieh in 30968
* legacy to init the slow tokenizer when converting from slow was wrong by ArthurZucker in 30972
* Generation: get special tokens from model config by zucchini-nlp in 30899
* [Whisper] Strip prompt before finding common subsequence by sanchit-gandhi in 27836
* Fix link in Pipeline documentation by junhl in 30948
* [Mistral and friends] Update MLP by NielsRogge in 31057
* Paligemma causal attention mask by molbap in 30967
* Update object detection with latest resize and pad strategies by qubvel in 30955
* Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size by kamilakesbi in 30637
* Push ci image by ArthurZucker in 30982
* test_custom_4d_attention_mask skip with sliding window attn by poedator in 30833
* Finish adding support for torch.compile dynamic shapes by warner-benjamin in 30919
* FIX / Docs: Minor changes in quantization docs by younesbelkada in 30985
* Fix accelerate failing tests by SunMarc in 30836
* [tests] add `torch.use_deterministic_algorithms` for XPU by faaany in 30774
* Add a check that warmup_setps is either 0 or >= 1 by ymoslem in 30764
* Update 4 `MptIntegrationTests` expected outputs by ydshieh in 30989
* [Port] TensorFlow implementation of Mistral by ariG23498 in 29708
* Remove deprecated properties in tokenization_nllb.py and tokenization_nllb_fast.py by ymoslem in 29834
* Bugfix: WandbCallback uploads initial model checkpoint by mgerstgrasser in 30897
* add prefix space ignored in llama 29625 by itazap in 30964
* Fix training speed regression introduced by "optimize VRAM for calculating pos_bias in LayoutLM v2, v3 by kkoehncke in 26139)"
* Do not trigger autoconversion if local_files_only by Wauplin in 31004
* pin `uv==0.1.45` by ydshieh in 31006
* Perceiver interpolate position embedding by g1y5x3 in 30979
* [tests] make `test_model_parallelism` device-agnostic by faaany in 30844
* FIX / TST: Fix expected results on Mistral AWQ test by SunMarc in 30971
* allow multi-gpu by ydshieh in 31011
* Fix resume_download future warning by Wauplin in 31007
* Quantization / TST: Fix remaining quantization tests by younesbelkada in 31000
* save the list of new model failures by ydshieh in 31013
* added interpolation for vitmae model in pytorch as well as tf. by bhuvanmdev in 30732
* Add split special tokens by itazap in 30772
* Paligemma- fix devices and dtype assignments by molbap in 31008
* Redirect transformers_agents doc to agents by aymeric-roucher in 31054
* unpin uv by ydshieh in 31055
* Follow up: Fix link in dbrx.md by eitanturok in 30514
* Update feature request label in template by amyeroberts in 30940
* Fix quanto tests by SunMarc in 31062
* Fix pad_to_max_length Whisper by ylacombe in 30787
* skip `test_model_parallelism` for 2 model test classes by ydshieh in 31067
* use `main` by ydshieh in 31065
* Remove `ninja` from docker image build by ydshieh in 31080
* fix "piano" typo by clinty in 31027
* Update quicktour.md to fix broken link to Glossary by apalkk in 31072
* Remove redundant backend checks in training_args.py by kevint324 in 30999
* fix from_pretrained in offline mode when model is preloaded in cache by oOraph in 31010
* Remove float64 cast for OwlVit and OwlV2 to support MPS device by qubvel in 31071
* Fix OWLv2 post_process_object_detection for multiple images by qubvel in 31082
* Fix typo in trainer.py by taslimisina in 31048
* [SuperPoint, PaliGemma] Update docs by NielsRogge in 31025
* Fix failing tokenizer tests by LysandreJik in 31083
* Watermark: fix tests by zucchini-nlp in 30961
* Docs / PEFT: Add PEFT API documentation by younesbelkada in 31078
* Render chat template tojson filter as unicode by CISC in 31041
* FIX: Add `accelerate` as a hard requirement by younesbelkada in 31090
* FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` by younesbelkada in 31092
* skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` by ydshieh in 31086
* Fix PretrainedConfig docstring with deprecated resume_download by albertvillanova in 31014
* Fix DeepSpeed compatibility with weight_norm by jonnyli1125 in 30881)
* TST: Fix instruct-blip tests by younesbelkada in 31088
* Docs / Quantization: Redirect deleted page by younesbelkada in 31063
* Deprecate low use models by amyeroberts in 30781
* Quantized KV cache: update quanto by zucchini-nlp in 31052
* FEAT: Add mistral v3 conversion script by younesbelkada in 30981
* Use `HF_HUB_OFFLINE` + fix has_file in offline mode by Wauplin in 31016
* Improve `transformers-cli env` reporting by statelesshz in 31003
* Fix env.py in cases where torch is not present by Rocketknight1 in 31113
* Fix faulty rstrip in module loading by Rocketknight1 in 31108
* Rm maintainer + migrate by muellerzr in 31089
* Fix nightly circleci by ydshieh in 31114
* FIX / Docs: Fix GPTQ expected number of bits by younesbelkada in 31111
* Add VLM generation default contributor by gante in 31115
* Add on_optimizer_step to callback options by dhruvbpai in 31095
* Cleanup docker build by ydshieh in 31119
* FIX / Quantization: Add extra validation for bnb config by younesbelkada in 31135
* fix get_scheduler when name is warmup_stable_decay by zspo in 31128
* Docs / Quantization: Replace all occurences of `load_in_8bit` with bnb config by younesbelkada in 31136
* Workflow: Remove `IS_GITHUB_CI` by younesbelkada in 31147
* helper by ArthurZucker in 31152
* pytest -rsfE by ydshieh in 31140
* Fix quantized cache output by SunMarc in 31143
* Update sam.md by asifajrof in 31130
* Quantization: Enhance bnb error message by younesbelkada in 31160
* [trainer] add sanity evaluation option by SunMarc in 31146
* Add streaming, various fixes by aymeric-roucher in 30838
* Added description of quantization_config by vamsivallepu in 31133
* Fix typo: use_safetenstors to use_safetensors by CharlesCNorton in 31184
* Remove copied froms for deprecated models by amyeroberts in 31153
* Token healing by ahmed-moubtahij in 30081
* [`GemmaModel`] fix small typo by ArthurZucker in 31202
* Fix Cannot convert [array()] to EagerTensor of dtype int64 by pavi-ninjaac in 31109
* Ignore non-causal mask in more cases with SDPA by fxmarty in 30138
* SlidingWindowCache: reduce differences to other Cache classes by gante in 30970
* Fix `test_compile_static_cache` by ydshieh in 30991
* fix the get_size_with_aspect_ratio in max_size situation by SangbumChoi in 30902
* Fix typo in utils by Bojun-Feng in 31169
* Rename sanity_evaluation to eval_on_start by Qubitium in 31192
* Wrong translation FR : Contents = Contenu by jadechoghari in 31186
* Cohere: Fix copied from by younesbelkada in 31213
* Set greater_is_better to False if metric_for_best_model ends with "loss" by miivanov90 in 31142
* Fix GPU OOM for `mistral.py::Mask4DTestHard` by ydshieh in 31212
* [docs] Spanish translation of tokenizer_summary.md by aaronjimv in 31154
* Pass device in Logits Processor's init by zucchini-nlp in 29804
* Fix sentence fragment within test comments by DomHudson in 31218
* fix(PatchTST): Wrong dropout used for PretainHead by maxstrobel in 31117
* Video-LLaVa: handle any number of frames by zucchini-nlp in 31221
* Add dynamic resolution input/interpolate position embedding to deit by p-kris10 in 31131
* fix bf16 issue in text classification pipeline by chujiezheng in 30996
* Fix pipeline tests - torch imports by amyeroberts in 31227
* Add new line switch before logging ***** Running {description} ***** by jacklanda in 31225
* add no split modules for xlmrobertaxl by ManuelFay in 31223
* Fix `MistralIntegrationTest` by ydshieh in 31231
* Blip: Deprecate `BlipModel` by younesbelkada in 31235
* Move out common backbone config param validation by amyeroberts in 31144
* Upload (daily) CI results to Hub by ydshieh in 31168
* Specify dtype=torch.bool to avoid xla error by ysulsky in 31191
* Fixing `name 'torch' is not defined` in `bitsandbytes` integration by jamesbraza in 31243
* Benchmark GitHub Actions workflow by ydshieh in 31163
* Early labels validation by amyeroberts in 31240
* doc: add info about wav2vec2 bert in older wav2vec2 models. by Vaibhavs10 in 31120
* enable deterministic mode for npu by statelesshz in 31253
* Add missing Flaubert tokenizer tests by bastrob in 30492
* Fix circular reference issue in CLIPTokenizerFast by dhaivat1729 in 31075
* Add condition to `benchmark` job in `push-important-models.yml` by ydshieh in 31259
* Skip failing JetMOE generation tests by amyeroberts in 31266
* no need for explicit EXTRA_TOKENS in processing_paligemma.py by grahamannett in 31022
* [`SwitchTransformer`] Significant performance improvement on MoE blocks by ranggihwang in 31173
* fix loading special_tokens_map_file by ZhiyuanChen in 31012
* Make mamba use cache by zucchini-nlp in 31116
* Generation: fix handling of special tokens by zucchini-nlp in 31254
* Switch from `cached_download` to `hf_hub_download` in remaining occurrences by Wauplin in 31284
* fix: `str` should be used not `int` when setting env variables by statelesshz in 31272
* Fix _save_tpu: use _maybe_convert_to_cpu instead of to cpu. by baoleai in 31264
* fix accelerate tests for roberta xl by SunMarc in 31288
* Enable dynamic resolution input for Beit by OmarManzoor in 31053
* Mark MobileNetV1ModelTest::test_batching_equivalence as flaky by amyeroberts in 31258
* Pipeline VQA: Add support for list of images and questions as pipeline input by BlacCod in 31217
* Fix SwinLayer / DonutSwinLayer / ClapAudioLayer attention mask device by gorodnitskiy in 31295
* Update text-to-speech.md by jaguaryang in 31269
* Fixed Wav2Vec2ProcessorWithLM decoding error by karicotiza in 31188
* Fix jetmoe model by Cyrilvallez in 31279
* Extend save_pretrained to offloaded models by blbadger in 27412
* Implement JSON dump conversion for torch_dtype in TrainingArguments by junrae6454 in 31224
* interpolation added for TVP. by bhuvanmdev in 30863
* Rename test_model_common_attributes -> test_model_get_set_embeddings by amyeroberts in 31321
* Use unused prepare_img() function in dinov2 conversion script by IbrahimAmin1 in 31335
* docs: fix style by imba-tjd in 31340
* Fix paligemma inverted mask by molbap in 31207
* docs/zh: fix style by imba-tjd in 31334
* Decorators for deprecation and named arguments validation by qubvel in 30799
* Improve error msg when using bitsandbytes by SunMarc in 31350
* Fix Cohere CI by ydshieh in 31263
* Fix gradio tool demos by aymeric-roucher in 31230
* Fast image processor by amyeroberts in 28847
* Add french translation of AutoBackbone by jadechoghari in 31300
* Add support to declare imports for code agent by JasonZhu1313 in 31355
* Fix idefics cache by zucchini-nlp in 31377
* [Bug Fix] Renamed loss to losses to suppress UnboundLocalError by her0e1c1 in 31365
* docs: fix broken link by imba-tjd in 31370
* backbone_utils - fix relative import by amyeroberts in 31382
* README underline between badges fix by novialriptide in 31376
* Update comment in modeling_utils.py by inf3rnus in 31299
* Use huggingface_hub helper function to split state dict by SunMarc in 31091
* Change JSON serialization to custom json.dumps by junrae6454 in 31100
* feat(ci): add trufflehog secrets detection by McPatate in 31344
* [QoL fix] [Image processing] Add warning on assumption of channel dim and avoid infering when inputs are PIL.Image by aliencaocao in 31364
* Make chat templates part of ProcessorMixin by Rocketknight1 in 30744
* add initial design for uniform processors + align model by molbap in 31197
* Add missing French translation of tutoriel_pipeline.md by jadechoghari in 31396
* Temporarily pin datasets upper version to fix CI by albertvillanova in 31407
* Support Clip QKV for MPT by akakakakakaa in 31307
* Pin datasets<2.20.0 for examples by amyeroberts in 31417
* Fix MusicGen SDPA by ylacombe in 31208
* Set seed for M4T retain grad test by ylacombe in 31419
* Fix SpeechT5 `decoder_attention_mask` shape by ylacombe in 28071
* Change potential `inputs_embeds` padding `logger.warning` to `logger.warning_once` by naimenz in 31411
* Remove duplicate image processor in auto map by amyeroberts in 31383
* Install the tensorflow example requirements in docker by amyeroberts in 31428
* Remove empty create_and_test_config_common_properties tests by amyeroberts in 31359
* xpu: support xpu backend from stock pytorch (>=2.4) by dvrogozh in 31238
* Musicgen special tokens in tensors by zucchini-nlp in 31420
* Fix Bark logits processors device misplacement by ylacombe in 31416
* Rename misnamed image processor test files by amyeroberts in 31430
* Generate: fix `tokenizer` being popped twice by gante in 31427
* [tests] make `TestDeepSpeedModelZoo` device-agnostic by faaany in 31402
* Support multiple validation datasets when `dataloader_persistent_workers=True` by bastienlc in 30627
* Pass datasets trust_remote_code by albertvillanova in 31406
* simple fix by tokenizer-decode in 31456
* Fix typing errors in `Qwen2ForTokenClassification` by kevinhu in 31440
* Agents: Improve python interpreter by aymeric-roucher in 31409
* Donut: fix `generate` call from local path by gante in 31470
* Make "tool_use" the default chat template key when tools are passed by Rocketknight1 in 31429
* Fix single letter stop strings by Rocketknight1 in 31448
* Update chat template docs and bump Jinja version by Rocketknight1 in 31455
* Improve `PreTrainedTokenizerFast` loading time when there are many added tokens by ydshieh in 31404
* Fix documentation typos by qgallouedec in 31476
* Give more useful `metric_for_best_model` errors by tomaarsen in 31450
* Update perf_train_gpu_many.md by remyleone in 31451
* [`GPT2`] Add SDPA support by vasqu in 31172
* Fix autocast incompatibility in RecurrentGemma by xplip in 30832
* Use self.config_tester.run_common_tests() by amyeroberts in 31431
* [tests] rename `test_config_object` to `test_ds_config_object` by faaany in 31403
* Docs / AQLM: Clarify `torch.compile` support for AQLM by younesbelkada in 31473
* Mamba: add generative tests by gante in 31478
* Update object_detection.md by jajupmochi in 31488
* Add docs on zeroshot image classification prompt templates by aliencaocao in 31343
* auto-detect device when no device is passed to pipeline by faaany in 31398
* Fix typo: pas_token_id by ftnext in 30894
* Fix `wandb` integration with `SetFit` model by timothepearce in 30021
* Consider inheritance in type checking for tensors by daemyung in 31378
* Add valid columns check in _remove_unused_columns method by arthasking123 in 31466
* Fix a teeny-tiny typo in `tokenization_utils_base.py`'s docstring by sadra-barikbin in 31510
* Fix mismatched ` in doc & other common typos by jhwei in 31516
* RWKV: enable generation tests by gante in 31490
* unskip 2 tests in cohere by ydshieh in 31517
* Revive Nightly/Past CI by ydshieh in 31159
* Deprecate legacy cache + use cache position by zucchini-nlp in 31491
* SPLIT PR: add user defined symbols and control symbols by itazap in 31305
* Removed torch.cuda.empty_cache from train loop. by FoamoftheSea in 31530
* Update mask_generation.md by nicholicaron in 31543
* Correct is_flaky test decoration by qubvel in 31480
* Add implementation of `spectrogram_batch` by ravenouse in 27159
* chore: fix typos by xiaoxianBoy in 31559
* Update git templates by ArthurZucker in 31539
* Fix the error caused by incorrect use of logger in pipeline by lanyun1103 in 31565
* Fix bug about add_special_tokens and so on by hiroshi-matsuda-rit in 31496
* Add Jinja as a requirement with the right version cutoff by Rocketknight1 in 31536
* Fix doc typo in `TrainingArguments` by qgallouedec in 31503
* Fix is_torch_xpu_available for torch < 2.3 by amyeroberts in 31573
* Added version constraint on numpy for version <2.0 by Resteklicken in 31569
* Siglip: add `_no_split_module` by zucchini-nlp in 31566
* fix output data type of image classification by jiqing-feng in 31444
* add preprocessing_num_workers to run_classification.py by jiahuanluo in 31586
* Improve error message for mismatched copies in code blocks by molbap in 31535
* Add ViTImageProcessorFast to tests by amyeroberts in 31424
* docs: move translations to `i18n` by SauravMaheshkar in 31584
* Removed unnecessary `self.projection` call in `VivitTubeletEmbeddings` by v-iashin in 31632
* [`GPT-NeoX`] Add SDPA support by vasqu in 31031
* Update RT-DETR code snippet by qubvel in 31631
* Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP by younesbelkada in 31161
* Fix RT-DETR inference with float16 and bfloat16 by qubvel in 31639
* Fix paligemma detection inference by molbap in 31587
* Generate: fix assisted generation with `past_key_values` passed as kwargs by gante in 31644
* Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference by aliencaocao in 31589
* Skip tests properly by amyeroberts in 31308
* Generation: past kv can be None by zucchini-nlp in 31051
* Fix ONNX exports for Optimum compatible models by merveenoyan in 31311

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* josephenguehard
* Add TokenClassification for Mistral, Mixtral and Qwen2 (29878)
* vasqu
* Fix a shape annotation and typos in `mamba` slow forward (30691)
* [`GPT2`] Add SDPA support (31172)
* [`GPT-NeoX`] Add SDPA support (31031)
* ariG23498
* [Port] TensorFlow implementation of Mistral (29708)
* bhuvanmdev
* added interpolation for vitmae model in pytorch as well as tf. (30732)
* interpolation added for TVP. (30863)
* SangbumChoi
* fix the get_size_with_aspect_ratio in max_size situation (30902)
* New model support RTDETR (29077)
* Cyrilvallez
* Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (30536)
* Fix jetmoe model (31279)
* ravenouse
* Add implementation of `spectrogram_batch` (27159)

4.41.2

Not secure

Mostly fixing some stuff related to `trust_remote_code=True` and `from_pretrained`

The `local_file_only` was having a hard time when a `.safetensors` file did not exist. This is not expected and instead of trying to convert, we should just fallback to loading the `.bin` files.

* Do not trigger autoconversion if local_files_only 31004 from Wauplin fixes this!
* Paligemma: Fix devices and dtype assignments (31008) by molbap
* Redirect transformers_agents doc to agents (31054) aymeric-roucher
* Fix from_pretrained in offline mode when model is preloaded in cache (31010) by oOraph
* Fix faulty rstrip in module loading (31108) Rocketknight1

4.41.1

Not secure

Fix PaliGemma finetuning:
The causal mask and label creation was causing label leaks when training. Kudos to probicheaux for finding and reporting!

- https://github.com/huggingface/transformers/commit/a755745546779ae5c42510bc02a859bdac82b3b7 : PaliGemma - fix processor with no input text (https://github.com/huggingface/transformers/pull/30916) hiyouga
- https://github.com/huggingface/transformers/commit/a25f7d3c12975fe21eab437dda7363e9024de7c0 : Paligemma causal attention mask (https://github.com/huggingface/transformers/pull/30967) molbap and probicheaux

Other fixes:
- https://github.com/huggingface/transformers/commit/bb48e921868ac750417956de941606f7e2fa02ca: tokenizer_class = "AutoTokenizer" Llava Family (https://github.com/huggingface/transformers/pull/30912)
- https://github.com/huggingface/transformers/commit/1d568dfab262f76079eb4f3d05b606d51a0c9e4b : legacy to init the slow tokenizer when converting from slow was wrong (https://github.com/huggingface/transformers/pull/30972)
- https://github.com/huggingface/transformers/commit/b1065aa08ac0da11fcb9e3827cd7eafabe4beebd : Generation: get special tokens from model config (https://github.com/huggingface/transformers/pull/30899) zucchini-nlp

Reverted https://github.com/huggingface/transformers/commit/4ab7a28216211571fdddba414d4edd8426ab6489

4.41.0

Not secure

New models

Phi3

The Phi-3 model was proposed in [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/abs/2404.14219) by Microsoft.

TLDR; Phi-3 introduces new ROPE scaling methods, which seems to scale fairly well! A 3b and a
Phi-3-mini is available in two context-length variants—4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.

<img width="1599" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/0f37c6b0-b118-453c-ac64-6e45aa291d0a">

* Phi-3 by gugarosa in https://github.com/huggingface/transformers/pull/30423

JetMoE

JetMoe-8B is an 8B Mixture-of-Experts (MoE) language model developed by [Yikang Shen](https://scholar.google.com.hk/citations?user=qff5rRYAAAAJ) and [MyShell](https://myshell.ai/). JetMoe project aims to provide a LLaMA2-level performance and efficient language model with a limited budget. To achieve this goal, JetMoe uses a sparsely activated architecture inspired by the [ModuleFormer](https://arxiv.org/abs/2306.04640). Each JetMoe block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. Given the input tokens, it activates a subset of its experts to process them. This sparse activation schema enables JetMoe to achieve much better training throughput than similar size dense models. The training throughput of JetMoe-8B is around 100B tokens per day on a cluster of 96 H100 GPUs with a straightforward 3-way pipeline parallelism strategy.

<img width="1559" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/cc83ce99-7a61-4d04-a234-3f68e6c0fafd">

* Add JetMoE model by yikangshen in https://github.com/huggingface/transformers/pull/30005

PaliGemma

PaliGemma is a lightweight open vision-language model (VLM) inspired by [PaLI-3](https://arxiv.org/abs/2310.09199), and based on open components like the [SigLIP vision model](https://arxiv.org/abs/2303.15343) and the [Gemma language model](https://arxiv.org/abs/2403.08295). PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

More than 120 checkpoints are released see the collection [here](https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda) !

<img width="1064" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/23584b9a-6c36-46f5-8700-32f402c0f674">

* Add PaliGemma by molbap in https://github.com/huggingface/transformers/pull/30814

VideoLlava
Video-LLaVA exhibits remarkable interactive capabilities between images and videos, despite the absence of image-video pairs in the dataset.

💡 Simple baseline, learning united visual representation by alignment before projection
With the binding of unified visual representations to the language feature space, we enable an LLM to perform visual reasoning capabilities on both images and videos simultaneously.
🔥 High performance, complementary learning with video and image
Extensive experiments demonstrate the complementarity of modalities, showcasing significant superiority when compared to models specifically designed for either images or videos.

<img width="532" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/cLniWc__KECBBesliHKhd.png">

* Add Video Llava by zucchini-nlp in https://github.com/huggingface/transformers/pull/29733

Falcon 2 and FalconVLM:

<img width="1024" alt="image" src="https://falconllm.tii.ae/assets/images/table-1___.jpeg">

Two new models from TII-UAE! They published a [blog-post](https://falconllm.tii.ae/falcon-2.html) with more details! Falcon2 introduces parallel mlp, and falcon VLM uses the `Llava` framework
* Support for Falcon2-11B by Nilabhra in https://github.com/huggingface/transformers/pull/30771
* Support arbitrary processor by ArthurZucker in https://github.com/huggingface/transformers/pull/30875

GGUF `from_pretrained` support

<img width="1064" alt="image" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/gguf-spec.png">

You can now load most of the GGUF quants directly with transformers' `from_pretrained` to convert it to a classic pytorch model. The API is simple:

python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

We plan more closer integrations with llama.cpp / GGML ecosystem in the future, see: https://github.com/huggingface/transformers/issues/27712 for more details

* Loading GGUF files support by LysandreJik in https://github.com/huggingface/transformers/pull/30391

Page 6 of 33

Releases

Has known vulnerabilities

Previous Next

Transformers

Page 6 of 33

4.42.2

4.42.1

4.42.0

4.41.2

4.41.1

4.41.0

Page 6 of 33

Links

Releases