New models
PaliGemma 2
PaliGemma 2 and PaliGemma are lightweight open vision-language models (VLMs) inspired by [PaLI-3](https://arxiv.org/abs/2310.09199) and based on open components such as the [SigLIP vision model](https://arxiv.org/abs/2303.15343) and the [Gemma language model](https://arxiv.org/abs/2403.08295). PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, performing deeper analysis of images and providing useful capabilities such as captioning for images and short videos, object detection, and reading text embedded within images.
PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes, based on the Gemma 2 2B, 9B, and 27B models, respectively. The original PaliGemma models are available in the 3B size. For more information on Gemma model variants, see the [Gemma models list](https://ai.google.dev/gemma/docs/get_started#models-list). PaliGemma model variants support different pixel resolutions for image inputs: 224 x 224, 448 x 448, and 896 x 896 pixels.
<img width="743" alt="image" src="https://github.com/user-attachments/assets/55cda8a6-b463-4a58-b7d3-f7d50ee2fa11">
I-JEPA
The I-JEPA model was proposed in [Image-based Joint-Embedding Predictive Architecture](https://arxiv.org/pdf/2301.08243.pdf) by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. I-JEPA is a self-supervised learning method that predicts the representations of one part of an image from other parts of the same image. This approach focuses on learning semantic features without relying on pre-defined invariances from hand-crafted data transformations, which can bias specific tasks, or on filling in pixel-level details, which often leads to less meaningful representations.
<img width="413" alt="image" src="https://github.com/user-attachments/assets/561ca9d7-0327-477a-96b8-61d2af0caf34">
* Add I-JEPA by jmtzt in 33125
OLMo 2
<img width="833" alt="image" src="https://github.com/user-attachments/assets/1abdde92-0aae-404a-b83e-77ec8bd13b7f">
The OLMo2 model is the successor of the OLMo model, which was proposed in [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838).
The architectural changes from the original OLMo model to this model are:
- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.
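OLMo2 loads like any other causal LM. A minimal sketch, assuming the checkpoint name below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; see the Hub for the released OLMo 2 weights.
model_id = "allenai/OLMo-2-1124-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```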
Commits:
* Add OLMo November 2024 by 2015aroras in 34551
* Rename OLMo November to OLMo2 by 2015aroras in 34864
Layer-Skip Llama
We add support for Meta's Layer-Skip Llama 3.2 1B model.
The Llama 3.2 1B model was continually pretrained with the LayerSkip recipe (early exit loss and layer dropout), as presented in [Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding](https://arxiv.org/abs/2404.16710), and is capable of self-speculative decoding: it drafts tokens with its earlier layers and verifies them with the remaining layers.
<img width="854" alt="image" src="https://github.com/user-attachments/assets/4a9e3596-e44e-419f-804d-9f4d03f8f680">
* Self-speculation (Layer-Skip Llama) by ArthurZucker in 34240
Tensor Parallel implementation
This PR uses the `torch.distributed.tensor.parallel` subpackage to implement Tensor Parallel for Llama (as an example).
The motivation is multi-fold:
1. to make the modeling code as simple as in the single-worker case:
all manual TP implementations under `if self.config.pretraining_tp > 1` can be removed.
2. to make tensor parallelism easily accessible to users:
a `model.tensor_parallel(device_mesh)` method was added that lets users turn a single-process model into a parallel model.
This is the first PR of many to simplify and enable Tensor Parallel across models.
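A minimal sketch of the new entry point, assuming a multi-GPU host, torch >= 2.5, and the checkpoint below (launch with `torchrun`):

```python
import os
import torch
from torch.distributed.device_mesh import init_device_mesh
from transformers import AutoModelForCausalLM

# Launch with: torchrun --nproc-per-node=4 tp_sketch.py
world_size = int(os.environ["WORLD_SIZE"])
device_mesh = init_device_mesh("cuda", (world_size,))

# Assumed (gated) checkpoint; any supported Llama checkpoint works.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)
# Turn the single-process model into a tensor-parallel model in place.
model.tensor_parallel(device_mesh)
```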
* Simplify Tensor Parallel implementation with PyTorch TP by kwen2501 in 34184
Farewell, Python 3.8
Python 3.8 has reached end of life and, as such, we drop support for it from our CI.
* Drop support for Python 3.8 by ydshieh in 34314
GGUF improvements
Several improvements have been made to the GGUF support in transformers, notably by adding new architectures to the list of supported architectures.
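Loading a GGUF checkpoint still goes through the `gguf_file` argument of `from_pretrained`; a sketch with an assumed repo and filename:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo and filename; any supported GGUF checkpoint works the same way.
repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# The GGUF file is dequantized on the fly into a regular transformers model.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=filename)
```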
* Add T5 GGUF loading support by junejae in 33389
* Add GGUF for Mamba by VladOS95-cyber in 34200
* Add Nemotron GGUF Loading Support by farrosalferro in 34725
* Improve gguf tensor processing by VladOS95-cyber in 34515
* Fix `use_parallel_residual` and `qkv_bias` for StableLM GGUF config extraction by Isotr0py in 34450
Fast processors
We continue the work to improve the speed of fast processors, as detailed in this [roadmap](https://www.notion.so/huggingface2/OptimVision-Optimize-preprocessing-time-10f1384ebcac8091a12debb87fe5f591).
This release adds a fast image processor for RT-DETR.
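A minimal sketch; `use_fast=True` opts in to the fast processor (the checkpoint name is an assumption):

```python
from transformers import AutoImageProcessor

# use_fast=True selects the fast (torchvision-backed) image processor when available.
processor = AutoImageProcessor.from_pretrained("PekingU/rtdetr_r50vd", use_fast=True)
```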
* Add Image Processor Fast RT-DETR by yonigozlan in 34354
New pipelines
A new pipeline has been added to transformers: image-text-to-text!
The pipeline supports the following inputs (see the sketch after this list):
- unbatched images and text: `images=image, text=text`
- batched images and text: `images=[image, image], text=[text, text]`
- several images per prompt (only for models supporting the use of an image token): `images=[[image, image], [image]]` or `images=[image, image, image]`, with `text=["... <image>...<image>...", "...<image>..."]`
- chat templates (for models supporting them)
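A minimal sketch, assuming the checkpoint below (any supported image-text-to-text model works similarly):

```python
from transformers import pipeline

# Assumed checkpoint; any image-text-to-text model with an image token works.
pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")
out = pipe(
    images="http://images.cocodataset.org/val2017/000000039769.jpg",
    text="<image> What do you see in this image?",
    max_new_tokens=30,
)
print(out)
```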
* Add image text to text pipeline by yonigozlan in 34170
Notable refactors
Separate chat templates into a single file
We have had several issues with chat templates because they're stored as single lines in the JSON config files:
- Impossible to review diffs
- Very hard to edit in the web UI (or in general)
- Differences between `processor` templates in `chat_template.json` and `tokenizer` templates in `tokenizer_config.json` causing confusion
- Some models use multiple templates, requiring a template dict, but we're trying to discourage that in future and move those models to single templates with conditional behaviour instead
The solution:
- Just move chat templates to a single `chat_template.jinja` file in the repo
- If multiple templates are required, then they should still be stored in the JSON file. This is not supported for `Processor` classes, so processors should always be able to save their template as a raw Jinja file. In general, we'll be gently deprecating multiple templates in future.
- If a `chat_template.jinja` file is present, it overrides the JSON files. If a tokenizer is loaded with both Jinja and JSON chat templates and resaved, it should save only the Jinja file, and not have any `chat_template` entry in `tokenizer_config.json`.
For now, we continue saving in the old format by default. We'll probably keep it this way for several versions before making the new format the default, to ensure that most users are able to load the new format before it becomes common. Until then, the new format should mostly be used for testing, to make sure it's ready for deployment when we do the switch.
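As a sketch of the opt-in save path (the `save_raw_chat_template` flag name is our reading of the PR and may differ; the default save format is unchanged for now):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# Assumed opt-in flag from the PR; by default the template still goes into
# tokenizer_config.json. With the flag, it is written as chat_template.jinja.
tokenizer.save_pretrained("./my-tokenizer", save_raw_chat_template=True)
```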
* Separate chat templates into a single file by Rocketknight1 in 33957
Large modular logic refactor
This PR largely reworks the logic we use in the modular converter. It is (hopefully) clearer and more maintainable. Instead of going in all directions, adding stuff, then deleting it if not needed, we now do the following (a hypothetical modular file is sketched after the list):
- visit the whole modular file (record import/function/class/assignment nodes)
- create function dependency mapping
- for each import coming from another model:
- visit the corresponding file
- create function dependency mapping
- update mapping with function/assignment from the modular (updated/new functions)
- create the class dependency graph based on merged dependencies
- update dependency graph of the modular with the functions and assignments imported from the other files
- for each class recorded in the modular:
- if inheriting from a class in another file:
- replace call to super
- find the dependencies after the node was replaced
- follow the dependency mapping (updated with the modular definitions) to add all nodes
- else:
- only add needed imported functions (and their dependencies)
- determine the needed imports and add them
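For context, a modular file looks like the hypothetical sketch below; the converter resolves the cross-file inheritance into a standalone modeling file:

```python
# modular_mymodel.py -- hypothetical input to the modular converter
from transformers.models.llama.modeling_llama import LlamaMLP


class MyModelMLP(LlamaMLP):
    # The converter visits modeling_llama.py, pulls in LlamaMLP and its
    # dependencies, renames them, and emits a self-contained modeling file.
    pass
```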
* Large modular logic refactoring by Cyrilvallez in 34487
Community bugfixes and improvements
* Remove graph breaks for torch.compile() in flash_attention_forward when Llama model is padding-free tuned by Abhishek-TAMU in 33932
* Better defaults by ArthurZucker in 34026
* translated gguf.md into chinese by blueingman in 34163
* CI: fix failures by zucchini-nlp in 34371
* Zamba is an LM by LysandreJik in 34342
* add code generation to natural language processing section by furtnerthomas in 34333
* Fix pil_torch_interpolation_mapping import in image_processing_detr_fast by yonigozlan in 34375
* Add code sample docstrings and checkpoint reference for GLM models by h3110Fr13nd in 34360
* refactor: remove redundant if-condition and improve type correctness for `convert_tokens_to_ids` by winstxnhdw in 34030
* Ignore unsupported kwarg in ProcessorMixin call by yonigozlan in 34285
* [PEFT] Add warning for missing key in LoRA adapter by BenjaminBossan in 34068
* Fix `torch.fx` issue related to the new `loss_kwargs` keyword argument by michaelbenayoun in 34380
* Correct the new defaults by Cyrilvallez in 34377
* [auto. ping] Avoid sending empty info + add more team members by ydshieh in 34383
* Fix glm by Cyrilvallez in 34388
* Use non nested images and batched text Idefics2/3 by yonigozlan in 34222
* Fix onnx non-exportable inplace aten op by IlyasMoutawwakil in 34376
* Fix right padding in LLaVA models by zucchini-nlp in 34305
* no filter by ydshieh in 34391
* SynthID: better example by gante in 34372
* Tests: upgrade `test_eager_matches_sdpa_generate` by gante in 34386
* Fix bnb training test failure by matthewdouglas in 34414
* Avoid check expected exception when it is on CUDA by ydshieh in 34408
* Fix typos in agents_advanced.md by rudydel in 34405
* [docs] Cache implementations by stevhliu in 34325
* Fix pix2struct by IlyasMoutawwakil in 34374
* pin `tensorflow_probability<0.22` in docker files by ydshieh in 34381
* Tiny update after 34383 by ydshieh in 34404
* Fix batch size handling in prediction_loop for DataLoaderShard by zeus2611 in 34343
* exclude fsdp from delay_optimizer_creation by eljandoubi in 34140
* New option called `"best"` for `args.save_strategy`. by seanswyi in 31817
* [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details by h3110Fr13nd in 34322
* 🌐 [i18n-KO] Translated `model_doc/barthez.md` to Korean by Jwaminju in 33980
* Apply linting to the important code blocks to make it readable by ShubhamJagtap2000 in 34449
* Torchao weights only + prequantized compatibility by SunMarc in 34355
* [i18n-ar] Translated file : `docs/source/ar/fast_tokenizers.md` into Arabic by AhmedAlmaghz in 33034
* enable average tokens across devices by techkang in 34373
* feat: run benchmarks on A100 by McPatate in 34287
* Add `post_process_depth_estimation` for GLPN by alex-bene in 34413
* LLaVA: latency issues by zucchini-nlp in 34460
* Generation: fix test by zucchini-nlp in 34369
* Fix CI by zucchini-nlp in 34458
* use a tiny model to test generation config to avoid timeout by techkang in 34482
* 🚨🚨🚨 [SuperPoint] Fix keypoint coordinate output and add post processing by sbucaille in 33200
* Simplify running tests in a subprocess by ydshieh in 34213
* Fix perplexity computation in perplexity.md by Framartin in 34387
* Fixes for Modular Converter on Windows by hlky in 34266
* Fix regression loading dtype by SunMarc in 34409
* Bert is ExecuTorch compatible by guangy10 in 34424
* manual `head_dim` for `mixtral` model by wavy-jung in 34281
* fix-qwen2vl-no-position_ids by simonJJJ in 33487
* Bug fix for drop path decay rate in swin transformer by abhi-glitchhg in 34291
* MobileBERT is ExecuTorch compatible by guangy10 in 34473
* Albert is ExecuTorch compatible by guangy10 in 34476
* Adding `optimizer_cls_and_kwargs` to `Trainer.__init__` by apoorvkh in 34358
* Fix performance in get_imports regexp by AlekseyLobanov in 34298
* fix incorrect warning by yonigozlan in 34416
* Un-deprecate timeout arg in pipelines by Rocketknight1 in 34382
* Roberta is ExecuTorch compatible by guangy10 in 34425
* Fix format mistake in string repr of tokenizer objects by gpetho in 34493
* Mllama: update docs by zucchini-nlp in 34334
* VLMs: fix number of image tokens by zucchini-nlp in 34332
* Tests: move `generate` tests to the right mixin and delete redundant tests by gante in 34464
* fix pixtral processor by molbap in 34486
* Use torch 2.5 in scheduled CI by ydshieh in 34465
* Fix super tiny extra space typo by fzyzcjy in 34440
* UPDATE Documentation for TRANSLATING.md Documentation into Multiple Languages.(Changes made) by anshumangahlot in 34226
* enable QA bf16 pipeline by jiqing-feng in 34483
* Fix: img size mismatch caused by incorrect unpadding in LLaVA-Next by jp1924 in 34522
* Fix step shifting when accumulate gradient by kibitzing in 33673
* avoid calling `gc.collect` and `cuda.empty_cache` by ydshieh in 34514
* Qwen2VL: skip base `input_ids`-`inputs_embeds` equivalence check by gante in 34535
* fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests by philkuz in 34518
* feat: add benchmarks pg indexes by McPatate in 34536
* make `test_eager_matches_sdpa_inference `less flaky by ydshieh in 34512
* Bug Fix for issue 34294 by fpgaminer in 34295
* [CLIPSeg] Make interpolate_pos_encoding default to True by NielsRogge in 34419
* update doc by jiqing-feng in 34478
* [i18n-ar] Translated file : `docs/source/ar/multilingual.md` into Arabic by AhmedAlmaghz in 33048
* Blip: get/set input embeddings correctly by zucchini-nlp in 34152
* BLIP: enable generation tests by zucchini-nlp in 34174
* :red_circle: :red_circle: fix `query_pre_attn_scalar` different of `num_heads` in default gemma2 config by molbap in 34540
* [i18n-HI] Translated accelerate page to Hindi by karthik-script in 34443
* Update trainer for easier handling of accumulate, compile fixes, and proper reporting by muellerzr in 34511
* VLM: special multimodal Tokenizer by zucchini-nlp in 34461
* MPS: `isin_mps_friendly` can support 0D tensors by gante in 34538
* Add text support to the Trainer's TensorBoard integration by JacobLinCool in 34418
* [i18n-HI] Translated TFLite page to Hindi by karthik-script in 34572
* 🌐 [i18n-KO] Translated perf_train_special.md to Korean by maximizemaxwell in 34590
* 🌐 [i18n-KO] Update README_ko.md by J4BEZ in 33098
* fix TrainerState doc because num_input_tokens_seen is unused by default by techkang in 34593
* Fix Whisper CI by ydshieh in 34541
* Skip DeepSpeed ZeRO Stage 3 model initialization when bnb by eljandoubi in 34395
* FIX: Broken repr of TorchAoConfig by BenjaminBossan in 34560
* Load sub-configs from composite configs by zucchini-nlp in 34410
* DistilBERT is ExecuTorch compatible by guangy10 in 34475
* Remove unused test_dataset by thisisiron in 34516
* Revert "Fix Whisper CI" by ydshieh in 34605
* Fix 34494 assistant tokens when truncated by yonigottesman in 34531
* Remove `slow` for `test_eager_matches_sdpa_inference` by ydshieh in 34558
* Changing __repr__ in torchao to show quantized Linear by MekkCyber in 34202
* Fix torchvision interpolation CI by yonigozlan in 34539
* 🌐 [i18n-KO] Translated `convbert.md` to Korean by ahnjj in 34599
* fix(dvclive): pass fake dataset to avoid exception in trainer init by shcheklein in 34455
* 🌐 [i18n-KO] Translated `timesformer.md` to Korean by mreraser in 33972
* 🌐 [i18n-KO] Translated bert.md to Korean by maximizemaxwell in 34627
* [i18n-ar] Translated file : `docs/source/ar/trainer.md` into Arabic by AhmedAlmaghz in 33080
* Update llm_engine.py by louisbrulenaudet in 33332
* Agents: turn any Space into a Tool with `Tool.from_space()` by aymeric-roucher in 34561
* [docs] update not-working model revision by faaany in 34682
* [i18n-ar] Translated file : `docs/source/ar/torchscript.md` into Arabic by AhmedAlmaghz in 33079
* Agents: Small fixes in streaming to gradio + add tests by aymeric-roucher in 34549
* 🌐 [i18n-KO] Translated marian.md to Korean by maximizemaxwell in 34698
* [docs] Broken link in generation_strategies by pcuenca in 34717
* Fix example in EsmConfig docstring by yuanx749 in 34653
* [docs] add xpu device check by faaany in 34684
* Retain newlines in chat template when `continue_final_message=True` by lewtun in 34253
* Update llava.md by LysandreJik in 34749
* fix(wandb): pass fake dataset to avoid exception in trainer (see 34455) by CezaPasc in 34720
* add xpu path for awq by jiqing-feng in 34712
* FSDP grad accum fix by winglian in 34645
* Remove FSDP wrapping from sub-models. by eljandoubi in 34452
* 🧼 remove v4.44 deprecations by gante in 34245
* VLMs: `patch_size` -> `num_image_tokens` in processing by zucchini-nlp in 33424
* Fix broken link by ofek in 34618
* fix a typo bug where 'id2label' was incorrectly written as 'i2label' when reading config by ZuoChenFttS in 34637
* Fix skip of test_training_gradient_checkpointing by dvrogozh in 34723
* make sure to disable gradients for integer tensor by winglian in 32943
* [docs] make `empty_cache` device-agnostic by faaany in 34774
* [docs] add XPU besides CUDA, MPS etc. by faaany in 34777
* [tests] add XPU part to testing by faaany in 34778
* fix: Update pixel_values parameter in hf_model input by thisisiron in 34782
* Fix callback key name by jung-hunsoo in 34762
* fix: Wrong task mentioned in docs by ecyht2 in 34757
* Allow handling files as args for a tool created with Tool.from_space by aymeric-roucher in 34687
* Fix Whisper CI by ydshieh in 34617
* protect tensor parallel usage by ArthurZucker in 34800
* Trainer hyperparameter search kwargs docs update by GuillemGSubies in 34459
* feat: allow to use hf-hub models for timm backbone by cgebbe in 34729
* Support gradient checkpointing in Qwen2VL ViT by li-plus in 34724
* Fix: siglip image processor rgb_convert is not being applied correctly. by jp1924 in 34301
* fix cpu bnb path by jiqing-feng in 34647
* Gemma capping by ArthurZucker in 34282
* Fix cache_utils for optimum.quanto kvcache quantization by SunMarc in 34750
* Modular fix by Cyrilvallez in 34802
* MLU devices: checks if mlu is available via a cndev-based check which won't trigger the drivers and leave mlu by huismiling in 34326
* 🚨🚨🚨 fix(Mask2Former): torch export 🚨🚨🚨 by philkuz in 34393
* Feature: print tokens per second during training by tibor-reiss in 34507
* Add do_convert_rgb to vit by jp1924 in 34523
* Fix post process function called in the instance segmentation example of mask2former by OnTheThirdDay in 34588
* fix crash in tiiuae/falcon-11B-vlm image-to-text generation by sywangyi in 34728
* Add support for OpenAI api "image_url" input in chat for image-text-to-text pipeline by yonigozlan in 34562
* Add Image Processor Fast Deformable DETR by yonigozlan in 34353
* Run `test_medium_seamless_m4t_pt` in `subprocess` to avoid many failures by ydshieh in 34812
* Fix `check_training_gradient_checkpointing` by ydshieh in 34806
* Added image-text-to-text pipeline to task guide by merveenoyan in 34783
* Translate attention.md into Chinese by wwwbai in 34716
* LLaVA OV: fix unpadding precision by zucchini-nlp in 34779
* Fix low memory beam search by zucchini-nlp in 34746
* Fix the memory usage issue of logits in generate() by kjohew in 34813
* fix(DPT,Depth-Anything) `torch.export` by philkuz in 34103
* Fix: take into account meta device by tibor-reiss in 34134
* Fix hyperparameter search when optuna+deepseed by corentin-ryr in 34642
* Fix CI by tweaking torchao tests by SunMarc in 34832
* Fix CI slack reporting issue by ydshieh in 34833
* VLMs: enable generation tests - last batch by zucchini-nlp in 34484
* Change logging level from warning to info for `max_steps` overriding `num_train_epochs` by qgallouedec in 34810
* Fix ds nvme by eljandoubi in 34444
* Fix heuristic scheduling for UAG by jmamou in 34805
* Refactor StarCoder2 using modular by Cyrilvallez in 34015
* Watermarking: fix order by zucchini-nlp in 34849
* Update checks for torch.distributed.tensor to require torch >= 2.5 by loadams in 34816
* Remove quantization related config from dequantized model by konradkalita in 34856
* Auto compile when static cache by ArthurZucker in 34247
* Speculative decoding: Test the target distribution (to prevent issues like 32867) by keyboardAnt in 34553
* smol improvements to support more flexible usage by andimarafioti in 34857
* [CI] Skip EETQ tests while package is broken with latest transformers by BenjaminBossan in 34854
* Bitnet test fix to avoid using gated model by MekkCyber in 34863
* Fix support for image processors modifications in modular by yonigozlan in 34866
* Fix: Enable prefill phase key value caching of nemotron/minitron models by jeongin601 in 34742
* Add safe_globals to resume training on PyTorch 2.6 by dvrogozh in 34632
* Cache: init empty cache when `use_cache` by zucchini-nlp in 34274
* BLIP: fix generation after hub update by zucchini-nlp in 34876
* [`Deberta/Deberta-v2`] Refactor code base to support compile, export, and fix LLM by ArthurZucker in 22105
* 🔴 Mllama: fix base prefix by zucchini-nlp in 34874
* Sum gathered input tokens by techkang in 34554
* allow unused input parameters passthrough when chunking in asr pipelines by VictorAtIfInsurance in 33889
* prepare_fa2_from_position_ids function bugfix by meliksahturker in 33269
* chore: fix some typos by wanxiangchwng in 34891
* Fix convert_tokens_to_string when decoder is None by dszeto in 34569
* [`peft`] Given that `self.active_adapter` is deprecated, avoid using it by tomaarsen in 34804
* Fix Qwen2 failing tests by jla524 in 34819
* Fix : BitNet tests by MekkCyber in 34895
* [AWQ, CI] Bump AWQ version used in docker image by BenjaminBossan in 34922
* fix static cache data type miss-match by jiqing-feng in 34799
* Fix `test_auto_backbone_timm_model_from_pretrained` by ydshieh in 34877
* Upgrade torch version to 2.5 in dockerfile for quantization CI by MekkCyber in 34924
* Fix failling GGML test by MekkCyber in 34871
* Updated documentation and added conversion utility by ViktorooReps in 34319
* making gpt2 fx traceable by xuzifei-dmatrix in 34633
* Fix import structure for Fast Image processors by yonigozlan in 34859
* VideoLLaVA: add default values by zucchini-nlp in 34916
* Skipping aqlm non working inference tests till fix merged by MekkCyber in 34865
* [Whisper] Fix whisper integration tests by eustlb in 34111
* Add Pytorch Tensor Parallel support for Mistral by VladOS95-cyber in 34927
* change apply_rotary_pos_emb of Glmmodel for GLM-Edge Series model by zRzRzRzRzRzRzR in 34629
* Fix torch.onnx.export of Qwen2-VL vision encoder by xenova in 34852
* Update the Python version in the Chinese README to match the English README. by vansin in 34870
* [i18n-ar] Translated file : `docs/source/ar/benchmarks.md` into Arabic by AhmedAlmaghz in 33023
* [docs] use device-agnostic API instead of cuda by faaany in 34913
* [doc] use full path for run_qa.py by faaany in 34914
* docs: HUGGINGFACE_HUB_CACHE -> HF_HUB_CACHE by imba-tjd in 34904
* [i18n-zh]Translated tiktoken.md into chinese by blueingman in 34936
* [`FlexAttention`] Update gemma2 by ArthurZucker in 34942
* Fix : Add PEFT from source to CI docker by MekkCyber in 34969
* Avoid calling `get_max_length` by ydshieh in 34971
* Fix flaky test execution caused by `Thread` by ydshieh in 34966
* 🌐 [i18n-KO] Translated encoder-decoder.md to Korean by maximizemaxwell in 34880
* [docs] add explanation to `release_memory()` by faaany in 34911
* [i18n-zh]Translated perf_train_special.md into Chinese by blueingman in 34948
* Fix typo in code block in vipllava.md by yuanx749 in 34957
* Fixed typo in `VisitWebpageTool` by sergiopaniego in 34978
* [PEFT] Set eval mode when loading PEFT adapter by BenjaminBossan in 34509
* Fix `save_pretrained` for partially offloaded models by kylesayrs in 34890
* 🚨🚨🚨 Changed DINOv2Config default patch size to 14 by OFSkean in 34568
* Refine the code of Universal Assisted Generation by xinpengzz in 34823
* Allow compressed-tensors quantized model to be trained by horheynm in 34520
* Offloaded cache: fix generate by zucchini-nlp in 34921
* Fix `utils/check_bad_commit.py` (for auto ping in CI) by ydshieh in 34943
* Add optimized `PixtralImageProcessorFast` by mgoin in 34836
* Improve `.from_pretrained` type annotations by qubvel in 34973
* Fix docker CI : install autogptq from source by MekkCyber in 35000
* Let server decide default repo visibility by Wauplin in 34999
* 🚨🚨🚨 Uniformize kwargs for TrOCR Processor by tibor-reiss in 34587
* Update timm version by qubvel in 35005
* fix: double verbs by SamuelLarkin in 35008
* Update `FillMaskPipeline.__call__` signature and docstring by alvarobartt in 35006
* Only cast `cu_seqlens` when tracing by xenova in 35016
* fix variable undefined bug when return_tensors is not specified in llava processing by chenweize1998 in 34953
* Optimize memory usage of mllama encoder by milesial in 34930
* Typo in warning switching to optimum-quanto by Bojun-Feng in 35028
* Add type hints for forward functions in Gemma2 by jla524 in 35034
* Fix `test_eager_matches_sdpa_inference` for `XPU` backend by dvrogozh in 34889
* Multiple typo fixes in Tutorials docs by henryhmko in 35035
* add docstring example for compute_loss_func by secrettoad in 35020
* [i18n-ar] Translated file : `docs/source/ar/notebooks.md` into Arabic by AhmedAlmaghz in 33049
* [docs] add the missing import for Image and bug fix by faaany in 34776
* Translate bertlogy.md into Chinese by wwwbai in 34908
* Automatic compilation in generate: do not rely on inner function by Cyrilvallez in 34923
* Add token cost + runtime monitoring to Agent and HfEngine children by aymeric-roucher in 34548
* Fix `BertGeneration` by ydshieh in 35043
* fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… by sywangyi in 34454
* [docs] fix example code bug by faaany in 35054
* Translate community.md into Chinese by wwwbai in 35013
* [docs] use device-agnostic instead of `cuda` by faaany in 35047
* [docs] use device-agnostic API instead of hard-coded cuda by faaany in 35048
* Fix `pad_token_tensor` is None in warning by tshu-w in 34005
* Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 by VladOS95-cyber in 35007
* [`GPTNeoX`] Flex Attention + Refactor by vasqu in 34896
* Support for easier multimodal use of modular by Cyrilvallez in 35056
* [docs] add a comment that offloading requires CUDA GPU by faaany in 35055
* [docs] Increase visibility of torch_dtype="auto" by stevhliu in 35067
* Informative by ydshieh in 35059
* [Whisper] Fix whisper tokenizer by eustlb in 34537
* [`tokenizers`] bump to 0.21 by ArthurZucker in 34972
* Update Mistral conversion script by Cyrilvallez in 34829
* Fix `tie_word_embeddings` handling for GGUF models by Isotr0py in 35085
* Deprecate quanto and switch to optimum-quanto by MekkCyber in 35001
* BLIP: this is correct now by zucchini-nlp in 35081
* [`trainer`] fix the GA `model_accepts_loss_kwargs` by ArthurZucker in 34915
* Fix flaky Hub CI (`test_trainer.py`) by ydshieh in 35062
* Adaptive dynamic number of speculative tokens by jmamou in 34156
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* AhmedAlmaghz
* [i18n-ar] Translated file : `docs/source/ar/fast_tokenizers.md` into Arabic (33034)
* [i18n-ar] Translated file : `docs/source/ar/multilingual.md` into Arabic (33048)
* [i18n-ar] Translated file : `docs/source/ar/trainer.md` into Arabic (33080)
* [i18n-ar] Translated file : `docs/source/ar/torchscript.md` into Arabic (33079)
* [i18n-ar] Translated file : `docs/source/ar/benchmarks.md` into Arabic (33023)
* maximizemaxwell
* 🌐 [i18n-KO] Translated perf_train_special.md to Korean (34590)
* 🌐 [i18n-KO] Translated bert.md to Korean (34627)
* 🌐 [i18n-KO] Translated marian.md to Korean (34698)
* 🌐 [i18n-KO] Translated encoder-decoder.md to Korean (34880)
* 2015aroras
* Add OLMo November 2024 (34551)
* Rename OLMo November to OLMo2 (34864)
* mgoin
* Add optimized `PixtralImageProcessorFast` (34836)