
Transformers

Latest version: v4.50.3




4.29.2

Not secure

Fixes the package so non-Python files (like CUDA kernels) are properly included.

4.29.1

Not secure

Reverts a regression in the FSDP integration.
Adds `pip install transformers["agent"]` to install all the dependencies agents rely on.
Fixes the documentation about agents.

* Revert "search buffers for dtype" in 23308 by sgugger
* Fix image segmentation tool test in 23306 by sgugger
* Fix typo in gradio-tools docs in 23305 by freddyaboulton
* Fix broken links in the agent docs in 23297 by sgugger
* Agents extras in 23301 by LysandreJik
* Update transformers_agents.mdx in 23289 by mishig25
* Update custom_tools.mdx: fix link in 23292 by mishig25

4.29.0

Not secure

Transformers Agents

Transformers Agents is a new API that lets you use the library and Diffusers by prompting an agent (a large language model) in natural language. The agent then outputs code that uses a set of predefined tools, leveraging the appropriate state-of-the-art models for the task the user wants to perform. It is fully multimodal and extensible by the community. Learn more in the [docs](https://huggingface.co/docs/transformers/transformers_agents).

* Transformers Agents by LysandreJik, patrickvonplaten and sgugger in 23214

SAM

SAM (Segment Anything Model) was proposed in [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.

The model can be used to predict segmentation masks of any object of interest given an input image.

* Add Segment Anything Model (SAM) by ArthurZucker in 22654
* [`SAM`] Correct arxiv link by younesbelkada in 22886
* Fix SAM example in documentation by fxmarty in 22887
* [`SAM`] Change to `facebook/sam-vit-base` by younesbelkada in 22891
* Small sam patch by ArthurZucker in 22920
* [`SAM`] Add sam doc by younesbelkada in 22984
* Make sam ONNX exportable by fxmarty in 22915
* `DocumentQuestionAnsweringPipeline` only for fast ⚡ tokenizers by ydshieh in 22745
* Add `automatic-mask-generation` pipeline for Segment Anything Model (SAM) by ArthurZucker in 22840
* Expose AutoModelForMaskGeneration by fxmarty in 22910

RWKV

RWKV suggests a tweak to the traditional Transformer attention that makes it linear. This way, the model can be used as a recurrent network: passing the inputs for timestep 0 and timestep 1 together gives the same result as passing the inputs at timestep 0, then the inputs at timestep 1 along with the state from timestep 0.

This can be more efficient than a regular Transformer and can handle sentences of any length (even if the model uses a fixed context length for training).

* Add RWKV-4 by sgugger and younesbelkada in 22797
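The recurrence can be illustrated with a toy, single-channel version of the RWKV-4 "WKV" operator. This is a simplified sketch under stated assumptions, not the library's kernel: it uses plain Python floats, treats the decay `w` and bonus `u` as given scalars (in the model they are learned per channel), and skips the numerical-stability tricks a real implementation needs. It shows the property described above: processing a sequence in one go, or in two chunks while carrying the state, gives the same outputs.

```python
import math
from typing import List, Optional, Tuple

# State carried between calls: (decayed weighted sum of values, decayed sum of weights).
State = Tuple[float, float]


def wkv(keys: List[float], values: List[float], w: float, u: float,
        state: Optional[State] = None) -> Tuple[List[float], State]:
    """Toy single-channel RWKV-4 style WKV recurrence (no numerical stabilization)."""
    num, den = state if state is not None else (0.0, 0.0)
    decay = math.exp(-w)
    outputs = []
    for k, v in zip(keys, values):
        # The current token gets an extra "bonus" weight exp(u + k).
        bonus = math.exp(u + k)
        outputs.append((num + bonus * v) / (den + bonus))
        # Decay past contributions, then fold in the current token.
        num = decay * num + math.exp(k) * v
        den = decay * den + math.exp(k)
    return outputs, (num, den)


keys, values = [0.1, -0.3, 0.7, 0.2], [1.0, 2.0, -1.0, 0.5]
whole, _ = wkv(keys, values, w=0.5, u=0.3)

# Same sequence in two chunks, carrying the state from the first chunk.
first, state = wkv(keys[:2], values[:2], w=0.5, u=0.3)
second, _ = wkv(keys[2:], values[2:], w=0.5, u=0.3, state=state)
assert all(math.isclose(a, b) for a, b in zip(whole, first + second))
```

Because the state has a fixed size, the memory needed per step does not grow with the sequence length, which is what lets the recurrent mode process inputs longer than the training context.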

FocalNet

The FocalNet model was proposed in [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. FocalNets completely replace self-attention (used in models like [ViT](https://huggingface.co/docs/transformers/model_doc/vit) and [Swin](https://huggingface.co/docs/transformers/model_doc/swin)) by a focal modulation mechanism for modeling token interactions in vision. The authors claim that FocalNets outperform self-attention based models with similar computational costs on the tasks of image classification, object detection, and segmentation.

* Add FocalNet by NielsRogge in 21532
* Add focalnet backbone by alaradirik in 23104

OpenLLaMa

The Open-Llama model was proposed in the [Open-Llama project](https://github.com/s-JoL/Open-Llama) by community developer s-JoL.

The model is mainly based on LLaMA with some modifications, incorporating memory-efficient attention from Xformers, stable embedding from Bloom, and shared input-output embedding from PaLM. The model is also pre-trained on both Chinese and English, which gives it better performance on Chinese language tasks.

* add open-llama model with ckpt by s-JoL in 22795

Assisted Generation

Assisted generation is a new technique that speeds up generation with large language models by using a smaller model as an assistant. The assistant model does the multiple forward passes needed to propose candidate tokens, while the LLM merely validates the tokens proposed by the assistant. This can lead to speed-ups of up to 10x!

* Generate: Add assisted generation by gante in 22211
* Generate: assisted generation with sample (take 2) by gante in 22949
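In the library, assisted generation is used by passing the smaller model to `generate()` through its `assistant_model` argument. To show why greedy output does not change, here is a toy, framework-free sketch of greedy assisted decoding. The "models" are hypothetical next-token functions over integer ids, the draft length of 4 is arbitrary, and the large model's validation is written as a loop rather than the single batched forward pass a real implementation would use.

```python
from typing import Callable, List, Tuple

# A "model" here is just a greedy next-token function: prefix -> next token id.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(model(seq))
    return seq


def assisted_decode(model: NextToken, assistant: NextToken, prompt: List[int],
                    max_new_tokens: int, num_draft: int = 4) -> Tuple[List[int], int]:
    """Greedy assisted decoding. Returns the sequence and the number of validation
    rounds (each would be one large-model forward pass in a real implementation)."""
    seq = list(prompt)
    target_len = len(prompt) + max_new_tokens
    rounds = 0
    while len(seq) < target_len:
        rounds += 1
        # 1. The cheap assistant drafts a few candidate tokens on its own.
        draft: List[int] = []
        for _ in range(min(num_draft, target_len - len(seq))):
            draft.append(assistant(seq + draft))
        # 2. The large model checks the draft, position by position.
        for candidate in draft:
            expected = model(seq)
            seq.append(expected)  # always keep the large model's choice
            if expected != candidate:
                break  # first disagreement: discard the rest of the draft
    return seq, rounds


def big_model(prefix: List[int]) -> int:
    return (sum(prefix) * 7 + 3) % 10


def small_model(prefix: List[int]) -> int:
    # Agrees with the big model except when the prefix length is a multiple of 5.
    guess = big_model(prefix)
    return guess if len(prefix) % 5 else (guess + 1) % 10


prompt = [1, 2]
expected = greedy_decode(big_model, prompt, max_new_tokens=12)
output, rounds = assisted_decode(big_model, small_model, prompt, max_new_tokens=12)
assert output == expected
print(rounds)  # 4
```

The output matches plain greedy decoding of the large model exactly; the saving comes from needing 4 large-model rounds instead of 12 sequential steps, which only pays off when the assistant usually agrees with the large model.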

Code on the Hub from another repo

To avoid duplicating model code across multiple repos when using the code on the Hub feature, loading such a model now saves, in its config, the repo where the code lives. This way, there is a single source of truth for code on the Hub models.

* Use code on the Hub from another repo by sgugger in 22698
* Use code on the Hub from another repo by sgugger in 22814
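A hedged sketch of how such a cross-repo reference might be resolved. It assumes the config's `auto_map` stores entries of the form `repo_id--module.ClassName`; the `--` separator is our reading of this feature rather than something these notes state, and `split_class_reference` and the repo names below are hypothetical.

```python
from typing import Optional, Tuple


def split_class_reference(reference: str,
                          default_repo: Optional[str] = None) -> Tuple[Optional[str], str, str]:
    """Hypothetical helper: split an auto_map-style reference into (repo_id, module, class_name).

    Assumed format: "repo_id--module.ClassName". Without "--", the code is taken
    to live in `default_repo` (the repo being loaded).
    """
    if "--" in reference:
        repo_id, reference = reference.split("--", 1)
    else:
        repo_id = default_repo
    module, class_name = reference.rsplit(".", 1)
    return repo_id, module, class_name


# A fine-tuned copy whose config points back at the repo that holds the code.
auto_map = {"AutoModel": "my-org/shared-code--modeling_custom.CustomModel"}
print(split_class_reference(auto_map["AutoModel"], default_repo="my-org/finetuned-copy"))
# ('my-org/shared-code', 'modeling_custom', 'CustomModel')
```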

Breaking changes

This release has three breaking changes compared to version v4.28.0.

The first one focuses on fixing training issues for Pix2Struct. This slightly changes the outputs, but should make the model train much better.

* 🚨🚨🚨 [`Pix2Struct`] Attempts to fix training issues 🚨🚨🚨 by younesbelkada in 23004

The second one aligns the ignore index in the LUKE model with the other models in the library. This breaks the convention that models should stick to their original implementation, but it was necessary for consistency with the other models in the library.

* 🚨🚨🚨 Use default ignore index in Luke by sgugger in 23014

Finally, the third breaking change harmonizes the training procedure for most recent additions to transformers: it is now the user's responsibility to mask the padding tokens in the labels with the correct value. This PR addresses an issue that was also raised for other architectures such as Luke or Pix2Struct.

* 🚨🚨🚨 [`Blip`] remove labels masking by younesbelkada in 23024
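With label masking now left to the user, a minimal sketch of the step looks like this. It assumes -100 as the ignore value (PyTorch's default `ignore_index` for `CrossEntropyLoss`) and a pad token id of 0; both are illustrative, so use your tokenizer's `pad_token_id` in practice. With PyTorch tensors, the same operation is `labels.masked_fill(labels == pad_token_id, -100)`.

```python
from typing import List

IGNORE_INDEX = -100  # assumed: default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding_in_labels(labels: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Replace padding tokens in the labels so the loss ignores those positions."""
    return [[IGNORE_INDEX if tok == pad_token_id else tok for tok in row] for row in labels]


batch = [[5, 8, 2, 0, 0], [7, 3, 9, 4, 2]]  # pad_token_id = 0 is an assumption
print(mask_padding_in_labels(batch, pad_token_id=0))
# [[5, 8, 2, -100, -100], [7, 3, 9, 4, 2]]
```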

Bugfixes and improvements

* Change `torch_dtype` to `str` when `saved_model=True` in `save_pretrained` for TF models by ydshieh in 22740
* 🌐 [i18n-KO] Translated `training.mdx` to Korean by gabrielwithappy in 22670
* Remove `DS_BUILD_AIO=1` by ydshieh in 22741
* [trainer] update url by stas00 in 22747
* fix(llama): fix LlamaTokenizer by rockmagma02 in 22746
* Generate: handle text conditioning with multimodal encoder-decoder models by gante in 22748
* Revert (for now) the change on `Deta` in 22437 by ydshieh in 22750
* Fix `serving_output` for TF composite models (encoder-decoder like models) by ydshieh in 22743
* 🌐 [i18n-KO] Translated `sequence_classification.mdx` to Korean by 0525hhgus in 22655
* [Examples] TPU-based training of a language model using TensorFlow by sayakpaul in 21657
* Pix2struct: doctest fix by gante in 22761
* Generate: pin number of beams in BART test by gante in 22763
* Fix a mistake in Llama weight converter log output. by aljungberg in 22764
* Fix failing torchscript tests for `CpmAnt` model by ydshieh in 22766
* [WIP]🌐 [i18n-KO] Translated `tutorial/preprocessing.mdx` to Korean by sim-so in 22578
* Tweak ESM tokenizer for Nucleotide Transformer by Rocketknight1 in 22770
* Fix word_ids hyperlink by mayankagarwals in 22765
* Seq2SeqTrainer: Evict decoder_input_ids only when it is created from labels by gante in 22772
* Indexing fix - CLIP checkpoint conversion by amyeroberts in 22776
* Move labels to the same device as logits for Whisper by oscar-garzon in 22779
* Generate: add CJK support to TextStreamer by bcol23 in 22664
* Fix `test_word_time_stamp_integration` for `Wav2Vec2ProcessorWithLMTest` by ydshieh in 22800
* 🌐 [i18n-KO] Translated `custom_models.mdx` to Korean by HanNayeoniee in 22534
* [i18n-KO] fix: docs: ko: sagemaker anchors and `_toctree.yml` by jungnerd in 22549
* improve(llama): Faster apply_rotary_pos_emb by fpgaminer in 22785
* Fix sneaky torch dependency in TF example by Rocketknight1 in 22804
* 🌐 [i18n-KO] Translated `tasks/translation.mdx` to Korean by wonhyeongseo in 22805
* Don't use `LayoutLMv2` and `LayoutLMv3` in some pipeline tests by ydshieh in 22774
* Fix squeeze into torch 1.x compatible form in llama model by DyeKuu in 22808
* Remove accelerate from tf test reqs by muellerzr in 22777
* Simplify update metadata job by sgugger in 22811
* Revert "Use code on the Hub from another repo" by sgugger in 22813
* Introduce `PartialState` as the device handler in the `Trainer` by muellerzr in 22752
* Mark auto models as important by sgugger in 22815
* TTS fine-tuning for SpeechT5 by hollance in 21824
* 🌐 [i18n-KO] Fix anchor links for docs `auto_tutorial`, `training` by gabrielwithappy in 22796
* Fix Past CI not running against the latest `main` by ydshieh in 22823
* Fix `test_eos_token_id_int_and_list_top_k_top_sampling` by ydshieh in 22826
* Update accelerate version + warning check fix by muellerzr in 22833
* Fix from_pretrained when model is instantiated on the meta device by sgugger in 22837
* Raise err if minimum Accelerate version isn't available by muellerzr in 22841
* Make ClipSeg compatible with model parallelism by youssefadr in 22844
* fix SpeechT5 doc comments by hollance in 22854
* move preprocess_logits_for_metrics before _nested_gather in trainer.e… by ChenyangLiu in 22603
* feat(model parallelism): move labels to the same device as logits for M2M100 by elabongaatuo in 22850
* use `accelerate@main` in CI by ydshieh in 22859
* Remove 'main' from doc links by amyeroberts in 22860
* Show diff between 2 CI runs on Slack reports by ydshieh in 22798
* Remove some pipeline skip cases by ydshieh in 22865
* Fixup multigpu local_rank by muellerzr in 22869
* Fix to removing ESM special tokens by Rocketknight1 in 22870
* XGLM: Fix left-padding (PT and TF) by gante in 22828
* Patching clip model to create mask tensor on the device by shanmugamr1992 in 22711
* fix: Correct small typo in docstring by oscar-defelice in 22857
* Generation: only search for eos_token if set by xloem in 22875
* Change schedule CI time by ydshieh in 22884
* fix warning function call creating logger error (max_length and max_new_tokens) by QuentinAmbard in 22889
* [Examples/TensorFlow] minor refactoring to allow compatible datasets to work by sayakpaul in 22879
* moved labels to the same device as logits for OPT, CODEGEN, gptj and pixel2struct model by sushmanthreddy in 22872
* Include decoder_attention_mask in T5 model inputs by aashiqmuhamed in 22835
* Fix weight tying in TF-ESM by Rocketknight1 in 22839
* Pin flax & optax version by amyeroberts in 22895
* Revert DeepSpeed stuff from accelerate integration by muellerzr in 22899
* [tensorflow] Add support for the `is_symbolic_tensor` predicate by hvaara in 22878
* moved labels to the same device as logits for LILT model by sushmanthreddy in 22898
* Skip a failing test on main for now by ydshieh in 22911
* Moved labels to enable parallelism pipeline in Luke model by sushmanthreddy in 22909
* Fix counting in Slack report for some jobs by ydshieh in 22913
* Fix Slack report for Nightly CI and Past CI by ydshieh in 22901
* fix CLAP integration tests by hollance in 22834
* Add inputs_embeds functionality when generating with GPT-Neox by TobiasLee in 22916
* Fix `FillMaskPipelineTests` by ydshieh in 22894
* Update Swin MIM output class by alaradirik in 22893
* fix bug of CLAP dataloader by lukewys in 22674
* Fix: Seq2SeqTrainingArgs overriding to_dict for GenerationConfig json support by Natooz in 22919
* fix: GPTNeoX half inference error by SeongBeomLEE in 22888
* Remove broken test_data symlink in legacy s2s examples by hvaara in 22876
* Hardcode GELU as the intermediate activation for ESM by Rocketknight1 in 22892
* [CI] clap patch fusion test values by ArthurZucker in 22922
* ddp fixes for training by winglian in 22874
* tests: Fix flaky test for NLLB-MoE by connor-henderson in 22880
* Fix a minor bug in CI slack report by ydshieh in 22906
* Feature to convert videomae huge and small finetuned on kinetics and ssv2 added to the videomae to pytorch converter by sandstorm12 in 22788
* vilt_model by sushmanthreddy in 22930
* [i18n-KO] Translated `accelerate.mdx` to Korean by 0525hhgus in 22830
* [CLAP] Doc nits by ArthurZucker in 22957
* Generate: Add exception path for Donut by gante in 22955
* Update tiny models and a few fixes by ydshieh in 22928
* 🌐 [i18n-KO] Translated `tasks/masked_language_modeling.mdx` to Korean by HanNayeoniee in 22838
* 🌐 [i18n-KO] Translated `tasks/summarization.mdx` to Korean by sim-so in 22783
* Add an attribute to disable custom kernels in deformable detr in order to make the model ONNX exportable by fxmarty in 22918
* Decorate `test_codegen_sample_max_time` as flaky by ydshieh in 22953
* Raise error if `stride` is too high in `TokenClassificationPipeline` by boyleconnor in 22942
* [Fix Bugs] Fix keys in `_load_pretrained_model` by hanrui1sensetime in 22947
* Prepare tests for hfh 0.14 by Wauplin in 22958
* 🌐 [i18n-KO] Translated `run_scripts.mdx` to Korean by HanNayeoniee in 22793
* Reverting Deta cloning mechanism by Narsil in 22656
* fix ValueError message in LlamaAttention by othertea in 22966
* Fix TF example in quicktour by Rocketknight1 in 22960
* Update feature selection in to_tf_dataset by amyeroberts in 21935
* 🌐 [i18n-KO] translate `create_a_model` doc to Korean by gabrielwithappy in 22754
* Install `accelerate@main` in PyTorch Past CI jobs by ydshieh in 22963
* Fix `DeepSpeed` CI job link in Past CI by ydshieh in 22967
* 🌐 [i18n-KO] Fixed `tasks/masked_language_modeling.mdx` by HanNayeoniee in 22965
* Neptune fix bug init run by AleksanderWWW in 22836
* fixed small typo in code example by jvanmelckebeke in 22982
* Avoid invalid escape sequences, use raw strings by Lingepumpe in 22936
* [`DocTest`] Fix correct checkpoint by younesbelkada in 22988
* 🌐 [i18n-KO] Translated `serialization.mdx` to Korean by wonhyeongseo in 22806
* Fix typo in mega.mdx by dleve123 in 22998
* 🌐 [i18n-KO] Translated `tasks/image_captioning.mdx` to Korean by sim-so in 22943
* 🌐 [i18n-KO] Translated `token_classification.mdx` to Korean by 0525hhgus in 22945
* Add TensorFlow Wav2Vec2 for sequence classification by nandwalritik in 22073
* Remove a failing ONNX test by ydshieh in 23011
* Add gradient checkpointing to Whisper Flax by versae in 22954
* [`PEFT`] Add HFTracer support for PEFT by younesbelkada in 23006
* [Llama Tokenizer] Fast llama template by ArthurZucker in 22959
* Fix None value when adding info to auto_map by sgugger in 22990
* Bring back PartialState DeepSpeed by muellerzr in 22921
* Add methods to PreTrainedModel to use PyTorch's BetterTransformer by fxmarty in 21259
* [`Pix2Struct`] Fix pix2struct doctest by younesbelkada in 23023
* 🌐 [i18n-KO] Translated `multilingual.mdx` to Korean by HanNayeoniee in 23008
* Fix the expected error in `test_offline_mode_pipeline_exception` by ydshieh in 23022
* [MEGA] nit size test by ArthurZucker in 23028
* added GPTNeoXForTokenClassification by peter-sk in 23002
* added GPTNeoForTokenClassification by peter-sk in 22908
* Update `BridgeTowerModelTester` by ydshieh in 23029
* Fix bigbird random attention by Bearnardd in 21023
* Fix CLAP link across all READMEs by ehsanmok in 23032
* Make `_test_xla_generate` less flaky by ydshieh in 22996
* Add Trainer support for ReduceLROnPlateau by pie3636 in 23010
* 🌐 [i18n-KO] Translated `model_sharing.mdx` to Korean by 0525hhgus in 22991
* [docs] Doc TOC updates by MKhalusova in 23049
* Cuda rng_state_all is used when saving in distributed mode so same should also be used when loading by ShivamShrirao in 23045
* Skip pt/flax equivalence tests in pytorch `bigbird` test file by ydshieh in 23040
* Fix model parallelism for `BridgeTower` by ydshieh in 23039
* extend the test files by ydshieh in 23043
* Generate: prepare assisted generation for release by gante in 23052
* Fix grammar error in summarization pipeline by SKaplanOfficial in 23080
* Fix string syntax error in logger warning message (additional comma) by xwen99 in 23083
* Add `BioGPTForSequenceClassification` by awinml in 22253
* Fix `convnext` __init__ by IMvision12 in 23078
* Deprecate xpu_backend for ddp_backend by muellerzr in 23085
* 🌐 [i18n-KO] Translated `tasks/image_classification.mdx` to Korean by 0525hhgus in 23048
* 🌐 [i18n-KO] Translated `tasks/question_answering.mdx` to Korean by jungnerd in 23012
* 🌐 [i18n-KO] Translated `tasks/zero_shot_image_classification.mdx` to Korean by HanNayeoniee in 23065
* added type hints for blip_text pytorch model by iamarunbrahma in 23071
* Save the tokenizer and image preprocessor after training a model with the contrastive image-text example by regisss in 23035
* GPT2ForQuestionAnswering by peter-sk in 23030
* 🌐 [i18n-KO] Translated `torchscript.mdx` to Korean by sim-so in 23060
* Fix check for backword_pos by winglian in 23075
* [`Flava`] Fix flava `torch.distributed.nn.functional import all_gather` issue by younesbelkada in 23108
* [ONNX] Sam fix by michaelbenayoun in 23110
* num_noise_spans should be <= num_items 22246 by alexcpn in 22938
* Fixed default config for `Pix2Struct` model to set `Pix2StructTextModel` to `is_decoder=True` by gbarello-uipath in 23051
* Pin numba for now by sgugger in 23118
* [`Doctest`] Fix pix2struct doctest by younesbelkada in 23121
* Generate: slow assisted generation test by gante in 23125
* Generate: correct beam search length on score calculation for multi batch generation by gante in 23127
* improve unclear documentation by ManuelFay in 23123
* Generate: better warnings with pipelines by gante in 23128
* Add resources for LayoutLmV2 and reformat documentation resources by y3sar in 23115
* Fix ConvNext V2 parameter naming issue by alaradirik in 23122
* Support union types `X | Y` syntax for `HfArgumentParser` for Python 3.10+ by XuehaiPan in 23126
* Add support for beam search's num_return_sequences flag in flax by mayankagarwals in 23082
* docs: ko: update `_toctree.yml` by HanNayeoniee in 23112
* [doc] Try a few ≠ ways of linking to Papers, users, and org profiles by julien-c in 22611
* Enable to use custom tracer in FX `symbolic_trace` by regisss in 23105
* Remove redundant print statements by alaradirik in 23133
* Tidy Pytorch GLUE benchmark example by tlby in 23134
* GPTNeoForQuestionAnswering by peter-sk in 23057
* Add methods to update and verify out_features out_indices by amyeroberts in 23031
* fix spelling error by digger-yu in 23143
* Remove typo in perf_train_gpu_many.mdx by MrGeislinger in 23144
* fix resume fsdp by qywu in 23111
* gpt2 multi-gpu fix by peter-sk in 23149
* GPTNeoXForQuestionAnswering by peter-sk in 23059
* [`GPT-J`] Fix causal mask dtype by younesbelkada in 23147
* Add FlaxWhisperForAudioClassification model by raghavanone in 22883
* [docs] Text to speech task guide by MKhalusova in 23107
* Generate: text generation pipeline no longer emits `max_length` warning when it is not set by gante in 23139
* Revert "Add FlaxWhisperForAudioClassification model" by sgugger in 23154
* Add TrOCR resources by huangperry in 23142
* fixed whisper positional encoding by anvilarth in 23167
* 🌐 [i18n-KO] docs: ko: Translate `multiple_choice.mdx` by gabrielwithappy in 23064
* fix: Passing language as acronym to Whisper generate by connor-henderson in 23141
* Add `no_trainer` scripts to pre-train Vision Transformers by awinml in 23156
* Add FlaxWhisperForAudioClassification model by raghavanone in 23173
* search buffers for dtype by cyyever in 23159
* Update LLaMA docs with arxiv link by awinml in 23191
* fix random attention for pytorch's bigbird/pegasus_bigbird by Bearnardd in 23056
* Fix hf_argparser.parse_json_file to open file with utf-8 encoding, close file when finished by RobertBaruch in 23194
* Generate: starcoder 🤜 🤛 assisted generation by gante in 23182
* Fixing class embedding selection in owl-vit by orrzohar in 23157
* New version of Accelerate for the Trainer by sgugger in 23204
* docs: Fix broken link in 'How to add a model...' by connor-henderson in 23216
* Pin tensorflow-probability by sgugger in 23220
* [SAM] Add resources by NielsRogge in 23224
* audio_utils improvements by hollance in 21998
* make opt checkpoint dir name correct by dumpmemory in 21660
* Fix typo ; Update output.mdx by furkanakkurt1335 in 23227
* fix: Update run_qa.py to work with deepset/germanquad by sjrl in 23225
* Add Japanese translation to accelerate.mdx by rustinwelter in 23232
* Proposed fix for TF example now running on safetensors. by Narsil in 23208
* Support ratios for `logging_steps`, `eval_steps`, and `save_steps` by konstantinjdobler in 23235
* [Doctests] Refactor doctests + add CI by ArthurZucker in 22987
* Revert "[Doctests] Refactor doctests + add CI" by sgugger in 23245
* Fix `from_config` by DyeKuu in 23246
* CTC example: updated trainer parameters to save tokenizer by MKhalusova in 23243
* [docs] Audio task guides fixes by MKhalusova in 23239
* Improve Docs of Custom Tools and Agents by patrickvonplaten in 23255
* Metadata update by LysandreJik in 23259
* Update Image segmentation description by LysandreJik in 23261
* pin `tensorflow-probability` in docker files by ydshieh in 23260
* Refine documentation for Tools by sgugger in 23266
* Fix new line bug in chat mode for agents by sgugger in 23267
* Render custom tool docs a bit better by sgugger in 23269
* chore: allow protobuf 3.20.3 requirement by jose-turintech in 22759
* Fix link displayed for custom tools by sgugger in 23274
* Remove misplaced test file by sgugger in 23275

Significant community contributions

The following contributors have made significant changes to the library since the last release:

* gabrielwithappy
  * 🌐 [i18n-KO] Translated `training.mdx` to Korean (22670)
  * 🌐 [i18n-KO] Fix anchor links for docs `auto_tutorial`, `training` (22796)
  * 🌐 [i18n-KO] translate `create_a_model` doc to Korean (22754)
  * 🌐 [i18n-KO] docs: ko: Translate `multiple_choice.mdx` (23064)
* 0525hhgus
  * 🌐 [i18n-KO] Translated `sequence_classification.mdx` to Korean (22655)
  * [i18n-KO] Translated `accelerate.mdx` to Korean (22830)
  * 🌐 [i18n-KO] Translated `token_classification.mdx` to Korean (22945)
  * 🌐 [i18n-KO] Translated `model_sharing.mdx` to Korean (22991)
  * 🌐 [i18n-KO] Translated `tasks/image_classification.mdx` to Korean (23048)
* sim-so
  * [WIP]🌐 [i18n-KO] Translated `tutorial/preprocessing.mdx` to Korean (22578)
  * 🌐 [i18n-KO] Translated `tasks/summarization.mdx` to Korean (22783)
  * 🌐 [i18n-KO] Translated `tasks/image_captioning.mdx` to Korean (22943)
  * 🌐 [i18n-KO] Translated `torchscript.mdx` to Korean (23060)
* HanNayeoniee
  * 🌐 [i18n-KO] Translated `custom_models.mdx` to Korean (22534)
  * 🌐 [i18n-KO] Translated `tasks/masked_language_modeling.mdx` to Korean (22838)
  * 🌐 [i18n-KO] Translated `run_scripts.mdx` to Korean (22793)
  * 🌐 [i18n-KO] Fixed `tasks/masked_language_modeling.mdx` (22965)
  * 🌐 [i18n-KO] Translated `multilingual.mdx` to Korean (23008)
  * 🌐 [i18n-KO] Translated `tasks/zero_shot_image_classification.mdx` to Korean (23065)
  * docs: ko: update `_toctree.yml` (23112)
* wonhyeongseo
  * 🌐 [i18n-KO] Translated `tasks/translation.mdx` to Korean (22805)
  * 🌐 [i18n-KO] Translated `serialization.mdx` to Korean (22806)
* peter-sk
  * added GPTNeoXForTokenClassification (23002)
  * added GPTNeoForTokenClassification (22908)
  * GPT2ForQuestionAnswering (23030)
  * GPTNeoForQuestionAnswering (23057)
  * gpt2 multi-gpu fix (23149)
  * GPTNeoXForQuestionAnswering (23059)
* s-JoL
  * add open-llama model with ckpt (22795)
* awinml
  * Add `BioGPTForSequenceClassification` (22253)
  * Add `no_trainer` scripts to pre-train Vision Transformers (23156)
  * Update LLaMA docs with arxiv link (23191)
* raghavanone
  * Add FlaxWhisperForAudioClassification model (22883)
  * Add FlaxWhisperForAudioClassification model (23173)

4.28.1

Not secure

Fixes a regression for DETA models

- Revert the change on Deta by ydshieh in 22750

4.28.0

Not secure

LLaMA

The LLaMA model was proposed in LLaMA: Open and Efficient Foundation Language Models. It is a collection of foundation language models ranging from 7B to 65B parameters. You can request access to the weights [here](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform?usp=send_form), then use the conversion script to generate a checkpoint compatible with Hugging Face.

* LLaMA Implementation by zphang in 21955

Pix2Struct, MatCha, DePlot

Pix2Struct is a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct has been fine-tuned on various tasks and datasets, ranging from image captioning and visual question answering (VQA) over different inputs (books, charts, science diagrams) to captioning UI components, and others.

* Add Pix2Struct by younesbelkada in 21400
* Add DePlot + MatCha on `transformers` by younesbelkada in 22528

Mega

MEGA proposes a new approach to self-attention with each encoder layer having a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism stronger positional biases. This allows MEGA to perform competitively with Transformers on standard benchmarks including LRA while also having significantly fewer parameters. MEGA’s compute efficiency allows it to scale to very long sequences, making it an attractive option for long-document NLP tasks.

* Add Mega: Moving Average Equipped Gated Attention by mnaylor5 in 21766
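To make the "multi-headed exponential moving average" concrete, here is a simplified sketch of the damped EMA described in the MEGA paper, `y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}`, with each head having its own `alpha` and `delta`. Assumptions: scalar inputs, one scalar `(alpha, delta)` pair per head, and `y_{-1} = 0`; the paper's expansion and projection parameters, and the gated attention that consumes the EMA output, are omitted. This is not the library's implementation.

```python
from typing import List, Sequence, Tuple


def damped_ema(xs: Sequence[float], alpha: float, delta: float) -> List[float]:
    """y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}, starting from y_{-1} = 0."""
    ys, prev = [], 0.0
    for x in xs:
        prev = alpha * x + (1.0 - alpha * delta) * prev
        ys.append(prev)
    return ys


def multi_headed_ema(xs: Sequence[float],
                     heads: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Run one damped EMA per (alpha, delta) head over the same input sequence."""
    return [damped_ema(xs, alpha, delta) for alpha, delta in heads]


out = multi_headed_ema([1.0, 0.0, 0.0], heads=[(0.5, 1.0), (0.9, 0.5)])
print(out[0])  # [0.5, 0.25, 0.125]
```

Each past input's contribution shrinks by a factor of `(1 - alpha * delta)` per step, so nearby tokens weigh more than distant ones; that built-in recency decay is the positional bias the paragraph above refers to.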

GPTBigCode

The model is an optimized [GPT2 model](https://huggingface.co/docs/transformers/model_doc/gpt2) with support for Multi-Query Attention.

* Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) by jlamypoirier in 22575
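To make "Multi-Query Attention" concrete, here is a framework-free sketch of causal multi-query attention: every head has its own queries, but all heads share a single set of keys and values. It is a simplified illustration (no learned projections, no batching, plain Python lists), not GPTBigCode's implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_multi_query_attention(queries: List[List[Vector]], keys: List[Vector],
                                 values: List[Vector]) -> List[List[Vector]]:
    """queries[h][t] is head h's query at position t; keys[t] and values[t] are
    shared by every head, which is what makes the attention "multi-query"."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for head_queries in queries:
        head_out = []
        for t, q in enumerate(head_queries):
            # Causal: position t only attends to positions 0..t.
            scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys[: t + 1]]
            weights = softmax(scores)
            head_out.append([sum(w * v[d] for w, v in zip(weights, values[: t + 1]))
                             for d in range(len(values[0]))])
        outputs.append(head_out)
    return outputs


keys = [[1.0, 0.0], [1.0, 0.0]]  # identical keys -> uniform attention weights
values = [[2.0, 0.0], [0.0, 4.0]]
queries = [[[0.3, 0.1], [0.5, -0.2]],   # head 0
           [[1.0, 1.0], [-1.0, 2.0]]]   # head 1
out = causal_multi_query_attention(queries, keys, values)
print(out[1][1])  # [1.0, 2.0]
```

Because keys and values are stored once instead of once per head, the key/value cache kept during generation shrinks by a factor equal to the number of heads.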

NLLB-MoE

The mixture of experts version of the NLLB release has been added to the library.

* `NLLB-MoE` Adds the moe model by ArthurZucker in 22024

Serializing 8bit models

* [`bnb`] Let's make serialization of int8 models possible by younesbelkada in 22177

You can now push 8-bit models to the Hub and load 8-bit models directly from it, saving memory and loading your 8-bit models faster! An example repo is available [here](https://huggingface.co/ybelkada/bloom-1b7-8bit).

Breaking Changes

Ordering of height and width for the BLIP image processor

_Notes from the PR:_

The BLIP image processor incorrectly passed in the dimensions to resize in the order (width, height). This is reordered to be correct.

In most cases, this won't have an effect as the default height and width are the same. However, this is not backwards compatible for custom configurations with different height, width settings and direct calls to the resize method with different height, width values.

* 🚨🚨🚨 Fix ordering of height, width for BLIP image processor by amyeroberts in 22466

Prefix tokens for the NLLB tokenizer

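The main problem was the `prefix` and `suffix` tokens added by the NLLB tokenizer: the language code was appended at the end of the sequence, after `</s>`, instead of being prepended to it.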

Previous behaviour (`2` is `</s>` and `256047` is the `eng_Latn` language code):

```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[13374, 1398, 4260, 4039, 248130, 2, 256047]
```

New behaviour:

```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[256047, 13374, 1398, 4260, 4039, 248130, 2]
```

If you have pipelines that rely on the old behaviour, you can re-enable it with `legacy_behaviour=True`:

```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)
```

* 🚨🚨🚨 `[NLLB Tokenizer]` Fix the prefix tokens 🚨🚨🚨 by ArthurZucker in 22313

TensorFlow ports

The BLIP model is now available in TensorFlow.

* Add TF port of BLIP by Rocketknight1 in 22090

Export TF Generate with a TF tokenizer

This PR makes it possible to export TF `generate` together with a TF-native tokenizer, so the full pipeline runs in a single TF graph.

* Generate: Export TF generate with a TF tokenizer by gante in 22310

Task guides

A new task guide has been added, focusing on depth-estimation.

* Depth estimation task guide by MKhalusova in 22205

Bugfixes and improvements

* Load optimizer state on CPU to avoid CUDA OOM by sgugger in 22159
* Run all tests by default by sgugger in 22162
* Fix: unfinished_sequences with correct device by Stxr in 22184
* Revert 22152 MaskedImageCompletionOutput changes by amyeroberts in 22187
* Regression pipeline device by sgugger in 22190
* Update BridgeTowerForContrastiveLearning by abhiwand in 22145
* t5 remove data dependency by prathikr in 22097
* Fix DeepSpeed CI by ydshieh in 22194
* Fix typo in Align docs by alaradirik in 22199
* Update expected values in `MgpstrModelIntegrationTest` by ydshieh in 22195
* Italian Translation of migration.mdx by Baelish03 in 22183
* Update tiny model creation script by ydshieh in 22202
* Temporarily fix ONNX model exporting error by SatyaJandhyalaAtMS in 21830
* [`XGLM`] Add `accelerate` support for XGLM by younesbelkada in 22207
* fixes a typo in WhisperFeatureExtractor docs. by susnato in 22208
* Hotfix for natten issue with torch 2.0.0 on CircleCI by ydshieh in 22218
* fix typos in llama.mdx by keturn in 22223
* fix code example in mgp-str doc by wdp-007 in 22219
* Use `dash==2.8.1` for now for daily CI by ydshieh in 22227
* LLaMA house-keeping by sgugger in 22216
* fix AutoTP in deepspeed could not work for bloom by sywangyi in 22196
* Add LlamaForSequenceClassification by lewtun in 22209
* Removed .mdx extension in two links by MKhalusova in 22230
* fix(docs): fix task guide links in model docs by Seb0 in 22226
* Fix natten by alihassanijr in 22229
* Revert "Use `dash==2.8.1` for now for daily CI" by ydshieh in 22233
* Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding by ma787639046 in 22234
* [trainer] param count for deepspeed zero3 by stas00 in 22193
* Update training_args.py -- a nightly install is not required anymore for torch.compile by pminervini in 22266
* [Docs] fix typos in some tokenizer docs by yesinkim in 22256
* Italian translation perf_infer_cpu by nickprock in 22243
* [Trainer] Add optional communication backends for torch.distributed when using GPU by heya5 in 22247
* Fix the gradient checkpointing bug of the llama model by yqy2001 in 22270
* Fix balanced and auto device_map by sgugger in 22271
* Rework a bit the LLaMA conversion script by sgugger in 22236
* Proper map location for optimizer load by sgugger in 22273
* Fix doc links by amyeroberts in 22274
* Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training by ani300 in 22279
* Example of pad_to_multiple_of for padding and truncation guide & docstring update by MKhalusova in 22278
* Update vision docstring bool masked pos by amyeroberts in 22237
* replace_8bit_linear modules_to_not_convert default value fix by BlackSamorez in 22238
* Fix error in mixed precision training of `TFCvtModel` by gcuder in 22267
* More doctests by ydshieh in 22268
* fix more doctests by ydshieh in 22292
* Add translation perf_infer_gpu_one for it by davidegazze in 22296
* Restore fp16 support on xla gpu device by ymwangg in 22300
* Correct NATTEN function signatures and force new version by alihassanijr in 22298
* [deepspeed] offload + non-cpuadam optimizer exception doc by stas00 in 22044
* Final update of doctest by ydshieh in 22299
* Add MaskedImageModelingOutput by alaradirik in 22212
* Enable traced model for text-generation task by jiqing-feng in 22265
* add low_cpu_mem_usage option in run_clm.py example which will benefit… by sywangyi in 22288
* fix: Allow only test_file in pytorch and flax summarization by connor-henderson in 22293
* Fix position embeddings for GPT-J and CodeGen by njhill in 22069
* Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer by silentghoul-spec in 22302
* Enforce `max_memory` for device_map strategies by sgugger in 22311
* Beef up Llama tests by gante in 22314
* docs: Resolve incorrect type typo in trainer methods by tomaarsen in 22316
* Chunkable token classification pipeline by luccailliau in 21771
* Fix PipelineTests skip conditions by ydshieh in 22320
* [deepspeed zero3] need `generate(synced_gpus=True, ...)` by stas00 in 22242
* [gptj] support older pytorch version by stas00 in 22325
* Move common properties to BackboneMixin by amyeroberts in 21855
* Backbone add mixin tests by amyeroberts in 22542
* Backbone add out indices by amyeroberts in 22493
* [`MBart`] Add `accelerate` support for MBart by younesbelkada in 22309
* Fixed gradient checkpoint bug for TimeSeriesTransformer by mollerup23 in 22272
* Mention why one needs to specify max_steps in Trainer by lhoestq in 22333
* Fix various imports by sgugger in 22281
* Minor typo in pipeline FillMaskPipeline's documentation. by SamuelLarkin in 22339
* Added type hints to TFDeiTModel by Batese2001 in 22327
* Fix --bf16 option support for Neuron after PR 22300 by jeffhataws in 22307
* Generate: add test for left-padding support by gante in 22322
* Enable training Llama with model or pipeline parallelism by kooshi in 22329
* Automatically create/update tiny models by ydshieh in 22275
* [HFTracer] Make embeddings ops take on the dtype of the weight by jamesr66a in 22347
* Fix typo in Greedy Search Description by awinml in 22345
* Generate: Add GPTNeoX integration test by gante in 22346
* Update docker files to use official torch 2.0.0 by ydshieh in 22357
* Pin tensorflow-text to go with tensorflow by sgugger in 22362
* Improve error message by Mahrkeenerh in 22361
* TensorFlow: pin maximum version to 2.12 by gante in 22364
* Resnet flax by Shubhamai in 21472
* [Trainer] add disclaimer that full_determinism is slow by stas00 in 22368
* [safetensors] don't use in `torch<1.10` by stas00 in 22370
* TensorFlow: additional missing `cmake` dependencies in CI by gante in 22383
* Changed world_size() to get_world_size() bugfix by Charlie-Bell in 22381
* Translated documentation in italian by nickprock in 22388
* Adapt find_tied_parameters to handle breaking change in Accelerate by sgugger in 22360
* load_in_8bit now respects 'balanced' device maps in multi-gpu environments by kooshi in 22377
* Wav2Vec2ProcessorWithLM can return N best hypotheses now by vsokolovskii in 22235
* Seq2seq trainer generation config arg by Natooz in 22323
* Generate: support for left-padding on GPTNeoX and Llama by gante in 22382
* [`bnb`] Force `requires_grad` to be `False` by younesbelkada in 22396
* Transformers env safetensors by sgugger in 22400
* [Pix2Struct] Add support to resize embeddings by NielsRogge in 22394
* Trainer: move Seq2SeqTrainer imports under the typing guard by gante in 22401
* Trainer: missing None check by gante in 22404
* Hardware Auto-Setup for Examples by dongreenberg in 22319
* [neptune] fix checkpoint bug with relative out_dir by kshitij12345 in 22102
* Fix bug in perplexity guide calculations and update perplexity numbers. Fixes 22348 by fpgaminer in 22411
* [performance] ensure `causal_mask` is created directly on device by jeffra in 22378
* MBart: Fix docs and doctests by gante in 22422
* Add clean_up_tokenization_spaces to config by ArthurZucker in 22341
* Hyperparameter search reporting to W&B by NoB0 in 22440
* [`bnb`] fix bnb failing test by younesbelkada in 22439
* [`Generate`] Add conditional generation for multimodal models by younesbelkada in 22424
* Don't hard error when cache version can't be converted to int by sgugger in 22427
* Use real tokenizers if tiny version(s) creation has issue(s) by ydshieh in 22428
* Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" by sgugger in 22444
* [`Pix2Struct`] Fix slow test by younesbelkada in 22448
* Revert "Fix --bf16 option support for Neuron after PR 22300" by jeffhataws in 22451
* Update Neptune docs by normandy7 in 22452
* Avoid using personal HF token in CI by ydshieh in 22453
* Update release instructions by sgugger in 22454
* Pin ruff by sgugger in 22455
* Update: ignore padding support for TransfoXL training when n_clusters==0 by StefanHeng in 22457
* Rescale image back if it was scaled during PIL conversion by amyeroberts in 22458
* Skip flaky NLLB Moe test for now by amyeroberts in 22463
* Guard imports of PreTrainedTokenizerFast on is_tokenizers_available by hvaara in 22285
* [NLLB-MoE] `model_type` update for auto mapping by ArthurZucker in 22470
* Llama: support for `max_position_embeddings` by gante in 22471
* Docs fix: Multinomial sampling decoding needs "num_beams=1", since by default it is usually not 1. by manueldeprada in 22473
* (Re-)Enable Nightly + Past CI by ydshieh in 22393
* Relax `eos_token_id < 0` checks in `generate()` from `ValueError` to warning by lewtun in 22472
* Update `Wav2Vec2ProcessorWithLM` doc example by ydshieh in 22474
* Making sure we can use safetensors to serialize all the time. by Narsil in 22437
* Update Neptune callback docstring by normandy7 in 22497
* Test fetch v2 by sgugger in 22367
* Update convert_llama_weights_to_hf.py by Ricardokevins in 22525
* [Time-Series] fix past_observed_mask type by elisim in 22076
* Fix llama tokenizer by ArthurZucker in 22402
* [WIP] docs: ko: sagemaker.mdx by jungnerd in 22509
* added biogpt token classifier by upjabir in 22447
* Generate: `TextIteratorStreamer` (streamer for gradio) by gante in 22501
* Fix convert_opt_original_pytorch_checkpoint_to_pytorch.py typo by larekrow in 22526
* llama docs: fix conversion script url by python273 in 22514
* fix LayoutLMv3TokenizerFast subword label after 'Ġ' token by thibaultdouzon in 21695
* [BLIP] fix cross attentions for BlipTextEncoder by zhbh01 in 22515
* [`Trainer`] Force `is_model_parallel` when model is loaded in multiple GPUs using `accelerate` by younesbelkada in 22532
* [`T5`] Enable naive Pipeline Parallelism training for T5 by younesbelkada in 22535
* Fix missing metrics with multiple eval datasets by hawkeoni in 22536
* [setup] drop deprecated `distutils` usage by XuehaiPan in 22531
* Generate: Enable easier TextStreamer customization by vblagoje in 22516
* [setup] migrate setup script to `pyproject.toml` by XuehaiPan in 22539
* Update test_image_processing_pix2struct.py by younesbelkada in 22543
* Fix OPTForQuestionAnswering doc string by curlup in 22481
* Generate: Add text streamer decoding options by gante in 22544
* 🔥py38 + torch 2 🔥🔥🔥🚀 by ydshieh in 22204
* Time to Say Goodbye, torch 1.7 and 1.8 by ydshieh in 22291
* [Roformer] Fixing a bug in RoFormerEncoder where it was ignoring the length of past_key_values when generating as a decoder by TheWall9 in 22416
* Implemented safetensors checkpoints save/load for Trainer by ViktorooReps in 22498
* Remove hack for dynamic modules and use Python functions instead by sgugger in 22537
* [`bnb`] Fix typo by younesbelkada in 22556
* Add id2label and label2id to model's config in run_xnli by maziyarpanahi in 22558
* Soft error whisper. by Narsil in 22475
* corrected the code comment for the output of find_pruneable_heads_and_indices by SunHaozhe in 22557
* Flax Regnet by Shubhamai in 21867
* fix `_no_split_modules` for Whisper model by pacman100 in 22486
* Fix inverted conditional in TF common test! by Rocketknight1 in 22540
* Generate: `TextIteratorStreamer` timeout by gante in 22576
* Move back doctest instructions to setup.cfg by sgugger in 22587
* Tests: disable `accelerate_tests` mark warnings by gante in 22585
* Fix PT-TF equivalence test for GPT1 by Rocketknight1 in 22586
* Add thousands separator in training summary by qmeeus in 22583
* docs: ko: complete `_toctree.yml` by wonhyeongseo in 22581
* Sync preprocesses before loading the processor at run_speech_recognition_ctc.py by mpenagar in 21926
* Fix a typo in one of the BLIP pretrained checkpoint names by Rocketknight1 in 22588
* Adding support for BPE merge creation from scores instead of ids. by Narsil in 22582
* Use native TF checkpoints for the BLIP TF tests by Rocketknight1 in 22593
* feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart by kaustubh-s1 in 22591
* Adding Llama FastTokenizer support. by Narsil in 22264
* Revert error back into warning for byte fallback conversion. by Narsil in 22607
* Seq2SeqTrainer: use unwrapped model to retrieve the generation config by gante in 22584
* Make tiny model creation + pipeline testing more robust by ydshieh in 22500
* docs: Fix broken link to generation strategies by connor-henderson in 22623
* update_pip_test_mapping by ydshieh in 22606
* A script to add/update `pipeline_model_mapping` systematically by ydshieh in 22180
* [`bnb`] 8bit models should not be converted to `DDP` by younesbelkada in 22628
* LlamaTokenizerFast Fix (.., from_slow=True). by Narsil in 22630
* [`Blip`] Fix slow tests and doctests with correct values by younesbelkada in 22632
* Update tiny model summary file for recent models by ydshieh in 22637
* fix FSDP version related issues by pacman100 in 22489
* 🌐[i18n-KO] Translate `autoclass_tutorial` to Korean and Fix the typo of `quicktour` by gabrielwithappy in 22533
* Move labels to the same device as logits for LlamaForSequenceClassification and Blip2 by xssChauhan in 22596
* Fix typo by Ronalmoo in 22650
* Fix `MegaModel` CI by ydshieh in 22652
* 🌐 [i18n-KO] Translated `pipeline_tutorial.mdx` to Korean by wonhyeongseo in 22508
* Small nit, by ArthurZucker in 22653
* [tokenization] do not push special file by ArthurZucker in 22657
* [OPT] Fix default attention mask size by ArthurZucker in 22649
* Generate: add API warning to streamers by gante in 22659
* Revert migration of setup to pyproject.toml by sgugger in 22658
* moved labels to the same device as logits for BLOOM, GPT Neo, GPT NeoX, RoBERTa and VIT models by iamarunbrahma in 22663
* Model parallelism: Moving labels to the same device as logits for BridgeTower models by shahad-mahmud in 22676
* (feat): Moving labels to same device as logits for Deit by xssChauhan in 22679
* Make dynamic code work with offline mode by sgugger in 22661
* Fix quantization docs typo by python273 in 22666
* use __func__ to check can_generate by xin3he in 22643
* add GPTNeoXForSequenceClassification by Asugawara in 22671
* Model parallelism: Moving labels to same devices as the logits are by shahad-mahmud in 22691
* Update some `MarkupLM` tests' expected values by ydshieh in 22667
* Make it easier to develop without a dev install by sgugger in 22697
* Enable naive Pipeline Parallelism training for Gpt neox japanese and san japanese by mayankagarwals in 22702
* Clarify stride option by luccailliau in 22684
* Remove 2 failing ONNX conversion tests by ydshieh in 22660
* Replace -100s in predictions by the pad token by sgugger in 22693
* Fix decorator order by ydshieh in 22708
* Update input values for docstring by amyeroberts in 22631
* remove wrong doc in readme by ArthurZucker in 22723
* Added parallel device usage for GPT-J by jprivera44 in 22713
* add model resources for CPMAnt (new) by pioliverse in 20906
* Modify pipeline_tutorial.mdx by ARKA1112 in 22726
* [tests] switch to torchrun by stas00 in 22712
* `torch.distributed` group initialization for `torch_neuron` disabled when `optimum-neuron` is installed by michaelbenayoun in 22728
* add fast support and option by ArthurZucker in 22724
* Update warning levels by NielsRogge in 22727
* Fix docstrings for TF BLIP by Rocketknight1 in 22618
* [Doctest] Add configuration_m2m_100.py by elabongaatuo in 22733
* [Doctest] Add configuration_mvp.py by elabongaatuo in 22735
* Indexing fix for gpt_bigcode by jlamypoirier in 22737
* Make vilt, switch_transformers compatible with model parallelism by Xrenya in 22703
* [Pix2struct] Simplify generation by NielsRogge in 22527
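
The padding and truncation guide entry above (22278) documents `pad_to_multiple_of`, which rounds a batch's padded length up to a multiple of a given value. The tokenizer docstring notes this helps enable Tensor Cores on NVIDIA hardware. The sketch below is a minimal pure-Python illustration of that rounding, assuming right-side padding to the longest sequence in the batch. `padded_length` and `pad_batch` are hypothetical helpers written for illustration, not Transformers APIs, and the token IDs are arbitrary.

```python
from typing import List, Optional


def padded_length(lengths: List[int], pad_to_multiple_of: Optional[int] = None) -> int:
    """Target length for a batch padded to its longest sequence (hypothetical helper)."""
    longest = max(lengths)
    if pad_to_multiple_of is None:
        return longest
    # Ceiling division: round up to the next multiple; exact multiples are unchanged.
    return ((longest + pad_to_multiple_of - 1) // pad_to_multiple_of) * pad_to_multiple_of


def pad_batch(
    batch: List[List[int]], pad_id: int, pad_to_multiple_of: Optional[int] = None
) -> List[List[int]]:
    """Right-pad every sequence with pad_id up to the common target length."""
    target = padded_length([len(seq) for seq in batch], pad_to_multiple_of)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]


# Illustrative token IDs only; the longest sequence has 5 tokens, rounded up to 8.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
padded = pad_batch(batch, pad_id=0, pad_to_multiple_of=8)
print([len(seq) for seq in padded])  # [8, 8]
```

With `pad_to_multiple_of=None` the sketch reduces to plain longest-sequence padding. The library's tokenizers also support left padding and a `max_length` strategy, which this sketch does not model.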

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* zphang
  * LLaMA Implementation (21955)
* Seb0
  * fix(docs): fix task guide links in model docs (22226)
* mnaylor5
  * Add Mega: Moving Average Equipped Gated Attention (21766)
* Shubhamai
  * Resnet flax (21472)
  * Flax Regnet (21867)
* wonhyeongseo
  * docs: ko: complete `_toctree.yml` (22581)
  * 🌐 [i18n-KO] Translated `pipeline_tutorial.mdx` to Korean (22508)
* jlamypoirier
  * Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (22575)
  * Indexing fix for gpt_bigcode (22737)
* pioliverse
  * add model resources for CPMAnt (new) (20906)

4.27.4

Not secure

This patch fixes a regression with FlauBERT and XLM models.

* Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (21627)" in 22444 by sgugger
