New models
Mistral
Mistral-7B-v0.1 is a decoder-based LM with the following architectural choices:
- Sliding Window Attention - trained with an 8K context length and a fixed cache size, with a theoretical attention span of 128K tokens
- GQA (Grouped Query Attention) - allowing faster inference and lower cache size.
- Byte-fallback BPE tokenizer - ensures that characters are never mapped to out-of-vocabulary tokens.
* [Mistral] Mistral-7B-v0.1 support by Bam4d in 26447
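For example, a minimal generation sketch (the prompt and generation settings are illustrative; `device_map="auto"` assumes `accelerate` is installed and a GPU is available):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",  # assumes `accelerate` is installed and a GPU is available
)

inputs = tokenizer("My favourite condiment is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```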
Persimmon
The authors introduced Persimmon-8B, a decoder model based on the classic Transformer architecture, with query and key normalization. Persimmon-8B is a fully permissively licensed model with approximately 8 billion parameters, released under the Apache license. Some of the key attributes of Persimmon-8B are its long context size (16K), strong performance, and capabilities for multimodal extensions.
* [`Persimmon`] Add support for persimmon by ArthurZucker in 26042
BROS
BROS stands for BERT Relying On Spatiality. It is an encoder-only Transformer model that takes a sequence of tokens and their bounding boxes as inputs and outputs a sequence of hidden states. BROS encodes relative spatial information instead of absolute spatial information.
* Add BROS by jinhopark8345 in 23190
ViTMatte
ViTMatte leverages plain [Vision Transformers](https://huggingface.co/docs/transformers/main/en/model_doc/vit) for the task of image matting, which is the process of accurately estimating the foreground object in images and videos.
* Add ViTMatte by NielsRogge in 25843
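A rough inference sketch, assuming the `hustvl/vitmatte-small-composition-1k` checkpoint and local `image.png`/`trimap.png` files (both assumptions for illustration):

```python
import torch
from PIL import Image
from transformers import VitMatteForImageMatting, VitMatteImageProcessor

checkpoint = "hustvl/vitmatte-small-composition-1k"  # assumed checkpoint for illustration
processor = VitMatteImageProcessor.from_pretrained(checkpoint)
model = VitMatteForImageMatting.from_pretrained(checkpoint)

image = Image.open("image.png").convert("RGB")   # the image to matte
trimap = Image.open("trimap.png").convert("L")   # known foreground/background/unknown regions

inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
with torch.no_grad():
    alphas = model(**inputs).alphas  # predicted alpha matte
```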
Nougat
Nougat uses the same architecture as [Donut](https://huggingface.co/docs/transformers/main/en/model_doc/donut): an image Transformer encoder and an autoregressive text Transformer decoder that translates scientific PDFs to Markdown, enabling easier access to them.
* Add Nougat by NielsRogge and molbap in 25942
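A minimal sketch of converting a page image (the checkpoint name, input file, and generation settings are illustrative):

```python
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")  # assumed checkpoint for illustration
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

page = Image.open("paper_page.png").convert("RGB")  # a rendered PDF page
pixel_values = processor(images=page, return_tensors="pt").pixel_values

outputs = model.generate(pixel_values, max_new_tokens=512)
markdown = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(markdown)
```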
Prompt templating
We've added a new [template](https://huggingface.co/docs/transformers/main/chat_templating) feature for chat models. This allows the formatting that a chat model was trained with to be saved with the model, ensuring that users can exactly reproduce that formatting when they want to fine-tune the model or use it for inference. For more information, see [our template documentation](https://huggingface.co/docs/transformers/main/chat_templating).
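For example, a minimal sketch with a chat model whose tokenizer ships a template (the checkpoint choice here is illustrative):

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any chat model whose tokenizer defines a chat template works.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Write a haiku about tokenizers."},
]

# Renders the conversation with the exact formatting the model was trained on.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```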
* Overhaul Conversation class and prompt templating by Rocketknight1 in 25323
🚨🚨 Tokenizer refactor
* [`Tokenizer`] attemp to fix add_token issues by ArthurZucker in 23909
* Nit-added-tokens by ArthurZucker in 26538 adds some fixes to 23909.
🚨 Workflow Changes 🚨:
These are not breaking changes per se but rather bugfixes. However, we understand that this may result in some workflow changes, so we highlight them below.
- The `unique_no_split_tokens` attribute has been removed and is no longer used in the internal logic.
- `sanitize_special_tokens()` now follows a deprecation cycle and does nothing.
- All attributes in `SPECIAL_TOKENS_ATTRIBUTES` are stored as `AddedToken` objects and not as strings.
- Loading a slow tokenizer from a fast one, or a fast tokenizer from a slow one, will no longer raise an error if the added tokens don't have the correct index. This is because they will always be added following the order of the `added_tokens`, but any mistakes in the saved vocabulary will be corrected (and there are a lot in old-format tokenizers).
- The length of a tokenizer is now `max(set(self.get_vocab().keys()))`, accounting for holes in the vocab. The `vocab_size` no longer takes the added vocab into account for most tokenizers (as it should not). This is mostly breaking for T5.
- Adding a token with `tokenizer.add_tokens([AddedToken("hey", rstrip=False, normalized=True)])` now takes the `rstrip`, `lstrip`, and `normalized` information into account.
- `added_tokens_decoder` holds `AddedToken` objects, not strings.
- `add_tokens()` for both fast and slow tokenizers will always update a token that is already part of the vocab, allowing for custom stripping.
- Initializing a tokenizer from scratch will now add missing special tokens to the vocab.
- Stripping is not always done for special tokens! 🚨 It only happens if the `AddedToken` has `lstrip=True` and `rstrip=True`.
- The `fairseq_ids_to_tokens` attribute has been removed for Barthez (it was not used).
➕ Most visible features:
- Printing a tokenizer now shows `tokenizer.added_tokens_decoder` for both fast and slow tokenizers. Moreover, additional tokens that were already part of the initial vocab are also found there.
- Faster `from_pretrained` and faster `add_tokens`, because special and non-special tokens can be mixed together and the trie is not always rebuilt.
- Faster encode/decode thanks to a caching mechanism for `added_tokens_decoder/encoder`.
- The information is fully saved in the `tokenizer_config.json`.
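A quick sketch of the new behaviour (the checkpoint is just an example):

```python
from transformers import AddedToken, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # any checkpoint works; this one is illustrative

# lstrip/rstrip/normalized are now honoured when adding tokens
tokenizer.add_tokens([AddedToken("hey", rstrip=False, lstrip=False, normalized=True)])

# added tokens (and special tokens already present in the initial vocab) are exposed as AddedToken objects
print(tokenizer.added_tokens_decoder)
```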
**For any issues relating to this, make sure to open a new issue and ping ArthurZucker.**
Flash Attention 2
Flash Attention 2 (FA2) support has been added to transformers for the most popular architectures (Llama, Mistral, Falcon), with more architectures actively being contributed in this issue (https://github.com/huggingface/transformers/issues/26350). Simply pass `use_flash_attention_2=True` when calling `from_pretrained`, as in the sketch below.
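A minimal sketch (assumes the `flash-attn` package, a supported GPU, and half-precision weights; the checkpoint is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",          # illustrative checkpoint from the supported architectures
    torch_dtype=torch.bfloat16,  # FA-2 kernels require fp16/bf16 weights
    use_flash_attention_2=True,
    device_map="auto",           # assumes `accelerate` is installed
)
```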
In the future, PyTorch will support Flash Attention 2 through [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html). Users will then be able to benefit from both implementations of Flash Attention 2 (transformers core and transformers + SDPA) with simple changes (`model.to_bettertransformer()` and force-dispatching the SDPA kernel to FA-2 in the case of SDPA).
* [`core` ] Integrate Flash attention 2 in most used models by younesbelkada in 25598
For our future plans regarding integrating F.sdpa from PyTorch in core transformers, see here: https://github.com/huggingface/transformers/issues/26557
Lazy import structure
Support for lazy loading of integration libraries has been added. This drastically speeds up importing `transformers` and related objects from the library.
Example before this change:

```
2023-09-11 11:07:52.010179: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
python3 -c "from transformers import CLIPTextModel"  3.31s user 3.06s system 220% cpu 2.893 total
```

After this change:

```
python3 -c "from transformers import CLIPTextModel"  1.70s user 1.49s system 220% cpu 1.447 total
```
* [Core] Add lazy import structure to imports by patrickvonplaten in 26090
Bugfixes and improvements
* Fix typo by susnato in 25966
* Fix Detr CI by ydshieh in 25972
* Fix `test_load_img_url_timeout` by ydshieh in 25976
* nn.Identity is not required to be compatible with PyTorch < 1.1.0 as the minimum PyTorch version we currently support is 1.10.0 by statelesshz in 25974
* Add `Pop2Piano` space demo. by susnato in 25975
* fix typo by kai01ai in 25981
* Use main in conversion script by ydshieh in 25973
* [doc] Always call it Agents for consistency by julien-c in 25958
* Update RAG README.md with correct path to examples/seq2seq by tleyden in 25953
* Update training_args.py to remove the runtime error by sahel-sh in 25920
* Trainer: delegate default generation values to `generation_config` by gante in 25987
* Show failed tests on CircleCI layout in a better way by ydshieh in 25895
* Patch with accelerate xpu by abhilash1910 in 25714
* PegasusX add _no_split_modules by andreeahedes in 25933
* Add TFDebertaV2ForMultipleChoice by raghavanone in 25932
* deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler by pacman100 in 25863
* [Wav2Vec2 Conformer] Fix inference float16 by sanchit-gandhi in 25985
* Add LLaMA resources by eenzeenee in 25859
* [`CI`] Fix red CI and ERROR failed should show by ArthurZucker in 25995
* [`VITS`] tokenizer integration test: fix revision did not exist by ArthurZucker in 25996
* Fix Mega chunking error when using decoder-only model by tanaymeh in 25765
* save space when converting hf model to megatron model. by flower-with-safe in 25950
* Update README.md by NinoRisteski in 26003
* Falcon: fix revision propagation by LysandreJik in 26006
* TF-OPT attention mask fixes by Rocketknight1 in 25238
* Fix small typo README.md by zspo in 25934
* 🌐[i18n-KO] Translated `llm_tutorial.md` to Korean by harheem in 25791
* Remove Falcon from undocumented list by Rocketknight1 in 26008
* modify context length for GPTQ + version bump by SunMarc in 25899
* Fix err with FSDP by muellerzr in 25991
* fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 by kai01ai in 26024
* Fix CircleCI config by ydshieh in 26023
* Add `tgs` speed metrics by CokeDong in 25858
* [VITS] Fix nightly tests by sanchit-gandhi in 25986
* Added HerBERT to README.md by Muskan011 in 26020
* Fix vilt config docstring parameter to match value in init by raghavanone in 26017
* Punctuation fix by kwonmha in 26025
* Try to fix training Loss inconsistent after resume from old checkpoint by dumpmemory in 25872
* Fix Dropout Implementation in Graphormer by alexanderkrauck in 24817
* Update missing docs on `activation_dropout` and fix DropOut docs for SEW-D by gau-nernst in 26031
* Skip warning if tracing with dynamo by angelayi in 25581
* 🌐 [i18n-KO] Translated `llama.md` to Korean by harheem in 26044
* [`CodeLlamaTokenizerFast`] Fix fix `set_infilling_processor` to properly reset by ArthurZucker in 26041
* [`CITests`] skip failing tests until 26054 is merged by ArthurZucker in 26063
* only main process should call _save on deepspeed zero3 by zjjMaiMai in 25959
* docs: update link huggingface map by pphuc25 in 26077
* docs: add space to docs by pphuc25 in 26067
* [`core`] Import tensorflow inside relevant methods in `trainer_utils` by younesbelkada in 26106
* Generate: legacy mode is only triggered when `generation_config` is untouched by gante in 25962
* Update logits_process.py docstrings by larekrow in 25971
* Fix ExponentialDecayLengthPenalty negative logits issue by pokjay in 25594
* 🌐 [i18n-KO] Translated `llama2.md` to Korean by mjk0618 in 26047
* [docs] Updates to TTS task guide with regards to the new TTS pipeline by MKhalusova in 26095
* 🌐 [i18n-KO] Translated `contributing.md` to Korean by mjk0618 in 25877
* enable optuna multi-objectives feature by sywangyi in 25969
* chore: correct update_step and correct gradient_accumulation_steps by pphuc25 in 26068
* Text2text pipeline: don't parameterize from the config by gante in 26118
* Fix `MarianTokenizer` to remove metaspace character in `decode` by tanaymeh in 26091
* safeguard torch distributed check by pacman100 in 26056
* fix the deepspeed tests by pacman100 in 26021
* Fix AutoTokenizer docstring typo by amyeroberts in 26117
* [`core`] fix 4bit `num_parameters` by younesbelkada in 26132
* Add missing space in generation/utils.py by jbochi in 26121
* Update spectrogram and waveform model mapping for TTS/A pipeline by Vaibhavs10 in 26114
* [`RWKV`] Final fix RWMV 4bit by younesbelkada in 26134
* docs: feat: add llama2 notebook resources from OSSCA community by junejae in 26076
* Generate: ignore warning when `generation_config.max_length` is set to `None` by gante in 26147
* Fix `test_finetune_bert2bert` by ydshieh in 25984
* Falcon: batched generation by gante in 26137
* Fix `beam_scores` shape when token scores shape changes after `logits_processor` by BakerBunker in 25980
* Update training_args.py - addition of self.distributed_state when using XPU by Serizao in 25999
* [docs] last hidden state vs hidden_states[-1] by MKhalusova in 26142
* Flex xpu bug fix by abhilash1910 in 26135
* Add missing Maskformer dataclass decorator, add dataclass check in ModelOutput for subclasses by rachthree in 25638
* Fix eval accumulation when `accelerate` > 0.20.3 by sam-scale in 26060
* [Whisper Tokenizer] Encode timestamps by sanchit-gandhi in 26054
* [`PEFT`] Fix PEFT + gradient checkpointing by younesbelkada in 25846
* [MusicGen] Add streamer to generate by sanchit-gandhi in 25320
* Fix beam search when using model parallel by pfldy2850 in 24969
* [MusicGen] Add sampling rate to config by sanchit-gandhi in 26136
* [Whisper] Fix word-level timestamps for audio < 30 seconds by xenova in 25607
* [BLIP-2] Improve conversion script by NielsRogge in 24854
* IDEFICS: allow interpolation of vision's pos embeddings by leot13 in 26029
* [TTA Pipeline] Test MusicGen and VITS by sanchit-gandhi in 26146
* Tweaks to Chat Templates docs by Rocketknight1 in 26168
* [Whisper] Check length of prompt + max new tokens by sanchit-gandhi in 26164
* Update notebook.py to support multi eval datasets by matrix1001 in 25796
* Fix pad to multiple of by ArthurZucker in 25732
* [docs] IDEFICS guide and task guides restructure by MKhalusova in 26035
* [PEFT] Allow PEFT model dict to be loaded by patrickvonplaten in 25721
* No doctest for `convert_bros_to_pytorch.py` by ydshieh in 26212
* Remove `utils/documentation_tests.txt` by ydshieh in 26213
* moved `ctrl` to `Salesforce/ctrl` by julien-c in 26183
* Fix ConversationalPipeline tests by Rocketknight1 in 26217
* [FSMT] Fix non-shared weights by LysandreJik in 26187
* refactor decay_parameters production into its own function by shijie-wu in 26152
* refactor: change default block_size in block size > max position embeddings by pphuc25 in 26069
* [Wav2Vec2-Conf / LLaMA] Style fix by sanchit-gandhi in 26188
* [Permisson] Style fix by sanchit-gandhi in 26228
* [Check] Fix config docstring by sanchit-gandhi in 26222
* 🌐 [i18n-KO] Translated `whisper.md` to Korean by nuatmochoi in 26002
* Create the return value on device to avoid unnecessary copying from CPU by mksit in 26151
* [AutoBackbone] Add test by NielsRogge in 26094
* Update README.md by NinoRisteski in 26198
* Update add_new_pipeline.md by NinoRisteski in 26197
* [docs] Fix model reference in zero shot image classification example by Aleksandar1932 in 26206
* Fix the gitlab user mention in issue templates to the correct user by muellerz in 26237
* Fix some docstring in image processors by ydshieh in 26235
* Fix gated repo tests by Wauplin in 26257
* Fix `Error` not captured in PR doctesting by ydshieh in 26215
* DeepSpeed ZeRO-3 handling when resizing embedding layers by pacman100 in 26259
* [FIX] resize_token_embeddings by passaglia in 26102
* FSDP tests and checkpointing fixes by pacman100 in 26180
* fix name error when accelerate is not available by pacman100 in 26278
* Update bros checkpoint by jinhopark8345 in 26277
* Integrate AMD GPU in CI/CD environment by mfuntowicz in 26007
* Rewrite for custom code warning messages by Rocketknight1 in 26291
* fix deepspeed available detection by fxmarty in 26252
* add bbox input validation by jinhopark8345 in 26294
* include changes from llama by ArthurZucker in 26260
* [`Trainer`] Refactor trainer + bnb logic by younesbelkada in 26248
* add custom RMSNorm to `ALL_LAYERNORM_LAYERS` by shijie-wu in 26227
* Keep relevant weights in fp32 when `model._keep_in_fp32_modules` is set even when `accelerate` is not installed by fxmarty in 26225
* Fix FSMT weight sharing by LysandreJik in 26292
* update hf hub dependency to be compatible with the new tokenizers by ArthurZucker in 26301
* Porting the torchaudio kaldi fbank implementation to audio_utils by ylacombe in 26182
* More error message fixup, plus some linebreaks! by Rocketknight1 in 26296
* [QUICK FIX LINK] Update trainer.py by SoyGema in 26293
* Use CircleCI `store_test_results` by ydshieh in 26223
* Fix doctest CI by ydshieh in 26324
* [doc] fixed indices in obj detection example by MKhalusova in 26343
* [TTA Pipeline] Fix MusicGen test by sanchit-gandhi in 26348
* Add image to image pipeline by LeviVasconcelos in 25393
* feat: adding num_proc to load_dataset by pphuc25 in 26326
* Fixed unclosed p tags by HanSeokhyeon in 26240
* Update add_new_model.md by NinoRisteski in 26365
* Fix MusicGen logging error by osanseviero in 26370
* [docs] removed MaskFormerSwin and TimmBackbone from the table on index.md by MKhalusova in 26347
* Update tiny model information and pipeline tests by ydshieh in 26285
* Add Russian localization for README by qweme32 in 26208
* 🌐 [i18n-KO] Translated `audio_classification.mdx` to Korean by gabrielwithappy in 26200
* [ViTMatte] Add resources by NielsRogge in 26317
* Deleted duplicate sentence by titi-devv in 26394
* added support for gradient checkpointing in ESM models by sanjeevk-os in 26386
* Fix DeepSpeed issue with Idefics by HugoLaurencon in 26393
* Add torch `RMSProp` optimizer by natolambert in 26425
* Fix padding for IDEFICS by shauray8 in 26396
* Update semantic_segmentation.md by zekaouinoureddine in 26419
* Fixing tokenizer when `transformers` is installed without `tokenizers` by urialon in 26236
* [`FA` / `tests`] Add use_cache tests for FA models by younesbelkada in 26415
* add bf16 mixed precision support for NPU by statelesshz in 26163
* [`PEFT`] Fix PEFT multi adapters support by younesbelkada in 26407
* Fix failing doctest by LysandreJik in 26450
* Update `runs-on` in workflow files by ydshieh in 26435
* [i18n-DE] Complete first toc chapter by flozi00 in 26311
* 🌐 [i18n-KO] Translated `debugging.md` to Korean by wonhyeongseo in 26246
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean by wonhyeongseo in 26244
* optimize VRAM for calculating pos_bias in LayoutLM v2, v3 by NormXU in 26139
* Fix `cos_sin` device issue in Falcon model by ydshieh in 26448
* docs: change assert to raise and some small docs by pphuc25 in 26232
* change mention of decoder_input_ids to input_ids and same with decode_inputs_embeds by tmabraham in 26406
* [VITS] Fix speaker_embed device mismatch by fakhirali in 26115
* [`PEFT`] introducing `adapter_kwargs` for loading adapters from different Hub location (`subfolder`, `revision`) than the base model by younesbelkada in 26270
* Do not warn about unexpected decoder weights when loading T5EncoderModel and LongT5EncoderModel by fleonce in 26211
* fix_mbart_tied_weights by SunMarc in 26422
* Esm checkpointing by Amelie-Schreiber in 26454
* [Whisper Tokenizer] Make decoding faster after adding timestamps by sanchit-gandhi in 26299
* [docs] Update offline mode docs by stevhliu in 26478
* [docs] navigation improvement between text gen pipelines and text gen params by MKhalusova in 26477
* Skip 2 failing persimmon pipeline tests for now by ydshieh in 26485
* Avoid all-zeor attnetion mask used in testing by ydshieh in 26469
* [Flax Examples] Seq2Seq ASR Fine-Tuning Script by sanchit-gandhi in 21764
* [ASR Pipe] Improve docs and error messages by sanchit-gandhi in 26476
* Revert falcon exception by LysandreJik in 26472
* Fix num_heads in _upad_input by fs4r in 26490
* Fix requests connection error during modelcard creation by jphme in 26518
* Fix issue of canine forward requiring input_ids anyway by marcmk6 in 26290
* Fix broken link to video classification task by HelgeS in 26487
* [`PEFT`] Pass token when calling `find_adapter_config` by younesbelkada in 26488
* [`core`/ `auto` ] Fix bnb test with code revision + bug with code revision by younesbelkada in 26431
* Fix model integration ci by ArthurZucker in 26322
* [`PEFT`] Protect `adapter_kwargs` check by younesbelkada in 26537
* Remove-warns by ArthurZucker in 26483
* [Doctest] Add configuration_roformer.py by Adithya4720 in 26530
* Code-llama-nit by ArthurZucker in 26300
* add build_inputs_with_special_tokens to LlamaFast by ArthurZucker in 26297
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean by wonhyeongseo in 26243
* [i18n-DE] contribute chapter by flozi00 in 26481
* [RFC, Logging] Change warning to info by patrickvonplaten in 26545
* Add tokenizer kwargs to fill mask pipeline. by nmcahill in 26234
* [Wav2Vec2 and Co] Update init tests for PT 2.1 by sanchit-gandhi in 26494
* [AMD] Add initial version for run_tests_multi_gpu by mfuntowicz in 26346
* [Doctest] Add `configuration_encoder_decoder.py` by SrijanSahaySrivastava in 26519
* [InternLM] Add support for InternLM by Rocketknight1 in 26302
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* jinhopark8345
* Add BROS (23190)
* Update bros checkpoint (26277)
* add bbox input validation (26294)
* qweme32
* Add Russian localization for README (26208)
* Bam4d
* [Mistral] Mistral-7B-v0.1 support (26447)
* flozi00
* [i18n-DE] Complete first toc chapter (26311)
* [i18n-DE] contribute chapter (26481)
* wonhyeongseo
* 🌐 [i18n-KO] Translated `debugging.md` to Korean (26246)
* 🌐 [i18n-KO] Translated `perf_train_gpu_many.md` to Korean (26244)
* 🌐 [i18n-KO] Translated `tokenizer_summary.md` to Korean (26243)