LLaMA
The LLaMA model was proposed in [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971). It is a collection of foundation language models ranging from 7B to 65B parameters. You can request access to the weights [here](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform?usp=send_form), then use the conversion script to generate a checkpoint compatible with Hugging Face Transformers.
* LLaMA Implementation by zphang in 21955
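Once converted, the checkpoint loads like any other causal LM. A minimal sketch, assuming the original weights were converted with `src/transformers/models/llama/convert_llama_weights_to_hf.py` and saved locally (the path is a placeholder):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("./llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("./llama-7b-hf")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```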
Pix2Struct, MatCha, DePlot
Pix2Struct is a pretrained image-to-text model for purely visual language understanding, which can be fine-tuned on tasks containing visually-situated language. Pix2Struct has been fine-tuned on various tasks and datasets, ranging from image captioning and visual question answering (VQA) over different inputs (books, charts, science diagrams) to captioning UI components, and others.
* Add Pix2Struct by younesbelkada in 21400
* Add DePlot + MatCha on `transformers` by younesbelkada in 22528
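A minimal image-captioning sketch (the checkpoint is one of the released fine-tuned variants; the image path is a placeholder):

```python
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

processor = Pix2StructProcessor.from_pretrained("google/pix2struct-textcaps-base")
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-textcaps-base")

image = Image.open("example.jpg")  # any image with visually-situated language
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(generated_ids[0], skip_special_tokens=True))
```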
Mega
MEGA proposes a new approach to self-attention, with each encoder layer having a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism stronger positional biases. This allows MEGA to perform competitively with Transformers on standard benchmarks, including LRA, while also having significantly fewer parameters. MEGA's compute efficiency allows it to scale to very long sequences, making it an attractive option for long-document NLP tasks.
* Add Mega: Moving Average Equipped Gated Attention by mnaylor5 in 21766
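A minimal sketch for extracting hidden states (the checkpoint name is the one used in the model docs; treat it as an assumption):

```python
from transformers import AutoTokenizer, MegaModel

tokenizer = AutoTokenizer.from_pretrained("mnaylor/mega-base-wikitext")
model = MegaModel.from_pretrained("mnaylor/mega-base-wikitext")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```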
GPTBigCode
GPTBigCode is an optimized [GPT2 model](https://huggingface.co/docs/transformers/model_doc/gpt2) with support for Multi-Query Attention (MQA).
* Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) by jlamypoirier in 22575
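A minimal generation sketch (the checkpoint name is illustrative; any GPTBigCode checkpoint on the Hub should work the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/gpt_bigcode-santacoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```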
NLLB-MoE
The mixture of experts version of the NLLB release has been added to the library.
* `NLLB-MoE` Adds the moe model by ArthurZucker in 22024
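A minimal translation sketch (note that the released `facebook/nllb-moe-54b` checkpoint is very large, so this is illustrative only):

```python
from transformers import AutoTokenizer, NllbMoeForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-moe-54b")
model = NllbMoeForConditionalGeneration.from_pretrained("facebook/nllb-moe-54b")

inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fra_Latn"], max_length=50
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```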
Serializing 8bit models
* [`bnb`] Let's make serialization of int8 models possible by younesbelkada in 22177
You can now push 8bit models to the Hub and/or load 8bit models directly from the Hub, saving memory and loading your 8bit models faster! An example repo is available [here](https://huggingface.co/ybelkada/bloom-1b7-8bit).
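A minimal sketch of the round trip, assuming `bitsandbytes` is installed and a CUDA device is available (the target repo name is a placeholder):

```python
from transformers import AutoModelForCausalLM

# Load in int8, then push the quantized weights directly to the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7", device_map="auto", load_in_8bit=True
)
model.push_to_hub("my-username/bloom-1b7-8bit")

# Serialized int8 checkpoints can be loaded back directly, e.g. the example repo above.
model_8bit = AutoModelForCausalLM.from_pretrained("ybelkada/bloom-1b7-8bit", device_map="auto")
```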
Breaking Changes
Ordering of height and width for the BLIP image processor
_Notes from the PR:_
The BLIP image processor incorrectly passed in the dimensions to resize in the order (width, height). This is reordered to be correct.
In most cases, this won't have an effect, as the default height and width are the same. However, it is not backwards compatible for custom configurations with different height and width settings, or for direct calls to the resize method with different height and width values.
* 🚨🚨🚨 Fix ordering of height, width for BLIP image processor by amyeroberts in 22466
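A quick way to check the new behaviour with a non-square size (illustrative values, dummy image):

```python
import numpy as np
from transformers import BlipImageProcessor

processor = BlipImageProcessor(size={"height": 384, "width": 512})
image = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy HWC image
pixel_values = processor(image, return_tensors="np").pixel_values
print(pixel_values.shape)  # (1, 3, 384, 512) after the fix; the spatial dims were swapped before
```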
Prefix tokens for the NLLB tokenizer
The main issue was with the `prefix` and `suffix` tokens of the NLLB tokenizer: the source language code was previously appended at the end of the sequence, after `</s>`, instead of being prepended at the start. The tokenizer now prepends it.
Previous behaviour:
```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[13374, 1398, 4260, 4039, 248130, 2, 256047]
>>> # 2: '</s>'
>>> # 256047: 'eng_Latn'
```
New behaviour:
```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[256047, 13374, 1398, 4260, 4039, 248130, 2]
```
In case you have pipelines that relied on the old behaviour, here is how to enable it again:
```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)
```
* 🚨🚨🚨 `[NLLB Tokenizer]` Fix the prefix tokens 🚨🚨🚨 by ArthurZucker in 22313
TensorFlow ports
The BLIP model is now available in TensorFlow.
* Add TF port of BLIP by Rocketknight1 in 22090
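A minimal TF captioning sketch (checkpoint name and image path are illustrative):

```python
from PIL import Image
from transformers import BlipProcessor, TFBlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = TFBlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg")
inputs = processor(images=image, return_tensors="tf")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```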
Export TF Generate with a TF tokenizer
This PR adds the possibility to export TF `generate` together with a TF-native tokenizer, so that the whole pipeline is contained in a single TF graph.
* Generate: Export TF generate with a TF tokenizer by gante in 22310
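A rough sketch of what such an export can look like, assuming the TF-native `TFGPT2Tokenizer` (requires `keras_nlp` and `tensorflow_text`); the padding settings, class name and export directory are illustrative, not the exact code from the PR:

```python
import tensorflow as tf
from transformers import TFAutoModelForCausalLM, TFGPT2Tokenizer

class ExportableGenerator(tf.Module):
    def __init__(self):
        super().__init__()
        # pad_token_id / max_length so the in-graph tokenizer returns dense, padded tensors
        self.tokenizer = TFGPT2Tokenizer.from_pretrained("gpt2", pad_token_id=50256, max_length=64)
        self.model = TFAutoModelForCausalLM.from_pretrained("gpt2")

    @tf.function(input_signature=[tf.TensorSpec(shape=(None,), dtype=tf.string, name="text")])
    def serving(self, text):
        tokenized = self.tokenizer(text)  # tokenization happens inside the TF graph
        return self.model.generate(
            input_ids=tokenized["input_ids"], attention_mask=tokenized["attention_mask"]
        )

generator = ExportableGenerator()
tf.saved_model.save(generator, "exported_generate", signatures={"serving_default": generator.serving})
```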
Task guides
A new task guide has been added, focusing on depth estimation.
* Depth estimation task guide by MKhalusova in 22205
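A minimal sketch using the pipeline API (checkpoint and image path are illustrative):

```python
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator(Image.open("example.jpg"))
depth_map = result["depth"]  # a PIL image with the predicted depth map
```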
Bugfixes and improvements
* Load optimizer state on CPU to avoid CUDA OOM by sgugger in 22159
* Run all tests by default by sgugger in 22162
* Fix: unfinished_sequences with correct device by Stxr in 22184
* Revert 22152 MaskedImageCompletionOutput changes by amyeroberts in 22187
* Regression pipeline device by sgugger in 22190
* Update BridgeTowerForContrastiveLearning by abhiwand in 22145
* t5 remove data dependency by prathikr in 22097
* Fix DeepSpeed CI by ydshieh in 22194
* Fix typo in Align docs by alaradirik in 22199
* Update expected values in `MgpstrModelIntegrationTest` by ydshieh in 22195
* Italian Translation of migration.mdx by Baelish03 in 22183
* Update tiny model creation script by ydshieh in 22202
* Temporarily fix ONNX model exporting error by SatyaJandhyalaAtMS in 21830
* [`XGLM`] Add `accelerate` support for XGLM by younesbelkada in 22207
* fixes a typo in WhisperFeatureExtractor docs. by susnato in 22208
* Hotfix for natten issue with torch 2.0.0 on CircleCI by ydshieh in 22218
* fix typos in llama.mdx by keturn in 22223
* fix code example in mgp-str doc by wdp-007 in 22219
* Use `dash==2.8.1` for now for daily CI by ydshieh in 22227
* LLaMA house-keeping by sgugger in 22216
* fix AutoTP in deepspeed could not work for bloom by sywangyi in 22196
* Add LlamaForSequenceClassification by lewtun in 22209
* Removed .mdx extension in two links by MKhalusova in 22230
* fix(docs): fix task guide links in model docs by Seb0 in 22226
* Fix natten by alihassanijr in 22229
* Revert "Use `dash==2.8.1` for now for daily CI" by ydshieh in 22233
* Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding by ma787639046 in 22234
* [trainer] param count for deepspeed zero3 by stas00 in 22193
* Update training_args.py -- a nightly install is not required anymore for torch.compile by pminervini in 22266
* [Docs] fix typos in some tokenizer docs by yesinkim in 22256
* Italian translation perf_infer_cpu by nickprock in 22243
* [Trainer] Add optional communication backends for torch.distributed when using GPU by heya5 in 22247
* Fix the gradient checkpointing bug of the llama model by yqy2001 in 22270
* Fix balanced and auto device_map by sgugger in 22271
* Rework a bit the LLaMA conversion script by sgugger in 22236
* Proper map location for optimizer load by sgugger in 22273
* Fix doc links by amyeroberts in 22274
* Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training by ani300 in 22279
* Example of pad_to_multiple_of for padding and truncation guide & docstring update by MKhalusova in 22278
* Update vision docstring bool masked pos by amyeroberts in 22237
* replace_8bit_linear modules_to_not_convert default value fix by BlackSamorez in 22238
* Fix error in mixed precision training of `TFCvtModel` by gcuder in 22267
* More doctests by ydshieh in 22268
* fix more doctests by ydshieh in 22292
* Add translation perf_infer_gpu_one for it by davidegazze in 22296
* Restore fp16 support on xla gpu device by ymwangg in 22300
* Correct NATTEN function signatures and force new version by alihassanijr in 22298
* [deepspeed] offload + non-cpuadam optimizer exception doc by stas00 in 22044
* Final update of doctest by ydshieh in 22299
* Add MaskedImageModelingOutput by alaradirik in 22212
* Enable traced model for text-generation task by jiqing-feng in 22265
* add low_cpu_mem_usage option in run_clm.py example which will benefit… by sywangyi in 22288
* fix: Allow only test_file in pytorch and flax summarization by connor-henderson in 22293
* Fix position embeddings for GPT-J and CodeGen by njhill in 22069
* Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer by silentghoul-spec in 22302
* Enforce `max_memory` for device_map strategies by sgugger in 22311
* Beef up Llama tests by gante in 22314
* docs: Resolve incorrect type typo in trainer methods by tomaarsen in 22316
* Chunkable token classification pipeline by luccailliau in 21771
* Fix PipelineTests skip conditions by ydshieh in 22320
* [deepspeed zero3] need `generate(synced_gpus=True, ...)` by stas00 in 22242
* [gptj] support older pytorch version by stas00 in 22325
* Move common properties to BackboneMixin by amyeroberts in 21855
* Backbone add mixin tests by amyeroberts in 22542
* Backbone add out indices by amyeroberts in 22493
* [`MBart`] Add `accelerate` support for MBart by younesbelkada in 22309
* Fixed gradient checkpoint bug for TimeSeriesTransformer by mollerup23 in 22272
* Mention why one needs to specify max_steps in Trainer by lhoestq in 22333
* Fix various imports by sgugger in 22281
* Minor typo in pipeline FillMaskPipeline's documentation. by SamuelLarkin in 22339
* Added type hints to TFDeiTModel by Batese2001 in 22327
* Fix --bf16 option support for Neuron after PR 22300 by jeffhataws in 22307
* Generate: add test for left-padding support by gante in 22322
* Enable training Llama with model or pipeline parallelism by kooshi in 22329
* Automatically create/update tiny models by ydshieh in 22275
* [HFTracer] Make embeddings ops take on the dtype of the weight by jamesr66a in 22347
* Fix typo in Greedy Search Description by awinml in 22345
* Generate: Add GPTNeoX integration test by gante in 22346
* Update docker files to use official torch 2.0.0 by ydshieh in 22357
* Pin tensorflow-text to go with tensorflow by sgugger in 22362
* Improve error message by Mahrkeenerh in 22361
* TensorFlow: pin maximum version to 2.12 by gante in 22364
* Resnet flax by Shubhamai in 21472
* [Trainer] add disclaimer that full_determinism is slow by stas00 in 22368
* [safetensors] don't use in `torch<1.10` by stas00 in 22370
* TensorFlow: additional missing `cmake` dependencies in CI by gante in 22383
* Changed world_size() to get_world_size() bugfix by Charlie-Bell in 22381
* Translated documentation in italian by nickprock in 22388
* Adapt find_tied_parameters to handle breaking change in Accelerate by sgugger in 22360
* load_in_8bit now respects 'balanced' device maps in multi-gpu environments by kooshi in 22377
* Wav2Vec2ProcessorWithLM can return N best hypotheses now by vsokolovskii in 22235
* Seq2seq trainer generation config arg by Natooz in 22323
* Generate: support for left-padding on GPTNeoX and Llama by gante in 22382
* [`bnb`] Force `requires_grad` to be `False` by younesbelkada in 22396
* Transformers env safetensors by sgugger in 22400
* [Pix2Struct] Add support to resize embeddings by NielsRogge in 22394
* Trainer: move Seq2SeqTrainer imports under the typing guard by gante in 22401
* Trainer: missing None check by gante in 22404
* Hardware Auto-Setup for Examples by dongreenberg in 22319
* [neptune] fix checkpoint bug with relative out_dir by kshitij12345 in 22102
* Fix bug in perplexity guide calculations and update perplexity numbers. Fixes 22348 by fpgaminer in 22411
* [performance] ensure `causal_mask` is created directly on device by jeffra in 22378
* MBart: Fix docs and doctests by gante in 22422
* Add clean_up_tokenization_spaces to config by ArthurZucker in 22341
* Hyperparameter search reporting to W&B by NoB0 in 22440
* [`bnb`] fix bnb failing test by younesbelkada in 22439
* [`Generate`] Add conditional generation for multimodal models by younesbelkada in 22424
* Don't hard error when cache version can't be converted to int by sgugger in 22427
* Use real tokenizers if tiny version(s) creation has issue(s) by ydshieh in 22428
* Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" by sgugger in 22444
* [`Pix2Struct`] Fix slow test by younesbelkada in 22448
* Revert "Fix --bf16 option support for Neuron after PR 22300" by jeffhataws in 22451
* Update Neptune docs by normandy7 in 22452
* Avoid using personal HF token in CI by ydshieh in 22453
* Update release instructions by sgugger in 22454
* Pin ruff by sgugger in 22455
* Update: ignore padding support for TransfoXL training when n_clusters==0 by StefanHeng in 22457
* Rescale image back if it was scaled during PIL conversion by amyeroberts in 22458
* Skip flaky NLLB Moe test for now by amyeroberts in 22463
* Guard imports of PreTrainedTokenizerFast on is_tokenizers_available by hvaara in 22285
* [NLLB-MoE] `model_type` update for auto mapping by ArthurZucker in 22470
* Llama: support for `max_position_embeddings` by gante in 22471
* Docs fix: Multinomial sampling decoding needs "num_beams=1", since by default it is usually not 1. by manueldeprada in 22473
* (Re-)Enable Nightly + Past CI by ydshieh in 22393
* Relax `eos_token_id < 0` checks in `generate()` from `ValueError` to warning by lewtun in 22472
* Update `Wav2Vec2ProcessorWithLM` doc example by ydshieh in 22474
* Making sure we can use safetensors to serialize all the time. by Narsil in 22437
* Update Neptune callback docstring by normandy7 in 22497
* Test fetch v2 by sgugger in 22367
* Update convert_llama_weights_to_hf.py by Ricardokevins in 22525
* [Time-Series] fix past_observed_mask type by elisim in 22076
* Fix llama tokenizer by ArthurZucker in 22402
* [WIP] docs: ko: sagemaker.mdx by jungnerd in 22509
* added biogpt token classifier by upjabir in 22447
* Generate: `TextIteratorStreamer` (streamer for gradio) by gante in 22501
* Fix convert_opt_original_pytorch_checkpoint_to_pytorch.py typo by larekrow in 22526
* llama docs: fix conversion script url by python273 in 22514
* fix LayoutLMv3TokenizerFast subword label after 'Δ ' token by thibaultdouzon in 21695
* [BLIP] fix cross attentions for BlipTextEncoder by zhbh01 in 22515
* [`Trainer`] Force `is_model_parallel` when model is loaded in multiple GPUs using `accelerate` by younesbelkada in 22532
* [`T5`] Enable naive Pipeline Parallelism training for T5 by younesbelkada in 22535
* Fix missing metrics with multiple eval datasets by hawkeoni in 22536
* [setup] drop deprecated `distutils` usage by XuehaiPan in 22531
* Generate: Enable easier TextStreamer customization by vblagoje in 22516
* [setup] migrate setup script to `pyproject.toml` by XuehaiPan in 22539
* Update test_image_processing_pix2struct.py by younesbelkada in 22543
* Fix OPTForQuestionAnswering doc string by curlup in 22481
* Generate: Add text streamer decoding options by gante in 22544
* 🔥py38 + torch 2 🔥🔥🔥 by ydshieh in 22204
* Time to Say Goodbye, torch 1.7 and 1.8 by ydshieh in 22291
* [Roformer] Fixing a bug in RoFormerEncoder where it was ignoring the length of past_key_values when generating as a decoder by TheWall9 in 22416
* Implemented safetensors checkpoints save/load for Trainer by ViktorooReps in 22498
* Remove hack for dynamic modules and use Python functions instead by sgugger in 22537
* [`bnb`] Fix typo by younesbelkada in 22556
* Add id2label and label2id to model's config in run_xnil by maziyarpanahi in 22558
* Soft error whisper. by Narsil in 22475
* corrected the code comment for the output of find_pruneable_heads_and_indices by SunHaozhe in 22557
* Flax Regnet by Shubhamai in 21867
* fix `_no_split_modules` for Whisper model by pacman100 in 22486
* Fix inverted conditional in TF common test! by Rocketknight1 in 22540
* Generate: `TextIteratorStreamer` timeout by gante in 22576
* Move back doctest instructions to setup.cfg by sgugger in 22587
* Tests: disable `accelerate_tests` mark warnings by gante in 22585
* Fix PT-TF equivalence test for GPT1 by Rocketknight1 in 22586
* Add thousands separator in training summary by qmeeus in 22583
* docs: ko: complete `_toctree.yml` by wonhyeongseo in 22581
* Sync preprocesses before loading the processor at run_speech_recognition_ctc.py by mpenagar in 21926
* Fix a typo in one of the BLIP pretrained checkpoint names by Rocketknight1 in 22588
* Adding support for BPE merge creation from scores instead of ids. by Narsil in 22582
* Use native TF checkpoints for the BLIP TF tests by Rocketknight1 in 22593
* feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart by kaustubh-s1 in 22591
* Adding Llama FastTokenizer support. by Narsil in 22264
* Revert error back into warning for byte fallback conversion. by Narsil in 22607
* Seq2SeqTrainer: use unwrapped model to retrieve the generation config by gante in 22584
* Make tiny model creation + pipeline testing more robust by ydshieh in 22500
* docs: Fix broken link to generation strategies by connor-henderson in 22623
* update_pip_test_mapping by ydshieh in 22606
* A script to add/update `pipeline_model_mapping` systematically by ydshieh in 22180
* [`bnb`] 8bit models should not be converted to `DDP` by younesbelkada in 22628
* LlamaTokenizerFast Fix (.., from_slow=True). by Narsil in 22630
* [`Blip`] Fix slow tests and doctests with correct values by younesbelkada in 22632
* Update tiny model summary file for recent models by ydshieh in 22637
* fix FSDP version related issues by pacman100 in 22489
* 🌐[i18n-KO] Translate `autoclass_tutorial` to Korean and Fix the typo of `quicktour` by gabrielwithappy in 22533
* Move labels to the same device as logits for LlamaForSequenceClassification and Blip2 by xssChauhan in 22596
* Fix typo by Ronalmoo in 22650
* Fix `MegaModel` CI by ydshieh in 22652
* 🌐 [i18n-KO] Translated `pipeline_tutorial.mdx` to Korean by wonhyeongseo in 22508
* Small nit, by ArthurZucker in 22653
* [tokenization] do not push special file by ArthurZucker in 22657
* [OPT] Fix default attention mask size by ArthurZucker in 22649
* Generate: add API warning to streamers by gante in 22659
* Revert migration of setup to pyproject.toml by sgugger in 22658
* moved labels to the same device as logits for BLOOM, GPT Neo, GPT NeoX, RoBERTa and VIT models by iamarunbrahma in 22663
* Model parallelism: Moving labels to the same device as logits for BridgeTower models by shahad-mahmud in 22676
* (feat): Moving labels to same device as logits for Deit by xssChauhan in 22679
* Make dynamic code work with offline mode by sgugger in 22661
* Fix quantization docs typo by python273 in 22666
* use __func__ to check can_generate by xin3he in 22643
* add GPTNeoXForSequenceClassification by Asugawara in 22671
* Model parallelism: Moving labels to same devices as the logits are by shahad-mahmud in 22691
* Update some `MarkupLM` tests' expected values by ydshieh in 22667
* Make it easier to develop without a dev install by sgugger in 22697
* Enable naive Pipeline Parallelism training for Gpt neox japanese and san japanese by mayankagarwals in 22702
* Clarify stride option by luccailliau in 22684
* Remove 2 failing ONNX conversion tests by ydshieh in 22660
* Replace -100s in predictions by the pad token by sgugger in 22693
* Fix decorator order by ydshieh in 22708
* Update input values for docstring by amyeroberts in 22631
* remove wrong doc in readme by ArthurZucker in 22723
* Added parallel device usage for GPT-J by jprivera44 in 22713
* add model resources for CPMAnt (new) by pioliverse in 20906
* Modify pipeline_tutorial.mdx by ARKA1112 in 22726
* [tests] switch to torchrun by stas00 in 22712
* `torch.distributed` group initialization for `torch_neuron` disabled when `optimum-neuron` is installed by michaelbenayoun in 22728
* add fast support and option by ArthurZucker in 22724
* Update warning levels by NielsRogge in 22727
* Fix docstrings for TF BLIP by Rocketknight1 in 22618
* [Doctest] Add configuration_m2m_100.py by elabongaatuo in 22733
* [Doctest] Add configuration_mvp.py by elabongaatuo in 22735
* Indexing fix for gpt_bigcode by jlamypoirier in 22737
* Make vilt, switch_transformers compatible with model parallelism by Xrenya in 22703
* [Pix2struct] Simplify generation by NielsRogge in 22527
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* zphang
* LLaMA Implementation (21955)
* Seb0
* fix(docs): fix task guide links in model docs (22226)
* mnaylor5
* Add Mega: Moving Average Equipped Gated Attention (21766)
* Shubhamai
* Resnet flax (21472)
* Flax Regnet (21867)
* wonhyeongseo
* docs: ko: complete `_toctree.yml` (22581)
* 🌐 [i18n-KO] Translated `pipeline_tutorial.mdx` to Korean (22508)
* jlamypoirier
* Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (22575)
* Indexing fix for gpt_bigcode (22737)
* pioliverse
* add model resources for CPMAnt (new) (20906)