Transformers


4.8.2

- Rename detr targets to labels 12280 (NielsRogge)
- fix ids_to_tokens naming error in tokenizer of deberta v2 12412 (hjptriplebee)
- Add option to save on each training node 12421 (sgugger)

4.8.1

- Fix default for TensorBoard folder
- Ray Tune install 12338
- Tests fixes for Torch FX 12336

4.8.0

Integration with the Hub

Our example scripts and Trainer are now optimized for publishing your model on the [Hugging Face Hub](https://huggingface.co/models), with Tensorboard training metrics, and an automatically authored model card which contains all the relevant metadata, including evaluation results.

Trainer Hub integration

Use the `--push_to_hub` flag to create a model repo for your training; the model will be saved there with all relevant metadata at the end of training (see the sketch after the flag list below).

Other flags are:
- `push_to_hub_model_id` to control the repo name
- `push_to_hub_organization` to specify an organization
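
The same options are available on `TrainingArguments` when using the `Trainer` directly. Below is a minimal sketch, assuming you are logged in with `huggingface-cli login`; `distilbert-base-uncased` is purely an example checkpoint and the tiny in-memory dataset is only there to make the snippet self-contained:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny illustrative dataset so the snippet runs end to end.
raw = Dataset.from_dict({"text": ["great movie", "terrible movie"], "label": [1, 0]})
train_dataset = raw.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="my-finetuned-model",
    num_train_epochs=1,
    push_to_hub=True,                           # create the repo and push at the end of training
    push_to_hub_model_id="my-finetuned-model",  # optional: control the repo name
    # push_to_hub_organization="my-org",        # optional: push under an organization
)

trainer = Trainer(model=model, args=args, tokenizer=tokenizer, train_dataset=train_dataset)
trainer.train()
trainer.push_to_hub()  # uploads the model, tokenizer and auto-generated model card
```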

Visualizing Training metrics on huggingface.co (based on Tensorboard)

By default, if you have `tensorboard` installed, the training scripts will use it for logging, and the logging traces folder is conveniently located inside your model output directory, so the traces can be pushed to your model repo by default.

Any model repo that contains Tensorboard traces will spawn a Tensorboard server:

![image](https://user-images.githubusercontent.com/35901082/123144141-5c2af980-d429-11eb-8438-16b374d2fe73.png)

which makes it very convenient to see how the training went! This Hub feature is in Beta so let us know if anything looks weird :)

See this [model repo](https://huggingface.co/julien-c/reactiongif-roberta/tensorboard) for an example.

Model card generation

![image](https://user-images.githubusercontent.com/35901082/123144222-749b1400-d429-11eb-97f6-9834dcc97c6d.png)

The model card contains information about the datasets used, the evaluation results, and more.

Many users were already adding their eval results to their model cards in markdown format, but this is a more structured way of adding them, which makes them easier to parse and, for example, represent in leaderboards such as the ones on Papers With Code!

We use a format specified in collaboration with [PaperswithCode](https://github.com/huggingface/huggingface_hub/blame/main/modelcard.md), see also [this repo](https://github.com/paperswithcode/model-index).

Model, tokenizer and configurations

All models, tokenizers and configurations now have a revamped `push_to_hub()` method, as well as a `push_to_hub` argument in their `save_pretrained()` method. The workflow of this method has changed a bit to be more git-like, with a local clone of the repo kept in a folder of the working directory, to make it easier to apply patches (use `use_temp_dir=True` to clone into a temporary folder and get the same behavior as the experimental API).

- Clean push to hub API 12187 (sgugger)
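
A minimal sketch of the revamped API (assuming you are already authenticated with `huggingface-cli login`; `bert-base-cased` is just an example checkpoint):

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("bert-base-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Push directly; with use_temp_dir=True the repo is cloned into a temporary
# folder instead of the working directory (the former experimental behavior).
model.push_to_hub("my-bert-model", use_temp_dir=True)
tokenizer.push_to_hub("my-bert-model", use_temp_dir=True)

# Or push while saving locally:
model.save_pretrained("my-bert-model", push_to_hub=True)
```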

Flax/JAX support

Flax/JAX is becoming a fully supported backend of the Transformers library, with more models getting an implementation. BART, CLIP and T5 join the already existing models; find the whole list [here](https://huggingface.co/transformers/#supported-frameworks). A short usage sketch follows the PR list below.

- [Flax] FlaxAutoModelForSeq2SeqLM 12228 (patil-suraj)
- [FlaxBart] few small fixes 12247 (patil-suraj)
- [FlaxClip] fix test from/save pretrained test 12284 (patil-suraj)
- [Flax] [WIP] allow loading head model with base model weights 12255 (patil-suraj)
- [Flax] Fix flax test save pretrained 12256 (patrickvonplaten)
- [Flax] Add jax flax to env command 12251 (patrickvonplaten)
- add FlaxAutoModelForImageClassification in main init 12298 (patil-suraj)
- Flax T5 12150 (vasudevgupta7)
- [Flax T5] Fix weight initialization and fix docs 12327 (patrickvonplaten)
- Flax summarization script 12230 (patil-suraj)
- FlaxBartPretrainedModel -> FlaxBartPreTrainedModel 12313 (sgugger)
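
The sketch below runs a forward pass through one of the newly supported seq2seq architectures; treat it as a hedged illustration rather than canonical usage (`t5-small` is just an example checkpoint with Flax weights):

```python
import numpy as np
from transformers import AutoTokenizer, FlaxAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = FlaxAutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello, my dog is cute", return_tensors="np")
decoder_input_ids = np.array([[model.config.decoder_start_token_id]])

outputs = model(input_ids=inputs["input_ids"], decoder_input_ids=decoder_input_ids)
print(outputs.logits.shape)  # (batch_size, decoder_length, vocab_size)
```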

General improvements and bug fixes

- AutoTokenizer: infer the class from the tokenizer config if possible 12208 (sgugger)
- update desc for map in all examples 12226 (bhavitvyamalik)
- Depreciate pythonic Mish and support PyTorch 1.9 version of Mish 12240 (digantamisra98)
- [t5 doc] make the example work out of the box 12239 (stas00)
- Better CI feedback 12279 (LysandreJik)
- Fix for making student ProphetNet for Seq2Seq Distillation 12130 (vishal-burman)
- [DeepSpeed] don't ignore --adafactor 12257 (stas00)
- Tensorflow QA example 12252 (Rocketknight1)
- [tests] reset report_to to none, avoid deprecation warning 12293 (stas00)
- [trainer + examples] set log level from CLI 12276 (stas00)
- [tests] multiple improvements 12294 (stas00)
- Trainer: adjust wandb installation example 12291 (stefan-it)
- Fix and improve documentation for LEDForConditionalGeneration 12303 (ionicsolutions)
- [Flax] Main doc for event orga 12305 (patrickvonplaten)
- [trainer] 2 bug fixes and a rename 12309 (stas00)
- [docs] performance 12258 (stas00)
- Add CodeCarbon Integration 12304 (JetRunner)
- Optimizing away the `fill-mask` pipeline. 12113 (Narsil)
- Add output in a dictionary for TF `generate` method 12139 (stancld)
- Rewrite ProphetNet to adapt converting ONNX friendly 11981 (jiafatom)
- Add mention of the huggingface_hub methods for offline mode 12320 (LysandreJik)
- [Flax/JAX] Add how to propose projects markdown 12311 (patrickvonplaten)
- [TFWav2Vec2] Fix docs 12283 (chenht2010)
- Add all XxxPreTrainedModel to the main init 12314 (sgugger)
- Conda build 12323 (LysandreJik)
- Changed modeling_fx_utils.py to utils/fx.py for clarity 12326 (michaelbenayoun)

4.7.0

DETR (NielsRogge)

Three new models are released as part of the DETR implementation: `DetrModel`, `DetrForObjectDetection` and `DetrForSegmentation`, in PyTorch.

DETR consists of a convolutional backbone followed by an encoder-decoder Transformer that can be trained end-to-end for object detection. It greatly simplifies the pipeline of models like Faster R-CNN and Mask R-CNN, which rely on region proposals, non-maximum suppression and anchor generation. Moreover, DETR can be naturally extended to perform panoptic segmentation by simply adding a mask head on top of the decoder outputs.

DETR can support any [timm](https://github.com/rwightman/pytorch-image-models) backbone.

The DETR model was proposed in [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko.

- Add DETR 11653 (NielsRogge)
- Improve DETR 12147 (NielsRogge)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=detr
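
A hedged sketch of object detection with the new classes (the checkpoint name and image URL are just examples; the feature extractor released alongside the model handles resizing and normalization, and the model itself needs the `timm` package for its convolutional backbone):

```python
import requests
import torch
from PIL import Image
from transformers import DetrFeatureExtractor, DetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# logits: (batch, num_queries, num_labels + 1), pred_boxes: (batch, num_queries, 4)
print(outputs.logits.shape, outputs.pred_boxes.shape)
```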

ByT5 (patrickvonplaten)

A new tokenizer is released as part of the ByT5 implementation: `ByT5Tokenizer`. It can be used with the T5 family of models.

The ByT5 model was presented in [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.

- ByT5 model 11971 (patrickvonplaten)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?search=byt5
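
ByT5 works directly on UTF-8 bytes, so the tokenizer needs no vocabulary file; a minimal sketch (the checkpoint name is an example):

```python
from transformers import ByT5Tokenizer, T5ForConditionalGeneration

tokenizer = ByT5Tokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

# Every character is turned into one or more byte-level token ids.
inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
labels = tokenizer("La vie est comme une boîte de chocolat.", return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss
print(loss.item())
```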

RoFormer (JunnYu)

14 new models are released as part of the RoFormer implementation: `RoFormerModel`, `RoFormerForCausalLM`, `RoFormerForMaskedLM`, `RoFormerForSequenceClassification`, `RoFormerForTokenClassification`, `RoFormerForQuestionAnswering` and `RoFormerForMultipleChoice` in PyTorch, and `TFRoFormerModel`, `TFRoFormerForCausalLM`, `TFRoFormerForMaskedLM`, `TFRoFormerForSequenceClassification`, `TFRoFormerForTokenClassification`, `TFRoFormerForQuestionAnswering` and `TFRoFormerForMultipleChoice` in TensorFlow.

RoFormer is a BERT-like autoencoding model with rotary position embeddings. Rotary position embeddings have shown improved performance on classification tasks with long texts. The RoFormer model was proposed in [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.

- Add new model RoFormer (use rotary position embedding ) 11684 (JunnYu)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=roformer
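
A hedged sketch loading the PyTorch base model (the checkpoint name is an example from the Hub; the tokenizer for Chinese checkpoints may additionally require the `rjieba` package):

```python
from transformers import RoFormerModel, RoFormerTokenizer

tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_base")

inputs = tokenizer("今天天气非常好。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```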

HuBERT (patrickvonplaten)

HuBERT is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.

HuBERT was proposed in [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.

Two new models are released as part of the HuBERT implementation: `HubertModel` and `HubertForCTC`, in PyTorch.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=hubert

- Hubert 11889 (patrickvonplaten)
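
A hedged sketch of CTC transcription (the fine-tuned checkpoint name is an example; HuBERT reuses the Wav2Vec2 processor, and the random array below merely stands in for real 16 kHz audio):

```python
import numpy as np
import torch
from transformers import HubertForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

speech = np.random.randn(16_000).astype(np.float32)  # stand-in for 1 second of 16 kHz audio
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```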

Hugging Face Course - Part 1

On Monday, June 14th, 2021, we released the first part of the Hugging Face Course. The course is focused on the Hugging Face ecosystem, including `transformers`. Most of the course material is now linked from the `transformers` documentation, which also includes videos explaining individual concepts.

- Add video links to the documentation 12162 (sgugger)
- Add link to the course 12229 (sgugger)

TensorFlow additions

The Wav2Vec2 model can now be used in TensorFlow:

- Adding TFWav2Vec2Model 11617 (will-rice)
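
A hedged sketch of the TensorFlow model (the checkpoint name is an example and the random array stands in for real 16 kHz audio; pass `from_pt=True` if the checkpoint only ships PyTorch weights):

```python
import numpy as np
from transformers import TFWav2Vec2Model, Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = TFWav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.random.randn(16_000).astype(np.float32)  # stand-in for 1 second of 16 kHz audio
inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="tf")

outputs = model(inputs.input_values)
print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
```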

PyTorch 1.9 support

- Add support for torch 1.9.0 12224 (LysandreJik)
- fix pt-1.9.0 add_ deprecation 12217 (stas00)

Notebooks

- NielsRogge has contributed five tutorials on the usage of BERT in his repository: [Transformers-Tutorials](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR)
- [Community Notebooks] Add Emotion Speech Noteboook 11900 (patrickvonplaten)

General improvements and bugfixes

- Vit deit fixes 11309 (NielsRogge)
- Enable option for subword regularization in more tokenizers. 11417 (PhilipMay)
- Fix gpt-2 warnings 11709 (LysandreJik)
- [Flax] Fix BERT initialization & token_type_ids default 11695 (patrickvonplaten)
- BigBird on TPU 11651 (vasudevgupta7)
- [T5] Add 3D attention mask to T5 model (2) (9643) 11197 (lexhuismans)
- Fix loading the best model on the last stage of training 11718 (vbyno)
- Fix T5 beam search when using parallelize 11717 (OyvindTafjord)
- [Flax] Correct example script 11726 (patrickvonplaten)
- Add Cloud details to README 11706 (marcvanzee)
- Experimental symbolic tracing feature with torch.fx for BERT, ELECTRA and T5 11475 (michaelbenayoun)
- Improvements to Flax finetuning script 11727 (marcvanzee)
- Remove tapas model card 11739 (julien-c)
- Add visual + link to Premium Support webpage 11740 (julien-c)
- Issue with symbolic tracing for T5 11742 (michaelbenayoun)
- [BigBird Pegasus] Make tests faster 11744 (patrickvonplaten)
- Use new evaluation loop in TrainerQA 11746 (sgugger)
- Flax BERT fix token type init 11750 (patrickvonplaten)
- [TokenClassification] Label realignment for subword aggregation 11680 (Narsil)
- Fix checkpoint deletion 11748 (sgugger)
- Fix incorrect newline in 11650 11757 (oToToT)
- Add more subsections to main doc 11758 (patrickvonplaten)
- Fixed: Better names for nlp variables in pipelines' tests and docs. 11752 (01-vyom)
- add `dataset_name` to data_args and added accuracy metric 11760 (philschmid)
- Add Flax Examples and Cloud TPU README 11753 (avital)
- Fix a bug in summarization example which did not load model from config properly 11762 (tomy0000000)
- FlaxGPT2 11556 (patil-suraj)
- Fix usage of head masks by PT encoder-decoder models' `generate()` function 11621 (stancld)
- [T5 failing CI] Fix generate test 11770 (patrickvonplaten)
- [Flax MLM] Refactor run mlm with optax 11745 (patrickvonplaten)
- Add DOI badge to README 11771 (albertvillanova)
- Deprecate commands from the transformers-cli that are in the hf-cli 11779 (LysandreJik)
- Fix release utilpattern in conf.py 11784 (sgugger)
- Fix regression in regression 11785 (sgugger)
- A cleaner and more scalable implementation of symbolic tracing 11763 (michaelbenayoun)
- Fix failing test on Windows Platform 11589 (Lynx1820)
- [Flax] Align GLUE training script with mlm training script 11778 (patrickvonplaten)
- Patch recursive import 11812 (LysandreJik)
- fix roformer config doc 11813 (JunnYu)
- [Flax] Small fixes in `run_flax_glue.py` 11820 (patrickvonplaten)
- [Deepspeed] support `zero.Init` in `from_config` 11805 (stas00)
- Add flax text class colab 11824 (patrickvonplaten)
- Faster list concat for trainer_pt_utils.get_length_grouped_indices() 11825 (ctheodoris)
- Replace double occurrences as the last step 11367 (LysandreJik)
- [Flax] Fix PyTorch import error 11839 (patrickvonplaten)
- Fix reference to XLNet 11846 (sgugger)
- Switch mem metrics flag 11851 (sgugger)
- Fix flos single node 11844 (TevenLeScao)
- Fix two typos in docs 11852 (nickls)
- [Trainer] Report both steps and num samples per second 11818 (sgugger)
- Add some tests to the slow suite 11860 (LysandreJik)
- Enable memory metrics in tests that need it 11859 (LysandreJik)
- fixed a small typo in the CONTRIBUTING doc 11856 (stsuchi)
- typo 11858 (WrRan)
- Add option to log only once in multinode training 11819 (sgugger)
- [Wav2Vec2] SpecAugment Fast 11764 (patrickvonplaten)
- [lm examples] fix overflow in perplexity calc 11855 (stas00)
- [Examples] create model with custom config on the fly 11798 (stas00)
- [Wav2Vec2ForCTC] example typo fixed 11878 (madprogramer)
- [AutomaticSpeechRecognitionPipeline] Ensure input tensors are on device 11874 (francescorubbo)
- Fix usage of head masks by TF encoder-decoder models' `generate()` function 11775 (stancld)
- Correcting comments in T5Stack to reflect correct tuple order 11330 (talkhaldi)
- [Flax] Allow dataclasses to be jitted 11886 (patrickvonplaten)
- changing find_batch_size to work with tokenizer outputs 11890 (joerenner)
- Link official Cloud TPU JAX docs 11892 (avital)
- Flax Generate 11777 (patrickvonplaten)
- Update deepspeed config to reflect hyperparameter search parameters 11896 (Mindful)
- Adding new argument `max_new_tokens` for generate. 11476 (Narsil)
- Added Sequence Classification class in GPTNeo 11906 (bhadreshpsavani)
- [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 11918 (jayendra13)
- Test optuna and ray 11924 (LysandreJik)
- Use `self.assertEqual` instead of `assert` in deberta v2 test. 11935 (PhilipMay)
- Remove redundant `nn.log_softmax` in `run_flax_glue.py` 11920 (n2cholas)
- Add MT5ForConditionalGeneration as supported arch. to summarization README 11961 (PhilipMay)
- Add FlaxCLIP 11883 (patil-suraj)
- RAG-2nd2end-revamp 11893 (shamanez)
- modify qa-trainer 11872 (zhangfanTJU)
- get_ordinal(local=True) replaced with get_local_ordinal() in training_args.py 11922 (BassaniRiccardo)
- reinitialize wandb config for each hyperparameter search run 11945 (Mindful)
- Add regression tests for slow sentencepiece tokenizers. 11737 (PhilipMay)
- Authorize args when instantiating an AutoModel 11956 (LysandreJik)
- Neptune.ai integration 11937 (vbyno)
- [deepspeed] docs 11940 (stas00)
- typo correction 11973 (JminJ)
- Typo in usage example, changed to device instead of torch_device 11979 (albertovilla)
- [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` 11966 (stas00)
- [Trainer] add train loss and flops metrics reports 11980 (stas00)
- Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert 11983 (dependabot[bot])
- [RAG] Fix rag from pretrained question encoder generator behavior 11962 (patrickvonplaten)
- Fix examples in VisualBERT docs 11990 (gchhablani)
- [docs] fix xref to `PreTrainedModel.generate` 11049 (stas00)
- Update return introduction of `forward` method 11976 (kouyk)
- [deepspeed] Move code and doc into standalone files 11984 (stas00)
- [deepspeed] add nvme test skip rule 11997 (stas00)
- Fix weight decay masking in `run_flax_glue.py` 11964 (n2cholas)
- [Flax] Refactor MLM 12013 (patrickvonplaten)
- [Deepspeed] Assert on mismatches between ds and hf args 12021 (stas00)
- [TrainerArguments] format and sort __repr__, add __str__ 12018 (stas00)
- Fixed Typo in modeling_bart.py 12035 (ceevaaa)
- Fix deberta 2 Tokenizer Integration Test 12017 (PhilipMay)
- fix past_key_values docs 12049 (patil-suraj)
- [JAX] Bump jax lib 12053 (patrickvonplaten)
- Fixes bug that appears when using QA bert and distilation. 12026 (madlag)
- Extend pipelines for automodel tupels 12025 (Narsil)
- Add optional grouped parsers description to HfArgumentParser 12042 (peteriz)
- adds metric prefix. 12057 (riklopfer)
- [CI] skip failing test 12059 (stas00)
- Fix LUKE integration tests 12066 (NielsRogge)
- Fix tapas issue 12063 (NielsRogge)
- updated the original RAG implementation to be compatible with latest Pytorch-Lightning 11806 (shamanez)
- Replace legacy tensor.Tensor with torch.tensor/torch.empty 12027 (mariosasko)
- Add torch to requirements.txt in language-modeling 12040 (cdleong)
- Properly indent block_size 12070 (sgugger)
- [Deepspeed] various fixes 12058 (stas00)
- [Deepspeed Wav2vec2] integration 11638 (stas00)
- Update run_ner.py with id2label config 12001 (KoichiYasuoka)
- [wav2vec2 / Deepspeed] sync LayerDrop for Wav2Vec2Encoder + tests 12076 (stas00)
- [test] support more than 2 gpus 12074 (stas00)
- Wav2Vec2 Pretraining 11306 (anton-l)
- [examples/flax] pass decay_mask fn to optimizer 12087 (patil-suraj)
- [versions] rm require_version_examples 12088 (stas00)
- [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests 12089 (patrickvonplaten)
- Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args 12083 (kumapo)
- CLIPFeatureExtractor should resize images with kept aspect ratio 11994 (TobiasNorlund)
- New TF GLUE example 12028 (Rocketknight1)
- Appending label2id and id2label to models for inference 12102 (Rocketknight1)
- Fix a condition in test_generate_with_head_masking 11911 (stancld)
- [Flax] Adding Visual-Transformer 11951 (jayendra13)
- add relevant description to tqdm in examples 11927 (bhavitvyamalik)
- Fix head masking generate tests 12110 (patrickvonplaten)
- Flax CLM script 12023 (patil-suraj)
- Add from_pretrained to dummy timm objects 12097 (LysandreJik)
- Fix t5 error message 12136 (cccntu)
- Fix megatron_gpt2 attention block's causal mask 12007 (novatig)
- Add mlm pretraining xla torch readme 12011 (patrickvonplaten)
- add readme for flax clm 12111 (patil-suraj)
- [Flax] Add FlaxBart models 11537 (stancld)
- Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer 11810 (SaulLu)
- [Flax] Add links to google colabs 12146 (patrickvonplaten)
- Don't log anything before logging is setup in examples 12121 (sgugger)
- Use text_column_name variable instead of "text" 12132 (nbroad1881)
- [lm examples] Replicate --config_overrides addition to other LM examples 12135 (kumar-abhishek)
- [Flax] fix error message 12148 (patil-suraj)
- [optim] implement AdafactorSchedule 12123 (stas00)
- [style] consistent nn. and nn.functional 12124 (stas00)
- [Flax] Fix flax pt equivalence tests 12154 (patrickvonplaten)
- [style] consistent nn. and nn.functional: part2: templates 12153 (stas00)
- Flax Big Bird 11967 (vasudevgupta7)
- [style] consistent nn. and nn.functional: part 3 `tests` 12155 (stas00)
- [style] consistent nn. and nn.functional: part 4 `examples` 12156 (stas00)
- consistent nn. and nn.functional: part 5 docs 12161 (stas00)
- [Flax generate] Add params to generate 12171 (patrickvonplaten)
- Use a released version of optax rather than installing from Git. 12173 (avital)
- Have dummy processors have a `from_pretrained` method 12145 (LysandreJik)
- Add course banner 12157 (sgugger)
- Enable add_prefix_space on run_ner if necessary 12116 (kumapo)
- Update AutoModel classes in summarization example 12178 (ionicsolutions)
- Ray Tune Integration Updates 12134 (amogkam)
- [testing] ensure concurrent pytest workers use a unique port for torch.dist 12166 (stas00)
- Model card defaults 12122 (sgugger)
- Temporarily deactivate torch-scatter while we wait for new release 12181 (LysandreJik)
- Temporarily deactivate torchhub test 12184 (sgugger)
- [Flax] Add Beam Search 12131 (patrickvonplaten)
- updated DLC images and sample notebooks 12191 (philschmid)
- Enabling AutoTokenizer for HubertConfig. 12198 (Narsil)
- Use yaml to create metadata 12185 (sgugger)
- [Docs] fixed broken link 12205 (bhadreshpsavani)
- Pipeline update & tests 12207 (LysandreJik)

4.6.1

- Fix regression in models for sequence classification used for regression tasks 11785
- Fix checkpoint deletion when `load_best_model_at_end=True` 11748
- Fix evaluation in question answering examples 11746
- Fix release utils 11784

4.6.0

Transformers aren't just for text - they can handle a huge range of input types, and there's been a flurry of papers and new models in the last few months applying them to vision tasks that had traditionally been dominated by convolutional networks. With this release, we're delighted to announce that several state-of-the-art pretrained vision and multimodal text+vision transformer models are now accessible in the huggingface/transformers repo. Give them a try!

ViT (NielsRogge)

Two new models are released as part of the ViT implementation: `ViTModel` and `ViTForImageClassification`, in PyTorch.

ViT is a Transformer-based model that obtains state-of-the-art results on image classification tasks. The corresponding paper was the first to successfully train a Transformer encoder on ImageNet, attaining very good results compared to familiar convolutional architectures.

The Vision Transformer (ViT) model was proposed in [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=vit
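
A hedged sketch of image classification with the new classes (the checkpoint name and image URL are just examples):

```python
import requests
import torch
from PIL import Image
from transformers import ViTFeatureExtractor, ViTForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```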

DeiT (NielsRogge)

Three new models are released as part of the DeiT implementation: `DeiTModel`, `DeiTForImageClassification` and `DeiTForImageClassificationWithTeacher`, in PyTorch.

DeiT is an image transformer model similar to ViT. DeiT (data-efficient image transformers) models are more efficiently trained transformers for image classification, requiring far less data and far fewer computing resources than the original ViT models.

The DeiT model was proposed in [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deit

- Add DeiT (PyTorch) 11056 (NielsRogge)
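
Usage mirrors ViT; a hedged sketch with the distilled classification head (the checkpoint name and image URL are just examples):

```python
import requests
import torch
from PIL import Image
from transformers import DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
model = DeiTForImageClassificationWithTeacher.from_pretrained("facebook/deit-base-distilled-patch16-224")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # averages the class-head and distillation-head predictions

print(model.config.id2label[logits.argmax(-1).item()])
```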

CLIP (patil-suraj)

Three new models are released as part of the CLIP implementation: `CLIPModel`, `CLIPVisionModel` and `CLIPTextModel`, in PyTorch.

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and GPT-3.

The CLIP model was proposed in [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=clip

- CLIP 11445 (patil-suraj)
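
A hedged sketch of zero-shot image classification with CLIP (the checkpoint name, candidate captions and image URL are just examples):

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

inputs = processor(text=["a photo of a cat", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity as probabilities
print(probs)
```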

BigBirdPegasus (vasudevgupta7)

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it has been shown that applying sparse, global, and random attention approximates full attention while being computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context, BigBird has shown improved performance on various long document NLP tasks, such as question answering and summarization, compared to BERT or RoBERTa.

The BigBird model was proposed in [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others.

- Add BigBirdPegasus 10991 (vasudevgupta7)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=bigbird_pegasus
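
A hedged sketch of long-document summarization (the checkpoint name, the repeated dummy document and the truncation length are just examples):

```python
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

long_document = "Transformers have become the dominant architecture for NLP. " * 200  # dummy input
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=4096)

summary_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```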

LUKE (NielsRogge, ikuyamada)

LUKE is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps improve performance on various downstream tasks involving reasoning about entities such as named entity recognition, extractive and cloze-style question answering, entity typing, and relation classification.

The LUKE model was proposed in [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto.

- Add LUKE 11223 (NielsRogge, ikuyamada)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=luke
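
A hedged sketch of computing entity-aware representations (the checkpoint name and entity span are examples; `entity_spans` are character offsets into the input text):

```python
from transformers import LukeModel, LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7)]  # character span covering "Beyoncé"

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)

print(outputs.last_hidden_state.shape)         # contextualized word token representations
print(outputs.entity_last_hidden_state.shape)  # one vector per provided entity span
```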

Megatron (jdemouth)

The MegatronBERT model is added to the library, giving access to the 345M-parameter variants.

Its implementation comes with nine different models: `MegatronBertModel`, `MegatronBertForMaskedLM`, `MegatronBertForCausalLM`, `MegatronBertForNextSentencePrediction`, `MegatronBertForPreTraining`, `MegatronBertForSequenceClassification`, `MegatronBertForMultipleChoice`, `MegatronBertForTokenClassification`, `MegatronBertForQuestionAnswering`, in PyTorch.

The MegatronBERT model was proposed in [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.

- Add nvidia megatron models 10911 (jdemouth)
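
The pretrained checkpoints are distributed by NVIDIA and have to be converted before they can be loaded; below is a hedged sketch, assuming a hypothetical local folder produced by the conversion script:

```python
from transformers import BertTokenizer, MegatronBertModel

# Hypothetical path to a checkpoint converted with the provided conversion script.
checkpoint_dir = "./megatron-bert-cased-345m"

tokenizer = BertTokenizer.from_pretrained(checkpoint_dir)  # MegatronBERT reuses the BERT tokenizer
model = MegatronBertModel.from_pretrained(checkpoint_dir)

inputs = tokenizer("Megatron-LM scales BERT to billions of parameters.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```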

Hub integration in Transformers

The Hugging Face Hub integrates better within `transformers`, through two newly added features:
- Models, configurations and tokenizers now have a `push_to_hub` method to automatically push their state to the hub.
- The `Trainer` can now automatically push its underlying model, configuration and tokenizer in a similar fashion. Additionally, it is able to create a draft model card on the fly, with the training hyperparameters and evaluation results.

- Auto modelcard 11599 (sgugger)
- Trainer push to hub 11328 (sgugger)

DeepSpeed ZeRO Stage 3 & ZeRO-Infinity

The `Trainer` now integrates two additional stages of ZeRO: ZeRO Stage 3 for parameter partitioning, and ZeRO-Infinity, which extends CPU offload with NVMe offload.

- [DeepSpeed] ZeRO Stage 3 10753 (stas00) [release notes](https://github.com/huggingface/transformers/issues/11044)
- [Deepspeed] ZeRO-Infinity integration plus config revamp 11418 (stas00) [release notes](https://github.com/huggingface/transformers/issues/11464)
- Please read both release notes for the configuration file changes
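
As a hedged sketch, enabling the new stages only requires pointing `TrainingArguments` at a DeepSpeed config; the minimal config below is purely illustrative, and real setups should follow the linked release notes:

```python
import json
from transformers import TrainingArguments

# Illustrative ZeRO Stage 3 config; ZeRO-Infinity additionally configures CPU/NVMe offload here.
ds_config = {
    "zero_optimization": {"stage": 3},
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
with open("ds_config_zero3.json", "w") as f:
    json.dump(ds_config, f)

args = TrainingArguments(output_dir="output", deepspeed="ds_config_zero3.json", fp16=True)
# `args` is then passed to `Trainer` as usual and the script is launched with the `deepspeed` launcher.
```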


Flax

Flax support is getting more robust, with model code stabilizing and new models being added to the library.

- [FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py 11470 (patrickvonplaten)
- [Flax] Add Electra models 11426 (CoderPat)
- Adds Flax BERT finetuning example on GLUE 11564 (marcvanzee)

TensorFlow

We welcome Rocketknight1 as a TensorFlow contributor. This version includes a brand new TensorFlow example based on Keras, which will be followed by examples covering most tasks.
Additionally, more TensorFlow setups are covered by adding support for AMD-based GPUs and M1 Macs.

- Merge new TF example script 11360 (Rocketknight1)
- Update TF text classification example 11496 (Rocketknight1)
- run_text_classification.py fix 11660 (Rocketknight1)
- Accept tensorflow-rocm package when checking TF availability 11595 (mvsjober)
- Add MacOS TF version 11674 (jplu)

Pipelines

Two new pipelines are added:

- Adding `AutomaticSpeechRecognitionPipeline`. 11337 (Narsil)
- Add the ImageClassificationPipeline 11598 (LysandreJik)
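
A hedged sketch of the two new pipelines (the checkpoint names, the image URL and the audio file path are just examples; the ASR pipeline decodes audio files with ffmpeg and also accepts raw waveform arrays):

```python
from transformers import pipeline

# Image classification: accepts a local path or URL to an image.
image_classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
print(image_classifier("http://images.cocodataset.org/val2017/000000039769.jpg"))

# Automatic speech recognition: "sample.flac" is a hypothetical local audio file.
asr = pipeline("automatic-speech-recognition",
               model="facebook/wav2vec2-base-960h",
               feature_extractor="facebook/wav2vec2-base-960h")
print(asr("sample.flac"))
```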

Notebooks

- [Community notebooks] Add Wav2Vec notebook for creating captions for YT Clips 11142 (Muennighoff)
- add bigbird-pegasus evaluation notebook 11654 (vasudevgupta7)
- Vit notebooks + vit/deit fixes 11309 (NielsRogge)

General improvements and bugfixes

- [doc] gpt-neo 11098 (stas00)
- Auto feature extractor 11097 (sgugger)
- accelerate question answering examples with no trainer 11091 (theainerd)
- dead link fixed 11103 (cronoik)
- GPTNeo: handle padded wte (11078) 11079 (leogao2)
- fix: The 'warn' method is deprecated 11105 (stas00)
- [examples] fix white space 11099 (stas00)
- Dummies multi backend 11100 (sgugger)
- Some styling of the training table in Notebooks 11118 (sgugger)
- Adds a note to resize the token embedding matrix when adding special … 11120 (LysandreJik)
- [BigBird] fix bigbird slow tests 11109 (vasudevgupta7)
- [versions] handle version requirement ranges 11110 (stas00)
- Adds use_auth_token with pipelines 11123 (philschmid)
- Fix and refactor check_repo 11127 (sgugger)
- Fix typing error in Trainer class (prediction_step) 11138 (jannisborn)
- Typo fix of the name of BertLMHeadModel in BERT doc 11133 (forest1988)
- [run_clm] clarify why we get the tokenizer warning on long input 11145 (stas00)
- [trainer] solve "scheduler before optimizer step" warning 11144 (stas00)
- Add fairscale and deepspeed back to the CI 11147 (LysandreJik)
- Updates SageMaker docs for updating DLCs 11140 (philschmid)
- Don't duplicate logs in TensorBoard and handle --use_env 11141 (sgugger)
- Run mlm pad to multiple for fp16 11128 (ak314)
- [tests] relocate core integration tests 11146 (stas00)
- [setup] extras[docs] must include 'all' 11148 (stas00)
- Add support for multiple models for one config in auto classes 11150 (sgugger)
- [setup] make fairscale and deepspeed setup extras 11151 (stas00)
- typo 11152 (stas00)
- Fix LogitsProcessor documentation 11130 (k-tahiro)
- Correct typographical error in README.md 11161 (Seyviour)
- Make `get_special_tokens_mask` consider all tokens 11163 (sgugger)
- Add a special tokenizer for CPM model 11068 (JetRunner)
- [examples/translation] support mBART-50 and M2M100 fine-tuning 11170 (patil-suraj)
- [examples run_clm] fix _LazyModule hasher error 11168 (stas00)
- added json dump and extraction of train run time 11167 (philschmid)
- Minor typos fixed 11182 (cronoik)
- model_path should be ignored as the checkpoint path 11157 (tsuchm)
- Added documentation for data collator. 10941 (fghuman)
- Fix typo 11188 (tma15)
- Replaced `which` with `who` 11183 (cronoik)
- Import torch.utils.checkpoint in ProphetNet 11214 (LysandreJik)
- Sagemaker test docs update for framework upgrade 11206 (philschmid)
- Use MSELoss with single class label in (M)BartForSequenceClassification 11178 (calpt)
- wav2vec2 converter: create the proper vocab.json while converting fairseq wav2vec2 finetuned model 11041 (cceyda)
- Add Matt as the TensorFlow reference 11212 (LysandreJik)
- Fix GPT-2 warnings 11213 (LysandreJik)
- fix docs for decoder_input_ids 11221 (patil-suraj)
- Add documentation for BertJapanese 11219 (forest1988)
- Replace error by warning when loading an architecture in another 11207 (sgugger)
- Refactor GPT2 11225 (patil-suraj)
- Doc check: a bit of clean up 11224 (sgugger)
- added cache_dir=model_args.cache_dir to all example with cache_dir arg 11220 (philschmid)
- Avoid using no_sync on SageMaker DP 11229 (sgugger)
- Indent code block in the documentation 11233 (sgugger)
- Run CI on deepspeed and fairscale 11172 (LysandreJik)
- [Deepspeed] zero3 tests band aid 11235 (stas00)
- Wav2Vec2 CommonVoice training - Save the processor before training starts 10910 (Nithin-Holla)
- Make "embeddings" plural in warning message within tokenization_utils_base 11228 (jstremme)
- Stale bot updated 10562 (LysandreJik)
- Close open files to suppress ResourceWarning 11240 (parakalan)
- Fix dimention misspellings. 11238 (odellus)
- Add prefix to examples in model_doc rst 11226 (forest1988)
- [troubleshooting] add 2 points of reference to the offline mode 11236 (stas00)
- Fix 10128 11248 (sgugger)
- [deepspeed] test on one node 2 gpus max 11237 (stas00)
- Trainer iterable dataset 11254 (sgugger)
- Adding pipeline task aliases. 11247 (Narsil)
- Support for set_epoch in IterableDataset 11258 (sgugger)
- Tokenizer fast save 11234 (sgugger)
- update dependency_versions_table 11273 (stas00)
- Workflow fixes 11270 (LysandreJik)
- Enabling multilingual models for translation pipelines. 10536 (Narsil)
- Trainer support for IterableDataset for evaluation and predict 11286 (sgugger)
- move device statements outside if statements 11292 (e-yi)
- modify double considering special tokens in `language_modeling.py` 11275 (taepd)
- [Trainer] fix the placement on device with fp16_full_eval 11322 (stas00)
- [Trainer] Add a progress bar for batches skipped 11324 (sgugger)
- Load checkpoint without re-creating the model 11318 (sgugger)
- Added translation example script 11196 (rajvi-k)
- [Generate] Remove outdated code 11331 (patrickvonplaten)
- [GPTNeo] create local attention mask ones 11335 (patil-suraj)
- Update to use datasets remove_cloumns method 11343 (sgugger)
- Add an error message for Reformer w/ .backward() 11117 (forest1988)
- Removed `max_length` from being mandatory within `generate`. 11314 (Narsil)
- Honor contributors to models 11329 (sgugger)
- [deepspeed] fix resume from checkpoint 11352 (stas00)
- Examples reorg 11350 (sgugger)
- Extract metric_key_prefix during NotebookProgressCallback.on_evaluate 11347 (lewtun)
- [testing doc] bring doc up to date 11359 (stas00)
- Remove boiler plate code 11340 (patrickvonplaten)
- Move old TF text classification script to legacy 11361 (Rocketknight1)
- [contributing doc] explain/link to good first issue 11346 (stas00)
- Fix token_type_ids error for big_bird model. 11355 (wlhgtc)
- [Wav2Vec2] Fix special tokens for Wav2Vec2 tokenizer 11349 (patrickvonplaten)
- [Flax] Correct typo 11374 (patrickvonplaten)
- [run_translation.py] fix typo 11372 (johnson7788)
- Add space 11373 (tma15)
- Correctly cast num_train_epochs to int 11379 (Rocketknight1)
- Fix typo 11369 (penut85420)
- Fix Trainer with remove_unused_columns=False 11382 (sgugger)
- [Flax] Big FlaxBert Refactor 11364 (patrickvonplaten)
- [Flax] Typo 11393 (patrickvonplaten)
- [Flax] Correct Flax <=> PyTorch conversion 11394 (patrickvonplaten)
- Fix small typo in text 11396 (maksym-del)
- Fix typos in README for text-classification 11391 (yoshitomo-matsubara)
- [Blenderbot] Integration Test should be slow 11395 (patrickvonplaten)
- Fixed trainer total_flos relaoding in distributed mode 11383 (TevenLeScao)
- [Wav2Vec2] Correct conversion script 11400 (patrickvonplaten)
- added support for exporting of T5 models to onnx with past_key_values. 10651 (Ki6an)
- Fixing bug in generation 11297 (nicola-decao)
- Fix cross-attention head mask for Torch encoder-decoder models 10605 (stancld)
- Default to accuracy metric in run_glue_no_trainer 11405 (sgugger)
- Enable option for subword regularization in `XLMRobertaTokenizer` 11149 (PhilipMay)
- wrong parentclass in documentation 11410 (cronoik)
- EncoderDecoderConfigs should not create new objects 11300 (cronoik)
- Updating checkpoint for GPT2ForSequenceClassification 11334 11434 (abiolaTresor)
- [BigBird] enable BigBirdForQuestionAnswering to return pooler output 11439 (vasudevgupta7)
- Upgrade Black to version 21.4b0 11442 (patrickvonplaten)
- TF BART models - Add `cross_attentions` to model output and fix cross-attention head masking 10699 (stancld)
- Add basic support for FP16 in SageMaker model parallelism 11407 (sgugger)
- Fix link to the TPU launcher script in the pytorch examples 11427 (amineabdaoui)
- Typo fixes 11432 (LSinev)
- Pass along seed to DistributedSampler 11406 (sgugger)
- Clarify description of the is_split_into_words argument 11449 (kstathou)
- [docs] fix invalid class name 11438 (stas00)
- [Makefile] make sure to test against the local checkout 11437 (stas00)
- Give each hub test a different repo name 11453 (sgugger)
- [Examples] Fixes inconsistency around eval vs val and predict vs test 11380 (bhadreshpsavani)
- Variable Correction for Consistency in Distillation Example 11444 (jaimeenahn)
- Remove max length beam scorer 11378 (GeetDsa)
- update QuickTour docs to reflect model output object 11462 (hamelsmu)
- Finish Making Quick Tour respect the model object 11467 (hamelsmu)
- fix docs for decoder_input_ids 11466 (patil-suraj)
- Update min versions in README and add Flax 11472 (sgugger)
- Update `PreTrainedTokenizerBase` to check/handle batch length for `text_pair` parameter 11486 (hamelsmu)
- [Docs] remove paragraph on CI from installation instructions 11493 (hamelsmu)
- [Flax] Add docstrings & model outputs 11498 (patrickvonplaten)
- Reformat to make code clearer in tokenizer call 11497 (sgugger)
- solved coefficient issue for the TF version of gelu_fast 11514 (michaelbenayoun)
- Split checkpoint from model_name_or_path in examples 11492 (sgugger)
- Pin HuggingFace Hub dependency 11502 (LysandreJik)
- correct incorrect dimension comment in Longformer model 11494 (fredo838)
- Fix `sp_model_kwargs` param missing at unpickle in `XLMRobertaTokenizer` 11430 (PhilipMay)
- [Master] Make style 11520 (patrickvonplaten)
- Update README.md 11489 (mrm8488)
- T5 Gradient Checkpointing 11353 (ceshine)
- Implement Fast Tokenization for Deberta 11387 (ShubhamSanghvi)
- Accepts BatchEncoding in LengthGroupedSampler 11431 (tma15)
- Fix do_eval default value in training_args.py 11511 (bonniehyeon)
- [examples, translation/summerization] resize token embeds 11524 (patil-suraj)
- Run model templates on master 11527 (LysandreJik)
- [Examples] Added support for test-file in QA examples with no trainer 11510 (bhadreshpsavani)
- Add Stas and Suraj as authors 11526 (sgugger)
- Improve task summary docs 11513 (hamelsmu)
- [debug utils] activation/weights underflow/overflow detector 11274 (stas00)
- [DeepSpeed] fp32 support 11499 (stas00)
- Fix examples in M2M100 docstrings 11540 (lewtun)
- [Flax BERT/Roberta] few small fixes 11558 (patil-suraj)
- [Wav2Vec2] Fix convert 11562 (patrickvonplaten)
- Remove `datasets` submodule. 11563 (LysandreJik)
- fix the mlm longformer example by changing [MASK] to <mask> 11559 (fredo838)
- [Wav2vec2] Fixed tokenization mistakes while adding single-char tokens to tokenizer 11538 (Muktan)
- Fix metric computation in `run_glue_no_trainer` 11569 (sgugger)
- Fixes a useless warning in `generate`. 11566 (Narsil)
- Fix checkpointing in SageMaker MP 11481 (sgugger)
- Update training tutorial 11533 (sgugger)
- [Deepspeed] fix resize_token_embeddings 11572 (stas00)
- Add multi-class, multi-label and regression to transformers 11012 (abhi1thakur)
- add importlib_metadata as dependency as it is required for py<3.8 11490 (cdeepali)
- Enable added tokens 11325 (LysandreJik)
- Make quality scripts work when one backend is missing. 11573 (sgugger)
- Removes SageMakerTrainer code but keeps class as wrapper 11587 (philschmid)
- Reproducible checkpoint 11582 (sgugger)
- [trainer] document resume randomness 11588 (stas00)
- [template runner CI] copies need to be fixed too 11585 (stas00)
- add importlib_metadata and huggingface_hub as dependency in the conda recipe 11591 (cdeepali)
- Pytorch - Lazy initialization of models 11471 (patrickvonplaten)
- fix head_mask for albert encoder part(`AlbertTransformer`) 11596 (baeseongsu)
- Fix Python version 11607 (LysandreJik)
- fix typo in command 11605 (vipulraheja)
- Fix typo in docstring 11611 (eldarkurtic)
- Re-styling in seq2seq attention 11613 (sgugger)
- [Lazy init] Fix edge cases 11615 (patrickvonplaten)
- [cuda ext tests] fixing tests 11619 (stas00)
- Fix RNG saves in distributed mode. 11620 (sgugger)
- Fix comment in run_clm_no_trainer.py 11624 (cccntu)
- make fix copy 11627 (patrickvonplaten)
- Reduce to 1 worker and set timeout for GPU TF tests 11633 (LysandreJik)
- [self-push CI] sync with self-scheduled 11637 (stas00)
- [examples] fix sys.path in conftest.py 11636 (stas00)
- [Examples] Check key exists in datasets first 11503 (oToToT)
- [Examples] Fix invalid links after reorg 11650 (oToToT)
- Update code example 11631 (NielsRogge)
- Add missing git dependency for RAG example 11634 (lhoestq)
- updated user permissions based on umask 11119 (bhavitvyamalik)
- Big Bird Fast Tokenizer implementation 11075 (tanmaylaud)
- Save scaler state dict when checkpointing 11663 (sgugger)
- [BigBird Pegasus] Add config to auto tokenizer 11667 (patrickvonplaten)
- Fixes NoneType exception when topk is larger than one coupled with a small context in the Question-Answering pipeline 11628 (psorianom)
- Add --text_column to run_summarization_no_trainer 11673 (cccntu)
- Fix docstring of description about input_ids 11672 (nxznm)
- Grammar and style edits for the frontpage README 11679 (Rocketknight1)
- Fix TF Roberta for mixed precision training 11675 (jplu)
- Test checkpointing 11682 (sgugger)
- Fix clip docs 11694 (patil-suraj)
- [Flax] Updates README and fixes bug 11701 (marcvanzee)
- remove defaults to None if optional 11703 (PhilipMay)
