Wav2Vec2 from facebook (patrickvonplaten)
Two new models are released as part of the Wav2Vec2 implementation: `Wav2Vec2Model` and `Wav2Vec2ForMaskedLM`, in PyTorch.
Wav2Vec2 is a multi-modal model, combining speech and text. It's the first multi-modal model of its kind to be added to Transformers.
The Wav2Vec2 model was proposed in [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=wav2vec2
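A minimal usage sketch for speech recognition, assuming the `facebook/wav2vec2-base-960h` checkpoint from the Hub filter above and a local 16kHz mono recording (the file name is a placeholder):

```python
import soundfile as sf
import torch
from transformers import Wav2Vec2Tokenizer, Wav2Vec2ForMaskedLM

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForMaskedLM.from_pretrained("facebook/wav2vec2-base-960h")

# Load a 16kHz mono waveform as a 1-D float array (file name is hypothetical).
speech, sampling_rate = sf.read("sample.wav")

# The tokenizer turns the raw waveform into model inputs.
input_values = tokenizer(speech, return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits

# Greedy decoding of the frame-level predictions into a transcription.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = tokenizer.batch_decode(predicted_ids)[0]
```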
Available notebooks:
- https://colab.research.google.com/drive/18Ms6WjyjpsL-73Y2Vpagh9WdJvVEIoQL?usp=sharing
Contributions:
- Wav2Vec2 9659 (patrickvonplaten)
Future Additions
- Enable fine-tuning and pretraining for Wav2Vec2
- Add example script with dependency to wav2letter/flashlight
- Add Encoder-Decoder Wav2Vec2 model
ConvBERT
The ConvBERT model was proposed in [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
Six new models are released as part of the ConvBERT implementation: `ConvBertModel`, `ConvBertForMaskedLM`, `ConvBertForSequenceClassification`, `ConvBertForTokenClassification`, `ConvBertForQuestionAnswering` and `ConvBertForMultipleChoice`. These models are available both in PyTorch and TensorFlow.
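A short, hedged sketch of loading one of the new classes; the `YituTech/conv-bert-base` checkpoint name is an assumption and not part of these notes:

```python
import torch
from transformers import ConvBertTokenizer, ConvBertModel

# Checkpoint name assumed for illustration.
tokenizer = ConvBertTokenizer.from_pretrained("YituTech/conv-bert-base")
model = ConvBertModel.from_pretrained("YituTech/conv-bert-base")

inputs = tokenizer("ConvBERT improves BERT with span-based dynamic convolution.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state  # contextual token representations
```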
Contributions:
- ConvBERT Model 9717 (abhishekkrthakur)
- ConvBERT: minor fixes for conversion script 9937 (stefan-it)
- Fix GroupedLinearLayer in TF ConvBERT 9972 (abhishekkrthakur)
BORT
The BORT model was proposed in [Optimal Subarchitecture Extraction for BERT](https://arxiv.org/abs/2010.10499) by Amazon's Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for the BERT architecture, which the authors refer to as “Bort”.
The BORT model can be loaded directly into the BERT architecture, so all BERT model heads are available for BORT.
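Since BORT shares the BERT architecture, it can be loaded with the existing BERT classes. A hedged sketch, in which the `amazon/bort` checkpoint name is an assumption:

```python
from transformers import BertModel, BertForSequenceClassification

# Base model loaded into the plain BERT architecture (checkpoint name assumed).
backbone = BertModel.from_pretrained("amazon/bort")

# Any BERT head can be used; the classification head below is newly initialized
# and would still need fine-tuning.
classifier = BertForSequenceClassification.from_pretrained("amazon/bort", num_labels=2)
```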
Contributions:
- ADD BORT 9813 (stefan-it)
Trainer now supports Amazon SageMaker’s data parallel library (sgugger)
When running a script that uses `Trainer` on Amazon SageMaker with [SageMaker's data parallelism library](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html) enabled, `Trainer` automatically uses the `smdistributed` library. All maintained examples have been tested with this functionality. Here is an [overview of SageMaker's data parallelism library](https://aws.amazon.com/blogs/aws/managed-data-parallelism-in-amazon-sagemaker-simplifies-training-on-large-datasets/).
- When on SageMaker use their env variables for saves 9876 (sgugger)
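For context, a hedged sketch of how the data parallelism library is typically enabled from the SageMaker Python SDK; the script, IAM role, instance settings and framework versions below are illustrative assumptions, not taken from these notes:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="run_glue.py",                     # any maintained example script
    source_dir="./examples/text-classification",   # path assumed for illustration
    role="my-sagemaker-execution-role",            # hypothetical IAM role
    instance_type="ml.p3.16xlarge",                # an instance type supported by smdistributed
    instance_count=2,
    framework_version="1.6.0",
    py_version="py36",
    # Enabling SageMaker's data parallelism library; Trainer then picks up smdistributed.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"model_name_or_path": "bert-base-uncased", "do_train": True},
)
estimator.fit()
```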
Community page
A new Community Page has been added to the docs. It contains all the notebooks contributed by the community, as well as some community projects built around Transformers. Feel free to open a PR if you want your project to be showcased!
- Add a community page to the docs 9682 (sgugger)
Additional model architectures
DeBERTa now has more model heads available.
- Add DeBERTa head models 9691 (NielsRogge)
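A hedged sketch of loading one of the new DeBERTa heads, assuming the `microsoft/deberta-base` checkpoint:

```python
import torch
from transformers import DebertaTokenizer, DebertaForSequenceClassification

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
# The sequence-classification head is newly initialized and would need fine-tuning.
model = DebertaForSequenceClassification.from_pretrained("microsoft/deberta-base", num_labels=2)

inputs = tokenizer("DeBERTa now ships with task-specific heads.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
```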
BART, mBART, Marian, Pegasus and Blenderbot now have decoder-only model architectures. They can therefore be used in decoder-only settings.
- BartForCausalLM analogs to `ProphetNetForCausalLM` 9128 (sadakmed)
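A minimal sketch of the decoder-only usage, assuming the `facebook/bart-base` checkpoint (the same pattern applies to the other models listed above):

```python
import torch
from transformers import BartTokenizer, BartForCausalLM

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForCausalLM.from_pretrained("facebook/bart-base")  # decoder-only, used as a causal LM

inputs = tokenizer("Transformers v4.3.0 adds", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # next-token scores over the vocabulary
```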
Breaking changes
None.
General improvements and bugfixes
- Fix Trainer with a parallel model 9578 (sgugger)
- Switch metrics in run_ner to datasets 9567 (sgugger)
- Compliancy with tf-nightly 9570 (jplu)
- Make logs TF compliant 9565 (jplu)
- [setup.py] note on how to get to transformers exact dependencies from shell 9553 (stas00)
- Fix conda build 9589 (LysandreJik)
- BatchEncoding.to with device with tests 9584 (LysandreJik)
- Gradient accumulation for TFTrainer 9585 (kiyoungkim1)
- Upstream (and rename) sortish sampler 9574 (sgugger)
- [deepspeed doc] install issues + 1-gpu deployment 9582 (stas00)
- [TF Led] Fix wrong decoder attention mask behavior 9601 (patrickvonplaten)
- Remove unused token_type_ids in MPNet 9564 (jplu)
- Ignore lm_head decoder bias warning 9615 (LysandreJik)
- [deepspeed] --gradient_accumulation_steps fix 9622 (stas00)
- Remove duplicated extras["retrieval"] 9621 (n1t0)
- Fix: torch.utils.checkpoint.checkpoint attribute error. 9626 (devrimcavusoglu)
- Add head_mask/decoder_head_mask for BART 9569 (stancld)
- [Bart-like tests] Fix torch device for bart tests 9669 (patrickvonplaten)
- Fix DPRReaderTokenizer's attention_mask 9663 (mkserge)
- add mbart to automodel for masked lm 9673 (patrickvonplaten)
- Fix imports in conversion scripts 9674 (sgugger)
- Fix GPT conversion script 9676 (sgugger)
- Fix old Seq2SeqTrainer 9675 (sgugger)
- Update `past_key_values` in GPT-2 9596 (forest1988)
- Update integrations.py 9652 (max-yue)
- Fix TF Flaubert and XLM 9661 (jplu)
- New run_seq2seq script 9605 (sgugger)
- Add separated decoder_head_mask for T5 Models 9634 (stancld)
- Fix model templates and use less than 119 chars 9684 (sgugger)
- Restrain tokenizer.model_max_length default 9681 (sgugger)
- Speed up RepetitionPenaltyLogitsProcessor (pytorch) 9600 (LSinev)
- Use datasets squad_v2 metric in run_qa 9677 (sgugger)
- Fix label datatype in TF Trainer 9616 (jplu)
- New TF embeddings (cleaner and faster) 9418 (jplu)
- Fix TF template 9697 (jplu)
- Add t5 convert to transformers-cli 9654 (acul3)
- Fix Funnel Transformer conversion script 9683 (sgugger)
- Add notebook 9696 (NielsRogge)
- Fix Trainer and Args to mention AdamW, not Adam. 9685 (gchhablani)
- [deepspeed] fix the backward for deepspeed 9705 (stas00)
- Fix WAND_DISABLED test 9703 (sgugger)
- [trainer] no --deepspeed and --sharded_ddp together 9712 (stas00)
- fix typo 9708 (Muennighoff)
- Temporarily deactivate TPU tests while we work on fixing them 9720 (LysandreJik)
- Allow text generation for ProphetNetForCausalLM 9707 (guillaume-be)
- [LED] Reduce Slow Test required GPU RAM from 16GB to 8GB 9723 (patrickvonplaten)
- [T5] Fix T5 model parallel tests 9721 (patrickvonplaten)
- fix T5 head mask in model_parallel 9726 (patil-suraj)
- Fix mixed precision in TF models 9163 (jplu)
- Changing model default for TableQuestionAnsweringPipeline. 9729 (Narsil)
- Fix TF s2s models 9478 (jplu)
- Fix memory regression in Seq2Seq example 9713 (sgugger)
- examples: fix XNLI url 9741 (stefan-it)
- Fix some TF slow tests 9728 (jplu)
- Fixes to run_seq2seq and instructions 9734 (sgugger)
- Add `report_to` training arguments to control the integrations used 9735 (sgugger)
- Fix a TF test 9755 (jplu)
- [fsmt] token_type_ids isn't used 9736 (stas00)
- Fix broken [Open in Colab] links (9688) 9761 (wilcoln)
- Fix TFTrainer prediction output 9662 (janinaj)
- Use object store to pass trainer object to Ray Tune (makes it work with large models) 9749 (krfricke)
- Fix a typo in `Trainer.hyperparameter_search` docstring 9762 (sorami)
- [fsmt] onnx triu workaround 9738 (stas00)
- Fix model parallel definition in superclass 9787 (LysandreJik)
- Auto-resume training from checkpoint 9776 (sgugger)
- [PR/Issue templates] normalize, group, sort + add myself for deepspeed 9706 (stas00)
- [Flaky Generation Tests] Make sure that no early stopping is happening for beam search 9794 (patrickvonplaten)
- Fix broken links in the converting tf ckpt document 9791 (forest1988)
- Add head_mask/decoder_head_mask for TF BART models 9639 (stancld)
- Adding `skip_special_tokens=True` to FillMaskPipeline 9783 (Narsil)
- Improve pytorch examples for fp16 9796 (ak314)
- Smdistributed trainer 9798 (sgugger)
- RagTokenForGeneration: Fixed parameter name for logits_processor 9790 (michaelrglass)
- Fix fine-tuning translation scripts 9809 (mbiesialska)
- Allow RAG to output decoder cross-attentions 9789 (dblakely)
- Commit the last step on world_process_zero in WandbCallback 9805 (tristandeleu)
- Fix a bug in run_glue.py (9812) 9815 (forest1988)
- [LedFastTokenizer] Correct missing None statement 9828 (patrickvonplaten)
- [Setup.py] update jaxlib 9831 (patrickvonplaten)
- Add a test for TF mixed precision 9806 (jplu)
- Setup logging with a stdout handler 9816 (sgugger)
- Fix auto-resume training from checkpoint 9822 (jncasey)
- [MT5 Import init] Fix typo 9830 (patrickvonplaten)
- Adding a test to prevent late failure in the Table question answering pipeline. 9808 (Narsil)
- Remove a TF usage warning and rework the documentation 9756 (jplu)
- Delete a needless duplicate condition 9826 (tomohideshibata)
- Clean TF Bert 9788 (jplu)
- Add a flag for find_unused_parameters 9820 (sgugger)
- Fix TF template 9840 (jplu)
- Fix model templates 9842 (LysandreJik)
- Add tpu_zone and gcp_project in training_args_tf.py 9825 (kiyoungkim1)
- Labeled pull requests 9849 (LysandreJik)
- [GA forks] Test on every push 9851 (LysandreJik)
- When resuming training from checkpoint, Trainer loads model 9818 (sgugger)
- Allow --arg Value for booleans in HfArgumentParser 9823 (sgugger)
- [trainer] fix --lr_scheduler_type choices 9800 (stas00)
- Pin memory in Trainer by default 9857 (abhishekkrthakur)
- Partial local tokenizer load 9807 (LysandreJik)
- Remove submodule 9868 (LysandreJik)
- Fixing flaky conversational test + flag it as a pipeline test. 9837 (Narsil)
- Fix computation of attention_probs when head_mask is provided. 9853 (mfuntowicz)
- Deprecate model_path in Trainer.train 9854 (sgugger)
- Remove redundant `test_head_masking = True` flags in test files 9858 (stancld)
- [docs] expand install instructions 9817 (stas00)
- on_log event should occur *after* the current log is written 9872 (abhishekkrthakur)
- pin_memory -> dataloader_pin_memory 9874 (abhishekkrthakur)
- Adding a new `return_full_text` parameter to TextGenerationPipeline. 9852 (Narsil)
- Clarify use of unk_token in slow tokenizers' docstrings 9875 (ethch18)
- Add XLA test 9848 (jplu)
- [seq2seq] correctly handle mt5 9879 (stas00)
- [trainer] [deepspeed] refactor deepspeed setup devices 9880 (stas00)
- [doc] nested markup is invalid in rst 9898 (stas00)
- Clarify definition of seed argument in TrainingArguments 9903 (lewtun)
- TFBart labels consider both pad token and -100 9847 (kiyoungkim1)
- Add head_mask and decoder_head_mask to FSMT 9819 (stancld)
- Doc title in the template 9910 (sgugger)
- [seq2seq] fix logger format for non-main process 9911 (stas00)
- [wandb] restore WANDB_DISABLED=true to disable wandb 9896 (stas00)
- Fit chinese wwm to new datasets 9887 (wlhgtc)
- Remove subclass for sortish sampler 9907 (sgugger)
- AdaFactor: avoid updating group["lr"] attributes 9751 (ceshine)
- [docs] fix auto model docs 9924 (patil-suraj)
- Add new model docs 9667 (patrickvonplaten)
- Fix bart conversion script 9923 (patil-suraj)
- Tensorflow doc changes on loss output size 9922 (janjitse)
- [Tokenizer Utils Base] Make pad function more flexible 9928 (patrickvonplaten)
- [Bart models] fix typo in naming 9944 (patrickvonplaten)
- ALBERT Tokenizer integration test 9943 (LysandreJik)
- Fix 9918 9932 (sgugger)
- Bump numpy 9934 (sgugger)
- Use compute_loss in prediction_step 9935 (sgugger)
- Add head_mask and decoder_head_mask to PyTorch LED 9856 (stancld)
- [research proj] [lxmert] remove bleach dependency 9970 (stas00)
- Fix Longformer and LED 9942 (jplu)
- fix steps_in_epoch variable in trainer when using max_steps 9969 (yylun)
- [run_clm.py] fix getting extension 9977 (patil-suraj)
- Added integration tests for TensorFlow implementation of the ALBERT model 9976 (spatil6)
- TF DistilBERT integration tests 9975 (spatil6)
- Added integration tests for TensorFlow implementation of the mobileBERT 9978 (spatil6)
- Added integration tests for TensorFlow implementation of the MPNet model 9979 (spatil6)
- Added integration tests for Pytorch implementation of the ALBERT model 9980 (spatil6)
- distilbert: fix creation of sinusoidal embeddings 9917 (stefan-it)
- Add `from_slow` in fast tokenizers build and fixes some bugs 9987 (sgugger)
- Added Integration testing for Pytorch implementation of DistilBert model from issue 9948 9995 (danielpatrickhug)
- Fix model templates 9999 (LysandreJik)
- [Proposal] Adding new `encoder_no_repeat_ngram_size` to `generate`. 9984 (Narsil)
- Remove unintentional "double" assignment in TF-BART like models 9997 (stancld)
- [trainer] a few fixes 9993 (stas00)
- Update tokenizers requirement 10077 (n1t0)
- Bump minimum Jax requirement to 2.8.0 10027 (patrickvonplaten)