Wav2Vec2 from facebook (patrickvonplaten)
Two new models are released as part of the Wav2Vec2 implementation: `Wav2Vec2Model` and `Wav2Vec2ForMaskedLM`, in PyTorch.
Wav2Vec2 is a multi-modal model, combining speech and text. It's the first multi-modal model of its kind to be added to Transformers.
The Wav2Vec2 model was proposed in [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=wav2vec2
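A minimal usage sketch for speech recognition, assuming the `facebook/wav2vec2-base-960h` checkpoint from the Hub filter above and a local 16kHz mono recording (the file name is a placeholder):

```python
import soundfile as sf
import torch
from transformers import Wav2Vec2Tokenizer, Wav2Vec2ForMaskedLM

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForMaskedLM.from_pretrained("facebook/wav2vec2-base-960h")

# Load a 16kHz mono waveform as a 1-D float array (file name is hypothetical).
speech, sampling_rate = sf.read("sample.wav")

# The tokenizer turns the raw waveform into model inputs.
input_values = tokenizer(speech, return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits

# Greedy decoding of the frame-level predictions into a transcription.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = tokenizer.batch_decode(predicted_ids)[0]
```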
Available notebooks:
- https://colab.research.google.com/drive/18Ms6WjyjpsL-73Y2Vpagh9WdJvVEIoQL?usp=sharing
Contributions:
- Wav2Vec2 9659 (patrickvonplaten)
Future Additions
- Enable fine-tuning and pretraining for Wav2Vec2
- Add example script with dependency to wav2letter/flashlight
- Add Encoder-Decoder Wav2Vec2 model
ConvBERT
The ConvBERT model was proposed in [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
Six new models are released as part of the ConvBERT implementation: `ConvBertModel`, `ConvBertForMaskedLM`, `ConvBertForSequenceClassification`, `ConvBertForTokenClassification`, `ConvBertForQuestionAnswering` and `ConvBertForMultipleChoice`. These models are available both in PyTorch and TensorFlow.
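A short, hedged sketch of loading one of the new classes; the `YituTech/conv-bert-base` checkpoint name is an assumption and not part of these notes:

```python
import torch
from transformers import ConvBertTokenizer, ConvBertModel

# Checkpoint name assumed for illustration.
tokenizer = ConvBertTokenizer.from_pretrained("YituTech/conv-bert-base")
model = ConvBertModel.from_pretrained("YituTech/conv-bert-base")

inputs = tokenizer("ConvBERT improves BERT with span-based dynamic convolution.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state  # contextual token representations
```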
Contributions:
- ConvBERT Model 9717 (abhishekkrthakur)
- ConvBERT: minor fixes for conversion script 9937 (stefan-it)
- Fix GroupedLinearLayer in TF ConvBERT 9972 (abhishekkrthakur)
BORT
The BORT model was proposed in [Optimal Subarchitecture Extraction for BERT](https://arxiv.org/abs/2010.10499) by Amazon's Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for the BERT architecture, which the authors refer to as “Bort”.
The BORT model can be loaded directly into the BERT architecture, so all BERT model heads are available for BORT.
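Since BORT shares the BERT architecture, it can be loaded with the existing BERT classes. A hedged sketch, in which the `amazon/bort` checkpoint name is an assumption:

```python
from transformers import BertModel, BertForSequenceClassification

# Base model loaded into the plain BERT architecture (checkpoint name assumed).
backbone = BertModel.from_pretrained("amazon/bort")

# Any BERT head can be used; the classification head below is newly initialized
# and would still need fine-tuning.
classifier = BertForSequenceClassification.from_pretrained("amazon/bort", num_labels=2)
```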
Contributions:
- ADD BORT 9813 (stefan-it)
Trainer now supports Amazon SageMaker’s data parallel library (sgugger)
When running a script that uses `Trainer` on Amazon SageMaker with [SageMaker's data parallelism library](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html) enabled, `Trainer` automatically uses the `smdistributed` library. All maintained examples have been tested with this functionality. Here is an [overview of SageMaker's data parallelism library](https://aws.amazon.com/blogs/aws/managed-data-parallelism-in-amazon-sagemaker-simplifies-training-on-large-datasets/).
- When on SageMaker use their env variables for saves 9876 (sgugger)
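For context, a hedged sketch of how the data parallelism library is typically enabled from the SageMaker Python SDK; the script, IAM role, instance settings and framework versions below are illustrative assumptions, not taken from these notes:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="run_glue.py",                     # any maintained example script
    source_dir="./examples/text-classification",   # path assumed for illustration
    role="my-sagemaker-execution-role",            # hypothetical IAM role
    instance_type="ml.p3.16xlarge",                # an instance type supported by smdistributed
    instance_count=2,
    framework_version="1.6.0",
    py_version="py36",
    # Enabling SageMaker's data parallelism library; Trainer then picks up smdistributed.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"model_name_or_path": "bert-base-uncased", "do_train": True},
)
estimator.fit()
```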
Community page
A new Community Page has been added to the docs. It contains all the notebooks contributed by the community, as well as some community projects built around Transformers. Feel free to open a PR if you want your project to be showcased!
- Add a community page to the docs 9682 (sgugger)
Additional model architectures
DeBERTa now has more model heads available.
- Add DeBERTa head models 9691 (NielsRogge)
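A hedged sketch of loading one of the new DeBERTa heads, assuming the `microsoft/deberta-base` checkpoint:

```python
import torch
from transformers import DebertaTokenizer, DebertaForSequenceClassification

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
# The sequence-classification head is newly initialized and would need fine-tuning.
model = DebertaForSequenceClassification.from_pretrained("microsoft/deberta-base", num_labels=2)

inputs = tokenizer("DeBERTa now ships with task-specific heads.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
```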
BART, mBART, Marian, Pegasus and Blenderbot now have decoder-only model architectures. They can therefore be used in decoder-only settings.
- BartForCausalLM analogs to `ProphetNetForCausalLM` 9128 (sadakmed)
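A minimal sketch of the decoder-only usage, assuming the `facebook/bart-base` checkpoint (the same pattern applies to the other models listed above):

```python
import torch
from transformers import BartTokenizer, BartForCausalLM

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForCausalLM.from_pretrained("facebook/bart-base")  # decoder-only, used as a causal LM

inputs = tokenizer("Transformers v4.3.0 adds", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # next-token scores over the vocabulary
```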
Breaking changes
None.
General improvements and bugfixes
- Fix Trainer with a parallel model 9578 (sgugger)
- Switch metrics in run_ner to datasets 9567 (sgugger)
- Compliancy with tf-nightly 9570 (jplu)
- Make logs TF compliant 9565 (jplu)
- [setup.py] note on how to get to transformers exact dependencies from shell 9553 (stas00)
- Fix conda build 9589 (LysandreJik)
- BatchEncoding.to with device with tests 9584 (LysandreJik)
- Gradient accumulation for TFTrainer 9585 (kiyoungkim1)
- Upstream (and rename) sortish sampler 9574 (sgugger)
- [deepspeed doc] install issues + 1-gpu deployment 9582 (stas00)
- [TF Led] Fix wrong decoder attention mask behavior 9601 (patrickvonplaten)
- Remove unused token_type_ids in MPNet 9564 (jplu)
- Ignore lm_head decoder bias warning 9615 (LysandreJik)
- [deepspeed] --gradient_accumulation_steps fix 9622 (stas00)
- Remove duplicated extras["retrieval"] 9621 (n1t0)
- Fix: torch.utils.checkpoint.checkpoint attribute error. 9626 (devrimcavusoglu)
- Add head_mask/decoder_head_mask for BART 9569 (stancld)
- [Bart-like tests] Fix torch device for bart tests 9669 (patrickvonplaten)
- Fix DPRReaderTokenizer's attention_mask 9663 (mkserge)
- add mbart to automodel for masked lm 9673 (patrickvonplaten)
- Fix imports in conversion scripts 9674 (sgugger)
- Fix GPT conversion script 9676 (sgugger)
- Fix old Seq2SeqTrainer 9675 (sgugger)
- Update `past_key_values` in GPT-2 9596 (forest1988)
- Update integrations.py 9652 (max-yue)
- Fix TF Flaubert and XLM 9661 (jplu)
- New run_seq2seq script 9605 (sgugger)
- Add separated decoder_head_mask for T5 Models 9634 (stancld)
- Fix model templates and use less than 119 chars 9684 (sgugger)
- Restrain tokenizer.model_max_length default 9681 (sgugger)
- Speed up RepetitionPenaltyLogitsProcessor (pytorch) 9600 (LSinev)
- Use datasets squad_v2 metric in run_qa 9677 (sgugger)
- Fix label datatype in TF Trainer 9616 (jplu)
- New TF embeddings (cleaner and faster) 9418 (jplu)
- Fix TF template 9697 (jplu)
- Add t5 convert to transformers-cli 9654 (acul3)
- Fix Funnel Transformer conversion script 9683 (sgugger)
- Add notebook 9696 (NielsRogge)
- Fix Trainer and Args to mention AdamW, not Adam. 9685 (gchhablani)
- [deepspeed] fix the backward for deepspeed 9705 (stas00)
- Fix WAND_DISABLED test 9703 (sgugger)
- [trainer] no --deepspeed and --sharded_ddp together 9712 (stas00)
- fix typo 9708 (Muennighoff)
- Temporarily deactivate TPU tests while we work on fixing them 9720 (LysandreJik)
- Allow text generation for ProphetNetForCausalLM 9707 (guillaume-be)
- [LED] Reduce Slow Test required GPU RAM from 16GB to 8GB 9723 (patrickvonplaten)
- [T5] Fix T5 model parallel tests 9721 (patrickvonplaten)
- fix T5 head mask in model_parallel 9726 (patil-suraj)
- Fix mixed precision in TF models 9163 (jplu)
- Changing model default for TableQuestionAnsweringPipeline. 9729 (Narsil)
- Fix TF s2s models 9478 (jplu)
- Fix memory regression in Seq2Seq example 9713 (sgugger)
- examples: fix XNLI url 9741 (stefan-it)
- Fix some TF slow tests 9728 (jplu)
- Fixes to run_seq2seq and instructions 9734 (sgugger)
- Add `report_to` training arguments to control the integrations used 9735 (sgugger)
- Fix a TF test 9755 (jplu)
- [fsmt] token_type_ids isn't used 9736 (stas00)
- Fix broken [Open in Colab] links (9688) 9761 (wilcoln)
- Fix TFTrainer prediction output 9662 (janinaj)
- Use object store to pass trainer object to Ray Tune (makes it work with large models) 9749 (krfricke)
- Fix a typo in `Trainer.hyperparameter_search` docstring 9762 (sorami)
- [fsmt] onnx triu workaround 9738 (stas00)
- Fix model parallel definition in superclass 9787 (LysandreJik)
- Auto-resume training from checkpoint 9776 (sgugger)
- [PR/Issue templates] normalize, group, sort + add myself for deepspeed 9706 (stas00)
- [Flaky Generation Tests] Make sure that no early stopping is happening for beam search 9794 (patrickvonplaten)
- Fix broken links in the converting tf ckpt document 9791 (forest1988)
- Add head_mask/decoder_head_mask for TF BART models 9639 (stancld)
- Adding `skip_special_tokens=True` to FillMaskPipeline 9783 (Narsil)
- Improve pytorch examples for fp16 9796 (ak314)
- Smdistributed trainer 9798 (sgugger)
- RagTokenForGeneration: Fixed parameter name for logits_processor 9790 (michaelrglass)
- Fix fine-tuning translation scripts 9809 (mbiesialska)
- Allow RAG to output decoder cross-attentions 9789 (dblakely)
- Commit the last step on world_process_zero in WandbCallback 9805 (tristandeleu)
- Fix a bug in run_glue.py (9812) 9815 (forest1988)
- [LedFastTokenizer] Correct missing None statement 9828 (patrickvonplaten)
- [Setup.py] update jaxlib 9831 (patrickvonplaten)
- Add a test for TF mixed precision 9806 (jplu)
- Setup logging with a stdout handler 9816 (sgugger)
- Fix auto-resume training from checkpoint 9822 (jncasey)
- [MT5 Import init] Fix typo 9830 (patrickvonplaten)
- Adding a test to prevent late failure in the Table question answering pipeline. 9808 (Narsil)
- Remove a TF usage warning and rework the documentation 9756 (jplu)
- Delete a needless duplicate condition 9826 (tomohideshibata)
- Clean TF Bert 9788 (jplu)
- Add a flag for find_unused_parameters 9820 (sgugger)
- Fix TF template 9840 (jplu)
- Fix model templates 9842 (LysandreJik)
- Add tpu_zone and gcp_project in training_args_tf.py 9825 (kiyoungkim1)
- Labeled pull requests 9849 (LysandreJik)
- [GA forks] Test on every push 9851 (LysandreJik)
- When resuming training from checkpoint, Trainer loads model 9818 (sgugger)
- Allow --arg Value for booleans in HfArgumentParser 9823 (sgugger)
- [trainer] fix --lr_scheduler_type choices 9800 (stas00)
- Pin memory in Trainer by default 9857 (abhishekkrthakur)
- Partial local tokenizer load 9807 (LysandreJik)
- Remove submodule 9868 (LysandreJik)
- Fixing flaky conversational test + flag it as a pipeline test. 9837 (Narsil)
- Fix computation of attention_probs when head_mask is provided. 9853 (mfuntowicz)
- Deprecate model_path in Trainer.train 9854 (sgugger)
- Remove redundant `test_head_masking = True` flags in test files 9858 (stancld)
- [docs] expand install instructions 9817 (stas00)
- on_log event should occur *after* the current log is written 9872 (abhishekkrthakur)
- pin_memory -> dataloader_pin_memory 9874 (abhishekkrthakur)
- Adding a new `return_full_text` parameter to TextGenerationPipeline. 9852 (Narsil)
- Clarify use of unk_token in slow tokenizers' docstrings 9875 (ethch18)
- Add XLA test 9848 (jplu)
- [seq2seq] correctly handle mt5 9879 (stas00)
- [trainer] [deepspeed] refactor deepspeed setup devices 9880 (stas00)
- [doc] nested markup is invalid in rst 9898 (stas00)
- Clarify definition of seed argument in TrainingArguments 9903 (lewtun)
- TFBart labels consider both pad token and -100 9847 (kiyoungkim1)
- Add head_mask and decoder_head_mask to FSMT 9819 (stancld)
- Doc title in the template 9910 (sgugger)
- [seq2seq] fix logger format for non-main process 9911 (stas00)
- [wandb] restore WANDB_DISABLED=true to disable wandb 9896 (stas00)
- Fit chinese wwm to new datasets 9887 (wlhgtc)
- Remove subclass for sortish sampler 9907 (sgugger)
- AdaFactor: avoid updating group["lr"] attributes 9751 (ceshine)
- [docs] fix auto model docs 9924 (patil-suraj)
- Add new model docs 9667 (patrickvonplaten)
- Fix bart conversion script 9923 (patil-suraj)
- Tensorflow doc changes on loss output size 9922 (janjitse)
- [Tokenizer Utils Base] Make pad function more flexible 9928 (patrickvonplaten)
- [Bart models] fix typo in naming 9944 (patrickvonplaten)
- ALBERT Tokenizer integration test 9943 (LysandreJik)
- Fix 9918 9932 (sgugger)
- Bump numpy 9934 (sgugger)
- Use compute_loss in prediction_step 9935 (sgugger)
- Add head_mask and decoder_head_mask to PyTorch LED 9856 (stancld)
- [research proj] [lxmert] remove bleach dependency 9970 (stas00)
- Fix Longformer and LED 9942 (jplu)
- fix steps_in_epoch variable in trainer when using max_steps 9969 (yylun)
- [run_clm.py] fix getting extension 9977 (patil-suraj)
- Added integration tests for TensorFlow implementation of the ALBERT model 9976 (spatil6)
- TF DistilBERT integration tests 9975 (spatil6)
- Added integration tests for TensorFlow implementation of the mobileBERT 9978 (spatil6)
- Added integration tests for TensorFlow implementation of the MPNet model 9979 (spatil6)
- Added integration tests for Pytorch implementation of the ALBERT model 9980 (spatil6)
- distilbert: fix creation of sinusoidal embeddings 9917 (stefan-it)
- Add `from_slow` in fast tokenizers build and fixes some bugs 9987 (sgugger)
- Added Integration testing for Pytorch implementation of DistilBert model from issue 9948 9995 (danielpatrickhug)
- Fix model templates 9999 (LysandreJik)
- [Proposal] Adding new `encoder_no_repeat_ngram_size` to `generate`. 9984 (Narsil)
- Remove unintentional "double" assignment in TF-BART like models 9997 (stancld)
- [trainer] a few fixes 9993 (stas00)
- Update tokenizers requirement 10077 (n1t0)
- Bump minimum Jax requirement to 2.8.0 10027 (patrickvonplaten)