DETR (NielsRogge)
Three new models are released as part of the DETR implementation: `DetrModel`, `DetrForObjectDetection` and `DetrForSegmentation`, in PyTorch.
DETR consists of a convolutional backbone followed by an encoder-decoder Transformer that can be trained end-to-end for object detection. It removes much of the complexity of models like Faster R-CNN and Mask R-CNN, which rely on hand-crafted components such as region proposals, non-maximum suppression, and anchor generation. Moreover, DETR extends naturally to panoptic segmentation: simply add a mask head on top of the decoder outputs.
DETR can be used with any [timm](https://github.com/rwightman/pytorch-image-models) backbone.
The DETR model was proposed in [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko.
- Add DETR 11653 (NielsRogge)
- Improve DETR 12147 (NielsRogge)
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=detr
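As a quick illustration, here is a minimal object-detection sketch in PyTorch; the `facebook/detr-resnet-50` checkpoint name and the image URL are assumptions for the example, not part of this release note:

```python
import requests
import torch
from PIL import Image
from transformers import DetrFeatureExtractor, DetrForObjectDetection

# assumed checkpoint; see the Hub filter above for the full list
feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# the feature extractor resizes and normalizes the image and builds a pixel mask
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# per-query class logits (including a "no object" class) and normalized boxes
print(outputs.logits.shape, outputs.pred_boxes.shape)
```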
ByT5 (patrickvonplaten)
A new tokenizer is released as part of the ByT5 implementation: `ByT5Tokenizer`. It can be used with the T5 family of models.
The ByT5 model was presented in [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, and Colin Raffel.
- ByT5 model 11971 (patrickvonplaten)
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?search=byt5
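Since `ByT5Tokenizer` operates directly on UTF-8 bytes, a small sketch makes the behavior concrete (the `google/byt5-small` checkpoint name is an assumption here):

```python
from transformers import ByT5Tokenizer

tokenizer = ByT5Tokenizer.from_pretrained("google/byt5-small")  # assumed checkpoint

# each input id is a raw UTF-8 byte shifted by the number of special tokens,
# plus a trailing </s>; there is no subword vocabulary
ids = tokenizer("Life is like a box of chocolates.").input_ids
print(ids)
print(tokenizer.decode(ids))  # round-trips back to the original text
```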
RoFormer (JunnYu)
14 new models are released as part of the RoFormer implementation: `RoFormerModel`, `RoFormerForCausalLM`, `RoFormerForMaskedLM`, `RoFormerForSequenceClassification`, `RoFormerForTokenClassification`, `RoFormerForQuestionAnswering`, and `RoFormerForMultipleChoice` in PyTorch, and their TensorFlow counterparts `TFRoFormerModel`, `TFRoFormerForCausalLM`, `TFRoFormerForMaskedLM`, `TFRoFormerForSequenceClassification`, `TFRoFormerForTokenClassification`, `TFRoFormerForQuestionAnswering`, and `TFRoFormerForMultipleChoice`.
RoFormer is a BERT-like autoencoding model with rotary position embeddings. Rotary position embeddings have shown improved performance on classification tasks with long texts. The RoFormer model was proposed in [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu.
- Add new model RoFormer (use rotary position embedding) 11684 (JunnYu)
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=roformer
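A minimal PyTorch sketch, assuming the `junnyu/roformer_chinese_base` checkpoint (and the `rjieba` package, which the tokenizer uses for Chinese word segmentation):

```python
import torch
from transformers import RoFormerTokenizer, RoFormerModel

# assumed checkpoint; requires the rjieba package for Chinese pre-tokenization
tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_base")

inputs = tokenizer("今天天气非常好。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# contextual embeddings computed with rotary position embeddings
print(outputs.last_hidden_state.shape)
```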
HuBERT (patrickvonplaten)
Two new models are released as part of the HuBERT implementation: `HubertModel` and `HubertForCTC`, in PyTorch.
HuBERT is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
HuBERT was proposed in [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed.
- Hubert 11889 (patrickvonplaten)
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=hubert
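A minimal CTC transcription sketch, assuming the `facebook/hubert-large-ls960-ft` checkpoint (which pairs with a `Wav2Vec2Processor`); the silent waveform is a stand-in for real 16 kHz audio:

```python
import numpy as np
import torch
from transformers import Wav2Vec2Processor, HubertForCTC

# assumed checkpoint fine-tuned for CTC
processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

# one second of silence at 16 kHz as a placeholder for a real waveform
speech = np.zeros(16_000, dtype=np.float32)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
transcription = processor.batch_decode(torch.argmax(logits, dim=-1))
print(transcription)
```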
Hugging Face Course - Part 1
On Monday, June 14th, 2021, we released the first part of the Hugging Face Course. The course focuses on the Hugging Face ecosystem, including `transformers`. Most of the material in the course is now linked from the `transformers` documentation, which also includes videos explaining individual concepts.
- Add video links to the documentation 12162 (sgugger)
- Add link to the course 12229 (sgugger)
TensorFlow additions
The Wav2Vec2 model can now be used in TensorFlow:
- Adding TFWav2Vec2Model 11617 (will-rice)
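A short TensorFlow sketch, assuming the `facebook/wav2vec2-base-960h` checkpoint and placeholder audio:

```python
import numpy as np
from transformers import Wav2Vec2Processor, TFWav2Vec2Model

# assumed checkpoint
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = TFWav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.zeros(16_000, dtype=np.float32)  # placeholder 16 kHz waveform
inputs = processor(speech, sampling_rate=16_000, return_tensors="tf")
hidden_states = model(inputs.input_values).last_hidden_state
print(hidden_states.shape)
```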
PyTorch 1.9 support
- Add support for torch 1.9.0 12224 (LysandreJik)
- fix pt-1.9.0 add_ deprecation 12217 (stas00)
Notebooks
- NielsRogge has contributed five tutorials on the usage of DETR in his repository: [Transformers-Tutorials](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR)
- [Community Notebooks] Add Emotion Speech Notebook 11900 (patrickvonplaten)
General improvements and bugfixes
- Vit deit fixes 11309 (NielsRogge)
- Enable option for subword regularization in more tokenizers. 11417 (PhilipMay)
- Fix gpt-2 warnings 11709 (LysandreJik)
- [Flax] Fix BERT initialization & token_type_ids default 11695 (patrickvonplaten)
- BigBird on TPU 11651 (vasudevgupta7)
- [T5] Add 3D attention mask to T5 model (2) (9643) 11197 (lexhuismans)
- Fix loading the best model on the last stage of training 11718 (vbyno)
- Fix T5 beam search when using parallelize 11717 (OyvindTafjord)
- [Flax] Correct example script 11726 (patrickvonplaten)
- Add Cloud details to README 11706 (marcvanzee)
- Experimental symbolic tracing feature with torch.fx for BERT, ELECTRA and T5 11475 (michaelbenayoun)
- Improvements to Flax finetuning script 11727 (marcvanzee)
- Remove tapas model card 11739 (julien-c)
- Add visual + link to Premium Support webpage 11740 (julien-c)
- Issue with symbolic tracing for T5 11742 (michaelbenayoun)
- [BigBird Pegasus] Make tests faster 11744 (patrickvonplaten)
- Use new evaluation loop in TrainerQA 11746 (sgugger)
- Flax BERT fix token type init 11750 (patrickvonplaten)
- [TokenClassification] Label realignment for subword aggregation 11680 (Narsil)
- Fix checkpoint deletion 11748 (sgugger)
- Fix incorrect newline introduced in 11650 11757 (oToToT)
- Add more subsections to main doc 11758 (patrickvonplaten)
- Fixed: Better names for nlp variables in pipelines' tests and docs. 11752 (01-vyom)
- add `dataset_name` to data_args and add accuracy metric 11760 (philschmid)
- Add Flax Examples and Cloud TPU README 11753 (avital)
- Fix a bug in summarization example which did not load model from config properly 11762 (tomy0000000)
- FlaxGPT2 11556 (patil-suraj)
- Fix usage of head masks by PT encoder-decoder models' `generate()` function 11621 (stancld)
- [T5 failing CI] Fix generate test 11770 (patrickvonplaten)
- [Flax MLM] Refactor run mlm with optax 11745 (patrickvonplaten)
- Add DOI badge to README 11771 (albertvillanova)
- Deprecate commands from the transformers-cli that are in the hf-cli 11779 (LysandreJik)
- Fix release util pattern in conf.py 11784 (sgugger)
- Fix regression in regression 11785 (sgugger)
- A cleaner and more scalable implementation of symbolic tracing 11763 (michaelbenayoun)
- Fix failing test on Windows Platform 11589 (Lynx1820)
- [Flax] Align GLUE training script with mlm training script 11778 (patrickvonplaten)
- Patch recursive import 11812 (LysandreJik)
- fix roformer config doc 11813 (JunnYu)
- [Flax] Small fixes in `run_flax_glue.py` 11820 (patrickvonplaten)
- [Deepspeed] support `zero.Init` in `from_config` 11805 (stas00)
- Add flax text class colab 11824 (patrickvonplaten)
- Faster list concat for trainer_pt_utils.get_length_grouped_indices() 11825 (ctheodoris)
- Replace double occurrences as the last step 11367 (LysandreJik)
- [Flax] Fix PyTorch import error 11839 (patrickvonplaten)
- Fix reference to XLNet 11846 (sgugger)
- Switch mem metrics flag 11851 (sgugger)
- Fix flos single node 11844 (TevenLeScao)
- Fix two typos in docs 11852 (nickls)
- [Trainer] Report both steps and num samples per second 11818 (sgugger)
- Add some tests to the slow suite 11860 (LysandreJik)
- Enable memory metrics in tests that need it 11859 (LysandreJik)
- fixed a small typo in the CONTRIBUTING doc 11856 (stsuchi)
- typo 11858 (WrRan)
- Add option to log only once in multinode training 11819 (sgugger)
- [Wav2Vec2] SpecAugment Fast 11764 (patrickvonplaten)
- [lm examples] fix overflow in perplexity calc 11855 (stas00)
- [Examples] create model with custom config on the fly 11798 (stas00)
- [Wav2Vec2ForCTC] example typo fixed 11878 (madprogramer)
- [AutomaticSpeechRecognitionPipeline] Ensure input tensors are on device 11874 (francescorubbo)
- Fix usage of head masks by TF encoder-decoder models' `generate()` function 11775 (stancld)
- Correcting comments in T5Stack to reflect correct tuple order 11330 (talkhaldi)
- [Flax] Allow dataclasses to be jitted 11886 (patrickvonplaten)
- changing find_batch_size to work with tokenizer outputs 11890 (joerenner)
- Link official Cloud TPU JAX docs 11892 (avital)
- Flax Generate 11777 (patrickvonplaten)
- Update deepspeed config to reflect hyperparameter search parameters 11896 (Mindful)
- Adding new argument `max_new_tokens` for generate (a usage sketch appears at the end of this section). 11476 (Narsil)
- Added Sequence Classification class in GPTNeo 11906 (bhadreshpsavani)
- [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 11918 (jayendra13)
- Test optuna and ray 11924 (LysandreJik)
- Use `self.assertEqual` instead of `assert` in deberta v2 test. 11935 (PhilipMay)
- Remove redundant `nn.log_softmax` in `run_flax_glue.py` 11920 (n2cholas)
- Add MT5ForConditionalGeneration as supported arch. to summarization README 11961 (PhilipMay)
- Add FlaxCLIP 11883 (patil-suraj)
- RAG-2nd2end-revamp 11893 (shamanez)
- modify qa-trainer 11872 (zhangfanTJU)
- get_ordinal(local=True) replaced with get_local_ordinal() in training_args.py 11922 (BassaniRiccardo)
- reinitialize wandb config for each hyperparameter search run 11945 (Mindful)
- Add regression tests for slow sentencepiece tokenizers. 11737 (PhilipMay)
- Authorize args when instantiating an AutoModel 11956 (LysandreJik)
- Neptune.ai integration 11937 (vbyno)
- [deepspeed] docs 11940 (stas00)
- typo correction 11973 (JminJ)
- Typo in usage example, changed to device instead of torch_device 11979 (albertovilla)
- [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` 11966 (stas00)
- [Trainer] add train loss and flops metrics reports 11980 (stas00)
- Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert 11983 (dependabot[bot])
- [RAG] Fix rag from pretrained question encoder generator behavior 11962 (patrickvonplaten)
- Fix examples in VisualBERT docs 11990 (gchhablani)
- [docs] fix xref to `PreTrainedModel.generate` 11049 (stas00)
- Update return introduction of `forward` method 11976 (kouyk)
- [deepspeed] Move code and doc into standalone files 11984 (stas00)
- [deepspeed] add nvme test skip rule 11997 (stas00)
- Fix weight decay masking in `run_flax_glue.py` 11964 (n2cholas)
- [Flax] Refactor MLM 12013 (patrickvonplaten)
- [Deepspeed] Assert on mismatches between ds and hf args 12021 (stas00)
- [TrainerArguments] format and sort __repr__, add __str__ 12018 (stas00)
- Fixed Typo in modeling_bart.py 12035 (ceevaaa)
- Fix deberta 2 Tokenizer Integration Test 12017 (PhilipMay)
- fix past_key_values docs 12049 (patil-suraj)
- [JAX] Bump jax lib 12053 (patrickvonplaten)
- Fixes a bug that appears when using QA BERT and distillation. 12026 (madlag)
- Extend pipelines for automodel tuples 12025 (Narsil)
- Add optional grouped parsers description to HfArgumentParser 12042 (peteriz)
- adds metric prefix. 12057 (riklopfer)
- [CI] skip failing test 12059 (stas00)
- Fix LUKE integration tests 12066 (NielsRogge)
- Fix tapas issue 12063 (NielsRogge)
- Updated the original RAG implementation to be compatible with the latest PyTorch Lightning 11806 (shamanez)
- Replace legacy tensor.Tensor with torch.tensor/torch.empty 12027 (mariosasko)
- Add torch to requirements.txt in language-modeling 12040 (cdleong)
- Properly indent block_size 12070 (sgugger)
- [Deepspeed] various fixes 12058 (stas00)
- [Deepspeed Wav2vec2] integration 11638 (stas00)
- Update run_ner.py with id2label config 12001 (KoichiYasuoka)
- [wav2vec2 / Deepspeed] sync LayerDrop for Wav2Vec2Encoder + tests 12076 (stas00)
- [test] support more than 2 gpus 12074 (stas00)
- Wav2Vec2 Pretraining 11306 (anton-l)
- [examples/flax] pass decay_mask fn to optimizer 12087 (patil-suraj)
- [versions] rm require_version_examples 12088 (stas00)
- [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests 12089 (patrickvonplaten)
- Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args 12083 (kumapo)
- CLIPFeatureExtractor should resize images while keeping the aspect ratio 11994 (TobiasNorlund)
- New TF GLUE example 12028 (Rocketknight1)
- Appending label2id and id2label to models for inference 12102 (Rocketknight1)
- Fix a condition in test_generate_with_head_masking 11911 (stancld)
- [Flax] Adding Visual-Transformer 11951 (jayendra13)
- add relevant description to tqdm in examples 11927 (bhavitvyamalik)
- Fix head masking generate tests 12110 (patrickvonplaten)
- Flax CLM script 12023 (patil-suraj)
- Add from_pretrained to dummy timm objects 12097 (LysandreJik)
- Fix t5 error message 12136 (cccntu)
- Fix megatron_gpt2 attention block's causal mask 12007 (novatig)
- Add mlm pretraining xla torch readme 12011 (patrickvonplaten)
- add readme for flax clm 12111 (patil-suraj)
- [Flax] Add FlaxBart models 11537 (stancld)
- Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer 11810 (SaulLu)
- [Flax] Add links to google colabs 12146 (patrickvonplaten)
- Don't log anything before logging is setup in examples 12121 (sgugger)
- Use text_column_name variable instead of "text" 12132 (nbroad1881)
- [lm examples] Replicate --config_overrides addition to other LM examples 12135 (kumar-abhishek)
- [Flax] fix error message 12148 (patil-suraj)
- [optim] implement AdafactorSchedule 12123 (stas00)
- [style] consistent nn. and nn.functional 12124 (stas00)
- [Flax] Fix flax pt equivalence tests 12154 (patrickvonplaten)
- [style] consistent nn. and nn.functional: part2: templates 12153 (stas00)
- Flax Big Bird 11967 (vasudevgupta7)
- [style] consistent nn. and nn.functional: part 3 `tests` 12155 (stas00)
- [style] consistent nn. and nn.functional: part 4 `examples` 12156 (stas00)
- consistent nn. and nn.functional: part 5 docs 12161 (stas00)
- [Flax generate] Add params to generate 12171 (patrickvonplaten)
- Use a released version of optax rather than installing from Git. 12173 (avital)
- Have dummy processors have a `from_pretrained` method 12145 (LysandreJik)
- Add course banner 12157 (sgugger)
- Enable add_prefix_space on run_ner if necessary 12116 (kumapo)
- Update AutoModel classes in summarization example 12178 (ionicsolutions)
- Ray Tune Integration Updates 12134 (amogkam)
- [testing] ensure concurrent pytest workers use a unique port for torch.dist 12166 (stas00)
- Model card defaults 12122 (sgugger)
- Temporarily deactivate torch-scatter while we wait for new release 12181 (LysandreJik)
- Temporarily deactivate torchhub test 12184 (sgugger)
- [Flax] Add Beam Search 12131 (patrickvonplaten)
- updated DLC images and sample notebooks 12191 (philschmid)
- Enabling AutoTokenizer for HubertConfig. 12198 (Narsil)
- Use yaml to create metadata 12185 (sgugger)
- [Docs] fixed broken link 12205 (bhadreshpsavani)
- Pipeline update & tests 12207 (LysandreJik)
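As noted above, here is a minimal sketch of the new `max_new_tokens` argument for `generate()` (the `gpt2` checkpoint is used purely for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is", return_tensors="pt")
# max_new_tokens bounds only the freshly generated tokens,
# whereas max_length also counts the prompt tokens
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```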