LayoutLM-v2 and LayoutXLM
Four new models are released as part of the LayoutLMv2 implementation: `LayoutLMv2ForSequenceClassification`, `LayoutLMv2Model`, `LayoutLMv2ForTokenClassification` and `LayoutLMv2ForQuestionAnswering`, in PyTorch.
The LayoutLMv2 model was proposed in [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMv2 improves [LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html) to obtain state-of-the-art results across several document image understanding benchmarks.
- Add LayoutLMv2 + LayoutXLM 12604 (NielsRogge)
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=layoutlmv2
BEiT
Three new models are released as part of the BEiT implementation: `BeitModel`, `BeitForMaskedImageModeling`, and `BeitForImageClassification`, in PyTorch.
The BEiT model was proposed in [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong and Furu Wei. Inspired by BERT, BEiT is the first work to make self-supervised pre-training of Vision Transformers (ViTs) outperform supervised pre-training. Rather than pre-training the model to predict the class of an image (as done in the [original ViT paper](https://arxiv.org/abs/2010.11929)), BEiT models are pre-trained to predict, for each masked patch, the corresponding visual token from the codebook of OpenAI’s [DALL-E model](https://arxiv.org/abs/2102.12092).
- Add BEiT 12994 (NielsRogge)
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=beit
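The pre-training objective described above can be illustrated with a toy calculation. This is a minimal sketch in plain Python, not the library's implementation: the function name `masked_token_loss`, the tiny 3-entry codebook, and the hand-written logits are all assumptions made for illustration. It computes the mean cross-entropy between predicted logits and the target visual tokens, counting only the masked patches.

```python
import math
from typing import List, Sequence


def masked_token_loss(
    logits: Sequence[Sequence[float]],
    visual_tokens: Sequence[int],
    mask: Sequence[bool],
) -> float:
    """Mean cross-entropy over masked patch positions only (hypothetical helper)."""
    total, count = 0.0, 0
    for patch_logits, target, is_masked in zip(logits, visual_tokens, mask):
        if not is_masked:
            continue  # unmasked patches do not contribute to the loss
        # Numerically stable log-sum-exp over the codebook dimension.
        m = max(patch_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in patch_logits))
        total += log_z - patch_logits[target]
        count += 1
    if count == 0:
        raise ValueError("at least one patch must be masked")
    return total / count


# Toy setup: 4 image patches, a codebook of 3 visual tokens.
visual_tokens: List[int] = [2, 0, 1, 1]      # target codebook index per patch
mask: List[bool] = [False, True, True, False]  # patches 1 and 2 are masked
uniform_logits = [[0.0, 0.0, 0.0] for _ in range(4)]

loss = masked_token_loss(uniform_logits, visual_tokens, mask)
```

With uniform logits the model is guessing among three codebook entries, so the loss equals the natural log of the codebook size.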
Speech improvements
The Wav2Vec2 and HuBERT models now have a sequence classification head available.
- Add Wav2Vec2 & Hubert ForSequenceClassification 13153 (anton-l)
DeBERTa in TensorFlow (kamalkraj)
The DeBERTa and DeBERTa-v2 models have been converted from PyTorch to TensorFlow.
- Deberta tf 12972 (kamalkraj)
- Deberta_v2 tf 13120 (kamalkraj)
Flax model additions
EncoderDecoder, DistilBERT, and ALBERT now have support in Flax!
- FlaxEncoderDecoder allowing Bert2Bert and Bert2GPT2 in Flax 13008 (ydshieh)
- FlaxDistilBERT 13324 (kamalkraj)
- FlaxAlBERT 13294 (kamalkraj)
TensorFlow examples
A new example has been added in TensorFlow: multiple choice!
Data collators have become framework-agnostic and now work with TensorFlow and NumPy in addition to PyTorch.
- Add TF multiple choice example 12865 (Rocketknight1)
- TF/Numpy variants for all DataCollator classes 13105 (Rocketknight1)
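To make "framework-agnostic" concrete, here is a minimal sketch of the idea: pad a batch once in plain Python, then hand the result to a converter selected by name. The function `pad_and_convert`, the `CONVERTERS` table, and the `"list"` option are hypothetical and are not the library's API. In the library itself, the collators take a `return_tensors` argument for choosing the output framework.

```python
from typing import Any, Callable, Dict, List


def _to_numpy(batch: Dict[str, List[List[int]]]) -> Dict[str, Any]:
    # Imported lazily so the sketch still runs when NumPy is not installed.
    import numpy as np

    return {key: np.array(value) for key, value in batch.items()}


# Hypothetical converter table: one padding routine, several output formats.
CONVERTERS: Dict[str, Callable[[Dict[str, List[List[int]]]], Dict[str, Any]]] = {
    "list": lambda batch: batch,
    "np": _to_numpy,
}


def pad_and_convert(
    features: List[Dict[str, List[int]]],
    pad_token_id: int = 0,
    return_tensors: str = "list",
) -> Dict[str, Any]:
    """Pad `input_ids` to the longest sequence in the batch, then convert."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": []}
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    return CONVERTERS[return_tensors](batch)


features = [{"input_ids": [101, 7592, 102]}, {"input_ids": [101, 102]}]
batch = pad_and_convert(features)
```

Keeping the padding logic separate from the final conversion is what lets a single collator serve several frameworks.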
Auto API refactor
The Auto APIs have been disentangled from all the other modeling modules of the Transformers library, so you can now safely import the Auto classes without importing all the models (and possibly hitting errors if your setup is not compatible with one specific model). The actual model classes are only imported when needed.
- Disentangle auto modules from other modeling files 13023 (sgugger)
- Fix AutoTokenizer when no fast tokenizer is available 13336 (sgugger)
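The "imported only when needed" behavior can be sketched with a lazy mapping. This is an illustration of the general technique, not the library's actual code: the class `LazyAutoMapping` and the mapping entries are hypothetical, and standard-library modules stand in for model modules. An entry that points at an unavailable module causes no error until that specific entry is requested.

```python
import importlib
from typing import Any, Dict, Tuple

# Hypothetical mapping: key -> (module name, class name).
# Standard-library modules stand in for per-model modules here.
_LAZY_MAPPING: Dict[str, Tuple[str, str]] = {
    "fraction": ("fractions", "Fraction"),
    "decoder": ("json", "JSONDecoder"),
    # Stands in for a model whose dependencies are not installed.
    "broken": ("module_that_does_not_exist_xyz", "Missing"),
}


class LazyAutoMapping:
    """Resolve classes on first access instead of at import time."""

    def __init__(self, mapping: Dict[str, Tuple[str, str]]) -> None:
        self._mapping = mapping
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        if key not in self._cache:
            module_name, class_name = self._mapping[key]
            # The import happens here, only for the requested entry.
            module = importlib.import_module(module_name)
            self._cache[key] = getattr(module, class_name)
        return self._cache[key]


auto = LazyAutoMapping(_LAZY_MAPPING)  # no imports are triggered yet
fraction_cls = auto["fraction"]        # imports `fractions` only
half = fraction_cls(1, 2)
```

Requesting `auto["broken"]` would raise `ModuleNotFoundError`, but the other entries remain usable, which mirrors the point of the refactor.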
Slight breaking change
When loading some kinds of corrupted state dictionaries, the `PreTrainedModel.from_pretrained` method sometimes silently ignored weights. This now raises an error.
- Fix from_pretrained with corrupted state_dict 12939 (sgugger)
General improvements and bugfixes
- Improving pipeline tests 12784 (Narsil)
- Pin git python to <3.1.19 12858 (patrickvonplaten)
- [tests] fix logging_steps requirements 12860 (stas00)
- [Sequence Feature Extraction] Add truncation 12804 (patrickvonplaten)
- add `classifier_dropout` to classification heads 12794 (PhilipMay)
- Fix barrier for SM distributed 12853 (sgugger)
- Add possibility to ignore imports in test_fetcher 12801 (sgugger)
- Add accelerate to examples requirements 12888 (sgugger)
- Fix documentation of BigBird tokenizer 12889 (sgugger)
- Better heuristic for token-classification pipeline. 12611 (Narsil)
- Fix push_to_hub for TPUs 12895 (sgugger)
- `Seq2SeqTrainer` set max_length and num_beams only when non None 12899 (cchen-dialpad)
- [FLAX] Minor fixes in CLM example 12914 (stefan-it)
- Correct validation_split_percentage argument from int (ex:5) to float (0.05) 12897 (Elysium1436)
- Fix typo in the example of MobileBertForPreTraining 12919 (buddhics)
- Add option to set max_len in run_ner 12929 (sgugger)
- Fix QA examples for roberta tokenizer 12928 (sgugger)
- Print defaults when using --help for scripts 12930 (sgugger)
- Fix StoppingCriteria ABC signature 12918 (willfrey)
- Add missing classmethod decorators 12927 (willfrey)
- fix distiller.py 12910 (chutaklee)
- Update generation_logits_process.py 12901 (willfrey)
- Update generation_logits_process.py 12900 (willfrey)
- Update tokenization_auto.py 12896 (willfrey)
- Fix docstring typo in tokenization_auto.py 12891 (willfrey)
- [Flax] Correctly Add MT5 12988 (patrickvonplaten)
- ONNX v2 raises an Exception when using PyTorch < 1.8.0 12933 (mfuntowicz)
- Moving feature-extraction pipeline to new testing scheme 12843 (Narsil)
- Add CpmTokenizerFast 12938 (JetRunner)
- fix typo in gradient_checkpointing arg 12855 (21jun)
- Log Azure ML metrics only for rank 0 12766 (harshithapv)
- Add substep end callback method 12951 (wulu473)
- Add multilingual documentation support 12952 (JetRunner)
- Fix division by zero in NotebookProgressBar 12953 (sgugger)
- [FLAX] Minor fixes in LM example 12947 (stefan-it)
- Prevent `Trainer.evaluate()` crash when using only tensorboardX 12963 (aphedges)
- Fix typo in example of DPRReader 12954 (tadejsv)
- Place BigBirdTokenizer in sentencepiece-only objects 12975 (sgugger)
- fix typo in example/text-classification README 12974 (fullyz)
- Fix template for inputs docstrings 12976 (sgugger)
- fix `Trainer.train(resume_from_checkpoint=False)` is causing an exception 12981 (PhilipMay)
- Cast logits from bf16 to fp32 at the end of TF_T5 12332 (szutenberg)
- Update CANINE test 12453 (NielsRogge)
- pad_to_multiple_of added to DataCollatorForWholeWordMask 12999 (Aktsvigun)
- [Flax] Align jax flax device name 12987 (patrickvonplaten)
- [Flax] Correct flax docs 12782 (patrickvonplaten)
- T5: Create position related tensors directly on device instead of CPU 12846 (armancohan)
- Skip ProphetNet test 12462 (LysandreJik)
- Create perplexity.rst 13004 (sashavor)
- GPT-Neo ONNX export 12911 (michaelbenayoun)
- Update generate method - Fix floor_divide warning 13013 (nreimers)
- [Flax] Correct pt to flax conversion if from base to head 13006 (patrickvonplaten)
- [Flax T5] Speed up t5 training 13012 (patrickvonplaten)
- FX submodule naming fix 13016 (michaelbenayoun)
- T5 with past ONNX export 13014 (michaelbenayoun)
- Fix ONNX test: Put smaller ALBERT model 13028 (LysandreJik)
- Tpu tie weights 13030 (sgugger)
- Use min version for huggingface-hub dependency 12961 (lewtun)
- tfhub.de -> tfhub.dev 12565 (abhishekkrthakur)
- [Flax] Refactor gpt2 & bert example docs 13024 (patrickvonplaten)
- Add MBART to models exportable with ONNX 13049 (LysandreJik)
- Add to ONNX docs 13048 (LysandreJik)
- Fix small typo in M2M100 doc 13061 (SaulLu)
- Add try-except for torch_scatter 13040 (JetRunner)
- docs: add HuggingArtists to community notebooks 13050 (AlekseyKorshuk)
- Fix ModelOutput instantiation from dictionaries 13067 (sgugger)
- Roll out the test fetcher on push tests 13055 (sgugger)
- Fix fallback of test_fetcher 13071 (sgugger)
- Revert to all tests while we debug what's wrong 13072 (sgugger)
- Use original key for label in DataCollatorForTokenClassification 13057 (ibraheem-moosa)
- [Doctest] Setup, quicktour and task_summary 13078 (sgugger)
- Add VisualBERT demo notebook 12263 (gchhablani)
- Install git 13091 (LysandreJik)
- Fix classifier dropout in AlbertForMultipleChoice 13087 (ibraheem-moosa)
- Doctests job 13088 (LysandreJik)
- Fix VisualBert Embeddings 13017 (gchhablani)
- Proper import for unittest.mock.patch 13085 (sgugger)
- Reactivate test fetchers on scheduled test with proper git install 13097 (sgugger)
- Change a parameter name in FlaxBartForConditionalGeneration.decode() 13074 (ydshieh)
- [Flax/JAX] Run jitted tests at every commit 13090 (patrickvonplaten)
- Rely on huggingface_hub for common tools 13100 (sgugger)
- [FlaxCLIP] allow passing params to image and text feature methods 13099 (patil-suraj)
- Ci last fix 13103 (sgugger)
- Improve type checker performance 13094 (bschnurr)
- Fix VisualBERT docs 13106 (gchhablani)
- Fix CircleCI nightly tests 13113 (sgugger)
- Create py.typed 12893 (willfrey)
- Fix flax gpt2 hidden states 13109 (ydshieh)
- Moving fill-mask pipeline to new testing scheme 12943 (Narsil)
- Fix omitted lazy import for xlm-prophetnet 13052 (minwhoo)
- Fix classifier dropout in bertForMultipleChoice 13129 (mandelbrot-walker)
- Fix frameworks table so it's alphabetical 13118 (osanseviero)
- [Feature Processing Sequence] Remove duplicated code 13051 (patrickvonplaten)
- Ci continue through smi failure 13140 (LysandreJik)
- Fix missing `seq_len` in `electra` model when `inputs_embeds` is used. 13128 (sararb)
- Optimizes ByT5 tokenizer 13119 (Narsil)
- Add splinter 12955 (oriram)
- [AutoFeatureExtractor] Fix loading of local folders if config.json exists 13166 (patrickvonplaten)
- Fix generation docstrings regarding input_ids=None 12823 (jvamvas)
- Update namespaces inside torch.utils.data to the latest. 13167 (qqaatw)
- Fix the loss calculation of ProphetNet 13132 (StevenTang1998)
- Fix LUKE tests 13183 (NielsRogge)
- Add min and max question length options to TapasTokenizer 12803 (NielsRogge)
- SageMaker: Fix sagemaker DDP & metric logs 13181 (philschmid)
- correcting group beam search function output score bug 13211 (sourabh112)
- Change how "additional_special_tokens" argument in the ".from_pretrained" method of the tokenizer is taken into account 13056 (SaulLu)
- remove unwanted control-flow code from DeBERTa-V2 13145 (kamalkraj)
- Fix load_tf_weights alias. 13159 (qqaatw)
- Add RemBert to AutoTokenizer 13224 (LysandreJik)
- Allow local_files_only for fast pretrained tokenizers 13225 (BramVanroy)
- fix `AutoModel.from_pretrained(..., torch_dtype=...)` 13209 (stas00)
- Fix broken links in Splinter documentation 13237 (oriram)
- Custom errors and BatchSizeError 13184 (AmbiTyga)
- Bump notebook from 6.1.5 to 6.4.1 in /examples/research_projects/lxmert 13226 (dependabot[bot])
- Update generation_logits_process.py 12671 (willfrey)
- Remove side effects of disabling gradient computation 13257 (LysandreJik)
- Replace assert statement with if condition and raise ValueError 13263 (nishprabhu)
- Better notification service 13267 (LysandreJik)
- Fix failing Hubert test 13261 (LysandreJik)
- Add CLIP tokenizer to AutoTokenizer 13258 (LysandreJik)
- Some `model_type`s cannot be in the mapping 13259 (LysandreJik)
- Add require flax to MT5 Flax test 13260 (LysandreJik)
- Migrating conversational pipeline tests to new testing format 13114 (Narsil)
- fix `tokenizer_class_from_name` for models with `-` in the name 13251 (stas00)
- Add error message concerning revision 13266 (BramVanroy)
- Move `image-classification` pipeline to new testing 13272 (Narsil)
- [Hotfix] Fixing the test (warnings was incorrect.) 13278 (Narsil)
- Moving question_answering tests to the new testing scheme. Had to tweak a little some ModelTesterConfig for pipelines. 13277 (Narsil)
- Moving `summarization` pipeline to new testing format. 13279 (Narsil)
- Moving `table-question-answering` pipeline to new testing. 13280 (Narsil)
- Moving `table-question-answering` pipeline to new testing 13281 (Narsil)
- Hotfixing master tests. 13282 (Narsil)
- Moving `text2text-generation` to new pipeline testing mechanism 13283 (Narsil)
- Add DINO conversion script 13265 (NielsRogge)
- Moving `text-generation` pipeline to new testing framework. 13285 (Narsil)
- Moving `token-classification` pipeline to new testing. 13286 (Narsil)
- examples: add keep_linebreaks option to CLM examples 13150 (stefan-it)
- Moving `translation` pipeline to new testing scheme. 13297 (Narsil)
- Fix BeitForMaskedImageModeling 13275 (NielsRogge)
- Moving `zero-shot-classification` pipeline to new testing. 13299 (Narsil)
- Fixing mbart50 with `return_tensors` argument too. 13301 (Narsil)
- [Flax] Correct all return tensors to numpy 13307 (patrickvonplaten)
- examples: only use keep_linebreaks when reading TXT files 13320 (stefan-it)
- Slow tests - run rag token in half precision 13304 (patrickvonplaten)
- [Slow tests] Disable Wav2Vec2 pretraining test for now 13303 (patrickvonplaten)
- Announcing the default model used by the pipeline (with a link). 13276 (Narsil)
- use float 16 in causal mask and masked bias 13194 (hwijeen)
- ✨ add citation file 13214 (flaxel)
- Improve documentation of pooler_output in ModelOutput 13228 (navjotts)
- fix: typo spelling grammar 13212 (slowy07)
- Check None before going through iteration 13250 (qqaatw)
- Use existing functionality for 13251 13333 (sgugger)
- neptune.ai logger: add ability to connect to a neptune.ai run 13319 (fcakyon)
- Update label2id in the model config for run_glue 13334 (sgugger)
- :bug: fix small model card bugs 13310 (nateraw)
- Fall back to `observed_batch_size` when the `dataloader` does not know the `batch_size`. 13188 (mbforbes)
- Fixes 12941 where use_auth_token not been set up early enough 13205 (bennimmo)
- Correct wrong function signatures on the docs website 13198 (qqaatw)
- Fix release utils 13337 (sgugger)
- Add missing module __spec__ 13321 (laurahanu)
- Use DS callable API to allow hf_scheduler + ds_optimizer 13216 (tjruwase)
- Tests fetcher tests 13340 (sgugger)
- [Testing] Add Flax Tests on GPU, Add Speech and Vision to Flax & TF tests 13313 (patrickvonplaten)
- Fixing a typo in the data_collator documentation 13309 (Serhiy-Shekhovtsov)
- Add GPT2ForTokenClassification 13290 (tucan9389)
- Doc mismatch fixed 13345 (Apoorvgarg-creator)
- Handle nested dict/lists of tensors as inputs in the Trainer 13338 (sgugger)
- [doc] correct TP implementation resources 13248 (stas00)
- Fix minor typo in parallelism doc 13289 (jaketae)
- Set missing seq_length variable when using inputs_embeds with ALBERT & Remove code duplication 13152 (olenmg)
- TF CLM example fix typo 13002 (Rocketknight1)
- Add generate kwargs to Seq2SeqTrainingArguments 13339 (sgugger)