v4.1.1: TAPAS, MPNet, model parallelization, Sharded DDP, conda, multi-part uploads.
TAPAS (NielsRogge)
Four new models are released as part of the TAPAS implementation: `TapasModel`, `TapasForQuestionAnswering`, `TapasForMaskedLM` and `TapasForSequenceClassification`, in PyTorch.
TAPAS is a question answering model, used to answer queries given a table. It is a multi-modal model, combining the text of the query with tabular data.
The TAPAS model was proposed in [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://www.aclweb.org/anthology/2020.acl-main.398) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
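A minimal usage sketch with the new `TableQuestionAnsweringPipeline` (the checkpoint name and table contents below are illustrative; TAPAS additionally requires the `torch-scatter` package):

```python
from transformers import pipeline

# Illustrative table; any TAPAS checkpoint fine-tuned for table question
# answering can be used in place of the one below.
table = {
    "Repository": ["transformers", "datasets", "tokenizers"],
    "Stars": ["36542", "4512", "3934"],
}
table_qa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
result = table_qa(table=table, query="How many stars does the transformers repository have?")
print(result["answer"])
```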
- Tapas v4 (tres) 9117 (NielsRogge)
- AutoModelForTableQuestionAnswering 9154 (LysandreJik)
- TableQuestionAnsweringPipeline 9145 (LysandreJik)
MPNet (StillKeepTry)
Six new models are released as part of the MPNet implementation: `MPNetModel`, `MPNetForMaskedLM`, `MPNetForSequenceClassification`, `MPNetForMultipleChoice`, `MPNetForTokenClassification`, `MPNetForQuestionAnswering`, in both PyTorch and TensorFlow.
MPNet introduces a novel self-supervised objective named masked and permuted language modeling for language understanding. It inherits the advantages of both masked language modeling (MLM) and permuted language modeling (PLM), addressing their respective limitations and further reducing the inconsistency between the pre-training and fine-tuning paradigms.
The MPNet model was proposed in [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu and Tie-Yan Liu.
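A short sketch of loading the PyTorch masked LM variant (the checkpoint name is assumed to be the pre-trained base model published on the model hub):

```python
import torch
from transformers import MPNetTokenizer, MPNetForMaskedLM

# "microsoft/mpnet-base" is assumed to be the released pre-trained checkpoint.
tokenizer = MPNetTokenizer.from_pretrained("microsoft/mpnet-base")
model = MPNetForMaskedLM.from_pretrained("microsoft/mpnet-base")

inputs = tokenizer("MPNet unifies masked and permuted pre-training.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch_size, sequence_length, vocab_size)
```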
- MPNet: Masked and Permuted Pre-training for Language Understanding 8971 (StillKeepTry)
Model parallel (alexorona)
Model parallelism is introduced, allowing users to load very large models on two or more GPUs by spreading the model layers over them. This makes GPU training possible even for models that do not fit on a single device.
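A minimal sketch of the new API on GPT-2, assuming a machine with two GPUs; the layer split in the device map below is illustrative, not a recommended configuration:

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

# Map GPU index -> list of transformer block indices to place on that GPU.
device_map = {
    0: list(range(0, 24)),
    1: list(range(24, 48)),  # gpt2-xl has 48 blocks
}
model.parallelize(device_map)   # spread the blocks over the two GPUs
# ... training / generation, with inputs placed on the first device ...
model.deparallelize()           # move the model back to CPU
```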
- gpt2 and t5 parallel modeling 8696 (alexorona)
- Model parallel documentation 8741 (LysandreJik)
- Patch model parallel test 8825, 8920 (LysandreJik)
Conda release (LysandreJik)
Transformers welcomes its first conda releases, with v4.0.0, v4.0.1 and v4.1.0. The conda packages are now officially maintained on the `huggingface` channel.
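Installation from the `huggingface` channel:

```bash
conda install -c huggingface transformers
```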
- Put Transformers on Conda 8918 (LysandreJik)
Multi-part uploads (julien-c)
For the first time, very large models can be uploaded to the model hub using multi-part uploads.
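A sketch of the workflow, assuming a local clone of a model repository on huggingface.co; `lfs-enable-largefiles` is the `transformers-cli` subcommand added by the PR below:

```bash
# Run inside the cloned model repository before committing files larger than 5GB.
transformers-cli lfs-enable-largefiles .
```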
- transformers-cli: LFS multipart uploads (> 5GB) 8663 (julien-c)
New examples and reorganization (sgugger)
We introduce a refactored SQuAD example and notebook, which are faster and simpler than the previous scripts.
The examples directory has been reorganized to separate "examples", which are maintained examples showcasing how to do one specific task, from "research projects", which are bigger projects maintained by the community.
- New squad example 8992 (sgugger)
- Reorganize examples 9010 (sgugger)
Introduction of fairscale with Sharded DDP (sgugger)
We introduce support for fairscale's ShardedDDP in the `Trainer`, allowing reduced memory usage when training models in a distributed fashion.
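A sketch of enabling it, assuming fairscale is installed and the script is launched in distributed mode (e.g. via `torch.distributed.launch`); the `sharded_ddp` argument is the new experimental switch:

```python
from transformers import Trainer, TrainingArguments

# Assumes fairscale is installed and the script is launched with
# torch.distributed.launch so that more than one process/GPU is used.
args = TrainingArguments(
    output_dir="outputs",
    sharded_ddp=True,  # wrap the model with fairscale's ShardedDDP (experimental)
)
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```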
- Experimental support for fairscale ShardedDDP 9139 (sgugger)
- Fix gradient clipping for Sharded DDP 9168 (sgugger)
Barthez (moussaKam)
The BARThez model is a French variant of the BART model. We welcome its dedicated tokenizer to the library and multiple checkpoints to the model hub.
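A short sketch of the new tokenizer (the checkpoint name is assumed to be one of the BARThez checkpoints on the model hub; sentencepiece must be installed):

```python
from transformers import BarthezTokenizer

# "moussaKam/barthez" is assumed to be one of the uploaded checkpoints.
tokenizer = BarthezTokenizer.from_pretrained("moussaKam/barthez")
encoded = tokenizer("Paris est la capitale de la France.", return_tensors="pt")
```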
- Add barthez model 8393 (moussaKam)
General improvements and bugfixes
- `disable_ngram_loss` fix for prophetnet 8554 (Zhylkaaa)
- Fix run_ner script 8664 (sgugger)
- [tokenizers] convert_to_tensors: don't reconvert when the type is already right 8283 (stas00)
- [examples/seq2seq] fix PL deprecation warning 8577 (stas00)
- Add sentencepiece to the CI and fix tests 8672 (sgugger)
- Alternative to globals() 8667 (sgugger)
- Update the bibtex with EMNLP demo 8678 (JetRunner)
- Document adam betas TrainingArguments 8688 (sgugger)
- Fix rag finetuning + add finetuning test 8585 (lhoestq)
- moved temperature warper before topP/topK warpers 8686 (theorm)
- Vectorize RepetitionPenaltyLogitsProcessor to improve performance 8598 (bdalal)
- [Generate Test] fix flaky ci 8694 (patrickvonplaten)
- Fix bug in x-attentions output for roberta and harden test to catch it 8660 (ysgit)
- Add pip install update to resolve import error in transformers notebook 8616 (jessicayung)
- Improve bert-japanese tokenizer handling 8659 (julien-c)
- Change default cache path 8734 (sgugger)
- [trainer] make generate work with multigpu 8716 (stas00)
- consistent ignore keys + make private 8737 (stas00)
- Fix max length in run_plm script 8738 (sgugger)
- Add early stopping callback to pytorch trainer 8581 (cbrochtrup)
- Support various BERT relative position embeddings (2nd) 8276 (zhiheng-huang)
- Fix slow tests v2 8746 (LysandreJik)
- MT5 should have an autotokenizer 8743 (LysandreJik)
- added instructions for syncing upstream master with forked master via PR 8745 (bdalal)
- fix rag index names in eval_rag.py example 8730 (lhoestq)
- [core] implement support for run-time dependency version checking 8645 (stas00)
- New TF model inputs 8602 (jplu)
- Big model table 8774 (sgugger)
- Attempt to get a better fix for QA 8768 (Narsil)
- Fix QA argument handler 8765 (LysandreJik)
- Return correct Bart hidden state tensors 8747 (joeddav)
- [XLNet] Fix mems behavior 8567 (patrickvonplaten)
- [s2s] finetune.py: specifying generation min_length 8478 (danyaljj)
- Revert "[s2s] finetune.py: specifying generation min_length" 8805 (patrickvonplaten)
- Fix PPLM 8779 (chutaklee)
- [s2s finetune trainer] potpurri of small fixes 8807 (stas00)
- [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes 8791 (KristianHolsheimer)
- [Flax test] Add require pytorch to flix flax test 8816 (patrickvonplaten)
- Fix dpr<>bart config for RAG 8808 (patrickvonplaten)
- Extend typing to path-like objects in `PretrainedConfig` and `PreTrainedModel` 8770 (gcompagnoni)
- Fix setup.py on Windows 8798 (jplu)
- BART & FSMT: fix decoder not returning hidden states from the last layer 8597 (maksym-del)
- suggest a numerical limit of 50MB for determining slow 8824 (stas00)
- [MT5] Add use_cache to config 8832 (patrickvonplaten)
- [Pegasus] Refactor Tokenizer 8731 (patrickvonplaten)
- [CI] implement job skipping for doc-only PRs 8826 (stas00)
- Migration guide from v3.x to v4.x 8763 (LysandreJik)
- Add T5 Encoder for Feature Extraction 8717 (agemagician)
- token-classification: use is_world_process_zero instead of is_world_master() 8828 (stefan-it)
- Correct docstring. 8845 (Fraser-Greenlee)
- Add a direct link to the big table 8850 (sgugger)
- Use model.from_pretrained for DataParallel also 8795 (shaie)
- Remove deprecated `evalutate_during_training` 8852 (sgugger)
- Attempt to fix Flax CI error(s) 8829 (mfuntowicz)
- NerPipeline (TokenClassification) now outputs offsets of words 8781 (Narsil)
- [s2s trainer] fix DP mode 8823 (stas00)
- Ctrl for sequence classification 8812 (elk-cloner)
- Fix docstring for language code in mBart 8848 (RQuispeC)
- 2 typos in modeling_rag.py 8676 (ratthachat)
- Make the big table creation/check platform independent 8856 (sgugger)
- Prevent BatchEncoding from blindly passing casts down to the tensors it contains 8860 (Craigacp)
- Better warning when loading a tokenizer with AutoTokenizer w/o Sneten… 8881 (LysandreJik)
- [CI] skip docs-only jobs take 2 8853 (stas00)
- Better support for resuming training 8878 (sgugger)
- Add a `parallel_mode` property to TrainingArguments 8877 (sgugger)
- [trainer] start using training_args.parallel_mode 8882 (stas00)
- [ci] skip doc jobs take 3 8885 (stas00)
- Transfoxl seq classification 8868 (spatil6)
- Warning about too long input for fast tokenizers too 8799 (Narsil)
- [trainer] improve code readability 8903 (stas00)
- [PyTorch] Refactor Resize Token Embeddings 8880 (patrickvonplaten)
- Don't warn that models aren't available if Flax is available. 8841 (skye)
- Avoid erasing the attention mask when double padding 8915 (sgugger)
- Fix move when the two cache folders exist 8917 (sgugger)
- Tweak wording + Add badge w/ number of models on the hub 8914 (julien-c)
- [s2s finetune_trainer] add instructions for distributed training 8884 (stas00)
- Better booleans handling in the TF models 8777 (jplu)
- Fix TF T5 only encoder model with booleans 8925 (LysandreJik)
- [ci] skip doc jobs - circleCI is not reliable - disable skip for now 8926 (stas00)
- [seq2seq] document the caveat of leaky native amp 8930 (stas00)
- Don't pass in token_type_ids to BART for GLUE 8929 (ethanjperez)
- Fix typo for `modeling_bert` import resulting in ImportError 8931 (machelreid)
- Fix QA pipeline on Windows 8947 (sgugger)
- Add TFGPT2ForSequenceClassification based on DialogRPT 8714 (spatil6)
- Remove sourcerer 8965 (clmnt)
- Use word_ids to get labels in run_ner 8962 (sgugger)
- Small fix to the run clm script 8973 (sgugger)
- Update quicktour docs to showcase the use of truncation 8975 (navjotts)
- Copyright 8970 (sgugger)
- Check table as independent script 8976 (LysandreJik)
- [training] SAVE_STATE_WARNING was removed in pytorch 8979 (stas00)
- Optional layers 8961 (jplu)
- Make `ModelOutput` pickle-able 8989 (sgugger)
- Fix interaction of return_token_type_ids and add_special_tokens 8854 (LysandreJik)
- Removed unused `encoder_hidden_states` and `encoder_attention_mask` 8972 (guillaume-be)
- Checking output format + check raises ValueError 8986 (Narsil)
- Templates overhaul 1 8993 (LysandreJik)
- Diverse beam search 2 9006 (patrickvonplaten)
- Fix link to stable version in the doc navbar 9007 (sgugger)
- Remove use of deprecated method in Trainer HP search 8996 (sgugger)
- Add the code_search_net datasets tag to CodeBERTa model cards 9005 (SBrandeis)
- fixes 8968 9009 (cronoik)
- Flax Masked Language Modeling training example 8728 (mfuntowicz)
- [Bart] Refactor - fix issues, consistency with the library, naming 8900 (patrickvonplaten)
- [wip] [ci] doc-job-skip take 4 dry-run 8980 (stas00)
- Fix typo in modeling_tf_bart 9020 (astariul-colanim)
- Fix documention of book in LayoutLM 9017 (sgugger)
- MPNet copyright files 9015 (sgugger)
- Enforce all objects in the main init are documented 9014 (sgugger)
- Refactor FLAX tests 9034 (sgugger)
- Remove value error 8985 (jplu)
- Change nn.dropout to layer.Dropout in TFBart 9047 (astariul-colanim)
- update tatoeba workflow 9051 (patil-suraj)
- Fix PreTrainedTokenizer.pad when first inputs are empty 9018 (sgugger)
- Remove docs only check 9065 (LysandreJik)
- Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/movement-pruning/lxmert 9062 (dependabot[bot])
- Make ProphetNetModel really compatible with EncoderDecoder 9033 (patrickvonplaten)
- Fix min_null_pred in the run_qa script 9067 (sgugger)
- [model_cards] Migrate cards from this repo to model repos on huggingface.co 9013 (julien-c)
- Fix embeddings resizing in TF models 8657 (jplu)
- Patch *ForCausalLM model with TF resize_token_embeddings 9092 (LysandreJik)
- [RAG, Bart] Align RAG, Bart cache with T5 and other models of transformers 9098 (patrickvonplaten)
- Fix variable name in TrainingArguments docstring 9096 (navjotts)
- Fix a broken link in documentation 9101 (SBrandeis)
- [CI doc] safely testing experimental CI features 9070 (stas00)
- Add parallelization support for T5EncoderModel 9082 (agemagician)
- Fix T5 and BART for TF 9063 (jplu)
- [finetune_trainer] enhancements and fixes 9042 (stas00)
- Fix a bug in eval_batch_retrieval of eval_rag.py 9089 (yoshitomo-matsubara)
- Clarify use of TrainingArguments.disable_tqdm in Jupyter Notebooks 9076 (lewtun)
- [doc] pytorch native amp leak fix landed in 1.7.1 9115 (stas00)
- Fix stack overflow 9114 (LysandreJik)
- Fix T5 model parallel test 9107 (LysandreJik)
- Fix tf2.4 9120 (jplu)
- Added TF OpenAi GPT1 Sequence Classification 9105 (spatil6)
- [TF Bart] Refactor TFBart 9029 (patrickvonplaten)
- Fix typo in trainer_tf.py 9132 (luckynozomi)
- [Bart] fix bart loss masking 9131 (patrickvonplaten)
- [Bart] Correct wrong order in shift token to right in Bart 9134 (patrickvonplaten)
- Fix Bart Shift 9135 (patrickvonplaten)
- Fix TF Transfo XL 9129 (jplu)
- [Examples] Add automatic dataset splitting in language-modeling examples 9133 (TevenLeScao)
- Fix T5 Encoder model parallel tests 9140 (LysandreJik)
- Add possibility to switch between APEX and AMP in Trainer 9137 (sgugger)
- [Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init 9054 (patrickvonplaten)
- DistilBertForSequenceClassification 9148 (AndreaSottana)
- Support for private models from huggingface.co 9141 (julien-c)
- Update notebook table and transformers intro notebook 9136 (sgugger)
- Add message to documentation that longformer doesn't support token_type_ids 9152 (HHousen)