Pegasus, mBART, DPR, self-documented outputs and new pipelines
Pegasus
The Pegasus model from [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu was added to the library in PyTorch.
The model was implemented as a collaboration between Jingqing Zhang and sshleifer in 6340:
- PegasusForConditionalGeneration (torch version) 6340
- add pegasus finetuning script 6811 [script](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/finetune_pegasus_xsum.sh) (warning: very slow)
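As a quick illustration of the new class, here is a minimal summarization sketch; the `google/pegasus-xsum` checkpoint name and the toy input are only assumptions, and any released Pegasus checkpoint should work the same way:

```python
# Minimal sketch (assumed checkpoint: google/pegasus-xsum).
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

text = "PG&E scheduled the blackouts in response to forecasts for high winds."
batch = tokenizer([text], truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```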
DPR
The DPR model from [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih was added to the library in PyTorch.
- Add DPR model 5279 (lhoestq)
- Fix tests imports dpr 5576 (lhoestq)
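Below is a hedged sketch of scoring a passage against a question with the new DPR encoders; the `facebook/dpr-*-single-nq-base` checkpoint names are assumptions based on the released Natural Questions models:

```python
# Hedged sketch: dense question/passage scoring with the DPR encoders.
import torch
from transformers import (
    DPRContextEncoder,
    DPRContextEncoderTokenizer,
    DPRQuestionEncoder,
    DPRQuestionEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
ctx_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question_emb = q_enc(**q_tok("Who wrote Hamlet?", return_tensors="pt"))[0]
passage_emb = ctx_enc(**ctx_tok("Hamlet is a tragedy written by William Shakespeare.", return_tensors="pt"))[0]

# Relevance is the dot product between the two dense vectors.
score = torch.matmul(question_emb, passage_emb.T)
print(score)
```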
DeeBERT
The DeeBERT model from [DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference](https://www.aclweb.org/anthology/2020.acl-main.204/) by Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin has been added to the `examples/` folder alongside its training script, in PyTorch.
- Add DeeBERT (entropy-based early exiting for *BERT) 5477 (Ji-Xin)
Self-documented outputs
In addition to returning tuples, PyTorch and TensorFlow models can now return an appropriate subclass of `ModelOutput`. A `ModelOutput` is a dataclass containing all of a model's returns, which allows for easier inspection and self-documenting model outputs.
- Change model outputs types to self-document outputs 5438 (sgugger)
- Tf model outputs 6247 (sgugger)
Models return tuples by default, and return self-documented outputs if the `return_dict` configuration flag is set to `True` or if the `return_dict=True` keyword argument is passed to the forward/call method.
Summary of the behavior:
```python
from transformers import BertForSequenceClassification

# The new outputs are opt-in: you have to activate them explicitly with `return_dict=True`,
# either at instantiation
model = BertForSequenceClassification.from_pretrained('bert-base-cased', return_dict=True)
# or when calling the model (`inputs` is assumed to be a tokenized batch)
outputs = model(**inputs, return_dict=True)

# You can access the elements of the outputs with
# (1) named attributes
loss = outputs.loss
logits = outputs.logits

# (2) their names as strings, like a dict
loss = outputs["loss"]
logits = outputs["logits"]

# (3) their index as integers or slices, as in the pre-3.1.0 output tuples
loss = outputs[0]
logits = outputs[1]
loss, logits = outputs[:2]

# One **breaking behavior** of these new outputs (which is the reason you have to opt in
# to use them): iterating on the outputs now returns the names (keys) instead of the values:
print([element for element in outputs])
# >>> ['loss', 'logits']

# Thus you cannot unpack the outputs as in pre-3.1.0 (you get the string names instead of
# the values), but you can still query a slice as shown in (3) above:
loss_key, logits_key = outputs
```
Encoder-Decoder framework
The encoder-decoder framework has been enhanced to allow more encoder-decoder model combinations, *e.g.*:
Bert2Bert, Bert2GPT2, Roberta2Roberta, Longformer2Roberta, ... (a usage sketch follows the list below).
- [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer 6411 (patrickvonplaten)
- [EncoderDecoder] Add Cross Attention for GPT2 6415 (patrickvonplaten)
- [EncoderDecoder] Add functionality to tie encoder decoder weights 6538 (patrickvonplaten)
- Multiple combinations of EncoderDecoder models have been fine-tuned and evaluated on CNN/Daily-Mail summarization: https://huggingface.co/models?search=cnn_dailymail-fp16 (patrickvonplaten)
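As a quick illustration of such a combination, here is a minimal, hedged sketch of warm-starting a Bert2Bert model; the checkpoint name and toy target are only illustrative, and it assumes the composed decoder accepts `labels` to compute a language-modeling loss:

```python
# Minimal sketch: warm-start an encoder-decoder ("Bert2Bert") model from two BERT checkpoints.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

inputs = tokenizer("This is a toy training example.", return_tensors="pt")
# For this illustration the target is simply the input itself; in practice you would
# feed the summary/translation the decoder should learn to produce.
outputs = model(
    input_ids=inputs.input_ids,
    decoder_input_ids=inputs.input_ids,
    labels=inputs.input_ids,
)
loss = outputs[0]
```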
TensorFlow as a first-class citizen
As we continue working towards making TensorFlow a first-class citizen, we keep improving our TensorFlow API and models.
- [Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile 5395 (patrickvonplaten)
- [Benchmark] Add benchmarks for TF Training 5594 (patrickvonplaten)
Machine Translation
MarianMTModel
- [en-zh](https://huggingface.co/Helsinki-NLP/opus-mt-en-zh?text=My+name+is+Wolfgang+and+I+live+in+Berlin) and **357** other checkpoints for machine translation were added from the Helsinki-NLP group's Tatoeba Project (sshleifer + jorgtied). There are now more than 1,300 supported pairs for machine translation (a usage sketch follows the list below).
- Marian converter updates 6342 (sshleifer)
- Marian distill scripts + integration test 6799 (sshleifer)
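As mentioned above, here is a minimal translation sketch using the en-zh checkpoint linked in the list; any other Opus-MT pair should work the same way:

```python
# Minimal translation sketch with one of the new Tatoeba/Opus-MT checkpoints (en -> zh).
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["My name is Wolfgang and I live in Berlin"], return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```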
mBART
The mBART model from [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) can now be accessed through `MBartForConditionalGeneration`.
- Add mbart-large-cc25, support translation finetuning 5129 (sshleifer)
- [mbart] prepare_translation_batch passes **kwargs to allow DeprecationWarning 5581 (sshleifer)
- MBartForConditionalGeneration 6441 (patil-suraj)
- [fix] mbart_en_ro_generate test now identical to fairseq 5731 (sshleifer)
- [Doc] explaining romanian postprocessing for MBART BLEU hacking 5943 (sshleifer)
- [test] partial coverage for train_mbart_enro_cc25.sh 5976 (sshleifer)
- MbartTokenizer: do not hardcode vocab size 5998 (sshleifer)
- MBART: support summarization tasks where max_src_len > max_tgt_len 6003 (sshleifer)
- Fix 6096: MBartTokenizer's mask token 6098 (sshleifer)
- [s2s] Document better mbart finetuning command 6229 (sshleifer)
- mBART Conversion script 6230 (sshleifer)
- [s2s] add BartTranslationDistiller for distilling mBART 6363 (sshleifer)
- [Doc] add more MBart and other doc 6490 (patil-suraj)
examples/seq2seq
- examples/seq2seq/finetune.py supports --task translation
- All sequence-to-sequence tokenizers (T5, Bart, Marian, Pegasus) expose a `prepare_seq2seq_batch` method that makes batches for sequence-to-sequence training (as sketched below).
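A minimal sketch of `prepare_seq2seq_batch`, shown here with `MarianTokenizer`; the exact keys of the returned batch are an assumption, and the other seq2seq tokenizers expose the same method:

```python
# Minimal sketch: build a seq2seq training batch with prepare_seq2seq_batch.
from transformers import MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ro")
batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["I live in Berlin."],
    tgt_texts=["Locuiesc în Berlin."],
    return_tensors="pt",
)
# The returned encoding holds the tokenized source and target sequences
# (input ids, attention mask and target ids), ready for a seq2seq forward pass.
```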
PRs:
- Seq2SeqDataset uses linecache to save memory 5792 (Pradhy729)
- [examples/seq2seq]: add --label_smoothing option 5919 (sshleifer)
- seq2seq/run_eval.py can take decoder_start_token_id 5949 (sshleifer)
- [examples (seq2seq)] fix preparing decoder_input_ids for T5 5994 (patil-suraj)
- [s2s] add support for overriding config params 6149 (stas00)
- s2s: fix LR logging, remove some dead code. 6205 (sshleifer)
- [s2s] tiny QOL improvement: run_eval prints scores 6341 (sshleifer)
- [s2s] fix label_smoothed_nll_loss 6344 (patil-suraj)
- [s2s] fix --gpus clarg collision 6358 (sshleifer)
- [s2s] Script to save wmt data to disk 6403 (sshleifer)
- rename prepare_translation_batch -> prepare_seq2seq_batch 6103 (sshleifer)
- Mult rouge by 100: standard units 6359 (sshleifer)
- allow spaces in bash args with "$" 6521 (sshleifer)
- [seq2seq] MAX_LEN env var for MT commands 5837 (sshleifer)
- [seq2seq] distillation.py accepts trainer arguments 5865 (sshleifer)
- [s2s]Use prepare_translation_batch for Marian finetuning 6293 (sshleifer)
- [BartTokenizer] add prepare s2s batch 6212 (patil-suraj)
- [T5Tokenizer] add prepare_seq2seq_batch method 6122 (patil-suraj)
- [s2s] round runtime in run_eval 6798 (sshleifer)
- [s2s README] Add more dataset download instructions 6737 (sshleifer)
- [s2s] round bleu, rouge to 4 digits 6704 (sshleifer)
- [s2s] command line args for faster val steps 6833
New documentation
Several new documentation pages have been added, and older documentation has been tweaked to be more accurate and understandable. An "Open in Colab" button has been added to the tutorial pages.
- Guide to fixed-length model perplexity evaluation 5449 (joeddav)
- Improvements to PretrainedConfig documentation 5642 (sgugger)
- Document model outputs 5673 (sgugger)
- docs(wandb): explain how to use W&B integration 5607 (borisdayma)
- Model utils doc 6005 (sgugger)
- ONNX documentation 5992 (mfuntowicz)
- Tokenizer documentation 6110 (sgugger)
- Pipeline documentation 6175 (sgugger)
- Encoder decoder config docs 6195 (afcruzs)
- Colab button 6389 (sgugger)
- Generation documentation 6470 (sgugger)
- Add custom datasets tutorial 6466 (joeddav)
- Logging documentation 6852 (sgugger)
Trainer updates
New additions to the `Trainer`:
- Added data collator for permutation (XLNet) language modeling and related calls 5522 (shngt)
- Trainer support for iterabledataset 5834 (Pradhy729)
- Adding PaddingDataCollator 6442 (sgugger)
- Add hyperparameter search to Trainer 6576 (sgugger)
- [examples] Add trainer support for question-answering 4829 (patil-suraj)
- Adds comet_ml to the list of auto-experiment loggers 6176 (dsblank)
- Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task 6644 (HuangLianzhe)
New models & model architectures
The following model architectures have been added to the library:
- FlaubertForTokenClassification 5644 (stas00)
- TFXLMForTokenClassification 5614 (LysandreJik)
- TFXLMForMultipleChoice 5614 (LysandreJik)
- TFFlaubertForTokenClassification 5614 (LysandreJik)
- TFFlaubertForMultipleChoice 5614 (LysandreJik)
- TFElectraForSequenceClassification 6227 (jplu)
- TFElectraForMultipleChoice 6227 (jplu)
- TF Longformer 5764 (patrickvonplaten)
- CamembertForCausalLM 6577 (patil-suraj)
Regression testing on TPU & TPU CI
Thanks to zcain117, we now have access to a TPU CI for the PyTorch/XLA framework. This enables regression testing on the TPU aspects of the `Trainer` and offers very simple regression testing on model training performance.
- Test XLA examples 5583
- Add setup for TPU CI to run every hour. 6219 (zcain117)
- Add missing docker arg for TPU CI. 6393 (zcain117)
- Get GKE logs via kubectl logs instead of gcloud logging read. 6446 (zcain117)
New pipelines
New pipelines have been added (a usage sketch of the zero-shot classification pipeline follows the list):
- Zero shot classification pipeline 5760 (joeddav)
- Addition of a DialoguePipeline 5516 (guillaume-be)
- Add targets arg to fill-mask pipeline 6239 (joeddav)
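Below is a minimal sketch of the zero-shot classification pipeline; no checkpoint name is given, so the pipeline's default model is downloaded automatically:

```python
# Minimal sketch of the new zero-shot classification pipeline.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The team scored in the last minute to win the championship.",
    candidate_labels=["sports", "politics", "science"],
)
# `result` maps the candidate labels to scores, sorted from most to least likely.
print(result["labels"][0], result["scores"][0])
```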
Community notebooks
- [Fine-tune Electra and interpret with Integrated Gradients](https://github.com/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb) #6321 (elsanns)
- Update ONNX notebook to include section on quantization. 6831 (mfuntowicz)
Centralized logging
Logging is now centralized. The library offers methods to manage the verbosity level of all of its loggers (a minimal sketch follows the PR below). [Link to logging doc here]:
- Centralize logging 6434 (LysandreJik)
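A minimal sketch of the new verbosity controls; the `transformers.utils.logging` module path is assumed from the PR above:

```python
# Minimal sketch: set the verbosity for every logger in the library at once.
from transformers.utils import logging

logging.set_verbosity_info()   # show INFO-level messages from all transformers loggers
logging.set_verbosity_error()  # or silence everything below ERROR
```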
Bug fixes and improvements
- [Reformer] Adapt Reformer MaskedLM Attn mask 5560 (patrickvonplaten)
- Make T5 compatible with ONNX 5518 (abelriboulot)
- [Bart] enable test_torchscript, update test_tie_weights 5457 (sshleifer)
- [docs] fix model_doc links in model summary 5566 (patil-suraj)
- [Benchmark] Readme for benchmark 5363 (patrickvonplaten)
- Fix Inconsistent NER Grouping (Pipeline) 4987 (enzoampil)
- QA pipeline BART compatible 5496 (mfuntowicz)
- More explicit error when failing to tensorize overflowing tokens 5633 (LysandreJik)
- Should check that torch TPU is available 5636 (LysandreJik)
- Add forum link in the docs 5637 (sgugger)
- Fixed TextGenerationPipeline on torch + GPU 5629 (TevenLeScao)
- Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) 5632 (TevenLeScao)
- [squad] add version tag to squad cache 5669 (lazovich)
- Deprecate old past arguments 5671 (sgugger)
- Pipeline model type check 5679 (JetRunner)
- rename the functions to match the rest of the test convention 5692 (stas00)
- doc improvements 5688 (stas00)
- Fix Trainer in DataParallel setting 5685 (sgugger)
- [Longformer] fix longformer global attention output 5659 (patrickvonplaten)
- [Fix] github actions CI by reverting 5138 5686 (sshleifer)
- [Reformer classification head] Implement the reformer model classification head for text classification 5198 (as-stevens)
- Cleanup bart caching logic 5640 (sshleifer)
- [AutoModels] Fix config params handling of all PT and TF AutoModels 5665 (patrickvonplaten)
- [cleanup] T5 test, warnings 5761 (sshleifer)
- [fix] T5 ONNX test: model.to(torch_device) 5769 (mfuntowicz)
- [Benchmark] fix benchmark non standard model 5801 (patrickvonplaten)
- [Benchmark] Fix models without `architectures` param in config 5808 (patrickvonplaten)
- [Longformer] fix longformer slow-down 5811 (patrickvonplaten)
- [seq2seq] pack_dataset.py rewrites dataset in max_tokens format 5819 (sshleifer)
- [seq2seq] Don't copy self.source in sortishsampler 5818 (sshleifer)
- [cleanups] make Marian save as Marian 5830 (sshleifer)
- [Reformer] - Cache hidden states and buckets to speed up inference 5578 (patrickvonplaten)
- Lightning Updates for v0.8.5 5798 (nateraw)
- Update tokenizers to 0.8.1.rc to fix Mac OS X issues 5867 (sepal)
- Xlnet outputs 5883 (TevenLeScao)
- DataParallel fixes 5733 (stas00)
- [cleanup] squad processor 5868 (sshleifer)
- Improve doc of use_cache 5912 (sgugger)
- [Fix] seq2seq pack_dataset.py actually packs 5913 (sshleifer)
- Add AlbertForPretraining to doc 5914 (sgugger)
- DataParallel fix: multi gpu evaluation 5926 (csarron)
- Clarify arg class 5916 (sgugger)
- [CI] self-scheduled runner tests examples/ 5927 (sshleifer)
- Update doc to new model outputs 5946 (sgugger)
- [CI] Install examples/requirements.txt 5956 (sshleifer)
- Expose padding_strategy on squad processor to fix QA pipeline performance regression 5932 (mfuntowicz)
- [docs] Add integration test example to copy pasta template 5961 (sshleifer)
- Cleanup Trainer and expose customization points 5982 (sgugger)
- Avoid unnecessary warnings when loading pretrained model 5922 (sgugger)
- Ensure OpenAI GPT position_ids is correctly initialized and registered at init. 5773 (mfuntowicz)
- [CI] Don't test apex 6021 (sshleifer)
- add a summary report flag for run_examples on CI 6035 (stas00)
- don't complain about missing W&B when WANDB_DISABLED=true 6036 (stas00)
- Allow to set Adam beta1, beta2 in TrainingArgs 5592 (gonglinyuan)
- Fix the return documentation rendering for all model outputs 6022 (sgugger)
- Fix typo (model saving TF) 5734 (Colanim)
- Add new AutoModel classes in pipeline 6062 (patil-suraj)
- [pack_dataset] don't sort before packing, only pack train 5954 (sshleifer)
- CL util to convert models to fp16 before upload 5953 (sshleifer)
- Add fire to setup.cfg to make isort happy 6066 (sgugger)
- [fix] no warning for position_ids buffer 6063 (sshleifer)
- Pipelines should use tuples instead of namedtuples 6061 (LysandreJik)
- Moving transformers package import statements to relative imports in some files 5796 (afcruzs)
- github issue template suggests who to tag 5790 (sshleifer)
- Make all data collators accept dict 6065 (sgugger)
- Add inference widget examples 5825 (clmnt)
- [s2s] Delete useless method, log tokens_per_batch 6081 (sshleifer)
- Logs should not be hidden behind a logger.info 6097 (LysandreJik)
- Fix zero-shot pipeline single seq output shape 6104 (joeddav)
- [fix] add bart to LM_MAPPING 6099 (sshleifer)
- [Fix] position_ids tests again 6100 (sshleifer)
- Fix deebert tests 6102 (sshleifer)
- Use FutureWarning to deprecate 6111 (sgugger)
- Added capability to quantize a model while exporting through ONNX. 6089 (mfuntowicz)
- XLNet PLM Readme 6121 (LysandreJik)
- Fix TF CTRL model naming 6134 (jplu)
- Use google style to document properties 6130 (sgugger)
- Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} 5614
- Rework TF trainer 6038 (jplu)
- Actually the extra_id are from 0-99 and not from 1-100 5967 (orena1)
- add another e.g. to avoid confusion 6055 (orena1)
- Tf trainer cleanup 6143 (sgugger)
- Switch from return_tuple to return_dict 6138 (sgugger)
- Fix FlauBERT GPU test 6142 (LysandreJik)
- Enable ONNX/ONNXRuntime optimizations through converter script 6131 (mfuntowicz)
- Add Pytorch Native AMP support in Trainer 6151 (prajjwal1)
- enable easy checkout switch 5645 (stas00)
- Replace mecab-python3 with fugashi for Japanese tokenization 6086 (polm)
- parse arguments from dict 4869 (patil-suraj)
- Harmonize both Trainers API 6157 (sgugger)
- Model output test 6155 (sgugger)
- [s2s] clean up + doc 6184 (stas00)
- Add script to convert BERT tf2.x checkpoint to PyTorch 5791 (mar-muel)
- Empty assert hunt 6056 (TevenLeScao)
- Fix saved model creation 5468 (jplu)
- Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. 5331 (jaymody)
- [DataCollatorForLanguageModeling] fix labels 6213 (patil-suraj)
- Fix _shift_right function in TFT5PreTrainedModel 6214 (maurice-g)
- Remove outdated BERT tips 6217 (JetRunner)
- run_hans label fix 6221 (VictorSanh)
- Make the order of additional special tokens deterministic 5704 (gonglinyuan)
- cleanup torch unittests 6196 (stas00)
- test_tokenization_common.py: Remove redundant coverage 6224 (sshleifer)
- [Reformer] fix reformer fp16 test 6237 (patrickvonplaten)
- [Reformer] Make random seed generator available on random seed and not on model device 6244 (patrickvonplaten)
- Update to match renamed attributes in fairseq master 5972 (LilianBordeau)
- [WIP] lightning_base: support --lr_scheduler with multiple possibilities 6232 (stas00)
- Trainer + wandb quality of life logging tweaks 6241 (TevenLeScao)
- Add strip_accents to basic BertTokenizer. 6280 (PhilipMay)
- Argument to set GPT2 inner dimension 6296 (TevenLeScao)
- [Reformer] fix default generators for pytorch < 1.6 6300 (patrickvonplaten)
- Remove redundant line in run_pl_glue.py 6305 (xujiaze13)
- [Fix] text-classification PL example 6027 (bhashithe)
- fix the shuffle agrument usage and the default 6307 (stas00)
- CI dependency wheel caching 6287 (LysandreJik)
- Patch GPU failures 6281 (LysandreJik)
- fix consistency CrossEntropyLoss in modeling_bart 6265 (idoh)
- Add a script to check all models are tested and documented 6298 (sgugger)
- Fix the tests for Electra 6284 (jplu)
- [examples] consistently use --gpus, instead of --n_gpu 6315 (stas00)
- refactor almost identical tests 6339 (stas00)
- Small docfile fixes 6328 (sgugger)
- Patch models 6326 (LysandreJik)
- Ci GitHub caching 6382 (LysandreJik)
- Fix links for open in colab 6391 (sgugger)
- [EncoderDecoderModel] add a `add_cross_attention` boolean to config 6377 (patrickvonplaten)
- Feed forward chunking 6024 (Pradhy729)
- add pl_glue example test 6034 (stas00)
- testing utils: capturing std streams context manager 6231 (stas00)
- Fix tokenizer saving and loading error 6026 (yobekiko)
- Warn if debug requested without TPU 6390 (dmlap)
- [Performance improvement] "Bad tokens ids" optimization 6064 (guillaume-be)
- pl version: examples/requirements.txt is single source of truth 6309 (stas00)
- [s2s] wmt download script use less ram 6405 (stas00)
- [pl] restore lr logging behavior for glue, ner examples 6314 (stas00)
- lr_schedulers: add get_polynomial_decay_schedule_with_warmup 6361 (stas00)
- [examples] add pytest dependency 6425 (sshleifer)
- [test] replace capsys with the more refined CaptureStderr/CaptureStdout 6422 (stas00)
- Fixes to make life easier with the nlp library 6423 (sgugger)
- Move prediction_loss_only to TrainingArguments 6426 (sgugger)
- Activate check on the CI 6427 (sgugger)
- cleanup tf unittests: part 2 6260 (stas00)
- Fix docs and bad word tokens generation_utils.py 6387 (ZhuBaohe)
- Test model outputs equivalence 6445 (LysandreJik)
- add LongformerTokenizerFast in AutoTokenizer 6463 (patil-suraj)
- add BartTokenizerFast in AutoTokenizer 6464 (patil-suraj)
- Add POS tagging and Phrase chunking token classification examples 6457 (vblagoje)
- Clean directory after script testing 6453 (JetRunner)
- Use hash to clean the test dirs 6475 (JetRunner)
- Sort unique_no_split_tokens to make it deterministic 6461 (lhoestq)
- Fix TPU Convergence bug 6488 (jysohn23)
- Support additional dictionaries for BERT Japanese tokenizers 6515 (singletongue)
- [doc] Summary of the models fixes 6511 (stas00)
- Remove deprecated assertEquals 6532 (JetRunner)
- [testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() 6494 (stas00)
- [sched] polynomial_decay_schedule use default power=1.0 6473 (stas00)
- Fix flaky ONNX tests 6531 (mfuntowicz)
- [doc] make the text more readable, fix some typos, add some disambiguation 6508 (stas00)
- [doc] multiple corrections to "Summary of the tasks" 6509 (stas00)
- replace _ with __ rst links 6541 (stas00)
- Fixed label datatype for STS-B 6492 (amodaresi)
- fix incorrect codecov reports 6553 (stas00)
- [docs] Fix wrong newline in the middle of a paragraph 6573 (romainr)
- [docs] Fix number of 'ug' occurrences in tokenizer_summary 6574 (romainr)
- add BartConfig.force_bos_token_to_be_generated 6526 (sshleifer)
- Fix bart base test 6587 (sshleifer)
- Feed forward chunking others 6365 (Pradhy729)
- tf generation utils: remove unused kwargs 6591 (sshleifer)
- [BartTokenizerFast] add prepare_seq2seq_batch 6543 (patil-suraj)
- [doc] lighter 'make test' 6512 (stas00)
- [docs] Copy code button misses '...' prefixed code 6518 (romainr)
- removed redundant arg in prepare_inputs 6614 (prajjwal1)
- add intro to nlp lib & dataset links to custom datasets tutorial 6583 (joeddav)
- Add tests to Trainer 6605 (sgugger)
- TFTrainer dataset doc & fix evaluation bug 6618 (joeddav)
- Add tests/test_tokenization_reformer.py 6485 (D-Roberts)
- [Tests] fix attention masks in Tests 6621 (patrickvonplaten)
- XLNet Bug when training with apex 16-bit precision 6567 (johndolgov)
- Move threshold up for flaky test with Electra 6622 (sgugger)
- Regression test for pegasus bugfix 6606 (sshleifer)
- Trainer automatically drops unused columns in nlp datasets 6449 (sgugger)
- [Docs model summaries] Add pegasus to docs 6640 (patrickvonplaten)
- [Doc model summary] add MBart model summary 6649 (patil-suraj)
- Specify config filename in HfArgumentParser 6626 (jarednielsen)
- Don't reset the dataset type + plug for rm unused columns 6683 (sgugger)
- Fixed DataCollatorForLanguageModeling not accepting lists of lists 6685 (TevenLeScao)
- Update repo to isort v5 6686 (sgugger)
- Fix PL token classification examples 6682 (vblagoje)
- Lat fix for Ray HP search 6691 (sgugger)
- Create PULL_REQUEST_TEMPLATE.md 6660 (stas00)
- [doc] remove BartForConditionalGeneration.generate 6659 (stas00)
- Move unused args to kwargs 6694 (sgugger)
- [fixdoc] Add import to pegasus usage doc 6698 (sshleifer)
- Fix hyperparameter_search doc 6695 (sgugger)
- Remove hard-coded uses of float32 to fix mixed precision use 6648 (schmidek)
- Add DPR to models summary 6690 (lhoestq)
- Add typing.overload for convert_ids_tokens 6637 (tamuhey)
- Allow tests in examples to use cuda or fp16,if they are available 5512 (Joel-hanson)
- ci/gh/self-scheduled: add newline to make examples tests run even if src/ tests fail 6706 (sshleifer)
- Use separate tqdm progressbars 6696 (sgugger)
- More tests to Trainer 6699 (sgugger)
- Add tokenizer to Trainer 6689 (sgugger)
- tensor.nonzero() is deprecated in PyTorch 1.6 6715 (mfuntowicz)
- [Albert] Add position ids to allowed uninitialized weights 6719 (patrickvonplaten)
- Fix ONNX test_quantize unittest 6716 (mfuntowicz)
- [squad] make examples and dataset accessible from SquadDataset object 6710 (lazovich)
- Fix pegasus-xsum integration test 6726 (sshleifer)
- T5Tokenizer adds EOS token if not already added 5866 (sshleifer)
- Install nlp for github actions test 6728 (sgugger)
- [Torchscript] Fix docs 6740 (patrickvonplaten)
- Add "tie_word_embeddings" config param 6692 (patrickvonplaten)
- Fix tf boolean mask in graph mode 6741 (JayYip)
- Fix TF optimizer 6717 (jplu)
- [TF Longformer] Improve Speed for TF Longformer 6447 (patrickvonplaten)
- add __init__.py to utils 6754 (joeddav)
- [s2s] run_eval.py QOL improvements and cleanup 6746 (sshleifer)
- s2s distillation uses AutoModelForSeqToSeqLM 6761 (sshleifer)
- Add AdaFactor optimizer from fairseq 6722 (moscow25)
- Adds Adafactor to the docs and slightly fixes the formatting 6765 (LysandreJik)
- Fix the TF Trainer gradient accumulation and the TF NER example 6713 (jplu)
- Fix run_squad.py to work with BART 6756 (tomgrek)
- Add NLP install to self-scheduled CI 6767 (sshleifer)
- [testing] replace hardcoded paths to allow running tests from anywhere 6523 (stas00)
- [test schedulers] adjust to test the first step's reading 6429 (stas00)
- new Makefile target: docs 6510 (stas00)
- [transformers-cli] fix logger getter 6777 (stas00)
- PL: --adafactor option 6776 (sshleifer)
- [style] set the minimal required version for `black` 6784 (stas00)
- Transformer-XL: Improved tokenization with sacremoses 6322 (RafaelWO)
- prepare_seq2seq_batch makes labels/ decoder_input_ids made later. 6654 (sshleifer)
- t5 model should make decoder_attention_mask 6800 (sshleifer)
- [s2s] Test hub configs in self-scheduled CI 6809 (sshleifer)
- [bart] rename self-attention -> attention 6708 (sshleifer)
- [tests] fix typos in inputs 6818 (stas00)
- Fixed open in colab link 6825 (PandaWhoCodes)
- clarify shuffle 6312 (xujiaze13)
- TF Flaubert w/ pre-norm 6841 (LysandreJik)
- Fix resuming training for Windows 6847 (sgugger)
- Only access loss tensor every logging_steps 6802 (jysohn23)
- Add checkpointing to Ray Tune HPO 6747 (krfricke)
- Split hp search methods 6857 (sgugger)
- Fix marian slow test 6854 (sshleifer)
- Bart can make decoder_input_ids from labels 6758 (sshleifer)
- add a final report to all pytest jobs 6861 (stas00)
- Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. 6875 (mfuntowicz)
- [Generate] Facilitate PyTorch generate using `ModelOutputs` 6735 (patrickvonplaten)