Transformers


3.5.0

Model versioning, TensorFlow encoder-decoder models, new scripts, refactor of the `generate` method

Model versioning

We host more and more of the community's models, which is awesome ❤️. To scale this sharing, we needed to change the infrastructure to both support more models and unlock powerful new features.

To that effect, we have rebuilt the storage backend that we use for models (currently S3) into **our own git repos (using S3 as a git-lfs endpoint for large files), with one model = one repo**.

The benefits of this switch are:

* **built-in versioning** (I mean… it's git. It's pretty much what you use for versioning. Versioning in S3 has a ton of limitations)
* **access control** (will unlock private models, private datasets, etc)
* **scalability** (our usage of S3 to maintain lists of models was starting to bottleneck)

Let's dive into the actual changes:

I. On the website
---

You'll now see a "Browse files and versions" tab or button on each model page. (design is not final, we'll make it more prominent/streamlined in the near future)

This is what this page looks like:

![Screenshot 2020-11-06 at 19.23.05|584x500](https://aws1.discourse-cdn.com/standard14/uploads/hellohellohello/optimized/1X/02272b7436c2db622eb873d9ca2b61349cdebfa6_2_1168x1000.jpeg)

**The UX should look familiar and self-explanatory**, but we'll add more ML-specific features in the future.

You can:
* see commit histories and diffs of changes made to any text file, like `config.json`:
  * changes made by the HuggingFace team will be way clearer – we can perform updates to the models to ensure they work well with the library(ies) (you'll be able to opt out from those changes)
* Large binary files are stored using https://git-lfs.github.com/, which is pretty standard now and interoperable out of the box with git
* update your text files, like your README.md model card, directly on the website!
  * with instant preview 🔥

II. In the transformers library
---

The PR to enable this new storage mode in the `transformers` library is available here: **https://github.com/huggingface/transformers/pull/8324**

This PR has two parts:

**1. changes to the file downloading code** used in `from_pretrained()` methods to use the new file URLs.
Large files are stored in an S3 bucket and served by CloudFront, so downloads should be as fast as they are right now.

In addition, you now have a way to pin a specific version of a model, to a commit hash, tag or branch.

For instance:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "julien-c/EsperBERTo-small",
    revision="v2.0.1",  # tag name, or branch name, or commit hash
)
```

Finally, the networking code is more robust and doesn't gobble up errors anymore, so in case you have trouble downloading a specific file you'll know exactly why.
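
The same `revision` argument is accepted by the model `from_pretrained()` methods as well. A minimal sketch, reusing the repo and tag from the snippet above:

```python
from transformers import AutoModel

# Pin the model weights to the same tag used for the tokenizer above
model = AutoModel.from_pretrained("julien-c/EsperBERTo-small", revision="v2.0.1")
```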

**2. changes to the model upload CLI** to create a model repo then be able to git clone and git push to it.
We are intentionally not wrapping `git` too much, because we expect most model authors to be familiar with git (and possibly git-lfs) – let us know if that's not the case.

To create a repo:
```bash
transformers-cli repo create your-model-name
```


Then you'll get a repo url that you'll be able to clone:
```bash
git clone https://huggingface.co/username/your-model-name
```

Then commit as usual:

```bash
cd your-model-name
echo "hello" >> README.md
git add . && git commit -m "Update from $USER"
```

A nice side effect of the new system on the upload side is that file uploading should be more robust for very large files (hello T5!) as git-lfs handles the networking code.
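
As a hedged sketch of that upload flow (the DistilBERT checkpoint below just stands in for whatever model you fine-tuned, and we assume the created repo tracks large files with git-lfs), you can save a model and tokenizer straight into the clone before committing and pushing:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stand-in checkpoint; any model/tokenizer pair works the same way
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Writes config.json, pytorch_model.bin, tokenizer files, etc. into the local clone;
# git-lfs then handles the large binary when you `git add`, `git commit` and `git push`
model.save_pretrained("your-model-name")
tokenizer.save_pretrained("your-model-name")
```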

By the way, again, **every model is its own repo**. So you can git clone any public model if you'd like:

```bash
git clone https://huggingface.co/gpt2
```


But you won't be able to push unless it's one of your models (or one of your orgs').

III. Backward compatibility
---

* **Backward compatibility on model downloads is expected**, because even though the new models will be stored in huggingface.co-hosted git repos, we will backport all file changes to S3 automatically.
* **⚠️ Model uploads using the current system won't work anymore**: you'll need to upgrade your transformers installation to the next release, `v3.5.0`, or to build from `master`.
Alternatively, in the next week or so we'll add the ability to create a repo from the website directly so you'll be able to push even without the transformers library.

TFMarian, TFMbart, TFPegasus, TFBlenderbot

- Add tensorflow 2.0 functionality for SOTA seq2seq transformers 7987 (sshleifer)


New and updated scripts

We're working on giving examples of how to leverage the [🤗 Datasets](https://github.com/huggingface/datasets) library and the Trainer API. These scripts are meant as examples that are easy to customize, with lots of comments explaining the various steps. The following tasks are now covered (a minimal sketch of the common pattern follows the list):

- Text classification : New run glue script 7917 (sgugger)
- Causal Language Modeling: New run_clm script 8105 (sgugger)
- Masked Language Modeling: Add line by line option to mlm/plm scripts 8240 (sgugger)
- Token classification: Add new token classification example 8340 (sgugger)
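
As a rough sketch of the pattern these scripts follow (not one of the scripts themselves; the checkpoint, dataset and hyperparameters are only illustrative): load a dataset with 🤗 Datasets, tokenize it, and hand it to the `Trainer`.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative task: SST-2 sentiment classification
raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length")

encoded = raw_datasets.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="output", num_train_epochs=1),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```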

Seq2Seq Trainer

A child of `Trainer` specialized for training seq2seq models, from patil-suraj, stas00 and sshleifer. Accessible through `examples/seq2seq/finetune_trainer.py`. API is similar to `examples/seq2seq/finetune.py`, but API support is better. Example scripts are in `examples/seq2seq/builtin_trainer`.


- [seq2seq testing] multigpu test run via subprocess 7281 (stas00)
- [s2s trainer] tests to use distributed on multi-gpu machine 7965 (stas00)
- [Seq2Seq] Allow EncoderDecoderModels to be trained with Seq2Seq 7809 (patrickvonplaten)
- [Seq2Seq Trainer] Make sure padding is implemented for models without pad_token 8043 (patrickvonplaten)
- [Seq2SeqTrainer] Move import to init to make file self-contained 8194 (patrickvonplaten)
- [s2s test] cleanup 8131 (stas00)
- [Seq2Seq] Correct import in Seq2Seq Trainer 8254 (patrickvonplaten)
- [Seq2Seq] Make Seq2SeqArguments an independent file 8267 (patrickvonplaten)
- [Seq2SeqDataCollator] don't pass add_prefix_space=False to all tokenizers 8329 (sshleifer)

Seq2Seq Testing and Documentation Improvements
- [s2s] create doc for pegasus/fsmt replication 7934 (stas00)
- [s2s] test_distributed_eval 8315 (stas00)
- [s2s] test_bash_script.py - actually learn something 8318 (stas00)
- [s2s examples test] fix data path 8398 (stas00)
- [s2s test_finetune_trainer] failing multigpu test 8400 (stas00)
- [s2s/distill] remove run_distiller.sh, fix xsum script 8412 (sshleifer)


Docs for DistilBART Paper Replication
Re-run experiments from [the paper](https://arxiv.org/abs/2010.13002) [here](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md#distilbart).
- [s2s] distillBART docs for paper replication 8150 (sshleifer)

Refactoring the `generate()` function

The `generate()` method now has a new design so that the user can directly call the methods
`sample()`, `greedy_search()`, `beam_search()` and `beam_sample()`. The code was made more readable, and beam search was sped up by *ca.* 5-10%.

Refactoring the generate() function 6949 (patrickvonplaten)
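
A hedged sketch of the resulting user-facing behavior (GPT-2 and the prompt are just placeholders); `generate()` remains the main entry point and dispatches to the new methods:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
input_ids = tokenizer("Hello, my name is", return_tensors="pt")["input_ids"]

greedy = model.generate(input_ids, max_length=20)                                      # -> greedy_search()
sampled = model.generate(input_ids, max_length=20, do_sample=True)                     # -> sample()
beam = model.generate(input_ids, max_length=20, num_beams=4)                           # -> beam_search()
beam_sampled = model.generate(input_ids, max_length=20, num_beams=4, do_sample=True)   # -> beam_sample()

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
```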

Notebooks

- added qg evaluation notebook 7958 (zolekode)
- adding beginner-friendly notebook on text classification with DistilBERT/TF 7964 (peterbayerle)
- [Notebooks] Add new encoder-decoder notebooks 8246 (patrickvonplaten)

General improvements and bugfixes

- Respect the 119 line chars 7928 (LysandreJik)
- PPL guide code snippet minor fix 7938 (joeddav)
- [ProphetNet] Add Question Generation Model + Test 7942 (patrickvonplaten)
- [multiple models] skip saving/loading deterministic state_dict keys 7878 (stas00)
- Add missing comma 7870 (mrm8488)
- TensorBoard/Wandb/optuna/raytune integration improvements. 7935 (madlag)
- [ProphetNet] Correct Doc string example 7944 (patrickvonplaten)
- [GPT2 batch generation] Make test clearer. `do_sample=True` is not deterministic. 7947 (patrickvonplaten)
- fix 'encode_plus' docstring for 'special_tokens_mask' (0s and 1s were reversed) 7949 (epwalsh)
- Herbert tokenizer auto load 7968 (rmroczkowski)
- [testing] slow tests should be marked as slow 7895 (stas00)
- support relative path for best_model_checkpoint 7973 (HaebinShin)
- Disable inference API for t5-11b 7978 (julien-c)
- [fsmt test] basic config test with online model + super tiny model 7860 (stas00)
- Add whole word mask support for lm fine-tune 7925 (wlhgtc)
- [PretrainedConfig] Fix save pretrained config for edge case 7943 (patrickvonplaten)
- GPT2 - Remove else branch adding 0 to the hidden state if token_type_embeds is None. 7977 (mfuntowicz)
- Fixing the "translation", "translation_XX_to_YY" pipelines. 7975 (Narsil)
- FillMaskPipeline: support passing top_k on __call__ 7971 (julien-c)
- Only log total_flos at the end of training 7981 (sgugger)
- add zero shot pipeline tags & examples 7983 (joeddav)
- Reload checkpoint 7984 (sgugger)
- [gh ci] less output ( --durations=50) 7989 (sshleifer)
- Move NoLayerEmbedTokens 7945 (sshleifer)
- update zero shot default widget example 7992 (joeddav)
- [RAG] Handle the case when title is None while loading own datasets 7941 (lalitpagaria)
- [tests|tokenizers] Refactoring pipelines test backbone - Small tokenizers improvements - General tests speedups 7970 (thomwolf)
- [Reformer] remove reformer pad_token_id 7991 (patrickvonplaten)
- Fix BatchEncoding.word_to_tokens for removed tokens 7939 (n1t0)
- Handling longformer model_type 7990 (ethanjperez)
- [doc prepare_seq2seq_batch] fix docs 8013 (patil-suraj)
- [tokenizers] Fixing 8001 - Adding tests on tokenizers serialization 8006 (thomwolf)
- Add mixed precision evaluation 8036 (luyug)
- [docs] [testing] distributed training 7993 (stas00)
- [fix] FSMT slow test uses lists instead of torch tensors 8031 (sshleifer)
- update version for scipy 7998 (suliuzh)
- [cleanup] pegasus,marian,mbart pytorch tests 8033 (sshleifer)
- Fix label name in DataCollatorForNextSentencePrediction test 8048 (sgugger)
- Tiny TF Bart fixes 8023 (LysandreJik)
- Mlflow integration callback 8016 (noise-field)
- Minor error fix of 'bart-large-cnn' details in the pretrained_models doc 8053 (forest1988)
- invalid argument wwm passed to the run_language_modeling.py file 8050 (mohammadreza-Banaei73)
- Fix + Test 8049 (LysandreJik)
- [testing] fixing crash in deberta 8057 (stas00)
- [TF] from_pt should respect authorized_unexpected_keys 8056 (sshleifer)
- Fix TF training arguments instantiation 8063 (LysandreJik)
- Doc fixes in preparation for the docstyle PR 8061 (sgugger)
- Doc styling 8067 (sgugger)
- Fix doc examples 8082 (mymusise)
- Fix comet_ml import and add ensure availability 7933 (dsblank)
- Doc styling fixes 8074 (sgugger)
- Fix DeBERTa docs 8092 (LysandreJik)
- [CI] generate separate report files as artifacts 7995 (stas00)
- Move style_doc to extra_quality_checks 8081 (sshleifer)
- Fix IterableDataset with __len__ in Trainer 8095 (cccntu)
- Fix assertion error message for MLflowCallback 8091 (harupy)
- Fix a bug for `CallbackHandler.callback_list` 8052 (harupy)
- [setup] update/add setup targets 8076 (stas00)
- DEP: pinned sentencepiece to 0.1.91 in setup.py to fix build issues with newer versions 8069 (jmwoloso)
- infer entailment label id on zero shot pipeline 8059 (joeddav)
- Fully remove codecov 8093 (LysandreJik)
- Add AzureML in integrations via dedicated callback 8062 (davidefiocco)
- Adjust setup so that all extras run on Windows 8102 (sgugger)
- Move installation instructions to the top 8106 (sgugger)
- [gh actions] run artifacts job always 8110 (stas00)
- [testing] port test_trainer_distributed to distributed pytest + TestCasePlus enhancements 8107 (stas00)
- [DOC] Improve pipeline() docstrings for config and tokenizer 8123 (BramVanroy)
- Document the various LM Auto models 8118 (sgugger)
- Rename add_start_docstrings_to_callable 8120 (sgugger)
- Update CI cache 8126 (LysandreJik)
- Upgrade PyTorch Lightning to 1.0.2 7852 (SeanNaren)
- Document tokenizer_class in configurations 8152 (sgugger)
- Smarter prediction loop and no- -> no_ in console args 8151 (sgugger)
- Add a template for examples and apply it for mlm and plm examples 8153 (sgugger)
- [testing] distributed: correct subprocess output checking 8157 (stas00)
- Fix eval ref miss in Chinese WWM. 8115 (wlhgtc)
- [CI] Better reports 2 8163 (stas00)
- Fixing some warnings in DeBerta 8176 (Narsil)
- Ci test tf super slow 8007 (LysandreJik)
- Doc fixes and filter warning in wandb 8189 (sgugger)
- Finalize lm examples 8188 (sgugger)
- Replace swish with silu 8166 (TFUsers)
- Remove deprecated arguments from new run_clm 8197 (sgugger)
- Minor style improvements for the Flax BERT and RoBERTa examples 8178 (avital)
- Fix two bugs with --logging_first_step 8193 (abisee)
- [Bug fix] Fixed value for BlenderBot pad token 8205 (guillaume-be)
- Fix the behaviour of DefaultArgumentHandler (removing it). 8180 (Narsil)
- Fix ignore files behavior in doctests 8213 (bryant1410)
- Patch reports 8238 (LysandreJik)
- Fix bad import with PyTorch <= 1.4.1 8237 (sgugger)
- Fix TensorBoardCallback for older versions of PyTorch 8239 (sgugger)
- Add XLMProphetNetTokenizer to tokenization auto 8245 (LysandreJik)
- [EncoderDecoder] fix encoder decoder config model type bug 8243 (patrickvonplaten)
- [bart] 2 SinusoidalPositionalEmbedding fixes 8226 (stas00)
- [fix] Skip tatoeba tests if Tatoeba-Challenge not cloned 8260 (sshleifer)
- [FIX] TextGenerationPipeline is currently broken. 8256 (Narsil)
- Updated ConversationalPipeline to work with encoder-decoder models 8207 (guillaume-be)
- [distributed testing] forward the worker stderr to the parent process 8262 (stas00)
- [examples] minimal version requirement run-time check in PL 8133 (stas00)
- Clean Trainer tests and datasets dep 8268 (sgugger)
- improve documentation of training_args.py 8270 (PhilipMay)
- Data collator for token classification 8274 (sgugger)
- [CIs] Better reports everywhere 8275 (stas00)
- [blenderbot] regex fix 8282 (stas00)
- [Generate Test] fix greedy generate test 8293 (patrickvonplaten)
- Fix validation file loading in scripts 8298 (sgugger)
- Improve QA pipeline error handling 8286 (Narsil)
- Speedup doc build 8301 (sgugger)
- Fix path to old run_language_modeling.py script 8302 (mrm8488)
- Clean up data collators and datasets 8308 (sgugger)
- change TokenClassificationTask class methods to static methods 7902 (donchev7)
- Output global_attentions in Longformer models 7562 (gui11aume)
- Make Trainer evaluation handle dynamic seq_length 8336 (sgugger)
- Docs bart training ref 8330 (lvwerra)
- Some added tests for TokenClassificationArgumentHandler 8366 (Narsil)
- [All Seq2Seq model + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs 8071 (ysgit)
- [TF generate] Cut encoder outputs to just last hidden states for now 8368 (patrickvonplaten)
- [make] rewrite modified_py_files in python to be cross-platform 8371 (stas00)
- Fix DataCollatorForWholeWordMask 8379 (cccntu)
- Fix DataCollatorForWholeWordMask again 8397 (cccntu)
- comet_ml init weirdness 8410 (stas00)
- updating tag for exbert viz 8408 (smanjil)
- Fix some tooling for windows 8359 (jplu)
- examples/docs: caveat that PL examples don't work on TPU 8309 (sshleifer)
- add evaluate doc - trainer.evaluate returns 'epoch' from training 8273 (PhilipMay)
- Bug fix for permutation language modelling 8409 (shngt)
- [fsmt tokenizer] support lowercase tokenizer 8389 (stas00)
- Bump tokenizers 8419 (sgugger)
- [fsmt convert script] fairseq broke chkpt data - fixing that 8377 (stas00)
- Deprecate old data/metrics functions 8420 (sgugger)
- [Tests] Add Common Test for Training + Fix a couple of bugs 8415 (patrickvonplaten)
- [docs] remove sshleifer from issue-template :( 8418 (sshleifer)
- Fix bart shape comment 8423 (sshleifer)
- [docs] [testing] gpu decorators table 8422 (stas00)
- Check all models are in an auto class 8425 (sgugger)
- [github CI] add a multi-gpu job for all example tests 8341 (stas00)
- Changing XLNet default from not using memories to 512 context size following paper 8417 (TevenLeScao)
- Patch token classification pipeline 8364 (LysandreJik)

3.4.0

ProphetNet, Blenderbot, SqueezeBERT, DeBERTa

ProphetNet

Two new models are released as part of the ProphetNet implementation: `ProphetNet` and `XLM-ProphetNet`.

ProphetNet is an encoder-decoder model that can predict n future tokens ("ngram" language modeling) instead of just the next token.

XLM-ProphetNet is an encoder-decoder model with an identical architecture to ProphetNet, but the model was trained on the multi-lingual "wiki100" Wikipedia dump.

The ProphetNet model was proposed in [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063), by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.

It was added to the library in PyTorch with the following checkpoints:

- `microsoft/xprophetnet-large-wiki100-cased-xglue-ntg`
- `microsoft/prophetnet-large-uncased`
- `microsoft/prophetnet-large-uncased-cnndm`
- `microsoft/xprophetnet-large-wiki100-cased`
- `microsoft/xprophetnet-large-wiki100-cased-xglue-qg`
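
As a hedged illustration (not part of the release notes themselves), the CNN/DailyMail checkpoint above can be used for summarization; the article text and generation settings are placeholders:

```python
from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")

article = "The US state of California has announced ..."  # placeholder article text
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```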

Contributions:
- ProphetNet 7157 (qiweizhen, patrickvonplaten)

BlenderBot

Blenderbot is an encoder-decoder model for open-domain chat. It uses a standard seq2seq model transformer-based architecture.

The Blender chatbot model was proposed in [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.

It was added to the library in PyTorch with the following checkpoints:
- `facebook/blenderbot-90M`
- `facebook/blenderbot-3B`
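
A hedged sketch of a single chat turn with the 90M checkpoint (the utterance is a placeholder; the small checkpoint pairs with the `BlenderbotSmallTokenizer`):

```python
from transformers import BlenderbotForConditionalGeneration, BlenderbotSmallTokenizer

name = "facebook/blenderbot-90M"
model = BlenderbotForConditionalGeneration.from_pretrained(name)
tokenizer = BlenderbotSmallTokenizer.from_pretrained(name)

utterance = "My friends are cool but they eat too many carbs."  # placeholder input
inputs = tokenizer([utterance], return_tensors="pt")
reply_ids = model.generate(**inputs)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```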

Contributions:
- Blenderbot 7418 (sshleifer)


SqueezeBERT

The SqueezeBERT model was proposed in [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. It’s a bidirectional transformer similar to the BERT model. The key difference between the BERT architecture and the SqueezeBERT architecture is that SqueezeBERT uses grouped convolutions instead of fully-connected layers for the Q, K, V and FFN layers.

It was added to the library in PyTorch with the following checkpoints:

- `squeezebert/squeezebert-mnli`
- `squeezebert/squeezebert-uncased`
- `squeezebert/squeezebert-mnli-headless`

Contributions:
- SqueezeBERT architecture 7083 (forresti)
- Fix squeezebert docs 7587 (LysandreJik)

DeBERTa

The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.

It was added to the library in PyTorch with the following checkpoints:

- `microsoft/deberta-base`
- `microsoft/deberta-large`

Contributions:
- Add DeBERTa model 5929 (BigBird01)
- Fix DeBERTa integration tests 7729 (LysandreJik)

Both SentencePiece and Tokenizers are now optional libraries

Support for SentencePiece is now part of the `tokenizers` library! Thanks to this we now have near-full support of fast tokenizers in the library.

With this new feature, we slightly change the paradigm regarding installation:
- SentencePiece is now an optional dependency, paving the way to a fully-featured conda install in the near future
- Tokenizers is now also an optional dependency, making it possible to install and use the library even when rust cannot be compiled on the machine.

- [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies 7659 (thomwolf)

The main `__init__` has been improved to always import the same functions and classes. If someone then tries to use a class that requires an optional dependency, an `ImportError` will be raised at init (with instructions on how to install the missing dependency) 7537 (sgugger)

Improvements made to the `Trainer`

The `Trainer` API has been improved to work with models requiring several labels or returning several outputs, and to have clearer progress tracking. A new `TrainerCallback` class has been added to let the user easily customize the default training loop (see the sketch after the list below).

- Remove config assumption in Trainer 7464 (sgugger)
- Clean the Trainer state 7490 (sgugger)
- Small QOL improvements to TrainingArguments 7475 (sgugger)
- Allow nested tensors in predicted logits 7542 (sgugger)
- Trainer callbacks 7596 (sgugger)
- Add specific notebook ProgressCalback 7793 (sgugger)
- Small fixes to NotebookProgressCallback 7813 (sgugger)
- Add predict step accumulation 7767 (sgugger)
- Don't use `store_xxx` on optional bools 7786 (sgugger)
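
A minimal sketch of hooking into the loop with the new `TrainerCallback`; the callback itself is purely illustrative:

```python
from transformers import TrainerCallback

class PrintLossCallback(TrainerCallback):
    """Illustrative callback that prints the training loss every time the Trainer logs."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "loss" in logs:
            print(f"step {state.global_step}: loss = {logs['loss']:.4f}")

# Passed at construction time, e.g.:
# trainer = Trainer(model=model, args=training_args, callbacks=[PrintLossCallback()])
```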

Seq2Seq Trainer
A child of `Trainer` specialized for training seq2seq models, from patil-suraj and sshleifer. Accessible through `examples/seq2seq/finetune_trainer.py`.
- example scripts at `examples/seq2seq/builtin_trainer/`
- same functionality as `examples/seq2seq/finetune.py`, but better TPU support.
- [examples/s2s] clean up finetune_trainer 7509 (patil-suraj)
- [s2s] trainer scripts: Remove --run_name, thanks sylvain! 7521 (sshleifer)
- [s2s] Adafactor support for builtin trainer 7522 (sshleifer)
- [s2s] add config params like Dropout in Seq2SeqTrainingArguments 7532 (patil-suraj)
- Distributed Trainer: 2 little fixes 7461 (sshleifer)
- [s2sTrainer] test + code cleanup 7467 (sshleifer)
- Seq2SeqDataset: avoid passing src_lang everywhere 7470 (amanpreet692)
- [s2strainer] fix eval dataset loading 7477 (patil-suraj)
- [pseudolabels] cleanup markdown table 7653 (sshleifer)

Distributed Generation
- You can run `model.generate` in pytorch on a large dataset and split the work across multiple GPUs, using `examples/seq2seq/run_distributed_eval.py`
- [s2s] release pseudolabel links and instructions 7639 (sshleifer)
- [s2s] Fix t5 warning for distributed eval 7487 (sshleifer)
- [s2s] fix kwargs style 7488 (sshleifer)
- [s2s] fix lockfile and peg distillation constants 7545 (sshleifer)
- [s2s] fix nltk pytest race condition with FileLock 7515 (sshleifer)

Notebooks

- Train T5 in TensorFlow 2 Community Notebook 7428 (HarrisDePerceptron)

General improvements and bugfixes

- remove codecov PR comments 7400 (sshleifer)
- Get a better error when check_copies fails 7457 (sgugger)
- Multi-GPU Testing setup 7453 (LysandreJik)
- Fix LXMERT with DataParallel 7471 (LysandreJik)
- Number of GPUs for multi-gpu 7472 (LysandreJik)
- Make transformers install check positive 7473 (FremyCompany)
- Alphabetize model lists 7478 (sgugger)
- Bump isort version. 7484 (sgugger)
- Add forgotten return_dict argument in the docs 7483 (sgugger)
- Enable pegasus fp16 by clamping large activations 7243 (sshleifer)
- Update LayoutLM doc 7388 (al31415)
- Report Tune metrics in final evaluation 7507 (krfricke)
- Fix Ray Tune progress_reporter kwarg 7508 (krfricke)
- [Seq2Seq] Fix a couple of bugs and clean examples 7474 (patrickvonplaten)
- [Attention Mask] Fix data type 7513 (patrickvonplaten)
- Fix seq2seq example test 7518 (sgugger)
- Remove labels from the RagModel example 7560 (sgugger)
- added script for fine-tuning roberta for sentiment analysis task 7505 (DhavalTaunk08)
- LayoutLM: add exception handling for bbox values 7452 (al31415)
- Cleanup documentation for BART, Marian, MBART and Pegasus 7523 (sgugger)
- Add Electra unexpected keys 7569 (LysandreJik)
- Fix tokenization in SQuAD for RoBERTa, Longformer, BART 7387 (tholor)
- docs(pretrained_models): fix num parameters 7575 (amineabdaoui)
- Update Code example according to deprecation of AutoModeWithLMHead 7555 (jshamg)
- Allow soft dependencies in the namespace with ImportErrors at use 7537 (sgugger)
- Fix post_init of some TrainingArguments 7525 (sgugger)
- Check and update model list in index.rst automatically 7527 (sgugger)
- Expand test to locate flakiness 7580 (sgugger)
- Custom TF weights loading 7422 (jplu)
- Documentation fixes 7585 (sgugger)
- Documentation framework toggle should stick 7586 (LysandreJik)
- Support T5 Distillation w/hidden state supervision 7599 (sshleifer)
- [makefile] check only .py files 7588 (stas00)
- [TF generation] Fix typo 7582 (SidJain1412)
- change return dictionary for DataCollatorForNextSentencePrediction from masked_lm_labels to labels 7595 (gmihaila)
- Docker GPU Images: Add NVIDIA/apex to the cuda images with pytorch 7598 (AdrienDS)
- typo fix 7611 (agemagician)
- [bart] fix config.classif_dropout 7593 (sshleifer)
- [s2s] save first batch to json for debugging purposes 6810 (sshleifer)
- Add GPT2ForSequenceClassification based on DialogRPT 7501 (LysandreJik)
- Fix wrong reference name/filename in docstring of `SquadProcessor` 7616 (phiyodr)
- Fix tokenizer UnboundLocalError when padding is set to PaddingStrategy.MAX_LENGTH 7610 (GabrielePicco)
- Add GPT2 to sequence classification auto model 7630 (LysandreJik)
- Replaced torch.load for loading the pretrained vocab of TransformerXL tokenizer to pickle.load 6935 (w4nderlust)
- Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer 7141 (thomwolf)
- Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) 7658 (thomwolf)
- Fix RobertaForCausalLM docs 7642 (LysandreJik)
- [s2s] configure lr_scheduler from command line 7641 (patil-suraj)
- [pseudo] Switch URLS to CDN 7661 (sshleifer)
- [s2s] Switch README urls to cdn 7670 (sshleifer)
- fix nn.DataParallel compatibility with PyTorch 1.5 7671 (guhur)
- Update XLM-RoBERTa pretrained model details 7669 (noahtren)
- Fix dataset cardinality 7678 (jplu)
- [pegasus] Faster tokenizer tests 7672 (stas00)
- Delete extra test file in repo root 7681 (sshleifer)
- Better links for models in README and doc index 7680 (sgugger)
- Import integration libraries first 7650 (dsblank)
- Fix title level in Blenderbot doc 7687 (sgugger)
- Fix flaky test in test_trainer 7689 (sgugger)
- Adds license information for default and distilbert models 7688 (ankane)
- Fix docstring in AutoModel class 7694 (al31415)
- [examples] bump pl=0.9.0 7053 (sshleifer)
- Corrected typo: maked → masked 7703 (miggymigz)
- fixed typo in warning line 207. 7718 (Berowne)
- Fix typo in all model docs 7714 (sgugger)
- Fix check for xla in PreTrainedModel.save_pretrained() 7699 (fteufel)
- Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural. 7696 (AndreaSottana)
- The input training data files (multiple files in glob format). 7717 (kfkelvinng)
- Fix trainer callback 7720 (cccntu)
- Fix tf text class 7724 (jplu)
- Fix 7731 7732 (LysandreJik)
- Fix 3 failing slow bart/blender tests 7652 (sshleifer)
- Add license info to nlptown/bert-base-multilingual-uncased-sentiment 7738 (alexcombessie)
- [marian] Automate Tatoeba-Challenge conversion 7709 (sshleifer)
- ElectraTokenizerFast 7754 (LysandreJik)
- Gpt1 for sequence classification 7683 (fmcurti)
- [Rag] Fix loading of pretrained Rag Tokenizer 7756 (patrickvonplaten)
- Do not softmax when num_labels==1 7726 (LysandreJik)
- Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 7742 (noamwies)
- fixed lots of typos. 7758 (NieTiger)
- Adding optional trial argument to model_init 7759 (madlag)
- Faster pegasus tokenization test with reduced data size 7762 (sshleifer)
- Fix bert position ids in DPR convert script 7776 (lhoestq)
- Add batch inferencing support for GPT2LMHeadModel 7552 (cccntu)
- fix examples/rag imports, tests 7712 (sshleifer)
- Fix TF savedmodel in Roberta 7795 (jplu)
- Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. 7728 (Narsil)
- Upgrading in pipelines TFAutoModelWithLMHead to new Causal/Masked/Seq2Seq LM classes 7730 (Narsil)
- fix wandb/comet problems 7830 (stas00)
- [utils/check_copies.py] fix DeprecationWarning 7834 (stas00)
- [DOC] Typo and fix the input of labels to `cross_entropy` 7841 (katarinaslama)
- [seq2seq] get_git_info fails gracefully 7843 (stas00)
- [Pipelines] Fix links to model lists 7826 (julien-c)
- Herbert polish model 7798 (rmroczkowski)
- [cleanup] assign todos, faster bart-cnn test 7835 (sshleifer)
- Remove masked_lm_labels from returned dictionary in DataCollatorForNextSentencePrediction 7818 (vblagoje)
- [testing] fix/hide warnings 7837 (stas00)
- Small fixes to HP search 7839 (sgugger)
- [testing] disable FutureWarning in examples tests 7842 (stas00)
- Fix missing reference titles in retrieval evaluation of RAG 7817 (lhoestq)
- [seq2seq testing] improve readability 7845 (stas00)
- [s2s testing] turn all to unittests, use auto-delete temp dirs 7859 (stas00)
- Fix Rag example docstring 7872 (patrickvonplaten)
- Remove duplicated mish activation function 7856 (Razcle)
- [tests] fix slow bart cnn test, faster marian tests 7888 (sshleifer)
- Fix small type hinting error 7820 (AndreaSottana)
- Add support to provide initial tokens to decoder of encoder-decoder type models 7577 (ayushtiku5)
- style: fix typo 7883 (rememberYou)
- [testing] remove USE_CUDA 7861 (stas00)
- [CIs] report slow tests add --durations=0 to some pytest jobs 7884 (stas00)
- style: fix typo in the README 7882 (rememberYou)
- [RAG] Propagating of n_docs as parameter to all RagModel's related functions 7891 (lalitpagaria)
- Trainer with Iterable Dataset 7858 (j-rossi-nl)
- Allow Custom Dataset in RAG Retriever 7763 (lhoestq)
- Modelling Encoder-Decoder | Error :- `decoder_config` used before initialisation 7903 (ayubSubhaniya)
- [Docstring] fix t5 training docstring 7911 (patrickvonplaten)
- Raise error when using AMP on non-CUDA device 7869 (BramVanroy)
- [EncoderDecoder] Fix Typo 7915 (patrickvonplaten)
- [testing] rename skip targets + docs 7863 (stas00)

3.3.1

Fixes errors due to name conflicts between the `datasets` library and local folders or modules named `datasets`.

3.3.0

RAG

RAG Model

The RAG model is a retrieval-augmented generation model that can be leveraged for question-answering tasks using `RagTokenForGeneration` or `RagSequenceForGeneration`, as proposed in [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.

It was added to the library in PyTorch with the following checkpoints:

- `facebook/rag-token-nq`
- `facebook/rag-sequence-nq`
- `facebook/rag-token-base`
- `facebook/rag-sequence-base`
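
A hedged usage sketch with the token variant (the dummy index keeps it lightweight; a real index additionally requires the `datasets` and `faiss` libraries, and the question is a placeholder):

```python
from transformers import RagRetriever, RagTokenForGeneration, RagTokenizer

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

input_dict = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=input_dict["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```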

Contributions:
- **RAG** 6813 (ola13)
- [RAG] Add `attention_mask` to RAG generate 7373 (patrickvonplaten)
- [RAG] Add missing doc and attention_mask to rag 7382 (patrickvonplaten)
- [Rag] Fix wrong usage of `num_beams` and `bos_token_id` in Rag Sequence generation 7386 (patrickvonplaten)
- [RAG] Fix retrieval offset in RAG's HfIndex and better integration tests 7372 (lhoestq)
- [RAG] Remove dependency on `examples/seq2seq` from rag 7395 (ola13)
- [Rag] fix rag retriever save_pretrained method 7399 (patrickvonplaten)
- [RAG] Clean Rag readme in examples 7413 (ola13)
- [RAG] Model cards - clean cards 7420 (patrickvonplaten)
- Document RAG again 7377 (sgugger)

Bug fixes and improvements

- Mark big downloads slow 7325 (sgugger)
- [Bug Fix] The actual batch_size is inconsistent with the settings. 7235 (HuangLianzhe)
- Fixed results of SQuAD-FR evaluation 7313 (psorianom)
- [s2s] add supported architecures to MD 7252 (sshleifer)
- Add num workers cli arg 7322 (chadykamar)
- [s2s] add src_lang kwarg for distributed eval 7300 (sshleifer)
- [s2s] only save metrics.json from rank zero 7331 (sshleifer)
- [code quality] fix confused flake8 7309 (stas00)
- [testing] skip decorators: docs, tests, bugs 7334 (stas00)
- Fixed evaluation_strategy on epoch end bug 7340 (WissamAntoun)
- Models doc 7345 (sgugger)
- Ensure that integrations are imported before transformers or ml libs 7330 (dsblank)
- [Benchmarks] Change all args to from `no_...` to their positive form 7075 (fmcurti)
- Remove reference to args in XLA check 7344 (ZeroCool2u)
- wip: Code to add lang tags to marian model cards 6586 (sshleifer)
- Expand a bit the documentation doc 7350 (sgugger)
- Check decorator order 7326 (sgugger)
- Update modeling_tf_longformer.py 7359 (Line290)
- Update tokenization_auto.py 6870 (hjptriplebee)
- Update the TF models to remove their interdependencies 7238 (jplu)
- Make PyTorch model files independent from each other 7352 (sgugger)
- Clean RAG docs and template docs 7348 (sgugger)
- Fixing case in which `Trainer` hung while saving model in distributed training 7365 (TevenLeScao)
- Formatter 7368 (LysandreJik)
- [seq2seq] make it easier to run the scripts 7274 (stas00)
- Remove mentions of RAG from the docs 7376 (sgugger)
- [fsmt] build/test scripts 7257 (stas00)
- [s2s] distributed eval allows num_return_sequences > 1 7254 (sshleifer)
- Seq2SeqTrainer 6769 (patil-suraj)
- modeling_bart: 3 small cleanups that dont change outputs 7381 (sshleifer)
- Check config type using `type` instead of `isinstance` 7363 (LysandreJik)
- [s2s, examples] minor doc changes 7385 (patil-suraj)
- Remove unhelpful bart warning 7391 (sshleifer)
- [code quality] new make target that combines style and quality targets 7310 (stas00)
- Speedup check_copies script 7394 (sgugger)
- Fix BartModel output documentation 7390 (sgugger)
- Fix FP16 and attention masks in FunnelTransformer 7374 (sgugger)
- [Longformer, Bert, Roberta, ...] Fix multi gpu training 7272 (patrickvonplaten)
- [s2s] add create student script 7290 (patil-suraj)
- [s2s] rougeLSum expects \n between sentences 7410 (sshleifer)
- [T5] allow config.decoder_layers to control decoder size 7409 (sshleifer)
- Flos fix 7384 (marrrcin)
- Catch PyTorch warning when saving/loading scheduler 7401 (sgugger)
- Pull request template 7392 (LysandreJik)
- Reorganize documentation navbar 7423 (sgugger)

3.2.0

Bert Seq2Seq models, FSMT, Funnel Transformer, LXMERT

BERT Seq2seq models

The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using EncoderDecoderModel as proposed in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

It was added to the library in PyTorch with the following checkpoints:

- `google/roberta2roberta_L-24_bbc`
- `google/roberta2roberta_L-24_gigaword`
- `google/roberta2roberta_L-24_cnn_daily_mail`
- `google/roberta2roberta_L-24_discofuse`
- `google/roberta2roberta_L-24_wikisplit`
- `google/bert2bert_L-24_wmt_de_en`
- `google/bert2bert_L-24_wmt_en_de`

Contributions:

- Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. 6594 (patrickvonplaten)

FSMT (FairSeq MachineTranslation)

FSMT (FairSeq MachineTranslation) models were introduced in [Facebook FAIR’s WMT19 News Translation Task Submission](https://arxiv.org/abs/1907.06616) by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.

It was added to the library in PyTorch, with the following checkpoints:

- `facebook/wmt19-en-ru`
- `facebook/wmt19-en-de`
- `facebook/wmt19-ru-en`
- `facebook/wmt19-de-en`

Contributions:

- [ported model] FSMT (FairSeq MachineTranslation) 6940 (stas00)
- build/eval/gen-card scripts for fsmt 7155 (stas00)
- skip failing FSMT CUDA tests until investigated 7220 (stas00)
- [fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test 7224 (stas00)
- [s2s] adjust finetune + test to work with fsmt 7263 (stas00)
- [fsmt] SinusoidalPositionalEmbedding no need to pass device 7292 (stas00)
- Adds FSMT to LM head AutoModel 7312 (LysandreJik)

LayoutLM

The LayoutLM model was proposed in [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It's a simple but effective pre-training method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding.

It was added to the library in PyTorch with the following checkpoints:

- `layoutlm-base-uncased`
- `layoutlm-large-uncased`

Contributions:

- Add LayoutLM Model 7064 (liminghao1630)
- Fixes for LayoutLM 7318 (sgugger)

Funnel Transformer

The Funnel Transformer model was proposed in the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236). It is a bidirectional transformer model, like BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks (CNN) in computer vision.

It was added to the library in both PyTorch and TensorFlow, with the following checkpoints:

- `funnel-transformer/small`
- `funnel-transformer/small-base`
- `funnel-transformer/medium`
- `funnel-transformer/medium-base`
- `funnel-transformer/intermediate`
- `funnel-transformer/intermediate-base`
- `funnel-transformer/large`
- `funnel-transformer/large-base`
- `funnel-transformer/xlarge`
- `funnel-transformer/xlarge-base`

Contributions:

- Funnel transformer 6908 (sgugger)
- Add TF Funnel Transformer 7029 (sgugger)

LXMERT

The LXMERT model was proposed in [LXMERT: Learning Cross-Modality Encoder Representations from Transformers](https://arxiv.org/abs/1908.07490) by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders (one for the vision modality, one for the language modality, and then one to fuse both modalities) pre-trained using a combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. The pretraining consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering, VQA 2.0, and GQA.

It was added to the library in TensorFlow with the following checkpoints:

- `unc-nlp/lxmert-base-uncased`
- `unc-nlp/lxmert-vqa-uncased`
- `unc-nlp/lxmert-gqa-uncased`

Contributions

- Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models 5793 (eltoto1219)
- [LXMERT] Fix tests on gpu 6946 (patrickvonplaten)

New pipelines

The following pipeline was added to the library (a short usage sketch follows the list):

- [pipelines] Text2TextGenerationPipeline 6744 (patil-suraj)
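
A hedged usage sketch for the new pipeline; the checkpoint and prompt are illustrative:

```python
from transformers import pipeline

text2text = pipeline("text2text-generation", model="t5-small")
print(text2text("translate English to German: Hello, how are you?"))
# -> [{'generated_text': ...}]
```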

Notebooks

The following community notebooks were contributed to the library:

- Demoing LXMERT with raw images by incorporating the FRCNN model for ROI-pooled extraction and bounding-box prediction on the GQA answer set. 6986 (eltoto1219)
- [Community notebooks] Add notebook on fine-tuning GPT-2 Model with Trainer Class 7005 (philschmid)
- Add "Fine-tune ALBERT for sentence-pair classification" notebook to the community notebooks 7255 (NadirEM)
- added multilabel text classification notebook using distilbert to community notebooks 7201 (DhavalTaunk08)

Encoder-decoder architectures

An additional encoder-decoder architecture was added:

- [EncoderDecoder] Add xlm-roberta to encoder decoder 6878 (patrickvonplaten)

Bug fixes and improvements

- TF Flaubert w/ pre-norm 6841 (LysandreJik)
- Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task 6644 (HuangLianzhe)
- Fix in Adafactor docstrings 6845 (sgugger)
- Fix resuming training for Windows 6847 (sgugger)
- Only access loss tensor every logging_steps 6802 (jysohn23)
- Marian distill scripts + integration test 6799 (sshleifer)
- Add checkpointing to Ray Tune HPO 6747 (krfricke)
- Split hp search methods 6857 (sgugger)
- Update ONNX notebook to include section on quantization. 6831 (mfuntowicz)
- Fix marian slow test 6854 (sshleifer)
- [s2s] command line args for faster val steps 6833 (sshleifer)
- Bart can make decoder_input_ids from labels 6758 (sshleifer)
- add a final report to all pytest jobs 6861 (stas00)
- Logging doc 6852 (sgugger)
- Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. 6875 (mfuntowicz)
- [Generate] Facilitate PyTorch generate using `ModelOutputs` 6735 (patrickvonplaten)
- Add cache_dir to save features TextDataset 6879 (jysohn23)
- [Docs, Examples] Fix QA example for PT 6890 (patrickvonplaten)
- Update modeling_bert.py 6897 (parthe)
- [Electra] fix warning for position ids 6884 (patrickvonplaten)
- minor docs grammar fixes 6889 (harrywang)
- Fix error class instantiation 6634 (tamuhey)
- Output attention takes an s 6903 (sgugger)
- [testing] fix ambiguous test 6898 (stas00)
- test_tf_common: remove un_used mixin class parameters 6866 (PuneethaPai)
- Template updates 6914 (sgugger)
- Changed link to the correct paper in the second paragraph 6905 (sengl)
- tweak tar command in readme 6919 (brettkoonce)
- [s2s]: script to convert pl checkpoints to hf checkpoints 6911 (sshleifer)
- [s2s] allow task_specific_params=summarization_xsum 6923 (sshleifer)
- move wandb/comet logger init to train() to allow parallel logging 6850 (krfricke)
- [s2s] use --eval_beams command line arg 6926 (sshleifer)
- [s2s] support early stopping based on loss, rather than rouge 6927 (sshleifer)
- Fix mixed precision issue in TF DistilBert 6915 (chiapas)
- [docstring] misc arg doc corrections 6932 (stas00)
- [s2s] distill: --normalize_hidden --supervise_forward 6834 (sshleifer)
- [s2s] run_eval.py parses generate_kwargs 6948 (sshleifer)
- [doc] remove the implied defaults to :obj:`None`, s/True/ :obj:`True/, etc. 6956 (stas00)
- [s2s] warn if --fp16 for torch 1.6 6977 (sshleifer)
- feat: allow prefix for any generative model 5885 (borisdayma)
- Trainer with grad accum 6930 (sgugger)
- Cannot index `None` 6984 (LysandreJik)
- [docstring] missing arg 6933 (stas00)
- [testing] add dependency: parametrize 6958 (stas00)
- Fixed the default number of attention heads in Reformer Configuration 6973 (tznurmin)
- [gen utils] missing else case 6980 (stas00)
- match CI's version of flake8 6941 (stas00)
- Conversion scripts shouldn't have relative imports 6991 (LysandreJik)
- Add missing arguments for BertWordPieceTokenizer 5810 (monologg)
- fixed trainer tr_loss memory leak 6999 (StuartMesham)
- Floating-point operations logging in trainer 6768 (TevenLeScao)
- Fixing FLOPS merge by checking if torch is available 7013 (LysandreJik)
- [Longformer] Fix longformer documentation 7016 (patrickvonplaten)
- pegasus.rst: fix expected output 7017 (sshleifer)
- adding TRANSFORMERS_VERBOSITY env var 6961 (stas00)
- [generation] consistently add eos tokens 6982 (stas00)
- [from_pretrained] Allow tokenizer_type ≠ model_type 6995 (julien-c)
- replace torch.triu with onnx compatible code 6929 (HenryDashwood)
- Batch encode plus and overflowing tokens fails when non existing overflowing tokens for a sequence 6677 (LysandreJik)
- add -y to bypass prompt for transformers-cli upload 7035 (stas00)
- Fix confusing warnings during TF2 import from PyTorch 6623 (jcrocholl)
- Albert pretrain datasets/ datacollator 6168 (yl-to)
- Fix template 7040 (LysandreJik)
- Small fixes in tf template 7044 (sgugger)
- Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. 6594 (patrickvonplaten)
- fix to ensure that returned tensors after the tokenization is Long 7039 (GeetDsa)
- [BertGeneration] Correct Doc Title 7048 (patrickvonplaten)
- [BertGeneration, Docs] Fix another old name in docs 7050 (patrickvonplaten)
- [xlm tok] config dict: fix str into int to match definition 7034 (stas00)
- [s2s] --eval_max_generate_length 7018 (sshleifer)
- Fix CI with change of name of nlp 7054 (sgugger)
- [wip/s2s] DistributedSortishSampler 7056 (sshleifer)
- these tests require non-multigpu env 7059 (stas00)
- [BertGeneration] Clean naming 7068 (patrickvonplaten)
- Document the dependency on datasets 7058 (sgugger)
- Automate the lists in auto-xxx docs 7061 (sgugger)
- Add tests and fix various bugs in ModelOutput 7073 (sgugger)
- Compute loss method 7074 (sgugger)
- [T5Tokenizer] remove prefix_tokens 7078 (patil-suraj)
- [s2s] run_eval supports --prefix clarg. 6953 (sshleifer)
- fix bug in pegasus converter 7094 (sshleifer)
- [s2s] two stage run_distributed_eval.py 7105 (sshleifer)
- Update xsum length penalty to better values 7107 (sshleifer)
- [s2s] distributed eval cleanup 7110 (sshleifer)
- [s2s distill] allow pegasus-12-12 7104 (sshleifer)
- Temporarily skip failing tests due to dependency change 7118 (LysandreJik)
- fix link to paper 7116 (btel)
- ignore FutureWarning in tests 7079 (stas00)
- fix deprecation warnings 7033 (stas00)
- [examples testing] restore code 7099 (stas00)
- Clean up autoclass doc 7081 (sgugger)
- Add Mirror Option for Downloads 6679 (JetRunner)
- [s2s] distributed eval in one command 7124 (sshleifer)
- [QOL] add signature for prepare_seq2seq_batch 7108 (sshleifer)
- Fix reproducible tests in Trainer 7119 (sgugger)
- [logging] remove no longer needed verbosity override 7100 (stas00)
- Fix TF Trainer loss calculation 6998 (chiapas)
- Add quotes to paths in MeCab arguments 7142 (polm)
- Multi predictions trainer 7126 (sgugger)
- fix ZeroDivisionError and epoch counting 7125 (chiapas)
- [EncoderDecoderModel] fix indentation error 7131 (patrickvonplaten)
- [docs] add testing documentation 7101 (stas00)
- Refactoring the TF activations functions 7150 (jplu)
- fix the warning message of overflowed sequence 7151 (xiye17)
- [doc] [testing] improve/expand the Parametrization section 7156 (stas00)
- Add empty random document case to DataCollatorForNextSentencePrediction 7161 (choidongyeon)
- [s2s run_eval] new features 7109 (stas00)
- use the correct add_start_docstrings 7174 (stas00)
- [s2s] distributed eval cleanup 7186 (sshleifer)
- remove duplicated code 7173 (stas00)
- remove deprecated flag 7171 (stas00)
- Transformer-XL: Remove unused parameters 7087 (RafaelWO)
- Trainer multi label 7191 (sgugger)
- Change to use relative imports in some files & Add python prompt symbols to example codes 7202 (soheeyang)
- [s2s] run_eval/run_eval_search tweaks 7192 (stas00)
- [s2s] dynamic batch size with --max_tokens_per_batch 7030 (sshleifer)
- [s2s] remove double assert 7223 (sshleifer)
- Add customized text to widget 7204 (mrm8488)
- Rewrites BERT in Flax to the new Linen API 7211 (marcvanzee)
- token-classification: update url of GermEval 2014 dataset 6571 (stefan-it)
- Fix a few countings (steps / epochs) in trainer_tf.py 7175 (chiapas)
- Add new pre-trained models BERTweet and PhoBERT 6129 (datquocnguyen)
- [s2s] distributed_eval.py saves better speed info 7242 (sshleifer)
- [testing doc] slow has to be last 7251 (stas00)
- examples/seq2seq/__init__.py mutates sys.path 7194 (stas00)
- [Bug fix] Fixed target_mapping preparation for XLNet (Pytorch) 7267 (guillaume-be)
- [example/glue] fix compute_metrics_fn for bart like models 7248 (patil-suraj)
- Disable missing weight warning for RobertaForMaskedLM/CamembertForMaskedLM 7282 (raphael0202)
- Fix 7284 7289 (sgugger)
- [s2s tests] fix test_run_eval_search 7297 (stas00)
- [s2s] s/alpha_loss_encoder/alpha_encoder_loss/ 7298 (stas00)
- [s2s] save hostname with repo info 7301 (sshleifer)
- Copy code from Bert to Roberta and add safeguard script 7219 (sgugger)
- Fix 7304 7305 (sgugger)
- Fix saving TF custom models 7291 (jplu)
- is_pretokenized -> is_split_into_words 7236 (sgugger)
- Add possibility to evaluate every epoch 7302 (sgugger)
- Support for Windows in check_copies 7316 (sgugger)
- Create an XLA parameter and fix the mixed precision 7311 (jplu)

3.1.0

Pegasus, mBART, DPR, self-documented outputs and new pipelines

Pegasus

The Pegasus model from [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu, was added to the library in PyTorch.

Model implemented as a collaboration between Jingqing Zhang and sshleifer in 6340

- PegasusForConditionalGeneration (torch version) 6340
- add pegasus finetuning script 6811 [script](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/finetune_pegasus_xsum.sh) (warning: very slow)


DPR

The DPR model from [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih was added to the library in PyTorch.

- Add DPR model 5279 (lhoestq)
- Fix tests imports dpr 5576 (lhoestq)

DeeBERT

The DeeBERT model from [DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference](https://www.aclweb.org/anthology/2020.acl-main.204/) by Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin has been added to the `examples/` folder alongside its training script, in PyTorch.

- Add DeeBERT (entropy-based early exiting for *BERT) 5477 (Ji-Xin)

Self-documented outputs

As well as returning tuples, PyTorch and TensorFlow models now return a subclass of `ModelOutput` that is appropriate. A `ModelOutput` is a dataclass containing all model returns. This allows for easier inspection, and for self-documenting model outputs.

- Change model outputs types to self-document outputs 5438 (sgugger)
- Tf model outputs 6247 (sgugger)

Models return tuples by default, and return self-documented outputs if the `return_dict` configuration flag is set to `True` or if the `return_dict=True` keyword argument is passed to the forward/call method.

Summary of the behavior:
```python
# The new outputs are opt-in, you have to activate them explicitly with `return_dict=True`,
# either at instantiation
model = BertForSequenceClassification.from_pretrained('bert-base-cased', return_dict=True)
# or when calling the model
outputs = model(**inputs, return_dict=True)

# You can access the elements of the outputs with
# (1) named attributes
loss = outputs.loss
logits = outputs.logits

# (2) their names as strings, like a dict
loss = outputs["loss"]
logits = outputs["logits"]

# (3) their index as integers or slices, as in the pre-3.1.0 output tuples
loss = outputs[0]
logits = outputs[1]
loss, logits = outputs[:2]

# One breaking behavior of these new outputs (which is the reason you have to opt in to use them):
# iterating on the outputs now returns the names (keys) instead of the values
print([element for element in outputs])
# >>> ['loss', 'logits']
# Thus you cannot unpack the output like pre-3.1.0 (you would get the string names instead of the values),
# but you can still query a slice as indicated in (3) above
loss_key, logits_key = outputs
```


Encoder-Decoder framework

The encoder-decoder framework has been enhanced to allow more encoder-decoder model combinations, *e.g.*:
Bert2Bert, Bert2GPT2, Roberta2Roberta, Longformer2Roberta, ... (a minimal sketch follows the list below).

- [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer 6411 (patrickvonplaten)
- [EncoderDecoder] Add Cross Attention for GPT2 6415 (patrickvonplaten)
- [EncoderDecoder] Add functionality to tie encoder decoder weights 6538 (patrickvonplaten)
- Multiple combinations of EncoderDecoder models have been fine-tuned and evaluated on CNN/Daily-Mail summarization: https://huggingface.co/models?search=cnn_dailymail-fp16 (patrickvonplaten)
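
A minimal sketch of building one such combination (Bert2Bert) with the framework; the checkpoint is illustrative and the resulting model still needs fine-tuning:

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# The decoder copy of BERT is instantiated with cross-attention layers and causal masking
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

inputs = tokenizer("A sentence to encode and reconstruct.", return_tensors="pt")
outputs = model(input_ids=inputs["input_ids"], decoder_input_ids=inputs["input_ids"])
```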

TensorFlow as a first-class citizen

As we continue working towards having TensorFlow be a first-class citizen, we continually improve on our TensorFlow API and models.

- [Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile 5395 (patrickvonplaten)
- [Benchmark] Add benchmarks for TF Training 5594 (patrickvonplaten)

Machine Translation


MarianMTModel

- [en-zh](https://huggingface.co/Helsinki-NLP/opus-mt-en-zh?text=My+name+is+Wolfgang+and+I+live+in+Berlin) and **357** other checkpoints for machine translation were added from the Helsinki-NLP group's Tatoeba Project (sshleifer + jorgtied). There are now > 1300 supported pairs for machine translation (a short usage sketch follows this list).
- Marian converter updates 6342 (sshleifer)
- Marian distill scripts + integration test 6799 (sshleifer)
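
A hedged sketch of translating with the en-zh checkpoint linked above (the sentence mirrors the inference-widget example):

```python
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer.prepare_seq2seq_batch(["My name is Wolfgang and I live in Berlin"], return_tensors="pt")
translated_ids = model.generate(**batch)
print(tokenizer.decode(translated_ids[0], skip_special_tokens=True))
```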

mBART

The mBART model from [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) can now be accessed through `MBartForConditionalGeneration`.

- Add mbart-large-cc25, support translation finetuning 5129 (sshleifer)
- [mbart] prepare_translation_batch passes **kwargs to allow DeprecationWarning 5581 (sshleifer)
- MBartForConditionalGeneration 6441 (patil-suraj)
- [fix] mbart_en_ro_generate test now identical to fairseq 5731 (sshleifer)
- [Doc] explaining romanian postprocessing for MBART BLEU hacking 5943 (sshleifer)
- [test] partial coverage for train_mbart_enro_cc25.sh 5976 (sshleifer)
- MbartTokenizer: do not hardcode vocab size 5998 (sshleifer)
- MBART: support summarization tasks where max_src_len > max_tgt_len 6003 (sshleifer)
- Fix 6096: MBartTokenizer's mask token 6098 (sshleifer)
- [s2s] Document better mbart finetuning command 6229 (sshleifer)
- mBART Conversion script 6230 (sshleifer)
- [s2s] add BartTranslationDistiller for distilling mBART 6363 (sshleifer)
- [Doc] add more MBart and other doc 6490 (patil-suraj)

examples/seq2seq

- examples/seq2seq/finetune.py supports --task translation
- All sequence-to-sequence tokenizers (T5, Bart, Marian, Pegasus) expose a `prepare_seq2seq_batch` method that makes batches for sequence-to-sequence training.

PRs:

- Seq2SeqDataset uses linecache to save memory 5792 (Pradhy729)
- [examples/seq2seq]: add --label_smoothing option 5919 (sshleifer)
- seq2seq/run_eval.py can take decoder_start_token_id 5949 (sshleifer)
- [examples (seq2seq)] fix preparing decoder_input_ids for T5 5994 (patil-suraj)
- [s2s] add support for overriding config params 6149 (stas00)
- s2s: fix LR logging, remove some dead code. 6205 (sshleifer)
- [s2s] tiny QOL improvement: run_eval prints scores 6341 (sshleifer)
- [s2s] fix label_smoothed_nll_loss 6344 (patil-suraj)
- [s2s] fix --gpus clarg collision 6358 (sshleifer)
- [s2s] Script to save wmt data to disk 6403 (sshleifer)
- rename prepare_translation_batch -> prepare_seq2seq_batch 6103 (sshleifer)
- Mult rouge by 100: standard units 6359 (sshleifer)
- allow spaces in bash args with "$" 6521 (sshleifer)
- [seq2seq] MAX_LEN env var for MT commands 5837 (sshleifer)
- [seq2seq] distillation.py accepts trainer arguments 5865 (sshleifer)
- [s2s]Use prepare_translation_batch for Marian finetuning 6293 (sshleifer)
- [BartTokenizer] add prepare s2s batch 6212 (patil-suraj)
- [T5Tokenizer] add prepare_seq2seq_batch method 6122 (patil-suraj)
- [s2s] round runtime in run_eval 6798 (sshleifer)
- [s2s README] Add more dataset download instructions 6737 (sshleifer)
- [s2s] round bleu, rouge to 4 digits 6704 (sshleifer)
- [s2s] command line args for faster val steps 6833


New documentation

Several new documentation pages have been added and older documentation has been tweaked to be more accurate and understandable. An "Open in Colab" button has been added on the tutorial pages.

- Guide to fixed-length model perplexity evaluation 5449 (joeddav)
- Improvements to PretrainedConfig documentation 5642 (sgugger)
- Document model outputs 5673 (sgugger)
- docs(wandb): explain how to use W&B integration 5607 (borisdayma)
- Model utils doc 6005 (sgugger)
- ONNX documentation 5992 (mfuntowicz)
- Tokenizer documentation 6110 (sgugger)
- Pipeline documentation 6175 (sgugger)
- Encoder decoder config docs 6195 (afcruzs)
- Colab button 6389 (sgugger)
- Generation documentation 6470 (sgugger)
- Add custom datasets tutorial 6466 (joeddav)
- Logging documentation 6852 (sgugger)

Trainer updates

New additions to the `Trainer`

- Added data collator for permutation (XLNet) language modeling and related calls 5522 (shngt)
- Trainer support for iterabledataset 5834 (Pradhy729)
- Adding PaddingDataCollator 6442 (sgugger)
- Add hyperparameter search to Trainer 6576 (sgugger)
- [examples] Add trainer support for question-answering 4829 (patil-suraj)
- Adds comet_ml to the list of auto-experiment loggers 6176 (dsblank)
- Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task 6644 (HuangLianzhe)

New models & model architectures

The following model architectures have been added to the library

- FlaubertForTokenClassification 5644 (stas00)
- TFXLMForTokenClassification 5614 (LysandreJik)
- TFXLMForMultipleChoice 5614 (LysandreJik)
- TFFlaubertForTokenClassification 5614 (LysandreJik)
- TFFlaubertForMultipleChoice 5614 (LysandreJik)
- TFElectraForSequenceClassification 6227 (jplu)
- TFElectraForMultipleChoice 6227 (jplu)
- TF Longformer 5764 (patrickvonplaten)
- CamembertForCausalLM 6577 (patil-suraj)

Regression testing on TPU & TPU CI

Thanks to zcain117 we now have access to TPU CI for the PyTorch/xla framework. This enables regression testing on the TPU aspects of the `Trainer`, and offers very simple regression testing on model training performance.

- Test XLA examples 5583
- Add setup for TPU CI to run every hour. 6219 (zcain117)
- Add missing docker arg for TPU CI. 6393 (zcain117)
- Get GKE logs via kubectl logs instead of gcloud logging read. 6446 (zcain117)

New pipelines

New pipelines have been added:

- Zero shot classification pipeline 5760 (joeddav); a sketch follows this list
- Addition of a DialoguePipeline 5516 (guillaume-be)
- Add targets arg to fill-mask pipeline 6239 (joeddav)
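
For instance, a minimal sketch of the zero-shot classification pipeline; the input text and candidate labels are arbitrary, and the pipeline's default model is downloaded on first use:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The release adds TPU CI and a centralized logging API.",
    candidate_labels=["software", "sports", "cooking"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label and its score
```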

Community notebooks

- [Fine-tune Electra and interpret with Integrated Gradients](https://github.com/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb) 6321 (elsanns)
- Update ONNX notebook to include section on quantization. 6831 (mfuntowicz)

Centralized logging

Logging is now centralized. The library exposes methods to control the verbosity of every logger it defines; see the logging documentation for details. A short usage sketch follows the PR below:

- Centralize logging 6434 (LysandreJik)
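
A minimal sketch of the new verbosity controls; the import path below assumes the helpers live in `transformers.utils.logging` as introduced for centralized logging (check the logging documentation for the exact path in your version):

```python
from transformers.utils import logging

logging.set_verbosity_info()     # show INFO-level messages from all library loggers
print(logging.get_verbosity())   # current numeric verbosity level
logging.set_verbosity_error()    # only report errors from the library
```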

Bug fixes and improvements

- [Reformer] Adapt Reformer MaskedLM Attn mask 5560 (patrickvonplaten)
- Make T5 compatible with ONNX 5518 (abelriboulot)
- [Bart] enable test_torchscript, update test_tie_weights 5457 (sshleifer)
- [docs] fix model_doc links in model summary 5566 (patil-suraj)
- [Benchmark] Readme for benchmark 5363 (patrickvonplaten)
- Fix Inconsistent NER Grouping (Pipeline) 4987 (enzoampil)
- QA pipeline BART compatible 5496 (mfuntowicz)
- More explicit error when failing to tensorize overflowing tokens 5633 (LysandreJik)
- Should check that torch TPU is available 5636 (LysandreJik)
- Add forum link in the docs 5637 (sgugger)
- Fixed TextGenerationPipeline on torch + GPU 5629 (TevenLeScao)
- Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) 5632 (TevenLeScao)
- [squad] add version tag to squad cache 5669 (lazovich)
- Deprecate old past arguments 5671 (sgugger)
- Pipeline model type check 5679 (JetRunner)
- rename the functions to match the rest of the test convention 5692 (stas00)
- doc improvements 5688 (stas00)
- Fix Trainer in DataParallel setting 5685 (sgugger)
- [Longformer] fix longformer global attention output 5659 (patrickvonplaten)
- [Fix] github actions CI by reverting 5138 5686 (sshleifer)
- [Reformer classification head] Implement the reformer model classification head for text classification 5198 (as-stevens)
- Cleanup bart caching logic 5640 (sshleifer)
- [AutoModels] Fix config params handling of all PT and TF AutoModels 5665 (patrickvonplaten)
- [cleanup] T5 test, warnings 5761 (sshleifer)
- [fix] T5 ONNX test: model.to(torch_device) 5769 (mfuntowicz)
- [Benchmark] fix benchmark non standard model 5801 (patrickvonplaten)
- [Benchmark] Fix models without `architectures` param in config 5808 (patrickvonplaten)
- [Longformer] fix longformer slow-down 5811 (patrickvonplaten)
- [seq2seq] pack_dataset.py rewrites dataset in max_tokens format 5819 (sshleifer)
- [seq2seq] Don't copy self.source in sortishsampler 5818 (sshleifer)
- [cleanups] make Marian save as Marian 5830 (sshleifer)
- [Reformer] - Cache hidden states and buckets to speed up inference 5578 (patrickvonplaten)
- Lightning Updates for v0.8.5 5798 (nateraw)
- Update tokenizers to 0.8.1.rc to fix Mac OS X issues 5867 (sepal)
- Xlnet outputs 5883 (TevenLeScao)
- DataParallel fixes 5733 (stas00)
- [cleanup] squad processor 5868 (sshleifer)
- Improve doc of use_cache 5912 (sgugger)
- [Fix] seq2seq pack_dataset.py actually packs 5913 (sshleifer)
- Add AlbertForPretraining to doc 5914 (sgugger)
- DataParallel fix: multi gpu evaluation 5926 (csarron)
- Clarify arg class 5916 (sgugger)
- [CI] self-scheduled runner tests examples/ 5927 (sshleifer)
- Update doc to new model outputs 5946 (sgugger)
- [CI] Install examples/requirements.txt 5956 (sshleifer)
- Expose padding_strategy on squad processor to fix QA pipeline performance regression 5932 (mfuntowicz)
- [docs] Add integration test example to copy pasta template 5961 (sshleifer)
- Cleanup Trainer and expose customization points 5982 (sgugger)
- Avoid unnecessary warnings when loading pretrained model 5922 (sgugger)
- Ensure OpenAI GPT position_ids is correctly initialized and registered at init. 5773 (mfuntowicz)
- [CI] Don't test apex 6021 (sshleifer)
- add a summary report flag for run_examples on CI 6035 (stas00)
- don't complain about missing W&B when WANDB_DISABLED=true 6036 (stas00)
- Allow to set Adam beta1, beta2 in TrainingArgs 5592 (gonglinyuan)
- Fix the return documentation rendering for all model outputs 6022 (sgugger)
- Fix typo (model saving TF) 5734 (Colanim)
- Add new AutoModel classes in pipeline 6062 (patil-suraj)
- [pack_dataset] don't sort before packing, only pack train 5954 (sshleifer)
- CL util to convert models to fp16 before upload 5953 (sshleifer)
- Add fire to setup.cfg to make isort happy 6066 (sgugger)
- [fix] no warning for position_ids buffer 6063 (sshleifer)
- Pipelines should use tuples instead of namedtuples 6061 (LysandreJik)
- Moving transformers package import statements to relative imports in some files 5796 (afcruzs)
- github issue template suggests who to tag 5790 (sshleifer)
- Make all data collators accept dict 6065 (sgugger)
- Add inference widget examples 5825 (clmnt)
- [s2s] Delete useless method, log tokens_per_batch 6081 (sshleifer)
- Logs should not be hidden behind a logger.info 6097 (LysandreJik)
- Fix zero-shot pipeline single seq output shape 6104 (joeddav)
- [fix] add bart to LM_MAPPING 6099 (sshleifer)
- [Fix] position_ids tests again 6100 (sshleifer)
- Fix deebert tests 6102 (sshleifer)
- Use FutureWarning to deprecate 6111 (sgugger)
- Added capability to quantize a model while exporting through ONNX. 6089 (mfuntowicz)
- XLNet PLM Readme 6121 (LysandreJik)
- Fix TF CTRL model naming 6134 (jplu)
- Use google style to document properties 6130 (sgugger)
- Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} 5614
- Rework TF trainer 6038 (jplu)
- Actually the extra_id are from 0-99 and not from 1-100 5967 (orena1)
- add another e.g. to avoid confusion 6055 (orena1)
- Tf trainer cleanup 6143 (sgugger)
- Switch from return_tuple to return_dict 6138 (sgugger)
- Fix FlauBERT GPU test 6142 (LysandreJik)
- Enable ONNX/ONNXRuntime optimizations through converter script 6131 (mfuntowicz)
- Add Pytorch Native AMP support in Trainer 6151 (prajjwal1)
- enable easy checkout switch 5645 (stas00)
- Replace mecab-python3 with fugashi for Japanese tokenization 6086 (polm)
- parse arguments from dict 4869 (patil-suraj)
- Harmonize both Trainers API 6157 (sgugger)
- Model output test 6155 (sgugger)
- [s2s] clean up + doc 6184 (stas00)
- Add script to convert BERT tf2.x checkpoint to PyTorch 5791 (mar-muel)
- Empty assert hunt 6056 (TevenLeScao)
- Fix saved model creation 5468 (jplu)
- Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. 5331 (jaymody)
- [DataCollatorForLanguageModeling] fix labels 6213 (patil-suraj)
- Fix _shift_right function in TFT5PreTrainedModel 6214 (maurice-g)
- Remove outdated BERT tips 6217 (JetRunner)
- run_hans label fix 6221 (VictorSanh)
- Make the order of additional special tokens deterministic 5704 (gonglinyuan)
- cleanup torch unittests 6196 (stas00)
- test_tokenization_common.py: Remove redundant coverage 6224 (sshleifer)
- [Reformer] fix reformer fp16 test 6237 (patrickvonplaten)
- [Reformer] Make random seed generator available on random seed and not on model device 6244 (patrickvonplaten)
- Update to match renamed attributes in fairseq master 5972 (LilianBordeau)
- [WIP] lightning_base: support --lr_scheduler with multiple possibilities 6232 (stas00)
- Trainer + wandb quality of life logging tweaks 6241 (TevenLeScao)
- Add strip_accents to basic BertTokenizer. 6280 (PhilipMay)
- Argument to set GPT2 inner dimension 6296 (TevenLeScao)
- [Reformer] fix default generators for pytorch < 1.6 6300 (patrickvonplaten)
- Remove redundant line in run_pl_glue.py 6305 (xujiaze13)
- [Fix] text-classification PL example 6027 (bhashithe)
- fix the shuffle argument usage and the default 6307 (stas00)
- CI dependency wheel caching 6287 (LysandreJik)
- Patch GPU failures 6281 (LysandreJik)
- fix consistency CrossEntropyLoss in modeling_bart 6265 (idoh)
- Add a script to check all models are tested and documented 6298 (sgugger)
- Fix the tests for Electra 6284 (jplu)
- [examples] consistently use --gpus, instead of --n_gpu 6315 (stas00)
- refactor almost identical tests 6339 (stas00)
- Small docfile fixes 6328 (sgugger)
- Patch models 6326 (LysandreJik)
- Ci GitHub caching 6382 (LysandreJik)
- Fix links for open in colab 6391 (sgugger)
- [EncoderDecoderModel] add a `add_cross_attention` boolean to config 6377 (patrickvonplaten)
- Feed forward chunking 6024 (Pradhy729)
- add pl_glue example test 6034 (stas00)
- testing utils: capturing std streams context manager 6231 (stas00)
- Fix tokenizer saving and loading error 6026 (yobekiko)
- Warn if debug requested without TPU 6390 (dmlap)
- [Performance improvement] "Bad tokens ids" optimization 6064 (guillaume-be)
- pl version: examples/requirements.txt is single source of truth 6309 (stas00)
- [s2s] wmt download script use less ram 6405 (stas00)
- [pl] restore lr logging behavior for glue, ner examples 6314 (stas00)
- lr_schedulers: add get_polynomial_decay_schedule_with_warmup 6361 (stas00)
- [examples] add pytest dependency 6425 (sshleifer)
- [test] replace capsys with the more refined CaptureStderr/CaptureStdout 6422 (stas00)
- Fixes to make life easier with the nlp library 6423 (sgugger)
- Move prediction_loss_only to TrainingArguments 6426 (sgugger)
- Activate check on the CI 6427 (sgugger)
- cleanup tf unittests: part 2 6260 (stas00)
- Fix docs and bad word tokens generation_utils.py 6387 (ZhuBaohe)
- Test model outputs equivalence 6445 (LysandreJik)
- add LongformerTokenizerFast in AutoTokenizer 6463 (patil-suraj)
- add BartTokenizerFast in AutoTokenizer 6464 (patil-suraj)
- Add POS tagging and Phrase chunking token classification examples 6457 (vblagoje)
- Clean directory after script testing 6453 (JetRunner)
- Use hash to clean the test dirs 6475 (JetRunner)
- Sort unique_no_split_tokens to make it deterministic 6461 (lhoestq)
- Fix TPU Convergence bug 6488 (jysohn23)
- Support additional dictionaries for BERT Japanese tokenizers 6515 (singletongue)
- [doc] Summary of the models fixes 6511 (stas00)
- Remove deprecated assertEquals 6532 (JetRunner)
- [testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() 6494 (stas00)
- [sched] polynomial_decay_schedule use default power=1.0 6473 (stas00)
- Fix flaky ONNX tests 6531 (mfuntowicz)
- [doc] make the text more readable, fix some typos, add some disambiguation 6508 (stas00)
- [doc] multiple corrections to "Summary of the tasks" 6509 (stas00)
- replace _ with __ rst links 6541 (stas00)
- Fixed label datatype for STS-B 6492 (amodaresi)
- fix incorrect codecov reports 6553 (stas00)
- [docs] Fix wrong newline in the middle of a paragraph 6573 (romainr)
- [docs] Fix number of 'ug' occurrences in tokenizer_summary 6574 (romainr)
- add BartConfig.force_bos_token_to_be_generated 6526 (sshleifer)
- Fix bart base test 6587 (sshleifer)
- Feed forward chunking others 6365 (Pradhy729)
- tf generation utils: remove unused kwargs 6591 (sshleifer)
- [BartTokenizerFast] add prepare_seq2seq_batch 6543 (patil-suraj)
- [doc] lighter 'make test' 6512 (stas00)
- [docs] Copy code button misses '...' prefixed code 6518 (romainr)
- removed redundant arg in prepare_inputs 6614 (prajjwal1)
- add intro to nlp lib & dataset links to custom datasets tutorial 6583 (joeddav)
- Add tests to Trainer 6605 (sgugger)
- TFTrainer dataset doc & fix evaluation bug 6618 (joeddav)
- Add tests/test_tokenization_reformer.py 6485 (D-Roberts)
- [Tests] fix attention masks in Tests 6621 (patrickvonplaten)
- XLNet Bug when training with apex 16-bit precision 6567 (johndolgov)
- Move threshold up for flaky test with Electra 6622 (sgugger)
- Regression test for pegasus bugfix 6606 (sshleifer)
- Trainer automatically drops unused columns in nlp datasets 6449 (sgugger)
- [Docs model summaries] Add pegasus to docs 6640 (patrickvonplaten)
- [Doc model summary] add MBart model summary 6649 (patil-suraj)
- Specify config filename in HfArgumentParser 6626 (jarednielsen)
- Don't reset the dataset type + plug for rm unused columns 6683 (sgugger)
- Fixed DataCollatorForLanguageModeling not accepting lists of lists 6685 (TevenLeScao)
- Update repo to isort v5 6686 (sgugger)
- Fix PL token classification examples 6682 (vblagoje)
- Last fix for Ray HP search 6691 (sgugger)
- Create PULL_REQUEST_TEMPLATE.md 6660 (stas00)
- [doc] remove BartForConditionalGeneration.generate 6659 (stas00)
- Move unused args to kwargs 6694 (sgugger)
- [fixdoc] Add import to pegasus usage doc 6698 (sshleifer)
- Fix hyperparameter_search doc 6695 (sgugger)
- Remove hard-coded uses of float32 to fix mixed precision use 6648 (schmidek)
- Add DPR to models summary 6690 (lhoestq)
- Add typing.overload for convert_ids_tokens 6637 (tamuhey)
- Allow tests in examples to use cuda or fp16,if they are available 5512 (Joel-hanson)
- ci/gh/self-scheduled: add newline to make examples tests run even if src/ tests fail 6706 (sshleifer)
- Use separate tqdm progressbars 6696 (sgugger)
- More tests to Trainer 6699 (sgugger)
- Add tokenizer to Trainer 6689 (sgugger)
- tensor.nonzero() is deprecated in PyTorch 1.6 6715 (mfuntowicz)
- [Albert] Add position ids to allowed uninitialized weights 6719 (patrickvonplaten)
- Fix ONNX test_quantize unittest 6716 (mfuntowicz)
- [squad] make examples and dataset accessible from SquadDataset object 6710 (lazovich)
- Fix pegasus-xsum integration test 6726 (sshleifer)
- T5Tokenizer adds EOS token if not already added 5866 (sshleifer)
- Install nlp for github actions test 6728 (sgugger)
- [Torchscript] Fix docs 6740 (patrickvonplaten)
- Add "tie_word_embeddings" config param 6692 (patrickvonplaten)
- Fix tf boolean mask in graph mode 6741 (JayYip)
- Fix TF optimizer 6717 (jplu)
- [TF Longformer] Improve Speed for TF Longformer 6447 (patrickvonplaten)
- add __init__.py to utils 6754 (joeddav)
- [s2s] run_eval.py QOL improvements and cleanup 6746 (sshleifer)
- s2s distillation uses AutoModelForSeqToSeqLM 6761 (sshleifer)
- Add AdaFactor optimizer from fairseq 6722 (moscow25)
- Adds Adafactor to the docs and slightly fixes the formatting 6765 (LysandreJik)
- Fix the TF Trainer gradient accumulation and the TF NER example 6713 (jplu)
- Fix run_squad.py to work with BART 6756 (tomgrek)
- Add NLP install to self-scheduled CI 6767 (sshleifer)
- [testing] replace hardcoded paths to allow running tests from anywhere 6523 (stas00)
- [test schedulers] adjust to test the first step's reading 6429 (stas00)
- new Makefile target: docs 6510 (stas00)
- [transformers-cli] fix logger getter 6777 (stas00)
- PL: --adafactor option 6776 (sshleifer)
- [style] set the minimal required version for `black` 6784 (stas00)
- Transformer-XL: Improved tokenization with sacremoses 6322 (RafaelWO)
- prepare_seq2seq_batch makes labels/ decoder_input_ids made later. 6654 (sshleifer)
- t5 model should make decoder_attention_mask 6800 (sshleifer)
- [s2s] Test hub configs in self-scheduled CI 6809 (sshleifer)
- [bart] rename self-attention -> attention 6708 (sshleifer)
- [tests] fix typos in inputs 6818 (stas00)
- Fixed open in colab link 6825 (PandaWhoCodes)
- clarify shuffle 6312 (xujiaze13)
- TF Flaubert w/ pre-norm 6841 (LysandreJik)
- Fix resuming training for Windows 6847 (sgugger)
- Only access loss tensor every logging_steps 6802 (jysohn23)
- Add checkpointing to Ray Tune HPO 6747 (krfricke)
- Split hp search methods 6857 (sgugger)
- Fix marian slow test 6854 (sshleifer)
- Bart can make decoder_input_ids from labels 6758 (sshleifer)
- add a final report to all pytest jobs 6861 (stas00)
- Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. 6875 (mfuntowicz)
- [Generate] Facilitate PyTorch generate using `ModelOutputs` 6735 (patrickvonplaten)
