Transformers

3.0.2

Tokenizer fixes

Fixes bugs introduced by v3.0.0 and v3.0.1 in tokenizers.

3.0.1

Better backward-compatibility for tokenizers following v3.0.0 refactoring

Version v3.0.0 included a refactoring of the tokenizers' backend to allow a simpler and more flexible [user-facing API](https://huggingface.co/transformers/preprocessing.html).

This refactoring was conducted with a particular focus on keeping backward compatibility for the v2.X encoding, truncation and padding API but still led to two breaking changes that could have been avoided.

This patch aims to bring back better backward compatibility, by implementing the following updates:
- the `prepare_for_model` method is now publicly exposed again for both slow and fast tokenizers with an API compatible with both the v2.X truncation/padding API and [the v3.0 recommended API](https://huggingface.co/transformers/preprocessing.html).
- the truncation strategy now defaults again to `longest_first` instead of `only_first` (see the sketch after this list).
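
Below is a minimal sketch of the restored behaviour, not taken from the release itself: it assumes a standard `bert-base-uncased` checkpoint and that `prepare_for_model` accepts the same `padding`/`truncation`/`max_length` keyword arguments as the v3.0 encoding methods.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Token ids without special tokens, since prepare_for_model adds them itself.
ids_a = tokenizer.encode("A fairly long premise sentence for the example.", add_special_tokens=False)
ids_b = tokenizer.encode("A short hypothesis.", add_special_tokens=False)

# prepare_for_model is public again; with the default `longest_first` strategy
# the longer of the two sequences is trimmed first when max_length is reached.
encoded = tokenizer.prepare_for_model(
    ids_a,
    ids_b,
    max_length=16,
    truncation="longest_first",
    padding="max_length",
)
print(encoded["input_ids"])
```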

Bug fixes and improvements:
- Better support for TransfoXL tokenizer when using TextGenerationPipeline https://github.com/huggingface/transformers/pull/5465 (TevenLeScao)
- Fix use of mems in Transformer-XL generations https://github.com/huggingface/transformers/pull/4826 (tommccoy1)
- Fixing a bug in the NER pipeline which led to discarding the last identified entity https://github.com/huggingface/transformers/pull/5439 (mfuntowicz and enzoampil)
- Better QAPipelines https://github.com/huggingface/transformers/pull/5429 (mfuntowicz)
- Add Question-Answering and MLM heads to the Reformer model https://github.com/huggingface/transformers/pull/5433 (patrickvonplaten)
- Refactoring the LongFormer https://github.com/huggingface/transformers/pull/5219 (patrickvonplaten)
- Various fixes on tokenizers and tests (sshleifer)
- Many improvements to the doc and tutorials (sgugger)
- Fix TensorFlow dataset generator in run_glue https://github.com/huggingface/transformers/pull/4881 (jplu)
- Update Bertabs example to work again https://github.com/huggingface/transformers/pull/5355 (MichaelJanz)
- Move GenerationMixin to separate file https://github.com/huggingface/transformers/pull/5254 (yjernite)

3.0.0

New tokenizer API, TensorFlow improvements, enhanced documentation & tutorials

Breaking changes since `v2`

- In 4874 the language modeling BERT was split into two: `BertForMaskedLM` and `BertLMHeadModel`. `BertForMaskedLM` therefore cannot do causal language modeling anymore and cannot accept the `lm_labels` argument.
- The `Trainer` data collator is now a method instead of a class
- Directly setting a tokenizer's special token attributes (e.g. `tokenizer.mask_token = '<mask>'`) now only associates the token with the attribute of the tokenizer, but doesn't add the token to the vocabulary if it is not already there. Tokens are only added by using the `tokenizer.add_special_tokens()` and `tokenizer.add_tokens()` methods (see the sketch after this list).
- The `prepare_for_model` method was removed as part of the new tokenizer API.
- The truncation method is now `only_first` by default.
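
A minimal sketch of the new special-token behaviour, assuming a GPT-2 checkpoint and a hypothetical `<pad>` token:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Since v3.0.0 this only sets the attribute; the token is NOT added to the
# vocabulary if it is missing from it.
tokenizer.pad_token = "<pad>"

# This actually adds the token and returns the number of tokens added.
num_added = tokenizer.add_special_tokens({"pad_token": "<pad>"})

# The model's embedding matrix then needs to match the new vocabulary size.
model.resize_token_embeddings(len(tokenizer))
```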

New Tokenizer API (n1t0, thomwolf, mfuntowicz)

The tokenizers have evolved quickly in version 2, with the addition of Rust tokenizers. They now have a simpler and more flexible API, aligned between Python (slow) and Rust (fast) tokenizers. This new API lets you control truncation and padding more finely, allowing things like dynamic padding or padding to a multiple of 8.

The redesigned API is explained in detail here 4510 and here: https://huggingface.co/transformers/master/preprocessing.html

Notable changes:

- it's now possible to truncate to the max input length of a model while padding the longest sequence in a batch
- padding and truncation are decoupled and easier to control
- it's possible to pad to a multiple of a predefined length, e.g. 8, which can give significant speed-ups on recent NVIDIA GPUs (V100)
- a generic wrapper using `tokenizer.__call__` can be used for all cases (single sequences, pairs of sequences, batches, etc.); see the sketch after this list
- tokenizers now accept pre-tokenized inputs (when the input is already split into word strings, e.g. for NER)
- All the Rust tokenizers are now fully tested like slow tokenizers
- A new class `AddedToken` can be used to have a more fine-grained control on how added tokens behave during tokenization. In particular the user can control (1) whether left and right spaces are removed around the token during tokenization (2) whether the token will be identified inside another word and (3) whether the token will be recognized in normalized forms (e.g. in lower case if the tokenizer uses lower-casing)
- Serialization issues were fixed
- Possibility to create NumPy tensors when using the `return_tensors` parameter on tokenizers.
- Introduced a new enum `TensorType` to map all the possible tensor backends we support: `TensorType.TENSORFLOW`, `TensorType.PYTORCH`, `TensorType.NUMPY`
- Tokenizers now accept the `TensorType` enum in the `encode(...)`, `encode_plus(...)` and `batch_encode_plus(...)` methods for the `return_tensors` parameter.
- `BatchEncoding` new property `is_fast` indicates if the `BatchEncoding` comes from a Python (slow) tokenizer or a Rust (fast) tokenizer.
- Slow and Fast Tokenizers are now picklable. So is their output, the dict sub-class `BatchEncoding`.
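
A minimal sketch of the unified API, assuming a standard `bert-base-cased` checkpoint and PyTorch as the tensor backend:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

batch_a = ["Short sentence.", "A somewhat longer sentence to pad against."]
batch_b = ["First pair.", "Second pair."]

# The same __call__ handles single sequences, pairs and batches.
encoded = tokenizer(
    batch_a,
    batch_b,
    padding=True,          # pad to the longest sequence in the batch
    truncation=True,       # truncate to the model's max input length
    pad_to_multiple_of=8,  # helps tensor-core kernels on V100-class GPUs
    return_tensors="pt",
)
print(encoded.is_fast, encoded["input_ids"].shape)
```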

Several PRs to make the API more stable have been made:

- [tokenizers] Fix 5081 and improve backward compatibility 5125 (thomwolf)
- Tokenizers API developments 5103 (thomwolf)
- Clearer error message in the use-case of 5169 (thomwolf)
- Add more tests on tokenizers serialization - fix bugs 5056 (thomwolf)
- [Tokenization] Fix 5181 - make 5155 more explicit - move back the default logging level in tests to WARNING 5252 (thomwolf)
- [tokenizers] Several small improvements and bug fixes 5287
- Add `pad_to_multiple_of` on tokenizers (reimport) 5054 (mfuntowicz)
- [tokenizers] Updates data processors, docstring, examples and model cards to the new API 5308

TensorFlow improvements (jplu, dzorlu, LysandreJik)

Very big release for TensorFlow!
- TensorFlow models can now compute the loss themselves, using the `TFPretrainedModel.compute_loss` method. 4530
- Can now resize token embeddings in TensorFlow 4351
- Cleaning TensorFlow models 5229

Enhanced documentation (sgugger)

We welcome sgugger as a team member in New York. He already introduced a lot of very cool documentation changes:

- Added a [model summary](https://huggingface.co/transformers/master/model_summary.html) #4789
- Expose classes used in documentation 4808
- Explain how to preview the docs in a PR 4795
- Clean documentation 4849
- Remove old doc page and add note about cache in installation 5027
- Fix all sphinx warnings 5068 (sgugger)
- Update pipeline examples to doctest syntax 5030
- Reorganize documentation 5064
- Update installation page and add contributing to the doc 5084
- Update glossary 5148
- Quick tour 5145
- Switch master/stable doc and add older releases 5193
- Add version control menu 5222
- Don't recreate old docs 5243
- Tokenization tutorial 5257
- Remove links for all docs 5280
- New model sharing tutorial 5323

Training & fine-tuning quickstart

- Our own joeddav added a training & fine-tuning quickstart to the documentation 5034!

MobileBERT

The MobileBERT from [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou, was added to the library for both PyTorch and TensorFlow.

A single checkpoint is added: `mobilebert-uncased`, which is the `uncased_L-24_H-128_B-512_A-4_F-4_OPT` checkpoint converted to our API.

This model was first implemented in PyTorch by lonePatient, ported to the library by vshampor, then finalized and implemented in TensorFlow by LysandreJik.

Eli5 examples (yjernite) 4968

- The examples/eli5 folder contains training code for the dense retriever and to fine-tune a BART model, the jupyter notebook for the blog post, and the code for the live demo.

- The RetriBert model implements the dense passage retriever. It's basically a wrapper for two BERT models and projection matrices, but it does gradient checkpointing in a way that is very different from a concurrent PR, so Yacine thought it would be easier to give it its own class for now and see whether it can be merged into the BART code later.

Enhanced examples/seq2seq (sshleifer)

- the `examples/seq2seq` [folder](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md) is a combination of the old `examples/summarization` and `examples/translation` folders.
- Finetuning works well for summarization; more experiments are needed for translation. Finetuning works on multi-GPU, saves ROUGE scores during validation, and provides `--freeze_encoder` and `--freeze_embeds` options. These options make finetuning BART 5x faster on the CNN/DailyMail dataset.
- Distilbart code is added in `distillation.py`. It only supports summarization for now.
- Evaluation works well for both summarization and translation.
- New Weights & Biases [shared task](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md#xsum-shared-task) for collaboration on the XSUM summarization task

Distilbart (sshleifer)
- Distilbart models are smaller versions of `bart-large-cnn` and `bart-large-xsum`. They can be loaded using, for example, `BartForConditionalGeneration.from_pretrained('sshleifer/distilbart-xsum-12-6')` (see the sketch below). See this [tweet](https://twitter.com/sam_shleifer/status/1276160367853547522?s=20) for more info on available models and their speed/performance.
- Commands to reproduce are available in the `examples/seq2seq` [folder](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md)
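
A minimal usage sketch for one of the distilled checkpoints named above; the generation arguments are illustrative defaults, not tuned values:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "sshleifer/distilbart-xsum-12-6"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

article = "Long news article text goes here ..."
inputs = tokenizer([article], max_length=1024, truncation=True, return_tensors="pt")

# Beam-search summary; decode back to text, dropping special tokens.
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```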

BERT Loses Patience (JetRunner)

Add BERT Loses Patience (Patience-based Early Exit) based on the paper https://arxiv.org/abs/2006.04152 and the official implementation https://github.com/JetRunner/PABEE

Unifying `label` arguments (sgugger) 4722

- Deprecate any label argument that's not `labels` (like `masked_lm_labels`, `lm_labels`, etc.) in favor of `labels`.

NumPy type in tokenizers (mfuntowicz) 4585

Introduce a new tensor type for `return_tensors` on tokenizers, for NumPy.

- As we're introducing more than two tensor backend alternatives, a new enum `TensorType` lists all the possible tensor types we can create: `TensorType.TENSORFLOW`, `TensorType.PYTORCH`, `TensorType.NUMPY`. This might help newcomers who don't know about "tf" and "pt".
*Note: `TensorType` values are compatible with the previous "tf", "pt" and now "np" strings, for backward compatibility (+ unit test).*

- NumPy is now a possible target when creating tensors. This is useful for JAX (see the sketch below).
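
A minimal sketch of the NumPy target, assuming `TensorType` is importable from the package top level (the plain "np" string behaves the same way):

```python
from transformers import AutoTokenizer, TensorType

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Both calls return NumPy arrays instead of framework tensors.
as_np = tokenizer("Hello world", return_tensors="np")
also_np = tokenizer("Hello world", return_tensors=TensorType.NUMPY)

print(type(as_np["input_ids"]))  # <class 'numpy.ndarray'>
```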

Community notebooks

- Adding notebooks for Fine Tuning 4732 (abhimishra91):
  - Multi-class classification: using DistilBert
  - Multi-label classification: using Bert
  - Summarization: using T5 - model tracking with WandB
- [Speed up Fine-Tuning in Transformers with Dynamic Padding / Bucketing](https://github.com/ELS-RD/transformers-notebook/blob/master/Divide_Hugging_Face_Transformers_training_time_by_2_or_more.ipynb) #5195 (pommedeterresautee)
- [How to use Benchmarks](https://github.com/huggingface/transformers/blob/master/notebooks/05-benchmark.ipynb) (patrickvonplaten) #5312

Benchmarks (patrickvonplaten)

The benchmark script was consolidated and some features were added:

It adds the functionality to measure the following for TF and PT (4912):

- TensorFlow:
  - Inference: CPU, GPU, GPU + XLA, GPU + eager mode, CPU + eager mode, TPU
- PyTorch:
  - Inference: CPU, CPU + torchscript, GPU, GPU + torchscript, GPU + mixed precision, Torch/XLA TPU
  - Training: CPU, GPU, GPU + mixed precision, Torch/XLA TPU

- [Benchmark] Add encoder decoder to benchmark and clean labels 4810
- [Benchmark] add tpu and torchscript for benchmark 4850
- [Benchmark] Extend Benchmark to all model type extensions 5241
- [Benchmarks] improve Example Plotter 5245

Hidden states, attentions and cache

Before v3.0.0, whether to return attentions, hidden states, or (for models that support it) the cache used in sequential decoding was controlled by arguments in the configuration. In v3.0.0, while those configuration arguments are kept for backward compatibility, these options can now also be passed directly to the `forward` and `call` methods (see the sketch after the list below).

- Output attentions 4538 (Bharat123rox)
- Output hidden states 4978 (drjosephliu)
- Use cache 5194 (patrickvonplaten)
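
A minimal sketch of the per-call control, assuming a standard `bert-base-uncased` checkpoint; before v3.0.0 these flags could only be set in the configuration:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(
        **inputs,
        output_attentions=True,     # request attention maps for this call only
        output_hidden_states=True,  # request the hidden states of every layer
    )

# v3.0.0 models still return tuples; the extra outputs are appended at the end.
last_hidden_state = outputs[0]
```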

Revamped `AutoModel`s (patrickvonplaten)

The `AutoModelWithLMHead` class encompasses all models with a language modeling head, without distinguishing between causal, masked and seq2seq models. Three new auto classes are added (see the sketch after this list):

- `AutoModelForCausalLM` for Autoregressive models
- `AutoModelForMaskedLM` for Autoencoding models
- `AutoModelForSeq2SeqLM` for sequence-to-sequence models with a causal LM decoder
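
A minimal sketch of picking the auto class that matches the head you need; the checkpoint ids are common hub ids used purely for illustration:

```python
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoModelForSeq2SeqLM

causal = AutoModelForCausalLM.from_pretrained("gpt2")               # autoregressive LM
masked = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # masked LM
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")         # encoder-decoder LM
```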

New model & tokenizer architectures

- XLMRobertaForQuestionAnswering 4855 (sgugger)
- ElectraForQuestionAnswering 4913 (patil-suraj)
- Add AlbertForMultipleChoice 4959 (sgugger)
- BartForQuestionAnswering 4908 (patil-suraj)
- BartTokenizerFast 4878 (patil-suraj)
- Add DistilBertForMultipleChoice 5032 (sgugger)
- ElectraForMultipleChoice 4954 (sgugger)

ONNX

- Fixed a bug causing invalid ordering of the inputs in the underlying ONNX IR.
- Increased logging to give the user more information about the exported variables.

Bug fixes and improvements

- TFRobertaModelIntegrationTest requires tf 4726 (sshleifer)
- Cleanup glue for TPU 4621 (jysohn23)
- [Reformer] Improved memory if input is shorter than chunk length 4720 (patrickvonplaten)
- Pipelines: miscellanea of QoL improvements and small features 4632 (julien-c)
- Fix bug when changing the <EOS> token for generate 4745 (patrickvonplaten)
- never_split on slow tokenizers should not split 4723 (mfuntowicz)
- PretrainedModel.generate: remove unused kwargs 4761 (sshleifer)
- Codecov is now setup differently to have better insights into code coverage 4768 (LysandreJik)
- Don't access pad_token_id if there is no pad_token 4773 (sgugger)
- Removed deprecated use of Variable API from pplm example 4619 (prajjwal1)
- Add drop_last arg for data loader 4757 4925 (setu4993)
- No silent error when XLNet's `d_head` is already in the configuration 4747 (LysandreJik)
- MarianTokenizer: delete unused constants 4802 (sshleifer)
- NER: Add new WNUT’17 example 4681 (stefan-it)
- [EncoderDecoderConfig] automatically set decoder config to decoder 4809 (patrickvonplaten)
- Add matplotlib to known 3rd party dependencies 4800 (sshleifer)
- Pipelines test and new kwarg 4812 (sshleifer)
- Updated path "cd examples/text-generation/pplm" 4778 (Mr-Ruben)
- [marian tests] pass device to pipeline 4815 (sshleifer)
- Export PretrainedBartModel from __init__ 4819 (BramVanroy)
- Updates args in tf squad example. 4820 (daniel-shan)
- [Generate] beam search should generate without replacement (patrickvonplaten)
- TFTrainer: Align how the checkpoints are managed the same way than in the PyTorch trainer. 4831 (jplu)
- [Longformer] Remove redundant code 4839 (ZhuBaohe)
- [cleanup] consolidate some prune_heads logic 4799 (sshleifer)
- Fix the __getattr__ method in BatchEncoding 4772 (jplu)
- Consolidate summarization examples 4837 (aretius)
- Fix a bug in the initialization and serialization of TFRobertaClassificationHead 4884 (harkous)
- [examples] Cleanup summarization docs 4876 (sshleifer)
- run_pplm.py bug fix 4867 (songyouwei)
- Remove unused arguments in Multiple Choice example 4853 (sgugger)
- Deal with multiple choice in common tests 4886 (sgugger)
- Fix the CI 4903 (sgugger)
- [All models] fix docs after adding output attentions to all forward functions 4909 (patrickvonplaten)
- Add more models to common tests 4910 (sgugger)
- [ctrl] fix pruning of MultiHeadAttention 4904 (aretius)
- Don't init TPU device twice 4916 (patrickvonplaten)
- Run a single wandb instance per TPU run 4851 (LysandreJik)
- check type before logging in trainer to ensure values are scalars 4883 (mgoldey)
- Split LMBert model in two 4874 (sgugger)
- Make multiple choice models work with input_embeds 4921 (sgugger)
- Fix resize_token_embeddings for Transformer-XL 4759 (RafaelWO)
- [mbart] Fix fp16 testing logic 4949 (sshleifer)
- Hans data with newer tokenizer API 4854 (sgugger)
- Fix parameter 'output_attentions' docstring 4976 (ZhuBaohe)
- Improve ONNX logging 4999 (mfuntowicz)
- NER: fix construction of input examples for RoBERTa 4943 (stefan-it)
- Possible fix to make AMP work with DDP in the trainer 4728 (BramVanroy)
- Make DataCollator a callable 5015 (sgugger)
- Increase pipeline support for ONNX export. 5005 (mfuntowicz)
- Fix importing transformers on Windows - SIGKILL not defined 4997 (mfuntowicz)
- TFTrainer: improve logging 4946 (borisdayma)
- Add position_ids in TFElectra models docstring 5021 (sgugger)
- [Bart] Question Answering Model is added to tests 5024 (patrickvonplaten)
- Ability to pickle/unpickle BatchEncoding pickle (reimport) 5039 (mfuntowicz)
- refactor(wandb): consolidate import 5044 (borisdayma)
- [cleanup] Hoist ModelTester objects to top level 4939 (aretius)
- Convert hans to Trainer 5025 (sgugger)
- Fix marian tokenizer save pretrained 5043 (sshleifer)
- [cleanup] examples test_run_squad uses tiny model 5059 (sshleifer)
- Add header and fix command for HANS 5082 (sgugger)
- [examples] SummarizationModule improvements 4951 (sshleifer)
- Some changes to simplify the generation function 5031 (yjernite)
- Make default_data_collator more flexible and deprecate old behavior 5060 (sgugger)
- [MarianTokenizer] Switch to sacremoses for punc normalization 5092 (sshleifer)
- [style] add pandas to setup.cfg 5093 (sshleifer)
- [ElectraForQuestionAnswering] fix qa example in doc 4929 (patil-suraj)
- Fixing TPU training by disabling wandb.watch gradients logging 4926 (patil-suraj)
- [docs] fix T5 training doc 5080 (patil-suraj)
- support local_files_only option for tf models 5116 (ogarin)
- [cleanup] generate_beam_search comments 5115 (sshleifer)
- [fix] Move _adjust_logits above postprocess to fix Marian.generate 5126 (sshleifer)
- Pin `sphinx-rtd-theme` 5128 (LysandreJik)
- Add missing arg in 02-transformers notebook 5085 (pri-ax)
- [cleanup] remove redundant code in SummarizationDataset 5119 (sshleifer)
- AutoTokenizer supports mbart-large-en-ro 5121 (sshleifer)
- Fix in Reformer Config documentation 5138 (erickrf)
- [bart-mnli] Fix class flipping bug 5141 (sshleifer)
- [MobileBert] fix dropout 5150 (ZhuBaohe)
- SummarizationPipeline: init required task name 5086 (julien-c)
- [examples] fixes arguments for summarization finetune scripts 5157 (ieBoytsov)
- Fixing docs for Encoder Decoder Config 5171 (mikaelsouza)
- fix bart doc 5132 (fuzihaofzh)
- Added feature to move added tokens in vocabulary for Transformer-XL 4953 (RafaelWO)
- Add support for gradient checkpointing in BERT 4659 (ibeltagy)
- Fix for IndexError when Roberta Tokenizer is called on empty text 4209 (malteos)
- Add TF auto model to the docs + fix sphinx warnings (again) 5187 (sgugger)
- Have documentation fail on warning 5189 (LysandreJik)
- Cleaner warning when loading pretrained models 4557 (thomwolf)
- Upgrade examples to pl=0.8.1 5146 (sshleifer)
- [fix] mobilebert had wrong path, causing slow test failure 5205 (sshleifer)
- [fix] remove unused import 5206 (sshleifer)
- [Reformer] Axial Pos Emb Improve mem usage reformer 5209 (patrickvonplaten)
- [pl_examples] revert deletion of optimizer_step 5227 (sshleifer)
- [bart] add config.extra_pos_embeddings to facilitate reuse 5190 (sshleifer)
- Only put tensors on a device 5223 (sgugger)
- Fix PABEE division by zero error 5233 (JetRunner)
- Use the script in utils 5224 (sgugger)
- Delay decay schedule until the end of warmup 4940 (amodaresi)
- Replace pad_token with -100 for LM loss calculation 4718 (setu4993)
- examples/seq2seq supports translation 5202 (sshleifer)
- Fix convert_graph_to_onnx script 5230 (n1t0)
- Refactor Code samples; Test code samples 5036 (LysandreJik)
- [Generation] fix docs for decoder_input_ids 5306 (patrickvonplaten)
- [pipelines] Change summarization default to distilbart-cnn-12-6 5289 (sshleifer)
- Add BART-base modeling and configuration 5315 (JetRunner)
- CircleCI stores cleaner output at test_outputs.txt 5291 (sshleifer)
- [pl_examples] default warmup steps=0 5316 (sshleifer)

2.11.0

Longformer

- Longformer (ibeltagy)
- Longformer for QA (patil-suraj + patrickvonplaten)
- Longformer fast tokenizer (patil-suraj)
- Longformer for sequence classification (patil-suraj)
- Longformer for token classification (patil-suraj)
- Longformer for Multiple Choice (patrickvonplaten)
- More user-friendly handling of global attention mask vs local attention mask (patrickvonplaten)
- Fix longformer attention mask type casting when using APEX (peskotivesgeroff)

New community notebooks!

- Long Sequence Modeling with Reformer (patrickvonplaten)
- Fine-tune BART for summarization (ohmeow)
- Fine-tune a pre-trained Transformer on anyone's tweets (borisdayma, lavanyashukla)
- A step-by-step guide to tracking hugging face model performance with wandb (jxmorris12, lavanyashukla)
- Fine-tune Longformer for QA (patil-suraj)
- Pretrain Longformer (ibeltagy)
- Fine-tune T5 for sentiment span extraction (enzoampil)

URLs to model weights are not hardcoded anymore (julien-c)

Archive maps were dictionaries linking pre-trained models to their S3 URLs. Since the arrival of the model hub, these have become obsolete.

⚠️ This PR is breaking for the following models: BART, Flaubert, bert-japanese, bert-base-finnish, bert-base-dutch. ⚠️
Those models now have to be instantiated with their full model id (see the sketch after the list below):

> "cl-tohoku/bert-base-japanese"
> "cl-tohoku/bert-base-japanese-whole-word-masking"
> "cl-tohoku/bert-base-japanese-char"
> "cl-tohoku/bert-base-japanese-char-whole-word-masking"
> "TurkuNLP/bert-base-finnish-cased-v1"
> "TurkuNLP/bert-base-finnish-uncased-v1"
> "wietsedv/bert-base-dutch-cased"
> "flaubert/flaubert_small_cased"
> "flaubert/flaubert_base_uncased"
> "flaubert/flaubert_base_cased"
> "flaubert/flaubert_large_cased"
>
> all variants of "facebook/bart"
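
A minimal sketch of loading one of the affected models by its full id (the Finnish BERT checkpoint is used here purely as an example):

```python
from transformers import AutoModel, AutoTokenizer

model_id = "TurkuNLP/bert-base-finnish-cased-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # short names no longer resolve
model = AutoModel.from_pretrained(model_id)
```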

Update: ⚠️ This PR is also breaking for ALBERT from TensorFlow. See issue 4806 for discussion and resolution. ⚠️

Fixes and improvements

- Fix convert_token_type_ids_from_sequences for fast tokenizers (n1t0, 4503)
- Fixed the default tokenizer of the summarization pipeline (sshleifer, 4506)
- The `max_len` attribute is now more robust, and warns the user about deprecation (mfuntowicz, 4528)
- Added type hints to `modeling_utils.py` (bglearning, 3911)
- MMBT model now has `nn.Module` as a superclass (shoarora, 4533)
- Fixing tokenization of extra_id symbols in the T5 tokenizer (mansimov, 4353)
- Slow GPU tests run daily (julien-c, 4465)
- Removed PyTorch artifacts in TensorFlow XLNet implementation (ZhuBaohe, 4410)
- Fixed the T5 Cross Attention Position Bias (ZhuBaohe, 4499)
- The `transformers-cli` is now cross-platform (BramVanroy, 4131) + (patrickvonplaten, 4614)
- GPT-2, CTRL: Accept `input_ids` and `past` of variable length (patrickvonplaten, 4581)
- Added back `--do_lower_case` to SQuAD examples.
- Correct framework test requirement for language generation tests (sshleifer, 4616)
- Fix `add_special_tokens` on fast tokenizers (n1t0, 4531)
- MNLI & SST-2 bugs were fixed (stdcoutzyx, 4546)
- Fixed BERT example for NSP and multiple choice (siboehm, 3953)
- Encoder/decoder fix initialization and save/load bug (patrickvonplaten, 4680)
- Fix onnx export input names order (RensDimmendaal, 4641)
- Configuration: ensure that id2label always takes precedence over `num_labels` (julien-c, direct commit to `master`)
- Make docstring match argument (sgugger, 4711)
- Specify PyTorch versions for examples (LysandreJik, 4710)
- Override get_vocab for fast tokenizers (mfuntowicz, 4717)
- Tokenizer should not add special tokens for text generation (patrickvonplaten, 4686)

2.10.0

Reformer (patrickvonplaten)

- Added a new model "Reformer": https://arxiv.org/abs/2001.04451 to the library. Original trax code: https://github.com/google/trax/tree/master/trax/models/reformer was translated to PyTorch.

- Reformer uses chunked attention and reversible layers to model sequences as long as 500,000 tokens.

- Reformer is currently available as a causal language model and will soon also be available as an encoder-only ("BERT"-like) model.

- Two pretrained weights are uploaded: https://huggingface.co/models?search=google%2Freformer

- https://huggingface.co/google/reformer-enwik8 is the first character-level language model in the library (see the sketch after this list)
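
A minimal generation sketch, assuming `google/reformer-crime-and-punishment` is one of the uploaded checkpoints and that its SentencePiece tokenizer ships with it:

```python
from transformers import ReformerModelWithLMHead, ReformerTokenizer

model_name = "google/reformer-crime-and-punishment"
tokenizer = ReformerTokenizer.from_pretrained(model_name)
model = ReformerModelWithLMHead.from_pretrained(model_name)

# Encode a prompt and sample a continuation from the causal LM head.
input_ids = tokenizer.encode("The night was", return_tensors="pt")
generated = model.generate(input_ids, max_length=50, do_sample=True)
print(tokenizer.decode(generated[0]))
```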

Additional architectures

- The `ElectraForSequenceClassification` was added by liuzzi

Trainer Tweaks and fixes (LysandreJik, julien-c)

TPU (LysandreJik):
- Model saving, as well as optimizer and scheduler saving mid-training were hanging
- Fixed the optimizer weight updates

Trainer (julien-c)
- Fixed the `nn.DataParallel` support compatibility for PyTorch `v1.5.0`
- Distributed evaluation: SequentialDistributedSampler + gather all results
- Move model to correct device
- Map optimizer to correct device after loading from checkpoint (shaoyent)

QOL: Tokenization, Pipelines

- New method for all tokenizers: `tokenizer.decode_batch`, to decode an entire batch (sshleifer)
- The NER pipeline now returns entity groups (enzoampil); see the sketch below
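
A minimal sketch of the grouped-entity output; the `grouped_entities` flag name is an assumption about how this option was exposed at the time:

```python
from transformers import pipeline

# Uses the default NER model; adjacent sub-token predictions are merged into
# entity groups (e.g. one ORG span for "Hugging Face").
ner = pipeline("ner", grouped_entities=True)
print(ner("Hugging Face is based in New York City."))
```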

ONNX Conversion script (mfuntowicz)

- Added a [conversion](https://github.com/huggingface/transformers/blob/master/src/transformers/convert_graph_to_onnx.py) script to convert both PyTorch and TensorFlow models to ONNX.
- Added a [notebook](https://github.com/huggingface/transformers/blob/master/notebooks/04-onnx-export.ipynb) explaining how it works

Community notebooks

We've started adding community notebooks to the repository. Three notebooks have made their way into our codebase:

- [Training T5 on TPU](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) patil-suraj
- [Fine-tuning T5](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb)
- [Fine-tuning DialoGPT](https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) ncoop57

Predict stage for GLUE task, easy submit to gluebenchmark.com

- Adds a predict stage for GLUE tasks, and generates result files which can be submitted to gluebenchmark.com (stdcoutzyx)

Fixes and improvements

- Support flake8 3.8 (julien-c)
- Tests are now faster thanks to using dummy smaller models (sshleifer)
- Fixed the eval loss in the trainer (patil-suraj)
- Fixed the `p_mask` in SQuAD pre-processing (LysandreJik)
- Github Actions pytorch test are no longer pinned to `torch==1.4.0` (mfuntowicz)
- Fixed the multiple-choice script with overflowing tokens (LysandreJik)
- Allow for `None` values in `GradientAccumulator` (jarednielsen, improved by jplu)
- MBart tokenizer saving/loading id was fixed (Mehrad0711)
- TF generation: Fix issue for batch output generation of different output lengths (patrickvonplaten)
- Fixed the FP-16 support in the T5 model (patrickvonplaten)
- `run_language_modeling` fix: actually use the `overwrite_cache` argument (borisdayma)
- Better, version compatible way to get the learning rate in the trainer (rakeshchada)
- Fixed the slow tests that were failing on GPU (sshleifer, patrickvonplaten, LysandreJik)
- ONNX conversion tokenizer fix (RensDimmendaal)
- Correct TF formatting to exclude LayerNorms from weight decay (oliverastrand)
- Removed warning of deprecation (Colanim)
- Fix missing `no_grad` in second pruning in run_bertology (TobiasLee)

2.9.1

Marian (sshleifer)
- A new model architecture, `MarianMTModel`, with 1,008+ pretrained weights, is available for machine translation in PyTorch.
- The corresponding `MarianTokenizer` uses a `prepare_translation_batch` method to prepare model inputs (see the sketch after this list).
- All pretrained model names use the following format: `Helsinki-NLP/opus-mt-{src}-{tgt}`
- See [docs](https://huggingface.co/transformers/model_doc/marian.html) for information on pretrained model discovery and naming, or find your language [here](https://huggingface.co/models?search=Helsinki-NLP)
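
A minimal translation sketch; the English-to-German pair is chosen purely for illustration:

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# prepare_translation_batch builds input_ids and attention_mask for the model.
batch = tokenizer.prepare_translation_batch(["Machine translation is fun."])
generated = model.generate(**batch)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```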

AlbertForPreTraining (jarednielsen)

A new model architecture has been added: `AlbertForPreTraining` in both PyTorch and TensorFlow

TF 2.2 compatibility (mfuntowicz, jplu)

Changes have been made to both the TensorFlow scripts and our internals so that we are compatible with TensorFlow 2.2

TFTrainer now supports new tasks

- Multiple choice has been added to the TFTrainer (ViktorAlm)
- Question Answering has been added to the TFTrainer (jplu)

Fixes and improvements

- Fixed a bug with the tf generation pipeline (patrickvonplaten)
- Fixed the XLA spawn (julien-c)
- The sentiment analysis pipeline tokenizer was cased while the model was uncased (mfuntowicz)
- Albert was added to the conversion CLI (fgaim)
- CamemBERT's token type ID generation was removed from the tokenizer, as with RoBERTa, since the model does not use them (LysandreJik)
- Additional migration documentation was added (guoquan)
- GPT-2 can now be exported to ONNX (tianleiwu)
- Simplify cache vars and allow for TRANSFORMERS_CACHE env (BramVanroy)
- Remove hard-coded pad token id in distilbert and albert (monologg)
- BART tests were fixed on GPU (julien-c)
- Better wandb integration (vanpelt, borisdayma, julien-c)
