Transformers


4.2.0

LED from AllenAI (patrickvonplaten)

Four new models are released as part of the LED implementation: `LEDModel`, `LEDForConditionalGeneration`, `LEDForSequenceClassification`, `LEDForQuestionAnswering`, in PyTorch. The first two models have a TensorFlow version.

LED is the encoder-decoder variant of the Longformer model by allenai.

The LED model was proposed in [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=led
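For example, long-document summarization with one of the Hub checkpoints might look like the sketch below; the `allenai/led-large-16384-arxiv` checkpoint and the global-attention setup are illustrative, see the notebooks below for complete recipes.

```python
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

tokenizer = LEDTokenizer.from_pretrained("allenai/led-large-16384-arxiv")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-large-16384-arxiv")

long_document = "Long-document transformers combine local and global attention. " * 500
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=16384)

# LED-style models expect global attention on at least the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    global_attention_mask=global_attention_mask,
    num_beams=4,
    max_length=256,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```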

Available notebooks:
- Evaluation: https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing
- Finetuning: https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing

Contributions:
- LED 9278 (patrickvonplaten)
- [LED Test] fix common inputs pt for flaky pt-tf led test 9459 (SBrandeis, patrickvonplaten)
- [TF Led] Fix flaky TF Led test 9513 (patrickvonplaten)

Generation Scores & other outputs (patrickvonplaten)

The PyTorch generation function can now also return:

- `scores` - the logits generated at each step
- `attentions` - all attention weights at each generation step
- `hidden_states` - all hidden states at each generation step

by simply adding `return_dict_in_generate=True` to the config or passing it to `.generate()`, together with the corresponding `output_scores`, `output_attentions` and `output_hidden_states` flags.
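A minimal sketch of how these flags combine, assuming a GPT-2 checkpoint (the exact output class returned depends on the decoding strategy):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is", return_tensors="pt")
outputs = model.generate(
    inputs["input_ids"],
    max_length=20,
    return_dict_in_generate=True,  # return a structured output instead of a bare tensor
    output_scores=True,            # logits at each generation step
    output_attentions=True,        # attention weights at each generation step
    output_hidden_states=True,     # hidden states at each generation step
)

print(outputs.sequences.shape)  # generated token ids
print(len(outputs.scores))      # one entry per generated token
```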

Tweet:
- https://twitter.com/SimonBrandeis/status/1346858472000937984

Forum posts with more detailed explanations:
- https://discuss.huggingface.co/t/announcement-generationoutputs-scores-attentions-and-hidden-states-now-available-as-outputs-to-generate/3094/2
- https://discuss.huggingface.co/t/generation-probabilities-how-to-compute-probabilities-of-output-scores-for-gpt2/3175

PR:
- Add flags to return scores, hidden states and / or attention weights in GenerationMixin 9150 (SBrandeis)

TensorFlow improvements

TensorFlow BERT-like model improvements (jplu)

The TensorFlow versions of the BERT-like models have been updated and are now twice as fast as the previous versions.

- Improve BERT-like models performance with better self attention 9124 (jplu)

Better integration in TensorFlow Serving (jplu)

This version introduces a new API for TensorFlow saved models, which can now be exported with `model.save_pretrained("path", saved_model=True)` and easily loaded into a TensorFlow Serving environment.

- New serving 9419 (jplu)
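A minimal sketch of the export described above; any TF model class should work, and `TFBertModel` plus the output directory name are only placeholders.

```python
from transformers import TFBertModel

model = TFBertModel.from_pretrained("bert-base-cased")

# In addition to the usual config and weights, this now also writes a
# TensorFlow SavedModel export that TensorFlow Serving can load directly.
model.save_pretrained("exported_bert", saved_model=True)
```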

DeepSpeed integration (stas00)

Initial support for [DeepSpeed](https://github.com/microsoft/DeepSpeed) to accelerate distributed training on several GPUs. This is an experimental feature that hasn't been fully tested yet, but early results are very encouraging (see [this comment](https://github.com/huggingface/transformers/issues/8771#issuecomment-759248400)). Stay tuned for more details in the coming weeks!
- [trainer] deepspeed integration 9211 (stas00)
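For a rough idea of what the integration looks like (the exact arguments are described in the PR above), the Trainer can be pointed at a standard DeepSpeed JSON config. Treat the snippet below as a sketch: it assumes `deepspeed` is installed and that a `ds_config.json` file exists.

```python
# Sketch only: hook a DeepSpeed config into the Trainer via TrainingArguments.
# Assumes `pip install deepspeed` and an existing ds_config.json; the training
# script is then typically launched with the `deepspeed` launcher.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    fp16=True,
    deepspeed="ds_config.json",  # path to a standard DeepSpeed configuration file
)
```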

Model templates (patrickvonplaten)

The encoder-decoder version of the templates is now part of Transformers! This makes adding a new encoder-decoder model much easier. More information can be found in the [README](https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model).
- Model Templates for Seq2Seq 9251 (patrickvonplaten)
- [Seq2Seq Templates] Add embedding scale to templates 9342 (patrickvonplaten)
- [Seq2Seq Templates] Add forgotten imports to templates 9346 (patrickvonplaten)

Faster import (sgugger)

The initialization process has been changed to only import what is required. Therefore, when using only PyTorch models, TensorFlow will not be imported and vice-versa. In the best case, importing a transformers model now takes only a few hundred milliseconds (~200ms), compared to several seconds (~3s) in previous versions.

- Fast transformers import part 1 9441 (sgugger)
- Transformers fast import part 2 9446 (sgugger)
- Fast imports part 3 9474 (sgugger)
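As a rough way to observe the difference (timings depend heavily on the machine and on which framework backs the class being imported):

```python
import time

start = time.time()
from transformers import BertModel  # with lazy init, this no longer pulls in TensorFlow

print(f"import took {time.time() - start:.2f}s")
```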

Documentation highlights (Qbiwan, NielsRogge)

Some models now have improved documentation. The `LayoutLM` model has seen a general overhaul in its documentation thanks to NielsRogge.

The tokenizer-only models `Bertweet`, `Herbert` and `Phobert` now have their own documentation pages thanks to Qbiwan.

- Improve LayoutLM 9476 (NielsRogge)
- Improve documentation coverage for Bertweet 9379 (Qbiwan)
- Improve documentation coverage for Herbert 9428 (Qbiwan)
- Improve documentation coverage for Phobert 9427 (Qbiwan)

Breaking changes

There are no breaking changes between the previous version and this one.
This is the first version to require TensorFlow >= 2.3.

General improvements and bugfixes

- add tests for the new sharded ddp fairscale integration 9177 (stas00)
- Added TF CTRL Sequence Classification 9151 (spatil6)
- [trainer] apex fixes and tests 9180 (stas00)
- Fix link to old NER fine-tuning script 9182 (mrm8488)
- fixed not JSON serializable error in run_qa.py with fp16 9186 (WissamAntoun)
- [setup] correct transformers version format 9176 (stas00)
- Fix link to old SQUAD fine-tuning script 9181 (mrm8488)
- Add new run_swag example 9175 (sgugger)
- Add timing inside Trainer 9196 (sgugger)
- GPT-model attention heads pruning example 9189 (altsoph)
- [t5 doc] typos 9199 (stas00)
- [run_glue] add speed metrics 9198 (stas00)
- Added TF TransfoXL Sequence Classification 9169 (spatil6)
- [finetune trainer] better logging and help 9203 (stas00)
- [RAG] Add Ray implementation for distributed retrieval 9197 (amogkam)
- [T5] Fix warning for changed EncDec Attention Bias weight 9231 (patrickvonplaten)
- Improve BERT-like models performance with better self attention 9124 (jplu)
- Fix TF template 9234 (jplu)
- Fix beam search generation for GPT2 and T5 on model parallelism 9219 (TobiasNorlund)
- add base model classes to bart subclassed models 9230 (patil-suraj)
- [MPNet] Add slow to fast tokenizer converter 9233 (patrickvonplaten)
- Adding performer fine-tuning research example 9239 (TevenLeScao)
- Update the README of the text classification example 9237 (sgugger)
- [EncoderDecoder] Make tests more aggressive 9256 (patrickvonplaten)
- Fix script that check objects are documented 9259 (sgugger)
- Seq2seq trainer 9241 (sgugger)
- Fix link to old language modeling script 9254 (mrm8488)
- Fix link to bertabs/README.md 9255 (mrm8488)
- Fix TF BART for saved model creation 9252 (jplu)
- Add speed metrics to all example scripts + template 9260 (sgugger)
- Revert renaming in finetune_trainer 9262 (sgugger)
- Fix gpt2 document 9272 (xu-song)
- Fix param error 9273 (xu-song)
- [Seq2Seq Templates] Fix check_repo.py templates file 9277 (patrickvonplaten)
- Minor documentation revisions from copyediting 9266 (connorbrinton)
- Adapt to new name of `label_smoothing_factor` training arg 9282 (sgugger)
- Add caching mechanism to BERT, RoBERTa 9183 (patil-suraj)
- [Templates] Adapt Bert 9284 (patrickvonplaten)
- allow integer device for BatchEncoding 9271 (jethrokuan)
- Fix typo in file_utils.py 9289 (jungwhank)
- [bert_generation] enable cache by default 9296 (patil-suraj)
- Proposed Fix : [RagSequenceForGeneration] generate "without" input_ids 9220 (ratthachat)
- fix typo in modeling_encoder_decoder.py 9297 (daniele-sartiano)
- Update tokenization_utils_base.py 9293 (BramVanroy)
- [Bart doc] Fix outdated statement 9299 (patrickvonplaten)
- add translation example 9303 (vasudevgupta7)
- [GPT2] Correct gradient checkpointing 9308 (patrickvonplaten)
- [Seq2SeqTrainer] Fix Typo 9320 (patrickvonplaten)
- [Seq2Seq Templates] Correct some TF-serving errors and add gradient checkpointing to PT by default. 9334 (patrickvonplaten)
- Fix TF T5 9301 (jplu)
- Fix TF TransfoXL 9302 (jplu)
- [prophetnet] wrong import 9349 (stas00)
- [apex.normalizations.FusedLayerNorm] torch.cuda.is_available() is redundant as apex handles that internally 9350 (stas00)
- Make sure to use return dict for the encoder call inside RagTokenForGeneration 9363 (dblakely)
- [Docs] `past_key_values` return a tuple of tuple as a default 9381 (patrickvonplaten)
- [docs] Fix TF base model examples: outputs.last_hidden_states -> state 9382 (ck37)
- Fix typos in README and bugs in RAG example code for end-to-end evaluation and finetuning 9355 (yoshitomo-matsubara)
- Simplify marian distillation script 9394 (sshleifer)
- Add utility function for retrieving locally cached models 8836 (cdpierse)
- Fix TF CTRL 9291 (jplu)
- Put back LXMert example 9401 (sgugger)
- Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/lxmert 9402 (dependabot[bot])
- Fix TF Flaubert 9292 (jplu)
- [trainer] parametrize default output_dir 9352 (stas00)
- Fix utils on Windows 9368 (jplu)
- Fix TF DPR 9283 (jplu)
- [Docs] Tokenizer Squad 2.0 example 9378 (patrickvonplaten)
- replace apex.normalization.FusedLayerNorm with torch.nn.LayerNorm 9386 (stas00)
- [test_model_parallelization] multiple fixes 9354 (stas00)
- Fix TF Longformer 9348 (jplu)
- [logging] autoflush 9385 (stas00)
- TF >= 2.3 cleaning 9369 (jplu)
- [trainer] --model_parallel hasn't been implemented for most models 9347 (stas00)
- Fix TF Funnel 9300 (jplu)
- Fix documentation links always pointing to master. 9217 (sugeeth14)
- [examples/text-classification] Fix a bug for using own regression dataset 9411 (forest1988)
- [trainer] group fp16 args together 9409 (stas00)
- [model parallel] add experimental warning 9412 (stas00)
- improve readme text to private models/versioning/api 9424 (clmnt)
- [PyTorch Bart] Split Bart into different models 9343 (patrickvonplaten)
- [docs] outline sharded ddp doc 9208 (stas00)
- [Refactor] Splitting pipelines.py into its own module. 9279 (Narsil)
- Fix link to Evaluate TAPAS Notebook 9414 (mrm8488)
- Fix link to Notebook to fine-tune TAPAS 9413 (mrm8488)
- Allow example to use a revision and work with private models 9407 (sgugger)
- [trainer] self.model_wrapped + _model_unwrap 9390 (stas00)
- Fix URLs to TAPAS notebooks 9435 (NielsRogge)
- Upgrade styler to better handle lists 9423 (sgugger)
- [Docs] Add useful links to model sharing 9431 (patrickvonplaten)
- Store transformers version info when saving the model 9421 (JetRunner)
- [GenerationOutputs] Fix GenerationOutputs Tests 9443 (patrickvonplaten)
- Remove nested lxmert 9440 (sgugger)
- [make fixup] a more reliable version of branching point discovery 9449 (stas00)
- Prophetnet optimization 9453 (guillaume-be)
- New serving 9419 (jplu)
- [Docs] Improve model sharing doc 9454 (patrickvonplaten)
- [TFGPT2] - Fix flaky past_key_values test 9460 (patrickvonplaten)
- Removing duplicated code for Translation,Summarization and Text2TextGeneration pipelines 9433 (Narsil)
- [README] Add new models 9465 (patrickvonplaten)
- [Generation] Fix bug for manual decoder_input_ids + warning message 9472 (patrickvonplaten)
- Makes HfArgumentParser compatible with Python 3.9 9479 (Tpt)
- Fix TF input for np.ndarray 9294 (jplu)
- Making Conversation possible to create directly a full conversation 9434 (Narsil)
- fix(wandb): fix config 9489 (borisdayma)
- Fixing tests. It seems master changed something in the warnings. 9483 (Narsil)
- Reformat the TF serving outputs 9482 (jplu)
- [ray] add maintainers for Ray / Tune 9499 (richardliaw)
- Fix template 9504 (jplu)
- Full rework of the TF input/output embeddings and bias resizing 9193 (jplu)
- Remove tolerance + drop_rows_to_fit by default 9507 (LysandreJik)
- Fix template 9512 (jplu)
- New Updated DistilGPT-2 Finetuning and Generation 9494 (tripathiaakash)
- Make doc styler detect lists on rst and better support for Windows 9488 (sgugger)
- Enable TruncationStrategy override for pipelines 9432 (Narsil)
- [doc] How To Request Support document stab 9288 (stas00)
- [trainer] remove `--model_parallel` 9451 (stas00)
- Fix cardinality 9505 (jplu)
- Make doc styler behave properly on Windows 9516 (sgugger)
- [trainer] round numbers in trainer state 9491 (stas00)
- [make docs] parallel build 9522 (stas00)
- [TFBart] Split TF-Bart 9497 (patrickvonplaten)
- [ProphetNet] Fix naming and wrong config 9514 (patrickvonplaten)
- Update 'Develop on Windows' guidelines 9519 (SBrandeis)
- Shouldn't stale issues/PRs with feature request label 9511 (LysandreJik)
- [Blenderbot] Fix Links 9532 (patrickvonplaten)
- [T5] enable T5 fp16 9487 (patil-suraj)
- LayoutLM Config 9539 (LysandreJik)
- Fix fill mask pipeline slow test using deprecated argument 9541 (LysandreJik)
- Refactor `prepare_seq2seq_batch` 9524 (sgugger)
- Use the right version of tokenizers 9550 (sgugger)
- fix BlenderbotSmallTokenizer 9538 (patil-suraj)
- Doc: Update pretrained_models wording 9545 (julien-c)
- Fix barthez tokenizer 9562 (LysandreJik)
- Fix classification script: enable dynamic padding with truncation 9554 (pashok3d)
- Speed up TopKLogitsWarper and TopPLogitsWarper (pytorch) 9557 (LSinev)
- Update run_glue for do_predict with local test data (9442) 9486 (forest1988)
- [CI] use correct deps for torchhub 9552 (stas00)
- Fix data parallelism in Trainer 9566 (sgugger)
- Fix slow tests v4.2.0 9561 (LysandreJik)

4.1.1

v4.1.1: TAPAS, MPNet, model parallelization, Sharded DDP, conda, multi-part downloads.

TAPAS (NielsRogge)

Four new models are released as part of the TAPAS implementation: `TapasModel`, `TapasForQuestionAnswering`, `TapasForMaskedLM` and `TapasForSequenceClassification`, in PyTorch.

TAPAS is a question answering model used to answer queries over a table. It is a multi-modal model, combining the text of the query with tabular data.

The TAPAS model was proposed in [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://www.aclweb.org/anthology/2020.acl-main.398) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
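A minimal sketch using the new `TableQuestionAnsweringPipeline`; the pipeline resolves a default TAPAS checkpoint, and note that TAPAS additionally relies on the `torch-scatter` package.

```python
import pandas as pd
from transformers import pipeline

# Toy table: TAPAS expects every cell to be a string.
table = pd.DataFrame(
    {
        "Repository": ["Transformers", "Datasets", "Tokenizers"],
        "Stars": ["36542", "4512", "3934"],
    }
)

tqa = pipeline("table-question-answering")
print(tqa(table=table, query="How many stars does the Transformers repository have?"))
```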

- Tapas v4 (tres) 9117 (NielsRogge)
- AutoModelForTableQuestionAnswering 9154 (LysandreJik)
- TableQuestionAnsweringPipeline 9145 (LysandreJik)

MPNet (StillKeepTry)

Six new models are released as part of the MPNet implementation: `MPNetModel`, `MPNetForMaskedLM`, `MPNetForSequenceClassification`, `MPNetForMultipleChoice`, `MPNetForTokenClassification`, `MPNetForQuestionAnswering`, in both PyTorch and TensorFlow.

MPNet introduces a novel self-supervised objective named masked and permuted language modeling for language understanding. It inherits the advantages of both masked language modeling (MLM) and permuted language modeling (PLM), addresses their limitations, and further reduces the inconsistency between the pre-training and fine-tuning paradigms.

The MPNet model was proposed in [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
- MPNet: Masked and Permuted Pre-training for Language Understanding 8971 (StillKeepTry)
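A minimal sketch assuming the `microsoft/mpnet-base` checkpoint on the Hub (the classification head below is randomly initialized and would still need fine-tuning):

```python
from transformers import MPNetForSequenceClassification, MPNetTokenizer

tokenizer = MPNetTokenizer.from_pretrained("microsoft/mpnet-base")
model = MPNetForSequenceClassification.from_pretrained("microsoft/mpnet-base", num_labels=2)

inputs = tokenizer("MPNet combines masked and permuted pre-training.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, 2)
```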

Model parallel (alexorona)

Model parallelism is introduced, allowing users to load very large models on two or more GPUs by spreading the model layers over them. This enables GPU training even for models too large to fit on a single device.
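As a rough sketch (GPT-2 shown here; T5 works the same way), `parallelize()` spreads the layers over the visible GPUs and `deparallelize()` moves the model back:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.parallelize()  # requires at least two visible CUDA devices

inputs = tokenizer("Model parallelism lets us", return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_length=30)
print(tokenizer.decode(output_ids[0]))

model.deparallelize()  # move everything back to a single device
```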

- gpt2 and t5 parallel modeling 8696 (alexorona)
- Model parallel documentation 8741 (LysandreJik)
- Patch model parallel test 8825, 8920 (LysandreJik)

Conda release (LysandreJik)

Transformers welcomes its first conda releases, with v4.0.0, v4.0.1 and v4.1.0. The conda packages are now officially maintained on the `huggingface` channel.

- Put Transformers on Conda 8918 (LysandreJik)

Multi-part uploads (julien-c)

For the first time, very large models can be uploaded to the model hub by using multi-part uploads.

- transformers-cli: LFS multipart uploads (> 5GB) 8663 (julien-c)

New examples and reorganization (sgugger)

We introduced a refactored SQuAD example & notebook, which is faster and simpler than the previous scripts.

The examples directory has been reorganized as we introduce the separation between "examples", which are maintained examples showcasing how to do one specific task, and "research projects", which are bigger projects maintained by the community.

- New squad example 8992 (sgugger)
- Reorganize examples 9010 (sgugger)

Introduction of fairscale with Sharded DDP (sgugger)

We introduce support for fairscale's ShardedDDP in the `Trainer`, allowing reduced memory usage when training models in a distributed fashion.

- Experimental support for fairscale ShardedDDP 9139 (sgugger)
- Fix gradient clipping for Sharded DDP 9168 (sgugger)
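A rough sketch of enabling it through `TrainingArguments` (in this release the option is a simple boolean flag; it was refined in later versions). It assumes `fairscale` is installed and the script is launched in a distributed setup.

```python
# Sketch only: enable fairscale's ShardedDDP through the Trainer.
# Assumes `pip install fairscale` and a distributed launch, e.g.
#   python -m torch.distributed.launch --nproc_per_node=2 your_script.py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    sharded_ddp=True,  # wrap the model in fairscale's ShardedDDP during training
)
```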

Barthez (moussaKam)

The BARThez model is a French variant of the BART model. We welcome its specific tokenizer to the library and multiple checkpoints to the model hub.

- Add barthez model 8393 (moussaKam)

General improvements and bugfixes

- `disable_ngram_loss` fix for prophetnet 8554 (Zhylkaaa)
- Fix run_ner script 8664 (sgugger)
- [tokenizers] convert_to_tensors: don't reconvert when the type is already right 8283 (stas00)
- [examples/seq2seq] fix PL deprecation warning 8577 (stas00)
- Add sentencepiece to the CI and fix tests 8672 (sgugger)
- Alternative to globals() 8667 (sgugger)
- Update the bibtex with EMNLP demo 8678 (JetRunner)
- Document adam betas TrainingArguments 8688 (sgugger)
- Fix rag finetuning + add finetuning test 8585 (lhoestq)
- moved temperature warper before topP/topK warpers 8686 (theorm)
- Vectorize RepetitionPenaltyLogitsProcessor to improve performance 8598 (bdalal)
- [Generate Test] fix flaky ci 8694 (patrickvonplaten)
- Fix bug in x-attentions output for roberta and harden test to catch it 8660 (ysgit)
- Add pip install update to resolve import error in transformers notebook 8616 (jessicayung)
- Improve bert-japanese tokenizer handling 8659 (julien-c)
- Change default cache path 8734 (sgugger)
- [trainer] make generate work with multigpu 8716 (stas00)
- consistent ignore keys + make private 8737 (stas00)
- Fix max length in run_plm script 8738 (sgugger)
- Add early stopping callback to pytorch trainer 8581 (cbrochtrup)
- Support various BERT relative position embeddings (2nd) 8276 (zhiheng-huang)
- Fix slow tests v2 8746 (LysandreJik)
- MT5 should have an autotokenizer 8743 (LysandreJik)
- added instructions for syncing upstream master with forked master via PR 8745 (bdalal)
- fix rag index names in eval_rag.py example 8730 (lhoestq)
- [core] implement support for run-time dependency version checking 8645 (stas00)
- New TF model inputs 8602 (jplu)
- Big model table 8774 (sgugger)
- Attempt to get a better fix for QA 8768 (Narsil)
- Fix QA argument handler 8765 (LysandreJik)
- Return correct Bart hidden state tensors 8747 (joeddav)
- [XLNet] Fix mems behavior 8567 (patrickvonplaten)
- [s2s] finetune.py: specifying generation min_length 8478 (danyaljj)
- Revert "[s2s] finetune.py: specifying generation min_length" 8805 (patrickvonplaten)
- Fix PPLM 8779 (chutaklee)
- [s2s finetune trainer] potpurri of small fixes 8807 (stas00)
- [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes 8791 (KristianHolsheimer)
- [Flax test] Add require pytorch to flix flax test 8816 (patrickvonplaten)
- Fix dpr<>bart config for RAG 8808 (patrickvonplaten)
- Extend typing to path-like objects in `PretrainedConfig` and `PreTrainedModel` 8770 (gcompagnoni)
- Fix setup.py on Windows 8798 (jplu)
- BART & FSMT: fix decoder not returning hidden states from the last layer 8597 (maksym-del)
- suggest a numerical limit of 50MB for determining slow 8824 (stas00)
- [MT5] Add use_cache to config 8832 (patrickvonplaten)
- [Pegasus] Refactor Tokenizer 8731 (patrickvonplaten)
- [CI] implement job skipping for doc-only PRs 8826 (stas00)
- Migration guide from v3.x to v4.x 8763 (LysandreJik)
- Add T5 Encoder for Feature Extraction 8717 (agemagician)
- token-classification: use is_world_process_zero instead of is_world_master() 8828 (stefan-it)
- Correct docstring. 8845 (Fraser-Greenlee)
- Add a direct link to the big table 8850 (sgugger)
- Use model.from_pretrained for DataParallel also 8795 (shaie)
- Remove deprecated `evalutate_during_training` 8852 (sgugger)
- Attempt to fix Flax CI error(s) 8829 (mfuntowicz)
- NerPipeline (TokenClassification) now outputs offsets of words 8781 (Narsil)
- [s2s trainer] fix DP mode 8823 (stas00)
- Ctrl for sequence classification 8812 (elk-cloner)
- Fix docstring for language code in mBart 8848 (RQuispeC)
- 2 typos in modeling_rag.py 8676 (ratthachat)
- Make the big table creation/check platform independent 8856 (sgugger)
- Prevent BatchEncoding from blindly passing casts down to the tensors it contains 8860 (Craigacp)
- Better warning when loading a tokenizer with AutoTokenizer w/o Sneten… 8881 (LysandreJik)
- [CI] skip docs-only jobs take 2 8853 (stas00)
- Better support for resuming training 8878 (sgugger)
- Add a `parallel_mode` property to TrainingArguments 8877 (sgugger)
- [trainer] start using training_args.parallel_mode 8882 (stas00)
- [ci] skip doc jobs take 3 8885 (stas00)
- Transfoxl seq classification 8868 (spatil6)
- Warning about too long input for fast tokenizers too 8799 (Narsil)
- [trainer] improve code readability 8903 (stas00)
- [PyTorch] Refactor Resize Token Embeddings 8880 (patrickvonplaten)
- Don't warn that models aren't available if Flax is available. 8841 (skye)
- Avoid erasing the attention mask when double padding 8915 (sgugger)
- Fix move when the two cache folders exist 8917 (sgugger)
- Tweak wording + Add badge w/ number of models on the hub 8914 (julien-c)
- [s2s finetune_trainer] add instructions for distributed training 8884 (stas00)
- Better booleans handling in the TF models 8777 (jplu)
- Fix TF T5 only encoder model with booleans 8925 (LysandreJik)
- [ci] skip doc jobs - circleCI is not reliable - disable skip for now 8926 (stas00)
- [seq2seq] document the caveat of leaky native amp 8930 (stas00)
- Don't pass in token_type_ids to BART for GLUE 8929 (ethanjperez)
- Fix typo for `modeling_bert` import resulting in ImportError 8931 (machelreid)
- Fix QA pipeline on Windows 8947 (sgugger)
- Add TFGPT2ForSequenceClassification based on DialogRPT 8714 (spatil6)
- Remove sourcerer 8965 (clmnt)
- Use word_ids to get labels in run_ner 8962 (sgugger)
- Small fix to the run clm script 8973 (sgugger)
- Update quicktour docs to showcase the use of truncation 8975 (navjotts)
- Copyright 8970 (sgugger)
- Check table as independent script 8976 (LysandreJik)
- [training] SAVE_STATE_WARNING was removed in pytorch 8979 (stas00)
- Optional layers 8961 (jplu)
- Make `ModelOutput` pickle-able 8989 (sgugger)
- Fix interaction of return_token_type_ids and add_special_tokens 8854 (LysandreJik)
- Removed unused `encoder_hidden_states` and `encoder_attention_mask` 8972 (guillaume-be)
- Checking output format + check raises ValueError 8986 (Narsil)
- Templates overhaul 1 8993 (LysandreJik)
- Diverse beam search 2 9006 (patrickvonplaten)
- Fix link to stable version in the doc navbar 9007 (sgugger)
- Remove use of deprecated method in Trainer HP search 8996 (sgugger)
- Add the code_search_net datasets tag to CodeBERTa model cards 9005 (SBrandeis)
- fixes 8968 9009 (cronoik)
- Flax Masked Language Modeling training example 8728 (mfuntowicz)
- [Bart] Refactor - fix issues, consistency with the library, naming 8900 (patrickvonplaten)
- [wip] [ci] doc-job-skip take 4 dry-run 8980 (stas00)
- Fix typo in modeling_tf_bart 9020 (astariul-colanim)
- Fix documention of book in LayoutLM 9017 (sgugger)
- MPNet copyright files 9015 (sgugger)
- Enforce all objects in the main init are documented 9014 (sgugger)
- Refactor FLAX tests 9034 (sgugger)
- Remove value error 8985 (jplu)
- Change nn.dropout to layer.Dropout in TFBart 9047 (astariul-colanim)
- update tatoeba workflow 9051 (patil-suraj)
- Fix PreTrainedTokenizer.pad when first inputs are empty 9018 (sgugger)
- Remove docs only check 9065 (LysandreJik)
- Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/movement-pruning/lxmert 9062 (dependabot[bot])
- Make ProphetNetModel really compatible with EncoderDecoder 9033 (patrickvonplaten)
- Fix min_null_pred in the run_qa script 9067 (sgugger)
- [model_cards] Migrate cards from this repo to model repos on huggingface.co 9013 (julien-c)
- Fix embeddings resizing in TF models 8657 (jplu)
- Patch *ForCausalLM model with TF resize_token_embeddings 9092 (LysandreJik)
- [RAG, Bart] Align RAG, Bart cache with T5 and other models of transformers 9098 (patrickvonplaten)
- Fix variable name in TrainingArguments docstring 9096 (navjotts)
- Fix a broken link in documentation 9101 (SBrandeis)
- [CI doc] safely testing experimental CI features 9070 (stas00)
- Add parallelization support for T5EncoderModel 9082 (agemagician)
- Fix T5 and BART for TF 9063 (jplu)
- [finetune_trainer] enhancements and fixes 9042 (stas00)
- Fix a bug in eval_batch_retrieval of eval_rag.py 9089 (yoshitomo-matsubara)
- Clarify use of TrainingArguments.disable_tqdm in Jupyter Notebooks 9076 (lewtun)
- [doc] pytorch native amp leak fix landed in 1.7.1 9115 (stas00)
- Fix stack overflow 9114 (LysandreJik)
- Fix T5 model parallel test 9107 (LysandreJik)
- Fix tf2.4 9120 (jplu)
- Added TF OpenAi GPT1 Sequence Classification 9105 (spatil6)
- [TF Bart] Refactor TFBart 9029 (patrickvonplaten)
- Fix typo in trainer_tf.py 9132 (luckynozomi)
- [Bart] fix bart loss masking 9131 (patrickvonplaten)
- [Bart] Correct wrong order in shift token to right in Bart 9134 (patrickvonplaten)
- Fix Bart Shift 9135 (patrickvonplaten)
- Fix TF Transfo XL 9129 (jplu)
- [Examples] Add automatic dataset splitting in language-modeling examples 9133 (TevenLeScao)
- Fix T5 Encoder model parallel tests 9140 (LysandreJik)
- Add possibility to switch between APEX and AMP in Trainer 9137 (sgugger)
- [Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init 9054 (patrickvonplaten)
- DistilBertForSequenceClassification 9148 (AndreaSottana)
- Support for private models from huggingface.co 9141 (julien-c)
- Update notebook table and transformers intro notebook 9136 (sgugger)
- Add message to documentation that longformer doesn't support token_type_ids 9152 (HHousen)

4.0.1

This patch release introduces:

- A better error message when trying to instantiate a SentencePiece-based tokenizer without having SentencePiece installed. 8881
- Fixes an incorrect attribute in the trainer. 8996

4.0.0


4.0.0rc1

Breaking changes since v3.x

Version v4.0.0 introduces several breaking changes that were necessary.

1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.

The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set. The main breaking change is the handling of overflowing tokens between the python and rust tokenizers.

How to obtain the same behavior as v3.x in v4.x

- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](https://huggingface.co/transformers/main_classes/pipelines.html?highlight=textclassification#tokenclassificationpipeline).
- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the `use_fast` flag by setting it to `False`:

In version `v3.x`:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xxx")
```

to obtain the same in version `v4.x`:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xxx", use_fast=False)
```


2. SentencePiece is removed from the required dependencies

The requirement on the SentencePiece dependency has been lifted from the `setup.py`. This is done so that we may have a channel on anaconda cloud without relying on `conda-forge`. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard `transformers` installation.

This includes the **slow** versions of:
- `XLNetTokenizer`
- `AlbertTokenizer`
- `CamembertTokenizer`
- `MBartTokenizer`
- `PegasusTokenizer`
- `T5Tokenizer`
- `ReformerTokenizer`
- `XLMRobertaTokenizer`

How to obtain the same behavior as v3.x in v4.x

In order to obtain the same behavior as version `v3.x`, you should install `sentencepiece` additionally:

In version `v3.x`:

```bash
pip install transformers
```

to obtain the same in version `v4.x`:

```bash
pip install transformers[sentencepiece]
```

or

```bash
pip install transformers sentencepiece
```

3. The architecture of the repo has been updated so that each model resides in its folder

The past and foreseeable addition of new models means that the number of files in the `src/transformers` directory keeps growing, making it harder to navigate and understand. We made the choice to put each model and its accompanying files in their own sub-directory.

This is a breaking change: importing intermediate layers directly from a model's module now requires a different path.

How to obtain the same behavior as v3.x in v4.x

In order to obtain the same behavior as version `v3.x`, you should update the path used to access the layers.

In version `v3.x`:

```py
from transformers.modeling_bert import BertLayer
```

to obtain the same in version `v4.x`:

```py
from transformers.models.bert.modeling_bert import BertLayer
```


4. Switching the `return_dict` argument to `True` by default

The [`return_dict` argument](https://huggingface.co/transformers/main_classes/output.html) enables the return of namedtuple-like Python objects containing the model outputs, instead of the standard tuples. These objects are self-documenting: keys can be used to retrieve values, and they still behave like tuples, so items can also be retrieved by index or by slice.

This is a breaking change because, unlike the standard tuple, this object cannot be unpacked: `value0, value1 = outputs` will not work.

How to obtain the same behavior as v3.x in v4.x

In order to obtain the same behavior as version `v3.x`, you should set the `return_dict` argument to `False`, either in the model configuration or during the forward pass.

In version `v3.x`:

```py
outputs = model(**inputs)
```

to obtain the same in version `v4.x`:

```py
outputs = model(**inputs, return_dict=False)
```


5. Removed some deprecated attributes

Attributes that were deprecated have been removed if they had been deprecated for at least a month. The full list of deprecated attributes can be found in https://github.com/huggingface/transformers/pull/8604.

Model Templates

Version 4.0.0 is the first to include the experimental feature of model templates. These model templates aim to facilitate the addition of new models to the library by doing most of the work: generating the model/configuration/tokenization/test files that fit the API, according to the choices the user has made in terms of naming and functionality.

This release includes a model template for the encoder model (similar to the BERT architecture). Generating a model from the template creates the files, puts them in the appropriate location, references them throughout the code base, and produces a working test suite. The user then only has to modify the files to their liking, rather than creating the model from scratch.

Feedback welcome, get started from the README [here](https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model).

- Model templates encoder only 8509 (LysandreJik)

New model additions

mT5 and T5 version 1.1 (patrickvonplaten)

T5v1.1 is an improved version of the original T5 model; see https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md

The multilingual T5 model (mT5) was presented in https://arxiv.org/abs/2010.11934 and is based on the T5v1.1 architecture.

Multiple pre-trained checkpoints have been added to the library:
- t5v1_1: https://huggingface.co/models?search=t5-v1_1
- mT5: https://huggingface.co/models?search=mt5
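A minimal sketch loading one of the new checkpoints (`google/mt5-small` here; the T5v1.1 checkpoints load the same way through `T5ForConditionalGeneration`). Note that the tokenizer needs `sentencepiece` installed, see breaking change 2 above.

```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# mT5 is pre-trained only, so the output is not meaningful without fine-tuning;
# this just shows that the checkpoint loads and generates end to end.
inputs = tokenizer("summarize: The tower is 324 metres tall.", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```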

Relevant pull requests:
- T5 & mT5 8552 (patrickvonplaten)
- [MT5] More docs 8589 (patrickvonplaten)
- Fix init for MT5 8591 (sgugger)

TF DPR

A TensorFlow version of the DPR model has been added to match its PyTorch counterpart, contributed by ratthachat.

- Add TFDPR 8203 (ratthachat)

TF Longformer

Additional heads have been added to the TensorFlow Longformer implementation: SequenceClassification, MultipleChoice and TokenClassification.

- Tf longformer for sequence classification 8231 (elk-cloner)

Bug fixes and improvements

- [s2s/distill] hparams.tokenizer_name = hparams.teacher 8382 (ShichaoSun)
- [examples] better PL version check 8429 (stas00)
- Question template 8440 (sgugger)
- [docs] improve bart/marian/mBART/pegasus docs 8421 (sshleifer)
- Add auto next sentence prediction 8432 (jplu)
- Windows dev section in the contributing file 8436 (jplu)
- [testing utils] get_auto_remove_tmp_dir more intuitive behavior 8401 (stas00)
- Add missing import 8444 (jplu)
- [T5 Tokenizer] Fix t5 special tokens 8435 (patrickvonplaten)
- using multi_gpu consistently 8446 (stas00)
- Add missing tasks to `pipeline` docstring 8428 (bryant1410)
- [No merge] TF integration testing 7621 (LysandreJik)
- [T5Tokenizer] fix t5 token type ids 8437 (patrickvonplaten)
- Bug fix for apply_chunking_to_forward chunking dimension check 8391 (pedrocolon93)
- Fix TF Longformer 8460 (jplu)
- Add next sentence prediction loss computation 8462 (jplu)
- Fix TF next sentence output 8466 (jplu)
- Example NER script predicts on tokenized dataset 8468 (sarnoult)
- Replaced unnecessary iadd operations on lists in tokenization_utils.py with proper list methods 8433 (bombs-kim)
- Flax/Jax documentation 8331 (mfuntowicz)
- [s2s] distill t5-large -> t5-small 8376 (sbhaktha)
- Update deploy-docs dependencies on CI to enable Flax 8475 (mfuntowicz)
- Fix on "examples/language-modeling" to support more datasets 8474 (zeyuyun1)
- Fix doc bug 8500 (mymusise)
- Model sharing doc 8498 (sgugger)
- Fix SqueezeBERT for masked language model 8479 (forresti)
- Fix logging in the examples 8458 (jplu)
- Fix check scripts for Windows 8491 (jplu)
- Add pretraining loss computation for TF Bert pretraining 8470 (jplu)
- [T5] Bug correction & Refactor 8518 (patrickvonplaten)
- Model sharing doc: more tweaks 8520 (julien-c)
- [T5] Fix load weights function 8528 (patrickvonplaten)
- Rework some TF tests 8492 (jplu)
- [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency 8073 (thomwolf)
- Adding the prepare_seq2seq_batch function to ProphetNet 8515 (forest1988)
- Update version to v4.0.0-dev 8568 (sgugger)
- TAPAS tokenizer & tokenizer tests 8482 (LysandreJik)
- Switch `return_dict` to `True` by default. 8530 (sgugger)
- Fix mixed precision issue for GPT2 8572 (jplu)
- Reorganize repo 8580 (sgugger)
- Tokenizers: ability to load from model subfolder 8586 (julien-c)
- Fix model templates 8595 (sgugger)
- [examples tests] tests that are fine on multi-gpu 8582 (stas00)
- Fix check repo utils 8600 (sgugger)
- Tokenizers should be framework agnostic 8599 (LysandreJik)
- Remove deprecated 8604 (sgugger)
- Fixed link to the wrong paper. 8607 (cronoik)
- Reset loss to zero on logging in Trainer to avoid bfloat16 issues 8561 (bminixhofer)
- Fix DataCollatorForLanguageModeling 8621 (sgugger)
- [s2s] multigpu skip 8613 (stas00)
- [s2s] fix finetune.py to adjust for 8530 changes 8612 (stas00)
- tf_bart typo - self.self.activation_dropout 8611 (ratthachat)
- New TF loading weights 8490 (jplu)
- Adding PrefixConstrainedLogitsProcessor 8529 (nicola-decao)
- [Tokenizer Doc] Improve tokenizer summary 8622 (patrickvonplaten)
- Fixes the training resuming with gradient accumulation 8624 (sgugger)
- Fix training from scratch in new scripts 8623 (sgugger)
- [s2s] distillation apex breaks return_dict obj 8631 (stas00)
- Updated the Extractive Question Answering code snippets 8636 (cronoik)
- Fix missing return_dict in RAG example to use a custom knowledge source 8653 (lhoestq)
- Fix a bunch of slow tests 8634 (LysandreJik)
- Better filtering of the model outputs in Trainer 8633 (sgugger)

3.5.1

Fix a typo that raised an error instead of a deprecation warning.
