Transformers

Latest version: v4.50.3

4.22.0

Not secure

Swin Transformer v2

The Swin Transformer V2 model was proposed in [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

Swin Transformer v2 improves the original [Swin Transformer](https://huggingface.co/docs/transformers/main/en/model_doc/swin) using three main techniques: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
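
To make the first two techniques concrete, here is a minimal pure-Python sketch based on the formulas in the Swin Transformer V2 paper, not on the `transformers` implementation. The function names, the default temperature, and the lower clamp of 0.01 are illustrative assumptions; in the model the temperature is a learned parameter and the bias comes from a small network.

```python
import math


def cosine_attention_logit(q, k, tau=0.1, bias=0.0, tau_min=0.01):
    """Scaled cosine attention logit between one query and one key vector.

    Hypothetical helper: cos(q, k) / tau + bias, with tau clamped from below.
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    cosine = dot / (norm_q * norm_k)
    # Cosine similarity is bounded in [-1, 1], so the logit no longer grows
    # with the magnitude of the activations.
    return cosine / max(tau, tau_min) + bias


def log_spaced(delta):
    """Map a relative offset to log-spaced coordinates: sign(d) * log(1 + |d|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


# Rescaling the query leaves the logit unchanged.
print(cosine_attention_logit([1.0, 0.0], [2.0, 0.0], tau=0.5))
print(cosine_attention_logit([10.0, 0.0], [2.0, 0.0], tau=0.5))
# Large offsets are compressed, which eases transfer to bigger windows.
print(log_spaced(7), log_spaced(-15))
```

Because the logit depends only on the angle between query and key, large activations cannot blow up the attention scores, which is the stability argument given above.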

* Add swin transformer v2 by nandwalritik in 17469

VideoMAE

The VideoMAE model was proposed in [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked autoencoders ([ViTMAE](https://huggingface.co/docs/transformers/main/en/model_doc/vit_mae)) to video, claiming state-of-the-art performance on several video classification benchmarks.

* Add VideoMAE by NielsRogge in 17821

Donut

The Donut model was proposed in [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

* Add Donut by NielsRogge in 18488

Pegasus-X

The PEGASUS-X model was proposed in [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao and Peter J. Liu.

PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.

* PEGASUS-X by zphang in 18551

X-CLIP

The X-CLIP model was proposed in [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of [CLIP](https://huggingface.co/docs/transformers/main/en/model_doc/clip) for video-language understanding. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

* Add X-CLIP by NielsRogge in 18852

ERNIE

ERNIE is a series of powerful models proposed by Baidu, especially strong on Chinese tasks, including [ERNIE1.0](https://arxiv.org/abs/1904.09223), [ERNIE2.0](https://ojs.aaai.org/index.php/AAAI/article/view/6428), [ERNIE3.0](https://arxiv.org/abs/2107.02137), [ERNIE-Gram](https://arxiv.org/abs/2010.12148), [ERNIE-health](https://arxiv.org/abs/2110.07244), etc.
These models were contributed by nghuyong, and the official code can be found in PaddleNLP (in PaddlePaddle).

* ERNIE-2.0 and ERNIE-3.0 models by nghuyong in 18686

TensorFlow models

MobileViT and LayoutLMv3 are now available in TensorFlow.

* TensorFlow MobileViT by sayakpaul in 18555
* [LayoutLMv3] Add TensorFlow implementation by ChrisFugl in 18678

New task-specific architectures

A new question answering head was added for the LayoutLM model.

* Add LayoutLMForQuestionAnswering model by ankrgyl in 18407

New pipelines

Two new pipelines are available in `transformers`: a document question answering pipeline, as well as an image to text generation pipeline.

* Add DocumentQuestionAnswering pipeline by ankrgyl in 18414
* Add Image To Text Generation pipeline by OlivierDehaene in 18821

M1 support

Mac M1 (`mps`) devices are now supported through PyTorch in `transformers` pipelines and the Trainer.

* `pipeline` support for `device="mps"` (or any other string) by julien-c in 18494
* mac m1 `mps` integration by pacman100 in 18598
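
As a rough illustration of how the new device string might be chosen, the sketch below picks `"mps"` when PyTorch reports it as available and falls back to `"cpu"` otherwise. `pick_device` is a hypothetical helper, not part of `transformers`, and the task name in the comment is only an example.

```python
def pick_device():
    """Return "mps" when PyTorch reports an Apple-silicon GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed
    # `torch.backends.mps` does not exist in older PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
# With transformers >= 4.22 the string can be passed straight to a pipeline,
# e.g. pipeline("text-classification", device=device).
print(device)
```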

Backend version compatibility

Starting with v4.22.0, we officially support PyTorch and TensorFlow versions released within the last two years.
Versions more than two years old will not be supported going forward.

We're making this change as we begin actively testing transformers compatibility on older versions.
This project can be followed [here](https://github.com/huggingface/transformers/issues/18817).

* PyTorch >= 1.7.0 and TensorFlow >= 2.4.0 by sgugger in 19016
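
A project that depends on these minimums could guard against older backends with a check along the following lines. This is a standard-library sketch: the minimum versions come from the entry above, while `parse_version` and `is_supported` are hypothetical helpers, and a real project would more likely rely on the `packaging` library.

```python
import re

# Minimum backend versions stated for v4.22.0.
MIN_VERSIONS = {"torch": (1, 7, 0), "tensorflow": (2, 4, 0)}


def parse_version(version):
    """Turn a string such as "1.12.1+cu113" into the tuple (1, 12, 1)."""
    numbers = []
    for piece in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)  # "2.4" is treated as "2.4.0"
    return tuple(numbers)


def is_supported(backend, version):
    """Check an installed backend version against the stated minimum."""
    return parse_version(version) >= MIN_VERSIONS[backend]


print(is_supported("torch", "1.12.1+cu113"))  # True
print(is_supported("tensorflow", "2.3.4"))    # False
```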

Generate method updates

The `generate` method now enforces stricter validation to ensure proper usage.

* Generate: validate `model_kwargs` (and catch typos in generate arguments) by gante in 18261
* Generate: validate `model_kwargs` on TF (and catch typos in generate arguments) by gante in 18651
* Generate: add model class validation by gante in 18902
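
The idea behind catching typos in keyword arguments can be sketched with the standard library alone. The code below is not the library's implementation; `validate_kwargs` and `toy_forward` are hypothetical names used only to show the technique of comparing passed keywords against a function signature.

```python
import inspect


def validate_kwargs(func, kwargs):
    """Raise ValueError for keyword arguments that `func` does not accept."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return  # func takes **kwargs, so nothing can be checked
    unused = [name for name in kwargs if name not in params]
    if unused:
        raise ValueError(f"Unused keyword arguments (check for typos): {unused}")


def toy_forward(input_ids, attention_mask=None):
    """Stand-in for a model's forward method."""
    return input_ids


validate_kwargs(toy_forward, {"attention_mask": [1, 1]})  # passes silently
try:
    validate_kwargs(toy_forward, {"atention_mask": [1, 1]})  # misspelled
except ValueError as err:
    print(err)
```

Failing early on an unknown keyword avoids the silent case where a misspelled argument is simply ignored during generation.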

API changes

The `as_target_tokenizer` and `as_target_processor` context managers have been deprecated. The new API is to use the call method of the tokenizer/processor with keyword arguments. For instance:

```py
with tokenizer.as_target_tokenizer():
    encoded_labels = tokenizer(labels, padding=True)
```

becomes

```py
encoded_labels = tokenizer(text_target=labels, padding=True)
```

* Replace `as_target` context managers by direct calls by sgugger in 18325
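
The same keyword also allows inputs and labels to be tokenized in a single call. The wrapper below is a small sketch of that pattern; `encode_pairs` is a hypothetical helper, and it assumes a tokenizer from `transformers` 4.22 or later.

```python
def encode_pairs(tokenizer, sources, targets, **kwargs):
    """Tokenize inputs and labels in a single call (transformers >= 4.22)."""
    # `text_target` replaces the deprecated `as_target_tokenizer` context
    # manager; the encoded targets come back under the "labels" key.
    return tokenizer(sources, text_target=targets, **kwargs)
```

With a real tokenizer, for example one loaded through `AutoTokenizer.from_pretrained("t5-small")`, the returned encoding holds `input_ids` and `attention_mask` for the sources and `labels` for the targets.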

Bits and bytes integration

Bits and bytes is now integrated within transformers. This feature can reduce the size of large models by up to a factor of 2, with low loss in precision.

* Supporting seq2seq models for `bitsandbytes` integration by younesbelkada in 18579
* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models by younesbelkada in 17901
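
The factor of 2 follows from simple arithmetic when the baseline is half precision: a float16 weight takes 2 bytes and an int8 weight takes 1. The sketch below works through that estimate for a hypothetical 3-billion-parameter model. It counts weights only and assumes every parameter is quantized, so it is an upper bound on the saving rather than a measurement.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype):
    """Approximate memory taken by the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


num_params = 3_000_000_000  # hypothetical model size, for illustration only
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")

ratio = weight_memory_gib(num_params, "float16") / weight_memory_gib(num_params, "int8")
print(f"float16 / int8 = {ratio:.0f}x")
```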

Large model support

Models that have sharded checkpoints in PyTorch can be loaded in Flax.

* Load sharded pt to flax by ArthurZucker in 18419
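
A sharded PyTorch checkpoint is accompanied by an index file (`pytorch_model.bin.index.json`) whose `weight_map` records which shard holds each parameter. The standard-library sketch below groups parameters by shard, which is the lookup any loader has to perform before reading the files. `params_by_shard` and the parameter names in the example index are hypothetical.

```python
import json
from collections import defaultdict


def params_by_shard(index_json):
    """Group parameter names by the shard file that stores them."""
    index = json.loads(index_json)
    shards = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    return dict(shards)


# Tiny made-up index in the same layout as a real one.
example_index = json.dumps({
    "metadata": {"total_size": 1234},
    "weight_map": {
        "encoder.layer.0.weight": "pytorch_model-00001-of-00002.bin",
        "encoder.layer.0.bias": "pytorch_model-00001-of-00002.bin",
        "encoder.layer.1.weight": "pytorch_model-00002-of-00002.bin",
    },
})

for shard_file, names in params_by_shard(example_index).items():
    print(shard_file, names)
```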

TensorFlow improvements

The TensorFlow examples have been rewritten to support all recent features developed in the past months.

* TF Examples Rewrite by Rocketknight1 in 18451

DeBERTa-v2 is now trainable with XLA.

* TF: XLA-trainable DeBERTa v2 by gante in 18546

Documentation changes

* Split model list on modality by stevhliu in 18328

Improvements and bugfixes

* sentencepiece shouldn't be required for the fast LayoutXLM tokenizer by LysandreJik in 18320
* Fix sacremoses sof dependency for Transformers XL by sgugger in 18321
* Owlvit test fixes by alaradirik in 18303
* [Flax] Fix incomplete batches in example scripts by sanchit-gandhi in 17863
* start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … by sywangyi in 18229
* Update feature extractor docs by stevhliu in 18324
* fixed typo by banda-larga in 18331
* updated translation by banda-larga in 18333
* Updated _toctree.yml by nickprock in 18337
* Update automatic_speech_recognition.py by bofenghuang in 18339
* Fix codeparrot deduplication - ignore whitespaces by loubnabnl in 18023
* Remove Flax OPT from doctest for now by ydshieh in 18338
* Include tensorflow-aarch64 as a candidate by ankrgyl in 18345
* [BLOOM] Deprecate `position_ids` by thomasw21 in 18342
* Migrate metric to Evaluate library for tensorflow examples by VijayKalmath in 18327
* Migrate metrics used in flax examples to Evaluate by VijayKalmath in 18348
* [Docs] Fix Speech Encoder Decoder doc sample by sanchit-gandhi in 18346
* Fix OwlViT torchscript tests by ydshieh in 18347
* Fix some doctests by ydshieh in 18359
* [FX] Symbolic trace for Bloom by michaelbenayoun in 18356
* Fix TFSegformerForSemanticSegmentation doctest by ydshieh in 18362
* fix FSDP ShardedGradScaler by pacman100 in 18358
* Migrate metric to Evaluate in Pytorch examples by atturaioe in 18369
* Correct the spelling of bleu metric by ToluClassics in 18375
* Remove pt-like calls on tf tensor by amyeroberts in 18393
* Fix from_pretrained kwargs passing by YouJiacheng in 18387
* Add a check regarding the number of occurrences of by ydshieh in 18389
* Add evaluate to test dependencies by sgugger in 18396
* Fix OPT doc tests by ArthurZucker in 18365
* Fix doc tests by NielsRogge in 18397
* Add balanced strategies for device_map in from_pretrained by sgugger in 18349
* Fix docs by NielsRogge in 18399
* Adding fine-tuning models to LUKE by ikuyamada in 18353
* Fix ROUGE add example check and update README by sgugger in 18398
* Add Flax BART pretraining script by duongna21 in 18297
* Rewrite push_to_hub to use upload_files by sgugger in 18366
* Layoutlmv2 tesseractconfig by kelvinAI in 17733
* fix: create a copy for tokenizer object by YBooks in 18408
* Fix uninitialized parameter in conformer relative attention. by PiotrDabkowski in 18368
* Fix the hub user name in a longformer doctest checkpoint by ydshieh in 18418
* Change audio kwarg to images in TROCR processor by ydshieh in 18421
* update maskformer docs by alaradirik in 18423
* Fix `test_load_default_pipelines_tf` test error by ydshieh in 18422
* fix run_clip README by ydshieh in 18332
* Improve `generate` docstring by JoaoLages in 18198
* Accept `trust_remote_code` and ignore it in `PreTrainedModel.from_pretrained` by ydshieh in 18428
* Update pipeline word heuristic to work with whitespace in token offsets by davidbenton in 18402
* Add programming languages by cakiki in 18434
* fixing error when using sharded ddp by pacman100 in 18435
* Update _toctree.yml by stevhliu in 18440
* support ONNX export of XDropout in deberta{,_v2} and sew_d by garymm in 17502
* Add Spanish translation of run_scripts.mdx by donelianc in 18415
* Update no trainer scripts for language modeling and image classification examples by nandwalritik in 18443
* Update pinned hhub version by osanseviero in 18448
* Fix failing tests for XLA generation in TF by dsuess in 18298
* add zero-shot obj detection notebook to docs by alaradirik in 18453
* fix: keras fit tests for segformer tf and minor refactors. by sayakpaul in 18412
* Fix torch version comparisons by LSinev in 18460
* [BLOOM] Clean modeling code by thomasw21 in 18344
* change shape to support dynamic batch input in tf.function XLA generate for tf serving by nlpcat in 18372
* HFTracer.trace can now take callables and torch.nn.Module by michaelbenayoun in 18457
* Update no trainer scripts for multiple-choice by kiansierra in 18468
* Fix load of model checkpoints in the Trainer by sgugger in 18470
* Add FX support for torch.baddbmm andd torch.Tensor.baddbmm by thomasw21 in 18363
* Add machine type in the artifact of Examples directory job by ydshieh in 18459
* Update no trainer examples for QA and Semantic Segmentation by kiansierra in 18474
* Add `TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING` by ydshieh in 18469
* Fixing issue where generic model types wouldn't load properly with the pipeline by Narsil in 18392
* Fix TFSwinSelfAttention to have relative position index as non-trainable weight by harrydrippin in 18226
* Refactor `TFSwinLayer` to increase serving compatibility by harrydrippin in 18352
* Add TF prefix to TF-Res test class by ydshieh in 18481
* Remove py.typed by sgugger in 18485
* Fix pipeline tests by sgugger in 18487
* Use new huggingface_hub tools for download models by sgugger in 18438
* Fix `test_dbmdz_english` by updating expected values by ydshieh in 18482
* Move cache folder to huggingface/hub for consistency with hf_hub by sgugger in 18492
* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` by ydshieh in 18484
* disable Onnx test for google/long-t5-tglobal-base by ydshieh in 18454
* Typo reported by Joel Grus on TWTR by julien-c in 18493
* Just re-reading the whole doc every couple of months 😬 by julien-c in 18489
* `transformers-cli login` => `huggingface-cli login` by julien-c in 18490
* Add seed setting to image classification example by regisss in 18519
* [DX fix] Fixing QA pipeline streaming a dataset. by Narsil in 18516
* Clean up hub by sgugger in 18497
* update fsdp docs by pacman100 in 18521
* Fix compatibility with 1.12 by sgugger in 17925
* Specify en in doc-builder README example by ankrgyl in 18526
* New cache fixes: add safeguard before looking in folders by sgugger in 18522
* unpin resampy by ydshieh in 18527
* ✨ update to use interlibrary links instead of Markdown by stevhliu in 18500
* Add example of multimodal usage to pipeline tutorial by stevhliu in 18498
* [VideoMAE] Add model to doc tests by NielsRogge in 18523
* Update perf_train_gpu_one.mdx by mishig25 in 18532
* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper by Rasmusafj in 18473
* Add Spanish translation of converting_tensorflow_models.mdx by donelianc in 18512
* Spanish translation of summarization.mdx by AguilaCudicio in 15947
* Let's not cast them all by younesbelkada in 18471
* fix: data2vec-vision Onnx ready-made configuration. by NikeNano in 18427
* Add mt5 onnx config by ChainYo in 18394
* Minor update of `run_call_with_unpacked_inputs` by ydshieh in 18541
* BART - Fix attention mask device issue on copied models by younesbelkada in 18540
* Adding a new `align_to_words` param to qa pipeline. by Narsil in 18010
* 📝 update metric with evaluate by stevhliu in 18535
* Restore _init_weights value in no_init_weights by YouJiacheng in 18504
* 📝 update documentation build section by stevhliu in 18548
* Preserve hub-related kwargs in AutoModel.from_pretrained by sgugger in 18545
* Use commit hash to look in cache instead of calling head by sgugger in 18534
* Update philosophy to include other preprocessing classes by stevhliu in 18550
* Properly move cache when it is not in default path by sgugger in 18563
* Adds CLIP to models exportable with ONNX by unography in 18515
* raise atol for MT5OnnxConfig by ydshieh in 18560
* fix string by mrwyattii in 18568
* Segformer TF: fix output size in documentation by joihn in 18572
* Fix resizing bug in OWL-ViT by alaradirik in 18573
* Fix LayoutLMv3 documentation by pocca2048 in 17932
* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training by donebydan in 18486
* german docs translation by flozi00 in 18544
* Deberta V2: Fix critical trace warnings to allow ONNX export by iiLaurens in 18272
* [FX] _generate_dummy_input supports audio-classification models for labels by michaelbenayoun in 18580
* Fix docstrings with last version of hf-doc-builder styler by sgugger in 18581
* fix owlvit tests, update docstring examples by alaradirik in 18586
* Return the permuted hidden states if return_dict=True by amyeroberts in 18578
* Add type hints for ViLT models by donelianc in 18577
* update doc for perf_train_cpu_many, add intel mpi introduction by sywangyi in 18576
* typos by stas00 in 18594
* FSDP bug fix for `load_state_dict` by pacman100 in 18596
* Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` by ydshieh in 18600
* Fix URLs by NielsRogge in 18604
* Update BLOOM parameter counts by Muennighoff in 18531
* [doc] fix anchors by stas00 in 18591
* [fsmt] deal with -100 indices in decoder ids by stas00 in 18592
* small change by younesbelkada in 18584
* Flax Remat for LongT5 by KMFODA in 17994
* Change scheduled CIs to use torch 1.12.1 by ydshieh in 18644
* Add checks for some workflow jobs by ydshieh in 18583
* TF: Fix generation repetition penalty with XLA by gante in 18648
* Update longt5.mdx by flozi00 in 18634
* Update run_translation_no_trainer.py by zhoutang776 in 18637
* [bnb] Minor modifications by younesbelkada in 18631
* Examples: add Bloom support for token classification by stefan-it in 18632
* Fix Yolos ONNX export test by ydshieh in 18606
* Fix matmul inputs dtype by JingyaHuang in 18585
* Update feature extractor methods to enable type cast before normalize by amyeroberts in 18499
* Allow users to force TF availability by Rocketknight1 in 18650
* [LongT5] Correct docs long t5 by patrickvonplaten in 18669
* Generate: validate model_kwargs on FLAX (and catch typos in generate arguments) by gante in 18653
* Ping `detectron2` for CircleCI tests by ydshieh in 18680
* Rename method to avoid clash with property by amyeroberts in 18677
* Rename second input dimension from "sequence" to "num_channels" for CV models by regisss in 17976
* Fix repo consistency by lewtun in 18682
* Fix breaking change in `onnxruntime` for ONNX quantization by severinsimmler in 18336
* Add evaluate to examples requirements by muellerzr in 18666
* [bnb] Move documentation by younesbelkada in 18671
* Add an examples folder for code downstream tasks by loubnabnl in 18679
* `model.tie_weights()` should be applied after `accelerator.prepare()` by Gladiator07 in 18676
* Generate: add missing `**model_kwargs` in sample tests by gante in 18696
* Temp fix for broken detectron2 import by patrickvonplaten in 18699
* [Hotfix] pin detectron2 5aeb252 to avoid test fix by ydshieh in 18701
* Fix Data2VecVision ONNX test by ydshieh in 18587
* Add missing tokenizer tests - Longformer by tgadeliya in 17677
* remove check for main process for trackers initialization by Gladiator07 in 18706
* Unpin detectron2 by ydshieh in 18727
* Removing warning of model type for `microsoft/tapex-base-finetuned-wtq` by Narsil in 18711
* improve `add_tokens` docstring by SaulLu in 18687
* CLI: Don't check the model head when there is no model head by gante in 18733
* Update perf_infer_gpu_many.mdx by mishig25 in 18744
* Add minor doc-string change to include hp_name param in hyperparameter_search by constantin-huetterer in 18700
* fix pipeline_tutorial.mdx doctest by ydshieh in 18717
* Add TF implementation of `XGLMModel` by stancld in 16543
* fixed docstring typos by JadeKim042386 in 18739
* add warning to let the user know that the `__call__` method is faster than `encode` + `pad` for a fast tokenizer by SaulLu in 18693
* examples/run_summarization_no_trainer: fixed incorrect param to hasattr by rahular in 18720
* Add ONNX support for Longformer by deutschmn in 17176
* Determine framework automatically before ONNX export by rachthree in 18615
* streamlining 'checkpointing_steps' parsing by rahular in 18755
* CLI: Improved error control and updated hub requirement by gante in 18752
* [VisionEncoderDecoder] Add gradient checkpointing by patrickvonplaten in 18697
* [Wav2vec2 + LM Test] Improve wav2vec2 with lm tests and make torch version dependent for now by patrickvonplaten in 18749
* Fix incomplete outputs of FlaxBert by duongna21 in 18772
* Fix broken link DeepSpeed documentation link by philschmid in 18783
* fix missing block when there is no failure by ydshieh in 18775
* fix a possible typo in auto feature extraction by fcakyon in 18779
* Fix memory leak issue in `torch_fx` tests by ydshieh in 18547
* Fix mock in `test_cached_files_are_used_when_internet_is_down` by Wauplin in 18804
* Add SegFormer and ViLT links by NielsRogge in 18808
* send model to the correct device by ydshieh in 18800
* Revert to and safely handle flag in owlvit config by amyeroberts in 18750
* Add docstring for BartForCausalLM by ekagra-ranjan in 18795
* up by qqaatw in 18805
* [Swin, Swinv2] Fix attn_mask dtype by NielsRogge in 18803
* Run tests if skip condition not met by amyeroberts in 18764
* Remove ViltForQuestionAnswering from check_repo by NielsRogge in 18762
* Adds OWLViT to models exportable with ONNX by unography in 18588
* Adds GroupViT to models exportable with ONNX by unography in 18628
* LayoutXLMProcessor: ensure 1-to-1 mapping between samples and images, and add test for it by anthony2261 in 18774
* Added Docstrings for Deberta and DebertaV2 [PyTorch] by Tegzes in 18610
* Improving the documentation for "word", within the pipeline. by Narsil in 18763
* Disable nightly CI temporarily by ydshieh in 18820
* Pin max tf version by gante in 18818
* Fix cost condition in DetrHungarianMatcher and YolosHungarianMatcher to allow zero-cost by kongzii in 18647
* oob performance improvement for cpu DDP by sywangyi in 18595
* Warn on TPUs when the custom optimizer and model device are not the same by muellerzr in 18668
* Update location identification by LysandreJik in 18834
* fix bug: register_for_auto_class should be defined on TFPreTrainedModel instead of TFSequenceSummary by azonti in 18607
* [DETR] Add num_channels attribute by NielsRogge in 18714
* Pin ffspec by sgugger in 18837
* Improve GPT2 doc by ekagra-ranjan in 18787
* Add an option to `HfArgumentParser.parse_{dict,json_file}` to raise an Exception when there extra keys by FelixSchneiderZoom in 18692
* Improve Text Generation doc by ekagra-ranjan in 18788
* Add SegFormer ONNX support by NielsRogge in 18006
* Add security warning about the from_pretrained() method by lewtun in 18801
* Owlvit memory leak fix by alaradirik in 18734
* Create pipeline_tutorial.mdx german docs by flozi00 in 18625
* Unpin fsspec by albertvillanova in 18846
* Delete `state_dict` to release memory as early as possible by ydshieh in 18832
* Generate: smaller TF serving test by gante in 18840
* add a script to get time info. from GA workflow jobs by ydshieh in 18822
* Pin rouge_score by albertvillanova in 18247
* Minor typo in prose of model outputs documentation. by pcuenca in 18848
* reflect max_new_tokens in `Seq2SeqTrainer` by kumapo in 18786
* Adds timeout argument to training_args to avoid socket timeouts in DDP by gugarosa in 18562
* Cache results of is_torch_tpu_available() by comaniac in 18777
* Tie weights after preparing the model in run_clm by sgugger in 18855
* Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests by ankrgyl in 18854
* Split docs on modality by stevhliu in 18205
* if learning rate is a tensor, get item (float) by kmckiern in 18861
* Fix naming issue with ImageToText pipeline by OlivierDehaene in 18864
* [LayoutLM] Add clarification to docs by NielsRogge in 18716
* Add OWL-ViT to the appropriate section by NielsRogge in 18867
* Clean up utils.hub using the latest from hf_hub by sgugger in 18857
* pin Slack SDK to 3.18.1 to avoid failing issue by ydshieh in 18869
* Fix number of examples for iterable datasets in multiprocessing by sgugger in 18856
* postpone bnb load until it's needed by stas00 in 18859
* A script to download artifacts and perform CI error statistics by ydshieh in 18865
* Remove cached torch_extensions on CI runners by ydshieh in 18868
* Update docs landing page by stevhliu in 18590
* Finetune guide for semantic segmentation by stevhliu in 18640
* Add Trainer to quicktour by stevhliu in 18723
* TF: TFMarianMTModel final logits bias as a layer by gante in 18833
* Mention TF and Flax checkpoints by LysandreJik in 18894
* Correct naming pegasus x by patrickvonplaten in 18896
* Update perf_train_gpu_one.mdx by thepurpleowl in 18442
* Add type hints to XLM-Roberta-XL models by asofiaoliveira in 18475
* Update Chinese documentation by zkep in 18893
* Generate: get the correct beam index on eos token by gante in 18851
* Mask t5 relative position bias then head pruned by hadaev8 in 17968
* updating gather function with gather_for_metrics in run_wav2vec2_pretraining by arun99481 in 18877
* Fix decode_input_ids to bare T5Model and improve doc by ekagra-ranjan in 18791
* Fix `test_tf_encode_plus_sent_to_model` for `LayoutLMv3` by ydshieh in 18898
* fixes bugs to handle non-dict output by alaradirik in 18897
* Further reduce the number of alls to head for cached objects by sgugger in 18871
* unpin slack_sdk version by ydshieh in 18901
* Fix incorrect size of input for 1st strided window length in `Perplexity of fixed-length models` by ekagra-ranjan in 18906
* [VideoMAE] Improve code examples by NielsRogge in 18919
* Add checks for more workflow jobs by ydshieh in 18905
* Accelerator end training by nbroad1881 in 18910
* update the train_batch_size in case HPO change batch_size_per_device by sywangyi in 18918
* Update TF fine-tuning docs by Rocketknight1 in 18654
* TF: final bias as a layer in seq2seq models (replicate TFMarian fix) by gante in 18903
* remvoe `_create_and_check_torch_fx_tracing` in specific test files by ydshieh in 18667
* [DeepSpeed ZeRO3] Fix performance degradation in sharded models by tjruwase in 18911
* pin TF 2.9.1 for self-hosted CIs by ydshieh in 18925
* Fix XLA fp16 and bf16 error checking by ymwangg in 18913
* Starts on a list of external deps required for dev by colindean in 18929
* Add image height and width to ONNX dynamic axes by lewtun in 18915
* Skip some doctests in quicktour by stevhliu in 18927
* Fix LayoutXLM wrong link in README by Devlee247 in 18932
* Update translation requests contact by NimaBoscarino in 18941
* [JAX] Replace all jax.tree_* calls with jax.tree_util.tree_* by sanchit-gandhi in 18361
* Neptune.ai integration improvements by Raalsky in 18934
* Generate: Simplify is_pad_token_not_equal_to_eos_token_id by ekagra-ranjan in 18933
* Fix train_step, test_step and tests for CLIP by Rocketknight1 in 18684
* Exit early in load if no weights are in the sharded state dict by sgugger in 18937
* update black target version by BramVanroy in 18955
* RFC: Replace custom TF embeddings by Keras embeddings by gante in 18939
* TF: unpin maximum TF version by gante in 18917
* Revert "TF: unpin maximum TF version (18917)" by sgugger
* remove unused activation dropout by shijie-wu in 18842
* add DDP HPO support for sigopt by sywangyi in 18931
* Remove `decoder_position_ids` from `check_decoder_model_past_large_inputs` by ydshieh in 18980
* create Past CI results as tables for GitHub issue by ydshieh in 18953
* Remove dropout in embedding layer of OPT by shijie-wu in 18845
* Fix TF start docstrings by Rocketknight1 in 18991
* Align try_to_load_from_cache with huggingface_hub by sgugger in 18966
* Fix tflongformer int dtype by Rocketknight1 in 18907
* TF: correct TFBart embeddings weights name when load_weight_prefix is passed by gante in 18993
* fix checkpoint name for wav2vec2 conformer by ydshieh in 18994
* added type hints by daspartho in 18996
* TF: TF 2.10 unpin + related onnx test skips by gante in 18995
* Fixed typo by tnusser in 18921
* Removed issue in wav2vec link by chrisemezue in 18945
* Fix MaskFormerFeatureExtractor instance segmentation preprocessing bug by alaradirik in 18997
* Add type hints for M2M by daspartho in 18998
* Fix tokenizer for XLMRobertaXL by ydshieh in 19004
* Update default revision for document-question-answering by ankrgyl in 18938
* Fixed bug which caused overwrite_cache to always be True by rahular in 19000
* add DDP HPO support for optuna by sywangyi in 19002
* add missing `require_tf` for `TFOPTGenerationTest` by ydshieh in 19010
* Re-add support for single url files in objects download by sgugger in 19014

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* nandwalritik
    * Add swin transformer v2 (17469)
    * Update no trainer scripts for language modeling and image classification examples (18443)
* ankrgyl
    * Include tensorflow-aarch64 as a candidate (18345)
    * Specify en in doc-builder README example (18526)
    * Add LayoutLMForQuestionAnswering model (18407)
    * Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests (18854)
    * Add DocumentQuestionAnswering pipeline (18414)
    * Update default revision for document-question-answering (18938)
* ikuyamada
    * Adding fine-tuning models to LUKE (18353)
* duongna21
    * Add Flax BART pretraining script (18297)
    * Fix incomplete outputs of FlaxBert (18772)
* donelianc
    * Add Spanish translation of run_scripts.mdx (18415)
    * Add Spanish translation of converting_tensorflow_models.mdx (18512)
    * Add type hints for ViLT models (18577)
* sayakpaul
    * fix: keras fit tests for segformer tf and minor refactors. (18412)
    * TensorFlow MobileViT (18555)
* flozi00
    * german docs translation (18544)
    * Update longt5.mdx (18634)
    * Create pipeline_tutorial.mdx german docs (18625)
* stancld
    * Add TF implementation of `XGLMModel` (16543)
* ChrisFugl
    * [LayoutLMv3] Add TensorFlow implementation (18678)
* zphang
    * PEGASUS-X (18551)
* nghuyong
    * add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (18686)

4.21.3

Not secure

Patch release to add a disclaimer about torch models hosted on the Hub. See [autoclass tutorial](https://huggingface.co/docs/transformers/v4.21.3/en/autoclass_tutorial#automodel).

4.21.2

Not secure

Fix a regression in the TableQA pipeline: [18428](https://github.com/huggingface/transformers/pull/18428)

4.21.1

Not secure

Fix a regression in Trainer checkpoint loading: 18470

4.21.0

Not secure

TensorFlow XLA Text Generation

The TensorFlow text generation method can now be wrapped with `tf.function` and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and [our benchmarks](https://huggingface.co/spaces/joaogante/tf_xla_generate_benchmarks). You can also see XLA generation in action in our [example notebooks](https://huggingface.co/docs/transformers/notebooks), particularly for [summarization](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb) and [translation](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
for input_prompt in input_prompts:
    tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
    generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
    print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
```
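
The reason `pad_to_multiple_of` matters is that an XLA-compiled function is recompiled for every new input shape. Rounding sequence lengths up to a multiple of 32 collapses many raw lengths into a few shapes. The pure-Python sketch below illustrates the arithmetic; `padded_length` is a hypothetical helper and the list of lengths is made up.

```python
def padded_length(num_tokens, multiple=32):
    """Smallest multiple of `multiple` that is >= num_tokens."""
    return -(-num_tokens // multiple) * multiple  # ceiling division


raw_lengths = [5, 17, 31, 32, 33, 60]
padded = [padded_length(n) for n in raw_lengths]
print(padded)  # [32, 32, 32, 32, 64, 64]

# Six distinct input shapes shrink to two, so only two compilations are needed.
print(len(set(raw_lengths)), "->", len(set(padded)))  # 6 -> 2
```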

* Generate: deprecate default `max_length` by gante in 18018
* TF: GPT-J compatible with XLA generation by gante in 17986
* TF: T5 can now handle a padded past (i.e. XLA generation) by gante in 17969
* TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible by gante in 17857
* TF: generate without `tf.TensorArray` by gante in 17801
* TF: BART compatible with XLA generation by gante in 17479

New model additions

OwlViT

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.

* Add OWL-ViT model for zero-shot object detection by alaradirik in 17938
* Fix OwlViT tests by sgugger in 18253

NLLB

The NLLB model was presented in [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

* [M2M100] update conversion script by patil-suraj in 17916
* NLLB tokenizer by LysandreJik in 18126

MobileViT

The MobileViT model was proposed in [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.

* add MobileViT model by hollance in 17354

Nezha

The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy, Mixed Precision Training and the LAMB Optimizer in training the models.

* Nezha Pytorch implementation by sijunhe in 17776

GroupViT

The GroupViT model was proposed in [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. Inspired by [CLIP](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip), GroupViT is a vision-language model that can perform zero-shot semantic segmentation on any given vocabulary categories.

* Adding GroupViT Models by xvjiarui in 17313

MVP

The MVP model was proposed in [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, the model is further pre-trained using specific soft prompts to stimulate the model capacity in performing a specific task.

* Add MVP model by StevenTang1998 in 17787

CodeGen

The CodeGen model was proposed in [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on [The Pile](https://pile.eleuther.ai/), BigQuery, and BigPython.

* Add CodeGen model by rooa in 17443
* [CodeGen] support device_map="auto" for sharded checkpoints by patil-suraj in 17871

UL2

The UL2 model was presented in [Unifying Language Learning Paradigms](https://arxiv.org/pdf/2205.05131v1.pdf) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.

* Add UL2 (just docs) by patrickvonplaten in 17740

Custom pipelines

This adds support for custom pipelines on the Hub, so that you can share them with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., users have to pass `trust_remote_code=True` when they want to use such a pipeline. The best way to get familiar with the feature is to read the [added documentation](https://huggingface.co/docs/transformers/v4.21.0/en/add_new_pipeline#share-your-pipeline-on-the-hub).

* Custom pipeline by sgugger in 18079
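
A minimal sketch of what loading a shared pipeline might look like, assuming the `transformers` package is installed. The helper name `load_custom_pipeline` and the repository ID in the usage comment are hypothetical; `pipeline` and its `trust_remote_code` argument come from the library. Since `trust_remote_code=True` executes Python code downloaded from the repository, it is worth reviewing that code before opting in.

```python
from typing import Any


def load_custom_pipeline(repo_id: str, **pipeline_kwargs: Any):
    """Load a pipeline whose class is defined in a Hub repository.

    `repo_id` is a placeholder for a repository that ships its own
    pipeline code; this helper is illustrative, not part of the library.
    """
    # Imported lazily so that this module can be imported even when
    # transformers is not installed.
    from transformers import pipeline

    # trust_remote_code=True is the explicit opt-in required before the
    # library will run pipeline code fetched from the Hub.
    return pipeline(model=repo_id, trust_remote_code=True, **pipeline_kwargs)


# Hypothetical usage (needs network access and a real repository):
# pipe = load_custom_pipeline("your-username/your-custom-pipeline")
# print(pipe("some input"))
```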

PyTorch to TensorFlow CLI utility

This adds a CLI to convert PT weights into TF weights, validate them, and (optionally) open a PR.

* CLI: tool to convert PT into TF weights and open hub PR by gante in https://github.com/huggingface/transformers/pull/17497

TensorFlow-specific improvements

The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet.

* [SegFormer] TensorFlow port by sayakpaul in 17910
* Add TF DeiT implementation by amyeroberts in 17806
* Add TF ResNet model by amyeroberts in 17427
* TF implementation of RegNets by ariG23498 in 17554

Additionally, our TF models now support loading sharded checkpoints:

* TF Sharded by ArthurZucker in 17713
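
To make the idea of a sharded checkpoint concrete, here is a small pure-Python sketch, not the library's implementation. It assumes a checkpoint is split into several files plus a JSON index that maps each weight name to the file holding it. The function names, the greedy splitting rule, the toy sizes, and the file-name pattern are all assumptions made for illustration.

```python
import json


def shard_weights(weight_sizes, max_shard_size):
    """Greedily group weight names into shards of at most max_shard_size bytes."""
    shards = [[]]
    current_size = 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if shards[-1] and current_size + size > max_shard_size:
            shards.append([])
            current_size = 0
        shards[-1].append(name)
        current_size += size
    return shards


def build_index(weight_sizes, max_shard_size, pattern="model-{:05d}-of-{:05d}.bin"):
    """Build an index mapping every weight name to its (illustrative) shard file."""
    shards = shard_weights(weight_sizes, max_shard_size)
    weight_map = {}
    for shard_number, names in enumerate(shards, start=1):
        filename = pattern.format(shard_number, len(shards))
        for name in names:
            weight_map[name] = filename
    return {
        "metadata": {"total_size": sum(weight_sizes.values())},
        "weight_map": weight_map,
    }


def files_to_load(index):
    """Return the distinct shard files a loader would need to open."""
    return sorted(set(index["weight_map"].values()))


# Toy weights with sizes in bytes (made-up numbers).
sizes = {
    "embed.weight": 400,
    "layer.0.weight": 300,
    "layer.1.weight": 300,
    "head.weight": 400,
}
index = build_index(sizes, max_shard_size=700)
print(json.dumps(index, indent=2))
print(files_to_load(index))
```

A loader that supports this layout reads the index first, then opens each listed file in turn, so no single file has to hold the full set of weights.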

Flax-specific improvements

The following models have been ported to be used in JAX:

* Flax t5 Encoder by crystina-z in 17784

Additionally, our JAX models now support loading sharded checkpoints:

* Flax sharded by ArthurZucker in 17760

Additional model heads

The following models now have a brand new head for new tasks:

* Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER) by gilad19 in 17924
* Adding OPTForSeqClassification class by oneraghavan in 18123

ONNX support

A continued community effort provides ONNX converters for an increasing number of models.

* add ONNX support for LeVit by gcheron in 18154
* add ONNX support for BLOOM by NouamaneTazi in 17961
* Add ONNX support for LayoutLMv3 by regisss in 17953
* Mrbean/codegen onnx by sam-h-bean in 17903
* Add ONNX support for DETR by regisss in 17904
* add onnx support for deberta and debertav2 by sam-h-bean in 17617

Documentation translation

The community effort to translate the documentation into several languages continues.

Portuguese

* Added translation of index.mdx to Portuguese Issue 16824 by rzimmerdev in 17565

Spanish

* Add Spanish translation of custom_models.mdx by donelianc in 17807

Italian

* Add Italian translation of sharing_custom_models.mdx by Xpiri in 17631
* Add Italian translation of converting_tensorflow_models.mdx by Xpiri in 18283
* Add Italian translation of create_model.mdx and serialization.mdx by F02934 in 17640
* Italian/accelerate by mfumanelli in 17698
* Italian/model sharing by mfumanelli in 17828
* Italian translation of run_scripts.mdx gh-17459 by lorenzobalzani in 17642
* Translation/debugging by nickprock in 18230
* Translation/training: italian translation training.mdx by nickprock in 17662
* Translation italian: multilingual.mdx by nickprock in 17768
* Added preprocessing.mdx italian translation by nickprock in 17600

Improvements and bugfixes

* [EncoderDecoder] Improve docs by NielsRogge in 18271
* [DETR] Improve code examples by NielsRogge in 18262
* patch for smddp import by carolynwang in 18244
* Fix Sylvain's nits on the original KerasMetricCallback PR by Rocketknight1 in 18300
* Add PYTEST_TIMEOUT for CircleCI test jobs by ydshieh in 18251
* Add PyTorch 1.11 to past CI by ydshieh in 18302
* Raise a TF-specific error when importing Torch classes by Rocketknight1 in 18280
* [ create_a_model.mdx ] translate to pt by Fellip15 in 18098
* Update translation.mdx by gorkemozkaya in 18169
* Add TFAutoModelForImageClassification to pipelines.py by ydshieh in 18292
* Adding type hints of TF:OpenAIGPT by Mathews-Tom in 18263
* Adding type hints of TF:CTRL by Mathews-Tom in 18264
* Replace false parameter by a buffer by sgugger in 18259
* Fix ORTTrainer failure on gpt2 fp16 training by JingyaHuang in 18017
* Owlvit docs test by alaradirik in 18257
* Good difficult issue override for the stalebot by LysandreJik in 18094
* Fix dtype of input_features in docstring by ydshieh in 18258
* Fix command of doc tests for local testing by oneraghavan in 18236
* Fix TF bad words filter with XLA by Rocketknight1 in 18286
* Allows `KerasMetricCallback` to use XLA generation by Rocketknight1 in 18265
* Skip passes report for `--make-reports` by ydshieh in 18250
* Update serving code to enable `saved_model=True` by amyeroberts in 18153
* Change how `take_along_axis` is computed in DeBERTa to stop confusing XLA by Rocketknight1 in 18256
* Fix torch version check in Vilt by ydshieh in 18260
* change bloom parameters to 176B by muhammad-ahmed-ghani in 18235
* TF: use the correct config with `(...)EncoderDecoder` models by gante in 18097
* Fix `no_trainer` CI by muellerzr in 18242
* Update notification service by ydshieh in 17921
* Make errors for loss-less models more user-friendly by sgugger in 18233
* Fix TrainingArguments help section by sgugger in 18232
* Better messaging and fix for incorrect shape when collating data. by CakeCrusher in 18119
* Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API by viclzhu in 18221
* Update add_new_pipeline.mdx by zh-zheng in 18224
* Add custom config to quicktour by stevhliu in 18115
* skip some test_multi_gpu_data_parallel_forward by ydshieh in 18188
* Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES by ydshieh in 18213
* Fix `LayoutXLM` docstrings by qqaatw in 17038
* update cache to v0.5 by ydshieh in 18203
* Reduce console spam when using the KerasMetricCallback by Rocketknight1 in 18202
* TF: Add missing cast to GPT-J by gante in 18201
* Use next-gen CircleCI convenience images by ydshieh in 18197
* Typo in readme by flozi00 in 18195
* [From pretrained] Allow download from subfolder inside model repo by patrickvonplaten in 18184
* Update docs README with instructions on locally previewing docs by snehankekre in 18196
* bugfix: div-->dim by orgoro in 18135
* Add vision example to README by sgugger in 18194
* Remove use_auth_token from the from_config method by duongna21 in 18192
* FSDP integration enhancements and fixes by pacman100 in 18134
* BLOOM minor fixes small test by younesbelkada in 18175
* fix typo inside bloom documentation by SaulLu in 18187
* Better default for offload_state_dict in from_pretrained by sgugger in 18183
* Fix template for new models in README by sgugger in 18182
* FIX: Typo by ayansengupta17 in 18156
* Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests by ydshieh in 18073
* Fix expected loss values in some (m)T5 tests by ydshieh in 18177
* [HPO] update to sigopt new experiment api by sywangyi in 18147
* Fix incorrect type hint for lang by JohnGiorgi in 18161
* Fix check for falsey inputs in run_summarization by JohnGiorgi in 18155
* Adding support for `device_map` directly in `pipeline(..)` function. by Narsil in 17902
* Fixing a hard to trigger bug for `text-generation` pipeline. by Narsil in 18131
* Enable torchdynamo with torch_tensorrt(fx path) by frank-wei in 17765
* Make sharded checkpoints work in offline mode by sgugger in 18125
* add dataset split and config to model-index in TrainingSummary.from_trainer by loicmagne in 18064
* Add summarization name mapping for MultiNews by JohnGiorgi in 18117
* supported python versions reference by CakeCrusher in 18116
* TF: unpack_inputs decorator independent from main_input_name by gante in 18110
* TF: remove graph mode distinction when processing boolean options by gante in 18102
* Fix BLOOM dtype by Muennighoff in 17995
* CLI: reenable `pt_to_tf` test by gante in 18108
* Report value for a step instead of epoch. by zhawe01 in 18095
* speed up test by sijunhe in 18106
* Enhance IPEX integration in Trainer by jianan-gu in 18072
* Bloom Optimize operations by younesbelkada in 17866
* Add filename to info displayed when downloading things in from_pretrained by sgugger in 18099
* Fix image segmentation and object detection pipeline tests by sgugger in 18100
* Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts by duongna21 in 18069
* Fix torchscript tests for GPT-NeoX by ydshieh in 18012
* Fix some typos. by Yulv-git in 17560
* [bloom] fix alibi device placement by stas00 in 18087
* Make predict() close progress bars after finishing by neverix in 17952
* Update localized READMES when template is filled. by sgugger in 18062
* Fix type issue in using bucketing with Trainer by seopbo in 18051
* Fix slow CI by pinning resampy by sgugger in 18077
* Drop columns after loading samples in prepare_tf_dataset by Rocketknight1 in 17967
* [Generate Tests] Make sure no tokens are force-generated by patrickvonplaten in 18053
* Added Command for windows VENV activation in installation docs by darthvader2 in 18008
* Sort doc toc by sgugger in 18034
* Place inputs on device when include_inputs_for_metrics is True by sgugger in 18046
* Doc to dataset by sgugger in 18037
* Protect `TFGenerationMixin.seed_generator` so it's not created at import by Rocketknight1 in 18044
* Fix T5 incorrect weight decay in Trainer and official summarization example by ADAning in 18002
* Squash commits by NielsRogge in 17981
* Enable Past CI by ydshieh in 17919
* Fix T5/mT5 tests by Rocketknight1 in 18029
* [Flax] Bump to v0.4.1 by sanchit-gandhi in 17966
* Update expected values in DecisionTransformerModelIntegrationTest by ydshieh in 18016
* fixed calculation of ctc loss in TFWav2Vec2ForCTC by Sreyan88 in 18014
* Return scalar losses instead of per-sample means by Rocketknight1 in 18013
* sort list of models by hollance in 18011
* Replace BloomTokenizer by BloomTokenizerFast in doc by regisss in 18005
* Fix typo in error message in generation_utils by regisss in 18000
* Refactor to inherit from nn.Module instead of nn.ModuleList by amyeroberts in 17501
* Add link to existing documentation by LysandreJik in 17931
* only a stupid typo, but it can lead to confusion by Dobatymo in 17930
* Exclude Databricks from notebook env only if the runtime is below 11.0 by davidheryanto in 17988
* Shifting labels for causal LM when using label smoother by seungeunrho in 17987
* Restore original task in test_warning_logs by ydshieh in 17985
* Ensure PT model is in evaluation mode and lightweight forward pass done by amyeroberts in 17970
* XLA train step fixes by Rocketknight1 in 17973
* [Flax] Add remat (gradient checkpointing) by sanchit-gandhi in 17843
* higher atol to avoid flaky trainer test failure by ydshieh in 17979
* Fix FlaxBigBirdEmbeddings by ydshieh in 17842
* fixing fsdp autowrap functionality by pacman100 in 17922
* fix `bias` keyword argument in TFDebertaEmbeddings by WissamAntoun in 17940
* Update expected values in CodeGen tests by ydshieh in 17888
* Fix typo in perf_train_gpu_one.mdx by aliencaocao in 17983
* skip some gpt_neox tests that require 80G RAM by ydshieh in 17923
* feat: add pipeline registry abstraction by aarnphm in 17905
* skip some ipex tests until it works with torch 1.12 by ydshieh in 17964
* Fix number of examples for iterable dataset in distributed training by sgugger in 17951
* [Pipelines] Add revision tag to all default pipelines by patrickvonplaten in 17667
* Unifying training argument type annotations by jannisborn in 17934
* Fix GPT-NeoX-20B past handling, attention computation by zphang in 17811
* Fix 17893, removed dead code by clefourrier in 17917
* Fix prepare_tf_dataset when drop_remainder is not supplied by Rocketknight1 in 17950
* ExplicitEnum subclass str (JSON dump compatible) by BramVanroy in 17933
* PyTorch 1.12.0 for scheduled CI by ydshieh in 17949
* OPT - Fix Softmax NaN in half precision mode by younesbelkada in 17437
* Use explicit torch version in deepspeed CI by ydshieh in 17942
* fix regexes with escape sequence by stas00 in 17943
* Fix all is_torch_tpu_available issues by muellerzr in 17936
* Fix img seg tests (load checkpoints from `hf-internal-testing`) by mishig25 in 17939
* Remove imports and use forward references in ONNX feature by sgugger in 17926
* Fix job links in Slack report by ydshieh in 17892
* Add missing comment quotes by leondz in 17379
* Remove render tags by NielsRogge in 17897
* Fix the Conda package build by bryant1410 in 16737
* Remove DT_DOUBLE from the T5 graph by szutenberg in 17891
* Compute min_resolution in prepare_image_inputs by ydshieh in 17915
* Fixing a regression with `return_all_scores` introduced in 17606 by Narsil in 17906
* In `group_texts` function, drop last block if smaller than `block_size` by billray0259 in 17908
* Move logic into pixelshuffle layer by amyeroberts in 17899
* Fix loss computation in TFBertForPreTraining by Rocketknight1 in 17898
* Pin black to 22.3.0 to benefit from a stable --preview flag by LysandreJik in 17918
* Fix PyTorch/TF Auto tests by ydshieh in 17895
* Fix `test_number_of_steps_in_training_with_ipex` by ydshieh in 17889
* Update expected values in constrained beam search tests by ydshieh in 17887
* Fix bug in gpt2's (from-scratch) special scaled weight initialization by karpathy in 17877
* Update README_zh-hans.md by mmdjiji in 17861
* bert: add conversion script for BERT Token Dropping TF2 checkpoints by stefan-it in 17142
* Fix add new model like frameworks by sgugger in 17869
* Add type annotations for RoFormer models by donelianc in 17878
* fix by ydshieh in 17890
* fix mask by younesbelkada in 17837
* Add a TF in-graph tokenizer for BERT by Rocketknight1 in 17701
* Fix TF GPT2 test_onnx_runtime_optimize by ydshieh in 17874
* CLI: handle multimodal inputs by gante in 17839
* Properly get tests deps in test_fetcher by sgugger in 17870
* Fix `test_inference_instance_segmentation_head` by ydshieh in 17872
* Skip `test_multi_gpu_data_parallel_forward` for `MaskFormer` by ydshieh in 17864
* Use higher value for hidden_size in Flax BigBird test by ydshieh in 17822
* Fix: torch.utils.checkpoint import error. by kumapo in 17849
* Add type hints for gptneox models by willtai in 17858
* Fix Splinter test by ydshieh in 17854
* [tests/VisionEncoderDecoder] import to_2tuple from test utils by patil-suraj in 17865
* Fix Constrained beam search duplication and weird output issue by boy2000-007man in 17814
* Improve encoder decoder model docs by Threepointone4 in 17815
* Improve vision models by NielsRogge in 17731
* Auto-build Docker images before on-merge if setup.py was changed by muellerzr in 17573
* Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts by muellerzr in 17856
* Index RNG states by global rank in saves by sgugger in 17852
* Change no trainer image_classification test by muellerzr in 17635
* Update modeling_cvt.py by F02934 in 17846
* Fix broken test for models with batchnorm by Rocketknight1 in 17841
* BLOOM minor changes on tokenizer by younesbelkada in 17823
* Improve performance docs by lvwerra in 17750
* Fix an error message in BigBird by ydshieh in 17840
* Fix properties of unset special tokens in non verbose mode by guillaumekln in 17797
* change message by SaulLu in 17836
* Add missing type hints for QDQBertModel by willtai in 17783
* Update type hints modeling_yoso.py by F02934 in 17827
* add doctests for DETR by qherreros in 17786
* Fix push CI artifact path by ydshieh in 17788
* Offload fixes by sgugger in 17810
* CLI: use hub's `create_commit` by gante in 17755
* initial commit by ArthurZucker in 17818
* Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict` by eranhirs in 17805
* Fix `top_k_top_p_filtering` having unexpected behavior by unifyh in 17744
* Remove duplicate code by lkm2835 in 17708
* CLI: convert sharded PT models by gante in 17959
* Improve error message Union not allowed by BramVanroy in 17769
* Add final_layer_norm to OPT model by thomasw21 in 17785
* Properly check for a TPU device by muellerzr in 17802
* Fix test for BF16 detection by sgugger in 17803
* Use 5e-5 For BigBird PT/Flax equivalence tests by ydshieh in 17780
* Prepare transformers for v0.8.0 huggingface-hub release by LysandreJik in 17716
* Fix forward reference imports in DeBERTa configs by sgugger in 17800
* Fix Automatic Download of Pretrained Weights in DETR by AnugunjNaman in 17712
* [ViTMAE] Fix docstrings and variable names by NielsRogge in 17710
* Add link to notebook by NielsRogge in 17791
* [CodeParrot] Near-deduplication with jaccard similarity by liyongsea in 17054
* Update modeling_longt5.py by bjascob in 17777
* Not use -1e4 as attn mask by ydshieh in 17306
* Fix cache for GPT-Neo-X by sgugger in 17764
* deprecate is_torch_bf16_available by stas00 in 17738
* Attempt to change Push CI to workflow_run by ydshieh in 17753
* Save huggingface checkpoint as artifact in mlflow callback by swethmandava in 17686
* Migrate HFDeepSpeedConfig from trfrs to accelerate by pacman100 in 17623
* feat: add num_workers arg to DataLoader by greg2451 in 17751
* Enable PyTorch nightly build CI by ydshieh in 17335
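
One entry in the list above changes `group_texts` to drop the last block when it is smaller than `block_size`. The sketch below illustrates that behavior as the entry title describes it; it is not the example scripts' exact code, and passing `block_size` as an argument and requiring an `input_ids` column are assumptions made for this illustration.

```python
from itertools import chain


def group_texts(examples, block_size):
    """Concatenate tokenized texts and split them into fixed-size blocks.

    `examples` maps column names (it must include "input_ids") to lists of
    token lists. Any remainder shorter than `block_size` is dropped.
    """
    # Flatten every column into one long token list.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Round down to a multiple of block_size, discarding the short tail.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [tokens[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }
    # For causal language modeling the labels are a copy of the inputs.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result


batch = {"input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9]]}
blocks = group_texts(batch, block_size=4)
print(blocks["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Token `9` is left over after two full blocks of four, so it is discarded and never forms a short block.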

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* donelianc
    * Add Spanish translation of custom_models.mdx (17807)
    * Add type annotations for RoFormer models (17878)
* Xpiri
    * Add Italian translation of sharing_custom_models.mdx (17631)
    * Add Italian translation of converting_tensorflow_models.mdx (18283)
* F02934
    * Add Italian translation of create_model.mdx and serialization.mdx (17640)
    * Update modeling_cvt.py (17846)
    * Update type hints modeling_yoso.py (17827)
* sayakpaul
    * [SegFormer] TensorFlow port (17910)
* mfumanelli
    * Italian/accelerate (17698)
    * Italian/model sharing (17828)
* nickprock
    * Translation/debugging (18230)
    * Translation/training: italian translation training.mdx (17662)
    * Translation italian: multilingual.mdx (17768)
    * Added preprocessing.mdx italian translation (17600)
* sijunhe
    * speed up test (18106)
    * Nezha Pytorch implementation (17776)
* StevenTang1998
    * Add MVP model (17787)
* ariG23498
    * TF implementation of RegNets (17554)
* xvjiarui
    * Adding GroupViT Models (17313)
* rooa
    * Add CodeGen model (17443)

4.20.1

Not secure

This patch release fixes a bug in the OPT models and makes Transformers compatible with `huggingface_hub` version 0.8.1.

* Add final_layer_norm to OPT model 17785
* Prepare transformers for v0.8.0 huggingface-hub release 17716
