Transformers


4.21.2

Fix a regression in the TableQA pipeline: [18428](https://github.com/huggingface/transformers/pull/18428)

4.21.1

Fix a regression in Trainer checkpoint loading: 18470

4.21.0

TensorFlow XLA Text Generation

The TensorFlow text generation method can now be wrapped with `tf.function` and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and [our benchmarks](https://huggingface.co/spaces/joaogante/tf_xla_generate_benchmarks). You can also see XLA generation in action in our [example notebooks](https://huggingface.co/docs/transformers/notebooks), particularly for [summarization](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb) and [translation](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
for input_prompt in input_prompts:
    tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
    generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
    print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
```


* Generate: deprecate default `max_length` by gante in 18018
* TF: GPT-J compatible with XLA generation by gante in 17986
* TF: T5 can now handle a padded past (i.e. XLA generation) by gante in 17969
* TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible by gante in 17857
* TF: generate without `tf.TensorArray` by gante in 17801
* TF: BART compatible with XLA generation by gante in 17479

New model additions

OwlViT

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.
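
Below is a minimal zero-shot detection sketch based on the model documentation; the checkpoint name, example image URL, and score threshold are illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]
inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale box predictions to the original image size (height, width)
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process(outputs=outputs, target_sizes=target_sizes)
for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    if score >= 0.1:  # illustrative confidence threshold
        print(f"{texts[0][label]}: {score:.2f} at {box.tolist()}")
```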

* Add OWL-ViT model for zero-shot object detection by alaradirik in 17938
* Fix OwlViT tests by sgugger in 18253

NLLB

The NLLB model was presented in [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.
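
As a sketch of the translation workflow (assuming the `facebook/nllb-200-distilled-600M` checkpoint and NLLB's FLORES-200 language codes):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tokenizer("No language is left behind.", return_tensors="pt")
# Force the decoder to start with the target language code to select the output language
translated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"), max_new_tokens=30
)
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```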

* [M2M100] update conversion script by patil-suraj in 17916
* NLLB tokenizer by LysandreJik in 18126

MobileViT

The MobileViT model was proposed in [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.
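
A minimal image-classification sketch; the `apple/mobilevit-small` checkpoint and test image are illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import MobileViTFeatureExtractor, MobileViTForImageClassification

feature_extractor = MobileViTFeatureExtractor.from_pretrained("apple/mobilevit-small")
model = MobileViTForImageClassification.from_pretrained("apple/mobilevit-small")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# Print the ImageNet class with the highest score
print(model.config.id2label[logits.argmax(-1).item()])
```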

* add MobileViT model by hollance in 17354

Nezha

The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy, Mixed Precision Training and the LAMB Optimizer in training the models.

* Nezha Pytorch implementation by sijunhe in 17776

GroupViT

The GroupViT model was proposed in [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. Inspired by [CLIP](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip), GroupViT is a vision-language model that can perform zero-shot semantic segmentation over any given set of vocabulary categories.
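
As a CLIP-style sketch of scoring an image against free-form text labels (checkpoint and image are illustrative; segmentation logits can additionally be requested through the model's `output_segmentation` argument):

```python
import requests
import torch
from PIL import Image
from transformers import CLIPProcessor, GroupViTModel

processor = CLIPProcessor.from_pretrained("nvidia/groupvit-gcc-yfcc")
model = GroupViTModel.from_pretrained("nvidia/groupvit-gcc-yfcc")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True
)
with torch.no_grad():
    outputs = model(**inputs)
# Image-text similarity scores, softmaxed into label probabilities
print(outputs.logits_per_image.softmax(dim=1))
```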

* Adding GroupViT Models by xvjiarui in 17313

MVP

The MVP model was proposed in [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, the model is further pre-trained using specific soft prompts to stimulate the model capacity in performing a specific task.
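
A minimal generation sketch using the base `RUCAIBox/mvp` checkpoint; the prompt follows the paper's task-prefix style and is illustrative:

```python
from transformers import MvpForConditionalGeneration, MvpTokenizer

tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

inputs = tokenizer(
    "Summarize: You may want to stick it to your boss and leave your job, but don't do it if these are your reasons.",
    return_tensors="pt",
)
summary_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```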

* Add MVP model by StevenTang1998 in 17787

CodeGen

The CodeGen model was proposed in [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on [The Pile](https://pile.eleuther.ai/), BigQuery, and BigPython.
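
A short program-synthesis sketch; `Salesforce/codegen-350M-mono` is one of the released checkpoint sizes and is used here for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Complete a Python function from its signature
inputs = tokenizer("def hello_world():", return_tensors="pt")
completion = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(completion[0]))
```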

* Add CodeGen model by rooa in 17443
* [CodeGen] support device_map="auto" for sharded checkpoints by patil-suraj in 17871

UL2

The UL2 model was presented in [Unifying Language Learning Paradigms](https://arxiv.org/pdf/2205.05131v1.pdf) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.
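
Since UL2 ships as a T5-family checkpoint, mode switching amounts to prefixing the input with a mode token. A sketch (note that `google/ul2` is a roughly 20B-parameter model, so this assumes a large-memory machine; the prompt is illustrative):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/ul2")
model = T5ForConditionalGeneration.from_pretrained("google/ul2", low_cpu_mem_usage=True)

# "[S2S]" selects the sequence-to-sequence denoising mode; "[NLU]" and "[NLG]" are the other modes
prompt = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. <extra_id_0>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```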

* Add UL2 (just docs) by patrickvonplaten in 17740

Custom pipelines

This adds the ability to share custom pipelines on the Hub with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., users have to pass `trust_remote_code=True` to use them. Apart from this, the best way to get familiar with the feature is to look at the [added documentation](https://huggingface.co/docs/transformers/v4.21.0/en/add_new_pipeline#share-your-pipeline-on-the-hub).
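
Consuming a shared pipeline then looks like this (the repo name below is hypothetical):

```python
from transformers import pipeline

# trust_remote_code=True is required because a custom pipeline executes
# Python code downloaded from the Hub repository
pipe = pipeline(model="username/my-custom-pipeline", trust_remote_code=True)
```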

* Custom pipeline by sgugger in 18079

PyTorch to TensorFlow CLI utility

This adds a CLI to convert PT weights into TF weights, validate them, and (optionally) open a PR.

* CLI: tool to convert PT into TF weights and open hub PR by gante in 17497

TensorFlow-specific improvements

The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet.

* [SegFormer] TensorFlow port by sayakpaul in 17910
* Add TF DeiT implementation by amyeroberts in 17806
* Add TF ResNet model by amyeroberts in 17427
* TF implementation of RegNets by ariG23498 in 17554

Additionally, our TF models now support loading sharded checkpoints:

* TF Sharded by ArthurZucker in 17713

Flax-specific improvements

The following models have been ported to be used in JAX:

* Flax t5 Encoder by crystina-z in 17784

Additionally, our JAX models now support loading sharded checkpoints:

* Flax sharded by ArthurZucker in 17760

Additional model heads

The following models now have a brand new head for new tasks:

* Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER) by gilad19 in 17924
* Adding OPTForSeqClassification class by oneraghavan in 18123

ONNX support

A continued community effort provides ONNX converters for an increasing number of models.

* add ONNX support for LeVit by gcheron in 18154
* add ONNX support for BLOOM by NouamaneTazi in 17961
* Add ONNX support for LayoutLMv3 by regisss in 17953
* Mrbean/codegen onnx by sam-h-bean in 17903
* Add ONNX support for DETR by regisss in 17904
* add onnx support for deberta and debertav2 by sam-h-bean in 17617

Documentation translation

The community effort to translate the documentation into several languages continued.

Portuguese

* Added translation of index.mdx to Portuguese Issue 16824 by rzimmerdev in 17565

Spanish

* Add Spanish translation of custom_models.mdx by donelianc in 17807

Italian

* Add Italian translation of sharing_custom_models.mdx by Xpiri in 17631
* Add Italian translation of converting_tensorflow_models.mdx by Xpiri in 18283
* Add Italian translation of create_model.mdx and serialization.mdx by F02934 in 17640
* Italian/accelerate by mfumanelli in 17698
* Italian/model sharing by mfumanelli in 17828
* Italian translation of run_scripts.mdx gh-17459 by lorenzobalzani in 17642
* Translation/debugging by nickprock in 18230
* Translation/training: italian translation training.mdx by nickprock in 17662
* Translation italian: multilingual.mdx by nickprock in 17768
* Added preprocessing.mdx italian translation by nickprock in 17600

Improvements and bugfixes

* [EncoderDecoder] Improve docs by NielsRogge in 18271
* [DETR] Improve code examples by NielsRogge in 18262
* patch for smddp import by carolynwang in 18244
* Fix Sylvain's nits on the original KerasMetricCallback PR by Rocketknight1 in 18300
* Add PYTEST_TIMEOUT for CircleCI test jobs by ydshieh in 18251
* Add PyTorch 1.11 to past CI by ydshieh in 18302
* Raise a TF-specific error when importing Torch classes by Rocketknight1 in 18280
* [ create_a_model.mdx ] translate to pt by Fellip15 in 18098
* Update translation.mdx by gorkemozkaya in 18169
* Add TFAutoModelForImageClassification to pipelines.py by ydshieh in 18292
* Adding type hints of TF:OpenAIGPT by Mathews-Tom in 18263
* Adding type hints of TF:CTRL by Mathews-Tom in 18264
* Replace false parameter by a buffer by sgugger in 18259
* Fix ORTTrainer failure on gpt2 fp16 training by JingyaHuang in 18017
* Owlvit docs test by alaradirik in 18257
* Good difficult issue override for the stalebot by LysandreJik in 18094
* Fix dtype of input_features in docstring by ydshieh in 18258
* Fix command of doc tests for local testing by oneraghavan in 18236
* Fix TF bad words filter with XLA by Rocketknight1 in 18286
* Allows `KerasMetricCallback` to use XLA generation by Rocketknight1 in 18265
* Skip passes report for `--make-reports` by ydshieh in 18250
* Update serving code to enable `saved_model=True` by amyeroberts in 18153
* Change how `take_along_axis` is computed in DeBERTa to stop confusing XLA by Rocketknight1 in 18256
* Fix torch version check in Vilt by ydshieh in 18260
* change bloom parameters to 176B by muhammad-ahmed-ghani in 18235
* TF: use the correct config with `(...)EncoderDecoder` models by gante in 18097
* Fix `no_trainer` CI by muellerzr in 18242
* Update notification service by ydshieh in 17921
* Make errors for loss-less models more user-friendly by sgugger in 18233
* Fix TrainingArguments help section by sgugger in 18232
* Better messaging and fix for incorrect shape when collating data. by CakeCrusher in 18119
* Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API by viclzhu in 18221
* Update add_new_pipeline.mdx by zh-zheng in 18224
* Add custom config to quicktour by stevhliu in 18115
* skip some test_multi_gpu_data_parallel_forward by ydshieh in 18188
* Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES by ydshieh in 18213
* Fix `LayoutXLM` docstrings by qqaatw in 17038
* update cache to v0.5 by ydshieh in 18203
* Reduce console spam when using the KerasMetricCallback by Rocketknight1 in 18202
* TF: Add missing cast to GPT-J by gante in 18201
* Use next-gen CircleCI convenience images by ydshieh in 18197
* Typo in readme by flozi00 in 18195
* [From pretrained] Allow download from subfolder inside model repo by patrickvonplaten in 18184
* Update docs README with instructions on locally previewing docs by snehankekre in 18196
* bugfix: div-->dim by orgoro in 18135
* Add vision example to README by sgugger in 18194
* Remove use_auth_token from the from_config method by duongna21 in 18192
* FSDP integration enhancements and fixes by pacman100 in 18134
* BLOOM minor fixes small test by younesbelkada in 18175
* fix typo inside bloom documentation by SaulLu in 18187
* Better default for offload_state_dict in from_pretrained by sgugger in 18183
* Fix template for new models in README by sgugger in 18182
* FIX: Typo by ayansengupta17 in 18156
* Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests by ydshieh in 18073
* Fix expected loss values in some (m)T5 tests by ydshieh in 18177
* [HPO] update to sigopt new experiment api by sywangyi in 18147
* Fix incorrect type hint for lang by JohnGiorgi in 18161
* Fix check for falsey inputs in run_summarization by JohnGiorgi in 18155
* Adding support for `device_map` directly in `pipeline(..)` function. by Narsil in 17902
* Fixing a hard to trigger bug for `text-generation` pipeline. by Narsil in 18131
* Enable torchdynamo with torch_tensorrt(fx path) by frank-wei in 17765
* Make sharded checkpoints work in offline mode by sgugger in 18125
* add dataset split and config to model-index in TrainingSummary.from_trainer by loicmagne in 18064
* Add summarization name mapping for MultiNews by JohnGiorgi in 18117
* supported python versions reference by CakeCrusher in 18116
* TF: unpack_inputs decorator independent from main_input_name by gante in 18110
* TF: remove graph mode distinction when processing boolean options by gante in 18102
* Fix BLOOM dtype by Muennighoff in 17995
* CLI: reenable `pt_to_tf` test by gante in 18108
* Report value for a step instead of epoch. by zhawe01 in 18095
* speed up test by sijunhe in 18106
* Enhance IPEX integration in Trainer by jianan-gu in 18072
* Bloom Optimize operations by younesbelkada in 17866
* Add filename to info displayed when downloading things in from_pretrained by sgugger in 18099
* Fix image segmentation and object detection pipeline tests by sgugger in 18100
* Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts by duongna21 in 18069
* Fix torchscript tests for GPT-NeoX by ydshieh in 18012
* Fix some typos. by Yulv-git in 17560
* [bloom] fix alibi device placement by stas00 in 18087
* Make predict() close progress bars after finishing by neverix in 17952
* Update localized READMES when template is filled. by sgugger in 18062
* Fix type issue in using bucketing with Trainer by seopbo in 18051
* Fix slow CI by pinning resampy by sgugger in 18077
* Drop columns after loading samples in prepare_tf_dataset by Rocketknight1 in 17967
* [Generate Tests] Make sure no tokens are force-generated by patrickvonplaten in 18053
* Added Command for windows VENV activation in installation docs by darthvader2 in 18008
* Sort doc toc by sgugger in 18034
* Place inputs on device when include_inputs_for_metrics is True by sgugger in 18046
* Doc to dataset by sgugger in 18037
* Protect `TFGenerationMixin.seed_generator` so it's not created at import by Rocketknight1 in 18044
* Fix T5 incorrect weight decay in Trainer and official summarization example by ADAning in 18002
* Squash commits by NielsRogge in 17981
* Enable Past CI by ydshieh in 17919
* Fix T5/mT5 tests by Rocketknight1 in 18029
* [Flax] Bump to v0.4.1 by sanchit-gandhi in 17966
* Update expected values in DecisionTransformerModelIntegrationTest by ydshieh in 18016
* fixed calculation of ctc loss in TFWav2Vec2ForCTC by Sreyan88 in 18014
* Return scalar losses instead of per-sample means by Rocketknight1 in 18013
* sort list of models by hollance in 18011
* Replace BloomTokenizer by BloomTokenizerFast in doc by regisss in 18005
* Fix typo in error message in generation_utils by regisss in 18000
* Refactor to inherit from nn.Module instead of nn.ModuleList by amyeroberts in 17501
* Add link to existing documentation by LysandreJik in 17931
* only a stupid typo, but it can lead to confusion by Dobatymo in 17930
* Exclude Databricks from notebook env only if the runtime is below 11.0 by davidheryanto in 17988
* Shifting labels for causal LM when using label smoother by seungeunrho in 17987
* Restore original task in test_warning_logs by ydshieh in 17985
* Ensure PT model is in evaluation mode and lightweight forward pass done by amyeroberts in 17970
* XLA train step fixes by Rocketknight1 in 17973
* [Flax] Add remat (gradient checkpointing) by sanchit-gandhi in 17843
* higher atol to avoid flaky trainer test failure by ydshieh in 17979
* Fix FlaxBigBirdEmbeddings by ydshieh in 17842
* fixing fsdp autowrap functionality by pacman100 in 17922
* fix `bias` keyword argument in TFDebertaEmbeddings by WissamAntoun in 17940
* Update expected values in CodeGen tests by ydshieh in 17888
* Fix typo in perf_train_gpu_one.mdx by aliencaocao in 17983
* skip some gpt_neox tests that require 80G RAM by ydshieh in 17923
* feat: add pipeline registry abstraction by aarnphm in 17905
* skip some ipex tests until it works with torch 1.12 by ydshieh in 17964
* Fix number of examples for iterable dataset in distributed training by sgugger in 17951
* [Pipelines] Add revision tag to all default pipelines by patrickvonplaten in 17667
* Unifying training argument type annotations by jannisborn in 17934
* Fix GPT-NeoX-20B past handling, attention computation by zphang in 17811
* Fix 17893, removed dead code by clefourrier in 17917
* Fix prepare_tf_dataset when drop_remainder is not supplied by Rocketknight1 in 17950
* ExplicitEnum subclass str (JSON dump compatible) by BramVanroy in 17933
* PyTorch 1.12.0 for scheduled CI by ydshieh in 17949
* OPT - Fix Softmax NaN in half precision mode by younesbelkada in 17437
* Use explicit torch version in deepspeed CI by ydshieh in 17942
* fix regexes with escape sequence by stas00 in 17943
* Fix all is_torch_tpu_available issues by muellerzr in 17936
* Fix img seg tests (load checkpoints from `hf-internal-testing`) by mishig25 in 17939
* Remove imports and use forward references in ONNX feature by sgugger in 17926
* Fix job links in Slack report by ydshieh in 17892
* Add missing comment quotes by leondz in 17379
* Remove render tags by NielsRogge in 17897
* Fix the Conda package build by bryant1410 in 16737
* Remove DT_DOUBLE from the T5 graph by szutenberg in 17891
* Compute min_resolution in prepare_image_inputs by ydshieh in 17915
* Fixing a regression with `return_all_scores` introduced in 17606 by Narsil in 17906
* In `group_texts` function, drop last block if smaller than `block_size` by billray0259 in 17908
* Move logic into pixelshuffle layer by amyeroberts in 17899
* Fix loss computation in TFBertForPreTraining by Rocketknight1 in 17898
* Pin black to 22.3.0 to benefit from a stable --preview flag by LysandreJik in 17918
* Fix PyTorch/TF Auto tests by ydshieh in 17895
* Fix `test_number_of_steps_in_training_with_ipex` by ydshieh in 17889
* Update expected values in constrained beam search tests by ydshieh in 17887
* Fix bug in gpt2's (from-scratch) special scaled weight initialization by karpathy in 17877
* Update README_zh-hans.md by mmdjiji in 17861
* bert: add conversion script for BERT Token Dropping TF2 checkpoints by stefan-it in 17142
* Fix add new model like frameworks by sgugger in 17869
* Add type annotations for RoFormer models by donelianc in 17878
* fix by ydshieh in 17890
* fix mask by younesbelkada in 17837
* Add a TF in-graph tokenizer for BERT by Rocketknight1 in 17701
* Fix TF GPT2 test_onnx_runtime_optimize by ydshieh in 17874
* CLI: handle multimodal inputs by gante in 17839
* Properly get tests deps in test_fetcher by sgugger in 17870
* Fix `test_inference_instance_segmentation_head` by ydshieh in 17872
* Skip `test_multi_gpu_data_parallel_forward` for `MaskFormer` by ydshieh in 17864
* Use higher value for hidden_size in Flax BigBird test by ydshieh in 17822
* Fix: torch.utils.checkpoint import error. by kumapo in 17849
* Add type hints for gptneox models by willtai in 17858
* Fix Splinter test by ydshieh in 17854
* [tests/VisionEncoderDecoder] import to_2tuple from test utils by patil-suraj in 17865
* Fix Constrained beam search duplication and weird output issue by boy2000-007man in 17814
* Improve encoder decoder model docs by Threepointone4 in 17815
* Improve vision models by NielsRogge in 17731
* Auto-build Docker images before on-merge if setup.py was changed by muellerzr in 17573
* Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts by muellerzr in 17856
* Index RNG states by global rank in saves by sgugger in 17852
* Change no trainer image_classification test by muellerzr in 17635
* Update modeling_cvt.py by F02934 in 17846
* Fix broken test for models with batchnorm by Rocketknight1 in 17841
* BLOOM minor changes on tokenizer by younesbelkada in 17823
* Improve performance docs by lvwerra in 17750
* Fix an error message in BigBird by ydshieh in 17840
* Fix properties of unset special tokens in non verbose mode by guillaumekln in 17797
* change message by SaulLu in 17836
* Add missing type hints for QDQBertModel by willtai in 17783
* Update type hints modeling_yoso.py by F02934 in 17827
* add doctests for DETR by qherreros in 17786
* Fix push CI artifact path by ydshieh in 17788
* Offload fixes by sgugger in 17810
* CLI: use hub's `create_commit` by gante in 17755
* initial commit by ArthurZucker in 17818
* Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict` by eranhirs in 17805
* Fix `top_k_top_p_filtering` having unexpected behavior by unifyh in 17744
* Remove duplicate code by lkm2835 in 17708
* CLI: convert sharded PT models by gante in 17959
* Improve error message Union not allowed by BramVanroy in 17769
* Add final_layer_norm to OPT model by thomasw21 in 17785
* Properly check for a TPU device by muellerzr in 17802
* Fix test for BF16 detection by sgugger in 17803
* Use 5e-5 For BigBird PT/Flax equivalence tests by ydshieh in 17780
* Prepare transformers for v0.8.0 huggingface-hub release by LysandreJik in 17716
* Fix forward reference imports in DeBERTa configs by sgugger in 17800
* Fix Automatic Download of Pretrained Weights in DETR by AnugunjNaman in 17712
* [ViTMAE] Fix docstrings and variable names by NielsRogge in 17710
* Add link to notebook by NielsRogge in 17791
* [CodeParrot] Near-deduplication with jaccard similarity by liyongsea in 17054
* Update modeling_longt5.py by bjascob in 17777
* Not use -1e4 as attn mask by ydshieh in 17306
* Fix cache for GPT-Neo-X by sgugger in 17764
* deprecate is_torch_bf16_available by stas00 in 17738
* Attempt to change Push CI to workflow_run by ydshieh in 17753
* Save huggingface checkpoint as artifact in mlflow callback by swethmandava in 17686
* Migrate HFDeepSpeedConfig from trfrs to accelerate by pacman100 in 17623
* feat: add num_workers arg to DataLoader by greg2451 in 17751
* Enable PyTorch nightly build CI by ydshieh in 17335

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* donelianc
* Add Spanish translation of custom_models.mdx (17807)
* Add type annotations for RoFormer models (17878)
* Xpiri
* Add Italian translation of sharing_custom_models.mdx (17631)
* Add Italian translation of converting_tensorflow_models.mdx (18283)
* F02934
* Add Italian translation of create_model.mdx and serialization.mdx (17640)
* Update modeling_cvt.py (17846)
* Update type hints modeling_yoso.py (17827)
* sayakpaul
* [SegFormer] TensorFlow port (17910)
* mfumanelli
* Italian/accelerate (17698)
* Italian/model sharing (17828)
* nickprock
* Translation/debugging (18230)
* Translation/training: italian translation training.mdx (17662)
* Translation italian: multilingual.mdx (17768)
* Added preprocessing.mdx italian translation (17600)
* sijunhe
* speed up test (18106)
* Nezha Pytorch implementation (17776)
* StevenTang1998
* Add MVP model (17787)
* ariG23498
* TF implementation of RegNets (17554)
* xvjiarui
* Adding GroupViT Models (17313)
* rooa
* Add CodeGen model (17443)

4.20.1

This patch release fixes a bug in the OPT models and makes Transformers compatible with `huggingface_hub` version 0.8.1.

* Add final_layer_norm to OPT model 17785
* Prepare transformers for v0.8.0 huggingface-hub release 17716

4.20.0

Big model inference

You can now use the big model inference of Accelerate directly in any call to `from_pretrained` by specifying `device_map="auto"` (or your own `device_map`). It will automatically load the model, taking advantage of your GPU(s) and offloading what doesn't fit to CPU RAM, or even to the hard drive if you don't have enough RAM. Your model can then be used normally for inference without anything else to do.

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp", revision="sharded", device_map="auto"
)
```


* Use Accelerate in `from_pretrained` for big model inference by sgugger in 17341

BLOOM

The BLOOM model has been proposed with its various versions through the [BigScience Workshop](https://bigscience.huggingface.co/). The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.
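
A minimal generation sketch; `bigscience/bloom-560m` is one of the smaller published variants, used here for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-560m"  # the full 176B model lives at bigscience/bloom
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("BLOOM is a multilingual language model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```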

* BLOOM by younesbelkada in 17474

CvT

The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

* Add CvT by NielsRogge and AnugunjNaman in 17299

GPT-NeoX

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.
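
A generation sketch; note the 20B checkpoint weighs roughly 40GB in fp16, so this assumes a large-memory machine:

```python
import torch
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", torch_dtype=torch.float16)

inputs = tokenizer("GPT-NeoX-20B is a 20 billion parameter autoregressive language model", return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(tokens[0]))
```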

* Adding GPT-NeoX-20B by zphang in 16659

LayoutLMv3

LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).
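
A token-classification sketch on a scanned document; the image path and label count are hypothetical, and the processor's default `apply_ocr=True` requires Tesseract to be installed:

```python
from PIL import Image
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")
model = LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base", num_labels=7)

image = Image.open("document.png").convert("RGB")  # hypothetical scanned page
# The processor OCRs the image into words + bounding boxes and tokenizes them
encoding = processor(image, return_tensors="pt")
outputs = model(**encoding)
predicted_labels = outputs.logits.argmax(-1)
```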

* Add LayoutLMv3 by NielsRogge in 17060

LeViT

LeViT improves the Vision Transformer (ViT) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.

* Adding LeViT Model by Facebook by AnugunjNaman in 17466

LongT5

The LongT5 model is an extension of the T5 model that enables using one of two efficient attention mechanisms: (1) local attention, or (2) transient-global attention. It is capable of handling input sequences up to 16,384 tokens long.
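
A sketch of feeding a long input through the transient-global variant; `google/long-t5-tglobal-base` is a pretrained (not task-finetuned) checkpoint, so real summarization would require fine-tuning first:

```python
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

long_article = "..."  # up to 16,384 tokens of input text would go here
inputs = tokenizer(long_article, max_length=16384, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```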

* Add `LongT5` model by stancld in 16792

M-CTC-T

The M-CTC-T model is a 1B-parameter transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes Mel filterbank features from a 16 kHz audio signal as input.

* M-CTC-T Model by cwkeam in 16402

Trajectory Transformer

This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).

* Add trajectory transformer by CarlCochet in 17141

Wav2Vec2-Conformer

Wav2Vec2-Conformer was added in an updated version of fairseq S2T: Fast Speech-to-Text Modeling with fairseq. It requires more parameters than Wav2Vec2, but also yields an improved word error rate.
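
A speech-recognition sketch; the checkpoint and the small test split are illustrative:

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ConformerForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-conformer-rope-large-960h-ft")
model = Wav2Vec2ConformerForCTC.from_pretrained("facebook/wav2vec2-conformer-rope-large-960h-ft")

# A 16 kHz speech sample from a tiny LibriSpeech demo split
ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```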

* [Wav2Vec2Conformer] Official release by patrickvonplaten in 17709
* Add Wav2Vec2Conformer by patrickvonplaten in 16812

TensorFlow implementations

Data2VecVision for semantic segmentation, OPT and Swin are now available in TensorFlow.

* Add TFData2VecVision for semantic segmentation by sayakpaul in 17271
* Opt in flax and tf by ArthurZucker in 17388
* Add Tensorflow Swin model by amyeroberts in 16988

Flax implementations

OPT is now available in Flax.

* Opt in flax and tf by ArthurZucker in 17388

Documentation translation in Italian and Portuguese

A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.

* Translation/italian: added pipeline_tutorial.mdx [Issue: 17459] by nickprock in 17507
* Add installation.mdx Italian translation by mfumanelli in 17530
* Setup for Italian translation and add quicktour.mdx translation by mfumanelli in 17472
* Adding the Portuguese version of the tasks/token_classification.mdx documentation by jonatasgrosman in 17492
* Adding the Portuguese version of the tasks/sequence_classification.mdx documentation by jonatasgrosman in 17352
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial by Fellip15 in 17076
* Added translation of installation.mdx to Portuguese Issue 16824 by rzimmerdev in 16979

Improvements and bugfixes

* Sort the model doc Toc Alphabetically by sgugger in 17723
* normalize keys_to_ignore by stas00 in 17722
* CLI: Add flag to push TF weights directly into main by gante in 17720
* Update requirements.txt by jeffra in 17719
* Revert "Change push CI to run on workflow_run event by ydshieh in 17692)"
* Documentation: RemBERT fixes by stefan-it in 17641
* Change push CI to run on workflow_run event by ydshieh in 17692
* fix tolerance for a bloom slow test by younesbelkada in 17634
* [LongT5] disable model parallel test by patil-suraj in 17702
* FX function refactor by michaelbenayoun in 17625
* Add `BloomForSequenceClassification` and `BloomForTokenClassification` classes by haileyschoelkopf in 17639
* Swin main layer by amyeroberts in 17693
* Include a comment to reflect Amy's contributions by sayakpaul in 17689
* Rag end2end new by shamanez in 17650
* [LongT5] Rename checkpoitns by patrickvonplaten in 17700
* Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference by jianan-gu in 17153
* Fix doc builder Dockerfile by ydshieh in 17435
* Add FP16 Support for SageMaker Model Parallel by haohanchen-yagao in 17386
* enable cpu distribution training using mpirun by sywangyi in 17570
* Add Ray's scope to training arguments by BramVanroy in 17629
* Update modeling_gpt_neox.py by willfrey in 17575
* Fix dtype getter by sgugger in 17668
* explicitly set utf8 for Windows by BramVanroy in 17664
* Fixed documentation typo, parameter name is evaluation_strategy, not eval_strategy by sainttttt in 17669
* Add Visual Question Answering (VQA) pipeline by sijunhe in 17286
* Fix typo in adding_a_new_model README by ayushtues in 17679
* Avoid GPU OOM for a TF Rag test by ydshieh in 17638
* fix typo from emtpy to empty by domenicrosati in 17643
* [Generation Test] Make fast test actually fast by patrickvonplaten in 17661
* [Data2Vec] Speed up test by patrickvonplaten in 17660
* [BigBirdFlaxTests] Make tests slow by patrickvonplaten in 17658
* update README.md by loubnabnl in 17657
* 🐛 Properly raise `RepoNotFoundError` when not authenticated by SBrandeis in 17651
* Fixes 17128. by mygithubid1 in 17356
* Fix dtype getters by sgugger in 17656
* Add skip logic for attentions test - Levit by amyeroberts in 17633
* Enable crop_center method to handle (W, H, C) images by alaradirik in 17626
* Move Clip image utils to image_utils.py by alaradirik in 17628
* Skip tests until bug is fixed. by sgugger in 17646
* Translation/autoclass by mfumanelli in 17615
* didn't exist in pt-1.9 by stas00 in 17644
* convert assertion to raised exception in debertav2 by sam-h-bean in 17619
* Pre-build DeepSpeed by ydshieh in 17607
* [modeling_utils] torch_dtype/auto floating dtype fixes by stas00 in 17614
* Running a pipeline of `float16`. by Narsil in 17637
* fix use_amp rename after pr 17138 by stas00 in 17636
* Fix very long job failure text in Slack report by ydshieh in 17630
* Adding `top_k` argument to `text-classification` pipeline. by Narsil in 17606
* Mention in the doc we drop support for fairscale by sgugger in 17610
* Use shape_list to safely get shapes for Swin by amyeroberts in 17591
* Add ONNX support for ConvNeXT by regisss in 17627
* Add ONNX support for ResNet by regisss in 17585
* has_attentions - consistent test skipping logic and tf tests by amyeroberts in 17495
* CLI: Print all different tensors on exception by gante in 17612
* TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed by gante in 17593
* Fix telemetry URL by sgugger in 17608
* CLI: Properly detect encoder-decoder models by gante in 17605
* Fix link for community notebooks by ngoquanghuy99 in 17602
* Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch by jianan-gu in 17138
* fix `train_new_from_iterator` in the case of byte-level tokenizers by SaulLu in 17549
* Explicit versions in docker files by ydshieh in 17586
* CLI: add stricter automatic checks to `pt-to-tf` by gante in 17588
* fix by ydshieh in 17589
* quicktour.mdx en -> pt translation by vitorfrois in 17074
* Fx support for Deberta-v[1-2], Hubert and LXMERT by michaelbenayoun in 17539
* Add examples telemetry by sgugger in 17552
* Fix gendered sentence in Spanish translation by omarespejel in 17558
* Fix circular import in onnx.utils by sgugger in 17577
* Use latest stable PyTorch/DeepSpeed for Push & Scheduled CI by ydshieh in 17417
* Remove circular imports in layoutlm/__init__.py by regisss in 17576
* Add magic method to our TF models to convert datasets with column inference by Rocketknight1 in 17160
* [deepspeed / testing] reset global state by stas00 in 17553
* Remove RuntimeErrors for NaN-checking in 20B by zphang in 17563
* fix integration test levit by AnugunjNaman in 17555
* [deepspeed] fix load_best_model test by stas00 in 17550
* Update index.mdx by BritneyMuller in 17547
* Clean imports to fix test_fetcher by sgugger in 17531
* Update run_glue_no_trainer.py by bofenghuang in 17546
* Fix all offload and MP tests by sgugger in 17533
* Fix bug - layer names and activation from previous refactor by amyeroberts in 17524
* Add support for Perceiver ONNX export by deutschmn in 17213
* Allow from transformers import TypicalLogitsWarper by teticio in 17477
* Add Gated-SiLU to T5 by DanielHesslow in 17420
* Update URL for Hub PR docs by lewtun in 17532
* fix OPT-Flax CI tests by ArthurZucker in 17512
* [trainer/deepspeed] load_best_model (reimplement re-init) by stas00 in 17151
* Implemented loss for training AudioFrameClassification by MorenoLaQuatra in 17513
* Update configuration_auto.py by kamalkraj in 17527
* Check list of models in the main README and sort it by sgugger in 17517
* Fix when Accelerate is not installed by sgugger in 17518
* Clean README in post release job as well. by sgugger in 17519
* Fix CI tests hang forever by ydshieh in 17471
* Print more library versions in CI by ydshieh in 17384
* Split push CI into 2 workflows by ydshieh in 17369
* Fix Tapas tests by ydshieh in 17510
* CLI: tool to convert PT into TF weights and open hub PR by gante in 17497
* Fix flakey no-trainer test by muellerzr in 17515
* Deal with the error when task is regression by fireindark707 in 16330
* Fix CTRL tests by ydshieh in 17508
* Fix LayoutXLMProcessorTest by ydshieh in 17506
* Debug LukeForMaskedLM by Ryou0634 in 17499
* Fix MP and CPU offload tests for Funnel and GPT-Neo by sgugger in 17503
* Exclude Databricks from notebook env by sgugger in 17496
* Fix `tokenizer` type annotation in `pipeline(...)` by willfrey in 17500
* Refactor classes to inherit from nn.Module instead of nn.Sequential by amyeroberts in 17493
* Fix wav2vec2 export onnx model with attention_mask error by nilboy in 16004
* Add warning when using older version of torch for ViltFeatureExtractor by xhluca in 16756
* Fix typo of variable names for key and query projection layer by Kyeongpil in 17155
* Fixed wrong error message for missing weight file by 123jimin in 17216
* Add OnnxConfig for SqueezeBert iss17314 by Ruihua-Fang in 17315
* [GPT2Tokenizer] Fix GPT2 with bos token by patrickvonplaten in 17498
* [Json configs] Make json prettier for all saved tokenizer files & ensure same json format for all processors (tok + feat_extract) by patrickvonplaten in 17457
* Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()` by Witiko in 17119
* Add HF.co for PRs / Issues regarding specific model checkpoints by patrickvonplaten in 17485
* Fix checkpoint name by ydshieh in 17484
* Docker image build in parallel by ydshieh in 17434
* Added XLM onnx config by nandwalritik in 17030
* Disk offload fix by sgugger in 17428
* TF: GPT-2 generation supports left-padding by gante in 17426
* Fix ViTMAEModelTester by ydshieh in 17470
* [Generate] Fix output scores greedy search by patrickvonplaten in 17442
* Fix nits by omarespejel in 17349
* Fx support for multiple model architectures by michaelbenayoun in 17393
* typo IBERT in __repr__ quant_mode by scratchmex in 17398
* Fix typo (remove parenthesis) by mikcnt in 17415
* Improve notrainer examples by pacman100 in 17449
* [OPT] Fix bos token id default by patrickvonplaten in 17441
* Fix model parallelism test by sgugger in 17439
* Pin protobouf that breaks TensorBoard in PyTorch by sgugger in 17440
* Spanish translation of the file preprocessing.mdx by yharyarias in 16299
* Spanish translation of the files sagemaker.mdx and image_classification.mdx by SimplyJuanjo in 17262
* Added es version of bertology.mdx doc by jQuinRivero in 17255
* Wav2vec2 finetuning shared file system by patrickvonplaten in 17423
* fix link in performance docs by lvwerra in 17419
* Add link to Hub PR docs in model cards by lewtun in 17421
* Upd AutoTokenizer.from_pretrained doc examples by c00k1ez in 17416
* Support compilation via Torchdynamo, AOT Autograd, NVFuser by anijain2305 in 17308
* Add test for new model parallelism features by sgugger in 17401
* Make check_init script more robust and clean inits by sgugger in 17408
* Fix README localizer script by sgugger in 17407
* Fix expected value for OPT test `test_inference_no_head` by ydshieh in 17395
* Clean up CLIP tests by NielsRogge in 17380
* Enabling `imageGPT` auto feature extractor. by Narsil in 16871
* Add support for `device_map="auto"` to OPT by sgugger in 17382
* OPTForCausalLM lm_head input size should be config.word_embed_proj_dim by vfbd in 17225
* Traced models serialization and torchscripting fix by michaelbenayoun in 17206
* Fix Comet ML integration by mxschmdt in 17381
* Fix cvt docstrings by AnugunjNaman in 17367
* Correct & Improve Doctests for LayoutLMv2 by gnolai in 17168
* Fix CodeParrot training script by loubnabnl in 17291
* Fix a typo relative_postion_if_large -> relative_position_if_large by stancld in 17366
* Pin dill to fix examples by sgugger in 17368
* [Test OPT] Add batch generation test opt by patrickvonplaten in 17359
* Fix bug in Wav2Vec2 pretrain example by ddobokki in 17326
* fix for 17292 by nadahlberg in 17293
* [Generation] Fix Transition probs by patrickvonplaten in 17311
* [OPT] Run test in lower precision on GPU by patrickvonplaten in 17353
* Adding `batch_size` test to QA pipeline. by Narsil in 17330
* [BC] Fixing usage of text pairs by Narsil in 17324
* [tests] fix copy-n-paste error by stas00 in 17312
* Fix ci_url might be None by ydshieh in 17332
* fix by ydshieh in 17337
* Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts by muellerzr in 17331
* docs for typical decoding by jadermcs in 17186
* Not send successful report by ydshieh in 17329
* Fix test_t5_decoder_model_past_large_inputs by ydshieh in 17320
* Add onnx export cuda support by JingyaHuang in 17183
* Add Information Gain Filtration algorithm by mraunak in 16953
* Fix typo by kamalkraj in 17328
* remove by ydshieh in 17325
* Accepting real pytorch device as arguments. by Narsil in 17318
* Updating the docs for `max_seq_len` in QA pipeline by Narsil in 17316
* [T5] Fix init in TF and Flax for pretraining by patrickvonplaten in 17294
* Add type hints for ProphetNet (Pytorch) by jQuinRivero in 17223
* fix by patrickvonplaten in 17310
* [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing by caesar-one in 17112
* Add support for pretraining recurring span selection to Splinter by jvcop in 17247
* Add PR author in CI report + merged by info by ydshieh in 17298
* Fix dummy creation script by sgugger in 17304
* Doctest longformer by KMFODA in 16441
* [Test] Fix W2V-Conformer integration test by patrickvonplaten in 17303
* Improve mismatched sizes management when loading a pretrained model by regisss in 17257
* correct opt by patrickvonplaten in 17301
* Rewrite TensorFlow train_step and test_step by Rocketknight1 in 17057
* Fix tests of mixed precision now that experimental is deprecated by Rocketknight1 in 17300
* fix retribert's `test_torch_encode_plus_sent_to_model` by SaulLu in 17231
* [ConvNeXT] Fix drop_path_rate by NielsRogge in 17280
* Fix wrong PT/TF categories in CI report by ydshieh in 17272
* Fix missing job action button in CI report by ydshieh in 17270
* Fix test_model_parallelization by lkm2835 in 17249
* [Tests] Fix slow opt tests by patrickvonplaten in 17282
* docs(transformers): fix typo by k-zehnder in 17263
* logging documentation update by sanderland in 17174
* Use the PR URL in CI report by ydshieh in 17269
* Fix FlavaForPreTrainingIntegrationTest CI test by ydshieh in 17232
* Better error in the Auto API when a dep is missing by sgugger in 17289
* Make TrainerHyperParameterSigOptIntegrationTest slow test by ydshieh in 17288
* Automatically sort auto mappings by sgugger in 17250
* Mlflowcallback fix nonetype error by orieg in 17171
* Align logits and labels in OPT by MichelBartels in 17237
* Remove next sentence prediction from supported ONNX tasks by lewtun in 17276
* CodeParrot data pretokenization by loubnabnl in 16932
* Update codeparrot data preprocessing by loubnabnl in 16944
* Updated checkpoint support for Sagemaker Model Parallel by cavdard in 17219
* fixed bug in run_mlm_flax_stream.py by KennethEnevoldsen in 17203
* [doc] performance/scalability revamp by stas00 in 15723
* TF - Fix convnext classification example by gante in 17261
* Fix obvious typos in flax decoder impl by cloudhan in 17279
* Guide to create custom models in Spanish by ignacioct in 17158
* Translated version of model_sharing.mdx doc to spanish by Gerard-170 in 16184
* Add PR title to push CI report by ydshieh in 17246
* Fix push CI channel by ydshieh in 17242
* install dev. version of accelerate by ydshieh in 17243
* Fix Trainer for Datasets that don't have dict items by sgugger in 17239
* Handle copyright in add-new-model-like by sgugger in 17218
* fix --gpus option for docker by ydshieh in 17235
* Update self-push workflow by ydshieh in 17177
* OPT - fix docstring and improve tests slighly by patrickvonplaten in 17228
* OPT-fix by younesbelkada in 17229
* Fix typo in bug report template by fxmarty in 17178
* Black preview by sgugger in 17217
* update BART docs by patil-suraj in 17212
* Add test to ensure models can take int64 inputs by Rocketknight1 in 17210

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* sayakpaul
* Include a comment to reflect Amy's contributions (17689)
* Add TFData2VecVision for semantic segmentation (17271)
* jianan-gu
* Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (17153)
* Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (17138)
* stancld
* Add `LongT5` model (16792)
* Fix a typo relative_postion_if_large -> relative_position_if_large (17366)
* mfumanelli
* Translation/autoclass (17615)
* Add installation.mdx Italian translation (17530)
* Setup for Italian translation and add quicktour.mdx translation (17472)
* cwkeam
* M-CTC-T Model (16402)
* zphang
* Remove RuntimeErrors for NaN-checking in 20B (17563)
* Adding GPT-NeoX-20B (16659)
* AnugunjNaman
* fix integration test levit (17555)
* Adding LeViT Model by Facebook (17466)
* Fix cvt docstrings (17367)
* yharyarias
* Spanish translation of the file preprocessing.mdx (16299)
* mraunak
* Add Information Gain Filtration algorithm (16953)
* rzimmerdev
* Added translation of installation.mdx to Portuguese Issue 16824 (16979)

4.19.4

Fixes the error message when trying to access a repo that does not exist (this started to break due to changes in the Hub API).

* 🐛 Properly raise `RepoNotFoundError` when not authenticated 17651

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.