Transformers


4.12.0

TrOCR and VisionEncoderDecoderModel

One new model is released as part of the TrOCR implementation: `TrOCRForCausalLM`, in PyTorch. It comes along with a new `VisionEncoderDecoderModel` class, which lets you mix and match any vision Transformer encoder with any text Transformer decoder, similar to the existing `SpeechEncoderDecoderModel` class.

The TrOCR model was proposed in [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282), by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.

The TrOCR model consists of an image Transformer encoder and an autoregressive text Transformer decoder that perform optical character recognition in an end-to-end manner.

* Add TrOCR + VisionEncoderDecoderModel by NielsRogge in https://github.com/huggingface/transformers/pull/13874

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=trocr
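Below is a minimal inference sketch using the new classes; the image path is a placeholder for any RGB image of a single text line.

```py
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# "line.png" is a placeholder path to an image of one line of handwritten text.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```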

SEW & SEW-D

SEW and SEW-D (Squeezed and Efficient Wav2Vec) were proposed in [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

SEW and SEW-D models use a Wav2Vec-style feature encoder and introduce temporal downsampling to reduce the sequence length fed to the transformer encoder. SEW-D additionally replaces the transformer encoder with a DeBERTa one. Both models achieve significant inference speedups without sacrificing speech recognition quality.

* Add the SEW and SEW-D speech models by anton-l in https://github.com/huggingface/transformers/pull/13962
* Add SEW CTC models by anton-l in https://github.com/huggingface/transformers/pull/14158

Compatible checkpoints are available on the Hub: https://huggingface.co/models?other=sew and https://huggingface.co/models?other=sew-d

DistilHuBERT

DistilHuBERT was proposed in [DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT](https://arxiv.org/abs/2110.01900), by Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee.

DistilHuBERT is a distilled version of the HuBERT model. Using only two transformer layers, the model scores competitively on the SUPERB benchmark tasks.

A compatible checkpoint is available on the Hub: https://huggingface.co/ntu-spml/distilhubert

TensorFlow improvements

Several bug fixes and UX improvements for TensorFlow:

Keras callback

Introduction of a Keras callback to push to the hub each epoch, or after a given number of steps:

* Keras callback to push to hub each epoch, or after N steps by Rocketknight1 in https://github.com/huggingface/transformers/pull/13773
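A minimal sketch of the callback; the output directory and repository name are placeholders, and `model`/`train_dataset` are assumed to be an already-built Keras model and `tf.data.Dataset`.

```py
from transformers.keras_callbacks import PushToHubCallback

# Push the model to the Hub at the end of every epoch
# (or pass save_strategy="steps" with save_steps=N).
callback = PushToHubCallback(
    output_dir="./model_checkpoints",      # placeholder local directory
    save_strategy="epoch",
    hub_model_id="my-username/my-model",   # placeholder Hub repository
)

model.fit(train_dataset, epochs=3, callbacks=[callback])
```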

Updates on the encoder-decoder framework

The encoder-decoder framework is now available in TensorFlow, allowing you to mix and match different encoders and decoders into a single encoder-decoder architecture!

* Add TFEncoderDecoderModel + Add cross-attention to some TF models by ydshieh in https://github.com/huggingface/transformers/pull/13222
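For instance, a TensorFlow encoder-decoder can now be initialized from two pretrained checkpoints (a sketch; the pairing is illustrative, and the cross-attention weights are newly initialized):

```py
from transformers import TFEncoderDecoderModel

# Combine a BERT encoder with a GPT-2 decoder.
model = TFEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")
model.save_pretrained("bert2gpt2")  # placeholder output directory
```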

Besides this, the `EncoderDecoderModel` classes have been updated to work like models such as BART and T5. From now on, users no longer need to pass `decoder_input_ids` to the model themselves. Instead, they are created automatically from the `labels` (by shifting them one position to the right, replacing -100 with the `pad_token_id`, and prepending the `decoder_start_token_id`). Note that this may result in training discrepancies when fine-tuning a model trained with versions prior to 4.12.0 that set `decoder_input_ids` = `labels`. A minimal sketch follows the PR link below.

* Fix EncoderDecoderModel classes to be more like BART and T5 by NielsRogge in https://github.com/huggingface/transformers/pull/14139
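A sketch of the new behavior; the checkpoints and texts are illustrative.

```py
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("The tower is 324 metres tall.", return_tensors="pt")
labels = tokenizer("La tour mesure 324 mètres.", return_tensors="pt").input_ids

# No decoder_input_ids are passed: they are derived from `labels` inside the model.
loss = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, labels=labels).loss
```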

Speech improvements

* Add DistilHuBERT by anton-l in https://github.com/huggingface/transformers/pull/14174
* [Speech Examples] Add pytorch speech pretraining by patrickvonplaten in https://github.com/huggingface/transformers/pull/13877
* [Speech Examples] Add new audio feature by patrickvonplaten in https://github.com/huggingface/transformers/pull/14027
* Add ASR colabs by patrickvonplaten in https://github.com/huggingface/transformers/pull/14067
* [ASR] Make speech recognition example more general to load any tokenizer by patrickvonplaten in https://github.com/huggingface/transformers/pull/14079
* [Examples] Add an official audio classification example by anton-l in https://github.com/huggingface/transformers/pull/13722
* [Examples] Use Audio feature in speech classification by anton-l in https://github.com/huggingface/transformers/pull/14052

Auto-model API

To make it easier to extend the Transformers library, every Auto class has a new `register` method that allows you to register your own custom models, configurations, or tokenizers. See more in the [documentation](https://huggingface.co/transformers/model_doc/auto.html#extending-the-auto-classes).

* Add an API to register objects to Auto classes by sgugger in https://github.com/huggingface/transformers/pull/13989
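For example, with a hypothetical custom model (names are illustrative):

```py
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel

class CustomConfig(PretrainedConfig):  # hypothetical configuration class
    model_type = "custom-model"

class CustomModel(PreTrainedModel):    # hypothetical model class
    config_class = CustomConfig

AutoConfig.register("custom-model", CustomConfig)
AutoModel.register(CustomConfig, CustomModel)
# The Auto classes can now resolve the "custom-model" architecture.
```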

Bug fixes and improvements

* Fix filtering in test fetcher utils by sgugger in https://github.com/huggingface/transformers/pull/13766
* Fix warning for gradient_checkpointing by sgugger in https://github.com/huggingface/transformers/pull/13767
* Implement len in IterableDatasetShard by sgugger in https://github.com/huggingface/transformers/pull/13780
* [Wav2Vec2] Better error message by patrickvonplaten in https://github.com/huggingface/transformers/pull/13777
* Fix LayoutLM ONNX test error by nishprabhu in https://github.com/huggingface/transformers/pull/13710
* Enable readme link synchronization by qqaatw in https://github.com/huggingface/transformers/pull/13785
* Fix length of IterableDatasetShard and add test by sgugger in https://github.com/huggingface/transformers/pull/13792
* [docs/gpt-j] addd instructions for how minimize CPU RAM usage by patil-suraj in https://github.com/huggingface/transformers/pull/13795
* [examples `run_glue.py`] missing requirements `scipy`, `sklearn` by stas00 in https://github.com/huggingface/transformers/pull/13768
* [examples/flax] use Repository API for push_to_hub by patil-suraj in https://github.com/huggingface/transformers/pull/13672
* Fix gather for TPU by sgugger in https://github.com/huggingface/transformers/pull/13813
* [testing] auto-replay captured streams by stas00 in https://github.com/huggingface/transformers/pull/13803
* Add MultiBERTs conversion script by gchhablani in https://github.com/huggingface/transformers/pull/13077
* [Examples] Improve mapping in accelerate examples by patrickvonplaten in https://github.com/huggingface/transformers/pull/13810
* [DPR] Correct init by patrickvonplaten in https://github.com/huggingface/transformers/pull/13796
* skip gptj slow generate tests by patil-suraj in https://github.com/huggingface/transformers/pull/13809
* Fix warning situation: UserWarning: max_length is ignored when padding=True" by shirayu in https://github.com/huggingface/transformers/pull/13829
* Updating CITATION.cff to fix GitHub citation prompt BibTeX output. by arfon in https://github.com/huggingface/transformers/pull/13833
* Add TF notebooks by Rocketknight1 in https://github.com/huggingface/transformers/pull/13793
* Bart: check if decoder_inputs_embeds is set by silviu-oprea in https://github.com/huggingface/transformers/pull/13800
* include megatron_gpt2 in installed modules by stas00 in https://github.com/huggingface/transformers/pull/13834
* Delete MultiBERTs conversion script by gchhablani in https://github.com/huggingface/transformers/pull/13852
* Remove a duplicated bullet point in the GPT-J doc by yaserabdelaziz in https://github.com/huggingface/transformers/pull/13851
* Add Mistral GPT-2 Stability Tweaks by siddk in https://github.com/huggingface/transformers/pull/13573
* Fix broken link to distill models in docs by Randl in https://github.com/huggingface/transformers/pull/13848
* :sparkles: update image classification example by nateraw in https://github.com/huggingface/transformers/pull/13824
* Update no_* argument (HfArgumentParser) by BramVanroy in https://github.com/huggingface/transformers/pull/13865
* Update Tatoeba conversion by Traubert in https://github.com/huggingface/transformers/pull/13757
* Fixing 1-length special tokens cut. by Narsil in https://github.com/huggingface/transformers/pull/13862
* Fix flax summarization example: save checkpoint after each epoch and push checkpoint to the hub by ydshieh in https://github.com/huggingface/transformers/pull/13872
* Fixing empty prompts for text-generation when BOS exists. by Narsil in https://github.com/huggingface/transformers/pull/13859
* Improve error message when loading models from Hub by aphedges in https://github.com/huggingface/transformers/pull/13836
* Initial support for symbolic tracing with torch.fx allowing dynamic axes by michaelbenayoun in https://github.com/huggingface/transformers/pull/13579
* Allow dataset to be an optional argument for (Distributed)LengthGroupedSampler by ZhaofengWu in https://github.com/huggingface/transformers/pull/13820
* Fixing question-answering with long contexts by Narsil in https://github.com/huggingface/transformers/pull/13873
* fix(integrations): consider test metrics by borisdayma in https://github.com/huggingface/transformers/pull/13888
* fix: replace asserts by value error by m5l14i11 in https://github.com/huggingface/transformers/pull/13894
* Update parallelism.md by hyunwoongko in https://github.com/huggingface/transformers/pull/13892
* Autodocument the list of ONNX-supported models by sgugger in https://github.com/huggingface/transformers/pull/13884
* Fixing GPU for token-classification in a better way. by Narsil in https://github.com/huggingface/transformers/pull/13856
* Update FSNER code in examples->research_projects->fsner by sayef in https://github.com/huggingface/transformers/pull/13864
* Replace assert statements with exceptions by ddrm86 in https://github.com/huggingface/transformers/pull/13871
* Fixing Backward compatiblity for zero-shot by Narsil in https://github.com/huggingface/transformers/pull/13855
* Update run_qa.py - CorrectTypo by akulagrawal in https://github.com/huggingface/transformers/pull/13857
* T5ForConditionalGeneration: enabling using past_key_values and labels in training by yssjtu in https://github.com/huggingface/transformers/pull/13805
* Fix trainer logging_nan_inf_filter in torch_xla mode by ymwangg in https://github.com/huggingface/transformers/pull/13896
* Fix hp search for non sigopt backends by sgugger in https://github.com/huggingface/transformers/pull/13897
* [Trainer] Fix nan-loss condition by anton-l in https://github.com/huggingface/transformers/pull/13911
* Raise exceptions instead of asserts in utils/download_glue_data by hirotasoshu in https://github.com/huggingface/transformers/pull/13907
* Add an example of exporting BartModel + BeamSearch to ONNX module. by fatcat-z in https://github.com/huggingface/transformers/pull/13765
* 12789 Replace assert statements with exceptions by djroxx2000 in https://github.com/huggingface/transformers/pull/13909
* Add missing whitespace to multiline strings by aphedges in https://github.com/huggingface/transformers/pull/13916
* [Wav2Vec2] Fix mask_feature_prob by patrickvonplaten in https://github.com/huggingface/transformers/pull/13921
* Fixes a minor doc issue (missing character) by mishig25 in https://github.com/huggingface/transformers/pull/13922
* Fix LED by Rocketknight1 in https://github.com/huggingface/transformers/pull/13882
* Add BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by datquocnguyen in https://github.com/huggingface/transformers/pull/13788
* [trainer] memory metrics: add memory at the start report by stas00 in https://github.com/huggingface/transformers/pull/13915
* Image Segmentation pipeline by mishig25 in https://github.com/huggingface/transformers/pull/13828
* Adding support for tokens being suffixes or part of each other. by Narsil in https://github.com/huggingface/transformers/pull/13918
* Adds `PreTrainedModel.framework` attribute by StellaAthena in https://github.com/huggingface/transformers/pull/13817
* Fixed typo: herBERT -> HerBERT by adamjankaczmarek in https://github.com/huggingface/transformers/pull/13936
* [Generation] Fix max_new_tokens by patrickvonplaten in https://github.com/huggingface/transformers/pull/13919
* Fix typo in README.md by fullyz in https://github.com/huggingface/transformers/pull/13883
* Update bug-report.md by LysandreJik in https://github.com/huggingface/transformers/pull/13934
* fix issue 13904 -attribute does not exist- by oraby8 in https://github.com/huggingface/transformers/pull/13942
* Raise ValueError instead of asserts in src/transformers/benchmark/benchmark.py by AkechiShiro in https://github.com/huggingface/transformers/pull/13951
* Honor existing attention mask in tokenzier.pad by sgugger in https://github.com/huggingface/transformers/pull/13926
* [Gradient checkpoining] Correct disabling `find_unused_parameters` in Trainer when gradient checkpointing is enabled by patrickvonplaten in https://github.com/huggingface/transformers/pull/13961
* Change DataCollatorForSeq2Seq to pad labels to a multiple of `pad_to_multiple_of` by affjljoo3581 in https://github.com/huggingface/transformers/pull/13949
* Replace assert with unittest assertions by LuisFerTR in https://github.com/huggingface/transformers/pull/13957
* Raise exceptions instead of asserts in src/transformers/data/processors/xnli.py by midhun1998 in https://github.com/huggingface/transformers/pull/13945
* Make username optional in hub_model_id by sgugger in https://github.com/huggingface/transformers/pull/13940
* Raise exceptions instead of asserts in src/transformers/data/processors/utils.py by killazz67 in https://github.com/huggingface/transformers/pull/13938
* Replace assert by ValueError of src/transformers/models/electra/modeling_{electra,tf_electra}.py and all other models that had copies by AkechiShiro in https://github.com/huggingface/transformers/pull/13955
* Fix missing tpu variable in benchmark_args_tf.py by hardianlawi in https://github.com/huggingface/transformers/pull/13968
* Specify im-seg mask greyscole mode by mishig25 in https://github.com/huggingface/transformers/pull/13974
* [Wav2Vec2] Make sure tensors are always bool for mask_indices by patrickvonplaten in https://github.com/huggingface/transformers/pull/13977
* Fixing the lecture values by making sure defaults are not changed by Narsil in https://github.com/huggingface/transformers/pull/13976
* [parallel doc] dealing with layers larger than one gpu by stas00 in https://github.com/huggingface/transformers/pull/13980
* Remove wrong model_args supplied by qqaatw in https://github.com/huggingface/transformers/pull/13937
* Allow single byte decoding by patrickvonplaten in https://github.com/huggingface/transformers/pull/13988
* Replace assertion with ValueError exception by ddrm86 in https://github.com/huggingface/transformers/pull/14006
* Add strong test for configuration attributes by sgugger in https://github.com/huggingface/transformers/pull/14000
* Fix FNet tokenizer tests by LysandreJik in https://github.com/huggingface/transformers/pull/13995
* [Testing] Move speech datasets to `hf-internal` testing ... by patrickvonplaten in https://github.com/huggingface/transformers/pull/14008
* Raise exceptions instead of asserts in src/transformers/models/bart/modeling_flax_[bart, marian, mbart, pegasus].py by killazz67 in https://github.com/huggingface/transformers/pull/13939
* Scatter dummies + skip pipeline tests by LysandreJik in https://github.com/huggingface/transformers/pull/13996
* Fixed horizon_length for PPLM by jacksukk in https://github.com/huggingface/transformers/pull/13886
* Fix: replace assert statements with exceptions in file src/transformers/models/lxmert/modeling_lxmert.py by murilo-goncalves in https://github.com/huggingface/transformers/pull/14029
* [Docs] More general docstrings by patrickvonplaten in https://github.com/huggingface/transformers/pull/14028
* [CLIP] minor fixes by patil-suraj in https://github.com/huggingface/transformers/pull/14026
* Don't duplicate the elements in dir by sgugger in https://github.com/huggingface/transformers/pull/14023
* Replace assertions with ValueError exceptions by ddrm86 in https://github.com/huggingface/transformers/pull/14018
* Fixes typo in `modeling_speech_to_text` by mishig25 in https://github.com/huggingface/transformers/pull/14044
* [Speech] Move all examples to new audio feature by patrickvonplaten in https://github.com/huggingface/transformers/pull/14045
* Update SEW integration test tolerance by anton-l in https://github.com/huggingface/transformers/pull/14048
* [Flax] Clip fix test by patrickvonplaten in https://github.com/huggingface/transformers/pull/14046
* Fix save when laod_best_model_at_end=True by sgugger in https://github.com/huggingface/transformers/pull/14054
* [Speech] Refactor Examples by patrickvonplaten in https://github.com/huggingface/transformers/pull/14040
* fix typo by yyy-Apple in https://github.com/huggingface/transformers/pull/14049
* Fix typo by ihoromi4 in https://github.com/huggingface/transformers/pull/14056
* [FX] Fix passing None as concrete args when tracing by thomasw21 in https://github.com/huggingface/transformers/pull/14022
* TF Model train and eval step metrics for seq2seq models. by pedro-r-marques in https://github.com/huggingface/transformers/pull/14009
* update to_py_obj to support np.number by PrettyMeng in https://github.com/huggingface/transformers/pull/14064
* Trainer._load_rng_state() path fix (14069) by tlby in https://github.com/huggingface/transformers/pull/14071
* replace assert with exception in src/transformers/utils/model_pararallel_utils.py by skpig in https://github.com/huggingface/transformers/pull/14072
* Add missing autocast() in Trainer.prediction_step() by juice500ml in https://github.com/huggingface/transformers/pull/14075
* Fix assert in src/transformers/data/datasets/language_modeling.py by skpig in https://github.com/huggingface/transformers/pull/14077
* Fix label attribution in token classification examples by sgugger in https://github.com/huggingface/transformers/pull/14055
* Context managers by lvwerra in https://github.com/huggingface/transformers/pull/13900
* Fix broken link in the translation section of task summaries by h4iku in https://github.com/huggingface/transformers/pull/14087
* [ASR] Small fix model card creation by patrickvonplaten in https://github.com/huggingface/transformers/pull/14093
* Change asserts in src/transformers/models/xlnet/ to raise ValueError by WestonKing-Leatham in https://github.com/huggingface/transformers/pull/14088
* Replace assertions with ValueError exceptions by ddrm86 in https://github.com/huggingface/transformers/pull/14061
* [Typo] Replace "Masked" with "Causal" in TF CLM script by cakiki in https://github.com/huggingface/transformers/pull/14014
* [Examples] Add audio classification notebooks by anton-l in https://github.com/huggingface/transformers/pull/14099
* Fix ignore_mismatched_sizes by qqaatw in https://github.com/huggingface/transformers/pull/14085
* Fix typo in comment by stalkermustang in https://github.com/huggingface/transformers/pull/14102
* Replace assertion with ValueError exception by ddrm86 in https://github.com/huggingface/transformers/pull/14098
* fix typo in license docstring by 21jun in https://github.com/huggingface/transformers/pull/14094
* Fix a typo in preprocessing docs by h4iku in https://github.com/huggingface/transformers/pull/14108
* Replace assertions with ValueError exceptions by iDeepverma in https://github.com/huggingface/transformers/pull/14091
* [tests] fix hubert test sort by patrickvonplaten in https://github.com/huggingface/transformers/pull/14116
* Replace assert statements with exceptions (13871) by ddrm86 in https://github.com/huggingface/transformers/pull/13901
* Translate README.md to Korean by yeounyi in https://github.com/huggingface/transformers/pull/14015
* Replace assertions with valueError Exeptions by jyshdewangan in https://github.com/huggingface/transformers/pull/14117
* Fix assertion in models by skpig in https://github.com/huggingface/transformers/pull/14090
* [wav2vec2] Add missing --validation_split_percentage data arg by falcaopetri in https://github.com/huggingface/transformers/pull/14119
* Rename variables with unclear naming by qqaatw in https://github.com/huggingface/transformers/pull/14122
* Update TP parallel GEMM image by hyunwoongko in https://github.com/huggingface/transformers/pull/14112
* Fix some typos in the docs by h4iku in https://github.com/huggingface/transformers/pull/14126
* Supporting Seq2Seq model for question answering task by karthikrangasai in https://github.com/huggingface/transformers/pull/13432
* Fix rendering of examples version links by h4iku in https://github.com/huggingface/transformers/pull/14134
* Fix some writing issues in the docs by h4iku in https://github.com/huggingface/transformers/pull/14136
* BartEnocder add set_input_embeddings by Liangtaiwan in https://github.com/huggingface/transformers/pull/13960
* Remove unneeded `to_tensor()` in TF inline example by Rocketknight1 in https://github.com/huggingface/transformers/pull/14140
* Enable DefaultDataCollator class by Rocketknight1 in https://github.com/huggingface/transformers/pull/14141
* Fix lazy init to stop hiding errors in import by sgugger in https://github.com/huggingface/transformers/pull/14124
* Add TF<>PT and Flax<>PT everywhere by patrickvonplaten in https://github.com/huggingface/transformers/pull/14047
* Add Camembert to models exportable with ONNX by ChainYo in https://github.com/huggingface/transformers/pull/14059
* [Speech Recognition CTC] Add auth token to fine-tune private models by patrickvonplaten in https://github.com/huggingface/transformers/pull/14154
* Add vision_encoder_decoder to models/__init__.py by ydshieh in https://github.com/huggingface/transformers/pull/14151
* [Speech Recognition] - Distributed training: Make sure vocab file removal and creation don't interfer by patrickvonplaten in https://github.com/huggingface/transformers/pull/14161
* Include Keras tensor in the allowed types by sergiovalmac in https://github.com/huggingface/transformers/pull/14155
* [megatron_gpt2] dynamic gelu, add tokenizer, save config by stas00 in https://github.com/huggingface/transformers/pull/13928
* Add Unispeech & Unispeech-SAT by patrickvonplaten in https://github.com/huggingface/transformers/pull/13963
* [ONNX] Add symbolic function for XSoftmax op for exporting to ONNX. by fatcat-z in https://github.com/huggingface/transformers/pull/14013
* Typo on ner accelerate example code by monologg in https://github.com/huggingface/transformers/pull/14150
* fix typos in error messages in speech recognition example and modelcard.py by mgoldey in https://github.com/huggingface/transformers/pull/14166
* Replace assertions with ValueError exception by huberemanuel in https://github.com/huggingface/transformers/pull/14142
* switch to inference_mode from no_gard by kamalkraj in https://github.com/huggingface/transformers/pull/13667
* Fix gelu test for torch 1.10 by LysandreJik in https://github.com/huggingface/transformers/pull/14167
* [Gradient checkpointing] Enable for Deberta + DebertaV2 + SEW-D by patrickvonplaten in https://github.com/huggingface/transformers/pull/14175
* [Pipelines] Fix ASR model types check by anton-l in https://github.com/huggingface/transformers/pull/14178
* Replace assert of data/data_collator.py by ValueError by AkechiShiro in https://github.com/huggingface/transformers/pull/14131
* [TPU tests] Enable first TPU examples pytorch by patrickvonplaten in https://github.com/huggingface/transformers/pull/14121
* [modeling_utils] respect original dtype in _get_resized_lm_head by stas00 in https://github.com/huggingface/transformers/pull/14181

New Contributors
* arfon made their first contribution in https://github.com/huggingface/transformers/pull/13833
* silviu-oprea made their first contribution in https://github.com/huggingface/transformers/pull/13800
* yaserabdelaziz made their first contribution in https://github.com/huggingface/transformers/pull/13851
* Randl made their first contribution in https://github.com/huggingface/transformers/pull/13848
* Traubert made their first contribution in https://github.com/huggingface/transformers/pull/13757
* ZhaofengWu made their first contribution in https://github.com/huggingface/transformers/pull/13820
* m5l14i11 made their first contribution in https://github.com/huggingface/transformers/pull/13894
* hyunwoongko made their first contribution in https://github.com/huggingface/transformers/pull/13892
* ddrm86 made their first contribution in https://github.com/huggingface/transformers/pull/13871
* akulagrawal made their first contribution in https://github.com/huggingface/transformers/pull/13857
* yssjtu made their first contribution in https://github.com/huggingface/transformers/pull/13805
* ymwangg made their first contribution in https://github.com/huggingface/transformers/pull/13896
* hirotasoshu made their first contribution in https://github.com/huggingface/transformers/pull/13907
* fatcat-z made their first contribution in https://github.com/huggingface/transformers/pull/13765
* djroxx2000 made their first contribution in https://github.com/huggingface/transformers/pull/13909
* adamjankaczmarek made their first contribution in https://github.com/huggingface/transformers/pull/13936
* oraby8 made their first contribution in https://github.com/huggingface/transformers/pull/13942
* AkechiShiro made their first contribution in https://github.com/huggingface/transformers/pull/13951
* affjljoo3581 made their first contribution in https://github.com/huggingface/transformers/pull/13949
* LuisFerTR made their first contribution in https://github.com/huggingface/transformers/pull/13957
* midhun1998 made their first contribution in https://github.com/huggingface/transformers/pull/13945
* killazz67 made their first contribution in https://github.com/huggingface/transformers/pull/13938
* hardianlawi made their first contribution in https://github.com/huggingface/transformers/pull/13968
* jacksukk made their first contribution in https://github.com/huggingface/transformers/pull/13886
* murilo-goncalves made their first contribution in https://github.com/huggingface/transformers/pull/14029
* yyy-Apple made their first contribution in https://github.com/huggingface/transformers/pull/14049
* ihoromi4 made their first contribution in https://github.com/huggingface/transformers/pull/14056
* thomasw21 made their first contribution in https://github.com/huggingface/transformers/pull/14022
* pedro-r-marques made their first contribution in https://github.com/huggingface/transformers/pull/14009
* PrettyMeng made their first contribution in https://github.com/huggingface/transformers/pull/14064
* tlby made their first contribution in https://github.com/huggingface/transformers/pull/14071
* skpig made their first contribution in https://github.com/huggingface/transformers/pull/14072
* juice500ml made their first contribution in https://github.com/huggingface/transformers/pull/14075
* h4iku made their first contribution in https://github.com/huggingface/transformers/pull/14087
* WestonKing-Leatham made their first contribution in https://github.com/huggingface/transformers/pull/14088
* cakiki made their first contribution in https://github.com/huggingface/transformers/pull/14014
* stalkermustang made their first contribution in https://github.com/huggingface/transformers/pull/14102
* iDeepverma made their first contribution in https://github.com/huggingface/transformers/pull/14091
* yeounyi made their first contribution in https://github.com/huggingface/transformers/pull/14015
* jyshdewangan made their first contribution in https://github.com/huggingface/transformers/pull/14117
* karthikrangasai made their first contribution in https://github.com/huggingface/transformers/pull/13432
* ChainYo made their first contribution in https://github.com/huggingface/transformers/pull/14059
* sergiovalmac made their first contribution in https://github.com/huggingface/transformers/pull/14155
* huberemanuel made their first contribution in https://github.com/huggingface/transformers/pull/14142

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.11.0...v4.12.0

4.11.3

This patch release fixes a few issues encountered since the release of v4.11.2:

- [DPR] Correct init (13796)
- Fix warning situation: UserWarning: max_length is ignored when padding=True" (13829)
- Bart: check if decoder_inputs_embeds is set (13800)
- include megatron_gpt2 in installed modules (13834)
- Fixing 1-length special tokens cut. (13862)
- Fixing empty prompts for text-generation when BOS exists. (13859)
- Fixing question-answering with long contexts (13873)
- Fixing GPU for token-classification in a better way. (13856)
- Fixing Backward compatiblity for zero-shot (13855)
- Fix hp search for non sigopt backends (13897)
- Fix trainer logging_nan_inf_filter in torch_xla mode 13896 (ymwangg)
- [Trainer] Fix nan-loss condition 13911 (anton-l)

4.11.2

Fix the `Trainer` API on TPU:
- Fix gather for TPU 13813

4.11.1

Patch release with a few bug fixes:
- [Wav2Vec2] Better error message (13777)
- Fix LayoutLM ONNX test error (13710)
- Fix warning for gradient_checkpointing (13767)
- Implement len in IterableDatasetShard (13780)
- Fix length of IterableDatasetShard and add test (13792)

4.11.0

GPT-J

Three new models are released as part of the GPT-J implementation: `GPTJModel`, `GPTJForCausalLM`, `GPTJForSequenceClassification`, in PyTorch.

The GPT-J model was released in the [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like causal language model trained on the Pile dataset.

It was contributed by StellaAthena, kurumuz, EricHallahan, and leogao2.

- GPT-J-6B 13022 (StellaAthena)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gptj
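A short generation sketch; note that the full-precision 6B-parameter checkpoint requires substantial memory (see the GPT-J docs for lower-memory options).

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("The Belgian national football team", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```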

SpeechEncoderDecoder & Speech2Text2

One new model is released as part of the Speech2Text2 implementation: `Speech2Text2ForCausalLM`, in PyTorch.

The Speech2Text2 model is used together with Wav2Vec2 for Speech Translation models proposed in [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.

Speech2Text2 is a decoder-only transformer model that can be combined with any speech encoder-only model, such as Wav2Vec2 or HuBERT, for speech-to-text tasks. Please refer to the [SpeechEncoderDecoder](https://huggingface.co/transformers/master/model_doc/speechencoderdecoder.html) class for how to combine Speech2Text2 with any speech encoder-only model.

- Add SpeechEncoderDecoder & Speech2Text2 13186 (patrickvonplaten)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=speech2text2

FNet

Eight new models are released as part of the FNet implementation: `FNetModel`, `FNetForPreTraining`, `FNetForMaskedLM`, `FNetForNextSentencePrediction`, `FNetForSequenceClassification`, `FNetForMultipleChoice`, `FNetForTokenClassification`, `FNetForQuestionAnswering`, in PyTorch.

The FNet model was proposed in [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. The model replaces the self-attention layer in a BERT model with a Fourier transform, keeping only the real parts of the transform. It is significantly faster than BERT because it has fewer parameters and is more memory efficient, achieves about 92-97% of the accuracy of its BERT counterparts on the GLUE benchmark, and trains much faster.

- Add FNet 13045 (gchhablani)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=fnet
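As a quick sketch, the base checkpoint can be exercised through the fill-mask pipeline (the checkpoint name is illustrative):

```py
from transformers import pipeline

unmasker = pipeline("fill-mask", model="google/fnet-base")
print(unmasker("Paris is the [MASK] of France."))  # list of {"token_str", "score", ...} dicts
```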

TensorFlow improvements

Several bug fixes and UX improvements for TensorFlow:

- Users should notice much fewer unnecessary warnings and less 'console spam' in general while using Transformers with TensorFlow.
- TensorFlow models should be less picky about the specific integer dtypes (int32/int64) that are passed as input.

Changes to compile() and train_step()

- You can now compile our TensorFlow models without passing a loss argument! If you do, the model computes its loss internally during the forward pass and uses that value in `fit()`. This makes it much more convenient to get the right loss, particularly since many models have task-specific losses that are easy to overlook and annoying to reimplement. Remember to pass your labels under the "labels" key of your input dict so that they are accessible to the model during the forward pass. Behavior is unchanged if you do pass a loss argument, so all old code remains unaffected. See the sketch after the PR list below.

Associated PRs:

- Modified TF train_step 13678 (Rocketknight1)
- Fix Tensorflow T5 with int64 input 13479 (Rocketknight1)
- MarianMT int dtype fix 13496 (Rocketknight1)
- Removed console spam from misfiring warnings 13625 (Rocketknight1)
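A sketch of compiling without a loss argument; `tf_dataset` is a placeholder `tf.data.Dataset` yielding dicts that include a "labels" key.

```py
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# No loss is passed: the model computes its task loss internally during the forward pass.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5))
model.fit(tf_dataset, epochs=1)  # tf_dataset yields {"input_ids": ..., "attention_mask": ..., "labels": ...}
```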

Pipelines

Pipeline refactor

The pipelines underwent a large refactor that should make contributing pipelines much simpler, and much less error-prone. As part of this refactor, PyTorch-based pipelines are now optimized for GPU performance based on PyTorch's `Dataset`s and `DataLoader`s.

See below for an example leveraging the `superb` dataset.
```py
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets
import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only `pt`) will simply return the item in the dict returned by the dataset item,
# as we're not interested in the `target` part of the dataset.
for out in tqdm.tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
```


- [Large PR] Entire rework of pipelines. 13308 (Narsil)

Audio classification pipeline

Additionally, a new pipeline is available for audio classification; a usage sketch follows the PR list below.

- Add the `AudioClassificationPipeline` 13342 (anton-l)
- Enabling automatic loading of tokenizer with `pipeline` for `audio-classification`. 13376 (Narsil)
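A usage sketch; the checkpoint name and audio file path are illustrative.

```py
from transformers import pipeline

classifier = pipeline("audio-classification", model="superb/wav2vec2-base-superb-ks")
predictions = classifier("speech_sample.wav")  # placeholder path to a local audio file
print(predictions)  # e.g. [{"score": 0.99, "label": "..."}, ...]
```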

Setters for common properties

Version v4.11.0 introduces setters for common configuration properties. Different configurations expose different attribute names, inherited from their original implementations.

One such example is `BertConfig` having a `hidden_size` attribute while `GPT2Config` has an `n_embd` attribute; the two are essentially the same.

The newly introduced setters allow setting such properties through a standardized naming scheme, even on configuration objects that do not have them by default.

See the following code sample for an example:


```py
from transformers import GPT2Config

config = GPT2Config()

config.hidden_size = 4               # failed previously
config = GPT2Config(hidden_size=4)   # failed previously

config.n_embd        # returns 4
config.hidden_size   # returns 4
```


- Update model configs - Allow setters for common properties 13026 (nreimers)

Dynamic model code loading

An experimental feature adding support for model files hosted on the hub is added as part of this release. A walkthrough is available in the [PR description](https://github.com/huggingface/transformers/pull/13467).

:warning: This means that code files will be fetched from the Hub and executed locally. An additional argument, `trust_remote_code`, is required when instantiating such a model from the Hub. We strongly encourage you to also pin a `revision` when using code from another user's or organization's repository.

- Dynamically load model code from the Hub 13467 (sgugger)
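A sketch of opting in; the repository and revision are placeholders.

```py
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "some-user/custom-model",  # hypothetical repo hosting custom modeling code
    trust_remote_code=True,
    revision="a1b2c3d",        # pin the exact commit whose code you audited
)
```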

Trainer

The `Trainer` has received several new features, the main one being that models are uploaded to the Hub each time you save them locally (you can specify another strategy). This push is asynchronous, so training continues without interruption. A configuration sketch follows the PR list below.

Also:
- The SigOpt optimization framework is now integrated in the `Trainer` API as an opt-in component.
- The `Trainer` API now supports fine-tuning on distributed CPUs.

Associated PRs:

- Push to hub when saving checkpoints 13503 (sgugger)
- Add SigOpt HPO to transformers trainer api 13572 (kding1)
- Add cpu distributed fine-tuning support for transformers Trainer API 13574 (kding1)
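A configuration sketch; the output directory is a placeholder and doubles as the default repository name.

```py
from transformers import TrainingArguments

# Each local checkpoint save is also pushed (asynchronously) to the Hub.
args = TrainingArguments(
    output_dir="my-finetuned-model",
    push_to_hub=True,
    save_strategy="epoch",
)
```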

Model size CPU memory usage reduction

Loading a model with PyTorch's `torch.load` temporarily requires twice the model's size in memory. An experimental feature that loads a model while requiring only as much memory as the model's size ships in version v4.11.0.

It can be used by passing `low_cpu_mem_usage=True` to `from_pretrained` for PyTorch pretrained models; see the sketch after the PR link below.

- 1x model size CPU memory usage for `from_pretrained` 13466 (stas00)
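A one-line sketch of the opt-in flag:

```py
from transformers import AutoModelForCausalLM

# Experimental: peak CPU memory is roughly 1x the model size instead of 2x.
model = AutoModelForCausalLM.from_pretrained("gpt2", low_cpu_mem_usage=True)
```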


GPT-Neo: simplified local attention

The GPT-Neo local attention was greatly simplified with no loss of performance.

- [GPT-Neo] Simplify local attention 13491 (finetuneanon, patil-suraj)


Breaking changes

*We strive for no breaking changes between releases - however, some bugs are not discovered for long periods of time, and users may eventually rely on such bugs. We document here such changes that may affect users when updating to a recent version.*

Order of overflowing tokens

The overflowing tokens returned by the slow tokenizers were in the wrong order. This is fixed in the PR below.

- Correct order of overflowing_tokens for slow tokenizer 13179 (Apoorvgarg-creator)

Non-prefixed tokens for token classification pipeline

Updates the behavior of `aggregation_strategy` to more closely mimic the deprecated `grouped_entities` pipeline argument.

- Fixing backward compatiblity for non prefixed tokens (B-, I-). 13493 (Narsil)

Inputs normalization for Wav2Vec2 feature extractor

The changes in v4.10 (12804) introduced a bug in input normalization for non-padded tensors that affected Wav2Vec2 fine-tuning. This is fixed in the PR below.

- [Wav2Vec2] Fix normalization for non-padded tensors 13512 (patrickvonplaten)

General bug fixes and improvements

- Fixes for the documentation 13361 (sgugger)
- fix wrong 'cls' masking for bigbird qa model output 13143 (donggyukimc)
- Improve T5 docs 13240 (NielsRogge)
- Fix tokenizer saving during training with `Trainer` 12806 (SaulLu)
- Fix DINO 13369 (NielsRogge)
- Properly register missing submodules in main init 13372 (sgugger)
- Add `Hubert` to the `AutoFeatureExtractor` 13366 (anton-l)
- Add missing feature extractors 13374 (LysandreJik)
- Fix RemBERT tokenizer initialization 13375 (LysandreJik)
- [Flax] Fix BigBird 13380 (patrickvonplaten)
- [GPU Tests] Fix SpeechEncoderDecoder GPU tests 13383 (patrickvonplaten)
- Fix name and get_class method in AutoFeatureExtractor 13385 (sgugger)
- [Flax/run_hybrid_clip] Fix duplicating images when captions_per_image exceeds the number of captions, enable truncation 12752 (edugp)
- Move Flax self-push to test machine 13364 (patrickvonplaten)
- Torchscript test 13350 (LysandreJik)
- Torchscript test for DistilBERT 13351 (LysandreJik)
- Torchscript test for ConvBERT 13352 (LysandreJik)
- Torchscript test for Flaubert 13353 (LysandreJik)
- Fix GPT-J _CHECKPOINT_FOR_DOC typo 13368 (LysandreJik)
- Update clip loss calculation 13217 (sachinruk)
- Add LayoutXLM tokenizer docs 13373 (NielsRogge)
- [doc] fix mBART example 13387 (patil-suraj)
- [docs] Update perplexity.rst to use negative log likelihood 13386 (madaan)
- [Tests] Fix SpeechEncoderDecoder tests 13395 (patrickvonplaten)
- [SpeechEncoderDecoder] Fix final test 13396 (patrickvonplaten)
- ✨ Add PyTorch image classification example 13134 (nateraw)
- Fix tests without any real effect in EncoderDecoderMixin 13406 (ydshieh)
- Fix scheduled tests for `SpeechEncoderDecoderModel` 13422 (anton-l)
- add torchvision in example test requirements 13438 (patil-suraj)
- [EncoderDecoder] Fix torch device in tests 13448 (patrickvonplaten)
- Adding a test for multibytes unicode. 13447 (Narsil)
- skip image classification example test 13451 (patil-suraj)
- Add TAPAS MLM-only models 13408 (NielsRogge)
- Fix scheduled TF Speech tests 13403 (anton-l)
- Update version of `packaging` package 13454 (shivdhar)
- Update setup.py 13421 (anukaal)
- Fix img classification tests 13456 (nateraw)
- Making it raise real errors on ByT5. 13449 (Narsil)
- Optimized bad word ids 13433 (guillaume-be)
- Use powers of 2 in download size calculations 13468 (anton-l)
- [docs] update dead quickstart link on resuing past for GPT2 13455 (shabie)
- fix CLIP conversion script. 13474 (patil-suraj)
- Deprecate Mirror 13470 (JetRunner)
- [CLIP] fix logit_scale init 13436 (patil-suraj)
- Don't modify labels inplace in `LabelSmoother` 13464 (sgugger)
- Enable automated model list copying for localized READMEs 13465 (qqaatw)
- Better error raised when cloned without lfs 13401 (LysandreJik)
- Throw ValueError for mirror downloads 13478 (JetRunner)
- Fix Tensorflow T5 with int64 input 13479 (Rocketknight1)
- Object detection pipeline 12886 (mishig25)
- Typo in "end_of_word_suffix" 13477 (KoichiYasuoka)
- Fixed the MultilabelTrainer document, which would cause a potential bug when executing the code originally documented. 13414 (Mohan-Zhang-u)
- Fix integration tests for `TFWav2Vec2` and `TFHubert` 13480 (anton-l)
- Fix typo in deepspeed documentation 13482 (apohllo)
- flax ner example 13365 (kamalkraj)
- Fix typo in documentation 13494 (apohllo)
- MarianMT int dtype fix 13496 (Rocketknight1)
- [Tentative] Moving slow tokenizer to the Trie world. 13220 (Narsil)
- Refactor internals for Trainer push_to_hub 13486 (sgugger)
- examples: minor fixes in flax example readme 13502 (stefan-it)
- [Wav2Vec2] Fix normalization for non-padded tensors 13512 (patrickvonplaten)
- TF multiple choice loss fix 13513 (Rocketknight1)
- [Wav2Vec2] Fix dtype 64 bug 13517 (patrickvonplaten)
- fix PhophetNet 'use_cache' assignment of no effect 13532 (holazzer)
- Ignore `past_key_values` during GPT-Neo inference 13521 (aphedges)
- Fix attention mask size checking for CLIP 13535 (Renovamen)
- [Speech2Text2] Skip newly added tokenizer test 13536 (patrickvonplaten)
- [Speech2Text] Give feature extraction higher tolerance 13538 (patrickvonplaten)
- [tokenizer] use use_auth_token for config 13523 (stas00)
- Small changes in `perplexity.rst`to make the notebook executable on google collaboratory 13541 (SaulLu)
- [Feature Extractors] Return attention mask always in int32 13543 (patrickvonplaten)
- Nightly torch ci 13550 (LysandreJik)
- Add long overdue link to the Google TRC project 13501 (avital)
- Fixing 13381 13400 (Narsil)
- fixing BC in `fill-mask` (wasn't tested in theses test suites apparently). 13540 (Narsil)
- add flax mbart in auto seq2seq lm 13560 (patil-suraj)
- [Flax] Addition of FlaxPegasus 13420 (bhadreshpsavani)
- Add checks to build cleaner model cards 13542 (sgugger)
- separate model card git push from the rest 13514 (elishowk)
- Fix test_fetcher when setup is updated 13566 (sgugger)
- [Flax] Fixes typo in Bart based Flax Models 13565 (bhadreshpsavani)
- Fix GPTNeo onnx export 13524 (patil-suraj)
- upgrade sentencepiece version 13564 (elishowk)
- [Pretrained Model] Add resize_position_embeddings 13559 (patrickvonplaten)
- [ci] nightly: add deepspeed master 13589 (stas00)
- [Tests] Disable flaky s2t test 13585 (patrickvonplaten)
- Correct device when resizing position embeddings 13593 (patrickvonplaten)
- Fix DataCollatorForSeq2Seq when labels are supplied as Numpy array instead of list 13582 (Rocketknight1)
- Fix a pipeline test with the newly updated weights 13608 (LysandreJik)
- Fix make fix-copies with type annotations 13586 (sgugger)
- DataCollatorForTokenClassification numpy fix 13609 (Rocketknight1)
- Feature Extractor: Wav2Vec2 & Speech2Text - Allow truncation + padding=longest 13600 (patrickvonplaten)
- [deepspeed] replaced deprecated init arg 13587 (stas00)
- Properly use test_fetcher for examples 13604 (sgugger)
- XLMR tokenizer is fully picklable 13577 (ben-davidson-6)
- Optimize Token Classification models for TPU 13096 (ibraheem-moosa)
- [Trainer] Add nan/inf logging filter 13619 (patrickvonplaten)
- Fix special tokens not correctly tokenized 13489 (qqaatw)
- Removed console spam from misfiring warnings 13625 (Rocketknight1)
- Use `config_dict_or_path` for deepspeed.zero.Init 13614 (aphedges)
- Fixes issues with backward pass in LED/Longformer Self-attention 13613 (aleSuglia)
- fix some docstring in encoder-decoder models 13611 (ydshieh)
- Updated tiny distilbert models 13631 (LysandreJik)
- Fix GPT2Config parameters in GPT2ModelTester 13630 (calpt)
- [run_summarization] fix typo 13647 (patil-suraj)
- [Fix]Make sure the args tb_writer passed to the TensorBoardCallback works 13636 (iamlockelightning)
- Fix mT5 documentation 13639 (ayaka14732)
- Update modeling_tf_deberta.py 13654 (kamalkraj)
- [megatron_gpt2] checkpoint v3 13508 (stas00)
- Change https:/ to https:// to dataset GitHub repo 13644 (flozi00)
- fix research_projects/mlm_wwm readme.md examples 13646 (LowinLi)
- Fix typo distilbert doc to code link 13643 (flozi00)
- Add Speech AutoModels 13655 (patrickvonplaten)
- beit-flax 13515 (kamalkraj)
- [FLAX] Question Answering Example 13649 (kamalkraj)
- Typo "UNKWOWN" -> "UNKNOWN" 13675 (kamalkraj)
- [SequenceFeatureExtractor] Rewrite padding logic from pure python to numpy 13650 (anton-l)
- [SinusoidalPositionalEmbedding] incorrect dtype when resizing in `forward` 13665 (stas00)
- Add push_to_hub to no_trainer examples 13659 (sgugger)
- Layoutlm onnx support (Issue 13300) 13562 (nishprabhu)
- Update modeling_flax_wav2vec2.py 13680 (kamalkraj)
- [FlaxWav2Vec2] Revive Test 13688 (patrickvonplaten)
- [AutoTokenizer] Allow creation of tokenizers by tokenizer type 13668 (patrickvonplaten)
- [Wav2Vec2FeatureExtractor] Fix `extractor.pad()` dtype backwards compatibility 13693 (anton-l)
- Make gradient_checkpointing a training argument 13657 (sgugger)
- Assertions to exceptions 13692 (MocktaiLEngineer)
- Fix non-negligible difference between GPT2 and TFGP2 13679 (ydshieh)
- Allow only textual inputs to VisualBert 13687 (gchhablani)
- Patch training arguments issue 13699 (LysandreJik)
- Patch training arguments issue 13700 (LysandreJik)
- [GPT-J] Use the `float16` checkpoints in integration tests 13676 (anton-l)
- [docs/gpt-j] add a note about tokenizer 13696 (patil-suraj)
- Fix FNet reference to tpu short seq length 13686 (gchhablani)
- Add BlenderBot small tokenizer to the init 13367 (LysandreJik)
- Fix typo in torchscript tests 13701 (LysandreJik)
- Handle `UnicodeDecodeError` when loading config file 13717 (qqaatw)
- Add FSNER example in research_projects 13712 (sayef)
- Replace torch.set_grad_enabled by torch.no_grad 13703 (LysandreJik)
- [ASR] Add official ASR CTC example to `examples/pytorch/speech-recognition` 13620 (patrickvonplaten)
- Make assertions only if actually chunking forward 13598 (joshdevins)
- Use torch.unique_consecutive to check elements are same 13637 (oToToT)
- Fixing zero-shot backward compatiblity 13725 (Narsil)
- [Tests] FNetTokenizer 13729 (patrickvonplaten)
- Warn for unexpected argument combinations 13509 (shirayu)
- Add model card creation snippet to example scripts 13730 (gchhablani)
- [Examples] speech recognition - remove gradient checkpointing 13733 (patrickvonplaten)
- Update test dependence for torch examples 13738 (sgugger)
- [Tests] Add decorator to FlaxBeit 13743 (patrickvonplaten)
- Update requirements for speech example 13745 (sgugger)
- [Trainer] Make sure shown loss in distributed training is correctly averaged over all workers 13681 (patrickvonplaten)
- [megatron gpt checkpoint conversion] causal mask requires pos_embed dimension 13735 (stas00)
- [Tests] Cast Hubert model tests to fp16 13755 (anton-l)
- Fix type annotations for `distributed_concat()` 13746 (Renovamen)
- Fix loss computation in Trainer 13760 (sgugger)
- Silence warning in gradient checkpointing when it's False 13734 (sgugger)

4.10.3

Patches an issue with the serialization of the `TrainingArguments`.
