Transformers

Latest version: v4.46.2

1.0.0

Name change: welcome PyTorch-Transformers 👾

`pytorch-pretrained-bert` => `pytorch-transformers`

Install with `pip install pytorch-transformers`

New models

- **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.

New pretrained weights

We went from ten (in `pytorch-pretrained-bert` 0.6.2) to twenty-seven (in `pytorch-transformers` 1.0) pretrained model weights.

The newly added model weights are, in summary:
- Two `Whole-Word-Masking` weights for Bert (cased and uncased)
- Three fine-tuned models for Bert (on SQuAD and MRPC)
- One German model for Bert provided and trained by Deepset.ai (tholor and Timoeller) as detailed in their nice [blogpost](https://deepset.ai/german-bert)
- One OpenAI GPT-2 model (medium size model)
- Two models (base and large) for the newly added XLNet model
- Eight models for the newly added XLM model

The [documentation lists all the models with the shortcut names](https://huggingface.co/pytorch-transformers/pretrained_models.html) and we are currently adding full details of the associated pretraining/fine-tuning parameters.

New documentation

New documentation is currently being created at https://huggingface.co/pytorch-transformers/ and should be finalized over the coming days.

Standard API across models
See the [readme](https://github.com/huggingface/pytorch-transformers#quick-tour) for a quick tour of the API.

Main points:

- All models now return `tuples` with various elements depending on the model and the configuration. The docstrings and [documentation](https://huggingface.co/pytorch-transformers/model_doc/bert.html#bertmodel) list all the expected outputs in order.
- All models can now return the full list of hidden-states (embeddings output + the output hidden-states of each layer)
- All models can now return the full list of attention weights (one tensor of attention weights for each layer)

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,
                                  output_attentions=True)
input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
# The last two elements of the output tuple are the hidden-states and the attentions
all_hidden_states, all_attentions = model(input_ids)[-2:]
```


Standard API to add tokens to the vocabulary and the model

Using `tokenizer.add_tokens()` and `tokenizer.add_special_tokens()`, one can now easily add tokens to each model vocabulary. The model's input embeddings can be resized accordingly to add associated word embeddings (to be trained) using `model.resize_token_embeddings(len(tokenizer))`

```python
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
```
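
Conceptually, resizing keeps the existing embedding rows and appends freshly initialized rows for the added tokens, which then have to be trained. The pure-Python sketch below only illustrates that idea: `resize_embeddings` is a hypothetical helper, not the library's implementation (which works on PyTorch tensors), and the `init_std` value is an assumption.

```python
import random


def resize_embeddings(matrix, new_vocab_size, init_std=0.02, seed=0):
    """Return a copy of `matrix` with `new_vocab_size` rows.

    Hypothetical helper for illustration: existing rows are preserved and
    new rows are drawn from a small Gaussian, ready to be trained.
    """
    if not matrix:
        raise ValueError("matrix must contain at least one row")
    rng = random.Random(seed)
    dim = len(matrix[0])
    # Copy the rows that survive the resize (all of them when growing).
    resized = [list(row) for row in matrix[:new_vocab_size]]
    # Append one randomly initialized row per newly added token.
    while len(resized) < new_vocab_size:
        resized.append([rng.gauss(0.0, init_std) for _ in range(dim)])
    return resized


old = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]      # vocabulary of 3 tokens
new = resize_embeddings(old, new_vocab_size=5)  # two tokens were added
```

The first three rows of `new` are unchanged, so previously learned embeddings are untouched; only the two appended rows start from random values.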


Serialization

The serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.

```python
from pytorch_transformers import BertForSequenceClassification, BertTokenizer

# Save the model and the tokenizer
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')

# Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```


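TorchScript

All models are now compatible with TorchScript.

```python
import torch

# `model_class`, `pretrained_weights` and `input_ids` are placeholders for your own class, weights and inputs
model = model_class.from_pretrained(pretrained_weights, torchscript=True)
traced_model = torch.jit.trace(model, (input_ids,))
```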


Examples scripts

The example scripts have been refactored and gathered into three main examples (`run_glue.py`, `run_squad.py` and `run_generation.py`), which are shared across several models. They are designed to offer SOTA performance on their respective tasks while being a clean starting point for designing your own scripts.

Other example scripts (like `run_bertology.py`) will be added in the coming weeks.

Breaking-changes

The [migration section](https://github.com/huggingface/pytorch-transformers#migrating-from-pytorch-pretrained-bert-to-pytorch-transformers) of the readme lists the breaking changes when switching from `pytorch-pretrained-bert` to `pytorch-transformers`.

The main breaking change is that all models now return a `tuple` of results.
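
In practice, code that used a model's return value directly now has to index into the output tuple. The sketch below uses `ToyClassifier`, a hypothetical stand-in that only mimics the tuple-returning convention (loss first when labels are passed, then the logits), so it runs without the library or any pretrained weights. The values it produces are made up.

```python
class ToyClassifier:
    """Hypothetical stand-in that mimics the tuple-returning convention."""

    def __call__(self, input_ids, labels=None):
        logits = [[0.25, 0.75] for _ in input_ids]  # fake per-example scores
        if labels is None:
            return (logits,)
        # Fake loss: fraction of examples whose argmax disagrees with the label.
        wrong = sum(
            1 for row, label in zip(logits, labels)
            if row.index(max(row)) != label
        )
        return (wrong / len(labels), logits)


model = ToyClassifier()
input_ids = [[101, 2023, 102], [101, 2008, 102]]
labels = [1, 0]

# Old style assumed the call returned the loss itself.
# New style: take the elements you need from the output tuple.
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
```

With a real model, the same pattern applies: `outputs[0]` is the first documented output, and the docstrings list what each position holds.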

0.6.2

General updates:
- Better serialization for all models and tokenizers (BERT, GPT, GPT-2 and Transformer-XL) with [best practices for saving/loading](https://github.com/huggingface/pytorch-pretrained-BERT#serialization-best-practices) in readme and examples.
- Relaxing network connection requirements (fallback on the last downloaded model in the cache when we can't reach AWS to check eTag)
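
The cache fallback can be pictured as follows. This is a minimal sketch under stated assumptions, not the library's code: `cache_filename`, `resolve_cached` and the `fetch_etag` callback are hypothetical names, and the file-naming scheme (a hash of the URL, followed by a hash of the eTag) is an assumption chosen so that all cached versions of one URL share a prefix.

```python
import hashlib
import os
import tempfile


def cache_filename(url, etag=None):
    """Hypothetical naming scheme: hash of the URL, plus a hash of the eTag."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


def resolve_cached(url, cache_dir, fetch_etag):
    """Return a cached file path for `url`, or None if a download is needed."""
    try:
        etag = fetch_etag(url)
    except OSError:
        etag = None  # the server could not be reached
    if etag is not None:
        path = os.path.join(cache_dir, cache_filename(url, etag))
        return path if os.path.exists(path) else None
    # Fallback: reuse the most recently modified cached file for this URL.
    prefix = cache_filename(url)
    candidates = [
        os.path.join(cache_dir, name)
        for name in os.listdir(cache_dir)
        if name.startswith(prefix)
    ]
    return max(candidates, key=os.path.getmtime) if candidates else None


def offline(url):
    raise OSError("network unreachable")


with tempfile.TemporaryDirectory() as cache_dir:
    url = "https://example.com/model.bin"
    cached = os.path.join(cache_dir, cache_filename(url, etag="v1"))
    open(cached, "wb").close()
    found_offline = resolve_cached(url, cache_dir, offline)
    found_stale = resolve_cached(url, cache_dir, lambda u: "v2")
    hit = found_offline == cached
```

When the eTag check fails, the previously downloaded file is returned. When the server reports a different eTag, the sketch returns `None` to signal that a fresh download is needed.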

Breaking changes:
- `warmup_linear` method in `OpenAIAdam` and `BertAdam` is now replaced by flexible [schedule classes](https://github.com/huggingface/pytorch-pretrained-BERT#learning-rate-schedules) for linear, cosine and multi-cycles schedules.
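
To illustrate what a linear schedule with warmup computes, here is a minimal sketch of a learning-rate multiplier as a function of training progress. `warmup_linear_multiplier` is a hypothetical function written for this example, not one of the library's schedule classes, and the `warmup=0.1` default is an assumption.

```python
def warmup_linear_multiplier(progress, warmup=0.1):
    """Learning-rate multiplier for training `progress` in [0, 1]."""
    if not 0.0 < warmup < 1.0:
        raise ValueError("warmup must be strictly between 0 and 1")
    if progress < warmup:
        return progress / warmup  # linear ramp from 0 to 1
    # Linear decay from 1 (at progress == warmup) to 0 (at progress == 1).
    return max((1.0 - progress) / (1.0 - warmup), 0.0)


base_lr = 5e-5
total_steps = 1000
lrs = [
    base_lr * warmup_linear_multiplier(step / total_steps)
    for step in range(total_steps + 1)
]
```

The learning rate rises during the first 10% of the steps, peaks at `base_lr`, then decays linearly to zero. A cosine schedule would replace only the decay branch.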

Bug fixes and improvements to the library modules:
- add a flag in BertTokenizer to skip basic tokenization (john-hewitt)
- Allow tokenization of sequences > 512 (CatalinVoss)
- clean up and extend learning rate schedules in BertAdam and OpenAIAdam (lukovnikov)
- Update GPT/GPT-2 Loss computation (CatalinVoss, thomwolf)
- Make the TensorFlow conversion tool more robust (marpaia)
- fixed BertForMultipleChoice model init and forward pass (dhpollack)
- Fix gradient overflow in GPT-2 FP16 training (SudoSharma)
- catch exception if pathlib not installed (potatochip)
- Use Dropout Layer in OpenAIGPTMultipleChoiceHead (pglock)

New scripts and improvements to the examples scripts:
- Add BERT language model fine-tuning scripts (Rocketknight1)
- Added SST-2 task and remaining GLUE tasks to 'run_classifier.py' (ananyahjha93, jplehmann)
- GPT-2 generation fixes (CatalinVoss, spolu, dhanajitb, 8enmann, SudoSharma, cynthia)

0.6.1

Add `regex` to the requirements for OpenAI GPT-2 tokenizer.

0.6.0

Add OpenAI small GPT-2 pretrained model

0.5.1

Mostly a bug fix update for loading the `TransfoXLModel` from s3:

* Fixes a bug in the loading of the pretrained `TransfoXLModel` from the s3 dump (which is a converted `TransfoXLLMHeadModel`) in which the weights were not loaded.
* Added a fallback of `OpenAIGPTTokenizer` on BERT's `BasicTokenizer` when SpaCy and ftfy are not installed. Using BERT's `BasicTokenizer` instead of SpaCy should be fine in most cases as long as the input is relatively clean (SpaCy+ftfy were included to exactly reproduce the paper's pre-processing steps on the Toronto Book Corpus). This also lets us use the `never_split` option to avoid splitting special tokens like `[CLS], [SEP]...`, which is easier than adding the tokens after tokenization.
* Updated the README on the tokenizer options and methods, which was lagging behind a bit.
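
The effect of a `never_split` option can be shown with a simplified tokenizer. The sketch below is a hypothetical, much reduced stand-in for a basic tokenizer (whitespace splitting followed by punctuation splitting), not the library's `BasicTokenizer`. It only demonstrates why protected tokens survive intact.

```python
import string


def basic_tokenize(text, never_split=("[CLS]", "[SEP]")):
    """Whitespace-split `text`, then split punctuation off each piece,
    leaving any piece listed in `never_split` untouched."""
    tokens = []
    for piece in text.split():
        if piece in never_split:
            tokens.append(piece)  # protected token: keep as a single unit
            continue
        current = ""
        for char in piece:
            if char in string.punctuation:
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(char)  # each punctuation mark is its own token
            else:
                current += char
        if current:
            tokens.append(current)
    return tokens


protected = basic_tokenize("[CLS] hello, world [SEP]")
unprotected = basic_tokenize("[CLS] hi", never_split=())
```

Without the protection, the brackets of `[CLS]` are split off as punctuation and the special token would have to be reassembled or re-added after tokenization.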

0.5.0

New pretrained models:
- **OpenAI GPT** pretrained on the *Toronto Book Corpus* ("Improving Language Understanding by Generative Pre-Training" by Alec Radford et al.).
  - This is a slightly modified version of our previous PyTorch implementation that improves performance by splitting word and position embeddings into separate embedding matrices.
  - Performance checked to be on par with the TF implementation on ROCStories: single run evaluation accuracy of 86.4% vs. authors reporting a median accuracy of 85.8% with the TensorFlow code (see details in the example section of the readme).
- **Transformer-XL** pretrained on *WikiText 103* ("Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai, Zhilin Yang et al.). This is a slightly modified version of Google/CMU's PyTorch implementation to match the performance of the TensorFlow version by:
  - untying relative positioning embeddings across layers,
  - changing memory cells initialization to keep sinusoidal positions identical,
  - adding full logits outputs in the adaptive softmax to use it in a generative setting.
  - Performance checked to be on par with the TF implementation on WikiText 103: evaluation perplexity of 18.213 vs. authors reporting a perplexity of 18.3 on this dataset with the TensorFlow code (see details in the example section of the readme).

New scripts:
- Updated the SQuAD fine-tuning script to work also on SQuAD V2.0 by abeljim and Liangtaiwan
- `run_lm_finetuning.py` lets you pretrain a `BERT` language model or fine-tune it with masked-language-modeling and next-sentence-prediction losses, by deepset-ai, tholor and nhatchan (Python 3.5 compatibility)

Backward compatibility:
- The library is now also compatible with Python 2

Improvements and bug fixes:
- add a `never_split` option and arguments to the tokenizers (WrRan)
- better handle errors when BERT is fed with inputs that are too long (patrick-s-h-lewis)
- better initialization of the layer normalization layer, and a bug fix in the example scripts where `args.do_lower_case` was always True (donglixp)
- fix learning rate schedule issue in example scripts (matej-svejda)
- readme fixes (danyaljj, nhatchan, davidefiocco, girishponkiya)
- importing unofficial TF models in BERT (nhatchan)
- only keep the active part of the loss for token classification (Iwontbecreative)
- fix argparse type error in example scripts (ksurya)
- docstring fixes (rodgzilla, wlhgtc)
- improving `run_classifier.py` loading of saved models (SinghJasdeep)
- In examples scripts: allow do_eval to be used without do_train and to use the pretrained model in the output folder (jaderabbit, likejazz and JoeDumoulin)
- in `run_squad.py`: fix error when `bert_model` param is path or url (likejazz)
- add license to source distribution and use entry-points instead of scripts (sodre)
