## Name change: welcome PyTorch-Transformers 👾
`pytorch-pretrained-bert` => `pytorch-transformers`
Install with `pip install pytorch-transformers`
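For instance, a quick way to check the install is to load a model under the new package name (a minimal sketch using the standard Bert checkpoint):

```python
# The package now imports as `pytorch_transformers` (previously `pytorch_pretrained_bert`)
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
```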
## New models
- **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
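Both new models are loaded through the same `from_pretrained()` interface as the existing ones (a minimal sketch; the shortcut names below are the base XLNet and English XLM checkpoints shipped with this release):

```python
from pytorch_transformers import XLNetModel, XLNetTokenizer, XLMModel, XLMTokenizer

# XLNet
xlnet_tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
xlnet_model = XLNetModel.from_pretrained('xlnet-base-cased')

# XLM
xlm_tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
xlm_model = XLMModel.from_pretrained('xlm-mlm-en-2048')
```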
## New pretrained weights
The number of pretrained model weights has grown from ten (in `pytorch-pretrained-bert` 0.6.2) to twenty-seven (in `pytorch-transformers` 1.0).
In summary, the newly added model weights are:
- Two `Whole-Word-Masking` weights for Bert (cased and uncased)
- Three fine-tuned Bert models (on SQuAD and MRPC)
- One German Bert model provided and trained by Deepset.ai (tholor and Timoeller), as detailed in their nice [blog post](https://deepset.ai/german-bert)
- One OpenAI GPT-2 model (medium size model)
- Two models (base and large) for the newly added XLNet model
- Eight models for the newly added XLM model
The [documentation lists all the models with their shortcut names](https://huggingface.co/pytorch-transformers/pretrained_models.html), and we are currently adding full details of the associated pretraining/fine-tuning parameters.
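For instance, the new weights load with the usual shortcut names (a minimal sketch; refer to the documentation page above for the exact list):

```python
from pytorch_transformers import BertModel, BertForQuestionAnswering, GPT2Model

# Whole-Word-Masking Bert weights
wwm_bert = BertModel.from_pretrained('bert-large-uncased-whole-word-masking')

# Bert fine-tuned on SQuAD with Whole-Word-Masking
squad_bert = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# German Bert from Deepset.ai
german_bert = BertModel.from_pretrained('bert-base-german-cased')

# Medium-size GPT-2
gpt2_medium = GPT2Model.from_pretrained('gpt2-medium')
```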
## New documentation
New documentation is currently being created at https://huggingface.co/pytorch-transformers/ and should be finalized over the coming days.
## Standard API across models
See the [readme](https://github.com/huggingface/pytorch-transformers#quick-tour) for a quick tour of the API.
Main points:
- All models now return `tuples` with various elements depending on the model and the configuration. The docstrings and [documentation](https://huggingface.co/pytorch-transformers/model_doc/bert.html#bertmodel) list all the expected outputs in order.
- All models can now return the full list of hidden-states (the embeddings output plus the hidden-states of each layer)
- All models can now return the full list of attention weights (one tensor of attention weights per layer)
```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

# Instantiate the model so that it also returns all hidden-states and attention weights
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,
                                  output_attentions=True)

input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
all_hidden_states, all_attentions = model(input_ids)[-2:]
```
## Standard API to add tokens to the vocabulary and the model
Using `tokenizer.add_tokens()` and `tokenizer.add_special_tokens()`, one can now easily add tokens to each model's vocabulary. The model's input embeddings can then be resized accordingly, adding the associated word embeddings (to be trained), with `model.resize_token_embeddings(len(tokenizer))`.
```python
# Add new tokens to the vocabulary and resize the model's embedding matrix to match
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
```
## Serialization
The serialization methods have been standardized; if you were using any other serialization method before, you should switch to the new `save_pretrained(save_directory)` method.
```python
from pytorch_transformers import BertForSequenceClassification, BertTokenizer

# Save a trained model and its tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')

# Reload the model and the tokenizer from that directory
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```
## TorchScript
All models are now compatible with TorchScript.
```python
# Instantiate the model in TorchScript mode, then trace it with a batch of input ids
model = model_class.from_pretrained(pretrained_weights, torchscript=True)
traced_model = torch.jit.trace(model, (input_ids,))
```
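The traced model can then be saved and reloaded independently of the Python model class (a short sketch assuming the `traced_model` and `input_ids` from the snippet above; the filename is only an example):

```python
import torch

# Save the traced graph to disk and reload it without re-instantiating the Python class
torch.jit.save(traced_model, 'traced_bert.pt')
loaded_model = torch.jit.load('traced_bert.pt')
outputs = loaded_model(input_ids)
```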
## Example scripts
The example scripts have been refactored and gathered into three main examples (`run_glue.py`, `run_squad.py` and `run_generation.py`) which are common to several models and are designed to offer state-of-the-art performance on the respective tasks while remaining clean starting points for designing your own scripts.
Other example scripts (like `run_bertology.py`) will be added in the coming weeks.
## Breaking changes
The [migration section](https://github.com/huggingface/pytorch-transformers#migrating-from-pytorch-pretrained-bert-to-pytorch-transformers) of the readme lists the breaking changes when switching from `pytorch-pretrained-bert` to `pytorch-transformers`.
The main breaking change is that all models now return a `tuple` of results, as illustrated in the sketch below.
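For example (a minimal sketch with a `BertModel`; unpack whichever elements of the tuple you need):

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])

# Previously (pytorch-pretrained-bert): encoded_layers, pooled_output = model(input_ids)
# Now (pytorch-transformers): the model returns a tuple; index into it for what you need
outputs = model(input_ids)
last_hidden_states = outputs[0]  # hidden-states of the last layer
pooled_output = outputs[1]       # pooled output for the [CLS] token
```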