New pretrained models:
- **OpenAI GPT** pretrained on the *Toronto Book Corpus* ("Improving Language Understanding by Generative Pre-Training" by Alec Radford et al.).
- This is a slightly modified version of our previous PyTorch implementation that increases performance by splitting the word and position embeddings into separate embedding matrices.
- Performance checked to be on par with the TF implementation on ROCStories: single-run evaluation accuracy of 86.4% vs. a median accuracy of 85.8% reported by the authors with the TensorFlow code (see details in the example section of the readme and the OpenAI GPT usage sketch just after this list).
- **Transformer-XL** pretrained on *WikiText 103* ("Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai, Zhilin Yang et al.). This is a slightly modified version of Google/CMU's PyTorch implementation to match the performance of the TensorFlow version by:
- untying the relative positional embeddings across layers,
- changing the initialization of the memory cells to keep the sinusoidal positions identical,
- adding full logits outputs in the adaptive softmax so it can be used in a generative setting.
- Performance checked to be on par with the TF implementation on WikiText 103: evaluation perplexity of 18.213 vs. a perplexity of 18.3 reported by the authors on this dataset with the TensorFlow code (see details in the example section of the readme and the Transformer-XL usage sketch just after this list).
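A minimal OpenAI GPT usage sketch, assuming the same `from_pretrained` loading convention as the BERT classes of this library (the prompt text is just illustrative):

```python
import torch
from pytorch_pretrained_bert import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

# Load the pretrained OpenAI GPT tokenizer and language-modeling head
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
model.eval()

# Encode a short prompt and score the next token
tokens = tokenizer.tokenize("the cat sat on the")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    lm_logits = model(input_ids)  # no labels passed, so the head returns the LM logits
next_token_id = torch.argmax(lm_logits[0, -1]).item()
```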
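A sketch of running the pretrained Transformer-XL over two consecutive segments, reusing the memory cells so attention can look beyond a fixed length (the text is just illustrative; the returned prediction scores cover the full vocabulary thanks to the adaptive softmax change above):

```python
import torch
from pytorch_pretrained_bert import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')
model.eval()

# Two consecutive segments: the memory returned for the first segment is fed
# back when processing the second one
ids_1 = torch.tensor([tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Who was Jim Henson ?"))])
ids_2 = torch.tensor([tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Jim Henson was a puppeteer"))])

with torch.no_grad():
    predictions_1, mems_1 = model(ids_1)                # prediction scores + memory cells
    predictions_2, mems_2 = model(ids_2, mems=mems_1)   # reuse the memory from the previous segment
```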
New scripts:
- Updated the SQuAD fine-tuning script to also work on SQuAD V2.0, by abeljim and Liangtaiwan
- `run_lm_finetuning.py` lets you pretrain a `BERT` language model or fine-tune it with the masked-language-modeling and next-sentence-prediction losses, by deepset-ai, tholor and nhatchan (Python 3.5 compatibility)
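The script drives the full data preparation and training loop; the sketch below only illustrates the two losses it optimizes, using the library's `BertForPreTraining` head on a toy, hand-built batch (the tokens and labels are made up for illustration):

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForPreTraining.from_pretrained('bert-base-uncased')
model.train()

# Toy example: sentence A = "the [MASK] sat", sentence B = "it was black"
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(
    ["[CLS]", "the", "[MASK]", "sat", "[SEP]", "it", "was", "black", "[SEP]"])])
token_type_ids = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1]])

masked_lm_labels = torch.full_like(input_ids, -1)   # -1 marks positions ignored by the MLM loss
masked_lm_labels[0, 2] = tokenizer.convert_tokens_to_ids(["cat"])[0]
next_sentence_label = torch.tensor([0])             # 0 = sentence B really follows sentence A

# With both labels provided, the forward pass returns the summed MLM + NSP loss
loss = model(input_ids, token_type_ids,
             masked_lm_labels=masked_lm_labels,
             next_sentence_label=next_sentence_label)
loss.backward()
```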
Backward compatibility:
- The library is now also compatible with Python 2
Improvements and bug fixes:
- add a `never_split` option and arguments to the tokenizers (WrRan); see the tokenizer sketch at the end of this list
- better handle errors when BERT is fed inputs that are too long (patrick-s-h-lewis)
- better initialization of the layer normalization layers, and a bug fix in the example scripts where `args.do_lower_case` was always True (donglixp)
- fix learning rate schedule issue in example scripts (matej-svejda)
- readme fixes (danyaljj, nhatchan, davidefiocco, girishponkiya)
- importing unofficial TF models in BERT (nhatchan)
- only keep the active part of the loss for token classification (Iwontbecreative); see the loss-masking sketch at the end of this list
- fix argparse type error in example scripts (ksurya)
- docstring fixes (rodgzilla, wlhgtc)
- improving `run_classifier.py` loading of saved models (SinghJasdeep)
- in the example scripts: allow `do_eval` to be used without `do_train` and to use the pretrained model in the output folder (jaderabbit, likejazz and JoeDumoulin)
- in `run_squad.py`: fix an error when the `bert_model` parameter is a path or a URL (likejazz)
- add license to source distribution and use entry-points instead of scripts (sodre)
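For the `never_split` tokenizer option above, a sketch of the intended usage (whether the keyword is forwarded through `from_pretrained` exactly like this, and the default token set shown, are assumptions; the argument can otherwise be passed to the tokenizer constructor directly):

```python
from pytorch_pretrained_bert import BertTokenizer

# Tokens listed in never_split are protected from being lower-cased or split
# apart by the basic tokenizer, which is useful for special marker tokens.
tokenizer = BertTokenizer.from_pretrained(
    'bert-base-uncased',
    never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]"))
print(tokenizer.tokenize("[CLS] hello world [SEP]"))
```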
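For the token-classification loss fix above (the "active part of the loss" item), the idea is along these lines, assuming the usual attention-mask convention (1 = real token, 0 = padding):

```python
import torch
from torch.nn import CrossEntropyLoss

def token_classification_loss(logits, labels, attention_mask, num_labels):
    # Only positions where attention_mask == 1 contribute to the loss,
    # so padding tokens no longer dilute the gradient.
    loss_fct = CrossEntropyLoss()
    active = attention_mask.view(-1) == 1
    active_logits = logits.view(-1, num_labels)[active]
    active_labels = labels.view(-1)[active]
    return loss_fct(active_logits, active_labels)
```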