General updates:
- Better serialization for all models and tokenizers (BERT, GPT, GPT-2 and Transformer-XL) with [best practices for saving/loading](https://github.com/huggingface/pytorch-pretrained-BERT#serialization-best-practices) in the readme and examples (see the sketch after this list).
- Relaxed network connection requirements (fall back on the last downloaded model in the cache when AWS can't be reached to check the eTag)
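
As a rough illustration of the saving/loading pattern, here is a minimal sketch for a BERT classifier, assuming a hypothetical `./fine_tuned_model` output directory and the `pytorch_model.bin` / `bert_config.json` / `vocab.txt` file names that `from_pretrained` expects; the readme section linked above is the authoritative reference.

```python
import os
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

# Stand-ins for a model and tokenizer coming out of a fine-tuning run.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

output_dir = './fine_tuned_model'  # hypothetical output directory
os.makedirs(output_dir, exist_ok=True)

# Save: unwrap DataParallel/DistributedDataParallel before saving, then write
# the weights, the config and the vocabulary next to each other.
model_to_save = model.module if hasattr(model, 'module') else model
torch.save(model_to_save.state_dict(), os.path.join(output_dir, 'pytorch_model.bin'))
with open(os.path.join(output_dir, 'bert_config.json'), 'w') as f:
    f.write(model_to_save.config.to_json_string())
tokenizer.save_vocabulary(output_dir)  # tokenizer serialization added in this release

# Load: point `from_pretrained` at the directory containing the files above.
model = BertForSequenceClassification.from_pretrained(output_dir, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(output_dir, do_lower_case=True)
```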
Breaking changes:
- The `warmup_linear` method in `OpenAIAdam` and `BertAdam` is replaced by flexible [schedule classes](https://github.com/huggingface/pytorch-pretrained-BERT#learning-rate-schedules) for linear, cosine and multi-cycle schedules (see the sketch below).
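
A minimal sketch of the new pattern: instead of calling `warmup_linear` by hand, the optimizer is configured with a warmup proportion, the total number of training steps and a named schedule, and adjusts the learning rate internally. The toy model and step count below are placeholders; the class-based schedules are described in the linked readme section.

```python
import torch
from pytorch_pretrained_bert.optimization import BertAdam

model = torch.nn.Linear(10, 2)  # stand-in for a pretrained model
num_train_steps = 1000          # hypothetical total number of optimization steps

optimizer = BertAdam(model.parameters(),
                     lr=2e-5,
                     warmup=0.1,               # warm up over the first 10% of steps
                     t_total=num_train_steps,  # total steps, needed by the schedule
                     schedule='warmup_linear') # or e.g. 'warmup_cosine', 'warmup_constant'

# The learning rate now follows the schedule automatically on each call to
# optimizer.step(); no manual warmup_linear(...) adjustment is needed.
```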
Bug fixes and improvements to the library modules:
- Add a flag in BertTokenizer to skip basic tokenization (john-hewitt); see the sketch after this list
- Allow tokenization of sequences > 512 (CatalinVoss)
- Clean up and extend learning rate schedules in BertAdam and OpenAIAdam (lukovnikov)
- Update GPT/GPT-2 Loss computation (CatalinVoss, thomwolf)
- Make the TensorFlow conversion tool more robust (marpaia)
- Fix BertForMultipleChoice model init and forward pass (dhpollack)
- Fix gradient overflow in GPT-2 FP16 training (SudoSharma)
- Catch exception if pathlib is not installed (potatochip)
- Use Dropout Layer in OpenAIGPTMultipleChoiceHead (pglock)
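
A minimal sketch of the tokenizer flag mentioned above, assuming it is exposed as the `do_basic_tokenize` keyword of `BertTokenizer`: when set to `False`, the input is only whitespace-split and run through WordPiece, skipping lower-casing and punctuation splitting, which is useful for pre-tokenized input.

```python
from pytorch_pretrained_bert import BertTokenizer

# Default behaviour: basic tokenization (lower-casing, punctuation splitting)
# followed by WordPiece.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

# Assumed new flag: skip the basic tokenizer and apply WordPiece directly to
# the whitespace-separated tokens.
raw_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased',
                                              do_basic_tokenize=False)

print(tokenizer.tokenize("Doesn't this work?"))
print(raw_tokenizer.tokenize("Doesn't this work?"))
```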
New scripts and improvements to the examples scripts:
- Add BERT language model fine-tuning scripts (Rocketknight1)
- Added SST-2 and the remaining GLUE tasks to `run_classifier.py` (ananyahjha93, jplehmann)
- GPT-2 generation fixes (CatalinVoss, spolu, dhanajitb, 8enmann, SudoSharma, cynthia)