| STS-B | **87.5** | 85.8 | 86.5 |
| MNLI-m/mm | 85.3/84.9 | 84.6/83.4 | **86.7/85.9** |
- The SciBERT model introduced by *Iz Beltagy, Arman Cohan, and Kyle Lo* in "[SciBERT: Pretrained Contextualized Embeddings for Scientific Text](https://arxiv.org/abs/1903.10676)". The model checkpoints are converted from AllenAI's [original repository](https://github.com/allenai/scibert) and are available under the following dataset names (735):
- `scibert_scivocab_uncased`
- `scibert_scivocab_cased`
- `scibert_basevocab_uncased`
- `scibert_basevocab_cased`
- The BioBERT model introduced by *Lee, Jinhyuk, et al.* in "[BioBERT: a pre-trained biomedical language representation model for biomedical text mining](https://arxiv.org/abs/1901.08746)". The model checkpoints are converted from the [original repository](https://github.com/naver/biobert-pretrained) and are available under the following dataset names (735):
- `biobert_v1.0_pmc_cased`
- `biobert_v1.0_pubmed_cased`
- `biobert_v1.0_pubmed_pmc_cased`
- `biobert_v1.1_pubmed_cased`
- The ClinicalBERT model introduced by *Kexin Huang, Jaan Altosaar, and Rajesh Ranganath* in "[ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission](https://arxiv.org/abs/1904.05342)". The model checkpoints are converted from the [original repository](https://github.com/kexinhuang12345/clinicalBERT) and are available under the `clinicalbert_uncased` dataset name (735)
- The ERNIE model introduced by *Sun, Yu, et al.* in "[ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223)". The model checkpoints, converted from the [original repository](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE), can be loaded with `model.get_model("ernie_12_768_12", "baidu_ernie_uncased")` (759) thanks paperplanet
- BERT fine-tuning script for named entity recognition on CoNLL2003 with test F1 92.2 (612).
- BERT fine-tuning script for the Chinese XNLI dataset with 78.3% validation accuracy (759) thanks paperplanet
- BERT fine-tuning script for intent classification and slot labelling on ATIS (95.9 F1) and SNIPS (95.9 F1) (817)
GPT-2
- The GPT-2 language model introduced by *Radford, Alec, et al.* in "[Language Models are Unsupervised Multitask Learners](https://www.techbooky.com/wp-content/uploads/2019/02/Better-Language-Models-and-Their-Implications.pdf)". The model checkpoints are converted from the [original repository](https://github.com/openai/gpt-2), along with a script to generate text from the GPT-2 models (`gpt2_117m`, `gpt2_345m`) trained on the `openai_webtext` dataset (761).
ESIM
- The ESIM model for text matching introduced by *Chen, Qian, et al.* in "[Enhanced LSTM for Natural Language Inference](https://arxiv.org/abs/1609.06038)". (689)
Data
====
- Natural language understanding with datasets from the GLUE benchmark: CoLA, SST-2, MRPC, STS-B, MNLI, QQP, QNLI, WNLI, RTE (682)
- Sentiment analysis datasets: CR, MPQA (663)
- Intent classification and slot labeling datasets: ATIS and SNIPS (816)
New Features
============
- [FEATURE] Support saving model/trainer states to S3 (700)
- [FEATURE] Support loading model/trainer states from S3 (702)
- [FEATURE] Add SentencePieceTokenizer for BERT (669)
- [FEATURE] Flexible vocabulary (732)
- [API] Move MaskedSoftmaxCELoss and LabelSmoothing to the model API (754) thanks ThomasDelteil
- [FEATURE] Add the `List` batchify function (812) thanks ThomasDelteil
- [FEATURE] Add LAMB optimizer (733)
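The LAMB optimizer (733) rescales an Adam-style update by a layer-wise trust ratio `||w|| / ||update||`. The following is a minimal pure-Python sketch of one update step for a single parameter "layer"; `lamb_update` is an illustrative name only, and the actual GluonNLP implementation differs in details such as gradient clipping and how layers are partitioned:

```python
import math

def lamb_update(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One LAMB step for a single parameter 'layer' (lists of floats).

    Adam-style moments are rescaled layer-wise by the trust ratio
    ||w|| / ||update||, which is what distinguishes LAMB from Adam.
    """
    new_m, new_v, r = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        mi = b1 * mi + (1 - b1) * gi           # first moment estimate
        vi = b2 * vi + (1 - b2) * gi * gi      # second moment estimate
        m_hat = mi / (1 - b1 ** t)             # bias correction
        v_hat = vi / (1 - b2 ** t)
        r.append(m_hat / (math.sqrt(v_hat) + eps) + wd * wi)  # raw update
        new_m.append(mi)
        new_v.append(vi)
    w_norm = math.sqrt(sum(x * x for x in w))
    r_norm = math.sqrt(sum(x * x for x in r))
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    new_w = [wi - lr * trust * ri for wi, ri in zip(w, r)]
    return new_w, new_m, new_v
```

Because the step size is normalized per layer, LAMB tolerates the very large batch sizes used in BERT pre-training better than plain Adam.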
Bug Fixes
=========
- [BUGFIX] Fixes for BERT embedding, pretraining scripts (640) thanks Deseaus
- [BUGFIX] Update hash of wiki_cn_cased and wiki_multilingual_cased vocab (655)
- [BUGFIX] Fix BERT forward-call parameter mismatch (695) thanks paperplanet
- [BUGFIX] Fix mlm_loss reporting for eval dataset (696)
- [BUGFIX] Fix `_get_rnn_cell` (648) thanks MarisaKirisame
- [BUGFIX] Fix MRPC dataset idx (708)
- [BUGFIX] Fix hybrid beam search sampler (710)
- [BUGFIX] [DOC] Update nlp.model.get_model documentation and get_model API (734)
- [BUGFIX] Fix handling of duplicate special tokens in Vocabulary (749)
- [BUGFIX] Fix TokenEmbedding serialization with `emb[emb.unknown_token] != 0` (763)
- [BUGFIX] Fix glue test result serialization (773)
- [BUGFIX] Fix init bug for multilevel BiLMEncoder (783) thanks Ishitori
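The duplicate-special-token fix (749) can be illustrated with a small stdlib sketch; `build_vocab` is a hypothetical helper, not the GluonNLP API. The point is that when the same token is supplied for more than one special role (e.g. both unknown and padding), it must map to a single index rather than create duplicate entries:

```python
def build_vocab(counter, special_tokens):
    """Assign indices, giving each distinct special token exactly one slot.

    Passing the same token for two special roles (e.g. unknown and padding)
    must not create duplicate entries -- the failure mode fixed in (749).
    """
    idx_to_token = []
    token_to_idx = {}
    for tok in special_tokens:
        if tok not in token_to_idx:          # deduplicate special tokens
            token_to_idx[tok] = len(idx_to_token)
            idx_to_token.append(tok)
    # Regular tokens follow, ordered by descending frequency then alphabetically.
    for tok, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if tok not in token_to_idx:
            token_to_idx[tok] = len(idx_to_token)
            idx_to_token.append(tok)
    return idx_to_token, token_to_idx
```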
API Changes
===========
- [API] Dropping support for wiki_multilingual and wiki_cn (764)
- [API] Remove get_bert_model from the public API list (767)
Enhancements
============
- [FEATURE] Add `load_w2v_binary` method to load word2vec binary files (620)
- [Script] Add inference function for BERT classification (639) thanks TaoLv
- [SCRIPT] - Add static BERT base export script (for use with MXNet Module API) (672)
- [Enhancement] One script to export BERT for classification/regression/QA (705)
- [Enhancement] Refactor BERT fine-tuning script (692)
- [Enhancement] Use only the best model for inference in BERT classification (716)
- [Dataset] Redistribute CoNLL2004 (719)
- [Enhancement] add periodic evaluation for BERT pre-training (720)
- [FEATURE] Add XNLI task (717)
- [refactor] Refactor BERT script folder (744)
- [Enhancement] BERT pre-training data generation from sentencepiece vocab (743)
- [REFACTOR] Refactor TokenEmbedding to reduce number of places that initialize internals (750)
- [Refactor] Refactor BERT SQuAD inference code (758)
- [Enhancement] Fix dtype conversion, add sentencepiece support for SQuAD (766)
- [Dataset] Move MRPC dataset to API (780)
- [BiDAF-QANet] Common data processing logic for BiDAF and QANet (739) thanks Ishitori
- [DATASET] add LCQMC, ChnSentiCorp dataset (774) thanks paperplanet
- [Improvement] Implement parser evaluation in Python (772)
- [Enhancement] Add whole word masking for BERT (770) thanks basicv8vc
- [Enhancement] Mix precision support for BERT finetuning (793)
- Generate BERT training samples in compressed format (651)
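Whole word masking (770) masks all WordPiece pieces of a word together rather than masking subword pieces independently. A rough stdlib sketch of the grouping idea follows; `whole_word_mask` is illustrative only, and the actual script's sampling details (e.g. the 80/10/10 mask/random/keep rule and the masking budget) are omitted:

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, seed=0):
    """Mask whole words: '##' continuation pieces follow their head piece.

    Returns the masked token list and the indices that were replaced.
    """
    rng = random.Random(seed)
    # Group WordPiece tokens into whole-word spans of indices.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)              # continuation piece
        else:
            words.append([i])                # start of a new word
    masked = list(tokens)
    masked_positions = []
    for span in words:
        if rng.random() < mask_prob:
            for i in span:                   # mask every piece of the word
                masked[i] = "[MASK]"
                masked_positions.append(i)
    return masked, masked_positions
```

For example, a word split as `em ##bed ##ding` is treated as one maskable unit, so the model never sees a partially masked word during pre-training.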
Minor Fixes
===========
- Various documentation fixes: 635, 637, 647, 656, 664, 667, 670, 676, 678, 681, 698, 704, 731, 745, 762, 771, 746, 778, 800, 810, 807, 814 thanks rongruosong, crcrpar, mrchypark, xwind-h
- Fix BERT multiprocessing data creation bug that caused unnecessary dispatching to a single worker (649)
- [BUGFIX] Update BERT test and pre-train script (661)
- Update URL for WS353 (701)
- Bump up version (742)
- [DOC] Update textCNN results (737)
- Add padding value warning (747)
- [TUTORIAL][DOC] Tutorial Updates (802) thanks faramarzmunshi
Continuous Integration
======================
- Skip failing tests on MXNet master (685)
- [CI] Update nodes for CI (686)
- [CI] CI refactoring to speed up tests (566)
- [CI] Fix codecov (693)
- Use fixture for SQuAD dataset tests (699)
- [CI] create zipped notebooks for link check (712)
- Fix test infrastructure for pytest > 4 and bump CI pytest version (728)
- [CI] set root in BERT tests (738)
- Fix conftest.py function_scope_seed (748)
- [CI] Fix links in contribute.rst (752)
- [CI] Update CI dependencies (756)
- Revert "[CI] Update CI dependencies (756)" (769)
- [CI] AWS Batch serverless CI Pipeline for parallel notebook execution during website build step (791)
- [CI] Don't exit pipeline before displaying AWS Batch logfiles (801)
- [CI] Fix for "Don't exit pipeline before displaying AWS Batch logfiles" (803)
- Add license checker (804)
- Enable timeout (813)
- Fix website build on master branch (819)