News
====
- Tutorial proposals for GluonNLP have been accepted at [EMNLP 2019](https://www.emnlp-ijcnlp2019.org/), Hong Kong, and [KDD 2019](https://www.kdd.org/kdd2019/), Anchorage.
Models and Scripts
==================
- BERT pre-training on BooksCorpus and English Wikipedia with mixed precision and gradient accumulation on GPUs. Fine-tuning the resulting checkpoint achieves the following validation-set results (482, 505, 489). Thank you haven-jeon

  | Dataset | MRPC   | SQuAD 1.1 (EM/F1) | SST-2 | MNLI-mm |
  |:-------:|:------:|:-----------------:|:-----:|:-------:|
  | Score   | 87.99% | 80.99/88.60       | 93%   | 83.6%   |
- BERT fine-tuning on various sentence classification datasets with checkpoints converted from the official repository (600, 571, 481). Thank you kenjewu and haven-jeon

  | Dataset | MRPC  | RTE   | SST-2 | MNLI-m/mm     |
  |:-------:|:-----:|:-----:|:-----:|:-------------:|
  | Score   | 88.7% | 70.8% | 93%   | 84.55%/84.66% |
- BERT fine-tuning on question answering datasets with checkpoints converted from the official repository (493). Thank you fiercex

  | Dataset | SQuAD 1.1      | SQuAD 1.1       | SQuAD 2.0       |
  |:-------:|:--------------:|:---------------:|:---------------:|
  | Model   | bert_12_768_12 | bert_24_1024_16 | bert_24_1024_16 |
  | F1/EM   | 88.53/80.98    | 90.97/84.05     | 81.02/77.96     |
- BERT model conversion scripts for checkpoints from the original TensorFlow repository, plus the following newly converted models (456, 461, 449). Thank you fiercex
  - Multilingual Wikipedia (cased, BERT Base)
  - Chinese Wikipedia (cased, BERT Base)
  - BooksCorpus & English Wikipedia (uncased, BERT Large)
- Scripts and a command-line interface for computing BERT embeddings of raw sentences (587, 618). Thank you imgarylai
- Scripts for exporting BERT models for deployment (624)
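Gradient accumulation, used in the pre-training item above, sums gradients over several micro-batches before a single optimizer step, so a large effective batch fits in limited GPU memory. The following is a minimal pure-Python sketch on a toy one-parameter least-squares model; the data and function names are hypothetical and this is not the GluonNLP pre-training script. It checks that scaled micro-batch gradients add up to the full-batch gradient. Mixed precision is not shown.

```python
def grad_mse(w, batch):
    """Gradient of the mean of 0.5 * (w * x - y)^2 over the batch (toy model)."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, num_micro):
    """Split `batch` into `num_micro` equal micro-batches and accumulate their gradients."""
    if len(batch) % num_micro:
        raise ValueError("batch size must be divisible by num_micro in this sketch")
    size = len(batch) // num_micro
    total = 0.0
    for i in range(num_micro):
        micro = batch[i * size:(i + 1) * size]
        # Scale each micro-batch gradient so the sum equals the full-batch mean gradient.
        total += grad_mse(w, micro) / num_micro
    return total


def train(batch, num_micro, lr=0.1, steps=50):
    """Plain SGD with one parameter update per accumulation cycle."""
    w = 0.0
    for _ in range(steps):
        w -= lr * accumulated_grad(w, batch, num_micro)
    return w


data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # hypothetical data with y = 3x
print(abs(accumulated_grad(1.0, data, 2) - grad_mse(1.0, data)) < 1e-12)  # True
print(train(data, num_micro=2))
```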
New Features
============
- [API] Add BERTVocab (509) thanks kenjewu
- [API] Add Transforms for BERT (526) thanks kenjewu
- [API] Add data parallelism for Transformer (387)
- [FEATURE] Add SQuAD 2.0 dataset (551) thanks fiercex
- [FEATURE] Add NumpyDataset (498)
- [FEATURE] Add TruncNorm initializer for BERT (548) thanks Ishitori
- [FEATURE] Add split sampler for distributed training (494)
- [FEATURE] Custom metric for masked accuracy (503)
- [FEATURE] Support custom sampler in SimpleDatasetStream (507)
- [FEATURE] Clip gradient norm by parameter (470)
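A truncated-normal initializer draws from N(mean, stdev²) and redraws any sample that lands more than two standard deviations from the mean; this follows the TensorFlow `truncated_normal` convention used by the original BERT, which initializes weights with stdev 0.02. The following is a rejection-sampling sketch using only the standard library; `trunc_norm` is a hypothetical helper and not the signature of the GluonNLP TruncNorm initializer.

```python
import random


def trunc_norm(size, mean=0.0, stdev=0.02, seed=None):
    """Return `size` samples from N(mean, stdev^2) truncated to [mean - 2*stdev, mean + 2*stdev].

    Hypothetical helper for illustration; uses rejection sampling.
    """
    rng = random.Random(seed)
    lo, hi = mean - 2 * stdev, mean + 2 * stdev
    out = []
    while len(out) < size:
        x = rng.gauss(mean, stdev)
        if lo <= x <= hi:  # keep samples inside the window, redraw the rest
            out.append(x)
    return out


weights = trunc_norm(1000, seed=0)
print(max(abs(w) for w in weights) <= 2 * 0.02)  # True
```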
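Masked accuracy counts only positions whose mask is 1 (for example, the masked tokens in BERT's masked language model) and ignores padding and unmasked positions. The following is a minimal sketch with plain lists; the `MaskedAccuracy` class and its `update`/`get` methods are hypothetical and do not reproduce the GluonNLP metric's API.

```python
class MaskedAccuracy:
    """Accuracy computed only over positions where mask is nonzero (hypothetical sketch)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, preds, mask):
        for y, p, m in zip(labels, preds, mask):
            if m:  # skip padded or unmasked positions entirely
                self.total += 1
                self.correct += int(y == p)

    def get(self):
        # NaN signals that no masked position has been seen yet.
        return self.correct / self.total if self.total else float("nan")


metric = MaskedAccuracy()
metric.update(labels=[1, 2, 3, 4], preds=[1, 0, 3, 9], mask=[1, 1, 1, 0])
print(metric.get())  # 0.6666666666666666
```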
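Clipping by parameter, as listed above, can be read as rescaling each parameter's gradient independently so that its own L2 norm is at most `max_norm`; global clipping instead computes one scale factor from the combined norm of all gradients. The following sketch contrasts the two on a dict of lists. Both function names are hypothetical, and the exact semantics of the GluonNLP change are an assumption here.

```python
import math


def clip_by_parameter(grads, max_norm):
    """Rescale each parameter's gradient so that its own L2 norm is <= max_norm."""
    clipped = {}
    for name, g in grads.items():
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        clipped[name] = [v * scale for v in g]
    return clipped


def clip_global(grads, max_norm):
    """For comparison: one shared scale factor from the norm of all gradients together."""
    total = math.sqrt(sum(v * v for g in grads.values() for v in g))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return {name: [v * scale for v in g] for name, g in grads.items()}


grads = {"embed": [3.0, 4.0], "bias": [0.3, 0.4]}  # hypothetical gradients
print(clip_by_parameter(grads, 1.0))  # "bias" (norm 0.5) is left unchanged
print(clip_global(grads, 1.0))        # "bias" is shrunk along with "embed"
```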
Bug Fixes
=========
- [BUGFIX] Fix Data Preprocessing for Translation Data (568)
- [FIX] Fix parameter clipping (527)
- [FIX] Fix divergence of the training of transformer (543)
- [FIX] Fix documentation and a bug in NCE Block (558)
- [FIX] Fix hashing single ngrams in NGramHashes (450)
- [FIX] Fix weight tying in BERTModel.decoder for BERT pre-training (500)
- [BUGFIX] Modify the fastText classification training for accurate mean pooling (529) thanks sravanbabuiitm
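Accurate mean pooling over a padded batch divides each sequence's summed embeddings by its true length rather than by the padded length; averaging over padding would pull short sequences toward zero. The following is a minimal sketch with nested lists; `mean_pool` and its arguments are hypothetical and not the code of the fastText classification script. It assumes every valid length is at least 1.

```python
def mean_pool(batch, valid_lengths):
    """Average each padded sequence of vectors over its first `n` steps only (assumes n >= 1)."""
    pooled = []
    for seq, n in zip(batch, valid_lengths):
        dim = len(seq[0])
        sums = [0.0] * dim
        for vec in seq[:n]:  # positions after n are padding and are ignored
            for j, v in enumerate(vec):
                sums[j] += v
        pooled.append([s / n for s in sums])
    return pooled


padded = [[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],  # true length 2, one padding step
          [[6.0, 6.0], [0.0, 0.0], [0.0, 0.0]]]  # true length 1, two padding steps
print(mean_pool(padded, [2, 1]))  # [[2.0, 3.0], [6.0, 6.0]]
```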
API Changes
===========
- [API] BERT return intermediate encodings per layer (606) thanks Ishitori
- [API] Better handle case when backoff is not possible in TokenEmbedding (459)
- [FIX] Rename wiki_cn/wiki_multilingual to wiki_cn_cased/wiki_multilingual_uncased (594) thanks kenjewu
- [FIX] Update default value of BERTAdam epsilon to 1e-6 (601)
- [FIX] Fix BERT decoder API for masked language model prediction (501)
- [FIX] Remove bias correction term in BERTAdam (499)
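The two BERTAdam changes above (no bias-correction term, epsilon 1e-6) match the update rule of the original TensorFlow BERT optimizer, which also applies decoupled weight decay. The following pure-Python sketch of one such step uses hypothetical names; the defaults other than `epsilon=1e-6` are assumptions and this is not the GluonNLP BERTAdam signature. Because m and v start at zero and are never rescaled by 1 / (1 - beta^t), the first step has magnitude about lr · (1 - beta1) / sqrt(1 - beta2) ≈ 3.16 · lr rather than about lr.

```python
import math


def bert_adam_step(params, grads, state, lr=1e-4, beta1=0.9, beta2=0.999,
                   epsilon=1e-6, wd=0.01):
    """One Adam-style update without bias correction, plus decoupled weight decay.

    Hypothetical sketch: `params` is updated in place and also returned.
    """
    m, v = state["m"], state["v"]
    for i, (w, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # No division by (1 - beta1**t) or (1 - beta2**t): bias correction is omitted.
        update = m[i] / (math.sqrt(v[i]) + epsilon) + wd * w
        params[i] = w - lr * update
    return params


params = [1.0]
state = {"m": [0.0], "v": [0.0]}
bert_adam_step(params, [0.5], state, wd=0.0)
print(params)  # first step moves the weight by about 3.16 * lr
```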
Enhancements
============
- [BUGFIX] Use glove.840B.300d for NLI experiments (567)
- [API] Add debug option for parallel (584)
- [FEATURE] Skip dropout layer in Transformer when rate=0 (597) thanks TaoLv
- [FEATURE] Update sharded loader (468)
- [FIX] Update BERTLayerNorm Implementation (485)
- [TUTORIAL] Use FixedBucketSampler in BERT tutorial for better performance (506) thanks Ishitori
- [API] Add BERT tokenizer to transforms.py (464) thanks fiercex
- [FEATURE] Add data parallel to big rnn lm script (564)
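Bucketing samplers such as the FixedBucketSampler used in the BERT tutorial group sequences of similar length into the same batch, which reduces padding and wasted computation. The following is a minimal sketch of the idea with fixed bucket boundaries; `bucket_batches` is a hypothetical helper, does not shuffle, and does not reproduce the arguments or behavior of `gluonnlp.data.FixedBucketSampler`.

```python
import bisect


def bucket_batches(lengths, boundaries, batch_size):
    """Group sample indices into length buckets, then cut each bucket into batches.

    A sample of length L goes to the first bucket whose upper boundary is >= L;
    lengths above the last boundary go to an extra overflow bucket.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for idx, length in enumerate(lengths):
        buckets[bisect.bisect_left(boundaries, length)].append(idx)
    batches = []
    for bucket in buckets:
        for start in range(0, len(bucket), batch_size):
            batches.append(bucket[start:start + batch_size])
    return batches


lengths = [5, 48, 7, 50, 12, 45, 6, 13]  # hypothetical sequence lengths
print(bucket_batches(lengths, boundaries=[10, 20, 50], batch_size=2))
# [[0, 2], [6], [4, 7], [1, 3], [5]]
```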
Minor Fixes
===========
- Various documentation fixes: 484, 613, 614, 438, 448, 550, 563, 611, 605, 440, 554, 445, 556, 603, 483, 576, 610, 547, 458, 574, 510, 447, 465, 436, 622, 583 thanks anuragsarkar97 and brettkoonce
- [FIX] Fix repeated unzipping in SQuAD dataset (553)
- [FIX] web fixes (453)
- [FIX] Remove unused argument in fasttext_word_ngram.py (486) thanks kurtjanssensai
- [FIX] Remove unused code (528)
- [FIX] Remove unused code in text_classification script (442)
- [MISC] Bump up version (454)
- [BUGFIX] Fix pylint error (549)
- [FIX] Simplify the data preprocessing code for the sentiment analysis script (462)
- [FEATURE] BERT doc fixes and script usability enhancements (444)
- [FIX] Fix Py2 compatibility of machine_translation/dataprocessor.py (541) thanks ymjiang
- [BUGFIX] Fix GluonNLP MXNet dependency (555)
- [BUGFIX] Fix Weight Drop and Test (546)
- [CI] Add version upper bound to doc.yml (467)
- [CI] Speed up tests (582)
- [CI] Upgrade MXNet to 1.4.0 (617)
- [FIX] Revert an unintended change (525)
- [BUGFIX] Update paths and imports in BERT scripts (634)