Gluonnlp

Latest version: v0.10.0

0.7.0

News
====
- GluonNLP will be featured at KDD 2019 in Anchorage, Alaska! Check out our tutorial: [From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond](https://www.kdd.org/kdd2019/hands-on-tutorials).
- GluonNLP was featured at JSALT 2019 in Montreal on 2019-06-14! Check out https://jsalt19.mxnet.io.

Models and Scripts
==================
BERT
- BERT model pre-trained on [OpenWebText Corpus](https://skylion007.github.io/OpenWebTextCorpus/), BooksCorpus, and English Wikipedia. Test scores on the GLUE benchmark are reported below. We also improved the usability of the BERT pre-training script with on-the-fly training data generation, SentencePiece, and Horovod support, among other changes (799, 687, 806, 669, 665). Thank you davisliang

| Source | GluonNLP | google-research/bert | google-research/bert |
|-----------|-----------------------------------------|-----------------------------|-----------------------------|
| Model | bert_12_768_12 | bert_12_768_12 | bert_24_1024_16 |
| Dataset | `openwebtext_book_corpus_wiki_en_uncased` | `book_corpus_wiki_en_uncased` | `book_corpus_wiki_en_uncased` |
| SST-2 | **95.3** | 93.5 | 94.9 |
| RTE | **73.6** | 66.4 | 70.1 |
| QQP | **72.3** | 71.2 | 72.1 |

0.6.0

News
====
- Tutorial proposal for GluonNLP is accepted at [EMNLP 2019](https://www.emnlp-ijcnlp2019.org/), Hong Kong, and [KDD 2019](https://www.kdd.org/kdd2019/), Anchorage.

Models and Scripts
==================
- BERT pre-training on BooksCorpus and English Wikipedia with mixed precision and gradient accumulation on GPUs. Fine-tuning from the produced checkpoint achieved the following results on the validation sets (482, 505, 489). Thank you haven-jeon

| Dataset | MRPC | SQuAD 1.1 | SST-2 | MNLI-mm |
|:----------:|:--------------:|:--------------:|:-----------:|:-------:|
| Score | 87.99% | 80.99/88.60 | 93% | 83.6% |

- BERT fine-tuning on various sentence classification datasets with checkpoints converted from the official repository (600, 571, 481). Thank you kenjewu haven-jeon

| Dataset | MRPC | RTE | SST-2 | MNLI-m/mm |
|:---------:|:--------------:|:--------------:|:--------------:|:--------------:|
| Score | 88.7% | 70.8% | 93% | 84.55%, 84.66% |

- BERT fine-tuning on question answering datasets with checkpoints converted from the official repository (493). Thank you fiercex

| Dataset | SQuAD 1.1 | SQuAD 1.1 | SQuAD 2.0 |
|:---------:|:---------------:|:---------------:|:-------------:|
| Model | bert_12_768_12 | bert_24_1024_16 | bert_24_1024_16 |
| F1/EM | 88.53/80.98 | 90.97/84.05 | 77.96/81.02 |

- BERT model conversion scripts for checkpoints from the original TensorFlow repository, and more converted models (456, 461, 449). Thank you fiercex:
  - Multilingual Wikipedia (cased, BERT Base)
  - Chinese Wikipedia (cased, BERT Base)
  - Books Corpus & English Wikipedia (uncased, BERT Large)
- Scripts and command line interface for BERT embedding of raw sentences (587, 618). Thank you imgarylai
- Scripts for exporting BERT model for deployment (624)
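
To make the gradient accumulation mentioned in the BERT pre-training entry concrete, here is a framework-free sketch in plain Python. It is not the GluonNLP pre-training script: the scalar least-squares model and the helper names `grad_sum` and `accumulated_grad` are hypothetical. It illustrates that summing gradients over several micro-batches and normalizing once by the total example count reproduces the full-batch gradient, which lets a larger effective batch fit in limited GPU memory.

```python
def grad_sum(w, batch):
    """Summed (not averaged) gradient of sum((w * x - y) ** 2) w.r.t. scalar w."""
    return sum(2 * (w * x - y) * x for x, y in batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate summed gradients over micro-batches, then normalize once."""
    total = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro_batch = data[start:start + micro_batch_size]
        total += grad_sum(w, micro_batch)  # accumulate; no weight update yet
    return total / len(data)  # divide by the effective (full) batch size


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w, lr = 0.5, 0.01
full_batch = grad_sum(w, data) / len(data)
accumulated = accumulated_grad(w, data, micro_batch_size=2)
print(abs(full_batch - accumulated) < 1e-9)  # True
w -= lr * accumulated  # a single optimizer step after all micro-batches
```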

New Features
============
- [API] Add BERTVocab (509) thanks kenjewu
- [API] Add Transforms for BERT (526) thanks kenjewu
- [API] add data parallel for transformer (387)
- [FEATURE] Add squad2.0 Dataset (551) thanks fiercex
- [FEATURE] Add NumpyDataset (498)
- [FEATURE] Add TruncNorm initializer for BERT (548) thanks Ishitori
- [FEATURE] Add split sampler for distributed training (494)
- [FEATURE] Custom metric for masked accuracy (503)
- [FEATURE] Support custom sampler in SimpleDatasetStream (507)
- [FEATURE] clip gradient norm by parameter (470)
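
Two of the features above, masked accuracy (503) and the split sampler for distributed training (494), can be illustrated with small stdlib-only sketches. The function names and the contiguous partitioning scheme are assumptions for illustration; they are not the GluonNLP classes, whose signatures and edge-case handling may differ.

```python
def masked_accuracy(predictions, labels, mask):
    """Fraction of correct predictions, counted only where mask is nonzero."""
    correct = sum(1 for p, y, m in zip(predictions, labels, mask) if m and p == y)
    total = sum(1 for m in mask if m)
    return correct / total if total else 0.0


def split_indices(length, num_parts, part_index):
    """Contiguous share of indices [0, length) for worker `part_index`."""
    part_len = -(-length // num_parts)  # ceiling division
    start = min(part_index * part_len, length)
    end = min(start + part_len, length)
    return list(range(start, end))


# Only masked positions 0, 1 and 3 count; position 2 is ignored.
print(masked_accuracy([3, 7, 7, 1], [3, 5, 7, 2], [1, 1, 0, 1]))
# Three workers sharing a 10-example dataset.
print([split_indices(10, 3, i) for i in range(3)])
```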

Bug Fixes
=========
- [BUGFIX] Fix Data Preprocessing for Translation Data (568)
- [FIX] fix parameter clip (527)
- [FIX] Fix divergence of the training of transformer (543)
- [FIX] Fix documentation and a bug in NCE Block (558)
- [FIX] Fix hashing single ngrams in NGramHashes (450)
- [FIX] Fix weight dying in BERTModel.decoder for BERT pre-training (500)
- [BUGFIX] Modifying the FastText Classification training for accurate mean pooling (529) thanks sravanbabuiitm
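
The NGramHashes fix (450) concerns mapping character n-grams to a fixed number of hash buckets, as in fastText subword embeddings. The sketch below assumes fastText-style conventions (`<` and `>` word boundary markers, n-grams of length 3 to 6, a 32-bit FNV-1a hash, two million buckets); these values and the helper names are assumptions, not a copy of GluonNLP's implementation. fastText itself sign-extends bytes above 127 before hashing, so hashes of non-ASCII n-grams can differ from this version. The example also shows that a one-letter word still yields a single n-gram that must be hashed.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    token = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def fnv1a_32(text):
    """32-bit FNV-1a hash over the UTF-8 bytes of `text`."""
    h = 2166136261  # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def ngram_bucket_ids(word, num_buckets=2000000):
    """Map every n-gram of `word` to one of `num_buckets` embedding rows."""
    return [fnv1a_32(g) % num_buckets for g in char_ngrams(word)]


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(char_ngrams("a"))            # ['<a>']
print(ngram_bucket_ids("a", num_buckets=10))
```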

API Changes
===========
- [API] BERT return intermediate encodings per layer (606) thanks Ishitori
- [API] Better handle case when backoff is not possible in TokenEmbedding (459)
- [FIX] Rename wiki_cn/wiki_multilingual to wiki_cn_cased/wiki_multilingual_uncased (594) thanks kenjewu
- [FIX] Update default value of BERTAdam epsilon to 1e-6 (601)
- [FIX] Fix BERT decoder API for masked language model prediction (501)
- [FIX] Remove bias correction term in BERTAdam (499)
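
The two BERTAdam changes above (499, 601) concern the optimizer's update rule. The minimal scalar sketch below assumes the AdamW-style rule of the original BERT optimizer as we understand it (decoupled weight decay, no bias correction); apart from the 1e-6 epsilon stated above, the default hyperparameters and the function name are assumptions. It is plain Python, not the MXNet optimizer, and it shows where bias correction would normally appear.

```python
import math


def bert_adam_step(w, g, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                   epsilon=1e-6, wd=0.0):
    """One Adam-style update for a scalar weight, without bias correction.

    Returns the updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    # Standard Adam would divide m by (1 - beta1**t) and v by (1 - beta2**t);
    # that bias correction is deliberately omitted here.
    update = m / (math.sqrt(v) + epsilon) + wd * w  # decoupled weight decay
    return w - lr * update, m, v


w, m, v = bert_adam_step(0.0, 1.0, 0.0, 0.0)
print(w)  # close to -lr * sqrt(10) on this first step; bias-corrected Adam gives about -lr
```

Without bias correction the first steps differ in size from standard Adam, which is why the two changes were paired with a retuned epsilon.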

Enhancements
============
- [BUGFIX] use glove.840B.300d for NLI experiments (567)
- [API] Add debug option for parallel (584)
- [FEATURE] Skip dropout layer in Transformer when rate=0 (597) thanks TaoLv
- [FEATURE] update sharded loader (468)
- [FIX] Update BERTLayerNorm Implementation (485)
- [TUTORIAL] Use FixedBucketSampler in BERT tutorial for better performance (506) thanks Ishitori
- [API] Add Bert tokenizer to transforms.py (464) thanks fiercex
- [FEATURE] Add data parallel to big rnn lm script (564)
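
The BERT tokenizer added to transforms.py (464) splits words into WordPiece units. The core of that scheme, as described for the original BERT release, is greedy longest-match-first matching against a vocabulary, with continuation pieces prefixed by `##`. The sketch below is a stdlib-only illustration; the toy vocabulary and function name are hypothetical, and the lower-casing, punctuation splitting and maximum-word-length cutoff of a full tokenizer are left out.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece split of one pre-tokenized word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # continuation pieces carry '##'
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "aff", "##aff", "##able"}
print(wordpiece("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece("xyz", vocab))        # ['[UNK]']
```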

Minor Fixes
===========
- Various documentation fixes: 484, 613, 614, 438, 448, 550, 563, 611, 605, 440, 554, 445, 556, 603, 483, 576, 610, 547, 458, 574, 510, 447, 465, 436, 622, 583 thanks anuragsarkar97 brettkoonce
- [FIX] fix repeated unzipping in squad dataset (553)
- [FIX] web fixes (453)
- [FIX] Remove unused argument in fasttext_word_ngram.py (486) thanks kurtjanssensai
- [FIX] Remove unused code (528)
- [FIX] Remove unused code in text_classification script (442)
- [MISC] Bump up version (454)
- [BUGFIX] fix pylint error (549)
- [FIX] Simplify the data preprocessing code for the sentiment analysis script (462)
- [FEATURE] BERT doc fixes and script usability enhancements (444)
- [FIX] Fix Py2 compatibility of machine_translation/dataprocessor.py (541) thanks ymjiang
- [BUGFIX] Fix GluonNLP MXNet dependency (555)
- [BUGFIX] Fix Weight Drop and Test (546)
- [CI] Add version upper bound to doc.yml (467)
- [CI] speed up tests (582)
- [CI] upgrade mxnet to 1.4.0 (617)
- [FIX] Revert an unintended change (525)
- [BUGFIX] update paths and imports in bert scripts (634)

0.5.0

Highlights
=================

- Featured at [AWS re:Invent](https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=88736) 2018

Models
======

- **BERT**
  - The [Bidirectional Encoder Representations from Transformers](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html) model as introduced by *Devlin, Jacob, et al. "[Bert: Pre-training of deep bidirectional transformers for language understanding.](https://arxiv.org/abs/1810.04805)" arXiv preprint arXiv:1810.04805 (2018)* (409).
    Model parameters are converted from the [original model checkpoints from Google research](https://github.com/google-research/bert/), including:
    - BERT BASE model trained on
      - Book Corpus & English Wikipedia (cased)
      - Book Corpus & English Wikipedia (uncased)
      - multilingual Wikipedia (uncased)
    - BERT LARGE model trained on Book Corpus & English Wikipedia (uncased)
- **ELMo**
  - The [Embeddings from Language Models](http://gluon-nlp.mxnet.io/api/modules/model.html#elmo) as introduced by *Peters, Matthew E., et al. "[Deep contextualized word representations](https://arxiv.org/abs/1802.05365)." arXiv preprint arXiv:1802.05365 (2018)* (227, 428).
    Model parameters are converted from the [original model checkpoints in AllenNLP](https://allennlp.org/elmo), including the small, medium, and original models trained on the 1 Billion Word dataset, and the original model trained on 5.5B tokens consisting of Wikipedia and monolingual news crawl data from WMT 2008-2012.
- **Word Embedding**
  - The [GloVe](http://gluon-nlp.mxnet.io/model_zoo/word_embeddings/index.html) model as introduced by *Pennington, Jeffrey, Richard Socher, and Christopher Manning. "[Glove: Global vectors for word representation](http://www.aclweb.org/anthology/D14-1162)." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014* (359).
- **Natural Language Inference**
  - The [Decomposable Attention Model](http://gluon-nlp.mxnet.io/model_zoo/natural_language_inference/index.html) as introduced by *Parikh, Ankur P., et al. "[A decomposable attention model for natural language inference](https://arxiv.org/abs/1606.01933)." arXiv preprint arXiv:1606.01933 (2016).* (404). On the SNLI test set, it achieves 84.6% accuracy (without intra-sentence attention) and 84.4% accuracy (with intra-sentence attention). Thank you linmx0130 hhexiy!
- **Dependency Parsing**
  - The [Deep Biaffine Attention Dependency Parser](http://gluon-nlp.mxnet.io/model_zoo/parsing/index.html) as introduced by *Dozat, Timothy, and Christopher D. Manning. "[Deep biaffine attention for neural dependency parsing](https://arxiv.org/pdf/1611.01734.pdf)." arXiv preprint arXiv:1611.01734 (2016).* (408). It achieved 96% UAS on the Penn Treebank dataset. Thank you hankcs!
- **Text Classification**
  - The [Text CNN](http://gluon-nlp.mxnet.io/model_zoo/sentiment_analysis/index.html) model as introduced by *Kim, Yoon. "[Convolutional neural networks for sentence classification](https://arxiv.org/abs/1408.5882)." arXiv preprint arXiv:1408.5882 (2014)* (391). Thank you xiaotinghe!

New Tutorials
============

- **ELMo**
  - A [tutorial](http://gluon-nlp.mxnet.io/examples/sentence_embedding/elmo_sentence_representation.html) on generating contextualized representations with the pre-trained ELMo model, as introduced by *Peters, Matthew E., et al. "[Deep contextualized word representations](https://arxiv.org/abs/1802.05365)." arXiv preprint arXiv:1802.05365 (2018)* (227, 428).
- **BERT**
  - A [tutorial](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html) on fine-tuning the BERT model for sentence pair classification, as introduced by *Devlin, Jacob, et al. "[Bert: Pre-training of deep bidirectional transformers for language understanding.](https://arxiv.org/abs/1810.04805)" arXiv preprint arXiv:1810.04805 (2018)* (437)

New Datasets
============
- **Sentiment Analysis**
  - [MR](https://www.cs.cornell.edu/people/pabo/movie-review-data), a movie-review data set of 10,662 sentences labeled with respect to their overall sentiment polarity (positive or negative) (391)
  - [SST_1](http://nlp.stanford.edu/sentiment), an extension of the MR data set with fine-grained labels (391)
  - [SST_2](http://nlp.stanford.edu/sentiment), an extension of the MR data set with binary sentiment polarity labels (391)
  - [SUBJ](https://www.cs.cornell.edu/people/pabo/movie-review-data), a subjectivity data set of 10,000 sentences labeled with respect to their subjectivity status (subjective or objective) (391)
  - [TREC](http://cogcomp.org/page/resource_view/49), a question classification data set in which questions are labeled by answer type (391)

API Updates
===========
- Changed Vocab constructor from staticmethod to classmethod to handle inheritance (386)
- Added Transformer Encoder APIs (409)
- Added pre-trained ELMo model to model.get_model API (227)
- Added pre-trained BERT model to model.get_model API (409)
- Added unknown_lookup setter to TokenEmbedding (429)
- Added dtype support to EmbeddingCenterContextBatchify (416)
- Propagated exceptions from PrefetchingStream (406)
- Added SentencePiece tokenizer and detokenizer (380)
- Added CSR format for variable length data in embedding training (384)
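
The CSR change (384) stores variable-length data without padding. As a generic illustration, not necessarily the exact layout GluonNLP's embedding-training batches use, compressed sparse row (CSR) format keeps all values in one flat `data` array, their column positions in `indices`, and row boundaries in `indptr`. The helper names and toy rows below are hypothetical.

```python
def to_csr(rows):
    """Pack variable-length rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            data.append(value)    # the stored values, row after row
            indices.append(col)   # column of each stored value
        indptr.append(len(data))  # row i spans data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


def get_row(csr, i):
    """Recover row i without any padding."""
    data, _, indptr = csr
    return data[indptr[i]:indptr[i + 1]]


rows = [[5, 9], [7], [1, 2, 3]]  # toy rows of different lengths
csr = to_csr(rows)
print(csr)  # ([5, 9, 7, 1, 2, 3], [0, 1, 0, 0, 1, 2], [0, 2, 3, 6])
```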

Fixes & Small Changes
=================
- Included output of nlp.embedding.list_sources() in API docs (421)
- Supported symlinks in examples and scripts (403)
- Fixed weight tying in GNMT and Transformer (413)
- Simplified transformer notebook (400)
- Fixed LazyTransformDataStream prefetching (397)
- Adopted src/gluonnlp folder layout (390)
- Fixed text8 archive file name for downloads from S3 (388) Thanks bkktimber!
- Fixed perplexity (PPL) reporting for multi-GPU training in the language model notebook (365). Thanks ThomasDelteil!
- Fixed a spelling mistake in the QA script (379). Thanks qyhfbqz!
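
The notebook fix above (365) concerns reporting perplexity when training is split across GPUs. The exact bug is not described here; as a general illustration with made-up numbers and a hypothetical helper, corpus perplexity should be computed from the pooled loss, exp(total negative log-likelihood / total tokens), rather than by averaging per-device perplexities, which gives every device equal weight regardless of how many tokens it processed.

```python
import math


def corpus_perplexity(shard_stats):
    """Perplexity from per-device (summed negative log-likelihood, token count) pairs."""
    total_nll = sum(nll for nll, _ in shard_stats)
    total_tokens = sum(n for _, n in shard_stats)
    return math.exp(total_nll / total_tokens)


# Toy numbers for two GPUs that processed different numbers of tokens.
shards = [(20.0, 10), (45.0, 15)]
pooled = corpus_perplexity(shards)  # exp(65 / 25) = exp(2.6), about 13.46
naive = sum(math.exp(nll / n) for nll, n in shards) / len(shards)  # about 13.74
print(pooled, naive)
```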

0.4.1

Highlights
=================

- Hands-on Tutorial at KDD 2018 ([website](https://kdd18.mxnet.io/)) ([code](https://github.com/szha/KDD18-Gluon))

Models
======

- **Language Model**
  - The [Large Scale Word Language Model](http://gluon-nlp.mxnet.io/scripts/index.html#large-scale-word-language-model) as introduced by *Jozefowicz, Rafal, et al. “Exploring the limits of language modeling”. arXiv preprint arXiv:1602.02410 (2016)* achieved test PPL *43.62* on the GBW dataset (179, 270, 277, 278, 286, 294)
  - The [NT-ASGD based Language Model](http://gluon-nlp.mxnet.io/master/model_zoo/language_model/index.html) as introduced by *Merity, S., et al. “Regularizing and optimizing LSTM language models”. ICLR 2018* achieved test PPL *65.62* on the WikiText-2 dataset (170)
- **Document Classification**
  - The [Classification Model](http://gluon-nlp.mxnet.io/scripts/index.html#document-classification) as introduced by *Joulin, Armand, et al. “Bag of tricks for efficient text classification”* achieved validation accuracy *98* on the Yelp review dataset (258, 297)
- **Question Answering**
  - The [QANet](https://github.com/dmlc/gluon-nlp/blob/qanet/scripts/question_answering/train.py) as introduced by *Yu, Adams Wei, et al. “QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension”. ICLR 2018* achieved F1 score *79.5* on the SQuAD 1.1 dataset (339) (coming soon to the master branch)

New Tutorials
============

- **Machine Translation**
  - The [Google NMT](http://gluon-nlp.mxnet.io/examples/machine_translation/gnmt.html) as introduced by *Wu, Yonghui, et al. “Google's neural machine translation system: Bridging the gap between human and machine translation”. arXiv preprint arXiv:1609.08144 (2016)* is introduced as part of the gluonnlp tutorial (261)
  - The [Transformer based Machine Translation](http://gluon-nlp.mxnet.io/examples/machine_translation/transformer.html) by *Vaswani, Ashish, et al. “Attention is all you need.” Advances in Neural Information Processing Systems. 2017* is introduced as part of the gluonnlp tutorial (279)
- **Sentence Embedding**
  - [A Structured Self-attentive Sentence Embedding](http://gluon-nlp.mxnet.io/examples/sentence_embedding/self_attentive_sentence_embedding.html) by *Z. Lin, M. Feng, C. Santos, M. Yu, B. Xiang, B. Zhou, Y. Bengio, "A Structured Self-attentive Sentence Embedding" ICLR 2017* is introduced as part of the gluonnlp tutorial (366)

New Datasets
============

- **Word Embedding**
  - [Wikipedia](https://github.com/dmlc/gluon-nlp/blob/master/scripts/word_embeddings/data.py) (218)
  - [Fil9 dataset](https://github.com/dmlc/gluon-nlp/blob/master/gluonnlp/data/corpora/large_text_compression_benchmark.py) (363)
  - FastText [crawl-300d-2M-subword](https://github.com/dmlc/gluon-nlp/blob/master/gluonnlp/_constants.py) (336), [wiki-news-300d-1M-subword](https://github.com/dmlc/gluon-nlp/blob/master/gluonnlp/_constants.py) (368), [cc.en.300](https://github.com/dmlc/gluon-nlp/blob/master/gluonnlp/_constants.py) (373)

API updates
===========

- Added dataloader that allows multi-shard sampling (237 280 285)
- Simplified DataStream, added DatasetStream, refactored and extended PrefetchingStream (235)
- Unified BPTT batchify for dataset and stream (246)
- Added symbolic beam search (233)
- Added SequenceSampler (272)
- Refactored Transform APIs (282)
- Reorganized index of the repo and model zoo page (357)
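
BPTT batchification, which the unified API (246) performs for both datasets and streams, is the standard way to feed a long token stream to a recurrent language model. The plain-Python sketch below rests on assumptions (time-major `(seq_len, batch_size)` layout, ragged tail dropped, hypothetical function name) and is not the GluonNLP implementation. It cuts the stream into `batch_size` contiguous columns, then walks them in windows of `seq_len`, with targets shifted one step ahead.

```python
def bptt_batchify(tokens, batch_size, seq_len):
    """Yield (inputs, targets) windows of shape (<= seq_len, batch_size)."""
    num_steps = len(tokens) // batch_size
    tokens = tokens[:num_steps * batch_size]  # drop the ragged tail
    # Column b holds the contiguous slice tokens[b*num_steps:(b+1)*num_steps].
    columns = [tokens[b * num_steps:(b + 1) * num_steps] for b in range(batch_size)]
    for t in range(0, num_steps - 1, seq_len):
        length = min(seq_len, num_steps - 1 - t)  # last window may be shorter
        inputs = [[columns[b][t + i] for b in range(batch_size)] for i in range(length)]
        targets = [[columns[b][t + i + 1] for b in range(batch_size)] for i in range(length)]
        yield inputs, targets


for inputs, targets in bptt_batchify(list(range(10)), batch_size=2, seq_len=2):
    print(inputs, targets)
```

Keeping each column contiguous lets the hidden state carried over from one window remain meaningful for the next window of the same column.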

Fixes & Small Changes
=================

- Fixed module name in batchify.py example (239)
- Improved imports structure (248)
- Added test for nmt scripts (234)
- Sped up batchify.Pad (249)
- Fixed LanguageModelDataset.bptt_batchify (243)
- Fixed weight drop and added tests (268)
- Fixed relative links that PyPI doesn't handle (293)
- Updated notebook build logic (309)
- Added community link (313)
- Enabled running tests in parallel (317)
- Enabled tests for word embedding scripts (321)
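
`batchify.Pad` pads a list of variable-length samples into one rectangular batch. A stdlib-only sketch of the idea follows; the function name is hypothetical, and the real `batchify.Pad` returns MXNet NDArrays and has its own options.

```python
def pad_batch(sequences, pad_val=0):
    """Right-pad sequences to the longest one; also return the original lengths."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_val] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


batch, lengths = pad_batch([[1, 2, 3], [4], [5, 6]])
print(batch)    # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
print(lengths)  # [3, 1, 2]
```

Returning the lengths alongside the padded batch lets later layers mask out the padding positions.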

[See all commits](https://github.com/dmlc/gluon-nlp/compare/v0.3.3...v0.4.1)

0.3.3

GluonNLP v0.3 contains many exciting new features.

0.2.0

Features
========

GluonNLP provides its users with easy access to

- State of the art models
- Pre-trained word embeddings
- Many public datasets for different tasks
- Examples friendly to users that are new to the task
- Reproducible training scripts

Models
------

Gluon NLP Toolkit supplies model definitions for common NLP tasks. These can be
adapted to users' requirements or taken as a blueprint for new developments.
All of these are implemented using [Gluon Blocks](https://mxnet.incubator.apache.org/gluon/index.html)
allowing easy reuse as plug-and-play neural network building blocks.

- **Language Models**
  - [Standard RNN language model](https://gluon-nlp.mxnet.io/api/model.html#gluonnlp.model.StandardRNN)
  - [AWD language model by Salesforce](https://gluon-nlp.mxnet.io/api/model.html#gluonnlp.model.AWDRNN)
- **Attention Cells**
  - [Multi-head attention cell](https://gluon-nlp.mxnet.io/api/model.html#gluonnlp.model.MultiHeadAttentionCell)
  - [MLP attention cell](https://gluon-nlp.mxnet.io/api/model.html#gluonnlp.model.MLPAttentionCell)
  - [Dot-product attention cell](https://gluon-nlp.mxnet.io/api/model.html#gluonnlp.model.DotProductAttentionCell)
- **Beam Search**
  - [Beam Search Sampler](https://gluon-nlp.mxnet.io/api/model.html#gluonnlp.model.BeamSearchSampler)
  - [Beam Search Scorer](https://gluon-nlp.mxnet.io/api/model.html#gluonnlp.model.BeamSearchScorer)
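
A dot-product attention cell scores each key by its inner product with the query, normalizes the scores with a softmax, and returns the weighted sum of the values. The stdlib-only sketch below handles a single query. It is not the Gluon Block implementation, which works on batched NDArrays. The `scaled` flag here is our own parameter; it divides scores by the square root of the dimension, as in the Transformer paper.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values, scaled=True):
    """Attend a single query vector over lists of key and value vectors."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scaled:
        scores = [s / math.sqrt(dim) for s in scores]  # divide by sqrt(d)
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the value vectors.
    context = [sum(w * value[j] for w, value in zip(weights, values))
               for j in range(len(values[0]))]
    return context, weights


context, weights = dot_product_attention(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(weights)  # the first key, aligned with the query, gets the larger weight
```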

Data
----

Gluon NLP Toolkit provides tools for building efficient data pipelines for NLP
tasks by defining a Dataset class interface and utilities for transforming them.
Several datasets are included by default and will be automatically downloaded
when used.

- **Language modeling** with WikiText
  - WikiText is a popular language modeling dataset from Salesforce. It is a
    collection of over 100 million tokens extracted from the set of verified
    Good and Featured articles on Wikipedia.
- **Sentiment Analysis** with IMDB
  - IMDB is a popular dataset for binary sentiment classification. It
    provides a set of 25,000 highly polar movie reviews for training, 25,000 for
    testing, and additional unlabeled data.
- **CoNLL** datasets
  - These datasets include data for the shared tasks, such as part-of-speech
    (POS) tagging, chunking, named entity recognition (NER), semantic role
    labeling (SRL), etc.
  - We provide built-in support for CoNLL 2000-2002 and 2004, as well as the
    Universal Dependencies dataset, which is used in the 2017 and 2018
    competitions.
- **Word embedding evaluation** datasets
  - There are a number of commonly used datasets for the intrinsic evaluation of
    word embeddings. We provide commonly used datasets for the similarity and
    analogy evaluation tasks.
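
Similarity tasks are typically scored with cosine similarity, and analogies (a is to b as c is to ?) are commonly answered with the 3CosAdd rule: choose the vocabulary word closest to b - a + c, excluding the three query words. The stdlib-only sketch below uses made-up three-dimensional vectors, not pre-trained GloVe or fastText embeddings. The helper names are hypothetical, and GluonNLP's own evaluation functions may use a different analogy rule.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, embeddings):
    """3CosAdd: the word d (not a, b or c) maximizing cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


emb = {  # toy vectors chosen by hand for illustration
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, 0.1],
}
print(analogy("man", "woman", "king", emb))  # queen
```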

Gluon NLP further ships with common data transformation functions,
dataset samplers that determine how to iterate through datasets, and
functions to generate data batches.

[A complete and up-to-date list of supplied datasets and utilities is available
in the API documentation](https://gluon-nlp.mxnet.io/api/data.html).

Other features
--------------
- [Vocabulary](https://gluon-nlp.mxnet.io/api/vocab.html)
- [Pre-trained embeddings](https://gluon-nlp.mxnet.io/api/embedding.html)

Examples and scripts
--------------------

[The Gluon NLP toolkit also provides scripts that use the functionality of the
toolkit for various tasks](https://gluon-nlp.mxnet.io/scripts/index.html)

- Word Embedding Evaluation
- Beam Search Generator
- Word language modeling
- Sentiment Analysis through Fine-tuning, with Bucketing
- Machine Translation
