
Latest version: v0.10.0



| STS-B | **87.5** | 85.8 | 86.5 |
| MNLI-m/mm | 85.3/84.9 | 84.6/83.4 | **86.7/85.9** |

- The SciBERT model introduced by *Iz Beltagy and Arman Cohan and Kyle Lo* in "[SciBERT: Pretrained Contextualized Embeddings for Scientific Text](". The model checkpoints are converted from the [original repository]( from AllenAI with the following datasets (735):
- `scibert_scivocab_uncased`
- `scibert_scivocab_cased`
- `scibert_basevocab_uncased`
- `scibert_basevocab_cased`

- The BioBERT model introduced by *Lee, Jinhyuk, et al.* in "[BioBERT: a pre-trained biomedical language representation model for biomedical text mining](". The model checkpoints are converted from the [original repository]( with the following datasets (735):
- `biobert_v1.0_pmc_cased`
- `biobert_v1.0_pubmed_cased`
- `biobert_v1.0_pubmed_pmc_cased`
- `biobert_v1.1_pubmed_cased`

- The ClinicalBERT model introduced by *Kexin Huang and Jaan Altosaar and Rajesh Ranganath* in "[ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission](". The model checkpoints are converted from the [original repository]( with the `clinicalbert_uncased` dataset (735)

- The ERNIE model introduced by *Sun, Yu, et al.* in "[ERNIE: Enhanced Representation through Knowledge Integration](". You can get the model checkpoints converted from the [original repository]( with `model.get_model("ernie_12_768_12", "baidu_ernie_uncased")` (759) thanks paperplanet

- BERT fine-tuning script for named entity recognition on CoNLL2003 with test F1 92.2 (612).

- BERT fine-tuning script for Chinese XNLI dataset with 78.3% validation accuracy. (759) thanks paperplanet

- BERT fine-tuning script for intent classification and slot labelling on ATIS (95.9 F1) and SNIPS (95.9 F1). (817)

- The GPT-2 language model introduced by *Radford, Alec, et al.* in "[Language Models are Unsupervised Multitask Learners](". The model checkpoints are converted from the [original repository](, with a script to generate text from GPT-2 model (`gpt2_117m`, `gpt2_345m`) trained on the `openai_webtext` dataset (761).

- The ESIM model for text matching introduced by *Chen, Qian, et al.* in "[Enhanced LSTM for Natural Language Inference](". (689)

- Natural language understanding with datasets from the GLUE benchmark: CoLA, SST-2, MRPC, STS-B, MNLI, QQP, QNLI, WNLI, RTE (682)
- Sentiment analysis datasets: CR, MPQA (663)
- Intent classification and slot labeling datasets: ATIS and SNIPS (816)

New Features
- [Feature] support save model / trainer states to S3 (700)
- [Feature] support load model/trainer states from s3 (702)
- [Feature] Add SentencePieceTokenizer for BERT (669)
- [FEATURE] Flexible vocabulary (732)
- [API] Moving MaskedSoftmaxCELoss and LabelSmoothing to model API (754) thanks ThomasDelteil
- [Feature] add the List batchify function (812) thanks ThomasDelteil
- [FEATURE] Add LAMB optimizer (733)

Bug Fixes
- [BUGFIX] Fixes for BERT embedding, pretraining scripts (640) thanks Deseaus
- [BUGFIX] Update hash of wiki_cn_cased and wiki_multilingual_cased vocab (655)
- fix bert forward call parameter mismatch (695) thanks paperplanet
- [BUGFIX] Fix mlm_loss reporting for eval dataset (696)
- Fix _get_rnn_cell (648) thanks MarisaKirisame
- [BUGFIX] fix mrpc dataset idx (708)
- [bugfix] fix hybrid beam search sampler (710)
- [BUGFIX] [DOC] Update nlp.model.get_model documentation and get_model API (734)
- [BUGFIX] Fix handling of duplicate special tokens in Vocabulary (749)
- [BUGFIX] Fix TokenEmbedding serialization with `emb[emb.unknown_token] != 0` (763)
- [BUGFIX] Fix glue test result serialization (773)
- [BUGFIX] Fix init bug for multilevel BiLMEncoder (783) thanks Ishitori

API Changes
- [API] Dropping support for wiki_multilingual and wiki_cn (764)
- [API] Remove get_bert_model from the public API list (767)

- [FEATURE] offer load_w2v_binary method to load w2v binary file (620)
- [Script] Add inference function for BERT classification (639) thanks TaoLv
- [SCRIPT] - Add static BERT base export script (for use with MXNet Module API) (672)
- [Enhancement] One script to export bert for classification/regression/QA (705)
- [enhancement] refactor bert finetuning script (692)
- [Enhancement] only use the best model for inference for bert classification (716)
- [Dataset] redistribute conll2004 (719)
- [Enhancement] add periodic evaluation for BERT pre-training (720)
- [FEATURE] add XNLI task (717)
- [refactor] Refactor BERT script folder (744)
- [Enhancement] BERT pre-training data generation from sentencepiece vocab (743)
- [REFACTOR] Refactor TokenEmbedding to reduce number of places that initialize internals (750)
- [Refactor] Refactor BERT SQuAD inference code (758)
- [Enhancement] Fix dtype conversion, add sentencepiece support for SQuAD (766)
- [Dataset] Move MRPC dataset to API (780)
- [BiDAF-QANet] Common data processing logic for BiDAF and QANet (739) thanks Ishitori
- [DATASET] add LCQMC, ChnSentiCorp dataset (774) thanks paperplanet
- [Improvement] Implement parser evaluation in Python (772)
- [Enhancement] Add whole word masking for BERT (770) thanks basicv8vc
- [Enhancement] Mix precision support for BERT finetuning (793)
- Generate BERT training samples in compressed format (651)
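
The whole word masking enhancement listed above (770) can be pictured with a small sketch. This is not the GluonNLP implementation: it only assumes WordPiece-style tokens where a `##` prefix marks a continuation piece, and the names `whole_word_spans` and `whole_word_mask`, the 15% default rate, and the `[CLS]`/`[SEP]`/`[MASK]` tokens are illustrative choices.

```python
import random


def whole_word_spans(tokens):
    """Group token indices into words; a '##' piece continues the previous word."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and spans:
            spans[-1].append(i)
        else:
            spans.append([i])
    return spans


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]",
                    special=("[CLS]", "[SEP]"), seed=0):
    """Mask complete words only, so a word is never partially masked."""
    rng = random.Random(seed)
    # Special tokens are never candidates for masking.
    candidates = [s for s in whole_word_spans(tokens) if tokens[s[0]] not in special]
    rng.shuffle(candidates)
    budget = max(1, round(len(tokens) * mask_prob))
    masked = list(tokens)
    n_masked = 0
    for span in candidates:
        if n_masked >= budget:
            break
        if n_masked + len(span) > budget:
            continue  # this word does not fit in the remaining budget
        for i in span:
            masked[i] = mask_token
        n_masked += len(span)
    return masked
```

Calling `whole_word_mask(["[CLS]", "phil", "##am", "##mon", "sings", "[SEP]"], mask_prob=0.5)` masks either all three pieces of the first word or the single-piece second word, never a fragment of a word.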

Minor Fixes
- Various documentation fixes: 635, 637, 647, 656, 664, 667, 670, 676, 678, 681, 698, 704, 731, 745, 762, 771, 746, 778, 800, 810, 807, 814 thanks rongruosong crcrpar mrchypark xwind-h
- Fix BERT multiprocessing data creation bug which causes unnecessary dispatching to single worker (649)
- [BUGFIX] Update BERT test and pre-train script (661)
- update url for ws353 (701)
- bump up version (742)
- [DOC] Update textCNN results (737)
- padding value warning (747)
- [TUTORIAL][DOC] Tutorial Updates (802) thanks faramarzmunshi

Continuous Integration
- skip failing tests in mxnet master (685)
- [CI] update nodes for CI (686)
- [CI] CI refactoring to speed up tests (566)
- [CI] fix codecov (693)
- use fixture for squad dataset tests (699)
- [CI] create zipped notebooks for link check (712)
- Fix test infrastructure for pytest > 4 and bump CI pytest version (728)
- [CI] set root in BERT tests (738)
- Fix function_scope_seed (748)
- [CI] Fix links in contribute.rst (752)
- [CI] Update CI dependencies (756)
- Revert "[CI] Update CI dependencies (756)" (769)
- [CI] AWS Batch serverless CI Pipeline for parallel notebook execution during website build step (791)
- [CI] Don't exit pipeline before displaying AWS Batch logfiles (801)
- [CI] Fix for "Don't exit pipeline before displaying AWS Batch logfiles" (803)
- add license checker (804)
- enable timeout (813)
- Fix website build on master branch (819)



- **Language Models**
- The [Cache Language Model]( as introduced by *Grave, E., et al. “Improving neural language models with a continuous cache”. ICLR 2017* is introduced as part of gluonnlp.model.train (110)
- The [Activation Regularizer and Temporal Activation Regularizer]( as introduced by *Merity, S., et al. "Regularizing and optimizing LSTM language models". ICLR 2018* is introduced as part of gluonnlp.loss (110)
- **Machine Translation**
- The [Transformer Model]( as introduced by *Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017* is introduced as part of the gluonnlp nmt scripts (133)
- **Word embeddings**
- Trainable word embedding models are introduced as part of gluonnlp.model.train (136)
- Word2Vec by *Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).*
- FastText models by *Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135-146.*

New Datasets
- **Machine Translation**
- [WMT2014BPE]( (135) (177) (180)
- **Question Answering**
- [Stanford Question Answering Dataset (SQuAD)]( *Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2383-2392).* (113)
- **Word Embeddings**
- [Text8]( (136)

API changes

- The download directory for datasets and other artifacts can now be specified via the MXNET_HOME environment variable. (106)
- TokenEmbedding class now exposes the Inverse Vocab as well (123)
- SortedSampler now supports use_average_length option (135)
- Add more strategies for bucket creation (145)
- Add tokenizer to bleu (154)
- Add Convolutional Encoder and Highway Layer (129) (186)
- Add plain text of translation data. (158)
- Use Sherlock Holmes dataset instead of PTB for language model notebook (174)
- Add classes JiebaTokenizer and NLTKStanfordSegmenter for Chinese Word Segmentation (164)
- Allow toggling output and prompt in documentation website (184)
- Add shape assertion statements for better user experience to some attention cells (201)
- Add support for computation of word embeddings for unknown words in `TokenEmbedding` class (185)
- Distribute subword vectors for pretrained fastText embeddings enabling embeddings for unknown words (185)
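
As a rough illustration of the MXNET_HOME change listed above (106), the sketch below resolves a download directory from the environment. It is not GluonNLP's code: the helper names `data_home` and `dataset_dir`, the `datasets` subdirectory, and the `~/.mxnet` fallback are assumptions made for the example.

```python
import os


def data_home(env=None):
    """Return the root directory for downloaded artifacts.

    Assumption: fall back to ~/.mxnet when MXNET_HOME is not set.
    """
    env = os.environ if env is None else env
    root = env.get("MXNET_HOME", os.path.join("~", ".mxnet"))
    return os.path.expanduser(root)


def dataset_dir(name, env=None):
    """Hypothetical layout: <root>/datasets/<name>."""
    return os.path.join(data_home(env), "datasets", name)
```

With this layout, exporting `MXNET_HOME=/data/mxnet` before starting Python would redirect `dataset_dir("squad")` to a path under `/data/mxnet`.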

Fixes & Small Changes
- fixed bptt_batchify sometimes returned an invalid last batch (120)
- Fixed wrong PPL calculation in word language model script for multi-GPU (150)
- Fix split compound words and wmt16 results (151)
- Adapt pretrained word embeddings example notebook for nd.topk change in mxnet 1.3 (153)
- Fix beam search script (175)
- Fix small bugs in parser (183)
- TokenEmbedding: Skip lines with invalid bytes instead of crashing (188)
- Fix overly large memory use in TokenEmbedding serialization/deserialization if some tokens are overly large (eg. 50k characters) (187)
- Remove duplicates in WordSim353 when combining segments (192)



This release includes the following fixes:
- [BUGFIX] remove wd from squad (1223)
- Fix deprecation warnings due to invalid escape sequences. (1219)
- Fix layer_norm_eps in BERTEncoder (1214)
- [BUGFIX] Fix vocab determinism in py35 (1166) (1167)
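
For readers unfamiliar with the escape-sequence warnings mentioned above (1219), here is a small sketch of the general problem and the usual remedy, raw string literals. It does not reproduce the actual GluonNLP patch. The helper `compiles_cleanly` is hypothetical, and the exact warning category raised depends on the Python version.

```python
import re
import warnings


def compiles_cleanly(source):
    """Return True if compiling `source` emits no warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return not caught


# The snippet on the left of each assignment is Python source text.
bad = r'pattern = "\d+"'    # "\d" in a normal string literal is an invalid escape
good = r'pattern = r"\d+"'  # a raw string literal keeps the backslash verbatim

# Regular expressions are the most common place this shows up.
numbers = re.findall(r"\d+", "fixes 1219 and 1214")
```

Here `compiles_cleanly(bad)` is false on Python 3.6 and later, while `compiles_cleanly(good)` is true.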

As we prepare for the NumPy-based GluonNLP development, we are making the following adjustments to the branch usage:
- master (old) -> v0.x: this branch will be used for maintenance of GluonNLP 0.x versions.
- numpy -> master: the new master branch will be used for GluonNLP 1.0 onward with NumPy-compatible interface, based on the upcoming MXNet 2.0.


This patch release includes the following bug fixes:
- [BUGFIX] remove wd from squad (1223)
- Fix deprecation warnings due to invalid escape sequences. (1219)
- Fix layer_norm_eps in BERTEncoder (1214)


This release includes the bug fix for (1167). The bug made the order of special tokens in an instantiated vocabulary object non-deterministic on Python 3.5. Users of Python 3.5 are strongly encouraged to upgrade to this version.




INT8 Quantization for BERT Sentence Classification and Question Answering
(1080)! Also check out the [blog post](

Enhancements to the pretraining script (1121, 1099) and faster tokenizer for
BERT (921, 1024) as well as multi-GPU support for SQuAD fine-tuning (1079).

Make BERT a HybridBlock (877).


The XLNet model introduced by *Yang, Zhilin, et al.* in
"[XLNet: Generalized Autoregressive Pretraining for Language Understanding](".
The model was converted from the [original repository]( (866).

GluonNLP further provides scripts for finetuning XLNet on the GLUE (995) and
SQuAD datasets (1130) that reproduce the authors' results. [Check out the usage](


The DistilBERT model introduced by *Sanh, Victor, et al.* in
"[DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](" (922).


Add a separate Transformer inference script to make inference easy and to make it
convenient to analyze the performance of Transformer inference (852).

Korean BERT

[Pre-trained Korean BERT]( is available as part of GluonNLP (1057)

GluonNLP now provides scripts for finetuning RoBERTa (931).


GPT2 is now a HybridBlock, so the model can be exported for running from other MXNet
language bindings (1010).

New Features
- Add NamedTuple + Dict batchify (959)
- Add even_size option to split sampler (1028)
- Add length normalized metrics for machine translation tasks (1095)
- Add raw attention scores to the AttentionCell 951 (964)
- Add round_to feature to BERT & XLNet finetuning scripts (1133)
- Add stratified train_valid_split similar to sklearn.model_selection.train_test_split (933)
- Add SuperGlue dataset API (858)
- Add Multi Model Server deployment code example for developers (1140)
- Allow custom dropout, number of layers/units for BERT (950)
- Avoid race condition when downloading vocab (1078)
- Deprecate specifying Vocab padding, bos and eos_token as positional arguments (945)
- Fast multitensor adam optimizer (1111)
- Faster grad_global_norm for clipping (1115)
- Hybridizable AWDRNN/StandardRNN (911)
- Padding seq length to multiple of 8 in BERT model (909)
- Scripts for producing the figures that explain the bucketing strategy (908)
- Split up Seq2SeqDecoder in Seq2SeqDecoder and Seq2SeqOneStepDecoder (976)
- Switch CI to Python 3.5 and declare Python 3.5 support (1009)
- Try to use the new None feature in MXNet + Drop support for MXNet 1.5 (967)
- Use fused gelu operator (1082)
- Use softmax with length, and interleaved matmul for BERT (1136)
- Documentation of Model Conversion Scripts at (922)
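
Two entries above concern padding sequence lengths to a multiple (909, 1133). The sketch below shows the arithmetic only, under assumptions: the function names `round_up` and `pad_batch` and the `pad_val` parameter are hypothetical, and the actual scripts may apply rounding differently.

```python
def round_up(length, round_to=None):
    """Round `length` up to the nearest multiple of `round_to`."""
    if round_to is None or round_to <= 1:
        return length
    # Ceiling division without floating point.
    return -(-length // round_to) * round_to


def pad_batch(seqs, pad_val=0, round_to=None):
    """Pad every sequence to a shared length that is a multiple of `round_to`."""
    max_len = round_up(max(len(s) for s in seqs), round_to)
    return [list(s) + [pad_val] * (max_len - len(s)) for s in seqs]
```

For example, a batch whose longest sequence has 13 tokens is padded to length 16 when `round_to=8`, and a batch already at length 16 is left unchanged.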

Bug Fixes and code cleanup
- Add version checker to all scripts (930)
- Add version checker to all tutorials (934)
- Add 'packaging' to requirements (1143)
- Adjust code owner (923)
- Avoid using dict for attention cell parameter creation (1050)
- Bump version in preparation for 0.9 release (987)
- Change SimVerb3500 URL to aclweb hosted version (979)
- Correct propagation of error codes in GluonNLP-py3-master-gpu-doc (971)
- Corrected np.random.randint upper limit in (935)
- Declare Python version requirement in (927)
- Declare more optional dependencies (958)
- Declare pytest seed marker in pytest.ini (940)
- Disable HybridBeamSearch (1021)
- Drop LAMB optimizer from GluonNLP in favor of MXNet version (1116)
- Drop unused compatibility helpers and fix doc (928)
- Fix 905 (906)
- Fix a SQuAD 2.0 evaluation bug (907)
- Fix argument `analogy-max-vocab-size` (904)
- Fix broken multi-head attention cell (878)
- Fix bugs in BERT export script (944)
- Fix chnsenticorp dataset download link (873)
- Fix file sampler for BERT (977)
- Fix index.rst and gpu flag in machine translation (952)
- Fix log in (1001)
- Fix parameter sharing of WeightDropParameter (1083)
- Fix scripts/question_answering/ requiring optional package (1013)
- Fix the weight tie and weight sharing for AWDRNN (1087)
- Fix training command in Language Modeling index.rst (1100)
- Fix version check in and (1003)
- Fix standard rnn weight sharing error (1122)
- Glue data preprocessing pipeline and bert & xlnet scripts (1031)
- Improve Vocab.__repr__ if reserved_tokens or unknown_token is None (989)
- Improve readability (975)
- Improve test robustness (960)
- Improve the readability of the training script. This fix replaces magic numbers with the name (1006)
- Make EmbeddingCenterContextBatchify returned dtype robust to empty sentences (954)
- Modify the log average loss (1103)
- Move ICSL script out of BERT folder (1131)
- Move NER script out of bert folder (1090)
- Move ParallelBigRNN into nlp.model namespace (1118)
- Move get_rnn_cell out of seq2seq_encoder_decoder (1073)
- Mxnet version check (1063)
- Refactor BERT with new data preprocessing (1124)
- Remove NLTKMosesTokenizer in favor of SacreMosesTokenizer (942)
- Remove extra dropout in BERT/RoBERTa (1022)
- Remove outdated comment (943)
- Remove padding warning (916)
- Replace unicode comma with ascii comma (1056)
- Split up inheritance structure of TransformerEncoder and BERTEncoder (988)
- Support int32 for sampled blocks (1106)
- Switch batch jobs to use G4dn.2x instance (1041)
- TransformerXL LayerNorm eps and XLNet pretrained model config (1005)
- Unify BERT horovod and kvstore pre-training script (889)
- Update README.rst (884)
- Update data_api.rst (893)
- Update embedding script (1046)
- Update (1037)
- Update index.rst (876)
- Update index.rst (891)
- Update navbar install (983)
- Update numba dependency in (941)
- Update outdated contributor list (963)
- Update (998)

- Add comment to BERT notebook (1026)
- Add missing docs for nlp.utils (936)
- Add more documentation to XLNet scripts (985)
- Add section for "Clone the master branch for development" (1075)
- Add to toc tree depth to enable multiple level menu (1108)
- Cite source of pretrained parameters for bert_12_768_12 (915)
- Doc fix for vocab.subwords (885)
- Enhance vocab not found err msg (917)
- Fix command line examples for text classification (874)
- Fix math formula in docs (920)
- More detailed doc for CorpusBPTTBatchify (888)
- Release checklist (890)
- Remove non-existent arguments for BERT and Transformer (946)
- Remove py3 usage from the doc (1077)
- Update installation guide with selectors (966)
- Update mxnet version in installation doc (1072)
- Update pre-trained model link (1117)
- Update Installation instructions for source (1146)

Continuous Integration
- Disable SimVerb test for 14 days (953)
- Disable horovod test temporarily (1030)
- Disable known bad mxnet nightly version (997)
- Enable integration tests on CPU (957)
- Enable testing warnings with pytest and update deprecated API invocations (980)
- Enable timestamp in CI (925)
- Enable type checks and inference with pytype (1018)
- Fix CI (875)
- Preserve stderr and stdout streams in doc CI stage for Cloudwatch (882)
- Remove skip_master feature (1017)
- Switch source of MXNet nightly build (1058)
- Test MXNet 1.6 pre-release as part of CI pipeline (1023)
- Update MXNet master version tested on CI (1113)
- Update numba (1096)
- Use Cuda 10.0 MXNet build (991)
