genienlp

Latest version: v0.6.0


0.6.0a3
=======

* Loss dropping is now an optional dependency instead of a required one [96].
* Fixed running Bootleg models in Kubeflow.
* Fixed combining Bootleg and calibration [99].
* Misc bug fixes [97, 102].
* Misc build system and test fixes [101].

0.6.0a2
=======

* Added support for [Bootleg](https://github.com/HazyResearch/bootleg), a state-of-the-art
named entity recognition system. The output of the NER system can be fed as auxiliary information
to the model in embedding or text form [83, 93].
* Added support for calibration. Calibration is an additional step applied to the output of
the model to compute a confidence score that can be interpreted as the probability of producing
a correct parse. Multiple calibrators can be trained to separately identify likely incorrect
parses and out-of-domain inputs [72, 74, 92, 94]; a sketch of the idea appears after this list.
* Added support for inference in Kubeflow, using the new command `genienlp kfserver`, which
exposes a Kubeflow-compatible HTTP interface [76, 80, 88, 90]; see the request example after this list.
* Preprocessing of inputs can now use the new fast tokenizers from the Huggingface library [66].
* A number of new hyperparameter options were added, including diverse beam search, loss dropping,
and a new learning rate schedule [66].
* Paraphrasing is now a regular task trained with `genienlp train`, and no longer needs a
different set of commands [79].
* Misc bug fixes [67, 68, 69, 70, 71, 85, 95].
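
As background on the calibration step, the sketch below shows one common recipe, temperature
scaling of the decoder's sequence log-probabilities. It only illustrates the general idea of
turning model scores into a confidence in [0, 1]; genienlp's trained calibrators are not
necessarily implemented this way, and the temperature value is a made-up example.

```python
import numpy as np

def parse_confidence(token_logprobs, temperature=2.0):
    # Temper the token log-probabilities of the chosen parse and
    # aggregate them into a single sequence-level confidence score.
    # In practice the temperature would be fitted on held-out data.
    scaled = np.asarray(token_logprobs) / temperature
    return float(np.exp(scaled.sum()))

# A short parse whose tokens the model found likely scores high.
print(parse_confidence([-0.05, -0.10, -0.02]))  # ~0.92
```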
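
For the Kubeflow integration, a request against the server might look like the sketch below.
The route follows the KFServing v1 predict protocol; the port, the model name `nlu`, and the
instance fields are assumptions for illustration, not genienlp's documented schema.

```python
import requests

# Hypothetical endpoint: a KFServing v1 style "predict" route.
url = "http://localhost:8080/v1/models/nlu:predict"
payload = {"instances": [{"context": "show me restaurants nearby", "question": ""}]}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # one prediction per instance
```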

0.6.0a1
=======

* Preprocessing of code inputs has changed, and code tokens are no longer treated specially.
Instead, they are treated as normal words and preprocessed using BPE. This allows using any
Huggingface tokenizer without changes. Tasks can still define certain tokens that should be
treated as special tokens. These are either added as new tokens, or further preprocessed
into unambiguous sequences of words; the added-tokens path is sketched after this list.
* Old models (MQAN and baselines) were removed. The GloVe vectors and other non-contextual
word embeddings were also removed. Old training options that were ineffective or unused
were removed.
* The internals of the library have been refactored to simplify development and allow using any
Huggingface Seq2Seq or MLM model. As a result, the names of the models have changed: `Seq2Seq`
is now `TransformerLSTM` and `Bart` is now `TransformerSeq2Seq`. Command-line flags changed as well.
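
As a minimal sketch of the "added as new tokens" path, the snippet below uses the Huggingface
transformers API directly; the model checkpoint and the task tokens shown are illustrative
assumptions, not tokens genienlp itself defines.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Code tokens now go through BPE like ordinary words, so any
# Huggingface tokenizer works out of the box.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

# A task can still register special tokens; these get fresh ids.
num_added = tokenizer.add_tokens(["@com.twitter", "param:query"])
model.resize_token_embeddings(len(tokenizer))  # make room for the new ids

print(num_added, tokenizer.tokenize("@com.twitter . search ( param:query )"))
```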

NOTE: due to the change in model names and command-line flags, this release is not backward
compatible with models trained with genienlp <= 0.5.0.

0.5.0
=====

* Paraphrasing and training were made much faster, with improved GPU usage and by removing
redundant tokenization in the paraphrasing hot path [37, 38, 47].
* The transformers library was updated to 4.0, and the PyTorch dependency was increased to 1.6 [44, 59, 62].
* New models: BART, mBART, mT5. As part of this work, the model code was refactored to be more consistent
with Huggingface models and generation code [46, 62].
* Paraphrasing scripts are now proper subcommands of `genienlp` [53].
* It is now possible to fine-tune MBart and Marian models for neural machine translation
and sentence denoising [54]; see the sketch after this list.
* `genienlp server` can now operate in batch mode, improving GPU utilization [58].
* Misc bug and documentation fixes [39, 40, 41, 43, 48, 55, 58].
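
As a sketch of the kind of model the new NMT fine-tuning targets, the snippet below runs a
pretrained Marian model through the transformers 4.x API on a small batch; batching several
inputs together is also what the new batch mode of `genienlp server` exploits for GPU
utilization. The checkpoint name is a public Helsinki-NLP model, not one shipped by genienlp.

```python
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # public English-to-German checkpoint
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

# Translating a batch instead of one sentence at a time keeps the GPU busy.
sentences = ["How are you?", "The weather is nice today."]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(**batch)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```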

0.4.0
=====

* Added the ability to run paraphrasing in FP16 mixed precision mode; a generic sketch
follows this list.
* The dependencies on matplotlib and seaborn (used to produce plots for analysis) are now
optional [36].
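
Below is a generic sketch of FP16 mixed precision inference with PyTorch's native AMP
(available since PyTorch 1.6); the toy model stands in for a real paraphraser, and genienlp's
actual flag for enabling this mode is not shown.

```python
import torch

model = torch.nn.Linear(512, 512).cuda().eval()  # stand-in for a real model
inputs = torch.randn(8, 512, device="cuda")

# autocast runs eligible ops (matmuls, etc.) in half precision.
with torch.no_grad(), torch.cuda.amp.autocast():
    outputs = model(inputs)

print(outputs.dtype)  # torch.float16, produced inside the autocast region
```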

Please see the development releases below for the full list of features in this release.

0.4.0b1
=======

* Fixed handling of CJK characters and combining characters in BERT and XLM tokenizers [34].
