Flair

Latest version: v0.15.1


Page 6 of 6

0.3.2

Not secure

This is an update over release 0.3.1 with some critical bug fixes, a few new features and a lot more pre-packaged embeddings.

New Features

Embeddings

More word embeddings (#194)

We added FastText embeddings for 10 languages ('en', 'de', 'fr', 'pl', 'it', 'es', 'pt', 'nl', 'ar', 'sv'). Load them using the two-letter language code, like this:

```python
from flair.embeddings import WordEmbeddings
french_embedding = WordEmbeddings('fr')
```

More character LM embeddings (#204, #187)

Thanks to a contribution by [stefan-it](https://github.com/stefan-it/flair-lms), we added CharLMEmbeddings for Bulgarian and Slovenian. Load them like this:

```python
from flair.embeddings import CharLMEmbeddings

flm_embeddings = CharLMEmbeddings('slovenian-forward')
blm_embeddings = CharLMEmbeddings('slovenian-backward')
```

Custom embeddings (#170)

Added an explanation of how to use your own custom word embeddings. Simply convert them to gensim.KeyedVectors and point the embedding class to the resulting file:

```python
from flair.embeddings import WordEmbeddings
custom_embedding = WordEmbeddings('path/to/your/custom/embeddings.gensim')
```

New embeddings type: `DocumentPoolEmbeddings` (#191)

Added a new embedding class for document-level embeddings. You can now choose between different pooling options, e.g. min, max and average. Create the new embeddings like this:

```python
from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings

word_embeddings = WordEmbeddings('glove')
pool_embeddings = DocumentPoolEmbeddings([word_embeddings], mode='min')
```
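
To make the pooling options concrete, here is a minimal sketch in plain Python of what min, max and average pooling do to a set of word vectors. The `pool` helper, its mode names and the toy vectors are illustrative assumptions. This is not flair's implementation.

```python
from typing import List


def pool(vectors: List[List[float]], mode: str = "mean") -> List[float]:
    """Combine per-word vectors into one document vector, dimension by dimension.

    Hypothetical helper for illustration only; not part of flair.
    """
    if not vectors:
        raise ValueError("need at least one word vector")
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if mode == "min":
        return [min(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


# Three made-up 2-dimensional word vectors belonging to one document.
word_vectors = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]]
doc_min = pool(word_vectors, "min")
doc_max = pool(word_vectors, "max")
doc_mean = pool(word_vectors, "mean")
```

Whichever mode is chosen, the document vector has the same dimensionality as the word vectors, regardless of document length.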

Language model

New method: `generate_text()` (#167)

The `LanguageModel` class now has a built-in `generate_text()` method to sample the LM. Run code like this:

```python
from flair.models import LanguageModel

# load your language model
model = LanguageModel.load_language_model('path/to/your/lm')

# generate 20000 characters
text = model.generate_text(20000)
print(text)
```

Metrics

Class-based metrics in `Metric` class (#164)

Refactored Metric class to provide class-based metrics, as well as micro and macro averaged F1 scores.
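
As a reminder of how the two averages differ, the following sketch computes both from per-class counts. The function names and the counts are made up for illustration and do not reflect the `Metric` class API.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one class
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts of all classes first, then compute a single F1."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in per_class.values()) / len(per_class)


# Made-up counts for two classes.
counts = {"PER": (8, 2, 2), "LOC": (2, 3, 3)}
micro = micro_f1(counts)
macro = macro_f1(counts)
```

Micro averaging weights every prediction equally, so frequent classes dominate. Macro averaging weights every class equally, so a weak rare class pulls the score down more.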

Bug Fixes

Fix serialization error for MacOS and Windows (#174)

On these setups, we got errors when serializing or loading large models. We've put in place a workaround that limits model size so it works on those systems. An added bonus is that models are smaller now.

"Frozen" dropout (#184)

Fixed a potentially serious issue in which dropout was frozen in the first epoch for embeddings produced from the character LM, meaning that the same dimensions stayed dropped throughout training.
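
The effect can be illustrated with a toy simulation, assuming for simplicity that one dropout mask is drawn per epoch. The helper names and sizes are hypothetical, and this is not the code that was fixed in flair.

```python
import random
from typing import List


def sample_mask(size: int, drop_prob: float, rng: random.Random) -> List[int]:
    """Return a 0/1 mask where 0 marks a dropped dimension."""
    return [0 if rng.random() < drop_prob else 1 for _ in range(size)]


def dims_ever_kept(masks: List[List[int]]) -> int:
    """Count dimensions that survive dropout in at least one mask."""
    return sum(1 for dims in zip(*masks) if any(dims))


rng = random.Random(0)  # fixed seed so the illustration is repeatable
size, drop_prob, epochs = 1000, 0.5, 5

# Frozen behaviour: one mask is sampled and then reused in every epoch.
frozen_mask = sample_mask(size, drop_prob, rng)
frozen_masks = [frozen_mask for _ in range(epochs)]

# Intended behaviour: a fresh mask is sampled for each epoch.
fresh_masks = [sample_mask(size, drop_prob, rng) for _ in range(epochs)]

frozen_kept = dims_ever_kept(frozen_masks)
fresh_kept = dims_ever_kept(fresh_masks)
```

With a frozen mask, the dropped dimensions never contribute to training at all. With resampled masks, nearly every dimension is seen at some point.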

Testing step in language model trainer (#178)

Previously, the language model was never applied to test data during training. A final testing step has been added back in.

Testing

Distinguish between unit and integration tests (#183)

Instructions on how to run tests with pipenv (#161)

Optimizations

Disable autograd during testing and prediction (#175)

Since autograd is unused here, this gives us minor speedups.

0.3.1

Not secure

This is a stability update over release 0.3.0 with small optimizations, refactorings and bug fixes. For a list of new features, refer to 0.3.0.

Optimizations

Retain Token embeddings in memory by default (#146)

Allow for faster training of the text classifier on large datasets by keeping token embeddings in memory.

Always clear embeddings after prediction (#149)

After prediction, remove embeddings from memory to avoid filling up memory.

Refactorings

Aligned TextClassifierTrainer and SequenceTaggerTrainer (#148)

Align signatures and features of the two training classes to make it easier to understand training options.

Updated DocumentLSTMEmbeddings (#150)

Remove unused flag and code from DocumentLSTMEmbeddings.

Removed unneeded AWS and Jinja2 dependencies (#158)

Some dependencies are no longer required.

Bug Fixes

Fixed error when predicting over empty sentences. (#157)

Serialization: reset cache settings when saving a model. (#153)

0.3.0

Not secure

Breaking Changes

New `Label` class with confidence score (https://github.com/zalandoresearch/flair/issues/38)

A tag prediction is not a simple string anymore but a `Label`, which holds a value and a confidence score.
To obtain the tag name you need to call `tag.value`. To get the score call `tag.score`. This can help you build
applications in which you only want to use predictions that lie above a specific confidence threshold.
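
A confidence threshold of this kind might be applied as in the sketch below. The `Label` dataclass here is a stand-in that only mirrors the `value` and `score` attributes described above. It is not flair's class, and the predictions are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Label:
    """Stand-in for illustration: a predicted value plus a confidence score."""
    value: str
    score: float


def confident(labels: List[Label], threshold: float) -> List[Label]:
    """Keep only predictions whose score reaches the threshold."""
    return [label for label in labels if label.score >= threshold]


# Made-up predictions with confidence scores.
predictions = [Label("PER", 0.98), Label("LOC", 0.41), Label("ORG", 0.87)]
kept = confident(predictions, threshold=0.8)
kept_values = [label.value for label in kept]
```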

`LockedDropout` moved to the new `flair.nn` module (https://github.com/zalandoresearch/flair/issues/48)

New Features

Multi-token spans (https://github.com/zalandoresearch/flair/issues/54, https://github.com/zalandoresearch/flair/issues/97)
Entities can now be wrapped into multi-token spans (type: `Span`). This is helpful for entities that span multiple words, such as "George Washington". A `Span` contains the position of the entity in the original text, the tag, a confidence score, and its text. You can get spans from a sentence by using the `get_spans()` method, like so:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load and run NER
tagger = SequenceTagger.load('ner')
tagger.predict(sentence)

# get span entities, together with tag and confidence score
for entity in sentence.get_spans('ner'):
    print('{} {} {}'.format(entity.text, entity.tag, entity.score))
```
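
For intuition about what a multi-token span is, the sketch below groups per-token tags into spans. It assumes a simple BIO tagging scheme and hand-written tags, and `bio_to_spans` is a hypothetical helper. flair's own `get_spans()` may work differently.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (text, entity type) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        continues = tag.startswith("I-") and current_type == tag[2:]
        # Close the open span unless this token continues it.
        if current_tokens and not continues:
            spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if continues:
            current_tokens.append(token)
        elif tag.startswith(("B-", "I-")):
            # A "B-" tag (or a stray "I-" tag) opens a new span.
            current_tokens, current_type = [token], tag[2:]
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["George", "Washington", "went", "to", "Washington", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]  # hand-written example tags
spans = bio_to_spans(tokens, tags)
```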

Predictions with confidence score (https://github.com/zalandoresearch/flair/issues/38)
Predicted tags are no longer simple strings, but objects of type `Label` that contain a value and a confidence score. These scores are extracted during prediction from the sequence tagger or text classifier and indicate how confident the model is of a prediction. Print confidence scores of tags like this:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load the POS tagger
tagger = SequenceTagger.load('pos')

# run POS over sentence
tagger.predict(sentence)

# print token, predicted POS tag and confidence score
for token in sentence:
    print('{} {} {}'.format(token.text, token.get_tag('pos').value, token.get_tag('pos').score))
```

Visualization routines (https://github.com/zalandoresearch/flair/issues/61)
`flair` now includes visualizations for plotting training curves and weights when training a sequence tagger or text classifier. We also added visualization routines for plotting embeddings and highlighting tags in a sentence. For instance, to visualize contextual string embeddings, do this:

```python
from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
from flair.embeddings import CharLMEmbeddings
from flair.visual import Visualizer

# get a list of Sentence objects
corpus = NLPTaskDataFetcher.fetch_data(NLPTask.CONLL_03).downsample(0.1)
sentences = corpus.train + corpus.test + corpus.dev

# init embeddings (can also be a StackedEmbedding)
embeddings = CharLMEmbeddings('news-forward-fast')

# embed corpus batch-wise, 8 sentences at a time
batches = [sentences[x:x + 8] for x in range(0, len(sentences), 8)]
for batch in batches:
    embeddings.embed(batch)

# visualize
visualizer = Visualizer()
visualizer.visualize_word_emeddings(embeddings, sentences, 'data/visual/embeddings.html')
```

Implementation of different dropouts (https://github.com/zalandoresearch/flair/issues/48)
Different dropout possibilities (Locked Dropout and Word Dropout) were added and can be used during training.

Memory management for training on large data sets (https://github.com/zalandoresearch/flair/issues/137)
`flair` now stores contextual string embeddings on disk to speed up training and allow for training on larger datasets.

Pre-trained language models for Polish
Added pre-trained language models for Polish, donated by [(Borchmann et al., 2018)](https://github.com/applicaai/poleval-2018). Load the Polish embeddings like this:

```python
from flair.embeddings import CharLMEmbeddings

flm_embeddings = CharLMEmbeddings('polish-forward')
blm_embeddings = CharLMEmbeddings('polish-backward')
```

Bug Fixes

Fix evaluation of sequence tagger (https://github.com/zalandoresearch/flair/issues/79, https://github.com/zalandoresearch/flair/issues/75)
The script `eval.pl` for sequence tagger contained bugs. `flair` now uses its own evaluation methods.

Fix bugs in text classifier (https://github.com/zalandoresearch/flair/issues/108)
Fixed bugs in single label training and out-of-memory errors during evaluation.

Others

Standardize logging output (https://github.com/zalandoresearch/flair/issues/16)
Logging output for sequence tagger and text classifier is improved and standardized.

Update torch version (https://github.com/zalandoresearch/flair/issues/34, https://github.com/zalandoresearch/flair/issues/106)

0.2


0.2.0

Breaking Changes

Reorganized package structure (#12)

There are now two packages: `flair.models` and `flair.trainers` for the models and model trainers respectively.

Models package
`flair.models` contains 3 model classes: `SequenceTagger`, `TextClassifier` and `LanguageModel`.

Trainers package
`flair.trainers` contains 3 model trainer classes: `SequenceTaggerTrainer`, `TextClassifierTrainer` and `LanguageModelTrainer`.

Direct import from package
You can import these classes directly from the packages. For instance, the SequenceTagger is now instantiated as:

```python
from flair.models import SequenceTagger
tagger = SequenceTagger.load('ner')
```

Reorganized embeddings (#12)

Introduced a clear distinction between token-level and document-level embeddings by adding two classes, `TokenEmbeddings` and `DocumentEmbeddings`, from which the respective embeddings need to inherit.

New Features

LanguageModelTrainer (#24, #17)

Added `LanguageModelTrainer` class to train your own LM embeddings.

Document Classification (#10)

Added experimental `TextClassifier` model for document-level text classification. Also added corresponding model trainer class, i.e. `TextClassifierTrainer`.

Batch prediction (#7)

Added batching to the prediction method for faster sequence tagging.

CPU-friendly pre-trained models (#29)

Added pre-trained models with smaller LM embeddings for faster CPU inference.

You can load them by adding '-fast' to the model name. They are only available for English at present.

```python
from flair.models import SequenceTagger
tagger = SequenceTagger.load('ner-fast')
```

Learning Rate Scheduling (#19)

Added learning rate schedulers to all trainer classes for improved learning rate annealing functionality and control.

Auto-spawn on GPUs (#19)

All model classes now automatically spawn on GPUs if available. The separate `.cuda()` call is no longer necessary.

Bug Fixes

Retagging error (#23)

Fixed error that occurred when using multiple pre-trained taggers on the same sentence.

Empty sentence error (#33)

Fixed error that caused data fetchers to sometimes create empty sentences.

Other

Unit Tests (#15)

Added a large set of automated unit tests for better stability.

Documentation (#15)

Expanded documentation and tutorials. Also expanded descriptions of APIs.

Code Simplifications in sequence tagger (#19)

A number of code simplifications all around, hopefully making the code easier to understand.

0.1.0

First release of Flair Framework

Static word embeddings:
- includes prepared word embeddings from GloVe, FastText, Numberbatch and Extvec
- includes prepared word embeddings for English, German and Swedish

Contextual string embeddings:
- includes pre-trained models for English and German

Text embeddings:
- Two experimental methods for full-text embeddings (LSTM and Mean)

Sequence labeling:
- pre-trained models for English (PoS-tagging, chunking and NER)
- pre-trained models for German (PoS-tagging and NER)
- experimental semantic frame detector for English


Releases

Has known vulnerabilities
