Breaking Changes

New `Label` class with confidence score (https://github.com/zalandoresearch/flair/issues/38)

A tag prediction is not a simple string anymore but a `Label`, which holds a value and a confidence score.
To obtain the tag name you need to call `tag.value`. To get the score call `tag.score`. This can help you build
applications in which you only want to use predictions that lie above a specific confidence threshold.
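
As an illustration of the thresholding idea, here is a minimal sketch. It does not import `flair`: the `Label` dataclass below is a hypothetical stand-in that only mirrors the `value` and `score` attributes described above, and `filter_confident` and the sample predictions are made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Label:
    """Hypothetical stand-in for a predicted label: a tag value plus a confidence score."""
    value: str
    score: float


def filter_confident(labels: List[Label], threshold: float) -> List[Label]:
    """Keep only the predictions whose score reaches the threshold."""
    return [label for label in labels if label.score >= threshold]


# Made-up predictions, for illustration only
predictions = [Label('PER', 0.98), Label('LOC', 0.55), Label('ORG', 0.81)]
confident = filter_confident(predictions, threshold=0.8)
print([label.value for label in confident])
```

With real predictions, the same filter would be applied to the `Label` objects returned by the tagger.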

`LockedDropout` moved to the new `flair.nn` module (https://github.com/zalandoresearch/flair/issues/48)

New Features

Multi-token spans (https://github.com/zalandoresearch/flair/issues/54, https://github.com/zalandoresearch/flair/issues/97)
Entities can now be wrapped into multi-token spans (type: `Span`). This is helpful for entities that span multiple words, such as "George Washington". A `Span` contains the position of the entity in the original text, the tag, a confidence score, and its text. You can get spans from a sentence by using the `get_spans()` method, like so:
from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load and run NER
tagger = SequenceTagger.load('ner')
tagger.predict(sentence)

# get span entities, together with tag and confidence score
for entity in sentence.get_spans('ner'):
    print('{} {} {}'.format(entity.text, entity.tag, entity.score))
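
To make the idea of a multi-token span concrete, the following sketch merges per-token tags into spans. It is not the library's implementation. It assumes a plain BIO tagging scheme and assumes the span score is the mean of the token scores; the `Span` tuple, `bio_to_spans`, and the sample scores are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    """Hypothetical stand-in for a multi-token span (not the library's class)."""
    text: str
    tag: str
    score: float
    start: int  # index of the first token
    end: int    # index of the last token (inclusive)


def bio_to_spans(tokens: List[Tuple[str, str, float]]) -> List[Span]:
    """Merge (word, BIO tag, score) triples into spans.

    Assumption: the span score is the mean of its token scores.
    """
    spans: List[Span] = []
    current: List[Tuple[int, str, float]] = []  # (index, word, score) of the open span
    current_tag: Optional[str] = None

    def close() -> None:
        # Emit the open span, if any, and reset the buffer
        if current:
            words = ' '.join(word for _, word, _ in current)
            mean = sum(score for _, _, score in current) / len(current)
            spans.append(Span(words, current_tag, mean, current[0][0], current[-1][0]))
            current.clear()

    for i, (word, tag, score) in enumerate(tokens):
        prefix, _, label = tag.partition('-')
        if prefix == 'B' or (prefix == 'I' and label != current_tag):
            close()  # a new entity starts here
            current_tag = label
            current.append((i, word, score))
        elif prefix == 'I':
            current.append((i, word, score))  # continue the open entity
        else:
            close()  # any other tag (e.g. 'O') ends the open entity
            current_tag = None
    close()
    return spans


# Made-up tags and scores, for illustration only
tagged = [('George', 'B-PER', 1.0), ('Washington', 'I-PER', 0.5),
          ('went', 'O', 1.0), ('to', 'O', 1.0),
          ('Washington', 'B-LOC', 0.75), ('.', 'O', 1.0)]
for span in bio_to_spans(tagged):
    print(span.text, span.tag, span.score)
```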

Predictions with confidence score (https://github.com/zalandoresearch/flair/issues/38)
Predicted tags are no longer simple strings, but objects of type `Label` that contain a value and a confidence score. These scores are extracted during prediction from the sequence tagger or text classifier and indicate how confident the model is of a prediction. Print confidence scores of tags like this:

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load the POS tagger
tagger = SequenceTagger.load('pos')

# run POS over sentence
tagger.predict(sentence)

# print token, predicted POS tag and confidence score
for token in sentence:
    print('{} {} {}'.format(token.text, token.get_tag('pos').value, token.get_tag('pos').score))

Visualization routines (https://github.com/zalandoresearch/flair/issues/61)
`flair` now includes visualizations for plotting training curves and weights when training a sequence tagger or text classifier. We also added visualization routines for plotting embeddings and highlighting tags in a sentence. For instance, to visualize contextual string embeddings, do this:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
from flair.embeddings import CharLMEmbeddings
from flair.visual import Visualizer

# get a list of Sentence objects
corpus = NLPTaskDataFetcher.fetch_data(NLPTask.CONLL_03).downsample(0.1)
sentences = corpus.train + corpus.test + corpus.dev

# init embeddings (can also be a StackedEmbedding)
embeddings = CharLMEmbeddings('news-forward-fast')

# embed corpus batch-wise
batches = [sentences[x:x + 8] for x in range(0, len(sentences), 8)]
for batch in batches:
    embeddings.embed(batch)

# visualize the embedded sentences and write the result to an HTML file
visualizer = Visualizer()
visualizer.visualize_word_emeddings(embeddings, sentences, 'data/visual/embeddings.html')

Implementation of different dropouts (https://github.com/zalandoresearch/flair/issues/48)
Different dropout possibilities (Locked Dropout and Word Dropout) were added and can be used during training.
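
The difference between the two dropout variants can be sketched in plain Python, operating on a sentence represented as a list of token vectors. This is an illustration of the general techniques, not the code in `flair.nn`. The function names are hypothetical, and the rescaling of kept features by `1 / (1 - p)` in locked dropout is an assumption of this sketch.

```python
import random
from typing import List

TokenVectors = List[List[float]]  # one embedding vector per token


def locked_dropout(seq: TokenVectors, p: float, rng: random.Random) -> TokenVectors:
    """Sample ONE mask over the feature dimension and reuse it for every token.

    Kept features are rescaled by 1 / (1 - p); requires 0 <= p < 1.
    """
    keep = 1.0 - p
    dim = len(seq[0])
    mask = [(1.0 / keep if rng.random() < keep else 0.0) for _ in range(dim)]
    return [[x * m for x, m in zip(vec, mask)] for vec in seq]


def word_dropout(seq: TokenVectors, p: float, rng: random.Random) -> TokenVectors:
    """Zero out entire token vectors, each independently with probability p."""
    return [[0.0] * len(vec) if rng.random() < p else list(vec) for vec in seq]


rng = random.Random(0)
sentence_vectors = [[1.0] * 8 for _ in range(5)]  # 5 tokens, 8 features each
locked = locked_dropout(sentence_vectors, 0.5, rng)
dropped = word_dropout(sentence_vectors, 0.5, rng)
print(locked[0])
print([sum(vec) for vec in dropped])
```

In the locked variant every token loses the same features, while in the word variant some tokens disappear completely and the rest are untouched.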

Memory management for training on large data sets (https://github.com/zalandoresearch/flair/issues/137)
`flair` now stores contextual string embeddings on disk to speed up training and allow for training on larger datasets.
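
The general pattern of caching embeddings on disk can be sketched as follows. This note does not describe how the library implements it, so everything here is an assumption made for illustration: the `DiskEmbeddingCache` class, the SQLite and JSON storage format, and the toy embedding function are all hypothetical.

```python
import json
import os
import sqlite3
import tempfile
from typing import Callable, List

Vectors = List[List[float]]


class DiskEmbeddingCache:
    """Hypothetical cache that stores one list of token vectors per sentence text."""

    def __init__(self, path: str, embed_fn: Callable[[str], Vectors]):
        self.embed_fn = embed_fn
        self.db = sqlite3.connect(path)
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS cache (text TEXT PRIMARY KEY, vectors TEXT)')

    def get(self, text: str) -> Vectors:
        row = self.db.execute(
            'SELECT vectors FROM cache WHERE text = ?', (text,)).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: skip the expensive model call
        vectors = self.embed_fn(text)  # cache miss: compute once, then persist
        self.db.execute('INSERT INTO cache VALUES (?, ?)', (text, json.dumps(vectors)))
        self.db.commit()
        return vectors


calls = []


def toy_embed(text: str) -> Vectors:
    """Stand-in for an expensive embedding model; records each call."""
    calls.append(text)
    return [[float(len(word))] for word in text.split()]


cache_path = os.path.join(tempfile.mkdtemp(), 'embeddings.sqlite')
cache = DiskEmbeddingCache(cache_path, toy_embed)
first = cache.get('George Washington went to Washington .')
second = cache.get('George Washington went to Washington .')
print(len(calls))
```

The second lookup is served from disk, so the embedding function runs only once per distinct sentence, and the vectors do not need to stay in memory between epochs.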

Pre-trained language models for Polish
Added pre-trained language models for Polish, donated by [(Borchmann et al., 2018)](https://github.com/applicaai/poleval-2018). Load the Polish embeddings like this:

flm_embeddings = CharLMEmbeddings('polish-forward')
blm_embeddings = CharLMEmbeddings('polish-backward')

Bug Fixes

Fix evaluation of sequence tagger (https://github.com/zalandoresearch/flair/issues/79, https://github.com/zalandoresearch/flair/issues/75)
The `eval.pl` script used to evaluate the sequence tagger contained bugs. `flair` now uses its own evaluation methods.

Fix bugs in text classifier (https://github.com/zalandoresearch/flair/issues/108)
Fixed bugs in single label training and out-of-memory errors during evaluation.


Standardize logging output (https://github.com/zalandoresearch/flair/issues/16)
Logging output for the sequence tagger and text classifier is improved and standardized.

Update torch version (https://github.com/zalandoresearch/flair/issues/34, https://github.com/zalandoresearch/flair/issues/106)


Breaking Changes

Reorganized package structure #12

There are now two packages: `flair.models` and `flair.trainers` for the models and model trainers respectively.

Models package
`flair.models` contains 3 model classes: `SequenceTagger`, `TextClassifier` and `LanguageModel`.

Trainers package
`flair.trainers` contains 3 model trainer classes: `SequenceTaggerTrainer`, `TextClassifierTrainer` and `LanguageModelTrainer`.

Direct import from package
You can now import these classes directly from their packages. For instance, the `SequenceTagger` is now instantiated as:

from flair.models import SequenceTagger
tagger = SequenceTagger.load('ner')

Reorganized embeddings #12

Token-level and document-level embeddings are now clearly separated by two new base classes, `TokenEmbeddings` and `DocumentEmbeddings`, from which the respective embeddings need to inherit.
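
The split can be pictured with a small class hierarchy. The sketch below does not import `flair` and does not reproduce its interfaces: the two base classes are stand-ins that share only their names with the library, and the method names `embed_tokens` and `embed_document` as well as the two toy subclasses are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List


class TokenEmbeddings(ABC):
    """Stand-in base class: produces one vector per token."""

    @abstractmethod
    def embed_tokens(self, tokens: List[str]) -> List[List[float]]:
        ...


class DocumentEmbeddings(ABC):
    """Stand-in base class: produces a single vector for the whole text."""

    @abstractmethod
    def embed_document(self, tokens: List[str]) -> List[float]:
        ...


class LengthEmbeddings(TokenEmbeddings):
    """Toy token-level embedding: each token is represented by its length."""

    def embed_tokens(self, tokens: List[str]) -> List[List[float]]:
        return [[float(len(token))] for token in tokens]


class MeanEmbeddings(DocumentEmbeddings):
    """Toy document-level embedding: the mean of the token vectors."""

    def __init__(self, token_embeddings: TokenEmbeddings):
        self.token_embeddings = token_embeddings

    def embed_document(self, tokens: List[str]) -> List[float]:
        vectors = self.token_embeddings.embed_tokens(tokens)
        dim = len(vectors[0])
        return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


words = ['to', 'Washington']
token_level = LengthEmbeddings()
document_level = MeanEmbeddings(token_level)
print(token_level.embed_tokens(words))
print(document_level.embed_document(words))
```

Code that needs per-token vectors can then require a `TokenEmbeddings` instance, and code that needs one vector per text can require a `DocumentEmbeddings` instance.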

New Features

LanguageModelTrainer #24 #17

Added `LanguageModelTrainer` class to train your own LM embeddings.

Document Classification #10

Added experimental `TextClassifier` model for document-level text classification. Also added corresponding model trainer class, i.e. `TextClassifierTrainer`.

Batch prediction #7

Added batching to the prediction method for faster sequence tagging.
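
Batching here means that sentences are grouped into mini-batches and each group is processed in one model call. A minimal sketch of that grouping follows. The `batched` helper, the `fake_predict` function, and the batch size of 2 are hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar('T')


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive mini-batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def fake_predict(batch: List[str]) -> List[int]:
    """Stand-in for one model call on a whole mini-batch: returns token counts."""
    return [len(sentence.split()) for sentence in batch]


sentences = ['George Washington went to Washington .',
             'He stayed there .',
             'The end .']
results: List[int] = []
for batch in batched(sentences, batch_size=2):
    results.extend(fake_predict(batch))  # one call per mini-batch, not per sentence
print(results)
```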

CPU-friendly pre-trained models #29

Added pre-trained models with smaller LM embeddings for faster inference on CPU.

You can load them by adding '-fast' to the model name. They are currently available for English only.
from flair.models import SequenceTagger
tagger = SequenceTagger.load('ner-fast')

Learning Rate Scheduling #19

Added learning rate schedulers to all trainer classes for better control over learning rate annealing.
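
One common annealing policy is to reduce the learning rate when the loss stops improving. The sketch below illustrates that policy in plain Python. It is an assumption that a plateau-based policy is the relevant one here; the `PlateauAnnealer` class and its `factor` and `patience` parameters are hypothetical and not part of the trainer API.

```python
class PlateauAnnealer:
    """Hypothetical scheduler: multiply the learning rate by `factor` once the
    loss has failed to improve for more than `patience` consecutive epochs."""

    def __init__(self, learning_rate: float, factor: float = 0.5, patience: int = 2):
        self.learning_rate = learning_rate
        self.factor = factor
        self.patience = patience
        self.best_loss = float('inf')
        self.bad_epochs = 0

    def step(self, loss: float) -> float:
        """Record the loss of one epoch and return the learning rate to use next."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.learning_rate *= self.factor  # anneal
                self.bad_epochs = 0
        return self.learning_rate


scheduler = PlateauAnnealer(learning_rate=0.1, factor=0.5, patience=1)
losses = [1.0, 0.9, 0.95, 0.96, 0.97]  # made-up dev losses per epoch
history = [scheduler.step(loss) for loss in losses]
print(history)
```

In this run the rate is halved once, after the second consecutive epoch without improvement.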

Auto-spawn on GPUs #19

All model classes now automatically spawn on GPUs if available. The separate `.cuda()` call is no longer necessary.

Bug Fixes

Retagging error #23

Fixed error that occurred when using multiple pre-trained taggers on the same sentence.

Empty sentence error #33

Fixed error that caused data fetchers to sometimes create empty sentences.


Unit Tests #15

Added a large set of automated unit tests for better stability.

Documentation #15

Expanded documentation and tutorials. Also expanded descriptions of APIs.

Code Simplifications in sequence tagger #19

Simplified the sequence tagger code in a number of places to make it easier to understand.


First release of Flair Framework

Static word embeddings:
- includes prepared word embeddings from GloVe, FastText, Numberbatch and Extvec
- includes prepared word embeddings for English, German and Swedish

Contextual string embeddings:
- includes pre-trained models for English and German

Text embeddings:
- Two experimental methods for full-text embeddings (LSTM and Mean)

Sequence labeling:
- pre-trained models for English (PoS-tagging, chunking and NER)
- pre-trained models for German (PoS-tagging and NER)
- experimental semantic frame detector for English

