Flair

Latest version: v0.15.1


0.4.5

This is an enhancement release that slims down Flair for quicker/easier installation and smaller library size. It also makes Flair compatible with torch 1.4.0 and adds enhancements that reduce model size and improve runtime speed for some embeddings. New features include the ability to steer the precision/recall tradeoff during training of models and support for CamemBERT embeddings.


Memory, Runtime and Dependency Improvements

Slim down dependency tree (1296 1299 1335 1336)

We want to keep the list of dependencies of Flair small, both to avoid errors like 1245 and to keep the library lightweight and quick to set up. We therefore removed dependencies that were each only used for one particular feature, namely:
- `ipython` and `ipython-genutils`, only used for visualization settings in iPython notebooks
- `tiny_tokenizer`, used for Japanese tokenization (replaced with instructions for how to install for all users who want to use Japanese tokenizers)
- `pymongo`, used for MongoDB datasets (replaced with instructions for how to install for all users who want to use MongoDB datasets)
- `torchvision`, now only loaded when needed

We also relaxed version requirements for easier installation on Google Colab (1335 1336).

Dramatic speed-up of BERT embeddings (1308)

shoarora optimized the `BertEmbeddings` implementation by removing redundant calls. This was shown to yield dramatic speed improvements.

Reduce size of models that use WordEmbeddings (1315)

timnon added a method to replace the word embeddings in a trained model with an SQLite database, dramatically reducing memory usage. The new class `WordEmbeddingsStore` can be used to replace a `WordEmbeddings` instance in a Flair model via duck typing. Using this, timnon was able to reduce our NER server's memory consumption from 6 GB to 600 MB (a 10x decrease) by adding a few lines of code. It can be tested using the following lines (also in the docstring). First, create a headless version of a model without word embeddings:

```python
from flair.inference_utils import WordEmbeddingsStore
from flair.models import SequenceTagger
import pickle

tagger = SequenceTagger.load("multi-ner-fast")
WordEmbeddingsStore.create_stores(tagger)
pickle.dump(tagger, open("multi-ner-fast-headless.pickle", "wb"))
```

and then, to run the stored headless model without word embeddings, use:

```python
from flair.data import Sentence

tagger = pickle.load(open("multi-ner-fast-headless.pickle", "rb"))
WordEmbeddingsStore.load_stores(tagger)
text = "Schade um den Ameisenbären. Lukas Bärfuss veröffentlicht Erzählungen aus zwanzig Jahren."
sentence = Sentence(text)
tagger.predict(sentence)
```



New Features

Prioritize precision/recall or specific classes during training (1345)

klasocki added ways to steer the precision/recall tradeoff during training of models, as well as prioritize certain classes. This option was added to the `SequenceTagger` and the `TextClassifier`.

You can steer the precision/recall tradeoff by adding the `beta` parameter, which indicates how many times more important recall is than precision. So if you set `beta=0.5`, precision becomes twice as important as recall. If you set `beta=2`, recall becomes twice as important as precision. Do it like this:

```python
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type=tag_type,
    beta=0.5)
```


If you want to prioritize certain classes, you can pass a `loss_weights` dictionary to the model classes. For instance, to prioritize learning the NEGATIVE class in a sentiment tagger, do:

```python
tagger = TextClassifier(
    document_embeddings=embeddings,
    label_dictionary=tag_dictionary,
    loss_weights={'NEGATIVE': 10.})
```


which will increase the importance of class NEGATIVE by a factor of 10.

CamemBERT Embeddings (1297)
stefan-it added support for the recently proposed French language model: CamemBERT.

Thanks to the awesome 🤗/Transformers library, CamemBERT can be used in Flair like in this example:

```python
from flair.data import Sentence
from flair.embeddings import CamembertEmbeddings

embedding = CamembertEmbeddings()

sentence = Sentence("J'aime le camembert !")
embedding.embed(sentence)

for token in sentence.tokens:
    print(token.embedding)
```


Bug fixes and enhancements

- Fix new RNN format for torch 1.4.0 (1360, 1382 )
- Fix memory issue in PooledFlairEmbeddings (1337 1339)
- Correct subtoken mapping function for GPT-2 and RoBERTa (1242)
- Update the transformers library to the latest 2.3 version (1333)
- Add staticmethod decorator to some functions (1257)
- Add a warning if validation data is too small (1115)
- Remove leftover printline from MUSE embeddings (1224)
- Correct generate_text() UTF-8 conversion (1238)
- Clarify documentation (1295 1332)
- Replace sklearn by scikit-learn (1321)
- Fix off-by-one error in progress logging (1334)
- Fix typo and annotation (1341)
- Various improvements (1347)
- Make load_big_file work with read-only file (1353)
- Rename tiny_tokenizer to konoha (1363)
- Make test loss plotting optional (1372)
- Add pretty print function for Dictionary (1375)

0.4.4

Release 0.4.4 introduces dramatic improvements in inference speed for taggers (thanks to many contributions by pommedeterresautee), Flair embeddings in 300 languages (thanks stefan-it), modular tokenization and many new features and refactorings.


Speed optimizations

Many refactorings by pommedeterresautee to improve inference speed of sequence tagger (1038 1053 1068 1093 1130), Flair embeddings (1074 1095 1107 1132 1145), word embeddings (1084),
embeddings memory management (1082 1117), general optimizations (1112) and classification (1187).

The combined improvements **increase inference speed by a factor of 2-3**!

New features

Modular tokenization (1022)

You can now pass custom tokenizers to `Sentence` objects and `Dataset` loaders to use tokenizers other than the included `segtok` library by implementing a tokenizer method. Currently, built-in support exists for whitespace tokenization, segtok tokenization and Japanese tokenization with mecab (requires mecab to be installed). In the future, we expect support for additional external tokenizers to be added.

For instance, if you wish to use Japanese tokenization performed by mecab, you can instantiate the `Sentence` object like this:

```python
from flair.data import build_japanese_tokenizer
from flair.data import Sentence

# instantiate Japanese tokenizer
japanese_tokenizer = build_japanese_tokenizer()

# init sentence and pass this tokenizer
sentence = Sentence("私はベルリンが好きです。", use_tokenizer=japanese_tokenizer)
print(sentence)
```



Flair Embeddings for 300 languages (1146)

Thanks to stefan-it, there is now a massively multilingual Flair embeddings model that covers 300 languages. See 1099 for more info on these embeddings and [this repo](https://github.com/stefan-it/flair-lms#multilingual-flair-embeddings) for more details.

This replaces the old multilingual Flair embeddings that were trained for 6 languages. Load them with:

```python
from flair.embeddings import FlairEmbeddings

embeddings_fw = FlairEmbeddings('multi-forward')
embeddings_bw = FlairEmbeddings('multi-backward')
```


Multilingual Character Dictionaries (1157)

Adds two multilingual character dictionaries computed by stefan-it.

Load with

```python
from flair.data import Dictionary

dictionary = Dictionary.load('chars-large')
print(len(dictionary.idx2item))

dictionary = Dictionary.load('chars-xl')
print(len(dictionary.idx2item))
```


Batch-growth annealing (1138)

The paper [Don't Decay the Learning Rate, Increase the Batch Size](https://arxiv.org/abs/1711.00489) makes the case for increasing the batch size over time instead of annealing the learning rate.

This version adds the possibility to have arbitrarily large mini-batch sizes with an accumulating gradient strategy. It introduces the parameter `mini_batch_chunk_size` that you can set to break down large mini-batches into smaller chunks for processing purposes.

So let's say you want to have a mini-batch size of 128, but your memory cannot handle more than 32 samples at a time. Then you can train like this:

```python
trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "path/to/experiment/folder",
    # set large mini-batch size
    mini_batch_size=128,
    # set chunk size to lower memory requirements
    mini_batch_chunk_size=32,
)
```


Because we can now raise the mini-batch size arbitrarily, we can execute the annealing strategy from the paper above. Do it like this:

```python
trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "path/to/experiment/folder",
    # set initial mini-batch size
    mini_batch_size=32,
    # choose batch growth annealing
    batch_growth_annealing=True,
)
```


Document-level sequence labeling (1194)

Introduces the option of reading entire documents into one `Sentence` object for sequence labeling. This option is now supported for the `CONLL_03`, `CONLL_03_GERMAN` and `CONLL_03_DUTCH` datasets, which indicate document boundaries.

Here's how to train a model on CoNLL-03 on the document level:

```python
from flair.datasets import CONLL_03
from flair.embeddings import WordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# read CoNLL-03 with document_as_sequence=True
corpus = CONLL_03(in_memory=True, document_as_sequence=True)

# what tag do we want to predict?
tag_type = 'ner'

# make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# init simple tagger with GloVe embeddings
tagger: SequenceTagger = SequenceTagger(
    hidden_size=256,
    embeddings=WordEmbeddings('glove'),
    tag_dictionary=tag_dictionary,
    tag_type=tag_type,
)

# initialize trainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# start training
trainer.train(
    'path/to/your/experiment',
    # set a much smaller mini-batch size because documents are huge
    mini_batch_size=2,
)
```


Option to evaluate on training split (1202)

Previously, the `ModelTrainer` only allowed monitoring of dev and test splits during training. Now, you can also monitor the train split to better check if your method is overfitting.
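
The release notes don't include a snippet for this; below is a minimal sketch of how it might be enabled, assuming the option is exposed as a `monitor_train` flag on `ModelTrainer.train()` and given an existing `tagger` and `corpus`:

```python
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    'path/to/experiment/folder',
    # assumed flag: also evaluate on the training split after each epoch
    monitor_train=True,
)
```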

Support for Danish tagging (1183)

Adds support for Danish POS and NER thanks to AmaliePauli!

Use like this:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# example sentence
sentence = Sentence("København er en fantastisk by .")

# load Danish NER model and predict
ner_tagger = SequenceTagger.load('da-ner')
ner_tagger.predict(sentence)

# print annotations (NER)
print(sentence.to_tagged_string())

# load Danish POS model and predict
pos_tagger = SequenceTagger.load('da-pos')
pos_tagger.predict(sentence)

# print annotations (NER + POS)
print(sentence.to_tagged_string())
```


Support for DistilBERT embeddings (1044)

You can use them like this:

```python
from flair.data import Sentence
from flair.embeddings import BertEmbeddings

embeddings = BertEmbeddings("distilbert-base-uncased")

s = Sentence("Berlin and Munich are nice cities .")
embeddings.embed(s)

for token in s.tokens:
    print(token.embedding)
    print(token.embedding.shape)
```


MongoDataset for reading text classification data from a Mongo database (1192)

Adds the option of reading data from MongoDB. See [this documentation](https://github.com/zalandoresearch/flair/pull/1192#issuecomment-540015019) on how to use this feature.

Feidegger corpus (1199)

Adds a dataset downloader for the Feidegger corpus consisting of text-image pairs. Instantiate the corpus like this:

```python
from flair.datasets import FeideggerCorpus

# instantiate Feidegger corpus
corpus = FeideggerCorpus()

# print a text-image pair
print(corpus.train[0])
```


Refactorings

Refactor checkpointing mechanism (1101)

Refactored the checkpointing mechanism and slimmed down interfaces / code required to load checkpoints.

In detail:

- The methods `save_checkpoint` and `load_checkpoint` are no longer part of the `flair.nn.Model` interface. Instead, saving and restoring checkpoints is now (fully) performed by the `ModelTrainer`.
- The optimizer state and scheduler state are removed from the `ModelTrainer` constructor since they are no longer required here.
- Loading a checkpoint is now one line of code (previously two lines).

```python
# 1. initialize trainer as always with a model and a corpus
from flair.trainers import ModelTrainer
trainer: ModelTrainer = ModelTrainer(model, corpus)

# 2. train your model for 2 epochs
trainer.train(
    'experiment/folder',
    max_epochs=2,
    # example checkpointing
    checkpoint=True,
)

# 3. load last checkpoint with one line of code
trainer = ModelTrainer.load_checkpoint('experiment/folder/checkpoint.pt', corpus)

# 4. continue training for 2 extra epochs
trainer.train('experiment/folder_2', max_epochs=4)
```


Refactor data sampling during training (1154)

Adds a `FlairSampler` interface to better enable passing custom samplers to the `ModelTrainer`.

For instance, if you want to always shuffle your dataset in chunks of 5 to 10 sentences, you provide a sampler like this:

```python
# your trainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# execute training run
trainer.train('path/to/experiment/folder',
              max_epochs=150,
              # sample data in chunks of 5 to 10
              sampler=ChunkSampler(block_size=5, plus_window=5)
              )
```


Other refactorings

- Switch everything to batch first mode (1077)

- Refactor classification to be more consistent with SequenceTagger (1151)

- PyTorch-Transformers -> Transformers 1163

- In-place transpose of tensors (1047)


Enhancements

Documentation fixes (1045 1098 1121 1157 1160 1168 )

Add option to set `rnn_type` used in `SequenceTagger` (1113)
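
The release notes give no snippet for this option; here is a hedged sketch, assuming `rnn_type` accepts a value such as `'GRU'` alongside the default LSTM and reusing the `embeddings` and `tag_dictionary` variables from the earlier snippets:

```python
from flair.models import SequenceTagger

# hypothetical usage: switch the tagger's recurrent layer from the default LSTM to a GRU
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type='ner',
    rnn_type='GRU',
)
```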
Accept string as input in NER predict (1142)

Example usage:

```python
from flair.models import SequenceTagger

# init tagger
tagger = SequenceTagger.load('ner')

# predict over list of strings
sentences = tagger.predict(
    [
        'George Washington went to Berlin .',
        'George Berlin lived in Washington .'
    ]
)

# output predictions
for sentence in sentences:
    print(sentence.to_tagged_string())
```


Enable One-hot Embeddings of other Tags (1191)
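
No usage example was given in the release notes for this; a speculative sketch, assuming `OneHotEmbeddings` takes a `field` argument naming the tag type to embed and given an existing `corpus`:

```python
from flair.embeddings import OneHotEmbeddings

# speculative: embed the POS tag of each token instead of its surface form
pos_embeddings = OneHotEmbeddings(corpus, field='pos')
```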



Bug fixes

- Fix the learning rate finder (1119)
- Fix OneHotEmbeddings on CUDA (1147)
- Fix encoding error in `CSVClassificationDataset` (1055)
- Fix encoding errors related to old Windows characters (1135)
- Fix length error in `CharacterEmbeddings` (1088)
- Fix tokenizer inserting an empty token into the sentence object (1226)
- Ensure `StackedEmbeddings` always has the same embedding order (1114)
- Use $HOME instead of ~ for `cache_root` (1134)

0.4.3

Release 0.4.3 includes a host of new features including transformer-based embeddings (RoBERTa, XLNet, XLM, etc.), fine-tuneable `FlairEmbeddings`, crosslingual MUSE embeddings, new data loading/sampling methods, speed/memory optimizations, bug fixes and enhancements. It also begins a refactoring of interfaces that prepares Flair for more general applicability to other types of downstream tasks.

Embeddings

Transformer embeddings (941 972 993)

Updates the old `pytorch-pretrained-BERT` library to the latest version of `pytorch-transformers` to support various new Transformer-based architectures for embeddings.

A total of 7 (new/updated) transformer-based embeddings can be used in Flair now:

```python
from flair.embeddings import (
    BertEmbeddings,
    OpenAIGPTEmbeddings,
    OpenAIGPT2Embeddings,
    TransformerXLEmbeddings,
    XLNetEmbeddings,
    XLMEmbeddings,
    RoBERTaEmbeddings,
)

bert_embeddings = BertEmbeddings()
gpt1_embeddings = OpenAIGPTEmbeddings()
gpt2_embeddings = OpenAIGPT2Embeddings()
txl_embeddings = TransformerXLEmbeddings()
xlnet_embeddings = XLNetEmbeddings()
xlm_embeddings = XLMEmbeddings()
roberta_embeddings = RoBERTaEmbeddings()
```


Detailed benchmarks on the downsampled CoNLL-2003 NER dataset for English can be found in 873 .

Crosslingual MUSE Embeddings (853)

Use the new `MuseCrosslingualEmbeddings` class to embed any sentence in one of 30 languages into the same embedding space. Behind the scenes, the class first performs language detection on the sentence to be embedded and then embeds it with the appropriate language embeddings. If you train a classifier or sequence labeler with (only) this class, it will automatically work across all 30 languages, though quality may vary widely.

Here's how to embed:
```python
import torch

from flair.data import Sentence
from flair.embeddings import MuseCrosslingualEmbeddings

# initialize embeddings
embeddings = MuseCrosslingualEmbeddings()

# two sentences in different languages
sentence_1 = Sentence("This red shoe is new .")
sentence_2 = Sentence("Dieser rote Schuh ist rot .")

# language code is auto-detected
print(sentence_1.get_language_code())
print(sentence_2.get_language_code())

# embed sentences
embeddings.embed([sentence_1, sentence_2])

# print similarities
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-6)
for token_1, token_2 in zip(sentence_1, sentence_2):
    print(f"'{token_1.text}' and '{token_2.text}' similarity: {cos(token_1.embedding, token_2.embedding)}")
```



FastTextEmbeddings (879)

Adds `FastTextEmbeddings` capable of handling out-of-vocabulary (OOV) words. Be warned though that these embeddings are huge. `BytePairEmbeddings` are much smaller and reportedly of similar quality, so it is probably advisable to use those instead.
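
For illustration, a small sketch of the smaller alternative mentioned above, assuming `BytePairEmbeddings` accepts a language code such as 'en':

```python
from flair.data import Sentence
from flair.embeddings import BytePairEmbeddings

# subword-level embeddings for English; far smaller than full fastText models
embeddings = BytePairEmbeddings('en')

sentence = Sentence('The grass is green .')
embeddings.embed(sentence)
```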

Fine-tuneable FlairEmbeddings (922)

You can now fine-tune FlairEmbeddings on downstream tasks. You can **fine-tune an existing LM** by simply passing the `fine_tune` parameter in the `FlairEmbeddings` constructor, like this:

```python
embeddings = FlairEmbeddings('news-forward', fine_tune=True)
```


You can also use this option to **task-train a wholly new language model** by passing an empty `LanguageModel` to the `FlairEmbeddings` constructor and the `fine_tune` parameter, like this:

```python
from flair.data import Dictionary
from flair.embeddings import FlairEmbeddings
from flair.models import LanguageModel

# make an empty language model
language_model = LanguageModel(
    Dictionary.load('chars'),
    is_forward_lm=True,
    hidden_size=256,
    nlayers=1)

# init FlairEmbeddings to task-train this model
embeddings = FlairEmbeddings(language_model, fine_tune=True)
```



Optimizations

Automatic mixed precision support (934)

Mixed precision training can significantly speed up training. It can now be enabled by setting `use_amp=True` in the trainer classes. For instance for training language models you can do:

```python
# train your language model
trainer = LanguageModelTrainer(language_model, corpus)

trainer.train('resources/taggers/language_model',
              sequence_length=256,
              mini_batch_size=256,
              max_epochs=10,
              use_amp=True)
```


In our experiments, we saw a 3x speedup when training large language models, though results vary depending on your model size and experimental setup.

Control memory / speed tradeoff during training (891 809).

This release introduces the `embeddings_storage_mode` parameter to the `ModelTrainer` class and `predict()` methods. This parameter can be one of 'none', 'cpu' and 'gpu' and allows you to control the tradeoff between memory usage and speed during training:

- If set to '**none**' all embeddings are deleted after usage - this has lowest memory requirements but means that embeddings need to be recomputed at each epoch of training potentially causing a slowdown.
- If set to '**cpu**' all embeddings are moved to CPU memory after usage. During training, this means they only need to be moved back to GPU for the forward pass rather than recomputed, so in many cases this is faster, but it requires CPU memory.
- If set to '**gpu**' all embeddings stay on GPU memory after computation. This eliminates memory shuffling during training, causing a speedup. However this option requires enough GPU memory to be available for all embeddings of the dataset.

To use this option during training, simply set the parameter:

```python
# initialize trainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)
trainer.train(
    "path/to/your/model",
    embeddings_storage_mode='gpu',
)
```


This release also removes the `FlairEmbeddings`-specific disk-caching mechanism. In the future, a more general caching mechanism applicable to all embedding types may potentially be added as a fourth memory management option.

Speed-ups on in-memory datasets (792)

A new `DataLoader` abstract base class used in Flair will speed up data loading for in-memory datasets.


Refactoring of interfaces (891 843)

This release also slims down interfaces of `flair.nn.Model` and adds a new `DataPoint` interface that is currently implemented by the `Token` and `Sentence` classes. The idea is to widen the applicability of Flair to other data types and other tasks. In the future, the `DataPoint` interface will for example also be implemented by an `Image` object and new downstream tasks added to Flair.

The release also slims down the `evaluate()` method in the `flair.nn.Model` interface to take a `DataLoader` instead of a group of parameters, and it refactors the logging header logic. Both refactorings prepare for adding new downstream tasks to Flair in the near future.

Other features

Training Classifiers with CSV files (826 952 967)

Adds the `CSVClassificationCorpus` so you can train classifiers directly from CSVs instead of first having to convert them to FastText format. To load a CSV, you need to pass a `column_name_map` (like in `ColumnCorpus`), which indicates which column(s) in the CSV hold the text and which hold the label(s):

```python
corpus = CSVClassificationCorpus(
    # path to the data folder containing train / test / dev files
    data_folder='path/to/data',
    # indicates which columns are text and labels
    column_name_map={4: "text", 1: "label_topic", 2: "label_subtopic"},
    # if CSV has a header, you can skip it
    skip_header=True)
```


Data sampling (908)

We added the first (of many) data samplers that can be passed to the `ModelTrainer` to influence training. The `ImbalancedClassificationDatasetSampler` for instance will upsample rare classes and downsample common classes in a classification dataset. It may potentially help with imbalanced datasets. Call like this:
```python
# initialize trainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)
trainer.train(
    'path/to/folder',
    learning_rate=0.1,
    mini_batch_size=32,
    sampler=ImbalancedClassificationDatasetSampler,
)
```

There are also two experimental chunk samplers (`ChunkSampler` and `ExpandingChunkSampler`) that split a dataset into chunks and shuffle them. This preserves some ordering of the original data while also randomizing it.


Visualization

- Adds HTML visualization of sequence labeling (933). Call like this:
```python
from flair.data import Sentence
from flair.models import SequenceTagger
from flair.visual.ner_html import render_ner_html

tagger = SequenceTagger.load('ner')

sentence = Sentence(
    "Thibaut Pinot's challenge ended on Friday due to injury, and then Julian Alaphilippe saw "
    "his lead fall away. The BBC's Hugh Schofield in Paris reflects on 34 years of hurt."
)

tagger.predict(sentence)
html = render_ner_html(sentence)

with open("sentence.html", "w") as writer:
    writer.write(html)
```


- Plotter now returns images for use in iPython notebooks (943)
- Initial TensorBoard support (924)
- Add pointer to Flair Visualizer (1014)

Additional parameterization options

- `CharacterEmbeddings` now let you specify the number of hidden states and the embedding size (834)

```python
embedding = CharacterEmbeddings(char_embedding_dim=64, hidden_size_char=64)
```

- Adds configuration option for a minimal learning rate stopping criterion (871); see the sketch after this list
- `num_workers` is a parameter of `LanguageModelTrainer` (962)
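
The minimal learning rate criterion from (871) stops training once annealing pushes the learning rate below a threshold. A hedged sketch, assuming the threshold is exposed as a `min_learning_rate` argument of `ModelTrainer.train()`:

```python
trainer: ModelTrainer = ModelTrainer(tagger, corpus)
trainer.train(
    'path/to/folder',
    learning_rate=0.1,
    anneal_factor=0.5,
    # assumed parameter name: stop once the annealed learning rate drops below this value
    min_learning_rate=0.001,
)
```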

Bug fixes / enhancements
- Updates old pretrained models to remove old bugs / performance issues (1017)
- Fix error in RNN initialization in `DocumentRNNEmbeddings` (793)
- `ELMoEmbeddings` now use `flair.device` param (825)
- Fix download of TREC_6 dataset (896)
- Fix download of UD_GERMAN-HDT (980)
- Fix download of WikiNER_German (1006)
- Fix error in `ColumnCorpus` in which words that begin with hashtags were skipped as comments (956)
- Fix `max_tokens_per_doc` param in `ClassificationCorpus` (991)
- Simplify split rule in `ColumnCorpus` (990)
- Fix import error message for `ELMoEmbeddings` (1019)
- References to Persian language unified across embeddings (773)
- Updates most pre-trained models fixing quality issues / bugs (800)
- Clarifications in documentation (803 860 868)
- Fixes infinite loop for tokens without startpos (1030)

Enhancements
- Adds a learnable initial hidden state to `SequenceTagger` (899)
- Now keeps order of sentences in mini-batch when embedding (866)
- `SequenceTagger` now optionally returns a distribution of tag probabilities over all classes (782 949 1016)
- The model trainer now outputs a 'test.tsv' file that contains prediction of final model when done training (771 )
- Releases logging handler when finishing training a model (799)
- Fixes `bad_epochs` in training logs and no longer evaluates on test data at each epoch by default (818 )
- Convenience method to remove all empty sentences from a corpus (795)

0.4.2

New way of loading data (768)

The data loading part has been completely refactored to enable streaming data loading from disk using PyTorch's DataLoaders. This means training no longer requires the full dataset to be kept in memory, allowing us to train models over much larger datasets. This version also changes the syntax of how to load datasets.

Old way (now deprecated):
```python
from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_ENGLISH)
```


New way:
```python
import flair.datasets
corpus = flair.datasets.UD_ENGLISH()
```


To use streaming loading, i.e. to not load into memory, you can pass the `in_memory` parameter:
```python
import flair.datasets
corpus = flair.datasets.UD_ENGLISH(in_memory=False)
```


Embeddings

Flair embeddings (614)

This release brings Flair embeddings to 11 new languages (thanks stefan-it!): Arabic (ar), Danish (da), Persian (fa), Finnish (fi), Hebrew (he), Hindi (hi), Croatian (hr), Indonesian (id), Italian (it), Norwegian (no) and Swedish (sv). It also improves support for Bulgarian (bg), Czech, Basque (eu), Dutch (nl) and Slovenian (sl), and adds special language models for historical German. Load with language code, i.e.

```python
# load Flair embeddings for Italian
embeddings = FlairEmbeddings('it-forward')
```


One-hot encoded embeddings (747)

Some classification baselines work astonishingly well with simple learnable word embeddings. To support testing these baselines, we've added learnable word embeddings that start from a one-hot encoding of words. To initialize, you need to pass a corpus to initialize the vocabulary.

```python
import flair.datasets
from flair.embeddings import OneHotEmbeddings

# load corpus
corpus = flair.datasets.UD_ENGLISH()

# init learnable word embeddings with corpus
embeddings = OneHotEmbeddings(corpus)
```


More options in `DocumentPoolEmbeddings` (747)

We now allow users to specify a fine-tuning option that is applied before the pooling operation in document pool embeddings. Options are 'none' (no fine-tuning), 'linear' (linear remapping of word embeddings) and 'nonlinear' (nonlinear remapping of word embeddings). 'nonlinear' should be used together with `WordEmbeddings`, while 'none' should be used with `OneHotEmbeddings` (fine-tuning is not necessary there since these embeddings are already learned on the data). So, to replicate FastText classification you can either do:

```python
# instantiate one-hot encoded word embeddings
embeddings = OneHotEmbeddings(corpus)

# document pool embeddings
document_embeddings = DocumentPoolEmbeddings([embeddings], fine_tune_mode='none')
```


or

```python
# instantiate pre-trained word embeddings
embeddings = WordEmbeddings('glove')

# document pool embeddings
document_embeddings = DocumentPoolEmbeddings([embeddings], fine_tune_mode='nonlinear')
```


OpenAI GPT Embeddings (624)

We now support embeddings from the OpenAI GPT model. We use the excellent pytorch-pretrained-BERT library to download the GPT model, tokenize the input and extract embeddings from the subtokens.

Initialize with:

```python
from flair.embeddings import OpenAIGPTEmbeddings

embeddings = OpenAIGPTEmbeddings()
```

Portuguese embeddings from NILC (576)

Extensibility to new downstream tasks (681)

Previously, we had the `SequenceTagger` and `TextClassifier` as the two downstream tasks supported by Flair. The `ModelTrainer` had specific methods to train these two models, making it difficult for users to add new types of tasks (such as text regression) to Flair.

This release refactors the `flair.nn.Model` and `ModelTrainer` functionality to make it uniform across tagging models and to enable users to add new tasks to Flair. Now, by implementing the 5 methods in the `flair.nn.Model` interface, a custom model immediately becomes trainable with the `ModelTrainer`. Currently, three types of downstream tasks implement this interface:

- the `SequenceTagger`,
- the `TextClassifier`
- and the beta `TextRegressor`.

The code refactor removes a lot of code redundancies and slims down the interfaces of the downstream tasks classes. As the sole breaking change, it removes the `load_from_file()` methods, which are now part of the `load()` method, i.e. if previously you loaded a self-trained model like this:

```python
tagger = SequenceTagger.load_from_file('/path/to/model.pt')
```


You now do it like this:

```python
tagger = SequenceTagger.load('/path/to/model.pt')
```


New features

- New beta support for text regression (564)
- Return confidence scores for single-label classification (664)
- Add method to find probability for each class in case of multi-class classification (693)
- Capability to change threshold during multi label classification 707
- Support for customized ELMo embeddings (661)
- Detect multi-label problems automatically: Previously, users always had to specify whether their text classification problem was multi_label or not. Now, this is detected automatically if users do not specify. So now you can init like this:

```python
# corpus
corpus = TREC_6()

# make label_dictionary
label_dictionary = corpus.make_label_dictionary()

# init text classifier
classifier = TextClassifier(document_embeddings, label_dictionary)
```


- We added better module descriptions to embeddings and dropout so that more parameters get printed by default for better logging. (747)
- Make 'cache_root' a global variable so that different directories can be chosen for caching (667)
- Both string and Token objects can now be passed to the add_token method (628)

New datasets
- Added IMDB classification corpus to `flair.datasets` (749)
- Added TREC_6 classification corpus to `flair.datasets` (749)
- Added 20 newsgroups classification corpus to `flair.datasets` (NEWSGROUPS object)
- WASSA-17 emotion intensity text regression tasks (695)

Bug fixes

- We normalized the training loss across modules so that train / test loss are consistent. (670)
- Permission error on Windows preventing model download (557)
- Handling of empty sentences (566 758)
- Fix text generation on CUDA (666)
- others ...

0.4.1

Updated documentation (https://github.com/zalandoresearch/flair/issues/138, https://github.com/zalandoresearch/flair/issues/89)
Expanded documentation and tutorials.

0.4.0

Release 0.4 with new models, lots of new languages, experimental multilingual models, hyperparameter selection methods, BERT and ELMo embeddings, etc.

New Features

Support for new languages

Flair embeddings
We now include new language models for:
* [Swedish](https://github.com/zalandoresearch/flair/issues/3)
* [Polish](https://github.com/zalandoresearch/flair/issues/187)
* [Bulgarian](https://github.com/zalandoresearch/flair/issues/188)
* [Slovenian](https://github.com/zalandoresearch/flair/issues/202)
* [Dutch](https://github.com/zalandoresearch/flair/issues/224)

These come in addition to English and German. You can load `FlairEmbeddings` for Dutch, for instance, with:

```python
flair_embeddings = FlairEmbeddings('dutch-forward')
```


Word Embeddings

We now include pre-trained [FastText Embeddings for 30 languages](https://github.com/zalandoresearch/flair/issues/234): English, German, Dutch, Italian, French, Spanish, Swedish, Danish, Norwegian, Czech, Polish, Finnish, Bulgarian, Portuguese, Slovenian, Slovakian, Romanian, Serbian, Croatian, Catalan, Russian, Hindi, Arabic, Chinese, Japanese, Korean, Hebrew, Turkish, Persian, Indonesian.

For each language, there are embeddings trained over Wikipedia and embeddings trained over web crawls. Instantiate with:

```python
# German embeddings computed over Wikipedia
german_wikipedia_embeddings = WordEmbeddings('de-wiki')

# German embeddings computed over web crawls
german_crawl_embeddings = WordEmbeddings('de-crawl')
```


Named Entity Recognition

Thanks to the Flair community, we now include NER models for:
* [French](https://github.com/zalandoresearch/flair/issues/238)
* [Dutch](https://github.com/zalandoresearch/flair/issues/224)

These come in addition to the previous models for English and German.
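
No loading example was given here; a speculative sketch, assuming the community-contributed models are published under IDs like 'fr-ner' and 'nl-ner':

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# hypothetical model ID for the community-contributed French NER model
french_tagger = SequenceTagger.load('fr-ner')

sentence = Sentence('George Washington est allé à Washington .')
french_tagger.predict(sentence)
print(sentence.to_tagged_string())
```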

Part-of-Speech Tagging

Thanks to the Flair community, we now include PoS models for:
* [German tweets](https://github.com/zalandoresearch/flair/issues/51)


Multilingual models

As a major new feature, we now include models that can tag text in various languages.

12-language Part-of-Speech Tagging

We include a PoS model trained over 12 different languages (English, German, Dutch, Italian, French, Spanish, Portuguese, Swedish, Norwegian, Danish, Finnish, Polish, Czech).

```python
# load model
tagger = SequenceTagger.load('pos-multi')

# text with English and German sentences
sentence = Sentence('George Washington went to Washington . Dort kaufte er einen Hut .')

# predict PoS tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence.to_tagged_string())
```


4-language Named Entity Recognition

We include a NER model trained over 4 different languages (English, German, Dutch, Spanish).

```python
# load model
tagger = SequenceTagger.load('ner-multi')

# text with English and German sentences
sentence = Sentence('George Washington went to Washington . Dort traf er Thomas Jefferson .')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence.to_tagged_string())
```


This model also works to some extent on other languages, such as French.

Pre-trained classification models ([issue 70](https://github.com/zalandoresearch/flair/issues/70))

Flair now also includes two pre-trained classification models:
* de-offensive-language: detecting offensive language in German text ([GermEval 2018 Task 1](https://projects.fzai.h-da.de/iggsa/projekt/))
* en-sentiment: detecting positive and negative sentiment in English text ([IMDB](http://ai.stanford.edu/~amaas/data/sentiment/))

Simply load the `TextClassifier` using the preferred model, such as
```python
TextClassifier.load('en-sentiment')
```


BERT and ELMo embeddings

We added both BERT and ELMo embeddings so you can try them out, and mix and match them with Flair embeddings or any other embedding types. We hope this will enable the research community to better compare and combine approaches.

BERT Embeddings ([issue 251](https://github.com/zalandoresearch/flair/issues/251))

We added [BERT embeddings](https://arxiv.org/pdf/1810.04805.pdf) to Flair. We are using the implementation of [huggingface](https://github.com/huggingface/pytorch-pretrained-BERT). The embeddings can be used as any other embedding type in Flair:

```python
from flair.data import Sentence
from flair.embeddings import BertEmbeddings

# init embedding
embedding = BertEmbeddings()

# create a sentence
sentence = Sentence('The grass is green .')

# embed words in sentence
embedding.embed(sentence)
```


ELMo Embeddings ([issue 260](https://github.com/zalandoresearch/flair/issues/260))

Flair now also includes [ELMo embeddings](http://www.aclweb.org/anthology/N18-1202). We use the implementation of [AllenNLP](https://allennlp.org/elmo). As this implementation comes with a lot of sub-dependencies, you need to first install the library via `pip install allennlp` before you can use it in Flair. Using the embeddings is as simple as using any other embedding type:
```python
from flair.data import Sentence
from flair.embeddings import ELMoEmbeddings

# init embedding
embedding = ELMoEmbeddings()

# create a sentence
sentence = Sentence('The grass is green .')

# embed words in sentence
embedding.embed(sentence)
```



Multi-Dataset Training ([issue 232](https://github.com/zalandoresearch/flair/issues/232))

You can now train a model on multiple datasets with the `MultiCorpus` object. We use this to train our multilingual models.

Just create multiple corpora and put them into `MultiCorpus`:

```python
from flair.data import MultiCorpus
from flair.data_fetcher import NLPTaskDataFetcher, NLPTask

english_corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_ENGLISH)
german_corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_GERMAN)
dutch_corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_DUTCH)

multi_corpus = MultiCorpus([english_corpus, german_corpus, dutch_corpus])
```

The `multi_corpus` can now be used for training, just as any other corpus before. Check [the tutorial](TUTORIAL_6_TRAINING_A_MODEL.md) for more details.

Parameter Selection using Hyperopt ([issue 242](https://github.com/zalandoresearch/flair/issues/242))

We built a wrapper around [hyperopt](http://hyperopt.github.io/hyperopt/) to allow you to search for the best hyperparameters for your downstream task.

Define your search space and start training using several different parameter settings. The results are written to a specific file called `param_selection.txt` in the result directory. Check [the tutorial](TUTORIAL_7_HYPER_PARAMETER.md) for more details.
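
The release notes don't include a snippet for this; below is a rough sketch based on the hyperparameter tutorial, assuming the wrapper exposes `SearchSpace`, `Parameter` and a `TextClassifierParamSelector` in `flair.hyperparameter.param_selection` (exact names and signatures may differ) and given an existing `corpus`:

```python
from hyperopt import hp
from flair.embeddings import WordEmbeddings
from flair.hyperparameter.param_selection import (
    Parameter,
    SearchSpace,
    TextClassifierParamSelector,
)

# define a search space (parameter names assumed from the tutorial)
search_space = SearchSpace()
search_space.add(Parameter.EMBEDDINGS, hp.choice, options=[[WordEmbeddings('glove')]])
search_space.add(Parameter.LEARNING_RATE, hp.choice, options=[0.05, 0.1, 0.2])
search_space.add(Parameter.MINI_BATCH_SIZE, hp.choice, options=[16, 32])

# run several training runs with sampled settings; results are written to
# param_selection.txt in the given result directory
param_selector = TextClassifierParamSelector(corpus, False, 'resources/param_selection', 'lstm', max_epochs=10)
param_selector.optimize(search_space, max_evals=20)
```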

NLP Dataset Downloader ([issue 243](https://github.com/zalandoresearch/flair/issues/243))

To make it as easy as possible to start training models, we have a new feature for automatically downloading publicly available NLP datasets. For instance, by running this code:

```python
corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_ENGLISH)
```


you download the Universal Dependencies corpus for English and can immediately start training models. The list of available datasets can be found in [the tutorial](TUTORIAL_5_CORPUS.md).


Model training features

We added various other features to model training.

Saving training log ([issue 212](https://github.com/zalandoresearch/flair/issues/212))

The training log output will from now on be automatically saved in the result directory you provide for training.
The log will be saved in `training.log`.

Resuming training ([issue 217](https://github.com/zalandoresearch/flair/issues/217))

It is now possible to stop training at any point in time and to resume it later by training with `checkpoint` set to `True`. Check [the tutorial](TUTORIAL_6_TRAINING_A_MODEL.md) for more details.

Custom Optimizers ([issue 220](https://github.com/zalandoresearch/flair/issues/220))

You can now choose other optimizers besides SGD, i.e. any PyTorch optimizer, plus our own modified implementations of SGD and Adam, namely SGDW and AdamW.
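
A minimal sketch of how a different optimizer might be passed, assuming the `ModelTrainer` accepts an `optimizer` argument and given an existing `tagger` and `corpus`:

```python
from torch.optim import Adam
from flair.trainers import ModelTrainer

# assumption: the trainer accepts the optimizer class and instantiates it internally
trainer = ModelTrainer(tagger, corpus, optimizer=Adam)
trainer.train('path/to/experiment/folder', learning_rate=0.001)
```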

Learning Rate Finder ([issue 228](https://github.com/zalandoresearch/flair/issues/228))

A new helper method to assist you in finding a [good learning rate for model training](https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_8_MODEL_OPTIMIZATION.md#finding-the-best-learning-rate).
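
For illustration, a hedged sketch, assuming the helper is exposed as `find_learning_rate()` on the `ModelTrainer` and that the result can be plotted with the `Plotter`:

```python
from flair.trainers import ModelTrainer
from flair.visual.training_curves import Plotter

trainer = ModelTrainer(tagger, corpus)

# assumed helper: sweep learning rates and write losses to a tsv file
learning_rate_tsv = trainer.find_learning_rate('path/to/experiment/folder', 'learning_rate.tsv')

# plot loss against learning rate to pick a good value
Plotter().plot_learning_rate(learning_rate_tsv)
```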


Breaking Changes

This release introduces breaking changes. The most important are:

Unified Model Trainer ([issue 189](https://github.com/zalandoresearch/flair/issues/189))

Instead of maintaining two separate trainer classes for sequence labeling and text classification, we now have one model training class, namely `ModelTrainer`. This replaces the earlier classes `SequenceTaggerTrainer` and `TextClassifierTrainer`.

Downstream task models now implement the new `flair.nn.Model` interface. So, both the `SequenceTagger` and `TextClassifier` now inherit from `flair.nn.Model`. This allows both models to be trained with the `ModelTrainer`, like this:

```python
# Training sequence tagger
tagger = SequenceTagger(512, embeddings, tag_dictionary, 'ner')
trainer = ModelTrainer(tagger, corpus)
trainer.train('results')

# Training text classifier
classifier = TextClassifier(document_embedding, label_dictionary=label_dict)
trainer = ModelTrainer(classifier, corpus)
trainer.train('results')
```


The advantage is that all training parameters and training procedures are now the same for sequence labeling and text classification, which reduces redundancy and hopefully makes the code easier to understand.

Metric class

The metric class is now refactored to compute micro and macro averages for F1 and accuracy. There is also a new enum `EvaluationMetric` which you can pass to the `ModelTrainer` to tell it what to use for evaluation.
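
A hedged sketch of how this might be passed, assuming the enum lives in `flair.training_utils` and exposes members such as `MICRO_F1_SCORE`:

```python
from flair.trainers import ModelTrainer
from flair.training_utils import EvaluationMetric

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    'path/to/experiment/folder',
    # assumed enum member: micro-averaged F1 score
    evaluation_metric=EvaluationMetric.MICRO_F1_SCORE,
)
```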

Updates and Bug Fixes

Torch 1.0 ([issue 176](https://github.com/zalandoresearch/flair/issues/299))

Flair now builds on torch 1.0.

Use Pathlib ([issue 176](https://github.com/zalandoresearch/flair/issues/176))

Flair now uses `Path` wherever possible to allow easier operations on files/directories. However, our interfaces still allow you to pass a string, which will then be transformed into a `Path` by Flair.

Bug Fixes

* Fix: Non-whitespaced tokenized text results into an infinite loop ([issue 226](https://github.com/zalandoresearch/flair/issues/226))
* Fix: Getting IndexError: list index out of range error ([issue 233](https://github.com/zalandoresearch/flair/issues/233))
* Do not reset cache directory always to None ([issue 249](https://github.com/zalandoresearch/flair/issues/249))
* Filter sentences with zero tokens ([issue 266](https://github.com/zalandoresearch/flair/issues/266))
