Flair

Latest version: v0.15.1

0.15.1

Not secure

This release fixes compatibility bugs with the newest PyTorch and SciPy versions, and adds a number of small improvements and new features.

Improvements and new features
* `SegtokTokenizer`: Add option to customize SegtokTokenizer, by alanakbik in https://github.com/flairNLP/flair/pull/3592
* `RegexpTagger`: Add option to define matching groups to RegexpTagger, by alanakbik in https://github.com/flairNLP/flair/pull/3598
* `RelationClassifier`: Optimize RelationClassifier by adding the option to filter long sentences and truncate context, by alanakbik in https://github.com/flairNLP/flair/pull/3593
* `RelationClassifier`: Modify printouts in RelationClassifier evaluation to remove clutter by alanakbik in https://github.com/flairNLP/flair/pull/3591
* Add sentence labeler, by MattGPT-ai in https://github.com/flairNLP/flair/pull/3570
* Adding a Deep Nearest Class Means Classifier model to Flair, by sheldon-roberts in https://github.com/flairNLP/flair/pull/3532
* Add per-task metrics by ntravis22 in https://github.com/flairNLP/flair/pull/3605
* Add options to load full documents as Sentence objects, by alanakbik in https://github.com/flairNLP/flair/pull/3595

New Model: Deep Nearest Class Means Classifier (3532)

Adds a new Nearest Class Mean classification approach to Flair that classifies data points to the class with the closest class data mean. This approach can be used as an alternative to fitting a Softmax Classifier. It is now available for any class in Flair that implements DefaultClassifier. For instance, to train a TextClassifier with DeepNCMs you can use the following code:

```python
from flair.data import Corpus
from flair.datasets import TREC_50
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.nn import DeepNCMDecoder
from flair.trainers import ModelTrainer
from flair.trainers.plugins import DeepNCMPlugin

# load the TREC dataset
corpus: Corpus = TREC_50()

label_type = "class"

# make a transformer document embedding
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# create the label_dictionary
label_dictionary = corpus.make_label_dictionary(label_type=label_type)

# create a text classifier with a special DeepNCM decoder
classifier = TextClassifier(
    document_embeddings,
    label_type=label_type,
    label_dictionary=label_dictionary,
    decoder=DeepNCMDecoder(
        mean_update_method="condensation",
        embeddings_size=document_embeddings.embedding_length,
        label_dictionary=label_dictionary,
    ),
)

# initialize the trainer
trainer = ModelTrainer(classifier, corpus)

# train the model using the DeepNCM plugin
trainer.fine_tune(
    "resources/taggers/deepncm_baseline",
    plugins=[DeepNCMPlugin()],
)
```

Contributed by sheldon-roberts in https://github.com/flairNLP/flair/pull/3532
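
To make the nearest-class-mean idea concrete, here is a minimal pure-Python sketch. It is not Flair's `DeepNCMDecoder`: the helper names `class_means` and `predict` are made up for illustration, the points are fixed 2-D vectors rather than learned embeddings, and the means are computed once instead of being updated during training.

```python
import math
from collections import defaultdict


def class_means(points, labels):
    """Return {label: mean vector} for the labelled points."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(means, vec):
    """Assign vec to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(means[label], vec))


# toy data: two well-separated classes
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
labels = ["A", "A", "B", "B"]
means = class_means(points, labels)

print(predict(means, [1.0, 1.0]))  # closest to the mean of class "A"
print(predict(means, [9.0, 9.0]))  # closest to the mean of class "B"
```

Unlike a softmax layer, this decision rule has no weights of its own: the only state is one mean vector per class.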

Datasets
* Add BarNER Dataset by stefan-it in https://github.com/flairNLP/flair/pull/3604

Bug Fixes
* Fix model loading for compatibility with PyTorch 2.6, by helpmefindaname in https://github.com/flairNLP/flair/pull/3608
* Fix SciPy compatibility by updating scipy .A to toarray(), by sg-wbi in https://github.com/flairNLP/flair/pull/3606
* Fix: use proper eval default main eval metrics for text regression model by MattGPT-ai in https://github.com/flairNLP/flair/pull/3602
* Fix: cast indices tensor to int to fix bug by MattGPT-ai in https://github.com/flairNLP/flair/pull/3601

New Contributors
* sg-wbi made their first contribution in https://github.com/flairNLP/flair/pull/3606
* ntravis22 made their first contribution in https://github.com/flairNLP/flair/pull/3605

**Full Changelog**: https://github.com/flairNLP/flair/compare/v0.15.0...v0.15.1

0.15.0

Not secure

This release adds multi-GPU support, an improved documentation page with API docs and finally deprecates Python 3.8!

Improved Documentation and API Docs

Thanks to konstantin-lukas we have a completely new design of our documentation page, which now includes API docs.

You can check it out [here](https://flairnlp.github.io/flair)!
- Check out our [tutorials](https://flairnlp.github.io/flair/master/tutorial/index.html)
- Check out the new [Python API docs](https://flairnlp.github.io/flair/master/api/index.html)

Future releases will improve docstring coverage and further improve upon the documentation!

PRs:
* Fix doc build by helpmefindaname in https://github.com/flairNLP/flair/pull/3528
* Rework Doc page by konstantin-lukas in https://github.com/flairNLP/flair/pull/3563
* Test new docstrings and apidocs deployment by alanakbik in https://github.com/flairNLP/flair/pull/3573

Multi-GPU Support

Flair now offers support for training models on multiple GPUs! Big thanks to jeffpicard!

PRs:
* Add multi-GPU support by jeffpicard in https://github.com/flairNLP/flair/pull/3548
* Fix gradient accumulation and learning rate aggregation by jeffpicard in 3583

Deprecations

Since Python 3.8 is [no longer supported](https://devguide.python.org/versions/), we are dropping support for it in favor of features added in Python 3.9.
In response to [CVE-2024-10073](https://nvd.nist.gov/vuln/detail/CVE-2024-10073), we decided to drop support for the `flair.models.clustering` module. Since we aren't aware of any usage of it, we did a hard drop instead of a deprecation.

* Drop python 3.8 by helpmefindaname in https://github.com/flairNLP/flair/pull/3560
* Remove clustering support by helpmefindaname in https://github.com/flairNLP/flair/pull/3567

Other Improvements

New Datasets
* Add CleanCoNLL object by susannaruecker in https://github.com/flairNLP/flair/pull/3557
* Add NoiseBench object by elenamer in 3512

Performance Improvements
* perf: optimize dictionary items check by MattGPT-ai in https://github.com/flairNLP/flair/pull/3569
* Refactor `fill_mean_token_embeddings` for performance optimization on GPU by sheldon-roberts in https://github.com/flairNLP/flair/pull/3525

New Features and Improvements

* Add proxies information to requests.head by diego-morientez in https://github.com/flairNLP/flair/pull/3535
* Allow specifying proxy information in TransformerEmbeddings by diego-morientez in https://github.com/flairNLP/flair/pull/3539
* Add `use_tokenizer` to `JsonlDataset` by david-waterworth in https://github.com/flairNLP/flair/pull/3486
* Use built-in version parsing from packaging by adrianeboyd in https://github.com/flairNLP/flair/pull/3502

Bugfixes
* `TransformerDocumentEmbeddings`: Fix error when `cls_pooling="mean"` or `cls_pooling="max"` by fkdosilovic in https://github.com/flairNLP/flair/pull/3558
* `SequenceTagger` : Fix the incorrect token prediction distribution from `_all_scores_for_token()` by mdmotaharmahtab in https://github.com/flairNLP/flair/pull/3449
* `TransformerEmbeddings`: Fix T5 tokenizer loading by helpmefindaname in https://github.com/flairNLP/flair/pull/3544
* `TextPairRegressor`: Fix: use proper eval default main eval metrics by MattGPT-ai in https://github.com/flairNLP/flair/pull/3538
* `TextPairRegressor`: Fix state dict key mismatch for embeddings by MattGPT-ai in https://github.com/flairNLP/flair/pull/3537
* Make onnx export work again by helpmefindaname in https://github.com/flairNLP/flair/pull/3530
* Fix support metric by MattGPT-ai in https://github.com/flairNLP/flair/pull/3510

Operations/Development
* Invalidate tars classifier and tars ner tests to save disk space by helpmefindaname in https://github.com/flairNLP/flair/pull/3527
* Ignore FutureWarning by alanakbik in https://github.com/flairNLP/flair/pull/3526
* Update SECURITY.md with current contact by alanakbik in https://github.com/flairNLP/flair/pull/3568

New Contributors
* adrianeboyd made their first contribution in https://github.com/flairNLP/flair/pull/3502
* david-waterworth made their first contribution in https://github.com/flairNLP/flair/pull/3486
* diego-morientez made their first contribution in https://github.com/flairNLP/flair/pull/3535
* jeffpicard made their first contribution in https://github.com/flairNLP/flair/pull/3548
* mdmotaharmahtab made their first contribution in https://github.com/flairNLP/flair/pull/3449
* fkdosilovic made their first contribution in https://github.com/flairNLP/flair/pull/3558

**Full Changelog**: https://github.com/flairNLP/flair/compare/v0.14.0...v0.15.0

0.14

In the table, *none* is the approach used in previous Flair versions. `[SEP]` means using the standard separator symbol as context delimiter. `[FLERT]` means using a new dedicated special token.

As `[FLERT]` performs best in our experiments, the `[FLERT]` context marker is now activated by default.

**More details**: Assume the current sentence is `Peter Blackburn` and the previous sentence ends with `to boycott British lamb .`, while the next sentence starts with `BRUSSELS 1996-08-22 The European Commission`.

In this case,
1. if `use_context_separator=False`, the embedding is produced from this string: `to boycott British lamb . Peter Blackburn BRUSSELS 1996-08-22 The European Commission`
2. if `use_context_separator=True`, the embedding is produced from this string `to boycott British lamb . [FLERT] Peter Blackburn [FLERT] BRUSSELS 1996-08-22 The European Commission`
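
The two cases above can be reproduced with a few lines of plain Python. This is only a sketch of how the input string is assembled; `build_context_string` is a hypothetical helper, not a Flair function, and the real embeddings work on tokens rather than on a joined string.

```python
def build_context_string(left_context, sentence, right_context,
                         use_context_separator=True, marker="[FLERT]"):
    """Join left context, sentence and right context, optionally
    wrapping the sentence in a dedicated context marker."""
    parts = [left_context]
    if use_context_separator:
        parts.append(marker)
    parts.append(sentence)
    if use_context_separator:
        parts.append(marker)
    parts.append(right_context)
    return " ".join(parts)


left = "to boycott British lamb ."
current = "Peter Blackburn"
right = "BRUSSELS 1996-08-22 The European Commission"

print(build_context_string(left, current, right, use_context_separator=False))
print(build_context_string(left, current, right, use_context_separator=True))
```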

Integrate transformer-smaller-training-vocab 3066

We integrate the `transformer-smaller-training-vocab` library into the `ModelTrainer`. With it, you can reduce the size of transformer models when training and evaluating models on specific datasets. This leads to faster training times and a smaller memory footprint. Documentation on this new feature will be added soon!

Masked Relation Classifier 2748 2993 with various Encoding Strategies 3023 (BETA)

We now include BETA support for a new type of relation extraction model that leads to much higher accuracies than our vanilla relation extraction, but increases computational costs. Documentation for this will be added as we iterate on the model.

ONNX compatible models 2640 2643 3041 3075

This release continues the journey on making our models more ONNX compatible.

Other features
- Add push to Hub functionalities 2897
- Add layoutlm layoutxlm support and the [SROIE](https://rrc.cvc.uab.es/?ch=13) dataset #2980
- Convenience method for learning rate factor 2888 2893

New Datasets
- Add fewnerd corpus 3103
- Add support for NERMuD 2023 Dataset 3087
- Adds ZELDA Entity Linking dataset 3088
- Added Ukrainian NER and UD datasets 3069
- Add support MasakhaNER v2 dataset 3013
- Add support for MultiCoNerV2 3006
- Add support for new ICDAR Europeana NER Dataset 2911
- datasets: add support for HIPE-2022 2735 2827 2805

Major refactorings
- Unify loss reduction by making sure that all losses are summed over all points, instead of averaged 2933 2910
- Python 3.7 2769
- Flatten DefaultClassifier interface 2978
- Restructure Tokenizer and Splitter modules 3002
- Refactor Token and Sentence Positional Properties 3001
- Serialization of embeddings 3011
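
One way to see why the loss-reduction change in the first item matters: when batches differ in size, averaging per batch and then averaging the batch results gives a different number than summing over all points. The sketch below uses made-up per-point losses and hypothetical helper names; it does not use Flair's loss code.

```python
def summed_loss(per_point_losses):
    """Loss reduced by summing over all data points in the batch."""
    return sum(per_point_losses)


def averaged_loss(per_point_losses):
    """Loss reduced by averaging over the data points in the batch."""
    return sum(per_point_losses) / len(per_point_losses)


batch_a = [1.0, 1.0, 1.0]  # three easy points
batch_b = [4.0]            # one hard point in a smaller final batch

# summing: every point counts once, so the per-point epoch loss is exact
total_points = len(batch_a) + len(batch_b)
per_point = (summed_loss(batch_a) + summed_loss(batch_b)) / total_points

# averaging per batch: the single point in batch_b weighs as much as all of batch_a
mean_of_means = (averaged_loss(batch_a) + averaged_loss(batch_b)) / 2

print(per_point, mean_of_means)
```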

Various Improvements

Enhancements
- add functionality for using proxies 3082
- add option not to shuffle the first epoch 3076
- improved Tars Context 3063
- release optimizer memory and fix legacy tokenization 3043
- add time elapsed to training printout 2983
- separate between token-lengths and sub-token lengths 2990
- small speed optimizations 2975
- change output of .text to original string 2974
- remove BAD_EPOCHS printout for most schedulers 2970
- warn if resuming with too low max_epochs & 'additional_epochs' parameter 2895
- embeddings: add support for T5 encoder models 2896
- add py.typed file for PEP-561 compatibility 2858
- tars classifier always predict something on single label 2838
- make add_unk optional and don't use it for ner 2839
- add deprecation warning for SentenceDataset rename 2819
- more precise type hint for eval_on_train_fraction 2811
- better handling for consecutive whitespaces in Sentence 2721 (already in flair 0.11.3)
- remove unnecessary more-itertools pin 2730 (already in flair 0.11.3)
- add `exclude_labels` parameter to trainer.train 2724 (already in flair 0.11.3)
- add option to force token-level predictions in SequenceTagger 2750 (already in flair 0.11.3)

Build
- unified test classes, to ensure that all models & embeddings have tested the basic functionality 2981
- add missing dependency pre-commit to requirements-dev.txt 3093
- fix pre-commit bug by upgrading to isort 5.11.5 3106 3107
- update pytest and flake8 versions 2741
- pytest flake precommit update 2820
- pin flake8 to v4 2892
- specify test paths 2932
- pin versions for unit tests 2994
- unit tests: Set a seed so test_train_load_use_classifier doesn't randomly fail 2834
- replace issue templates with issue forms 3051
- github actions cache 2753 (already in flair 0.11.3)

Documentation
- Add Missing Import to Tutorial 5 2902
- Documentation pointers 2927
- readme: fix BibTeX for FLERT paper 2806 2821
- docs: mention HIPE-2022 in corpus tutorial 2807

Code improvements
- add return types to Model and Classifier 3121
- removed undefined names 3054 3056
- add docstrings missing for ModelTrainer.train() parameters 2961
- remove "tag_to_bioes" (Sequence) Corpus parameter, as it is not used 2812
- update hf-hub version 2837
- use transformers sentencepiece requirement 2835
- replace deprecated logging.warn with logging.warning 2829
- various mypy issues 2822 2845 2905
- removed some model classes that were very beta: the DependencyParser, the DistancePredictor and the SimilarityLearner. 2910
- remove legacy TransformerXLEmbeddings class 2768 (already in flair 0.11.3)

Bug fixes
- fix train error missing dev split 3115
- fix Avg Pooling in the Entity Linker 3123
- call `super().__setstate__()` in Embeddings 3057
- remove konoha from requirements.txt 3060
- fix label alignment if the sentence contains invalid tokens 3052
- change indexing in TARSTagger predict 3058
- fix training sample count in UD English 3044
- fix comment parsing for conllu datasets 3020
- HunFlair: Fix loading of datasets 3030 3029
- persist needs_manual_ocr 3012
- save initial hidden states in sequence tagger 3010
- do not save Path objects to model cards 2998
- make JsonlCorpus create span labels 2863
- JsonlDataset: Fix code that claims to set "O" labels to actually set them 2817
- relationClassifier fix 2986
- fix problem in loading TARSClassifier 2987
- add missing tab for tensorboard 2922
- fast tokenizer reload fix pt.2: Bloom model 2904
- fix transformer embeddings for sentence with trailing whitespace 2891
- added label_name parameter to render_ner_html 2850
- allow BIO evaluation on sequence tagger 2787
- refactorings for initialization from state dict 2846
- save and load "tag_format" for sequence tagger model 2840
- do not remove other labels of sentence for set_label on Token and Span 2831
- fix left-over cases of token.get_tag(), which was renamed 2815
- remove wrong boolean check for loading datasets RE_ENGLISH_CONLL04 2779
- added missing property decorator in PooledFlairEmbeddings 2744 (already in flair 0.11.3)
- fix wrong initialisations of label (where data_type was missing) 2731 (already in flair 0.11.3)
- update gdown requirement, fix download for dataset NER_MULTI_WIKIANN 2757 (already in flair 0.11.3)
- make Span detection more robust 2752 (already in flair 0.11.3)

0.14.0

Not secure

For instance, to fine-tune a BERT model on the TREC question classification task using LoRA, use the following snippet:

```python
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# Note: you need to install peft to use this feature!
from peft import LoraConfig, TaskType

# Get corpus and make label dictionary
corpus: Corpus = TREC_6()
label_type = "question_class"
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Define embeddings with LoRA fine-tuning
document_embeddings = TransformerDocumentEmbeddings(
    "bert-base-uncased",
    fine_tune=True,
    # set LoRA config
    peft_config=LoraConfig(
        task_type=TaskType.FEATURE_EXTRACTION,
        inference_mode=False,
    ),
)

# define model
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# train model
trainer = ModelTrainer(classifier, corpus)
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    learning_rate=5.0e-4,
    mini_batch_size=4,
    max_epochs=1,
)
```

Big thanks to janpf for this new feature!
* Add PEFT training and explicit kwarg passthrough by janpf in https://github.com/flairNLP/flair/pull/3480
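
For readers new to LoRA, the following sketch illustrates the general idea behind it, independent of Flair and `peft`: the pretrained weight matrix stays frozen and only a low-rank update `B @ A` is trained. The function names and the 768-dimensional layer size are assumptions chosen for illustration.

```python
def full_param_count(d_out, d_in):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_update(weight, b, a, scale=1.0):
    """Return weight + scale * (b @ a) for matrices given as nested lists."""
    rows, inner, cols = len(b), len(a), len(a[0])
    return [
        [weight[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# a single 768 x 768 projection, adapted with rank 8
print(full_param_count(768, 768), lora_param_count(768, 768, rank=8))

# tiny worked example: 2 x 2 identity weight plus a rank-1 update
adapted = lora_update([[1.0, 0.0], [0.0, 1.0]], b=[[1.0], [2.0]], a=[[3.0, 4.0]])
print(adapted)
```

The parameter counts show why this is attractive: the rank-8 factors hold a small fraction of the parameters of the full matrix.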

Smaller Library

We've removed dependencies such as `gensim` from the core package, since they increased the size of the Flair library and caused some compatibility/maintenance issues. This means the core package is now smaller and faster to install.

Install as always with:
```console
pip install flair
```

For certain features, you still need `gensim`, such as training a model that uses classic word embeddings. For this use case, install with:

```console
pip install flair[word-embeddings]
```

Or just install `gensim` separately.
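
If you maintain code that has to cope with an optional dependency like this, a common pattern is to import it lazily and fail with an actionable message. The sketch below shows that general pattern; `require_optional` is a hypothetical helper and not how Flair itself performs the check.

```python
import importlib


def require_optional(module_name, extra):
    """Import module_name, or raise an ImportError that names the pip extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install flair[{extra}]"
        ) from err


# usage: only import gensim at the point where classic word embeddings are needed
# gensim = require_optional("gensim", "word-embeddings")
```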

Big thanks to helpmefindaname for this new feature!
* Make gensim optional by helpmefindaname in https://github.com/flairNLP/flair/pull/3493
* Update models for v0.14.0 by alanakbik in https://github.com/flairNLP/flair/pull/3505
* Relax version constraint for konoha by himkt in https://github.com/flairNLP/flair/pull/3394
* Dependencies maintainance updates by helpmefindaname in https://github.com/flairNLP/flair/pull/3402
* Make janome optional by himkt in https://github.com/flairNLP/flair/pull/3405
* Bump min. version of bpemb by stefan-it in https://github.com/flairNLP/flair/pull/3468

Other Improvements

New Features and Improvements
* Speed up euclidean distance calculation by sheldon-roberts in https://github.com/flairNLP/flair/pull/3485
* Add DataTriples which act just like DataPairs by janpf in https://github.com/flairNLP/flair/pull/3481
* Add random seed parameter to dataset splitting and downsampling for better reproducibility by MattGPT-ai in https://github.com/flairNLP/flair/pull/3475
* Allow cpu device even if gpu available by drbh in https://github.com/flairNLP/flair/pull/3417
* Add prediction label type for span classifier by helpmefindaname in https://github.com/flairNLP/flair/pull/3432
* Character embeddings store their embedding name too by helpmefindaname in https://github.com/flairNLP/flair/pull/3477

Bug Fixes
* `TextPairRegressor`: Fix data point iteration by ya0guang in https://github.com/flairNLP/flair/pull/3413
* `TextPairRegressor`: Fix GPU memory leak by MattGPT-ai in https://github.com/flairNLP/flair/pull/3490
* `TextRegressor`: Fix label_name bug by sheldon-roberts in https://github.com/flairNLP/flair/pull/3491
* `SequenceTagger`: Fix _all_scores_for_token in ViterbiDecoder by mauryaland in https://github.com/flairNLP/flair/pull/3455
* `SentenceSplitter`: Fix linking of sentences by mariosaenger in https://github.com/flairNLP/flair/pull/3397
* `SentenceSplitter`: Fix case where split was performed on special characters by helpmefindaname in https://github.com/flairNLP/flair/pull/3404
* `Classifier`: Fix loading by moving error message to main load function by alanakbik in https://github.com/flairNLP/flair/pull/3504
* `Trainer`: Fix edge case by loading best model at end, even when there is no final evaluation by helpmefindaname in https://github.com/flairNLP/flair/pull/3470
* `TransformerEmbeddings`: Fix special tokens by not replacing replace_additional_special_tokens by helpmefindaname in https://github.com/flairNLP/flair/pull/3451
* Unit tests: Fix double `data_folder` in unit test by ya0guang in https://github.com/flairNLP/flair/pull/3412

New Datasets
* Add revision support for all Universal Dependencies datasets by stefan-it in https://github.com/flairNLP/flair/pull/3420
* `NER_ESTONIAN_NOISY`: Support for Estonian NER dataset with noise by teresaloeffelhardt in https://github.com/flairNLP/flair/pull/3463
* `MASAKHA_POS`: Support for two new languages by stefan-it in https://github.com/flairNLP/flair/pull/3421
* `UD_BAVARIAN_MAIBAAM`: Add support for new Bavarian MaiBaam UD by stefan-it in https://github.com/flairNLP/flair/pull/3426

Documentation
* Minor readme fixes by stefan-it in https://github.com/flairNLP/flair/pull/3424
* Fix typo transformer-embeddings.md by abhisheklomsh in https://github.com/flairNLP/flair/pull/3500
* Fix typo in how-model-training-works.md by abhisheklomsh in https://github.com/flairNLP/flair/pull/3499

Build Management
* Fix black and ruff by stefan-it in https://github.com/flairNLP/flair/pull/3423
* Remove zappr yaml by helpmefindaname in https://github.com/flairNLP/flair/pull/3435
* Fix `tests` package being incorrectly included in builds by asumagic in https://github.com/flairNLP/flair/pull/3440

New Contributors
* ya0guang made their first contribution in https://github.com/flairNLP/flair/pull/3413
* drbh made their first contribution in https://github.com/flairNLP/flair/pull/3417
* asumagic made their first contribution in https://github.com/flairNLP/flair/pull/3440
* MattGPT-ai made their first contribution in https://github.com/flairNLP/flair/pull/3475
* janpf made their first contribution in https://github.com/flairNLP/flair/pull/3481
* sheldon-roberts made their first contribution in https://github.com/flairNLP/flair/pull/3485
* abhisheklomsh made their first contribution in https://github.com/flairNLP/flair/pull/3500
* teresaloeffelhardt made their first contribution in https://github.com/flairNLP/flair/pull/3463

**Full Changelog**: https://github.com/flairNLP/flair/compare/v0.13.1...v0.14.0

0.13.1

Not secure

This release adds some bugfixes on top of the [0.13.0 Release](https://github.com/flairNLP/flair/releases/tag/v0.13.0), and adds a new dataset.

Bug fixes
* fix doc redirect by helpmefindaname in https://github.com/flairNLP/flair/pull/3366
* fix awaiting response check by helpmefindaname in https://github.com/flairNLP/flair/pull/3371
* fix has unknown label is not always initialized by helpmefindaname in https://github.com/flairNLP/flair/pull/3372
* Fix classification report if dataset has no labels by alanakbik in https://github.com/flairNLP/flair/pull/3375
* fix flert hidden context breaks reduced vocab by helpmefindaname in https://github.com/flairNLP/flair/pull/3370
* update HF cache env variable by helpmefindaname in https://github.com/flairNLP/flair/pull/3386

Enhancements
* use batch count instead of total training samples for logging metrics by helpmefindaname in https://github.com/flairNLP/flair/pull/3374

New Datasets
* Add AGNews corpus by elenamer in https://github.com/flairNLP/flair/pull/3385

New Contributors
* elenamer made their first contribution in https://github.com/flairNLP/flair/pull/3385

**Full Changelog**: https://github.com/flairNLP/flair/compare/v0.13.0...v0.13.1

0.13.0

Not secure

This release adds several major new features such as (1) faster and more memory-efficient transformer training, (2) a new plugin system for custom logging and training, (3) new API docs for better documentation - still in beta, and (4) various new models, datasets, bug fixes and enhancements. This release also increases the minimum requirement to Python 3.8!

New Feature: Faster and more memory-efficient transformer training

This release integrates helpmefindaname's [transformer-smaller-training-vocab](https://github.com/helpmefindaname/transformer-smaller-training-vocab) into the ModelTrainer. This temporarily reduces a transformer's vocabulary to only the tokens in the training dataset, and restores the full vocabulary after training. Depending on the dataset, this can substantially reduce GPU memory usage and speed up fine-tuning.

To use this feature, simply add the flag `reduce_transformer_vocab=True` to the `fine_tune` method. For example, to fine-tune a distilbert model on TREC_6, run this code (step 7 has the flag to reduce the vocabulary):

```python
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. what label do we want to predict?
label_type = "question_class"

# 3. create the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 4. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
```

Involved PR: add reduce transformer vocab plugin by helpmefindaname in https://github.com/flairNLP/flair/pull/3217
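
To illustrate the reduce-then-restore idea described above, here is a toy sketch on a five-token vocabulary. It is not the implementation used by `transformer-smaller-training-vocab`: the helpers `reduce_vocab` and `restore_vocab` are hypothetical, embeddings are plain lists, and details such as special-token handling are ignored.

```python
def reduce_vocab(vocab, embeddings, training_tokens):
    """Keep only the embedding rows for tokens seen in training.

    Returns (small_vocab, small_embeddings, kept_ids), where kept_ids
    maps each new row index back to its original row index.
    """
    kept_tokens = sorted({t for t in training_tokens if t in vocab}, key=vocab.get)
    kept_ids = [vocab[t] for t in kept_tokens]
    small_vocab = {tok: new_id for new_id, tok in enumerate(kept_tokens)}
    small_embeddings = [list(embeddings[i]) for i in kept_ids]
    return small_vocab, small_embeddings, kept_ids


def restore_vocab(embeddings, small_embeddings, kept_ids):
    """Write the (possibly updated) rows back into a copy of the full matrix."""
    restored = [list(row) for row in embeddings]
    for new_id, old_id in enumerate(kept_ids):
        restored[old_id] = list(small_embeddings[new_id])
    return restored


vocab = {"[PAD]": 0, "what": 1, "is": 2, "flair": 3, "zebra": 4}
embeddings = [[0.0], [0.1], [0.2], [0.3], [0.4]]

small_vocab, small_emb, kept_ids = reduce_vocab(vocab, embeddings, ["what", "is", "flair", "is"])
small_emb[small_vocab["flair"]] = [0.9]  # pretend training updated this row
full = restore_vocab(embeddings, small_emb, kept_ids)
print(len(small_emb), full)
```

During training only the three kept rows exist; afterwards the updated rows are written back and untouched tokens such as `zebra` keep their original vectors.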

New Feature: Trainer Plugins

A new "Plugin" system was added to the `ModelTrainer`, allowing far greater options to customize the training cycle (and slimming down the code of the ModelTrainer somewhat). For instance, it is now possible to customize logging to a far greater degree and integrate third-party logging tools.

For instance, if you want to integrate ClearML logging into the above script, simply instantiate the plugin and attach it to the trainer:

```python
import clearml
from flair.trainers.plugins import ClearmlLoggerPlugin

# [...] steps 1-5 as in the previous snippet

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# NEW: instantiate a special logger and attach it to the trainer before the training run
ClearmlLoggerPlugin(clearml.Task.init(project_name="test", task_name="test")).attach_to(trainer)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
```

Involved PRs:
* Proposal: Pluggable `ModelTrainer` train function by plonerma in https://github.com/flairNLP/flair/pull/3084
* Major refactoring of ModelTrainer by alanakbik in https://github.com/flairNLP/flair/pull/3182
* Allow users to use no scheduler and use a custom scheduling plugin by plonerma in https://github.com/flairNLP/flair/pull/3200
* Don't pickle classes & plugins in modelcard by helpmefindaname in https://github.com/flairNLP/flair/pull/3325
* Clearml logger by helpmefindaname in https://github.com/flairNLP/flair/pull/3259
* Add a convenience conversion for flair.device by alanakbik in https://github.com/flairNLP/flair/pull/3350
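
For intuition on how such a plugin system can work, here is a small self-contained sketch of a trainer that dispatches events to attached plugins. All class and hook names (`ToyTrainer`, `Plugin`, `after_epoch`, `HistoryPlugin`) are invented for this example and do not mirror Flair's actual plugin base classes or event names; only the `attach_to(trainer)` call style is borrowed from the snippet above.

```python
class Plugin:
    """Base class: subclasses override only the hooks they need."""

    def attach_to(self, trainer):
        trainer.plugins.append(self)

    def after_epoch(self, epoch, loss):
        pass


class ToyTrainer:
    """Stand-in for a trainer that notifies plugins at fixed points of the loop."""

    def __init__(self):
        self.plugins = []

    def dispatch(self, event, **kwargs):
        for plugin in self.plugins:
            getattr(plugin, event)(**kwargs)

    def train(self, epoch_losses):
        # a real trainer would compute these losses; here they are passed in
        for epoch, loss in enumerate(epoch_losses, start=1):
            self.dispatch("after_epoch", epoch=epoch, loss=loss)


class HistoryPlugin(Plugin):
    """Example plugin that records the loss after every epoch."""

    def __init__(self):
        self.history = []

    def after_epoch(self, epoch, loss):
        self.history.append((epoch, loss))


trainer = ToyTrainer()
logger = HistoryPlugin()
logger.attach_to(trainer)
trainer.train([0.9, 0.5, 0.3])
print(logger.history)
```

The training loop itself never needs to know about logging, which is what keeps the trainer code slim.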

API Docs and other documentation

We are working towards improving our documentation. A first step was the release of our tutorial page. Now, we are adding (in beta) online API docs to make navigating the code and options offered by Flair easier. To enable it, we changed all docstrings to Google docstrings. However, this process is still ongoing, so expect the API docs to improve in coming versions of Flair.

You can find the API docs here: https://flairnlp.github.io/flair/master/api/index.html

Involved PRs:
* Creating a doc page with autodocs by helpmefindaname in https://github.com/flairNLP/flair/pull/3273
* Google doc strings by helpmefindaname in https://github.com/flairNLP/flair/pull/3164
* Add redirects to old tutorials by alanakbik in https://github.com/flairNLP/flair/pull/3211
* Add some more documentation and (rather empty) glossary page by helpmefindaname in https://github.com/flairNLP/flair/pull/3339
* Update README.md by eltociear in https://github.com/flairNLP/flair/pull/3241
* Fix embedding finetuning tutorial by helpmefindaname in https://github.com/flairNLP/flair/pull/3301
* Fix build doc page action trigger by helpmefindaname in https://github.com/flairNLP/flair/pull/3319
* Reduce gh-actions diskspace by helpmefindaname in https://github.com/flairNLP/flair/pull/3327
* Orange secondary color by helpmefindaname in https://github.com/flairNLP/flair/pull/3321
* Bump Flair and Python versions by alanakbik in https://github.com/flairNLP/flair/pull/3355

Model Refactorings

In an effort to unify class names, we now offer models that inherit from `DefaultClassifier` for each label type we predict, i.e.:
- `TokenClassifier` for predicting `Token` labels
- `TextPairClassifier` for predicting `TextPair` labels
- `RelationClassifier` for predicting `Relation` labels
- `SpanClassifier` for predicting `Span` labels
- `TextClassifier` for predicting `Sentence` labels

An advantage of such a structure is that most functionality (such as new decoders) needs to only be implemented once in `DefaultClassifier` and then is immediately usable for all model classes.

To enable this, we renamed and extended `WordTagger` as `TokenClassifier`, and renamed `EntityLinker` to `SpanClassifier`. This is not a breaking change yet, as the old names are still available. But in the future, `WordTagger` and `EntityLinker` will be removed.

Involved PRs:
* `TokenClassifier` model by alanakbik in https://github.com/flairNLP/flair/pull/3203
* Rename EntityLinker and remove some legacy embeddings by alanakbik in https://github.com/flairNLP/flair/pull/3295

New Models

We also add two new model classes: (1) a `TextPairRegressor` for regression tasks on pairs of sentences (such as STS-B), and (2) an experimental Label Encoder method for few-shot classification.

Involved PRs:
* Add `TextPair` regression model by plonerma in https://github.com/flairNLP/flair/pull/3202
* Add dual encoder by whoisjones in https://github.com/flairNLP/flair/pull/3208
* Adapt `LabelVerbalizer` so that it also works for non-BIOES span labes by alanakbik in https://github.com/flairNLP/flair/pull/3231

New Datasets
* Integrate BigBio NER data sets into HunFlair by mariosaenger in https://github.com/flairNLP/flair/pull/3146
* Add datasets STS-B and SST-2 to flair by plonerma in https://github.com/flairNLP/flair/pull/3201
* Extend German LER Dataset by stefan-it in https://github.com/flairNLP/flair/pull/3288
* Add support for MasakhaPOS Dataset by stefan-it in https://github.com/flairNLP/flair/pull/3247
* Gh3275: sample_missing_splits in SST-2 by plonerma in https://github.com/flairNLP/flair/pull/3276
* Add German MobIE NER Dataset by stefan-it in https://github.com/flairNLP/flair/pull/3351

Build Process
* Use ruff instead of flake8 and isort by Lingepumpe in https://github.com/flairNLP/flair/pull/3213
* Update mypy by Lingepumpe in https://github.com/flairNLP/flair/pull/3210
* Use poetry instead of pipenv for developer/testing by Lingepumpe in https://github.com/flairNLP/flair/pull/3214
* Remove poetry by helpmefindaname in https://github.com/flairNLP/flair/pull/3258

Bug Fixes
* Fix seralization of config in transformers by helpmefindaname in https://github.com/flairNLP/flair/pull/3178
* Add stacklevel to log_line in order to display correct file and line number (backwards compatible) by plonerma in https://github.com/flairNLP/flair/pull/3175
* Fix tars loading by helpmefindaname in https://github.com/flairNLP/flair/pull/3212
* Fix best epoch score update by lephong in https://github.com/flairNLP/flair/pull/3220
* Fix loading of (not so) old models by helpmefindaname in https://github.com/flairNLP/flair/pull/3229
* Fix false warning for "An empty Sentence was created!" by AbdiHaryadi in https://github.com/flairNLP/flair/pull/3268
* Fix bug with sentences that do not contain a single valid transformer token by helpmefindaname in https://github.com/flairNLP/flair/pull/3230
* Fix loading of old models by helpmefindaname in https://github.com/flairNLP/flair/pull/3228
* Fix multiple arguments destination by helpmefindaname in https://github.com/flairNLP/flair/pull/3272
* Support transformers 4310 by helpmefindaname in https://github.com/flairNLP/flair/pull/3289
* Fix import error by helpmefindaname in https://github.com/flairNLP/flair/pull/3336

Enhancements
* Bump min version to 3.8 by helpmefindaname in https://github.com/flairNLP/flair/pull/3297
* Use torch native amp by helpmefindaname in https://github.com/flairNLP/flair/pull/3128
* Unpin gdown dependency by helpmefindaname in https://github.com/flairNLP/flair/pull/3176
* get_spans_from_bio: Start new span for previous S- if class also changed by Lingepumpe in https://github.com/flairNLP/flair/pull/3195
* Include `flair/py.typed` and `requirements.txt` in source distribution by dobbersc in https://github.com/flairNLP/flair/pull/3206
* Better tars inference by helpmefindaname in https://github.com/flairNLP/flair/pull/3222
* prevent fasttext embeddings to be stored separately by helpmefindaname in https://github.com/flairNLP/flair/pull/3293
* recreate `to_dict` and add relations by helpmefindaname in https://github.com/flairNLP/flair/pull/3271
* github: bug report description should be textarea by stefan-it in https://github.com/flairNLP/flair/pull/3181
* Making gradient clipping optional & max gradient norm variable by plonerma in https://github.com/flairNLP/flair/pull/3240
* Save final model only if `save_final_model` is True (even if the training is interrupted) by plonerma in https://github.com/flairNLP/flair/pull/3251
* Fix inconsistency between best path and scores in ViterbiDecoder by mauryaland in https://github.com/flairNLP/flair/pull/3189
* Add action to remove Awaiting Response label when a response was made by helpmefindaname in https://github.com/flairNLP/flair/pull/3300
* Add onnx session config by helpmefindaname in https://github.com/flairNLP/flair/pull/3302
* Feature jsonldataset metadata by helpmefindaname in https://github.com/flairNLP/flair/pull/3349

Breaking Changes

* Removing the following legacy embeddings, as their support was dropped long ago:
  * `XLNetEmbeddings`
  * `XLMEmbeddings`
  * `OpenAIGPTEmbeddings`
  * `OpenAIGPT2Embeddings`
  * `RoBERTaEmbeddings`
  * `CamembertEmbeddings`
  * `XLMRobertaEmbeddings`
  * `BertEmbeddings`

  You can use `TransformerWordEmbeddings` or `TransformerDocumentEmbeddings` instead.
* Removing `ELMoTransformerEmbeddings` as [allennlp](https://github.com/allenai/allennlp) is no longer maintained.
* Removal of the `flair.hyperparameter` module: We recommend using the hyperparameter optimizer of your choice as an external module. For example, see how to fine-tune Flair models with the Hugging Face [AutoTrain SpaceRunner](https://huggingface.co/blog/stefan-it/autotrain-flair-mobie).
* Drop of the `trainer.resume(...)` functionality. Similarly to the `flair.hyperparameter` module, this functionality was dropped due to the trainer rework.
* Changes to the `trainer.train(...)` and `trainer.fine_tune(...)` parameters:
  * `monitor_train: bool` was replaced by `monitor_train_sample: float`: this allows you to specify the fraction of training data points used for monitoring (setting `monitor_train_sample=1.0` is equivalent to the previous behaviour of `monitor_train=True`).
  * `eval_on_train_fraction` is removed in favour of `monitor_train_sample` (see `monitor_train` above).
  * `eval_on_train_shuffle` is removed.
  * `anneal_with_prestarts` and `batch_growth_annealing` have been removed.
  * `num_workers` has been removed; a single worker is now always used for data loading, as it is the fastest for the in-memory datasets.
  * `checkpoint` has been removed as a parameter. You can use the `CheckpointPlugin` for the same behaviour.
  * `cycle_momentum` has been removed, as schedulers have been moved to plugins.
  * `param_selection_mode` has been removed, similar to the hyperparameter optimization.
  * `optimizer_state_dict` and `scheduler_state_dict` were removed as part of the resume functionality.
  * `anneal_against_dev_loss` has been dropped, as annealing now always goes against the metric specified by `main_evaluation_metric`.
  * `use_swa` has been removed.
  * `use_tensorboard`, `tensorboard_comment`, `tensorboard_log_dir` & `metrics_for_tensorboard` are removed in favour of the `TensorboardLogger` plugin.
  * `amp_opt_level` is removed, as we moved to the [torch integration](https://pytorch.org/docs/stable/amp.html).
* `WordTagger` has been deprecated as it was renamed to `TokenClassifier`
* `EntityLinker` has been deprecated as it was renamed to `SpanClassifier`
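
If you keep your training arguments in a dictionary, the parameter changes listed above can be applied mechanically before calling `trainer.train(...)`. The following is a minimal sketch of that idea and is not part of Flair: the helper name `migrate_train_kwargs` and the `REMOVED_PARAMS` set are hypothetical, and carrying a numeric `eval_on_train_fraction` over to `monitor_train_sample` is an assumption based on the note above. It uses only the standard library and does not import Flair.

```python
import warnings
from typing import Any, Dict

# Parameters that the release notes above list as removed from
# trainer.train(...) and trainer.fine_tune(...).
REMOVED_PARAMS = {
    "eval_on_train_shuffle",
    "anneal_with_prestarts",
    "batch_growth_annealing",
    "num_workers",
    "checkpoint",
    "cycle_momentum",
    "param_selection_mode",
    "optimizer_state_dict",
    "scheduler_state_dict",
    "anneal_against_dev_loss",
    "use_swa",
    "use_tensorboard",
    "tensorboard_comment",
    "tensorboard_log_dir",
    "metrics_for_tensorboard",
    "amp_opt_level",
}


def migrate_train_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite pre-0.13 training kwargs to the new names."""
    new_kwargs = dict(old_kwargs)  # work on a copy, leave the input untouched

    # monitor_train: bool was replaced by monitor_train_sample: float.
    monitor_train = new_kwargs.pop("monitor_train", None)
    # eval_on_train_fraction was removed in favour of monitor_train_sample.
    fraction = new_kwargs.pop("eval_on_train_fraction", None)

    if "monitor_train_sample" not in new_kwargs:
        if monitor_train is True:
            # Stated equivalence: monitor_train=True == monitor_train_sample=1.0
            new_kwargs["monitor_train_sample"] = 1.0
        elif (
            isinstance(fraction, (int, float))
            and not isinstance(fraction, bool)
            and fraction > 0
        ):
            # Assumption: a numeric fraction carries over unchanged.
            new_kwargs["monitor_train_sample"] = float(fraction)

    # Drop removed parameters and report them in a single warning.
    dropped = sorted(REMOVED_PARAMS & new_kwargs.keys())
    for name in dropped:
        del new_kwargs[name]
    if dropped:
        warnings.warn(
            "Dropped parameters removed in v0.13.0: " + ", ".join(dropped),
            DeprecationWarning,
            stacklevel=2,
        )
    return new_kwargs
```

For example, `migrate_train_kwargs({"max_epochs": 10, "monitor_train": True, "num_workers": 4})` returns a dictionary with `max_epochs` and `monitor_train_sample=1.0` and warns that `num_workers` was dropped. Functionality that moved to plugins (`CheckpointPlugin`, `TensorboardLogger`) still has to be configured separately.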

New Contributors
* lephong made their first contribution in https://github.com/flairNLP/flair/pull/3220
* AbdiHaryadi made their first contribution in https://github.com/flairNLP/flair/pull/3268
* eltociear made their first contribution in https://github.com/flairNLP/flair/pull/3241

**Full Changelog**: https://github.com/flairNLP/flair/compare/v0.12.2...v0.13.0
