This release adds several major new features: (1) faster and more memory-efficient transformer training, (2) a new plugin system for custom logging and training, (3) new API docs (still in beta), and (4) various new models, datasets, bug fixes and enhancements. This release also raises the minimum required Python version to 3.8!
New Feature: Faster and more memory-efficient transformer training
This release integrates helpmefindaname's [transformer-smaller-training-vocab](https://github.com/helpmefindaname/transformer-smaller-training-vocab) into the ModelTrainer. This temporarily reduces a transformer's vocabulary to only the tokens in the training dataset, and restores the full vocabulary after training. Depending on the dataset, this can substantially reduce GPU memory usage and speed up fine-tuning.
To use this feature, simply add the flag `reduce_transformer_vocab=True` to the `fine_tune` method. For example, to fine-tune a distilbert model on TREC_6, run this code (step 7 has the flag to reduce the vocabulary):
```python
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. what label do we want to predict?
label_type = "question_class"

# 3. create the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 4. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
```
Involved PR: add reduce transformer vocab plugin by helpmefindaname in https://github.com/flairNLP/flair/pull/3217
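To illustrate the idea behind vocabulary reduction, here is a minimal, library-independent sketch. It is not the implementation used by transformer-smaller-training-vocab: the function names `reduce_vocab` and `restore_vocab`, the toy vocabulary, and the list-based embedding matrix are all assumptions made for illustration, and details such as special tokens are ignored.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def reduce_vocab(
    vocab: Dict[str, int], embeddings: Matrix, corpus_tokens: List[str]
) -> Tuple[Dict[str, int], Matrix]:
    """Keep only the embedding rows of tokens that occur in the training data."""
    # Deduplicate the corpus tokens and keep them in their original id order.
    used = sorted({t for t in corpus_tokens if t in vocab}, key=vocab.__getitem__)
    small_vocab = {token: new_id for new_id, token in enumerate(used)}
    small_embeddings = [list(embeddings[vocab[token]]) for token in used]
    return small_vocab, small_embeddings


def restore_vocab(
    vocab: Dict[str, int],
    embeddings: Matrix,
    small_vocab: Dict[str, int],
    small_embeddings: Matrix,
) -> Matrix:
    """Write the (possibly updated) rows back into a copy of the full matrix."""
    restored = [list(row) for row in embeddings]
    for token, small_id in small_vocab.items():
        restored[vocab[token]] = list(small_embeddings[small_id])
    return restored


# Hypothetical toy data: five tokens, two-dimensional embeddings.
full_vocab = {"[PAD]": 0, "what": 1, "is": 2, "flair": 3, "zebra": 4}
full_embeddings = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]]

small_vocab, small_embeddings = reduce_vocab(
    full_vocab, full_embeddings, ["what", "is", "flair", "what"]
)
small_embeddings[small_vocab["flair"]] = [9.0, 9.0]  # stand-in for a training update
restored = restore_vocab(full_vocab, full_embeddings, small_vocab, small_embeddings)
```

During training only the three-row matrix has to be held and updated; the unused rows (here `[PAD]` and `zebra`) are untouched and reappear unchanged once the full matrix is restored.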
New Feature: Trainer Plugins
A new "Plugin" system was added to the `ModelTrainer`, allowing far greater options to customize the training cycle (and slimming down the code of the ModelTrainer somewhat). For instance, it is now possible to customize logging to a far greater degree and integrate third-party logging tools.
For example, to integrate ClearML logging into the above script, simply instantiate the plugin and attach it to the trainer:
```python
# [...]

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# NEW: instantiate a special logger and attach it to the trainer before the training run
ClearmlLoggerPlugin(clearml.Task.init(project_name="test", task_name="test")).attach_to(trainer)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
```
Involved PRs:
* Proposal: Pluggable `ModelTrainer` train function by plonerma in https://github.com/flairNLP/flair/pull/3084
* Major refactoring of ModelTrainer by alanakbik in https://github.com/flairNLP/flair/pull/3182
* Allow users to use no scheduler and use a custom scheduling plugin by plonerma in https://github.com/flairNLP/flair/pull/3200
* Don't pickle classes & plugins in modelcard by helpmefindaname in https://github.com/flairNLP/flair/pull/3325
* Clearml logger by helpmefindaname in https://github.com/flairNLP/flair/pull/3259
* Add a convenience conversion for flair.device by alanakbik in https://github.com/flairNLP/flair/pull/3350
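The general pattern behind such a plugin system can be sketched without Flair. The following is a simplified illustration of event-based hooks, not the actual `ModelTrainer` code: `SketchTrainer`, `LossHistoryPlugin`, and the event names `after_epoch` and `after_training` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class SketchTrainer:
    """Hypothetical stand-in for a trainer that dispatches events to plugins."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def register(self, event: str, callback: Callable[..., None]) -> None:
        self._hooks[event].append(callback)

    def dispatch(self, event: str, **kwargs) -> None:
        # Call every callback registered for this event, in registration order.
        for callback in self._hooks[event]:
            callback(**kwargs)

    def train(self, epochs: int) -> None:
        for epoch in range(1, epochs + 1):
            loss = 1.0 / epoch  # placeholder for a real training step
            self.dispatch("after_epoch", epoch=epoch, loss=loss)
        self.dispatch("after_training")


class LossHistoryPlugin:
    """Hypothetical plugin that records the loss after every epoch."""

    def __init__(self) -> None:
        self.history: List[float] = []
        self.finished = False

    def attach_to(self, trainer: SketchTrainer) -> None:
        trainer.register("after_epoch", self.on_epoch)
        trainer.register("after_training", self.on_finish)

    def on_epoch(self, epoch: int, loss: float) -> None:
        self.history.append(loss)

    def on_finish(self) -> None:
        self.finished = True


trainer = SketchTrainer()
plugin = LossHistoryPlugin()
plugin.attach_to(trainer)
trainer.train(epochs=3)
```

The training loop itself knows nothing about logging; it only announces events. This is why behaviour such as logging or scheduling can move out of the trainer and into separately attached plugins.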
API Docs and other documentation
We are working towards improving our documentation. A first step was the release of our tutorial page. Now, we are adding (in beta) online API docs to make navigating the code and options offered by Flair easier. To enable it, we changed all docstrings to Google docstrings. However, this process is still ongoing, so expect the API docs to improve in coming versions of Flair.
You can find the API docs here: https://flairnlp.github.io/flair/master/api/index.html
Involved PRs:
* Creating a doc page with autodocs by helpmefindaname in https://github.com/flairNLP/flair/pull/3273
* Google doc strings by helpmefindaname in https://github.com/flairNLP/flair/pull/3164
* Add redirects to old tutorials by alanakbik in https://github.com/flairNLP/flair/pull/3211
* Add some more documentation and (rather empty) glossary page by helpmefindaname in https://github.com/flairNLP/flair/pull/3339
* Update README.md by eltociear in https://github.com/flairNLP/flair/pull/3241
* Fix embedding finetuning tutorial by helpmefindaname in https://github.com/flairNLP/flair/pull/3301
* Fix build doc page action trigger by helpmefindaname in https://github.com/flairNLP/flair/pull/3319
* Reduce gh-actions diskspace by helpmefindaname in https://github.com/flairNLP/flair/pull/3327
* Orange secondary color by helpmefindaname in https://github.com/flairNLP/flair/pull/3321
* Bump Flair and Python versions by alanakbik in https://github.com/flairNLP/flair/pull/3355
Model Refactorings
In an effort to unify class names, we now offer models that inherit from `DefaultClassifier` for each label type we predict, i.e.:
- `TokenClassifier` for predicting `Token` labels
- `TextPairClassifier` for predicting `TextPair` labels
- `RelationClassifier` for predicting `Relation` labels
- `SpanClassifier` for predicting `Span` labels
- `TextClassifier` for predicting `Sentence` labels
An advantage of such a structure is that most functionality (such as new decoders) needs to only be implemented once in `DefaultClassifier` and then is immediately usable for all model classes.
To enable this, we renamed and extended `WordTagger` as `TokenClassifier`, and renamed `EntityLinker` to `SpanClassifier`. This is not a breaking change yet, as the old names are still available. But in the future, `WordTagger` and `EntityLinker` will be removed.
Involved PRs:
* `TokenClassifier` model by alanakbik in https://github.com/flairNLP/flair/pull/3203
* Rename EntityLinker and remove some legacy embeddings by alanakbik in https://github.com/flairNLP/flair/pull/3295
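The benefit of implementing functionality once in a shared base class can be shown with a small sketch. This is not Flair's `DefaultClassifier`; the classes `BaseClassifierSketch`, `TokenClassifierSketch`, and `TextClassifierSketch` and their toy scoring rules are hypothetical and exist only to illustrate the design.

```python
from typing import Dict, List


class BaseClassifierSketch:
    """Hypothetical base class: the decoding logic lives here, once."""

    def _data_points(self, sentence: str) -> List[str]:
        raise NotImplementedError

    def _score(self, data_point: str) -> Dict[str, float]:
        raise NotImplementedError

    def predict(self, sentence: str) -> List[str]:
        # Shared decoder: pick the highest-scoring label for each data point.
        labels = []
        for point in self._data_points(sentence):
            scores = self._score(point)
            labels.append(max(scores, key=scores.get))
        return labels


class TokenClassifierSketch(BaseClassifierSketch):
    """Labels every token of the sentence."""

    def _data_points(self, sentence: str) -> List[str]:
        return sentence.split()

    def _score(self, data_point: str) -> Dict[str, float]:
        return {"UPPER": float(data_point[0].isupper()), "lower": 0.5}


class TextClassifierSketch(BaseClassifierSketch):
    """Labels the sentence as a whole."""

    def _data_points(self, sentence: str) -> List[str]:
        return [sentence]

    def _score(self, data_point: str) -> Dict[str, float]:
        return {"question": float(data_point.endswith("?")), "statement": 0.5}


token_labels = TokenClassifierSketch().predict("Flair is great")
text_labels = TextClassifierSketch().predict("Is Flair great?")
```

Each subclass only defines what its data points are and how they are scored. A change to `predict` in the base class (for example, a different decoding strategy) would apply to both subclasses at once.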
New Models
We also add two new model classes: (1) a `TextPairRegressor` for regression tasks on pairs of sentences (such as STS-B), and (2) an experimental Label Encoder method for few-shot classification.
Involved PRs:
* Add `TextPair` regression model by plonerma in https://github.com/flairNLP/flair/pull/3202
* Add dual encoder by whoisjones in https://github.com/flairNLP/flair/pull/3208
* Adapt `LabelVerbalizer` so that it also works for non-BIOES span labels by alanakbik in https://github.com/flairNLP/flair/pull/3231
New Datasets
* Integrate BigBio NER data sets into HunFlair by mariosaenger in https://github.com/flairNLP/flair/pull/3146
* Add datasets STS-B and SST-2 to flair by plonerma in https://github.com/flairNLP/flair/pull/3201
* Extend German LER Dataset by stefan-it in https://github.com/flairNLP/flair/pull/3288
* Add support for MasakhaPOS Dataset by stefan-it in https://github.com/flairNLP/flair/pull/3247
* Gh3275: sample_missing_splits in SST-2 by plonerma in https://github.com/flairNLP/flair/pull/3276
* Add German MobIE NER Dataset by stefan-it in https://github.com/flairNLP/flair/pull/3351
Build Process
* Use ruff instead of flake8 and isort by Lingepumpe in https://github.com/flairNLP/flair/pull/3213
* Update mypy by Lingepumpe in https://github.com/flairNLP/flair/pull/3210
* Use poetry instead of pipenv for developer/testing by Lingepumpe in https://github.com/flairNLP/flair/pull/3214
* Remove poetry by helpmefindaname in https://github.com/flairNLP/flair/pull/3258
Bug Fixes
* Fix serialization of config in transformers by helpmefindaname in https://github.com/flairNLP/flair/pull/3178
* Add stacklevel to log_line in order to display correct file and line number (backwards compatible) by plonerma in https://github.com/flairNLP/flair/pull/3175
* Fix tars loading by helpmefindaname in https://github.com/flairNLP/flair/pull/3212
* Fix best epoch score update by lephong in https://github.com/flairNLP/flair/pull/3220
* Fix loading of (not so) old models by helpmefindaname in https://github.com/flairNLP/flair/pull/3229
* Fix false warning for "An empty Sentence was created!" by AbdiHaryadi in https://github.com/flairNLP/flair/pull/3268
* Fix bug with sentences that do not contain a single valid transformer token by helpmefindaname in https://github.com/flairNLP/flair/pull/3230
* Fix loading of old models by helpmefindaname in https://github.com/flairNLP/flair/pull/3228
* Fix multiple arguments destination by helpmefindaname in https://github.com/flairNLP/flair/pull/3272
* Support transformers 4310 by helpmefindaname in https://github.com/flairNLP/flair/pull/3289
* Fix import error by helpmefindaname in https://github.com/flairNLP/flair/pull/3336
Enhancements
* Bump min version to 3.8 by helpmefindaname in https://github.com/flairNLP/flair/pull/3297
* Use torch native amp by helpmefindaname in https://github.com/flairNLP/flair/pull/3128
* Unpin gdown dependency by helpmefindaname in https://github.com/flairNLP/flair/pull/3176
* get_spans_from_bio: Start new span for previous S- if class also changed by Lingepumpe in https://github.com/flairNLP/flair/pull/3195
* Include `flair/py.typed` and `requirements.txt` in source distribution by dobbersc in https://github.com/flairNLP/flair/pull/3206
* Better tars inference by helpmefindaname in https://github.com/flairNLP/flair/pull/3222
* prevent fasttext embeddings to be stored separately by helpmefindaname in https://github.com/flairNLP/flair/pull/3293
* recreate `to_dict` and add relations by helpmefindaname in https://github.com/flairNLP/flair/pull/3271
* github: bug report description should be textarea by stefan-it in https://github.com/flairNLP/flair/pull/3181
* Making gradient clipping optional & max gradient norm variable by plonerma in https://github.com/flairNLP/flair/pull/3240
* Save final model only if `save_final_model` is True (even if the training is interrupted) by plonerma in https://github.com/flairNLP/flair/pull/3251
* Fix inconsistency between best path and scores in ViterbiDecoder by mauryaland in https://github.com/flairNLP/flair/pull/3189
* Add action to remove Awaiting Response label when a response was made by helpmefindaname in https://github.com/flairNLP/flair/pull/3300
* Add onnx session config by helpmefindaname in https://github.com/flairNLP/flair/pull/3302
* Feature jsonldataset metadata by helpmefindaname in https://github.com/flairNLP/flair/pull/3349
Breaking Changes
* Removing the following legacy embeddings, as their support was dropped long ago:
  * `XLNetEmbeddings`
  * `XLMEmbeddings`
  * `OpenAIGPTEmbeddings`
  * `OpenAIGPT2Embeddings`
  * `RoBERTaEmbeddings`
  * `CamembertEmbeddings`
  * `XLMRobertaEmbeddings`
  * `BertEmbeddings`

  You can use `TransformerWordEmbeddings` or `TransformerDocumentEmbeddings` instead.
* Removing `ELMoTransformerEmbeddings` as [allennlp](https://github.com/allenai/allennlp) is no longer maintained.
* Removal of the `flair.hyperparameter` module: We recommend using the hyperparameter optimizer of your choice as an external module. For an example, see how to fine-tune Flair models with the Hugging Face [AutoTrain SpaceRunner](https://huggingface.co/blog/stefan-it/autotrain-flair-mobie).
* Removal of the `trainer.resume(...)` functionality. Similarly to the `flair.hyperparameter` module, this functionality was dropped due to the trainer rework.
* Changes to the `trainer.train(...)` and `trainer.fine_tune(...)` parameters:
  * `monitor_train: bool` was replaced by `monitor_train_sample: float`: this allows you to specify the percentage of training data points used for monitoring (setting `monitor_train_sample=1.0` is equivalent to the previous behaviour of `monitor_train=True`).
  * `eval_on_train_fraction` is removed in favour of `monitor_train_sample` (see `monitor_train` above).
  * `eval_on_train_shuffle` is removed.
  * `anneal_with_prestarts` and `batch_growth_annealing` have been removed.
  * `num_workers` has been removed. A single worker is now always used for data loading, as this is the fastest option for in-memory datasets.
  * `checkpoint` has been removed as a parameter. You can use the `CheckpointPlugin` for the same behaviour.
  * `cycle_momentum` has been removed, as schedulers have been moved to plugins.
  * `param_selection_mode` has been removed, along with the hyperparameter optimization.
  * `optimizer_state_dict` and `scheduler_state_dict` were removed as part of the resume functionality.
  * `anneal_against_dev_loss` has been dropped, as annealing now always goes against the metric specified by `main_evaluation_metric`.
  * `use_swa` has been removed.
  * `use_tensorboard`, `tensorboard_comment`, `tensorboard_log_dir` and `metrics_for_tensorboard` are removed in favour of the `TensorboardLogger` plugin.
  * `amp_opt_level` is removed, as we moved to the [torch integration](https://pytorch.org/docs/stable/amp.html).
* `WordTagger` has been deprecated as it was renamed to `TokenClassifier`
* `EntityLinker` has been deprecated as it was renamed to `SpanClassifier`
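When updating existing training scripts, the parameter changes to `trainer.train(...)` and `trainer.fine_tune(...)` listed above can be checked mechanically. The helper below is a hypothetical sketch and not part of Flair: `migrate_train_kwargs` and the `REMOVED` set are names introduced here, and the mapping covers only what these notes state.

```python
import warnings
from typing import Any, Dict

# Parameters listed above as removed without a direct keyword replacement.
REMOVED = {
    "eval_on_train_shuffle", "anneal_with_prestarts", "batch_growth_annealing",
    "num_workers", "checkpoint", "cycle_momentum", "param_selection_mode",
    "optimizer_state_dict", "scheduler_state_dict", "anneal_against_dev_loss",
    "use_swa", "use_tensorboard", "tensorboard_comment", "tensorboard_log_dir",
    "metrics_for_tensorboard", "amp_opt_level",
}


def migrate_train_kwargs(old: Dict[str, Any]) -> Dict[str, Any]:
    """Translate pre-0.13 keyword arguments into their 0.13 equivalents."""
    new = dict(old)
    monitor_train = new.pop("monitor_train", False)
    fraction = new.pop("eval_on_train_fraction", None)

    # monitor_train=True corresponds to monitor_train_sample=1.0.
    if "monitor_train_sample" not in new:
        if monitor_train:
            new["monitor_train_sample"] = 1.0
        elif isinstance(fraction, float) and fraction > 0.0:
            new["monitor_train_sample"] = fraction

    # Drop removed parameters and tell the user about each one.
    for name in sorted(REMOVED.intersection(new)):
        warnings.warn(f"'{name}' was removed and is ignored", stacklevel=2)
        del new[name]
    return new


# Example values only; warnings are silenced here to keep the output clean.
old_kwargs = {"max_epochs": 10, "monitor_train": True, "num_workers": 4, "use_swa": False}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    new_kwargs = migrate_train_kwargs(old_kwargs)
```

Parameters that moved into plugins (such as `checkpoint` or the TensorBoard options) are only dropped by this sketch. Re-enabling that behaviour means attaching the corresponding plugin to the trainer.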
New Contributors
* lephong made their first contribution in https://github.com/flairNLP/flair/pull/3220
* AbdiHaryadi made their first contribution in https://github.com/flairNLP/flair/pull/3268
* eltociear made their first contribution in https://github.com/flairNLP/flair/pull/3241
**Full Changelog**: https://github.com/flairNLP/flair/compare/v0.12.2...v0.13.0