Flair

Latest version: v0.14.0



```
Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome
```


The printout shows that:

- "IFNAR2" is a **gene**. Further, it is recognized as gene [3455](https://www.ncbi.nlm.nih.gov/gene/3455) ("_interferon alpha and beta receptor subunit 2_") in the NCBI database.

- "POLG" is a **gene**. Further, it is recognized as gene [5428](https://www.ncbi.nlm.nih.gov/gene/5428) ("_DNA polymerase gamma, catalytic subunit_") in the NCBI database.

- "long-COVID syndrome" is a **disease**. Further, it is uniquely linked to "[Post-Acute COVID-19 Syndrome](https://meshb-prev.nlm.nih.gov/record/ui?ui=D000094024)" in the MESH database.
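Each printed linking result follows a `Span[start:end]: "text" → ID/name=...` pattern. The following is a minimal sketch that parses such a line into its parts with a regular expression. It assumes exactly the format of the printout line above, and that format is not guaranteed to stay stable across Flair versions. `parse_link_line` and `LinkedSpan` are hypothetical helpers, not part of Flair.

```python
import re
from typing import NamedTuple, Optional


class LinkedSpan(NamedTuple):
    start: int
    end: int
    text: str
    identifier: str  # e.g. "MESH:D000094024"
    name: str


# Matches lines like:
# Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome
_LINE = re.compile(r'^Span\[(\d+):(\d+)\]: "(.*)" → ([^/]+)/name=(.+)$')


def parse_link_line(line: str) -> Optional[LinkedSpan]:
    """Parse one printed linking result; return None if the line does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    start, end, text, identifier, name = match.groups()
    return LinkedSpan(int(start), int(end), text, identifier, name)


result = parse_link_line(
    'Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome'
)
print(result.identifier, "|", result.name)
```

Returning `None` for non-matching lines lets the helper be applied to a whole printout without failing on headers or blank lines.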

Big thanks to sg-wbi, WangXII, mariosaenger and helpmefindaname for all their work:
* Entity Mention Linker by helpmefindaname in https://github.com/flairNLP/flair/pull/3388
* Support for biomedical datasets with multiple entity types by WangXII in https://github.com/flairNLP/flair/pull/3387
* Update documentation for Hunflair2 release by mariosaenger in https://github.com/flairNLP/flair/pull/3410
* Improve nel tutorial by helpmefindaname in https://github.com/flairNLP/flair/pull/3369
* Incorporate hunflair2 docs to docpage by helpmefindaname in https://github.com/flairNLP/flair/pull/3442


Parameter-Efficient Fine-Tuning


More flexibility on main metric (2161)

When training models, you can now choose any standard evaluation metric for model selection (previously it was fixed to micro F1). When calling the trainer, simply pass the desired metric as `main_evaluation_metric` like so:

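```python
# `trainer` is a ModelTrainer set up as usual
trainer.train('resources/taggers/your_model',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=10,
              # select the best model by macro-averaged F1 instead of micro F1
              main_evaluation_metric=("macro avg", "f1-score"),
              )
```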


In this example, we now use macro F1 instead of the default micro F1.
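The tuple passed as `main_evaluation_metric` names a row and a column of a report keyed like scikit-learn's `classification_report(..., output_dict=True)`. The pure-Python sketch below illustrates why the choice matters, assuming per-label true-positive/false-positive/false-negative counts are already available; `build_report` and `select_main_score` are hypothetical helpers, not Flair functions. Micro F1 pools the counts of all labels, while macro F1 averages the per-label F1 scores, so a rare label the model handles poorly pulls macro F1 down much more.

```python
from typing import Dict, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def build_report(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, Dict[str, float]]:
    """counts maps label -> (tp, fp, fn); rows are keyed like sklearn's output_dict."""
    report = {label: {"f1-score": f1(*c)} for label, c in counts.items()}
    # micro average: pool counts over all labels, then compute a single F1
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    report["micro avg"] = {"f1-score": f1(tp, fp, fn)}
    # macro average: unweighted mean of the per-label F1 scores
    per_label = [report[label]["f1-score"] for label in counts]
    report["macro avg"] = {"f1-score": sum(per_label) / len(per_label)}
    return report


def select_main_score(report, main_evaluation_metric=("micro avg", "f1-score")) -> float:
    row, column = main_evaluation_metric
    return report[row][column]


# a frequent label predicted well and a rare label predicted poorly
report = build_report({"PER": (90, 10, 10), "MISC": (1, 4, 4)})
print(select_main_score(report))                             # micro F1, about 0.87
print(select_main_score(report, ("macro avg", "f1-score")))  # macro F1, about 0.55
```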

Add handling for mapping labels to 'O' (2254)

In `ColumnDataset`, labels can be remapped to other labels. Sometimes, however, you may not want to use all label types in a given dataset.
You can now remap such labels to 'O' to exclude them.

For instance, to load CoNLL-03 without MISC, do:

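```python
from flair.datasets import CONLL_03

# load CoNLL-03 and map all MISC annotations to 'O', i.e. drop them
corpus = CONLL_03(
    label_name_map={'MISC': 'O'}
)
print(corpus.make_label_dictionary('ner'))
print(corpus.train[0].to_tagged_string('ner'))
```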
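Conceptually, mapping a label to 'O' turns every token that carried it into an "outside" token. A minimal pure-Python sketch of that idea for BIO-tagged columns, assuming tags of the form `B-MISC` / `I-MISC`; `remap_bio_tags` is a hypothetical helper and not Flair's internal implementation:

```python
from typing import Dict, List


def remap_bio_tags(tags: List[str], label_name_map: Dict[str, str]) -> List[str]:
    """Rename entity types in BIO tags; types mapped to 'O' are dropped."""
    remapped = []
    for tag in tags:
        if tag == "O":
            remapped.append(tag)
            continue
        prefix, _, entity_type = tag.partition("-")  # "B-MISC" -> ("B", "-", "MISC")
        new_type = label_name_map.get(entity_type, entity_type)
        # a type mapped to 'O' loses its B-/I- prefix entirely
        remapped.append("O" if new_type == "O" else f"{prefix}-{new_type}")
    return remapped


tags = ["B-PER", "I-PER", "O", "B-MISC", "I-MISC", "B-LOC"]
print(remap_bio_tags(tags, {"MISC": "O"}))
# ['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC']
```

The same map can also rename types (for example `{"PER": "PERSON"}`), which is the pre-existing remapping behavior the release note builds on.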


Other
- add per-label thresholds for prediction (2366)
- add support for Spanish clinical Flair embeddings (2323)
- add 'mean' and 'max' pooling strategies for the `TransformerDocumentEmbeddings` class (2180)
- new `DocumentCNNEmbeddings` class to embed text with a trainable CNN (2141)
- allow negative examples in `ClassificationCorpus` (2233)
- add new parameter to save the model every k epochs during training (2146)
- log the epoch of the best model instead of printing it during training (2286)
- add option to exclude specific sentences from a dataset (2262)
- improve TensorBoard logging (2164)
- return predictions during evaluation (2162)
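The 'mean' and 'max' pooling strategies reduce a sequence of token vectors to a single document vector. A dependency-free sketch of the two reductions, assuming token embeddings given as equal-length lists of floats (Flair itself works on PyTorch tensors, and `pool` is a hypothetical name):

```python
from typing import List


def pool(token_vectors: List[List[float]], strategy: str = "mean") -> List[float]:
    """Reduce token embeddings (one list per token) to one document embedding."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    columns = list(zip(*token_vectors))  # one tuple per embedding dimension
    if strategy == "mean":
        # average each dimension over all tokens
        return [sum(col) / len(col) for col in columns]
    if strategy == "max":
        # keep the largest value of each dimension over all tokens
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling strategy: {strategy}")


tokens = [[1.0, -2.0], [3.0, 0.0], [2.0, 5.0]]
print(pool(tokens, "mean"))  # [2.0, 1.0]
print(pool(tokens, "max"))   # [3.0, 5.0]
```

Mean pooling blends information from every token, while max pooling keeps the strongest activation per dimension, so the two can favor different downstream tasks.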


Internal Refactorings

Refactor for simplicity and extensibility (2333 2351 2356 2377 2379 2382 2184)

To accommodate all these new NLP task types (plus many more in the pipeline), we restructured the `flair.nn.Model` class so that most models now inherit from `DefaultClassifier`. This removes many redundancies: most models perform classification and differ only in what they classify and how they apply embeddings. Models that inherit from `DefaultClassifier` need only implement the method `forward_pass`, so each model class takes only a few lines of code.

See, for instance, our implementation of the `RelationExtractor` class for how easy it now is to add a new task!
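The restructuring follows the template-method pattern: a shared base class owns the generic classification logic, and each subclass supplies only the task-specific step. The sketch below illustrates that pattern in plain Python; the class names and signatures are hypothetical and far simpler than Flair's actual `DefaultClassifier` and `forward_pass`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class ToyDefaultClassifier(ABC):
    """Shared logic, written once: turn scores into predicted labels."""

    def __init__(self, labels: List[str]):
        self.labels = labels

    @abstractmethod
    def forward_pass(self, text: str) -> List[Tuple[str, List[float]]]:
        """Task-specific step: return (data point, one score per label) pairs."""

    def predict(self, text: str) -> Dict[str, str]:
        predictions = {}
        for data_point, scores in self.forward_pass(text):
            best = max(range(len(scores)), key=scores.__getitem__)  # argmax
            predictions[data_point] = self.labels[best]
        return predictions


class ToyTextClassifier(ToyDefaultClassifier):
    # the data point is the whole text
    def forward_pass(self, text):
        positive = float("good" in text)
        return [(text, [positive, 1.0 - positive])]


class ToyWordClassifier(ToyDefaultClassifier):
    # the data points are the individual tokens
    def forward_pass(self, text):
        return [(tok, [float(tok.istitle()), float(not tok.istitle())]) for tok in text.split()]


print(ToyTextClassifier(["POSITIVE", "NEGATIVE"]).predict("a good movie"))
print(ToyWordClassifier(["NAME", "OTHER"]).predict("Kirk commands Enterprise"))
```

Both subclasses reuse `predict` unchanged; they differ only in what they classify, which mirrors the motivation given for the refactoring.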

Refactor for speed

- Flair models trained with transformers (such as the FLERT models) previously did not use mini-batching, which greatly slowed down both training and application of such models. We refactored the `TransformerWordEmbeddings` class, yielding significant speed-ups that depend on the mini-batch size used. We observed **speed-ups from 2x to 6x**. (2385 2389 2384)

- Improve training speed of Flair embeddings (2203)
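Mini-batching means embedding several sentences in one transformer call instead of making one call per sentence. A minimal sketch of the batching step, assuming sentences are already converted to lists of (illustrative) token ids and that padding with a fixed `pad_id` is acceptable; `make_batches` is hypothetical and not Flair's `TransformerWordEmbeddings` code. The attention mask marks real tokens with 1 and padding with 0 so that a model can ignore the padding.

```python
from typing import Iterator, List, Tuple


def make_batches(
    sentences: List[List[int]], mini_batch_size: int, pad_id: int = 0
) -> Iterator[Tuple[List[List[int]], List[List[int]]]]:
    """Yield (padded input ids, attention mask) for each mini-batch."""
    for i in range(0, len(sentences), mini_batch_size):
        batch = sentences[i : i + mini_batch_size]
        longest = max(len(s) for s in batch)
        # pad every sentence in the batch to the length of the longest one
        ids = [s + [pad_id] * (longest - len(s)) for s in batch]
        mask = [[1] * len(s) + [0] * (longest - len(s)) for s in batch]
        yield ids, mask


sentences = [[101, 7, 102], [101, 8, 9, 10, 102], [101, 102]]
for ids, mask in make_batches(sentences, mini_batch_size=2):
    print(ids, mask)  # two model calls instead of three
```

Larger mini-batches mean fewer calls but more padding per batch, which is one reason the observed speed-up depends on the mini-batch size.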


Bug fixes & improvements

- fixed references to multi-X-fast Flair embedding models (2150)
- fixed serialization of DocumentRNNEmbeddings (2155)
- fixed separator in cross-attention mode (2156)
- fixed ID for Slovene word embeddings in the doc (2166)
- close log_handler after training is complete. (2170)
- fixed bug in IMDB dataset (2172)
- fixed IMDB data splitting logic (2175)
- fixed XLNet and Transformer-XL execution (2191)
- remove unk token from NER labeling (2225)
- fixed typo in property name (2267)
- fixed typos (2303 2373)
- fixed parallel corpus (2306)
- fixed incorrect sentence position attributes in `SegtokSentenceSplitter` (2312)
- fixed loading of old serialized models (2322)
- updated url for BioSemantics corpus (2327)
- updated requirements (2346)
- serialize multi_label_threshold for classification models (2368)
- small refactorings in ModelTrainer (2184)
- moved `Path` construction of `flair.cache_root` (2241)
- documentation improvement (2304)
- add model fit tests (2378)


This correctly indicates that the span "Kirk" points to "James_T._Kirk". As the prediction for the string "Enterprise" shows, the model is still in beta and will be further improved in future releases.

Bug fixes
- make transformer training vocab optional (3132)
- change `token.get_tag()` to `token.get_label()` (3135)
- update required version of transformers library (3138)
- update HunFlair tutorial to Flair 0.12 (3137)

