Annif

Latest version: v1.2.0

0.54.1

This is a patch release that fixes bugs found after the 0.54.0 release. In particular, installation using pip did not work correctly because of a missing dependency on the dateutil package.

Bugs fixed:
- 523 Make Drone builds start on all git tag events
- 524 Add MLLM classifier sanity check
- 525 Much faster updating of existing large vocabulary
- 528 Declare dateutil dependency

0.54.0

This release adds a new `--jobs` parameter to the `annif train` command, which makes it easy to control the number of threads/CPUs used when training the MLLM, fasttext and Omikuji backends. Many other improvements speed up the MLLM backend, especially with a large vocabulary. A few minor bugs have also been fixed.
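As a rough illustration of what a jobs setting controls, the sketch below processes training documents with a pool of workers whose size comes from a `jobs` argument. The function names `preprocess_doc` and `preprocess_corpus`, and the convention that `jobs=0` means "one worker per CPU", are assumptions made for this example, not Annif's actual internals.

```python
from __future__ import annotations

import os
from concurrent.futures import ThreadPoolExecutor


def preprocess_doc(text: str) -> list[str]:
    # Hypothetical per-document step: lowercase and split on whitespace.
    return text.lower().split()


def preprocess_corpus(docs: list[str], jobs: int = 1) -> list[list[str]]:
    # Assumption for this sketch: jobs <= 0 means "one worker per CPU".
    workers = jobs if jobs > 0 else (os.cpu_count() or 1)
    if workers == 1:
        # Serial path: no pool overhead.
        return [preprocess_doc(d) for d in docs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so output lines up with docs.
        return list(pool.map(preprocess_doc, docs))
```

Threads keep the sketch simple; for CPU-bound pure-Python work, a process pool (`concurrent.futures.ProcessPoolExecutor`) is usually needed to use several cores.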

**Edit:** This release also introduces support for adding new text-input transformation operations to Annif. Previously, input limiting was implemented as a backend mechanism (446, 452) and configured in the project configuration with a setting such as `input_limit=5000`; it is now implemented as a more general input-text transform, set up in the project configuration with `transform=limit(5000)`.
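A minimal sketch of how a transform spec like `limit(5000)` could be parsed and applied, assuming that `limit(N)` keeps only the first N characters of the input text. The regular expression and the function names (`parse_limit`, `limit_text`, `apply_transform`) are hypothetical; Annif's own transform parser handles more operations and argument forms.

```python
import re

# Hypothetical grammar for this sketch: exactly "limit(<digits>)".
_LIMIT_SPEC = re.compile(r"^limit\((\d+)\)$")


def parse_limit(spec: str) -> int:
    """Parse a transform spec such as 'limit(5000)' into its integer argument."""
    match = _LIMIT_SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"not a limit transform: {spec!r}")
    return int(match.group(1))


def limit_text(text: str, limit: int) -> str:
    """Keep only the first `limit` characters of the input text."""
    return text[:limit]


def apply_transform(spec: str, text: str) -> str:
    """Parse the spec, then truncate the text accordingly."""
    return limit_text(text, parse_limit(spec))
```

Keeping parsing separate from the text operation is what lets a single `transform=` setting describe different operations without each backend implementing them.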

New features:
- 512 Support jobs parameter in train command
- **Edit:** 496 Support for adding input-transformation operations

Improvements:
- 500 Implement custom MeanLayer in nn_ensemble
- 511/483 Process training docs in parallel in MLLM backend
- 513/519 Keep serialized dump of SKOS graph to save parsing time
- 518 Use least frequent token as key in TokenSetIndex used by MLLM
- 520 Optimize limit_mask creation
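The idea behind 518 can be sketched as follows: a stored token set can only match a document that contains all of its tokens, so each set needs to be filed under just one of its tokens, and choosing the rarest one keeps each lookup bucket small. This is an illustrative sketch, not Annif's `TokenSetIndex`; measuring "frequency" as the number of stored sets containing a token, and requiring non-empty sets, are assumptions made here.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class TokenSetIndex:
    """Illustrative index: each token set is stored under its least frequent token."""

    def __init__(self, token_sets: list[frozenset[str]]) -> None:
        # Count how many stored sets each token appears in (sets must be non-empty).
        freq = Counter(tok for ts in token_sets for tok in ts)
        self._index: dict[str, list[frozenset[str]]] = defaultdict(list)
        for ts in token_sets:
            # Rarest token is the key; alphabetical tie-break keeps it deterministic.
            key = min(ts, key=lambda tok: (freq[tok], tok))
            self._index[key].append(ts)

    def search(self, tokens: set[str]) -> list[frozenset[str]]:
        """Return every stored set that is a subset of `tokens`."""
        found = []
        for tok in tokens:
            # Each stored set lives under exactly one key, so it is found at most once.
            for ts in self._index.get(tok, []):
                if ts <= tokens:
                    found.append(ts)
        return found
```

Keying by a common token such as "the" would put many sets in one bucket and force a subset check for each of them on almost every document; a rare key avoids that.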

Bug fixes:
- 510/502 Use set as container of uris instead of list in DocumentFile
- 515/453 Allow NN ensemble to be used for parallel eval
- 517 Skip unimportant subjects in _vector_to_list_suggestion
- 522/521 Allow private projects to be accessed from CLI

0.53.2

This patch release includes the following changes:
- 506 Fix NN ensemble training and learning on one-document corpus
- 509 Warn instead of error in case of multiple subjects per doc in SVC training
- 503 Fix read-the-docs documentation build error due to package conflict
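The SVC change in 509 can be pictured with the sketch below: a multiclass classifier needs exactly one label per training document, so a document with several subjects triggers a warning and only one subject is kept. The helper name `to_single_label`, and keeping the first subject in sorted order, are assumptions of this sketch; the release note does not say which subject Annif keeps.

```python
from __future__ import annotations

import warnings


def to_single_label(doc_id: str, subjects: list[str]) -> str | None:
    """Reduce a document's subjects to one class label for multiclass training."""
    if not subjects:
        return None  # nothing to train on for this document
    if len(subjects) > 1:
        # Warn instead of raising, so training can continue.
        warnings.warn(
            f"document {doc_id!r} has {len(subjects)} subjects; "
            "multiclass training uses only one of them"
        )
    # Assumption: pick the first subject in sorted order.
    return sorted(subjects)[0]
```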

0.53.1

This patch release [fixes](https://github.com/NatLibFi/Annif/pull/501) a bug that prevented training the SVC backend on a fulltext corpus.

0.53.0

This release adds two new backends, YAKE and SVC. The YAKE backend is a wrapper around the [YAKE library](https://github.com/LIAAD/yake), which performs unsupervised lexical keyword extraction, so no training data is needed. See the [YAKE](https://github.com/NatLibFi/Annif/wiki/Backend%3A-YAKE) wiki page for more information. In future Annif releases, YAKE support could be extended so that it can also suggest new terms for a vocabulary (keywords that are not found in the vocabulary).
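To make the idea of suggesting new terms concrete, here is a sketch that splits keywords from an extractor such as YAKE into those that match a vocabulary label and those that do not (candidate new terms). The function `split_keywords` is hypothetical, and the exact, case-insensitive comparison is an assumption; a real implementation would likely also normalize word forms.

```python
from __future__ import annotations


def split_keywords(
    keywords: list[str], vocab_labels: set[str]
) -> tuple[list[str], list[str]]:
    """Split extracted keywords into vocabulary matches and new-term candidates."""
    known = {label.casefold() for label in vocab_labels}
    matched: list[str] = []
    candidates: list[str] = []
    for kw in keywords:
        # Case-insensitive exact match against the vocabulary's labels.
        if kw.casefold() in known:
            matched.append(kw)
        else:
            candidates.append(kw)
    return matched, candidates
```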

The SVC backend implements Linear Support Vector Classification. It is well suited for multiclass (but not multilabel) classification, for example classifying documents with the Dewey Decimal Classification or the 20 Newsgroups classification. It requires relatively little training data and is suitable for classifications with up to around 10,000 classes. See the [SVC](https://github.com/NatLibFi/Annif/wiki/Backend%3A-SVC) wiki page for more information.

This release also upgrades many dependencies, which enables all Annif backends to run on Python 3.9 (previously the nn_ensemble backend was available only for Python 3.6-3.8). The Docker image now uses Python 3.8 instead of 3.7.

Note that nn_ensemble models are not compatible across Python versions: e.g. a model trained on Python 3.7 can be used only on Python 3.7. Training the nn_ensemble models shows a `CustomMaskWarning`, but it is harmless (caused by a [TensorFlow bug](https://github.com/tensorflow/tensorflow/issues/49754)) and can be ignored.
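One way to surface this incompatibility early, rather than failing obscurely at load time, is to store the training interpreter's version alongside a model and check it before loading. This is a hypothetical sketch: the helpers `model_metadata` and `check_compatible` are invented for illustration, and the release note does not say that Annif records such metadata.

```python
from __future__ import annotations

import sys


def model_metadata() -> dict:
    """Record the Python major.minor version the model was trained with."""
    return {"python_version": list(sys.version_info[:2])}


def check_compatible(metadata: dict) -> None:
    """Raise if the current interpreter differs from the training interpreter."""
    trained = tuple(metadata["python_version"])
    current = tuple(sys.version_info[:2])
    if trained != current:
        raise RuntimeError(
            f"model trained on Python {trained[0]}.{trained[1]} "
            f"cannot be loaded on Python {current[0]}.{current[1]}"
        )
```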

Due to the scikit-learn update, using TFIDF, MLLM or Omikuji models trained with older Annif versions will show warnings about the `TfidfVectorizer`. To the best of our knowledge, these warnings are harmless and can be ignored; retraining the models gets rid of them.

This release also includes many minor improvements and bug fixes.

New features:
- 486 New SVC (support vector classification) backend using scikit-learn
- 439/461 YAKE backend
- 490/494 Make `--version` option show Annif version

Improvements:
- 488 Add support for ngram setting in omikuji backend
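As an illustration of what an n-gram setting typically controls, the sketch below expands a token list into all contiguous word n-grams of length 1 to `n`, similar in spirit to scikit-learn's `ngram_range=(1, n)`. That the Omikuji backend's `ngram` setting maps to exactly this behavior is an assumption here, not something this release note states; `word_ngrams` is a hypothetical helper.

```python
from __future__ import annotations


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous word n-grams of length 1..n, shortest first."""
    grams: list[str] = []
    for size in range(1, n + 1):
        # Slide a window of `size` tokens across the input.
        for start in range(len(tokens) - size + 1):
            grams.append(" ".join(tokens[start:start + size]))
    return grams
```

Larger `n` captures multi-word phrases at the cost of a much larger feature space.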

Maintenance:
