Annif

Latest version: v1.2.0

0.52.0

This release includes a new MLLM backend, a Python implementation of the Maui-like Lexical Matching algorithm. It was inspired by the [Maui algorithm](https://hdl.handle.net/10289/3513) (by Alyona Medelyan) but is not a direct reimplementation. It is meant for long full-text documents and, like Maui, it needs to be trained with a relatively small number (hundreds or thousands) of manually indexed documents so that the algorithm can choose the mix of heuristics that achieves the best results on a particular document collection. See [the MLLM Wiki page](https://github.com/NatLibFi/Annif/wiki/Backend%3A-MLLM) for more information.

New features include the possibility to configure two project parameters:
- `token_min_length` [can be set in the analyzer parameters](https://github.com/NatLibFi/Annif/wiki/Analyzers); for example, setting the value to 2 allows the word "UK" to pass to a backend, whereas with the default value (3) the word is filtered out by the analyzer
- `lr` can be set in the [neural-network ensemble](https://github.com/NatLibFi/Annif/wiki/Backend%3A-nn_ensemble) project configuration to define the learning rate.
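
As a rough illustration of the minimum token length setting described above, the following sketch filters tokens by length. It is not Annif's analyzer code: the function name `filter_tokens` and its `min_length` argument are hypothetical, and the whitespace split is only a stand-in for real tokenization.

```python
def filter_tokens(text, min_length=3):
    """Keep only whitespace-separated tokens with at least min_length characters.

    Hypothetical helper for illustration; not part of Annif.
    """
    return [token for token in text.split() if len(token) >= min_length]


sample = "UK libraries index documents"
default_tokens = filter_tokens(sample)               # "UK" is dropped
short_tokens = filter_tokens(sample, min_length=2)   # "UK" is kept
```

With a minimum length of 3, the two-letter token is removed before it reaches a backend; lowering the value to 2 lets it through.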

The STWFSA backend has been updated to use a newer version of the [stwfsapy library](https://github.com/zbw/stwfsapy). Old STWFSA models are not compatible with the new version, so any STWFSA projects must be retrained. The release also includes several minor improvements and bug fixes.


New features:
462 New lexical backend MLLM
456/468 Allow configuration of token min length (credit: [mo-fu](https://github.com/mo-fu))
475 Allow configuration of nn ensemble learning rate (credit: [mo-fu](https://github.com/mo-fu))

Improvements:
478/479 Update stwfsa to 0.2.* (credit: [mo-fu](https://github.com/mo-fu))
472 Cleanup suggestion tests
480 Optimize check for deprecated subject IDs using a set

Maintenance:
474 Use GitHub Actions as CI service

Bug fixes:
470/471 Make sure suggestion scores are in the range 0.0-1.0
477 Optimize the optimize command
481 Backwards compatibility fix for the token_min_length setting
482 MLLM fix: don't include use_hidden_labels in hyperopt, it won't have any effect

0.51.0

This release includes a new [STWFSA backend](https://github.com/NatLibFi/Annif/wiki/Backend%3A-STWFSA) which is a wrapper around [STWFSAPY](https://github.com/zbw/stwfsapy), a lexical algorithm based on finite state automata. It achieves best results with short texts, i.e., titles and author keywords, and is best suited for English language data.

The NN ensemble backend has been improved with better handling of source weights. Retraining NN ensemble models after updating Annif to this version is recommended, since the quality of results can decrease if old models are used. Several CLI commands have gained a new `--docs-limit/-d` option that limits the number of documents to process, for example to create learning-curve data. Several bugs have also been fixed.
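
The idea behind a document limit can be sketched as follows. This is a minimal illustration, not Annif's implementation: the helper `limit_documents` and the convention that `None` means "no limit" are assumptions made for the example.

```python
from itertools import islice


def limit_documents(documents, docs_limit=None):
    """Return an iterator over at most docs_limit documents.

    Hypothetical helper; None is assumed here to mean "no limit".
    """
    if docs_limit is None:
        return iter(documents)
    return islice(documents, docs_limit)


# A lazily generated corpus; only the first three items are ever consumed.
corpus = (f"document {i}" for i in range(1000))
subset = list(limit_documents(corpus, docs_limit=3))
```

Running the same training or evaluation step with increasing limits (for example 100, 1000, 10000 documents) is one way to collect the points of a learning curve.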

New features:
438 Lexical STWFSAPY Backend (credit mo-fu)
465 Limit document number CLI option

Improvements:
457/458 Improved handling of source weights in NN ensemble

Bug fixes:
454/455 Address SonarCloud complaints
459/460 Pass limit parameter to Maui Server during train
463 Fix TruncatingCorpus iterator

0.50.0

This release introduces a setting to use only part of the input text for subject indexing: the new `input_limit` project parameter truncates the input text to the given number of characters. This can improve the quality of the suggestions, as the beginning of a long document typically includes an abstract and introduction. The default value for `input_limit` is zero, which means that no truncation is performed.
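
The truncation rule can be sketched in a few lines. This is an illustrative sketch rather than Annif's code: the function name `apply_input_limit` is hypothetical, and treating any non-positive value the same as zero is an assumption.

```python
def apply_input_limit(text, input_limit=0):
    """Truncate text to input_limit characters; zero disables truncation.

    Hypothetical helper; non-positive limits are assumed to mean "no truncation".
    """
    if input_limit > 0:
        return text[:input_limit]
    return text


document = "Abstract. Introduction. Then a very long body of text follows."
truncated = apply_input_limit(document, input_limit=9)
unchanged = apply_input_limit(document)
```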

Improvements include better handling of cached data in nn_ensemble training and optimization of memory usage in evaluation by using sparse matrices for suggested subjects. Many dependencies have been updated and a few minor issues fixed.

New features:
446 Add a backend parameter to limit input characters in suggest
452 Apply the input_limit backend parameter to texts in train & learn

Improvements:
441 Sparse subjects (credit mo-fu)
443/444 Allow use of cached data after cancelled training of nn_ensemble backend

Maintenance:
448 Upgrade dependencies
445 Upgrade LMDB dependency from 0.98 to 1.0.0
449 Resolve DeprecationWarning: change warn to warning

Bug fixes:
447 Fix missing default params in pav and nn ensemble

0.49.0

This release introduces the hyperopt CLI command for hyperparameter optimization. Initially it can only be used for finding optimal ensemble weights. The Web UI now follows the same visual style as the annif.org website. There are also some improvements to CLI commands, memory optimizations and bug fixes.

New features:
* 240/321/414 Hyperparameter optimization of ensemble weights

Improvements:
* 424/426 New style for Web UI
* 430 Define short form for CLI options and fix some of their docstrings
* 428 Memory optimization: Avoid double allocation of NumPy arrays in eval operation

Maintenance:
* 437 Upgrade TensorFlow to version 2.3.0 (from 2.2.0)

Bug fixes:
* 431 Problem parsing timestamps from Maui Server
* 432 Make modification timestamps timezone-aware

0.48.0

This release brings a major upgrade of the fastText library, switching from the old fasttextmirror package to the new official fasttext Python bindings. The generation of fastText training files has been rewritten. The release also introduces an experimental feature to speed up model evaluation using multiprocessing; a `--jobs N` option can be used with [the `eval` command](https://github.com/NatLibFi/Annif/wiki/Commands#evaluate-on-a-collection-of-manually-indexed-files) to perform evaluation in N parallel jobs. Another new feature is that project information listings now show project state details (whether a project has been trained, and the timestamp of training). Minor improvements and bug fixes are also included.

New features:
- 65/417/418/425 Evaluate documents in parallel
- 329/415 Show project train state and modification time

Improvements:
- 290/292/409/412 Upgrade fastText to official version 0.9.2 (credit: mvsjober)
- 413 Upgrade to omikuji 0.3.x

Maintenance:
- 411 Run Travis CI fastText tests on Python 3.7 instead of 3.6
- 421 Pin SciPy to 1.4.1 as required by TensorFlow 2.2.0

Bug fixes:
- 422 Assign first retrieved project to selected variable (credit: mo-fu)
- 419 WEB-UI: Remove empty entry from list of projects (credit: mo-fu)
- 357/410 fastText training file incorrectly generated

0.47.1

This patch release installs [TensorFlow 2.2 without GPU support](https://pypi.org/project/tensorflow-cpu/) (GPU support has been included by default since [TF 2.1](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0)), because Annif does not currently benefit from GPU support and it takes up a lot of disk space. This patch reduces the size of Annif's Docker image from 2.4 GB to 1.4 GB.
