Annif

Latest version: v1.2.0

1.0.0

We are excited to introduce Annif version 1.0!

Advancing the version number to the 1.x series means that Annif is considered ready for more general, production use. Upcoming releases in the series (patch releases 1.0.x and minor feature releases 1.x.x) will be backward compatible, following the semantic versioning principle. See the [Wiki page describing aspects of compatibility](https://github.com/NatLibFi/Annif/wiki/Backward-compatibility-between-Annif-releases).
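
As a rough illustration of the semantic versioning rule described above, the following sketch checks whether moving from one release to another should be backward compatible. The function names are hypothetical and not part of Annif; pre-release and build suffixes are deliberately not handled.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR[.PATCH]' (optionally prefixed with 'v') into integers."""
    parts = version.lstrip("v").split(".")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected version string: {version!r}")
    parts += ["0"] * (3 - len(parts))  # treat a missing patch number as 0
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def is_backward_compatible_upgrade(current: str, candidate: str) -> bool:
    """True if `candidate` is the same or a newer release within the same 1.x+ major series."""
    cur, new = parse_version(current), parse_version(candidate)
    # 0.x releases made no compatibility promise; from 1.0 on, only the major number may break it.
    return cur[0] >= 1 and new[0] == cur[0] and new >= cur
```

For example, `is_backward_compatible_upgrade("1.0.0", "1.2.0")` is true, while an upgrade from `0.61.0` to `1.0.0` is not covered by the promise.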

The changes in this release include enhancements to the command-line interface as well as many bug fixes and maintenance updates. The CLI commands, options and most parameters can now be tab-completed once completion support is enabled: see instructions in [README.md](https://github.com/NatLibFi/Annif/tree/915d5db805163d9b8dffe01bee48bd835bb87f79#shell-compeletions). The CLI startup time has also been optimized, and the output of many commands has been refined.

Python 3.11 is now mostly supported; the Omikuji backend cannot yet be used on Python 3.11 because the Omikuji library does not support it at the moment.

From now on, the Docker image of the latest release in the [quay.io repository](https://quay.io/repository/natlibfi/annif?tab=tags) will be rebuilt from time to time in order to apply security updates to the image. The rebuilds will not change Annif itself. Version tags (`<major>.<minor>[.<patch>]`) can be used to reference the latest build of the version. To allow stricter pinning to a particular build, the images will also be tagged with the build date as a suffix: `<major>.<minor>.<patch>-<YYYYMMDD>`.
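
The two tag forms behave differently: a plain version tag follows each rebuild, while a date-suffixed tag stays on one build. A minimal sketch that classifies a tag according to the patterns described above; `describe_tag` is a hypothetical helper, the tag `1.0.0-20230615` is made up for illustration, and other tags such as `latest` are intentionally rejected.

```python
import re
from datetime import datetime

# Tag patterns as described in the release notes.
_VERSION_TAG = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?$")
_BUILD_TAG = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d{8})$")


def describe_tag(tag: str) -> dict:
    """Classify an image tag as a floating version tag or a date-pinned build tag."""
    m = _BUILD_TAG.match(tag)
    if m:
        major, minor, patch, stamp = m.groups()
        return {
            "version": f"{major}.{minor}.{patch}",
            "build_date": datetime.strptime(stamp, "%Y%m%d").date(),
            "pinned": True,  # always refers to one specific build
        }
    m = _VERSION_TAG.match(tag)
    if m:
        major, minor, patch = m.groups()
        version = f"{major}.{minor}" + (f".{patch}" if patch is not None else "")
        # Moves to the newest build of this version whenever the image is rebuilt.
        return {"version": version, "build_date": None, "pinned": False}
    raise ValueError(f"unrecognized tag: {tag!r}")
```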

Supported Python versions:
* 3.8, 3.9 and 3.10 are fully supported
* 3.11 is supported except Omikuji backend

Backward compatibility:
* MLLM, STWFSA and NN ensemble projects trained with Annif v0.61 or older need to be retrained; for other projects, the warnings emitted by scikit-learn are harmless
* Using STWFSA backend now requires installing an optional dependency

New features:
684/693 Support for CLI command completions
703/727 Python 3.11 support

Improvements:
696 Optimize CLI startup time
686/694 Improve outputs of project inspection CLI commands
704 Show scores in outputs of suggest, eval and index with only 4 decimals

Maintenance:
690/708 Use Python type hints
699/700 Make stwfsapy an optional dependency (credit: cbartz)
315/712/714 Add CI/CD job for testing Docker image
707/711 Ensure system packages are up-to-date in Docker image
715 Add CI/CD workflow for rebuilding Docker image
706/725 Test CLI startup time with CI/CD job
723 Update ReadTheDocs documentation
726/697/532 Update and pin dependencies for v1.0
730 Switch to Keras v3 save format for nn_ensemble
731 Upgrade Docker baseimage to Debian Bookworm

Bug fixes:
705 Fix crashing index command when targeted directory contains subject files
717 Fix Python version in GitHub Actions CI/CD pipeline
718 Fix missing limit parameter in STWFSA backend
722 Fix train state and modification time for unfinished project training
720/721 Demote TensorFlow info messages to debug level
695 Fix displaying of modification time for null value in Web UI project information
701 Remove duplicated fasttext entry in optional dependencies list in Dockerfile
728 Avoid PytestUnknownMarkWarning due to "slow" marker
729 Avoid scikit-learn UserWarning for vectorizer parameter token_pattern

Other:
616 Discussion on semantic versioning for Annif releases beyond 1.0

0.61

691 Upgrade Docker image to Python 3.10

Bug fixes:
674/677 Memory leak in NN ensemble backend

0.61.0

The main improvements in this release are internal changes that allow batch processing of documents for better suggestion performance, and a streamlined representation of suggestion results using sparse arrays. Currently, batch processing is implemented in the Omikuji and SVC backends and in all ensemble backends. A new REST API method for suggesting subjects for multiple documents has also been added.
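
Sparse arrays suit suggestion results because each document usually receives non-zero scores for only a handful of subjects out of a large vocabulary. The following is a minimal pure-Python sketch of a compressed sparse row (CSR) layout for a batch of results; it is not Annif's internal class, the names `SparseSuggestionBatch`, `from_rows` and `top` are hypothetical, and a real implementation would more likely rely on a library such as SciPy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SparseSuggestionBatch:
    """Scores for a batch of documents in CSR form: only non-zero entries are stored."""
    indptr: List[int]   # row i spans indices/data[indptr[i]:indptr[i + 1]]
    indices: List[int]  # subject IDs (column numbers)
    data: List[float]   # scores aligned with `indices`
    n_subjects: int     # vocabulary size (number of columns)

    @classmethod
    def from_rows(cls, rows: List[Dict[int, float]], n_subjects: int) -> "SparseSuggestionBatch":
        indptr, indices, data = [0], [], []
        for row in rows:
            for subject_id, score in sorted(row.items()):
                if score > 0.0:  # zeros are implicit and not stored
                    indices.append(subject_id)
                    data.append(score)
            indptr.append(len(indices))
        return cls(indptr, indices, data, n_subjects)

    def top(self, doc: int, limit: int, threshold: float = 0.0) -> List[Tuple[int, float]]:
        """Return up to `limit` (subject_id, score) pairs with score >= threshold, best first."""
        start, end = self.indptr[doc], self.indptr[doc + 1]
        pairs = [(self.indices[k], self.data[k]) for k in range(start, end)
                 if self.data[k] >= threshold]
        return sorted(pairs, key=lambda p: p[1], reverse=True)[:limit]
```

Storage grows with the number of non-zero scores rather than with documents multiplied by vocabulary size.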

The new REST API method `/v1/projects/{project_id}/suggest-batch` accepts at most 32 documents in one POST request; the documents in the batch are processed in parallel when the backend supports it. The request body is given in JSON format and, as with the regular single-document suggest method, the limit, threshold and language parameters are optional and can be given as URL query parameters. For details, see the [interactive OpenAPI documentation](https://api.annif.org/v1/ui/#/Automatic%20subject%20indexing/annif.rest.suggest_batch) of the REST API of annif.org.
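
A client with more than 32 documents has to split them across several requests. The sketch below builds (but does not send) such requests with the standard library, putting the optional parameters in the query string. The JSON body shape `{"documents": [{"text": ..., "document_id": ...}]}` is an assumption to be checked against the OpenAPI documentation linked above; the helper names, base URL and project ID are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, List, Optional

MAX_BATCH = 32  # maximum documents per suggest-batch request, per the release notes


def chunked(items: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_requests(base_url: str, project_id: str, texts: List[str],
                         limit: Optional[int] = None, threshold: Optional[float] = None,
                         language: Optional[str] = None) -> List[urllib.request.Request]:
    # Optional parameters go into the URL query string, not the JSON body.
    params = {k: v for k, v in
              {"limit": limit, "threshold": threshold, "language": language}.items()
              if v is not None}
    url = f"{base_url}/v1/projects/{urllib.parse.quote(project_id)}/suggest-batch"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    batch_requests = []
    for batch_no, chunk in enumerate(chunked(texts)):
        # ASSUMED body shape; verify against the API's OpenAPI schema before use.
        body = {"documents": [{"text": t, "document_id": str(batch_no * MAX_BATCH + i)}
                              for i, t in enumerate(chunk)]}
        batch_requests.append(urllib.request.Request(
            url, data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"}, method="POST"))
    return batch_requests
```

Each request can then be sent with `urllib.request.urlopen`. Numbering documents globally keeps results traceable across batches.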

The [`annif suggest`](https://annif.readthedocs.io/en/v0.61.0/source/commands.html#annif-suggest) CLI command is augmented to accept path(s) to file(s) to be processed, in addition to stdin, to enable it to operate on multiple documents. The [`annif optimize`](https://annif.readthedocs.io/en/v0.61.0/source/commands.html#annif-optimize) command is now much faster than before and supports using a `--jobs` parameter for parallel processing.

The Annif Docker image has been updated to use Python 3.10.

Various maintenance tasks have also been performed: for example, the default branch of the git repository has been renamed from `master` to `main`, the [Schemathesis](https://github.com/schemathesis/schemathesis) tool has been introduced for testing the REST API, and many dependencies have been updated. A bug causing a memory leak in the neural network ensemble backend has been fixed.

The next release of Annif will be version 1.0. For this purpose we have opened the [issue 616](https://github.com/NatLibFi/Annif/issues/616) for discussing the expectations of backward compatibility and Semantic Versioning in releases beyond 1.0.

Backward compatibility:
* Models trained with Annif v0.60 should keep working; the warnings emitted by scikit-learn are harmless
* LRAP metric has been removed from evaluation results

New features:
664 Add REST API method `/v1/projects/{project_id}/suggest-batch`
663 Support for batch suggest operations for CLI commands
423/681 Parallelize optimize command

Improvements:
678/681 Represent suggestion results as sparse arrays
665/669 Batch suggest in Omikuji backend
667/670 Batch suggest in SVC backend
677 Batch suggest in ensemble backends
671 Add log message indicating finishing projects initialization
673 Suppress duplicate log messages from subject module

Maintenance:
668 Migrate codestyle to Black v23
679/680 Switch default git branch to main
672 Fix slow CI/CD runs for Python 3.10
675 Refactor and cleanup CLI module
682/685 Schemathesis tests for REST API and OpenAPI schema fixes

0.60

647/661 Order of projects when using project configuration directory

Maintenance:
609/640 Use black code style
641 Use isort to order import statements
656 Install linting tools with Poetry in CI/CD pipeline
624 Increase timeout of test and publish GH Actions jobs
653 Add CodeQL workflow for GitHub code scanning
599/650 Avoid using pytest-flake8 plugin
657/662 Upgrade GitHub Actions
636 Better set up for docker-compose

0.60.0

This release includes improvements and maintenance updates in particular to the Web UI and REST API as well as some new functionality, especially related to multilingual support. The Web UI no longer relies on jQuery, as the last parts that were used were replaced with Axios. The REST API and Web UI updates are by UnniKohonen, who has joined NatLibFi as a trainee in the Annif & Finto development teams.

It is now possible to override the language for subject suggestion labels instead of always using the project language: when using the `annif suggest` command by giving the new `--language/-L` option, and when using the REST API suggest method by the new optional `language` parameter.
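
On the REST side, the override is just one more optional field in the suggest request (the CLI counterpart is the `--language/-L` option of `annif suggest`). This sketch builds such a request with the standard library; it assumes the single-document suggest method takes form-encoded `text`, `limit` and `language` fields, and the helper name, base URL and project ID are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_suggest_request(base_url: str, project_id: str, text: str,
                          limit: int = 10, language: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a suggest request; `language` overrides the label language."""
    fields: dict = {"text": text, "limit": limit}
    if language is not None:
        # Omit the field entirely to fall back to the project's own language.
        fields["language"] = language
    url = f"{base_url}/v1/projects/{urllib.parse.quote(project_id)}/suggest"
    return urllib.request.Request(
        url,
        data=urllib.parse.urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```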

A new resource is added to the root of the REST API (i.e. `http://<annif_host>/v1/`) that gives basic information on the API (a title for the API and the version of Annif being used). Also, the REST API spec has been updated to OpenAPI 3.0. In the Web UI it is now possible to see detailed information about a project (language, backend type, modification timestamp etc.). 

Multiprocessing support for macOS and Windows environments has been improved by supporting the 'spawn' multiprocessing mode.
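
The 'spawn' start method launches fresh interpreter processes instead of forking, which constrains how parallel code must be written. Below is a generic standard-library sketch, not Annif code; `score_document` and `parallel_scores` are hypothetical stand-ins.

```python
import multiprocessing
from typing import List


def score_document(text: str) -> int:
    """Stand-in for per-document work; must be a module-level function so 'spawn' can pickle it."""
    return len(text.split())


def parallel_scores(texts: List[str], jobs: int = 2) -> List[int]:
    # 'spawn' (the default on Windows and macOS) does not inherit the parent's memory:
    # each worker re-imports this module and receives its arguments by pickling.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=jobs) as pool:
        return pool.map(score_document, texts)
```

When run as a script, `parallel_scores` should be called from under an `if __name__ == "__main__":` guard, because spawned workers import the main module and must not start pools of their own.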

The language detection is now performed with [Simplemma](https://github.com/adbar/simplemma) instead of pycld3. This functionality is now installed by default instead of being an optional extra.

New code style tools, Black and isort, are now used to help maintain good code quality; see [CONTRIBUTING.md](https://github.com/NatLibFi/Annif/blob/master/CONTRIBUTING.md) for how to use them and for instructions on how best to participate in Annif development.

Many dependencies have been updated to their most recent versions.

Note also that we are preparing for Annif 1.0 release. For this purpose we have opened the [issue 616](https://github.com/NatLibFi/Annif/issues/616) for discussing the expectations of backward compatibility and Semantic Versioning in releases beyond 1.0.


Backward compatibility:

* Models trained with Annif v0.59 should keep working; the warnings emitted by scikit-learn are harmless
* The `annif loadvoc` command has been removed; it was deprecated in the previous release and replaced by the `annif load-vocab` command.

New features:
628/630 Allow overriding subject label language in CLI and REST suggest operations
637/638 Add support for spawn multiprocessing mode
654 Add project info to web UI
655/658 Add REST API root resource

Improvements:
593/626 Use Simplemma instead of pycld3 for language detection
643 Add CONTRIBUTING.md file
645 Use tailored user-agent in requests by HTTP-backend
644/649 Upgrade REST API spec to OpenAPI 3
627 Upgrade joblib to 1.2.x

0.59.0

This release makes many changes to how Annif handles vocabularies.

First, the vocabularies are now multilingual: projects with different languages can share the same vocabulary by using a common vocabulary id in the project configurations. The vocabulary id should no longer include a language specifier, which has been the practice until now. The language of the labels of subject suggestions is now defined by the project's language setting, or it can be overridden in a project by giving the language code in parentheses after the vocabulary id (e.g. `vocab=lcsh(en)` in a Finnish language project). These changes break the backward compatibility of existing projects and vocabularies.
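
The resolution rule for the label language can be sketched as follows: a `vocab` setting is a vocabulary id, optionally followed by a language code in parentheses, and without that suffix the project language applies. The parser, the accepted language-code form (two or three lowercase letters) and the example project sections are assumptions for illustration, not Annif's own code.

```python
import configparser
import re
from typing import Dict, Tuple

# Vocabulary id (letters, digits, underscore, hyphen), optionally followed by "(lang)".
_VOCAB_SPEC = re.compile(r"^(?P<vocab_id>[\w-]+)(?:\((?P<lang>[a-z]{2,3})\))?$")


def resolve_vocab(spec: str, project_language: str) -> Tuple[str, str]:
    """Return (vocab_id, label_language) for a project's `vocab` setting."""
    m = _VOCAB_SPEC.match(spec.strip())
    if m is None:
        raise ValueError(f"invalid vocab setting: {spec!r}")
    # Without an explicit "(lang)" suffix, labels are shown in the project language.
    return m["vocab_id"], m["lang"] or project_language


# Hypothetical projects: two share the "yso" vocabulary, one uses English LCSH labels.
EXAMPLE_CONFIG = """
[yso-tfidf-fi]
language=fi
backend=tfidf
vocab=yso

[yso-tfidf-sv]
language=sv
backend=tfidf
vocab=yso

[lcsh-tfidf-fi]
language=fi
backend=tfidf
vocab=lcsh(en)
"""


def vocab_usage(config_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each project section to its (vocab_id, label_language)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {name: resolve_vocab(parser[name]["vocab"], parser[name]["language"])
            for name in parser.sections()}
```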

The CLI command for loading a vocabulary has changed: the command is now `annif load-vocab` to align with the other annif commands and its first argument is a vocabulary id instead of a project id. When loading a vocabulary from a TSV file the `--language` option needs to be given to set the language. A command `annif list-vocabs` is introduced for listing vocabularies. The old `annif loadvoc` command still works in this release, but it has been deprecated and will be removed in the next Annif release.
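
The `--language` option is needed for TSV input because a TSV line carries no language tag. This sketch parses TSV lines, assumed here to have the form `<uri>`, tab, label (with any further columns ignored), attaches the caller-supplied language to every label, and merges per-language files into one multilingual mapping. The function names are hypothetical; this is not Annif's loader.

```python
from typing import Dict

Vocab = Dict[str, Dict[str, str]]  # {uri: {language: label}}


def load_tsv_vocab(tsv_text: str, language: str) -> Vocab:
    """Parse '<uri>\\tlabel' lines, tagging every label with `language`."""
    subjects: Vocab = {}
    for line_no, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected '<uri>\\t<label>'")
        uri = fields[0].strip()
        if uri.startswith("<") and uri.endswith(">"):
            uri = uri[1:-1]  # drop the angle brackets around the URI
        # TSV has no language tags, so the language must come from the caller.
        subjects[uri] = {language: fields[1].strip()}
    return subjects


def merge_languages(*vocabs: Vocab) -> Vocab:
    """Combine per-language vocabularies into one multilingual mapping."""
    merged: Vocab = {}
    for vocab in vocabs:
        for uri, labels in vocab.items():
            merged.setdefault(uri, {}).update(labels)
    return merged
```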

The CLI commands are now documented on a [page on ReadTheDocs](https://annif.readthedocs.io/en/stable/source/commands.html) instead of in the Annif wiki. Development installations of Annif now use [Poetry](https://python-poetry.org/) for managing Python virtual environments and dependencies. There are also a few other minor changes, including an upgrade to the Simplemma v0.8 series, which introduced support for new languages.

Note also that we are starting to prepare for Annif 1.0 release. For this purpose we have opened the [issue 616](https://github.com/NatLibFi/Annif/issues/616) for discussing the expectations of backward compatibility and Semantic Versioning in releases beyond 1.0.

Backward compatibility
The changes in the vocabulary functionality require **reloading of previously loaded vocabularies** and **retraining of existing models**.

New features
559/600 Make vocabularies multilingual
602/614 Implement `load-vocab` and `list-vocabs` commands
603/610 Store vocabs in AnnifRegistry so they are shared between projects
597 Include labels without language tag and concepts without labels in vocabulary

Improvements
617/618 Upgrade to simplemma 0.8 and disable unnecessary cache
595/611 Autogenerated CLI commands documentation on ReadTheDocs
612 Add Annif logo to ReadTheDocs sidebar
608 Multilingual SubjectIndex backed by CSV file
604 Refactor SubjectSuggestion to store subject_id - not uri, label, notation
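
The idea behind storing subject IDs instead of URIs and labels can be illustrated with a small sketch: each subject gets an integer ID in a multilingual index, suggestions carry only that ID and a score, and the URI and label are looked up at output time in the requested language. `MiniSubjectIndex` and `Suggestion` are hypothetical names, only loosely modeled on the changes listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple, Optional


class Suggestion(NamedTuple):
    subject_id: int  # integer index into the subject index, not a URI or label
    score: float


@dataclass
class MiniSubjectIndex:
    uris: List[str] = field(default_factory=list)
    labels: List[Dict[str, str]] = field(default_factory=list)  # per subject: {lang: label}
    _by_uri: Dict[str, int] = field(default_factory=dict, repr=False)

    def add(self, uri: str, labels: Dict[str, str]) -> int:
        """Register a subject (or add labels to an existing one) and return its ID."""
        existing = self._by_uri.get(uri)
        if existing is not None:
            self.labels[existing].update(labels)
            return existing
        subject_id = len(self.uris)
        self.uris.append(uri)
        self.labels.append(dict(labels))
        self._by_uri[uri] = subject_id
        return subject_id

    def by_uri(self, uri: str) -> Optional[int]:
        return self._by_uri.get(uri)

    def render(self, suggestions: List[Suggestion], language: str) -> List[dict]:
        """Resolve IDs to URI and label only at output time; missing labels become None."""
        return [{"uri": self.uris[s.subject_id],
                 "label": self.labels[s.subject_id].get(language),
                 "score": s.score}
                for s in suggestions]
```

Keeping suggestions as plain integers makes them cheap to store and compare, and the same results can be rendered in any language the vocabulary provides.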

Maintenance
607 Remove language suffixes from vocabulary ids in example config
606 Refactor SubjectSet and Document to store subject IDs instead of URIs and labels
601/605 Switch to Poetry for dependency management
621 Remove curl from Docker image
622 Remove Poetry cache from Docker image

Fixes
613 Restore ability to use vocab language different from project language
619 Allow use of hyphens in vocabulary IDs
620 Make NN ensemble suggest operations silent

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.