Kazu

Latest version: v2.3.0

Page 1 of 4

2.3.0

Features

- Release new multilabel biomedBERT model trained on NER data synthetically generated by an LLM (Gemini). The model was trained on over 7000 LLM-annotated documents with a total of 295822 samples.
The model was trained for 21 epochs and achieved an F1 score of 95.6% on a held-out test set. (multilabel_bert)
- Added multilabel NER training example and config.
- Added docs and an example for scaling Kazu with Ray.

Bugfixes

- Fix issue with TransformersModelForTokenClassificationNerStep when processing large numbers of documents. The fix offloads tensors onto the CPU before performing the torch.cat operation, which previously led to a zero tensor. (pytorch_memory_issue)

2.2.1

Features

- Update ontologies to later versions (ontology_updates)

Bugfixes

- Fix synonym generator to only check if strings exist in original synonyms. Update tests (combinatorial_synonym_generator)
- Remove save/reset button that did not belong on page 1 (krt)

2.2.0

Features

- New LLMNERStep, for performing NER with LLMs

Bugfixes

- Fix bug with Chromosome X being converted to Chromosome 10, raised in issue 42 (chromosomeX)
- Fix pip install command in docs, raised in issue 56 (docs_pip_command)
- Added new multiword AutoCurationAction, and adjusted some curations as per issue 58.
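
The Chromosome X fix above is the kind of bug that presumably arises when Roman-numeral normalisation is applied to a token that is also a valid name in its own right. The sketch below is purely illustrative and is not Kazu's actual code: `ROMAN_TO_ARABIC`, `CHROMOSOME_NAMES`, `naive_normalise` and `guarded_normalise` are all hypothetical names, assumed here only to show one way such a guard could look.

```python
# Hypothetical lookup table; only a few numerals are needed for the illustration.
ROMAN_TO_ARABIC = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5", "X": "10"}

# Assumption: chromosome names that must never be read as Roman numerals.
CHROMOSOME_NAMES = {"X", "Y"}


def naive_normalise(text: str) -> str:
    """Replace every whitespace-separated Roman numeral token with its Arabic form."""
    return " ".join(ROMAN_TO_ARABIC.get(tok, tok) for tok in text.split())


def guarded_normalise(text: str) -> str:
    """Same as naive_normalise, but leave X/Y alone when they follow 'chromosome'."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1].lower() if i > 0 else ""
        if prev == "chromosome" and tok in CHROMOSOME_NAMES:
            out.append(tok)  # keep the chromosome name untouched
        else:
            out.append(ROMAN_TO_ARABIC.get(tok, tok))
    return " ".join(out)


if __name__ == "__main__":
    print(naive_normalise("Chromosome X"))    # reproduces the reported symptom
    print(guarded_normalise("Chromosome X"))  # chromosome name preserved
```

The guard is deliberately context-based rather than a blanket exclusion of "X", so that a phrase such as "Factor X" would still be normalised under these assumptions.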

2.1.1

No significant changes.

2.1.0

Features

- Added new Kazu Resource Tool UI to ease the process of updating resources and resource configuration.
- New OntologyDownloader abstraction to assist with resource updating.
- Updated resources for June 2024.

2.0.0

Features

- (De)serialization has been greatly improved, simplified, made correct, and given a slightly more compact serialized representation.
This does mean there are some small changes in (de)serialization behaviour since the previous release.
- Curation process has been significantly improved and simplified for the end user, including introducing the `AutoCurator` concept to aid in this. This will enable us to build out better documentation and an interactive tool in future releases, both of which are currently in draft. Overall, this will greatly simplify upgrading ontology versions, adding curations for a new ontology, etc.
- Datamodel has been substantially revised in a **backwards incompatible** manner to clear up confusing concepts, fix longstanding issues etc.
- New zero-shot NER model with GLiNER

Deprecations and Removals

- Remove deprecated `GildaUtils.replace_dashes`. This was superseded by `GildaUtils.split_on_dashes_or_space` and was already deprecated pending removal.
- Remove deprecated `SpacyToKazuObjectMapper`, as this was renamed to `KazuToSpacyObjectMapper`, and the old name was already deprecated pending removal.
- Remove deprecated `create_phrasematchers_using_curations` method of `OntologyMatcher`. This was renamed to `create_phrasematchers` and was already deprecated pending removal.
- Rename `Document.json` to `to_json`, and remove optional arguments.
The previous name was inconsistent with naming on other classes, as the function signature was parallel to their `to_json` methods.
The argument `drop_unmapped_ents` had functionality that was duplicated with `DropUnmappedEntityFilter` within the `CleanupStep`,
and it made sense to add the `drop_terms` behaviour to a new `LinkingCandidateRemovalCleanupAction` to collect this behaviour together
and significantly simplify the Document serialization code.
- Rename `ParserActions.from_json` and `GlobalParserActions.from_json` to `from_dict`.
The previous names were misleading, as the function signatures were parallel to the `from_dict` methods on other classes, not to their `from_json` methods.
- Renamed `SynonymDatabase.add` to `SynonymDatabase.add_parser`, for consistency with `MetadataDatabase.add_parser`.
