Kazu

Latest version: v2.0.0

Page 1 of 3

2.0.0

Features

- (De)serialization has been greatly improved, simplified, made correct, and given a slightly more compact serialized representation.
This does mean there are some small changes in (de)serialization behaviour since the previous release.
- Curation process has been significantly improved and simplified for the end user, including introducing the `AutoCurator` concept to aid in this. This will enable us to build out better documentation and an interactive tool in future releases, which are currently in draft. Overall, this will greatly simplify upgrading ontology versions, adding curations for a new ontology, etc.
- Datamodel has been substantially revised in a **backwards incompatible** manner to clear up confusing concepts, fix longstanding issues etc.
- New zero-shot NER model with GLiNER.

Deprecations and Removals

- Remove deprecated `GildaUtils.replace_dashes`. This was superseded by `GildaUtils.split_on_dashes_or_space` and was already deprecated pending removal.
- Remove deprecated `SpacyToKazuObjectMapper`, as this was renamed to `KazuToSpacyObjectMapper`, and the old name was already deprecated pending removal.
- Remove deprecated `create_phrasematchers_using_curations` method of `OntologyMatcher`. This was renamed to `create_phrasematchers` and was already deprecated pending removal.
- Rename `Document.json` to `to_json`, and remove optional arguments.
The previous name was inconsistent with naming on other classes, as its function signature was parallel to the `to_json` methods on those classes.
The argument `drop_unmapped_ents` had functionality that was duplicated with `DropUnmappedEntityFilter` within the `CleanupStep`,
and it made sense to add the `drop_terms` behaviour to a new `LinkingCandidateRemovalCleanupAction` to collect this behaviour together
and significantly simplify the Document serialization code.
- Rename `ParserActions.from_json` and `GlobalParserActions.from_json` to `from_dict`.
The previous names were misleading, as the function signatures were parallel to the `from_dict` methods on other classes, not to their `from_json` methods.
- Renamed `SynonymDatabase.add` to `SynonymDatabase.add_parser`, for consistency with `MetadataDatabase.add_parser`.
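
The renames above follow a common naming convention: a `from_dict` method takes an already-parsed dictionary, while `from_json` and `to_json` work with JSON strings. The sketch below illustrates that convention only. The `Action` dataclass and its fields are hypothetical stand-ins invented for this example, not Kazu classes, and Kazu's own implementations may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Action:
    """Hypothetical stand-in class for illustration; not part of Kazu."""

    name: str
    priority: int

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Action":
        # Takes an already-parsed dictionary.
        return cls(name=data["name"], priority=data["priority"])

    @classmethod
    def from_json(cls, text: str) -> "Action":
        # Takes a JSON string, parses it, then delegates to from_dict.
        return cls.from_dict(json.loads(text))

    def to_json(self) -> str:
        # Returns a JSON string and takes no optional arguments.
        return json.dumps(asdict(self))


action = Action.from_dict({"name": "drop", "priority": 1})
round_tripped = Action.from_json(action.to_json())
```

With this split, a method that accepts a dictionary but is called `from_json` would be misleading, which is the kind of mismatch the renames address.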

1.5.1

Bugfixes

- Pinned scipy to <1.12.0 due to breaking API change.
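
To check whether an existing environment already satisfies this pin, something like the following sketch could be used. The helper names `release_tuple` and `satisfies_pin` are assumptions made for this example. The parsing is deliberately simplified: it reads only the leading major, minor and patch numbers and ignores epochs and local version labels.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a version string such as '1.11.4'."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Missing components (for example the patch in '1.11') default to 0.
    major, minor, patch = (int(g) if g is not None else 0 for g in match.groups())
    return (major, minor, patch)


def satisfies_pin(version: str, upper: tuple[int, int, int] = (1, 12, 0)) -> bool:
    """Return True if the version is strictly below the upper bound."""
    return release_tuple(version) < upper


try:
    installed = metadata.version("scipy")
    print(f"scipy {installed}: pin satisfied = {satisfies_pin(installed)}")
except metadata.PackageNotFoundError:
    print("scipy is not installed in this environment")
```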

1.5.0

Features

- Added new cleanup action: DropMappingsByParserNameRankAction
- Added new disambiguation strategy: PreferNearestEmbeddingToDefaultLabelDisambiguationStrategy.
- DefinedElsewhereInDocumentDisambiguationStrategy has slightly changed: it now returns only mappings that were found elsewhere in the document, rather than the whole EquivalentIdSet containing those IDs.
- New disambiguation methodology GildaTfIdfDisambiguationStrategy.
- OpenTargetsTargetOntologyParser now has a biotype filter parameter.

Deprecations and Removals

- Deprecated `GildaUtils.replace_dashes` in favour of `GildaUtils.split_on_dashes_or_space`, as the latter improves efficiency in Kazu.
`GildaUtils.replace_dashes` will continue to work until kazu 1.6, but using it will produce a `DeprecationWarning`.
Please [open a GitHub issue](https://github.com/AstraZeneca/KAZU/issues/new) if you wish this to remain.
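
One way to find out whether your own code still calls a deprecated function is to record warnings while exercising it. The sketch below shows the general pattern with the standard `warnings` module. The `replace_dashes` function here is a hypothetical stand-in with placeholder behaviour, not Kazu's implementation, and `uses_deprecated_api` is a helper name invented for this example.

```python
import warnings


def replace_dashes(text: str) -> str:
    """Hypothetical deprecated function; a stand-in, not Kazu's code."""
    warnings.warn(
        "replace_dashes is deprecated; use split_on_dashes_or_space instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return text.replace("-", " ")  # placeholder behaviour for the example


def uses_deprecated_api(func, *args) -> bool:
    """Return True if calling func(*args) emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        func(*args)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


found = uses_deprecated_api(replace_dashes, "IL-6")
```

Alternatively, running a test suite with `python -W error::DeprecationWarning` turns these warnings into exceptions, so deprecated calls fail loudly before the old name is removed.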

1.4.0

Features

- Added new curation_report.py to assist in upgrading ontologies between versions
- New disambiguation strategy to prefer mappings that have a default label that matches an entity.
- The OpenTargetsDiseaseOntologyParser has been heavily reworked, so that it uses the therapeutic_area concept to decide what records should be included. This has in turn yielded the subsets: measurement, medical_procedure, biological_process and phenotype. The measurement configuration is currently disabled as it requires heavy curation of the underlying strings. In addition, the OpenTargetsDiseaseOntologyParser now supports a custom ID grouping method, to make use of cross references.

Bugfixes

- MemoryEfficientStringMatchingStep now only produces a single entity per class where multiple curations exist with different cases.
- Previously, the `tested_dependencies.txt` file in the model packs included an editable install of kazu, which wasn't intended.
We now exclude kazu from that output.
- Speed up model pack builds for model packs using `ExplosionStringMatchingStep`, by fixing a bug that caused the parsers to be populated twice in this case.

Deprecations and Removals

- Removed pytorch-lightning as a dependency. The signatures of SapbertStringSimilarityScorer and TransformersModelForTokenClassificationNerStep have changed as a result.
- Renamed `create_phrasematchers_using_curations` method of `OntologyMatcher` to `create_phrasematchers`. The old name will continue to work until kazu 1.6, but using it will produce a `DeprecationWarning`.
- `MetadataDatabase.add_parser` now requires an `entity_class`.
This enables correct string normalisation in the `MappingStep` for the new disambiguation strategy.

1.3.2

Bugfixes

- Hits with scores of 0.0 are no longer returned by DictionaryIndex
- Pin lightning-utilities dependency, a new version of which completely broke the model inference, despite lightning itself being pinned (they didn't pin lightning-utilities appropriately in the version we're using).

1.3.1

Features

- Added methods to dataclasses that allow them to be deserialized from JSON.

Deprecations and Removals

- Renamed `SpacyToKazuObjectMapper` to `KazuToSpacyObjectMapper`.
The old name will continue to work until kazu 1.6, but using it will produce a `DeprecationWarning`.
- `RulesBasedEntityClassDisambiguationFilterStep` no longer requires `parsers` or `other_entity_classes`.
It previously used these to construct the `entity_classes` argument of `KazuToSpacyObjectMapper.__init__`, but now we can calculate which of these we really need from the class and mention rules passed to `RulesBasedEntityClassDisambiguationFilterStep.__init__`.

