Chemdataextractor2

Latest version: v2.2.2

2.2.2

Bug fixes to the model location and the Elsevier readers.

2.2.1

2.2.0

ChemDataExtractor (CDE) 2.2.0, as described in [Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications](https://www.nature.com/articles/s41597-023-02511-6).

*New features:*
- `InferredProperty` allows properties to be created automatically from other properties.
- Enhanced merging of records based on distance and any available confidence measures.
- Allow parsers to skip individual elements and sections during parsing, based on the type of content and the section title.
- Add clause (subsentence) extraction, as described in the paper.
- Allow model parameters to be marked as never merged, or to be ignored when merging.
- Add `AutoDependencyParser`.
- Update document readers.
- Add the ability to deserialize serialized records.
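To make the `InferredProperty` idea concrete, here is a minimal, standard-library sketch of a field whose value is derived from another field. Everything in it (`InferredField`, `parse_range`, `Measurement`, `raw_value`, `is_range`) is hypothetical and written only for illustration. It is not ChemDataExtractor's `InferredProperty` API, whose classes and signatures should be taken from the project documentation.

```python
from typing import Any, Callable, List, Optional


class InferredField:
    """Hypothetical descriptor: a field whose value is derived from another field.

    Illustrative only; this is NOT ChemDataExtractor's InferredProperty API.
    """

    def __init__(self, source: str, infer: Callable[[Any], Any]) -> None:
        self.source = source  # name of the field this one is derived from
        self.infer = infer    # maps the source value to this field's value

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    def __get__(self, instance: Optional[Any], owner: type) -> Any:
        if instance is None:
            return self
        # An explicitly assigned value wins; otherwise infer from the source field.
        if self.name in instance.__dict__:
            return instance.__dict__[self.name]
        source_value = getattr(instance, self.source, None)
        return None if source_value is None else self.infer(source_value)

    def __set__(self, instance: Any, value: Any) -> None:
        instance.__dict__[self.name] = value


def parse_range(raw: str) -> List[float]:
    """Split a raw, non-negative value such as '1.5-2.0' or '300' into floats."""
    return [float(part) for part in raw.split("-")]


class Measurement:
    # A parser only has to fill raw_value; the other fields follow from it.
    value = InferredField("raw_value", parse_range)
    is_range = InferredField("value", lambda v: len(v) > 1)

    def __init__(self, raw_value: Optional[str] = None) -> None:
        self.raw_value = raw_value
```

For example, `Measurement("1.5-2.0").value` gives `[1.5, 2.0]` and `.is_range` gives `True`. Because `is_range` is inferred from `value`, which is itself inferred, the links between fields are declared once on the model rather than repeated in every parser.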

2.1.2

The new class `PlainTextCacher` allows the user to cache the results of expensive computations such as tagging or tokenisation. Use it by calling `cache_document` on an existing document, then `hydrate_document` to reuse these computations.

When extracting from a real document, this can give a speed-up of more than 10x, because from the second run onwards only parsing needs to be done.
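The cache-then-hydrate pattern behind `PlainTextCacher` can be sketched with the standard library alone: the expensive step (here, a stand-in tokenizer) runs on the first pass, its output is stored as plain text, and later passes reload it. `ToyCacher`, `is_cached` and `tokens_for` are hypothetical names; the two method names merely echo the ones in the text. This is not the real `PlainTextCacher` API, so consult the ChemDataExtractor documentation for its actual constructor and arguments.

```python
import json
import os
from typing import Callable, List


class ToyCacher:
    """Hypothetical cacher showing the cache/hydrate pattern.

    Illustrative only; this is NOT the PlainTextCacher API.
    """

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.cache_dir, key + ".json")

    def is_cached(self, key: str) -> bool:
        return os.path.exists(self._path(key))

    def cache_document(self, key: str, tokens: List[List[str]]) -> None:
        # Persist the expensive result as plain text (JSON) for later runs.
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(tokens, f)

    def hydrate_document(self, key: str) -> List[List[str]]:
        # Reload the stored result instead of recomputing it.
        with open(self._path(key), encoding="utf-8") as f:
            return json.load(f)


def tokens_for(cacher: ToyCacher, key: str, sentences: List[str],
               tokenize: Callable[[str], List[str]]) -> List[List[str]]:
    """Tokenize on the first run only; later runs reuse the cached tokens."""
    if cacher.is_cached(key):
        return cacher.hydrate_document(key)
    tokens = [tokenize(s) for s in sentences]
    cacher.cache_document(key, tokens)
    return tokens
```

Calling `tokens_for` twice with the same key runs the tokenizer only on the first call. In a real pipeline, the skipped work would be tagging and tokenisation, leaving only parsing to repeat.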

2.1.1

Fixed a bug in the Snowball parser caused by it not having been migrated to the new sentence-parser API; the documentation has been updated to reflect this.

2.1.0

**Implemented Enhancements:**
- An improved NER system that performs much better on inorganic materials.
- New tokenization to go with the new NER system.
- The addition of `InferredProperty` allows users to define explicit links between different properties in their data models, removing a large amount of boilerplate parser code.
- The new `Every` parse element lets users specify that a single token must satisfy multiple conditions.
- A more flexible tagging system that allows the creation of taggers beyond part-of-speech and NER taggers.
- Batch tagging.
- A new, modern theme for the documentation, along with much more detailed documentation on certain parts of ChemDataExtractor, such as tagging and tokenization.
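The idea behind the `Every` parse element, a token matching only when all of several conditions hold, can be illustrated with plain predicate functions. In ChemDataExtractor, `Every` is used inside grammar rules. The sketch below only mimics the "all conditions at once" behaviour, and `every`, `regex`, `longer_than` and `formula_like` are hypothetical helpers, not library calls.

```python
import re
from typing import Callable

TokenTest = Callable[[str], bool]


def every(*conditions: TokenTest) -> TokenTest:
    """Combine token tests: the result passes only if ALL conditions pass."""
    def match(token: str) -> bool:
        return all(condition(token) for condition in conditions)
    return match


def regex(pattern: str) -> TokenTest:
    """Token test that passes when the whole token matches the pattern."""
    compiled = re.compile(pattern)
    return lambda token: compiled.fullmatch(token) is not None


def longer_than(n: int) -> TokenTest:
    """Token test that passes when the token has more than n characters."""
    return lambda token: len(token) > n


# A token counts as "formula-like" only if it matches the element pattern AND
# has more than one character, so lone capitals such as "C" are skipped.
formula_like = every(regex(r"(?:[A-Z][a-z]?\d*)+"), longer_than(1))
```

For example, filtering `["TiO2", "is", "a", "C", "photocatalyst", "H2O"]` with `formula_like` keeps `["TiO2", "H2O"]`. Neither condition alone would give that result: the pattern also accepts `"C"`, and the length test also accepts `"is"`.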

**Breaking Changes:**
- Any taggers previously written by users will no longer work. Please refer to the migration guide for version 2.1.
- The new tokenization can break some user-written parse rules. This can be fixed either by making a few changes to the parse rules or by reverting to the previous NER system and tokenizer. Please refer to the migration guide for more details.
