Edsnlp

Latest version: v0.16.0

0.8.0

Added

- New trainable component for multi-label, multi-class span qualification (any attribute/extension)
- Add range measurements (like `la tumeur fait entre 1 et 2 cm`) to `eds.measurements` matcher
- Add `eds.CKD` component
- Add `eds.COPD` component
- Add `eds.alcohol` component
- Add `eds.cerebrovascular_accident` component
- Add `eds.congestive_heart_failure` component
- Add `eds.connective_tissue_disease` component
- Add `eds.dementia` component
- Add `eds.diabetes` component
- Add `eds.hemiplegia` component
- Add `eds.leukemia` component
- Add `eds.liver_disease` component
- Add `eds.lymphoma` component
- Add `eds.myocardial_infarction` component
- Add `eds.peptic_ulcer_disease` component
- Add `eds.peripheral_vascular_disease` component
- Add `eds.solid_tumor` component
- Add `eds.tobacco` component
- Add `eds.spaces` (or `eds.normalizer` with `spaces=True`) to detect space tokens, and add `ignore_space_tokens` to `EDSPhraseMatcher` and `SimstringMatcher` to skip them
- Add `ignore_space_tokens` option in most components
- `eds.tables`: new pipeline to identify formatted tables
- New `merge_mode` parameter in `eds.measurements` to normalize existing entities or detect measures only inside existing entities
- Tokenization exceptions (`Mr.`, `Dr.`, `Mrs.`) and non end-of-sentence periods are now tokenized with the next letter in the `eds` tokenizer
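
To illustrate the range measurements mentioned above, here is a minimal sketch using only the standard library. The pattern and the helper names (`RANGE_PATTERN`, `to_float`, `find_ranges`) are assumptions made for illustration; they are not the patterns or API of `eds.measurements`.

```python
import re

# Hypothetical pattern, not the one used by eds.measurements:
# matches "entre <low> et <high> <unit>" for cm or mm units.
RANGE_PATTERN = re.compile(
    r"entre\s+(?P<low>\d+(?:[.,]\d+)?)\s+et\s+(?P<high>\d+(?:[.,]\d+)?)\s*(?P<unit>cm|mm)\b"
)


def to_float(text):
    # French clinical text often uses a decimal comma.
    return float(text.replace(",", "."))


def find_ranges(text):
    """Return (low, high, unit) tuples for every range found in text."""
    return [
        (to_float(m["low"]), to_float(m["high"]), m["unit"])
        for m in RANGE_PATTERN.finditer(text)
    ]


ranges = find_ranges("la tumeur fait entre 1 et 2 cm")
print(ranges)  # [(1.0, 2.0, 'cm')]
```

A single-value mention such as `la tumeur fait 2 cm` yields no range with this pattern; a real matcher would handle it with a separate rule.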

Changed

- Disable `EDSMatcher` preprocessing auto progress tracking by default
- Moved dependencies to a single pyproject.toml: support for `pip install -e '.[dev,docs,setup]'`
- ADICAP matcher now allows dot separators (e.g. `B.H.HP.A7A0`)

Fixed

- Abbreviation and number tokenization issues in the `eds` tokenizer
- `eds.adicap` : reparsed the dictionary used to decode the ADICAP codes (some of them were wrongly decoded)
- Fix build for Python 3.9 on Mac M1/M2 machines.

0.7.4

Added

- `eds.history` : Add an option to consider only the closest dates in the sentence (dates inside the boundaries are used first; if there are none, the closest date in the entire sentence is used).
- `eds.negation` : It now takes into account following past participles and preceding infinitives.
- `eds.hypothesis`: It now takes into account following past participle hypothesis verbs.
- `eds.negation` & `eds.hypothesis` : Introduce new patterns and remove unnecessary patterns.
- `eds.dates` : Add a pattern for preceding relative dates (e.g. l'embolie qui est survenue **à 10 jours**).
- Improve patterns in the `eds.pollution` component to account for multiline footers
- Add `QuickExample` object to quickly try a pipeline.
- Add UMLS terminology matcher `eds.umls`
- New `RegexMatcher` method to create spans from groupdicts
- New `eds.dates` option to disable time detection

Changed

- Improve date detection by removing false positives

Fixed

- `eds.hypothesis` : Remove too generic patterns.
- `EDSTokenizer` : It now tokenizes `"recherche d'"` as `["recherche", "d'"]`, instead of `["recherche", "d", "'"]`.
- Fix small typos in the documentation and in the docstring.
- Harmonize processing utils (distributed custom_pipe) to have the same API for Pandas and Pyspark
- Fix BratConnector file loading issues with complex file hierarchies

0.7.2

Added

- Improve the `eds.history` component by taking into account the date extracted from `eds.dates` component.
- New pop up when you click on the copy icon in the termynal widget (docs).
- Add NER `eds.elston-ellis` pipeline to identify Elston Ellis scores
- Add `flags=re.MULTILINE` to `eds.pollution` and change the footer pattern
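
As a reminder of what `re.MULTILINE` changes for a line-anchored footer pattern, here is a short sketch. The `FOOTER` pattern and the sample text are hypothetical; the actual `eds.pollution` patterns differ.

```python
import re

# Hypothetical footer pattern; the real eds.pollution patterns differ.
FOOTER = r"^Page \d+ sur \d+$"

text = "Compte rendu\nPage 1 sur 3\nLe patient va bien."

# Without re.MULTILINE, ^ and $ only anchor at the start and end of the whole string.
without_flag = re.findall(FOOTER, text)

# With re.MULTILINE, ^ and $ also anchor at every line boundary.
with_flag = re.findall(FOOTER, text, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['Page 1 sur 3']
```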

Fixed

- Remove the warning in `eds.sections` when `eds.normalizer` is in the pipe.
- Fix `filter_spans` for strictly nested entities
- Fill `eds.remove-lowercase` "assign" metadata to run the pipeline during `EDSPhraseMatcher` preprocessing
- Allow spaCy components whose name contains a dot again (forbidden since spaCy v3.4.2), for backward compatibility.
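
For readers unfamiliar with span filtering, the sketch below shows one common way to keep the longest non-overlapping spans, including the strictly nested case. It works on plain `(start, end)` tuples and the function name `filter_spans_sketch` is hypothetical; it is not the EDS-NLP implementation.

```python
def filter_spans_sketch(spans):
    """Keep the longest non-overlapping (start, end) spans; end is exclusive."""
    # Longest spans first; ties are broken by the earliest start.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    seen = set()
    for start, end in ordered:
        # Reject a span as soon as one of its tokens is already covered.
        if not any(i in seen for i in range(start, end)):
            kept.append((start, end))
            seen.update(range(start, end))
    return sorted(kept)


# (1, 3) is strictly nested inside (0, 5) and is dropped.
print(filter_spans_sketch([(0, 5), (1, 3), (6, 8)]))  # [(0, 5), (6, 8)]
```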

0.7.1

Added

- Add new patterns (footer, web entities, biology tables, coding sections) to pipeline normalisation (pollution)

Changed

- Improved TNM detection algorithm
- Account for more modifiers in ADICAP codes detection

Fixed

- Add nephew, niece and daughter to family qualifier patterns
- EDSTokenizer (`spacy.blank('eds')`) now recognizes non-breaking whitespaces as spaces and does not split float numbers
- `eds.dates` pipeline now allows new lines as space separators in dates
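
The two tokenizer behaviours fixed above (non-breaking spaces treated as whitespace, float numbers kept whole) can be sketched with a small regex tokenizer. The `TOKEN` pattern and `tokenize` helper are assumptions for illustration, not the `EDSTokenizer` rules.

```python
import re

# Hypothetical token pattern, not the EDSTokenizer rules:
# numbers with an optional decimal part, then words, then single punctuation marks.
TOKEN = re.compile(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]")


def tokenize(text):
    # For str patterns, \s matches Unicode whitespace, including U+00A0,
    # so a non-breaking space separates tokens like a regular space.
    return TOKEN.findall(text)


tokens = tokenize("poids\u00a0: 72,5 kg")
print(tokens)  # ['poids', ':', '72,5', 'kg']
```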

0.7.0

Added

- New nested NER trainable `nested_ner` pipeline component
- Support for nested entities and attributes in BratDataConnector
- Pytorch wrappers and experimental training utils
- Add attribute `section` to entities
- Add new cases for separator pattern when components of the TNM score are separated by a forward slash
- Add NER `eds.adicap` pipeline to identify ADICAP codes
- Add patterns to the `pollution` pipeline and simplify activating or deactivating specific patterns

Changed

- Simplified the configuration scheme of the `pollution` pipeline
- Update of the `ContextualMatcher` (and all pipelines depending on it), rendering it more flexible to use
- Rename R component of score TNM as "resection_completeness"

Fixed

- Prevent section titles from capturing surrounding tokens, causing overlaps (113)
- Enhance existing patterns for section detection and add patterns for previously ignored sections (introduction, evolution, modalites de sortie, vaccination).
- Fix explain mode, which was always triggered, in `eds.history` factory.
- Fix test in `eds.sections`. Previously, no check was done
- Remove SOFA scores spurious span suffixes

0.6.2

Added

- New `SimstringMatcher` matcher to perform fuzzy term matching, and `algorithm` parameter in terminology components and `eds.matcher` component
- Makefile to install and test the application and to view the documentation
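
To give an idea of what fuzzy term matching means here, the sketch below compares character trigram sets with a Jaccard similarity, in the spirit of simstring-style approaches. The functions `ngrams`, `jaccard` and `fuzzy_lookup`, the padding scheme and the `0.5` threshold are all assumptions for illustration; they are not the `SimstringMatcher` API.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between the n-gram sets of two strings."""
    grams_a, grams_b = ngrams(a), ngrams(b)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def fuzzy_lookup(query, terms, threshold=0.5):
    """Return the terms whose similarity to query reaches the threshold."""
    return [term for term in terms if jaccard(query, term) >= threshold]


terms = ["diabète", "hypertension", "asthme"]
# The misspelled query still retrieves the intended term.
print(fuzzy_lookup("hypertention", terms))  # ['hypertension']
```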

Changed

- Add consultation date pattern "CS", and false-positive patterns for dates (namely phone numbers and pagination).
- Update the `eds.TNM` score pipeline: it can now return a dictionary whose values are either `str` or `int`

Fixed

- Add new patterns to the negation qualifier
- Numpy header issues with binary distributed packages
- Simstring dependency on Windows
