Edsnlp

Latest version: v0.16.0

0.12.1

Added

- Added binary distribution for linux aarch64 (Streamlit's environment)
- Added new separator option in `eds.tables` and new input check

Fixed

- Make catalogue & entrypoints compatible with py37-py312
- Check that a date has a doc before trying to use the document's `note_datetime`

0.12.0

Added

- The `eds.transformer` component now accepts `prompts` (passed to its `preprocess` method, see breaking change below) to add before each window of text to embed.
- `LazyCollection.map` / `map_batches` now support generator functions as arguments.
- Window stride can now be disabled (i.e., stride = window) during training in the `eds.transformer` component by setting `training_stride=False`
- Added a new `eds.ner_overlap_scorer` to evaluate matches between two lists of entities, counting a match as true when the Dice overlap is above a given threshold
- `edsnlp.load` now accepts EDS-NLP models from the Hugging Face Hub 🤗!
- New `python -m edsnlp.package` command to package a model for the huggingface hub or pypi-like registries
- Improve table detection in `eds.tables` and support new options in `table._.to_pd_table(...)`:
  - `header=True` to use first row as header
  - `index=True` to use first column as index
  - `as_spans=True` to fill cells as document spans instead of strings
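
To make the matching rule behind the `eds.ner_overlap_scorer` entry above concrete, here is a minimal sketch of a Dice overlap between two character spans and a threshold-based match count. It is not EDS-NLP's implementation: the function names `dice_overlap` and `count_matches`, the `(start, end)` tuple representation, the greedy one-to-one matching and the non-strict `>=` comparison are all assumptions made for illustration.

```python
def dice_overlap(a, b):
    """Dice coefficient between two half-open (start, end) offset spans."""
    (a_start, a_end), (b_start, b_end) = a, b
    intersection = max(0, min(a_end, b_end) - max(a_start, b_start))
    total = (a_end - a_start) + (b_end - b_start)
    return 2 * intersection / total if total else 0.0


def count_matches(predicted, gold, threshold=0.5):
    """Count predicted spans whose best Dice overlap with a gold span
    reaches the threshold (assumption: each gold span matches at most once)."""
    unmatched = list(gold)
    matches = 0
    for span in predicted:
        best = max(unmatched, key=lambda g: dice_overlap(span, g), default=None)
        if best is not None and dice_overlap(span, best) >= threshold:
            matches += 1
            unmatched.remove(best)
    return matches
```

With a threshold of 1.0 this reduces to exact matching, while lower thresholds tolerate boundary disagreements between the two entity lists.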

Changed

- :boom: Major breaking change in trainable components, moving towards a more "task-centric" design:
  - the `eds.transformer` component is no longer responsible for deciding which spans of text ("contexts") should be embedded. These contexts are now passed via the `preprocess` method, which now accepts more arguments than just the docs to process.
  - similarly, the `eds.span_pooler` is no longer responsible for deciding which spans to pool, and instead pools all spans passed to it in the `preprocess` method.

Consequently, the `eds.transformer` and `eds.span_pooler` no longer accept their `span_getter` argument, and the `eds.ner_crf`, `eds.span_classifier`, `eds.span_linker` and `eds.span_qualifier` components now accept a `context_getter` argument instead, as well as a `span_getter` argument for the latter two. This refactoring can be summarized as follows:

```diff
- eds.transformer.span_getter
+ eds.ner_crf.context_getter
+ eds.span_classifier.context_getter
+ eds.span_linker.context_getter

- eds.span_pooler.span_getter
+ eds.span_qualifier.span_getter
+ eds.span_linker.span_getter
```

and as an example for the `eds.span_linker` component:

```diff
  nlp.add_pipe(
      eds.span_linker(
          metric="cosine",
          probability_mode="sigmoid",
+         span_getter="ents",
+         context_getter="ents",  # by default, same as span_getter
          embedding=eds.span_pooler(
              hidden_size=128,
-             span_getter="ents",
              embedding=eds.transformer(
-                 span_getter="ents",
                  model="prajjwal1/bert-tiny",
                  window=128,
                  stride=96,
              ),
          ),
      ),
      name="linker",
  )
```
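
The `window=128` and `stride=96` arguments in the example above describe overlapping windows over the text to embed, and the new `training_stride = False` option corresponds to the stride = window case. The sketch below only illustrates that arithmetic on token offsets. The helper name `window_bounds` is hypothetical, and how `eds.transformer` actually slices its input (word pieces, special tokens, prompts) is not described in this changelog.

```python
def window_bounds(n_tokens, window, stride):
    """Return (start, end) token offsets of successive windows.

    Consecutive windows overlap by (window - stride) tokens;
    with stride == window they do not overlap at all.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    bounds = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        bounds.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return bounds


# Overlapping windows (stride < window)
overlapping = window_bounds(300, window=128, stride=96)
# Non-overlapping windows (stride == window)
disjoint = window_bounds(300, window=128, stride=128)
```

Under these assumptions, a 300-token text is covered by three windows in both cases. Only the overlap changes, so disabling the stride avoids embedding the shared tokens twice.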

- Trainable embedding components now all use `foldedtensor` to return embeddings, instead of returning a tensor of floats and a mask tensor.
- :boom: TorchComponent `__call__` no longer applies the end to end method, and instead calls the `forward` method directly, like all torch modules.
- The trainable `eds.span_qualifier` component has been renamed to `eds.span_classifier` to reflect its general purpose (it doesn't only predict qualifiers, but any attribute of a span using its context or not).
- `omop` converter now takes the `note_datetime` field into account by default when building a document
- `span._.date.to_datetime()` and `span._.date.to_duration()` now automatically take the `note_datetime` into account
- `nlp.vocab` is no longer serialized when saving a model, as it may contain sensitive information and can be recomputed during inference anyway

Fixed

- `edsnlp.data.read_json` now correctly reads the files from the directory passed as an argument, and not from the parent directory.
- Overwrite spaCy's Doc, Span and Token pickling utils to allow recursively storing Doc, Span and Token objects in the extension values (in particular, `span._.date.doc`)
- Removed pendulum dependency, solving various pickling, multiprocessing and missing attributes errors

0.11.2

Fixed

- Fix `edsnlp.utils.file_system.normalize_fs_path` file system detection not working correctly
- Improved performance of `edsnlp.data` methods over a filesystem (`fs` parameter)

0.11.1

Added

- Automatic estimation of cpu count when using multiprocessing
- `optim.initialize()` method to create optim state before the first backward pass

Changed

- `nlp.post_init` will not tee lazy collections anymore (use `edsnlp.utils.collections.multi_tee` yourself if needed)
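
As background for why teeing matters, the following sketch uses the standard library's `itertools.tee` rather than `edsnlp.utils.collections.multi_tee`, whose signature is not shown in this changelog. It shows that a one-shot stream is exhausted after a first pass, which is the situation one would presumably want to avoid when the same data feeds both `nlp.post_init` and a later step.

```python
import itertools


def numbers():
    # A generator can only be consumed once.
    yield from range(3)


stream = numbers()
first_pass = list(stream)
second_pass = list(stream)  # empty: the generator is already exhausted

# itertools.tee splits one iterator into independent copies. It buffers
# items internally, so memory grows if one copy lags far behind the other.
copy_a, copy_b = itertools.tee(numbers(), 2)
teed_a = list(copy_a)
teed_b = list(copy_b)
```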

Fixed

- Corrected inconsistencies in `eds.span_linker`

0.11.0

Added

- Support for a `filesystem` parameter in every `edsnlp.data.read_*` and `edsnlp.data.write_*` functions
- Pipes of a pipeline are now easily accessible with `nlp.pipes.xxx` instead of `nlp.get_pipe("xxx")`
- Support builtin Span attributes in converters `span_attributes` parameter, e.g.
  ```python
  import edsnlp

  nlp = ...
  nlp.add_pipe("eds.sentences")

  data = edsnlp.data.from_xxx(...)
  data = data.map_pipeline(nlp)
  data.to_pandas(converters={"ents": {"span_attributes": ["sent.text", "start", "end"]}})
  ```

- Support assigning Brat AnnotatorNotes as span attributes: `edsnlp.data.read_standoff(..., notes_as_span_attribute="cui")`
- Support for mapping full batches in `edsnlp.processing` pipelines with `map_batches` lazy collection method:
  ```python
  import edsnlp

  data = edsnlp.data.from_xxx(...)
  data = data.map_batches(lambda batch: do_something(batch))
  data.to_pandas()
  ```

- New `data.map_gpu` method to map a deep learning operation on some data and take advantage of edsnlp multi-gpu inference capabilities
- Added average precision computation in edsnlp span_classification scorer
- You can now add pipes to your pipeline by instantiating them directly, which comes with many advantages, such as auto-completion, introspection and type checking!

  ```python
  import edsnlp, edsnlp.pipes as eds

  nlp = edsnlp.blank("eds")
  nlp.add_pipe(eds.sentences())
  # instead of nlp.add_pipe("eds.sentences")
  ```

  *The previous way of adding pipes is still supported.*
- New `eds.span_linker` deep-learning component to match entities with their concepts in a knowledge base, in synonym-similarity or concept-similarity mode.
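
To illustrate what mapping over full batches means, as opposed to mapping over single items, here is a standard-library sketch. It does not use or reproduce EDS-NLP's lazy collections: `batched`, `map_batches_sketch` and `pad_to_longest` are hypothetical names, and the batching strategy is an assumption.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def map_batches_sketch(items, fn, batch_size):
    """Apply fn to each whole batch and stream its results item by item.

    fn may return a list or be a generator function, since its
    output is only iterated over.
    """
    for batch in batched(items, batch_size):
        yield from fn(batch)


def pad_to_longest(batch):
    # A batch-level operation: it needs to see every item of the batch.
    width = max(len(text) for text in batch)
    return [text.ljust(width) for text in batch]


padded = list(map_batches_sketch(["a", "bcd", "ef", "g"], pad_to_longest, batch_size=2))
```

The padding function could not be written as a per-item `map`, because the target width depends on the other items of the batch.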

Changed

- `nlp.preprocess_many` now uses lazy collections to enable parallel processing
- :warning: Breaking change. Improved and simplified `eds.span_qualifier`: we didn't support combination groups before, so this feature was scrapped for now. We now also support splitting values of a single qualifier between different span labels.
- Optimized edsnlp.data batching, especially for large batch sizes (removed a quadratic loop)
- :warning: Breaking change. By default, the name of components added to a pipeline is now the default name defined in their class `__init__` signature. For most components of EDS-NLP, this will change the name from "eds.xxx" to "xxx".
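
The idea of reading a default component name from a class `__init__` signature can be sketched with the standard `inspect` module. This is an illustration only, not EDS-NLP's code: the `Sentences` class and the `default_pipe_name` helper are hypothetical.

```python
import inspect


class Sentences:
    # Hypothetical component: its default name lives in the signature.
    def __init__(self, nlp=None, name="sentences"):
        self.name = name


def default_pipe_name(component_cls):
    """Return the default of the `name` parameter of __init__, if any."""
    param = inspect.signature(component_cls.__init__).parameters.get("name")
    if param is None or param.default is inspect.Parameter.empty:
        return None
    return param.default


resolved_name = default_pipe_name(Sentences)
```

Under this scheme, code that looks pipes up by their old "eds.xxx" name would need to either use the new default name or pass an explicit `name` when adding the pipe.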

Fixed

- Flatten list outputs (such as "ents" converter) when iterating: `nlp.map(data).to_iterable("ents")` is now a list of entities, and not a list of lists of entities
- Allow span pooler to choose between multiple base embedding spans (as likely produced by `eds.transformer`) by sorting them by Dice overlap score.
- EDS-NLP does not raise an error anymore when saving a model to an already existing, but empty directory
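
The flattening fix above changes the shape of what iteration yields. The sketch below mimics the two shapes with plain lists and `itertools.chain`. The entity strings are made up for illustration and no EDS-NLP call is involved.

```python
from itertools import chain

# One list of entities per document (the previous, nested shape).
per_doc_entities = [["fever", "cough"], [], ["asthma"]]

# A single stream of entities across all documents (the new, flat shape).
flat_entities = list(chain.from_iterable(per_doc_entities))
```

Note that the flat shape no longer tells which document an entity came from, so any document identifier needed downstream has to be carried by the converted items themselves.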

0.10.7

Added

- Support empty writer converter by default in `edsnlp.data` readers / writers (do not convert by default)
- Add support for polars data import / export
- Allow kwargs in `eds.transformer` to pass to the transformer model

Changed

- Saving pipelines no longer saves the `disabled` status of the pipes (i.e., all pipes are considered "enabled" when saved). This feature was not used and was causing issues when saving a model wrapped in a `nlp.select_pipes` context.

Fixed

- Allow missing `meta.json`, `tokenizer` and `vocab` paths when loading saved models
- Save torch buffers when dumping machine learning models to disk (previous versions only saved the model parameters)
- Fix automatic `batch_size` estimation in `eds.transformer` when `max_tokens_per_device` is set to `auto` and multiple GPUs are used
- Fix JSONL file parsing
