Edsnlp

Latest version: v0.16.0

0.10.6

Added

- Added `batch_by`, `split_into_batches_after`, `sort_chunks`, `chunk_size`, `disable_implicit_parallelism` parameters to processing (`simple` and `multiprocessing`) backends to improve performance
and memory usage. Sorting chunks can yield up to a **2x speedup** in some cases.
- The deep learning cache mechanism now supports multitask models with weight sharing in multiprocessing mode.
- Added `max_tokens_per_device="auto"` parameter to `eds.transformer` to estimate memory usage and automatically split the input into chunks that fit into the GPU.
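
To illustrate why sorting within chunks can help, here is a minimal pure-Python sketch. It does not use edsnlp: the helper names (`chunked`, `batches`, `padded_size`) are hypothetical, and the way `chunk_size` and `sort_chunks` are interpreted here is an assumption about the general technique, not a description of the library's implementation. The idea is that batching texts of similar length reduces the amount of padding a model has to process.

```python
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `chunk_size` items."""
    chunk: List[str] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def batches(
    texts: Iterable[str], chunk_size: int, batch_size: int, sort_chunks: bool
) -> Iterator[List[str]]:
    """Split texts into chunks, optionally sort each chunk by length, then batch."""
    for chunk in chunked(texts, chunk_size):
        if sort_chunks:
            # Sorting only inside a chunk keeps the output order roughly local.
            chunk = sorted(chunk, key=len)
        for i in range(0, len(chunk), batch_size):
            yield chunk[i : i + batch_size]


def padded_size(batch: List[str]) -> int:
    """Character slots needed once every text is padded to the longest one."""
    return max(len(text) for text in batch) * len(batch)


# Toy corpus alternating short and long documents (lengths in characters).
texts = ["a" * n for n in (5, 100, 6, 90, 4, 110, 7, 95)]

unsorted_cost = sum(padded_size(b) for b in batches(texts, 8, 2, sort_chunks=False))
sorted_cost = sum(padded_size(b) for b in batches(texts, 8, 2, sort_chunks=True))
print(unsorted_cost, sorted_cost)
```

On this toy input the padded size drops from 790 to 434 character slots, which is the kind of saving that sorting aims for; real gains depend on the length distribution of the documents.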

Changed

- Improved speed and memory usage of the `eds.text_cnn` pipe by running the CNN on a non-padded version of its input: expect a speedup up to 1.3x in real-world use cases.
- Deprecated the converters' (especially for BRAT/Standoff data) `bool_attributes`
parameter in favor of the more general `default_attributes`. This new mapping describes how to
set attributes on spans for which no attribute value was found in the input format.
This is especially useful for negation, or for frequent attribute values (e.g. "negated"
is often False, "temporal" is often "present") that annotators may not want to
annotate every time.
- Default `eds.ner_crf` window is now set to 40 and stride set to 20, as it doesn't
affect throughput (compared to before, window set to 20) and improves accuracy.
- New default `overlap_policy='merge'` option and parameter renaming in
`eds.span_context_getter` (which replaces `eds.span_sentence_getter`)
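
The `default_attributes` behaviour described above can be pictured with a small pure-Python sketch. The function `apply_default_attributes` is hypothetical and is not part of edsnlp; it only shows the assumed semantics, namely that defaults fill in attributes missing from the input while annotated values take precedence.

```python
from typing import Any, Dict


def apply_default_attributes(
    span_attributes: Dict[str, Any], default_attributes: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the span attributes, completed with defaults for missing keys."""
    merged = dict(default_attributes)  # start from the defaults
    merged.update(span_attributes)     # annotated values override them
    return merged


defaults = {"negated": False, "temporal": "present"}

# The annotator only marked the negation; "temporal" was left unannotated.
annotated = {"negated": True}
result = apply_default_attributes(annotated, defaults)
print(result)
```

Here the span keeps its annotated `negated` value and receives `temporal="present"` from the defaults.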

Fixed

- Improved error handling in `multiprocessing` backend (e.g., no more deadlock)
- Various improvements to the data processing related documentation pages
- Beginning-of-sentence / end-of-sentence transitions of the `eds.ner_crf` component are now
disabled when windows are used (i.e., when `window` is neither 1, which is equivalent to softmax, nor
0, which is equivalent to the default full-sequence Viterbi decoding)
- `eds` tokenizer now inherits from `spacy.Tokenizer` to avoid typing errors
- Only match the 'ne' negation pattern when it is not part of another word, to avoid false positives such as `u[ne] cure de 10 jours`
- Disabled pipes are now correctly ignored in the `Pipeline.preprocess` method
- Add "eventuel*" patterns to `eds.hypothesis`
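
The 'ne' false positive mentioned above can be reproduced with Python's standard `re` module. This is only a sketch of the general fix (adding word boundaries); the two patterns below are illustrative assumptions and are not the patterns actually shipped in edsnlp.

```python
import re

# Naive pattern: matches "ne" anywhere, including inside "une".
naive = re.compile(r"ne")
# Bounded pattern: "ne" must be a standalone word.
bounded = re.compile(r"\bne\b")

false_positive_text = "une cure de 10 jours"
true_positive_text = "le patient ne fume pas"

naive_hit = naive.search(false_positive_text)      # matches inside "une"
bounded_miss = bounded.search(false_positive_text)  # no standalone "ne"
bounded_hit = bounded.search(true_positive_text)    # standalone "ne" found
```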

0.10.5

Fixed

- Allow non-url paths when parquet filesystem is given

0.10.4

Changed

- Assigning `doc._.note_datetime` will now automatically cast the value to a `pendulum.DateTime` object

Added

- Support loading model from package name (e.g., `edsnlp.load("eds_pseudo_aphp")`)
- Support filesystem parameter in `edsnlp.data.read_parquet` and `edsnlp.data.write_parquet`

Fixed

- Support doc -> list converters with parquet files writer
- Fixed some OOM errors when writing many outputs to parquet files
- Both edsnlp & spacy factories are now listed when a factory lookup fails
- Fixed some GPU OOM errors with the `eds.transformer` pipe when processing really long documents

0.10.3

Added

- By default, `edsnlp.data.write_json` will infer if the data should be written as a single JSONL
file or as a directory of JSON files, based on the `path` argument being a file or not.

Fixed

- Measurements now correctly match "0.X", "0.XX", ... numbers
- Typo in "celsius" measurement unit
- Spaces and digits are now supported in BRAT entity labels
- Fixed missing 'permet pas + verb' false positive negation patterns

0.10.2

Changed

- `eds.span_qualifier` qualifiers argument now automatically adds the underscore prefix if not present
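
As a rough illustration of this normalization, the sketch below assumes the prefix in question is `_.` (as in spaCy extension paths such as `_.negation`). The helper `add_underscore_prefix` is hypothetical and is not the library's code.

```python
from typing import List


def add_underscore_prefix(qualifiers: List[str]) -> List[str]:
    """Prefix each qualifier name with "_." unless it already has it."""
    return [q if q.startswith("_.") else f"_.{q}" for q in qualifiers]


normalized = add_underscore_prefix(["negation", "_.hypothesis"])
print(normalized)
```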

Fixed

- Fix imports of components declared in `spacy_factories` entry points
- Support `pendulum` v3
- `AsList` errors are now correctly reported
- `eds.span_qualifier` saved configuration during `to_disk` is no longer null

0.10.1

Changed

- Small regex matching performance improvement, up to 1.25x faster (e.g. `eds.measurements`)

Fixed

- Microgram scale is now correctly 1/1000g and inverse meter now 1/100 inverse cm.
- We now isolate some of edsnlp's components (trainable pipes that require ml dependencies)
in a new `edsnlp_factories` entry point group to prevent spacy from auto-importing them.
- TNM scores followed by a space are now correctly detected
- Removed various short TNM false positives (e.g., "PT" or "a T") and false negatives
- The Span value extension is no longer forcibly overwritten, and user-assigned values are returned by `Span._.value` in priority, before the aggregated `span._.get(span.label_)` getter result (220)
- Enable mmap during multiprocessing model transfers
- `RegexMatcher` now supports all alignment modes (`strict`, `expand`, `contract`) and better handles partial doc matching (201).
- `on_ent_only=False/True` is now supported again in qualifier pipes (e.g., "eds.negation", "eds.hypothesis", ...)
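
The three alignment modes named above (`strict`, `expand`, `contract`) follow the same vocabulary as spaCy's `Doc.char_span`. The sketch below is a minimal pure-Python illustration of what each mode means when a character-level match does not fall on token boundaries. The `align` function and the token offsets are hypothetical and are not taken from `RegexMatcher`.

```python
from typing import List, Optional, Tuple


def align(
    tokens: List[Tuple[int, int]], start: int, end: int, mode: str
) -> Optional[Tuple[int, int]]:
    """Map a character span to a token span (first index, exclusive last index).

    `tokens` holds the (start_char, end_char) offsets of each token.
    Returns None when no token span satisfies the requested mode.
    """
    if mode == "strict":
        # Both ends must coincide exactly with token boundaries.
        starts = {s: i for i, (s, _) in enumerate(tokens)}
        ends = {e: i + 1 for i, (_, e) in enumerate(tokens)}
        if start in starts and end in ends and starts[start] < ends[end]:
            return starts[start], ends[end]
        return None
    if mode == "expand":
        # Keep every token that overlaps the character span.
        idx = [i for i, (s, e) in enumerate(tokens) if s < end and e > start]
    elif mode == "contract":
        # Keep only tokens fully contained in the character span.
        idx = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
    else:
        raise ValueError(f"Unknown alignment mode: {mode!r}")
    return (idx[0], idx[-1] + 1) if idx else None


# Offsets for "prise de sang": prise=(0, 5), de=(6, 8), sang=(9, 13)
tokens = [(0, 5), (6, 8), (9, 13)]

# The character span (2, 8) covers "ise de", which starts mid-token.
strict_result = align(tokens, 2, 8, "strict")
expand_result = align(tokens, 2, 8, "expand")
contract_result = align(tokens, 2, 8, "contract")
```

With these offsets, `strict` rejects the match, `expand` widens it to the tokens "prise de", and `contract` narrows it to the single token "de".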
