Edspdf

Latest version: v0.10.0

Page 1 of 5

0.10.0

Added

- Support packaging models made in setuptools-based projects

Fixed

- Support packaging with poetry 2.0

Changed

- Handle cases like a distant superscript ("³ something"), where the superscript and the rest of the text were parsed as two lines, one above the other, when they should be on the same line.

0.9.3

- Support pydantic v2

0.9.2

Changed

- Default to fp16 when inferring with gpu
- Support `inputs` parameter in `TrainablePipe.postprocess(...)` method (as in edsnlp)
- We now check that the user isn't trying to write a single file in a split fashion (when `write_in_worker is True` or `num_rows_per_file is not None`) and raise an error if they are
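
To illustrate the last check, here is a minimal sketch of that kind of validation. It is not edspdf's actual code: the function name `check_single_file_output`, the `extension` parameter, and the rule "a path ending in the extension is a single file" are assumptions made for illustration. Only the `write_in_worker` and `num_rows_per_file` parameter names come from the changelog entry.

```python
from typing import Optional


def check_single_file_output(
    path: str,
    write_in_worker: bool = False,
    num_rows_per_file: Optional[int] = None,
    extension: str = ".parquet",  # assumed way to recognise a single-file target
) -> None:
    """Raise if a single output file is requested together with split writing."""
    is_single_file = path.endswith(extension)
    # Writing from workers or capping rows per file both produce several files.
    is_split = write_in_worker or num_rows_per_file is not None
    if is_single_file and is_split:
        raise ValueError(
            f"Cannot write the single file {path!r} in a split fashion: "
            "use a directory path, or disable write_in_worker / num_rows_per_file."
        )


# A directory target with split writing passes the check.
check_single_file_output("output_dir/", write_in_worker=True)
```

Failing early like this avoids several workers silently overwriting the same file.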

Fixed

- Batches full of empty content boxes no longer crash the `huggingface-embedding` component
- Ensure models are always loaded in non training mode
- Improved performance of `edspdf.data` methods over a filesystem (`fs` parameter)

0.9.1

Fixed

- It is now possible to recursively retrieve pdf files in a directory using `edspdf.data.read_files`
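
As a rough illustration of what recursive retrieval means here, the sketch below walks a directory tree with the standard library only. It does not reproduce how `edspdf.data.read_files` is implemented, and the helper name `iter_pdf_files` and its `recursive` flag are hypothetical.

```python
from pathlib import Path
from typing import Iterator


def iter_pdf_files(root: str, recursive: bool = True) -> Iterator[Path]:
    """Yield the PDF files under `root`, optionally descending into subdirectories."""
    base = Path(root)
    # "**/" makes the glob pattern match at any depth below `base`.
    pattern = "**/*.pdf" if recursive else "*.pdf"
    # Sort for a deterministic order, and skip directories that happen to end in ".pdf".
    yield from sorted(p for p in base.glob(pattern) if p.is_file())
```

With `recursive=False`, only files directly inside `root` are returned.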

0.9.0

Added

- New unified `edspdf.data` api (pdf files, pandas, parquet) and LazyCollection object
to efficiently read / write data from / to different formats & sources. This API has
been heavily inspired by the `edsnlp.data` API.
- New unified processing API to select the execution backend via `data.set_processing(...)`
to replace the old `accelerators` API (which is now deprecated, but still available).
- `huggingface-embedding` now supports quantization and other `AutoModel.from_pretrained` kwargs
- It is now possible to convert a label to multiple labels in the `simple-aggregator` component:

```ini
# To build the "text" field, we will aggregate "title", "body" and "table" lines,
# and output "title" lines in a separate field as well.
label_map = {
    "text" : [ "title", "body", "table" ],
    "title": "title",
}
```
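
The effect of such a mapping can be sketched in plain Python. This is a toy model under stated assumptions, not the `simple-aggregator` implementation: the function `aggregate_lines`, the `(label, text)` tuple representation of lines, and the newline join are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

LabelMap = Dict[str, Union[str, List[str]]]


def aggregate_lines(lines: List[Tuple[str, str]], label_map: LabelMap) -> Dict[str, str]:
    """Build one output field per key of `label_map` from (label, text) lines."""
    # Normalise every value to a list so "title" and ["title"] behave the same.
    sources = {
        field: [labels] if isinstance(labels, str) else list(labels)
        for field, labels in label_map.items()
    }
    # A line may feed several fields: "title" lines land in both "text" and "title".
    return {
        field: "\n".join(text for label, text in lines if label in labels)
        for field, labels in sources.items()
    }


label_map = {
    "text": ["title", "body", "table"],
    "title": "title",
}
lines = [
    ("title", "Report"),
    ("body", "First paragraph."),
    ("footer", "Page 1"),
    ("table", "a | b"),
]
result = aggregate_lines(lines, label_map)
```

Here `result["text"]` gathers the title, body and table lines in document order, `result["title"]` holds only the title line, and the footer line is dropped because no field lists its label.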


Fixed

- `huggingface-embedding` now resizes bbox features for large PDFs, instead of making the model crash
- `huggingface-embedding` and `sub-box-cnn-pooler` now handle empty PDFs correctly

0.8.1

Fixed

- Fix typing to allow passing an accelerator dict to `Pipeline.pipe(...)`
- Removed multiprocessing accelerator debug output
- Fixed absolute links in github-pages docs (e.g. image assets)

Changed

- Added auto-links to components in the docs (by comparing span contents with entry points)
