Edspdf

Latest version: v0.10.0

0.8.0

Added

- Add multi-modal transformers (`huggingface-embedding`) with windowing options
- Add `render_page` option to `pdfminer` extractor, for multi-modal PDF features
- Add inference utilities (`accelerators`), with simple single-process support and multi-GPU / CPU support
- Add packaging utilities (`pipeline.package(...)`) to build a pip-installable package from a pipeline
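
The windowing idea mentioned above can be illustrated with a minimal sketch: a long sequence is cut into overlapping windows so that each window fits a transformer's maximum input length. The function name and the `window` / `stride` parameters are assumptions made for this illustration and are not taken from the `huggingface-embedding` component itself.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_windows(items: Sequence[T], window: int, stride: int) -> List[List[T]]:
    """Split a sequence into windows of at most `window` items.

    Consecutive windows start `stride` items apart, so they overlap
    whenever `stride` is smaller than `window`. Both parameter names are
    hypothetical and only serve this sketch.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows: List[List[T]] = []
    start = 0
    while True:
        windows.append(list(items[start:start + window]))
        # Stop once the current window reaches the end of the sequence
        if start + window >= len(items):
            break
        start += stride
    return windows


# Ten toy "tokens", windows of 4 items, each starting 2 items after the previous one
tokens = list(range(10))
windows = split_into_windows(tokens, window=4, stride=2)
```

A smaller stride gives each token more surrounding context at the cost of more windows to embed.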

Changed

- Updated API to follow EDS-NLP's refactoring
- Updated `confit` to 0.4.2 (better errors) and `foldedtensor` to 0.3.0 (better multiprocess support)
- Removed `pipeline.score`. You should use `pipeline.pipe`, a custom scorer and `pipeline.select_pipes` instead.
- Better test coverage
- Use `hatch` instead of `setuptools` to build the package / docs and run the tests
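
As a rough illustration of the "custom scorer" mentioned above, a scorer can be a plain function that compares predicted labels with gold labels. This sketch does not call EDS-PDF at all: the function name, the label values, and the returned dictionary shape are assumptions, and in practice the predictions would come from running documents through `pipeline.pipe`.

```python
from typing import Dict, Sequence


def accuracy_scorer(predicted: Sequence[str], gold: Sequence[str]) -> Dict[str, float]:
    """Return the fraction of predicted labels that match the gold labels.

    Hypothetical helper for illustration; it is not part of the library.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        # Nothing to score: report 0.0 rather than dividing by zero
        return {"accuracy": 0.0}
    correct = sum(p == g for p, g in zip(predicted, gold))
    return {"accuracy": correct / len(gold)}


# Toy labels standing in for per-box predictions and annotations
gold_labels = ["body", "header", "body", "footer"]
predicted_labels = ["body", "body", "body", "footer"]
scores = accuracy_scorer(predicted_labels, gold_labels)
```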

Fixed

- Fixed `attrs` dependency only being installed in dev mode

0.7.0

Major refactoring of the library:

Core features
- new pipeline system whose API is inspired by spaCy
- first-class support for PyTorch
- hybrid model inference and training (rules + deep learning)
- moved from pandas DataFrame to attrs dataclasses (`PDFDoc`, `Page`, `Box`, ...) for representing PDF documents
- new configuration system based on [confit](https://github.com/aphp/confit), with support for instantiation of complex deep learning models, off-the-shelf CLI, ...
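
To give a feel for the move from DataFrame rows to typed objects, here is a minimal sketch of a document model. It uses the standard library's `dataclasses` as a stand-in for attrs so that it runs without extra dependencies, and the class names (`ToyBox`, `ToyPage`, `ToyDoc`) and all fields are hypothetical; they are not the library's actual `PDFDoc`, `Page`, or `Box` definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBox:
    # Hypothetical fields: box corners as page-relative coordinates, plus text
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""


@dataclass
class ToyPage:
    page_num: int
    boxes: List[ToyBox] = field(default_factory=list)


@dataclass
class ToyDoc:
    pages: List[ToyPage] = field(default_factory=list)

    def all_text(self) -> str:
        """Join the text of every box, in page order then box order."""
        return " ".join(box.text for page in self.pages for box in page.boxes)


doc = ToyDoc(
    pages=[
        ToyPage(
            page_num=0,
            boxes=[
                ToyBox(0.0, 0.0, 0.5, 0.1, "Title"),
                ToyBox(0.0, 0.2, 1.0, 0.4, "Body text"),
            ],
        )
    ]
)
```

Compared with a flat table, nested objects like these keep the page and box hierarchy explicit and let type checkers catch misspelled field names.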

Functional features
- new extractors: pymupdf and poppler (separate packages for licensing reasons)
- many deep learning layers (box-transformer, 2d attention with relative position information, ...)
- trainable deep learning classifier
- training recipes for deep learning models

0.6.3

Fixed

- Corrupted PDFs no longer raise an error by default (they are treated as empty PDFs)
- Fix classification and aggregation for empty PDFs
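
The "treat corrupted PDFs as empty" behaviour can be sketched as a wrapper that catches parsing errors and returns an empty result unless told otherwise. Everything here is an assumption made for illustration: `extract_or_empty`, its `raise_on_error` flag, and the toy parser are hypothetical and do not reflect how the extractors are implemented.

```python
from typing import Callable, List


def extract_or_empty(
    parse: Callable[[bytes], List[str]],
    content: bytes,
    raise_on_error: bool = False,
) -> List[str]:
    """Run `parse` on `content`; on failure, return an empty list of lines.

    With `raise_on_error=True` the original exception is re-raised instead.
    """
    try:
        return parse(content)
    except Exception:
        if raise_on_error:
            raise
        return []


def toy_parse(content: bytes) -> List[str]:
    # Hypothetical parser: only accepts content starting with the PDF magic bytes
    if not content.startswith(b"%PDF-"):
        raise ValueError("not a PDF")
    return [content.decode("latin-1")]


ok = extract_or_empty(toy_parse, b"%PDF-1.7 hello")
broken = extract_or_empty(toy_parse, b"garbage")
```

Downstream steps such as classification and aggregation then need to handle the empty case, which is what the second fix above addresses.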

0.6.2

Cast bytes-like extractor inputs as bytes
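
A minimal sketch of what such a cast can look like, assuming "bytes-like" means any object supporting the buffer protocol (`bytearray`, `memoryview`, and similar). The helper name `as_bytes` is hypothetical.

```python
def as_bytes(content) -> bytes:
    """Return `content` as an immutable `bytes` object.

    Going through `memoryview` rejects non-buffer inputs such as `str` or
    `int` with a TypeError instead of silently converting them.
    """
    if isinstance(content, bytes):
        return content
    return bytes(memoryview(content))


from_bytearray = as_bytes(bytearray(b"%PDF-1.7"))
from_view = as_bytes(memoryview(b"abc"))
```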

0.6.1

Performance and CUDA-related fixes.

0.6.0

Many, many changes:
- added torch as the main deep learning framework instead of spaCy and thinc :tada:
- added poppler and mupdf as alternatives to pdfminer
- new pipeline / config / registry system to facilitate consistency between training and inference
- standardization of the exchange format between components with dataclass models (attrs more specifically) instead of pandas dataframes

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.