Pdftext

Latest version: v0.3.18


0.3.0

- Fix bug where hyphens didn't show up at the end of lines
- Improve handling of hyphenated line breaks: join words split by a hyphen before a newline (disable by passing `keep_hyphens`)
- Restructure output to avoid redundant info in the JSON output: keep track of text spans with similar font info instead of individual characters
- Update model to predict blocks more accurately
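
To illustrate the two output changes in this release, here is a minimal sketch. It is not pdftext's actual implementation: the function names `join_hyphenated` and `group_spans`, and the character dict keys `char`, `font`, and `size`, are hypothetical and chosen only for this example. Only the `keep_hyphens` name comes from the changelog entry above.

```python
import re
from itertools import groupby


def join_hyphenated(text: str, keep_hyphens: bool = False) -> str:
    """Join words that were split by a hyphen directly before a newline.

    Hypothetical helper: with keep_hyphens=True the text is returned unchanged.
    """
    if keep_hyphens:
        return text
    # "extrac-\ntion" -> "extraction"; hyphens not followed by a newline are kept.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def group_spans(chars):
    """Merge consecutive characters that share font info into one span.

    `chars` is a list of dicts with the assumed keys "char", "font", "size".
    """
    spans = []
    for (font, size), run in groupby(chars, key=lambda c: (c["font"], c["size"])):
        spans.append({"text": "".join(c["char"] for c in run), "font": font, "size": size})
    return spans
```

Storing one span per run of identical font info means the font fields are written once per run rather than once per character, which is the redundancy the restructured output is meant to avoid.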

0.2.1

- Switch the character box to a `loose` box, to get the full character range

0.2.0

- Rotate bboxes if the PDF is rotated
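
As a rough sketch of what rotating a bounding box with the page involves, the hypothetical function below maps a box from an unrotated page into the rotated page. It assumes a top-left origin, boxes given as `[x0, y0, x1, y1]`, and clockwise rotation in multiples of 90 degrees. These conventions are assumptions for illustration and may differ from what pdftext or pypdfium2 use internally.

```python
def rotate_bbox(bbox, rotation, page_width, page_height):
    """Rotate a bbox clockwise by 0, 90, 180, or 270 degrees.

    Assumes a top-left origin; page_width and page_height describe the
    unrotated page. Hypothetical helper, not part of the pdftext API.
    """
    x0, y0, x1, y1 = bbox

    def rotate_point(x, y):
        if rotation == 0:
            return x, y
        if rotation == 90:
            return page_height - y, x
        if rotation == 180:
            return page_width - x, page_height - y
        if rotation == 270:
            return y, page_width - x
        raise ValueError("rotation must be 0, 90, 180, or 270")

    ax, ay = rotate_point(x0, y0)
    bx, by = rotate_point(x1, y1)
    # Rotation can swap which corner is top-left, so re-normalize the box.
    return [min(ax, bx), min(ay, by), max(ax, bx), max(ay, by)]
```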

0.1.2

- Optimize some internal routines
- Improve the model further

0.1.1

- Added a few extra line-related features
- Improved accuracy of the model

0.1.0

Initial version of pdftext. Fast text extraction based on pypdfium2.

- Extract plain text, sorted into reading order or in PDF order
- Extract structured blocks and lines with font and other information per-character
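
To make the difference between reading order and PDF order concrete, here is a naive sketch that sorts extracted lines top to bottom and then left to right. It is only an illustration under stated assumptions: the function name `sort_reading_order`, the `text` and `bbox` keys, and the `row_tolerance` parameter are hypothetical, and this simple geometric sort is not claimed to be how pdftext orders text.

```python
def sort_reading_order(lines, row_tolerance=5):
    """Sort lines top to bottom, then left to right.

    Each line is a dict with assumed keys "text" and "bbox" ([x0, y0, x1, y1],
    top-left origin). Lines whose top edge falls in the same row_tolerance-sized
    band are treated as one row.
    """
    def key(line):
        x0, y0, _, _ = line["bbox"]
        return (int(y0 // row_tolerance), x0)

    return sorted(lines, key=key)


# Lines as they might appear in PDF (file) order.
pdf_order = [
    {"text": "world", "bbox": [60, 10, 100, 20]},
    {"text": "Hello", "bbox": [10, 11, 50, 21]},
    {"text": "Next", "bbox": [10, 30, 50, 40]},
]
reading_order = [line["text"] for line in sort_reading_order(pdf_order)]
```

A banded sort like this breaks down on multi-column layouts, where text must be grouped into blocks before ordering.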


© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.