Pdfplumber

Latest version: v0.11.4

Page 9 of 10

0.5.3

Fixed
- Allow `import pdfplumber` even if ImageMagick is not installed.
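
A common way to keep a package importable when an optional dependency is missing is to guard the import and degrade gracefully. The sketch below is a generic illustration of that pattern, not pdfplumber's actual fix; `load_optional` and the module name passed to it are hypothetical.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Missing optional dependency: let the caller decide how to degrade.
        return None


# Hypothetical usage: enable image features only when the backend loads.
image_backend = load_optional("not_a_real_module_for_demo")
HAS_IMAGE_SUPPORT = image_backend is not None
```

With this approach the top-level import always succeeds, and an error can be raised later, only when an image feature is actually requested.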

0.5.2

Added
- Access to `curve` points. (E.g., `page.curves[0]["points"]`.)
- Ability for `.draw_line` to draw `curve` points.
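
Once a curve's points are accessible, they can be processed like any other list of coordinates. The following is a minimal sketch, assuming each point is an `(x, y)` tuple; `points_bbox` is a hypothetical helper and the sample data is made up to stand in for `page.curves[0]["points"]`.

```python
def points_bbox(points):
    """Return (x0, top, x1, bottom) enclosing a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up sample data standing in for page.curves[0]["points"].
sample_points = [(10, 50), (20, 30), (40, 35), (60, 55)]
bbox = points_bbox(sample_points)
```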

Changed
- Disaggregated "min_words_vertical" (default: 3) and "min_words_horizontal" (default: 1), removing "text_word_threshold".
- Internally, made `utils.decimalize` a bit more robust; now throws errors on non-decimalizable items.
- Now explicitly ignoring some (obscure) `pdfminer` object attributes.
- Changed the raw input for `.draw_line` from a bounding box to `((x, y), (x, y))`, for consistency with `curve["points"]` and with `Pillow`'s underlying method.
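
Code written against the earlier `.draw_line` input would need its arguments reshaped. The sketch below shows one way to convert, assuming the old bounding box was ordered `(x0, top, x1, bottom)`; `bbox_to_points` is a hypothetical helper, not part of pdfplumber.

```python
def bbox_to_points(bbox):
    """Convert (x0, top, x1, bottom) into a ((x, y), (x, y)) point pair."""
    x0, top, x1, bottom = bbox
    return ((x0, top), (x1, bottom))


# A horizontal line expressed in the old bounding-box style (made-up values).
old_style = (72, 100, 540, 100)
new_style = bbox_to_points(old_style)
```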

Fixed
- Fixed a typo bug that occurred when `.rect_edges` was called before `.edges`.

0.5.1

Added
- Quick-draw `PageImage` methods: `.draw_vline`, `.draw_vlines`, `.draw_hline`, and `.draw_hlines`.
- Boolean parameter `keep_blank_chars` for `.extract_words(...)` and `TableFinder` settings.

Changed
- Increased default `text_tolerance` and `intersection_tolerance` TableFinder values from 1 to 3.

Fixed
- Properly handle conversion of PDFs with transparency to `pillow` images.
- Properly handle `pandas` DataFrames as inputs to multi-draw commands (e.g., `PageImage.draw_rects(...)`).

0.5.0

Added
- Visual debugging features, via `Page.to_image(...)` and `PageImage`. (Introduces `wand` and `pillow` as package requirements.)
- More powerful options for extracting data from tables. See changes below.

Changed
- Entirely overhaul the table-extraction methods. Now based on [Anssi Nurminen's master's thesis](http://dspace.cc.tut.fi/dpub/bitstream/handle/123456789/21520/Nurminen.pdf?sequence=3).
- Disentangle `.crop` from `.intersects_bbox` and `.within_bbox`.
- Change default `x_tolerance` and `y_tolerance` for word extraction from `5` to `3`
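
The names `.within_bbox` and `.intersects_bbox` suggest two different geometric tests: full containment versus any overlap. The sketch below illustrates that general distinction in plain Python, assuming bounding boxes ordered `(x0, top, x1, bottom)`; it is not pdfplumber's implementation, and `is_within` and `intersects` are hypothetical names.

```python
def is_within(obj_bbox, area_bbox):
    """True if the object lies entirely inside the area."""
    ox0, otop, ox1, obottom = obj_bbox
    ax0, atop, ax1, abottom = area_bbox
    return ox0 >= ax0 and otop >= atop and ox1 <= ax1 and obottom <= abottom


def intersects(obj_bbox, area_bbox):
    """True if the object overlaps the area at all."""
    ox0, otop, ox1, obottom = obj_bbox
    ax0, atop, ax1, abottom = area_bbox
    return ox0 < ax1 and ox1 > ax0 and otop < abottom and obottom > atop


area = (0, 0, 100, 100)
inside = (10, 10, 20, 20)        # fully contained in the area
straddling = (90, 90, 120, 120)  # overlaps the area's lower-right corner
outside = (200, 200, 210, 210)   # no overlap with the area
```

An object that straddles the boundary passes the overlap test but fails the containment test, which is why the two filters can return different sets of objects.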

Fixed
- Fix bug stemming from non-decimalized page heights. [h/t jsfenfen]

0.4.6

Added
- Provide access to `Page.page_number`

Changed
- Use `.page_number` instead of `.page_id` as primary identifier. [h/t jsfenfen]
- Change default `x_tolerance` and `y_tolerance` for word extraction from `0` to `5`
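
To see why the default tolerance matters, consider a simplified model in which characters on one line are merged into a word whenever the horizontal gap between them is at most `x_tolerance`. This is an illustrative sketch only, not pdfplumber's actual word-extraction code; `group_into_words` and the character data are hypothetical.

```python
def group_into_words(chars, x_tolerance):
    """Group characters on one line into words by horizontal gap."""
    words = []
    current = []
    for char in sorted(chars, key=lambda c: c["x0"]):
        # Start a new word when the gap to the previous character is too wide.
        if current and char["x0"] - current[-1]["x1"] > x_tolerance:
            words.append("".join(c["text"] for c in current))
            current = []
        current.append(char)
    if current:
        words.append("".join(c["text"] for c in current))
    return words


# Made-up characters: "a" and "b" are 1 unit apart, "b" and "c" are 4 apart.
chars = [
    {"text": "a", "x0": 0, "x1": 5},
    {"text": "b", "x0": 6, "x1": 11},
    {"text": "c", "x0": 15, "x1": 20},
]
```

In this model, a tolerance of `0` splits every character with any gap into its own word, while larger values progressively merge neighbors.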

Fixed
- Provide proper support for rotated pages

0.4.5

Fixed
- Fix bug stemming from when metadata includes a PostScript literal. [h/t boblannon]
