Unstructured-inference

Latest version: v0.8.10

0.6.5

* Add functionality to keep extracted image elements while merging inferred layout with extracted layout
* Fix `source` property for elements generated by pdfminer.
* Add 'OCR-tesseract' and 'OCR-paddle' as sources for elements generated by OCR.

0.6.4

* add a function that automatically scales table crop images based on text height, so that the text height is optimal for the `tesseract` OCR task
* add the new image auto scaling parameters to `config.py`
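The scaling idea in 0.6.4 can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the function names, the target height, and the acceptable band are all assumptions made for the example.

```python
from typing import Tuple


def compute_scale_factor(
    text_height: float,
    optimal_height: float = 20.0,  # assumed target text height in pixels
    low: float = 12.0,             # assumed lower bound of the acceptable band
    high: float = 30.0,            # assumed upper bound of the acceptable band
) -> float:
    """Return the resize factor that brings text to the target height."""
    if text_height <= 0:
        raise ValueError("text_height must be positive")
    # Leave the crop untouched when the text is already a reasonable size.
    if low <= text_height <= high:
        return 1.0
    return optimal_height / text_height


def scaled_size(width: int, height: int, factor: float) -> Tuple[int, int]:
    """Apply the factor to the crop dimensions, keeping at least one pixel."""
    return max(1, round(width * factor)), max(1, round(height * factor))
```

A crop whose text is 10 px tall would be doubled in size under these assumed defaults, while a crop with 20 px text would be left as-is.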

0.6.3

* fix a bug where padded table structure bounding boxes are not shifted back into the original image coordinates correctly
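The kind of coordinate correction described in 0.6.3 can be illustrated with a short sketch. It assumes the same `pad` was added on every side of the image before detection; the function name and the clamping step are hypothetical and not taken from the library.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def shift_box_to_original(box: Box, pad: int, width: int, height: int) -> Box:
    """Map a box detected on a padded image back to the unpadded image."""
    x1, y1, x2, y2 = box
    # Undo the offset introduced by the padding on the left and top edges.
    x1, y1, x2, y2 = x1 - pad, y1 - pad, x2 - pad, y2 - pad
    # Clamp so boxes that reached into the padding stay inside the original.
    return (
        min(max(x1, 0), width),
        min(max(y1, 0), height),
        min(max(x2, 0), width),
        min(max(y2, 0), height),
    )
```

Here `width` and `height` are the dimensions of the original, unpadded image.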

0.6.2

* move the confidence threshold for table transformer to config

0.6.1

* YoloX_quantized is now the default model. This model detects the most diverse set of element types and detects tables better than the previous model.
* Since detection models tend to nest elements inside others (specifically in tables), an algorithm has been added to reduce this behavior. All elements produced by detection models are now disjoint and do not overlap, which helps reduce duplicated content.
* Add `source` property to our elements, so you can know where the information was generated (OCR or detection model)
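One simple way to reduce nested detections, shown here only as an illustration, is to drop any box that is mostly covered by a larger box that has already been kept. This is a simplified stand-in, not the algorithm the library uses; the function names and the 0.9 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a: Box, b: Box) -> float:
    # The overlap is the box between the inner edges, if any.
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def drop_nested(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Keep larger boxes first and drop boxes mostly covered by a kept box."""
    kept: List[Box] = []
    for box in sorted(boxes, key=area, reverse=True):
        box_area = area(box)
        if box_area == 0:
            continue  # degenerate box, nothing to keep
        if all(intersection_area(box, k) / box_area < threshold for k in kept):
            kept.append(box)
    return kept
```

Note that this sketch only removes nested boxes. Partially overlapping boxes below the threshold are left as they are, so the output is not guaranteed to be fully disjoint.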

0.6.0

* add a config class to handle parameter configurations for inference tasks; parameters in the config class can be set via environment variables
* update behavior of `pad_image_with_background_color` so that input `pad` is applied to all sides
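A config class that reads its parameters from environment variables could look roughly like the sketch below. The class name, property names, variable names, and defaults are all hypothetical and chosen for illustration; they are not the library's actual settings.

```python
import os


class ExampleInferenceConfig:
    """Hypothetical config class; names and defaults are illustrative only."""

    @staticmethod
    def _get_float(name: str, default: float) -> float:
        # Fall back to the default when the variable is unset.
        return float(os.environ.get(name, default))

    @staticmethod
    def _get_int(name: str, default: int) -> int:
        return int(os.environ.get(name, default))

    @property
    def confidence_threshold(self) -> float:
        return self._get_float("EXAMPLE_CONFIDENCE_THRESHOLD", 0.5)

    @property
    def image_pad(self) -> int:
        return self._get_int("EXAMPLE_IMAGE_PAD", 25)


# A single shared instance that other modules could import.
example_config = ExampleInferenceConfig()
```

Because each property reads the environment on access, a change to a variable takes effect without recreating the object.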
