Unstructured-inference

Latest version: v0.8.1

Page 9 of 18

0.6.2

* move the confidence threshold for table transformer to config

0.6.1

* YoloX_quantized is now the default model. This model detects a more diverse set of element types and detects tables better than the previous model.
* Since detection models tend to nest elements inside others (specifically in tables), an algorithm has been added to reduce this
behavior. All elements produced by detection models are now disjoint, with no overlapping regions, which helps
reduce duplicated content.
* Add `source` property to our elements, so you can tell where the information came from (OCR or detection model)
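
The nesting reduction described in 0.6.1 can be illustrated with a small sketch. This is not the library's algorithm: `Box`, `intersection_area`, `drop_nested`, the `source` string values, and the 0.9 coverage threshold are all hypothetical names and values chosen for illustration. It only drops boxes that are mostly covered by a larger box; it does not resolve partial overlaps between boxes of similar size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical element box; `source` mimics the property added in 0.6.1."""
    x1: float
    y1: float
    x2: float
    y2: float
    source: str = "detection_model"  # assumed label, e.g. "ocr" or "detection_model"

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two axis-aligned boxes (0.0 if they do not overlap)."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, width) * max(0.0, height)


def drop_nested(boxes: list[Box], threshold: float = 0.9) -> list[Box]:
    """Keep larger boxes and drop any box mostly covered by an already kept one."""
    kept: list[Box] = []
    # Visit boxes from largest to smallest so containers are kept first.
    for box in sorted(boxes, key=lambda b: b.area, reverse=True):
        if box.area == 0:
            continue  # degenerate boxes carry no content
        covered = any(
            intersection_area(box, other) / box.area >= threshold for other in kept
        )
        if not covered:
            kept.append(box)
    return kept


table = Box(0, 0, 100, 100)
cell = Box(10, 10, 20, 20)       # nested inside the table
caption = Box(200, 0, 300, 50)   # separate region
print(drop_nested([cell, table, caption]))
```

In this example the nested `cell` box is dropped while `table` and `caption` survive, so the table's content would be reported only once.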

0.6.0

* add a config class to handle parameter configurations for inference tasks; parameters in the config class can be set via environment variables
* update behavior of `pad_image_with_background_color` so that input `pad` is applied to all sides
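
A config class whose parameters can be overridden through environment variables might look roughly like the sketch below. The class name `SketchInferenceConfig`, the variable names `SKETCH_TABLE_CONFIDENCE_THRESHOLD` and `SKETCH_IMAGE_PAD`, and the default values are all assumptions made for illustration; they are not taken from the library.

```python
import os


class SketchInferenceConfig:
    """Hypothetical config: each property reads its environment variable on access."""

    @staticmethod
    def _get_float(name: str, default: float) -> float:
        raw = os.environ.get(name)
        return float(raw) if raw is not None else default

    @staticmethod
    def _get_int(name: str, default: int) -> int:
        raw = os.environ.get(name)
        return int(raw) if raw is not None else default

    @property
    def table_confidence_threshold(self) -> float:
        # Assumed variable name and default, for illustration only.
        return self._get_float("SKETCH_TABLE_CONFIDENCE_THRESHOLD", 0.5)

    @property
    def image_pad(self) -> int:
        # Assumed variable name and default, for illustration only.
        return self._get_int("SKETCH_IMAGE_PAD", 0)


config = SketchInferenceConfig()
os.environ["SKETCH_TABLE_CONFIDENCE_THRESHOLD"] = "0.8"
print(config.table_confidence_threshold)
```

Reading the environment at access time, rather than once at import, means a value changed during a test run takes effect without reloading the module.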

0.5.31

* Add functionality to extract and save images from the page
* Add functionality to get only "true" embedded images when extracting elements from PDF pages
* Update the layout visualization script to be able to show only image elements if needed
* add an evaluation metric for table comparison based on token similarity
* fix paddle unit tests where `make test` fails since paddle doesn't work on M1/M2 chip locally
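
To give a sense of what a token-similarity metric for tables can look like, here is a minimal sketch built on the standard library's `difflib.SequenceMatcher`. It is not the metric shipped in 0.5.31: the functions `table_tokens` and `token_similarity`, the whitespace tokenization, and the row-major flattening are all assumptions.

```python
from difflib import SequenceMatcher


def table_tokens(table: list[list[str]]) -> list[str]:
    """Flatten a table (list of rows of cell strings) into whitespace tokens."""
    return [token for row in table for cell in row for token in str(cell).split()]


def token_similarity(predicted: list[list[str]], expected: list[list[str]]) -> float:
    """Return a score in [0, 1]; 1.0 means the token sequences are identical."""
    predicted_tokens = table_tokens(predicted)
    expected_tokens = table_tokens(expected)
    if not predicted_tokens and not expected_tokens:
        return 1.0  # two empty tables are treated as identical
    matcher = SequenceMatcher(None, predicted_tokens, expected_tokens, autojunk=False)
    return matcher.ratio()


expected = [["Name", "Qty"], ["Bolt", "12"]]
predicted = [["Name", "Qty"], ["Bolt", "17"]]
print(token_similarity(predicted, expected))
```

Because the comparison works on a flattened token sequence, this sketch rewards correct text in reading order but does not check that tokens landed in the right row or column.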

0.5.28

* add env variable `ENTIRE_PAGE_OCR` to specify whether paddle or tesseract is used for entire-page OCR

0.5.27

* table structure detection now pads the input image by 25 pixels in all 4 directions to improve its recall
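
The effect of padding an image by 25 pixels on every side can be sketched in plain Python, with the image modelled as a list of pixel rows. The helpers `pad_image` and `unpad_box` are hypothetical, as is the white fill value; the library works on real image objects, and shifting detected boxes back into the original coordinate frame is an assumption about what a caller would need afterwards.

```python
def pad_image(pixels: list[list[int]], pad: int = 25, fill: int = 255) -> list[list[int]]:
    """Return a copy of `pixels` with `pad` rows/columns of `fill` on all four sides."""
    width = len(pixels[0]) if pixels else 0
    padded_width = width + 2 * pad
    top = [[fill] * padded_width for _ in range(pad)]
    middle = [[fill] * pad + list(row) + [fill] * pad for row in pixels]
    bottom = [[fill] * padded_width for _ in range(pad)]
    return top + middle + bottom


def unpad_box(box: tuple[int, int, int, int], pad: int = 25) -> tuple[int, int, int, int]:
    """Shift an (x1, y1, x2, y2) box found on the padded image back to the original."""
    x1, y1, x2, y2 = box
    return (x1 - pad, y1 - pad, x2 - pad, y2 - pad)


image = [[0, 1, 2], [3, 4, 5]]             # 2 rows x 3 columns
padded = pad_image(image)
print(len(padded), len(padded[0]))          # rows and columns after padding
print(unpad_box((30, 40, 60, 80)))
```

The original pixel at row 0, column 0 ends up at row 25, column 25 of the padded image, which is why coordinates found on the padded image are shifted back by the same amount.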
