Unstructured

Latest version: v0.17.2


Page 17 of 39

0.10.11

Not secure
Enhancements

* Bump unstructured-inference
  * Combine entire-page OCR output with layout-detected elements to ensure full coverage of the page (0.5.19)

Features

* Add in ingest cli s3 writer

Fixes

* Fix a bug where `xy-cut` sorting attempts to sort elements without valid coordinates; `xy-cut` sorting now runs only when **all** elements have valid coordinates
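
The guard described in the fix above can be sketched as follows. This is a minimal illustration, not the library's implementation: the `(text, coordinates)` element shape and the function name are hypothetical, and a plain top-to-bottom, left-to-right sort stands in for the actual `xy-cut` algorithm.

```python
from typing import List, Optional, Tuple

# Hypothetical element shape for illustration: (text, top-left (x, y) or None).
Element = Tuple[str, Optional[Tuple[float, float]]]


def sort_if_all_have_coordinates(elements: List[Element]) -> List[Element]:
    """Sort by position only when every element has coordinates."""
    if not all(coords is not None for _, coords in elements):
        # At least one element lacks coordinates: keep the original order.
        return list(elements)
    # Simplified stand-in for xy-cut: top-to-bottom, then left-to-right.
    return sorted(elements, key=lambda el: (el[1][1], el[1][0]))
```

With this guard, a single element without coordinates leaves the input order untouched instead of raising during the sort.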

0.10.10

Not secure
Enhancements

* Adds `text` as an input parameter to `partition_xml`.
* `partition_xml` no longer runs through `partition_text`, avoiding incorrect splitting
on carriage returns in the XML. Since `partition_xml` no longer calls `partition_text`,
`min_partition` and `max_partition` are no longer supported in `partition_xml`.
* Bump `unstructured-inference==0.5.18`, change non-default detectron2 classification threshold
* Upgrade base image from rockylinux 8 to rockylinux 9
* Serialize IngestDocs to JSON when passing to subprocesses
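
To illustrate the `partition_xml` change above, here is a rough sketch of pulling text directly from XML leaf elements so that line breaks inside an element do not split it. The function name `leaf_texts` is hypothetical and the approach is an assumption; the notes do not describe how `partition_xml` extracts text internally.

```python
import xml.etree.ElementTree as ET
from typing import List


def leaf_texts(xml_text: str) -> List[str]:
    """Return one string per leaf element, keeping internal line breaks."""
    root = ET.fromstring(xml_text)
    texts = []
    for node in root.iter():
        # A leaf has no child elements; its text is kept as a single unit.
        if len(node) == 0 and node.text and node.text.strip():
            texts.append(node.text.strip())
    return texts
```

Because the text never passes through a line-based splitter, size limits such as `min_partition` and `max_partition` have nothing to act on in this path.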

Features

Fixes

* Fix a bug where mismatched `elements` and `bboxes` are passed into `add_pytesseract_bbox_to_elements`

0.10.9

Not secure
Enhancements

* Fix `test_json` to handle only file types that need no extra dependencies (plain text)

Features

* Adds `chunk_by_title` to break a document into sections based on the presence of `Title`
elements.
* Add new extraction function `extract_image_urls_from_html` to extract all image-related URLs from HTML text.
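
The idea behind `chunk_by_title` can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's code: elements are modeled as hypothetical `(category, text)` tuples, and the function name `chunk_on_titles` is made up for the example.

```python
from typing import List, Tuple

# Hypothetical element shape for illustration: (category, text).
Element = Tuple[str, str]


def chunk_on_titles(elements: List[Element]) -> List[List[Element]]:
    """Start a new section each time a "Title" element appears."""
    sections: List[List[Element]] = []
    for element in elements:
        category, _ = element
        # Open a section on every Title, and also for leading non-title content.
        if category == "Title" or not sections:
            sections.append([])
        sections[-1].append(element)
    return sections
```

Content that appears before the first `Title` lands in its own leading section in this sketch; whether the real function behaves the same way is not stated in the notes.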

Fixes

* Make cv2 dependency optional
* Edit `add_pytesseract_bbox_to_elements`'s (`ocr_only` strategy) `metadata.coordinates.points` return type to `Tuple` for consistency.
* Re-enable test-ingest-confluence-diff for ingest tests
* Fix syntax for ingest test check number of files
* Fix csv and tsv partitioners losing the first line of the files when creating elements
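
As an illustration of the csv/tsv fix above, the sketch below reads delimited text in a way that keeps the first row as data. The notes do not say what caused the original bug, so this only shows one reading pattern that avoids dropping the first line; `rows_to_text` is a hypothetical helper.

```python
import csv
import io
from typing import List


def rows_to_text(raw: str, delimiter: str = ",") -> List[str]:
    """Join each row's cells into one line, keeping the first row."""
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    # csv.reader yields every row, including the first; nothing is treated as a header.
    return [" ".join(cell.strip() for cell in row) for row in reader if row]
```

Passing `delimiter="\t"` handles tab-separated input the same way.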

0.10.8

Not secure
Enhancements

* Release docker image that installs Python 3.10 rather than 3.8

Features

Fixes

0.10.7

Not secure
Enhancements

Features

Fixes

* Remove overly aggressive ListItem chunking for images and PDFs, which typically resulted in incoherent elements.

0.10.6

Not secure
Enhancements

* Enable `partition_email` and `partition_msg` to detect if an email is PGP encrypted. If
an email is PGP encrypted, the functions will return an empty list of elements and
emit a warning about the encrypted content.
* Add threaded Slack conversations into Slack connector output
* Add functionality to sort elements using the `xy-cut` sorting approach in `partition_pdf` for the `hi_res` and `fast` strategies
* Bump unstructured-inference
  * Set OMP_THREAD_LIMIT to 1 if not set for better tesseract perf (0.5.17)
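
The PGP behavior described above (empty result plus a warning) might look roughly like the following sketch. It assumes detection via the PGP/MIME content type defined in RFC 3156; the notes do not say how `partition_email` actually detects encryption, inline-armored messages are not handled here, and both function names are hypothetical.

```python
import warnings
from email import message_from_string
from email.message import Message
from typing import List


def looks_pgp_encrypted(msg: Message) -> bool:
    """Check for the PGP/MIME encrypted content type (RFC 3156)."""
    if msg.get_content_type() != "multipart/encrypted":
        return False
    protocol = msg.get_param("protocol")
    return isinstance(protocol, str) and protocol.lower() == "application/pgp-encrypted"


def partition_email_sketch(raw: str) -> List[str]:
    """Return text chunks, or an empty list with a warning if encrypted."""
    msg = message_from_string(raw)
    if looks_pgp_encrypted(msg):
        warnings.warn("Encrypted email detected; returning an empty list.")
        return []
    payload = msg.get_payload()
    # Only simple single-part text bodies are handled in this sketch.
    if isinstance(payload, str) and payload.strip():
        return [payload.strip()]
    return []
```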

Features

* Extract coordinates from PDFs and images when using OCR only strategy and add to metadata

Fixes

* Update `partition_html` to respect the order of `<pre>` tags.
* Fix bug in `partition_pdf_or_image` where two partitions were called if `strategy == "ocr_only"`.
* Bump unstructured-inference
  * Fix issue where temporary files were being left behind (0.5.16)
* Adds deprecation warning for the `file_filename` kwarg to `partition`, `partition_via_api`,
and `partition_multiple_via_api`.
* Fix documentation build workflow by pinning dependencies
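
A deprecation warning for a keyword argument, as mentioned above for `file_filename`, is commonly implemented along these lines. This is a generic sketch: `resolve_filename` and the `replacement_filename` parameter are hypothetical names chosen for illustration, and the notes do not state which argument replaces `file_filename`.

```python
import warnings
from typing import Optional


def resolve_filename(
    replacement_filename: Optional[str] = None,
    file_filename: Optional[str] = None,
) -> Optional[str]:
    """Accept the deprecated kwarg, warn, and map it to its replacement."""
    if file_filename is not None:
        warnings.warn(
            "The file_filename kwarg is deprecated; use the replacement kwarg instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        # The new argument wins if both are supplied.
        if replacement_filename is None:
            replacement_filename = file_filename
    return replacement_filename
```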

