Unstructured

Latest version: v0.17.2

0.5.10

Not secure

Enhancements

* Updated inference package
* Add sender, recipient, date, and subject to element metadata for emails

Features

* Added `--download-only` parameter to `unstructured-ingest`

Fixes

* FileNotFound error when filename is provided but file is not on disk

0.5.9

Not secure

Enhancements

Features

Fixes

* Convert file to str in helper `split_by_paragraph` for `partition_text`

0.5.8

Not secure

Enhancements

* Update `elements_to_json` to return string when filename is not specified
* `elements_from_json` may take a string instead of a filename with the `text` kwarg
* `detect_filetype` now does a final fallback to file extension.
* Empty tags are now skipped during the depth check for HTML processing.
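The extension fallback mentioned for `detect_filetype` can be illustrated with a small standard-library sketch. This is not the library's implementation: the function name `guess_filetype`, the `sniffed_mime` parameter, and the set of "inconclusive" MIME types are all assumptions made for illustration.

```python
import mimetypes
from typing import Optional

# Assumption: these sniffed types are treated as too generic to trust.
INCONCLUSIVE = {"application/octet-stream", "text/plain"}


def guess_filetype(filename: str, sniffed_mime: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer a content-sniffed MIME type, and fall
    back to the file extension only when sniffing was inconclusive."""
    if sniffed_mime and sniffed_mime not in INCONCLUSIVE:
        return sniffed_mime
    # Final fallback: map the extension to a MIME type.
    guessed, _encoding = mimetypes.guess_type(filename)
    # If the extension is unknown too, return whatever sniffing produced.
    return guessed or sniffed_mime
```

The point of ordering the checks this way is that the extension is consulted last, so a reliable content-based result is never overridden by a misleading file name.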

Features

* Add local file system to `unstructured-ingest`
* Add `--max-docs` parameter to `unstructured-ingest`
* Added `partition_msg` for processing Microsoft Outlook .msg files.

Fixes

* `convert_file_to_text` now passes through the `source_format` and `target_format` kwargs.
Previously they were hard-coded.
* Partitioning functions that accept a `text` kwarg no longer raise an error if an empty
string is passed (an empty list of elements is returned instead).
* `partition_json` no longer fails if the input is an empty list.
* Fixed bug in `chunk_by_attention_window` that caused the last word in segments to be cut off
in some cases.
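To make the chunking fix concrete, here is a minimal sketch of window-based chunking that never drops a trailing word. It is not the `chunk_by_attention_window` implementation: the name `chunk_words` is hypothetical, and it counts whitespace-separated words where the real function would presumably count tokenizer tokens.

```python
from typing import List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Hypothetical helper: split text into chunks of at most
    `max_words` whitespace-separated words, keeping every word."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Step through the word list in strides of max_words; the final
    # slice is simply shorter, so the last word is never lost.
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

A useful invariant for this kind of code is that joining the chunks back together yields the original word sequence, which is exactly the property an off-by-one error at a segment boundary would break.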

BREAKING CHANGES

* `stage_for_transformers` now returns a list of elements, making it consistent with other
staging bricks

0.5.7

Not secure

Enhancements

* Refactored codebase using `exactly_one`
* Adds ability to pass headers when passing a URL to `partition_html()`
* Added optional `content_type` and `file_filename` parameters to `partition()` to bypass file detection
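The idea behind an `exactly_one` style check can be sketched as follows. This is an illustration written for this page, not the library's code: the treatment of `None` and empty strings as "not provided" and the error message are assumptions.

```python
from typing import Any


def exactly_one(**kwargs: Any) -> None:
    """Sketch of a validator: raise ValueError unless exactly one of
    the keyword arguments carries a value."""
    # Assumption: None and "" both count as "not provided".
    provided = [
        name
        for name, value in kwargs.items()
        if value is not None and value != ""
    ]
    if len(provided) != 1:
        names = ", ".join(kwargs)
        raise ValueError(f"Exactly one of {names} must be specified.")
```

A partitioning function that accepts several mutually exclusive inputs could call `exactly_one(filename=filename, file=file, text=text)` at the top of its body instead of repeating the same chain of `if` checks.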

Features

* Add `--flatten-metadata` parameter to `unstructured-ingest`
* Add `--fields-include` parameter to `unstructured-ingest`

Fixes

0.5.6

Not secure

Enhancements

* `contains_english_word()`, used heavily in text processing, is 10x faster.

Features

* Add `--metadata-include` and `--metadata-exclude` parameters to `unstructured-ingest`
* Add `clean_non_ascii_chars` to remove non-ASCII characters from a Unicode string
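Stripping non-ASCII characters can be done with the standard library alone. The sketch below uses a hypothetical name, `strip_non_ascii`, and shows one common approach; it is not claimed to match how `clean_non_ascii_chars` is implemented.

```python
def strip_non_ascii(text: str) -> str:
    """Hypothetical helper: drop every character outside the ASCII range."""
    # Encoding with errors="ignore" silently discards characters that
    # have no ASCII representation; decoding turns the bytes back to str.
    return text.encode("ascii", "ignore").decode("ascii")
```

Note that characters are removed rather than transliterated, so accented letters disappear entirely and any surrounding whitespace is left untouched.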

Fixes

* Fix problem with PDF partition (duplicated test)

0.5.5

* **Improve orig_elements handling in astra and neo4j connectors**
