Unstructured

Latest version: v0.16.11

0.5.2

Enhancements

* Fully move from printing to logging.
* `unstructured-ingest` now uses a default `--download_dir` of `$HOME/.cache/unstructured/ingest`
rather than a "tmp-ingest-" dir in the working directory.
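The new default location can be illustrated with a short sketch. The helper name `resolve_download_dir` is hypothetical and is not part of `unstructured-ingest`; only the default path itself comes from the entry above.

```python
from pathlib import Path
from typing import Optional


def resolve_download_dir(download_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: pick an explicit directory or the per-user default."""
    if download_dir:
        # An explicitly supplied directory always wins.
        return Path(download_dir).expanduser()
    # Default named in the changelog: $HOME/.cache/unstructured/ingest
    return Path.home() / ".cache" / "unstructured" / "ingest"
```

Keeping downloads under a per-user cache directory, rather than in the working directory, means repeated runs from different locations can find the same files.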

Features

Fixes

* `setup_ubuntu.sh` no longer fails in some contexts by interpreting
`DEBIAN_FRONTEND=noninteractive` as a command.
* `unstructured-ingest` no longer re-downloads files when `--preserve-downloads`
is used without `--download-dir`.
* Fixed an issue that was causing text to be skipped in some HTML documents.

0.5.1

Enhancements

Features

Fixes

* Fixes an error that sometimes caused JavaScript to appear in the output of `partition_html`.
* Fix several issues with the `requires_dependencies` decorator, including the error message
and how it was used, which had caused an error for `unstructured-ingest --github-url ...`.

0.5.0

Enhancements

* Add `requires_dependencies` Python decorator to check that dependencies are installed before
instantiating a class or running a function.
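The general idea behind a decorator of this kind can be sketched as follows. This is not the library's implementation: the name `require_modules` and its error message are assumptions made for illustration, and the real `requires_dependencies` may have a different signature.

```python
import functools
import importlib.util
from typing import Callable, List, Union


def require_modules(modules: Union[str, List[str]]) -> Callable:
    """Hypothetical decorator: fail early if any top-level module is missing."""
    names = [modules] if isinstance(modules, str) else list(modules)

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module cannot be located.
            missing = [n for n in names if importlib.util.find_spec(n) is None]
            if missing:
                raise ImportError(
                    "Missing required dependencies: " + ", ".join(missing)
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

In this sketch, applying the decorator to a class's `__init__` method gives the "before instantiating a class" behavior, since the check then runs before the object is constructed.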

Features

* Added Wikipedia connector for ingest CLI.

Fixes

* Fix `process_document` file cleaning on failure
* Fixes an error introduced in the metadata tracking commit that caused `NarrativeText`
and `FigureCaption` elements to be represented as `Text` in HTML documents.

0.4.16

Enhancements

* Fall back to using file extensions for filetype detection if `libmagic` is not present.
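A fallback of this kind might look like the sketch below. It assumes the `python-magic` bindings (imported as `magic`) for the `libmagic` path and uses the standard-library `mimetypes` module for the extension path; the function names are hypothetical and the library's own detection logic may differ.

```python
import mimetypes
from typing import Optional

try:
    # Assumption: the python-magic bindings, which need the libmagic C library.
    import magic

    LIBMAGIC_AVAILABLE = True
except ImportError:
    LIBMAGIC_AVAILABLE = False


def guess_from_extension(filename: str) -> Optional[str]:
    """Guess a MIME type from the file extension alone."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


def detect_mime_type(filename: str) -> Optional[str]:
    """Inspect file contents via libmagic if available, else use the extension."""
    if LIBMAGIC_AVAILABLE:
        return magic.from_file(filename, mime=True)
    return guess_from_extension(filename)
```

The extension-based path is less reliable than content inspection, since a misnamed file will be misclassified, but it keeps detection working on systems where `libmagic` cannot be installed.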

Features

* Added setup script for Ubuntu.
* Added GitHub connector for ingest CLI.
* Added `partition_md` partitioner.
* Added Reddit connector for ingest CLI.

Fixes

* Initializes connector properly in `ingest.main::MainProcess`.
* Restricts version of `unstructured-inference` to avoid multithreading issue.

0.4.15

Enhancements

* Added `elements_to_json` and `elements_from_json` for easier serialization/deserialization
* `convert_to_dict`, `dict_to_elements` and `convert_to_csv` are now aliases for functions
that use the ISD terminology.
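The round trip that a pair of serialization helpers provides can be illustrated with a self-contained sketch. Everything here is hypothetical: `SketchElement` and the `sketch_*` functions are stand-ins and do not reproduce the library's element classes or the actual behavior of `elements_to_json` and `elements_from_json`.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SketchElement:
    """Hypothetical stand-in for a document element."""

    type: str
    text: str


def sketch_elements_to_json(elements: List[SketchElement]) -> str:
    # Convert each element to a plain dict, then dump the list as JSON.
    return json.dumps([asdict(element) for element in elements], indent=2)


def sketch_elements_from_json(payload: str) -> List[SketchElement]:
    # Rebuild one element per JSON object, preserving order.
    return [SketchElement(**item) for item in json.loads(payload)]
```

A useful property to check for any such pair is that deserializing the serialized output yields a list equal to the original, with no elements dropped or reordered.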

Fixes

* Update to ensure all elements are preserved during serialization/deserialization

0.4.14

* Automatically install `nltk` models in the `tokenize` module.
