* Fully move from printing to logging.
* `unstructured-ingest` now uses a default `--download_dir` of `$HOME/.cache/unstructured/ingest` rather than a "tmp-ingest-" dir in the working directory.
Features
Fixes
* `setup_ubuntu.sh` no longer fails in some contexts by interpreting `DEBIAN_FRONTEND=noninteractive` as a command
* `unstructured-ingest` no longer re-downloads files when `--preserve-downloads` is used without `--download-dir`.
* Fixed an issue that was causing text to be skipped in some HTML documents.
0.5.1
Enhancements
Features
Fixes
* Fixes an error causing JavaScript to appear in the output of `partition_html` sometimes.
* Fix several issues with the `requires_dependencies` decorator, including the error message and how it was used, which had caused an error for `unstructured-ingest --github-url ...`.
0.5.0
Enhancements
* Add `requires_dependencies` Python decorator to check dependencies are installed before instantiating a class or running a function
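
A dependency-checking decorator of this kind can be sketched roughly as follows. This is an illustrative sketch only, not the library's implementation: the argument shape (a string or list of top-level module names), the error message, and the helper functions `dump` and `unavailable` are all assumptions made for the example.

```python
import importlib.util
from functools import wraps
from typing import Callable, List, Union


def requires_dependencies(dependencies: Union[str, List[str]]) -> Callable:
    """Hypothetical sketch: check that modules are importable before a call."""
    if isinstance(dependencies, str):
        dependencies = [dependencies]

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module cannot be located.
            missing = [
                name for name in dependencies
                if importlib.util.find_spec(name) is None
            ]
            if missing:
                raise ImportError(
                    "Missing required dependencies: " + ", ".join(missing)
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_dependencies(["json"])
def dump(data):
    import json

    return json.dumps(data)


@requires_dependencies("surely_not_an_installed_package_xyz")
def unavailable():
    return "unreachable"
```

Deferring the check to call time, rather than import time, lets a module with optional extras be imported even when those extras are not installed.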
Features
* Added Wikipedia connector for ingest cli.
Fixes
* Fix `process_document` file cleaning on failure
* Fixes an error introduced in the metadata tracking commit that caused `NarrativeText` and `FigureCaption` elements to be represented as `Text` in HTML documents.
0.4.16
Enhancements
* Fallback to using file extensions for filetype detection if `libmagic` is not present
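
The general shape of such a fallback might look like the sketch below. It assumes the `python-magic` bindings for `libmagic` and uses the standard-library `mimetypes` module for the extension lookup; the function names `mime_type_from_extension` and `guess_mime_type` are hypothetical and are not the library's own detection code.

```python
import mimetypes
from typing import Optional

try:
    import magic  # python-magic; the import fails if libmagic is unavailable

    LIBMAGIC_AVAILABLE = True
except ImportError:
    LIBMAGIC_AVAILABLE = False


def mime_type_from_extension(filename: str) -> Optional[str]:
    """Guess a MIME type from the file extension alone."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


def guess_mime_type(filename: str) -> Optional[str]:
    """Prefer content sniffing via libmagic; fall back to the extension."""
    if LIBMAGIC_AVAILABLE:
        try:
            return magic.from_file(filename, mime=True)
        except Exception:
            # Unreadable file or an incompatible `magic` module: fall through.
            pass
    return mime_type_from_extension(filename)
```

Extension-based detection is less reliable than inspecting file contents, since a misnamed file will be misclassified, which is why it is only a fallback here.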
Features
* Added setup script for Ubuntu
* Added GitHub connector for ingest cli.
* Added `partition_md` partitioner.
* Added Reddit connector for ingest cli.
Fixes
* Initializes connector properly in `ingest.main::MainProcess`
* Restricts version of `unstructured-inference` to avoid multithreading issue
0.4.15
Enhancements
* Added `elements_to_json` and `elements_from_json` for easier serialization/deserialization
* `convert_to_dict`, `dict_to_elements` and `convert_to_csv` are now aliases for functions that use the ISD terminology.
Fixes
* Update to ensure all elements are preserved during serialization/deserialization
0.4.14
* Automatically install `nltk` models in the `tokenize` module.