Amazon-textract-textractor

Latest version: v1.8.5



1.8.5

What's Changed

- Fix bug in convert that caused an exception on empty pages.

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.8.4...v1.8.5

1.8.4

What's Changed

* Add check for None bounding boxes for AnalyzeExpense by Belval
* Allow Custom Separator in `Document.export_kv_to_csv()` by Chuukwudi
* Update analyze_document type hint by ryangamble
* Fix invalid escape in BoundingBox docstring by simonschmidt in https://github.com/aws-samples/amazon-textract-textractor/pull/395
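
The custom separator change can be pictured with a small standalone sketch. It uses only the standard `csv` module and does not call the library. The function name `kv_pairs_to_csv` and its `sep` argument are hypothetical, chosen for illustration, and are not the signature of `Document.export_kv_to_csv()`.

```python
import csv
import io


def kv_pairs_to_csv(pairs, sep=","):
    """Serialize (key, value) pairs to CSV text using a caller-chosen separator.

    Hypothetical helper for illustration; not part of textractor.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerow(["Key", "Value"])
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = [("Name", "Doe, Jane"), ("Total", "12.50")]
print(kv_pairs_to_csv(pairs, sep=";"))
```

A separator other than the comma is useful when values themselves contain commas, as in `"Doe, Jane"` above: with `;` the field needs no quoting.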

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.8.3...v1.8.4

1.8.3

What's Changed
* Id in html output by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/386
* Escape html output by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/387
* Fix table indexing returning too many cells
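
The two HTML changes above (IDs in the output, escaped content) can be sketched with the standard library alone. The helper `render_cell` is hypothetical and only illustrates the idea; it is not how the library builds its HTML.

```python
import html


def render_cell(text, element_id):
    """Wrap text in a <td>, escaping both the content and the id attribute.

    Hypothetical helper for illustration; not part of textractor.
    """
    safe_text = html.escape(text)
    safe_id = html.escape(element_id, quote=True)
    return f'<td id="{safe_id}">{safe_text}</td>'


print(render_cell("Revenue < $5M & rising", "cell-1"))
```

Without escaping, characters such as `<` and `&` in extracted text could produce malformed markup.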

⚠️ Breaking changes

* To support IDs in HTML output, the layout Table created for a TABLE prediction no longer shares the same ID as the table itself.

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.8.2...v1.8.3

1.8.2

What's Changed
* Fix pypdfium2 failing to parse PDFs in bytearray format by Belval
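
For callers on an affected version, a caller-side sketch follows. It is not the library's actual fix or its backend-selection logic. It assumes that converting bytes-like input to immutable `bytes` is acceptable for the caller, and it checks which of the two rasterization packages named in these notes (`pypdfium2`, `pdf2image`) is installed. Both helper names are hypothetical.

```python
import importlib.util


def pick_pdf_backend():
    """Return the name of the first installed rasterization backend, or None."""
    for module_name in ("pypdfium2", "pdf2image"):
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None


def as_bytes(data):
    """Normalize bytes-like input (bytes, bytearray, memoryview) to immutable bytes."""
    return bytes(data)


pdf_data = as_bytes(bytearray(b"%PDF-1.7"))
print(type(pdf_data).__name__, pick_pdf_backend())
```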

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.8.1...v1.8.2

1.8.1

What's Changed
* Fix .to_markdown() raising an exception on missing local config by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/381


**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.8.0...v1.8.1

1.8.0

What's Changed
- Improve HTML linearization
- Add HTML table linearization format that uses merged cells information for `colspan` and `rowspan`
- Add prefix and suffix for `LAYOUT_FOOTER` and `LAYOUT_ENTITY`
- Add `<html><body>...</body></html>` to the output when calling `Document.to_html()`
- Use `pypdfium2` for PDF rasterization when available instead of `pdf2image`. This allows for better portability as the former does not have a dependency on OS libraries and should work out of the box with Lambda and SageMaker.
- Fix expenses with no summary fields
- Replace region mismatch with invalid S3 object exception
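
How merged-cell information maps to `colspan` and `rowspan` can be shown with a simplified, self-contained sketch. The `Cell` dataclass and `table_to_html` function are hypothetical stand-ins, not the library's classes or its linearization code. The sketch assumes each merged cell is listed once, at its top-left position.

```python
import html
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical cell record; not the library's table cell class.
    row: int
    col: int
    text: str
    row_span: int = 1
    col_span: int = 1


def table_to_html(cells):
    """Render cells as an HTML table, emitting span attributes for merged cells."""
    rows = {}
    for cell in sorted(cells, key=lambda c: (c.row, c.col)):
        rows.setdefault(cell.row, []).append(cell)

    lines = ["<table>"]
    for row_index in sorted(rows):
        parts = []
        for cell in rows[row_index]:
            attrs = ""
            if cell.col_span > 1:
                attrs += f' colspan="{cell.col_span}"'
            if cell.row_span > 1:
                attrs += f' rowspan="{cell.row_span}"'
            parts.append(f"<td{attrs}>{html.escape(cell.text)}</td>")
        lines.append("<tr>" + "".join(parts) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


cells = [
    Cell(1, 1, "Quarter", col_span=2),
    Cell(2, 1, "Q1"),
    Cell(2, 2, "Q2"),
]
print(table_to_html(cells))
```

Here the header cell spans both columns, so it is written once with `colspan="2"` instead of being duplicated.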

Backward-incompatible changes
* This update removes `s3_output_path` from the synchronous functions as `s3_output_path` is not a supported parameter for the Textract Synchronous API
* This update changes the exception raised by the `textractor.py` functions, which will no longer raise `RegionMismatchError` (the class is kept in `textractor.exceptions` for backward compatibility).
* This update removes `confidence_score` from `KeyValue` entities in favour of `_confidence` which is used for all other entities.

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.12...v1.8.0


© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.