Amazon-textract-textractor

Latest version: v1.9.0

1.8.0

What's Changed
- Improve HTML linearization
- Add HTML table linearization format that uses merged cells information for `colspan` and `rowspan`
- Add prefix and suffix for `LAYOUT_FOOTER` and `LAYOUT_ENTITY`
- Add `<html><body>...</body></html>` to the output when calling `Document.to_html()`
- Use `pypdfium2` for PDF rasterization when available instead of `pdf2image`. This allows for better portability as the former does not have a dependency on OS libraries and should work out of the box with Lambda and SageMaker.
- Fix expenses with no summary fields
- Replace region mismatch with invalid S3 object exception

Backward-incompatible changes
* This update removes `s3_output_path` from the synchronous functions, as it is not a supported parameter of the Textract synchronous API.
* This update changes the exception raised by the `textractor.py` functions: they no longer raise `RegionMismatchError` (which is, however, kept in `textractor.exceptions` for backward compatibility).
* This update removes `confidence_score` from `KeyValue` entities in favour of `_confidence`, which is used for all other entities.
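Code that must run against releases on both sides of the `confidence_score` removal could use a small compatibility helper. The sketch below is a hedged illustration: `get_confidence` and the two stand-in classes are hypothetical, and it assumes `confidence_score` was a plain attribute in older releases.

```python
def get_confidence(entity, default=None):
    """Return an entity's confidence regardless of which attribute holds it.

    Tries `_confidence` first (the name this release standardises on),
    then the older `confidence_score`. Hypothetical helper, not part of
    the library.
    """
    for name in ("_confidence", "confidence_score"):
        value = getattr(entity, name, None)
        if value is not None:
            return value
    return default


# Stand-in classes for illustration only; real code would pass KeyValue entities.
class OldStyleKeyValue:
    def __init__(self):
        self.confidence_score = 0.91


class NewStyleKeyValue:
    def __init__(self):
        self._confidence = 0.87


old_value = get_confidence(OldStyleKeyValue())
new_value = get_confidence(NewStyleKeyValue())
missing_value = get_confidence(object(), default=0.0)
```

Reading an underscore-prefixed attribute from outside the class is a trade-off; if the installed version exposes a public accessor for the same value, that is the better choice.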

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.12...v1.8.0

1.7.12

What's Changed

* Fix an issue where tables containing merged cells, when linearized to plaintext, would duplicate the merged cell's text over the entire table.

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.11...v1.7.12

1.7.11

What's Changed

* Add figure layout prefix and suffix by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/362
* Add confidence scores at the DocumentEntity level by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/363

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.10...v1.7.11

1.7.10

What's Changed

* Use AWS_REGION and AWS_DEFAULT_REGION environment variables in Textractor when available
* Fix missing figure layouts
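Region lookup from the environment can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_region` is a hypothetical helper, and both the precedence of an explicit argument and the order in which the two variables are checked are assumptions.

```python
import os


def resolve_region(explicit=None, environ=None):
    """Pick an AWS region from an explicit value or the environment.

    Assumed order (not confirmed by the release notes): explicit
    argument, then AWS_REGION, then AWS_DEFAULT_REGION.
    """
    env = os.environ if environ is None else environ
    if explicit:
        return explicit
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = env.get(var)
        if value:
            return value
    return None  # caller decides how to handle an unknown region


# Passing a dict instead of os.environ keeps the example deterministic.
region_from_default = resolve_region(environ={"AWS_DEFAULT_REGION": "us-west-2"})
region_from_both = resolve_region(
    environ={"AWS_REGION": "eu-west-1", "AWS_DEFAULT_REGION": "us-west-2"}
)
region_explicit = resolve_region("ap-south-1", environ={"AWS_REGION": "eu-west-1"})
region_missing = resolve_region(environ={})
```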

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.9...v1.7.10

1.7.9

What's Changed

* Set JPEG compression parameters by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/342


**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.8...v1.7.9

1.7.8

What's Changed

* Handle None Relationships when parsing LAYOUT_FIGURE

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.7...v1.7.8
