Amazon-textract-textractor

Latest version: v1.8.5


Page 3 of 8

1.7.6

What's Changed
* Add CITATION.cff by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/332
* Add missing entities in docs by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/334
* Handle null EntityTypes by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/339


**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.5...v1.7.6

1.7.5

What's Changed
* Make KeyValue.key an EntityList by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/320
* Remove numpy from explicit dependencies by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/324
* Hide key value layouts by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/325
* Return query and query answer with get_text() by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/329
* Convert image to RGB in EntityList for Jupyter compatibility by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/330
* Support for Python 3.12 by tb102122 in https://github.com/aws-samples/amazon-textract-textractor/pull/311


**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.4...v1.7.5

1.7.4

What's Changed

* Fix table title .get_text() by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/314
* Fix .to_pandas() raising an exception by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/315


**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.3...v1.7.4

1.7.3

What's Changed

* Table linearization improvements by Belval in https://github.com/aws-samples/amazon-textract-textractor/pull/313
  - Add `.get_text()`, `.to_html()` and `.to_markdown()` functions to `Linearizable`, which is now implemented by `Document`, `Page`, `DocumentEntity` and `EntityList`
  - Add `HTMLLinearizationConfig` and `MarkdownLinearizationConfig` as pre-configured `TextLinearizationConfig`
  - Add the following parameters to `TextLinearizationConfig`:
    - `duplicate_text_in_merged_cells` duplicates the text in merged cells to preserve row-level alignment
    - `table_flatten_headers` combines multi-row headers into a single row, duplicating the merged cells horizontally as needed
    - `table_tabulate_remove_extra_hyphens` removes extra hyphens '-' in markdown tables to reduce context length
    - `max_number_of_consecutive_spaces` defines the maximum number of contiguous whitespace characters, similar to `max_number_of_consecutive_new_lines`

* Fixes:
  - Fix trailing whitespace in cell text
  - Fix `table_column_separator` being hardcoded as '\t'
  - Fix `table_row_separator` being hardcoded as '\n'
  - Reset the BytesIO buffer to position 0 by abest0 in https://github.com/aws-samples/amazon-textract-textractor/pull/310
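
The linearization options and separator fixes listed for this release can be illustrated with a small, library-free sketch. This is not Textractor's implementation: the helper names (`collapse_spaces`, `duplicate_merged_cells`, `join_table`, `trim_markdown_hyphens`) are hypothetical, and the use of `None` to mark a cell covered by a horizontal merge is an assumption made only for this example. It only mimics the behavior the release notes describe for `max_number_of_consecutive_spaces`, `duplicate_text_in_merged_cells`, `table_column_separator`, `table_row_separator` and `table_tabulate_remove_extra_hyphens`.

```python
import re


def collapse_spaces(text, max_spaces=1):
    """Cap every run of spaces at max_spaces characters (hypothetical helper)."""
    return re.sub(r" {%d,}" % (max_spaces + 1), " " * max_spaces, text)


def duplicate_merged_cells(rows):
    """Copy the value to the left into cells marked None.

    Assumption for this sketch: None marks a cell swallowed by a horizontal merge.
    """
    filled = []
    for row in rows:
        out = []
        for cell in row:
            if cell is None:
                cell = out[-1] if out else ""
            out.append(cell)
        filled.append(out)
    return filled


def join_table(rows, column_separator="\t", row_separator="\n"):
    """Join cells with configurable separators, stripping whitespace around each cell."""
    return row_separator.join(
        column_separator.join(cell.strip() for cell in row) for row in rows
    )


def trim_markdown_hyphens(markdown):
    """Shrink hyphen runs to '---' on markdown separator rows only."""
    lines = []
    for line in markdown.split("\n"):
        stripped = line.strip()
        if stripped and set(stripped) <= set("|-: "):
            line = re.sub(r"-{3,}", "---", line)
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [["Region", None, "Total"], ["North ", "South", "10"]]
    print(join_table(duplicate_merged_cells(rows), column_separator=" | "))
    print(collapse_spaces("a    b  c", max_spaces=2))
    print(trim_markdown_hyphens("| a | b |\n|--------|------|\n| 1 | 2 |"))
```

In the library itself these behaviors are selected through `TextLinearizationConfig` attributes rather than separate functions; the sketch only shows what each option does to the output text.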

New Contributors
* abest0 made their first contribution in https://github.com/aws-samples/amazon-textract-textractor/pull/310
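
The `BytesIO` change from pull/310 is described above only as resetting the buffer to position 0. As a reminder of why that matters, here is a minimal standard-library sketch; the function names are hypothetical and the code does not reproduce the actual call site in Textractor. After a write, the stream position sits at the end of the data, so a subsequent read returns nothing until the position is rewound.

```python
from io import BytesIO


def read_without_rewind(payload: bytes) -> bytes:
    """Write then read immediately: the position is at the end, so nothing is read."""
    buffer = BytesIO()
    buffer.write(payload)
    return buffer.read()


def read_with_rewind(payload: bytes) -> bytes:
    """Write, rewind to position 0, then read the full payload back."""
    buffer = BytesIO()
    buffer.write(payload)
    buffer.seek(0)  # reset the stream position before handing the buffer to a reader
    return buffer.read()


if __name__ == "__main__":
    print(read_without_rewind(b"image-bytes"))
    print(read_with_rewind(b"image-bytes"))
```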

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.2...v1.7.3

1.7.2

What's Changed
* Fix for page objects not always having an image attached, causing an exception on `.visualize()`

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.1...v1.7.2

1.7.1

What's Changed

* Fix issue where a table within a container layout could be duplicated in the `.get_text()` output.

**Full Changelog**: https://github.com/aws-samples/amazon-textract-textractor/compare/v1.7.0...v1.7.1


© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.