Textract-py3

Latest version: v2.1.1

1.1.0
-----

* support for ``.wav``, ``.mp3``, and ``.ogg`` files (`56`_ and
`62`_ by `arvindch`_)

* support for ``.csv`` files (`64`_)

* support for scanned ``.pdf`` files with tesseract (`66`_ by
`pudo`_)

* support for ``.htm`` files (`69`_)

* several bug fixes, including:

  * ``.odt`` parser now correctly extracts text in order (`61`_ by
    `levivm`_)

  * fixed Docker development environment compatibility with the
    Vagrant VM environment (`73`_ by `ShawnMilo`_)

* several internal improvements, including:

  * improvements in the python documentation (`70`_)

  * improved html output with reduced whitespace around inline
    elements in output text (`58`_ by `eiotec`_)

1.0.0
-----

* **standardized encoding of output with** ``-e/--encoding`` **option**
(`39`_)

* support for ``.xls`` and ``.xlsx`` files (`42`_ and `55`_ by `levivm`_)

* support for ``.epub`` files (`40`_ by `kokxx`_)

* several bug fixes, including:

  * removing tesseract version info from output of image parsers
    (`48`_)

  * problems with spaces in filenames (`53`_)

  * concurrency problems with tesseract (`44`_ by `ShawnMilo`_,
    `41`_ by `christomitov`_)

* several internal improvements, including:

  * switching to using class-based parsers to abstract away the common
    functionality between different parser classes (`39`_)

  * switching to using a python-based test suite and added
    standardized text tests to make sure output is consistent across
    file types (`49`_)

  * including support for Docker-based testing (`46`_ by `ShawnMilo`_)
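
The class-based parser change and the standardized output encoding noted above can be illustrated with a minimal sketch. This is not the library's actual code: ``BaseParser``, ``TxtParser``, ``CsvParser``, ``PARSERS``, and the module-level ``process`` helper are hypothetical names chosen for illustration. The idea is that each format only implements ``extract``, while the shared base class handles encoding the result the same way for every file type.

```python
import csv
import io


class BaseParser:
    """Hypothetical base class holding the logic shared by all parsers."""

    def extract(self, raw: bytes) -> str:
        # Format-specific step; each subclass must override this.
        raise NotImplementedError("subclasses implement extract()")

    def process(self, raw: bytes, encoding: str = "utf-8") -> bytes:
        # Shared step: every parser's text is encoded the same way,
        # dropping characters the target encoding cannot represent.
        text = self.extract(raw)
        return text.encode(encoding, errors="ignore")


class TxtParser(BaseParser):
    def extract(self, raw: bytes) -> str:
        return raw.decode("utf-8", errors="replace")


class CsvParser(BaseParser):
    def extract(self, raw: bytes) -> str:
        # Flatten rows into tab-separated lines of text.
        rows = csv.reader(io.StringIO(raw.decode("utf-8", errors="replace")))
        return "\n".join("\t".join(row) for row in rows)


# Hypothetical registry mapping a file extension to its parser class.
PARSERS = {".txt": TxtParser, ".csv": CsvParser}


def process(ext: str, raw: bytes, encoding: str = "utf-8") -> bytes:
    """Look up the parser for an extension and return encoded text."""
    return PARSERS[ext]().process(raw, encoding)
```

With this layout, adding a new file type means writing one subclass and one registry entry; the encoding behavior stays identical across formats because it lives in a single place.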

0.5.1
-----

* several bug fixes, including:

  * documentation fixes

  * shell commands hanging on large files (`33`_)

0.5.0
-----

* support for ``.json`` files (`13`_ by `anthonygarvan`_)

* support for ``.odt`` files (`29`_ by `christomitov`_)

* support for ``.ps`` files (`25`_)

* support for ``.gif``, ``.jpg``, ``.jpeg``, and ``.png`` files
(`30`_ by `christomitov`_)

* several bug fixes, including:

  * improved fallback handling in ``.pdf`` parser if the ``pdftotext``
    command line utility isn't installed (`26`_)

  * improved documentation for installation instructions on non-Ubuntu
    operating systems (`21`_, `26`_)

* several internal improvements, including:

  * cleaned up implementation of extension parsers to avoid magic

0.4.0
-----

* support for ``.html`` files (`7`_)

* support for ``.eml`` files (`4`_)

* automated the documentation for the python package using
sphinx-apidoc in docs/Makefile (`9`_)

0.3.0
-----

* support for ``.txt`` files, haha (`8`_)

* fixed installation bug with not properly including requirements
files in the manifest
