Textract

Latest version: v1.6.5

Page 2 of 3

1.3.0

Not secure
-----

* support for ``.rtf`` files (`84`_)

* support for ``.msg`` files (`87`_ and `17`_ by `anthonygarvan`_)

1.2.0

Not secure
-----

* support for ``.tiff`` files (`81`_)

* added support for other languages for tesseract (`76`_ by `anderser`_)

* added ``--option/-O`` flag to pass arbitrary arguments for things like
languages into textract

* several bug fixes, including:

  * fix bug with doing OCR on multi-page pdfs and removing temporary directory
    (`82`_ by `pudo`_)

  * correctly accounting for whitespace in ``.odt`` documents (`79`_
    by `evfredericksen`_)

  * standardizing testing environment to be compatible with different versions
    of third-party command line tools (`78`_)
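
The ``--option/-O`` flag above forwards arbitrary arguments (such as an OCR language) to the parsers. As a rough illustration only, and not textract's actual implementation, a repeatable ``KEYWORD=VALUE`` flag could be collected with ``argparse`` as sketched below. The ``KEYWORD=VALUE`` form, the ``KeyValueAction`` class, the ``parse_cli`` helper, and the ``extract-sketch`` program name are all assumptions made for this example.

```python
import argparse


class KeyValueAction(argparse.Action):
    """Collect repeated KEYWORD=VALUE arguments into a dict (hypothetical helper)."""

    def __call__(self, parser, namespace, values, option_string=None):
        key, sep, value = values.partition("=")
        if not sep or not key:
            parser.error(f"expected KEYWORD=VALUE, got {values!r}")
        # Copy before updating so the shared default dict is never mutated.
        options = dict(getattr(namespace, self.dest) or {})
        options[key] = value
        setattr(namespace, self.dest, options)


def build_parser():
    # "extract-sketch" is a made-up program name, not the real CLI.
    parser = argparse.ArgumentParser(prog="extract-sketch")
    parser.add_argument("filename")
    parser.add_argument(
        "-O", "--option",
        dest="options",
        action=KeyValueAction,
        default={},
        metavar="KEYWORD=VALUE",
        help="extra keyword argument to forward to the parser",
    )
    return parser


def parse_cli(argv):
    """Return (filename, options) parsed from a list of CLI arguments."""
    args = build_parser().parse_args(argv)
    return args.filename, args.options


if __name__ == "__main__":
    filename, options = parse_cli(["-O", "language=nor", "scan.png"])
    print(filename, options)
```

The resulting dictionary can then be passed on as keyword arguments to whichever parser handles the file, which keeps the command line interface independent of any one parser's options.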

1.1.0

Not secure
-----

* support for ``.wav``, ``.mp3``, and ``.ogg`` files (`56`_ and
`62`_ by `arvindch`_)

* support for ``.csv`` files (`64`_)

* support for scanned ``.pdf`` files with tesseract (`66`_ by
`pudo`_)

* support for ``.htm`` files (`69`_)

* several bug fixes, including:

  * ``.odt`` parser now correctly extracts text in order (`61`_ by
    `levivm`_)

  * fixed Docker development environment compatibility with the
    Vagrant VM environment (`73`_ by `ShawnMilo`_)

* several internal improvements, including:

  * improvements in the python documentation (`70`_)

  * improved html output with reduced whitespace around inline
    elements in output text (`58`_ by `eiotec`_)

1.0.0

Not secure
-----

* **standardized encoding of output with** ``-e/--encoding`` **option**
(`39`_)

* support for ``.xls`` and ``.xlsx`` files (`42`_ and `55`_ by `levivm`_)

* support for ``.epub`` files (`40`_ by `kokxx`_)

* several bug fixes, including:

  * removing tesseract version info from output of image parsers
    (`48`_)

  * problems with spaces in filenames (`53`_)

  * concurrency problems with tesseract (`44`_ by `ShawnMilo`_,
    `41`_ by `christomitov`_)

* several internal improvements, including:

  * switching to using class-based parsers to abstract away the common
    functionality between different parser classes (`39`_)

  * switching to using a python-based test suite and added
    standardized text tests to make sure output is consistent across
    file types (`49`_)

  * including support for Docker-based testing (`46`_ by `ShawnMilo`_)
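
The class-based parsers and the standardized output encoding described in this release can be pictured with a small sketch. It is an illustration under stated assumptions, not the library's code: ``BaseParser``, ``TxtParser``, ``CsvParser``, the ``PARSERS`` table, the tab-separated CSV rendering, and the ``errors="replace"`` policy are all hypothetical choices made for the example.

```python
import csv
import os


class BaseParser:
    """Shared behaviour lives here; subclasses only implement extract()."""

    def extract(self, path):
        raise NotImplementedError

    def process(self, path, encoding="utf-8"):
        raw = self.extract(path)
        # Normalise to str first, then emit bytes in the requested encoding.
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        # Assumption: characters the target encoding cannot represent become "?".
        return text.encode(encoding, errors="replace")


class TxtParser(BaseParser):
    def extract(self, path):
        with open(path, "rb") as handle:
            return handle.read()


class CsvParser(BaseParser):
    def extract(self, path):
        with open(path, newline="", encoding="utf-8") as handle:
            rows = csv.reader(handle)
            return "\n".join("\t".join(row) for row in rows)


# Hypothetical extension-to-parser table; a real project may discover parsers differently.
PARSERS = {".txt": TxtParser, ".csv": CsvParser}


def process(path, encoding="utf-8"):
    """Pick a parser by file extension and return encoded bytes."""
    ext = os.path.splitext(path)[1].lower()
    try:
        parser_cls = PARSERS[ext]
    except KeyError:
        raise ValueError(f"unsupported extension: {ext!r}") from None
    return parser_cls().process(path, encoding=encoding)
```

Because decoding and encoding happen once in the base class, every file type returns bytes in the same requested encoding, and adding a format only requires a new ``extract()`` method and one table entry.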

0.5.1

Not secure
-----

* several bug fixes, including:

  * documentation fixes

  * shell commands hanging on large files (`33`_)
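
A common cause of shell commands hanging on large inputs is waiting for a child process to exit before reading its output: once the child fills the OS pipe buffer, both sides block. Whether that was the exact cause of the issue fixed here is an assumption; the sketch below only shows the general remedy with the standard ``subprocess`` module. The ``run`` helper is a hypothetical name.

```python
import subprocess
import sys


def run(args):
    """Run a command given as an argument list and return its stdout as bytes."""
    pipe = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads stdout and stderr while waiting for the child,
    # so output larger than the pipe buffer cannot stall the process.
    stdout, stderr = pipe.communicate()
    if pipe.returncode != 0:
        raise subprocess.CalledProcessError(
            pipe.returncode, args, output=stdout, stderr=stderr
        )
    return stdout


if __name__ == "__main__":
    # Emit roughly 1 MB, far more than a typical pipe buffer holds.
    big = run([sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"])
    print(len(big))
```

Passing the command as a list rather than a single shell string also means arguments containing spaces reach the child process unchanged.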

0.5.0

Not secure
-----

* support for ``.json`` files (`13`_ by `anthonygarvan`_)

* support for ``.odt`` files (`29`_ by `christomitov`_)

* support for ``.ps`` files (`25`_)

* support for ``.gif``, ``.jpg``, ``.jpeg``, and ``.png`` files
(`30`_ by `christomitov`_)

* several bug fixes, including:

  * improved fallback handling in ``.pdf`` parser if the ``pdftotext``
    command line utility isn't installed (`26`_)

  * improved documentation for installation instructions on non-Ubuntu
    operating systems (`21`_, `26`_)

* several internal improvements, including:

  * cleaned up implementation of extension parsers to avoid magic

