Textract

Latest version: v1.6.5

1.6.5

-------------------

* switched epub parsing to MIT license compatible package (`411`_ by
`jhale1805`_)

1.6.4

-------------------

* several bug fixes, including:

* fixing dependency declarations (`162`_ by `lillypad`_)

1.6.1

-------------------

* several bug fixes, including:

* fixing the readthedocs build (`150`_)

1.6.0

-------------------

* Let the user provide file extension as an argument when the file name has no
extension (`148`_ by `motazsaad`_)

* Added ability to parse audio with ``pocketsphinx`` (`122`_ by `barrust`_)

* Added ability to parse ``.psv`` and ``.tsv`` files (`141`_)

* several bug fixes, including:

* checking for the importability of a parser rather than the presence of the
file (`136`_ by `AusIV`_)

* manage versions with `bumpversion <https://pypi.python.org/pypi/bumpversion>`_
(`146`_)

* properly reporting on missing external dependencies (`139`_ by `AusIV`_)

* pin ``chardet`` to version 2.1.1 to avoid decode errors (`107`_)

* avoid unicode decode error with html parser (`147`_ by `suned`_)

* enabling autocomplete and improving error handling (`149`_)
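Two of the 1.6.0 entries above (an explicit extension argument, and checking that a parser can be imported rather than that its file exists) can be illustrated with a minimal sketch. This is not textract's actual code: ``resolve_parser`` and ``PARSER_MODULES`` are hypothetical names, and standard-library modules stand in for real parser modules.

```python
import importlib
import os

# Hypothetical mapping from file extension to the module that parses it.
# Standard-library modules stand in for real parser modules here.
PARSER_MODULES = {".csv": "csv", ".tsv": "csv", ".psv": "csv", ".json": "json"}


def resolve_parser(filename, extension=None):
    """Return the imported parser module for ``filename``.

    ``extension``, when given, overrides whatever the file name suggests.
    """
    if extension:
        # Accept both "tsv" and ".tsv".
        ext = extension if extension.startswith(".") else "." + extension
    else:
        _, ext = os.path.splitext(filename)
    ext = ext.lower()

    module_name = PARSER_MODULES.get(ext)
    if module_name is None:
        raise ValueError("unsupported extension: %r" % ext)

    # Import the module instead of checking that a parser file exists on disk;
    # an import failure also catches a parser whose own dependencies are missing.
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError("parser for %r is not importable: %s" % (ext, exc))
```

With this sketch, ``resolve_parser("data", extension="tsv")`` succeeds even though the file name has no extension, while ``resolve_parser("data")`` raises ``ValueError``.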

1.5.0

-----

* Added Python 3 support, including pdfminer (`104`_ by `sirex`_ via `126`_)

* Python 3 support for ``pdfminer`` using ``pdfminer.six`` (`116`_ by
`jaraco`_ via `126`_)

* fixed security vulnerability by properly using ``subprocess.call`` (`114`_ by
`pierre-ernst`_)

* updating to ``tesseract`` 3.03 (`127`_)

* adding a ``.tif`` synonym for ``.tiff`` files (`113`_ by `onionradish`_)

* improved ``.docx`` support using ``docx2txt`` (`100`_ by `ankushshah89`_)

* several bug fixes, including:

* including all requirements for ``Pillow`` (`119`_ by `akoumjian`_)
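The changelog does not spell out the 1.5.0 ``subprocess`` fix. A common pattern for this class of problem is to pass the command as a list of arguments so that no shell ever interprets the file name. The sketch below assumes that pattern; ``run_on_file`` is a hypothetical helper, a Python one-liner stands in for an external extraction tool, and ``subprocess.check_output`` is used instead of ``subprocess.call`` only so the output can be inspected.

```python
import subprocess
import sys


def run_on_file(filename):
    """Run a child process with ``filename`` passed as a single argument."""
    # Hypothetical stand-in for an external extractor: a Python one-liner
    # that echoes its first argument back.
    command = [sys.executable, "-c", "import sys; print(sys.argv[1])", filename]
    # A list of arguments with the default shell=False means no shell parses
    # the file name, so characters such as ';' or '$(...)' stay literal.
    return subprocess.check_output(command).decode("utf-8").strip()
```

A file name such as ``report; echo injected.pdf`` reaches the child process unchanged as one argument, rather than being split into two shell commands.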

1.4.0

-----

* added layout preservation option for pdftotext pdf extractor (`93`_ by
`ankushshah89`_)

* added simple support for extensionless filenames, treating them as plain
``.txt`` files (`85`_)

* several bug fixes, including:

* now extracting the text in tables from docx files at the end of the text
extraction (`92`_ by `jsmith-mploir`_)

* faster testing framework by only rebuilding test data when needed (`90`_)

* fixed ``.html`` and ``.epub`` parsers to deal with beautifulsoup4
upgrades

* using official ``msg-extractor`` now that it has a native ``setup.py``

* updated tests for ``.html``, ``.ogg``, ``.wav``, and ``.mp3`` file types to
be consistent with more recent versions of the underlying packages.
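The layout preservation option from 1.4.0 corresponds to the ``-layout`` flag of the ``pdftotext`` command-line tool. As a rough sketch (``pdftotext_command`` is a hypothetical helper, not textract's API), the argument list could be assembled like this:

```python
def pdftotext_command(filename, layout=False):
    """Build the argument list for a pdftotext call that writes to stdout."""
    args = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical layout of the page.
        args.append("-layout")
    # A trailing "-" tells pdftotext to write the text to standard output.
    args.extend([filename, "-"])
    return args
```

The returned list can be handed to ``subprocess`` directly, with no shell involved.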
