Html-text

Latest version: v0.7.0

0.5.0
------------------

The parsel dependency is removed in this release,
though parsel is still supported.

* ``parsel`` package is no longer required to install and use html-text;
* ``html_text.etree_to_text`` function allows extracting text from
  lxml Elements;
* ``html_text.cleaner`` is an ``lxml.html.clean.Cleaner`` instance with
  options tuned for text extraction speed and quality;
* test and documentation improvements;
* Python 3.7 support.

0.4.1
------------------

Fixed a regression introduced in the 0.4.0 release: the extracted text was
empty when ``html_text.extract_text`` was called on a node that has text
but no children.

0.4.0
------------------

This is a backwards-incompatible release: by default, html_text functions
now add newlines after elements where appropriate, so that the extracted text
looks more like how it is rendered in a browser.

To turn this off, pass the ``guess_layout=False`` option to html_text functions.

* ``guess_layout`` option to make extracted text look more like how
  it is rendered in a browser.
* Add tests of layout extraction for real webpages.

0.3.0
------------------

* Expose functions that operate on selectors;
  use ``.//text()`` to extract text from a selector.

0.2.1
------------------

* Packaging fix (include CHANGES.rst)

0.2.0
------------------

* Fix unwanted joins of words with inline tags: spaces are added for inline
  tags too, but a heuristic is used to preserve punctuation without extra spaces.
* Accept parsed html trees.
