Extruct

Latest version: v0.18.0

0.7.2

-------------------

* Cover all exception cases handled by the ``errors`` argument of
``extruct()`` for the values ``strict``, ``log`` and ``ignore``
* avoid including ``itemprop`` from child ``itemscope`` when using
``itemref`` for microdata
* proper processing order for ``itemref`` for microdata
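
The three ``errors`` values can be illustrated with a small standalone sketch. This is not extruct's implementation: ``run_extractor`` and the empty-list fallback are hypothetical names and choices made only to show the strict / log / ignore pattern.

```python
import logging

logger = logging.getLogger(__name__)


def run_extractor(extractor, html, errors="strict"):
    """Call ``extractor(html)`` under one of three error policies.

    Hypothetical helper for illustration only; it is not part of extruct.
    """
    if errors not in ("strict", "log", "ignore"):
        raise ValueError('errors must be "strict", "log" or "ignore"')
    try:
        return extractor(html)
    except Exception:
        if errors == "strict":
            raise  # propagate the original exception to the caller
        if errors == "log":
            logger.exception("extraction failed")  # record it, then continue
        return []  # "log" and "ignore" both fall back to an empty result
```

With ``errors="strict"`` a failing extractor raises in the caller; with ``log`` or ``ignore`` the caller gets an empty list instead.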

0.7.1

-------------------

* json-ld parsing issue is fixed;
* deprecation warning for ``url`` argument points to caller code;
* better Python 3.7 support (fixed warnings, setup running 3.7 tests on CI).
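
A deprecation warning that points to caller code is usually achieved with the ``stacklevel`` argument of ``warnings.warn``. The sketch below assumes that technique; the ``extract`` function here is a hypothetical stand-in, not extruct's actual code, and the warning text is made up.

```python
import warnings


def extract(htmlstring, base_url=None, url=None):
    """Hypothetical stand-in showing a deprecated ``url`` argument."""
    if url is not None:
        warnings.warn(
            '"url" argument is deprecated, please use "base_url"',
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        if base_url is None:
            base_url = url
    return {"base_url": base_url}
```

With ``stacklevel=2`` the reported file and line are those of the code that called ``extract``, which makes the deprecated call easy to locate.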

0.7.0

-------------------

In this release OpenGraph parsing is improved:

* known OpenGraph namespaces (og, music, video,
article, book, profile) work without an explicitly defined prefix;
* prefix is extracted both from ``<head>`` and ``<html>`` element attributes,
not only from ``<head>``;
* prefix parsing is more permissive.
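
The prefix handling described above can be sketched as follows. This is an assumption-laden illustration, not extruct's parser: ``KNOWN_NAMESPACES``, ``PREFIX_RE`` and ``parse_prefixes`` are hypothetical names, and the regular expression is just one permissive way to read ``prefix`` attribute values.

```python
import re

# Namespaces from the Open Graph protocol that work without a declared prefix.
KNOWN_NAMESPACES = {
    "og": "http://ogp.me/ns#",
    "music": "http://ogp.me/ns/music#",
    "video": "http://ogp.me/ns/video#",
    "article": "http://ogp.me/ns/article#",
    "book": "http://ogp.me/ns/book#",
    "profile": "http://ogp.me/ns/profile#",
}

# Permissive: whitespace after the colon is optional.
PREFIX_RE = re.compile(r"\s*(\w+):\s*(\S+)")


def parse_prefixes(*prefix_attrs):
    """Merge ``prefix`` attribute values (e.g. from <html> and <head>)."""
    namespaces = dict(KNOWN_NAMESPACES)
    for attr in prefix_attrs:
        if attr:
            # findall yields (prefix, uri) pairs, which dict.update accepts
            namespaces.update(PREFIX_RE.findall(attr))
    return namespaces
```

Calling ``parse_prefixes(html_prefix, head_prefix)`` with either value set to ``None`` still returns the known namespaces.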

Other changes:

* pypi version badge is added to the README;
* html parsing code is cleaned up.

0.6.0

-------------------

* JSON-LD parsing is less strict now: control characters are allowed.
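
In the Python standard library, allowing control characters inside JSON strings corresponds to ``json.loads(..., strict=False)``. Whether extruct uses exactly this call is an assumption; the sketch only shows the behaviour the note describes, and ``parse_lenient`` is a hypothetical name.

```python
import json

# The string value below contains a literal newline, which is a control
# character that strict JSON parsing rejects.
raw = '{"name": "line one\nline two"}'


def parse_lenient(text):
    """Parse JSON while tolerating control characters inside strings."""
    return json.loads(text, strict=False)


data = parse_lenient(raw)
```

By contrast, ``json.loads(raw)`` with the default ``strict=True`` raises ``json.JSONDecodeError`` for the same input.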

0.5.0

-------------------

* Add OpenGraph and Microformat extractors.
* Add argument ``syntaxes`` to ``extract`` and the command line function; it
selects which syntaxes to extract.
* Add argument ``uniform`` to ``extract`` and the command line function; if True,
it maps the output of Microdata, OpenGraph, Microformat and JSON-LD to the same
template.
* Add argument ``errors`` to ``extract`` and the command line function; it
defines whether errors should be raised, logged or ignored.
* Fix RDFa memory leak: ``RDFaExtractor`` now resets ``_lookups`` after each
extraction.
* Fixed regex pattern in ``JsonLdExtractor`` to avoid removing comments from
within valid JSON.
* In ``w3microdata`` strip whitespace, newlines, etc. from URLs extracted from
HTML nodes.
* ``base_url`` replaces ``url`` in ``MicroformatExtractor``, ``JsonLdExtractor``,
``OpenGraphExtractor``, ``RDFaExtractor`` and ``MicrodataExtractor``.
* Individual extractors accept ``base_url`` instead of ``url``; unused keyword
arguments are removed.
* In ``w3microdata.extract_items`` ``items_seen`` and ``url`` are no longer
class variables but are passed as arguments.
* In ``w3microdata`` the following functions are now private:
``extract_item``, ``extract_property_value``, ``extract_textContent``,
``_extract_property``, ``_extract_properties``, ``_extract_property_refs``
and ``_extract_textContent``.
* In ``w3microdata`` ``_extract_properties``, ``_extract_property_refs``,
``_extract_property``, ``_extract_property_value`` and ``_extract_item``
now need ``items_seen`` and ``url`` to be passed as arguments.
* Add argument ``return_html_node`` to ``extract``; it returns the HTML node
together with the result of metadata extraction. It is supported only by the
microdata syntax.

Warning: backward-incompatible changes:

* ``base_url`` is used instead of ``url`` in ``extruct.extract``; ``url`` is
still supported but deprecated.
* In ``extruct.extract`` default ``base_url`` is now ``None`` to avoid wrong
results with ``urljoin``.
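
To see why the base matters, here is a minimal sketch of URL resolution with ``urllib.parse.urljoin``, combined with the whitespace stripping mentioned earlier in this release. ``resolve_url`` is a hypothetical helper, not an extruct function.

```python
from urllib.parse import urljoin


def resolve_url(raw_url, base_url=None):
    """Strip surrounding whitespace, then resolve against ``base_url`` if given.

    Hypothetical helper for illustration only.
    """
    cleaned = raw_url.strip()  # drops leading/trailing spaces, tabs, newlines
    if base_url is None:
        return cleaned  # no base known: leave the URL as written
    return urljoin(base_url, cleaned)


relative = resolve_url(" \n img/photo.jpg\t")
absolute = resolve_url("img/photo.jpg", "http://example.com/products/item.html")
# relative == "img/photo.jpg"
# absolute == "http://example.com/products/img/photo.jpg"
```

Joining against a base that is not the page's real URL would produce an absolute but wrong address, so leaving relative URLs untouched when no base is supplied is the safer default.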

0.4.0

-------------------

* New ``extruct`` command line tool to fetch a page and extract its metadata.
Works either via ``extruct`` directly or ``python -m extruct``.
* Accept leading HTML comment in JSON-LD payload.
* rdflib log messages were silenced to avoid the noise when importing extruct.
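
Accepting a leading HTML comment in a JSON-LD payload can be sketched with a pattern anchored at the start of the script text, so that comment-like text inside JSON strings is left alone. The pattern and the ``parse_jsonld_payload`` name are assumptions for illustration, not the code extruct uses.

```python
import json
import re

# Matches only HTML comments (and surrounding whitespace) at the very start.
LEADING_HTML_COMMENT = re.compile(r"^\s*(?:<!--.*?-->\s*)+", re.DOTALL)


def parse_jsonld_payload(script_text):
    """Drop leading HTML comments, then parse the remaining JSON."""
    cleaned = LEADING_HTML_COMMENT.sub("", script_text, count=1)
    return json.loads(cleaned)


payload = '<!-- generated --> {"@type": "Person", "note": "keep <!-- this -->"}'
item = parse_jsonld_payload(payload)
```

Because the pattern is anchored with ``^``, the ``<!-- this -->`` text inside the ``note`` value survives parsing.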
