Metadata-parser

Latest version: v0.12.2

0.9.22

* removed internal calls to the deprecated `get_metadata`, replacing them with `get_metadatas`.
** this avoids emitting a deprecation warning from inside the library, allowing users to migrate more easily
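
A minimal sketch of why internal calls matter here, using only the standard library. The class and method names (`Parser`, `get_one`, `get_all`, `title`, `title_via_deprecated`) are hypothetical stand-ins and not part of metadata_parser; the point is only that a library method which routes through a deprecated method emits a warning the user never asked for.

```python
import warnings


class Parser:
    """Hypothetical class; it only mimics a deprecated/replacement method pair."""

    def get_all(self, field):
        # Replacement API: always returns a list.
        return ["value for %s" % field]

    def get_one(self, field):
        # Deprecated API: warns, then delegates to the replacement.
        warnings.warn(
            "get_one is deprecated; use get_all",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_all(field)[0]

    def title_via_deprecated(self):
        # Internal call through the deprecated method: the user sees a warning.
        return self.get_one("title")

    def title(self):
        # Internal call through the replacement: no warning is emitted.
        return self.get_all("title")[0]
```

Calling `title()` stays silent, while `title_via_deprecated()` raises a `DeprecationWarning` even though the caller never touched the deprecated method directly.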

0.9.21

* requests_toolbelt is now required
** this is to solve PR16 / Issue21
** the toolbelt and built-in versions of get_encodings_from_content required different workarounds
* the output of urlparse is now cached onto the parser instance.
** perhaps this will become a global cache in the future
* MetadataParser now accepts `cached_urlparser`
** default: True
** options:
*** True: use an instance of UrlParserCacheable(maxitems=30)
*** INT: use an instance of UrlParserCacheable(maxitems=cached_urlparser)
*** None/False/0: use native urlparse
*** other truthy values: use as a custom urlparse
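
The option handling above can be sketched with the standard library alone. `CachedUrlParser` and `resolve_urlparser` are hypothetical names, not the package's `UrlParserCacheable`, and the least-recently-used eviction policy is an assumption; the changelog only states that the cache is bounded by `maxitems`.

```python
from collections import OrderedDict
from urllib.parse import urlparse


class CachedUrlParser:
    """Hypothetical size-limited cache around urllib's urlparse."""

    def __init__(self, maxitems=30):
        self.maxitems = maxitems
        self._cache = OrderedDict()

    def urlparse(self, url):
        if url in self._cache:
            self._cache.move_to_end(url)  # mark as most recently used
            return self._cache[url]
        parsed = urlparse(url)
        self._cache[url] = parsed
        if len(self._cache) > self.maxitems:
            self._cache.popitem(last=False)  # evict the least recently used (assumed policy)
        return parsed


def resolve_urlparser(cached_urlparser=True):
    """Map the documented option values onto a urlparse-like callable."""
    if cached_urlparser is True:
        return CachedUrlParser(maxitems=30).urlparse
    if not cached_urlparser:  # None, False, 0
        return urlparse
    if isinstance(cached_urlparser, int):
        return CachedUrlParser(maxitems=cached_urlparser).urlparse
    return cached_urlparser  # any other truthy value is used as a custom callable
```

`True` is tested before the integer check because `bool` is a subclass of `int` in Python; otherwise `True` would be read as `maxitems=1`.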

* addressing issue 17 (https://github.com/jvanasco/metadata_parser/issues/17) where `get_link_` logic does not handle schemeless urls.
** `MetadataParser.get_metadata_link` will now try to upgrade schemeless links (e.g. urls that start with "//")
** `MetadataParser.get_metadata_link` will now check values against `FIELDS_REQUIRE_HTTPS` in certain situations to see if the value is valid for http
** `MetadataParser.schemeless_fields_upgradeable` is a tuple of the fields which can be upgraded. this defaults to a package definition, but can be changed on a per-parser basis.
The defaults are:
'image',
'og:image', 'og:image:url', 'og:audio', 'og:video',
'og:image:secure_url', 'og:audio:secure_url', 'og:video:secure_url',
** `MetadataParser.schemeless_fields_disallow` is a tuple of the fields which can not be upgraded. this defaults to a package definition, but can be changed on a per-parser basis.
The defaults are:
'canonical',
'og:url',
** `MetadataParser.get_url_scheme()` is a new method to expose the scheme of the active url
** `MetadataParser.upgrade_schemeless_url()` is a new method to upgrade schemeless links
*** it accepts two arguments: url and field (optional)
*** if present, the field is checked against the package tuple `FIELDS_REQUIRE_HTTPS` to see if the value is valid for http
The fields are:
'og:image:secure_url',
'og:audio:secure_url',
'og:video:secure_url',
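
The upgrade rules described for this release can be sketched as a standalone function. `upgrade_schemeless` and its `page_scheme` argument are hypothetical and differ from the real method's signature. The tuples are copied from the lists above, and returning `None` for a disallowed field or for an https-only field on an http page is an assumption about how "not valid" is signalled.

```python
# Values copied from the changelog entries above.
FIELDS_REQUIRE_HTTPS = (
    "og:image:secure_url",
    "og:audio:secure_url",
    "og:video:secure_url",
)
SCHEMELESS_FIELDS_DISALLOW = ("canonical", "og:url")


def upgrade_schemeless(url, page_scheme, field=None):
    """Hypothetical helper: give a '//host/path' link the scheme of the active page."""
    if not url.startswith("//"):
        return url  # already has a scheme, or is a relative path; leave untouched
    if field in SCHEMELESS_FIELDS_DISALLOW:
        return None  # assumption: disallowed fields are rejected, not upgraded
    if field in FIELDS_REQUIRE_HTTPS and page_scheme != "https":
        return None  # assumption: a secure_url field is not valid over plain http
    return "%s:%s" % (page_scheme, url)
```

For example, `upgrade_schemeless("//cdn.example.com/a.png", "https", "og:image")` yields an `https://` link, while the same link for `og:image:secure_url` on an `http` page is rejected.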

0.9.20

* support for deprecated `twitter:label` and `twitter:data` metatags, which use "value" instead of "content".
* new param to `__init__` and `parse`: `support_malformed` (default `None`).
** if true, will support malformed parsing (such as consulting "value" instead of "content").
** functionality extended from PR 13 (https://github.com/jvanasco/metadata_parser/pull/13) from https://github.com/amensouissi
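
To illustrate the "value" fallback, here is a small sketch built on the standard library's `html.parser`. `MetaCollector` is a hypothetical class and does not reflect the library's internals; only the `support_malformed` flag name and the content/value distinction come from the notes above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical collector for <meta> tags with an optional "value" fallback."""

    def __init__(self, support_malformed=False):
        super().__init__()
        self.support_malformed = support_malformed
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or attrs.get("property")
        if not name:
            return
        value = attrs.get("content")
        if value is None and self.support_malformed:
            # Malformed tags carry their data in "value" instead of "content".
            value = attrs.get("value")
        if value is not None:
            self.found[name] = value
```

With `support_malformed` left off, a tag such as `<meta name="twitter:label1" value="Reading time">` is skipped; with it on, the "value" attribute is consulted.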

0.9.19

* addressing https://github.com/jvanasco/metadata_parser/issues/12
** on pages with duplicate metadata keys, additional elements were ignored: when parsing the document, duplicate data was not kept.
* `MetadataParser.get_metadata` will always return a single string (or `None`)
* `MetadataParser.get_metadatas` has been introduced. this will always return an array.
* the internal parsed_metadata store will now store data in a mix of arrays and strings, keeping it backwards compatible
* This new version benches slightly slower because of the mixed format but preserves a smaller footprint.
* the parsed result now contains a version record for tracking the format `_v`.
* standardized single/double quoting
* cleaned up some lines
* the library will try to coerce strategy= arguments into the right format
* when getting dublin core data, the result could either be a string or a dict. there's no good way to handle this.
* added tests for encoders
* greatly expanded tests
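
The mixed string/array store described for this release can be sketched as follows. The helper names `store`, `get_one`, and `get_all` are hypothetical, and returning an empty list for a missing key is an assumption; the notes only say that one accessor returns a single string (or `None`) and the other always returns an array.

```python
def store(parsed, key, value):
    """Keep a bare string for the first value; switch to a list on duplicates."""
    if key not in parsed:
        parsed[key] = value
    elif isinstance(parsed[key], list):
        parsed[key].append(value)
    else:
        parsed[key] = [parsed[key], value]


def get_one(parsed, key):
    """Return a single string (the first one seen) or None."""
    value = parsed.get(key)
    if isinstance(value, list):
        return value[0]
    return value


def get_all(parsed, key):
    """Always return a list, wrapping a bare string when needed."""
    value = parsed.get(key)
    if value is None:
        return []  # assumption: a missing key yields an empty list
    if isinstance(value, list):
        return value
    return [value]
```

Keys that appear once stay plain strings, which keeps the store readable by older code, while the type check on every read is the kind of overhead the benchmark note refers to.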

0.9.18

* removed a stray debug line

0.9.17

* added `retry_dropped_without_headers` option
