Warc2zim

Latest version: v2.1.3

Page 1 of 5

2.1.3

Changed

- Upgrade to wombat 3.8.3 (414)

2.1.2

Added

- Enrich test website with img srcset situations (in preparation for 403)

Changed

- Upgrade dependencies, including wombat 3.8.2 (407)

Fixed

- HTML document can be retrieved as `fetch` resource type (405)

2.1.1

Changed

- Upgrade dependencies, including wombat 3.8.0 (386)

2.1.0

Added

- New fuzzy rules for cheatography.com (342), der-postillon.com (330), iranwire.com (363)
- Properly rewrite redirect target URL when present in `<meta>` HTML tag (237)
- New `--encoding-aliases` argument to pass encoding/charset aliases (331)
- Add support for SVG favicon (148)
- Automatically index PDF content and use PDF title (289 and 290)
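The `<meta>` redirect entry above can be illustrated with a minimal sketch. This is not warc2zim's actual code: `rewrite_meta_refresh` and the `rewrite_url` callback are hypothetical names, and the sketch assumes the redirect is carried in the `content` attribute of a `<meta http-equiv="refresh">` tag in the usual `delay; url=target` form.

```python
import re

# Matches the content attribute of a meta refresh tag: "<delay>; url=<target>".
_REFRESH_RE = re.compile(
    r"^\s*(?P<delay>[^;]*?)\s*;\s*url\s*=\s*(?P<url>.+?)\s*$",
    re.IGNORECASE,
)


def rewrite_meta_refresh(content, rewrite_url):
    """Return the content attribute with its redirect target rewritten.

    `rewrite_url` is a hypothetical callback mapping an original URL to
    its rewritten form; it stands in for whatever the rewriter uses.
    """
    match = _REFRESH_RE.match(content)
    if match is None:
        # No redirect target (e.g. content="30"): nothing to rewrite.
        return content
    # The target may be wrapped in single or double quotes.
    target = match.group("url").strip("'\"")
    return f"{match.group('delay')};url={rewrite_url(target)}"
```

A value without a `url=` part is returned unchanged, so a plain page refresh is left alone.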

Changed

- Upgrade to python-scraperlib 4.0.0
- Generate fuzzy rules tests in Python and JavaScript (284)
- Refactor HTML rewriter class to make it more open to change and expressive (305)
- Detect charset in document header only for HTML documents (331)
- Use `software` property from `warcinfo` record to set ZIM `Scraper` metadata (357)
- Store `ContentDate` as metadata, based on `WARC-Date` (358)
- Remove domain specific rules (328)
- Revisit retrieve_illustration logic to prefer best favicons (352 and 369)
- Upgrade dependencies (zimscraperlib 4.0.0, wombat.js 3.7.12 and others) (376)

Fixed

- Handle case where the redirect target is bad / unsupported (332 and 356)
- Fixed WARC files handling order to follow creation order (366)
- Remove consecutive slashes in URLs, both in Python and JavaScript (365)
- Ignore non HTTP(S) WARC records (351)
- Fix `vimeo_cdn_fix` fuzzy rule for proper operation in JavaScript (348)
- Performance issue linked to new "extensible" HTML rewriting rules (370)
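As an illustration of the slash-removal fix listed above, here is a minimal sketch of collapsing repeated slashes in a URL. The helper name `collapse_slashes` is hypothetical, and limiting the change to the path component (leaving the `//` after the scheme and the query string untouched) is an assumption, not a description of warc2zim's behavior.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of '/' in the path of `url` into a single '/'.

    Assumption: only the path is normalized; the scheme separator and
    the query string are kept exactly as given.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))
```

Applying the same normalization when an entry is stored and when it is looked up keeps both sides resolving to the same key, which is presumably why the changelog mentions both Python and JavaScript.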

2.0.3

Changed

- Moved rules definition from JSON to YAML and documented update process (216)
- Upgrade to wombat.js 3.7.11

Added

- Exit with cleaner message when no entries are expected in the ZIM (336) and when main entry is not processable (337)
- Add debug log for items whose content is empty (344)

Fixed

- The rewrite mode of some resources is still not correctly identified (326)

2.0.2

Added

- Add `--ignore-content-header-charsets` option to disable automatic retrieval of content charsets from content first bytes (318)
- Add `--content-header-bytes-length` option to specify how many first bytes to consider when searching for content charsets in header (320)
- Add `--ignore-http-header-charsets` option to disable automatic retrieval of content charsets from content HTTP `Content-Type` headers (318)
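The three options above control where a content charset is looked up. The following is a rough sketch of such a decision, not warc2zim's implementation: the function name `pick_charset`, its parameters, the precedence (HTTP header first, then the first bytes of the content), the 1024-byte window, and the UTF-8 fallback are all assumptions made for illustration.

```python
import codecs
import re

# Finds a "charset=<name>" declaration in an HTTP header or in HTML bytes.
_CHARSET_RE = re.compile(
    rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def pick_charset(
    http_content_type,
    content,
    *,
    header_bytes_length=1024,      # assumed window size, cf. --content-header-bytes-length
    ignore_http_header=False,      # cf. --ignore-http-header-charsets
    ignore_content_header=False,   # cf. --ignore-content-header-charsets
    aliases=None,                  # hypothetical mapping, e.g. {"foo": "latin-1"}
    default="utf-8",               # assumed fallback
):
    """Return the first declared charset that Python can decode."""
    aliases = aliases or {}
    candidates = []
    if not ignore_http_header and http_content_type:
        found = _CHARSET_RE.search(http_content_type.encode("ascii", "ignore"))
        if found:
            candidates.append(found.group(1))
    if not ignore_content_header:
        # Only the first bytes of the content are inspected.
        found = _CHARSET_RE.search(content[:header_bytes_length])
        if found:
            candidates.append(found.group(1))
    for raw in candidates:
        name = raw.decode("ascii").lower()
        name = aliases.get(name, name)
        try:
            codecs.lookup(name)  # raises LookupError for unknown encodings
        except LookupError:
            continue
        return name
    return default
```

No statistical guessing is involved: a charset is either declared somewhere the function is allowed to look, or the fallback is used.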

Changed

- Simplify logic deciding content charset, stop guessing with chardet (312)

Fixed

- Rewrite only content with mimetype `text/html` when `WARC-Resource-Type` is `html` (313)
