Added
- New fuzzy rules for cheatography.com (342), der-postillon.com (330), iranwire.com (363)
- Properly rewrite the redirect target URL when present in a `<meta>` HTML tag (237)
- New `--encoding-aliases` argument to pass encoding/charset aliases (331)
- Add support for SVG favicon (148)
- Automatically index PDF content and use PDF title (289 and 290)
Changed
- Upgrade to python-scraperlib 4.0.0
- Generate fuzzy rules tests in Python and JavaScript (284)
- Refactor HTML rewriter class to make it more open to change and expressive (305)
- Detect charset in document header only for HTML documents (331)
- Use `software` property from `warcinfo` record to set ZIM `Scraper` metadata (357)
- Store `ContentDate` as metadata, based on `WARC-Date` (358)
- Remove domain-specific rules (328)
- Revisit `retrieve_illustration` logic to prefer best favicons (352 and 369)
- Upgrade dependencies (zimscraperlib 4.0.0, wombat.js 3.7.12 and others) (376)
Fixed
- Handle case where the redirect target is bad / unsupported (332 and 356)
- Fix WARC file handling order to follow creation order (366)
- Remove consecutive slashes in URLs, both in Python and JS (365)
- Ignore non HTTP(S) WARC records (351)
- Fix `vimeo_cdn_fix` fuzzy rule for proper operation in JavaScript (348)
- Fix performance issue linked to new "extensible" HTML rewriting rules (370)