Breaking change:
- metadata now skipped by default (613); to include it in all output formats, use:
  - `with_metadata=True` (Python)
  - `--with-metadata` (CLI)
Extraction:
- add HTML as output format (614)
- better and faster baseline extraction (619)
- better handling of HTML/XML elements (628)
- XPath rules added with felipehertzer (540)
- fix: avoid faulty readability_lxml content (635)
Evaluation:
- new scripts and data with LydiaKoerber (606, 615)
- additional data with swetepete (197)
Maintenance:
- docs extended and updated, added page on deduplication (618)
- review code, add tests and type hints to some of the submodules (620, 623, 624, 625)