Changes
HTML parsing:
- new: improved model for handling text blocks and lines
- chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
- chg: improved whitespace handling
- add: cover more borderline cases with unit tests
Inscriptis core:
- new: annotation support
- new: processing of annotation rules and annotation output
- new: type hints
- add: extended and improved documentation
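As a rough illustration of the annotation support listed above, the sketch below assumes that annotation rules map HTML tags to labels and that annotated output pairs the extracted text with (start, end, label) spans. The rule set, the sample data and the `surface_forms` helper are hypothetical; the code does not call the Inscriptis API.

```python
# Hypothetical annotation rules: HTML tag -> labels assigned to its text.
ANNOTATION_RULES = {"h1": ["heading"], "b": ["emphasis"]}

# Assumed output shape: extracted text plus (start, end, label) spans.
ANNOTATED = {
    "text": "Chur\n\nChur is the capital of Grisons.",
    "label": [(0, 4, "heading"), (6, 10, "emphasis")],
}


def surface_forms(result, rules):
    """Return (label, covered text) pairs for spans whose label is declared in the rules."""
    known_labels = {label for labels in rules.values() for label in labels}
    text = result["text"]
    return [
        (label, text[start:end])
        for start, end, label in result["label"]
        if label in known_labels
    ]


if __name__ == "__main__":
    print(surface_forms(ANNOTATED, ANNOTATION_RULES))
```

Keeping annotations as offsets into the extracted text means the text itself stays unchanged, and a surface form can be recovered by slicing.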
Inscript command line client:
- new: added `--annotation-rules` option for annotation support
- new: added `--post-processor` option to export and visualize annotations (HTML, XML and surface form export)
- chg: apply `--encoding` to Web URLs as well
Misc:
- chg: migrated to semantic versioning as described on https://semver.org/
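For readers unfamiliar with semantic versioning, the following minimal sketch shows how the precedence rules from semver.org order a release and its pre-releases. The `precedence_key` helper and the version strings are illustrative assumptions, not the project's actual tags or tooling, and build metadata is not handled.

```python
import re

# MAJOR.MINOR.PATCH with an optional "-prerelease" suffix (build metadata omitted).
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def precedence_key(version):
    """Build a sortable key that follows the semver.org precedence rules."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, prerelease = match.groups()
    core = (int(major), int(minor), int(patch))
    if prerelease is None:
        # A normal release ranks above all of its pre-releases.
        return core, (1,)
    identifiers = tuple(
        # Numeric identifiers compare as numbers and rank below alphanumeric ones.
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in prerelease.split(".")
    )
    return core, (0,) + identifiers


if __name__ == "__main__":
    versions = ["2.0.0", "2.0.0-rc.2", "1.1.2", "2.0.0-rc.1"]
    print(sorted(versions, key=precedence_key))
```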
Note
In terms of functionality, this release corresponds to Inscriptis 2.0rc2.