================
Breaking changes:
- the scale for term weights has changed from 0..10 to -10..10. See the docs for what the values mean.
- Elasticsearch indexes should be recreated to allow returning one result per domain
Fixes:
- small fixes to various command line tools
- the detection of visible HTML elements is much improved. Previously, some extraction results mixed non-visible blocks in with visible ones.
- better detection of creation datetimes of pages
New features:
- a planet can share specific domains: `planet_config.entity_source` can specify a `domain_list`
- better tokenization for emails
- debug page for extraction results shows virtual blocks (e.g. with kind='title')
- the term_dens algorithm excludes common words and punctuation
- weights can be assigned to individual URLs, using the `*` and `**` characters for pattern matching
- include a score for URL length (the number of path components)
- the `kconfig.py` file can be put in `~/.kconfig.py`
- the `Index.search` method accepts a `one_per_domain` parameter, which limits the output to one result per domain
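
To illustrate two of the items above, here is a minimal Python sketch of what "URL length as the number of path components" and "one result per domain" could mean. It is not the project's implementation: the helper names `path_component_count` and `one_per_domain` are hypothetical, and the real scoring and the behaviour of `Index.search` may differ.

```python
from urllib.parse import urlparse


def path_component_count(url):
    """Count the non-empty path components of a URL (hypothetical helper)."""
    path = urlparse(url).path
    return len([part for part in path.split("/") if part])


def one_per_domain(urls):
    """Keep only the first URL seen for each domain, preserving input order.

    Assumes `urls` is already ranked, so the first hit per domain is the best.
    """
    seen = set()
    kept = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept
```

Under these assumptions, `http://example.com/a/b/c.html` has three path components, and a ranked list with two hits from the same domain keeps only the higher-ranked one.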