Added
- speed optimizations, ~250%
- pseudo-annotating eponymous diseases (e.g. Creutzfeldt-Jakob)
- `PatientNameAnnotator`, which replaces `deduce.pattern`
- a structured way of loading and building lookup structures (lists and tries), including caching
- `pre_match_words` for some regexp annotators, speeding up annotation
- option to pass a user config as a dict (using the `config` keyword)
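
The `pre_match_words` idea can be illustrated with a minimal sketch. Before running a comparatively expensive regexp, the annotator checks whether the text contains at least one word that every match must contain, and skips the regexp otherwise. The class `PreMatchRegexpAnnotator` and its `(start, end, text)` output are hypothetical and are not deduce's actual API. The shortcut is only safe under the assumption that every possible match contains one of the pre-match words.

```python
from __future__ import annotations

import re


class PreMatchRegexpAnnotator:
    """Hypothetical illustration, not deduce's own regexp annotator."""

    def __init__(self, pattern: str, pre_match_words: list[str] | None = None):
        self.regexp = re.compile(pattern)
        # Lowercased set for fast membership tests.
        # Assumes pre-match words are single \w+ tokens.
        self.pre_match_words = (
            {w.lower() for w in pre_match_words} if pre_match_words else None
        )

    def annotate(self, text: str) -> list[tuple[int, int, str]]:
        if self.pre_match_words is not None:
            words = set(re.findall(r"\w+", text.lower()))
            if words.isdisjoint(self.pre_match_words):
                # No trigger word present, so (by assumption) no match is
                # possible and the regexp is not run at all.
                return []
        return [(m.start(), m.end(), m.group()) for m in self.regexp.finditer(text)]
```

For example, an age pattern such as `\d{1,3} jaar` can only match when the word `jaar` occurs, so texts without it never reach the regexp.
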
Changed
- speedup for `TokenPatternAnnotator`
- some internals of `ContextPatternAnnotator`
- initials now detected by a lookup list, rather than a pattern
- redactor open and close chars from `<` `>` to `[` `]`, as the previous chars caused issues in HTML (so deidentified text now shows `[PATIENT]`, `[LOCATIE]`, etc.)
- names of lookup structures to singular (`prefix`, rather than `prefixes`)
- `INSTELLING` tag split into `ZIEKENHUIS` and `ZORGINSTELLING`
- refactored and simplified annotator loading; specifically, the `annotator_type` config keyword now accepts references to classes (e.g. `deduce.annotator.TokenPatternAnnotator`)
- renamed `interfix_with_capital` annotator to `interfix_with_name`
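
One way a string such as `deduce.annotator.TokenPatternAnnotator` in the `annotator_type` keyword could be turned into a class is with the standard library's `importlib`. The sketch below assumes the reference is always a fully dotted `module.ClassName` path; the helper `resolve_annotator_type` is hypothetical and is not deduce's actual loading code.

```python
import importlib


def resolve_annotator_type(reference: str) -> type:
    """Hypothetical helper: resolve 'package.module.ClassName' to a class."""
    module_name, _, class_name = reference.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path, got {reference!r}")
    # Import the module part, then look up the class attribute on it.
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise TypeError(f"{reference!r} does not refer to a class")
    return cls
```

Resolving classes this way removes the need for a fixed mapping of type names (such as `regexp` or `token_pattern`) to classes, since any importable class can be referenced directly.
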
Deprecated
- the `config_file` keyword, now replaced by `config` which accepts both filenames and dicts
- old lookup list names, e.g. `prefixes` now replaced by `prefix`
- annotator types `custom`, `regexp`, `token_pattern`, `dd_token_pattern` and `annotation_context`, all replaced by setting the class directly as `annotator_type`
- everything in `deduce.pattern`; patient patterns are now replaced by `PatientNameAnnotator`
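
The move from `config_file` to a single `config` keyword that accepts both a filename and a dict could look roughly like the sketch below. It assumes config files are JSON and that `config` takes precedence when both keywords are passed; the function `resolve_config` is a hypothetical helper, not deduce's actual loading code.

```python
from __future__ import annotations

import json
import warnings
from typing import Any


def resolve_config(
    config: str | dict | None = None,
    config_file: str | None = None,
) -> dict[str, Any]:
    """Hypothetical sketch: accept a config as a JSON filename or a dict."""
    if config_file is not None:
        # Deprecated keyword: still honoured, but warn the caller.
        warnings.warn(
            "config_file is deprecated, use config instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if config is None:
            config = config_file
    if config is None:
        return {}
    if isinstance(config, dict):
        return dict(config)  # shallow copy so the caller's dict is not mutated
    with open(config, encoding="utf-8") as f:
        return json.load(f)
```
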
Removed
- automated coverage reporting on coveralls.io
- options `lowercase_lookup`, `lowercase_neg_lookup` for token patterns
- `utils.any_in_text`
Fixed
- some small additions/removals for specific lookup lists
- minor bugs related to overlapping matches