Changed
- tokenizer logic:
    - a token is now a sequence of alphanumeric characters, a single newline, or a single special character
    - whitespace is no longer considered a token
- moved token pattern logic to config, using a new `TokenPatternAnnotator`
- moved context pattern logic to config, using a new `ContextAnnotator`
- many updates to name detection logic
- lookup list optimizations
- added, removed and simplified patterns
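
A minimal sketch of the tokenizer rule described above, for illustration only. The regular expression and the `tokenize` function name are assumptions, not the project's actual implementation; in particular, treating the underscore as a special character and dropping all whitespace other than `\n` are choices made for this sketch.

```python
import re

# Hypothetical pattern, tried left to right at each position:
#   [^\W_]+   a run of alphanumeric characters (Unicode letters and digits)
#   \n        a single newline
#   [^\w\s]   a single special character (neither alphanumeric nor whitespace)
#   _         the underscore, treated here as a special character
TOKEN_PATTERN = re.compile(r"[^\W_]+|\n|[^\w\s]|_")


def tokenize(text: str) -> list[str]:
    """Split text into tokens; whitespace other than newlines yields no token."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Jan de Vries, 06-12345678\nok"))
```

Under these assumptions, spaces and tabs only separate tokens, while each newline and each special character becomes a token of its own.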