- fixed an issue where the parser produced extension labels that were not CoNLL-U compliant, using underscores (e.g. `cc_preconj`) instead of colon-separated labels (e.g. `cc:preconj`)
- during lemmatization, if a token consists of a character that is not present in the seq2seq vocabulary, the lemma is now identical to the token
- added PUNCT control
- fixed MISC column bug for NER
- `punct` in Bulgarian UPOS was renamed to `Z`
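
The label fix and the lemmatization fallback above can be illustrated with a minimal sketch. This is not the project's actual code: the function names `normalize_deprel` and `lemma_or_token`, the `vocab` set, and the `predict` callable are all hypothetical, and the fallback is assumed to trigger whenever any character of the token is missing from the vocabulary.

```python
from typing import Callable, Set


def normalize_deprel(label: str) -> str:
    """Convert an underscore-separated extension label to the colon form.

    Hypothetical helper: only the first underscore is treated as the
    separator between the relation and its subtype.
    """
    return label.replace("_", ":", 1)


def lemma_or_token(token: str, vocab: Set[str], predict: Callable[[str], str]) -> str:
    """Return the token unchanged when the model cannot encode it.

    Assumption: `vocab` holds the characters known to the seq2seq model
    and `predict` stands in for the model's lemma prediction.
    """
    if any(ch not in vocab for ch in token):
        return token  # unknown character: lemma is identical to the token
    return predict(token)


if __name__ == "__main__":
    print(normalize_deprel("cc_preconj"))
    print(lemma_or_token("€", set("abc"), lambda t: t.lower()))
```

Labels without a subtype, such as `nsubj`, pass through `normalize_deprel` unchanged because they contain no underscore.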