This ensures proper tokenization of abbreviations, e.g. tokenizing "To je djelo dr. Ljubešića" as a single sentence instead of splitting it into two sentences at the full stop.
2.1
- Added new models for all languages
- Added new "web" processing type
- Fixed sentence splitting in the tokenizers
2.0
- Added new models for standard Slovenian
- Added new inflectional lexicon for Slovenian
- Adapted tests to new model outputs
- Modified lexicon to store underscores instead of empty strings
- Other changes
1.2.0
- Added SRL parsing for Slovenian
- Fixed training for the lemmatizer and POS tagger
- Added toy tests for all trainings
- Other smaller fixes
1.1.1
- Updated external package version requirements, mainly due to updates in the Slovenian obeliks tokenizer
1.1.0
- Added tokenizer pretag option for both obeliks and reldi-tokeniser (via `pos_lemma_pretag`)
- Updated the Slovene inflectional lexicon and moved it from the lemmatizer model to the morphosyntactic annotation model
- Added upos and ufeats control to the Slovene inflectional lexicon
- Other smaller fixes