Tokenization only (without sentence segmentation) models trained on all trainable corpora from UD 2.8 collection (67 languages), 8 models for each corpus. Deeplima commit: https://github.com/aymara/lima/commit/40ab9ad6e53fde56daf8b357ea0bc078d0545968 LSTM hidden state: 8