New features
- Adds the random_classification method, which classifies text at random and is useful for testing model accuracy
- Adds the replace_with_blob method, which creates blobs from a corpus and is useful for testing method accuracy
- Adds the strings module, with lists of punctuation and diacritic strings
- Adds a tokenizer to aruana
- Adds a POS tagger for Portuguese
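
To illustrate why random classification helps when testing model accuracy, here is a minimal standalone sketch. It does not use aruana and does not reflect how random_classification is implemented; the helper names `random_labels` and `accuracy`, the sample texts, and the class names are all assumptions made up for this example. The idea is that a model should score clearly above the accuracy obtained by guessing at random.

```python
import random


def random_labels(texts, classes, seed=None):
    """Assign a uniformly random class to each text (hypothetical helper, not aruana's API)."""
    rng = random.Random(seed)
    return [rng.choice(classes) for _ in texts]


def accuracy(predicted, expected):
    """Return the fraction of positions where the two label lists agree."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Made-up sample data, for illustration only.
texts = ["bom dia", "que filme ruim", "adorei o livro", "não gostei"]
gold = ["pos", "neg", "pos", "neg"]

# Random guesses act as a baseline to compare a real model against.
baseline = random_labels(texts, ["pos", "neg"], seed=0)
print(accuracy(baseline, gold))
```

With two balanced classes, the random baseline tends toward an accuracy of about 0.5 on large samples, so a trained classifier that scores near that value is likely not learning anything useful.
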
Improvements
- Expand words now recognizes more Portuguese words
- Preprocess text now converts emojis to text instead of removing them
- Replaces the NLTK tokenizer with an internal tokenizer