Malaya

Latest version: v5.1.1


1.3

1. Release pretrained Bahasa Malaysia using wikipedia dataset, simply, `malaya.word2vec.load_wiki()`, https://malaya.readthedocs.io/en/latest/Word2vec.html

2. Retrained summarization model based on news dataset, simply `malaya.summarize.deep_model_news()`

3. Release pretrained summarization model based on wikipedia dataset, simply `malaya.summarize.deep_model_wiki()`

4. Provide interface to train word2vec on custom dataset, simply `malaya.word2vec.train()`, https://malaya.readthedocs.io/en/latest/Word2vec.html#train-on-custom-corpus

5. Provide interface to train skip-thought on custom dataset for summarization agent, simply `malaya.summarize.train_skip_thought()`, https://malaya.readthedocs.io/en/latest/Summarization.html#train-skip-thought-summarization-deep-learning-model

1.2

1. Released emotion analysis, https://malaya.readthedocs.io/en/latest/Emotion.html
2. Added sparse `fast-text-char` deep learning model for sentiment, emotion, and subjectivity analysis.

Sparse deep learning models

What happens if a word is not in a model's vocabulary? Take *setan*: what if *setan* appears in the text we want to classify? We hit this problem when classifying social media texts and posts, where many of the words used are not in any standard vocabulary.

Malaya treats unknown words as `<UNK>`. To work around this, we use character-based N-grams; Malaya chose tri-grams up to five-grams.
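As a rough illustration of the `<UNK>` fallback described above (the `vocab` dictionary and helper here are hypothetical, not Malaya's actual internals):

```python
def tokens_to_ids(tokens, vocab, unk_token="<UNK>"):
    """Map each token to its vocabulary id, falling back to <UNK> for OOV words."""
    unk_id = vocab[unk_token]
    return [vocab.get(token, unk_id) for token in tokens]

# Toy vocabulary: "setan" is out of vocabulary, so it collapses to <UNK>.
vocab = {"<UNK>": 0, "saya": 1, "suka": 2}
print(tokens_to_ids(["saya", "suka", "setan"], vocab))  # [1, 2, 0]
```

Every out-of-vocabulary word maps to the same id, so the model loses all information about it; character n-grams avoid this collapse.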

`setan = ['set', 'eta', 'tan']`

Scikit-learn provides an easy interface for n-grams, but the result is very sparse: lots of zeros and not memory efficient. Scikit-learn returns a sparse matrix, and luckily TensorFlow already provides sparse operations to consume it.
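A minimal sketch of the character n-gram idea (tri-grams up to five-grams, as described above; this helper is illustrative, not Malaya's code):

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Return all character n-grams of `word` for each n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

print(char_ngrams("setan"))
# ['set', 'eta', 'tan', 'seta', 'etan', 'setan']
```

Even if *setan* itself is out of vocabulary, sub-word pieces like `set` and `tan` are likely shared with known words, so the model can still produce a meaningful representation.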

Simply call `malaya.sentiment.sparse_deep_model()`, `malaya.subjective.sparse_deep_model()`, or `malaya.emotion.sparse_deep_model()`.

1.1

1. Added deep learning model for language detection, simply call `malaya.language_detection.deep_model()`.
2. Retrained language detection models.

1.0

1. Major housekeeping; old APIs fully replaced by new APIs.
2. Added subjectivity analysis, https://malaya.readthedocs.io/en/latest/Subjective.html.
3. Added stacking module, https://malaya.readthedocs.io/en/latest/Stack.html.
4. Added clustering module, https://malaya.readthedocs.io/en/latest/Cluster.html
5. Added visualization for word2vec, https://malaya.readthedocs.io/en/latest/Word2vec.html
6. Built a systematic caching system, https://malaya.readthedocs.io/en/latest/Cache.html

0.043

0.9

1. Added LDA2Vec model for topic modelling.
2. Topic-modelling models can now be visualized with [pyLDAvis](https://github.com/bmabey/pyLDAvis), simply call `model.visualize_topics()`
3. No longer depends on NLTK.
4. Added stochastic gradient descent model for language detection, simply `malaya.sgd_detect_languages()`
5. Retrained language detection models.
