- added pymorphy2 lemmatization (81)
- added token frequency support (85)
- added threshold selection for binary classification (86)
- added arbitrary save folder name (83)
---
pymorphy2 lemmatization ([config.yaml](https://github.com/dayyass/text-classification-baseline#config))
```yaml
# preprocessing
# (included in resulting model pipeline, so preserved for inference)
preprocessing:
  lemmatization: pymorphy2
```
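For readers unfamiliar with lemmatization, here is a minimal sketch of what such a preprocessing step might look like. It is not the package's actual implementation: the helper names `lemmatize_text` and `make_pymorphy2_normalizer` are hypothetical, and whitespace tokenization is an assumption. Only `pymorphy2.MorphAnalyzer().parse(token)[0].normal_form` is taken from the pymorphy2 API.

```python
from typing import Callable


def lemmatize_text(text: str, normal_form: Callable[[str], str]) -> str:
    """Replace every whitespace-separated token with its normal form (hypothetical helper)."""
    return " ".join(normal_form(token) for token in text.split())


def make_pymorphy2_normalizer() -> Callable[[str], str]:
    """Build a token normalizer backed by pymorphy2 (hypothetical helper).

    The import is done lazily so the rest of the sketch runs without pymorphy2 installed.
    """
    import pymorphy2  # third-party dependency

    morph = pymorphy2.MorphAnalyzer()
    # parse() returns candidate analyses; the first one is taken here as an assumption.
    return lambda token: morph.parse(token)[0].normal_form


# Stand-in normalizer for demonstration only; swap in make_pymorphy2_normalizer() for real use.
demo_result = lemmatize_text("Some Mixed CASE text", str.lower)
print(demo_result)
```

Because the step is part of the saved pipeline, the same normalization would be applied to raw text at inference time without extra code.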
token frequency support
- `text_clf.token_frequency.get_token_frequency(path_to_config)` - <br> get token frequency of **train dataset** according to the config file parameters
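To illustrate the kind of result a token frequency count produces, here is a minimal standalone sketch. It does not reproduce `get_token_frequency`: the function name `count_token_frequency`, the lowercasing, and the whitespace tokenization are assumptions, and the corpus is a toy example rather than a real train dataset.

```python
from collections import Counter
from typing import Dict, Iterable


def count_token_frequency(texts: Iterable[str]) -> Dict[str, int]:
    """Count occurrences of each lowercased, whitespace-separated token (hypothetical helper)."""
    counter: Counter = Counter()
    for text in texts:
        counter.update(text.lower().split())
    return dict(counter)


# Illustrative toy corpus, not real training data.
train_texts = ["good movie", "bad movie", "good good plot"]
frequency = count_token_frequency(train_texts)

# Show tokens from most to least frequent, ties broken alphabetically.
for token, count in sorted(frequency.items(), key=lambda kv: (-kv[1], kv[0])):
    print(token, count)
```

A table like this can help when choosing vocabulary-related settings such as a minimum token count.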
threshold selection for binary classification
- `text_clf.pr_roc_curve.get_precision_recall_curve(path_to_model_folder)` - <br> get *precision* and *recall* metrics for precision-recall curve
- `text_clf.pr_roc_curve.get_roc_curve(path_to_model_folder)` - <br> get *false positive rate (fpr)* and *true positive rate (tpr)* metrics for ROC curve
- `text_clf.pr_roc_curve.plot_precision_recall_curve(precision, recall)` - <br> plot *precision-recall curve*
- `text_clf.pr_roc_curve.plot_roc_curve(fpr, tpr)` - <br> plot *ROC curve*
- `text_clf.pr_roc_curve.plot_precision_recall_f1_curves_for_thresholds(precision, recall, thresholds)` - <br> plot *precision*, *recall*, *f1-score* curves for probability thresholds
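One common way to turn these curves into a decision threshold is to pick the threshold with the highest f1-score. The sketch below shows that idea in plain Python. The helper `best_f1_threshold` is hypothetical and not part of the package, and the input values are illustrative. It assumes the arrays follow the scikit-learn `precision_recall_curve` convention, where `precision` and `recall` have one more element than `thresholds`.

```python
from typing import Sequence, Tuple


def best_f1_threshold(
    precision: Sequence[float],
    recall: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, f1) for the threshold with the highest f1-score (hypothetical helper)."""
    if len(thresholds) == 0:
        raise ValueError("thresholds must not be empty")

    best_threshold, best_f1 = thresholds[0], -1.0
    # zip stops at len(thresholds), so the trailing curve point without a threshold is skipped.
    for p, r, t in zip(precision, recall, thresholds):
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


# Illustrative values only, not output from a real model.
precision = [0.5, 0.67, 1.0, 1.0]
recall = [1.0, 1.0, 0.5, 0.0]
thresholds = [0.2, 0.4, 0.8]

threshold, f1 = best_f1_threshold(precision, recall, thresholds)
print(f"threshold={threshold}, f1={f1:.3f}")
```

Maximizing f1 is only one possible criterion; a task that penalizes false positives more heavily might instead pick the lowest threshold that reaches a required precision.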
arbitrary save folder name ([config.yaml](https://github.com/dayyass/text-classification-baseline#config))
```yaml
experiment_name: model
```