This release contains the packages that automate the cleaning, preprocessing, and feature engineering portions of data science workflows.
This release is for market research and feedback.
Release Notes:
Add Cleaning, Preprocessing and Feature Engineering techniques.
This library can be used in two ways: through the individual function files or, for more functionality and modularity, through the wrapper classes `Clean`, `Preprocess`, and `Feature`.
General:
- Train/test data is split at the beginning of each process to avoid data leakage.
- Can start at any stage of the machine learning workflow by initializing the wrapper classes with data.
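The leakage point above can be illustrated with a minimal plain-Python sketch. It is not this library's API: `split_train_test`, `fit_mean`, and `fill_missing` are hypothetical helpers, missing values are assumed to be `None`, and the split is a simple seeded shuffle. The key idea is that any statistic used to transform the data is learned from the training split alone and then reused on the test split.

```python
import random

def split_train_test(rows, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

def fit_mean(values):
    """Mean of the non-missing values (None is treated as missing)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)

def fill_missing(values, fill):
    """Replace every None with the given fill value."""
    return [fill if v is None else v for v in values]

# Toy numeric column with gaps.
data = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]

# Split first, so the test rows never influence the fitted statistic.
train, test = split_train_test(data, test_size=0.25)
train_mean = fit_mean(train)
train_filled = fill_missing(train, train_mean)
test_filled = fill_missing(test, train_mean)  # reuse the train statistic
```

Computing the mean over the full column before splitting would let the test rows shape the imputed values, which is exactly the leakage the split is meant to prevent.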
Cleaning:
- General
  - Remove rows in which more than a specified percentage of values are missing.
  - Remove columns in which more than a specified percentage of values are missing.
- Numeric
  - Replace missing values with the mean, median, or mode of the column.
  - Replace missing values with a constant.
- Categorical
  - Remove a row if a specified column is null.
  - Replace missing values with a specified category.
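The cleaning operations listed above can be sketched in plain Python as follows. This is an illustration, not this library's implementation: the dataset is assumed to be a list of dicts with `None` marking a missing value, every function name is hypothetical, and thresholds are fractions (0.5 means 50%) where a row or column is dropped only when its missing share is strictly greater than the threshold.

```python
from statistics import mean, median, mode

def drop_sparse_rows(rows, max_missing=0.5):
    """Hypothetical: keep rows whose fraction of missing values is <= max_missing."""
    return [r for r in rows
            if sum(v is None for v in r.values()) / len(r) <= max_missing]

def drop_sparse_columns(rows, max_missing=0.5):
    """Hypothetical: drop columns whose fraction of missing values exceeds max_missing."""
    if not rows:
        return rows
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing]
    return [{c: r[c] for c in keep} for r in rows]

def impute(rows, column, strategy="mean", constant=None):
    """Hypothetical: fill one column's gaps with its mean/median/mode or a constant."""
    present = [r[column] for r in rows if r[column] is not None]
    stats = {"mean": mean, "median": median, "mode": mode}
    fill = constant if strategy == "constant" else stats[strategy](present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def drop_if_null(rows, column):
    """Hypothetical: remove every row in which the given column is missing."""
    return [r for r in rows if r[column] is not None]

rows = [
    {"age": 25,   "income": 50.0, "city": "Paris", "color": None,   "notes": None},
    {"age": None, "income": None, "city": None,    "color": None,   "notes": None},
    {"age": 35,   "income": None, "city": "Rome",  "color": "red",  "notes": None},
    {"age": 45,   "income": 70.0, "city": "Oslo",  "color": "blue", "notes": "vip"},
]

# General: drop the fully empty row, then the mostly empty "notes" column.
cleaned = drop_sparse_columns(drop_sparse_rows(rows))
# Numeric: fill the missing income with the column mean (60.0).
cleaned = impute(cleaned, "income", strategy="mean")
# Categorical: either fill with a category or drop rows where "color" is null.
filled = impute(cleaned, "color", strategy="constant", constant="unknown")
strict = drop_if_null(cleaned, "color")
```

In a full pipeline the mean, median, or mode would be computed on the training split only, as described in the General notes.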
Preprocessing:
- Numeric
  - Normalize numeric values between 0 and 1.
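Normalizing to the range 0 to 1 is usually done with min-max scaling, x' = (x - min) / (max - min). The sketch below is a hypothetical plain-Python version, not this library's code. It assumes the column has already been cleaned of missing values and, in keeping with the train/test split described above, learns `min` and `max` from the training split only.

```python
def fit_min_max(train_values):
    """Learn (min, max) from the training split only."""
    return min(train_values), max(train_values)

def min_max_scale(values, lo, hi):
    """Map values via (x - lo) / (hi - lo); a constant column maps to all zeros."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

train = [10.0, 20.0, 30.0, 50.0]
test = [15.0, 60.0]

lo, hi = fit_min_max(train)
train_scaled = min_max_scale(train, lo, hi)  # [0.0, 0.25, 0.5, 1.0]
test_scaled = min_max_scale(test, lo, hi)    # [0.125, 1.25]
```

Because the bounds come from the training data, test values outside the training range can land outside 0 to 1 (here 60.0 becomes 1.25); clipping them is a separate design choice.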
Feature Engineering:
- Categorical
  - One-Hot Encoding
- Text
  - TF-IDF
  - Bag of Words
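The three feature engineering techniques above can be sketched in plain Python. These are hypothetical helpers, not this library's API. Tokenization is assumed to be lowercase whitespace splitting, and TF-IDF uses the plain textbook form (raw term count times idf = ln(N / df)) with no smoothing or normalization; libraries such as scikit-learn's `TfidfVectorizer` default to a smoothed idf plus L2 normalization, so their numbers will differ.

```python
import math
from collections import Counter

def one_hot(values):
    """Hypothetical: return (categories, vectors) with one 0/1 column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        vectors.append(row)
    return categories, vectors

def bag_of_words(docs):
    """Hypothetical: return (vocabulary, per-document term-count vectors)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[w] for w in vocab] for c in counts]

def tf_idf(docs):
    """Hypothetical: raw count times ln(N / df), where df = docs containing the term."""
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[tf * w for tf, w in zip(row, idf)] for row in bow]

categories, vectors = one_hot(["red", "blue", "red"])
docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, counts = bag_of_words(docs)
_, weights = tf_idf(docs)
```

Here "the" appears in every document, so its idf is ln(1) = 0 and it carries no TF-IDF weight, while rarer words such as "dog" get the largest weights. The one-hot sketch raises `KeyError` for a category it was not built with, so categories seen only in test data need an explicit policy.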