aethos Changelog

0.2.1

Release Notes
---------------
- When assigning a new column to a method object, the length of the new column data now determines whether it is applied to the training or the test dataset.
- Added drop functionality to make it easier to drop columns from your datasets
- Extended apply functions from pandas to the Feature object, so all that is required is the function you want to apply and the output column
- Can now replace missing data in a column with random values that follow the Probability Density Function of that column
- Can now make new columns by assigning a tuple of data to the object indexed by the new column name. The tuple should contain the data for the training and test set, in no particular order.
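
The length-based assignment and the random replacement described above can be sketched in plain pandas. This is an illustrative sketch only, not the library's implementation: `assign_by_length` and `fill_from_observed` are hypothetical helper names, and sampling from the observed values is used here as a simple stand-in for drawing from the column's probability density function.

```python
import numpy as np
import pandas as pd


def assign_by_length(train, test, name, values):
    """Attach `values` to whichever frame has a matching number of rows.

    Hypothetical helper: modifies the matching frame in place.
    """
    values = list(values)
    if len(values) == len(train) == len(test):
        # Length alone cannot decide when both splits are the same size.
        raise ValueError("train and test have the same length; target is ambiguous")
    if len(values) == len(train):
        train[name] = values
    elif len(values) == len(test):
        test[name] = values
    else:
        raise ValueError("length of new column matches neither train nor test")


def fill_from_observed(series, seed=None):
    """Fill NaNs by sampling, with replacement, from the non-missing values.

    Assumption: the empirical distribution approximates the column's PDF.
    """
    rng = np.random.default_rng(seed)
    observed = series.dropna().to_numpy()
    filled = series.copy()
    mask = filled.isna()
    filled[mask] = rng.choice(observed, size=int(mask.sum()))
    return filled
```

A tuple of two columns, as in the last item, could then be routed by calling `assign_by_length` once per element, which is why the order inside the tuple would not matter as long as the two splits differ in length.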

0.2.0

Release Notes
---

- Made the package easier to import; it is now: `from pyautoml import Clean`
- Can now view either the full dataset or the training dataset just by calling the Clean, Preprocess or Feature object: `clean`
- Can now view columns of either the full dataset or the training dataset by indexing the Clean, Preprocess or Feature object with the column name, like pandas: `clean[col_name]`
- Can now create new columns on the entire dataset or the training dataset by indexing the Clean, Preprocess or Feature object with the new column name and assigning an iterable, like pandas: `clean[new_col] = ...`
- Currently working on how to apply it to the test set automatically when you apply it to the training set
- Added a missing values property
- Added descriptive statistics for the entire dataframe and columns
- Added in-depth descriptive statistics for each column
- Generated a documentation website
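
The pandas-like viewing, indexing and assignment described above can be illustrated with a small wrapper. This is a minimal sketch under stated assumptions: `DataView` and its attribute names are hypothetical and are not the library's `Clean`, `Preprocess` or `Feature` classes.

```python
import pandas as pd


class DataView:
    """Hypothetical stand-in for a wrapper object; not the library's code."""

    def __init__(self, data=None, train_data=None, test_data=None):
        self.data = data
        self.train_data = train_data
        self.test_data = test_data

    def _active(self):
        # Use the full dataset if it was provided, otherwise the training split.
        return self.data if self.data is not None else self.train_data

    def __repr__(self):
        # Evaluating the object on its own shows the underlying dataframe.
        return repr(self._active())

    def __getitem__(self, column):
        # view[col_name] returns the column, like a dataframe.
        return self._active()[column]

    def __setitem__(self, column, values):
        # view[new_col] = iterable creates or overwrites a column.
        self._active()[column] = values
```

With this sketch, `view["y"] = [4, 5, 6]` writes only to the full or training data; the test split is left untouched, which matches the open item in the notes about applying the change to the test set automatically.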

0.1.2

Release Notes:

- Refactored code so developers have an easier time
- Added `PoS Tagging` as part of text feature engineering/extraction
- Added methods to remove duplicate rows and duplicate columns whose values are identical
- Calling `clean.missing_values` now displays a table outlining how many and what % of values are missing in each column
- Calling the `clean`, `feature` or `preprocessing` object now prints either the full data or the training data - whichever was provided
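
A missing-values table and duplicate removal of the kind described above can be approximated directly in pandas. The function names and the `Total` and `Percent` column labels below are assumptions chosen for illustration, not the output format of `clean.missing_values`.

```python
import pandas as pd


def missing_values_table(df):
    """Count and percentage of missing values per column, most missing first."""
    total = df.isna().sum()
    percent = total / len(df) * 100
    table = pd.DataFrame({"Total": total, "Percent": percent})
    return table.sort_values("Total", ascending=False)


def drop_duplicate_columns(df):
    """Keep the first of any set of columns whose values are identical."""
    # Transposing turns columns into rows so `duplicated` can compare them.
    is_repeat = df.T.duplicated().to_numpy()
    return df.loc[:, ~is_repeat]


def drop_duplicate_rows(df):
    """Keep the first of any set of identical rows."""
    return df.drop_duplicates()
```

Transposing is simple but copies the data, so for very wide or very large frames a column-by-column comparison may be preferable.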

0.1.1

Release Notes:

Reporting:
- Added automatic reporting
- To enable, pass a report name to the first constructor you initialize through the variable `report_name`. When you pass the `data_properties` object from constructor to constructor, the techniques will be written to the same file.
- Files are saved in the `pyautoml_reports` directory in your working directory.

General:
- Made it easier to import packages using only the pyautoml folder and the module folder (`from pyautoml.cleaning import Clean`)
- Numerous bug fixes

0.1

This release contains the packages that automate the cleaning, preprocessing, and feature engineering portions of data science workflows.

This release is for market research and feedback.

Release Notes:

Add Cleaning, Preprocessing and Feature Engineering techniques.

This library can be used in 2 ways: through the individual function files or, for more functionality and modularity, through the wrapper classes `Clean`, `Preprocess`, `Feature`.

General:
- Train/Test data splitting at the beginning of each process to avoid data leakage.
- Can start at any stage of the machine learning workflow by instantiating the wrapper classes with data.

Cleaning:
- General
  - Removing rows that have greater than a certain percentage of missing values.
  - Removing columns that have greater than a certain percentage of missing values.
- Numeric
  - Replace missing values with the Mean, Median or Mode of the column.
  - Replace missing values with a constant.
- Categorical
  - Remove row if a certain column is null.
  - Replace missing value with a category.
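
The cleaning techniques listed above can be sketched with pandas as follows. The helper names, the `threshold` and `strategy` parameters are hypothetical and do not reflect the library's function signatures. The imputation helper computes its statistic on the training split only and reuses it for the test split, in line with the data leakage note under General.

```python
import pandas as pd


def drop_sparse_rows(df, threshold):
    """Drop rows whose fraction of missing values is greater than `threshold`."""
    return df.loc[df.isna().mean(axis=1) <= threshold]


def drop_sparse_columns(df, threshold):
    """Drop columns whose fraction of missing values is greater than `threshold`."""
    return df.loc[:, df.isna().mean(axis=0) <= threshold]


def impute_numeric(train, test, column, strategy="mean"):
    """Fill missing values in both splits with a statistic from the training split."""
    if strategy == "mean":
        value = train[column].mean()
    elif strategy == "median":
        value = train[column].median()
    elif strategy == "mode":
        value = train[column].mode().iloc[0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # `assign` returns new frames, so the inputs are left unchanged.
    train = train.assign(**{column: train[column].fillna(value)})
    test = test.assign(**{column: test[column].fillna(value)})
    return train, test
```

Replacing with a constant or a category is the same `fillna` call with a fixed value, and removing rows where one column is null corresponds to `df.dropna(subset=[column])`.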

Preprocessing:
- Numeric
  - Normalize numeric values between 0 and 1.
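
Normalizing to the range 0 to 1 is commonly done with min-max scaling. The sketch below assumes that approach and a hypothetical `min_max_scale` helper; the minimum and maximum are taken from the training split so that nothing from the test split influences the scaling.

```python
import pandas as pd


def min_max_scale(train, test, column):
    """Scale `column` with the training minimum and maximum.

    Returns the scaled training and test columns as two Series.
    """
    low = train[column].min()
    high = train[column].max()
    span = high - low
    if span == 0:
        raise ValueError("column is constant in the training data; cannot scale")
    scaled_train = (train[column] - low) / span
    scaled_test = (test[column] - low) / span
    return scaled_train, scaled_test
```

Training values land in [0, 1] by construction. Test values can fall outside that range when they lie beyond the training minimum or maximum.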

Feature Engineering:
- Categorical
  - One Hot Encoding
- Text
  - TF-IDF
  - Bag of Words
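
The three feature engineering techniques above can be illustrated with a short, self-contained sketch. The function names are hypothetical, tokenization is a plain whitespace split, and the TF-IDF variant shown is the unsmoothed textbook form (term frequency times `log(N / document frequency)`), which may differ from the weighting the library or other packages apply.

```python
from collections import Counter

import numpy as np
import pandas as pd


def one_hot(df, column):
    """Replace a categorical column with one indicator column per category."""
    return pd.get_dummies(df, columns=[column])


def bag_of_words(docs):
    """Document-term count matrix using lowercase whitespace tokens."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[word] for word in vocab] for c in counts]
    return pd.DataFrame(rows, columns=vocab)


def tf_idf(docs):
    """Unsmoothed TF-IDF: (count / document length) * log(N / document frequency)."""
    bow = bag_of_words(docs)
    tf = bow.div(bow.sum(axis=1), axis=0)
    doc_freq = (bow > 0).sum(axis=0)
    idf = np.log(len(docs) / doc_freq)
    return tf * idf
```

A word that appears in every document receives a weight of zero under this variant, which is the intended effect of the inverse document frequency term.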

0.1.0

This release contains the packages that automate the cleaning, preprocessing, and feature engineering portions of data science workflows.

This release is for market research and feedback.

Release Notes:

Add Cleaning, Preprocessing and Feature Engineering techniques.

This library can be used in 2 ways: through the individual function files or, for more functionality and modularity, through the wrapper classes `Clean`, `Preprocess`, `Feature`.

General:
- Train/Test data splitting at the beginning of each process to avoid data leakage.
- Can start at any stage of the machine learning workflow by instantiating the wrapper classes with data.

Cleaning:
- General
  - Removing rows that have greater than a certain percentage of missing values.
  - Removing columns that have greater than a certain percentage of missing values.
- Numeric
  - Replace missing values with the Mean, Median or Mode of the column.
  - Replace missing values with a constant.
- Categorical
  - Remove row if a certain column is null.
  - Replace missing value with a category.

Preprocessing:
- Numeric
  - Normalize numeric values between 0 and 1.

Feature Engineering:
- Categorical
  - One Hot Encoding
- Text
  - TF-IDF
  - Bag of Words
