Retrofit

Latest version: v0.1.7


0.1.5

Feature engineering is now class-based. Users can choose between datatable, polars, and pandas for feature engineering operations.
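One way to picture a class-based setup with a selectable backend is a class that stores the chosen `Processing` value once and routes every operation to a backend-specific method. The sketch below is hypothetical: `FeatureEngineeringSketch` and its methods are illustrative names, not RetroFit's actual classes, and the backend stubs only report which path was taken.

```python
# Hypothetical sketch: the backend is chosen once at construction time and
# each operation dispatches to a backend-specific method by name.
# Class and method names are illustrative, not RetroFit's actual API.

class FeatureEngineeringSketch:
    SUPPORTED = ("datatable", "polars", "pandas")

    def __init__(self, processing="datatable"):
        if processing not in self.SUPPORTED:
            raise ValueError(f"Unsupported backend: {processing!r}")
        self.processing = processing

    def dummy_variables(self, rows, column):
        # Look up e.g. _dummy_variables_polars when processing == "polars".
        impl = getattr(self, f"_dummy_variables_{self.processing}")
        return impl(rows, column)

    # Real backends would call their own library; these stubs only return
    # a tuple identifying the path taken so the dispatch can be observed.
    def _dummy_variables_datatable(self, rows, column):
        return ("datatable", column, len(rows))

    def _dummy_variables_polars(self, rows, column):
        return ("polars", column, len(rows))

    def _dummy_variables_pandas(self, rows, column):
        return ("pandas", column, len(rows))


fe_pl = FeatureEngineeringSketch(processing="polars")
print(fe_pl.dummy_variables([{"a": 1}, {"a": 2}], "a"))  # ('polars', 'a', 2)
```

Keeping the backend choice on the instance means callers set it once instead of passing it to every function call.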

The ML examples in the README currently reflect usage of the datatable version, with expanded feature engineering to highlight its usage.

0.1.4

RetroFit class:

Added XGBoost and LightGBM. Scoring now also allows users to pass in new data. Examples are in the README.
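Scoring either a stored data set or newly supplied data can be sketched as a method that accepts exactly one of the two. This is a minimal illustration under assumptions: `ScoringSketch`, `score()`, and the `data_name`/`new_data` parameters are hypothetical names, and the "models" are plain callables standing in for fitted XGBoost or LightGBM objects.

```python
# Hypothetical sketch: score a stored data set by name, or new data passed
# in directly. Names are illustrative, not RetroFit's actual API.

class ScoringSketch:
    def __init__(self, data_sets, models):
        self.DataSets = data_sets  # name -> list of input rows
        self.ModelList = models    # name -> callable(row) -> prediction

    def score(self, model_name, data_name=None, new_data=None):
        # Exactly one source of rows must be supplied.
        if (data_name is None) == (new_data is None):
            raise ValueError("Pass exactly one of data_name or new_data")
        rows = self.DataSets[data_name] if new_data is None else new_data
        model = self.ModelList[model_name]
        return [model(row) for row in rows]


s = ScoringSketch({"test_data": [1, 2]}, {"double": lambda row: 2 * row})
print(s.score("double", data_name="test_data"))  # [2, 4]
print(s.score("double", new_data=[10, 20]))      # [20, 40]
```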

0.1.1

Added the first of many versions of the RetroFit class for machine learning.



Goals


Class Initialization
Model Initialization
Training
Grid Tuning
Scoring
Model Evaluation
Model Interpretation


Functions


ML1_Single_Train()
ML1_Single_Score()


Attributes


self.ModelArgs = ModelArgs
self.ModelArgsNames = [*self.ModelArgs]
self.Runs = len(self.ModelArgs)
self.DataSets = DataSets
self.DataSetsNames = [*self.DataSets]
self.ModelList = dict()
self.ModelListNames = []
self.FitList = dict()
self.FitListNames = []
self.EvaluationList = dict()
self.EvaluationListNames = []
self.InterpretationList = dict()
self.InterpretationListNames = []
self.CompareModelsList = dict()
self.CompareModelsListNames = []
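The attributes above pair each result dictionary with a list of its names; `[*self.ModelArgs]` unpacks a dictionary into a list of its keys in insertion order, which is what makes index-based lookups such as `x.ModelListNames[0]` work. The sketch below reproduces a subset of that bookkeeping in isolation. `RetroFitAttributesSketch`, the `record_fit()` helper, and the example dictionary contents are assumptions for illustration only.

```python
# Hypothetical sketch of the attribute bookkeeping: each dict of results is
# paired with a list of its keys. The class name, record_fit(), and the
# example ModelArgs/DataSets contents are illustrative assumptions.

class RetroFitAttributesSketch:
    def __init__(self, ModelArgs, DataSets):
        self.ModelArgs = ModelArgs
        self.ModelArgsNames = [*self.ModelArgs]  # unpacking a dict yields its keys
        self.Runs = len(self.ModelArgs)
        self.DataSets = DataSets
        self.DataSetsNames = [*self.DataSets]
        self.ModelList = dict()
        self.ModelListNames = []
        self.FitList = dict()
        self.FitListNames = []

    def record_fit(self, name, model, fit):
        # Store the objects and append the name so positional lookups
        # like ModelListNames[0] return the first model trained.
        self.ModelList[name] = model
        self.ModelListNames.append(name)
        self.FitList[name] = fit
        self.FitListNames.append(name)


x = RetroFitAttributesSketch(
    ModelArgs={"Ftrl": {}},
    DataSets={"train_data": [], "validation_data": [], "test_data": []},
)
x.record_fit("Ftrl1", model="model-object", fit="fit-object")
print(x.DataSetsNames)   # ['train_data', 'validation_data', 'test_data']
print(x.ModelListNames)  # ['Ftrl1']
```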


Example Usage


Setup Environment
import timeit
import datatable as dt
from datatable import sort, f, by
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import MachineLearning as ml

Load some data
BenchmarkData.csv is located in the tests folder
Path = "./BenchmarkData.csv"
data = dt.fread(Path)

Create partitioned data sets
Data = fe.FE2_AutoDataParition(
data=data,
ArgsList=None,
DateColumnName=None,
PartitionType='random',
Ratios=[0.7,0.2,0.1],
ByVariables=None,
Sort=False,
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')
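A random partition with `Ratios=[0.7,0.2,0.1]` amounts to shuffling the rows and cutting them into three consecutive slices. The sketch below shows that idea in plain Python; it is not the implementation of `FE2_AutoDataParition`, and `random_partition()` and its `seed` parameter are hypothetical. The output keys mirror the `TrainData`, `ValidationData`, and `TestData` keys used above.

```python
# Minimal sketch of a random train/validation/test split by ratios.
# random_partition() is a hypothetical helper, not FE2_AutoDataParition itself.
import random


def random_partition(rows, ratios=(0.7, 0.2, 0.1), seed=None):
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("Ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_valid = round(n * ratios[1])
    return {
        "TrainData": shuffled[:n_train],
        "ValidationData": shuffled[n_train:n_train + n_valid],
        # Whatever remains after rounding goes to the test set.
        "TestData": shuffled[n_train + n_valid:],
    }


parts = random_partition(range(100), seed=42)
print({k: len(v) for k, v in parts.items()})
# {'TrainData': 70, 'ValidationData': 20, 'TestData': 10}
```

Assigning the remainder to the last slice guarantees every row lands in exactly one partition even when the ratios do not divide the row count evenly.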

Prepare modeling data sets
DataSets = ml.ML0_GetModelData(
Processing='Ftrl',
TrainData=Data['TrainData'],
ValidationData=Data['ValidationData'],
TestData=Data['TestData'],
ArgsList=None,
TargetColumnName='Leads',
NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
TextColumnNames=None,
WeightColumnName=None,
Threads=-1,
InputFrame='datatable')
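Conceptually, preparing modeling data means pulling the target column out and keeping only the listed feature columns (plus an optional weight column). The sketch below illustrates that column bookkeeping on plain row dictionaries, assuming `ML0_GetModelData` does something along these lines before building algorithm-specific objects; `split_features_target()` and its return keys are hypothetical.

```python
# Hypothetical sketch: separate target, features, and optional weights
# from row dicts. split_features_target() is illustrative, not RetroFit's API.

def split_features_target(rows, target, numeric, categorical, weight=None):
    feature_cols = list(numeric) + list(categorical)
    X = [{c: r[c] for c in feature_cols} for r in rows]  # listed features only
    y = [r[target] for r in rows]
    w = [r[weight] for r in rows] if weight is not None else None
    return {"X": X, "y": y, "weight": w}


rows = [
    {"Leads": 5, "XREGS1": 1.2, "MarketingSegments": "a", "Other": 0},
    {"Leads": 3, "XREGS1": 0.4, "MarketingSegments": "b", "Other": 1},
]
out = split_features_target(rows, "Leads", ["XREGS1"], ["MarketingSegments"])
print(out["y"])     # [5, 3]
print(out["X"][0])  # {'XREGS1': 1.2, 'MarketingSegments': 'a'}
```

Columns not named in any list (here `Other`) are dropped, which is why the column-name arguments matter.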

Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
Algorithms='Ftrl',
TargetType="Regression",
TrainMethod="Train")

Initialize RetroFit
x = ml.RetroFit(ModelArgs, DataSets)

Train Model
x.ML1_Single_Train(Algorithm='Ftrl')

Score data
x.ML1_Single_Score(DataName=x.DataSetsNames[2], ModelName=x.ModelListNames[0], Algorithm='Ftrl')

Scoring data names
x.DataSets.keys()

Check ModelArgs Dict
x.ModelArgs

Check the names of data sets collected
x.DataSetsNames

List of model names
x.ModelListNames

List of fitted model names
x.FitListNames

List of comparisons
x.CompareModelsListNames

0.1.0

Enhanced FE2_AutoDataPartition() for Processing = 'datatable' and 'polars'

Added xgboost and lightgbm methods to ML0_GetModelData()

Modified sorting and subsetting tasks for Processing = 'polars'

0.0.9

Added polars processing to FE2_AutoDataPartition(), added examples to the README, and fixed bugs in other functions

0.0.7

Created a framework for organizing modules and the functions within them.

New functions include:

FE2_AutoDataParition()

Example
import datatable as dt
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import utils as u

Random partition
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
DataSets = fe.FE2_AutoDataParition(
data=data,
ArgsList=None,
DateColumnName='CalendarDateColumn',
PartitionType='random',
Ratios=[0.70,0.20,0.10],
ByVariables=None,
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')
TrainData = DataSets['TrainData']
ValidationData = DataSets['ValidationData']
TestData = DataSets['TestData']
ArgsList = DataSets['ArgsList']


FE1_DummyVariables()

import datatable as dt
import retrofit
from retrofit import FeatureEngineering as fe
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
Output = fe.FE1_DummyVariables(
data=data,
ArgsList=None,
CategoricalColumnNames=['MarketingSegments','MarketingSegments2'],
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')
data = Output['data']
ArgsList = Output['ArgsList']
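Dummy (one-hot) encoding turns each level of a categorical column into a 0/1 indicator column. The plain-Python sketch below assumes a `<column>_<level>` naming scheme and keeps the original column; both are assumptions, not necessarily what `FE1_DummyVariables` produces. It also returns the levels it found so the same encoding could be reapplied to scoring data, which is presumably part of the role `ArgsList` plays.

```python
# Minimal sketch of dummy (one-hot) encoding on row dicts.
# The "<column>_<level>" naming and dummy_variables() are assumptions.

def dummy_variables(rows, columns):
    # Collect the sorted distinct levels of each categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    out = []
    for r in rows:
        new = dict(r)  # keep the original columns
        for c in columns:
            for lvl in levels[c]:
                new[f"{c}_{lvl}"] = int(r[c] == lvl)
        out.append(new)
    return out, levels


rows = [{"MarketingSegments": "A"}, {"MarketingSegments": "B"}]
encoded, levels = dummy_variables(rows, ["MarketingSegments"])
print(encoded[0])
# {'MarketingSegments': 'A', 'MarketingSegments_A': 1, 'MarketingSegments_B': 0}
```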


ML0_GetModelData()

ML0_GetModelData Example:
import datatable as dt
from datatable import sort, f, by
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import MachineLearning as ml

Load some data
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")

Create partitioned data sets
DataSets = fe.FE2_AutoDataParition(
data=data,
ArgsList=None,
DateColumnName='CalendarDateColumn',
PartitionType='random',
Ratios=[0.70,0.20,0.10],
ByVariables=None,
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')

Collect partitioned data
TrainData = DataSets['TrainData']
ValidationData = DataSets['ValidationData']
TestData = DataSets['TestData']
del DataSets

Create catboost data sets
DataSets = ml.ML0_GetModelData(
TrainData=TrainData,
ValidationData=ValidationData,
TestData=TestData,
ArgsList=None,
TargetColumnName='Leads',
NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
CategoricalColumnNames=['MarketingSegments','MarketingSegments2','MarketingSegments3','Label'],
TextColumnNames=None,
WeightColumnName=None,
Threads=-1,
Processing='catboost',
InputFrame='datatable')

Collect catboost data sets
catboost_train = DataSets['train_data']
catboost_validation = DataSets['validation_data']
catboost_test = DataSets['test_data']
