- a review of the testID_column parameter usage in automunge and postmunge found a few opportunities for improvement
- added automunge testID_column parameter support to pass as True for cases where the test set contains the same ID columns as the train set
- which may also be passed as the default of False for this case
- this is just meant to make usage intuitive: since postmunge allowed passing testID_column as True, can now do the same in automunge
- as part of that update realized there was a small redundancy where ID column extraction had gotten integrated in with index column extraction, so broke the ID column extraction out of index column extraction and consolidated the redundancy
- then in postmunge(.) found that we could offer testID_column support comparable to automunge(.) as well: if a trainID_column was passed to automunge(.), there is now no need to specify a corresponding testID_column in postmunge(.), since it is detected automatically
- (previously this required passing testID_column as either True or as the string or list of columns)
- so to be clear, parameter support for testID_column is now completely equivalent between automunge and postmunge: if the trainID columns are present in the test set, can either pass testID_column as True or leave it as the default of False, and the columns will be detected automatically
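The testID_column conventions above can be illustrated with a minimal pure-Python sketch. This is not Automunge's internal code: the helper name `resolve_test_id_columns` and its arguments are hypothetical, and it only models the rule as described in these notes (True or False both fall back to the train ID columns when they are present in the test set, while a string or list names the test ID columns explicitly). Returning an empty list when the train ID columns are missing from the test set is an assumption of the sketch.

```python
def resolve_test_id_columns(train_id_columns, test_columns, test_id_column=False):
    """Hypothetical helper modeling the testID_column rule described above."""
    # A string or list names the test ID columns explicitly.
    if isinstance(test_id_column, str):
        return [test_id_column]
    if isinstance(test_id_column, (list, tuple)):
        return list(test_id_column)

    # True or False: reuse the train ID columns, but only when every one
    # of them is actually present in the test set.
    if train_id_columns and all(c in test_columns for c in train_id_columns):
        return list(train_id_columns)

    # Assumption of this sketch: no matching ID columns means nothing to extract.
    return []


# True and the default of False resolve to the same columns.
same_with_true = resolve_test_id_columns(['id'], ['id', 'a', 'b'], True)
same_with_false = resolve_test_id_columns(['id'], ['id', 'a', 'b'], False)
```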
- also added overdue support for postmunge to automatically detect if a labels column is present and treat accordingly
- (there is a tradeoff in this approach: if sets are passed as numpy arrays and the ID columns differ between train and test, the label column may have a different header; this is a bit of an edge case and not an issue with the preferred input of pandas dataframes)
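The numpy edge case can be seen in a small sketch. The helper `labels_present` is hypothetical and is not how postmunge is implemented; it only assumes that detection checks whether the training labels header appears among the test set headers, and that headers for numpy arrays are integer column positions with the label in the last column.

```python
def labels_present(labels_column, test_columns):
    """Hypothetical check: is the training labels header found in the test set?"""
    return labels_column in test_columns


# pandas-style string headers: detection is by name,
# so an extra ID column in the test set is harmless.
test_headers = ['id', 'extra_id', 'feature1', 'feature2', 'label']
found_by_name = labels_present('label', test_headers)

# numpy-style headers are just integer positions.
train_positions = list(range(4))  # label sits at position 3
test_positions = list(range(5))   # an extra ID column shifts the label to position 4
label_position_train = train_positions[-1]
label_position_test = test_positions[-1]

# Header 3 does exist in the test set, but after the shift it points at a
# feature column rather than the label, which is the tradeoff noted above.
found_by_position = labels_present(label_position_train, test_positions)
header_mismatch = label_position_train != label_position_test
```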
- also started experimenting with ML infill support for an alternative to random forest
- specifically AutoGluon (an autoML library)
- requires installing and then importing AutoGluon external to the function call as:
import autogluon.core as ag
from autogluon.tabular import TabularPrediction as task
- and then can be activated with ML_cmnd = {'autoML_type':'autogluon'}, and MLinfill=True
- parameter support for autogluon pending
- so to be clear, I don't consider the current autogluon implementation highly audited
- it needs more testing
- please consider this autoML option as experimental for now
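Since the autogluon option depends on an external install, one way to guard the setup is sketched below. The helper names `autogluon_available` and `build_infill_kwargs` are hypothetical; only the `ML_cmnd = {'autoML_type':'autogluon'}` and `MLinfill=True` settings come from these notes. The sketch assumes that leaving ML_cmnd out keeps the default random forest infill.

```python
import importlib.util


def autogluon_available():
    """Return True if the autogluon package can be located without importing it."""
    try:
        return importlib.util.find_spec("autogluon") is not None
    except (ImportError, ValueError):
        return False


def build_infill_kwargs(use_autogluon):
    """Hypothetical helper: keyword arguments to pass on to automunge(.)."""
    kwargs = {'MLinfill': True}
    if use_autogluon:
        # Experimental option described in the notes above.
        kwargs['ML_cmnd'] = {'autoML_type': 'autogluon'}
    # Assumption: without ML_cmnd the default random forest infill is used.
    return kwargs


infill_kwargs = build_infill_kwargs(autogluon_available())
```

The resulting dictionary could then be unpacked into the automunge(.) call alongside the other parameters.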