Automunge

Latest version: v8.33

5.2

- cleaned up edge case in feature importance associated with newly reported baseaccuracy from 4.98
- allowed the reversion of labels column update in 5.1 which in hindsight was better served in prior configuration
- added base accuracy printout to postmunge feature importance
- also bug fix in assembly of columntype_report associated with populating onehot sets and binary sets

5.1

- a small cleanup to feature importance to support edge case for labels column parameters as updated in 4.99

5.00

- paying down a little technical debt
- renamed a parameter in a few of the ML infill support functions to be consistent with usage / avoid confusion
- which in some cases was labeled as a "columnslist" but was actually passed to the function as a "categorylist"
- so yeah these parameters are a little more clearly labeled now
- also a fix for shuffleaccuracy function to support labels prepared in multiple configurations
- also created a label set version of the ord5 transformation category rolled out yesterday, as lbo5
- which is fully consistent just creating a version intended for labels to follow convention
- such as may be useful if you want to overwrite a family tree for labels but not for training data or vice versa

4.99

- found bug associated with new ML infill autoML option for AutoGluon library rolled out in 4.97
- reconsidered imports approach
- now imports are conducted internal to ML infill support functions instead of by user
- which fixes the bug
- also found and fixed bug associated with ML infill targets of binary encoded sets for this autoML type
- so yeah I still consider this implementation somewhat experimental
- any edge scenarios where a model doesn't train can be addressed by assigning a different infill type for a column in assigninfill
- still a step in the right direction
- also new default transform category applied to all unique entry categoric sets
- ord5 which is comparable to ordl but excludes application of ML infill to that column
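
As a rough sketch of the fallback described above (assigning a different infill type to a column when the autoML model doesn't train), the parameters might be assembled as below. The `ML_cmnd` value is quoted from the 4.97 note; the column name `example_column` and the choice of `'modeinfill'` are illustrative assumptions, and the call itself is left commented out since it depends on Automunge, AutoGluon, and a dataframe being available.

```python
# Value quoted from the 4.97 release note.
ML_cmnd = {'autoML_type': 'autogluon'}

# Assumption: 'example_column' is a hypothetical column whose ML infill model
# failed to train, so it is assigned a different infill type instead.
assigninfill = {'modeinfill': ['example_column']}

# Collected as keyword arguments so they can be unpacked into the call.
automunge_kwargs = {
    'MLinfill': True,
    'ML_cmnd': ML_cmnd,
    'assigninfill': assigninfill,
}

# Hypothetical usage, assuming Automunge and AutoGluon are installed and
# df_train is a pandas DataFrame (no user-side AutoGluon import as of 4.99):
# from Automunge import *
# am = AutoMunge()
# returned_sets = am.automunge(df_train, **automunge_kwargs)
```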

4.98

- added a printout to feature importance reporting base accuracy of feature importance model
- added an entry to FS_sorted report available in postprocess_dict with the base accuracy metric as 'baseaccuracy'

4.97

- a review of the testID_column parameter usage in automunge and postmunge found a few opportunities for improvement
- added automunge testID_column parameter support to pass as True for cases where the test set contains the same ID columns as the train set
- which may also be passed as the default of False for this case
- but just trying to make this intuitive, since postmunge allowed passing testID_column as True, now can do the same in automunge
- as part of that update realized there was a small redundancy where ID column extraction had gotten integrated in with index column extraction, so broke the ID column extraction out of index column extraction and consolidated the redundancy
- then in postmunge(.) found that could offer comparable testID_column support to automunge as well, where if a trainID_column was passed to automunge(.), now there is no need to specify a corresponding testID_column in postmunge, it is detected automatically
- (previously this required passing testID_column as either True or as the string or list of columns)
- so to be clear, parameter support is now completely equivalent for testID_column between usage in automunge and postmunge, where if trainID columns are present in the test set can either pass as True or leave as default of False and will automatically detect
- also added overdue support for postmunge to automatically detect if a labels column is present and treat accordingly
- (there is a tradeoff in this approach: if sets are passed as numpy arrays and the ID columns differ between train and test, the label column may have a different header; this is a bit of an edge case and not an issue with the preferred input of pandas dataframes)
- also started experimenting with ML infill support for an alternative to random forest
- specifically AutoGluon (an autoML library)
- requires installing and then importing AutoGluon external to the function call as:
import autogluon.core as ag
from autogluon.tabular import TabularPrediction as task
- and then can be activated with ML_cmnd = {'autoML_type':'autogluon'}, and MLinfill=True
- parameter support for autogluon pending
- so to be clear, I don't consider the current autogluon implementation highly audited
- it needs more testing
- please consider this autoML option as experimental for now
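
To illustrate the testID_column behavior described earlier in these 4.97 notes, here is a minimal sketch of how the resolution could work. The helper `resolve_test_id_columns` and the column names are hypothetical, written only to mirror the described behavior (True and the default False both auto-detect, while a string or list is taken as given); it is not Automunge's actual implementation.

```python
def resolve_test_id_columns(train_id_columns, test_columns, testID_column=False):
    """Hypothetical helper mirroring the described testID_column handling."""
    # An explicit string or list of column names is used as given.
    if isinstance(testID_column, str):
        return [testID_column]
    if isinstance(testID_column, (list, tuple)):
        return list(testID_column)
    # True or the default False: detect train ID columns present in the test set.
    return [col for col in train_id_columns if col in test_columns]


# Illustrative column names, not taken from the release notes.
train_ids = ['customer_id', 'row_uuid']
test_cols = ['customer_id', 'row_uuid', 'age', 'income']

detected = resolve_test_id_columns(train_ids, test_cols)
print(detected)
```

In this sketch, passing `True` and leaving the default `False` give the same result, matching the note that the two are now equivalent.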
