Automunge

Latest version: v8.33

3.19

- Added support for passing single entry column headers to assigncat and assigninfill without list brackets.
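
As a rough illustration of what this convenience amounts to, the sketch below wraps bare string headers in a list so both forms end up equivalent. The helper name `wrap_single_entries` and the column names are hypothetical and not part of the Automunge API; the library's internal handling may differ.

```python
def wrap_single_entries(assign_dict):
    """Wrap bare string column headers in a list, leaving lists as lists.

    Hypothetical helper for illustration only, not an Automunge function.
    """
    return {key: [cols] if isinstance(cols, str) else list(cols)
            for key, cols in assign_dict.items()}

# 'bnry' and 'text' are category keys; the column names are made up.
assigncat_short = {'bnry': 'column1', 'text': ['column2', 'column3']}
assigncat_full = wrap_single_entries(assigncat_short)
```

Checking `isinstance(cols, str)` first matters because a string is itself iterable, so calling `list('column1')` would split the header into characters.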

3.18

- found and fixed efficiency issue for ML infill application in automunge(.) and postmunge(.) originating from infilliterate methods
- ML infill runs faster now

3.17

- new methods for hyperparameter tuning in ML infill predictive algorithms via column specific grid search or randomized search of user passed sets of hyperparameters
- user can pass predictive algorithm parameters to the ML_cmnd dictionary in an automunge call, either as distinct parameters to overwrite the defaults or as lists or distributions of parameters for tuning, as an example:
ML_cmnd = {'MLinfill_type':'default',
           'hyperparam_tuner':'gridCV',
           'MLinfill_cmnd':{'RandomForestClassifier':{'max_depth':range(4,6)},
                            'RandomForestRegressor':{'max_depth':[3,6,12]}}}
- hyperparameter tuning defaults to grid search for cases where user passes parameters as lists or ranges
- to perform randomized search user can pass via ML_cmnd the entry {'hyperparam_tuner':'randomCV'}, for which the default number of iterations is 10; the number of iterations can also be assigned via ML_cmnd, e.g. {'hyperparam_tuner':'randomCV', 'randomCV_n_iter':20}
- in the context of a randomized search for hyperparameter tuning, user can also pass parameters as distributions via the scipy stats module
- (also fixed bug for predictive model initializer associated with passing parameters)
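
To make the difference between the two tuner options concrete, here is a minimal standard-library sketch of how a grid search and a randomized search choose candidate hyperparameter sets. It is not Automunge's implementation. The function names are hypothetical, `n_estimators` is added only as a second example parameter, and the random sampler draws with replacement, which may differ from the underlying search routine.

```python
import itertools
import random

def grid_candidates(param_space):
    """Return every combination of the listed hyperparameter values."""
    names = list(param_space)
    value_lists = [list(param_space[name]) for name in names]
    return [dict(zip(names, combo))
            for combo in itertools.product(*value_lists)]

def random_candidates(param_space, n_iter=10, seed=0):
    """Draw n_iter combinations at random (assumed: with replacement)."""
    rng = random.Random(seed)
    return [{name: rng.choice(list(values))
             for name, values in param_space.items()}
            for _ in range(n_iter)]

# Parameter space in the same list/range style accepted by ML_cmnd.
space = {'max_depth': range(4, 6), 'n_estimators': [50, 100, 200]}
grid = grid_candidates(space)                   # 2 * 3 = 6 candidates
sampled = random_candidates(space, n_iter=10)   # 10 random draws
```

A grid search cost grows with the product of the list lengths, while a randomized search cost is fixed by the iteration count, which is why a setting such as 'randomCV_n_iter' is useful for larger spaces.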

3.16

- added a few more statistic evaluations to the source column Data Distribution Drift assessment
- source column assessment for numeric sets now includes Shapiro and skew stats to test for normality and tail characteristics
- source column assessment for root categories where e.g. 0 values are subject to infill now includes the ratio of 0 values in the set (similarly for nonpositive values, etc.)
- here is an example of source column drift assessment statistics for a positive numeric root category:

postreports_dict['sourcecolumn_drift']['new_driftstats'] = \
{(column) : {'max' : (stat),
             'quantile_99' : (stat),
             'quantile_90' : (stat),
             'quantile_66' : (stat),
             'median' : (stat),
             'quantile_33' : (stat),
             'quantile_10' : (stat),
             'quantile_01' : (stat),
             'min' : (stat),
             'mean' : (stat),
             'std' : (stat),
             'MAD' : (stat),
             'skew' : (stat),
             'shapiro_W' : (stat),
             'shapiro_p' : (stat),
             'nonpositive_ratio' : (stat),
             'nan_ratio' : (stat)}}
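
For readers who want a feel for these statistics, the following sketch computes a subset of them with the Python standard library. It is an illustration under stated assumptions, not the library's code: the function name `drift_stats` is hypothetical, the skew is the unadjusted population estimate, the standard deviation is the sample form, and both ratios are taken over the full row count including NaN entries. The quantile, MAD, and Shapiro entries are omitted.

```python
import math
import statistics

def drift_stats(values):
    """Summarize a numeric column with a few drift-style statistics.

    Hypothetical helper; expects at least two non-NaN entries.
    """
    total = len(values)
    # Separate missing entries (NaN) from valid numeric entries.
    valid = [v for v in values if not math.isnan(v)]
    mean = statistics.fmean(valid)
    # Population central moments for a simple, unadjusted skew estimate.
    m2 = sum((v - mean) ** 2 for v in valid) / len(valid)
    m3 = sum((v - mean) ** 3 for v in valid) / len(valid)
    return {
        'max': max(valid),
        'median': statistics.median(valid),
        'min': min(valid),
        'mean': mean,
        'std': statistics.stdev(valid),  # sample standard deviation
        'skew': m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        # Assumption: ratios use the full row count as the denominator.
        'nonpositive_ratio': sum(v <= 0 for v in valid) / total,
        'nan_ratio': (total - len(valid)) / total,
    }

column = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, float('nan'), -1.0]
stats = drift_stats(column)
```

Comparing such a dictionary computed on the original training data against one computed on later data is one simple way to spot drift in a source column.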

3.15

- resolved edge case bug associated with postmunge binary encoding ('bnry' and '1010' categories) for cases where corresponding automunge column was all infill. Tested full library for this edge scenario and passed.
- added entry to postreports_dict driftreport entries, 'newnotinorig', capturing returned columns that were not returned in the original set.
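
Conceptually, the 'newnotinorig' entry is a difference between two sets of column headers. The sketch below shows that idea only; the function name and column names are hypothetical and the library's actual bookkeeping may differ.

```python
def new_not_in_orig(orig_columns, new_columns):
    """List columns returned now that were absent from the original set."""
    orig = set(orig_columns)
    # Preserve the order in which the new columns were returned.
    return [col for col in new_columns if col not in orig]

# Made-up returned headers from an original run and a later run.
orig_returned = ['colA_bnry', 'colB_1010_0', 'colB_1010_1']
new_returned = ['colA_bnry', 'colB_1010_0', 'colB_1010_1', 'colB_1010_2']
extra = new_not_in_orig(orig_returned, new_returned)
```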

3.14

- resolved edge case bug associated with postmunge one-hot encoding ('text' category) for cases where corresponding automunge column was all infill
