Automunge

Latest version: v8.33

8.21

- Fixed a printout setting associated with printing error messages in the printstatus=False case (small variable naming snafu originating from 8.17)
- Fixed postmunge feature importance printouts to align with automunge, so that the sorted results are printed when printstatus='summary'

8.20

- new defaults for xgboost tuning with the optuna_XG1 tuner
- (which make use of the XGBoost and Optuna libraries, respectively)
- tuning now defaults to applying cross validation with k=5 folds
- and to fasttune, meaning that only one fold is inspected in each tuning trial
- such that the inspected folds are cycled through across trials
- note that the number of folds can be specified by ML_cmnd['optuna_kfolds'] and fast tuning can be deactivated by ML_cmnd['optuna_fasttune'] if desired
- this is similar to the "Fast Cross-Validation" benchmarked for Bayesian tuning in "Multi-Task Bayesian Optimization" by Kevin Swersky, Jasper Snoek, and Ryan Adams (NIPS 2013)
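
As a rough illustration of the fold cycling described for 8.20, here is a minimal sketch in plain Python. It is not the library's implementation: the helper names fold_for_trial and folds_inspected are hypothetical, the round-robin order (trial number modulo fold count) is an assumption, and optuna_fasttune is assumed to be a boolean. Only the two ML_cmnd keys come from the release note.

```python
def fold_for_trial(trial_number, n_folds=5):
    """Pick the single fold inspected in a trial (assumed round-robin order)."""
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    return trial_number % n_folds


def folds_inspected(n_trials, n_folds=5, fasttune=True):
    """List the fold indices evaluated in each trial.

    With fasttune, each trial inspects one fold and the folds are cycled.
    Without it, every trial inspects all folds (ordinary k-fold).
    """
    schedule = []
    for trial in range(n_trials):
        if fasttune:
            schedule.append([fold_for_trial(trial, n_folds)])
        else:
            schedule.append(list(range(n_folds)))
    return schedule


# The two keys named in the release note; the values are illustrative.
ML_cmnd = {'optuna_kfolds': 5, 'optuna_fasttune': True}

schedule = folds_inspected(7,
                           n_folds=ML_cmnd['optuna_kfolds'],
                           fasttune=ML_cmnd['optuna_fasttune'])
print(schedule)  # [[0], [1], [2], [3], [4], [0], [1]]
```

With fasttune, each trial costs roughly one model fit instead of k, which is the trade-off the note describes.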

8.19

- found and fixed a small bug for protected_feature option in DPnb transform
- associated with inspecting test data instead of intended train data

8.18

- new invert parameter for bnry, which flips the activations between 0 and 1
- lbbn (labels for binary classification) now defaults to invert=True
- (applied to lbbn to eliminate a remote edge case for xgboost in binary classification when only one label class is present)
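
To make the effect of invert concrete, here is a small sketch of a binary encoding with flipped activations. The function binary_encode is hypothetical and is not the bnry transform itself; in particular, assigning 0/1 by sorted class order is an assumption made only for this illustration.

```python
def binary_encode(values, invert=False):
    """Encode a column with at most two classes as 0/1.

    Classes are assigned 0 and 1 by sorted order (an assumption for this
    sketch). With invert=True the activations are flipped between 0 and 1.
    """
    classes = sorted(set(values))
    if len(classes) > 2:
        raise ValueError("binary encoding expects at most two classes")
    mapping = {label: index for index, label in enumerate(classes)}
    encoded = [mapping[value] for value in values]
    if invert:
        encoded = [1 - activation for activation in encoded]
    return encoded


print(binary_encode(['cat', 'dog', 'cat']))               # [0, 1, 0]
print(binary_encode(['cat', 'dog', 'cat'], invert=True))  # [1, 0, 1]
```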

8.17

- new automunge(.) and postmunge(.) parameter logger
- records validation results and different tiers of printouts associated with debug/info/warning
- this is similar to the python logging module, but better suited for our architecture and development environment
- the validation results are associated with different halt scenarios
- logger may be initialized as a dictionary externally to an automunge(.) or postmunge(.) function call, and if the API halts for any reason, the recorded log and validations will still be retained in the external dictionary for inspection or troubleshooting support
- we expect this update will greatly benefit troubleshooting since now the validation results are available for inspection even in cases of halt


# assumes am is an initialized AutoMunge() instance
# and df_train is a pandas DataFrame
logger = {}

train, train_ID, labels, \
val, val_ID, val_labels, \
test, test_ID, test_labels, \
postprocess_dict = \
am.automunge(df_train,
             logger=logger,
             printstatus='silent')

# and then, e.g.
print(logger['debug_report'])
print(logger['info_report'])
print(logger['warning_report'])

# or validation results available in logger['validations']
# postmunge logger applied in same manner


- full logger contents are not returned in postprocess_dict for privacy preservation
- also removed the application timestamp from returned postprocess_dict for privacy preservation purposes
- retained the application number, which is a 12-digit random number associated with each application
- also removed a redundant __init__ function definition internal to the class definition, which I think was a relic from some very early implementations
- the implementation includes our first global variable (for both automunge and postmunge) as logger_dict which greatly simplified the implementation
- in the process also did an overhaul of printouts to align with logging
- also muted the pandas fragmented DataFrame PerformanceWarning. This is not a bug; it is a performance indication that we have operations in place to mitigate, so we are muting it until pandas can roll out a fix
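
The reason an externally initialized logger survives a halt is that the function mutates the caller's dictionary in place, so whatever was recorded before the exception is still there afterwards. A minimal sketch of that pattern follows. The function run_with_logger and the 'empty_input' validation key are hypothetical, and storing the reports as strings is an assumption; only the four top-level logger keys appear in the notes above.

```python
def run_with_logger(data, logger=None):
    """Hypothetical stand-in for an API call that records into a caller's dict."""
    if logger is None:
        logger = {}
    # Populate the report keys in place so the caller's dict sees every update.
    logger.setdefault('debug_report', '')
    logger.setdefault('info_report', '')
    logger.setdefault('warning_report', '')
    logger.setdefault('validations', {})

    logger['info_report'] += 'received %d rows\n' % len(data)

    # Record the validation result before deciding whether to halt.
    is_empty = len(data) == 0
    logger['validations']['empty_input'] = is_empty
    if is_empty:
        logger['warning_report'] += 'halting: empty input\n'
        raise ValueError('empty input')

    logger['debug_report'] += 'processing complete\n'
    return [value * 2 for value in data]


logger = {}
try:
    run_with_logger([], logger=logger)
except ValueError:
    pass  # the call halted, but the log was recorded in the external dict

print(logger['validations'])     # {'empty_input': True}
print(logger['warning_report'])
```

Had the log been kept only in a return value, a halt would have discarded it along with the rest of the call's results.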

8.16

- new automunge(.) and postmunge(.) parameter logger
- records validation results and different tiers of printouts associated with debug/info/warning
- this is similar to the python logging module, but better suited for our architecture and development environment
- the validation results are associated with different halt scenarios
- logger may be initialized as a dictionary externally to an automunge(.) or postmunge(.) function call, and if the API halts for any reason, the recorded log and validations will still be retained in the external dictionary for inspection or troubleshooting support
- we expect this update will greatly benefit troubleshooting since now the validation results are available for inspection even in cases of halt


# assumes am is an initialized AutoMunge() instance
# and df_train is a pandas DataFrame
logger = {}

train, train_ID, labels, \
val, val_ID, val_labels, \
test, test_ID, test_labels, \
postprocess_dict = \
am.automunge(df_train,
             logger=logger,
             printstatus='silent')

# and then, e.g.
print(logger['debug_report'])
print(logger['info_report'])
print(logger['warning_report'])

# or validation results available in logger['validations']
# postmunge logger applied in same manner


- full logger contents are not returned in postprocess_dict for privacy preservation
- also removed the application timestamp from returned postprocess_dict for privacy preservation purposes
- retained the application number, which is a 12-digit random number associated with each application
- also removed a redundant __init__ function definition internal to the class definition, which I think was a relic from some very early implementations
- the implementation includes our first global variable (for both automunge and postmunge) as logger_dict which greatly simplified the implementation
- in the process also did an overhaul of printouts to align with logging
- also muted the pandas fragmented DataFrame PerformanceWarning. This is not a bug; it is a performance indication that we have operations in place to mitigate, so we are muting it until pandas can roll out a fix
