Automunge

Latest version: v8.33

4.60

- created companion columntype_report for columns returned from label set
- available in postprocess_dict as postprocess_dict['label_columntype_report']
- new label processing category 'lbos' that applies ordinal encoding followed by conversion to string
- this supports downstream machine learning libraries that inspect the data type of labels to decide between regression and classification
- (i.e. some libraries may treat integer label sets as regression targets when the user intends classification)
- note that inversion is supported to recover the original form, e.g. to convert predictions back to the original label form
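
The idea behind 'lbos' (ordinal encoding, then conversion to string, with inversion) can be sketched in plain pandas. This is only an illustration of the concept, not Automunge's implementation: the helper name `fit_ordinal_string_encoder`, the sample labels, and the alphabetical ordering of the codes are all assumptions made for the example.

```python
import pandas as pd


def fit_ordinal_string_encoder(labels):
    """Hypothetical helper: map each unique label to an integer code stored as a string."""
    categories = sorted(labels.dropna().unique())
    encode_map = {cat: str(i) for i, cat in enumerate(categories)}
    decode_map = {code: cat for cat, code in encode_map.items()}
    return encode_map, decode_map


# Made-up label column for illustration.
labels = pd.Series(["cat", "dog", "cat", "bird"], name="species")
encode_map, decode_map = fit_ordinal_string_encoder(labels)

# Codes are strings, so a library that infers the task from the label dtype
# would not see an integer column here.
encoded = labels.map(encode_map)

# Inversion: map codes (for example, predictions) back to the original labels.
recovered = encoded.map(decode_map)
```

Because the codes are strings rather than integers, the label column no longer looks numeric to a library that infers the task from the label dtype, while the stored mapping keeps the encoding reversible.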

4.59

- added returned_PCA_columns to postprocess_dict
- added a memory clear operation to PCA transform to reduce overhead
- added returned_Binary_columns to postprocess_dict
- improved printouts for PCA and Binary dimensionality reductions
- removed ordinal columns from the Binary dimensionality reduction, which just made more sense
- improved README writeup for ord4, intended as a scaled metric ranking redundant entries by frequency of occurrence
- new report classifying returned columns by data type now available in postprocess_dict as postprocess_dict['columntype_report']
- includes aggregated lists of columns per types: continuous, boolean, ordinal, onehot, onehot_sets, binary, binary_sets, passthrough
- where onehot captures all one-hot encoded columns, and onehot_sets is redundant except that it groups those columns by originating transform as a list of lists
- similarly for binary and binary_sets
- these aggregations should be helpful for training downstream models in libraries that accept specification of column types, e.g. for entity embeddings
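
As a rough sketch of how such a report might be consumed downstream, the snippet below uses a hand-written dictionary with the same keys listed above. The column names, the data, and the exact shape of the values are assumptions for illustration; the real report is whatever postprocess_dict['columntype_report'] returns.

```python
import pandas as pd

# Hypothetical stand-in for postprocess_dict['columntype_report'].
# Keys follow the changelog entry; column names are made up.
columntype_report = {
    "continuous": ["age_scaled"],
    "boolean": ["member_flag"],
    "ordinal": ["city_code"],
    "onehot": ["color_red", "color_blue"],
    "onehot_sets": [["color_red", "color_blue"]],
    "binary": [],
    "binary_sets": [],
    "passthrough": [],
}

# Made-up processed training data matching the report above.
train = pd.DataFrame({
    "age_scaled": [0.1, -0.4, 1.2],
    "member_flag": [1, 0, 1],
    "city_code": [0, 2, 1],
    "color_red": [1, 0, 0],
    "color_blue": [0, 1, 0],
})

# Continuous columns can be fed to a model as-is.
continuous_block = train[columntype_report["continuous"]]

# Ordinal columns are candidates for entity embeddings; the number of
# embedding rows is estimated here as (max code + 1).
embedding_sizes = {
    col: int(train[col].max()) + 1 for col in columntype_report["ordinal"]
}

# onehot_sets groups columns by originating transform, so each group
# can be handled as one categorical feature.
onehot_groups = [train[group] for group in columntype_report["onehot_sets"]]

# Flattening onehot_sets should reproduce the flat onehot list.
flattened = [col for group in columntype_report["onehot_sets"] for col in group]
```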

4.58

- improved printouts to support postmunge(.) troubleshooting
- for cases where the df_train columns passed to automunge(.) are inconsistent with the df_test columns passed to postmunge(.)
- removed a validation test for a bug scenario that was eliminated as part of the 4.55 update
- also simplified the reporting for the ML infill validation test added in 4.56, since it only needs to be run once instead of for every column
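
When troubleshooting a train/test column mismatch of the kind described above, a quick check in plain pandas before calling postmunge(.) can help narrow things down. The helper below is a hypothetical sketch, not part of Automunge, and the two dataframes are made up.

```python
import pandas as pd


def compare_columns(df_train, df_test):
    """Hypothetical helper: report column differences between two dataframes."""
    train_cols = list(df_train.columns)
    test_cols = list(df_test.columns)
    train_set, test_set = set(train_cols), set(test_cols)
    return {
        "missing_from_test": [c for c in train_cols if c not in test_set],
        "unexpected_in_test": [c for c in test_cols if c not in train_set],
        "same_order": train_cols == test_cols,
    }


# Made-up data: the test set lacks 'income' and carries an extra 'notes' column.
df_train = pd.DataFrame({"age": [30], "income": [50000], "city": ["Oslo"]})
df_test = pd.DataFrame({"age": [41], "city": ["Rome"], "notes": ["n/a"]})

column_diff = compare_columns(df_train, df_test)
```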

4.57

- found and fixed small bug in postmunge feature importance evaluation

4.56

- added a validation and printout for ML infill edge case scenario
- a little cleanup to the infill validation aggregations

4.55

- rewrote the insertinfill function for simplicity / clarity, and to segregate supporting columns from attachment to df_train; one more edge case failure mode eliminated
- this is a pretty central support function and was one of the first written, which is part of the reason it was a little sloppy; much cleaner now
- new validation infill_suffixoverlap_results out of an abundance of caution
- also a few more tweaks to dataframe column inspections to use more efficient .columns instead of list()
- similar tweaks to a few dictionary key inspections
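
The kind of inspection tweak described above can be illustrated with a small comparison. This is an assumed example of the pattern, not code taken from the library; the dataframe and the `settings` dictionary are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
settings = {"infill": "mean"}  # made-up dictionary for illustration

# Membership against the column Index uses its hash-based lookup.
has_a_direct = "a" in df.columns

# Building a list first copies every column name, then scans it linearly.
has_a_via_list = "a" in list(df)

# Same idea for dictionary keys: test the dict directly rather than
# materializing the keys as a list.
has_key_direct = "infill" in settings
has_key_via_list = "infill" in list(settings.keys())
```

Both forms give the same answer; the direct form simply avoids creating a temporary list on every check, which adds up when the check runs once per column.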
