Automunge

Latest version: v8.33

Page 20 of 99

7.10

- performed a walkthrough of ML infill support functions
- validated various translations applied as part of data prep for 1010 MLinfilltype and model training for libraries that only accept single column classification labels
- found and fixed a ML infill bug for catboost associated with accessing Binary columns in the support function to populate columntype_report
- (the bug arose because catboost calls this function before Binary is applied; I think this may have been introduced with the 6.89 updates)
- found and fixed edge case for autogluon ML infill associated with infill targeting single column categoric feature
- (autogluon likes categoric labels as strings to recognize the classification application)
- replaced the validation split (with shuffling) performed for ML infill with catboost, which now uses the support function _df_split
- fixed a small bug in one of the support functions rolled out in the last update via negative (there was a scenario where recursion would halt too early)
- another privacy_encode extension: when activated, the column_map is now erased from the returned postprocess_dict
- found and fixed a process flaw for ML infill targeting a concurrent MLinfilltype
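The shuffled validation split mentioned above can be sketched in pandas as follows. This is a minimal illustration only: `df_split_sketch`, its `valratio` and `seed` parameters, and the choice to take the validation rows from the front of the shuffled frame are all assumptions, not the signature or behavior of the actual `_df_split` support function.

```python
import pandas as pd

def df_split_sketch(df, valratio=0.2, seed=None):
    """Shuffle rows, then split into (train, validation) partitions.

    Hypothetical stand-in for a support function like _df_split; the
    name, parameters, and defaults here are assumptions for illustration.
    """
    # Shuffle all rows (reproducibly when a seed is given)
    shuffled = df.sample(frac=1, random_state=seed)
    n_val = int(round(len(shuffled) * valratio))
    # Leading rows of the shuffled frame form the validation partition
    val = shuffled.iloc[:n_val]
    train = shuffled.iloc[n_val:]
    return train, val

df = pd.DataFrame({'x': range(10), 'y': [0, 1] * 5})
train, val = df_split_sketch(df, valratio=0.2, seed=0)
print(len(train), len(val))  # 8 2
```

Shuffling before splitting keeps any ordering in the source rows (e.g. data sorted by time or by label) from concentrating in one partition.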

7.0

- extensions to privacy_encode option
- now an alternate columntype_report is returned as postprocess_dict['private_columntype_report']
- populated with the alternate privacy encoded column headers
- and when privacy_encode activated this replaces the original columntype_report
- also now when privacy_encode is activated, the order of columns is shuffled prior to assigning alternate headers
- postmunge printouts other than bug reports are now silenced in privacy_encode scenario
- struck a redundancy in postmunge inversion associated with renaming columns which interfered now that privacy_encode shuffles order of columns
- consolidated a redundant parameter in support functions processfamily, postprocessfamily, circleoflife, postcircleoflife
- a little fleshing out of the returned data structure postprocess_dict['labelsencoding_dict']
- which is intended as an alternative resource for label inversion for cases where user doesn't wish to share the entire postprocess_dict, such as for file size or privacy reasons
- now labelsencoding_dict records any transform_dict and process_dict entries that were inspected as part of label processing
- as well as recording normalization_dict entries associated with derived columns that were subject to replacement
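The idea of shuffling column order before assigning alternate headers can be sketched as below. This is a hypothetical illustration, not the privacy_encode implementation: the function name, the integer-string header format, and the returned mapping are assumptions.

```python
import random
import pandas as pd

def privacy_encode_sketch(df, seed=None):
    """Shuffle column order, then replace headers with non-descriptive ones.

    Hypothetical illustration of the idea behind privacy_encode; the
    function name, header format, and returned mapping are assumptions.
    """
    columns = list(df.columns)
    # Shuffle first so the alternate headers don't reveal the original order
    random.Random(seed).shuffle(columns)
    shuffled = df[columns]
    # Assign alternate headers by position in the shuffled order
    mapping = {orig: str(i) for i, orig in enumerate(columns)}
    encoded = shuffled.rename(columns=mapping)
    return encoded, mapping

df = pd.DataFrame({'age': [30, 40], 'income': [1.0, 2.0], 'zip': ['a', 'b']})
encoded, mapping = privacy_encode_sketch(df, seed=1)
print(list(encoded.columns))  # ['0', '1', '2']
```

Without the shuffle, header '0' would always correspond to the first original column, so the alternate headers alone would leak column order. Whoever holds `mapping` can restore the original headers with `encoded.rename(columns={v: k for k, v in mapping.items()})`.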

6.99

- a walkthrough of the process_dict data structure
- process_dict is initialized from assembleprocessdict and then after some preparations and validations are performed on user passed processdict the two are consolidated
- process_dict is then populated in the postprocess_dict upon initialization
- integration of process_dict into postprocess_dict was a deviation to support inspection of process_dict in custom_train postmunge wrapper function, which doesn't see process_dict but does see postprocess_dict
- but had the side effect of redundant variable being passed through family processing functions as process_dict and postprocess_dict['process_dict']
- realized it was a potential point of confusion to have the same data structure inspected in two different ways
- which was compounded when the labelctgy initialization was moved from automunge start to label processing, as labelctgy initialization in some cases edits entries to the process_dict
- very simple solution, standardized on a single version of process_dict inspected in automunge(.) after postprocess_dict initialization as postprocess_dict['process_dict']
- the exception is the feature importance application, which takes place prior to postprocess_dict initialization, so it still sees the data structure as process_dict
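Why two handles on the same data structure invite confusion can be shown with a generic Python example. This sketch assumes, purely for illustration, that the version stored in postprocess_dict is a copy (not necessarily how automunge(.) stores it); the function names and the 'mycategory' entry are hypothetical. An edit made during label processing then reaches only one of the two handles, which is the kind of discrepancy that standardizing on postprocess_dict['process_dict'] avoids.

```python
import copy

def initialize_postprocess_dict(process_dict):
    # Hypothetical: store process_dict inside postprocess_dict. A deepcopy is
    # assumed here purely to show how two handles on "the same" data can drift.
    return {'process_dict': copy.deepcopy(process_dict)}

def initialize_labelctgy(postprocess_dict, category, labelctgy):
    # Hypothetical label-processing step that edits a process_dict entry,
    # always through the single handle postprocess_dict['process_dict'].
    postprocess_dict['process_dict'][category]['labelctgy'] = labelctgy

process_dict = {'mycategory': {'labelctgy': None}}
postprocess_dict = initialize_postprocess_dict(process_dict)
initialize_labelctgy(postprocess_dict, 'mycategory', 'mycategory')

# The original local handle is now stale; only the stored version was edited
print(process_dict['mycategory']['labelctgy'])                      # None
print(postprocess_dict['process_dict']['mycategory']['labelctgy'])  # mycategory
```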

6.98

- a cleanup to strike some of the intermediate column forms from Binary returned from inversion (inversion was working, this just results in a cleaner returned set)
- similarly, a cleanup to strike from the set returned by inversion any Binary columns produced from a retain consolidation, in cases where inversion was originally passed as 'test' or 'labels' (to be consistent with the original form)
- added TrainLabelFreqLevel support for consolidated labels (supporting ordinal and onehot consolidations)
- added validation in automunge for column header string conversion to confirm it did not result in redundant headers (could happen if e.g. received headers included 1 and '1')
- added validation to postmunge inversion denselabels case to confirm single label_column entry
- fixed a snafu in privacy_encode (turned out was recording length of wrong list as part of a derivation)
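The header string conversion check described above can be sketched as follows. This is a hypothetical sketch of the validation, not Automunge's code; the function name and error message are assumptions.

```python
import pandas as pd

def convert_headers_to_str(df):
    """Convert column headers to strings, rejecting conversions that collide.

    Hypothetical sketch of the validation described above, not Automunge's code.
    """
    str_headers = [str(c) for c in df.columns]
    # Two distinct headers (e.g. 1 and '1') can map to the same string
    if len(set(str_headers)) != len(str_headers):
        dupes = sorted({h for h in str_headers if str_headers.count(h) > 1})
        raise ValueError(f"string conversion produced redundant headers: {dupes}")
    out = df.copy()
    out.columns = str_headers
    return out

ok = convert_headers_to_str(pd.DataFrame({1: [0], 'b': [1]}))
print(list(ok.columns))  # ['1', 'b']

try:
    convert_headers_to_str(pd.DataFrame({1: [0], '1': [1]}))
except ValueError as err:
    print(err)  # string conversion produced redundant headers: ['1']
```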

6.97

- moved the conversion of passed dictionaries and lists to internal state a little earlier
- updated the support function parameter_str_convert so that if the first entry is enclosed in set brackets it is excluded from string conversion (relevant to the new labels_column scenario)
- added parameter_str_convert to Binary specifications to be consistent with convention that received numeric column headers are converted to string
- restated for clarity: automunge(.) converts all column headers to string to align with the suffix appending convention
- added a clarifying word to the df_train description in the README that column headers should be unique strings
- labels_column automunge parameter now accepts specification of list of label columns, resulting in multiple labels in returned label sets
- each of the list entries may have their own root category assigned in assigncat if desired
- labels_column automunge parameter, when passed as a list of label columns, accepts a first entry set bracket specification to activate a consolidation of categoric labels, resulting in a single classification target even in cases of multiple categoric labels, where the form of multiple labels can automatically be recovered in an inversion operation for data returned from an inference
- such that a single classification model can then be trained for use with multiple classification targets
- set bracket specification options are consistent with those options supported for Binary, with the exception that binarization with replacement requires specification as {True}
- (rather than being the default when omitted, as is the case with Binary)
- we recommend defaulting to the ordinal option, e.g. to consolidate three labels with headers prefixed 'categoriclabel_':
- labels_column = [{'ordinal'}, 'categoriclabel_1', 'categoriclabel_2', 'categoriclabel_3']
- which can then be recovered back to the form of three separate labels in postmunge with inversion='labels'
- updated form of postprocess_dict['labelsencoding_dict'], now with extra tiers of entries as well as entries associated with categoric consolidations
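The ordinal consolidation of multiple categoric labels, and its inversion, can be sketched in pandas as below. This is a minimal sketch under stated assumptions: the functions `consolidate_labels_ordinal` and `invert_consolidation` are hypothetical, and assigning codes to sorted unique label combinations is an illustrative choice, not necessarily the encoding automunge(.) applies.

```python
import pandas as pd

def consolidate_labels_ordinal(labels):
    """Map each unique combination of label values to a single integer target.

    Hypothetical sketch of an ordinal consolidation; the encoding scheme and
    function names are assumptions, not Automunge's implementation.
    """
    # Each row's combination of label values becomes a hashable tuple
    combos = list(labels.itertuples(index=False, name=None))
    # Sorted unique combinations define the ordinal codes
    ordinal_map = {combo: i for i, combo in enumerate(sorted(set(combos)))}
    target = pd.Series([ordinal_map[c] for c in combos],
                       index=labels.index, name='consolidated')
    return target, ordinal_map

def invert_consolidation(target, ordinal_map, columns):
    """Recover the original multi-column labels from consolidated codes."""
    inverse = {i: combo for combo, i in ordinal_map.items()}
    return pd.DataFrame([inverse[i] for i in target],
                        index=target.index, columns=columns)

labels = pd.DataFrame({
    'categoriclabel_1': ['a', 'b', 'a', 'a'],
    'categoriclabel_2': ['x', 'x', 'y', 'x'],
    'categoriclabel_3': ['p', 'q', 'p', 'p'],
})
target, ordinal_map = consolidate_labels_ordinal(labels)
print(target.tolist())  # [0, 2, 1, 0]
recovered = invert_consolidation(target, ordinal_map, labels.columns)
print(recovered.equals(labels))  # True
```

A single classifier trained on `target` predicts one code per row, and the inverse mapping expands each prediction back into the three original label columns.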

6.96

- moved the validation for labelctgy processdict entry
- now takes place after identifying the root label category
- and limits validations to that entry
- which reduces overhead and moves an unnecessarily prominent printout for custom processdict
- a tweak to the labelctgy writeup (struck reference to functionpointer) to align with the functionpointer writeup

Has known vulnerabilities

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.