Automunge

Latest version: v8.33


7.10

- performed a walkthrough of ML infill support functions
- validated various translations applied as part of data prep for 1010 MLinfilltype and model training for libraries that only accept single column classification labels
- found and fixed an ML infill bug for catboost associated with accessing Binary columns in the support function to populate columntype_report
- (the bug arose because catboost calls this function prior to applying Binary, I think this might have been associated with 6.89 updates)
- found and fixed edge case for autogluon ML infill associated with infill targeting single column categoric feature
- (autogluon likes categoric labels as strings to recognize the classification application)
- replaced a validation split with shuffling performed for ML infill with catboost, now using support function _df_split
- fixed a small bug in one of the support functions rolled out in last update via negative (there was a scenario where recursion would halt too early)
- another privacy_encode extension, now when activated the column_map is erased in returned postprocess_dict
- found and fixed a process flaw for ML infill targeting a concurrent MLinfilltype
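
The shuffled validation split mentioned above can be pictured with a minimal sketch. This is not the actual _df_split support function: the name shuffled_split, its parameters, and the list-of-rows input are assumptions made for illustration only.

```python
import random

def shuffled_split(rows, val_ratio=0.2, seed=0):
    """Hypothetical helper: split rows into train and validation sets
    after shuffling row indices (not Automunge's _df_split)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)                      # shuffle before carving out validation rows
    n_val = int(len(rows) * val_ratio)
    val_idx = set(indices[:n_val])
    train = [row for i, row in enumerate(rows) if i not in val_idx]
    val = [row for i, row in enumerate(rows) if i in val_idx]
    return train, val

train_rows, val_rows = shuffled_split(list(range(10)), val_ratio=0.2, seed=1)
```

Shuffling before the split avoids a validation set drawn only from the tail of an ordered data set.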

7.0

- extensions to privacy_encode option
- now an alternate columntype_report is returned as postprocess_dict['private_columntype_report']
- populated with the alternate privacy encoded column headers
- and when privacy_encode activated this replaces the original columntype_report
- also now when privacy_encode is activated, the order of columns is shuffled prior to assigning alternate headers
- postmunge printouts other than bug reports are now silenced in privacy_encode scenario
- struck a redundancy in postmunge inversion associated with renaming columns which interfered now that privacy_encode shuffles order of columns
- consolidated a redundant parameter in support functions processfamily, postprocessfamily, circleoflife, postcircleoflife
- a little fleshing out of the returned data structure postprocess_dict['labelsencoding_dict']
- which is intended as an alternative resource for label inversion for cases where user doesn't wish to share the entire postprocess_dict, such as for file size or privacy reasons
- now labelsencoding_dict records any transform_dict and process_dict entries that were inspected as part of label processing
- as well as recording normalization_dict entries associated with derived columns that were subject to replacement
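
As a rough illustration of the privacy_encode behavior described above (column order shuffled, then alternate headers assigned), here is a minimal sketch. The function name privacy_rename, the 'col_N' header format, and the returned mapping are all assumptions for illustration and do not reflect the library's internal implementation.

```python
import random

def privacy_rename(headers, seed=None):
    """Hypothetical helper: shuffle column order, then assign
    anonymous replacement headers in the shuffled order."""
    rng = random.Random(seed)
    shuffled = list(headers)
    rng.shuffle(shuffled)                     # order is shuffled prior to renaming
    # assumed header format 'col_N'; the real alternate headers may differ
    column_map = {original: f"col_{i}" for i, original in enumerate(shuffled)}
    private_headers = [column_map[h] for h in shuffled]
    return private_headers, column_map

private_headers, column_map = privacy_rename(["age", "income", "zip"], seed=0)
```

Because headers are assigned after the shuffle, neither the new names nor their positions reveal the original column order.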

6.99

- a walkthrough of the process_dict data structure
- process_dict is initialized from assembleprocessdict and then after some preparations and validations are performed on user passed processdict the two are consolidated
- process_dict is then populated in the postprocess_dict upon initialization
- integration of process_dict into postprocess_dict was a deviation to support inspection of process_dict in custom_train postmunge wrapper function, which doesn't see process_dict but does see postprocess_dict
- but had the side effect of a redundant variable being passed through family processing functions as process_dict and postprocess_dict['process_dict']
- realized it was a potential point of confusion to have the same data structure inspected in two different ways
- which was compounded when the labelctgy initialization was moved from automunge start to label processing, as labelctgy initialization in some cases edits entries to the process_dict
- very simple solution, standardized on a single version of process_dict inspected in automunge(.) after postprocess_dict initialization as postprocess_dict['process_dict']
- the exception is for feature importance application, which takes place prior to postprocess_dict initialization, so this still sees as process_dict
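
The consolidation described above can be sketched in a few lines. The dictionary contents, the merge step, and the lookup_process_entry helper are placeholders assumed for illustration; they are not the actual assembleprocessdict output or validation logic.

```python
# Placeholder entries standing in for the defaults and a user passed processdict
default_process_dict = {"nmbr": {"MLinfilltype": "numeric"}}
user_processdict = {"custom": {"MLinfilltype": "singlct"}}

# Consolidate the two (user entries take precedence in this sketch)
process_dict = {**default_process_dict, **user_processdict}

# Populate into postprocess_dict at initialization
postprocess_dict = {"process_dict": process_dict}

def lookup_process_entry(postprocess_dict, category):
    """Hypothetical accessor: always inspect the single copy
    stored as postprocess_dict['process_dict']."""
    return postprocess_dict["process_dict"][category]

# An edit made through the single copy (e.g. during label processing)
# is visible to every later inspection
postprocess_dict["process_dict"]["custom"]["labelctgy"] = "custom"
entry = lookup_process_entry(postprocess_dict, "custom")
```

Inspecting only one version removes the chance of two references drifting apart after an edit.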

6.98

- a cleanup to strike some of the intermediate column forms from Binary returned from inversion (inversion was working, this just results in a cleaner returned set)
- similarly, a cleanup to strike, from the set returned from inversion, Binary columns produced from a retain consolidation in cases where inversion was originally passed as 'test' or 'labels' (to be consistent with the original form)
- added TrainLabelFreqLevel support for consolidated labels (supporting ordinal and onehot consolidations)
- added validation in automunge for column header string conversion to confirm did not result in redundant headers (could happen if e.g. received headers included 1 and '1')
- added validation to postmunge inversion denselabels case to confirm single label_column entry
- fixed a snafu in privacy_encode (turned out was recording length of wrong list as part of a derivation)
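
The header validation noted above (string conversion producing redundant headers, e.g. 1 and '1') can be sketched as follows. The function name find_header_collisions is hypothetical and this is only an illustration of the check, not the library's code.

```python
def find_header_collisions(headers):
    """Hypothetical check: return pairs of original headers that
    become identical once converted to strings."""
    seen = {}          # string form -> first original header with that form
    collisions = []
    for header in headers:
        as_str = str(header)
        if as_str in seen:
            collisions.append((seen[as_str], header))
        else:
            seen[as_str] = header
    return collisions

collisions = find_header_collisions([1, "1", "age", 2])
# collisions == [(1, '1')]
```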

6.97

- moved the conversion of passed dictionaries and lists to internal state a little earlier
- updated the support function parameter_str_convert so that if first entry is enclosed in set brackets it is excluded from string conversion (relevant to new labels_column scenario)
- added parameter_str_convert to Binary specifications to be consistent with convention that received numeric column headers are converted to string
- restated for clarity: automunge(.) converts all column headers to string to align with suffix appention convention
- added a clarifying word to df_train description in read me that column headers should be unique strings
- labels_column automunge parameter now accepts specification of list of label columns, resulting in multiple labels in returned label sets
- each of the list entries may have their own root category assigned in assigncat if desired
- labels_column automunge parameter, when passed as a list of label columns, accepts a first entry set bracket specification to activate a consolidation of categoric labels, resulting in a single classification target even in cases of multiple categoric labels, where the form of multiple labels can automatically be recovered in an inversion operation for data returned from an inference
- such that a single classification model can then be trained for use with multiple classification targets
- set bracket specification options are consistent with those options supported for Binary, with exception that binarized with replacement requires specification as {True}
- (instead of automatic when omitted as is case with Binary)
- we recommend defaulting to the ordinal option, e.g. to consolidate three labels with headers 'categoriclabel_':
- labels_column = [{'ordinal'}, 'categoriclabel_1', 'categoriclabel_2', 'categoriclabel_3']
- which can then be recovered back to the form of three separate labels in postmunge with inversion='labels'
- updated form of postprocess_dict['labelsencoding_dict'], now with extra tiers of entries as well as entries associated with categoric consolidations
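
To make the idea of an ordinal label consolidation concrete, here is a minimal sketch in plain Python. It maps each distinct combination of label values to a single integer class and then recovers the separate labels. The function name fit_label_consolidation and the sample rows are assumptions for illustration; the library's actual encoding may differ.

```python
def fit_label_consolidation(rows):
    """Hypothetical helper: map each distinct tuple of label values
    to one integer class, and build the reverse map for inversion."""
    combos = sorted(set(rows))
    encode = {combo: i for i, combo in enumerate(combos)}
    decode = {i: combo for combo, i in encode.items()}
    return encode, decode

# One tuple per row, holding values of three categoric label columns
rows = [("a", "x", "p"), ("b", "x", "q"), ("a", "x", "p"), ("b", "y", "q")]

encode, decode = fit_label_consolidation(rows)
consolidated = [encode[row] for row in rows]     # single classification target
recovered = [decode[c] for c in consolidated]    # back to three separate labels
```

A single classifier trained on the consolidated target can then serve all three labels, since each predicted class decodes to one value per original label column.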

6.96

- moved the validation for labelctgy processdict entry
- now takes place after identifying the root label category
- and limits validations to that entry
- which reduces overhead and moves an unnecessarily prominent printout for custom processdict
- a tweak to the labelctgy writeup (struck reference to functionpointer) to align with the functionpointer writeup
