Automunge

Latest version: v8.33

Page 7 of 99

7.88

- new root category DPpc for DPod applied to passthrough categoric
- available by new assignparam option for DPod as 'passthrough'
- accepts boolean, defaults to False
- in the process, found and fixed a DPod bug affecting the protected_feature option rolled out in the last update
- while resolving it, found an opportunity to eliminate an unneeded support column in DPod
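
As a rough illustration of the two routes described in the notes above, the dictionaries below sketch how the option might be passed. The column header 'column1' is hypothetical, and the nesting of category, then column, then parameter is an assumption based on the usual assignparam conventions. The dictionaries would be handed to automunge(.) through its assigncat and assignparam parameters.

```python
# Hypothetical header of a categoric column in the training dataframe.
target_column = 'column1'

# Route 1 (assumed usage): assign the column to the new DPpc root category.
assigncat_dppc = {'DPpc': [target_column]}

# Route 2 (assumed usage): assign the column to DPod and activate the
# 'passthrough' option, which defaults to False.
assigncat_dpod = {'DPod': [target_column]}
assignparam = {'DPod': {target_column: {'passthrough': True}}}
```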

7.87

- new option for noise injection transforms via 'protected_feature' assignparam parameter
- protected_feature defaults to False, accepts a string input header designating an adjacent categoric feature
- may be specified for a target noise transform or for all noise transforms with the usual assignparam options
- when specified, noise injection to a target feature is scaled differently between segments corresponding to the adjacent protected feature
- e.g. with a global noise scale, different segments of the feature may be exposed to different relative noise profiles owing to differences in their segment distribution properties
- now the different segments have noise scaled to align with the segment scale
- including for distribution sampled noise in numeric noise and for weighted categoric sampling in categoric noise
- this practice is expected to mitigate loss discrepancy between attributes of a protected feature
- was inspired by the identification of potential for loss discrepancy offered by Khani, F. and Liang, P. in "Feature noise induces loss discrepancy across groups."
- as part of the update, modularized the column processing loops in automunge and postmunge so that protected features are processed in a separate for loop; this may be beneficial when parallelizing the master for loops, ensuring the protected feature has a known configuration when it is accessed for the noise scaling
- also found and fixed a small bug for public label inversion with encryption (had a pair of contradictory if statements)
- added inplace support for DPod
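
The segment scaling idea in the notes above can be sketched in a few lines of standard library Python. This is only an illustration of the concept, not Automunge's implementation: the function name, the choice of the per-segment standard deviation as the "segment scale", and the sigma value are all assumptions made for the example.

```python
import random
import statistics
from collections import defaultdict

def segment_scaled_noise(values, segments, sigma=0.1, seed=0):
    """Add Gaussian noise whose scale follows each segment's own spread.

    values   : numeric target feature entries
    segments : entries of the adjacent protected categoric feature
    sigma    : noise scale relative to the segment's standard deviation
    """
    rng = random.Random(seed)
    grouped = defaultdict(list)
    for value, segment in zip(values, segments):
        grouped[segment].append(value)
    # Population standard deviation per segment (0.0 for a constant segment).
    scale = {seg: statistics.pstdev(vals) for seg, vals in grouped.items()}
    noisy = [value + rng.gauss(0.0, sigma * scale[segment])
             for value, segment in zip(values, segments)]
    return noisy, scale

# Two segments with very different spreads receive proportionally different noise.
values = [1.0, 2.0, 3.0, 100.0, 200.0, 300.0]
segments = ['a', 'a', 'a', 'b', 'b', 'b']
noisy, scale = segment_scaled_noise(values, segments)
```

With a single global scale, segment 'a' would be swamped by noise sized for segment 'b'; scaling per segment keeps the relative perturbation comparable across both.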

7.86

- new parameter accepted for mask noise via DPsk as 'additive'
- additive accepts boolean defaulting to False
- when True, mask noise is added to the input instead of replacing it
- intended for use to inject discrete noise into continuous numeric sets
- also was considering adding functionality to introduce arbitrary noise profiles of multiple perturbation vectors to a common feature
- and then realized we already have that functionality available by the family tree primitives
- as one example, could inject one profile of small noise sigma with regular flip_prob, and then a second profile of large noise sigma with very small flip_prob as a downstream transform to the first noise profile
- doing that with family tree primitives for DPnb, which is noise with z-score normalization, would look something like the following, where we are overwriting family trees for DPnb and DPn3 in transformdict and adding a new processdict entry for DPn4, which will be the downstream second perturbation vector
- this is kind of like probabilistic programming, although not Turing complete

transformdict = {}

transformdict.update({'DPnb' : {'parents' : ['DPn3'],
                                'siblings' : [],
                                'auntsuncles' : [],
                                'cousins' : ['NArw'],
                                'children' : [],
                                'niecesnephews' : [],
                                'coworkers' : ['DPn4'],
                                'friends' : []}})

# DPn3 primarily intended for use as a tree category
transformdict.update({'DPn3' : {'parents' : ['DPn3'],
                                'siblings' : [],
                                'auntsuncles' : [],
                                'cousins' : [],
                                'children' : ['DPnb'],
                                'niecesnephews' : [],
                                'coworkers' : [],
                                'friends' : []}})

processdict = {}

processdict.update({'DPnb' : {'functionpointer' : 'DPnb',
                              'defaultparams' : {'sigma':0.5,
                                                 'flip_prob':0.0001}}})

processdict.update({'DPn4' : {'functionpointer' : 'DPnb',
                              'defaultparams' : {'sigma':0.05,
                                                 'flip_prob':0.03}}})
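
Separately, the 'additive' option for mask noise described at the top of these 7.86 notes can be sketched as follows. This is a conceptual illustration only, assuming that mask noise selects entries with probability flip_prob and either replaces them with a fixed mask value or, when additive is True, adds the mask value to them. The function name and the mask_value parameter are hypothetical.

```python
import random

def mask_noise(values, flip_prob=0.03, mask_value=1, additive=False, seed=0):
    """Apply mask noise to a list of numeric entries.

    Each entry is selected with probability flip_prob. Selected entries are
    replaced with mask_value, or incremented by it when additive is True.
    """
    rng = random.Random(seed)
    result = []
    for value in values:
        if rng.random() < flip_prob:
            result.append(value + mask_value if additive else mask_value)
        else:
            result.append(value)
    return result

# flip_prob=1.0 selects every entry, which makes the two modes easy to compare.
replaced = mask_noise([1.5, 2.5], flip_prob=1.0, additive=False)
shifted = mask_noise([1.5, 2.5], flip_prob=1.0, additive=True)
```

In the additive mode the continuous input survives with a discrete offset, which matches the stated intent of injecting discrete noise into continuous numeric sets.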

7.85

- quick fix, realized the bug fixed in 7.84 was also present in automunge
- now aligned as resolved in both channels
- also adjusted rollout validations to include this scenario going forward

7.84

- found and fixed a bug associated with ID extraction in postmunge
- which also was impacting validation split in automunge
- originated from a non-range index in postmunge not getting extracted to the ID set when no such index was found in automunge
- a cleanup to support function __assignparam_str_convert
- added a clarification to the README on custom transformation functions: received parameter types require support for the Python deepcopy operation
- ran some additional validations on process_dict functionpointer support from 7.83, looks good
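
For the deepcopy clarification above, a quick way to screen parameters before passing them to a custom transformation function might look like the following. The helper name is hypothetical and the check is only a sketch; it simply attempts a deepcopy and reports whether it succeeded.

```python
import copy

def supports_deepcopy(params):
    """Return True if params can be duplicated with copy.deepcopy."""
    try:
        copy.deepcopy(params)
    except Exception:
        return False
    return True

# Plain containers of numbers and strings copy without issue.
plain_params = {'sigma': 0.5, 'flip_prob': 0.03, 'labels': ['a', 'b']}

# Generator objects are an example of a type that cannot be deep copied.
generator_params = {'stream': (i for i in range(3))}

plain_ok = supports_deepcopy(plain_params)
generator_ok = supports_deepcopy(generator_params)
```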

7.83

- new automunge(.) parameter orig_headers, accepts boolean, defaults to False
- activating orig_headers results in returned dataframes matching input column headers without suffix appenders
- consistent basis applied in postmunge
- this may result in redundant column headers in the returned dataframe
- privacy_encode when activated takes precedence
- created for use in workflows supporting integration of noise injection into existing data pipelines
- also, added additional powertransform scenarios DT1/DT2/DB1/DB2, each consistent with DP1 or DP2 but using the DT convention that noise is injected to only test data or the DB convention that noise is injected to both train and test data
- also, added functionpointer support for internally defined process_dict entries
- found a small memory management issue with DPhs that was interfering with custom generators, now resolved
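
To see why orig_headers may produce redundant column headers, consider the sketch below. The mapping from returned headers to input headers is invented for illustration (the column names and suffixes are hypothetical examples, not output from automunge); the point is only that several returned columns derived from one input column collapse to the same header once suffix appenders are dropped.

```python
from collections import Counter

# Hypothetical map from returned headers (with suffix appenders) to input headers.
returned_to_input = {
    'age_nmbr': 'age',
    'city_1010_0': 'city',
    'city_1010_1': 'city',
    'city_NArw': 'city',
}

# Headers as they would read with suffix appenders removed.
orig_style_headers = [returned_to_input[header] for header in returned_to_input]

# Input headers that now appear more than once in the returned set.
redundant = sorted(header for header, count in Counter(orig_style_headers).items()
                   if count > 1)
```

Downstream code that selects columns by header would need to account for such duplicates, for example by selecting by position instead.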
