Automunge

Latest version: v8.33

4.48

- performed an additional code review of today's rollout and found a small bug missed in testing
- (since noise injection prevents us from comparing similar sets, validation methods are impacted)
- so this is a quick fix for the small bug from 4.47 in the DPrt postprocess function

4.47

- new DP transform DPrt intended to inject noise to retn scaled data
- (retn is a numerical set normalization similar to min-max that retains the +/- sign of received data for interpretability purposes)
- note that retn doesn't have a pre-defined mean or range, so noise injection is instead incorporated directly into the retn transform for a common transformation function
- (as opposed to other DP transforms, which break noise injection into a separate transformation passed to family tree primitives)
- DPrt includes parameters available in retn: divisor / offset / multiplier / cap / floor defaulting to 'minmax'/0/1/False/False
- as well as parameters available in other numeric DP transforms: mu / sigma / flip_prob defaulting to 0./0.03/1. (a rough sketch of the combined scaling and injection follows below)
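
A minimal numpy sketch of the combined scaling and injection, assuming the 'minmax' divisor simply divides by the column's (max - min) range without subtracting the min so that the +/- sign is retained; the function name and details are illustrative rather than the library's internal implementation:

```python
import numpy as np

def retn_with_noise(x, mu=0., sigma=0.03, flip_prob=1., trainset=True, rng=None):
    """Sketch of a DPrt-style transform: retn-like scaling with Gaussian
    noise injection folded into the same transformation function.
    (Hypothetical helper; the library's retn details may differ.)"""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    # divide by the range without shifting, preserving the +/- sign (assumed)
    scaled = x / (x.max() - x.min())
    if trainset:
        # Bernoulli draw selects which entries receive noise (all, at flip_prob=1.)
        mask = rng.random(scaled.shape) < flip_prob
        scaled = scaled + mask * rng.normal(mu, sigma, scaled.shape)
    return scaled

print(retn_with_noise([-2., 0., 1., 3.]))
```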

4.46

- new DP transform DPmm intended to inject noise to min-max scaled numeric sets
- injects Gaussian noise per parameters mu and sigma to a ratio of the data based on flip_prob
- with parameter defaults of 0., 0.03, 1.0 respectively
- noise capped at -0.5 / +0.5
- DPmm scales noise based on the received min-max value to ensure output remains in the range 0-1
- for example, if the received input is 0.1, any negative noise is scaled by a 0.2 multiplier (see the sketch after this list)
- updated default flip_prob for DPnb to 1.0, which makes it equivalent to DPnm
- so just making DPnb the default numeric DP transform to save space etc
- user can still elect sampled injection by decreasing this value
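
A rough numpy sketch of the scaled injection, assuming positive noise is treated analogously to negative noise (scaled by (1 - value) / 0.5); the helper name is hypothetical:

```python
import numpy as np

def dpmm_noise(x_mm, mu=0., sigma=0.03, flip_prob=1.0, rng=None):
    """Sketch of DPmm-style injection for data already min-max scaled to [0, 1].
    Noise is clipped to [-0.5, +0.5], then negative noise is scaled by
    value / 0.5 and positive noise by (1 - value) / 0.5 so results stay
    within [0, 1]. (Hypothetical helper, not the library's internal function.)"""
    rng = rng or np.random.default_rng()
    x_mm = np.asarray(x_mm, dtype=float)
    mask = rng.random(x_mm.shape) < flip_prob          # which entries get noise
    noise = np.clip(rng.normal(mu, sigma, x_mm.shape), -0.5, 0.5)
    scale = np.where(noise < 0, x_mm / 0.5, (1. - x_mm) / 0.5)
    return x_mm + mask * noise * scale

# an entry at 0.1 has negative noise scaled by 0.2 and positive noise by 1.8
print(dpmm_noise(np.array([0.1, 0.5, 0.9])))
```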

4.45

- recasting of family tree definitions for DP family of transforms to address issue found with inversion
- the bug originated from a transformation function saving a transformation category key that was associated with another transformation function due to the way we structured family trees, now resolved
- also new DP transform DPnb for numerical data
- similar to DPnm, injecting Gaussian noise intended for z-score normalized data
- but only injects to a subset of the data based on a flip_prob parameter defaulting to 0.03 (see the sketch after this list)
- also accepts parameters from DPnm of mu and sigma defaulting to 0 and 0.06
- added traindata parameter to postmunge parameter validations
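
A minimal sketch of the subset injection with hypothetical naming; setting flip_prob to 1.0 injects to every entry, which is the DPnm-equivalent behavior noted in 4.46:

```python
import numpy as np

def dpnb_noise(x_zscore, mu=0., sigma=0.06, flip_prob=0.03, rng=None):
    """Sketch of DPnb-style injection for z-score normalized data: Gaussian
    noise applied only to a flip_prob fraction of entries.
    (Hypothetical helper name.)"""
    rng = rng or np.random.default_rng()
    x_zscore = np.asarray(x_zscore, dtype=float)
    mask = rng.random(x_zscore.shape) < flip_prob
    return x_zscore + mask * rng.normal(mu, sigma, x_zscore.shape)
```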

4.44

- new DP transformation family for differential privacy purposes
- such as may enable noise injection to training data (but not test data)
- includes root categories DPnm for numerical data, DPbn for boolean/binary data, DPod for ordinal encodings, DP10 for binary encodings, and DPoh for one-hot encodings
- where DPnm injects Gaussian noise and accepts parameters for mu and sigma (defaults to 0, 0.06), as may be suitable for application to z-score normalized data
- DPbn injects Bernoulli distributed activation flips to bnry encodings, accepting parameter flip_prob for probability of flip (defaults to 0.03)
- DPod / DPoh / DP10 change a categoric activation per a Bernoulli distribution for whether to change, and when changed select among the set of activations with equal probability, accepting parameter flip_prob for probability of flip (defaults to 0.03) (see the sketch after this list)
- note that when passing parameters to the DP functions, be sure to use the transformation category in the family trees associated with the transformation function (which in some cases may be different than you'd expect)
- DP operation in postmunge(.) supported by a new "traindata" parameter to distinguish whether the df_test is to be treated as train or test data (defaults to False for test data)
- note that the methods make use of the numpy.random library without a designated random seed, so for now the DP family is randomized between applications (probably better this way, just making note)
- idea for a DP noise injection transform partly inspired by the differential privacy podcast series from TWiML documented in the From the Diaries of John Henry essay "Differential Privacy"
- added code comment about potential deep copy operation for postprocess_dict in postmunge so as not to edit exterior object
- some cleanup to normalization_dict entries for various ordinal encoding transforms such as to standardize on a few of the activation encoding dictionary formats
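
A rough numpy sketch of the categoric flip for the ordinal case; the helper name is hypothetical, and whether the original activation can be re-drawn when an entry is changed is left as an implementation detail here:

```python
import numpy as np

def dpod_flip(x_ord, n_activations, flip_prob=0.03, rng=None):
    """Sketch of a DPod-style flip for ordinal encodings: a Bernoulli draw
    decides whether each entry is changed, and changed entries are reassigned
    with equal probability within the set of activations. (Hypothetical helper.)"""
    rng = rng or np.random.default_rng()
    x_ord = np.asarray(x_ord)
    mask = rng.random(x_ord.shape) < flip_prob
    replacements = rng.integers(0, n_activations, x_ord.shape)
    return np.where(mask, replacements, x_ord)

print(dpod_flip(np.array([0, 2, 1, 3, 2]), n_activations=4, flip_prob=0.5))
```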

4.43

- new transformation family ntgr / ntg2 / ntg3
- intended for integer sets of unknown interpretation
- such as may be any one of continuous variables, discrete relational variables, or categoric sets
- (similar to those sets included in the IEEE competition)
- ntgr addresses this interpretation problem by simply encoding in multiple formats appropriate for each
- such as to let the ML determine through training which is the most useful
- ntgr includes ord3_mnmx / retn / 1010 / ordl / NArw
- where ord3_mnmx encodes info about frequency, retn normalizes for continuous treatment, 1010 encodes as categoric, ordl is an ordinal encoding which retains integer order information, and NArw identifies infill points (see the sketch after this list)
- ntg2 same as ntgr but adds a pwr2 power of ten binning
- ntg3 reduces number of columns from ntg2 by using ordinal alternates 1010->ordl, pwr2->por2
- thoughts around integer set considerations partly inspired by a passage in chapter 4 of "Deep Learning with PyTorch" by Eli Stevens, Luca Antiga, and Thomas Viehmann
- also updated the naninfill option to correct a potential bug originating from inconsistent data types between infill and other data
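
A simplified pandas sketch of the multi-format idea for a single integer column; column names and the constituent encodings are illustrative rather than the library's actual family tree definitions:

```python
import pandas as pd

def ntgr_style_encodings(series):
    """Sketch of the ntgr idea: derive several parallel encodings from one
    integer column so downstream training can use whichever is informative.
    (Illustrative only; naming and details differ from the library.)"""
    df = pd.DataFrame(index=series.index)
    # continuous treatment: scale by the column's range
    span = series.max() - series.min()
    df['intcol_scaled'] = (series - series.min()) / span if span else 0.
    # categoric treatment: ordinal encoding that retains integer order
    df['intcol_ordl'] = series.astype('category').cat.codes
    # frequency treatment: ordinal encoding ranked by value counts (ord3-style)
    freq_order = {v: i for i, v in enumerate(series.value_counts().index)}
    df['intcol_ord3'] = series.map(freq_order)
    # missing-data marker (NArw-style)
    df['intcol_NArw'] = series.isna().astype(int)
    return df

print(ntgr_style_encodings(pd.Series([3, 7, 7, 2, 12, 7])))
```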
