Automunge

Latest version: v8.33

4.72

- new 'defaultparams' option for the processdict data structures for defining transformation category properties
- can now define a transformation category to accept custom default parameters for passing to transformation functions
- such as may be useful if you want to distinguish between versions of transformation categories that apply the same transformation functions but with different default parameters
- note that manually defined parameters passed to assign_param will still overwrite these defaults
- this new convention allows us to scrub the recently defined 'DL' differential privacy functions with laplace distribution
- now laplace is available as a parameter to the corresponding 'DP' differential privacy functions
- saving about 1,000 lines of code in the process
- also a little cleanup to the processfamily functions
- with consistent transformation function calls independent of parameter assignments
- which just makes more sense
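
The precedence described in these notes can be sketched in plain Python. This is only an illustration and does not call Automunge: the helper `resolve_params`, the parameter name `noisedistribution`, and the nested `{category: {column: {parameter: value}}}` layout of the assigned parameters are assumptions made for the example. Only the 'defaultparams' key and the rule that manually assigned parameters overwrite the defaults come from the notes above.

```python
# Hypothetical illustration of parameter precedence; not Automunge's own code.

def resolve_params(processdict, assignparam, category, column):
    """Merge a category's 'defaultparams' with manually assigned parameters."""
    # start from the defaults registered with the category, if any
    params = dict(processdict.get(category, {}).get('defaultparams', {}))
    # manually assigned parameters for this category and column take precedence
    params.update(assignparam.get(category, {}).get(column, {}))
    return params


# two categories that could share the same transformation functions
# but differ in default parameters (parameter name is assumed)
processdict = {
    'DPnb': {'defaultparams': {}},
    'DLnb': {'defaultparams': {'noisedistribution': 'laplace'}},
}

# assumed layout: {category: {column: {parameter: value}}}
assignparam = {
    'DLnb': {
        'col1': {'sigma': 0.1},
        'col2': {'noisedistribution': 'normal'},
    },
}

col1_params = resolve_params(processdict, assignparam, 'DLnb', 'col1')
col2_params = resolve_params(processdict, assignparam, 'DLnb', 'col2')
```

Here `col1` keeps the category default and gains the assigned `sigma`, while `col2` has the default overwritten by the assigned value.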

4.71

- performed another code review of 4.70 and found a small snafu populating processdict for new transforms
- so quick fix to processdict entries for DLnb, DLmm, and DLrt

4.70

- new differential privacy series for numerical data
- featuring transforms DLnb, DLmm, and DLrt
- comparable to DPnb, DPmm, and DPrt
- but apply laplace distributed noise (i.e. double exponential) instead of gaussian
- where DLnb applies to z-score normalized data, DLmm to min-max normalized data, and DLrt to retain normalized data
- uses the same parameters as the DP versions: scale is passed as sigma, loc as mu, and the ratio of application as flip_prob
- inspired by a NIST post I just saw on Hacker News
- also hat tip to NumPy for their numpy.random, which serves as the noise source
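
As a rough sketch of the idea behind DLnb (not the library's implementation), the following z-score normalizes a column and then adds Laplace noise drawn from numpy.random to a random subset of rows. The function name `laplace_noise_zscore` and the default values are assumptions chosen for illustration. The parameter names mu, sigma, and flip_prob follow the notes above.

```python
import numpy as np


def laplace_noise_zscore(values, mu=0.0, sigma=0.06, flip_prob=1.0, seed=None):
    """z-score normalize, then add Laplace noise to a random subset of rows."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    std = x.std()
    # guard against a constant column, which has zero standard deviation
    z = (x - x.mean()) / (std if std > 0 else 1.0)
    # Laplace (double exponential) noise: loc passed as mu, scale as sigma
    noise = rng.laplace(loc=mu, scale=sigma, size=z.shape)
    # each row receives noise with probability flip_prob
    mask = rng.random(z.shape) < flip_prob
    return z + noise * mask


noisy = laplace_noise_zscore([1.0, 2.0, 3.0, 4.0, 5.0], flip_prob=0.5, seed=0)
```

Swapping `rng.laplace` for `rng.normal` with the same loc and scale arguments gives the gaussian counterpart in this sketch.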

4.69

- a few cleanups to the ID column extractions in postmunge
- new 'mad' divisor parameter option for retain normalization via retn and DPrt
- 'mad' applies median absolute deviation divisor instead of max-min
- mad divisor may be appropriate when the range of values is unconstrained, to keep outliers from compressing the in-distribution range of the normalized set
- (e.g. if most of values fall in range 0-100, a train set outlier of 10,000,000,000 would interfere with normalization)
- in some distributions median absolute deviation may be more tractable than standard deviation
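
The outlier scenario above can be worked through with the standard library. This is a simplified sketch: it only compares the two divisors and omits any offset that retain normalization may apply. The helper name `retain_divisor` is hypothetical, as is the sample data.

```python
from statistics import median


def retain_divisor(train, divisor='minmax'):
    """Return the scaling divisor derived from the training values."""
    if divisor == 'minmax':
        return max(train) - min(train)
    if divisor == 'mad':
        # median absolute deviation: median of |x - median(x)|
        center = median(train)
        return median([abs(v - center) for v in train])
    raise ValueError("divisor must be 'minmax' or 'mad'")


# most values fall in 0-100, plus one extreme training-set outlier
train = list(range(0, 101, 10)) + [10_000_000_000]

minmax_div = retain_divisor(train, 'minmax')
mad_div = retain_divisor(train, 'mad')

# a typical value scaled by each divisor
typical_minmax = 50 / minmax_div   # collapses toward zero
typical_mad = 50 / mad_div         # stays on the order of 1
```

With the max-min divisor every typical value is squeezed into a sliver near zero, while the mad divisor is barely affected by the single outlier. Note that a median absolute deviation of zero (for example when over half the values are identical) would need separate handling.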

4.68

- found and fixed a typo bug in recently rolled out transforms sp19, sp20, sbs3, sbs4
- added support for unnamed non-range index in dataframes passed to df_train and df_test
- new root categories similar to demonstrations from recent paper "A Numbers Game"
- rtbn (retain normalization with ordinal encoded standard deviation bins)
- rtb2 (retain normalization with one-hot encoded standard deviation bins)
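
To illustrate what standard deviation bins might look like in ordinal and one-hot form, here is a minimal sketch. The bin boundaries (at -2, -1, 0, +1, and +2 standard deviations from the training mean, giving six bins) and the function names are assumptions for the example and are not confirmed by these notes. The sketch covers only the binning step, not the retain normalization that rtbn and rtb2 pair it with.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# assumed bin boundaries, in units of standard deviations from the mean
EDGES = [-2, -1, 0, 1, 2]


def stdev_bins_ordinal(train, values):
    """Assign each value an integer bin id based on its z-score."""
    mu = mean(train)
    sd = pstdev(train) or 1.0  # fall back to 1.0 for a constant column
    return [bisect_right(EDGES, (v - mu) / sd) for v in values]


def stdev_bins_onehot(train, values):
    """One-hot encode the same bins, one column per bin."""
    n_bins = len(EDGES) + 1
    return [[int(i == b) for i in range(n_bins)]
            for b in stdev_bins_ordinal(train, values)]


train = [0, 1, 2, 3, 4]
ordinal = stdev_bins_ordinal(train, [-10, 0, 2.5, 100])
onehot = stdev_bins_onehot(train, [2.5])
```

The bin statistics are taken from the training data and then reused for any later values, which mirrors the usual train/test split of fitting and applying a transform.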

4.67

- found and fixed a small bug in assignparam_str_convert
- removed an unused code block in postcircleoflife
- small code comment cleanup
