Automunge

Latest version: v8.33

4.72

- new 'defaultparams' option for the processdict data structures for defining transformation category properties
- can now define a transformation category to accept custom default parameters for passing to transformation functions
- such as may be useful if you want to distinguish between versions of transformation categories that apply the same transformation functions but with different default parameters
- note that manually defined parameters passed to assign_param will still overwrite these defaults
- this new convention allows us to scrub the recently defined 'DL' differential privacy functions with laplace distribution
- now laplace is available as a parameter to the corresponding 'DP' differential privacy functions
- saving about 1,000 lines of code in the process
- also a little cleanup to the processfamily functions
- with consistent transformation function calls independent of parameter assignments
- which just makes more sense
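
The precedence described in 4.72 (category-level 'defaultparams', overwritten by anything passed through assign_param) can be sketched as a plain dictionary merge. This is a minimal illustration, not Automunge's internal code: the `resolve_params` helper, the `noisedistribution` key, the column name, and the simplified shape of both dictionaries are assumptions made for the example.

```python
# Hypothetical, simplified stand-ins for the real data structures.
# Two categories share one transformation function but differ in defaults.
processdict = {
    "DPnb": {"defaultparams": {}},
    "DLnb": {"defaultparams": {"noisedistribution": "laplace"}},  # assumed key name
}

# Simplified assign_param layout: {category: {column: {parameter: value}}}
assign_param = {
    "DLnb": {"column1": {"sigma": 0.1}},
    "DPnb": {"column2": {"noisedistribution": "laplace"}},
}


def resolve_params(category, column, processdict, assign_param):
    """Return the parameters to hand to the transformation function.

    Category defaults are applied first, then manually assigned
    parameters overwrite them.
    """
    params = dict(processdict.get(category, {}).get("defaultparams", {}))
    params.update(assign_param.get(category, {}).get(column, {}))
    return params


dlnb_params = resolve_params("DLnb", "column1", processdict, assign_param)
dpnb_params = resolve_params("DPnb", "column2", processdict, assign_param)
print(dlnb_params)
print(dpnb_params)
```

With this layout a single noise function can serve both category families, which matches the stated reason for removing the separate 'DL' functions.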

4.71

- performed another code review of 4.70 and found a small snafu populating processdict for new transforms
- so quick fix to processdict entries for DLnb, DLmm, and DLrt

4.70

- new differential privacy series for numerical data
- featuring transforms DLnb, DLmm, and DLrt
- comparable to DPnb, DPmm, and DPrt
- but apply laplace distributed noise (i.e. double exponential) instead of gaussian
- where DLnb applies to z-score normalized data, DLmm to min-max normalized data, and DLrt to retain normalized data
- uses same parameters as the DP versions, where scale is passed as sigma, and loc as mu, and ratio of application as flip_prob
- inspired by a NIST post I just saw on Hacker News
- also hat tip to Numpy for their numpy.random which serves as noise source
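
As a rough sketch of the noise injection described in 4.70, the following applies Laplace noise to a subset of entries in an already normalized column, using numpy.random as the noise source. The function name, the default values, and the masking approach are assumptions for illustration only; the parameter names mu, sigma, and flip_prob follow the notes above (loc, scale, and ratio of application).

```python
import numpy as np


def add_laplace_noise(values, mu=0.0, sigma=0.06, flip_prob=0.03, seed=None):
    """Add Laplace noise to roughly flip_prob of the entries.

    mu is passed as the Laplace loc, sigma as the Laplace scale.
    Default values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    noise = rng.laplace(loc=mu, scale=sigma, size=values.shape)
    # Each entry independently receives noise with probability flip_prob
    mask = rng.random(values.shape) < flip_prob
    return values + noise * mask


# Example on a z-score normalized column (the DLnb case)
raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
normalized = (raw - raw.mean()) / raw.std()
noised = add_laplace_noise(normalized, flip_prob=0.5, seed=0)
print(noised)
```

Swapping `rng.laplace` for `rng.normal` would give the Gaussian counterpart, which is the only difference the notes draw between the DL and DP series.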

4.69

- a few cleanups to the ID column extractions in postmunge
- new 'mad' divisor parameter option for retain normalization via retn and DPrt
- 'mad' applies median absolute deviation divisor instead of max-min
- mad divisor may be appropriate when the range of values is unconstrained, to avoid outliers interfering with the in-distribution range of the normalized set
- (e.g. if most of values fall in range 0-100, a train set outlier of 10,000,000,000 would interfere with normalization)
- in some distributions median absolute deviation may be more tractable than standard deviation
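
The outlier scenario from the 4.69 notes can be checked numerically. This sketch only compares the two divisors; it does not reproduce the rest of the retn transform, and the `divisor` helper and its `kind` argument are hypothetical names introduced for the example.

```python
import numpy as np


def divisor(train_values, kind="minmax"):
    """Return the scaling divisor derived from the training column."""
    x = np.asarray(train_values, dtype=float)
    if kind == "mad":
        # median absolute deviation: median of |x - median(x)|
        return np.median(np.abs(x - np.median(x)))
    return x.max() - x.min()


# Most values fall in 0-100, plus one train set outlier of 10,000,000,000
train = np.append(np.arange(0, 101, dtype=float), 1e10)

range_div = divisor(train, "minmax")
mad_div = divisor(train, "mad")

# Scale a typical in-distribution value with each divisor
typical = 50.0
print(typical / range_div)  # collapses toward zero
print(typical / mad_div)    # stays on the order of 1
```

With the max-min divisor nearly all in-distribution values are squeezed into a tiny interval near zero, while the median absolute deviation is barely affected by the single outlier.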

4.68

- found and fixed a typo bug in recently rolled out transforms sp19, sp20, sbs3, sbs4
- added support for unnamed non-range index in dataframes passed to df_train and df_test
- new root categories similar to demonstrations from recent paper "A Numbers Game"
- rtbn (retain normalization with ordinal encoded standard deviation bins)
- rtb2 (retain normalization with one-hot encoded standard deviation bins)
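
The two bin encodings named for rtbn and rtb2 can be illustrated as follows. This is a sketch under assumptions: the bin edges at -2, -1, 0, 1, and 2 standard deviations (six bins) and the `std_bins` helper are chosen for the example and are not confirmed by the notes above, and the accompanying retain normalized column is omitted.

```python
import numpy as np


def std_bins(values, train_mean, train_std):
    """Bin values by distance from the training mean in standard deviations.

    Returns an ordinal encoding (integers 0-5) and the equivalent
    one-hot encoding (one column per bin).
    """
    z = (np.asarray(values, dtype=float) - train_mean) / train_std
    edges = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # assumed bin edges
    ordinal = np.digitize(z, edges)
    onehot = np.eye(len(edges) + 1, dtype=int)[ordinal]
    return ordinal, onehot


# One value per bin, using a training mean of 0 and standard deviation of 1
ordinal, onehot = std_bins([-3.0, -1.5, -0.5, 0.5, 1.5, 3.0], 0.0, 1.0)
print(ordinal)
print(onehot)
```

The ordinal form keeps the bins in a single integer column, while the one-hot form spreads them across six binary columns.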

4.67

- found and fixed a small bug in assignparam_str_convert
- removed an unused code block in postcircleoflife
- small code comment cleanup
