Automunge

Latest version: v8.33

7.70

- for cases where additional entropy seeds are sampled internally, updated the range of sampled seed size from 0 : 2 ** 32 - 1 to 0 : 2 ** 63
- this selection was partly informed by the max capacity for sampled integers from np.random.Generator().integers
- note that this differs from the max randomseed accepted by pandas operations, which is 2 ** 32 - 1
- the latter is also the max capacity used for the automunge/postmunge global randomseed
- updated the README to clarify the accepted range for entropy_seed
- also revisited the methods used to determine whether to pass entropy seeds to custom generators
- previously we had a try/except for each case, which was not ideal practice, now the try/except inspection is only performed once in the validation function and the result is used as the basis everywhere else
- updated the sampling budget derivation for passing entropy seeds to transforms to account for an edge case for binomial sampling where the flip_prob / test_flip_prob parameter is set to 1, in which case no seeds are applied (e.g. for cases where noise is injected into every entry)
- in other words, from an entropy seeding budget standpoint, it's actually cheaper to inject noise into every entry in a feature as opposed to just a sampled subset of entries based on a binomial sampling, although only by a ratio corresponding to the alternate value of flip_prob
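
As a rough illustration of the two seed ranges noted for 7.70, the sketch below samples entropy seeds with np.random.Generator.integers using an exclusive upper bound of 2 ** 63, and checks a seed against the 2 ** 32 - 1 cap that applies to the global randomseed. The helper names sample_entropy_seeds and fits_global_randomseed are hypothetical and are not part of the Automunge API.

```python
import numpy as np

ENTROPY_SEED_HIGH = 2 ** 63          # exclusive upper bound for sampled entropy seeds
MAX_GLOBAL_RANDOMSEED = 2 ** 32 - 1  # inclusive cap noted for the global randomseed


def sample_entropy_seeds(count, rng=None):
    """Sample `count` integer seeds in the half-open range [0, 2 ** 63)."""
    rng = np.random.default_rng() if rng is None else rng
    # with the default endpoint=False the largest possible draw is 2 ** 63 - 1,
    # which is the max value an int64 can hold
    return rng.integers(0, ENTROPY_SEED_HIGH, size=count, dtype=np.int64)


def fits_global_randomseed(seed):
    """Check whether a seed is small enough to be used as a global randomseed."""
    return 0 <= int(seed) <= MAX_GLOBAL_RANDOMSEED


seeds = sample_entropy_seeds(5, np.random.default_rng(42))
```

Most seeds drawn from the wider range will not pass fits_global_randomseed, which is why the two ranges need to be tracked separately.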

7.69

- added some operations in the column processing loop and at its conclusion to defragment dataframes
- added a regex default specification in hashing functions
- found a scenario mismatch between automunge and postmunge for the qttf transform, now in both cases when qttf isn't fit due to all non-numeric entries the transform returns all 0
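
The notes for 7.69 don't say how the defragmentation is performed. One common approach in pandas, assumed here rather than taken from the Automunge source, is to take a copy of the dataframe after a run of single-column inserts, since repeated inserts can leave the frame internally fragmented. The defragment helper name is hypothetical.

```python
import warnings

import pandas as pd


def defragment(df):
    """Return a copy of df, which pandas suggests as a way to de-fragment a frame."""
    return df.copy()


df = pd.DataFrame({"a": range(3)})

with warnings.catch_warnings():
    # pandas may emit a PerformanceWarning once a frame becomes highly fragmented
    warnings.simplefilter("ignore", pd.errors.PerformanceWarning)
    for i in range(150):
        # adding columns one at a time is the pattern that can fragment a frame
        df[f"col_{i}"] = i

df = defragment(df)
```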

7.68

- updated scikit-learn model initialization defaults so that it only passes user specified parameters other than the initialized random seed
- (the prior config was a bug channel when scikit-learn changes parameter options, which apparently was the case with min_impurity_split)
- added support for random_generator sourced from the QRAND library
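
The initialization pattern described for 7.68 might look something like the sketch below, where only the user's own parameters plus a seed are forwarded to the model constructor, so parameters that a library later removes are never passed by default. The init_model helper and the ToyEstimator stand-in class are hypothetical, and letting a user-supplied random_state take precedence over the seed is an assumption.

```python
def init_model(model_cls, user_params=None, randomseed=None):
    """Instantiate model_cls with only user-specified parameters plus a seed."""
    params = dict(user_params or {})
    # the seed is the only parameter set on the user's behalf
    # (assumption: a random_state given by the user takes precedence)
    params.setdefault("random_state", randomseed)
    return model_cls(**params)


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn style estimator."""

    def __init__(self, n_estimators=100, max_depth=None, random_state=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.random_state = random_state


model = init_model(ToyEstimator, {"max_depth": 3}, randomseed=7)
```

Parameters the user did not specify are left to the estimator's own defaults, so a renamed or removed default never reaches the constructor.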

7.67

- fixed bug originating from 7.63 that was missed in testing due to error channel

7.66

- added entropy seeding support to DPhs which is for multi-column hashing with DPod noise applied to each column
- applies the same set of seeds to each target shuffled
- now have entropy seeding support for full range of noise injection options

7.65

- reduced number of sampled entries associated with choice sampling applied for categoric noise injections DPod and DPmc (which most categoric noise transforms are built on top of)
- now instead of sampling a choice entry for every row, the choice entries are only sampled based on the number of binomial activations
- results in a significant reduction in the entropy_seeds required to support the bulk_seeds case, e.g. for a dataframe of all categoric features, about a 47% reduction in the number of sampled entries
- found and fixed an edge case for DPod
- also updated ML infill's stochastic_impute_numeric option to default to laplace instead of gaussian noise based on some benchmarking experiments
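
The reduced sampling described for 7.65 can be sketched as follows: a binomial draw per row decides which entries receive noise, and replacement categories are then drawn only for the activated rows rather than for every row. This is a minimal illustration, not the DPod or DPmc implementation, and the function name inject_categoric_noise and its signature are hypothetical.

```python
import numpy as np


def inject_categoric_noise(values, categories, flip_prob, rng):
    """Replace a binomially sampled subset of entries with random categories.

    Returns the noised array and the number of choice samples drawn.
    """
    noised = np.array(values, dtype=object)
    # one binomial draw per row marks the entries that receive noise
    activations = rng.binomial(1, flip_prob, size=len(noised)).astype(bool)
    n_active = int(activations.sum())
    # choice entries are sampled only for the activated rows, not for every row
    replacements = rng.choice(categories, size=n_active)
    noised[activations] = replacements
    return noised, n_active


rng = np.random.default_rng(0)
original = ["a"] * 1000
noised, n_active = inject_categoric_noise(original, ["a", "b", "c"], 0.03, rng)
```

With a small flip_prob the number of choice samples is a small fraction of the row count, which is where the saving in sampled entries comes from.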
