Automunge

Latest version: v8.33

7.70

- for cases where additional entropy seeds are sampled internally, updated the range of sampled seed values from 0 : 2 ** 32 - 1 to 0 : 2 ** 63
- this selection was partly informed by the max capacity for sampled integers from np.random.Generator().integers
- note that this differs from the max random seed accepted by pandas operations, which is 2 ** 32 - 1
- which is also the max capacity used for the automunge/postmunge global randomseed
- updated the read me to clarify the accepted range of entropy_seed
- also revisited the methods used to distinguish whether to pass entropy seeds to custom generators
- previously we had a try/except for each case, which was not ideal practice; now the try/except inspection is performed only once in the validation function, and its result is used as the basis everywhere else
- updated the sampling budget derivation for passing entropy seeds to transforms, to account for the edge case of binomial sampling where the flip_prob / test_flip_prob parameter is set to 1 (i.e. noise is injected into every entry), in which case no seeds are applied to the binomial sampling
- in other words, from an entropy seeding budget standpoint, it is actually cheaper to inject noise into every entry of a feature than into a binomially sampled subset of entries, although only by a ratio corresponding to the alternate value of flip_prob
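The seed range, the one-time seeding probe, and the flip_prob = 1 budget edge case described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Automunge's implementation: the helpers `sample_entropy_seeds`, `supports_seeding`, and `binomial_seed_budget` are hypothetical, the probe assumes a custom generator class takes its seed through a `seed` keyword (as NumPy bit generators such as `np.random.PCG64` do), and the budget function only models the binomial activation draw.

```python
import numpy as np

# Exclusive upper bound for internally sampled entropy seeds (0 : 2 ** 63)
SEED_HIGH = 2 ** 63
# Largest seed np.random.RandomState accepts, which is what pandas random_state uses
PANDAS_SEED_MAX = 2 ** 32 - 1


def sample_entropy_seeds(count, rng=None):
    """Hypothetical helper: draw `count` seeds uniformly from [0, 2**63) as int64."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.integers(0, SEED_HIGH, size=count, dtype=np.int64)


def supports_seeding(generator_cls):
    """Hypothetical probe: does a generator class accept a `seed` keyword?

    Intended to run a single time (e.g. during parameter validation), with the
    boolean result reused elsewhere instead of a try/except at every call site.
    """
    try:
        generator_cls(seed=0)
        return True
    except TypeError:
        return False


def binomial_seed_budget(n_rows, flip_prob):
    """Hypothetical count of seeds consumed by one binomial activation pass.

    When flip_prob == 1 every entry receives noise, so the activation draw is
    skipped and contributes no seeds.
    """
    return 0 if flip_prob == 1 else n_rows


seeds = sample_entropy_seeds(4, np.random.default_rng(0))
print(seeds.dtype)                          # int64
print(supports_seeding(np.random.PCG64))    # True
print(binomial_seed_budget(1000, 1))        # 0
```

Caching the probe result as a plain boolean keeps the seeding decision in one place, which mirrors the motivation given in the notes for moving the try/except into validation.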

7.69

- added some operations in the column processing loop and at its conclusion to defragment dataframes
- added a default regex specification in the hashing functions
- found a scenario mismatch between automunge and postmunge for the qttf transform; now, in both, when qttf isn't fit because all entries are non-numeric, the transform returns all 0
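The defragmentation item concerns a pandas behavior: adding many columns one at a time leaves a DataFrame split into many internal blocks, and recent pandas versions may emit a PerformanceWarning about fragmentation. A minimal sketch of two common remedies, assuming a numeric input column named `x`; the function names are hypothetical and this is not Automunge's own code.

```python
import numpy as np
import pandas as pd


def add_columns_then_copy(df, n_new):
    """Insert derived columns one at a time, then consolidate with a deep copy."""
    for i in range(n_new):
        df[f"derived_{i}"] = df["x"] * i  # each insertion may add a separate block
    return df.copy()  # a deep copy consolidates the internal blocks


def add_columns_via_concat(df, n_new):
    """Build all derived columns first, then attach them in a single concat."""
    new_cols = {f"derived_{i}": df["x"] * i for i in range(n_new)}
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


df = pd.DataFrame({"x": np.arange(5.0)})
print(add_columns_then_copy(df.copy(), 3).shape)  # (5, 4)
```

The concat route avoids fragmentation in the first place, while the copy route fits a loop that already inserts columns incrementally and only needs a cleanup step at its conclusion.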

7.68

- updated scikit-learn model initialization defaults so that only user-specified parameters are passed, other than the initialized random seed
- (the prior configuration was a bug channel whenever scikit-learn changed its parameter options, which was apparently the case with min_impurity_split)
- added support for a random_generator sourced from the QRAND library
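The initialization change can be sketched as passing only what the user asked for, plus the seed, and leaving every other default to the library. This is a hypothetical helper, not Automunge's code: `init_model` and `StandInEstimator` are illustrative names, the stand-in only mimics the scikit-learn constructor convention, and both the signature check for `random_state` and letting an explicit user value take precedence are assumptions.

```python
import inspect


def init_model(model_cls, user_params=None, random_state=None):
    """Instantiate a model with user-specified parameters plus the seed only.

    Library defaults are never restated here, so a parameter that a newer
    library version removes is only passed if the user explicitly set it.
    """
    kwargs = dict(user_params or {})
    accepted = inspect.signature(model_cls).parameters
    # Assumption: only pass the seed if the constructor accepts random_state,
    # and let an explicit user-specified value take precedence.
    if random_state is not None and "random_state" in accepted:
        kwargs.setdefault("random_state", random_state)
    return model_cls(**kwargs)


class StandInEstimator:
    """Stand-in mimicking a scikit-learn constructor (keyword args with defaults)."""

    def __init__(self, max_depth=None, min_samples_leaf=1, random_state=None):
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state


model = init_model(StandInEstimator, {"max_depth": 3}, random_state=7)
print(model.max_depth, model.min_samples_leaf, model.random_state)  # 3 1 7
```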

7.67

- fixed a bug originating in 7.63 that was missed in testing due to an error channel

7.66

- added entropy seeding support to DPhs, which performs multi-column hashing with DPod noise applied to each column
- applies the same set of seeds to each target column being shuffled
- entropy seeding is now supported across the full range of noise injection options

7.65

- reduced the number of sampled entries associated with the choice sampling applied in the categoric noise injections DPod and DPmc (on which most categoric noise transforms are built)
- now, instead of sampling a choice entry for every row, choice entries are only sampled for the number of binomial activations
- this results in a significant reduction in the entropy_seeds required to support the bulk_seeds case; for a dataframe of all categoric features, about a 47% reduction in the number of sampled entries
- found and fixed an edge case for DPod
- also updated ML infill's stochastic_impute_numeric option to default to laplace instead of gaussian noise based on some benchmarking experiments
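The change to choice sampling can be illustrated with a DPod-style categoric noise sketch: one Bernoulli draw per row decides which entries are noised, and replacement values are then drawn only for the activated rows rather than for every row. This is a minimal sketch under assumptions, not Automunge's implementation: `categoric_noise` is a hypothetical name, replacements are drawn uniformly from the feature's unique values, and the returned count simply tallies sampled entries.

```python
import numpy as np


def categoric_noise(values, flip_prob, rng):
    """Replace a binomially sampled subset of entries with random unique values.

    Returns the noised array and the number of entries sampled
    (one activation draw per row plus one choice draw per activation).
    """
    values = np.asarray(values)
    uniques = np.unique(values)
    # One Bernoulli draw per row decides which entries receive noise
    activations = rng.binomial(1, flip_prob, size=len(values)).astype(bool)
    n_active = int(activations.sum())
    # Choice entries are drawn only for activated rows, not for every row
    replacements = rng.choice(uniques, size=n_active)
    out = values.copy()
    out[activations] = replacements
    return out, len(values) + n_active


rng = np.random.default_rng(0)
vals = np.array(["a", "b", "c"] * 100)
noised, n_sampled = categoric_noise(vals, 0.03, rng)
print(n_sampled)  # roughly 300 + 0.03 * 300, versus 600 if a choice were drawn per row
```

With a small flip_prob, the total sampled entries approach one per row instead of two, which is the source of the savings when every draw must be backed by an entropy seed.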
