Automunge

Latest version: v8.33

7.52

- based on findings from the essay "Noise Injection Benchmarks", updated a few default parameter settings for noise distributions targeting test data

Numeric
DPnb

7.51

- DP10 and DPoh reframed to make use of DPmc for noise injection instead of DPod, with comparable functionality
- this results in an updated convention for passing assignparam parameters to these transforms: parameter assignments for noise injection can now be passed directly to DP10 and DPoh
- the purpose was a material latency benefit for these transforms, on the order of 20%
- the primary source of the benefit is fewer tiers of transforms: the prior configuration applied 3 tiers, and since DPmc accepts multi-column input, only 2 are now needed
- comparable parameter support, with the added benefit that DP10 and DPoh now have access to the swap_noise parameter
- small cleanup to 7.50 update to resolve a printout
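
As a rough illustration of the updated convention, the parameter can be assigned directly to the DP10 category rather than to a downstream transform. This is a sketch only: the column name 'column1' is hypothetical, and the nesting follows the category / column / parameter layout that assignparam uses.

```python
# Hypothetical column name, for illustration only.
target_column = 'column1'

# Assign the DP10 root category to the column.
assigncat = {'DP10': [target_column]}

# Noise parameters are now passed directly to DP10 (previously they
# targeted a downstream noise transform). swap_noise is the boolean
# parameter described in the 7.49 notes.
assignparam = {'DP10': {target_column: {'swap_noise': True}}}

# These dictionaries would then be passed to an automunge call, e.g.
# am.automunge(df_train, assigncat=assigncat, assignparam=assignparam)
# (left commented out; df_train is an assumed training dataframe).
```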

7.50

- found a small snafu with default initializations for ML_cmnd
- cases of setting defaults as False were not being populated as intended
- which, it turns out, was impacting the xgboost autoML_type in cases of CPU training
- due to a missing entry for ML_cmnd['xgboost_gpu_id']
- which didn't show up in our testing since we had externally initialized the default
- xgboost cpu training now working as intended

7.49

- new "swap noise" noise injection option
- swap noise can be applied to both categoric and continuous numeric features
- the implementation can be applied downstream of a single or multi column encoding
- swap noise, instead of sampling from a distribution, replaces noise targets with a random draw from the rows in a feature
- implemented by new swap_noise assignparam option in DPmc transform
- swap_noise accepts boolean, defaulting to False
- DPmc otherwise performs a categoric injection in a manner similar to other categoric injections, with the exception that DPmc is neutral to whether it is applied downstream of a single or multi-column transform
- note that when swap_noise is activated, the weighted and test_weighted DPmc options are reset to False
- new root categories DPns and DP1s, which are for numeric z-score normalization with swap noise and categoric 1010 binarization with swap noise
- note that DPmc can also be applied downstream of concurrent MLinfilltypes in the context of the mlti transform, to inject swap noise into each column of a multi-column set individually
- the swap noise option was inspired by seeing a description in the paper "SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning" by Talip Ucar, Ehsan Hajiramezanali, Lindsay Edwards
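
The swap noise idea described above can be sketched in a few lines of NumPy. This is not the DPmc implementation, just a minimal illustration under stated assumptions: the function name and the seed argument are hypothetical, flip_prob is assumed to be the probability that a row is a noise target, and for a multi-column encoding the sketch swaps whole rows so that each encoded row stays valid.

```python
import numpy as np

def swap_noise(feature, flip_prob=0.03, seed=None):
    """Replace a random subset of rows with values drawn from other rows.

    feature: 1D array (single column) or 2D array (multi-column encoding).
    flip_prob: assumed probability that a given row is a noise target.
    """
    rng = np.random.default_rng(seed)
    feature = np.asarray(feature)
    n_rows = feature.shape[0]

    # Select which rows receive noise.
    targets = rng.random(n_rows) < flip_prob

    # For every row, pick a donor row uniformly at random from the feature.
    donors = rng.integers(0, n_rows, size=n_rows)

    # Copy donor values into the targeted rows; other rows are unchanged.
    noised = feature.copy()
    noised[targets] = feature[donors[targets]]
    return noised
```

Because replacement values are drawn from the feature itself, the injected entries follow the feature's own empirical distribution, so no noise distribution needs to be specified.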

7.48

- new option for noise injection transforms
- now the noise distribution parameters can be passed as scipy.stats distributions
- such that a unique noise distribution parameter is sampled with each automunge and postmunge call
- this is primarily intended as a resource for data augmentation via the noise_augment option
- as we suspect there may be benefit to variations in noise profiles across duplicates
- this is somewhat of a hypothesis for now
- fixed test_flip_prob normalization_dict entry for DPbn
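
To illustrate the idea of passing a distribution in place of a fixed parameter, here is a minimal sketch, assuming a hypothetical helper resolve_noise_param that samples a fresh value whenever it receives a scipy.stats frozen distribution. It is not Automunge's internal logic, and the sigma range is an arbitrary example.

```python
import numpy as np
from scipy import stats

def resolve_noise_param(value, rng):
    """Return a concrete parameter value.

    If `value` is a scipy.stats frozen distribution (it exposes .rvs),
    draw one sample; otherwise return the value unchanged.
    """
    if hasattr(value, "rvs"):
        return float(value.rvs(random_state=rng))
    return value

# Example specification: sigma drawn uniformly from [0.01, 0.05].
sigma_spec = stats.uniform(loc=0.01, scale=0.04)

rng = np.random.default_rng(0)

# Each simulated call resolves the specification to a different sigma,
# so duplicated rows in an augmented set could see varied noise profiles.
sigma_call_1 = resolve_noise_param(sigma_spec, rng)
sigma_call_2 = resolve_noise_param(sigma_spec, rng)
```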

7.47

- due to an apparent process flaw, 7.46 was not uploaded to PyPI; this update includes the 7.46 rollouts
- new 'angle_bits' parameter accepted for the qbt1 family of transforms, which encodes activations as pi instead of 1
- more detail on this may be provided in a blog post, pending
- updated string naming for one of the index column overlap scenarios, for consistency
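
The effect of angle_bits can be pictured with a simplified binary encoder. This sketch handles only non-negative integers and is not the qbt1 implementation (which is not reproduced here); the function name and the n_bits argument are assumptions for illustration.

```python
import math

def encode_bits(value, n_bits=4, angle_bits=False):
    """Encode a non-negative integer as n_bits activations, most significant bit first.

    With angle_bits=True, an active bit is represented as pi instead of 1.
    """
    on = math.pi if angle_bits else 1
    return [on if (value >> shift) & 1 else 0
            for shift in range(n_bits - 1, -1, -1)]

plain = encode_bits(5)                      # 5 is 0101 in binary
angled = encode_bits(5, angle_bits=True)    # same pattern, active bits set to pi
```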
