Automunge

Latest version: v8.33

7.40

- feature importance now supported with categoric consolidated labels
- with feature importance applied in both automunge(.) and postmunge(.) (see the sketch below)
- categoric consolidation was accommodated by adding some conditionals to the feature importance column_dict inspections
- and also, as a hack, some editing of the data structure passed to the feature importance model training
- such as to treat the consolidation transform as consistent with a transformation category
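For reference, a minimal sketch of invoking feature importance in both functions, assuming the v7-era calling signatures (the featureselection parameter in automunge(.) and the featureeval parameter in postmunge(.)); the dataframes and labels column here are illustrative:

```python
from Automunge import *
import pandas as pd

am = AutoMunge()

# illustrative data with a labels column
df_train = pd.DataFrame({'feature1': [0.1, 0.5, 0.9, 0.3],
                         'labels':   [0, 1, 1, 0]})
df_test = df_train.copy()

# feature importance in automunge(.) via featureselection
train, train_ID, labels, \
val, val_ID, val_labels, \
test, test_ID, test_labels, \
postprocess_dict = \
  am.automunge(df_train,
               labels_column='labels',
               featureselection=True)

# feature importance in postmunge(.) via featureeval
# (requires the test data to retain a populated labels column)
test, test_ID, test_labels, \
postreports_dict = \
  am.postmunge(postprocess_dict, df_test, featureeval=True)
```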

7.39

- found an edge case for multi transform label sets originating in 7.26
- (which was where we struck a redundant storage of normalization_dict in column_dict)
- surfacing when populating labelsencoding_dict
- resolved by accessing the targeted data structure entry through a different path
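For context, a sketch of the documented pattern for retrieving normalization parameters from postprocess_dict, with an illustrative returned column name:

```python
# given a postprocess_dict returned from a prior automunge(.) call,
# normalization parameters for a returned column are nested under
# column_dict ('labels_nmbr' is an illustrative returned column name)
returned_column = 'labels_nmbr'
normalization_dict = \
  postprocess_dict['column_dict'][returned_column]['normalization_dict'][returned_column]
```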

7.38

- found an edge case for a few of the numeric noise injection transforms
- associated with cases where a validation split was performed in automunge
- arising from an index mismatch between the noise samples and the target dataframe
- resolved by following the better practice of initializing the index with the dataframe initialization (see the sketch below)
- used this as a hint to perform a walkthrough of each dataframe initialization in the codebase
- identified a few other cases where we added an index initialization as a precautionary measure
- although I think the noise injection applications were the primary point of issue
- now resolved
- also found a validation function that was interfering with feature importance
- originating from calling printstatus from a dictionary prior to initialization
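The underlying pitfall is generic pandas behavior: arithmetic aligns on the index, so noise samples created with a default RangeIndex won't align with a dataframe whose index was disrupted by a validation split. A minimal sketch of the practice, with illustrative data:

```python
import numpy as np
import pandas as pd

# a dataframe whose index is no longer a default RangeIndex,
# e.g. after carving out a validation split
df = pd.DataFrame({'feature': [0.2, 0.5, 0.9]}, index=[7, 42, 3])

rng = np.random.default_rng()

# mismatch: a default RangeIndex (0, 1, 2) doesn't align with df.index,
# so adding this Series to the column would introduce NaN entries
noise_bad = pd.Series(rng.normal(0., 0.03, len(df)))

# better practice: initialize the index with the Series initialization
noise = pd.Series(rng.normal(0., 0.03, len(df)), index=df.index)

df['feature'] = df['feature'] + noise
```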

7.37

- an extension to noise injection transforms
- now the user has the option to designate distinct distribution parameters for train and test data
- basically all of the distribution parameters (mu, sigma, flip_prob, noisedistribution, weighted)
- now have a comparable test data mirror (test_mu, test_sigma, test_flip_prob, test_noisedistribution, test_weighted)
- which when unspecified default to matching the train data specifications (a sketch follows after this list)
- the only real complication was for the scaled noise bias offset rolled out in 7.36 (associated with DPmm and DPrt)
- now when the test distribution parameters differ from train, the scaled noise bias offset is fit to each of the train and test data separately
- (only relevant when the testnoise parameter is activated)
- generally we recommend only using these test distribution parameters in conjunction with activating the testnoise parameter
- as otherwise, when processing test data with noise in postmunge(.) by activating the postmunge traindata parameter, the data is treated as train data and these test parameters won't be in play
- the rationale for the new test-specific noise distribution parameters came from running some benchmarks and finding that the performance penalty from noise is slightly more pronounced for noise injected to test data used in inference than for noise injected to training data
- so we just wanted to allow some flexibility for experimentation by any power users (including myself)
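As an illustration, these parameters are passed per transform category and column through assignparam; a minimal sketch assuming the DPmm transform, v7-era calling signatures, and illustrative data (parameter names per the notes above):

```python
from Automunge import *
import pandas as pd

am = AutoMunge()

df_train = pd.DataFrame({'feature1': [0.1, 0.5, 0.9, 0.3]})

# assign the DPmm noise transform to a column and designate
# distinct train and test noise profiles
assigncat = {'DPmm': ['feature1']}
assignparam = {'DPmm': {'feature1': {
  'mu'         : 0.0,    # train noise mean
  'sigma'      : 0.03,   # train noise scale
  'test_mu'    : 0.0,    # test noise mean (defaults to mu when unspecified)
  'test_sigma' : 0.015,  # test noise scale (defaults to sigma when unspecified)
  'testnoise'  : True,   # activate noise injection for test data
}}}

train, train_ID, labels, \
val, val_ID, val_labels, \
test, test_ID, test_labels, \
postprocess_dict = \
  am.automunge(df_train,
               assigncat=assigncat,
               assignparam=assignparam)
```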

7.36

- new convention for noise injection transforms DPmm and DPrt
- DPmm and DPrt are for noise injection to numeric sets with a fixed range of entries, specifically the mnmx and retn normalizations
- DPmm and DPrt maintain a consistent range of values by scaling the noise distribution as a function of feature entry properties
- the issue was that for imbalanced feature distributions, this scaling had the potential to introduce bias, resulting in noise with a non-zero mean
- the new convention is that the noise mu, aka the noise mean, is adjusted from the specified value (defaulting to 0) to better approximate scaled noise with zero mean
- activated by the DPmm and DPrt parameter noise_scaling_bias_offset, which accepts a boolean defaulting to True
- the original mu and the final mu are returned in the normalization_dict as mu_orig and mu
- basically we do this by sampling noise for all entries, scaling it, and measuring the mean of the scaled noise; then sampling again with the measured mean applied as an offset to the noise mean and measuring the mean of that scaled noise; and then using the two mu values and their resulting scaled noise means to linearly interpolate to a final mu more closely approximating scaled noise with a mean of 0 (a sketch follows after this list)
- *Note that we recommend deactivating the noise_scaling_bias_offset parameter in conjunction with the abs or negabs noisedistribution scenarios (i.e. all positive or all negative noise), since otherwise the sampled mean will be shifted toward zero, counteracting the intended strictly positive or strictly negative noise.
- also corrected the location of a precautionary adjinfill application in DPmm
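A rough sketch of the interpolation described above, with hypothetical function and variable names (not the library's internal implementation):

```python
import numpy as np

def fit_scaled_noise_mu(scale, mu=0.0, sigma=0.03, rng=None):
    """Hypothetical sketch: find a mu whose sampled noise, after
    per-entry scaling, has a mean of approximately zero. `scale` is
    a vector of per-entry scaling factors (as derived from feature
    entry properties for transforms like DPmm / DPrt)."""
    rng = rng or np.random.default_rng()
    n = len(scale)
    # first pass: sample at the specified mu and measure the scaled mean
    mean1 = np.mean(scale * rng.normal(mu, sigma, n))
    # second pass: apply the measured mean as an offset to mu and remeasure
    mu2 = mu - mean1
    mean2 = np.mean(scale * rng.normal(mu2, sigma, n))
    if mu2 == mu or mean2 == mean1:
        return mu2  # degenerate case: nothing to interpolate on
    # linearly interpolate between (mu, mean1) and (mu2, mean2)
    # to the mu where the scaled noise mean crosses zero
    slope = (mean2 - mean1) / (mu2 - mu)
    return mu - mean1 / slope

# e.g. with scaling factors skewed by an imbalanced feature distribution
mu_final = fit_scaled_noise_mu(scale=np.array([0.9, 0.8, 0.1, 0.05]))
```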

7.35

- update to drift reporting: derived column drift stat reporting now includes derived columns that were subject to replacement
- added an entry to the postprocess_dict['origcolumn'] data structure as allderivedlist, which is similar to columnkeylist but also includes derived columns that were subject to replacement (see the sketch below)
- small tweak to the support function __list_sorting, which sorts a set to match the order of a list: added support for a list as the sorting target (previously the target was assumed to be a set)
- performed an audit of Function Blocks to identify cases where the function block string wasn't included in the code; found and added a few missing instances
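For reference, a sketch of inspecting the new entry, given a postprocess_dict returned from a prior automunge(.) call (the source column name is illustrative):

```python
# entries under origcolumn are keyed by source column header
# ('column1' is an illustrative source column name)
entry = postprocess_dict['origcolumn']['column1']

returned = entry['columnkeylist']       # derived columns in the returned set
all_derived = entry['allderivedlist']   # also includes derived columns subject to replacement
```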
