Automunge

Latest version: v8.33

Page 44 of 99

5.56

- found and fixed a few small bugs in catboost wrapper

5.55

- found a somewhat unnecessary for loop that was part of the central flow
- replaced it with an extracted data structure, negating the need for the for loop
- should make everything run a little quicker
- especially with high density category assignments
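
As a rough illustration of the kind of change described in 5.55 (not Automunge's actual internals), the sketch below replaces a repeated scan over category assignments with a lookup table built once. The names `assignments`, `lookup_with_loop`, and `build_inverse` are hypothetical, and the sketch assumes each column is assigned to a single category.

```python
# Hypothetical illustration only: names and data are placeholders,
# not taken from the Automunge code base.

def lookup_with_loop(assignments, column):
    """Scan every category's column list until the column is found."""
    for category, columns in assignments.items():
        if column in columns:
            return category
    return None


def build_inverse(assignments):
    """Extract a column -> category dict in a single pass."""
    return {
        column: category
        for category, columns in assignments.items()
        for column in columns
    }


assignments = {"nmbr": ["age", "income"], "ord3": ["city"], "bnry": ["member"]}

# Built once up front; each later lookup is a constant-time dict access
# instead of another pass over every category's list.
inverse = build_inverse(assignments)
```

The benefit grows with the number of columns listed per category, which is consistent with the note about high density category assignments.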

5.54

- a comprehensive audit of use of .values
- which was used in several places to convert dataframes to arrays
- in some cases it turns out more appropriately than others
- so yeah was able to strike a significant number of instances
- and replaced remainder with .to_numpy()
- now much more in line with pandas recommended practice
- also updated logic tests for conditional data types to be a little more precise
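
For reference, a minimal sketch of the two conversion styles mentioned in 5.54, using only pandas and NumPy. The dataframe here is made-up example data, not anything from Automunge.

```python
import numpy as np
import pandas as pd

# Made-up example data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Older attribute-style conversion.
old_style = df.values

# Method-style conversion recommended in the pandas documentation.
new_style = df.to_numpy()

# to_numpy() also accepts explicit dtype and copy arguments.
as_float32 = df.to_numpy(dtype="float32", copy=True)
```

For a plain numeric frame like this one both styles give the same array; the method form mainly makes the dtype and copy behavior explicit.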

5.53

- a slight cleanup to the featureimportance report returned from automunge
- now the returned featureimportance report includes both the sorted results as well as the raw data serving as basis
- (where previously the sorted results were only saved in postprocess_dict, which was admittedly not very user friendly)
- so yeah feature importance results now all aggregated in single location for ease of reference

5.52

- new root transformation category or23
- or23 is inspired by the experiments conducted in String Theory paper
- and is an alternative to or19 that makes use of sp19 instead of chains of spl2
- with an upstream UPCS and sp19 supplemented by nmcm and ord3

5.51

- was doing some research and apparently pandas to_numpy() is newer / more consistent approach to converting pandas to numpy as opposed to .values
- so went ahead and updated the treatment to returned sets to be based on .to_numpy()
- this is relevant to cases where pandasoutput=False, which has been the default
- and yeah something I've been mulling over for a very long time is whether defaulting to returning numpy arrays is best approach
- scikit likes numpy arrays, but as far as I can tell almost all other frameworks prefer pandas dataframes for tabular
- so made the executive decision to change default for pandasoutput parameter from False to True
- which means returned sets are now pandas dataframes by default
- and to otherwise return numpy arrays can designate pandasoutput=False
- also updated hash transform to make returned data types conditional based on size of encoding space
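
The idea of a data type that depends on the size of the encoding space can be sketched as below. This is an assumption about the general technique, not the library's hash transform: `smallest_uint_dtype` and `hash_encode` are hypothetical helpers, and the use of SHA-256 is a stand-in chosen only to make the example deterministic.

```python
import hashlib

import numpy as np


def smallest_uint_dtype(encoding_space):
    """Pick the narrowest unsigned dtype able to hold 0..encoding_space-1."""
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if encoding_space - 1 <= np.iinfo(dtype).max:
            return dtype
    raise ValueError("encoding space too large for 64-bit integers")


def hash_encode(values, encoding_space):
    """Map each string to an integer in [0, encoding_space) via a stable hash."""
    codes = [
        int(hashlib.sha256(v.encode("utf-8")).hexdigest(), 16) % encoding_space
        for v in values
    ]
    # The returned dtype is conditional on the size of the encoding space.
    return np.array(codes, dtype=smallest_uint_dtype(encoding_space))


small = hash_encode(["red", "green", "blue"], encoding_space=100)
large = hash_encode(["red", "green", "blue"], encoding_space=1000)
```

With a space of 100 the codes fit in `uint8`, while a space of 1000 needs `uint16`, so smaller vocabularies get a smaller memory footprint.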
