Automunge

Latest version: v8.33

Page 44 of 99

5.56

- found and fixed a few small bugs in catboost wrapper

5.55

- found a somewhat unnecessary for loop that was part of the central flow
- replaced it with an extracted data structure, negating the need for the for loop
- should make everything run a little quicker
- especially with high density category assignments
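
As a rough illustration of the kind of change described for 5.55 (this is not the actual Automunge code), the sketch below swaps a per-column scan over a category-to-columns mapping for an inverted dictionary that is built once. The names `assignments`, `find_category_loop`, and `invert_assignments` are hypothetical and chosen only for this example.

```python
# Hypothetical mapping of transformation category -> assigned columns.
assignments = {"mnmx": ["age", "income"], "ord3": ["city"]}


def find_category_loop(assignments, column):
    """Scan every category's column list until the column is found."""
    for category, columns in assignments.items():
        if column in columns:
            return category
    return None


def invert_assignments(assignments):
    """Build a column -> category lookup once, so later lookups need no loop."""
    return {
        column: category
        for category, columns in assignments.items()
        for column in columns
    }


column_to_category = invert_assignments(assignments)
```

With many assigned columns, the one-time inversion lets each later lookup be a single dictionary access instead of a scan over every category.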

5.54

- a comprehensive audit of use of .values
- which was used in several places to convert dataframes to arrays
- in some cases more appropriately than others
- so was able to strike a significant number of instances
- and replaced the remainder with .to_numpy()
- now much more in line with pandas recommended practice
- also updated logic tests for conditional data types to be a little more precise
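
For reference, a minimal sketch of the two conversion styles mentioned in 5.54, using a small made-up dataframe (the column names and values are illustrative only). Both calls return the same array here; `.to_numpy()` additionally accepts `dtype` and `copy` arguments.

```python
import numpy as np
import pandas as pd

# Made-up example data: one integer column and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Older attribute-style conversion.
arr_values = df.values

# Method-style conversion recommended by the pandas documentation.
arr_numpy = df.to_numpy()

# to_numpy() can also control the output dtype and force a copy.
arr_float32 = df.to_numpy(dtype="float32")
arr_copy = df.to_numpy(copy=True)
```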

5.53

- a slight cleanup to the featureimportance report returned from automunge
- now the returned featureimportance report includes both the sorted results as well as the raw data serving as basis
- (where previously the sorted results were only saved in postprocess_dict, which was admittedly not very user friendly)
- so yeah feature importance results now all aggregated in single location for ease of reference

5.52

- new root transformation category or23
- or23 is inspired by the experiments conducted in String Theory paper
- and is an alternative to or19 that makes use of sp19 instead of chains of spl2
- with an upstream UPCS and sp19 supplemented by nmcm and ord3

5.51

- was doing some research and apparently pandas to_numpy() is newer / more consistent approach to converting pandas to numpy as opposed to .values
- so went ahead and updated the treatment to returned sets to be based on .to_numpy()
- this is relevant to cases where pandasoutput=False, which has been the default
- and yeah something I've been mulling over for a very long time is whether defaulting to returning numpy arrays is best approach
- scikit likes numpy arrays, but as far as I can tell almost all other frameworks prefer pandas dataframes for tabular
- so made the executive decision to change default for pandasoutput parameter from False to True
- which means returned sets are now pandas dataframes by default
- and to otherwise return numpy arrays can designate pandasoutput=False
- also updated hash transform to make returned data types conditional based on size of encoding space
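
The last item in 5.51 can be pictured with a small sketch, assuming the idea is to pick the narrowest integer type that can hold every code in the encoding space. The helpers `smallest_uint_dtype` and `hash_encode` are hypothetical, and the use of MD5 is an assumption for illustration; neither is taken from the Automunge source.

```python
import hashlib

import numpy as np


def smallest_uint_dtype(space_size):
    """Return the narrowest unsigned dtype that can hold codes 0..space_size-1."""
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if space_size - 1 <= np.iinfo(dtype).max:
            return dtype
    raise ValueError("encoding space too large for a 64-bit integer")


def hash_encode(values, space_size):
    """Hash each value into [0, space_size) and store it in the narrowest dtype."""
    dtype = smallest_uint_dtype(space_size)
    codes = [
        # MD5 is used only as a stable hash for this sketch (an assumption).
        int(hashlib.md5(str(value).encode("utf-8")).hexdigest(), 16) % space_size
        for value in values
    ]
    return np.array(codes, dtype=dtype)


small = hash_encode(["red", "green", "blue"], space_size=200)
large = hash_encode(["red", "green", "blue"], space_size=70000)
```

A smaller encoding space yields a narrower dtype, which reduces the memory footprint of the returned column.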
