Automunge

Latest version: v8.33

8.0

- updated entropy_seeding range from (2 ** 63) to (2 ** 31 - 1) to align with int32
- updated randomseed range from (2 ** 32 - 1) to (2 ** 31 - 1) to align with int32
- found and fixed a small equality operator misframing that was interfering with xgboost label prep
- also updated xgboost classification so that it doesn't train a model for the nunique == 1 case
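The new seed ranges can be checked against NumPy's int32 limits. This is a minimal sketch that assumes seeds are drawn uniformly. `draw_int32_seeds` is a hypothetical helper, not an Automunge function.

```python
import numpy as np

# Largest value representable by a signed 32-bit integer: 2 ** 31 - 1
INT32_MAX = np.iinfo(np.int32).max

def draw_int32_seeds(count, rng=None):
    """Hypothetical helper: draw `count` seeds in the range [0, 2 ** 31 - 1]."""
    rng = np.random.default_rng() if rng is None else rng
    # Generator.integers excludes `high` by default, so add 1 to include INT32_MAX
    return rng.integers(0, INT32_MAX + 1, size=count, dtype=np.int64)

seeds = draw_int32_seeds(5, np.random.default_rng(0))
# Every seed can be cast to int32 without overflow
print(seeds.astype(np.int32))
```

Seeds at or below 2 ** 31 - 1 survive a cast to int32 unchanged. The prior upper bounds of 2 ** 63 and 2 ** 32 - 1 did not guarantee this.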

7.99

- similar update to 7.98
- identified another potential overlap in naming space, arising from a function being used to target two different mutable containers of similar form
- now the encapsulations have a distinct naming convention
- referring to the difference between processdict (user passed parameter), process_dict (internal version after consolidation between user passed and internal library), and now processdict_, the internal name used by support functions that can receive either of these two
- the use of these support functions (associated with functionpointer specification) for both versions was a recent addition, which is the reason for the occurrence
- we made note of this underscore naming convention in the essay Specification of Derivations with Automunge; in hindsight it is possibly an example of naming scheme technical debt, but we use a similar external/internal distinction for a few parameters, so at least it is consistent (e.g. transformdict/transform_dict, assignparam/assign_param)

7.98

- found a possible memory sharing issue associated with an attempted function encapsulation renaming, in which a mutable container was renamed so as to overlap with another mutable container
- now reverted the encapsulation to the original expected external naming scheme
- trying to keep a clean name space
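Memory sharing of this kind happens because binding a mutable container to a new name, including a function parameter, does not copy it. This is a minimal sketch with hypothetical names (`rename_and_update`, `internal_view`). It does not reproduce Automunge's actual internals.

```python
def rename_and_update(postprocess_dict):
    # Binding to a new name creates an alias, not a copy
    internal_view = postprocess_dict
    internal_view["added_key"] = True
    return internal_view is postprocess_dict

external = {"original_key": 1}
shared = rename_and_update(external)
print(shared)                   # True
print("added_key" in external)  # True: the caller's dict was changed too
```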

7.97

- in python, dictionaries and lists are mutable object containers
- such that setting a = b means a is b
- we have tried to circumvent that by using deepcopy operation on dictionaries in a few places
- but recently identified an edge case for deepcopy in the presence of some types of non-native objects
- wanted to avoid this edge case coming up with user defined transformation functions
- for cases of storing non-native objects in normalization_dict
- so new __autocopy function replaces use of deepcopy in library
- with equivalent functionality and edge case support
- struck deepcopy import
- updated a few PCA support functions for initializing PCA models to accommodate an error channel if scikit-learn deprecates one of their parameter settings and we don't notice
- we had run into this on random forest a little while back; lesson learned to let the library manage the defaults
- moved PCA imports into associated support functions instead of making them global
- this makes sense since they are only called once, and it results in slightly lower overhead for postmunge when not applying PCA
- also moved import for QuantileTransformer into qttf transformation functions
- also moved import for USFederalHolidayCalendar into hldy transformation function
- a slight revision to automunge initialization of randomseed for pandas seeding
- now it may be initialized twice: first for some neutral application, and second, if a sampling_type other than the default was specified and the user didn't pass any entropy_seeds, again after initializing entropy seeds to allow for seeding initialization
- added some code comments here and there
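One way to replace deepcopy while tolerating non-native objects is to rebuild only the native containers and pass everything else through by reference. This is a hypothetical sketch of that idea. It is not the library's __autocopy function, which these notes do not show, and the real function may handle more cases.

```python
from typing import Any

def autocopy(obj: Any) -> Any:
    """Hypothetical sketch: copy nested dicts and lists, share everything else.

    Native containers are rebuilt so that mutating the copy leaves the original
    untouched. Any other object (e.g. a fitted model stored in a
    normalization_dict) is passed through by reference instead of deep copied.
    """
    if isinstance(obj, dict):
        return {key: autocopy(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [autocopy(item) for item in obj]
    # Non-container or non-native object: reuse the same instance
    return obj

class FittedModel:
    """Stand-in for a non-native object such as a trained model."""

original = {"normalization_dict": {"mean": [0.5], "model": FittedModel()}}
copied = autocopy(original)
copied["normalization_dict"]["mean"].append(1.0)

print(original["normalization_dict"]["mean"])  # [0.5]
print(copied["normalization_dict"]["model"] is original["normalization_dict"]["model"])  # True
```

The trade-off is that shared non-native objects are not isolated. Mutating the stored model through the copy would also affect the original, so this approach suits objects that are only read after fitting.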

7.96

- revised the cat_type parameter implementation rolled out in 7.90
- now avoids a redundant dtype conversion
- was wondering if it might have been a channel for inconsistency with exotic data types I might not be aware of
- fixed a potential bug channel in ppd_append associated with duplicate_rows parameter
- identified a (remote) error channel from code review associated with incompatibility of postmunge dupl_rows options and automunge ppd_append option
- basically, because ppd_append results in preparing sets of features separately, the dupl_rows option may not consistently consolidate duplicate rows, resulting in a halt at the concat operation
- now when this is identified, a printout is returned and the postmunge validation result is logged as dupl_rows_ppd_append_postmunge_valresult
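Consolidating duplicates separately per feature set can produce inconsistent row counts because each subset may contain different duplicates. This is a minimal pandas sketch. It assumes that dupl_rows consolidation behaves like `drop_duplicates` applied to each separately prepared set, which the notes do not confirm. The data and the `consistent_row_counts` check are hypothetical.

```python
import pandas as pd

# Made-up data: which rows count as duplicates depends on the columns considered
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

# Rows 0 and 1 are duplicates when looking only at column "a" ...
part_a = df[["a"]].drop_duplicates()
# ... but every row is distinct when looking only at column "b"
part_b = df[["b"]].drop_duplicates()

def consistent_row_counts(*frames):
    """Hypothetical check: separately prepared sets must share one row count."""
    return len({len(frame) for frame in frames}) == 1

print(len(part_a), len(part_b))               # 2 3
print(consistent_row_counts(part_a, part_b))  # False

# A column-wise concat aligns on the index instead of pairing rows,
# so the row dropped from part_a reappears as a missing value
combined = pd.concat([part_a, part_b], axis=1)
print(combined["a"].isna().sum())             # 1

# Consolidating duplicates once, across all columns, keeps the sets aligned
print(len(df.drop_duplicates()))              # 3
```

pandas itself does not raise here; it silently fills gaps. A library-side row count validation is therefore what would surface the mismatch.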

7.95

- updated a few default settings
- new default for printstatus parameter is printstatus = 'summary'
- this option is a little less obnoxious than the True case
- new scenario supported for pandasoutput parameter as pandasoutput = 'dataframe'
- also making this the new default
- pandasoutput = 'dataframe' differs from prior default of pandasoutput = True in that single column label sets are now retained as dataframes instead of being converted to series
- I've seen different conventions in different libraries on whether single column pandas labels are preferred as Series or DataFrame, so decided to select the default that is easiest to remember
- the new default of pandasoutput = 'dataframe' aligns with case that all returned sets are dataframes, even single column label sets
- apologies if anyone gets a bug out of this update; trying to err on the side of simplicity, should be easy to fix
- also found and fixed a variable naming snafu interfering with validation of traindata parameter
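The two pandasoutput conventions for a single column label set correspond to the two plain pandas selection styles. This is a minimal sketch that does not call Automunge itself. The `target` column name is hypothetical.

```python
import pandas as pd

labels = pd.DataFrame({"target": [0, 1, 1]})

# pandasoutput = 'dataframe' style: a single column label set stays a DataFrame
as_frame = labels[["target"]]
# prior pandasoutput = True style: a single column label set becomes a Series
as_series = labels["target"]

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series

# Downstream code that expects a Series can convert explicitly
restored = as_frame.squeeze(axis=1)
print(type(restored).__name__)   # Series
```

Because `squeeze(axis=1)` recovers the Series form, adapting code written for the old default usually takes a single line.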
