Automunge

Latest version: v8.33


Page 71 of 99

3.85

- remade the search function 'srch'
- now expected to be more efficient for unbounded sets
- by making use of a pandas.Series.str.contains method
- srch accepts parameters of 'search' as a list of search terms
- and 'case', a boolean signal for case sensitivity of search
- returns a new column for each search term containing activations corresponding to search terms identified as substring portions of categorical entries
- also updated the application of floatprecision data type adjustments for consistency between train set and labels
- (any data type conversions from floatprecision take place after processing functions and then again after infill)
- also a few small code comment cleanups
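
The search behavior described for 'srch' can be illustrated with a minimal sketch built on pandas.Series.str.contains. This is not the Automunge implementation: the helper name `srch_sketch` and the returned column naming are assumptions made for illustration; only the 'search' and 'case' parameters come from the notes above.

```python
import pandas as pd


def srch_sketch(df, column, search, case=True):
    """Hypothetical helper: add one 0/1 activation column per search term."""
    out = df.copy()
    # Cast to str so that non-string entries can be searched as well
    as_text = out[column].astype(str)
    for term in search:
        # regex=False treats each term as a literal substring
        hits = as_text.str.contains(term, case=case, regex=False)
        # The column suffix used here is an assumption, not Automunge's convention
        out[f"{column}_srch_{term}"] = hits.astype(int)
    return out


df = pd.DataFrame({"shape": ["circle", "Red Circle", "square", "semicircle"]})
result = srch_sketch(df, "shape", search=["circle", "red"], case=False)
```

With `case=False`, "Red Circle" activates both new columns, while "square" activates neither.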

3.84

- cleanup of string parsing for numeric entry support functions
- fixed edge case from casting partitioned string "inf" as float
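
The "inf" edge case arises because Python's float() accepts strings such as "inf" and "nan" as valid input, so a substring extracted from a longer entry can silently become a non-finite number. The notes do not say how the fix was made; the sketch below shows one possible guard, and the helper name `parse_finite` is hypothetical.

```python
import math


def parse_finite(text):
    """Return float(text) when it is a finite number, otherwise None."""
    try:
        value = float(text)
    except ValueError:
        # Not parseable as a number at all
        return None
    # float("inf") and float("nan") parse without error, so filter them out
    return value if math.isfinite(value) else None
```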

3.83

- realized there was a potential bug when passing 0/1 integer column identifiers
- from overlap with option to pass values as boolean
- (such as for parameters labels_column, trainID_column, testID_column)
- easy fix, just performed a global conversion of comparisons: {== True : is True, == False : is False, != True : is not True, != False : is not False}
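
The overlap comes from Python treating the integers 1 and 0 as equal to True and False under ==, while the identity test `is` only matches the boolean singletons. A small sketch, with hypothetical function names, shows the difference for a parameter that may hold either a boolean or an integer column identifier:

```python
def flag_by_equality(value):
    # Matches True, but also the integer 1 (and 1.0)
    return value == True  # noqa: E712


def flag_by_identity(value):
    # Matches only the boolean singleton True
    return value is True


column_id = 1  # an integer column identifier, not a boolean flag
equality_result = flag_by_equality(column_id)
identity_result = flag_by_identity(column_id)
```

Here the equality check misreads the column identifier as a boolean flag, while the identity check does not.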

3.82

- added new validation for user-passed transformdict
- checking for redundant entries across upstream or downstream primitives
- corrected printouts in a validation function for user-passed assigncat
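
A check of this kind might look like the sketch below. It assumes that "redundant" means the same category appearing under more than one primitive within the upstream group or within the downstream group, and it assumes the grouping of family tree primitives shown in the two tuples. The function name and return format are hypothetical and not taken from Automunge.

```python
# Assumed grouping of family tree primitives
UPSTREAM = ("parents", "siblings", "auntsuncles", "cousins")
DOWNSTREAM = ("children", "niecesnephews", "coworkers", "friends")


def find_redundant_entries(transformdict):
    """Hypothetical validation: report categories repeated within a group."""
    issues = []
    for root, family in transformdict.items():
        for group_name, group in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
            seen = {}  # category -> first primitive in which it appeared
            for primitive in group:
                for category in family.get(primitive, []):
                    if category in seen:
                        issues.append((root, group_name, category, seen[category], primitive))
                    else:
                        seen[category] = primitive
    return issues


example = {
    "newt": {
        "parents": ["bxcx"], "siblings": [], "auntsuncles": ["bxcx"], "cousins": ["NArw"],
        "children": [], "niecesnephews": [], "coworkers": [], "friends": [],
    }
}
issues = find_redundant_entries(example)
```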

3.81

- corrected the conversion from np.inf to np.nan from 3.80
- so to be clear, by default Automunge does not recognize np.inf values, they are treated as np.nan for purposes of infill
- added new 'retain' option for Binary parameter
- which can now be passed as one of {True, False, 'retain'}
- as prior, False does no conversion, True collectively applies a Binary transform to all boolean encoded sets as a replacement (such as for improved memory bandwidth)
- in the new 'retain' option, the returned collective Binary encoding acts as a supplement instead of a replacement to the columns serving as basis (such as a means of presenting boolean sets collectively in alternate configuration)
- I suspect this may prove a very useful option
- found and fixed edge case for spl9 and sp10 transforms previously missed in testing
- associated with string conversion of numerical entries to test data
- performed a walkthrough of postmunge(.) labelscolumn parameter
- found a code snippet that had been carried over from automunge(.) that was inconsistent with documentation, now struck
- moved the postmunge(.) initialization of empty label sets a little earlier for clarity
- added a marker to returned dictionary noting cases when df_train is a single column designated as labels, just in case that might come in handy
- new transformation category ucct
- in same neighborhood as ord3 which is an ordinal integer categorical encoding sorted by frequency
- ucct counts the occurrences of each category class in the train set and returns that count divided by the total row count in place of the category
- e.g. for a train set with 10 rows, if we have two cases of category "circle", those instances would be returned with the value 0.2
- and then test set conversion would return the same value, independent of test set row count
- ucct inspired by review of the ICLR paper "Weakly Supervised Clustering by Exploiting Unique Class Count" by Mustafa Umit Oner, Hwee Kuan Lee, Wing-Kin Sung
- additional new transform category Ucct, performs an uppercase character conversion prior to encoding (e.g. the strings usa, Usa, and USA treated consistently)
- followed by a downstream pair of offspring ucct and ord3
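
The ucct encoding described above can be sketched in a few lines of pandas. This is an illustration under stated assumptions rather than the library's code: the helper names `fit_ucct` and `apply_ucct` are hypothetical, and the handling of categories unseen in the train set (left as NaN here) is not specified in the notes.

```python
import pandas as pd


def fit_ucct(train):
    """Map each category to its train set count divided by train row count."""
    return (train.value_counts() / len(train)).to_dict()


def apply_ucct(series, mapping):
    """Replace categories with the train-derived ratios (unseen become NaN)."""
    return series.map(mapping)


# Train set with 10 rows, two of which are "circle"
train = pd.Series(["circle", "square", "circle"] + ["triangle"] * 7)
mapping = fit_ucct(train)

# Test set values come from the train mapping, regardless of test row count
test = pd.Series(["circle", "square"])
encoded_test = apply_ucct(test, mapping)

# Ucct variant: uppercase conversion first, so usa / Usa / USA share one class
names = pd.Series(["usa", "Usa", "USA", "fr"])
upper_mapping = fit_ucct(names.str.upper())
```

In this sketch "circle" maps to 0.2 in both the train and test sets, matching the 10-row example in the notes.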

3.80

- added new convention that np.inf values are recognized as NaN for infill by default
- by making use of use_inf_as_na setting in Pandas
- impacts the getNArows function used in automunge(.) and postmunge(.)
- as well as any isna calls in transformation functions
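
The effect of treating np.inf as NaN for infill can be sketched without the use_inf_as_na option, which has been deprecated in more recent pandas releases. The helper below is a hypothetical stand-in and not the getNArows function itself; it explicitly converts infinite values before calling isna.

```python
import numpy as np
import pandas as pd


def na_rows_sketch(series):
    """Boolean mask of rows needing infill, counting +/-inf as missing."""
    # Convert infinities to NaN first, then use the ordinary isna check
    return series.replace([np.inf, -np.inf], np.nan).isna()


values = pd.Series([1.0, np.inf, np.nan, -np.inf, 2.5])
mask = na_rows_sketch(values)
```

The resulting mask flags the two infinite entries along with the NaN entry, so all three would be subject to infill.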
