Automunge

Latest version: v8.33

3.85

- remade the search function 'srch'
- now expected to be more efficient for unbounded sets
- by making use of a pandas.Series.str.contains method
- srch accepts parameters of 'search' as a list of search terms
- and 'case', a boolean signal for case sensitivity of search
- returns a new column for each search term containing activations corresponding to search terms identified as substring portions of categorical entries
- also updated the application of floatprecision data type adjustments for consistency between train set and labels
- (any data type conversions from floatprecision take place after processing functions and then again after infill)
- also a few small code comment cleanups
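
A minimal sketch of the idea behind the remade search, assuming only what the notes above state: a list of search terms, a 'case' boolean, and one activation column per term built with pandas.Series.str.contains. The helper name srch_sketch and the returned column naming are hypothetical and are not Automunge's actual conventions.

```python
import pandas as pd

def srch_sketch(series, search, case=True):
    """Return one 0/1 activation column per search term (hypothetical helper)."""
    # cast to str so numeric or missing entries do not break the .str accessor
    as_str = series.astype(str)
    out = pd.DataFrame(index=series.index)
    for term in search:
        # regex=False treats each term as a literal substring
        hits = as_str.str.contains(term, case=case, regex=False)
        # column naming here is illustrative only
        out[f"{series.name}_srch_{term}"] = hits.astype("int8")
    return out

df = pd.DataFrame({"shape": ["circle", "Circle", "square", "semicircle"]})
activations = srch_sketch(df["shape"], search=["circle", "sq"], case=False)
case_sensitive = srch_sketch(df["shape"], search=["circle"], case=True)
```

Because str.contains is vectorized over the whole column, the cost grows with the number of search terms rather than with the number of unique entries, which is presumably why it suits unbounded sets.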

3.84

- cleanup of string parsing for numeric entry support functions
- fixed edge case from casting partitioned string "inf" as float
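
The "inf" edge case is easy to reproduce in plain Python, since float("inf") succeeds rather than raising. The sketch below shows one way a string-parsing helper could guard against it; parse_numeric is a hypothetical name and this is not necessarily how Automunge resolved the issue.

```python
import math

def parse_numeric(token):
    """Return float(token) for finite numeric strings, else None (hypothetical helper)."""
    try:
        value = float(token)
    except ValueError:
        # not parseable as a number at all
        return None
    # float("inf"), float("-inf") and float("nan") all parse without error,
    # so non-finite results are filtered out explicitly
    return value if math.isfinite(value) else None

results = [parse_numeric(t) for t in ["1.5", "inf", "nan", "abc"]]
```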

3.83

- realized there was potential bug when passing 0/1 integer column identifiers
- from overlap with option to pass values as boolean
- (such as for parameters labels_column, trainID_column, testID_column)
- easy fix: globally replaced equality comparisons with identity comparisons, i.e. {== True : is True, == False : is False, != True : is not True, != False : is not False}
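
The overlap arises because Python treats 0 and 1 as equal to False and True. A short sketch of the difference between equality and identity checks follows; the describe helper and its return strings are hypothetical and only illustrate the pattern.

```python
def describe(param):
    """Classify a parameter that may be a boolean or a column identifier."""
    # identity checks distinguish the booleans from the integers 0 and 1
    if param is False:
        return "disabled"
    if param is True:
        return "default"
    return f"column {param!r}"

column_id = 0
equal_to_false = (column_id == False)      # True: 0 compares equal to False
identical_to_false = (column_id is False)  # False: 0 is not the False object

labels = [describe(p) for p in (False, True, 0, 1, "target")]
```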

3.82

- added new validation for user-passed transformdict
- checking for redundant entries across upstream or downstream primitives
- corrected printouts in a validation function for user-passed assigncat
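
A sketch of what such a redundancy check could look like, assuming a family tree is a dict mapping primitive names to lists of transformation categories. The grouping of primitives into upstream and downstream follows my reading of the Automunge documentation, and find_redundant is a hypothetical helper, not the library's validation function.

```python
from collections import Counter

# assumed grouping of family tree primitives
UPSTREAM = ("parents", "siblings", "auntsuncles", "cousins")
DOWNSTREAM = ("children", "niecesnephews", "coworkers", "friends")

def find_redundant(family_tree):
    """Report categories listed more than once within each primitive group."""
    report = {}
    for label, primitives in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
        counts = Counter(
            category
            for primitive in primitives
            for category in family_tree.get(primitive, [])
        )
        report[label] = sorted(c for c, n in counts.items() if n > 1)
    return report

# illustrative family tree with 'nmbr' entered under two upstream primitives
tree = {
    "parents": ["nmbr"], "siblings": [], "auntsuncles": ["nmbr"], "cousins": ["NArw"],
    "children": [], "niecesnephews": [], "coworkers": ["bins"], "friends": [],
}
result = find_redundant(tree)
```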

3.81

- corrected the conversion from np.inf to np.nan from 3.80
- so to be clear, by default Automunge does not recognize np.inf values, they are treated as np.nan for purposes of infill
- added new 'retain' option for Binary parameter
- which can now be passed as one of {True, False, 'retain'}
- as prior, False does no conversion, True collectively applies a Binary transform to all boolean encoded sets as a replacement (such as for improved memory bandwidth)
- in the new 'retain' option, the returned collective Binary encoding acts as a supplement instead of a replacement to the columns serving as basis (such as a means of presenting boolean sets collectively in alternate configuration)
- I suspect this may prove a very useful option
- found and fixed edge case for spl9 and sp10 transforms previously missed in testing
- associated with string conversion of numerical entries to test data
- performed a walkthrough of postmunge(.) labelscolumn parameter
- found a code snippet that had been carried over from automunge(.) that was inconsistent with documentation, now struck
- moved the postmunge(.) initialization of empty label sets a little earlier for clarity
- added a marker to returned dictionary noting cases when df_train is a single column designated as labels, just in case that might come in handy
- new transformation category ucct
- in same neighborhood as ord3 which is an ordinal integer categorical encoding sorted by frequency
- ucct counts, in the train set, the number of occurrences of each category class and returns that count divided by the total row count in place of the category
- e.g. for a train set with 10 rows, if we have two cases of category "circle", those instances would be returned with the value 0.2
- and then test set conversion would return the same value, independent of test set row count
- ucct inspired by review of the ICLR paper "Weakly Supervised Clustering by Exploiting Unique Class Count" by Mustafa Umit Oner, Hwee Kuan Lee, Wing-Kin Sung
- additional new transform category Ucct, performs an uppercase character conversion prior to encoding (e.g. the strings usa, Usa, and USA treated consistently)
- followed by a downstream pair of offspring ucct and ord3
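
The ucct arithmetic described above can be sketched in a few lines of pandas. The names fit_ucct and apply_ucct are hypothetical, and the sketch leaves categories unseen in the train set as NaN rather than reproducing Automunge's infill handling.

```python
import pandas as pd

def fit_ucct(train):
    """Map each category to its train set count divided by train set row count."""
    return (train.value_counts() / len(train)).to_dict()

def apply_ucct(series, mapping):
    """Replace categories with the train-derived ratios (unseen entries become NaN)."""
    return series.map(mapping)

# 10 train rows: 2 circles, 3 squares, 5 triangles
train = pd.Series(["circle"] * 2 + ["square"] * 3 + ["triangle"] * 5)
mapping = fit_ucct(train)

# the test set encoding reuses the train mapping regardless of test row count
test = pd.Series(["circle", "triangle"])
encoded_test = apply_ucct(test, mapping)

# Ucct variant: uppercase first so usa, Usa and USA share one encoding
mixed = pd.Series(["usa", "Usa", "USA", "uk"])
upper_mapping = fit_ucct(mixed.str.upper())
```

Here the two "circle" rows out of ten give 0.2, matching the example in the notes, and the uppercase variant assigns 0.75 to all three spellings of USA.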

3.80

- added new convention that np.inf values are recognized as NaN for infill by default
- by making use of use_inf_as_na setting in Pandas
- impacts the getNArows function used in automunge(.) and postmunge(.)
- as well as any isna calls in transformation functions
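
A sketch of the resulting behavior, in which infinite values count as missing when identifying rows for infill. The use_inf_as_na option has since been deprecated in newer pandas releases, so this example instead replaces infinities explicitly; na_rows is a hypothetical stand-in and not the library's getNArows function.

```python
import numpy as np
import pandas as pd

def na_rows(series):
    """Boolean mask of rows treated as missing, counting +/-inf as NaN."""
    return series.replace([np.inf, -np.inf], np.nan).isna()

s = pd.Series([1.0, np.inf, np.nan, -np.inf])
mask = na_rows(s)

# for comparison, a plain isna call does not flag the infinite entries
plain = s.isna()
```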
