Automunge

Latest version: v8.33

4.54

- standardized the suffix overlap error detection methods shared between transformation functions and PCA/Binary dimensionality reductions
- so they now all use a common approach, for simplicity, consistency, and general good practice
- PCA suffix overlap results now returned in postprocess_dict['miscparameters_results']['PCA_suffixoverlap_results']
- Binary suffix overlap results now returned in postprocess_dict['miscparameters_results']['Binary_suffixoverlap_results']
- added the same validation for excl column processing, with results returned in postprocess_dict['miscparameters_results']['excl_suffixoverlap_results']
- consolidated final printouts for suffix overlap detection into a support function
- a few small tweaks to some of the column header lists so they run faster
- struck an unused relic data structure entry found in a few of the time series transforms
- finally cleaned up the use of the one hot encoding support function postprocess_textsupport_class, which is used in several places
- it can now be called in a much cleaner fashion, without an initialized supporting data structure
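The common approach can be pictured as one overlap check reused by every source of new columns, with each result dictionary stored under its own key and a single function doing the final printouts. The sketch below is an illustration under assumptions, not Automunge's internals: the helper names check_overlap and print_overlap_results are hypothetical, as are the 'PCA_0'/'Binary_1010' header conventions. Only the result keys come from the notes above.

```python
import pandas as pd

# Hypothetical helpers: names and signatures are illustrative, not Automunge internals.
def check_overlap(new_headers, df):
    """Map each proposed new column header to True if it is already in df."""
    return {header: header in df.columns for header in new_headers}

def print_overlap_results(miscparameters_results):
    """Consolidated printout across every '*suffixoverlap_results' entry."""
    flagged = []
    for key, results in miscparameters_results.items():
        if not key.endswith('suffixoverlap_results'):
            continue
        for header, overlapped in results.items():
            if overlapped:
                print(f"{key}: column header '{header}' was already present in the dataframe")
                flagged.append(header)
    return flagged

# Assumed header conventions for the reduced columns, for illustration only.
df = pd.DataFrame({'column1': [1.0, 2.0], 'PCA_0': [0.5, 0.6]})
postprocess_dict = {'miscparameters_results': {}}
misc = postprocess_dict['miscparameters_results']
misc['PCA_suffixoverlap_results'] = check_overlap(['PCA_0', 'PCA_1'], df)
misc['Binary_suffixoverlap_results'] = check_overlap(['Binary_1010'], df)
flagged = print_overlap_results(misc)  # prints one line, for 'PCA_0'
```

Keeping a separate key per source, with one shared checker, matches the split of results across the PCA, Binary, and excl entries while the reporting logic lives in one place.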

4.53

- some spot testing of validation methods revealed that the tests for suffix overlap errors
- were not being applied in a few cases where transformations include a temporary support column
- (suffix overlap error refers to cases where a new column is created with header that was already present in the dataframe)
- an example of suffix overlap could be applying transform mnmx to column 'column1' when the column header 'column1_mnmx' was already present in the dataframe
- so 4.53 introduces a new method for detecting suffix overlap errors, replacing the prior one
- it performs a validation each time a new column is created within a transformation function
- using support functions df_copy_train, which performs a copy operation in parallel with the validation, or df_check_suffixoverlap, which just checks for overlaps
- note these validations are only performed on train sets in automunge(.) transformation functions, to avoid unnecessary redundancy
- the results of the validations are returned in postprocess_dict['miscparameters_results']['suffixoverlap_results']
- in addition to printouts in cases of identified overlap
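These notes name df_copy_train and df_check_suffixoverlap but not their signatures. The sketch below uses hypothetical stand-ins, copy_train and check_suffixoverlap, to show the idea of validating a header at the moment a column is created. It reproduces the mnmx example from the notes, plus a temporary support column that does not collide.

```python
import pandas as pd

# Hypothetical stand-ins in the spirit of df_check_suffixoverlap / df_copy_train;
# signatures are assumptions, not Automunge's actual code.
def check_suffixoverlap(df, newcolumns, suffixoverlap_results):
    """Record whether each new header is already present; print on overlap."""
    if isinstance(newcolumns, str):
        newcolumns = [newcolumns]
    for column in newcolumns:
        overlap = column in df.columns
        suffixoverlap_results[column] = overlap
        if overlap:
            print(f"Suffix overlap error: '{column}' already present in dataframe")
    return suffixoverlap_results

def copy_train(df, column, newcolumn, suffixoverlap_results):
    """Validate newcolumn, then copy column into it (intended for train sets only)."""
    check_suffixoverlap(df, newcolumn, suffixoverlap_results)
    df[newcolumn] = df[column].copy()
    return df, suffixoverlap_results

# Example from the notes: applying mnmx to 'column1' when 'column1_mnmx' already exists.
df_train = pd.DataFrame({'column1': [1, 2, 3], 'column1_mnmx': [0, 0, 0]})
results = {}
df_train, results = copy_train(df_train, 'column1', 'column1_mnmx', results)  # overlap flagged
df_train, results = copy_train(df_train, 'column1', 'column1_temp', results)  # temporary support column, no overlap
```

Because the check runs before the assignment, an existing column would be silently overwritten without it. That is why the validation sits at each point of column creation rather than in a single pass after processing.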

4.52

- corrected bug from 4.51

4.51

- small tweak to exclude passthrough columns from inf infill, to be consistent with the philosophy for passthrough columns
- (passthrough columns excl/exc6 receive no transformations or infill; they may still serve a purpose by providing part of the basis for ML infill of adjoining columns, even though no preparation is performed. Transforms exc2-exc5 are similar passthrough columns with added infill support)

4.50

- an iteration on the validations for the new assignnan parameter, breaking previously consolidated results into separate statements
- also realized the MLinfilltype assigned to the excl root category was inappropriate, so created a new one
- MLinfilltype 'totalexclude' now available for assignment in processdict
- 'totalexclude' is intended for complete pass-through columns
- now applied to excl and exc6, which are the root categories for total passthrough
- some small details added for assignnan: the global option for designating infill entries now excludes complete passthrough columns without infill (e.g. excl, exc6)
- assignnan treatment is still available for these root categories but must be assigned explicitly in the categories or columns assignnan entries
- made note of this special treatment in the README under the library of transformations entries for excl and exc6
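The interaction between global assignnan entries and 'totalexclude' columns can be sketched as follows. Global terms are skipped for complete passthrough columns, while explicit column entries still apply. This is a hypothetical illustration: apply_assignnan is not an Automunge function, and the column-to-MLinfilltype mapping is passed in directly here, whereas in Automunge the MLinfilltype is assigned per root category in processdict.

```python
import numpy as np
import pandas as pd

# Hypothetical sketch: the column -> MLinfilltype mapping is an assumed input.
def apply_assignnan(df, global_terms, column_terms, mlinfilltypes):
    """Convert designated entries to NaN; global terms skip 'totalexclude' columns."""
    df = df.copy()
    for column in df.columns:
        terms = list(column_terms.get(column, []))   # explicit entries always apply
        if mlinfilltypes.get(column) != 'totalexclude':
            terms += list(global_terms)              # global entries skip total passthrough
        if terms:
            df[column] = df[column].replace(terms, np.nan)
    return df

df = pd.DataFrame({'col1': ['a', 'unknown', 'b'],   # ordinary column
                   'col2': ['unknown', 'c', 'd']})  # total passthrough, e.g. excl
mlinfilltypes = {'col2': 'totalexclude'}
# Global 'unknown' reaches col1 only; col2 is converted only where assigned explicitly.
out = apply_assignnan(df, ['unknown'], {'col2': ['c']}, mlinfilltypes)
```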

4.49

- new automunge(.) assignment parameter assignnan
- for use to designate data set entries that will be targets for infill
- such as may be entries not covered by NArowtype definitions from processdict
- for example, we have a general convention that NaN is a target for infill,
- but a data set may be passed with a custom string signal for infill, such as 'unknown'
- this assignment operator saves the step of manual munging prior to passing data to functions
- assignnan accepts the following form:
- assignnan = {'categories':{}, 'columns':{}, 'global':[]}
- populated in first tier with any of 'categories'/'columns'/'global'
- note that global takes its entries as a list, while categories and columns take entries as a dictionary keyed by the target assignments, with values of the corresponding lists of terms
- which could be populated with entries as e.g.:
- assignnan = {'categories':{'cat1':['unknown1']}, 'columns':{'col1':['unknown2']}, 'global':['unknown3']}
- where 'cat1' is example of root category
- and 'col1' is example of source column
- and 'unknown1'/2/3 is example of entries intended for infill
- in cases of redundant specification, global takes precedence over columns, which takes precedence over categories
- note that lists of terms can also be passed as single values such as string / number for internal conversion to list
- note that validations on this data structure are returned with the postprocess_dict['miscparameters_results'] validations as check_assignnan_result
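One way to read the assignnan specification is to resolve it into a list of target terms per column, wrapping single values into lists as the notes describe. The sketch below assumes a column-to-root-category mapping (root_categories) as an input, and the helpers as_list and terms_per_column are hypothetical. It simply unions the terms from all three tiers before converting them to NaN, so it does not model the precedence order described above.

```python
import numpy as np
import pandas as pd

# Hypothetical sketch; root_categories (column -> root category) is an assumed input.
def as_list(terms):
    """Wrap a single value such as a string or number in a list."""
    return list(terms) if isinstance(terms, (list, tuple, set)) else [terms]

def terms_per_column(assignnan, root_categories):
    """Union the infill target terms from the categories/columns/global tiers."""
    per_column = {}
    for column, category in root_categories.items():
        per_column[column] = (
            as_list(assignnan.get('categories', {}).get(category, []))
            + as_list(assignnan.get('columns', {}).get(column, []))
            + as_list(assignnan.get('global', []))
        )
    return per_column

assignnan = {'categories': {'cat1': ['unknown1']},
             'columns': {'col1': 'unknown2'},   # single value, converted to a list
             'global': ['unknown3']}
root_categories = {'col1': 'cat1', 'col2': 'cat2'}
per_column = terms_per_column(assignnan, root_categories)

df = pd.DataFrame({'col1': ['unknown1', 'x', 'unknown2'],
                   'col2': ['unknown3', 'unknown1', 'y']})
for column, terms in per_column.items():
    df[column] = df[column].replace(terms, np.nan)
```

Here 'unknown1' in col2 is left alone, because the categories entry only targets columns whose root category is 'cat1'.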
