Automunge

Latest version: v8.33


6.52

- found and fixed a snafu in qbt1 transform originating from 6.50
- everything now aligned with original intent for 6.50 and back working as it should
- added abs_zero assignparam support to qbt1
- abs_zero is a boolean defaulting to True which converts received negative zeros to positive zero
- updated the qbt1 family of transforms so that cases that don't default to returning a sign column (qbt3 and qbt4) use a defaultinfill of zeroinfill instead of negzeroinfill
- corrected a typo in read me library of transforms, mmq3 now corrected to read mmq2
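As a rough illustration of what an abs_zero style conversion does, here is a minimal sketch; `apply_abs_zero` is a hypothetical helper for demonstration, not the library's implementation:

```python
import numpy as np
import pandas as pd

def apply_abs_zero(column):
    # hypothetical sketch: adding 0.0 maps IEEE-754 negative zero (-0.0)
    # to positive zero while leaving all other values (and NaN) unchanged
    return column + 0.0

s = pd.Series([-0.0, 0.0, -1.5, np.nan])
out = apply_abs_zero(s)
```

Afterward no zero entry in `out` carries a set sign bit, which is what ensures a later negzeroinfill imputation remains a unique encoding.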

6.51

- new assignparam option for nmbr transform as abs_zero
- abs_zero accepts booleans, defaulting to True
- when activated, abs_zero converts any negative zeros to positive zero prior to imputation
- as may be desired to ensure the negzeroinfill imputation maintains a unique encoding in the data
- new transformation category nbr4
- similar to the z-score normalization configuration prior to the 6.50 update (with the new abs_zero parameter deactivated)
- except the defaultinfill for nbr4 is changed from meaninfill to zeroinfill, resolving a rounding issue that sometimes caused mean imputation to return a tiny decimal instead of zero
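The rounding issue can be reproduced with a toy example (not Automunge internals): after z-score normalization the column mean is only zero up to floating point error, so imputing by the mean can land on a tiny decimal where a literal zero infill stays exact.

```python
import numpy as np
import pandas as pd

# z-score normalize a column containing a missing entry
raw = pd.Series([0.1, 0.2, 0.4, np.nan])
z = (raw - raw.mean()) / raw.std()

# meaninfill: the post-normalization mean may be a tiny decimal
# (on the order of float rounding error) rather than exactly 0
meaninfill = z.fillna(z.mean())

# zeroinfill: a literal 0.0, exact by construction
zeroinfill = z.fillna(0.0)
```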

6.50

- added NArw_marker support to a few datetime family trees whose omission had been an oversight
- fixed a bug with root category 'time'
- which resulted from 'time' being entered in the 'time' family tree as a tree category
- while the 'time' process_dict entry was not populated with processing functions
- which is ok as long as a category is only intended to be applied as a root category rather than a tree category
- otherwise, when applied as a tree category, no transforms are performed and downstream offspring are not inspected when applicable
- updated the processfamily functions so that this scenario no longer produces error, just no transforms applied with printout for clarity
- also started updating process_dict entries in general for root categories lacking processing functions so they could be used as tree categories, and midway decided for some categories it makes more sense to leave them without; now that this scenario no longer halts operation that won't be an issue
- changed default infill for datetime transforms (excluding timezone) to adjinfill
- reconsidered default infill for passthrough transforms with defaultinfill support (e.g. exc2 and exc5)
- previously we had applied mode infill based on neutrality towards numeric or categoric features
- decided mode is too computationally expensive for a passthrough transform, so reverting to adjinfill as default for categories built on top of exc2 / exc5
- updated default infill for shfl from adjinfill to naninfill
- added defaultinfill processdict specification support to dual/single/post process convention
- (a handful of transforms still pending support for esoteric reasons, those without process_dict defaultinfill specification in familytrees file)
- new option for defaultinfill as negzeroinfill, which is imputation by the float negative 0 (-0.)
- negzeroinfill is the new default infill for nmbr (z-score normalization) and qbt1
- as a benefit, the convention allows the user to eliminate the NArw aggregation without loss of information content
- note that nmbr is the default transform for numeric sets under automation
- and previously applied meaninfill as a precursor to ML infill, which, since the data is centered, was equivalent to zero infill
- we anticipate there may be potential for downstream libraries to build capabilities to selectively distinguish between zero and negative zero based on the use case; otherwise we believe negative zero will be neutral towards model performance
- as a bonus the convention benefits interpretability by visual inspection, as the user can distinguish imputation points without NArw when ML infill is not applied
- negzeroinfill also available for assignment via assigninfill
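A toy sketch of the negzeroinfill convention (illustration only, not the library's implementation): negative zero compares equal to zero, so it is neutral for arithmetic and modeling, yet the sign bit still identifies which cells were imputed.

```python
import numpy as np
import pandas as pd

# impute missing entries with float negative zero (-0.)
column = pd.Series([1.2, np.nan, -0.7, np.nan])
filled = column.fillna(-0.0)

# -0. == 0. is True, so downstream math is unaffected,
# but the sign bit recovers the imputation points without an NArw column
imputed_mask = np.signbit(filled) & (filled == 0.0)
```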

6.49

- simplified index recovery for validation data
- dropped printouts for stndrdinfill
- which were there to indicate columns with no infill performed as part of ML infill or assigninfill
- in which case infill defers to the default infill performed as part of the transformation function
- realized the inclusion was kind of cluttering the printouts
- and by omitting printouts just for this case there is no loss of information content
- moved validation that populated processing functions are callable
- from within the process family functions to distinct validation function _check_processdict4
- this validation now limited to entries in user passed processdict (after any functionpointer entries are populated)
- and validates that entries to any of the processing function slots (dual/single/post/inverseprocess or custom_train/custom_test/custom_inversion)
- are either callable functions or passed as None
- validation results returned in printouts and as check_processdict4_valresult
- a few improvements to functionpointer support function _grab_functionpointer_entries_support
- added support for edge scenario of self-referential pointers linked from a chain
- which were previously treated as infinite loops
- also removed a check for processing functions that was sort of superfluous
- a few small cleanups for clarity
- changed the evalcat format check from a type check to a callable check to be consistent with processing functions
- removed a comment in read me about adding assigninfill support for label sets
- if alternate infill conventions are desired for labels they can be applied with defaultinfill processdict entry in custom_train convention
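The callable-or-None validation described above might be sketched as follows; the function name and the slot keys here are assumptions based on the wording of these notes, not the library's actual `_check_processdict4` code:

```python
def check_processing_functions(processdict):
    # hypothetical sketch: flag any processing function slot entry
    # that is neither a callable nor None
    slots = ('dualprocess', 'singleprocess', 'postprocess', 'inverseprocess',
             'custom_train', 'custom_test', 'custom_inversion')
    results = {}
    for category, entry in processdict.items():
        results[category] = [slot for slot in slots
                             if slot in entry
                             and entry[slot] is not None
                             and not callable(entry[slot])]
    return results

# example: 'good' passes, 'bad' flags a non-callable custom_train entry
results = check_processing_functions({
    'good': {'dualprocess': len, 'postprocess': None},
    'bad': {'custom_train': 'not_a_function'},
})
```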

6.48

- new option for the postmunge inversion parameter to specify a custom inversion path
- custom inversion path can be specified by passing inversion as a single entry set
- containing a string of a returned column header with suffix appenders
- such as to recover a specific input column based on inverting from a starting point of a specific target returned representation
- (note that label inversion is also available to collectively invert each of returned representations by the 'denselabels' option)
- inversion is for recovering the form of input data from train or test data returned from an automunge(.) or postmunge(.) call
- in default configuration, inversion selects an inversion path based on heuristic of shortest path of transformations with full information retention
- in the new custom inversion path option, alternate inversion paths can be specified
- for which I don't have a specific use case in mind, it just seemed like a reasonable extension

6.47

- updated custom_test application convention in automunge to be consistent with postmunge
- from the standpoint that if custom_train returned an empty set (including deletion of suffixcolumn)
- then suffixcolumn simply deleted from mdf_test without calling custom_test
- corrected some code comments in processfamily and processparent (and corresponding postprocess functions) regarding inplace eligibility
- updated processparent and postprocessparent to eliminate an edge case so that downstream transforms are halted when the associated upstream transform returned an empty set
- if a user needs support for this scenario, they need to configure the upstream transform so that in the null scenario it performs passthrough instead of returning an empty set
- also updated the parentcolumn derivation for passing an input column to downstream generations in processparent and postprocessparent
- to ensure consistent parentcolumn applied in both
- corrected a code comment in processparent that stated that downstream transforms require as input transforms returning a single column
- prior configuration already supported performing downstream transforms on multi-column sets with dual/single process convention
- just hadn't documented it well since we don't currently have any applications in the library
- the convention is that downstream transforms on received multicolumn sets will receive as input a single column (which is now the first entry in the upstream categorylist)
- which they can then use as a key to access the upstream categorylist and normalization_dict from column_dict if needed
- updated the validation split performed in df_split to remove a redundant shuffle
- in the process found and fixed small snafu interfering with index retention in cases of validation split
- used that as a hint that an audit of index retention was needed, so ran everything with all options on and confirmed it looked good across all returned sets
- fixed some printout categorizations for labelctgy assignment
- corrected single entry access approach to pandas.iat in a few places
- streamlined printouts at start of automunge / postmunge function calls (removed word "processing")
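For context on the `.iat` correction, `pandas.DataFrame.iat` is the fast scalar accessor for a single entry by integer position; unlike `.iloc` it accepts only a single row/column pair. A minimal example:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# .iat retrieves exactly one scalar by integer row and column position
value = df.iat[0, 1]   # row 0, column 'b'
```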

