Automunge

Latest version: v8.33

Page 31 of 99

6.34

- corrected a parameter validation for ML_cmnd associated with a changed parameter naming convention from a while back
- (ML_cmnd['MLinfill_type'] was replaced with ML_cmnd['autoML_type'] and default was changed from 'default' to 'randomforest')
- in the transformdict check for infinite loops found and fixed a scenario where validation result wasn't being reported correctly
- another fix for Binary inversion associated with partial inversion including Binary columns when Binary passed to automunge(.) as True
- improved clarity of printouts for the unsupported edge case of partial inversion including Binary columns when Binary passed to automunge(.) as 'retain'
- (this edge case is not needed since Binary columns will be redundant with what can be recovered from inversion of rest of set)
- improved documentation for transformdict and processdict parameters in read me
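
As a rough illustration of the kind of parameter validation the first item refers to, here is a minimal sketch. It is not Automunge's internal code: the function name `check_ml_cmnd`, the set `ASSUMED_AUTOML_TYPES`, and the convention that a True result flags a problem are all assumptions made for this example. Only the key names and the 'randomforest' default come from the note above.

```python
# Hypothetical validation sketch, not Automunge's internal implementation.
# Only 'randomforest' is confirmed by the release note; a real check would
# list every supported autoML_type value.
ASSUMED_AUTOML_TYPES = {'randomforest'}


def check_ml_cmnd(ML_cmnd):
    """Return True when a validation problem is found (assumed flag convention)."""
    if 'MLinfill_type' in ML_cmnd:
        # legacy key that predates the rename to 'autoML_type'
        return True
    # the default after the rename is 'randomforest'
    autoML_type = ML_cmnd.get('autoML_type', 'randomforest')
    return autoML_type not in ASSUMED_AUTOML_TYPES


legacy_flag = check_ml_cmnd({'MLinfill_type': 'default'})
current_flag = check_ml_cmnd({'autoML_type': 'randomforest'})
```

Under these assumptions the legacy key is flagged while the current key with its default value passes.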

6.33

- performed an audit of internal process_dict 'labelctgy' entries
- where labelctgy again is kind of only used in a few remote scenarios
- such as when a category is applied as a root category to a label set
- and has a family tree that returns labels in multiple configurations
- such that when that label set is a target for feature selection or oversampling
- the labelctgy entry helps distinguish which of the constituent sets will serve as target
- so yeah previous convention was that labelctgy was meant to point to the recorded category
- populated in column_dict by the transformation function returning set to serve as target
- turns out the 6.30 update where we revised the recorded category convention impacted labelctgy
- such that any place where the recorded category entry to labelctgy did not match the target treecategory
- required update to new convention
- so result is several updates to recorded labelctgy entries
- also in process identified that process_dict entries for 'time' and 'tmsc'
- did not have corresponding transform_dict family trees defined
- which was kind of intentional since those are only used as support functions in other family trees
- but to be consistent with convention of rest of library, went ahead and defined some simple family trees
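
The consistency check described in the last few items (every process_dict category also having a transform_dict family tree) can be sketched as a set difference. This is an illustrative sketch only: the helper name `missing_family_trees` and the toy dictionaries are hypothetical, and the real entries carry far more structure than the empty placeholders used here.

```python
def missing_family_trees(process_dict, transform_dict):
    """List categories that have a process_dict entry but no family tree."""
    return sorted(set(process_dict) - set(transform_dict))


# Toy stand-ins for the library's internal dictionaries (contents elided).
process_dict = {'nmbr': {}, 'time': {}, 'tmsc': {}}
transform_dict = {'nmbr': {}}

missing = missing_family_trees(process_dict, transform_dict)
print(missing)
```

With these toy inputs the check reports 'time' and 'tmsc', mirroring the two categories named in the note.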

6.32

- ok so 6.31's survey of printouts helped me recognize that 6.28's audit of validation results was incomplete
- (this was because we have a few different conventions for returned printouts with flag so didn't survey all at the time)
- so performed an additional audit of validation results printouts
- and found a few more cases of validation tests missing entries in the returned reports
- so postprocess_dict['miscparameters_results'] now contains additional entries for {ML_cmnd_hyperparam_tuner_valresult, returned_label_set_for_featureselect_valresult, labelctgy_not_found_in_familytree_valresult, featureselect_trained_model_valresult, assignnan_actions_valresult, labels_column_for_featureselect_valresult, featureselect_automungecall_validationresults}
- and postreports_dict['pm_miscparameters_results'] now contains additional entries for {labelscolumn_for_postfeatureslect_valresult, returned_label_set_for_postfeatureselect_valresult, labelctgy_not_found_in_familytree_pm_valresult, postfeatureselect_trained_model_valresult, postfeatureselect_with_inversion_valresult, labels_column_for_postfeatureselect_valresult, postfeatureselect_automungecall_validationresults}
- updated check_haltingproblem validation so that it only runs when there is a user defined transformdict to speed things up under default
- also new convention: labelctgy is now only an optional entry to user passed processdict
- if one is not assigned or accessed from functionpointer then an arbitrary entry is populated from family tree
- removed a validation check and if statement indentation in support function _featureselect
- associated with checking for cases where labels_column passed as False (this was redundant with a check performed prior to calling function)
- finally found and fixed bug for postmunge inversion in conjunction with Binary dimensionality reduction
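
Since the validation results are collected in a dictionary, they can be inspected programmatically. The following is a minimal sketch under stated assumptions: the helper `flagged_validations` is hypothetical, the example values are made up, and the convention that a True entry marks a failed validation is an assumption of this example rather than something stated in the notes above. Entries that are nested reports rather than booleans are simply skipped here.

```python
def flagged_validations(results):
    """Return the names of boolean validation entries that are set to True."""
    return sorted(key for key, value in results.items() if value is True)


# Made-up stand-in for postprocess_dict['miscparameters_results'];
# the key names come from the release note, the values are illustrative.
miscparameters_results = {
    'ML_cmnd_hyperparam_tuner_valresult': False,
    'assignnan_actions_valresult': True,
    'labels_column_for_featureselect_valresult': False,
    'featureselect_automungecall_validationresults': {},  # nested report, skipped
}

flagged = flagged_validations(miscparameters_results)
print(flagged)
```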

6.31

- new 'silent' option for printstatus parameter to both automunge(.) and postmunge(.)
- new convention is that printstatus can be passed as one of {True, False, 'silent'}
- where True is the default and returns all printouts
- False only returns error message printouts
- and 'silent' turns off all printouts
- the thought is that there may be preference for a silent option
- for cases where Automunge incorporated as resource in script workflows outside of context of jupyter notebooks
- note that results of validations which may generate error message printouts are generally included in the postprocess_dict['miscparameters_results'] or postreports_dict['pm_miscparameters_results']
- so validations are available for inspection even when all printouts are silent
- also updated the suffix overlap detection method for Binary dimensionality reduction to align with rest of library
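
The three printstatus settings described above can be sketched as a small gating helper. This is a hypothetical illustration, not the library's code: the function `emit` and its `is_error` argument are names invented for this example.

```python
def emit(message, printstatus=True, is_error=False):
    """Print message according to printstatus; return True if it was printed.

    printstatus=True     -> all printouts
    printstatus=False    -> error message printouts only
    printstatus='silent' -> no printouts at all
    """
    if printstatus == 'silent':
        return False
    if printstatus is False and not is_error:
        return False
    print(message)
    return True


emit('processing column', printstatus=True)                    # printed
emit('processing column', printstatus=False)                   # suppressed
emit('invalid parameter', printstatus=False, is_error=True)    # printed
emit('invalid parameter', printstatus='silent', is_error=True) # suppressed
```

In a script workflow, a caller passing 'silent' would rely on the returned validation results rather than on printed messages.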

6.30

- important update: new convention for automunge transformation functions
- now process and singleprocess functions have an additional parameter as treecategory
- which is passed to the functions as the transformation category populated in the family tree
- which is the category serving as the key for accessing those functions in process_dict
- treecategory now serves as the default suffix for a transformation
- unless suffix otherwise specified in assignparams or defaultparams
- treecategory is also recorded in the column_dict entry for 'category'
- replacing the hard coded transformation category recorded there previously
- benefits of the update include that we can now tailor the inversion function applied
- to each category with a process_dict entry
- as opposed to prior convention where inversion function was based on recorded category in the function
- no updates needed to inversion to support, everything already works as coded, difference is that now inversion inspects the recorded treecategory process_dict entry to access inversion function instead of the hard coded recorded category associated with a transformation function
- another benefit is that methods that inspect mlinfilltype will no longer inspect mlinfilltype of the recorded category
- they will instead inspect mlinfilltype of the treecategory
- which means can now have multiple mlinfilltypes for a given transformation function as recorded in different process_dict entries
- another benefit associated with the updated suffix convention
- is that now the returned headers will display the categories that may serve as a key for assigning parameters in assignparam
- where previously a user might have to dig into the documentation to identify a key based on the family trees
- so went ahead and scrubbed (almost) all cases of suffix assignment in process_dict defaultparams
- as will instead defer to matching suffix to treecategory which is much less confusing
- another benefit is that we were able to strike use of process_dict entry for 'recorded_category'
- as well as scrubbed recorded_category from support functions in place to accommodate such as for functionpointer
- since now the recorded category will be the treecategory serving as key for the process_dict entry
- also added validation check for PCAexcl parameter to _check_am_miscparameters
- result returned in miscparameters_results as PCAexcl_valresult
- also updated featurethreshold validation test to allow passing integer 0 instead of float 0. if desired
- finally small updates to postprocess_bxcx, Binary dimensionality reduction, and inverseprocess_year to accommodate new default suffix convention associated with this update
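
The default suffix convention introduced here can be sketched as follows. This is a simplified illustration under stated assumptions: the helper `returned_column_name` is hypothetical, the `column + '_' + suffix` header format is assumed for the example, and a single `params` dictionary stands in for whatever is resolved from assignparam or defaultparams.

```python
def returned_column_name(column, treecategory, params=None):
    """Build a returned header, defaulting the suffix to the treecategory.

    params stands in for parameters resolved from assignparam or
    defaultparams; a 'suffix' entry there overrides the default.
    """
    params = params or {}
    suffix = params.get('suffix', treecategory)
    return f"{column}_{suffix}"


default_header = returned_column_name('price', 'nmbr')
custom_header = returned_column_name('price', 'nmbr', {'suffix': 'zscr'})
print(default_header, custom_header)
```

Because the default suffix equals the treecategory, the returned header itself tells a user which key to use when assigning parameters to that transformation.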

6.29

- improved function description in codebase for _evalcategory
- new validation result reported in miscparameters_results
- as suffixoverlap_aggregated_result which is single boolean marker aggregating all of the suffixoverlap_results into a single result
- removed a redundant parameter inspection in _madethecut
- found and offered accommodation to kind of a remote edge case where trainID_column passed as False and testID_column passed as True
- in which case we just revert testID_column to match trainID_column
- found and fixed edge case for Binary dimensionality reduction that was interfering with application of Binary to subset of categoric columns as specified by passing parameter as list
- added entry to postprocess_dict of Binary_orig
- assembled some new documentation detailing contents of postprocess_dict (keeping this as internal for now)
- also 6.23 had struck convention for applying a PCA heuristic
- which now realize was adequately covered in documentation
- as this heuristic is only applied when PCAn_components passed as None
- so went ahead and reintroduced option for heuristic to the code base
