Automunge

Latest version: v8.33

3.55

- revised order of operations in processfamily / processparent functions to apply replacement primitive entries before supplement primitive entries (to accommodate an inplace operation performed to a supplement operator such as 3.54 update to excl category)
- processfamily function now applies primitive entries in order of: parents, auntsuncles, siblings, cousins
- processparent function now applies primitive entries in order of: upstream, children, coworkers, niecesnephews, friends
- improved the implementation of validation function to evaluate user passed transformdict automunge(.) parameter
- new transform category 'exc6', comparable to 'excl' (direct pass-through) but relies on a copy operation instead of an inplace operation
- added clarification to READ ME that 'excl' should only be applied in a user-passed transformdict as an entry to the cousins primitive, and if comparable functionality is desired for any other primitive the 'exc6' transform should be applied in its place
- note that the transformdict validation function now checks for this eventuality
- updated convention for NArw entries in feature importance dimensionality reduction: NArw columns are now treated just like other columns
- trying to avoid accumulation of too many edge cases, realized there are scenarios where an NArw derivation from a column may be just as valid an information source for ML training as other columns
- NArw is still excluded from ML infill (not applicable) and is ineligible as a target label for feature importance evaluation
- fixed bug for postmunge(.) feature importance evaluation corresponding to cases when a feature importance dimensionality reduction was performed in automunge(.)
- slight cleanup tweak for populating feature importance results in FS_sorted report
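
The restriction on 'excl' described above can be pictured with a small check over a transformdict. This is a minimal sketch, not Automunge's internal validation code: the function name find_misplaced_excl, the helper empty_tree, and the root category names 'rootA' and 'rootB' are all hypothetical, and the only things taken from the notes are the eight primitive names and the rule that 'excl' belongs in cousins.

```python
# Illustrative sketch only; this is not Automunge's internal validation code.
# The eight family tree primitives named in the release notes.
PRIMITIVES = (
    "parents", "siblings", "auntsuncles", "cousins",
    "children", "niecesnephews", "coworkers", "friends",
)


def find_misplaced_excl(transformdict):
    """Return (root_category, primitive) pairs where 'excl' sits outside cousins."""
    misplaced = []
    for root_category, family_tree in transformdict.items():
        for primitive in PRIMITIVES:
            entries = family_tree.get(primitive, [])
            if primitive != "cousins" and "excl" in entries:
                misplaced.append((root_category, primitive))
    return misplaced


def empty_tree():
    """Build a family tree with every primitive present and empty."""
    return {primitive: [] for primitive in PRIMITIVES}


# 'rootA' and 'rootB' are hypothetical root category names.
transformdict = {
    "rootA": {**empty_tree(), "auntsuncles": ["excl"]},  # flagged; 'exc6' would fit here
    "rootB": {**empty_tree(), "cousins": ["excl"]},      # allowed
}

misplaced = find_misplaced_excl(transformdict)
print(misplaced)
```

A check like this only reports the offending entries; whether to raise, warn, or substitute 'exc6' is a separate design choice that the notes leave to the library.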

3.54

- found an issue in feature importance evaluation with sorted results reported in FS_sorted originating from entry overwrite for cases where multiple columns achieved the same feature importance score
- resolved by collecting FS_sorted bottom tier of results in a list
- (a small amount of complexity in the implementation, nothing we couldn't handle)
- added support for NArw entries to be included in feature importance evaluation
- (the convention is that if all other columns from same source column are trimmed in a feature importance dimensionality reduction, the corresponding NArw column will be trimmed independent of score)
- new transform root category NAr5 which produces NArw activations corresponding to non-integer entries
- corrected family tree entries for NAr2/NAr3/NAr4
- remade 'excl' transform, a direct pass-through function, to only rename a column instead of copying the column
- this update is expected to improve the efficiency of operation at scale when there are columns excluded from processing
- (which in some workflow scenarios may be a majority of columns)
- as a result a restriction is placed on 'excl' in that it may only be used as a supplement primitive in a root category's family tree (eg siblings, cousins, niecesnephews, friends)
- (a function checks for cases of excl passed as replacement primitive and if found moves to corresponding supplement primitive, this is all "under the hood")
- new validation function to ensure normalization_dict entries which require unique identifier for purposes of deriving a column key to retrieve parameters don't have overlap from user passed transformation functions (as discussed in 3.53 rollout)
- slight reorg of automunge(.) call default parameter layout in READ ME for improved readability
- slight reorg of assigncat default layout in READ ME for improved readability
- slight reorg of postmunge(.) call default parameter layout in READ ME for improved readability
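
The FS_sorted overwrite issue noted above is the usual pitfall of keying a dictionary by score: two columns with the same score collide and one is lost. The sketch below shows the general remedy of collecting columns in a list per score. It is an illustration under assumptions, not the library's code; the function name group_columns_by_score and the column headers are hypothetical.

```python
from collections import defaultdict


def group_columns_by_score(scores):
    """Map each score to the list of columns that achieved it, highest score first."""
    grouped = defaultdict(list)
    for column, score in scores.items():
        grouped[score].append(column)
    return dict(sorted(grouped.items(), reverse=True))


# Hypothetical column headers and scores, chosen so that two columns tie.
scores = {"age_nmbr": 0.12, "city_a": 0.03, "city_b": 0.03}

# Naive inversion keyed by score: the second tied column overwrites the first.
naive = {score: column for column, score in scores.items()}

# List-valued entries keep every tied column.
grouped = group_columns_by_score(scores)
print(naive)
print(grouped)
```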

3.53

- fixed postmunge(.) bug corresponding to application of feature importance dimensionality reduction in automunge(.)
- new NArowtype 'integer' available for assignment in process_dict, for sets where non-integer values are subject to infill
- new transform category 'exc5', a pass-through transform that forces values to numeric and subjects non-integer entries to mode infill as default infill
- added automunge(.) parameter excl_suffix, defaults to False, can also be passed as True, to select whether '_excl' suffix appenders will be included on pass-through column headers
- added support for inclusion of 'excl' pass-through categories in feature importance evaluation
- (feature importance labels still require minimum of exc2 for target regression or exc5 for target classification)
- changed the order of sort for feature importance metric2 results to match basis of interpretation
- performed an audit of postprocess functions for column key retrieval in transform functions with multi-column returned sets
- found potential for error in multi-generation family trees associated with accessing incorrect normalization_dict
- resolved by incorporating two new methods:
- 1) ensure unique normalization_dict entry identifier between categories for cases where entry used in column key retrieval
- 2) new entry in transform functions' returned column_dict of 'inputcolumn' identifying the column header string that was passed to function (this is now checked in some of the column key retrieval methods for transforms with multicolumn returned sets)
- 'inputcolumn' differs from 'origcolumn' in that 'origcolumn' is subsequently updated to basis of original source column, while 'inputcolumn' retains column entry as was passed to the transform function
- replaced an earlier method for column key retrieval in 'text' transform postprocess function to one potentially more efficient
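
As a rough picture of the 'exc5' behavior described above (force values to numeric, then apply mode infill to non-integer entries), here is a minimal plain-Python sketch. It is not Automunge's implementation, which operates on dataframes; the function name force_integer_with_mode_infill is hypothetical, and the choice to leave entries as None when no valid integer exists is an assumption made for the example.

```python
from collections import Counter


def force_integer_with_mode_infill(values):
    """Coerce entries to integers; replace anything non-integer with the mode."""
    parsed = []
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            parsed.append(None)  # not numeric at all, e.g. "cat" or None
            continue
        # NaN, infinities, and fractional values all fail is_integer()
        parsed.append(int(number) if number.is_integer() else None)

    valid = [entry for entry in parsed if entry is not None]
    if not valid:
        return parsed  # assumption: with no valid integers, nothing to infill with

    mode = Counter(valid).most_common(1)[0][0]
    return [mode if entry is None else entry for entry in parsed]


cleaned = force_integer_with_mode_infill([1, "2", 2, 2.5, "cat", None])
print(cleaned)
```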

3.52

- found and fixed small bug for source column drift stats assembly associated with 'exclude' MLinfilltype
- added entries {'excl_columns_with_suffix':[], 'excl_columns_without_suffix':[]} to postprocess_dict to help identify and distinguish column header entries for 'excl' transforms, a special case pass-through transform which is returned to user without suffix appender
- (internally the '_excl' suffix is first applied and then removed, such that postprocess_dict['column_dict'] entries for these columns include the suffix and returned column headers don't, so there might be a scenario where these new entries could be useful to a user after completion of an automunge(.) call such as when comparing returned column headers to postprocess_dict['column_dict'] entries)
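
One way to use the two new entries is to map a returned column header back to its suffixed key in postprocess_dict['column_dict']. The sketch below assumes the returned header is simply the internal header with a trailing '_excl' removed, as the note above describes. The dictionary mock_postprocess_dict is a hypothetical stand-in with made-up column names (a real postprocess_dict carries many more entries), and the function name is not part of the library.

```python
EXCL_SUFFIX = "_excl"


def map_returned_to_internal(postprocess_dict):
    """Map returned (suffix-free) headers to their suffixed column_dict keys."""
    mapping = {}
    for internal in postprocess_dict["excl_columns_with_suffix"]:
        if internal.endswith(EXCL_SUFFIX):
            returned = internal[: -len(EXCL_SUFFIX)]
        else:
            returned = internal
        mapping[returned] = internal
    return mapping


# Hypothetical stand-in for the relevant slice of a postprocess_dict.
mock_postprocess_dict = {
    "excl_columns_with_suffix": ["id_excl", "notes_excl"],
    "excl_columns_without_suffix": ["id", "notes"],
    "column_dict": {"id_excl": {}, "notes_excl": {}},
}

header_map = map_returned_to_internal(mock_postprocess_dict)
print(header_map)
```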

3.51

- rounding out the set of driftreport metrics collected for a few categories of transformations
- dxdt/dxd2 now collect metrics positiveratio / negativeratio / zeroratio / minimum / maximum / mean / std
- bshr / wkdy / hldy now collect metrics for activationratio
- wkds now collects metrics for mon_ratio / tue_ratio / wed_ratio / thr_ratio / fri_ratio / sat_ratio / sun_ratio / infill_ratio
- mnts now collects metrics for infill_ratio / jan_ratio / feb_ratio / mar_ratio / apr_ratio / may_ratio / jun_ratio / jul_ratio / aug_ratio / sep_ratio / oct_ratio / nov_ratio / dec_ratio
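
To make the dxdt/dxd2 metric names above concrete, the following sketch computes the same seven statistics over a plain list of numbers. It is an illustration only: the function name derivative_drift_metrics is hypothetical, and the use of population standard deviation is an assumption, since the notes do not say which convention the library follows.

```python
import statistics


def derivative_drift_metrics(values):
    """Compute sign ratios and summary statistics for a list of numeric values."""
    if not values:
        raise ValueError("values must be non-empty")
    count = len(values)
    return {
        "positiveratio": sum(v > 0 for v in values) / count,
        "negativeratio": sum(v < 0 for v in values) / count,
        "zeroratio": sum(v == 0 for v in values) / count,
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.fmean(values),
        # assumption: population standard deviation; the library may differ
        "std": statistics.pstdev(values),
    }


metrics = derivative_drift_metrics([1.0, -1.0, 0.0, 2.0])
print(metrics)
```

Comparing such a dictionary computed on training data against one computed on later data is the general idea behind a drift report; the comparison thresholds are left to the user.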

3.50

- after rolling out 3.49, pulled up The Zen of Python
- realized that was committing a cardinal error with the new default category parameters
- "There should be one-- and preferably only one --obvious way to do it."
- thus 3.50 removes the parameters defaultcategoric, defaultnumeric, defaultdatetime that had just rolled out in 3.49
- any updates to default categories are intended to be accomplished using existing methods via a passed transformdict
- feeling much more zen about everything now
