Automunge

Latest version: v8.33

3.13

- extension of driftreport methods to evaluate distribution property drift between the training data prepared with automunge(.) and the data later prepared with postmunge(.)
- in addition to the transformation category specific metrics measured for each derived column, a generic set of metrics is now also evaluated for each source column, based on the root category, whether assigned or determined from an evaluation of set properties (specifically, the metric set is determined by the NArowtype entry in the root category's process_dict entry)
- as an example, for a vanilla numeric set, metrics are evaluated for: max / quantile_99 / quantile_90 / quantile_66 / median / quantile_33 / quantile_10 / quantile_01 / min / mean / std / MAD / nan_ratio
- a user can now pass driftreport to a postmunge call as any of {False, True, 'efficient', 'report_effic', 'report_full'}
  - False means no postmunge drift assessment is performed
  - True means a full assessment is performed for both source column and derived column stats
  - 'efficient' means that a postmunge drift assessment is only performed on the source columns (less information, but much more energy efficient)
  - 'report_effic' means that the efficient assessment is performed and returned without processing the data
  - 'report_full' means that the full assessment is performed and returned without processing the data
- the results of a postmunge driftreport assessment are returned in the postreports_dict object from a postmunge call, structured as follows:

postreports_dict = \
{'featureimportance':{(not shown here)},
 'finalcolumns_test':[columns],
 'driftreport': {(column) : {'origreturnedcolumns_list':[(columns)],
                             'newreturnedcolumns_list':[(columns)],
                             'drift_category':(category),
                             'orignotinnew': {(column):{'orignormparam':{(stats)}},
                             'newreturnedcolumn':{(column):{'orignormparam':{(stats)},
                                                            'newnormparam':{(stats)}}}}},
 'sourcecolumn_drift': {'orig_driftstats': {(column) : (stats)},
                        'new_driftstats' : {(column) : (stats)}}}

- new printouts in postmunge of source column stats as each source column is processed
- also fixed a bug in populating driftreport entries for postreports_dict
- also fixed a silly bug associated with a recent update, triggered by passing df_test=False
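The generic source column metrics listed above for a vanilla numeric set can be sketched with pandas. This is a minimal illustration under stated assumptions, not Automunge's implementation: the helper names `numeric_drift_stats` and `source_column_drift` are hypothetical, MAD is assumed here to mean mean absolute deviation, and the returned dict only mimics the `orig_driftstats` / `new_driftstats` layout of `sourcecolumn_drift`.

```python
import numpy as np
import pandas as pd


def numeric_drift_stats(column: pd.Series) -> dict:
    """Hypothetical helper: generic distribution metrics for one numeric source column."""
    values = pd.to_numeric(column, errors="coerce")  # non-numeric entries become NaN
    valid = values.dropna()
    stats = {
        "max": valid.max(),
        "quantile_99": valid.quantile(0.99),
        "quantile_90": valid.quantile(0.90),
        "quantile_66": valid.quantile(0.66),
        "median": valid.median(),
        "quantile_33": valid.quantile(0.33),
        "quantile_10": valid.quantile(0.10),
        "quantile_01": valid.quantile(0.01),
        "min": valid.min(),
        "mean": valid.mean(),
        "std": valid.std(),
        # Assumption: MAD taken as mean absolute deviation from the mean.
        "MAD": (valid - valid.mean()).abs().mean(),
        # Fraction of entries that are missing or non-numeric.
        "nan_ratio": values.isna().mean(),
    }
    return {key: float(value) for key, value in stats.items()}


def source_column_drift(df_train: pd.DataFrame, df_test: pd.DataFrame) -> dict:
    """Hypothetical helper: orig/new stats per numeric column, shaped like 'sourcecolumn_drift'."""
    numeric_columns = df_train.select_dtypes(include="number").columns
    return {
        "orig_driftstats": {c: numeric_drift_stats(df_train[c]) for c in numeric_columns},
        "new_driftstats": {
            c: numeric_drift_stats(df_test[c]) for c in numeric_columns if c in df_test
        },
    }


train = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, np.nan]})
test = pd.DataFrame({"x": [2.0, 4.0, 6.0, 8.0, 10.0]})
drift = source_column_drift(train, test)
print(drift["orig_driftstats"]["x"]["nan_ratio"])  # 0.2
print(drift["new_driftstats"]["x"]["mean"])  # 6.0
```

In practice the original stats would be recorded once when the training data is prepared and only the new stats recomputed for each later batch, which is part of why a source-column-only assessment is cheaper than recomputing every derived column's metrics.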

3.12

- new postmunge driftreport metrics assembled for binned numerical transformation categories, including:
  - column activation ratios for one-hot encoded categories:
    - pwrs / pwr2 / bins / bint / bnwd / bnwK / bnwM / bnep / bne7 / bne9
  - encoding activation ratios for ordinal encoded categories:
    - pwor / por2 / bnwo / bnKo / bnMo / bneo / bn7o / bn9o
- (driftreport metrics track distribution property drift between the original training data prepared with automunge(.) and inference data prepared with postmunge(.), when postmunge is passed driftreport=True)
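Activation ratios of this kind can be illustrated with a short pandas sketch: for one-hot columns, the fraction of rows in which each column is 1; for an ordinal encoding, the fraction of rows carrying each integer. The function names and the `x_bin_*` column headers are hypothetical and do not reflect Automunge's internal naming or code; the drift comparison (a simple difference of ratios) is likewise an assumed choice for illustration.

```python
import pandas as pd


def onehot_activation_ratios(encoded: pd.DataFrame) -> dict:
    """Fraction of rows in which each one-hot column is activated (equals 1)."""
    return {column: float((encoded[column] == 1).mean()) for column in encoded.columns}


def ordinal_activation_ratios(encoded: pd.Series) -> dict:
    """Fraction of rows carrying each ordinal integer encoding (missing values ignored)."""
    ratios = encoded.value_counts(normalize=True).sort_index()
    return {int(code): float(ratio) for code, ratio in ratios.items()}


def ratio_drift(orig: dict, new: dict) -> dict:
    """new - orig for each key; a key absent on one side is treated as ratio 0.0."""
    keys = sorted(set(orig) | set(new), key=str)
    return {key: new.get(key, 0.0) - orig.get(key, 0.0) for key in keys}


# Hypothetical one-hot bin columns for training data and inference data.
train_onehot = pd.DataFrame({"x_bin_0": [1, 0, 0, 0], "x_bin_1": [0, 1, 1, 1]})
test_onehot = pd.DataFrame({"x_bin_0": [1, 1, 0, 0], "x_bin_1": [0, 0, 1, 1]})
print(ratio_drift(onehot_activation_ratios(train_onehot), onehot_activation_ratios(test_onehot)))
# {'x_bin_0': 0.25, 'x_bin_1': -0.25}

print(ordinal_activation_ratios(pd.Series([0, 1, 1, 2])))  # {0: 0.25, 1: 0.5, 2: 0.25}
```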

3.11

- new family of processing functions for graining numerical sets into equal population bins
- i.e. bin boundaries are based on the set's distribution, so bins are narrower in high density portions of the set than in low density portions
- bnep/bne7/bne9 for one-hot encoded bins
- bneo/bn7o/bn9o for ordinal encoded bins
- first and last bins are unconstrained (i.e. -inf / inf)
- bin counts default to 5/7/9, e.g. for bnep/bne7/bne9
- bin count can also be user specified for distinct columns by passing parameter 'bincount' to assignparam, e.g. assignparam = {'bnep' : {'column1' : {'bincount' : 11}}}
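Equal population binning can be sketched by placing interior bin edges at training-set quantiles and opening the outer edges to -inf / inf. This is a minimal illustration, not Automunge's code: the helpers `lookup_param`, `fit_equal_population_edges`, `ordinal_bins` and `onehot_bins` are hypothetical, right-closed (a, b] intervals are an assumed convention, and `lookup_param` only shows how a per-column 'bincount' could be read from an assignparam-shaped dict.

```python
import numpy as np
import pandas as pd


def lookup_param(assignparam: dict, category: str, column: str, name: str, default):
    """Hypothetical helper: read a per-column parameter from an assignparam-shaped dict."""
    return assignparam.get(category, {}).get(column, {}).get(name, default)


def fit_equal_population_edges(train: pd.Series, bincount: int = 5) -> np.ndarray:
    """Interior edges at train quantiles 1/bincount, 2/bincount, ...; outer edges -inf / inf."""
    quantiles = np.linspace(0.0, 1.0, bincount + 1)[1:-1]
    interior = np.nanquantile(train.to_numpy(dtype=float), quantiles)
    # np.unique sorts and drops duplicate edges (e.g. from heavily repeated values).
    return np.unique(np.concatenate(([-np.inf], interior, [np.inf])))


def ordinal_bins(values: pd.Series, edges: np.ndarray) -> pd.Series:
    """Ordinal encoding: integer index of the (a, b] bin each value falls into."""
    codes = pd.cut(values.to_numpy(dtype=float), bins=edges, labels=False)
    return pd.Series(codes, index=values.index)


def onehot_bins(values: pd.Series, edges: np.ndarray, prefix: str) -> pd.DataFrame:
    """One-hot encoding: one 0/1 column per bin."""
    codes = ordinal_bins(values, edges)
    return pd.DataFrame(
        {f"{prefix}_{i}": (codes == i).astype(int) for i in range(len(edges) - 1)},
        index=values.index,
    )


assignparam = {"bnep": {"column1": {"bincount": 11}}}
print(lookup_param(assignparam, "bnep", "column1", "bincount", 5))  # 11
print(lookup_param(assignparam, "bnep", "column2", "bincount", 5))  # 5

train = pd.Series(np.arange(1.0, 11.0))  # 1.0 ... 10.0
edges = fit_equal_population_edges(train, bincount=5)
print(ordinal_bins(train, edges).value_counts().tolist())  # [2, 2, 2, 2, 2]
print(ordinal_bins(pd.Series([-100.0, 5.0, 100.0]), edges).tolist())  # [0, 2, 4]
```

Because the edges are fit on the training data and then reused, later data falling far outside the training range still lands in the unconstrained first or last bin rather than producing an unseen category.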

3.10

- code efficiency audit identified a few transformation categories with room for improvement
- revised categories: text/bxcx/pwrs/pwor/pwr2/por2/bins/bint/bsor/bnwd/bnwK/bnwM/bnwo/bnKo/bnMo
- every little bit helps

3.9

- new transformation function 'copy', which simply copies a column
- useful when applying a transformation to the same column more than once with different parameters
- accepts parameter 'suffix' for the suffix string appended to the column header
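The idea behind a copy transform can be shown in plain pandas: duplicate a column under a suffixed header so the original and the copy can each receive a different transformation. This is a hypothetical sketch, not Automunge's 'copy' implementation; the function name `copy_column`, the default suffix "_copy" and the header-collision check are assumptions for illustration.

```python
import pandas as pd


def copy_column(df: pd.DataFrame, column: str, suffix: str = "_copy") -> pd.DataFrame:
    """Return a new DataFrame with `column` duplicated under the header column + suffix."""
    new_header = column + suffix
    if new_header in df.columns:
        raise ValueError(f"header collision: {new_header!r} already exists")
    out = df.copy()
    out[new_header] = out[column]
    return out


df = copy_column(pd.DataFrame({"price": [10.0, 20.0, 30.0]}), "price", suffix="_copy")
# The original and the copy can now receive different transformations.
df["price"] = (df["price"] - df["price"].mean()) / df["price"].std()  # z-score
df["price_copy"] = (df["price_copy"] - df["price_copy"].min()) / (
    df["price_copy"].max() - df["price_copy"].min()
)  # min-max scaling
print(df["price"].tolist())  # [-1.0, 0.0, 1.0]
print(df["price_copy"].tolist())  # [0.0, 0.5, 1.0]
```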

3.7

- new processing functions for graining numerical sets into fixed width bins
- bnwd/bnwK/bnwM for one-hot encoded bins (columns without activations in the train set are excluded from both train and test data)
- bnwo/bnKo/bnMo for ordinal encoded bins (integers without train set activations are still included in the test set)
- first and last bins are unconstrained (i.e. -inf / inf)
- bin widths default to 1/1000/1000000, e.g. for bnwd/bnwK/bnwM
- bin width can also be user specified for distinct columns by passing parameter 'width' to assignparam, e.g. assignparam = {'bnwM' : {'column1' : {'width' : 200}}}
- found and fixed an edge case bug in MLinfill associated with columns whose entire set is subject to infill
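Fixed width binning, including the difference between the one-hot variant (bins with no training activations dropped) and the ordinal variant (those integers still possible in test data), can be sketched as follows. The helper names are hypothetical, and anchoring edges at multiples of `width` with left-closed [a, b) intervals is an assumption; the release notes do not specify either detail. Passing `width=1000.0` or `width=1000000.0` would correspond to the bnwK/bnwM defaults.

```python
import numpy as np
import pandas as pd


def fixed_width_edges(train: pd.Series, width: float = 1.0) -> np.ndarray:
    """Edges at multiples of `width` over the train range; outer bins open to -inf / inf."""
    lo = np.floor(train.min() / width) * width
    hi = np.ceil(train.max() / width) * width
    grid = np.arange(lo, hi + width / 2, width)  # lo, lo + width, ..., hi
    return np.concatenate(([-np.inf], grid[1:-1], [np.inf]))


def ordinal_bins(values: pd.Series, edges: np.ndarray) -> pd.Series:
    """Ordinal encoding: integer index of the [a, b) bin each value falls into."""
    codes = pd.cut(values.to_numpy(dtype=float), bins=edges, right=False, labels=False)
    return pd.Series(codes, index=values.index)


def onehot_bins(values: pd.Series, edges: np.ndarray, train_codes: pd.Series, prefix: str) -> pd.DataFrame:
    """One-hot encoding restricted to bins that were activated in the train set."""
    codes = ordinal_bins(values, edges)
    kept = sorted(int(code) for code in train_codes.dropna().unique())
    return pd.DataFrame(
        {f"{prefix}_{code}": (codes == code).astype(int) for code in kept},
        index=values.index,
    )


train = pd.Series([0.5, 2.7, 2.9])
edges = fixed_width_edges(train, width=1.0)  # [-inf, 1.0, 2.0, inf]
train_codes = ordinal_bins(train, edges)  # [0, 2, 2]: bin 1 has no train activations
test = pd.Series([1.5, 2.2])
print(ordinal_bins(test, edges).tolist())  # [1, 2]: integer 1 still appears in ordinal output
print(onehot_bins(test, edges, train_codes, "x").columns.tolist())  # ['x_0', 'x_2']
```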
