- extension of driftreport methods to evaluate distribution property drift between sets
- in addition to the transformation category specific metrics measured for each derived column, a generic set of metrics is now also evaluated for each source column, based on the root category that was either assigned or inferred from an evaluation of set properties (the metrics applied are determined by the NArowtype entry in the root category's process_dict entry)
- as an example, for a vanilla numeric set, metrics are evaluated for: max / quantile_99 / quantile_90 / quantile_66 / median / quantile_33 / quantile_10 / quantile_01 / min / mean / std / MAD / nan_ratio
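As a rough illustration of the numeric metrics listed above, the following standalone sketch computes them with the Python standard library only. It is not the library's implementation: the helper names `quantile` and `numeric_drift_stats` are hypothetical, quantiles use linear interpolation, std is taken as the sample standard deviation, and MAD is assumed to mean mean absolute deviation.

```python
import math
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def numeric_drift_stats(column):
    """Hypothetical helper: summarize one numeric source column.

    Entries that are None or float NaN are treated as missing.
    """
    valid = sorted(
        v for v in column
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )
    if not valid:
        raise ValueError("column has no valid numeric entries")
    mean = statistics.fmean(valid)
    stats = {'max': valid[-1]}
    # Quantiles in the same order as the metric list above.
    for name, q in [('quantile_99', 0.99), ('quantile_90', 0.90),
                    ('quantile_66', 0.66), ('median', 0.50),
                    ('quantile_33', 0.33), ('quantile_10', 0.10),
                    ('quantile_01', 0.01)]:
        stats[name] = quantile(valid, q)
    stats['min'] = valid[0]
    stats['mean'] = mean
    # Assumption: sample standard deviation (undefined for a single point, so 0.0).
    stats['std'] = statistics.stdev(valid) if len(valid) > 1 else 0.0
    # Assumption: MAD is the mean absolute deviation around the mean.
    stats['MAD'] = sum(abs(v - mean) for v in valid) / len(valid)
    stats['nan_ratio'] = (len(column) - len(valid)) / len(column)
    return stats

stats = numeric_drift_stats([1.0, 2.0, 3.0, 4.0, None, float('nan')])
```

Computing the same dictionary for the original and the new data gives two sets of stats that can be compared metric by metric.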
- a user can now pass driftreport to a postmunge call as any of {False, True, 'efficient', 'report_effic', 'report_full'}
- False means no postmunge drift assessment is performed
- True means a full assessment is performed for both the source column and derived column stats
- 'efficient' means that a postmunge drift assessment is only performed on the source columns (less information but much more energy efficient)
- 'report_effic' means that the efficient assessment is performed and returned with no processing of data
- 'report_full' means that the full assessment is performed and returned with no processing of data
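The five accepted driftreport values can be summarized as a small lookup. The sketch below is only a reading aid for the options listed above: `interpret_driftreport` and its flag names are hypothetical and are not part of the library.

```python
def interpret_driftreport(driftreport):
    """Hypothetical helper: describe what each driftreport value implies.

    Returns flags for whether source column stats and derived column stats
    are assessed, and whether the data itself is processed.
    """
    string_options = {
        'efficient':    {'source_stats': True, 'derived_stats': False, 'process_data': True},
        'report_effic': {'source_stats': True, 'derived_stats': False, 'process_data': False},
        'report_full':  {'source_stats': True, 'derived_stats': True,  'process_data': False},
    }
    # Identity checks so that 0 and 1 are not silently accepted as booleans.
    if driftreport is False:
        return {'source_stats': False, 'derived_stats': False, 'process_data': True}
    if driftreport is True:
        return {'source_stats': True, 'derived_stats': True, 'process_data': True}
    if isinstance(driftreport, str) and driftreport in string_options:
        return string_options[driftreport]
    raise ValueError(f"unsupported driftreport value: {driftreport!r}")

flags = interpret_driftreport('report_effic')
```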
- the results of a postmunge driftreport assessment are returned in the postreports_dict object returned from a postmunge call, as follows:
postreports_dict = \
{'featureimportance':{(not shown here)},
 'finalcolumns_test':[(columns)],
 'driftreport': {(column) : {'origreturnedcolumns_list':[(columns)],
                             'newreturnedcolumns_list':[(columns)],
                             'drift_category':(category),
                             'orignotinnew': {(column):{'orignormparam':{(stats)}}},
                             'newreturnedcolumn':{(column):{'orignormparam':{(stats)},
                                                            'newnormparam':{(stats)}}}}},
 'sourcecolumn_drift': {'orig_driftstats': {(column) : (stats)},
                        'new_driftstats' : {(column) : (stats)}}}
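Once a report is in hand, the 'sourcecolumn_drift' entry shown above can be walked to compare original and new stats per source column. The sketch below assumes each (stats) entry is a dictionary keyed by metric name. `compare_source_drift` is a hypothetical helper, and the example values are placeholders for illustration rather than real output.

```python
def compare_source_drift(postreports_dict, stat):
    """Hypothetical helper: new minus original value of one stat per source column."""
    drift = postreports_dict['sourcecolumn_drift']
    orig = drift['orig_driftstats']
    new = drift['new_driftstats']
    deltas = {}
    for column, orig_stats in orig.items():
        new_stats = new.get(column, {})
        # Only compare when the stat was recorded for both sets.
        if stat in orig_stats and stat in new_stats:
            deltas[column] = new_stats[stat] - orig_stats[stat]
    return deltas

# Placeholder report for illustration only; column name and numbers are made up.
example_reports = {
    'sourcecolumn_drift': {
        'orig_driftstats': {'age': {'mean': 40.0, 'nan_ratio': 0.0}},
        'new_driftstats':  {'age': {'mean': 42.5, 'nan_ratio': 0.1}},
    }
}
mean_shift = compare_source_drift(example_reports, 'mean')
```

A large shift in a stat such as mean or nan_ratio is the kind of signal this comparison is meant to surface; what counts as large depends on the application.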
- new printouts included in postmunge for source column stats when processing each source column
- also fixed bug in populating driftreport entries for postreports_dict
- also fixed a silly bug, introduced in a recent update, that was triggered by passing df_test=False