- new DP transformation family for differential privacy purposes
- such as to enable noise injection into training data (but not test data)
- includes root categories DPnm for numerical data, DPbn for boolean/binary ('bnry') encodings, DPod for ordinal encodings, DP10 for binary ('1010') encodings, and DPoh for one-hot encodings
- where DPnm injects Gaussian noise and accepts parameters for mu and sigma (defaulting to 0 and 0.06 respectively), as may be suitable for application to z-score normalized data
- DPbn injects Bernoulli-distributed activation flips into bnry encodings, accepting parameter flip_prob for the probability of a flip (defaults to 0.03)
- DPod / DPoh / DP10 change categoric activations per a Bernoulli draw to determine whether an entry is flipped, and when flipped the replacement activation is drawn with equal probability from the set of activations; these accept parameter flip_prob for the probability of a flip (defaults to 0.03)
- note that when passing parameters to the DP functions, be sure to key the parameter assignment by the transformation category associated with the transformation function in the family trees (which in some cases may differ from the assigned root category), as in the sketch below
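- for illustration, a minimal sketch of column assignment and parameter passing, assuming the assigncat / assignparam conventions and import idiom from the READ ME of this period; column names and data are hypothetical, and the parameter values shown are the noted defaults

```python
# a minimal sketch, not a definitive recipe
from Automunge import Automunger
import pandas as pd

am = Automunger.AutoMunge()

# hypothetical training data
df_train = pd.DataFrame({'numcol':  [0.1, -0.5, 1.2, 0.3],
                         'boolcol': [0, 1, 0, 1],
                         'catcol':  ['a', 'b', 'c', 'a']})

# assign DP root categories to columns; pass parameters via assignparam,
# keyed by the transformation category tied to the transformation function
# in the family tree (which may differ from the root category assigned here)
ret = am.automunge(df_train,
                   assigncat={'DPnm': ['numcol'],
                              'DPbn': ['boolcol'],
                              'DPoh': ['catcol']},
                   assignparam={'DPnm': {'numcol': {'mu': 0, 'sigma': 0.06}},
                                'DPbn': {'boolcol': {'flip_prob': 0.03}}})

# the returned tuple includes the processed sets and, as the final entry,
# the postprocess_dict for use in subsequent postmunge(.) calls
postprocess_dict = ret[-1]
```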
- DP operation in postmunge(.) is supported by a new "traindata" parameter to distinguish whether df_test is to be treated as train or test data (defaults to False for test data), as demonstrated below
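- continuing the sketch above, a postmunge(.) call with the new traindata parameter might look as follows (df_test hypothetical)

```python
# hypothetical additional data consistent with the df_train sketch above
df_test = pd.DataFrame({'numcol':  [0.4, -0.2],
                        'boolcol': [1, 0],
                        'catcol':  ['b', 'a']})

# traindata=True treats df_test as training data, activating DP noise injection
ret_train = am.postmunge(postprocess_dict, df_test, traindata=True)

# with the default traindata=False the same data is processed noise-free
ret_test = am.postmunge(postprocess_dict, df_test)
```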
- note that the methods make use of the numpy.random library without application of a random seed, so for now the DP family is randomized between applications (probably better this way, just making note)
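- for intuition, the presumed flavor of the unseeded numpy.random draws might resemble the following illustrative sketch (not the library internals; names and data are hypothetical)

```python
import numpy as np

# hypothetical columns: z-score normalized numeric, boolean, and ordinal
numcol = np.array([0.1, -0.5, 1.2, 0.3])
boolcol = np.array([0, 1, 0, 1])
ordcol = np.array([0, 2, 1, 2])
activation_set = np.array([0, 1, 2])

# DPnm-style: Gaussian noise injection (mu=0, sigma=0.06 defaults)
noised = numcol + np.random.normal(loc=0., scale=0.06, size=numcol.shape)

# DPbn-style: Bernoulli draw (flip_prob=0.03 default) for boolean flips
flips = np.random.binomial(1, 0.03, size=boolcol.shape)
flipped = np.where(flips == 1, 1 - boolcol, boolcol)

# DPod-style: Bernoulli draw for whether to change an entry, and when
# changed a replacement drawn with equal probability from the activations
change = np.random.binomial(1, 0.03, size=ordcol.shape)
replacement = np.random.choice(activation_set, size=ordcol.shape)
result = np.where(change == 1, replacement, ordcol)

# absent a np.random.seed(.) call the draws differ between applications
```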
- the idea for a DP noise injection transform was partly inspired by the TWiML differential privacy podcast series, as documented in the From the Diaries of John Henry essay "Differential Privacy"
- added a code comment about a potential deep copy operation for postprocess_dict in postmunge(.) so as not to edit the exterior object
- some cleanup to normalization_dict entries for various ordinal encoding transforms, such as to standardize on a few of the activation encoding dictionary formats