Automunge

Latest version: v8.33

5.98

- new family of transforms for categoric encodings
- maxb (ordinal), matx (one-hot), ma10 (binary)
- for scenarios where a user wishes to put a cap on the number of activations
- such that any of the following assignparam parameters may be passed (demonstrated in the sketch after this list)
- maxbincount: set a maximum number of activations (integer)
- minentrycount: set a minimum number of entries in train set to register an activation (integer)
- minentryratio: set a minimum ratio of entries in train set to register an activation (float between 0-1)
- parameters default to False for inactive
- parameters may be passed in combination for redundant specifications if desired
- in each case consolidated entries are grouped in the top activation
- maxb transforms are each performed downstream of an ord3 (ordinal sorted by frequency)
- and matx is performed upstream of a onht (one hot encoding) and ma10 upstream of a 1010 (binary encoding)
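
For reference, a minimal sketch of capping activations, assuming the standard Automunge import convention, a hypothetical categoric column named 'color' in df_train, and the usual assignparam nesting of category -> column -> parameter; postprocess_dict is recovered here as the final entry of the returned set:

```python
# a minimal sketch: cap the number of activations for a one-hot encoding
# by assigning the 'matx' root category and passing cap parameters
from Automunge import *

am = AutoMunge()

# 'color' is a hypothetical categoric column in df_train
returned = am.automunge(
    df_train,
    assigncat={'matx': ['color']},
    assignparam={'matx': {'color': {'maxbincount': 8,
                                    'minentrycount': 20}}})

# the postprocess_dict for subsequent postmunge(.) calls is the final returned entry
postprocess_dict = returned[-1]
```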

5.97

- inspired by the new label smoothing parameter from 5.96
- added a new parameter to DP family of noise injection transforms
- DP family by default injects noise to train data but not to test data
- previously noise could optionally be injected to test data in postmunge via the traindata parameter
- now with the new 'testnoise' parameter, noise can be injected to test data by default in both automunge and postmunge
- testnoise defaults to False for no noise injected to test data, or True to activate (see the sketch below)
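
A hedged sketch of the new parameter, assuming a hypothetical numeric column named 'measurement' assigned to one of the DP family categories (DPmm shown here) and the usual assignparam nesting:

```python
# a minimal sketch: activating noise injection for test data via the new testnoise parameter
# ('measurement' is a hypothetical numeric column assigned to the DPmm noise category)
from Automunge import *

am = AutoMunge()

returned = am.automunge(
    df_train,
    df_test=df_test,
    assigncat={'DPmm': ['measurement']},
    assignparam={'DPmm': {'measurement': {'testnoise': True}}})

postprocess_dict = returned[-1]

# with testnoise activated, noise is injected to the df_test processed here
# and to data subsequently passed to postmunge(.)
```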

5.96

- added support for passing df_train and df_test as pandas Series instead of DataFrame
- new parameter for smth family of transforms (label smoothing transforms)
- boolean 'testsmooth', defaults to False, when True smoothing is applied to test data in both automunge and postmunge
- also updated family tree for label smoothing root categories lbsm and lbfs
- now when passing parameters through assignparam they can be passed directly to the root category (see the sketch below)
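
A hedged sketch of passing the new parameter directly to the lbsm root category, assuming a hypothetical label column named 'labels' and that the label column's root category can be assigned through assigncat:

```python
# a minimal sketch: label smoothing with smoothing also applied to test data
# via the new 'testsmooth' parameter, passed directly to the lbsm root category
from Automunge import *

am = AutoMunge()

# 'labels' is a hypothetical label column header; note df_train may now also
# be passed as a pandas Series per this update
returned = am.automunge(
    df_train,
    labels_column='labels',
    assigncat={'lbsm': ['labels']},
    assignparam={'lbsm': {'labels': {'testsmooth': True}}})

postprocess_dict = returned[-1]
```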

5.95

- corrected a small typo from 5.94, missed in testing, associated with backwards compatibility for deprecated parameters
- (the updated feature selection parameters still inspect the deprecated versions for now, although references have been stricken from documentation)

5.94

- inspired by the reduction in parameters of 5.93
- took a look at other parameters and found another opportunity to consolidate for simplicity
- so automunge(.) parameters featureselection / featurepct / featuremetric / featuremethod
- are now replaced and consolidated into featureselection / featurethreshold
- with equivalent functionality
- featureselection defaults to False, accepts {False, True, 'pct', 'metric', 'report'}
- where False turns off feature importance eval,
- True turns on
- 'pct' applies a feature importance dimensionality reduction to retain a % of features
- 'metric' applies a feature importance dimensionality reduction to retain features above a threshold metric
- and 'report' returns a feature importance report with no further processing of data
- featurethreshold is only inspected for use with 'pct' and 'metric'
- accepts a float between 0-1
- e.g. retain 95% of columns with 'pct', or retain features with metric > 0.03
- so to be clear, automunge(.) parameters featurepct / featuremetric / featuremethod are now deprecated
- replaced and consolidated into the parameters featureselection / featurethreshold (see the sketch below)
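
A hedged sketch of the consolidated parameters, assuming a hypothetical label column named 'labels' in df_train:

```python
# a minimal sketch: feature importance dimensionality reduction with the
# consolidated featureselection / featurethreshold parameters
from Automunge import *

am = AutoMunge()

# retain the top 95% of features per the feature importance evaluation
returned = am.automunge(
    df_train,
    labels_column='labels',
    featureselection='pct',
    featurethreshold=0.95)

postprocess_dict = returned[-1]

# or, to return a feature importance report without further processing of the data:
# returned = am.automunge(df_train, labels_column='labels', featureselection='report')
```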

5.93

- with the new smth family of transforms rolled out in 5.92, found opportunity to decrease the number of parameters for simplicity
- so automunge(.) parameters LabelSmoothing_train / LabelSmoothing_test / LabelSmoothing_val / LSfit are now deprecated
- as are postmunge(.) parameters LabelSmoothing / LSfit
- replaced by the transformation categories smth and fsmh
- (where smth is vanilla label smoothing and fsmh is fitted label smoothing)
- the new convention for smth and fsmh is that smoothing is only applied to training data
- so in automunge(.) validation sets and test sets are not smoothed
- and in postmunge(.) smoothing can optionally be applied by activating the traindata parameter (see the sketch after this list)
- the only tradeoff is that oversampling is no longer supported for smoothed labels on their own; it requires supplementing the smoothing transform with a categoric encoding like one-hot or ordinal
- this update results in a material simplification of code base surrounding label processing
- much cleaner this way
- also new root categories lbsm and lbfs equivalent to smth and fsmh but without the NArw aggregation intended for label sets
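
A hedged sketch of the new convention, assuming a hypothetical label column named 'labels', and a hypothetical df_additional that also carries that label column (the labelscolumn flag to postmunge is an assumption about how to signal its presence):

```python
# a minimal sketch: label smoothing applied to training data only, with the
# option to smooth labels in postmunge(.) by activating the traindata parameter
from Automunge import *

am = AutoMunge()

returned = am.automunge(
    df_train,
    labels_column='labels',
    assigncat={'smth': ['labels']})

postprocess_dict = returned[-1]

# by default postmunge(.) treats passed data as test data, so labels are not smoothed;
# traindata=True applies processing consistent with training data, including smoothing
# (df_additional is a hypothetical dataframe carrying the 'labels' column)
postreturned = am.postmunge(postprocess_dict, df_additional,
                            labelscolumn=True, traindata=True)
```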
