Automunge

Latest version: v8.33

5.98

- new family of transforms for categoric encodings
- maxb (ordinal), matx (one-hot), ma10 (binary)
- for scenarios where a user wishes to put a cap on the number of activations
- such that any of the following assignparam parameters may be passed (demonstrated in the sketch after this list)
- maxbincount: set a maximum number of activations (integer)
- minentrycount: set a minimum number of entries in train set to register an activation (integer)
- minentryratio: set a minimum ratio of entries in train set to register an activation (float between 0-1)
- parameters default to False for inactive
- parameters may be passed in combination for redundant specifications if desired
- in each case consolidated entries are grouped in the top activation
- maxb transforms are each performed downstream of an ord3 (ordinal sorted by frequency)
- and matx is performed upstream of a onht (one hot encoding) and ma10 upstream of a 1010 (binary encoding)
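
For reference, a minimal sketch of capping activations, assuming the standard Automunge import convention, a hypothetical categoric column named 'color' in df_train, and the usual assignparam nesting of category -> column -> parameter; postprocess_dict is recovered here as the final entry of the returned set:

```python
# a minimal sketch: cap the number of activations for a one-hot encoding
# by assigning the 'matx' root category and passing cap parameters
from Automunge import *

am = AutoMunge()

# 'color' is a hypothetical categoric column in df_train
returned = am.automunge(
    df_train,
    assigncat={'matx': ['color']},
    assignparam={'matx': {'color': {'maxbincount': 8,
                                    'minentrycount': 20}}})

# the postprocess_dict for subsequent postmunge(.) calls is the final returned entry
postprocess_dict = returned[-1]
```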

5.97

- inspired by the new label smoothing parameter from 5.96
- added a new parameter to DP family of noise injection transforms
- DP family by default injects noise to train data but not to test data
- previously noise could optionally be injected to test data in postmunge via the traindata parameter
- now with the new 'testnoise' parameter, noise can be injected to test data by default in both automunge and postmunge
- testnoise defaults to False for no noise injected to test data, or True to activate (see the sketch below)
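
A hedged sketch of the new parameter, assuming a hypothetical numeric column named 'measurement' assigned to one of the DP family categories (DPmm shown here) and the usual assignparam nesting:

```python
# a minimal sketch: activating noise injection for test data via the new testnoise parameter
# ('measurement' is a hypothetical numeric column assigned to the DPmm noise category)
from Automunge import *

am = AutoMunge()

returned = am.automunge(
    df_train,
    df_test=df_test,
    assigncat={'DPmm': ['measurement']},
    assignparam={'DPmm': {'measurement': {'testnoise': True}}})

postprocess_dict = returned[-1]

# with testnoise activated, noise is injected to the df_test processed here
# and to data subsequently passed to postmunge(.)
```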

5.96

- added support for passing df_train and df_test as pandas Series instead of DataFrame
- new parameter for smth family of transforms (label smoothing transforms)
- boolean 'testsmooth', defaults to False, when True smoothing is applied to test data in both automunge and postmunge
- also updated family tree for label smoothing root categories lbsm and lbfs
- now when passing parameters through assignparam they can be passed directly to the root category (see the sketch below)
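
A hedged sketch of passing the new parameter directly to the lbsm root category, assuming a hypothetical label column named 'labels' and that the label column's root category can be assigned through assigncat:

```python
# a minimal sketch: label smoothing with smoothing also applied to test data
# via the new 'testsmooth' parameter, passed directly to the lbsm root category
from Automunge import *

am = AutoMunge()

# 'labels' is a hypothetical label column header; note df_train may now also
# be passed as a pandas Series per this update
returned = am.automunge(
    df_train,
    labels_column='labels',
    assigncat={'lbsm': ['labels']},
    assignparam={'lbsm': {'labels': {'testsmooth': True}}})

postprocess_dict = returned[-1]
```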

5.95

- corrected a small typo from 5.94, missed in testing, associated with backwards compatibility for deprecated parameters
- (the updated feature selection parameters still inspect the deprecated versions for now, although references have been stricken from documentation)

5.94

- inspired by the reduction in parameters of 5.93
- took a look at other parameters and found another opportunity to consolidate for simplicity
- so automunge(.) parameters featureselection / featurepct / featuremetric / featuremethod
- are now replaced and consolidated into featureselection / featurethreshold
- with equivalent functionality
- featureselection defaults to False, accepts {False, True, 'pct', 'metric', 'report'}
- where False turns off feature importance eval,
- True turns on
- 'pct' applies a feature importance dimensionality reduction to retain a % of features
- 'metric' applies a feature importance dimensionality reduction to retain features above a threshold metric
- and 'report' returns a feature importance report with no further processing of data
- featurethreshold is only inspected for use with 'pct' and 'metric'
- accepts a float between 0-1
- e.g. retain 95% of columns with 'pct', or retain features with metric > 0.03
- so to be clear, automunge(.) parameters featurepct / featuremetric / featuremethod are now deprecated
- replaced and consolidated into the parameters featureselection / featurethreshold (see the sketch below)
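
A hedged sketch of the consolidated parameters, assuming a hypothetical label column named 'labels' in df_train:

```python
# a minimal sketch: feature importance dimensionality reduction with the
# consolidated featureselection / featurethreshold parameters
from Automunge import *

am = AutoMunge()

# retain the top 95% of features per the feature importance evaluation
returned = am.automunge(
    df_train,
    labels_column='labels',
    featureselection='pct',
    featurethreshold=0.95)

postprocess_dict = returned[-1]

# or, to return a feature importance report without further processing of the data:
# returned = am.automunge(df_train, labels_column='labels', featureselection='report')
```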

5.93

- with the new smth family of transforms rolled out in 5.92, found opportunity to decrease the number of parameters for simplicity
- so automunge(.) parameters LabelSmoothing_train / LabelSmoothing_test / LabelSmoothing_val / LSfit are now deprecated
- as are postmunge(.) parameters LabelSmoothing / LSfit
- replaced by the transformation categories smth and fsmh
- (where smth is vanilla label smoothing and fsmh is fitted label smoothing)
- the new convention for smth and fsmh is that smoothing is only applied to training data
- so in automunge(.) validation sets and test sets are not smoothed
- and in postmunge(.) smoothing can optionally be applied by activating the traindata parameter (see the sketch after this list)
- the only tradeoff is that oversampling is no longer supported for smoothed labels on their own; it requires supplementing the smoothing transform with a categoric encoding like one-hot or ordinal
- this update results in a material simplification of code base surrounding label processing
- much cleaner this way
- also new root categories lbsm and lbfs equivalent to smth and fsmh but without the NArw aggregation intended for label sets
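
A hedged sketch of the new convention, assuming a hypothetical label column named 'labels', and a hypothetical df_additional that also carries that label column (the labelscolumn flag to postmunge is an assumption about how to signal its presence):

```python
# a minimal sketch: label smoothing applied to training data only, with the
# option to smooth labels in postmunge(.) by activating the traindata parameter
from Automunge import *

am = AutoMunge()

returned = am.automunge(
    df_train,
    labels_column='labels',
    assigncat={'smth': ['labels']})

postprocess_dict = returned[-1]

# by default postmunge(.) treats passed data as test data, so labels are not smoothed;
# traindata=True applies processing consistent with training data, including smoothing
# (df_additional is a hypothetical dataframe carrying the 'labels' column)
postreturned = am.postmunge(postprocess_dict, df_additional,
                            labelscolumn=True, traindata=True)
```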
