Automunge

Latest version: v8.33



5.92

- two new categoric encoding options incorporated into library
- with transformation categories 'smth' and 'fsmh'
- these borrow from the label smoothing options previously available for label sets
- and allow them to be applied to training data categoric encodings in addition to labels
- accepts parameter 'activation' to designate the value for activations
- as a float between 0.5 and 1, defaults to 0.9
- smth applies a one-hot encoding followed by label smoothing operation
- fsmh applies a one-hot encoding followed by a fitted label smoothing operation
- where fitted smoothing refers to fitting the null values to each category's activation frequency in relation to the current activation
- more info on label smoothing and fitted smoothing noted in essay "A New Kind of ML"
- (we still recommend the prior label smoothing parameters for target categoric labels, since they can distinguish between smoothing as applied to train / test / validation sets)
- inversion supported with full recovery
- also found and fixed a small bug in fitted label smoothing
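The two encodings can be pictured with the plain-Python sketch below. It is an illustration under stated assumptions, not Automunge's implementation, and the function names are hypothetical. For 'smth' it assumes the null entries split the remaining 1 - activation evenly so each row sums to 1. For 'fsmh' it interprets fitted smoothing as distributing that remainder across the non-activated categories in proportion to their frequency in the data.

```python
from collections import Counter


def smooth_onehot(values, activation=0.9):
    """Sketch of 'smth': one-hot encode, then apply label smoothing.

    Assumption: null entries split the remaining (1 - activation) evenly
    across the other categories, so every row sums to 1.
    """
    if not 0.5 <= activation <= 1:
        raise ValueError("activation should be a float between 0.5 and 1")
    categories = sorted(set(values))
    k = len(categories)
    null = (1 - activation) / (k - 1) if k > 1 else 0.0
    rows = [[activation if v == c else null for c in categories] for v in values]
    return categories, rows


def fitted_smooth_onehot(values, activation=0.9):
    """Sketch of 'fsmh': one-hot encode, then apply fitted label smoothing.

    Assumption: for a row activated on category v, each other category c
    receives a share of (1 - activation) proportional to its frequency.
    """
    if not 0.5 <= activation <= 1:
        raise ValueError("activation should be a float between 0.5 and 1")
    categories = sorted(set(values))
    freq = Counter(values)
    rows = []
    for v in values:
        # total frequency of the categories that are not activated in this row
        others_total = sum(freq[c] for c in categories if c != v)
        rows.append([
            activation if c == v else (1 - activation) * freq[c] / others_total
            for c in categories
        ])
    return categories, rows


def invert(categories, rows):
    """Recover the original entries by taking each row's largest value."""
    return [categories[max(range(len(row)), key=row.__getitem__)] for row in rows]
```

Taking the largest value per row gives full recovery whenever activation is above 0.5, because the null entries in a row then sum to less than the activation.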

5.91

- inspired by the success of 5.90, a further simplification to categoric defaults under automation
- removed a kind of weird singular scenario where training data sets with 3 unique entries were treated with one-hot encoding
- these are now instead treated with binarization, consistent with other categoric sets
- also increased the default for numbercategoryheuristic from 127 to 255
- (numbercategoryheuristic is the size of unique value counts beyond which sets are treated to hashing instead of binarization under automation)
- 255 unique values return an 8-column binarized set, since 1 activation set is reserved for missing data (255 + 1 = 256 = 2^8)
- this update does not impact backward compatibility
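The column arithmetic and the hashing threshold can be checked with the short sketch below. The helper names are hypothetical, and the sketch only models the numbercategoryheuristic cutoff described above; it ignores any other special cases the automation may apply.

```python
def binarized_width(n_unique):
    """Columns needed to binarize n_unique categories plus one reserved missing-data code.

    ceil(log2(n_unique + 1)) equals n_unique.bit_length() for n_unique >= 1,
    which avoids floating point rounding in log2.
    """
    return max(1, n_unique.bit_length())


def default_categoric_encoding(n_unique, numbercategoryheuristic=255):
    """Hypothetical helper: hashing beyond the heuristic, binarization otherwise."""
    return "hashing" if n_unique > numbercategoryheuristic else "binarization"
```

With the new default, a set with 255 unique values still binarizes into 8 columns, while 256 unique values cross the threshold and would be hashed.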

5.90

- a big simplification to label set encodings under automation
- realized we had accumulated too many scenarios; this way is much clearer
- now quite simply, numeric data is given pass-through (no normalization), categoric data is given ordinal encoding (alphabetical sorted encodings)
- other label encoding options documented in new section in library of transformations in read me
- also a small bug fix in feature selection originating from the new convention of returning single column sets as series
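The new label defaults amount to a simple rule, sketched below in plain Python. This is an illustration, not Automunge's code; the function name is hypothetical, and treating booleans as categoric is an assumption of the sketch.

```python
from numbers import Number


def encode_labels(values):
    """Sketch of the label defaults: numeric passes through, categoric is ordinal encoded.

    Returns the encoded labels and the category-to-integer mapping (None for numeric).
    """
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return list(values), None  # pass-through, no normalization
    # alphabetically sorted categories receive integers 0, 1, 2, ...
    ordered = sorted(set(values), key=str)
    mapping = {category: i for i, category in enumerate(ordered)}
    return [mapping[v] for v in values], mapping
```

Keeping the mapping alongside the encoded labels is what makes the ordinal encoding invertible back to the original categories.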

5.89

- update to convention for returned sets
- now single column pandas sets are returned as series instead of dataframes
- this decision was based on conventions of some downstream libraries for receiving labels
- kind of like how numpy arrays need to be flattened with ravel
- also a small tweak to the NArw update from the last rollout to reduce memory overhead
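The returned-set convention and the numpy analogy can be sketched as follows. The helper name is hypothetical and only illustrates the behavior described above; it is not part of Automunge's API.

```python
import numpy as np
import pandas as pd


def as_returned_set(df):
    """Return single-column DataFrames as a Series; leave wider frames unchanged."""
    if isinstance(df, pd.DataFrame) and df.shape[1] == 1:
        return df.iloc[:, 0]  # the Series keeps the column name as its .name
    return df


labels = as_returned_set(pd.DataFrame({"target": [0, 1, 1]}))

# the numpy analogue: a (3, 1) array flattened to shape (3,)
flat = np.array([[0], [1], [1]]).ravel()
```

Downstream libraries that expect a one-dimensional label vector can then take the returned Series directly, much as they would take a raveled array.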

5.88

- a housekeeping cleanup to processing function naming conventions
- had included the suffix '_class' dating back to the very earliest experiments
- in hindsight this had the potential to be a point of confusion
- so scrubbed that suffix
- processing function naming now follows convention process_ / postprocess_ / inverseprocess_
- where the suffix following each prefix is the transformation category returned in column_dict
- much cleaner this way
- also found a potential edge case for inconsistent processing between train and test data, associated with NArw aggregation for infill
- originating from the NArow assessment overwriting entries for the NArowtypes positivenumeric, nonzeronumeric, and nonnegativenumeric
- cleaned that up, issue resolved
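The naming convention can be illustrated with the small sketch below, which assumes the suffix after each prefix is the transformation category. The category 'dbl', its functions, and the registry are hypothetical stand-ins, not Automunge internals.

```python
PREFIXES = ("process_", "postprocess_", "inverseprocess_")


def scrub_class_suffix(name):
    """Drop the legacy '_class' suffix from a processing function name."""
    suffix = "_class"
    return name[: -len(suffix)] if name.endswith(suffix) else name


# hypothetical transformation category 'dbl' that doubles numeric entries
def process_dbl(values):
    return [v * 2 for v in values]


def inverseprocess_dbl(values):
    return [v / 2 for v in values]


REGISTRY = {"process_dbl": process_dbl, "inverseprocess_dbl": inverseprocess_dbl}


def lookup(prefix, category):
    """Resolve a processing function from its prefix and transformation category."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    return REGISTRY[prefix + category]
```

Because each name is just a prefix plus a category, the forward and inverse functions for a category can be found from the category alone.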

5.87

- now when passing a processdict entry to overwrite an internally defined processdict entry, you can set its functionpointer to point to its own category, and then only need to populate the entries you are overwriting
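The self-pointing functionpointer behaves like a merge, sketched below. This is a hypothetical model of the described behavior, not Automunge's resolution logic. The category 'xmpl' and its entry values are illustrative placeholders, and the plain-overwrite branch is an assumption of the sketch.

```python
def resolve_processdict_entry(category, user_processdict, internal_processdict):
    """Sketch of how a self-pointing functionpointer could fill in an overwrite.

    If the user entry's functionpointer names its own category, start from the
    internal entry and overlay only the keys the user supplied.
    """
    if category not in user_processdict:
        return dict(internal_processdict[category])
    user_entry = user_processdict[category]
    if user_entry.get("functionpointer") != category:
        # assumption: without a self-pointer the user entry replaces the internal one
        return dict(user_entry)
    merged = dict(internal_processdict.get(category, {}))
    merged.update({k: v for k, v in user_entry.items() if k != "functionpointer"})
    return merged


# illustrative entries for a hypothetical category 'xmpl'
internal = {"xmpl": {"NArowtype": "numeric", "MLinfilltype": "numeric", "labelctgy": "xmpl"}}
user = {"xmpl": {"functionpointer": "xmpl", "NArowtype": "positivenumeric"}}
resolved = resolve_processdict_entry("xmpl", user, internal)
```

Copying the internal entry before updating it leaves the internally defined processdict untouched.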
