Automunge

Latest version: v8.33

4.66

- new transforms sp19 and sp20
- which are extensions of sp15 and sp16 for string parsed encodings with concurrent activations
- but with activations aggregated into a binary encoding for reduced column count
- new transforms sbs3 and sbs4
- which are extensions of sbst and sbs2 for string parsed subset encodings
- but with activations aggregated into a binary encoding for reduced column count
- note that ML infill is expected to perform better on concurrent activations configuration
- but there may be scenarios with high dimensionality where binary is of benefit
- corrected 1010 binary inversion for edge case when one of the categoric entries overlapped with one of the binary encodings
- corrected 1010 binary inversion edge case for distinguishing floats and integers
- added fix for remote edge case scenario in 1010 transform associated with concurrent overlaps between received categoric entries, binary encodings, and a categoric entry with suffix added to address the first overlap
- found and fixed a bug or two associated with inversion operation in conjunction with Binary dimensionality reduction
- slight cleanup to sbst and sbs2 parsing algorithms to remove a redundant step
- new automunge parameter privacy_encode
- privacy_encode converts the returned column headers to integers to preserve privacy for downstream applications, e.g. for cases where pandasoutput is True
- conversion dictionaries matching original and encoded column headers are returned in postprocess_dict for reference
- note that inversion is supported with privacy_encode
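
The privacy_encode behavior described above can be pictured with a minimal sketch. This is plain Python, not Automunge's implementation: the function name encode_headers and the example column headers are hypothetical, and the only behavior taken from the notes is that headers become integers and that dictionaries for both directions are kept so the original headers can be recovered.

```python
def encode_headers(columns):
    """Replace column headers with integers; return new headers plus both lookup dicts."""
    forward = {name: i for i, name in enumerate(columns)}
    inverse = {i: name for name, i in forward.items()}
    return [forward[name] for name in columns], forward, inverse


# Hypothetical returned headers, used only for illustration.
returned_columns = ["age_nmbr", "city_1010_0", "city_1010_1"]

encoded, forward, inverse = encode_headers(returned_columns)

# The inverse dictionary is what makes recovering the original headers possible.
restored = [inverse[i] for i in encoded]
```

Keeping both dictionaries mirrors the note that conversion dictionaries are returned for reference, so a downstream consumer sees only integers while the owner of the dictionaries can still map back.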

4.65

- default infill for bkt1 and bkt2 (one-hot buckets) updated from mean insertion to no activation
- default infill for bkt3 and bkt4 (ordinal buckets) updated from mean insertion to unique activation
- where bkb3 and bkb4 (binary buckets) inherit infill from bkt3 and bkt4
- new string parse transform function sbst, standing for "subset"
- sbst is similar to sp15, but only parses to identify subsets of entries that overlap with a complete unique value string set
- (as opposed to comparing subsets of unique values to subsets of other unique values)
- sbst allows test set entries that weren't present in train set
- also new transform sbs2, comparable to sbst but incorporating the assumption that test set entries are a subset of train set entries for more efficient application
- note that sbst transforms accept minsplit parameter to set floor to overlap detection length and int_headers parameter for privacy preserving encodings
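
The subset idea behind sbst can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's parsing algorithm: the helper names find_subset_columns and encode_entry are hypothetical, and it is assumed that a column is created for each complete unique training value that occurs inside at least one other unique value, with minsplit acting as a floor on the length of a qualifying value.

```python
def find_subset_columns(train_values, minsplit=1):
    """Return the complete unique values that occur inside at least one other unique value."""
    uniques = sorted(set(train_values))
    return [
        u for u in uniques
        if len(u) >= minsplit and any(u != other and u in other for other in uniques)
    ]


def encode_entry(entry, columns):
    """One activation per column whose complete string occurs inside the entry."""
    return [1 if col in entry else 0 for col in columns]


train = ["north", "north west", "west", "south", "south west"]
columns = find_subset_columns(train, minsplit=3)  # ['north', 'south', 'west']

print(encode_entry("north west", columns))  # [1, 0, 1], two concurrent activations
print(encode_entry("east west", columns))   # [0, 0, 1], entry not seen in training
```

Because encoding only checks whether a stored complete value occurs inside the entry, an entry that never appeared in the training data can still be encoded, which is the property the notes attribute to sbst.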

4.64

- postmunge(.) now accepts numpy arrays without column headers when original data passed to automunge(.) was a dataframe
- inversion now accepts numpy arrays without column headers for both test set and label inversion

4.63

- new numeric binning variants for binary encoding
- as opposed to one-hot or ordinal encoding of bins
- available as root categories for binary bkb3, bkb4, bsbn, bnwb, bnKb, bnMb, bneb, bn7b, bn9b, pwbn
- which are extensions of ordinal variants bkt3, bkt4, bsor, bnwo, bnKo, bnMo, bneo, bn7o, bn9o, pwor
- which are extensions of one-hot variants bkt1, bkt2, bins, bnwd, bnwK, bnwM, bnep, bne7, bne9, pwrs
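
The difference between the three bin representations can be shown with a small sketch. It is plain Python and not tied to any particular transform listed above: the boundaries are made up, and the helper names bin_index, one_hot, and binary are hypothetical.

```python
from bisect import bisect_right


def bin_index(value, boundaries):
    """Ordinal bin id: the number of boundaries that are <= value."""
    return bisect_right(boundaries, value)


def one_hot(index, n_bins):
    """One column per bin, with a single activation."""
    return [1 if i == index else 0 for i in range(n_bins)]


def binary(index, n_bins):
    """The ordinal id written in base 2, using just enough columns for n_bins."""
    width = max(1, (n_bins - 1).bit_length())
    return [int(bit) for bit in format(index, f"0{width}b")]


boundaries = [0, 10, 20, 30, 40]  # 5 boundaries define 6 bins
n_bins = len(boundaries) + 1

idx = bin_index(25, boundaries)
print(idx)                   # 3
print(one_hot(idx, n_bins))  # [0, 0, 0, 1, 0, 0]
print(binary(idx, n_bins))   # [0, 1, 1]
```

With 6 bins the one-hot form needs 6 columns and the binary form needs 3, which is the reduced column count that motivates the binary variants.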

4.62

- new transform por3, for power of ten bins aggregated into a binary encoding
- such as may be useful with high variability of received data
- also found an edge case scenario for inversion operation in which a downstream encoding interferes with inversion operation on an upstream ordinal encoding by returning floats instead of integers
- so added an int conversion operation into inversion transforms associated with ordinal encodings
- also corrected the MLinfilltype processdict entries for spl5 and sp10 transforms from singlct to exclude
- (sort of an immaterial update just trying to make everything uniform)
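
A rough sketch of power of ten binning with a binary encoding, plus the int conversion during inversion, might look like this. It is an illustration only: the helper names are hypothetical, only positive values are handled, and how por3 treats zero, negative, or missing values is not covered by these notes.

```python
import math


def power_of_ten(value):
    """Exponent of the power-of-ten bin containing a positive value."""
    return math.floor(math.log10(value))


def fit_bins(train_values):
    """Sorted exponents seen in training; list position is the ordinal id."""
    return sorted({power_of_ten(v) for v in train_values})


def to_binary(ordinal, n_bins):
    """The ordinal id written in base 2, using just enough columns for n_bins."""
    width = max(1, (n_bins - 1).bit_length())
    return [int(bit) for bit in format(ordinal, f"0{width}b")]


def invert_ordinal(encoded, exponents):
    """Recover the exponent; cast first because a downstream step may hand back floats."""
    return exponents[int(encoded)]


exponents = fit_bins([0.5, 7, 42, 3500])     # [-1, 0, 1, 3]
ordinal = exponents.index(power_of_ten(42))  # 2
code = to_binary(ordinal, len(exponents))    # [1, 0]
recovered = invert_ordinal(2.0, exponents)   # 1
```

Indexing a list with 2.0 would raise a TypeError, so the int cast in invert_ordinal plays the same role as the int conversion added to the ordinal inversion transforms.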

4.61

- a few small cleanups:
- updated returned dtype from bsor transform to be integer instead of categoric (to be consistent with the other ordinal encodings)
- replaced transformation category designator 'DP06' with 'DPo6' to follow convention of other transforms from the series
- updated family trees for wkdo and mnto to downstream ordinal encoding by ordl instead of ord3 such as to maintain order of weekday / months from calendar in encoding
- added ID columns to printouts at conclusion of automunge(.) and postmunge(.)
