Automunge

Latest version: v8.33

5.32

- new dupl_rows parameter available for automunge(.) and postmunge(.)
- consolidates duplicate rows in a dataframe
- in other words, if duplicate rows are present only one of the duplicates is returned
- can be passed to automunge(.) as one of {True, False, 'test', 'traintest'}
- where True consolidates only rows found in train set
- 'test' consolidates only rows found in test set
- and 'traintest' separately consolidates rows found within train set and rows found within test set
- defaults to False for not activated
- can be passed to postmunge(.) as one of {True, False}
- where True consolidates rows in test set
- and defaults to False for not activated
- note that ID and label sets if included are consistently consolidated
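
The dupl_rows options above can be pictured with a minimal sketch in plain Python. This is not the Automunge implementation: the helper names `dedupe` and `apply_dupl_rows` are hypothetical, rows are modeled as tuples rather than dataframe rows, and the sketch assumes duplicates are detected on the feature rows only, with the same kept positions then applied to any ID or label lists.

```python
def dedupe(rows, companions=()):
    """Keep the first occurrence of each duplicate row.

    companions holds parallel lists (e.g. IDs, labels) that are
    filtered to the same kept positions, so they stay aligned.
    """
    seen = set()
    keep = []
    for i, row in enumerate(rows):
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    kept_rows = [rows[i] for i in keep]
    kept_companions = [[comp[i] for i in keep] for comp in companions]
    return kept_rows, kept_companions


def apply_dupl_rows(train, test, dupl_rows=False):
    """Mimic the automunge(.) options {True, False, 'test', 'traintest'}."""
    if dupl_rows not in (True, False, 'test', 'traintest'):
        raise ValueError("dupl_rows must be True, False, 'test', or 'traintest'")
    if dupl_rows in (True, 'traintest'):
        train, _ = dedupe(train)   # consolidate within the train set
    if dupl_rows in ('test', 'traintest'):
        test, _ = dedupe(test)     # consolidate within the test set
    return train, test
```

Note that with 'traintest' each set is consolidated on its own, so a row appearing once in train and once in test is kept in both.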

5.31

- new inversion option intended to support dense labels scenario
- i.e. when labels are served to ML model in multiple configurations for simultaneous predictions
- inversion postmunge parameter can now be passed as 'denselabels'
- which recovers label form derived from each path for comparison
- instead of relying on heuristic for single recovery from shortest transformation path
- each version of label recovery is returned with column header 'A***_B***'
- where A*** is labels column header originally passed to automunge
- and B*** is header of the transformed column that is basis for the inversion recovery
- currently denselabels inversion option only supported for label sets
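
As a small illustration of the 'A***_B***' header convention, the sketch below assembles returned headers from a label header and a set of transformed columns. The function name `collect_recoveries` and the example column names are hypothetical; only the header pattern itself comes from the notes above.

```python
def collect_recoveries(label_header, recoveries):
    """Key each recovered label version as '<label header>_<transformed header>'.

    recoveries maps the header of a transformed label column to the
    values recovered by inverting that column's transformation path.
    """
    return {
        f"{label_header}_{transformed_header}": values
        for transformed_header, values in recoveries.items()
    }


# hypothetical example: a label column 'target' recovered from two paths
recovered = collect_recoveries(
    'target',
    {'target_nmbr': [0.5, 1.5], 'target_bnry': [0, 1]},
)
# keys: 'target_target_nmbr', 'target_target_bnry'
```

Each recovery stays in its own column, so the versions can be compared side by side instead of being reduced to a single recovery.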

5.30

- quick small fix to hs10 transform
- fixed suffix appender assembly
- quick small fix to hash transform
- fixed edge case where first character in string is space

5.29

- reorganizing hash transforms
- realized the various accumulated permutations were starting to overcomplicate
- so consolidated to two master functions, hash and hs10
- with the other permutations available by variations on parameters to those functions
- permutations are as follows:
- hash: parsed words extracted from entries, returned in multiple columns, accepts parameter for excluded_characters and space
- hsh2: no word extraction, just hashing of unique entries, returned in one column
- hs10: comparable to hsh2 but hashings are binary encoded instead of integers, returned in multiple columns
- With each of these having an additional permutation with upstream uppercase conversion:
- hash/hsh2/hs10 -> Uhsh/Uhs2/Uh10
- in each of these cases vocab_size derived based on heuristic noted in 5.28 with parameters heuristic_multiplier and heuristic_cap to configure heuristic (defaulting to 2 and 1024)
- the heuristic derives vocab_size based on number of unique entries found in train set times the multiplier
- where if that result is greater than the cap then the heuristic reverts to the cap as vocab_size
- and where for hash the number of unique entries is calculated after extracting words from entries
- in each of these cases can also pass parameter for vocab_size to override heuristic and manually specify a vocab_size
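
To make the difference between the three permutations concrete, here is a rough sketch of what each returns for a single entry. It is an illustration under stated assumptions, not the library's code: the function names are hypothetical, `hashlib.md5` stands in for whatever hashing the library uses, the `n_columns` argument and the padding value 0 are assumptions, and the default excluded characters are chosen only for the example.

```python
import hashlib


def stable_hash(text, vocab_size):
    """Deterministically map a string to an integer in [0, vocab_size)."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest, 16) % vocab_size


def hsh2_like(entry, vocab_size):
    """No word extraction: one integer for the whole entry (one column)."""
    return stable_hash(entry, vocab_size)


def hs10_like(entry, vocab_size):
    """Same hashing as hsh2_like, returned as binary digits (multiple columns)."""
    width = max(1, (vocab_size - 1).bit_length())
    value = stable_hash(entry, vocab_size)
    return [int(bit) for bit in format(value, f"0{width}b")]


def hash_like(entry, vocab_size, n_columns,
              excluded_characters=(",", "."), space=" "):
    """Extract words, hash each one, and pad to a fixed number of columns."""
    for ch in excluded_characters:
        entry = entry.replace(ch, "")
    words = [w for w in entry.split(space) if w]  # drops empty strings
    codes = [stable_hash(w, vocab_size) for w in words][:n_columns]
    return codes + [0] * (n_columns - len(codes))


def uppercase_variant(encoder, entry, *args, **kwargs):
    """Uhsh/Uhs2/Uh10 analogue: uppercase the entry before encoding."""
    return encoder(entry.upper(), *args, **kwargs)
```

For example, `hash_like("hello, world", 16, 3)` yields three integers (two word hashes plus one pad), `hsh2_like("hello, world", 16)` yields one integer, and `hs10_like("hello, world", 16)` yields four binary digits.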

5.28

- extension of hash transforms rolled out in 5.24-5.26
- new root categories hsh3, Uhs3, hs11, Uh11
- they are similar to hsh2, Uhs2, hs10, Uh10
- but instead of accepting parameter for vocab_size, the vocab_size is determined automatically based on a heuristic
- more specifically, they accept parameters heuristic_multiplier and heuristic_cap
- where heuristic_multiplier defaults to 2 and heuristic_cap defaults to 1024
- the vocab_size is derived based on number of unique entries found in train set times the multiplier
- where if that result is greater than the cap then the heuristic reverts to the cap as vocab_size
- since this requires passing parameters between train and test sets, the implementation follows the dualprocess convention instead of singleprocess
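
The heuristic and the train-to-test hand-off can be sketched as follows. The parameter names heuristic_multiplier and heuristic_cap and their defaults come from the notes above; the function names, the use of `hashlib.md5`, and the sample data are assumptions made for illustration.

```python
import hashlib


def fit_vocab_size(train_entries, heuristic_multiplier=2, heuristic_cap=1024):
    """Derive vocab_size from the train set: unique count times multiplier, capped."""
    n_unique = len(set(train_entries))
    return min(n_unique * heuristic_multiplier, heuristic_cap)


def encode(entries, vocab_size):
    """Hash each entry to an integer in [0, vocab_size)."""
    return [
        int(hashlib.md5(e.encode("utf-8")).hexdigest(), 16) % vocab_size
        for e in entries
    ]


train = ["cat", "dog", "cat", "bird"]
test = ["dog", "fish"]

# fit on train only: 3 unique entries * 2 = 6, below the cap of 1024
vocab_size = fit_vocab_size(train)

# the fitted value is passed along so test is encoded on the same basis
train_encoded = encode(train, vocab_size)
test_encoded = encode(test, vocab_size)
```

Because vocab_size is fitted on the train set and reused for the test set, the transform needs a fit step and an apply step, which is what the dualprocess convention provides.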

5.27

- quick fix to hs10 transform rolled out yesterday
- associated with dtype of returned set
- i.e. converting strings to integers
- details matter
