Automunge

Latest version: v8.33

2.94

- new automunge parameter assignparam, for passing column-specific parameters to transformation functions
- thus allowing more tailored processing methods without having to redefine processing functions
- Automunge now a 'universal' function programming language
- updated various functions to support the method
- processing functions now have a new optional parameter which defaults to params={}
- logistics further detailed in README
- 'splt' family of transforms now accepts a 'minsplit' parameter to customize the minimum character length threshold for overlap detection
- fixed bug for NArowtype's 'parsenumeric', 'parsecommanumeric', and 'exclude'
- fixed bug for driftreport when set includes a category 'null'
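
The parameter-passing idea in this release can be pictured with a minimal sketch. It is an illustration only, not Automunge's code: the functions `toy_splt` and `apply_transform`, the nested layout {category: {column: {parameter: value}}}, and the fallback value of 5 are all assumptions. Only the names `assignparam`, `params`, and `minsplit` come from the notes above.

```python
def toy_splt(values, params=None):
    """Hypothetical transform: flag values at least `minsplit` characters long."""
    params = params or {}                  # stands in for the params={} default
    minsplit = params.get("minsplit", 5)   # fallback of 5 is an assumption
    return [int(len(str(v)) >= minsplit) for v in values]


def apply_transform(column, values, category, transforms, assignparam=None):
    """Look up column-specific params for a category and pass them through."""
    assignparam = assignparam or {}
    params = assignparam.get(category, {}).get(column, {})
    return transforms[category](values, params=params)


transforms = {"splt": toy_splt}
assignparam = {"splt": {"city": {"minsplit": 3}}}

# 'name' has no entry, so the transform falls back to its own default.
default_result = apply_transform("name", ["abc", "abcdef"], "splt", transforms, assignparam)
# 'city' receives minsplit=3 without redefining the transform itself.
tailored_result = apply_transform("city", ["abc", "abcdef"], "splt", transforms, assignparam)
```

The point of the design is that one transform definition serves every column, and only the columns named in the dictionary behave differently.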

2.92

- changed default categorical processing to binary encoding via '1010' instead of one-hot encoding (labels will remain 'text')
- changed default numbercategoryheuristic to 63 (a column with <= 63 unique values will be binary encoded via '1010'; a column with more will be ordinal encoded via 'ord3')
- moved some edge cases for '1010' MLinfill from automunge(.) into the support functions
- fixed bug for '1010' MLinfill support functions
- changed default infill for '1010' encoding to all zeros
- simplified the evalcategory function to support customizations with evalcat
- fixed a bug in evalcategory function for ordinal encoding assignment
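
As a rough sketch of the behavior described in this release, the snippet below selects an encoding from the unique-value count and produces a fixed-width binary encoding. The helper names `choose_encoding` and `binary_encode` are hypothetical, and starting the integer codes at 1 so that the all-zeros pattern is left free for infill is an assumption, not a confirmed detail of the library.

```python
def choose_encoding(n_unique, numbercategoryheuristic=63):
    """Pick '1010' at or below the threshold, 'ord3' above it."""
    return "1010" if n_unique <= numbercategoryheuristic else "ord3"


def binary_encode(train_values, values):
    """Encode values as fixed-width bit lists; values not seen in train become all zeros."""
    categories = sorted(set(train_values))
    # Codes start at 1 so the all-zeros pattern stays free for infill (an assumption).
    codes = {cat: i for i, cat in enumerate(categories, start=1)}
    width = max(1, len(categories).bit_length())
    return [[int(bit) for bit in format(codes.get(v, 0), f"0{width}b")] for v in values]


encoded = binary_encode(["a", "b", "c"], ["a", "c", "missing"])
```

Under these assumptions, three categories fit in two columns, where one-hot encoding would need three.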

2.90

- quick bug fix

2.89

- ML infill now available for '1010' binary encoded categorical sets
- Feature importance evaluation now available for '1010' binary encoded categorical label sets
- new MLinfilltype '1010' available for assignment in processdict
- thinking about making '1010' the new default for categorical sets instead of one-hot encoding, first will put some thought into specifics, to be continued...

2.88

- new processing root category family trees: or19 / or20
- comparable to or16 / or18
- but for numeric string parsing make use of nmc8
- which allows comma characters in numbers
- and make the same assumption as spl9 / sp10
- that set of unique values in test set is same or subset of train set
- (for more efficient postmunge)
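
To illustrate what allowing comma characters in numbers can mean for string parsing, here is a minimal sketch. It is not the nmc8 implementation: the function name `parse_comma_number` and the regular expression are assumptions made for illustration.

```python
import re

# Digits, optionally with embedded commas and a decimal part.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_comma_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = _NUMBER.search(str(text))
    if match is None:
        return None
    # Drop the comma separators before converting.
    return float(match.group().replace(",", ""))


parsed = parse_comma_number("approx 1,234.5 units")
```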

2.87

- new processing root category family trees: or15 / or16 / or17 / or18
- comparable to or11 / or12 / or13 / or14
- but incorporate an UPCS transform upstream of encodings
- for consistent encodings in case of string case discrepancies
- and make use of spl9 / sp10 instead of spl2 / spl5
- for assumption that set of unique values in test set is same or subset of train set
- (for more efficient postmunge)
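
The two ideas in this release, uppercasing before encoding and assuming test values are a subset of train values, can be sketched as follows. The helpers `fit_ordinal` and `encode` are hypothetical and use a plain ordinal mapping in place of the library's actual family trees.

```python
def fit_ordinal(train_values):
    """Build an uppercase-normalized value-to-integer mapping from the train set."""
    categories = sorted({str(v).upper() for v in train_values})
    return {cat: i for i, cat in enumerate(categories)}


def encode(values, mapping):
    """Uppercase each value, then look it up; raises KeyError on values unseen in train."""
    return [mapping[str(v).upper()] for v in values]


mapping = fit_ordinal(["Cat", "dog", "CAT"])
test_codes = encode(["cat", "DOG"], mapping)
```

Because the sketch assumes every test value already appeared in train, applying the mapping is a plain lookup with no handling for new values, which is where the efficiency gain would come from.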
