Automunge

Latest version: v8.33

2.94

- new automunge parameter assignparam, for passing column-specific parameters to transformation functions
- thus allowing more tailored processing methods without having to redefine processing functions
- Automunge now a 'universal' function programming language
- updated various functions to support the method
- processing functions now have a new optional parameter which defaults to params={}
- logistics further detailed in READ ME
- 'splt' family of transforms now accept 'minsplit' parameter to customize the minimum character length threshold for overlap detection
- fixed bug for NArowtype's 'parsenumeric', 'parsecommanumeric', and 'exclude'
- fixed bug for driftreport when set includes a category 'null'
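
The column-specific parameter passing described in this release can be sketched roughly as below. This is an illustrative pattern only, not Automunge's implementation: `shared_substrings`, `apply_transforms`, the default `minsplit` of 5, and the nested `{transform: {column: {parameter: value}}}` layout are all assumptions made for the example, and the README remains the authority on the real `assignparam` format.

```python
def shared_substrings(values, params={}):
    """Hypothetical transform: collect substrings shared between unique entries.

    Mirrors the changelog's convention of an optional params argument that
    defaults to an empty dict. The dict is only read, never mutated.
    """
    minsplit = params.get('minsplit', 5)  # assumed default, for illustration
    unique = sorted(set(values))
    overlaps = set()
    for a in unique:
        for b in unique:
            if a == b:
                continue
            # try every substring of a that has at least minsplit characters
            for length in range(len(a), minsplit - 1, -1):
                for start in range(len(a) - length + 1):
                    piece = a[start:start + length]
                    if piece in b:
                        overlaps.add(piece)
    return overlaps


def apply_transforms(columns, transform_name, transform, assignparam={}):
    """Hypothetical dispatcher: look up per-column parameters, then transform."""
    results = {}
    for name, values in columns.items():
        # columns without an entry fall back to the transform's defaults
        params = assignparam.get(transform_name, {}).get(name, {})
        results[name] = transform(values, params)
    return results


columns = {
    'address': ['north street', 'south street'],
    'city': ['oakland', 'auckland'],
}
# only 'address' gets a custom threshold; 'city' uses the default
assignparam = {'splt': {'address': {'minsplit': 6}}}
results = apply_transforms(columns, 'splt', shared_substrings, assignparam)
```

The point of the pattern is that the transform itself is written once, and tailoring happens through the lookup dictionary rather than by redefining the function per column.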

2.92

- changed default categorical processing to binary encoding via '1010' instead of one-hot encoding (labels will remain 'text')
- changed default numbercategoryheuristic to 63 (a column with <= 63 unique values will be binary encoded via '1010'; a column with more will be ordinal encoded via 'ord3')
- moved some edge cases for '1010' MLinfill from automunge(.) into the support functions
- fixed bug for '1010' MLinfill support functions
- changed default infill for '1010' encoding to all zeros
- simplified the evalcategory function to support customizations with evalcat
- fixed a bug in evalcategory function for ordinal encoding assignment
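
As a rough illustration of the new defaults, the sketch below picks an encoding from the unique-value count and produces a fixed-width binary encoding. It is a minimal sketch under stated assumptions, not the library's code: `choose_encoding` and `binary_encode` are hypothetical helpers, and reserving the all-zeros code for missing or unseen entries is one possible way to realize an all-zeros infill.

```python
def choose_encoding(n_unique, numbercategoryheuristic=63):
    """Hypothetical helper: pick an encoding label from the unique-value count."""
    return '1010' if n_unique <= numbercategoryheuristic else 'ord3'


def binary_encode(train_values, values):
    """Hypothetical helper: encode categories as fixed-width lists of bits.

    Assumption for this sketch: codes start at 1 so that the all-zeros
    pattern stays free for missing or unseen entries (the infill).
    """
    categories = sorted(set(train_values))
    # number of bits needed to represent the largest code, len(categories)
    width = max(1, len(categories).bit_length())
    codes = {cat: i + 1 for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        code = codes.get(v, 0)  # 0 means infill
        # most significant bit first
        encoded.append([(code >> bit) & 1 for bit in reversed(range(width))])
    return encoded


label = choose_encoding(n_unique=3)
rows = binary_encode(['a', 'b', 'c'], ['a', 'c', None, 'b'])
```

Compared with one-hot encoding, the number of output columns grows with the logarithm of the category count rather than linearly, which is the usual motivation for a binary encoding.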

2.90

- quick bug fix

2.89

- ML infill now available for '1010' binary encoded categorical sets
- Feature importance evaluation now available for '1010' binary encoded categorical label sets
- new MLinfilltype '1010' available for assignment in processdict
- thinking about making '1010' the new default for categorical sets instead of one-hot encoding, first will put some thought into specifics, to be continued...

2.88

- new processing root category family trees: or19 / or20
- comparable to or16 / or18
- but for numeric string parsing make use of nmc8
- which allows comma characters in numbers
- and make use of the same assumption as spl9 / sp10
- that set of unique values in test set is same or subset of train set
- (for more efficient postmunge)
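
The changelog says only that nmc8 allows comma characters in numbers, so the following is a guess at the general idea rather than a description of nmc8 itself. `parse_number` and its regular expression are hypothetical, and the sketch ignores signs and locale-specific separators.

```python
import re

# a digit, then any run of digits or commas, then an optional decimal part
_NUMBER_WITH_COMMAS = re.compile(r'\d[\d,]*(?:\.\d+)?')


def parse_number(text):
    """Hypothetical helper: extract the first number from a string.

    Commas inside the number are treated as thousands separators and dropped.
    Returns None when the string contains no digits.
    """
    match = _NUMBER_WITH_COMMAS.search(text)
    if match is None:
        return None
    return float(match.group().replace(',', ''))


parsed = [parse_number(s) for s in ['approx 1,234.5 sq ft', '1,000,000', 'n/a']]
```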

2.87

- new processing root category family trees: or15 / or16 / or17 / or18
- comparable to or11 / or12 / or13 / or14
- but incorporate an UPCS transform upstream of encodings
- for consistent encodings in case of string case discrepancies
- and make use of spl9 / sp10 instead of spl2 / spl5
- for assumption that set of unique values in test set is same or subset of train set
- (for more efficient postmunge)
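
The effect of placing an uppercase conversion upstream of an encoding can be sketched as follows. The helpers `fit_ordinal` and `apply_ordinal` are hypothetical and stand in for the UPCS and ordinal steps only loosely. The direct dictionary lookup also relies on the assumption that every value in the test set was already seen in the train set.

```python
def fit_ordinal(train_values):
    """Hypothetical helper: build an ordinal mapping from uppercased values."""
    categories = sorted({v.upper() for v in train_values})
    return {cat: i for i, cat in enumerate(categories)}


def apply_ordinal(mapping, values):
    """Hypothetical helper: encode values using the same uppercase step.

    Raises KeyError for a value never seen in training, since this sketch
    assumes the test set's unique values are a subset of the train set's.
    """
    return [mapping[v.upper()] for v in values]


mapping = fit_ordinal(['Cat', 'DOG', 'cat'])
encoded = apply_ordinal(mapping, ['dog', 'CAT', 'Dog'])
```

Without the uppercase step, 'Cat' and 'cat' would receive separate codes even though they name the same category.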
