Automunge

Latest version: v8.33



4.18

- new inversion option to pass list of columns to inversion parameter for partial recovery
- where list may include headers of source columns and/or returned columns with suffix appenders
- as an example, if inversion is only desired for two columns 'col1' and 'col2'
- can pass the inversion parameter as ['col1', 'col2']
- note that columns not inverted are retained in the returned set
- inversion support added for excl family transforms excl/exc2/exc3/exc4/exc5/exc6
- found and fixed bug in inversion operation for labels column with excl transform
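The partial recovery described in 4.18 can be pictured with a toy sketch. This is not Automunge code: the function `partial_inversion`, the `inverters` mapping, the `_mnmx` suffixes and the inverse lambdas are all hypothetical names made up for illustration. It only shows the idea that a list entry may name either a source header or a returned header, and that columns not listed are retained unchanged.

```python
# Toy illustration only; names and structure are assumptions, not Automunge internals.
def partial_inversion(returned, inverters, inversion):
    """Invert only the columns named in `inversion`.

    returned:  dict mapping returned column header -> list of values
    inverters: dict mapping returned column header -> (source header, inverse function)
    inversion: list of source headers and/or returned headers (with suffix)
    """
    result = {}
    for column, values in returned.items():
        source, inverse = inverters.get(column, (None, None))
        if inverse is not None and (column in inversion or source in inversion):
            # requested for inversion: recover values under the source header
            result[source] = [inverse(v) for v in values]
        else:
            # not requested: retained in the returned set as-is
            result[column] = values
    return result


returned = {
    "col1_mnmx": [0.0, 0.5, 1.0],
    "col2_mnmx": [0.0, 1.0, 0.25],
    "col3_mnmx": [0.2, 0.4, 0.6],
}
inverters = {
    "col1_mnmx": ("col1", lambda v: v * 10 + 5),
    "col2_mnmx": ("col2", lambda v: v * 4),
    "col3_mnmx": ("col3", lambda v: v * 100),
}

# 'col1' is given as a source header, 'col2_mnmx' as a returned header
recovered = partial_inversion(returned, inverters, ["col1", "col2_mnmx"])
```

Here `recovered` holds inverted values under 'col1' and 'col2', while 'col3_mnmx' is carried through untouched.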

4.17

- corrected support of nmEU family transforms for parsing numeric extracts from formats that use a space as the thousands delimiter
- added inversion support for nmr4/nmc4/nmE4 with full recovery
- added inversion support for nmr5/nmr6/nmr7/nmr8/nmr9/nmc5/nmc6/nmc7/nmc8/nmc9/nmE5/nmE6/nmE7/nmE8/nmE9 with partial recovery
- added source column drift stat assembly for nmrc/nmcm/nmEU families of transforms

4.16

- found and fixed bug in postmunge application of nmc7
- (postmunge was not stripping commas the way the automunge nmc7 application does)
- found and fixed incorrect processdict NArowtype entries for nmc4-nmc9
- rolling out new numeric string parsing family nmEU
- including nmEU/nmE2/nmE3/nmE4/nmE5/nmE6/nmE7/nmE8/nmE9
- similar to nmcm, which strips commas before testing extracts for numeric validity
- nmEU strips spaces and periods, then converts commas to periods
- such as to recognize numbers of international format embedded in strings
- and return a dedicated column of the extracts
- where nmE2/nmE5/nmE8 are followed by z-score normalization
- nmE3/nmE6/nmE9 are followed by min-max scaling
- and where 1-3, 4-6, and 7-9 are distinguished by assumptions of test set composition in relation to the train set
- i.e. 1-3 parse all entries in test set, 4-6 don't parse test set, and 7-9 only parse test set entries not found in train set
- inversion currently supported with full info recovery for nmEU/nmE2/nmE3
- nmEU family supported by new NArowtype parsenumeric_EU
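The nmEU cleaning rule above (strip spaces and periods, then convert commas to periods, then test for numeric validity) can be sketched in a few lines. This is a simplified stand-in, not the library's implementation: `parse_eu_number` and the candidate regex are hypothetical, and Automunge's own search for extracts within a string may work differently.

```python
import re
from typing import Optional

# Hypothetical helper: runs of digits that may contain spaces, periods or commas
_CANDIDATE = re.compile(r"\d[\d .,]*\d|\d")


def parse_eu_number(text: str) -> Optional[float]:
    """Return the first valid numeric extract in `text`, read as international format."""
    for match in _CANDIDATE.finditer(text):
        candidate = match.group(0)
        # strip spaces and periods (thousands separators),
        # then convert the comma (decimal mark) to a period
        cleaned = candidate.replace(" ", "").replace(".", "").replace(",", ".")
        try:
            return float(cleaned)
        except ValueError:
            # not a valid number after cleaning; try the next candidate
            continue
    return None
```

With this sketch, a string such as "price 1.234,56 EUR" yields 1234.56, and a string with no valid extract yields None, which corresponds to the case the transform would treat as missing.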

4.15

- revised collection of source column drift stats to only collect the range of unique entries when number of unique entries is below an arbitrary threshold of 500
- this avoids postprocess_dict file size issues with larger data sets, such as sets with all-unique entries
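A minimal sketch of the thresholding idea follows. The function name `collect_drift_stats` and the dictionary keys are assumptions for illustration, not the actual postprocess_dict layout; only the threshold of 500 comes from the note above.

```python
UNIQUE_THRESHOLD = 500  # threshold stated in the release note


def collect_drift_stats(values, threshold=UNIQUE_THRESHOLD):
    """Record the unique entries only when there are fewer than `threshold` of them."""
    unique = set(values)
    stats = {"nunique": len(unique)}  # the count is cheap to store in either case
    if len(unique) < threshold:
        # small enough to store the full range of unique entries
        stats["unique_entries"] = sorted(unique, key=str)
    return stats


stats_small = collect_drift_stats(["a", "b", "a"])
stats_large = collect_drift_stats(range(1000))
```

For an all-unique column of 1000 entries, `stats_large` keeps only the count, so the stored statistics do not grow with the size of the data set.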

4.14

- new root categories or21, or22
- comparable to or19, or20 but make use of spl2/spl5 instead of spl9/sp10
- which allows string parsing to handle test set entries not found in the train set
- which is a trade-off vs efficiency
- also new UPCS parameter 'activate'
- can pass as boolean, defaults to True
- when False the UPCS character conversion is not performed, just pass-through
- such as may be useful in context of an or19 call
- note that to assign a UPCS parameter with assignparam
- the parameter should be passed to the associated transformation category
- e.g. UPCS is the processdict entry for the or19 category, which is the entry to the or19 root category family tree
- so an 'activate' parameter can be passed to the UPCS function applied in the or19 root category through assignparam, using the or19 transformation category entry, as:
- assignparam = {'or19' : {'column1' : {'activate' : False}}}
- (this clarification intended for advanced users to avoid ambiguity)
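The pass-through behaviour of 'activate' and the nesting of the assignparam dictionary can be illustrated with a toy sketch. The functions `upcs` and `lookup_params` are hypothetical stand-ins, and the sketch assumes that the UPCS character conversion is an uppercase conversion of string entries; only the assignparam dictionary itself is taken from the note above.

```python
def upcs(values, activate=True):
    """Toy stand-in for UPCS (assumed here to be uppercase conversion)."""
    if not activate:
        # conversion switched off: entries pass through unchanged
        return list(values)
    return [v.upper() if isinstance(v, str) else v for v in values]


def lookup_params(assignparam, category, column):
    """Fetch parameters keyed by transformation category, then by column header."""
    return assignparam.get(category, {}).get(column, {})


# dictionary as given in the release note
assignparam = {'or19': {'column1': {'activate': False}}}

# column1 has activate=False assigned, so its entries are passed through
passed_through = upcs(['abc', 'Def'], **lookup_params(assignparam, 'or19', 'column1'))

# column2 has no entry, so the default activate=True applies
converted = upcs(['abc', 'Def'], **lookup_params(assignparam, 'or19', 'column2'))
```

The point of the nesting is that the outer key is the transformation category ('or19'), not the function name UPCS, even though the parameter is consumed by the UPCS function.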

4.13

- small (immaterial) update to string parsing functions for code clarity
