Automunge

Latest version: v8.33

4.18

- new inversion option to pass list of columns to inversion parameter for partial recovery
- where list may include headers of source columns and/or returned columns with suffix appenders
- as an example, if inversion only desired for two columns 'col1' and 'col2'
- can pass the inversion parameter as ['col1', 'col2']
- note that columns not inverted are retained in the returned set
- inversion support added for excl family transforms excl/exc2/exc3/exc4/exc5/exc6
- found and fixed bug in inversion operation for labels column with excl transform
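
The partial recovery described for 4.18 can be pictured with a small standalone sketch. It does not call Automunge: the helper `invert_selected`, the `norm_params` mapping, and the sample values are all hypothetical, and z-score normalization with a `_nmbr` suffix is only assumed for illustration. The point is that the list may mix source headers and returned headers, and that columns left out of the list are retained unchanged.

```python
def invert_selected(returned, norm_params, inversion):
    """Invert only the z-score columns named in `inversion` (hypothetical helper).

    returned    : dict mapping returned header -> list of normalized values
    norm_params : dict mapping returned header -> (source header, mean, std)
    inversion   : list of source headers and/or returned headers
    """
    recovered = {}
    for returned_header, values in returned.items():
        source_header, mean, std = norm_params[returned_header]
        if source_header in inversion or returned_header in inversion:
            # Undo the z-score normalization: x = z * std + mean
            recovered[source_header] = [z * std + mean for z in values]
        else:
            # Columns not selected for inversion are retained as-is
            recovered[returned_header] = list(values)
    return recovered


# Illustrative data only, not taken from the library
returned = {
    "col1_nmbr": [-1.0, 0.0, 1.0],
    "col2_nmbr": [0.0, 2.0, -1.0],
    "col3_nmbr": [0.5, -0.5, 0.0],
}
norm_params = {
    "col1_nmbr": ("col1", 10.0, 2.0),
    "col2_nmbr": ("col2", 3.0, 0.5),
    "col3_nmbr": ("col3", 5.0, 4.0),
}

# One entry given as a source header, one as a returned header with suffix
partially_inverted = invert_selected(returned, norm_params, ["col1", "col2_nmbr"])
```

Here `partially_inverted` holds recovered `col1` and `col2` values alongside the untouched `col3_nmbr` column.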

4.17

- corrected support of nmEU family transforms for parsing numeric extracts for formats containing space as thousands delimiter
- added inversion support for nmr4/nmc4/nmE4 with full recovery
- added inversion support for nmr5/nmr6/nmr7/nmr8/nmr9/nmc5/nmc6/nmc7/nmc8/nmc9/nmE5/nmE6/nmE7/nmE8/nmE9 with partial recovery
- added source column drift stat assembly for nmrc/nmcm/nmEU families of transforms

4.16

- found and fixed bug in postmunge application of nmc7
- (was not stripping commas comparable to automunge nmc7)
- found and fixed incorrect processdict NArowtype entries for nmc4-nmc9
- rolling out new numeric string parsing family nmEU
- including nmEU/nmE2/nmE3/nmE4/nmE5/nmE6/nmE7/nmE8/nmE9
- similar to nmcm, which strips commas before testing extracts for numeric validity
- nmEU strips spaces and periods, then converts commas to periods
- such as to recognize numbers of international format embedded in strings
- and return a dedicated column of the extracts
- where nmE2/nmE5/nmE8 are followed by z-score normalization
- nmE3/nmE6/nmE9 are followed by min-max scaling
- and where 1-3, 4-6, and 7-9 are distinguished by assumptions of test set composition in relation to the train set
- i.e. 1-3 parse all entries in test set, 4-6 don't parse test set, and 7-9 only parse test set entries not found in train set
- inversion currently supported with full info recovery for nmEU/nmE2/nmE3
- nmEU family supported by new NArowtype parsenumeric_EU
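
The nmEU normalization rule and the three test-set assumptions can be sketched in plain Python. This is not the library's implementation: the function names, the regular expression used to find candidate extracts, and the mode labels `"all"`, `"none"`, and `"unseen"` are assumptions made for illustration. In particular, treating the 4-6 variants as a lookup against entries already parsed from the train set is one reading of "don't parse test set".

```python
import re


def parse_eu_number(entry):
    """Return the first international-format number embedded in `entry`, or None."""
    # Candidate extracts: runs of digits, spaces, periods, and commas
    # that start and end on a digit (assumed search strategy).
    for match in re.finditer(r"\d(?:[\d .,]*\d)?", entry):
        candidate = match.group(0)
        # Strip spaces and periods, then convert commas to periods
        normalized = candidate.replace(" ", "").replace(".", "").replace(",", ".")
        try:
            return float(normalized)
        except ValueError:
            continue  # e.g. more than one comma, so not a valid number
    return None


def parse_test_entries(test_entries, train_lookup, mode):
    """Apply one of three assumed test-set strategies.

    "all"    (variants 1-3): parse every test entry
    "none"   (variants 4-6): never parse, only reuse train-set results
    "unseen" (variants 7-9): parse only entries not found in the train set
    """
    results = []
    for entry in test_entries:
        if mode == "all":
            results.append(parse_eu_number(entry))
        elif mode == "none":
            results.append(train_lookup.get(entry))
        elif mode == "unseen":
            if entry in train_lookup:
                results.append(train_lookup[entry])
            else:
                results.append(parse_eu_number(entry))
        else:
            raise ValueError("unknown mode: " + repr(mode))
    return results


# Illustrative entries only
train_entries = ["1.234,56 EUR", "12,5 kg"]
train_lookup = {entry: parse_eu_number(entry) for entry in train_entries}
test_entries = ["12,5 kg", "7 000,25 USD"]

parsed_all = parse_test_entries(test_entries, train_lookup, "all")
parsed_none = parse_test_entries(test_entries, train_lookup, "none")
parsed_unseen = parse_test_entries(test_entries, train_lookup, "unseen")
```

In this sketch the "none" mode leaves the unseen entry `"7 000,25 USD"` as `None`, which is the efficiency trade-off: skipping the parse is cheaper but cannot handle entries absent from the train set.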

4.15

- revised collection of source column drift stats to only collect the range of unique entries when number of unique entries is below an arbitrary threshold of 500
- this avoids postprocess_dict file size issues with larger data sets, such as sets with all-unique entries
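
A minimal sketch of the thresholding idea, assuming a hypothetical helper `collect_unique_stats` and hypothetical dictionary keys (the library's actual drift stat names are not given in these notes): the unique entries are stored only when their count is below the threshold, so the stored stats stay small for high-cardinality columns.

```python
def collect_unique_stats(values, threshold=500):
    """Collect unique-entry stats, keeping the entries only below `threshold`."""
    unique = set(values)
    stats = {"nunique": len(unique)}
    if len(unique) < threshold:
        # Small enough to store; sorted by string form so mixed types do not fail
        stats["unique"] = sorted(unique, key=str)
    return stats


# Low-cardinality column: unique entries are kept
small_stats = collect_unique_stats(["a", "b", "a", "c"])

# All-unique column of 1000 entries: only the count is kept
large_stats = collect_unique_stats(range(1000))
```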

4.14

- new root categories or21, or22
- comparable to or19, or20 but make use of spl2/spl5 instead of spl9/sp10
- which allows string parsing to handle test set entries not found in the train set
- which is a trade-off vs efficiency
- also new UPCS parameter 'activate'
- can pass as boolean, defaults to True
- when False the UPCS character conversion is not performed, just pass-through
- such as may be useful in context of an or19 call
- note that to assign an UPCS parameter with assignparam
- the parameter should be passed to the associated transformation category
- e.g. UPCS is the processdict entry for the or19 transformation category, which is an entry in the or19 root category family tree
- so an UPCS 'activate' parameter can be passed to the UPCS function applied in the or19 root category through assignparam, using the or19 transformation category entry, as:
- assignparam = {'or19' : {'column1' : {'activate' : False}}}
- (this clarification intended for advanced users to avoid ambiguity)
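
How the 'activate' flag and the assignparam nesting fit together can be sketched without the library. The function `upcs`, the lookup helper `params_for`, and the assumption that the UPCS character conversion means uppercasing string entries are all illustrative and not taken from the Automunge source. Only the assignparam dictionary itself is copied from the note above.

```python
def upcs(values, activate=True):
    """Uppercase string entries (assumed conversion); pass through when not activated."""
    if not activate:
        return list(values)
    return [v.upper() if isinstance(v, str) else v for v in values]


def params_for(assignparam, category, column):
    """Look up parameters keyed by transformation category, then by column."""
    return assignparam.get(category, {}).get(column, {})


# Dictionary as given in the 4.14 notes
assignparam = {'or19': {'column1': {'activate': False}}}

entries = ["abc", "Def", 7]

# column1 has 'activate' set to False, so entries pass through unchanged
column1_result = upcs(entries, **params_for(assignparam, 'or19', 'column1'))

# column2 has no assigned parameters, so the default activate=True applies
column2_result = upcs(entries, **params_for(assignparam, 'or19', 'column2'))
```

Keying the outer dictionary by the transformation category (here 'or19') rather than by the function name is what the clarification in the notes is about.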

4.13

- small (immaterial) update to string parsing functions for code clarity
