Automunge

Latest version: v8.33

6.10

- important update
- found and fixed a single character bug
- in support function _convert_1010_to_onehot
- which I now know has been interfering with ML infill on binary encoded sets
- I expect this may result in an improvement to benchmarking experiments for performance of ML infill to categoric targets
- going to re-run the missing data infill paper benchmarks
- updates to follow after re-training
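
For context on what a binary-to-one-hot conversion involves, here is a minimal sketch. It is not the library's _convert_1010_to_onehot; the function name binary_to_onehot, the list-of-lists input, and the sorted column order are all assumptions made for illustration. The idea is that each distinct bit pattern in a binary ('1010') encoding stands for one category, so it maps to exactly one one-hot column.

```python
def binary_to_onehot(rows):
    """Map binary-encoded rows (lists of 0/1 bits) to one-hot rows.

    Hypothetical stand-in for illustration, not Automunge's implementation.
    Returns the one-hot rows and the bit patterns that label each column.
    """
    if not rows:
        return [], []
    # Each distinct bit pattern becomes one one-hot column (sorted order assumed).
    patterns = sorted({tuple(r) for r in rows})
    column_of = {p: i for i, p in enumerate(patterns)}
    onehot = []
    for r in rows:
        vec = [0] * len(patterns)
        vec[column_of[tuple(r)]] = 1  # exactly one activation per row
        onehot.append(vec)
    return onehot, patterns


encoded = [[0, 1], [1, 0], [0, 1], [1, 1]]
onehot, patterns = binary_to_onehot(encoded)
```

A one-character slip in a mapping like this (an off-by-one index, say) would quietly mislabel classes, which is the kind of error that is easy to miss.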

6.9

- added new classification to columntype_report
- as 'integer' for continuous integer sets
- previously these were grouped into 'continuous'
- figured worth a distinction for clarity
- differs from ordinal which is a categoric representation
- small cleanup: had some deprecated parameters in automunge(.) definition kept for backward compatibility
- associated with feature importance and label smoothing
- went ahead and struck for cleanliness purposes
- a very small cleanup: replaced an instance or two of searching within a list with searching within a set
- found a small opportunity for improved clarity of code
- by adding exclude mlinfilltype to the infill functions
- a few cleanups to README:
- settled on single convention for '=' placement in function call demonstrations
- a few corrections and clarifications to mlinfilltype descriptions
- added a description of or23 to library of transformations writeup
- finally, a big cleanup of a bunch of transformation category parameters
- for parameter adjinfill, which was to change default imputation for standard infill for a transformation category
- this was only included on a hunch and for running a few experiments
- which I think the ml infill validation experiments sufficiently demonstrated adj is not a good default convention
- so yeah, also one of those zen of python things, should be one and only one way to do it
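
The list-to-set cleanup mentioned above can be illustrated with a small sketch. The column names and the helper is_returned_column are hypothetical and not taken from the library. The point is only that membership tests give the same answer either way, while a set uses a hash lookup instead of a linear scan.

```python
# Hypothetical column names, for illustration only.
returned_columns = ['age', 'income', 'zip_NArw', 'zip_ord3']

# Build the set once, then reuse it for repeated membership checks.
returned_columns_set = set(returned_columns)


def is_returned_column(name, lookup=returned_columns_set):
    """Return True if name is among the returned columns (hash lookup)."""
    return name in lookup


# Same answers as searching the list, without scanning every element.
matches = [is_returned_column(c) for c in ['age', 'height']]
```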

6.8

- new 'integer' mlinfilltype available for assignment in processdict entries
- integer is for transforms that return single column integer sets
- and differs from 'singlct' in that singlct is for ordinal encodings subject to ml infill classification
- while integer is for continuous integer sets subject to ml infill regression
- in ml infill the imputation predictions after regression inference are subject to a rounding
- to conform to the integer form of other entries
- where rounding applies up or down to nearest integer based on 0.5 decimal midpoint
- recast transformation categories lngt and lnlg as integer mlinfilltype
- new exc8 and exc9 for passthrough integer sets with integer mlinfilltype (exc8 includes NArw option)
- update to the powertransform 'infill' option in scenario of integer sets above heuristic of 0.75 ratio of unique entries
- which are now applied to exc8 instead of exc2
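
The rounding step for the integer mlinfilltype could look something like the sketch below. The function name round_to_integer is hypothetical, and sending an exact 0.5 upward is an assumption, since the notes only say the 0.5 midpoint decides the direction. Note that Python's built-in round() sends ties to the nearest even integer, so it is not used here.

```python
import math


def round_to_integer(prediction):
    """Round a regression prediction to the nearest integer.

    Fractions below 0.5 round down, 0.5 and above round up
    (tie handling is an assumption, not confirmed by the release notes).
    """
    return math.floor(prediction + 0.5)


# Hypothetical regression outputs for an integer feature.
predictions = [2.4, 2.5, 3.5, -1.2]
imputed = [round_to_integer(p) for p in predictions]
```

The imputed values then share the integer form of the other entries in the column.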

6.7

- new 'infill' option for powertransform parameter
- intended for cases where data is already numerically encoded
- and user just wants to apply infill functions
- follows convention of deleting feature sets with no numeric entries in train set
- applying 'exc2' if any floats are present, or if all numeric entries are integers and the unique ratio > 0.75
- else applying 'exc5'
- where exc2 is passthrough numeric with infill for nonnumeric entries
- and exc5 is passthrough integer with infill for non integer entries
- also updated exc2 and exc5 family trees to include NArw when NArw_marker parameter activated
- created backup trees exc6 and exc7 for exc2 and exc5 without NArw
- also a few small cleanups to evalcategory support function
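
The category selection described above for powertransform='infill' can be sketched roughly as follows. This is not the library's evalcategory code. The function name select_infill_category is hypothetical, and computing the unique ratio as distinct numeric values divided by the count of numeric entries is an assumption.

```python
def select_infill_category(train_values, ratio_threshold=0.75):
    """Pick a passthrough category for one feature, per the 6.7 description.

    Returns None when the feature has no numeric entries (it would be deleted),
    'exc2' for floats or high-cardinality integers, else 'exc5'.
    """
    numeric = [
        v for v in train_values
        if isinstance(v, (int, float))
        and not isinstance(v, bool)
        and v == v  # NaN is the only value not equal to itself
    ]
    if not numeric:
        return None
    if any(isinstance(v, float) for v in numeric):
        return 'exc2'
    # Assumed definition: distinct values over count of numeric entries.
    unique_ratio = len(set(numeric)) / len(numeric)
    return 'exc2' if unique_ratio > ratio_threshold else 'exc5'


no_numbers = select_infill_category(['a', None])
with_floats = select_infill_category([1.5, 2, 'x'])
spread_integers = select_infill_category([1, 2, 3, 4])
repeated_integers = select_infill_category([0, 1, 0, 1, 1, 0, 0, 1])
```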

6.6

- sort of a housekeeping cleanup
- read somewhere that it is good python practice to use underscore in function naming
- where leading underscore indicates functions intended for internal use
- and no leading underscore for external use
- so performed a global update to all support functions to include leading underscore
- only functions without underscore for external use are automunge(.) and postmunge(.)
- also small cleanup to remove unused variable in infill application
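
As a small illustration of the naming convention, the sketch below uses a hypothetical class ExampleMunger with a made-up helper _strip_entry. Only the names automunge and postmunge come from the notes, and the method bodies here are placeholders rather than anything the library does.

```python
class ExampleMunger:
    """Hypothetical class showing public vs. internal method naming."""

    def automunge(self, rows):
        # Public entry point: no leading underscore.
        return [self._strip_entry(r) for r in rows]

    def postmunge(self, rows):
        # Public entry point: no leading underscore.
        return [self._strip_entry(r) for r in rows]

    def _strip_entry(self, entry):
        # Internal support function: leading underscore signals "not for external use".
        return entry.strip()


# Listing names without a leading underscore recovers the external interface.
public_names = [n for n in dir(ExampleMunger) if not n.startswith('_')]
```

The underscore is a convention only; Python does not block calls to such methods.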

6.5

- an update to infill application in automunge and postmunge
- now order of infill application is based on a reverse sorting of columns
- sorted by a count of a column's missing entries found in the train set
- this convention should be beneficial for ML infill
- as columns with most missing will have improved coherence for serving as basis of other columns
- postmunge order of infill consistent with order from automunge
- this update inspired by a similar convention applied in the MissForest R package
