Automunge

Latest version: v8.33


Page 22 of 99

6.89

- updated the read me processdict writeup to align with last rollout
- note that we have decided not to make further revisions to the medium version; that version is frozen
- audited automunge parameters for cases where internal operations could edit exterior (user-passed) objects through dictionary memory sharing; identified a few cases not yet mitigated, now resolved
- surveyed ML_cmnd usage for feature selection and consolidated to a single version (now using only the version returned from the automunge call, which includes any validation preparations)
- added function description to support function _trainFSmodel
- corrected code comments in support function _LabelFrequencyLevelizer to align with the code as drafted, regarding MLinfilltype being based on a tree category and not the root category
- (specifically, the levelizer takes into account the MLinfilltype of the labelctgy entry, which is a tree category, associated with the root category)
- the struck comments had been added in a recent code review, in which I believe I slightly confused the function variables labelscategory (the labelctgy tree category associated with the root category) and origcategory (the label root category); the code implementation long preceded the addition of the struck comments
- struck scenario for concurrent_ordl MLinfilltype in TrainLabelFreqLevel
- added TrainLabelFreqLevel support for scenario of labels with integer MLinfilltype labelctgy supplemented by multirt
- recent convention is that any code added to accommodate backward compatibility includes the phrase "backward compatibility" in its code comment, so that if we ever decide on a backward compatibility breaking update we could consolidate all of these scenarios at once (as used here, backward compatibility refers to passing postmunge a postprocess_dict that was populated in an earlier version than the one running postmunge)
- a few code comments added here and there for clarity
- struck some redundant MLinfilltype exclusions for boolexclude in support function _apply_am_infill and _apply_pm_infill
- struck concurrent_nmbr MLinfilltype exclusion from modeinfill and lcinfill in support function _apply_am_infill and _apply_pm_infill
- added stochastic_impute support for integer MLinfilltype
- columns of exclude MLinfilltype are now subject to any data type conversion based on the floatprecision parameter
- updated qbt1 family of transforms from exclude MLinfilltype to boolexclude to ensure exclusion from PCA dimensionality reduction
- this also results in qbt1 returned sets now being included in Binary dimensionality reduction and included in the boolean set in the returned columntype_report
- updated columntype_report so that MLinfilltype exclude is reported as numeric
- audited MLinfilltype and NArowtype inspections throughout codebase
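The dictionary memory-sharing issue audited in 6.89 comes from Python passing dictionaries by reference: if a function edits a received dictionary, or a nested dictionary inside a shallow copy of it, the caller's object changes too. A minimal sketch of the hazard and a deep-copy mitigation follows; the helper names and the `col1`/`cap` keys are hypothetical illustrations, not Automunge's internal functions or parameter schema.

```python
import copy


def add_internal_entry_unsafe(params: dict) -> dict:
    # Hypothetical helper: a shallow copy duplicates only the outer dict,
    # so the nested dict under "col1" is still shared with the caller.
    working = params.copy()
    working["col1"]["internal_entry"] = True
    return working


def add_internal_entry_safe(params: dict) -> dict:
    # Hypothetical helper: a deep copy duplicates nested dicts too,
    # so edits stay internal and the caller's object is untouched.
    working = copy.deepcopy(params)
    working["col1"]["internal_entry"] = True
    return working


user_unsafe = {"col1": {"cap": 10}}
add_internal_entry_unsafe(user_unsafe)
print(user_unsafe)  # {'col1': {'cap': 10, 'internal_entry': True}}

user_safe = {"col1": {"cap": 10}}
add_internal_entry_safe(user_safe)
print(user_safe)  # {'col1': {'cap': 10}}
```

A shallow `dict.copy()` is not sufficient for nested parameter dictionaries, which is why the sketch uses `copy.deepcopy`.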

6.88

- fixed the exc8 family tree to apply exc8 instead of exc5
- which impacts powertransform = 'infill'
- a few clarifications to read me regarding exc6/exc7/exc9
- surveyed family trees for cases where the single category in a family tree doesn't match the root category
- there are tradeoffs: populating a more common category as the tree category might be easier to understand when viewing transformdict without inspecting processdict, but we decided it would be more intuitive in this use to match the tree category to the root category, since that makes it less hassle to assign parameters in assignparam
- (several root categories were populated under each convention, and we decided it would be better to align to a single common convention; however, this update is limited to root categories with a single tree category other than NArw and without offspring, and root categories with more elaborate family trees keep the convention of populating the more common tree category)
- updated nbr2/mnm4/101d/ordd/texd/bnrd/nuld/exc7/exc9/lbnm/lbnb/lb10/lbos/lbte/lbbn for this purpose with comparable functionality
- update to lbos so defaultparams is consistent with lbor
- corrected lbbn MLinfilltype to binary
- new NArowtype option as 'binary'
- binary is now populated as the NArowtype for transforms in the bnry family, such as bnry/bnr2/DPbn/bnrd/lbbn
- and treats entries other than the two most frequent as targets for infill for purposes of NArw aggregation
- identified and mitigated a potential edge case for noise injection transforms associated with index matching
- found and fixed bug in pwor associated with scenario negvalues=False, originating from recent rewrite
- revised our rollout validation tests; there were a few root categories we had previously omitted for esoteric reasons, and rollout validations now cover all root categories in the library (this was aided by the new approach for MLinfilltype totalexclude)
- found and fixed bug in exc8 postprocess
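The new 'binary' NArowtype from 6.88 can be pictured with a short pandas sketch. This is an illustration under stated assumptions, not Automunge's implementation: the function name is hypothetical, and we assume missing entries are flagged alongside any entry outside the two most frequent values.

```python
import pandas as pd


def binary_narow(column: pd.Series) -> pd.Series:
    """Flag entries treated as infill targets under a 'binary'-style NArowtype.

    Hypothetical helper. Assumption for illustration: NaN/None entries are
    flagged, and any entry outside the two most frequent values is flagged.
    """
    # value_counts drops missing entries by default and sorts by descending count.
    top_two = column.value_counts().index[:2]
    # isin returns False for missing entries, so they are flagged as well.
    return (~column.isin(top_two)).astype(int)


col = pd.Series(["yes", "no", "yes", "maybe", None, "no", "yes"])
print(binary_narow(col).tolist())  # [0, 0, 0, 1, 1, 0, 0]
```

Here "maybe" is outside the two most frequent values ("yes" and "no"), so it is flagged like the missing entry and would be aggregated into NArw.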

6.87

- found and fixed a snafu originating from the autowhere rollout
- associated with populating a column for assignnan missing data injections
- was impacting assignnan['injections'] option, now resolved

6.86

- adjustment to label column treatment under automation in context of alternate powertransform scenarios
- added support for comparable label column treatment in powertransform scenarios {'excl', 'exc2', 'infill', 'infill2'}
- label column treatment for powertransform scenarios {False, True} remains comparable
- found and fixed bug associated with label columns applied to category excl
- ran an audit of the two support functions associated with 1010 conversions for ML infill
- which convert 1010 to onehot and vice versa
- revised support function for 1010 to onehot with comparable functionality to eliminate edge case
- found and fixed snafu in onehot to 1010 associated with derivation of binary form
- we anticipate these revisions will benefit performance of ML infill towards 1010 encodings which is the default categoric encoding under automation
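The two conversions audited in 6.86 map between a one-hot matrix and a binary "1010" encoding of the same categories. A minimal numpy sketch follows, assuming category k is encoded as the binary digits of k (most significant bit first); the function names and the clipping of out-of-range codes are illustrative assumptions rather than Automunge's support functions.

```python
import numpy as np


def onehot_to_binary(onehot: np.ndarray) -> np.ndarray:
    """Convert one-hot rows to binary ('1010'-style) rows.

    Assumption: the active column index k is written as the bits of k,
    most significant bit first, using ceil(log2(n_categories)) bits.
    """
    n_categories = onehot.shape[1]
    n_bits = max(1, int(np.ceil(np.log2(n_categories))))
    indices = onehot.argmax(axis=1)
    # Shift each index right by each bit position, then keep the lowest bit.
    shifts = np.arange(n_bits - 1, -1, -1)
    return (indices[:, None] >> shifts) & 1


def binary_to_onehot(binary: np.ndarray, n_categories: int) -> np.ndarray:
    """Invert onehot_to_binary by reading each row's bits as an integer."""
    n_bits = binary.shape[1]
    weights = 1 << np.arange(n_bits - 1, -1, -1)
    indices = binary.astype(int) @ weights
    # Assumption: bit patterns beyond the last category (possible when
    # n_categories is not a power of two) are clipped to the last category.
    indices = np.clip(indices, 0, n_categories - 1)
    return np.eye(n_categories, dtype=int)[indices]


onehot = np.eye(5, dtype=int)[[0, 3, 4, 1]]
binary = onehot_to_binary(onehot)
print(binary.tolist())  # [[0, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
print((binary_to_onehot(binary, 5) == onehot).all())  # True
```

With 5 categories, 3 bits leave 3 unused patterns, so the reverse conversion needs some policy for them; clipping is just one choice made for this sketch.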

6.85

- found and fixed a snafu originating from the 6.63 rewrite of evalcategory associated with the null scenario
- which is associated with received training feature sets of all missing data, which are struck
- it appears the all-NaN case was incorrectly being assigned to the default transform for numeric, now resolved
- added a new powertransform option as 'infill2'
- comparable to powertransform = 'infill' in that it is intended for data that is already numerically encoded, where only ML infill is desired without normalizations etc.
- the difference is that with infill2 the user may include non-numeric sets in the data, which are given an excl passthrough transform and excluded from the ML infill basis
- one way to think about it: infill ensures returned sets are all valid numeric, while infill2 only ensures that received majority numeric sets are returned as all valid numeric
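The infill / infill2 distinction from 6.85 can be sketched at the column level with pandas. The function name is hypothetical, and "majority numeric" is our own assumption here (more than half of the non-missing entries parse as numbers); this is not Automunge's evalcategory logic.

```python
import pandas as pd


def split_majority_numeric(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Partition columns into majority numeric vs. passthrough.

    Hypothetical helper. Assumption: a column is majority numeric when more
    than half of its non-missing entries parse as numbers.
    """
    numeric_cols, passthrough_cols = [], []
    for col in df.columns:
        present = df[col].dropna()
        parsed = pd.to_numeric(present, errors="coerce")
        share = parsed.notna().mean() if len(present) else 0.0
        (numeric_cols if share > 0.5 else passthrough_cols).append(col)
    return numeric_cols, passthrough_cols


df = pd.DataFrame({
    "a": [1.0, "2", None, "x"],
    "b": ["red", "blue", "red", 3],
})
numeric_cols, passthrough_cols = split_majority_numeric(df)
print(numeric_cols, passthrough_cols)  # ['a'] ['b']

# Majority numeric columns become all valid numeric, with unparseable entries
# as NaN targets for infill; passthrough columns would be left unchanged.
converted = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
```

In this reading, column "b" would get a passthrough treatment and stay out of the ML infill basis, while "x" in column "a" becomes a missing entry for infill.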

6.84

- transformation categories of MLinfilltype 'totalexclude' are now excluded from ML infill basis for other features
- by appending onto user specifications recorded in ML_cmnd['full_exclude']
- this serves purpose of allowing returned non-numeric data to automatically be excluded from ML infill basis
- which saves user the hassle of manually specifying exclusions
- 'totalexclude' is the MLinfilltype for the 'excl' transform (full passthrough) and is now also registered for all other categories that may return non-numeric data
- if a passthrough column should be included in the ML infill basis, it can instead be assigned to one of the other excl variants such as exc2, exc5, exc8
- clarifications added to read me library of transformations section on excl variants
- new NArowtype option as 'totalexclude', which is comparable to 'exclude' but source columns are also excluded from assignnan global option and nan conversions for missing data
- 'totalexclude' is now assigned as the NArowtype for the excl and null transforms
- updates to the processdict specifications associated with MLinfilltype, NArowtype, and concluding clarifications
- added clarifications of which of the MLinfilltypes exclude/boolexclude/ordlexclude/totalexclude are included in the ML infill basis
- added description of new NArowtype totalexclude
- rephrased the concluding clarifications to processdict specifications for improved clarity, we believe this rephrasing makes the distinction between MLinfilltype and NArowtype scope more clear
- updates to feature selection in both automunge and postmunge to accommodate non-numeric sets, using the basis exclusions specified under ML_cmnd['full_exclude'] as well as the new convention of MLinfilltype 'totalexclude' exclusions
- updates to inspect NArowtype instead of MLinfilltype in _convert_to_nan and _assignnan_convert
- update to the wrapper functions for custom_train so that final infill exclusions are based on MLinfilltype instead of NArowtype
- identified a few cases of redundant sort operations applied in sp19 and sbs3 as well as an unneeded search in a list, simplified
- moved the validations of all valid numeric entries preceding ML infill model training to be inspected for each trained model instead of once for all models (made sense since the feature exclusions may be tailored to each infill model)
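The 6.84 mechanism of appending 'totalexclude' categories onto ML_cmnd['full_exclude'] can be sketched in plain Python. The function names, the column names, and the mapping from returned columns to MLinfilltype are hypothetical stand-ins, not Automunge internals; the sketch only shows the merge without editing the user's ML_cmnd, plus a per-model basis.

```python
def infill_basis_exclusions(ml_cmnd: dict, mlinfilltype_by_column: dict) -> list:
    """Merge user exclusions with columns whose MLinfilltype is 'totalexclude'.

    Hypothetical helper: ml_cmnd may carry a 'full_exclude' list, and
    mlinfilltype_by_column maps returned column names to MLinfilltype strings.
    """
    # list(...) copies, so the user's ML_cmnd['full_exclude'] is never edited.
    exclusions = list(ml_cmnd.get("full_exclude", []))
    for column, mlinfilltype in mlinfilltype_by_column.items():
        if mlinfilltype == "totalexclude" and column not in exclusions:
            exclusions.append(column)
    return exclusions


def infill_basis(columns: list, target_columns: list, exclusions: list) -> list:
    """Basis features for one infill model: all columns except targets and exclusions."""
    return [c for c in columns if c not in target_columns and c not in exclusions]


ml_cmnd = {"full_exclude": ["user_col"]}
mlinfilltypes = {
    "num_col_nmbr": "numeric",
    "text_col_excl": "totalexclude",
    "cat_col_1010": "1010",
}
exclusions = infill_basis_exclusions(ml_cmnd, mlinfilltypes)
print(exclusions)  # ['user_col', 'text_col_excl']
print(ml_cmnd)     # {'full_exclude': ['user_col']}
print(infill_basis(list(mlinfilltypes), ["num_col_nmbr"], exclusions))  # ['cat_col_1010']
```

Because the basis is computed per target, the columns feeding each infill model can differ, which fits checking for all valid numeric entries per trained model rather than once for all models.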
