Automunge

Latest version: v8.33


Page 32 of 99

6.28

- performed a comprehensive audit of edge case printouts
- to ensure validation test results are being recorded in appropriate reports
- identified a few cases where validation checks were missing recorded results
- so added new entries to postprocess_dict['miscparameters_results']
- as trainID_column_valresult, testID_column_valresult, evalcat_valresult, validate_traintest_columnnumbercompare, validate_traintest_columnlabelscompare, validate_redundantcolumnlabels, validate_traintest_columnorder, numbercategoryheuristic_valresult
- and added new entries to postreports_dict['pm_miscparameters_results']
- as testID_column_valresult, validate_traintest_columnlabelscompare, validate_traintest_columnorder, validate_labelscolumn_string
- in the process developed some documentation defining various validation checks
- which am keeping as internal for time being
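
As an illustration of how the newly recorded entries might be inspected, here is a minimal sketch. The entry names are copied from the 6.28 notes above; the helper `collect_results` and the mock dictionary contents are hypothetical, since these notes do not describe the value stored under each entry.

```python
# Entry names added to postprocess_dict['miscparameters_results'] per the 6.28 notes.
ENTRIES_6_28 = [
    'trainID_column_valresult',
    'testID_column_valresult',
    'evalcat_valresult',
    'validate_traintest_columnnumbercompare',
    'validate_traintest_columnlabelscompare',
    'validate_redundantcolumnlabels',
    'validate_traintest_columnorder',
    'numbercategoryheuristic_valresult',
]


def collect_results(report, names):
    """Split the requested names into recorded results and missing entries.

    Hypothetical helper, not part of Automunge.
    """
    recorded = {name: report[name] for name in names if name in report}
    missing = [name for name in names if name not in report]
    return recorded, missing


# Mock stand-in for a returned postprocess_dict (the values are assumptions).
mock_postprocess_dict = {
    'miscparameters_results': {
        'trainID_column_valresult': False,
        'evalcat_valresult': False,
    }
}

recorded, missing = collect_results(
    mock_postprocess_dict['miscparameters_results'], ENTRIES_6_28
)
```

With a real `postprocess_dict` returned from automunge(.), the same lookup would show which of the listed validation results were recorded.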

6.27

- updated postmunge featureeval methods to include support for cases where automunge privacyencode was elected
- updated scikit random forest initializer to include support for parameters ccp_alpha and max_samples (which scikit introduced in 0.22)
- tweaked the pwor transform to circumvent a strange interaction that arises when np.nan serves as a key to a python dictionary
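
The notes do not say which interaction affected pwor, so the following is only a general illustration of why NaN is an awkward dictionary key. NaN never compares equal to itself, so a lookup succeeds only when the key is the very same object as the one stored. `np.nan` is a plain Python float, so the standard library is enough to show the behavior. The `nan_safe_key` helper and the `NAN_KEY` placeholder are hypothetical, shown as one possible workaround.

```python
import math

nan_a = float('nan')
nan_b = float('nan')  # a second, distinct NaN object

lookup = {nan_a: 'missing'}

# Same object: found, because dict lookup checks identity before equality.
same_object_found = nan_a in lookup

# Different NaN object: not found, because NaN != NaN.
other_object_found = nan_b in lookup

# One possible workaround: map every NaN to a single placeholder key.
NAN_KEY = 'NaN_placeholder'  # hypothetical sentinel


def nan_safe_key(value):
    """Return a placeholder for NaN values, otherwise the value itself."""
    if isinstance(value, float) and math.isnan(value):
        return NAN_KEY
    return value


safe_lookup = {nan_safe_key(nan_a): 'missing'}
found_via_placeholder = nan_safe_key(nan_b) in safe_lookup
```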

6.26

- a refinement of the conditional data type conversions for ordinal encodings
- which again convert transformation outputs to one of uint8/uint16/uint32
- based on the size of the encoding space
- found that our logic test for selection was off by a few integers
- now will maximally utilize capacity of data types
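
A minimal sketch of a capacity-based selection follows. It assumes ordinal codes start at 0 and that the encoding space size is the count of distinct codes. The function name and exact thresholds are illustrative, not taken from the library. The boundary cases show where an off-by-a-few test would waste capacity: an 8-bit unsigned integer holds exactly 256 codes (0 through 255).

```python
def conditional_uint_dtype(encoding_space_size):
    """Return the name of the smallest unsigned dtype that can hold
    codes 0 .. encoding_space_size - 1."""
    if encoding_space_size < 0:
        raise ValueError('encoding space size must be non-negative')
    for bits in (8, 16, 32):
        # A dtype with this many bits represents 2 ** bits distinct codes.
        if encoding_space_size <= 2 ** bits:
            return f'uint{bits}'
    raise ValueError('encoding space exceeds uint32 capacity')
```

The returned name could then be passed to an array or dataframe `astype` call.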

6.25

- new set of validations performed for scenarios with model training
- including ML infill, feature selection, and PCA
- now, prior to training, data is validated to confirm all entries are valid numeric
- which is automatic for most transforms in library, but there are a select few that may return non-numeric or NaN values
- such as excl, copy, shfl, strg
- if validation doesn't pass, a printout message results
- validation results are returned with other tests in postprocess_dict['miscparameters_results'] for automunge(.)
- and postreports_dict['pm_miscparameters_results'] for postmunge(.)
- removed some automunge(.) initializations of variables that were no longer used (multicolumntransform_dict and LSfitparams_dict)
- a few code comments added, a few tweaks to printouts
- added inplace support for copy transform
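
The kind of check described above can be sketched as follows. This is a standalone illustration on plain Python lists, not the library's implementation; the function name and the mock column contents are hypothetical.

```python
import math
from numbers import Real


def invalid_numeric_columns(columns):
    """Return the names of columns holding any non-numeric or NaN entry.

    `columns` maps a column name to a list of its values.
    """
    flagged = []
    for name, values in columns.items():
        for value in values:
            # Short-circuit so math.isnan is only called on real numbers.
            if not isinstance(value, Real) or math.isnan(value):
                flagged.append(name)
                break
    return flagged


# Mock data: passthrough-style columns may carry strings or NaN.
mock_columns = {
    'nmbr_column': [0.5, 1.0, -0.2],
    'excl_column': ['a', 1, 2],
    'copy_column': [1.0, float('nan'), 3.0],
}

flagged = invalid_numeric_columns(mock_columns)
```

A non-empty result would correspond to the case where a printout message is warranted before training.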

6.24

- updated returned data type convention for exc5 and exc8 which are for integer passthrough
- exc5 has mlinfilltype singlct as intended for encoded categoric sets
- exc8 has mlinfilltype integer as intended for continuous integer sets
- previously these were being returned as float data types based on floatprecision designation
- recast to set data type in the transformation, making use of new assignparam parameter integertype
- where integertype defaults to 'singlct' and can also be passed as 'integer'
- where for 'singlct' returned data type is conditional uint based on max in train set which is used as proxy for encoding space, which is the default for exc5
- and for 'integer' returned data type is int32, which is the default for exc8
- updated the logic test for powertransform = 'infill' to distinguish between assigning exc5 and exc8
- previously exc5 was the default for integer sets, unless the number of unique entries in the train set exceeded 75% of the number of rows, in which case the set was cast as exc8
- now a new scenario is added for exc8, for cases where any negative integers are found in the train set, allowing us to infer the set is a continuous integer
- new convention for floatprecision conversion to now only be applied to columns returned from transforms with MLinfilltype in {'numeric', 'concurrent_nmbr'} or columns returned from PCA
- instead of previous convention's use of a type check for float
- which in practice means that now excl passthrough columns will retain their data type from received data instead of deferring to floatprecision
- moved the application of floatprecision conversion in automunge and postmunge workflow to take place prior to excl suffix extraction to accommodate the MLinfilltype check
- which as side benefit means don't have to worry about privacy encodings
- finally, an update to methods for populating the column_map report
- which now will include empty set entries for source columns that had all of their derivations consolidated in a PCA or Binary dimensionality reduction
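
The exc5 / exc8 selection and the integertype data type convention described above can be sketched as follows. The function names are hypothetical. The 75% threshold and the negative-integer rule come from the notes; the uint boundaries are an assumption, since the notes say only that the type is conditional on the max found in the train set.

```python
def infer_integer_category(train_values, unique_ratio_threshold=0.75):
    """Choose between exc5 (encoded categoric integers) and exc8
    (continuous integers) for an integer train set."""
    n_rows = len(train_values)
    has_negative = any(value < 0 for value in train_values)
    mostly_unique = len(set(train_values)) > unique_ratio_threshold * n_rows
    return 'exc8' if has_negative or mostly_unique else 'exc5'


def dtype_for_integertype(integertype, train_max):
    """Map the integertype parameter to a returned data type name."""
    if integertype == 'integer':
        return 'int32'
    if integertype == 'singlct':
        # The max in the train set serves as a proxy for the encoding space.
        for bits in (8, 16, 32):
            if train_max <= 2 ** bits - 1:
                return f'uint{bits}'
        raise ValueError('max exceeds uint32 capacity')
    raise ValueError("integertype must be 'singlct' or 'integer'")
```

For example, a column of repeated 0/1 codes would be treated as exc5, while a column containing -1 or mostly unique values would be treated as exc8.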

6.23

- struck an unnecessary numpy conversion in PCA application
- revisited a heuristic in place for PCA application associated with identifying cases when didn't have enough samples for the number of features (aka the col_row_ratio)
- since this was previously struck in documentation went ahead and initialized variable so it isn't inspected in ML_cmnd
- but left the code in place in case later decide to reintroduce
- improved a convention used in _postprocess_DPod from accessing an upstream normalization_dict to accessing own (just needed to add an additional entry to DPod normalization_dict)
- conducted audit of returned data types from entire library
- where convention is returned continuous sets have data type not addressed in transformation function but are converted based on the floatprecision parameter externally
- boolean integer sets are cast as np.int8
- and ordinal encoded integer sets are given a conditional data type based on size of encoding space as either uint8, uint16, or uint32
- found a few transforms needing update to dtype convention
- recast bsor from int8 to conditional (since bin count is customizable by parameter)
- added missing data type casting for DPbn (int8) and DPod (conditional)
- added some documentation to the read me associated with returned data types in section on automunge(.) returned sets and postmunge(.) returned sets
- then just a few opportunities to consolidate transformations to reduce lines of code:
- consolidated bnry and bnr2 to use of a single common transformation function by adding parameter for infillconvention
- consolidated MADn and MAD3 to use of a single common transformation function by adding parameter for center
- consolidated transformation functions for shft, shf2, and shf3 into a single common transformation function (taking advantage of what is now standard across library allowing redundant transformations to common input column applied with different parameters)
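
The consolidation pattern can be sketched with the MADn / MAD3 case. This assumes the two transforms differ only in the value subtracted before scaling by the mean absolute deviation (the mean for MADn, the maximum for MAD3); that reading, the function name, and the accepted `center` values are assumptions rather than details given in these notes.

```python
def mad_normalize(values, center='mean'):
    """Scale values by their mean absolute deviation after subtracting
    either the mean or the max, selected by the center parameter."""
    mean = sum(values) / len(values)
    mad = sum(abs(value - mean) for value in values) / len(values)
    if mad == 0:
        mad = 1  # avoid dividing by zero for constant columns

    if center == 'mean':
        offset = mean
    elif center == 'max':
        offset = max(values)
    else:
        raise ValueError("center must be 'mean' or 'max'")

    return [(value - offset) / mad for value in values]


mean_centered = mad_normalize([1, 2, 3, 4, 5], center='mean')
max_centered = mad_normalize([1, 2, 3, 4, 5], center='max')
```

One function with a parameter replaces two near-duplicate functions, and the same input column can be transformed twice with different parameter settings.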
