Automunge

Latest version: v8.33

6.46

- reverted to convention that label sets with two unique values in the train set are given an lbbn root category instead of lbor under automation
- reverted to convention that traindata option is specific to postmunge in dual/singleprocess convention
- (in hindsight, having to align for validation data made this muddy; much cleaner to keep it a postmunge option, with all potential workflows supported through a combination of automunge and postmunge)
- lifted requirement for reserved strings in the keys of normalization_dict accessed in custom_train convention

6.45

- a few cleanups to code comments for custom_train_template and custom_test_template
- a small correction to read me on default transform for categoric labels under automation
- (ordinal applied to all categoric sets, even if 2 unique entries)
- added traindata entry to normalization_dict passed to custom_test in postmunge
- in the process, did a little rethinking of the whole strategy for the traindata option
- decided to introduce a traindata automunge parameter
- similar to traindata parameter in postmunge
- for purposes of distinguishing whether df_test will be treated as train or test data
- which is relevant to a handful of transforms in the library like noise injection and smoothing
- added traindata support to existing transforms where relevant
- and thus of course added traindata entry to normalization_dict passed to custom_test in automunge
- note that validation data prepared in automunge uses basis of automunge(.) traindata parameter
- note traindata differs from testnoise assignparam option available for noise transforms
- as testnoise turns on for all data in automunge and postmunge
- while traindata allows user to distinguish treatment between test data in automunge and postmunge
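
The sketch below is written in plain pandas/NumPy. It is not Automunge's implementation. It only shows how a noise-injection transform might decide whether to perturb a set under the 6.45 conventions. The function name `noise_transform`, its `is_train` flag, and the default `sigma` are hypothetical. The roles of `traindata` and `testnoise` come from the notes above.

```python
import numpy as np
import pandas as pd


def noise_transform(series: pd.Series, is_train: bool, traindata: bool = False,
                    testnoise: bool = False, sigma: float = 0.03,
                    seed=None) -> pd.Series:
    """Hypothetical noise-injection step illustrating the 6.45 traindata semantics.

    is_train:  True for the train set passed to automunge(.)
    traindata: treat a test set (df_test or postmunge data) as train data
    testnoise: inject noise into all data, train and test alike
    """
    apply_noise = is_train or traindata or testnoise
    if not apply_noise:
        # test data without either flag passes through unperturbed
        return series.copy()
    rng = np.random.default_rng(seed)
    # additive Gaussian noise; sigma is an assumed default for illustration
    return series + rng.normal(0.0, sigma, size=len(series))
```

With both flags off, test data passes through unchanged. Setting `traindata=True` gives a test set the same treatment as the train set, while `testnoise=True` turns noise on everywhere.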

6.44

- added post-transform data type conversion to custom_train and custom_test wrappers
- dtype conversion is based on MLinfilltype
- where numeric sets are converted elsewhere based on floatprecision parameter
- boolean integer sets are cast as np.int8
- ordinal sets are given a conditional dtype based on size of the encoding space (determined by max entry in train set)
- now transformations passed through custom_train convention are followed by a dtype conversion conditional on the assigned MLinfilltype
- where {'numeric', 'concurrent_nmbr'} have datatype conversion performed elsewhere based on floatprecision parameter
- {'binary', 'multirt', '1010', 'concurrent_act', 'boolexclude'} are cast as np.int8 since entries are boolean integers
- ordinal sets {'singlct', 'ordlexclude'} are given a conditional (uint 8/16/32) dtype based on size of encoding space as determined by max activation in train data
- {'integer', 'exclude', 'totalexclude'} have no conversion, assumes any conversion takes place in transformation function if desired
- also new processdict option as dtype_convert which can be passed as boolean, defaults to True when not specified
- when dtype_convert == False, data returned from custom_train for the category are excluded from dtype conversion
- dtype_convert is also inspected to exclude from floatprecision conversions in both the custom_train convention and dual/singleprocess conventions
- where floatprecision refers to the automunge(.) parameter to set default returned float type
- (in general, we use lower bandwidth data types as defaults for floats than pandas because we assume data is returned normalized; pandas generally defaults to float64 when not otherwise designated, while floatprecision defaults to 32 and can also be set to 16/64. We also try to use the smallest possible integer type for integer encodings, either int8 for boolean integers or uint8/16/32 for ordinal encodings based on the size of the encoding space. Passthrough columns via excl leave received data types intact. Continuous integer sets are based on whatever is applied in the transformation function.)
- small tweak to custom_train convention, now temporary columns logged as tempcolumns can have headers of other data types (like integers)
- settled on convention that integer mlinfilltype defaults to int32 data type unless otherwise applied in transformation function
- renamed the column_dict entry populated in custom_process_wrapper from defaultinfill_dict to custom_process_wrapper_dict for clarity (since now using to store properties for both infill and dtype conversion)
- rewrote function description for _assembletransformdict
- much clearer now
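
The dtype rules above can be sketched roughly as follows. This is a minimal illustration in plain pandas/NumPy, not the library's code. The helper names `ordinal_dtype` and `convert_dtypes` are hypothetical. The float conversion, which the library performs elsewhere via `floatprecision`, is folded in here for completeness. The uint thresholds are assumed to be the NumPy integer limits.

```python
import numpy as np
import pandas as pd

# MLinfilltype groupings taken from the 6.44 notes
FLOAT_TYPES = {'numeric', 'concurrent_nmbr'}
BOOLEAN_TYPES = {'binary', 'multirt', '1010', 'concurrent_act', 'boolexclude'}
ORDINAL_TYPES = {'singlct', 'ordlexclude'}
FLOAT_DTYPES = {16: np.float16, 32: np.float32, 64: np.float64}


def ordinal_dtype(max_activation):
    """Smallest unsigned integer type able to hold the largest train-set activation."""
    for dtype in (np.uint8, np.uint16, np.uint32):
        if max_activation <= np.iinfo(dtype).max:
            return dtype
    raise ValueError(f'encoding space too large: {max_activation}')


def convert_dtypes(df: pd.DataFrame, columns, mlinfilltype, train_max=None,
                   floatprecision=32, dtype_convert=True) -> pd.DataFrame:
    """Hypothetical post-transform dtype conversion keyed on MLinfilltype."""
    if not dtype_convert:
        # processdict dtype_convert=False opts the category out entirely
        return df
    if mlinfilltype in FLOAT_TYPES:
        target = FLOAT_DTYPES[floatprecision]
    elif mlinfilltype in BOOLEAN_TYPES:
        target = np.int8
    elif mlinfilltype in ORDINAL_TYPES:
        # size the encoding space from the train set so test data gets the same dtype
        target = ordinal_dtype(train_max)
    else:
        # 'integer', 'exclude', 'totalexclude': keep whatever the transform returned
        return df
    for col in columns:
        df[col] = df[col].astype(target)
    return df
```

When converting test data, pass the train set's maximum activation as `train_max`. This keeps the ordinal dtype consistent between train and test, which is presumably why the encoding space is sized from the train data.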

6.43

- further streamlined the custom_train_template published in readme
- basically we previously had two dictionaries, a received dictionary with any passed parameters (params) and a returned dictionary we populated in the transform with normalization parameters (normalization_dict)
- realized we could just combine the two, and treat the received params dictionary as a starting point for normalization_dict that had been prepopulated with any passed parameters
- this saves steps of initializing normalization_dict and also steps for transferring any passed parameters from params to normalization_dict
- yeah a really clean solution
- updated default for inplace_option processdict entry. Now when omitted defaults to True. (In other words only need to specify when inplace_option is False.)
- (this default is better aligned with custom_train conventions)
- Added inplace_option = False specifications in process_dict library for prior omissions to match new convention.
- revisions to the readme documentation for processdict for format, clarity, and to be more comprehensive
- oh and big cleanup to the readme, moved the family tree definitions reference material into a separate file
- available in github repo as "FamilyTrees.md"
- incorporated documentation for the defaultinfill processdict option (which I had forgotten to include with the last rollout)
- found opportunity to simplify the code associated with functionpointer by consolidating some redundancies
- removed labelctgy from functionpointer since it is intended to be specific to a root category's family tree
- renamed the functionpointer support functions for clarity (_grab_functionpointer_entries, _grab_functionpointer_entries_support)
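
Below is a minimal sketch of a transform pair written to the 6.43 convention, in which the received `normalization_dict` doubles as the parameter dictionary. It assumes, per the 6.41 and 6.42 notes, that the wrapper handles default infill, suffix appending, and column bookkeeping, so the function only overwrites `column`. The z-score transform and the `offset` parameter are hypothetical examples, not library defaults.

```python
import pandas as pd


def custom_train(df, column, normalization_dict):
    """Hypothetical z-score transform in the 6.43 custom_train style."""
    # normalization_dict arrives prepopulated with any parameters passed via assignparam;
    # 'offset' is a hypothetical parameter used here for illustration
    offset = normalization_dict.get('offset', 0.0)
    mean = df[column].mean()
    std = df[column].std()
    if pd.isna(std) or std == 0:
        std = 1.0  # guard against constant or single-row train sets
    df[column] = (df[column] - mean) / std + offset
    # store train-set properties alongside the passed parameters
    normalization_dict.update({'mean': mean, 'std': std, 'offset': offset})
    return df, normalization_dict


def custom_test(df, column, normalization_dict):
    """Apply the train-set statistics to test data; returns only the dataframe."""
    df[column] = ((df[column] - normalization_dict['mean'])
                  / normalization_dict['std']
                  + normalization_dict['offset'])
    return df
```

Because the parameters and the learned statistics share one dictionary, custom_test needs nothing beyond what custom_train returned.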

6.42

- further simplified conventions for user defined transformation functions
- eliminated the need for a returned list of column headers from custom_train which is now automatically derived
- only exception is for support columns created but not returned, their headers should be designated by a normalization_dict entry as 'tempcolumns' for purposes of suffix overlap detection
- now user defined custom transformation functions support designation of alternate default infill conventions in processdict entry
- where with 6.41 the transforms applied adjinfill, which will remain the default when not specified
- otherwise to designate alternate default infills to a transformation category can set processdict entry for 'defaultinfill'
- where defaultinfill may be passed as one of strings {'adjinfill', 'meaninfill', 'medianinfill', 'modeinfill', 'lcinfill', 'zeroinfill', 'oneinfill', 'naninfill'}
- note this is only designating the infill performed as a precursor to any application of ML infill
- or as a precursor to other infill conventions when assigned to a column in assigninfill
- defaultinfill includes functionpointer support
- added one additional infill application for custom transformation functions as adjinfill, but this one following their application instead of preceding
- which is meant to accommodate unforeseen edge cases in user defined transforms
- in the process a few various cleanups to the custom_train support functions _custom_process_wrapper and _custom_postprocess_wrapper
- found and fixed a bug for suffix attachment in _custom_postprocess_wrapper
- finally a slight rework of the processdict functionpointer option
- originally functionpointer was just intended for processing functions and thus functionpointers weren't supposed to be entered when processing functions were already present
- then we added convention that other entries of the pointer target were also copied when not previously specified
- with pointer potentially following chains of pointer targets until reaching a stopping point based on finding processing functions
- realized it made more sense to halt functionpointer when it reaches an entry without pointer as opposed to reaching an entry with processing functions
- so settled on convention that a processdict entry may include both populated processing functions and a functionpointer target
- in other words, processing functions are now on equal footing with other processdict entries in functionpointer chains
- in the process conducted a little sanity check walkthrough on functionpointer, everything looks good
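
The revised functionpointer behavior can be sketched as a chain walk over a processdict. This is an illustrative reimplementation under stated assumptions, not the library's `_grab_functionpointer_entries`. The assumptions are:

- entries already specified on the starting category take precedence;
- each pointer target only fills in entries that are still missing;
- processing functions are treated like any other entry;
- the walk stops at the first entry without a `functionpointer`.

The `resolve_functionpointer` name, the category names, and the cycle check are hypothetical additions. The `defaultinfill` options are the ones listed above.

```python
VALID_DEFAULTINFILL = {'adjinfill', 'meaninfill', 'medianinfill', 'modeinfill',
                       'lcinfill', 'zeroinfill', 'oneinfill', 'naninfill'}


def resolve_functionpointer(processdict, category):
    """Return a copy of processdict[category] with missing entries filled along its pointer chain."""
    resolved = {k: v for k, v in processdict[category].items() if k != 'functionpointer'}
    visited = {category}
    target = processdict[category].get('functionpointer')
    while target is not None:
        if target in visited:
            raise ValueError(f'functionpointer cycle detected at {target!r}')
        visited.add(target)
        entry = processdict[target]
        for key, value in entry.items():
            # earlier links in the chain win; processing functions get no special treatment
            if key != 'functionpointer' and key not in resolved:
                resolved[key] = value
        # halt once the target carries no further pointer
        target = entry.get('functionpointer')
    # adjinfill is the default when defaultinfill is unspecified
    infill = resolved.get('defaultinfill', 'adjinfill')
    if infill not in VALID_DEFAULTINFILL:
        raise ValueError(f'unsupported defaultinfill {infill!r}')
    return resolved
```

Suppose category A points to B, B points to C, and both B and C populate a processing function. A resolves to B's function, because entries closer to the start of the chain are filled first.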

6.41

- really important update
- full rework of conventions for defining custom transformation functions
- now requirements for custom transformation functions are greatly simplified
- populating data structures, default infill, suffix appending, inplace operation, suffix overlap detection, and etc are all conducted externally
- in short, now all user has to do is define a pandas operation wrapped in a function
- where the function receives as arguments a dataframe, a column, and a set of parameters (as may have been passed in assignparam)
- and returns the resulting transformed dataframe, a list of returned columns associated with the transform, and a dictionary ('normalization_dict') storing any properties derived from the train set needed for test set processing
- where if train set properties aren't needed to process test data the same function can be applied to test data, or otherwise user can define a corresponding test processing function
- where the test processing function receives as arguments a dataframe, a column, and the normalization_dict populated from the train set
- and returns the resulting transformed dataframe
- similarly, the conventions for defining custom inversion transforms have been simplified
- where an inversion transform now receives as arguments a dataframe, a list of the columns returned from the original transformation, the inputcolumn header to be recovered, and the associated normalization_dict
- and returns the transformed dataframe
- full demonstrations provided in the read me under section Custom Transformation Functions
- a few more details, to pass functions with these conventions to a category in processdict they should be passed as entries to 'custom_train', 'custom_test', and 'custom_inversion'
- where if 'custom_test' isn't populated then the custom_train entry will be applied to both train and test data (similar to the singleprocess convention in library)
- note that functionpointer works for these entries too
- to incorporate, included updates to support functions _processcousin, _processparent, _postprocesscousin, _postprocessparent, _df_inversion, _grab_processdict_functions_support, _populate_inverse_categorytree, _populate_inverse_family, and possibly a few more
- created templates for the custom transformations shared in the read me as custom_train_template, custom_test_template, and custom_inversion_template
- created wrappers for the received custom functions as _custom_process_wrapper, _custom_postprocess_wrapper, and _custom_inverseprocess_wrapper
- in the process a few cleanups to the processfamily functions
- such as consolidating some redundancies in inplace stuff or for postprocessfamily also consolidating some redundancies in columnkey_list stuff
- a cleanup to support function _df_inversion to remove a redundant parameter derivation
- found and fixed bug in _grab_processdict_functions_support for functionpointer for accessing postprocess functions
- improved process flow for functionpointer so that it only accesses dual/single/post process functions if they are not already populated
- lowered printout tier for unspecified labelctgy assignment from False to True
- reverted convention for _getNArows from evaluating a column to evaluating a copy of the column (helps to preserve data types)
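
For orientation, here is a minimal min-max scaling trio written to the 6.41 signatures described above. Later releases (6.42, 6.43) changed the custom_train return values and parameter handling. The sketch uses only pandas. The category name `cmnm`, the function names, and the suffixed header `x_cmnm` used in the test are hypothetical. The `NArowtype`/`MLinfilltype` entries are shown as assumed companions to the custom_* entries, not as a complete processdict specification.

```python
import pandas as pd


def cmnm_train(df, column, params):
    """6.41-style custom_train: returns (df, returned column list, normalization_dict)."""
    # params (e.g. from assignparam) is unused in this example
    minimum = df[column].min()
    maximum = df[column].max()
    span = maximum - minimum if maximum != minimum else 1.0
    df[column] = (df[column] - minimum) / span
    normalization_dict = {'minimum': minimum, 'span': span}
    return df, [column], normalization_dict


def cmnm_test(df, column, normalization_dict):
    """6.41-style custom_test: reuses the train-set minimum and span."""
    df[column] = (df[column] - normalization_dict['minimum']) / normalization_dict['span']
    return df


def cmnm_inversion(df, returnedcolumn_list, inputcolumn, normalization_dict):
    """6.41-style custom_inversion: recovers inputcolumn from the returned column."""
    returned = returnedcolumn_list[0]  # min-max scaling returns a single column
    df[inputcolumn] = df[returned] * normalization_dict['span'] + normalization_dict['minimum']
    return df


# hypothetical processdict entry; NArowtype and MLinfilltype are assumed companions
processdict = {
    'cmnm': {
        'custom_train': cmnm_train,
        'custom_test': cmnm_test,  # if omitted, custom_train is applied to test data too
        'custom_inversion': cmnm_inversion,
        'NArowtype': 'numeric',
        'MLinfilltype': 'numeric',
    }
}
```

In practice the returned column would presumably carry a suffix appended by the wrapper. That is why the inversion function reads from returnedcolumn_list rather than assuming the original header.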

Has known vulnerabilities

Previous Next

Automunge

Page 29 of 99

6.46

6.45

6.44

6.43

6.42

6.41

Page 29 of 99

Links

Releases