Automunge

Latest version: v8.33

4.12

- slight improvement to data structure population in por2 transform
- to accommodate disparity in data points between train and test sets
- also new aggregation transform aggt
- intended for categorical sets in which there may be redundant entries with different spellings
- relies on user passed parameter 'aggregate' as a list or as a list of lists
- in which are entered sets of string entries to aggregate
- with consolidated value based on final entry in each list
- note that aggt does not return numerically encoded sets
- so default family tree has a downstream ord3 encoding
- note that we already had similar functionality built into the search functions
- still hat tip to Mark Ryan for noting this type of operation in "Deep Learning with Structured Data"
- which sparked the idea of a dedicated version of aggregation outside of search operation
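
The aggregation described for aggt can be illustrated with a minimal plain-Python sketch. This is not Automunge's implementation: the function name `aggregate_entries` and the sample data are hypothetical, and only the behavior stated above is assumed (a list or a list of lists, with each entry consolidated to the final entry of its list).

```python
def aggregate_entries(values, aggregate):
    """Replace each entry found in an aggregation group with that group's final entry.

    Hypothetical helper for illustration only, not part of Automunge.
    """
    # Accept either a single list of strings or a list of lists
    if aggregate and all(isinstance(item, str) for item in aggregate):
        aggregate = [aggregate]

    # Build a lookup from every entry in a group to the group's last entry
    mapping = {}
    for group in aggregate:
        target = group[-1]
        for entry in group:
            mapping[entry] = target

    # Entries outside every group pass through unchanged
    return [mapping.get(value, value) for value in values]


column = ["NYC", "New York", "new york city", "Boston", "boston"]
groups = [["NYC", "new york city", "New York"], ["boston", "Boston"]]
result = aggregate_entries(column, groups)
```

The output is still a list of strings, which is why a downstream ordinal encoding such as ord3 is needed before the data is numeric.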

4.11

- rewrite of data structure maintenance function
- to correct potential inconsistency in output
- issue resolved
- this update should also slightly improve efficiency of automunge(.)

4.10

- cleanup of formatting in function populating family trees
- motivated by inclusion of this code in the READ ME
- also corrected populated category key for mnts transform

4.00

- 3.96 had noted intent to narrow focus on search options to srch and src4
- However, just realized that src2 had been alluded to in the string theory paper
- So went ahead and added src2 to READ ME assigncat demonstrations
- Also added support for aggregated activations to src2 to be consistent with srch and src4
- The case parameter from srch is a little more tricky for this variant, so if you need case neutrality for src2, recommend just performing an upstream UPCS transform
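
The idea behind an upstream uppercase step can be sketched in plain Python. This is only an illustration of the principle, not the src2 or UPCS code: `search_activations` is a hypothetical stand-in for a case-sensitive substring search that returns one binary activation list per search term.

```python
def search_activations(values, search_terms):
    """Return a 0/1 activation list per search term (case-sensitive substring match).

    Hypothetical stand-in for a search transform, for illustration only.
    """
    return {
        term: [1 if term in value else 0 for value in values]
        for term in search_terms
    }


raw = ["Main Street", "MAIN road", "side street"]

# Case-sensitive search on the raw entries finds no uppercase matches
case_sensitive = search_activations(raw, ["STREET"])

# Uppercasing upstream makes the same search effectively case neutral
upper = [value.upper() for value in raw]
case_neutral = search_activations(upper, ["STREET"])
```

Note that with this approach the search terms themselves also need to be given in uppercase.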

3.99

- An audit of the string parsing functions identified a few more places where break operations could be applied to for loops
- This won't have huge impact on efficiency, but every little bit helps
- Also updated spl9 and spl3 family trees from ordl to ord3 to be consistent with rest of family
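
As a rough illustration of the kind of saving a break gives in nested string parsing loops, here is a generic sketch. It is not taken from the Automunge source, and the function `first_overlap` and its `min_length` parameter are hypothetical.

```python
def first_overlap(entry, others, min_length=3):
    """Find the longest substring of entry that also appears in any other entry.

    Hypothetical example of breaking out of parsing loops early.
    """
    found = None
    # Check longer candidates first so the first match is the longest
    for length in range(len(entry), min_length - 1, -1):
        for start in range(len(entry) - length + 1):
            candidate = entry[start:start + length]
            if any(candidate in other for other in others):
                found = candidate
                break  # no need to scan remaining start positions
        if found is not None:
            break  # no need to try shorter lengths
    return found


overlap = first_overlap("north_street", ["south_street", "main_road"])
```

Without the two break statements the loops would keep scanning every remaining substring even though the answer is already known.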

3.98

- Populated a new data structure final_assigncat available in postprocess_dict
- Comparable to assigncat with added results for those column categories that were derived under automation
- This is primarily intended as an informational resource
- Although there are some workflow scenarios where it could be beneficial, such as with data set composition permutations
- Also, new methods to mitigate the overhead caused by automunge(.) evaluation functions
- A new heuristic bases evaluations under automation on a subset of randomly sampled rows
- Based on configurable heuristic defaulting to 50% with the new automunge(.) eval_ratio parameter
- Which can be passed as a float 0-1 for a ratio of rows or as integer >1 for number of rows
- evalcat parameter functions now have two new positional arguments for eval_ratio and randomseed
- This heuristic makes automunge(.) run much faster on large data sets without loss of functionality
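
The row sampling described for eval_ratio might look something like the following sketch. It is written against plain Python lists rather than Automunge's internals, and the function name `sample_rows_for_evaluation` is hypothetical. Only the interpretation stated above is assumed: a float from 0 to 1 is a ratio of rows and an integer greater than 1 is a number of rows. Falling back to all rows for any other value is an assumption made here.

```python
import random


def sample_rows_for_evaluation(rows, eval_ratio=0.5, randomseed=42):
    """Return a random subset of rows on which to base a column evaluation.

    Hypothetical helper for illustration only, not part of Automunge.
    """
    rows = list(rows)
    total = len(rows)

    if isinstance(eval_ratio, float) and 0 < eval_ratio <= 1:
        # Float in (0, 1]: treat as a ratio of rows
        count = int(total * eval_ratio)
    elif isinstance(eval_ratio, int) and not isinstance(eval_ratio, bool) and eval_ratio > 1:
        # Integer > 1: treat as a number of rows, capped at what is available
        count = min(eval_ratio, total)
    else:
        # Assumption: any other value falls back to evaluating every row
        count = total

    # Seeded generator so repeated runs sample the same rows
    rng = random.Random(randomseed)
    return rng.sample(rows, count)


rows = list(range(100))
half = sample_rows_for_evaluation(rows, eval_ratio=0.5)
ten = sample_rows_for_evaluation(rows, eval_ratio=10)
```

Passing the seed alongside the ratio keeps the sampled subset, and so the evaluation result, reproducible between runs.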
