RDT

Latest version: v1.12.1

0.6.2

This release adds a new `BayesGMMTransformer`. This transformer can be used to convert a numerical column into two
columns: a discrete column indicating the selected `component` of the GMM for each row, and a continuous column containing
the normalized value of each row based on the `mean` and `std` of the selected `component`. It is useful when the column being transformed
came from multiple distributions.
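
The two-column idea described above can be sketched in plain Python. This is a simplified illustration, not RDT's implementation: the component parameters in `COMPONENTS` are hypothetical and hand-picked rather than fitted, and the component is chosen deterministically as the most likely one, which is an assumption made for clarity.

```python
import math

# Hypothetical, hand-picked mixture components as (mean, std) pairs.
# A real transformer would learn these from the data instead.
COMPONENTS = [(0.0, 1.0), (10.0, 2.0)]


def log_density(value, mean, std):
    """Log of the normal density of value under N(mean, std**2)."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)


def transform(values, components=COMPONENTS):
    """Map each value to a (component index, normalized value) pair."""
    rows = []
    for value in values:
        # Select the component under which the value is most likely.
        index = max(
            range(len(components)),
            key=lambda i: log_density(value, *components[i]),
        )
        mean, std = components[index]
        rows.append((index, (value - mean) / std))
    return rows


def reverse_transform(rows, components=COMPONENTS):
    """Recover the original values from (component, normalized) pairs."""
    return [components[i][0] + z * components[i][1] for i, z in rows]


rows = transform([0.5, 9.0, 12.0])
print(rows)                     # [(0, 0.5), (1, -0.5), (1, 1.0)]
print(reverse_transform(rows))  # [0.5, 9.0, 12.0]
```

The discrete column records which distribution a row most plausibly came from, and the continuous column records where the row sits within that distribution, so the pair can be inverted without loss.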

This release also adds multiple new methods to the `HyperTransformer` API. These let users access the specific
transformers used on each input field, as well as view the entire tree of transformers that are used when running `transform`.
The exact methods are:

* `BaseTransformer.get_input_columns()` - Return list of input columns for a transformer.
* `BaseTransformer.get_output_columns()` - Return list of output columns for a transformer.
* `HyperTransformer.get_transformer(field)` - Return the transformer instance used for a field.
* `HyperTransformer.get_output_transformers(field)` - Return dictionary mapping output columns of a field to the transformers used on them.
* `HyperTransformer.get_final_output_columns(field)` - Return list of all final output columns related to a field.
* `HyperTransformer.get_transformer_tree_yaml()` - Return YAML representation of transformers tree.
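
To illustrate what "final output columns" means in a tree of transformers, here is a minimal sketch that does not use RDT at all. The `TREE` dictionary, the column names and the `final_output_columns` helper are all hypothetical. The sketch only shows the traversal idea: follow each output column until no further transformer applies.

```python
# Hypothetical transformer tree: each column maps to the output columns
# produced by the transformer applied to it. Columns that do not appear
# as keys are final outputs. All names are made up for illustration.
TREE = {
    "signup_date": ["signup_date.value", "signup_date.is_null"],
    "signup_date.value": [
        "signup_date.value.normalized",
        "signup_date.value.component",
    ],
}


def final_output_columns(field, tree=TREE):
    """Return the leaf columns reachable from field, depth first."""
    outputs = tree.get(field)
    if outputs is None:
        return [field]  # no transformer applied, so the column is final
    leaves = []
    for column in outputs:
        leaves.extend(final_output_columns(column, tree))
    return leaves


print(final_output_columns("signup_date"))
```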

Additionally, this release fixes a bug where the `HyperTransformer` was incorrectly raising a `NotFittedError`. It also improves the
`DatetimeTransformer` by automatically detecting whether a column needs to be converted from `dtype` `object` to `dtype` `datetime`.

New Features

* Cast column to datetime if specified in field transformers - Issue [321](https://github.com/sdv-dev/RDT/issues/321) by amontanez24
* Add a BayesianGMM Transformer - Issue [183](https://github.com/sdv-dev/RDT/issues/183) by fealho
* Add transformer tree structure and traversal methods - Issue [330](https://github.com/sdv-dev/RDT/issues/330) by amontanez24

Bugs fixed

* HyperTransformer raises NotFittedError after fitting - Issue [332](https://github.com/sdv-dev/RDT/issues/332) by amontanez24

0.6.1

This release adds support for Python 3.9! It also removes unused documentation files.

Internal Improvements

* Add support for Python 3.9 - Issue [323](https://github.com/sdv-dev/RDT/issues/323) by amontanez24
* Remove docs - PR [322](https://github.com/sdv-dev/RDT/pull/322) by pvk-developer

0.6.0

This release makes major changes to the underlying code for RDT as well as the API for both the `HyperTransformer` and `BaseTransformer`.
The changes enable the following functionality:

* The `HyperTransformer` can now apply a sequence of transformers to a column.
* Transformers can now take multiple columns as an input.
* RDT has been expanded to allow arbitrary data types to be added instead of being restricted to `pandas.dtypes`.
* Users can define acceptable output types for running `HyperTransformer.transform`.
* The `HyperTransformer` will continuously apply transformations to the input fields until only acceptable data types are in the output.
* Transformers can return data of any data type.
* Transformers now have named outputs and output types.
* Transformers can suggest which transformer to use on any of their outputs.
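
The behavior of applying transformers repeatedly until only acceptable data types remain can be sketched as a small loop. This is a toy model under stated assumptions, not RDT code: the two transformer functions, the `DEFAULT_TRANSFORMERS` mapping, the data type names and the `name.output` naming scheme are all invented for the example.

```python
from datetime import date


def datetime_to_integer(values):
    """Toy transformer: ISO date strings to ordinal day numbers."""
    ordinals = [date.fromisoformat(v).toordinal() for v in values]
    return {"value": ("integer", ordinals)}


def integer_to_float(values):
    """Toy transformer: integers to floats."""
    return {"value": ("float", [float(v) for v in values])}


# Hypothetical default transformer for each data type.
DEFAULT_TRANSFORMERS = {
    "datetime": datetime_to_integer,
    "integer": integer_to_float,
}


def transform(columns, acceptable=("float",)):
    """Transform columns until every output has an acceptable data type.

    columns maps a column name to a (data type, values) pair. A data type
    that is neither acceptable nor in DEFAULT_TRANSFORMERS raises KeyError.
    """
    pending = dict(columns)
    done = {}
    while pending:
        name, (data_type, values) = pending.popitem()
        if data_type in acceptable:
            done[name] = (data_type, values)
            continue
        # Named outputs go back into the queue for further transformation.
        transformer = DEFAULT_TRANSFORMERS[data_type]
        for output, typed_values in transformer(values).items():
            pending[f"{name}.{output}"] = typed_values
    return done


result = transform({
    "joined": ("datetime", ["2021-01-01"]),
    "score": ("float", [0.5]),
})
print(sorted(result))  # ['joined.value.value', 'score']
```

Here the `joined` column passes through two transformers in sequence, while `score` is already acceptable and is returned unchanged.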

To take advantage of this functionality, the following API changes were made:

* The `HyperTransformer` has new initialization parameters that allow users to specify data types for any field in their data as well as
  specify which transformer to use for a field or data type. The parameters are:
    * `field_transformers` - A dictionary allowing users to specify which transformer to use for a field or derived field. Derived fields
      are fields created by running `transform` on the input data.
    * `field_data_types` - A dictionary allowing users to specify the data type of a field.
    * `default_data_type_transformers` - A dictionary allowing users to specify the default transformer to use for a data type.
    * `transform_output_types` - A dictionary allowing users to specify which data types are acceptable for the output of `transform`.
      This is a result of the fact that transformers can now be applied in a sequence, and not every transformer will return numeric data.
* Methods were also added to the `HyperTransformer` to allow these parameters to be modified. These include `get_field_data_types`,
  `update_field_data_types`, `get_default_data_type_transformers`, `update_default_data_type_transformers` and `set_first_transformers_for_fields`.
* The `BaseTransformer` now requires the column names it will transform to be provided to `fit`, `transform` and `reverse_transform`.
* The `BaseTransformer` has a new method, `get_output_types`, that lets users see its output fields and output types.
* The `BaseTransformer` has a new method, `get_next_transformers`, that lets users see the next suggested transformer for each output field.

On top of the changes to the API and the capabilities of RDT, many automated checks and tests were also added to ensure that contributions
to the library abide by the current code style, stay performant and produce high-quality data. These tests run on every push to the
repository. They can also be run locally via the following functions:

* `validate_transformer_code_style` - Checks that new code follows the code style.
* `validate_transformer_quality` - Tests that new transformers yield data that maintains relationships between columns.
* `validate_transformer_performance` - Tests that new transformers don't take too much time or memory.
* `validate_transformer_unit_tests` - Checks that the unit tests cover all new code, follow naming conventions and pass.
* `validate_transformer_integration` - Checks that the integration tests follow naming conventions and pass.

New Features

* Update HyperTransformer API - Issue [298](https://github.com/sdv-dev/RDT/issues/298) by amontanez24
* Create validate_pull_request function - Issue [254](https://github.com/sdv-dev/RDT/issues/254) by pvk-developer
* Create validate_transformer_unit_tests function - Issue [249](https://github.com/sdv-dev/RDT/issues/249) by pvk-developer
* Create validate_transformer_performance function - Issue [251](https://github.com/sdv-dev/RDT/issues/251) by katxiao
* Create validate_transformer_quality function - Issue [253](https://github.com/sdv-dev/RDT/issues/253) by amontanez24
* Create validate_transformer_code_style function - Issue [248](https://github.com/sdv-dev/RDT/issues/248) by pvk-developer
* Create validate_transformer_integration function - Issue [250](https://github.com/sdv-dev/RDT/issues/250) by katxiao
* Enable users to specify transformers to use in HyperTransformer - Issue [233](https://github.com/sdv-dev/RDT/issues/233) by amontanez24 and csala
* Addons implementation - Issue [225](https://github.com/sdv-dev/RDT/issues/225) by pvk-developer
* Create ways for HyperTransformer to know which transformers to apply to each data type - Issue [232](https://github.com/sdv-dev/RDT/issues/232) by amontanez24 and csala
* Update categorical transformers - PR [231](https://github.com/sdv-dev/RDT/pull/231) by fealho
* Update numerical transformer - PR [227](https://github.com/sdv-dev/RDT/pull/227) by fealho
* Update datetime transformer - PR [230](https://github.com/sdv-dev/RDT/pull/230) by fealho
* Update boolean transformer - PR [228](https://github.com/sdv-dev/RDT/pull/228) by fealho
* Update null transformer - PR [229](https://github.com/sdv-dev/RDT/pull/229) by fealho
* Update the baseclass - PR [224](https://github.com/sdv-dev/RDT/pull/224) by fealho

Bugs fixed

* If the input data has a different index, the reverse transformed data may be out of order - Issue [277](https://github.com/sdv-dev/RDT/issues/277) by amontanez24

Documentation changes

* RDT contributing guide - Issue [301](https://github.com/sdv-dev/RDT/issues/301) by katxiao and amontanez24

Internal improvements

* Add PR template for new transformers - Issue [307](https://github.com/sdv-dev/RDT/issues/307) by katxiao
* Implement Quality Tests for Transformers - Issue [252](https://github.com/sdv-dev/RDT/issues/252) by amontanez24
* Update performance test structure - Issue [257](https://github.com/sdv-dev/RDT/issues/257) by katxiao
* Automated integration test for transformers - Issue [223](https://github.com/sdv-dev/RDT/issues/223) by katxiao
* Move datasets to its own module - Issue [235](https://github.com/sdv-dev/RDT/issues/235) by katxiao
* Fix missing coverage in rdt unit tests - Issue [219](https://github.com/sdv-dev/RDT/issues/219) by fealho
* Add repo-wide automation - Issue [309](https://github.com/sdv-dev/RDT/issues/309) by katxiao

Other issues closed

* DeprecationWarning: np.float is a deprecated alias for the builtin float - Issue [304](https://github.com/sdv-dev/RDT/issues/304) by csala
* Add pip check to CI workflows - Issue [290](https://github.com/sdv-dev/RDT/issues/290) by csala
* Should Transformers subclasses exist for specific configurations? - Issue [243](https://github.com/sdv-dev/RDT/issues/243) by fealho

0.5.3

This release fixes a bug with learning rounding digits in the `NumericalTransformer`,
and includes a few housekeeping improvements.
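
As a rough illustration of the kind of edge case involved, the sketch below learns the number of decimal places needed to represent a column and returns `None` when the column contains no finite values. It is a stand-alone example and not the `NumericalTransformer` code. The function name, the `max_digits` parameter and the choice of `None` as the fallback are assumptions.

```python
import math


def learn_rounding_digits(values, max_digits=15):
    """Return the fewest decimal places that reproduce every finite value.

    Returns None when there are no finite values (for example, an all-NaN
    column) or when no digit count up to max_digits is sufficient.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None  # nothing to learn from
    for digits in range(max_digits + 1):
        if all(round(v, digits) == v for v in finite):
            return digits
    return None


print(learn_rounding_digits([1.5, 2.25, float("nan")]))    # 2
print(learn_rounding_digits([float("nan"), float("nan")])) # None
```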

Issues closed

* Update learn rounding digits to handle all nan data - Issue [244](https://github.com/sdv-dev/RDT/issues/244) by katxiao
* Adapt to latest PyLint housekeeping - Issue [216](https://github.com/sdv-dev/RDT/issues/216) by fealho

0.5.2

This release fixes a couple of bugs introduced by the previous release regarding the
`OneHotEncodingTransformer` and the `BooleanTransformer`.

Issues closed

* BooleanTransformer.reverse_transform sometimes crashes with TypeError - Issue [210](https://github.com/sdv-dev/RDT/issues/210) by katxiao
* OneHotEncodingTransformer causing shape misalignment in CopulaGAN, CTGAN, and TVAE - Issue [208](https://github.com/sdv-dev/RDT/issues/208) by sarahmish
* BooleanTransformer.reverse_transform modifies the input data - Issue [211](https://github.com/sdv-dev/RDT/issues/211) by katxiao

0.5.1

This release improves the overall performance of the library, both in terms of memory and time consumption.
More specifically, it makes the following modules more efficient: `NullTransformer`, `DatetimeTransformer`,
`LabelEncodingTransformer`, `NumericalTransformer`, `CategoricalTransformer`, `BooleanTransformer` and `OneHotEncodingTransformer`.

It also adds performance-based testing and a script for profiling the performance.
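
A profiling helper for this kind of time and memory measurement could look roughly like the sketch below. It uses only the standard library and is not the script shipped with RDT. The `profile` function and the keys of the returned dictionary are hypothetical.

```python
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func once and report wall-clock time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Collect the measurements even if func raises.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {"seconds": elapsed, "peak_bytes": peak}


# Example: profile sorting a reversed range of integers.
result, stats = profile(sorted, range(10_000, 0, -1))
print(stats)
```

A performance test can then compare `stats["seconds"]` and `stats["peak_bytes"]` against a budget chosen for each transformer and dataset size.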

Issues closed

* Add performance-based testing - Issue [194](https://github.com/sdv-dev/RDT/issues/194) by amontanez24
* Audit the NullTransformer - Issue [192](https://github.com/sdv-dev/RDT/issues/192) by amontanez24
* Audit DatetimeTransformer - Issue [189](https://github.com/sdv-dev/RDT/issues/189) by sarahmish
* Audit the LabelEncodingTransformer - Issue [184](https://github.com/sdv-dev/RDT/issues/184) by amontanez24
* Audit the NumericalTransformer - Issue [181](https://github.com/sdv-dev/RDT/issues/181) by fealho
* Audit CategoricalTransformer - Issue [180](https://github.com/sdv-dev/RDT/issues/180) by katxiao
* Audit BooleanTransformer - Issue [179](https://github.com/sdv-dev/RDT/issues/179) by katxiao
* Auditing OneHotEncodingTransformer - Issue [178](https://github.com/sdv-dev/RDT/issues/178) by sarahmish
* Create script for profiling - Issue [176](https://github.com/sdv-dev/RDT/issues/176) by amontanez24
* Create folder structure for performance testing - Issue [174](https://github.com/sdv-dev/RDT/issues/174) by amontanez24
