Sdv

Latest version: v1.16.1

Page 3 of 11

1.7.0

This release adds an alert to the `CTGANSynthesizer` during preprocessing. The alert informs the user if the fitting of the synthesizer is likely to be slow on their schema. Additionally, it is now possible to enforce that sampled datetime values stay within the range of the fitted data!

This release also makes internal changes to support address data in SDV Enterprise.
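
The datetime range enforcement mentioned above can be pictured as clipping. The following is a minimal standard-library sketch of that idea, not SDV's or RDT's actual implementation; the helper names `fit_range` and `enforce_range` are hypothetical and exist only for illustration.

```python
from datetime import datetime


def fit_range(values):
    """Record the earliest and latest non-missing datetimes seen while fitting."""
    observed = [value for value in values if value is not None]
    return min(observed), max(observed)


def enforce_range(sampled, low, high):
    """Clip each sampled datetime into [low, high]; missing values pass through."""
    return [value if value is None else min(max(value, low), high) for value in sampled]


# Hypothetical fitted column with one missing value.
fitted = [datetime(2021, 1, 5), datetime(2021, 3, 1), None, datetime(2021, 2, 10)]
low, high = fit_range(fitted)

# Two of the sampled values fall outside the fitted range and get clipped.
sampled = [datetime(2020, 12, 30), datetime(2021, 2, 1), datetime(2021, 4, 2), None]
clipped = enforce_range(sampled, low, high)
```

With this approach, sampled values never precede the earliest fitted datetime or follow the latest one.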

New Features

* Add set_address_columns method - Issue [1593](https://github.com/sdv-dev/SDV/issues/1593) by R-Palazzo
* update_transformers should raise error on address columns - Issue [1594](https://github.com/sdv-dev/SDV/issues/1594) by R-Palazzo
* add_constraints should raise error on address columns - Issue [1595](https://github.com/sdv-dev/SDV/issues/1595) by R-Palazzo
* Print alert if CTGANSynthesizer is likely to be slow - Issue [1658](https://github.com/sdv-dev/SDV/issues/1658) by fealho
* Set enforce_min_max_values to True for datetime transformers - Issue [1676](https://github.com/sdv-dev/SDV/issues/1676) by R-Palazzo

Bugs Fixed

* Unable to visualize metadata (`Error: bad label format` and `CalledProcessError`) - Issue [1625](https://github.com/sdv-dev/SDV/issues/1625) by fealho
* Can't set address columns after fitting - Issue [1661](https://github.com/sdv-dev/SDV/issues/1661) by R-Palazzo

1.6.0

This release improves user messaging in multiple ways. The most notable is that users will now see an alert if the `HMASynthesizer` is likely to be slow for their data's schema. Additionally, the logger messaging for constraints and the error messaging when setting distributions on non-parametric models were made more detailed.

The visualization plots in the `sdv.evaluation` sub-package all got a new parameter called `plot_type`, which lets users specify the plot type to use if the inferred one is not useful. The `sdv.datasets.local.load_csvs` method now has a parameter called `read_csv_parameters`, which lets users specify how the CSVs should be read during loading. The same change was also made to the `sdv.metadata.multi_table.detect_table_from_csv`, `sdv.metadata.multi_table.detect_from_csvs` and `sdv.metadata.single_table.detect_from_csv` methods.
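
To illustrate the idea behind `read_csv_parameters`, here is a small standard-library sketch of a loader that forwards caller-supplied options to its CSV reader. It is not SDV's implementation: `load_csvs_sketch` is a hypothetical stand-in, and it uses Python's `csv` module, so the options shown are `csv` options rather than the ones SDV accepts.

```python
import csv
import tempfile
from pathlib import Path


def load_csvs_sketch(folder_name, read_csv_parameters=None):
    """Hypothetical stand-in: read every .csv in a folder into {table_name: rows}."""
    params = read_csv_parameters or {}
    tables = {}
    for path in sorted(Path(folder_name).glob('*.csv')):
        with path.open(newline='') as handle:
            # The caller's options are forwarded unchanged to the reader.
            tables[path.stem] = list(csv.DictReader(handle, **params))
    return tables


# Demo: a semicolon-delimited file would be misread without the extra option.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, 'users.csv').write_text('id;name\n1;Ana\n2;Bo\n')
    tables = load_csvs_sketch(folder, read_csv_parameters={'delimiter': ';'})
```

In SDV itself the call would presumably look like `load_csvs(folder_name='data/', read_csv_parameters={...})`, where the folder path is a placeholder. The release note does not say which keys the dictionary accepts.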

Multiple bugs were resolved including one that caused new categories to be created during the sample step of `CTGANSynthesizer`.

New Features

* Improve debug messages when a constraint falls back to reject sampling approach - Issue [1478](https://github.com/sdv-dev/SDV/issues/1478) by amontanez24
* Constraints should work with timezone-aware datetime columns - Issue [1576](https://github.com/sdv-dev/SDV/issues/1576) by fealho
* Better error message when trying to get distributions from non-parametric models - PR [1633](https://github.com/sdv-dev/SDV/pull/1633) by frances-h
* Add options to read CSV files - Issue [1644](https://github.com/sdv-dev/SDV/issues/1644) by lajohn4747
* Print alert if HMASynthesizer is likely to be slow - Issue [1646](https://github.com/sdv-dev/SDV/issues/1646) by lajohn4747
* Make SDV compatible with SDMetrics 0.12.1 - Issue [1650](https://github.com/sdv-dev/SDV/issues/1650) by pvk-developer
* Make function to estimate number of columns CTGAN produces - Issue [1657](https://github.com/sdv-dev/SDV/issues/1657) by fealho

Bugs Fixed

* In get_available_demos, the num_tables column should be an int - Issue [1420](https://github.com/sdv-dev/SDV/issues/1420) by lajohn4747
* AttributeError when using specific locale strings (es_AR, fr_BE) - Issue [1439](https://github.com/sdv-dev/SDV/issues/1439) by lajohn4747
* Confusing error when passing in an empty dataframe (with constraints) - Issue [1455](https://github.com/sdv-dev/SDV/issues/1455) by lajohn4747
* HMASynthesizer: Better error message for learned distributions (misleading fit error) - Issue [1579](https://github.com/sdv-dev/SDV/issues/1579) by fealho
* Fix tests in SDV after update in RDT 1.7.1 - Issue [1638](https://github.com/sdv-dev/SDV/issues/1638) by lajohn4747
* CTGAN sometimes creates new categories (int data) - Issue [1647](https://github.com/sdv-dev/SDV/issues/1647) by pvk-developer
* CTGAN sometimes creates new categories (object data) - Issue [1648](https://github.com/sdv-dev/SDV/issues/1648) by pvk-developer
* Better error message if I provide an incompatible sdtype/locale combo - Issue [1653](https://github.com/sdv-dev/SDV/issues/1653) by pvk-developer

1.5.0

Several improvements and bug fixes were made in this release. Most notably, the metadata detection was substantially improved. Support for the 'unknown' sdtype was added, providing more flexibility in data representation. The software now attempts to intelligently detect primary keys and identify parent-child relationships in the metadata, streamlining the metadata creation process.

Additionally, issues related to conditional sampling with negative float values, the inability to update transformers for columns created by constraints, and compatibility with numpy version 1.25 and higher were addressed. The default branch was also switched from 'master' to 'main' for better development practices. Various bugs and errors, including those involving HMA and datetime format detection, were also resolved.
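
The release notes do not describe how primary keys and parent-child relationships are detected. As a rough illustration only, the sketch below uses one plausible heuristic: a primary key is the first column whose values are all present and unique, and a relationship exists when a child table has a column with the same name whose values all appear in the parent key. Both function names and the heuristic itself are assumptions, not SDV's actual logic.

```python
def guess_primary_key(rows):
    """Return the first column whose values are all present and unique, else None."""
    if not rows:
        return None
    for column in rows[0]:
        values = [row[column] for row in rows]
        if None not in values and len(set(values)) == len(values):
            return column
    return None


def guess_relationships(tables):
    """Link a child column to a parent key when names match and values are a subset."""
    keys = {name: guess_primary_key(rows) for name, rows in tables.items()}
    relationships = []
    for parent, key in keys.items():
        if key is None:
            continue
        parent_values = {row[key] for row in tables[parent]}
        for child, rows in tables.items():
            if child == parent or not rows or key not in rows[0]:
                continue
            if {row[key] for row in rows} <= parent_values:
                relationships.append((parent, key, child, key))
    return relationships


# Hypothetical two-table dataset, each table given as a list of row dictionaries.
tables = {
    'users': [{'user_id': 1, 'country': 'AR'}, {'user_id': 2, 'country': 'AR'}],
    'sessions': [
        {'session_id': 10, 'user_id': 1},
        {'session_id': 11, 'user_id': 1},
        {'session_id': 12, 'user_id': 2},
    ],
}
relationships = guess_relationships(tables)
```

Heuristics like this can guess wrong, for example when a numerical column happens to contain only unique values, so detected metadata is worth reviewing before fitting.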

New Features

* Improve metadata detection - Issue [1515](https://github.com/sdv-dev/SDV/issues/1515) by R-Palazzo
* Support 'unknown' sdtype - Issue [1516](https://github.com/sdv-dev/SDV/issues/1516) by R-Palazzo
* Detect primary keys in metadata - Issue [1521](https://github.com/sdv-dev/SDV/issues/1521) by frances-h
* Detect relationships in MultiTableMetadata - Issue [1522](https://github.com/sdv-dev/SDV/issues/1522) by frances-h
* Make function to estimate number of columns HMA produces. - Issue [1572](https://github.com/sdv-dev/SDV/issues/1572) by fealho
* Add wrapper for get_cardinality_plot - Issue [1573](https://github.com/sdv-dev/SDV/issues/1573) by frances-h
* [Metadata detection] Add a cardinality cap when choosing between categorical vs. numerical - Issue [1584](https://github.com/sdv-dev/SDV/issues/1584) by pvk-developer
* [Metadata Detection] Only make primary/foreign keys sdtype `id` (leave others as `unknown`) - Issue [1598](https://github.com/sdv-dev/SDV/issues/1598) by amontanez24
* Check and supply a more descriptive error when trying to use `'gaussian_kde'` with HMA - Issue [1604](https://github.com/sdv-dev/SDV/issues/1604) by frances-h

Bugs Fixed

* Conditional sampling with negative float values doesn't work - Issue [1161](https://github.com/sdv-dev/SDV/issues/1161) by fealho
* Cannot update transformers for columns that get created by constraints (`KeyError`) - Issue [1454](https://github.com/sdv-dev/SDV/issues/1454) by frances-h
* HMA produces KeyError for a schema with 3+ levels of depth - Issue [1558](https://github.com/sdv-dev/SDV/issues/1558) by fealho
* Columns consisting of only Nones are being detected as datetime - Issue [1589](https://github.com/sdv-dev/SDV/issues/1589) by pvk-developer
* HMASynthesizer throws an error when sampling multi table models with three levels of depths - Issue [1600](https://github.com/sdv-dev/SDV/issues/1600) by amontanez24
* `ValueError: Invalid distribution specification` when setting numerical_distributions on child table (HMA) - Issue [1605](https://github.com/sdv-dev/SDV/issues/1605) by fealho
* Bug: updating transformers in DataProcessor resets warning filters - Issue [1618](https://github.com/sdv-dev/SDV/issues/1618) by rwedge

Maintenance

* Investigate how to get numpy >1.25 to pass - Issue [1501](https://github.com/sdv-dev/SDV/issues/1501) by rwedge
* Switch default branch from master to main - Issue [1550](https://github.com/sdv-dev/SDV/issues/1550) by amontanez24

1.4.0

This release makes multiple improvements to the metadata. Both the single and multi table metadata classes now have a `validate_data` method, which runs checks to validate the data against the current specifications in the metadata. The `SingleTableMetadata.visualize` method is also improved: all key and index information (e.g. sequence key, primary key, sequence index) now appears in one section, so the sequence index is shown alongside the sequence key.
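
The release note does not list which checks `validate_data` runs. The sketch below shows the general shape such validation might take, using a plain dictionary in the style of SDV's metadata JSON (`columns`, `sdtype`, `primary_key`). The function `validate_data_sketch` and the two checks it performs are assumptions for illustration.

```python
def validate_data_sketch(metadata, rows):
    """Collect human-readable problems instead of stopping at the first one."""
    errors = []
    expected = set(metadata['columns'])
    for index, row in enumerate(rows):
        missing = expected - set(row)
        if missing:
            errors.append(f'row {index}: missing columns {sorted(missing)}')
    key = metadata.get('primary_key')
    if key is not None:
        values = [row.get(key) for row in rows]
        if None in values:
            errors.append(f'primary key {key!r} contains missing values')
        present = [value for value in values if value is not None]
        if len(set(present)) != len(present):
            errors.append(f'primary key {key!r} contains duplicates')
    return errors


# Hypothetical metadata and data, given as a list of row dictionaries.
metadata = {
    'primary_key': 'id',
    'columns': {'id': {'sdtype': 'id'}, 'age': {'sdtype': 'numerical'}},
}
good_rows = [{'id': 1, 'age': 30}, {'id': 2, 'age': 41}]
bad_rows = [{'id': 1, 'age': 30}, {'id': 1}]

good_errors = validate_data_sketch(metadata, good_rows)
bad_errors = validate_data_sketch(metadata, bad_rows)
```

Collecting every problem before reporting lets a user fix the data in one pass rather than one error at a time.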

The `CTGANSynthesizer` has been made more efficient in the following ways:
1. Boolean columns are now being skipped during `preprocess` like categorical columns are.
2. It is possible to apply other transformations to categorical columns and have `CTGAN` skip the one-hot encoding step.

Additionally, columns labeled with the sdtype `id` now go through the `IDGenerator` transformer by default, and constraint transformations that were previously overwritten during sampling are now respected.

New Features

* Add validate_data method to Metadata - Issue [1518](https://github.com/sdv-dev/SDV/issues/1518) by fealho
* Use IDGenerator for ID columns - Issue [1519](https://github.com/sdv-dev/SDV/issues/1519) by frances-h
* Metadata visualization for sequential data: Only create 2 sections - Issue [1543](https://github.com/sdv-dev/SDV/issues/1543) by frances-h

Bugs Fixed

* Inefficient CTGAN modeling when adding categorical transformers - Issue [1450](https://github.com/sdv-dev/SDV/issues/1450) by fealho
* CTGANSynthesizer is assigning LabelEncoder to boolean columns (instead of None) - Issue [1530](https://github.com/sdv-dev/SDV/issues/1530) by fealho
* Metadata visualization for sequential data: Missing sequence index - Issue [1542](https://github.com/sdv-dev/SDV/issues/1542) by frances-h
* Constraint outputs are being overwritten in DataProcessor.reverse_transform - Issue [1551](https://github.com/sdv-dev/SDV/issues/1551) by amontanez24

1.3.0

This release adds two new methods to `MultiTableMetadata`: `detect_from_csvs` and `detect_from_dataframes`. These methods allow you to detect metadata for a whole dataset at once, either from a folder of CSV files or from a dictionary mapping table names to `pandas.DataFrame` objects. The `SingleTableMetadata` can now be visualized! Additionally, the `show_table_details` parameter of the `visualize` methods now accepts a `summarized` option, which prints each sdtype in the table and the number of columns that have that sdtype.
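
The summarized view described above amounts to counting columns per sdtype. A minimal sketch of that count follows, assuming a metadata dictionary in the style of SDV's metadata JSON; `summarize_sdtypes` is a hypothetical helper, not part of SDV.

```python
from collections import Counter


def summarize_sdtypes(metadata):
    """Count how many columns carry each sdtype."""
    return dict(Counter(column['sdtype'] for column in metadata['columns'].values()))


# Hypothetical single-table metadata.
metadata = {
    'columns': {
        'guest_id': {'sdtype': 'id'},
        'room_type': {'sdtype': 'categorical'},
        'has_rewards': {'sdtype': 'categorical'},
        'room_rate': {'sdtype': 'numerical'},
    }
}
summary = summarize_sdtypes(metadata)
```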

Additionally, this release patches a bug that prevented custom constraints from working on columns that were primary or alternate keys. It also adds support for Python 3.11!

New Features

* Align default transformers between SDV and RDT - Issue [1484](https://github.com/sdv-dev/SDV/issues/1484) by R-Palazzo
* Add visualize method to SingleTableMetadata - Issue [1517](https://github.com/sdv-dev/SDV/issues/1517) by pvk-developer
* Add detect_from_csvs and detect_from_dataframes methods to MultiTableMetadata - Issue [1520](https://github.com/sdv-dev/SDV/issues/1520) by R-Palazzo
* Allow empty tables to be fitted using fit_processed_data - Issue [1524](https://github.com/sdv-dev/SDV/issues/1524) by fealho
* Summarized metadata visualization - Issue [1525](https://github.com/sdv-dev/SDV/issues/1525) by pvk-developer

Bugs Fixed

* Cannot use custom constraint transforms for certain columns (inconsistent ordering in forward vs. reverse) - Issue [1476](https://github.com/sdv-dev/SDV/issues/1476) by fealho
* Cannot create custom constraint with primary key - Issue [1514](https://github.com/sdv-dev/SDV/issues/1514) by amontanez24

Maintenance

* Add support for Python 3.11 - Issue [1459](https://github.com/sdv-dev/SDV/issues/1459) by fealho

1.2.1

This release fixes a bug that caused the `Inequality` constraint and others to fail if there were None values in a datetime column.
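
The kind of failure fixed here can be pictured with a row-wise inequality check that must tolerate missing values. The sketch below treats any row with a missing value as valid, which is an assumption made for illustration. `inequality_is_valid` is a hypothetical helper, not SDV's code.

```python
from datetime import datetime


def inequality_is_valid(low, high):
    """Row-wise check that low <= high; rows with a missing value count as valid."""
    return [
        low_value is None or high_value is None or low_value <= high_value
        for low_value, high_value in zip(low, high)
    ]


# Hypothetical datetime columns; comparing None with a datetime directly would raise TypeError.
checkin = [datetime(2023, 1, 1), None, datetime(2023, 1, 5)]
checkout = [datetime(2023, 1, 3), datetime(2023, 1, 2), datetime(2023, 1, 4)]
valid = inequality_is_valid(checkin, checkout)
```

Checking for `None` before comparing avoids the `TypeError` that Python raises when ordering `None` against a `datetime`.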

Bugs Fixed

* Inequality fails with None and datetime - Issue [1471](https://github.com/sdv-dev/SDV/issues/1471) by pvk-developer

Maintenance

* Drop support for Python 3.7 - Issue [1487](https://github.com/sdv-dev/SDV/issues/1487) by pvk-developer

Internal

* Make HMA use hierarchical sampling mixin - Issue [1428](https://github.com/sdv-dev/SDV/issues/1428) by frances-h
* Move progress bar out of base multi table synthesizer - Issue [1486](https://github.com/sdv-dev/SDV/issues/1486) by R-Palazzo
