Sdv

Latest version: v1.16.1

1.16.1

Internal

* [dtypes] `FixedIncrements` Fails with New Numerical Data Types - Issue [2157](https://github.com/sdv-dev/SDV/issues/2157) by R-Palazzo

1.16.0

This release enables the `HMASynthesizer` and other utility functions to work with null foreign key values! It also adds an `anonymization` method to the metadata classes. Additionally, it fixes a bug so that SDV works with more Pandas data types.
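To make the null foreign key behavior concrete, here is a minimal pure-Python sketch of the idea behind dropping unknown references while keeping null foreign keys by default. It is not SDV's implementation: the function name `drop_unknown_references_sketch`, its signature, and the list-of-dicts tables are hypothetical and used only for illustration. Only the `drop_missing_values` flag name comes from these release notes.

```python
def drop_unknown_references_sketch(parents, children, primary_key, foreign_key,
                                   drop_missing_values=False):
    """Keep child rows whose foreign key is null or matches a parent key.

    Hypothetical helper for illustration; not part of SDV.
    """
    known_keys = {row[primary_key] for row in parents}
    kept = []
    for row in children:
        value = row[foreign_key]
        if value is None:
            # Null foreign key: kept by default, dropped only on request.
            if not drop_missing_values:
                kept.append(row)
        elif value in known_keys:
            # Valid reference to an existing parent row.
            kept.append(row)
        # Otherwise the reference is unknown and the row is dropped.
    return kept


users = [{"user_id": 1}, {"user_id": 2}]
sessions = [
    {"session_id": "a", "user_id": 1},
    {"session_id": "b", "user_id": None},  # null foreign key
    {"session_id": "c", "user_id": 99},    # unknown reference
]

lenient = drop_unknown_references_sketch(users, sessions, "user_id", "user_id")
strict = drop_unknown_references_sketch(users, sessions, "user_id", "user_id",
                                        drop_missing_values=True)
```

With the default setting, the row with the null key survives and only the row pointing at a non-existent user is removed. Passing `drop_missing_values=True` removes the null-key row as well.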

New Features

* Add metadata anonymization to public SDV - Issue [2137](https://github.com/sdv-dev/SDV/issues/2137) by R-Palazzo
* Switch drop_missing_values in drop_unknown_references to support null foreign keys by default - Issue [2076](https://github.com/sdv-dev/SDV/issues/2076) by R-Palazzo
* Support nullable foreign keys in HMA - Issue [2063](https://github.com/sdv-dev/SDV/issues/2063) by rwedge
* Remove input error from base synthesizer class once nullable foreign keys are supported - Issue [2057](https://github.com/sdv-dev/SDV/issues/2057) by rwedge
* Support null foreign keys in get_random_subset - Issue [2056](https://github.com/sdv-dev/SDV/issues/2056) by R-Palazzo
* Warn the user if they are trying to save an unfit synthesizer - Issue [1961](https://github.com/sdv-dev/SDV/issues/1961) by fealho

Bugs Fixed

* Using FixedCombinations constraint with an integer constraint column causes sampling to fail - Issue [2183](https://github.com/sdv-dev/SDV/issues/2183) by R-Palazzo
* Metadata Detection Fails with new Data Type - Issue [2182](https://github.com/sdv-dev/SDV/issues/2182) by R-Palazzo
* Unable to visualize just the real data (or just the synthetic data) in a multi-table setting - Issue [2160](https://github.com/sdv-dev/SDV/issues/2160) by R-Palazzo
* [dtypes] Numerical Formatter Fails to Learn Format of New Data Types - Issue [2156](https://github.com/sdv-dev/SDV/issues/2156) by R-Palazzo
* Primary keys may not be unique for variable length regexes - Issue [2116](https://github.com/sdv-dev/SDV/issues/2116) by amontanez24
* Confusing warning when using GANs that suggests that CUDA isn't being used - Issue [2052](https://github.com/sdv-dev/SDV/issues/2052) by fealho
* PAR DiagnosticReport not 1.0 with float categorical columns - Issue [1910](https://github.com/sdv-dev/SDV/issues/1910) by lajohn4747
* In `PARSynthesizer` I cannot pass in datetime context (`InvalidDataError` during fitting) - Issue [1485](https://github.com/sdv-dev/SDV/issues/1485) by lajohn4747

Internal

* Enabling sdv logging causes tests to fail locally - Issue [2162](https://github.com/sdv-dev/SDV/issues/2162) by amontanez24
* Separate primary key detection functionality - Issue [2101](https://github.com/sdv-dev/SDV/issues/2101) by amontanez24

Maintenance

* [dtypes] Update the NumericalFormatter to use the `learn_rounding_digits` from RDT - Issue [2164](https://github.com/sdv-dev/SDV/issues/2164) by R-Palazzo
* Mock every usage of `is_faker_function` to speed up the unit tests - Issue [2163](https://github.com/sdv-dev/SDV/issues/2163) by R-Palazzo
* Review docs-related dev dependencies - Issue [2148](https://github.com/sdv-dev/SDV/issues/2148) by rwedge
* Cap boto and botocore - Issue [2123](https://github.com/sdv-dev/SDV/issues/2123) by lajohn4747

1.15.0

This release adds a new utility function, `get_random_sequence_subset`, that lets users take a subset of sequential data.
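As a rough illustration of what subsetting sequential data can mean, the sketch below picks whole sequences at random and optionally truncates each one. It is a pure-Python approximation, not SDV's code: the function name `random_sequence_subset_sketch`, its parameters, and the choice to keep the first rows of each sequence are all assumptions made for this example.

```python
import random
from collections import defaultdict


def random_sequence_subset_sketch(rows, sequence_key, num_sequences,
                                  max_sequence_length=None, seed=None):
    """Pick whole sequences at random, optionally truncating each one.

    Hypothetical helper for illustration; not part of SDV.
    """
    # Group rows by sequence key, preserving the original row order.
    sequences = defaultdict(list)
    for row in rows:
        sequences[row[sequence_key]].append(row)

    # Choose which sequences to keep.
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(sequences), num_sequences))

    subset = []
    for key, sequence in sequences.items():
        if key in chosen:
            # Assumption: truncation keeps the first rows of each sequence.
            subset.extend(sequence[:max_sequence_length])
    return subset


# Three patients with four visits each.
rows = [{"patient": p, "visit": v} for p in ("A", "B", "C") for v in range(4)]
subset = random_sequence_subset_sketch(
    rows, "patient", num_sequences=2, max_sequence_length=3, seed=0
)
```

The key property is that rows are never sampled individually: a sequence is either kept (possibly shortened) or dropped entirely, so the ordering within each kept sequence is preserved.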

New Features

* Add utils to the Top Level Package. - Issue [2119](https://github.com/sdv-dev/SDV/issues/2119) by pvk-developer
* Add a utility function `get_random_sequence_subset` - Issue [2085](https://github.com/sdv-dev/SDV/issues/2085) by amontanez24

Bugs Fixed

* Context column cannot be a sequence key: Need better error message for this case - Issue [2097](https://github.com/sdv-dev/SDV/issues/2097) by gsheni
* Primary key and sequential key cannot be the same - Issue [2096](https://github.com/sdv-dev/SDV/issues/2096) by lajohn4747
* Error when applying `FixedCombinations` constraint on a child table with multiple parents in `HMASynthesizer` - Issue [2087](https://github.com/sdv-dev/SDV/issues/2087) by pvk-developer
* PARSynthesizer errors during `fit` if sequence_index is numerical sdtype - Issue [2079](https://github.com/sdv-dev/SDV/issues/2079) by lajohn4747
* Cap numpy to less than 2.0.0 until SDV supports - Issue [2075](https://github.com/sdv-dev/SDV/issues/2075) by gsheni
* Rename the `file_name` parameter to `filepath` parameter in ExcelHandler - Issue [2065](https://github.com/sdv-dev/SDV/issues/2065) by lajohn4747
* HMA sampling crashes when unknown sdtype detected for numerical column - Issue [2064](https://github.com/sdv-dev/SDV/issues/2064) by lajohn4747
* HMA Synthesizer's `scale` parameter doesn't work for small values - Issue [2045](https://github.com/sdv-dev/SDV/issues/2045) by lajohn4747
* PAR DiagnosticReport not 1.0 with float categorical columns - Issue [1910](https://github.com/sdv-dev/SDV/issues/1910) by lajohn4747
* If a parent has 0/1 children, HMASynthesizer may create constant data - Issue [1895](https://github.com/sdv-dev/SDV/issues/1895) by gsheni

Internal

* Add timeouts to requests in release notes script - Issue [2067](https://github.com/sdv-dev/SDV/issues/2067) by gsheni
* Investigate HMA case where parent is missing num_rows column - Issue [1703](https://github.com/sdv-dev/SDV/issues/1703) by gsheni

Maintenance

* Release notes should not include PRs - Issue [2074](https://github.com/sdv-dev/SDV/issues/2074) by amontanez24
* Switch to using ruff for Python linting and code formatting - Issue [1803](https://github.com/sdv-dev/SDV/issues/1803) by gsheni

1.14.0

This release provides a number of new features. A big one is the ability to fit the `HMASynthesizer` on disconnected schemas! It also enables the `PARSynthesizer` to work with constraints under certain conditions: the columns involved in a constraint must be either all context columns or all non-context columns.
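The condition on `PARSynthesizer` constraints can be expressed as a small check. The sketch below is only an illustration of the rule as stated in these notes: the function name `constraint_fits_par_rule` and the column names are hypothetical, and this is not how SDV validates constraints internally.

```python
def constraint_fits_par_rule(constraint_columns, context_columns):
    """Return True if the constraint uses only context columns
    or only non-context columns.

    Hypothetical helper for illustration; not part of SDV.
    """
    context = set(context_columns)
    involved = set(constraint_columns)
    all_context = involved <= context             # every column is a context column
    no_context = involved.isdisjoint(context)     # no column is a context column
    return all_context or no_context


context_columns = ["region", "signup_date"]

only_context = constraint_fits_par_rule(["region", "signup_date"], context_columns)
only_sequence = constraint_fits_par_rule(["price", "quantity"], context_columns)
mixed = constraint_fits_par_rule(["region", "price"], context_columns)
```

A constraint that mixes a context column with a non-context column, such as the last call, falls outside the supported cases.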

Additionally, a `verbose` parameter was added to the `TVAESynthesizer` to get a more detailed progress bar. Also, a bug fix renamed the `file_path` parameter of the `ExcelHandler.read()` method to `filepath`, as specified in the official [SDV docs](https://docs.sdv.dev/sdv/multi-table-data/data-preparation/loading-data/excel#read).

Internal

* Add workflow to generate release notes - Issue [2050](https://github.com/sdv-dev/SDV/issues/2050) by amontanez24

Bugs Fixed

* PARSynthesizer: Duplicate sequence index values when `sequence_length` is higher than real data - Issue [2031](https://github.com/sdv-dev/SDV/issues/2031) by lajohn4747
* PARSynthesizer model won't fit if sequence_index is missing - Issue [1972](https://github.com/sdv-dev/SDV/issues/1972) by lajohn4747
* `DataProcessor` never gets assigned a `table_name`. - Issue [1964](https://github.com/sdv-dev/SDV/issues/1964) by fealho

New Features

* Rename `file_path` to `filepath` parameter in ExcelHandler - Issue [2055](https://github.com/sdv-dev/SDV/issues/2055) by amontanez24
* Enable the ability to run multi table synthesizers on disjointed table schemas - Issue [2047](https://github.com/sdv-dev/SDV/issues/2047) by lajohn4747
* Add header to log.csv file - Issue [2046](https://github.com/sdv-dev/SDV/issues/2046) by lajohn4747
* If no filepath is provided, do not create a file during `sample` - Issue [2042](https://github.com/sdv-dev/SDV/issues/2042) by lajohn4747
* Add verbosity to `TVAESynthesizer` - Issue [1990](https://github.com/sdv-dev/SDV/issues/1990) by fealho
* Allow constraints in PARSynthesizer (for all context cols, or all non-context columns) - Issue [1936](https://github.com/sdv-dev/SDV/issues/1936) by lajohn4747
* Improve error message when sampling on a non-CPU device - Issue [1819](https://github.com/sdv-dev/SDV/issues/1819) by fealho
* Better data validation message for `auto_assign_transformers` - Issue [1509](https://github.com/sdv-dev/SDV/issues/1509) by lajohn4747

Miscellaneous

* Do not enforce min/max on sequence index column - PR [2043](https://github.com/sdv-dev/SDV/pull/2043)
* Include validation check for single table auto_assign_transformers - PR [2021](https://github.com/sdv-dev/SDV/pull/2021)
* Add the dummy context column to metadata and not to extra_context_column - PR [2019](https://github.com/sdv-dev/SDV/pull/2019)

1.13.1

This release fixes the `ModuleNotFoundError` that was causing the 1.13.0 release to fail.

1.13.0

This release adds a utility function called `get_random_subset` that helps users get a subset of their multi-table data so that modeling can be done more quickly. Given a dictionary of table names mapped to DataFrames, metadata, a main table, and a desired number of rows for the main table, it subsamples the data in a way that maintains referential integrity.
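The idea of subsampling while maintaining referential integrity can be sketched in plain Python for the simplest case of one parent table and one child table. This is an assumption-laden illustration, not SDV's algorithm: the function name `random_subset_sketch`, its parameters, and the list-of-dicts tables are hypothetical, and the real utility works on DataFrames and full multi-table metadata.

```python
import random


def random_subset_sketch(parents, children, primary_key, foreign_key,
                         num_rows, seed=None):
    """Sample parent rows, then keep only the children that reference them.

    Hypothetical helper for illustration; not part of SDV.
    """
    rng = random.Random(seed)
    sampled_parents = rng.sample(parents, num_rows)
    kept_keys = {row[primary_key] for row in sampled_parents}
    # Dropping children of unsampled parents keeps every foreign key valid.
    sampled_children = [row for row in children if row[foreign_key] in kept_keys]
    return sampled_parents, sampled_children


# Five hotels, ten guests, two guests per hotel.
hotels = [{"hotel_id": i} for i in range(5)]
guests = [{"guest_id": i, "hotel_id": i % 5} for i in range(10)]

sub_hotels, sub_guests = random_subset_sketch(
    hotels, guests, "hotel_id", "hotel_id", num_rows=2, seed=0
)
```

Sampling the main table first and filtering the related table afterwards guarantees that no child row in the subset points at a parent that was left out.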

This release also adds two new local file handlers: the `CSVHandler` and the `ExcelHandler`. These let users easily load data from, and save synthetic data to, these file types. The handlers return data and metadata in the multi-table format, so we also added the function `get_table_metadata` to get a `SingleTableMetadata` object from a `MultiTableMetadata` object.

Finally, this release fixes some bugs that prevented synthesizers from working with data that had numerical column names.

New Features

* Add `get_random_subset` poc utility function - Issue [1877](https://github.com/sdv-dev/SDV/issues/1877) by R-Palazzo
* Add usage logging - Issue [1903](https://github.com/sdv-dev/SDV/issues/1903) by pvk-developer
* Move function `drop_unknown_references` from `poc` to be directly under `utils` - Issue [1947](https://github.com/sdv-dev/SDV/issues/1947) by R-Palazzo
* Add CSVHandler - Issue [1949](https://github.com/sdv-dev/SDV/issues/1949) by pvk-developer
* Add ExcelHandler - Issue [1950](https://github.com/sdv-dev/SDV/issues/1950) by pvk-developer
* Add get_table_metadata function - Issue [1951](https://github.com/sdv-dev/SDV/issues/1951) by R-Palazzo
* Save usage log file as a csv - Issue [1974](https://github.com/sdv-dev/SDV/issues/1974) by frances-h
* Split out metadata creation from data import in the local files handlers - Issue [1975](https://github.com/sdv-dev/SDV/issues/1975) by pvk-developer
* Improve error message when trying to sample before fitting (single table) - Issue [1978](https://github.com/sdv-dev/SDV/issues/1978) by R-Palazzo

Bugs Fixed

* Metadata detection crashes when the column names are integers (`AttributeError: 'int' object has no attribute 'lower'`) - Issue [1933](https://github.com/sdv-dev/SDV/issues/1933) by lajohn4747
* Synthesizers crash when column names are integers (`TypeError: unsupported operand`) - Issue [1935](https://github.com/sdv-dev/SDV/issues/1935) by lajohn4747
* Switch parameter order in drop_unknown_references - Issue [1944](https://github.com/sdv-dev/SDV/issues/1944) by R-Palazzo
* Unexpected NaN values in sequence_index when dataframe isn't reset - Issue [1973](https://github.com/sdv-dev/SDV/issues/1973) by fealho
* Fix pandas DtypeWarning in download_demo - Issue [1980](https://github.com/sdv-dev/SDV/issues/1980) by fealho

Maintenance

* Only run unit and integration tests on oldest and latest python versions for macos - Issue [1948](https://github.com/sdv-dev/SDV/issues/1948) by frances-h

Internal

* Update code to remove `FutureWarning` related to 'enforce_uniqueness' parameter - Issue [1995](https://github.com/sdv-dev/SDV/issues/1995) by pvk-developer
