Dbldatagen

Latest version: v0.4.0.post1

Page 1 of 2

0.4.0

Changed
* Updated minimum pyspark version to 3.2.1, compatible with Databricks runtime 10.4 LTS or later
* Modified data generator to allow specification of constraints to the data generation process
* Updated documentation for generating text data
* Modified data distributions to use abstract base classes
* Migrated data distribution tests to use `pytest`
* Added additional standard datasets
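
For environments pinned to older runtimes, the minimum pyspark version noted above can be checked before upgrading. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and is not part of dbldatagen or pyspark.

```python
def meets_minimum(installed: str, minimum: str = "3.2.1") -> bool:
    """Compare dotted version strings numerically (hypothetical helper)."""

    def release(version: str) -> tuple:
        # Keep only the leading numeric components, so "3.5.0.dev0" -> (3, 5, 0).
        parts = []
        for piece in version.split("."):
            if not piece.isdigit():
                break
            parts.append(int(piece))
        return tuple(parts)

    # Tuple comparison is numeric, so "3.10.0" correctly ranks above "3.2.1".
    return release(installed) >= release(minimum)


print(meets_minimum("3.1.3"))
print(meets_minimum("3.10.0"))
```

In practice the installed version string would come from `pyspark.__version__`; a plain string comparison would wrongly rank `3.10.0` below `3.2.1`, which is why the components are compared as integers.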

Added
* Added classes for constraints on the data generation via new package `dbldatagen.constraints`
* Added support for standard data sets via the new package `dbldatagen.datasets`

0.3.6

Changed
* Updated readme to include details on which versions of Databricks runtime support Unity Catalog `shared` access mode.
* Updated code to use default parallelism of 200 when using a shared Spark session
* Updated code to use Spark's SQL function `element_at` instead of array indexing due to incompatibility
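
When swapping array indexing for `element_at`, the index base matters: in Spark SQL, bracket indexing on arrays is 0-based, while `element_at` is 1-based and accepts negative indices that count from the end. The sketch below emulates that lookup rule in plain Python and shows the index shift; both helper names are hypothetical and this is not the code dbldatagen uses.

```python
def element_at(values, index):
    """Pure-Python emulation of a 1-based, element_at-style lookup on a list."""
    if index == 0:
        raise ValueError("element_at indices are 1-based; 0 is not valid")
    if abs(index) > len(values):
        # Spark returns NULL or raises here depending on ANSI mode;
        # this sketch simply returns None.
        return None
    # Positive indices shift down by one; negative indices count from the end.
    return values[index - 1] if index > 0 else values[index]


def to_element_at(array_expr, zero_based_index):
    """Rewrite a 0-based `arr[i]` access as an element_at SQL expression string."""
    return f"element_at({array_expr}, {zero_based_index + 1})"


codes = ["a", "b", "c"]
print(element_at(codes, 1), element_at(codes, -1))
print(to_element_at("codes", 0))
```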

Notes
* This version changes the minimum supported Databricks runtime to 10.4 LTS and later releases.
* While there are no known incompatibilities with Databricks 9.1 LTS, we will not test against this release.

0.3.5

Changed
* Added formatting of generated code as HTML for script methods
* Allow use of inferred types on `withColumn` method when `expr` attribute is used
* Added `withStructColumn` method to allow simplified generation of struct and JSON columns
* Modified Pipfile to use newer version of package specifications

0.3.4

Changed
* Modified the `numFeatures` option to accept a range when used with `structType='array'`, allowing generation
of a varying number of columns
* When generating multi-column or array valued columns, compute random seed with different name for each column
* Added further build ordering enhancements to reduce the circumstances in which an explicit base column must be specified

Added
* Scripting of data generation code from schema (Experimental)
* Scripting of data generation code from dataframe (Experimental)
* Added top level `random` attribute to data generator specification constructor

0.3.3post2

Changed
* Fixed use of logger in _version.py and in spark_singleton.py
* Fixed template issues
* Document reformatting and updates, related code comment changes

Fixed
* Apply pandas optimizations when generating multiple columns using the same `withColumn` or `withColumnSpec`

Added
* Added prospector to the build process to check for common code issues

0.3.2

Changed
* Adjusted column build phase separation (i.e., which select statement is used to build columns) so that a
column with a SQL expression can refer to previously created columns without use of a `baseColumn` attribute
* Changed build labelling to comply with PEP440
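
To illustrate what PEP 440 compliant labelling means for the version strings on this page, the sketch below applies the canonical-form pattern published in PEP 440 (Appendix B) for public versions. The function name `is_canonical` is chosen here for illustration; the project's own build tooling is not shown in these notes.

```python
import re

# Canonical public-version pattern as given in PEP 440, Appendix B.
CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical(version: str) -> bool:
    """Return True if the string is already in PEP 440 canonical form."""
    return CANONICAL.match(version) is not None


for label in ("0.3.2", "0.4.0.post1", "0.3.3post2"):
    print(label, is_canonical(label))
```

A label such as `0.3.3post2` is still a valid PEP 440 version, since the separator before `post` may be omitted, but its normalized form is `0.3.3.post2`, so the canonical check rejects the unnormalized spelling.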

Fixed
* Fixed compatibility of build with older versions of the runtime that rely on `pyparsing` version 2.4.7

Added
* Parsing of SQL expressions to determine column dependencies

Notes
* The enhancements to build ordering do not change the actual order of column building,
but adjust which phase columns are built in
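
The idea of deriving column dependencies from SQL expressions and using them to pick a build phase can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: it uses a regular expression to find identifiers rather than a real SQL parser, and the function names and the example column set are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def column_dependencies(expr, known_columns):
    """Return the known column names referenced by a SQL expression string."""
    # Drop single-quoted string literals so their contents are not read as columns.
    without_literals = re.sub(r"'[^']*'", "", expr)
    return {tok for tok in IDENTIFIER.findall(without_literals) if tok in known_columns}


def build_phases(column_exprs):
    """Assign phase 0 to columns with no dependencies, else 1 + the highest dependency phase."""
    known = set(column_exprs)
    deps = {name: column_dependencies(expr, known) - {name}
            for name, expr in column_exprs.items()}
    phases = {}

    def phase_of(name, stack=()):
        if name in stack:
            raise ValueError(f"circular dependency involving {name!r}")
        if name not in phases:
            phases[name] = 1 + max(
                (phase_of(dep, stack + (name,)) for dep in deps[name]), default=-1
            )
        return phases[name]

    for name in column_exprs:
        phase_of(name)
    return phases


# Hypothetical column definitions: name -> SQL expression.
columns = {
    "id": "monotonically_increasing_id()",
    "price": "rand() * 100",
    "qty": "id % 10 + 1",
    "total": "price * qty",
}
print(build_phases(columns))
```

Here `total` lands in a later phase than `price` and `qty` because its expression mentions them, which mirrors how a column with a SQL expression can refer to previously created columns without an explicit `baseColumn`.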
