Gretel-synthetics

Latest version: v0.22.10



0.10.1

🐞 Fix the conversion of synthetic batches back to a DataFrame when a custom field delimiter is used

0.10.0

Major changes to Gretel Synthetics including native support for DataFrames and batched column training!

⚙️ Introduce a ``batch`` module that allows a DataFrame to be ingested and split into batches of smaller DataFrames where each batch has a subset of the columns of the source DataFrame. This allows training of datasets with several columns while still allowing the preservation of correlations and statistical data. See our [Medium Blog](https://medium.com/gretel-ai) for details and our example ``dataframe_batch`` Notebook located in the ``examples`` directory.
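The column-batching idea can be sketched in plain Python. This is only an illustration of splitting a table into column subsets, not the ``batch`` module's API: the helper names `split_columns` and `project_rows` are hypothetical, and a list of dicts stands in for a DataFrame.

```python
from typing import Dict, List

Row = Dict[str, str]


def split_columns(columns: List[str], batch_size: int) -> List[List[str]]:
    """Group column names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [columns[i:i + batch_size] for i in range(0, len(columns), batch_size)]


def project_rows(rows: List[Row], columns: List[str]) -> List[Row]:
    """Keep only the given columns of every row (one batch of the table)."""
    return [{name: row[name] for name in columns} for row in rows]


# Hypothetical source table with five columns.
rows = [
    {"id": "1", "age": "34", "city": "Austin", "plan": "pro", "spend": "120"},
    {"id": "2", "age": "51", "city": "Boston", "plan": "free", "spend": "0"},
]

# Each batch holds a subset of the columns but every row of the source.
column_batches = split_columns(list(rows[0].keys()), batch_size=2)
batches = [project_rows(rows, cols) for cols in column_batches]
```

Each entry in `batches` could then be trained on separately, which is the property the release note describes.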

📖 Massive updates to docstrings for the ``config`` module, with details for each config parameter.

🤖 Update to generation functionality. If a validator is provided, the ``gen_lines`` config option counts only _valid_ generated lines. To stop runaway generation, a ``max_invalid`` parameter specifies the maximum number of invalid lines that may be generated. If this number is exceeded, a ``RuntimeError`` is raised and generation halts.
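The counting behavior described above can be sketched as a small generator. This is a minimal illustration under stated assumptions, not the library's generation code: `generate_valid_lines` and `is_valid` are hypothetical names, and a fixed list of strings stands in for model output.

```python
from typing import Callable, Iterable, Iterator


def generate_valid_lines(
    candidates: Iterable[str],
    validator: Callable[[str], bool],
    gen_lines: int,
    max_invalid: int,
) -> Iterator[str]:
    """Yield up to gen_lines valid lines; fail once max_invalid is exceeded."""
    valid_count = 0
    invalid_count = 0
    for line in candidates:
        if validator(line):
            valid_count += 1
            yield line
            if valid_count >= gen_lines:
                return  # only valid lines count toward gen_lines
        else:
            invalid_count += 1
            if invalid_count > max_invalid:
                raise RuntimeError(
                    f"more than {max_invalid} invalid lines were generated"
                )


def is_valid(line: str) -> bool:
    """Hypothetical validator: a record must have exactly two fields."""
    return len(line.split(",")) == 2


# Stand-in for lines emitted by a model.
stream = ["1,2", "bad", "3,4", "oops", "5,6"]
valid = list(generate_valid_lines(stream, is_valid, gen_lines=2, max_invalid=5))
```

Here the invalid line `"bad"` is skipped without counting toward the two requested lines, and generation stops before `"oops"` is ever examined.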

0.9.3

⬆️ Upgraded to the latest SentencePiece and added a `max_line_len` param to the Config options. This lets you override the default SentencePiece line limit with a custom one. During our testing, we found that we had to set the limit a few thousand characters higher than the actual longest line. For example, for a line that was 49500 chars long, we had to set the limit to about 53000.
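One way to pick a value with some headroom is sketched below. The helper `suggest_max_line_len` is hypothetical, and the 7% default margin is only inferred from the single data point above (49500 chars needing roughly 53000), so treat it as a starting guess rather than a documented rule.

```python
import math
from typing import Iterable


def suggest_max_line_len(
    lines: Iterable[str], headroom: float = 0.07, round_to: int = 1000
) -> int:
    """Return the longest line length plus headroom, rounded up to round_to."""
    longest = max((len(line) for line in lines), default=0)
    padded = longest * (1 + headroom)
    return math.ceil(padded / round_to) * round_to


# A single 49500-character line, matching the example in the release note.
suggested = suggest_max_line_len(["x" * 49500])
```

The result could then be passed as `max_line_len` when building the configuration, and raised further if training still rejects long lines.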

0.9.2

🐛 Fix installation from pip where `setup.py` would fail.

📓 Updates to UCI Notebook

0.9.1

This update stops using the `annotations` module for type checks. We also provide Python 3.6 support through the `[3.6]` extras option. By default, the package works on Colab, since Colab already installs a backport of `dataclasses`, so installing with the extras is not necessary there.

0.9.0

**NOTE**: This release introduces some new constructs that are NOT backwards compatible with older versions.

⚙️ Configuration Changes:
* By default, we will not assume any structure in your training text. Lines will be generated without any presumed delimiter between the text. To use a delimiter you must specify the `field_delimiter` param when constructing your configuration. Our example notebooks have been updated to reflect this.
* Overwrite protection: if there is already a model and tokenizer in your checkpoint directory, you will receive a `RuntimeError` when attempting to train a new model that would overwrite the old data. If you wish to keep overwriting (for example, during rapid model generation and testing), set the `overwrite` param to `True` in your configuration. The example notebook has been updated to show this param.
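The overwrite protection can be pictured with the following sketch. It only mimics the described behavior: `ensure_safe_to_train` is a hypothetical helper, and the check for "any file in the directory" is an assumption, since the library's actual detection of an existing model and tokenizer is not described here.

```python
import tempfile
from pathlib import Path


def ensure_safe_to_train(checkpoint_dir: Path, overwrite: bool = False) -> None:
    """Raise RuntimeError if checkpoint_dir already holds files and overwrite is off."""
    has_files = checkpoint_dir.is_dir() and any(
        p.is_file() for p in checkpoint_dir.iterdir()
    )
    if has_files and not overwrite:
        raise RuntimeError(
            f"{checkpoint_dir} already contains model data; "
            "set overwrite=True to replace it"
        )


with tempfile.TemporaryDirectory() as tmp:
    checkpoint_dir = Path(tmp)
    # Stand-in for a previously trained model.
    (checkpoint_dir / "model.bin").write_text("placeholder")

    try:
        ensure_safe_to_train(checkpoint_dir)
        blocked = False
    except RuntimeError:
        blocked = True  # default behavior: refuse to overwrite

    # Opting in, as with the overwrite param, lets training proceed.
    ensure_safe_to_train(checkpoint_dir, overwrite=True)
```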

👩‍🍳 Cooking up new data
* Previously, we would yield a `dict` when generating a new record. Instead, we now yield a `gen_text` object. This object has the same data, but you access the various components as attributes of the object. For example, if you have a `line` variable that was emitted from the generator, you can access the raw text with `line.text`.
* If you provided a delimiter during configuration, the `gen_text` objects are aware of it, and you can get your generated fields by using the `values_to_list()` method of the object. See our docs for more detail on this object: https://gretel-synthetics.readthedocs.io/en/stable/api/generate.html
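The access pattern above can be illustrated with a small stand-in class. `GenTextSketch` is hypothetical and is not the library's `gen_text` implementation; in particular, returning `None` when no delimiter is set is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenTextSketch:
    """Stand-in for a generated record: raw text plus an optional delimiter."""

    text: str
    delimiter: Optional[str] = None

    def values_to_list(self) -> Optional[List[str]]:
        """Split the raw text into fields, or return None without a delimiter."""
        if self.delimiter is None:
            return None
        return self.text.split(self.delimiter)


# A record as it might be emitted by the generator.
line = GenTextSketch(text="alice,34,Austin", delimiter=",")
raw = line.text                  # attribute access instead of dict lookup
fields = line.values_to_list()   # individual field values
```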

👨‍💻 Code cleanup and test updates

