Gretel-synthetics

Latest version: v0.22.19


Page 11 of 12

0.9.1

This update stops using the `annotations` module for type checks. Python 3.6 is now supported through the `[3.6]` extras option. The package works on Colab by default because Colab already installs a backport of `dataclasses`, so installing with the extras on Colab is not necessary.

0.9.0

**NOTE**: This release introduces some new constructs that are NOT backwards compatible with older versions.

⚙️ Configuration Changes:
* By default, we will not assume any structure in your training text. Lines will be generated without any presumed delimiter between the text. To use a delimiter you must specify the `field_delimiter` param when constructing your configuration. Our example notebooks have been updated to reflect this.
* Overwrite protection: if there is already a model and tokenizer in your checkpoint directory, you will receive a `RuntimeError` when attempting to train a new model that would overwrite the old data. If you wish to keep overwriting (for example during rapid model generation or testing), set the `overwrite` param to `True` in your configuration. The example notebook has been updated to show this param.
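
The two configuration changes above can be pictured with a small stand-in. This is a minimal sketch and not the library's code: `SketchConfig`, `guard_checkpoint_dir`, `demo`, and the file name `model.bin` are hypothetical. Only the `field_delimiter` and `overwrite` parameter names, their defaults, and the error on overwrite come from the notes above.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a training configuration (not the library class)."""
    checkpoint_dir: str
    field_delimiter: Optional[str] = None  # no structure assumed by default
    overwrite: bool = False                # overwrite protection is on by default


def guard_checkpoint_dir(config: SketchConfig) -> None:
    """Raise if the checkpoint directory already holds files and overwrite is off."""
    directory = Path(config.checkpoint_dir)
    has_data = directory.is_dir() and any(directory.iterdir())
    if has_data and not config.overwrite:
        raise RuntimeError(
            f"{directory} already contains model data; set overwrite=True to replace it"
        )


def demo() -> List[str]:
    """Run the guard against an empty and then a populated directory."""
    outcomes = []
    with tempfile.TemporaryDirectory() as tmp:
        guard_checkpoint_dir(SketchConfig(checkpoint_dir=tmp))
        outcomes.append("empty: ok")
        (Path(tmp) / "model.bin").write_bytes(b"")  # assumed file name
        try:
            guard_checkpoint_dir(SketchConfig(checkpoint_dir=tmp))
        except RuntimeError:
            outcomes.append("populated: blocked")
        guard_checkpoint_dir(
            SketchConfig(checkpoint_dir=tmp, field_delimiter=",", overwrite=True)
        )
        outcomes.append("populated + overwrite: ok")
    return outcomes
```

Calling `demo()` exercises all three cases: a fresh directory passes, a populated one is blocked, and the same populated directory passes once `overwrite=True` is set.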

👩‍🍳 Cooking up new data
* Previously, we would yield a `dict` when generating a new record. We now yield a `gen_text` object instead. This object holds the same data, but you access its components as attributes. For example, if you have a `line` variable that was emitted from the generator, you can access the raw text with `line.text`.
* If you provided a delimiter during configuration, the `gen_text` objects are aware of it, and you can get your generated fields by calling the `values_to_list()` method of the object. See our docs for more detail on this object: https://gretel-synthetics.readthedocs.io/en/stable/api/generate.html
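
The move from dictionary access to attribute access might look roughly like the sketch below. `SketchGenText` and `fake_generator` are hypothetical stand-ins rather than the library's classes. The method name follows the wording of the note above, and returning `None` when no delimiter is configured is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class SketchGenText:
    """Hypothetical stand-in for the generated-record object."""
    text: str
    delimiter: Optional[str] = None

    def values_to_list(self) -> Optional[List[str]]:
        """Split the raw text into fields; None when no delimiter was configured (assumed)."""
        if self.delimiter is None:
            return None
        return self.text.split(self.delimiter)


def fake_generator(lines: List[str], delimiter: Optional[str]) -> Iterator[SketchGenText]:
    """Yield one object per line, mimicking a generator that no longer yields dicts."""
    for raw in lines:
        yield SketchGenText(text=raw, delimiter=delimiter)


records = list(fake_generator(["alice,42,NYC", "bob,37,LA"], delimiter=","))
first = records[0]
raw_text = first.text              # attribute access instead of a dict lookup
fields = first.values_to_list()    # fields split on the configured delimiter
```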

👨‍💻 Code cleanup and test updates

0.8.0

📖 Module docs now available at https://gretel-synthetics.readthedocs.io

🚧 Minor updates to internals to support better documentation

0.7.1

📚 Tutorial and doc improvements

* Use the installed TensorFlow library by default (Colab uses an optimized TensorFlow version for TPU)
* Optionally, install a pinned version of TensorFlow with `pip install gretel-synthetics[tf]`

0.7.0

👍 Improvements

* Calculate model perplexity per training epoch (metric for synthetic data set quality)
* Added progress bar for SentencePiece tokenizer (can take a while on large datasets)
* Cleaned up logging
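
The notes do not say how perplexity is computed. A common definition, assumed here, is the exponential of the mean negative log-likelihood the model assigns to the observed tokens. The sketch below uses that definition, and the function name `perplexity` is illustrative only.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the predicted token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)
```

Under this definition, a model that assigns probability 0.25 to every observed token has a perplexity of 4, as if it were choosing uniformly among four options at each step. Lower values indicate a better fit to the training text.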

📚 Tutorial and doc improvements

* Automatically save model parameters and training history to model directory
* Use the `save_all_checkpoints` config option to choose between saving only the best checkpoint and saving all checkpoints (saving only the best uses less disk space)

0.6.1

👍 Improvements
* Support CRLF newlines in training datasets
* Commas and newlines are treated by the tokenizer as user-defined symbols
* Increased the default vocabulary size from 200 to 15000, which increases the number of successful record validations in most test sets

📚 Tutorial and doc improvements
* Specify `max_lines` in the configuration file instead of `max_chars` (more intuitive)
