Gretel-synthetics

Latest version: v0.22.14



0.7.0

👍 Improvements

* Calculate model perplexity per training epoch (metric for synthetic data set quality)
* Added progress bar for SentencePiece tokenizer (can take a while on large datasets)
* Cleaned up logging
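
As a rough illustration of the per-epoch perplexity metric mentioned above: perplexity is conventionally the exponential of the mean per-token cross-entropy loss (in nats). The sketch below assumes that definition; the function name `perplexity` and the loss values are hypothetical and are not taken from the gretel-synthetics code.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Return exp(loss), the standard perplexity for a loss measured in nats."""
    return math.exp(mean_cross_entropy_nats)


# Hypothetical mean training losses for three epochs (illustration only).
epoch_losses = [2.0, 1.5, 1.0]
perplexities = [perplexity(loss) for loss in epoch_losses]

for epoch, ppl in enumerate(perplexities, start=1):
    print(f"epoch {epoch}: perplexity={ppl:.2f}")
```

Lower perplexity means the model is less "surprised" by the training text, which is why it can serve as a rough per-epoch quality signal.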

📚 Tutorial and doc improvements

* Automatically save model parameters and training history to model directory
* Use the `save_all_checkpoints` config option to choose between saving only the best checkpoint (saves disk space) and saving all checkpoints

0.6.1

👍 Improvements

* Support CRLF newlines in training datasets
* Tokenizer now treats commas and newlines as user-defined symbols
* Increased default vocabulary size from 200 to 15000, which increases the number of successful record validations in most test sets
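
To show what CRLF support means in practice, here is a minimal sketch of reading training records so that Windows-style (`\r\n`) and Unix-style (`\n`) line endings yield the same text. The helper `read_records` is a hypothetical name for illustration; it is not the library's actual implementation.

```python
import io


def read_records(stream):
    """Yield each line with any trailing CR/LF characters removed."""
    for line in stream:
        yield line.rstrip("\r\n")


# newline="" disables newline translation, so the raw "\r\n" reaches the helper.
sample = io.StringIO("a,b\r\nc,d\n", newline="")
records = list(read_records(sample))
print(records)
```

With this kind of normalization, a dataset exported on Windows and one exported on Linux produce identical training records.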

📚 Tutorial and doc improvements

* Specify `max_lines` in the configuration file instead of `max_chars` (more intuitive)

0.6.0

Added SentencePiece tokenization (https://github.com/google/sentencepiece) to allow for fixed vocabulary sizes and character- or token-based training.

0.5.0

Initial release of Gretel's synthetic data generation project.

