Groq-qa-generator

Latest version: v1.2.1


1.2.1

Announcing the release of **groq-qa-generator v1.2.1**, an update focused on the reliability of the QA generation process. This version fixes how malformed QA pairs are handled and how train/test datasets are saved, and improves logging so it is clear where those datasets are written. 🎯

Summary of Changes

🐛 Fixes:
- **Malformed QA Pair Handling**:
  - QA pairs missing either a question or an answer are now filtered out during dataset creation, so only complete pairs are included in the final output. This removes unexpected behavior caused by malformed entries.

- **Accurate Dataset Saving**:
  - Both train and test datasets are now consistently saved in either **JSON** or **plain text** format based on the selected configuration. This ensures train/test splits are correctly handled regardless of output format.
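A minimal sketch of the malformed-pair filter, assuming each QA pair is a dict with `question` and `answer` keys. Those key names and the helper `filter_valid_pairs` are hypothetical; the library's internal data model is not described in these notes. A pair counts as malformed when either field is missing or blank.

```python
from typing import Iterable


def filter_valid_pairs(pairs: Iterable[dict]) -> list[dict]:
    """Keep only QA pairs that have both a non-empty question and answer.

    Hypothetical helper: the `question`/`answer` keys are assumptions,
    not groq-qa-generator's actual data model.
    """
    valid = []
    for pair in pairs:
        # Treat missing keys and None values the same as empty strings.
        question = (pair.get("question") or "").strip()
        answer = (pair.get("answer") or "").strip()
        # A pair missing either half is malformed and is dropped.
        if question and answer:
            valid.append({"question": question, "answer": answer})
    return valid


pairs = [
    {"question": "Who did Alice follow?", "answer": "The White Rabbit."},
    {"question": "What was on the glass table?"},  # missing answer
    {"question": "", "answer": "A tiny golden key."},  # blank question
]
print(len(filter_valid_pairs(pairs)))  # 1
```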

🔧 Improvements:
- **Enhanced Logging**:
  - Log messages now show the file paths of the saved train and test datasets, so you can trace where each dataset was written, whether in JSON or text format.

- **Refined Documentation**:
  - Docstrings across the codebase have been revised for clarity, making the source easier to follow for developers.
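The saving and logging changes can be pictured with the sketch below. It is not the library's code: the helper names `save_split` and `save_train_test`, the `train`/`test` file names, and the plain-text layout are all assumptions. What it illustrates is that one format setting is applied to both splits and each saved path is logged.

```python
import json
import logging
import tempfile
from pathlib import Path

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)


def save_split(pairs: list[dict], path: Path, fmt: str) -> Path:
    """Write one split as JSON or plain text and log the resulting path."""
    if fmt == "json":
        path = path.with_suffix(".json")
        path.write_text(json.dumps(pairs, indent=2), encoding="utf-8")
    elif fmt == "text":
        path = path.with_suffix(".txt")
        # Assumed plain-text layout: "Q:"/"A:" lines, pairs separated by a blank line.
        blocks = [f"Q: {p['question']}\nA: {p['answer']}" for p in pairs]
        path.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    logging.info("Saved %d QA pairs to %s", len(pairs), path)
    return path


def save_train_test(train, test, out_dir: Path, fmt: str) -> tuple[Path, Path]:
    """Save both splits with the same format so neither one is skipped."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return save_split(train, out_dir / "train", fmt), save_split(test, out_dir / "test", fmt)


train = [{"question": f"Q{i}?", "answer": f"A{i}."} for i in range(4)]
test = [{"question": "Q4?", "answer": "A4."}]
with tempfile.TemporaryDirectory() as tmp:
    train_path, test_path = save_train_test(train, test, Path(tmp), "text")
    test_text = test_path.read_text(encoding="utf-8")
print(train_path.name, test_path.name)  # train.txt test.txt
```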

🔬 Testing Enhancements:
- Test coverage has been strengthened to ensure malformed QA pairs are filtered out as expected, and the correct number of valid pairs is included in the saved datasets.

---

Update Instructions

To update to the latest version, run:

```bash
pip install groq-qa-generator --upgrade
```

For additional details on usage and features, please refer to the [official repository](https://github.com/jcassady/groq-qa-generator).

Feedback, issue reports, and contributions to the project are appreciated.

1.2.0

I'm thrilled to announce the release of **v1.2.0** of the [groq-qa-generator](https://github.com/jcassady/groq-qa-generator) project! This update brings significant new features and improvements to enhance your QA pair processing, dataset creation, and model fine-tuning workflows. For additional usage details, check out the [**README**](https://github.com/jcassady/groq-qa-generator#readme).

New Features 🎉

Dataset Splitting ✂️

- **Flexible Dataset Ratios**: You can now split your QA datasets at a custom train/test ratio instead of the default **80%** train and **20%** test split, using the new `--split` CLI argument.
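A custom-ratio split might look like this sketch. It assumes `--split` corresponds to a train fraction such as 0.8 (the notes do not state the argument's exact format), and `split_dataset` is a hypothetical helper that shuffles with an optional seed before cutting.

```python
import random


def split_dataset(pairs, train_ratio=0.8, seed=None):
    """Shuffle QA pairs and split them into (train, test) lists.

    `train_ratio` is an assumed stand-in for the `--split` value;
    the default 0.8 mirrors the default 80/20 split.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(pairs)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    cut = int(len(shuffled) * train_ratio)  # floor: leftovers go to test
    return shuffled[:cut], shuffled[cut:]


pairs = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(20)]
train, test = split_dataset(pairs, train_ratio=0.75, seed=42)
print(len(train), len(test))  # 15 5
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.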

Command Line Interface Enhancements 🖥️

- **`--split` Argument**: Specify your desired train/test split ratios directly from the CLI.
- **`--upload` Argument**: Seamlessly upload your datasets to Hugging Face with the new `--upload` argument.

Enhanced Output Formatting ✨

- **Formatted QA Pair Display**: The script now outputs QA pairs enclosed in neat ASCII boxes for better readability.
```text
2024-10-12 15:20:24 - root - INFO - Question 1:
2024-10-12 15:20:24 - root - INFO - +------------------------------------------------------------------------------------------------------+
2024-10-12 15:20:24 - root - INFO - | Q: What was Alice's initial reaction when she saw the White Rabbit take a watch out of its           |
2024-10-12 15:20:24 - root - INFO - | waistcoat-pocket and hurry on?                                                                       |
2024-10-12 15:20:24 - root - INFO - | ---------------------------------------------------------------------------------------------------- |
2024-10-12 15:20:24 - root - INFO - | A: She was startled and her curiosity was piqued, prompting her to follow the Rabbit.                |
2024-10-12 15:20:24 - root - INFO - +------------------------------------------------------------------------------------------------------+
```
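The boxed display above can be approximated with Python's standard `textwrap` module. This is a sketch, not the library's formatter: the 100-character inner width is inferred from the sample log, and `format_qa_box` is a hypothetical name.

```python
import textwrap


def format_qa_box(question: str, answer: str, width: int = 100) -> str:
    """Render one QA pair inside an ASCII box.

    Sketch only: the inner width and wrapping rules are assumptions
    inferred from the sample output, not the library's implementation.
    """
    border = "+" + "-" * (width + 2) + "+"
    divider = "| " + "-" * width + " |"

    def rows(text: str) -> list[str]:
        # Wrap to the inner width, then pad each line so the right edge lines up.
        return [f"| {line.ljust(width)} |" for line in textwrap.wrap(text, width)]

    lines = [border, *rows(f"Q: {question}"), divider, *rows(f"A: {answer}"), border]
    return "\n".join(lines)


box = format_qa_box(
    "What was Alice's initial reaction when she saw the White Rabbit take a watch "
    "out of its waistcoat-pocket and hurry on?",
    "She was startled and her curiosity was piqued, prompting her to follow the Rabbit.",
)
print(box)
```

Passing each line of `box.splitlines()` to `logging.info` would reproduce the per-line log prefix seen in the sample.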

- **ASCII Tables for QA Summary**: At the end of the QA generation, the script now provides an ASCII table summarizing the generated QA pairs, making it easier to review the output.
```text
2024-10-12 15:20:25 - root - INFO - +-------------------+
2024-10-12 15:20:25 - root - INFO - | Training QA Pairs |
2024-10-12 15:20:25 - root - INFO - +-------------------+
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
2024-10-12 15:20:25 - root - INFO - |     | Question                                 | Answer                                   |
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
2024-10-12 15:20:25 - root - INFO - |   1 | What did Alice find on the three-legged  | A tiny golden key that might unlock one  |
2024-10-12 15:20:25 - root - INFO - |     | glass table that gave her hope of        | of the doors in the hall.                |
2024-10-12 15:20:25 - root - INFO - |     | escaping the hall?                       |                                          |
2024-10-12 15:20:25 - root - INFO - | --- | ---------------------------------------- | ---------------------------------------- |
2024-10-12 15:20:25 - root - INFO - |   2 | What was Alice's cautious approach to    | Alice decided to examine the bottle      |
2024-10-12 15:20:25 - root - INFO - |     | the mysterious bottle with the "DRINK    | carefully to ensure it wasn't marked     |
2024-10-12 15:20:25 - root - INFO - |     | ME" label?                               | "poison" before tasting its contents.    |
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
```
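A summary table like the one above can also be sketched with `textwrap`. The column widths (3 for the index, 40 for question and answer) are inferred from the sample output, and `format_qa_table` and the `question`/`answer` keys are hypothetical.

```python
import textwrap


def format_qa_table(pairs, col_width: int = 40) -> str:
    """Build an ASCII summary table of QA pairs with wrapped cells.

    Sketch under assumptions: widths are read off the sample output,
    and the `question`/`answer` keys are hypothetical.
    """
    widths = [3, col_width, col_width]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    separator = "| " + " | ".join("-" * w for w in widths) + " |"

    def render_row(cells):
        # Wrap each cell, then emit as many physical lines as the tallest cell.
        wrapped = [textwrap.wrap(c, w) or [""] for c, w in zip(cells, widths)]
        height = max(len(col) for col in wrapped)
        out = []
        for i in range(height):
            parts = [(col[i] if i < len(col) else "").ljust(w) for col, w in zip(wrapped, widths)]
            out.append("| " + " | ".join(parts) + " |")
        return out

    lines = [border, *render_row(["", "Question", "Answer"]), border]
    for n, pair in enumerate(pairs, start=1):
        if n > 1:
            lines.append(separator)  # visual break between QA pairs
        lines.extend(render_row([str(n), pair["question"], pair["answer"]]))
    lines.append(border)
    return "\n".join(lines)


pairs = [
    {
        "question": "What did Alice find on the three-legged glass table that gave her hope of escaping the hall?",
        "answer": "A tiny golden key that might unlock one of the doors in the hall.",
    },
    {
        "question": 'What was Alice\'s cautious approach to the mysterious bottle with the "DRINK ME" label?',
        "answer": 'Alice decided to examine the bottle carefully to ensure it wasn\'t marked "poison" before tasting its contents.',
    },
]
table = format_qa_table(pairs)
print(table)
```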

Hugging Face Integration 🤗

- **Upload to Hugging Face**: Directly upload your processed datasets to [Hugging Face](https://huggingface.co/), making sharing and collaborating on datasets easier than ever before.
```text
2024-10-12 15:20:25 - root - INFO - Uploading QA dataset to Hugging Face Hub.
Creating parquet from Arrow format: 100%|██████████████████████████████████████████| 1/1 [00:00<00:00, 1283.05ba/s]
Uploading the dataset shards: 100%|████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.31it/s]
README.md: 100%|█████████████████████████████████████████████████████████████████| 398/398 [00:00<00:00, 2.58MB/s]
2024-10-12 15:20:27 - root - INFO - Dataset uploaded to Hugging Face hub at https://huggingface.co/datasets/jcassady/test-dataset
```
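Uploading train/test splits to the Hub is commonly done with the Hugging Face `datasets` library, as in this sketch. Whether groq-qa-generator uses exactly these calls is an assumption, although the progress bars in the log above (Parquet conversion, shard upload, README) are consistent with what `DatasetDict.push_to_hub` displays. `to_records`, `upload`, and the repository ID are placeholders.

```python
def to_records(pairs):
    """Keep only the question/answer fields so every record shares one schema."""
    return [{"question": p["question"], "answer": p["answer"]} for p in pairs]


def upload(train_pairs, test_pairs, repo_id: str) -> None:
    """Push train/test splits to the Hugging Face Hub as one dataset repository.

    Requires `pip install datasets` and a Hub token with write access
    (for example via `huggingface-cli login`).
    """
    # Imported lazily so `to_records` works without `datasets` installed.
    from datasets import Dataset, DatasetDict

    dataset = DatasetDict(
        {
            "train": Dataset.from_list(to_records(train_pairs)),
            "test": Dataset.from_list(to_records(test_pairs)),
        }
    )
    # push_to_hub converts each split to Parquet shards and uploads them.
    dataset.push_to_hub(repo_id)


# Example (hypothetical repo ID; needs network access and a token):
# upload(train, test, "your-username/your-qa-dataset")
```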

Wrapping Up 🎉

I'm super excited about the new features in **v1.2.0**! With custom dataset splitting, new CLI arguments, better output formatting, and seamless Hugging Face integration, generating and sharing QA datasets just got way cooler.

Don't wait—[upgrade](https://pypi.org/project/groq-qa-generator/) to the latest version and explore all the new perks. I'd love to hear what you think! Feel free to drop an [issue](https://github.com/jcassady/groq-qa-generator/issues) or swing by with a [pull request](https://github.com/jcassady/groq-qa-generator/pulls) if you've got ideas or want to contribute.

Thanks for being awesome and supporting me on this journey!

Catch you later,

*Jordan*

[jordan.cassady.me](https://jordan.cassady.me)

---

**References:**

- [Peek at the New Features](https://github.com/jcassady/groq-qa-generator/commit/182e26eaf71cc6a8d4097294de7de1d8ce50a9fb)
- [groq-qa-generator on PyPI](https://pypi.org/project/groq-qa-generator/)
- [groq-qa-generator on GitHub](https://github.com/jcassady/groq-qa-generator)

---

1.1.0

🚀 New Features
* **CLI Enhancement**: The CLI now supports the brand-new `--questions` argument! 🎯
  You can now specify the exact number of question-answer pairs to generate per chunk of text, offering greater control over output. Whether you're generating questions for demos or fine-tuning, this new feature helps you tailor the output to your needs.
  PR by [jcassady](https://github.com/jcassady) in [#3](https://github.com/jcassady/groq-qa-generator/pull/3)

Example usage:
```bash
groq-qa --questions 1
```
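To make "per chunk" concrete: if the input text is split into k chunks and `--questions` is n, the model is asked for n pairs per chunk, so up to k × n pairs in total. The sketch below assumes paragraph-based chunking with a hypothetical character limit and a hypothetical prompt wording; `chunk_text` and `build_prompts` are illustrative names, and the library's actual chunking and templates are not described in these notes.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Hypothetical rule: a chunk grows until adding the next paragraph would
    exceed `max_chars`; a single oversized paragraph becomes its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_prompts(text: str, questions: int, max_chars: int = 2000) -> list[str]:
    # Hypothetical prompt wording: one request per chunk, `questions` pairs each.
    return [
        f"Generate {questions} question-answer pair(s) from the following text:\n\n{chunk}"
        for chunk in chunk_text(text, max_chars)
    ]


text = "\n\n".join(["A" * 30, "B" * 30, "C" * 30])
prompts = build_prompts(text, questions=1, max_chars=70)
print(len(prompts))  # 2 chunks x 1 question = up to 2 QA pairs
```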

🔧 Full Changelog:
* Check out all the details of this release: [Compare v1.0.0...v1.1.0](https://github.com/jcassady/groq-qa-generator/compare/v1.0.0...v1.1.0)

This version now makes it easier to generate precise, customizable question-answer pairs right from the command line. Enjoy the new flexibility! ✨

1.0.1

🔧 Improvements:
- 🧹 Removed redundant logging handler cleanup code from `config.py` for a cleaner setup.
- 🗑️ Removed unnecessary logging dependency from `pyproject.toml` to reduce complexity.
- 🛠️ Moved the `include` option under `[tool.poetry]` in `pyproject.toml` to ensure the necessary files are properly packaged.

1.0.0

Overview
This is the initial public release of **Groq QA Generator**, a Python library for automating the creation of question-answer pairs from text. Designed to streamline the process of fine-tuning large language models (LLMs) such as **LLaMA 3**, this tool is ideal for generating high-quality QA datasets with minimal manual effort. It can be used as a command-line interface (CLI) or directly imported into Python projects.

✨ Features
- **🖥️ CLI and Python Library**: Use `groq-qa` directly from the command line or as a library in your Python projects.
- **🤖 Automated QA Generation**: Automatically generate question-answer pairs from input text using powerful LLMs.
- **📄 Prompt Templates**: Flexible question generation enabled through customizable prompt templates.
- **📊 Model Support**: Integration with advanced models like **LLaMA 3.1 70B** via the Groq API for high-quality output.
- **⚙️ Customizable Configuration**: Configure the generator through a `config.json` file or programmatically for customized QA creation.
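The "config file or programmatic" pattern can be sketched as a simple layered merge. The keys in `DEFAULTS` and the `load_config` function are hypothetical illustrations, not groq-qa-generator's documented `config.json` schema or API.

```python
import json
from pathlib import Path

# Hypothetical defaults chosen for illustration only; they are not the
# documented groq-qa-generator config.json keys.
DEFAULTS = {"questions": 1, "split": 0.8, "output_format": "json"}


def load_config(path=None, **overrides):
    """Merge defaults, an optional JSON file, and keyword overrides (last wins)."""
    config = dict(DEFAULTS)
    if path is not None and Path(path).is_file():
        config.update(json.loads(Path(path).read_text(encoding="utf-8")))
    # Programmatic overrides take precedence; None means "not set".
    config.update({key: value for key, value in overrides.items() if value is not None})
    return config


print(load_config(questions=3))
# {'questions': 3, 'split': 0.8, 'output_format': 'json'}
```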

🚀 Installation
Install the package via PyPI:
```bash
pip install groq-qa-generator
```

📌 Usage Examples
- **CLI**: Use the `groq-qa` command to generate QA pairs from text files with default or custom settings.
- **Python Library**: Import `groq_qa_generator` in your Python code and call `generate()` to create QA pairs programmatically.

📝 Notes
This release marks the beginning of Groq QA Generator's journey, providing essential features to help developers streamline the dataset creation process for model fine-tuning. Contributions and feedback are welcome.
