I'm thrilled to announce the release of **v1.2.0** of the [groq-qa-generator](https://github.com/jcassady/groq-qa-generator) project! This update brings significant new features and improvements to enhance your QA pair processing, dataset creation, and model fine-tuning workflows. For additional usage details, check out the [**README**](https://github.com/jcassady/groq-qa-generator#readme).
New Features
Dataset Splitting
- **Flexible Dataset Ratios**: You can now split your QA datasets at custom ratios beyond the default **80%** train and **20%** test split. Tailor your dataset splits to suit your specific needs with ease using the new `--split` CLI argument.
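
To make the ratio concrete, here is a minimal Python sketch of a train/test split at a custom ratio. The `split_qa_pairs` helper, its parameters, and the sample pairs are hypothetical illustrations, not the project's actual implementation; check the README for the real `--split` usage.

```python
import random


def split_qa_pairs(pairs, train_ratio=0.8, seed=None):
    """Split a list of QA pairs into (train, test) lists.

    Hypothetical helper for illustration only; it is not part of
    groq-qa-generator's API.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1 (exclusive)")

    shuffled = list(pairs)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)

    cut = round(len(shuffled) * train_ratio)  # index where the test set starts
    return shuffled[:cut], shuffled[cut:]


# Ten made-up QA pairs, split 70/30 instead of the default 80/20.
qa_pairs = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(10)]
train, test = split_qa_pairs(qa_pairs, train_ratio=0.7, seed=42)
print(len(train), len(test))
```

Passing a fixed `seed` keeps the split reproducible between runs, which helps when comparing fine-tuning results.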
Command Line Interface Enhancements 🖥️
- **`--split` Argument**: Specify your desired train/test split ratios directly from the CLI.
- **`--upload` Argument**: Seamlessly upload your datasets to Hugging Face with the new `--upload` argument.
Enhanced Output Formatting ✨
- **Formatted QA Pair Display**: The script now outputs QA pairs enclosed in neat ASCII boxes for better readability.
```text
2024-10-12 15:20:24 - root - INFO - Question 1:
2024-10-12 15:20:24 - root - INFO - +------------------------------------------------------------------------------------------------------+
2024-10-12 15:20:24 - root - INFO - | Q: What was Alice's initial reaction when she saw the White Rabbit take a watch out of its |
2024-10-12 15:20:24 - root - INFO - | waistcoat-pocket and hurry on? |
2024-10-12 15:20:24 - root - INFO - | ---------------------------------------------------------------------------------------------------- |
2024-10-12 15:20:24 - root - INFO - | A: She was startled and her curiosity was piqued, prompting her to follow the Rabbit. |
2024-10-12 15:20:24 - root - INFO - +------------------------------------------------------------------------------------------------------+
```
- **ASCII Tables for QA Summary**: At the end of the QA generation, the script now provides an ASCII table summarizing the generated QA pairs, making it easier to review the output.
```text
2024-10-12 15:20:25 - root - INFO - +-------------------+
2024-10-12 15:20:25 - root - INFO - | Training QA Pairs |
2024-10-12 15:20:25 - root - INFO - +-------------------+
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
2024-10-12 15:20:25 - root - INFO - | | Question | Answer |
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
2024-10-12 15:20:25 - root - INFO - | 1 | What did Alice find on the three-legged | A tiny golden key that might unlock one |
2024-10-12 15:20:25 - root - INFO - | | glass table that gave her hope of | of the doors in the hall. |
2024-10-12 15:20:25 - root - INFO - | | escaping the hall? | |
2024-10-12 15:20:25 - root - INFO - | --- | ---------------------------------------- | ---------------------------------------- |
2024-10-12 15:20:25 - root - INFO - | 2 | What was Alice's cautious approach to | Alice decided to examine the bottle |
2024-10-12 15:20:25 - root - INFO - | | the mysterious bottle with the "DRINK | carefully to ensure it wasn't marked |
2024-10-12 15:20:25 - root - INFO - | | ME" label? | "poison" before tasting its contents. |
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
```
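
For readers curious how a boxed QA display like the one above can be produced, here is a rough Python sketch using only the standard library. The `boxed_qa` function and its `width` parameter are hypothetical and written for this post; the project's own formatter may work differently.

```python
import textwrap


def boxed_qa(question, answer, width=60):
    """Return a QA pair drawn inside an ASCII box as a list of lines.

    Hypothetical helper for illustration; the project's own formatter
    may differ.
    """
    border = "+" + "-" * (width + 2) + "+"
    divider = "| " + "-" * width + " |"

    def rows(text):
        # Wrap to the inner width, then pad each line so the right edge aligns.
        return [f"| {line.ljust(width)} |" for line in textwrap.wrap(text, width)]

    return [border, *rows(f"Q: {question}"), divider, *rows(f"A: {answer}"), border]


# Sample text borrowed from the log output above.
for line in boxed_qa(
    "What did Alice find on the three-legged glass table?",
    "A tiny golden key.",
    width=40,
):
    print(line)
```

Wrapping before padding is what keeps long questions inside the box: every row ends up exactly `width + 4` characters wide.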
Hugging Face Integration 🤗
- **Upload to Hugging Face**: Directly upload your processed datasets to [Hugging Face](https://huggingface.co/), making sharing and collaborating on datasets easier than ever before.
```text
2024-10-12 15:20:25 - root - INFO - Uploading QA dataset to Hugging Face Hub.
Creating parquet from Arrow format: 100%|██████████████████████████████████████████| 1/1 [00:00<00:00, 1283.05ba/s]
Uploading the dataset shards: 100%|████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.31it/s]
README.md: 100%|█████████████████████████████████████████████████████████████████| 398/398 [00:00<00:00, 2.58MB/s]
2024-10-12 15:20:27 - root - INFO - Dataset uploaded to Hugging Face hub at https://huggingface.co/datasets/jcassady/test-dataset
```
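
If you would rather push a dataset from your own Python code, the sketch below shows one way to do it with the Hugging Face `datasets` library. It is an illustration under assumptions, not necessarily how groq-qa-generator performs the upload: `to_records` and `upload_qa_dataset` are hypothetical names, the repository ID is a placeholder, and the upload step assumes `datasets` is installed and you are already authenticated with the Hub.

```python
def to_records(pairs):
    """Convert (question, answer) tuples into row dicts for a dataset."""
    return [{"question": q, "answer": a} for q, a in pairs]


def upload_qa_dataset(train_pairs, test_pairs, repo_id):
    """Push train/test splits to the Hugging Face Hub.

    Assumes `pip install datasets` and a prior `huggingface-cli login`
    (or an HF_TOKEN environment variable). Illustrative only.
    """
    # Imported lazily so the rest of the module works without the dependency.
    from datasets import Dataset, DatasetDict

    dataset = DatasetDict(
        {
            "train": Dataset.from_list(to_records(train_pairs)),
            "test": Dataset.from_list(to_records(test_pairs)),
        }
    )
    # repo_id is a placeholder such as "your-username/your-dataset".
    dataset.push_to_hub(repo_id)


# Building the rows needs no network access or extra packages.
rows = to_records([("Who hurried past Alice?", "The White Rabbit.")])
print(rows)
```

Calling `upload_qa_dataset(train, test, "your-username/your-dataset")` would perform the actual upload; it is left out of the example so the snippet runs offline.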
Wrapping Up
I'm super excited about the new features in **v1.2.0**! With custom dataset splitting, new CLI arguments, cleaner output formatting, and seamless Hugging Face integration, generating and sharing QA datasets just got way cooler.
Don't wait: [upgrade](https://pypi.org/project/groq-qa-generator/) to the latest version and explore all the new perks. I'd love to hear what you think! Feel free to drop an [issue](https://github.com/jcassady/groq-qa-generator/issues) or swing by with a [pull request](https://github.com/jcassady/groq-qa-generator/pulls) if you've got ideas or want to contribute.
Thanks for being awesome and supporting me on this journey!
Catch you later,
*Jordan*
[jordan.cassady.me](https://jordan.cassady.me)
---
**References:**
- [Peek at the New Features](https://github.com/jcassady/groq-qa-generator/commit/182e26eaf71cc6a8d4097294de7de1d8ce50a9fb)
- [groq-qa-generator on PyPI](https://pypi.org/project/groq-qa-generator/)
- [groq-qa-generator on GitHub](https://github.com/jcassady/groq-qa-generator)
---