Overview
This is the first minor release of Pygestor and its first fully functional version. It introduces the core features that AI researchers need to acquire, organize, and manage datasets.
Key Features
- Dataset Acquisition:
  - Download and load datasets from Hugging Face with a one-line command.
  - Automatic handling of subsets and partitions for efficient data storage and access.
- Data Organization:
  - Three-level organization structure: dataset, subset, and partition.
  - Support for both local and network file systems for data storage.
  - Batched loading for efficient handling of large files.
- Graphical User Interface:
  - A web GUI for intuitive data management and analysis.
  - Support for viewing schemas, metadata, and data samples.
  - Ability to download or remove a subset, or multiple partitions, in a single operation.
  - Support for searching and sorting data.
  - Ability to generate code snippets for quick access to datasets.
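The three-level organization and batched loading described above can be pictured with a small, stdlib-only sketch. This is not Pygestor's API: the classes `Dataset`, `Subset`, and `Partition`, the helper `scan`, the `root/<dataset>/<subset>/<partition>.jsonl` directory layout, and JSON Lines as the partition format are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of a dataset -> subset -> partition index with batched reads.
# None of these names come from Pygestor; they only illustrate the idea.
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class Partition:
    path: Path  # one file on a local or network file system

    def iter_batches(self, batch_size: int) -> Iterator[list[dict]]:
        """Yield records in batches of at most batch_size instead of reading the whole file."""
        batch: list[dict] = []
        with self.path.open("r", encoding="utf-8") as f:
            for line in f:
                batch.append(json.loads(line))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
        if batch:  # emit the final, possibly smaller, batch
            yield batch


@dataclass
class Subset:
    name: str
    partitions: list[Partition] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    subsets: dict[str, Subset] = field(default_factory=dict)


def scan(root: Path) -> dict[str, Dataset]:
    """Build an index from an assumed root/<dataset>/<subset>/<partition>.jsonl layout."""
    datasets: dict[str, Dataset] = {}
    for part_file in sorted(root.glob("*/*/*.jsonl")):
        subset_dir = part_file.parent
        dataset_dir = subset_dir.parent
        ds = datasets.setdefault(dataset_dir.name, Dataset(dataset_dir.name))
        sub = ds.subsets.setdefault(subset_dir.name, Subset(subset_dir.name))
        sub.partitions.append(Partition(part_file))
    return datasets


if __name__ == "__main__":
    # Usage: one dataset ("demo") with one subset ("en") holding a 5-record partition.
    with tempfile.TemporaryDirectory() as tmp:
        part = Path(tmp) / "demo" / "en" / "part-0.jsonl"
        part.parent.mkdir(parents=True)
        part.write_text("".join(json.dumps({"id": i}) + "\n" for i in range(5)))
        index = scan(Path(tmp))
        partition = index["demo"].subsets["en"].partitions[0]
        print([len(b) for b in partition.iter_batches(2)])  # [2, 2, 1]
```

Reading a partition through a generator keeps at most `batch_size` records in memory at a time, which is the general idea behind batched loading of large files; a real implementation would more likely read a columnar format than JSON Lines.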
Known Limitations
- Object Storage: Currently, S3 and other object storage solutions are not supported but are planned for future releases.
- Dataset Coverage: Only two datasets are currently supported, but more can be added with minimal effort.
Getting Started
Refer to the [Quick Start Guide](README.md) to set up and start using Pygestor.
Future Plans
- Object Storage Support: Integration with S3 and similar storage services.
- Enhanced API Features: Additional functionality for more granular data management.
- Broader Dataset Support: Support for more datasets and modalities.
Feedback
Your feedback and suggestions can help us improve future versions. Please report any issues or feature requests via GitHub Issues.
**Full Changelog**: https://github.com/rlsn/Pygestor/commits/v0.1.1