Pygestor

Latest version: v0.2.1


0.2.1

- Fixed a major bug in downloading partitions.
- Updated the documentation.
**Full Changelog**: https://github.com/rlsn/Pygestor/compare/v0.2.0...v0.2.1

0.2.0

Overview

This release introduces significant enhancements in version control, dataset management, and UI functionality. With the addition of new features and improvements, managing and updating datasets has become more efficient and user-friendly.
Key Updates
- **Version Control**: Added the ability to check dataset status, so you can confirm that local copies match the latest versions. Users can now easily identify and update outdated datasets.

- **Dataset Management**: Added a general pipeline for adding new datasets, which ingests them through predefined general pipelines. For example, the HuggingFaceParquet pipeline ingests Parquet datasets from Hugging Face. This makes adding new datasets possible with minimal effort.

- **UI Improvements**: Improved UI for easier navigation.

**Full Changelog**: https://github.com/rlsn/Pygestor/compare/v0.1.2...v0.2.0

0.1.2

Overview

This update addresses minor issues with system configuration. It also makes some improvements to the GUI layout.

Feedback

Your feedback and suggestions can help us improve future versions. Please report any issues or feature requests via GitHub Issues.

**Full Changelog**: https://github.com/rlsn/Pygestor/compare/v0.1.1...v0.1.2

0.1.1

Overview

This is the first minor release of Pygestor, marking the official launch of a fully functional version. This version introduces core features that allow AI researchers to seamlessly acquire, organize, and manage datasets.
Key Features
- Dataset Acquisition:
  - Support for downloading and loading datasets from Hugging Face with a simple one-line command.
  - Automatic handling of subsets and partitions for efficient data storage and access.

- Data Organization:
  - Three-level data organization structure: dataset, subset, and partition.
  - Support for both local and network file systems for data storage.
  - Efficient handling of large files through batched loading.

- Graphical User Interface:
  - Introduced a Web-GUI for intuitive data management and analysis.
  - Support for viewing schema, metadata, and data samples.
  - Ability to download or remove a whole subset or multiple partitions in one go.
  - Support for data searching and sorting.
  - Ability to generate code snippets for quick access to datasets.
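The three-level layout (dataset, subset, partition) and batched loading described above can be illustrated with a minimal, stdlib-only sketch. The class names `Dataset`, `Subset`, `Partition` and the helper `iter_batches` are hypothetical and are not Pygestor's API; rows are kept in memory purely for illustration, whereas real partitions would be files on local or network storage.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# Hypothetical model of the three-level hierarchy; not Pygestor's actual classes.
@dataclass
class Partition:
    name: str
    rows: List[dict]  # stand-in for a partition file's contents


@dataclass
class Subset:
    name: str
    partitions: List[Partition] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    subsets: List[Subset] = field(default_factory=list)

    def get_subset(self, name: str) -> Subset:
        """Look up a subset by name, raising KeyError if it is absent."""
        for subset in self.subsets:
            if subset.name == name:
                return subset
        raise KeyError(name)


def iter_batches(subset: Subset, batch_size: int) -> Iterator[List[dict]]:
    """Yield rows in fixed-size batches, walking partitions one at a time.

    Batches may span partition boundaries; the final batch can be smaller.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[dict] = []
    for partition in subset.partitions:
        for row in partition.rows:
            buffer.append(row)
            if len(buffer) == batch_size:
                yield buffer
                buffer = []  # start a fresh list so yielded batches stay intact
    if buffer:
        yield buffer
```

For example, a subset with partitions of three and two rows, read with `batch_size=2`, yields batches of sizes 2, 2 and 1. Iterating lazily means only one batch needs to be materialized by the caller at a time, which is the point of batched loading for large files.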

Known Limitations

- Object Storage: Currently, S3 and other object storage solutions are not supported but are planned for future releases.
- Dataset Coverage: Only two datasets are currently supported, but more can be added quickly with minimal effort.

Getting Started

Refer to the [Quick Start Guide](README.md) to set up and start using Pygestor.
Future Plans
- Object Storage Support: Integration with S3 and similar storage services.
- Enhanced API Features: Additional functionality for more granular data management.
- More Dataset Support: Support for more datasets and modalities.

Feedback

Your feedback and suggestions can help us improve future versions. Please report any issues or feature requests via GitHub Issues.

**Full Changelog**: https://github.com/rlsn/Pygestor/commits/v0.1.1
