Welcome to **StandardizeX**, a Python package that simplifies data standardization for Delta-format data products using a config-driven approach.
Transform raw data products into consistent, high-quality data products without writing complex code. StandardizeX is designed to be flexible, scalable, and maintainable, making your data standardization process smoother and more efficient. 💪
With StandardizeX, you can:
- Use local paths or cloud storage paths (AWS S3, Azure Blob Storage, etc.)
- Use Databricks Unity Catalog references (`catalog.schema.table`) for seamless integration with Databricks environments. Support for other metastores, such as Hive or AWS Glue, may be added in the future.
The package currently supports the following capabilities for transforming a raw data product into a standardized one:
- 🗑️ Removing unwanted columns.
- 🔄 Renaming columns.
- 🔧 Changing the data type of selected columns.
- 📝 Updating column description metadata.
- 🔄 Applying data transformations.
- ➕ Adding new columns derived from existing columns or other standardized data products.
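
To make the config-driven idea concrete, here is a minimal, library-free sketch of how a declarative config could drive three of the capabilities above (removing, renaming, and casting columns). It does not use StandardizeX itself: the config keys (`drop_columns`, `rename_columns`, `cast_columns`) and the `standardize` helper are hypothetical names chosen for illustration, and the package's actual config schema may look quite different.

```python
# Illustrative only: plain Python on a list of dicts, not the StandardizeX API.
# All config key names below are assumptions made for this sketch.
config = {
    "drop_columns": ["tmp_flag"],
    "rename_columns": {"cust_id": "customer_id", "amt": "amount"},
    "cast_columns": {"amount": "float"},
}

# Map type names used in the config to Python conversion functions.
CASTERS = {"int": int, "float": float, "str": str}


def standardize(rows, cfg):
    """Apply drop, rename, and cast steps (in that order) to each row."""
    drops = set(cfg.get("drop_columns", []))
    renames = cfg.get("rename_columns", {})
    casts = cfg.get("cast_columns", {})

    result = []
    for row in rows:
        # 1. Remove unwanted columns.
        row = {name: value for name, value in row.items() if name not in drops}
        # 2. Rename columns; names without a mapping are kept as-is.
        row = {renames.get(name, name): value for name, value in row.items()}
        # 3. Cast selected columns, referring to them by their new names.
        for name, type_name in casts.items():
            row[name] = CASTERS[type_name](row[name])
        result.append(row)
    return result


raw = [{"cust_id": "42", "amt": "19.99", "tmp_flag": "x"}]
standardized = standardize(raw, config)
print(standardized)
```

The point of the sketch is that changing the standardization logic means editing the config dictionary rather than the transformation code, which is the maintainability benefit a config-driven approach aims for.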