We are glad to announce the release of Gravitino **0.5.0**. This release is a major milestone for Gravitino and resolves over 240 issues covering new features, improvements, and bug fixes.
This release introduces several core features, including Apache Spark connector support, messaging catalog support, a general user and access management system, an event listener system, and Python client support. In addition, we have made many improvements and bug fixes to existing features.
Core Features
1. **Apache Spark connector support**: Gravitino now provides a Spark connector, so you can use Spark to read and write catalog metadata through Gravitino. https://github.com/datastrato/gravitino/issues/1227. For more information, refer to [spark-connector](https://github.com/datastrato/gravitino/blob/v0.5.0/docs/spark-connector/spark-connector.md).
2. **Messaging catalog support**: Gravitino now supports messaging catalogs for Apache Kafka and Kafka-compatible streaming systems, so you can manage the metadata of your messaging systems through Gravitino. https://github.com/datastrato/gravitino/issues/2369. For more information, refer to [kafka-catalog](https://github.com/datastrato/gravitino/blob/v0.5.0/docs/kafka-catalog.md).
3. **General user and access management**: Gravitino now supports general user and access management. https://github.com/datastrato/gravitino/issues/2232. This feature is currently in the alpha phase and is not ready for production use.
4. **Event listener system**: Gravitino now supports an event listener system. You can use it to listen to all operation events, or use its hook mechanism to plug in your own event handling, for example for operation history auditing or operation monitoring. https://github.com/datastrato/gravitino/issues/2233. For more information, refer to [event-listener-configuration](https://github.com/datastrato/gravitino/blob/v0.5.0/docs/gravitino-server-config.md#event-listener-configuration).
5. **Python client support**: Gravitino now provides a Python client, so you can connect to Gravitino and manage catalogs directly from Python. https://github.com/datastrato/gravitino/issues/2229. Currently, the Python client supports only fileset catalogs.
6. **Doris catalog support**: Gravitino now supports Apache Doris catalogs. https://github.com/datastrato/gravitino/issues/1339. For more information, refer to [jdbc-doris-catalog](https://github.com/datastrato/gravitino/blob/v0.5.0/docs/jdbc-doris-catalog.md).
7. **Support JDBC backend store**: Gravitino can now use a JDBC backend as its entity store in addition to RocksDB. If you want to use MySQL or PostgreSQL as the entity store, use the JDBC entity store. https://github.com/datastrato/gravitino/issues/1811. For more information, refer to [storage-configuration](https://github.com/datastrato/gravitino/blob/v0.5.0/docs/gravitino-server-config.md#storage-configuration).
8. **Support fileset catalog**: Gravitino now supports fileset catalogs, which let you manage non-tabular data on HDFS, S3, or other Hadoop-compatible file systems. https://github.com/datastrato/gravitino/issues/1241. For more information, refer to [fileset catalog](https://github.com/datastrato/gravitino/blob/v0.5.0/docs/manage-fileset-metadata-using-gravitino.md).
9. **Introduce capability framework**: This framework describes the capabilities of different catalogs, such as whether names are case-sensitive, which naming rules apply, and whether null values are supported. https://github.com/datastrato/gravitino/issues/2952
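As a rough illustration of the listener and hook idea behind the event listener system (item 4), the following Python sketch registers two listeners, an audit log and a failure counter, and dispatches each operation event to both synchronously. All names here (`OperationEvent`, `EventBus`, `AuditListener`, `FailureCounter`) are hypothetical and only demonstrate the general listener pattern; Gravitino's own listeners are Java plugins configured as described in the event-listener-configuration document, and their interfaces and dispatch modes may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class OperationEvent:
    """Hypothetical event emitted after a metadata operation finishes."""
    user: str
    operation: str   # e.g. "create_table"
    identifier: str  # e.g. "metalake.catalog.schema.table"
    succeeded: bool


class EventBus:
    """Hypothetical dispatcher: sends every event to all listeners, in registration order."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[OperationEvent], None]] = []

    def register(self, listener: Callable[[OperationEvent], None]) -> None:
        self._listeners.append(listener)

    def publish(self, event: OperationEvent) -> None:
        # Synchronous dispatch keeps the sketch simple; a server could
        # instead queue events and process them asynchronously.
        for listener in self._listeners:
            listener(event)


class AuditListener:
    """Records an operation history (the auditing use case)."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def __call__(self, event: OperationEvent) -> None:
        status = "OK" if event.succeeded else "FAILED"
        self.history.append(f"{event.user} {event.operation} {event.identifier} {status}")


class FailureCounter:
    """Counts failed operations (the monitoring use case)."""

    def __init__(self) -> None:
        self.failures = 0

    def __call__(self, event: OperationEvent) -> None:
        if not event.succeeded:
            self.failures += 1


bus = EventBus()
audit = AuditListener()
monitor = FailureCounter()
bus.register(audit)
bus.register(monitor)

bus.publish(OperationEvent("alice", "create_table", "lake.hive.db.t1", True))
bus.publish(OperationEvent("bob", "drop_schema", "lake.hive.db", False))

print(audit.history)
print(monitor.failures)
```

Because listeners are plain callables, adding a new hook means registering another function or object without touching the code that emits events.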
Gravitino core
- Handle multi-threading issues in Gravitino and introduce a tree lock. https://github.com/datastrato/gravitino/issues/407
- Introduce the user system. https://github.com/datastrato/gravitino/issues/2232
- Support entities of different types with the same name in the same namespace. https://github.com/datastrato/gravitino/issues/2697
- Improve the KV garbage collector. https://github.com/datastrato/gravitino/issues/1276, https://github.com/datastrato/gravitino/issues/2888
- Improve the client API. https://github.com/datastrato/gravitino/issues/1292, https://github.com/datastrato/gravitino/issues/2628, https://github.com/datastrato/gravitino/issues/839, https://github.com/datastrato/gravitino/issues/1635, https://github.com/datastrato/gravitino/issues/1793, https://github.com/datastrato/gravitino/issues/1759, https://github.com/datastrato/gravitino/issues/1758
- Separate out the Java client. https://github.com/datastrato/gravitino/issues/2478
- Make catalog class loaders eligible for garbage collection. https://github.com/datastrato/gravitino/issues/2706
- Support `UnparsedType` to represent types from the underlying catalog that Gravitino cannot resolve. https://github.com/datastrato/gravitino/issues/2117
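The tree lock listed above coordinates concurrent operations along the metadata hierarchy. The Python sketch below shows the general technique of hierarchical locking under these assumptions: names are dot-separated paths such as `metalake.catalog.schema.table`, every ancestor takes a shared (read) lock, and only the target node takes the requested lock. `TreeLock` and `ReadWriteLock` are hypothetical names; Gravitino's implementation is in Java and may differ in details such as fairness and cleanup of unused lock nodes.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class ReadWriteLock:
    """Simple reader-writer lock: many readers or one writer (no fairness)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class TreeLock:
    """Hypothetical hierarchical lock over dot-separated names.

    Ancestors are read-locked and the target gets the requested lock, so
    dropping a schema (write lock on the schema) waits for an in-flight
    table change under it (which holds a read lock on the schema).
    Lock nodes are never evicted in this sketch.
    """

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[Tuple[str, ...], ReadWriteLock] = {}

    def _lock_for(self, path: Tuple[str, ...]) -> ReadWriteLock:
        with self._guard:
            return self._locks.setdefault(path, ReadWriteLock())

    @contextmanager
    def lock(self, name: str, write: bool) -> Iterator[None]:
        parts = tuple(name.split("."))
        held: List[Tuple[ReadWriteLock, bool]] = []
        try:
            # Acquire top-down (root first, target last); a consistent
            # root-to-leaf order prevents lock-ordering deadlocks.
            for depth in range(1, len(parts) + 1):
                exclusive = write and depth == len(parts)
                rw = self._lock_for(parts[:depth])
                if exclusive:
                    rw.acquire_write()
                else:
                    rw.acquire_read()
                held.append((rw, exclusive))
            yield
        finally:
            # Release bottom-up, in reverse order of acquisition.
            for rw, exclusive in reversed(held):
                if exclusive:
                    rw.release_write()
                else:
                    rw.release_read()


tree_lock = TreeLock()
with tree_lock.lock("metalake.catalog.schema.table", write=True):
    pass  # alter the table while its ancestors stay read-locked
```

With this layout, writers on sibling tables do not block each other, while a write on a schema waits until operations on its tables finish.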
Catalog related
MySQL & PostgreSQL
- Add PostgreSQL support for array type conversion. https://github.com/datastrato/gravitino/issues/947
- Obtain MySQL table information from JDBC metadata. https://github.com/datastrato/gravitino/issues/2934
- Avoid using system tables for MySQL catalogs. https://github.com/datastrato/gravitino/issues/2085
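PostgreSQL names an array type after its element type with a leading underscore (for example, an `int4[]` column is reported with the type name `_int4`), so array support in a type converter largely comes down to recognizing that prefix. The Python sketch below illustrates this under stated assumptions: the Gravitino-side type names, the `PrimitiveType`/`ListType`/`UnparsedType` classes, and the `from_postgres` function are hypothetical, and unknown names fall back to an unparsed wrapper in the spirit of the unresolvable-type handling listed under Gravitino core. It is not Gravitino's actual converter.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PrimitiveType:
    name: str  # hypothetical Gravitino-side type name, e.g. "integer"


@dataclass(frozen=True)
class ListType:
    element: "GravitinoType"


@dataclass(frozen=True)
class UnparsedType:
    original: str  # keeps the source type name when no mapping exists


GravitinoType = Union[PrimitiveType, ListType, UnparsedType]

# Hypothetical mapping from PostgreSQL internal type names.
PRIMITIVES = {
    "bool": "boolean",
    "int2": "short",
    "int4": "integer",
    "int8": "long",
    "float4": "float",
    "float8": "double",
    "text": "string",
}


def from_postgres(type_name: str) -> GravitinoType:
    """Convert a PostgreSQL internal type name to a (hypothetical) Gravitino type."""
    if type_name.startswith("_"):
        # Array type: strip the underscore to get the element type. PostgreSQL
        # uses one array type for all dimensions, so map it to a single list level.
        element = PRIMITIVES.get(type_name[1:])
        if element is not None:
            return ListType(PrimitiveType(element))
        return UnparsedType(type_name)
    mapped = PRIMITIVES.get(type_name)
    return PrimitiveType(mapped) if mapped is not None else UnparsedType(type_name)


print(from_postgres("_int4"))
print(from_postgres("tsvector"))
```

Falling back to an unparsed wrapper instead of raising lets a catalog still list tables whose columns use types the converter does not understand.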
Kafka
- Please see the section `Core Features`
Fileset
- Please see the section `Core Features`
Doris
- Please see the section `Core Features`
Trino connector
- Add datatype test cases for the Trino connector. https://github.com/datastrato/gravitino/issues/2034
- Optimize varchar/char mapping between Gravitino catalogs and the Trino server. https://github.com/datastrato/gravitino/issues/2356
- Support the system table catalog. https://github.com/datastrato/gravitino/issues/2416
- Support catalog update operations in the Trino connector. https://github.com/datastrato/gravitino/issues/2417
- Make the Gravitino Trino connector compatible with Trino 435. https://github.com/datastrato/gravitino/issues/2376
Spark connector
- Please see the section `Core Features`
Build, test, and CI
- Introduce the Error Prone plugin to check code quality. https://github.com/datastrato/gravitino/issues/2225
- Increase the retry interval of the container status check. https://github.com/datastrato/gravitino/issues/2365
- Isolate the catalog classpath in integration tests (IT). https://github.com/datastrato/gravitino/pull/2397
- Extend the sleep time in the testInternalCache unit test. https://github.com/datastrato/gravitino/issues/2745
- Add web UI support for the fileset catalog. https://github.com/datastrato/gravitino/issues/2883
- Add a CI check to validate Gradle publishing. https://github.com/datastrato/gravitino/issues/2655
- Add more tests to the Gravitino web E2E test framework. https://github.com/datastrato/gravitino/issues/1503
- Separate the output logs of test containers. https://github.com/datastrato/gravitino/issues/2839
- Upload the process logs of IT containers. https://github.com/datastrato/gravitino/issues/2832
- Merge the embedded and deploy test modes for frontend integration tests. https://github.com/datastrato/gravitino/issues/2798
Web UI
- Verify whether the catalog exists before creating it. https://github.com/datastrato/gravitino/issues/2324
- Add web UI support for the Kafka catalog. https://github.com/datastrato/gravitino/issues/2614
- Add web UI support for the Fileset catalog. https://github.com/datastrato/gravitino/issues/2292
Documents
- Separate metadata operations into different docs. https://github.com/datastrato/gravitino/issues/2750
- Add a document about how to debug the Gravitino Trino connector locally. https://github.com/datastrato/gravitino/pull/2446
Limitations and known issues
- The Doris catalog does not yet support sort order, distribution, or partitioning; these features are under development.
Credits
bknbkn caican00 charliecheng630 ch3yne coolderli Clearvive danhuawang diqiu50 FANNG1 hiirrxnn ichuniq jerryshao justinmclean lw-yang Lanznx LauraXia123 mchades MohitKambli nk1506 qqqttt123 shaofengshi SteNicholas TEOTEO520 unknowntpo xiacongling xiaozcy xloya xunliu yijhenlin yuqi1129 Yangxuhao123 YxAc zhaoyongjie zhoukangcn zivali