
Hadoop catalog with Cloud Storage
- For S3, please refer to [Hadoop-catalog-with-s3](./hadoop-catalog-with-s3.md) for more details.
- For GCS, please refer to [Hadoop-catalog-with-gcs](./hadoop-catalog-with-gcs.md) for more details.
- For OSS, please refer to [Hadoop-catalog-with-oss](./hadoop-catalog-with-oss.md) for more details.
- For Azure Blob Storage, please refer to [Hadoop-catalog-with-adls](./hadoop-catalog-with-adls.md) for more details.

How to customize your own HCFS file system fileset?

Developers and users can customize their own HCFS file system fileset by implementing the `FileSystemProvider` interface, which ships in the jar [gravitino-catalog-hadoop](https://repo1.maven.org/maven2/org/apache/gravitino/catalog-hadoop/). The `FileSystemProvider` interface is defined as follows:

```java
// Create a FileSystem instance from the properties you set when creating the catalog.
FileSystem getFileSystem(@Nonnull Path path, @Nonnull Map<String, String> config)
    throws IOException;

// The scheme of the file system provider. 'file' for the local file system,
// 'hdfs' for HDFS, 's3a' for AWS S3, 'gs' for GCS, 'oss' for Aliyun OSS.
String scheme();

// Name of the file system provider. 'builtin-local' for the local file system, 'builtin-hdfs' for HDFS,
// 's3' for AWS S3, 'gcs' for GCS, 'oss' for Aliyun OSS.
// You need to set the catalog property `filesystem-providers` to enable this file system.
String name();
```

Gravitino uses Java SPI to load custom file system providers. You need to create a file named `org.apache.gravitino.catalog.fs.FileSystemProvider` in the `META-INF/services` directory of the jar file. The content of the file is the full class name of the custom file system provider.
For example, the content of `S3FileSystemProvider` is as follows:
![img.png](assets/fileset/custom-filesystem-provider.png)

After implementing the `FileSystemProvider` interface, you need to put the jar file into the `$GRAVITINO_HOME/catalogs/hadoop/libs` directory. Then you can set the `filesystem-providers` property to use your custom file system provider.
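For illustration only, if your provider's `name()` returned `my-custom-fs` (a hypothetical value, not from this document), you would enable it via the catalog property:

```
filesystem-providers = my-custom-fs
```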

Authentication for Hadoop Catalog

The Hadoop catalog supports multi-level authentication to control access, allowing different authentication settings for the catalog, schema, and fileset. The priority of authentication settings is as follows: catalog < schema < fileset. Specifically:

- **Catalog**: The default authentication is `simple`.
- **Schema**: Inherits the authentication setting from the catalog if not explicitly set. For more information about schema settings, please refer to [Schema properties](#schema-properties).
- **Fileset**: Inherits the authentication setting from the schema if not explicitly set. For more information about fileset settings, please refer to [Fileset properties](#fileset-properties).

The default value of `authentication.impersonation-enable` is `false` for catalogs; for schemas and filesets, the default value is inherited from the parent. A value set by the user overrides the parent value, following the same priority mechanism as authentication.
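The inheritance rule above (fileset overrides schema, schema overrides catalog, catalog falls back to the default) can be sketched as a small resolver. This is a minimal illustration, not Gravitino's actual implementation; the class and method names are invented for this example.

```java
import java.util.Map;

// Sketch of the catalog < schema < fileset priority rule: the most specific
// level that explicitly sets a key wins; otherwise the value is inherited,
// ultimately falling back to the hard-coded default ("simple" for
// authentication.type, "false" for authentication.impersonation-enable).
public class AuthResolver {
    public static String resolve(Map<String, String> fileset,
                                 Map<String, String> schema,
                                 Map<String, String> catalog,
                                 String key,
                                 String defaultValue) {
        if (fileset.containsKey(key)) return fileset.get(key);
        if (schema.containsKey(key)) return schema.get(key);
        if (catalog.containsKey(key)) return catalog.get(key);
        return defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = Map.of("authentication.type", "kerberos");
        Map<String, String> schema = Map.of();  // nothing set: inherits from catalog
        Map<String, String> fileset = Map.of("authentication.type", "simple");

        // The fileset-level value overrides the inherited catalog value.
        System.out.println(resolve(fileset, schema, catalog, "authentication.type", "simple"));
        // prints "simple"
    }
}
```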

Catalog operations

Refer to [Catalog operations](./manage-fileset-metadata-using-gravitino.md#catalog-operations) for more details.

Schema

Schema capabilities

The Hadoop catalog supports creating, updating, deleting, and listing schema.

Schema properties

| Property name | Description | Default value | Required | Since Version |
|---------------------------------------|----------------------------------------------------------------------------------------------------------------|---------------------------|----------|------------------|
| `location` | The storage location managed by Hadoop schema. | (none) | No | 0.5.0 |
| `authentication.impersonation-enable` | Whether to enable impersonation for this schema of the Hadoop catalog. | The parent(catalog) value | No | 0.6.0-incubating |
| `authentication.type` | The type of authentication for this schema of the Hadoop catalog. Currently, only `kerberos` and `simple` are supported. | The parent(catalog) value | No | 0.6.0-incubating |
| `authentication.kerberos.principal` | The principal of the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating |
| `authentication.kerberos.keytab-uri` | The URI of the keytab for the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating |
| `credential-providers` | The credential provider types, separated by comma. | (none) | No | 0.8.0-incubating |

Schema operations

Refer to [Schema operations](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.

Fileset

Fileset capabilities

- The Hadoop catalog supports creating, updating, deleting, and listing filesets.

Fileset properties

| Property name | Description | Default value | Required | Since Version |
|---------------------------------------|--------------------------------------------------------------------------------------------------------|--------------------------|----------|------------------|
| `authentication.impersonation-enable` | Whether to enable impersonation for the Hadoop catalog fileset. | The parent(schema) value | No | 0.6.0-incubating |
| `authentication.type` | The type of authentication for the Hadoop catalog fileset. Currently, only `kerberos` and `simple` are supported. | The parent(schema) value | No | 0.6.0-incubating |
| `authentication.kerberos.principal` | The principal of the Kerberos authentication for the fileset. | The parent(schema) value | No | 0.6.0-incubating |
| `authentication.kerberos.keytab-uri` | The URI of the keytab for the Kerberos authentication for the fileset. | The parent(schema) value | No | 0.6.0-incubating |
| `credential-providers` | The credential provider types, separated by comma. | (none) | No | 0.8.0-incubating |

Credential providers can be specified in several places, as listed below. Gravitino checks the `credential-providers` setting in the following order of precedence:

1. Fileset properties
2. Schema properties
3. Catalog properties
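For illustration of the lookup order above, a fileset-level setting overrides the same key at the schema or catalog level. A hypothetical fileset creation payload (the fileset name and provider value are assumptions, not from this document) might set it like this:

```json
{
  "name": "example_fileset",
  "type": "MANAGED",
  "comment": "fileset-level setting overrides schema- and catalog-level ones",
  "properties": {
    "credential-providers": "s3-token"
  }
}
```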

Fileset operations

Refer to [Fileset operations](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.


---
title: How to sign and verify Gravitino releases
slug: /how-to-sign-releases
license: "This software is licensed under the Apache License version 2."
---

These instructions provide a guide to signing and verifying Apache Gravitino releases to enhance the security of releases. A signed release enables people to confirm the author of the release and guarantees that the code hasn't been altered.

Prerequisites

Before signing or verifying a Gravitino release, ensure you have the following prerequisites installed:

- GPG/GnuPG
- Release artifacts

Platform support

These instructions are for macOS. You may need to make adjustments for other platforms.

1. **How to Install GPG or GnuPG:**

[GnuPG](https://www.gnupg.org) is an open-source implementation of the OpenPGP standard and allows you to encrypt and sign files or emails. GnuPG, also known as GPG, is a command line tool.

Check to see if GPG is installed by running the command:

```shell
gpg --help
```


If GPG/GnuPG isn't installed, run the following command to install it. You only need to do this step once.

```shell
brew install gpg
```


Signing a release

1. **Create a Public/Private Key Pair:**

Check to see if you already have a public/private key pair by running the command:

```shell
gpg --list-secret-keys
```


If you get no output, you'll need to generate a public/private key pair.

Use this command to generate a public/private key pair; this is a one-time process. Set the key expiry to 5 years and omit a comment. All other defaults are acceptable.

```shell
gpg --full-generate-key
```


Here is an example of generating a public/private key pair by using the previous command.

```shell
gpg (GnuPG) 2.4.3; Copyright (C) 2023 g10 Code GmbH
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
   (9) ECC (sign and encrypt) *default*
  (10) ECC (sign only)
  (14) Existing key from card
Your selection?
Please select which elliptic curve you want:
   (1) Curve 25519 *default*
   (4) NIST P-384
   (6) Brainpool P-256
Your selection?
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 5y
Key expires at Mon 13 Nov 16:08:58 2028 AEDT
Is this correct? (y/N) y

GnuPG needs to construct a user ID to identify your key.

Real name: John Smith
Email address: john@apache.org
Comment:
You selected this USER-ID:
    "John Smith <john@apache.org>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
gpg: revocation certificate stored as '/Users/justin/.gnupg/openpgp-revocs.d/CC6BD9B0A3A31A7ACFF9E1383DF672F671B7F722.rev'
public and secret key created and signed.

pub   ed25519 2023-11-15 [SC] [expires: 2028-11-13]
      CC6BD9B0A3A31A7ACFF9E1383DF672F671B7F722
uid                      John Smith <john@apache.org>
sub   cv25519 2023-11-15 [E] [expires: 2028-11-13]
```


:::caution important
Keep your private key secure and saved somewhere other than just on your computer. Don't forget your key password, and also securely record it somewhere. If you lose your keys or forget your password, you won't be able to sign releases.
:::

2. **Sign a release:**

To sign a release, use the following command for each release file:

```shell
gpg --detach-sign --armor <filename>.[zip|tar.gz]
```


For example, to sign the Gravitino 0.2.0 release you would use this command.

```shell
gpg --detach-sign --armor gravitino.0.2.0.zip
```


This generates an .asc file containing a PGP signature. Anyone can use this file and your public key to verify the release.

3. **Generate hashes for a release:**

Use the following command to generate hashes for a release:

```shell
shasum -a 256 <filename>.[zip|tar.gz] > <filename>.[zip|tar.gz].sha256
```


For example, to generate a hash for the Gravitino 0.2.0 release you would use this command:

```shell
shasum -a 256 gravitino.0.2.0.zip > gravitino.0.2.0.zip.sha256
```


4. **Copy your public key to the KEYS file:**

The KEYS file contains the public keys used to sign previous releases. You only need to do this step once. Execute the following commands to export your public key to a KEY file and then append it to the KEYS file.

```shell
gpg --output KEY --armor --export <youremail>
cat KEY >> KEYS
```


5. **Publish hashes and signatures:**

Upload the generated .asc and .sha256 files along with the release artifacts and KEYS file to the release area.

Verifying a release

1. **Import public keys:**

Download the KEYS file. Import the public keys used to sign all previous releases with this command. It doesn't matter if you have already imported the keys.

```shell
gpg --import KEYS
```


2. **Verify the signature:**

Download the .asc and release files. Use the following command to verify the signature:

```shell
gpg --verify <filename>.[zip|tar.gz].asc
```


The output should contain the text "Good signature from ...".

For example, to verify the Gravitino 0.2.0 zip file you would use this command:

```shell
gpg --verify gravitino.0.2.0.zip.asc
```


3. **Verify the hashes:**

Check if the hashes match, using the following command:

```shell
diff -u <filename>.[zip|tar.gz].sha256 <(shasum -a 256 <filename>.[zip|tar.gz])
```


For example, to verify the Gravitino 0.2.0 zip file you would use this command:

```shell
diff -u gravitino.0.2.0.zip.sha256 <(shasum -a 256 gravitino.0.2.0.zip)
```


This command confirms that the hashes match and that the release file hasn't been altered.


---
title: "OceanBase catalog"
slug: /jdbc-oceanbase-catalog
keywords:
- jdbc
- OceanBase
- metadata
license: "This software is licensed under the Apache License version 2."
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Introduction

Apache Gravitino provides the ability to manage OceanBase metadata.

:::caution
Gravitino saves some system information in schema and table comments, like `(From Gravitino, DO NOT EDIT: gravitino.v1.uid1078334182909406185)`. Please don't change or remove this message.
:::

Catalog

Catalog capabilities

- A Gravitino catalog corresponds to an OceanBase instance.
- Supports metadata management of OceanBase (4.x).
- Supports DDL operations for OceanBase databases and tables.
- Supports table indexes.
- Supports [column default value](./manage-relational-metadata-using-gravitino.md#table-column-default-value) and [auto-increment](./manage-relational-metadata-using-gravitino.md#table-column-auto-increment).

Catalog properties

You can pass any property that isn't defined by Gravitino to an OceanBase data source by adding the `gravitino.bypass.` prefix to it as a catalog property. For example, the catalog property `gravitino.bypass.maxWaitMillis` passes `maxWaitMillis` to the data source.

Check the relevant data source configuration in [data source properties](https://commons.apache.org/proper/commons-dbcp/configuration.html).
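The prefix convention above can be sketched as a small helper that forwards only the prefixed properties, with the prefix stripped. This is an illustration of the convention, not Gravitino's actual implementation; the class and method names are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: properties carrying the "gravitino.bypass." prefix are forwarded to
// the underlying data source with the prefix removed; everything else is kept
// for Gravitino itself.
public class BypassProperties {
    private static final String PREFIX = "gravitino.bypass.";

    public static Map<String, String> toDataSourceProperties(Map<String, String> catalogProps) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : catalogProps.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                result.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return result;
    }
}
```

So a catalog property `gravitino.bypass.maxWaitMillis=1000` would reach the data source as `maxWaitMillis=1000`, while `jdbc-user` stays a Gravitino-level property.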

If you use a JDBC catalog, you must provide `jdbc-url`, `jdbc-driver`, `jdbc-user`, and `jdbc-password` in the catalog properties.
Besides the [common catalog properties](./gravitino-server-config.md#gravitino-catalog-properties-configuration), the OceanBase catalog has the following properties:

| Configuration item | Description | Default value | Required | Since Version |
|----------------------|---------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------|
| `jdbc-url` | JDBC URL for connecting to the database. For example, `jdbc:mysql://localhost:2881` or `jdbc:oceanbase://localhost:2881` | (none) | Yes | 0.7.0-incubating |
| `jdbc-driver` | The driver of the JDBC connection. For example, `com.mysql.jdbc.Driver` or `com.mysql.cj.jdbc.Driver` or `com.oceanbase.jdbc.Driver`. | (none) | Yes | 0.7.0-incubating |
| `jdbc-user` | The JDBC user name. | (none) | Yes | 0.7.0-incubating |
| `jdbc-password` | The JDBC password. | (none) | Yes | 0.7.0-incubating |
| `jdbc.pool.min-size` | The minimum number of connections in the pool. `2` by default. | `2` | No | 0.7.0-incubating |
| `jdbc.pool.max-size` | The maximum number of connections in the pool. `10` by default. | `10` | No | 0.7.0-incubating |

:::caution
Before using the OceanBase Catalog, you must download the corresponding JDBC driver to the `catalogs/jdbc-oceanbase/libs` directory.
Gravitino doesn't package the JDBC driver for OceanBase due to licensing issues.
:::
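As an illustration of the required properties from the table above, a catalog creation payload might look like the following sketch. The catalog name, host, and credentials are placeholders, and the `provider` value is inferred from the `jdbc-oceanbase` directory name mentioned above; treat all of them as assumptions.

```json
{
  "name": "oceanbase_catalog",
  "type": "RELATIONAL",
  "provider": "jdbc-oceanbase",
  "comment": "example OceanBase catalog",
  "properties": {
    "jdbc-url": "jdbc:oceanbase://localhost:2881",
    "jdbc-driver": "com.oceanbase.jdbc.Driver",
    "jdbc-user": "root",
    "jdbc-password": "password"
  }
}
```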

Catalog operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

Schema

Schema capabilities

- Gravitino's schema concept corresponds to the OceanBase database.
- Supports creating schema, but does not support setting comment.
- Supports dropping schema.
- Supports cascade dropping schema.

Schema properties

- Doesn't support any schema property settings.

Schema operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.

Table

Table capabilities

- Gravitino's table concept corresponds to the OceanBase table.
- Supports DDL operations for OceanBase tables.
- Supports indexes.
- Supports [column default value](./manage-relational-metadata-using-gravitino.md#table-column-default-value) and [auto-increment](./manage-relational-metadata-using-gravitino.md#table-column-auto-increment).

Table properties

- Doesn't support table properties.

Table column types

| Gravitino Type | OceanBase Type |
|-------------------|---------------------|
| `Byte` | `Tinyint` |
| `Byte(false)` | `Tinyint Unsigned` |
| `Short` | `Smallint` |
| `Short(false)` | `Smallint Unsigned` |
| `Integer` | `Int` |
| `Integer(false)` | `Int Unsigned` |
| `Long` | `Bigint` |
| `Long(false)` | `Bigint Unsigned` |
| `Float` | `Float` |
| `Double` | `Double` |
| `String` | `Text` |
| `Date` | `Date` |
| `Time` | `Time` |
| `Timestamp` | `Timestamp` |
| `Decimal` | `Decimal` |
| `VarChar` | `VarChar` |
| `FixedChar` | `FixedChar` |
| `Binary` | `Binary` |

:::info
OceanBase doesn't support the Gravitino `Boolean`, `Fixed`, `Struct`, `List`, `Map`, `Timestamp_tz`, `IntervalDay`, `IntervalYear`, `Union`, and `UUID` types.
Meanwhile, data types other than those listed above are mapped to the Gravitino **[External Type](./manage-relational-metadata-using-gravitino.md#external-type)**, which represents an unresolvable data type, since 0.6.0-incubating.
:::

Table column auto-increment

:::note
In OceanBase, an auto-increment column requires a unique index to be set on it at the same time; otherwise, an error will occur.
:::

<Tabs groupId='language' queryString>
<TabItem value="json" label="Json">

```json
{
  "columns": [
    {
      "name": "id",
      "type": "integer",
      "comment": "id column comment",
      "nullable": false,
      "autoIncrement": true
    },
    {
      "name": "name",
      "type": "varchar(500)",
      "comment": "name column comment",
      "nullable": true,
      "autoIncrement": false
    }
  ],
  "indexes": [
    {
      "indexType": "primary_key",
      "name": "PRIMARY",
      "fieldNames": [["id"]]
    }
  ]
}
```


</TabItem>
<TabItem value="java" label="Java">

```java
Column[] cols = new Column[] {
    Column.of("id", Types.IntegerType.get(), "id column comment", false, true, null),
    Column.of("name", Types.VarCharType.of(500), "name column comment", true, false, null)
};
Index[] indexes = new Index[] {
    Indexes.of(IndexType.PRIMARY_KEY, "PRIMARY", new String[][]{{"id"}})
};
```


</TabItem>
</Tabs>


Table indexes

- Supports PRIMARY_KEY and UNIQUE_KEY.

<Tabs groupId='language' queryString>
<TabItem value="json" label="Json">

```json
{
  "indexes": [
    {
      "indexType": "primary_key",
      "name": "PRIMARY",
      "fieldNames": [["id"]]
    },
    {
      "indexType": "unique_key",
      "name": "id_name_uk",
      "fieldNames": [["id"], ["name"]]
    }
  ]
}
```


</TabItem>
<TabItem value="java" label="Java">

```java
Index[] indexes = new Index[] {
    Indexes.of(IndexType.PRIMARY_KEY, "PRIMARY", new String[][]{{"id"}}),
    Indexes.of(IndexType.UNIQUE_KEY, "id_name_uk", new String[][]{{"id"}, {"name"}})
};
```


</TabItem>
</Tabs>

Table operations

:::note
The OceanBase catalog does not support creating partitioned tables in the current version.
:::

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.

Alter table operations

Gravitino supports these table alteration operations:

- `RenameTable`
- `UpdateComment`
- `AddColumn`
- `DeleteColumn`
- `RenameColumn`
- `UpdateColumnType`
- `UpdateColumnPosition`
- `UpdateColumnNullability`
- `UpdateColumnComment`
- `UpdateColumnDefaultValue`
- `SetProperty`

:::info
- You cannot submit the `RenameTable` operation at the same time as other operations.
- If you update a nullable column to be non-nullable, there may be compatibility issues.
:::


---
title: "MySQL catalog"
slug: /jdbc-mysql-catalog
keywords:
- jdbc
- MySQL
- metadata
license: "This software is licensed under the Apache License version 2."
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Introduction

Apache Gravitino provides the ability to manage MySQL metadata.

:::caution
Gravitino saves some system information in schema and table comments, like `(From Gravitino, DO NOT EDIT: gravitino.v1.uid1078334182909406185)`. Please don't change or remove this message.
:::

Catalog

Catalog capabilities

- A Gravitino catalog corresponds to a MySQL instance.
- Supports metadata management of MySQL (5.7, 8.0).
- Supports DDL operations for MySQL databases and tables.
- Supports table indexes.
- Supports [column default value](./manage-relational-metadata-using-gravitino.md#table-column-default-value) and [auto-increment](./manage-relational-metadata-using-gravitino.md#table-column-auto-increment).
- Supports managing MySQL table features through table properties, like using `engine` to set the MySQL storage engine.

Catalog properties

You can pass any property that isn't defined by Gravitino to a MySQL data source by adding the `gravitino.bypass.` prefix to it as a catalog property. For example, the catalog property `gravitino.bypass.maxWaitMillis` passes `maxWaitMillis` to the data source.

Check the relevant data source configuration in [data source properties](https://commons.apache.org/proper/commons-dbcp/configuration.html).

When you use Gravitino with Trino, you can pass the Trino MySQL connector configuration using the prefix `trino.bypass.`. For example, use `trino.bypass.join-pushdown.strategy` to pass `join-pushdown.strategy` to the Gravitino MySQL catalog in the Trino runtime.

If you use a JDBC catalog, you must provide `jdbc-url`, `jdbc-driver`, `jdbc-user`, and `jdbc-password` in the catalog properties.
Besides the [common catalog properties](./gravitino-server-config.md#gravitino-catalog-properties-configuration), the MySQL catalog has the following properties:

| Configuration item | Description | Default value | Required | Since Version |
|----------------------|--------------------------------------------------------------------------------------------------------|---------------|----------|---------------|
| `jdbc-url` | JDBC URL for connecting to the database. For example, `jdbc:mysql://localhost:3306` | (none) | Yes | 0.3.0 |
| `jdbc-driver` | The driver of the JDBC connection. For example, `com.mysql.jdbc.Driver` or `com.mysql.cj.jdbc.Driver`. | (none) | Yes | 0.3.0 |
| `jdbc-user` | The JDBC user name. | (none) | Yes | 0.3.0 |
| `jdbc-password` | The JDBC password. | (none) | Yes | 0.3.0 |

| Property name | Description | Default value | Required | Since Version |
|-------------------|-------------|---------------|----------|------------------|
| `replication_num` | The number of replications for the table. If not specified and the number of backend servers is less than 3, the default value is 1; if not specified and the number of backend servers is greater than or equal to 3, the default value (3) in the Doris server is used. For more, please see the [doc](https://doris.apache.org/docs/1.2/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE/) | `1` or `3` | No | 0.6.0-incubating |

:::caution
Before using the Doris Catalog, you must download the corresponding JDBC driver to the `catalogs/jdbc-doris/libs` directory.
Gravitino doesn't package the JDBC driver for Doris due to licensing issues.
:::

Catalog operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

Schema

Schema capabilities

- Gravitino's schema concept corresponds to the Doris database.
- Supports creating schema.
- Supports dropping schema.

Schema properties

- Supports schema properties, including Doris database properties and user-defined properties.

Schema operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.

Table

Table capabilities

- Gravitino's table concept corresponds to the Doris table.
- Supports index.
- Supports [column default value](./manage-relational-metadata-using-gravitino.md#table-column-default-value).

Table column types

| Gravitino Type | Doris Type |
|----------------|------------|
| `Boolean` | `Boolean` |
| `Byte` | `TinyInt` |
| `Short` | `SmallInt` |
| `Integer` | `Int` |
| `Long` | `BigInt` |
| `Float` | `Float` |
| `Double` | `Double` |
| `Decimal` | `Decimal` |
| `Date` | `Date` |
| `Timestamp` | `Datetime` |
| `VarChar` | `VarChar` |
| `FixedChar` | `Char` |
| `String` | `String` |


:::info
Doris doesn't support the Gravitino `Fixed`, `Timestamp_tz`, `IntervalDay`, `IntervalYear`, `Union`, and `UUID` types.
Data types other than those listed above are mapped to Gravitino's **[Unparsed Type](./manage-relational-metadata-using-gravitino.md#unparsed-type)**, which represents an unresolvable data type, since 0.5.0.
:::

:::note
Gravitino cannot load the Doris `array`, `map`, and `struct` types correctly, because Doris doesn't support these types in JDBC.
:::


Table column auto-increment

Unsupported for now.

Table properties

- Supports setting Doris table properties when creating a table.
- Only supports Doris table properties; user-defined properties aren't supported.

Table indexes

- Supports PRIMARY_KEY

Please be aware that the index can only apply to a single column.

<Tabs groupId='language' queryString>
<TabItem value="json" label="Json">

```json
{
  "indexes": [
    {
      "indexType": "primary_key",
      "name": "PRIMARY",
      "fieldNames": [["id"]]
    }
  ]
}
```


</TabItem>
<TabItem value="java" label="Java">

```java
Index[] indexes = new Index[] {
    Indexes.of(IndexType.PRIMARY_KEY, "PRIMARY", new String[][]{{"id"}})
};
```


</TabItem>
</Tabs>

Table partitioning

The Doris catalog supports partitioned tables.
Users can create partitioned tables in the Doris catalog with specific partitioning attributes. It is also supported to pre-assign partitions when creating Doris tables.
Note that although Gravitino supports several partitioning strategies, Apache Doris inherently only supports these two partitioning strategies:

- `RANGE`
- `LIST`

:::caution
The `fieldName` specified in the partitioning attributes must be the name of columns defined in the table.
:::

Table distribution

Users can also specify the distribution strategy when creating tables in the Doris catalog. Currently, the Doris catalog supports the following distribution strategies:
- `HASH`
- `RANDOM`

For the `RANDOM` distribution strategy, Gravitino uses `EVEN` to represent it. More information about the distribution strategies defined in Gravitino can be found [here](./table-partitioning-distribution-sort-order-indexes.md#table-distribution).


Table operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.

Alter table operations

Gravitino supports these table alteration operations:

- `RenameTable`
- `UpdateComment`
- `AddColumn`
- `DeleteColumn`
- `UpdateColumnType`
- `UpdateColumnPosition`
- `UpdateColumnComment`
- `SetProperty`

Please be aware that:

- Not all table alteration operations can be processed in batches.
- Schema changes, such as adding/modifying/dropping columns can be processed in batches.
- Supports modifying multiple column comments at the same time.
- Doesn't support modifying the column type and column comment at the same time.
- The schema alteration in Doris is asynchronous. You might get an outdated schema if you
execute a schema query immediately after the alteration. It is recommended to pause briefly
after the schema alteration. Gravitino will add the schema alteration status into
the schema information in the upcoming version to solve this problem.

| Property name | Description | Default value | Required | Since Version |
|-------------------|-------------|---------------|----------|---------------|
| `list-all-tables` | Lists all tables in a database, including non-Hive tables, such as Iceberg, Hudi, etc. | `false` | No | 0.5.1 |

:::note
For `list-all-tables=false`, the Hive catalog will filter out:
- Iceberg tables by table property `table_type=ICEBERG`
- Paimon tables by table property `table_type=PAIMON`
- Hudi tables by table property `provider=hudi`
:::

When you use Gravitino with Trino, you can pass the Trino Hive connector configuration using the prefix `trino.bypass.`. For example, use `trino.bypass.hive.config.resources` to pass `hive.config.resources` to the Gravitino Hive catalog in the Trino runtime.

When you use Gravitino with Spark, you can pass the Spark Hive connector configuration using the prefix `spark.bypass.`. For example, use `spark.bypass.hive.exec.dynamic.partition.mode` to pass `hive.exec.dynamic.partition.mode` to the Spark Hive connector in the Spark runtime.

When you use Gravitino authorization for Hive with Apache Ranger, see [Authorization Hive with Ranger properties](security/authorization-pushdown.md#authorization-hive-with-ranger-properties).

Catalog operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

Schema

Schema capabilities

The Hive catalog supports creating, updating, and deleting databases in the HMS.

Schema properties

Schema properties supply or set metadata for the underlying Hive database.
The following table lists predefined schema properties for the Hive database. Additionally, you can define your own key-value pair properties and transmit them to the underlying Hive database.

| Property name | Description | Default value | Required | Since Version |
|---------------|--------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|----------|---------------|
| `location` | The directory for Hive database storage, such as `/user/hive/warehouse`. | HMS uses the value of `hive.metastore.warehouse.dir` in the `hive-site.xml` by default. | No | 0.1.0 |

Schema operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.

Table

Table capabilities

- The Hive catalog supports creating, updating, and deleting tables in the HMS.
- Doesn't support column default value.

Table partitioning

The Hive catalog supports [partitioned tables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables). Users can create partitioned tables in the Hive catalog with the specific partitioning attribute.
Although Gravitino supports several partitioning strategies, Apache Hive inherently only supports a single partitioning strategy (partitioned by column). Therefore, the Hive catalog only supports `Identity` partitioning.

:::caution
The `fieldName` specified in the partitioning attribute must be the name of a column defined in the table.
:::
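Following the JSON style used elsewhere in these docs, an `Identity` partition on a hypothetical `dt` column might be expressed as the sketch below; the exact field names are an assumption based on the `fieldName` attribute mentioned above.

```json
{
  "partitioning": [
    {
      "strategy": "identity",
      "fieldName": ["dt"]
    }
  ]
}
```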

Table sort orders and distributions

The Hive catalog supports [bucketed sorted tables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables). Users can create bucketed sorted tables in the Hive catalog with specific `distribution` and `sortOrders` attributes.
Although Gravitino supports several distribution strategies, Apache Hive inherently only supports a single distribution strategy (clustered by column). Therefore the Hive catalog only supports `Hash` distribution.

:::caution
The `fieldName` specified in the `distribution` and `sortOrders` attribute must be the name of a column defined in the table.
:::
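As a sketch of the `distribution` and `sortOrders` attributes described above, a bucketed sorted table request might carry something like the following. The column names (`id`, `age`), bucket count, and exact JSON field names are assumptions for illustration, not taken from this document.

```json
{
  "distribution": {
    "strategy": "hash",
    "number": 4,
    "funcArgs": [
      {
        "type": "field",
        "fieldName": ["id"]
      }
    ]
  },
  "sortOrders": [
    {
      "sortTerm": {
        "type": "field",
        "fieldName": ["age"]
      },
      "direction": "asc"
    }
  ]
}
```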

Table column types

The Hive catalog supports all data types defined in the [Hive Language Manual](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types).
The following table lists the data types mapped from the Hive catalog to Gravitino.

| Hive Data Type | Gravitino Data Type | Since Version |
|-----------------------------|---------------------|---------------|
| `boolean` | `boolean` | 0.2.0 |
| `tinyint` | `byte` | 0.2.0 |
| `smallint` | `short` | 0.2.0 |
| `int`/`integer` | `integer` | 0.2.0 |
| `bigint` | `long` | 0.2.0 |
| `float` | `float` | 0.2.0 |
| `double`/`double precision` | `double` | 0.2.0 |
| `decimal` | `decimal` | 0.2.0 |
| `string` | `string` | 0.2.0 |
| `char` | `char` | 0.2.0 |
| `varchar` | `varchar` | 0.2.0 |
| `timestamp` | `timestamp` | 0.2.0 |
| `date` | `date` | 0.2.0 |
| `interval_year_month` | `interval_year` | 0.2.0 |
| `interval_day_time` | `interval_day` | 0.2.0 |
| `binary` | `binary` | 0.2.0 |
| `array` | `list` | 0.2.0 |
| `map` | `map` | 0.2.0 |
| `struct` | `struct` | 0.2.0 |
| `uniontype` | `union` | 0.2.0 |

:::info
Since 0.6.0-incubating, data types other than those listed above are mapped to the Gravitino **[External Type](./manage-relational-metadata-using-gravitino.md#external-type)**, which represents an unresolvable data type from the Hive catalog.
:::

Table properties

Table properties supply or set metadata for the underlying Hive tables.
The following table lists predefined table properties for a Hive table. Additionally, you can define your own key-value pair properties and transmit them to the underlying Hive database.

:::note
**Reserved**: Fields that cannot be passed to the Gravitino server.

**Immutable**: Fields that cannot be modified once set.
:::

| Property Name | Description | Default Value | Required | Reserved | Immutable | Since Version |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|----------|----------|-----------|---------------|
| `location` | The location for table storage, such as `/user/hive/warehouse/test_table`. | HMS uses the database location as the parent directory by default. | No | No | Yes | 0.2.0 |
| `table-type` | Type of the table. Valid values include `MANAGED_TABLE` and `EXTERNAL_TABLE`. | `MANAGED_TABLE` | No | No | Yes | 0.2.0 |
| `format` | The table file format. Valid values include `TEXTFILE`, `SEQUENCEFILE`, `RCFILE`, `ORC`, `PARQUET`, `AVRO`, `JSON`, `CSV`, and `REGEX`. | `TEXTFILE` | No | No | Yes | 0.2.0 |
| `input-format` | The input format class for the table, such as `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. | The property `format` sets the default value `org.apache.hadoop.mapred.TextInputFormat` and can change it to a different default. | No | No | Yes | 0.2.0 |
| `output-format` | The output format class for the table, such as `org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat`. | The property `format` sets the default value `org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat` and can change it to a different default. | No | No | Yes | 0.2.0 |
| `serde-lib` | The serde library class for the table, such as `org.apache.hadoop.hive.ql.io.orc.OrcSerde`. | The property `format` sets the default value `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe` and can change it to a different default. | No | No | Yes | 0.2.0 |

0.3.0

:::caution
You must download the corresponding JDBC driver to the `catalogs/jdbc-mysql/libs` directory.
:::

Catalog operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

Schema

Schema capabilities

- Gravitino's schema concept corresponds to the MySQL database.
- Supports creating a schema, but doesn't support setting a comment.
- Supports dropping a schema.
- Supports cascading schema drops.

Schema properties

- Doesn't support any schema property settings.

Schema operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.

Table

Table capabilities

- Gravitino's table concept corresponds to the MySQL table.
- Supports DDL operations for MySQL tables.
- Supports indexes.
- Supports [column default value](./manage-relational-metadata-using-gravitino.md#table-column-default-value) and [auto-increment](./manage-relational-metadata-using-gravitino.md#table-column-auto-increment).
- Supports managing MySQL table features through table properties, such as using `engine` to set the MySQL storage engine.

Table column types

| Gravitino Type | MySQL Type |
|--------------------|---------------------|
| `Byte` | `Tinyint` |
| `Unsigned Byte` | `Tinyint Unsigned` |
| `Short` | `Smallint` |
| `Unsigned Short` | `Smallint Unsigned` |
| `Integer` | `Int` |
| `Unsigned Integer` | `Int Unsigned` |
| `Long` | `Bigint` |
| `Unsigned Long` | `Bigint Unsigned` |
| `Float` | `Float` |
| `Double` | `Double` |
| `String` | `Text` |
| `Date` | `Date` |
| `Time` | `Time` |
| `Timestamp` | `Timestamp` |
| `Decimal` | `Decimal` |
| `VarChar` | `VarChar` |
| `FixedChar` | `FixedChar` |
| `Binary` | `Binary` |
| `BOOLEAN` | `BIT` |

:::info
MySQL doesn't support the Gravitino `Fixed`, `Struct`, `List`, `Map`, `Timestamp_tz`, `IntervalDay`, `IntervalYear`, `Union`, and `UUID` types.
Since 0.6.0-incubating, data types other than those listed above are mapped to the Gravitino **[External Type](./manage-relational-metadata-using-gravitino.md#external-type)**, which represents an unresolvable data type.
:::

Table column auto-increment

:::note
In MySQL, an auto-increment column must also be covered by a unique index; otherwise, an error occurs.
:::

<Tabs groupId='language' queryString>
<TabItem value="json" label="Json">

```json
{
  "columns": [
    {
      "name": "id",
      "type": "integer",
      "comment": "id column comment",
      "nullable": false,
      "autoIncrement": true
    },
    {
      "name": "name",
      "type": "varchar(500)",
      "comment": "name column comment",
      "nullable": true,
      "autoIncrement": false
    }
  ],
  "indexes": [
    {
      "indexType": "primary_key",
      "name": "PRIMARY",
      "fieldNames": [["id"]]
    }
  ]
}
```


</TabItem>
<TabItem value="java" label="Java">

```java
Column[] cols = new Column[] {
    Column.of("id", Types.IntegerType.get(), "id column comment", false, true, null),
    Column.of("name", Types.VarCharType.of(500), "name column comment", true, false, null)
};
Index[] indexes = new Index[] {
    Indexes.of(IndexType.PRIMARY_KEY, "PRIMARY", new String[][]{{"id"}})
};
```


</TabItem>
</Tabs>

Table properties

Although MySQL itself does not support table properties, Gravitino offers table property management for MySQL tables through the `jdbc-mysql` catalog, enabling control over table features. The supported properties are listed as follows:

:::note
**Reserved**: Fields that cannot be passed to the Gravitino server.

**Immutable**: Fields that cannot be modified once set.
:::

:::caution
- Doesn't support removing table properties. You can only add or modify properties, not delete them.
:::

| Property Name | Description | Default Value | Required | Reserved | Immutable | Since version |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|-----------|------------|-----------|---------------|
| `engine` | The engine used by the table. For example `MyISAM`, `MEMORY`, `CSV`, `ARCHIVE`, `BLACKHOLE`, `FEDERATED`, `ndbinfo`, `MRG_MYISAM`, `PERFORMANCE_SCHEMA`. | `InnoDB` | No | No | Yes | 0.4.0 |
| `auto-increment-offset` | Used to specify the starting value of the auto-increment field. | (none) | No | No | Yes | 0.4.0 |


:::note
Some MySQL storage engines, such as `FEDERATED`, are not enabled by default and require additional configuration to use. For example, to enable the `FEDERATED` engine, set `federated=1` in the MySQL configuration file. Similarly, engines like `ndbinfo`, `MRG_MYISAM`, and `PERFORMANCE_SCHEMA` may also require specific prerequisites or configurations. For detailed instructions,
refer to the [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/federated-storage-engine.html).
:::
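
For example, a table-creation payload could set these properties as follows (a sketch; the offset value is arbitrary):

```json
{
  "properties": {
    "engine": "InnoDB",
    "auto-increment-offset": "10"
  }
}
```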

Table indexes

- Supports PRIMARY_KEY and UNIQUE_KEY.

:::note
The index name of the PRIMARY_KEY must be PRIMARY.
See [Create table index](https://dev.mysql.com/doc/refman/8.0/en/create-table.html) for details.
:::

<Tabs groupId='language' queryString>
<TabItem value="json" label="Json">

```json
{
  "indexes": [
    {
      "indexType": "primary_key",
      "name": "PRIMARY",
      "fieldNames": [["id"]]
    },
    {
      "indexType": "unique_key",
      "name": "id_name_uk",
      "fieldNames": [["id"], ["name"]]
    }
  ]
}
```


</TabItem>
<TabItem value="java" label="Java">

```java
Index[] indexes = new Index[] {
    Indexes.of(IndexType.PRIMARY_KEY, "PRIMARY", new String[][]{{"id"}}),
    Indexes.of(IndexType.UNIQUE_KEY, "id_name_uk", new String[][]{{"id"}, {"name"}})
};
```


</TabItem>
</Tabs>

Table operations

Refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.

Alter table operations

Gravitino supports these table alteration operations:

- `RenameTable`
- `UpdateComment`
- `AddColumn`
- `DeleteColumn`
- `RenameColumn`
- `UpdateColumnType`
- `UpdateColumnPosition`
- `UpdateColumnNullability`
- `UpdateColumnComment`
- `UpdateColumnDefaultValue`
- `SetProperty`

:::info
- You cannot submit the `RenameTable` operation at the same time as other operations.
- If you change a nullable column to non-nullable, there may be compatibility issues.
:::


---
title: "Apache Hive catalog"
slug: /apache-hive-catalog
date: 2023-12-10
keyword: hive catalog
license: "This software is licensed under the Apache License version 2."
---

Introduction

Apache Gravitino offers the capability to utilize [Apache Hive](https://hive.apache.org) as a catalog for metadata management.

Requirements and limitations

* The Hive catalog requires a Hive Metastore Service (HMS), or a compatible implementation of the HMS, such as AWS Glue.
* Gravitino must have network access to the Hive metastore service using the Thrift protocol.

:::note
Although the Hive catalog uses the Hive2 metastore client, it is compatible with the Hive3 metastore service because the HMS APIs it calls are still available in Hive3.
If there is any compatibility issue, please create an [issue](https://github.com/apache/gravitino/issues).
:::

Catalog

Catalog capabilities

The Hive catalog supports creating, updating, and deleting databases and tables in the HMS.

Catalog properties

Besides the [common catalog properties](./gravitino-server-config.md#gravitino-catalog-properties-configuration), the Hive catalog has the following properties:

| Property Name | Description | Default Value | Required | Since Version |
|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|------------------------------|---------------|

0.2.0

Table indexes

- Doesn't support table indexes.

Table operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.

Alter table operations

Supports the following operations:

- `RenameTable`
- `SetProperty`
- `RemoveProperty`
- `UpdateComment`
- `AddColumn`
- `DeleteColumn`
- `RenameColumn`
- `UpdateColumnType`
- `UpdateColumnPosition`
- `UpdateColumnNullability`
- `UpdateColumnComment`

:::info
The default column position is `LAST` when you add a column. If you add a non-nullable column, there may be compatibility issues.
:::

:::caution
If you change a nullable column to non-nullable, there may be compatibility issues.
:::

HDFS configuration

You can place `core-site.xml` and `hdfs-site.xml` in the `catalogs/lakehouse-iceberg/conf` directory; they are automatically loaded as the default HDFS configuration.

:::info
Gravitino builds with Hadoop 2.10.x; there may be compatibility issues when accessing Hadoop 3.x clusters.
When writing to HDFS, the Gravitino Iceberg REST server can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See [How to access Apache Hadoop](gravitino-server-config.md) for more details.
:::
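
A minimal `core-site.xml` placed in that directory might look like the following (the namenode address is a placeholder for your cluster):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>
```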


---
title: "Paimon catalog"
slug: /lakehouse-paimon-catalog
keywords:
- lakehouse
- Paimon
- metadata
license: "This software is licensed under the Apache License version 2."
---

Introduction

Apache Gravitino provides the ability to manage Apache Paimon metadata.

Requirements

:::info
Builds with Apache Paimon `0.8.0`.
:::

Catalog

Catalog capabilities

- Works as a catalog proxy, supporting `FilesystemCatalog`, `JdbcCatalog` and `HiveCatalog`.
- Supports DDL operations for Paimon schemas and tables.

- Doesn't support alterSchema.

Catalog properties

| Property name | Description | Default value | Required | Since Version |
|----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|
| `catalog-backend` | Catalog backend of Gravitino Paimon catalog. Supports `filesystem`, `jdbc` and `hive`. | (none) | Yes | 0.6.0-incubating |
| `uri` | The URI configuration of the Paimon catalog. `thrift://127.0.0.1:9083` or `jdbc:postgresql://127.0.0.1:5432/db_name` or `jdbc:mysql://127.0.0.1:3306/metastore_db`. It is optional for `FilesystemCatalog`. | (none) | required if the value of `catalog-backend` is not `filesystem`. | 0.6.0-incubating |
| `warehouse` | Warehouse directory of the catalog. `file:///user/hive/warehouse-paimon/` for local fs, `hdfs://namespace/hdfs/path` for HDFS, `s3://{bucket-name}/path/` for S3, or `oss://{bucket-name}/path` for Aliyun OSS. | (none) | Yes | 0.6.0-incubating |
| `catalog-backend-name` | The catalog name passed to underlying Paimon catalog backend. | The property value of `catalog-backend`, like `jdbc` for JDBC catalog backend. | No | 0.8.0-incubating |
| `authentication.type` | The type of authentication for Paimon catalog backend, currently Gravitino only supports `Kerberos` and `simple`. | `simple` | No | 0.6.0-incubating |
| `hive.metastore.sasl.enabled` | Whether to enable the SASL authentication protocol when connecting to a Kerberos Hive metastore. This is a raw Hive configuration. | `false` | No. This value should be `true` in most cases (some deployments use the SSL protocol instead, but that is rather rare) if the value of `authentication.type` is `Kerberos`. | 0.6.0-incubating |
| `authentication.kerberos.principal` | The principal of the Kerberos authentication. | (none) | required if the value of `authentication.type` is Kerberos. | 0.6.0-incubating |
| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication. | (none) | required if the value of `authentication.type` is Kerberos. | 0.6.0-incubating |
| `authentication.kerberos.check-interval-sec` | The check interval of Kerberos credential for Paimon catalog. | 60 | No | 0.6.0-incubating |
| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60 | No | 0.6.0-incubating |
| `oss-endpoint` | The endpoint of the Aliyun OSS. | (none) | required if the value of `warehouse` is an OSS path | 0.7.0-incubating |
| `oss-access-key-id` | The access key of the Aliyun OSS. | (none) | required if the value of `warehouse` is an OSS path | 0.7.0-incubating |
| `oss-accesss-key-secret` | The secret key of the Aliyun OSS. | (none) | required if the value of `warehouse` is an OSS path | 0.7.0-incubating |
| `s3-endpoint` | The endpoint of the AWS S3. | (none) | required if the value of `warehouse` is an S3 path | 0.7.0-incubating |
| `s3-access-key-id` | The access key of the AWS S3. | (none) | required if the value of `warehouse` is an S3 path | 0.7.0-incubating |
| `s3-secret-access-key` | The secret key of the AWS S3. | (none) | required if the value of `warehouse` is an S3 path | 0.7.0-incubating |

:::note
If you want to use the `oss` or `s3` warehouse, you need to place the related jars in the `catalogs/lakehouse-paimon/libs` directory. More information can be found in [Paimon S3](https://paimon.apache.org/docs/master/filesystems/s3/).
:::

:::note
The Hive backend does not support Kerberos authentication yet.
:::

Any property not defined by Gravitino that carries the `gravitino.bypass.` prefix is passed through to the Paimon catalog properties and HDFS configuration. For example, if you specify `gravitino.bypass.table.type`, then `table.type` is passed to the Paimon catalog properties.
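
As an illustrative sketch, a filesystem-backed Paimon catalog with one bypassed Paimon property could be created with properties like these (the warehouse path and the bypassed property value are assumptions):

```json
{
  "catalog-backend": "filesystem",
  "warehouse": "hdfs://namespace/hdfs/path",
  "gravitino.bypass.table.type": "managed"
}
```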

JDBC backend

If you are using the JDBC backend, you must specify properties such as `jdbc-user`, `jdbc-password`, and `jdbc-driver`.

| Property name | Description | Default value | Required | Since Version |
|-----------------|-----------------------------------------------------------------------------------------------------------|-----------------|-------------------------------------------------------|------------------|
| `jdbc-user` | Jdbc user of Gravitino Paimon catalog for `jdbc` backend. | (none) | required if the value of `catalog-backend` is `jdbc`. | 0.7.0-incubating |
| `jdbc-password` | Jdbc password of Gravitino Paimon catalog for `jdbc` backend. | (none) | required if the value of `catalog-backend` is `jdbc`. | 0.7.0-incubating |
| `jdbc-driver` | `com.mysql.jdbc.Driver` or `com.mysql.cj.jdbc.Driver` for MySQL, `org.postgresql.Driver` for PostgreSQL | (none) | required if the value of `catalog-backend` is `jdbc`. | 0.7.0-incubating |

:::caution
If you are using the JDBC backend, you must download the corresponding JDBC driver and place it in the `catalogs/lakehouse-paimon/libs` directory.
:::
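
Putting the pieces together, a JDBC-backed Paimon catalog might be configured with properties like the following (the connection details and credentials are placeholders):

```json
{
  "catalog-backend": "jdbc",
  "uri": "jdbc:mysql://127.0.0.1:3306/metastore_db",
  "warehouse": "hdfs://namespace/hdfs/path",
  "jdbc-user": "gravitino",
  "jdbc-password": "<password>",
  "jdbc-driver": "com.mysql.cj.jdbc.Driver"
}
```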

Catalog operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

Schema

Schema capabilities

- Supports createSchema, dropSchema, loadSchema, and listSchema.
- Supports cascading schema drops.

- Doesn't support alterSchema.

Schema properties

- Doesn't support specifying a location or storing any schema properties during createSchema for FilesystemCatalog.
- Doesn't return any schema properties from loadSchema for FilesystemCatalog.
- Doesn't support storing a schema comment for FilesystemCatalog.

Schema operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.

Table

Table capabilities

- Supports createTable, purgeTable, alterTable, loadTable, and listTable.
- Supports column default values through table properties, such as `fields.{columnName}.default-value`, not through column expressions.

- Doesn't support dropTable.
- Doesn't support table distribution and sort orders.

:::info
The Gravitino Paimon catalog does not support dropTable because dropTable in Paimon removes both the table metadata and the table location from the file system, skipping the trash. Use purgeTable instead in Gravitino.
:::

:::info
Paimon does not support auto-increment columns.
:::
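
For example, a default value for a hypothetical `name` column could be set through a table property like this (a sketch; the column name and value are assumptions):

```json
{
  "properties": {
    "fields.name.default-value": "unknown"
  }
}
```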

Table changes

- RenameTable
- AddColumn
- DeleteColumn
- RenameColumn
- UpdateColumnComment
- UpdateColumnNullability
- UpdateColumnPosition
- UpdateColumnType
- UpdateComment
- SetProperty
- RemoveProperty

Table partitions

- Only supports identity partitions, on fields such as `day` or `hour`.

Please refer to [Paimon DDL Create Table](https://paimon.apache.org/docs/0.8/spark/sql-ddl/#create-table) for more details.

Table sort orders

- Doesn't support table sort orders.

Table distributions

- Doesn't support table distributions.

Table indexes

- Only supports primary key Index.

:::info
You cannot specify more than one primary key index; a single primary key index can contain multiple fields as a joint primary key.
:::

:::info
A Paimon table's primary key constraint should not be the same as its partition fields, as this would result in only one record per partition.
:::
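
Following the JSON index convention used elsewhere in these docs, a joint primary key over two hypothetical fields might be declared like this (the field names and index name are assumptions):

```json
{
  "indexes": [
    {
      "indexType": "primary_key",
      "name": "PRIMARY",
      "fieldNames": [["order_id"], ["user_id"]]
    }
  ]
}
```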

Table column types

| Gravitino Type | Apache Paimon Type |
|-----------------------------|------------------------------|
| `Struct` | `Row` |
| `Map` | `Map` |
| `List` | `Array` |
| `Boolean` | `Boolean` |
| `Byte` | `TinyInt` |
| `Short` | `SmallInt` |
| `Integer` | `Int` |
| `Long` | `BigInt` |
| `Float` | `Float` |
| `Double` | `Double` |
| `Decimal` | `Decimal` |
| `String` | `VarChar(Integer.MAX_VALUE)` |
| `VarChar` | `VarChar` |
| `FixedChar` | `Char` |
| `Date` | `Date` |
| `Time` | `Time` |
| `TimestampType withZone` | `LocalZonedTimestamp` |
| `TimestampType withoutZone` | `Timestamp` |
| `Fixed` | `Binary` |
| `Binary` | `VarBinary` |

:::info
Gravitino doesn't support Paimon `MultisetType` type.
:::

Table properties

You can pass [Paimon table properties](https://paimon.apache.org/docs/0.8/maintenance/configurations/) to Gravitino when creating a Paimon table.

:::note
**Reserved**: Fields that cannot be passed to the Gravitino server.

**Immutable**: Fields that cannot be modified once set.
:::

| Configuration item | Description | Default Value | Required | Reserved | Immutable | Since version |
|------------------------------------|--------------------------------------------------------------|---------------|-----------|----------|-----------|-------------------|
| `merge-engine` | The table merge-engine. | (none) | No | No | Yes | 0.6.0-incubating |
| `sequence.field` | The table sequence.field. | (none) | No | No | Yes | 0.6.0-incubating |
| `rowkind.field` | The table rowkind.field. | (none) | No | No | Yes | 0.6.0-incubating |
| `comment` | The table comment. | (none) | No | Yes | No | 0.6.0-incubating |
| `owner` | The table owner. | (none) | No | Yes | No | 0.6.0-incubating |
| `bucket-key` | The table bucket-key. | (none) | No | Yes | No | 0.6.0-incubating |
| `primary-key` | The table primary-key. | (none) | No | Yes | No | 0.6.0-incubating |
| `partition` | The table partition. | (none) | No | Yes | No | 0.6.0-incubating |
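
As a sketch, a create-table payload could carry Paimon properties like these (`deduplicate` is a standard Paimon merge-engine value; the field names are assumptions):

```json
{
  "properties": {
    "merge-engine": "deduplicate",
    "sequence.field": "ts",
    "rowkind.field": "op"
  }
}
```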

Table operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.

HDFS configuration

You can place `core-site.xml` and `hdfs-site.xml` in the `catalogs/lakehouse-paimon/conf` directory to automatically load as the default HDFS configuration.

:::caution
When reading and writing to HDFS, the Gravitino server can only operate as the specified Kerberos user and doesn't support proxying to other Kerberos users now.
:::


---
title: "Hudi catalog"
slug: /lakehouse-hudi-catalog
keywords:
- lakehouse
- hudi
- metadata
license: "This software is licensed under the Apache License version 2."
---

import Tabs from 'theme/Tabs';
import TabItem from 'theme/TabItem';

Introduction

Apache Gravitino provides the ability to manage Apache Hudi metadata.

Requirements and limitations

:::info
Tested and verified with Apache Hudi `0.15.0`.
:::

Catalog

Catalog capabilities

- Works as a catalog proxy, supporting `HMS` as the catalog backend.
- Only supports read operations (list and load) for Hudi schemas and tables.
- Doesn't support timeline management operations yet.

Catalog properties

| Property name | Description | Default value | Required | Since Version |
|------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------|
| `catalog-backend` | Catalog backend of Gravitino Hudi catalog. Only supports `hms` now. | (none) | Yes | 0.7.0-incubating |
| `uri` | The URI associated with the backend. Such as `thrift://127.0.0.1:9083` for HMS backend. | (none) | Yes | 0.7.0-incubating |
| `client.pool-size` | For HMS backend. The maximum number of Hive metastore clients in the pool for Gravitino. | 1 | No | 0.7.0-incubating |
| `client.pool-cache.eviction-interval-ms` | For HMS backend. The cache pool eviction interval. | 300000 | No | 0.7.0-incubating |
| `gravitino.bypass.` | Property names with this prefix are passed down to the underlying backend client. For example, `gravitino.bypass.hive.metastore.failure.retries = 3` sets 3 retries upon failure of Thrift metastore calls for the HMS backend. | (none) | No | 0.7.0-incubating |

Catalog operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

Schema

Schema capabilities

- Only supports read operations: listSchema, loadSchema, and schemaExists.

Schema properties

- The `location` property is optional and shows the storage path of the Hudi database.

Schema operations

Only supports read operations: listSchema, loadSchema, and schemaExists.
Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.

Table

Table capabilities

- Only supports read operations: listTable, loadTable, and tableExists.

Table partitions

- Support loading Hudi partitioned tables (Hudi only supports identity partitioning).

Table sort orders

- Doesn't support table sort orders.

Table distributions

- Doesn't support table distributions.

Table indexes

- Doesn't support table indexes.

Table properties

- For the HMS backend, the catalog surfaces all table parameters from the HMS.

Table column types

The following table shows the mapping between Gravitino and [Apache Hudi column types](https://hudi.apache.org/docs/sql_ddl#supported-types):

| Gravitino Type | Apache Hudi Type |
|----------------|------------------|
| `boolean` | `boolean` |
| `integer` | `int` |
| `long` | `long` |
| `date` | `date` |
| `timestamp` | `timestamp` |
| `float` | `float` |
| `double` | `double` |
| `string` | `string` |
| `decimal` | `decimal` |
| `binary` | `bytes` |
| `array` | `array` |
| `map` | `map` |
| `struct` | `struct` |

Table operations

Only supports read operations: listTable, loadTable, and tableExists.
Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.


---
title: "Apache Doris catalog"
slug: /jdbc-doris-catalog
keywords:
- jdbc
- Apache Doris
- metadata
license: "This software is licensed under the Apache License version 2."
---

import Tabs from 'theme/Tabs';
import TabItem from 'theme/TabItem';

Introduction

Apache Gravitino provides the ability to manage [Apache Doris](https://doris.apache.org/) metadata through JDBC connection.

:::caution
Gravitino saves some system information in schema and table comments, such as
`(From Gravitino, DO NOT EDIT: gravitino.v1.uid1078334182909406185)`; please don't change or remove this message.
:::

Catalog

Catalog capabilities

- Gravitino catalog corresponds to the Doris instance.
- Supports metadata management of Doris (1.2.x).
- Supports table index.
- Supports [column default value](./manage-relational-metadata-using-gravitino.md#table-column-default-value).

Catalog properties

You can pass any property that isn't defined by Gravitino to a Doris data source by adding the
`gravitino.bypass.` prefix as a catalog property. For example, the catalog property
`gravitino.bypass.maxWaitMillis` passes `maxWaitMillis` to the data source.

You can check the relevant data source configuration in
[data source properties](https://commons.apache.org/proper/commons-dbcp/configuration.html) for
more details.
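
For instance, a Doris catalog could combine the required JDBC properties with a bypassed data source property like this (the connection details and credentials are placeholders):

```json
{
  "jdbc-url": "jdbc:mysql://localhost:9030",
  "jdbc-driver": "com.mysql.jdbc.Driver",
  "jdbc-user": "admin",
  "jdbc-password": "<password>",
  "gravitino.bypass.maxWaitMillis": "3000"
}
```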

Besides the [common catalog properties](./gravitino-server-config.md#gravitino-catalog-properties-configuration), the Doris catalog has the following properties:

| Configuration item | Description | Default value | Required | Since Version |
|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------|
| `jdbc-url` | JDBC URL for connecting to the database. For example, `jdbc:mysql://localhost:9030` | (none) | Yes | 0.5.0 |
| `jdbc-driver` | The driver of the JDBC connection. For example, `com.mysql.jdbc.Driver`. | (none) | Yes | 0.5.0 |
| `jdbc-user` | The JDBC user name. | (none) | Yes | 0.5.0 |
| `jdbc-password` | The JDBC password. | (none) | Yes | 0.5.0 |
