feat(spark-connector): support JDBC catalog #6212

Open · wants to merge 7 commits into base: main
72 changes: 72 additions & 0 deletions docs/spark-connector/spark-catalog-jdbc.md
@@ -0,0 +1,72 @@
---
title: "Spark connector JDBC catalog"
slug: /spark-connector/spark-catalog-jdbc
keyword: spark connector jdbc catalog
license: "This software is licensed under the Apache License version 2."
---

The Apache Gravitino Spark connector can access JDBC tables whose metadata is managed by the Gravitino server. To use the JDBC catalog within the Spark connector, add the JDBC driver jar for your database to the Spark classpath.
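For illustration, here is a minimal sketch of a Spark session wired to Gravitino from Java. The plugin class and the `spark.sql.gravitino.*` keys follow the connector documentation; the server URI, metalake name, and driver jar path are placeholders:

```java
import org.apache.spark.sql.SparkSession;

public class JdbcCatalogQuickStart {
  public static void main(String[] args) {
    SparkSession spark =
        SparkSession.builder()
            .appName("gravitino-jdbc-demo")
            // The Gravitino plugin registers all catalogs managed by the server.
            .config("spark.plugins",
                "org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin")
            .config("spark.sql.gravitino.uri", "http://localhost:8090") // placeholder
            .config("spark.sql.gravitino.metalake", "mymetalake") // placeholder
            // Put the JDBC driver jar on the Spark classpath; the path is illustrative.
            .config("spark.jars", "/path/to/mysql-connector-j-8.0.33.jar")
            .getOrCreate();

    spark.sql("USE mysql_a");
    spark.sql("SHOW TABLES").show();
  }
}
```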

## Capabilities

Supports MySQL and PostgreSQL. OceanBase, which is compatible with the MySQL dialect, can use the MySQL driver and dialect as a workaround. Doris, which does not support the MySQL dialect, is not currently supported.

#### Supported DML and DDL operations:

- `CREATE TABLE`
- `DROP TABLE`
- `ALTER TABLE`
- `SELECT`
- `INSERT`

:::info
JDBCTable does not support distributed transactions. When writing to an RDBMS, each Spark task runs in its own transaction; if some tasks succeed and others fail, dirty data is left behind.
:::

#### Unsupported operations:

- `UPDATE`
- `DELETE`
- `TRUNCATE`

## SQL example

```sql
-- Suppose mysql_a is the mysql catalog name managed by Gravitino
USE mysql_a;

CREATE DATABASE IF NOT EXISTS mydatabase;
USE mydatabase;

CREATE TABLE IF NOT EXISTS employee (
  id bigint,
  name string,
  department string,
  hire_date timestamp
);

DESC TABLE EXTENDED employee;

INSERT INTO employee
VALUES
(1, 'Alice', 'Engineering', TIMESTAMP '2021-01-01 09:00:00'),
(2, 'Bob', 'Marketing', TIMESTAMP '2021-02-01 10:30:00'),
(3, 'Charlie', 'Sales', TIMESTAMP '2021-03-01 08:45:00');

SELECT * FROM employee WHERE date(hire_date) = '2021-01-01';
```

## Catalog properties

The Gravitino Spark connector transforms the following catalog property names into the corresponding Spark JDBC connector configuration.

| Gravitino catalog property name | Spark JDBC connector configuration | Description | Since Version |
|---------------------------------|------------------------------------|-------------|---------------|
| `jdbc-url` | `url` | JDBC URL for connecting to the database, for example `jdbc:mysql://localhost:3306` | 0.3.0 |
| `jdbc-user` | `user` | JDBC user name | 0.3.0 |
| `jdbc-password` | `password` | JDBC password | 0.3.0 |
| `jdbc-driver` | `driver` | The driver class of the JDBC connection, for example `com.mysql.jdbc.Driver` or `com.mysql.cj.jdbc.Driver` | 0.3.0 |

Gravitino catalog property names with the prefix `spark.bypass.` are passed through to the Spark JDBC connector.
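For reference, a catalog carrying these properties can be created through the Gravitino Java client before Spark uses it. The sketch below assumes a local Gravitino server, a metalake named `mymetalake`, and the `jdbc-mysql` provider; verify the exact builder and `createCatalog` signatures against your client version:

```java
import com.google.common.collect.ImmutableMap;
import java.util.Map;
import org.apache.gravitino.Catalog;
import org.apache.gravitino.client.GravitinoClient;

public class CreateJdbcCatalogDemo {
  public static void main(String[] args) {
    // Endpoint and metalake are placeholders for your deployment.
    GravitinoClient client =
        GravitinoClient.builder("http://localhost:8090").withMetalake("mymetalake").build();

    // Catalog properties from the table above.
    Map<String, String> properties =
        ImmutableMap.of(
            "jdbc-url", "jdbc:mysql://localhost:3306",
            "jdbc-user", "user",
            "jdbc-password", "password",
            "jdbc-driver", "com.mysql.cj.jdbc.Driver");

    // "jdbc-mysql" is the provider name for MySQL-backed JDBC catalogs.
    Catalog catalog =
        client.createCatalog("mysql_a", Catalog.Type.RELATIONAL, "jdbc-mysql", "demo", properties);
  }
}
```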

2 changes: 1 addition & 1 deletion docs/spark-connector/spark-connector.md
@@ -11,7 +11,7 @@ The Apache Gravitino Spark connector leverages the Spark DataSourceV2 interface

## Capabilities

1. Supports [Hive catalog](spark-catalog-hive.md), [Iceberg catalog](spark-catalog-iceberg.md) and [Paimon catalog](spark-catalog-paimon.md).
1. Supports [Hive catalog](spark-catalog-hive.md), [Iceberg catalog](spark-catalog-iceberg.md), [Paimon catalog](spark-catalog-paimon.md) and [JDBC catalog](spark-catalog-jdbc.md).
2. Supports federation query.
3. Supports most DDL and DML SQLs.

3 changes: 3 additions & 0 deletions spark-connector/spark-common/build.gradle.kts
@@ -39,6 +39,9 @@ val scalaCollectionCompatVersion: String = libs.versions.scala.collection.compat

dependencies {
implementation(project(":catalogs:catalog-common"))
implementation(project(":catalogs:catalog-jdbc-common")) {
**Contributor:** `catalogs:catalog-jdbc-common` is a module of the Gravitino server; why add this dependency?

**Contributor:** Could you remove this?

**FANNG1 (Contributor), Jan 22, 2025:** We need to fix the IT failures; we can check the errors on the server side. You can download the log from https://github.com/apache/gravitino/actions/runs/12900772663?pr=6212

exclude("org.apache.logging.log4j")
}
implementation(libs.guava)

compileOnly(project(":clients:client-java-runtime", configuration = "shadow"))
@@ -0,0 +1,89 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.gravitino.spark.connector.jdbc;

import com.google.common.collect.Maps;
import java.util.Map;
import org.apache.gravitino.spark.connector.PropertiesConverter;
import org.apache.gravitino.spark.connector.SparkTransformConverter;
import org.apache.gravitino.spark.connector.SparkTypeConverter;
import org.apache.gravitino.spark.connector.catalog.BaseCatalog;
import org.apache.spark.sql.catalyst.analysis.NamespaceAlreadyExistsException;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.SupportsNamespaces;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCatalog;
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTable;
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

/**
 * A Gravitino-managed JDBC catalog for Spark. Catalog operations are delegated to
 * Spark's built-in JDBCTableCatalog, while table metadata (name, schema, properties)
 * is served from the Gravitino server.
 */
public class GravitinoJdbcCatalog extends BaseCatalog {

@Override
protected TableCatalog createAndInitSparkCatalog(
String name, CaseInsensitiveStringMap options, Map<String, String> properties) {
JDBCTableCatalog jdbcTableCatalog = new JDBCTableCatalog();
Map<String, String> all =
getPropertiesConverter().toSparkCatalogProperties(options, properties);
jdbcTableCatalog.initialize(name, new CaseInsensitiveStringMap(all));
return jdbcTableCatalog;
}

@Override
protected Table createSparkTable(
Identifier identifier,
org.apache.gravitino.rel.Table gravitinoTable,
Table sparkTable,
TableCatalog sparkCatalog,
PropertiesConverter propertiesConverter,
SparkTransformConverter sparkTransformConverter,
SparkTypeConverter sparkTypeConverter) {
return new SparkJdbcTable(
identifier,
gravitinoTable,
(JDBCTable) sparkTable,
(JDBCTableCatalog) sparkCatalog,
propertiesConverter,
sparkTransformConverter,
sparkTypeConverter);
}

@Override
protected PropertiesConverter getPropertiesConverter() {
return JdbcPropertiesConverter.getInstance();
}

@Override
protected SparkTransformConverter getSparkTransformConverter() {
return new SparkTransformConverter(false);
}

@Override
protected SparkTypeConverter getSparkTypeConverter() {
return new SparkJdbcTypeConverter();
}

// Forward only the namespace comment to the underlying JDBC catalog; other
// metadata keys are filtered out (presumably unsupported by Spark's JDBC catalog).
@Override
public void createNamespace(String[] namespace, Map<String, String> metadata)
throws NamespaceAlreadyExistsException {
super.createNamespace(
namespace, Maps.filterKeys(metadata, key -> key.equals(SupportsNamespaces.PROP_COMMENT)));
}
}
@@ -0,0 +1,33 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.gravitino.spark.connector.jdbc;

/** Gravitino catalog property keys and their Spark JDBC option counterparts. */
public class JdbcPropertiesConstants {

public static final String GRAVITINO_JDBC_USER = "jdbc-user";
public static final String GRAVITINO_JDBC_PASSWORD = "jdbc-password";
public static final String GRAVITINO_JDBC_DRIVER = "jdbc-driver";
public static final String GRAVITINO_JDBC_URL = "jdbc-url";

public static final String SPARK_JDBC_URL = "url";
public static final String SPARK_JDBC_USER = "user";
public static final String SPARK_JDBC_PASSWORD = "password";
public static final String SPARK_JDBC_DRIVER = "driver";
}
@@ -0,0 +1,67 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.gravitino.spark.connector.jdbc;

import com.google.common.base.Preconditions;
import java.util.HashMap;
import java.util.Map;
import org.apache.gravitino.spark.connector.PropertiesConverter;

public class JdbcPropertiesConverter implements PropertiesConverter {

// Initialization-on-demand holder idiom: the JVM loads the holder class lazily,
// yielding a thread-safe singleton without explicit synchronization.
public static class JdbcPropertiesConverterHolder {
private static final JdbcPropertiesConverter INSTANCE = new JdbcPropertiesConverter();
}

private JdbcPropertiesConverter() {}

public static JdbcPropertiesConverter getInstance() {
return JdbcPropertiesConverterHolder.INSTANCE;
}

@Override
public Map<String, String> toSparkCatalogProperties(Map<String, String> properties) {
Preconditions.checkArgument(properties != null, "Jdbc Catalog properties should not be null");
HashMap<String, String> jdbcProperties = new HashMap<>();
jdbcProperties.put(
JdbcPropertiesConstants.SPARK_JDBC_URL,
properties.get(JdbcPropertiesConstants.GRAVITINO_JDBC_URL));
jdbcProperties.put(
JdbcPropertiesConstants.SPARK_JDBC_USER,
properties.get(JdbcPropertiesConstants.GRAVITINO_JDBC_USER));
jdbcProperties.put(
JdbcPropertiesConstants.SPARK_JDBC_PASSWORD,
properties.get(JdbcPropertiesConstants.GRAVITINO_JDBC_PASSWORD));
jdbcProperties.put(
JdbcPropertiesConstants.SPARK_JDBC_DRIVER,
properties.get(JdbcPropertiesConstants.GRAVITINO_JDBC_DRIVER));
return jdbcProperties;
}

@Override
public Map<String, String> toGravitinoTableProperties(Map<String, String> properties) {
return new HashMap<>(properties);
}

@Override
public Map<String, String> toSparkTableProperties(Map<String, String> properties) {
return new HashMap<>(properties);
}
}
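For intuition, the converter above maps Gravitino property keys straight onto Spark's JDBC option keys. A hypothetical snippet exercising it directly:

```java
import com.google.common.collect.ImmutableMap;
import java.util.Map;
import org.apache.gravitino.spark.connector.jdbc.JdbcPropertiesConverter;

public class ConverterDemo {
  public static void main(String[] args) {
    Map<String, String> gravitinoProps =
        ImmutableMap.of(
            "jdbc-url", "jdbc:mysql://localhost:3306",
            "jdbc-user", "user",
            "jdbc-password", "password",
            "jdbc-driver", "com.mysql.cj.jdbc.Driver");

    // Prints {url=jdbc:mysql://localhost:3306, user=user, password=password,
    //         driver=com.mysql.cj.jdbc.Driver} (map ordering may vary).
    System.out.println(
        JdbcPropertiesConverter.getInstance().toSparkCatalogProperties(gravitinoProps));
  }
}
```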
@@ -0,0 +1,71 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.gravitino.spark.connector.jdbc;

import java.util.Map;
import org.apache.gravitino.rel.Table;
import org.apache.gravitino.spark.connector.PropertiesConverter;
import org.apache.gravitino.spark.connector.SparkTransformConverter;
import org.apache.gravitino.spark.connector.SparkTypeConverter;
import org.apache.gravitino.spark.connector.utils.GravitinoTableInfoHelper;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTable;
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog;
import org.apache.spark.sql.types.StructType;

/**
 * A JDBCTable variant whose name(), schema(), and properties() are answered from
 * Gravitino metadata via GravitinoTableInfoHelper, while reads and writes use the
 * inherited Spark JDBC implementation.
 */
public class SparkJdbcTable extends JDBCTable {

private GravitinoTableInfoHelper gravitinoTableInfoHelper;

public SparkJdbcTable(
Identifier identifier,
Table gravitinoTable,
JDBCTable jdbcTable,
JDBCTableCatalog jdbcTableCatalog,
PropertiesConverter propertiesConverter,
SparkTransformConverter sparkTransformConverter,
SparkTypeConverter sparkTypeConverter) {
super(identifier, jdbcTable.schema(), jdbcTable.jdbcOptions());
this.gravitinoTableInfoHelper =
new GravitinoTableInfoHelper(
false,
identifier,
gravitinoTable,
propertiesConverter,
sparkTransformConverter,
sparkTypeConverter);
}

@Override
public String name() {
return gravitinoTableInfoHelper.name();
}

@Override
@SuppressWarnings("deprecation")
public StructType schema() {
return gravitinoTableInfoHelper.schema();
}

@Override
public Map<String, String> properties() {
return gravitinoTableInfoHelper.properties();
}
}
@@ -0,0 +1,40 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.gravitino.spark.connector.jdbc;

import org.apache.gravitino.rel.types.Type;
import org.apache.gravitino.rel.types.Types;
import org.apache.gravitino.spark.connector.SparkTypeConverter;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

public class SparkJdbcTypeConverter extends SparkTypeConverter {

@Override
public DataType toSparkType(Type gravitinoType) {
// Spark versions below 3.4.4 throw "Unsupported type varchar" when VarCharType is
// used, so map Gravitino VARCHAR to Spark's StringType instead.
if (gravitinoType instanceof Types.VarCharType) {
return DataTypes.StringType;
} else {
return super.toSparkType(gravitinoType);
}
}
}