[SPARK-30280][DOC] Update docs for making Hive 2.3 the default dependency
### What changes were proposed in this pull request?

This PR updates the documentation to reflect that Hive 2.3 is now the default Hive dependency.

### Why are the changes needed?

The existing documentation still describes Hive 1.2.1 as the default dependency, which is no longer correct.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes apache#26919 from wangyum/SPARK-30280.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
wangyum authored and dongjoon-hyun committed Dec 21, 2019
1 parent cd84400 commit fa47b7f
Showing 4 changed files with 8 additions and 11 deletions.
7 changes: 2 additions & 5 deletions docs/building-spark.md
@@ -83,13 +83,10 @@ Example:
 
 To enable Hive integration for Spark SQL along with its JDBC server and CLI,
 add the `-Phive` and `-Phive-thriftserver` profiles to your existing build options.
-By default, Spark will use Hive 1.2.1 with the `hadoop-2.7` profile, and Hive 2.3.6 with the `hadoop-3.2` profile.
 
-    # With Hive 1.2.1 support
-    ./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
+By default Spark will build with Hive 2.3.6.
 
-    # With Hive 2.3.6 support
-    ./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-3.2 -DskipTests clean package
+    ./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
 
 ## Packaging without Hadoop Dependencies for YARN
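
A minimal illustration of the updated build behavior (not part of this diff, assuming the profiles behave as described above): with Hive 2.3.6 now the only bundled Hive version, `-Phadoop-3.2` only selects the Hadoop dependency and no longer changes which Hive is built in.

```sh
# Sketch: Hive 2.3.6 is bundled by default; adding -Phadoop-3.2 only switches the Hadoop profile.
./build/mvn -Pyarn -Phadoop-3.2 -Phive -Phive-thriftserver -DskipTests clean package
```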
8 changes: 4 additions & 4 deletions docs/sql-data-sources-hive-tables.md
@@ -119,15 +119,15 @@ One of the most important pieces of Spark SQL's Hive support is interaction with
 which enables Spark SQL to access metadata of Hive tables. Starting from Spark 1.4.0, a single binary
 build of Spark SQL can be used to query different versions of Hive metastores, using the configuration described below.
 Note that independent of the version of Hive that is being used to talk to the metastore, internally Spark SQL
-will compile against Hive 1.2.1 and use those classes for internal execution (serdes, UDFs, UDAFs, etc).
+will compile against built-in Hive and use those classes for internal execution (serdes, UDFs, UDAFs, etc).
 
 The following options can be used to configure the version of Hive that is used to retrieve metadata:
 
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
 <td><code>spark.sql.hive.metastore.version</code></td>
-<td><code>1.2.1</code></td>
+<td><code>2.3.6</code></td>
 <td>
 Version of the Hive metastore. Available
 options are <code>0.12.0</code> through <code>2.3.6</code> and <code>3.0.0</code> through <code>3.1.2</code>.
@@ -141,9 +141,9 @@ The following options can be used to configure the version of Hive that is used
 property can be one of three options:
 <ol>
 <li><code>builtin</code></li>
-Use Hive 1.2.1, which is bundled with the Spark assembly when <code>-Phive</code> is
+Use Hive 2.3.6, which is bundled with the Spark assembly when <code>-Phive</code> is
 enabled. When this option is chosen, <code>spark.sql.hive.metastore.version</code> must be
-either <code>1.2.1</code> or not defined.
+either <code>2.3.6</code> or not defined.
 <li><code>maven</code></li>
 Use Hive jars of specified version downloaded from Maven repositories. This configuration
 is not generally recommended for production deployments.
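
For context (not part of this diff), a minimal sketch of how these metastore properties are typically supplied on the command line; the `1.2.1` value below is only an illustrative non-default metastore version:

```sh
# Sketch: talk to an older Hive 1.2.1 metastore while Spark itself still runs its built-in Hive classes.
# "maven" tells Spark to download the matching Hive jars from Maven repositories (not recommended for production).
./bin/spark-sql \
  --conf spark.sql.hive.metastore.version=1.2.1 \
  --conf spark.sql.hive.metastore.jars=maven
```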
2 changes: 1 addition & 1 deletion docs/sql-distributed-sql-engine.md
@@ -29,7 +29,7 @@ without the need to write any code.
 ## Running the Thrift JDBC/ODBC server
 
 The Thrift JDBC/ODBC server implemented here corresponds to the [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2)
-in Hive 1.2.1. You can test the JDBC server with the beeline script that comes with either Spark or Hive 1.2.1.
+in built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or compatible Hive.
 
 To start the JDBC/ODBC server, run the following in the Spark directory:
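
As a hedged sketch of the workflow this page describes (the exact commands sit outside the changed lines shown in this hunk), starting the server and checking it with beeline typically looks like:

```sh
# Sketch: start the Thrift JDBC/ODBC server from the Spark directory...
./sbin/start-thriftserver.sh

# ...then connect with the bundled beeline client (HiveServer2 listens on port 10000 by default).
./bin/beeline -u jdbc:hive2://localhost:10000
```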
2 changes: 1 addition & 1 deletion docs/sql-migration-guide.md
@@ -819,7 +819,7 @@ Python UDF registration is unchanged.
 ## Compatibility with Apache Hive
 
 Spark SQL is designed to be compatible with the Hive Metastore, SerDes and UDFs.
-Currently, Hive SerDes and UDFs are based on Hive 1.2.1,
+Currently, Hive SerDes and UDFs are based on built-in Hive,
 and Spark SQL can be connected to different versions of Hive Metastore
 (from 0.12.0 to 2.3.6 and 3.0.0 to 3.1.2. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)).
