[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)

### What changes were proposed in this pull request?

As of today,
- SPARK-30034: Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removed the direct dependency on the forked Hive 1.2.1 in the Maven repository.
- SPARK-32981: Apache Spark 3.1.0 (`master` branch) removed Hive 1.2 related artifacts from the Apache Spark binary distributions.

This PR (SPARK-20202) aims to completely remove the following usage of the unofficial Apache Hive fork from Apache Spark master for Apache Spark 3.1.0.
```
<hive.group>org.spark-project.hive</hive.group>
<hive.version>1.2.1.spark2</hive.version>
```

For users of the forked Hive 1.2.1.spark2, Apache Spark 2.4 (LTS) and 3.0 (~ 2021.12) will continue to provide it.

### Why are the changes needed?

- First, the Apache Spark community should not use an unofficial forked release of another Apache project.
- Second, Apache Hive 1.2.1 was released on 2015-06-26, and the forked Hive `1.2.1.spark2` has exposed many bugs that are unfixable in Apache because the forked `1.2.1.spark2` is not maintained at all. Apache Hive 2.3.0 was released on 2017-07-19 and has shown fewer bugs than `1.2.1.spark2`. Many bugs still exist in the `hive-1.2` profile, and new Apache Spark unit tests have so far been added with a `HiveUtils.isHive23` condition.

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change. PRBuilder will not accept `[test-hive1.2]` on master and `branch-3.1`.

### How was this patch tested?

1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366)
2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382)
3. SBT/Hadoop 3.2/Hive 1.2 (This combination was already unsupported because Hive 1.2 does not work with Hadoop 3.2.)
4. SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383 ; this is rejected)

Closes apache#29936 from dongjoon-hyun/SPARK-REMOVE-HIVE1.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun committed Oct 5, 2020
1 parent 14aeab3 commit 008a2ad
Showing 320 changed files with 7 additions and 69,240 deletions.
1 change: 0 additions & 1 deletion dev/run-tests.py
@@ -325,7 +325,6 @@ def get_hive_profiles(hive_version):
 """
 
 sbt_maven_hive_profiles = {
-"hive1.2": ["-Phive-1.2"],
 "hive2.3": ["-Phive-2.3"],
 }
 
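With the `hive1.2` entry gone, a `[test-hive1.2]` tag no longer resolves to a build profile. The following is a minimal Python sketch of that lookup, not the actual body of `get_hive_profiles` in `dev/run-tests.py`. Raising `ValueError` for an unknown tag is an assumption made here for illustration; the real script may report the error differently.

```python
def get_hive_profiles(hive_version):
    """Return the Maven/SBT profile flags for a Hive version tag (illustrative sketch)."""
    # Only the Hive 2.3 entry remains after this commit.
    sbt_maven_hive_profiles = {
        "hive2.3": ["-Phive-2.3"],
    }
    if hive_version not in sbt_maven_hive_profiles:
        # Assumption: error handling is simplified to an exception here.
        valid = ", ".join(sorted(sbt_maven_hive_profiles))
        raise ValueError(
            f"Unknown Hive version {hive_version!r}; valid options: {valid}"
        )
    return sbt_maven_hive_profiles[hive_version]


if __name__ == "__main__":
    print(get_hive_profiles("hive2.3"))
    try:
        get_hive_profiles("hive1.2")
    except ValueError as err:
        print(err)
```

In this sketch, asking for `hive1.2` fails loudly instead of silently falling back to another profile.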
6 changes: 1 addition & 5 deletions dev/test-dependencies.sh
@@ -32,7 +32,6 @@ export LC_ALL=C
 HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive"
 MVN="build/mvn"
 HADOOP_HIVE_PROFILES=(
-hadoop-2.7-hive-1.2
 hadoop-2.7-hive-2.3
 hadoop-3.2-hive-2.3
 )
@@ -71,12 +70,9 @@ for HADOOP_HIVE_PROFILE in "${HADOOP_HIVE_PROFILES[@]}"; do
 if [[ $HADOOP_HIVE_PROFILE == **hadoop-3.2-hive-2.3** ]]; then
 HADOOP_PROFILE=hadoop-3.2
 HIVE_PROFILE=hive-2.3
-elif [[ $HADOOP_HIVE_PROFILE == **hadoop-2.7-hive-2.3** ]]; then
-HADOOP_PROFILE=hadoop-2.7
-HIVE_PROFILE=hive-2.3
 else
 HADOOP_PROFILE=hadoop-2.7
-HIVE_PROFILE=hive-1.2
+HIVE_PROFILE=hive-2.3
 fi
 echo "Performing Maven install for $HADOOP_HIVE_PROFILE"
 $MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE jar:jar jar:test-jar install:install clean -q
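The branch logic that remains in `dev/test-dependencies.sh` after this hunk can be restated as a small Python sketch. The function name `split_profile` is hypothetical and exists only for this illustration; the shell script itself is unchanged by it. The `in` check mirrors the substring-style glob match in the `[[ ... == **...** ]]` test.

```python
def split_profile(hadoop_hive_profile):
    """Map a combined profile name to (hadoop_profile, hive_profile)."""
    if "hadoop-3.2-hive-2.3" in hadoop_hive_profile:
        return "hadoop-3.2", "hive-2.3"
    # After this commit the fallback branch also uses Hive 2.3,
    # so no combination selects hive-1.2 any more.
    return "hadoop-2.7", "hive-2.3"


# The two combinations left in HADOOP_HIVE_PROFILES.
HADOOP_HIVE_PROFILES = ["hadoop-2.7-hive-2.3", "hadoop-3.2-hive-2.3"]

if __name__ == "__main__":
    for name in HADOOP_HIVE_PROFILES:
        hadoop_profile, hive_profile = split_profile(name)
        print(f"{name}: -P{hadoop_profile} -P{hive_profile}")
```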
2 changes: 2 additions & 0 deletions docs/sql-migration-guide.md
@@ -42,6 +42,8 @@ license: |
 
 - In Spark 3.1, incomplete interval literals, e.g. `INTERVAL '1'`, `INTERVAL '1 DAY 2'` will fail with IllegalArgumentException. In Spark 3.0, they result `NULL`s.
 
+- In Spark 3.1, we remove the built-in Hive 1.2. You need to migrate your custom SerDes to Hive 2.3. See [HIVE-15167](https://issues.apache.org/jira/browse/HIVE-15167) for more details.
+
 ## Upgrading from Spark SQL 3.0 to 3.0.1
 
 - In Spark 3.0, JSON datasource and JSON function `schema_of_json` infer TimestampType from string values if they match to the pattern defined by the JSON option `timestampFormat`. Since version 3.0.1, the timestamp type inference is disabled by default. Set the JSON option `inferTimestamp` to `true` to enable such type inference.
25 changes: 0 additions & 25 deletions pom.xml
@@ -2970,13 +2970,9 @@
 <sourceDirectories>
 <directory>${basedir}/src/main/java</directory>
 <directory>${basedir}/src/main/scala</directory>
-<directory>${basedir}/v${hive.version.short}/src/main/java</directory>
-<directory>${basedir}/v${hive.version.short}/src/main/scala</directory>
 </sourceDirectories>
 <testSourceDirectories>
 <directory>${basedir}/src/test/java</directory>
-<directory>${basedir}/v${hive.version.short}/src/test/java</directory>
-<directory>${basedir}/v${hive.version.short}/src/test/scala</directory>
 </testSourceDirectories>
 <configLocation>dev/checkstyle.xml</configLocation>
 <outputFile>${basedir}/target/checkstyle-output.xml</outputFile>
@@ -3148,27 +3144,6 @@
 <!-- Default hadoop profile. Uses global properties. -->
 </profile>
 
-<profile>
-<id>hive-1.2</id>
-<properties>
-<hive.group>org.spark-project.hive</hive.group>
-<hive.classifier></hive.classifier>
-<!-- Version used in Maven Hive dependency -->
-<hive.version>1.2.1.spark2</hive.version>
-<!-- Version used for internal directory structure -->
-<hive.version.short>1.2</hive.version.short>
-<hive.parquet.scope>${hive.deps.scope}</hive.parquet.scope>
-<hive.storage.version>2.6.0</hive.storage.version>
-<hive.storage.scope>provided</hive.storage.scope>
-<hive.common.scope>provided</hive.common.scope>
-<hive.llap.scope>provided</hive.llap.scope>
-<hive.serde.scope>provided</hive.serde.scope>
-<hive.shims.scope>provided</hive.shims.scope>
-<orc.classifier>nohive</orc.classifier>
-<datanucleus-core.version>3.2.10</datanucleus-core.version>
-</properties>
-</profile>
-
 <profile>
 <id>hive-2.3</id>
 <!-- Default hive profile. Uses global properties. -->
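One way to check that a POM no longer refers to the fork is to scan its element text for the `org.spark-project.hive` group or a `.spark2` version suffix. The sketch below uses Python's standard `xml.etree.ElementTree`. The helper names and the two embedded POM fragments are hypothetical, heavily trimmed stand-ins rather than the real `pom.xml`.

```python
import xml.etree.ElementTree as ET

FORK_GROUP = "org.spark-project.hive"
FORK_VERSION_SUFFIX = ".spark2"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_fork_references(pom_text):
    """Return (element name, text) pairs that mention the forked Hive."""
    root = ET.fromstring(pom_text)
    hits = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if FORK_GROUP in text or text.endswith(FORK_VERSION_SUFFIX):
            hits.append((local_name(elem.tag), text))
    return hits


# Hypothetical fragment resembling the removed hive-1.2 profile.
OLD_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile>
      <id>hive-1.2</id>
      <properties>
        <hive.group>org.spark-project.hive</hive.group>
        <hive.version>1.2.1.spark2</hive.version>
      </properties>
    </profile>
    <profile>
      <id>hive-2.3</id>
    </profile>
  </profiles>
</project>"""

# Hypothetical fragment with only the hive-2.3 profile left.
NEW_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile>
      <id>hive-2.3</id>
    </profile>
  </profiles>
</project>"""

if __name__ == "__main__":
    print(find_fork_references(OLD_POM))
    print(find_fork_references(NEW_POM))
```

The check only looks at element text, so it would miss references that appear in attributes or comments. That is enough for the properties shown in this hunk.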
3 changes: 0 additions & 3 deletions sql/core/pom.xml
@@ -221,8 +221,6 @@
 </goals>
 <configuration>
 <sources>
-<source>v${hive.version.short}/src/main/scala</source>
-<source>v${hive.version.short}/src/main/java</source>
 <source>src/main/scala-${scala.binary.version}</source>
 </sources>
 </configuration>
@@ -235,7 +233,6 @@
 </goals>
 <configuration>
 <sources>
-<source>v${hive.version.short}/src/test/scala</source>
 <source>src/test/gen-java</source>
 </sources>
 </configuration>