RHive is an R extension that facilitates distributed computing through Hive queries. It lets you use HQL (Hive SQL) easily from R, and use R objects and R functions easily in Hive.
Before installing RHive, you must install Hadoop and Hive on the machine where you want to install RHive.
- install Hadoop (single node or cluster)
- set HADOOP_HOME on the machine on which R runs
- install Hive on the RHive machine and on the remote machine on which the NameNode or the Hive Server runs
- follow the Hive Installation Guide
- set HIVE_HOME on the local machine on which R runs
- launch the Hive Server on the remote machine with the following command, running it as a background process:
$HIVE_HOME/bin/hive --service hiveserver
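A minimal shell sketch of the environment and Hive Server steps above. The install paths `/usr/local/hadoop` and `/usr/local/hive`, the default log path, and the helper name `start_hiveserver` are assumptions for illustration only; adjust them to your machines. `nohup ... &` is one common way to keep the server running in the background after you log out.

```shell
#!/bin/sh
# Assumed install locations (hypothetical); values already set are kept.
export HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
export HIVE_HOME="${HIVE_HOME:-/usr/local/hive}"

# start_hiveserver (hypothetical helper): run the Hive Server in the background,
# send its output to a log file, and print the background process ID.
start_hiveserver() {
  log="${1:-/tmp/hiveserver.log}"   # assumed default log path
  if [ ! -x "$HIVE_HOME/bin/hive" ]; then
    echo "hive not found under HIVE_HOME=$HIVE_HOME" >&2
    return 1
  fi
  nohup "$HIVE_HOME/bin/hive" --service hiveserver >"$log" 2>&1 &
  echo "$!"
}
```

On the remote Hive Server machine, running `start_hiveserver` prints the server's PID, which you can later pass to `kill` to stop it.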
- install R
- you need to install R on all tasktracker nodes.
- install rJava
- RHive requires the rJava package as a prerequisite, so you need to install rJava.
- Rserve mode: install Rserve
- you need to install Rserve on all tasktracker nodes
- set RHIVE_DATA on all tasktracker nodes to the directory that serves as the repository for R objects and R functions. If RHIVE_DATA is not set, it defaults to '/tmp'.
- e.g.:
export RHIVE_DATA=/rhive/data
- create the configuration file /etc/Rserv.conf on all tasktracker nodes and add the line 'remote enable' to it to allow remote connections.
- launch Rserve on all tasktracker nodes.
- e.g.:
R CMD Rserve
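The Rserve-mode node setup can be sketched as a few shell helpers to run on each tasktracker node. The helper names are hypothetical; the `/tmp` fallback mirrors the default described above, and creating the RHIVE_DATA directory with `mkdir -p` is an assumption that it must exist. Writing `/etc/Rserv.conf` normally requires root privileges.

```shell
#!/bin/sh
# rhive_data_dir (hypothetical helper): print the effective RHIVE_DATA
# directory, falling back to /tmp when RHIVE_DATA is unset or empty.
rhive_data_dir() {
  printf '%s\n' "${RHIVE_DATA:-/tmp}"
}

# enable_rserve_remote (hypothetical helper): make sure the Rserve config file
# contains the 'remote enable' line, appending it only if it is missing.
enable_rserve_remote() {
  conf="${1:-/etc/Rserv.conf}"
  touch "$conf" || return 1
  grep -qx 'remote enable' "$conf" || printf 'remote enable\n' >> "$conf"
}

# setup_rserve_node (hypothetical helper): create the data directory
# (assumed to be needed), enable remote connections, then start Rserve.
setup_rserve_node() {
  mkdir -p "$(rhive_data_dir)" || return 1
  enable_rserve_remote "${1:-/etc/Rserv.conf}" || return 1
  R CMD Rserve
}
```

Because `enable_rserve_remote` checks for the line before appending, running the setup twice on the same node leaves a single 'remote enable' entry.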
- No-Rserve mode: set up the tasktracker nodes (optional)
- set RHIVE_DATA on all tasktracker nodes to the directory that serves as the repository for R objects and R functions.
- e.g.:
export RHIVE_DATA=/rhive/data
- or:
export RHIVE_DATA=/tmp
- add the R_HOME path to $HADOOP_HOME/conf/hadoop-env.sh
- e.g.:
export R_HOME=/usr/lib/R
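Editing hadoop-env.sh by hand on every tasktracker node is easy to get wrong, so here is a sketch of an idempotent helper. The name `add_r_home` is hypothetical, and `/usr/lib/R` is only the example path from the steps above; on a given node, `R RHOME` prints the actual R home directory.

```shell
#!/bin/sh
# add_r_home (hypothetical helper): append 'export R_HOME=...' to
# hadoop-env.sh unless an R_HOME export is already present.
#   $1: R home directory (example default: /usr/lib/R)
#   $2: path to hadoop-env.sh (default: $HADOOP_HOME/conf/hadoop-env.sh)
add_r_home() {
  r_home="${1:-/usr/lib/R}"
  env_file="${2:-$HADOOP_HOME/conf/hadoop-env.sh}"
  if [ ! -f "$env_file" ]; then
    echo "not found: $env_file" >&2
    return 1
  fi
  if grep -q '^export R_HOME=' "$env_file"; then
    # Leave an existing setting untouched rather than adding a second one.
    echo "R_HOME already set in $env_file" >&2
  else
    printf 'export R_HOME=%s\n' "$r_home" >> "$env_file"
  fi
}
```

For example, `add_r_home "$(R RHOME)"` uses the R installation found on the node's PATH.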
- install RUnit
- Requirements
- ant (needed to build the jar files)
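Before running the install commands, a quick pre-flight check can catch missing prerequisites. This is a sketch with a hypothetical helper name: it only checks that `java`, `R`, and `ant` are on PATH and that HIVE_HOME is set. It does not verify versions or R packages such as rJava and RUnit, and it treats a missing HADOOP_HOME as a note rather than an error, because RHive can still be used if rhive_udf.jar is copied to HDFS by hand.

```shell
#!/bin/sh
# check_rhive_prereqs (hypothetical helper): report missing prerequisites
# on stderr; returns 0 only if every required item is present.
check_rhive_prereqs() {
  status=0
  for cmd in java R ant; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      status=1
    fi
  done
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "HIVE_HOME is not set" >&2
    status=1
  fi
  # Not fatal: without HADOOP_HOME, rhive_udf.jar must be copied to HDFS manually.
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "note: HADOOP_HOME is not set" >&2
  fi
  return "$status"
}
```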
- Installing RHive
- Compressed package:
R CMD INSTALL RHive_1.0-0.0.tar.gz
- Source code:
R CMD INSTALL ./RHive
- If HADOOP_HOME does not exist, do the following:
- copy the RUDF/RUDAF library (rhive_udf.jar) to the HDFS path '/rhive/lib/' with this command: 'hadoop fs -put rhive_udf.jar /rhive/lib/rhive_udf.jar'. The jar file is located under $HIVE_HOME/lib.
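The manual copy step can be wrapped in a small helper that checks for the jar and for the target directory first. The name `upload_rhive_udf` is hypothetical; it assumes the `hadoop` client command is on PATH on whichever machine runs it, and uses the jar location and HDFS path given above.

```shell
#!/bin/sh
# upload_rhive_udf (hypothetical helper): copy rhive_udf.jar into HDFS at
# /rhive/lib/rhive_udf.jar, creating /rhive/lib first if it is missing.
#   $1: local jar path (default: $HIVE_HOME/lib/rhive_udf.jar)
upload_rhive_udf() {
  jar="${1:-$HIVE_HOME/lib/rhive_udf.jar}"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  # 'hadoop fs -test -d' succeeds only if the path is an existing directory.
  hadoop fs -test -d /rhive/lib || hadoop fs -mkdir /rhive/lib || return 1
  hadoop fs -put "$jar" /rhive/lib/rhive_udf.jar
}
```

`hadoop fs -put` does not overwrite an existing file, so if an older rhive_udf.jar is already in HDFS, remove it first (for example with `hadoop fs -rm /rhive/lib/rhive_udf.jar`).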
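- launch R, load RHive, and connect to the Hive Server (replace hive-server-ip with the server's IP address):
library(RHive)
rhive.connect("hive-server-ip")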
For more help, refer to the tutorials.
Software versions:
- Java 1.6
- R 2.13.0
- Rserve 0.6-0
- rJava 0.9-0
- Hadoop 0.20.x (x >= 1)
- Hive 0.8.x (x >= 0)