Prerequisites:
Git is installed
JDK 1.8 or later is installed
Scala (2.1+) is installed
Maven is installed
sbt (Scala Build Tool) is installed
A Spark cluster (one master and at least one worker) is up, running and accessible
The following environment variable is set:
SPARK_HOME, exported in ~/.bashrc
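For example, the lines below could be appended to ~/.bashrc. /opt/spark is only a placeholder for wherever Spark is installed on your machine. The PATH line is optional; it just makes spark-submit and the other scripts in $SPARK_HOME/bin callable by name.

```shell
# Placeholder path: replace /opt/spark with your Spark installation directory.
export SPARK_HOME=/opt/spark
# Optional: put the Spark launch scripts (spark-submit etc.) on the PATH.
export PATH="$SPARK_HOME/bin:$PATH"
```

Run `source ~/.bashrc` (or open a new shell) for the change to take effect.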
Steps to set up the project on your local system
Clone the avenir, beymani, chombo and hoidla repositories with Git.
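The cloning step can be scripted. This is a minimal sketch that assumes the four repositories are hosted under https://github.com/pranab (check the URLs before relying on it); the function name clone_all is hypothetical and not part of the project.

```shell
# Clone each named repository from base URL $1 into the current directory.
# Repositories that already have a local folder are skipped, so it is safe to re-run.
clone_all() {
  base="$1"
  shift
  for repo in "$@"; do
    if [ -d "$repo" ]; then
      echo "skipping $repo (already present)"
    else
      git clone "$base/$repo.git" || return 1
    fi
  done
}
```

Usage, under the URL assumption above: `clone_all https://github.com/pranab avenir beymani chombo hoidla`, run from the directory that should hold all four projects side by side.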
Navigate to the folder named hoidla and run the following commands:
mvn clean install
sbt publishLocal
Navigate to the folder named chombo and follow this sequence:
Build chombo first, on the master branch, with:
mvn clean install
sbt publishLocal
Build chombo-spark in the chombo/spark directory:
sbt clean package
sbt publishLocal
Navigate to the folder named avenir and run the following command:
mvn clean install
Navigate to the folder named beymani and run the following commands:
mvn clean install
sbt publishLocal
Build beymani-spark in the beymani/spark directory:
sbt clean package
sbt publishLocal
Navigate to the beymani/resource folder and run:
ant -f beymani_spark.xml
Navigate to the chombo/resource folder and run:
ant -f chombo_spark.xml
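The build order above can be collected into one script. This is a sketch, assuming all four repositories were cloned side by side under one parent directory and that chombo is already on its master branch; run_in and build_all are hypothetical names, not part of the project.

```shell
# Run a command inside a directory (in a subshell, so the caller's working
# directory is unchanged) and report which directory failed, if any.
run_in() {
  dir="$1"
  shift
  echo ">> ($dir) $*"
  (cd "$dir" && "$@") || { echo "failed in $dir: $*" >&2; return 1; }
}

# Build everything in the order given above, under parent directory $1
# (default: current directory). The && chain stops at the first failure.
build_all() {
  root="${1:-.}"
  run_in "$root/hoidla" mvn clean install &&
  run_in "$root/hoidla" sbt publishLocal &&
  run_in "$root/chombo" mvn clean install &&   # chombo must be on master
  run_in "$root/chombo" sbt publishLocal &&
  run_in "$root/chombo/spark" sbt clean package &&
  run_in "$root/chombo/spark" sbt publishLocal &&
  run_in "$root/avenir" mvn clean install &&
  run_in "$root/beymani" mvn clean install &&
  run_in "$root/beymani" sbt publishLocal &&
  run_in "$root/beymani/spark" sbt clean package &&
  run_in "$root/beymani/spark" sbt publishLocal &&
  run_in "$root/beymani/resource" ant -f beymani_spark.xml &&
  run_in "$root/chombo/resource" ant -f chombo_spark.xml
}
```

Usage: `build_all ~/projects` (placeholder path). Stopping at the first failure is deliberate: the fixed order suggests later projects depend on artifacts installed by the earlier ones, so continuing past a failed build would only produce confusing follow-on errors.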
Navigate to the beymani/resource folder and edit the file to reflect the paths on your local system:
set the project home path (PROJECT_HOME), the Spark home path (SPARK_HOME) and the Spark master URL (MASTER).
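As an illustration, the three settings might look like this. The values are placeholders, and the shell-style assignment syntax is an assumption, since the file itself is not named here. A Spark standalone master URL has the form spark://HOST:PORT, with 7077 as the default port.

```shell
# Placeholder values: substitute your own paths and master host.
PROJECT_HOME=/home/user/projects
SPARK_HOME=/opt/spark
MASTER=spark://localhost:7077
```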
You are now ready to run the file. The steps below list the parameters to use for each run:
Step 1: Create the base normal data
./ crInput <num_of_days> <reading_interval> <num_servers> <output_file>
where num_of_days = number of days, e.g. 10
reading_interval = reading interval in seconds, e.g. 300
num_servers = number of servers, e.g. 4
output_file = output file; c.txt is used from here on
./ crInput 10 300 40 c.txt
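To get a feel for the size of c.txt, the expected record count can be worked out, assuming crInput writes one reading per server per interval (an assumption; the output format is not described here). With the example arguments that is 86400 / 300 = 288 readings per server per day, times 10 days, times 40 servers.

```shell
# Expected record count for: crInput 10 300 40 c.txt
# (assumes one reading per server per reading interval)
num_days=10
interval_sec=300
num_servers=40
readings_per_day=$(( 86400 / interval_sec ))                       # 288
expected_records=$(( num_days * readings_per_day * num_servers ))  # 115200
echo "expected records: $expected_records"
# Compare with the generated file, e.g.: wc -l < c.txt
```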
Step 2: Copy the modeling data
- insert outliers
./ insOutliers <normal_data_file> <with_outlier_data_file>
where normal_data_file = normal data file (c.txt)
with_outlier_data_file = output data file with outliers (cusage.txt)
./ insOutliers c.txt cusage.txt
- copy the file with outliers as the modeling data
./ cpModData <with_outlier_data_file>
where with_outlier_data_file = data file with outliers (cusage.txt)
./ cpModData cusage.txt
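Before copying the data, a quick sanity check can confirm that insOutliers produced a non-empty file that differs from its input. check_outliers_inserted is a hypothetical helper, not part of the project, and it assumes insOutliers changes at least some records of the normal data.

```shell
# Fail unless file $2 exists, is non-empty, and differs from file $1.
check_outliers_inserted() {
  normal="$1"
  with_outliers="$2"
  if [ ! -s "$with_outliers" ]; then
    echo "missing or empty: $with_outliers" >&2
    return 1
  fi
  # cmp -s exits 0 when the two files are byte-for-byte identical.
  if cmp -s "$normal" "$with_outliers"; then
    echo "no outliers inserted: $with_outliers is identical to $normal" >&2
    return 1
  fi
  echo "ok: $with_outliers differs from $normal"
}
```

Usage: `check_outliers_inserted c.txt cusage.txt`, run after insOutliers and before cpModData (and again before cpTestData in Step 7).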
Step 3: Run the Spark job for statistics
./ numStat
Step 4: Copy and consolidate the stats file
./ crStatsFile
Step 5: Run the Spark job to detect outliers
First set output.outliers = true and rem.outliers = true in the configuration
./ olPred
Step 6: Copy and consolidate the clean file
./ crCleanFile
Step 7: Copy the test data
- insert outliers
./ insOutliers <normal_data_file> <with_outlier_data_file>
where normal_data_file = normal data file (c.txt)
with_outlier_data_file = output data file with outliers (cusage.txt)
./ insOutliers c.txt cusage.txt
- copy the file with outliers as the test data
./ cpTestData <with_outlier_data_file>
where with_outlier_data_file = data file with outliers (cusage.txt)
./ cpTestData cusage.txt
Step 8: Run the Spark job for statistics again, this time on the clean data
./ numStat
Step 9: Copy and consolidate the stats file
./ crStatsFile
Step 10: Run the Spark job to detect outliers
First set output.outliers = true and rem.outliers = true in the configuration
./ olPred
The configuration is in and.conf and and1.conf.
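Steps 1 to 10 can also be chained in one script. This is only a sketch: the guide writes each command as "./ <command>" without naming the script, so RUNNER below is a hypothetical variable you point at that script. The sketch also assumes output.outliers and rem.outliers are already true and that each job's configuration already points at the right input, since those edits are manual. run_step and run_pipeline are hypothetical names.

```shell
# Run one pipeline command through $RUNNER, stopping on failure.
run_step() {
  echo ">> $*"
  "$RUNNER" "$@" || { echo "step failed: $*" >&2; return 1; }
}

# Steps 1-10 in order, with the example arguments used in this guide.
# The shell aborts immediately if RUNNER is unset.
run_pipeline() {
  : "${RUNNER:?set RUNNER to the script invoked as ./ in this guide}"
  run_step crInput 10 300 40 c.txt &&        # Step 1
  run_step insOutliers c.txt cusage.txt &&   # Step 2
  run_step cpModData cusage.txt &&
  run_step numStat &&                        # Step 3
  run_step crStatsFile &&                    # Step 4
  run_step olPred &&                         # Step 5
  run_step crCleanFile &&                    # Step 6
  run_step insOutliers c.txt cusage.txt &&   # Step 7
  run_step cpTestData cusage.txt &&
  run_step numStat &&                        # Step 8
  run_step crStatsFile &&                    # Step 9
  run_step olPred                            # Step 10
}
```

Usage: set RUNNER to the driver script's path, then call run_pipeline from the beymani/resource folder. Because the steps are joined with &&, a failed Spark job stops the run instead of feeding stale output into the next step.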