Assumptions
- Git is installed
- JDK (1.8+) is installed
- Scala (2.10+) is installed
- Maven is installed
- SBT (the Scala build tool) is installed
- A Spark cluster (with 1 master and at least 1 worker) is up and running and accessible
- The SPARK_HOME environment variable is set, e.g. in ~/.bashrc
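A minimal sketch of the ~/.bashrc entry (the install path shown is an assumption; substitute your own):
# assumed install location; adjust to your system
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin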
Steps to set up the project on your local system
- Clone the avenir, beymani, chombo, and hoidla repositories with git (see the example below).
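A minimal sketch, assuming the repositories live under the pranab account on GitHub (adjust the URLs if you use a fork):
git clone https://github.com/pranab/avenir.git
git clone https://github.com/pranab/beymani.git
git clone https://github.com/pranab/chombo.git
git clone https://github.com/pranab/hoidla.git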
- Navigate to the folder named hoidla and execute the commands below:
mvn clean install
sbt publishLocal
- Navigate to the folder named chombo and follow the sequence below.
First, build chombo in the master branch:
mvn clean install
sbt publishLocal
Then build chombo-spark in the chombo/spark directory:
sbt clean package
sbt publishLocal
- Navigate to the folder named avenir and execute the command below:
mvn clean install
- Navigate to the folder named beymani and execute the commands below:
mvn clean install
sbt publishLocal
- Build beymani-spark in the beymani/spark directory:
sbt clean package
sbt publishLocal
- Navigate to the beymani/resource folder and execute:
ant -f beymani_spark.xml
- Navigate to the chombo/resource folder and execute the command below:
ant -f chombo_spark.xml
- Navigate to the beymani/resource folder and edit the and_spark.sh file to reflect the paths on your local system:
Set the project home path (PROJECT_HOME), the Spark home path (SPARK_HOME), and the Spark master URL (MASTER), as in the sketch below.
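A minimal sketch of those settings (all values are assumptions; substitute your own):
# assumed values; adjust to your system
PROJECT_HOME=/home/me/projects
SPARK_HOME=/opt/spark
MASTER=spark://localhost:7077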
Now you are ready to run and_spark.sh. The steps below show the parameters to use for each run:
Step 1 : Create base normal data
./and_spark.sh crInput <num_of_days> <reading_interval> <num_servers> <output_file>
where num_of_days = number of days, e.g. 10
reading_interval = reading interval in seconds, e.g. 300
num_servers = number of servers, e.g. 40
output_file = output file; we will use c.txt from now on
./and_spark.sh crInput 10 300 40 c.txt
Step 2 : Copy modeling data
- Insert outliers:
./and_spark.sh insOutliers <normal_data_file> <with_outlier_data_file>
where
normal_data_file = normal data file (c.txt)
with_outlier_data_file = data file with outliers (cusage.txt)
./and_spark.sh insOutliers c.txt cusage.txt
- Copy:
./and_spark.sh cpModData <with_outlier_data_file>
where
with_outlier_data_file = data file with outliers (cusage.txt)
./and_spark.sh cpModData cusage.txt
Step 3 : Run Spark job for stats
./and_spark.sh numStat
Step 4 : Copy and consolidate stats file
./and_spark.sh crStatsFile
Step 5 : Run Spark job to detect outliers
In the configuration, set output.outliers = true and rem.outliers = true (see the sketch after this step).
./and_spark.sh olPred
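A minimal sketch of those entries as they might appear in and.conf (the exact location within the file is an assumption; the property names are from the step above):
# enable outlier output and outlier removal
output.outliers = true
rem.outliers = true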
Step 6 : Copy and consolidate clean file
./and_spark.sh crCleanFile
Step 7 : Copy test data
- Insert outliers:
./and_spark.sh insOutliers <normal_data_file> <with_outlier_data_file>
where
normal_data_file = normal data file (c.txt)
with_outlier_data_file = data file with outliers (cusage.txt)
./and_spark.sh insOutliers c.txt cusage.txt
- Copy:
./and_spark.sh cpTestData <with_outlier_data_file>
where
with_outlier_data_file = data file with outliers (cusage.txt)
./and_spark.sh cpTestData cusage.txt
Step 8 : Run Spark job for stats again with clean data
./and_spark.sh numStat
Step 9 : Copy and consolidate stats file
./and_spark.sh crStatsFile
Step 10 : Run Spark job to detect outliers
As in Step 5, set output.outliers = true and rem.outliers = true in the configuration.
./and_spark.sh olPred
Configuration
Configuration is in and.conf and and1.conf.
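Putting it all together, a condensed end-to-end run looks roughly like this (same commands and example values as in the steps above; edit the configuration before each olPred run as noted in Steps 5 and 10):
./and_spark.sh crInput 10 300 40 c.txt      # step 1: create base normal data
./and_spark.sh insOutliers c.txt cusage.txt # step 2: insert outliers
./and_spark.sh cpModData cusage.txt         #         copy modeling data
./and_spark.sh numStat                      # step 3: run Spark job for stats
./and_spark.sh crStatsFile                  # step 4: consolidate stats file
./and_spark.sh olPred                       # step 5: detect outliers
./and_spark.sh crCleanFile                  # step 6: consolidate clean file
./and_spark.sh insOutliers c.txt cusage.txt # step 7: insert outliers
./and_spark.sh cpTestData cusage.txt        #         copy test data
./and_spark.sh numStat                      # step 8: stats on clean data
./and_spark.sh crStatsFile                  # step 9: consolidate stats file
./and_spark.sh olPred                       # step 10: detect outliers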