IBM Bluemix Analytics for Apache Spark
You can bring your own EclairJS Server JavaScript Apache Spark application and run it on the IBM® Analytics for Apache Spark for Bluemix® service.
Running your own EclairJS Server JavaScript Apache Spark application on the Analytics for Apache Spark service lets you take advantage of powerful on-demand processing in the cloud. And you can load your own data into an object store in Bluemix for fast, cost-effective access.
- In order to submit jobs, you must have an IBM Bluemix account and a Spark service instance configured for your account.
- Download the IBM Bluemix spark-submit.sh script. The spark-submit.sh script is supported on Linux and Mac OS X.
- Note the location where the IBM Bluemix spark-submit.sh script is downloaded, and set the BLUEMIX_SPARK_SUMIT_SH environment variable to that location. For example:
export BLUEMIX_SPARK_SUMIT_SH=/Users/billreed/bluemix/spark-submit.sh
To run an EclairJS-nashorn JavaScript Spark application with the EclairJS-nashorn eclairjs-bluemix-spark-submit.sh script:
- Create your application. Be sure to test and debug your application locally before submitting it to the Analytics for Apache Spark service with the eclairjs-bluemix-spark-submit.sh script.
- Create an instance of the Analytics for Apache Spark service on IBM Bluemix, and record the credentials from the Service Credentials page.
- Copy the service credentials from your Spark instance to a file named vcap.json in the directory where you plan to run the eclairjs-bluemix-spark-submit.sh script. For example:
{
"credentials": {
"tenant_id": "s1a9-7f40d9344b951f-42a9cf195d79",
"tenant_id_full": "b404510b-d17a-43d2-b1a9-7f40d9344b95_9c09a3cb-7178-43f9-9e1f-42a9cf195d79",
"cluster_master_url": "https://spark.bluemix.net",
"instance_id": "b404510b-d17a-43d2-b1a9-7f40d9344b95",
"tenant_secret": "8b2d72ad-8ac5-4927-a90c-9ca9edfad124",
"plan":"ibm.SparkService.PayGoPersonal"
}
}
- Run the $ECLAIRJS_HOME/bin/eclairjs-bluemix-spark-submit.sh script in your local shell, where test.js is your EclairJS-nashorn JavaScript Apache Spark application. For example:
bin/eclairjs-bluemix-spark-submit.sh \
--vcap ./vcap.json --deploy-mode cluster \
--files /Users/billreed/eclairjs_dev/eclairjs/server/examples/test.js \
--master https://spark.bluemix.net file://test.js
The generated log file, which lists the steps taken by the IBM Bluemix spark-submit.sh script, is written to the folder where you run the script. eclairjs-bluemix-spark-submit.sh is a convenience script that uses the IBM Bluemix spark-submit.sh script to submit Spark jobs to the IBM Bluemix Spark service. For more details, see the documentation on using spark-submit.
IBM® Object Storage for Bluemix® provides you with access to a fully provisioned Swift Object Storage account to manage your data. Swift provides a fully distributed, API-accessible storage platform. You can use it directly in your applications or for backups, making it ideal for cost-effective, scale-out storage.
To get started with Object Storage:
- Provision your Object Storage instance from the Bluemix catalog.
- Configure your Object Storage instance, leave the App field set to Unbound, and click Create.
- Create a storage container through the web UI, the CLI, or the REST API.
- Add your files to the storage container.
- Record the Service Credentials for your service instance; you will need them in your Spark application.
- Add the following lines of code to your application so that Apache Spark's Hadoop layer can connect to your Object Storage instance:
// Replace the <...> placeholders with values from your Object Storage Service Credentials.
sc.setHadoopConfiguration("fs.swift.service.softlayer.auth.url", "https://identity.open.softlayer.com/v3/auth/tokens");
sc.setHadoopConfiguration("fs.swift.service.softlayer.auth.endpoint.prefix", "endpoints");
sc.setHadoopConfiguration("fs.swift.service.softlayer.tenant", "<projectId>"); // projectId
sc.setHadoopConfiguration("fs.swift.service.softlayer.username", "<userId>"); // userId
sc.setHadoopConfiguration("fs.swift.service.softlayer.password", "<password>"); // password
sc.setHadoopConfiguration("fs.swift.service.softlayer.apikey", "<password>"); // password
sc.setHadoopConfiguration("fs.swift.service.softlayer.region", "dallas"); // region
// Use the swift:// protocol when accessing your files:
// replace container with your storage container name and object with your file object name.
var rdd = sc.textFile("swift://container.softlayer/object");
A working example can be found in the eclairJS-examples git repository.
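If you configure Object Storage in more than one application, it may be convenient to derive the settings from the credentials object instead of typing each value by hand. The sketch below is hypothetical (swiftHadoopSettings, applySettings and swiftUrl are illustrative names, not EclairJS APIs). It assumes the credentials expose projectId, userId, password and region fields; check those names against the Service Credentials you recorded. The setter is passed in as a function so that the helpers can be exercised without a SparkContext.

```javascript
// Hypothetical helper: builds the same fs.swift.service.<name>.* settings listed
// above. The credential field names (projectId, userId, password, region) are
// assumptions; verify them against your own Service Credentials.
function swiftHadoopSettings(creds, serviceName) {
  var prefix = "fs.swift.service." + serviceName;
  return [
    [prefix + ".auth.url", "https://identity.open.softlayer.com/v3/auth/tokens"],
    [prefix + ".auth.endpoint.prefix", "endpoints"],
    [prefix + ".tenant", creds.projectId],
    [prefix + ".username", creds.userId],
    [prefix + ".password", creds.password],
    [prefix + ".apikey", creds.password],
    [prefix + ".region", creds.region]
  ];
}

// Applies each key/value pair through a caller-supplied setter function.
function applySettings(settings, setConf) {
  for (var i = 0; i < settings.length; i++) {
    setConf(settings[i][0], settings[i][1]);
  }
}

// Builds a URL of the form swift://<container>.<serviceName>/<object>.
function swiftUrl(container, serviceName, object) {
  return "swift://" + container + "." + serviceName + "/" + object;
}

// In an EclairJS application (not run here, since it needs a SparkContext):
//   applySettings(swiftHadoopSettings(creds, "softlayer"),
//                 function (key, value) { sc.setHadoopConfiguration(key, value); });
//   var rdd = sc.textFile(swiftUrl("container", "softlayer", "object"));
```

The service name ("softlayer" above) must match in both the configuration keys and the swift:// URL, which is why both helpers take it as a parameter.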