IBM BlueMix Analytics for Apache Spark

Run applications with your IBM BlueMix Analytics for Apache Spark instance

You can bring your own EclairJS Server JavaScript Apache Spark application and run it on the IBM® Analytics for Apache Spark for Bluemix® service.

Running your own EclairJS Server JavaScript Apache Spark application on the Analytics for Apache Spark service lets you take advantage of powerful on-demand processing in the cloud. And you can load your own data into an object store in Bluemix for fast, cost-effective access.

Follow this procedure when your JavaScript application is stable and no more testing is required.

Restriction:

  • To submit jobs, you must have an IBM Bluemix account and a Spark service instance configured for your account.
  • Download the IBM Bluemix spark-submit.sh script. The spark-submit.sh script is supported on Linux and Mac OS X.
  • Note the location where the IBM Bluemix spark-submit.sh script is downloaded and set the BLUEMIX_SPARK_SUMIT_SH environment variable to that location. For example: export BLUEMIX_SPARK_SUMIT_SH=/Users/billreed/bluemix/spark-submit.sh

To run an EclairJS-nashorn JavaScript Spark application using the EclairJS-nashorn eclairjs-bluemix-spark-submit.sh script:

  1. Create your application (a minimal application sketch is shown after these steps). Be sure to test and debug your application locally before submitting it to the Analytics for Apache Spark service with the eclairjs-bluemix-spark-submit.sh script.
  2. Create an instance of the Analytics for Apache Spark service on IBM Bluemix, and record the credentials from the Service Credentials page.
  3. Copy the service credentials from your Spark instance to a file named vcap.json in the directory where you plan to run the eclairjs-bluemix-spark-submit.sh script. For example:
{
  "credentials": {
    "tenant_id": "s1a9-7f40d9344b951f-42a9cf195d79",
    "tenant_id_full": "b404510b-d17a-43d2-b1a9-7f40d9344b95_9c09a3cb-7178-43f9-9e1f-42a9cf195d79",
    "cluster_master_url": "https://spark.bluemix.net",
    "instance_id": "b404510b-d17a-43d2-b1a9-7f40d9344b95",
    "tenant_secret": "8b2d72ad-8ac5-4927-a90c-9ca9edfad124", 
     "plan":"ibm.SparkService.PayGoPersonal"
  }
}
  4. Run the $ECLAIRJS_HOME/bin/eclairjs-bluemix-spark-submit.sh script in your local shell, where test.js is your EclairJS-nashorn JavaScript Apache Spark application. For example:
bin/eclairjs-bluemix-spark-submit.sh \
  --vcap ./vcap.json --deploy-mode cluster \
  --files /Users/billreed/eclairjs_dev/eclairjs/server/examples/test.js \
  --master https://spark.bluemix.net file://test.js

The generated log file, which lists the steps taken by the IBM Bluemix spark-submit.sh script, is located in the folder where you run the script. eclairjs-bluemix-spark-submit.sh is a convenience script that uses the IBM Bluemix spark-submit.sh script to submit Spark jobs to the IBM Bluemix Spark service. See the IBM Bluemix documentation for more details about using spark-submit.
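
For reference, a minimal EclairJS-nashorn application of the kind submitted above as test.js might look like the following sketch. The require pattern and class names follow the conventions used in the eclairjs-nashorn examples; treat the exact names as assumptions and adapt them to the version of EclairJS you are running.

// Minimal EclairJS-nashorn application sketch (hypothetical test.js).
var SparkConf = require(EclairJS_Globals.NAMESPACE + '/SparkConf');
var SparkContext = require(EclairJS_Globals.NAMESPACE + '/SparkContext');

var conf = new SparkConf().setAppName("EclairJS Bluemix test");
var sc = new SparkContext(conf);

// Parallelize a small local collection and run a simple action,
// just to confirm that the job runs on the Spark service.
var rdd = sc.parallelize([1, 2, 3, 4, 5]);
print("count = " + rdd.count());

sc.stop();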

Using the IBM Bluemix Object Storage service in your Spark application

IBM® Object Storage for Bluemix® provides you with access to a fully provisioned Swift Object Storage account to manage your data. Swift provides a fully distributed, API-accessible storage platform. You can use it directly in your applications or for backups, making it ideal for cost effective, scale-out storage.

To get started with Object Storage:

  1. Provision your Object Storage instance from the Bluemix catalog.
  2. Configure your Object Storage instance and click Create, leaving the Unbound option selected for the App field.
  3. Create a storage container through the web UI, the CLI, or the REST API.
  4. Add your files to the storage container.
  5. Record the Service Credentials for your service instance; they will be needed in your Spark application.
  6. Add the following lines of code to allow the Apache Spark Hadoop layer to connect to your Object Storage:
sc.setHadoopConfiguration("fs.swift.service.softlayer.auth.url",
                          "https://identity.open.softlayer.com/v3/auth/tokens"
                         );
sc.setHadoopConfiguration("fs.swift.service.softlayer.auth.endpoint.prefix", "endpoints");
sc.setHadoopConfiguration("fs.swift.service.softlayer.tenant", "productId"); // productId
sc.setHadoopConfiguration("fs.swift.service.softlayer.username", "user_id"); // userId
sc.setHadoopConfiguration("fs.swift.service.softlayer.password", "secret"); // password
sc.setHadoopConfiguration("fs.swift.service.softlayer.apikey", "secret"); // password
sc.setHadoopConfiguration("fs.swift.service.softlayer.region", "dallas");
// Use the swift:// protocol when accessing your files;
// replace "container" with your storage container name and "object" with your object name.
var rdd = sc.textFile("swift://container.softlayer/object");
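
As a further illustration, the sketch below shows how the values recorded from the Object Storage Service Credentials page (step 5) might be plugged into the configuration and used to read a file. The credential field names (projectId, userId, password, region) and the container and object names are assumptions; substitute the values from your own service instance.

// Sketch: wiring Object Storage Service Credentials into the Hadoop
// configuration and reading an object. All values below are placeholders.
var credentials = {
    auth_url:  "https://identity.open.softlayer.com",
    projectId: "your-project-id",   // used as the Swift tenant
    userId:    "your-user-id",
    password:  "your-password",
    region:    "dallas"
};

var prefix = "fs.swift.service.softlayer.";
sc.setHadoopConfiguration(prefix + "auth.url", credentials.auth_url + "/v3/auth/tokens");
sc.setHadoopConfiguration(prefix + "auth.endpoint.prefix", "endpoints");
sc.setHadoopConfiguration(prefix + "tenant", credentials.projectId);
sc.setHadoopConfiguration(prefix + "username", credentials.userId);
sc.setHadoopConfiguration(prefix + "password", credentials.password);
sc.setHadoopConfiguration(prefix + "apikey", credentials.password);
sc.setHadoopConfiguration(prefix + "region", credentials.region);

// Read an object named "people.txt" from a container named "mydata"
// (both names are hypothetical) and run a simple action.
var lines = sc.textFile("swift://mydata.softlayer/people.txt");
print("line count = " + lines.count());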

A working example can be found in the eclairJS-examples git repository.