
Server module loading with require

Bill Reed edited this page Oct 3, 2016 · 1 revision

As of 04/13/16, the master branch of EclairJS requires that modules be loaded with "require". The same applies from Nashorn release 0.4 onward. If you want to use an EclairJS module or a custom module, you must require it at the top of your file. Additionally, if you want to use that module inside a lambda (a.k.a. anonymous) function, you must bind it as an argument to that lambda function.

This change was made to 1) reduce the memory footprint of what is loaded on the Spark worker nodes and 2) allow custom modules to be shipped to worker nodes in a clustered environment.
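The need for bound arguments can be illustrated with a plain-JavaScript simulation. This is only a sketch of the general idea, not EclairJS's actual implementation: it assumes that a lambda reaches a worker as source text, so variables from the defining scope do not travel with it. The helper `shipToWorker` and the variable `scale` are hypothetical names introduced for the illustration.

```javascript
// Plain-JavaScript simulation; none of this is EclairJS code.
// `shipToWorker` is a hypothetical helper that mimics sending a lambda to
// another process: only the function's source text survives the trip.
function shipToWorker(fn, bindArgs) {
    var source = fn.toString();
    // Rebuild the function from its source in the global scope, so it can no
    // longer see variables from the scope where it was originally defined.
    var rebuilt = new Function('return (' + source + ');')();
    return function (item) {
        // Bound arguments are appended after the element being processed.
        return rebuilt.apply(null, [item].concat(bindArgs || []));
    };
}

function makeLambdas() {
    var scale = 10; // local variable, standing in for a required module

    // Relies on the closure: `scale` is not defined where the lambda is rebuilt.
    var viaClosure = shipToWorker(function (x) { return x * scale; });

    // Receives `scale` as a bound argument, mirroring the map(fn, [args]) pattern.
    var viaBinding = shipToWorker(function (x, scale) { return x * scale; }, [scale]);

    return { viaClosure: viaClosure, viaBinding: viaBinding };
}

var lambdas = makeLambdas();
var boundResult = lambdas.viaBinding(4); // 40

var closureError = null;
try {
    lambdas.viaClosure(4);
} catch (e) {
    closureError = e.name; // "ReferenceError"
}
```

In this sketch the closure-based lambda fails once it is rebuilt away from its defining scope, while the version that takes `scale` as an extra parameter keeps working.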

The basic rules of thumb are:

  • If you use a class/module in your code, require it.
  • If you refer to that class/module inside a lambda function by the variable name you assigned it to, pass that variable into the lambda as a bound argument:
var LabeledPoint = require('eclairjs/mllib/regression/LabeledPoint');
var DenseVector = require('eclairjs/mllib/linalg/DenseVector');
var parsedData = data.map(function (s, LabeledPoint, DenseVector) {
    var parts = s.split(",");
    var features = parts[1].split(" ");
    return new LabeledPoint(parts[0], new DenseVector(features));
}, [LabeledPoint, DenseVector]);
  • Use the full path to the module in your require statement:
var DataFrame = require('eclairjs/sql/DataFrame');

All of the examples in our repository have been updated to reflect this, and a simple example is given below.

/*
 * Require modules needed within lambda functions.
 */
var LabeledPoint = require('eclairjs/mllib/regression/LabeledPoint');
var DenseVector = require('eclairjs/mllib/linalg/DenseVector');
var LinearRegressionWithSGD = require('eclairjs/mllib/regression/LinearRegressionWithSGD');

var sparkConf = new SparkConf().setAppName("Linear Regression Example");
var sc = new SparkContext(sparkConf);

var filename = ((typeof args !== "undefined") && (args.length > 1)) ? args[1] : "examples/data/lpsa.data";
var data = sc.textFile(filename).cache();

/*
 * Map data into a LabeledPoint - notice the use of required modules LabeledPoint and 
 * DenseVector as bound arguments to RDD.map() since they are directly used by name 
 * within the lambda function.
 */
var parsedData = data.map(function (s, LabeledPoint, DenseVector) {
    var parts = s.split(",");
    var features = parts[1].split(" ");
    return new LabeledPoint(parts[0], new DenseVector(features));
}, [LabeledPoint, DenseVector]);

var numIterations = 3;
var linearRegressionModel = LinearRegressionWithSGD.train(parsedData, numIterations);

var delta = 17;
/*
 * Since lp and linearRegressionModel are instances of classes, the modules that define
 * those classes do not have to be passed as bound arguments to the lambda. This is
 * because the objects themselves can be interrogated to find out which module they are
 * an instance of.
 */
var valuesAndPreds = parsedData.mapToPair(function (lp, linearRegressionModel, delta) {
    var label = lp.getLabel();
    var f = lp.getFeatures();
    var prediction = linearRegressionModel.predict(f) + delta;
    return new Tuple(prediction, label);
}, [linearRegressionModel, delta]); // end mapToPair

var result = valuesAndPreds.take(10);
print("valuesAndPreds: " + result.toString());
sc.stop();
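
The parsing step inside the first lambda can be tried on its own in plain JavaScript. The sketch below assumes each input line has the layout "label,feature feature ...", which is inferred from the two split calls in the example. The sample line and the `parseLine` helper are hypothetical, and the conversion to numbers is a choice made for this sketch rather than something the example above does.

```javascript
// Hypothetical input line in the "label,feature feature ..." layout
// that the split calls in the lambda imply.
var sampleLine = "-0.43,-1.64 -2.01 -1.86";

// Hypothetical helper mirroring the body of the lambda passed to data.map().
function parseLine(s) {
    var parts = s.split(",");          // [label, space-separated features]
    var label = parseFloat(parts[0]);  // numeric conversion is this sketch's choice
    var features = parts[1].split(" ").map(function (f) {
        return parseFloat(f);
    });
    return { label: label, features: features };
}

var parsed = parseLine(sampleLine);
// parsed.label is -0.43 and parsed.features is [-1.64, -2.01, -1.86]
```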