This code example runs on local Spark (even a local Spark cluster):
var eclairjs = require('eclairjs');
var spark = new eclairjs();
var session = spark.sql.SparkSession.builder()
    .appName("test")
    .getOrCreate();
var sc = session.sparkContext();
var rdd = sc.parallelize([1, 2, 3]);
rdd.takeOrdered(2, function (x) {
    return 0;
});
but it throws an exception on the BlueMix Spark service:
Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: ClassNotFound with classloader: org.apache.spark.util.MutableURLClassLoader@de7ec20
StackTrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1461)
This seems to be an issue with the BlueMix service itself. I am able to reproduce the error with a Scala Spark 2.0 notebook on the BlueMix service, with EclairJS not in the mix:
var builder1 = org.apache.spark.sql.SparkSession.builder();
var sparkSession1 = builder1.getOrCreate();
var sparkContext1 = sparkSession1.sparkContext;
sparkContext1.version;
var javaSC = new org.apache.spark.api.java.JavaSparkContext(sparkContext1);
var rdd = javaSC.parallelizeDoubles(java.util.Arrays.asList(1.0, 2.0, 3.0, 4.0));
rdd.count;
class DoubleComparator extends java.util.Comparator[java.lang.Double] with java.io.Serializable {
    def compare(o1: java.lang.Double, o2: java.lang.Double) = o1.compareTo(o2)
}
var rdd2 = rdd.takeOrdered(2, new DoubleComparator());
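For what it's worth, the comparator itself is fine: outside Spark, the same class sorts a plain Java array as expected, which points at a classloader/serialization problem on the service rather than a problem in the comparator logic. A minimal sketch (no cluster involved):

```scala
// Same comparator as in the notebook repro above.
class DoubleComparator extends java.util.Comparator[java.lang.Double] with java.io.Serializable {
  def compare(o1: java.lang.Double, o2: java.lang.Double) = o1.compareTo(o2)
}

// Sort a plain Java array with it — no Spark, no remote classloader.
val arr: Array[java.lang.Double] = Array(3.0, 1.0, 2.0)
java.util.Arrays.sort(arr, new DoubleComparator())
// arr now holds 1.0, 2.0, 3.0 in ascending order
```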
The exception displayed in the notebook is:
Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: ClassNotFound with classloader: org.apache.spark.util.MutableURLClassLoader@de7ec20
StackTrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1461)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1449)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1448)
As a workaround, convert the RDD to a Dataset/DataFrame, use a sort to order the results, and then a take.
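A sketch of that workaround in the Scala notebook, assuming the `sparkSession1` from the repro above (`"value"` is the default column name Spark gives a `Dataset[Double]`):

```scala
// Requires the session's implicits for the Double encoder.
import sparkSession1.implicits._

// Build a Dataset of the same values instead of using the JavaRDD.
val ds = sparkSession1.createDataset(Seq(1.0, 2.0, 3.0, 4.0))

// sort + take(n) stands in for takeOrdered(n, comparator):
// it returns the n smallest elements in ascending order,
// without shipping a user-defined Comparator class to the executors.
val smallest2 = ds.sort("value").take(2)
```

Because `sort` uses Spark's built-in ordering for the column, no custom class has to be serialized and loaded on the workers, which sidesteps the `ClassNotFound` failure.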