Exception when using RDD.takeOrdered with BlueMix Apache Spark Service #16

Open
billreed63 opened this issue Jan 9, 2017 · 1 comment
@billreed63 (Contributor)

This code example runs on a local Spark instance (even on a local Spark cluster):


var eclairjs = require('eclairjs');
var spark = new eclairjs();
var session = spark.sql.SparkSession.builder()
  .appName("test")
  .getOrCreate();
var sc = session.sparkContext();
var rdd = sc.parallelize([1,2,3]);
rdd.takeOrdered(2, function (x) {
  return 0;
});
 

but it throws an exception on the BlueMix Apache Spark service:

Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: ClassNotFound with classloader: org.apache.spark.util.MutableURLClassLoader@de7ec20
StackTrace:   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1461)
@billreed63 (Contributor, author)

This appears to be an issue with the BlueMix service itself. I can reproduce the error in a Scala Spark 2.0 notebook on the BlueMix service, with EclairJS not involved at all:

// Get the notebook's SparkSession and its SparkContext
val builder1 = org.apache.spark.sql.SparkSession.builder()
val sparkSession1 = builder1.getOrCreate()
val sparkContext1 = sparkSession1.sparkContext
sparkContext1.version

// Wrap it in a JavaSparkContext and build a JavaDoubleRDD
val javaSC = new org.apache.spark.api.java.JavaSparkContext(sparkContext1)
val rdd = javaSC.parallelizeDoubles(java.util.Arrays.asList(1.0, 2.0, 3.0, 4.0))
rdd.count

// Serializable comparator defined in the notebook; takeOrdered ships it to the executors
class DoubleComparator extends java.util.Comparator[java.lang.Double] with java.io.Serializable {
  def compare(o1: java.lang.Double, o2: java.lang.Double) = o1.compareTo(o2)
}

// Fails on BlueMix with the ClassNotFound error below
val rdd2 = rdd.takeOrdered(2, new DoubleComparator())

The exception displayed in the notebook is:

Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: ClassNotFound with classloader: org.apache.spark.util.MutableURLClassLoader@de7ec20
StackTrace:   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1461)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1449)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1448)

As a workaround, convert the RDD to a Dataset/DataFrame, sort it to order the results, and then call take.
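The workaround replaces `takeOrdered(n, comparator)` with an explicit sort followed by a take of the first n elements. Below is a minimal, Spark-free JavaScript sketch of that shape on a plain array. It assumes a two-argument comparator in the usual `(a, b) => number` convention (mirroring `java.util.Comparator#compare`, not taken from the EclairJS API), and `sortThenTake` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: a local model of the "sort, then take" workaround.
// Assumes cmp(a, b) returns a negative number, zero, or a positive number,
// like java.util.Comparator#compare.
function sortThenTake(values, n, cmp) {
  // Copy first, because Array.prototype.sort sorts in place.
  return values.slice().sort(cmp).slice(0, n);
}

// Ascending numeric order, the same ordering DoubleComparator uses in the Scala repro.
const ascending = (a, b) => a - b;

console.log(sortThenTake([3, 1, 4, 2], 2, ascending)); // [ 1, 2 ]
```

In Spark itself the sort would be a Dataset/DataFrame operation on a column (for example `sort` or `orderBy` followed by `take(n)`), so no comparator class defined in the notebook has to be shipped to the executors. That is a plausible reason the workaround avoids the ClassNotFound error, but it is an inference rather than something confirmed here.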
