functions like count, take not working on PairRDD #320

Open
vananth22 opened this issue Sep 29, 2016 · 9 comments
@vananth22

Once the RDD is converted into a PairRDD, none of the aggregate functions work on it.
Steps to reproduce:
var file = "src/test/resources/dream.txt";
var rdd = sc.textFile(file).cache();
var rdd2 = rdd.flatMap(function(sentence) { return sentence.split(" "); });
var count = rdd2.count(); // throws an exception
Error trace:
`java.lang.AbstractMethodError: org.eclairjs.nashorn.JSFlatMapFunction.call(Ljava/lang/Object;)Ljava/util/Iterator;
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1682)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1115)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1115)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397)
at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:446)
at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:403)
at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:399)
at jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
at org.eclairjs.nashorn.SparkJS.eval(SparkJS.java:91)
at org.eclairjs.nashorn.SparkJS.repl(SparkJS.java:151)
at org.eclairjs.nashorn.SparkJS.main(SparkJS.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.AbstractMethodError: org.eclairjs.nashorn.JSFlatMapFunction.call(Ljava/lang/Object;)Ljava/util/Iterator;
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1682)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1115)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1115)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
at org.apache.spark.rdd.RDD.count(RDD.scala:1115)
at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:454)
at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
at jdk.nashorn.internal.scripts.Script$Recompilation$127$6199$RDD.count(jar:/target/eclairjs-nashorn-0.1.jar!/RDD.js:161)
at jdk.nashorn.internal.scripts.Script$139$^eval_.:program(:1)
at jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637)
at jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494)
at jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393)
... 17 more
Caused by: java.lang.AbstractMethodError: org.eclairjs.nashorn.JSFlatMapFunction.call(Ljava/lang/Object;)Ljava/util/Iterator;
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1682)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1115)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1115)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)`
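
For reference, the PairRDD conversion that the issue title refers to goes through the same flatMap path before it fails. Below is a minimal sketch of the full word-count pipeline, assuming the EclairJS mapToPair/reduceByKey API and the Tuple2 helper along the lines of examples/word_count.js; the exact names may differ:

```javascript
// Minimal sketch, assuming the EclairJS Nashorn API (sc, Tuple2, mapToPair,
// reduceByKey); exact helper names may differ from examples/word_count.js.
var file = "src/test/resources/dream.txt";
var rdd = sc.textFile(file).cache();

// Split each line into words; this flatMap is where the AbstractMethodError above is thrown.
var words = rdd.flatMap(function (sentence) {
    return sentence.split(" ");
});

// Convert to a PairRDD of (word, 1) pairs and sum the counts per word.
var pairs = words.mapToPair(function (word) {
    return new Tuple2(word, 1);
});
var counts = pairs.reduceByKey(function (a, b) {
    return a + b;
});

// Actions such as count() or take(n) trigger the job and hit the same failure.
print(counts.count());
print(counts.take(10));
```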

@doronrosenberg
Collaborator

What version are you using?

@vananth22
Author

@doronrosenberg I tried with the spark_1.6.0 branch.

@billreed63
Collaborator

Does examples/word_count.js work for you?

@vananth22
Author

@billreed63 Nope, I'm getting the same exception:

java.lang.AbstractMethodError: org.eclairjs.nashorn.JSFlatMapFunction.call(Ljava/lang/Object;)Ljava/util/Iterator;
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@billreed63
Collaborator

What version of Java are you running, and what OS?
I just cloned the latest master:

export SPARK_HOME=/usr/local/spark-1.6.0-bin-hadoop2.6
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
export PATH=$JAVA_HOME/bin:$PATH
mvn clean
mvn package
./bin/eclairjs.sh examples/word_count.js

When you do a mvn package, the test cases are run. If you don't see

--- maven-javadoc-plugin:2.9.1:jar (attach-javadocs) @ eclairjs-nashorn ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:55 min
[INFO] Finished at: 2016-10-04T11:49:01-04:00
[INFO] Final Memory: 67M/480M
[INFO] ------------------------------------------------------------------------

then you have not gotten a clean build and one of the test cases is failing.
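
If you want to separate a failing test case from the packaging step itself, the standard Maven options (not specific to this project) let you skip the tests while packaging and then run them on their own:

mvn package -DskipTests
mvn test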

@billreed63 billreed63 self-assigned this Oct 4, 2016
@billreed63 billreed63 added this to the 0.7 milestone Oct 4, 2016
@vananth22
Author

@billreed63 I'm running on Mac OS with
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
and on the spark_1.6.0 branch.
One more thing I noticed: toree-kernel-api_2.11 is used as a dependency in the latest branch, but I could not find that dependency published in the Maven repository.

@vananth22
Author

vananth22 commented Oct 6, 2016

I tried on Linux (x86_64 GNU/Linux) with Spark 1.6 and Java 1.8.0_101. Still getting the same exception.

@billreed63
Collaborator

billreed63 commented Oct 6, 2016

Switch to the master branch; the spark_1.6.0 branch is out of date. Master is for Spark 1.6.x.
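
Roughly, assuming the same SPARK_HOME/JAVA_HOME setup as above:

git checkout master
git pull
mvn clean package
./bin/eclairjs.sh examples/word_count.js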

@vananth22
Author

🙇 Let me try that.
