prashan_pul #1

Open: wants to merge 3,444 commits into base: master

Commits
97cd27e
Add graph loader links to doc
ankurdave Jan 13, 2014
27311b1
Added unpersisting and modified testsuite to better test out metadata…
tdas Jan 13, 2014
8038da2
Merge pull request #2 from jegonzal/GraphXCCIssue
ankurdave Jan 13, 2014
30328c3
Updated JavaStreamingContext to make scaladoc compile.
rxin Jan 13, 2014
e2d25d2
Merge branch 'master' into graphx
rxin Jan 14, 2014
01c0d72
Merge pull request #410 from rxin/scaladoc1
rxin Jan 14, 2014
dc041cd
Merge branch 'scaladoc1' of github.com:rxin/incubator-spark into graphx
rxin Jan 14, 2014
c0bb38e
Improved file input stream further.
tdas Jan 14, 2014
1bd5cef
Remove aggregateNeighbors
ankurdave Jan 14, 2014
ae4b75d
Add EdgeDirection.Either and use it to fix CC bug
ankurdave Jan 14, 2014
cfe4a29
Improvements in example code for the programming guide as well as add…
jegonzal Jan 14, 2014
1233b3d
Merge remote-tracking branch 'apache/master' into filestream-fix
tdas Jan 14, 2014
02a8f54
Miscel doc update.
rxin Jan 14, 2014
a4e12af
Merge branch 'graphx' of github.com:ankurdave/incubator-spark into gr…
rxin Jan 14, 2014
87f335d
Made more things private.
rxin Jan 14, 2014
ae06d2c
Updated GraphGenerator.
rxin Jan 14, 2014
1dce9ce
Moved PartitionStrategy's into an object.
rxin Jan 14, 2014
79a5ba3
Yarn Client refactor
colorant Jan 9, 2014
161ab93
Yarn workerRunnable refactor
colorant Jan 9, 2014
622b7f7
Minor changes in graphx programming guide.
jegonzal Jan 14, 2014
552de5d
Finished second pass on pregel docs.
jegonzal Jan 14, 2014
4c22c55
Address comments to fix code formats
colorant Jan 10, 2014
8e5c732
Moved SVDPlusPlusConf into SVDPlusPlus object itself.
rxin Jan 14, 2014
9317286
More cleanup.
rxin Jan 14, 2014
0b18bfb
Updated doc for PageRank.
rxin Jan 14, 2014
0fbc0b0
Merge branch 'graphx' of github.com:ankurdave/incubator-spark into gr…
rxin Jan 14, 2014
d4cd5de
Fix for Kryo Serializer
pwendell Jan 14, 2014
ee8931d
Finished documenting vertexrdd.
jegonzal Jan 14, 2014
9e84e70
Add default value for HadoopRDD's `cloneRecords` constructor arg, to …
harveyfeng Jan 14, 2014
a2fee38
Merge pull request #411 from tdas/filestream-fix
pwendell Jan 14, 2014
33022d6
Adjusted visibility of various components.
rxin Jan 14, 2014
b07bc02
Merge pull request #412 from harveyfeng/master
pwendell Jan 14, 2014
cc93c2a
Disable MLlib tests for now while Jenkins is still on Python 2.6
mateiz Jan 14, 2014
8399341
Wording changes per Patrick
andrewor14 Jan 14, 2014
d4d9ece
Remove Graph.statistics and GraphImpl.printLineage
ankurdave Jan 14, 2014
84d6af8
Make Graph{,Impl,Ops} serializable to work around capture
ankurdave Jan 14, 2014
c6023be
Fix infinite loop in GraphGenerators.generateRandomEdges
ankurdave Jan 14, 2014
59e4384
Fix Pregel SSSP example in programming guide
ankurdave Jan 14, 2014
c28e5a0
Improve scaladoc links
ankurdave Jan 14, 2014
e14a14b
Remove K-Core and LDA sections from guide; they are unimplemented
ankurdave Jan 14, 2014
67795db
Write Graph Builders section in guide
ankurdave Jan 14, 2014
6f6f8c9
Wrap methods in the appropriate class/object declaration
ankurdave Jan 14, 2014
c6dbfd1
Edge object must be public for Edge case class
ankurdave Jan 14, 2014
76ebdae
Fix bug in GraphLoader.edgeListFile that caused srcId > dstId
ankurdave Jan 14, 2014
08b9fec
Merge pull request #409 from tdas/unpersist
pwendell Jan 14, 2014
2cd9358
Finish 6f6f8c928ce493357d4d32e46971c5e401682ea8
ankurdave Jan 14, 2014
af645be
Fix all code examples in guide
ankurdave Jan 14, 2014
0ca0d4d
Merge pull request #401 from andrewor14/master
pwendell Jan 14, 2014
0d94d74
Code clean up for mllib
soulmachine Jan 14, 2014
12386b3
Since getLong() and getInt() have side effect, get back parentheses, …
soulmachine Jan 14, 2014
68641bc
Merge pull request #413 from rxin/scaladoc
pwendell Jan 14, 2014
4bafc4f
adding documentation about EdgeRDD
jegonzal Jan 14, 2014
945fe7a
Merge pull request #408 from pwendell/external-serializers
pwendell Jan 14, 2014
80e73ed
Adding minimal additional functionality to EdgeRDD
jegonzal Jan 14, 2014
4a805af
Merge pull request #367 from ankurdave/graphx
pwendell Jan 14, 2014
c2852cf
Indent two spaces
soulmachine Jan 14, 2014
fdaabdc
Merge pull request #380 from mateiz/py-bayes
pwendell Jan 14, 2014
4e497db
Removed StreamingContext.registerInputStream and registerOutputStream…
tdas Jan 14, 2014
0984647
Enable compression by default for spills
pwendell Jan 14, 2014
055be5c
Merge pull request #415 from pwendell/shuffle-compress
pwendell Jan 14, 2014
a3da468
Merge remote-tracking branch 'upstream/master' into code-style
soulmachine Jan 14, 2014
845e568
Merge remote-tracking branch 'upstream/master' into sparsesvd
rezazadeh Jan 14, 2014
f8e239e
Merge remote-tracking branch 'apache/master' into filestream-fix
tdas Jan 14, 2014
f8bd828
Fixed loose ends in docs.
tdas Jan 14, 2014
980250b
Merge pull request #416 from tdas/filestream-fix
pwendell Jan 14, 2014
1442cd5
Modifications as suggested in PR feedback-
Jan 14, 2014
2303479
Add missing header files
pwendell Jan 14, 2014
fa75e5e
Merge pull request #420 from pwendell/header-files
pwendell Jan 14, 2014
57fcfc7
Added parentheses for that getDouble() also has side effect
soulmachine Jan 14, 2014
486f37c
Improving the graphx-programming-guide.
jegonzal Jan 14, 2014
3fcc68b
Merge pull request #423 from jegonzal/GraphXProgrammingGuide
rxin Jan 14, 2014
0bba773
Additional edits for clarity in the graphx programming guide.
jegonzal Jan 14, 2014
71b3007
Broadcast variable visibility change & doc update.
rxin Jan 14, 2014
6a12b9e
Updated API doc for Accumulable and Accumulator.
rxin Jan 14, 2014
f8c12e9
Added package doc for the Java API.
rxin Jan 14, 2014
55db774
Added license header for package.scala in the Java API package.
rxin Jan 14, 2014
1b5623f
Maintain Serializable API compatibility by reverting back to java.io.…
rxin Jan 14, 2014
f12e506
Fixed a typo in JavaSparkContext's API doc.
rxin Jan 14, 2014
6f965a4
Don't clone records for text files
pwendell Jan 14, 2014
938e4a0
Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)
mateiz Jan 14, 2014
b683608
Deprecate rather than remove old combineValuesByKey function
pwendell Jan 14, 2014
5b3a3e2
Complain if Python and NumPy versions are too old for MLlib
mateiz Jan 14, 2014
2ce23a5
Merge pull request #425 from rxin/scaladoc
rxin Jan 14, 2014
8ea2cd5
Adding fix covering combineCombinersByKey as well
pwendell Jan 14, 2014
b1b22b7
Style fix
pwendell Jan 14, 2014
8ea056d
Add GraphX dependency to examples/pom.xml
ankurdave Jan 14, 2014
d601a76
Merge pull request #427 from pwendell/deprecate-aggregator
rxin Jan 14, 2014
193a075
Merge pull request #429 from ankurdave/graphx-examples-pom.xml
rxin Jan 14, 2014
74b46ac
Merge pull request #428 from pwendell/writeable-objects
rxin Jan 14, 2014
1210ec2
Describe GraphX caching and uncaching in guide
ankurdave Jan 15, 2014
ad294db
Merge pull request #431 from ankurdave/graphx-caching-doc
rxin Jan 15, 2014
3a386e2
Merge pull request #424 from jegonzal/GraphXProgrammingGuide
rxin Jan 15, 2014
148757e
Add deb profile to assembly/pom.xml
markhamstra Jan 15, 2014
f4d9019
VertexID -> VertexId
ankurdave Jan 15, 2014
147a943
Removed repl-bin and updated maven build doc.
markhamstra Jan 15, 2014
dfb1524
Fixed SVDPlusPlusSuite in Maven build.
rxin Jan 15, 2014
1f4718c
Changed SparkConf to not be serializable. And also fixed unit-test lo…
tdas Jan 15, 2014
0e15bd7
Merge remote-tracking branch 'apache/master' into filestream-fix
tdas Jan 15, 2014
087487e
Merge pull request #434 from rxin/graphxmaven
pwendell Jan 15, 2014
139c24e
Merge pull request #435 from tdas/filestream-fix
pwendell Jan 15, 2014
0aea33d
Expose method and class - so that we can use it from user code (parti…
mridulm Jan 15, 2014
3d9e66d
Merge pull request #436 from ankurdave/VertexId-case
rxin Jan 15, 2014
263933d
remove "-XX:+UseCompressedStrings" option
CrazyJvm Jan 15, 2014
cef2af9
Merge pull request #366 from colorant/yarn-dev
tgravescs Jan 15, 2014
494d3c0
Merge pull request #433 from markhamstra/debFix
pwendell Jan 15, 2014
9259d70
GraphX shouldn't list Spark as provided
pwendell Jan 15, 2014
00a3f7e
Workers should use working directory as spark home if it's not specified
pwendell Jan 15, 2014
5fecd25
Merge pull request #441 from pwendell/graphx-build
pwendell Jan 15, 2014
9e63753
Made some classes private[stremaing] and deprecated a method in JavaS…
tdas Jan 15, 2014
2a05403
Merge pull request #443 from tdas/filestream-fix
pwendell Jan 15, 2014
59f475c
Merge pull request #442 from pwendell/standalone
pwendell Jan 15, 2014
2ffdaef
Clarify that Python 2.7 is only needed for MLlib
mateiz Jan 15, 2014
4f0c361
Merge pull request #444 from mateiz/py-version
pwendell Jan 15, 2014
a268d63
Fail rather than hanging if a task crashes the JVM.
kayousterhout Jan 16, 2014
0675ca5
Merge pull request #439 from CrazyJvm/master
rxin Jan 16, 2014
7a0c5b5
fix "set MASTER automatically fails" bug.
CrazyJvm Jan 16, 2014
8400536
fix some format problem.
CrazyJvm Jan 16, 2014
84595ea
Merge pull request #414 from soulmachine/code-style
rxin Jan 16, 2014
718a13c
Updated unit test comment
kayousterhout Jan 16, 2014
c06a307
Merge pull request #445 from kayousterhout/exec_lost
rxin Jan 16, 2014
4e510b0
Fixed Window spark shell launch script error.
Qiuzhuang Jan 16, 2014
1a0da89
Address review comments
mridulm Jan 16, 2014
edd82c5
Use method, not variable
mridulm Jan 16, 2014
11e6534
Updated java API docs for streaming, along with very minor changes in…
tdas Jan 16, 2014
fcb4fc6
adding clone records field to equivaled java apis
ScrapCodes Jan 14, 2014
d4fd89e
Merge pull request #438 from ScrapCodes/clone-records-java-api
pwendell Jan 17, 2014
d749d47
Merge pull request #451 from Qiuzhuang/master
pwendell Jan 17, 2014
b690e11
Address review comment
mridulm Jan 17, 2014
d28bf41
changes from PR
rezazadeh Jan 17, 2014
cb13b15
use 0-indexing
rezazadeh Jan 17, 2014
eb2d8c4
replace this.type with SVD
rezazadeh Jan 17, 2014
dbec69b
add rename computeSVD
rezazadeh Jan 17, 2014
c9b4845
prettify
rezazadeh Jan 17, 2014
5c639d7
0index docs
rezazadeh Jan 17, 2014
4e96757
make example 0-indexed
rezazadeh Jan 17, 2014
caf97a2
Merge remote-tracking branch 'upstream/master' into sparsesvd
rezazadeh Jan 17, 2014
fa32998
rename to MatrixSVD
rezazadeh Jan 17, 2014
85b95d0
rename to MatrixSVD
rezazadeh Jan 17, 2014
e91ad3f
Correct L2 regularized weight update with canonical form
srowen Jan 18, 2014
5316bca
Use renamed shuffle spill config in CoGroupedRDD.scala
pwendell Jan 18, 2014
aa981e4
Merge pull request #461 from pwendell/master
pwendell Jan 18, 2014
fd833e7
Allow files added through SparkContext.addFile() to be overwritten
liyinan926 Jan 18, 2014
bf56995
Merge pull request #462 from mateiz/conf-file-fix
pwendell Jan 19, 2014
4c16f79
Merge pull request #426 from mateiz/py-ml-tests
pwendell Jan 19, 2014
73dfd42
Merge pull request #437 from mridulm/master
pwendell Jan 19, 2014
fe8a354
Merge pull request #459 from srowen/UpdaterL2Regularization
pwendell Jan 19, 2014
584323c
Addressed comments from Reynold
liyinan926 Jan 19, 2014
720836a
LocalSparkContext for MLlib
ajtulloch Jan 19, 2014
ceb79a3
Only log error on missing jar to allow spark examples to jar.
tgravescs Jan 19, 2014
dd56b21
update comment
tgravescs Jan 19, 2014
256a355
Merge pull request #458 from tdas/docs-update
pwendell Jan 19, 2014
792d908
Merge pull request #470 from tgravescs/fix_spark_examples_yarn
pwendell Jan 19, 2014
f9a95d6
executor creation failed should not make the worker restart
CodingCat Jan 16, 2014
29f4b6a
fix for SPARK-1027
CodingCat Jan 16, 2014
3e85b87
SPARK-1033. Ask for cores in Yarn container requests
sryza Jan 19, 2014
cdb003e
Removing docs on akka options
pwendell Jan 21, 2014
54867e9
Minor fixes
pwendell Jan 21, 2014
1b29914
Bug fix for reporting of spill output
pwendell Jan 21, 2014
c324ac1
Force use of LZF when spilling data
pwendell Jan 21, 2014
f84400e
Fixing speculation bug
pwendell Jan 21, 2014
de526ad
Remove shuffle files if they are still present on a machine.
pwendell Jan 21, 2014
d46df96
Avoid matching attempt files in the checkpoint
pwendell Jan 21, 2014
2e95174
Added StreamingContext.awaitTermination to streaming examples.
tdas Jan 21, 2014
e437069
Restricting /lib to top level directory in .gitignore
pwendell Jan 21, 2014
e0b741d
Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.
tdas Jan 21, 2014
7373ffb
Merge pull request #483 from pwendell/gitignore
rxin Jan 21, 2014
0367981
Merge pull request #482 from tdas/streaming-example-fix
pwendell Jan 21, 2014
6b4eed7
Merge pull request #449 from CrazyJvm/master
rxin Jan 21, 2014
a917a87
Adding small code comment
pwendell Jan 21, 2014
65869f8
Removed SPARK_MEM from run-examples.
tdas Jan 21, 2014
c67d3d8
Merge pull request #484 from tdas/run-example-fix
pwendell Jan 21, 2014
a9bcc98
Style clean-up
pwendell Jan 21, 2014
77b986f
Merge pull request #480 from pwendell/0.9-fixes
pwendell Jan 21, 2014
adf4261
Incorporate Tom's comments - update doc and code to reflect that core…
sryza Jan 21, 2014
3a067b4
Fixed import order
ajtulloch Jan 21, 2014
f854498
Merge pull request #469 from ajtulloch/use-local-spark-context-in-tes…
rxin Jan 21, 2014
069bb94
Clarify spark.default.parallelism
ash211 Jan 21, 2014
749f842
Merge pull request #489 from ash211/patch-6
rxin Jan 21, 2014
90ea9d5
Replace the code to check for Option != None with Option.isDefined ca…
hsaputra Jan 22, 2014
36f9a64
fixed job name and usage information for the JavaSparkPi example
kmader Jan 22, 2014
19da82c
Fixed bug where task set managers are added to queue twice
kayousterhout Jan 22, 2014
d009b17
Merge pull request #315 from rezazadeh/sparsesvd
mateiz Jan 22, 2014
5bcfd79
Merge pull request #493 from kayousterhout/double_add
mateiz Jan 22, 2014
576c4a4
Merge pull request #478 from sryza/sandy-spark-1033
pwendell Jan 22, 2014
fd0c5b8
Depend on Commons Math explicitly instead of accidentally getting it …
srowen Jan 22, 2014
a1238bb
Merge pull request #492 from skicavs/master
pwendell Jan 22, 2014
4476398
Also add graphx commons-math3 dependeny in sbt build
srowen Jan 22, 2014
3184fac
Merge pull request #495 from srowen/GraphXCommonsMathDependency
pwendell Jan 22, 2014
2b3c461
refactor sparkHome to val
CodingCat Jan 23, 2014
6285513
Fix bug in worker clean-up in UI
pwendell Jan 23, 2014
034dce2
Merge pull request #447 from CodingCat/SPARK-1027
pwendell Jan 23, 2014
a1cd185
Merge pull request #496 from pwendell/master
pwendell Jan 23, 2014
cc0fd33
Replace commons-math with jblas
jdk8 Jan 23, 2014
a5a513e
Add jblas dependency
jdk8 Jan 23, 2014
19a01c1
Add jblas dependency
jdk8 Jan 23, 2014
60e7457
fixed ClassTag in mapPartitions
Jan 23, 2014
a2b47da
Merge pull request #499 from jianpingjwang/dev1
rxin Jan 23, 2014
fad6aac
Merge pull request #406 from eklavya/master
JoshRosen Jan 23, 2014
0035dbb
Fix SPARK-1034: Py4JException on PySpark Cartesian Result
JoshRosen Jan 23, 2014
6156990
Fix SPARK-978: ClassCastException in PySpark cartesian.
JoshRosen Jan 23, 2014
7101017
Remove Hadoop object cloning and warn users making Hadoop RDD's.
pwendell Jan 23, 2014
0213b40
Fix bug on read-side of external sort when using Snappy.
pwendell Jan 24, 2014
c58d4ea
Response to Matei's review
pwendell Jan 24, 2014
f830684
Fix for SPARK-1025: PySpark hang on missing files.
JoshRosen Jan 24, 2014
268ecbd
Minor changes after auditing diff from earlier version
pwendell Jan 24, 2014
cad3002
Merge pull request #501 from JoshRosen/cartesian-rdd-fixes
pwendell Jan 24, 2014
c319617
Merge pull request #502 from pwendell/clone-1
pwendell Jan 24, 2014
ff44732
Minor fix
pwendell Jan 24, 2014
3d6e754
Merge pull request #503 from pwendell/master
pwendell Jan 24, 2014
4cebb79
Deprecate mapPartitionsWithSplit in PySpark.
JoshRosen Jan 24, 2014
05be704
Merge pull request #505 from JoshRosen/SPARK-1026
pwendell Jan 24, 2014
531d9d7
Increase JUnit test verbosity under SBT.
JoshRosen Jan 26, 2014
740e865
Fix ClassCastException in JavaPairRDD.collectAsMap() (SPARK-1040)
JoshRosen Jan 26, 2014
c66a2ef
Merge pull request #511 from JoshRosen/SPARK-1040
rxin Jan 26, 2014
c40619d
Merge pull request #504 from JoshRosen/SPARK-1025
rxin Jan 26, 2014
6a5af7b
modified SparkPluginBuild.scala to use https protocol for accessing g…
sarutak Jan 27, 2014
f67ce3e
Merge pull request #460 from srowen/RandomInitialALSVectors
srowen Jan 27, 2014
f16c21e
Merge pull request #490 from hsaputra/modify_checkoption_with_isdefined
rxin Jan 27, 2014
3d5c03e
Merge pull request #516 from sarutak/master
rxin Jan 28, 2014
84670f2
Merge pull request #466 from liyinan926/file-overwrite-new
rxin Jan 28, 2014
1381fc7
Switch from MUTF8 to UTF8 in PySpark serializers.
JoshRosen Jan 29, 2014
f8c742c
Merge pull request #523 from JoshRosen/SPARK-1043
JoshRosen Jan 29, 2014
7930209
Merge pull request #497 from tdas/docs-update
tdas Jan 29, 2014
0ff38c2
Merge pull request #494 from tyro89/worker_registration_issue
Jan 29, 2014
ac712e4
Merge pull request #524 from rxin/doc
rxin Jan 30, 2014
a8cf3ec
Merge pull request #527 from ankurdave/graphx-assembly-pom
ankurdave Feb 1, 2014
0386f42
Merge pull request #529 from hsaputra/cleanup_right_arrowop_scala
hsaputra Feb 3, 2014
1625d8c
Merge pull request #530 from aarondav/cleanup. Closes #530.
aarondav Feb 3, 2014
23af00f
Merge pull request #528 from mengxr/sample. Closes #528.
mengxr Feb 3, 2014
0c05cd3
Merge pull request #535 from sslavic/patch-2. Closes #535.
sslavic Feb 4, 2014
9209287
Merge pull request #534 from sslavic/patch-1. Closes #534.
sslavic Feb 4, 2014
f7fd80d
Merge pull request #540 from sslavic/patch-3. Closes #540.
sslavic Feb 5, 2014
cc14ba9
Merge pull request #544 from kayousterhout/fix_test_warnings. Closes …
kayousterhout Feb 5, 2014
18c4ee7
Merge pull request #549 from CodingCat/deadcode_master. Closes #549.
CodingCat Feb 6, 2014
3802096
Merge pull request #526 from tgravescs/yarn_client_stop_am_fix. Close…
tgravescs Feb 6, 2014
79c9552
Merge pull request #545 from kayousterhout/fix_progress. Closes #545.
kayousterhout Feb 6, 2014
084839b
Merge pull request #498 from ScrapCodes/python-api. Closes #498.
ScrapCodes Feb 6, 2014
446403b
Merge pull request #554 from sryza/sandy-spark-1056. Closes #554.
sryza Feb 6, 2014
18ad59e
Merge pull request #321 from kayousterhout/ui_kill_fix. Closes #321.
kayousterhout Feb 7, 2014
0b448df
Merge pull request #450 from kayousterhout/fetch_failures. Closes #450.
kayousterhout Feb 7, 2014
1896c6e
Merge pull request #533 from andrewor14/master. Closes #533.
andrewor14 Feb 7, 2014
3a9d82c
Merge pull request #506 from ash211/intersection. Closes #506.
ash211 Feb 7, 2014
fabf174
Merge pull request #552 from martinjaggi/master. Closes #552.
martinjaggi Feb 8, 2014
7805080
Merge pull request #454 from jey/atomic-sbt-download. Closes #454.
jey Feb 8, 2014
f0ce736
Merge pull request #561 from Qiuzhuang/master. Closes #561.
Qiuzhuang Feb 8, 2014
c2341c9
Merge pull request #542 from markhamstra/versionBump. Closes #542.
markhamstra Feb 9, 2014
f892da8
Merge pull request #565 from pwendell/dev-scripts. Closes #565.
pwendell Feb 9, 2014
b6d40b7
Merge pull request #560 from pwendell/logging. Closes #560.
pwendell Feb 9, 2014
2ef37c9
Merge pull request #562 from jyotiska/master. Closes #562.
jyotiska Feb 9, 2014
b6dba10
Merge pull request #556 from CodingCat/JettyUtil. Closes #556.
CodingCat Feb 9, 2014
8 changes: 8 additions & 0 deletions .gitignore
@@ -1,7 +1,10 @@
*~
*.swp
*.ipr
*.iml
*.iws
.idea/
sbt/*.jar
.settings
.cache
/build/
@@ -36,4 +39,9 @@ streaming-tests.log
dependency-reduced-pom.xml
.ensime
.ensime_lucene
checkpoint
derby.log
dist/
spark-*-bin.tar.gz
unit-tests.log
/lib/
373 changes: 372 additions & 1 deletion LICENSE

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions NOTICE
@@ -0,0 +1,5 @@
Apache Spark
Copyright 2013 The Apache Software Foundation.

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
100 changes: 73 additions & 27 deletions README.md
@@ -1,65 +1,110 @@
# Spark
# Apache Spark

Lightning-Fast Cluster Computing - <http://www.spark-project.org/>
Lightning-Fast Cluster Computing - <http://spark.incubator.apache.org/>


## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the project webpage at <http://spark-project.org/documentation.html>.
guide, on the project webpage at <http://spark.incubator.apache.org/documentation.html>.
This README file only contains basic setup instructions.


## Building

Spark requires Scala 2.9.2 (Scala 2.10 is not yet supported). The project is
built using Simple Build Tool (SBT), which is packaged with it. To build
Spark and its example programs, run:
Spark requires Scala 2.10. The project is built using Simple Build Tool (SBT),
which can be obtained [here](http://www.scala-sbt.org). If SBT is installed we
will use the system version of sbt otherwise we will attempt to download it
automatically. To build Spark and its example programs, run:

sbt/sbt package
./sbt/sbt assembly

Spark also supports building using Maven. If you would like to build using Maven,
see the [instructions for building Spark with Maven](http://spark-project.org/docs/latest/building-with-maven.html)
in the spark documentation..
Once you've built Spark, the easiest way to start using it is the shell:

To run Spark, you will need to have Scala's bin directory in your `PATH`, or
you will need to set the `SCALA_HOME` environment variable to point to where
you've installed Scala. Scala must be accessible through one of these
methods on your cluster's worker nodes as well as its master.
./bin/spark-shell

To run one of the examples, use `./run <class> <params>`. For example:
Or, for the Python API, the Python shell (`./bin/pyspark`).

./run spark.examples.SparkLR local[2]
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> <params>`. For example:

./bin/run-example org.apache.spark.examples.SparkLR local[2]

will run the Logistic Regression example locally on 2 CPUs.

Each of the example programs prints usage help if no params are given.

All of the Spark samples take a `<host>` parameter that is the cluster URL
All of the Spark samples take a `<master>` parameter that is the cluster URL
to connect to. This can be a mesos:// or spark:// URL, or "local" to run
locally with one thread, or "local[N]" to run locally with N threads.

## Running tests

Testing first requires [Building](#building) Spark. Once Spark is built, tests
can be run using:

`./sbt/sbt test`

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the HDFS API has changed in different versions of
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
You can change the version by setting the `HADOOP_VERSION` variable at the top
of `project/SparkBuild.scala`, then rebuilding Spark.
You can change the version by setting the `SPARK_HADOOP_VERSION` environment
when building Spark.

For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
versions without YARN, use:

# Apache Hadoop 1.2.1
$ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly

# Cloudera CDH 4.2.0 with MapReduce v1
$ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly

For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
with YARN, also set `SPARK_YARN=true`:

# Apache Hadoop 2.0.5-alpha
$ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly

# Cloudera CDH 4.2.0 with MapReduce v2
$ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly

# Apache Hadoop 2.2.X and newer
$ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

When developing a Spark application, specify the Hadoop version by adding the
"hadoop-client" artifact to your project's dependencies. For example, if you're
using Hadoop 1.2.1 and build your application using SBT, add this entry to
`libraryDependencies`:

"org.apache.hadoop" % "hadoop-client" % "1.2.1"

If your project is built with Maven, add this to your POM file's `<dependencies>` section:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>1.2.1</version>
</dependency>


## Configuration

Please refer to the "Configuration" guide in the online documentation for a
full overview on how to configure Spark. At the minimum, you will need to
create a `conf/spark-env.sh` script (copy `conf/spark-env.sh.template`) and
set the following two variables:
Please refer to the [Configuration guide](http://spark.incubator.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.


- `SCALA_HOME`: Location where Scala is installed.
## Apache Incubator Notice

- `MESOS_NATIVE_LIBRARY`: Your Mesos library (only needed if you want to run
on Mesos). For example, this might be `/usr/local/lib/libmesos.so` on Linux.
Apache Spark is an effort undergoing incubation at The Apache Software
Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of
all newly accepted projects until a further review indicates that the
infrastructure, communications, and decision making process have stabilized in
a manner consistent with other successful ASF projects. While incubation status
is not necessarily a reflection of the completeness or stability of the code,
it does indicate that the project has yet to be fully endorsed by the ASF.


## Contributing to Spark
@@ -71,3 +116,4 @@ project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.
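The README's note on the `<master>` parameter lists four accepted forms: `local` (one thread), `local[N]` (N threads), a `spark://` URL, and a `mesos://` URL. As an illustration only, the hypothetical Bash function `describe_master` below maps a master string to a description using just those forms. It is not Spark's own parser, which may accept additional forms, and the host:port values used with it are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): classify a <master> argument
# using only the forms listed in the README.
describe_master() {
  local master="$1"
  # Matches "local[N]" where N is one or more digits; the regex is kept in a
  # variable so Bash does not misinterpret the brackets.
  local local_n='^local\[([0-9]+)]$'

  if [[ "$master" == "local" ]]; then
    echo "local mode, 1 thread"
  elif [[ "$master" =~ $local_n ]]; then
    echo "local mode, ${BASH_REMATCH[1]} threads"
  elif [[ "$master" == spark://* ]]; then
    echo "standalone cluster at ${master#spark://}"
  elif [[ "$master" == mesos://* ]]; then
    echo "Mesos cluster at ${master#mesos://}"
  else
    echo "unrecognized master: $master" >&2
    return 1
  fi
}
```

For example, `describe_master 'local[2]'` prints `local mode, 2 threads`. Quoting `local[2]` keeps the shell from treating the brackets as a filename glob, which is also worth doing when passing it to `./bin/run-example`.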

12 changes: 12 additions & 0 deletions assembly/README
@@ -0,0 +1,12 @@
This is an assembly module for Spark project.

It creates a single tar.gz file that includes all needed dependency of the project
except for org.apache.hadoop.* jars that are supposed to be available from the
deployed Hadoop cluster.

This module is off by default. To activate it specify the profile in the command line
-Pbigtop-dist

If you need to build an assembly for a different version of Hadoop the
hadoop-version system property needs to be set as in this example:
-Dhadoop.version=2.0.6-alpha
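The Hadoop-version instructions in the README.md diff reduce to one rule: Hadoop versions with YARN (0.23.x, 2.x, Cloudera CDH MRv2) need `SPARK_YARN=true` in addition to `SPARK_HADOOP_VERSION`, while 1.x and CDH MRv1 builds do not. A minimal sketch of that rule follows. It assumes the version string alone distinguishes the two cases (CDH MRv1 strings contain `mr1`, as in `2.0.0-mr1-cdh4.2.0`). The function `spark_build_cmd` is hypothetical and prints the matching `sbt/sbt assembly` command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): print the sbt build command the
# README suggests for a given Hadoop version string.
spark_build_cmd() {
  local version="$1"
  local yarn

  case "$version" in
    *mr1*)      yarn=false ;;  # CDH MRv1, e.g. 2.0.0-mr1-cdh4.2.0 (checked first)
    1.*)        yarn=false ;;  # Apache Hadoop 1.x
    0.23.*|2.*) yarn=true ;;   # 0.23.x and 2.x, including CDH MRv2
    *)
      # Versions the README does not mention are rejected, not guessed.
      echo "unknown Hadoop version: $version" >&2
      return 1
      ;;
  esac

  if [ "$yarn" = true ]; then
    echo "SPARK_HADOOP_VERSION=$version SPARK_YARN=true sbt/sbt assembly"
  else
    echo "SPARK_HADOOP_VERSION=$version sbt/sbt assembly"
  fi
}
```

Printing the command instead of executing it keeps the choice easy to inspect. The `*mr1*` pattern must come before `2.*`, otherwise CDH MRv1 version strings would be classified as YARN builds.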