spark-kit: A toolkit that simplifies experiments on Spark

Typically, Spark runs on YARN, which is inconvenient when we need finer control over executor placement (for example, running on a single machine with a specific number of executors and an exact configuration). Standalone mode suits this use case better.
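
As an illustration of the kind of control standalone mode gives, the sketch below builds the flags for a Spark shell that requests a fixed number of equally sized executors. It is a minimal sketch and not necessarily how manage-standalone.sh does it: the function name `standalone_shell_args` is hypothetical, and the master is assumed to listen on the default standalone port 7077. The flags `--total-executor-cores`, `--executor-cores`, and `--executor-memory` are standard spark-submit options. In standalone mode an application receives at most (total executor cores / cores per executor) executors, provided the workers have enough free cores and memory.

```shell
# Hypothetical helper (not part of spark-kit): print spark-shell flags that
# request EXECUTORS executors with CORES cores and MEMORY memory each.
# Assumes the standalone master listens on the default port 7077.
standalone_shell_args() {
  master_host=$1   # host name of the standalone master
  executors=$2     # desired number of executors
  cores=$3         # cores per executor
  memory=$4        # memory per executor, e.g. 8g
  # Standalone mode derives the executor count from the total core budget,
  # so the total is executors * cores.
  echo "--master spark://${master_host}:7077 --total-executor-cores $((executors * cores)) --executor-cores ${cores} --executor-memory ${memory}"
}
```

For example, `spark-shell $(standalone_shell_args node1 4 2 8g)` would ask for four executors with two cores and 8 GB each, where `node1` is a placeholder host name.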

Example

  1. To use spark-kit, clone it and source the management script:
git clone https://github.com/stevenybw/spark-kit
cd spark-kit
source manage-standalone.sh
  2. Download an official Spark release:
wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
  3. Check the environment and follow the directions:
check_environment
  4. Adjust the parameters in manage-standalone.sh.
  5. Establish a Spark standalone cluster on all the nodes in ${SLAVES_HOSTLIST}:
reset_environment $DIST
  6. Or establish a Spark standalone cluster on a single node (the first node in ${SLAVES_HOSTLIST}):
reset_environment $LOCAL
  7. Or establish a Spark standalone cluster on a single node (the current node running the script):
reset_environment_locally $LOCAL
  8. Check the web UI of the Spark standalone master:
show_master_webui
  9. Show the command that launches a Spark shell (its argument must match the one used to set up the environment; the distributed setup is assumed here):
show_spark_shell_command $DIST
  10. Or launch a Spark shell directly:
enter_spark_shell $DIST
  11. View the web UI of the Spark session on port 4040.
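
When several Spark shells run on the same node, the session web UI is not always on port 4040: Spark binds the UI to `spark.ui.port` (default 4040) and, if that port is taken, retries on the following ports (`spark.port.maxRetries`, default 16). The sketch below lists the candidate URLs and probes them. The function names `candidate_ui_urls` and `first_live_ui` are hypothetical and not part of spark-kit, and the probe assumes `curl` is installed.

```shell
# Hypothetical helpers (not part of spark-kit).

# Print the candidate session web UI URLs on HOST, one per line.
# BASE is the first port tried (default 4040); RETRIES is the number of
# additional ports tried after it (default 16).
candidate_ui_urls() {
  host=$1
  base=${2:-4040}
  retries=${3:-16}
  i=0
  while [ "$i" -le "$retries" ]; do
    echo "http://${host}:$((base + i))"
    i=$((i + 1))
  done
}

# Print the first candidate URL that answers an HTTP request.
# Returns non-zero if none responds. Requires curl.
first_live_ui() {
  for url in $(candidate_ui_urls "$@"); do
    if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
      echo "$url"
      return 0
    fi
  done
  return 1
}
```

For example, `first_live_ui localhost` prints the URL of the first responding UI on the current node. Note that it finds the lowest-numbered live UI, which belongs to whichever session started first.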
