Download
I downloaded spark-2.2.1-bin-hadoop2.7.
Installation
Spark can run on YARN or in standalone mode.
Running on YARN assumes that Hadoop is already running on the cluster.
Standalone mode does not require Hadoop, but Spark must be installed on every node of the cluster.
Here is the configuration of the cluster.
Hostname                       | Spark  | Hadoop
ebdp-po-dkr10d.sys.comcast.net | Master | Master
ebdp-po-dkr11d.sys.comcast.net | Worker | Slave
ebdp-po-dkr12d.sys.comcast.net | Worker | Slave
Note that the following procedure was done with the "hduser" account on each node.
Untar the tarball
The downloaded tarball was decompressed to /app/bigdata/spark-2.2.1-bin-hadoop2.7.
In addition, a symlink - /app/bigdata/spark - was created to point to the original directory.
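These two steps can be sketched as follows; this is a sketch that assumes the tarball was downloaded to the current directory (the target paths match this cluster):

```shell
# Extract the release into /app/bigdata
# (assumes spark-2.2.1-bin-hadoop2.7.tgz is in the current directory)
tar -xzf spark-2.2.1-bin-hadoop2.7.tgz -C /app/bigdata

# Version-independent symlink, so configs can refer to /app/bigdata/spark
# and survive future upgrades
ln -s /app/bigdata/spark-2.2.1-bin-hadoop2.7 /app/bigdata/spark
```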
Env Variables
The following environment variables were added to /etc/profile.
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HADOOP_HOME=/app/bigdata/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HDFS_NAMENODE_USER="hduser"
export HDFS_DATANODE_USER="hduser"
export HDFS_SECONDARYNAMENODE_USER="hduser"
export YARN_RESOURCEMANAGER_USER="hduser"
export YARN_NODEMANAGER_USER="hduser"
export SPARK_HOME=/app/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
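After editing /etc/profile, the settings can be checked from a shell; a quick sanity check, assuming the variables above were set correctly:

```shell
# Load the updated profile into the current shell
source /etc/profile

# Expect /app/bigdata/spark
echo "$SPARK_HOME"

# Should report version 2.2.1 if spark-submit is on the PATH
spark-submit --version
```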
Configuration
$SPARK_HOME/conf/slaves should be created, listing the worker nodes.
ebdp-po-dkr11d.sys.comcast.net
ebdp-po-dkr12d.sys.comcast.net
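Since standalone mode requires Spark on every node, one way to distribute the installation is to copy it from the master. A sketch using scp/ssh, assuming passwordless SSH is set up for hduser:

```shell
# Copy the Spark installation to each worker and recreate the symlink
for host in ebdp-po-dkr11d.sys.comcast.net ebdp-po-dkr12d.sys.comcast.net; do
  scp -r /app/bigdata/spark-2.2.1-bin-hadoop2.7 hduser@"$host":/app/bigdata/
  ssh hduser@"$host" "ln -s /app/bigdata/spark-2.2.1-bin-hadoop2.7 /app/bigdata/spark"
done
```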
Test
Running on Yarn
Please refer to the Spark documentation on submitting applications to understand the two deploy modes - cluster and client.
Cluster Mode
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  $SPARK_HOME/examples/jars/spark-examples*.jar \
  10
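In cluster mode the driver runs inside YARN, so the "Pi is roughly ..." result does not appear in the submitting console; it ends up in the application logs. One way to retrieve it (the application id below is a placeholder - substitute the real one):

```shell
# List finished applications to find the application id
yarn application -list -appStates FINISHED

# Fetch the aggregated logs and look for the result (placeholder id)
yarn logs -applicationId application_XXXXXXXXXX_XXXX | grep "Pi is roughly"
```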
Client Mode
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  $SPARK_HOME/examples/jars/spark-examples*.jar \
  10
Standalone
First, you need to start the Spark cluster from the master node.
$SPARK_HOME/sbin/start-all.sh
Then, you can run an example as follows.
spark-submit \
  --master spark://ebdp-po-dkr10d.sys.comcast.net:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples*.jar \
  100
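To confirm the cluster is up and the job ran, the standalone master's web UI (port 8080 by default) lists the registered workers and completed applications; a quick check from the command line:

```shell
# The standalone master web UI listens on port 8080 by default
curl -s http://ebdp-po-dkr10d.sys.comcast.net:8080 | grep -i -e worker -e SparkPi
```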