Download


I downloaded spark-2.2.1-bin-hadoop2.7 (Spark 2.2.1, prebuilt for Hadoop 2.7).
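The post doesn't say where the tarball came from; retired releases such as 2.2.1 are kept on the Apache archive, so the download would look roughly like this (the mirror URL is an assumption):

```shell
# Assumed source: the Apache archive, which keeps old Spark releases.
SPARK_PKG=spark-2.2.1-bin-hadoop2.7
URL="https://archive.apache.org/dist/spark/spark-2.2.1/${SPARK_PKG}.tgz"
echo "$URL"
# wget "$URL"   # uncomment to actually download
```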

Installation

Spark can run on YARN or in standalone mode.
Running on YARN assumes that Hadoop is already running on the cluster.
Standalone mode does not require Hadoop, but Spark must be installed on every node of the cluster.

Here is the configuration of the cluster.

Hostname                          Spark    Hadoop
--------------------------------  -------  -------
ebdp-po-dkr10d.sys.comcast.net    Master   Master
ebdp-po-dkr11d.sys.comcast.net    Worker   Slave
ebdp-po-dkr12d.sys.comcast.net    Worker   Slave


Note that the following procedure was performed as the "hduser" account on each node.


Untar the tarball

The downloaded tarball was extracted to /app/bigdata/spark-2.2.1-bin-hadoop2.7.
In addition, a symlink, /app/bigdata/spark, was created to point to that directory.
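The extract-and-symlink step above can be sketched as a small helper; install_spark is a hypothetical name for illustration, not part of Spark:

```shell
# Hypothetical helper mirroring the steps above: extract the tarball under a
# base directory and point a version-independent "spark" symlink at it.
install_spark() {
  local tarball="$1" base="$2"
  mkdir -p "$base"
  tar -xzf "$tarball" -C "$base"
  # -sfn replaces an existing symlink, so re-running (or upgrading) is safe.
  ln -sfn "$base/$(basename "$tarball" .tgz)" "$base/spark"
}

# Usage on each node (run as hduser):
#   install_spark spark-2.2.1-bin-hadoop2.7.tgz /app/bigdata
```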

Env Variables

The following environment variables were added to /etc/profile.
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HADOOP_HOME=/app/bigdata/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HDFS_NAMENODE_USER="hduser"
export HDFS_DATANODE_USER="hduser"
export HDFS_SECONDARYNAMENODE_USER="hduser"
export YARN_RESOURCEMANAGER_USER="hduser"
export YARN_NODEMANAGER_USER="hduser"
export SPARK_HOME=/app/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
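After re-logging in (or sourcing /etc/profile), a quick way to confirm the settings took effect; this only checks SPARK_HOME and PATH from the list above:

```shell
# Sanity check for the settings above (values copied from /etc/profile).
export SPARK_HOME=/app/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
case ":$PATH:" in
  *":$SPARK_HOME/bin:"*) echo "ok: $SPARK_HOME/bin is on PATH" ;;
  *)                     echo "missing: $SPARK_HOME/bin is not on PATH" ;;
esac
```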

Configuration

$SPARK_HOME/conf/slaves should be created on the master, listing the worker nodes:
ebdp-po-dkr11d.sys.comcast.net
ebdp-po-dkr12d.sys.comcast.net
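Since standalone mode needs Spark on every node, the same install has to be copied to the workers. A sketch, assuming passwordless SSH for hduser; sync_spark and DRY_RUN are illustrative names, not Spark tooling:

```shell
# Push the Spark directory to each worker listed in conf/slaves.
# With DRY_RUN=1 the commands are only printed, not executed.
sync_spark() {
  local src=/app/bigdata/spark-2.2.1-bin-hadoop2.7/
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo rsync -a "$src" "hduser@$host:$src"
    else
      rsync -a "$src" "hduser@$host:$src"
    fi
  done
}

# Example:
#   sync_spark ebdp-po-dkr11d.sys.comcast.net ebdp-po-dkr12d.sys.comcast.net
```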

Test

Running on Yarn

Cluster Mode

In cluster mode the driver runs inside the YARN cluster, so the result appears in the application logs (e.g. via "yarn logs -applicationId <app id>") rather than on the console.

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    $SPARK_HOME/examples/jars/spark-examples*.jar \
    10


Client Mode

In client mode the driver runs on the submitting machine, so the result ("Pi is roughly 3.14...") is printed directly to the console.

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    $SPARK_HOME/examples/jars/spark-examples*.jar \
    10

Standalone

First, start the Spark standalone cluster from the master node. Once it is up, the master's web UI (port 8080 by default) shows the registered workers.

$SPARK_HOME/sbin/start-all.sh


Then, you can run an example job against the standalone master as follows.

spark-submit \
     --master spark://ebdp-po-dkr10d.sys.comcast.net:7077 \
     --class org.apache.spark.examples.SparkPi \
     $SPARK_HOME/examples/jars/spark-examples*.jar \
     100

