Download
I downloaded spark-2.2.1-bin-hadoop2.7.
Installation
Spark can run on YARN or in standalone mode.
Running on YARN assumes that Hadoop is already running on the cluster.
Standalone mode does not require Hadoop, but Spark must be installed on every node of the cluster.
Here is the configuration of the cluster.
Hostname                       | Spark  | Hadoop
ebdp-po-dkr10d.sys.comcast.net | Master | Master
ebdp-po-dkr11d.sys.comcast.net | Worker | Slave
ebdp-po-dkr12d.sys.comcast.net | Worker | Slave
Note that the following procedure was done with the "hduser" account on each node.
Untar the tarball
The downloaded tarball was decompressed to /app/bigdata/spark-2.2.1-bin-hadoop2.7.
In addition, a symlink - /app/bigdata/spark - was created to point to the original directory.
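These two steps can be sketched as follows; this is a sketch that assumes the tarball was downloaded to the current directory (the target paths match this cluster):

```shell
# Extract the release into /app/bigdata
# (assumes spark-2.2.1-bin-hadoop2.7.tgz is in the current directory)
tar -xzf spark-2.2.1-bin-hadoop2.7.tgz -C /app/bigdata

# Version-independent symlink, so configs can refer to /app/bigdata/spark
# and survive future upgrades
ln -s /app/bigdata/spark-2.2.1-bin-hadoop2.7 /app/bigdata/spark
```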
Env Variables
The following environment variables were added to /etc/profile.
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HADOOP_HOME=/app/bigdata/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HDFS_NAMENODE_USER="hduser"
export HDFS_DATANODE_USER="hduser"
export HDFS_SECONDARYNAMENODE_USER="hduser"
export YARN_RESOURCEMANAGER_USER="hduser"
export YARN_NODEMANAGER_USER="hduser"
export SPARK_HOME=/app/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
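After editing /etc/profile, the settings can be checked from a shell; a quick sanity check, assuming the variables above were set correctly:

```shell
# Load the updated profile into the current shell
source /etc/profile

# Expect /app/bigdata/spark
echo "$SPARK_HOME"

# Should report version 2.2.1 if spark-submit is on the PATH
spark-submit --version
```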
Configuration
$SPARK_HOME/conf/slaves should be created, listing the worker nodes.
ebdp-po-dkr11d.sys.comcast.net
ebdp-po-dkr12d.sys.comcast.net
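Since standalone mode requires Spark on every node, one way to distribute the installation is to copy it from the master. A sketch using scp/ssh, assuming passwordless SSH is set up for hduser:

```shell
# Copy the Spark installation to each worker and recreate the symlink
for host in ebdp-po-dkr11d.sys.comcast.net ebdp-po-dkr12d.sys.comcast.net; do
  scp -r /app/bigdata/spark-2.2.1-bin-hadoop2.7 hduser@"$host":/app/bigdata/
  ssh hduser@"$host" "ln -s /app/bigdata/spark-2.2.1-bin-hadoop2.7 /app/bigdata/spark"
done
```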
Test
Running on Yarn
Please refer to the Spark documentation on submitting applications to understand the two deploy modes - cluster and client.
Cluster Mode
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  $SPARK_HOME/examples/jars/spark-examples*.jar \
  10
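In cluster mode the driver runs inside YARN, so the "Pi is roughly ..." result does not appear in the submitting console; it ends up in the application logs. One way to retrieve it (the application id below is a placeholder - substitute the real one):

```shell
# List finished applications to find the application id
yarn application -list -appStates FINISHED

# Fetch the aggregated logs and look for the result (placeholder id)
yarn logs -applicationId application_XXXXXXXXXX_XXXX | grep "Pi is roughly"
```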
Client Mode
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  $SPARK_HOME/examples/jars/spark-examples*.jar \
  10
Standalone
First, you need to start the Spark cluster from the master node.
$SPARK_HOME/sbin/start-all.sh
Then, you can run an example as follows.
spark-submit \
  --master spark://ebdp-po-dkr10d.sys.comcast.net:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples*.jar \
  100
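To confirm the cluster is up and the job ran, the standalone master's web UI (port 8080 by default) lists the registered workers and completed applications; a quick check from the command line:

```shell
# The standalone master web UI listens on port 8080 by default
curl -s http://ebdp-po-dkr10d.sys.comcast.net:8080 | grep -i -e worker -e SparkPi
```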