Prerequisites
Servers
| Hostname | IP Address | Type |
| --- | --- | --- |
| ebdp-po-dkr10d.sys.comcast.net | 147.191.72.175 | Master |
| ebdp-po-dkr11d.sys.comcast.net | 147.191.72.176 | Slave |
| ebdp-po-dkr12d.sys.comcast.net | 147.191.74.184 | Slave |
JDK 1.8
```shell
# echo $JAVA_HOME
/usr/java/jdk1.8.0_131
```
User for Hadoop
All Hadoop daemons in this guide run as a dedicated user, hduser, which must exist on every node.
Passwordless SSH
```shell
# su - hduser
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
```
(Note that recent OpenSSH releases disable ssh-dss keys by default; an RSA or Ed25519 key works the same way.)
Then, the public key of each node needs to be registered in the authorized_keys file of every other node (including itself). Here is an example of ~/.ssh/authorized_keys:
```
ssh-dss AAAA...HD3no= hduser@ebdp-po-dkr10d.sys.comcast.net
ssh-dss AAAA...YBnYs= hduser@ebdp-po-dkr11d.sys.comcast.net
ssh-dss AAAA...mREIg== hduser@ebdp-po-dkr12d.sys.comcast.net
```
The permissions should be changed as follows:
```shell
# chmod go-w $HOME $HOME/.ssh
# chmod 600 $HOME/.ssh/authorized_keys
# chown hduser $HOME/.ssh/authorized_keys
```
Finally, to access the other nodes with a shortcut, ~/.ssh/config should contain the following entries. (Note that the "localhost" entry should be adjusted to each node's own hostname.)
```
Host dk10
    HostName ebdp-po-dkr10d.sys.comcast.net
    User hduser
Host dk11
    HostName ebdp-po-dkr11d.sys.comcast.net
    User hduser
Host dk12
    HostName ebdp-po-dkr12d.sys.comcast.net
    User hduser
Host localhost
    HostName ebdp-po-dkr10d.sys.comcast.net
    User hduser
```
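The key-registration step above can be sketched locally as follows: collect every node's public key into a single authorized_keys file and lock down its permissions. The pubkeys/ and demo-ssh/ directories and the truncated placeholder keys are stand-ins for illustration; on the real cluster each *.pub file comes from a node's ~/.ssh/id_dsa.pub.

```shell
# Demo directories standing in for the real ~/.ssh layout (assumptions).
mkdir -p pubkeys demo-ssh

# Pretend we gathered one public key per node (truncated placeholder keys).
printf 'ssh-dss AAAA... hduser@ebdp-po-dkr10d.sys.comcast.net\n' > pubkeys/dkr10.pub
printf 'ssh-dss AAAA... hduser@ebdp-po-dkr11d.sys.comcast.net\n' > pubkeys/dkr11.pub
printf 'ssh-dss AAAA... hduser@ebdp-po-dkr12d.sys.comcast.net\n' > pubkeys/dkr12.pub

# Concatenate all keys into authorized_keys and restrict its permissions,
# mirroring the chmod 600 step above.
cat pubkeys/*.pub > demo-ssh/authorized_keys
chmod 600 demo-ssh/authorized_keys

wc -l < demo-ssh/authorized_keys
```

With three nodes the file ends up with three lines, one key per node.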
Installation
Download
Untar the tarball into the installation directory (/app/bigdata/hadoop in this guide).
Environment Variables
```shell
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HADOOP_HOME=/app/bigdata/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HDFS_NAMENODE_USER="hduser"
export HDFS_DATANODE_USER="hduser"
export HDFS_SECONDARYNAMENODE_USER="hduser"
export YARN_RESOURCEMANAGER_USER="hduser"
export YARN_NODEMANAGER_USER="hduser"
```
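These exports only last for the current shell session; to make them permanent for hduser they can be appended to the login profile. A minimal sketch, using a demo file in place of ~/.bashrc (the real profile path depends on the distribution):

```shell
# demo-bashrc stands in for hduser's real ~/.bashrc (an assumption).
PROFILE=./demo-bashrc

cat >> "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HADOOP_HOME=/app/bigdata/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
EOF

# Source the profile and confirm the derived variable expands as expected.
. "$PROFILE"
echo "$HADOOP_CONF_DIR"
```

Sourcing the file prints /app/bigdata/hadoop/etc/hadoop, confirming that HADOOP_CONF_DIR is derived from HADOOP_HOME.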
Configurations
Masters and Slaves
These files simply list the master and slave nodes, one hostname per line. (The masters file actually tells Hadoop where to run the secondary NameNode; in Hadoop 3 the slaves file was renamed to workers.)
```shell
# echo "ebdp-po-dkr10d.sys.comcast.net" >> $HADOOP_CONF_DIR/masters
# echo "ebdp-po-dkr11d.sys.comcast.net" >> $HADOOP_CONF_DIR/slaves
# echo "ebdp-po-dkr12d.sys.comcast.net" >> $HADOOP_CONF_DIR/slaves
```
core-site.xml
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ebdp-po-dkr10d.sys.comcast.net:54310</value>
    <description>The name of the default file system.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
```
hdfs-site.xml
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/app/bigdata/hadoop/namedir</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/app/bigdata/hadoop/datadir</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:50090</value>
  </property>
</configuration>
```
mapred-site.xml
```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:54311</value>
    <description>Map Reduce jobtracker</description>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/app/bigdata/hadoop/mapred-localdir</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/app/bigdata/hadoop/mapred-systemdir</value>
  </property>
</configuration>
```
yarn-site.xml
```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:8035</value>
  </property>
</configuration>
```
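Every node must see identical copies of these configuration files, so after editing them on the master they have to be pushed to the slaves. A local sketch of that synchronization, using cp into per-host demo directories in place of scp over the dk11/dk12 shortcuts defined earlier (all paths here are stand-ins):

```shell
# Demo stand-ins: conf/ plays $HADOOP_CONF_DIR, while dk11/ and dk12/ play
# the slave nodes reachable through the SSH shortcuts (assumptions).
mkdir -p conf dk11 dk12
echo '<configuration/>' > conf/core-site.xml
echo '<configuration/>' > conf/hdfs-site.xml

for host in dk11 dk12; do
  # On the real cluster this step would be something like:
  #   scp -r $HADOOP_CONF_DIR/* $host:$HADOOP_CONF_DIR/
  cp -r conf "$host/"
done

ls dk11/conf dk12/conf
```

Whatever mechanism is used (scp, rsync, a configuration-management tool), the point is that the same files end up on every node before the daemons start.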
Format Namenode
```shell
# hadoop namenode -format
```
(In recent releases "hadoop namenode" is deprecated; "hdfs namenode -format" is the preferred form.)
Launch Hadoop Daemons
```shell
# cd $HADOOP_HOME
# ./sbin/start-dfs.sh
....
# ./sbin/start-yarn.sh
....
```
Note that "$HADOOP_HOME/sbin/start-all.sh" is equivalent to the two commands above, but it is deprecated.
Verification
Main Webpage
The NameNode and ResourceManager web UIs can be opened in a browser to confirm the cluster is up (by default on ports 50070 and 8088, respectively, for Hadoop 2.x).
JPS
```shell
# jps
22480 NameNode
23558 Jps
22874 ResourceManager
22700 SecondaryNameNode
```
On a slave node:
```shell
# jps
12448 DataNode
12811 Jps
12590 NodeManager
```
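The jps check can also be automated, e.g. from a health-check script. The sketch below hard-codes the sample master output shown above; on a live node the variable would instead be set with JPS_OUTPUT="$(jps)".

```shell
# Sample output copied from the master above (hard-coded for illustration).
JPS_OUTPUT='22480 NameNode
22874 ResourceManager
22700 SecondaryNameNode'

# Verify that each expected daemon appears in the output.
for daemon in NameNode ResourceManager SecondaryNameNode; do
  if echo "$JPS_OUTPUT" | grep -qw "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: MISSING"
  fi
done
```

The same loop, with DataNode and NodeManager in the list, covers the slave nodes.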
Hadoop Command
```shell
# hadoop fs -df -h
Filesystem                  Size  Used  Available  Use%
hdfs://hadoop-master:54310  2.0 T  8 K      1.9 T    0%
```