Installation

yum install -y squid


Configuration

Here is an example Squid configuration (/etc/squid/squid.conf) that allows all requests.
visible_hostname localhost
acl all src 0.0.0.0/0.0.0.0
http_access allow all
http_port 3128


You need to restart the squid service after changing the configuration.

systemctl restart squid.service
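A quick way to verify the proxy is up after the restart is to check that its port accepts connections. A sketch using bash's /dev/tcp (the host here is a placeholder; on the Squid host itself, localhost:3128 should be open):

```shell
# Probe the proxy port; prints whether a TCP connection succeeded.
proxy_host=localhost   # placeholder; use your proxy's hostname
proxy_port=3128
if timeout 2 bash -c "echo > /dev/tcp/${proxy_host}/${proxy_port}" 2>/dev/null; then
  echo "port ${proxy_port} open on ${proxy_host}"
else
  echo "port ${proxy_port} not reachable on ${proxy_host}"
fi
```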


Configuration on Client

The following environment variables should be set on the client.
Note that this assumes HTTP_PROXY_HOSTNAME is already set to the proxy server's hostname.
export http_proxy=http://$HTTP_PROXY_HOSTNAME:3128
export https_proxy=http://$HTTP_PROXY_HOSTNAME:3128
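Most tools also honor a no_proxy variable for hosts that should bypass the proxy. A sketch building on the exports above (the fallback hostname is a placeholder):

```shell
# Exclude local addresses from proxying; HTTP_PROXY_HOSTNAME is assumed
# to be set already, as in the exports above.
HTTP_PROXY_HOSTNAME=${HTTP_PROXY_HOSTNAME:-proxy.example.com}  # placeholder default
export http_proxy=http://$HTTP_PROXY_HOSTNAME:3128
export https_proxy=http://$HTTP_PROXY_HOSTNAME:3128
export no_proxy=localhost,127.0.0.1
```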


Test

If your host cannot reach the public network (e.g. google.com) without a proxy, curl gets stuck at the connection attempt, as follows.
# curl -v http://www.google.com
* About to connect() to www.google.com port 80 (#0)
*   Trying 172.217.3.228...


After configuring the proxy, the result should look like the following.

# curl -v http://www.google.com
* About to connect() to proxy ebdp-po-dkr10d.sys.comcast.net port 3128 (#0)
*   Trying 147.191.72.175...
* Connected to ebdp-po-dkr10d.sys.comcast.net (147.191.72.175) port 3128 (#0)
> GET http://www.google.com/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: www.google.com
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
....


A heredoc terminated by EOF can be used to create a multi-line file in a shell script, as follows.

cat << EOF > /tmp/yourfilehere
These contents will be written to the file.
        This line is indented.
EOF


https://stackoverflow.com/questions/2953081/how-can-i-write-a-heredoc-to-a-file-in-bash-script
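A related point from that thread: quoting the delimiter ('EOF') disables variable expansion inside the heredoc, which matters when the file should contain literal $ signs. A small demonstration (file paths are placeholders):

```shell
name="world"

# Unquoted delimiter: $name is expanded before writing.
cat << EOF > /tmp/expanded.txt
hello $name
EOF

# Quoted delimiter: $name is written literally.
cat << 'EOF' > /tmp/literal.txt
hello $name
EOF
```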


However, heredocs cannot be used directly in a Dockerfile RUN instruction.

Instead, the following approach achieves a similar result.

RUN echo $'[user]\n\
    email = bumjoon_kim@comcast.com\n\
    name = Bumjoon Kim\n[push]\n\
    default = current\n' >> /root/.gitconfig
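An alternative to bash's $'...' quoting is printf, which interprets \n escapes itself and therefore behaves the same under any RUN shell. A sketch (the file path and contents are illustrative placeholders, not the values from the original):

```shell
# printf expands \n in its format string, so one line can emit a
# multi-line file. In a Dockerfile: RUN printf '...' >> /root/.gitconfig
printf '[user]\n    email = user@example.com\n    name = Example User\n[push]\n    default = current\n' >> /tmp/gitconfig-demo
cat /tmp/gitconfig-demo
```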

Prerequisites

Servers

There are three CentOS 7 VMs, as follows.

Hostname                        IP Address      Type
ebdp-po-dkr10d.sys.comcast.net  147.191.72.175  Master
ebdp-po-dkr11d.sys.comcast.net  147.191.72.176  Slave
ebdp-po-dkr12d.sys.comcast.net  147.191.74.184  Slave


JDK 1.8

JDK 1.8u131 was installed. JAVA_HOME should be set as follows.
# echo $JAVA_HOME
/usr/java/jdk1.8.0_131

User for Hadoop

The user hduser was created to run the Hadoop daemons across all the nodes.

Passwordless SSH

The master node must be able to log in to the slaves via SSH without a password.
First, an SSH key was created as hduser on every node, as follows.
# su - hduser
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa


Then, each node's public key must be registered in the authorized_keys file of every node (including itself).

Here is an example of ~/.ssh/authorized_keys.

ssh-dss AAAA...HD3no= hduser@ebdp-po-dkr10d.sys.comcast.net
ssh-dss AAAA...YBnYs= hduser@ebdp-po-dkr11d.sys.comcast.net
ssh-dss AAAA...mREIg== hduser@ebdp-po-dkr12d.sys.comcast.net


The permissions should be set as follows.

# chmod go-w $HOME $HOME/.ssh
# chmod 600 $HOME/.ssh/authorized_keys
# chown hduser $HOME/.ssh/authorized_keys
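One caveat: recent OpenSSH releases disable ssh-dss keys by default, so on newer systems an ed25519 (or RSA) key is a safer choice than the DSA key used above. A sketch under that assumption (the key path is a demo placeholder; on the cluster it would be ~/.ssh/id_ed25519):

```shell
# Generate an ed25519 keypair non-interactively if one does not exist yet.
key=/tmp/id_demo_ed25519    # demo path; use ~/.ssh/id_ed25519 on real hosts
[ -f "$key" ] || ssh-keygen -t ed25519 -N "" -f "$key" -q
ls -l "$key" "$key.pub"
```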


Finally, to access the other nodes with shortcuts, ~/.ssh/config should contain the following entries.

(Note that the HostName under "localhost" should be adjusted to each node's own hostname.)

Host dk10
    HostName ebdp-po-dkr10d.sys.comcast.net
    User hduser
Host dk11
    HostName ebdp-po-dkr11d.sys.comcast.net
    User hduser
Host dk12
    HostName ebdp-po-dkr12d.sys.comcast.net
    User hduser
Host localhost
    HostName ebdp-po-dkr10d.sys.comcast.net
    User hduser


Installation

Download

Hadoop package tarballs can be downloaded from http://hadoop.apache.org.
I selected version 2.7.5, which is available from many mirrors.


Untar the tarball into a directory

My location is /app/bigdata/hadoop.
In anticipation of future releases, I created a symlink pointing to the 2.7.5 directory.
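With this layout, an upgrade only requires re-pointing the symlink. A sketch under a /tmp prefix for illustration (on the real hosts the prefix is /app/bigdata and the versioned directory comes from the extracted tarball):

```shell
# Extracted tree lives in a versioned directory; the stable path is a symlink.
prefix=/tmp/demo-bigdata      # use /app/bigdata on the cluster
mkdir -p "$prefix/hadoop-2.7.5"
ln -sfn "$prefix/hadoop-2.7.5" "$prefix/hadoop"
ls -ld "$prefix/hadoop"
```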

Environment Variables

Here is an example of the environment variables set in hduser's .bashrc.
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HADOOP_HOME=/app/bigdata/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HDFS_NAMENODE_USER="hduser"
export HDFS_DATANODE_USER="hduser"
export HDFS_SECONDARYNAMENODE_USER="hduser"
export YARN_RESOURCEMANAGER_USER="hduser"
export YARN_NODEMANAGER_USER="hduser"
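After sourcing .bashrc, it is worth confirming that each variable points at a real directory before going further. A minimal sketch (the check_dir helper is introduced here for illustration):

```shell
# Sanity-check that a variable is set and names an existing directory.
check_dir() {
  if [ -d "$2" ]; then echo "$1 OK: $2"; else echo "$1 missing: $2"; fi
}
check_dir JAVA_HOME "$JAVA_HOME"
check_dir HADOOP_HOME "$HADOOP_HOME"
check_dir HADOOP_CONF_DIR "$HADOOP_CONF_DIR"
```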


Configurations

The configuration files are located at $HADOOP_CONF_DIR (/app/bigdata/hadoop/etc/hadoop).

Masters and Slaves

Files named "masters" and "slaves" should be created on every node in $HADOOP_CONF_DIR.

They simply list the master and slave nodes.

# echo "ebdp-po-dkr10d.sys.comcast.net" >> $HADOOP_CONF_DIR/masters
# echo "ebdp-po-dkr11d.sys.comcast.net" >> $HADOOP_CONF_DIR/slaves
# echo "ebdp-po-dkr12d.sys.comcast.net" >> $HADOOP_CONF_DIR/slaves


core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ebdp-po-dkr10d.sys.comcast.net:54310</value>
    <description>The name of the default file system.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>


hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication</description>
  </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/app/bigdata/hadoop/namedir</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/app/bigdata/hadoop/datadir</value>
        <final>true</final>
    </property>
 
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>ebdp-po-dkr10d.sys.comcast.net:50090</value>
    </property>
</configuration>
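The directories referenced in core-site.xml and hdfs-site.xml (hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir) must exist and be writable by hduser before the namenode is formatted. A sketch using a /tmp prefix for illustration (on the real hosts, drop the prefix and run the chown as root):

```shell
# Create the storage directories from the configs above; the /tmp prefix
# is only for demonstration -- the real paths start at /app.
prefix=/tmp/hadoop-dirs-demo
mkdir -p "$prefix/app/hadoop/hadoop/tmp" \
         "$prefix/app/bigdata/hadoop/namedir" \
         "$prefix/app/bigdata/hadoop/datadir"
# On the cluster (as root): chown -R hduser: /app/hadoop /app/bigdata/hadoop
ls -d "$prefix"/app/bigdata/hadoop/*
```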


mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>ebdp-po-dkr10d.sys.comcast.net:54311</value>
        <description>Map Reduce jobtracker</description>
    </property>
  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/app/bigdata/hadoop/mapred-localdir</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/app/bigdata/hadoop/mapred-systemdir</value>
  </property>
</configuration>


yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ebdp-po-dkr10d.sys.comcast.net:8035</value>
  </property>
</configuration>


Format Namenode

Note that this is required on the master node only. (In Hadoop 2.x, "hdfs namenode -format" is the preferred command; "hadoop namenode -format" still works but prints a deprecation warning.)
# hadoop namenode -format


Launch Hadoop Daemons

This, too, is required on the master node only.
# cd $HADOOP_HOME
# ./sbin/start-dfs.sh
....
# ./sbin/start-yarn.sh
....


Note that "$HADOOP_HOME/sbin/start-all.sh" is equivalent to the two commands above, but it is deprecated.


Verification

Main Webpage

The NameNode web UI (default port 50070 in Hadoop 2.x) and the ResourceManager web UI (default port 8088) on the master should now be reachable in a browser.


JPS

On the master:
# jps
22480 NameNode
23558 Jps
22874 ResourceManager
22700 SecondaryNameNode


On a slave:

# jps
12448 DataNode
12811 Jps
12590 NodeManager


Hadoop Command

# hadoop fs -df -h
Filesystem                   Size  Used  Available  Use%
hdfs://hadoop-master:54310  2.0 T   8 K      1.9 T    0%

