HADOOP MULTI NODE CLUSTER
We will build a multi-node cluster using two Ubuntu boxes. First, set up two single-node clusters, then merge them into one multi-node cluster in which one Ubuntu box becomes the designated master and the other becomes a slave.
First, edit '/etc/hosts' on both the master and slave machines. Enter the IP address and hostname of both machines. This is how the machines in the Hadoop cluster identify each other.
Ø nano /etc/hosts (for both master and slave machines)
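For example, with placeholder IP addresses (substitute your machines' real addresses and hostnames):

    192.168.0.1    master
    192.168.0.2    slave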
On the Master machine (the machine on which 'bin/start-dfs.sh' is run will become the primary NameNode):
On the master machine, update 'conf/masters'. When starting the Hadoop daemons with 'bin/start-all.sh', this file tells Hadoop where to start the SecondaryNameNode; note that the NameNode and JobTracker themselves run on whichever machine the start scripts are executed, not on the hosts listed here.
Ø nano /home/hduser/utilities/hadoop-1.0.3/conf/masters
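Assuming the hostnames from the '/etc/hosts' example above, the file would contain a single line:

    master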
On the master machine, update 'conf/slaves'. Enter all the slave machines in this file, one hostname per line. If the master node should also act as a slave (i.e. run a DataNode and TaskTracker), enter the master hostname in 'conf/slaves' as well.
Ø nano /home/hduser/utilities/hadoop-1.0.3/conf/slaves
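With the master doubling as a slave, the file would list both hostnames (again assuming the example names):

    master
    slave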
Editing the Configuration files
1. core-site.xml
'hadoop.tmp.dir': the directory specified by this property is used to store file system metadata by the NameNode and block data by the DataNode. By default, 'dfs/name' and 'dfs/data' subdirectories are created under this tmp directory.
We need to ensure that 'hduser' has sufficient permissions on the newly provided 'hadoop.tmp.dir'. We are configuring it to '/home/hduser/app/hadoop/tmp'.
The property 'fs.default.name' is required to provide the hostname and port of the NameNode.
Creating the directory and changing the ownership and permissions to 'hduser':
Ø sudo mkdir -p /home/hduser/app/hadoop/tmp
Ø sudo chown hduser:hadoop /home/hduser/app/hadoop/tmp
Ø sudo chmod 755 /home/hduser/app/hadoop/tmp
Setting the ownership and permissions is very important. If you forget this, you will run into exceptions while formatting the NameNode.
Open the core-site.xml file; you will see empty configuration tags. Add the following lines between the configuration tags.
Ø nano /home/hduser/utilities/hadoop-1.0.3/conf/core-site.xml (ALL machines)
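A sketch of the two properties described above, assuming the 'master' hostname from '/etc/hosts' and the commonly used NameNode port 54310 (adjust both for your setup):

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hduser/app/hadoop/tmp</value>
      <description>Base directory; NameNode metadata and DataNode blocks live under it.</description>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
      <description>Hostname and port of the NameNode.</description>
    </property>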
Edit mapred-site.xml (ALL machines)
In mapred-site.xml, we need to provide the hostname and port of the JobTracker, as the TaskTrackers will use this for their communication.
Ø sudo nano /home/hduser/utilities/hadoop-1.0.3/conf/mapred-site.xml
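A sketch of the JobTracker property, assuming the 'master' hostname and the conventional port 54311:

    <property>
      <name>mapred.job.tracker</name>
      <value>master:54311</value>
      <description>Hostname and port of the JobTracker.</description>
    </property>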
Edit hdfs-site.xml
In hdfs-site.xml, add the following property between the configuration tags.
Ø sudo nano /home/hduser/utilities/hadoop-1.0.3/conf/hdfs-site.xml
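The original does not name the property; a common choice for a two-node cluster (an assumption here) is 'dfs.replication', which sets how many copies of each block HDFS keeps:

    <property>
      <name>dfs.replication</name>
      <value>2</value>
      <description>Number of replicas kept for each HDFS block.</description>
    </property>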
Starting the multi-node cluster
1. Starting the HDFS daemons:
The NameNode daemon is started on the 'master' machine, and DataNode daemons are started on all slaves. Run the command 'bin/start-dfs.sh' on the machine you want the NameNode to run on.
Ø /home/hduser/utilities/hadoop-1.0.3/bin/start-dfs.sh
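To verify that the HDFS daemons started, run 'jps' on each machine (each output line is a process ID followed by a daemon name):

Ø jps

On the master, expect NameNode and SecondaryNameNode (plus DataNode if the master also acts as a slave); on the slave, expect DataNode.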
2. Starting the MapReduce daemons:
The JobTracker daemon is started on the 'master' machine, and TaskTracker daemons are started on all slaves. Run the command 'bin/start-mapred.sh' on the machine you want the JobTracker to run on.
Ø /home/hduser/utilities/hadoop-1.0.3/bin/start-mapred.sh
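Running 'jps' again should now also show JobTracker on the master and TaskTracker on every machine listed in 'conf/slaves'.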
Stopping the multi-node cluster
To stop the multi-node cluster, the workflow is the opposite of starting. To stop the MapReduce daemons, the JobTracker is stopped on the master and the TaskTracker daemons are stopped on all slaves: run 'bin/stop-mapred.sh' on the machine where the JobTracker is running.
Ø /home/hduser/utilities/hadoop-1.0.3/bin/stop-mapred.sh
To stop the HDFS daemons, run the command 'bin/stop-dfs.sh' on the machine where the NameNode is running.
Ø /home/hduser/utilities/hadoop-1.0.3/bin/stop-dfs.sh
The web UI of the HDFS daemons is as follows:
http://<ipaddress / hostname of master >:50070
The web UI of the MapReduce daemons is as follows:
http://<ipaddress /hostname of master>:50030