Hadoop Single Node Installation
Basic Prerequisites
This section lists the required software and the system configuration needed before installation.
Required software
· Java JDK
As of now, the recommended and tested Java versions for Hadoop and HBase installation are Oracle JDK 1.6 (u20, u21, u26, u28, u31). Hadoop requires Java 1.6 or later; it is built and tested on Oracle Java, which is the only “supported” JVM.
· The latest stable version of Hadoop 1.x.x (here we are using the current stable release, Hadoop-1.0.3)
· The latest stable version of HBase 0.9x.x (here we are using the current stable release, HBase-0.94.x)
Notes:
For HBase development, selecting the Hadoop version is critical: the Hadoop version must be compatible with the HBase version. The following table shows which versions of Hadoop are supported by the various HBase versions. Based on the HBase version you intend to use, select the most appropriate version of Hadoop.
                   | HBase-0.92.x | HBase-0.94.x | HBase-0.95
Hadoop-0.20.205    |      S       |      X       |     X
Hadoop-0.22.x      |      S       |      X       |     X
Hadoop-1.0.0-1.0.2 |      S       |      S       |     X
Hadoop-1.0.3+      |      S       |      S       |     S
Hadoop-1.1.x       |      NT      |      S       |     S
Hadoop-0.23.x      |      X       |      S       |     NT
Hadoop-2.x         |      X       |      S       |     S
Note that HBase 0.95 requires Hadoop 1.0.3 at a minimum.
Where:
S = supported and tested
X = not supported
NT = should run, but not tested enough
Installing and configuring the Java JDK
Step 1:
Before installing Hadoop, we have to install Java; it is recommended to use Oracle Java 1.6. To check whether Java is already available, use the following Linux command:
java -version
This will show the installed Java version, if Java is already installed. If it is OpenJDK, remove it and install the Oracle JDK.
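If OpenJDK is installed, one way to remove it (a minimal sketch assuming an Ubuntu/Debian system with OpenJDK 6; package names will differ on other distributions and versions) is:
sudo apt-get remove openjdk-6-jdk openjdk-6-jre
On Red Hat/CentOS/Fedora systems the equivalent would be something like:
sudo yum remove java-1.6.0-openjdk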
Step 2:
Download a stable version of Java from the versions listed above. The downloaded file can be a .bin or a .tar file.
1. For installing the .bin file
Go to the directory containing the binary file:
sudo chmod u+x <filename>.bin
./<filename>.bin
2. For installing the .tar file
Go to the directory containing the archive and extract it:
sudo tar -xvf <filename>.tar
(for a gzip-compressed .tar.gz archive, use tar -xzvf instead)
Step 3:
Set JAVA_HOME in the /etc/bash.bashrc file. We can use the nano or vi editor to edit the file:
nano /etc/bash.bashrc
Add the following lines towards the end of the file. If JAVA_HOME is already set for OpenJDK, replace it with the following lines:
#set the JAVA_HOME
export JAVA_HOME=<path from root to that java directory>
export PATH=$JAVA_HOME/bin:$PATH
In nano, press Ctrl+O to save the changes and Ctrl+X to exit.
Note: You can instead set JAVA_HOME in the user’s home directory ($HOME/.bashrc file); the disadvantage of doing so is that JAVA_HOME will be available only for that user.
To reload the ‘bash.bashrc’ file, use the source command:
source /etc/bash.bashrc
Note:
Normally these changes take effect only after the shell is restarted or the system is rebooted. On a running cluster, rebooting a virtual machine is disruptive and unsaved state would be lost, so to avoid this we reload the environment in place using the ‘source’ command.
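To confirm the variables were picked up, you can check them (these commands only read the environment and print the version):
echo $JAVA_HOME
java -version
The echo should print the path you set; java -version reports whichever JVM /usr/bin/java currently points to, and Step 4 below switches it to the Oracle JDK if it still shows OpenJDK.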
Step 4: Switching from OpenJDK to the Oracle JDK
Now close the terminal, re-open it and check whether the Java installation and path work as desired.
Alternatively, register the Oracle JDK with the alternatives system:
sudo update-alternatives --install /usr/bin/java java <path from root to that java directory>/bin/java 2
sudo update-alternatives --config java
Then select the number corresponding to the installed Oracle JDK (here the number is 2).
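For example, if the Oracle JDK were unpacked to /usr/lib/jvm/jdk1.6.0_31 (a hypothetical path used only for illustration; substitute your own), the install line would read:
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.6.0_31/bin/java 2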
Adding a dedicated user for running Hadoop services
For running the Hadoop daemons, we create a dedicated user rather than executing Hadoop as root. This is recommended because it isolates the other software, services and users on the same machine from the Hadoop installation.
We are creating a user ‘hduser’ in
group ‘hadoop’.
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
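To confirm the user and group were created as intended (just a quick check, not part of the original steps), run:
id hduser
The output should list ‘hadoop’ as the user’s group.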
Adding the newly created user to the sudoers list
For the Hadoop installation, the newly created user needs some extra privileges beyond those of a normal user. To give it these root privileges, we add the newly created user to the sudoers configuration.
To add ‘hduser’, open the /etc/sudoers file using the nano text editor:
sudo nano /etc/sudoers
Add the following line to the file:
hduser ALL=(ALL) ALL
Save with Ctrl+O and exit with Ctrl+X. This gives ‘hduser’ root privileges via sudo.
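You can verify the new privileges by switching to the user and running a harmless command through sudo:
su - hduser
sudo whoami
The second command should print ‘root’ after hduser’s password is entered.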
Configuring passwordless SSH
Hadoop uses SSH to communicate with its nodes, and we do not want to enter a password every time it does so. A passwordless key pair therefore needs to be created and installed so that the SSH communication works without user intervention. As the ‘hduser’ user, run:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
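Assuming an SSH server is running locally, you can verify that the key works (the very first connection may still ask you to confirm the host key):
chmod 600 ~/.ssh/authorized_keys
ssh localhost
exit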
Changing the Hostname of a Linux machine Without Rebooting
When we use virtual machines, Hadoop can be addressed by either the IP address or the hostname of each node. When a virtual machine is rebooted, its IP address may change, and that would badly affect the cluster. To avoid this, we use only hostnames, which is why we may need to change the hostname of a Linux machine, preferably without rebooting it.
Step 1: Change the hostname
To change the hostname of a Linux system, first edit the configuration file that controls it (see the appropriate file for your distribution below), then apply the change with the hostname command shown after the list.
· In Red Hat/CentOS/Fedora systems, edit the hostname in /etc/sysconfig/network:
nano /etc/sysconfig/network
HOSTNAME=<hostname of the system>
· In Ubuntu/Debian systems, edit the hostname in /etc/hostname:
nano /etc/hostname
Delete the old name and add the new one.
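Editing the file alone does not change the hostname of the running system; to apply it immediately without a reboot, also run the hostname command (substitute your new name):
sudo hostname <new hostname>
hostname
The second command prints the hostname currently in effect.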
Step 2: Update /etc/hosts
Now you need to edit the /etc/hosts file:
nano /etc/hosts
In the /etc/hosts file, map the machine’s IP address to its hostname, so that the name resolves correctly whether the hostname or the IP address is used.
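A minimal example entry, using a purely hypothetical IP address and hostname (replace both with your machine’s actual values):
192.168.1.100   hadoopnode1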
Single Node Hadoop Installation
Step 1: Extracting Hadoop tarball
We are creating a directory to hold the Hadoop installation. Here we are using ‘/home/hduser/utilities’. You need to extract the tarball into this location and change the ownership recursively on the extracted directory.
Here we are using hadoop-1.0.3.tar.gz.
mkdir -p /home/hduser/utilities
cd /home/hduser/utilities
sudo tar -xzvf hadoop-1.0.3.tar.gz
sudo chown -R hduser:hadoop hadoop-1.0.3
Step 2: Configuring the Hadoop environment variables
We are adding HADOOP_HOME as an environment variable in the /etc/bash.bashrc file. By doing this, the Hadoop commands become accessible to every user.
sudo nano /etc/bash.bashrc
Append the following lines to set HADOOP_HOME and add it to PATH:
#set HADOOP_HOME
export HADOOP_HOME=/home/hduser/utilities/hadoop-1.0.3
export PATH=$HADOOP_HOME/bin:$PATH
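After saving the file, you can reload it and confirm that the hadoop command is found on the PATH:
source /etc/bash.bashrc
hadoop version
The second command should print the installed Hadoop version (1.0.3 here).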
Step 3: Configuring Java for Hadoop
sudo nano /home/hduser/utilities/hadoop-1.0.3/conf/hadoop-env.sh
JAVA_HOME is commented out by default. Uncomment the line and set its value to your Java installation path. The path should point to the JDK directory itself, not to its bin subdirectory.
#The Java implementation to use
export JAVA_HOME=<absolute path to java directory>
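For instance, if the JDK were installed under /usr/lib/jvm/jdk1.6.0_31 (a hypothetical path; use the directory from your own Java installation), the line would become:
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_31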
Step 4: Configuring Hadoop Properties
In Hadoop, we have three configuration files, core-site.xml, mapred-site.xml and hdfs-site.xml, in the HADOOP_HOME/conf directory.
Editing the Configuration files
1. core-site.xml
The directory specified by the ‘hadoop.tmp.dir’ property is used to store the file system metadata of the NameNode and the block data of the DataNode. By default, directories for the NameNode (‘name’) and DataNode (‘data’) data are created under this directory.
We need to ensure that ‘hduser’ has sufficient permissions on the directory given as ‘hadoop.tmp.dir’. We are configuring it to ‘/home/hduser/utilities/app/hadoop/tmp’.
The property ‘fs.default.name’ provides the hostname and port of the NameNode.
Create the directory and change its ownership and permissions so that ‘hduser’ owns it:
cd /home/hduser/utilities
sudo mkdir -p app/hadoop/tmp
sudo chown hduser:hadoop app/hadoop/tmp
sudo chmod 755 app/hadoop/tmp
Setting the ownership and permissions is very important. If you forget this, you will run into exceptions while formatting the NameNode.
Open the core-site.xml file; you will see empty configuration tags. Add the following lines between the configuration tags:
sudo nano /home/hduser/utilities/hadoop-1.0.3/conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/utilities/app/hadoop/tmp</value>
<description>
A base for other temporary directories.
</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://<hostname or IP address of the system where the namenode is installed>:54310</value>
<description>The name of the default file system</description>
</property>
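As an illustration, if the machine’s hostname were hadoopnode1 (a hypothetical name; use the hostname you set earlier), the value element would read:
<value>hdfs://hadoopnode1:54310</value>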
2. hdfs-site.xml
This file holds the HDFS (storage) settings. In hdfs-site.xml, add the following property between the configuration tags:
sudo nano /home/hduser/utilities/hadoop-1.0.3/conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication</description>
</property>
3. mapred-site.xml
This file holds the MapReduce (processing) settings. In mapred-site.xml, we need to provide the hostname and port of the JobTracker, since the TaskTrackers use it for their communication.
sudo nano /home/hduser/utilities/hadoop-1.0.3/conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value><hostname or IP address of the system where the jobtracker is installed>:54311</value>
<description>
The host and port that the MapReduce job tracker runs at
</description>
</property>
Step 5: Formatting the NameNode
Before starting the HDFS daemons (such as the NameNode) for the first time, it is mandatory to format the NameNode/HDFS. This is needed only for the first run; formatting the NameNode again on subsequent runs will destroy all data. Be careful not to format an already running cluster, even if you need to restart the NameNode daemon.
The NameNode can be formatted with:
/home/hduser/utilities/hadoop-1.0.3/bin/hadoop namenode -format
Step 6: Starting Hadoop Daemons
/home/hduser/utilities/hadoop-1.0.3/bin/start-all.sh
This will start all the Hadoop daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
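To confirm that the daemons are up, the jps utility that ships with the JDK lists the running Java processes; on a healthy single-node setup it should show entries for NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker and Jps itself:
jps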
To stop the Hadoop daemons, we use the command:
/home/hduser/utilities/hadoop-1.0.3/bin/stop-all.sh
This will stop all the Hadoop daemons. Running jps afterwards should list only the Jps process itself.