Tuesday, May 6, 2014

Apache Accumulo - Installation

Introduction:
Accumulo is a sorted, distributed key-value store built on top of Apache Hadoop, ZooKeeper, and Apache Thrift. Accumulo provides cell-level access labels and server-side programming mechanisms.
We will cover the following topics regarding the Accumulo database:
·         Installation and configuration of Accumulo.
·         Running Accumulo. 
·         Example of usage.
·         Java API


1.        Tool chain requirements for Accumulo are: Java (1.6 or higher), Hadoop, and Zookeeper (3.3.3 or higher). In this tutorial we use Java version 1.7.0, Hadoop version 1.0.3, and Zookeeper version 3.4.5.
2.        Download Accumulo files from here: http://accumulo.apache.org/downloads/.
3.        Extract the Accumulo files, e.g. to /specific/disk1/temp/Accumulo/.
4.        Download Zookeeper from http://zookeeper.apache.org/.
5.        Extract the Zookeeper files, e.g. to /specific/disk1/temp/Zookeeper/.
6.        Go to the conf folder in the zookeeper directory and create a file called zoo.cfg.
Insert the following lines inside zoo.cfg:
tickTime=2000
maxClientCnxns=100
dataDir=/var/zookeeper
clientPort=2181
# change dataDir to the directory where you would like zookeeper to keep its data files
# e.g. dataDir=/specific/disk1/temp/zookeeper/conf/zookeeper
Save the file and close it.
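The same file can also be generated from the shell. The following is a minimal sketch, not the only way to do it: ZK_HOME and ZK_DATA are names assumed for this example (they default to folders under the current directory), so point them at your own Zookeeper folder and data folder before running it.

```shell
#!/bin/sh
# Sketch: write conf/zoo.cfg with the settings from step 6.
# ZK_HOME and ZK_DATA are assumed names; adjust them to your layout.
ZK_HOME="${ZK_HOME:-$PWD/zookeeper}"
ZK_DATA="${ZK_DATA:-$ZK_HOME/data}"

# Make sure the conf folder and the data folder exist.
mkdir -p "$ZK_HOME/conf" "$ZK_DATA"

# Overwrites any existing zoo.cfg in that folder.
cat > "$ZK_HOME/conf/zoo.cfg" <<EOF
tickTime=2000
maxClientCnxns=100
dataDir=$ZK_DATA
clientPort=2181
EOF
```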
7.        Download Hadoop from http://hadoop.apache.org/.
8.        Extract the Hadoop files, e.g. to /specific/disk1/temp/Hadoop/.
9.        Go to the conf folder in the hadoop directory and edit the following files:
·            Insert the following lines inside core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>



·            Insert the following lines inside hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

·            Insert the following lines inside mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>

·            If you would like to change the default place where Hadoop stores its local data files, add the following property to hdfs-site.xml (replace DesignatedPath with the directory of your choice):
<configuration>
<property>
<name>dfs.data.dir</name>
<value>DesignatedPath</value>
</property>
</configuration>
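The three single-property files from step 9 can also be written from the shell. This is only a sketch under assumptions: write_site_xml is a helper made up for this example, and HADOOP_CONF is an assumed variable that defaults to a folder under the current directory. Note that it overwrites the target files, so it suits a fresh installation only.

```shell
#!/bin/sh
# write_site_xml FILE NAME VALUE
# Hypothetical helper: writes a Hadoop config file holding one property.
write_site_xml() {
  cat > "$1" <<EOF
<configuration>
<property>
<name>$2</name>
<value>$3</value>
</property>
</configuration>
EOF
}

# HADOOP_CONF is an assumed name for the conf folder of the Hadoop directory.
HADOOP_CONF="${HADOOP_CONF:-$PWD/hadoop/conf}"
mkdir -p "$HADOOP_CONF"

# Same names and values as in step 9.
write_site_xml "$HADOOP_CONF/core-site.xml"   fs.default.name    hdfs://localhost:9000
write_site_xml "$HADOOP_CONF/hdfs-site.xml"   dfs.replication    1
write_site_xml "$HADOOP_CONF/mapred-site.xml" mapred.job.tracker localhost:9001
```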
           
10.     Go to the Accumulo directory and copy one of the configurations available in the conf/examples folder to the conf folder:
e.g: "cp conf/examples/512MB/native-standalone/* conf"
If you are configuring a larger cluster you will need to create the configuration files yourself and propagate them to the $ACCUMULO_HOME/conf directory on each machine:
·         Create a "slaves" file in $ACCUMULO_HOME/conf/.
This is a list of machines where tablet servers and loggers will run.
 
·         Create a "masters" file in $ACCUMULO_HOME/conf/.
This is a list of machines where the master server will run. 
·         Create conf/accumulo-env.sh following the template of conf/examples/3GB/native-standalone/accumulo-env.sh.
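As a rough sketch, the "masters" and "slaves" files can be written with one host name per line. The host names below are placeholders, and write_hosts is a helper invented for this example; when ACCUMULO_HOME is not set, the sketch falls back to a folder under the current directory.

```shell
#!/bin/sh
# write_hosts FILE HOST...
# Hypothetical helper: writes one host name per line, replacing the file.
write_hosts() {
  file=$1
  shift
  printf '%s\n' "$@" > "$file"
}

# Falls back to ./accumulo when ACCUMULO_HOME is not set (assumption).
ACCUMULO_CONF="${ACCUMULO_HOME:-$PWD/accumulo}/conf"
mkdir -p "$ACCUMULO_CONF"

# Placeholder host names; replace them with your own machines.
write_hosts "$ACCUMULO_CONF/masters" master1.example.com
write_hosts "$ACCUMULO_CONF/slaves"  worker1.example.com worker2.example.com
```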
 
11.     Edit the JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME values in conf/accumulo-env.sh and point each of them to the corresponding installation folder:
e.g: ZOOKEEPER_HOME=/specific/disk1/temp/zookeeper
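A wrong path here usually shows up only when the servers fail to start, so it may help to check the folders first. The helper below is a small sketch; check_home is a name made up for this example and is not part of Accumulo.

```shell
#!/bin/sh
# check_home NAME PATH
# Hypothetical helper: reports whether PATH is an existing directory.
# Returns 0 when it exists and 1 otherwise.
check_home() {
  if [ -d "$2" ]; then
    echo "$1 ok: $2"
  else
    echo "$1 missing: $2" >&2
    return 1
  fi
}
```

For example, after setting the three variables in your shell, run check_home JAVA_HOME "$JAVA_HOME", then the same for HADOOP_HOME and ZOOKEEPER_HOME.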

12.     Edit conf/accumulo-site.xml and set the zookeeper servers in the instance.zookeeper.host property.
Set the value of the property to the IP addresses of the machines that run zookeeper (you need at least one machine running zookeeper).
E.g. configuration with 1 zookeeper server:
<property>
      <name>instance.zookeeper.host</name>
      <value>132.67.104.169:2181</value>
      <description>comma separated list of zookeeper servers</description>
    </property>
E.g. configuration with 2 zookeeper servers:
<property>
      <name>instance.zookeeper.host</name>
      <value>132.67.104.169:2181,132.67.104.158:2181</value>
      <description>comma separated list of zookeeper servers</description>
    </property>
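With more than a couple of servers, the comma separated value is easy to mistype. The sketch below builds it from a port and a list of addresses; zk_hosts is a helper name invented for this example.

```shell
#!/bin/sh
# zk_hosts PORT HOST...
# Hypothetical helper: joins the hosts into "host:port,host:port,...".
zk_hosts() {
  port=$1
  shift
  out=""
  for h in "$@"; do
    # Add a comma before every entry except the first one.
    out="${out:+$out,}$h:$port"
  done
  printf '%s\n' "$out"
}

# The two servers from the example above.
zk_hosts 2181 132.67.104.169 132.67.104.158
# prints 132.67.104.169:2181,132.67.104.158:2181
```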


1.        Now let's bring up the Accumulo server.
Once Zookeeper and Hadoop are configured correctly on the machine, you may start the Zookeeper, Hadoop and Accumulo servers.

Run Zookeeper: bin/zkServer.sh start (you may stop it with: bin/zkServer.sh stop)
 



Run Hadoop:
o    bin/hadoop namenode -format (needed only once, before the first start)
o    bin/start-all.sh (you may stop it with: bin/stop-all.sh)

 


Run Accumulo:
o    bin/accumulo init (enter the instance name and password when prompted; in our example we set them to accum/accum)
o    bin/start-all.sh (you may stop it with: bin/stop-all.sh)
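Once the one-time format and init commands have been run, the start order above can be wrapped in a small script. This is a sketch under assumptions: run and start_stack are names made up for this example, DRY_RUN is an invented switch that defaults to 1 (print the commands only), and ZOOKEEPER_HOME, HADOOP_HOME and ACCUMULO_HOME are expected to point at the three installation folders.

```shell
#!/bin/sh
# run CMD...
# Prints the command; executes it only when DRY_RUN is set to something
# other than 1 (DRY_RUN is a switch invented for this sketch).
run() {
  echo "+ $*"
  [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

# start_stack
# Starts Zookeeper, then Hadoop, then Accumulo, stopping at the first failure.
# The format and init commands are left out because they are run only once.
start_stack() {
  run "$ZOOKEEPER_HOME/bin/zkServer.sh" start &&
  run "$HADOOP_HOME/bin/start-all.sh" &&
  run "$ACCUMULO_HOME/bin/start-all.sh"
}
```

Calling start_stack prints the three commands; calling it with DRY_RUN=0 set actually runs them.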


2.        You may check that Hadoop runs correctly through the monitor page (in our case the port number is 50070).
[Screenshot of the Hadoop monitor page]

3.        You may check that Accumulo runs correctly through the monitor page.
[Screenshot of the Accumulo monitor page]
