Big Data Handling: Apache Accumulo

Introduction:

Accumulo is a sorted, distributed key-value store built on top of Apache Hadoop, ZooKeeper, and Apache Thrift. It provides cell-level access labels and a server-side programming mechanism (iterators) that can modify key-value pairs at scan time and during compactions.
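To make "sorted" concrete: Accumulo orders every entry by its key, which is made up of a row, a column family, a column qualifier, a column visibility (the cell-level access label), and a timestamp. The Java sketch below is a simplified stand-in, not Accumulo's own org.apache.accumulo.core.data.Key class. CellKey and KeyOrderDemo are hypothetical names, the fields are plain Strings rather than the raw bytes Accumulo compares, and the real key's delete flag is left out. It compares the fields in that order and sorts newer timestamps first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of an Accumulo key. The real class,
// org.apache.accumulo.core.data.Key, compares raw bytes and also has a delete flag.
final class CellKey implements Comparable<CellKey> {
    final String row;
    final String family;
    final String qualifier;
    final String visibility; // the cell-level access label, e.g. "admin|audit"
    final long timestamp;

    CellKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(CellKey other) {
        int c = row.compareTo(other.row);
        if (c == 0) c = family.compareTo(other.family);
        if (c == 0) c = qualifier.compareTo(other.qualifier);
        if (c == 0) c = visibility.compareTo(other.visibility);
        // Timestamps sort in descending order, so the newest version of a cell comes first.
        if (c == 0) c = Long.compare(other.timestamp, timestamp);
        return c;
    }

    @Override
    public String toString() {
        return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
    }
}

class KeyOrderDemo {
    static List<CellKey> sorted(List<CellKey> keys) {
        List<CellKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>();
        keys.add(new CellKey("row2", "info", "name", "public", 5L));
        keys.add(new CellKey("row1", "info", "name", "public", 1L));
        keys.add(new CellKey("row1", "info", "name", "public", 9L));
        for (CellKey k : sorted(keys)) {
            System.out.println(k);
        }
        // prints:
        // row1 info:name [public] 9
        // row1 info:name [public] 1
        // row2 info:name [public] 5
    }
}
```

Keeping the newest version first is what lets a reader stop at the first matching entry when only the latest value of a cell is wanted.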

We will cover the following topics regarding the Accumulo database:

· Installation and configuration of Accumulo.

· Running Accumulo.

· Example of usage.

· Java API.

Installation and configuration of Accumulo:

1. The tool chain requirements for Accumulo are Java (1.6 or higher), Hadoop, and Zookeeper (3.3.3 or higher). In this tutorial we use Java 1.7.0, Hadoop 1.0.3, and Zookeeper 3.4.5.

2. Download Accumulo from http://accumulo.apache.org/downloads/.

3. Extract the Accumulo files, e.g. to /specific/disk1/temp/Accumulo/.

4. Download Zookeeper from http://zookeeper.apache.org/.

5. Extract the Zookeeper files, e.g. to /specific/disk1/temp/Zookeeper/.

6. Go to the conf folder in the Zookeeper directory and create a file called zoo.cfg.

Insert the following lines inside zoo.cfg:

tickTime=2000
maxClientCnxns=100
# change dataDir to the directory where you would like Zookeeper to keep its data files
# e.g. dataDir=/specific/disk1/temp/Zookeeper/conf/zookeeper
dataDir=/var/zookeeper
clientPort=2181

Save the file and close it.
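Zookeeper reads zoo.cfg as an ordinary Java properties file, so java.util.Properties can parse it, which makes a quick pre-start check easy. A minimal sketch, assuming the four settings above are the ones you care about; ZooCfgCheck is a hypothetical helper, not part of Zookeeper:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: zoo.cfg is a Java properties file, so Properties can parse it.
class ZooCfgCheck {
    // The four settings used in this post; extend the list as needed.
    static final String[] EXPECTED = {"tickTime", "maxClientCnxns", "dataDir", "clientPort"};

    static List<String> missingKeys(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText)); // lines starting with '#' are comments
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected when reading from a String
        }
        List<String> missing = new ArrayList<>();
        for (String key : EXPECTED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String cfg = "tickTime=2000\n"
                + "maxClientCnxns=100\n"
                + "# where Zookeeper keeps its data\n"
                + "dataDir=/var/zookeeper\n"
                + "clientPort=2181\n";
        System.out.println("missing: " + missingKeys(cfg)); // prints "missing: []"
    }
}
```

To check the real file, pass in the contents of conf/zoo.cfg, e.g. new String(Files.readAllBytes(Paths.get("conf/zoo.cfg")), StandardCharsets.UTF_8).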

7. Download Hadoop from http://hadoop.apache.org/.

8. Extract the Hadoop files, e.g. to /specific/disk1/temp/Hadoop/.

9. Go to the conf folder in the Hadoop directory and edit the following files:

· Insert the following lines inside core-site.xml:

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

· Insert the following lines inside hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

· Insert the following lines inside mapred-site.xml:

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>

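· If you would like to change the default location where Hadoop stores its local data files, add the following property inside the <configuration> element of core-site.xml (replace DesignatedPath with the directory you want):

<property>
<name>hadoop.tmp.dir</name>
<value>DesignatedPath</value>
</property>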
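A misspelled property name or a missing opening tag in these files is easy to overlook. The sketch below uses only the JDK's built-in DOM parser to load a *-site.xml document and return its <name>/<value> pairs for inspection. SiteXmlReader is a hypothetical helper; it assumes the <configuration>/<property> layout used in these files and rejects a malformed file with an IllegalArgumentException.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the <name>/<value> pairs of every <property>
// in a Hadoop *-site.xml document, preserving file order.
class SiteXmlReader {
    static Map<String, String> readProperties(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("not a well-formed site file", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) {
                result.put(name, childText(property, "value")); // null if <value> is missing
            }
        }
        return result;
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String coreSite = "<configuration><property>"
                + "<name>fs.default.name</name><value>hdfs://localhost:9000</value>"
                + "</property></configuration>";
        System.out.println(readProperties(coreSite)); // prints {fs.default.name=hdfs://localhost:9000}
    }
}
```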

10. Go to the Accumulo directory and copy one of the example configurations from the conf/examples folder into the conf folder:

e.g.: cp conf/examples/512MB/native-standalone/* conf

If you are configuring a larger cluster, you will need to create the configuration files yourself and propagate the changes to the $ACCUMULO_HOME/conf directory on every machine:

· Create a "slaves" file in $ACCUMULO_HOME/conf/.

This is a list of machines where the tablet servers and loggers will run.

· Create a "masters" file in $ACCUMULO_HOME/conf/.

This is a list of machines where the master server will run.

· Create conf/accumulo-env.sh following the template of conf/examples/3GB/native-standalone/accumulo-env.sh.
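The slaves and masters files are plain host lists. A small sketch for reading one, assuming one host name or IP address per line. HostListFile is a hypothetical helper, and skipping blank lines and lines starting with # is this sketch's own choice rather than a documented Accumulo rule.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a host-list file such as conf/slaves or conf/masters,
// assuming one host name or IP address per line.
class HostListFile {
    static List<String> parse(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            String host = line.trim();
            // Skipping blanks and '#' lines is an assumption of this sketch.
            if (!host.isEmpty() && !host.startsWith("#")) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    static List<String> read(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // ACCUMULO_HOME comes from the environment; "." is only a fallback for trying this out.
        String home = System.getenv("ACCUMULO_HOME") != null ? System.getenv("ACCUMULO_HOME") : ".";
        Path slaves = Paths.get(home, "conf", "slaves");
        if (Files.exists(slaves)) {
            System.out.println("tablet servers: " + read(slaves));
        } else {
            System.out.println("no slaves file at " + slaves);
        }
    }
}
```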

11. Edit the JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME values in conf/accumulo-env.sh so that each one points to the corresponding installation directory:

e.g.: ZOOKEEPER_HOME=/specific/disk1/temp/Zookeeper
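A common cause of startup failures is one of these variables pointing at the wrong folder. This sketch checks that JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME are set and name existing directories. AccumuloEnvCheck is hypothetical, and it only sees the variables in the map it is given (for example System.getenv() in a shell where accumulo-env.sh has been sourced).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports variables that are unset or do not name a directory.
class AccumuloEnvCheck {
    static final String[] REQUIRED = {"JAVA_HOME", "HADOOP_HOME", "ZOOKEEPER_HOME"};

    static List<String> problems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                problems.add(name + " is not set");
            } else if (!new File(value).isDirectory()) {
                problems.add(name + " is not a directory: " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = problems(System.getenv());
        System.out.println(problems.isEmpty() ? "environment looks OK" : problems.toString());
    }
}
```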

12. Edit conf/accumulo-site.xml and set the Zookeeper servers in the instance.zookeeper.host property.

Set the property's value to a comma-separated list of the IP addresses of the machines running Zookeeper (at least one machine must run Zookeeper).

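E.g. a configuration with one Zookeeper server (replace ZookeeperIP with the server's IP address):

<property>
<name>instance.zookeeper.host</name>
<value>ZookeeperIP:2181</value>
<description>comma separated list of zookeeper servers</description>
</property>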

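E.g. a configuration with two Zookeeper servers (replace ZookeeperIP1 and ZookeeperIP2 with the servers' IP addresses):

<property>
<name>instance.zookeeper.host</name>
<value>ZookeeperIP1:2181,ZookeeperIP2:2181</value>
<description>comma separated list of zookeeper servers</description>
</property>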
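The value of instance.zookeeper.host is a comma-separated list of host[:port] entries. The sketch below normalizes such a value, filling in 2181 (Zookeeper's default client port, and the clientPort set in zoo.cfg earlier) when a port is omitted, and rejects an empty list. ZkHostList is a hypothetical helper; it does not handle IPv6 literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "host[:port],host[:port],..." into explicit host:port entries.
class ZkHostList {
    static final int DEFAULT_PORT = 2181;

    static List<String> normalize(String value) {
        List<String> servers = new ArrayList<>();
        for (String part : value.split(",")) {
            String s = part.trim();
            if (s.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = s.lastIndexOf(':');
            if (colon < 0) {
                servers.add(s + ":" + DEFAULT_PORT);
            } else {
                // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) if not numeric.
                int port = Integer.parseInt(s.substring(colon + 1));
                if (port < 1 || port > 65535) {
                    throw new IllegalArgumentException("bad port in " + s);
                }
                servers.add(s.substring(0, colon) + ":" + port);
            }
        }
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one Zookeeper server is required");
        }
        return servers;
    }

    public static void main(String[] args) {
        System.out.println(normalize("192.168.1.10, 192.168.1.11:2181"));
        // prints [192.168.1.10:2181, 192.168.1.11:2181]
    }
}
```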

Running Accumulo:

1. Now let's bring up the servers.

Once Zookeeper and Hadoop are configured correctly on the machine, start the Zookeeper, Hadoop, and Accumulo servers, in that order.

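Run Zookeeper (from the Zookeeper directory): bin/zkServer.sh start (you may stop it with: bin/zkServer.sh stop)

Run Hadoop (from the Hadoop directory):

o bin/hadoop namenode -format (first run only: formatting wipes the existing HDFS namespace)

o bin/start-all.sh (you may stop it with: bin/stop-all.sh)

Run Accumulo (from the Accumulo directory):

o bin/accumulo init (first run only: enter an instance name and a root password when prompted; in our example we set both to accum)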

o bin/start-all.sh (you may stop it with: bin/stop-all.sh)
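If bin/accumulo init cannot reach Zookeeper, it helps to confirm Zookeeper is answering first. Zookeeper replies imok to the four-letter command ruok when it is running normally; Zookeeper 3.4.5 answers it by default, while newer releases may require enabling it through the 4lw.commands.whitelist setting. A minimal sketch of that probe (ZkRuok is a hypothetical name):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends "ruok" to a Zookeeper client port and expects "imok" back.
class ZkRuok {
    static boolean isOk(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // a server in an error state may not reply at all
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // server closed the connection
                }
                n += r;
            }
            return new String(buf, 0, n, StandardCharsets.US_ASCII).equals("imok");
        } catch (IOException e) {
            return false; // refused, timed out, or closed: treat as "not running"
        }
    }

    public static void main(String[] args) {
        System.out.println("Zookeeper on localhost:2181 ok? " + isOk("localhost", 2181, 2000));
    }
}
```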

2. You can check that Hadoop is running correctly through its NameNode web page:

http://localhost:50070


3. You can check that Accumulo is running correctly through its monitor page:

http://localhost:50095
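These checks can also be scripted instead of opened in a browser. This sketch requests a page and reports whether it answers with an HTTP 2xx or 3xx status, using the NameNode (50070) and Accumulo monitor (50095) URLs. MonitorCheck is a hypothetical helper, and a response only shows that the web server is up, not that every tablet server is healthy.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: true if the URL answers with a 2xx or 3xx HTTP status.
class MonitorCheck {
    static boolean responds(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setInstanceFollowRedirects(false); // a redirect still counts as "up"
            int code = conn.getResponseCode();
            conn.disconnect();
            return code >= 200 && code < 400;
        } catch (IOException e) {
            return false; // connection refused or timed out
        }
    }

    public static void main(String[] args) {
        String[] pages = {"http://localhost:50070", "http://localhost:50095"};
        for (String page : pages) {
            System.out.println(page + (responds(page, 2000) ? " is up" : " is not responding"));
        }
    }
}
```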



Tuesday, May 6, 2014

Apache Accumulo - Installation
