Monday, May 13, 2013

Map Reduce Testing Framework - MRUnit




MRUnit


MRUnit is a testing library for map-reduce jobs. For an ordinary Java program we have JUnit: we feed an input to a particular piece of code and check that it emits the desired output. Likewise, with MRUnit we supply the inputs and the expected outputs for the mapper and reducer classes and verify that they emit the desired output.

Advantages
1.       We can test our mapper and reducer from within the IDE (say, Eclipse) instead of packaging them as a jar and running it with the hadoop jar command.
2.       It saves a lot of time and resources by avoiding a failed run on the Hadoop cluster, where every run launches a real map-reduce job.

Jars needed

Download the latest versions of
    1.       The MRUnit jar from the Apache website.
    2.       The Mockito and JUnit jars.

Implementation

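Sample input data
CDRID     CDRType   Phone              StatusCode
655209    1         796764372490213    6
353415    0         356857119806206    4
835699    1         252280313968413    0

In the raw input each record is a single semicolon-separated line. As the sample record in the mapper code shows, the raw format carries one more field between Phone and StatusCode, for example 655209;1;796764372490213;804422938115889;6, so the status code is the fifth field.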

Requirement
For every record with CDR type 1 (an SMS CDR), fetch its status code and count how often each status code occurs.
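
Before bringing Hadoop into the picture, the requirement can be sketched in plain Java to see which counts the tests should expect. This is only an illustration: the class and method names are made up for this post, and because only one raw record is quoted in full, the fourth field of the other two records is filled with placeholder zeros.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the requirement; no Hadoop classes are involved.
class CdrStatusCount {

    // Counts the status code (fifth field) of every record whose CDR type (second field) is 1.
    static Map<String, Integer> countSmsStatus(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split(";");
            if (Integer.parseInt(fields[1]) == 1) {
                counts.merge(fields[4], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: the zeros in the fourth field of the last two records
        // are placeholders, not real data.
        List<String> records = Arrays.asList(
            "655209;1;796764372490213;804422938115889;6",
            "353415;0;356857119806206;000000000000000;4",
            "835699;1;252280313968413;000000000000000;0");
        System.out.println(countSmsStatus(records));
    }
}
```

With these three records, main prints {0=1, 6=1}: the record with CDR type 0 is skipped and each remaining status code occurs once.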

Mapper Class Implementation

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final Text status = new Text();
  private static final IntWritable addOne = new IntWritable(1);

  /**
   * Emits (SMS status code, 1) for every SMS CDR record.
   */
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Sample record format: 655209;1;796764372490213;804422938115889;6
    String[] line = value.toString().split(";");
    // CDR type 1 marks an SMS CDR; the status code is the fifth field
    if (Integer.parseInt(line[1]) == 1) {
      status.set(line[4]);
      context.write(status, addOne);
    }
  }
}

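Reducer Class Implementation

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  // Sums the 1s emitted by the mapper for each status code
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    context.write(key, new IntWritable(sum));
  }
}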

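MRUnit class implementation

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class MyMapReduceTest {
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

  @Before
  public void setUp() {
    MyMapper mapper = new MyMapper();
    MyReducer reducer = new MyReducer();
    mapDriver = MapDriver.newMapDriver(mapper);
    reduceDriver = ReduceDriver.newReduceDriver(reducer);
    mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
  }

  // "throws IOException" is declared because newer MRUnit releases declare it on runTest()
  @Test
  public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text(
        "655209;1;796764372490213;804422938115889;6"));
    mapDriver.withOutput(new Text("6"), new IntWritable(1));
    mapDriver.runTest();
  }

  @Test
  public void testReducer() throws IOException {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("6"), values);
    reduceDriver.withOutput(new Text("6"), new IntWritable(2));
    reduceDriver.runTest();
  }

  // Exercises mapper and reducer together: one SMS record in, one count out
  @Test
  public void testMapReduce() throws IOException {
    mapReduceDriver.withInput(new LongWritable(), new Text(
        "655209;1;796764372490213;804422938115889;6"));
    mapReduceDriver.withOutput(new Text("6"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}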

MapDriver Class
Functionality: This class allows you to test a Mapper instance. You provide the input key and value that should be sent to the Mapper, and the outputs you expect the Mapper to send to the collector for those inputs. When you call runTest(), the harness delivers the input to the Mapper and checks its outputs against the expected results.

Input – Input key and value to mapper
Output – Desired Output <Key, Value> from mapper.
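
To make the idea behind runTest() concrete, here is a rough plain-Java imitation of what a map harness does: hand one input to the map function, collect everything it emits, and compare that with the expected pairs. MiniMapHarness, MapFn, pair and SMS_STATUS_MAPPER are hypothetical names invented for this sketch; it is not how MRUnit is implemented and it does not use the MRUnit API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map.Entry;
import java.util.function.BiConsumer;

// Hypothetical mini-harness; it imitates the idea of MapDriver, not its real API.
class MiniMapHarness {

    // A map function receives one input value and a collector for its output pairs.
    interface MapFn {
        void map(String value, BiConsumer<String, Integer> collector);
    }

    // Same logic as MyMapper, on plain Strings instead of Writables.
    static final MapFn SMS_STATUS_MAPPER = (value, collector) -> {
        String[] fields = value.split(";");
        if (Integer.parseInt(fields[1]) == 1) {
            collector.accept(fields[4], 1);
        }
    };

    // Convenience factory for an expected or emitted (key, value) pair.
    static Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    // Runs the map function on one input and returns everything it emitted, in order.
    static List<Entry<String, Integer>> run(MapFn fn, String input) {
        List<Entry<String, Integer>> emitted = new ArrayList<>();
        fn.map(input, (k, v) -> emitted.add(pair(k, v)));
        return emitted;
    }

    // Throws AssertionError if the emitted pairs differ from the expected ones.
    static void runTest(MapFn fn, String input, List<Entry<String, Integer>> expected) {
        List<Entry<String, Integer>> actual = run(fn, input);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        String sms = "655209;1;796764372490213;804422938115889;6";
        runTest(SMS_STATUS_MAPPER, sms, Collections.singletonList(pair("6", 1)));
        System.out.println("mapper output matched");
    }
}
```

The real MapDriver does the same kind of comparison, but with Hadoop Writable types and a mocked Context in place of the collector lambda.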

ReduceDriver Class

Functionality: This class allows you to test a Reducer instance. You provide a key and a set of intermediate values for that key, representing the inputs that should be sent to the Reducer (as if they came from a Mapper), and the outputs you expect the Reducer to send to the collector. When you call runTest(), the harness delivers the input to the Reducer and checks its outputs against the expected results.

Input – Input key and the list of values for that key, sent to the reducer
Output – Desired Output <Key, Value> from reducer.
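
The reducer side can be pictured the same way. The sketch below is a plain-Java stand-in, with hypothetical names (MiniReduceHarness, sumReduce), for the check a reduce harness performs: one key plus the list of values that would have arrived from the mappers goes in, and a single pair comes out and is compared with the expectation. It is not the MRUnit API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map.Entry;

// Hypothetical stand-in for the idea behind ReduceDriver; not the MRUnit API.
class MiniReduceHarness {

    // Same logic as MyReducer: sum all values that arrived for one key.
    static Entry<String, Integer> sumReduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return new SimpleEntry<String, Integer>(key, sum);
    }

    // Feeds one key with its values to the reducer and compares with the expected pair.
    static void runTest(String key, List<Integer> values, String expectedKey, int expectedSum) {
        Entry<String, Integer> actual = sumReduce(key, values);
        if (!actual.getKey().equals(expectedKey) || actual.getValue() != expectedSum) {
            throw new AssertionError("expected " + expectedKey + "=" + expectedSum
                + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Two 1s for status code 6, as in testReducer
        runTest("6", Arrays.asList(1, 1), "6", 2);
        System.out.println("reducer output matched");
    }
}
```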

MapReduceDriver Class

Functionality: This class allows you to test a Mapper and a Reducer instance together. You provide the input key and value that should be sent to the Mapper, and the outputs you expect the Reducer to send to the collector for those inputs. When you call runTest(), the harness delivers the input to the Mapper, feeds the intermediate results to the Reducer (without checking them), and checks the Reducer's outputs against the expected results.

Input – Key and Value sent to mapper.
Output – Expected result of reducer to the collector.
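
The combined flow described above (map, group the intermediate pairs by key, reduce, then compare only the final output) can be sketched in plain Java as follows. MiniMapReducePipeline and its methods are hypothetical names for this illustration and are not part of MRUnit; the grouping step stands in for the shuffle that Hadoop performs between the two phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the map -> group -> reduce flow; not the MRUnit API.
class MiniMapReducePipeline {

    static Map<String, Integer> run(List<String> records) {
        // Map phase: emit (statusCode, 1) for SMS records and group the pairs by key.
        // The intermediate pairs are passed on without being checked.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split(";");
            if (Integer.parseInt(fields[1]) == 1) {
                grouped.computeIfAbsent(fields[4], k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: sum the values of each key.
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            reduced.put(group.getKey(), sum);
        }
        return reduced;
    }

    // Only the reducer's final output is compared with the expectation.
    static void runTest(List<String> records, Map<String, Integer> expected) {
        Map<String, Integer> actual = run(records);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        String sms = "655209;1;796764372490213;804422938115889;6";
        // The same SMS record twice should give a count of 2 for status code 6
        runTest(Arrays.asList(sms, sms), Collections.singletonMap("6", 2));
        System.out.println("pipeline output matched");
    }
}
```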

Conclusion

Calling runTest() on each driver checks the mapper, the reducer, and the two combined as a map-reduce pipeline. If the actual output is the same as the expected output, your MRUnit test is a success.
