Saturday, June 1, 2013

Getting file name of input block to Mapper in Hadoop


By default, Hadoop splits each input file into 64 MB blocks, and each block is processed by a separate mapper task. To gather metrics per file rather than across the entire set of input files, the mapper needs to know the name of the file it is reading. Here is how to extract the file name of the split being processed.
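As a rough illustration of why a single file can reach several mappers, the sketch below estimates the number of map tasks for one non-empty file, assuming one task per 64 MB block. The class and method names are hypothetical and not part of Hadoop, and the real input format may fold a small trailing remainder into the last split, so treat the result as an approximation.

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
final class SplitEstimate {
    // 64 MB, the default block size mentioned above.
    static final long DEFAULT_BLOCK_SIZE_BYTES = 64L * 1024 * 1024;

    private SplitEstimate() {}

    // Approximates the number of map tasks for a non-empty file as
    // ceil(fileSizeBytes / blockSizeBytes).
    static long estimateMapTasks(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes <= 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 200L * 1024 * 1024; // a 200 MB input file
        // prints 4: three full 64 MB blocks plus one 8 MB remainder
        System.out.println(estimateMapTasks(fileSize, DEFAULT_BLOCK_SIZE_BYTES));
    }
}
```

Each of those map tasks reads a different part of the same file, which is why the file name has to be looked up inside the mapper.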

Using the old MR API

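Add the following field and method to the mapper class. The framework sets the map.input.file property for file-based input formats (newer Hadoop releases name it mapreduce.map.input.file and keep the old key as a deprecated alias).

// Full path of the input file whose split this mapper is processing.
private String fileName;

@Override
public void configure(JobConf job)
{
   fileName = job.get("map.input.file");
}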

Using the new MR API

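Add the following field and method to the mapper class. FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit, and the cast assumes a file-based input format.

// Full path of the input file whose split this mapper is processing.
private String fileName;

@Override
protected void setup(Context context) throws java.io.IOException, InterruptedException
{
   fileName = ((FileSplit) context.getInputSplit()).getPath().toString();
}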

The fileName field can now be used in the mapper code, for example as part of the output key when gathering per-file metrics.
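Both snippets store the full path of the file, not just its name. A minimal sketch of turning that string into a short per-file key and counting records per file is shown below. It is plain Java with no Hadoop dependency, and the class name, method names, and sample paths are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Hadoop API.
final class PerFileMetrics {
    private PerFileMetrics() {}

    // Returns the text after the last '/', e.g. "access.log" for
    // "hdfs://namenode:8020/logs/access.log". A string without '/' is
    // returned unchanged.
    static String baseName(String fullPath) {
        int slash = fullPath.lastIndexOf('/');
        return slash >= 0 ? fullPath.substring(slash + 1) : fullPath;
    }

    // Counts records per file, given the fileName value seen with each record.
    static Map<String, Long> countRecords(String[] fileNamePerRecord) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String fullPath : fileNamePerRecord) {
            counts.merge(baseName(fullPath), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] seen = {
            "hdfs://namenode:8020/logs/access.log",
            "hdfs://namenode:8020/logs/error.log",
            "hdfs://namenode:8020/logs/access.log"
        };
        // prints {access.log=2, error.log=1}
        System.out.println(countRecords(seen));
    }
}
```

Inside a real mapper, Hadoop's own Path.getName() returns the same final path component, so a helper like this is only needed when working with the string form. In an actual job the counting would normally happen in the reducer, with the mapper emitting the file name as the key.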
