Getting file name of input block to Mapper in Hadoop
By default, Hadoop splits each input file into blocks (64 MB in Hadoop 1.x,
128 MB in later releases), and each block is processed by a separate mapper
task. To gather metrics per file rather than across the entire set of input
files, the mapper needs to know the name of the file it is reading. Here is
how to extract the file name of the split being processed.
Using the old MR API
Add the below to the mapper class.
|
String fileName = "";

@Override
public void configure(JobConf job) {
    // The framework sets "map.input.file" to the path of the file behind the current split
    fileName = job.get("map.input.file");
}
|
Using the new MR API
Add the below to the mapper class.
|
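// FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit
String fileName = "";

@Override
protected void setup(Context context) throws java.io.IOException,
        java.lang.InterruptedException {
    // The cast is valid when the input format produces FileSplits (e.g. TextInputFormat)
    fileName = ((FileSplit) context.getInputSplit()).getPath().toString();
}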
|
The fileName field can now be used anywhere in the mapper code, for example
as part of the output key when collecting per-file metrics.
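Both snippets store the full path of the file, not just its name, and a file
larger than one block is read by several mappers that all see the same path.
The sketch below shows, in plain Java without any Hadoop dependency, how the
stored string could be reduced to a bare file name and used to tally records
per file. The class SplitFileNames, its methods, and the hdfs://namenode:8020
paths are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: it only works on the plain path
// string that the mapper stored in fileName.
class SplitFileNames {

    // Returns the last path component, e.g. "a.log" for ".../logs/a.log".
    static String baseName(String fullPath) {
        int slash = fullPath.lastIndexOf('/');
        return slash < 0 ? fullPath : fullPath.substring(slash + 1);
    }

    // Counts records per file name. Each array entry stands in for the
    // fileName value seen while processing one record.
    static Map<String, Integer> countPerFile(String[] recordPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : recordPaths) {
            counts.merge(baseName(path), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up paths; two records come from a.log, one from b.log.
        String[] paths = {
            "hdfs://namenode:8020/logs/a.log",
            "hdfs://namenode:8020/logs/a.log",
            "hdfs://namenode:8020/logs/b.log"
        };
        System.out.println(countPerFile(paths)); // prints {a.log=2, b.log=1}
    }
}
```

Inside a real mapper with the new API, the same bare name is available
directly from the split through Hadoop's Path.getName(), that is,
((FileSplit) context.getInputSplit()).getPath().getName(), so the string
handling above is only needed when working from the stored path string.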