本文共 3160 字,大约阅读时间需要 10 分钟。
0、本文承接上文
1、环境配置,本文的开发环境请直接参考
2、Mapper,Reducer参数解释
import java.io.IOException;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Mapper;import org.apache.hadoop.mapreduce.Reducer;public class Point3DMapper extends Mapper{ protected void map(Text key, Point3D value, Context context) throws IOException, InterruptedException{ context.write(key, value); }}
进入Mapper源代码可以看到
public class Mapper{ public class Context extends MapContext { public Context(Configuration conf, TaskAttemptID taskid, RecordReader reader, RecordWriter writer, OutputCommitter committer, StatusReporter reporter, InputSplit split) throws IOException, InterruptedException { super(conf, taskid, reader, writer, committer, reporter, split); } } /** * Called once at the beginning of the task. */ protected void setup(Context context ) throws IOException, InterruptedException { // NOTHING } /** * Called once for each key/value pair in the input split. Most applications * should override this, but the default is the identity function. */ @SuppressWarnings("unchecked") protected void map(KEYIN key, VALUEIN value, Context context) throws IOException, InterruptedException { context.write((KEYOUT) key, (VALUEOUT) value); }
即,/**
* Text -> KEYIN* Point3D -> VALUEIN* Text -> KEYOUT* Point3D -> VALUEOUT**/Mapper
而,// Text -> ball etc. -> KEYOUT
// value -> .5, 12.7, 9.0 etc. -> VALUEOUTcontext.write(key, value);
同理,我们查看Reducer源代码,可以看到,Mapper的KEYOUT类型和VALUEOUT类型,必须对应Reducer的KEYIN类型和VLUEIN类型。
public class Reducer{ public class Context extends ReduceContext { public Context(Configuration conf, TaskAttemptID taskid, RawKeyValueIterator input, Counter inputKeyCounter, Counter inputValueCounter, RecordWriter output, OutputCommitter committer, StatusReporter reporter, RawComparator comparator, Class keyClass, Class valueClass ) throws IOException, InterruptedException { super(conf, taskid, input, inputKeyCounter, inputValueCounter, output, committer, reporter, comparator, keyClass, valueClass); } } /** * Called once at the start of the task. */ protected void setup(Context context ) throws IOException, InterruptedException { // NOTHING } /** * This method is called once for each key. Most applications will define * their reduce class by overriding this method. The default implementation * is an identity function. */ @SuppressWarnings("unchecked") protected void reduce(KEYIN key, Iterable values, Context context ) throws IOException, InterruptedException { for(VALUEIN value: values) { context.write((KEYOUT) key, (VALUEOUT) value); } }
本文的目的是因为有些时候文档不是很清楚,而最好的方法是看源代码。“源码之前,了无秘密”。
对于MapReduce的细致执行流程,我推荐看例子经典,博客作者一流的。相信看完会有所收获!
转载地址:http://jvwob.baihongyu.com/