Format issues with Mapper and Reducer in Hadoop map/reduce programming (repost)
http://www.linuxidc.com/Linux/2012-01/51627.htm
Hadoop's map/reduce programming model has several key components, among them Mapper, Reducer, InputFormat, OutputFormat, OutputKeyClass and OutputValueClass. When you first start writing map/reduce programs, it is very easy to hit errors because InputFormat, OutputFormat, OutputKeyClass or OutputValueClass is set incorrectly. With enough trial and error the job may eventually run, but the results may not match expectations; and even when they do, you still haven't understood how the mapred framework actually handles these settings internally, so the same problem comes back the next time. So I decided to bite the bullet and study the Hadoop source code to understand the implementation details. Notes follow:
Error description
Take one mapreduce run as an example. Say we run the command:
bin/hadoop jar hadoop-0.19.0-streaming.jar -input /home/luoli/input/goodman -output /home/luoli/output -mapper org.apache.hadoop.mapred.lib.IdentityMapper -reducer /usr/bin/wc
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:548)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
// IdentityMapper simply forwards each input pair unchanged:
public void map(K key, V val, OutputCollector<K, V> output, Reporter reporter)
    throws IOException {
  output.collect(key, val);
}

// MapOutputBuffer.collect() checks the runtime type of every emitted pair:
public synchronized void collect(K key, V value) throws IOException {
  ...
  if (key.getClass() != keyClass) {
    throw new IOException("Type mismatch in key from map: expected "
        + keyClass.getName() + ", recieved " + key.getClass().getName());
  }
  if (value.getClass() != valClass) {
    throw new IOException("Type mismatch in value from map: expected "
        + valClass.getName() + ", recieved " + value.getClass().getName());
  }
  ...
}

// where keyClass and valClass come from the job configuration:
keyClass = (Class<K>) job.getMapOutputKeyClass();
valClass = (Class<V>) job.getMapOutputValueClass();
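The check above can be reproduced with a plain-Java sketch that needs no Hadoop jars. The `Text` and `LongWritable` classes here are empty stand-ins for the real Hadoop writables, and `CollectCheckDemo` is a hypothetical helper, not framework code:

```java
import java.io.IOException;

// Empty stand-ins for the Hadoop writable types (assumptions, not the real classes).
class Text {}
class LongWritable {}

public class CollectCheckDemo {
    // Mimics the check in MapTask$MapOutputBuffer.collect(): the runtime class of the
    // emitted key must be exactly the configured map-output key class.
    static void collect(Object key, Class<?> keyClass) throws IOException {
        if (key.getClass() != keyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                    + keyClass.getName() + ", recieved " + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        // The job configuration says Text, but IdentityMapper passes through the
        // LongWritable file offset that TextInputFormat produces as the key.
        try {
            collect(new LongWritable(), Text.class);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is exactly the mismatch in the stack trace above: the mapper emits a key whose runtime class differs from the configured one, and collect() rejects it.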
public Class<?> getMapOutputKeyClass() {
  Class<?> retv = getClass("mapred.mapoutput.key.class", null, Object.class);
  if (retv == null) {
    retv = getOutputKeyClass();
  }
  return retv;
}

public Class<?> getOutputKeyClass() {
  return getClass("mapred.output.key.class", LongWritable.class, Object.class);
}

The fix is therefore to append a -jobconf setting to the command bin/hadoop jar hadoop-0.19.0-streaming.jar -input /home/luoli/input/goodman -output /home/luoli/output -mapper org.apache.hadoop.mapred.lib.IdentityMapper -reducer /usr/bin/wc, telling the framework that the key type the map will emit is LongWritable: -jobconf mapred.mapoutput.key.class=org.apache.hadoop.io.LongWritable, or equivalently -D mapred.mapoutput.key.class=org.apache.hadoop.io.LongWritable.
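The fallback chain in getMapOutputKeyClass() can be sketched with an ordinary java.util.Properties object standing in for JobConf. The property names are the real old-API keys; the helper class and the String return type (instead of Class) are simplifications for illustration:

```java
import java.util.Properties;

public class KeyClassFallbackDemo {
    // Mirrors JobConf.getMapOutputKeyClass(): use mapred.mapoutput.key.class if set,
    // otherwise fall back to mapred.output.key.class, whose own default is LongWritable.
    static String getMapOutputKeyClass(Properties conf) {
        String retv = conf.getProperty("mapred.mapoutput.key.class");
        if (retv == null) {
            retv = conf.getProperty("mapred.output.key.class",
                    "org.apache.hadoop.io.LongWritable");
        }
        return retv;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // The streaming job has set the job output key class to Text (this is why the
        // stack trace says "expected org.apache.hadoop.io.Text"), so the fallback
        // makes the map-output key class Text as well:
        conf.setProperty("mapred.output.key.class", "org.apache.hadoop.io.Text");
        System.out.println(getMapOutputKeyClass(conf));
        // Passing -jobconf mapred.mapoutput.key.class=... short-circuits the fallback:
        conf.setProperty("mapred.mapoutput.key.class", "org.apache.hadoop.io.LongWritable");
        System.out.println(getMapOutputKeyClass(conf));
    }
}
```

This shows why the -jobconf override works: once mapred.mapoutput.key.class is set explicitly, the fallback to the job's output key class never happens, and the type check in collect() sees LongWritable as expected.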
@Override
public void map(LongWritable key, Text value, Context context)  // Context is an inner class of Mapper
    throws IOException, InterruptedException {
  super.map(key, value, context);  // this is the line in question
}