
Writing a Simple MapReduce Program and Running It on Hadoop 2.2.0

2013-10-30

After several days of fiddling, I finally got Hadoop 2.2.0 configured (for how to deploy Hadoop on a Linux platform, see the post "Deploying a Hadoop 2.2.0 Pseudo-Distributed Platform on Fedora" on this blog). Today I'll talk about how to run a MapReduce program we've written on a Hadoop 2.2.0 pseudo-distributed setup. First, here are the Maven dependencies the program relies on:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.1.1-beta</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.1.1-beta</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.1.1-beta</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.1.1-beta</version>
    </dependency>
</dependencies>
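Note that the versions above are 2.1.1-beta even though the target cluster runs Hadoop 2.2.0. The same artifacts are also published for 2.2.0, so you could bump each version to match the cluster (a sketch, not something the original setup was tested with):

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.2.0</version>
    </dependency>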

Now for the program itself. The code is as follows:

package com.wyp.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

import java.io.IOException;

/**
 * User: wyp
 * Date: 13-10-25
 * Time: 3:26 PM
 * Email: wyphao.2007@163.com
 */
public class MaxTemperatureMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {

        // Each input line is a fixed-width weather record: the year sits at
        // characters 15-19 and the signed air temperature (in tenths of a
        // degree) at characters 87-92.
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }

        // Only emit readings that are present and of acceptable quality
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            output.collect(new Text(year), new IntWritable(airTemperature));
        }
    }
}

package com.wyp.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

import java.io.IOException;
import java.util.Iterator;

/**
 * User: wyp
 * Date: 13-10-25
 * Time: 3:36 PM
 * Email: wyphao.2007@163.com
 */
public class MaxTemperatureReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        // Keep the largest temperature seen for this year
        int maxValue = Integer.MIN_VALUE;
        while (values.hasNext()) {
            maxValue = Math.max(maxValue, values.next().get());
        }

        output.collect(key, new IntWritable(maxValue));
    }
}

package com.wyp.hadoop;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

import java.io.IOException;

/**
 * User: wyp
 * Date: 13-10-25
 * Time: 3:40 PM
 * Email: wyphao.2007@163.com
 */
public class MaxTemperature {

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(1);
        }

        JobConf conf = new JobConf(MaxTemperature.class);
        conf.setJobName("Max Temperature");

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(MaxTemperatureMapper.class);
        conf.setReducerClass(MaxTemperatureReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        JobClient.runJob(conf);
    }
}
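As an aside on what the mapper is parsing: the input is a fixed-width weather record, with the year at character offsets 15-19, a sign character at 87, the air temperature (in tenths of a degree) at 88-92, and a quality code at 92. The following small standalone sketch (not part of the original program, purely illustrative) builds a synthetic record and applies the same substring logic:

import java.lang.StringBuilder;

public class RecordFormatDemo {
    public static void main(String[] args) {
        // Build a synthetic 93-character record filled with '0'
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 93) {
            sb.append('0');
        }
        sb.replace(15, 19, "1901");   // year field
        sb.replace(87, 92, "+0317");  // sign + temperature in tenths (+31.7)
        sb.replace(92, 93, "1");      // quality code accepted by [01459]
        String line = sb.toString();

        // Same parsing the mapper performs
        String year = line.substring(15, 19);
        int airTemperature = (line.charAt(87) == '+')
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        System.out.println(year + "\t" + airTemperature); // prints: 1901	317
    }
}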

Now compile and package the program above into a jar file, and we can deploy it on Hadoop 2.2.0 (this post assumes everyone has Hadoop 2.2.0 deployed already).
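If you build with the Maven setup above, one way to produce the jar is simply (a sketch; the post itself uses a jar built by IntelliJ IDEA, which is why the paths below point into IdeaProjects/.../out/artifacts/):

[wyp@wyp Hadoop]$ mvn clean package

By default Maven drops the jar under the project's target/ directory.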
Here is how the deployment goes. First, start Hadoop 2.2.0 with the following commands:

[wyp@wyp hadoop]$ sbin/start-dfs.sh
[wyp@wyp hadoop]$ sbin/start-yarn.sh

If you want to check whether Hadoop 2.2.0 came up successfully, run the following command:

[wyp@wyp hadoop]$ jps
9582 Main
9684 RemoteMavenServer
16082 Jps
7011 DataNode
7412 ResourceManager
7528 NodeManager
7222 SecondaryNameNode
6832 NameNode

Here jps is a command that ships with the JDK, under jdk/bin. If the processes above show up on your machine (the five processes NameNode, SecondaryNameNode, NodeManager, ResourceManager, and DataNode must all be present!), your Hadoop server started successfully. Now we can run the jar packaged above (here Hadoop.jar, whose absolute path is /home/wyp/IdeaProjects/Hadoop/out/artifacts/Hadoop_jar/Hadoop.jar; don't know what an absolute path is? Time to go brush up!). One prerequisite: the input file must already be on HDFS, so upload it first, as in the sketch below, and then run the job command that follows it.
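A minimal sketch of the upload, assuming the test file data.txt sits in the current local directory (the HDFS paths here match the ones used in the job command below):

[wyp@wyp Hadoop_jar]$ hadoop fs -mkdir -p /user/wyp
[wyp@wyp Hadoop_jar]$ hadoop fs -put data.txt /user/wyp/data.txt

With the input in place, run the following command: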

[wyp@wyp Hadoop_jar]$ /home/wyp/Downloads/hadoop/bin/hadoop jar \
           /home/wyp/IdeaProjects/Hadoop/out/artifacts/Hadoop_jar/Hadoop.jar \
           com/wyp/hadoop/MaxTemperature \
           /user/wyp/data.txt \
           /user/wyp/result

(The above is a single command; it's just too long, so I've wrapped it across lines. In practice, write it on one line!) Here /home/wyp/Downloads/hadoop/bin/hadoop is the absolute path of the hadoop binary; if the hadoop command is already on your PATH, you don't need to spell it out like this. com/wyp/hadoop/MaxTemperature is the entry point, i.e. the class holding the program's main function. /user/wyp/data.txt is an absolute path in the Hadoop file system (HDFS) (note: not an absolute path in your Linux system!) pointing at the file to analyze, i.e. the input. /user/wyp/result is the absolute path where the analysis results are written (again: this is an HDFS path, not a path in your Linux system!). Moreover, /user/wyp/result must not already exist, or an exception is thrown! This is a protection mechanism in Hadoop: you wouldn't want a job that ran for days to be accidentally overwritten, would you? So if /user/wyp/result exists, the program throws an exception. Quite sensible. If you do need to re-run a job into the same location, see the sketch just below for clearing out the old output directory first.
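A sketch of removing a stale output directory with the HDFS shell (careful: this deletes the previous results for good):

[wyp@wyp Hadoop_jar]$ hadoop fs -rm -r /user/wyp/result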

Entering the command above, however, did not give me the expected output: the job died with an exception (in my case, an error complaining that my classes could not be found). Googling around, I found others' take on the problem:
To sum up from personal experience, this is usually caused by one of the following:
(1) You wrote a Java lib, packaged it as a jar, and then wrote a Hadoop program whose mapper and reducer call into that jar.
(2) You wrote a Hadoop program that calls a third-party Java lib.
You then distributed your own jar or the third-party jar into the HADOOP_HOME directory of each TaskTracker, ran your Java program, and got the error above.

So how do we fix it? One clumsy approach is to run the following command before launching the Hadoop job:

[wyp@wyp Hadoop_jar]$ export \
    HADOOP_CLASSPATH=/home/wyp/IdeaProjects/Hadoop/out/artifacts/Hadoop_jar/

Here /home/wyp/IdeaProjects/Hadoop/out/artifacts/Hadoop_jar/ is the directory containing the Hadoop.jar file from above. Now run the Hadoop job command again; this time it completes, and you can inspect the input and output on HDFS:

[wyp@wyp Hadoop_jar]$ hadoop fs -ls /user/wyp
Found 2 items
-rw-r--r--   1 wyp supergroup    1777168 2013-10-25 17:44 /user/wyp/data.txt
drwxr-xr-x   - wyp supergroup          0 2013-10-28 15:35 /user/wyp/result
[wyp@wyp Hadoop_jar]$ hadoop fs -ls /user/wyp/result
Found 2 items
-rw-r--r--   1 wyp supergroup          0 2013-10-28 15:35 /user/wyp/result/_SUCCESS
-rw-r--r--   1 wyp supergroup         18 2013-10-28 15:35 /user/wyp/result/part-00000
[wyp@wyp Hadoop_jar]$ hadoop fs -cat /user/wyp/result/part-00000
1901    317
1902    244
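If you'd rather not export HADOOP_CLASSPATH in every shell session, one option (an assumption about your setup; adjust the path to your own jar directory) is to append the same setting to etc/hadoop/hadoop-env.sh so it is picked up automatically:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/wyp/IdeaProjects/Hadoop/out/artifacts/Hadoop_jar/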

And with that, a MapReduce program you wrote yourself is finally running!
The test data for the program can be downloaded from: http://pan.baidu.com/s/1iSacM

When reposting, please credit: reposted from 过往记忆 (http://www.wypblog.com/)
Permanent link to this article: Writing a Simple MapReduce Program and Running It on Hadoop 2.2.0 (http://www.wypblog.com/archives/789)
