
Hadoop Study Notes: Hadoop's Runtime Traces

2012-10-28 

Original post: http://www.cnblogs.com/end/archive/2011/04/26/2029497.html


Hadoop Study Notes, Part 1: An Introduction to HDFS

Hadoop Study Notes, Part 2: An Analysis of the HDFS Read/Write Process

Hadoop Study Notes, Part 3: Getting Started with Map-Reduce

Hadoop Study Notes, Part 4: An Analysis of the Map-Reduce Process

When using Hadoop you can run into all kinds of problems, and because Hadoop's runtime machinery is fairly complex, pinpointing the cause of a problem is often difficult.

This article traces the marks Hadoop leaves behind as it runs, so that when something goes wrong these traces can help you track the problem down.

1. Setting Up the Environment

To follow these traces we need to set up a special environment in which we can examine, step by step, the changes caused by each of the key steps discussed in the earlier articles of this series.

We first build a Hadoop cluster with one NameNode (namenode: 192.168.1.104) and three DataNodes (datanode01: 192.168.1.105, datanode02: 192.168.1.106, datanode03: 192.168.1.107), with the SecondaryNameNode running on the same machine as the NameNode.

The Hadoop installations on all four machines share the same basic configuration:

- The NameNode, SecondaryNameNode and JobTracker all run on namenode (192.168.1.104).
- The DataNodes, TaskTrackers, and the Map and Reduce task JVMs they spawn run on datanode01, datanode02 and datanode03.
- Every block of data is replicated three times.
- The data for HDFS and Map-Reduce is kept under /data/hadoopdir/tmp.

<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.104:9000</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.1.104:9001</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoopdir/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

Because the Map-Reduce phase is relatively complex, and because we want to attach a remote debugger to the Map and Reduce task JVMs so that we can step through them, the NameNode and the three DataNodes also get some differing settings:

For the NameNode:

- Set mapred.job.reuse.jvm.num.tasks to -1, so that all Map and Reduce tasks running on the same DataNode share a single JVM. This makes it easy to attach a remote debugger to that JVM, and avoids several task JVMs fighting over the same remote-debug port.
- The values of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum configured here are overridden by the per-DataNode settings described below.
- Set io.sort.mb to 1 MB (the default is 100 MB), so that during the Map phase the in-memory map output spills to files quickly and we can observe the map output.
- When setting mapred.child.java.opts, i.e. the JVM options for the task JVMs, add a remote-debug agent listening on port 8883.

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>io.sort.mb</name>
  <value>1</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -agentlib:jdwp=transport=dt_socket,address=8883,server=y,suspend=y</value>
</property>

<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.001</value>
</property>

<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.001</value>
</property>

<property>
  <name>io.sort.factor</name>
  <value>2</value>
</property>

For the DataNodes:

- On datanode01 (192.168.1.105), set the number of concurrent map tasks (mapred.tasktracker.map.tasks.maximum) to 1 and the number of concurrent reduce tasks (mapred.tasktracker.reduce.tasks.maximum) to 0.
- On datanode02 (192.168.1.106), set both the map task maximum and the reduce task maximum to 0.
- On datanode03 (192.168.1.107), set the map task maximum to 0 and the reduce task maximum to 1.
- The reason for this layout: although we can make several Map tasks share one JVM, we cannot make a Map task and a Reduce task share a JVM, so if a map-task JVM and a reduce-task JVM started on the same machine they would still collide on the remote-debug port. With the settings above, datanode01 runs only Map tasks, datanode03 runs only Reduce tasks, and datanode02 runs no tasks at all (its TaskTracker does not even need to start).
- For the Reduce task, set mapred.job.shuffle.input.buffer.percent and mapred.job.shuffle.merge.percent to 0.001, so that the intermediate results of the copy and merge phases spill to disk (the in-memory buffers being far too small) and leave traces we can inspect.
- Set io.sort.factor to 2, so that a merge is triggered even when the map task produces very little output.

Besides remote-debugging the Map and Reduce tasks, we also want to remote-debug the NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker daemons themselves, which requires a small change to the bin/hadoop script:

        if [ "$COMMAND" = "namenode" ] ; then

        ? CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'

        ? HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS -agentlib:jdwp=transport=dt_socket,address=8888,server=y,suspend=n"

        elif [ "$COMMAND" = "secondarynamenode" ] ; then

        ? CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'

        ? HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS -agentlib:jdwp=transport=dt_socket,address=8887,server=y,suspend=n"

        elif [ "$COMMAND" = "datanode" ] ; then

        ? CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'

        ? HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS -agentlib:jdwp=transport=dt_socket,address=8886,server=y,suspend=n"

        ……

        elif [ "$COMMAND" = "jobtracker" ] ; then

        ? CLASS=org.apache.hadoop.mapred.JobTracker

        ? HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOBTRACKER_OPTS -agentlib:jdwp=transport=dt_socket,address=8885,server=y,suspend=n"

        elif [ "$COMMAND" = "tasktracker" ] ; then

        ? CLASS=org.apache.hadoop.mapred.TaskTracker

        ? HADOOP_OPTS="$HADOOP_OPTS $HADOOP_TASKTRACKER_OPTS -agentlib:jdwp=transport=dt_socket,address=8884,server=y,suspend=n"

Before running any of the experiments, we first empty the /data/hadoopdir/tmp and logs directories.
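A minimal sketch of that cleanup, assuming the Hadoop installation directory is the current directory on every node:

# Run on the NameNode and on each DataNode before every experiment
rm -rf /data/hadoopdir/tmp/*
rm -rf logs/*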


2. Formatting HDFS

HDFS is formatted with the command: bin/hadoop namenode -format

This prints the following log:

10/11/20 19:52:21 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = namenode/192.168.1.104
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.19.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.19 -r 789657; compiled by 'root' on Tue Jun 30 12:40:50 EDT 2009
************************************************************/
10/11/20 19:52:21 INFO namenode.FSNamesystem: fsOwner=admin,sambashare
10/11/20 19:52:21 INFO namenode.FSNamesystem: supergroup=supergroup
10/11/20 19:52:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/11/20 19:52:21 INFO common.Storage: Image file of size 97 saved in 0 seconds.
10/11/20 19:52:21 INFO common.Storage: Storage directory /data/hadoopdir/tmp/dfs/name has been successfully formatted.
10/11/20 19:52:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode/192.168.1.104
************************************************************/

At this point the following directory tree appears under /data/hadoopdir/tmp on the NameNode:

+- dfs
   +- name
      +- current
         +- edits
         +- fsimage
         +- fstime
         +- VERSION
      +- image
         +- fsimage

At this point, /data/hadoopdir/tmp on the DataNodes is still empty.
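To see what the format step actually wrote, the VERSION file in the new name directory can be inspected directly (an optional check; the exact field values will differ from cluster to cluster):

cat /data/hadoopdir/tmp/dfs/name/current/VERSION
# Typically contains fields such as namespaceID, cTime, storageType and layoutVersion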


3. Starting Hadoop

Hadoop is started with the command bin/start-all.sh, which produces the following output:

        starting namenode, logging to logs/hadoop-namenode-namenode.out

        192.168.1.106: starting datanode, logging to logs/hadoop-datanode-datanode02.out

        192.168.1.105: starting datanode, logging to logs/hadoop-datanode-datanode01.out

        192.168.1.107: starting datanode, logging to logs/hadoop-datanode-datanode03.out

        192.168.1.104: starting secondarynamenode, logging to logs/hadoop-secondarynamenode-namenode.out

        starting jobtracker, logging to logs/hadoop-jobtracker-namenode.out

        192.168.1.106: starting tasktracker, logging to logs/hadoop-tasktracker-datanode02.out

        192.168.1.105: starting tasktracker, logging to logs/hadoop-tasktracker-datanode01.out

        192.168.1.107: starting tasktracker, logging to logs/hadoop-tasktracker-datanode03.out

From this output we can see that the script starts the NameNode, the three DataNodes, the SecondaryNameNode, the JobTracker, and the three TaskTrackers.

Next we run jps -l on the NameNode and on each of the three DataNodes to see which Java processes are actually running:

On the NameNode:


        22214 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode

        22107 org.apache.hadoop.hdfs.server.namenode.NameNode

        22271 org.apache.hadoop.mapred.JobTracker

On datanode01:

        12580 org.apache.hadoop.mapred.TaskTracker

        12531 org.apache.hadoop.hdfs.server.datanode.DataNode

On datanode02:

        10548 org.apache.hadoop.hdfs.server.datanode.DataNode

On datanode03:

        12593 org.apache.hadoop.hdfs.server.datanode.DataNode

        12644 org.apache.hadoop.mapred.TaskTracker

This matches our configuration exactly.

Once Hadoop has started, the /data/hadoopdir/tmp directory also changes, as ls -R shows.

On the NameNode:

- The name directory now contains an in_use.lock file, showing that the NameNode is running.
- A new namesecondary directory has appeared, which holds the SecondaryNameNode's data.


.:
dfs

./dfs:
name  namesecondary

./dfs/name:
current  image  in_use.lock

./dfs/name/current:
edits  fsimage  fstime  VERSION

./dfs/name/image:
fsimage

./dfs/namesecondary:
current  image  in_use.lock

./dfs/namesecondary/current:
edits  fsimage  fstime  VERSION

./dfs/namesecondary/image:
fsimage

On the DataNodes:

- Two new directories have appeared: dfs and mapred.
- dfs holds the HDFS block data.
- mapred holds the data needed to run Map-Reduce tasks.


.:
dfs  mapred

./dfs:
data

./dfs/data:
current  detach  in_use.lock  storage  tmp

./dfs/data/current:
dncp_block_verification.log.curr  VERSION

./dfs/data/detach:

./dfs/data/tmp:

./mapred:
local

./mapred/local:


Naturally, as Hadoop starts up, many new log files also appear under the logs directory.

On the NameNode, the logs are:

- NameNode logs:
  - hadoop-namenode-namenode.log: the log4j output
  - hadoop-namenode-namenode.out: the stdout and stderr output
- SecondaryNameNode logs:
  - hadoop-secondarynamenode-namenode.log: the log4j output
  - hadoop-secondarynamenode-namenode.out: the stdout and stderr output
- JobTracker logs:
  - hadoop-jobtracker-namenode.log: the log4j output
  - hadoop-jobtracker-namenode.out: the stdout and stderr output

On a DataNode (taking datanode01 as an example), the logs are:

- DataNode logs:
  - hadoop-datanode-datanode01.log: the log4j output
  - hadoop-datanode-datanode01.out: the stdout and stderr output
- TaskTracker logs:
  - hadoop-tasktracker-datanode01.log: the log4j output
  - hadoop-tasktracker-datanode01.out: the stdout and stderr output
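A convenient way to watch these traces appear in real time is simply to tail the log of the daemon you are interested in, for example on the NameNode:

tail -f logs/hadoop-namenode-namenode.log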

Let us now look at the most important information in these logs.

In hadoop-namenode-namenode.log we can follow the NameNode's startup:

Namenode up at: namenode/192.168.1.104:9000

// Number of files
Number of files = 0
Number of files under construction = 0

// Load fsimage and edits to build the FSNamesystem
Image file of size 97 loaded in 0 seconds.
Edits file /data/hadoopdir/tmp/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
Image file of size 97 saved in 0 seconds.
Finished loading FSImage in 12812 msecs

// Count the blocks and their states
Total number of blocks = 0
Number of invalid blocks = 0
Number of under-replicated blocks = 0
Number of over-replicated blocks = 0

// Leave safe mode
Leaving safe mode after 12 secs.

// Register the DataNodes
Adding a new node: /default-rack/192.168.1.106:50010
Adding a new node: /default-rack/192.168.1.105:50010
Adding a new node: /default-rack/192.168.1.107:50010
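The same DataNode registration can also be confirmed from the command line; asking the NameNode for a cluster report (an optional check, not part of the original trace) should list all three DataNodes as live:

bin/hadoop dfsadmin -report
# Prints the cluster's capacity and usage, followed by one entry per live DataNode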

In hadoop-secondarynamenode-namenode.log we can follow the SecondaryNameNode's startup:

Secondary Web-server up at: 0.0.0.0:50090

// The checkpoint period
Checkpoint Period   :3600 secs (60 min)
Log Size Trigger    :67108864 bytes (65536 KB)

// Perform a checkpoint: download fsimage and edits from the NameNode
Downloaded file fsimage size 97 bytes.
Downloaded file edits size 370 bytes.

// Load the edits file, merge it into the fsimage and save the result; note that the fsimage has grown
Edits file /data/hadoopdir/tmp/dfs/namesecondary/current/edits of size 370 edits # 6 loaded in 0 seconds.
Image file of size 540 saved in 0 seconds.

// The checkpoint is complete
Checkpoint done. New Image Size: 540

In hadoop-jobtracker-namenode.log we can follow the JobTracker's startup:

JobTracker up at: 9001
JobTracker webserver: 50030

// Clean up /data/hadoopdir/tmp/mapred/system in HDFS, which stores data while Map-Reduce jobs run
Cleaning up the system directory

// Heartbeats keep arriving from the TaskTrackers; the first one from each registers that TaskTracker
Got heartbeat from: tracker_datanode01:localhost/127.0.0.1:58297
Adding a new node: /default-rack/datanode01
Got heartbeat from: tracker_datanode03:localhost/127.0.0.1:37546
Adding a new node: /default-rack/datanode03

In hadoop-datanode-datanode01.log we can follow the DataNode's startup:

// Format the directory where the DataNode stores its blocks
Storage directory /data/hadoopdir/tmp/dfs/data is not formatted.
Formatting ...

// Start the DataNode
Opened info server at 50010
Balancing bandwith is 1048576 bytes/s
Initializing JVM Metrics with processName=DataNode, sessionId=null

// Register this DataNode with the NameNode
dnRegistration = DatanodeRegistration(datanode01:50010, storageID=, infoPort=50075, ipcPort=50020)
New storage id DS-1042573498-192.168.1.105-50010-1290313555129 is assigned to data-node 192.168.1.105:5001
DatanodeRegistration(192.168.1.105:50010, storageID=DS-1042573498-192.168.1.105-50010-1290313555129, infoPort=50075, ipcPort=50020)
In DataNode.run, data = FSDataset{dirpath='/data/hadoopdir/tmp/dfs/data/current'}

// Start the block scanner
Starting Periodic block scanner.

In hadoop-tasktracker-datanode01.log we can follow the TaskTracker's startup:

// Start the TaskTracker
Initializing JVM Metrics with processName=TaskTracker, sessionId=
TaskTracker up at: localhost/127.0.0.1:58297
Starting tracker tracker_datanode01:localhost/127.0.0.1:58297

// Send heartbeats to the JobTracker
Got heartbeatResponse from JobTracker with responseId: 0 and 0 actions

One log file deserves special attention: hadoop-tasktracker-datanode02.log. Because we set both the maximum number of map tasks and the maximum number of reduce tasks on that node to 0, it reports an exception, "Can not start task tracker because java.lang.IllegalArgumentException", and the TaskTracker on datanode02 therefore never starts.


Once Hadoop is up, some directories are also created inside HDFS, such as /data/hadoopdir/tmp/mapred/system, which stores shared resources used while Map-Reduce jobs run.
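To see what Hadoop has created inside HDFS at this point, the filesystem can be listed recursively (an optional check):

bin/hadoop fs -lsr /
# Should show, among other things, the /data/hadoopdir/tmp/mapred/system directory created by the JobTracker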

4. Putting a File into HDFS

A file is put into HDFS with the command: bin/hadoop fs -put inputdata /data/input

When the upload finishes we can list it with bin/hadoop fs -ls /data/input, which shows:

                          -rw-r--r--?? 3 hadoop supergroup??? 6119928 2010-11-21 00:47 /data/input/inputdata
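To see where HDFS placed the replicas of this file's block, fsck can be run against it (an optional check; the block ID and DataNode addresses correspond to those in the logs below):

bin/hadoop fsck /data/input/inputdata -files -blocks -locations
# Lists the file, its single block, and the three DataNodes that hold a replica of it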

At this point the /data/hadoopdir/tmp directory on the DataNodes has changed:

Under /data/hadoopdir/tmp/dfs/data/current on each of datanode01, datanode02 and datanode03, the block files shown below have appeared, so the block has indeed been replicated three times.


.:
dfs  mapred

./dfs:
data

./dfs/data:
current  detach  in_use.lock  storage  tmp

./dfs/data/current:
blk_2672607439166801630  blk_2672607439166801630_1002.meta  dncp_block_verification.log.curr  VERSION

./dfs/data/detach:

./dfs/data/tmp:

./mapred:
local

./mapred/local:

During the upload, the logs show the following:

The NameNode's hadoop-namenode-namenode.log shows:

// Create /data/input/inputdata
ugi=admin,sambashare  ip=/192.168.1.104  cmd=create  src=/data/input/inputdata  dst=null  perm=hadoop:supergroup:rw-r--r--

// Allocate a block
NameSystem.allocateBlock: /data/input/inputdata. blk_2672607439166801630_1002
NameSystem.addStoredBlock: blockMap updated: 192.168.1.107:50010 is added to blk_2672607439166801630_1002 size 6119928
NameSystem.addStoredBlock: blockMap updated: 192.168.1.105:50010 is added to blk_2672607439166801630_1002 size 6119928
NameSystem.addStoredBlock: blockMap updated: 192.168.1.106:50010 is added to blk_2672607439166801630_1002 size 6119928

In datanode01's hadoop-datanode-datanode01.log:

// datanode01 receives the block from the client
Receiving block blk_2672607439166801630_1002 src: /192.168.1.104:41748 dest: /192.168.1.105:50010
src: /192.168.1.104:41748, dest: /192.168.1.105:50010, bytes: 6119928, op: HDFS_WRITE, cliID: DFSClient_-1541812792, srvID: DS-1042573498-192.168.1.105-50010-1290313555129, blockid: blk_2672607439166801630_1002
PacketResponder 2 for block blk_2672607439166801630_1002 terminating

In datanode02's hadoop-datanode-datanode02.log:

// datanode02 receives the block from datanode01
Receiving block blk_2672607439166801630_1002 src: /192.168.1.105:60266 dest: /192.168.1.106:50010
src: /192.168.1.105:60266, dest: /192.168.1.106:50010, bytes: 6119928, op: HDFS_WRITE, cliID: DFSClient_-1541812792, srvID: DS-1366730865-192.168.1.106-50010-1290313543717, blockid: blk_2672607439166801630_1002
PacketResponder 1 for block blk_2672607439166801630_1002 terminating

In datanode03's hadoop-datanode-datanode03.log:

// datanode03 receives the block from datanode02
Receiving block blk_2672607439166801630_1002 src: /192.168.1.106:58899 dest: /192.168.1.107:50010
src: /192.168.1.106:58899, dest: /192.168.1.107:50010, bytes: 6119928, op: HDFS_WRITE, cliID: DFSClient_-1541812792, srvID: DS-765014609-192.168.1.107-50010-1290313555841, blockid: blk_2672607439166801630_1002
PacketResponder 0 for block blk_2672607439166801630_1002 terminating
Verification succeeded for blk_2672607439166801630_1002


5. Running a Map-Reduce Program

The Map-Reduce job is run with the command: bin/hadoop jar hadoop-0.19.2-examples.jar wordcount /data/input /data/output

To watch Map-Reduce run step by step, we first attach a remote debugger to the JobTracker and set a breakpoint in JobTracker.submitJob.
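One way to do that with the ports configured earlier is again the JDK's jdb (an IDE remote-debug session pointed at the same port is equivalent); this is only a sketch:

# Attach to the JobTracker, which listens for debuggers on port 8885 per the bin/hadoop change above
jdb -connect com.sun.jdi.SocketAttach:hostname=192.168.1.104,port=8885
# At the jdb prompt, set the breakpoint on job submission:
#   stop in org.apache.hadoop.mapred.JobTracker.submitJob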

As discussed in the previous article, before the DFSClient submits a job to the JobTracker it places three kinds of files that the job needs into HDFS, where both the JobTracker and the TaskTrackers can reach them:

- the jar file to run: job.jar
- the input split information: job.split
- the job configuration: job.xml

While the Map-Reduce program is stopped inside JobTracker.submitJob, we can see the following change in HDFS:

bin/hadoop fs -ls /data/hadoopdir/tmp/mapred/system

It now contains a new directory, job_201011202025_0001, named after the ID of the currently running job. Inside it there are three files:

bin/hadoop fs -ls /data/hadoopdir/tmp/mapred/system/job_201011202025_0001
Found 3 items
-rw-r--r--  /data/hadoopdir/tmp/mapred/system/job_201011202025_0001/job.jar
-rw-r--r--  /data/hadoopdir/tmp/mapred/system/job_201011202025_0001/job.split
-rw-r--r--  /data/hadoopdir/tmp/mapred/system/job_201011202025_0001/job.xml

At this point we can detach the remote debugger from the JobTracker.

Inside JobTracker.submitJob these uploaded files are read and the job is split into Map tasks and Reduce tasks.

When a TaskTracker asks the JobTracker for a Map task or a Reduce task through its heartbeat, then, with the configuration above, datanode01 will clearly be given a Map task to run and datanode03 a Reduce task.
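Once the job has been submitted, its state can also be followed from the command line (an optional check; the job ID is the one seen in the HDFS system directory above):

bin/hadoop job -list
bin/hadoop job -status job_201011202025_0001
# Shows the job's map and reduce completion and its counters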

                              ?

Let us first look at how the Map task executes on datanode01.

When a TaskTracker receives a task, it calls TaskTracker.localizeJob to copy the job's three files from HDFS to a local directory, and then calls TaskInProgress.localizeTask to create the task's local working directory.

We now attach a remote debugger to the TaskTracker on datanode01 and set breakpoints in localizeJob and localizeTask. When execution stops after localizeTask has completed, a new directory, job_201011202025_0001, appears under /data/hadoopdir/tmp/mapred/local/taskTracker/jobcache on datanode01, with the following structure:

                              datanode01:/data/hadoopdir/tmp/mapred/local/taskTracker/jobcache/job_201011202025_0001$ ls -R

                              .:

                              att
