
Solving distcp problems on Hadoop 2.0.1

2012-10-05 

After painstakingly installing the new Hadoop, and just as painstakingly getting it tuned to the point where MapReduce jobs would run, I used distcp to copy data from a 1.0.3 cluster into the 2.0.1 cluster.

The first hurdle: because the versions differ, you cannot copy directly over the hdfs protocol; you have to read the source over the HTTP-based hftp protocol instead. That is, this does not work:

        distcp hdfs://src:54310/foo hdfs://dst:54310/

and you need this instead:

        distcp hftp://src:50070/foo hdfs://dst:54310/

Mind the port numbers: hftp goes through the NameNode web port (50070), not the RPC port.

Second, the command must be executed on the destination cluster, that is, on the 2.0.1 cluster, because hftp is read-only.
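Putting the two points above together, a sketch of the full command as run from a node of the destination (2.0.1) cluster; src and dst are placeholder hostnames:

```shell
# Run on the destination (2.0.1) cluster. hftp is read-only, so the
# 1.0.3 source is read over the NameNode web port (50070) while the
# destination is written over native hdfs (RPC port 54310 here).
hadoop distcp hftp://src:50070/foo hdfs://dst:54310/foo
```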

Finally, I hit a checksum mismatch error:

        Caused by: java.io.IOException: Check-sum mismatch between hftp://src:50070/foo/yyy.yy and hdfs://dst:54310/foo/xxx.xx
A web search turned up almost nothing; the problem seems rare enough that hardly anyone has asked about it. I eventually found a fix on Cloudera's site. A separate failure remained, though: copying a path that contains Chinese characters dies with an HTTP 500 from the source:
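The post does not quote the fix it found, but the workaround usually documented for cross-version distcp checksum mismatches is to skip the CRC comparison; note that DistCp only honors -skipcrccheck when -update is also given. This is an assumption about what the Cloudera page recommended, and the hostnames are placeholders:

```shell
# Assumed workaround (not quoted in the original post): 1.x and 2.x
# clusters can compute different checksum types, so disable the CRC
# comparison. -skipcrccheck takes effect only together with -update.
hadoop distcp -update -skipcrccheck hftp://src:50070/foo hdfs://dst:54310/foo
```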
        Error: java.io.IOException: File copy failed: hftp://xxxxxx:50070/tmp/中文路径测试/part-r-00017 --> hdfs://aaaaaa:54310/tmp/distcp_test14/part-r-00017
                at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
                at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
                at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
                at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
                at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
                at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
                at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
        Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://xxxxxx:50070/tmp/中文路径测试/part-r-00017 to hdfs://aaaaaa:54310/tmp/distcp_test14/part-r-00017
                at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
                at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
                ... 10 more
        Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: HTTP_OK expected, received 500
                at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:201)
                at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:167)
                at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:112)
                at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:90)
                at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:71)
                at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
                ... 11 more
        Caused by: java.io.IOException: HTTP_OK expected, received 500
                at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderInputStream.checkResponseCode(HftpFileSystem.java:381)
                at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:121)
                at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
                at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:158)
                at java.io.DataInputStream.read(DataInputStream.java:132)
                at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
                at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
                at java.io.FilterInputStream.read(FilterInputStream.java:90)
                at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:70)
                at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:198)

For that one I have not yet found a good solution.
