How HBase Reads Data from Hadoop: DFSInputStream

2012-09-20

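The read stream used by the HDFS client is ultimately a DFSInputStream, wrapped in several outer layers.

There are two ways to read data from a DFSInputStream:

(1) seek(long targetPos) + read(byte buf[], int off, int len)

(2) read(long position, byte[] buffer, int offset, int length)

The first is suited to sequential reads, such as scan requests or compaction reads in HBase. These reads usually move a lot of data, so enabling readahead can reduce disk IOPS. For readahead, see the related HDFS JIRA issues: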

Add HDFS support for fadvise readahead and drop-behind

https://issues.apache.org/jira/browse/HDFS-2465

Enable fadvise readahead by default

https://issues.apache.org/jira/browse/HDFS-3697

The second is a random (positional) read, suited to reading small amounts of data, such as get requests in HBase.
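To make the difference between the two call styles concrete, here is a minimal sketch that uses java.nio's FileChannel on a local temporary file as a stand-in. It is not HDFS code: FileChannel is only an analogue chosen because it also offers both a cursor-based read and a positional read, and the class and method names (ReadStylesDemo, seekThenRead, positionalRead) are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadStylesDemo {

    // Analogue of style (1): move the stream's own cursor, then read from it.
    static String seekThenRead(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        ch.position(pos);                                   // plays the role of seek(targetPos)
        while (buf.hasRemaining() && ch.read(buf) >= 0) { } // plays the role of read(buf, off, len)
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    // Analogue of style (2): the offset travels with the call; the cursor is untouched.
    static String positionalRead(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long at = pos;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, at);
            if (n < 0) break;                               // end of file
            at += n;
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    // Returns { seek+read data, cursor after it, positional data, cursor after it }.
    static String[] demo() {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("read-styles", ".bin");
            Files.write(tmp, "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                String a = seekThenRead(ch, 2, 4);
                long cursorA = ch.position();
                String b = positionalRead(ch, 10, 5);
                long cursorB = ch.position();
                return new String[] { a, Long.toString(cursorA), b, Long.toString(cursorB) };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) tmp.toFile().delete();
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", demo()));
    }
}
```

The point to notice is that the cursor moves only in the first style. The positional read leaves it where the previous seek+read left it, which is what makes the second style easy to share.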

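The two styles also differ in how they behave when a stream is shared.

A single DFSInputStream can be used by several callers (threads) at once, subject to the following rules:

(1) At any given moment, only one caller may be executing seek(long targetPos) + read(byte buf[], int off, int len).

(2) While one caller is executing seek(long targetPos) + read(byte buf[], int off, int len), other callers can still use the same DFSInputStream to execute read(long position, byte[] buffer, int offset, int length), and any number of callers can execute read(long position, byte[] buffer, int offset, int length) on it at the same time.

For a test case, see org.apache.hadoop.hdfs.TestPread in HDFS.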
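The two rules above can be sketched as a locking discipline: serialize seek+read on the shared stream, and let positional reads run without any lock. The sketch below again uses a local FileChannel as a stand-in for the shared stream (an assumption made so the example runs without a cluster); SharedStreamDemo and its helpers are hypothetical names, not HDFS or HBase classes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedStreamDemo {
    static final int FILE_SIZE = 4096;

    // The byte stored at each offset is derived from the offset, so reads can be verified.
    static byte expected(long offset) { return (byte) (offset % 251); }

    // Rule (1): seek+read shares one cursor, so the pair must be atomic.
    static boolean seekRead(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        synchronized (ch) {
            ch.position(pos);
            while (buf.hasRemaining() && ch.read(buf) >= 0) { }
        }
        return matches(buf, pos);
    }

    // Rule (2): a positional read touches no shared cursor, so no lock is taken.
    static boolean pread(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long at = pos;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, at);
            if (n < 0) break;
            at += n;
        }
        return matches(buf, pos);
    }

    static boolean matches(ByteBuffer buf, long pos) {
        if (buf.hasRemaining()) return false;               // short read
        for (int i = 0; i < buf.capacity(); i++) {
            if (buf.get(i) != expected(pos + i)) return false;
        }
        return true;
    }

    // Eight threads share one channel: even ids do seek+read, odd ids do positional reads.
    static boolean runDemo() {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Path tmp = null;
        try {
            tmp = Files.createTempFile("shared-stream", ".bin");
            byte[] data = new byte[FILE_SIZE];
            for (int i = 0; i < FILE_SIZE; i++) data[i] = expected(i);
            Files.write(tmp, data);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                List<Future<Boolean>> results = new ArrayList<>();
                for (int t = 0; t < 8; t++) {
                    final int id = t;
                    Callable<Boolean> task = () -> {
                        boolean ok = true;
                        for (int i = 0; i < 200; i++) {
                            long pos = (id * 131L + i * 17L) % (FILE_SIZE - 64);
                            ok &= (id % 2 == 0) ? seekRead(ch, pos, 64) : pread(ch, pos, 64);
                        }
                        return ok;
                    };
                    results.add(pool.submit(task));
                }
                boolean all = true;
                for (Future<Boolean> f : results) all &= f.get();
                return all;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
            if (tmp != null) tmp.toFile().delete();
        }
    }

    public static void main(String[] args) {
        System.out.println("all reads correct: " + runDemo());
    }
}
```

Removing the synchronized block would let two seek+read callers interleave their position and read calls, so one could read from the other's offset. The positional path needs no such protection.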

Let's look at how HBase uses them.

pread set to true means a random (positional) read; HBase sets pread to true when the request is a get.

Version 0.90.x:

BoundedRangeFileInputStream

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
      throw new IndexOutOfBoundsException();
    }
    int n = (int) Math.min(Integer.MAX_VALUE, Math.min(len, (end - pos)));
    if (n == 0) return -1;
    int ret = 0;
    if (this.pread) {
      // Random read (second style): any number of callers may invoke this concurrently.
      ret = in.read(pos, b, off, n);
    } else {
      // Sequential read (first style): only one caller at a time, hence the lock.
      synchronized (in) {
        in.seek(pos);
        ret = in.read(b, off, n);
      }
    }
    if (ret < 0) {
      end = pos;
      return -1;
    }
    pos += ret;
    return ret;
  }

Version 0.94.x:

HFileBlock

    protected int readAtOffset(FSDataInputStream istream,
        byte[] dest, int destOffset, int size,
        boolean peekIntoNextBlock, long fileOffset, boolean pread)
        throws IOException {
      if (peekIntoNextBlock &&
          destOffset + size + hdrSize > dest.length) {
        // We are asked to read the next block's header as well, but there is
        // not enough room in the array.
        throw new IOException("Attempted to read " + size + " bytes and " +
            hdrSize + " bytes of next header into a " + dest.length +
            "-byte array at offset " + destOffset);
      }

      if (pread) {
        // Positional read. Better for random reads.
        int extraSize = peekIntoNextBlock ? hdrSize : 0;

        // Random read (second style): any number of callers may invoke this concurrently.
        int ret = istream.read(fileOffset, dest, destOffset, size + extraSize);
        if (ret < size) {
          throw new IOException("Positional read of " + size + " bytes " +
              "failed at offset " + fileOffset + " (returned " + ret + ")");
        }

        if (ret == size || ret < size + extraSize) {
          // Could not read the next block's header, or did not try.
          return -1;
        }
      } else {
        // Seek + read. Better for scanning.
        // Sequential read (first style): only one caller at a time, hence the lock.
        synchronized (istream) {
          istream.seek(fileOffset);

          long realOffset = istream.getPos();
          if (realOffset != fileOffset) {
            throw new IOException("Tried to seek to " + fileOffset + " to "
                + "read " + size + " bytes, but pos=" + realOffset
                + " after seek");
          }

          if (!peekIntoNextBlock) {
            IOUtils.readFully(istream, dest, destOffset, size);
            return -1;
          }

          // Try to read the next block header.
          if (!readWithExtra(istream, dest, destOffset, size, hdrSize))
            return -1;
        }
      }

      assert peekIntoNextBlock;
      return Bytes.toInt(dest, destOffset + size + BlockType.MAGIC_LENGTH) +
          hdrSize;
    }

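Why can any number of callers use positional reads at the same time?

(1) seek(long targetPos) + read(byte buf[], int off, int len)

In the first style, seek simply updates pos. The important part of read(byte buf[], int off, int len) is blockSeekTo(long target). The first thing blockSeekTo does is check whether the current blockReader is null; if it is not, it closes that BlockReader.

It then creates a new BlockReader. The request sent to the DataNode carries the starting offset and a length, computed as blk.getNumBytes() - offsetIntoBlock, that is, the amount of data still readable in the current block. read(byte buf[], int off, int len) can then be called repeatedly until the data has been consumed. Note that the newly created BlockReader is not closed promptly: it is closed only when a later seek+read finds that the previous BlockReader still exists. So if one caller is in the middle of seek+read on a DFSInputStream and another caller issues seek+read on the same stream, the second caller closes the first caller's BlockReader, and the first caller can no longer read its data. That is why only one caller at a time may do seek+read on a given DFSInputStream; a concurrent seek+read has to go through a different DFSInputStream.

(2) read(long position, byte[] buffer, int offset, int length)

In the second style, every call creates a new BlockReader. The request sent to the DataNode carries the starting offset (position) and the length (length), and the BlockReader is closed as soon as the read completes. Since no reader is shared between calls, concurrent callers cannot interfere with one another.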
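The reader lifecycle described above can be modeled in a few lines. What follows is a deliberately simplified toy, not the real DFSInputStream: ToyDfsInputStream and ToyBlockReader are hypothetical names, no network I/O happens, and every seekAndRead call is assumed to trigger a blockSeekTo (the real stream only does so when it needs a new reader).

```java
import java.util.ArrayList;
import java.util.List;

class ToyDfsInputStream {

    // Hypothetical stand-in for a BlockReader: one connection serving bytes from an offset.
    static final class ToyBlockReader {
        final long startOffset;
        boolean closed;
        ToyBlockReader(long startOffset) { this.startOffset = startOffset; }
    }

    private ToyBlockReader current;                  // shared by every seek+read caller
    private long pos;
    final List<ToyBlockReader> opened = new ArrayList<>();

    private ToyBlockReader open(long offset) {
        ToyBlockReader r = new ToyBlockReader(offset);
        opened.add(r);
        return r;
    }

    // Style (1): close whatever reader is left over, then open a new one and keep it.
    ToyBlockReader seekAndRead(long targetPos) {
        pos = targetPos;                             // seek only updates pos
        if (current != null) current.closed = true;  // the previous reader is closed here
        current = open(pos);                         // stays open for follow-up reads
        return current;
    }

    // Style (2): a private reader per call, closed before the call returns.
    ToyBlockReader pread(long position) {
        ToyBlockReader r = open(position);
        r.closed = true;
        return r;
    }

    public static void main(String[] args) {
        ToyDfsInputStream in = new ToyDfsInputStream();
        ToyBlockReader a = in.seekAndRead(0);        // caller A starts a sequential read
        in.pread(50);                                // a positional read leaves A's reader alone
        System.out.println("A closed after pread: " + a.closed);
        in.seekAndRead(100);                         // caller B's seek+read closes A's reader
        System.out.println("A closed after B's seek+read: " + a.closed);
    }
}
```

In this model the only shared mutable state is the current reader, and only seekAndRead touches it. That is the whole reason the sequential path needs a lock in the HBase code while the pread path does not.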

pread Use positional read instead of seek+read (positional is better doing random reads whereas seek+read is better scanning).

https://issues.apache.org/jira/browse/HBASE-2180

/proc/sys/net/ipv4/tcp_tw_recycle
