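HBase write problems with many regions
We recently noticed a significant drop in HBase write performance on our production cluster. The problem does not reproduce in the test environment. One difference between the two is that production has far more regions: a single RegionServer serves about 3,500 regions.
Using jstack, we found that more than half of the write threads were in the BLOCKED state at the line "public synchronized void reclaimMemStoreMemory() {", which is a check performed before each put.
Before every put, HBase checks whether the total memstore size on the current RegionServer exceeds the global memstore threshold. If it does, the write is blocked to prevent an OOM. The code is shown below: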
    /**
     * Check if the regionserver's memstore memory usage is greater than the
     * limit. If so, flush regions with the biggest memstores until we're down
     * to the lower limit. This method blocks callers until we're down to a safe
     * amount of memstore consumption.
     */
    public synchronized void reclaimMemStoreMemory() {
      if (isAboveHighWaterMark()) {
        lock.lock();
        try {
          while (isAboveHighWaterMark() && !server.isStopped()) {
            wakeupFlushThread();
            try {
              // we should be able to wait forever, but we've seen a bug where
              // we miss a notify, so put a 5 second bound on it at least.
              flushOccurred.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException ie) {
              Thread.currentThread().interrupt();
            }
          }
        } finally {
          lock.unlock();
        }
      } else if (isAboveLowWaterMark()) {
        wakeupFlushThread();
      }
    }

    private boolean isAboveHighWaterMark() {
      return server.getGlobalMemStoreSize() >= globalMemStoreLimit;
    }

    public long getGlobalMemStoreSize() {
      long total = 0;
      for (HRegion region : onlineRegions.values()) {
        total += region.memstoreSize.get();
      }
      return total;
    }
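One way to read the snippet above: reclaimMemStoreMemory() is synchronized, and the isAboveHighWaterMark() check inside it calls getGlobalMemStoreSize(), which loops over every online region. Each put therefore holds the monitor for a scan whose cost grows with the region count, which would be consistent with the BLOCKED threads seen on a server with about 3,500 regions. The sketch below is a simplified model and not HBase code. The class MemStoreAccounting and all of its methods are hypothetical names, and the running-counter approach is an assumed alternative used only for comparison, not a description of how HBase handles this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of per-region memstore accounting; not HBase code. */
class MemStoreAccounting {
    // Per-region sizes, standing in for HRegion.memstoreSize.
    private final Map<String, AtomicLong> regionSizes = new ConcurrentHashMap<>();
    // Running total, updated on every write (the assumed alternative).
    private final AtomicLong globalSize = new AtomicLong();

    /** Registers a region with an empty memstore if it is not known yet. */
    public void addRegion(String name) {
        regionSizes.putIfAbsent(name, new AtomicLong());
    }

    /** Records a size change for a region that was registered via addRegion. */
    public void addToMemStore(String name, long delta) {
        regionSizes.get(name).addAndGet(delta);
        globalSize.addAndGet(delta);
    }

    /** O(number of regions): same shape as the loop in getGlobalMemStoreSize(). */
    public long sumByScan() {
        long total = 0;
        for (AtomicLong size : regionSizes.values()) {
            total += size.get();
        }
        return total;
    }

    /** O(1): reads the running total without touching any region. */
    public long sumByCounter() {
        return globalSize.get();
    }

    public static void main(String[] args) {
        MemStoreAccounting acc = new MemStoreAccounting();
        // 3,500 regions, matching the production RegionServer described above.
        for (int i = 0; i < 3500; i++) {
            acc.addRegion("region-" + i);
            acc.addToMemStore("region-" + i, 1024L);
        }
        // Both approaches agree on the total; only the cost per check differs.
        System.out.println(acc.sumByScan() == acc.sumByCounter());
    }
}
```

In this model, sumByScan() visits 3,500 entries on every call while sumByCounter() does a single atomic read. If a check like this sits inside a synchronized method on the write path, the scan length directly bounds how long other writers wait for the monitor.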