log4j日志停写问题的排查和原理
一个月前系统开始发生一个奇怪的现象,log4j日志会突然停写,开始并没有很关注,总以为是系统的原因,后来发现在多台机器上依然发生,这个问题开始被重视。
?
说一下问题排查的路径:
?
?
Cloner cloner = new Cloner();PricingCpmTrans pricingCpmTrans = cloner.deepClone(oldPricingCpmTrans);?这里使用了一个第三方jar,Cloner(罪魁祸首啊),而在PricingCpmTrans里有一个Log对象
private static native void initIDs(); static {initIDs(); }?JNIEXPORT void JNICALL Java_sun_nio_ch_IOUtil_initIDs(JNIEnv *env, jclass clazz) { OSVERSIONINFO ver; ver.dwOSVersionInfoSize = sizeof(ver); GetVersionEx(&ver); if (ver.dwPlatformId == VER_PLATFORM_WIN32_NT) { onNT = JNI_TRUE; } clazz = (*env)->FindClass(env, "java/io/FileDescriptor"); fd_fdID = (*env)->GetFieldID(env, clazz, "fd", "I"); handle_fdID = (*env)->GetFieldID(env, clazz, "handle", "J"); }? protected void finalize() throws IOException { if (fd != null) { if (fd == FileDescriptor.out || fd == FileDescriptor.err) { flush(); } else { /* * Finalizer should not release the FileDescriptor if another * stream is still using it. If the user directly invokes * close() then the FileDescriptor is also released. */ runningFinalize.set(Boolean.TRUE); try { close(); } finally { runningFinalize.set(Boolean.FALSE); } } } } private native void close0() throws IOException;?系统回收调用了close方法java.io.IOException: 句柄无效。at java.io.FileOutputStream.writeBytes(Native Method)at java.io.FileOutputStream.write(FileOutputStream.java:282)at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:57)?
再看下文件描述符的结构图,可能更好理解了
?
?

进程到文件表的关系还在,但文件表到具体的文件节点的关系已经被丢失了,导致日志无法输出。
?
?
总结下这次的经验教训:
?
?