WAS的异常高CPU占用率分析方法
来源URL:http://blog.csdn.net/fightfordream/article/details/6103850
问题描述?
故障排除
请注意,并非下面所有任务都需要完成。有些问题仅通过执行几项任务就可以解决。
?
为什么发生此问题?
发生此问题有许多原因:WebSphere Application Server 本身、用户创建的线程、不良编码习惯或第三方软件。遗憾的是,证明在什么地方发生此问题有时候非常困难。本模式尝试通过利用特定操作命令和收集数据来帮助排除此问题。
收集高 CPU 占用率的数据
对于有关收集高 CPU 占用率的数据的特定操作信息,请根据您的操作系统执行以下步骤。
?
?
转换为十六进制号码
备注:为协助您计算在本模式中讨论的十六进制值,您可以在 Shell 脚本中使用下列行将十进制号码转换为十六进制号码。如果您使用 Unix 操作系统,那么转换会很方便。
dec2hex.sh:
??? printf "dec -> hex:? %d = %x /n" ${1} ${1}
用法:
??? $ sh dec2hex.sh 755
??? dec -> hex:? 755 = 2f3
?
AIX
这将为您揭示造成高 CPU 占用率问题的线程。
下面是 AIX 系统中上述进程的一个示例:
1、ps -mp 250076 -o THREAD 将显示以下内容:
USER??? PID?? PPID???? TID ST? CP PRI SC??? WCHAN??????? F???? TT BND COMMAND
??? usera 250076 217266??????- ?A?? 38? 60 72??????? *?? 242011? pts/0?? - /WebSphere/jdk130/...
?????? -????? -????? -? 315593 Z??? 0? 97? 1??????? -?? c00007????? -?? - -
?????? -????? -????? -? 344305 S??? 0? 60? 1 f1000089c020e200?? 400400????? -?? - -
?????? -????? -????? -? 499769 S??? 0? 60? 1 f1000089c0213a00?? 400400????? -?? - -
?????? -????? -????? -? 540699 S??? 0? 60? 1 f100008790008440? 8410400????? -?? - -
?????? -????? -????? -? 544789 S??? 0? 60? 1 f100008790008540? 8410400????? -?? - -
?????? -????? -????? -? 548883 S??? 0? 60? 1 f100008790008640? 8410400????? -?? - -
?????? -????? -????? -? 552979 S??? 0? 60? 1 f100008790008740? 8410400????? -?? - -
?????? -????? -????? -? 565283 Z??? 0? 60? 1??????? -?? c00007????? -?? - -
?????? -????? -????? -? 585783 S??? 0? 60? 1 f100008790008f40? 8410400????? -?? - -
?????? -????? -????? -? 589865 Z??? 0? 80? 1??????? -?? c00007????? -?? - -
?????? -????? -????? -? 593959 S??? 1? 60? 1 f100008790009140? 8410400????? -?? - -
?????? -????? -????? -? 610365 S??? 0? 60? 1 f100008790009540? 8410400????? -?? - -
?????? -????? -????? -? 614453 S??? 0? 60? 1 f100008790009640? 8410400????? -?? - -
?????? -????? -????? -? 618547 S??? 0? 60? 1 f100008790009740? 8410400????? -?? - -
?????? -????? -????? -? 622645 S??? 0? 60? 1 f100008790009840? 8410400????? -?? - -
?????? -????? -????? -? 626743 S??? 0? 60? 1 f100008790009940? 8410400????? -?? - -
?????? -????? -????? -? 630841 S??? 0? 60? 1 f100008790009a40? 8410400????? -?? - -
?????? -????? -????? -? 634941 S??? 0? 60? 1 f100008790009b40? 8410400????? -?? - -
?????? -????? -????? -? 639037 S??? 0? 60? 1 f100008790009c40? 8410400????? -?? - -
?????? -????? -????? -? 643135 S??? 0? 60? 1 f100008790009d40? 8410400????? -?? - -
?????? -????? -????? -? 647233 S??? 0? 60? 1 f100008790009e40? 8410400????? -?? - -
?????? -????? -????? -? 651331 S??? 0? 60? 1 f100008790009f40? 8410400????? -?? - -
?????? -????? -????? -? 655429 S??? 0? 60? 1 f10000879000a040? 8410400????? -?? - -
?????? -????? -????? -? 659527 S??? 0? 60? 1 f10000879000a140? 8410400????? -?? - -
?????? -????? -????? -? 663625 S??? 0? 60? 1 f10000879000a240? 8410400????? -?? - -
?????? -????? -????? -? 667723 S?? 37? 78? 1 f1000089c020f150?? 400400????? -?? - -
?????? -????? -????? -? 671821 S??? 0? 60? 1 f10000879000a440? 8410400????? -?? - -
?????? -????? -????? -? 675919 S??? 0? 60? 1??????? -?? 418400????? -?? - -
?????? -????? -????? -? 680017 S??? 0? 60? 1 f10000879000a640? 8410400????? -?? - -
?????? -????? -????? -? 684115 S??? 0? 60? 1 f10000879000a740? 8410400????? -?? - -
?????? -????? -????? -? 688213 S??? 0? 60? 1 f10000879000a840? 8410400????? -?? - -
?????? -????? -????? -? 692311 S??? 0? 60? 1 f10000879000a940? 8410400????? -?? - -
?????? -????? -????? -? 696409 S??? 0? 60? 1 f10000879000aa40? 8410400????? -?? - -
?????? -????? -????? -? 712801 S??? 0? 60? 1 f10000879000ae40? 8410400????? -?? - -
?????? -????? -????? -? 716899 S??? 0? 60? 1 f10000879000af40? 8410400????? -?? - -
????……
?
thread? state-k???? wchan??? state-u??? k-tid?? mode held scope function
?.....
?$t15??? wait????? 0xf10000879000a140 blocked?? 659527???? k?? no?? sys? _event_sleep
?$t16??? wait????? 0xf10000879000a240 blocked?? 663625???? k?? no?? sys? _event_sleep
?$t17??? run????????????????? running??667723???? k?? no?? sys? JVM_Send?????????
?$t18??? wait????? 0xf10000879000a440 blocked?? 671821???? k?? no?? sys? _event_sleep
?$t19??? wait???????????????? running??675919???? k?? no?? sys? poll?????????????
?$t20??? wait????? 0xf10000879000a640 blocked?? 680017???? k?? no?? sys? _event_sleep
?.....
?
?
(dbx) th info 17
??? ?thread? state-k???? wchan??? state-u??? k-tid?? mode held scope function
??? ?$t17??? run????????????????? running?? 667723???? k?? no?? sys? JVM_Send
??? ????? general:
??? ???????? pthread addr = 0x3ea55c68???????? size???????? = 0x244
??? ???????? vp addr????? = 0x3e69e5e0???????? size???????? = 0x2a8
??? ???????? thread errno = 2
??? ???????? start pc???? = 0x300408b0
??? ???????? joinable???? = no
??? ???????? pthread_t??? = 1011
??? ????? scheduler:
??? ???????? kernel?????? =
??? ???????? user???????? = 1 (other)
??? ????? event :
??? ???????? event??????? = 0x0
??? ???????? cancel?????? = enabled, deferred, not pending
??? ????? stack storage:
??? ???????? base???????? = 0x3ea15000???????? size???????? = 0x40000
??? ???????? limit??????? = 0x3ea55c68
??? ???????? sp?????????? = 0x3ea55054
?
下面是匹配此十六进制号码并造成高 CPU 占用率问题的线程示例:
?
3XMTHREADINFO????? "Servlet.Engine.Transports:915" (TID:0x34B82C78, sys_thread_t:0x778F7670, state:MW, native ID:0x1011) prio=5
4XESTACKTRACE????????? at com.henry.util.JSONWriter.putValue(JSONWriter.java(Compiled Code))
4XESTACKTRACE????????? at com.henry.util.JSONWriter.putNextValue(JSONWriter.java(Inlined Compiled Code))
4XESTACKTRACE????????? at person._FPT_5F_I_5F_0002_5F_1Pay_jsp_0.toAccountList2JsonArray(_FPT_5F_I_5F_0002_5F_1Pay_jsp_0.java(Compiled Code))
4XESTACKTRACE????????? at person._FPT_5F_I_5F_0002_5F_1Pay_jsp_0._jspService(_FPT_5F_I_5F_0002_5F_1Pay_jsp_0.java:392)
4XESTACKTRACE????????? at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java(Compiled Code))
4XESTACKTRACE????????? at javax.servlet.http.HttpServlet.service(HttpServlet.java(Compiled Code))
4XESTACKTRACE????????? at org.apache.jasper.runtime.JspServlet$JspServletWrapper.service(JspServlet.java(Compiled Code))
4XESTACKTRACE????????? at org.apache.jasper.runtime.JspServlet.serviceJspFile(JspServlet.java(Compiled Code))
4XESTACKTRACE????????? at org.apache.jasper.runtime.JspServlet.service(JspServlet.java(Compiled Code))
4XESTACKTRACE????????? at javax.servlet.http.HttpServlet.service(HttpServlet.java(Compiled Code))
4XESTACKTRACE????????? at com.ibm.servlet.engine.webapp.StrictServletInstance.doService(ServletManager.java(Compiled Code))
4XESTACKTRACE????????? at com.ibm.servlet.engine.webapp.StrictLifecycleServlet._service(StrictLifecycleServlet.java(Compiled Code))
4XESTACKTRACE????????? at com.ibm.servlet.engine.webapp.ServicingServletState.service(StrictLifecycleServlet.java(Compiled Code))
4XESTACKTRACE????????? at com.ibm.servlet.engine.webapp.StrictLifecycleServlet.service(StrictLifecycleServlet.java(Inlined Compiled Code))
4XESTACKTRACE????????? at com.ibm.servlet.engine.webapp.ServletInstance.service(ServletManager.java(Compiled Code))
4XESTACKTRACE????????? at com.ibm.servlet.engine.webapp.ValidServletReferenceState.dispatch(ServletManager.java(Compiled Code))
?
?
?
?
?
Windows
使用 pslist
您可以在 Windows 中使用 pslist 并获取 java 进程的线程详细信息。? pslist可从以下网址得到:http://www.microsoft.com/technet/sysinternals/ProcessesAndThreads/PsList.mspx
您将看到“用户时间”和“内核时间”不断增加。
?
使用 Process Explorer
您还可以使用 Systinternals 提供的 Process Explorer?http://www.microsoft.com/technet/sysinternals/ProcessesAndThreads/ProcessExplorer.mspx。?
该工具直观动态显示 CPU 占用率。?由于 Process Explorer 没有记录功能或历史记录,您必须监视该程序并记录占用几乎全部 CPU 资源的 Java 进程的线程 ID。?若要通过 Process Explorer 达到上述目的:
您可以看到在下面窗格中列出的“Thread ID”。
?
下面是仅使用 pslist 和 Thread Dump 的步骤示例:
1.???????? 运行 pslist -d 172
?
java 1720:
Tid Pri??? Cswtch??????????? State???? User Time?? Kernel Time?? Elapsed Time
1520?? 8????? 9705???? Wait:UserReq? 0:00:23.734?? 0:00:01.772??? 0:04:55.264
1968?? 9????? 2233???? Wait:UserReq? 0:00:04.606?? 0:00:00.040??? 0:04:54.874
1748? 15?????? 146???? Wait:UserReq? 0:00:00.010?? 0:00:00.010??? 0:04:54.863
1744? 11??????? 62???? Wait:UserReq? 0:00:00.010?? 0:00:00.000??? 0:04:54.853
1420? 15???????? 3???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:04:54.563
1856? 15???????? 7???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:04:54.563
1860?? 9????? 3157???? Wait:UserReq? 0:00:03.314?? 0:00:00.160??? 0:04:54.563
?412? 15????? 5888?? Wait:DelayExec? 0:00:00.000?? 0:00:00.000??? 0:04:54.553
1864?? 8???????? 3???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:04:50.567
1660? 15??????? 61???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:04:42.125
2020?? 8???????? 2???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:04:42.025
1532?? 8???????? 2???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:04:42.015
1332?? 8???????? 2???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:04:42.005
…..
?
?
java 1720:
? Tid Pri??? Cswtch??????????? State???? User Time?? Kernel Time?? Elapsed Time
1520?? 8????? 9705???? Wait:UserReq? 0:00:23.734?? 0:00:01.772??? 0:08:14.511
1968?? 8????? 6527???? Wait:UserReq? 0:00:06.309?? 0:00:00.070??? 0:08:14.120
1748? 15?????? 157???? Wait:UserReq? 0:00:00.010?? 0:00:00.010??? 0:08:14.110
……
2072?? 8??????? 13???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:08:01.231
2068?? 8??????? 20???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:08:01.221
2044?? 8??????? 15???? Wait:UserReq? 0:00:00.000?? 0:00:00.000??? 0:08:01.221
2000?? 8?????? 657???? Wait:UserReq? 0:00:00.090?? 0:00:00.010??? 0:08:01.211
?588? 10???? 59123???? Wait:UserReq? 0:00:48.830?? 0:00:02.633??? 0:08:01.211
1784?? 8?????? 150???? Wait:UserReq? 0:00:00.090?? 0:00:00.000??? 0:08:01.201
1756?? 8?????? 251???? Wait:UserReq? 0:00:00.941?? 0:00:00.000??? 0:08:01.201
1716?? 8???????? 6?????? Wait:Queue? 0:00:00.000?? 0:00:00.000??? 0:08:01.191
1800?? 8????? 1457?????? Wait:Queue? 0:00:00.761?? 0:00:00.210??? 0:08:01.191
……
?
注意,线程 ID 588 正在使用最多的用户/内核时间,因此占用最多的 CPU 资源。显然,该线程有问题。
?
3XMTHREADINFO????? "Servlet.Engine.Transports:915" (TID:0x34B82C78, sys_thread_t:0x778F7670, state:MW, native ID: 0x24c) prio=5
???????? at java.net.SocketOutputStream.socketWrite0(Native Method)
???????? at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
???????? at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
???……
???????? at java.io.PrintWriter.write(PrintWriter.java:247)
???????? at java.io.PrintWriter.print(PrintWriter.java:378)
???????? at java.io.PrintWriter.println(PrintWriter.java:515)
???????? at examples.servlets.HelloWorld2.service(HelloWorld2.java:94)
???????? at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
???……
?
?
显然,问题出现在 socketWrite本地方法上,但表面看起来是 HelloWorld2.service()出错。
?
89????? out.println(ExampleUtils.returnHtmlHeader("Hello World 2"));
???? 90????? out.println("<h4>");
???? 91????????? for (int i=0;i<100000000;i++) {
???? 92????????????????? int j = 0;
???? 93????????????????? j = j +i;
???? 94????? out.println(defaultGreeting + " " + defaultName + "!");
???? 95????????? }
???? 96
???? 97????? out.println("</h4>");
???? 98????? out.println(ExampleUtils.returnHtmlFooter());
?
可以看出,由于某种原因,输出流是用一个非常长的“for loop”语句编写的。这是错误所在,也是此示例中造成高 CPU 占用率的原因。
如果改正此代码,则 CPU 就不会被完全占用。
?
Linux
?
下面是 Linux 系统中上述进程的一个示例:
?
PID??? USER??? PRI??? NI??? SIZE??? RSS??? SHARE??? STAT ??? %CPU??? %MEM??? TIME??? COMMAND
...........
22962??? usera ?? 9??? ??? 0??? 86616??? 84M??? 26780??? ??? S??? ??? ??? 0.0??? ??? ??? 4.2??? ??? 0:00??? ??? java
...........
?
如果 PID 为 22962,则十六进制值将是:0x59B2
?
?
3XMTHREADINFO????? "Servlet.Engine.Transports:915" (TID:0x34B82C78, sys_thread_t:0x778F7670, state:MW, native ID: 0x59B2) prio=5
???????at java.lang.Object.wait(Native Method)
??????? at java.lang.Object.wait(Object.java:415)
??????? ……
?
?
在上述示例中,由于该线程此时占用 0% 的 CPU,所以只显示执行此操作的进程。理想状态下,应当迅速并且连续完成全部三个步骤,以便尽可能及时地捕捉数据。这可以通过类似下面的一个简单的 shell 脚本来完成。
#
# Takes an argument (PID of the WAS process) and loops three times.? This will append the prstat information to a file called dump_high_cpu.txt.? The thread dump information will either be in file where stdout was redirected or printed on the screen.
#
for loopnum in 1 2 3
do
?? top -b -n1>> dump_high_cpu.txt
?? kill -3? $1
?? echo "cpu snapshot and thread dump done. #" $loopnum
?? sleep 1
?? echo "Done sleeping."
done