Troubleshooting notes: "too many open files" in production
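On Wednesday morning, just after I got to my desk, I saw people discussing a problem in the group chat and joined in out of curiosity. After a bit more than two hours the problem was located and fixed. We took a few detours along the way, and what I gained was an understanding of "Too many open files". I had planned to organize my own questions into an article, then last night I found that 毕玄 (@bluedavy) had already written the problem up (http://hellojava.info/?p=79). I learned from it, but I still think it is worth writing things down from my own perspective.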
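First, the problem I ran into.
Production was reporting a large number of remote-call timeouts. One machine in an application's cluster had many failed calls and a large number of exceptions whose messages contained "too many open files". The cause was eventually located: when the java process was last restarted, its maximum number of open files had been set too low. Restarting the java process fixed it.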
1. How do you check the maximum number of files a java process can open?
The command is "cat /proc/[pid]/limits"; the "Max open files" row is the maximum number of files the process can open.
cat /proc/`pgrep java -u admin`/limits
------------------------------
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             393208               393208               processes
Max open files            131072               131072               files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       393208               393208               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
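To pull just the open-file limit out of that table, something like the following could work. It is only a minimal sketch: the function name max_open_files is made up for illustration, and it assumes a Linux /proc filesystem with the column layout shown above.

```shell
# Hypothetical helper: print the soft and hard "Max open files" limits of a process.
# Assumes Linux /proc and the "Max open files <soft> <hard> <units>" row layout.
max_open_files() {
    # Fields on that row: "Max" "open" "files" <soft> <hard> <units>
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Example: inspect the current shell itself
max_open_files "$$"
```

For the java process from this article, the argument would be the pid returned by pgrep java -u admin.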
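2. How do you check how many files a java process currently has open?
The command is: lsof -p [pid] | wc -l
Example: sudo -u admin lsof -p `pgrep java -u admin` >/tmp/java_pid_lsof.log
In a web application, loaded jar files and TCP connections all count toward the number of open files.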
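The lsof output contains a header line and entries that are not file descriptors (for example memory-mapped files, shown with "mem" in the FD column), so the wc -l figure is an upper estimate. On Linux, counting the entries under /proc/[pid]/fd gives the number of descriptors actually held. The sketch below does that and prints it next to the soft limit; the helper names open_fd_count and fd_usage are hypothetical.

```shell
# Hypothetical helper: count the file descriptors a process currently holds.
# Assumes Linux /proc; reading another user's process needs sudo or the same user.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}

# Hypothetical helper: print "<used> <soft limit>" so the two can be compared.
fd_usage() {
    used="$(open_fd_count "$1")"
    limit="$(awk '/^Max open files/ { print $4 }' "/proc/$1/limits")"
    echo "$used $limit"
}

# Example: inspect the current shell itself
fd_usage "$$"
```

When the first number gets close to the second, the process is about to hit "too many open files".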
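3. How are these limits assigned to the java process?
One way is to set them in the java process's startup script, for example "ulimit -n number".
If the script does not set a value, the process inherits the limit of its parent process; if the parent did not set one either, the current user's settings apply.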
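The inheritance can be observed directly. In this minimal sketch a subshell plays the role of the startup script and a child sh stands in for the java process; the function name child_nofile_limit and the value 256 are arbitrary choices for illustration, and the call assumes the current hard limit is at least 256.

```shell
# Hypothetical helper: lower the soft open-file limit in a subshell, then ask a
# child process which limit it sees. The subshell keeps the change from leaking
# into the calling shell.
child_nofile_limit() {
    (
        ulimit -S -n "$1" || exit 1   # what a startup script would do before launching java
        sh -c 'ulimit -n'             # the child reports the limit it inherited
    )
}

child_nofile_limit 256
```

The child prints the value set by its parent, which is how a restart from a shell with a small limit can leave the java process with too few file descriptors.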
4. How do you check the limits of a particular user?
Use "ulimit -a" to list the limits of the current user.
ulimit -a
-----------------------------------------------
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 393208
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 393208
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
To check the admin user's limits through sudo: sudo -u admin sh -c "ulimit -a"
5. How do you check the system-wide limit?
There are two ways. The system-wide limit applies to all users.
First: cat /proc/sys/fs/file-max
Second: sysctl -a | grep file-max
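To see how close the whole system is to that ceiling, Linux also exposes /proc/sys/fs/file-nr, which holds three numbers: allocated file handles, allocated but unused handles, and the maximum. A minimal sketch, with the hypothetical helper name system_file_handles:

```shell
# Hypothetical helper: print "<allocated> <maximum>" for system-wide file handles.
# /proc/sys/fs/file-nr holds three fields: allocated, unused, maximum.
system_file_handles() {
    awk '{ print $1, $3 }' /proc/sys/fs/file-nr
}

system_file_handles
```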
6. How do you find out which processes have a given file open?
lsof path/filename
sudo -u admin lsof zookeeper-3.3.2.jar
---------------------------------------------
COMMAND  PID  USER   FD   TYPE DEVICE    SIZE    NODE NAME
java    7004 admin  mem    REG  202,9 1015297 2717181 zookeeper-3.3.2.jar
java    7004 admin 366r    REG  202,9 1015297 2717181 zookeeper-3.3.2.jar
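On a machine without lsof, a rough equivalent is to scan the descriptor symlinks under /proc. This is only a sketch under stated assumptions: it needs Linux /proc and a readlink that supports -f, the function name pids_with_file_open is hypothetical, and unlike lsof it does not report memory-mapped files (the "mem" row above).

```shell
# Hypothetical helper: list PIDs that hold an open descriptor on a file by
# scanning /proc/*/fd. Memory-mapped files are not detected.
pids_with_file_open() {
    target="$(readlink -f "$1")" || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Entries we cannot read (other users' processes) are skipped silently
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid="${fd#/proc/}"
            echo "${pid%%/*}"
        fi
    done | sort -un
}

# Usage: pids_with_file_open zookeeper-3.3.2.jar
```

As with lsof, it has to run as the process owner or as root to see another user's descriptors.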
References:
1. http://langyu.iteye.com/blog/763247
2. http://hellojava.info/?p=79
Finally, here is a Weibo post from 放翁: the details come out by asking questions.