Common Linux system monitoring
1, Monitoring the CPU
The top command:

top - 15:12:13 up 170 days, 13 min,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 114 total,   1 running, 113 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   5120000k total,  3950552k used,  1169448k free,   523188k buffers
Swap:  2096440k total,      112k used,  2096328k free,  2477872k cached
Fields on the Cpu(s) line: us = user space | sy = system (kernel) space | ni = user processes whose priority has been changed | id = idle | wa = waiting for I/O | hi = hardware interrupts | si = software interrupts | st = stolen time
Process list columns: PID = process ID | USER = user name | PR = process priority | NI = nice value | VIRT = virtual memory size | SHR = shared memory | RES = resident memory | S = process state | %CPU = CPU usage | %MEM = memory usage | TIME+ = total CPU time used | COMMAND = command
  PID USER      PR  NI   VIRT   SHR   RES S %CPU %MEM     TIME+ COMMAND
25255 tpsc      15   0  1576m   12m  179m S  0.0  3.6  11:53.72 java
23826 search    19   0  1715m  9.8m  142m S  0.0  2.9  20:34.79 java
14696 tpsc      17   0  1650m  9280   42m S  0.0  0.9   0:29.10 jstatd
17161 tpsc      18   0  79276   12m   13m S  0.0  0.3   0:00.02 hummockclient
 2732 ntp       15   0  19184  3780  4880 S  0.0  0.1   0:00.00 ntpd
17168 tpsc      18   0   715m  1800  4464 S  0.0  0.1   0:00.00 httpd
17158 root      18   0  43480  2176  3732 S  0.0  0.1   0:00.03 httpd
22244 root      15   0  34248  1796  2604 S  0.0  0.1   0:00.72 cfservd
28617 root      18   0  95924  1580  2124 S  0.0  0.0   0:01.35 DragoonAgent
30402 root      16   0  51172  1664  2060 S  0.0  0.0   0:00.00 sshd

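Interactive commands in top
Sorting commands
M --- sort by %MEM, which makes it easy to find the programs using the most memory
P --- sort by %CPU, which shows what is using the most CPU right now
T --- sort by TIME+, which shows what has used the most CPU time in total

Other commands
c --- show the full path and arguments in the COMMAND column
k --- kill a process directly, which makes killing processes very convenient
f --- choose which columns to display
o --- change the order of the displayed columns
1 --- show the state of each CPU separately (the default is a summary of all CPUs)
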
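Metrics:
CPU utilization: the sum of us + sy. The CPU is fully utilized when this reaches 100%. (The Cpu(s) summary line covers all CPUs together, so it tops out at 100%; the per-process %CPU column can exceed 100% on a multi-CPU machine.)
CPU load: on a 1-core machine, load = 1 means the CPU is fully utilized, and anything above that is overload. On an 8-core server, load = 8 is full utilization. So you need to know how many cores the machine has.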
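
The two metrics above can be computed directly from top's header. Here is a minimal Python sketch, assuming the older header format shown in this article (newer versions of top print the Cpu(s) line differently). The names parse_top_header, cpu_utilization and load_per_core are made up for illustration and are not part of any tool.

```python
import re

# Header lines copied from the top output shown earlier in this article.
SAMPLE_HEADER = """\
top - 15:12:13 up 170 days, 13 min,  1 user,  load average: 0.00, 0.00, 0.00
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
"""


def parse_top_header(text):
    """Return (load_averages, cpu_percentages) parsed from top's header text."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", text)
    if match is None:
        raise ValueError("no load average found")
    loads = tuple(float(value) for value in match.groups())
    # Matches pairs such as "0.0%us" and maps the two-letter label to the number.
    pairs = re.findall(r"([\d.]+)%([a-z]{2})", text)
    cpu = {label: float(value) for value, label in pairs}
    return loads, cpu


def cpu_utilization(cpu):
    """CPU utilization as defined above: us + sy."""
    return cpu["us"] + cpu["sy"]


def load_per_core(load, cores):
    """Load divided by core count: 1.0 is full utilization, above 1.0 is overload."""
    return load / cores


loads, cpu = parse_top_header(SAMPLE_HEADER)
print("utilization:", cpu_utilization(cpu))
print("1-minute load per core on 8 cores:", load_per_core(loads[0], 8))
```

The core count is passed in by hand here. On a real machine it could be taken from os.cpu_count() instead.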
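
2, Monitoring memory used by processes
free
yajun@yajun-VirtualBox:~$ free -m
-m   display in MB
-g   display in GB
-k   display in KB

Metric: look at how much memory is in use. If swap is being used, the system is almost certainly running slower.
Relationship: total = used + free
Second row:
-/+ buffers/cache means roughly the following:
-buffers/cache memory: 1397032 (equals used - buffers - cached from the first row)
+buffers/cache memory: 2752124 (equals free + buffers + cached from the first row)
So -buffers/cache reflects the memory actually consumed by programs, while +buffers/cache reflects the total memory that can be reclaimed for use.
The third row covers only the swap partition and needs no further explanation.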
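
The arithmetic behind the second row can be checked with a few lines of Python. This is only a sketch: the function name buffers_cache_row is hypothetical, and the input values (in kB) are borrowed from the Mem: and Swap: lines of the top output earlier in this article, not from the free run above.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return the (-buffers/cache, +buffers/cache) pair from first-row values."""
    minus = used - buffers - cached   # memory really consumed by programs
    plus = free + buffers + cached    # memory that could be made available
    return minus, plus


# Values in kB from the top output shown earlier in this article.
minus, plus = buffers_cache_row(
    used=3950552, free=1169448, buffers=523188, cached=2477872
)
print("-buffers/cache:", minus)
print("+buffers/cache:", plus)
print("sum:", minus + plus)
```

The two results add up to the total of 5120000k, which matches the relationship total = used + free.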

3, Monitoring disk capacity

---------------------------- df: usage of mount points ---------------------------------
Show usage for all mount points:
df -h
Show usage for the mount point that holds /home (this tells you how much more the directory can hold):
df -h /home
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              29G  2.2G   25G   9% /
-------------------------- du: usage of files or directories ---------------------------
Show how much disk space the /home directory occupies:
du -sh /home
50M    /home
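
For scripted checks, similar numbers to those df reports are available from Python's standard library. A minimal sketch, assuming shutil.disk_usage is an acceptable stand-in for df; the helper names human and disk_report are made up for this example.

```python
import shutil


def human(n_bytes):
    """Format a byte count with a B/K/M/G/T suffix, using powers of 1024."""
    value = float(n_bytes)
    for suffix in ("B", "K", "M", "G", "T"):
        if value < 1024 or suffix == "T":
            return f"{value:.1f}{suffix}"
        value /= 1024


def disk_report(path):
    """Return total, used and free bytes for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


report = disk_report(".")
for key, value in report.items():
    print(f"{key:>5}: {human(value)}")
```

As in the df output above, used plus free can come out smaller than total, because some filesystems reserve space for root.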

4, Monitoring file handles
lsof

Columns: COMMAND = command | PID = process ID | USER = user name | FD = file descriptor | TYPE = type | DEVICE = device | SIZE/OFF = size | NODE = node | NAME = name
COMMAND    PID USER    FD  TYPE     DEVICE SIZE/OFF    NODE NAME
init         1 root   cwd   DIR        8,1     4096       2 /
init         1 root   rtd   DIR        8,1     4096       2 /
init         1 root   txt   REG        8,1   108204  792004 /sbin/init
init         1 root   mem   REG        8,1    34408 4203395 /lib/tls/i686/cmov/libnss_nis-2.11.1.so
init         1 root   mem   REG        8,1   117086 4203398 /lib/tls/i686/cmov/libpthread-2.11.1.so
init         1 root   mem   REG        8,1  1405508 4203379 /lib/tls/i686/cmov/libc-2.11.1.so
init         1 root   mem   REG        8,1    30496 4203391 /lib/tls/i686/cmov/libnss_compat-2.11.1.so
init         1 root    0u   CHR        1,3      0t0    1388 /dev/null
init         1 root    1u   CHR        1,3      0t0    1388 /dev/null
init         1 root    2u   CHR        1,3      0t0    1388 /dev/null
init         1 root    3r  FIFO        0,8      0t0    2861 pipe
init         1 root    4w  FIFO        0,8      0t0    2861 pipe
init         1 root    5r   DIR       0,11        0       1 inotify
init         1 root    6r   DIR       0,11        0       1 inotify
init         1 root    7u  unix 0xf694e400      0t0    2862 socket
init         1 root    8u  unix 0xf694cc00      0t0    4262 socket
init         1 root    9u  unix 0xf6928a00      0t0    2970 socket
init         1 root   10u  unix 0xe84fcc00      0t0    4871 socket
kthreadd     2 root   cwd   DIR        8,1     4096       2 /

Every application starts out with three file descriptors, 0 through 2, which represent standard input, standard output and standard error. Because of this, the FDs of the files that most applications open start at 3.

FD values:
    cwd  current working directory;
    Lnn  library references (AIX);
    err  FD information error (see NAME column);
    jld  jail directory (FreeBSD);
    ltx  shared library text (code and data);
    Mxx  hex memory-mapped type number xx.
    m86  DOS Merge mapped file;
    mem  memory-mapped file;
    mmap memory-mapped device;
    pd   parent directory;
    rtd  root directory;
    tr   kernel trace file (OpenBSD);
    txt  program text (code and data);
    v86  VP/ix mapped file;

Common usage:
lsof -p <PID>                  see which files a given process has open
lsof /home/yajun/bin/.swp      see which processes have this file open

Recovering a file after deleting it by mistake:
lsof -p <PID of the vim process that is still open>

vim     11989 yajun    0u   CHR  136,3      0t0       6 /dev/pts/3
vim     11989 yajun    1u   CHR  136,3      0t0       6 /dev/pts/3
vim     11989 yajun    2u   CHR  136,3      0t0       6 /dev/pts/3
vim     11989 yajun    3u   REG    8,1    12288 6946875 /home/yajun/bin/.hello.txt.swp (deleted)

cat /proc/11989/fd/3
This prints the content that was just being edited:
3210#"! Utp?ad?hfsdljfklsdjafklajsdklfjasdljflasdjklf;'w
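
The recovery trick above amounts to spotting rows that lsof marks as (deleted) and turning the PID and FD number into a /proc path. A small Python sketch of that step, assuming the lsof column layout shown in this article; the function name deleted_open_files is hypothetical.

```python
import re

# lsof rows copied from the vim example above.
SAMPLE_LSOF = """\
vim     11989 yajun    0u   CHR  136,3      0t0       6 /dev/pts/3
vim     11989 yajun    1u   CHR  136,3      0t0       6 /dev/pts/3
vim     11989 yajun    2u   CHR  136,3      0t0       6 /dev/pts/3
vim     11989 yajun    3u   REG    8,1    12288 6946875 /home/yajun/bin/.hello.txt.swp (deleted)
"""


def deleted_open_files(lsof_output):
    """Return (pid, fd, name, proc_path) for every row marked as (deleted)."""
    found = []
    for raw_line in lsof_output.splitlines():
        line = raw_line.strip()
        if not line.endswith("(deleted)"):
            continue
        # NAME is the 9th field and may itself contain spaces.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        fd_match = re.match(r"\d+", fields[3])  # "3u" -> "3"; skips cwd, txt, mem
        if fd_match is None:
            continue
        pid = int(fields[1])
        fd = int(fd_match.group())
        name = fields[8][: -len("(deleted)")].rstrip()
        found.append((pid, fd, name, f"/proc/{pid}/fd/{fd}"))
    return found


for pid, fd, name, proc_path in deleted_open_files(SAMPLE_LSOF):
    print(f"{name} is still readable via: cat {proc_path}")
```

The file stays readable only while the process keeps it open, so the content should be copied out before vim is closed.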

Find the process using port 22:
lsof -i :22

See which connections exist to a given machine:
root@yjhexy:/home/yajun/work/ezra# lsof -i @10.20.156.47
COMMAND   PID  USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
chrome   6123 yajun   83u  IPv4 3219817      0t0  TCP yjhexy:37125->ccbu-156-47:50030 (ESTABLISHED)
chrome   6123 yajun  126u  IPv4 3219818      0t0  TCP yjhexy:37126->ccbu-156-47:50030 (ESTABLISHED)
java    10862  root   61u  IPv6 2853076      0t0  TCP yjhexy:50412->ccbu-156-47:9000 (ESTABLISHED)
java    11002  root   52u  IPv6 2855916      0t0  TCP yjhexy:41308->ccbu-156-47:9021 (ESTABLISHED)

Show the system-wide maximum number of open files:
more /proc/sys/fs/file-max

Show the maximum number of files each process may open (the "open files" line; ulimit -n prints just that value):
ulimit -a
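
Both limits can also be read from inside a program. A minimal Python sketch, assuming a Unix system where the standard resource module is available and, for the system-wide value, a Linux /proc filesystem; the function names are made up for this example.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limit on open files (RLIMIT_NOFILE)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide open file limit, or None if the file is absent."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return int(handle.read().strip())


soft, hard = open_file_limits()
print("per-process soft limit:", soft)
print("per-process hard limit:", hard)
print("system-wide limit:", system_file_max())
```

The soft limit is the value that ulimit -n reports for the current shell. An unlimited value shows up as resource.RLIM_INFINITY.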

5, Monitoring TCP connections
netstat
Common usage:
List the TCP ports in the LISTEN state: netstat -tl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost.localdoma:ipp *:*                     LISTEN
tcp6       0      0 yajun-VirtualBox:ipp    [::]:*                  LISTEN

For TCP connections, the State column shows the corresponding TCP protocol state:
       ESTABLISHED
              The socket has an established connection.
       SYN_SENT
              The socket is actively attempting to establish a connection.
       SYN_RECV
              A connection request has been received from the network.
       FIN_WAIT1
              The socket is closed, and the connection is shutting down.
       FIN_WAIT2
              Connection is closed, and the socket is waiting for a shutdown from the remote end.
       TIME_WAIT
              The socket is waiting after close to handle packets still in the network.
       CLOSE  The socket is not being used.
       CLOSE_WAIT
              The remote end has shut down, waiting for the socket to close.
       LAST_ACK
              The remote end has shut down, and the socket is closed. Waiting for acknowledgement.
       LISTEN The socket is listening for incoming connections. Such sockets are not included in the output unless you specify the --listening (-l) or --all (-a) option.
       CLOSING
              Both sockets are shut down but we still don't have all our data sent.
       UNKNOWN
              The state of the socket is unknown.

The image below is taken from http://en.wikipedia.org/wiki/File:Tcp_state_diagram_fixed.svg

With root privileges you can also see which process is using a port by adding the -p option.
For example:
sudo netstat -ap | grep ssh
tcp        0      0 *:ssh                   *:*                     LISTEN      737/sshd
tcp        0     48 yjhexy:ssh              10.22.1.117:52234       ESTABLISHED 14381/sshd: yajun [
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      737/sshd
This shows that process 14381 is the user yajun logged in to this machine over ssh. To kick that user off, run kill -9 14381.
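
When there are many connections, it helps to count them by state and pull out the owning PID automatically. A rough Python sketch, assuming the netstat -ap column layout shown above; parse_netstat is a hypothetical helper, not something netstat provides.

```python
from collections import Counter

# Rows copied from the "sudo netstat -ap | grep ssh" example above.
SAMPLE_NETSTAT = """\
tcp        0      0 *:ssh                   *:*                     LISTEN      737/sshd
tcp        0     48 yjhexy:ssh              10.22.1.117:52234       ESTABLISHED 14381/sshd: yajun [
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      737/sshd
"""


def parse_netstat(text):
    """Parse TCP rows of netstat output into a list of dicts."""
    rows = []
    for raw_line in text.splitlines():
        # The 7th field (PID/Program name) may contain spaces, so stop at 6 splits.
        fields = raw_line.strip().split(None, 6)
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        row = {
            "proto": fields[0],
            "local": fields[3],
            "foreign": fields[4],
            "state": fields[5],
            "pid": None,
            "program": None,
        }
        if len(fields) == 7 and "/" in fields[6]:
            pid, program = fields[6].split("/", 1)
            if pid.isdigit():
                row["pid"] = int(pid)
                row["program"] = program.strip()
        rows.append(row)
    return rows


rows = parse_netstat(SAMPLE_NETSTAT)
state_counts = Counter(row["state"] for row in rows)
print(dict(state_counts))
for row in rows:
    if row["state"] == "ESTABLISHED":
        print(row["foreign"], "is served by PID", row["pid"])
```

A large count of TIME_WAIT or CLOSE_WAIT entries in such a tally is usually the first thing to look at.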

6, Monitoring I/O status

Show I/O throughput
yajun@yjhexy:~$ iostat -d -k 1 10
Linux 2.6.32-25-generic (yjhexy)     2010-11-08     _i686_    (2 CPU)
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              11.79       135.52       183.75    6605035    8955948
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               2.00         0.00        12.00          0         12

Show I/O device utilization (%util) and response time (await)
yajun@yjhexy:~$ iostat -d -x -k 1 10
Linux 2.6.32-25-generic (yjhexy)     2010-11-08     _i686_    (2 CPU)
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               2.33    41.71    7.56    4.20   135.09   183.18    54.14     0.52   44.15   3.35   3.94
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
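
To track await and %util over time, the extended output can be turned into records. A minimal Python sketch, assuming the older iostat -x column set shown above (newer sysstat releases print different columns); parse_iostat_extended is a made-up name.

```python
# Output copied from the "iostat -d -x -k 1 10" example above.
SAMPLE_IOSTAT = """\
Linux 2.6.32-25-generic (yjhexy)     2010-11-08     _i686_    (2 CPU)
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               2.33    41.71    7.56    4.20   135.09   183.18    54.14     0.52   44.15   3.35   3.94
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
"""


def parse_iostat_extended(text):
    """Return one dict per device row, keyed by the column names of its header."""
    records = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]  # column names that follow the device name
            continue
        if header is None or len(fields) != len(header) + 1:
            continue  # banner line or something that is not a device row
        try:
            values = [float(field) for field in fields[1:]]
        except ValueError:
            continue
        record = {"device": fields[0]}
        record.update(zip(header, values))
        records.append(record)
    return records


records = parse_iostat_extended(SAMPLE_IOSTAT)
for record in records:
    print(record["device"], "await:", record["await"], "%util:", record["%util"])
```

The first report covers the time since boot, so the later samples are usually the ones worth watching.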
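
Show CPU status
yajun@yjhexy:~$ iostat -c 1 10
Linux 2.6.32-25-generic (yjhexy)     2010-11-08     _i686_    (2 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.46    2.06    5.31    1.51    0.00   86.65


7, Combined monitoring
yajun@yjhexy:~$ vmstat 2 5 (sample every 2 seconds, 5 samples in total)
Show how many processes have been forked since boot:
vmstat -f

Set the unit used when displaying sizes
vmstat -S m (display in MB of 1000000 bytes; -S M uses 1048576 bytes)

Show a summary of disk I/O statistics
vmstat -D

Performance metrics:
CPU saturation: the value in the r column under procs divided by the number of CPUs. Any non-zero value leads to a gradual decline in machine performance.
CPU utilization: the sum of us + sy. The CPUs are fully utilized when this reaches 100% (in vmstat these columns are percentages of total CPU time across all CPUs).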
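
The two vmstat metrics can be computed per sample with a short script. This is a sketch only: the vmstat output below is invented for illustration (the article does not include a real run), and parse_vmstat, cpu_saturation and cpu_utilization are hypothetical names.

```python
# Hypothetical vmstat output, made up for this example.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 4  0    112 1169448 523188 2477872    0    0     5    12  110  220 30 10 59  1
 1  0    112 1169448 523188 2477872    0    0     0     0  100  200  5  2 93  0
"""


def parse_vmstat(text):
    """Return one dict per sample row, keyed by vmstat's column names."""
    samples = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            header = fields  # the column-name line starts with "r"
            continue
        if header is None or len(fields) != len(header):
            continue  # the "procs ---memory---" banner or an unrelated line
        if all(field.isdigit() for field in fields):
            samples.append(dict(zip(header, (int(field) for field in fields))))
    return samples


def cpu_saturation(sample, cpus):
    """Saturation as defined above: the r column divided by the CPU count."""
    return sample["r"] / cpus


def cpu_utilization(sample):
    """Utilization as defined above: us + sy."""
    return sample["us"] + sample["sy"]


samples = parse_vmstat(SAMPLE_VMSTAT)
for sample in samples:
    print("saturation:", cpu_saturation(sample, cpus=2),
          "utilization:", cpu_utilization(sample))
```

The CPU count of 2 matches the "(2 CPU)" banner in the iostat output earlier. On a live system it could come from os.cpu_count().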