
Notes on a paging space surge caused by rman

2012-08-31 

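Today I ran into something odd at a customer site. It was urgent, so I treated it as an emergency and fixed it first.
A colleague reported that /home on one of the customer's hosts had filled up and asked me to deal with it. I logged in to the host and, sure enough, /home was at 100%.
root@hisdb02:/home/oracle/capaa#df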
Filesystem    512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4         2097152   2021744    4%     2298     2% /
/dev/hd2         6815744   3682120   46%    37198     9% /usr
/dev/hd9var      2097152    945848   55%      442     1% /var
/dev/hd3        33554432  30177464   11%     1318     1% /tmp
/dev/hd1         2097152     13864  100%      455    19% /home
/proc                  -         -    -         -     -  /proc
/dev/hd10opt     2097152   1918936    9%     2738     2% /opt
/dev/lvoracle   62914560  21145136   67%    71833     3% /oracle
/dev/fslv00   2086666240 934258592   56%      282     1% /rman
/dev/lvdbra     83886080  74595552   12%    21011     1% /dbra
/dev/lvarch    167772160 160255912    5%      121     1% /archlog/orcl2
hisdb01:/archlog/orcl1  167772160 159523040    5%      125     1% /archlog/orcl1
P520:/Tbackup 1258291200 711808232   44%      690     1% /Tbackup
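At first I assumed this would be simple and went straight to /home to check how much space each subdirectory was using. On a closer look, the subdirectories added up to only a little over 100 MB, while the /home file system is 1 GB. That is where things started to look strange.
root@hisdb02:/home#du -sk *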
8       dbra
4       esaadmin
0       guest
0       lost+found
108728  oracle
4       sshd
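The mismatch can be put in numbers. The following is a minimal sketch that uses only the figures from the df and du output above; the helper name `unaccounted_kb` is made up for illustration. A gap of this size often points to space held by something du cannot see, such as files deleted while a process still has them open, but nothing at this stage of the investigation confirms that.

```shell
#!/bin/sh
# unaccounted_kb TOTAL_BLOCKS FREE_BLOCKS DU_KB   (hypothetical helper)
# Prints how many KB df counts as used that du does not account for.
# df reports 512-byte blocks here, so two blocks make one KB.
unaccounted_kb() {
    used_kb=$(( ($1 - $2) / 2 ))
    echo $(( used_kb - $3 ))
}

# /home according to df: 2097152 blocks in total, 13864 free.
# du -sk * in /home: 8 + 4 + 0 + 0 + 108728 + 4 KB.
du_kb=$(( 8 + 4 + 0 + 0 + 108728 + 4 ))
gap_kb=$(unaccounted_kb 2097152 13864 "$du_kb")
echo "du sees ${du_kb} KB; ${gap_kb} KB (about $(( gap_kb / 1024 )) MB) is unaccounted for"
```

With these inputs the gap comes out at roughly 911 MB. Note that `du -sk *` skips hidden entries at the top of /home, so a small part of the gap could come from there.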
So I deleted a fairly large file right away (capaa_agent.tar, about 8 MB), but the freed space in /home was used up again almost immediately.
root@hisdb02:/home/oracle/capaa#ls -rtl
total 15960
drwxr-xr-x   7 oracle   dba             256 Feb 16 2010  java5_64
drwxr-x---  10 oracle   dba             256 May 12 2010  capaa_agent
drwxr-xr-x   2 oracle   dba             256 Dec 23 11:57 dict
drwxr-xr-x   2 oracle   dba             256 Dec 23 11:57 exp
-rw-r-----   1 oracle   dba         8171520 Dec 23 13:53 capaa_agent.tar
drwxr-xr-x   2 oracle   dba             256 Jan 07 14:17 script
root@hisdb02:/home/oracle/capaa#rm -rf capaa_agent.tar
root@hisdb02:/home/oracle/capaa#df
Filesystem    512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4         2097152   2021744    4%     2298     2% /
/dev/hd2         6815744   3682120   46%    37198     9% /usr
/dev/hd9var      2097152    945848   55%      442     1% /var
/dev/hd3        33554432  30177464   11%     1318     1% /tmp
/dev/hd1         2097152     13864  100%      455    19% /home
/proc                  -         -    -         -     -  /proc
/dev/hd10opt     2097152   1918936    9%     2738     2% /opt
/dev/lvoracle   62914560  21145136   67%    71833     3% /oracle
/dev/fslv00   2086666240 934258592   56%      282     1% /rman
/dev/lvdbra     83886080  74595552   12%    21011     1% /dbra
/dev/lvarch    167772160 160255912    5%      121     1% /archlog/orcl2
hisdb01:/archlog/orcl1  167772160 159523040    5%      125     1% /archlog/orcl1
P520:/Tbackup 1258291200 711808232   44%      690     1% /Tbackup
root@hisdb02:/home/oracle/capaa#df
Filesystem    512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4         2097152   2021744    4%     2298     2% /
/dev/hd2         6815744   3682120   46%    37198     9% /usr
/dev/hd9var      2097152    945848   55%      442     1% /var
/dev/hd3        33554432  30177464   11%     1318     1% /tmp
/dev/hd1         2097152       808  100%      455    48% /home
/proc                  -         -    -         -     -  /proc
/dev/hd10opt     2097152   1918936    9%     2738     2% /opt
/dev/lvoracle   62914560  21145128   67%    71833     3% /oracle
/dev/fslv00   2086666240 934258592   56%      282     1% /rman
/dev/lvdbra     83886080  74595552   12%    21011     1% /dbra
/dev/lvarch    167772160 160255912    5%      121     1% /archlog/orcl2
hisdb01:/archlog/orcl1  167772160 159523040    5%      125     1% /archlog/orcl1
P520:/Tbackup 1258291200 711808232   44%      690     1% /Tbackup
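Things got stranger still: extending the /home file system to 2 GB failed with an insufficient-space error, even though rootvg still had free space.
root@hisdb02:/home/oracle/capaa/java5_64/jre#lsvg rootvg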
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  00ca44e400004c0000000123df6dcc7d
VG STATE:           active                   PP SIZE:        256 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1092 (279552 megabytes)
MAX LVs:            256                      FREE PPs:       14 (3584 megabytes)
LVs:                13                       USED PPs:       1078 (275968 megabytes)
OPEN LVs:           12                       QUORUM:         1
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         2                        AUTO ON:        yes
MAX PPs per VG:     32512                                    
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 1024 kilobyte(s)         AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
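At this point I instinctively checked paging space usage with lsps, and the result gave me a shock: paging space was already 96% used, which meant the system could go down at any moment!
root@hisdb02:/home/oracle/capaa/java5_64/jre/lib#lsps -a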
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type
hd6             hdisk0            rootvg       20480MB    96   yes   yes    lv
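Since rootvg no longer had enough free space, I had to shrink another file system to release space back to rootvg. Fortunately, AIX 5.3 supports shrinking a file system online, so I used smitty fs to shrink /archlog/orcl2 to 50 GB right away.
root@hisdb02:/dbra/oswatch/osw#smitty fs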

Change / Show Characteristics of an Enhanced Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
 
                                                        [Entry Fields]
  File system name                                    /archlog/orcl2
  NEW mount point                                    [/archlog/orcl2]
  SIZE of file system
          Unit Size                                   Gigabytes                                                                                                      +
          Number of units                            [50]                                                                                                             #
  Mount GROUP                                        []
  Mount AUTOMATICALLY at system restart?              yes                                                                                                            +
  PERMISSIONS                                         read/write                                                                                                     +
  Mount OPTIONS                                      []                                                                                                              +
  Start Disk Accounting?                              no                                                                                                             +
  Block Size (bytes)                                  4096
  Inline Log?                                         no
  Inline Log size (MBytes)                           [0]                                                                                                              #
  Extended Attribute Format                          [v1]
  ENABLE Quota Management?                            no                                                                                                             +
  Allow Small Inode Extents?                          no 
Then I added paging space online.
root@hisdb02:/dbra/oswatch/osw#smitty mkps

                                                                        Add Another Paging Space

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Volume group name                                   rootvg
  SIZE of paging space (in logical partitions)       [60]                                                                                                             #
  PHYSICAL VOLUME name                                                                                                                                               +
  Start using this paging space NOW?                  yes                                                                                                            +
  Use this paging space each time the system is       yes                                                                                                            +
          RESTARTED?
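The sizes entered in the two smitty screens can be checked against the lsvg output with some partition arithmetic. This is only a sketch: it assumes the logical volume behind /archlog/orcl2 lives in rootvg (the session implies this but never shows it) and that each logical partition maps to one physical partition, which would not hold on a mirrored volume group. The helper name `pps_for_mb` is hypothetical.

```shell
#!/bin/sh
PP_MB=256   # PP SIZE reported by lsvg rootvg

# pps_for_mb SIZE_MB   (hypothetical helper)
# Number of physical partitions needed to cover SIZE_MB, rounded up.
pps_for_mb() {
    echo $(( ($1 + PP_MB - 1) / PP_MB ))
}

free_pps=14                                  # FREE PPs reported by lsvg rootvg
old_gb=$(( 167772160 / 2 / 1024 / 1024 ))    # /archlog/orcl2 before: 512-byte blocks -> GB
new_gb=50                                    # target size entered in smitty fs
released_pps=$(pps_for_mb $(( (old_gb - new_gb) * 1024 )))
paging_mb=$(( 60 * PP_MB ))                  # 60 logical partitions entered in smitty mkps

echo "shrinking ${old_gb}G to ${new_gb}G releases ${released_pps} PPs"
echo "free PPs afterwards: $(( free_pps + released_pps )) (60 needed for the new paging space)"
echo "new paging space: ${paging_mb} MB"
```

Under those assumptions the shrink releases 120 partitions, and 60 partitions of 256 MB give a 15360 MB paging space. On the command line the same two steps are usually done with `chfs -a size=50G /archlog/orcl2` and `mkps -a -n -s 60 rootvg`; these invocations are not from the original session, so check them against the man pages before use.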
Now check paging space usage again:
root@hisdb02:/dbra/oswatch/osw#lsps -a
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type
paging00        hdisk1            rootvg       15360MB     1   yes   yes    lv
hd6             hdisk0            rootvg       20480MB    96   yes   yes    lv

topas gives the system-wide picture: because paging space has been added, overall usage is now down to 54.4%.
  PAGING           MEMORY
  Faults    18677  Real,MB   23168
  Steals        0  % Comp     95.5
  PgspIn        3  % Noncomp   3.3
  PgspOut       0  % Client    3.3
  PageIn        3
  PageOut       0  PAGING SPACE
  Sios          3  Size,MB   35840
                   % Used     54.4
  NFS (calls/sec)  % Free     46.6
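The overall figure can be cross-checked from the per-device lines of the `lsps -a` output above by weighting each %Used by the size of its paging space. The awk helper below is a sketch for that purpose; the name `paging_summary` is made up, and it assumes the column layout shown in this session (size in column 4 with an MB suffix, %Used in column 5).

```shell
#!/bin/sh
# paging_summary   (hypothetical helper)
# Reads `lsps -a` output on stdin and prints "<total MB> <overall %used>".
paging_summary() {
    awk 'NR > 1 && NF >= 5 {
             size = $4
             sub(/MB$/, "", size)             # "20480MB" -> 20480
             total += size
             used  += size * $5 / 100         # column 5 is %Used
         }
         END { if (total > 0) printf "%d %.1f\n", total, used * 100 / total }'
}

# Sample input copied from the lsps -a output above.
paging_summary <<'EOF'
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type
paging00        hdisk1            rootvg       15360MB     1   yes   yes    lv
hd6             hdisk0            rootvg       20480MB    96   yes   yes    lv
EOF
```

For the two lines above this gives 35840 MB in total at about 55% used, in line with the topas panel. The small difference is plausibly due to lsps rounding %Used to whole percent. On a live system `lsps -s` prints a similar summary directly.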
I also noticed two rman processes that were holding a large amount of paging space and consuming a lot of CPU.
Name            PID  CPU%  PgSp Owner
rman        5222520  26.0 9179.4 oracle
rman        5251162  25.8 9185.1 oracle
root@hisdb02:/dbra/oswatch/osw#ps -ef|grep 5222520
  oracle 2703384 5222520   0 17:23:44      -  0:00 oracleorcl2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
Fortunately, the system was not actually doing much paging.
root@hisdb02:/home/oracle/capaa/java5_64/jre/lib#vmstat 1 1000

System configuration: lcpu=16 mem=23168MB

kthr    memory              page              faults        cpu   
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
3  0 9850114 25020   0   1   0   0    0   0 2234 221390 5433 38  5 50  8
3  0 9852576 22552   0   6   0   0    0   0 3260 219950 7870 37  7 50  6
4  0 9848480 26646   0   2   0   0    0   0 2903 211954 6986 40  5 49  6
6  0 9848475 26649   0   2   0   0    0   0 5327 309306 14053 51  7 39  3
0  0 9851030 24091   0   3   0   0    0   0 4055 234427 9910 48  6 42  5
7  0 9850986 24130   0   4   0   0    0   0 4943 242181 11004 47  6 38  8
6  0 9851331 23780   0   5   0   0    0   0 8689 225650 17413 54  8 31  7
5  0 9854364 20747   0   0   0   0    0   0 9113 210502 19479 42  7 38 12
5  0 9851668 23442   0   1   0   0    0   0 7968 222546 16911 46  7 36 12
2  0 9849453 25656   0   1   0   0    0   0 8796 199683 18580 31  7 52  9
4  0 9849537 25571   0   1   0   0    0   0 8406 202812 17416 34  7 50  9
4  0 9849601 25501   0   6   0   0    0   0 5297 195486 10961 33  7 54  7
8  0 9849166 25932   0   4   0   0    0   0 2769 209397 6577 34  5 54  6
3  0 9849234 25862   0   2   0   0    0   0 2268 195945 5606 30  5 56  9
5  0 9853975 21117   0   4   0   0    0   0 3964 287321 8923 51  6 36  6
4  0 9853970 21121   0   1   0   0    0   0 3265 248413 7233 44  6 43  7
2  0 9854754 20334   0   2   0   0    0   0 1994 208690 5000 33  5 52  9
2  0 9854517 20570   0   1   0   0    0   0 3786 200623 8628 30  5 53 12
2  0 9852136 22947   0   4   0   0    0   0 4811 248666 11358 37  6 47 10
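Two things in this output are worth quantifying: how far active virtual memory (avm) exceeds real memory, and how little page-in and page-out activity (pi, po) there is. The sketch below does both with awk. It assumes avm is counted in 4 KB pages and that the columns are laid out as in the output above; the helper name `vmstat_summary` is hypothetical.

```shell
#!/bin/sh
# vmstat_summary REAL_MB   (hypothetical helper)
# Reads vmstat sample rows on stdin and prints
# "<peak avm in MB> <MB above real memory> <average pi> <average po>".
vmstat_summary() {
    awk -v real_mb="$1" '
        $1 ~ /^[0-9]+$/ && NF >= 17 {               # data rows only, skip headers
            if ($3 + 0 > peak + 0) peak = $3        # column 3 is avm
            pi_sum += $6; po_sum += $7; n++         # columns 6 and 7 are pi and po
        }
        END {
            if (n == 0) exit 1
            avm_mb = peak * 4 / 1024                # assumed 4 KB pages -> MB
            printf "%d %d %.1f %.1f\n", avm_mb, avm_mb - real_mb, pi_sum / n, po_sum / n
        }'
}

# Three rows copied from the output above; mem=23168MB comes from its header.
vmstat_summary 23168 <<'EOF'
3  0 9850114 25020   0   1   0   0    0   0 2234 221390 5433 38  5 50  8
3  0 9852576 22552   0   6   0   0    0   0 3260 219950 7870 37  7 50  6
5  0 9854364 20747   0   0   0   0    0   0 9113 210502 19479 42  7 38 12
EOF
```

Under the 4 KB assumption, avm corresponds to roughly 38 GB against about 23 GB of real memory, while pi stays in single digits and po at zero. That fits the observation that paging space was heavily allocated but not being actively paged in and out.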
Given the risk of the system going down, I did not deliberate any further and killed the rman processes outright.
root@hisdb02:/dbra/app#kill -9 1331316 5222520 5251162
Right after the processes were killed, usage of the /home file system dropped.
root@hisdb02:/dbra/app#df 
Filesystem    512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4         2097152   2021512    4%     2300     2% /
/dev/hd2         6815744   3682120   46%    37198     9% /usr
/dev/hd9var      2097152    945720   55%      442     1% /var
/dev/hd3        33554432  30177448   11%     1319     1% /tmp
/dev/hd1         2097152   1877832   11%      454     1% /home
/proc                  -         -    -         -     -  /proc
/dev/hd10opt     2097152   1918936    9%     2738     2% /opt
/dev/lvoracle   62914560  21142832   67%    71815     3% /oracle
/dev/fslv00   2086666240 934258592   56%      282     1% /rman
/dev/lvdbra     83886080  78449208    7%    20883     1% /dbra
/dev/lvarch    104857600  96784104    8%      124     1% /archlog/orcl2
hisdb01:/archlog/orcl1  167772160 159175536    6%      129     1% /archlog/orcl1
P520:/Tbackup 1258291200 710049520   44%      723     1% /Tbackup
Paging space usage also fell back to a normal level.
root@hisdb02:/dbra/app#lsps -a
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type
paging00        hdisk1            rootvg       15360MB     1   yes   yes    lv
hd6             hdisk0            rootvg       20480MB    30   yes   yes    lv
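The effect of the kill can be read off by comparing the before and after figures. The short sketch below uses only numbers from the df and lsps output in this post; the helper name `blocks_to_mb` is made up for illustration.

```shell
#!/bin/sh
# blocks_to_mb N   (hypothetical helper): 512-byte blocks -> whole MB
blocks_to_mb() {
    echo $(( $1 / 2 / 1024 ))
}

# /home free blocks: 808 before the kill, 1877832 after it (df output).
home_freed_mb=$(blocks_to_mb $(( 1877832 - 808 )))

# hd6 is 20480 MB; %Used went from 96 to 30 (lsps -a output).
hd6_released_mb=$(( 20480 * (96 - 30) / 100 ))

echo "/home regained about ${home_freed_mb} MB"
echo "hd6 released about ${hd6_released_mb} MB"
```

/home regains about 916 MB, which is close to the difference between what df and du reported for it at the start (about 1017 MB used versus about 106 MB visible in the subdirectories). That is consistent with the killed processes having held that space, although the session does not show how they held it.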

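Afterwards I searched Metalink, but Oracle has no clear statement that rman can cause heavy paging space usage, and since the processes had already been killed there was little evidence left to investigate further. When firefighting at a customer site, one rule matters most: restoring the application and keeping the business unaffected always comes first.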
