利用Oracle CRS搭建应用的高可用集群
【IT168技术文档】
前言:CRS的简介和由来
从Oracle 10gR1 RAC 开始,Oracle推出了自身的集群软件,这个软件的名称叫做Oracle Cluster Ready Service(Oracle集群就绪服务),简称CRS。从Oracle 10gR2开始,包括最新的11g,Oracle将其更名为Clusterware(集群件),但通常意义上我们认为CRS = Clusterware = Oracle Cluster Ready Service = Oracle Cluster Software.
CRS一般用来搭建Oracle的并行数据库,即RAC,但除了与RAC的接口之外,CRS还提供了一组高可用性的应用程序接口(API),用来搭建一般应用程序的高可用集群,即一般我们常说的双机热备,比如使用CRS实现MySQL的双机热备。
这种主备模式的双机热备还可以包括许多第三方的应用程序,比如虚拟IP、磁盘组、文件系统、MySQL数据库、Apache,或者单节点的Oracle实例,或者单节点的ASM,等等,都可以作为资源注册到CRS中去,由CRS来启动,关闭,监测应用程序的状态,还可以设置应用程序相互的依赖关系,保证多组资源正确的启动顺序。
??? 本文就以保护单节点oracle实例为例,演示如何使用CRS来实现上述功能。使用的主要的软件有:Solaris 10u4, Oracle CRS 10.2.0.2 , Oracle RDMBS 10.2.0.3, VxVM 5.0 ,磁盘阵列型号是AMS1000。
??? 系统拓扑图大致如下:
主要操作步骤如下:
一、准备工作:软件安装,数据库创建
1.安装Solaris 10, Veritas Volume Manager ,安装过程略
2.创建用户组dba/oinstall,oracle用户,并修改相应profile和/etc/hosts文件
修改rhosts文件,配置oracle用户的对等连接;
连接心跳网线;
如果生产环境中推荐心跳网络使用千兆,推荐每台机器有两块网卡分别连接两个网络交换机;
在操作系统中启动心跳网卡;
准备共享磁盘,这里使用的是SAN环境下的共享盘ams_wms0_0098,大小为20G;
3.下面使用静默方式来安装Oracle集群软件、数据库软件
这种安装创建方式的优点是创建速度比较快,并且不需要运行图形界面,适合远程安装和建库;
缺点是不直观,需要手工编写响应文件;
这一步CRS的安装只需要在一个节点上做:
oracle@rac01$. ./clusterware/runInstaller -silent -responsefile /tmp/shahand/crs.rsp
crs.rsp 文件内容参考“五-7”部分
CRS的runInstaller运行完毕以后,要手工在两个节点上运行root.sh,
并要手工运行$CRS_HOME/cfgtoollogs/configToolAllCommands
检查CRS安装配置正确:
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->root@rac01 #crs_stat -t
Name Type Target State Host
------------------------
ora....c01.gsd application ONLINE ONLINE rac01
ora....c01.ons application ONLINE ONLINE rac01
ora....c01.vip application ONLINE ONLINE rac01
ora....c02.gsd application ONLINE ONLINE rac02
ora....c02.ons application ONLINE ONLINE rac02
ora....c02.vip application ONLINE ONLINE rac02
检查设置了正确的心跳网络:
root@rac01 # $CRS_HOME/bin/oifcfg getif
e1000g0 10.198.88.0 global public
e1000g1 192.168.2.0 global cluster_interconnect
使用静默方式安装Oracle 数据库软件,这一步两个节点都要做:
./runInstaller -silent -responsefile /tmp/shahand/db.rsp
db.rsp 文件内容参考“五-8”部分
4.创建oracle数据库文件所需要的盘组、逻辑卷、文件系统、挂载文件系统并设置权限;
只需要在一个节点上做;??
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->root@rac01 # vxdisksetup -i ams_wms0_0098
root@rac01 # vxdg init oradata12 ams_wms0_0098
root@rac01 # vxassist -g oradata12 make oradata 18G
root@rac01 # vxedit -g oradata12 set user=oracle group=dba mode=644 oradata
root@rac01 # timex mkfs -F vxfs /dev/vx/rdsk/oradata12/oradata
version 7 layout
37748736 sectors, 18874368 blocks of size 1024, log size 65536 blocks
largefiles supported
real 10.30
user 0.07
sys 0.04
root@rac01 # mount -F vxfs -o largefiles /dev/vx/dsk/oradata12/oradata /oradata
root@rac01 # chown oracle:dba /oradata
root@rac01 # df -h /oradata/
Filesystem size used avail capacity Mounted on
/dev/vx/dsk/oradata12/oradata
18G 70M 17G 1% /oradata
5.静默方式创建oracle数据库,只需要一个节点上做:???
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->oracle@rac01 $ dbca -silent -createDatabase -sid orcl -sysPassword sys -systemPassword sys \
-datafileDestination /oradata -gdbName orcl -templateName General_Purpose.dbc
Copying database files
1% complete
3% complete
11% complete
18% complete
26% complete
37% complete
Creating and starting Oracle instance
40% complete
45% complete
50% complete
55% complete
56% complete
60% complete
62% complete
Completing Database Creation
66% complete
70% complete
73% complete
85% complete
96% complete
100% complete
手工检查数据库orcl的状态,可以登陆数据库select status from v$instance查看。
二、Oracle 集群软件资源的手工注册
1. 注销crs本身自带的ons、gsd、vip资源
root@rac01 # crs_stop -all
Attempting to stop `ora.rac01.gsd` on member `rac01`
Attempting to stop `ora.rac01.ons` on member `rac01`
Attempting to stop `ora.rac02.gsd` on member `rac02`
Attempting to stop `ora.rac02.ons` on member `rac02`
Stop of `ora.rac02.gsd` on member `rac02` succeeded.
Stop of `ora.rac02.ons` on member `rac02` succeeded.
Stop of `ora.rac01.gsd` on member `rac01` succeeded.
Stop of `ora.rac01.ons` on member `rac01` succeeded.
Attempting to stop `ora.rac01.vip` on member `rac01`
Attempting to stop `ora.rac02.vip` on member `rac02`
Stop of `ora.rac02.vip` on member `rac02` succeeded.
Stop of `ora.rac01.vip` on member `rac01` succeeded.
root@rac01 # crs_unregister ora.rac01.gsd
root@rac01 # crs_unregister ora.rac01.ons
root@rac01 # crs_unregister ora.rac01.vip
root@rac01 # crs_unregister ora.rac02.vip
root@rac01 # crs_unregister ora.rac02.ons
root@rac01 # crs_unregister ora.rac02.gsd
root@rac01 # crs_stat -t
CRS-0202: No resources are registered.
2.创建虚拟IP资源:
root@rac01 # crs_profile -create havip -t application -a /oracle/crs/bin/usrvip \
-o oi=e1000g0,ov=10.198.94.139,on=255.255.248.0
root@rac01 # crs_register havip
root@rac01 # crs_setperm havip -o root
root@rac01 # crs_setperm havip -u user:oracle:r-x
root@rac01 # crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------
ha_vip application 0/1 0/0 OFFLINE OFFLINE
root@rac01 # crs_start havip
root@rac01 # crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------
havip application 0/1 0/0 ONLINE ONLINE rac01
3.准备控制其他资源启动、关闭、检查的脚本文件dg.sh/fs.sh/db.sh/lsnr.sh
这四个脚本文件内容参考“五-3/4/5/6”部分
对crs_profile命令中的选项和参数做简单说明:
(1) 选项-r定义了该资源所依赖的资源,在下面的例子中,资源oradata_mount启动时依赖于
disk_group先 启动,需要停止disk_group的时候必须先停止资源oradata_mount,
资源orcl_db的启动则同时依赖于oradata_mount/disk_group/havip/listener;
(2) 参数-o 包括:ci的意思是crs对资源状态的监测间隔(check interval),单位为秒;
ra : crs重启资源的尝试次数,RESTART_ATTEMPTS,次数到达以后将重新分配;
fi : 资源状态出现错误以后,crs的尝试间隔,FAILURE_INTERVAL,单位是秒;
ft : 资源状态出现错误以后,crs的尝试次数,FAILURE_THRESHOLD;
这些参数可以使用默认值,分别是60秒/1/0秒/0。
(3) 参数-a 是指ACTION_SCRIPT,参数值为资源启动、关闭、监测的脚本,脚本固定的三个参数为
start/stop/check;
管理数据库监听的部分:
修改$ORACLE_HOME/network/admin/listener.ora文件,
将其中(HOST = rac01 )部分修改成(HOST = 10.198.94.139 ) (虚拟IP地址)
crs_profile -create listener -t application -a /oracle/crs/crs/public/lsnr.sh -r havip -o \
ci=180,ra=6,ft=2,fi=12
crs_register listener
crs_setperm listener -o root
crs_setperm listener -u user:oracle:r-x
crs_start listener
管理磁盘组和逻辑卷的部分:
crs_profile -create disk_group -t application -a /oracle/crs/crs/public/dg.sh -r havip -o \
ci=180,ra=6,ft=2,fi=12
crs_register disk_group
crs_setperm disk_group -o root
crs_setperm disk_group -u user:oracle:r-x
注:本身磁盘组的启动并不依赖于虚拟IP的启动,这里之所以设置两者的依赖关系,
是为了防止虚拟IP在一个节点启动,而磁盘组在另外一个节点启动,造成资源不一致的情况出现。
管理文件系统的部分:
crs_profile -create oradata_mount -t application -a /oracle/crs/crs/public/fs.sh -r disk_group -o \
ci=180,ra=6,ft=2,fi=12
crs_register oradata_mount
crs_setperm oradata_mount -o root
crs_setperm oradata_mount -u user:oracle:r-x
管理数据库实例的部分:
crs_profile -create orcl_db -t application -a /oracle/crs/crs/public/db.sh -r \
"oradata_mount listener" -o ci=180,ra=6,ft=2,fi=12
crs_register orcl_db
crs_setperm orcl_db -o root
crs_setperm orcl_db -u user:oracle:r-x
crs_start orcl_db
4.确保脚本具有执行属性,并把public 和profile的内容拷到第二个节点上。
# chmod +x /oracle/crs/crs/public/*
# rcp -r -p /oracle/crs/crs/public/* rac02:/oracle/crs/crs/public/
5.启动所有的资源
下面可以看到,在crs启动和关闭资源的过程中,其顺序是按照前面定义的资源依赖关系进行的:
root@rac01 # crs_stop -all
Attempting to stop `orcl_db` on member `rac01`
Stop of `orcl_db` on member `rac01` succeeded.
Attempting to stop `oradata_mount` on member `rac01`
Stop of `oradata_mount` on member `rac01` succeeded.
Attempting to stop `disk_group` on member `rac01`
Stop of `disk_group` on member `rac01` succeeded.
Attempting to stop `listener` on member `rac01`
Stop of `listener` on member `rac01` succeeded.
Attempting to stop `havip` on member `rac01`
Stop of `havip` on member `rac01` succeeded.
root@rac01 # crs_start -all
Attempting to start `havip` on member `rac01`
Start of `havip` on member `rac01` succeeded.
Attempting to start `listener` on member `rac01`
Start of `listener` on member `rac01` succeeded.
Attempting to start `disk_group` on member `rac01`
Start of `disk_group` on member `rac01` succeeded.
Attempting to start `oradata_mount` on member `rac01`
Start of `oradata_mount` on member `rac01` succeeded.
Attempting to start `orcl_db` on member `rac01`
Start of `orcl_db` on member `rac01` succeeded.
检查资源状态是否正常:
oracle@rac01 $ crs_stat -t
Name Type Target State Host
------------------------
disk_group application ONLINE ONLINE rac01
havip application ONLINE ONLINE rac01
listener application ONLINE ONLINE rac01
oradata_mount application ONLINE ONLINE rac01
orcl_db application ONLINE ONLINE rac01
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->这时候发现在输入网卡时候不小心输入了错误的网卡名称,公共网卡名称应该是e1000g0,2008-01-10 17:30:22.526: [ CRSRES][1580] startRunnable: setting CLI values
2008-01-10 17:30:22.527: [ CRSRES][1580] Attempting to start `dg` on member `rac01`
2008-01-10 17:30:22.589: [ CRSRES][1581] startRunnable: setting CLI values
2008-01-10 17:30:22.629: [ CRSRES][1581] Attempting to start `havip` on member `rac01`
2008-01-10 17:30:22.688: [ CRSAPP][1581] StartResource error for havip error code = 1
2008-01-10 17:30:22.749: [ CRSAPP][1581] StopResource error for havip error code = 1
2008-01-10 17:30:22.757: [ CRSRES][1581] X_OP_StopResourceFailed : Stop Resource failed
(File: rti.cpp, line: 1796
2008-01-10 17:30:22.758: [ CRSRES][1581][ALERT] `havip` on member `rac01` has experienced an ...
2008-01-10 17:30:22.758: [ CRSRES][1581] Human intervention required to resume its availability.
2008-01-10 17:30:23.211: [ CRSRES][1580] Start of `dg` on member `rac01` succeeded.
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->#!/bin/sh
# *****************************************************************
# shahand 2008-1-3
SCRIPT=$0
ACTION=$1 # Action (start, stop or check)
DG_NAME=oradata12
VOL_NAME=oradata
case $1 in
'start')
/usr/sbin/vxdg -tfC import $DG_NAME;
/usr/sbin/vxvol -g $DG_NAME startall
echo "Resource STARTED"
;;
'stop')
/usr/sbin/vxvol -g $DG_NAME stopall;
/usr/sbin/vxdg deport $DG_NAME
echo "Resource STOPPED"
;;
'check')
if [[ `/usr/sbin/vxprint -v -g $DG_NAME |grep $VOL_NAME|wc -c` -eq 0 ]]
then
exit 1
else
exit 0
fi
echo "Resource CHECKED"
;;
*)
echo "usage: $0 {start stop check}"
;;
esac
exit 0
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->7.Oracle集群软件的安装响应文件crs.rsp ,其中在####之前需要修改???#!/bin/sh
# *****************************************************************
# ## shahand 2008-1-3
# *****************************************************************
# /oracle/crs/crs/public/lsnr.sh
SCRIPT=$0
ACTION=$1 # Action (start, stop or check)
case $1 in
'start')
su - oracle -c "lsnrctl start"
exit $?
;;
'stop')
su - oracle -c "lsnrctl stop"
exit $?
;;
'check')
if [ `ps -ef|grep tnslsnr |wc -c` -eq 0 ]
then
echo "bad";
exit 1
else
echo "good";
exit 0
fi
;;
*)
echo "usage: $0 {start stop check}"
;;
esac
exit 0
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->ORACLE_HOME="/oracle/crs"
sl_tableList={"rac01:rac01-priv:rac01-vip:N:Y","rac02:rac02-priv:rac02-vip:N:Y"}
ret_PrivIntrList={"e1000g0:10.198.88.0:1","e1000g1:192.168.0.0:2"}
n_storageTypeOCR=2
s_ocrpartitionlocation="/oracle/ocrfile1"
s_ocrMirrorLocation=""
n_storageTypeVDSK=2
s_votingdisklocation="/oracle/vdfile1"
s_OcrVdskMirror1RetVal=""
s_VdskMirror2RetVal=""
ORACLE_HOME_NAME="OraCRS10ghome1"
################# complete modify #####################
RESPONSEFILE_VERSION=2.2.1.0.0
UNIX_GROUP_NAME="oinstall"
FROM_LOCATION="../stage/products.xml"
NEXT_SESSION_RESPONSE=<Value Unspecified>
TOPLEVEL_COMPONENT={"oracle.crs","10.2.0.1.0"}
DEINSTALL_LIST={"oracle.crs","10.2.0.1.0"}
SHOW_SPLASH_SCREEN=false
SHOW_WELCOME_PAGE=false
SHOW_NODE_SELECTION_PAGE=false
SHOW_SUMMARY_PAGE=false
SHOW_INSTALL_PROGRESS_PAGE=false
SHOW_CONFIG_TOOL_PAGE=false
SHOW_XML_PREREQ_PAGE=false
SHOW_ROOTSH_CONFIRMATION=true
SHOW_END_SESSION_PAGE=false
SHOW_EXIT_CONFIRMATION=false
NEXT_SESSION=false
NEXT_SESSION_ON_FAIL=false
SHOW_DEINSTALL_CONFIRMATION=false
SHOW_DEINSTALL_PROGRESS=false
RESTART_SYSTEM=false
RESTART_REMOTE_SYSTEM=false
REMOVE_HOMES=
ORACLE_HOSTNAME=<Value Unspecified>
SHOW_END_OF_INSTALL_MSGS=false
COMPONENT_LANGUAGES={"en"}
s_clustername="crs"
CLUSTER_CONFIGURATION_FILE=""
8. Oracle数据库软件的安装响应文件db.rsp,其中在####之前需要修改
CLUSTER_NODES={}
ORACLE_HOME_NAME="OraDB10ghome1"
ORACLE_HOME="/oracle/10g"
##########################################################
FROM_LOCATION="../stage/products.xml"
RESPONSEFILE_VERSION=2.2.1.0.0
UNIX_GROUP_NAME="oinstall"
FROM_LOCATION_CD_LABEL=<Value Unspecified>
SHOW_WELCOME_PAGE=true
SHOW_CUSTOM_TREE_PAGE=true
SHOW_COMPONENT_LOCATIONS_PAGE=true
SHOW_SUMMARY_PAGE=true
SHOW_INSTALL_PROGRESS_PAGE=true
SHOW_REQUIRED_CONFIG_TOOL_PAGE=true
SHOW_CONFIG_TOOL_PAGE=true
SHOW_RELEASE_NOTES=true
SHOW_ROOTSH_CONFIRMATION=true
SHOW_END_SESSION_PAGE=true
SHOW_EXIT_CONFIRMATION=true
NEXT_SESSION=false
NEXT_SESSION_ON_FAIL=true
NEXT_SESSION_RESPONSE=<Value Unspecified>
DEINSTALL_LIST={"oracle.server","10.2.0.1.0"}
SHOW_DEINSTALL_CONFIRMATION=true
SHOW_DEINSTALL_PROGRESS=true
ACCEPT_LICENSE_AGREEMENT=false
TOPLEVEL_COMPONENT={"oracle.server","10.2.0.1.0"}
SHOW_SPLASH_SCREEN=true
SELECTED_LANGUAGES={"en"}
COMPONENT_LANGUAGES={"en"}
INSTALL_TYPE="Enterprise Edition"
sl_superAdminPasswds=<Value Unspecified>
sl_dlgASMCfgSelectableDisks={}
s_superAdminSamePasswd=<Value Unspecified>
s_globalDBName="orcl"
s_dlgASMCfgRedundancyValue="2 (Norm)"
s_dlgASMCfgNewDisksSize="0"
s_dlgASMCfgExistingFreeSpace="0"
s_dlgASMCfgDiskGroupName="DATA"
s_dlgASMCfgDiskDiscoveryString=""
s_dlgASMCfgAdditionalSpaceNeeded=" MB"
s_dbSelectedUsesASM=""
s_dbSIDSelectedForUpgrade=""
s_dbRetChar=""
s_dbOHSelectedForUpgrade=""
s_ASMSYSPassword=<Value Unspecified>
n_performUpgrade=0
n_dlgASMCfgRedundancySelected=2
n_dbType=1
n_dbSelection=0
b_useSamePassword=false
b_useFileSystemForRecovery=true
b_receiveEmailNotification=false
b_loadExampleSchemas=false
b_enableAutoBackup=false
b_dlgASMShowCandidateDisks=true
b_centrallyManageASMInstance=true
sl_dlgASMDskGrpSelectedGroup={" "," "," "," "}
s_dlgRBOUsername=""
s_dlgEMCentralAgentSelected="No Agents Found"
b_useDBControl=true
s_superAdminSamePasswdAgain=<Value Unspecified>
s_dlgEMSMTPServer=""
s_dlgEMEmailAddress=""
s_dlgRBORecoveryLocation="/oracle/db/flash_recovery_area/"
n_upgradeDB=1
n_configurationOption=3c
sl_upgradableSIDBInstances={}
n_upgradeASM=0
sl_dlgASMCfgDiskSelections={}
s_ASMSYSPasswordAgain=<Value Unspecified>
n_dbStorageType=0
s_rawDeviceMapFileLocation=""
sl_upgradableRACDBInstances={}
s_dlgRBOPassword=<Value Unspecified>
b_stateOfUpgradeDBCheckbox=false
s_dbSid="orcl"
b_dbSelectedUsesASM=false
sl_superAdminPasswdsAgain=<Value Unspecified>
s_mountPoint="/oracle/db/oradata/"
b_stateOfUpgradeASMCheckbox=false
oracle.assistants.server:OPTIONAL_CONFIG_TOOLS="{}"
oracle.has.common:OPTIONAL_CONFIG_TOOLS="{}"
oracle.network.client:OPTIONAL_CONFIG_TOOLS="{}"
oracle.sqlplus.isqlplus:OPTIONAL_CONFIG_TOOLS="{}"
oracle.sysman.console.db:OPTIONAL_CONFIG_TOOLS="{}"
varSelect=1
s_nameForOPERGrp="dba"
s_nameForDBAGrp="dba"