The classic MOS note "11gR2 Clusterware and Grid Home – What You Need to Know (Doc ID 1053147.1)" contains a well-known diagram that shows, at a glance, the dependencies among the many d.bin processes in a RAC cluster (that is, the startup and shutdown order, and which process starts or restarts which).
1. Oracle Clusterware Startup Process
Below is a brief walkthrough of how the Clusterware stack is brought up step by step.
- When a node of an Oracle Clusterware cluster starts or restarts, OHASD is started in a platform-specific way. OHASD is the root of the Oracle Clusterware startup. OHASD has access to the OLR (Oracle Local Registry) stored on the local file system; the OLR provides the data OHASD needs to complete its initialization.
- OHASD brings up GPNPD and CSSD. CSSD has access to the GPnP Profile stored on the local file system. This profile contains the following important bootstrap data (it can be dumped with gpnptool; see the sketch after this list):
- a. ASM Diskgroup Discovery String
- b. ASM SPFILE location (Diskgroup name)
- c. Name of the ASM Diskgroup containing the Voting Files
- The location of the Voting Files on the ASM disks is accessed by CSSD using pointers in the ASM disk headers, so CSSD can complete its initialization and start or join an existing cluster.
- OHASD starts an ASM instance; with CSSD already initialized and running, ASM can now operate. The ASM instance uses special code to locate the contents of the ASM SPFILE, assuming it is stored in a Diskgroup.
- Once the ASM instance is running and has mounted its Diskgroups, CRSD can access the Clusterware's OCR.
- OHASD starts CRSD, which accesses the OCR stored in an ASM Diskgroup.
- Clusterware completes its initialization and starts the other services under its control.
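As a quick way to inspect this bootstrap data on a live node, the GPnP profile can be dumped with gpnptool (a minimal sketch; run it as the Grid Infrastructure owner, the output is XML and differs per cluster):
~]$ gpnptool get
# prints the profile XML, which carries the bootstrap data listed above:
# the ASM discovery string, the ASM SPFILE location and the voting-file diskgroup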
Three files are involved when Clusterware starts up.
1) OLR – the first file to be read and opened. It is a local file and contains information about where the voting disks are stored and the information needed to start ASM (e.g. the ASM Discovery String).
2) VOTING DISK – the second file to be opened and read; this depends only on the OLR being accessible.
ASM starts after CSSD; if CSSD is offline (i.e. the voting files are missing), ASM will not start.
How are voting disks stored in ASM?
Voting disks are placed directly on the ASM disks: Oracle Clusterware stores the votedisk on the disks of the diskgroup chosen to hold the Voting Files.
Oracle Clusterware does not rely on ASM to access the Voting Files, which means Oracle Clusterware does not need the diskgroup to be mounted in order to read and write them on the ASM disks. Whether a voting file exists on an ASM disk can be checked via the VOTING_FILE column of V$ASM_DISK.
So, the fact that the voting files do not depend on a mounted diskgroup in order to be accessed does not mean no diskgroup is needed: the diskgroup and the voting files are linked through their configuration.
3) OCR – finally the ASM instance starts and mounts all diskgroups, and then the Clusterware daemon (CRSD) opens and reads the OCR stored in a Diskgroup.
Therefore, to start, ASM does not depend on the OCR or OLR being online; ASM depends on CSSD (the votedisk) being online.
There is an exclusive mode in which ASM can be started without CSSD, but it exists only for recovering the OCR or the voting files.
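A few commands for checking these three files on a running 11.2 cluster (a minimal sketch; ocrcheck needs root, the SQL query runs against the ASM instance, and the exclusive-mode start requires the stack to be down on all nodes first):
# OLR: location (on Linux the pointer file is /etc/oracle/olr.loc) and integrity
~]# cat /etc/oracle/olr.loc
~]# ocrcheck -local
# Voting files: locations registered with CSS, plus the ASM disks that carry them
~]# crsctl query css votedisk
SQL> select path, voting_file from v$asm_disk;
# OCR: location and integrity
~]# ocrcheck
# Exclusive mode (OCR/voting-file recovery only): start the stack without CRSD
~]# crsctl start crs -excl -nocrs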
1. OHASD Spawns
- cssdagent – the agent responsible for starting CSSD.
- orarootagent – the agent responsible for starting all ohasd resources owned by root.
- oraagent – the agent responsible for starting all ohasd resources owned by the oracle user.
- cssdmonitor – monitors CSSD and node health (together with cssdagent).
2.1. OHASD rootagent spawns
- CRSD – the primary daemon that manages cluster resources.
- CTSSD – Cluster Time Synchronization Service daemon.
- Diskmon
- ACFS (ASM Cluster File System) Drivers
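On Linux you can verify that the ACFS drivers this agent is supposed to load are actually installed and loaded (a sketch; acfsdriverstate is shipped in the Grid home's bin directory):
~]# acfsdriverstate installed
~]# acfsdriverstate loaded
# when loaded, the drivers appear as the oracleoks, oracleadvm and oracleacfs kernel modules
~]# lsmod | grep oracle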
2.2. OHASD oraagent spawns
- MDNSD – used for DNS lookups (multicast DNS daemon).
- GIPCD – used for inter-process and inter-node communication.
- GPNPD – Grid Plug & Play Profile daemon.
- EVMD – Event Monitor daemon.
- ASM – resource used for monitoring the ASM instance.
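The resources these OHASD-level agents manage are lower-stack ("init") resources, so they are listed with the -init flag; a sketch of the typical 11.2 resource names:
~]# crsctl stat res -t -init
# typically includes ora.asm, ora.cluster_interconnect.haip, ora.crf, ora.crsd,
# ora.cssd, ora.cssdmonitor, ora.ctssd, ora.diskmon, ora.drivers.acfs,
# ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd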
3. CRSD spawns
- orarootagent – the agent responsible for starting all crsd resources owned by root.
- oraagent – the agent responsible for starting all crsd resources owned by the oracle user.
4.1. CRSD rootagent spawns
- Network resource – monitors the public network.
- SCAN VIP(s) – Single Client Access Name Virtual IPs
- Node VIPs – one per node.
- ACFS Registry – mounts ASM Cluster File Systems.
- GNS VIP (optional) – VIP for GNS
4.2. CRSD oraagent spawns
- ASM Resource – used for managing/monitoring the ASM instance(s).
- Diskgroup – Used for managing/monitoring ASM diskgroups.
- DB Resource – used for managing/monitoring databases and instances.
- SCAN Listener – the SCAN (Single Client Access Name) listener, listening on a SCAN VIP.
- Listener – the node listener, listening on the Node VIP.
- Services – used for managing/monitoring services.
- ONS – Oracle Notification Service
- eONS – enhanced Oracle Notification Service.
- GSD – for backward compatibility with 9i.
- GNS (optional) – Grid Naming Service – handles name resolution.
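These CRSD-level agents manage the normal cluster resources shown by crsctl stat res -t; which agent owns a given resource can be read from its AGENT_FILENAME attribute (a sketch; ora.orcl.db is a hypothetical database resource name):
~]# crsctl stat res -t
~]# crsctl stat res ora.orcl.db -p | grep AGENT_FILENAME
# a database resource points at oraagent, while e.g. ora.net1.network points at orarootagent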
2. RAC Startup Sequence
- The first process to start is /u01/app/11.2.0.3/grid/bin/ohasd.bin; the "reboot" argument behind it indicates that it will be restarted automatically if it is killed (see the inittab note after the listing below).
~]# ps -ef|grep d.bin
root 4296 1 4 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4338 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root 4342 1 2 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
root 4348 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
root 4370 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
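On Linux the automatic restart actually comes from init: the Grid Infrastructure root scripts add an init.ohasd respawn entry to /etc/inittab (a sketch of the typical 11.2 entry on a SysV-init system; runlevels may differ):
~]# grep init.ohasd /etc/inittab
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null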
- Next, we see that orarootagent.bin, cssdagent and cssdmonitor have disappeared from the listing, and mdnsd.bin has been added.
~]# ps -ef|grep d.bin
root 4296 1 4 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4430 1 10 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 4444 1 2 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
- Then ocssd.bin, gpnpd.bin, orarootagent.bin, gipcd.bin, osysmond.bin, cssdmonitor, cssdagent and diskmon.bin are added.
~]# ps -ef|grep d.bin
root 4296 1 5 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4430 1 3 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 4444 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 4458 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 4472 1 5 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 4476 1 3 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 4494 1 2 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 4509 1 2 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 4530 1 5 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 4534 1 3 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 4557 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
- Then the following process is added:
ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
The ologgerd (Cluster Logger Service) process is installed automatically as part of the 11.2.0.2 installation (new in 11.2.0.2; earlier releases required a separate download and installation). It belongs to the Cluster Health Monitor (CHM) component.
CHM automatically collects operating system resource usage (CPU, memory, swap, processes, I/O, network, etc.), sampling once per second.
~]# ps -ef|grep d.bin
root 4296 1 3 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4430 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 4444 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 4458 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 4472 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 4476 1 2 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 4494 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 4509 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 4530 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 4534 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 4557 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 4590 1 1 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
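The data collected by ologgerd (and by osysmond.bin on each node, also visible in the listing) can be queried with the oclumon utility (a minimal sketch; dm01db01 is the node name seen in these listings):
~]# oclumon dumpnodeview -n dm01db01 -last "00:05:00"
# or dump the current view for every node
~]# oclumon dumpnodeview -allnodes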
- After ocssd.bin has started, octssd.bin is started.
~]# ps -ef|grep d.bin
root 4296 1 2 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4430 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 4444 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 4458 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 4472 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 4476 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 4494 1 2 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 4509 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 4530 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 4534 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 4557 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 4590 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root 4685 1 4 20:38 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
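octssd.bin is the Cluster Time Synchronization Service daemon; whether it runs in observer mode (when NTP is configured on the nodes) or in active mode can be checked with crsctl (a sketch):
~]# crsctl check ctss
# reports whether CTSS is in Observer or Active mode and, if Active, the current time offset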
- Next, evmd.bin is started.
~]# ps -ef|grep d.bin
root 4296 1 2 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4430 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 4444 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 4458 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 4472 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 4476 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 4494 1 2 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 4509 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 4530 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 4534 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 4557 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 4590 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root 4685 1 1 20:38 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid 4710 1 2 20:38 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
- Then come crsd.bin and tnslsnr.
~]# ps -ef|grep d.bin
root 4296 1 0 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4430 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 4444 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 4458 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 4472 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 4476 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 4494 1 0 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 4509 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 4530 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 4534 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 4557 1 0 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 4685 1 0 20:38 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid 4710 1 0 20:38 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root 5080 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root 5100 1 1 20:39 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid 5189 4710 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid 5229 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root 5242 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 5368 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 5376 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle 5466 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 5487 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
Once crsd.bin is up, you can use crsctl status res -t to check the status of the CRS resources. If crsd.bin has not started yet, use crsctl status res -t -init instead.
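A quick health check at this point (a sketch):
~]# crsctl check crs
# verifies that Oracle High Availability Services, CRS, CSS and EVM are online on this node
~]# crsctl check cluster -all
# checks CRS, CSS and EVM on every node in the cluster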
- Finally lsnrctl and oc4jctl appear; at this point, CRS startup is complete.
~]# ps -ef|grep d.bin
root 4296 1 0 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 4430 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 4444 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 4458 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 4472 1 0 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 4476 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 4494 1 0 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 4509 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 4530 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 4534 1 0 20:37 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 4557 1 0 20:37 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 4685 1 0 20:38 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid 4710 1 0 20:38 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root 5080 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root 5100 1 0 20:39 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid 5189 4710 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid 5229 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root 5242 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 5368 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 5376 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle 5466 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 5487 1 0 20:39 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
grid 6061 5487 0 20:41 ? 00:00:00 /bin/sh /u01/app/11.2.0.3/grid/bin/oc4jctl check
grid 6072 6061 1 20:41 ? 00:00:00 /u01/app/11.2.0.3/grid/perl/bin/perl /u01/app/11.2.0.3/grid/bin/oc4jctl.pl check 8888
grid 6086 5229 1 20:41 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER_SCAN1
grid 6088 5229 1 20:41 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER
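The lsnrctl status and oc4jctl check processes at the bottom of the listing are simply the agents' periodic check actions. Once everything is up, the listeners that CRS manages can be confirmed with srvctl (a sketch):
~]$ srvctl status scan_listener
~]$ srvctl status listener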
Translated from: http://oracle-help.com/oracle-rac/rac-11gr2-clusterware-startup-sequence/
For 19c RAC, see: Oracle Real Application Clusters 19c Technical Architecture