
Oracle 11.2 RAC Cluster Startup Process, Part 1: Startup

The classic MOS note "11gR2 Clusterware and Grid Home – What You Need to Know (Doc ID 1053147.1)" contains a well-known diagram that shows at a glance the dependencies among the many d.bin processes in a RAC cluster (that is, the startup and shutdown order, and which process starts or restarts which).

1. Oracle Clusterware Startup Process

The following is a brief walk-through of how the Clusterware stack is brought up step by step.

  1. When a node of an Oracle Clusterware cluster starts or restarts, OHASD is started in a platform-specific way. OHASD is the root of the Oracle Clusterware startup. OHASD has access to the OLR (Oracle Local Registry) stored on the local file system; the OLR provides the data needed to complete OHASD initialization.

  2. OHASD then brings up GPNPD and CSSD. CSSD has access to the GPnP Profile stored on the local file system. This profile contains the following important bootstrap data (see the commands after this list for how to inspect these files):

  • a. ASM Diskgroup Discovery String
  • b. ASM SPFILE location (Diskgroup name)
  • c. Name of the ASM Diskgroup containing the Voting Files
  3. The location of the Voting Files on the ASM disks is read by CSSD using pointers in the ASM disk headers, so CSSD can complete its initialization and start or join an existing cluster.

  4. OHASD starts an ASM instance, and ASM is now available for CSSD initialization and runtime. The ASM instance uses special code to locate the contents of the ASM SPFILE, assuming it is stored in a Diskgroup.

  5. With the ASM instance running and its Diskgroups mounted, CRSD can access the Clusterware's OCR.

  6. OHASD starts CRSD, which accesses the OCR in an ASM Diskgroup.

  7. Clusterware completes initialization and starts the other services under its control.
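
A quick way to inspect the bootstrap files referenced above on a running node (a sketch for the environment used in this article, Grid home /u01/app/11.2.0.3/grid; output varies per cluster):

# Check the OLR (location, version, integrity) -- run as root
~]# /u01/app/11.2.0.3/grid/bin/ocrcheck -local
# Dump the GPnP profile in effect on this node; it contains the ASM discovery
# string, the ASM SPFILE location and the diskgroup holding the voting files
~]# /u01/app/11.2.0.3/grid/bin/gpnptool get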

Three files are involved when Clusterware starts.

1) OLR – the first file to be opened and read. It is a local file and contains information about where the voting disks are stored, as well as the information needed to start ASM (for example, the ASM Discovery String).

2) VOTING DISK – the second file to be opened and read; accessing it depends only on the OLR.

ASM starts after CSSD; if CSSD is offline (for example, because the voting files are missing), ASM does not start.
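
The voting files currently known to CSS can be listed as follows (output varies per cluster):

# Show the voting files and the ASM disks that hold them -- run as root or the grid user
~]# /u01/app/11.2.0.3/grid/bin/crsctl query css votedisk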

How are Voting Disks stored in ASM?

Voting disks are placed directly on the ASM disks: Oracle Clusterware stores its votedisk on the disks of the Diskgroup designated to hold the Voting Files.

Oracle Clusterware does not depend on ASM to access the Voting Files, which means Oracle Clusterware does not need the Diskgroup to be mounted in order to read and write those ASM disks. Whether a voting file exists on an ASM disk can be checked with the VOTING_FILE column of V$ASM_DISK.

So, the fact that accessing the voting files does not depend on a mounted Diskgroup does not mean the Diskgroup is unnecessary: the Diskgroup and the voting files are linked by the way they are configured (see the query below).
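
A minimal sketch of that check, run as the grid user against the local ASM instance (the SID +ASM1 and the Grid home path are the ones used in this article and may differ):

~]$ export ORACLE_SID=+ASM1 ORACLE_HOME=/u01/app/11.2.0.3/grid
~]$ $ORACLE_HOME/bin/sqlplus -S "/ as sysasm" <<'EOF'
SELECT path, voting_file FROM v$asm_disk;
EOF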

3) OCR – finally the ASM instance starts and mounts all Diskgroups, and the Clusterware Daemon (CRSD) then opens and reads the OCR stored in a Diskgroup.

Therefore, ASM does not depend on the OCR or the OLR being online in order to start. ASM does depend on CSSD (and hence on the Votedisk) being online.

There is an exclusive mode in which ASM can be started without CSSD (but it exists only for OCR/VOTE recovery purposes), as sketched below.
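
A sketch of using that exclusive mode during OCR/voting-file recovery (11.2.0.2 and later; run as root, and only with the stack down on all nodes):

# Start the stack in exclusive mode without CRSD (CSSD runs standalone, no cluster membership)
~]# /u01/app/11.2.0.3/grid/bin/crsctl start crs -excl -nocrs
# ... restore the OCR and/or recreate the voting files here ...
# Then stop the exclusive-mode stack and restart Clusterware normally
~]# /u01/app/11.2.0.3/grid/bin/crsctl stop crs -f
~]# /u01/app/11.2.0.3/grid/bin/crsctl start crs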

1. OHASD Spawns

  • cssdagent – the agent responsible for starting CSSD.
  • orarootagent – the agent responsible for starting the ohasd resources owned by root.
  • oraagent – the agent responsible for starting the ohasd resources owned by the oracle user.
  • cssdmonitor – monitors CSSD and node health (together with cssdagent).

2.1. OHASD rootagent spawns

  • CRSD – the primary daemon that manages cluster resources.
  • CTSSD – Cluster Time Synchronization Service daemon.
  • Diskmon
  • ACFS (ASM Cluster File System) Drivers

2.2. OHASD oraagent spawns

  • MDNSD – used for DNS lookups.
  • GIPCD – used for inter-process and inter-node communication.
  • GPNPD – Grid Plug & Play Profile daemon.
  • EVMD – Event Monitor daemon.
  • ASM – the resource that monitors ASM instances.

3. CRSD spawns

  • orarootagent – the agent responsible for starting the crsd resources owned by root.
  • oraagent – the agent responsible for starting the crsd resources owned by the oracle user.

4.1. CRSD rootagent spawns

  • Network resource – monitors the public network.
  • SCAN VIP(s) – Single Client Access Name Virtual IPs.
  • Node VIPs – one per node.
  • ACFS Registry – mounts the ASM Cluster File System.
  • GNS VIP (optional) – VIP for GNS.

4.2. CRSD oraagent spawns

  • ASM Resource – used for managing/monitoring the ASM instance(s).
  • Diskgroup – used for managing/monitoring ASM diskgroups.
  • DB Resource – used for managing/monitoring databases and instances.
  • SCAN Listener – the SCAN (Single Client Access Name) listener, listening on the SCAN VIP.
  • Listener – the node listener, listening on the node VIP.
  • Services – used for managing/monitoring services.
  • ONS – Oracle Notification Service.
  • eONS – enhanced Oracle Notification Service.
  • GSD – for backward compatibility with 9i.
  • GNS (optional) – Grid Naming Service, which handles name resolution.
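
Each of these agents writes its own log under the Grid home, which is useful when a resource fails to start (the 11.2 layout shown here is typical but may differ in your environment):

# ohasd-level agent logs
~]# ls /u01/app/11.2.0.3/grid/log/$(hostname -s)/agent/ohasd/
# crsd-level agent logs
~]# ls /u01/app/11.2.0.3/grid/log/$(hostname -s)/agent/crsd/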

2. RAC Startup Order

  1. The first process to start is /u01/app/11.2.0.3/grid/bin/ohasd.bin. It carries the argument reboot, which indicates that it will be restarted automatically if it is killed (how the respawn works is shown after the output below).
~]# ps -ef|grep d.bin
root      4296     1  4 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4338     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      4342     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
root      4348     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
root      4370     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
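
On Linux, the respawn is driven by init rather than by ohasd.bin itself; a minimal check, assuming /etc/inittab is used (as on RHEL/OEL 5 and 6 with 11.2):

# init respawns the init.ohasd wrapper, which in turn (re)starts ohasd.bin;
# a typical entry looks like: h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
~]# grep ohasd /etc/inittab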
  2. Next, orarootagent.bin, cssdagent and cssdmonitor have disappeared, and mdnsd.bin has been added.
~]# ps -ef|grep d.bin
root      4296     1  4 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1 10 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
  3. Then ocssd.bin, gpnpd.bin, orarootagent.bin, gipcd.bin, osysmond.bin, cssdmonitor, cssdagent and diskmon.bin are added.
~]# ps -ef|grep d.bin
root      4296     1  5 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  5 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  5 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
  4. Then ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01 is added.
    The ologgerd (Cluster Logger Service) process has been installed automatically since 11.2.0.2 (a new feature of that release; earlier versions required a separate download and installation) and belongs to the Cluster Health Monitor (CHM) component.
    CHM automatically collects operating-system resource usage (CPU, memory, swap, processes, I/O, network and so on), sampling once per second; its data can be queried as sketched after the output below.
~]# ps -ef|grep d.bin
root      4296     1  3 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root      4590     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
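
A minimal sketch of pulling the data that osysmond.bin/ologgerd collect (the 5-minute window is arbitrary; output varies):

# Dump the last 5 minutes of CHM node metrics for all nodes
~]# /u01/app/11.2.0.3/grid/bin/oclumon dumpnodeview -allnodes -last "00:05:00"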
  5. After ocssd.bin has started, octssd.bin is started (its synchronization mode can be checked as shown after the output below).
~]# ps -ef|grep d.bin
root      4296     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root      4590     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4685     1  4 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
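
Whether CTSS is actively synchronizing time or only observing (the latter when NTP is configured) can be checked with:

~]# /u01/app/11.2.0.3/grid/bin/crsctl check ctss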
  6. Next, evmd.bin is started.
~]# ps -ef|grep d.bin
root      4296     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root      4590     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4685     1  1 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  2 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
  7. Then crsd.bin and tnslsnr are started.
~]# ps -ef|grep d.bin
root      4296     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root      4685     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      5080     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      5100     1  1 20:39 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid      5189  4710  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid      5229     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      5242     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      5368     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      5376     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle    5466     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      5487     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin

Once crsd.bin is up, crsctl status res -t can be used to view the CRS resource status. If crsd.bin has not started yet, use crsctl status res -t -init instead. For example (output varies per cluster):
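
# Full resource view once crsd.bin is up (databases, listeners, VIPs, diskgroups, services, ...)
~]# /u01/app/11.2.0.3/grid/bin/crsctl status res -t
# Lower-stack view, available even before crsd.bin is up
~]# /u01/app/11.2.0.3/grid/bin/crsctl status res -t -init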

  8. Finally lsnrctl and oc4jctl are started, and at this point the CRS startup is complete (a quick health check is shown after the output below).
~]# ps -ef|grep d.bin
root      4296     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root      4685     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      5080     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      5100     1  0 20:39 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid      5189  4710  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid      5229     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      5242     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      5368     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      5376     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle    5466     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      5487     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
grid      6061  5487  0 20:41 ?        00:00:00 /bin/sh /u01/app/11.2.0.3/grid/bin/oc4jctl check
grid      6072  6061  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/perl/bin/perl /u01/app/11.2.0.3/grid/bin/oc4jctl.pl check 8888
grid      6086  5229  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER_SCAN1
grid      6088  5229  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER
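
With everything up, a quick cluster-wide health check (output varies per cluster):

# Verify the CSS, CRS and EVM daemons on every node
~]# /u01/app/11.2.0.3/grid/bin/crsctl check cluster -all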

Translated from: http://oracle-help.com/oracle-rac/rac-11gr2-clusterware-startup-sequence/

For 19c RAC, see: Oracle Real Application Clusters 19c Technical Architecture