What to do when an ASM diskgroup dismount takes down one node of a RAC cluster

2024-04-02 19:04:59  泡泡鱼

Summary

This article explains what to do when an ASM diskgroup dismount brings down one node of a RAC cluster.

ASM log

/u01/app/grid/diag/asm/+asm/+ASM1/trace

Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write io to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Thu Jul 30 02:10:47 2015
NOTE: process _b000_+asm1 (38695) initiating offline of disk 0.3915941304 (DATA2_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 1.3915941302 (DATA2_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 2.3915941303 (DATA2_0002) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 12 for pid 28, osid 38695
ERROR: no read quorum in group: required 2, found 0 disks
Dirty Detach Reconfiguration complete
Thu Jul 30 02:10:47 2015
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xB368755B (DATA2)   <-- the diskgroup dismounted on its own
SQL> alter diskgroup DATA2 dismount force
Thu Jul 30 02:11:24 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA2 was mounted   <-- and then mounted again on its own
SUCCESS: ALTER DISKGROUP DATA2 MOUNT
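To see how often this warning fired, and for which disks, the alert log can be scanned for the PST wait lines. Below is a minimal Python sketch, assuming the log text is already available as a string (for example read from alert_+ASM1.log in the trace directory above; that file name follows the usual alert_<SID>.log convention and is an assumption here). The function name count_pst_waits is hypothetical.

```python
import re
from collections import Counter

# Matches ASM alert-log lines such as:
#   WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
# IGNORECASE also catches variants like "write io".
PST_WAIT = re.compile(
    r"Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)",
    re.IGNORECASE,
)


def count_pst_waits(log_text: str) -> Counter:
    """Count PST write-IO heartbeat warnings per (group, disk) pair."""
    counts = Counter()
    for match in PST_WAIT.finditer(log_text):
        _secs, disk, group = (int(g) for g in match.groups())
        counts[(group, disk)] += 1
    return counts


if __name__ == "__main__":
    sample = """\
Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write io to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
"""
    for (group, disk), n in sorted(count_pst_waits(sample).items()):
        print(f"group {group} disk {disk}: {n} warning(s)")
```

Counting per (group, disk) shows whether all PST disks of a group stalled at the same moment, as they did here, which suggests a shared cause such as the storage path rather than a single bad disk.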

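The alert log shows the ASM diskgroup being dismounted, preceded by the warning "Waited 15 secs for write IO to PST". This is ASM's own heartbeat-timeout detection: the ASM instance periodically checks whether each ASM disk responds normally.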

Generally, this kind of message appears in the ASM alert log in the following situation: delayed ASM PST heartbeats on ASM disks in a normal or high redundancy diskgroup cause the ASM instance to dismount the diskgroup. By default, the timeout is 15 seconds.

By the way, heartbeat delays are more or less ignored for external redundancy diskgroups. The ASM instance stops issuing further PST heartbeats until PST revalidation succeeds, but the heartbeat delays do not dismount an external redundancy diskgroup directly.

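The description above boils down to the following points:
1. The ASM instance periodically checks the disks of every diskgroup to confirm that they are communicating normally.
2. A failed check only leads to a dismount for normal and high redundancy diskgroups; for external redundancy, heartbeat delays do not dismount the diskgroup directly.
3. The default timeout is 15 seconds. When ASM issues its periodic heartbeat, any disk that has not responded within 15 seconds is considered inaccessible and is taken offline; if too few disks remain to form a PST read quorum (in the log above: required 2, found 0 disks), the diskgroup is dismounted. Problems in the storage network can trigger this error.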
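The decision described in these points can be sketched as a small model. This is a simplified illustration in Python, not Oracle's implementation: the strict-majority read-quorum rule, the function names pst_read_quorum and check_group, and treating a wait equal to the timeout as a failure are all assumptions, chosen so that three stalled PST disks reproduce the "required 2, found 0 disks" line from the log. The timeout argument plays the role of the 15-second default.

```python
from typing import Dict, Optional

# Simplified, hypothetical model of the behavior described above; NOT Oracle's
# implementation. Assumptions: a PST disk whose heartbeat write has not returned
# within the timeout is treated as offline, and a normal/high redundancy
# diskgroup is dismounted once fewer than a majority of its PST disks respond.

DEFAULT_TIMEOUT_SECS = 15  # default heartbeat wait described in the text


def pst_read_quorum(pst_disk_count: int) -> int:
    """Assumed quorum rule: a strict majority of the PST disks."""
    return pst_disk_count // 2 + 1


def check_group(redundancy: str,
                io_wait_secs: Dict[int, Optional[float]],
                timeout: float = DEFAULT_TIMEOUT_SECS) -> str:
    """Return "mounted" or "dismount" for one diskgroup.

    io_wait_secs maps PST disk number -> seconds the heartbeat write took,
    or None if the write has not returned at all.
    """
    if redundancy == "external":
        # Delays are logged but do not dismount external redundancy directly.
        return "mounted"
    responsive = [d for d, w in io_wait_secs.items()
                  if w is not None and w < timeout]
    required = pst_read_quorum(len(io_wait_secs))
    if len(responsive) < required:
        print(f"no read quorum: required {required}, "
              f"found {len(responsive)} disks")
        return "dismount"
    return "mounted"


if __name__ == "__main__":
    # All three PST writes stalled, as in the log above: dismount.
    print(check_group("normal", {0: None, 1: None, 2: None}))
    # A 40-second stall on every disk: dismount at 15 s, survives at 120 s.
    stalled = {0: 40.0, 1: 40.0, 2: 40.0}
    print(check_group("normal", stalled))
    print(check_group("normal", stalled, timeout=120))
```

In this model, a longer timeout (what _asm_hbeatiowait controls) turns a temporary IO stall, such as one caused by a backup, into a survivable delay instead of a dismount.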

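In this case, 02:10 in the log falls exactly in the window of the full database backup, so the heavy write load most likely slowed IO response. The timeout is controlled by hidden parameters that only exist from 11.2.0.3.0 onward; in other words, the ASM instance's disk-timeout check was introduced in 11.2.0.3. They can be queried in the ASM instance as SYS (the x$ tables are only visible to SYS):

set pages 9999;
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV ('Instance')
AND y.inst_id = USERENV ('Instance')
AND x.indx = y.indx
AND upper(x.ksppinm) like '%ASM_H%';

The output is as follows: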

_asm_hbeatiowait        number of secs to wait for PST Async Hbeat IO return
_asm_hbeatwaitquantum   quantum used to compute time-to-wait for a PST Hbeat check

When the storage network is not in good shape, the timeout can be set longer; in fact, starting with 12.1.0.2 the default is already 120 seconds.

alter system set "_asm_hbeatiowait"=120 scope=spfile;

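Restart ASM (with scope=spfile, the new value only takes effect after the instance restarts) and continue to monitor.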


Article link: https://lsjlt.com/news/65777.html (please cite the source link when reposting)