如何设置DG最大保护模式.(PROTECTION)及ORA-03113错误解决
前几天福才老兄让我帮他测试下DG的三种保护模式,在测试最大性能、最大可用性的时候都没问题,在测试最大保护模式的时候,遇到了ORA-03113错误。
SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. SQL> startup mount ORACLE instance started. Total System Global Area 313159680 bytes Fixed Size 2227944 bytes Variable Size 197132568 bytes Database Buffers 109051904 bytes Redo Buffers 4747264 bytes Database mounted. SQL> alter database set standby database to maximize PROTECTION; Database altered. SQL> alter database open; alter database open * ERROR at line 1: ORA-03113: end-of-file on communication channel Process ID: 5505 Session ID: 125 Serial number: 5
我的第一感觉是最大保护模式需要使用到standby redo file,而这个DG我已经一年没有启用了,我记得在搭建DG的时候并没有添加standby redo file,可能是这出了问题,查询后发现,standby redo file已经添加了。
SQL> select * from v$standby_log; GROUP# DBID THREAD# SEQUENCE# BYTES BLOCKSIZE USED ARC STATUS ---------- ------------- -------- ---------- ---------- ---------- ---------- --- ---------- 10 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 11 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 12 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 13 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED
那就是同步传输和异步传输的问题了,因为之前已经想到是standby redo file没有添加的问题,那么我的DG应该是异步传输的。
SQL> show parameter log_archive_dest_2 NAME TYPE VALUE ------------------- --------- ------------------------------------------------------------------------------------------ log_archive_dest_2 string SERVICE=PRIMARY LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=PRIMARY
经查询,我的DG环境还真是异步传输的日志,DG要求最大保护模式必须是同步传输日志,修改参数为同步传输。
SQL> alter system set log_archive_dest_2='SERVICE=PRIMARY LGWR SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=PRIMARY'; System altered.
再次启动数据库,问题依然存在。
SQL> alter database open; alter database open * ERROR at line 1: ORA-03113: end-of-file on communication channel Process ID: 5601 Session ID: 125 Serial number: 5
看来光凭感觉来是不行了,看告警日志吧。
主库告警日志:
alter database open Mon Jun 17 01:19:26 2013 LGWR: STARTING ARCH PROCESSES Mon Jun 17 01:19:26 2013 ARC0 started with pid=19, OS id=7189 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES LGWR: Primary database is in MAXIMUM PROTECTION mode LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR Mon Jun 17 01:19:27 2013 ARC1 started with pid=20, OS id=7193 Mon Jun 17 01:19:27 2013 ARC2 started with pid=21, OS id=7197 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Mon Jun 17 01:19:27 2013 ARC3 started with pid=22, OS id=7201 ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE ****************************************************************** LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2 ****************************************************************** LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16086) LGWR: RFS network connection re-established at host 'primary' LGWR: Error 16086 opening RFS destination for reconnect Mon Jun 17 01:19:41 2013 LGWR: RFS network connection re-established at host 'primary' LGWR: Error 16086 opening RFS destination for reconnect LGWR: RFS network connection re-established at host 'primary' LGWR: Error 16086 opening RFS destination for reconnect Mon Jun 17 01:19:53 2013 LGWR: RFS network connection re-established at host 'primary' LGWR: Error 16086 opening RFS destination for reconnect ****************************************************** Mon Jun 17 01:24:33 2013 System state dump requested by (instance=1, osid=7116 (LGWR)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/standby/standby/trace/standby_diag_7096.trc LGWR (ospid: 7116): terminating the instance due to error 16098 Dumping diagnostic data in directory=[cdmp_20130617012433], requested by (instance=1, osid=7116 (LGWR)), summary=[abnormal instance termination]. Instance terminated by LGWR, pid = 7116
备库告警日志:
Tue Jun 18 16:56:44 2013 Primary database is in MAXIMUM PROTECTION mode Standby controlfile consistent with primary Standby controlfile consistent with primary RFS[65]: Assigned to RFS process 7292 RFS[65]: No standby redo logfiles created for thread 1 RFS[65]: No standby redo logfiles selected (reason:7) Tue Jun 18 16:56:50 2013 Primary database is in MAXIMUM PROTECTION mode Standby controlfile consistent with primary Standby controlfile consistent with primary RFS[66]: Assigned to RFS process 7296 RFS[66]: No standby redo logfiles created for thread 1 RFS[66]: No standby redo logfiles selected (reason:7) Tue Jun 18 16:56:57 2013 Primary database is in MAXIMUM PROTECTION mode Standby controlfile consistent with primary Standby controlfile consistent with primary RFS[67]: Assigned to RFS process 7300 RFS[67]: No standby redo logfiles created for thread 1 RFS[67]: No standby redo logfiles selected (reason:7) Tue Jun 18 16:57:03 2013 Primary database is in MAXIMUM PROTECTION mode Standby controlfile consistent with primary Standby controlfile consistent with primary RFS[68]: Assigned to RFS process 7304 RFS[68]: No standby redo logfiles created for thread 1 RFS[68]: No standby redo logfiles selected (reason:7) Tue Jun 18 16:57:09 2013
主库竟然是16086错误,这个错误基本上就是主库的日志不能往备库的standby redo file里写导致的,而备库的告警日志也说不能创建standby redo file,可是我的数据库是有standby redo file的,为什么还要创建呢?
SQL> select * from v$standby_log; GROUP# DBID THREAD# SEQUENCE# BYTES BLOCKSIZE USED ARC STATUS ---------- ------------- -------- ---------- ---------- ---------- ---------- --- ---------- 10 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 11 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 12 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 13 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED
通过查看standby redo file,发现DBID都是UNASSIGNED状态,THREAD#字段都是0,STATUS字段也都是UNASSIGNED状态,这就说明这些standby redo file都没有被使用,那就再添加个standby redo file试试。
SQL> alter database add standby logfile group 4 ('/u01/app/oracle/stream/sdy_redo04.log ') size 50m; Database altered.
主库尝试打开数据库,这次竟然打开了。
SQL> alter database open; Database altered.
在查看备库,standby redo file已经使用了。
SQL> select * from v$standby_log; GROUP# DBID THREAD# SEQUENCE# BYTES BLOCKSIZE USED ARC STATUS ---------- ------------- -------- ---------- ---------- ---------- ---------- --- ---------- 4 1564343572 1 286 52428800 512 78336 YES ACTIVE 10 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 11 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 12 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED 13 UNASSIGNED 0 0 52428800 512 0 YES UNASSIGNED
再添加三组standby redo file,删掉之前的4组一直不用的standby redo file,保证以后不出问题。
SQL> alter database add standby logfile group 5 ('/u01/app/oracle/stream/sdy_redo05.log ') size 50m; Database altered. SQL> alter database add standby logfile group 6 ('/u01/app/oracle/stream/sdy_redo06.log ') size 50m; Database altered. SQL> alter database add standby logfile group 7 ('/u01/app/oracle/stream/sdy_redo07.log ') size 50m; Database altered. SQL> alter database drop standby logfile group 10 ; Database altered. SQL> alter database drop standby logfile group 11 ; Database altered. SQL> alter database drop standby logfile group 12 ; Database altered. SQL> alter database drop standby logfile group 13 ; Database altered.
如果主备库有切换需求,那么主库也需要和备库做同样的操作。