ORA-07445[opiaba()+639]、ORA-00600[17147]错误导致数据库挂掉
Mar022019
今天,用友NC的一个数据库突然挂了,查看数据库日志发现,在凌晨数据库已经挂了,今天上午10点被同事启动了,然后中午又挂了,具体告警信息如下:
Sat Mar 02 12:37:30 2019 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x186016B, opiaba()+639] [flags: 0x0, count: 1] Errors in file /u01/app/oracle/diag/rdbms/ncdb3/ncdb3/trace/ncdb3_ora_101082.trc (incident=72265): ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x186016B] [SI_KERNEL(general_protection)] [] Incident details in: /u01/app/oracle/diag/rdbms/ncdb3/ncdb3/incident/incdir_72265/ncdb3_ora_101082_i72265.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sat Mar 02 12:38:09 2019 Sweep [inc][72265]: completed Sweep [inc2][72265]: completed Sat Mar 02 12:38:10 2019 Dumping diagnostic data in directory=[cdmp_20190302123810], requested by (instance=1, osid=101082), summary=[incident=72265]. Sat Mar 02 12:38:43 2019 Errors in file /u01/app/oracle/diag/rdbms/ncdb3/ncdb3/trace/ncdb3_pmon_98334.trc (incident=72017): ORA-00600: internal error code, arguments: [17147], [0x9194B8448], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/ncdb3/ncdb3/incident/incdir_72017/ncdb3_pmon_98334_i72017.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sat Mar 02 12:38:43 2019 Dumping diagnostic data in directory=[cdmp_20190302123843], requested by (instance=1, osid=98334 (PMON)), summary=[incident=72017]. Errors in file /u01/app/oracle/diag/rdbms/ncdb3/ncdb3/trace/ncdb3_pmon_98334.trc: ORA-00600: internal error code, arguments: [17147], [0x9194B8448], [], [], [], [], [], [], [], [], [], [] PMON (ospid: 98334): terminating the instance due to error 472 System state dump requested by (instance=1, osid=98334 (PMON)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/ncdb3/ncdb3/trace/ncdb3_diag_98344_20190302123844.trc Instance terminated by PMON, pid = 98334
查看trace文件,发现可疑SQL,一共34万多行,大概6万多条UPDATE操作,SQL如下:
*** 2019-03-02 12:37:30.547 dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0) ----- Current SQL Statement for this session (sql_id=4yj2p6s0z2m2q) ----- begin UPDATE ic_saleoutdetail t SET t.proestdetailid=:1,t.isproest=:2 WHERE t.ID = :3 ; UPDATE ic_saleoutdetail t SET t.proestdetailid=:4,t.isproest=:5 WHERE t.ID = :6 ; … … … … UPDATE ic_saleoutdetail t SET t.proestdetailid=:204670,t.isproest=:204671 WHERE t.ID = :204672 ;end;
从begin到end一共24万多的绑定变量,这个数据库是11.2.0.4.0版本,一个SQL最多只支持65535个绑定变量,超过这个阈值数据库就可能挂掉。通常情况下基本也遇不到这么多绑定变量的情况。
解决方法是拆分上面的SQL,和开发的同事沟通后,他们确定可以拆分,目前开发的同事正在调整程序。虽然这个是BUG,但绝大多数的BUG都是SQL书写不规范触发的。