ORACLE 12C新特性-多线程引起的一个笑话
前几天和同事处理了一个12C的故障,现在想起来还觉得好笑,没文化太可怕了,还好这个是我们公司的测试服务器出的问题,怎么折腾都没事。
故障描述:
新同事在向12C数据库中导入其他数据库的AWR数据的时候,说数据库连不上了,当时忙着技术分享,也没看,后来同事在处理的时候,发现SQLPLUS不能正常连接数据库,不知道是什么原因,技术分享结束后,我帮他看了下,SQLPLUS本地验证的方式登录数据库后,所有操作报ORA-01017错误。
[oracle@enmocoredb admin]$ export ORACLE_SID=core [oracle@enmocoredb admin]$ sqlplus / as sysdba SQL*Plus: Release 12.1.0.1.0 Production on Tue Oct 15 10:06:52 2013 Copyright (c) 1982, 2013, Oracle. All rights reserved. ERROR: ORA-01017: invalid username/password; logon denied
遇到这个问题我通常会想到是$ORACLE_HOME/bin目录下的ORACLE文件的权限不对,查看后发现不是这个原因,权限是正常的。
[oracle@enmocoredb bin]$ ll oracle -rwsr-s--x 1 oracle oinstall 294728395 Jul 2 14:05 oracle
那是什么原因呢?和同事研究半天,也没发现端倪,凭着以前的经验,查了下数据库的进程。
[oracle@enmocoredb admin]$ ps -ef | grep smon oracle 29003 28878 0 10:18 pts/1 00:00:00 grep smon
SMON进程不存在,SMON进程挂了,数据库不就挂了吗?那在看看其他进程是否在。
[oracle@enmocoredb admin]$ ps -ef | grep ora root 28705 1405 0 09:28 ? 00:00:00 sshd: oracle [priv] oracle 28707 28705 0 09:28 ? 00:00:00 sshd: oracle@pts/0 oracle 28708 28707 0 09:28 pts/0 00:00:00 -bash root 28737 1405 0 09:28 ? 00:00:00 sshd: oracle [priv] oracle 28739 28737 3 09:29 ? 00:01:54 sshd: oracle@pts/2 oracle 28740 28739 0 09:29 pts/2 00:00:00 -bash oracle 28787 1 0 09:48 ? 00:00:00 ora_pmon_core oracle 28789 1 0 09:48 ? 00:00:00 ora_psp0_core oracle 28791 1 3 09:48 ? 00:01:08 ora_vktm_core oracle 28795 1 0 09:48 ? 00:00:02 ora_u004_core oracle 28801 1 0 09:48 ? 00:00:04 ora_u005_core oracle 28807 1 0 09:48 ? 00:00:00 ora_dbw0_core root 28875 1405 0 10:02 ? 00:00:00 sshd: oracle [priv] oracle 28877 28875 0 10:02 ? 00:00:00 sshd: oracle@pts/1 oracle 28878 28877 0 10:02 pts/1 00:00:00 -bash oracle 28954 28877 0 10:10 ? 00:00:00 /usr/libexec/openssh/sftp-server oracle 29006 28878 0 10:19 pts/1 00:00:00 ps -ef oracle 29007 28878 0 10:19 pts/1 00:00:00 grep ora
到现在就满脸疑惑了,为什么SMON进程挂掉了,其他进程还在,之前的经验告诉我,SMON进程挂了,数据库也就挂了,本地验证登录不了数据库,那就杀进程吧,然后我们残忍的干掉了这些进程。
[oracle@enmocoredb admin]$ pkill core
再次尝试本地验证登录数据库成功,但是启动时,错误依旧。
[oracle@enmocoredb dbs]$ sqlplus / as sysdba SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 25 18:27:29 2013 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to an idle instance. SQL> startup pfile='/u01/app/oracle/product/12.1.0/dbhome_1/dbs/initcore.ora' ORA-01017: invalid username/password; logon denied
告警日志最后面竟然不报错,看不出什么问题,而且数据库进程又神奇般的启动了,还是没有SMON进程,真见鬼了。没办法只好重建了PFILE,看看是不是参数出了问题。
[oracle@enmocoredb dbs]$ vi initcore.ora core.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment *.compatible='12.1.0.0.0' *.control_files='/u01/app/oracle/oradata/core/control01.ctl','/u01/app/oracle/oradata/core/control02.ctl' *.db_block_size=8192 *.db_create_file_dest='/u01/app/oracle/oradata/core/' *.db_domain='' *.db_name='core' *.db_recovery_file_dest='/home/oracle/fra' *.db_recovery_file_dest_size=1073741824 *.diagnostic_dest='/u01/app/oracle' *.dispatchers='(PROTOCOL=TCP) (SERVICE=coreXDB)' *.sga_target=500m *.sga_max_size=500m *.pga_aggregate_target=300m *.open_cursors=300 *.processes=150 *.remote_login_passwordfile='EXCLUSIVE' #*.threaded_execution=TRUE *.undo_tablespace='UNDOTBS1' PGA_AGGREGATE_LIMIT=300M
threaded_execution参数没见过,干啥用的?不知道干啥用的就先注释掉,看看是不是他搞的鬼。
[oracle@enmocoredb dbs]$ sqlplus / as sysdba SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 25 18:31:23 2013 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to an idle instance. SQL> startup pfile='/u01/app/oracle/product/12.1.0/dbhome_1/dbs/initcore.ora' ORACLE instance started. Total System Global Area 521936896 bytes Fixed Size 2290264 bytes Variable Size 209718696 bytes Database Buffers 301989888 bytes Redo Buffers 7938048 bytes Database mounted. ORA-03113: end-of-file on communication channel Process ID: 2755 Session ID: 125 Serial number: 5
看来真是这个参数搞的鬼,可以看到数据库启动到了MOUNT状态,OPEN时报错,ORA-03113错误通常和磁盘有关系,要么是磁盘不能读写,要么是文件损坏了,不过基本遇到的都是前者,这个错误告警日志会记录。经查告警日志,发现是闪回恢复区满了,导致日志无法归档,数据库挂起,修改闪回恢复去大小,数据库可以正常OPEN。
那么threaded_execution参数到底是干什么的呢?是不是12C引入的新功能呢?查了查资料,发现这个参数竟然是12C的新特性-多线程引入的参数。
Multiprocess and Multithreaded Oracle Database Systems Multiprocess Oracle Database (also called multiuser Oracle Database) uses several processes to run different parts of the Oracle Database code and additional Oracle processes for the users—either one process for each connected user or one or more processes shared by multiple users. Most databases are multiuser because a primary advantage of a database is managing data needed by multiple users simultaneously. Each process in a database instance performs a specific job. By dividing the work of the database and applications into several processes, multiple users and applications can connect to an instance simultaneously while the system gives good performance. In previous releases, Oracle processes did not run as threads on UNIX and Linux systems. Starting in Oracle Database 12c, the multithreaded Oracle Database model enables Oracle processes to execute as operating system threads in separate address spaces. When Oracle Database 12c is installed, the database runs in process mode. You must set a parameter to run the database in threaded mode. In threaded mode, some background processes on UNIX and Linux run as processes (processes containing one thread), whereas the remaining Oracle processes run as threads within processes. In a database running in threaded mode, PMON and DBW might run as operating system processes, whereas LGWR and CMON might run as threads within a single process. Two foreground processes and a parallel execution (PX) server process might run as threads in a second operating system process. A third operating system process might contain multiple foreground threads. Thus, "Oracle process" does not always mean "operating system process." The V$PROCESS view contains one row for each Oracle process connected to a database instance. For example, you can run the following query in SQL*Plus to get the operating system process ID and operating system thread ID for each process:
threaded_execution参数就是控制是否使用多线程模式,ORACLE 12C在开启多线程之后,默认就是没有SMON进程的。以下摘自官方文档:
SPID STID PROGRAM PSEUDO ----- ----- ----------------- --------- 7190 7190 oracle@samplehost (PMON) 7192 7192 oracle@samplehost (PSP0) 7194 7194 oracle@samplehost (VKTM) 7198 7198 oracle@samplehost (SCMN) 7198 7200 oracle@samplehost (GEN0) 7202 7202 oracle@samplehost (SCMN) 7202 7204 oracle@samplehost (DIAG) 7198 7205 oracle@samplehost (DBRM) 7202 7206 oracle@samplehost (DIA0)
而且开启多线程之后,就不再支持操作系统认证了,所以会遇到ORA-01017错误。
由于我之前不了解这个特性,凭借以前的经验,发现SMON进程不在就以为数据库已经挂了,其实这个问题只要看下告警日志,就知道是因为闪回恢复区满,日志无法归档,只需要修改下闪回恢复区的大小就可以了,数据库并没有宕掉,看来在一套新版本的系统中,凭借以前的经验有时候也会害死人的,这也体现出没文化太可怕了。
【下一篇】oracle 11g延迟段创建导致exp-00003错误及关闭延迟段创建特性