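sculkget: failed to lock /oracle10g/db/dbs/lkinstB1ACDB2 exclusive, lock held by PID
May 20, 2014
The host for the second node of a customer's RAC had a hardware fault and kept shutting itself down. After the second node's host booted again, the database did not come up. The alert log showed the following errors: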
Starting ORACLE instance (normal)
Thu May 8 09:39:30 2014
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 lan901 192.168.2.0 configured from OCR for use as a cluster interconnect
Interface type 1 lan900 10.30.1.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 1
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
  processes = 1500
  sessions = 1655
  shared_pool_size = 7717519360
  large_pool_size = 50331648
  java_pool_size = 50331648
  streams_pool_size = 33554432
  spfile = /dev/vg14/rspfile
  control_files = /dev/vg14/rcontrolfile1, /dev/vg14/rcontrolfile2, /dev/vg14/rcontrolfile3
  db_block_size = 8192
  db_cache_size = 7985954816
  compatible = 10.2.0.3.0
  log_archive_dest =
  log_archive_dest_1 = LOCATION=/archive
  log_archive_format = %t_%s_%r.arc
  db_file_multiblock_read_count= 16
  cluster_database = TRUE
  cluster_database_instances= 2
  _gc_undo_affinity = FALSE
  _gc_affinity_time = 0
  _gc_affinity_limit = 10000000
  _gc_affinity_minimum = 10000000
  thread = 2
  instance_number = 2
  undo_management = AUTO
  undo_tablespace = UNDOTBS1
  undo_retention = 7200
  remote_login_passwordfile= EXCLUSIVE
  db_domain =
  local_listener = LISTENER_B1ACDB2
  remote_listener =
  job_queue_processes = 10
  background_dump_dest = /oracle10g/admin/B1ACDB/bdump
  user_dump_dest = /oracle10g/admin/B1ACDB/udump
  core_dump_dest = /oracle10g/admin/B1ACDB/cdump
  audit_file_dest = /oracle10g/admin/B1ACDB/adump
  audit_trail = DB
  db_name = B1ACDB
  open_cursors = 600
  pga_aggregate_target = 2975858688
  _gby_hash_aggregation_enabled= FALSE
Cluster communication is configured to use the following interface(s) for this instance
  192.168.2.22
Thu May 8 09:39:34 2014
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=3, OS id=5372
DIAG started with pid=4, OS id=5584
PSP0 started with pid=5, OS id=5588
LMON started with pid=7, OS id=5592
LMD0 started with pid=8, OS id=5594
LMS0 started with pid=9, OS id=5596
Thu May 8 09:39:56 2014
sculkget: failed to lock /oracle10g/db/dbs/lkinstB1ACDB2 exclusive
sculkget: lock held by PID: 4938
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Failed to acquire instance startup/shutdown serialization primitive
LMS1 started with pid=10, OS id=5603
LMS2 started with pid=2, OS id=5605
MMAN started with pid=11, OS id=5608
DBW0 started with pid=12, OS id=5610
DBW1 started with pid=13, OS id=5612
LGWR started with pid=14, OS id=5614
CKPT started with pid=15, OS id=5621
SMON started with pid=16, OS id=5623
RECO started with pid=17, OS id=5625
CJQ0 started with pid=18, OS id=5627
MMON started with pid=19, OS id=5629
MMNL started with pid=20, OS id=5636
Thu May 8 09:39:59 2014
Shutting down instance (abort)
License high water mark = 2
Instance terminated by USER, pid = 5644
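The log says the lkinstB1ACDB2 file is locked by operating system process 4938, yet ps showed no such process on the host. Checking the CRS status, only the instance on the second node was down; everything else was normal.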
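The check described here (pull the holder PID out of the alert log, then see whether that process still exists) can be sketched as two small shell functions. The function names are hypothetical and not part of any Oracle tooling; the sketch assumes a POSIX shell and that it is run as the oracle user or root, since a process owned by another user may not be visible to kill -0.

```shell
#!/bin/sh
# Hypothetical helpers, not Oracle tools.

# Read alert-log text on stdin and print the PID from a
# "sculkget: lock held by PID: N" message, if one is present.
lock_holder_pid() {
    sed -n 's/.*sculkget: lock held by PID: *\([0-9][0-9]*\).*/\1/p'
}

# Print "alive" if the given PID exists, "gone" otherwise.
# kill -0 sends no signal; it only tests whether the PID can be signalled.
# ps -p is tried as a fallback for processes we are not allowed to signal.
pid_alive() {
    if kill -0 "$1" 2>/dev/null || ps -p "$1" >/dev/null 2>&1; then
        echo "alive"
    else
        echo "gone"
    fi
}

# Example usage (the alert log path is an assumption based on background_dump_dest):
#   pid=$(lock_holder_pid < /oracle10g/admin/B1ACDB/bdump/alert_B1ACDB2.log | tail -1)
#   pid_alive "$pid"
```

In the situation described above, the second function would be expected to report the holder as gone, which is what made the lock look stale.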
acdbs:/oracle > crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    acdbm
ora....B2.inst application    ONLINE    OFFLINE
ora.B1ACDB.db  application    ONLINE    ONLINE    acdbm
ora....BM.lsnr application    ONLINE    ONLINE    acdbm
ora.acdbm.gsd  application    ONLINE    ONLINE    acdbm
ora.acdbm.ons  application    ONLINE    ONLINE    acdbm
ora.acdbm.vip  application    ONLINE    ONLINE    acdbm
ora....BS.lsnr application    ONLINE    ONLINE    acdbs
ora.acdbs.gsd  application    ONLINE    ONLINE    acdbs
ora.acdbs.ons  application    ONLINE    ONLINE    acdbs
ora.acdbs.vip  application    ONLINE    ONLINE    acdbs
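On a cluster with more resources, the one that is down can be picked out of this output mechanically. The following is a minimal sketch, assuming the five-column crs_stat -t layout shown above (Name, Type, Target, State, Host); offline_resources is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "crs_stat -t" output on stdin and print the
# names of resources whose Target is ONLINE but whose State is OFFLINE.
offline_resources() {
    # Column 3 is Target, column 4 is State; header and separator lines
    # never match because their third field is not "ONLINE".
    awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}

# Example usage:
#   crs_stat -t | offline_resources
```

Note that crs_stat -t abbreviates long resource names, so the printed name is the shortened form.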
Go to the ORACLE_HOME/dbs directory and take a look at the locked file.
acdbs:/oracle > cd $ORACLE_HOME/dbs
acdbs:/oracle10g/db/dbs > ls
hc_B1ACDB2.dat       initB1ACDB2.ora      initENMO.ora         orapwB1ACDB2         snapcf_ENMO.f
hc_ENMO.dat          initB1ACDB2.ora-0112 initdw.ora           orapwENMO            spfileENMO.ora
init.ora             initB1ACDB2.ora0115  lkinstB1ACDB2        snapcf_B1ACDB2.f     sqlnet.log
It is an empty file with nothing in it.
acdbs:/oracle10g/db/dbs > du lkinstB1ACDB2
0 lkinstB1ACDB2_20140512
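Starting the instance manually also failed with the same error. As far as I could tell, the lkinstB1ACDB2 file should not normally be there. Judging by the name it means "lock instance", and it was probably left behind when the server shut down abruptly (the equivalent of a power loss). So I moved the file out of the way and tried starting the instance again.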
acdbs:/oracle10g/db/dbs > mv lkinstB1ACDB2 lkinstB1ACDB2_20140512
acdbs:/oracle10g/db/dbs > srvctl start instance -d B1ACDB -i B1ACDB2
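The rename step can be made a little more defensive. The sketch below is one possible wrapper, not something from the original fix: stash_lock_file is a hypothetical name, and it assumes fuser is installed and is run with enough privilege to see other users' processes. It refuses to touch a file that some process still has open, and it renames with a date suffix rather than deleting, so the original file can be put back.

```shell
#!/bin/sh
# Hypothetical helper: rename <file> to <file>_<YYYYMMDD> unless it is in use.
stash_lock_file() {
    f=$1
    # Nothing to do if the file is not there.
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
    # fuser prints the PIDs of processes using the file on stdout;
    # if it is available and reports any, leave the file alone.
    if command -v fuser >/dev/null 2>&1 && [ -n "$(fuser "$f" 2>/dev/null)" ]; then
        echo "still in use: $f" >&2
        return 2
    fi
    mv "$f" "${f}_$(date +%Y%m%d)"
}

# Example usage:
#   stash_lock_file "$ORACLE_HOME/dbs/lkinstB1ACDB2" &&
#       srvctl start instance -d B1ACDB -i B1ACDB2
```

If fuser is missing, the check is skipped and the function behaves like the plain mv above.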
Once the command completed successfully, I checked the CRS status again and everything was back to normal.
acdbs:/oracle10g/db/dbs > crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    acdbm
ora....B2.inst application    ONLINE    ONLINE    acdbs
ora.B1ACDB.db  application    ONLINE    ONLINE    acdbm
ora....BM.lsnr application    ONLINE    ONLINE    acdbm
ora.acdbm.gsd  application    ONLINE    ONLINE    acdbm
ora.acdbm.ons  application    ONLINE    ONLINE    acdbm
ora.acdbm.vip  application    ONLINE    ONLINE    acdbm
ora....BS.lsnr application    ONLINE    ONLINE    acdbs
ora.acdbs.gsd  application    ONLINE    ONLINE    acdbs
ora.acdbs.ons  application    ONLINE    ONLINE    acdbs
ora.acdbs.vip  application    ONLINE    ONLINE    acdbs
With that, the problem was solved.
———————————————end—————————————————–