Default NIC Parameter Settings on Oracle Linux (OEL) Cause ORA-27300, ORA-27301 and ORA-27302 Errors
Sep 23, 2016
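We had a core database (a 2-node RAC) that was short on disk capacity and whose storage was misconfigured, so its I/O performance had never been good. We therefore purchased a new EMC storage array. Because the data volume was large (2.5 TB), we used Data Guard (the standby was also a 2-node RAC) to switch over to the new storage. I will skip the details of that process here. After the switchover, the second node of the new primary hit ORA-27300, ORA-27301 and ORA-27302. The environment in this case is OEL 6.5 with Oracle 11.2.0.4.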
Thu Sep 22 08:49:37 2016
skgxpvfynet: mtype: 61 process 12490 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /u01/app/oracle/diag/rdbms/ivldb34/ivldb4/trace/ivldb4_m001_12490.trc (incident=120005):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /u01/app/oracle/diag/rdbms/ivldb34/ivldb4/incident/incdir_120005/ivldb4_m001_12490_i120005.trc
opidrv aborting process M001 ospid (12490) as a result of ORA-603
Dumping diagnostic data in directory=[cdmp_20160922084938], requested by (instance=2, osid=12490 (M001)), summary=[incident=120005].
Thu Sep 22 08:49:39 2016
Process m001 died, see its trace file
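A search on MOS turned up the note "Oracle Linux: ORA-27301:OS Failure Message: No Buffer Space Available" (Doc ID 2041723.1). According to the note, the error occurs because the interface MTU (specifically, that of the loopback adapter) is set too high, which leaves the OS short of network buffers.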
Cause
High value of MTU for loop back adapter on UEK3 causes the issue.

Solution
In UEK3, the MTU value should be modified as below
# ifconfig lo mtu 16436
To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo
MTU=16436
Save the file and restart the network service to load the changes
# service network restart
On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total Physical Memory. This helps in keeping a larger range of de-fragmented memory pages available for network buffers reducing the probability of a low-buffer-space conditions.
*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1048576 ***
Additionally, on NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.
On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here 'n' is the number of NUMA nodes.
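The 0.4% rule quoted from the MOS note can be worked through with a few lines of shell. This is only a rough sketch: `suggest_min_free_kbytes` is a hypothetical helper name invented for illustration, not an Oracle or OS tool, and it uses plain integer arithmetic, so its result differs slightly from the rounded 1048576 that the note gives for a 256 GB server.

```shell
#!/bin/bash
# Hypothetical helper (not from the MOS note): suggest a value for
# vm.min_free_kbytes using the "0.4% of physical memory, times the
# number of NUMA nodes" rule quoted above.
#   $1 = total physical memory in kB
#   $2 = number of NUMA nodes (defaults to 1)
suggest_min_free_kbytes() {
    local mem_kb=$1
    local numa_nodes=${2:-1}
    # 0.4% == 4/1000; integer division truncates the fraction.
    echo $(( mem_kb * 4 / 1000 * numa_nodes ))
}

# On a live Linux host the inputs could be gathered like this (assumption:
# /proc/meminfo and the sysfs NUMA node directories are present):
#   mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   nodes=$(ls -d /sys/devices/system/node/node[0-9]* | wc -l)
#   suggest_min_free_kbytes "$mem_kb" "$nodes"
```

For 256 GiB (268435456 kB) on a single node this prints 1073741, which is in the same range as the 1048576 (1 GiB) in the note. The value would then be applied with `sysctl -w vm.min_free_kbytes=<value>` and persisted in /etc/sysctl.conf, after checking it against the server's actual memory.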
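This appears to be specific to OEL (the MOS note ties it to the UEK3 kernel). Comparing OEL, RHEL and CentOS systems, I found that only OEL has a loopback MTU of 65536, while the others all use 16436, and the fix given on MOS is to set the loopback MTU to 16436. The current loopback MTU on this server is shown below: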
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:867931409 errors:0 dropped:0 overruns:0 frame:0
          TX packets:867931409 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:438765973006 (408.6 GiB)  TX bytes:438765973006 (408.6 GiB)
The loopback MTU on this server is at its default of 65536, so I changed it using the method from the MOS note.
[root@SL010A-IVDB04 ~]# ifconfig lo mtu 16436
This command changes the interface parameter in the running system, so it takes effect immediately.
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:867931409 errors:0 dropped:0 overruns:0 frame:0
          TX packets:867931409 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:438765973006 (408.7 GiB)  TX bytes:438765973006 (408.7 GiB)
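As an alternative to reading the ifconfig output by eye, the running MTU can be checked from sysfs. The following is a minimal sketch: `check_mtu` is a hypothetical function name, and the optional third argument (the sysfs root) exists only so the function can be pointed at a test directory.

```shell
#!/bin/bash
# Hypothetical helper: compare an interface's current MTU, as exposed
# in /sys/class/net/<iface>/mtu, with the expected value.
#   $1 = interface name, $2 = expected MTU, $3 = sysfs root (optional)
# Returns 0 on match, 1 on mismatch, 2 if the MTU cannot be read.
check_mtu() {
    local iface=$1 expected=$2 root=${3:-/sys/class/net}
    local current
    current=$(cat "$root/$iface/mtu" 2>/dev/null) || {
        echo "ERROR: cannot read MTU for $iface"
        return 2
    }
    if [ "$current" -eq "$expected" ]; then
        echo "OK: $iface MTU is $current"
    else
        echo "MISMATCH: $iface MTU is $current, expected $expected"
        return 1
    fi
}
```

Running `check_mtu lo 16436` on each RAC node after the change would confirm that the new value is active.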
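However, the default value comes back after a reboot. To make the change survive a reboot, the interface configuration file has to be edited as well, which the MOS note also describes.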
[root@SL010A-IVDB04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-lo
DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
# If you're having problems with gated making 127.0.0.0/8 a martian,
# you can change this to something else (255.255.255.255, for example)
BROADCAST=127.255.255.255
ONBOOT=yes
NAME=loopback
MTU=16436
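When several nodes need the same change, the MTU line shown above could be written by a small idempotent script instead of editing each file by hand. This is a sketch under stated assumptions: `set_ifcfg_mtu` is a hypothetical helper name, and the target file is assumed to end with a newline, as the stock ifcfg-lo does.

```shell
#!/bin/bash
# Hypothetical helper: ensure a KEY=VALUE style config file contains
# exactly one "MTU=<value>" setting.
#   $1 = path to the ifcfg file, $2 = desired MTU
set_ifcfg_mtu() {
    local file=$1 mtu=$2 tmp
    if grep -q '^MTU=' "$file"; then
        # Rewrite the existing MTU line via a temp file, then copy the
        # result back so the original file's ownership and mode are kept.
        tmp=$(mktemp) || return 1
        sed "s/^MTU=.*/MTU=${mtu}/" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # No MTU line yet: append one (assumes a trailing newline exists).
        printf 'MTU=%s\n' "$mtu" >> "$file"
    fi
}
```

A typical call would be `set_ifcfg_mtu /etc/sysconfig/network-scripts/ifcfg-lo 16436`, run as root after backing up the file. Running it a second time leaves the file unchanged.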
After this change, the error has not occurred again.