GreenPlum数据库Segment节点故障转移到Mirror
本文档主要演示GreenPlum数据库Segment节点在有Mirror的情况下,Segment发生故障,Mirror节点自动接管Segment节点,数据库可以正常使用。这个过程对应用来说是透明的,也是在4.X版本之后提供的功能,本实验环境使用的数据库版本为4.3.6.2。在模拟故障前,先查看数据库的状态。
dbdream=# select * from gp_segment_configuration order by 1; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+----------+---------+------------------ 1 | -1 | p | p | s | u | 5432 | mdw-std | mdw-std | | 2 | 0 | p | p | s | u | 40000 | sdw1 | sdw1 | 43000 | 3 | 1 | p | p | s | u | 40000 | sdw2 | sdw2 | 43000 | 4 | 2 | p | p | s | u | 40000 | sdw3 | sdw3 | 43000 | 5 | 3 | p | p | s | u | 40000 | sdw4 | sdw4 | 43000 | 6 | 0 | m | m | s | u | 41000 | sdw2 | sdw2 | 42000 | 7 | 1 | m | m | s | u | 41000 | sdw3 | sdw3 | 42000 | 8 | 2 | m | m | s | u | 41000 | sdw4 | sdw4 | 42000 | 9 | 3 | m | m | s | u | 41000 | sdw1 | sdw1 | 42000 | 10 | -1 | m | m | s | u | 5432 | mdw | mdw | | (10 rows)
当前数据库存在4个Segment节点,本实验将会模拟其中一台Segment主机宕掉的情况下,Mirror节点自动接管Segment节点,保证数据库正常运行。
[gpadmin@mdw-std ~]$ gpstate –s 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:--Master Configuration & Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Master host = mdw-std 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Master postgres process ID = 5350 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Master data directory = /gpdb/gpdata/master/gpseg-1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Master port = 5432 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Master current role = dispatch 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Greenplum initsystem version = 4.3.6.2 build 1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Greenplum current version = PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Postgres version = 8.2.15 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Master standby = mdw 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Standby master state = Standby host passive 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-Segment Instance Status Report 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/primary/gpseg0 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 40000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 3359 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Database status = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw2 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw2 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/mirror/gpseg0 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 41000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 4416 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment status = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw2 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw2 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/primary/gpseg1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 40000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 3079 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Database status = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw3 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw3 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/mirror/gpseg1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 41000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 4726 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment status = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw3 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw3 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/primary/gpseg2 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 40000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 5094 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Database status = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw4 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw4 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/mirror/gpseg2 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 41000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 10578 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment status = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw4 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw4 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/primary/gpseg3 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 40000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Primary 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 8249 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Database status = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw1 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/mirror/gpseg3 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Port = 41000 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Mirror 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Synchronized 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- PID = 4739 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:- Segment status = Up
查看Mirror节点的状态。
[gpadmin@mdw-std ~]$ gpstate -m 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:--------------- 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:--------------- 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:- sdw2 /gpdb/gpdata/mirror/gpseg0 41000 Passive Synchronized 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:- sdw3 /gpdb/gpdata/mirror/gpseg1 41000 Passive Synchronized 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:- sdw4 /gpdb/gpdata/mirror/gpseg2 41000 Passive Synchronized 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:- sdw1 /gpdb/gpdata/mirror/gpseg3 41000 Passive Synchronized 20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:---------------
从上面的信息可以看到,当前Mirror的状态都是正常的,此时如果Segment出现故障,Mirror节点会自动接管Segment节点,数据库可以正常使用,并不会对业务造成影响。
下面关闭sdw4这个Segment节点的主机,这台主机上存在sdw4节点的Segment和sdw1的Mirror节点,Mirror节点宕掉并不会对数据库的正常运行造成影响,sdw4的Mirror节点存放在sdw1节点的主机上,也就是sdw4节点的Segment宕掉,sdw1主机上的Mirror会接管Segment,下面重启sdw4的主机。
[root@sdw4 fspc_segment]# reboot [root@sdw4 fspc_segment]# Broadcast message from root@sdw4 (/dev/pts/0) at 14:59 ... The system is going down for reboot NOW!
下面检查Mirror的信息,会发现sdw4的Mirror已经自动切换为Primary。
[gpadmin@mdw-std ~]$ gpstate -m 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:--------------------- 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:--------------------- 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:- sdw2 /gpdb/gpdata/mirror/gpseg0 41000 Passive Synchronized 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:- sdw3 /gpdb/gpdata/mirror/gpseg1 41000 Passive Synchronized 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-sdw4 /gpdb/gpdata/mirror/gpseg2 41000 Failed <<<<<<<< 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:- sdw1 /gpdb/gpdata/mirror/gpseg3 41000 Acting as Primary Change Tracking 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:------------------- 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) are acting as primaries 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) have failed 20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-1 mirror segment(s) acting as primaries are in change tracking
从上面的信息可以看到,sdw4主机上的Mirror(sdw3的Mirror)已经宕掉,sdw1主机上的Mirror(sdw4的Mirror)已经切换为Priamry。
通过gpstate –s命令也可以看到这些信息。
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw3 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw3 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/primary/gpseg2 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Port = 40000 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Primary 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Primary 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Change Tracking 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Change Tracking Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Change tracking data size = 106 MB 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- PID = 5094 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Database status = Up 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw4 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw4 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/mirror/gpseg2 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Port = 41000 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Mirror 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Mirror 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:- Mirror status = Out of Sync <<<<<<<< 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:- PID = Not found <<<<<<<< 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:- Configuration reports status as = Down <<<<<<<< 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:- Segment status = Down in configuration <<<<<<<<
以上是sdw3节点的信息,可以看到sdw3主机上的Segment节点状态正常,但是他的Mirror在sdw4主机上,目前已经宕掉。
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw1 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw1 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/mirror/gpseg3 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Port = 41000 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Primary 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Mirror 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Mirror status = Change Tracking 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Change Tracking Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Change tracking data size = 100 MB 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Status 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- PID = 4739 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Configuration reports status as = Up 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Database status = Up 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Segment Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Hostname = sdw4 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Address = sdw4 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Datadir = /gpdb/gpdata/primary/gpseg3 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Port = 40000 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Mirroring Info 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Current role = Mirror 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:- Preferred role = Primary 20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:- Mirror status = Out of Sync <<<<<<<<
以上是sdw4节点的信息,可以看到,sdw4节点的Primary已经被sdw1上的Mirror接管,sdw4上的节点已经变成了Mirror,目前是宕掉的状态。
从数据库中也可以查询到这些信息。
dbdream=# select * from gp_segment_configuration order by 1; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------- 1 | -1 | p | p | s | u | 5432 | mdw-std | mdw-std | | 2 | 0 | p | p | s | u | 40000 | sdw1 | sdw1 | 43000 | 3 | 1 | p | p | s | u | 40000 | sdw2 | sdw2 | 43000 | 4 | 2 | p | p | c | u | 40000 | sdw3 | sdw3 | 43000 | 5 | 3 | m | p | s | d | 40000 | sdw4 | sdw4 | 43000 | 6 | 0 | m | m | s | u | 41000 | sdw2 | sdw2 | 42000 | 7 | 1 | m | m | s | u | 41000 | sdw3 | sdw3 | 42000 | 8 | 2 | m | m | s | d | 41000 | sdw4 | sdw4 | 42000 | 9 | 3 | p | m | c | u | 41000 | sdw1 | sdw1 | 42000 | 10 | -1 | m | m | s | u | 5432 | mdw | mdw | | (10 rows)
先看下dbid=4的这行记录,这是sdw3主机的Segment节点,当前sdw3节点的Mirror在sdw4主机上,已经宕掉,mode列这里的状态是c(change logging),现在Segment和Mirror已经是不同步状态了。再看下dbid=5的这行记录,这行记录记录的是sdw4的Segment信息,目前已经宕掉,status字段的状态是d(down),preferred_role是p,也就是原先的角色是Primary,role字段的状态是m(Mirror),现在的角色是Mirror,因为角色已经切换。再看dbid=8这条记录,这条记录记录的是sdw3主机的Mirror,status字段的状态是d(down),因为Mirror宕掉对数据库的正常运行没有影响,所以原先的角色是Mirror,现在的角色还是Mirror。再看dbid=9这行记录,这行记录记录的是sdw4主机的Mirror,在sdw1主机上,因为sdw4的Primary已经宕掉,sdw1上的Mirror已经切换成Primary了,mode字段的状态是c(change logging),因为现在Primary和Mirror已经不同步了,preferred_role状态是m(Mirror),原先的角色是Mirror,role字段的状态是p(Primary),现在的角色是Primary。
下面看下,Mirror切换成Primary后,数据库的读写是否会收到影响。
dbdream=# select gp_segment_id,count(*) from ord_pay dbdream-# group by gp_segment_id; gp_segment_id | count ---------------+-------- 3 | 250000 2 | 250001 1 | 249999 0 | 250000 (4 rows) dbdream=# select gp_segment_id,count(*) from ord_pay_s group by gp_segment_id; gp_segment_id | count ---------------+------- (0 rows)
可见查询没有问题,下面把ord_pay表中的100万条数据插入到ord_pay_s(空表)表中,看看写操作是否会收到影响。
dbdream=# insert into ord_pay_s select * from ord_pay; INSERT 0 1000000 dbdream=# select gp_segment_id,count(*) from ord_pay_s group by gp_segment_id; gp_segment_id | count ---------------+-------- 3 | 250000 0 | 250000 1 | 249999 2 | 250001 (4 rows)
可见写操作也没有收到影响。但是现在数据的分布是不合理的,以为当前sdw1主机上存在两个Primary的Segment,也就是sdw1节点和sdw4节点,而其他主机上只存在一个Primary的Segment节点,压力是不均衡的,GreenPlum数据库建议在故障主机修复后,再切换回原来的架构。
要恢复到原来的架构,有两种可能情况,第一种情况,当故障主机启动后,这台主机上的节点会自动启动,当然现在已经都是Mirror节点了,角色是不会切换的,而且数据是不一致状态。这种情况只需要使用gprecoverseg命令同步一下数据后,就可以切换到原来的架构了。很遗憾的是,我遇到的是另一种情况,当故障主机sdw4启动后,这台主机上的节点并没有启动,而且在重启数据库时,这台主机上的节点也没有启动。
[gpadmin@mdw-std ~]$ gpstart -a 20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting gpstart with args: -a 20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Gathering information and validating the environment... 20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150' 20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting Master instance in admin mode 20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Setting new master era 20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Master Started... 20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Shutting down master 20160308:16:16:06:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw4 directory /gpdb/gpdata/mirror/gpseg2 <<<<< 20160308:16:16:06:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw4 directory /gpdb/gpdata/primary/gpseg3 <<<<< 20160308:16:16:06:017426 gpstart:mdw-std:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait... .. 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-Process results... 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:------------------------ 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:- Successful segment starts = 6 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:- Failed segment starts = 0 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 2 <<<<<<<< 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:------------------------ 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:- 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-Successfully started 6 of 6 segment instances, skipped 2 other segments 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:------------------------- 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-********************** 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-There are 2 segment(s) marked down in the database 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases. 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-*********************** 20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting Master instance mdw-std directory /gpdb/gpdata/master/gpseg-1 20160308:16:16:09:017426 gpstart:mdw-std:gpadmin-[INFO]:-Command pg_ctl reports Master mdw-std instance active 20160308:16:16:09:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting standby master 20160308:16:16:09:017426 gpstart:mdw-std:gpadmin-[INFO]:-Checking if standby master is running on host: mdw in directory: /gpdb/gpdata/master/gpseg-1 20160308:16:16:10:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Number of segments not attempted to start: 2 20160308:16:16:10:017426 gpstart:mdw-std:gpadmin-[INFO]:-Check status of database with gpstate utility
启动日志会提示需要通过gprecoverseg命令同步下故障节点的数据,才能启动故障主机上的节点。下面使用gprecoverseg命令同步故障机节点的数据。
[gpadmin@mdw-std ~]$ gprecoverseg 20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting gprecoverseg with args: 20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Checking if segments are ready 20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Greenplum instance recovery parameters 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-------------------------- 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery type = Standard 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-------------------------- 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery 1 of 2 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-------------------------- 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Synchronization mode = Incremental 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance host = sdw4 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance address = sdw4 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance directory = /gpdb/gpdata/mirror/gpseg2 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance port = 41000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance replication port = 42000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance fspc1 directory = /gpdb/gpdata/fspc_mirror/gpseg2 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance host = sdw3 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance address = sdw3 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance directory = /gpdb/gpdata/primary/gpseg2 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance port = 40000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance replication port = 43000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance fspc1 directory = /gpdb/gpdata/fspc_segment/gpseg2 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Target = in-place 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-------------------------- 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery 2 of 2 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-------------------------- 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Synchronization mode = Incremental 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance host = sdw4 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance address = sdw4 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance directory = /gpdb/gpdata/primary/gpseg3 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance port = 40000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance replication port = 43000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Failed instance fspc1 directory = /gpdb/gpdata/fspc_segment/gpseg3 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance host = sdw1 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance address = sdw1 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance directory = /gpdb/gpdata/mirror/gpseg3 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance port = 41000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance replication port = 42000 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Source instance fspc1 directory = /gpdb/gpdata/fspc_mirror/gpseg3 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- Recovery Target = in-place 20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:----------------------- Continue with segment recovery procedure Yy|Nn (default=N):
gprecoverseg命令会列出需要同步的节点,上述列出的是sdw3和sdw4的Mirror需要数据同步,输入Y确认同步数据。
> y 20160308:16:20:28:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-2 segment(s) to recover 20160308:16:20:28:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped 20160308:16:20:28:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments updating flat files 20160308:16:20:34:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating configuration with new mirrors 20160308:16:20:34:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating mirrors . 20160308:16:20:35:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting mirrors 20160308:16:20:35:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait... .. 20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Process results... 20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating configuration to mark mirrors up 20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating primaries 20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait... . 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Process results... 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Done updating primaries 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-********************* 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating segments for resynchronization is completed. 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background. 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:- 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Use gpstate -s to check the resynchronization progress. 20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-**********************
数据同步完成,查看Mirror节点的信息,Mirror状态正常。
[gpadmin@mdw-std gpseg-1]$ gpstate -m 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:- sdw2 /gpdb/gpdata/mirror/gpseg0 41000 Passive Synchronized 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:- sdw3 /gpdb/gpdata/mirror/gpseg1 41000 Passive Synchronized 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:- sdw4 /gpdb/gpdata/mirror/gpseg2 41000 Passive Synchronized 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:- sdw1 /gpdb/gpdata/mirror/gpseg3 41000 Acting as Primary Synchronized 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) are acting as primaries
但是当前角色还没有切换回来,sdw1主机上还是两个Primary的Segment,sdw4主机上还是两个Mirror的Segment,需要使用gprecoverseg –r命令进行Primary和Mirror角色切换。
[gpadmin@mdw-std gpseg-1]$ gprecoverseg -r 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting gprecoverseg with args: -r 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Checking if segments are ready 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Greenplum instance recovery parameters 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery type = Rebalance 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Unbalanced segment 1 of 2 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance host = sdw1 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance address = sdw1 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance directory = /gpdb/gpdata/mirror/gpseg3 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance port = 41000 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance replication port = 42000 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance fspc1 directory = /gpdb/gpdata/fspc_mirror/gpseg3 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Balanced role = Mirror 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Current role = Primary 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------- 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Unbalanced segment 2 of 2 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------- 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance host = sdw4 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance address = sdw4 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance directory = /gpdb/gpdata/primary/gpseg3 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance port = 40000 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance replication port = 43000 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Unbalanced instance fspc1 directory = /gpdb/gpdata/fspc_segment/gpseg3 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Balanced role = Primary 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- Current role = Mirror 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-------------------- 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[WARNING]:-This operation will cancel queries that are currently executing. 20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[WARNING]:-Connections to the database however will not be interrupted. Continue with segment rebalance procedure Yy|Nn (default=N):
gprecoverseg –r命令也会列出需要切换的节点信息,如上列出的是sdw1主机上的sdw4节点的Primary和sdw4节点上的sdw4节点的Mirror节点的切换,输入Y确认切换。sdw3节点的Mirror并不需要切换。
> y 20160308:16:35:29:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Getting unbalanced segments 20160308:16:35:29:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Stopping unbalanced primary segments... .. 20160308:16:35:31:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Triggering segment reconfiguration 20160308:16:35:35:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting segment synchronization ........... 20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-******************* 20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-The rebalance operation has completed successfully. 20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-There is a resynchronization running in the background to bring all 20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-segments in sync. 20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:- 20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Use gpstate -e to check the resynchronization progress. 20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-*******************
切换完成,此时通过gpstate –m查看Mirror信息可以看到,Mirror已经切换回来,状态也都正常。
[gpadmin@mdw-std gpseg-1]$ gpstate -m 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:----------------------- 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:- sdw2 /gpdb/gpdata/mirror/gpseg0 41000 Passive Synchronized 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:- sdw3 /gpdb/gpdata/mirror/gpseg1 41000 Passive Synchronized 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:- sdw4 /gpdb/gpdata/mirror/gpseg2 41000 Passive Synchronized 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:- sdw1 /gpdb/gpdata/mirror/gpseg3 41000 Passive Synchronized 20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:----------------------
gpstate –s命令查看也都正常,但是gpstate –e命令有可能会看到主/备数据正在同步,这是正常的。
[gpadmin@mdw-std gpseg-1]$ gpstate -e 20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -e 20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-Gathering data from segments... . 20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-Segment Mirroring Status Report 20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------------------------- 20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-Segment Pairs in Resynchronization 20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:- Current Primary Port Resync mode Est. resync progress Total resync objects Objects to resync Data synced Est. total to sync Est. resync end time Change tracking size Mirror Port 20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:- sdw4 40000 Incremental 100% 0 0 320 kB Sync complete; awaiting config change 104 MB sdw1 41000
过段时间再次查看,数据同步完成状态就会正常。
[gpadmin@mdw-std gpseg-1]$ gpstate -e 20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -e 20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1' 20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28' 20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master... 20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-Gathering data from segments... . 20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:---------------------------------- 20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:-Segment Mirroring Status Report 20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:---------------------------------- 20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:-All segments are running normally
此时数据库已经切换回原有的架构。
dbdream=# select * from gp_segment_configuration order by 1; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------- 1 | -1 | p | p | s | u | 5432 | mdw-std | mdw-std | | 2 | 0 | p | p | s | u | 40000 | sdw1 | sdw1 | 43000 | 3 | 1 | p | p | s | u | 40000 | sdw2 | sdw2 | 43000 | 4 | 2 | p | p | c | u | 40000 | sdw3 | sdw3 | 43000 | 5 | 3 | p | p | s | d | 40000 | sdw4 | sdw4 | 43000 | 6 | 0 | m | m | s | u | 41000 | sdw2 | sdw2 | 42000 | 7 | 1 | m | m | s | u | 41000 | sdw3 | sdw3 | 42000 | 8 | 2 | m | m | s | d | 41000 | sdw4 | sdw4 | 42000 | 9 | 3 | m | m | c | u | 41000 | sdw1 | sdw1 | 42000 | 10 | -1 | m | m | s | u | 5432 | mdw | mdw | | (10 rows)
在数据库中可以通过gp_configuration_history字典来查看数据库的切换信息。
dbdream=# select * from gp_configuration_history; time | dbid | desc -------------------------------+------+-------------------------------------------------------------------------- 2016-03-08 14:59:57.284412+08 | 4 | FTS: content 2 fault marking status UP mode: change-tracking role p 2016-03-08 14:59:57.284705+08 | 8 | FTS: content 2 fault marking status DOWN role m 2016-03-08 14:59:57.289104+08 | 5 | FTS: content 3 fault marking status DOWN role m 2016-03-08 14:59:57.289155+08 | 9 | FTS: content 3 fault marking status UP mode: change-tracking role p 2016-03-08 16:20:34.144996+08 | 5 | gprecoverseg: segment config for resync: segment mode and status 2016-03-08 16:20:34.144996+08 | 8 | gprecoverseg: segment config for resync: segment mode and status 2016-03-08 16:20:37.19326+08 | 4 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2016-03-08 16:20:37.19326+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2016-03-08 16:20:37.19326+08 | 8 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2016-03-08 16:20:37.19326+08 | 9 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2016-03-08 16:21:09.221171+08 | 4 | FTS: changed segment to insync from resync. 2016-03-08 16:21:09.221171+08 | 8 | FTS: changed segment to insync from resync. 2016-03-08 16:21:09.223828+08 | 9 | FTS: changed segment to insync from resync. 2016-03-08 16:21:09.223828+08 | 5 | FTS: changed segment to insync from resync. 2016-03-08 16:35:35.746699+08 | 9 | FTS: content 3 fault marking status DOWN role m 2016-03-08 16:35:35.746885+08 | 5 | FTS: content 3 fault marking status UP mode: change-tracking role p 2016-03-08 16:35:42.00957+08 | 9 | gprecoverseg: segment config for resync: segment mode and status 2016-03-08 16:35:45.085018+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2016-03-08 16:35:45.085018+08 | 9 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2016-03-08 16:36:35.877428+08 | 5 | FTS: changed segment to insync from resync. 2016-03-08 16:36:35.877428+08 | 9 | FTS: changed segment to insync from resync. (21 rows)
GreenPlum数据库主/备切换相当的简单,gprecoverseg命令相当的智能,虽然在4.3版本之后提供的主备自动切换功能非常强大,在Primary的主机出现故障之后,Mirror会自动切换为Primary,不影响数据库的正常工作,但是对监控不是很到位的系统来说,并不建议使用这个功能,首先这个功能存在一定的BUG,其次,监控不到位,一旦发现切换,并不能及时发现,如果再有节点出现故障,可能对数据恢复造成影响,而且如果单个节点的数据量非常大的时候,gprecoverseg同步数据的过程将会很漫长。