当前位置: 首页 > GreenPlum > 正文

GreenPlum数据库Segment节点故障转移到Mirror

本文档主要演示GreenPlum数据库Segment节点在有Mirror的情况下,Segment发生故障,Mirror节点自动接管Segment节点,数据库可以正常使用。这个过程对应用来说是透明的,也是在4.X版本之后提供的功能,本实验环境使用的数据库版本为4.3.6.2。在模拟故障前,先查看数据库的状态。

dbdream=# select * from gp_segment_configuration order by 1;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address | replication_port | san_mounts 
------+---------+------+----------------+------+--------+-------+----------+---------+------------------
    1 |      -1 | p    | p              | s    | u      |  5432 | mdw-std  | mdw-std |                  | 
    2 |       0 | p    | p              | s    | u      | 40000 | sdw1     | sdw1    |            43000 | 
    3 |       1 | p    | p              | s    | u      | 40000 | sdw2     | sdw2    |            43000 | 
    4 |       2 | p    | p              | s    | u      | 40000 | sdw3     | sdw3    |            43000 | 
    5 |       3 | p    | p              | s    | u      | 40000 | sdw4     | sdw4    |            43000 | 
    6 |       0 | m    | m              | s    | u      | 41000 | sdw2     | sdw2    |            42000 | 
    7 |       1 | m    | m              | s    | u      | 41000 | sdw3     | sdw3    |            42000 | 
    8 |       2 | m    | m              | s    | u      | 41000 | sdw4     | sdw4    |            42000 | 
    9 |       3 | m    | m              | s    | u      | 41000 | sdw1     | sdw1    |            42000 | 
   10 |      -1 | m    | m              | s    | u      |  5432 | mdw      | mdw     |                  | 
(10 rows)

当前数据库存在4个Segment节点,本实验将会模拟其中一台Segment主机宕掉的情况下,Mirror节点自动接管Segment节点,保证数据库正常运行。

[gpadmin@mdw-std ~]$ gpstate –s
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:--Master Configuration & Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Master host                    = mdw-std
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Master postgres process ID     = 5350
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Master data directory          = /gpdb/gpdata/master/gpseg-1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Master port                    = 5432
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Master current role            = dispatch
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Greenplum initsystem version   = 4.3.6.2 build 1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Greenplum current version      = PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Postgres version               = 8.2.15
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Master standby                 = mdw
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Standby master state           = Standby host passive
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-Segment Instance Status Report
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/primary/gpseg0
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 40000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 3359
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Database status                   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw2
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw2
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/mirror/gpseg0
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 41000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 4416
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Segment status                    = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw2
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw2
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/primary/gpseg1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 40000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 3079
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Database status                   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw3
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw3
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/mirror/gpseg1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 41000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 4726
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Segment status                    = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw3
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw3
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/primary/gpseg2
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 40000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 5094
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Database status                   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw4
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw4
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/mirror/gpseg2
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 41000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 10578
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Segment status                    = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw4
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw4
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/primary/gpseg3
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 40000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Primary
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 8249
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Database status                   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw1
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/mirror/gpseg3
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 41000
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Mirror
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 4739
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:14:55:34:015399 gpstate:mdw-std:gpadmin-[INFO]:-      Segment status                    = Up

查看Mirror节点的状态。

[gpadmin@mdw-std ~]$ gpstate -m
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:---------------
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:---------------
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-   Mirror   Datadir                      Port    Status    Data Status    
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-   sdw2     /gpdb/gpdata/mirror/gpseg0   41000   Passive   Synchronized
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-   sdw3     /gpdb/gpdata/mirror/gpseg1   41000   Passive   Synchronized
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-   sdw4     /gpdb/gpdata/mirror/gpseg2   41000   Passive   Synchronized
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:-   sdw1     /gpdb/gpdata/mirror/gpseg3   41000   Passive   Synchronized
20160308:14:58:50:015547 gpstate:mdw-std:gpadmin-[INFO]:---------------

从上面的信息可以看到,当前Mirror的状态都是正常的,此时如果Segment出现故障,Mirror节点会自动接管Segment节点,数据库可以正常使用,并不会对业务造成影响。

下面关闭sdw4这个Segment节点的主机,这台主机上存在sdw4节点的Segment和sdw1的Mirror节点,Mirror节点宕掉并不会对数据库的正常运行造成影响,sdw4的Mirror节点存放在sdw1节点的主机上,也就是sdw4节点的Segment宕掉,sdw1主机上的Mirror会接管Segment,下面重启sdw4的主机。

[root@sdw4 fspc_segment]# reboot
[root@sdw4 fspc_segment]# 
Broadcast message from root@sdw4
        (/dev/pts/0) at 14:59 ...

The system is going down for reboot NOW!

下面检查Mirror的信息,会发现sdw4的Mirror已经自动切换为Primary。

[gpadmin@mdw-std ~]$ gpstate -m
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:---------------------
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:---------------------
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-   Mirror   Datadir                      Port    Status              Data Status       
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-   sdw2     /gpdb/gpdata/mirror/gpseg0   41000   Passive             Synchronized
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-   sdw3     /gpdb/gpdata/mirror/gpseg1   41000   Passive             Synchronized
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-sdw4     /gpdb/gpdata/mirror/gpseg2   41000   Failed                                <<<<<<<<
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-   sdw1     /gpdb/gpdata/mirror/gpseg3   41000   Acting as Primary   Change Tracking
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[INFO]:-------------------
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) are acting as primaries
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) have failed
20160308:14:59:58:015601 gpstate:mdw-std:gpadmin-[WARNING]:-1 mirror segment(s) acting as primaries are in change tracking

从上面的信息可以看到,sdw4主机上的Mirror(sdw3的Mirror)已经宕掉,sdw1主机上的Mirror(sdw4的Mirror)已经切换为Priamry。

通过gpstate –s命令也可以看到这些信息。

20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw3
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw3
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/primary/gpseg2
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 40000
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Primary
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Primary
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Change Tracking
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Change Tracking Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Change tracking data size         = 106 MB
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 5094
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Database status                   = Up
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw4
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw4
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/mirror/gpseg2
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 41000
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Mirror
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Mirror
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:-   Mirror status                     = Out of Sync                   <<<<<<<<
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:-   PID                               = Not found                     <<<<<<<<
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:-   Configuration reports status as   = Down                          <<<<<<<<
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:-   Segment status                    = Down in configuration         <<<<<<<<

以上是sdw3节点的信息,可以看到sdw3主机上的Segment节点状态正常,但是他的Mirror在sdw4主机上,目前已经宕掉。

20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw1
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw1
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/mirror/gpseg3
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 41000
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Primary
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Mirror
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Mirror status                     = Change Tracking
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Change Tracking Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Change tracking data size         = 100 MB
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Status
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      PID                               = 4739
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Configuration reports status as   = Up
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Database status                   = Up
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Segment Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Hostname                          = sdw4
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Address                           = sdw4
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Datadir                           = /gpdb/gpdata/primary/gpseg3
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Port                              = 40000
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-   Mirroring Info
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Current role                      = Mirror
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[INFO]:-      Preferred role                    = Primary
20160308:15:00:47:015652 gpstate:mdw-std:gpadmin-[WARNING]:-   Mirror status                     = Out of Sync                   <<<<<<<<

以上是sdw4节点的信息,可以看到,sdw4节点的Primary已经被sdw1上的Mirror接管,sdw4上的节点已经变成了Mirror,目前是宕掉的状态。

从数据库中也可以查询到这些信息。

dbdream=# select * from gp_segment_configuration order by 1;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address | replication_port | san_mounts 
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+-------
    1 |      -1 | p    | p              | s    | u      |  5432 | mdw-std  | mdw-std |                  | 
    2 |       0 | p    | p              | s    | u      | 40000 | sdw1     | sdw1    |            43000 | 
    3 |       1 | p    | p              | s    | u      | 40000 | sdw2     | sdw2    |            43000 | 
    4 |       2 | p    | p              | c    | u      | 40000 | sdw3     | sdw3    |            43000 | 
    5 |       3 | m    | p              | s    | d      | 40000 | sdw4     | sdw4    |            43000 | 
    6 |       0 | m    | m              | s    | u      | 41000 | sdw2     | sdw2    |            42000 | 
    7 |       1 | m    | m              | s    | u      | 41000 | sdw3     | sdw3    |            42000 | 
    8 |       2 | m    | m              | s    | d      | 41000 | sdw4     | sdw4    |            42000 | 
    9 |       3 | p    | m              | c    | u      | 41000 | sdw1     | sdw1    |            42000 | 
   10 |      -1 | m    | m              | s    | u      |  5432 | mdw      | mdw     |                  | 
(10 rows)

先看下dbid=4的这行记录,这是sdw3主机的Segment节点,当前sdw3节点的Mirror在sdw4主机上,已经宕掉,mode列这里的状态是c(change logging),现在Segment和Mirror已经是不同步状态了。再看下dbid=5的这行记录,这行记录记录的是sdw4的Segment信息,目前已经宕掉,status字段的状态是d(down),preferred_role是p,也就是原先的角色是Primary,role字段的状态是m(Mirror),现在的角色是Mirror,因为角色已经切换。再看dbid=8这条记录,这条记录记录的是sdw3主机的Mirror,status字段的状态是d(down),因为Mirror宕掉对数据库的正常运行没有影响,所以原先的角色是Mirror,现在的角色还是Mirror。再看dbid=9这行记录,这行记录记录的是sdw4主机的Mirror,在sdw1主机上,因为sdw4的Primary已经宕掉,sdw1上的Mirror已经切换成Primary了,mode字段的状态是c(change logging),因为现在Primary和Mirror已经不同步了,preferred_role状态是m(Mirror),原先的角色是Mirror,role字段的状态是p(Primary),现在的角色是Primary。

下面看下,Mirror切换成Primary后,数据库的读写是否会收到影响。

dbdream=# select gp_segment_id,count(*) from ord_pay
dbdream-# group by gp_segment_id;
 gp_segment_id | count  
---------------+--------
             3 | 250000
             2 | 250001
             1 | 249999
             0 | 250000
(4 rows)

dbdream=# select gp_segment_id,count(*) from ord_pay_s
group by gp_segment_id;
 gp_segment_id | count 
---------------+-------
(0 rows)

可见查询没有问题,下面把ord_pay表中的100万条数据插入到ord_pay_s(空表)表中,看看写操作是否会收到影响。

dbdream=# insert into ord_pay_s select * from ord_pay;
INSERT 0 1000000

dbdream=# select gp_segment_id,count(*) from ord_pay_s                                                                                                                                               group by gp_segment_id;
 gp_segment_id | count  
---------------+--------
             3 | 250000
             0 | 250000
             1 | 249999
             2 | 250001
(4 rows)

可见写操作也没有收到影响。但是现在数据的分布是不合理的,以为当前sdw1主机上存在两个Primary的Segment,也就是sdw1节点和sdw4节点,而其他主机上只存在一个Primary的Segment节点,压力是不均衡的,GreenPlum数据库建议在故障主机修复后,再切换回原来的架构。

要恢复到原来的架构,有两种可能情况,第一种情况,当故障主机启动后,这台主机上的节点会自动启动,当然现在已经都是Mirror节点了,角色是不会切换的,而且数据是不一致状态。这种情况只需要使用gprecoverseg命令同步一下数据后,就可以切换到原来的架构了。很遗憾的是,我遇到的是另一种情况,当故障主机sdw4启动后,这台主机上的节点并没有启动,而且在重启数据库时,这台主机上的节点也没有启动。

[gpadmin@mdw-std ~]$ gpstart -a
20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting gpstart with args: -a
20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Gathering information and validating the environment...
20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20160308:16:16:04:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting Master instance in admin mode
20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Setting new master era
20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Master Started...
20160308:16:16:05:017426 gpstart:mdw-std:gpadmin-[INFO]:-Shutting down master
20160308:16:16:06:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw4 directory /gpdb/gpdata/mirror/gpseg2 <<<<<
20160308:16:16:06:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw4 directory /gpdb/gpdata/primary/gpseg3 <<<<<
20160308:16:16:06:017426 gpstart:mdw-std:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.. 
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-Process results...
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:------------------------
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-   Successful segment starts                                            = 6
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   = 2   <<<<<<<<
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:------------------------
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-Successfully started 6 of 6 segment instances, skipped 2 other segments 
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-------------------------
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-**********************
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-There are 2 segment(s) marked down in the database
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[WARNING]:-***********************
20160308:16:16:08:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting Master instance mdw-std directory /gpdb/gpdata/master/gpseg-1 
20160308:16:16:09:017426 gpstart:mdw-std:gpadmin-[INFO]:-Command pg_ctl reports Master mdw-std instance active
20160308:16:16:09:017426 gpstart:mdw-std:gpadmin-[INFO]:-Starting standby master
20160308:16:16:09:017426 gpstart:mdw-std:gpadmin-[INFO]:-Checking if standby master is running on host: mdw  in directory: /gpdb/gpdata/master/gpseg-1
20160308:16:16:10:017426 gpstart:mdw-std:gpadmin-[WARNING]:-Number of segments not attempted to start: 2
20160308:16:16:10:017426 gpstart:mdw-std:gpadmin-[INFO]:-Check status of database with gpstate utility

启动日志会提示需要通过gprecoverseg命令同步下故障节点的数据,才能启动故障主机上的节点。下面使用gprecoverseg命令同步故障机节点的数据。

[gpadmin@mdw-std ~]$ gprecoverseg 
20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting gprecoverseg with args: 
20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Checking if segments are ready
20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:20:08:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Greenplum instance recovery parameters
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------------
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery type              = Standard
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------------
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery 1 of 2
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------------
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Synchronization mode     = Incremental
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance host     = sdw4
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance address  = sdw4
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance directory    = /gpdb/gpdata/mirror/gpseg2
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance port     = 41000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance replication port   = 42000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance fspc1 directory     = /gpdb/gpdata/fspc_mirror/gpseg2
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance host  = sdw3
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance address  = sdw3
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance directory   = /gpdb/gpdata/primary/gpseg2
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance port   = 40000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance replication port   = 43000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance fspc1 directory    = /gpdb/gpdata/fspc_segment/gpseg2
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Target   = in-place
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------------
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery 2 of 2
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------------
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Synchronization mode       = Incremental
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance host        = sdw4
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance address      = sdw4
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance directory     = /gpdb/gpdata/primary/gpseg3
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance port       = 40000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance replication port    = 43000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Failed instance fspc1 directory     = /gpdb/gpdata/fspc_segment/gpseg3
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance host    = sdw1
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance address   = sdw1
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance directory   = /gpdb/gpdata/mirror/gpseg3
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance port     = 41000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance replication port   = 42000
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Source instance fspc1 directory    = /gpdb/gpdata/fspc_mirror/gpseg3
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Recovery Target     = in-place
20160308:16:20:09:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-----------------------

Continue with segment recovery procedure Yy|Nn (default=N):

gprecoverseg命令会列出需要同步的节点,上述列出的是sdw3和sdw4的Mirror需要数据同步,输入Y确认同步数据。

> y
20160308:16:20:28:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-2 segment(s) to recover
20160308:16:20:28:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped
 
20160308:16:20:28:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
updating flat files
20160308:16:20:34:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating configuration with new mirrors
20160308:16:20:34:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating mirrors
. 
20160308:16:20:35:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting mirrors
20160308:16:20:35:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.. 
20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Process results...
20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating configuration to mark mirrors up
20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating primaries
20160308:16:20:37:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
. 
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Process results...
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Done updating primaries
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-*********************
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-Use  gpstate -s  to check the resynchronization progress.
20160308:16:20:38:017626 gprecoverseg:mdw-std:gpadmin-[INFO]:-**********************

数据同步完成,查看Mirror节点的信息,Mirror状态正常。

[gpadmin@mdw-std gpseg-1]$ gpstate -m
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-   Mirror   Datadir                      Port    Status              Data Status    
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-   sdw2     /gpdb/gpdata/mirror/gpseg0   41000   Passive             Synchronized
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-   sdw3     /gpdb/gpdata/mirror/gpseg1   41000   Passive             Synchronized
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-   sdw4     /gpdb/gpdata/mirror/gpseg2   41000   Passive             Synchronized
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-   sdw1     /gpdb/gpdata/mirror/gpseg3   41000   Acting as Primary   Synchronized
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:31:20:018397 gpstate:mdw-std:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) are acting as primaries

但是当前角色还没有切换回来,sdw1主机上还是两个Primary的Segment,sdw4主机上还是两个Mirror的Segment,需要使用gprecoverseg –r命令进行Primary和Mirror角色切换。

[gpadmin@mdw-std gpseg-1]$ gprecoverseg -r
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting gprecoverseg with args: -r
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Checking if segments are ready
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Greenplum instance recovery parameters
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Recovery type              = Rebalance
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Unbalanced segment 1 of 2
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance host               = sdw1
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance address            = sdw1
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance directory          = /gpdb/gpdata/mirror/gpseg3
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance port               = 41000
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance replication port   = 42000
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance fspc1 directory    = /gpdb/gpdata/fspc_mirror/gpseg3
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Balanced role                          = Mirror
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Current role                           = Primary
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:---------------------
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Unbalanced segment 2 of 2
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:---------------------
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance host               = sdw4
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance address            = sdw4
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance directory          = /gpdb/gpdata/primary/gpseg3
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance port               = 40000
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance replication port   = 43000
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Unbalanced instance fspc1 directory    = /gpdb/gpdata/fspc_segment/gpseg3
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Balanced role                          = Primary
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-   Current role                           = Mirror
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:--------------------
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[WARNING]:-This operation will cancel queries that are currently executing.
20160308:16:35:17:018499 gprecoverseg:mdw-std:gpadmin-[WARNING]:-Connections to the database however will not be interrupted.

Continue with segment rebalance procedure Yy|Nn (default=N):

gprecoverseg –r命令也会列出需要切换的节点信息,如上列出的是sdw1主机上的sdw4节点的Primary和sdw4节点上的sdw4节点的Mirror节点的切换,输入Y确认切换。sdw3节点的Mirror并不需要切换。

> y
20160308:16:35:29:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Getting unbalanced segments
20160308:16:35:29:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Stopping unbalanced primary segments...
.. 
20160308:16:35:31:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Triggering segment reconfiguration
20160308:16:35:35:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Starting segment synchronization
........... 
20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-*******************
20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-The rebalance operation has completed successfully.
20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-There is a resynchronization running in the background to bring all
20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-segments in sync.
20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-
20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-Use gpstate -e to check the resynchronization progress.
20160308:16:35:46:018499 gprecoverseg:mdw-std:gpadmin-[INFO]:-*******************

切换完成,此时通过gpstate –m查看Mirror信息可以看到,Mirror已经切换回来,状态也都正常。

[gpadmin@mdw-std gpseg-1]$ gpstate -m
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -m
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:--Current GPDB mirror list and status
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:--Type = Spread
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-----------------------
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-   Mirror   Datadir                      Port    Status    Data Status    
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-   sdw2     /gpdb/gpdata/mirror/gpseg0   41000   Passive   Synchronized
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-   sdw3     /gpdb/gpdata/mirror/gpseg1   41000   Passive   Synchronized
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-   sdw4     /gpdb/gpdata/mirror/gpseg2   41000   Passive   Synchronized
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:-   sdw1     /gpdb/gpdata/mirror/gpseg3   41000   Passive   Synchronized
20160308:16:36:58:018852 gpstate:mdw-std:gpadmin-[INFO]:----------------------

gpstate –s命令查看也都正常,但是gpstate –e命令有可能会看到主/备数据正在同步,这是正常的。

[gpadmin@mdw-std gpseg-1]$ gpstate -e
20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -e
20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:36:05:018696 gpstate:mdw-std:gpadmin-[INFO]:-Gathering data from segments...
. 
20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-Segment Mirroring Status Report
20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-----------------------------------------------------
20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-Segment Pairs in Resynchronization
20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-   Current Primary   Port    Resync mode   Est. resync progress   Total resync objects   Objects to resync   Data synced   Est. total to sync                      Est. resync end time   Change tracking size   Mirror   Port
20160308:16:36:06:018696 gpstate:mdw-std:gpadmin-[INFO]:-   sdw4              40000   Incremental   100%                   0                      0                   320 kB        Sync complete; awaiting config change                          104 MB                 sdw1     41000

过段时间再次查看,数据同步完成状态就会正常。

[gpadmin@mdw-std gpseg-1]$ gpstate -e
20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-Starting gpstate with args: -e
20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-Obtaining Segment details from master...
20160308:16:37:36:018980 gpstate:mdw-std:gpadmin-[INFO]:-Gathering data from segments...
. 
20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------
20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:-Segment Mirroring Status Report
20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:----------------------------------
20160308:16:37:37:018980 gpstate:mdw-std:gpadmin-[INFO]:-All segments are running normally

此时数据库已经切换回原有的架构。

dbdream=# select * from gp_segment_configuration order by 1;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address | replication_port | san_mounts 
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+-------
    1 |      -1 | p    | p              | s    | u      |  5432 | mdw-std  | mdw-std |                  | 
    2 |       0 | p    | p              | s    | u      | 40000 | sdw1     | sdw1    |            43000 | 
    3 |       1 | p    | p              | s    | u      | 40000 | sdw2     | sdw2    |            43000 | 
    4 |       2 | p    | p              | c    | u      | 40000 | sdw3     | sdw3    |            43000 | 
    5 |       3 | p    | p              | s    | d      | 40000 | sdw4     | sdw4    |            43000 | 
    6 |       0 | m    | m              | s    | u      | 41000 | sdw2     | sdw2    |            42000 | 
    7 |       1 | m    | m              | s    | u      | 41000 | sdw3     | sdw3    |            42000 | 
    8 |       2 | m    | m              | s    | d      | 41000 | sdw4     | sdw4    |            42000 | 
    9 |       3 | m    | m              | c    | u      | 41000 | sdw1     | sdw1    |            42000 | 
   10 |      -1 | m    | m              | s    | u      |  5432 | mdw      | mdw     |                  | 
(10 rows)

在数据库中可以通过gp_configuration_history字典来查看数据库的切换信息。

dbdream=# select * from gp_configuration_history;
             time              | dbid |                                             desc                         
-------------------------------+------+--------------------------------------------------------------------------
2016-03-08 14:59:57.284412+08 |    4 | FTS: content 2 fault marking status UP mode: change-tracking role p
 2016-03-08 14:59:57.284705+08 |    8 | FTS: content 2 fault marking status DOWN role m
 2016-03-08 14:59:57.289104+08 |    5 | FTS: content 3 fault marking status DOWN role m
 2016-03-08 14:59:57.289155+08 |    9 | FTS: content 3 fault marking status UP mode: change-tracking role p
 2016-03-08 16:20:34.144996+08 |    5 | gprecoverseg: segment config for resync: segment mode and status
 2016-03-08 16:20:34.144996+08 |    8 | gprecoverseg: segment config for resync: segment mode and status
 2016-03-08 16:20:37.19326+08  |    4 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
 2016-03-08 16:20:37.19326+08  |    5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
 2016-03-08 16:20:37.19326+08  |    8 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
 2016-03-08 16:20:37.19326+08  |    9 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
 2016-03-08 16:21:09.221171+08 |    4 | FTS: changed segment to insync from resync.
 2016-03-08 16:21:09.221171+08 |    8 | FTS: changed segment to insync from resync.
 2016-03-08 16:21:09.223828+08 |    9 | FTS: changed segment to insync from resync.
 2016-03-08 16:21:09.223828+08 |    5 | FTS: changed segment to insync from resync.
 2016-03-08 16:35:35.746699+08 |    9 | FTS: content 3 fault marking status DOWN role m
 2016-03-08 16:35:35.746885+08 |    5 | FTS: content 3 fault marking status UP mode: change-tracking role p
 2016-03-08 16:35:42.00957+08  |    9 | gprecoverseg: segment config for resync: segment mode and status
 2016-03-08 16:35:45.085018+08 |    5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
 2016-03-08 16:35:45.085018+08 |    9 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
 2016-03-08 16:36:35.877428+08 |    5 | FTS: changed segment to insync from resync.
 2016-03-08 16:36:35.877428+08 |    9 | FTS: changed segment to insync from resync.
(21 rows)

GreenPlum数据库主/备切换相当的简单,gprecoverseg命令相当的智能,虽然在4.3版本之后提供的主备自动切换功能非常强大,在Primary的主机出现故障之后,Mirror会自动切换为Primary,不影响数据库的正常工作,但是对监控不是很到位的系统来说,并不建议使用这个功能,首先这个功能存在一定的BUG,其次,监控不到位,一旦发现切换,并不能及时发现,如果再有节点出现故障,可能对数据恢复造成影响,而且如果单个节点的数据量非常大的时候,gprecoverseg同步数据的过程将会很漫长。

本文固定链接: https://www.dbdream.com.cn/2016/03/greenplum%e6%95%b0%e6%8d%ae%e5%ba%93segment%e8%8a%82%e7%82%b9%e6%95%85%e9%9a%9c%e8%bd%ac%e7%a7%bb%e5%88%b0mirror/ | 信春哥,系统稳,闭眼上线不回滚!

该日志由 dbdream 于2016年03月07日发表在 GreenPlum 分类下, 你可以发表评论,并在保留原文地址及作者的情况下引用到你的网站或博客。
原创文章转载请注明: GreenPlum数据库Segment节点故障转移到Mirror | 信春哥,系统稳,闭眼上线不回滚!
关键字: , , , ,

GreenPlum数据库Segment节点故障转移到Mirror:等您坐沙发呢!

发表评论

快捷键:Ctrl+Enter