MySQL基于GTID的复制强调主从库GTID的连续和一致,比如主库如果执行了1-4,那从库IO线程拿到1-4的变更后,不管有没有被SQL线程过滤,从库的GTID也得保留1-4。

这就会出现一个现象:假定从库使用MySQL的过滤规则不执行主库发送的一部分变更(配置replication-wild-do-table=A.%,GTID=uuid:2被过滤),那MySQL这个时候会在从库用空事务的方式填充binlog,从而保证uuid:2在从库不丢失,主从“GTID”一致。也就是说主从数据可以不一致,但是GTID一致。印证了“从GTID连续一致指标判断主从一致”的依据是不正确的。

那MySQL不做这个填充,会发生什么?一个是从库出现很多”GTID空隙“,Executed_Gtid_Set变得特别长。再者,如果某个期间你期望通过MASTER_AUTO_POSITION=1切换新主库,新主库一看从库缺了这么多”空隙“,那从自己的binlog里面找到发给从库让它补上,结果就是从库依旧过滤这些变更不执行,依旧补不上”空隙“。更多的时候切换的新主库会因为binlog rotate原因,包含“空隙GTID”的binlog已不存在,CHANGE MASTER持续报1236错误,切换不成功,非常恼火。

既然DBA显式指定的跳过一部分表的变更,GTID也得保持主库连续。从以往的使用经验来看,MySQL确实也是这么做的,而这次碰到的bug则是这个地方MySQL实现的不完善引起的。

BUG简述:在使用过滤选项进行GTID主从复制时,主库的CREATE DATABASE, ALTER DATABASE & DROP DATABASE变更如果在从库被过滤,该GTID将不会在从库Executed_Gtid_Set中保留。

复现方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
master - my.cnf:
log-bin = mysql-bin
gtid-mode = ON
log-slave-updates = ON
enforce-gtid-consistency = ON
...
slave - my.cnf:
replicate-wild-do-table = mydb1.%
gtid-mode = ON
log-slave-updates = ON
enforce-gtid-consistency = ON
...
master status information:
mysql> show master status ;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000002 | 627 | | | 356d0651-5e4c-11e8-846c-126bb17ce918:1-345 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.01 sec)
replication status of slave:
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.23.7.81
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 627
Relay_Log_File: freewheel-relay-bin.000002
Relay_Log_Pos: 554
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table: mydb1.%
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 627
Relay_Log_Space: 762
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 27087
Master_UUID: 356d0651-5e4c-11e8-846c-126bb17ce918
Master_Info_File: /export/data/mysql/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:345
Executed_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:1-345
Auto_Position: 1
1 row in set (0.00 sec)
Here're steps to verify:
1. Send INSERT on mydb1.t on master
master - GTID+1:
mysql> insert into mydb1.t values (1);
Query OK, 1 row affected (0.00 sec)
mysql> show master status ;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000002 | 859 | | | 356d0651-5e4c-11e8-846c-126bb17ce918:1-346 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
slave - GTID+1:
mysql> show slave status \G
...
Replicate_Wild_Do_Table: mydb1.%
Retrieved_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:345-346
Executed_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:1-346
...
2. Send INSERT on mydb2.t on master
master - GTID+1:
mysql> insert into mydb2.t values (1);
Query OK, 1 row affected (0.00 sec)
mysql> show master status ;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000002 | 1091 | | | 356d0651-5e4c-11e8-846c-126bb17ce918:1-347 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from mydb2.t;
+------+
| x |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
slave - GTID+1:
mysql> select * from mydb2.t;
Empty set (0.00 sec)
mysql> show slave status \G
...
Replicate_Wild_Do_Table: mydb1.%
Retrieved_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:345-347
Executed_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:1-347
...
3. Send CREATE DATABASE xxx on master
master - GTID+1:
mysql> create database mydb3;
Query OK, 1 row affected (0.00 sec)
mysql> show master status ;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000002 | 1236 | | | 356d0651-5e4c-11e8-846c-126bb17ce918:1-348 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
slave - GTID retrieved but not changed on Executed_Gtid_Set: => 丢失一个GTID
mysql> show slave status \G
...
Replicate_Wild_Do_Table: mydb1.%
Retrieved_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:345-348
Executed_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:1-347
...
4. Send INSERT on mydb1.t on master
master - GTID+1:
mysql> insert into mydb1.t values (1);
Query OK, 1 row affected (0.00 sec)
mysql> show master status ;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000002 | 1468 | | | 356d0651-5e4c-11e8-846c-126bb17ce918:1-349 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
slave - broken GTID set on Executed_Gtid_Set: => 出现GTID空隙
...
Replicate_Wild_Do_Table: mydb1.%
Retrieved_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:345-349
Executed_Gtid_Set: 356d0651-5e4c-11e8-846c-126bb17ce918:1-347:349
...

反馈bug后得到官方回复,已经在新版本8.0.12, 5.7.23, 5.6.41中修复,虽然都还没有release:

Thanks for the report. Changelog entry added for MySQL 8.0.12, 5.7.23, and 5.6.41:

When GTIDs are in use for replication, replicated transactions that are filtered out on the slave are persisted. If binary logging is enabled on the slave, the filtered-out transaction is written to the binary log as a Gtid_log_event followed by an empty transaction containing only BEGIN and COMMIT statements. If binary logging is disabled, the GTID of the filtered-out transaction is written to the mysql.gtid_executed table. This process ensures that there are no gaps in the set of executed GTIDs, and that the filtered-out transactions are not retrieved again if the slave reconnects to the master. Previously, this process was not done for CREATE DATABASE, ALTER DATABASE, and DROP DATABASE statements, but it is now carried out for those statements as well as for others.

总结的话,现阶段如果用了过滤选项replicate-*-table进行复制,避免在主库进行CREATE/DROP/ALTER DATABASE操作,否则会引起该GTID在从库Executed_Gtid_Set中缺失。

还是少用replicate-*-table这类选项吧……