Hi Andrei, Brandon,

What do you think of this patch to fix the sporadic failure of rpl.rpl_gtid_stop_start in buildbot, MDEV-33602? The test fails relatively frequently, so it would be good to get it fixed.

Two issues:

1. I have not so far been able to reproduce the failure, so this patch is assumed to fix the sporadic failure, but I have not been able to verify that it does.

2. The question is whether this will just be hiding a real bug. When the GTID-mode slave connects, it receives a few events from the start of the binlog (Gtid_list, Binlog_checkpoint, etc.), and these briefly update the old-style position to a wrong value until the fake Gtid_list is received with the correct position. One might argue that this should be avoided. But on the other hand, avoiding it will not solve the fundamental issue: until the GTID-mode connect has completed, we will in the general case not have a valid corresponding old-style position. So maybe the current behaviour is ok?

 - Kristian.

Kristian Nielsen via commits <commits@lists.mariadb.org> writes:
The test could fail with a duplicate key error because switching to non-GTID mode could start at the wrong old-style position. The position could be wrong when the previous GTID connect was stopped before receiving the fake GTID list event which gives the old-style position corresponding to the GTID connected position.
Work around this by injecting an extra event and syncing the slave before switching to non-GTID mode.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
---
 .../suite/rpl/r/rpl_gtid_stop_start.result      |  4 ++++
 mysql-test/suite/rpl/t/rpl_gtid_stop_start.test | 16 ++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result b/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
index ae0050c353a..e8633cd45bb 100644
--- a/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
+++ b/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
@@ -111,6 +111,10 @@ a
 6
 7
 *** MDEV-4486: Allow to start old-style replication even if mysql.gtid_slave_pos is unavailable
+connection server_1;
+INSERT INTO t1 VALUES (8);
+DELETE FROM t1 WHERE a=8;
+connection server_2;
 connection server_2;
 include/stop_slave.inc
 CHANGE MASTER TO master_use_gtid= no;
diff --git a/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test b/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test
index b5ff294908b..032ebb77d1e 100644
--- a/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test
+++ b/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test
@@ -173,6 +173,22 @@ SELECT * FROM t1 ORDER BY a;
 
 --echo *** MDEV-4486: Allow to start old-style replication even if mysql.gtid_slave_pos is unavailable
 
+# In GTID mode, the old-style replication position is also updated. But during
+# GTID connect, the old-style position is not known until receiving the fake
+# GTID list event, which contains the required position value. If we happened
+# to stop the slave above before this fake GTID list event, the test could fail
+# with duplicate key errors due to switching to non-GTID mode at a wrong
+# position too far back in the binlog.
+#
+# Work-around this by injecting an extra dummy event and syncing the slave to
+# it, ensuring the old-style position will be updated.
+--connection server_1
+INSERT INTO t1 VALUES (8);
+DELETE FROM t1 WHERE a=8;
+--save_master_pos
+--connection server_2
+--sync_with_master
+
 --connection server_2
 --source include/stop_slave.inc
 CHANGE MASTER TO master_use_gtid= no;
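
For readers who want to see the suspected race in isolation, here is a toy Python model of the behaviour described in the commit message. It is only a sketch under stated assumptions: the binlog offsets, the `ToySlave` class and all function names are invented for illustration and do not correspond to MariaDB source code. It assumes, as described above, that the header events sent at GTID connect move the old-style position back towards the start of the binlog until the fake Gtid_list event corrects it.

```python
from dataclasses import dataclass, field

# Toy master binlog as (end_offset, kind, key). All offsets are invented.
HEADER = [(256, "format_description", None),
          (300, "gtid_list", None),
          (350, "binlog_checkpoint", None)]
DATA = [(500, "insert", 6), (650, "insert", 7)]
EXTRA = [(800, "insert", 8), (950, "delete", 8)]  # events injected by the patch


@dataclass
class ToySlave:
    # The slave has already replicated rows 6 and 7 (binlog offset 650).
    rows: set = field(default_factory=lambda: {6, 7})
    old_style_pos: int = 650

    def apply(self, end_offset, kind, key):
        if kind == "insert":
            if key in self.rows:
                raise KeyError(f"duplicate key {key}")
            self.rows.add(key)
        elif kind == "delete":
            self.rows.discard(key)
        self.old_style_pos = end_offset


def receive_header(slave):
    # Assumption: header events briefly move the old-style position back.
    for end_offset, _, _ in HEADER:
        slave.old_style_pos = end_offset


def receive_fake_gtid_list(slave):
    # The fake Gtid_list carries the offset matching the GTID position.
    slave.old_style_pos = DATA[-1][0]


def old_style_start(slave, binlog):
    # Non-GTID mode: replay every event after the recorded offset.
    for event in binlog:
        if event[0] > slave.old_style_pos:
            slave.apply(*event)


def scenario(sync_before_stop):
    slave = ToySlave()
    binlog = HEADER + DATA
    receive_header(slave)  # GTID connect has just begun
    if sync_before_stop:
        # Syncing to the extra events implies the fake Gtid_list was received.
        binlog = binlog + EXTRA
        receive_fake_gtid_list(slave)
        for event in EXTRA:
            slave.apply(*event)
    # Stop the slave here, then switch to master_use_gtid=no and restart.
    try:
        old_style_start(slave, binlog)
    except KeyError as err:
        return f"error: {err.args[0]}"
    return f"ok at {slave.old_style_pos}"


print(scenario(sync_before_stop=False))
print(scenario(sync_before_stop=True))
```

In this model, stopping right after the header events leaves the old-style position too far back, so the non-GTID restart replays an already-applied insert. Syncing to the extra events first forces the position past the fake Gtid_list, which is the effect the patch relies on.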