Hi Andrei, Brandon,
What do you think of this patch to fix the sporadic failure of
rpl.rpl_gtid_stop_start in buildbot (MDEV-33602)? The test fails relatively
frequently, so it would be good to get it fixed.
Two issues:
1. I have not yet been able to reproduce the failure. I expect this patch
to fix the sporadic failure, but I have not been able to verify that it
does.
2. The question is whether this just hides a real bug. When the
GTID-mode slave connects, it receives a few events from the start of the
binlog (Gtid_list, Binlog_checkpoint, etc.), and these briefly update the
old-style position to a wrong value until the fake Gtid_list is received
with the correct position. One might argue that this should be avoided. On
the other hand, avoiding it would not solve the fundamental issue: until the
GTID-mode connect has completed, we will in general not have a valid
corresponding old-style position. So maybe the current behaviour is OK?
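To make point 2 concrete, here is a toy model (hypothetical, not server
code). It assumes each received event simply overwrites the old-style
position, and the offsets are made up. It shows why stopping the slave
before the fake Gtid_list leaves the position pointing far back in the
binlog.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy model only: event names follow the description above, but the
# offsets and the "every event overwrites the position" rule are
# simplifying assumptions, not the actual replication code.


@dataclass
class Event:
    name: str
    pos: int  # hypothetical old-style position recorded after this event


def old_style_pos_after(events: List[Event], received: int) -> Optional[int]:
    """Old-style position recorded after the first `received` events
    of the stream (None if nothing has been received yet)."""
    pos = None
    for ev in events[:received]:
        # Each event overwrites the position; only the fake Gtid_list
        # carries the position matching the GTID connect point.
        pos = ev.pos
    return pos


# GTID connect: events from the start of the binlog file arrive first,
# then the fake Gtid_list with the real connect position.
connect_stream = [
    Event("Format_description", 256),
    Event("Gtid_list", 299),
    Event("Binlog_checkpoint", 343),
    Event("Gtid_list (fake)", 1870),
]

if __name__ == "__main__":
    # Slave stopped before the fake Gtid_list: position is too far back.
    print(old_style_pos_after(connect_stream, 3))
    # Slave stopped after the fake Gtid_list: position is correct.
    print(old_style_pos_after(connect_stream, 4))
```

In this model, the patch's extra INSERT/DELETE plus sync_with_master
corresponds to waiting until at least the whole connect stream has been
received before stopping the slave.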
- Kristian.
Kristian Nielsen via commits <commits@lists.mariadb.org> writes:
The test could fail with a duplicate key error because switching to non-GTID
mode could start at the wrong old-style position. The position could be
wrong when the previous GTID connect was stopped before receiving the fake
GTID list event which gives the old-style position corresponding to the GTID
connected position.
Work around this by injecting an extra event and syncing the slave to it
before switching to non-GTID mode.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
---
.../suite/rpl/r/rpl_gtid_stop_start.result | 4 ++++
mysql-test/suite/rpl/t/rpl_gtid_stop_start.test | 16 ++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result b/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
index ae0050c353a..e8633cd45bb 100644
--- a/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
+++ b/mysql-test/suite/rpl/r/rpl_gtid_stop_start.result
@@ -111,6 +111,10 @@ a
6
7
*** MDEV-4486: Allow to start old-style replication even if mysql.gtid_slave_pos is unavailable
+connection server_1;
+INSERT INTO t1 VALUES (8);
+DELETE FROM t1 WHERE a=8;
+connection server_2;
connection server_2;
include/stop_slave.inc
CHANGE MASTER TO master_use_gtid= no;
diff --git a/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test b/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test
index b5ff294908b..032ebb77d1e 100644
--- a/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test
+++ b/mysql-test/suite/rpl/t/rpl_gtid_stop_start.test
@@ -173,6 +173,22 @@ SELECT * FROM t1 ORDER BY a;
--echo *** MDEV-4486: Allow to start old-style replication even if mysql.gtid_slave_pos is unavailable
+# In GTID mode, the old-style replication position is also updated. But during
+# GTID connect, the old-style position is not known until receiving the fake
+# GTID list event, which contains the required position value. If we happened
+# to stop the slave above before this fake GTID list event, the test could fail
+# with duplicate key errors due to switching to non-GTID mode at a wrong
+# position too far back in the binlog.
+#
+# Work around this by injecting an extra dummy event and syncing the slave to
+# it, ensuring the old-style position will be updated.
+--connection server_1
+INSERT INTO t1 VALUES (8);
+DELETE FROM t1 WHERE a=8;
+--save_master_pos
+--connection server_2
+--sync_with_master
+
--connection server_2
--source include/stop_slave.inc
CHANGE MASTER TO master_use_gtid= no;