Re: [MariaDB commits] [PATCH] MDEV-33602: Sporadic test failure in rpl.rpl_gtid_stop_start
Andrei Elkin <andrei.elkin@mariadb.com> writes:
1. I have not so far been able to reproduce the failure, so this patch is assumed to fix the sporadic failure, but I have not been able to verify that it does.
I verified your analysis to confirm.
Thanks for checking!
But on the other hand this will not solve the fundamental issue, that until the GTID mode connect is completed, we will in the general case not have a valid corresponding old-style position. So maybe the current behaviour is ok?
Sure, but it looks like the fixes are not overly difficult.
Since it is Gtid_list_log_event::log_pos that makes the file:pos (old-style coordinates) state valid, why don't we keep the coordinates intact until that event has arrived and been processed? Say that this situation is remembered in `RLI::seen_gtid_log_list_event`. Then, for instance, in the serial case the fix would look like:
--- a/sql/rpl_rli.cc
+++ b/sql/rpl_rli.cc
@@ -1030,7 +1030,7 @@ void Relay_log_info::inc_group_relay_log_pos(ulonglong log_pos,
         rgi->last_master_timestamp > last_master_timestamp)
       last_master_timestamp= rgi->last_master_timestamp;
   }
-  else
+  else if (!mi->using_gtid || seen_gtid_log_list_event)
   {
     /* Non-parallel case. */
     group_relay_log_pos= event_relay_log_pos;
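To make the intent of the guard concrete, here is a minimal standalone sketch of the serial case. It is not server code: `RliSketch`, `EventKind`, and `apply_event` are hypothetical names invented for illustration, `using_gtid` is reduced to a plain bool, and the positions are arbitrary numbers. Only the condition itself is taken from the diff above.

```cpp
#include <cstdint>

// Hypothetical event categories; the real server has many more event types.
enum class EventKind { FormatDescription, GtidList, Other };

// Hypothetical, heavily simplified stand-in for Relay_log_info.
struct RliSketch {
  bool using_gtid = false;               // assumption: stands in for mi->using_gtid
  bool seen_gtid_log_list_event = false; // the flag proposed in this thread
  std::uint64_t event_relay_log_pos = 4; // position after the last event seen
  std::uint64_t group_relay_log_pos = 4; // position exposed as old-style coordinates

  // Serial (non-parallel) case: process one event that ends at end_pos.
  void apply_event(EventKind kind, std::uint64_t end_pos) {
    event_relay_log_pos = end_pos;
    if (kind == EventKind::GtidList)
      seen_gtid_log_list_event = true;
    // Same condition as in the diff: in GTID mode, leave the group position
    // untouched until the Gtid_list event has been processed.
    if (!using_gtid || seen_gtid_log_list_event)
      group_relay_log_pos = event_relay_log_pos;
  }
};
```

With `using_gtid` set, events processed before the Gtid_list event leave `group_relay_log_pos` at its previous value, and the position starts advancing again from the Gtid_list event onwards. With `using_gtid` unset, every event advances it as before.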
Agree, this looks fine. The dump thread should always be sending the gtid_list event. And yes, the log_pos of the events received until then are not useful to set for the slave.

Right, so this seems a good approach for a fix. Maybe I should push the workaround to 10.5 to help the effort to remove sporadic failures, and then file a separate bug about the underlying problem and your proposed solution?

Thanks,

 - Kristian.
participants (1)
- Kristian Nielsen