Pavel Ivanov <pivanof@google.com> writes:
> Friendly warning: I've discovered a critical bug in GTID implementation https://mariadb.atlassian.net/browse/MDEV-4473. So use
Thanks for digging this up, and even supplying a nice test case! I've pushed a fix to 10.0-base, and merged it to 10.0.

The logic here is a bit complex, so let me try to explain what is going on. Normally, when the slave requests to start replication from some GTID G, the master needs to find the binlog file that contains G, scan through it until it reaches the event G, and then start sending events to the slave at the point after the event group of G.

However, suppose that we are using two replication domains 1 and 2, but there are no events logged in domain 2 for a month. The slave will send its start GTID position, say 1-1-10000,2-2-500. Since nothing was logged in domain 2 for a month, it is likely that the binlog file containing 2-2-500 has been purged. So if we tried to locate that purged binlog, the slave would fail to connect. But as long as 2-2-500 was the _last_ event logged in domain 2 (which is likely if it was logged a month ago), we do not need to find the old purged binlog file - we can start from the beginning of any later binlog file.

The code that handles this special case, in gtid_find_binlog_file() and contains_all_slave_gtid(), was the code with the bug.

The way the code works is to look at the Gtid_list_log_event at the start of every binlog file. This event contains the list of the GTIDs with the highest sequence number logged in previous binlog files, one for each (domain_id, server_id) combination. Further, the list is sorted on domain_id, and the last GTID in each group is the last GTID logged for that domain. This makes it possible to handle the above-mentioned special case: if 2-2-500 appears in the Gtid_list_log_event as the last event logged in prior binlogs for domain 2, then we can start sending events from the beginning of this binlog; we do not need to go back further.

However, the code had a bug: it was missing the check that the GTID was the last one in the group for that domain. So if the Gtid_list_log_event contained 1-1-10000,2-2-500,2-3-600, the code would incorrectly select this binlog as the starting point for 2-2-500. This is wrong, because we need to go back further to find 2-3-600 and send it to the slave. The fix is to add the check that the GTID is the last one in its domain_id group (a minimal sketch of the corrected check is below).

In the bug report, you wondered why there can be multiple GTIDs for one domain_id in the Gtid_list_log_event (there is one for each server_id). This is needed to be able to locate any given GTID in the binlogs without relying on sequence numbers being strictly increasing. There are a number of scenarios where this can happen. Even if most of these are undesirable configurations or user error, it is sure to occur in practice, and I spent a lot of effort in the design to make sure that GTID will still behave reasonably in such cases, and not silently corrupt replication.

There is a careful distinction between the slave state (GTID position), which has only one GTID per domain_id, and the master binlog state (Gtid_list_log_event), which has one per (domain_id, server_id) combination. The latter can accumulate lots of cruft in the form of old, no longer used server_ids, but that does not matter, as it is not something users ever need to look at. The former _is_ something the user might want to look at, and it has the simple format of just one GTID per domain configured by the user.
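For illustration, here is a minimal standalone sketch of the corrected check for a single domain. It is not the actual server code - the real logic lives in gtid_find_binlog_file() and contains_all_slave_gtid() - and domain_start_ok() is a hypothetical helper, though rpl_gtid mirrors the server's (domain_id, server_id, seq_no) triple:

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct rpl_gtid {
    uint32_t domain_id;
    uint32_t server_id;
    uint64_t seq_no;
  };

  /*
    Decide whether a binlog file whose Gtid_list_log_event contains
    gtid_list (sorted on domain_id) is a valid starting point for a
    slave whose position in one domain is slave_gtid. The special
    case handled here: slave_gtid must be the *last* entry of its
    domain_id group, meaning it was the last event logged in that
    domain before this file. The bug was the missing last-in-group
    check, which made 2-2-500 match even when 2-3-600 followed it.
  */
  static bool
  domain_start_ok(const std::vector<rpl_gtid> &gtid_list,
                  const rpl_gtid &slave_gtid)
  {
    for (std::size_t i = 0; i < gtid_list.size(); i++)
    {
      const rpl_gtid &g = gtid_list[i];
      if (g.domain_id != slave_gtid.domain_id ||
          g.server_id != slave_gtid.server_id ||
          g.seq_no != slave_gtid.seq_no)
        continue;
      /* Found the slave's GTID. Since the list is sorted on
         domain_id, it is last in its group exactly when the next
         entry (if any) belongs to a different domain. */
      return i + 1 == gtid_list.size() ||
             gtid_list[i + 1].domain_id != g.domain_id;
    }
    return false;  /* Not in the list; look in an earlier binlog. */
  }

With the list 1-1-10000,2-2-500,2-3-600 and slave GTID 2-2-500 this returns false (2-3-600 follows in the same domain group), so the master must keep searching older binlog files; with 1-1-10000,2-2-500 it returns true. Note also how the list carries both 2-2-500 and 2-3-600 for domain 2 - that is the binlog state with one entry per (domain_id, server_id), while the slave's position has just one GTID per domain.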
(This was BTW a major motivation for redesigning GTID from scratch rather than taking the MySQL 5.6 version. In MySQL 5.6, they do not make this distinction, so the user-visible slave GTID position will accumulate cruft in the form of no longer used server UUIDs, which will hang around basically forever.)

Thanks,

 - Kristian.