Jonas Oreland <jonaso@google.com> writes:
Hmm... I'm not sure I get it...
Is it a bug or a feature that the "rogue" transactions are skipped by Slave2 in statement-based replication? Skipping 0-2-3 and 0-2-4 can cause arbitrary data drift, right?
They are not skipped. The bug is in your patch (I think, I did not test it); those two transactions can be duplicated (executed twice by Slave2).

Let me give the example in more detail. Let's say Slave2 first connects to Slave1 from the start. Slave2 executes GTIDs 0-1-1, 0-1-2, 0-2-3, 0-2-4, 0-1-3. Now we run STOP SLAVE on Slave2, at which point @@gtid_slave_pos=0-1-3.

Later we do START SLAVE on Slave2. Then Slave2 has to resume from the correct position, which is just after 0-1-3. But with your patch, I think Slave2 will receive and execute 0-2-4 and 0-1-3 again. This results in duplicate events and possible data drift on Slave2.

This is because in your code, you will reach GTID 0-2-3 in the binlog and compare it against the 0-1-3 requested by Slave2. Since 3==3, you will run info->gtid_state.remove(gtid), and then the next GTID, 0-2-4, will be sent (incorrectly) to Slave2.

The correct behaviour is to compare 0-2-3 to 0-1-3, see that the server_ids are different, and skip the event and _not_ remove it from the gtid_state. Then GTID 0-2-4 will be skipped as well, and only after the correct position 0-1-3 will Slave2 start receiving events.

More generally, if GTID D-S1-N1 comes before D-S2-N2 in the binlogs, there is no guarantee that N1 < N2. Only if S1=S2 can we be sure that N1 < N2. That is why the server_id checks are needed.

Hope this helps,

 - Kristian.
Now the binlog on Slave1 contains:
    GTID 0-1-1
    GTID 0-1-2
    GTID 0-2-3
    GTID 0-2-4
    GTID 0-1-3
    GTID 0-1-4
    GTID 0-1-5
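
To make the difference concrete, here is a small Python simulation of the skip logic discussed in the reply, run against the Slave1 binlog listed above. It is only a sketch, not the server code: `Gtid`, `events_to_send` and the `check_server_id` flag are hypothetical names invented for this illustration, and the matching rule is simplified to an exact match on the requested position (the real server's handling of the pending gtid_state may differ in detail).

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    """A GTID of the form domain-server_id-seq_no (hypothetical helper type)."""
    domain: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text):
        domain, server_id, seq_no = (int(part) for part in text.split("-"))
        return cls(domain, server_id, seq_no)

    def __str__(self):
        return f"{self.domain}-{self.server_id}-{self.seq_no}"


def events_to_send(binlog, slave_pos, check_server_id=True):
    """Return the GTIDs a master would send to a slave resuming at slave_pos.

    Simplified model (an assumption, not the real implementation): every event
    in a domain is skipped until the requested position is found in the binlog.
    With check_server_id=False only seq_no is compared, which models the
    suspected bug in the patch.
    """
    pending = {slave_pos.domain: slave_pos}  # stand-in for the gtid_state
    sent = []
    for gtid in binlog:
        wanted = pending.get(gtid.domain)
        if wanted is None:
            # Start position already found for this domain: send the event.
            sent.append(gtid)
            continue
        same_seq = gtid.seq_no == wanted.seq_no
        same_server = gtid.server_id == wanted.server_id
        if same_seq and (same_server or not check_server_id):
            # Found the slave's position: stop skipping after this event.
            del pending[gtid.domain]
        # The current event is skipped either way; the slave already has it.
    return sent


binlog = [Gtid.parse(g) for g in
          "0-1-1 0-1-2 0-2-3 0-2-4 0-1-3 0-1-4 0-1-5".split()]
pos = Gtid.parse("0-1-3")  # @@gtid_slave_pos on Slave2 after STOP SLAVE

correct = [str(g) for g in events_to_send(binlog, pos)]
buggy = [str(g) for g in events_to_send(binlog, pos, check_server_id=False)]
print("with server_id check:   ", correct)
print("without server_id check:", buggy)
```

With the server_id check, only 0-1-4 and 0-1-5 are sent. Without it, the scan stops skipping at 0-2-3 (because 3==3), so 0-2-4 and 0-1-3 are sent a second time ahead of 0-1-4 and 0-1-5.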