On Fri, May 3, 2013 at 1:46 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
This is exactly my current concern (it wasn't initially). Let's say, as you said above, that the extra transaction was executed while the slave was connected or (the more probable case, if possible) when the IO thread was already disconnected but the SQL thread hadn't caught up yet. In both cases the extra transaction somehow got into the middle of the stream, so the last GTID is a valid one that exists on the master. The slave is therefore able to connect to the master again and everything works fine. A couple of days pass without a change of master, and the slave has already deleted the binlog file containing transaction 0-2-101. Somewhere around transaction 0-1-10000, server 2 becomes the master...
Hm... Looking at the code, it seems that at this point all slaves should be able to continue replicating from server 2 just fine, although the result of transaction 0-2-101 will still exist only on server 2 and nowhere else. Which means that your answer above is incorrect (as long as the slaves connecting to server 2 are at state 0-1-101 or higher). If this is true, then I'm relieved and have no more concerns...
Aha, now I see. Yes, you are right. 0-2-101 will exist only on server 2. It will not be replicated to other slaves if server 2 later becomes master.
Thinking about this situation more, it still seems to me that it would be beneficial to have an additional flag that requires the seq_no in the GTIDs of committed transactions to be strictly increasing within each domain. It would add fault tolerance to production setups. Do you think it's feasible to add such a flag (it could be false by default)?
Pavel
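To make the proposed check concrete, here is a minimal sketch in Python. It is not MariaDB code: the names parse_gtid and StrictSeqNoChecker are hypothetical, and it only assumes the domain-server-seq_no GTID format used in this thread. It also assumes that server 2 logged its local transaction 0-2-101 before the master's 0-1-101 arrived, which is one way the scenario above could play out.

```python
def parse_gtid(gtid):
    """Split a GTID string "domain-server-seq_no" into three integers."""
    domain_id, server_id, seq_no = (int(part) for part in gtid.split("-"))
    return domain_id, server_id, seq_no


class StrictSeqNoChecker:
    """Hypothetical check: seq_no must strictly increase within each domain."""

    def __init__(self):
        # Highest seq_no accepted so far, keyed by domain id.
        self.last_seq_no = {}

    def check(self, gtid):
        """Return True and record the GTID if it is in order, else False."""
        domain_id, _server_id, seq_no = parse_gtid(gtid)
        last = self.last_seq_no.get(domain_id)
        if last is not None and seq_no <= last:
            # Out-of-order or duplicate seq_no in this domain: reject it.
            return False
        self.last_seq_no[domain_id] = seq_no
        return True


checker = StrictSeqNoChecker()
results = [
    checker.check("0-1-100"),  # replicated from the master
    checker.check("0-2-101"),  # extra transaction executed locally on server 2
    checker.check("0-1-101"),  # next master transaction reuses seq_no 101
    checker.check("1-1-5"),    # a different domain is tracked independently
]
```

With such a check enabled, the third transaction would be refused at the point where the extra transaction collides with the master's stream, so the divergence would be noticed right away instead of staying hidden until a later master switch.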