On Tue, Aug 13, 2013 at 3:26 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Note that the patch I've attached has test cases that should reproduce the problems.
Thanks, I've now gone through the testcases also. Let me number the individual tests as follows:
1. Check that a gap in seq_no without binlogs doesn't allow replication
2. The same test, but with the master GTID starting from a different server_id
3. Check processing of alternate futures
4. Check alternate future when the divergence is in the last event in a binlog file
5. Check alternate future without binlogs
I tried the test cases against current 10.0-base.
(5) fails; this is a bug. It should fail in non-strict mode as well. I will fix it (as I said in my comment on MDEV-4820).
(3) fails, but only because of a different error message (the generic "event is not found in master binlog" rather than the specific "slave has alternate future").
This is surprising. The test doesn't check the particular message text. How does it fail?
I can put in the "alternate future" error message, but I want to be sure you really understand what this does and the limitations.
In your test, slave does one extra transaction, master does two. Slave ends up at 0-2-112, master ends up at 0-1-113. So because 113 >= 112, we can know that slave has an alternate future.
But suppose you did the test the other way around, slave does two transactions, master does one. Then slave has 0-2-113 and master has 0-1-112. It is not the case that 112 >= 113. So we can not detect at this point that slave has an alternate future.
So now we are going to give two *different* error messages to the user essentially at random, depending on which alternate future is furthest ahead. Is this really what you want?
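To make the asymmetry concrete, here is a throwaway sketch (a hypothetical helper, not the actual server code):

```python
# Hypothetical sketch of the asymmetric "alternate future" check.
# A GTID is (domain_id, server_id, seq_no). Divergence is only visible
# once the master's seq_no has reached the slave's seq_no for that
# domain while the server_id differs.

def detect_alternate_future(master_gtid, slave_gtid):
    m_domain, m_server, m_seq = master_gtid
    s_domain, s_server, s_seq = slave_gtid
    if m_domain != s_domain:
        return False  # different replication streams, nothing to compare
    # Detection requires the master to have passed the slave's position.
    return m_seq >= s_seq and m_server != s_server

# Master at 0-1-113, slave at 0-2-112: detectable (113 >= 112).
assert detect_alternate_future((0, 1, 113), (0, 2, 112)) is True
# Master at 0-1-112, slave at 0-2-113: not detectable yet (112 < 113).
assert detect_alternate_future((0, 1, 112), (0, 2, 113)) is False
```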
Well, even if I wanted it differently there's no good way to do that.
I would think it would be *better* for data center operations to have at least a consistent error message for the two situations.
Or did I misunderstand something? Can the Google patch detect alternate slave futures in this case and distinguish it from master being behind, and if so, how?
No, neither my patch in MDEV-4820 nor Google's Group ID patch for MySQL 5.1 can detect an alternate future when the slave has more transactions than the master. But that doesn't matter (for us), because in the normal situation the master's GTID will keep moving forward while the slave's GTID stays the same. So eventually (usually very quickly) we reach the point where the seq_no on the master is bigger, and then the slave gets the "alt future" error and it sticks, i.e. it won't ever change again. That's good enough for us, because this small window with a different error message is pretty much impossible to catch -- by the time you look at the logs, the "alt future" error is already in them.
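To make that window concrete, a throwaway sketch (illustrative numbers only, same domain and differing server_ids assumed throughout):

```python
# Sketch of the closing window: detection flips once the master's seq_no
# moves past the stalled slave's seq_no, and then it stays flipped.

def detectable(master_seq, slave_seq):
    # Same domain, different server_id assumed.
    return master_seq >= slave_seq

slave_seq = 113                   # slave diverged at 0-2-113 and now sits still
master_history = [112, 113, 114]  # master keeps committing in domain 0
assert [detectable(m, slave_seq) for m in master_history] == [False, True, True]
# Once detection flips to True it never flips back: the error "sticks".
```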
Other than the error message, (1)-(4) all pass for me on current unmodified 10.0-base. So for MDEV-4820 I am left with one bug to fix (5) and possibly one feature request for a different error message. I cannot help thinking that there is something I'm missing from all you've written already on the subject of MDEV-4820, but I don't have anything concrete. So please let me know what I'm missing.
Apparently I failed to reproduce the problem scenarios in the testing environment. Sorry, I didn't try to run it on unchanged code. But did you try the manual reproduction steps I mentioned in MDEV-4820?
With GTID, @@GLOBAL.gtid_binlog_pos is also stored in the last binlog file.
Right, but as I understood it, gtid_binlog_pos is necessary only to adjust gtid_current_pos when gtid_slave_pos is behind (btw, how do you check that in non-strict mode, when the seq_no of the latest transaction can be less than the seq_no of an old transaction?). If we know for sure that gtid_slave_pos reflects the latest transaction, then again gtid_binlog_pos doesn't carry any new information and can be empty. Am I missing something?
Yes, I think so.
For one, we need to know the next sequence number to use within each domain.
This information exists in gtid_slave_pos in the situation I'm talking about.
More subtly, we also need to know the last sequence number for every (domain_id, server_id) pair. This is the information that allows the slave to start at the correct place in the master binlog even if sequence numbers are not monotonic between server_ids.
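As an illustration of why the per-(domain_id, server_id) information matters, here is a rough sketch (my own toy model, not the server's actual data structures):

```python
# Illustrative sketch (not MariaDB's actual code): the slave position keeps
# the last seq_no per (domain_id, server_id), because seq_no is only
# guaranteed monotonic within one server_id, not across server_ids.

def binlog_start_point(binlog_events, slave_pos):
    """Return the index of the first event the slave still needs.

    binlog_events: list of (domain_id, server_id, seq_no) in binlog order.
    slave_pos: dict mapping (domain_id, server_id) -> last applied seq_no.
    """
    for i, (domain, server, seq) in enumerate(binlog_events):
        if slave_pos.get((domain, server), -1) < seq:
            return i  # first event not yet applied by the slave
    return len(binlog_events)

# seq_no is not monotonic across server_ids within the domain:
events = [(0, 1, 10), (0, 2, 5), (0, 1, 11), (0, 2, 6)]
# Slave has applied everything up to 0-1-10 and 0-2-5:
assert binlog_start_point(events, {(0, 1): 10, (0, 2): 5}) == 2
```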
This is where I disagree. You keep insisting that this information is necessary. By that you are basically saying: I need to support those who set up lousy multi-master replication, so I won't treat domain_id as a single replication stream; I'll treat the pair (domain_id, server_id) as a single replication stream (meaning there can be several replication streams within one domain_id). Effectively you merge server_id into domain_id and create one 64-bit domain_id. Each such merged domain_id then has its own replication position (determined by seq_no), and using that you determine where in the binlog the slave should start to replicate.

But even in this setup MariaDB currently seems to work inconsistently: the slave passes the server a GTID for only one of the server_ids. Binlog events for this server_id have no particular order relative to binlog events from another server_id, so the master can easily send events the slave already has, or skip events the slave doesn't have. Even if you add protection on the slave against re-executing events it already has, you still cannot protect against skipped events. So replication in this situation probably will never break (i.e. the slave won't stop with an error), but the results will be questionable. And the only argument for this seems to be that anyone can get into the same situation without GTID replication...

What I'm asserting is this: when replication is set up properly -- when different masters never write binlog events with the same domain_id (and a second server creating a binlog event with the same domain_id is considered an error), and when events with the same domain_id can never reach a slave through different replication streams -- then domain_id is the one and only true domain id that needs to remember a last position, i.e. seq_no. In such a setup there's no need to remember every server_id this database has ever had as a master (there could be hundreds of those).
Then the (domain_id, seq_no) pair uniquely identifies the server's position within the domain, and no other information is necessary for that. server_id is then needed only to distinguish alternate futures, that's it. I thought gtid_strict_mode was supposed to be such a mode of operation.
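In other words, the invariant I want strict mode to enforce looks roughly like this (a hypothetical check, not MariaDB code):

```python
# Sketch of the strict-mode invariant being argued for: within one domain,
# seq_no must strictly increase no matter which server_id produced the event.

def strict_mode_accept(last_seq_by_domain, gtid):
    domain, server, seq = gtid
    last = last_seq_by_domain.get(domain, 0)
    if seq <= last:
        return False  # out-of-order seq_no within the domain: an error in strict mode
    last_seq_by_domain[domain] = seq
    return True

pos = {}
assert strict_mode_accept(pos, (0, 1, 1)) is True
assert strict_mode_accept(pos, (0, 2, 2)) is True   # a different server_id is fine
assert strict_mode_accept(pos, (0, 1, 2)) is False  # seq_no must move forward
```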
I wonder what kind of production environment tolerates lost transactions or alternate futures. It's really sad to hear that, by intentional design, MariaDB doesn't fit well into environments that don't want to tolerate database inconsistencies...
I have no idea where you heard that.
Can you please be more concrete? Eg. give an example where the MariaDB design makes it impossible to avoid lost transactions, alternate futures, or other db inconsistencies?
Here are your words from an earlier email in this thread:
I think there is a fundamental disconnect. In MariaDB GTID, I do not require or rely on monotonically increasing sequence numbers (monotonicity is required per-server-id, but not between different servers). Nor do I enforce or rely on absence of holes in the sequence numbers.
This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.
I can understand if this design is not optimal for what you are trying to do. However, implementing two different designs (eg. based on value of gtid_strict_mode) is not workable. I believe at least for the first version, the current design is what we have to work with.
If MariaDB cannot detect missing events and alternate futures, that means it silently allows them to exist. You said you made this decision deliberately, so it's by design. And you said that MariaDB shouldn't implement a second design here. So if we don't want to tolerate lost transactions and alternate futures, and want things to break whenever such events happen, stock MariaDB cannot do that for us, by design. And we have to make our own custom modifications to it to support such a production environment. Have I misunderstood what you said?
Just out of curiosity: could you tell me what legitimate sequence of events can lead to a hole in sequence numbers?
There are many ways -- for example, using one of the --replicate-ignore-* options. Suppose you have a master with two schemas; you could have two slaves S1 and S2, each replicating one schema. You could even have an aggregating slave A on a third level that uses multi-master replication to bring each schema from the second-level slaves back into a single server: M->S1->A and M->S2->A.
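To show how the filtering produces holes, a toy sketch (the schema names are made up for illustration):

```python
# Sketch of how filtering creates holes in seq_no (illustrative, not server
# code). M writes seq_no 1..6 in one domain, alternating schemas db1/db2.
events = [(1, "db1"), (2, "db2"), (3, "db1"), (4, "db2"), (5, "db1"), (6, "db2")]

# S1 replicates only db1, S2 only db2: each ends up with holes in seq_no.
s1 = [seq for seq, schema in events if schema == "db1"]
s2 = [seq for seq, schema in events if schema == "db2"]
assert s1 == [1, 3, 5]  # holes at 2, 4, 6
assert s2 == [2, 4, 6]  # holes at 1, 3, 5
```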
Of course, that third-level slave can not use GTID positioning unless you correctly configure different domain ids for the original first-level master transactions depending on schema used. _Exactly_ because in this case GTID can not ensure against lost transactions or other corruption. But thanks to the design, GTID can still be enabled and used in other parts of the replication hierarchy.
I would actually think the third-level slave will break in such a situation, because MariaDB doesn't allow GTIDs with the same (domain_id, server_id) pair to be out of order, right? But with such a replication setup, the third-level slave can get a bunch of events from S1 first, and then it won't be able to get any events from S2, because they'll have a smaller seq_no than what is already present in the binlog.
So, in MySQL 5.1 with Google's Group IDs, the binlog didn't carry this real information.
Well, Google's Group ID must also need to store last GTID logged persistently, so that it can continue at the right point after a server restart/crash. Where is this stored? MariaDB GTID chooses to store this in the binlog, to avoid the overhead of extra InnoDB row operations for each transaction to store it in a table or system tablespace header. One way or the other, Google's Group ID must store this somewhere.
Sure. The last event in the binlog has the last Group ID. If there are no binlogs, or there are only binlogs without events, then the variable binlog_group_id has it.

Pavel