Pavel Ivanov <pivanof@google.com> writes:
Note that the patch I've attached has test cases that should reproduce the problems.
Thanks, I've now gone through the test cases as well. Let me number the individual tests as follows:

1. Check that a gap in seq_no without binlogs doesn't allow replication
2. The same test, but starting the master GTID from a different server_id
3. Check processing of alternate futures
4. Check alternate future when the divergence is in the last event in a binlog file
5. Check alternate future without binlogs

I tried the test cases against current 10.0-base.

(5) fails, and this is a bug. It should fail also in non-strict mode. I will fix it (as I said in my comment on MDEV-4820).

(3) fails, but only because of a different error message (the generic "event is not found in master binlog" rather than the specific "slave has alternate future"). I can put in the "alternate future" error message, but I want to be sure you really understand what this does and what its limitations are.

In your test, the slave does one extra transaction and the master does two. The slave ends up at 0-2-112, the master ends up at 0-1-113. So because 113 >= 112, we can know that the slave has an alternate future.

But suppose you did the test the other way around: the slave does two transactions, the master does one. Then the slave has 0-2-113 and the master has 0-1-112. It is not the case that 112 >= 113. So we cannot detect at this point that the slave has an alternate future.

So now we are going to give two *different* error messages to the user essentially at random, depending on which alternate future is furthest ahead. Is this really what you want? I would think it would be *better* for data center operations to have at least a consistent error message for the two situations.

Or did I misunderstand something? Can the Google patch detect alternate slave futures in this case and distinguish them from the master being behind, and if so, how?

Other than the error message, (1)-(4) all pass for me on current unmodified 10.0-base. So for MDEV-4820 I am left with one bug to fix (5) and possibly one feature request for a different error message.
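To make the asymmetry in (3) concrete, here is a minimal Python sketch of the comparison described above. It is only a model of the reasoning, not the server code: the `Gtid` type and the `parse_gtid` and `classify_missing_gtid` helpers are hypothetical names, and the sketch assumes we already know that the slave's GTID was not found in the master's binlog.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse a 'domain-server-seq_no' string such as '0-2-112'."""
    domain_id, server_id, seq_no = (int(part) for part in text.split("-"))
    return Gtid(domain_id, server_id, seq_no)


def classify_missing_gtid(slave_pos: Gtid, master_last: Gtid) -> str:
    """Pick an error message for a slave GTID that the master does not have.

    Hypothetical helper: it only compares seq_no within one domain, which
    is the check discussed in the text.
    """
    if slave_pos.domain_id != master_last.domain_id:
        raise ValueError("positions are in different replication domains")
    if master_last.seq_no >= slave_pos.seq_no:
        # The master has already logged past the slave's seq_no, yet the
        # slave's GTID is missing, so the histories must have diverged.
        return "slave has alternate future"
    # The master is behind the slave's seq_no; divergence cannot be told
    # apart from the master simply lagging.
    return "event is not found in master binlog"


# Slave did one extra transaction, master did two.
print(classify_missing_gtid(parse_gtid("0-2-112"), parse_gtid("0-1-113")))
# Slave did two extra transactions, master did one.
print(classify_missing_gtid(parse_gtid("0-2-113"), parse_gtid("0-1-112")))
```

Both calls describe the same kind of divergence, but only the first yields the "alternate future" message, which is the inconsistency I am asking about.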
I cannot help thinking that there is something I'm missing from all you've written already on the subject of MDEV-4820, but I don't have anything concrete. So please let me know what I'm missing. Or is it just that my explanations are confusing, and it would have been better if I'd just fixed (5) and then discussed (3) before answering? (But the discussion is very useful for me to get my thoughts clear, the details around this are unfortunately quite complex).
With GTID, @@GLOBAL.gtid_binlog_pos is also stored in the last binlog file.
Right, but as I understood it gtid_binlog_pos is necessary only to adjust gtid_current_pos if gtid_slave_pos is behind (btw, how do you check that in non-strict mode when seq_no of the latest transaction can be less than seq_no of old transaction?). If we know for sure that gtid_slave_pos reflects the latest transaction then again gtid_binlog_pos doesn't carry any new information and can be empty. Am I missing something?
Yes, I think so.

For one, we need to know the next sequence number to use within each domain.

More subtly, we also need to know the last sequence number for every (domain_id, server_id) pair. This is the information that allows a slave to start at the correct place in the master binlog even if sequence numbers are not monotonic between server_ids.

Setting gtid_slave_pos does not restore any of this information. It just triggers a special case that allows turning a --log-slave-updates=0 slave into a master. The fact that it partially works for your case of removing binlogs is mostly accidental, and it is not the correct way to handle it.

You really need to understand this subtle part to understand the finer details of slave connect. It is true that _if_ we always required strict mode, we could use a simpler algorithm. But we do not require that. And since we do not, and the more complex algorithm works in all cases, it is better to have just one algorithm that handles all cases, rather than two separate algorithms with twice the potential for bugs.

So if you really need to manually remove binlogs on a master, then as I said before, the correct way is to preserve the full information. Such a way is not currently implemented. I have suggested two possible ways it could be implemented (always read master-bin.info in the non-crash case, and an explicit CHANGE MASTER TO gtid_list=XXX). So far I have not made it a priority to support manual deletion of binlogs on the master.

I hope this makes things clearer; if not, please help me understand what it is that I am failing to explain properly.
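As a rough illustration of the two pieces of state mentioned above, here is a small Python sketch. It is a simplified model under stated assumptions, not the actual server data structure: the class name `BinlogState` and its methods are hypothetical, and the sketch assumes the next sequence number in a domain is one more than the highest one seen so far (starting from 1 for an unseen domain).

```python
class BinlogState:
    """Toy model of the per-domain and per-(domain, server) GTID state."""

    def __init__(self):
        # (domain_id, server_id) -> last seq_no logged from that server
        self.last_seq = {}
        # domain_id -> highest seq_no seen in that domain (assumption)
        self.max_in_domain = {}

    def record(self, domain_id, server_id, seq_no):
        """Remember a GTID that was written to the binlog."""
        self.last_seq[(domain_id, server_id)] = seq_no
        current = self.max_in_domain.get(domain_id, 0)
        self.max_in_domain[domain_id] = max(current, seq_no)

    def next_seq_no(self, domain_id):
        """Sequence number to allocate for the next local transaction."""
        return self.max_in_domain.get(domain_id, 0) + 1


state = BinlogState()
state.record(0, 1, 100)  # 0-1-100 from server 1
state.record(0, 2, 99)   # 0-2-99 from server 2, lower seq_no (non-strict mode)

print(state.last_seq)        # one entry per (domain_id, server_id) pair
print(state.next_seq_no(0))  # continues after the highest seq_no in domain 0
```

A single position such as 0-2-99 carries only one of the two entries in `last_seq`, which is the sense in which setting gtid_slave_pos alone does not restore the full information.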
I wonder what kind of production environment tolerates lost transactions or alternate futures. It's really sad to hear that by intentional design MariaDB doesn't fit well into those environments that don't want to tolerate db inconsistencies...
I have no idea where you heard that. Can you please be more concrete? E.g., give an example where the MariaDB design makes it impossible to avoid lost transactions, alternate futures, or other db inconsistencies?
Just out of curiosity: could you tell me what legitimate sequence of events can lead to a hole in sequence numbers?
There are many ways. For example, using one of the --replicate-ignore-* options.

For example, if you have a master with two schemas, you could have two slaves S1 and S2 each replicating one schema. You could even have an aggregating slave A on the third level that uses multi-master to replicate each schema from each second-level slave back to a single server: M->S1->A and M->S2->A.

Of course, that third-level slave cannot use GTID positioning unless you correctly configure different domain ids for the original first-level master transactions depending on the schema used. _Exactly_ because in this case GTID cannot ensure against lost transactions or other corruption. But thanks to the design, GTID can still be enabled and used in other parts of the replication hierarchy.
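To show how such holes arise, here is a toy Python model of the M->S1 part of that setup. It does not use any real server option or API: the event list, the schema names, and the helpers `apply_with_ignore` and `find_holes` are all made up for illustration, and filtering is modelled as simply dropping events for an ignored schema.

```python
def apply_with_ignore(events, ignored_schemas):
    """Keep only the events whose schema is not ignored by this slave."""
    return [(seq_no, schema) for seq_no, schema in events
            if schema not in ignored_schemas]


def find_holes(seq_nos):
    """Return the sequence numbers missing between the lowest and highest."""
    if not seq_nos:
        return []
    present = set(seq_nos)
    return [n for n in range(min(seq_nos), max(seq_nos) + 1)
            if n not in present]


# Master M writes to two schemas within a single domain.
master_events = [(1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "a")]

# S1 replicates only schema "a", S2 only schema "b".
s1_events = apply_with_ignore(master_events, {"b"})
s2_events = apply_with_ignore(master_events, {"a"})

print(find_holes([seq_no for seq_no, _ in s1_events]))  # holes seen on S1
print(find_holes([seq_no for seq_no, _ in s2_events]))  # holes seen on S2
```

Each second-level slave ends up with a legitimate stream that has gaps in seq_no, without any transaction having been lost from that slave's point of view.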
Don't forget the special case when the GTID requested by slave is the last event in this domain in the previous binlog file. Then you don't look into that file and start serving directly from the next event which won't be equal to what slave requested.
Agree.
Well, if I understood you correctly all test cases shouldn't work by design. Maybe only except the second case when server doesn't replicate at all.
On the contrary, from what I could determine all the test cases should work, and I will fix the one bug where they do not (and the error message if you really insist).
So, in MySQL 5.1 with Google's Group IDs binlog didn't have real information.
Well, Google's Group ID must also persistently store the last GTID logged, so that it can continue at the right point after a server restart/crash. Where is this stored?

MariaDB GTID chooses to store this in the binlog, to avoid the overhead of extra InnoDB row operations for each transaction to store it in a table or system tablespace header. One way or the other, Google's Group ID must store this somewhere.

AFAIK, Google's Group ID is not on by default, and needs to be explicitly enabled. Enabling it adds overhead to every binlog event. In contrast, MariaDB GTID is on by default, and the implementation actually _decreases_ the size of binlog events compared to 5.5. Making this work requires the extra information in Gtid_list.

 - Kristian.