Pavel Ivanov <pivanof@google.com> writes:
Have I misunderstood what you said?
Yes, totally :-(
By that you are basically saying: I need to support those who set up lousy multi-master replication, thus I won't treat domain_id as a single replication stream, I'll treat a pair (domain_id, server_id) as a single replication stream (meaning there could be several replication streams within one domain_id). So
Absolutely not! This would be total breakage of the whole design.

The whole foundation of MariaDB GTID is that for each domain id, we have a well-defined binlog order that _must_ be the same on every server in the replication hierarchy. This is what allows us to represent the slave position as a single GTID per domain id.

And that is why we provide gtid_strict_mode, which enforces globally monotonic sequence numbers. Because if sequence numbers are monotonic everywhere, then it is impossible to have different binlog orders on different servers.

But even without globally monotonic sequence numbers, you can still have the same binlog order on every server. And GTID can still work correctly. But it becomes the user's responsibility to ensure that the binlog order is the same on all servers in the same hierarchy.

So in gtid strict mode it works exactly as you (and I) want. It will not hurt you that it also works in non-strict mode, which you will not use anyway. What is so hard to understand about this?
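To make the monotonicity rule concrete, here is a minimal Python sketch of the check that strict mode implies: within each domain id, every sequence number must be greater than the one before it. It is an illustration only, not the server code. The names `Gtid`, `parse_gtid` and `first_strict_violation` are made up for this example; the only thing taken from MariaDB is the textual GTID form domain_id-server_id-seq_no.

```python
from typing import Iterable, NamedTuple, Optional


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse the textual form 'domain_id-server_id-seq_no'."""
    domain_id, server_id, seq_no = (int(part) for part in text.split("-"))
    return Gtid(domain_id, server_id, seq_no)


def first_strict_violation(binlog: Iterable[Gtid]) -> Optional[Gtid]:
    """Return the first GTID whose seq_no is not greater than the previous
    seq_no seen in the same domain, or None if every domain is strictly
    increasing. The server_id plays no part in the comparison."""
    last_seq_no = {}
    for gtid in binlog:
        previous = last_seq_no.get(gtid.domain_id)
        if previous is not None and gtid.seq_no <= previous:
            return gtid
        last_seq_no[gtid.domain_id] = gtid.seq_no
    return None
```

For example, the order 0-1-1, 0-1-3, 0-2-2 is flagged at 0-2-2, because domain 0 has already reached sequence number 3, even though server 2 has not written anything before.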
Here are your words from an earlier email in this thread:
This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.
I can understand if this design is not optimal for what you are trying to do. However, implementing two different designs (e.g. based on the value of gtid_strict_mode) is not workable. I believe at least for the first version, the current design is what we have to work with.
So if we don't want to tolerate lost transactions and alternate futures and want things to break whenever such events happen, stock MariaDB cannot do it for us by design.
Ok, sorry about this, I can see how this could be misunderstood. I was trying to explain too many things at once and got things mixed up.

All I meant here is that the code needs to use a bit more complex algorithms to correctly detect errors, not that such detection is impossible or should not be done. And that your patch similarly needs to be written in a different way, not that it would be impossible to do.

I think to get this discussion back on track, you need to forget everything you think I said so far, and instead accept the following points:

1. I completely agree with you on how things should work in strict mode. Binlog events should always have monotonic sequence numbers, and no lost transactions or alternate futures are acceptable.

2. MariaDB GTID also supports some non-strict usage. This is not allowed to break point (1), so you do not need to worry about it if you are happy to use strict mode.

3. I agree that the issues you report in MDEV-4820 are bugs and I will fix them.

Once we are on the same track here, we can discuss the finer details on how I make things work in non-strict mode, and why that is a desirable thing to do, if you like. But if you think I'm saying something to contradict points 1-3 above, you have misunderstood me. Ok?

----

(A couple more answers to less important questions:)
This is surprising. Test doesn't check the particular message text. How does it fail?
Just that the test case has a suppression for the error message about alternate future. Current 10.0-base gives a differently worded error message. So a different suppression is needed.
Apparently I failed to reproduce the problem scenarios in the testing environment. Sorry, I didn't try to run it on unchanged code. But did you try the manual reproduction steps I mentioned in MDEV-4820?
Yes, with exactly the same results. I found the one bug I've mentioned, and the rest passed.
This is where I disagree. You keep insisting that this information is necessary.
It is necessary in the sense that the code was written under the assumption that this information is there. And it is necessary to be able to correctly locate an arbitrary GTID in a binlog that was written in non-strict mode.

But now that I have thought about it, I think you are right that it is not needed if the entire binlog was written obeying strict mode. It seems we can always detect errors in this case.

The point is: I did not consider the possibility that a user would manually remove the binlog on a master when I wrote the code. So I did not want to promise that it is supported until I thought the problem through.
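As a rough illustration of why a strictly ordered binlog makes such errors detectable, here is a Python sketch that looks up a slave position in the events that remain for one domain. It is only a sketch under my own assumptions, not how the server does it: the function `locate` and its result labels are invented for the example, and the binlog is modelled as a plain list of (server_id, seq_no) pairs with strictly increasing seq_no.

```python
from bisect import bisect_left


def locate(binlog, position):
    """Classify a slave position against one domain's remaining binlog.

    binlog:   list of (server_id, seq_no), seq_no strictly increasing
    position: (server_id, seq_no) last applied by the slave
    Returns (status, index of the next event to send or None).
    """
    if not binlog:
        return ("empty", None)
    seq_nos = [seq_no for _, seq_no in binlog]
    server_id, seq_no = position
    if seq_no < seq_nos[0]:
        # Older than anything left, e.g. the binlog files were removed.
        return ("before-start", None)
    if seq_no > seq_nos[-1]:
        # The slave has something the master never wrote.
        return ("ahead", None)
    # Monotonic seq_no means a binary search is enough to find the event.
    index = bisect_left(seq_nos, seq_no)
    if seq_nos[index] != seq_no or binlog[index][0] != server_id:
        # A gap, or a different event with the same seq_no.
        return ("diverged", None)
    return ("found", index + 1)
```

With the binlog [(1, 5), (1, 6), (2, 7)], the position (1, 6) is found and replication would resume at index 2, while (1, 3), (1, 9) and (3, 6) are each reported as a distinct kind of problem. Without the ordering guarantee, none of the three range checks would be valid.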
You could even have an aggregating slave A on the third level that uses multi-master to replicate each schema from each second-level slave back to a single server. M->S1->A and M->S2->A.
I would actually think the third-level slave will break in such a situation, because MariaDB doesn't allow GTIDs with the same (domain_id, server_id) pair to be out of order, right? But with such a replication setup the third-level slave can get a bunch of events from S1 first, and then it won't be able to get any events from S2 because they'll have a smaller seq_no than what is already present in the binlog.
Thanks, yes you are right, GTID will not work correctly on A or any slave of A in this setup.

And that is exactly the point of gtid_strict_mode. A setup such as this requires extra configuration (separate domain_id) to work correctly with GTID. So we provide strict mode to be able to give an error when the configuration is incorrect. But we have to have this off by default to not break upgrades.

 - Kristian.
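PS: The M->S1->A / M->S2->A case can be played through in a few lines of Python. This is a toy model, not server code: `StrictBinlog`, `OutOfOrderError` and `apply_streams` are names invented for the example, and the split of M's transactions into odd and even sequence numbers simply stands in for "one schema via S1, the other via S2". The second half relabels the two streams with hypothetical domain ids 1 and 2 to show the effect of separate domains; how the events would come to carry those ids in a real setup is not covered here.

```python
class OutOfOrderError(Exception):
    """Raised when an event would break the per-domain seq_no order."""


class StrictBinlog:
    """Toy binlog that rejects non-increasing seq_no within a domain."""

    def __init__(self):
        self.events = []
        self._last_seq_no = {}

    def append(self, domain_id, server_id, seq_no):
        last = self._last_seq_no.get(domain_id)
        if last is not None and seq_no <= last:
            raise OutOfOrderError(
                f"{domain_id}-{server_id}-{seq_no} after seq_no {last}")
        self._last_seq_no[domain_id] = seq_no
        self.events.append((domain_id, server_id, seq_no))


def apply_streams(streams):
    """Apply each stream in turn to a fresh binlog on A.

    Returns (binlog, error), where error is None if everything applied."""
    binlog = StrictBinlog()
    for stream in streams:
        for event in stream:
            try:
                binlog.append(*event)
            except OutOfOrderError as exc:
                return binlog, exc
    return binlog, None


# M (server_id 1) writes six transactions in domain 0.
master = [(0, 1, n) for n in range(1, 7)]
via_s1 = [e for e in master if e[2] % 2 == 1]  # schema replicated by S1
via_s2 = [e for e in master if e[2] % 2 == 0]  # schema replicated by S2

# A receives everything from S1 first, then tries S2: rejected.
binlog, error = apply_streams([via_s1, via_s2])

# Same events, but each stream in its own (hypothetical) domain: accepted.
via_s1_own = [(1, server_id, n) for _, server_id, n in via_s1]
via_s2_own = [(2, server_id, n) for _, server_id, n in via_s2]
binlog_own, error_own = apply_streams([via_s1_own, via_s2_own])
```

In the first run A stops at 0-1-2, since domain 0 is already at sequence number 5. In the second run all six events apply, because each domain is only compared against itself.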