[Maria-developers] Review of patch for MDEV-4820
I took a close look at your patch for MDEV-4820.

I think there is a fundamental disconnect. In MariaDB GTID, I do not require or rely on monotonically increasing sequence numbers (monotonicity is required per-server-id, but not between different servers). Nor do I enforce or rely on absence of holes in the sequence numbers.

This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.

I can understand if this design is not optimal for what you are trying to do. However, implementing two different designs (eg. based on value of gtid_strict_mode) is not workable. I believe at least for the first version, the current design is what we have to work with.

The gtid_strict_mode is about helping the user by giving an error instead of logging certain undesirable event sequences to the binlog. It should not cause different non-error behaviour ever, nor should it prevent working with binlogs that were logged when strict mode was not enabled.
diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
--- a/sql/rpl_gtid.cc
+++ b/sql/rpl_gtid.cc
@@ -993,7 +993,7 @@ rpl_binlog_state::check_strict_sequence(uint32 domain_id, uint32 server_id,
   if ((elem= (element *)my_hash_search(&hash, (const uchar *)(&domain_id), 0)) &&
-      elem->last_gtid && elem->last_gtid->seq_no >= seq_no)
+      elem->last_gtid && elem->last_gtid->seq_no + 1 != seq_no)
If I understand this part correctly, this is about enforcing that there are no holes in sequence numbers. This goes against the GTID design.
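To make the difference between the two conditions in this hunk concrete, here is a minimal standalone sketch. It is not the server code: `rejects_original` and `rejects_patched` are hypothetical names, and each simply returns true when a GTID with `seq_no` would be refused after `last_seq_no` in the same domain.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the condition in check_strict_sequence().
// Both return true when the new sequence number would be rejected.

// Original condition: reject only if seq_no does not advance.
bool rejects_original(uint64_t last_seq_no, uint64_t seq_no)
{
  return last_seq_no >= seq_no;
}

// Patched condition: reject anything but the immediate successor,
// which additionally forbids holes in the sequence.
bool rejects_patched(uint64_t last_seq_no, uint64_t seq_no)
{
  return last_seq_no + 1 != seq_no;
}
```

With a last sequence number of 10, the original condition accepts both 11 and 12, while the patched one accepts only 11. Both refuse 10 and anything lower.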
@@ -854,8 +857,16 @@ contains_all_slave_gtid(slave_connection_state *st, Gtid_list_log_event *glev)
         */
         return false;
       }
-      if (gtid->server_id == glev->list[i].server_id &&
-          gtid->seq_no <= glev->list[i].seq_no)
+      if (slave_gtid_strict_mode)
+      {
+        if (gtid->seq_no < glev->list[i].seq_no)
+        {
+          has_greater_seq_no= true;
+          break;
+        }
+      }
I think here you want to depend on sequence numbers being monotonic. This does not work, as you cannot be sure that strict mode was enabled when this binlog file was written.
-  if (!rpl_global_gtid_slave_state.domain_to_gtid(slave_gtid->domain_id,
-                                                  &master_replication_gtid) ||
+  bool master_repl_gtid_found =
+    rpl_global_gtid_slave_state.domain_to_gtid(slave_gtid->domain_id,
+                                               &master_replication_gtid);
+  if (!master_repl_gtid_found ||
       slave_gtid->server_id != master_replication_gtid.server_id ||
       slave_gtid->seq_no != master_replication_gtid.seq_no)
   {
Right, so slave requests something in a domain that we do not have anything about in the binlog, but we do have something in the slave state (but not the correct one).

So this is a bug in the current code. If slave requests to start at a position in a domain we have nothing in, either that domain is served by a different master, or slave wants to start from the very first event in that domain. Currently, we fail to check in the second case that we actually have the correct GTID as the first event.

But this is the wrong way to fix it. A master should not be prevented from serving some domain D1 just because it has some old junk for another domain D2 in the slave state.

Instead, when we allow to start in a domain that is missing, and then later actually encounter an event in this domain, we should check that it matches what the slave originally requested. If not, we should give an error. I'll try to find time to do this.
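The deferred check described above could look roughly like the following sketch. It is only an illustration under stated assumptions: `Gtid` and `PendingDomainCheck` are hypothetical names, not types from the MariaDB source, and the sketch covers only the "first event in a missing domain" comparison, not the rest of slave connect.

```cpp
#include <cstdint>
#include <map>

// Hypothetical GTID triple (domain_id-server_id-seq_no).
struct Gtid
{
  uint32_t domain_id;
  uint32_t server_id;
  uint64_t seq_no;
};

// Hypothetical helper: tracks domains the slave asked to start in
// although the master had nothing for them at connect time.
class PendingDomainCheck
{
public:
  // Remember the position the slave requested for a missing domain.
  void expect(const Gtid &requested)
  {
    pending_[requested.domain_id]= requested;
  }

  // Call for every GTID event read from the binlog. Returns false if
  // this is the first event seen in a pending domain and it differs
  // from the slave's request; the caller would then raise an error.
  bool check_event(const Gtid &event)
  {
    auto it= pending_.find(event.domain_id);
    if (it == pending_.end())
      return true;                      // nothing pending for this domain
    const Gtid requested= it->second;
    pending_.erase(it);                 // only the first event is checked
    return event.server_id == requested.server_id &&
           event.seq_no == requested.seq_no;
  }

private:
  std::map<uint32_t, Gtid> pending_;
};
```

Keeping the state per domain means junk in one domain cannot block serving another, which is the objection raised against the patch's approach.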
@@ -1191,6 +1239,16 @@ gtid_find_binlog_file(slave_connection_state *state, char *out_name,
           */
           state->remove(gtid);
         }
+        else if (slave_gtid_strict_mode &&
+                 gtid->seq_no == glev->list[i].seq_no)
+        {
+          *errormsg= "Requested slave GTID state have different server id "
+                     "than the one in binlog";
+          *error_gtid= *gtid;
+          *good_gtid= glev->list[i];
+          error= ER_SLAVE_IS_FROM_ALTERNATE_FUTURE;
+          goto end;
+        }
This can fail if the binlog file was written when strict mode was not enabled.
@@ -1559,7 +1620,39 @@ send_event_to_slave(THD *thd, NET *net, String* const packet, ushort flags,
             event_gtid.seq_no <= gtid->seq_no)
           *gtid_skip_group = (flags2 & Gtid_log_event::FL_STANDALONE ?
                               GTID_SKIP_STANDALONE : GTID_SKIP_TRANSACTION);
-        if (event_gtid.server_id == gtid->server_id &&
+        if (slave_gtid_strict_mode)
+        {
+          if (event_gtid.seq_no == gtid->seq_no &&
+              event_gtid.server_id != gtid->server_id)
+          {
+            my_errno= ER_SLAVE_IS_FROM_ALTERNATE_FUTURE;
+            *error_gtid= *gtid;
+            *good_gtid= event_gtid;
+            return "Requested slave GTID state have different server id "
+                   "than the one in binlog";
+          }
Again, what if this is an old part of the binlog where strict mode was disabled?
+          else if (*first_event && event_gtid.seq_no > gtid->seq_no)
+          {
+            /*
+              Earlier we decided that this binlog file has event that we need,
+              but the first event in the file has seq_no beyond what we need.
+              This can happen in two cases:
+               - this file is the first binlog we have and Gtid_list event
+                 in it is empty;
+               - there is a gap between the last event mentioned in Gtid_list
+                 and the first event in the binlog file.
+              As the first reason is the most probable one let's issue the
+              appropriate error message.
+            */
+            my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG;
+            *error_gtid= *gtid;
+            return "Could not find GTID state requested by slave in any "
+                   "binlog files. Probably the slave state is too old and "
+                   "required binlog files have been purged";
+          }
+        }
This "first_event" (which I admit I do not fully understand) looks wrong - all such things usually need to be per-domain.

However, I think this is the place in the code I mention above, where we get an event in a domain that the slave requested, but we did not have anything at connect time. So here I want an even stricter check, and also in non-strict mode: The first GTID in the domain must be _equal_ to what the slave requested, or we should fail.

----

Next I will take a closer look at your test cases and see what can be done to fix it in a way compatible with the basic design.

I also want to re-iterate the suggestion to not delete binlogs when cloning a master to a new master. The most recent binlog file has real information (in the initial Gtid_list event) that is needed for proper operation as a master, and it really doesn't make sense to first purposely corrupt the server state and subsequently try to hack the code to fix the damage. If one really wanted not to copy any binlog files, then a better idea would be a way to restore the state, eg. a RESET MASTER TO gtid_list='xxx,yyy,...' command or something like that.

Hope this helps,

 - Kristian.
On Mon, Aug 12, 2013 at 4:59 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I took a close look at your patch for MDEV-4820.
I think there is a fundamental disconnect. In MariaDB GTID, I do not require or rely on monotonically increasing sequence numbers (monotonicity is required per-server-id, but not between different servers). Nor do I enforce or rely on absence of holes in the sequence numbers.
This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.
I can understand if this design is not optimal for what you are trying to do. However, implementing two different designs (eg. based on value of gtid_strict_mode) is not workable. I believe at least for the first version, the current design is what we have to work with.
I wonder what kind of production environment tolerates lost transactions or alternate futures. It's really sad to hear that by intentional design MariaDB doesn't fit well into those environments that don't want to tolerate db inconsistencies...
The gtid_strict_mode is about helping the user by giving an error instead of logging certain undesirable event sequences to the binlog. It should not cause different non-error behaviour ever, nor should it prevent working with binlogs that were logged when strict mode was not enabled.
diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
--- a/sql/rpl_gtid.cc
+++ b/sql/rpl_gtid.cc
@@ -993,7 +993,7 @@ rpl_binlog_state::check_strict_sequence(uint32 domain_id, uint32 server_id,
   if ((elem= (element *)my_hash_search(&hash, (const uchar *)(&domain_id), 0)) &&
-      elem->last_gtid && elem->last_gtid->seq_no >= seq_no)
+      elem->last_gtid && elem->last_gtid->seq_no + 1 != seq_no)
If I understand this part correctly, this is about enforcing that there are no holes in sequence numbers. This goes against the GTID design.
Just out of curiosity: could you tell me what legitimate sequence of events can lead to a hole in sequence numbers?
+          else if (*first_event && event_gtid.seq_no > gtid->seq_no)
+          {
+            /*
+              Earlier we decided that this binlog file has event that we need,
+              but the first event in the file has seq_no beyond what we need.
+              This can happen in two cases:
+               - this file is the first binlog we have and Gtid_list event
+                 in it is empty;
+               - there is a gap between the last event mentioned in Gtid_list
+                 and the first event in the binlog file.
+              As the first reason is the most probable one let's issue the
+              appropriate error message.
+            */
+            my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG;
+            *error_gtid= *gtid;
+            return "Could not find GTID state requested by slave in any "
+                   "binlog files. Probably the slave state is too old and "
+                   "required binlog files have been purged";
+          }
+        }
This "first_event" (which I admit I do not fully understand) looks wrong - all such things usually need to be per-domain.
You are right, although I'd think it shouldn't be first_event per-domain. Instead the Gtid_list event should be remembered and then first seen event should be compared to what was in Gtid_list...
However, I think this is the place in the code I mention above, where we get an event in a domain that the slave requested, but we did not have anything at connect time. So here I want an even stricter check, and also in non-strict mode: The first GTID in the domain must be _equal_ to what the slave requested, or we should fail.
Don't forget the special case when the GTID requested by slave is the last event in this domain in the previous binlog file. Then you don't look into that file and start serving directly from the next event which won't be equal to what slave requested.
----
Next I will take a closer look at your test cases and see what can be done to fix it in a way compatible with the basic design.
Well, if I understood you correctly all test cases shouldn't work by design. Maybe only except the second case when server doesn't replicate at all.
I also want to re-iterate the suggestion to not delete binlogs when cloning a master to a new master. The most recent binlog file has real information (in the initial Gtid_list event) that is needed for proper operation as a master, and it really doesn't make sense to first purposely corrupt the server state and subsequently try to hack the code to fix the damage. If one really wanted not to copy any binlog files, then a better idea would be a way to restore the state, eg. a RESET MASTER TO gtid_list='xxx,yyy,...' command or something like that.
So, in MySQL 5.1 with Google's Group IDs binlog didn't have real information. I guess we can reevaluate what we should do with MariaDB. Although it sounds like whatever we do it still won't fit us without hacks that go against MariaDB's design -- we'll still need to hack to prohibit any transaction loss or alternate futures...

Pavel
Pavel Ivanov <pivanof@google.com> writes:
Note that the patch I've attached has test cases that should reproduce the problems.
Thanks, I've now gone through the testcases also. Let me number the individual tests as follows:

1. Check that gap in seq_no without binlogs doesn't allow to replicate
2. The same test but with starting master GTID from different server_id
3. Check processing of alternate futures
4. Check alt future when divergence is in last event in binlog file
5. Check alt future without binlogs

I tried the test cases against current 10.0-base.

(5) fails, this is a bug. It should fail also in non-strict mode. I will fix (as I said in comment to MDEV-4820).

(3) Fails, but only because of a different error message (generic "event is not found in master binlog" rather than specific "slave has alternate future").

I can put in the "alternate future" error message, but I want to be sure you really understand what this does and the limitations.

In your test, slave does one extra transaction, master does two. Slave ends up at 0-2-112, master ends up at 0-1-113. So because 113 >= 112, we can know that slave has an alternate future.

But suppose you did the test the other way around, slave does two transactions, master does one. Then slave has 0-2-113 and master has 0-1-112. It is not the case that 112 >= 113. So we can not detect at this point that slave has an alternate future.

So now we are going to give two *different* error messages to the user essentially at random, depending on which alternate future is furthest ahead. Is this really what you want?

I would think it would be *better* for data center operations to have at least a consistent error message for the two situations.

Or did I misunderstand something? Can the Google patch detect alternate slave futures in this case and distinguish it from master being behind, and if so, how?

Other than the error message, (1)-(4) all pass for me on current unmodified 10.0-base. So I am left for MDEV-4820 with one bug to fix (5) and possibly one feature request for different error message.
I cannot help thinking that there is something I'm missing from all you've written already on the subject of MDEV-4820, but I don't have anything concrete. So please let me know what I'm missing. Or is it just that my explanations are confusing, and it would have been better if I'd just fixed (5) and then discussed (3) before answering? (But the discussion is very useful for me to get my thoughts clear, the details around this are unfortunately quite complex).
With GTID, @@GLOBAL.gtid_binlog_pos is also stored in the last binlog file.
Right, but as I understood it gtid_binlog_pos is necessary only to adjust gtid_current_pos if gtid_slave_pos is behind (btw, how do you check that in non-strict mode when seq_no of the latest transaction can be less than seq_no of old transaction?). If we know for sure that gtid_slave_pos reflects the latest transaction then again gtid_binlog_pos doesn't carry any new information and can be empty. Am I missing something?
Yes, I think so.

For one, we need to know the next sequence number to use within each domain.

More subtle, we also need to know the last sequence number for every (domain_id, server_id) pair. This is the information that allows slave to start at the correct place in the master binlog even if sequence numbers are not monotonic between server_ids.

Setting gtid_slave_pos does not restore any of this information. It just triggers a special case that allows to turn a --log-slave-updates=0 slave into a master. The fact that it partially works for your case of removing binlogs is mostly accidental, and it is not the correct way to handle it.

You really need to understand this subtle part to understand the finer details of slave connect. It is true that _if_ we required strict mode always we could use a simpler algorithm. But we do not require that. And since we do not, and the more complex algorithm works in all cases, it is better to have just one algorithm that handles all cases, rather than two separate algorithms, with twice the potential for bugs.

So if you really need to manually remove binlogs on a master, as I said before the correct way is to preserve the full information. Such way is not currently implemented. I have suggested two possible ways it could be implemented (always read master-bin.info in non-crash case, and explicit CHANGE MASTER TO gtid_list=XXX). So far I have not made it a priority to support manual deletion of binlogs on the master.

I hope this makes things clearer, else please help me understand what it is that I am failing to explain properly.
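As a rough illustration of the two pieces of information mentioned above (next sequence number per domain, and last sequence number per (domain_id, server_id) pair), here is a minimal sketch. All names are hypothetical and do not mirror the real `rpl_binlog_state` class; the rule that the next number is one past the highest seen in the domain is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

// Hypothetical model of the per-domain state a master needs.
struct DomainState
{
  std::map<uint32_t, uint64_t> last_seq_by_server;  // server_id -> last seq_no
  uint64_t highest_seq_no= 0;                       // highest seq_no seen
};

class BinlogStateSketch
{
public:
  // Record a GTID that was written to the binlog.
  void record(uint32_t domain_id, uint32_t server_id, uint64_t seq_no)
  {
    DomainState &d= domains_[domain_id];
    d.last_seq_by_server[server_id]= seq_no;
    d.highest_seq_no= std::max(d.highest_seq_no, seq_no);
  }

  // Next sequence number to allocate in a domain (assumption: one past
  // the highest seen so far; an unknown domain starts at 1).
  uint64_t next_seq_no(uint32_t domain_id) const
  {
    auto it= domains_.find(domain_id);
    return it == domains_.end() ? 1 : it->second.highest_seq_no + 1;
  }

  // Look up the last seq_no logged for a (domain_id, server_id) pair.
  // Returns false if nothing was logged for that pair.
  bool last_seq_no(uint32_t domain_id, uint32_t server_id,
                   uint64_t *out) const
  {
    auto d= domains_.find(domain_id);
    if (d == domains_.end())
      return false;
    auto s= d->second.last_seq_by_server.find(server_id);
    if (s == d->second.last_seq_by_server.end())
      return false;
    *out= s->second;
    return true;
  }

private:
  std::map<uint32_t, DomainState> domains_;
};
```

In this model, restoring only one GTID per domain would recover `highest_seq_no` but leave the per-server map with a single entry, which is the gap the message points out for the non-strict case.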
I wonder what kind of production environment tolerates lost transactions or alternate futures. It's really sad to hear that by intentional design MariaDB doesn't fit well into those environments that don't want to tolerate db inconsistencies...
I have no idea where you heard that. Can you please be more concrete? Eg. give an example where the MariaDB design makes it impossible to avoid lost transactions, alternate futures, or other db inconsistencies?
Just out of curiosity: could you tell me what legitimate sequence of events can lead to a hole in sequence numbers?
There are many ways. For example using one of the --replicate-ignore-* options. For example if you have a master with two schemas, you could have two slaves S1 and S2 each replicating one schema. You could even have an aggregating slave A on the third level that uses multi-master to replicate each schema from each second-level slave back to a single server. M->S1->A and M->S2->A.

Of course, that third-level slave can not use GTID positioning unless you correctly configure different domain ids for the original first-level master transactions depending on schema used. _Exactly_ because in this case GTID can not ensure against lost transactions or other corruption. But thanks to the design, GTID can still be enabled and used in other parts of the replication hierarchy.
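The way filtering produces holes can be shown with a small sketch. It is a simplified model, not server code: `Event`, `replicated_seq_nos` and `has_holes` are hypothetical names, and the model assumes a single domain in which a filtering slave keeps the master's original sequence numbers for the events it applies.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified binlog event: one GTID sequence number in a
// single domain, tagged with the schema the transaction touched.
struct Event
{
  uint64_t seq_no;
  std::string schema;
};

// Sequence numbers a filtering slave would keep, assuming it replicates
// only `kept_schema` and preserves the master's GTIDs in its own binlog.
std::vector<uint64_t> replicated_seq_nos(const std::vector<Event> &master,
                                         const std::string &kept_schema)
{
  std::vector<uint64_t> kept;
  for (const Event &ev : master)
    if (ev.schema == kept_schema)
      kept.push_back(ev.seq_no);
  return kept;
}

// True if consecutive entries are not exactly one apart.
bool has_holes(const std::vector<uint64_t> &seq_nos)
{
  for (std::size_t i= 1; i < seq_nos.size(); i++)
    if (seq_nos[i] != seq_nos[i - 1] + 1)
      return true;
  return false;
}
```

If the master logs sequence numbers 1 to 4 alternating between schemas "a" and "b", a slave keeping only "a" ends up with 1 and 3, so a check that forbids holes would reject a perfectly legitimate binlog on that slave.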
Don't forget the special case when the GTID requested by slave is the last event in this domain in the previous binlog file. Then you don't look into that file and start serving directly from the next event which won't be equal to what slave requested.
Agree.
Well, if I understood you correctly all test cases shouldn't work by design. Maybe only except the second case when server doesn't replicate at all.
On the contrary, from what I could determine all the test cases should work, and I will fix the one bug where they do not (and the error message if you really insist).
So, in MySQL 5.1 with Google's Group IDs binlog didn't have real information.
Well, Google's Group ID must also need to store last GTID logged persistently, so that it can continue at the right point after a server restart/crash. Where is this stored? MariaDB GTID chooses to store this in the binlog, to avoid the overhead of extra InnoDB row operations for each transaction to store it in a table or system tablespace header. One way or the other, Google's Group ID must store this somewhere.

AFAIK, Google's Group ID is not on by default, and needs to be explicitly enabled. Enabling it adds overhead to every binlog event. In contrast, MariaDB GTID is on by default, and the implementation actually _decreases_ the size of binlog events compared to 5.5. Making this work requires the extra information in Gtid_list.

 - Kristian.
On Tue, Aug 13, 2013 at 3:26 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Note that the patch I've attached has test cases that should reproduce the problems.
Thanks, I've now gone through the testcases also. Let me number the individual tests as follows:
1. Check that gap in seq_no without binlogs doesn't allow to replicate
2. The same test but with starting master GTID from different server_id
3. Check processing of alternate futures
4. Check alt future when divergence is in last event in binlog file
5. Check alt future without binlogs
I tried the test cases against current 10.0-base.
(5) fails, this is a bug. It should fail also in non-strict mode. I will fix (as I said in comment to MDEV-4820).
(3) Fails, but only because of a different error message (generic "event is not found in master binlog" rather than specific "slave has alternate future").
This is surprising. Test doesn't check the particular message text. How does it fail?
I can put in the "alternate future" error message, but I want to be sure you really understand what this does and the limitations.
In your test, slave does one extra transaction, master does two. Slave ends up at 0-2-112, master ends up at 0-1-113. So because 113 >= 112, we can know that slave has an alternate future.
But suppose you did the test the other way around, slave does two transactions, master does one. Then slave has 0-2-113 and master has 0-1-112. It is not the case that 112 >= 113. So we can not detect at this point that slave has an alternate future.
So now we are going to give two *different* error messages to the user essentially at random, depending on which alternate future is furthest ahead. Is this really what you want?
Well, even if I wanted it differently there's no good way to do that.
I would think it would be *better* for data center operations to have at least a consistent error message for the two situations.
Or did I misunderstand something? Can the Google patch detect alternate slave futures in this case and distinguish it from master being behind, and if so, how?
No, neither my patch on MDEV-4820 nor Google's Group ID patch in MySQL 5.1 can detect an alt future when the slave has more transactions than the master. But that doesn't matter (for us) because in a normal situation the master's GTID will continue moving forward while the slave's GTID will remain the same. So eventually (usually very quickly) we'll reach the situation when seq_no on the master is bigger and then the slave will get the "alt future" error and it will stick, i.e. won't ever change again. That's good enough for us because this small window of a different error message is pretty much impossible to catch -- you'll always look at the logs when there's already an "alt future" error in them.
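The asymmetry discussed here can be sketched as follows. This is an illustration under the strict-mode assumption that sequence numbers are monotonic within a domain; `Gtid`, `ConnectResult` and `classify_connect` are hypothetical names, and the master binlog is modelled as a plain list of GTIDs rather than real binlog files.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical GTID triple (domain_id-server_id-seq_no).
struct Gtid
{
  uint32_t domain_id;
  uint32_t server_id;
  uint64_t seq_no;
};

enum class ConnectResult { Ok, AlternateFuture, NotFound };

// Classify a slave connect request against the master's binlog,
// assuming sequence numbers only grow within a domain.
ConnectResult classify_connect(const std::vector<Gtid> &master_binlog,
                               const Gtid &slave_pos)
{
  uint64_t master_max= 0;
  for (const Gtid &g : master_binlog)
  {
    if (g.domain_id != slave_pos.domain_id)
      continue;
    if (g.server_id == slave_pos.server_id && g.seq_no == slave_pos.seq_no)
      return ConnectResult::Ok;         // exact position exists on master
    if (g.seq_no > master_max)
      master_max= g.seq_no;
  }
  // The master has reached or passed the slave's seq_no without ever
  // logging the slave's GTID, so the histories must have diverged.
  if (master_max >= slave_pos.seq_no)
    return ConnectResult::AlternateFuture;
  // Otherwise the master may simply be behind; divergence cannot be
  // told apart from that case yet.
  return ConnectResult::NotFound;
}
```

With the numbers from the thread, a slave at 0-2-112 against a master that has logged up to 0-1-113 is classified as an alternate future, while a slave at 0-2-113 against a master at 0-1-112 is only "not found" until the master logs one more transaction.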
Other than the error message, (1)-(4) all pass for me on current unmodified 10.0-base. So I am left for MDEV-4820 with one bug to fix (5) and possibly one feature request for different error message. I cannot help thinking that there is something I'm missing from all you've written already on the subject of MDEV-4820, but I don't have anything concrete. So please let me know what I'm missing.
Apparently I failed to reproduce the problem scenarios in the testing environment. Sorry, I didn't try to run it on unchanged code. But did you try the manual reproduction steps I mentioned in MDEV-4820?
With GTID, @@GLOBAL.gtid_binlog_pos is also stored in the last binlog file.
Right, but as I understood it gtid_binlog_pos is necessary only to adjust gtid_current_pos if gtid_slave_pos is behind (btw, how do you check that in non-strict mode when seq_no of the latest transaction can be less than seq_no of old transaction?). If we know for sure that gtid_slave_pos reflects the latest transaction then again gtid_binlog_pos doesn't carry any new information and can be empty. Am I missing something?
Yes, I think so.
For one, we need to know the next sequence number to use within each domain.
This information exists in gtid_slave_pos in situation I'm talking about.
More subtle, we also need to know the last sequence number for every (domain_id, server_id) pair. This is the information that allows slave to start at the correct place in the master binlog even if sequence numbers are not monotonic between server_ids.
This is where I disagree. You keep insisting that this information is necessary. By that you are basically saying: I need to support those who set up lousy multi-master replication, thus I won't treat domain_id as a single replication stream, I'll treat a pair (domain_id, server_id) as a single replication stream (meaning there could be several replication streams within one domain_id). So virtually you merge server_id into domain_id and create one 64-bit domain_id. And then each such merged domain_id has its own replication position (determined by seq_no) and using that you determine where in binlog slave should start to replicate.

But again it seems that MariaDB currently is working inconsistently even in this setup: slave passes to server GTID only for one of server_ids. But binlog events for this server_id don't have any particular order related to binlog events from another server_id. So master can easily send events that slave already has or skip some events that slave doesn't have. Even if you put some protection on slave to not re-execute events that it already has you still cannot protect against skipped events. So replication in such situation probably will never break (i.e. slave won't stop with some error) but results will be questionable. And the only argument for this seems to be that anyone can be in the same situation without GTID replication...

But what I'm asserting is that when replication is set up properly, when different masters never write binlog events with the same domain_id (and if a second server creates binlog event with the same domain_id that's considered an error), when events with the same domain_id can never come to a slave through different replication streams, in such situation domain_id is one and only true domain id that needs to remember last position, i.e. seq_no. In such setup there's no need to remember all server_ids that this database had ever had as masters (there could be hundreds of those). Then (domain_id, seq_no) pair will uniquely identify server's position within domain and no other information is necessary for that. server_id is then needed only to distinguish alternate futures, that's it. I thought gtid_strict_mode was supposed to be such mode of operation.
I wonder what kind of production environment tolerates lost transactions or alternate futures. It's really sad to hear that by intentional design MariaDB doesn't fit well into those environments that don't want to tolerate db inconsistencies...
I have no idea where you heard that.
Can you please be more concrete? Eg. give an example where the MariaDB design makes it impossible to avoid lost transactions, alternate futures, or other db inconsistencies?
Here are your words from an earlier email in this thread:
I think there is a fundamental disconnect. In MariaDB GTID, I do not require or rely on monotonically increasing sequence numbers (monotonicity is required per-server-id, but not between different servers). Nor do I enforce or rely on absence of holes in the sequence numbers.
This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.
I can understand if this design is not optimal for what you are trying to do. However, implementing two different designs (eg. based on value of gtid_strict_mode) is not workable. I believe at least for the first version, the current design is what we have to work with.
If MariaDB cannot detect missing events and alternate futures that means it silently allows them to exist. You said you made this decision deliberately, so it's by design. And you said that MariaDB shouldn't implement second design here. So if we don't want to tolerate lost transactions and alternate futures and want things to break whenever such events happen, stock MariaDB cannot do it for us by design. And we have to do our custom modifications to it to support such production environment. Have I misunderstood what you said?
Just out of curiosity: could you tell me what legitimate sequence of events can lead to a hole in sequence numbers?
There are many ways. For example using one of the --replicate-ignore-* options. For example if you have a master with two schemas, you could have two slaves S1 and S2 each replicating one schema. You could even have an aggregating slave A on the third level that uses multi-master to replicate each schema from each second-level slave back to a single server. M->S1->A and M->S2->A.
Of course, that third-level slave can not use GTID positioning unless you correctly configure different domain ids for the original first-level master transactions depending on schema used. _Exactly_ because in this case GTID can not ensure against lost transactions or other corruption. But thanks to the design, GTID can still be enabled and used in other parts of the replication hierarchy.
I would think actually third-level slave will break in such situation because MariaDB doesn't allow GTIDs with the same (domain_id, server_id) pair to be out of order, right? But with such replication setup third-level slave can get a bunch of events from S1 first and then it won't be able to get any events from S2 because they'll have smaller seq_no than already present in the binlog.
So, in MySQL 5.1 with Google's Group IDs binlog didn't have real information.
Well, Google's Group ID must also need to store last GTID logged persistently, so that it can continue at the right point after a server restart/crash. Where is this stored? MariaDB GTID chooses to store this in the binlog, to avoid the overhead of extra InnoDB row operations for each transaction to store it in a table or system tablespace header. One way or the other, Google's Group ID must store this somewhere.
Sure. Last event in the binlog has last Group ID. If there's no binlogs or there are only binlogs without events then variable binlog_group_id has it.

Pavel
Pavel Ivanov <pivanof@google.com> writes:
Have I misunderstood what you said?
Yes, totally :-(
By that you are basically saying: I need to support those who set up lousy multi-master replication, thus I won't treat domain_id as a single replication stream, I'll treat a pair (domain_id, server_id) as a single replication stream (meaning there could be several replication streams within one domain_id). So
Absolutely not! This would be total breakage of the whole design.

The whole foundation of MariaDB GTID is that for each domain id, we have a well-defined binlog order that _must_ be the same on every server in the replication hierarchy. This is what allows to represent the slave position as a single GTID per domain id.

And that is why we provide the gtid_strict_mode, which enforces globally monotonic sequence numbers. Because if sequence numbers are monotonic everywhere, then it is impossible to have different binlog orders on different servers.

But even without globally monotonic sequence numbers, you can still have the same binlog order on every server. And GTID can still work correctly. But it becomes the user's responsibility to ensure that the binlog order is the same on all servers in the same hierarchy.

So in gtid strict mode it works exactly as you (and I) want. It will not hurt you that it also works in non-strict mode, which you will not use anyway. What is so hard to understand about this?
Here are your words from an earlier email in this thread:
This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.
I can understand if this design is not optimal for what you are trying to do. However, implementing two different designs (eg. based on value of gtid_strict_mode) is not workable. I believe at least for the first version, the current design is what we have to work with.
So if we don't want to tolerate lost transactions and alternate futures and want things to break whenever such events happen, stock MariaDB cannot do it for us by design.
Ok, sorry about this, I can see how this could be misunderstood. I was trying to explain too many things at once and got things mixed up.

All I meant here is that the code needs to use a bit more complex algorithms to correctly detect errors, not that such detection is impossible or should not be done. And that your patch similarly needs to be written in a different way, not that it would be impossible to do.

I think to get this discussion back on track, you need to forget everything you think I said so far, and instead accept the following points:

1. I completely agree with you on how things should work in strict mode. Binlog events should always have monotonic sequence numbers, and no lost transactions or alternate futures are acceptable.

2. MariaDB GTID also supports some non-strict usage. This is not allowed to break point (1), so you do not need to worry about it if you are happy to use strict mode.

3. I agree that the issues you report in MDEV-4820 are bugs and I will fix them.

Once we are on the same track here, we can discuss the finer details on how I make things work in non-strict mode, and why that is a desirable thing to do, if you like. But if you think I'm saying something to contradict points 1-3 above, you have misunderstood me. Ok?

----

(A couple more answers to less important questions:)
This is surprising. Test doesn't check the particular message text. How does it fail?
Just that the test case has a suppression for the error message about alternate future. Current 10.0-base gives a differently worded error message. So a different suppression is needed.
Apparently I failed to reproduce the problem scenarios in the testing environment. Sorry, I didn't try to run it on unchanged code. But did you try the manual reproduction steps I mentioned in MDEV-4820?
Yes, with exactly the same results. I found the one bug I've mentioned, and the rest passed.
This is where I disagree. You keep insisting that this information is necessary.
It is necessary in the sense that the code was written under the assumption that this information is there.
And it is necessary to be able to correctly locate an arbitrary GTID in a binlog that was written in non-strict mode.
But now that I have thought about it, I think you are right that they are not needed if the entire binlog was written obeying strict mode. It seems we can always detect errors in this case. The point is: I did not consider the possibility that user would manually remove the binlog on a master when I wrote the code. So I did not want to promise that it is supported until I thought the problem through.
You could even have an aggregating slave A on the third level that uses multi-master to replicate each schema from each second-level slave back to a single server. M->S1->A and M->S2->A.
I would think actually third-level slave will break in such situation because MariaDB doesn't allow GTIDs with the same (domain_id, server_id) pair to be out of order, right? But with such replication setup third-level slave can get a bunch of events from S1 first and then it won't be able to get any events from S2 because they'll have smaller seq_no than already present in the binlog.
Thanks, yes you are right, GTID will not work correctly on A or any slave of A in this setup.
And that is exactly the point of gtid_strict_mode. A setup such as this requires extra configuration (separate domain_id) to work correctly with GTID. So we provide strict mode to be able to give an error when the configuration is incorrect. But we have to have this off by default to not break upgrades.
- Kristian.
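
To make the M->S1->A / M->S2->A problem concrete, here is a minimal Python sketch of a per-domain ordering check. It is a toy model and not MariaDB code: the names `Gtid`, `DomainOrderCheck` and `OutOfOrderError` are invented for illustration, and the rule it applies (within one domain_id, each new seq_no must be greater than the last one binlogged) is an assumption based on the description of strict mode in this thread.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


class OutOfOrderError(Exception):
    """Raised by this toy model where strict mode would report an error."""


class DomainOrderCheck:
    """Hypothetical model: tracks the last seq_no binlogged per domain_id."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict
        self.last_seq_no: Dict[int, int] = {}

    def log(self, gtid: Gtid) -> None:
        last = self.last_seq_no.get(gtid.domain_id)
        # Assumed strict-mode rule: seq_no must grow within a domain.
        if self.strict and last is not None and gtid.seq_no <= last:
            raise OutOfOrderError(
                f"{gtid.domain_id}-{gtid.server_id}-{gtid.seq_no} is not "
                f"after seq_no {last} in domain {gtid.domain_id}"
            )
        self.last_seq_no[gtid.domain_id] = gtid.seq_no


if __name__ == "__main__":
    a = DomainOrderCheck(strict=True)
    # Events that originated on M (server_id 1) and arrived via S1.
    for seq_no in (1, 3, 5):
        a.log(Gtid(0, 1, seq_no))
    try:
        # An older event from M arriving later via S2, same domain.
        a.log(Gtid(0, 1, 2))
    except OutOfOrderError as err:
        print("strict mode error:", err)
```

In this model the same event is accepted if it carries a different domain_id, which corresponds to the "separate domain_id" configuration mentioned above. With `strict=False` the out-of-order event is logged silently, and keeping the binlog order consistent becomes the user's responsibility.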
I'm a little confused, but it sounds like you are agreeing with me and are willing to support my use case. Great. Looking forward to the patches. Then I'll look at the new behavior and see whether I actually understood your intentions correctly.
Pavel
On Thu, Aug 15, 2013 at 1:30 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Pavel Ivanov <pivanof@google.com> writes:
Have I misunderstood what you said?
Yes, totally :-(
By that you are basically saying: I need to support those who set up lousy multi-master replication, thus I won't treat domain_id as a single replication stream, I'll treat a pair (domain_id, server_id) as a single replication stream (meaning there could be several replication streams within one domain_id). So
Absolutely not! This would be total breakage of the whole design.
The whole foundation of MariaDB GTID is that for each domain id, we have a well-defined binlog order that _must_ be the same on every server in the replication hierarchy. This is what allows to represent the slave position as a single GTID per domain id.
And that is why we provide the gtid_strict_mode, which enforces globally monotonic sequence numbers. Because if sequence numbers are monotonic everywhere, then it is impossible to have different binlog orders on different servers.
But even without globally monotonic sequence numbers, you can still have the same binlog order on every server. And GTID can still work correctly. But it becomes the user's responsibility to ensure that the binlog order is the same on all servers in the same hierarchy.
So in gtid strict mode it works exactly as you (and I) want. It will not hurt you that it also works in non-strict mode, which you will not use anyway. What is so hard to understand about this?
Ok, I've pushed to 10.0-base a patch for MDEV-4820.
revid:knielsen@knielsen-hq.org-20130816131025-etjrvmfvupsjzq83
As far as I can determine (and I checked quite carefully), this fixes all the problems you mentioned in the bug description and in your test cases. But I could have misunderstood something.
Note that for the problem "For some reason at this point server 1 doesn't have any errors and doesn't replicate anything from server 2. Oops", the error is caught not when slave connects, but instead when the first event is received, which should be just as good. The reason is briefly explained in the changeset comment, and is to not re-introduce the bug MDEV-4485.
The error message for "alternate future" I formulated like this:
"Connecting slave requested to start from GTID %u-%u-%llu, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra errorneous transactions"
I did not want to use the term "alternate future" as this seems to be not standard terminology. The MySQL manual uses the related term "diverge".
I am not sure if you will be happy with the fix, but if not, please explain clearly if
1. You observe incorrect behavior (eg. lost transactions, alternate future not caught by error), and if so describe as clearly as possible how to reproduce; or
2. The behaviour is correct, but you are unhappy about the wording of the error messages, or how the code is implemented.
- Kristian.
PS. I hope it is clear that I greatly value your feedback. You and Elena are the only ones who have seriously worked to help improve the MariaDB GTID, and your input has already been very valuable.
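
The condition described by the "diverged" error message can be sketched roughly as follows. This is a simplified illustration only: the function `classify_start_position` and its return values are hypothetical, the master's binlog is modelled as a plain list of (domain_id, server_id, seq_no) tuples, and purged binlogs and the Gtid_list state are ignored entirely.

```python
from typing import Iterable, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def classify_start_position(master_binlog: Iterable[Gtid],
                            requested: Gtid) -> str:
    """Classify the GTID a connecting slave asks to start from.

    Returns "ok" if the GTID is in the master's binlog, "diverged" if it
    is missing while the same domain already holds higher sequence
    numbers, and "not_found" otherwise.
    """
    events = list(master_binlog)
    if requested in events:
        return "ok"
    domain_id, _server_id, seq_no = requested
    # Same domain, higher seq_no present: the slave's position cannot be
    # part of the history recorded on the master.
    has_higher = any(d == domain_id and s > seq_no for d, _sid, s in events)
    return "diverged" if has_higher else "not_found"


if __name__ == "__main__":
    binlog = [(0, 1, 10), (0, 1, 11), (0, 1, 12)]
    for pos in [(0, 1, 11), (0, 2, 11), (0, 1, 13)]:
        print(pos, classify_start_position(binlog, pos))
```

Here (0, 2, 11) stands for a transaction the slave executed on its own: the master never logged it, yet already has seq_no 12 in domain 0, so the two histories cannot be reconciled.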
OK, I performed some quick testing of the latest 10.0-base. I see a few points I'm unhappy with at the moment. These are not necessarily related to MDEV-4820, I probably should file new bugs for these. I can do that later if you want me to do that.
1. When master doesn't have binlogs and gtid_slave_pos is ahead of the GTID that slave tries to connect with you give error "The binlog on the master is missing the GTID ... requested by the slave (even though both a prior and a subsequent number does exist), and GTID strict mode is enabled". I find this error message very confusing: presence of a subsequent GTID in such situation is questionable, but there is no prior GTID in master's binlog for sure.
2. The error message "An attempt was made to binlog GTID ... which would create an out-of-order sequence number with existing GTID ..., and gtid strict mode is enabled" is confusing too, because it's issued not when slave actually tries to write event to binlog. Apparently the error condition is checked when slave considers executing the event that was just received from master. And if this event contains changes only to tables matching replicate-wild-ignore-table filter then this event won't be ever binlog'ed on slave in non-strict mode. So there's no "attempt to binlog" involved and error wording becomes not quite understandable.
3. There's error message "Specified GTID ... conflicts with the binary log which contains a more recent GTID .... If MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override the new value of @@gtid_slave_pos". It looks like it's issued inconsistently. I had in binlog empty Gtid_list, then 0-1-26, 0-1-27, 0-1-28, 0-2-29 and 0-2-30. And both gtid_slave_pos and gtid_binlog_pos were set to '0-2-30'. In this situation I was able to set gtid_slave_pos to '0-1-29' successfully and get "slave has diverged" error after START SLAVE. Then I was able to set gtid_slave_pos to '0-2-29' and get error "Attempt was made to binlog out-of-order" after START SLAVE. I'd think that at least in strict mode MariaDB shouldn't allow to set gtid_slave_pos to a value that is clearly in the past.
4. Now real bug. Start three servers S1, S2 and S3 without binlogs. Set gtid_slave_pos to the same value on all of them. Connect S2 to replicate from S1. Execute a few transactions on S1. Perform a failover, make S1 to replicate from S2. Now connect S3 to replicate from S2. At this point S3 should be able to replicate successfully because it has the same db state as S2 had in the beginning (S3 has the same gtid_slave_pos as S2 had initially), and S2 has all binlogs to move from current position on S3 to the current position on S2. But yet S3 gets error that starting GTID doesn't exist in S2's binlogs.
I think to fix this bug we should stop using gtid_slave_pos as indication of the current db state. We should make it possible to change gtid_binlog_pos when there's no events in binlogs. And when gtid_binlog_pos is changed we should force binlog rotation so that we have Gtid_list with initial value of gtid_binlog_pos. Then gtid_binlog_pos could be always used for setting initial db state and it kind of makes sense more than using gtid_slave_pos. But probably this will break the detection of slaves trying to connect using GTID before the start of binlogs...
5. Completely from different area but also GTID related bug. Take database from previous MySQL version (I've tested on the database from 5.1), start MariaDB on it, run mysql_upgrade and then try to set gtid_slave_pos to something. At this point I've got error "unable to load slave state from gtid_slave_pos table". This error was apparently remembered from MariaDB's start and reading of gtid_slave_pos table wasn't retried after mysql_upgrade actually created it.
Pavel
Pavel Ivanov <pivanof@google.com> writes:
I think to fix this bug we should stop using gtid_slave_pos as indication of the current db state. We should make it possible to
Agree.
change gtid_binlog_pos when there's no events in binlogs. And when
Ok. Actually, I think we should expose the real binlog state (what is stored in the Gtid_list event at the start of the binlog). So something like a variable
@@GLOBAL.gtid_binlog_state
Example value: '0-1-100,0-2-101'
And you get an error if you set it unless the binlog is empty.
Would this be what you need?
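
As a rough illustration of the proposed value format, the following Python sketch parses and formats a GTID list such as '0-1-100,0-2-101' and refuses to accept a new state while the binlog holds events. The helper names (`parse_gtid_list`, `format_gtid_list`, `check_can_set_binlog_state`) are hypothetical, and the sketch assumes nothing about how the server would implement the variable.

```python
from typing import List, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def parse_gtid_list(text: str) -> List[Gtid]:
    """Parse 'D-S-N,D-S-N,...' into tuples; empty text gives an empty list."""
    if not text.strip():
        return []
    gtids: List[Gtid] = []
    for item in text.split(","):
        parts = item.strip().split("-")
        if len(parts) != 3:
            raise ValueError(f"malformed GTID: {item!r}")
        domain_id, server_id, seq_no = (int(p) for p in parts)
        gtids.append((domain_id, server_id, seq_no))
    return gtids


def format_gtid_list(gtids: List[Gtid]) -> str:
    """Inverse of parse_gtid_list."""
    return ",".join(f"{d}-{s}-{n}" for d, s, n in gtids)


def check_can_set_binlog_state(binlog_events: List[Gtid],
                               new_value: str) -> List[Gtid]:
    """Return the parsed state, or fail if the binlog is not empty."""
    if binlog_events:
        raise RuntimeError("binlog is not empty; cannot set binlog state")
    return parse_gtid_list(new_value)


if __name__ == "__main__":
    state = check_can_set_binlog_state([], "0-1-100,0-2-101")
    print(state, format_gtid_list(state))
```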
it kind of makes sense more than using gtid_slave_pos. But probably this will break the detection of slaves trying to connect using GTID before the start of binlogs...
I do not think it will break that (but we will see).
5. Completely from different area but also GTID related bug. Take database from previous MySQL version (I've tested on the database from 5.1), start MariaDB on it, run mysql_upgrade and then try to set gtid_slave_pos to something. At this point I've got error "unable to load slave state from gtid_slave_pos table". This error was apparently remembered from MariaDB's start and reading of gtid_slave_pos table wasn't retried after mysql_upgrade actually created it.
Ok, I will take a look. I think there is an existing bug report on that. IIRC there is some locking issue (the variable can be accessed from a place where table locks cannot be taken to read gtid_slave_pos table), but I will see what can be done.
1. When master doesn't have binlogs and gtid_slave_pos is ahead of the GTID that slave tries to connect with you give error "The binlog on the master is missing the GTID ... requested by the slave (even though both a prior and a subsequent number does exist), and GTID strict mode is enabled". I find this error message very confusing: presence of a subsequent GTID in such situation is questionable, but there is no prior GTID in master's binlog for sure.
Hm, this sounds like a bug. Do you have a testcase?
But with @@GLOBAL.gtid_binlog_state implemented and set correctly, you will get instead the correct error message, that the position that the slave requests to connect at has been purged from the master's binlog.
2. The error message "An attempt was made to binlog GTID ... which would create an out-of-order sequence number with existing GTID ..., and gtid strict mode is enabled" is confusing too, because it's issued not when slave actually tries to write event to binlog. Apparently the error condition is checked when slave considers executing the event that was just received from master. And if this event contains changes only to tables matching replicate-wild-ignore-table filter then this event won't be ever binlog'ed on slave in non-strict mode. So there's no "attempt to binlog" involved and error wording becomes not quite understandable.
Right, I see. Thanks!
One problem here is that when using non-transactional (DDL or MyISAM), then we _do_ need to check this _before_ executing the event. Because we cannot roll back after the event.
But I agree of course that this is a bug. I will try to find a way to fix. Maybe the check can be delayed until the first event that we are actually going to execute (not filter).
3. There's error message "Specified GTID ... conflicts with the binary log which contains a more recent GTID .... If MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override the new value of @@gtid_slave_pos". It looks like it's issued inconsistently. I had in binlog empty Gtid_list, then 0-1-26, 0-1-27, 0-1-28, 0-2-29 and 0-2-30. And both gtid_slave_pos and gtid_binlog_pos were set to '0-2-30'. In this situation I was able to set gtid_slave_pos to '0-1-29' successfully and get "slave has diverged" error after START SLAVE. Then I was able to set gtid_slave_pos to '0-2-29' and get error "Attempt was made to binlog out-of-order" after START SLAVE. I'd think that at least in strict mode MariaDB shouldn't allow to set gtid_slave_pos to a value that is clearly in the past.
Right, thanks, I will check. (I can understand that 0-1-29 did not give error, though you are probably right that it should; but that 0-2-29 did not give error is surprising).
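
One possible reading of this remark, sketched in Python: a check that compares the new position only against the last GTID of the same (domain_id, server_id) pair would let 0-1-29 through but should reject 0-2-29, while a check against the whole domain would reject both. This is a guess at the reasoning and not a description of the server code; both helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def last_per_pair(binlog: List[Gtid]) -> Dict[Tuple[int, int], int]:
    """Last seq_no logged for each (domain_id, server_id) pair."""
    state: Dict[Tuple[int, int], int] = {}
    for domain_id, server_id, seq_no in binlog:
        state[(domain_id, server_id)] = seq_no
    return state


def older_than_same_server(binlog: List[Gtid], proposed: Gtid) -> bool:
    """True if the binlog holds a later GTID from the same domain and server."""
    domain_id, server_id, seq_no = proposed
    last = last_per_pair(binlog).get((domain_id, server_id))
    return last is not None and seq_no < last


def older_than_domain(binlog: List[Gtid], proposed: Gtid) -> bool:
    """True if the binlog holds any later GTID in the same domain."""
    domain_id, _server_id, seq_no = proposed
    return any(d == domain_id and s > seq_no for d, _sid, s in binlog)


if __name__ == "__main__":
    # The binlog contents from point 3 of the report.
    binlog = [(0, 1, 26), (0, 1, 27), (0, 1, 28), (0, 2, 29), (0, 2, 30)]
    for pos in [(0, 1, 29), (0, 2, 29), (0, 2, 30)]:
        print(pos,
              older_than_same_server(binlog, pos),
              older_than_domain(binlog, pos))
```

Under the per-domain comparison both 0-1-29 and 0-2-29 count as "in the past", which matches the behaviour requested for strict mode in point 3.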
4. Now real bug. Start three servers S1, S2 and S3 without binlogs. Set gtid_slave_pos to the same value on all of them. Connect S2 to replicate from S1. Execute a few transactions on S1. Perform a failover, make S1 to replicate from S2. Now connect S3 to replicate from S2. At this point S3 should be able to replicate successfully because it has the same db state as S2 had in the beginning (S3 has the same gtid_slave_pos as S2 had initially), and S2 has all binlogs to move from current position on S3 to the current position on S2. But yet S3 gets error that starting GTID doesn't exist in S2's binlogs.
This should also be fixed by setting @@GLOBAL.gtid_binlog_state.
- Kristian.
Ok. Actually, I think we should expose the real binlog state (what is stored in the Gtid_list event at the start of the binlog). So something like a variable
@@GLOBAL.gtid_binlog_state
Example value: '0-1-100,0-2-101'
And you get an error if you set it unless the binlog is empty.
Would this be what you need?
Yep, sounds like what we need.
Thanks,
Pavel
Kristian,
Could you say are you working on these? Is there an ETA? This is blocking us from pushing MariaDB into testing in the near-production environment, and I'm hesitant to implement fixes myself because I'd think you'll do it completely differently.
Thank you,
Pavel
Pavel Ivanov <pivanof@google.com> writes:
Could you say are you working on these? Is there an ETA? This is blocking us from pushing MariaDB into testing in the near-production environment, and I'm hesitant to implement fixes myself because I'd think you'll do it completely differently.
If you have time to test this then that would be a nice help. I'll see if I can come up with a quick patch (ie. later today or tomorrow).
- Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
If you have time to test this then that would be a nice help. I'll see if I can come up with a quick patch (ie. later today or tomorrow).
Please try this patch and let me know if you find any issues. I still need to implement test cases, but it seems to work from quick manual testing.
If you use this to save and restore the internal binlog state across deleting the binlogs, you should be able to have things work the same as if FLUSH LOGS + PURGE BINARY LOGS had been used, and avoid all the small issues that were caused by the deletion of the binlogs.
I've attached the patch and also appended some text that I intend to add to the documentation.
- Kristian.
-----------------------------------------------------------------------
Variable: gtid_binlog_state
Scope: global
Dynamic: Yes
Type: String
The variable gtid_binlog_state holds the internal state of the binlog. The state consists of the last GTID ever logged to the binary log for every combination of domain_id and server_id. This information is used by the master to determine whether a given GTID has been logged to the binlog in the past, even if it has later been deleted due to binlog purge.
Normally this internal state is not needed by users, as @@gtid_binlog_pos is more useful in most cases. The main usage of @@gtid_binlog_state is to restore the state of the binlog after RESET MASTER (or equivalently if the binlog files are lost). If the value of @@gtid_binlog_state is saved before RESET MASTER and restored afterwards, the master will retain information about past history, same as if PURGE BINARY LOGS had been used (of course the actual events in the binary logs are lost).
Note that to set the value of @@gtid_binlog_state, the binary log must be empty, that is, it must not contain any GTID events and the previous value of @@gtid_binlog_state must be the empty string. If not, then RESET MASTER must be used first to erase the binary log.
For completeness, note that setting @@gtid_binlog_state internally executes a RESET MASTER. This is normally not noticeable, as the variable can only be changed when the binlog is empty of GTID events. However, if executed e.g. immediately after upgrading to MariaDB 10, it is possible that the binlog is non-empty but without any GTID events, in which case all such events will be deleted, just as if RESET MASTER had been run.
-----------------------------------------------------------------------
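To make the documentation text above a little more concrete, here is a minimal Python sketch of the state it describes: one "last seq_no" per (domain_id, server_id) pair. This is a toy model, not MariaDB code. The function names `parse_binlog_state` and `may_have_been_logged` are hypothetical, and the comma-separated `domain-server-seq` format is simply the one used for the values quoted in this thread.

```python
def parse_binlog_state(text):
    """Parse 'domain-server-seq,...' into {(domain_id, server_id): seq_no}."""
    state = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain_id, server_id, seq_no = (int(x) for x in item.split("-"))
        state[(domain_id, server_id)] = seq_no
    return state


def may_have_been_logged(state, gtid):
    """Return True if the state admits that `gtid` was logged in the past.

    Assumption of this sketch: sequence numbers from one server_id within
    one domain only grow, so anything up to the recorded last seq_no may
    have been logged. A hole below that seq_no cannot be told apart here.
    """
    domain_id, server_id, seq_no = gtid
    # A missing (domain_id, server_id) entry means nothing was ever logged
    # for that pair; real seq_no values start at 1, so 0 never matches.
    return state.get((domain_id, server_id), 0) >= seq_no
```

For example, with the state `'0-1-5,0-3-10'` the GTID 0-1-1 may have been logged, whereas with the state `'0-3-10'` it cannot have been, because no entry for server_id 1 exists in domain 0.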
Thank you.
Quick note: Sys_var_gtid_binlog_state::global_update() has the label "err" in the wrong place -- it tries to free "data" before it's allocated.
And as I picked up rev 3682 from 10.0-base, it also has problems: the new variables "rev" and "glev" in write_ignored_events_info_to_relay_log should be initialized to NULL. For the new code in Gtid_list_log_event::Gtid_list_log_event(), instead of "#ifdef MYSQL_SERVER" you probably wanted to write "#if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)" or "#if defined(MYSQL_SERVER) && !defined(EMBEDDED_LIBRARY)".
Pavel
Kristian,
Unfortunately this doesn't work as expected.
I took 10.0-base r3685. Started new just bootstrapped server with server_id = 1. It has @@global.gtid_binlog_pos, @@global.gtid_slave_pos and @@global.gtid_current_pos empty. Then I execute
set global gtid_binlog_state = '0-10-10'
After that @@global.gtid_binlog_pos = '0-10-10' as expected, but both @@global.gtid_slave_pos and @@global.gtid_current_pos are still empty. Because of that server won't be able to replicate from master. If I set gtid_binlog_state to '0-1-10' though @@global.gtid_current_pos changes to '0-1-10' and everything is fine.
It looks like the problem is in the server_id check in the first loop in rpl_slave_state::iterate(). Can it be removed from there?
Pavel
Pavel Ivanov <pivanof@google.com> writes:
I took 10.0-base r3685. Started new just bootstrapped server with server_id = 1. It has @@global.gtid_binlog_pos, @@global.gtid_slave_pos and @@global.gtid_current_pos empty. Then I execute
set global gtid_binlog_state = '0-10-10'
After that @@global.gtid_binlog_pos = '0-10-10' as expected, but both @@global.gtid_slave_pos and @@global.gtid_current_pos are still empty. Because of that server won't be able to replicate from master. If I set gtid_binlog_state to '0-1-10' though @@global.gtid_current_pos changes to '0-1-10' and everything is fine.
The short answer is that you should just set both gtid_slave_pos and gtid_binlog_state on the new server.
SET GLOBAL gtid_binlog_state = '0-10-10';
SET GLOBAL gtid_slave_pos = @@GLOBAL.gtid_binlog_pos;
For the longer answer, let me try to explain:
The gtid_binlog_pos and the gtid_slave_pos are different concepts in MariaDB. The former is the last GTID logged into the binlog (for each domain). The latter is the last GTID replicated by the slave.
These become different because on the one hand a slave can use --log-slave-updates=0 (so the binlog is not updated), and on the other hand I did not want to add the overhead of updating gtid_slave_pos for every transaction on the master. So a GTID that goes into one of them may or may not go into the other.
Now let us set up a slave with
CHANGE MASTER TO master_host= ... , master_use_gtid=slave_pos;
The slave starts replication at the value of gtid_slave_pos. Every replicated GTID updates gtid_slave_pos, so to switch master we can just point it to the new host and it will continue from the correct point.
But suppose we promote a new master, and later want the old master to become a slave. The old master did not update gtid_slave_pos, so the point at which to start is the last GTID logged to the binlog, gtid_binlog_pos. Thus to start the old master replicating as a slave one should use:
SET GLOBAL gtid_slave_pos = @@GLOBAL.gtid_binlog_pos;
CHANGE MASTER TO master_host= ... , master_use_gtid=slave_pos;
and then things will proceed correctly with the new slave server.
So this is how you should think of the variables. The gtid_slave_pos is the position at which to start replication for a slave. The gtid_binlog_pos is the last GTID logged into the binlog.
Now, this creates an asymmetry - to switch a server to replicate from a new master, the user has to know if the server was a master or a slave before, and do it differently depending on which it is.
So I wanted to provide a way to avoid this asymmetry, and I implemented CHANGE MASTER TO master_use_gtid=current_pos for this. In this mode, when the slave connects, it looks into both the gtid_slave_pos and the gtid_binlog_pos to decide which of these has the most recent GTID - and then uses that GTID as the point to start replication at.
If the server was a master before, then the last GTID in the binlog will have the server's own server_id; _and_ the sequence number will be bigger than what is in the gtid_slave_pos, because sequence numbers on a master are always generated bigger than any seen before. So in this case we use the last GTID in the binlog to connect to. Otherwise we use the gtid_slave_pos.
So that is _all_ that gtid_current_pos is - it is a way for the server to guess whether it was a master or a slave before, and act accordingly. A bit of magic for casual users that do not want to be aware of whether the server they are setting up as a slave was a slave already before, or a master.
So the point is that if you want to use gtid_current_pos on a newly set up server, you need to provide correct values for _both_ gtid_binlog_pos/gtid_binlog_state _and_ gtid_slave_pos, because gtid_current_pos is the result of combining the two.
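The rule for combining the two positions can be sketched in a few lines of Python. This is only a toy model of the description above, not the server's implementation. The function name `current_pos` and the dictionary layout (domain_id mapped to a `(server_id, seq_no)` pair) are assumptions made for illustration.

```python
def current_pos(binlog_pos, slave_pos, own_server_id):
    """Combine gtid_binlog_pos and gtid_slave_pos per replication domain.

    Both arguments map domain_id -> (server_id, seq_no). The binlog entry
    wins only when it carries this server's own server_id and a higher
    seq_no than the slave position, i.e. the server looks like a former
    master in that domain. Otherwise the slave position is kept.
    """
    result = dict(slave_pos)
    for domain_id, (server_id, seq_no) in binlog_pos.items():
        if server_id != own_server_id:
            continue  # logged on behalf of another server: not "our" writes
        replicated = slave_pos.get(domain_id)
        if replicated is None or seq_no > replicated[1]:
            result[domain_id] = (server_id, seq_no)
    return result
```

Under this model, a server with server_id 1, an empty slave position and a binlog position of 0-10-10 gets an empty combined position, while a binlog position of 0-1-10 yields 0-1-10. That matches the behaviour reported earlier in the thread, and it shows why both gtid_binlog_state and gtid_slave_pos have to be set on a freshly provisioned server.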
It looks like the problem is in the server_id check in the first loop in rpl_slave_state::iterate(). Can it be removed from there?
I think so - in strict mode, the most recent GTID will always be the one with the highest sequence number, so the server_id check is not needed. On the other hand, if things are done correctly, the server_id check will make no difference, as a GTID with a different server_id cannot get into the binlog without also getting into gtid_slave_pos.
But for now I have other, more critical things I want to fix first. I think this is not a critical thing: just setting gtid_slave_pos on the new server should make things work for you? (Else let me know if I missed something.)
- Kristian.
Alright. I'd say if this is the only meaning current_pos should have then the name "current" is somewhat misleading. But ok, I'll set both gtid_binlog_state and gtid_slave_pos. It seems to be working so far.
Pavel
Kristian,
I'm sorry for reviving this old thread, but I think it still doesn't work correctly. I took the latest 10.0-base (rev 3690) and started to simulate different situations in which a slave is restored from a backup that is too old and thus can't replicate from the master. I've set up servers S1 (server_id = 1) and S2 (server_id = 2), and in all tests I make S1 the master and S2 the slave and execute CHANGE MASTER TO ... MASTER_USE_GTID = current_pos.
1. Set gtid_binlog_state and gtid_slave_pos to '0-1-10' on S1 and to '0-1-1' on S2. Try to start slave on S2. I get the correct error "Probably the slave state is too old".
2. Execute 3 transactions on S1, so its gtid_current_pos is 0-1-13. Start slave on S2 (after CHANGE MASTER); it shows the correct error "slave state is too old" again.
3. Set gtid_binlog_state and gtid_slave_pos to '0-3-10' on S1 and to '0-1-1' on S2. Try to start slave on S2. Now I get error "slave has diverged". What gives? It's not diverged, it's just behind.
4. Now execute a couple transactions on S1, its gtid_current_pos is 0-1-12 now. Start slave on S2 (remember -- its gtid_current_pos is 0-1-1). And now I see even more confusing "The binlog on the master is missing the GTID 0-1-1 requested by the slave (even though both a prior and a subsequent sequence number does exist)". I'm sorry, which prior sequence number exists?
Do you think you can fix these problems?
Pavel
Pavel Ivanov <pivanof@google.com> writes:
3. Set gtid_binlog_state and gtid_slave_pos to '0-3-10' on S1 and to '0-1-1' on S2. Try to start slave on S2. Now I get error "slave has diverged". What gives? It's not diverged, it's just behind.
Yes, it is diverged. S1 has binlog state '0-3-10'. This means that the only GTIDs it ever contained were 0-3-1, 0-3-2, 0-3-3, ..., 0-3-10. But S2 has applied GTID 0-1-1, which never existed on S1, nor can it in the future without violating strict mode. If S2 had been behind S1, then S1 must have had 0-1-1 in its binlog at some point, so the binlog state would have been '0-1-5,0-3-10' or something like that. With such a binlog state, the error message would have been "slave too old".
4. Now execute a couple transactions on S1, its gtid_current_pos is 0-1-12 now. Start slave on S2 (remember -- its gtid_current_pos is 0-1-1). And now I see even more confusing "The binlog on the master is missing the GTID 0-1-1 requested by the slave (even though both a prior and a subsequent sequence number does exist)". I'm sorry, which prior sequence number exists?
Ok, that is a bit unfortunate wording.
The point is - there is a hole in the binlog of S1 at the point of GTID 0-1-1. GTIDs 0-1-11 and 0-1-12 exist, but GTIDs 0-1-1, 0-1-2, ..., 0-1-10 do not exist in the binlog of S1 (and they never existed, according to the binlog state).
Let me change the error message text to:
"The binlog on the master is missing the GTID %u-%u-%llu requested by the slave (even though a subsequent sequence number does exist), and GTID strict mode is enabled"
to avoid the confusion in the case where, like here, the hole is at the very start of the sequence numbers.
Again, if the intention was that GTID 0-1-1 did exist in the binlog history, but was removed in a purge or restore or whatever, the binlog state should have been '0-1-X,0-3-10'. Then the error message would have been 'slave too old'.
- Kristian.
So, you keep insisting that it's important to have information about the server_id of all servers that were ever a master for the database, even with gtid_strict_mode = 1. :( I don't agree with that, because I believe this information can be useful only in circular replication or something similar. But okay, I understand your point...
Pavel
Hi,
On 08/12/2013 02:59 PM, Kristian Nielsen wrote:
I took a close look at your patch for MDEV-4820.
I think there is a fundamental disconnect. In MariaDB GTID, I do not require or rely on monotonically increasing sequence numbers (monotonicity is required per-server-id, but not between different servers). Nor do I enforce or rely on absence of holes in the sequence numbers.
I found this disturbing and not fully follow what kind of holes are possible. These GTIDS can be used by human users to start slaves on particular position. How do you know that there is really a hole in GTID numbers instead that you started slave from incorrect position ? If you set the starting point to the real hole, what happens, is the replication started from next real GTID or from the beginning ? Users could use GTID as a way to verify the slave state consistency using their own software. If the actual implementation does not allow this, it makes creating cluster replication state monitoring software very hard to implement.
This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.
I also found alternate futures disturbing: in the database field that would mean that one server is in one state and another server in a different state, which would lead to a situation where you do not know which one is consistent, or whether both are inconsistent.
R: Jan
Jan Lindström <jplindst@mariadb.org> writes:
I found this disturbing and not fully follow what kind of holes are possible. These GTIDS can be used by human users to start slaves on
For example, if you use some of the many filtering options, like --replicate-ignore-*, then there will be GTIDs on the master that are not in the binlog on the slave. This slave can itself be a master, and it will have "holes" relative to the original master (but not relative to the sub-cluster it is the master of). So note that one person's missing transaction is another person's deliberate filtering.
particular position. How do you know that there is really a hole in GTID numbers instead that you started slave from incorrect position ?
You can always use the contents of the binlogs to know this. You can search the binlogs for your GTID and determine if it was a) logged in an earlier binlog that was purged, b) found in the binlog, c) a "hole" due to filtering or whatever, or d) not yet existing at all in the binlog (not yet received from the master or completely alternate future).
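The four outcomes a) to d) can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how the server scans its binlogs. The name `classify_gtid` is hypothetical, `purged_state` stands for the last seq_no per (domain_id, server_id) recorded before the oldest binlog still on disk, and the "hole" test assumes that sequence numbers within a domain only grow.

```python
def classify_gtid(requested, purged_state, binlog):
    """Classify a GTID against the binlogs still on disk.

    requested:    (domain_id, server_id, seq_no)
    purged_state: {(domain_id, server_id): last seq_no logged before the
                   oldest remaining binlog file}
    binlog:       list of (domain_id, server_id, seq_no) still on disk
    """
    domain_id, server_id, seq_no = requested
    if purged_state.get((domain_id, server_id), 0) >= seq_no:
        return "purged"          # a) logged in an earlier, purged binlog
    if requested in binlog:
        return "found"           # b) present in the binlog
    if any(d == domain_id and s > seq_no for d, _, s in binlog):
        return "hole"            # c) skipped, but later GTIDs exist
    return "not yet logged"      # d) not (yet) in this server's history
```

With `purged_state = {(0, 3): 10}` and a binlog holding 0-1-11 and 0-1-12, the GTID 0-3-5 is classified as purged, 0-1-12 as found, 0-1-1 as a hole, and 0-1-20 as not yet logged.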
If you set the starting point to the real hole, what happens, is the replication started from next real GTID or from the beginning ?
Usually I would consider it an error for a slave to try to start from a hole. In gtid strict mode, we give the error. But in non-strict mode, replication is allowed to start from the next GTID, so that we remain compatible with all existing usage of replication. I've very carefully tried to make sure that we can do everything correctly, same as if we enforced monotonic sequence numbers with no holes.
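The difference between the two modes when a slave asks to start from a hole can be sketched as follows. This Python fragment is a simplified illustration restricted to a single domain with ascending sequence numbers. The name `resume_after` and the use of `LookupError` are choices made for this sketch and do not correspond to server code or error codes.

```python
def resume_after(seq_nos, requested, strict):
    """Return the seq_no of the first event to send after `requested`.

    seq_nos:   ascending sequence numbers present in the master's binlog
               for one domain (assumption: nothing relevant was purged)
    requested: the seq_no the slave says it has already applied
    strict:    models gtid_strict_mode on the master
    Returns None when the slave is already at or past the end.
    """
    later = [s for s in seq_nos if s > requested]
    is_hole = requested not in seq_nos and bool(later)
    if is_hole and strict:
        # Strict mode refuses to start from a position that is missing.
        raise LookupError("requested GTID is missing from the binlog")
    # Non-strict mode (or an exact match): continue with the next GTID.
    return later[0] if later else None
```

With `seq_nos = [1, 2, 5, 6]`, a slave positioned at 2 resumes at 5 in either mode. A slave positioned at 3 also resumes at 5 in non-strict mode, but gets an error in strict mode.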
Users could use GTID as a way to verify the slave state consistency using their own software. If the actual implementation does not allow this, it makes creating cluster replication state monitoring software very hard to implement.
What exactly is it that the implementation does not allow, or which is hard to implement?
I also find alternate futures disturbing. In the database field that would mean that one server is in one state and another is in a different state, which leads to a situation where you do not know which one is consistent, or whether both are inconsistent.
Yeah, welcome to the scary world of MySQL replication. Some people do crazy stuff with it and certainly have "alternate futures" as daily normal operation. MariaDB GTID needs to support both that kind of anarchy, and also disciplined setups where alternate futures are considered a severe error to be avoided. This is one of the things that makes the problem so hard. Any constructive help in reaching this goal is appreciated. - Kristian.
Hi, On 08/13/2013 09:49 AM, Kristian Nielsen wrote:
You can always use the contents of the binlogs to know this. You can search the binlogs for your GTID and determine if it was a) logged in an earlier binlog that was purged, b) found in the binlog, c) a "hole" due to filtering or whatever, or d) not yet existing at all in the binlog (not yet received from the master or completely alternate future).
There is a big assumption here, that you have that binlog file available. If it is c) and d) look very similar if this binlog is the only one available currently.
What exactly is it that the implementation does not allow, or which is hard to implement?
Let's assume we do not have the master up and running, but there are several slaves to choose from. If every slave has a different last executed GTID, what automatic rule could you come up with to choose the correct, up-to-date slave as master? The fact that some slave has the largest GTID does not mean that it has executed all the same GTIDs as the former master (now dead, and it might have lost its binlog). If all slaves are in different alternate futures, I am not sure which one to select, or maybe we should select a superposition of all of those ;-) R: Jan
Jan Lindström <jplindst@mariadb.org> writes:
Hi,
On 08/13/2013 09:49 AM, Kristian Nielsen wrote:
You can always use the contents of the binlogs to know this. You can search the binlogs for your GTID and determine if it was a) logged in an earlier binlog that was purged, b) found in the binlog, c) a "hole" due to filtering or whatever, or d) not yet existing at all in the binlog (not yet received from the master or completely alternate future).
There is a big assumption here, that you have that binlog file available.
It is not "that binlog file", or any individual binlog file. It is the whole binlog (master-bin.index, master-bin.XXXXXX, master.info). "The binlog" is always available on the master. Even if you RESET MASTER, and make the binlog empty - it is still the binlog. On a slave, the binlog is not available, so the design is carefully made so that the code never needs to access the binlog on the slave for things to work properly. If some part (or all) of the binlog is missing for whatever reason (RESET MASTER, purge, manual delete), and this part would be needed to safely replicate, an error results and replication is halted (at least in strict mode).
If it is c) and d) look very similar if this binlog is the only one available currently.
Sorry, I do not understand that sentence.
c) is when slave asks for D-S-N1 which never existed in our binlog, but we do have D-S-N2, with N2 > N1. Since sequence numbers are guaranteed to be monotonic, we can be sure that this transaction is missing and give an error in strict mode.
d) is when the last GTID we have for domain D and server id S is D-S-N1, but slave asks for D-S-N2, N2 > N1. We cannot know for sure if this is because slave has an alternate future, or if it is just because slave got D-S-N2 from an upper-level master before we did. Since we cannot know for sure, we give an error in all cases (strict or non-strict) to be safe. If we have some other GTID D-S'-N3 with N3 >= N2, we can guess that we have the "alternate future" case and give a more detailed error as Pavel suggests, but as I explained in the previous mail this is somewhat unreliable.
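The distinction between c) and d) can be sketched roughly as follows. This is an illustration only, not the actual server logic: Lookup, Action, classify and decide are hypothetical names, `logged` is assumed to be the sorted list of sequence numbers the master's binlog holds for one domain and one server id, and case a) (purged binlogs) is deliberately left out. The "start from next GTID" result for a hole in non-strict mode follows the behaviour described earlier in this thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical outcome of looking up a requested seq_no in the binlog.
enum class Lookup { Found, Hole, Ahead };

// Hypothetical reaction of the master to the slave's start position.
enum class Action { Start, StartFromNext, Error };

// `logged` must be sorted ascending. Assumption: nothing has been purged,
// so a missing seq_no below the last logged one is treated as a hole.
Lookup classify(const std::vector<std::uint64_t>& logged,
                std::uint64_t requested) {
    if (std::binary_search(logged.begin(), logged.end(), requested))
        return Lookup::Found;                   // case b)
    if (!logged.empty() && requested < logged.back())
        return Lookup::Hole;                    // case c): a later GTID exists
    return Lookup::Ahead;                       // case d): slave is ahead of us
                                                // (an empty list also lands here
                                                // in this sketch)
}

Action decide(Lookup result, bool strict_mode) {
    switch (result) {
    case Lookup::Found:
        return Action::Start;
    case Lookup::Hole:
        // Error only in strict mode; otherwise continue from the next GTID.
        return strict_mode ? Action::Error : Action::StartFromNext;
    case Lookup::Ahead:
        // Cannot tell an alternate future from a not-yet-received event,
        // so this is an error in both modes.
        return Action::Error;
    }
    return Action::Error;  // not reached; keeps compilers quiet
}
```

For a binlog holding sequence numbers 10, 11 and 15, a request for 13 is a hole and a request for 20 is ahead; only the hole is tolerated, and only in non-strict mode.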
What exactly is it that the implementation does not allow, or which is hard to implement?
Let's assume we do not have the master up and running, but there are several slaves to choose from. If every slave has a different last executed GTID, what automatic rule could you come up with to choose the correct, up-to-date slave as master? The fact that some slave has the largest GTID does not mean that it has executed all the same GTIDs as the former master (now dead, and it might have lost its binlog).
You have to understand that there is a difference between what the mysqld server code can do, and what the application layer (or user, or failover scripts or whatever) can do. The GTID design is made so that the mysqld code is not allowed to assume that sequence numbers are globally monotonic (and it is written so that it does not have to). This means the user is not obliged to obey such an assumption; even if she does not, the code still promises to work correctly.
But the user is free to decide to restrict herself to globally monotonic GTIDs. In fact, such a decision is encouraged, and gtid_strict_mode is available to assist such users (by giving an error if the restriction would otherwise be violated). Such a disciplined user is then free to use this restriction to make more informed decisions, such as which slave to promote and so on.
Note btw. that there is a standard solution described in the GTID documentation that works for your particular problem in any case. You can pick any slave at random, let it replicate from every other slave with START SLAVE UNTIL to catch up to each one, and then it will be suitable as the master. This is needed in the general case with more than one replication domain. But I agree that a common case will be a single domain, gtid_strict_mode=1, and just picking the slave with the highest sequence number as the new master. This will work fine in strict mode.
MariaDB GTID is flexible, without sacrificing any safety. At least that is the ambition.
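For the common case mentioned above (single domain, gtid_strict_mode=1), the selection rule a failover script might apply could look like this sketch. It lives in the application layer, not in mysqld; SlaveState and pick_new_master are hypothetical names, and the rule is only meaningful under the assumption that the user has kept sequence numbers globally monotonic.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical view of one slave as seen by a failover script:
// its name and the seq_no of the last GTID it executed in the single domain.
struct SlaveState {
    std::string name;
    std::uint64_t last_seq_no;
};

// Returns the slave with the highest executed seq_no, or nullptr if the
// list is empty. Valid only for one replication domain with strict mode
// enforced everywhere; with several domains, or with alternate futures,
// the highest number says nothing about completeness.
const SlaveState* pick_new_master(const std::vector<SlaveState>& slaves) {
    if (slaves.empty()) return nullptr;
    auto it = std::max_element(
        slaves.begin(), slaves.end(),
        [](const SlaveState& a, const SlaveState& b) {
            return a.last_seq_no < b.last_seq_no;
        });
    return &*it;
}
```

Outside that restricted setup, the catch-up procedure with START SLAVE UNTIL described above remains the general answer.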
If all slaves are in different alternate futures, I am not sure which one to select, or maybe we should select a superposition of all of those ;-)
Well, either you want alternate futures in your replication hierarchy or you do not. If you do not want it, you can enable strict mode and get an error if you mess up. If you do want it, I think the burden is on you to define what it means to correctly promote a new master. No? - Kristian.
participants (3): Jan Lindström, Kristian Nielsen, Pavel Ivanov