On Mon, Aug 12, 2013 at 4:59 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I took a close look at your patch for MDEV-4820.
I think there is a fundamental disconnect. In MariaDB GTID, I do not require or rely on monotonically increasing seqeunce numbers (monoticity is requred per-server-id, but not between different servers). Nor do I enforce or rely on absence of holes in the sequence numbers.
This decision was a hard one to make and I spent considerable thought on this point quite early. It is true that this design reduces possibilities to detect some kinds of errors, like missing events and alternate futures.
I can understand if this design is not optimal for what you are trying to do. However, implementing two different designs (eg. based on value of gtid_strict_mode) is not workable. I believe at least for the first version, the current design is what we have to work with.
I wonder what kind of production environment tolerates lost transactions or alternate futures. It's really sad to hear that by intentional design MariaDB doesn't fit well into those environments that don't want to tolerate db inconsistencies...
The gtid_strict_mode is about helping the user by giving an error instead of logging certain undeirable event sequences to the binlog. It should not cause different non-error behaviour ever, nor should it prevent working with binlogs that were logged when strict mode was not enabled.
diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc --- a/sql/rpl_gtid.cc +++ b/sql/rpl_gtid.cc @@ -993,7 +993,7 @@ rpl_binlog_state::check_strict_sequence(uint32 domain_id, uint32 server_id,
if ((elem= (element *)my_hash_search(&hash, (const uchar *)(&domain_id), 0)) && - elem->last_gtid && elem->last_gtid->seq_no >= seq_no) + elem->last_gtid && elem->last_gtid->seq_no + 1 != seq_no)
If I understand this part correctly, this is about enforcing that there are no holes in sequence numbers. This goes against the GTID design.
Just out of curiosity: could tell me what legitimate sequence of events can lead to hole in sequence numbers?
+ else if (*first_event && event_gtid.seq_no > gtid->seq_no) + { + /* + Earlier we decided that this binlog file has event that we need, + but the first event in the file has seq_no beyond what we need. + This can happen in two cases: + - this file is the first binlog we have and Gtid_list event + in it is empty; + - there is a gap between the last event mentioned in Gtid_list + and the first event in the binlog file. + As the first reason is the most probable one let's issue the + appropriate error message. + */ + my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG; + *error_gtid= *gtid; + return "Could not find GTID state requested by slave in any " + "binlog files. Probably the slave state is too old and " + "required binlog files have been purged"; + } + }
This "first_event" (which I admit I do not fully understand) looks wrong - all such things usually need to be per-domain.
You are right, although I'd think it shouldn't be first_event per-domain. Instead the Gtid_list event should be remembered and then first seen event should be compared to what was in Gtid_list...
However, I think this is the place in the code I mention above, where we get an event in a domain that the slave requested, but we did not have anything at connect time. So here I want an even stricter check, and also in non-strict mode: The first GTID in the domain must be _equal_ to what the slave requested, or we should fail.
Don't forget the special case when the GTID requested by slave is the last event in this domain in the previous binlog file. Then you don't look into that file and start serving directly from the next event which won't be equal to what slave requested.
----
Next I will take a closer look at your test cases and see what can be done to fix it in a way compatible with the basic design.
Well, if I understood you correctly all test cases shouldn't work by design. Maybe only except the second case when server doesn't replicate at all.
I also want to re-iterate the suggestion to not delete binlogs when cloning a master to a new master. The most recent binlog file has real information (in the initial Gtid_list event) that is needed for proper operation as a master, and it really doesn't make sense to first purposely corrupt the server state and subsequently try to hack the code to fix the damage. If one really wanted not to copy any binlog files, then a better idea would be a way to restore the state, eg. a RESET MASTER TO gtid_list='xxx,yyy,...' command or something like that.
So, in MySQL 5.1 with Google's Group IDs binlog didn't have real information. I guess we can reevaluate what we should do with MariaDB. Although it sounds like whatever we do it still won't fit us without hacks that go against MariaDB's design -- we'll still need to hack to prohibit any transaction loss or alternate futures... Pavel