Hi Pavel,

Thanks for the explanation. I was following your discussion about "strict" mode as well. So I guess I can formulate my concerns more or less compactly:

1) Although server ID allows us to distinguish between 0-0-101 and 0-1-101 in the example below, it does not seem to be used for anything useful except to log events with duplicate sequence numbers within a given domain ID, which, as we know, should not be there.

2) Even the ability to distinguish between 0-0-101 and 0-1-101 does not seem to work consistently and is a matter of luck, since either one can get lost after a couple of failovers. Why bother at all?

So it seems that once you forbid logging of duplicate sequence numbers (making "strict" mode the only mode, at least in this context), you can safely discard server ID from the GTID, no?

Regards,
Alex

On 2013-05-08 02:02, Pavel Ivanov wrote:
Helping Kristian to answer questions (see below). He can elaborate if he wishes to.
On Tue, May 7, 2013 at 12:52 PM, Alex Yurchenko <alexey.yurchenko@codership.com> wrote:
On 2013-05-07 17:13, Kristian Nielsen wrote:
Alex Yurchenko <alexey.yurchenko@codership.com> writes:
From the documentation the purpose of domain ID in GTID is quite clear. But what is the role of server ID?
The role is mainly to ensure uniqueness of GTID when domain_id is not configured correctly.
Since both are configured manually, and domain ID can simply default to 0 in simple setups, I'd imagine that having server ID configured incorrectly (simply forgetting to configure it) is far more probable than having domain ID incorrect. Actually, since domain ID is an arbitrary node group ID, what is "incorrect" here? ANY value just makes the node a member of the corresponding domain, so ANY domain ID value is legal. Whereas server ID can certainly be incorrect (a duplicate within the domain).
Incorrect in this case would be having multi-master replication with independent replication streams having the same domain_id.
Replication already requires server ID to be unique, so (server_id, sequence number) will be globally unique as long as sequence number is increased locally on each server.
Domain ID is not required to be unique; in fact it will typically be shared by master and slave. It is a common mistake to do a manual transaction on the slave while transactions are also being done at the same time on the master. Having server_id in the GTID prevents two different transactions from ending up with the same GTID.
So suppose we have nodes N0, N1 and N2 with IDs 0-0, 0-1, 0-2 respectively.
Initially N1 and N2 both replicate from N0 and have identical DB contents.
At 0-0-10 N2 goes to maintenance.
After 0-0-100 someone executes a local transaction on N1 and it gets logged as 0-1-101. Right?
Right.
So if now N0 executes another transaction, what will be its GTID? a) on N0 - 0-0-101?
Correct.
b) on N1 - 0-0-102 or 0-0-101? (as your documentation states
On N1 it will be the same -- 0-0-101. The purpose of GTID is that the same transaction has the same GTID on each server.
The server ID is set to the server ID of the server where the event group is first logged into the binlog. The sequence number is increased on a server for every event group logged.
So it is actually another question: is the sequence number not set on the master server but always computed locally?)
For every transaction replicated from the master, the sequence number is set on the master. For each transaction executed locally on the slave, the sequence number is generated locally.
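For illustration, the rule above can be sketched as a toy model in Python. The `Gtid` and `Server` names are hypothetical, and the assumption that a local transaction takes the last logged sequence number plus one is inferred from the 0-0-100 / 0-1-101 example in this thread, not from the server code.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


class Server:
    """Toy model of one server's binlog for a single replication domain."""

    def __init__(self, domain_id: int, server_id: int) -> None:
        self.domain_id = domain_id
        self.server_id = server_id
        self.last_seq_no = 0  # highest sequence number logged so far
        self.binlog: List[Gtid] = []

    def execute_local(self) -> Gtid:
        # Local transaction: the GTID is generated here, with our server ID.
        # Assumption: the next sequence number is last logged + 1.
        gtid = Gtid(self.domain_id, self.server_id, self.last_seq_no + 1)
        self._log(gtid)
        return gtid

    def apply_replicated(self, gtid: Gtid) -> None:
        # Replicated transaction: the GTID assigned on the master is kept.
        self._log(gtid)

    def _log(self, gtid: Gtid) -> None:
        self.binlog.append(gtid)
        self.last_seq_no = max(self.last_seq_no, gtid.seq_no)


n0 = Server(domain_id=0, server_id=0)
n1 = Server(domain_id=0, server_id=1)

for _ in range(100):  # 0-0-1 .. 0-0-100, replicated to N1
    n1.apply_replicated(n0.execute_local())

local = n1.execute_local()        # manual transaction on the slave
from_master = n0.execute_local()  # next transaction on the master
n1.apply_replicated(from_master)

print(local, from_master)                # 0-1-101 0-0-101
print([str(g) for g in n1.binlog[-2:]])  # ['0-1-101', '0-0-101']
```

In this model N1 ends up with two events carrying sequence number 101, which is the situation discussed in the rest of the thread.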
Or, does N1 detect a problem at this point? If yes, how exactly?
N1 doesn't detect a problem currently, but Kristian plans to implement a "strict gtid mode" in which N1 will detect a problem.
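This thread does not say how the planned strict mode would detect the problem. As a sketch of one possible rule, and purely an assumption here, a server could refuse to log any event whose sequence number does not advance past the last one already logged in that domain. The names `check_strict` and `OutOfOrderGtid` are hypothetical.

```python
class OutOfOrderGtid(Exception):
    """Raised when an event would not advance the domain's sequence number."""


def check_strict(last_seq_no: int, seq_no: int) -> None:
    # Hypothetical rule: every event logged in a domain must carry a
    # sequence number greater than the last one already in the binlog.
    if seq_no <= last_seq_no:
        raise OutOfOrderGtid(
            f"seq_no {seq_no} is not greater than last logged {last_seq_no}"
        )


last_seq_no = 100               # N1 has replicated up to 0-0-100
check_strict(last_seq_no, 101)  # the local 0-1-101 is accepted
last_seq_no = 101

try:
    check_strict(last_seq_no, 101)  # the replicated 0-0-101 now collides
    detected = False
except OutOfOrderGtid:
    detected = True

print(detected)  # True
```

Under this assumed rule the check looks only at domain ID and sequence number. Server ID plays no part in the detection itself.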
How is server ID involved there?
Server ID is involved in the sense that there's no confusion when you talk about the transaction with sequence number 101 in your example above. There are two transactions with sequence number 101: 0-0-101 and 0-1-101. So server ID gives these transactions distinct GTIDs despite the same sequence number.
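A small Python sketch of that point, treating a GTID as a plain (domain ID, server ID, sequence number) triple. The `Gtid` type and the dictionary labels are illustrative only.

```python
from typing import Dict, NamedTuple, Tuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


on_master = Gtid(0, 0, 101)  # transaction executed on N0
on_slave = Gtid(0, 1, 101)   # local transaction executed on N1

# Keyed by the full GTID, both transactions are kept apart.
by_full_gtid: Dict[Gtid, str] = {
    on_master: "txn from N0",
    on_slave: "local txn on N1",
}

# Keyed by (domain ID, sequence number) only, they collide.
by_domain_and_seq: Dict[Tuple[int, int], str] = {}
for gtid, txn in by_full_gtid.items():
    by_domain_and_seq[(gtid.domain_id, gtid.seq_no)] = txn

print(len(by_full_gtid))       # 2
print(len(by_domain_and_seq))  # 1
```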
Now if we can get past this point without an error at N1, and start N2 to replicate from N1, I take it that it will receive and commit 0-1-101. But will it ever record it somewhere, or will its state simply be 0-0-XXX?
Yes, N2 will receive transaction 0-1-101 and it will put it in binlogs as 0-1-101, the same way as it was on N1.
If at 0-0-110 we failover N0 to replicate from N2, will it receive 0-1-101?
No, if you fail over when both N0 and N2 already have 0-0-110, then N0 won't receive 0-1-101, because it will start replication from the first transaction after 0-0-110. OTOH, if at 0-0-110 you take N0 out, restore it to the pre-0-1-101 state and connect it to replicate from N2, then N0 will receive 0-1-101.
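The two cases can be sketched in Python as follows. This is a simplification under stated assumptions: the new master is modelled as sending everything that follows the slave's position in its binlog, N2's binlog order is taken to match N1's, and "restored to the pre-0-1-101 state" is taken to mean position 0-0-100. The helper `events_after` is hypothetical.

```python
from typing import List, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def events_after(binlog: List[Gtid], position: Gtid) -> List[Gtid]:
    """Return the events that follow `position` in the master's binlog."""
    return binlog[binlog.index(position) + 1:]


# N2's binlog as replicated from N1: 0-0-1 .. 0-0-100, then 0-1-101,
# then 0-0-101 .. 0-0-110.
n2_binlog: List[Gtid] = (
    [(0, 0, s) for s in range(1, 101)]
    + [(0, 1, 101)]
    + [(0, 0, s) for s in range(101, 111)]
)

caught_up = events_after(n2_binlog, (0, 0, 110))  # N0 already at 0-0-110
restored = events_after(n2_binlog, (0, 0, 100))   # N0 restored to 0-0-100

print((0, 1, 101) in caught_up)  # False
print((0, 1, 101) in restored)   # True
```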
Pavel
-- Alexey Yurchenko, Codership Oy, www.codership.com Skype: alexey.yurchenko, Phone: +358-400-516-011