Giuseppe Maxia <g.maxia@gmail.com> writes:
Thanks for the quick answer. What you are saying is that the transaction in node 101 is retained because it is part of the stream coming from node 100.
Yes.
Here I note a difference with the usage of domain_ids.
* If the domain_id is the same, then the transaction from 101 is discarded and replaced by the latest events coming from 100.
* If I set a different domain ID for each master, the transaction from 101 is retained even if I then insert hundreds of events from 100 while 101 stays idle.
The way it works is that @@gtid_slave_pos records, for each configured domain_id, the last GTID seen within that replication domain. So in general, @@gtid_slave_pos will have as many elements as there are domains in the replication topology (though a domain can appear empty if no GTIDs were ever replicated from it on a given server).

So this difference is always there: configuring one more domain_id will result in one extra entry in @@gtid_slave_pos.

In a simple replication topology with only a single master, domain_id is not needed. Slaves preserve commit order from their master, so the order of GTIDs in all binlogs is the same, and a slave that reconnects need only start at the point of the last GTID it applied before. When a slave switches to a new master, it is essential that binlog order is the same on old and new master for this to work correctly.

With multiple masters (multi-source, or ring-topology, or otherwise), in general we cannot be sure that binlog order will be identical between old and new master when a slave switches master.

This is the purpose of domain_id. Effectively, it makes the binlog consist of independent streams, one per replication domain. The slave keeps track of its position (last applied GTID) for each domain individually. Then only the order within each stream needs to be consistent for a slave connecting to a new master to work correctly. And this is ensured by configuring domain_id so that each domain has at most one master active at any one time.

So in your star topology, if you configure different domain ids, and you have a slave off one endpoint, you can switch it to use a different endpoint (or the hub) as a new master - and later switch it back. This requires remembering the position within each domain indefinitely.

If you were to use the same domain id everywhere, replication would still work, same as in non-GTID mode. But you would not be able to easily switch a slave from one endpoint to another - again the same as in non-GTID mode.
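To make the per-domain bookkeeping concrete, here is a small Python sketch. It is only a toy model of the behaviour described above, not server code: the function name apply_events and the sample GTIDs are made up for illustration, and it assumes the usual domain-server-sequence GTID notation.

```python
def apply_events(gtids):
    """Toy model of @@gtid_slave_pos: keep the last GTID seen per domain.

    Each GTID is a string "domain_id-server_id-seq_no". This is an
    illustrative sketch, not how the server is implemented.
    """
    pos = {}
    for gtid in gtids:
        domain_id, server_id, seq_no = (int(part) for part in gtid.split("-"))
        # A later GTID in the same domain replaces the earlier one.
        pos[domain_id] = (domain_id, server_id, seq_no)
    # Render one entry per domain, comma-separated, ordered by domain id.
    return ",".join("-".join(map(str, pos[d])) for d in sorted(pos))


# Same domain everywhere: the event from server 101 is replaced by
# later events from server 100.
same_domain = apply_events(["0-101-1", "0-100-2", "0-100-3"])

# One domain per master: the entry for domain 101 stays, no matter
# how many events arrive afterwards in domain 100.
per_master = apply_events(["101-101-1", "100-100-1", "100-100-2"])

print(same_domain)
print(per_master)
```

With a single shared domain the position collapses to one entry; with one domain per master there is one entry per domain, and an idle domain keeps its last GTID.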
To give you a clearer view of what I am doing, I am experimenting with a star topology, where I have endpoints that are masters connected to a hub, which is the node with log-slave-updates enabled. The topology works perfectly: data produced in the endpoints or in the hub reaches all the other endpoints. The only glitch is that the endpoints (master nodes without log-slave-updates) have their own transactions in gtid_slave_pos long after purging the logs in all nodes.
If this entry were missing, it would mean that upon next slave connect, that server would fetch _all_ events in that domain (that it created itself) from the master, only to skip them because of --replicate-same-server-id=0. Which surely is not intended.

I guess the thing that is not clear to me is why you consider this a glitch? Does it create any problems for your setup?

 - Kristian.