Re: [Maria-developers] Interaction between rpl_slave_state and rpl_binlog_state
Hi Andrei! The @@gtid_current_pos exists for one sole purpose. This is to let the user promote a slave as the new master and attach the old master as a slave to the new master. By using master_use_gtid=current_pos, the exact same command can be used to attach a slave to the new master, regardless of whether that slave was previously a slave or a master: CHANGE MASTER TO master_host=new_promoted_master If not using gtid_current_pos (ie master_use_gtid=slave_pos), then to let the old master become a slave of the new master, the old master's position must explicitly be set: SET GLOBAL gtid_slave_pos=@@gtid_binlog_pos This is because for efficiency reasons, the master doesn't update the mysql.gtid_slave_pos in each commit. So now we can see why only GTIDs with the servers own server_id should contribute to @@gtid_current_pos. If a GTID was replicated from another server, that GTID will appear in the @@gtid_slave_pos. If the GTID originated on this server, it will appear in @@gtid_binlog_pos. The @@gtid_current_pos is the @@gtid_slave_pos extended with GTIDs originating on this server, hence only GTIDs with our own server id. Normally, every GTID in the binlog with a different server id than our own will already be in the @@gtid_slave_pos as well - since it originated on another server and was replicated to this server. Thus, in the normal case, where user did not play tricks with the binlog and slave state, extending the @@gtid_current_pos as suggested in MDEV-18404 has no effect - the GTIDs are already in the @@gtid_slave_pos, so @@gtid_current_pos is unaffected. And in case the user deliberately modified the state, it should be up to the user to decide what goes into @@gtid_slave_state and @@gtid_binlog_state. For example, the MDEV-18404 change would make it impossible on a server to remove a replicated GTID from the @@gtid_current_pos if --log-slave-updates (without the drastic RESET MASTER). Another problem is that the server cannot reliably compare GTIDs with distinct server ids to decide which one is the most recent. There is no guarantee that sequence numbers are monotonic across different server ids. Thus the MDEV-18404 method could create completely invalid positions in some setups where @@gtid_strict_mode=0 and replication domains are not strictly maintained. I don't see the value in MDEV-18404. If the user is updating @@gtid_binlog_state (itself a very drastic operation), and wants a specific GTID to go into the slave position - just update the @@gtid_slave_pos with the desired GTID, don't leave the server with an inconsistent replication state. And finally, let me reiterate: I consider the @@gtid_current_pos a design mistake. Better to just transfer the @@gtid_binlog_pos to the @@gtid_slave_pos _only_ at the point where an old master is turned into a slave. This can be done manually already, and it would be simple to implement automatic support for this with an extra option for CHANGE MASTER. Hope this helps, - Kristian. Andrei Elkin <andrei.elkin@mariadb.com> writes:
Kristian, howdy!
I thought it's good to refer to a previous discussion that touched 'current_pos' feature now when we need to resolve MDEV-18404 which I look at under a specific angle, narrated here.
Finally, experience has shown that a _lot_ of users get problems when locally done transactions on a slave influence the slave's GTID position. In retrospect, I have realised that CHANGE MASTER TO master_use_gtid=current_pos was a mistake, only slave_pos should be used. Similarly, if a local transaction in a slave's binlog can cause transactions from the master to be silently ignored, it will cause a lot of grief for users.
To the definition of
gtid_current_pos
the docs https://mariadb.com/kb/en/library/gtid/#gtid_current_pos first introduce an intuition with
This variable is the GTID of the last change to the database for each replication domain. Such changes can either be master events (ie. local changes made by user or application), or replicated events originating from another master server.
and then define the variable essentially as an union of two sets:
For each replication domain, if the server ID of the corresponding GTID in @@gtid_binlog_pos is equal to the servers own server_id, and the sequence number is higher than the corresponding GTID in @@gtid_slave_pos, then the GTID from @@gtid_binlog_pos will be used. Otherwise the GTID from @@gtid_slave_pos will be used for that domain.
The union operation is on a *subset* of @@gtid_binlog_pos (call it B') and the whole @@gtid_slave_pos (S):
C = B' \cup S
where C @@gtid_current_pos. The @@gtid_binlog_pos subset consists of GTID:s originated by @@global.server_id (that is foreign ones excluded).
Notice that makes the union non symmetric as binary operator. A foreign GTID feels perfectly fine in S and gets into the union result.
The following use case
--connection master CREATE TABLE t (a INT); RESET MASTER; SELECT @@global.server_id; # => 1 SET @@server_id= 3; INSERT INTO t SET a=3; select @@global.gtid_binlog_state = '0-3-1'; # => 1
technically it's a 'local changes', but when S (@@global.gtid_slave_pos) is empty
SELECT @@global_current_pos = ''; # => 1
This result is another way to read MDEV-18404.
After thinking over the restricted definition of @@gtid_current_pos I could not arrive at relaxing it to account the whole @@gtid_binlog_set which fixes everything. I wonder what do think about this case, have I missed any use case that might have forced to the definition be as it is?
Below is a straightforward patch that sustains the current mtr rpl suites, expect rpl_gtid_startpos (that exactly relies at one point on the empty @@gtid_current_pos due to the union specifics, I am leaving out changes to the test at this point).
Cheers,
Andrei
diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc index 752b4172b6e..1f9802bd8f8 100644 --- a/sql/rpl_gtid.cc +++ b/sql/rpl_gtid.cc @@ -845,8 +845,7 @@ rpl_slave_state::iterate(int (*cb)(rpl_gtid *, void *), void *data, my_hash_init(>id_hash, &my_charset_bin, 32, offsetof(rpl_gtid, domain_id), sizeof(uint32), NULL, NULL, HASH_UNIQUE); for (i= 0; i < num_extra; ++i) - if (extra_gtids[i].server_id == global_system_variables.server_id && - my_hash_insert(>id_hash, (uchar *)(&extra_gtids[i]))) + if (my_hash_insert(>id_hash, (uchar *)(&extra_gtids[i]))) goto err;
mysql_mutex_lock(&LOCK_slave_state);
participants (1)
-
Kristian Nielsen