[Maria-developers] Interaction between rpl_slave_state and rpl_binlog_state
Hi Kristian, Andrei

I have a question related to rpl_slave_state.

Suppose a circular async replication A <--> B (gtid_ignore_duplicates on). Now we set a temporary server_id on server A, let's say `X`. The problem is that each event group which originates from A is executed twice.

For example, we insert into table t1 and the GTID is 0-X-2. The event goes to slave B, B applies it and sends it back to A. Since its server_id is different from A's global server_id, the event is not filtered in queue_event(). The event then goes to the handle_sql thread, and the check_duplicate_gtid function is called. Since rpl_slave_state::hash does not have an element whose sequence number is >= 2, the event is applied again.

Andrei suggested a solution of checking rpl_binlog_state in check_duplicate_gtid. This solves some problems but creates others (multi_source.gtid_ignore_duplicates fails in master_sync, and the test for mdev_10715 also fails). The reason is that sync_with_master calls master_gtid_wait, which internally calls gtid_waiting::wait_for_gtid, and this checks rpl_slave_state::hash. This method could also be patched to look at rpl_gtid_global_binlog_state, but I am not sure whether that would solve all the problems or create new ones.

Another solution might be to somehow update rpl_slave_state::hash when we write the gtid_event to the log. But this does not make sense: rpl_slave_state should be used only for slave replication.

I think we need a better solution for this.

--
Regards
sachin
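The double execution described above can be reproduced in a toy model. This is only a sketch under simplifying assumptions: `ToyServer`, its methods, and the two dictionaries are hypothetical stand-ins for queue_event(), check_duplicate_gtid, rpl_slave_state and rpl_binlog_state, not the server's actual data structures, and 99 plays the role of the temporary server_id `X`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int


class ToyServer:
    """Hypothetical model of server A; not MariaDB code."""

    def __init__(self, global_server_id):
        self.global_server_id = global_server_id
        # Stand-in for rpl_slave_state: highest seq_no applied via replication, per domain.
        self.slave_state = {}
        # Stand-in for rpl_binlog_state: highest seq_no originated locally, per domain.
        self.binlog_state = {}
        self.executed = []

    def originate(self, gtid):
        # A local transaction only touches the binlog state.
        current = self.binlog_state.get(gtid.domain_id, 0)
        self.binlog_state[gtid.domain_id] = max(current, gtid.seq_no)
        self.executed.append(gtid)

    def receive(self, gtid):
        # Step 1 (queue_event stand-in): drop events carrying our own global server_id.
        if gtid.server_id == self.global_server_id:
            return "filtered"
        # Step 2 (check_duplicate_gtid stand-in): consult the slave state only.
        if self.slave_state.get(gtid.domain_id, 0) >= gtid.seq_no:
            return "skipped"
        self.slave_state[gtid.domain_id] = gtid.seq_no
        self.executed.append(gtid)
        return "applied"


a = ToyServer(global_server_id=1)
event = Gtid(domain_id=0, server_id=99, seq_no=2)  # 99 plays the role of X
a.originate(event)         # executed locally on A under the temporary server_id
result = a.receive(event)  # the same event group comes back from B
```

In this model `result` is "applied" and the event appears twice in `a.executed`, because neither check ever looks at the locally originated (binlog) state.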
Hi All,

On Tue, Nov 28, 2017 at 3:03 PM, Sachin Setiya <sachin.setiya@mariadb.com> wrote:

> [...]
> Although this method can also be patched for to look for rpl_gtid_global_binlog_state. But I am not sure whether this will solve all problem or create new problem.

Actually this won't work, because there I can only look at the maximum GTID.

Experimental code is on this branch:
http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.1-10715

--
Regards
Sachin Setiya
Software Engineer at MariaDB
Sachin Setiya <sachin.setiya@mariadb.com> writes:

> I have some question related to rpl_slave_state. Suppose A circular async replication between A < -- > B (gtid_ignore_duplicates on)

Why do you set gtid_ignore_duplicates? This option is for multi-source replication:

https://mariadb.com/kb/en/library/gtid/#gtid_ignore_duplicates

"When set, different master connections in multi-source replication are allowed to receive and process event groups with the same GTID"

But you are not using multi-source replication here; there is only one master connection (e.g. the connection to B on slave A). Thus, the option will do nothing in this case.
> Now, we set some temp server_id on server A, lets say `X`. Now the problem is each event group which originates from A is executed 2 times. For example we insert into table t1 and gtid is 0-X-2. The event goes to slave B, B applies it, and sends it back to A. Since its server_id is different

I think you mean that A has server_id=1 (say) and B has server_id=2, but on A you do

    SET server_id=3; INSERT INTO t1 VALUES (1);

But there is no server with server_id=3 anywhere. In this case, you need to break the circle yourself somewhere, for example with CHANGE MASTER ... IGNORE_SERVER_IDS=3 on A.

To my knowledge, this has always been so for ring replication.
> Andrei suggested a solution of checking rpl_binlog_state in check_duplicate_gtid, This solution solves some problem but creates

It seems you think that --gtid-ignore-duplicates should magically ignore any attempt to apply a duplicate GTID. But that is not the case, as the documentation states (though admittedly rather briefly). --gtid-ignore-duplicates is _only_ for multi-source replication (so perhaps unfortunately named).

In this case, the conflict is not between GTIDs replicated over different master connections. It is a conflict between a transaction originated on a master and a transaction replicated from another master.
> write gtid_event in log. But this does not make sense. rpl_slave_state should be used for slave replication usage.

Agree. rpl_binlog_state should not be involved in slave GTID processing. There should be a clear separation: rpl_slave_state is what a slave has applied from another master; rpl_binlog_state is what a master has originated.

The gtid_ignore_duplicates option is already very difficult for users to understand and use correctly. It would be a mistake to make it even more complicated.

Also, this seems to originate from some Galera issue. It is well known that Galera was merged prematurely into MariaDB with a broken design, and this was never fixed. Galera issues must never influence how non-galera replication (which at least attempts to have a proper design) works.

Hope this helps,

 - Kristian.
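The effect of the IGNORE_SERVER_IDS suggestion above can be illustrated with a small ring model. This is a hedged sketch, not server code: `times_applied` and the dictionary layout are hypothetical, and the model looks only at server_id filtering, ignoring GTID positions entirely, so the "never dropped" outcome is a property of the toy rather than a claim about real behaviour.

```python
def times_applied(ring, origin, event_server_id, max_hops=6):
    """Count how often an event is applied before some server drops it.

    ring: list of dicts with "server_id" and "ignore_server_ids" (hypothetical layout).
    origin: index of the server where the event was originally executed.
    Returns None if no server dropped the event within max_hops.
    """
    applied = 0
    position = origin
    for _ in range(max_hops):
        position = (position + 1) % len(ring)
        server = ring[position]
        if (event_server_id == server["server_id"]
                or event_server_id in server["ignore_server_ids"]):
            return applied
        applied += 1
    return None


ring = [
    {"name": "A", "server_id": 1, "ignore_server_ids": set()},
    {"name": "B", "server_id": 2, "ignore_server_ids": set()},
]

# Event written on A with A's real server_id: A recognises and drops it.
normal = times_applied(ring, origin=0, event_server_id=1)

# Event written on A after SET server_id=3: no server in the ring owns that id.
stray = times_applied(ring, origin=0, event_server_id=3)

# Same event once A ignores server_id 3 (the IGNORE_SERVER_IDS idea).
ring[0]["ignore_server_ids"] = {3}
fixed = times_applied(ring, origin=0, event_server_id=3)
```

Here `normal` and `fixed` both count a single application (on B), while `stray` is None because nothing in the ring ever filters the event.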
Kristian, howdy.

Thanks for reminding us about the simple CHANGE MASTER ... IGNORE_SERVER_IDS! (This time it evaded me alone :-)) It perfectly covers the circular cluster case.

What motivated me to consider looking for duplicates in gtid_binlog_pos as well was the following observation.

A duplicate GTID (transaction) can also arrive from a separate session of the same server, but in this case the gtid_ignore_duplicates rules do not apply. Such a GTID would silently override the existing one.

On the other hand, gtid_strict_mode applies to both an ordinary server and a slave (according to the docs).

MariaDB [test]> show global variables like 'gtid_binlog_pos';
+-----------------+--------+
| Variable_name   | Value  |
+-----------------+--------+
| gtid_binlog_pos | 0-1-12 |
+-----------------+--------+
1 row in set (0.00 sec)

MariaDB [test]> set @@session.gtid_seq_no=11;
ERROR 1950 (HY000): An attempt was made to binlog GTID 0-1-11 which would create an out-of-order sequence number with existing GTID 0-1-12,

Maybe it would not be a bad idea to generalize gtid_ignore_duplicates to cover duplicates from any source, which would effectively become a "soft" mode to silently ... reject.

In other words, how about extending the GTID (operational) mode to a set:

"gtid_mode" \in { on (override by dups), strict (error out dups) + , soft (ignore dups) }

To other subjects,
Kristian Nielsen writes:

> [...]
> The gtid_ignore_duplicates option is already very difficult for users to understand and use correctly. It would be a mistake to make it even more complicated.
>
> Also, this seems to originate from some Galera issue. It is well known that Galera was merged prematurely into MariaDB with a broken design, and this was never fixed. Galera issues must never influence how non-galera replication (which at least attempts to have a proper design) works.

I would support this.

Cheers,

Andrei
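The three-way mode floated in Andrei's message can be sketched as a single decision function. Everything here is hypothetical: "gtid_mode" is only a proposal in this thread, and `handle_gtid`, `OutOfOrderGtid` and the returned strings are names invented for illustration, with the "strict" branch loosely mirroring the ERROR 1950 shown in the session above.

```python
class OutOfOrderGtid(Exception):
    """Raised by the toy 'strict' mode, loosely mirroring ERROR 1950."""


def handle_gtid(mode, highest_seq_no, seq_no):
    """Decide what to do with seq_no given the highest seq_no already binlogged.

    mode is one of the three hypothetical values from the proposal:
    "on", "strict" or "soft".
    """
    if mode not in ("on", "strict", "soft"):
        raise ValueError(f"unknown mode: {mode}")
    if seq_no > highest_seq_no:
        return "write"   # in-order GTID: accepted in every mode
    if mode == "on":
        return "write"   # duplicate or out-of-order GTID is accepted
    if mode == "strict":
        raise OutOfOrderGtid(
            f"seq_no {seq_no} is not above existing {highest_seq_no}"
        )
    return "ignore"      # "soft": silently drop the duplicate


# The situation from the session above: binlog position at 12, new seq_no 11.
on_result = handle_gtid("on", 12, 11)
soft_result = handle_gtid("soft", 12, 11)
```

With the binlog position at 12 and an incoming sequence number of 11, "on" writes the event, "soft" drops it, and "strict" raises.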
I am sure you can find some who would want something that ignores replicated GTIDs that duplicate GTIDs originating locally. I can only say that my experience is that this can cause unexpected problems, and requires a lot of thought to get well-defined semantics that users can understand and that will not bring surprises.

A central design decision for MariaDB GTID is _not_ to try to remember the whole history of GTIDs applied, unlike MySQL GTID. Because of this, there are limitations to what can be done in terms of avoiding duplicate GTIDs: the server lacks the required information.

Another decision was to allow and correctly handle out-of-order sequence numbers (e.g. gtid_strict_mode=0). This was necessary to be able to generate GTIDs by default in 10.0. But it again means that detecting duplicates is harder, and in fact only the master has the required information to do this; the slave does not (in the general case).

Finally, experience has shown that a _lot_ of users get problems when locally done transactions on a slave influence the slave's GTID position. In retrospect, I have realised that CHANGE MASTER TO master_use_gtid=current_pos was a mistake; only slave_pos should be used. Similarly, if a local transaction in a slave's binlog can cause transactions from the master to be silently ignored, it will cause a lot of grief for users.

Hope this helps,

 - Kristian.
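Kristian's point about not keeping the full GTID history can be made concrete with a toy comparison. This is a sketch under loud assumptions: both functions are hypothetical, a single domain is modelled, and "position" means only the highest sequence number seen, which is a simplification of what the server actually stores.

```python
def duplicate_by_position(highest_seq_no, seq_no):
    """Guess 'duplicate' from the last position only (one seq_no per domain)."""
    return seq_no <= highest_seq_no


def duplicate_by_history(seen_seq_nos, seq_no):
    """Decide 'duplicate' from a full set of previously applied seq_nos."""
    return seq_no in seen_seq_nos


# With gtid_strict_mode=0, sequence numbers may arrive out of order.
applied = [1, 2, 5]        # seq_no 3 and 4 were never applied here
highest = max(applied)     # all that a position-only view retains
history = set(applied)     # what a full-history view would retain

new_but_out_of_order = 3
position_says = duplicate_by_position(highest, new_but_out_of_order)
history_says = duplicate_by_history(history, new_but_out_of_order)
```

The position-only check reports sequence number 3 as a duplicate even though it was never applied, while the full-history check does not. In this simplified model, a duplicate filter built on the position alone would silently drop a genuinely new transaction.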
participants (3)
- andrei.elkin@pp.inet.fi
- Kristian Nielsen
- Sachin Setiya