[Maria-developers] Updated Gtid_slave_pos of Untracked domain creates skipped events
Hi Andrei, Kristian!

Problem:

This issue concerns MDEV-9107, where we have a 3-server master-master setup with do_domain_ids set on each replication channel to that channel's master domain ID. When one replication channel is stopped and that master is updated, the slave server updates gtid_slave_pos from the second replication channel. When the stopped channel is brought up again, the event is skipped because gtid_slave_pos has already been updated.

I have created an abstract test case for this problem.

      (3)
     ^   ^
    /     \
  (2)----->(1)

3 and 1 are configured with log_slave_updates.
3 has 2 replication channels: m2_s3 (do_domain_id=2) and m1_s3 (do_domain_id=1).
1 has one replication channel: m2_s1 (do_domain_id=2).

I have tried master_use_gtid = slave_pos and current_pos; both give the same result.

When I stop replication channel m2_s3 and then write one event each on servers (2) and (1), server 3 updates its gtid_slave_pos, which then holds the latest GTID seq_no for both domain IDs 2 and 1. When replication channel m2_s3 is brought up again, gtid_slave_pos of (3) already matches (2), so the event generated on server 2 is skipped. (I believe network lag between 2-->3 would have a similar effect.)

(I have attached the test cases.)

Possible solution:

Maybe we should have one more option, master_use_gtid = binlog_state, which would compare the server's binlog state to the GTID of the arriving event? (It is different from current_pos: current_pos selects the maximum seq_no between binlog_state and slave_pos.)

--
Regards
Sachin Setiya
Software Engineer at MariaDB
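For readers following along, the channel layout described in this message might be set up on server 3 roughly as follows. This is only a sketch, not the attached test case: the host names are placeholders, credentials and ports are omitted, and master_use_gtid could equally be current_pos, since both were reported to behave the same.

```sql
-- On server 3: two named channels, each filtered to a single domain.
-- 'server2.example' and 'server1.example' are placeholder host names.
CHANGE MASTER 'm2_s3' TO
  MASTER_HOST = 'server2.example',
  MASTER_USE_GTID = slave_pos,
  DO_DOMAIN_IDS = (2);

CHANGE MASTER 'm1_s3' TO
  MASTER_HOST = 'server1.example',
  MASTER_USE_GTID = slave_pos,
  DO_DOMAIN_IDS = (1);

START ALL SLAVES;

-- Reproduction outline from the report:
STOP SLAVE 'm2_s3';
-- ... write one transaction on server 2 and one on server 1 ...
SELECT @@gtid_slave_pos;   -- reported to already include the new domain-2 GTID
START SLAVE 'm2_s3';       -- the domain-2 event is then skipped
```

The point of the sketch is that the domain-2 event still reaches server 3 through m1_s3 (server 1 has log_slave_updates enabled), where it is filtered out but still advances gtid_slave_pos.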
Sachin Setiya <sachin.setiya@mariadb.com> writes:
This issue is regarding Mdev-9107 , where we have 3 master master with do_domain_ids of
I have created an abstract test case for this problem.

      (3)
     ^   ^
    /     \
  (2)----->(1)

3 and 1 are configured with log_slave_updates. 3 has 2 replication channels: m2_s3 (do_domain_id=2) and m1_s3 (do_domain_id=1). 1 has one replication channel: m2_s1 (do_domain_id=2).
This seems to be just user error. All these --do-xxx / --ignore-xxx replication filter options are always dangerous, and this usage seems clearly wrong. I also did not see a clearly explained reason in the bug report why this should work. On the contrary, Elena's suggestion to use --gtid-ignore-duplicates (if one really wants to do something as complex as this) seems appropriate. Is there a reason this is considered a bug (other than that the reporter somehow assumed a different behaviour for --do-domain-id)? What does the documentation say?
May be we should have one more option in master_use_gtid = binlog_state ? which will compare its binlog_state to
I think that sounds like a very bad idea. The current_pos/slave_pos is the single biggest source of confusion regarding GTID. (In fact, I think it would be best to deprecate/eventually remove current_pos). Better not add to the confusion... - Kristian.
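As a rough illustration of the --gtid-ignore-duplicates suggestion, the option can also be enabled at runtime through the corresponding global variable. This is a sketch under assumptions: the thread does not say how the option was set in the test, nor whether the domain filters were kept alongside it. Stopping the slave threads first is a conservative choice here, not something stated in the thread.

```sql
-- On the multi-source slave (server 3 in the diagram).
STOP ALL SLAVES;

-- Let the server discard GTIDs it has already received via another channel,
-- instead of relying on per-channel domain filters.
SET GLOBAL gtid_ignore_duplicates = ON;

START ALL SLAVES;
```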
Hi Kristian!
On Fri, Jul 20, 2018 at 8:38 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Sachin Setiya <sachin.setiya@mariadb.com> writes:
This issue is regarding Mdev-9107 , where we have 3 master master with do_domain_ids of
I have created an abstract test case for this problem.

      (3)
     ^   ^
    /     \
  (2)----->(1)

3 and 1 are configured with log_slave_updates. 3 has 2 replication channels: m2_s3 (do_domain_id=2) and m1_s3 (do_domain_id=1). 1 has one replication channel: m2_s1 (do_domain_id=2).
This seems to be just user error. All these --do-xxx / --ignore-xxx replication filter options are always dangerous, and this usage seems clearly wrong. I also did not see a clearly explained reason in the bug report why this should work. On the contrary, Elena's suggestion to use --gtid-ignore-duplicates (if one really wants to do something as complex as this) seems appropriate.
I tried with --gtid-ignore-duplicates and it worked perfectly. I guess this is just user error.
Is there a reason this is considered a bug (other than that the reporter somehow assumed a different behaviour for --do-domain-id)? What does the documentation say?
According to the documentation this is the correct behavior: it will not apply the events, but it will update the gtid_slave_pos table.
May be we should have one more option in master_use_gtid = binlog_state ? which will compare its binlog_state to
I think that sounds like a very bad idea. The current_pos/slave_pos is the single biggest source of confusion regarding GTID. (In fact, I think it would be best to deprecate/eventually remove current_pos). Better not add to the confusion...
If we remove current_pos, then how will a master turned slave work?
- Kristian.
Regards sachin
On Sat, Jul 21, 2018, 11:51 Sachin Setiya <sachin.setiya@mariadb.com> wrote:
If we remove current_pos, then how will a master turned slave work?
Maybe in this case the user has to manually update gtid_slave_pos?
- Kristian.
Regards sachin
Sachin Setiya <sachin.setiya@mariadb.com> writes:
I think that sounds like a very bad idea. The current_pos/slave_pos is the single biggest source of confusion regarding GTID. (In fact, I think it would be best to deprecate/eventually remove current_pos). Better not add to the confusion...
If we remove current_pos, then how will a master turned slave work?
Maybe in this case the user has to manually update gtid_slave_pos?
Yes, that is one option, e.g.:

    SET GLOBAL gtid_slave_pos = @@gtid_binlog_pos;

Another option is to make an option to CHANGE MASTER that updates the gtid_slave_pos _at_the_time_of_that_command_only_, like a one-shot master_use_gtid=current_pos. E.g. something like:

    CHANGE MASTER TO master_host=xxx ... gtid_slave_pos_from_master=1;

The idea is that to make an old master a slave, a CHANGE MASTER command will usually be needed anyway. And _that_ is the point at which the user wants the binlog position to migrate to the slave position.

But with the current master_use_gtid=current_pos, this migration happens at every future slave reconnect, which is very much not expected, and users are constantly confused when a random manual transaction on their slave later makes GTID replication break.

- Kristian.
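A minimal sketch of the first (manual) option, as it might look when demoting an old master. The host name is a placeholder and credentials are omitted. The one-shot CHANGE MASTER option is only a proposal in this thread, so it is deliberately not used here.

```sql
-- On the old master, which has no running slave threads yet.

-- Optional: inspect the positions before changing anything.
SELECT @@gtid_binlog_pos, @@gtid_slave_pos, @@gtid_current_pos;

-- One-time migration of the binlog position into the slave position.
SET GLOBAL gtid_slave_pos = @@gtid_binlog_pos;

-- Point the server at the new master using slave_pos rather than current_pos.
-- 'new-master.example' is a placeholder host name.
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example',
  MASTER_USE_GTID = slave_pos;

START SLAVE;
```

With slave_pos, a later manual transaction on this server changes only its binlog position and does not affect where replication resumes on reconnect.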
Hi Kristian!
On Sat, Jul 21, 2018, 15:14 Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Sachin Setiya <sachin.setiya@mariadb.com> writes:
I think that sounds like a very bad idea. The current_pos/slave_pos is the single biggest source of confusion regarding GTID. (In fact, I think it would be best to deprecate/eventually remove current_pos). Better not add to the confusion...
If we remove current_pos, then how will a master turned slave work?
Maybe in this case the user has to manually update gtid_slave_pos?
Yes, that is one option, eg. SET GLOBAL gtid_slave_pos = @@gtid_binlog_pos;
Another option is to make an option to CHANGE MASTER that updates the gtid_slave_pos _at_the_time_of_that_command_only_, like a one-shot master_use_gtid=current_pos. Eg. something like:
CHANGE MASTER TO master_host=xxx ... gtid_slave_pos_from_master=1;
The idea is that to make old master a slave, a CHANGE MASTER command will usually be needed anyway. And _that_ is the point at which user wants the binlog position to migrate to the slave position.
But with the current master_use_gtid=current_pos, this migration happens at every future slave reconnect, which is very much not expected, and users are constantly confused when a random manual transaction on their slave later makes GTID replication break.
- Kristian.
Sounds like a really good idea. I have created MDEV-16800 for it, thanks!