[Maria-discuss] Pacemaker and mariadb semi-sync
Hi,

I've been working on developing a pacemaker resource-agent for mariadb semi-sync replication. There is a pull request here on github: https://github.com/ClusterLabs/resource-agents/pull/929

I would be very happy for some review (and even testing) of this resource-agent from somebody with expertise in mariadb replication.

The general idea is to select the master, at startup and on failure, as the node with the highest GTID. AFTER_SYNC is used to ensure at least one slave will have all the updates. mariadb is started with --log-slave-updates to ensure that slaves log all events, so the binary log on each node should always be complete.

The parameter set on each node is as follows:

SET GLOBAL rpl_semi_sync_slave_enabled='ON',
           rpl_semi_sync_master_enabled='ON',
           rpl_semi_sync_master_wait_point='AFTER_SYNC',
           gtid_strict_mode='ON',
           sync_binlog=1,
           sync_master_info=1,
           sync_relay_log=1,
           sync_relay_log_info=1;

I suspect this may be overkill, but I don't have the competence to judge.

I have also found that calling "SHOW SLAVE STATUS;" may take a long time to display data when the slave is very busy (typically replicating after a failure). Is there any way to make sure it doesn't block?

All feedback very welcome.

Best Regards,
Nils Carlson
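The "highest GTID" selection described above would be read from each node's GTID state variables. A minimal sketch of what a promotion agent could query on each node (variable names are MariaDB's; the example value is made up):

```sql
-- Sketch: GTID state a promotion agent could compare across nodes.
SELECT @@gtid_current_pos;  -- overall position, e.g. '0-1-42' (domain-server_id-seqno)
SELECT @@gtid_binlog_pos;   -- what this node has recorded in its own binlog
SELECT @@gtid_slave_pos;    -- what this node has applied as a slave
```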
Nils Carlson writes:
I would be very happy for some review (and even testing) of this resource-agent from somebody with expertise in mariadb replication.
I do not think I can offer review or testing, but I can give a few general remarks; maybe they can be helpful.
The parameter set on each node is as follows:
SET GLOBAL rpl_semi_sync_slave_enabled='ON',
           rpl_semi_sync_master_enabled='ON',
           rpl_semi_sync_master_wait_point='AFTER_SYNC',
           gtid_strict_mode='ON',
           sync_binlog=1,
           sync_master_info=1,
           sync_relay_log=1,
           sync_relay_log_info=1;
I suspect this may be overkill, but I don't have the competence to judge.
gtid_strict_mode='ON' is good. I assume that the slaves are also set up with MASTER_USE_GTID=slave_pos (or current_pos).

sync_relay_log=1 and sync_relay_log_info=1 would be quite bad for performance on a busy server. And they do not really make things any safer, in my opinion. The GTID position is stored transactionally in a table (assuming InnoDB storage engine), not in relay_log.info, and the relay log is in any case deleted and reloaded from the new master.

On the other hand, innodb_flush_log_at_trx_commit=1 is needed (together with sync_binlog=1) to ensure transactional consistency between (InnoDB) data and binlog.
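Following these remarks, a trimmed-down version of the parameter set might look like this (a sketch, assuming InnoDB; only settings discussed in the thread):

```sql
-- Durability settings per the advice above (sketch, assuming InnoDB):
SET GLOBAL sync_binlog = 1;                    -- fsync binlog at commit
SET GLOBAL innodb_flush_log_at_trx_commit = 1; -- fsync InnoDB redo log at commit
-- sync_relay_log / sync_relay_log_info can stay at their defaults:
-- the GTID position lives transactionally in mysql.gtid_slave_pos,
-- and the relay log is discarded and re-fetched after a failover.

-- Slaves connect using GTID, as assumed above:
CHANGE MASTER TO MASTER_USE_GTID = slave_pos;
```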
The general idea is to select the master at startup and failure as the node with the highest GTID.
This should work fine in the simple cases with a single domain ID used and no multi-source replication. In more complex cases, it is possible for one node to be ahead in one replication domain, while another node is ahead in another, so that no single node has the "highest GTID". The way to handle this is described here:

https://mariadb.com/kb/en/mariadb/gtid/#start-slave-until-master_gtid_posxxx

I think it is perfectly fine to start with the simple approach you describe. That should be by far the most common use case. Just good to be aware about the general case.

Hope this helps,

 - Kristian.
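The multi-domain catch-up described in the KB page linked above could be sketched like this; host name and GTID values here are made up for illustration:

```sql
-- Sketch: before promoting a candidate that is behind in one domain,
-- replicate that domain from the node that is ahead, stopping exactly
-- at its GTID position (hypothetical host and positions):
CHANGE MASTER TO MASTER_HOST='node-b', MASTER_USE_GTID=slave_pos;
START SLAVE UNTIL master_gtid_pos='1-101-2500,2-102-790';
-- Once the slave thread stops, this node holds the union of both
-- domains and can safely be promoted to master.
```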
Note also that you don't want to set rpl_semi_sync_master_enabled='ON' on all your slaves; it should be set only on the master. With that setting, your slaves' SQL threads will hang waiting for a semi-sync ack as well, and they will hang forever if rpl_semi_sync_master_wait_no_slave is set to ON too.
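Concretely, this point suggests role-dependent settings rather than one set applied everywhere; a sketch of what the resource agent might apply on promotion/demotion:

```sql
-- On the node currently acting as master:
SET GLOBAL rpl_semi_sync_master_enabled = ON;
SET GLOBAL rpl_semi_sync_slave_enabled  = OFF;

-- On each slave:
SET GLOBAL rpl_semi_sync_master_enabled = OFF;
SET GLOBAL rpl_semi_sync_slave_enabled  = ON;
```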
On Wed, Aug 9, 2017 at 2:30 AM, Kristian Nielsen wrote:
[...]
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp
participants (3)
- Kristian Nielsen
- Nils Carlson
- Pavel Ivanov