Nils Carlson <pyssling@ludd.ltu.se> writes:
> I would be very happy for some review (and even testing) of this resource-agent from somebody with expertise in mariadb replication.
I do not think I can offer a review or testing, but I can give a few general remarks; maybe they will be helpful.
> The parameter set on each node is as follows:
>
> SET GLOBAL rpl_semi_sync_slave_enabled='ON', \
>   rpl_semi_sync_master_enabled='ON', \
>   rpl_semi_sync_master_wait_point='AFTER_SYNC', \
>   gtid_strict_mode='ON', \
>   sync_binlog=1, \
>   sync_master_info=1, \
>   sync_relay_log=1, \
>   sync_relay_log_info=1;
>
> I suspect this may be overkill, but I don't have the competence to judge.
gtid_strict_mode='ON' is good. I assume that the slaves are also set up with MASTER_USE_GTID=slave_pos (or current_pos).

sync_relay_log=1 and sync_relay_log_info=1 would be quite bad for performance on a busy server, and in my opinion they do not really make things any safer. The GTID position is stored transactionally in a table (mysql.gtid_slave_pos, assuming the InnoDB storage engine), not in relay-log.info, and the relay log is in any case deleted and reloaded from the new master.

On the other hand, innodb_flush_log_at_trx_commit=1 is needed (together with sync_binlog=1) to ensure transactional consistency between the (InnoDB) data and the binlog.
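Putting these remarks together, the per-node settings might look roughly like the sketch below. It is not a tested configuration: the semi-sync variables and sync_master_info are carried over unchanged from the quoted statement (assuming semi-sync support is available on the server), the two relay-log sync settings are simply dropped, and the host name in CHANGE MASTER is hypothetical.

```sql
-- On every node (sketch; values follow the discussion above).
-- sync_relay_log and sync_relay_log_info are intentionally not set here.
SET GLOBAL rpl_semi_sync_slave_enabled = 'ON',
           rpl_semi_sync_master_enabled = 'ON',
           rpl_semi_sync_master_wait_point = 'AFTER_SYNC',
           gtid_strict_mode = 'ON',
           sync_binlog = 1,
           sync_master_info = 1,
           -- Needed together with sync_binlog=1 for InnoDB/binlog consistency.
           innodb_flush_log_at_trx_commit = 1;

-- On each slave: replicate from the GTID position kept in mysql.gtid_slave_pos.
-- 'master.example.com' is a hypothetical host name.
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST = 'master.example.com',
                 MASTER_USE_GTID = slave_pos;
START SLAVE;
```

SET GLOBAL does not survive a server restart, so the same values would also need to go into the server's option file.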
> The general idea is to select the master at startup and failure as the node with the highest GTID.
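For the single-domain case, each node's position can be read from the server variables and compared by sequence number. The query below is a minimal sketch, assuming the agent (hypothetically) runs it on every candidate node and promotes the one with the largest seq_no. Using gtid_current_pos is itself an assumption; gtid_binlog_pos may fit better depending on how the agent tracks positions. The seq_no column is only meaningful when domain_count is 1.

```sql
-- Run on every candidate node; the agent compares seq_no across nodes.
-- A single-domain position looks like 'domain-server_id-seq_no', e.g. '0-1-1234'.
SELECT @@GLOBAL.gtid_current_pos AS gtid_pos,
       -- Number of domains in the position (1 when there are no commas).
       LENGTH(@@GLOBAL.gtid_current_pos)
         - LENGTH(REPLACE(@@GLOBAL.gtid_current_pos, ',', '')) + 1 AS domain_count,
       -- Sequence number of the last GTID; only valid if domain_count = 1.
       CAST(SUBSTRING_INDEX(@@GLOBAL.gtid_current_pos, '-', -1) AS UNSIGNED) AS seq_no;
```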
This should work fine in the simple cases where a single domain ID is used and there is no multi-source replication. In more complex cases, one node can be ahead in one replication domain while another node is ahead in another, so that no single node has the "highest GTID". The way to handle this is described here:

https://mariadb.com/kb/en/mariadb/gtid/#start-slave-until-master_gtid_posxxx

I think it is perfectly fine to start with the simple approach you describe; that should be by far the most common use case. It is just good to be aware of the general case.

Hope this helps,

 - Kristian.
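The general case in the linked KB section is built around START SLAVE UNTIL master_gtid_pos. The block below only sketches the shape of that statement: the host name and GTID list are hypothetical, and the order in which nodes should catch up from each other is better taken from the KB page than from this example.

```sql
-- Hypothetical: this node is behind 'node-a.example.com' in domain 1.
-- Replicate from node-a, stopping once the given position is reached.
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST = 'node-a.example.com',
                 MASTER_USE_GTID = slave_pos;
-- Position copied from @@GLOBAL.gtid_current_pos on node-a (illustrative value).
START SLAVE UNTIL master_gtid_pos = '1-11-500,2-12-300';
```

Progress towards the UNTIL position can be followed with SHOW SLAVE STATUS.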