Markus Mäkelä via developers <developers@lists.mariadb.org> writes:
> On 12/4/24 13:19, Kristian Nielsen via developers wrote:
>> 5. A more controversial thought is to drop support for semi-sync
>> replication. I think many users use semi-sync believing it does something

Hi Markus, thanks for taking the time to comment! Your input is very valuable.

> As a (kind of) user of semi-sync replication, I believe it has a
> valid, albeit limited, use-case and that it's a necessary component in
> setups where no transactions are allowed to be lost when the primary node
> in a replication cluster goes down. Perhaps I'm wrong or the way
I would like to be explicit about what "no transactions are allowed to be lost" means. I know you, Markus, fully understand this, of course. Transactions can easily be lost if the server crashes before or during the commit. What it really means is that at some point, the server sends a notification to the client that a single point of failure will no longer cause the transaction to be lost.

With semi-sync, this notification comes in the form of the "ok" result of the client's commit. I want to understand if there are other, possibly better ways to get this notification, if that is all the relevant applications need. I was suggesting that the application could itself use MASTER_GTID_WAIT() against a slave before accepting the commit as "ok" (or a proxy like MaxScale could do it for the application). Does the current semi-sync replication do anything more for the application than this, and if so, what?

One benefit of this method is that each commit can decide whether it needs to wait or not. One commit that "is not allowed to be lost" will not block other transactions from committing. I think that with AFTER_SYNC, all following transactions are blocked from committing until the current commit has been acknowledged by a slave, and that with AFTER_COMMIT they are not blocked, but I'm not 100% sure.
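To make this concrete, here is a sketch of the per-commit wait I have in mind, done by the application itself (or by a proxy on its behalf). It relies on MariaDB's @@last_gtid session variable and the MASTER_GTID_WAIT() function; the GTID value and the 10-second timeout are just examples:

```sql
-- On the master, in the application's session:
START TRANSACTION;
-- ... application statements ...
COMMIT;
-- @@last_gtid holds the GTID of the transaction this session just committed.
SELECT @@last_gtid;                      -- e.g. '0-1-12345'

-- On a slave, before reporting the commit as "ok" to the end user:
-- returns 0 once the slave has replicated up to the GTID, -1 on timeout.
SELECT MASTER_GTID_WAIT('0-1-12345', 10);
```

A commit that _is_ allowed to be lost simply skips the wait, so it does not stall behind one that is not.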
> misunderstanding comes from this. The default value of
> rpl_semi_sync_master_wait_point should be AFTER_SYNC (lossless failover)
> and rpl_semi_sync_master_timeout should be set to something
I would like to understand the reason(s) AFTER_SYNC is better than AFTER_COMMIT. From my understanding, from the client's narrow perspective about their own commit there is little difference; either is a notification that the transaction is now robust to a single point of failure (available on at least two servers).

I know of one use-case, which is when things are set up so that if the master crashes, failover to a slave is _always_ done, and the crashed master is changed to be a slave of the new master (as opposed to letting the master restart, do crash recovery, and continue its operation as a master). With AFTER_COMMIT, the old master might have a transaction committed that does not exist on the new master, which will prevent it from working as a slave, and it will need to be discarded (possibly restored from a backup).

With AFTER_SYNC, the old master may still (after restarting) have a transaction committed to the binlog that is not on the slave / new master. But the old master can be restarted with --rpl-semi-sync-slave-enabled, which tries to truncate the binlog to discard as many transactions from it as possible, to make sure it only has transactions that are also present on the new master. (Interestingly, this means that the purpose of AFTER_SYNC is to ensure that transactions _are_ lost, rather than to ensure that they are _not_ lost.)

Is this the (only) reason that AFTER_SYNC should be the default? Or do you know of other reasons to prefer it?

Now, with the new binlog implementation, there is no longer any AFTER_SYNC. The whole point of the feature is to make the binlog commit and the InnoDB commit atomic with each other as a whole; there is no point at which a transaction is durably committed in the binlog but not committed in InnoDB. So the truncation of the binlog at old-master restart with --rpl-semi-sync-slave-enabled no longer applies. But I would argue that this binlog truncation is anyway a misfeature.
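To summarize my understanding of the two wait points on the master (a sketch of the ordering, not authoritative; corrections welcome):

```
AFTER_SYNC:   write+fsync binlog -> wait for slave ACK -> commit InnoDB -> "ok" to client
AFTER_COMMIT: write+fsync binlog -> commit InnoDB -> wait for slave ACK -> "ok" to client
```

So with AFTER_COMMIT, a crash can leave a transaction committed in InnoDB on the old master that no slave ever received; with AFTER_SYNC, such a transaction is at worst in the old master's binlog, where the --rpl-semi-sync-slave-enabled truncation can discard it.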
If we want to ensure that the master never commits a transaction before it has been received by a slave, then send the transaction to the slave and await the slave's reply _before_ writing it to the binlog. Don't first write it to the binlog and then add complex crash-recovery code to try to remove it from the binlog again. Doing the semi-sync handshake _before_ writing the transaction to the binlog is something that could be implemented in the new binlog implementation. It would be something like BEFORE_WRITE, instead of AFTER_SYNC (which does not exist in the new binlog implementation).

Thus, I really want to understand:

1. Is the --rpl-semi-sync-slave-enabled use-case, where a crashing master is always demoted to a slave, used in practice by users, to warrant implementing something like BEFORE_WRITE semi-sync for the new binlog format?

2. Is there another reason that AFTER_SYNC is useful that I should know of, and which needs to be designed into the new binlog format?

 - Kristian.