Kristian,

Let me try to explain and maybe answer most of your questions.

Semi-sync replication for us is a DBA tool that helps achieve durability of transactions in a world where MySQL doesn't do any flushes to disk. As you may guess, by removing disk flushes we can achieve very high transaction throughput. Plus, if we accept the reality that disks can fail and that recovering information from them is time-consuming and expensive (if possible at all), then flush or no flush, there is no durability if the disk fails, and so disk flushes don't make much sense. So to get durability we use semi-sync. The definition of "durability" in this case is: if the client gets an ok for the transaction, it will find that data afterwards. And that should hold across any master failures and failovers. If we set semi_sync_master_timeout = infinity, we get something very close to that kind of durability.

Yes, there is a problem that while one connection is waiting for the semi-sync ack, another one can already see the data committed. And if the first client never receives an "ok" for the transaction, then we can consider it non-existent and safely "lose" it during failover. That will confuse the second client a lot (the data it was seeing suddenly disappears). That's a trade-off we are ready to accept.

It looks like MySQL 5.7.2 already implements another flavor of semi-sync replication, where the transaction is not visible to other connections until it's semi-sync ack'ed (http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_r...). We will be happy to try that. But it has another trade-off that could be hard to accept sometimes: InnoDB releases all row locks only when the semi-sync ack is received, and that could slow down inter-dependent transactions significantly.

So that's how we look at semi-sync replication. BTW, digging through some history, I've realized that the semi-sync plugins in MariaDB look very close to how the semi-sync patch looked at Google in 2008. Apparently back then it was included in MySQL, but it kept evolving here and the later changes never made it upstream.

A rough sketch of the settings this setup maps to is below; after that, on to your questions.
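For concreteness, here is a minimal sketch of the kind of configuration described above, using the standard semi-sync plugin variables (rpl_semi_sync_master_enabled, rpl_semi_sync_master_timeout) and the MySQL 5.7.2 rpl_semi_sync_master_wait_point variable. Exact names, defaults and install steps vary between versions, so treat this as illustrative rather than our exact settings:

    -- Install the semi-sync master plugin (MySQL / older MariaDB naming).
    INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';

    -- Turn off per-commit flushes: durability comes from the slave ack, not the disk.
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- no fsync of the redo log at each commit
    SET GLOBAL sync_binlog = 0;                      -- no fsync of the binlog at each commit

    -- Enable semi-sync on the master.
    SET GLOBAL rpl_semi_sync_master_enabled = ON;

    -- MySQL 5.7.2+ only: wait for the ack before the engine commit, so other
    -- connections don't see the data (and row locks are held) until it is ack'ed.
    -- SET GLOBAL rpl_semi_sync_master_wait_point = AFTER_SYNC;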
The problem here is that the transaction _is_ committed locally. If we return an error, we are confusing all existing applications that expect an error return from commit to mean that the transaction is guaranteed _not_ to be committed. Did you consider this issue, and possible different ways to solve your problem that would not have this issue?
For example:
- The client could receive a warning, rather than an error. The warning could be handled by those applications that are interested.
As I said above, semi-sync replication is a DBA tool, so it's not up to the application to be interested in it or not. It's up to DBAs to make sure that application developers don't get the feeling that they have lost some data. DBAs should be able to guarantee durability, even if it comes with some constraints on usage.
- The master could kill the client connection rather than return the error. This matches the normal ACID expectations: If commit returns ok then transaction is durable. If it returns error then transaction is not committed. If it does not return (connection lost), then it is unknown if the transaction is committed or not.
I think this makes sense. And this is actually how we use semi-sync now: we use it only with semi_sync_master_timeout = infinity, i.e. the connection either gets the semi-sync ack or gets killed (or hits a client-side timeout).
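Since rpl_semi_sync_master_timeout is a millisecond value with no literal "infinity", what we mean in practice is setting it high enough that it can never fire during a failover window; something along these lines (the exact value is arbitrary, and the maximum accepted differs by version):

    -- Approximate "infinity": a timeout that never fires in practice (milliseconds).
    SET GLOBAL rpl_semi_sync_master_timeout = 4294967295;

    -- Sanity check that semi-sync is actually in effect on the master.
    SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';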
- The master could check during the prepare phase if any slaves are connected. If not, the transaction could be rolled back and a normal error returned to the client.
This is racy and basically adds complexity to the code without eliminating the situation where a transaction is committed but the client gets an error. So overall I'm not sure it is worth it.
- The master could crash itself, causing promotion of a new master, which then could involve checking all replication servers to find the one that is most advanced.
This is the scariest proposition of all. A deliberate crash in production can lead to longer-than-necessary periods of service unavailability.
- The master could truncate the current binlog file to before the offending transaction and roll back the InnoDB changes. Of course, since this is not true synchronous replication, this leaves the possibility that the transaction exists on a slave but not on the master.
This is actually what https://mariadb.atlassian.net/browse/MDEV-162 (and probably the MySQL 5.7.2 implementation) is about, right?

I hope our view of how semi-sync replication should work is clear to you now.

Pavel