andrei.elkin@pp.inet.fi wrote on 2024-01-23 21:01:
Howdy Kristian, Monty! ...
[KN wrote ]>> Here is my idea for a design that solves most of these problems.
Not to disregard your text, Kristian, and also to split the two subjects logically, let me process it in another reply tomorrow.
>> At XA PREPARE, we can still write the events to the binlog, but without a GTID, and we do not replicate it to slaves by default. Then at XA COMMIT we binlog a commit transaction that can be replicated normally to the slaves without problems.
This is a less simplistic version of the mentioned patch, which reduces an XA to a normal transaction in the binlog.
>> If necessary, the events for XA COMMIT can be read from the PREPARE earlier in the binlog, eg. after server crash/restart. We already have the binlog checkpoint mechanism to ensure that required binlog files are preserved until no longer needed for transaction recovery.
Indeed, the XA becomes recoverable on its original host server.
>> This way we make external XA preserved across server restart, and all normal replication features continue to work - mysqldump, mixed-mode, etc. Nice and simple.
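As a rough illustration of the design quoted above, here is a toy Python model. It is not MariaDB code: the `Event` and `Server` classes, the "XA_PREPARE"/"TRX" labels, the integer GTIDs and the table name `t1` are all hypothetical, and XA ROLLBACK is not modelled. It only shows the three stated properties: the PREPARE is binlogged without a GTID and stays off the replication stream, the prepared list can be rebuilt from the binlog alone, and XA COMMIT binlogs an ordinary transaction that replicates normally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str                   # "XA_PREPARE" or "TRX" (hypothetical labels)
    xid: str
    statements: tuple
    gtid: Optional[int] = None  # None means: not sent to slaves


class Server:
    """Toy server whose only durable state is its binlog."""

    def __init__(self):
        self.binlog = []
        self.next_gtid = 1

    def xa_prepare(self, xid, statements):
        # Binlogged without a GTID, so it stays out of the replication stream.
        self.binlog.append(Event("XA_PREPARE", xid, tuple(statements)))

    def xa_recover(self):
        # Rebuild the prepared list by scanning the binlog, as after a restart.
        pending = {}
        for ev in self.binlog:
            if ev.kind == "XA_PREPARE":
                pending[ev.xid] = ev
            else:
                pending.pop(ev.xid, None)
        return pending

    def xa_commit(self, xid):
        # Re-read the events from the earlier PREPARE and binlog them as an
        # ordinary transaction that carries a GTID.
        prepared = self.xa_recover()[xid]
        ev = Event("TRX", xid, prepared.statements, gtid=self.next_gtid)
        self.next_gtid += 1
        self.binlog.append(ev)

    def replication_stream(self):
        # Only events with a GTID are visible to slaves in this model.
        return [ev for ev in self.binlog if ev.gtid is not None]


master, slave = Server(), Server()
master.xa_prepare("x", ["INSERT INTO t1 VALUES (1)"])
slave.binlog = master.replication_stream()
print(sorted(master.xa_recover()))  # ['x']: recoverable on the original server
print(sorted(slave.xa_recover()))   # []: the slave knows nothing about 'x' yet
master.xa_commit("x")
slave.binlog = master.replication_stream()
print(len(slave.binlog))            # 1: the commit arrives as a normal transaction
```

In this model the slave sees nothing of 'x' until the commit is binlogged on the master.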
True, yet this solution, apart from losing failover (I am backing this up below specifically), slows things down. Big prepared XA transactions would be idling around, and apparently creating hot spots for overall execution when their commits finally arrive. This method is just the antithesis of what I believe our development needs to strive for, that is, to replicate everything sooner, down to an individual statement of a trx, or a sub-statement of a big long-running one. I always thought of doing it in connection with the optimistic parallel execution :-).
>> Then optionally we can support the specific usecase of being able to recover external XA PREPAREd transactions on a slave after failover. When enabled, the slave can receive the XA PREPARE events and binlog them itself, without applying. Then as part of failover, those XA PREPARE in the binlog that are still pending can be applied, leaving them in PREPAREd state on the new master. This way, _only_ the few transactions that need to be failed-over need special handling, the majority can still just replicate normally.
Notice that, as an initialization part of failover, 'the few transactions' would now have to be "officially" prepared. However, MDEV-32020 shows that just two are enough to cause a hang. Therefore this may not be a solution for the failover case.
>> There are different refinements and optimizations that can be added on top of this. But the point is that this is a simple implementation that is robust, correct, and crash-safe from the start, without needing to add complexity and fixes on top.
The failover with or without XA is never a specific use case. It's shown with XA this ...

  XA PREPARE 'x';   #=> OK to the user
  *crash of master*
  *failover to slave*
  XA RECOVER;       #=> 'x' is in the prepared list

does not work in your simpler design.

Cheers,
Andrei