andrei.elkin@pp.inet.fi writes:
Back in August I wrote a patch that converts XAs to normal transactions at binlogging.
I don't think I saw that patch, but it sounds like exactly what I am proposing as an alternate solution... ?
Notice that at the initialization part of failover, 'the few transactions' would have to be "officially" prepared now. However, MDEV-32020 shows that just two are enough to cause a hang. Therefore this may not be a solution for the failover case.
But what is your point? If the XA PREPAREs hang when applied at failover, they will also hang in the current implementation. The user who wants to use failover of XA PREPAREd transactions will have to accept severe restrictions, such as row-based, primary-key-only replication. That's exactly why it must be optional and off by default, so it doesn't affect _all_ XA users (as it does currently).
Another architectural issue is that each XA PREPARE keeps row locks around on every slave until commit (or rollback). This means replication will break
Indeed, but searches on a unique index take only non-GAP (record) locks.
Only by restricting replication to primary-key updates. MariaDB is an SQL database, please don't try to turn it into a key-value store.
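To make the lock problem concrete, here is a minimal sketch (table, index, and values are invented for illustration) of how gap locks held by a prepared XA transaction on a non-unique index block a later transaction on the slave until the XA COMMIT arrives:

    -- Invented table with a non-unique secondary index:
    CREATE TABLE t (pk INT PRIMARY KEY, a INT, KEY(a)) ENGINE=InnoDB;

    -- Replicated XA transaction, applied on the slave and left prepared:
    XA START 'x1';
    UPDATE t SET a = 10 WHERE a = 5;  -- takes next-key (gap) locks on index a
    XA END 'x1';
    XA PREPARE 'x1';                  -- locks stay held until commit/rollback

    -- Any later transaction touching the locked range now waits on 'x1':
    INSERT INTO t VALUES (100, 5);    -- blocks until XA COMMIT 'x1' arrives

With primary-key equality access the search takes only record locks on the unique index, which is exactly the restriction discussed above.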
In really disastrous cases (which we're unaware of as of yet) there exists a safety measure to identify a prepared XA that got in the way of subsequent transactions, to roll it back and re-apply it as a normal transaction when XA COMMIT finally arrives.
By "exist" I think you mean "exist an idea" - this is not implemented in the current code. In fact, this is exactly the point of my alternate solution, to make it possible to apply at XA COMMIT time. And once you have this, there is no point to apply it at XA PREPARE, since for most transactions, the XA COMMIT comes shortly after the XA PREPARE.
Non-unique indexes remain vulnerable but only in ROW format,
What do you mean by "vulnerable only in ROW format"? There are many cases of statement-based replication with XA that will break the slave in the current implementation. The test cases from MDEV-5914 and MDEV-5941, for example (from when CONSERVATIVE parallel replication was implemented), also cause current XA to break in statement/mixed replication mode.
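As a sketch of why this breaks (invented statements in the spirit of those test cases, not copied from them; reusing the invented table t above): the slave must apply events in binlog order, so a statement that conflicts with a still-prepared XA transaction blocks, while the XA COMMIT that would release the locks sits behind it in the queue:

    XA START 'x';
    DELETE FROM t WHERE a BETWEEN 1 AND 10;  -- gap locks on the range
    XA END 'x';
    XA PREPARE 'x';                          -- locks remain held

    INSERT INTO t VALUES (50, 5);  -- next event in binlog order: blocks on 'x'

    XA COMMIT 'x';  -- queued behind the blocked INSERT, so the applier
                    -- waits until innodb_lock_wait_timeout and errors out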
The root of the issue is not XA. The latter may exacerbate what in the normal transaction case might lead to "just" data inconsistency (the quotes hint that the current XA hang might still be the better option for the user).
What data inconsistency?
This for example means that a mysqldump backup can no longer be used to provision a slave, since any XA PREPAREd event at the time of the dump will
Notice this is regardless of how XAs are binlogged/replicated. This provisioned server won't be able to replace the original server at failover. In other words, rpl_xa_provision.test also relates to this general issue.
The problem is not failover to the new server, that will be possible as soon as all transactions that were XA PREPAREd at the time of dump are committed, which is normally a fraction of a second. The problem is that in the current implementation, a slave set up from the dump does not have a binlog position to start from, any position will cause replication to fail.
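A hedged sketch of the timeline (positions invented) may help show why no start position works:

    -- On the master:
    --   pos P1: XA PREPARE 'x' is written to the binlog
    --   pos P2: mysqldump --master-data runs; the snapshot does not see
    --           the still-uncommitted changes of 'x', and records P2 as
    --           the starting position
    --   pos P3: XA COMMIT 'x' is written to the binlog
    -- A slave provisioned from the dump and started at P2 never applies
    -- the XA PREPARE, so the XA COMMIT at P3 fails (unknown XID).
    -- Starting before P1 instead re-applies changes already in the dump.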
I thought to copy the binlog events of all XA-prepared transactions, like gtid 0-1-1, to `dump.sql`.
So then you will need to extend the binlog checkpoint mechanism to preserve binlogs for all pending XA PREPAREd transactions, just as I propose. And once you do that, there's no longer a need to apply the XA PREPAREs until failover.
If the binlog is not available, then a list of gtids of the XA-prepared transactions
You will need the binlog, otherwise how will you preserve the list of gtids of pending XA-prepared transactions across server restart?
    XA PREPARE 'x';  #=> OK to the user
    *crash of master*
    *failover to slave*
    XA RECOVER;      #=> 'x' is in the prepared list

does not work in your simpler design.
That's the whole point of the "optionally we can support the specific usecase of being able to recover external XA PREPAREd transactions on a slave after failover". Of course my proposal is not implemented yet, but why do you think it cannot work?
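For concreteness, a sketch of how that optional usecase could behave (none of this is implemented; the flow is an assumption of the proposal, with the binlogged XA PREPARE events retained on the slave rather than applied):

    XA PREPARE 'x';  -- on the old master: OK to the user, binlogged,
                     -- replicated; the slave retains the events but
                     -- does not yet apply them into the engine

    -- *crash of master*, *failover to slave*: the new master replays the
    -- retained events of all pending XA PREPAREs into the engine, then:

    XA RECOVER;      -- 'x' appears in the prepared list, as required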
There are other problems; for example the update of mysql.gtid_slave_pos cannot be committed crash-safe together with the transaction for XA PREPARE (since the transaction is not committed).
For this part I mentioned MDEV-21117 many times. It's going to involve
But this is still not implemented, right? (I think you meant another bug than MDEV-21117).
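To spell out the crash-safety problem, a simplified sketch (the real gtid_slave_pos bookkeeping is more involved):

    -- Normal replicated transaction on the slave:
    BEGIN;
    -- ... replicated changes ...
    -- ... update of mysql.gtid_slave_pos in the *same* transaction ...
    COMMIT;          -- data and position commit atomically: crash-safe

    -- With XA, nothing is committed at prepare time, so the position
    -- update cannot ride along:
    XA START 'x';
    -- ... replicated changes ...
    XA END 'x';
    XA PREPARE 'x';  -- no commit here; committing the gtid_slave_pos
                     -- update separately is not atomic with the prepare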
And I can't help but underline the real virtue of XA replication as a pioneer of "fragmented" replication, which I tried to promote to Kristian in
Fragmented replication should send events to the slave as soon as possible after starting on the master, so the slave has time to work on it in parallel. And any conflicts with the commit order of other transactions should be detected and the fragmented transaction rolled back and retried. But the current XA implementation does exactly the opposite: It sends the events to the slave only at the end of the transaction (XA PREPARE), and it makes it _impossible_ to roll back and retry the prepare in case of conflict (by assigning a GTID to the XA PREPARE that's updated in the @@gtid_slave_pos).
The rational part is that the XA transaction is represented by more than one GTID. Arguably it's a bit uncomfortable, but such a generalization is fair to call flexible, especially looking forward to implementing fragmented transaction replication, or long-running and not necessarily transactional DML or DDL statements, including ALTER TABLE.
But I don't see any new code in the current XA implementation that could be used for more general "fragmented transaction replication" - what did I miss? Why do you want to assign more than one GTID to a fragmented transaction, wouldn't it be better to binlog the fragments without a GTID (as in my proposed XA solution)?
(I am backing this up specifically below.) It slows things down. Big prepared XA transactions would be idling around, apparently creating hot spots for overall execution when their commits finally arrive.
There is no slowdown from this. Running a big transaction takes the same time whether you end it with an XA PREPARE or an XA COMMIT. In fact, my proposal will speed things up, because only one 2-phase commit between binlog and engine is needed per transaction, while in the current implementation two are needed: one for XA PREPARE and one for XA COMMIT. And a simple sequence like this will be able to group commit together (on the slave):

    XA PREPARE 't1'; XA COMMIT 't1';
    XA PREPARE 't2'; XA COMMIT 't2';
    XA PREPARE 't3'; XA COMMIT 't3';

I believe in the current code, it's impossible to group-commit the XA PREPARE together with the XA COMMIT on a slave?

 - Kristian.