Suggestions for the problems around replication of XA in 10.5
Hi Monty,

As promised, here are my thoughts on the issues with the implementation of replication of external XA as implemented since MariaDB 10.5.

I see at least two architectural issues with the current implementation.

One is that it splits transactions into two separate GTIDs in the binlog (XA PREPARE and XA COMMIT). This breaks with the fundamental principle that replication applies transactions one after the other in strict sequence, and that the replication position/GTID is the single transaction last replicated. This for example means that a mysqldump backup can no longer be used to provision a slave, since any XA PREPAREd event at the time of the dump will be missing; a testcase rpl_xa_provision.test in MDEV-32020 demonstrates this.

Another architectural issue is that each XA PREPARE keeps row locks around on every slave until commit (or rollback). This means replication will break if _any_ transaction replicated after the XA PREPARE gets blocked on a lock. This can easily happen; surely in many ways in statement-based replication, and even in row-based replication without primary key, as demonstrated by testcase rpl_mdev32020.test in MDEV-32020.

There are other problems; for example, the update of mysql.gtid_slave_pos cannot be committed crash-safe together with the transaction for XA PREPARE (since the transaction is not committed).

I believe the root of the problem is architectural: external XA should be replicated only after they commit on the master. Trying to fix individual problems one by one will not address the root problem and will lead to ever increasing complexity without ever being fully successful.

The current implementation appears to only address a very specific and rare use-case, where enhanced semi-synchronous replication is used with row-based binlogging to try to fail over to a slave and preserve any external XA that was in PREPAREd state on the master before the failover. Mixed-mode replication, provisioning slaves with mysqldump, slaves not intended for failover, etc., seem to be not considered and basically broken since 10.5.

Here is my idea for a design that solves most of these problems.

At XA PREPARE, we can still write the events to the binlog, but without a GTID, and we do not replicate it to slaves by default. Then at XA COMMIT we binlog a commit transaction that can be replicated normally to the slaves without problems. If necessary, the events for the XA COMMIT can be read from the PREPARE earlier in the binlog, eg. after server crash/restart. We already have the binlog checkpoint mechanism to ensure that required binlog files are preserved until no longer needed for transaction recovery. This way we make external XA preserved across server restart, and all normal replication features continue to work - mysqldump, mixed-mode, etc. Nice and simple.

Then optionally we can support the specific usecase of being able to recover external XA PREPAREd transactions on a slave after failover. When enabled, the slave can receive the XA PREPARE events and binlog them itself, without applying them. Then as part of failover, those XA PREPAREs in the binlog that are still pending can be applied, leaving them in PREPAREd state on the new master. This way, _only_ the few transactions that need to be failed over need special handling; the majority can still just replicate normally.

There are different refinements and optimizations that can be added on top of this.
But the point is that this is a simple implementation that is robust, correct, and crash-safe from the start, without needing to add complexity and fixes on top. I've done some initial proof-of-concept code for this, and continue to work on it on the branch knielsen_mdev32020 on github. - Kristian.
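For concreteness, a minimal sketch of the proposed flow from the user's point of view (illustrative only; the table and xid names are made up, and this describes the proposed binlogging, not the current behaviour):

  XA START 'x';
  UPDATE t1 SET b = b + 1 WHERE a = 1;  # some work inside the transaction
  XA END 'x';
  XA PREPARE 'x';   # written to the binlog without a GTID, not sent to slaves by default
  # ... possibly a server restart here; the PREPARE is recovered from the binlog ...
  XA COMMIT 'x';    # binlogged as one ordinary GTID'd transaction (its events taken from
                    # the earlier PREPARE if needed) and replicated to slaves normally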
Howdy Kristian, Monty!
Hi Monty,
As promised, here are my thoughts on the issues with the implementation of replication of external XA as implemented since MariaDB 10.5.
It's good to have Kristian's mail first so that I can respond to each item raised about the current XA replication framework.
I see at least two architectural issues with the current implementation.
One is that it splits transactions in two separate GTIDs in the binlog (XA PREPARE and XA COMMIT). This breaks with the fundamental principle that replication applies transactions one after the other in strict sequence, and the replication position/GTID is the single transaction last replicated.
Let me soften this part. The rational part of it is that the XA transaction is represented by more than one GTID. Arguably it's a bit uncomfortable, but such a generalization is fair to call flexible, especially looking forward to implementing fragmented transaction replication, or long-running and not necessarily transactional DML or DDL statements including ALTER TABLE. And of course, fundamentally there's no violation of committing XA transactions in binlog order. Now to Kristian's findings, which deserve full credit...
This for example means that a mysqldump backup can no longer be used to provision a slave, since any XA PREPAREd event at the time of the dump will be missing; a testcase rpl_xa_provision.test in MDEV-32020 demonstrates this.
The issue at hand is that mysqldump can't represent prepared XA:s:

  --connection one
  xa start 'x'; /* work */ xa end 'x';
  xa prepare 'x';            # => gtid 0-1-1
  --connection two
  commit /* trx_normal */;   # => gtid 0-1-2

  shell> mysqldump --gtid > dump.sql
  shell> grep 'gtid' dump.sql   # => gtid_slave_pos = 0-1-2

So a slave that is provisioned with `dump.sql` will not have the prepared xid:

  --connection slave
  xa recover;   # => empty

Notice this is regardless of how XA are binlogged/replicated. This provisioned server won't be able to replace the original server at failover. In other words, rpl_xa_provision.test also relates to this general issue. What to do with this case? I thought to copy the binlog events of all XA-prepared transactions, like gtid 0-1-1, to `dump.sql`. If the binlog is not available, then a list of gtids of the XA-prepared:s could be added instead, and for each XA-prepare gtid it must be retrieved from a server that has it in its binlog.
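To illustrate (a sketch only; the table and its content are invented, and how exactly to extract the events is an open question), the appended section of `dump.sql` would essentially have to replay the still-prepared XA so that the provisioned slave ends up with it prepared again:

  XA START 'x';
  INSERT INTO t1 VALUES (1);  # placeholder for the actual work of gtid 0-1-1
  XA END 'x';
  XA PREPARE 'x';
  # after loading the dump, XA RECOVER on the provisioned slave lists 'x' again,
  # matching the state of the original master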
Another architectural issue is that each XA PREPARE keeps row locks around on every slave until commit (or rollback). This means replication will break if _any_ transaction replicated after the XA PREPARE gets blocked on a lock.
Indeed, but there can be no non-GAP lock on any unique index. And no GAP lock is held by any XA prepared on the slave - that's the grace of the MDEV-30165/MDEV-26682 efforts by Marko and Vlad Lesin. In really disastrous cases (which we're unaware of as of yet) there exists a safety measure to identify a prepared XA that got in the way of the next transactions, to roll it back and re-apply it like a normal transaction when the XA-COMMIT finally arrives.
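A sketch of how such a safety measure could behave on the slave (an idea only, not a description of existing code; statement names are illustrative):

  # the applier detects that prepared 'x' blocks a later binlogged transaction, so:
  XA ROLLBACK 'x';  # release the row locks held by the prepared XA
  # the blocked transaction can now proceed; when XA COMMIT 'x' eventually arrives
  # from the master, the saved events of 'x' are re-applied and committed as one
  # normal transaction instead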
This can easily happen; surely in many ways in statement-based replication, and even in row-based replication without primary key as demonstrated by testcase rpl_mdev32020.test in MDEV-32020.
Non-unique indexes remain vulnerable, but only in ROW format, as they may still be locked as different subsets on the master and the slave. The root of the issue is not XA. The latter may exacerbate what in the normal transaction case might lead to "just" data inconsistency (quotes here to hint that the current XA hang might still be the better option for the user). Monty already offered to fix this with a table scan (consistency is there then). For my part, I'd always put such a statement into the binlog in STATEMENT format.
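For readers following along, a schematic of the kind of slave-side hang MDEV-32020 is about (my simplified reading, not the actual test; it assumes ROW format, a table with no primary/unique key, and READ COMMITTED on the master so the two transactions do not conflict there):

  CREATE TABLE t (a INT, b INT) ENGINE=InnoDB;
  INSERT INTO t VALUES (1, 1), (2, 2);
  # master, connection 1:
  XA START 'x'; UPDATE t SET b = 10 WHERE a = 1; XA END 'x'; XA PREPARE 'x';
  # master, connection 2 (commits fine, so it is binlogged after the PREPARE):
  UPDATE t SET b = 20 WHERE a = 2;
  # slave: with no key to locate the target row, applying connection 2's row event has
  # to scan the table and can end up blocked on the row still X-locked by the prepared
  # XA 'x'; the SQL thread then stalls until XA COMMIT 'x' finally arrives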
There are other problems; for example the update of mysql.gtid_slave_pos cannot be committed crash-safe together with the transaction for XA PREPARE (since the transaction is not committed).
For this part I have mentioned MDEV-21117 many times. It's going to bring the prepared XA and an autocommit INSERT into gtid_slave_pos together into 2pc, so that a binlog-less slave would recover as well.
I believe the root of the problem is architectural: external XA should be replicated only after they commit on the master.
But we'd lose failover.
Trying to fix individual problems one by one will not address the root problem and will lead to ever increasing complexity without ever being fully successful.
Well, there is still some work to complete in this project, but I don't see where we are going to get stuck. And I can't help but underline the real virtue of the XA replication as a pioneer of "fragmented" replication, which I tried to promote to Kristian in our face-to-face meetings.
The current implementation appears to only address a very specific and rare use-case, where enhanced semi-synchronous replication is used with row-based binlogging to try to fail-over to a slave and preserve any external XA that was in PREPAREd state on the master before the failover. Mixed-mode replication, provisioning slaves with mysqldump, slaves not intended for failover, etc., seem to be not considered and basically broken since 10.5.
Back in Aug I wrote a patch that converts XA:s to normal transactions at binlogging. Could we reconcile on a server option that activates it?
Here is my idea for a design that solves most of these problems.
Not to disregard your text, Kristian, and also to logically split the two subjects, let me process it in another reply tomorrow.
At XA PREPARE, we can still write the events to the binlog, but without a GTID, and we do not replicate it to slaves by default. Then at XA COMMIT we binlog a commit transaction that can be replicated normally to the slaves without problems. If necessary, the events for XA COMMIT can be read from the PREPARE earlier in the binlog, eg. after server crash/restart. We already have the binlog checkpoint mechanism to ensure that required binlog files are preserved until no longer needed for transaction recovery.
This way we make external XA preserved across server restart, and all normal replication features continue to work - mysqldump, mixed-mode, etc. Nice and simple.
Then optionally we can support the specific usecase of being able to recover external XA PREPAREd transactions on a slave after failover. When enabled, the slave can receive the XA PREPARE events and binlog them itself, without applying. Then as part of failover, those XA PREPARE in the binlog that are still pending can be applied, leaving them in PREPAREd state on the new master. This way, _only_ the few transactions that need to be failed-over need special handling, the majority can still just replicate normally.
There are different refinements and optimizations that can be added on top of this. But the point is that this is a simple implementation that is robust, correct, and crash-safe from the start, without needing to add complexity and fixes on top.
I've done some initial proof-of-concept code for this, and continue to work on it on the branch knielsen_mdev32020 on github.
- Kristian.
Cheers, Andrei
andrei.elkin@pp.inet.fi wrote 2024-01-23 21:01:
Howdy Kristian, Monty! ...
[KN wrote ]>> Here is my idea for a design that solves most of these problems.
Not to disregard your text, Kristian, and also to logically split the two subjects, let me process it in another reply tomorrow.
At XA PREPARE, we can still write the events to the binlog, but without a GTID, and we do not replicate it to slaves by default. Then at XA COMMIT we binlog a commit transaction that can be replicated normally to the slaves without problems.
This is a less simplistic version of the mentioned patch that reduces an XA to a normal transaction in the binlog.
If necessary, the events for XA COMMIT can be read from the PREPARE earlier in the binlog, eg. after server crash/restart. We already have the binlog checkpoint mechanism to ensure that required binlog files are preserved until no longer needed for transaction recovery.
Indeed. The XA becomes recoverable on its original host server.
This way we make external XA preserved across server restart, and all normal replication features continue to work - mysqldump, mixed-mode, etc. Nice and simple.
True, yet this solution, apart from losing failover (which I back up below specifically), slows things down. Big prepared XA transactions would be idling around - and apparently creating hot spots for overall execution when their commits finally arrive. This method is just an antithesis to what I believe we need to strive for in our development, that is, to replicate everything sooner, down to an individual statement of a trx, or a sub-statement of a big long-running one. I always thought of doing it in connection with the optimistic parallel execution :-).
Then optionally we can support the specific usecase of being able to recover external XA PREPAREd transactions on a slave after failover. When enabled, the slave can receive the XA PREPARE events and binlog them itself, without applying. Then as part of failover, those XA PREPARE in the binlog that are still pending can be applied, leaving them in PREPAREd state on the new master. This way, _only_ the few transactions that need to be failed-over need special handling, the majority can still just replicate normally.
Notice that, as an initialization part of failover, 'the few transactions' would now have to be "officially" prepared. However, MDEV-32020 shows that just two are enough to cause a hang. Therefore this may not be a solution for the failover case.
There are different refinements and optimizations that can be added on top of this. But the point is that this is a simple implementation that is robust, correct, and crash-safe from the start, without needing to add complexity and fixes on top.
Failover, with or without XA, is never a specific use case. With XA it is shown like this:

  XA PREPARE 'x';   #=> OK to the user
  *crash of master*
  *failover to slave*
  XA RECOVER;       #=> 'x' is in the prepared list

This does not work in your simpler design.

Cheers, Andrei
andrei.elkin@pp.inet.fi writes:
Back in Aug I wrote a patch that converts XA:s to normal transactions at binlogging.
I don't think I saw that patch, but it sounds like exactly what I am proposing as an alternate solution... ?
Notice that, as an initialization part of failover, 'the few transactions' would now have to be "officially" prepared. However, MDEV-32020 shows that just two are enough to cause a hang. Therefore this may not be a solution for the failover case.
But what is your point? If the XA PREPAREs hang when applied at failover, they will also hang in the current implementation. The user who wants to use failover of XA PREPAREd transactions will have to accept severe restrictions, such as row-based primary-key only replication. That's exactly why it must be optional and off by default, so it doesn't affect _all_ XA users (as it does currently).
Another architectural issue is that each XA PREPARE keeps row locks around on every slave until commit (or rollback). This means replication will break
Indeed, but there can be no non-GAP lock on any unique index.
Only by restricting replication to primary-key updates only. MariaDB is an SQL database, please don't try to turn it into a key-value store.
In really disastrous cases (which we're unaware of as of yet) there exists a safety measure to identify a prepared XA that got in the way of the next transactions, to roll it back and re-apply it like a normal transaction when the XA-COMMIT finally arrives.
By "exist" I think you mean "exist an idea" - this is not implemented in the current code. In fact, this is exactly the point of my alternate solution, to make it possible to apply at XA COMMIT time. And once you have this, there is no point to apply it at XA PREPARE, since for most transactions, the XA COMMIT comes shortly after the XA PREPARE.
Non-unique indexes remain vulnerable but only in ROW format,
What do you mean by "vulnerable only in ROW format"? There are many cases of statement-based replication with XA that will break the slave in the current implementation. The test cases from MDEV-5914 and MDEV-5941 for example (from when CONSERVATIVE parallel replication was implemented) also cause current XA to break in statement/mixed replication mode.
The root of the issue is not XA. The latter may exacerbate what might in normal transaction case lead to "just" (double quote here to hint that the current XA hang might be still better option for the user) data inconsistency.
What data inconsistency?
This for example means that a mysqldump backup can no longer be used to provision a slave, since any XA PREPAREd event at the time of the dump will
Notice this is regardless of how XA are binlogged/replicated. This provisioned server won't be able to replace the original server at failover. In other words rpl_xa_provision.test also relates to this general issue.
The problem is not failover to the new server, that will be possible as soon as all transactions that were XA PREPAREd at the time of dump are committed, which is normally a fraction of a second. The problem is that in the current implementation, a slave setup from the dump does not have a binlog position to start from, any position will cause replication to fail.
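To make the failure mode concrete (a sketch; the GTIDs and host name are invented for illustration):

  # master binlog at dump time:  0-1-1 = XA PREPARE 'x',  0-1-2 = some committed trx
  # dump.sql records gtid_slave_pos = 0-1-2; the PREPARE of 'x' cannot be in the dump
  SET GLOBAL gtid_slave_pos = '0-1-2';
  CHANGE MASTER TO master_host='master1', master_use_gtid=slave_pos;
  START SLAVE;
  # when the master later binlogs XA COMMIT 'x' (say as 0-1-3), this slave has no
  # prepared 'x' to commit, so the SQL thread stops with an error; no position the
  # dump could have recorded avoids this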
I thought to copy binlog events of all XA-prepared like gtid 0-1-1 to `dump.sql`.
So then you will need to extend the binlog checkpoint mechanism to preserve binlogs for all pending XA PREPAREd transactions, just as I propose. And once you do that, there's no longer a need to apply the XA PREPAREs until failover.
If the binlog is not available, then a list of gtids of XA-prepared:s
You will need the binlog, otherwise how will you preserve the list of gtids of pending XA-prepared transactions across server restart?
XA PREPARE 'x';   #=> OK to the user
*crash of master*
*failover to slave*
XA RECOVER;       #=> 'x' is in the prepared list
does not work in your simpler design.
That's the whole point of the "optionally we can support the specific usecase of being able to recover external XA PREPAREd transactions on a slave after failover". Of course my proposal is not implemented yet, but why do you think it cannot work?
There are other problems; for example the update of mysql.gtid_slave_pos cannot be committed crash-safe together with the transaction for XA PREPARE (since the transaction is not committed).
For this part I mentioned MDEV-21117 many times. It's going to involve
But this is still not implemented, right? (I think you meant another bug than MDEV-21117).
And I can't help to underline the real virtue of the XA replication as a pioneer of "fragmented" replication that I tried to promote for Kristian in
Fragmented replication should send events to the slave as soon as possible after starting on the master, so the slave has time to work on it in parallel. And any conflicts with the commit order of other transactions should be detected and the fragmented transaction rolled back and retried. But the current XA implementation does exactly the opposite: It sends the events to the slave only at the end of the transaction (XA PREPARE), and it makes it _impossible_ to rollback and retry the prepare in case of conflict (by assigning a GTID to the XA PREPARE that's updated in the @@gtid_slave_pos).
The rational part is that the XA transaction is represented by more than one GTID. Arguably it's a bit uncomfortable, but such generalization is fair to call flexible especially looking forward on implementing fragmented transaction replication, or long running and non necessarily transactional DML or DDL statements including ALTER TABLE.
But I don't see any new code in the current XA implementation that could be used for more general "fragmented transaction replication" - what did I miss? Why do you want to assign more than one GTID to a fragmented transaction, wouldn't it be better to binlog the fragments without a GTID (as in my proposed XA solution)?
(I am backing up below specifically) slows things down. Big prepared XA transactions would be idling around - and apparently creating hot spots for overall execution when their commits finally arrive.
There is no slowdown from this. Running a big transaction takes the same time whether you end it with an XA PREPARE or an XA COMMIT.

In fact, my proposal will speed things up, because only one 2-phase commit between binlog and engine is needed per transaction, while in the current implementation two are needed, one for XA PREPARE and one for XA COMMIT. And a simple sequence like this will be able to group commit together (on the slave):

  XA PREPARE 't1'; XA COMMIT 't1';
  XA PREPARE 't2'; XA COMMIT 't2';
  XA PREPARE 't3'; XA COMMIT 't3';

I believe in the current code, it's impossible to group-commit the XA PREPARE together with the XA COMMIT on a slave?

- Kristian.
Hi Kristian! Kristian Nielsen wrote 2024-01-24 23:59:
andrei.elkin@pp.inet.fi writes:
Back in Aug I wrote a patch that converts XA:s to normal transactions at binlogging.
I don't think I saw that patch, but it sounds like exactly what I am proposing as an alternate solution... ?
Mine is just simplistic: it never logs any prepare part in the binlog. In the follow-up mail (Date: Wed, 24 Jan 2024 13:41:15 +0200) I explained more about it. Yours is apparently better for the user, as it at least provides XA recovery on the original host.
Notice, that an initialization part of failover 'the few transactions' that would have to be "officially" prepared now. However MDEV-32020 shows just two is enough for hanging. Therefore this may not be a solution for the failover case.
But what is your point? If the XA PREPAREs hang when applied at failover, they will also hang in the current implementation. The user who wants to use failover of XA PREPAREd transactions will have to accept severe restrictions, such as row-based primary-key only replication.
That's exactly why it must be optional and off by default, so it doesn't affect _all_ XA users (as it does currently).

We might have to resort to that, but first I'd take on analysis of anything.
Kristian, this statement of row-based primary-key-only replication requires at least a confirmation with a test. OTOH, ROW may be somewhat difficult to dismiss, and I am not going to do that now, but the PK is too much, as a UK guarantees correctness, please read on. So far we only have MDEV-32020 about the non-UK ROW-format vulnerability. To our and our users' testing, the current XA replication works when there is at least one unique key. And there's a theoretical background to validate that, except of course for implementation bugs.

Let me narrow our context to the Read-Committed isolation level and ROW format. After MDEV-30165/26682 have removed GAP locks from prepared xa:s, the latter cease to be potentially unilaterally-slave-side conflicting with the ones following them in binlog order. That's because a prepared XA can only hold conflicting X locks on indexes, and Insert-Intention ones are harmless for the GAP locks of following (in binlog order) normal trx:s. In the presence of a Unique Key, therefore, XAP_1 (XAP := XA-Prepare) can *not* stop any normal Trx_2 (think of 1, 2 as gtid seq_no:s) that gets in the way, and hopefully it could be tackled, like MDEV-32020, with a menu of choices.
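A minimal sketch of that argument (assuming READ COMMITTED, ROW format, and a primary key standing in for the unique key):

  CREATE TABLE t (pk INT PRIMARY KEY, v INT) ENGINE=InnoDB;
  INSERT INTO t VALUES (1, 0), (2, 0);

  XA START 'xap1';
  UPDATE t SET v = v + 1 WHERE pk = 1;  # X record lock on pk = 1 only
  XA END 'xap1';
  XA PREPARE 'xap1';                    # the record lock is kept, but no GAP lock
                                        # (per MDEV-30165/MDEV-26682)
  # a transaction binlogged later and touching a different key value is not blocked:
  UPDATE t SET v = v + 1 WHERE pk = 2;  # proceeds; a conflict is only possible if it
                                        # touches the very rows the prepared XA modified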
Another architectural issue is that each XA PREPARE keeps row locks around on every slave until commit (or rollback). This means replication will break
Indeed, but there can be no non-GAP lock on any unique index.
Only by restricting replication to primary-key updates only. MariaDB is an SQL database, please don't try to turn it into a key-value store.
Let me state it this way: with at least one UK *and* GAP locks out of the picture, two binlogged transactions can have conflicts, if any, only through their index X locks. A trx obviously X-locks all modified records.
In really disastrous cases (which we're unaware of as of yet) there exists a safety measure to identify a prepared XA that got in the way of the next transactions, to roll it back and re-apply it like a normal transaction when the XA-COMMIT finally arrives.
By "exist" I think you mean "exist an idea" - this is not implemented in the current code. In fact, this is exactly the point of my alternate solution, to make it possible to apply at XA COMMIT time. And once you have this, there is no point to apply it at XA PREPARE, since for most transactions, the XA COMMIT comes shortly after the XA PREPARE.
There is a point, and its name is yours, aka ... optimistic (parallel) execution :-). Why should we defer a day-long transaction's execution when its events, maybe not all, are around, and we can always retreat to the savepoint of its BEGIN?!
Non-unique indexes remain vulnerable but only in ROW format,
What do you mean by "vulnerable only in ROW format"? There are many cases of statement-based replication with XA that will break the slave in the current implementation. The test cases from MDEV-5914 and MDEV-5941 for example (from when CONSERVATIVE parallel replication was implemented) also cause current XA to break in statement/mixed replication mode.
The root of the issue is not XA. The latter may exacerbate what might in normal transaction case lead to "just" (double quote here to hint that the current XA hang might be still better option for the user) data inconsistency.
What data inconsistency?
(I don't mean how the notion of consistency can apply to the no-UK table case, do you :-?) For instance, the MINIMAL row-image format. When master and slave are instructed to execute a trx using different indexes on a non-UK table, they can in the end modify different records.
This for example means that a mysqldump backup can no longer be used to provision a slave, since any XA PREPAREd event at the time of the dump will
Notice this is regardless of how XA are binlogged/replicated. This provisioned server won't be able to replace the original server at failover. In other words rpl_xa_provision.test also relates to this general issue.
The problem is not failover to the new server, that will be possible as soon as all transactions that were XA PREPAREd at the time of dump are committed, which is normally a fraction of a second.
Well, in your case there's a condition: 'will be possible as soon as ...'. I don't have it. Just run that sql file and a clone of the master is provisioned. It sure works like that for normal trx:s, and does not for XA. To me it's a problem to resolve.
The problem is that in the current implementation, a slave setup from the dump does not have a binlog position to start from, any position will cause replication to fail.
Of course. The prepared XA:s' gtid:s are thought of as being those of committed trx:s.
I thought to copy binlog events of all XA-prepared like gtid 0-1-1 to `dump.sql`.
So then you will need to extend the binlog checkpoint mechanism to preserve binlogs for all pending XA PREPAREd transactions, just as I propose.
Right. Let's take this part.
And once you do that, there's no longer a need to apply the XA PREPAREs until failover.
If the binlog is not available, then a list of gtids of XA-prepared:s
You will need the binlog, otherwise how will you preserve the list of gtids of pending XA-prepared transactions across server restart?
I am currently merging the final piece of XA recovery, which is XA_list_log_event. It's a part of part IV of bb-10.6-MDEV-31949 that I need to update (ETA tomorrow). The list is needed in order to decide what to do with a prepared user XA xid at restart, when the binlog has already purged the prepared part.
XA PREPARE 'x';   #=> OK to the user
*crash of master*
*failover to slave*
XA RECOVER;       #=> 'x' is in the prepared list
does not work in your simpler design.
That's the whole point of the "optionally we can support the specific usecase of being able to recover external XA PREPAREd transactions on a slave after failover". Of course my proposal is not implemented yet, but why do you think it cannot work?
Take the case from the MDEV-32020 description. Crash the hanging slave. Restarting it as master entails executing the two, right? And with the same effect as on the hanging slave. I thought we got on the same page on this back in our zulip conversation.
There are other problems; for example the update of mysql.gtid_slave_pos cannot be committed crash-safe together with the transaction for XA PREPARE (since the transaction is not committed).
For this part I mentioned MDEV-21117 many times. It's going to involve
But this is still not implemented, right? (I think you meant another bug than MDEV-21117).
MDEV-21777, indeed. Thanks. Not implemented. We were four people and one tester at one point... [ Let me reply to the more general subjects below later? There's hot release stuff awaiting my attention...
And I can't help to underline the real virtue of the XA replication as a pioneer of "fragmented" replication that I tried to promote for Kristian in
Fragmented replication should send events to the slave as soon as possible after starting on the master, so the slave has time to work on it in parallel. And any conflicts with the commit order of other transactions should be detected and the fragmented transaction rolled back and retried.
But the current XA implementation does exactly the opposite: It sends the events to the slave only at the end of the transaction (XA PREPARE), and it makes it _impossible_ to rollback and retry the prepare in case of conflict (by assigning a GTID to the XA PREPARE that's updated in the @@gtid_slave_pos).
The rational part is that the XA transaction is represented by more than one GTID. Arguably it's a bit uncomfortable, but such generalization is fair to call flexible especially looking forward on implementing fragmented transaction replication, or long running and non necessarily transactional DML or DDL statements including ALTER TABLE.
But I don't see any new code in the current XA implementation that could be used for more general "fragmented transaction replication" - what did I miss?
Why do you want to assign more than one GTID to a fragmented transaction, wouldn't it be better to binlog the fragments without a GTID (as in my proposed XA solution)?
... ^ ]
(I am backing up below specifically) slows things down. Big prepared XA transactions would be idling around - and apparently creating hot spots for overall execution when their commits finally arrive.
There is no slowdown from this. Running a big transaction takes the same time whether you end it with an XA PREPARE or an XA COMMIT.
The slowdown is apparent at failover. There's nothing good in having some operations delayed in general, especially as we must have agreed (in the past at least) on the dynamic forced (by "circumstances") rollback idea. (Personally I would add that to hear defenses like that from the author of optimistic parallel replication is as painful as blasphemy from the mouth of a local priest :-)!)
In fact, my proposal will speed things up, because only one 2-phase commit between binlog and engine is needed per transaction. While in current implementation two are needed, one for XA PREPARE and one for XA COMMIT.
And a simple sequence like this will be able to group commit together (on the slave):
XA PREPARE 't1'; XA COMMIT 't1'; XA PREPARE 't2'; XA COMMIT 't2'; XA PREPARE 't3'; XA COMMIT 't3';
I believe in the current code, it's impossible to group-commit the XA PREPARE together with the XA COMMIT on a slave?
Of course they can be in one group. That's the part II ready for review in bb-10.6-MDEV-31949.
- Kristian.
All the best, Andrei
My apologies for
In fact, my proposal will speed things up, because only one 2-phase commit between binlog and engine is needed per transaction. While in current implementation two are needed, one for XA PREPARE and one for XA COMMIT.
And a simple sequence like this will be able to group commit together (on the slave):
XA PREPARE 't1'; XA COMMIT 't1'; XA PREPARE 't2'; XA COMMIT 't2'; XA PREPARE 't3'; XA COMMIT 't3';
I believe in the current code, it's impossible to group-commit the XA PREPARE together with the XA COMMIT on a slave?
Of course they can be in one group. That's the part II ready for review in bb-10.6-MDEV-31949.
this hasty statement. In fact, the XA PREPARE (XAP) of an xid can't be in one binlog group with the XA COMMIT (XAC) of that xid. Yet they can run in parallel up to the wait-for-prior-commit by the XAC. In normal cases XAC_k+i would be waiting for XAP_k's completion, which includes (per your idea) the xid release (partly also addressed by part I of the branch). This extra communication can be optimized away, for instance at the price of more sophisticated recovery (further details are deferred).
Hi! On Thu, Jan 25, 2024 at 8:09 PM <andrei.elkin@pp.inet.fi> wrote:
Kristian, this statement of row-based primary-key-only replication requires at least a confirmation with a test. OTOH, ROW may be somewhat difficult to dismiss, and I am not going to do that now, but the PK is too much, as a UK guarantees correctness, please read on. So far we only have MDEV-32020 about the non-UK ROW-format vulnerability.
Note that a Unique key is not enough. All key parts must also be NOT NULL, otherwise you have all the same problems as you have with a table without any keys. Regards, Monty
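A quick illustration of the NULL caveat (sketch):

  CREATE TABLE t (u INT, v INT, UNIQUE KEY(u)) ENGINE=InnoDB;
  INSERT INTO t VALUES (NULL, 1), (NULL, 2);  # both rows accepted: NULL values never
                                              # collide in a UNIQUE index
  # rows with u = NULL cannot be identified by the unique key alone, so for them the
  # slave-side row lookup degrades to the same situation as a table without any key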