As was discussed recently off-list, there is a desire to find a minimal fix to the performance issues in 10.6 with parallel replication of user XA due to non-optimal scheduling (MDEV-31949).

How about something like this two-patch series? It _only_ modifies the scheduling of event groups to worker threads, and only for XA transactions, and is quite a bit simpler than the other proposed patches for MDEV-31949.

The current 10.6 code for parallel replication of user XA needs to avoid applying in parallel two different event groups that have the same XA XID. Typically these are the XA PREPARE and the corresponding XA COMMIT/ROLLBACK, but it could also be another transaction that uses a duplicate XID.

The method currently used in 10.6 takes a hash of the XID and assigns the transaction to worker (hash MOD N) amongst the N workers. This is sub-optimal, since there is a high chance that amongst N transactions, several of them will map to the same worker. Thus fewer than N transactions will be able to replicate in parallel and group commit together.

The second patch in the series improves this by keeping track of recently active XIDs and which worker they were scheduled on. A subsequent event group with the same XID is scheduled on the same worker. If the XID is different, the event group is scheduled normally on the next free worker in round-robin fashion.

This permits most XA PREPARE to be scheduled without dependency restrictions, same as normal non-XA transactions. Only if the user submits multiple XA transactions close together using the same duplicate XID will there be dependency restrictions.

The XA COMMIT will normally be scheduled on the same worker as the XA PREPARE, unless the two events are far apart in the replication stream. This is mostly unavoidable in the current XA replication design, since the XA PREPARE and XA COMMIT of a single transaction cannot group-commit together.

The first patch is a refactor/preparation patch.
It rewrites the scheduling to use an explicit FIFO for scheduling the threads instead of a simple `i := (i+1) % N` cyclic counter. This makes it possible to combine explicit scheduling of some transactions with round-robin scheduling of the rest in the second patch.

Patches are also available on GitHub:

  https://github.com/MariaDB/server/commits/knielsen_xa_sched_minimal_fix
  https://github.com/MariaDB/server/commit/66d6cce96f831b638812490844d75423178...
  https://github.com/MariaDB/server/commit/3a32fb546c111e4627cad7c66bcc089bc11...

This is just a quick proof-of-concept I did today, but it seems to work, and is small and simple and suitable for targeted fixing and cherry-picking.

Brandon and/or Andrei, maybe you could try your benchmarks with this patch and see if it also solves the performance issues you were seeing? Or let me know if there's something else required for XA that I'm missing.

 - Kristian.

Kristian Nielsen (2):
  Refactor parallel replication round-robin scheduling to use explicit FIFO
  More precise dependency tracking of XA XID in parallel replication

 sql/rpl_parallel.cc | 158 ++++++++++++++++++++++++++++++++++++++------
 sql/rpl_parallel.h  |  48 +++++++++++++-
 2 files changed, 181 insertions(+), 25 deletions(-)

-- 
2.30.2
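
For readers who want the scheduling idea in concrete form, here is a rough standalone sketch of the two approaches described above: the hash MOD N assignment, and an explicit FIFO combined with a table of recently active XIDs. This is a toy model, not code from the patches. All names (`ToyScheduler`, `hash_mod_worker`, the `window` parameter) are made up for illustration, and details such as moving an explicitly chosen worker to the back of the FIFO, or measuring "recent" as a count of event groups, are assumptions that may differ from what sql/rpl_parallel.cc actually does.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>

// Old approach (toy version): the worker is fully determined by the XID hash,
// so unrelated XIDs can collide on the same worker.
inline int hash_mod_worker(const std::string &xid, int n_workers) {
  return static_cast<int>(std::hash<std::string>{}(xid) %
                          static_cast<std::size_t>(n_workers));
}

// New approach (toy version, hypothetical names): round-robin via an explicit
// FIFO of worker indices, plus a table of recently seen XIDs.
class ToyScheduler {
public:
  // window: how many event groups an XID is remembered for (an assumption).
  ToyScheduler(int n_workers, std::size_t window) : window_(window) {
    for (int i = 0; i < n_workers; ++i) fifo_.push_back(i);
  }

  // Returns the worker for the next event group.
  // Pass an empty xid for ordinary non-XA transactions.
  int schedule(const std::string &xid) {
    ++seq_;
    int worker;
    auto it = xid.empty() ? recent_.end() : recent_.find(xid);
    if (it != recent_.end() && seq_ - it->second.seq <= window_) {
      // Same XID seen recently: reuse its worker so the two event groups
      // are applied in order, never in parallel.
      worker = it->second.worker;
      fifo_.erase(std::find(fifo_.begin(), fifo_.end(), worker));
    } else {
      // Unknown or stale XID: take the next worker in round-robin order.
      worker = fifo_.front();
      fifo_.pop_front();
    }
    fifo_.push_back(worker);  // the worker just used goes to the back
    if (!xid.empty()) recent_[xid] = Entry{worker, seq_};
    return worker;  // note: this toy never prunes old entries from recent_
  }

private:
  struct Entry { int worker; std::size_t seq; };
  std::size_t window_;
  std::size_t seq_ = 0;
  std::deque<int> fifo_;
  std::unordered_map<std::string, Entry> recent_;
};
```

With 4 workers, scheduling XA PREPARE for "xid-a", then "xid-b", then a non-XA transaction places them on workers 0, 1 and 2. A following XA COMMIT for "xid-a" goes back to worker 0, and the next new XID continues round-robin on worker 3. With hash MOD N, by contrast, nothing prevents "xid-a" and "xid-b" from landing on the same worker.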