Slow or hanging parallel replication in 10.6
As we are discussing possible problems with XA parallel replication being slow or hanging in 10.6, I just want to remind everyone of these recently fixed bugs:

MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication
    This can cause slow or hanging parallel replication, as it hangs for the duration of --innodb-lock-wait-timeout.

MDEV-32096: Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait
    This can cause slow or hanging parallel replication.

MDEV-31655: Parallel replication deadlock victim preference code erroneously removed
    This can cause parallel replication to be slow or to fail with "too many retries".

These bugs were fixed in 10.6.16.

These bugs are not related to XA. But their symptoms are that optimistic parallel replication is occasionally slow, hangs, or breaks under high load. So it is important not to mistakenly assume that these symptoms are necessarily caused by XA, even though the workload that has problems might be using external XA transactions. In versions prior to 10.6.16, these kinds of problems are likely to be caused by the known bugs above.

(I got this list from a quick scan through the 10.6 commit history; there may be one or two more like them that I missed. But the conclusion is the same: using optimistic parallel replication under high load in 10.6 requires at least 10.6.16 to work.)

 - Kristian.
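As a rough illustration of the version requirement above, here is a minimal Python sketch that checks whether a server version string (for example, the output of SELECT VERSION()) is a 10.6 release with these three fixes. The function names and the MIN_FIXED_10_6 constant are hypothetical and made up for this example; the only fact taken from this message is that the fixes are in 10.6.16. Other release series are deliberately reported as "unknown", since nothing is claimed about them here.

```python
import re

# First 10.6 release containing the three fixes, per the message above.
# The constant name is hypothetical, chosen for this sketch.
MIN_FIXED_10_6 = (10, 6, 16)


def parse_version(version):
    """Extract (major, minor, patch) from a string like '10.6.15-MariaDB-log'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(g) for g in m.groups())


def has_parallel_replication_fixes(version):
    """Return True or False for 10.6 servers, None for any other series."""
    parsed = parse_version(version)
    if parsed[:2] != (10, 6):
        return None  # the message only makes a claim about 10.6
    return parsed >= MIN_FIXED_10_6
```

A result of False only means that the known bugs are a likely explanation for slow or hanging optimistic parallel replication on that server; it does not rule out other causes.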
participants (1)
-
Kristian Nielsen