
Some interesting work is starting to be done by users to test parallel replication on real workloads. One issue that comes up is performance on workloads with a relatively high number of lock conflicts between transactions.

I made some patches (against the latest 10.1) to help investigate and improve these issues, and I wanted to mention them here in case someone wants to comment on or experiment with them:

1. Adding status variables for parallel replication.

   http://lists.askmonty.org/pipermail/commits/2015-July/008131.html

This patch adds new status variables that measure the time parallel replication worker threads spend:

 - Idle (waiting for work from the SQL thread).
 - Processing relay log events.
 - Waiting for a prior group commit to finish (measuring the overhead of insufficient parallelism recorded on the master, in conservative mode, or the overhead of serialisation around DDL, in optimistic/aggressive modes).
 - Waiting for the immediately prior transaction to commit (measuring the overhead of in-order parallel replication).
 - Rolling back and re-executing events due to deadlocks.

It would be interesting to see how the different numbers compare on various workloads and parallel replication modes.

2. More aggressive retry of conflicting transactions.

   http://lists.askmonty.org/pipermail/commits/2015-July/008133.html

When we get a deadlock and a transaction retry, the current MariaDB waits for all prior transactions to commit before retrying. The logic is that since we already got a conflict, there is a high risk that an immediate retry will just give another conflict. So if we have T1 T2 T3 T4, and T4 conflicts with T1, we roll back T4, wait for T3 (and therefore also T1 and T2) to commit, and only then retry T4. If we have many such conflicts, we could end up wasting a lot of time on such waits.

This patch changes aggressive mode so that T4 only waits for T1 to commit before it retries. This allows T4 to run in parallel with T2 and T3.
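The difference between the two retry rules can be modelled in a few lines. This is a minimal Python sketch, not the server implementation: it assumes transactions are numbered in master commit order and commit in that same order on the slave, and the function names `retry_wait_target` and `overlaps_with_retry` are hypothetical, chosen only for illustration.

```python
from typing import List


def retry_wait_target(victim: int, blocker: int, aggressive: bool) -> int:
    """Return the number of the transaction whose commit the rolled-back
    victim waits for before it is retried (hypothetical model).

    Transactions are numbered in master commit order (T1, T2, ...) and
    are assumed to commit in that same order on the slave.
    """
    if not 1 <= blocker < victim:
        raise ValueError("blocker must be an earlier transaction than victim")
    if aggressive:
        # Patched aggressive mode: wait only for the transaction we conflicted with.
        return blocker
    # Existing behaviour: wait for *all* prior transactions. Since commits
    # happen in order, that is the same as waiting for the immediately prior one.
    return victim - 1


def overlaps_with_retry(victim: int, blocker: int, aggressive: bool) -> List[int]:
    """Prior transactions that may still be running when the retry starts."""
    target = retry_wait_target(victim, blocker, aggressive)
    return list(range(target + 1, victim))


if __name__ == "__main__":
    # The example from the text: T4 conflicts with T1.
    for aggressive in (False, True):
        target = retry_wait_target(4, 1, aggressive)
        names = [f"T{t}" for t in overlaps_with_retry(4, 1, aggressive)]
        print(f"aggressive={aggressive}: wait for T{target}, retry overlaps {names}")
```

With the T1..T4 example, the old rule makes T4 wait for T3 and overlap with nothing, while the patched rule makes it wait only for T1 and overlap with T2 and T3. The model deliberately ignores the cost side of the trade-off: an earlier retry may hit another conflict, which is exactly what the extra retry counts would reveal.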
It will be interesting to see if this improves throughput in aggressive mode on some workloads, and also whether, and by how much, it increases the number of transaction retries and the associated overhead.

3. Debug patch to log all row lock waits.

   http://lists.askmonty.org/pipermail/commits/2015-July/008132.html

This patch is not suitable for production use. It adds an option --gtid-log-all-lock-conflicts. When it is enabled, whenever one parallel replication transaction needs to wait on an InnoDB row lock held by another, a line is logged to the error log, including the GTIDs of the transactions. This could be used to correlate back with the binlog and study exactly which transactions are conflicting with each other. Such results could again be highly interesting.

 - Kristian.