Rich Prohaska <prohaska7@gmail.com> writes:
> Is TokuDB supposed to call the thd_report_wait_for() API just before a thread is about to wait on a TokuDB lock?
Yes, that's basically it.

Optimistic parallel replication runs transactions in parallel, but enforces that they commit in the original order. So suppose we have transactions T1 followed by T2 in the replication stream, and that they try to update the same row. When T2 gets ready to commit, it needs to wait for T1 to commit first (this is what you see in wait_for_prior_commit()). However, if T1 is waiting on a row lock held by T2, we have a deadlock.

thd_report_wait_for() checks for this condition. If a transaction goes to wait on a lock held by a later (in terms of in-order replication) transaction, the later transaction is killed (using the normal thread kill mechanism). Parallel replication then gracefully handles the kill (by rollback and retry).

You can see in storage/xtradb/lock/lock0lock.cc how this is done for InnoDB/XtraDB, e.g. lock_report_waiters_to_mysql(). Hopefully it would be easy to hook this into TokuDB. It does require being able to locate the transaction (and in particular the THD) that owns a given lock.

Another potential issue (at least it was for InnoDB/XtraDB) is that thd_report_wait_for() can call back into the handlerton->kill_query method, so the caller of thd_report_wait_for() needs to be prepared for this to happen.

Note that we can modify/extend the thd_report_wait_for() API to work better for TokuDB, if necessary. The current API was deliberately left "internal" (not a service with a public header file etc.) in anticipation that it might need changing to better support other storage engines, such as TokuDB.

Also note that the call to thd_report_wait_for() does not need to happen "just prior" to the lock wait - it can happen later, as long as it happens at some point (though of course the earlier the better, in terms of more quickly resolving the deadlock and allowing replication to proceed).
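
To make the rule concrete, here is a minimal self-contained sketch in C++. It is a toy model only, not MariaDB or TokuDB code: ToyTrx, report_wait_for() and report_lock_waiters() are hypothetical names invented for this illustration, and the real thd_report_wait_for() works on THD objects and uses the server's kill mechanism rather than setting a flag.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a transaction applied by a replication worker.
// This is NOT a MariaDB type; the real check operates on THD objects.
struct ToyTrx {
    std::uint64_t commit_seq;  // position in the replication stream (lower commits first)
    bool killed;               // set when chosen as the deadlock victim

    explicit ToyTrx(std::uint64_t seq) : commit_seq(seq), killed(false) {}
};

// Toy analogue of the check described above: `waiter` is about to wait on a
// lock owned by `owner`. If the owner must commit AFTER the waiter, the owner
// would later block in its commit-order wait while the waiter blocks on the
// lock, so the owner is marked killed (to be rolled back and retried).
// Returns true if the owner was marked killed.
bool report_wait_for(const ToyTrx& waiter, ToyTrx& owner) {
    assert(&waiter != &owner);  // a transaction does not wait on itself
    if (owner.commit_seq > waiter.commit_seq) {
        owner.killed = true;
        return true;
    }
    return false;  // ordinary lock wait: the owner commits first and releases
}

// Assumed engine-side hook: before (or some time after) starting a lock wait,
// report every transaction that owns a conflicting lock.
// Returns the number of owners that were marked killed.
int report_lock_waiters(const ToyTrx& waiter,
                        const std::vector<ToyTrx*>& owners) {
    int victims = 0;
    for (ToyTrx* owner : owners) {
        if (report_wait_for(waiter, *owner)) {
            ++victims;
        }
    }
    return victims;
}
```

With T1 at position 1 and T2 at position 2, report_wait_for(t1, t2) marks T2 as killed, while report_wait_for(t2, t1) leaves T1 alone, since T2 waiting for T1 is just a normal lock wait. An engine integration would also need the lookup from lock to owning transaction mentioned above, which this sketch takes as given.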
> I have been running sysbench oltp with a MariaDB 10.1 master-slave topology. I have not seen any replication errors when slave parallel mode is conservative.
No, errors should not happen there, because in conservative mode transactions are not run in parallel on a slave unless they ran without lock conflicts on the master (both transactions reached the commit point at the same time). But in InnoDB/XtraDB, there are some interesting (but very rare) corner cases where two transactions may or may not have lock conflicts depending on the exact order of execution. So for these cases, the thd_report_wait_for() mechanism is also needed.

 - Kristian.