Marko Mäkelä via developers <developers@lists.mariadb.org> writes:
For better testing of this, I think that we really need a 10.6-based branch that contains both this and MDEV-31482.
I did an initial merge to 10.6. The victim selection code is in Deadlock::report() in lock0lock.cc. I noticed one thing in the current 10.6 code:

    if (next == victim)
      trx_pos= l;
    ...
    if (trx_pos && trx_weight == victim_weight)
    {
      victim= trx;
      victim_pos= trx_pos;

IIUC, the intention of this code is that if the current transaction and another transaction are both candidates for being the deadlock victim, the current transaction is selected as the preferred candidate? If so, there seems to be a small mistake in the code: the first condition is true for each new value of victim, so trx_pos will always be set != 0. The first condition should probably have been "if (trx == victim)"?

In theory I think this means we could wrongly select the current trx as the deadlock victim in the case of a "P-shaped" path (where the current transaction waits into a cycle that it is not itself part of), and thus the deadlock would not be broken. I thought about coming up with a test case to show this, but it looks like it is not easy to get the same TRX_WEIGHT for trx and victim. If that case cannot occur, maybe this part of the code can just be removed; what do you think?

Another important issue I noticed, though not directly related to MDEV-31655, is this:

    /* Even though lock_wait_rpl_report() has nothing to do with
    deadlock detection, it was always disabled by
    innodb_deadlock_detect=OFF. We will keep it in that way, because
    unfortunately thd_need_wait_reports() will hold even if parallel
    (or any) replication is not being used. We want to allow the user
    to skip lock_wait_rpl_report(). */
    const bool rpl= !(type_mode & LOCK_AUTO_INC) && trx->mysql_thd &&
      innodb_deadlock_detect && thd_need_wait_reports(trx->mysql_thd);

So this is a bug I introduced in my original implementation. The reporting of lock waits is essential for optimistic parallel replication to work at all. Without it, the following will happen:

 - Transactions T1 and T2 replicate in parallel and run out of order.
 - T2 updates some row, goes to commit, and does wait_for_prior_commit(T1).
 - T1 goes to update the same row and gets a lock wait on T2.
 - T1 eventually gets a lock wait timeout and is retried.
 - The retry of T1 hits the same problem, and replication eventually breaks when the --slave-transaction-retries limit is reached.

I think for parallel replication we have to do the lock wait reporting unconditionally to avoid this. Since we are going to do a lock wait anyway, which is an expensive operation, I think it is reasonable to accept the small overhead of a lock wait report to the server layer.

However, the reason thd_need_wait_reports() also returns true for non-replication threads (when the binlog is enabled) is to write a hint in the GTID event that a transaction had a wait on the master. This is non-critical and could be avoided if we want. Maybe something we should look into/discuss.
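To make the two proposed fixes concrete, here is a minimal sketch (my own illustration, not the committed patch; the variable names are taken from the snippets above, and the surrounding loop and function bodies are assumed):

    /* Fix 1, in Deadlock::report(): remember the position only when
       the transaction under consideration actually is the current
       victim, so that trx_pos is no longer set on every iteration. */
    if (trx == victim)              /* was: if (next == victim) */
      trx_pos= l;

    /* Fix 2, where the rpl flag is computed: drop the
       innodb_deadlock_detect term, so that innodb_deadlock_detect=OFF
       can no longer suppress the lock wait report that optimistic
       parallel replication depends on. */
    const bool rpl= !(type_mode & LOCK_AUTO_INC) && trx->mysql_thd &&
      thd_need_wait_reports(trx->mysql_thd);

Whether fix 2 should still offer non-replication threads a way to skip the report (the GTID hint case mentioned above) is the part I think we should discuss.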
I sent some minor comments that apply to the InnoDB code changes.
Thanks! Fixed, except for this one:
+ extern "C" int thd_deadlock_victim_preference(const MYSQL_THD thd1, const MYSQL_THD thd2);
Usually, server function prototypes for InnoDB are declared in storage/innobase/include/ha_prototypes.h. Please move this there.
Other similar functions (e.g. thd_rpl_deadlock_check()) are declared in a section in lock0lock.cc, so I moved this declaration there as well for consistency (but let me know if you still prefer ha_prototypes.h for this one, or maybe for all of them).
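Concretely, the declarations now sit together near the top of lock0lock.cc, something like this (a sketch; the signature of thd_rpl_deadlock_check() is copied from the existing declaration there):

    /* Server-layer callbacks used by the InnoDB lock system. */
    extern "C" int thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd);
    extern "C" int thd_deadlock_victim_preference(const MYSQL_THD thd1,
                                                  const MYSQL_THD thd2);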
I think that this needs to be reviewed by the replication team as well.
Yes, Andrei said he would also review it.
Whoever reviews that should pay attention to the stability of the tests. We have a few replication tests that fail rather often.
Tell me about it ... replication tests, and in general any test involving many parallel threads of operation, are notoriously hard to keep stable. I'll make sure to check for any failures in Buildbot. I 100% agree with the need to be careful here. In fact, this problem was found because I investigated a sporadic failure in the test case rpl_mark_optimize_tbl_ddl, which was caused by the erroneous removal of this code.
In the patch, note in particular the modification to has_higher_priority(), which I think has the effect of choosing which of two waiting transactions to grant a lock to. I don't remember adding this myself, so it might have come in during the merge.
Because innodb_lock_schedule_algorithm=VATS is known to be broken even without any involvement of replication, I do not think that we have to care too much about it.
Ok.
I must trust your judgement here; I have no fundamental objection.
Thanks, Marko. I pushed an initial merge of this to 10.6, but I'll refine it once I fully understand the code around trx_pos:

  https://github.com/MariaDB/server/tree/knielsen_bugfix_10.6
  https://github.com/MariaDB/server/commit/b475257dc2297a8a7271367ad4a8d659297...

The code with review comments fixed for 10.4 is here:

  https://github.com/MariaDB/server/tree/knielsen_bugfix
  https://github.com/MariaDB/server/commit/24679f3daf57c8c62faf019f732ef263472...

 - Kristian.