Marko Mäkelä via developers <developers@lists.mariadb.org> writes:
For better testing of this, I think that we really need a 10.6-based branch that contains both this and MDEV-31482.
I did an initial merge to 10.6. The victim selection code is in Deadlock::report() in lock0lock.cc. I noticed one thing in the current 10.6 code:

    if (next == victim)
      trx_pos= l;
    ...
    if (trx_pos && trx_weight == victim_weight)
    {
      victim= trx;
      victim_pos= trx_pos;

IIUC, the intention of this code is that if the current transaction and another transaction are both candidates for being the deadlock victim, the current transaction is selected as the preferred candidate? If so, there seems to be a small mistake in the code: the first condition is true for each new value of victim, so trx_pos will always be set != 0. The first condition should probably have been "if (trx == victim)"?

In theory I think this means we could wrongly select the current trx as the deadlock victim in the case of a "P-shaped" path (where the current transaction waits into a cycle that it is not itself part of), and thus the deadlock would not be broken. I thought about coming up with a test case to show this, but it looks like it is not easy to get the same TRX_WEIGHT for trx and victim. If that case cannot occur, maybe this part of the code can just be removed; what do you think?

Another important issue I noticed, though not directly related to MDEV-31655, is this:

    /* Even though lock_wait_rpl_report() has nothing to do with
    deadlock detection, it was always disabled by
    innodb_deadlock_detect=OFF. We will keep it in that way, because
    unfortunately thd_need_wait_reports() will hold even if parallel
    (or any) replication is not being used. We want to allow the user
    to skip lock_wait_rpl_report(). */
    const bool rpl= !(type_mode & LOCK_AUTO_INC) && trx->mysql_thd &&
      innodb_deadlock_detect && thd_need_wait_reports(trx->mysql_thd);

So this is a bug I introduced in my original implementation. The reporting of lock waits is essential for optimistic parallel replication to work at all. Without it, the following will happen:

 - Transactions T1 and T2 replicate in parallel and run out of order.
 - T2 updates some row, goes to commit, and does wait_for_prior_commit(T1).
 - T1 goes to update the same row and gets a lock wait on T2.
 - T1 eventually gets a lock wait timeout and is retried.
 - The retry of T1 hits the same problem, and replication eventually breaks when the --slave-transaction-retries limit is reached.

I think for parallel replication we have to do the lock wait reporting unconditionally to avoid this. Since we are going to do a lock wait anyway, which is an expensive operation, I think it is reasonable to accept the small overhead of a lock wait report to the server layer.

However, the reason thd_need_wait_reports() also returns true for non-replication threads (when the binlog is enabled) is to write a hint in the GTID event that a transaction had a wait on the master. This is non-critical and could be avoided if we want. Maybe something we should look into/discuss.
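To make the two proposed fixes concrete, here is a minimal sketch (my own illustration, not the committed patch; the variable names are taken from the snippets above, and the surrounding loop and function bodies are assumed):

    /* Fix 1, in Deadlock::report(): remember the position only when
       the transaction under consideration actually is the current
       victim, so that trx_pos is no longer set on every iteration. */
    if (trx == victim)              /* was: if (next == victim) */
      trx_pos= l;

    /* Fix 2, where the rpl flag is computed: drop the
       innodb_deadlock_detect term, so that innodb_deadlock_detect=OFF
       can no longer suppress the lock wait report that optimistic
       parallel replication depends on. */
    const bool rpl= !(type_mode & LOCK_AUTO_INC) && trx->mysql_thd &&
      thd_need_wait_reports(trx->mysql_thd);

Whether fix 2 should still offer non-replication threads a way to skip the report (the GTID hint case mentioned above) is the part I think we should discuss.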
I sent some minor comments that apply to the InnoDB code changes.
Thanks! Fixed, except for this one:
+ extern "C" int thd_deadlock_victim_preference(const MYSQL_THD thd1, const MYSQL_THD thd2);
Usually, server function prototypes for InnoDB are declared in storage/innobase/include/ha_prototypes.h. Please move this there.
Other similar functions (e.g. thd_rpl_deadlock_check()) are declared in a section in lock0lock.cc, so I moved this declaration there as well for consistency (but let me know if you still prefer ha_prototypes.h for this one, or maybe for all of them).
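Concretely, the declarations now sit together near the top of lock0lock.cc, something like this (a sketch; the signature of thd_rpl_deadlock_check() is copied from the existing declaration there):

    /* Server-layer callbacks used by the InnoDB lock system. */
    extern "C" int thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd);
    extern "C" int thd_deadlock_victim_preference(const MYSQL_THD thd1,
                                                  const MYSQL_THD thd2);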
I think that this needs to be reviewed by the replication team as well.
Yes, Andrei said he would also review it.
Whoever reviews that should pay attention to the stability of the tests. We have a few replication tests that fail rather often.
Tell me about it ... replication tests, and in general any test involving many parallel threads of operation, are notoriously hard to keep stable. I'll make sure to check for any failures in Buildbot. I 100% agree with the need to be careful here. In fact, this problem was found because I investigated a sporadic failure in the test case rpl_mark_optimize_tbl_ddl, which was caused by the erroneous removal of this code.
In the patch, note in particular the modification to has_higher_priority(), which I think has the effect of choosing which of two waiting transactions to grant a lock to. I don't remember adding this myself, so it might have come in during the merge.
Because innodb_lock_schedule_algorithm=VATS is known to be broken even without any involvement of replication, I do not think that we have to care too much about it.
Ok.
I must trust your judgement here; I have no fundamental objection.
Thanks, Marko. I pushed an initial merge of this to 10.6, but I'll refine it once I fully understand the code around trx_pos:

  https://github.com/MariaDB/server/tree/knielsen_bugfix_10.6
  https://github.com/MariaDB/server/commit/b475257dc2297a8a7271367ad4a8d659297...

The code with review comments fixed for 10.4 is here:

  https://github.com/MariaDB/server/tree/knielsen_bugfix
  https://github.com/MariaDB/server/commit/24679f3daf57c8c62faf019f732ef263472...

 - Kristian.