Hi Andrei, thanks a lot for looking at this patch!

andrei.elkin@pp.inet.fi writes:
>> +      if (rgi->killed_for_retry != rpl_group_info::RETRY_KILL_NONE)
>> +        wait_for_pending_deadlock_kill(thd, rgi);
>
> Per my understanding (please correct me if I am wrong), I left a
> question on this block in
> https://github.com/MariaDB/server/commit/df0c36a354ffde2766ebed2615642c74b79...,
> quoted below.
>
> ... this guard would not let a victim of an optimistic conflict
> proceed until it has cleared up whether it is such a victim. For
> instance, in a binlog sequence like
>
>   T_1, T_2, D_3
>
> where T_2 is the victim and D_3 is a DDL, the victim T_2 will no
> longer release the next GCO member D_3 into execution.
>
> But what if, at this point, T_2 is only about to be marked in
> rgi->killed_for_retry by T_1? That is, with this patch as well, T_2 is
> apparently able to go through with a ...NONE kill status. T_1 would
> reach this very point later, and if that happens before T_2 finds
> itself killed, T_1 would release D_3, and only afterward would T_2
> finally see itself KILLed into retry.
I am not entirely sure I understood your question correctly, hopefully I got
it right. I think your question is that this patch only helps if T_1 sets
rgi->killed_for_retry _before_ T_2 executes the above code that does
wait_for_pending_deadlock_kill(). So what if T_1 does so a bit later, after
T_2 has already proceeded beyond this check?

This is a very good question, and the answer is complex; hopefully I can
explain it. The problem addressed by this patch in fact only occurs in the
first case, where T_1 sets rgi->killed_for_retry _before_ T_2 executes the
above wait_for_pending_deadlock_kill(). What happens when the problem occurs
is the following:

1. T_2 gets a lock inside InnoDB, on the dict_stats table in this case.
2. T_1 requests a conflicting lock on dict_stats. It calls
   thd_rpl_deadlock_check(), which sets rgi->killed_for_retry and schedules
   a task for the manager thread to kill T_2. But the manager thread does
   not service the request yet.
3. T_1 goes to wait on the lock.
4. T_2 later releases the lock on dict_stats - this is a special case,
   normally DML locks are held until COMMIT.
5. T_2 now gets ready to commit and reaches the above code added in the
   patch. The manager thread has not yet serviced the request, so
   thd->killed is not set. But rgi->killed_for_retry _is_ set, as this is
   done synchronously in thd_rpl_deadlock_check().
6. T_1 now wakes up, since the lock on dict_stats was released earlier by
   T_2.
7. T_1 can now proceed to commit and call mark_start_commit().
8. Only now does the manager thread wake up to service the request, and it
   kills T_2, setting thd->killed.

Before the patch, at this point T_2 could run mark_start_commit(). This
could allow D_3 to start too early, if T_2 later sees the thd->killed flag
being set and rolls back and retries. With the patch, this is no longer
possible, as T_2 will in (5) run wait_for_pending_deadlock_kill(). This will
wait for the manager thread to complete, and then thd->killed will be set
and T_2 will roll back and retry _before_ calling mark_start_commit().

Now consider the second case, where T_1 only sets rgi->killed_for_retry
_after_ T_2 has passed the above code point:

1. T_2 acquires some lock inside InnoDB.
2. T_2 passes the above code point with the check for
   rgi->killed_for_retry; it calls mark_start_commit() and starts the
   commit process.
3. T_1 only now gets a lock conflict with T_2. It calls
   thd_rpl_deadlock_check(), sets rgi->killed_for_retry, and schedules a
   task for the manager thread to kill T_2.
4. T_1 goes to wait on the lock.

In this situation, the lock in InnoDB was still held by T_2 when it ran
mark_start_commit(). This will now prevent T_1 from waking up and itself
doing mark_start_commit() until one of two things happens: (A) T_2 commits;
or (B) T_2 sees thd->killed and rolls back, in which case T_2 will first
call unmark_start_commit(), which avoids D_3 starting too early.

So the problem here is the special case of InnoDB locks on dict_stats, which
are held only temporarily and released _before_ commit/rollback (autoinc
locks are another such case). Normal locks held until commit/rollback are
not the problem, as these will prevent the other transaction T_1 from
calling mark_start_commit() too early. And the patch works under the
assumption that such special-case locks, which are released before
commit/rollback, can no longer be acquired after this point in the code,
where only the commit step of the transaction remains.
This part of the parallel replication code is something that I was always
unhappy with. It feels very fragile to rely on this property: that _if_ T_2
gets killed after mark_start_commit(), we can be sure that another
transaction will always be blocked on some lock held by T_2 until after T_2
has had time to do unmark_start_commit(). The patch doesn't fundamentally
change this, but it does solve a real bug that occurs in the scenario
explained above.

 - Kristian.