Re: [MariaDB commits] [PATCH 1/6] MDEV-34696: do_gco_wait() completes too early on InnoDB dict stats updates
Howdy Kristian.
Before doing mark_start_commit(), check that there is no pending deadlock kill. If there is a pending kill, we won't commit (we will abort, roll back, and retry). Then we should not mark the commit as started, since that could potentially make the following GCO start too early, before we completed the commit after the retry.
This condition could trigger in some corner cases, where InnoDB would temporarily take table/row locks that are released again immediately, not held until the transaction commits. This happens with dict_stats updates and possibly auto-increment locks.
Such locks can be passed to thd_rpl_deadlock_check() and cause a deadlock kill to be scheduled in the background. But since the blocking locks are held only temporarily, they can be released before the background kill happens. This way, the kill can be delayed until after mark_start_commit() has been called. Thus we need to check the synchronous indication rgi->killed_for_retry, not just the asynchronous thd->killed.
I think I understand what's going on, thanks also to the detailed Jira ticket description.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
---
 sql/rpl_parallel.cc | 20 ++++++++++++++++----
 sql/rpl_rli.cc      | 17 +++++++++++++++++
 2 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
index 1cfdf96ee3b..9c4222d7817 100644
--- a/sql/rpl_parallel.cc
+++ b/sql/rpl_parallel.cc
@@ -1450,11 +1450,23 @@ handle_rpl_parallel_thread(void *arg)
         after mark_start_commit(), we have to unmark, which has at least a
         theoretical possibility of leaving a window where it looks like all
         transactions in a GCO have started committing, while in fact one
-        will need to rollback and retry. This is not supposed to be possible
-        (since there is a deadlock, at least one transaction should be
-        blocked from reaching commit), but this seems a fragile ensurance,
-        and there were historically a number of subtle bugs in this area.
+        will need to rollback and retry.
+
+        Normally this will not happen, since the kill is there to resolve a
+        deadlock that is preventing at least one transaction from proceeding.
+        One case it can happen is with InnoDB dict stats update, which can
+        temporarily cause transactions to block each other, but locks are
+        released immediately, they don't linger until commit. There could be
+        other similar cases, there were historically a number of subtle bugs
+        in this area.
+
+        But once we start the commit, we can expect that no new lock
+        conflicts will be introduced. So by handling any lingering deadlock
+        kill at this point just before mark_start_commit(), we should be
+        robust even towards spurious deadlock kills.
       */
+      if (rgi->killed_for_retry != rpl_group_info::RETRY_KILL_NONE)
+        wait_for_pending_deadlock_kill(thd, rgi);
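To make the race concrete outside of the server, here is a minimal self-contained C++ sketch. It is not MariaDB code: killed_async and killed_for_retry merely stand in for thd->killed and rgi->killed_for_retry, and schedule_deadlock_kill() only approximates what thd_rpl_deadlock_check() plus the background kill task do.

// Minimal standalone model of the race described above (not MariaDB code;
// all names are invented for illustration).
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> killed_async{false};      // models thd->killed, set asynchronously
std::atomic<bool> killed_for_retry{false};  // models rgi->killed_for_retry, set synchronously

// Models thd_rpl_deadlock_check(): the blocked waiter sets the synchronous
// flag immediately, but the actual kill is delegated to a background thread.
std::thread schedule_deadlock_kill()
{
  killed_for_retry.store(true);             // synchronous indication
  return std::thread([] {                   // asynchronous kill, arbitrarily delayed
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    killed_async.store(true);
  });
}

void mark_start_commit_buggy()
{
  // Pre-patch behaviour: only the asynchronous flag is consulted, so a
  // pending kill can be missed and the commit marked as started anyway.
  if (!killed_async.load())
    std::cout << "commit marked as started (possibly wrongly)\n";
}

void mark_start_commit_fixed()
{
  // Patched behaviour: check the synchronous flag first; if a kill is
  // pending, wait for it to land (models wait_for_pending_deadlock_kill()),
  // then roll back and retry instead of marking the commit as started.
  if (killed_for_retry.load())
  {
    while (!killed_async.load())
      std::this_thread::yield();
    std::cout << "pending kill observed, will rollback and retry\n";
    return;
  }
  std::cout << "commit marked as started\n";
}

int main()
{
  std::thread bg = schedule_deadlock_kill(); // T_1 schedules a kill of this worker
  mark_start_commit_buggy();                 // misses the pending kill
  mark_start_commit_fixed();                 // waits and sees the kill instead
  bg.join();
}

Built with any C++11 compiler, the "buggy" call typically reports the commit as started even though a kill is already pending, while the "fixed" call waits for the background kill to land and reports the retry instead.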
Assuming my understanding is correct (please correct me if anything is off), I left a question on this block in https://github.com/MariaDB/server/commit/df0c36a354ffde2766ebed2615642c74b79... quoted further below.

... this guard would not let a victim of an optimistic conflict proceed until it has cleared up whether it is such a victim. For instance, in a binlog sequence like

  T_1, T_2, D_3

where T_2 is the victim and D_3 is a DDL, the victim T_2 will no longer release the next GCO member D_3 into execution.

But what if at this point T_2 is only about to be marked in rgi->killed_for_retry by T_1? That is, it is apparently able, with this patch as well, to get through with a ...NONE killed status. T_1 would reach this very point later, and if that happens before T_2 finds itself killed, T_1 would release D_3, and afterwards T_2 would finally see itself KILLed into retry.

Cheers,

Andrei
Hi Andrei, thanks a lot for looking at this patch!

andrei.elkin@pp.inet.fi writes:
+      if (rgi->killed_for_retry != rpl_group_info::RETRY_KILL_NONE)
+        wait_for_pending_deadlock_kill(thd, rgi);
Assuming my understanding is correct (please correct me if anything is off), I left a question on this block in https://github.com/MariaDB/server/commit/df0c36a354ffde2766ebed2615642c74b79... quoted further below.
... this guard would not let a victim of an optimistic conflict proceed until it has cleared up whether it is such a victim. For instance, in a binlog sequence like
T_1, T_2, D_3
where T_2 is the victim and D_3 is a DDL, the victim T_2 will no longer release the next GCO member D_3 into execution.
But what if at this point T_2 is only about to be marked in rgi->killed_for_retry by T_1?
That is, it is apparently able, with this patch as well, to get through with a ...NONE killed status. T_1 would reach this very point later, and if that happens before T_2 finds itself killed, T_1 would release D_3, and afterwards T_2 would finally see itself KILLed into retry.
I am not entirely sure I understood your question correctly, hopefully I got it right. I think your question is that this patch only helps if T_1 sets rgi->killed_for_retry _before_ T_2 executes the above code that does wait_for_pending_deadlock_kill(). So what if T_1 does so a bit later, after T_2 has already proceeded beyond this check?

This is a very good question, and the answer is complex; hopefully I can explain it.

The problem addressed by this patch in fact only occurs in the first case, where T_1 sets rgi->killed_for_retry _before_ T_2 executes the above wait_for_pending_deadlock_kill(). What happens when the problem occurs is the following:

 1. T_2 gets a lock inside InnoDB, on the dict_stats table in this case.

 2. T_1 requests a conflicting lock on dict_stats. It calls thd_rpl_deadlock_check(), which sets rgi->killed_for_retry and schedules a task for the manager thread to kill T_2. But the manager thread does not service the request yet.

 3. T_1 goes to wait on the lock.

 4. T_2 later releases the lock on dict_stats - this is a special case, normally DML locks are held until COMMIT.

 5. T_2 now gets ready to commit, and reaches the above code added in the patch. The manager thread has not yet serviced the request, so thd->killed is not set. But rgi->killed_for_retry _is_ set, as this is done synchronously in thd_rpl_deadlock_check().

 6. T_1 now wakes up, since the lock on dict_stats was released earlier by T_2.

 7. T_1 can now proceed to commit and call mark_start_commit().

 8. Only now does the manager thread wake up to service the request, and it kills T_2, setting thd->killed.

Before the patch, at this point T_2 could run mark_start_commit(). This could allow D_3 to start too early, if T_2 later sees the thd->killed flag being set and rolls back and retries.

With the patch, this is no longer possible, as T_2 will in (5) run wait_for_pending_deadlock_kill(). This will wait for the manager thread to complete, and then thd->killed will be set and T_2 will rollback and retry _before_ calling mark_start_commit().

Now consider the second case, where T_1 only sets rgi->killed_for_retry _after_ T_2 has passed the above code point:

 1. T_2 acquires some lock inside InnoDB.

 2. T_2 passes the above code point with the check for rgi->killed_for_retry, it calls mark_start_commit() and starts the commit process.

 3. T_1 only now gets a lock conflict with T_2, it calls thd_rpl_deadlock_check(), sets rgi->killed_for_retry, schedules a task for the manager thread to kill T_2.

 4. T_1 goes to wait on the lock.

In this situation, the lock in InnoDB was still held by T_2 when it ran mark_start_commit(). This will now prevent T_1 from waking up and itself doing mark_start_commit(), until one of two things happens: (A) T_2 commits; or (B) T_2 sees the thd->killed and rolls back, in which case T_2 will first call unmark_start_commit(), which avoids D_3 starting too early.

So the problem here is the special case of InnoDB locks on dict_stats, which are held only temporarily and released _before_ commit/rollback (autoinc is another such case). Normal locks held until commit/rollback are not the problem, as these will prevent the other transaction T_1 from calling mark_start_commit() too early. And the patch works under the assumption that such special-case locks, which are released before commit/rollback, can no longer be acquired after this point in the code, where only the commit step of the transaction remains.

This part of the parallel replication code is something that I was always unhappy with.
It feels very fragile to rely on this property, that _if_ T_2 gets killed after mark_start_commit(), we can be sure that another transaction will then always be blocked on some lock held by T_2 until after T_2 has time to do unmark_start_commit(). The patch doesn't fundamentally change this, but it does solve a real bug that occurs in the scenario explained above.

- Kristian.
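To see why a premature mark_start_commit() matters for the next GCO, here is a rough standalone model of the gate that do_gco_wait() provides. This is again not the real MariaDB bookkeeping; the counter and the fixed gco_size are invented simplifications. The point is only that the next GCO (D_3) is released as soon as every member of the prior GCO has marked its commit as started, so a mark that later has to be unmarked leaves exactly the window discussed above.

// Rough standalone model of the GCO gate (not the real MariaDB structures;
// names like committing_count and gco_size are invented for illustration).
#include <condition_variable>
#include <iostream>
#include <mutex>

struct gco_gate
{
  std::mutex m;
  std::condition_variable cond;
  int committing_count = 0;       // members of the GCO that have "started commit"
  const int gco_size;

  explicit gco_gate(int size) : gco_size(size) {}

  void mark_start_commit()        // called by T_1, T_2 when they start committing
  {
    std::lock_guard<std::mutex> lk(m);
    ++committing_count;
    cond.notify_all();
  }

  void unmark_start_commit()      // called if a marked transaction must roll back and retry
  {
    std::lock_guard<std::mutex> lk(m);
    --committing_count;
  }

  void do_gco_wait()              // called on behalf of D_3 before it may start executing
  {
    std::unique_lock<std::mutex> lk(m);
    cond.wait(lk, [this] { return committing_count == gco_size; });
  }
};

int main()
{
  gco_gate gate(2);               // prior GCO contains T_1 and T_2
  gate.mark_start_commit();       // T_1
  gate.mark_start_commit();       // T_2 -- if this mark is premature...
  gate.do_gco_wait();             // ...D_3 is released here,
  gate.unmark_start_commit();     // even though T_2 later unmarks to retry.
  std::cout << "D_3 started while T_2 still had to retry\n";
}

In the real code the bookkeeping is more involved, but the ordering consequence is the same: once D_3 has been let through, a later unmark_start_commit() cannot take that back.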
I am not entirely sure I understood your question correctly, hopefully I got it right. I think your question is that this patch only helps if T_1 sets rgi->killed_for_retry _before_ T_2 executes the above code that does wait_for_pending_deadlock_kill(). So what if T_1 does so a bit later, after T_2 has already proceeded beyond this check?
That's so.
Now consider the second case, where T_1 only sets rgi->killed_for_retry _after_ T_2 has passed the above code point:
1. T_2 acquires some lock inside InnoDB
2. T_2 passes the above code point with the check for rgi->killed_for_retry, it calls mark_start_commit() and starts the commit process.
3. T_1 only now gets a lock conflict with T_2, it calls thd_rpl_deadlock_check(), sets rgi->killed_for_retry, schedules a task for the manager thread to kill T_2
4. T_1 goes to wait on the lock.
In this situation, the lock in InnoDB was still held by T_2 when it ran mark_start_commit().
Hmm, this is unclear. I thought you had a case in which T_2 releases an InnoDB stats lock at the end of its current statement. And T_1 would try to acquire the lock within that timeframe.

Am I confused in the above? It'd be great to have a T_2 worker stack at the time the lock is granted to it, but it's not provided.

Obviously, under this perception, T_2 would execute mark_start_commit() after the lock is released.

Please clear it out.
andrei.elkin@pp.inet.fi writes:
Now consider the second case, where T_1 only sets rgi->killed_for_retry _after_ T_2 has passed the above code point:
1. T_2 acquires some lock inside InnoDB
2. T_2 passes the above code point with the check for rgi->killed_for_retry, it calls mark_start_commit() and starts the commit process.
3. T_1 only now gets a lock conflict with T_2, it calls thd_rpl_deadlock_check(), sets rgi->killed_for_retry, schedules a task for the manager thread to kill T_2
4. T_1 goes to wait on the lock.
In this situation, the lock in InnoDB was still held by T_2 when it ran mark_start_commit().
Hmm, this is unclear. I thought you had a case in which T_2 releases an innodb stats lock at the end of its current statement. And T_1 would try to acquire the lock within that timeframe.
I was describing two different cases, "first case" and "second case".

The "first case" is the case where the bug occurred, and the one my patch fixes. It is just as you describe, T_2 releases an innodb stats lock at the end of its current statement.

The above "second case" describes a different scenario, trying to explain why this scenario does not cause a bug. This case differs from the "first case" in that the lock taken by T_2 is _not_ released at the end of its current statement, it is only released at commit/rollback.

Sorry if this was not clear.
Am I confused in the above? It'd be great to have a T_2 worker stack at the time the lock is granted to it, but it's not provided.
I have included below a stack trace from my notes, where an insert goes to take a dict stats lock.
Obviously, under this perception, T_2 would execute mark_start_commit() after the lock is released.
For the scenario where T_2 executes mark_start_commit() after the lock is released, and the bug occurred, you should look at the "first case", described in points 1-8.

The "second case" that you found unclear you can just ignore. The point of that was to explain what happens in a different scenario, where T_2 does not release its lock until the transaction is rolled back. In this case we avoid starting D_3 too early, because T_2 does unmark_start_commit() before rolling back and releasing its lock, so there is no bug (with or without the patch).
Please clear it out.
I hope this helps.

- Kristian.

#13 0x000055af4d51a81e in lock_table_for_trx (table=0x7f30ac0018d8, trx=0x7f30ec7efe80, mode=LOCK_X, no_wait=false) at /kvm/src/my/mariadb/storage/innobase/lock/lock0lock.cc:3831
#14 0x000055af4d7baad8 in dict_stats_save (table_orig=0x7f308897a368, only_for_index=0x0) at /kvm/src/my/mariadb/storage/innobase/dict/dict0stats.cc:3298
#15 0x000055af4d7bcc3b in dict_stats_update (table=0x7f308897a368, stats_upd_option=DICT_STATS_RECALC_PERSISTENT) at /kvm/src/my/mariadb/storage/innobase/dict/dict0stats.cc:4096
#16 0x000055af4d7bd01a in dict_stats_update (table=0x7f308897a368, stats_upd_option=DICT_STATS_FETCH_ONLY_IF_NOT_IN_MEMORY) at /kvm/src/my/mariadb/storage/innobase/dict/dict0stats.cc:4216
#17 0x000055af4d455462 in dict_stats_init (table=0x7f308897a368) at /kvm/src/my/mariadb/storage/innobase/include/dict0stats.inl:165
#18 0x000055af4d474eee in ha_innobase::info_low (this=0x7f3088981450, flag=282, is_analyze=false) at /kvm/src/my/mariadb/storage/innobase/handler/ha_innodb.cc:14754
#19 0x000055af4d4758dd in ha_innobase::info (this=0x7f3088981450, flag=282) at /kvm/src/my/mariadb/storage/innobase/handler/ha_innodb.cc:15018
#20 0x000055af4d461690 in ha_innobase::open (this=0x7f3088981450, name=0x7f3088991e90 "./test/t6") at /kvm/src/my/mariadb/storage/innobase/handler/ha_innodb.cc:6131
#21 0x000055af4d0c42b0 in handler::ha_open (this=0x7f3088981450, table_arg=0x7f307c013188, name=0x7f3088991e90 "./test/t6", mode=2, test_if_locked=18, mem_root=0x0, partitions_to_open=0x0) at /kvm/src/my/mariadb/sql/handler.cc:3375
#22 0x000055af4ce7999a in open_table_from_share (thd=0x7f3088000f88, share=0x7f30889918d0, alias=0x7f308800a9f0, db_stat=33, prgflag=8, ha_open_flags=18, outparam=0x7f307c013188, is_create_table=false, partitions_to_open=0x0) at /kvm/src/my/mariadb/sql/table.cc:4459
#23 0x000055af4cc541b7 in open_table (thd=0x7f3088000f88, table_list=0x7f308800a9a8, ot_ctx=0x7f30e4288d00) at /kvm/src/my/mariadb/sql/sql_base.cc:2196
#24 0x000055af4cc58768 in open_and_process_table (thd=0x7f3088000f88, tables=0x7f308800a9a8, counter=0x7f30e4288d94, flags=0, prelocking_strategy=0x7f30e4288e18, has_prelocking_list=false, ot_ctx=0x7f30e4288d00) at /kvm/src/my/mariadb/sql/sql_base.cc:4126
#25 0x000055af4cc59a69 in open_tables (thd=0x7f3088000f88, options=..., start=0x7f30e4288d78, counter=0x7f30e4288d94, flags=0, prelocking_strategy=0x7f30e4288e18) at /kvm/src/my/mariadb/sql/sql_base.cc:4613
#26 0x000055af4cc5b850 in open_and_lock_tables (thd=0x7f3088000f88, options=..., tables=0x7f308800a9a8, derived=true, flags=0, prelocking_strategy=0x7f30e4288e18) at /kvm/src/my/mariadb/sql/sql_base.cc:5587
#27 0x000055af4cc11466 in open_and_lock_tables (thd=0x7f3088000f88, tables=0x7f308800a9a8, derived=true, flags=0) at /kvm/src/my/mariadb/sql/sql_base.h:510
#28 0x000055af4ccb46d0 in mysql_insert (thd=0x7f3088000f88, table_list=0x7f308800a9a8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /kvm/src/my/mariadb/sql/sql_insert.cc:768
#29 0x000055af4cd05d06 in mysql_execute_command (thd=0x7f3088000f88, is_called_from_prepared_stmt=false) at /kvm/src/my/mariadb/sql/sql_parse.cc:4579
andrei.elkin@pp.inet.fi writes:
Now consider the second case, where T_1 only sets rgi->killed_for_retry _after_ T_2 has passed the above code point:
1. T_2 acquires some lock inside InnoDB
2. T_2 passes the above code point with the check for rgi->killed_for_retry, it calls mark_start_commit() and starts the commit process.
3. T_1 only now gets a lock conflict with T_2, it calls thd_rpl_deadlock_check(), sets rgi->killed_for_retry, schedules a task for the manager thread to kill T_2
4. T_1 goes to wait on the lock.
In this situation, the lock in InnoDB was still held by T_2 when it ran mark_start_commit().
Hmm, this is unclear. I thought you had a case in which T_2 releases an innodb stats lock at the end of its current statement. And T_1 would try to acquire the lock within that timeframe.
I was describing two different cases, "first case" and "second case".
The "first case" is the case where the bug occurred, and the one my patch fixes. It is just as you describe, T_2 releases an innodb stats lock at the end of its current statement.
The above "second case" describes a different scenario, trying to explain why this scenario does not cause a bug. This case differs from the "first case" in that the lock taken by T_2 is _not_ released at the end of its current statement, it is only released at commit/rollback.
Sorry if this was not clear.
Am I confused in the above? It'd be great to have a T_2 worker stack at the time the lock is granted to it, but it's not provided.
I have included below a stack trace from my notes, where an insert goes to take a dict stats lock.
Obviously, under this perception, T_2 would execute mark_start_commit() after the lock is released.
For the scenario where T_2 executes mark_start_commit() after the lock is released, and the bug occurred, you should look at the "first case", described in points 1-8.
The "second case" that you found unclear you can just ignore.
Got it. Thanks for spelling it out! I see I should have asked specifically about the first case.

I meant the following: what if the p.4 lock release by T_2 happened within p.2, before T_2::rgi->killed_for_retry is changed? Something like:

  1. T_2 gets a lock inside InnoDB, on the dict_stats table in this case.

  2. T_1 requests a conflicting lock on dict_stats. It calls thd_rpl_deadlock_check() ... + 4 ... which sets rgi->killed_for_retry and schedules a task for the manager thread to kill T_2. But the manager thread does not service the request yet.

  - 3. T_1 goes to wait on the lock.
  + 3. T_1 is granted the lock immediately.

  - 4. T_2 later releases the lock on dict_stats - this is a special case, normally DML locks are held until COMMIT.

But I see now that relocating step 4 like this is impossible due to the InnoDB locking protocol. So that proves T_2 can't miss finding its kill status at mark_start_commit().
The point of that was to explain what happens in a different scenario, where T_2 does not release its lock until the transaction is rolled back. In this case we avoid starting D_3 too early, because T_2 does unmark_start_commit() before rolling back and releasing its lock, so there is no bug (with or without the patch).
Please clear it out.
I hope this helps.
No more questions, as we're done :-). Thanks for this piece of analysis and work!

Andrei
- Kristian.