Hi, Seppo, Jan! Note, this is 10.2 patch below.
commit 4b164f176e6 Author: Seppo Jaakola <seppo.jaakola@galeracluster.com> Date: Wed Sep 15 09:16:44 2021 +0300
MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
I think this should say MDEV-23328 Server hang due to Galera lock conflict resolution it'd be easier to understand the commit comment and code changes if it'll refer to the correct problem it is fixing. The fix for MDEV-25114 was the previous commit, the one that This reverts commit 29bbcac0ee841faaa68eeb09c86ff825eabbe6b6 it should have MDEV-25114 in the commit comment.
This patch is the plan D variant for fixing potetial mutex locking
"potential"
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen.
TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well.
I like the idea of using TOI apparatus to provide total isolation for KILL. Actually replicating it to nodes is kind of crude, it'd be more elegant not to replicate at all. But it'll do.
This patch also fixes mutex locking order and unprotected THD member accesses on bf aborting case. We try to hold THD::LOCK_thd_data during bf aborting. Only case where it is not possible is at wsrep_abort_transaction before call wsrep_innobase_kill_one_trx where we take InnoDB mutexes first and then THD::LOCK_thd_data.
This will also fix possible race condition during close_connection and while wsrep is disconnecting connections.
Added wsrep_bf_kill_debug test case
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
diff --git a/sql/sql_class.cc b/sql/sql_class.cc index 5ada018e540..05ec9a1c369 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -1804,11 +1804,11 @@ void THD::awake(killed_state state_to_set) the Vio might be disassociated concurrently. */
-void THD::disconnect() +void THD::disconnect_mutexed()
should be disconnect_no_mutex, cf. awake_no_mutex, set_killed_no_mutex.
{ Vio *vio= NULL;
- mysql_mutex_lock(&LOCK_thd_data); + mysql_mutex_assert_owner(&LOCK_thd_data);
set_killed(KILL_CONNECTION);
diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc index 3e1f248b082..2bec9c6b6cd 100644 --- a/sql/sql_parse.cc +++ b/sql/sql_parse.cc @@ -9069,6 +9069,18 @@ static void sql_kill(THD *thd, longlong id, killed_state state, killed_type type) { uint error; +#ifdef WITH_WSREP + if (WSREP(thd)) + { + WSREP_DEBUG("sql_kill called"); + if (thd->wsrep_applier) + { + WSREP_DEBUG("KILL in applying, bailing out here"); + return; + } + WSREP_TO_ISOLATION_BEGIN(WSREP_MYSQL_DB, NULL, NULL)
when can it fail and jump to wsrep_error_label? under what conditions?
+ } +#endif /* WITH_WSREP */ if (!(error= kill_one_thread(thd, id, state, type))) { if (!thd->killed) diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc index e60100e2e90..00a6dfe2f8a 100644 --- a/sql/wsrep_mysqld.cc +++ b/sql/wsrep_mysqld.cc @@ -835,13 +835,25 @@ void wsrep_thr_init() DBUG_VOID_RETURN; }
+/* This is wrapper for wsrep_break_lock in thr_lock.c */ +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void *victim_thd_ptr, my_bool signal) +{ + THD* victim_thd= (THD *) victim_thd_ptr; + /* We need to lock THD::LOCK_thd_data to protect victim + from concurrent usage or disconnect or delete. */
How do you know victim_thd wasn't deleted before you locked LOCK_thd_data below?
+ mysql_mutex_lock(&victim_thd->LOCK_thd_data); + int res= wsrep_abort_thd(bf_thd_ptr, victim_thd_ptr, signal); + mysql_mutex_unlock(&victim_thd->LOCK_thd_data); + return res; +}
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 1f666f19d8a..921b8282e45 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -19739,27 +19739,65 @@ wsrep_abort_transaction( { DBUG_ENTER("wsrep_abort_transaction");
+ ut_ad(bf_thd); + ut_ad(victim_thd); trx_t* victim_trx= thd_to_trx(victim_thd); - trx_t* bf_trx = (bf_thd) ? thd_to_trx(bf_thd) : NULL; + trx_t* bf_trx= thd_to_trx(bf_thd);
if (victim_trx) { + wsrep_thd_UNLOCK(victim_thd);
what keeps victim_trx from disappearing here?
lock_mutex_enter(); trx_mutex_enter(victim_trx); wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim_trx, signal); lock_mutex_exit(); trx_mutex_exit(victim_trx); wsrep_srv_conc_cancel_wait(victim_trx); + wsrep_thd_LOCK(victim_thd); DBUG_VOID_RETURN; } else { - wsrep_thd_LOCK(victim_thd); wsrep_thd_set_conflict_state(victim_thd, MUST_ABORT); - wsrep_thd_UNLOCK(victim_thd); wsrep_thd_awake(victim_thd, signal); }
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org