Hi, Jan! On Dec 19, Jan Lindström wrote:
revision-id: 889f90365ef (mariadb-10.2.31-623-g889f90365ef) parent(s): 74223c33d1a author: Jan Lindström <jan.lindstrom@mariadb.com> committer: Jan Lindström <jan.lindstrom@mariadb.com> timestamp: 2020-12-18 16:10:39 +0200 message:
MDEV-23536 : Race condition between KILL and transaction commit ... If you look carefully into the above, you can conclude that thd->free_connection() can be called concurrently with KILL/thd->awake(). Which is the bug. And it is partially fixed in THD::~THD(), that is destructor waits for KILL completion:
diff --git a/sql/sql_class.cc b/sql/sql_class.cc index 7eaafbd9044..46aeb4f2bb3 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -1437,6 +1437,31 @@ void THD::cleanup(void) void THD::free_connection() { DBUG_ASSERT(free_connection_done == 0); +#ifdef WITH_WSREP + /* + wsrep_rgi should be protected with LOCK_thd_data mutex and + mutexing ordering rules state that LOCK_thd_data must be taken + before LOCK_thd_kill. + */ + if (wsrep_rgi && WSREP(this)) mysql_mutex_lock(&LOCK_thd_data); +#endif + /* + Other threads may have a lock on LOCK_thd_kill to ensure that this + THD is not deleted while they access it. The following mutex_lock + ensures that no one else is using this THD and it's now safe to + continue. + */ + mysql_mutex_lock(&LOCK_thd_kill); + mysql_mutex_unlock(&LOCK_thd_kill);
Note that THD destructor starts from `assert_not_linked()` - it's important, because if the THD is linked in the list of threads, then someone might start using it after you released LOCK_thd_kill. With the new logic you need to move assert_not_linked to free_connection(). otherwise ok to push
+ +#ifdef WITH_WSREP + if (rpl_group_info *rgi= wsrep_rgi) + { + wsrep_rgi= 0; + if (WSREP(this)) mysql_mutex_unlock(&LOCK_thd_data); + delete rgi; + } +#endif my_free(db); db= NULL;
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org