Re: [Maria-developers] 889f90365ef: MDEV-23536 : Race condition between KILL and transaction commit
Hi, Jan! On Dec 19, Jan Lindström wrote:
revision-id: 889f90365ef (mariadb-10.2.31-623-g889f90365ef) parent(s): 74223c33d1a author: Jan Lindström <jan.lindstrom@mariadb.com> committer: Jan Lindström <jan.lindstrom@mariadb.com> timestamp: 2020-12-18 16:10:39 +0200 message:
MDEV-23536 : Race condition between KILL and transaction commit ... If you look carefully into the above, you can conclude that thd->free_connection() can be called concurrently with KILL/thd->awake(). Which is the bug. And it is partially fixed in THD::~THD(), that is destructor waits for KILL completion:
diff --git a/sql/sql_class.cc b/sql/sql_class.cc index 7eaafbd9044..46aeb4f2bb3 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -1437,6 +1437,31 @@ void THD::cleanup(void) void THD::free_connection() { DBUG_ASSERT(free_connection_done == 0); +#ifdef WITH_WSREP + /* + wsrep_rgi should be protected with LOCK_thd_data mutex and + mutexing ordering rules state that LOCK_thd_data must be taken + before LOCK_thd_kill. + */ + if (wsrep_rgi && WSREP(this)) mysql_mutex_lock(&LOCK_thd_data); +#endif + /* + Other threads may have a lock on LOCK_thd_kill to ensure that this + THD is not deleted while they access it. The following mutex_lock + ensures that no one else is using this THD and it's now safe to + continue. + */ + mysql_mutex_lock(&LOCK_thd_kill); + mysql_mutex_unlock(&LOCK_thd_kill);
Note that THD destructor starts from `assert_not_linked()` - it's important, because if the THD is linked in the list of threads, then someone might start using it after you released LOCK_thd_kill. With the new logic you need to move assert_not_linked to free_connection(). otherwise ok to push
+ +#ifdef WITH_WSREP + if (rpl_group_info *rgi= wsrep_rgi) + { + wsrep_rgi= 0; + if (WSREP(this)) mysql_mutex_unlock(&LOCK_thd_data); + delete rgi; + } +#endif my_free(db); db= NULL;
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi, Sergei, Jan, I have already approved the InnoDB changes. On Sat, Dec 19, 2020 at 12:43 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Jan! [snip]
+ mysql_mutex_lock(&LOCK_thd_kill); + mysql_mutex_unlock(&LOCK_thd_kill);
Note that THD destructor starts from `assert_not_linked()` - it's important, because if the THD is linked in the list of threads, then someone might start using it after you released LOCK_thd_kill.
With the new logic you need to move assert_not_linked to free_connection().
otherwise ok to push
Do you mean that also the empty mutex lock/unlock pair should be removed from THD::~THD()? It was copied (not moved) from there in this patch. Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation
Hi, Marko! On Dec 19, Marko Mäkelä wrote:
I have already approved the InnoDB changes.
Yes, I've noticed :)
On Sat, Dec 19, 2020 at 12:43 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Jan! [snip]
+ mysql_mutex_lock(&LOCK_thd_kill); + mysql_mutex_unlock(&LOCK_thd_kill);
Note that THD destructor starts from `assert_not_linked()` - it's important, because if the THD is linked in the list of threads, then someone might start using it after you released LOCK_thd_kill.
With the new logic you need to move assert_not_linked to free_connection().
otherwise ok to push
Do you mean that also the empty mutex lock/unlock pair should be removed from THD::~THD()? It was copied (not moved) from there in this patch.
Not exactly. In 10.2 THD::~THD() only has mysql_mutex_lock(&LOCK_thd_data); mysql_mutex_unlock(&LOCK_thd_data); it's not what Jan did. So I thought it'd be safer to leave it in place. But 10.5 has if (WSREP_NNULL(this)) mysql_mutex_lock(&LOCK_thd_data); mysql_mutex_lock(&LOCK_thd_kill); mysql_mutex_unlock(&LOCK_thd_kill); if (WSREP_NNULL(this)) mysql_mutex_unlock(&LOCK_thd_data); which is *exactly* what Jan did, so yes, when merging this chunk should go away (and it's quite likely that you'll be doing this merge, so please don't forget). Because of Jan's change in THD::~THD() there will be a merge conflict in the destructor, so it cannot slip unnoticed and auto-merged. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Sergei,
Not exactly. In 10.2 THD::~THD() only has
mysql_mutex_lock(&LOCK_thd_data); mysql_mutex_unlock(&LOCK_thd_data);
it's not what Jan did. So I thought it'd be safer to leave it in place. But 10.5 has
if (WSREP_NNULL(this)) mysql_mutex_lock(&LOCK_thd_data); mysql_mutex_lock(&LOCK_thd_kill); mysql_mutex_unlock(&LOCK_thd_kill); if (WSREP_NNULL(this)) mysql_mutex_unlock(&LOCK_thd_data);
which is *exactly* what Jan did, so yes, when merging this chunk should go away (and it's quite likely that you'll be doing this merge, so please don't forget). Because of Jan's change in THD::~THD() there will be a merge conflict in the destructor, so it cannot slip unnoticed and auto-merged.
I do not follow merging this chunk should go away here. 10.2 : https://github.com/MariaDB/server/commit/d112c363b51768b9e1d7bc7c600a4358a10... 10.3 : https://github.com/MariaDB/server/commit/79b22723d87459f2bf6265b7fcc163ef63a... 10.6 : https://github.com/MariaDB/server/commit/cede9020fa7da400bb285780ae6d8a3e15a... Looking into KILL: 10.2: KILL find_thread_by_id acquires LOCK_THD_data and THD::set_killed aquires LOCK_thd_kill 10.3-10.6: KILL find_thread_by_id acquires LOCK_thd_data (wsrep) and LOCK_thd_data THD::set_killed acquires LOCK_thd_kill In all cases I do: void THD::free_connection() { DBUG_ASSERT(free_connection_done == 0); #ifdef WITH_WSREP /* wsrep_rgi should be protected with LOCK_thd_data mutex and mutexing ordering rules state that LOCK_thd_data must be taken before LOCK_thd_kill. */ if (wsrep_rgi) mysql_mutex_lock(&LOCK_thd_data); #endif /* Other threads may have a lock on LOCK_thd_kill to ensure that this THD is not deleted while they access it. The following mutex_lock ensures that no one else is using this THD and it's now safe to continue. */ mysql_mutex_lock(&LOCK_thd_kill); mysql_mutex_unlock(&LOCK_thd_kill); #ifdef WITH_WSREP if (rpl_group_info *rgi= wsrep_rgi) { assert(WSREP(this)); wsrep_rgi= 0; mysql_mutex_unlock(&LOCK_thd_data); delete rgi; } #endif Code looks like that because Marko requested not to do delete rgi under a mutex. So do you mean that I should take LOCK_thd_data even when we do not have wsrep_rgi for wsrep as if I correctly understand wsrep_rgi is there only for applier not all THDs with WSREP(thd). R: Jan
Hi, Jan! On Dec 20, Jan Lindström wrote:
Hi Sergei,
Not exactly. In 10.2 THD::~THD() only has
mysql_mutex_lock(&LOCK_thd_data); mysql_mutex_unlock(&LOCK_thd_data);
it's not what Jan did. So I thought it'd be safer to leave it in place. But 10.5 has
if (WSREP_NNULL(this)) mysql_mutex_lock(&LOCK_thd_data); mysql_mutex_lock(&LOCK_thd_kill); mysql_mutex_unlock(&LOCK_thd_kill); if (WSREP_NNULL(this)) mysql_mutex_unlock(&LOCK_thd_data);
which is *exactly* what Jan did, so yes, when merging this chunk should go away (and it's quite likely that you'll be doing this merge, so please don't forget). Because of Jan's change in THD::~THD() there will be a merge conflict in the destructor, so it cannot slip unnoticed and auto-merged.
I do not follow merging this chunk should go away here. ... So do you mean that I should take LOCK_thd_data even when we do not have wsrep_rgi for wsrep as if I correctly understand wsrep_rgi is there only for applier not all THDs with WSREP(thd).
No, I mean that 10.5 has the the same code in THD::~THD() that you have added to free_connection() in 10.2. So when your patch will be merged to 10.5, this old code should be removed from THD::~THD(), there's no need to lock same mutexes twice. Here's your code: https://github.com/MariaDB/server/blob/cede9020fa/sql/sql_class.cc#L1617-L16... so this old code (in the same file, a bit down) can be removed: https://github.com/MariaDB/server/blob/cede9020fa/sql/sql_class.cc#L1725-L17... Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
participants (3)
-
Jan Lindström
-
Marko Mäkelä
-
Sergei Golubchik