On Wed, Feb 26, 2014 at 11:51 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Looking at the code it looks like most places (including THD::awake) assume that thd->mysys_var->current_mutex and thd->mysys_var->current_cond can be changed only when thd->mysys_var->mutex is locked. debug_sync.cc is breaking this assumption. I've found that Item_func_sleep::val_int(), THD::enter_cond() and THD::exit_cond() break this assumption too. I believe all of those places should be fixed to lock thd->mysys_var->mutex before any attempts to read or write to thd->mysys_var->current_mutex and thd->mysys_var->current_cond.
Hm... well, as I understand the code in THD::awake(), the idea is that it is ok for enter_cond() to _set_ the new values without locks. THD::awake() will use them if they are set, and ignore them if not. However, to _clear_ them, the mysys_var->mutex needs to be held - and awake() holds this mutex while using the values.
It would seem a bit sad to introduce an extra mutex for every call to enter_cond(), only to satisfy some debug code which is never used in production. But maybe it is not so important, since that is only done when the thread is going to sleep on a condition anyway?
I am also not sure if it could introduce risk of deadlocks or other issues if taking the mysys_var->mutex while also holding the mysys_var->current_mutex. There is a comment in exit_cond() that says that current_mutex must be unlocked before mysys_var->mutex is locked. But I do not know enough of this code to be sure if that is a problem also in enter_cond().
I didn't look at it this way (that current_cond and current_mutex can go NULL -> non-NULL without mutex and then changed under mutex). Probably that could usually work. And that would mean that only debug_sync.cc must be changed to lock thd->mysys_var->mutex before changing current_cond and current_mutex. But I'd say trying to build system like that is very fragile and unreliable. You can't trust the fact that someone won't later look at THD::enter_cond(), decide that current_cond and current_mutex can be changed anywhere without mutex, and add some code doing that. And you can't even rely on the fact that compiler won't optimize something and won't move assignment of current_mutex and current_cond into some inappropriate place or won't even write some temporary values in there -- compiler is totally allowed to do that. Besides the code like this makes it completely impossible to analyze MySQL/MariaDB execution using dynamic tools like ThreadSanitizer (https://code.google.com/p/data-race-test/wiki/ThreadSanitizer) which is very helpful in finding bugs. So I'd suggest to add an extra mutex or come up with some other synchronization solution that will guarantee cleanliness without any assumptions. Pavel