Sergey Vojtovich <svoj@mariadb.org> writes:
>> # First spin wait, hoping to get the mutex without yielding.
>> os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
>
> Where did you see "os_compare_and_swap_ulint(&mutex->waiters, 0, 1)"?
> FWICS waiters of InnoDB mutex are non-atomic loads/stores.
mutex_enter_func() calls mutex_spin_wait(), which does
mutex_set_waiters(mutex, 1):

    mutex_set_waiters(mutex_t *mutex, ulint n)
    {
    #ifdef INNODB_RW_LOCKS_USE_ATOMICS
            if (n) {
                    os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
            } else {
                    os_compare_and_swap_ulint(&mutex->waiters, 1, 0);
            }

So that is where I saw it. I do not understand what you mean by "waiters of
InnoDB mutex are non-atomic loads/stores"?
>> os_compare_and_swap_ulint() is defined as either
>> __sync_bool_compare_and_swap() or atomic_cas_ulong(), both of which
>> imply a full barrier. So ENTER looks ok.
>
> See above: waiters loads/stores are not atomic, so no memory barriers at
> all.
Well, the code is hopelessly complicated with the various #ifdef cases, and
I do not claim to have verified that every case has the correct barrier for
ENTER. However, I found that in my particular build, mutex_set_waiters()
used the INNODB_RW_LOCKS_USE_ATOMICS case, which uses
os_compare_and_swap_ulint(). And that is defined in os0sync.h as either
__sync_bool_compare_and_swap(), which is documented as implying a full
barrier; or atomic_cas_ulong(), which I believe comes from the Solaris
atomics library, and which on x86 was implemented using the LOCK prefix
last time I looked, so also a full barrier.

Why do you think there is no full barrier? Did I miss something?
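For the GCC case, here is a minimal sketch of how I read the os0sync.h
mapping (the macro name is real; the surrounding test harness is mine, so
take it as an illustration only):

    #include <stdio.h>

    typedef unsigned long ulint;

    /* Simplified from os0sync.h: in GCC builds the macro expands to the
       __sync builtin, which the GCC manual documents as implying a full
       memory barrier. */
    #define os_compare_and_swap_ulint(ptr, old_val, new_val) \
            __sync_bool_compare_and_swap(ptr, old_val, new_val)

    int main(void)
    {
            ulint waiters = 0;

            /* ENTER path: atomically flag that a waiter exists. The full
               barrier orders this store before the subsequent load of the
               lock word. */
            if (os_compare_and_swap_ulint(&waiters, 0, 1)) {
                    printf("waiters set to %lu\n", waiters);
            }
            return 0;
    }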
>> This is consistent with the observations in MDEV-7026. A thread is
>> waiting for wakeup on a mutex that is already marked free.
>
> Ok, so you say this hang is now repeatable on x86 because we changed
> ACQUIRE -> RELEASE. I assume you were actually able to reproduce it on
> x86.
I have not tried reproducing it on x86. Note that on x86 we changed it from
a full barrier (XCHG instruction) to acquire (which emits nothing on x86,
if my test program is to be believed). True, the documentation only
guarantees ACQUIRE for __sync_lock_test_and_set(), but in reality it seems
to have been a full barrier on x86, which would explain why it worked
before.

I am speculating that some of the hangs that started to appear recently
could be due to this change. At least one case, which I researched in
considerable depth, can be explained adequately in this way. It seems
likely that this is the explanation, but I cannot be sure.
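To illustrate the difference (my own demo, not the actual InnoDB code, and
assuming the change in question was from __sync_lock_test_and_set() to
__sync_lock_release()):

    #include <stdio.h>

    typedef unsigned long ulint;

    static ulint lock_word = 1;   /* 1 = locked, 0 = free */
    static ulint waiters   = 1;   /* another thread is waiting */

    int main(void)
    {
            /* Old-style exit: __sync_lock_test_and_set() compiles to XCHG
               on x86, which is implicitly LOCKed and therefore a de-facto
               full barrier, even though only ACQUIRE is documented. */
            (void) __sync_lock_test_and_set(&lock_word, 0);

            lock_word = 1;        /* re-lock, just for the demo */

            /* New-style exit: __sync_lock_release() has only release
               semantics; on x86 it is a plain MOV with no fence, so the
               load of waiters below may be satisfied before other CPUs
               see lock_word == 0. */
            __sync_lock_release(&lock_word);

            if (waiters != 0) {
                    printf("would signal the wait event here\n");
            }
            return 0;
    }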
> Note that we also added a mysterious os_isync to mutex_exit_func(), which
> is supposed to be, among other things, ACQUIRE. But on x86 it is a no-op,
> with a comment:
>
>     /* Performance regression was observed at some conditions for Intel
>     architecture. Disable memory barrier for Intel architecture for
>     now. */
I saw the added isync. I think this is also wrong. isync is about
speculative execution; one can use it after a conditional branch to prevent
instructions from being executed ahead of the completion of a load. But in
the place I saw, it was used incorrectly, without the conditional branch. I
did not dig deeper into this, though, as even if used correctly it would
still not be the full barrier that seems to be required here.
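For reference, the usual PowerPC load-acquire idiom that isync is designed
for looks roughly like this (a hand-written ppc64 sketch; the helper name
is mine, but the never-taken self-comparison pattern is the standard one):

    /* Acquire load on ppc64: the isync only orders anything because the
       branch makes all later instructions control-dependent on the load.
       An isync with no preceding conditional branch orders nothing. */
    unsigned long load_acquire(const volatile unsigned long *p)
    {
            unsigned long v;
            __asm__ __volatile__(
                    "ld     %0, 0(%1)\n\t"  /* load the value             */
                    "cmpd   %0, %0\n\t"     /* compare it against itself  */
                    "bne-   1f\n\t"         /* never taken, but creates   */
                    "1: isync"              /* the dependency isync needs */
                    : "=r" (v)
                    : "b" (p)
                    : "cr0", "memory");
            return v;
    }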
> Does it explain server lockups at startup or during normal operations?
> Could you elaborate on the problem you're talking about?
So my guess is that on x86 the full barriers are in place, and the mutex
code works without losing wakeups. However, at some point there was some
problem, maybe with some specific compiler or architecture or whatever, and
the sync_arr_wake_threads_if_sema_free() hack was implemented.

I am guessing that things work correctly on x86 precisely because
sync_arr_wake_threads_if_sema_free() is fundamentally broken: it is not
active during startup (and maybe shutdown?), and it does not work if the
server error monitor thread happens to also get stuck on a mutex. So
missing barriers on x86 would have been discovered in commonly used
binaries, while on rare architectures sync_arr_wake_threads_if_sema_free()
has been able to hide the problems. That is my guess.

As to the concrete problems:

1. In MDEV-7026 there is a lockup during startup. I believe this is visible
   because the server error monitor thread is not running at that point.

2. In a user hang I investigated in detail, a thread was waiting on
   log_sys->mutex, but no other thread was holding that mutex (consistent
   with a lost wakeup). The server error monitor thread was itself stuck on
   a mutex, which would prevent sync_arr_wake_threads_if_sema_free() from
   hiding the problem.
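To spell out the lost wakeup I have in mind, here is a schematic of the
classic store/load race (the names are illustrative, not the literal InnoDB
code):

    /* Waiter (mutex_spin_wait)       Owner (mutex_exit_func)

       waiters = 1;                   lock_word = 0;
       <full barrier needed>          <full barrier needed>
       if (lock_word != 0)            if (waiters != 0)
               wait_on_event();               signal_event();

       Without a StoreLoad barrier on both sides, each CPU may satisfy
       its load before its own store is visible to the other: the waiter
       sees a stale lock_word == 1 and blocks, while the owner sees a
       stale waiters == 0 and skips the signal. The waiter then sleeps
       forever on a mutex that is already free, exactly the symptom in
       MDEV-7026. */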
> with Power8. I'd vote for putting a full memory barrier into
> mutex_exit_func() and probably removing that ugly hack with
> sync_arr_wake_threads_if_sema_free. :(
Yes, I agree; that is what I meant. It was probably broken on PowerPC
before the change as well. But it looked to me like we had full barriers on
x86 before the change (even if they were not guaranteed by the
documentation).
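Concretely, the fix I have in mind would look something like this (a sketch
only, with stand-in types; the real mutex_exit_func() has more going on):

    #include <stdio.h>

    typedef unsigned long ulint;

    /* Minimal stand-in for the InnoDB mutex, just enough to show where
       the barrier belongs. */
    typedef struct {
            volatile ulint  lock_word;      /* 0 = free, 1 = locked */
            volatile ulint  waiters;        /* 1 = someone is waiting */
    } ib_mutex_sketch_t;

    static void
    mutex_exit_sketch(ib_mutex_sketch_t *mutex)
    {
            mutex->lock_word = 0;           /* release the lock word */

            /* Full (StoreLoad) barrier: the released lock word must be
               globally visible before we load waiters. Otherwise this
               thread can read a stale waiters == 0 while the waiter reads
               a stale lock_word == 1, and the wakeup is lost. */
            __sync_synchronize();

            if (mutex->waiters != 0) {
                    printf("signal the wait event here\n");
            }
    }

    int main(void)
    {
            ib_mutex_sketch_t m = { 1, 1 };
            mutex_exit_sketch(&m);
            return 0;
    }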
> Though a proper solution should never issue full memory barriers for
> mutex lock/unlock. I suggested a different idea, but that's for 10.1:
This sounds very strange. Are you sure? Normally, mutex lock/unlock is
assumed to be a full barrier. If my reading of the code is correct, this is
also how it was implemented on x86. It seems _very_ likely that parts of
the InnoDB code rely on such full barriers, so it does not seem safe to me
to change this. POSIX thread operations, for example, are defined to imply
a full barrier.

Or do you mean that there should be _one_ full barrier implied in
mutex_enter() and one in mutex_exit(), whereas there are currently more
than one, and this should be optimised? That would make sense.

Though my personal opinion is that the InnoDB mutex implementation has
always been a piece of garbage, and rather than mess more with it, it
should be replaced with something simpler and better (isn't this what the
InnoDB team is already working on?).

 - Kristian.