Hi Kristian, On Thu, Nov 20, 2014 at 02:28:54PM +0100, Kristian Nielsen wrote:
Sergey Vojtovich <svoj@mariadb.org> writes:
+/* + os_atomic_test_and_set_byte_acquire() is a full memory barrier on x86. But + in general, just an aquire barrier should be sufficient. */ +# define os_atomic_test_and_set_byte_acquire(ptr, new_val) \ __sync_lock_test_and_set(ptr, (byte) new_val) !!! The problem is here!!! Strictly speaking not here, but in mutex_spin_wait(): ... mutex_set_waiters(mutex, 1);
/* Try to reserve still a few times */ for (i = 0; i < 4; i++) { if (mutex_test_and_set(mutex) == 0) { ... We'll get stuck if "waiters= 1" will become visible after "if (mutex_test_and_set(mutex))". I don't see anything preventing this reorder.
1. If INNODB_RW_LOCKS_USE_ATOMICS is set, then mytex_set_waiters() is os_compare_and_swap_ulint() which is __sync_bool_compare_and_swap(). This is documented as a full barrier. So it is not possible for "waiters=1" to become visible after "if (mutex_test_and_set(mutex))".
2. If INNODB_RW_LOCKS_USE_ATOMICS is _not_ set, but HAVE_ATOMIC_BUILTINS _is_ set, then os_atomic_test_and_set_byte_acquire() is used in mutex_test_and_set(). This is a full barrier on x86 (despite using the wrong GCC primitive), so should be ok there. On PowerPC it is probably only an acquire barrier, so you are probably right that the "waiters=1" may only become visible after "if (mutex_test_and_set(mutex))". If I understand correctly, this was the same before my patch also.
3. If neither INNODB_RW_LOCKS_USE_ATOMICS nor HAVE_ATOMIC_BUILTINS is set, then mutex_test_and_set() uses os_fast_mutex_trylock_full_barrier(), which again prevents "waiter=1" to be come visible too late, at least on x86. You may be right that the os_mb() in os_fast_mutex_trylock_full_barrier() is no-op in this case, so that PowerPC has only acquire barrier in trylock(), which is insufficent. Again, if I understand correctly, this is unchanged by my patch.
So as I understand it, case (1) is correct for both x86 and PowerPC. Cases (2) and (3) should be correct for x86 also. (2) and (3) may be incorrect for PowerPC, but then they would have been incorrect before also, my patch does not change it.
As you mentioned, it seems likely that problematic cases (2) and (3) do not occur at all in our builds, only working case (1) is used. If so, it seems we do not actually have the problem in current binaries. Maybe we could put in
#ifdef __powerpc__ #error not supported compiler #endif
for cases (2) and (3) to ensure this.
Just to make it clear, I find it absolutely horrible that the InnoDB mutex code requires completely different reasoning to explain its correctness, depending on #ifdefs. It is almost impossible to be sure that one really understands how (and if) it works. That is why I find it so scary to change it in a stable release, and suggested something that does not change the existing, working x86 implementation. Ok so you found extra inconsistencies in this code. Sorry that I wasn't clear enough, but what I meant was...
mutex_set_waiters() calls __sync_bool_compare_and_swap() only in 5.5/XtraDB. In this case it is indeed Ok. But 5.5/InnoDB, 10.0/InnoDB, 10.0/XtraDB just do waiters= 1. I don't see this fixed. Regarding INNODB_RW_LOCKS_USE_ATOMICS - it is abused in mutex code: as it says it is supposed to control rwlocks only. Regards, Sergey