Re: [Maria-developers] [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5
Sergey Vojtovich <svoj@mariadb.org> writes:
revno: 4346 revision-id: svoj@mariadb.org-20141118080742-sdf93kgsgxulenvb parent: knielsen@knielsen-hq.org-20141112101013-t63ayylsk08ncgs3 committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Tue 2014-11-18 12:07:42 +0400 message: MDEV-7026 - Occasional hang during startup on Power8
Right, so you requested second review ASAP, I'm trying to understand what's going on here. If my understanding is correct, this patch goes on top of current 5.5.40, not on top of the patch I reviewed previously. So there are two cases, depending on whether INNODB_RW_LOCKS_USE_ATOMICS is set or not. It is my understanding that this is "usually" set, eg. on our common x86 and powerpc linux binaries. The critical thing for the patch at hand is the memory barrier between mutex_set_waiters() and mutex_test_and_set() in the waiter, and mutex_reset_lock_word() and mutex_get_waiters() in the thread releasing the lock. A full memory barrier is needed in the waiter, to be sure that mutex_set_waiters() becomes visible to the releasing thread before mutex_test_and_set() runs. And a full barrier is needed in the releasing thread, to be sure that mutex_reset_lock_word() becomes visible before mutex_get_waiters() runs. So let's see how this is ensured. First the case of x86 with INNODB_RW_LOCKS_USE_ATOMICS set, HAVE_ATOMIC_BUILTINS, pre-5.5.40: - mutex_set_waiters() uses os_compare_and_swap_ulint() which is __sync_bool_compare_and_swap() which is full barrier. - mutex_reset_lock_word() uses os_atomic_test_and_set_byte() which is __sync_lock_test_and_set(). This is documented only as a release barrier, but is in fact a full barrier on x86 with current GCC. So in this case full barrier is there as needed. If HAVE_ATOMIC_BUILTINS is not set, then mutex_reset_lock_word() uses os_fast_mutex_unlock() which is pthread_mutex_unlock(), which is also full barrier on x86. If INNODB_RW_LOCKS_USE_ATOMICS is not set, then mutex_set_waiters() has no barriers. But then mutex_test_and_set() instead has a full barrier on x86. Either in os_atomic_test_and_set_byte() which is __sync_lock_test_and_set(), if HAVE_ATOMIC_BUILTINS is set, or in os_fast_mutex_trylock() otherwise. Both of these are full barriers on x86. So this is why things work ok on x86, apparently. Very bad code that seriously needs cleanup and proper documentation of memory barrier semantics. But it is what we have to work with in 5.5. So to make things work on PowerPC without risk of breaking x86, we could do this: - Make sure os_atomic_test_and_set_byte() is full barrier on PowerPC: #ifdef __powerpc__ /* os_atomic_test_and_set_byte() should imply a full barrier. But __sync_lock_test_and_set() is only documented as an aquire barrier. So on PowerPC we need to add the full barrier explicitly. */ # define os_atomic_test_and_set_byte(ptr, new_val) \ do { __sync_lock_test_and_set(ptr, (byte) new_val); \ __atomic_thread_fence(__ATOMIC_SEQ_CST); } while(0) #else /* On x86, __sync_lock_test_and_set() happens to be full barrier, due to LOCK prefix. */ # define os_atomic_test_and_set_byte(ptr, new_val) \ __sync_lock_test_and_set(ptr, (byte) new_val) #endif - Make sure that mutex_reset_lock_word() is a full barrier when HAVE_ATOMIC_BUILTINS is not set, by adding a memory barrier: os_fast_mutex_unlock(&(mutex->os_fast_mutex)); #ifdef __powerpc__ /* On PowerPC, pthread_mutex does not imply full barrier, so we need to add it explicitly. */ __atomic_thread_fence(__ATOMIC_SEQ_CST); #endif - For the case of neither HAVE_ATOMIC_BUILTINS nor INNODB_RW_LOCKS_USE_ATOMICS set, similarly add a full barrier in mutex_test_and_set() for PowerPC: #ifdef __powerpc__ /* On PowerPC, pthread_mutex does not imply full barrier, so we need to add it explicitly. */ __atomic_thread_fence(__ATOMIC_SEQ_CST); #endif ret = os_fast_mutex_trylock(&(mutex->os_fast_mutex)); Then in 10.1. if someone is brave enough, we can clean this up properly. Clearly define what the memory barrier semantics is, and implement it using the correct primitives that work as needed on all platforms. So now that I understand (hopefully...) how the code used to work, and how it can be fixed for PowerPC without changing existing code, let me look at the new patch ...
=== modified file 'storage/innobase/include/os0sync.h' --- a/storage/innobase/include/os0sync.h 2014-09-08 15:10:48 +0000 +++ b/storage/innobase/include/os0sync.h 2014-11-18 08:07:42 +0000 @@ -310,8 +310,16 @@ Returns the old value of *ptr, atomicall # define os_atomic_test_and_set_byte(ptr, new_val) \ __sync_lock_test_and_set(ptr, (byte) new_val)
+#ifdef __powerpc__ # define os_atomic_lock_release_byte(ptr) \ __sync_lock_release(ptr) +#else +/* In theory __sync_lock_release should be used to release the lock. +Unfortunately, it does not work properly alone. The workaround is +that more conservative __sync_lock_test_and_set is used instead. */ +# define os_atomic_lock_release_byte(ptr) \ + __sync_lock_test_and_set(ptr, 0) +#endif
This is very confusing. You define a method with full barrier semantics, but call it something suggesting release semantics. Why not revert the original change so that mutex_reset_lock_word() uses os_atomic_test_and_set_byte()? And then fix the implementation of os_atomic_test_and_set_byte() to be full barrier on PowerPC, as I suggested above? The term "atomic" generally implies full barrier in the MySQL source code...
#elif defined(HAVE_IB_SOLARIS_ATOMICS)
@@ -428,7 +436,7 @@ clobbered */ # define os_rmb __atomic_thread_fence(__ATOMIC_ACQUIRE) # define os_wmb __atomic_thread_fence(__ATOMIC_RELEASE) #ifdef __powerpc__ -# define os_isync __asm __volatile ("isync":::"memory") +# define os_isync __atomic_thread_fence(__ATOMIC_SEQ_CST) #else #define os_isync do { } while(0) #endif
As Jonas suggested, don't call a full memory barrier "isync". And do not introduce something that is a barrier on one platform but a no-operation on another platform. But ...
=== modified file 'storage/innobase/sync/sync0sync.c' --- a/storage/innobase/sync/sync0sync.c 2014-11-03 13:43:44 +0000 +++ b/storage/innobase/sync/sync0sync.c 2014-11-18 08:07:42 +0000 @@ -474,10 +474,9 @@ mutex_set_waiters(
ptr = &(mutex->waiters);
- os_wmb; - *ptr = n; /* Here we assume that the write of a single word in memory is atomic */ + os_isync; }
With this, the semantics of how things work becomes even more complicated. Now on PowerPC the barrier is in one function, but on x86 the barrier is in another function. Why not instead do as I suggested above? Put in full barriers for PowerPC in the same places that they are in x86? ---- Maybe I should try to say if I think the patch fixes the actual bug, in case you don't want to go with my suggestions and instead keep your approach? Since it goes back to using __sync_lock_test_and_set() on x86, and since it has a full barrier in mutex_exit_func() on PowerPC, I think perhaps it does. But it is really hard to understand exactly what is going on, as you can see above I had to do a lot of reasoning to understand even why the old code worked... My advice is still to revert the original incorrect PowerPC patch and do something that is at least consistent with the existing code... (And then feel free to do a proper cleanup in 10.1). Hope this helps, - Kristian.
Hi Kristian, thanks for your feedback! Since we're short on time and you have an idea of a proper fix, what if we do it the other way around: you'll implement a fix and I'll review it from PPC64 perspective? I'm sorry but I won't be able to come up with a reasonable patch in that short time frame. Thanks, Sergey On Tue, Nov 18, 2014 at 04:59:55PM +0100, Kristian Nielsen wrote:
Sergey Vojtovich <svoj@mariadb.org> writes:
revno: 4346 revision-id: svoj@mariadb.org-20141118080742-sdf93kgsgxulenvb parent: knielsen@knielsen-hq.org-20141112101013-t63ayylsk08ncgs3 committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Tue 2014-11-18 12:07:42 +0400 message: MDEV-7026 - Occasional hang during startup on Power8
Right, so you requested second review ASAP, I'm trying to understand what's going on here.
If my understanding is correct, this patch goes on top of current 5.5.40, not on top of the patch I reviewed previously.
So there are two cases, depending on whether INNODB_RW_LOCKS_USE_ATOMICS is set or not. It is my understanding that this is "usually" set, eg. on our common x86 and powerpc linux binaries.
The critical thing for the patch at hand is the memory barrier between mutex_set_waiters() and mutex_test_and_set() in the waiter, and mutex_reset_lock_word() and mutex_get_waiters() in the thread releasing the lock.
A full memory barrier is needed in the waiter, to be sure that mutex_set_waiters() becomes visible to the releasing thread before mutex_test_and_set() runs. And a full barrier is needed in the releasing thread, to be sure that mutex_reset_lock_word() becomes visible before mutex_get_waiters() runs.
So let's see how this is ensured. First the case of x86 with INNODB_RW_LOCKS_USE_ATOMICS set, HAVE_ATOMIC_BUILTINS, pre-5.5.40:
- mutex_set_waiters() uses os_compare_and_swap_ulint() which is __sync_bool_compare_and_swap() which is full barrier.
- mutex_reset_lock_word() uses os_atomic_test_and_set_byte() which is __sync_lock_test_and_set(). This is documented only as a release barrier, but is in fact a full barrier on x86 with current GCC.
So in this case full barrier is there as needed.
If HAVE_ATOMIC_BUILTINS is not set, then mutex_reset_lock_word() uses os_fast_mutex_unlock() which is pthread_mutex_unlock(), which is also full barrier on x86.
If INNODB_RW_LOCKS_USE_ATOMICS is not set, then mutex_set_waiters() has no barriers. But then mutex_test_and_set() instead has a full barrier on x86. Either in os_atomic_test_and_set_byte() which is __sync_lock_test_and_set(), if HAVE_ATOMIC_BUILTINS is set, or in os_fast_mutex_trylock() otherwise. Both of these are full barriers on x86.
So this is why things work ok on x86, apparently. Very bad code that seriously needs cleanup and proper documentation of memory barrier semantics. But it is what we have to work with in 5.5.
So to make things work on PowerPC without risk of breaking x86, we could do this:
- Make sure os_atomic_test_and_set_byte() is full barrier on PowerPC:
#ifdef __powerpc__ /* os_atomic_test_and_set_byte() should imply a full barrier. But __sync_lock_test_and_set() is only documented as an aquire barrier. So on PowerPC we need to add the full barrier explicitly. */ # define os_atomic_test_and_set_byte(ptr, new_val) \ do { __sync_lock_test_and_set(ptr, (byte) new_val); \ __atomic_thread_fence(__ATOMIC_SEQ_CST); } while(0) #else /* On x86, __sync_lock_test_and_set() happens to be full barrier, due to LOCK prefix. */ # define os_atomic_test_and_set_byte(ptr, new_val) \ __sync_lock_test_and_set(ptr, (byte) new_val) #endif
- Make sure that mutex_reset_lock_word() is a full barrier when HAVE_ATOMIC_BUILTINS is not set, by adding a memory barrier:
os_fast_mutex_unlock(&(mutex->os_fast_mutex)); #ifdef __powerpc__ /* On PowerPC, pthread_mutex does not imply full barrier, so we need to add it explicitly. */ __atomic_thread_fence(__ATOMIC_SEQ_CST); #endif
- For the case of neither HAVE_ATOMIC_BUILTINS nor INNODB_RW_LOCKS_USE_ATOMICS set, similarly add a full barrier in mutex_test_and_set() for PowerPC:
#ifdef __powerpc__ /* On PowerPC, pthread_mutex does not imply full barrier, so we need to add it explicitly. */ __atomic_thread_fence(__ATOMIC_SEQ_CST); #endif ret = os_fast_mutex_trylock(&(mutex->os_fast_mutex));
Then in 10.1. if someone is brave enough, we can clean this up properly. Clearly define what the memory barrier semantics is, and implement it using the correct primitives that work as needed on all platforms.
So now that I understand (hopefully...) how the code used to work, and how it can be fixed for PowerPC without changing existing code, let me look at the new patch ...
=== modified file 'storage/innobase/include/os0sync.h' --- a/storage/innobase/include/os0sync.h 2014-09-08 15:10:48 +0000 +++ b/storage/innobase/include/os0sync.h 2014-11-18 08:07:42 +0000 @@ -310,8 +310,16 @@ Returns the old value of *ptr, atomicall # define os_atomic_test_and_set_byte(ptr, new_val) \ __sync_lock_test_and_set(ptr, (byte) new_val)
+#ifdef __powerpc__ # define os_atomic_lock_release_byte(ptr) \ __sync_lock_release(ptr) +#else +/* In theory __sync_lock_release should be used to release the lock. +Unfortunately, it does not work properly alone. The workaround is +that more conservative __sync_lock_test_and_set is used instead. */ +# define os_atomic_lock_release_byte(ptr) \ + __sync_lock_test_and_set(ptr, 0) +#endif
This is very confusing. You define a method with full barrier semantics, but call it something suggesting release semantics.
Why not revert the original change so that mutex_reset_lock_word() uses os_atomic_test_and_set_byte()? And then fix the implementation of os_atomic_test_and_set_byte() to be full barrier on PowerPC, as I suggested above? The term "atomic" generally implies full barrier in the MySQL source code...
#elif defined(HAVE_IB_SOLARIS_ATOMICS)
@@ -428,7 +436,7 @@ clobbered */ # define os_rmb __atomic_thread_fence(__ATOMIC_ACQUIRE) # define os_wmb __atomic_thread_fence(__ATOMIC_RELEASE) #ifdef __powerpc__ -# define os_isync __asm __volatile ("isync":::"memory") +# define os_isync __atomic_thread_fence(__ATOMIC_SEQ_CST) #else #define os_isync do { } while(0) #endif
As Jonas suggested, don't call a full memory barrier "isync". And do not introduce something that is a barrier on one platform but a no-operation on another platform. But ...
=== modified file 'storage/innobase/sync/sync0sync.c' --- a/storage/innobase/sync/sync0sync.c 2014-11-03 13:43:44 +0000 +++ b/storage/innobase/sync/sync0sync.c 2014-11-18 08:07:42 +0000 @@ -474,10 +474,9 @@ mutex_set_waiters(
ptr = &(mutex->waiters);
- os_wmb; - *ptr = n; /* Here we assume that the write of a single word in memory is atomic */ + os_isync; }
With this, the semantics of how things work becomes even more complicated. Now on PowerPC the barrier is in one function, but on x86 the barrier is in another function.
Why not instead do as I suggested above? Put in full barriers for PowerPC in the same places that they are in x86?
----
Maybe I should try to say if I think the patch fixes the actual bug, in case you don't want to go with my suggestions and instead keep your approach? Since it goes back to using __sync_lock_test_and_set() on x86, and since it has a full barrier in mutex_exit_func() on PowerPC, I think perhaps it does. But it is really hard to understand exactly what is going on, as you can see above I had to do a lot of reasoning to understand even why the old code worked...
My advice is still to revert the original incorrect PowerPC patch and do something that is at least consistent with the existing code...
(And then feel free to do a proper cleanup in 10.1).
Hope this helps,
- Kristian.
Sergey Vojtovich <svoj@mariadb.org> writes:
thanks for your feedback! Since we're short on time and you have an idea of a proper fix, what if we do it the other way around: you'll implement a fix and I'll review it from PPC64 perspective?
Hm, ok then. You can find the patch here: http://lists.askmonty.org/pipermail/commits/2014-November/007047.html I pushed it here: lp:~maria-captains/maria/5.5-mdev7026 The idea is to: 1. Revert the original, incorrect changes to InnoDB mutex, make them be unchanged on i386/amd64 compared to 5.5.39. 2. Add the missing full barriers in InnoDB mutex for PowerPC so the code works similarly as on i386/amd64. Feel free to use it or not, or modify it as you see fit. - Kristian.
participants (2)
-
Kristian Nielsen
-
Sergey Vojtovich