[Maria-developers] Fixes for an issue with my_atomic
Hi,

I had a problem with my group commit patch, and tracked it down to a problem
in my_atomic. The issue is that my_atomic_cas*(val, cmp, new) accesses *cmp
after a successful CAS operation (in one place reading it, in another place
writing it). Here is the fix:

Index: work-5.1-groupcommit/include/atomic/gcc_builtins.h
===================================================================
--- work-5.1-groupcommit.orig/include/atomic/gcc_builtins.h	2010-06-09 11:53:59.000000000 +0200
+++ work-5.1-groupcommit/include/atomic/gcc_builtins.h	2010-06-09 11:54:06.000000000 +0200
@@ -19,8 +19,9 @@
   v= __sync_lock_test_and_set(a, v);
 #define make_atomic_cas_body(S)                     \
   int ## S sav;                                     \
-  sav= __sync_val_compare_and_swap(a, *cmp, set);   \
-  if (!(ret= (sav == *cmp))) *cmp= sav;
+  int ## S cmp_val= *cmp;                           \
+  sav= __sync_val_compare_and_swap(a, cmp_val, set);\
+  if (!(ret= (sav == cmp_val))) *cmp= sav
 
 #ifdef MY_ATOMIC_MODE_DUMMY
 #define make_atomic_load_body(S)   ret= *a

Index: work-5.1-groupcommit/include/atomic/x86-gcc.h
===================================================================
--- work-5.1-groupcommit.orig/include/atomic/x86-gcc.h	2010-06-09 11:53:59.000000000 +0200
+++ work-5.1-groupcommit/include/atomic/x86-gcc.h	2010-06-09 11:54:06.000000000 +0200
@@ -45,8 +45,12 @@
 #define make_atomic_fas_body(S)                     \
   asm volatile ("xchg %0, %1;" : "+r" (v) , "+m" (*a))
 #define make_atomic_cas_body(S)                     \
+  int ## S sav;                                     \
   asm volatile (LOCK_prefix "; cmpxchg %3, %0; setz %2;" \
-               : "+m" (*a), "+a" (*cmp), "=q" (ret): "r" (set))
+               : "+m" (*a), "=a" (sav), "=q" (ret)  \
+               : "r" (set), "a" (*cmp));            \
+  if (!ret)                                         \
+    *cmp= sav
 
 #ifdef MY_ATOMIC_MODE_DUMMY
 #define make_atomic_load_body(S)   ret=*a

This makes the behaviour consistent with the other implementations; see for
example generic-msvc.h.

(It is also pretty important for correct operation. In my code, I use
my_atomic_casptr() to atomically enqueue a struct from one thread, and as
soon as it is enqueued another thread may grab the struct and change it,
including the field pointed to by cmp. So it is essential that
my_atomic_casptr() neither reads nor writes *cmp after a successful CAS, as
the above patch ensures.)
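To illustrate the usage pattern, here is a minimal sketch of such an enqueue.
The struct and function names are made up (this is not the actual group
commit code), and it assumes a build where my_atomic is lock-free, so any
my_atomic_rwlock_* wrapping that some modes require is omitted:

/* Hypothetical example; not the actual group commit code. */
#include <my_global.h>
#include <my_pthread.h>
#include <my_atomic.h>

struct st_entry
{
  struct st_entry *next;
  int payload;
};

/* Shared lock-free list head, consumed by another thread. */
static struct st_entry *queue= NULL;

static void enqueue(struct st_entry *entry)
{
  /*
    Use the entry's own next field as the "expected" value; on CAS failure
    my_atomic_casptr() refreshes it with the current head, so the loop can
    simply retry.
  */
  entry->next= queue;
  while (!my_atomic_casptr((void **)&queue, (void **)&entry->next, entry))
  { /* retry with the refreshed entry->next */ }
  /*
    As soon as the CAS succeeds, the consumer thread may grab the entry and
    overwrite entry->next. If the CAS implementation read or wrote *cmp
    (which is &entry->next here) after the successful swap, it would race
    with that consumer -- exactly what the patch above prevents.
  */
}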
Btw, it was a bit funny how I tracked this down. After digging for a few
hours in my own code without success, I got the idea to check the CAS
implementation, and spotted the problem in gcc_builtins.h. After fixing it I
was baffled as to why my code still failed, until I realised I was not using
gcc_builtins.h but x86-gcc.h, and found and fixed the similar problem there
;-)

Now the question is, where should I push this (if at all)? Any opinions?

-----------------------------------------------------------------------

While I was there, I also noticed another potential problem in x86-gcc.h,
suggesting this patch:

Index: work-5.1-groupcommit/include/atomic/x86-gcc.h
===================================================================
--- work-5.1-groupcommit.orig/include/atomic/x86-gcc.h	2010-06-09 11:37:12.000000000 +0200
+++ work-5.1-groupcommit/include/atomic/x86-gcc.h	2010-06-09 11:52:47.000000000 +0200
@@ -38,17 +38,31 @@
 #define asm __asm__
 #endif
 
+/*
+  The atomic operations imply a memory barrier for the CPU, to ensure that all
+  prior writes are flushed from cache, and all subsequent reads reloaded into
+  cache.
+
+  We need to imply a similar memory barrier for the compiler, so that all
+  pending stores (to memory that may be aliased in other parts of the code)
+  will be flushed to memory before the operation, and all reads from such
+  memory be re-loaded. This is achieved by adding the "memory" pseudo-register
+  to the clobber list, see GCC documentation for more explanation.
+
+  The compiler and CPU memory barriers are needed to make sure changes in one
+  thread are made visible in another by the atomic operation.
+*/
 #ifndef MY_ATOMIC_NO_XADD
 #define make_atomic_add_body(S)                     \
-  asm volatile (LOCK_prefix "; xadd %0, %1;" : "+r" (v) , "+m" (*a))
+  asm volatile (LOCK_prefix "; xadd %0, %1;" : "+r" (v) , "+m" (*a): : "memory")
 #endif
 #define make_atomic_fas_body(S)                     \
-  asm volatile ("xchg %0, %1;" : "+r" (v) , "+m" (*a))
+  asm volatile ("xchg %0, %1;" : "+r" (v) , "+m" (*a) : : "memory")
 #define make_atomic_cas_body(S)                     \
   int ## S sav;                                     \
   asm volatile (LOCK_prefix "; cmpxchg %3, %0; setz %2;" \
                : "+m" (*a), "=a" (sav), "=q" (ret)  \
-               : "r" (set), "a" (*cmp));            \
+               : "r" (set), "a" (*cmp) : "memory"); \
   if (!ret)                                         \
     *cmp= sav
@@ -63,9 +77,9 @@
 #define make_atomic_load_body(S)                    \
   ret=0;                                            \
   asm volatile (LOCK_prefix "; cmpxchg %2, %0"      \
-               : "+m" (*a), "+a" (ret): "r" (ret))
+               : "+m" (*a), "+a" (ret) : "r" (ret) : "memory")
 #define make_atomic_store_body(S)                   \
-  asm volatile ("; xchg %0, %1;" : "+m" (*a), "+r" (v))
+  asm volatile ("; xchg %0, %1;" : "+m" (*a), "+r" (v) : : "memory")
 
 #endif
 /* TODO test on intel whether the below helps. on AMD it makes no difference */

The comment in the patch explains the idea, I think. Basically, these memory
barrier operations also need to be compiler barriers. Otherwise there is
nothing that prevents GCC from moving unrelated stores across the memory
barrier operation. This is a problem, for example, when filling in a
structure and then atomically linking it into a shared list. The CPU memory
barrier ensures that the fields in the structure will be visible to other
threads at the CPU level, but it is also necessary to tell GCC to keep the
stores into the struct fields before the store that links the struct into
the shared list.

(It also makes the volatile declaration on the updated memory location
unnecessary, but maybe it is needed for other implementations (though it
shouldn't be; volatile is almost always wrong).)
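To make the fill-then-publish pattern concrete, here is a small sketch. The
struct and names are made up, the asm simply mirrors the patched
make_atomic_cas_body above (x86 only), and the CAS retry loop is omitted:

/* Hypothetical example; not code from the server sources. */
struct st_item
{
  int a;
  int b;
  struct st_item *next;
};

static struct st_item *shared_list;

/* Publish a fully initialised item onto the shared list. */
static void publish(struct st_item *item, struct st_item *old_head)
{
  char ok;

  /* Plain stores filling in the item before it becomes visible. */
  item->a= 1;
  item->b= 2;
  item->next= old_head;
  /*
    The lock prefix orders the stores at the CPU level, but only the
    "memory" clobber tells GCC that it must not sink the three stores above
    past the asm (nor cache shared_list or the fields across it). Without
    it, another thread could find the item on the list with stale fields.
  */
  __asm__ __volatile__ ("lock; cmpxchg %3, %0; setz %2"
                        : "+m" (shared_list), "+a" (old_head), "=q" (ok)
                        : "r" (item)
                        : "memory");
  (void) ok;   /* retry loop on CAS failure omitted for brevity */
}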
 - Kristian.