Hi Kristian,

On Fri, Nov 07, 2014 at 08:40:36AM +0100, Kristian Nielsen wrote:
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
Hm, I don't really have a reference to point to. Well, I believe the POSIX threads specification says that a number of operations ensure full memory ordering across them.
I guess I was thinking of this:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_...
It is somewhat vague: "The following functions synchronize memory with respect to other threads: ...".
I thought some more about this. Consider a simple example:
    THREAD A        THREAD B
    lock(M)
    foo()
    unlock(M)
                    lock(M)
                    bar()
                    unlock(M)
Here, I would expect that bar() in B can see all changes made by foo() in A, because B locked the mutex after A released it. This requires that stores in foo() are completed before loads in bar() begin, effectively a full barrier.
However, this can be implemented in different ways. It doesn't have to be done with an explicit full barrier CPU instruction, as long as the correct semantics is preserved.
So let's consider unlock() with release semantics, and lock() with acquire semantics:
    unlock(M):
        __sync_lock_release(&M->lock_word);

    lock(M):
        while (__sync_lock_test_and_set(&M->lock_word, 1))
            ; /* wait for lock */
The release barrier ensures that stores in foo() will complete before the store of lock_word. The acquire barrier in the wait loop ensures that loads in bar() cannot start until a load in the while loop has seen the store to lock_word.
Thus, loads in bar() become ordered after stores in foo(), effectively a full memory barrier even though no explicit full barrier instruction was needed.
So you may be right that the implementation needs no explicit full memory barrier in theory... Right. But I just realized that I'm a bit lost in terms now. When you say "full memory barrier", I assume you mean memory_order_seq_cst. Was any other type of barrier available before C11/C++11 introduced acquire/release semantics?
On the other hand, I have implemented code that _does_ assume a full barrier for each operation individually (see wait_for_commit::wakeup_subsequent_commits2() in 10.0 for an example). Others could have done the same. Reviewing all InnoDB code to make sure this is not the case does not really seem feasible...
The burden will be on whoever changes the InnoDB locking primitives to prove that the change is safe with respect to the expectations of the code that uses those primitives. This is made all the more challenging by the semantics being somewhat unclear, and by the main platform, x86, mostly providing a full barrier by default on atomic operations like test-and-set.
*** I MAY BE VERY WRONG NOW, GUESS-BASED STATEMENTS FOLLOW ***

If we abstract from the platform-specific implementation of barriers and think in logical acquire/release terms... In fact there is an additional requirement for pthread_mutex_lock() which I omitted before: stores that appear after the ACQUIRE must actually be performed after the ACQUIRE.

The manual says:

    memory_order_acquire: A load operation with this memory order performs
    the acquire operation on the affected memory location: prior writes made
    to other memory locations by the thread that did the release become
    visible in this thread.

As you can see, it only defines what loads in the acquiring thread observe (prior writes by the releasing thread become visible), but it doesn't define anything wrt stores performed by the current thread. This makes me think that stores are allowed to be postponed, but they are never executed in advance. If that's the case, your code is correct wrt memory barriers.

Regards,
Sergey