Re: [Maria-developers] Memory barrier problem in InnoDB/XtraDB mutex implementation causing hangs

6 Nov 2014

Kristian,

On Thu, Nov 06, 2014 at 09:23:42PM +0100, Kristian Nielsen wrote:
...skip...
...
...
Strange... Monty should have fixed this. The error monitor thread should call
log_get_lsn_nowait(), which basically does a trylock. Do you happen to have a
call trace?
This investigation was some time ago (maybe 1-2 months). It seems likely that
this was a version before Monty's log_get_lsn_nowait() (eg. 10.0.14 has it,
but not 10.0.13).
According to the history, the ACQUIRE -> RELEASE fix appeared in 10.0.13 and the
fix for log_get_lsn() appeared in 10.0.14. Both fixes appeared simultaneously in
5.5.40.
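
For readers who have not seen the "nowait" pattern, here is a minimal sketch
of a trylock-based read. It is not the InnoDB code: log_mutex, current_lsn and
get_lsn_nowait() are hypothetical names made up for illustration. The only
real API used is pthread_mutex_trylock().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not InnoDB's real structures. */
pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
uint64_t current_lsn = 12345;   /* protected by log_mutex */

/* Try to read the LSN without blocking.
   Returns true and stores the value in *lsn on success,
   false if the mutex is currently held (or trylock failed). */
bool get_lsn_nowait(uint64_t *lsn)
{
    assert(lsn != NULL);

    if (pthread_mutex_trylock(&log_mutex) != 0) {
        return false;           /* busy: the caller retries later */
    }
    *lsn = current_lsn;
    pthread_mutex_unlock(&log_mutex);
    return true;
}
```

A monitoring thread built this way can skip a round when the mutex is busy
instead of waiting on it.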
...
...
lock/unlock? Probably that was quite outdated writing. See e.g.:
Outdated, perhaps, but acquire/release is AFAIK a relatively new concept.
Mutexes, and InnoDB source code predates that? I'm not sure that C++ coming up
with some strange new semantics for their language has much bearing on legacy
code?
But I agree it would be nice to have some references about "old-style" mutexes
implying full memory barrier, so that it's not just me...
So runtime hangs should be solved both in 5.5 and 10.0. This leaves hangs during
startup, which are unfortunate but not as critical as runtime hangs.

Are there any other known hangs?

Yes, acquire/release/etc. is a relatively new concept. For x86 this probably
makes little sense. But at least on Power8:
- pthread_mutex_lock() issues "isync" (conforms to acquire semantics)
- pthread_mutex_unlock() issues "lwsync" (conforms to release semantics)
- sync builtins issue "sync" (conforms to seq_cst semantics)
...
...
http://en.cppreference.com/w/cpp/atomic/memory_order
...Atomic load with memory_order_acquire or stronger is an acquire operation.
The lock() operation on a Mutex is also an acquire operation...
...Atomic store with memory_order_release or stronger is a release operation.
The unlock() operation on a Mutex is also a release operation...
Interesting... so C++ defines a "Mutex" with different semantics than what is
usually understood with eg. pthread_mutex...
...
Full memory barriers are way too heavy even for mutexes. All we need to do is
to ensure that:
- all loads following lock are performed after lock (acquire)
- all stores preceding unlock are completed before unlock (release)
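
As an illustration of the two rules above, here is a toy spinlock written with
C11 atomics. It is only a sketch under my own assumptions, not the InnoDB mutex:
toy_lock, shared_counter and the toy_* functions are hypothetical names. Lock
uses acquire ordering and unlock uses release ordering, with no full barrier
anywhere.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical toy spinlock, not InnoDB's mutex. */
atomic_flag toy_lock = ATOMIC_FLAG_INIT;
long shared_counter = 0;        /* protected by toy_lock */

void toy_lock_acquire(void)
{
    /* acquire: accesses after the lock cannot be moved before it */
    while (atomic_flag_test_and_set_explicit(&toy_lock,
                                             memory_order_acquire)) {
        /* spin until the flag was clear */
    }
}

void toy_lock_release(void)
{
    /* release: accesses before the unlock cannot be moved after it */
    atomic_flag_clear_explicit(&toy_lock, memory_order_release);
}

void *toy_worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        toy_lock_acquire();
        shared_counter++;
        toy_lock_release();
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long toy_run(long iterations)
{
    pthread_t t1, t2;
    shared_counter = 0;

    int rc = pthread_create(&t1, NULL, toy_worker, &iterations);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, toy_worker, &iterations);
    assert(rc == 0);
    (void)rc;

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

Acquire/release is enough to protect shared_counter inside the critical
section. It says nothing about accesses outside the critical section.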
Are you sure?
Note that if this is true, then it means that there is _no_ way to get a
StoreLoad barrier using only normal mutex operations. That seems odd...
I know I have seen, and written, code that depends on lock/unlock being full
barrier. How can we be sure that such code doesn't also exist in InnoDB?
Though I agree that full barrier is a lot more expensive than just LoadLoad or
StoreStore, and best avoided if possible (I even blogged about that not so
long ago).
That's how I read it. So there is no guarantee that global_var1 will be stored
before global_var2 is loaded:
global_var1= 1;
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
local_var= global_var2;
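
If code really needs the store to be ordered before the load, one option is an
explicit full fence instead of relying on lock/unlock. The sketch below is my
own illustration with C11 atomics, not InnoDB code: it reuses the names
global_var1 and global_var2 from above but makes them atomic_int, and
thread_a(), thread_b() and count_both_zero() are hypothetical. Each thread
stores its own variable, issues a seq_cst fence, then loads the other one. With
the fences in place, both threads reading 0 in the same round is not allowed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

atomic_int global_var1, global_var2;
int seen_by_a, seen_by_b;       /* read back only after pthread_join() */

void *thread_a(void *unused)
{
    (void)unused;
    atomic_store_explicit(&global_var1, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* StoreLoad barrier */
    seen_by_a = atomic_load_explicit(&global_var2, memory_order_relaxed);
    return NULL;
}

void *thread_b(void *unused)
{
    (void)unused;
    atomic_store_explicit(&global_var2, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* StoreLoad barrier */
    seen_by_b = atomic_load_explicit(&global_var1, memory_order_relaxed);
    return NULL;
}

/* Returns the number of rounds in which both threads read 0. */
int count_both_zero(int rounds)
{
    int both_zero = 0;

    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;
        atomic_store(&global_var1, 0);
        atomic_store(&global_var2, 0);

        int rc = pthread_create(&a, NULL, thread_a, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread_b, NULL);
        assert(rc == 0);
        (void)rc;

        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (seen_by_a == 0 && seen_by_b == 0) {
            both_zero++;
        }
    }
    return both_zero;
}
```

Removing the two fences makes the "both read 0" outcome legal, which is the
reordering described above.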

Even more interesting: it has a concept of an "affected memory location" bound
to a memory barrier:
...
memory_order_acquire: A load operation with this memory order performs the
acquire operation on the affected memory location: prior writes made to other
memory locations by the thread that did the release become visible in this
thread.

memory_order_release: A store operation with this memory order performs the
release operation: prior writes to other memory locations become visible to
the threads that do a consume or an acquire on the same location.
...

I read it as "release" on one memory location won't necessarily make stores
visible to "acquire" on a different location.
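
To make the "same location" point concrete, here is a small message-passing
sketch with C11 atomics. It is an illustration under my own assumptions:
payload, ready, producer(), consumer() and pass_message() are hypothetical
names. The release store and the acquire load both operate on ready, and that
pairing is what makes the earlier plain write to payload visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

int payload;            /* plain data, published through 'ready' */
atomic_int ready;       /* the location both threads synchronize on */

void *producer(void *unused)
{
    (void)unused;
    payload = 42;
    /* release: the write to payload is visible to whoever
       acquires this same location and sees the value 1 */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

void *consumer(void *result)
{
    /* acquire on the same location the producer released */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin until the producer has published */
    }
    *(int *)result = payload;
    return NULL;
}

/* Runs one producer and one consumer, returns what the consumer read. */
int pass_message(void)
{
    pthread_t p, c;
    int result = 0;

    payload = 0;
    atomic_store(&ready, 0);

    int rc = pthread_create(&c, NULL, consumer, &result);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return result;
}
```

If the consumer instead did its acquire load on some unrelated atomic variable,
nothing in the quoted wording would promise that it sees payload == 42.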

Regards,
Sergey

Sergey Vojtovich