Michael Widenius <monty@askmonty.org> writes:
Kristian> We _do_ need a full memory barrier here (memory barrier is implied in taking a Kristian> mutex). Otherwise the compiler or CPU could re-order the setting of the Kristian> wakeup_subsequent_commits_running flag with the reads and writes done in the Kristian> list manipulations. This could cause the two threads to corrupt the list as Kristian> they both manipulate it at the same time.
This I would like to understand better. (It always nice to know what the cpu/compiler is really doing)..
In this case the code is looping over a list and just setting list->wakeup_subsequent_commits_running= false;
I don't see how the compiler or CPU could reorder the code to things in different order (as we are not updating anything else than this flag in the loop).
This is especially true as setting of cur->wakeup_subsequent_commits_running= true; is done within a mutex and that is the last write to this element of the list.
So there is a barrier between we set it and potentially clear it. As the 'clear' may now happen 'any time' (from other threads point of view) I don't see why it needs to be protected.
Right. I looked into this again. So we do need a memory barrier, I will try to explain this more below. However, it turns out we already have this memory barrier. Because in all relevant cases we call wait_for_commit::wakeup() before we set the wakeup_subsequent_commits_running flag back to false. And wakeup() takes a mutex, which implies a full memory barrier. So I now removed the extra redundant locking (after adding comments explaining why this is ok), and pushed that to 10.0-knielsen. Let me see if I can explain why the memory barrier is needed. The potential race (or one of them at least) is as follows: Thread A is doing wakeup_subsequent_commits2(). Thread B is doing unregister_wait_for_prior_commit2(). We assume B is on A's list of waiters. A will do a read of B's next pointer, and a write of B's waiting_for_commit flag. After that A will clear the wakeup_subsequent_commits_running flag. B will read the wakeup_subsequent_commits_running flag, and if it is not set, then it will remove itself from the list and later possibly put itself on the list of another thread or whatever. Without a memory barrier, the clearing of the flag might become visible to B early. B could then remove itself from the list, and perhaps add itself to the list of another thread. And then the write from A to B->waiting_for_commit could become visible (which would cause a wrong spurious wakeup of B), or B's writing of new next pointer could become visible to A's list traversal, causing A to walk into the wrong list. (The latter problem sounds rather unlikely, but at least theoretically it is allowed behaviour by compiler/CPU). But due to the wakeup() call in A we do have a memory barrier between the list manipulations and the clearing of the flag. And we have another memory barrier in B (a full mutex lock actually) between checking the flag and manipulating the list. This prevents the race.
Kristian> What we need to ensure is that
Kristian> - All the reads of the list in thread 1 see only writes from other threads Kristian> that happened _before_ the variable was reset.
Kristian> - All changes to the list in thread 2 happen only _after_ the variable was Kristian> reset in thread 1.
Kristian> So a memory barrier is needed, but as you say, lock/unlock should not be Kristian> needed.
I still don't understand the current code fully.
The issue is that unregister_wait_for_prior_commit2() can be called by thread 2 just before or just after thread 1 is resetting wakeup_subsequent_commits_running. There is not other variables involved.
This means that in unregister_wait_for_prior_commit2() if thread 2 is there just before list->wakeup_subsequent_commits_running is set to false, we will go into this code:
if (loc_waitee->wakeup_subsequent_commits_running) { /* When a wakeup is running, we cannot safely remove ourselves from the list without corrupting it. Instead we can just wait, as wakeup is already in progress and will thus be immediate.
See comments on wakeup_subsequent_commits2() for more details. */ mysql_mutex_unlock(&loc_waitee->LOCK_wait_commit); while (waiting_for_commit) mysql_cond_wait(&COND_wait_commit, &LOCK_wait_commit); }
I don't however see any code in :queue_for_group_commit() that will signal COND_wait_commit. What code do we have that will wake up the above code ?
In queue_for_group_commit() we call waiter->wakeup(), which signals COND_wait_commit.
Looking at wait_for_commit::wakeup() that makes this:
mysql_mutex_lock(&LOCK_wait_commit); waiting_for_commit= false; mysql_cond_signal(&COND_wait_commit); mysql_mutex_unlock(&LOCK_wait_commit);
However, in wait_for_commit::register_wait_for_prior_commit()
We set wait_for_commit() without a mutex protection ? Is this right?
It should be ok. When register_wait_for_prior_commit() is called, the thread should not be registered to wait for anything, so no other thread can access the variable. However, it would be fine to move setting waiting_for_commit= true under the mutex protection, if that would make the code easier to understand.
I assume that the code works because things are called in a specific order and some threads can't call other functions until at some certain point when things are safe.
Yes, waiting_for_commit is initialised to true, and only after do we lock the waitee and put ourself in the list. Only once we are in the wait list of another thread can the variable be accessed from that thread.
Like that wakeup() for a thread can never be called when that thread is just before the first mutex in
wait_for_commit::register_wait_for_prior_commit()
Yes. Because thread B can not be woken up by thread A until B has put itself into A's waiter list, which is done under the first mutex in register_wait_for_prior_commit(). Hope this helps, - Kristian.