Thanks for the analysis. Filed http://bugs.mysql.com/bug.php?id=68575 for the frequent rotate problem. On Tue, Mar 5, 2013 at 2:02 AM, Kristian Nielsen <knielsen@knielsen-hq.org>wrote:
MARK CALLAGHAN <mdcallag@gmail.com> writes:
This is the classic semi-sync deadlock that you explained so many times already.
The 5.6 group commit implements a similar technique to MariaDB, where a single thread does the fast part of commit for all transactions in the group. But for some reason, they put the semi-sync wait inside the single-threaded loop as well!?! So they get the classic deadlock: A waits for semi-sync notification before dropping prepared_xid counter. B waits for A to drop prepared_xid counter to zero before rotating binlog and releasing LOCK_log. C waits for B to release LOCK_log before it can send events to slave and get semisync notification for A ...
This second one is different from the first one. One thread is trying to rotate the binlog, but waiting for prepared_xids count to drop to zero. There does not appear though to be any threads having prepared XIDs pending.
But look at the code in binlog.cc and binlog.h that handles the prepared_xids counter:
while (get_prep_xids() > 0) mysql_cond_wait(&m_prep_xids_cond, &LOCK_commit);
void dec_prep_xids() { my_atomic_rwlock_wrlock(&m_prep_xids_lock); int32 result= my_atomic_add32(&m_prep_xids, -1); my_atomic_rwlock_wrunlock(&m_prep_xids_lock); if (result == 1) mysql_cond_signal(&m_prep_xids_cond); }
This code is trivially racy. It can easily happen that A reads the counter as 1, then B decrements the counter to zero and signals the condition (which does nothing as there is no waiter), then A goes to sleep on the condition, never to wake up.
BTW, it looks like there is another bug in there. You have a stack trace with two threads, these two threads are waiting for LOCK_log to also rotate the binlog. So it looks like the 5.6 code is missing protection from multiple threads trying to rotate the binlog at the same time, which could lead to dummy rotations leaving empty binlog files.
This makes me so sad. The code for 5.6 group commit is absolutely disgusting, and clearly written by people with absolutely no clue about what they are doing. The pthread_cond_signalling bug is even spelled out in the man page for pthread_cond_wait()/_signal(), they clearly did not even bother reading it! And this after years of you/Facebook, I, and Perconal all spending lots of effort solving and carefully documenting how to do this properly :-(
Did you do anything special to avoid deadlock with ::new_file_impl and prepared XIDs? My memory is vague but I thought you handled it. I added
With respect to the first one, I split the commit processing into two, and only the first part runs serialised. And the semisync wait runs during the second part, after the prepared_xids counter has been decremented. So there is no potential for the same deadlock.
Later I changed binlog recovery to completely remove the prepared_xids counter and the associated wait/stall. That should make it safe to do the semisync wait early, since a pending commit can no longer block binlog rotation.
With respect to the second one, having a clue about synchronisation primitives and a proper review process helps, what else can I say?
- Kristian.
-- Mark Callaghan mdcallag@gmail.com