MARK CALLAGHAN <mdcallag@gmail.com> writes:
This is a really long thread so a summary elsewhere would be great for people like me.
I agree that the discussion has become quite long. I summarised the group commit part of it on my blog:

http://kristiannielsen.livejournal.com/12254.html
http://kristiannielsen.livejournal.com/12408.html
http://kristiannielsen.livejournal.com/12553.html
I think Alex mentioned that he needs the commit protocol to be changed so that the binlog/commit-log/commit-service/redundancy-service guarantees commit and the storage engine does not. If that is the case, the storage engine can do async commits. As long as it recovers to some point in time and tells the binlog what that point was (so it must know the XID), then during crash recovery the binlog can give it the transactions it lost. Here 'binlog' means whatever guarantees commit, and it could be something other than a file on the master. I want something like this. It means that we don't need to use XA internally, which currently costs 3 fsyncs per commit (2 shared, 1 not). We are changing MySQL to really do group commit, and that will change the cost to 3 shared fsyncs. But what I think you have described here is a huge improvement.
Yes, it sounds quite promising.
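To make the recovery idea concrete, here is a minimal sketch in Python of how I read it. It is only an illustration under my own assumptions: the names CommitLog, Engine and recover() are hypothetical and are not MySQL or storage engine APIs, and durability (the fsync of the log) is not modelled. The log is the only component that guarantees commit; the engine merely remembers the XID of the last transaction it managed to persist, and after a crash it asks the log for everything after that XID.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CommitLog:
    """Hypothetical stand-in for the binlog / commit service."""
    entries: List[Tuple[int, Dict[str, str]]] = field(default_factory=list)

    def commit(self, xid: int, changes: Dict[str, str]) -> None:
        # A real log would make this append durable before acknowledging.
        self.entries.append((xid, dict(changes)))

    def entries_after(self, xid: Optional[int]) -> List[Tuple[int, Dict[str, str]]]:
        # None means the engine has persisted nothing yet.
        if xid is None:
            return list(self.entries)
        xids = [x for x, _ in self.entries]
        return self.entries[xids.index(xid) + 1:]


@dataclass
class Engine:
    """Hypothetical storage engine that commits asynchronously."""
    data: Dict[str, str] = field(default_factory=dict)
    last_xid: Optional[int] = None

    def apply(self, xid: int, changes: Dict[str, str]) -> None:
        self.data.update(changes)
        self.last_xid = xid


def recover(engine: Engine, log: CommitLog) -> List[int]:
    """Replay the transactions the engine lost, in log order."""
    replayed = []
    for xid, changes in log.entries_after(engine.last_xid):
        engine.apply(xid, changes)
        replayed.append(xid)
    return replayed


# Three transactions were committed in the log before the crash...
log = CommitLog()
log.commit(101, {"a": "1"})
log.commit(102, {"b": "2"})
log.commit(103, {"a": "3"})

# ...but the engine had only persisted the first one.
engine = Engine()
engine.apply(101, {"a": "1"})

replayed = recover(engine, log)
```

The only thing the engine has to get right is reporting its last persisted XID accurately; everything else comes back from the log.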
As a further optimization, I want a callback that is called after the binlog entries are written for a transaction and before the wait for group commit on the fsync is done. That callback will be used to release row locks (optionally) held by the transaction.
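A rough sketch of the ordering being asked for, in Python. This is an assumption-laden illustration, not the actual server code or a proposed API: GroupCommitLog, commit_group() and the on_ordered callback are hypothetical names. Each transaction gets its position in the log, the callback fires at that point (where row locks could be released), and only afterwards does the whole group wait on a single shared sync.

```python
from typing import Callable, List, Tuple


class GroupCommitLog:
    """Hypothetical log that records the order of events for inspection."""

    def __init__(self) -> None:
        self.entries: List[int] = []
        self.events: List[Tuple] = []

    def write(self, xid: int) -> int:
        # Appending fixes this transaction's position in the commit order.
        self.entries.append(xid)
        self.events.append(("write", xid))
        return len(self.entries) - 1

    def sync(self) -> None:
        # Stand-in for one fsync shared by every entry written so far.
        self.events.append(("fsync", tuple(self.entries)))


def commit_group(log: GroupCommitLog, xids: List[int],
                 on_ordered: Callable[[int, int], None]) -> None:
    for xid in xids:
        position = log.write(xid)
        # Order is now determined, so the callback may release row locks.
        on_ordered(xid, position)
    # Only after all callbacks does the group pay for the shared sync.
    log.sync()


log = GroupCommitLog()


def release_locks(xid: int, position: int) -> None:
    log.events.append(("release_locks", xid))


commit_group(log, [1, 2], release_locks)
```

In this sketch the locks of transaction 1 are already released while transaction 2 is still being written, and neither transaction holds locks across the sync.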
I think the point here is that the locks must not be released until the order in the binlog has been determined, right? That way, any transaction order enforced by the log will be the same on the slave. So the callback might be called before or after the actual write of the binlog, but only after (never before) the order of that write has been determined?

I think this could be handled by the xa_prepare_fast() and/or the commit_fast() callbacks that I propose in the third article referenced above.

BTW, it was great to discuss these issues with you at the MySQL Conference!

 - Kristian.