On Mon, Dec 28, 2009 at 9:20 AM, Sergei Golubchik <sergii@pisem.net> wrote:
Hi, Mark!
On Dec 25, Mark Callaghan wrote:
InnoDB fixed group commit in the InnoDB plugin. This performs as expected when the binlog is disabled. This does not perform as I expect when the binlog is enabled.
The problems for InnoDB are:
1) commit is serialized on the binlog write/fsync
2) row locks are not released until the commit step of XA prepare/commit
3) per-table auto-inc locks are not released until the commit step of XA
I think that 2) and 3) can be fixed without significant changes.
It's not that easy, I think.
What does InnoDB need locks for? Not for protecting uncommitted changes: it uses versioning for that. It needs them for serializability (when innodb_locks_unsafe_for_binlog=true or at the SERIALIZABLE level) and for explicit SELECT ... IN SHARE MODE or FOR UPDATE. Explicit locks are typically used when one reads the data and later modifies it in the same transaction based on the values read, right?
After xa_prepare no data can be modified anymore, it's safe to release these explicit locks.
If InnoDB locks were protecting uncommitted data from being seen by another transaction, they would have to stay until commit, but InnoDB doesn't use locks for this. Safe too.
But locks that help to maintain serializability still have to be held until commit, I'm afraid. Otherwise you'll have
  trn1> start transaction; insert t1 select * from t2;
  trn1> commit;
  trn1>> ... xa_prepare() ...
  trn2> start transaction; insert t2 values (1); commit;
  trn2>> xa_prepare(); binlog.write(); xa_commit();
  trn1> ... binlog.write(); xa_commit();
and you have incorrect transaction order in binlog.
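To make the consequence of that ordering concrete, here is a minimal sketch, not MySQL code: tables are modeled as Python lists and the binlog as a list of statement strings. The helper names (apply_statement, replay) are hypothetical. It replays the two statements in execution order and in the swapped binlog order and shows that a statement-based replica would end up with different data.

```python
# Toy model (assumption): a database is a dict of lists, a binlog is a list
# of statement strings that a replica re-executes in order.

def apply_statement(statement, db):
    """Apply one statement-based binlog event to the toy database."""
    if statement == "insert t1 select * from t2":
        db["t1"].extend(db["t2"])
    elif statement == "insert t2 values (1)":
        db["t2"].append(1)
    else:
        raise ValueError("unknown statement: " + statement)


def replay(binlog):
    """Start from empty tables and apply every event in binlog order."""
    db = {"t1": [], "t2": []}
    for statement in binlog:
        apply_statement(statement, db)
    return db


# Execution order on the master: trn1 reads an empty t2, then trn2 inserts.
master = replay(["insert t1 select * from t2", "insert t2 values (1)"])

# Binlog order from the example above: trn2 is written before trn1.
replica = replay(["insert t2 values (1)", "insert t1 select * from t2"])

print(master)   # {'t1': [], 't2': [1]}
print(replica)  # {'t1': [1], 't2': [1]}
```

The replica sees a row in t1 that the master never had, which is why the locks taken by the SELECT part have to outlive the binlog write.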
There are several issues here:

* For SBR, trn1 cannot release row locks until it is guaranteed that it writes the binlog ahead of any dependent transactions. This is guaranteed by locking prepare_commit_mutex at the end of innobase_xa_prepare and not unlocking until row locks are released during the call to innobase_commit.
* At least for the plugin, the order in which InnoDB prepare is done might not match the order in which transactions are written to the binlog. InnoDB locks prepare_commit_mutex in innobase_xa_prepare after doing a prepare (the call to trx_prepare_for_mysql). It is unlocked after the commit record is written to the InnoDB transaction buffer and before that buffer is flushed to disk. What does match today is the order of transactions in the binlog and the commit records in the InnoDB transaction log.
* Traditional implementations of group commit require releasing locks earlier in the commit cycle. Group commit works by pausing commit processing in the hope that other commits will be done so they can share 1 fsync. It is a bad idea to hold locks during this pause.

I don't know whether InnoDB requires:
1) that transactions in the binlog and commit records in the InnoDB transaction log record things in the same order, or
2) all of 1) above, and that the binlog is at most one trx ahead of the InnoDB transaction log.

prepare_commit_mutex provides 2) today, and that makes group commit for the binlog unlikely or impossible. I am trying to determine myself whether 2) is required and get an answer from the InnoDB team. If 1) is required instead of 2), then group commit on the binlog is possible for InnoDB. Group commit with SBR is possible as long as the per-transaction lock release order determines the order in which the binlog is written.

--
Mark Callaghan
mdcallag@gmail.com
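As a rough illustration of the "share 1 fsync" idea above, the sketch below counts how many flushes a toy binlog needs when commits are flushed one at a time versus in batches. Everything here is an assumption made for illustration: ToyBinlog, write_group and the fixed group size are hypothetical and do not correspond to MySQL or InnoDB internals, and real group commit forms groups from whichever transactions arrive during the pause rather than from fixed-size slices.

```python
class ToyBinlog:
    """Hypothetical binlog that counts simulated fsync calls."""

    def __init__(self):
        self.events = []
        self.fsync_calls = 0

    def write_group(self, group):
        """Append a batch of transactions, then flush once for the batch."""
        self.events.extend(group)
        self.fsync_calls += 1  # stands in for a single fsync()


def commit_serially(transactions):
    """One flush per transaction, as when commit is fully serialized."""
    log = ToyBinlog()
    for trx in transactions:
        log.write_group([trx])
    return log


def commit_in_groups(transactions, group_size):
    """Flush transactions in arrival order, group_size at a time."""
    log = ToyBinlog()
    for start in range(0, len(transactions), group_size):
        log.write_group(transactions[start:start + group_size])
    return log


transactions = ["trx%d" % i for i in range(1, 9)]
serial = commit_serially(transactions)
grouped = commit_in_groups(transactions, group_size=4)

print(serial.fsync_calls)               # 8
print(grouped.fsync_calls)              # 2
print(serial.events == grouped.events)  # True
```

Both variants write the transactions in the same order; only the number of flushes differs, which is the property that matters if requirement 1) is the one InnoDB actually needs.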