On Tue, Dec 29, 2009 at 11:07 AM, Sergei Golubchik <sergii@pisem.net> wrote:
Hi, MARK!
On Dec 29, MARK CALLAGHAN wrote:
On Mon, Dec 28, 2009 at 9:20 AM, Sergei Golubchik <sergii@pisem.net> wrote:
trn1> start transaction; insert t1 select * from t2;
trn1> commit;
trn1>> ... xa_prepare() ...
trn2> start transaction; insert t2 values (1); commit;
trn2>> xa_prepare(); binlog.write(); xa_commit();
trn1> ... binlog.write(); xa_commit();
and you have incorrect transaction order in binlog.
There are several issues here:
* for SBR, tm1 cannot release row locks until it is guaranteed that it writes the binlog ahead of any dependent transactions. This is guaranteed by locking prepare_commit_mutex at the end of innobase_xa_prepare and not unlocking until row locks are released during the call to innobase_commit.
I don't see what prepare_commit_mutex has to do with it. It is guaranteed by row locks released at commit time, no matter whether prepare_commit_mutex exists or not.
Yes, prepare_commit_mutex isn't the issue here. I want to release row locks during the call to innobase_xa_prepare, after trx_prepare_for_mysql() has been called. I expect the mythical group commit for the binlog to potentially pause (make a committing connection sleep), and I don't want the pause to occur while the transaction holds locks that may be blocking other transactions. If group commit for the binlog doesn't introduce a pause, there isn't much chance of forming a group of transactions doing a binlog write/fsync concurrently.

If the row locks continue to be released during the call to innobase_commit (after the binlog write/fsync) as they are today, then convoys will form on the locks held by those transactions. These performance problems are limited to high-throughput workloads, but those are the workloads for which group commit is needed.

Synchronization will be needed to guarantee that the order of XID events in the binlog matches the order of commit records in InnoDB despite the changes mentioned above.

-- 
Mark Callaghan
mdcallag@gmail.com
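
For illustration only, the kind of synchronization described above might be sketched as a ticket scheme: each transaction takes a sequence number at prepare time while it still holds its locks, releases the locks, and then both the binlog write and the engine commit proceed in ticket order. This is a toy model in Python, not MySQL or InnoDB code. TicketOrder, run_demo, and the single row_lock standing in for InnoDB row locks are all hypothetical names and simplifications, and the binlog step here is serialized per transaction rather than batched by a group leader.

```python
import random
import threading
import time


class TicketOrder:
    """Hypothetical helper: lets threads pass a point in ticket order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._turn = 0  # ticket that is allowed to proceed next

    def wait_turn(self, ticket):
        with self._cond:
            self._cond.wait_for(lambda: self._turn == ticket)

    def finish(self, ticket):
        with self._cond:
            self._turn = ticket + 1
            self._cond.notify_all()


def run_demo(n_trx=8, seed=0):
    rng = random.Random(seed)
    pauses = [rng.uniform(0, 0.01) for _ in range(n_trx)]
    row_lock = threading.Lock()   # assumption: one lock stands in for row locks
    next_ticket = [0]             # only touched while row_lock is held
    binlog_order = TicketOrder()
    commit_order = TicketOrder()
    tickets, binlog, engine = {}, [], []

    def trx(i):
        # "prepare": take the ticket while the row lock is still held, so a
        # transaction that depends on this one always gets a later ticket.
        with row_lock:
            ticket = next_ticket[0]
            next_ticket[0] += 1
            tickets[i] = ticket
        # The row lock is released before the simulated group-commit pause.
        time.sleep(pauses[i])
        binlog_order.wait_turn(ticket)
        binlog.append(i)          # stands in for writing the XID event
        binlog_order.finish(ticket)
        commit_order.wait_turn(ticket)
        engine.append(i)          # stands in for the engine commit record
        commit_order.finish(ticket)

    threads = [threading.Thread(target=trx, args=(i,)) for i in range(n_trx)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tickets, binlog, engine
```

Calling run_demo() returns the ticket assignment plus the two orderings; in this model the binlog list and the engine list always come out identical, even though no transaction holds row_lock while it sleeps. A real implementation would want the waiters between the two ordering points to form a batch for a single write/fsync, which this sketch does not attempt.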