This is a really long thread so a summary elsewhere would be great for people like me.

I think Alex mentioned that he needs the commit protocol to be changed so that the binlog/commit-log/commit-service/redundancy-service guarantees commit and the storage engine does not. If that is the case, the storage engine can do async commits. As long as it recovers to some point in time and tells the binlog what the point in time was (must know XID), then the binlog can give it the transactions it lost during crash recovery. Here 'binlog' is what guarantees commit and could be something other than a file on the master.

I want something like this. It means that we don't need to use XA internally which currently costs 3 fsyncs per commit (2 shared, 1 not). We are changing MySQL to really do group commit and that will change the cost to 3 shared fsyncs. But what I think you have described here is a huge improvement.

As a further optimization, I want a callback that is called after the binlog entries are written for a transaction and before the wait for group commit on the fsync is done. That callback will be used to release row locks (optionally) held by the transaction.

On Tue, Mar 30, 2010 at 11:40 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Alex Yurchenko <alexey.yurchenko@codership.com> writes:
On Mon, 29 Mar 2010 00:02:09 +0200, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
The way I understood the above is that global mutex is taken in InnoDB prepare() solely to synchronize binlog and InnoDB commits. Is that so? If
Yes.
it is, then it is precisely the thing we want to achieve, but instead of locking the global mutex in InnoDB prepare() we'll be doing it in redundancy_service->pre_commit() as discussed earlier:
    innodb->prepare();
    if (redundancy_service->pre_commit() == SUCCESS) // locks commit_order mtx
    {
        innodb->commit();
        redundancy_service->post_commit(); // unlocks commit_order mtx
    }
    ...
Yes. This way will prevent group commit in InnoDB, as here innodb->commit() does fsync() under a global mutex.
This way the global lock in innodb->prepare() can be naturally removed without any additional provisions. Am I missing something?
Agree that this removes the need for innodb to take its lock in prepare() and release in commit().
On the other hand, if we can reduce the amount of commit ordering operations to the absolute minimum, as you suggest below, it would only benefit performance. I'm just not sure about names. Essentially this means splitting commit() into 2 parts: the one that absolutely must be run under commit_order mutex protection and another that can be run outside of the critical section. I guess in that setup all actual IO can easily go into the 2nd part.
Yes (I did not think long about the names, probably better names can be devised).
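The split discussed above could look roughly like the following. This is only a sketch under stated assumptions: fix_commit_order() is the name used in this thread, while finish_commit(), the Engine struct, and commit_in_two_parts() are hypothetical names made up for illustration, and the "IO" is simulated by appending to a vector rather than doing a real fsync().

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical engine: names other than fix_commit_order() are illustrative.
struct Engine {
    std::vector<std::uint64_t> order;    // commit order, fixed under the mutex
    std::vector<std::uint64_t> flushed;  // transactions whose "IO" has finished
    std::mutex io_mtx;                   // protects only the simulated IO state

    // Part 1: cheap, in-memory only. Must run inside the critical section.
    void fix_commit_order(std::uint64_t trx) { order.push_back(trx); }

    // Part 2: the expensive part (log write, fsync). Runs outside the
    // commit-order critical section, so it can be batched by group commit.
    void finish_commit(std::uint64_t trx) {
        std::lock_guard<std::mutex> guard(io_mtx);
        flushed.push_back(trx);
    }
};

std::mutex commit_order_mtx;  // assumed global commit-order mutex

void commit_in_two_parts(Engine& engine, std::uint64_t trx) {
    {
        // Only the ordering decision is serialized.
        std::lock_guard<std::mutex> guard(commit_order_mtx);
        engine.fix_commit_order(trx);
        assert(engine.order.back() == trx);
    }
    // All actual IO happens after the mutex has been released.
    engine.finish_commit(trx);
}
```

The point of the sketch is only that the mutex covers the ordering step and nothing else. How the second part is really batched is up to the engine.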
    lock(global_commit_order_mutex)
    fix_binlog_or_redundancy_service_commit_order()
    for (each storage engine)
        engine->fix_commit_order()
    unlock(global_commit_order_mutex)
What I'd like to correct here is that ordering is needed at least in redundancy service. You need global trx ID. And I believe storage engines won't be able to do without it either - otherwise we'll need to deal with holes in commit sequence during recovery.
Yes.
Also, I'd suggest moving the global_commit_order_mutex into what goes by "fix_binlog_or_redundancy_service_commit_order()" (the name is misleading - the redundancy service determines the order, it does not have to fix it) in the above pseudocode. Locking it outside may seriously reduce concurrency.
Agree (in fact, though I did not say so explicitly, I thought of the entire pseudo code above as being in fact implemented inside the redundancy service plugin).
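One way to picture the mutex living inside the redundancy service is sketched below. Everything here is an assumption for illustration: the RedundancyService and OrderedEngine types and the order_commit() method are hypothetical, not an existing plugin API. The service assigns the global trx ID and passes it to each engine while holding its own internal mutex, so the IDs reach every engine consecutively and there are no holes in the commit sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical engine hook: it only records the order it is told about.
struct OrderedEngine {
    std::vector<std::uint64_t> seen;

    void fix_commit_order(std::uint64_t global_trx_id) {
        // IDs must arrive consecutively so recovery never sees a hole.
        assert(seen.empty() || seen.back() + 1 == global_trx_id);
        seen.push_back(global_trx_id);
    }
};

// Hypothetical redundancy service that determines the commit order itself.
class RedundancyService {
public:
    explicit RedundancyService(std::vector<OrderedEngine*> engines)
        : engines_(std::move(engines)) {}

    // Assigns the next global trx ID and hands it to every engine.
    // The mutex is private to the service; callers never lock it.
    std::uint64_t order_commit() {
        std::lock_guard<std::mutex> guard(commit_order_mtx_);
        const std::uint64_t id = ++last_id_;
        for (OrderedEngine* engine : engines_) {
            engine->fix_commit_order(id);
        }
        return id;
    }

private:
    std::mutex commit_order_mtx_;
    std::uint64_t last_id_ = 0;
    std::vector<OrderedEngine*> engines_;
};
```

With this shape the server core would just call order_commit() and never see the mutex, which leaves the service free to replace it with something finer-grained later.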
- Kristian.
--
Mark Callaghan
mdcallag@gmail.com