Sergei Golubchik <serg@askmonty.org> writes:
So now the algorithm is something like this:
Where in this algorithm do you call ht->commit_ordered()?
Oops, I forgot, sorry! It should be at the start of "for thd2 in <queue>" (before the wakeup).
thd->ready= false
lock(LOCK_prepare_ordered)
old_queue= group_commit_queue
thd->next= old_queue
group_commit_queue= thd
ht->prepare_ordered()
unlock(LOCK_prepare_ordered)

if (old_queue == NULL)  // leader?
  lock(LOCK_group_commit)

  lock(LOCK_prepare_ordered)
  queue= reverse(group_commit_queue)
  group_commit_queue= NULL
  unlock(LOCK_prepare_ordered)

  group_log_xid(queue)

  lock(LOCK_commit_ordered)  // but see below
  unlock(LOCK_group_commit)
  for thd2 in <queue>
    Here: ht->commit_ordered(thd2)
    lock(thd2->LOCK_wakeup)
    thd2->ready= true
    signal(thd2->COND_wakeup)
    unlock(thd2->LOCK_wakeup)
  unlock(LOCK_commit_ordered)
else
  lock (thd->LOCK_wakeup)
  while (!thd->ready)
    wait(COND_wakeup, LOCK_wakeup)
  unlock (thd->LOCK_wakeup)

cookie= xid_log_after()
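For anyone who wants to play with the locking order, here is a rough Python sketch of the algorithm above, with commit_ordered() placed at the start of the wakeup loop as per the correction. It is only a toy model under my own assumptions: the Thd and GroupCommit classes, the reverse() helper and run_demo() are hypothetical names, the engine and log calls are replaced by appends to lists, and none of this is the server code.

```python
import threading


class Thd:
    """Hypothetical stand-in for a connection's THD."""

    def __init__(self, name):
        self.name = name
        self.ready = False
        self.next = None
        self.wakeup = threading.Condition()  # LOCK_wakeup + COND_wakeup


def reverse(head):
    """Turn the newest-first linked list into an oldest-first Python list."""
    out = []
    while head is not None:
        out.append(head)
        head = head.next
    out.reverse()
    return out


class GroupCommit:
    def __init__(self):
        self.lock_prepare_ordered = threading.Lock()
        self.lock_group_commit = threading.Lock()
        self.lock_commit_ordered = threading.Lock()
        self.group_commit_queue = None
        self.prepare_order = []  # records the prepare_ordered() calls
        self.groups = []         # records each group_log_xid() batch
        self.commit_order = []   # records the commit_ordered() calls

    def commit(self, thd):
        thd.ready = False
        with self.lock_prepare_ordered:
            old_queue = self.group_commit_queue
            thd.next = old_queue
            self.group_commit_queue = thd
            self.prepare_order.append(thd.name)  # ht->prepare_ordered()

        if old_queue is None:  # leader
            self.lock_group_commit.acquire()
            with self.lock_prepare_ordered:
                queue = reverse(self.group_commit_queue)
                self.group_commit_queue = None
            self.groups.append([t.name for t in queue])  # group_log_xid(queue)
            self.lock_commit_ordered.acquire()
            self.lock_group_commit.release()
            for thd2 in queue:
                self.commit_order.append(thd2.name)  # ht->commit_ordered(thd2)
                with thd2.wakeup:
                    thd2.ready = True
                    thd2.wakeup.notify()
            self.lock_commit_ordered.release()
        else:  # follower: wait for the leader to wake us up
            with thd.wakeup:
                while not thd.ready:
                    thd.wakeup.wait()
        # cookie= xid_log_after() would run here, outside the global locks


def run_demo(n):
    """Commit n toy transactions concurrently and return the recorder."""
    gc = GroupCommit()
    thds = [Thd("t%d" % i) for i in range(n)]
    workers = [threading.Thread(target=gc.commit, args=(t,)) for t in thds]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return gc
```

Calling run_demo(8) and comparing gc.prepare_order with gc.commit_order should show the same sequence in both. How the transactions split into groups depends on thread scheduling, but the order within and across groups follows the queue order.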
On the other hand, the algorithm I suggested earlier for START TRANSACTION WITH CONSISTENT SNAPSHOT used the LOCK_commit_ordered, and there might be other uses...
START TRANSACTION WITH CONSISTENT SNAPSHOT is a good reason to keep the mutex.
Yes, probably.
But I choose to do it earlier, as soon as the transaction is put in the queue and commit order thereby defined.
There can be quite a "long" time interval between these two events: the time it takes for the previous group_log_xid() (e.g. an fsync()), plus any extra sleeps one sometimes wants to add in group commit to group more transactions together.
No. The long interval is *inside* the group_log_xid(), while you call prepare_ordered() *before* it.
Right, that is what I meant. One group of transactions executes the long interval inside group_log_xid(). While this happens, new transactions that want to commit queue up waiting for the first group to finish. The first waiting transaction (the new leader) blocks on LOCK_group_commit; all the others wait for the new leader to wake them up. So I want to call prepare_ordered() before blocking on LOCK_group_commit, as that mutex is held for the duration of group_log_xid().
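To make the timing concrete, here is a small Python trace under my own assumptions (the Tracer class, its events and run_trace() are hypothetical, and the follower wait is left out). T1 is held inside a simulated slow group_log_xid() while T2 arrives; T2 runs its prepare_ordered() right away and only then blocks on LOCK_group_commit.

```python
import threading


class Tracer:
    def __init__(self):
        self.lock_prepare_ordered = threading.Lock()
        self.lock_group_commit = threading.Lock()
        self.queue = []                       # pending names, oldest first
        self.trace = []                       # ordered record of what happened
        self.trace_lock = threading.Lock()
        self.in_log = threading.Event()       # set once a group_log_xid() starts
        self.release_log = threading.Event()  # lets the slow "fsync" finish
        self.prepared = {"T1": threading.Event(), "T2": threading.Event()}

    def log(self, what):
        with self.trace_lock:
            self.trace.append(what)

    def commit(self, name):
        with self.lock_prepare_ordered:
            is_leader = not self.queue
            self.queue.append(name)
            self.log("prepare_ordered " + name)
        self.prepared[name].set()
        if not is_leader:
            return  # follower wait omitted in this sketch
        with self.lock_group_commit:  # held for the whole group_log_xid()
            with self.lock_prepare_ordered:
                group, self.queue = self.queue, []
            self.log("group_log_xid start " + ",".join(group))
            self.in_log.set()
            self.release_log.wait(timeout=5)  # simulated slow fsync()
            self.log("group_log_xid end " + ",".join(group))


def run_trace():
    t = Tracer()
    first = threading.Thread(target=t.commit, args=("T1",))
    second = threading.Thread(target=t.commit, args=("T2",))
    first.start()
    t.in_log.wait(timeout=5)          # T1 is now inside group_log_xid()
    second.start()
    t.prepared["T2"].wait(timeout=5)  # T2 has already run prepare_ordered()
    t.release_log.set()               # only now let T1's log write finish
    first.join()
    second.join()
    return t.trace
```

In the returned trace, "prepare_ordered T2" comes before "group_log_xid end T1", which is the point: the commit order of the next group is fixed while the previous group is still writing the log.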
But anyway, the LOCK_prepare_ordered mutex is not going to be contended, so removing it by using a lock-free queue (that's what this second approach is about) will not bring any noticeable benefits.
Very true.
It's reasonable to say that if an engine does not implement commit_ordered() then it needs to take care of its own recovery and fsync both in prepare and commit.
Yes, sounds reasonable. Thanks! - Kristian.