Kristian Nielsen <knielsen@knielsen-hq.org> writes:
> single thread. In MySQL, _both_ prepare and commits are so grouped from a single thread (though I think one thread can do group prepare in parallel with another doing group commit).
Ehm, this is not true, of course. The prepare() calls are from multiple threads in parallel. Just the flush_logs(hton, true) call is from a single thread for a whole group of transactions.
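
To make the corrected picture concrete, here is a rough toy model of that call pattern. It is only a sketch under assumptions: ToyEngine, run_group() and GroupResult are made-up names, not the real handlerton interface, and the fsync is just a counter. The prepare() calls come from many threads in parallel, and a single thread then does one flush for the whole group.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a storage engine; NOT the real handlerton API.
struct ToyEngine {
    std::mutex log_mutex;         // protects the in-memory log buffer
    std::vector<int> log_buffer;  // prepared, but not yet durable, transaction ids
    int durable_count = 0;        // transactions made durable so far
    int fsync_calls = 0;          // number of simulated fsync() calls

    // Called once per transaction, concurrently from many threads.
    void prepare(int trx_id) {
        std::lock_guard<std::mutex> guard(log_mutex);
        log_buffer.push_back(trx_id);
    }

    // Called once per group, from a single thread.
    void flush_logs() {
        std::lock_guard<std::mutex> guard(log_mutex);
        durable_count += static_cast<int>(log_buffer.size());
        log_buffer.clear();
        ++fsync_calls;  // one simulated fsync() covers the whole group
    }
};

struct GroupResult {
    int durable;  // transactions persisted by the flush
    int fsyncs;   // simulated fsync() calls that were needed
};

// Runs one group: parallel prepare() calls, then one flush_logs() call.
GroupResult run_group(int n_transactions) {
    assert(n_transactions >= 0);
    ToyEngine engine;
    std::vector<std::thread> workers;
    for (int id = 0; id < n_transactions; ++id)
        workers.emplace_back([&engine, id] { engine.prepare(id); });
    for (auto &worker : workers)
        worker.join();      // every prepare() in the group has finished
    engine.flush_logs();    // single-threaded flush for the whole group
    return {engine.durable_count, engine.fsync_calls};
}
```

Calling run_group(8) in this model makes all 8 prepares durable with a single simulated fsync. The model also shows the cost: the flush runs on one thread while holding the lock.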
> This way, the extra lock can be avoided for storage engines that do not need group_prepare(). And storage engines have freedom to implement
And I do not think this will work either. All binlog commits must use the same lock sequence, so that a later one not taking the new lock cannot race ahead of another. It is important to use a separate lock though, so that one storage engine prepare fsync can happen in parallel with one binlog write fsync.

It still seems useful if the upper layer could pass down a list of the entire group of transactions being group committed (or prepared). I think prepare_ordered() can just be removed, it ended up never being useful. And maybe a group_commit_ordered(list_of_transactions) can be added as an alternative to commit_ordered().

A new group_prepare_ordered(list_of_transactions) might help the performance issue for rocksdb. It really should be made async though. Like group_prepare_ordered_start(cookie, list) and group_prepare_ordered_complete(cookie) or whatever.

With the MySQL "API", it seems it is impossible for two participating storage engines to persist their prepares in parallel, which isn't great for performance. The MySQL flush_logs() during prepare really feels like a gross hack. It doesn't seem right to run fsync()'s single-threaded under a lock...

 - Kristian.