"nanyi607rao" <nanyi607rao@gmail.com> writes:
> As we know, there are three steps in committing an XA transaction:
>   1. the prepare step,
>   2. writing the binary log,
>   3. the commit step in the engines.
> All of these steps need an fsync(). The group commit strategy can make a
> group of transactions durable with one fsync() in steps 2 and 3, which can
> lead to a dramatic performance enhancement.
> But in step 1, each transaction still does its own fsync(). So why not make
> several transactions durable with one fsync() in the prepare step, just
> like in steps 2 and 3? I think this could improve performance further.

Actually, this is already implemented.

Further, in MariaDB 10.0, there is no fsync() needed in step 3. This is because in case of a crash, XA crash recovery can repeat step 3 using the information saved in steps 1 and 2. So in 10.0, we only need one shared fsync in step 1 plus one shared fsync in step 2.

If you look in the InnoDB/XtraDB code, you can see this. The prepare step calls trx_prepare_for_mysql() in trx/trx0trx.cc. This calls trx_prepare(), which goes to trx_flush_log_if_needed_low() and calls log_write_up_to() in log/log0log.cc. And in log_write_up_to(), you will see the group commit logic. The transaction will wait for any previous fsync to complete; then, if it still needs the fsync(), it will fsync not just itself, but also any other transactions that are waiting for fsync.

There is some description of the removal of fsync() in step 3 here:

    http://kristiannielsen.livejournal.com/16382.html

However, the group commit in step 1 has been in the InnoDB code for many years, as far as I know.

Hope this helps,

 - Kristian.
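
For illustration, the wait-then-flush logic described above for log_write_up_to() can be sketched as a toy model. This is a simplified Python sketch, not the actual InnoDB code: the class GroupFlushLog, its methods append() and write_up_to(), the fsync_count counter and the run_demo() helper are all hypothetical names, and a short sleep stands in for the real fsync(). It only shows the idea that a transaction first waits for any fsync already in progress, and if it still needs one, flushes everything written so far on behalf of the other waiters too.

```python
import threading
import time


class GroupFlushLog:
    """Toy redo log with a shared fsync (hypothetical model, not InnoDB code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.written_lsn = 0          # highest position appended to the log
        self.flushed_lsn = 0          # highest position known to be durable
        self._flush_in_progress = False
        self.fsync_count = 0          # number of simulated fsyncs issued

    def append(self, nbytes):
        """Append a record and return the position that must become durable."""
        with self._cond:
            self.written_lsn += nbytes
            return self.written_lsn

    def _fsync(self):
        # Assumption: a short sleep stands in for a real fsync() on the log.
        time.sleep(0.01)

    def write_up_to(self, lsn):
        """Make the log durable up to lsn. Returns True if this caller
        performed the fsync, False if another caller's fsync covered it."""
        with self._cond:
            # Wait for any fsync that is already running.
            while self._flush_in_progress and self.flushed_lsn < lsn:
                self._cond.wait()
            if self.flushed_lsn >= lsn:
                return False          # already durable, no fsync needed
            # Still needed: flush everything written so far, not just lsn.
            self._flush_in_progress = True
            target = self.written_lsn
        self._fsync()                 # done outside the lock
        with self._cond:
            self.flushed_lsn = target
            self._flush_in_progress = False
            self.fsync_count += 1
            self._cond.notify_all()   # wake the transactions that waited
        return True


def run_demo(n_threads=8):
    """Run n_threads concurrent 'prepare' steps; return the fsync count."""
    log = GroupFlushLog()

    def prepare():
        lsn = log.append(100)
        log.write_up_to(lsn)

    threads = [threading.Thread(target=prepare) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.fsync_count
```

Calling run_demo(8) returns how many simulated fsyncs were needed for 8 concurrent prepares. The exact number depends on thread timing, but it is never more than the number of transactions and is typically much smaller, since waiters are covered by the fsync of whichever transaction flushes next.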