Hi Marko,

Time flies, somehow it's already more than a year since our first discussions on implementing the binlog in InnoDB and avoiding the extra fsync() and complexity of the two-phase commit between InnoDB and binlog.

But things have progressed, and I have now reached the point where most of the basic groundwork is implemented. Event groups are binlogged to InnoDB tablespaces. Binlog dump thread can read the binlog and send to slave, and replication is working. Large event groups are split into pieces, bounding the amount of data that needs to be atomically written in mini-transactions and at commit time.

There are still many details left, but mostly in the server-layer replication code which should be manageable, just will take some time to get completed. I think now is a good time for you to take a first real look at the InnoDB part of the changes, I would really value your input.

The main part of the InnoDB code is in two files:

1. handler/handler0binlog.cc for the high-level part that deals mostly with the new binlog file format and interfacing to the server layer.

2. fsp/fsp0binlog.cc for the low-level part that most tightly interacts with the InnoDB mini-transactions and buffer pool.

The most interesting part for you to look at is fsp/fsp0binlog.cc (~1k lines), though I'm happy to hear comments on any part of the patch, of course.

The code is pushed to GitHub in the branch knielsen_binlog_in_engine:

  https://github.com/MariaDB/server/commits/knielsen_binlog_in_engine

and I've also attached the complete patch.

This is my first major patch for InnoDB, so there will undoubtedly be a number of style changes required. But the overall structure of the code should now be close to what I imagine would be the final result, with some pending ToDo steps marked in comments in the code, and detailed in the below list, some of which we discussed a bit already.

I hope you will take a look at the patch and let me know of any questions or other things you need from me.
Maybe we can also find a chance to discuss further if you will come to FOSDEM at the start of February, or I could visit sometime in Finland.

 - Kristian.

----

Some known outstanding issues in the InnoDB part:

- We previously discussed removing some of the page header overhead for binlog tablespaces. Currently the code just leaves alone the first FIL_PAGE_DATA bytes (38) and the last FIL_PAGE_DATA_END (8 IIRC).

- We previously discussed writing the current LSN at the start of the tablespace, and using this in recovery to handle the fact that we have only two tablespace IDs that are reused. So we need code in recovery that checks the LSN at the start of the tablespace, and skips redo records with LSN smaller than this.

- We want to avoid the double-write buffer for binlog pages, at least for the first page write (most pages will only be written as full pages). You mentioned an idea to completely avoid the double-write buffer and instead do some specific code for recovery in the uncommon case where a partial binlog page is written to disk due to low commit activity.

- The flushing of binlog pages to disk currently happens in a dedicated thread in the background. I'd welcome ideas on how to do this differently. It is good to flush binlog pages quickly and re-use their buffer pool entries for something better. Also, writing the pages to disk quickly (not necessarily fsync()'ing) makes the data readable by mysqlbinlog.

- Checksum and encryption should use the standard InnoDB mechanism. I assume checksum is already handled in the code through using the buffer pool and mini-transactions to read/write pages. Not sure about encryption. I still need to implement checksum verification and decryption for the case where pages are read manually from the file (not through the buffer pool).
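
To make the LSN item in the list above a bit more concrete, here is a minimal sketch of the kind of check recovery might do. It is only an illustration: RedoRecord, is_stale() and records_to_apply() are hypothetical names that appear neither in the patch nor in InnoDB, and real recovery code would work on parsed redo log records rather than a plain vector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not taken from the patch.
using lsn_t = std::uint64_t;

struct RedoRecord {
  std::uint32_t space_id;  // tablespace the record applies to
  lsn_t lsn;               // LSN at which the record was logged
};

// A record is stale if it was logged before the LSN stored at the start of
// the tablespace, i.e. it belongs to an earlier use of the same reused
// tablespace ID.
inline bool is_stale(const RedoRecord &rec, lsn_t tablespace_start_lsn) {
  return rec.lsn < tablespace_start_lsn;
}

// Return the records for `space_id` that recovery should still apply,
// skipping records for other tablespaces and stale records.
std::vector<RedoRecord> records_to_apply(const std::vector<RedoRecord> &log,
                                         std::uint32_t space_id,
                                         lsn_t tablespace_start_lsn) {
  std::vector<RedoRecord> out;
  for (const RedoRecord &rec : log) {
    if (rec.space_id != space_id) continue;
    if (is_stale(rec, tablespace_start_lsn)) continue;
    out.push_back(rec);
  }
  return out;
}
```

The comparison is strict, so a record logged at exactly the LSN stored in the tablespace is still applied; whether that boundary is the right one is a detail to settle in the real implementation.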