Remaining InnoDB parts for MDEV-34705 binlog-in-engine?

Hi Marko,

I have been making good progress on the new binlog implementation. I believe I have implemented everything we discussed for the InnoDB side, and I also got a lot done on the integration with the server-side code.

I wanted to ask you what outstanding issues remain on the InnoDB side from your point of view, if any?

The page format is now implemented in the new design, as described in a separate mail:

https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/threa...

All the InnoDB-specific page overhead is gone; only a 4-byte CRC32 remains at the end of each page. The first page is reserved as a header page.

The new binlog code no longer uses the InnoDB buffer pool, nor does it create any InnoDB tablespace structures (except for the two reserved tablespace IDs for the redo records). The code has its own buffer pool replacement (called the "page fifo") and its own background flushing.

The binlog hooks into the InnoDB crash recovery code following your suggestions, and only the single "memcpy" record is needed. Recovery is implemented (and tested somewhat), and has been carefully designed to recover in all cases (barring any bugs, of course) while requiring only the two reserved tablespace IDs.

I have two outstanding InnoDB-related issues I know about:

1. In order for a replication setup to be crash-safe, I must ensure that the slave does not receive an event until it has become durable on the master. Otherwise, when the master crashes and comes back up, an event might be applied on the slave that no longer exists (or never existed) on the master. For this, I need something in the InnoDB redo log code that will call back into the binlog after a specific LSN has been durably written to disk. This callback will then notify the dump threads (slave connections) somehow that the corresponding part of the binlog can now be safely read.

2. Integration with mariabackup.
With the new binlog, mariabackup can back up the binlog together with the rest of the InnoDB data in a consistent way, something that I think is not possible currently? This will also remove the need for mariabackup to extract the binlog position with SHOW MASTER STATUS, or for InnoDB to store the binlog position in the undo segments.

When the binlog is backed up, the correct GTID position can be obtained after restoring a master backup simply by checking the binlog (SELECT @@gtid_binlog_pos). (For a slave backup, the position is similarly available from the backup of the mysql.gtid_slave_pos table, as before.)

I am not yet sure of the details of mariabackup support. At a high level, it will involve copying the binlog-NNNNNN.ibb files into the backup along with the other InnoDB tablespace files, and applying redo records to those binlog files as part of --prepare.

One important detail is that we have only two reserved tablespace IDs for binlog files. The current recovery code requires that only the two most recent binlog files need recovery. We could try to ensure that the same applies during mariabackup --prepare.

An idea is to use the fact that binlog files are strictly append-only. Around the point where mariabackup decides the LSN at which it stops copying the redo log (the LSN that becomes the backup snapshot), we can determine the last binlog file that exists. We will check this just before and just after determining the LSN, and repeat if a new binlog file was created just at that point (which would otherwise leave doubt about whether the file was created before or after that LSN).

Then, after the snapshot LSN is determined, we can just copy all the binlog files up to and including (but not after) the snapshot point. This way mariabackup --prepare should be able to run the same binlog recovery code that runs in crash recovery, only needing to consider the two most recent of the copied binlog files.
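
The check-before-and-after idea can be sketched roughly as below. This is only an illustration in Python, not mariabackup code: `pick_snapshot`, `last_binlog_file_no` and `decide_snapshot_lsn` are hypothetical names, and the two callables stand in for whatever mariabackup would really use to find the newest binlog file and to fix the snapshot LSN.

```python
from typing import Callable, Tuple


def pick_snapshot(
    last_binlog_file_no: Callable[[], int],   # hypothetical: number of newest binlog file
    decide_snapshot_lsn: Callable[[], int],   # hypothetical: fixes and returns the snapshot LSN
    max_attempts: int = 100,
) -> Tuple[int, int]:
    """Return (snapshot_lsn, last_file_no) with no binlog file created in between.

    The newest binlog file is checked just before and just after the LSN is
    decided. If the two checks disagree, a file was created around that
    point, so the attempt is discarded and repeated.
    """
    for _ in range(max_attempts):
        before = last_binlog_file_no()
        lsn = decide_snapshot_lsn()
        after = last_binlog_file_no()
        if before == after:
            # No new file appeared, so files up to `after` belong to the snapshot.
            return lsn, after
    raise RuntimeError("binlog files kept rotating while deciding the snapshot LSN")
```

With the result in hand, the copy step would take binlog files numbered up to and including `last_file_no` and nothing newer.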
I can imagine other ways to do mariabackup of binlog files, but this is one idea that could be feasible. What do you think?

Are there any other outstanding issues with the new binlog implementation regarding InnoDB that you are aware of?

 - Kristian.
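
For issue 1 above (the callback once a given LSN is durable), the shape of the mechanism might look like the following. It is a single-threaded Python sketch under my own assumptions, not InnoDB code: `DurableLsnNotifier`, `wait_for` and `on_redo_durable` are hypothetical names, and a real version would need locking and would be driven by the redo log write/flush path.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class DurableLsnNotifier:
    """Runs callbacks once the durable redo LSN reaches the LSN they wait for."""

    def __init__(self) -> None:
        self._durable_lsn = 0
        # Min-heap of (lsn, sequence, callback); sequence breaks ties between equal LSNs.
        self._pending: List[Tuple[int, int, Callable[[int], None]]] = []
        self._seq = itertools.count()

    def wait_for(self, lsn: int, callback: Callable[[int], None]) -> None:
        """Register `callback` to run when `lsn` is durable (at once if it already is)."""
        if lsn <= self._durable_lsn:
            callback(self._durable_lsn)
            return
        heapq.heappush(self._pending, (lsn, next(self._seq), callback))

    def on_redo_durable(self, lsn: int) -> None:
        """Hypothetical hook: called after the redo log is durable up to `lsn`."""
        if lsn <= self._durable_lsn:
            return
        self._durable_lsn = lsn
        while self._pending and self._pending[0][0] <= lsn:
            _, _, callback = heapq.heappop(self._pending)
            callback(lsn)
```

Here the binlog side would call `wait_for()` with the LSN of its latest write, and the callback would wake the dump threads so that they only read binlog data that is already durable.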

Kristian Nielsen via developers <developers@lists.mariadb.org> writes:
> 2. Integration with mariabackup. With the new binlog, mariabackup can backup the binlog together with the rest of the InnoDB data in a consistent way, something that I think is not possible currently?
>
> An idea is to use the fact that binlog files are strictly append-only. Around the point where mariabackup decides the LSN where it stops copying the redo log that becomes the backup snapshot, we can determine the last binlog file that exists. We will check this just before and just after determining the LSN, and repeat if a new binlog file was created just at that point (which would otherwise leave doubt if the file was created before or after that LSN).
>
> Then after the snapshot LSN is determined, we can just copy all the binlog files up to and including (but not after) the snapshot point. This way
I implemented a simpler version of this. There is no need to keep track of binlog file creation around the point where the backup snapshot LSN is decided. Instead, I just check the start_lsn value in the header of each binlog file before copying it, and skip any file that was created after the backup LSN.

The code is in the branch knielsen_binlog_in_engine now, along with a couple of test cases.

 - Kristian.
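
As a rough illustration of the start_lsn check (a Python sketch, not the code in the branch): `files_to_copy` is a hypothetical helper, the start_lsn values are assumed to have been read from each file header already, and treating a file whose start_lsn equals the backup LSN as part of the backup is my assumption.

```python
from typing import Dict, List


def files_to_copy(start_lsn_by_file: Dict[str, int], backup_lsn: int) -> List[str]:
    """Return the binlog files to copy, skipping those created after backup_lsn.

    start_lsn_by_file maps a binlog file name to the start_lsn value from its
    header. A file with start_lsn greater than backup_lsn was created after
    the backup snapshot and is left out.
    """
    return sorted(
        name
        for name, start_lsn in start_lsn_by_file.items()
        if start_lsn <= backup_lsn
    )
```

For example, with start_lsn values of 100, 500 and 900 for binlog-000001.ibb to binlog-000003.ibb and a backup LSN of 600, only the first two files would be copied.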