Marko Mäkelä <marko.makela@mariadb.com> writes:
On Wed, Jan 29, 2025 at 2:08 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
binlog_tablespace_truncate(tablespace, new_length_in_pages): Truncate a binlog tablespace (like mtr.trim_pages() and mtr.commit_shrink()). Can be independent; does not need to be part of a logging group with any other operations.
When would this be invoked?
My understanding is that InnoDB only needs to identify 2 files: the one that is being written to, and another one that is being
Yes. Truncate is used only on the file that is currently being written to. It implements FLUSH BINARY LOGS, which closes the currently written file early and moves on to the next binlog file. This is useful, for example, to be able to remove old binlog data without having to wait for the current binlog file to fill up. The truncate always happens on a page boundary. If truncate is a problem to implement, the binlog can instead just pad the rest of the binlog file with dummy data. If we can have a truncate record in the redo log for recovery, we can avoid this dummy data, and the binlog can simply ftruncate() the file during recovery.
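To make the recovery side concrete, here is a minimal sketch of applying a recovered truncate record with ftruncate(). The function name apply_truncate_record, the constant kPageSize and its value of 4096 are assumptions made up for illustration; none of them come from the actual InnoDB or binlog code. The sketch only ever shrinks the file, so applying the same record twice is harmless.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>      // mkstemp(), only needed when trying this on a scratch file
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical page size; the real binlog page size is not stated here.
constexpr uint64_t kPageSize = 4096;

// Apply a recovered truncate record (hypothetical helper).
// The new length is given in pages, so the cut is always on a page boundary.
// Returns true on success. The file is never grown.
bool apply_truncate_record(int fd, uint64_t new_length_in_pages)
{
  assert(fd >= 0);
  const off_t new_size = static_cast<off_t>(new_length_in_pages * kPageSize);

  struct stat st;
  if (fstat(fd, &st) != 0)
    return false;

  // Already at or below the target size: nothing to do (idempotent replay).
  if (st.st_size <= new_size)
    return true;

  return ftruncate(fd, new_size) == 0;
}
```

With this shape, padding the tail of the file with dummy data becomes unnecessary; the cost is one extra record type that recovery has to understand.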
binlog_write_up_to(lsn): Request the binlog to durably write, as soon as possible, all data needed up to the specified LSN. Could be called by the InnoDB checkpointing code, perhaps similar to fil_flush_file_spaces().
Right. This call could also pass the previously completed checkpoint LSN, which would give a permission to delete or archive any older binlog files. In this way, the binlog layer could safely remove or archive the last-but-one binlog file, and only retain 1 file if that is desirable.
Ah, good point, I had not thought about that. The user command for this on the SQL layer is PURGE BINARY LOGS. This command will not remove files that are still active or could be used in recovery. This could be extended to also not remove any file that was still active at the last checkpoint LSN.
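As a rough illustration of that extended check, something like the following could decide whether a file may be purged. The struct BinlogFileState, its fields, and the function may_purge are all hypothetical; in particular, tracking a per-file "closed at LSN" value is an assumption of this sketch, not something that exists today.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-file bookkeeping for the purge decision.
struct BinlogFileState {
  bool in_use;             // still active or needed, per the existing PURGE rules
  bool closed;             // false while the file is still being written to
  uint64_t closed_at_lsn;  // LSN at which writing to the file stopped (valid if closed)
};

// A file may be purged only if the existing rules allow it AND it was
// no longer being written to as of the last completed checkpoint.
bool may_purge(const BinlogFileState& f, uint64_t last_checkpoint_lsn)
{
  if (f.in_use || !f.closed)
    return false;
  return f.closed_at_lsn <= last_checkpoint_lsn;
}
```

The checkpoint LSN would be the value passed along with binlog_write_up_to(), as suggested above.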
We do need something that would trim the end of a binlog file, to discard anything that was not recovered via the ib_logfile0. That could be implemented as part of the binlog_recover_data() logic, exploiting the fact that all writes are going to be in ascending order of page number and byte offset, with the possible exception of starting a rewrite of the last block from byte offset 0.
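A small sketch of how that trimming could work, under several assumptions: the WriteRecord layout, the names recovered_end and discard_tail, and the page size are invented for illustration, and "discard" is modeled here as zero-filling an in-memory image of the file rather than touching a real file. Because writes arrive in ascending order (apart from a rewrite of the last block from byte offset 0), the end of the last recovered record marks the end of valid data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint64_t kPageSize = 4096;  // hypothetical page size

// Hypothetical shape of a recovered WRITE record.
struct WriteRecord {
  uint32_t page_no;  // page within the binlog file
  uint32_t offset;   // byte offset within the page
  uint32_t length;   // number of bytes written
};

// Byte offset just past the last recovered WRITE record, or nothing if no
// record was recovered for this file (in which case the file is left alone).
std::optional<uint64_t> recovered_end(const std::vector<WriteRecord>& recs)
{
  if (recs.empty())
    return std::nullopt;
  const WriteRecord& last = recs.back();
  assert(uint64_t{last.offset} + last.length <= kPageSize);
  return uint64_t{last.page_no} * kPageSize + last.offset + last.length;
}

// Discard everything after `end` by overwriting it with zeros.
void discard_tail(std::vector<uint8_t>& file_image, uint64_t end)
{
  if (end < file_image.size())
    std::fill(file_image.begin() + static_cast<std::ptrdiff_t>(end),
              file_image.end(), uint8_t{0});
}
```

Taking only the last record is what makes the ascending-order property useful: recovery does not need to track a running maximum over all records.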
Yes. Any bytes in the file after the last recovered WRITE record can just be overwritten with zeros.

Thanks,

 - Kristian.