Hi Kristian,

On Mon, Feb 26, 2024 at 8:31 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
>> I would tweak the log checkpoint to ensure that all pages of the "previous" binlog tablespace must be written back before we can advance the log checkpoint. The tablespace ID (actually just 1 bit of
> Conversely, we would then also need to wait for a log checkpoint before we can rotate to a new binlog tablespace, right?
That is not necessary. We only have to completely write back the changes from the buffer pool to the last-but-one binlog file, whose tablespace ID we are about to reuse for the new file. That can be done by invoking buf_flush_list_space(). It does not matter if there are pending changes to other tablespaces that will prevent the log checkpoint from being advanced. If we write the last modification LSN to the first page of the binlog tablespace, recovery can simply skip all log records for the binlog tablespace that are older than the LSN.
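For illustration, here is a rough sketch of what the rotation path could do. This is my own sketch, not patch code: the helper and variable names are made up, I am assuming the current declaration of buf_flush_list_space(), and a real implementation would additionally have to wait for the initiated writes to complete before the tablespace ID is actually reused.

  /* Hedged sketch: write back the dirty pages of the last-but-one binlog
  tablespace before its tablespace ID is reused. Only buf_flush_list_space()
  is an existing interface; binlog_space_prepare_reuse() is hypothetical. */
  static void binlog_space_prepare_reuse(fil_space_t *old_binlog_space)
  {
    ulint n_flushed= 0;
    /* Initiate writes of all dirty pages that belong to this tablespace,
    independently of the global log checkpoint, which may still be held
    back by dirty pages of other tablespaces. */
    while (buf_flush_list_space(old_binlog_space, &n_flushed)) {}
    /* The real code would also wait here for the initiated writes to
    complete before reusing the tablespace ID for the new binlog file. */
  }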
> I think log checkpoints can be relatively infrequent, to improve transaction throughput and reduce I/O (but increasing the time for recovery), right?
Checkpoints can actually occur once per second or even more frequently, depending on the workload and the log capacity. If there is lots of free space in the buffer pool and in the redo log, or if writes are infrequent, then checkpoints could occur less often.
> I am wondering if it would make sense to fix the page in the buffer pool already at fsp_page_create().
Created pages are fixed in the buffer pool until the mtr_t::commit() that would release the page latch and the buffer-fix. Simply by invoking buf_page_t::fix() before mtr_t::commit() you can extend the buffer-fix, to reuse the page in a subsequent mini-transaction. For example, purge_sys_t::iterator::free_history_rseg() is making use of that:

  rseg_hdr->fix();
  //...
  mtr.commit();
  //...
  mtr.start();
  rseg_hdr->page.lock.x_lock();
  mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_FIX);
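Spelled out as a generic pattern (my own hedged restatement, not code from the tree; the helper name is made up, and the block is assumed to be latched and buffer-fixed by the caller's mini-transaction, for example right after fsp_page_create()):

  /* Hedged sketch: carry a buffer-fixed page from one mini-transaction to
  the next without a page lookup in between. carry_page_to_next_mtr() is
  hypothetical; the calls are the existing mtr_t/buf_block_t interfaces. */
  static void carry_page_to_next_mtr(buf_block_t *block, mtr_t &mtr)
  {
    block->fix();      /* extra buffer-fix that survives mtr.commit() */
    mtr.commit();      /* releases the page latch; the block stays fixed */

    mtr.start();       /* the next mini-transaction reuses the same block */
    block->page.lock.x_lock();                 /* re-acquire the page latch */
    mtr.memo_push(block, MTR_MEMO_PAGE_X_FIX); /* mtr now owns fix + latch */
    /* ... modify the page; the caller's final mtr.commit() will release
    both the latch and the extra buffer-fix ... */
  }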
> Keep the page fixed until it has been written (not necessarily fsync()'ed); and after that have the slave dump threads read the data from the file through the OS, so the page can be dropped from the buffer pool. To reduce the load on the buffer pool, and especially avoid binlog pages being purged from the pool and then later re-read.
If a page is buffer-fixed for an unbounded time, it could interfere with an attempt to shrink the buffer pool or to respond to a memory pressure event. Some interface for releasing those pages would be nice to have.

	Marko

-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc