Thanks for this. Is there a way to force replay of the entire redo log after an unclean shutdown, even if the checkpoint recorded in the redo log says the data was already flushed to the tablespace?

I'm exploring the idea of running the datadir on storage that preserves write ordering but otherwise behaves like a filesystem mounted with nobarrier. It would still flush in the background every X seconds, where X is configurable. Because write ordering is preserved, I am hoping the redo log can keep my data crash-safe even though the storage is lying about tablespace write flushes.



On Sat, 18 Nov 2023, 22:13 Marko Mäkelä via discuss, <discuss@lists.mariadb.org> wrote:
Hi Kristian,

Thank you for your excellent reply. I thought that some additional
details might be worth mentioning.

On Sat, Nov 18, 2023 at 8:03 PM Kristian Nielsen via discuss
<discuss@lists.mariadb.org> wrote:
> The redo log is of finite size, and cycles. InnoDB regularly does a
> checkpoint, to ensure that all tablespace data up to a certain point has
> been durably written.

The writes to each data file must be made durable by fdatasync() or
fsync() before the log checkpoint can be advanced. There are two
checkpoint headers near the start of ib_logfile0, for remembering the
last 2 checkpoint LSNs and the corresponding log file offsets.
Recovery or mariadb-backup --backup will choose the larger checkpoint
LSN as the starting point.
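
As a rough illustration of the selection rule described above, here is
a minimal Python sketch. It is not InnoDB code: the CheckpointHeader
type, its fields, the "valid" flag and pick_checkpoint() are all
hypothetical names introduced for this example, and the on-disk header
layout is deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CheckpointHeader:
    """Hypothetical in-memory view of one checkpoint header."""
    lsn: int            # checkpoint LSN remembered in the header
    log_offset: int     # corresponding offset in the log file
    valid: bool = True  # assumption: the header passed an integrity check


def pick_checkpoint(
    headers: Sequence[CheckpointHeader],
) -> Optional[CheckpointHeader]:
    """Return the usable header with the larger checkpoint LSN."""
    candidates = [h for h in headers if h.valid]
    if not candidates:
        return None  # nothing usable to start from
    return max(candidates, key=lambda h: h.lsn)


# Two headers, as in the description above; the newer one wins.
older = CheckpointHeader(lsn=100, log_offset=2048)
newer = CheckpointHeader(lsn=250, log_offset=8192)
start = pick_checkpoint([older, newer])
```

In this sketch, log replay would begin at start.log_offset; anything
before the chosen checkpoint is assumed to be in the data files already.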

One more aspect to crash recovery is the InnoDB doublewrite buffer,
which protects against torn page writes. When it is enabled, each data
page is first written to the doublewrite buffer (128
pages in the InnoDB system tablespace) and, once that write completes, to
its final destination. In that way, if the process is killed during
the "main" write, it should be possible to find an intact version of
the page in the doublewrite buffer. This buffer is not used by
mariadb-backup; it would simply retry reading pages when it encounters
a checksum mismatch.
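
The doublewrite idea can be sketched in a few lines of Python. This is
a toy model under stated assumptions, not the InnoDB implementation:
pages are byte strings with a CRC32 appended, the data file and the
doublewrite buffer are plain dicts keyed by page number, and seal(),
is_intact(), write_page() and recover_page() are hypothetical helpers.

```python
import zlib
from typing import Dict, Optional


def seal(payload: bytes) -> bytes:
    """Append a CRC32 so that a torn page can be detected later."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_intact(page: bytes) -> bool:
    """Check the trailing CRC32 against the page payload."""
    if len(page) < 4:
        return False
    return zlib.crc32(page[:-4]).to_bytes(4, "big") == page[-4:]


def write_page(page_no: int, payload: bytes,
               datafile: Dict[int, bytes],
               doublewrite: Dict[int, bytes]) -> None:
    """Write the doublewrite copy first, then the final destination."""
    page = seal(payload)
    doublewrite[page_no] = page  # step 1: copy in the doublewrite buffer
    datafile[page_no] = page     # step 2: the "main" write


def recover_page(page_no: int,
                 datafile: Dict[int, bytes],
                 doublewrite: Dict[int, bytes]) -> Optional[bytes]:
    """Return an intact page, restoring it from the copy if needed."""
    page = datafile.get(page_no, b"")
    if is_intact(page):
        return page
    copy = doublewrite.get(page_no, b"")
    if is_intact(copy):
        datafile[page_no] = copy  # repair the torn page in place
        return copy
    return None  # no intact version exists in this model
```

If the process dies between step 1 and the end of step 2, the data file
may hold a torn page, but the doublewrite copy is complete, so
recover_page() can restore it.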

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc