On Sun, Nov 19, 2023 at 3:42 PM Marko Mäkelä <marko.makela@mariadb.com> wrote:
Thanks for this. Is there a way to force replay of the entire redo log on an unclean shutdown even if the checkpoint in the redo log says it was flushed to tablespace?
You can overwrite the newer checkpoint block, so that recovery is forced to use the older one. Before MariaDB 10.8, the two checkpoint blocks are 512 (0x200) bytes each, starting at ib_logfile0 offsets 0x200 and 0x600. Starting with 10.8, the checkpoint blocks are 64 bytes each, starting at ib_logfile0 offsets 0x1000 and 0x2000. Obviously, do not try this on any important data; experiment on a copy of the data. It is possible that the recovery will fail in various ways if the section of the log between the older checkpoint and the logical end of the log has been overwritten. The InnoDB WAL file is cyclic: checkpoints "truncate" the head, and the tail (new log records) is not supposed to overwrite the head. If you move the head backwards by discarding the latest checkpoint, there is no guarantee that no overwrite took place.
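For experimenting on a copy, the offsets and sizes above could be wrapped in a small Python sketch along these lines. The function names are hypothetical, and zero-filling is only one assumed way to "overwrite" a block. The sketch does not decide which of the two blocks is the newer one; it only dumps both so that they can be compared by hand before one is chosen.

```python
# Offsets and sizes of the two checkpoint blocks in ib_logfile0,
# taken from the description above: (offset, size) per block.
CHECKPOINT_BLOCKS = {
    "pre-10.8": ((0x200, 0x200), (0x600, 0x200)),
    "10.8+": ((0x1000, 64), (0x2000, 64)),
}


def read_checkpoint_blocks(log_path, layout):
    """Return the raw bytes of both checkpoint blocks for inspection."""
    blocks = []
    with open(log_path, "rb") as f:
        for offset, size in CHECKPOINT_BLOCKS[layout]:
            f.seek(offset)
            blocks.append(f.read(size))
    return blocks


def zero_checkpoint_block(log_path, layout, index):
    """Overwrite one checkpoint block (index 0 or 1) with zero bytes.

    Assumption: a zero-filled block is rejected by recovery, so the
    other block is used. Only run this against a COPY of the log file.
    """
    offset, size = CHECKPOINT_BLOCKS[layout][index]
    with open(log_path, "r+b") as f:
        f.seek(offset)
        f.write(b"\x00" * size)
```

A typical experiment would copy the whole data directory while the server is stopped, print `read_checkpoint_blocks(copy_path, "pre-10.8")` in hex to see which block looks newer, zero that block with `zero_checkpoint_block`, and then start a server on the copy to watch what recovery does.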
Another way to experiment would be to run mariadb-backup --backup while a server is executing a write-heavy workload. When you --prepare the backup, recovery will start from the LSN of the checkpoint that was the latest when the backup started. By the time the backup finishes, the server’s log file may already be several checkpoints ahead of the backup.
I think what I'm looking for is an option to ignore checkpoints, scan the entire redo log and replay everything from the lowest to the highest available LSN. From what you are saying, if I zero out bytes 512-1023 and bytes 1536-2047, will that force a full log scan / replay? Did I understand that correctly?
I'm exploring the idea of running the datadir on storage that preserves write ordering but runs with the equivalent of nobarrier. It will still flush in the background every X seconds, where X is configurable. Because write ordering is preserved, I am hoping to use the redo log to keep my data crash-safe even though I am lying about tablespace write flushes.
I can't comment much on that. It could be a good idea to execute some kind of "pull the plug" testing during a write workload. Perhaps that could be arranged more easily in a virtualized environment.
Yes, obviously this would need some extreme testing; that goes without saying. I just wanted to make sure my idea wasn't outright foolish before I went down this particular rabbit hole.