Hi Gordan and Kristian,

On Mon, Nov 20, 2023 at 12:48 PM Gordan Bobic via discuss <discuss@lists.mariadb.org> wrote:
> On Sun, Nov 19, 2023 at 11:36 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
> > Gordan Bobic via discuss <discuss@lists.mariadb.org> writes:
> > > That would still leave the edge case of a few seconds after it does eventually write the checkpoint, would it not? I am effectively looking at a case of "never write a checkpoint".
> >
> > Yes. I'm thinking that Marko's suggestion to clear the newest checkpoint (not both of them) would eliminate this edge case (with all of the caveats that Marko mentioned).
>
> Which one is the more recent one? The first or second? If establishing which is more recent requires reading it, how do I parse these blocks and what am I looking for?
You should look for the 64-bit big-endian unsigned checkpoint LSN. In the file mysql-test/suite/innodb/include/no_checkpoint_end.inc in the source code repository that corresponds to the version that you use, you should find some Perl code for this.
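For illustration only, here is a minimal Python sketch of that comparison: read a 64-bit big-endian unsigned integer at each candidate position and pick the larger one. The function names are hypothetical, and the byte offsets are deliberately left as parameters, because the location of the checkpoint LSN fields depends on the redo log format of your version. Take the offsets from the Perl code in no_checkpoint_end.inc, not from this sketch.

```python
import struct


def read_u64_be(path, offset):
    """Return the 64-bit big-endian unsigned integer stored at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(8)
    if len(raw) != 8:
        raise ValueError(f"short read at offset {offset} in {path}")
    # ">Q" means big-endian, unsigned, 8 bytes
    return struct.unpack(">Q", raw)[0]


def newest_checkpoint(path, lsn_offsets):
    """Return (offset, lsn) for the candidate field holding the largest LSN.

    `lsn_offsets` are the file offsets of the checkpoint LSN fields.
    They are an assumption supplied by the caller and are version-specific.
    """
    candidates = [(off, read_u64_be(path, off)) for off in lsn_offsets]
    return max(candidates, key=lambda pair: pair[1])
```

This only compares the raw LSN values. It does not check that either block is valid (for example, by verifying a checksum), so treat the result as a hint about which block is newer and not as proof.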
> ZFS - it preserves write ordering (based on the flushing calls it receives), and if we can run datadir with sync=disabled and only ib_logfile* and binlogs on a path with sync=standard, it should provide some improvement
I see. Is there any alternative system call that could be used to guarantee write ordering? That is, a lighter-weight variant of fdatasync()?

I think that we’d only want strict fdatasync() on the redo log files when the user cares about innodb_flush_log_at_trx_commit=1. If there was a lighter-weight write-ordering system call, and if the fdatasync() made all previous ordered writes persistent, then this could gain some performance in the page flushing, but maybe not so much in the end.

Does your storage stack (including the file system implementation in the kernel) support FUA?

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc