Gordan Bobic via discuss <discuss@lists.mariadb.org> writes:
Do tablespace commits get explicitly flushed during normal runtime operation?
Not commits, no. Only the redo log (and binlog) is fsync'ed per commit, as controlled by --innodb-flush-log-at-trx-commit and --sync-binlog.
If we have a write that successfully commits to the redo log and to the binlog, but the tablespace loses, say, 5 seconds' worth of commits in an unclean shutdown, would crash recovery deal with it?
Yes.
Is replaying the redo log followed by binlog-based recovery sufficient to put the tablespace(s) into a consistent state even if the redo and binary logs are, in terms of on-disk state, a few seconds ahead of the tablespaces?
Yes, this is precisely the purpose of the redo log - and why it's also called a write-ahead log.
In other words, provided that write ordering is preserved (ordering as guided by flush calls), can I do the equivalent of LD_PRELOAD=libeatmydata on the tablespace operations safely as long as the redo and binary logs are fsync()-ed reliably?
No. The redo log is of finite size, and cycles. InnoDB regularly does a checkpoint, to ensure that all tablespace data up to a certain point has been durably written. At that point, the redo log corresponding to earlier changes is no longer needed, and can be overwritten by new log data. Crash recovery only needs to replay the log from the last checkpoint. If libeatmydata or other incorrect fsync-behaviour leaves a checkpoint corrupted, then crash recovery can fail.
- Kristian.
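To make the checkpoint argument concrete, here is a minimal toy model in Python. It is only a sketch and not InnoDB's actual implementation: the class ToyEngine, its fields, and the four-record log capacity are all hypothetical, and real checkpointing is considerably more involved. The model assumes the redo log itself is always durable, and that a checkpoint trusts the tablespace flush and then lets the log be overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical toy model of a write-ahead log with checkpoints."""
    fsync_works: bool        # False models a no-op fsync on tablespace files
    log_capacity: int = 4    # redo records that fit before the log must cycle
    lsn: int = 0
    checkpoint_lsn: int = 0
    redo_log: list = field(default_factory=list)  # durable (lsn, key, value)
    os_cache: dict = field(default_factory=dict)  # page writes not yet on disk
    disk: dict = field(default_factory=dict)      # durable tablespace contents

    def commit(self, key, value):
        # The log is full: checkpoint first so old records can be overwritten.
        if len(self.redo_log) == self.log_capacity:
            self.checkpoint()
        self.lsn += 1
        self.redo_log.append((self.lsn, key, value))  # assumed fsync'ed
        self.os_cache[key] = value                    # tablespace write, lazy

    def checkpoint(self):
        # An honest fsync makes the cached page writes durable.
        if self.fsync_works:
            self.disk.update(self.os_cache)
            self.os_cache.clear()
        # Either way the engine believes the flush happened and reuses the log.
        self.checkpoint_lsn = self.lsn
        self.redo_log.clear()

    def crash_and_recover(self):
        self.os_cache.clear()  # unclean shutdown: cached writes are lost
        recovered = dict(self.disk)
        # Replay only the records written after the last checkpoint.
        for lsn, key, value in self.redo_log:
            if lsn > self.checkpoint_lsn:
                recovered[key] = value
        return recovered


def run(fsync_works, commits=6):
    engine = ToyEngine(fsync_works=fsync_works)
    for i in range(commits):
        engine.commit(f"row{i}", i)
    return engine.crash_and_recover()


if __name__ == "__main__":
    print("honest fsync:", run(True))
    print("no-op fsync: ", run(False))
    print("no-op fsync, no checkpoint yet:", run(False, commits=3))
```

In this model, a tablespace that merely lags behind the log is harmless as long as no checkpoint has happened: replaying the log recovers everything. Once a checkpoint has been taken on top of a flush that did not really reach disk, the rows written before the checkpoint exist neither in the tablespace nor in the log, so recovery cannot bring them back.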