Apologies for the late response, and thanks to everybody for your replies.

On 16. 09. 19 04:25, Daniel Black wrote:
If devices ever supported atomic writes, this would require implementing these changes all the way down from the filesystem, through pseudo block devices like device mapper (aka LVM) and crypt layers, to the block layer.
So it is my understanding that --innodb-flush-log-at-trx-commit=2 is an acceptable workaround for reducing the number of fsyncs to one per second, on suitably crash-resistant hardware (and assuming, for the moment, that slave databases are not used). That is, until there is a proper solution for atomic writes at the filesystem level. Realistically, even if this were only solved on ext4, xfs, or whichever filesystem is advertised as recommended for high-performance MariaDB deployments, issues like mine would disappear. And it is just down to carrying this idea over to the kernel people.
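To illustrate the trade-off I mean, here is a minimal Python sketch of the pattern as I understand it: every commit is written to the OS page cache, but fsync is issued at most once per interval. This is only a toy model, not how InnoDB is implemented (as far as I know, the real server flushes from a background thread). The names LazyFlushLog and demo are made up for this example.

```python
import os
import tempfile
import time


class LazyFlushLog:
    """Toy log: write() on every commit, fsync() at most once per interval."""

    def __init__(self, path, flush_interval=1.0):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.flush_interval = flush_interval
        self.last_flush = time.monotonic()
        self.fsync_calls = 0

    def commit(self, record):
        # The record reaches the OS page cache here, so it survives a crash
        # of this process, but not necessarily a power loss.
        os.write(self.fd, record)
        now = time.monotonic()
        if now - self.last_flush >= self.flush_interval:
            # Only now is the kernel asked to push the data to stable storage.
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.last_flush = now

    def close(self):
        # Final flush so that nothing is left only in the page cache.
        os.fsync(self.fd)
        self.fsync_calls += 1
        os.close(self.fd)


def demo(commits=1000):
    """Run some commits and return (number of fsync calls, log size in bytes)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "toy.log")
        log = LazyFlushLog(path, flush_interval=1.0)
        for i in range(commits):
            log.commit(b"txn %d\n" % i)
        log.close()
        return log.fsync_calls, os.path.getsize(path)


if __name__ == "__main__":
    calls, size = demo()
    print("fsync calls:", calls, "bytes written:", size)
```

The point is that the number of fsync calls depends on elapsed time rather than on the number of commits, which is why up to roughly one interval's worth of transactions can be lost on power failure.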
According to a major disk vendor in the LPC Database Microconference session, SCSI had ordering as an option; however, it was never implemented by any vendor.
Without this existing in hardware, I think the discussion went along the lines that it needs to wait until the hardware queue is fully flushed. (Lots of hardware specification acronyms were mentioned in quick succession.)
My understanding of the correspondence on this subject on the kernel mailing list to date is that, for a long time, this was intentionally not implemented because of the performance penalty inherent in doing it right (fsync effectively negates all caching, and storage vendors like their caches). This makes it all the more obvious that what is actually needed is a high-performance atomic write mechanism separate from fsync; then fsync can be properly implemented down to the hardware level.

LP, Jure