Marko Mäkelä <marko.makela@mariadb.com> writes:
> There is some ongoing development work around this area. If the binlog is enabled, it should actually be unnecessary to persist the storage engine log, because it should be possible to replay any not-committed-in-engine transactions from the binlog. We must merely
Nice to hear that this is being worked on. There is an old worklog, MWL#164, with some analysis of potential issues to be solved:

  http://worklog.askmonty.org/worklog/Server-RawIdeaBin/?tid=164

It becomes tricky in some corner cases, for example cross-engine transactions where one engine has its changes persisted after a crash and the other does not. But the impact of a robust implementation could be huge: double-fsync-per-commit is _really_ expensive. Hopefully the corner cases can be solved, or handled with some kind of fall-back.
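To make the cost concrete, here is a rough sketch of a commit that syncs two logs next to one that syncs only the binlog. This is not MariaDB or InnoDB code: the file layout, the record format and the function names (append_durably, commit_two_logs, commit_binlog_only) are all made up for illustration, and crash recovery from the binlog is not modelled at all.

```python
import os


def append_durably(path, record):
    """Append one record and fsync it. Returns the number of fsyncs issued (1)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)  # block until the OS reports the data is on stable storage
    finally:
        os.close(fd)
    return 1


def append_lazily(path, record):
    """Append one record without fsync; it may be lost in a crash."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
    finally:
        os.close(fd)
    return 0


def commit_two_logs(engine_log, binlog, txn):
    """Hypothetical commit that makes both logs durable: two fsyncs."""
    fsyncs = append_durably(engine_log, b"PREPARE " + txn + b"\n")
    fsyncs += append_durably(binlog, b"COMMIT " + txn + b"\n")
    return fsyncs


def commit_binlog_only(engine_log, binlog, txn):
    """Hypothetical commit where only the binlog is made durable: one fsync.

    Assumes (not shown here) that recovery can replay the binlog into the engine.
    """
    fsyncs = append_lazily(engine_log, b"PREPARE " + txn + b"\n")
    fsyncs += append_durably(binlog, b"COMMIT " + txn + b"\n")
    return fsyncs
```

The only point of the sketch is the fsync count per commit, 2 versus 1; the hard part, recovering the engine state from the binlog after a crash, is exactly what the worklog discusses.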
> But, InnoDB’s use of fsync() on data files feels like an overkill. I believe that we only need some 'write barriers', that is, some
This is also quite interesting. My (admittedly limited) understanding is that disks do have write-barrier functionality, and that journalling file systems make use of it. The problem seems to be how to expose that to userspace. I wonder if there are any existing or proposed interfaces that let userspace specify write barriers between writes.

 - Kristian.
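On the write-barrier question, for reference: lacking a dedicated barrier call, a common way for a userspace program to order two writes is to put a full sync between them. The sketch below shows that workaround only; ordered_writes is a made-up name and nothing here is an actual barrier interface. The sync is stronger than needed, since it waits for durability where only ordering is wanted, which is the overhead a real barrier would avoid.

```python
import os

# fdatasync is not available on every platform; fall back to fsync there.
_sync = getattr(os, "fdatasync", os.fsync)


def ordered_writes(path, first, second):
    """Append `first`, then `second`, with `first` synced before `second` is issued."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, first)
        _sync(fd)  # stands in for the barrier: blocks until `first` is reported durable
        os.write(fd, second)  # not synced; may still be in the page cache on return
    finally:
        os.close(fd)
```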