On 24-08-2018 17:25, Gionatan Danti wrote:
So it really seems that a doublewrite-less MariaDB would be safe from corruption unless extraordinarily bad luck (i.e. mysqld crashing at a *really small* wrong moment) hits.
Hi all,
I have a follow-up on my tests.

It seems that a single write() of up to 16 MB cannot be killed or stopped midway when issued directly on a ZFS filesystem. I *think* this is a deliberate result of how the ARC accepts writes for buffering. It does not seem a coincidence that the current maximum recordsize on a ZFS filesystem is 16 MB.

On the other hand, layering an ext4 filesystem on top of a ZVOL does *not* avoid partial writes. Similarly, I *think* this is due to Linux's own pagecache accepting an interrupted write stream.

In short, it looks like a performance vs. correctness tradeoff: while the pagecache is much faster (for reads/writes that hit it), the ARC seems to strongly favor correctness by avoiding interrupted writes.

These are only *speculations* on my part, but they are backed by my (empirical) test results. If they are correct, disabling doublewrite would be safe if mysqld runs directly on top of a ZFS filesystem, while it carries a (small) probability of corruption when running inside a virtual machine and/or through another filesystem layer.

Regards.

-- 
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
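
P.S. For anyone wanting to reproduce this kind of partial-write check, here is a minimal Python sketch. It is only an assumption about how such a test could be set up, not necessarily the procedure behind the results above. The function name probe_partial_write and its parameters are hypothetical. It forks a child that issues one write() of the given size, sends SIGKILL after a short delay, and reports how many bytes ended up in the file. It only observes the file size as seen through the filesystem after the process is killed; it does not simulate a power loss.

```python
import os
import signal
import time


def probe_partial_write(path, size, kill_after):
    """Fork a child that issues a single write() of `size` bytes to `path`,
    SIGKILL it after `kill_after` seconds, and return the final file size.

    A result of 0 means nothing was written, `size` means the write
    completed, and anything in between is a partial write.
    """
    payload = b"\xaa" * size          # allocated before fork, shared with the child
    open(path, "wb").close()          # make sure the file exists and is empty
    pid = os.fork()
    if pid == 0:
        # Child: one write() call, then exit immediately without cleanup.
        try:
            fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
            os.write(fd, payload)
        finally:
            os._exit(0)
    # Parent: give the child a head start, then kill it.
    time.sleep(kill_after)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass                          # child is already gone
    os.waitpid(pid, 0)                # reap the child
    return os.path.getsize(path)
```

To use it, point `path` at a file on the filesystem under test (directly on a ZFS dataset, or on ext4 over a ZVOL), set `size` to e.g. 16 * 1024 * 1024, and repeat the call many times with small, varying `kill_after` values (fractions of a millisecond up to a few milliseconds). Any return value strictly between 0 and `size` indicates that the write was interrupted midway.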