-----Original Message-----
From: Sergei Golubchik [mailto:serg@askmonty.org]
Sent: Thursday, 4 April 2013 23:12
To: Vladislav Vaintroub
Cc: maria-developers@lists.launchpad.net
Subject: Re: Rev 3712: MDEV-4338 : Support atomic option on directFS/FusionIO
Hi Serg, <skip>
it'd be nice if you could try os_file_set_atomic_writes() here to see if that works. This function creates and opens quite a few files, you could use one of them to mark it for atomic writes, and if that would fail, you'd disable atomic writes, and wouldn't change innobase_file_flush_method and innobase_use_doublewrite.
It is tricky to do this before the files are opened for the first time without moving a lot of code around: there is non-trivial parsing of tablespace names later on in open_or_create_data_files(), just to figure out directories and filenames.
I mean, you can try your function on any other file, not necessarily on the tablespace. Even on a temporary file, like with
    int fd = mysql_tmpfile("ib");
    if (os_file_set_atomic_writes(fd)) ...
    my_close(fd);
Right, but the subtlety here is that the test file needs to be on the same device as the ibdata1 tablespace, and figuring out the correct directory is not trivial. Apart from the easy default situation, where ibdata lands in datadir, there is innodb_data_home_dir, as well as innodb_data_file_path (this one needs to be parsed). Not to forget possible symbolic links: imagine ibdata1 is placed on an atomic-capable filesystem and symlinked into datadir. So I'd still say this is tricky.
Btw, why not use posix_fallocate whenever it's available? Or, at least, with its own --innodb-use-fallocate option?
Yes, I guess it (the new option) is a good idea. I created a followup patch that introduces the option: http://lists.askmonty.org/pipermail/commits/2013-April/004569.html . I set the default to ON. What do you think?
There are two related threads:
https://lists.launchpad.net/maria-developers/msg05068.html and the one on the internals@ MySQL list, starting from http://lists.mysql.com/internals/38679
In particular, I noticed this part: "I relied on that fallocate does not need fsync since metadata is protected by filesystem journal. But I am not confident whether it is true. I'm wondering if this patch may lead InnoDB committing schema to not function normally."
What do you know about it, does one need to sync after posix_fallocate()? What if the filesystem is not journalling?
This is a good question, and I do not think I have enough knowledge to answer it :) Note that fsync() is done anyway almost everywhere after os_set_file_size():
1. os_set_file_size() during creation of a single-table tablespace (file-per-table, I believe) is followed almost immediately by os_file_flush().
2. os_set_file_size() when a tablespace is extended is followed by fil_flush.
When InnoDB starts up for the first time (bootstrap), it is a little different: the log file or tablespace is created, os_set_file_size() is called, and then the file is closed but not flushed. My feeling is that this is OK, hoping that the metadata will be flushed at least on close(). And even if not, in what is probably the worst scenario, where the machine crashes during bootstrap, no user data is lost, as there is no user data yet.
Having said all this, I would not mind adding an extra fsync() to the function, if that makes anyone feel safer. I think the overhead would be minimal.
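
For illustration, the "extra fsync()" idea could look roughly like the sketch below. The function extend_and_sync() is a hypothetical stand-in, not the InnoDB function discussed above, and it assumes a platform that provides posix_fallocate() (for example Linux with glibc). Whether the fsync() is strictly required after posix_fallocate() is exactly the open question in this thread; the sketch simply shows the conservative variant.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for illustration only, not InnoDB code.
   Preallocates the byte range [0, size) and then flushes the file.
   Returns 0 on success or an errno-style error code on failure. */
int extend_and_sync(int fd, off_t size)
{
    int err;

    assert(fd >= 0 && size > 0);

    /* posix_fallocate() returns the error number directly
       and does not set errno. */
    err = posix_fallocate(fd, 0, size);
    if (err != 0)
        return err;

    /* fsync() returns -1 and sets errno on failure. */
    if (fsync(fd) != 0)
        return errno;

    return 0;
}
```

Note the two different error conventions: mixing them up (for example, checking errno after posix_fallocate()) is an easy mistake when adding such a call.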
Regards, Sergei