[Maria-developers] Rev 3712: MDEV-4338 : Support atomic option on directFS/FusionIO

Hi, Vladislav! On Mar 29, Vladislav Vaintroub wrote:
Is that the typical usage pattern in the innodb code? Because it looks a bit silly to me, why couldn't you create a sysvar directly on srv_use_atomic_writes, why an intermediate innobase_use_atomic_writes variable?
it'd be nice if you could try os_file_set_atomic_writes() here to see if that works. This function creates and opens quite a few files, you could use one of them to mark it for atomic writes, and if that would fail, you'd disable atomic writes, and wouldn't change innobase_file_flush_method and innobase_use_doublewrite. Btw, why not to use posix_fallocate whenever it's available? Or, at least, with its own --innodb-use-fallocate option? Regards, Sergei

Hi Serg,
Yes, this is typical usage pattern in Innodb. Bit weird, I agree. Every innobase_xxx variable has an srv_xxx counterpart. I have no clue why it is so, but I did not want to disturb their existing conventions:) <skip>
It is tricky to do it before files are opened the first time , without moving lot of code around - there is non-trivial parsing of tablespace names later on in open_or_create_data_files(), just to figure out directories and filenames. On the other hand, I think it is acceptable to fail fast during start, as long as option is not default, and good diagnostics are written into error log . look into error log, reset parameter to default. Currently there is a single filesystem with atomic capability, so people who use innodb_use_atomic_writes will probably know what they are doing. If/once the option becomes popular so it makes sense to make it default, then I would agree with you, a detection/fallback is very appropriate.
Btw, why not to use posix_fallocate whenever it's available? Or, at least, with its own --innodb-use-fallocate option?
Yes, I guess it (the new option) is a good idea. I created a followup patch that introduces the option http://lists.askmonty.org/pipermail/commits/2013-April/004569.html . I set default to ON. What do you think? Thanks a lot for looking at the patch, Wlad

FWIW not necessarily. Look at innodb_io_capacity, innodb_purge_batch_size, innodb_rollback_segments, innodb_purge_threads just name the first few where a srv_* variable is specified in MYSQL_SYSVAR_* directly. Note that the srv_* = innobase_* assignments have typecasts on them, so I haven't checked, but it might be that using the srv_ in a MYSQL_SYSVAR_ directly would have resulted in compilation warnings otherwise. Just 2c, Laurynas

Thanks, Laurinas, this is interesting. I tested it , using srv_use_atomic_writes directly. This gives me compiler error, not warning :) 6>..\..\..\storage\xtradb\handler\ha_innodb.cc(12654): error C2440: 'initializing' : cannot convert from 'ulint *' to 'char *' 6> Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast Indeed, boolean values are handled consequently as "char" in MySQL (i.e my_bool is char ), and as "ulint" in innodb (ibool is ulint)
Just 2c, Laurynas

Hi, Vladislav! On Apr 04, Vladislav Vaintroub wrote:
Ok, good that you and Laurinas have figured that out :) Not that I understand why bool == ulint, but I'm not suggesting to break this convention.
I mean, you can try your function on any other file, not necessarily on the tablespace. Even on a temporary file, like with int fd = mysql_tmpfile("ib"); if (os_file_set_atomic_writes(fd)) ... my_close(fd);
There are two related threads: https://lists.launchpad.net/maria-developers/msg05068.html and the one in internals@ mysql list, starting from http://lists.mysql.com/internals/38679 In particular, I noticed this part " I relied on that fallocate does not need fsync since metadata is protected by filesystem journal. But I am not confident whether it is true. I'm wondering if this patch may lead InnoDB committing schema to not function normally. " What do you know about it, does one need to sync after posix_fallocate()? What if the filesystem is not journalling? Regards, Sergei

Hi Serg, <skip>
Right, but the subtlety here is that test file needs to be on the same device as ibdata1 tablespace, and figuring out the correct directory is not trivial . Apart from easy default situation, where ibdata lands in the into datadir, there is innodb_data_home_dir , as well as innodb_data_file_path (this one needs to be parsed). Not to forget possible symbolic links . Imagine ibdata1 is placed on atomic-capable filesystem, and symlink to it into datadir. So I'd still say this is tricky.. patch
This is a good question, and I do not think I have enough knowledge to answer that one :) note, that fsync() is done anyway almost everywhere after os_set_file_size() 1. os_set_file_size during creation single tablespace (file-per-table I believe) is followed almost immediately by os_file_flush() 2. os_set_file_size when tablespace is extended, is followed by fil_flush. When innodb starts up for the first time (bootstrap), it is a little bit different - log file or tablespace are created , os_set_file_size() is called, and then the file is closed, but not flushed. My feeling is that it is ok, hoping at least on close() the metadata will be flushed. And even if not, in case of the probably worst scenario that machine crashes during bootstrap, - in this case user data is not lost , as there is no user data yet. Having said all this, I would not mind to adding an extra fsync() to the function, if this makes feel safer for someone. I think the overhead of it would be minimal.
Regards, Sergei

Hi, Vladislav! On Apr 05, Vladislav Vaintroub wrote:
Okay, let's do it your way. If it will cause problems for users we can always reconsider.
Right, and as soon as there are any data there - on any transaction - there will be a flush.
No, I think it's fine the way it is. As you didn't remove any fsyncs and InnoDB does sync after resizing, it should be ok. Regards, Sergei
participants (3)
-
Laurynas Biveinis
-
Sergei Golubchik
-
Vladislav Vaintroub