Re: [Maria-developers] a bug that affects group commit implementations
[Cc:ing maria-developers@ as I want us to get into the habit of discussing more openly, hope that is ok]

MARK CALLAGHAN <mdcallag@gmail.com> writes:

> I think that Mats identified the problem and then Vamsi did a reproduction of it. http://bugs.mysql.com/58787
Thanks for pointing me to this issue, Mark! I'm wondering what the real bug is here. Can one of you explain? I suppose it is that we can get, e.g., INSERT, ALTER in the binary log, while in the InnoDB transaction log we get ALTER, INSERT?

I believe the reason to avoid such inconsistency is normally to allow something like XtraBackup to get a consistent binlog position that can be used to provision a slave. However, for DDL this is not enough, as .frm files are not handled. So ensuring a consistent order for DDL between the binlog and the InnoDB transaction log does not really solve the problem.

The bug also suggests that DDL should use 2-phase commit. But 2-phase commit implies the ability to roll back. I do not know if InnoDB is able to roll back DDL, but MySQL .frm handling certainly is not. So great care would be needed not to simply introduce different bugs if this approach were taken. It is also a non-trivial change to the storage engine API.

There has been talk of making DDL in MySQL (/MariaDB) transactional, or at least crash-safe. This is something that I would really like to see. I was told that partitioning already has code for this, by logging .frm changes and recovering / rolling back after a crash. Something similar could work for general DDL. This would allow a proper solution to the binlog order problem for DDL.

This does not mean that a partial solution now cannot be an improvement; however, I do not understand from the bug discussion what such an improvement would be. Can you elaborate?

BTW, the purpose of 2-phase commit is to ensure consistency between different engines/binlog in case of a crash, not to ensure consistent ordering of commits. The fact that it currently _does_ ensure ordering for InnoDB is just a gross hack with the prepare_commit_mutex. This is so expensive (3 x fsync() per commit) that I believe most users don't use it anyway (e.g. they set sync_binlog != 1, which defeats the whole purpose of prepare_commit_mutex).
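To make the ordering question concrete, here is a toy model (a hypothetical sketch, not MySQL or MariaDB code). It assumes each transaction's commit can be reduced to two steps, a binlog write and an engine (e.g. InnoDB) commit, and replays a fixed interleaving of those steps. The names `Logs`, `replay` and `serialized` are illustrative only; `serialized` models the *effect* of holding a mutex from binlog write through engine commit, roughly what prepare_commit_mutex achieves, not its implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical toy model, not server code: each commit is reduced to a
# binlog write step and an engine (e.g. InnoDB) commit step.
Step = Tuple[str, str]  # (transaction name, "binlog" or "engine")


@dataclass
class Logs:
    binlog: List[str] = field(default_factory=list)
    engine_log: List[str] = field(default_factory=list)


def replay(schedule: List[Step]) -> Logs:
    """Apply the steps in the given order and record each log's commit order."""
    logs = Logs()
    for txn, step in schedule:
        if step == "binlog":
            logs.binlog.append(txn)
        elif step == "engine":
            logs.engine_log.append(txn)
        else:
            raise ValueError(f"unknown step {step!r}")
    return logs


def serialized(txns: List[str]) -> List[Step]:
    """Model a mutex held from binlog write through engine commit: no other
    transaction's step can run in between, so both logs see the same order."""
    return [(t, s) for t in txns for s in ("binlog", "engine")]


# Without such a mutex (e.g. a statement that does not go through 2-phase
# commit) the steps may interleave: INSERT is binlogged first, but ALTER
# commits in the engine first.
racy = [("INSERT", "binlog"), ("ALTER", "binlog"),
        ("ALTER", "engine"), ("INSERT", "engine")]

print(replay(racy))
# Logs(binlog=['INSERT', 'ALTER'], engine_log=['ALTER', 'INSERT'])
print(replay(serialized(["INSERT", "ALTER"])))
# Logs(binlog=['INSERT', 'ALTER'], engine_log=['INSERT', 'ALTER'])
```

The serialized schedule gets consistent ordering only by running commits strictly one at a time. In the real server the fsyncs happen inside that window, so commits cannot be grouped, which is the cost described above.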
I would really recommend looking at MWL#116 (http://askmonty.org/worklog/Server-Sprint/?tid=116), which solves the ordering issue in a proper way.

 - Kristian.
Kristian Nielsen