On Mon, Nov 18, 2013 at 6:26 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Pavel Ivanov <pivanof@google.com> writes:
sql/log.cc has highly inconsistent pattern of accessing rpl_global_gtid_binlog_state. In some places it's protected by mutex rpl_global_gtid_binlog_state.LOCK_binlog_state, in some places it's protected by global mutex LOCK_rpl_gtid_state, and in some places it's not protected by any mutex at all. This is a bug that should be fixed, right?
Yes, agree, thanks for finding this.
I can file this in JIRA if you want, just wanted to let you know faster.
I've filed it as MDEV-5306.
I've noticed this problem after one of our AddressSanitizer testing bots reported that rpl_binlog_state::bump_seq_no_if_needed (called from MYSQL_BIN_LOG::bump_seq_no_counter_if_needed) was using memory from hash after it was already freed by rpl_binlog_state::reset (called by MYSQL_BIN_LOG::reset_logs).
Can you test again with the latest patch I've just pushed to 10.0-base? I looked briefly into making an mtr test case to catch this, but I did not see any easy way to do that. I went through the code to try and check all uses of rpl_binlog_state for proper locking, but it would be nice to get a confirmation that I did not miss anything.
As you may guess this confirmation will be fuzzy. Of course I'll include http://bazaar.launchpad.net/~maria-captains/maria/10.0-base/revision/3936 into our tree and will monitor testing bots, but I saw the problem only once so far... <rant> I'm sure I would be able to get a definitive confirmation if MySQL/MariaDB could be tested with ThreadSanitizer. But at the moment it's a complete hell full of benign races (most of which seem to be in InnoDB), so it's impossible to filter out real races from developers' ignorance... :( I hope sometimes someone will see importance of this problem and will have time to chase and fix everything... </rant> Pavel