[Maria-developers] New problem with GTID patch
Kristian,

Unfortunately with your recent commit (revision 3546) test rpl.rpl_gtid_startpos doesn't work when GTID patch is applied on top of 10.0.1 tarball. It works okay in 10.0-mdev26 though, so I'm not sure if you'll want to look at the problem now.

In particular the problematic addition is

    if (!is_relay_log && read_state_from_file())
      DBUG_RETURN(1);

to function MYSQL_BIN_LOG::open(). Without it test works. With it test breaks in the section "Test that master gives error when slave asks for empty gtid pos and binlog files have been purged".

The problem is when get_gtid_list_event() is called from gtid_find_binlog_file() it returns event with count == 0. It causes contains_all_slave_gtid() to return true and master doesn't send error to slave as expected...

I didn't look in any further details because I'm not familiar with everything that should happen here that much. If you want me to look at something else to better understand the problem let me know.

Pavel
Pavel Ivanov <pivanof@google.com> writes:
> Unfortunately with your recent commit (revision 3546) test rpl.rpl_gtid_startpos doesn't work when GTID patch is applied on top of 10.0.1 tarball. It works okay in 10.0-mdev26 though, so I'm not sure if you'll want to look at the problem now.

Yes, I would like to look. It could easily be a hidden problem in 10.0-mdev26 too, and I'll have to merge to 10.0 soon anyway.

> In particular the problematic addition is
> if (!is_relay_log && read_state_from_file()) DBUG_RETURN(1);
We remember the last GTIDs written into previous binlog files, and save/restore this across server restarts. read_state_from_file() reads it in the first time a binlog file is opened. If we read the wrong thing, like an empty file, then we get the behaviour you observed: the state is empty, we log an empty GTID list event, and this causes gtid_find_binlog_file() to find the wrong place to start.

You might want to try revision 3547. It fixes a stupid mistake where, after crash recovery, I forgot to set the flag to not load the state from the file, so the state was immediately overwritten. But since crash recovery is not involved here, I do not see how this would cause your failure, unless it is somehow multiple test cases interacting with each other.

Can I get your 10.0-based tree somewhere to try and repeat it? (Or just the current patch against the 10.0.1 tarball.)
> to function MYSQL_BIN_LOG::open(). Without it test works. With it test breaks in the section "Test that master gives error when slave asks for empty gtid pos and binlog files have been purged". The problem is when get_gtid_list_event() is called from gtid_find_binlog_file() it returns event with count == 0. It causes contains_all_slave_gtid() to
This strongly indicates that we are wrongly re-loading stale data from the state file, overwriting the correct in-memory state.
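To illustrate why an empty GTID list event hides the error, here is a minimal sketch. It is not the actual MariaDB code: the Gtid struct, the SlaveState map, the function name contains_all_slave_gtid_sketch(), and the comparison rule inside the loop are all assumptions made for illustration. The only point it demonstrates is the one reported in this thread: a check that loops over the list entries passes vacuously when count == 0.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified GTID (domain-server-sequence).
struct Gtid {
    uint32_t domain_id;
    uint32_t server_id;
    uint64_t seq_no;
};

// Hypothetical stand-in for the position the slave asks for:
// one GTID per replication domain.
using SlaveState = std::unordered_map<uint32_t, Gtid>;

// Simplified model of the check, NOT the real contains_all_slave_gtid().
// Assumed rule: the binlog file is a valid starting point only if the slave
// has already reached every GTID in the file's GTID list.
bool contains_all_slave_gtid_sketch(const SlaveState &slave,
                                    const std::vector<Gtid> &gtid_list)
{
    for (const Gtid &g : gtid_list) {
        auto it = slave.find(g.domain_id);
        if (it == slave.end())
            return false;  // slave has nothing in this domain yet
        if (it->second.seq_no < g.seq_no)
            return false;  // slave is behind the start of this file
    }
    return true;  // also reached when gtid_list is empty (count == 0)
}

// Reproduces the reported symptom in the model: an empty slave position
// checked against an empty GTID list is accepted.
void demo_empty_list()
{
    SlaveState empty_slave_pos;
    std::vector<Gtid> empty_list;
    assert(contains_all_slave_gtid_sketch(empty_slave_pos, empty_list));
}
```

In this model, a non-empty list (which is what one would expect after binlog files have been purged) makes the same empty slave position fail the check, which is the case where the master is expected to report an error.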
> everything that should happen here that much. If you want me to look at something else to better understand the problem let me know.
If you can add printouts to error log of all calls to read_state_from_file() and all GTIDs passed to rpl_binlog_state::update(), then that may give a hint to where we overwrite the binlog state.

- Kristian.
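A rough sketch of the kind of trace printout suggested above, under stated assumptions: format_gtid() and trace_line() are hypothetical helpers, the Gtid struct is a simplified stand-in, and the output goes to stderr only so the example is self-contained. Inside the server one would call the server's own error-log facility from read_state_from_file() and rpl_binlog_state::update() instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical, simplified GTID (domain-server-sequence).
struct Gtid {
    uint32_t domain_id;
    uint32_t server_id;
    uint64_t seq_no;
};

// Formats a GTID in the domain-server-sequence text form, e.g. "0-1-100".
std::string format_gtid(const Gtid &g)
{
    return std::to_string(g.domain_id) + "-" +
           std::to_string(g.server_id) + "-" +
           std::to_string(g.seq_no);
}

// Hypothetical trace helper: writes one line per traced call and returns it.
// 'where' names the call site; 'g' is optional (pass nullptr when the call
// has no GTID argument, as for read_state_from_file()).
std::string trace_line(const char *where, const Gtid *g)
{
    assert(where != nullptr);
    std::string line = std::string("[gtid-trace] ") + where;
    if (g != nullptr)
        line += " gtid=" + format_gtid(*g);
    std::fprintf(stderr, "%s\n", line.c_str());
    return line;
}
```

Calling trace_line("read_state_from_file", nullptr) at the top of that function, and trace_line("rpl_binlog_state::update", &gtid) for each update, would show in the log whether a state reload happens after valid GTIDs have already been recorded.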
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
> Pavel Ivanov <pivanof@google.com> writes:
>> Unfortunately with your recent commit (revision 3546) test rpl.rpl_gtid_startpos doesn't work when GTID patch is applied on top
I just pushed another patch (bzr revision 3548) with fixes for a similar failure in rpl.rpl_gtid_startpos.

I still do not understand why it would be affected by "if (!is_relay_log && read_state_from_file())". However, the bug was a race in the test case; it failed or not at random depending on timing. So could it be that it just appeared to depend on this line, while in reality the issue was different timing when running the test? (I would like to understand the issue so we do not overlook a hidden bug.)

Incidentally, I've put the tree in Buildbot. You can see it here; it may be useful for you to check from time to time:
https://buildbot.askmonty.org/buildbot/grid?branch=10.0-mdev26
>> everything that should happen here that much. If you want me to look at something else to better understand the problem let me know.
> If you can add printouts to error log of all calls to read_state_from_file()
Or just send me the full output from mysql-test-run.pl for the failure; that may tell me something.

- Kristian.
On Thu, Mar 28, 2013 at 2:42 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
> Kristian Nielsen <knielsen@knielsen-hq.org> writes:
>> Pavel Ivanov <pivanof@google.com> writes:
>>> Unfortunately with your recent commit (revision 3546) test rpl.rpl_gtid_startpos doesn't work when GTID patch is applied on top
> I just pushed another patch (bzr revision 3548) with fixes for a similar failure in rpl.rpl_gtid_startpos. I still do not understand why it would be affected by "if (!is_relay_log && read_state_from_file())". However, the bug was a race in the test case, it failed or not at random depending on timing. So could it be that it just appeared to depend on this line, while in reality the issue was different timing when running the test?
It looks like revision 3548 indeed fixed the problem. It was a very consistent difference in timing. :)

Thank you,
Pavel