[Maria-developers] Wrong SQL thread position reporting after IO thread restart
Kristian,

I found what I think is a bug in IO and SQL thread accounting. How to reproduce:

1) Set up two servers, S1 and S2. S1 is the master; S2 is a slave with master_use_gtid = current_pos.
2) Execute some transactions on the master, e.g.
   create database d; create table d.t (n int); insert into d.t values (1);
3) Both servers are at 0-1-3 now, and SHOW SLAVE STATUS on S2 shows Read_Master_Log_Pos equal to Exec_Master_Log_Pos.
4) Execute STOP SLAVE IO_THREAD on S2.
5) S2 reports in the logs: "Slave I/O thread exiting, ... GTID position 0-1-2". So the IO thread didn't realize that it had already received the full transaction for 0-1-3, even though it hadn't yet received the next GTID event.
6) Execute START SLAVE IO_THREAD on S2.
7) At this point SHOW SLAVE STATUS on S2 shows Read_Master_Log_Pos the same as in step 3, but Exec_Master_Log_Pos is now lower than in step 3, as if the SQL thread hadn't caught up with the IO thread yet. But even though both threads are running and no more transactions are executed on the master, Exec_Master_Log_Pos doesn't change and never becomes equal to Read_Master_Log_Pos.

This apparently happens because the IO thread restarts from one transaction behind and adds to the relay log the Rotate event that the master sends with the position of that transaction, but then doesn't add any events for the transaction itself because it knows they were already added to the relay log.

I think both problems are bugs. And although after fixing the first it would be really hard (if possible at all) to reproduce the second, I think the reporting of the SQL thread's position should still be fixed.

Thank you,
Pavel
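The symptom in step 7 can be checked mechanically. Below is a minimal sketch, assuming the vertical SHOW SLAVE STATUS\G output has been captured as text. The helper names and the sample values (binlog file name and positions) are made up for illustration and are not taken from the report. The two positions are only comparable when Master_Log_File and Relay_Master_Log_File name the same binlog, so the sketch checks that first.

```python
def parse_vertical_status(text):
    # Turn "Key: value" lines of vertical SHOW SLAVE STATUS output into a dict.
    # Lines without a colon (such as the "1. row" banner) are skipped.
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def sql_thread_lag(status):
    # Bytes by which the SQL thread's reported position trails the IO thread's.
    # Both positions must refer to the same master binlog file to be comparable.
    if status["Master_Log_File"] != status["Relay_Master_Log_File"]:
        raise ValueError("positions refer to different master binlog files")
    return int(status["Read_Master_Log_Pos"]) - int(status["Exec_Master_Log_Pos"])


# Hypothetical sample resembling step 7 (values invented for illustration).
SAMPLE = """
*************************** 1. row ***************************
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 700
        Relay_Master_Log_File: master-bin.000001
          Exec_Master_Log_Pos: 550
"""

if __name__ == "__main__":
    lag = sql_thread_lag(parse_vertical_status(SAMPLE))
    print("SQL thread is", lag, "bytes behind the IO thread")
```

On an idle master, a lag that stays non-zero across repeated checks while both threads are running would match the behaviour described in step 7.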
Here's my approach to fixing both of these problems, if you are interested.

Pavel

On Thu, Sep 12, 2013 at 12:14 AM, Pavel Ivanov <pivanof@google.com> wrote:
Kristian,

I found what I think is a bug in IO and SQL thread accounting. How to reproduce:

1) Set up two servers, S1 and S2. S1 is the master; S2 is a slave with master_use_gtid = current_pos.
2) Execute some transactions on the master, e.g.
   create database d; create table d.t (n int); insert into d.t values (1);
3) Both servers are at 0-1-3 now, and SHOW SLAVE STATUS on S2 shows Read_Master_Log_Pos equal to Exec_Master_Log_Pos.
4) Execute STOP SLAVE IO_THREAD on S2.
5) S2 reports in the logs: "Slave I/O thread exiting, ... GTID position 0-1-2". So the IO thread didn't realize that it had already received the full transaction for 0-1-3, even though it hadn't yet received the next GTID event.
6) Execute START SLAVE IO_THREAD on S2.
7) At this point SHOW SLAVE STATUS on S2 shows Read_Master_Log_Pos the same as in step 3, but Exec_Master_Log_Pos is now lower than in step 3, as if the SQL thread hadn't caught up with the IO thread yet. But even though both threads are running and no more transactions are executed on the master, Exec_Master_Log_Pos doesn't change and never becomes equal to Read_Master_Log_Pos.

This apparently happens because the IO thread restarts from one transaction behind and adds to the relay log the Rotate event that the master sends with the position of that transaction, but then doesn't add any events for the transaction itself because it knows they were already added to the relay log.

I think both problems are bugs. And although after fixing the first it would be really hard (if possible at all) to reproduce the second, I think the reporting of the SQL thread's position should still be fixed.

Thank you,
Pavel
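The first problem (step 5) can be illustrated with a toy model. This is only a sketch of one possible reading of the report, not MariaDB's actual code: it assumes the IO thread's GTID position advances only when a later GTID event arrives, and compares that with advancing at the end of each transaction. The event kinds ("GTID", "QUERY", "END") and both function names are hypothetical simplifications.

```python
def position_advanced_on_next_gtid(events):
    # Position moves to a transaction's GTID only once a later GTID event is seen.
    position = None
    pending = None
    for kind, gtid in events:
        if kind == "GTID":
            if pending is not None:
                position = pending
            pending = gtid
    return position


def position_advanced_on_end(events):
    # Position moves to a transaction's GTID as soon as its end marker is seen.
    position = None
    pending = None
    for kind, gtid in events:
        if kind == "GTID":
            pending = gtid
        elif kind == "END":
            position = pending
    return position


def transaction(gtid):
    # Simplified event group: GTID event, one body event, one end marker.
    return [("GTID", gtid), ("QUERY", None), ("END", None)]


# The three transactions from step 2, fully received by the IO thread.
EVENTS = transaction("0-1-1") + transaction("0-1-2") + transaction("0-1-3")

if __name__ == "__main__":
    print(position_advanced_on_next_gtid(EVENTS))  # one transaction behind
    print(position_advanced_on_end(EVENTS))        # last complete transaction
```

Under the first rule the model stops at 0-1-2 even though 0-1-3 was fully received, which is the position quoted from the log in step 5. Under the second rule it stops at 0-1-3.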
participants (2)
- Kristian Nielsen
- Pavel Ivanov