Nirbhay Choubey <nirbhay@mariadb.com> writes:
While copying the last two binlog files would have solved this, I have worked out a solution where the donor node waits for the binlog checkpoint event for the last binlog file to be logged before proceeding with the file transfer.
http://lists.askmonty.org/pipermail/commits/2016-June/009483.html
Urgh, please don't do this. There seem to be multiple problems with this patch (insufficient locking, introducing a new, redundant wait mechanism, comparing binlog file names rather than IDs, ...).
By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list to implement the waiting mechanism. But since binlog checkpoint events are written asynchronously after xid_count falls to 0, that did not work. So I later came up with the above.
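The gap described here (xid_count reaching 0 before the checkpoint event is actually written), and the chained-locking wait proposed in the reply that follows, can be illustrated with a small model. This is a sketch under stated assumptions, not MariaDB source: BinlogModel, XidCount, wait_for_checkpoints() and demo() are hypothetical names, std::mutex and std::condition_variable stand in for LOCK_log, LOCK_xid_list and COND_xid_list, and mark_xid_done() is heavily simplified (no RESET MASTER handling, no real event writing). What it shows is that a waiter which sees the list shrink to the current binlog, and then takes LOCK_log, can only proceed once the checkpoint has been written.

```cpp
// Simplified model, NOT server code: the names mirror LOCK_log,
// LOCK_xid_list, COND_xid_list and binlog_xid_count_list, but all types
// and functions here are hypothetical stand-ins.
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

struct XidCount {
  unsigned long binlog_id;
  long xid_count;  // pending XIDs that still need this binlog
};

struct BinlogModel {
  std::mutex lock_log;                    // stands in for LOCK_log
  std::mutex lock_xid_list;               // stands in for LOCK_xid_list
  std::condition_variable cond_xid_list;  // stands in for COND_xid_list
  std::list<XidCount> xid_count_list;     // binlog_xid_count_list
  unsigned long current_binlog_id = 0;
  std::vector<unsigned long> checkpoints;  // "written" checkpoint events

  // Simplified mark_xid_done(): decrement the count for binlog_id and, if
  // the oldest binlog is no longer needed, write a checkpoint under LOCK_log.
  void mark_xid_done(unsigned long binlog_id) {
    std::unique_lock<std::mutex> xl(lock_xid_list);
    bool first = true;
    for (XidCount &b : xid_count_list) {
      if (b.binlog_id == binlog_id) {
        --b.xid_count;
        break;
      }
      first = false;
    }
    const XidCount &head = xid_count_list.front();
    if (!first || head.xid_count != 0 || head.binlog_id == current_binlog_id)
      return;  // no new checkpoint reached yet
    // Lock order is LOCK_log before LOCK_xid_list, so release and re-take.
    xl.unlock();
    std::lock_guard<std::mutex> ll(lock_log);
    xl.lock();
    // Drop leading entries that are no longer needed (never the current one).
    while (xid_count_list.front().binlog_id != current_binlog_id &&
           xid_count_list.front().xid_count == 0)
      xid_count_list.pop_front();
    const unsigned long oldest_needed = xid_count_list.front().binlog_id;
    cond_xid_list.notify_all();  // broadcast: there may be several waiters
    xl.unlock();
    // LOCK_log is still held while the "checkpoint event" is written.
    checkpoints.push_back(oldest_needed);
  }  // LOCK_log released here

  // Proposed FTWRL-side wait: wait until only the current binlog's entry
  // remains, then take LOCK_log. Returns the number of checkpoints visible.
  std::size_t wait_for_checkpoints() {
    {
      std::unique_lock<std::mutex> xl(lock_xid_list);
      cond_xid_list.wait(xl, [this] { return xid_count_list.size() == 1; });
    }
    // Chained locking: LOCK_log is free only after mark_xid_done() has
    // finished writing the checkpoint for the entries it removed.
    std::lock_guard<std::mutex> ll(lock_log);
    return checkpoints.size();
  }
};

// Usage: binlog 1 has two pending XIDs, binlog 2 is the active file.
// Two commits finish concurrently with one waiter.
std::size_t demo() {
  BinlogModel m;
  m.xid_count_list = {{1, 2}, {2, 0}};
  m.current_binlog_id = 2;
  std::size_t seen = 0;
  std::thread waiter([&] { seen = m.wait_for_checkpoints(); });
  std::thread c1([&] { m.mark_xid_done(1); });
  std::thread c2([&] { m.mark_xid_done(1); });
  c1.join();
  c2.join();
  waiter.join();
  return seen;
}
```

Whatever order the threads run in, the waiter observes the single checkpoint once it obtains LOCK_log. notify_all() corresponds to the suggested switch from mysql_cond_signal() to mysql_cond_broadcast(), since more than one thread (for example FTWRL and RESET MASTER) may be waiting at once.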
I think it should work if you follow the chained locking of LOCK_xid_list and LOCK_log. First, wait under LOCK_xid_list for binlog_xid_count_list to become empty (that is, to hold only the entry for the current binlog). Then release LOCK_xid_list, and take and immediately release LOCK_log. mark_xid_done() holds onto LOCK_log until the checkpoint event has been written.

Note that there is already a similar wait mechanism, used by RESET MASTER. RESET MASTER also needs to wait for checkpoint events to complete before running, so we should reuse that mechanism.

Also, it seems reasonable that FTWRL in general could wait for checkpoint events, so that other backup mechanisms could similarly avoid binlog files changing during backup. So please fix this in FTWRL, in 10.2. (If you feel you need to fix the Galera bug in 10.1, you can implement it only for Galera in 10.1.)

So, in more detail, here is a suggested way to fix it. In FTWRL (somewhere near the end, after commits are blocked), wait for checkpoint events to be written using a mechanism similar to RESET MASTER:

    if (mysql_bin_log.is_open())
    {
      mysql_mutex_lock(&LOCK_xid_list);
      for (;;)
      {
        if (binlog_xid_count_list.is_last(binlog_xid_count_list.head()))
          break;
        mysql_cond_wait(&COND_xid_list, &LOCK_xid_list);
      }
      mysql_mutex_unlock(&LOCK_xid_list);
      /*
        LOCK_xid_list and LOCK_log are chained, so the LOCK_log will only be
        obtained after mark_xid_done() has written the last checkpoint event.
      */
      mysql_mutex_lock(&LOCK_log);
      mysql_mutex_unlock(&LOCK_log);
    }

Now, since FTWRL is a bit different from RESET MASTER, we need a couple of other changes:

- Use mysql_cond_broadcast(&COND_xid_list) instead of mysql_cond_signal() in mark_xid_done() (to allow multiple waiters).

- The second (but not the first) mysql_cond_broadcast() in mark_xid_done() should be unconditional, so remove the if() here:

      if (unlikely(reset_master_pending))
        mysql_cond_signal(&COND_xid_list);

- Also add mysql_cond_broadcast(&COND_xid_list) in the two other places where binlog_xid_count_list is modified:
  One in MYSQL_BIN_LOG::open():

      while ((b= binlog_xid_count_list.head()) && b->xid_count == 0)
        my_free(binlog_xid_count_list.get());

  And one in reset_logs():

      my_free(binlog_xid_count_list.get());

This should make FTWRL wait for all pending binlog checkpoint events to be written. And with commits blocked, no new checkpoints should become pending.

Does it seem reasonable to you? Let me know if anything is unclear or if you see any potential problems with it.

By the way, how do you intend to handle the case where RESET MASTER is run during SST? I just checked, and FTWRL does not seem to block RESET MASTER. Or do you have another mechanism to prevent RESET MASTER from running during SST?

Thinking more, you should be holding LOCK_log while copying the binlog files (I'm guessing you're not currently, right?). This will block RESET MASTER, and it also makes the extra lock/unlock of LOCK_log above redundant.

Also, FTWRL has really complex semantics. You should get Monty's opinion (or maybe Serg's?) on whether there is any potential for deadlocks from waiting inside FTWRL for binlog checkpoints.

 - Kristian.
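The suggestion to hold LOCK_log for the whole binlog copy can be sketched the same way. Again this is a simplified model under stated assumptions, not server or SST-script code: BinlogDir, copy_binlogs(), reset_master() and copy_is_consistent() are hypothetical, a std::map stands in for the binlog directory, and the file names are made up. Because both the copy and the simplified RESET MASTER need lock_log, the donor transfers either the complete pre-reset set of files or the post-reset set, never a mix.

```cpp
// Simplified model, NOT server code: lock_log stands in for LOCK_log and a
// std::map stands in for the binlog files on disk (hypothetical names).
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

using Files = std::map<std::string, std::string>;  // file name -> contents

struct BinlogDir {
  std::mutex lock_log;
  Files files;
};

// Simplified RESET MASTER: like the server's, it needs LOCK_log. It deletes
// all binlogs and starts a fresh first file.
void reset_master(BinlogDir &d) {
  std::lock_guard<std::mutex> ll(d.lock_log);
  d.files.clear();
  d.files["mysql-bin.000001"] = "fresh";
}

// Hypothetical donor-side copy: LOCK_log is held for the whole transfer, so
// RESET MASTER cannot change the set of files mid-copy. Holding it here also
// covers the lock/unlock of LOCK_log after the FTWRL checkpoint wait.
Files copy_binlogs(BinlogDir &d) {
  std::lock_guard<std::mutex> ll(d.lock_log);
  Files copied;
  for (const auto &f : d.files)
    copied[f.first] = f.second;  // stands in for transferring one file
  return copied;
}

// Runs one copy concurrently with one RESET MASTER and reports whether the
// copy is a consistent snapshot (entirely before or entirely after reset).
bool copy_is_consistent() {
  const Files before = {{"mysql-bin.000001", "old1"},
                        {"mysql-bin.000002", "old2"}};
  const Files after = {{"mysql-bin.000001", "fresh"}};
  BinlogDir d;
  d.files = before;
  Files copied;
  std::thread donor([&] { copied = copy_binlogs(d); });
  std::thread resetter([&] { reset_master(d); });
  donor.join();
  resetter.join();
  return copied == before || copied == after;
}
```

Without the lock, the copy loop could iterate over the file set while RESET MASTER clears it, which in this model would be a data race and in the real donor could ship a mix of old and new binlogs.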