Hi Kristian, On Thu, Jun 23, 2016 at 3:39 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Nirbhay Choubey <nirbhay@mariadb.com> writes:
While copying the last 2 binlog files would have solved this, I have worked out a solution where the donor node waits for binlog checkpoint event for last binlog file to get logged before proceeding with file transfer.
http://lists.askmonty.org/pipermail/commits/2016-June/009483.html
Urgh, please don't do this, seems there are multiple problems with this patch (insufficient locking, introducing a new redundant wait mechanism, comparing binlog file names rather than ids, ...).
By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list to implement the waiting mechanism. But since binlog checkpoint events are written asynchronously after xid_count falls to 0, that did not work. So later came up with the above
I think it should work if you follow the chained locking of LOCK_xid_list and LOCK_log. First wait under LOCK_xid_list for the binlog_xid_count_list to become empty. Then release LOCK_xid_list and take and immediately release LOCK_log. mark_xid_done() will hold onto LOCK_log until the checkpoint event has been written.
Note that there is already a similar wait mechanism, used by RESET MASTER. RESET MASTER also needs to wait for checkpoint events to be completed before running, so we should reuse that mechanism.
Right.
Also, it seems reasonable that FTWRL in general could wait for checkpoint events so that other backup mechanisms similarly could avoid binlog files changing during backup. So please fix this in FTWRL, in 10.2. (If you feel you need to fix the galera bug in 10.1, you can implement it only for galera in 10.1).
That sound good to me. But, considering Percona's backup locks, it seems more logical to implement this in Backup locks instead, whenever they get ported/implemented in MariaDB. Also, in this particular case, the problem lies in reload_acl_and_cache(REFRESH_BINARY_LOG), (executed after FTWRL while preparing for SST) that rotates the binary log. So, FTWRL is not directly linked to this issue. And as you rightly pointed, I will refrain from altering FTWRL's behavior in 10.1 at least.
So in more detail, here is suggested way to fix:
In FTWRL (somewhere near the end, after commits are blocked), wait for checkpoint events to be written using a similar mechanism as RESET MASTER:
if (mysql_bin_log.is_open()) { mysql_mutex_lock(&LOCK_xid_list); for (;;) { if (binlog_xid_count_list.is_last(binlog_xid_count_list.head())) break; mysql_cond_wait(&COND_xid_list, &LOCK_xid_list); } mysql_mutex_unlock(&LOCK_xid_list); /* LOCK_xid_list and LOCK_log are chained, so the LOCK_log will only be obtained after mark_xid_done() has written the last checkpoint event. */ mysql_mutex_lock(&LOCK_log); mysql_mutex_unlock(&LOCK_log); }
Now, since FTWRL is a bit different from RESET MASTER, we need a couple other changes:
- Use mysql_cond_broadcast(&COND_xid_list) instead of mysql_cond_signal() in mark_xid_done() (to allow multiple waiters).
- The second (but not the first mysql_cond_broadcast() in mark_xid_done() should be unconditional, so remove the if() here:
if (unlikely(reset_master_pending)) mysql_cond_signal(&COND_xid_list);
- Also add mysql_cond_broadcast(&COND_xid_list) in two other places that the binlog_xid_count_list is modified. One in MYSQL_BIN_LOG::open():
while ((b= binlog_xid_count_list.head()) && b->xid_count == 0) my_free(binlog_xid_count_list.get());
And one in reset_logs():
my_free(binlog_xid_count_list.get());
This should make FTWRL wait for all pending binlog checkpoint events to be written. And with commits blocked, no new checkpoints should become pending.
Does it seem reasonable to you? Let me know if some things are unclear or if you see any potential problems with it.
Yes, it worked. But, to solve this issue in 10.1, I have added this wait to REFRESH_BINARY_LOG (as explained above) only when the server is acting as a Galera node.
By the way, how to you intend to handle the case where RESET MASTER is run during SST? I just checked, FTWRL does not seem to block RESET MASTER. Or do you have another mechanism to prevent RESET MASTER from running during SST? Thinking more, you should be holding LOCK_log while copying the binlog files (I'm guessing your not currently, right?)
You are right.
This will block RESET MASTER,
I am now taking LOG_log during the duration of file transfer as protection against the above commands.
and it also makes the extra lock/unlock of LOCK_log above redundant.
Not quite. The wait logic (that includes LOCK_log, as the snippet above) is to pause REFRESH_BINARY_LOG and an additional use of LOCK_log to block the RESET/ FLUSH commands while file transfer is in progress.
Also, FTWRL has really complex semantics. You should get Monty's opinion (or maybe Serg?) on whether there are any potentials for deadlocks to waiting inside FTWRL for binlog checkpoints.
As explained above, FTWRL remains unchanged, but will still check if Monty/Serg can take a look at the fix. http://lists.askmonty.org/pipermail/commits/2016-June/009494.html Best, Nirbhay
- Kristian.