Hi Kristian,

On Thu, Jun 23, 2016 at 3:39 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Nirbhay Choubey <nirbhay@mariadb.com> writes:

> While copying the last 2 binlog files would have solved this, I have worked
> out
> a solution where the donor node waits for binlog checkpoint event for last
> binlog
> file to get logged before proceeding with file transfer.
>
> http://lists.askmonty.org/pipermail/commits/2016-June/009483.html

Urgh, please don't do this, seems there are multiple problems with this
patch (insufficient locking, introducing a new redundant wait mechanism,
comparing binlog file names rather than ids, ...).

> By the way, I initially tried reusing
> is_xidlist_idle_nolock()/COND_xid_list to implement the
> waiting mechanism. But since binlog checkpoint events are written
> asynchronously after
> xid_count falls to 0, that did not work. So later came up with the above

I think it should work if you follow the chained locking of LOCK_xid_list
and LOCK_log. First wait under LOCK_xid_list for the binlog_xid_count_list
to become empty. Then release LOCK_xid_list and take and immediately release
LOCK_log. mark_xid_done() will hold onto LOCK_log until the checkpoint event
has been written.

Note that there is already a similar wait mechanism, used by RESET
MASTER. RESET MASTER also needs to wait for checkpoint events to be
completed before running, so we should reuse that mechanism.

Right.
 

Also, it seems reasonable that FTWRL in general could wait for checkpoint
events so that other backup mechanisms similarly could avoid binlog files
changing during backup. So please fix this in FTWRL, in 10.2. (If you feel
you need to fix the galera bug in 10.1, you can implement it only for galera
in 10.1).

That sound good to me. But, considering Percona's backup locks, it seems more logical to
implement this in Backup locks instead, whenever they get ported/implemented in MariaDB.

Also, in this particular case, the problem lies in reload_acl_and_cache(REFRESH_BINARY_LOG),
(executed after FTWRL while preparing for SST) that rotates the binary log. So, FTWRL is not
directly linked to this issue. And as you rightly pointed, I will refrain from altering FTWRL's
behavior in 10.1 at least.


So in more detail, here is suggested way to fix:

In FTWRL (somewhere near the end, after commits are blocked), wait for
checkpoint events to be written using a similar mechanism as RESET MASTER:

  if (mysql_bin_log.is_open())
  {
    mysql_mutex_lock(&LOCK_xid_list);
    for (;;)
    {
      if (binlog_xid_count_list.is_last(binlog_xid_count_list.head()))
        break;
      mysql_cond_wait(&COND_xid_list, &LOCK_xid_list);
    }
    mysql_mutex_unlock(&LOCK_xid_list);
    /*
      LOCK_xid_list and LOCK_log are chained, so the LOCK_log will only be
      obtained after mark_xid_done() has written the last checkpoint event.
    */
    mysql_mutex_lock(&LOCK_log);
    mysql_mutex_unlock(&LOCK_log);
  }

Now, since FTWRL is a bit different from RESET MASTER, we need a couple
other changes:

 - Use mysql_cond_broadcast(&COND_xid_list) instead of mysql_cond_signal()
   in mark_xid_done() (to allow multiple waiters).

 - The second (but not the first mysql_cond_broadcast() in mark_xid_done()
   should be unconditional, so remove the if() here:

      if (unlikely(reset_master_pending))
        mysql_cond_signal(&COND_xid_list);

 - Also add mysql_cond_broadcast(&COND_xid_list) in two other places that
   the binlog_xid_count_list is modified. One in MYSQL_BIN_LOG::open():

      while ((b= binlog_xid_count_list.head()) && b->xid_count == 0)
        my_free(binlog_xid_count_list.get());

   And one in reset_logs():

      my_free(binlog_xid_count_list.get());

This should make FTWRL wait for all pending binlog checkpoint events to be
written. And with commits blocked, no new checkpoints should become pending.

Does it seem reasonable to you? Let me know if some things are unclear or if
you see any potential problems with it.

Yes, it worked. But, to solve this issue in 10.1, I have added this wait to REFRESH_BINARY_LOG
(as explained above) only when the server is acting as a Galera node.


By the way, how to you intend to handle the case where RESET MASTER is run
during SST? I just checked, FTWRL does not seem to block RESET MASTER. Or do
you have another mechanism to prevent RESET MASTER from running during SST?
Thinking more, you should be holding LOCK_log while copying the binlog
files (I'm guessing your not currently, right?)

You are right.
 
This will block RESET
MASTER,

I am now taking LOG_log during the duration of file transfer as protection
against the above commands.
 
and it also makes the extra lock/unlock of LOCK_log above redundant.

Not quite. The wait logic (that includes LOCK_log, as the snippet above) is to pause
REFRESH_BINARY_LOG and an additional use of LOCK_log to block the RESET/
FLUSH commands while file transfer is in progress. 

 

Also, FTWRL has really complex semantics. You should get Monty's opinion (or
maybe Serg?) on whether there are any potentials for deadlocks to waiting
inside FTWRL for binlog checkpoints.

As explained above, FTWRL remains unchanged, but will still check if Monty/Serg
can take a look at the fix.

http://lists.askmonty.org/pipermail/commits/2016-June/009494.html 

Best,
Nirbhay


 - Kristian.