Hi Serg,

On Tue, Jun 28, 2016 at 5:02 PM, Sergei Golubchik <serg@mariadb.org> wrote:
> Hi, Nirbhay!
>
> On Jun 27, Nirbhay Choubey wrote:
> > > >
> > > > That seems quite ugly, why not call it from the SST code, after it
> > > > has called reload_acl_and_cache()? You're basically making FLUSH
> > > > LOGS behave differently in Galera and non-Galera (if my
> > > > understanding is correct), which might lead to subtle bugs?
> > >
> > > I initially thought of adding the call after reload_acl_and_cache(),
> > > but there could still be a case when user performs a
> > > REFRESH_BINARY_LOG before LOCK_log is acquired.
>
> Right, but you didn't fix it. You have
>
>   1> FTWRL
>   2> reload_acl_and_cache()
>     3> wait_for_last_checkpoint_event()
>   4> SET global innodb_disallow_writes=1
>   5> mysql_mutex_lock(LOCK_log)
>
> You've described your case correctly: "when user performs
> REFRESH_BINARY_LOG before LOCK_log is acquired". That is, you care when
> a user performs REFRESH_BINARY_LOG between 3 and 5. You don't care if
> somebody does REFRESH_BINARY_LOG between 2 and 3. So, you can as well
> move wait_for_last_checkpoint_event() out of reload_acl_and_cache().


If the wait is moved out of reload_acl_and_cache() (say, to 3') and a user's
REFRESH_BINARY_LOG kicks in right after the wait (3') but before (5), it will
trigger creation of another new binlog file, for which the last checkpoint event
(logged asynchronously by a separate thread) may not make it in time. That
will cause the same issue on the joiner node.
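
To make the interleaving concrete, here is a rough single-threaded replay of
it. This is only a toy sketch, not server code: ToyState, rotate(),
checkpoint_thread_runs() and snapshot_is_consistent() are hypothetical names,
and the asynchronous checkpoint thread is reduced to an explicit call.

```cpp
#include <cassert>
#include <vector>

// Toy state: one flag per binlog file, true once that file holds its
// checkpoint event. All names here are hypothetical.
struct ToyState {
  std::vector<bool> has_checkpoint{true};  // the initial file is complete

  // Stand-in for REFRESH_BINARY_LOG: a new file without a checkpoint yet.
  void rotate() { has_checkpoint.push_back(false); }

  // Stand-in for the separate thread catching up on all open files.
  void checkpoint_thread_runs() {
    has_checkpoint.assign(has_checkpoint.size(), true);
  }
};

// Replays 2>, 3'>, an optional user rotation, then 5>.
// Returns whether the newest file has its checkpoint event when the
// file set is frozen for transfer.
bool snapshot_is_consistent(bool user_rotates_after_wait) {
  ToyState s;
  s.rotate();                  // 2> reload_acl_and_cache()
  s.checkpoint_thread_runs();  // 3'> the wait returns after the event lands
  if (user_rotates_after_wait)
    s.rotate();                // user's REFRESH_BINARY_LOG between 3' and 5
  // 5> LOCK_log would be taken here; nothing waits for the new file
  return s.has_checkpoint.back();
}
```

In this model snapshot_is_consistent(false) holds, while
snapshot_is_consistent(true) does not: the file created by the user's rotation
is transferred without its checkpoint event.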

Another possible option was to move the wait out of reload_acl_and_cache(),
right after it, and not release LOCK_log until the file transfer is complete.

1> FTWRL
2> reload_acl_and_cache()
3> wait for last checkpoint event & lock(LOCK_log)
4> SET global innodb_disallow_writes=1
... file transfer ...
5> mysql_mutex_unlock(LOCK_log)

But with LOCK_log locked in #3, mysql_mutex_assert_not_owner(mysql_bin_log.get_log_lock())
will fail for #4.
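
A minimal sketch of why the ordering matters, assuming the check behaves like
a plain "caller must not hold this mutex" test. OwnedMutex and
disallow_writes_check() are hypothetical stand-ins, not the server's actual
mutex wrapper or the real innodb_disallow_writes code path.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex that remembers which thread owns it, in the spirit
// of the ownership check behind mysql_mutex_assert_not_owner().
class OwnedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());
    m_.unlock();
  }
  bool owned_by_caller() const {
    return owner_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Stand-in for step 4: returns false where the debug assertion would fire.
bool disallow_writes_check(const OwnedMutex& log_lock) {
  return !log_lock.owned_by_caller();
}

// Current order: 4> SET ... then 5> lock(LOCK_log).
bool current_order_passes() {
  OwnedMutex lock_log;
  bool ok = disallow_writes_check(lock_log);  // lock not held yet
  lock_log.lock();
  lock_log.unlock();
  return ok;
}

// Alternative order: 3> lock(LOCK_log), 4> SET ..., 5> unlock(LOCK_log).
bool alternative_order_passes() {
  OwnedMutex lock_log;
  lock_log.lock();
  bool ok = disallow_writes_check(lock_log);  // same thread holds the lock
  lock_log.unlock();
  return ok;
}
```

Here current_order_passes() returns true and alternative_order_passes()
returns false, which mirrors the assertion failure described above.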



> With wait_for_last_checkpoint_event inside reload_acl_and_cache or
> outside, you still don't have anything that would prevent user from
> doing REFRESH_BINARY_LOG between 3 and 5.

It wouldn't prevent the user from doing REFRESH_BINARY_LOG, but with
wait_for_last_checkpoint_event() added to reload_acl_and_cache(), it would
ensure that every REFRESH_BINARY_LOG (either from the user or from #2 above)
waits until the last checkpoint event makes it into the new binary log file.
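
Roughly, the idea looks like the toy model below. It is a sketch under
assumptions, not the patch itself: ToyBinlog, rotate() and
newest_has_checkpoint() are hypothetical names, the checkpoint thread is a
plain std::thread, and rotations are assumed not to overlap each other.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Toy binlog: one flag per file, true once the file holds its checkpoint
// event. All names are hypothetical.
struct ToyBinlog {
  std::mutex m;
  std::condition_variable cv;
  std::vector<bool> files{true};  // the initial file is complete
  std::vector<std::thread> writers;

  ~ToyBinlog() {
    for (auto& t : writers) t.join();
  }

  // Block until the newest file has received its checkpoint event.
  void wait_for_last_checkpoint_event() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [this]() -> bool { return files.back(); });
  }

  // Stand-in for REFRESH_BINARY_LOG: open a new file and let a separate
  // thread log the checkpoint event asynchronously.
  void rotate(bool wait_inside) {
    std::size_t idx;
    {
      std::lock_guard<std::mutex> lk(m);
      files.push_back(false);
      idx = files.size() - 1;
    }
    writers.emplace_back([this, idx] {
      std::lock_guard<std::mutex> lk(m);
      files[idx] = true;
      cv.notify_all();
    });
    // With the wait inside, no rotation returns before its event is logged.
    if (wait_inside) wait_for_last_checkpoint_event();
  }

  bool newest_has_checkpoint() {
    std::lock_guard<std::mutex> lk(m);
    return files.back();
  }
};
```

With rotate(true), newest_has_checkpoint() is true as soon as rotate()
returns, whoever issued the rotation. With rotate(false) the result depends on
thread timing, which is the window being discussed.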

Best,
Nirbhay


> Regards,
> Sergei
> Chief Architect MariaDB
> and security@mariadb.org