Re: [Maria-developers] MDEV-9423: FTWRL and Binlog checkpoint

Kristian Nielsen

2 May 2016

6:10 p.m.

Nirbhay Choubey <nirbhay@mariadb.com> writes: [Cc: maria-developers@, please always keep these discussions on the mailing list]

...

In Galera cluster, the state transfer scripts perform FTWRL and copy data along with the last of all available binlog files to the joiner node.

After MDEV-181, I understand that the binlog checkpoint can be in any of the binary log files (and not necessarily the last one).

This seemingly has caused MDEV-9423, in which the joiner node complains of the missing binlog file.

Now the question is : Is FTWRL not sufficient to ensure that the checkpoint is always the last binlog file?

So if I understand correctly, the issue is related to having binlog files available during XA crash recovery. When the binlog file is rotated, there is a small window where both the latest and the previous binlog files are needed for crash recovery. The binlog checkpoint is the earliest binlog file that is needed for crash recovery, and it can be seen from the binlog checkpoint event.

So the problem here is that a copy is made just after binlog rotation, and Galera only copies the most recent, mostly-empty binlog file, leaving insufficient information for XA recovery, right?

One option to solve this is to always copy the last two binlog files. While it is theoretically possible to have the binlog checkpoint more than two files back, I think it will not occur in practice.

Another option is to wait for the binlog checkpoint to reach the current binlog file. You can see this done in the test suite:

mysql-test/include/wait_for_binlog_checkpoint.inc

The binlog checkpointing happens asynchronously. I *think* it can complete even while FTWRL is active, but I am not 100% sure.

The checkpoint happens after InnoDB has made its commits durable with fsync() or similar - only after that is it safe to discard the old binlog data and still have correct crash recovery.

 - Kristian.
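[Editor's sketch] The rule described above, that recovery needs every binlog file from the checkpoint file up to the current one, can be illustrated as follows. This is a minimal sketch and not MariaDB code: the function name binlogs_needed_for_recovery and the representation of the binlog index as a vector of file names are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of MariaDB: given the binlog index (oldest
// file first) and the file named by the most recent binlog checkpoint event,
// return every file that XA crash recovery may need to scan.
std::vector<std::string> binlogs_needed_for_recovery(
    const std::vector<std::string> &index, const std::string &checkpoint_file)
{
  assert(!index.empty());
  auto first= std::find(index.begin(), index.end(), checkpoint_file);
  // If the checkpoint file is not in the index, be conservative and keep
  // every file rather than guess.
  if (first == index.end())
    first= index.begin();
  return std::vector<std::string>(first, index.end());
}
```

When the checkpoint has reached the current file, the result contains only that file, which is the situation the copy relies on. Just after a rotation the result contains two files, which is the window discussed in this message.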

Nirbhay Choubey

23 Jun

1:34 a.m.

Hi Kristian! On Mon, May 2, 2016 at 2:10 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:

...

Nirbhay Choubey <nirbhay@mariadb.com> writes:

[Cc: maria-developers@, please always keep these discussions on the mailing list]

...
In Galera cluster, the state transfer scripts perform FTWRL and copy data along with the last of all available binlog files to the joiner node.

After MDEV-181, I understand that the binlog checkpoint can be in any of the binary log files (and not necessarily the last one).

This seemingly has caused MDEV-9423, in which the joiner node complains of the missing binlog file.

Now the question is : Is FTWRL not sufficient to ensure that the checkpoint is always the last binlog file?

So if I understand correctly, the issue is related to having binlog files available during XA crash recovery. When the binlog file is rotated, there is a small window where both the latest and the previous binlog files are needed for crash recovery. The binlog checkpoint is the earliest binlog file that is needed for crash recovery, and it can be seen from the binlog checkpoint event.

So the problem here is that a copy is made just after binlog rotation, and Galera only copies the most recent, mostly-empty binlog file, leaving insufficient information for XA recovery, right?

Correct.

...

One option to solve this is to always copy the last two binlog files. While it is theoretically possible to have the binlog checkpoint more than two files back, I think it will not occur in practice.

...

Another option is to wait for the binlog checkpoint to reach the current binlog file. You can see this done in the test suite:

mysql-test/include/wait_for_binlog_checkpoint.inc

The binlog checkpointing happens asynchronously. I *think* it can complete even while FTWRL is active, but I am not 100% sure.

The checkpoint happens after InnoDB has made its commits durable with fsync() or similar - only after that is it safe to discard the old binlog data and still have correct crash recovery.

While copying the last 2 binlog files would have solved this, I have worked out a solution where the donor node waits for the binlog checkpoint event for the last binlog file to get logged before proceeding with the file transfer.

http://lists.askmonty.org/pipermail/commits/2016-June/009483.html

By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list to implement the waiting mechanism. But since binlog checkpoint events are written asynchronously after xid_count falls to 0, that did not work. So I later came up with the above patch.

Best,
Nirbhay

...

- Kristian.

Kristian Nielsen

7:39 a.m.

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...

While copying the last 2 binlog files would have solved this, I have worked out a solution where the donor node waits for binlog checkpoint event for last binlog file to get logged before proceeding with file transfer.

http://lists.askmonty.org/pipermail/commits/2016-June/009483.html

Urgh, please don't do this, seems there are multiple problems with this patch (insufficient locking, introducing a new redundant wait mechanism, comparing binlog file names rather than ids, ...).

...

By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list to implement the waiting mechanism. But since binlog checkpoint events are written asynchronously after xid_count falls to 0, that did not work. So later came up with the above

I think it should work if you follow the chained locking of LOCK_xid_list and LOCK_log. First wait under LOCK_xid_list for the binlog_xid_count_list to become empty. Then release LOCK_xid_list and take and immediately release LOCK_log. mark_xid_done() will hold onto LOCK_log until the checkpoint event has been written.

Note that there is already a similar wait mechanism, used by RESET MASTER. RESET MASTER also needs to wait for checkpoint events to be completed before running, so we should reuse that mechanism.

Also, it seems reasonable that FTWRL in general could wait for checkpoint events so that other backup mechanisms similarly could avoid binlog files changing during backup. So please fix this in FTWRL, in 10.2. (If you feel you need to fix the galera bug in 10.1, you can implement it only for galera in 10.1).

So in more detail, here is a suggested way to fix it:

In FTWRL (somewhere near the end, after commits are blocked), wait for checkpoint events to be written using a similar mechanism as RESET MASTER:

    if (mysql_bin_log.is_open())
    {
      mysql_mutex_lock(&LOCK_xid_list);
      for (;;)
      {
        if (binlog_xid_count_list.is_last(binlog_xid_count_list.head()))
          break;
        mysql_cond_wait(&COND_xid_list, &LOCK_xid_list);
      }
      mysql_mutex_unlock(&LOCK_xid_list);

      /*
        LOCK_xid_list and LOCK_log are chained, so the LOCK_log will only be
        obtained after mark_xid_done() has written the last checkpoint event.
      */
      mysql_mutex_lock(&LOCK_log);
      mysql_mutex_unlock(&LOCK_log);
    }

Now, since FTWRL is a bit different from RESET MASTER, we need a couple of other changes:

 - Use mysql_cond_broadcast(&COND_xid_list) instead of mysql_cond_signal() in mark_xid_done() (to allow multiple waiters).

 - The second (but not the first) mysql_cond_broadcast() in mark_xid_done() should be unconditional, so remove the if() here:

    if (unlikely(reset_master_pending))
      mysql_cond_signal(&COND_xid_list);

 - Also add mysql_cond_broadcast(&COND_xid_list) in two other places where the binlog_xid_count_list is modified. One in MYSQL_BIN_LOG::open():

    while ((b= binlog_xid_count_list.head()) && b->xid_count == 0)
      my_free(binlog_xid_count_list.get());

   And one in reset_logs():

    my_free(binlog_xid_count_list.get());

This should make FTWRL wait for all pending binlog checkpoint events to be written. And with commits blocked, no new checkpoints should become pending.

Does it seem reasonable to you? Let me know if some things are unclear or if you see any potential problems with it.

By the way, how do you intend to handle the case where RESET MASTER is run during SST? I just checked, FTWRL does not seem to block RESET MASTER. Or do you have another mechanism to prevent RESET MASTER from running during SST?

Thinking more, you should be holding LOCK_log while copying the binlog files (I'm guessing you're not currently, right?). This will block RESET MASTER, and it also makes the extra lock/unlock of LOCK_log above redundant.

Also, FTWRL has really complex semantics. You should get Monty's opinion (or maybe Serg?) on whether there is any potential for deadlocks from waiting inside FTWRL for binlog checkpoints.

 - Kristian.
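[Editor's sketch] The chained-lock wait described in this message can be modelled outside the server with standard C++ primitives. The following is a toy model under stated assumptions, not the MariaDB implementation: BinlogModel, finish_oldest_binlog(), wait_for_last_checkpoint() and run_model() are hypothetical names, std::mutex and std::condition_variable stand in for LOCK_log, LOCK_xid_list and COND_xid_list, and the order in which the background side takes the two locks is an assumption chosen so that the waiter never holds both at once.

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>
#include <thread>

// Toy model only; the members are stand-ins for the MariaDB objects named
// in the comments, not the real types.
struct BinlogModel
{
  std::mutex lock_log;                    // stands in for LOCK_log
  std::mutex lock_xid_list;               // stands in for LOCK_xid_list
  std::condition_variable cond_xid_list;  // stands in for COND_xid_list
  std::list<int> xid_counts;              // pending XIDs per binlog, oldest first
  int checkpoints_written= 0;             // protected by lock_log

  // Models the tail of mark_xid_done(): the idle entry is dropped and the
  // waiters are woken while lock_log is held, and lock_log is released only
  // after the checkpoint has been "written".
  void finish_oldest_binlog()
  {
    std::lock_guard<std::mutex> log_guard(lock_log);
    {
      std::lock_guard<std::mutex> xid_guard(lock_xid_list);
      assert(!xid_counts.empty());
      while (xid_counts.size() > 1 && xid_counts.front() == 0)
        xid_counts.pop_front();
      cond_xid_list.notify_all();
    }
    ++checkpoints_written;
  }

  // Models the suggested wait: first wait until only the current binlog's
  // entry remains, then take lock_log once. Returns the number of
  // checkpoints visible while lock_log is held.
  int wait_for_last_checkpoint()
  {
    {
      std::unique_lock<std::mutex> xid_guard(lock_xid_list);
      cond_xid_list.wait(xid_guard,
                         [this] { return xid_counts.size() <= 1; });
    }
    std::lock_guard<std::mutex> log_guard(lock_log);
    return checkpoints_written;
  }
};

// Runs one waiter against one background "checkpoint" thread.
int run_model()
{
  BinlogModel model;
  model.xid_counts= {0, 0};  // old binlog with nothing pending, plus current
  std::thread background([&model] { model.finish_oldest_binlog(); });
  int seen= model.wait_for_last_checkpoint();
  background.join();
  return seen;
}
```

In this model the waiter can only observe the shortened list after the background thread already holds lock_log, so its own acquisition of lock_log cannot succeed before the checkpoint counter has been incremented. That is the property the extra lock/unlock pair in the snippet above is meant to provide.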

Nirbhay Choubey

24 Jun

3:29 p.m.

Hi Kristian, On Thu, Jun 23, 2016 at 3:39 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:

...

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...
While copying the last 2 binlog files would have solved this, I have worked out a solution where the donor node waits for binlog checkpoint event for last binlog file to get logged before proceeding with file transfer.

http://lists.askmonty.org/pipermail/commits/2016-June/009483.html

Urgh, please don't do this, seems there are multiple problems with this patch (insufficient locking, introducing a new redundant wait mechanism, comparing binlog file names rather than ids, ...).

...
By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list to implement the waiting mechanism. But since binlog checkpoint events are written asynchronously after xid_count falls to 0, that did not work. So later came up with the above

I think it should work if you follow the chained locking of LOCK_xid_list and LOCK_log. First wait under LOCK_xid_list for the binlog_xid_count_list to become empty. Then release LOCK_xid_list and take and immediately release LOCK_log. mark_xid_done() will hold onto LOCK_log until the checkpoint event has been written.

Note that there is already a similar wait mechanism, used by RESET MASTER. RESET MASTER also needs to wait for checkpoint events to be completed before running, so we should reuse that mechanism.

Right.

...

Also, it seems reasonable that FTWRL in general could wait for checkpoint events so that other backup mechanisms similarly could avoid binlog files changing during backup. So please fix this in FTWRL, in 10.2. (If you feel you need to fix the galera bug in 10.1, you can implement it only for galera in 10.1).

That sounds good to me. But, considering Percona's backup locks, it seems more logical to implement this in backup locks instead, whenever they get ported/implemented in MariaDB. Also, in this particular case, the problem lies in reload_acl_and_cache(REFRESH_BINARY_LOG) (executed after FTWRL while preparing for SST), which rotates the binary log. So FTWRL is not directly linked to this issue. And as you rightly pointed out, I will refrain from altering FTWRL's behavior in 10.1 at least.

...

So in more detail, here is suggested way to fix:

In FTWRL (somewhere near the end, after commits are blocked), wait for checkpoint events to be written using a similar mechanism as RESET MASTER:

    if (mysql_bin_log.is_open())
    {
      mysql_mutex_lock(&LOCK_xid_list);
      for (;;)
      {
        if (binlog_xid_count_list.is_last(binlog_xid_count_list.head()))
          break;
        mysql_cond_wait(&COND_xid_list, &LOCK_xid_list);
      }
      mysql_mutex_unlock(&LOCK_xid_list);

      /*
        LOCK_xid_list and LOCK_log are chained, so the LOCK_log will only be
        obtained after mark_xid_done() has written the last checkpoint event.
      */
      mysql_mutex_lock(&LOCK_log);
      mysql_mutex_unlock(&LOCK_log);
    }

Now, since FTWRL is a bit different from RESET MASTER, we need a couple other changes:

- Use mysql_cond_broadcast(&COND_xid_list) instead of mysql_cond_signal() in mark_xid_done() (to allow multiple waiters).

- The second (but not the first) mysql_cond_broadcast() in mark_xid_done() should be unconditional, so remove the if() here:

if (unlikely(reset_master_pending)) mysql_cond_signal(&COND_xid_list);

- Also add mysql_cond_broadcast(&COND_xid_list) in two other places that the binlog_xid_count_list is modified. One in MYSQL_BIN_LOG::open():

while ((b= binlog_xid_count_list.head()) && b->xid_count == 0) my_free(binlog_xid_count_list.get());

And one in reset_logs():

my_free(binlog_xid_count_list.get());

This should make FTWRL wait for all pending binlog checkpoint events to be written. And with commits blocked, no new checkpoints should become pending.

Does it seem reasonable to you? Let me know if some things are unclear or if you see any potential problems with it.

Yes, it worked. But, to solve this issue in 10.1, I have added this wait to REFRESH_BINARY_LOG (as explained above) only when the server is acting as a Galera node.

...

By the way, how do you intend to handle the case where RESET MASTER is run during SST? I just checked, FTWRL does not seem to block RESET MASTER. Or do you have another mechanism to prevent RESET MASTER from running during SST? Thinking more, you should be holding LOCK_log while copying the binlog files (I'm guessing you're not currently, right?)

You are right.

...

This will block RESET MASTER,

I am now taking LOCK_log for the duration of the file transfer as protection against the above commands.

...

and it also makes the extra lock/unlock of LOCK_log above redundant.

Not quite. The wait logic (which includes LOCK_log, as in the snippet above) pauses REFRESH_BINARY_LOG, and the additional use of LOCK_log blocks the RESET/FLUSH commands while the file transfer is in progress.

...

Also, FTWRL has really complex semantics. You should get Monty's opinion (or maybe Serg?) on whether there is any potential for deadlocks from waiting inside FTWRL for binlog checkpoints.

As explained above, FTWRL remains unchanged, but I will still check if Monty/Serg can take a look at the fix.

http://lists.askmonty.org/pipermail/commits/2016-June/009494.html

Best,
Nirbhay

...

- Kristian.

Kristian Nielsen

25 Jun

7:57 a.m.

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...

...
Also, it seems reasonable that FTWRL in general could wait for checkpoint events so that other backup mechanisms similarly could avoid binlog files

...

That sounds good to me. But, considering Percona's backup locks, it seems more logical to implement this in backup locks instead, whenever they get ported/implemented in MariaDB.

Right. As I was thinking about the problem, it occurred to me that this wasn't really a Galera-specific thing; my suggestion seemed a valid general wait-for-checkpoint mechanism.

So we should put the code that waits for the checkpoint in its own function (as you already did, MYSQL_BIN_LOG::wait_for_last_checkpoint_event()). But I agree, we can wait with actually exposing it (in FTWRL, backup locks, whatever) until when/if that becomes relevant/a priority.

I would just note that this wait does not really do anything unless there is something else (like FTWRL in your case) that prevents new commits. Otherwise a new checkpoint could become pending at any time after wait_for_last_checkpoint_event() returns.

...

Also, in this particular case, the problem lies in reload_acl_and_cache(REFRESH_BINARY_LOG), (executed after FTWRL while preparing for SST) that rotates the binary log.

Hm, I see. So you're always copying an empty binlog file? I'm wondering why you don't simply skip copying any binlogs and just start the new server with --tc-heuristic-recover=ROLLBACK ... maybe copying binlogs was just considered easier? Anyway, I don't have the bigger picture, so can't have much of an informed opinion here.

...

Yes, it worked. But, to solve this issue in 10.1, I have added this wait to REFRESH_BINARY_LOG (as explained above) only when the server is acting as a Galera node.

That seems quite ugly, why not call it from the SST code, after it has called reload_acl_and_cache()? You're basically making FLUSH LOGS behave differently in Galera and non-Galera (if my understanding is correct), which might lead to subtle bugs? But again, I don't have the bigger picture, and the whole wsrep patch is garbage all over the server anyway, so I suppose it doesn't matter much to me, as long as it's #ifdef WSREP.

...

...
and it also makes the extra lock/unlock of LOCK_log above redundant.

Not quite. The wait logic (that includes LOCK_log, as the snippet above) is to pause REFRESH_BINARY_LOG and an additional use of LOCK_log to block the RESET/ FLUSH commands while file transfer is in progress.

Sure, it's fine to have both, probably makes the code clearer anyway.

...

--- a/sql/log.cc
+++ b/sql/log.cc
@@ -3690,7 +3690,10 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
     new_xid_list_entry->binlog_id= current_binlog_id;
     /* Remove any initial entries with no pending XIDs. */
     while ((b= binlog_xid_count_list.head()) && b->xid_count == 0)
+    {
       my_free(binlog_xid_count_list.get());
+      mysql_cond_broadcast(&COND_xid_list);
+    }
     binlog_xid_count_list.push_back(new_xid_list_entry);
     mysql_mutex_unlock(&LOCK_xid_list);

There is no need to mysql_cond_broadcast() multiple times. Use just a single broadcast outside the loop (before or after, doesn't make a difference).

 - Kristian.
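[Editor's sketch] The review comment above rests on a general property of condition variables: a woken waiter cannot re-check its condition until the notifying thread releases the mutex, so one broadcast per critical section is enough however many entries are removed. A minimal sketch with standard C++ primitives follows; XidListModel and drop_idle_entries() are hypothetical names used for illustration, and the broadcasts counter exists only to make the behaviour observable.

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>

// Toy stand-in for the xid count list and its lock/condition pair.
struct XidListModel
{
  std::mutex lock;                // stands in for LOCK_xid_list
  std::condition_variable cond;   // stands in for COND_xid_list
  std::list<int> counts;          // pending XIDs per binlog, oldest first
  int broadcasts= 0;              // instrumentation for this sketch only
};

// Remove leading entries with no pending XIDs, then wake all waiters once.
// Returns the number of entries removed.
int drop_idle_entries(XidListModel &m)
{
  std::lock_guard<std::mutex> guard(m.lock);
  int removed= 0;
  while (!m.counts.empty() && m.counts.front() == 0)
  {
    m.counts.pop_front();
    ++removed;
  }
  // Waiters only re-check their predicate after this function releases the
  // mutex, so a single broadcast covers every removal done in the loop.
  m.cond.notify_all();
  ++m.broadcasts;
  assert(m.counts.empty() || m.counts.front() != 0);
  return removed;
}
```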

Nirbhay Choubey

27 Jun

11:42 a.m.

Hi Kristian, On Sat, Jun 25, 2016 at 3:57 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:

...

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...
...
Also, it seems reasonable that FTWRL in general could wait for checkpoint events so that other backup mechanisms similarly could avoid binlog files

...
That sounds good to me. But, considering Percona's backup locks, it seems more logical to implement this in backup locks instead, whenever they get ported/implemented in MariaDB.

Right. As I was thinking about the problem, it occurred to me that this wasn't really a Galera-specific thing; my suggestion seemed a valid general wait-for-checkpoint mechanism.

So we should put the code that waits for checkpoint in its own function (as you already did, MYSQL_BIN_LOG::wait_for_last_checkpoint_event()). But I agree, we can wait with actually exposing it (in FTWRL, backup locks, whatever) until when/if that becomes relevant/priority.

I would just note that this wait does not really do anything unless there is something else (like FTWRL in your case) that prevents new commits, otherwise a new checkpoint could become pending at any time after wait_for_last_checkpoint_event() returns.

...
Also, in this particular case, the problem lies in reload_acl_and_cache(REFRESH_BINARY_LOG), (executed after FTWRL while preparing for SST) that rotates the binary log.

Hm, I see. So you're always copying an empty binlog file? I'm wondering why you don't simply skip copying any binlogs and just start the new server with --tc-heuristic-recover=ROLLBACK ... maybe copying binlogs was just considered easier? Anyway, I don't have the bigger picture, so can't have much of an informed opinion here.

The joiner node also picks up the GTID state from the binary log file it received.

...

...
Yes, it worked. But, to solve this issue in 10.1, I have added this wait to REFRESH_BINARY_LOG (as explained above) only when the server is acting as a Galera node.

That seems quite ugly, why not call it from the SST code, after it has called reload_acl_and_cache()? You're basically making FLUSH LOGS behave differently in Galera and non-Galera (if my understanding is correct), which might lead to subtle bugs?

I initially thought of adding the call after reload_acl_and_cache(), but there could still be a case when user performs a REFRESH_BINARY_LOG before LOCK_log is acquired.

...

But again, I don't have the bigger picture, and the whole wsrep patch is garbage all over the server anyway, so I suppose it doesn't matter much to me, as long as it's #ifdef WSREP.

...
...
and it also makes the extra lock/unlock of LOCK_log above redundant.

Not quite. The wait logic (that includes LOCK_log, as the snippet above) is to pause REFRESH_BINARY_LOG and an additional use of LOCK_log to block the RESET/ FLUSH commands while file transfer is in progress.

Sure, it's fine to have both, probably makes the code clearer anyway.

Right.

...

...
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -3690,7 +3690,10 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
     new_xid_list_entry->binlog_id= current_binlog_id;
     /* Remove any initial entries with no pending XIDs. */
     while ((b= binlog_xid_count_list.head()) && b->xid_count == 0)
+    {
       my_free(binlog_xid_count_list.get());
+      mysql_cond_broadcast(&COND_xid_list);
+    }
     binlog_xid_count_list.push_back(new_xid_list_entry);
     mysql_mutex_unlock(&LOCK_xid_list);

There is no need to mysql_cond_broadcast() multiple times. Use just a single broadcast outside the loop (before or after, doesn't make a difference).

Fixed.

Best,
Nirbhay

...

- Kristian.

Sergei Golubchik

28 Jun

9:02 p.m.

Hi, Nirbhay! On Jun 27, Nirbhay Choubey wrote:

...

...
That seems quite ugly, why not call it from the SST code, after it has called reload_acl_and_cache()? You're basically making FLUSH LOGS behave differently in Galera and non-Galera (if my understanding is correct), which might lead to subtle bugs?

I initially thought of adding the call after reload_acl_and_cache(), but there could still be a case when user performs a REFRESH_BINARY_LOG before LOCK_log is acquired.

Right, but you didn't fix it. You have

1> FTWRL
2> reload_acl_and_cache()
3> wait_for_last_checkpoint_event()
4> SET global innodb_disallow_writes=1
5> mysql_mutex_lock(LOCK_log)

You've described your case correctly: "when user performs REFRESH_BINARY_LOG before LOCK_log is acquired". That is, you care when a user performs REFRESH_BINARY_LOG between 3 and 5. You don't care if somebody does REFRESH_BINARY_LOG between 2 and 3. So, you can as well move wait_for_last_checkpoint_event() out of reload_acl_and_cache().

With wait_for_last_checkpoint_event inside reload_acl_and_cache or outside, you still don't have anything that would prevent user from doing REFRESH_BINARY_LOG between 3 and 5.

Regards,
Sergei
Chief Architect MariaDB
and security@mariadb.org

Nirbhay Choubey

29 Jun

3:23 a.m.

Hi Serg, On Tue, Jun 28, 2016 at 5:02 PM, Sergei Golubchik <serg@mariadb.org> wrote:

...

Hi, Nirbhay!

On Jun 27, Nirbhay Choubey wrote:

...
...
That seems quite ugly, why not call it from the SST code, after it has called reload_acl_and_cache()? You're basically making FLUSH LOGS behave differently in Galera and non-Galera (if my understanding is correct), which might lead to subtle bugs?

I initially thought of adding the call after reload_acl_and_cache(), but there could still be a case when user performs a REFRESH_BINARY_LOG before LOCK_log is acquired.

Right, but you didn't fix it. You have

1> FTWRL
2> reload_acl_and_cache()
3> wait_for_last_checkpoint_event()
4> SET global innodb_disallow_writes=1
5> mysql_mutex_lock(LOCK_log)

You've described your case correctly: "when user performs REFRESH_BINARY_LOG before LOCK_log is acquired". That is, you care when a user performs REFRESH_BINARY_LOG between 3 and 5. You don't care if somebody does REFRESH_BINARY_LOG between 2 and 3. So, you can as well move wait_for_last_checkpoint_event() out of reload_acl_and_cache().

If the wait is moved outside reload_acl_and_cache() (say 3') and the user's REFRESH_BINARY_LOG kicks in right after the wait (3') but before (5), it will trigger creation of another new binlog file, for which the last checkpoint event (logged asynchronously by a separate thread) may not make it in time, causing the same issue on the joiner node.

Another workable option was to move the wait outside and after reload_acl_and_cache() and not release LOCK_log until the file transfer is complete.

1> FTWRL
2> reload_acl_and_cache()
3> wait for last checkpoint event & lock(LOCK_log)
4> SET global innodb_disallow_writes=1
   ... file transfer ...
5> mysql_mutex_unlock(LOCK_log)

But with LOCK_log locked in #3, mysql_mutex_assert_not_owner(mysql_bin_log.get_log_lock()) will fail for #4.

...

With wait_for_last_checkpoint_event inside reload_acl_and_cache or outside, you still don't have anything that would prevent user from doing REFRESH_BINARY_LOG between 3 and 5.

It wouldn't prevent the user from doing REFRESH_BINARY_LOG, but with wait_for_last_checkpoint_event() added to reload_acl_and_cache(), it would ensure every REFRESH_BINARY_LOG (either from the user or #2 above) waits until the last checkpoint event makes it into the new binary log file.

Best,
Nirbhay

...

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Kristian Nielsen

8:31 a.m.

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...

It wouldn't prevent the user from doing REFRESH_BINARY_LOG, but with wait_for_last_checkpoint_event() added to reload_acl_and_cache(), it would ensure every REFRESH_BINARY_LOG (either from the user or #2 above) waits until the last checkpoint event makes it into the new binary log file.

The user's query will wait, but it's not clear how that helps your SST (which it seems will be unaffected by the wait in the user's connection thread).

 - Kristian.

Nirbhay Choubey

1:27 p.m.

Hi Kristian, On Wed, Jun 29, 2016 at 4:31 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:

...

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...
It wouldn't prevent the user from doing REFRESH_BINARY_LOG, but with wait_for_last_checkpoint_event() added to reload_acl_and_cache(), it would ensure every REFRESH_BINARY_LOG (either from the user or #2 above) waits until the last checkpoint event makes it into the new binary log file.

The user's query will wait, but it's not clear how that helps your SST (which it seems will be unaffected by the wait in the user's connection thread).

Let's say we move the wait outside of reload_acl_and_cache() and place it as #3, and the user does a REFRESH_BINARY_LOG (#4) after the wait is over but before the main thread acquires LOCK_log (#6). Since there is no wait in reload_acl_and_cache() anymore, the user's FLUSH LOGS will create a new binary log file with the binlog checkpoint event for the penultimate binlog and return, leaving it to the binlog background thread to take care of logging the checkpoint event for the current (new) binlog file.

Now, if the background thread kicks in _after_ the file transfer (as shown in #9 below), the same problem occurs - the joiner complains of the missing binlog file.

1> FTWRL
2> reload_acl_and_cache()
3> wait_for_last_checkpoint_event()
4> FLUSH LOGS (reload_acl_and_cache())
5> SET global innodb_disallow_writes=1
6> mysql_mutex_lock(LOCK_log)
7> file transfer
8> mysql_mutex_unlock(LOCK_log)
9> MYSQL_BIN_LOG::write_binlog_checkpoint_event_already_locked()

Best,
Nirbhay
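[Editor's sketch] The interleaving in the timeline above can be replayed deterministically with a small model. This is a sketch under assumptions, not server code: Step and joiner_can_recover() are hypothetical names, binlog files are reduced to integer ids, and the copy is assumed to take only the newest binlog file, as the SST scripts described in this thread do.

```cpp
#include <cassert>
#include <vector>

// Events of interest in the donor's timeline (hypothetical names).
enum class Step
{
  Rotate,          // FLUSH LOGS / reload_acl_and_cache(): new binlog file
  Checkpoint,      // background thread logs checkpoint for the current file
  CopyLastBinlog   // SST copies only the newest binlog file
};

// Returns true if, at the moment of the copy, the binlog checkpoint had
// already reached the current binlog file, so the single copied file is
// enough for recovery on the joiner. Returns false if no copy happened.
bool joiner_can_recover(const std::vector<Step> &timeline)
{
  int current_binlog= 1;
  int checkpoint_binlog= 1;
  bool sufficient= false;
  for (Step step : timeline)
  {
    switch (step)
    {
    case Step::Rotate:
      ++current_binlog;              // checkpoint still names the old file
      break;
    case Step::Checkpoint:
      checkpoint_binlog= current_binlog;
      break;
    case Step::CopyLastBinlog:
      sufficient= (checkpoint_binlog == current_binlog);
      break;
    }
    assert(checkpoint_binlog <= current_binlog);
  }
  return sufficient;
}
```

Replaying Rotate, Checkpoint, CopyLastBinlog models the intended sequence and yields true. Replaying Rotate, Checkpoint, Rotate, CopyLastBinlog, Checkpoint models steps #2 to #9 above, where the user's FLUSH LOGS slips in after the wait, and yields false.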

Kristian Nielsen

1:31 p.m.

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...

Since there is no wait in reload_acl_and_cache() anymore, user's FLUSH LOGS will create a new binary log file with binlog checkpoint event for the penultimate binlog and return, leaving it onto binlog background thread to take care of logging the checkpoint event for the current (new) binlog file.

Now, if background thread kicks in _after_ the file transfer (as shown in #9 below), the same problem occurs - the joiner complains of the missing binlog file.

Sure, I understand, what I fail to understand is how putting wait_for_last_checkpoint_event() into the user's connection thread helps avoid this. The user thread waits for the checkpoint event of the new binlog file, however the SST thread already did its wait for its own reload_acl_and_cache(), it will not wait again... ?

 - Kristian.

Nirbhay Choubey

1:47 p.m.

On Wed, Jun 29, 2016 at 9:31 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:

...

Nirbhay Choubey <nirbhay@mariadb.com> writes:

...
Since there is no wait in reload_acl_and_cache() anymore, user's FLUSH LOGS will create a new binary log file with binlog checkpoint event for the penultimate binlog and return, leaving it onto binlog background thread to take care of logging the checkpoint event for the current (new) binlog file.

Now, if background thread kicks in _after_ the file transfer (as shown in #9 below), the same problem occurs - the joiner complains of the missing binlog file.

Sure, I understand, what I fail to understand is how putting wait_for_last_checkpoint_event() into the user's connection thread helps avoid this. The user thread waits for the checkpoint event of the new binlog file, however the SST thread already did its wait for its own reload_acl_and_cache(), it will not wait again... ?

Ah.. I get it now, adding the wait in reload_acl_and_cache() is futile. So perhaps the only option left is to place this wait in sst_flush_tables() after reload_acl_and_cache().

 - Nirbhay

...

- Kristian.
