Re: [Maria-discuss] Can't find log after Galera's SST

14 Apr 2016

      Sorry for the noise.

XtraBackup does flush binary logs and does copy the last binary log
into the backup when gtids are detected (i.e. always on mariadb).
That's what's rotating your logs and it does only copy the last log
(post-rotate).   I don't think that's configurable at all.

I was able to reproduce this in my environment with identical results
to yours.   MariaDB 10.1.13 / Xtrabackup 2.3.4 on Debian 8.  I just
started a busy loop inserting a row constantly into a table and forced
SST:

[Note] Recovering after a crash using /var/log/mysql/mysql-bin
[ERROR] Binlog file '/var/log/mysql/mysql-bin.000149' not found in
binlog index, needed for recovery. Aborting.
[ERROR] Crash recovery failed. Either correct the problem (if it's,
for example, out of memory error and restart, or delete (or rename)
binary log and start mysqld with
--tc-heuristic-recover={commit|rollback}
[ERROR] Can't init tc log
ERROR] Aborting

Actual binlog changes each time while i'm writing to the cluster.

I did find that commenting out the bits from wsrep_sst_xtrabackup-v2
that copies the binlog back to /var/log/mysql/ avoids this error - I
guess the presence of a binlog here starts the recovery process.  I'm
not sure what the ramifications are of skipping that recovery.

This issue looks like it's already reported here:
https://jira.mariadb.org/browse/MDEV-9423

I will add my comments there and hope we can find a solution.

On Wed, Apr 13, 2016 at 11:50 PM, Igor Mazur <igor.kozlov@gmail.com> wrote:
...
Sorry, I've forgotten to say that I use xtrabackup.
2016-04-14 2:04 GMT+03:00 Andrew Garner <muzazzi@gmail.com>:
...
What SST method are you using? Based on this I expect you are using rsync.
I imagine someone knows more detailed answers here, but from my
understanding:
Internally the bin/wsrep_sst_rsync script indicates it needs "flush
tables" to the server and the server runs this "sst_flush_tables()"
function that effectively runs:
FLUSH TABLES WITH READ LOCK
+ reload_acl_and_cache(thd, REFRESH_ENGINE_LOG | REFRESH_BINARY_LOG, ...)
Which I think is equivalent to FLUSH ENGINE LOGS / FLUSH BINARY LOGS.
I think around this point it also sets innodb_disallow_writes to stop
InnoDB's background I/O (so rsync can be consistent).
At that point the bin/wsrep_sst_rsync script graps the last
$BINLOG_N_FILES binary logs, which I believe is hardcoded to '1' and
seems to be evaluated after this flush condition.   But internally I
guess InnoDB can have a prepared transaction related to a previous
binlog.
I've wrestled with similar problems writing backup recovery routines
for MariaDB + snapshot backups (typically via LVM snapshots), where I
typically have to go 2 logs back and it seems to eliminate this kind
of failure.  You might be able to bump up BINLOG_N_FILES in
/usr/bin/wsrep_sst_rsync (or wherever it is located) and see if that
resolves your race condition.
Alternatively I think percona-xtrabackup has solved this problem[1] a
while back as well, so a different wsrep-sst-method may be an option.
 Although xtrabackup has its own problems - in particularly I've found
that xtrabackup currently doesn't handle lost+found directories too
well[2], so you either have to avoid having the datadir directly under
the mountpoint or use some other filesystem (such as XFS) that doesn't
create lost+found.  I suppose if you're using encryption, xtrabackup
is also not currently an option. :)
~Andrew
[1] https://bugs.launchpad.net/percona-xtrabackup/+bug/1254227
[2] https://bugs.launchpad.net/percona-xtrabackup/+bug/1272329
On Wed, Apr 13, 2016 at 4:27 PM, Igor Mazur <igor.kozlov@gmail.com> wrote:
...
Hi,
Almost everytime when I join node to Galera cluster (MariaDB 10.1.13)
I'm
getting an error after SST:
2016-04-13 21:09:37 139932281419712 [Note] Recovering after a crash
using
/var/log/mysql/mariadb-bin
2016-04-13 21:09:37 139932281419712 [ERROR] Binlog file
'/var/log/mysql/mariadb-bin.000023' not found in binlog index, needed
for
recovery. Aborting.
2016-04-13 21:09:37 139932281419712 [ERROR] Crash recovery failed.
Either
correct the problem (if it's, for example, out of memory error) and
restart,
or delete (or rename) binary log and start mysqld with
--tc-heuristic-recover={commit|rollback}
2016-04-13 21:09:37 139932281419712 [ERROR] Can't init tc log
2016-04-13 21:09:37 139932281419712 [ERROR] Aborting
As I see, almost every SST rotates a log file and downloads last (in
example
above I have mariadb-bin.000024), and not previous. But for recovering
it
needs previous.
Sometimes I'm lucky and it restores.
Anyone knows why it rotates a log and is it possible to force it to load
more than last log file?
Igor Mazur
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp