MariaDb Master/Replica. How to recover replication when reverting Master to an earlier backup?
Hello We have two Mariadb 10.11.10 servers acting as Master and Replica. Both have unique local databases, plus seven that are replicated from Master to Replica with Replicate_Wild_Do_Table: DB1Name.%, DB2Name.% etc The size of the replicated databases is just over 1Tb, with the single largest being almost 500Gb. Both machines are in vmware and on the same network. Yesterday, we hit an issue with the Master which required that vm to be restored to a backup four hours previously. This got Master back in play, but obviously has broken Replication. Traditional wisdom seems to suggest that I need to recreate this replication setup from scratch - ie, stop Master (from changing, ie, close firewall and block clients, flush logs), note the Log Position, and then mysqldump each database. However, due to the size of these databases, that is going to take many hours and we can't accept that downtime for Master. It's occurred to us that we might speed this up by: Stop Master from changing. Note Log Position. Clone Master's vm to Master-Clone. (< 10 minutes) Restart Master. Then we would be at relative leisure to [On Clone] Mysqldump the databases onto a temporary drive. [On Replica] DROP the seven databases. Import the dumped databases from the temporary drive. Update the log position in config and restart the slave user. Then Replica should start syncing from Master again, even if Clone was several days old? Does that sound sensible? These databases have partitions - is that going to cause issues dumping and reimporting them or should I use another method? Any pitfalls? Any alternative ways? Thank you
Can you roll back the slave to a backup before the point you rolled the master back to? If so, assuming you are running with safe settings (sync_binlog=1, sync_master_info=1,innnodb_flush_log_at_trx_commit=1), it should come up and catch up from that earlier point, provided your binlog retention goes that far back. On Fri, 10 Jan 2025, 09:56 Simon Avery via discuss, < discuss@lists.mariadb.org> wrote:
Hello
We have two Mariadb 10.11.10 servers acting as Master and Replica.
Both have unique local databases, plus seven that are replicated from Master to Replica with Replicate_Wild_Do_Table: DB1Name.%, DB2Name.% etc
The size of the replicated databases is just over 1Tb, with the single largest being almost 500Gb. Both machines are in vmware and on the same network.
Yesterday, we hit an issue with the Master which required that vm to be restored to a backup four hours previously. This got Master back in play, but obviously has broken Replication.
Traditional wisdom seems to suggest that I need to recreate this replication setup from scratch - ie, stop Master (from changing, ie, close firewall and block clients, flush logs), note the Log Position, and then mysqldump each database. However, due to the size of these databases, that is going to take many hours and we can't accept that downtime for Master.
It's occurred to us that we might speed this up by:
Stop Master from changing.
Note Log Position.
Clone Master's vm to Master-Clone. (< 10 minutes)
Restart Master.
Then we would be at relative leisure to
[On Clone] Mysqldump the databases onto a temporary drive.
[On Replica] DROP the seven databases.
Import the dumped databases from the temporary drive.
Update the log position in config and restart the slave user.
Then Replica should start syncing from Master again, even if Clone was several days old?
Does that sound sensible?
These databases have partitions – is that going to cause issues dumping and reimporting them or should I use another method?
Any pitfalls?
Any alternative ways?
Thank you _______________________________________________ discuss mailing list -- discuss@lists.mariadb.org To unsubscribe send an email to discuss-leave@lists.mariadb.org
Hi Gordan, …Possibly. That’s worth exploring, even if it means restoring to a new vm and then transferring the local DBs across to it if they’ve changed since then, thank you. Simon From: Gordan Bobic <gordan.bobic@gmail.com> Sent: 10 January 2025 08:24 To: Simon Avery <Simon.Avery@atass-sports.co.uk> Cc: sruli67--- via discuss <discuss@lists.mariadb.org> Subject: Re: [MariaDB discuss] MariaDb Master/Replica. How to recover replication when reverting Master to an earlier backup? You don't often get email from gordan.bobic@gmail.com<mailto:gordan.bobic@gmail.com>. Learn why this is important<https://aka.ms/LearnAboutSenderIdentification> Can you roll back the slave to a backup before the point you rolled the master back to? If so, assuming you are running with safe settings (sync_binlog=1, sync_master_info=1,innnodb_flush_log_at_trx_commit=1), it should come up and catch up from that earlier point, provided your binlog retention goes that far back. On Fri, 10 Jan 2025, 09:56 Simon Avery via discuss, <discuss@lists.mariadb.org<mailto:discuss@lists.mariadb.org>> wrote: Hello We have two Mariadb 10.11.10 servers acting as Master and Replica. Both have unique local databases, plus seven that are replicated from Master to Replica with Replicate_Wild_Do_Table: DB1Name.%, DB2Name.% etc The size of the replicated databases is just over 1Tb, with the single largest being almost 500Gb. Both machines are in vmware and on the same network. Yesterday, we hit an issue with the Master which required that vm to be restored to a backup four hours previously. This got Master back in play, but obviously has broken Replication. Traditional wisdom seems to suggest that I need to recreate this replication setup from scratch - ie, stop Master (from changing, ie, close firewall and block clients, flush logs), note the Log Position, and then mysqldump each database. However, due to the size of these databases, that is going to take many hours and we can't accept that downtime for Master. It's occurred to us that we might speed this up by: Stop Master from changing. Note Log Position. Clone Master's vm to Master-Clone. (< 10 minutes) Restart Master. Then we would be at relative leisure to [On Clone] Mysqldump the databases onto a temporary drive. [On Replica] DROP the seven databases. Import the dumped databases from the temporary drive. Update the log position in config and restart the slave user. Then Replica should start syncing from Master again, even if Clone was several days old? Does that sound sensible? These databases have partitions – is that going to cause issues dumping and reimporting them or should I use another method? Any pitfalls? Any alternative ways? Thank you _______________________________________________ discuss mailing list -- discuss@lists.mariadb.org<mailto:discuss@lists.mariadb.org> To unsubscribe send an email to discuss-leave@lists.mariadb.org<mailto:discuss-leave@lists.mariadb.org>
Hello Again, Just wanted to say thank you again for this. I restored a copy of the replica from the day before, but after some expected HA_ERR_FOUND_DUPP_KEY errors, it started throwing a bunch of HA_ERR_KEY_NOT_FOUND and that didn’t work. Then I restored another copy from the same minute as the split, and after a few hundred HA_ERR_FOUND_DUPP_KEY skips, it started catching up. An hour later, it was 0 seconds behind and has stayed in sync. We may have lost some data from the local non-replicated DBs, but nothing we can’t recover from the original server if we need to. Certainly a lot less pain than recreating the sync. Again, thanks. It’s been a pretty crappy week because of this so it’s nice to end it on a high. Cheers S From: Simon Avery via discuss <discuss@lists.mariadb.org> Sent: 10 January 2025 09:31 To: Gordan Bobic <gordan.bobic@gmail.com> Cc: sruli67--- via discuss <discuss@lists.mariadb.org> Subject: [MariaDB discuss] Re: MariaDb Master/Replica. How to recover replication when reverting Master to an earlier backup? Hi Gordan, …Possibly. That’s worth exploring, even if it means restoring to a new vm and then transferring the local DBs across to it if they’ve changed since then, thank you. Simon From: Gordan Bobic <gordan.bobic@gmail.com<mailto:gordan.bobic@gmail.com>> Sent: 10 January 2025 08:24 To: Simon Avery <Simon.Avery@atass-sports.co.uk<mailto:Simon.Avery@atass-sports.co.uk>> Cc: sruli67--- via discuss <discuss@lists.mariadb.org<mailto:discuss@lists.mariadb.org>> Subject: Re: [MariaDB discuss] MariaDb Master/Replica. How to recover replication when reverting Master to an earlier backup? You don't often get email from gordan.bobic@gmail.com<mailto:gordan.bobic@gmail.com>. Learn why this is important<https://aka.ms/LearnAboutSenderIdentification> Can you roll back the slave to a backup before the point you rolled the master back to? If so, assuming you are running with safe settings (sync_binlog=1, sync_master_info=1,innnodb_flush_log_at_trx_commit=1), it should come up and catch up from that earlier point, provided your binlog retention goes that far back. On Fri, 10 Jan 2025, 09:56 Simon Avery via discuss, <discuss@lists.mariadb.org<mailto:discuss@lists.mariadb.org>> wrote: Hello We have two Mariadb 10.11.10 servers acting as Master and Replica. Both have unique local databases, plus seven that are replicated from Master to Replica with Replicate_Wild_Do_Table: DB1Name.%, DB2Name.% etc The size of the replicated databases is just over 1Tb, with the single largest being almost 500Gb. Both machines are in vmware and on the same network. Yesterday, we hit an issue with the Master which required that vm to be restored to a backup four hours previously. This got Master back in play, but obviously has broken Replication. Traditional wisdom seems to suggest that I need to recreate this replication setup from scratch - ie, stop Master (from changing, ie, close firewall and block clients, flush logs), note the Log Position, and then mysqldump each database. However, due to the size of these databases, that is going to take many hours and we can't accept that downtime for Master. It's occurred to us that we might speed this up by: Stop Master from changing. Note Log Position. Clone Master's vm to Master-Clone. (< 10 minutes) Restart Master. Then we would be at relative leisure to [On Clone] Mysqldump the databases onto a temporary drive. [On Replica] DROP the seven databases. Import the dumped databases from the temporary drive. Update the log position in config and restart the slave user. Then Replica should start syncing from Master again, even if Clone was several days old? Does that sound sensible? These databases have partitions – is that going to cause issues dumping and reimporting them or should I use another method? Any pitfalls? Any alternative ways? Thank you _______________________________________________ discuss mailing list -- discuss@lists.mariadb.org<mailto:discuss@lists.mariadb.org> To unsubscribe send an email to discuss-leave@lists.mariadb.org<mailto:discuss-leave@lists.mariadb.org>
On Fri, Jan 10, 2025 at 2:22 PM Simon Avery <Simon.Avery@atass-sports.co.uk> wrote:
Hello Again,
Just wanted to say thank you again for this.
I restored a copy of the replica from the day before, but after some expected HA_ERR_FOUND_DUPP_KEY errors, it started throwing a bunch of HA_ERR_KEY_NOT_FOUND and that didn’t work.
Then I restored another copy from the same minute as the split, and after a few hundred HA_ERR_FOUND_DUPP_KEY skips, it started catching up. An hour later, it was 0 seconds behind and has stayed in sync.
Those errors shouldn't have happened. Are you sure you are running the master with synv_binlog=1 and the slave with sync_master_info=1, and both with innodb_flush_log_at_trx_commit=1 ?
Then I restored another copy from the same minute as the split, and after a few hundred HA_ERR_FOUND_DUPP_KEY skips, it started catching up. An hour later, it was 0 seconds behind and has stayed in sync. * Those errors shouldn't have happened. Are you sure you are running the master with synv_binlog=1 and the slave with sync_master_info=1, and both with innodb_flush_log_at_trx_commit=1 ? sync_binlog is indeed set to 0. The others are fine. The description at https://mariadb.com/kb/en/replication-and-binary-log-system-variables/#sync_... mentions battery backed ram, which we do have, and a full UPS system. I'm guessing that's why it was never enabled on our estate. However, Veeam doing a VM image backup is just a disk snapshot, and no different to having the power pulled and it's not so surprising that data can be lost. I will make sure it is turned on soonest. Thanks again, S
On Fri, Jan 10, 2025 at 3:15 PM Simon Avery <Simon.Avery@atass-sports.co.uk> wrote:
Then I restored another copy from the same minute as the split, and after a few hundred HA_ERR_FOUND_DUPP_KEY skips, it started catching up. An hour later, it was 0 seconds behind and has stayed in sync.
Those errors shouldn't have happened. Are you sure you are running the master with synv_binlog=1 and the slave with sync_master_info=1, and both with innodb_flush_log_at_trx_commit=1 ?
sync_binlog is indeed set to 0. The others are fine.
That would indeed explain it. Unfortunately, any setting other than 1 for this means that your binlogs on the master will be inconsistent with the state of the data in the tables in case of a crash (or a snapshot based backup which only gives crash level consistency).
participants (2)
-
Gordan Bobic
-
Simon Avery