[Maria-discuss] Replication Problem
Hello, I'd have a question about MariaDB 10.1 replication, which has been interrupted by a power outage in our datacenter. We started the master server and had a duplicate id in a table on the slave, so replication stopped. We did not have time to adjust this at the moment, there were a lot of other systems to start and check if all was alright. Today I had time to look at the error, removed the duplicate ID from the table and started the slave thread again using 'start slave;'. But now I get another error: Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-2-2948175468, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions' Can this be resolved? Or will I have to start replication from scratch? Greetings, Thomas
Hi Thomas, Could you please show the output of SHOW SLAVE STATUS and SHOW VARIABLES LIKE 'gtid%' ? Thanks Guillaume Lefranc signal18.io consulting Le mer. 4 juil. 2018 à 12:31, Thomas Plant <thomas@plant.systems> a écrit :
Hello,
I'd have a question about MariaDB 10.1 replication, which has been interrupted by a power outage in our datacenter. We started the master server and had a duplicate id in a table on the slave, so replication stopped. We did not have time to adjust this at the moment, there were a lot of other systems to start and check if all was alright.
Today I had time to look at the error, removed the duplicate ID from the table and started the slave thread again using 'start slave;'.
But now I get another error:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-2-2948175468, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions'
Can this be resolved? Or will I have to start replication from scratch?
Greetings, Thomas
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
About replication, I dont let it stop on duplicate keys, I prefer to use slave-skip-errors = all and let the primary key on the tables take care of duplicate records. Replication will only stop if something is corrupted, which may be your case. Some position the the binlog got lost in the power outage and you will have to reposition the master slave binlog, probably loosing some data on one side. []s. On Wed, Jul 4, 2018 at 8:02 AM, Guillaume Lefranc <guillaume@adishatz.net> wrote:
Hi Thomas,
Could you please show the output of SHOW SLAVE STATUS and SHOW VARIABLES LIKE 'gtid%' ?
Thanks
Guillaume Lefranc signal18.io consulting
Le mer. 4 juil. 2018 à 12:31, Thomas Plant <thomas@plant.systems> a écrit :
Hello,
I'd have a question about MariaDB 10.1 replication, which has been interrupted by a power outage in our datacenter. We started the master server and had a duplicate id in a table on the slave, so replication stopped. We did not have time to adjust this at the moment, there were a lot of other systems to start and check if all was alright.
Today I had time to look at the error, removed the duplicate ID from the table and started the slave thread again using 'start slave;'.
But now I get another error:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-2-2948175468, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions'
Can this be resolved? Or will I have to start replication from scratch?
Greetings, Thomas
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
slave-skip-errors = all = I don't care about my data. You almost certainly do not want to be using this. Really the only use-case for this is skipping specific errors when there is no chance of getting it fixed, i.e. legacy applications. Even then it's not a great idea. Rhys ________________________________ From: Maria-discuss <maria-discuss-bounces+rhys.campbell=swisscom.com@lists.launchpad.net> on behalf of Alessandro Ren <dirty.ren@gmail.com> Sent: 04 July 2018 14:50:29 Cc: maria-discuss@lists.launchpad.net Subject: Re: [Maria-discuss] Replication Problem About replication, I dont let it stop on duplicate keys, I prefer to use slave-skip-errors = all and let the primary key on the tables take care of duplicate records. Replication will only stop if something is corrupted, which may be your case. Some position the the binlog got lost in the power outage and you will have to reposition the master slave binlog, probably loosing some data on one side. []s. On Wed, Jul 4, 2018 at 8:02 AM, Guillaume Lefranc <guillaume@adishatz.net<mailto:guillaume@adishatz.net>> wrote: Hi Thomas, Could you please show the output of SHOW SLAVE STATUS and SHOW VARIABLES LIKE 'gtid%' ? Thanks Guillaume Lefranc signal18.io<http://signal18.io> consulting Le mer. 4 juil. 2018 à 12:31, Thomas Plant <thomas@plant.systems> a écrit : Hello, I'd have a question about MariaDB 10.1 replication, which has been interrupted by a power outage in our datacenter. We started the master server and had a duplicate id in a table on the slave, so replication stopped. We did not have time to adjust this at the moment, there were a lot of other systems to start and check if all was alright. Today I had time to look at the error, removed the duplicate ID from the table and started the slave thread again using 'start slave;'. But now I get another error: Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-2-2948175468, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions' Can this be resolved? Or will I have to start replication from scratch? Greetings, Thomas _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net<mailto:maria-discuss@lists.launchpad.net> Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net<mailto:maria-discuss@lists.launchpad.net> Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Thomas Plant <thomas@plant.systems> writes:
Today I had time to look at the error, removed the duplicate ID from the table and started the slave thread again using 'start slave;'.
But now I get another error:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-2-2948175468, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions'
So it seems you are using MariaDB Global Transaction ID with MASTER_USE_GTID=current_pos, and you forgot to do the duplicate ID removal under `SET sql_log_bin=0`. The easiest solution is probably to CHANGE MASTER TO MASTER_USE_GTID=slave_pos. This should make the slave ignore the local transaction and just connect to the master using the last replicated position. (current_pos tells the MariaDB server that you expect any local transactions on the slave to also be replicated to other servers, hence the error. current_pos is appropriate for an earlier master that is turned into a slave, but not for a slave where local "fixup" transactions ended up in the binlog). Hope this helps, - Kristian.
participants (5)
-
Alessandro Ren
-
Guillaume Lefranc
-
Kristian Nielsen
-
Rhys.Campbell@swisscom.com
-
Thomas Plant