[Maria-discuss] Galera cluster with asynchronous slave
Hi!

First off, let me show you what we are trying to achieve, as a picture says more than a thousand words.

[Image: galera]

I'm currently working on a high-availability project where we want to use MariaDB 10 and Galera as the backend. Running this system as it is with HAProxy and keepalived works great. We are able to write to the VIP that is moved between the nodes, and HAProxy takes care of redirecting to the server with the least connections.

The problem comes when we want to do standard master/slave replication from the cluster to an external slave. The slave is set up to connect to the VIP (I have also tested connecting directly to the HAProxy IP), and Using_gtid is set to Slave_pos. However, after some time, once the connection changes to a different node through HAProxy, the following error occurs:

Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1-422, which is not in the master's binlog'

And the Slave_IO_State shows that it's no longer in sync.

I have run SELECT @@GLOBAL.gtid_slave_pos; to check what the current GTID for each node is, and they all return 1-1-2145. However, sometimes if I add a lot of data, that value is different on some nodes, which is why I think the slave gets confused.

On the slave, when activating using_gtid=slave_pos, the following gtid_IO_pos appears: 1-1-2464,2-3-420,3-1-422

From what I have read, this should be somewhat correct, as the first value is the server id. However, in the config I have specified that node 1 has server id 1, node 2 has id 2 and so on, and that the same goes for gtid_domain_id. Is this the correct setup, or do the nodes need to have the same server-id or gtid_domain_id?

Surely there must be a good way to solve this? Is the system not built to handle an asynchronous slave replicating from one random node?

Hope I can get some good feedback on this. Just as a side note, I'm fairly new to MariaDB and Galera Cluster, so be gentle :)

With best regards
Johnny Antonsen
Ahoi Johnny, On Thu, Jul 03, 2014 at 02:16:26PM +0200, Johnny Antonsen wrote:
Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1-422, which is not in the master's binlog'
And the Slave_IO_State shows that it's no longer in sync.
I have run SELECT @@GLOBAL.gtid_slave_pos; to check what the current GTID for each node is, and they all return: 1-1-2145, however, sometimes if I add a lot of data, that value is different on some nodes, which is why I think the slave gets confused.
Using Galera there is no different Data on the nodes.
On the slave, when activating using_gtid=slave_pos, the following gtid_IO_pos appear: 1-1-2464,2-3-420,3-1-422
Why are you using different domain-ids?
From what I have read, this should be somewhat correct, as the first value is the server id. However, in the config I have specified that node 1 has server id 1, node 2 has id 2 and so on, and that the same goes for gtid_domain_id. Is this the correct setup or do the nodes need to have the same server-id or gtid_domain_id?
The second value is the server-id.
Surely there must be a good way to solve this? Is the system not built to handle an asynchronous slave replicating from one random node?
I don't know what you are doing. All I can say is that I'm also running MariaDB GTID slaves and it works. I'm not even sure if domain-id matters (I haven't set them at all). Be sure to have log_slave_updates and log_bin enabled.
Regards
Erkan
--
above the borders, freedom must surely be cloudless
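To make the GTID layout discussed above concrete, here is a minimal Python sketch that splits the slave position quoted in this thread into its three fields (domain ID, server ID, sequence number). The `Gtid` type and `parse_gtid_list` helper are names made up for illustration; they are not part of MariaDB or any client library.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int  # first field: replication domain (gtid_domain_id)
    server_id: int  # second field: server_id of the originating server
    seq_no: int     # third field: increasing sequence number


def parse_gtid_list(text: str) -> List[Gtid]:
    """Parse a comma-separated GTID position such as '1-1-2464,2-3-420'."""
    gtids = []
    for item in text.split(","):
        domain, server, seq = (int(part) for part in item.strip().split("-"))
        gtids.append(Gtid(domain, server, seq))
    return gtids


# The position reported on the slave in this thread.
slave_pos = parse_gtid_list("1-1-2464,2-3-420,3-1-422")
for g in slave_pos:
    print(f"domain={g.domain_id} server={g.server_id} seq={g.seq_no}")

# Collect the distinct domain IDs the slave is tracking.
domains = sorted({g.domain_id for g in slave_pos})
print(domains)
```

Read this way, the slave position contains three different domain IDs (1, 2 and 3), so the slave is tracking three separate streams instead of one.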
On 04. juli 2014 10:44, erkan yanar wrote:
Ahoi Johnny, Ahoi there :)
On Thu, Jul 03, 2014 at 02:16:26PM +0200, Johnny Antonsen wrote:
Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1-422, which is not in the master's binlog'
And the Slave_IO_State shows that it's no longer in sync.
I have run SELECT @@GLOBAL.gtid_slave_pos; to check what the current GTID for each node is, and they all return: 1-1-2145, however, sometimes if I add a lot of data, that value is different on some nodes, which is why I think the slave gets confused. Using Galera there is no different Data on the nodes.
On the slave, when activating using_gtid=slave_pos, the following gtid_IO_pos appear: 1-1-2464,2-3-420,3-1-422 Why are you using different domain-ids?
From what this documentation says, it is recommended to use different domain-ids: https://mariadb.com/kb/en/mariadb/mariadb-documentation/replication-cluster-...
Here it says: "In such setups, each active master must be configured with its own distinct replication domain ID, gtid_domain_id. The binlog will then in effect consists of multiple independent streams, one per active master. Within one replication domain, binlog order is always the same on every server."
And as I'm trying to run a slave from multiple masters, this relates to my current setup, doesn't it?
From what I have read, this should be somewhat correct, as the first value is the server id. However, in the config I have specified that node 1 has server id 1, node 2 has id 2 and so on, and that the same goes for gtid_domain_id. Is this the correct setup or do the nodes need to have the same server-id or gtid_domain_id?
The second value is the server-id.
Ok, so that means that each value on the various servers in a Galera cluster will be unique, like node 1 will have GTID 1-1-xxx and node 2 will have 1-2-xxx and so on, according to what you mention further up about domain IDs being unique?
Surely there must be a good way to solve this? Is the system not built to handle an asynchronous slave replicating from one random node?
I don't know what you are doing. All I can say is that I'm also running MariaDB GTID slaves and it works. I'm not even sure if domain-id matters (I haven't set them at all). Be sure to have log_slave_updates and log_bin enabled.
What I'm trying to do is actually pretty simple when you think about it. I have three servers running mariadb and being in a galera cluster. Each server has haproxy and keepalived running to move a virtual ip over and haproxy for checking if the actual service is up and running. On another site I have a mariadb server running with master set to the virtual ip assigned by keepalived. All this server has to do is replicate data from the mysql server it reaches once it connects. This works fine when it reaches the first server, but once it jumps to the next server I get a message saying that the GTID is not in the current binlog. The using_gtid value is set to slave_pos. log_slave_updates is enabled on all three servers running galera, and so is binlog using ROW. Hope this explains a little more on what I'm trying to achieve.
Regards Erkan
Regards Johnny
On Fri, Jul 04, 2014 at 02:56:56PM +0200, Johnny Antonsen wrote:
On 04. juli 2014 10:44, erkan yanar wrote:
Ahoi Johnny, Ahoi there :)
On Thu, Jul 03, 2014 at 02:16:26PM +0200, Johnny Antonsen wrote:
Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1-422, which is not in the master's binlog'
And the Slave_IO_State shows that it's no longer in sync.
I have run SELECT @@GLOBAL.gtid_slave_pos; to check what the current GTID for each node is, and they all return: 1-1-2145, however, sometimes if I add a lot of data, that value is different on some nodes, which is why I think the slave gets confused. Using Galera there is no different Data on the nodes.
On the slave, when activating using_gtid=slave_pos, the following gtid_IO_pos appear: 1-1-2464,2-3-420,3-1-422 Why are you using different domain-ids?
From what this documentation says, it is recommended to use different domain-ids https://mariadb.com/kb/en/mariadb/mariadb-documentation/replication-cluster-...
Here it says " In such setups, each active master must be configured with its own distinct replication domain ID, gtid_domain_id. The binlog will then in effect consists of multiple independent streams, one per active master. Within one replication domain, binlog order is always the same on every server."
Galera orders your commits. You don't want to have your transactions ordered per domain-id. You want them to be ordered on all nodes.
And as I'm trying to run a slave from multiple masters, this relates to my current setup doesn't it?
From what I have read, this should be somewhat correct, as the first value is the server id. However, in the config I have specified that node 1 has server id 1, node 2 has id 2 and so on, and that the same goes for gtid_domain_id. Is this the correct setup or do the nodes need to have the same server-id or gtid_domain_id?
The second value is the server-id.
Ok, so that means that each value on the various servers in a galera clusters will be unique, like node 1 will have gtid 1-1-xxx and node 2 will have 1-2-xxx and so on? According to what you mention further up about domain id's being unique.
The important point is the third part: the monotonically increasing sequence number.
Surely there must be a good way to solve this? Is the system not built to handle an asynchronous slave replicating from one random node?
I don't know what you are doing. All I can say is that I'm also running MariaDB GTID slaves and it works. I'm not even sure if domain-id matters (I haven't set them at all). Be sure to have log_slave_updates and log_bin enabled.
What I'm trying to do is actually pretty simple when you think about it. I have three servers running mariadb and being in a galera cluster. Each server has haproxy and keepalived running to move a virtual ip over and haproxy for checking if the actual service is up and running. On another site I have a mariadb server running with master set to the virtual ip assigned by keepalived. All this server has to do is replicate data from the mysql server it reaches once it connects.
This works fine when it reaches the first server, but once it jumps to the next server I get a message saying that the GTID is not in the current binlog. The using_gtid value is set to slave_pos.
So have you checked if the events are in the binlog?
log_slave_updates is enabled on all three servers running galera, and so is binlog using ROW.
Hope this explains a little more on what I'm trying to achieve.
That's what I do myself. Right now without a VIP, just doing a CHANGE MASTER. No problem at all.
Regards
Erkan
--
above the borders, freedom must surely be cloudless
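The question above, whether the events are actually in the binlog, can be pictured as a simple membership check. The sketch below is only an illustration of that idea, not the server's real lookup logic: `missing_from_binlog` is a made-up helper, and the `node_binlog` list is hypothetical sample data standing in for GTIDs one might read off a node's binlog (for instance from SHOW BINLOG EVENTS output).

```python
from typing import Iterable, List


def missing_from_binlog(slave_pos: str, binlog_gtids: Iterable[str]) -> List[str]:
    """Return the GTIDs in slave_pos that do not occur in binlog_gtids."""
    present = set(binlog_gtids)
    wanted = [g.strip() for g in slave_pos.split(",") if g.strip()]
    return [g for g in wanted if g not in present]


# Position reported by the slave in this thread.
slave_pos = "1-1-2464,2-3-420,3-1-422"

# HYPOTHETICAL sample: GTIDs read out of a single node's binlog.
node_binlog = ["1-1-2463", "1-1-2464", "2-3-420"]

# GTIDs the slave asks for that this node's binlog does not contain.
print(missing_from_binlog(slave_pos, node_binlog))
```

With this sample data the check reports 3-1-422 as absent, which mirrors the shape of the error 1236 message quoted earlier.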
On 04. juli 2014 17:24, erkan yanar wrote:
On 04. juli 2014 10:44, erkan yanar wrote:
Ahoi Johnny, Ahoi there :) On Thu, Jul 03, 2014 at 02:16:26PM +0200, Johnny Antonsen wrote:
Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1-422, which is not in the master's binlog'
And the Slave_IO_State shows that it's no longer in sync.
I have run SELECT @@GLOBAL.gtid_slave_pos; to check what the current GTID for each node is, and they all return: 1-1-2145, however, sometimes if I add a lot of data, that value is different on some nodes, which is why I think the slave gets confused. Using Galera there is no different Data on the nodes.
On the slave, when activating using_gtid=slave_pos, the following gtid_IO_pos appear: 1-1-2464,2-3-420,3-1-422 Why are you using different domain-ids? From what this documentation says, it is recommended to use different domain-ids https://mariadb.com/kb/en/mariadb/mariadb-documentation/replication-cluster-...
Here it says " In such setups, each active master must be configured with its own distinct replication domain ID, gtid_domain_id. The binlog will then in effect consists of multiple independent streams, one per active master. Within one replication domain, binlog order is always the same on every server."
Galera orders your commits. You don't want to have your transactions ordered per domain-id. You want them to be ordered on all nodes.
So just to be clear:
server1 - server-id 1 and gtid_domain_id 1
server2 - server-id 2 and gtid_domain_id 1
Am I on the right track?
And as I'm trying to run a slave from multiple masters, this relates to my current setup doesn't it?
From what I have read, this should be somewhat correct, as the first value is the server id. However, in the config I have specified that node 1 has server id 1, node 2 has id 2 and so on, and that the same goes for gtid_domain_id. Is this the correct setup or do the nodes need to have the same server-id or gtid_domain_id?
The second value is the server-id.
Ok, so that means that each value on the various servers in a Galera cluster will be unique, like node 1 will have GTID 1-1-xxx and node 2 will have 1-2-xxx and so on, according to what you mention further up about domain IDs being unique?
The important point is the third part: the monotonically increasing sequence number.
Surely there must be a good way to solve this? Is the system not built to handle an asynchronous slave replicating from one random node?
I don't know what you are doing. All I can say is that I'm also running MariaDB GTID slaves and it works. I'm not even sure if domain-id matters (I haven't set them at all). Be sure to have log_slave_updates and log_bin enabled.
What I'm trying to do is actually pretty simple when you think about it. I have three servers running mariadb and being in a galera cluster. Each server has haproxy and keepalived running to move a virtual ip over and haproxy for checking if the actual service is up and running. On another site I have a mariadb server running with master set to the virtual ip assigned by keepalived. All this server has to do is replicate data from the mysql server it reaches once it connects.
This works fine when it reaches the first server, but once it jumps to the next server I get a message saying that the GTID is not in the current binlog. The using_gtid value is set to slave_pos. So have you checked if the events are in the binlog?
Yes, I did check the binlog for further details on the events, and from what I can see the events show up on each Galera server. On the async slave, however, replication seems to catch up and sync with server1 once the slave has been stopped, reset and started, but when it jumps to Master_Server_Id: 2 it fails with the following message:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
And then it stops running until I reset it. I have found some results online on the error, but they all refer to MySQL and do not use GTID, which means they simply redefine the binlog file and position manually before starting the slave. This, however, defeats the purpose of using GTID, from what I've understood.
log_slave_updates is enabled on all three servers running galera, and so is binlog using ROW.
Hope this explains a little more on what I'm trying to achieve.
That's what I do myself. Right now without a VIP, just doing a CHANGE MASTER. No problem at all.
So how do you automate the CHANGE MASTER process? Going through the VIP for replication doesn't seem to work for me, so a little hint on how to do this with CHANGE MASTER would be a great help towards solving my setup.
Regards Erkan
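The thread does not show how the CHANGE MASTER step is actually done, so the following is only a guess at what a scripted version might look like. It is a minimal Python sketch that builds the SQL statements for repointing a GTID slave at a specific node. The helper name `repoint_slave_sql` and the host name in the example are hypothetical; the statements themselves (STOP SLAVE, CHANGE MASTER TO ... MASTER_USE_GTID=slave_pos, START SLAVE) are standard MariaDB 10 syntax. Sending them to the slave is left to whatever client is in use, and repointing alone does not fix GTIDs that differ between nodes.

```python
import re
from typing import List


def repoint_slave_sql(host: str, port: int = 3306) -> List[str]:
    """Build the statements to point a GTID slave at another master."""
    # Accept only plain host names or IP addresses, so that nothing
    # unexpected ends up inside the quoted string literal.
    if not re.fullmatch(r"[A-Za-z0-9.\-]+", host):
        raise ValueError(f"unexpected characters in host: {host!r}")
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={int(port)}, "
        "MASTER_USE_GTID=slave_pos",
        "START SLAVE",
    ]


# Hypothetical node name, for illustration only.
for statement in repoint_slave_sql("node2.example"):
    print(statement + ";")
```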
On Fri, Jul 4, 2014 at 5:56 AM, Johnny Antonsen <johnny@uniweb.no> wrote:
On 04. juli 2014 10:44, erkan yanar wrote:
Ahoi Johnny,
Ahoi there :)
On Thu, Jul 03, 2014 at 02:16:26PM +0200, Johnny Antonsen wrote:
Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1-422, which is not in the master's binlog'
And the Slave_IO_State shows that it's no longer in sync.
I have run SELECT @@GLOBAL.gtid_slave_pos; to check what the current GTID for each node is, and they all return: 1-1-2145, however, sometimes if I add a lot of data, that value is different on some nodes, which is why I think the slave gets confused.
Using Galera there is no different Data on the nodes.
On the slave, when activating using_gtid=slave_pos, the following gtid_IO_pos appear: 1-1-2464,2-3-420,3-1-422
Why are you using different domain-ids?
From what this documentation says, it is recommended to use different domain-ids https://mariadb.com/kb/en/mariadb/mariadb-documentation/replication-cluster-...
Here it says " In such setups, each active master must be configured with its own distinct replication domain ID, gtid_domain_id. The binlog will then in effect consists of multiple independent streams, one per active master. Within one replication domain, binlog order is always the same on every server."
And as I'm trying to run a slave from multiple masters, this relates to my current setup doesn't it?
No. A multi-master setup and a Galera cluster are completely different things. You should think of all servers in your Galera cluster as if they were a single server, and they should all have the same domain_id.
From what I have read, this should be somewhat correct, as the first value is the server id. However, in the config I have specified that node 1 has server id 1, node 2 has id 2 and so on, and that the same goes for gtid_domain_id. Is this the correct setup or do the nodes need to have the same server-id or gtid_domain_id?
The second value is the server-id.
Ok, so that means that each value on the various servers in a galera clusters will be unique, like node 1 will have gtid 1-1-xxx and node 2 will have 1-2-xxx and so on? According to what you mention further up about domain id's being unique.
Surely there must be a good way to solve this? Is the system not built to handle an asynchronous slave replicating from one random node?
I don't know what you are doing. All I can say is that I'm also running MariaDB GTID slaves and it works. I'm not even sure if domain-id matters (I haven't set them at all). Be sure to have log_slave_updates and log_bin enabled.
What I'm trying to do is actually pretty simple when you think about it. I have three servers running mariadb and being in a galera cluster. Each server has haproxy and keepalived running to move a virtual ip over and haproxy for checking if the actual service is up and running. On another site I have a mariadb server running with master set to the virtual ip assigned by keepalived. All this server has to do is replicate data from the mysql server it reaches once it connects.
This works fine when it reaches the first server, but once it jumps to the next server I get a message saying that the GTID is not in the current binlog. The using_gtid value is set to slave_pos.
log_slave_updates is enabled on all three servers running galera, and so is binlog using ROW.
Hope this explains a little more on what I'm trying to achieve.
Regards Erkan
Regards Johnny
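The advice in this reply, that every node in the Galera cluster should use the same domain_id, can be expressed as a small consistency check. This is a minimal sketch assuming the gtid_domain_id values have already been collected per node by some other means. The node names and the `domain_id_problems` helper are hypothetical, and the "original" values follow the per-node numbering described in the first post.

```python
from typing import Dict, List


def domain_id_problems(nodes: Dict[str, int]) -> List[str]:
    """Return one line per node if the nodes do not share a single domain ID."""
    if len(set(nodes.values())) <= 1:
        return []  # all nodes agree, nothing to report
    return [f"{name}: gtid_domain_id={domain}"
            for name, domain in sorted(nodes.items())]


# Per-node domain IDs as described in the first post (one per node).
original = {"node1": 1, "node2": 2, "node3": 3}
# The layout suggested in this reply: one shared domain ID.
suggested = {"node1": 1, "node2": 1, "node3": 1}

print(domain_id_problems(original))
print(domain_id_problems(suggested))
```

The first call lists all three nodes because their domain IDs differ; the second returns an empty list.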
Participants (3): erkan yanar, Johnny Antonsen, Pavel Ivanov