[Maria-discuss] Questions regarding Galera Cluster
Hello.

I just saw this YouTube video: https://www.youtube.com/watch?v=hYSGsgIyshs It was really informative and it seems like a simple and easy setup.

Now, I have a few questions regarding maintaining a Galera Cluster, which I hope to get some answers on.

1. A cluster should have an odd number of nodes to avoid brain split. In case I hit the wall with insufficient disk space and I need to scale up my servers, which steps would be appropriate? My initial thought would be to take down one node at a time and add a larger node, then wait for it to have replicated from the other nodes. Would the (now even) number of nodes handle this? And what will happen when I do this to the bootstrapped server (the one the others connect to)?

2. According to the video, a slave replication of the cluster would be a way of securing my data in case of disaster. Would creating dumps regularly not suffice? And secondly, in case of hardware failure on one of the nodes, bringing in a new node (probably asking the same as before) would hopefully just replicate the data when it joins again?

Thank you for your time.

-- Dan Storm
----- On 12 Jun, 2015, at 6:36 PM, Dan Storm storm@err0r.dk wrote:
Hello.
I just saw this YouTube video: https://www.youtube.com/watch?v=hYSGsgIyshs It was really informative and it seems like a simple and easy setup.
Now, I have a few questions regarding maintaining a Galera Cluster, which I hope to get some answers on.
1. A cluster should have an odd number of nodes to avoid brain split.
Split brain is avoided with 3+ nodes. You won't get split brain with an even number of nodes either, as the quorum is more than half of the nodes (n/2 + 1). It does mean, however, that exactly half of the nodes can't form a quorum.
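To make the quorum arithmetic above concrete, here is a minimal Python sketch. It assumes every node carries the same weight (Galera also supports weighted votes, which this ignores), and the function names are made up for illustration only.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest node count that is strictly more than half of the cluster."""
    return total_nodes // 2 + 1


def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """True if this group of nodes forms a strict majority."""
    return reachable_nodes >= quorum_size(total_nodes)


# Print the quorum size for a few small cluster sizes.
for n in (2, 3, 4, 5):
    print(n, "nodes -> quorum of", quorum_size(n))
```

With 4 nodes the quorum is 3, so an even 2/2 split leaves neither half with a quorum. With 3 nodes the quorum is 2, so the cluster survives the loss of one node.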
In case I hit the wall with insufficient disk space and I need to scale up my servers, which steps would be appropriate? My initial thought would be to take down one node at a time and add a larger node, then wait for it to have replicated from the other nodes. Would the (now even) number of nodes handle this?
Bit vague. Stick to your even (>2) nodes.
And what will happen when I do this to the bootstrapped server (the one the others connect to)?
same as any cluster initialisation
2. According to the video, a slave replication of the cluster would be a way of securing my data in case of disaster. Would creating dumps regularly not suffice? And secondly, in case of hardware failure on one of the nodes, bringing in a new node (probably asking the same as before) would hopefully just replicate the data when it joins again?
Look up the important conceptual difference between backup and DR. Make sure what you have meets your requirements for doing a restore (related to backup) in a workable time covering the business requirements.

Backup covers things like a dumb user doing a DELETE without a WHERE clause, SQL injection and general data corruption that will affect all nodes.

The DR recovery mechanism (hardware failure etc.) covers your calculated business and technical risks.

Hint: you may need more than one mechanism.

--
Daniel Black, Engineer @ Open Query (http://openquery.com.au)
Remote expertise & maintenance for MySQL/MariaDB server environments.
Hi,

On Sat, Jun 13, 2015 at 7:32 AM, Daniel Black <daniel.black@openquery.com.au> wrote:
1. A cluster should have an odd number of nodes to avoid brain split.
Split brain is avoided with 3+ nodes. You won't get split brain with even nodes as the quorum is 1/2 + 1 nodes. i.e. It does mean 1/2 the nodes can't form a quorum.
Just to add a little more to this, "split brain" is when a node (or group of nodes) thinks to itself "was it me or those guys, who left? There's not enough people to concur with". This normally happens with networking problems between two datacenters or if the database server crashes.

So in your situation, two nodes running while the third one is being upgraded is fine. However, during the time the third node is "down", you will get a split brain scenario *if* one of the online nodes dies.

The larger case of split brain is when your networking layer is more prone to problems. It's unlikely that nodes on the same switch will lose connectivity to each other, but if the nodes are spread across regions, your chances of transient connection issues go up a lot.

-will

-- Will Fong, Support Engineer MariaDB Corporation
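The upgrade scenario above (three nodes, one taken down on purpose, then one of the remaining two failing) can be walked through with a small Python sketch. It rests on two assumptions that are not spelled out in this thread: that a node shut down cleanly leaves the membership, so the majority is then counted against the smaller cluster, and that a crashed node still counts as a member. The `Cluster` class is purely illustrative and models equal node weights only.

```python
class Cluster:
    """Toy model of majority voting with equally weighted nodes."""

    def __init__(self, size: int) -> None:
        self.members = size  # nodes counted when computing the majority
        self.alive = size    # nodes that are up and reachable

    def graceful_shutdown(self) -> None:
        # Assumption: a clean leave removes the node from the membership.
        self.members -= 1
        self.alive -= 1

    def crash(self) -> None:
        # A crashed node is gone but is still counted as a member.
        self.alive -= 1

    def is_primary(self) -> bool:
        # The surviving group needs strictly more than half of the members.
        return self.alive * 2 > self.members


cluster = Cluster(3)
cluster.graceful_shutdown()   # node 3 taken down for the upgrade
print(cluster.is_primary())   # True: 2 of 2 members are up
cluster.crash()               # one of the two remaining nodes dies
print(cluster.is_primary())   # False: 1 of 2 is not a majority
```

Under these assumptions the last surviving node loses its majority, which is why the window while a node is out for maintenance is the risky part of a rolling replacement.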
Hi,

On Sun, Jun 14, 2015 at 9:34 PM, Will Fong <will.fong@mariadb.com> wrote:
Just to add a little more to this, "split brain" is when a node (or group of nodes) thinks to itself "was it me or those guys, who left? There's not enough people to concur with".

Not to be pedantic, but you are describing the quorum, not a split brain. The quorum prevents split brain.

A split brain is when nodes are split, usually due to a network partition, in such a way that both sides of the partition believe that they are the master. Each side writes conflicting transactions, and when the sides converge again, chaos will ensue unless the system is partition tolerant and self-healing through a conflict resolution mechanism, since both sides now have different pictures of the data.

As Will suggested, the weighted quorum mechanism prevents this from happening. If you have 5 nodes, at least 3 nodes (a majority) must be able to communicate. If no partition contains 3 nodes, no nodes will be able to accept changes, or even answer queries.

--Justin
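The majority rule discussed in this thread can also be checked per network partition. The following Python sketch is only an illustration under the assumption of equally weighted nodes; `primary_partition` is a hypothetical helper, not part of Galera.

```python
from typing import List, Optional


def primary_partition(partition_sizes: List[int]) -> Optional[int]:
    """Return the index of the partition holding a strict majority, if any.

    partition_sizes lists how many nodes ended up on each side of a
    network split. At most one side can hold more than half of the nodes.
    """
    total = sum(partition_sizes)
    for index, size in enumerate(partition_sizes):
        if size * 2 > total:
            return index
    return None


print(primary_partition([3, 2]))     # 0: the 3-node side keeps working
print(primary_partition([2, 2, 1]))  # None: no side has 3 of 5 nodes
print(primary_partition([2, 2]))     # None: an even split has no majority
```

Because two sides can never both hold more than half of the nodes, two sides can never both accept writes, which is the property that rules out split brain.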
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Participants (4): Dan Storm, Daniel Black, Justin Swanhart, Will Fong