On Fri, Nov 15, 2013 at 1:28 AM, Alex Yurchenko <alexey.yurchenko@codership.com> wrote:
> Please pardon this arrogant interruption of your discussion and shameless self-promotion, but I just could not help noticing that Galera replication was designed specifically with these goals in mind. And it does seem to achieve them better than semi-sync plugin. Have you considered Galera? What makes you prefer semi-sync over Galera, if I may ask?
To be honest, I had never looked at how Galera works before. I've looked at it now, and I don't see how it can fit our needs. The major disadvantages I see immediately:

1. Synchronous replication. That means the client must wait while the transaction is applied on all nodes, which adds unacceptably high latency to every transaction. And what if there's a network blip and some node becomes inaccessible? Will all writes just freeze? I see the statement that "failed nodes automatically excluded from the cluster", but to do that the cluster must wait for some timeout in case it is indeed just a network blip and the node will "quickly" reconnect. And every client must wait for the cluster to decide what happened to that one node.

2. Let's say a node fell out of the cluster for 5 minutes and then reconnected. I guess it will be treated as a "new node", it will trigger a state transfer, and the node will start downloading the whole database? And while it's trying to download, say, 500GB of data files, all other nodes (or maybe just the donor?) won't be able to change those files locally, and thus their memory consumption will blow up. That means they could quickly run out of memory, and the "new node" won't be able to finish its "initialization"...

3. It looks like there's a strong asymmetry in starting cluster nodes: the first one should be started with an empty wsrep_cluster_address, and all others should be started with the address of the first node. So I can't start all nodes uniformly and then issue some commands to connect them to each other. That's bad.

4. What's the transition path? How do I upgrade MySQL/MariaDB servers that replicate using regular replication to Galera? It looks like there's no such path, and the solution is to stop the world using regular replication and restart it using Galera. Sorry, I can't do that with our production systems.

I believe these problems are severe enough that we can't work with Galera.

Pavel