On Mon, Sep 30, 2013 at 11:47 PM, Michael Widenius <monty@askmonty.org> wrote:
Pavel> Kristian,
Pavel>
Pavel> Currently MariaDB (as well as MySQL of all previous versions) has a
Pavel> very big problem related to replicating same server_id. There is
Pavel> --replicate-same-server-id flag which as I understand (when set to 0)
Pavel> controls two things:
Pavel> 1) It doesn't allow slave to connect to a master with the same server_id.
Pavel> 2) Slave ignores all binlog events in the replication stream that have
Pavel> the same server_id as slave.
Pavel> And this flag cannot be set to 1 when --log-slave-updates is used. And
Pavel> that is a big problem.

Pavel> Consider the following scenario: let's say we have two servers S1
Pavel> (master) and S2 (slave). Let's say at some moment in time they are
Pavel> completely in sync and you bring down S2 to take cold backup (you can
Pavel> even include binlogs in it). Then you bring it back up, S1 is still
Pavel> master. Now you execute some transactions, then you do a failover,
Pavel> make S2 master and execute some more transactions.
The above is all ok.
Pavel> Then you bring down
Pavel> S1, restore it from the backup taken earlier and connect to replicate
Pavel> from S2 again.
The above is not ok and has never been supported before in MySQL/MariaDB.
What one should do is to use S2 to set up a new S1 or change the server id on S1.

Unfortunately both suggestions are unacceptable in highly available production environments.
- Using S2 to set up a new S1 means we have to bring the database down completely for a prolonged period of time, which doesn't line up with high availability at all.
- Changing server_id for S1 means we have to remember all server ids that were ever a master for the database. When every master failover and server restart is a manual process this could be feasible, but in automated environments it is virtually impossible.

The reason is that you can't logically get the above to work safely with server ids in all scenarios.
An example:
Assume you have a ring-replication setup between S1 and S2.

I believe circular replication is ill-advised and that it's impossible to build any sane production system based on it (and I would be glad to hear about any examples to the contrary). So I would love to see some flag that disables any possibility of circular replication, along with removing any features that exist only to facilitate such a configuration...

If you now restore S1 to an older state, you can't know which of the events S1 gets from S2 have already been applied.
Here is an example:
A) S1 sends one event S1.1 to S2
B) backup
C) S1 sends one event, S1.2 to S2
D) S2 sends events S2.1, S1.1 and S1.2 to S1
If you restore S1 to state B and start replication, the data from D) will be sent to S1, but based on server_id alone it's not possible to know that S1.1 has to be skipped and S1.2 has to be executed.
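To make the A) to D) example concrete, here is a minimal Python sketch of a slave that filters purely on server_id. It is a toy model, not server code: the `Event` type, the `apply_stream` function and the `seq` field are names invented for illustration, and the only behaviour modelled is "skip events carrying my own server_id unless told otherwise".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    server_id: int  # server that originally executed the event
    seq: int        # illustrative per-origin counter (hypothetical, not a real binlog field)


def apply_stream(own_server_id, already_applied, stream, replicate_same_server_id):
    """Return the events present on the slave after reading `stream`.

    The only filter is the server_id comparison; the slave has no other
    way to tell whether one of its own events was applied before.
    """
    result = list(already_applied)
    for ev in stream:
        if ev.server_id == own_server_id and not replicate_same_server_id:
            continue  # skipped purely because of its origin
        result.append(ev)
    return result


S1, S2 = 1, 2
s1_1, s1_2, s2_1 = Event(S1, 1), Event(S1, 2), Event(S2, 1)

restored_state = [s1_1]          # S1 restored to backup point B
stream_from_s2 = [s2_1, s1_1, s1_2]  # step D

# Filtering on: S1.1 is correctly skipped, but S1.2 is lost as well.
skipped = apply_stream(S1, restored_state, stream_from_s2, False)

# Filtering off: S1.2 is recovered, but S1.1 is applied a second time.
replayed = apply_stream(S1, restored_state, stream_from_s2, True)
```

Neither setting produces the wanted result (S1.1 once, S2.1 once, S1.2 once), which is the point of the example.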
With GTID we can do things better.
Are you suggesting that, if slaves always connect to the master using GTID, something can be implemented that will allow replaying binlog events with the same server id without turning on the --replicate-same-server-id flag?
knielsen> Maybe in GTID strict mode we could make it an error if we are about to skip an
knielsen> event with our own server_id that has a higher seq_no than what we have in our
knielsen> binlog. Then we at least get safe behaviour in strict mode in non-ring
knielsen> topologies.
Wouldn't it be safe to just give a warning that we have found already applied events and then skip them?
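For reference, the check Kristian describes could look roughly like the Python sketch below. This is only my reading of the proposal, not existing server behaviour: `Gtid`, `should_apply`, `UnsafeSkipError` and the `last_binlogged_seq_no` argument are hypothetical names, and the sketch assumes a single replication domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int


class UnsafeSkipError(Exception):
    """Raised when skipping would drop an event that is not in our binlog."""


def should_apply(gtid, own_server_id, last_binlogged_seq_no, strict):
    """Return True to apply the event, False to skip it.

    Assumption: `last_binlogged_seq_no` is the highest seq_no found in our
    own binlog for this domain.
    """
    if gtid.server_id != own_server_id:
        return True   # event from another server: apply as usual
    if gtid.seq_no <= last_binlogged_seq_no:
        return False  # our own event, already in our binlog: safe to skip
    if strict:
        # Our own server_id, but newer than anything we have binlogged:
        # skipping it would silently lose data, so refuse.
        raise UnsafeSkipError(f"would skip unseen event {gtid}")
    return False      # non-strict: skip on server_id alone
```

With S1 restored to seq_no 1, an incoming event 0-1-1 is skipped quietly, while 0-1-2 raises in strict mode instead of being lost.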
Thank you,
Pavel