Re: [Maria-developers] Replicating same server_id problem

1 Oct 2013


      On Mon, Sep 30, 2013 at 11:47 PM, Michael Widenius <monty@askmonty.org> wrote:
...
Pavel> Kristian,
Pavel> Currently MariaDB (as well as MySQL of all previous versions) has a
Pavel> very big problem related to replicating same server_id. There is
Pavel> --replicate-same-server-id flag which as I understand (when set to 0)
Pavel> controls two things:
Pavel> 1) It doesn't allow slave to connect to a master with the same server_id.
Pavel> 2) Slave ignores all binlog events in the replication stream that have
Pavel> the same server_id as slave.
Pavel> And this flag cannot be set to 1 when --log-slave-updates is used. And
Pavel> that is a big problem.
Pavel> Consider the following scenario: let's say we have two servers S1
Pavel> (master) and S2 (slave). Let's say at some moment in time they are
Pavel> completely in sync and you bring down S2 to take cold backup (you can
Pavel> even include binlogs in it). Then you bring it back up, S1 is still
Pavel> master. Now you execute some transactions, then you do a failover,
Pavel> make S2 master and execute some more transactions.
The above is all ok.
Pavel> Then you bring down
Pavel> S1, restore it from the backup taken earlier and connect to replicate
Pavel> from S2 again.
The above is not ok and has never been supported before in MySQL/MariaDB.
What one should do is to use S2 to set up a new S1 or change the server id
on S1.
Unfortunately, both suggestions are unacceptable in highly available
production environments.
- Using S2 to set up a new S1 means we have to bring the database down
completely for a prolonged period of time, which doesn't line up with
high availability at all.
- Changing server_id for S1 means we have to remember every server id
that was ever a master for the database. When every master failover and
server restart is a manual process this could be feasible, but in
automated environments this is virtually impossible.
...
The reason is that you can't logically get the above to work safely with
server ids in all scenarios.
An example:
Assume you have a ring-replication setup between S1 and S2.
I believe circular replication is ill-advised and that it's impossible
to build any sane production system based on it (and I would be glad
to hear about any examples to the contrary). So I would love to see
some flag that disables any possibility of circular replication, along
with removing any features that exist only to facilitate such a
configuration...
...
If you now restore S1 to an older state, you can't know which of the
S1 events you get back from S2 have already been applied.
Here is an example:
A) S1 sends one event S1.1 to S2
B) backup
C) S1 sends one event, S1.2 to S2
D) S2 sends events S2.1, S1.1 and S1.2 to S1
If you restore S1 to state B and start replication, data from D) will
be sent to S1, but based on server_id alone it's not possible to know
that S1.1 has to be skipped and S1.2 has to be executed.
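To make the A) to D) example concrete, here is a minimal Python sketch of it. This is not MariaDB code: the Event tuple, the filter_by_server_id function and the numeric server ids 1 and 2 for S1 and S2 are all made up for illustration, and the filter only imitates the "ignore events with my own server_id" behaviour described earlier in the thread.

```python
from collections import namedtuple

# Hypothetical model of a binlog event: originating server_id plus a label.
Event = namedtuple("Event", ["server_id", "name"])


def filter_by_server_id(own_server_id, stream):
    """Drop every event that carries our own server_id (illustrative only)."""
    return [ev for ev in stream if ev.server_id != own_server_id]


# State of S1 after restoring backup B: only S1.1 has been executed.
restored_s1 = {"S1.1"}

# Step D: what S2 sends back to S1 in the ring.
stream_from_s2 = [Event(2, "S2.1"), Event(1, "S1.1"), Event(1, "S1.2")]

applied = filter_by_server_id(1, stream_from_s2)
restored_s1.update(ev.name for ev in applied)

# S1.2 was dropped together with S1.1, so it never reaches the restored S1.
missing = {"S1.1", "S1.2", "S2.1"} - restored_s1
print(sorted(missing))
```

The filter has only the server_id to go on, so S1.1 (already applied) and S1.2 (lost by the restore) look identical to it and both are skipped.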
With GTID we can do things better.
Are you suggesting that currently, if slaves always connect to the master
using GTID, something can be implemented that would allow replaying
binlog events with the same server id without turning on the
--replicate-same-server-id flag?
...
knielsen> Maybe in GTID strict mode we could make it an error if we are about to skip an
knielsen> event with our own server_id that has a higher seq_no than what we have in our
knielsen> binlog. Then we at least get safe behaviour in strict mode in non-ring
knielsen> topologies.
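For illustration, the check proposed above could look roughly like the following Python sketch. It is only one reading of the proposal, not server code: the decide function, the OwnEventAheadOfBinlog exception and the highest_own_seq_no parameter are hypothetical names, and the sketch assumes a single replication domain so that "what we have in our binlog" reduces to one number.

```python
from collections import namedtuple

# MariaDB GTIDs are written as domain_id-server_id-seq_no.
Gtid = namedtuple("Gtid", ["domain_id", "server_id", "seq_no"])


class OwnEventAheadOfBinlog(Exception):
    """Raised when we would skip one of our own events that we do not have."""


def decide(own_server_id, highest_own_seq_no, gtid):
    """Return "apply" or "skip"; raise if skipping would lose an event."""
    if gtid.server_id != own_server_id:
        return "apply"
    if gtid.seq_no > highest_own_seq_no:
        raise OwnEventAheadOfBinlog(
            "refusing to skip %d-%d-%d: binlog only reaches seq_no %d"
            % (gtid.domain_id, gtid.server_id, gtid.seq_no, highest_own_seq_no)
        )
    return "skip"


# S1 (server_id 1) restored from backup: its binlog ends at seq_no 1.
print(decide(1, 1, Gtid(0, 2, 3)))   # event originating on S2
print(decide(1, 1, Gtid(0, 1, 1)))   # own event, already in the binlog
try:
    decide(1, 1, Gtid(0, 1, 2))      # own event, newer than the binlog
except OwnEventAheadOfBinlog as exc:
    print("error:", exc)
```

In this reading the restored server stops with an error instead of silently dropping an event it no longer has, which is the "safe behaviour" the proposal is after.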
Wouldn't it be safe to just give a warning that we have found already
applied events and then skip them?
Thank you,
Pavel

Pavel Ivanov