Hi!
"Pavel" == Pavel Ivanov <pivanof@google.com> writes:
<cut>
What one should do is to use S2 to set up a new S1 or change the server id on S1.
Pavel> Unfortunately both advices are unacceptable in highly available
Pavel> production environments.
Pavel> - Using S2 to setup a new S1 means we have to bring down database
Pavel> completely for a prolonged period of time which doesn't line up with
Pavel> high availability at all.

The way people are doing it now:

- Taking a snapshot of the file system of S2 and using that as a base.
  This of course works only for some file systems and setups.
- Having an S3 that replicates from either S1 or S2. Taking this down
  and using it as a backup works for most.

Pavel> - Changing server_id for S1 means we have to remember all server ids
Pavel> that ever were a master for the database. When any master failover and
Pavel> server restart is a manual process this could be feasible, but in
Pavel> automated environments this is virtually impossible.

You only have to avoid those server_ids that are 'active' (i.e., in a
binary log file that you will read). If you rotate your binary log file
once a week, there should be many easy ways to assign and reuse
server_ids. But I agree that this is not a long-term solution that works
for everyone.
The reason is that you can't logically get the above to work safely with server ids in all scenarios.
An example:
Assume you have a ring-replication setup between S1 and S2.
Pavel> I believe the circular replication is ill-advised and it's impossible
Pavel> to build any sane production system based on it (and I would be glad
Pavel> to hear about any examples to the contrary). So I would love to see
Pavel> some flag that disables any possibility of circular replication along
Pavel> with removing any features that exist only to facilitate such
Pavel> configuration...

There are a LOT of MySQL and MariaDB users that are using circular
replication. As far as I know, Yahoo is using this to replicate coast
to coast.
If you now restore S1 to an older state, you can't know which of S1's own events that you get back from S2 have already been applied.
Here is an example:
A) S1 sends one event, S1.1, to S2
B) backup
C) S1 sends one event, S1.2, to S2
D) S2 sends events S2.1, S1.1 and S1.2 to S1
If you restore S1 to state B and start replication, the data from D) will be sent to S1, but based on the server id alone it's not possible to know that S1.1 has to be skipped and S1.2 has to be executed.
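
The ambiguity can be illustrated with a small Python sketch. This is not
server code: the event tuples and the replay() helper are hypothetical
names, and the only assumption is that the slave decides per event using
nothing but the originating server id.

```python
# Hypothetical model: an event is (origin_server_id, name).
# S1 was restored to backup point B, so it has applied S1.1 but not S1.2.
applied_on_s1 = {"S1.1"}
stream_from_s2 = [("S2", "S2.1"), ("S1", "S1.1"), ("S1", "S1.2")]

def replay(stream, own_id, replicate_same_server_id):
    """Return the events a slave would execute when filtering by server id only."""
    executed = []
    for origin, name in stream:
        if origin == own_id and not replicate_same_server_id:
            continue  # skip everything that originated on this server
        executed.append(name)
    return executed

# Skipping own events loses S1.2, which the restored S1 never applied.
skip_own = replay(stream_from_s2, "S1", replicate_same_server_id=False)
# Replaying own events re-executes S1.1, which is already in the backup.
replay_own = replay(stream_from_s2, "S1", replicate_same_server_id=True)

lost = [e for e in ("S1.1", "S1.2")
        if e not in applied_on_s1 and e not in skip_own]
duplicated = [e for e in replay_own if e in applied_on_s1]
print("lost when skipping:", lost)
print("duplicated when replaying:", duplicated)
```

Neither setting gives the wanted result (skip S1.1, execute S1.2), because
the server id carries no information about position.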
With GTID we can do things better.
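
A rough sketch of the idea, in Python rather than server code. It assumes
each event carries a (server id, sequence number) pair in the spirit of a
GTID, and that the restored server knows the highest sequence number it
has applied per origin. All names are hypothetical, and the domain id
that is part of a real MariaDB GTID is left out.

```python
# Hypothetical model: an event is (origin_server_id, sequence_no, name).
stream_from_s2 = [("S2", 1, "S2.1"), ("S1", 1, "S1.1"), ("S1", 2, "S1.2")]

# State of S1 after restoring the backup taken at point B: S1.1 is applied.
last_applied = {"S1": 1}

def replay_with_positions(stream, last_applied):
    """Execute only events newer than the position recorded per origin."""
    executed = []
    for origin, seq, name in stream:
        if seq <= last_applied.get(origin, 0):
            continue  # known to be applied already, safe to ignore
        executed.append(name)
        last_applied[origin] = seq
    return executed

# Pass a copy so the restored state above is left untouched.
executed = replay_with_positions(stream_from_s2, dict(last_applied))
print(executed)
```

With a position per origin, S1.1 is skipped and S1.2 is executed, which is
the outcome the server id alone could not provide.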
Pavel> Are you suggesting that currently if slaves always connect to master
Pavel> using GTID something can be implemented that will allow to re-play
Pavel> binlog events with the same server id without turning on
Pavel> --replicate-same-server-id flag?

What I was saying is that in MariaDB/MySQL 5.5 this never worked and
one could never get this to work safely in your setup.

With GTID we know the state of the master better, and we should be able
to add a bit of code to ignore events that we know we have already
executed.

Regards,
Monty