Pavel Ivanov <pivanof@google.com> writes:
> --replicate-same-server-id flag which as I understand (when set to 0) controls two things: 1) It doesn't allow slave to connect to a master with the same server_id. 2) Slave ignores all binlog events in the replication stream that have the same server_id as slave. And this flag cannot be set to 1 when --log-slave-updates is used. And that is a big problem.
Hm, I was not aware of this. It seems wrong.

For (1), I don't think a slave should ever be allowed to connect to a server with the same server_id. And for the reason you mentioned, it seems wrong that --log-slave-updates and --replicate-same-server-id cannot be used together. After all, --replicate-same-server-id is only a problem in ring topologies.

It does not really seem related to GTID though; the exact same problems would occur when using old-style replication. Of course an easy workaround is to change the server id on the restored server S1, but the problem is if one is not aware of this ahead of time...

On the other hand, in GTID strict mode, the problem of creating a loop does not exist: any attempt to binlog an event that is already in the binlog will cause an error. So it would make sense to allow --replicate-same-server-id together with --log-slave-updates when GTID strict mode is enabled.

Then again, I would be tempted to just allow the two to be used together freely; users that want to do ring topologies must in any case be very aware of all the possible pitfalls.
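To make the rules under discussion concrete, here is a rough Python model of the behaviour as described in this thread. It is only a sketch, not the server code: SlaveConfig, may_connect and should_apply are made-up names, and the logic follows the description above rather than the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveConfig:
    """Hypothetical stand-in for the three settings discussed above."""
    server_id: int
    replicate_same_server_id: bool = False
    log_slave_updates: bool = False

    def validate(self) -> None:
        # Restriction described in the thread: the two options cannot be combined.
        if self.replicate_same_server_id and self.log_slave_updates:
            raise ValueError(
                "replicate_same_server_id=1 cannot be used with log_slave_updates"
            )


def may_connect(cfg: SlaveConfig, master_server_id: int) -> bool:
    """Point (1): refuse a master with our own server_id unless the flag is set."""
    return cfg.replicate_same_server_id or master_server_id != cfg.server_id


def should_apply(cfg: SlaveConfig, event_server_id: int) -> bool:
    """Point (2): skip events carrying our own server_id unless the flag is set."""
    return cfg.replicate_same_server_id or event_server_id != cfg.server_id
```

Under this model, a server that keeps its old server_id and runs with --log-slave-updates gets should_apply(...) == False for every event it originally generated, and validate() rejects the one setting that would let it apply them.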
> What do you think about how this should be fixed? As I understand you explicitly wanted to support replication cycles, so you still want the skipping of transactions with the same server_id to exist. But the situation above is a valid production use case. Maybe in GTID world it can be solved better? E.g. if transaction has the same server_id, but the GTID wasn't applied yet then it shouldn't be skipped?
The main problem I see is: what should the default be? I suppose we cannot safely change the default for --replicate-same-server-id. On the other hand, if users explicitly set --replicate-same-server-id=0, then it really does not seem correct that some events with the same server id are nevertheless replicated depending on some complicated GTID semantics. So the curse of backwards compatibility seems to hit here...

Maybe in GTID strict mode we could make it an error if we are about to skip an event with our own server_id that has a higher seq_no than what we have in our binlog. Then we at least get safe behaviour in strict mode in non-ring topologies.

With respect to ring topologies, I frankly find them quite dangerous to rely on, and for now I am mainly concerned with making sure that anything that worked in 5.5 will continue to work in 10.0.

 - Kristian.
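For illustration, the strict-mode check suggested above could look roughly like the following Python sketch. Everything here is an assumption made for the example: Gtid, handle_event and SkippedNewerOwnEvent are hypothetical names, a GTID is modelled as a (domain_id, server_id, seq_no) triple, and the binlog state is reduced to the highest seq_no seen per domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int


class SkippedNewerOwnEvent(Exception):
    """Strict mode would otherwise silently drop an event we never binlogged."""


def handle_event(gtid, own_server_id, binlog_max_seq_no, strict_mode,
                 replicate_same_server_id=False):
    """Return "apply" or "skip", or raise in the strict-mode case.

    binlog_max_seq_no maps domain_id -> highest seq_no present in our binlog
    (a simplification assumed for this sketch).
    """
    if replicate_same_server_id or gtid.server_id != own_server_id:
        return "apply"
    # Event carries our own server_id and would normally be skipped.
    last = binlog_max_seq_no.get(gtid.domain_id, 0)
    if strict_mode and gtid.seq_no > last:
        # Newer than anything in our binlog: skipping would lose data.
        raise SkippedNewerOwnEvent(
            f"event {gtid.domain_id}-{gtid.server_id}-{gtid.seq_no} "
            f"is ahead of binlog seq_no {last}"
        )
    return "skip"
```

With strict mode off the function keeps the existing skip behaviour, so only users who opted into strict mode would see the new error.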