Kristian, thanks for more remarks!
If you “forget” the domain on the upstream server, what happens if there are downstream slaves? I think you’ll break replication if they disconnect from this box and try to reconnect. Their GTID information will no longer match. IMO, and if I’ve understood correctly, this is broken.
It should not break replication. It is allowed for a slave with GTID position 0-1-100,10-2-200 to connect to a master that has nothing in domain 10, this is normal.
To me, in a sense, this is an "implicit" IGNORE_DOMAIN_IDS on the domains that the master does not have.
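To make the example position concrete, here is a minimal sketch in Python. It is only a toy model, not server code: the helper names `parse_gtid_pos` and `domains_unknown_to_master` are made up for illustration, and it assumes a slave position holds one domain-server-seq triple per domain. It lists the domains in the slave's position that the master knows nothing about, which per the above are simply tolerated on connect.

```python
from typing import Dict, List, Set, Tuple


def parse_gtid_pos(pos: str) -> Dict[int, Tuple[int, int]]:
    """Parse 'domain-server-seq,...' into {domain_id: (server_id, seq_no)}."""
    result: Dict[int, Tuple[int, int]] = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain_id, server_id, seq_no = (int(x) for x in item.split("-"))
        result[domain_id] = (server_id, seq_no)
    return result


def domains_unknown_to_master(slave_pos: str, master_domains: Set[int]) -> List[int]:
    """Domains present in the slave position but absent on the master."""
    return sorted(d for d in parse_gtid_pos(slave_pos) if d not in master_domains)


# The position from the example above; the master has nothing in domain 10.
slave_pos = "0-1-100,10-2-200"
master_domains = {0}
print(domains_unknown_to_master(slave_pos, master_domains))  # [10]
```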
I am not sure what the use-case of replicating DELETE DOMAIN to a slave would be. Domain deletion does not have a point-in-time property like normal transactions, so it does not help to have it replicated inline in the event stream. If it has an effect on a slave, this effect occurs only when the slave is restarted/reconnected.
The use-case must've been the suspected loss of connectivity by slaves.
I really think there’s a need to indicate what domains should be forgotten/ignored.
If CHANGE MASTER ... IGNORE_DOMAIN_IDS is fixed to also ignore the extra domains on master upon connect, it is probably a better way to ignore domains in many cases. It is persisted (in the slave's master.info), and it can be set individually for each slave, which is more flexible (what if one slave needs to ignore a domain but another slave needs to replicate it?).
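The per-slave flexibility can be illustrated with a small Python sketch. This is a toy model only: the slave names and the `domains_to_replicate` helper are hypothetical, and it assumes the fixed behaviour described above, where an ignore list simply removes those domains from what a slave takes from the master.

```python
from typing import Iterable, List


def domains_to_replicate(master_domains: Iterable[int],
                         ignore_domain_ids: Iterable[int]) -> List[int]:
    """Master domains a slave would still replicate, given its own ignore list."""
    ignored = set(ignore_domain_ids)
    return sorted(d for d in set(master_domains) if d not in ignored)


# One master, two slaves with different (hypothetical) ignore lists.
master_domains = [0, 10, 20]
per_slave_ignore = {"slave_a": [10], "slave_b": []}

plan = {name: domains_to_replicate(master_domains, ignored)
        for name, ignored in per_slave_ignore.items()}
print(plan)  # {'slave_a': [0, 20], 'slave_b': [0, 10, 20]}
```

Deleting the domain on the master would instead apply the same decision to every slave at once.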
KN> The procedure to fix it will then be:
1. FLUSH BINARY LOGS, note the new GTID position.
2. Ensure that all slaves are past the problematic point with MASTER_GTID_WAIT(<pos>). After this, the old erroneous binlog files are no longer needed.
3. PURGE BINARY LOGS to remove the erroneous logs.
4. FLUSH BINARY LOGS DELETE DOMAIN d
So this was what I suggested at some point related to MDEV-12012. But probably this is not the best suggestion, as I realised later.
1. In MDEV-12012, two independent masters were originally using the same domain id, so their history looks diverged in terms of GTID. This can be fixed by injecting a dummy transaction to make them up-to-date with one another in that domain. Deleting a (possibly valuable) part of the history is not needed.
2. In another case, a slave needs to ignore the part of the history on a master that is associated with some domain. IGNORE_DOMAIN_IDS, once fixed, can do this; again, there is no need to delete possibly valuable history on the master.
Right. The feature we've been discussing solely deals with p.3.
3. At some point, a domain that has been unused for a long time may no longer appear anywhere, _except_ in gtid_binlog_state and gtid_slave_pos. This may eventually clutter the output and be an annoyance. The original idea with FLUSH BINARY LOGS DELETE DOMAIN was to allow fixing this annoyance by removing such domains from gtid_binlog_state once they are no longer needed anywhere.
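The intended effect on gtid_binlog_state can be sketched in Python. This is a toy model only: `delete_domain` is a made-up helper that edits a state string, and it deliberately leaves out whatever checks the server would perform before allowing the deletion. It assumes the state is a comma-separated list of domain-server-seq triples, possibly with several entries for the same domain.

```python
def delete_domain(binlog_state: str, domain_id: int) -> str:
    """Return the binlog state with every entry of domain_id removed."""
    kept = []
    for gtid in binlog_state.split(","):
        gtid = gtid.strip()
        if not gtid:
            continue
        # The first field of 'domain-server-seq' is the domain id.
        if int(gtid.split("-")[0]) != domain_id:
            kept.append(gtid)
    return ",".join(kept)


# Domain 10 is long unused and only clutters the state.
state = "0-1-100,10-2-200,10-3-50"
print(delete_domain(state, 10))  # 0-1-100
```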
I am not sure my original suggestion of using PURGE LOGS was ever a good idea, or is ever needed.
I think it remains optional, as I wrote in my reply last night.
Cheers,
Andrei
- Kristian.