Hello Simon, Kristian. (This mail was meant to be sent out yesterday, but it got stuck in my outbox.)
> Simon Mudd <simon.mudd@booking.com> writes:

>> ids. Obviously, once all appropriate binlogs have been purged (naturally, by other means), no special processing will be needed.

> Right. Hence my original idea (which unfortunately has never been implemented so far). If at some point a domain has been unused for so long that all GTIDs in that domain are gone, it is relatively safe to pretend that the domain never existed.

> I would like to understand whether you can think of significant use cases where the DBA needs to keep active binlog files on the master that contain some domain, while simultaneously pretending that this domain never existed.

> Or whether it is more of a general concern, i.e. the inconvenience for users of having to save old binlogs somewhere other than the master's data directory and binlog index (SHOW BINARY LOGS).

>> Removing old binary logs should _not_, IMO, be done as a way of forgetting past obsolete domains. Binlogs are important, so throwing them away is an issue. I think the code somehow needs to be aware of the cut-off point at which the "stale domain ids" are removed.
Simon, initially I thought of masking out the problematic domain so that the most recent binlog file would not carry it in its Gtid_list header. I have since given up that idea, having agreed that a strict setup on the master weighs much more.
> I understand the desire to not delete binlog files.
And the MDEV-12012 use case might not even require this purging/flush-delete-domain procedure, if IGNORE_DOMAIN_IDS (for the zero-id domain in question) would do. MDEV-9108 seems to be in the way, though; but conceptually the DBA may have a way to keep the binlog files even when they contain a problematic domain.
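For illustration, such a filtering setup would be configured on the slave side; this is only a sketch of the existing CHANGE MASTER syntax, with domain id 0 taken from the MDEV-12012 scenario as an example:

```sql
-- On the slave: ignore all events in GTID domain 0 from this master,
-- while continuing to replicate the other domains.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_USE_GTID = slave_pos,
  IGNORE_DOMAIN_IDS = (0);
START SLAVE;

-- Verify the filter is in place (the Replicate_Ignore_Domain_Ids field).
SHOW SLAVE STATUS\G
```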
> The problem is: if you want to have GTIDs with some domain in your active binlog files, _and_ you also want to pretend that this domain never existed, what does that mean? What are the semantics? It creates a lot of complexity in defining the semantics, documenting it, having users understand it, and implementing it correctly in the code.

> So basically, I do not understand what the intended meaning of FLUSH BINARY LOGS DELETE DOMAIN d is while at the same time keeping GTIDs with domain d around in active binlog files. In what respects is the domain deleted, and in what respects is it not?

> For the master, the binlog files are mainly used to stream to connecting slaves. Deleting a domain means replacing the conceptual binlog history with one in which that domain never existed. So that domain will be ignored in a connecting slave's position, assuming it is served by another multi-source master. If a new GTID in that domain appears later, it will be considered the very first GTID ever in that domain.

> So consider what happens if there are nevertheless GTIDs in that domain deeper in the binlog:

> 1. An already-connected slave may be happily replicating those GTIDs. If that slave reconnects (after a temporary network error, for example), it will instead fail with an unknown GTID, or perhaps just start silently ignoring all further GTIDs in that domain. This kind of unpredictable behaviour seems bad.

> 2. Suppose a slave connects with a position without the deleted domain. The master starts reading the binlog from some point. What happens if a GTID is encountered that contains the deleted domain? The slave will start replicating that domain from some arbitrary point that depends on where it happened to be in the other domains at the last disconnect. This also seems undesirable.

> There may be other scenarios that I have not thought about.
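The scenarios above suggest that a domain should only be deleted once it no longer occurs anywhere in the active binlogs. A sketch of how a DBA might check this before running the statement under discussion (the DELETE DOMAIN syntax is still being designed here and the final form may differ; the file name is an example):

```sql
-- Which binlog files are still active on the master?
SHOW BINARY LOGS;

-- The Gtid_list event near the start of the oldest file shows which
-- domains the remaining history still covers:
SHOW BINLOG EVENTS IN 'mariadb-bin.000042' LIMIT 2;

-- Only once the domain no longer occurs in any active binlog file
-- would the statement under discussion be safe to run:
FLUSH BINARY LOGS DELETE DOMAIN 1;  -- proposed syntax, not final
```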
>> DBAs do not like to remove binlogs "early", as unless you keep a copy somewhere you may lose valuable information for recovery, backups, etc. Not everyone will be making automatic copies (as MySQL does not provide an automatic way to do this).

> Understood. Maybe what is needed is a PURGE BINARY LOGS variant that removes the entries from the binlog index (SHOW BINARY LOGS) but leaves the files in the file system, for the convenience of the sysadmin? (Well, you can just hand-edit binlog.index, but that requires a master restart, I think.)
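For reference, the existing statement removes the index entries _and_ deletes the files from disk; the variant suggested above would differ only in the last step. A sketch with an example file name:

```sql
-- Existing behaviour: drops mariadb-bin.000001 .. 000099 from the
-- binlog index *and* deletes those files from disk.
PURGE BINARY LOGS TO 'mariadb-bin.000100';

-- Afterwards the index starts at the named file:
SHOW BINARY LOGS;
```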
Like I said above, a filtering solution could be helpful.
>> The other comment I see mentioned here was "make sure all slaves are up to date". That's going to be hard. The master can only be aware of "connected slaves", and if you have intermediate masters, or a

> Indeed, the master cannot ensure this. The idea is that the DBA, who decides to delete a domain, must understand that this should not be done if any slave still needs GTIDs from that domain. This is similar to configuring normal binlog purge, where the DBA needs to ensure that binlogs are kept long enough for the needs of the slowest slave.
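The analogy with normal binlog purge can be made concrete; retention is already a DBA judgement call today, and the master's view of its slaves is already known to be incomplete:

```sql
-- Keep binlogs for two weeks; the DBA must know that the slowest
-- slave never lags further behind than this.
SET GLOBAL expire_logs_days = 14;

-- Slaves that registered with report_host show up here, but the
-- master cannot see slaves behind an intermediate master:
SHOW SLAVE HOSTS;
```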
>> FWIW, expiring old domains is a good thing to do. There's a similar FR for
>> completely different, the problem space is the same. Coming up with a solution which is simple to use and understand, and which also avoids, where possible, mistakes that may break replication, is good. So thanks for looking at this.

> Indeed. And input from people like you, with strong operational experience, is very valuable for ending up with a good solution, hence my request for additional input.

> - Kristian.
I hope the clarifications given by Kristian back up the new feature idea. On the other hand, it would be good for the DBA to have a way to cope with the likes of MDEV-12012 (there are a few reports on this matter) without purging. I rely on a solution to MDEV-9108 in that regard.

Cheers,

Andrei