Hi all,
On 8 Sep 2017, at 16:48, andrei.elkin@pp.inet.fi wrote:
Kristian, hello.
Now to the implementation matter,
The procedure to fix it will then be:
1. FLUSH BINARY LOGS, note the new GTID position.
2. Ensure that all slaves are past the problematic point with MASTER_GTID_WAIT(<pos>). After this, the old erroneous binlog files are no longer needed.
3. PURGE BINARY LOGS to remove the erroneous logs.
4. FLUSH BINARY LOG DELETE DOMAIN d
I think we could optimize the list. How about
1. Take note of @@global.gtid_binlog_state
2. Ensure that all slaves are past the last event of being deleted domain 'd'
3. PURGE BINARY LOGS DELETE DELETE 'd'
The effect of the last step would include purging all the binary log files plus a planned implicit FLUSH LOGS discarding 'd' from the new emerged binlog.
Removing old binary logs should _not_ IMO be done as a way of forgetting past obsolete domains. Binlogs are important, so throwing them away is an issue. I think the code somehow needs to be aware of the cut-off point and of when the “stale domain ids” are removed.

DBAs do not like to remove bin logs “early”: unless you keep a copy somewhere you may lose valuable information for recovery, for backups etc. Not everyone will be making automatic copies (as MySQL does not provide an automatic way to do this), so in theory you have just one copy. Throwing these away is a really bad idea if it’s part of the solution for forgetting about “some of the past”. Please consider the operational point of view and make MariaDB aware of the past and aware that it can ignore/forget these domain ids. Obviously, once all the relevant bin logs have been purged (naturally, by other means), no special processing will be needed.

The other comment I see mentioned here was “make sure all slaves are up to date”. That’s going to be hard. The master can only be aware of “connected slaves”; if you have intermediate masters or a stopped slave, it won’t be aware of those servers. That may be obvious, but there’s always the situation where “stopped slaves” or “downstream slaves (of an intermediate master)” are still lagging. Catching and checking that is going to be hard, so please make the comments explicit if all you are really going to do is check “connected slaves”, as MariaDB is never going to be aware of servers not connected directly to the master. If the required precondition for triggering the “obsolete old domains” step is that a DBA needs to be “aware”, then make this requirement clear so that people reading the documentation understand what’s needed and what MariaDB expects to see.

FWIW expiring old domains is good to do. There’s a similar FR for Oracle’s MySQL, and while the GTID implementations are completely different, the problem space is the same.
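
To make the “connected slaves” point concrete, here is a rough sketch of the kind of check a DBA would have to do by hand today. It is only an illustration, not anything MariaDB provides: the function names, the server names and the idea of collecting each server's gtid_slave_pos into a dictionary are all my assumptions. It also assumes sequence numbers within a domain only ever increase, and it treats a server whose position is unknown (stopped, or behind an intermediate master and not polled) as lagging.

```python
def parse_gtid_pos(pos):
    """Parse 'domain-server-seq[,domain-server-seq...]' into {domain: seq}.

    Assumes one entry per domain, as in a gtid_slave_pos value.
    """
    result = {}
    for part in pos.split(","):
        part = part.strip()
        if not part:
            continue
        domain, _server, seq = part.split("-")
        result[int(domain)] = int(seq)
    return result


def lagging_replicas(domain_id, last_seq, replica_positions):
    """Return the names of servers not yet known to be past last_seq.

    replica_positions is a hypothetical mapping of server name to its
    gtid_slave_pos string, or None when the position could not be collected.
    Unknown positions and servers with no entry for the domain are reported
    as lagging, which is the conservative choice.
    """
    lagging = []
    for name, pos in sorted(replica_positions.items()):
        if pos is None:
            lagging.append(name)
            continue
        seq = parse_gtid_pos(pos).get(domain_id)
        if seq is None or seq < last_seq:
            lagging.append(name)
    return lagging


# Hypothetical topology: one server is behind, one could not be polled.
positions = {
    "replica1": "0-1-100,5-2-40",
    "replica2": "0-1-100,5-2-37",
    "intermediate-child": None,
}
print(lagging_replicas(5, 40, positions))
```

Only when that list comes back empty for every server in the topology, not just the directly connected ones, would it be safe to forget domain 5 in this example.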

Coming up with a solution which is simple to use and understand, and which also avoids, where possible, mistakes that may break replication, is good. So thanks for looking at this.

Just a thought.

Simon