Pavel Ivanov <pivanof@google.com> writes:
> How does it detect if binlogs were closed properly?
This is the standard binlog crash recovery, going back to 5.1 (or possibly 5.0). When a new binlog file is created, its first event, a Format_description event, has the flag LOG_EVENT_BINLOG_IN_USE_F set. At a normal shutdown, everything is written and fsync()ed to disk, and only then is the flag cleared. At startup, if the flag is still set in the most recent binlog, it means we did not shut down cleanly.
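A minimal Python sketch of the in-use check, assuming the v4 binlog layout: a 4-byte magic number followed by a 19-byte common event header whose last two bytes are the little-endian flags field. It also assumes LOG_EVENT_BINLOG_IN_USE_F has the value 0x1. The function names are hypothetical illustrations, not a server API.

```python
import struct

BINLOG_MAGIC = b"\xfebin"        # every binlog file starts with these 4 bytes
EVENT_HEADER_LEN = 19            # v4 header: ts(4) type(1) server_id(4) size(4) log_pos(4) flags(2)
FLAGS_OFFSET = 4 + 17            # file offset of the first event's 2-byte flags field
LOG_EVENT_BINLOG_IN_USE_F = 0x1  # assumed value of the "in use" flag bit


def binlog_in_use(head: bytes) -> bool:
    """Return True if the first (Format_description) event still has the in-use flag set.

    `head` must hold at least the magic number plus one event header.
    """
    if len(head) < 4 + EVENT_HEADER_LEN or head[:4] != BINLOG_MAGIC:
        raise ValueError("not a binlog file, or header truncated")
    (flags,) = struct.unpack_from("<H", head, FLAGS_OFFSET)
    return bool(flags & LOG_EVENT_BINLOG_IN_USE_F)


def last_binlog_was_closed_cleanly(path: str) -> bool:
    """Read the start of the most recent binlog; a cleared flag means a clean shutdown."""
    with open(path, "rb") as f:
        return not binlog_in_use(f.read(4 + EVENT_HEADER_LEN))
```

Only the first event header is needed, so the check is cheap regardless of binlog size.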
> I mean: if at startup MariaDB sees that there are no binlogs at all, but there is a <log-bin>.state file, will it read it?
If there are no binlog files at all, it will not read <log-bin>.state. But I do not see how <log-bin>.state could exist without any binlog files; that would seem to indicate severe corruption of the data directory.
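The startup decision can be modelled roughly as below. This is a simplified, hypothetical model (StartupAction and startup_action are illustrative names). The no-binlog case follows the answer above. Treating an unclean shutdown as "rebuild the state from the binlogs instead of trusting the cached file" is our assumption, drawn from the rest of this thread.

```python
from enum import Enum
from typing import Sequence


class StartupAction(Enum):
    # Hypothetical names for the three cases discussed in this thread.
    NO_BINLOG = "no binlog files: <log-bin>.state is not read"
    RECOVER_BY_SCAN = "unclean shutdown: rebuild state from the binlogs (assumption)"
    LOAD_STATE_FILE = "clean shutdown: load the cached <log-bin>.state"


def startup_action(binlog_files: Sequence[str], last_binlog_in_use: bool) -> StartupAction:
    """Decide how to obtain the binlog state at startup (simplified model)."""
    if not binlog_files:
        # With no binlog files at all, the state file is not read.
        return StartupAction.NO_BINLOG
    if last_binlog_in_use:
        # The in-use flag is still set, so the server did not shut down cleanly.
        return StartupAction.RECOVER_BY_SCAN
    return StartupAction.LOAD_STATE_FILE
```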
> And do I understand correctly that gtid_pos will be combined from the last records in <log-bin>.state for each domain?
Yes. If we want to point an existing slave to a new master, we need to start at the position of the last replicated GTID, which is stored in mysql.rpl_slave_state. But if we have a master server that we want to make into a slave of a new master (e.g. an old slave that has now been promoted to master), we need to start replication at the last transaction that originated on the old master, which is the last transaction logged in its binlog.

Originally I wanted the exact same CHANGE MASTER TO master_use_gtid=1 command to work for both cases. So the code tries to guess whether the server was last a slave or a master by looking at which of mysql.rpl_slave_state and the binlog has the newest transaction, and then uses either the slave state or the binlog state as the start position. So currently we have this magic where the value of @@GLOBAL.gtid_pos is sometimes taken from mysql.rpl_slave_state and sometimes from the binlog.

However, I am having second thoughts about this idea; it is too magical. I want to change it so that @@GLOBAL.gtid_pos is just the last replicated transaction, as stored in mysql.rpl_slave_state. There will then be separate CHANGE MASTER options for the two cases (pointing an old slave to a new master, or turning an old master into a slave). I was planning to do this after I have finished START SLAVE UNTIL master_gtid_pos=xxx. But this is only a user interface issue; it does not affect how mysql.rpl_slave_state or <log-bin>.state is handled.
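To make the per-domain answer concrete, here is a sketch that keeps the last GTID per replication domain and renders it as a gtid_pos string in MariaDB's domain-server-seqno form. pick_start_state is a hypothetical, simplified stand-in for the "was it last a slave or a master" guess. Comparing seq_no per domain is our assumption, not the server's actual rule.

```python
from typing import Dict, Iterable, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def parse_gtid(text: str) -> Gtid:
    """Parse a GTID string of the form 'domain-server-seqno'."""
    domain, server, seq = (int(part) for part in text.split("-"))
    return domain, server, seq


def last_per_domain(gtids: Iterable[str]) -> Dict[int, Gtid]:
    """Keep the last GTID seen in each replication domain (later entries win)."""
    state: Dict[int, Gtid] = {}
    for text in gtids:
        gtid = parse_gtid(text)
        state[gtid[0]] = gtid
    return state


def format_gtid_pos(state: Dict[int, Gtid]) -> str:
    """Render a per-domain state as a comma-separated list, ordered by domain."""
    return ",".join(f"{d}-{s}-{n}" for d, s, n in (state[k] for k in sorted(state)))


def pick_start_state(slave: Dict[int, Gtid], binlog: Dict[int, Gtid]) -> str:
    """Hypothetical stand-in for the slave-or-master guess.

    Assumption: the binlog counts as newer if, in any domain, it has a higher
    seq_no than the slave state, or a domain the slave state lacks.
    """
    for domain, (_, _, seq) in binlog.items():
        if domain not in slave or seq > slave[domain][2]:
            return "binlog"
    return "slave_state"
```

The ambiguity in pick_start_state is exactly the "too magical" part: a single heuristic has to infer the server's past role, which separate CHANGE MASTER options would make explicit.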
> In other words, is having just <log-bin>.state enough to restore gtid_pos after a graceful shutdown?
There are two separate issues here. One is loading the binlog state at startup. You should think of this _only_ as conceptually scanning the binlog files and remembering the last GTID logged per domain and server_id. We optimise this to avoid the binlog scan by caching the binlog state in <log-bin>.state, but it would be a bug if this optimisation yielded different results than a re-scan of the binlog files would give.

The second issue is how the value of @@GLOBAL.gtid_pos is derived. This is a function of what is in mysql.rpl_slave_state and what is in the binlog. The <log-bin>.state file should not affect this in any way (except to make server startup faster).

I have a hunch that what you are really asking is how to take a backup of the master server and restore it on a new server to provision a new slave. My intention for this was to use the existing backup/restore facilities, like mysqldump and XtraBackup. If you take a mysqldump or XtraBackup backup of the master, there are facilities to get the current binlog position (filename/offset) in a non-blocking way. You can then convert that into a GTID position with BINLOG_GTID_POS(filename, offset). Finally, on the new slave, you SET GLOBAL gtid_pos to the correct value. This is how I planned slave provisioning to work.

You should not think of somehow copying <log-bin>.state around on its own without the binlog files; this will not work. Of course, a full consistent filesystem copy of everything (datadir and binlogs), like a tar file of a stopped server or an LVM snapshot, is ok.

 - Kristian.
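A sketch of the principle that <log-bin>.state is only a cache: the reference definition is a scan over all binlog files, and the cached copy must match it exactly. Binlog files are modelled as in-memory lists of GTID strings. The state file format (one domain-server-seqno per line) is an assumption, and all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

StateKey = Tuple[int, int]  # (domain_id, server_id)


def scan_binlog_state(binlog_files: Iterable[List[str]]) -> Dict[StateKey, int]:
    """Reference definition: scan every binlog file in order and remember the
    last seq_no logged for each (domain_id, server_id) pair."""
    state: Dict[StateKey, int] = {}
    for events in binlog_files:      # oldest file first
        for gtid in events:          # GTIDs in log order
            domain, server, seq = (int(p) for p in gtid.split("-"))
            state[(domain, server)] = seq
    return state


def parse_state_file(text: str) -> Dict[StateKey, int]:
    """Parse a cached state, assumed to hold one 'domain-server-seqno' per line."""
    state: Dict[StateKey, int] = {}
    for line in text.splitlines():
        if line.strip():
            domain, server, seq = (int(p) for p in line.strip().split("-"))
            state[(domain, server)] = seq
    return state


def cache_is_consistent(binlog_files: Iterable[List[str]], state_file_text: str) -> bool:
    """The cache is only an optimisation: it must equal a full re-scan."""
    return parse_state_file(state_file_text) == scan_binlog_state(binlog_files)
```

The check needs the binlog files themselves. A state file copied on its own has nothing to be validated against, which is why copying it without the binlogs does not work.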