Jan Lindström <jplindst@mariadb.org> writes:
> Hi,
> On 08/13/2013 09:49 AM, Kristian Nielsen wrote:
>> You can always use the contents of the binlogs to know this. You can search the binlogs for your GTID and determine if it was a) logged in an earlier binlog that was purged, b) found in the binlog, c) a "hole" due to filtering or whatever, or d) not yet existing at all in the binlog (not yet received from the master or completely alternate future).
> There is a big assumption here, that you have that binlog file available.
It is not "that binlog file", or any individual binlog file. It is the whole binlog (master-bin.index, master-bin.XXXXXX, master.info). "The binlog" is always available on the master. Even if you RESET MASTER and make the binlog empty, it is still the binlog.
On a slave, the binlog is not available, so the design is carefully made so that the code never needs to access the binlog on the slave to work properly. If some part (or all) of the binlog is missing for whatever reason (RESET MASTER, purge, manual delete), and that part would be needed to replicate safely, an error results and replication is halted (at least in strict mode).
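To make the a)-d) classification from the quoted text concrete, here is a rough Python sketch that models the binlog as a list of GTIDs. It is not the server code: `classify`, `Where` and `purged_upto` are made-up names, the summary of the purged part of the binlog is simply taken as given, and it assumes seq_no is monotonic per (domain_id, server_id).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int


class Where(Enum):
    PURGED = "a"   # at or before the purged boundary (logged in a purged file)
    FOUND = "b"    # present in the binlog files we still have
    HOLE = "c"     # never logged, but a later seq_no from the same server exists
    NOT_YET = "d"  # not in the binlog: not received yet, or an alternate future


def classify(gtid: Gtid,
             purged_upto: Dict[Tuple[int, int], int],
             binlog: List[Gtid]) -> Where:
    """Locate `gtid` relative to the master's binlog.

    purged_upto: hypothetical summary of the purged part of the binlog, i.e.
      the highest seq_no per (domain_id, server_id) logged in files that no
      longer exist.
    binlog: GTIDs in the binlog files still on disk, in binlog order.
    Assumes seq_no is monotonic per (domain_id, server_id).
    """
    key = (gtid.domain_id, gtid.server_id)
    if gtid in binlog:
        return Where.FOUND
    if key in purged_upto and gtid.seq_no <= purged_upto[key]:
        # Could also have been a hole inside the purged part; we can no longer tell.
        return Where.PURGED
    if any((g.domain_id, g.server_id) == key and g.seq_no > gtid.seq_no
           for g in binlog):
        return Where.HOLE
    return Where.NOT_YET
```

Note that only the retained files plus a per-server boundary are needed; nothing here requires the purged files themselves.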
> If it is c) and d) look very similar if this binlog is the only one available currently.
Sorry, I do not understand that sentence.
c) is when the slave asks for D-S-N1, which never existed in our binlog, but we do have D-S-N2 with N2 > N1. Since sequence numbers are guaranteed to be monotonic, we can be sure that this transaction is missing, and we give an error in strict mode.
d) is when the last GTID we have for domain D and server id S is D-S-N1, but the slave asks for D-S-N2 with N2 > N1. We cannot know for sure whether this is because the slave has an alternate future, or just because the slave got D-S-N2 from an upper-level master before we did. Since we cannot know, we give an error in all cases (strict or non-strict) to be safe. If we have some other GTID D-S'-N3 with N3 >= N2, we can guess that we are in the "alternate future" case and give a more detailed error as Pavel suggests, but as I explained in the previous mail this is somewhat unreliable.
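The c)/d) decision can be sketched as a check the master might run when a slave connects and asks to start after GTID `req`. Again the names are hypothetical, purged binlog files are ignored, and seq_no is assumed monotonic per (domain_id, server_id). What non-strict mode does with a hole is not spelled out above, so the sketch just lets it through.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


def check_slave_start(req: Gtid, binlog: List[Gtid], strict: bool) -> Optional[str]:
    """Return an error message, or None if the slave may start after `req`.

    Assumes the whole history is still in `binlog` (no purged files) and that
    seq_no is monotonic per (domain_id, server_id).
    """
    same = [g.seq_no for g in binlog
            if g.domain_id == req.domain_id and g.server_id == req.server_id]
    if req.seq_no in same:
        return None  # found: stream events from after this GTID
    if same and max(same) > req.seq_no:
        # Case c): a later GTID from the same server exists, so req is missing.
        if strict:
            return f"ERROR: GTID {req} is missing from the master's binlog"
        return None  # assumption: non-strict mode tolerates the hole
    # Case d): the slave is ahead of us for (domain_id, server_id). It may just
    # have got the event from an upper-level master first, so always refuse.
    ahead = [g for g in binlog
             if g.domain_id == req.domain_id and g.server_id != req.server_id
             and g.seq_no >= req.seq_no]
    if ahead:
        # Heuristic only: another server already used seq_no >= N2 in this
        # domain, which suggests (but does not prove) an alternate future.
        return (f"ERROR: slave position {req} is not in the master's binlog; "
                f"the master has {ahead[-1]}, so the slave may have diverged")
    return f"ERROR: slave position {req} is not (yet) in the master's binlog"
```

Both d) branches refuse; the D-S'-N3 check only changes the wording of the error, which matches why it is acceptable for that guess to be unreliable.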
>> What exactly is it that the implementation does not allow, or which is hard to implement?
> Let's assume we do not have the master up and running, but there are several slaves to choose from. If every slave has a different last executed GTID, what automatic rule could you come up with to choose the correct, up-to-date slave as the new master? The fact that some slave has the largest GTID does not mean that it has executed all the same GTIDs as the former master (now dead, and it might have lost its binlog).
You have to understand that there is a difference between what the mysqld server code can do and what the application layer (or the user, failover scripts, or whatever) can do.
The GTID design is made so that the mysqld code is not allowed to assume that sequence numbers are monotonic (and it is written so that it does not need to). This means the user is not obliged to obey such an assumption; even if she does not, the code still promises to work correctly. But the user is free to restrict herself to globally monotonic GTIDs. In fact, such a decision is encouraged, and gtid_strict_mode is available to assist such users (by giving an error if the restriction would otherwise be violated). Such a disciplined user is then free to use this restriction to make more informed decisions, such as which slave to promote.
Note btw. that there is a standard solution, described in the GTID documentation, that works for your particular problem in any case. You can pick any slave at random, let it replicate from every other slave with START SLAVE UNTIL to catch up to each one, and then it will be suitable as the master. This is needed in the general case with more than one replication domain.
But I agree that a common case will be a single domain with gtid_strict_mode=1, where you just pick the slave with the highest sequence number as the new master. This will work fine in strict mode. MariaDB GTID is flexible, without sacrificing any safety. At least that is the ambition.
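Both promotion approaches can be sketched as a small failover helper. This is only an illustration under stated assumptions: `pick_highest`, `catch_up_plan` and the slave names are hypothetical (the names double as hostnames), each position is one GTID per domain, the other slaves are assumed to have the binlog enabled with log_slave_updates, and a real script would also need port and credentials and would have to wait until each UNTIL position is reached before issuing the next STOP SLAVE.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


# Hypothetical input: slave name -> its GTID position, one GTID per domain.
Positions = Dict[str, Dict[int, Gtid]]


def pick_highest(positions: Positions, domain_id: int) -> str:
    """Common case: one domain and gtid_strict_mode=1 everywhere, so the
    slave with the highest seq_no in that domain is the most up to date."""
    return max(positions, key=lambda name: positions[name][domain_id].seq_no)


def gtid_pos(pos: Dict[int, Gtid]) -> str:
    """Format a position as a comma-separated GTID list, e.g. '0-1-100,1-2-7'."""
    return ",".join(str(pos[d]) for d in sorted(pos))


def catch_up_plan(candidate: str, positions: Positions) -> List[str]:
    """General case: let an arbitrary `candidate` catch up to every other slave.

    Returns the SQL to run on `candidate`, one UNTIL round per other slave.
    """
    stmts: List[str] = []
    for other in sorted(positions):
        if other == candidate:
            continue
        stmts += [
            "STOP SLAVE",
            f"CHANGE MASTER TO master_host='{other}', master_use_gtid=current_pos",
            f"START SLAVE UNTIL master_gtid_pos='{gtid_pos(positions[other])}'",
        ]
    return stmts
```

`pick_highest` is only meaningful under strict mode with a single domain; `catch_up_plan` makes no assumption about which slave is furthest ahead, which is why it also works with several domains.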
> If all slaves are in different alternate futures, I am not sure which one to select, or maybe we should select a superposition of all of them ;-)
Well, either you want alternate futures in your replication hierarchy or you do not. If you do not want them, you can enable strict mode and get an error if you mess up. If you do want them, I think the burden is on you to define what it means to correctly promote a new master. No?
 - Kristian.