Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
Hi, Andrei! On Mar 22, Andrei Elkin wrote:
Here it goes.
As after master crash and failover to slave, the demoted-to-slave ex-master must be ready to face and accept its own (generated by) events, without generally necessary --replicate-same-server-id.
So the acceptance conditions are refined/relaxed for the semisync slave connected to master in the GTID mode which ensures (under gtid_strict_mode ON) that there can't be any duplicate events for execution on such an ex-master slave. Non-GTID "binlog-truncated" ex-master semisync slave has to be configured with --replicate-same-server-id.
Why would you generally care about same server id in the gtid mode? If gtid is not present locally, then it should be applied, right? Why to look at server id at all? It's a thing of the past, from pre-gtid times.
+--let $query1 = INSERT INTO t VALUES (20) +--let $query2 = DELETE FROM t2 WHERE f = 666 /* no such record */ +--source binlog_truncate_active_log.inc + +--echo # Case B. +# The inverted sequence ends up to truncate only $query2 +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-10 +--let $query1 = DELETE FROM t2 WHERE f = 0
why not `= 666 /* no such record */` ? is `= 0` important here?
Yes, there's a fine detail here.
'The inverted sequence ends up' to leave in binlog
$query1 = DELETE FROM t2 WHERE f = 0
which is as ineffective as 666. The patch recognizes the ineffectiveness through a special value of engines involved counter in Gtid event. In principle the truncation could start from it rather than the following $query2. But that would mean more complicated coding. So the semantics of the zero is to be good, as we're so used to :-)!
Sorry, I didn't understand anything. What does the value in the WHERE clause has to do with engines and where the binlog is truncated?
ok. after discussing the issue on slack, the answer appears to be "the value in WHERE has nothing to do with any of that, it doesn't matter whether it's 0 or 666 as long as it doesn't match any rows" then, I'd say, better change 0 to 667 in the test. We wouldn't be having this discussion at all if you'd have some number that appeared arbitrary (and "667" is fine). My confusion was exactly caused by the fact that it looked like =0 matters, that it was important to hav 0 there.
+--let $query2 = INSERT INTO t VALUES (20) +--source binlog_truncate_active_log.inc ... diff --git a/sql/sql_class.h b/sql/sql_class.h index 140394fefc1..87fa88d4c89 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -4632,7 +4632,7 @@ class THD :public Statement, LF_PINS *tdc_hash_pins; LF_PINS *xid_hash_pins; bool fix_xid_hash_pins(); - + bool is_1pc_ro_trans;
I don't like how it looks, it's in THD (for every single connection and more) and it's only used in some exotic cases. It's set in two places around ha_commit_one_phase(), when it would be better to set it inside ha_commit_one_phase(), in one place only.
But I cannot suggest how to get rid of it yet as I don't understand what it's for, I've asked a question about it below, in Gtid_log_event::Gtid_log_event
The flag is to help sorting out ineffective (no engine data change) STATEMENT binlog-format logged transaction. I thought it's practical enough to face and important to handle. Through an example, let the table `t' be initially empty, and let binlog contain
trx1: INSERT INTO t SET a = 1; /* slave acked, trx got committed in engine */ trx2: INSERT INTO t SET a = 2; /* not acked, trx remains prepared in engine */ trx3: DELETE FROM t WHERE a = 0; /* ineffective, does not have Xid event */
prior a crash with their states as commented. The truncate trx should be trx2 as the trx3's status suggests it's harmless for involving into truncation. However we have yet to distinguish it from unsafe (to truncate) event group ("transaction") that also ends with COMMIT query not Xid event.
Hmm. I see. But thd isn't a universal dumpster for passing arguments to the function, don't use it like that.
Well, I had to resort to that. To explain,
Please, pass this "read-only transaction with no xid" flag to the Gtid_log_event constructor as an argument. Or recalculate it inside the constructor, it's not such an expensive thing to do.
To pass a flag ensues adding an extra arg to struct handlerton::commit which I don't feel easy about considering the reason.
yes, one more reason to remove binlog_hton. But for now, like with binlog_commit_by_xid, I'd say, call binlog_commit directly and make binlog_hton->commit a no-op. Like: --- a/sql/handler.cc +++ b/sql/handler.cc @@ -1935,6 +1935,7 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS *trans, if (ha_info) { + binlog_commit(ht, thd, all); for (; ha_info; ha_info= ha_info_next) { int err;
+ if (!truncate_validated) + { + DBUG_ASSERT(round <= 2); + /* + Either truncate was not set or was reset, else + it gets incremented, otherwise it may only set to an earlier + offset only at the turn out of the 1st round.
sorry, I cannot parse that :(
The comments are for the assert in a branch that acts when the truncate position (gtid) is not yet finalized. Let me annotate under each assert's line.
+ */ + DBUG_ASSERT(truncate_gtid.seq_no == 0 ||
'truncate was not set or was reset' ^, that is truncate_gtid. The 'was reset' is actually not present in the assert - that'd be reflected with truncate_reset_done == true
+ last_gtid_coord >= binlog_truncate_coord ||
'it gets incremented' ^
why does that mean "it gets incremented" ?
I think my English failed me.
It is getting incremented
I understand that "it gets incremented" means "it = it + 1", no problem. How this assert verifies that "it gets incremented" ?
meant to say (in my last reply as well). The use case is that at this point there's a truncate candidate /* 1 */ and it has not been settled yet /* 2 */,
!truncate_validated == true /* 2 */ && truncate_gtid.seq_no > 0 /* 1 */
and the current gtid is still "better" to take over as the candidate in this block. The candidate's offset must not decrease (unless...
+ (round == 2 && truncate_set_in_1st));
And if it's neither of the above there must've been truncate candidate set in the 1st round.
... ... the above ).
switch (typ) { + case FORMAT_DESCRIPTION_EVENT: + if (round > 1) + { + if (fdle != fdle_arg) + delete fdle; + fdle= (Format_description_log_event*) ev;
why is it needed? all binlog files should have the same Format_description_log_event anyway.
Actually contrary to what was stated (the patch removed that claim) in the parent revision of this commit, the FD:s of the recovery binlog files may have different checksums. I.e
set @@global.binlog_checksum = value
can be issued to rotate binlog to a new hot file having different checksum value while the binlog *CheckPoint* file is behind. So this is possible to have at recovery:
B_1(CP, checksum = NONE), B_2(hot, checksum = CRC32)
which also manifests a normal use case bug I have not reported any bug (though wanted to).
okay, checksum can differ. But how does it affect Format_description_log_event ?
FD holds knowledge of whether events in the file are with the checksum info. In my example above B_1 events would be (are) considered as checksummed by the base.
Okay. Please, 1. add a comment about it 2. add a test case for it ideally, also, move it into a separate commit Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
participants (1)
-
Sergei Golubchik