developers
Threads by month
- ----- 2025 -----
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 7 participants
- 6851 discussions

Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 17 Mar '21
by Sergei Golubchik 17 Mar '21
17 Mar '21
Hi, Andrei!
Few comments/questions below.
I still have troubles understanding the logic of your solution,
so I did not ask any questions about those your replies - as I'd be just
repeating the same thing over and over. After I'll start getting it
finally, I'll go over your reply again.
Meanwhile - minor comments/questions, not related to the main algorithm.
On Mar 04, Andrei Elkin wrote:
> >> To facilitate the failover on the slave side conditions to accept
> >> own events (having been discarded by the above recovery) are
> >> relaxed to let so for the semisync slave that connects to master in
> >> gtid mode. gtid_strict_mode is further recommended to secure from
> >> inadvertent re-applying out of order gtids in general. Non-gtid
> >> mode connected semisync slave would require
> >> --replicate-same-server-id (mind --log-slave-updates must be OFF
> >> then).
> >
> > Sorry, I failed to understand this paragraph :(
>
> 'gtid_strict_mode is further recommended to secure
> from inadvertent re-applying out of order gtids in general.'
>
> attempted to promote GTID connection with master,
> which of course should not be of this patch's
> business, but in a way it is 'cos the non-gtid's mode
> (CHANGE MASTER TO ... master_use_gtid = NO) is covered by the patch
> with a limitation of '--log-slave-updates must be OFF'.
>
> The GTID-ON connection is covered, and without any `--replicate-same-server-id'.
This doesn't seem to clarify anything.
I mean the whole paragraph, starting from "To facilitate" and to "then)."
Could you try to rephrase it to make it clearer, please?
> >> +--let $query1 = INSERT INTO t VALUES (20)
> >> +--let $query2 = DELETE FROM t2 WHERE f = 666 /* no such record */
> >> +--source binlog_truncate_active_log.inc
> >> +
> >> +--echo # Case B.
> >> +# The inverted sequence ends up to truncate only $query2
> >> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-10
> >> +--let $query1 = DELETE FROM t2 WHERE f = 0
> >
> > why not `= 666 /* no such record */` ?
> > is `= 0` important here?
>
> Yes, there's a fine detail here.
>
> 'The inverted sequence ends up' to leave in binlog
>
> $query1 = DELETE FROM t2 WHERE f = 0
>
> which is as ineffective as 666. The patch recognizes the ineffectiveness
> through a special value of engines involved counter in Gtid event.
> In principle the truncation could
> start from it rather than the following $query2.
> But that would mean more complicated coding.
> So the semantics of the zero is to be good, as we're so used to :-)!
Sorry, I didn't understand anything. What does the value in the WHERE
clause has to do with engines and where the binlog is truncated?
> >
> >> +--let $query2 = INSERT INTO t VALUES (20)
> >> +--source binlog_truncate_active_log.inc
...
> >> +--connection master2
> >> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> >> +if ($pre_q2)
> >
> > is $pre_q2 a flag or a query string?
> > it's used as a flag, it's set as a query string.
>
> It's actually both with the idea execute it when defined.
> Legit, right?
legit, but it's not both. Were it both, you'd have
if ($pre_q2) {
eval $pre_q2;
}
but you have,
if ($pre_q2) {
CALL sp_blank_xa;
}
> >> +{
> >> + CALL sp_blank_xa;
> >> +}
...
> >> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> >> new file mode 100644
> >> index 00000000000..fe153e5703c
> >> --- /dev/null
> >> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> >> +#
> >> +# ==== Implementation ====
> >> +#
> >> +# Steps:
> >> +# 0 - Create two tables in innodb and rocksdb engines,
> >> +#
> >> +# In loop for A,B,C cases (below) do 1-5:
> >
> > there's no loop now (which is good, don't add it please :)
[x] ?
> >> +# The 1st trx binlogs, rotate binlog and hold on before committing at engine
> >> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR master1_go";
> >
> > you don't wait for master1_ready anywhere. intentional?
>
> No, actually
>
> SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
>
> must be run by below master2 (sim to how master3 waits for master2).
meaning [x] ?
> >> +--source include/have_binlog_format_row.inc
> >> +--let $rpl_topology=1->2
> >> +--source include/rpl_init.inc
> >
> > isn't this a normal master-slave.inc topology?
>
> Initially yes.
>
> I am guessing you mean why not to
> --source it.
> The test heavily uses 'server_`index`' format to automate
> replication direction.
as far as I remember, master-slave.inc also defines server_1 and
server_2 connections.
> >> diff --git a/sql/sql_class.h b/sql/sql_class.h
> >> index 140394fefc1..87fa88d4c89 100644
> >> --- a/sql/sql_class.h
> >> +++ b/sql/sql_class.h
> >> @@ -4632,7 +4632,7 @@ class THD :public Statement,
> >> LF_PINS *tdc_hash_pins;
> >> LF_PINS *xid_hash_pins;
> >> bool fix_xid_hash_pins();
> >> -
> >> + bool is_1pc_ro_trans;
> >
> > I don't like how it looks, it's in THD (for every single connection
> > and more) and it's only used in some exotic cases. It's set in two
> > places around ha_commit_one_phase(), when it would be better to set
> > it inside ha_commit_one_phase(), in one place only.
> >
> > But I cannot suggest how to get rid of it yet as I don't understand
> > what it's for, I've asked a question about it below, in
> > Gtid_log_event::Gtid_log_event
>
> The flag is to help sorting out ineffective (no engine data change)
> STATEMENT binlog-format logged transaction. I thought it's practical
> enough to face and important to handle.
> Through an example,
> let the table `t' be initially empty, and
> let binlog contain
>
> trx1: INSERT INTO t SET a = 1; /* slave acked, trx got committed in engine */
> trx2: INSERT INTO t SET a = 2; /* not acked, trx remains prepared in engine */
> trx3: DELETE FROM t WHERE a = 0; /* ineffective, does not have Xid event */
>
> prior a crash with their states as commented.
> The truncate trx should be trx2 as the trx3's status
> suggests it's harmless for involving into
> truncation.
> However we have yet to distinguish it from unsafe (to truncate)
> event group ("transaction") that also ends with COMMIT query not Xid event.
Hmm. I see. But thd isn't a universal dumpster for passing
arguments to the function, don't use it like that.
Please, pass this "read-only transaction with no xid" flag to the
Gtid_log_event constructor as an argument. Or recalculate it inside the
constructor, it's not such an expensive thing to do.
> One rather drastic solution could be to not binlog DELETE altogether.
> I did not look into this with much of interest earlier, but how about it?
> Perhaps it'd more like a bug fixes than breaking legacy.
No, I suspect it's too drastic
> >> /* Members related to temporary tables. */
> >> public:
> >> /* Opened table states. */
> >> diff --git a/sql/slave.cc b/sql/slave.cc
> >> index 28e08e32346..9b3f73e5341 100644
> >> --- a/sql/slave.cc
> >> +++ b/sql/slave.cc
> >> @@ -6944,7 +6945,9 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len)
> >> }
> >> else
> >> if ((s_id == global_system_variables.server_id &&
> >> - !mi->rli.replicate_same_server_id) ||
> >> + (!mi->rli.replicate_same_server_id &&
> >> + !(semisync_recovery= (rpl_semi_sync_slave_enabled &&
> >> + mi->using_gtid != Master_info::USE_GTID_NO)))) ||
> >
> > How can "semisync recovery" code path reach queue_event()?
>
> This is a failover now, after recovery.
> The slave which is ex-master tries to adopt "own" (generated by it) events.
1. don't call it "semisync recovery" then
2. add a comment explaining this case (it's a rare and not obvious one)
3. why does it have to be specially recognized anyway?
> >
> >> event_that_should_be_ignored(buf) ||
> >> /*
> >> the following conjunction deals with IGNORE_SERVER_IDS, if set
> >> diff --git a/sql/log_event.h b/sql/log_event.h
> >> index 8a342cb5cd3..1588ab85104 100644
> >> --- a/sql/log_event.h
> >> +++ b/sql/log_event.h
> >> @@ -482,6 +482,16 @@ class String;
> >> */
> >> #define LOG_EVENT_IGNORABLE_F 0x80
> >>
> >> +/**
> >> + @def LOG_EVENT_ACCEPT_OWN_F
> >> +
> >> + Flag sets by the gtid-mode connected semisync slave for
> >> + the same server_id ("own") events which the slave must not have
> >> + in its state. Typically such events were never committed by
> >> + their originator (this server) and discared at its crash recovery
> >
> > sorry, I failed to understand that
>
> This should've been explained above.
> The slave which is ex-master truncated at recovery some of its ("own") transactions.
> The flag serves to help recognize that at execution of the replication events.
1. yes, this is clearer, thanks. may be you should rewrite your comment
putting this explanation in it?
2. why does it have to be specially recognized anyway?
...
> > 3. also it's somewhat an unfounded assumption that only r/w engines
> > will have the transaction prepared. But we can make it a fact, if we
> > won't prepare r/o engines at all (in ha_prepare).
>
> [?]
>
> Sorry, I don't get it.
> Do you mean that an r/o engine (branch) might need to be counted by
> `ha_count_rw()`? But then it would be contributing with binlogged
> events of its branch... I can't imagine that.
I mean the following. Say, you have two engines participating in a
transaction, like in INSERT innodb_table SELECT * FROM rocksdb_table;
One is r/w, the other is r/o. It's a 2pc transaction, the server does
prepare for everyone, then commit for everyone.
I don't think there is any guarantee that prepare for an r/o engine will
be a no-op. It is completely up to the engine what to do in this case,
it's possible that the engine will optimize it to a no-op, but it is not
a certainty.
What I think we should do is not to call prepare() for r/o engines at
all. Just skip straight to commit/rollback on the second phase.
> >> diff --git a/sql/handler.cc b/sql/handler.cc
> >> index 6792e80b8fe..247ab7267bf 100644
> >> --- a/sql/handler.cc
> >> +++ b/sql/handler.cc
> >> @@ -1245,6 +1245,24 @@ int ha_prepare(THD *thd)
> >> DBUG_RETURN(error);
> >> }
> >>
> >> +/*
> >> + Returns counted number of
> >> + read-write recoverable transaction participants.
> >> +*/
> >> +uint ha_count_rw(THD *thd, bool all)
> >
> > the name doesn't match the implementation, please, rename
>
> [?]
>
> Please help here.
> I could not find what specifically you mean.
> It
> returns rw_ha_count
>
> which I think as 'read-write recoverable transaction participants.'
I read the name is "a number of rw engines in the ha_info",
it does not mention "recoverable", and I from the name I woulnd't have
guessed. May be ha_count_rw_2pc ?
> >> +{
> >> + unsigned rw_ha_count= 0;
> >> + THD_TRANS *trans=all ? &thd->transaction.all : &thd->transaction.stmt;
> >> +
> >> + for (Ha_trx_info * ha_info= trans->ha_list; ha_info;
> >> + ha_info= ha_info->next())
> >> + {
> >> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> >> + ++rw_ha_count;
> >> + }
> >> + return rw_ha_count;
> >> +}
> >> +
> >> /**
> >> Check if we can skip the two-phase commit.
> >>
> >> @@ -1960,8 +1982,122 @@ struct xarecover_st
> >> XID *list;
> >> HASH *commit_list;
> >> bool dry_run;
> >> + MEM_ROOT *mem_root;
> >> + bool error;
> >> };
> >>
> >> +/**
> >> + Inserts a new hash member.
> >> +
> >> + returns a successfully created and inserted @c xid_recovery_member
> >> + into hash @c hash_arg,
> >> + or NULL.
> >> +*/
> >> +static xid_recovery_member*
> >> +xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root)
> >> +{
> >> + xid_recovery_member *member= (xid_recovery_member*)
> >> + alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
> >> + if (!member)
> >> + return NULL;
> >> +
> >> + member->xid= xid_arg;
> >> + member->in_engine_prepare= 1;
> >> + member->decided_to_commit= false;
> >> +
> >> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> >> +}
> >> +
> >> +/*
> >> + Inserts a new or updates an existing hash member to increment
> >> + the member's prepare counter.
> >> +
> >> + returns false on success,
> >> + true otherwise.
> >> +*/
> >> +static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
> >> + MEM_ROOT *ptr_mem_root)
> >> +{
> >> + xid_recovery_member* member;
> >> + if ((member= (xid_recovery_member *)
> >> + my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
> >> + member->in_engine_prepare++;
> >> + else
> >> + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root);
> >> +
> >> + return member == NULL;
> >> +}
> >> +
> >> +/*
> >> + Hash iterate function to complete with commit or rollback as decided
> >> + (typically at binlog recovery processing) in member->in_engine_prepare.
> >> +*/
> >> +static my_bool xarecover_do_commit_or_rollback(void *member_arg,
> >> + void *hton_arg)
> >> +{
> >> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> >> + handlerton *hton= (handlerton*) hton_arg;
> >> + xid_t x;
> >> + my_bool rc;
> >> +
> >> + x.set(member->xid);
> >> + rc= member->decided_to_commit ? hton->commit_by_xid(hton, &x) :
> >> + hton->rollback_by_xid(hton, &x);
> >> +
> >> + DBUG_ASSERT(rc || member->in_engine_prepare > 0);
> >> +
> >> + if (!rc)
> >> + {
> >
> > I don't think you can trust rc on that.
> > if it's non-zero, it's an error all right.
> > but it's a bit of a stretch to presume that
> > nonexisting xid is always an error.
> >
> > also commit_by_xid returns int, not my_bool.
>
> [?]
I mean, if an engine returns success when you ask it to commit or
rollback an unknown xid - this is not a bug, right? Commit means "make
all changes by this xid persistent", if there are no changes, there is
nothing to do, so doing nothing and reporting success isn't exactly
wrong. Same for rollback.
> I'd rather to stick with the patch and require from any engine to
> return \cite{xa spec}
>
> [XAER_NOTA] The specified XID is not known by the resource manager
>
> What'd you say?
Okay. I'd say I was wrong, if the standard requires a resource manager
to return an error for an unknown XID, then you surely can rely on that
:)
Sorry for confusion.
> >> +
> >> + return false;
> >> +}
> >> +
> >> +static my_bool xarecover_do_count_in_prepare(void *member_arg,
> >> + void *ptr_count)
> >> +{
> >> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> >> + if (member->in_engine_prepare)
> >> + {
> >> + *(uint*) ptr_count += member->in_engine_prepare;
> >
> > This is a rather weird semantics.
> > it's kind of a number of transactions times number of engines.
> > if one transaction wasn't committed in two engines and another
> > transaction wasn't rolled back in one engine, the counter will be 3.
> > how are you going to explain this to users?
>
> So `ptr_count` actually is for transaction branches, not transactions.
> But the ultimate warning speaks erronously the latter
>
> + sql_print_warning("Could not complete %u number of transactions in "
> "engines.", count_in_prepare);
>
> So I'd convert the counter to count the transactions:
>
> (*(uint*) ptr_count) ++
>
> [?]
yes, that'd be easier to understand, thanks
> >> diff --git a/sql/log.cc b/sql/log.cc
> >> index 8073f09ab88..a9808ed8823 100644
> >> --- a/sql/log.cc
> >> +++ b/sql/log.cc
...
> >> + char buf[21];
> >> +
> >> + longlong10_to_str(ptr_gtid->seq_no, buf, 10);
> >> + sql_print_information("Successfully truncated binlog file:%s "
> >> + "to pos:%llu to remove transactions starting from "
> >> + "GTID %u-%u-%s", file_name, pos,
> >> + ptr_gtid->domain_id, ptr_gtid->server_id, buf);
> >> + }
> >> + if (!(error= init_io_cache(&cache, file, IO_SIZE, WRITE_CACHE,
> >> + (my_off_t) pos, 0, MYF(MY_WME|MY_NABP))))
> >> + {
> >> + /*
> >> + Write Stop_log_event to ensure clean end point for the binary log being
> >> + truncated.
> >> + */
> >
> > I'm not sure about it. The less you touch the corrupted binlog the better.
> > simply truncating it would be enough, wouldn't it?
>
> Yes, it would.
> But I somewhat preferred a tidy-up style.
>
> [?]
>
> If you feel strong to remove it will be done.
I feel it'd be safer not to write anything into the corrupted binlog.
> >> + Stop_log_event se;
> >> + se.checksum_alg= cs_alg;
> >> + if ((error= write_event(&se, NULL, &cache)))
> >> + {
> >> + sql_print_error("Failed to write stop event to "
> >> + "binary log. Errno:%d",
> >> + (cache.error == -1) ? my_errno : error);
> >> + goto end;
> >> + }
> >> + clear_inuse_flag_when_closing(cache.file);
> >> + if ((error= flush_io_cache(&cache)) ||
> >> + (error= mysql_file_sync(file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> >> + {
> >> + sql_print_error("Faild to write stop event to "
> >> + "binary log. Errno:%d",
> >> + (cache.error == -1) ? my_errno : error);
> >> + }
> >> + }
> >> + else
> >> + sql_print_error("Failed to initialize binary log "
> >> + "cache for writing stop event. Errno:%d",
> >> + (cache.error == -1) ? my_errno : error);
> >
> > 1. I don't see why you need to sync after every operation.
>
> Yep, there's one fsync before Stop event writing, redundant if Stop will
> stay in the final patch.
As far as I understand, if Stop is removed, you shouldn't need any syncs
at all, just truncate and close.
> >> + int next_binlog_or_round(int& round,
> >> + const char *last_log_name,
> >> + const char *binlog_checkpoint_name,
> >> + LOG_INFO *linfo, MYSQL_BIN_LOG *log);
> >> + bool is_safe()
> >
> > what do you mean "safe" ?
>
> Safe to truncate condition is explained elsewhere to mean there are no
> non-transactional group of events within the truncated tail.
> Earlier mentioned ineffective COMMIT-query ended "transactions" are safe.
may be is_safe_to_truncate() then?
...
> >> + if (!truncate_validated)
> >> + {
> >> + DBUG_ASSERT(round <= 2);
> >> + /*
> >> + Either truncate was not set or was reset, else
> >> + it gets incremented, otherwise it may only set to an earlier
> >> + offset only at the turn out of the 1st round.
> >
> > sorry, I cannot parse that :(
>
> The comments are for the assert in a branch that acts when the truncate position
> (gtid) is not yet finalized.
> Let me annotate under each assert's line.
>
> >
> >> + */
> >> + DBUG_ASSERT(truncate_gtid.seq_no == 0 ||
>
> 'truncate was not set or was reset' ^, that is truncate_gtid.
> The 'was reset' is actually not present in the assert - that'd be
> reflected with
> truncate_reset_done == true
>
>
> >> + last_gtid_coord >= binlog_truncate_coord ||
>
> 'it gets incremented' ^
why does that mean "it gets incremented" ?
>
> >> + (round == 2 && truncate_set_in_1st));
>
> And if it's neither of the above there must've been truncate candidate
> set in the 1st round.
...
> >> switch (typ)
> >> {
> >> + case FORMAT_DESCRIPTION_EVENT:
> >> + if (round > 1)
> >> + {
> >> + if (fdle != fdle_arg)
> >> + delete fdle;
> >> + fdle= (Format_description_log_event*) ev;
> >
> > why is it needed? all binlog files should have the same
> > Format_description_log_event anyway.
>
> Actually contrary to what was stated (the patch removed that claim) in
> the parent revision of this commit, the FD:s of the
> recovery binlog files may have different checksums. I.e
>
> set @@global.binlog_checksum = value
>
> can be issued to rotate binlog to a new hot file having different checksum value
> while the binlog *CheckPoint* file is behind. So this is possible to
> have at recovery:
>
> B_1(CP, checksum = NONE), B_2(hot, checksum = CRC32)
>
> which also manifests a normal use case bug I have not reported any bug (though wanted to).
okay, checksum can differ. But how does it affect
Format_description_log_event ?
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0

Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 15 Mar '21
by Sergei Golubchik 15 Mar '21
15 Mar '21
Hi, Andrei!
On Mar 01, Andrei Elkin wrote:
> > I've reviewed almost everything, see comments below. But not the
> > Recovery_context methods. Please explain how it works and how all
> > these truncate_validated, truncate_reset_done, truncate_set_in_1st,
> > etc all work together.
>
> ...specifically to this point. Just in case I hope you did not miss to
> read recovery_design.txt from the MDEV, which does not go into coding
> details that you're effectively about in above.
I've read it now. Still I don't understand why you need an extra round
of binlog scanning (two rounds if only one file, three rounds if many).
Also, this recovery_design.txt is not part of the code, so whoever will
look at it later will be just as puzzled as I was.
...
> [ G1, G2, ..., G_k, g_{k+1}, ... g_n ]
>
> here the uppercase `G' stands for committed trx, the smallcase `g' for
> prepared,`_k' - sub-scripts in the recovery sequence. As the capital
> letter first rules in the single-engine case the first occurrence of a
> pattern `G_k,g_k+1' identifies the truncate index. The patch reflects
> such fact with raising `truncate_validated' flag.
>
> But it's more complicated in the multiple binlog files / engines case.
> The first `Gg' letter-case drop is not guaranteed to be the only drop
> so `truncate_reset_done' and `truncate_set_in_1st' are introduced to
> help with truncate index identification.
Why is it not guaranteed?
> When `truncate_validated' is set that indicates the truncate index is
> determined and may not change in the current (1st of 2nd) nor future
> rounds. `truncate_reset_done' says that an "inverse" `g_k,G_k+1' pair
> is found so that any earlier truncation candidate gets reset (to
> "zero"). If there will be later any candidate found in *this* (1st or
> 2nd) round in the sequence its index will be obviously greater.
>
> `truncate_set_in_1st' function is to remember that the truncate
> candidate was found in the 1st round (in the "hot" binlog file), but
> if the candidate has not been validated `!truncate_validated' it may
> be exacted in the 2nd round and then to an earlier transaction. So the
> flag helps to handle exception from truncate candidate monotony rule:
> e.g the hot binlog B2 contains `[g5,g6,...g_n]' and a ref to binlog
> checkpoint file B1 that contains `[G1,g2,G3,g4]'. The first round
> truncate candidate of g5 would be first exacted to `g2' before finally
> ascertained to `g4' in the 2nd round. (Notice `g2 -> g4' preserves the
> truncate index monotony).
>
> Notice that due to exacting like `g2 -> g4' in the 2nd round of the
> above example `g2' got "to be up-cased" into `G2' for committing
> (feasible with two trx:s on two different engine scenario - g2 with
> Innodb only got prepared, G4 - on Rocksdb, and got committed prior the
> crash). That's what the 3rd round is for.
>
> I hope this will be helpful.
Unfortunately, not very much. It describes how you juggle with variables
and scanning rounds. But not *what* you're trying to find. And it's kind
of difficult to reverse engineer your "how" back into "what" :(
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0

[Maria-developers] MDEV-17399: JSON_TABLE: Incorrect code with table elimination
by Sergey Petrunia 15 Mar '21
by Sergey Petrunia 15 Mar '21
15 Mar '21
Hi Alexey,
> diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc
> index 3958797ec44..f2497d524ec 100644
> --- a/sql/opt_table_elimination.cc
> +++ b/sql/opt_table_elimination.cc
> @@ -637,6 +637,16 @@ void eliminate_tables(JOIN *join)
> List_iterator<Item> it(join->fields_list);
> while ((item= it++))
> used_tables |= item->used_tables();
> +
> + {
> + List_iterator<TABLE_LIST> it(*join->join_list);
> + TABLE_LIST *tbl;
> + while ((tbl= it++))
> + {
> + if (tbl->table_function)
> + used_tables|= tbl->table_function->used_tables();
> + }
> + }
>
This only walks the tables that are "at the top level" of the join. if a
JSON_TABLE(...) is inside a nested outer join, it will not be found.
Please walk select_lex->leaf_tables instead. Please add the testcase (I provide
one below).
Please also add a comment, clarifying what is being done, something like:
Table function JSON_TABLE() can have references to other tables. Do not
eliminate the tables that JSON_TABLE() refers to. Note: the JSON_TABLE itself
cannot be eliminated as it doesn't have unique keys.
== Testcase ==
create table t20 (a int not null);
create table t21 (a int not null primary key, js varchar(100));
insert into t20 select seq from seq_1_to_100;
insert into t21 select a, '{"a":100}' from t20;
create table t31(a int);
create table t32(b int);
insert into t31 values (1);
insert into t32 values (1);
explain
select
t20.a, jt1.ab
from
t20
left join t21 on t20.a=t21.a
join
(t31 left join (t32 join JSON_TABLE(t21.js,'$' COLUMNS (ab INT PATH '$.a')) AS jt1) on t31.a<3);
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
3

Re: [Maria-developers] f1db957c5da: MDEV-21469: Implement crash-safe binary logging of the user XA
by Sergei Golubchik 15 Mar '21
by Sergei Golubchik 15 Mar '21
15 Mar '21
Hi, Sujatha!
Note, this is a review of a combined diff dc92235f21e + f1db957c5da
On Mar 14, Sujatha wrote:
> revision-id: f1db957c5da (mariadb-10.5.2-247-gf1db957c5da)
> parent(s): dc92235f21e
> author: Sujatha <sujatha.sivakumar(a)mariadb.com>
> committer: Andrei Elkin <andrei.elkin(a)mariadb.com>
> timestamp: 2020-12-21 16:10:46 +0200
> message:
>
> MDEV-21469: Implement crash-safe binary logging of the user XA
>
> Make XA PREPARE, XA COMMIT and XA ROLLBACK statements
> crash-safe when --log-bin is specified.
>
> At execution of XA PREPARE, XA COMMIT and XA ROLLBACK their replication
> events are made written into the binary log prior to execution
> of the commands in storage engines.
> In case the server crashes, after writing to binary log but before all
> involved engines have processed the command, the following recovery
> will execute the command's replication events to equalize the states
> of involved engines with that of binlog.
> That applies to all XA PREPARE *group* and XA COMMIT or ROLLBACK.
>
> On the implementation level the recovery time binary log parsing
> is augmented to pay attention to
> the user XA xids to identify the XA transactions' state:s in binary log, and
> eventually match them against their states in engines, see
> MYSQL_BIN_LOG::recover_explicit_xa_prepare().
> In discrepancy cases the outdated state in the engines is corrected with
> resubmitting the transaction prepare group of events, or completion
> ones. The multi-engine partly prepared XA PREPARE case
> the XA is rolled back first.
> The fact of multiple-engine involved is registered into
> Gtid_log_event::flags2 as one bit. The boolean value is
> sufficient and precise to deal with two engines in XA transaction.
> With more than 2 recoverable engines the flag method is still correct though
> may be pessimistic, as it treats all recoverable engines as XA
> participants. So when the number of such Engines exceeds the number
> of prepared engines of the XA that XA is treated
> as partially completed, with all that ensued.
> As an optimization no new bit is allocated in flags2, instead a
> pre-existing ones (of MDEV-742) are reused, observing that
> A. XA "COMPLETE" does not require multi-engine hint for its recovery and that
> B. the MDEV-742 XA-completion bit is not anyhow used by
> XA-PREPARE and its GTID log event.
>
> Notice the multi-engine flagging is superceded by MDEV-21117 extension
> in Gtid log event so this part should be taken from there.
> diff --git a/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> new file mode 100644
> index 00000000000..d0852de2fcc
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> @@ -0,0 +1,86 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA crash recovery works fine across multiple binary logs.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate an explicit XA transaction. Using debug simulation hold the
"debug sync" != "debug sim" :)
> +# execution of XA PREPARE statement after the XA PREPARE is written to
> +# the binary log. With this the prepare will not be done in engine.
> +# 1 - By executing FLUSH LOGS generate multiple binary logs.
> +# 2 - Now make the server to disappear at this point.
> +# 3 - Restart the server. During recovery the XA PREPARE from the binary
> +# log will be read. It is cross checked with engine. Since it is not
> +# present in engine it will be executed once again.
> +# 4 - When server is up execute XA RECOVER to check that the XA is
> +# prepared in engine as well.
> +# 5 - XA COMMIT the transaction and check the validity of the data.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +#
> +
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_log_bin.inc
> +
> +RESET MASTER;
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +connect(con1,localhost,root,,);
> +XA START 'xa1';
> +INSERT INTO t1 SET a=1;
> +SET DEBUG_SYNC= "simulate_hang_after_binlog_prepare SIGNAL con1_ready WAIT_FOR con1_go";
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +XA END 'xa1';
> +--send XA PREPARE 'xa1';
> +
> +connection default;
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +FLUSH LOGS;
> +FLUSH LOGS;
> +FLUSH LOGS;
> +
> +--source include/show_binary_logs.inc
> +--let $binlog_file= master-bin.000004
> +--let $binlog_start= 4
> +--source include/show_binlog_events.inc
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
I don't see why you'd need to wait first, it'd be simpler to restart right away.
> +EOF
> +
> +--error 0,2013
> +SET DEBUG_SYNC= "now SIGNAL con1_go";
> +--source include/wait_until_disconnected.inc
> +
> +--connection con1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +
> +SELECT * FROM t1;
> +
> +# Clean up.
> +connection default;
> +DROP TABLE t1;
> +
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> new file mode 100644
> index 00000000000..e972e3f09de
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> @@ -0,0 +1,98 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The COMMIT should be executed during recovery.
> +# 3 - Check the data in table. Both rows should be present in table.
> +# 4 - Trying to commit 'xa2' should report unknown 'XA' error as COMMIT is
> +# already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
Same. Why do you wait?
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> new file mode 100644
> index 00000000000..14cebbd9b13
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> @@ -0,0 +1,119 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that if for some reason an event cannot be applied during
> +# recovery, appropriate error is reported.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. Using debug simulation point make XA COMMIT 'xa2'
> +# execution to fail. The server will resume anyway
> +# to leave the error in the errlog (see "Recovery: Error..").
> +# 3 - Work around the simulated failure with Commit once again
> +# from a connection that turns OFF binlogging.
> +# Slave must catch up with the master.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CALL mtr.add_suppression("Recovery: Error .Out of memory..");
> +CALL mtr.add_suppression("Crash recovery failed.");
> +CALL mtr.add_suppression("Can.t init tc log");
> +CALL mtr.add_suppression("Aborting");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --debug-dbug=d,trans_xa_commit_fail
> +EOF
> +
> +connection default;
> +--source include/wait_until_disconnected.inc
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--echo *** must be no 'xa2' commit seen, as it's still prepared:
> +SELECT * FROM t;
> +XA RECOVER;
> +
> +# Commit it manually now to work around the extra binlog record
> +# by turning binlogging OFF by the connection.
> +
> +SET GLOBAL DEBUG_DBUG="";
> +SET SQL_LOG_BIN=0;
> +--error 0
why error 0? What will happen if you don't disable binlog?
> +XA COMMIT 'xa2';
> +SET SQL_LOG_BIN=1;
> +
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +
> +--sync_slave_with_master
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> new file mode 100644
> index 00000000000..7b987c7f29b
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> @@ -0,0 +1,95 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA PREPARE be done in binary log and crash the
> +# server so that it is not prepared in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' is
> +# present.
> +# 4 - Commit the XA transaction and verify its correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ditto
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> new file mode 100644
> index 00000000000..9d2c5cce528
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> @@ -0,0 +1,117 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 3 explicit XA transactions. 'xa1', 'xa2' and 'xa3'.
> +# Using debug simulation hold the execution of second XA PREPARE
> +# statement after the XA PREPARE is written to the binary log.
> +# With this the prepare will not be done in engine.
> +# 1 - For 'xa3' allow the PREPARE statement to be written to binary log and
> +# simulate server crash.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2' and 'xa3'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' and 'xa3'
> +# are present along with 'xa1'.
> +# 4 - Commit all the XA transactions and verify their correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ok, in this test you actually use master2 :)
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Found 2 prepared XA transactions");
> +CALL mtr.add_suppression("Found 3 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +use test;
> +xa start 'xa2';
> +insert into t values (30);
> +xa end 'xa2';
> +SET DEBUG_SYNC="simulate_hang_after_binlog_prepare SIGNAL reached WAIT_FOR go";
> +send xa prepare 'xa2';
> +
> +--connection master2
> +let $wait_condition=
> + SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST
> + WHERE STATE like "debug sync point: simulate_hang_after_binlog_prepare%";
> +--source include/wait_condition.inc
eh? you can just 'WAIT_FOR signal', couldn't you?
> +
> +XA START 'xa3';
> +INSERT INTO t VALUES (40);
> +XA END 'xa3';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa3';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +XA COMMIT 'xa2';
> +XA COMMIT 'xa3';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> new file mode 100644
> index 00000000000..1d19f96116e
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> @@ -0,0 +1,97 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
XA ROLLBACK
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA ROLLBACK be done in binary log and crash the
> +# server so that it is not committed in engine.
not rolled back
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The ROLLBACK should be executed during recovery.
> +# 3 - Check the data in table. Only one row should be present in table.
> +# 4 - Trying to rollback 'xa2' should report unknown 'XA' error as rollback
> +# is already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
unused again
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA ROLLBACK 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA ROLLBACK 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> new file mode 100644
> index 00000000000..c7cefbda4bf
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> @@ -0,0 +1,46 @@
> +# MDEV-742, MDEV-21469 XA replication, and xa crash-safe.
> +# Tests prove xa state is recovered to a prepared or completed state
> +# upon post-crash recovery, incl a multi-engine case.
> +
> +--source include/have_rocksdb.inc
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_binlog_format_row.inc
> +
> +--connection slave
> +--source include/stop_slave.inc
> +
> +--connection master
> +CALL mtr.add_suppression("Found . prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=rocksdb;
> +CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=innodb;
> +INSERT INTO t1 SET a = 1;
> +INSERT INTO t2 SET a = 1;
> +
> +--let $xa=xa1
> +--let $when=after_binlog
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--let $xa=xa2
> +--let $when=after_first_engine
> +--let $finally_expected=$xa in the list
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +
> +--connection master
> +--sync_slave_with_master
> +let $diff_tables= master:t1, slave:t1;
> +source include/diff_tables.inc;
> +let $diff_tables= master:t2, slave:t2;
> +source include/diff_tables.inc;
> +
> +--connection master
> +SET SESSION DEBUG_DBUG="";
> +drop table t1,t2;
> +--sync_slave_with_master
> +
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> new file mode 100644
> index 00000000000..312b8b6a19e
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> @@ -0,0 +1,67 @@
> +# callee of rpl_rocksdb_xa_recover.test
> +# requires t1,t2 as defined in the caller.
> +
> +--let $at=_prepare
> +--let $finally_expected=$xa in the list
> +--let $init_val=`SELECT MAX(a) from t1 as t_1`
why as t_1 ?
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 1
> +--eval INSERT INTO t2 SET a=$init_val + 1
> +--eval XA END '$xa'
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA PREPARE '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval XA ROLLBACK '$xa'
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (rolled back)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 2
> +--eval INSERT INTO t2 SET a=$init_val + 2
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA ROLLBACK '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (committed away)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 3
> +--eval INSERT INTO t2 SET a=$init_val + 3
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA COMMIT '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as three FROM t1
> +--eval SELECT MAX(a) - $init_val as three FROM t2
> diff --git a/sql/log_event.h b/sql/log_event.h
> index 639cbfbe7aa..e1880339e85 100644
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -3614,6 +3616,12 @@ class Gtid_log_event: public Log_event
> static const uchar FL_PREPARED_XA= 64;
> /* FL_"COMMITTED or ROLLED-BACK"_XA is set for XA transaction. */
> static const uchar FL_COMPLETED_XA= 128;
> + /*
> + To mark the fact of multiple transactional engine participants
> + in the prepared XA. The FL_COMPLETED_XA bit is reused by XA_PREPARE_LOG_EVENT,
What do you mean by that? XA_prepare_log_event does not inherit from
Gtid_log_event, it cannot use Gtid_log_event flags.
And, please, explain in a comment why you can reuse a bit for two different
flags. Like "the ambiguity between FL_COMPLETED_XA and FL_MULTI_ENGINE_XA is
resolved by the flag FL_PREPARED_XA. if it's set, then 128 means
FL_MULTI_ENGINE_XA, if it's not set, then 128 means FL_COMPLETED_XA"
> + oth the XA completion events do not need such marking.
other
> + */
> + static const uchar FL_MULTI_ENGINE_XA= 128;
>
> #ifdef MYSQL_SERVER
> Gtid_log_event(THD *thd_arg, uint64 seq_no, uint32 domain_id, bool standalone,
> diff --git a/sql/handler.h b/sql/handler.h
> index 0eff7bd930d..9acdd47bfc2 100644
> --- a/sql/handler.h
> +++ b/sql/handler.h
> @@ -1880,8 +1891,18 @@ class Ha_trx_info
> m_ht= ht_arg;
> m_flags= (int) TRX_READ_ONLY; /* Assume read-only at start. */
>
> - m_next= trans->ha_list;
> - trans->ha_list= this;
> + if (likely(!past_head))
> + {
> + m_next= trans->ha_list;
> + trans->ha_list= this;
> + }
> + else
> + {
> + DBUG_ASSERT(trans->ha_list);
> +
> + m_next= trans->ha_list->m_next;
> + trans->ha_list->m_next= this;
> + }
I don't like this fake generalization. There's no point for an arbitrary
engine to be put at a specific place in the list. Only binlog is special.
At least, remove this new argument and check for binlog_hton here.
But really, think of this - binlog is so special, may be it'd be easier not
to pretend it's a storage engine at all? No need for a fake binlog_hton,
no need to register it, etc. It kind of made sense in 2005, but it doesn't
necessarily make it now.
> }
>
> /** Clear, prepare for reuse. */
> diff --git a/sql/handler.cc b/sql/handler.cc
> index ffb89b30d92..b981294f0d6 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1338,6 +1345,9 @@ int ha_prepare(THD *thd)
> handlerton *ht= ha_info->ht();
> if (ht->prepare)
> {
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_prepare",
> + if (!ha_info->next()) DBUG_SUICIDE(););
looks like not simulate_crash_after_first_engine_prepare, but
more like simulate_crash_before_last_engine_prepare.
it's the same point in time if you have only two engines,
but the code still implements "before last", not "after first"
> +
> if (unlikely(prepare_or_error(ht, thd, all)))
> {
> ha_rollback_trans(thd, all);
> @@ -1368,22 +1378,22 @@ int ha_prepare(THD *thd)
> }
>
> /*
> - Like ha_check_and_coalesce_trx_read_only to return counted number of
> - read-write transaction participants limited to two, but works in the 'all'
> - context.
> - Also returns the last found rw ha_info through the 2nd argument.
> + Returns counted number of
> + read-write recoverable transaction participants optionally limited to two.
> + Also optionally returns the last found rw ha_info through the 2nd argument.
> */
> -uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info)
> +uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info, bool count_through)
I don't understand why you need a new argument. ptr_ha_info==NULL means don't
return a value there, this is a standard convention used everywhere.
also, it's not an "count rw all" it's are there more rw-all htons than N
where N can be 1 or 2.
So:
* either take N as an argument and return as soon as rw_ha_count > N
* or don't overoptimize here and just return full count every time
> {
> unsigned rw_ha_count= 0;
>
> for (auto ha_info= thd->transaction.all.ha_list; ha_info;
> ha_info= ha_info->next())
> {
> - if (ha_info->is_trx_read_write())
> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> {
> - *ptr_ha_info= ha_info;
> - if (++rw_ha_count > 1)
> + if (ptr_ha_info)
> + *ptr_ha_info= ha_info;
> + if (++rw_ha_count > 1 && !count_through)
> break;
> }
> }
> @@ -1876,6 +1886,10 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS *trans, bool is_real_trans)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
1. same as above
2. why not to use *different* names for these two crashes?
in case, you know, you'll want to crash at a specific point in the code, not
just somewhere? And in a case you'll actually want to crash just somewhere you
can always use "+d,simulate_crash_after_first_engine_commit,simulate_crash_after_first_engine_rollback"
(and it should be "before_last", of course, not "after_first")
> +
> if ((err= ht->commit(ht, thd, all)))
> {
> my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> @@ -1987,6 +2001,10 @@ int ha_rollback_trans(THD *thd, bool all)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
ditto
> +
> if ((err= ht->rollback(ht, thd, all)))
> { // cannot happen
> my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err);
> @@ -2098,8 +2116,24 @@ int ha_commit_or_rollback_by_xid(XID *xid, bool commit)
> xaop.xid= xid;
> xaop.result= 1;
>
> + if (binlog_hton->recover)
> + {
> + /*
> + When the binlogging service is enabled complete the transaction
> + by it first.
> + */
> + if (commit)
> + binlog_hton->commit_by_xid(binlog_hton, xid);
> + else
> + binlog_hton->rollback_by_xid(binlog_hton, xid);
> + }
> plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton,
> MYSQL_STORAGE_ENGINE_PLUGIN, &xaop);
> + if (binlog_hton->recover)
> + {
> + THD *thd= current_thd;
> + thd->reset_binlog_completed_by_xid();
> + }
1. this, precisely, shows that binlog does not need a hton anymoreo
2. how can binlog_hton->recover be NULL here?
>
> return xaop.result;
> }
> @@ -2222,6 +2258,7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
>
> if (hton->recover)
> {
> + info->recover_htons++;
don't forget to subtract 1 for binlog_hton->recover,
otherwise it'll make you to rollback everything with a FL_MULTI_ENGINE_XA flag
> while ((got= hton->recover(hton, info->list, info->len)) > 0 )
> {
> sql_print_information("Found %d prepared transaction(s) in %s",
> @@ -2263,7 +2300,21 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> _db_doprnt_("ignore xid %s", xid_to_str(buf, info->list[i]));
> });
> xid_cache_insert(info->list + i);
> + XID *foreign_xid= info->list + i;
swap those two lines ^^^ then you can write
xid_cache_insert(foreign_xid);
> info->found_foreign_xids++;
> +
> + /*
> + For each foreign xid prepraed in engine, check if it is present in
> + xa_prepared_list of binlog.
> + */
> + if (info->xa_prepared_list)
for simplicity I'd rather always initialize xa_prepared_list and remove
this if()
> + {
> + struct xa_recovery_member *member= NULL;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(info->xa_prepared_list, foreign_xid->key(),
> + foreign_xid->key_length())))
> + member->in_engine_prepare++;
> + }
> continue;
> }
> if (IF_WSREP(!(wsrep_emulate_bin_log &&
> @@ -2362,7 +2424,7 @@ int ha_recover(HASH *commit_list)
> info.found_my_xids, opt_tc_log_file);
> DBUG_RETURN(1);
> }
> - if (info.commit_list)
> + if (info.commit_list && !info.found_foreign_xids)
Why? after the finished crash recovery one can still have transactions in
doubt.
> sql_print_information("Crash recovery finished.");
> DBUG_RETURN(0);
> }
> diff --git a/sql/log.cc b/sql/log.cc
> index 731bb3e98f0..c69b8518cf4 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -2019,28 +2027,49 @@ static int binlog_xa_recover_dummy(handlerton *hton __attribute__((unused)),
> return 0;
> }
>
> -
> +/*
> + The function invokes binlog_commit() and returns its result
> + when it has not yet called it already.
> + binlog_cache_mngr::completed_by_xid remembers the fact of
> + the 1st of maximum two subsequent calls.
> +*/
> static int binlog_commit_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT);
>
> - return binlog_commit(hton, thd, TRUE);
> -}
> + if (!cache_mngr->completed_by_xid)
On what code path can you have completed_by_xid == false here?
> + {
> + rc= binlog_commit(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
>
> + return rc;
> +}
>
> +/*
> + The function invokes binlog_rollback() and returns its results similarly
> + to a commit branch.
> +*/
> static int binlog_rollback_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_ROLLBACK ||
> (thd->transaction.xid_state.get_state_code() == XA_ROLLBACK_ONLY));
> - return binlog_rollback(hton, thd, TRUE);
> +
> + if (!cache_mngr->completed_by_xid)
> + {
> + rc= binlog_rollback(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
> +
> + return rc;
> }
>
>
> @@ -2195,6 +2224,12 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
> if (!all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
> + DBUG_SUICIDE(););
> + DEBUG_SYNC(thd, "simulate_hang_after_binlog_prepare");
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_prepare",
> + DBUG_SUICIDE(););
wouldn't you say there's one DBUG_EXECUTE_IF too many?
Just keep one, "simulate_crash_after_binlog_commit"
> +
> THD_STAGE_INFO(thd, org_stage);
> DBUG_RETURN(error);
> }
> @@ -2298,6 +2333,9 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
> thd->reset_binlog_for_next_statement();
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
same as above, use unique dbug keywords, like
"simulate_crash_after_binlog_rollback" (because it's rollback here)
> + DBUG_SUICIDE(););
> +
> DBUG_RETURN(error);
> }
>
> @@ -5749,6 +5794,14 @@ binlog_cache_mngr *THD::binlog_setup_trx_data()
> DBUG_RETURN(cache_mngr);
> }
>
> +/*
> + XA completion flag resetter for ha_commit_or_rollback_by_xid().
> +*/
> +void THD::reset_binlog_completed_by_xid()
> +{
> + binlog_setup_trx_data()->reset_completed_by_xid();
> +}
I don't think this makes any sense as a THD method
> +
> /*
> Function to start a statement and optionally a transaction for the
> binary log.
> @@ -10333,14 +10386,108 @@ start_binlog_background_thread()
> return 0;
> }
>
> +#ifdef HAVE_REPLICATION
> +/**
> + Auxiliary function for TC_LOG::recover().
> + @returns a successfully created and inserted @c xa_recovery_member
> + into hash @c hash_arg,
> + or NULL.
> +*/
> +static xa_recovery_member*
> +xa_member_insert(HASH *hash_arg, xid_t *xid_arg, xa_binlog_state state_arg,
> + MEM_ROOT *ptr_mem_root)
> +{
> + xa_recovery_member *member= (xa_recovery_member*)
> + alloc_root(ptr_mem_root, sizeof(xa_recovery_member));
> + if (!member)
> + return NULL;
> +
> + member->xid.set(xid_arg);
> + member->state= state_arg;
> + member->in_engine_prepare= 0;
> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> +}
> +
> +/* Inserts or updates an existing hash member with a proper state */
> +static bool xa_member_replace(HASH *hash_arg, xid_t *xid_arg, bool is_prepare,
> + MEM_ROOT *ptr_mem_root)
> +{
> + if(is_prepare)
> + {
> + if (!(xa_member_insert(hash_arg, xid_arg, XA_PREPARE, ptr_mem_root)))
> + return true;
> + }
> + else
> + {
> + /*
> + Search if XID is already present in recovery_list. If found
> + and the state is 'XA_PREPRAED' mark it as XA_COMPLETE.
> + Effectively, there won't be XA-prepare event group replay.
> + */
> + xa_recovery_member* member;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(hash_arg, xid_arg->key(), xid_arg->key_length())))
> + {
> + if (member->state == XA_PREPARE)
> + member->state= XA_COMPLETE;
> + }
> + else // We found only XA COMMIT during recovery insert to list
> + {
> + if (!(member= xa_member_insert(hash_arg,
> + xid_arg, XA_COMPLETE, ptr_mem_root)))
> + return true;
> + }
> + }
> + return false;
> +}
> +#endif
>
> +extern "C" uchar *xid_get_var_key(xid_t *entry, size_t *length,
define the first argument as const char*, then you won't need to cast in
my_hash_init, and you can be sure you pass the correct function with the
correct number of arguments, even if my_hash_get_key changes in the future.
and cast entry to xid_t* below explicitly.
> + my_bool not_used __attribute__((unused)))
> +{
> + *length= entry->key_length();
> + return (uchar*) entry->key();
> +}
> +
> +/**
> + Performs recovery based on transaction coordinator log for 2pc. At the
> + time of crash, if the binary log was in active state, then recovery for
> + "implicit" 'xid's and explicit 'XA' transactions is initiated,
> + otherwise merely the gtid binlog state is updated.
> + For 'xid' and 'XA' based recovery the following steps are performed.
> +
> + Identify the active binlog checkpoint file.
> + Scan the binary log from the beginning.
> + From GTID_LIST and GTID_EVENTs reconstruct the gtid binlog state.
> + Prepare a list of 'xid's for recovery.
> + Prepare a list of explicit 'XA' transactions for recovery.
> + Recover the 'xid' transactions.
> + The explicit 'XA' transaction recovery is initiated once all the server
> + components are initialized. Please check 'execute_xa_for_recovery()'.
why explicit XA recovery is delayed?
> +
> + Called from @c MYSQL_BIN_LOG::do_binlog_recovery()
> +
> + @param linfo Store here the found log file name and position to
> + the NEXT log file name in the index file.
> +
> + @param last_log_name Name of the last active binary log at the time of
> + crash.
> +
> + @param first_log Pointer to IO_CACHE of active binary log
> + @param fdle Format_description_log_event of active binary log
> + @param do_xa Is 2pc recovery needed for 'xid's and explicit XA
> + transactions.
> + @return indicates success or failure of recovery.
> + @retval 0 success
> + @retval 1 failure
> +
> +*/
> int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> IO_CACHE *first_log,
> Format_description_log_event *fdle, bool do_xa)
> {
> Log_event *ev= NULL;
> HASH xids;
> - MEM_ROOT mem_root;
> char binlog_checkpoint_name[FN_REFLEN];
> bool binlog_checkpoint_found;
> bool first_round;
> @@ -10533,10 +10693,22 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
>
> if (do_xa)
> {
> - if (ha_recover(&xids))
> + if (ha_recover(&xids, &xa_recover_list, &xa_recover_htons))
> goto err2;
>
> - free_root(&mem_root, MYF(0));
> + DBUG_ASSERT(!xa_recover_list.records ||
> + (binlog_checkpoint_found && binlog_checkpoint_name[0] != 0));
why is that?
> +
> + if (!xa_recover_list.records)
> + {
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> + }
> + else
> + {
> + xa_binlog_checkpoint_name= strmake_root(&mem_root, binlog_checkpoint_name,
> + strlen(binlog_checkpoint_name));
> + }
> my_hash_free(&xids);
> }
> return 0;
> @@ -10561,6 +10734,219 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> return 1;
> }
>
> +void MYSQL_BIN_LOG::execute_xa_for_recovery()
> +{
> + if (xa_recover_list.records)
> + (void) recover_explicit_xa_prepare();
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> +};
> +
> +/**
> + Performs recovery of user XA transactions.
> + 'xa_recover_list' contains the list of XA transactions to be recovered.
> + with possible replaying replication event from the binary log.
> +
> + @return indicates success or failure of recovery.
> + @retval false success
> + @retval true failure
> +
> +*/
> +bool MYSQL_BIN_LOG::recover_explicit_xa_prepare()
why is it bool, if you only use it in execute_xa_for_recovery()
and ignore the return value?
> +{
> +#ifndef HAVE_REPLICATION
> + /* Can't be supported without replication applier built in. */
> + return false;
> +#else
> + bool err= true;
> + int error=0;
> + Relay_log_info *rli= NULL;
> + rpl_group_info *rgi;
> + THD *thd= new THD(0); /* Needed by start_slave_threads */
> + thd->thread_stack= (char*) &thd;
> + thd->store_globals();
> + thd->security_ctx->skip_grants();
> + IO_CACHE log;
> + const char *errmsg;
> + File file;
> + bool enable_apply_event= false;
> + Log_event *ev = 0;
> + LOG_INFO linfo;
> + int recover_xa_count= xa_recover_list.records;
> + xa_recovery_member *member= NULL;
> +
> + if (!(rli= thd->rli_fake= new Relay_log_info(FALSE, "Recovery")))
> + {
> + my_error(ER_OUTOFMEMORY, MYF(ME_FATAL), 1);
> + goto err2;
> + }
> + rli->sql_driver_thd= thd;
> + static LEX_CSTRING connection_name= { STRING_WITH_LEN("Recovery") };
> + rli->mi= new Master_info(&connection_name, false);
> + if (!(rgi= thd->rgi_fake))
> + rgi= thd->rgi_fake= new rpl_group_info(rli);
> + rgi->thd= thd;
> + thd->system_thread_info.rpl_sql_info=
> + new rpl_sql_thread_info(rli->mi->rpl_filter);
> +
> + if (rli && !rli->relay_log.description_event_for_exec)
> + {
> + rli->relay_log.description_event_for_exec=
> + new Format_description_log_event(4);
> + }
> + if (find_log_pos(&linfo, xa_binlog_checkpoint_name, 1))
> + {
> + sql_print_error("Binlog file '%s' not found in binlog index, needed "
> + "for recovery. Aborting.", xa_binlog_checkpoint_name);
> + goto err2;
> + }
> +
> + tmp_disable_binlog(thd);
> + thd->variables.pseudo_slave_mode= TRUE;
> + for (;;)
> + {
> + if ((file= open_binlog(&log, linfo.log_file_name, &errmsg)) < 0)
> + {
> + sql_print_error("%s", errmsg);
> + goto err1;
> + }
> + while (recover_xa_count > 0 &&
> + (ev= Log_event::read_log_event(&log,
> + rli->relay_log.description_event_for_exec,
> + opt_master_verify_checksum)))
> + {
> + if (!ev->is_valid())
> + {
> + sql_print_error("Found invalid binlog query event %s"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written));
> + goto err1;
> + }
> + enum Log_event_type typ= ev->get_type_code();
> + ev->thd= thd;
> +
> + if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event= true;
> +
> + if (typ == GTID_EVENT)
> + {
> + Gtid_log_event *gev= (Gtid_log_event *)ev;
> + if (gev->flags2 &
> + (Gtid_log_event::FL_PREPARED_XA | Gtid_log_event::FL_COMPLETED_XA))
> + {
> + if ((member=
> + (xa_recovery_member*) my_hash_search(&xa_recover_list,
> + gev->xid.key(),
> + gev->xid.key_length())))
> + {
> + /*
> + When XA PREPARE group of events (as flagged so) check
> + its actual binlog state which may be COMPLETED. If the
> + state is also PREPARED then analyze through
> + in_engine_prepare whether the transaction needs replay.
> + */
> + if (gev->flags2 & Gtid_log_event::FL_PREPARED_XA)
> + {
> + if (member->state == XA_PREPARE)
> + {
> + // XA prepared is not present in (some) engine then apply it
> + if (member->in_engine_prepare == 0)
> + enable_apply_event= true;
> + else if (gev->flags2 & Gtid_log_event::FL_MULTI_ENGINE_XA &&
> + xa_recover_htons > member->in_engine_prepare)
This needs a comment, something like "FL_MULTI_ENGINE_XA says the transaction
must be prepared in more than one engine. We don't know in how many, so if it's
not prepared in *all* engines we'll replay it just in case"
I presume that (a bit silly) logic will do away when you will port this
patch to use a counter instead of a flag. Please, don't push if before
that.
> + {
> + enable_apply_event= true;
> + // partially engine-prepared XA is first cleaned out prior replay
> + thd->lex->sql_command= SQLCOM_XA_ROLLBACK;
> + ha_commit_or_rollback_by_xid(&gev->xid, 0);
> + }
> + else
> + --recover_xa_count;
> + }
> + }
> + else if (gev->flags2 & Gtid_log_event::FL_COMPLETED_XA)
> + {
> + if (member->state == XA_COMPLETE &&
> + member->in_engine_prepare > 0)
> + enable_apply_event= true;
why? you cannot replay a fully prepared and partially committed transaction
> + else
> + --recover_xa_count;
> + }
> + }
> + }
> + }
> +
> + if (enable_apply_event)
> + {
> + if ((err= ev->apply_event(rgi)))
> + {
> + sql_print_error("Failed to execute binlog query event of type: %s,"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written),
> + thd->get_stmt_da()->sql_errno(),
> + thd->get_stmt_da()->message());
> + delete ev;
> + goto err1;
> + }
> + else if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event=false;
how can that happen?
> + else if (thd->lex->sql_command == SQLCOM_XA_PREPARE ||
> + thd->lex->sql_command == SQLCOM_XA_COMMIT ||
> + thd->lex->sql_command == SQLCOM_XA_ROLLBACK)
> + {
> + --recover_xa_count;
> + enable_apply_event=false;
> +
> + sql_print_information("Binlog event %s at %s:%llu"
> + " successfully applied",
> + typ == XA_PREPARE_LOG_EVENT ?
> + static_cast<XA_prepare_log_event *>(ev)->get_query() :
> + static_cast<Query_log_event *>(ev)->query,
> + linfo.log_file_name, (ev->log_pos - ev->data_written));
> + }
> + }
> + if (typ != FORMAT_DESCRIPTION_EVENT)
> + delete ev;
> + }
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + file= -1;
> + if (unlikely((error= find_next_log(&linfo, 1))))
> + {
> + if (error != LOG_INFO_EOF)
> + sql_print_error("find_log_pos() failed (error: %d)", error);
> + else
> + break;
> + }
> + }
> +err1:
> + reenable_binlog(thd);
> + /*
> + There should be no more XA transactions to recover upon successful
> + completion.
> + */
> + if (recover_xa_count > 0)
> + goto err2;
> + sql_print_information("Crash recovery finished.");
> + err= false;
> +err2:
> + if (file >= 0)
> + {
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + }
> + thd->variables.pseudo_slave_mode= FALSE;
> + delete rli->mi;
> + delete thd->system_thread_info.rpl_sql_info;
> + rgi->slave_close_thread_tables(thd);
> + thd->reset_globals();
> + delete thd;
> +
> + return err;
> +#endif /* !HAVE_REPLICATION */
> +}
>
> int
> MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery)
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Hi Alexey,
Please find below final bits of input. After these are addressed the patch
will probably be good to push (But I'll need a final pass before giving ok to
push).
== json_table_mysql* tests ==
* Please remove one of the json_table_mysql and json_table_mysql2 tests.
* Please move the remaining test to be in the same suite as json_table.test.
== Fix json_table.test ==
The test has this somewhere in the middle:
set optimizer_switch='firstmatch=off';
Please restore the original setting.
== ha_json_table::rnd_next() returns 0 on error ==
Consider this query:
select * from
json_table(
'[
{"name": "X"},
{"name2": "Y"}
]',
'$[*]' columns
(
col1 varchar(100) path '$.name2' error on empty
)) as T;
The query produces an error.
It happens like so:
* the call to ha_json_table::rnd_next() returns 0.
* The SQL layer finds out about the error by checking thd->is_error() which returns true.
Please make ha_json_table::rnd_next() return an error.
== Rename table_function.* ==
Looking at the contets of sql/table_function.{h,cc}, one can see that there's
very little code there that is applicable to generic table function. Almost
all code is specifi to JSON_TABLE.
Because of that, please rename the files to e.g. json_table.{h,cc}.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
5

[Maria-developers] MDEV-17399: JSON_TABLE: Odd code in table.cc:create_view_field ?
by Sergey Petrunia 12 Mar '21
by Sergey Petrunia 12 Mar '21
12 Mar '21
Hi Alexey,
What does the code quoted below do? I don't recall seeing it on previous review
iterations.
In any case,
* It needs a comment about why such special handling is needed.
* It needs test coverage - I have reverted these changes and didn't see
any test to fail?
> diff --git a/sql/table.cc b/sql/table.cc
> index 4f65dbd65f4..9c205fc4be6 100644
> --- a/sql/table.cc
> +++ b/sql/table.cc
> @@ -6722,6 +6722,8 @@ Item *create_view_field(THD *thd, TABLE_LIST *view, Item **field_ref,
> LEX_CSTRING *name)
> {
> bool save_wrapper= thd->lex->first_select_lex()->no_wrap_view_item;
> + bool *wrapper_to_set= thd->lex->current_select ?
> + &thd->lex->current_select->no_wrap_view_item : &save_wrapper;
> Item *field= *field_ref;
> DBUG_ENTER("create_view_field");
>
> @@ -6737,17 +6739,17 @@ Item *create_view_field(THD *thd, TABLE_LIST *view, Item **field_ref,
> }
>
> DBUG_ASSERT(field);
> - thd->lex->current_select->no_wrap_view_item= TRUE;
> + *wrapper_to_set= TRUE;
> if (!field->is_fixed())
> {
> if (field->fix_fields(thd, field_ref))
> {
> - thd->lex->current_select->no_wrap_view_item= save_wrapper;
> + *wrapper_to_set= save_wrapper;
> DBUG_RETURN(0);
> }
> field= *field_ref;
> }
> - thd->lex->current_select->no_wrap_view_item= save_wrapper;
> + *wrapper_to_set= save_wrapper;
> if (save_wrapper)
> {
> DBUG_RETURN(field);
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
1
Hi MariaDB community!
Glad to be here! My github account is @HollowMan6. Though I'm new to MariaDB community, I'm interested in MDEV-16375 & MDEV-23143: Function to normalize a json value & missing a JSON_EQUALS function for this year's GSOC project. Here are my first thoughts on these issues:
I have checked part of the codebase and I think the two issues can be merged into one. First we can create a function named JSON_NORMALIZE to normalize the json, which automatically parses the inputed json document, recursively sorts the keys (for objects) / sorts the numbers (for arrays), removes the spaces, and then return the json document string.
Then we create a function named JSON_EQUALS, which can be used to compare 2 json documents for equality realized by first seperately normalize the two json documents using JSON_NORMALIZE, then the 2 can be compared exactly as binary strings.
I have taken some inspirations from the Item_func_json_keys and json_scan_start for parsing json documents, and I think it's possible to sort the keys using std::map in STL for objects.
That's all for my ideas so far. Please correct me if I made some mistakes, and I'm going to work on my ideas later.
Cheers!
Hollow Man
1
0

Re: [Maria-developers] MDEV-25075: Ignorable index makes the resulting CREATE TABLE invalid
by Sergey Petrunia 10 Mar '21
by Sergey Petrunia 10 Mar '21
10 Mar '21
Hi Varun,
> diff --git a/sql/lex.h b/sql/lex.h
> index 542356c0e433..5a9ec2ec1b35 100644
> --- a/sql/lex.h
> +++ b/sql/lex.h
> @@ -289,6 +289,7 @@ static SYMBOL symbols[] = {
> { "IDENTIFIED", SYM(IDENTIFIED_SYM)},
> { "IF", SYM(IF_SYM)},
> { "IGNORE", SYM(IGNORE_SYM)},
> + { "IGNORED", SYM(IGNORED_SYM)},
> { "IGNORE_DOMAIN_IDS", SYM(IGNORE_DOMAIN_IDS_SYM)},
> { "IGNORE_SERVER_IDS", SYM(IGNORE_SERVER_IDS_SYM)},
> { "IMMEDIATE", SYM(IMMEDIATE_SYM)},
Above this array one can see:
>> ...
>> NOTE!!
>> If you add or delete symbols from this file, you must also update results for
>> the perfschema.start_server_low_digest_sql_length test!
>> */
>>
>> static SYMBOL symbols[] = {
and indeed the buildbot shows failures, e.g.
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64/build…
CURRENT_TEST: perfschema.start_server_low_digest_sql_length
--- /usr/share/mysql-test/suite/perfschema/r/start_server_low_digest_sql_length.result 2021-03-10 06:23:20.000000000 +0000
+++ /dev/shm/var/4/log/start_server_low_digest_sql_length.reject 2021-03-10 07:37:16.351213429 +0000
@@ -8,5 +8,5 @@
####################################
SELECT event_name, digest, digest_text, sql_text FROM events_statements_history_long;
event_name digest digest_text sql_text
-statement/sql/select beb5bd93b7e8c45bc5cb6060804988e8 SELECT ? + ? + SELECT ...
-statement/sql/truncate faf6cefb662b443f05e97b5c5ab14a59 TRUNCATE TABLE truncat...
+statement/sql/select ade774bdfbc132a71810ede8ef469660 SELECT ? + ? + SELECT ...
+statement/sql/truncate 0f84807fb4a75d0f391f8a93e7c3c182 TRUNCATE TABLE truncat...
Please fix this.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
1
1
Hi, Rucha!
On Mar 08, Rucha Deodhar wrote:
> revision-id: bf677c554ad (mariadb-10.5.2-402-gbf677c554ad)
> parent(s): a1542f8a573
> author: Rucha Deodhar <rucha.deodhar(a)mariadb.com>
> committer: Rucha Deodhar <rucha.deodhar(a)mariadb.com>
> timestamp: 2021-02-22 00:37:18 +0530
> message:
>
> UTF8 and utf8
Two main comments:
* your global search-and-replace utf8 to utf8mb3 everywhere has replaced
way too much. Please, revert it.
* why did you have so many changes in tests with utf8mb4? You set the
old-mode to be utf8_is_utf8mb3 by default. It should've applied to
tests too, right? So why tests have switched to utf8mb4?
Other comments below.
> diff --git a/debian/additions/innotop/innotop b/debian/additions/innotop/innotop
> index 19229a57a82..4f1e76ea497 100644
> --- a/debian/additions/innotop/innotop
> +++ b/debian/additions/innotop/innotop
> @@ -264,9 +264,9 @@ sub get_dbh {
> MKDEBUG && _d("$dbh: $sql");
> $dbh->do($sql);
> MKDEBUG && _d('Enabling charset for STDOUT');
> - if ( $charset eq 'utf8' ) {
> - binmode(STDOUT, ':utf8')
> - or die "Can't binmode(STDOUT, ':utf8'): $OS_ERROR";
> + if ( $charset eq 'utf8mb3' ) {
> + binmode(STDOUT, ':utf8mb3')
> + or die "Can't binmode(STDOUT, ':utf8mb3'): $OS_ERROR";
It seems you performed search-and-replace over the whole source tree.
I don't think it was particularly meaningful.
Here, for example, it was wrong, it's perl code and perl charser utf8,
there's no utf8mb3 in perl.
also all internal variables you've renamed, like below:
> }
> else {
> binmode(STDOUT) or die "Can't binmode(STDOUT): $OS_ERROR";
> diff --git a/sql/field.h b/sql/field.h
> index 4a4f7cee2a5..f672ae7e315 100644
> --- a/sql/field.h
> +++ b/sql/field.h
> @@ -582,7 +582,7 @@ class Virtual_column_info: public Sql_alloc,
> public:
> /* Flag indicating that the field is physically stored in the database */
> bool stored_in_db;
> - bool utf8; /* Already in utf8 */
> + bool utf8mb3; /* Already in utf8 */
I mean, here ^^^
> bool automatic_name;
> Item *expr;
> Lex_ident name; /* Name of constraint */
> diff --git a/client/mysql.cc b/client/mysql.cc
> index f2938d4c824..b0ee239c0bc 100644
> --- a/client/mysql.cc
> +++ b/client/mysql.cc
> @@ -1918,6 +1918,10 @@ static int get_options(int argc, char **argv)
>
> if ((ho_error=handle_options(&argc, &argv, my_long_options, get_one_option)))
> return(ho_error);
> + if (!strcasecmp("utf8",default_charset))
> + {
> + default_charset=(char*)"utf8mb3";
> + }
why is that? (and the same everywhere else)
>
> *mysql_params->p_max_allowed_packet= opt_max_allowed_packet;
> *mysql_params->p_net_buffer_length= opt_net_buffer_length;
> diff --git a/client/mysqldump.c b/client/mysqldump.c
> index 7c363973da2..ad69618fbd1 100644
> --- a/client/mysqldump.c
> +++ b/client/mysqldump.c
> @@ -3160,7 +3160,7 @@ static uint get_table_structure(const char *table, const char *db, char *table_t
>
> fprintf(sql_file,
> "SET @saved_cs_client = @@character_set_client;\n"
> - "SET character_set_client = utf8;\n"
> + "SET character_set_client = utf8mb3;\n"
why is that?
> "/*!50001 CREATE TABLE %s (\n",
> result_table);
>
> diff --git a/include/m_ctype.h b/include/m_ctype.h
> index 5fa8f28ff7a..f1a8166d9df 100644
> --- a/include/m_ctype.h
> +++ b/include/m_ctype.h
> @@ -223,6 +223,7 @@ extern MY_UNI_CTYPE my_uni_ctype[256];
> #define MY_CS_NON1TO1 0x40000 /* Has a complex mapping from characters
> to weights, e.g. contractions, expansions,
> ignorable characters */
> +#define MY_CS_UTF8_IS_UTF8MB3 0x80000
this isn't a MY_CS_xxx flag, it doesn't belong here,
see all other MY_CS_xxx flags - they are *properties* of a charset.
They are set as
struct charset_info_st my_charset_tis620_thai_ci=
{
18,0,0, /* number */
MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_NON1TO1, /* state */
charset_name_tis620, /* cs name */
"tis620_thai_ci", /* name */
that is, when defining a charset.
See below
> #define MY_CHARSET_UNDEFINED 0
>
> /* Character repertoire flags */
> diff --git a/man/mysql.1 b/man/mysql.1
> index 03f23df3660..5ad953ba5eb 100644
> --- a/man/mysql.1
> +++ b/man/mysql.1
> @@ -401,7 +401,7 @@ Use
> as the default character set for the client and connection\&.
> .sp
> A common issue that can occur when the operating system uses
> -utf8
> +utf8mb3
this didn't need changing
> or another multi\-byte character set is that output from the
> \fBmysql\fR
> client is formatted incorrectly, due to the fact that the MariaDB client uses the
> diff --git a/mysys/charset.c b/mysys/charset.c
> index 32cfeb56e2d..9d035406e8d 100644
> --- a/mysys/charset.c
> +++ b/mysys/charset.c
> @@ -361,7 +361,7 @@ static int add_collation(struct charset_info_st *cs)
> newcs->state|= MY_CS_AVAILABLE | MY_CS_LOADED | MY_CS_NONASCII;
> #endif
> }
> - else if (!strcmp(cs->csname, "utf8") || !strcmp(cs->csname, "utf8mb3"))
> + else if (!strcmp(cs->csname, "utf8mb3") || !strcmp(cs->csname, "utf8mb3"))
one more result of your mechanical search-and-replace,
now you compare with "utf8mb3" twice.
> {
> #if defined (HAVE_CHARSET_utf8mb3) && defined(HAVE_UCA_COLLATIONS)
> copy_uca_collation(newcs, newcs->state & MY_CS_NOPAD ?
> @@ -767,7 +767,7 @@ static const char*
> get_charset_name_alias(const char *name)
> {
> if (!my_strcasecmp(&my_charset_latin1, name, "utf8mb3"))
> - return "utf8";
> + return "utf8mb3";
and again
> return NULL;
> }
>
> @@ -1004,13 +1004,19 @@ CHARSET_INFO *
> get_charset_by_csname(const char *cs_name, uint cs_flags, myf flags)
> {
> MY_CHARSET_LOADER loader;
> +
> + if (!strcasecmp(cs_name, "utf8"))
> + {
> + cs_name = (const char*)(cs_flags & MY_CS_UTF8_IS_UTF8MB3 ? "utf8mb3" : "utf8mb4");
your flag should be in `myf flags` not in `uint cs_flags`
(and defined in my_sys.h with other myf flags)
> + }
> +
> my_charset_loader_init_mysys(&loader);
> return my_charset_get_by_name(&loader, cs_name, cs_flags, flags);
> }
>
>
> /**
> - Resolve character set by the character set name (utf8, latin1, ...).
> + Resolve character set by the character set name (utf8mb3, latin1, ...).
>
> The function tries to resolve character set by the specified name. If
> there is character set with the given name, it is assigned to the "cs"
> @@ -1453,8 +1459,8 @@ static const MY_CSET_OS_NAME charsets[] =
>
> {"US-ASCII", "latin1", my_cs_approx},
>
> - {"utf8", "utf8", my_cs_exact},
> - {"utf-8", "utf8", my_cs_exact},
> + {"utf8mb3", "utf8mb3", my_cs_exact},
> + {"utf-8mb3", "utf8mb3", my_cs_exact},
utf-8mb3 really? :)
> #endif
> {NULL, NULL, 0}
> };
> diff --git a/plugin/win_auth_client/common.cc b/plugin/win_auth_client/common.cc
> index 8b7319252ac..f46b0ec9696 100644
> --- a/plugin/win_auth_client/common.cc
> +++ b/plugin/win_auth_client/common.cc
> @@ -338,7 +338,7 @@ UPN::UPN(): m_buf(NULL)
> m_buf= wchar_to_utf8(buf1, &m_len);
>
> if(!m_buf)
> - ERROR_LOG(ERROR, ("Failed to convert UPN to utf8"));
> + ERROR_LOG(ERROR, ("Failed to convert UPN to utf8mb3"));
probably wrong too (everything in this file), see the code - it uses
windows utf8 charset, not mariadb's utf8
>
> // Note: possible error would be indicated by the fact that m_buf is NULL.
> return;
> diff --git a/sql/mysqld.cc b/sql/mysqld.cc
> index 0bf21e02002..fd131497f8c 100644
> --- a/sql/mysqld.cc
> +++ b/sql/mysqld.cc
> @@ -4053,6 +4056,24 @@ static int init_common_variables()
> break;
> }
>
> + if (default_collation_name)
> + {
> + char *copy_of_name= (char*)default_collation_name;
> + char start[6];
> + strncpy(start, default_collation_name, 5);
> + char *fname= (char *)"utf8mb3_";
> + if (! strncasecmp("utf8_", start,5))
> + {
> + copy_of_name+= 5;
> + char result[64];
> + result[63]='\0';
> + strcpy(result, fname);
> + strcat(result, copy_of_name);
> + result[strlen(copy_of_name)+strlen(fname)]='\0';
> + default_collation_name= (char *) result;
> + }
> + }
1. wrong indentation
2. shouldn't it be in the get_charset_by_name ?
3. shouldn't it depend on the UTF8_IS_UTF8MB3 ?
> +
> if (default_collation_name)
> {
> CHARSET_INFO *default_collation;
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 50b746fe514..f0616b40361 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -5248,6 +5249,65 @@ class THD: public THD_count, /* this must be first */
> Item *sp_prepare_func_item(Item **it_addr, uint cols= 1);
> bool sp_eval_expr(Field *result_field, Item **expr_item_ptr);
>
> + CHARSET_INFO *get_charset_by_csname(const char *cs_name,
> + uint cs_flags,
> + myf myf_flags) const
> + {
> + return ::get_charset_by_csname(cs_name,
> + this->variables.old_behavior & OLD_MODE_UTF8_IS_UTF8MB3 ?
> + cs_flags | MY_CS_UTF8_IS_UTF8MB3 : cs_flags,
> + myf_flags);
> + }
> +
> + inline const char* get_collation_or_charset_name(const char* name)
shouldn't the name be get_collation_name_alias() ?
all other method names you've created match corresponding
charset-related function names
> + {
> + char *copy_of_name= (char*)name;
> + char start[6];
> + strncpy(start, name, 5);
> + char *fname= (char *)(this->variables.old_behavior & OLD_MODE_UTF8_IS_UTF8MB3?"utf8mb3_":"utf8mb4_");
> + if (!strncasecmp("utf8_", start,5))
> + {
> + copy_of_name+= 5;
> + char result[64];
> + result[63]='\0';
> + strcpy(result, fname);
> + strcat(result, copy_of_name);
> + result[strlen(copy_of_name)+strlen(fname)]='\0';
> + name= (const char *) result;
> + }
> + return name;
here you return a pointer to the local variable
also: indentation, too long line, end-of-line spaces
> + }
> +
> + inline CHARSET_INFO *
> + mysqld_collation_get_by_name(const char *name,
> + CHARSET_INFO *name_cs= system_charset_info)
> + {
> + name=this->get_collation_or_charset_name(name);
> + return ::mysqld_collation_get_by_name(name, name_cs);
> +}
> +
> +CHARSET_INFO *get_charset_by_name(const char *cs_name, myf flags)
> +{
> +
> + cs_name=this->get_collation_or_charset_name(cs_name);
> + return ::get_charset_by_name(cs_name,flags);
here it'd be better to pass MY_UTF8_IS_UTF8MB3 in flags
> +}
> +my_bool resolve_charset(const char *cs_name,
> + CHARSET_INFO *default_cs,
> + CHARSET_INFO **cs)
> +{
> + cs_name=this->get_collation_or_charset_name(cs_name);
> + return ::resolve_charset(cs_name,default_cs,cs);
perhaps it'd make sense to introduce flags argument here too?
> +}
> +
> +my_bool resolve_collation(const char *cl_name,
> + CHARSET_INFO *default_cl,
> + CHARSET_INFO **cl)
> +{
> + cl_name= this->get_collation_or_charset_name(cl_name);
> + return ::resolve_collation(cl_name,default_cl,cl);
> +}
please make sure the indentation is correct, I won't comment on it anymore
> +
> };
>
>
> diff --git a/sql/sql_db.cc b/sql/sql_db.cc
> index 9bf16220535..85e0e0dbd2f 100644
> --- a/sql/sql_db.cc
> +++ b/sql/sql_db.cc
> @@ -583,9 +583,13 @@ bool load_db_opt(THD *thd, const char *path, Schema_specification_st *create)
> default-collation commands.
> */
> if (!(create->default_table_charset=
> - get_charset_by_csname(pos+1, MY_CS_PRIMARY, MYF(0))) &&
> + thd->get_charset_by_csname(pos+1, thd->variables.old_behavior &
> + OLD_MODE_UTF8_IS_UTF8MB3 ?
> + MY_CS_UTF8_IS_UTF8MB3 | MY_CS_PRIMARY :
> + MY_CS_PRIMARY,
> + MYF(0))) &&
that's a bit redundant, don't you think?
you use both thd-> method that replaces utf_ with a real charset name
and *also* you set MY_CS_UTF8_IS_UTF8MB3 flag.
there are other places like that, to keep the review shorter I won't comment
on them below.
> !(create->default_table_charset=
> - get_charset_by_name(pos+1, MYF(0))))
> + thd->get_charset_by_name(pos+1, MYF(0))))
> {
> sql_print_error("Error while loading database options: '%s':",path);
> sql_print_error(ER_THD(thd, ER_UNKNOWN_CHARACTER_SET),pos+1);
> diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
> index a75066271d9..fd480bdced2 100644
> --- a/sql/sys_vars.cc
> +++ b/sql/sys_vars.cc
> @@ -3705,7 +3709,7 @@ static Sys_var_set Sys_old_behavior(
> "old_mode",
> "Used to emulate old behavior from earlier MariaDB or MySQL versions",
> SESSION_VAR(old_behavior), CMD_LINE(REQUIRED_ARG),
> - old_mode_names, DEFAULT(0));
> + old_mode_names, DEFAULT(8));
DEFAULT(OLD_MODE_UTF8_IS_UTF8MB3) please
>
> #if defined(HAVE_OPENSSL) && !defined(EMBEDDED_LIBRARY)
> #define SSL_OPT(X) CMD_LINE(REQUIRED_ARG,X)
> diff --git a/storage/connect/ha_connect.cc b/storage/connect/ha_connect.cc
> index 849cf7594eb..b6e8f3b3226 100644
> --- a/storage/connect/ha_connect.cc
> +++ b/storage/connect/ha_connect.cc
> @@ -6099,8 +6099,8 @@ static int connect_assisted_discovery(handlerton *, THD* thd,
> case FLD_NAME:
> if (ttp == TAB_PRX ||
> (ttp == TAB_CSV && topt->data_charset &&
> - (!stricmp(topt->data_charset, "UTF8") ||
> - !stricmp(topt->data_charset, "UTF-8"))))
> + (!stricmp(topt->data_charset, "UTF8MB3") ||
> + !stricmp(topt->data_charset, "UTF-8MB3"))))
UTF-8MB3 again
> cnm= crp->Kdata->GetCharValue(i);
> else
> cnm= encode(g, crp->Kdata->GetCharValue(i));
> diff --git a/storage/mroonga/vendor/groonga/CMakeLists.txt b/storage/mroonga/vendor/groonga/CMakeLists.txt
> index d271d4c4eb9..fa5d6f97ddd 100644
> --- a/storage/mroonga/vendor/groonga/CMakeLists.txt
> +++ b/storage/mroonga/vendor/groonga/CMakeLists.txt
> @@ -95,7 +95,7 @@ set(GRN_LOG_PATH
> "${CMAKE_INSTALL_PREFIX}/var/log/${GRN_PROJECT_NAME}/${GRN_PROJECT_NAME}.log"
> CACHE FILEPATH "log file path")
> set(GRN_DEFAULT_ENCODING
> - "utf8"
> + "utf8mb3"
utf8 is better here
> CACHE STRING "Groonga's default encoding")
> set(GRN_DEFAULT_MATCH_ESCALATION_THRESHOLD
> 0
> diff --git a/storage/perfschema/unittest/pfs_connect_attr-t.cc b/storage/perfschema/unittest/pfs_connect_attr-t.cc
> index b57ead3ec26..7e5b511d543 100644
> --- a/storage/perfschema/unittest/pfs_connect_attr-t.cc
> +++ b/storage/perfschema/unittest/pfs_connect_attr-t.cc
> @@ -343,7 +343,12 @@ int main(int, char **)
> {
> MY_INIT("pfs_connect_attr-t");
>
> - cs_cp1251= get_charset_by_csname("cp1251", MY_CS_PRIMARY, MYF(0));
> + cs_cp1251= thd->get_charset_by_csname("cp1251",
> + thd->variables.old_behavior &
> + OLD_MODE_UTF8_IS_UTF8M3 ?
> + MY_CS_PRIMARY | MY_CS_UTF8_IS_UTF8MB3 :
well, here there is surely no reason to use MY_CS_UTF8_IS_UTF8MB3,
because the charset is a hard-coded literal and it's not "utf8", is it?
> + MY_CS_PRIMARY,
> + MYF(0));
> if (!cs_cp1251)
> diag("skipping the cp1251 tests : missing character set");
> plan(59 + (cs_cp1251 ? 10 : 0));
> diff --git a/strings/ctype.c b/strings/ctype.c
> index 46951c3ae1f..9054cbfc3e6 100644
> --- a/strings/ctype.c
> +++ b/strings/ctype.c
> @@ -37,7 +37,7 @@
> */
>
> const char charset_name_latin2[]= "latin2";
> -const char charset_name_utf8[]= "utf8";
> +const char charset_name_utf8[]= "utf8mb3";
better rename the variable to charset_name_utf8mb3
> const char charset_name_utf16[]= "utf16";
> const char charset_name_utf32[]= "utf32";
> const char charset_name_ucs2[]= "ucs2";
> diff --git a/tests/mysql_client_test.c b/tests/mysql_client_test.c
> index 0043786d477..37da9e1f62e 100644
> --- a/tests/mysql_client_test.c
> +++ b/tests/mysql_client_test.c
> @@ -405,7 +405,7 @@ static void test_prepare_simple()
>
> /* update */
> strmov(query, "UPDATE test_prepare_simple SET id=? "
> - "WHERE id=? AND CONVERT(name USING utf8)= ?");
> + "WHERE id=? AND CONVERT(name USING utf8mb3)= ?");
I suspect all tests should better use "utf8"
> stmt= mysql_simple_prepare(mysql, query);
> check_stmt(stmt);
>
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Hi Varun,
I'm still looking at MDEV-8306 code. Sending the input I have so far. There's
nothing important but I thought I'd share it now so you can address it while
I'm continuing with the review.
> propagate_equal_field_for_orderby
Please use "order_by" as in all other functions, and also change "field" to
"fields". The name becomes propagate_equal_fields_for_order_by.
> void JOIN::propagate_equal_field_for_orderby()
> {
> if (!sort_nest_possible)
> return;
> ORDER *ord;
> for (ord= order; ord; ord= ord->next)
> {
> if (optimizer_flag(thd, OPTIMIZER_SWITCH_ORDERBY_EQ_PROP) && cond_equal)
> {
Is there any reason not to check optimizer_switch flag value once at the start
of the function?
> void JOIN_TAB::find_keys_that_can_achieve_ordering()
> {
> if (!join->sort_nest_possible)
> return;
>
> table->keys_with_ordering.clear_all();
> for (uint index= 0; index < table->s->keys; index++)
> {
> if (table->keys_in_use_for_query.is_set(index) &&
> test_if_order_by_key(join, join->order, table, index))
> table->keys_with_ordering.set_bit(index);
> }
> table->keys_with_ordering.intersect(table->keys_in_use_for_order_by);
> }
Is there any reason to call test_if_order_by_key() and then interset with
keys_in_use_for_order_by? Why not just check that bitmap first, like it's done
for keys_with_ordering?
...
in struct JOIN_TAB:
> void find_keys_that_can_achieve_ordering();
> bool check_if_index_satisfies_ordering(int index_used);
are "achieving ordering" and "satisfying ordering" the same thing? If yes, it's
better to use one term for this.
> Item* Item_subselect::transform(THD *thd, Item_transformer transformer,
> bool transform_subquery, uchar *arg)
We've already discussed this, but I'll mention it here to keep track:
this should ON expressions.
I would also consider adding an assert that the subquery's joins do not have
the query plans, yet.
> class Field {
> ...
> void statistics_available_via_keys();
> void statistics_available_via_stat_tables();
These names need to be better.
Function names typically start with a verb. This is especially true for
functions that return nothing.
> bool Item_func_between::predicate_selectivity_checker(void *arg)
> bool Item_func_in::predicate_selectivity_checker(void *arg)
These functions need a comment.
> bool Item_equal::predicate_selectivity_checker(void *arg)
> {
> ...
> while (it++)
> {
> Field *field= it.get_curr_field();
> field->is_range_statistics_available();
What is the above? Does the function have some side-effect? This needs to be
fixed or at least commented.
...
later in the same function:
> it.rewind();
> Item *item= (it++);
> SAME_FIELD *same_field_arg= (SAME_FIELD *) arg;
>
> if (same_field_arg->item == NULL)
> {
> item->is_resolved_by_same_column(arg);
> return false;
> }
This looks very confusing, too. Why are we checking the first element in the
Item_equal. Are we assuming it is the constant? But the execution reaches
this point even when Item_equal doesn't contain a constant?
> bool Item_equal::is_statistics_available()
Maybe, rename this to is_range_statistics_available() to make it clear with
what kind of statistics is it?
> void add_nest_tables_to_trace(Mat_join_tab_nest_info* nest_info)
rename this to trace_XXX() to be uniform with other tracing functions?
e.g. rename to trace_tables_in_sort_nest
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
1
0