developers
Threads by month
- ----- 2025 -----
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 6814 discussions
[Maria-developers] 0038c24f2d3: MDEV-16146: MariaDB slave stops with following errors.
by sujatha 18 Mar '21
by sujatha 18 Mar '21
18 Mar '21
revision-id: 0038c24f2d388b9b0f3bff9219a53c8900153d02 (mariadb-10.5.4-539-g0038c24f2d3)
parent(s): 190a8312f598fc4892331225104297f6288f23ac
author: Sujatha
committer: Sujatha
timestamp: 2021-03-18 18:41:30 +0530
message:
MDEV-16146: MariaDB slave stops with following errors.
Problem:
========
180511 11:07:58 [ERROR] Slave I/O: Unexpected master's heartbeat data:
heartbeat is not compatible with local info;the event's data: log_file_name
mysql-bin.000009 log_pos 1054262041, Error_code: 1623
Analysis:
=========
In replication setup when master server doesn't have any events to send to
slave server it sends an 'Heartbeat_log_event'. This event carries the
current binary log filename and offset details. The offset values is stored
within 4 bytes of event header. When the size of binary log is higher
than UINT32_MAX the log_pos values will not fit in 4 bytes memory. It
overflows and hence slave stops with an error.
Fix:
===
Since we cannot extend the fixed header of Log_event class an additional
header named 'extra_header' is introduced. 'extra_header' contains
HB_FLAGS - 2 bytes
HB_LONG_LOG_POS_OFFSET - 8 bytes
This 'extra_header' is added only in a case where log_pos > UINT32_MAX and
'HB_LONG_LOG_POS_OFFSET_F' will be enabled. On slave while reading the
'Heartbeat_log_event' if 'HB_LONG_LOG_POS_OFFSET_F' is found to be set then
value of 'log_pos' is extracted from 'extra_header'.
---
.../suite/rpl/r/rpl_incompatible_heartbeat.result | 20 +++++++++
.../suite/rpl/t/rpl_incompatible_heartbeat.test | 47 ++++++++++++++++++++++
sql/log_event.h | 19 ++++++++-
sql/log_event_server.cc | 8 +++-
sql/slave.cc | 14 +++++++
sql/sql_repl.cc | 26 ++++++++++--
6 files changed, 128 insertions(+), 6 deletions(-)
diff --git a/mysql-test/suite/rpl/r/rpl_incompatible_heartbeat.result b/mysql-test/suite/rpl/r/rpl_incompatible_heartbeat.result
new file mode 100644
index 00000000000..fe2b0436f2d
--- /dev/null
+++ b/mysql-test/suite/rpl/r/rpl_incompatible_heartbeat.result
@@ -0,0 +1,20 @@
+include/master-slave.inc
+[connection master]
+connection master;
+SET @saved_dbug = @@GLOBAL.debug_dbug;
+SET @@global.debug_dbug= '+d,simulate_pos_4G';
+connection slave;
+include/stop_slave.inc
+SET @saved_dbug = @@GLOBAL.debug_dbug;
+SET @@global.debug_dbug= '+d,simulate_pos_4G';
+CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD=0.001;
+include/start_slave.inc
+connection master;
+SET @@GLOBAL.debug_dbug = @saved_dbug;
+connection slave;
+SET @@GLOBAL.debug_dbug = @saved_dbug;
+connection master;
+CREATE TABLE t (f INT) ENGINE=INNODB;
+INSERT INTO t VALUES (10);
+DROP TABLE t;
+include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/t/rpl_incompatible_heartbeat.test b/mysql-test/suite/rpl/t/rpl_incompatible_heartbeat.test
new file mode 100644
index 00000000000..b1aa5435ff4
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_incompatible_heartbeat.test
@@ -0,0 +1,47 @@
+# ==== Purpose ====
+#
+# Test verifies that slave IO thread can process heartbeat events with log_pos
+# values higher than UINT32_MAX.
+#
+# ==== Implementation ====
+#
+# Steps:
+# 0 - Stop slave threads. Configure a small master_heartbeat_period.
+# 1 - Using debug points, simulate a huge binlog offset higher than
+# UINT32_MAX on master.
+# 2 - Start the slave and observe that slave IO thread is able to process
+# the offset received through heartbeat event.
+#
+# ==== References ====
+#
+# MDEV-16146: MariaDB slave stops with incompatible heartbeat
+#
+--source include/have_debug.inc
+--source include/have_innodb.inc
+--source include/have_binlog_format_mixed.inc
+# Test simulates binarylog offsets higher than UINT32_MAX
+--source include/have_64bit.inc
+--source include/master-slave.inc
+
+--connection master
+SET @saved_dbug = @@GLOBAL.debug_dbug;
+SET @@global.debug_dbug= '+d,simulate_pos_4G';
+
+--connection slave
+--source include/stop_slave.inc
+SET @saved_dbug = @@GLOBAL.debug_dbug;
+SET @@global.debug_dbug= '+d,simulate_pos_4G';
+CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD=0.001;
+--source include/start_slave.inc
+
+--connection master
+sleep 1;
+SET @@GLOBAL.debug_dbug = @saved_dbug;
+--sync_slave_with_master
+SET @@GLOBAL.debug_dbug = @saved_dbug;
+
+--connection master
+CREATE TABLE t (f INT) ENGINE=INNODB;
+INSERT INTO t VALUES (10);
+DROP TABLE t;
+--source include/rpl_end.inc
diff --git a/sql/log_event.h b/sql/log_event.h
index 4e193232f4b..72d8766cea5 100644
--- a/sql/log_event.h
+++ b/sql/log_event.h
@@ -576,6 +576,22 @@ class String;
#define MARIA_SLAVE_CAPABILITY_MINE MARIA_SLAVE_CAPABILITY_GTID
+#define HB_EXTRA_HEADER_LEN 10
+/*
+ Extra header contains flags specific to Heartbeat_log_event and
+ respective data. The length needs to be updated when new memebers are
+ added to this header
+*/
+#define HB_FLAGS_OFFSET 0
+/*
+ When the size of 'log_pos' within Heartbeat_log_event exceeds UINT_MAX32 it
+ cannot be stored in standard header as 'log_pos' is of 4 bytes size. Hence
+ extra_header is introduced. First 2 bytes represent flags. In case of long
+ 'log_pos' value 'HB_LONG_LOG_POS_OFFSET_F' bit within the flag will be set.
+ The log_pos is stored witin 'long_log_pos' variables.
+*/
+#define HB_LONG_LOG_POS_OFFSET_F 0x1
+#define HB_LONG_LOG_POS_OFFSET 2
/**
@enum Log_event_type
@@ -5718,7 +5734,8 @@ bool copy_cache_to_file_wrapped(IO_CACHE *body,
class Heartbeat_log_event: public Log_event
{
public:
- Heartbeat_log_event(const char* buf, uint event_len,
+ uint16 hb_flags;
+ Heartbeat_log_event(const char* buf, ulong event_len,
const Format_description_log_event* description_event);
Log_event_type get_type_code() { return HEARTBEAT_LOG_EVENT; }
bool is_valid() const
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
index 6dee9b9adf6..2701f4f2f9a 100644
--- a/sql/log_event_server.cc
+++ b/sql/log_event_server.cc
@@ -8493,14 +8493,18 @@ void Ignorable_log_event::pack_info(Protocol *protocol)
#if defined(HAVE_REPLICATION)
-Heartbeat_log_event::Heartbeat_log_event(const char* buf, uint event_len,
+Heartbeat_log_event::Heartbeat_log_event(const char* buf, ulong event_len,
const Format_description_log_event* description_event)
:Log_event(buf, description_event)
{
+ uint16 hb_flags;
uint8 header_size= description_event->common_header_len;
- ident_len = event_len - header_size;
+ ident_len = event_len - (header_size + HB_EXTRA_HEADER_LEN);
set_if_smaller(ident_len,FN_REFLEN-1);
log_ident= buf + header_size;
+ hb_flags= uint2korr(buf + header_size + ident_len);
+ if (hb_flags & HB_LONG_LOG_POS_OFFSET_F)
+ log_pos= uint8korr(buf + header_size + ident_len + HB_LONG_LOG_POS_OFFSET);
}
#endif
diff --git a/sql/slave.cc b/sql/slave.cc
index 1da030084ef..f365a1e7b5c 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -6493,6 +6493,12 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len)
TODO: handling `when' for SHOW SLAVE STATUS' snds behind
*/
+ DBUG_EXECUTE_IF("simulate_pos_4G",
+ {
+ inc_pos= mi->master_log_pos; // temp_save
+ mi->master_log_pos= ((ulong)2394967295);
+ };);
+
if (memcmp(mi->master_log_name, hb.get_log_ident(), hb.get_ident_len()) ||
mi->master_log_pos > hb.log_pos) {
/* missed events of heartbeat from the past */
@@ -6504,6 +6510,14 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len)
error_msg.append_ulonglong(hb.log_pos);
goto err;
}
+ DBUG_EXECUTE_IF("simulate_pos_4G",
+ {
+ if (hb.log_pos > UINT32_MAX)
+ {
+ mi->master_log_pos= inc_pos; // restore
+ DBUG_SET("-d, simulate_pos_4G");
+ }
+ };);
/*
Heartbeat events doesn't count in the binlog size, so we don't have to
diff --git a/sql/sql_repl.cc b/sql/sql_repl.cc
index 010ea794254..d68ae05b60e 100644
--- a/sql/sql_repl.cc
+++ b/sql/sql_repl.cc
@@ -824,12 +824,14 @@ get_slave_until_gtid(THD *thd, String *out_str)
*/
static int send_heartbeat_event(binlog_send_info *info,
NET* net, String* packet,
- const struct event_coordinates *coord,
+ struct event_coordinates *coord,
enum enum_binlog_checksum_alg checksum_alg_arg)
{
DBUG_ENTER("send_heartbeat_event");
ulong ev_offset;
+ char extra_buf[HB_EXTRA_HEADER_LEN];
+ uint16 hb_flags= 0;
if (reset_transmit_packet(info, info->flags, &ev_offset, &info->errmsg))
DBUG_RETURN(1);
@@ -850,19 +852,30 @@ static int send_heartbeat_event(binlog_send_info *info,
size_t event_len = ident_len + LOG_EVENT_HEADER_LEN +
(do_checksum ? BINLOG_CHECKSUM_LEN : 0);
int4store(header + SERVER_ID_OFFSET, global_system_variables.server_id);
+ if (coord->pos <= UINT32_MAX)
+ {
+ int4store(header + LOG_POS_OFFSET, coord->pos); // log_pos
+ }
+ else
+ {
+ hb_flags|= HB_LONG_LOG_POS_OFFSET_F;
+ int2store(extra_buf + HB_FLAGS_OFFSET, hb_flags);
+ int8store(extra_buf + HB_LONG_LOG_POS_OFFSET, coord->pos);
+ event_len+= HB_EXTRA_HEADER_LEN;
+ }
int4store(header + EVENT_LEN_OFFSET, event_len);
int2store(header + FLAGS_OFFSET, 0);
- int4store(header + LOG_POS_OFFSET, coord->pos); // log_pos
-
packet->append(header, sizeof(header));
packet->append(p, ident_len); // log_file_name
+ packet->append(extra_buf, sizeof(extra_buf));
if (do_checksum)
{
char b[BINLOG_CHECKSUM_LEN];
ha_checksum crc= my_checksum(0, (uchar*) header, sizeof(header));
crc= my_checksum(crc, (uchar*) p, ident_len);
+ crc= my_checksum(crc, (uchar*) extra_buf, sizeof(extra_buf));
int4store(b, crc);
packet->append(b, sizeof(b));
}
@@ -2525,6 +2538,13 @@ static int wait_new_events(binlog_send_info *info, /* in */
}
#endif
mysql_bin_log.unlock_binlog_end_pos();
+
+ DBUG_EXECUTE_IF("simulate_pos_4G",
+ {
+ coord.pos= ((ulong)5394967295);
+ DBUG_SET("-d, simulate_pos_4G");
+ };);
+
ret= send_heartbeat_event(info,
info->net, info->packet, &coord,
info->current_checksum_alg);
1
0
Re: [Maria-developers] 9acabefd276: MDEV-22010: use executables MariaDB named in scripts
by Sergei Golubchik 17 Mar '21
by Sergei Golubchik 17 Mar '21
17 Mar '21
Hi, Rucha!
On Mar 17, Rucha Deodhar wrote:
>
> mysql-test-run and mysql-stress-test was also supposed to be renamed as
> part of this task.
> I need to check again if debs and rpms package symlink.
>
> >
> > > diff --git a/mysql-test/.mtr b/mysql-test/.mtr
> > > new file mode 120000
> > > index 00000000000..68986c8cd7e
> > > --- /dev/null
> > > +++ b/mysql-test/.mtr
> > > @@ -0,0 +1 @@
> > > +./mariadb-test-run
> >
> > What is that? What is .mtr file?
>
> I think after i built it got committed along with other files.
please remove it from the commit
> > > diff --git a/mysql-test/mysql-stress-test.pl b/mysql-test/
> > mysql-stress-test.pl
> > > new file mode 120000
> > > index 00000000000..e704a5d70bf
> > > --- /dev/null
> > > +++ b/mysql-test/mysql-stress-test.pl
> > > @@ -0,0 +1 @@
> > > +./mariadb-stress-test.pl
> >
> > looks like you've checked in files you shouldn't have. Like symlinks.
>
> As mysql-test-run and mysql-stress-test was supposed to be renamed too. So,
> I created a symlink
> so that ./mysql-test-run and ./mysql-stress-test would still work if
> anybody uses it.
symlinks - yes. But they are created by cmake during the build, they
should not be added to a commit. please, remove.
You might want to add them to the .gitignore file though.
> > > \ No newline at end of file
> > > diff --git a/scripts/msql2mysql.sh b/scripts/msql2mysql.sh
> > > index 72a609fa6e7..11923fa15c7 100644
> > > --- a/scripts/msql2mysql.sh
> > > +++ b/scripts/msql2mysql.sh
> > > @@ -16,4 +16,4 @@
> > > # Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston
> > > # MA 02110-1335 USA.
> > >
> > > -@bindir@/replace msqlConnect mysql_connect msqlListDBs mysql_list_dbs
> > msqlNumRows mysql_num_rows msqlFetchRow mysql_fetch_row msqlFetchField
> > mysql_fetch_field msqlFreeResult mysql_free_result msqlListFields
> > mysql_list_fields msqlListTables mysql_list_tables msqlErrMsg
> > 'mysql_error(mysql)' msqlStoreResult mysql_store_result msqlQuery
> > mysql_query msqlField mysql_field msqlSelect mysql_select msqlSelectDB
> > mysql_select_db msqlNumFields mysql_num_fields msqlClose mysql_close
> > msqlDataSeek mysql_data_seek m_field MYSQL_FIELD m_result MYSQL_RES m_row
> > MYSQL_ROW msql mysql mSQL mySQL MSQL MYSQL msqlCreateDB mysql_create_db
> > msqlDropDB mysql_drop_db msqlFieldSeek mysql_field_seek -- $*
> > > +@bindir@/replace msqlConnect mysql_connect msqlListDBs mysql_list_dbs
> > msqlNumRows mysql_num_rows msqlFetchRow mysql_fetch_row msqlFetchField
> > mysql_fetch_field msqlFreeResult mysql_free_result msqlListFields
> > mysql_list_fields msqlListTables mysql_list_tables msqlErrMsg
> > 'mysql_error(mysql)' msqlStoreResult mysql_store_result msqlQuery
> > mysql_query msqlField mysql_field msqlSelect mysql_select msqlSelectDB
> > mysql_select_db msqlNumFields mysql_num_fields msqlClose mysql_close
> > msqlDataSeek mysql_data_seek m_field MYSQL_FIELD m_result MYSQL_RES m_row
> > MYSQL_ROW msql mysql mysql mariadb mSQL mySQL MSQL MYSQL msqlCreateDB
> > mysql_create_db msqlDropDB mysql_drop_db msqlFieldSeek mysql_field_seek --
> > $*
> >
> > Why?
>
> So that mysql in bin directory would get replaced with mariadb
no, that's not it. See other renames,
msqlListTables -> mysql_list_tables,
msqlFetchRow -> mysql_fetch_row
etc
this (prehistoric) script was supposed to help migrating programs that
work with mSQL (https://en.wikipedia.org/wiki/MSQL) to MySQL by renaming
C API calls in the source code.
In short, revert this change, it doesn't belong to this script.
> > > diff --git a/scripts/mysql_install_db.sh b/scripts/mysql_install_db.sh
> > > index 5f183afe8fc..084af42d01c 100644
> > > --- a/scripts/mysql_install_db.sh
> > > +++ b/scripts/mysql_install_db.sh
> > > @@ -338,11 +338,15 @@ then
> > > cannot_find_file resolveip @resolveip_locations@
> > > exit 1
> > > fi
> > > - mysqld=`find_in_dirs mysqld @mysqld_locations@`
> > > + mysqld=`find_in_dirs mariadbd @mysqld_locations@`
> > > if test -z "$mysqld"
> > > then
> > > - cannot_find_file mysqld @mysqld_locations@
> > > - exit 1
> > > + mysqld=`find_in_dirs mysqld @mysqld_locations@`
> > > + if test -z "$mysqld"
> > > + then
> > > + cannot_find_file mysqld @mysqld_locations@
> >
> > Hmm. At least, it should be mariadbd in the cannot_find_file line.
> >
> > Why do you search for mysqld after mariadbd wasn't found?
> >
> I added it for additional check, just in case there is mysqld and not
> mariadbd or if it could affect compatibility.
I don't think it could. These scripts you've edited are installed
together with MariaDB, there is no supported configuration where these
scripts exist and mariadb* binaries don't.
That is, I believe this extra check is not needed.
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Re: [Maria-developers] 9acabefd276: MDEV-22010: use executables MariaDB named in scripts
by Sergei Golubchik 17 Mar '21
by Sergei Golubchik 17 Mar '21
17 Mar '21
Hi, Rucha!
On Mar 17, Rucha Deodhar wrote:
> revision-id: 9acabefd276 (mariadb-10.5.2-386-g9acabefd276)
> parent(s): de407e7cb4d
> author: Rucha Deodhar <rucha.deodhar(a)mariadb.com>
> committer: Rucha Deodhar <rucha.deodhar(a)mariadb.com>
> timestamp: 2021-02-11 15:19:29 +0530
> message:
>
> MDEV-22010: use executables MariaDB named in scripts
>
> As a part of this MDEV following changes were made:
> 1) Mariadb named executables used instead of mysql named executables in scripts
> 2) renamed mysql-test-run and mysql-stress-test to mariadb-test-run and
> mariadb-stress-test and created a symlink.
>
> diff --git a/debian/mariadb-test.links b/debian/mariadb-test.links
> index 718a5355474..b3577667d01 100644
> --- a/debian/mariadb-test.links
> +++ b/debian/mariadb-test.links
> @@ -2,5 +2,7 @@ usr/bin/mariadb-client-test usr/bin/mysql_client_test
> usr/bin/mariadb-client-test-embedded usr/bin/mysql_client_test_embedded
> usr/bin/mariadb-test usr/bin/mysqltest
> usr/bin/mariadb-test-embedded usr/bin/mysqltest_embedded
> -usr/share/mysql/mysql-test/mysql-test-run.pl usr/share/mysql/mysql-test/mtr
> -usr/share/mysql/mysql-test/mysql-test-run.pl usr/share/mysql/mysql-test/mysql-test-run
> +usr/share/mysql/mysql-test/mariadb-test-run.pl usr/share/mysql/mysql-test/mysql-test-run.pl
> +usr/share/mysql/mysql-test/mariadb-stress-test.pl usr/share/mysql/mysql-test/mysql-stress-test.pl
> +usr/share/mysql/mysql-test/mariadb-test-run.pl usr/share/mysql/mysql-test/mysql-test-run
> +usr/share/mysql/mysql-test/mariadb-test-run.pl usr/share/mysql/mysql-test/mtr
Not sure that was part of the task, but ok.
Did you check that both debs and rpms package all new symlinks correctly?
> diff --git a/mysql-test/.mtr b/mysql-test/.mtr
> new file mode 120000
> index 00000000000..68986c8cd7e
> --- /dev/null
> +++ b/mysql-test/.mtr
> @@ -0,0 +1 @@
> +./mariadb-test-run
What is that? What is .mtr file?
> \ No newline at end of file
> diff --git a/mysql-test/mysql-stress-test.pl b/mysql-test/mysql-stress-test.pl
> new file mode 120000
> index 00000000000..e704a5d70bf
> --- /dev/null
> +++ b/mysql-test/mysql-stress-test.pl
> @@ -0,0 +1 @@
> +./mariadb-stress-test.pl
looks like you've checked in files you shouldn't have. Like symlinks.
> \ No newline at end of file
> diff --git a/mysql-test/mysql-test-run.pl b/mysql-test/mysql-test-run.pl
> new file mode 120000
> index 00000000000..56150a0ecad
> --- /dev/null
> +++ b/mysql-test/mysql-test-run.pl
> @@ -0,0 +1 @@
> +./mariadb-test-run.pl
another one
> \ No newline at end of file
> diff --git a/scripts/msql2mysql.sh b/scripts/msql2mysql.sh
> index 72a609fa6e7..11923fa15c7 100644
> --- a/scripts/msql2mysql.sh
> +++ b/scripts/msql2mysql.sh
> @@ -16,4 +16,4 @@
> # Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston
> # MA 02110-1335 USA.
>
> -@bindir@/replace msqlConnect mysql_connect msqlListDBs mysql_list_dbs msqlNumRows mysql_num_rows msqlFetchRow mysql_fetch_row msqlFetchField mysql_fetch_field msqlFreeResult mysql_free_result msqlListFields mysql_list_fields msqlListTables mysql_list_tables msqlErrMsg 'mysql_error(mysql)' msqlStoreResult mysql_store_result msqlQuery mysql_query msqlField mysql_field msqlSelect mysql_select msqlSelectDB mysql_select_db msqlNumFields mysql_num_fields msqlClose mysql_close msqlDataSeek mysql_data_seek m_field MYSQL_FIELD m_result MYSQL_RES m_row MYSQL_ROW msql mysql mSQL mySQL MSQL MYSQL msqlCreateDB mysql_create_db msqlDropDB mysql_drop_db msqlFieldSeek mysql_field_seek -- $*
> +@bindir@/replace msqlConnect mysql_connect msqlListDBs mysql_list_dbs msqlNumRows mysql_num_rows msqlFetchRow mysql_fetch_row msqlFetchField mysql_fetch_field msqlFreeResult mysql_free_result msqlListFields mysql_list_fields msqlListTables mysql_list_tables msqlErrMsg 'mysql_error(mysql)' msqlStoreResult mysql_store_result msqlQuery mysql_query msqlField mysql_field msqlSelect mysql_select msqlSelectDB mysql_select_db msqlNumFields mysql_num_fields msqlClose mysql_close msqlDataSeek mysql_data_seek m_field MYSQL_FIELD m_result MYSQL_RES m_row MYSQL_ROW msql mysql mysql mariadb mSQL mySQL MSQL MYSQL msqlCreateDB mysql_create_db msqlDropDB mysql_drop_db msqlFieldSeek mysql_field_seek -- $*
Why?
> diff --git a/scripts/mysql_install_db.sh b/scripts/mysql_install_db.sh
> index 5f183afe8fc..084af42d01c 100644
> --- a/scripts/mysql_install_db.sh
> +++ b/scripts/mysql_install_db.sh
> @@ -338,11 +338,15 @@ then
> cannot_find_file resolveip @resolveip_locations@
> exit 1
> fi
> - mysqld=`find_in_dirs mysqld @mysqld_locations@`
> + mysqld=`find_in_dirs mariadbd @mysqld_locations@`
> if test -z "$mysqld"
> then
> - cannot_find_file mysqld @mysqld_locations@
> - exit 1
> + mysqld=`find_in_dirs mysqld @mysqld_locations@`
> + if test -z "$mysqld"
> + then
> + cannot_find_file mysqld @mysqld_locations@
Hmm. At least, it should be mariadbd in the cannot_find_file line.
Why do you search for mysqld after mariadbd wasn't found?
> + exit 1
> + fi
> fi
> langdir=`find_in_dirs --dir errmsg.sys @errmsg_locations@`
> if test -z "$langdir"
> @@ -579,7 +583,7 @@ else
> echo
> echo " shell> $mysqld --skip-grant-tables --general-log &"
> echo
> - echo "and use the command line tool $bindir/mysql"
> + echo "and use the command line tool $bindir/mariadb or $bindir/mysql"
again, why both?
> echo "to connect to the mysql database and look at the grant tables:"
> echo
> echo " shell> $bindir/mysql -u root mysql"
> @@ -613,7 +617,9 @@ then
> echo "PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !"
> echo "To do so, start the server, then issue the following commands:"
> echo
> + echo "'$bindir/mariadb-admin' -u root password 'new-password' or"
> echo "'$bindir/mysqladmin' -u root password 'new-password'"
> + echo "'$bindir/mariadb-admin' -u root -h $hostname password 'new-password' or"
> echo "'$bindir/mysqladmin' -u root -h $hostname password 'new-password'"
and here
> echo
> echo "Alternatively you can run:"
> diff --git a/scripts/mysql_secure_installation.sh b/scripts/mysql_secure_installation.sh
> index b2a9edf4953..12364f1be43 100644
> --- a/scripts/mysql_secure_installation.sh
> +++ b/scripts/mysql_secure_installation.sh
> @@ -159,15 +159,19 @@ then
> cannot_find_file my_print_defaults $basedir/bin $basedir/extra
> exit 1
> fi
> - mysql_command=`find_in_basedir mysql bin`
> + mysql_command=`find_in_basedir mariadb bin`
> if test -z "$mysql_command"
> then
> - cannot_find_file mysql $basedir/bin
> - exit 1
> + mysql_command=`find_in_basedir mysql bin`
> + if test -z "$mysql_command"
> + then
> + cannot_find_file mysql $basedir/bin
> + exit 1
> + fi
same as above
> fi
> else
> print_defaults="@bindir@/my_print_defaults"
> - mysql_command="@bindir@/mysql"
> + mysql_command="@bindir@/mariadb"
> fi
>
> if test ! -x "$print_defaults"
> diff --git a/scripts/mysqld_safe.sh b/scripts/mysqld_safe.sh
> index 126eb924814..7763fb9db91 100644
> --- a/scripts/mysqld_safe.sh
> +++ b/scripts/mysqld_safe.sh
> @@ -531,7 +531,11 @@ else
> ledir='@libexecdir@'
> fi
>
> -helper=`find_in_bin mysqld_safe_helper`
> +helper=`find_in_bin mariadbd-safe-helper`
> +if test -x helper
> +then
> + helper=`find_in_bin mysqld_safe_helper`
> +fi
and here
> print_defaults=`find_in_bin my_print_defaults`
> # Check if helper exists
> command -v $helper --help >/dev/null 2>&1
> diff --git a/scripts/wsrep_sst_common.sh b/scripts/wsrep_sst_common.sh
> index 5e134570881..cf38bda8f70 100644
> --- a/scripts/wsrep_sst_common.sh
> +++ b/scripts/wsrep_sst_common.sh
> @@ -273,13 +273,21 @@ SCRIPTS_DIR="$(cd $(dirname "$0"); pwd -P)"
> EXTRA_DIR="$SCRIPTS_DIR/../extra"
> CLIENT_DIR="$SCRIPTS_DIR/../client"
>
> -if [ -x "$CLIENT_DIR/mysql" ]; then
> +if [-x "$CLIENT_DIR/mariadb"]; then
> + MYSQL_CLIENT="$CLIENT_DIR/mariadb"
> +elif [-x mariadb] then;
> + MYSQL_CLIENT=mariadb
> +elif [ -x "$CLIENT_DIR/mysql" ]; then
> MYSQL_CLIENT="$CLIENT_DIR/mysql"
> else
> MYSQL_CLIENT=mysql
> fi
>
> -if [ -x "$CLIENT_DIR/mysqldump" ]; then
> +if [ -x "$CLIENT_DIR/mariadb-dump" ]; then
> + MYSQLDUMP="$CLIENT_DIR/mariadb-dump"
> +elif [-x mariadb-dump]; then
> + MYSQLDUMP=mariadb-dump
> +elif [ -x "$CLIENT_DIR/mysqldump" ]; then
> MYSQLDUMP="$CLIENT_DIR/mysqldump"
> else
> MYSQLDUMP=mysqldump
and here
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 16 Mar '21
by Sergei Golubchik 16 Mar '21
16 Mar '21
Hi, Andrei!
Few comments/questions below.
I still have troubles understanding the logic of your solution,
so I did not ask any questions about those your replies - as I'd be just
repeating the same thing over and over. After I'll start getting it
finally, I'll go over your reply again.
Meanwhile - minor comments/questions, not related to the main algorithm.
On Mar 04, Andrei Elkin wrote:
> >> To facilitate the failover on the slave side conditions to accept
> >> own events (having been discarded by the above recovery) are
> >> relaxed to let so for the semisync slave that connects to master in
> >> gtid mode. gtid_strict_mode is further recommended to secure from
> >> inadvertent re-applying out of order gtids in general. Non-gtid
> >> mode connected semisync slave would require
> >> --replicate-same-server-id (mind --log-slave-updates must be OFF
> >> then).
> >
> > Sorry, I failed to understand this paragraph :(
>
> 'gtid_strict_mode is further recommended to secure
> from inadvertent re-applying out of order gtids in general.'
>
> attempted to promote GTID connection with master,
> which of course should not be of this patch's
> business, but in a way it is 'cos the non-gtid's mode
> (CHANGE MASTER TO ... master_use_gtid = NO) is covered by the patch
> with a limitation of '--log-slave-updates must be OFF'.
>
> The GTID-ON connection is covered, and without any `--replicate-same-server-id'.
This doesn't seem to clarify anything.
I mean the whole paragraph, starting from "To facilitate" and to "then)."
Could you try to rephrase it to make it clearer, please?
> >> +--let $query1 = INSERT INTO t VALUES (20)
> >> +--let $query2 = DELETE FROM t2 WHERE f = 666 /* no such record */
> >> +--source binlog_truncate_active_log.inc
> >> +
> >> +--echo # Case B.
> >> +# The inverted sequence ends up to truncate only $query2
> >> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-10
> >> +--let $query1 = DELETE FROM t2 WHERE f = 0
> >
> > why not `= 666 /* no such record */` ?
> > is `= 0` important here?
>
> Yes, there's a fine detail here.
>
> 'The inverted sequence ends up' to leave in binlog
>
> $query1 = DELETE FROM t2 WHERE f = 0
>
> which is as ineffective as 666. The patch recognizes the ineffectiveness
> through a special value of engines involved counter in Gtid event.
> In principle the truncation could
> start from it rather than the following $query2.
> But that would mean more complicated coding.
> So the semantics of the zero is to be good, as we're so used to :-)!
Sorry, I didn't understand anything. What does the value in the WHERE
clause has to do with engines and where the binlog is truncated?
> >
> >> +--let $query2 = INSERT INTO t VALUES (20)
> >> +--source binlog_truncate_active_log.inc
...
> >> +--connection master2
> >> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> >> +if ($pre_q2)
> >
> > is $pre_q2 a flag or a query string?
> > it's used as a flag, it's set as a query string.
>
> It's actually both with the idea execute it when defined.
> Legit, right?
legit, but it's not both. Were it both, you'd have
if ($pre_q2) {
eval $pre_q2;
}
but you have,
if ($pre_q2) {
CALL sp_blank_xa;
}
> >> +{
> >> + CALL sp_blank_xa;
> >> +}
...
> >> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> >> new file mode 100644
> >> index 00000000000..fe153e5703c
> >> --- /dev/null
> >> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> >> +#
> >> +# ==== Implementation ====
> >> +#
> >> +# Steps:
> >> +# 0 - Create two tables in innodb and rocksdb engines,
> >> +#
> >> +# In loop for A,B,C cases (below) do 1-5:
> >
> > there's no loop now (which is good, don't add it please :)
[x] ?
> >> +# The 1st trx binlogs, rotate binlog and hold on before committing at engine
> >> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR master1_go";
> >
> > you don't wait for master1_ready anywhere. intentional?
>
> No, actually
>
> SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
>
> must be run by below master2 (sim to how master3 waits for master2).
meaning [x] ?
> >> +--source include/have_binlog_format_row.inc
> >> +--let $rpl_topology=1->2
> >> +--source include/rpl_init.inc
> >
> > isn't this a normal master-slave.inc topology?
>
> Initially yes.
>
> I am guessing you mean why not to
> --source it.
> The test heavily uses 'server_`index`' format to automate
> replication direction.
as far as I remember, master-slave.inc also defines server_1 and
server_2 connections.
> >> diff --git a/sql/sql_class.h b/sql/sql_class.h
> >> index 140394fefc1..87fa88d4c89 100644
> >> --- a/sql/sql_class.h
> >> +++ b/sql/sql_class.h
> >> @@ -4632,7 +4632,7 @@ class THD :public Statement,
> >> LF_PINS *tdc_hash_pins;
> >> LF_PINS *xid_hash_pins;
> >> bool fix_xid_hash_pins();
> >> -
> >> + bool is_1pc_ro_trans;
> >
> > I don't like how it looks, it's in THD (for every single connection
> > and more) and it's only used in some exotic cases. It's set in two
> > places around ha_commit_one_phase(), when it would be better to set
> > it inside ha_commit_one_phase(), in one place only.
> >
> > But I cannot suggest how to get rid of it yet as I don't understand
> > what it's for, I've asked a question about it below, in
> > Gtid_log_event::Gtid_log_event
>
> The flag is to help sorting out ineffective (no engine data change)
> STATEMENT binlog-format logged transaction. I thought it's practical
> enough to face and important to handle.
> Through an example,
> let the table `t' be initially empty, and
> let binlog contain
>
> trx1: INSERT INTO t SET a = 1; /* slave acked, trx got committed in engine */
> trx2: INSERT INTO t SET a = 2; /* not acked, trx remains prepared in engine */
> trx3: DELETE FROM t WHERE a = 0; /* ineffective, does not have Xid event */
>
> prior a crash with their states as commented.
> The truncate trx should be trx2 as the trx3's status
> suggests it's harmless for involving into
> truncation.
> However we have yet to distinguish it from unsafe (to truncate)
> event group ("transaction") that also ends with COMMIT query not Xid event.
Hmm. I see. But thd isn't a universal dumpster for passing
arguments to the function, don't use it like that.
Please, pass this "read-only transaction with no xid" flag to the
Gtid_log_event constructor as an argument. Or recalculate it inside the
constructor, it's not such an expensive thing to do.
> One rather drastic solution could be to not binlog DELETE altogether.
> I did not look into this with much of interest earlier, but how about it?
> Perhaps it'd more like a bug fixes than breaking legacy.
No, I suspect it's too drastic
> >> /* Members related to temporary tables. */
> >> public:
> >> /* Opened table states. */
> >> diff --git a/sql/slave.cc b/sql/slave.cc
> >> index 28e08e32346..9b3f73e5341 100644
> >> --- a/sql/slave.cc
> >> +++ b/sql/slave.cc
> >> @@ -6944,7 +6945,9 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len)
> >> }
> >> else
> >> if ((s_id == global_system_variables.server_id &&
> >> - !mi->rli.replicate_same_server_id) ||
> >> + (!mi->rli.replicate_same_server_id &&
> >> + !(semisync_recovery= (rpl_semi_sync_slave_enabled &&
> >> + mi->using_gtid != Master_info::USE_GTID_NO)))) ||
> >
> > How can "semisync recovery" code path reach queue_event()?
>
> This is a failover now, after recovery.
> The slave which is ex-master tries to adopt "own" (generated by it) events.
1. don't call it "semisync recovery" then
2. add a comment explaining this case (it's a rare and not obvious one)
3. why does it have to be specially recognized anyway?
> >
> >> event_that_should_be_ignored(buf) ||
> >> /*
> >> the following conjunction deals with IGNORE_SERVER_IDS, if set
> >> diff --git a/sql/log_event.h b/sql/log_event.h
> >> index 8a342cb5cd3..1588ab85104 100644
> >> --- a/sql/log_event.h
> >> +++ b/sql/log_event.h
> >> @@ -482,6 +482,16 @@ class String;
> >> */
> >> #define LOG_EVENT_IGNORABLE_F 0x80
> >>
> >> +/**
> >> + @def LOG_EVENT_ACCEPT_OWN_F
> >> +
> >> + Flag sets by the gtid-mode connected semisync slave for
> >> + the same server_id ("own") events which the slave must not have
> >> + in its state. Typically such events were never committed by
> >> + their originator (this server) and discared at its crash recovery
> >
> > sorry, I failed to understand that
>
> This should've been explained above.
> The slave which is ex-master truncated at recovery some of its ("own") transactions.
> The flag serves to help recognize that at execution of the replication events.
1. yes, this is clearer, thanks. may be you should rewrite your comment
putting this explanation in it?
2. why does it have to be specially recognized anyway?
...
> > 3. also it's somewhat an unfounded assumption that only r/w engines
> > will have the transaction prepared. But we can make it a fact, if we
> > won't prepare r/o engines at all (in ha_prepare).
>
> [?]
>
> Sorry, I don't get it.
> Do you mean that an r/o engine (branch) might need to be counted by
> `ha_count_rw()`? But then it would be contributing with binlogged
> events of its branch... I can't imagine that.
I mean the following. Say, you have two engines participating in a
transaction, like in INSERT innodb_table SELECT * FROM rocksdb_table;
One is r/w, the other is r/o. It's a 2pc transaction, the server does
prepare for everyone, then commit for everyone.
I don't think there is any guarantee that prepare for an r/o engine will
be a no-op. It is completely up to the engine what to do in this case,
it's possible that the engine will optimize it to a no-op, but it is not
a certainty.
What I think we should do is not to call prepare() for r/o engines at
all. Just skip straight to commit/rollback on the second phase.
> >> diff --git a/sql/handler.cc b/sql/handler.cc
> >> index 6792e80b8fe..247ab7267bf 100644
> >> --- a/sql/handler.cc
> >> +++ b/sql/handler.cc
> >> @@ -1245,6 +1245,24 @@ int ha_prepare(THD *thd)
> >> DBUG_RETURN(error);
> >> }
> >>
> >> +/*
> >> + Returns counted number of
> >> + read-write recoverable transaction participants.
> >> +*/
> >> +uint ha_count_rw(THD *thd, bool all)
> >
> > the name doesn't match the implementation, please, rename
>
> [?]
>
> Please help here.
> I could not find what specifically you mean.
> It
> returns rw_ha_count
>
> which I think as 'read-write recoverable transaction participants.'
I read the name is "a number of rw engines in the ha_info",
it does not mention "recoverable", and I from the name I woulnd't have
guessed. May be ha_count_rw_2pc ?
> >> +{
> >> + unsigned rw_ha_count= 0;
> >> + THD_TRANS *trans=all ? &thd->transaction.all : &thd->transaction.stmt;
> >> +
> >> + for (Ha_trx_info * ha_info= trans->ha_list; ha_info;
> >> + ha_info= ha_info->next())
> >> + {
> >> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> >> + ++rw_ha_count;
> >> + }
> >> + return rw_ha_count;
> >> +}
> >> +
> >> /**
> >> Check if we can skip the two-phase commit.
> >>
> >> @@ -1960,8 +1982,122 @@ struct xarecover_st
> >> XID *list;
> >> HASH *commit_list;
> >> bool dry_run;
> >> + MEM_ROOT *mem_root;
> >> + bool error;
> >> };
> >>
> >> +/**
> >> + Inserts a new hash member.
> >> +
> >> + returns a successfully created and inserted @c xid_recovery_member
> >> + into hash @c hash_arg,
> >> + or NULL.
> >> +*/
> >> +static xid_recovery_member*
> >> +xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root)
> >> +{
> >> + xid_recovery_member *member= (xid_recovery_member*)
> >> + alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
> >> + if (!member)
> >> + return NULL;
> >> +
> >> + member->xid= xid_arg;
> >> + member->in_engine_prepare= 1;
> >> + member->decided_to_commit= false;
> >> +
> >> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> >> +}
> >> +
> >> +/*
> >> + Inserts a new or updates an existing hash member to increment
> >> + the member's prepare counter.
> >> +
> >> + returns false on success,
> >> + true otherwise.
> >> +*/
> >> +static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
> >> + MEM_ROOT *ptr_mem_root)
> >> +{
> >> + xid_recovery_member* member;
> >> + if ((member= (xid_recovery_member *)
> >> + my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
> >> + member->in_engine_prepare++;
> >> + else
> >> + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root);
> >> +
> >> + return member == NULL;
> >> +}
> >> +
> >> +/*
> >> + Hash iterate function to complete with commit or rollback as decided
> >> + (typically at binlog recovery processing) in member->in_engine_prepare.
> >> +*/
> >> +static my_bool xarecover_do_commit_or_rollback(void *member_arg,
> >> + void *hton_arg)
> >> +{
> >> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> >> + handlerton *hton= (handlerton*) hton_arg;
> >> + xid_t x;
> >> + my_bool rc;
> >> +
> >> + x.set(member->xid);
> >> + rc= member->decided_to_commit ? hton->commit_by_xid(hton, &x) :
> >> + hton->rollback_by_xid(hton, &x);
> >> +
> >> + DBUG_ASSERT(rc || member->in_engine_prepare > 0);
> >> +
> >> + if (!rc)
> >> + {
> >
> > I don't think you can trust rc on that.
> > if it's non-zero, it's an error all right.
> > but it's a bit of a stretch to presume that
> > nonexisting xid is always an error.
> >
> > also commit_by_xid returns int, not my_bool.
>
> [?]
I mean, if an engine returns success when you ask it to commit or
rollback an unknown xid - this is not a bug, right? Commit means "make
all changes by this xid persistent", if there are no changes, there is
nothing to do, so doing nothing and reporting success isn't exactly
wrong. Same for rollback.
> I'd rather to stick with the patch and require from any engine to
> return \cite{xa spec}
>
> [XAER_NOTA] The specified XID is not known by the resource manager
>
> What'd you say?
Okay. I'd say I was wrong, if the standard requires a resource manager
to return an error for an unknown XID, then you surely can rely on that
:)
Sorry for confusion.
> >> +
> >> + return false;
> >> +}
> >> +
> >> +static my_bool xarecover_do_count_in_prepare(void *member_arg,
> >> + void *ptr_count)
> >> +{
> >> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> >> + if (member->in_engine_prepare)
> >> + {
> >> + *(uint*) ptr_count += member->in_engine_prepare;
> >
> > This is a rather weird semantics.
> > it's kind of a number of transactions times number of engines.
> > if one transaction wasn't committed in two engines and another
> > transaction wasn't rolled back in one engine, the counter will be 3.
> > how are you going to explain this to users?
>
> So `ptr_count` actually is for transaction branches, not transactions.
> But the ultimate warning speaks erronously the latter
>
> + sql_print_warning("Could not complete %u number of transactions in "
> "engines.", count_in_prepare);
>
> So I'd convert the counter to count the transactions:
>
> (*(uint*) ptr_count) ++
>
> [?]
yes, that'd be easier to understand, thanks
> >> diff --git a/sql/log.cc b/sql/log.cc
> >> index 8073f09ab88..a9808ed8823 100644
> >> --- a/sql/log.cc
> >> +++ b/sql/log.cc
...
> >> + char buf[21];
> >> +
> >> + longlong10_to_str(ptr_gtid->seq_no, buf, 10);
> >> + sql_print_information("Successfully truncated binlog file:%s "
> >> + "to pos:%llu to remove transactions starting from "
> >> + "GTID %u-%u-%s", file_name, pos,
> >> + ptr_gtid->domain_id, ptr_gtid->server_id, buf);
> >> + }
> >> + if (!(error= init_io_cache(&cache, file, IO_SIZE, WRITE_CACHE,
> >> + (my_off_t) pos, 0, MYF(MY_WME|MY_NABP))))
> >> + {
> >> + /*
> >> + Write Stop_log_event to ensure clean end point for the binary log being
> >> + truncated.
> >> + */
> >
> > I'm not sure about it. The less you touch the corrupted binlog the better.
> > simply truncating it would be enough, wouldn't it?
>
> Yes, it would.
> But I somewhat preferred a tidy-up style.
>
> [?]
>
> If you feel strong to remove it will be done.
I feel it'd be safer not to write anything into the corrupted binlog.
> >> + Stop_log_event se;
> >> + se.checksum_alg= cs_alg;
> >> + if ((error= write_event(&se, NULL, &cache)))
> >> + {
> >> + sql_print_error("Failed to write stop event to "
> >> + "binary log. Errno:%d",
> >> + (cache.error == -1) ? my_errno : error);
> >> + goto end;
> >> + }
> >> + clear_inuse_flag_when_closing(cache.file);
> >> + if ((error= flush_io_cache(&cache)) ||
> >> + (error= mysql_file_sync(file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> >> + {
> >> + sql_print_error("Faild to write stop event to "
> >> + "binary log. Errno:%d",
> >> + (cache.error == -1) ? my_errno : error);
> >> + }
> >> + }
> >> + else
> >> + sql_print_error("Failed to initialize binary log "
> >> + "cache for writing stop event. Errno:%d",
> >> + (cache.error == -1) ? my_errno : error);
> >
> > 1. I don't see why you need to sync after every operation.
>
> Yep, there's one fsync before Stop event writing, redundant if Stop will
> stay in the final patch.
As far as I understand, if Stop is removed, you shouldn't need any syncs
at all, just truncate and close.
> >> + int next_binlog_or_round(int& round,
> >> + const char *last_log_name,
> >> + const char *binlog_checkpoint_name,
> >> + LOG_INFO *linfo, MYSQL_BIN_LOG *log);
> >> + bool is_safe()
> >
> > what do you mean "safe" ?
>
> Safe to truncate condition is explained elsewhere to mean there are no
> non-transactional group of events within the truncated tail.
> Earlier mentioned ineffective COMMIT-query ended "transactions" are safe.
may be is_safe_to_truncate() then?
...
> >> + if (!truncate_validated)
> >> + {
> >> + DBUG_ASSERT(round <= 2);
> >> + /*
> >> + Either truncate was not set or was reset, else
> >> + it gets incremented, otherwise it may only set to an earlier
> >> + offset only at the turn out of the 1st round.
> >
> > sorry, I cannot parse that :(
>
> The comments are for the assert in a branch that acts when the truncate position
> (gtid) is not yet finalized.
> Let me annotate under each assert's line.
>
> >
> >> + */
> >> + DBUG_ASSERT(truncate_gtid.seq_no == 0 ||
>
> 'truncate was not set or was reset' ^, that is truncate_gtid.
> The 'was reset' is actually not present in the assert - that'd be
> reflected with
> truncate_reset_done == true
>
>
> >> + last_gtid_coord >= binlog_truncate_coord ||
>
> 'it gets incremented' ^
why does that mean "it gets incremented" ?
>
> >> + (round == 2 && truncate_set_in_1st));
>
> And if it's neither of the above there must've been truncate candidate
> set in the 1st round.
...
> >> switch (typ)
> >> {
> >> + case FORMAT_DESCRIPTION_EVENT:
> >> + if (round > 1)
> >> + {
> >> + if (fdle != fdle_arg)
> >> + delete fdle;
> >> + fdle= (Format_description_log_event*) ev;
> >
> > why is it needed? all binlog files should have the same
> > Format_description_log_event anyway.
>
> Actually contrary to what was stated (the patch removed that claim) in
> the parent revision of this commit, the FD:s of the
> recovery binlog files may have different checksums. I.e
>
> set @@global.binlog_checksum = value
>
> can be issued to rotate binlog to a new hot file having different checksum value
> while the binlog *CheckPoint* file is behind. So this is possible to
> have at recovery:
>
> B_1(CP, checksum = NONE), B_2(hot, checksum = CRC32)
>
> which also manifests a normal use case bug I have not reported any bug (though wanted to).
okay, checksum can differ. But how does it affect
Format_description_log_event ?
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 15 Mar '21
by Sergei Golubchik 15 Mar '21
15 Mar '21
Hi, Andrei!
On Mar 01, Andrei Elkin wrote:
> > I've reviewed almost everything, see comments below. But not the
> > Recovery_context methods. Please explain how it works and how all
> > these truncate_validated, truncate_reset_done, truncate_set_in_1st,
> > etc all work together.
>
> ...specifically to this point. Just in case I hope you did not miss to
> read recovery_design.txt from the MDEV, which does not go into coding
> details that you're effectively about in above.
I've read it now. Still I don't understand why you need an extra round
of binlog scanning (two rounds if only one file, three rounds if many).
Also, this recovery_design.txt is not part of the code, so whoever will
look at it later will be just as puzzled as I was.
...
> [ G1, G2, ..., G_k, g_{k+1}, ... g_n ]
>
> here the uppercase `G' stands for committed trx, the smallcase `g' for
> prepared,`_k' - sub-scripts in the recovery sequence. As the capital
> letter first rules in the single-engine case the first occurrence of a
> pattern `G_k,g_k+1' identifies the truncate index. The patch reflects
> such fact with raising `truncate_validated' flag.
>
> But it's more complicated in the multiple binlog files / engines case.
> The first `Gg' letter-case drop is not guaranteed to be the only drop
> so `truncate_reset_done' and `truncate_set_in_1st' are introduced to
> help with truncate index identification.
Why is it not guaranteed?
> When `truncate_validated' is set that indicates the truncate index is
> determined and may not change in the current (1st of 2nd) nor future
> rounds. `truncate_reset_done' says that an "inverse" `g_k,G_k+1' pair
> is found so that any earlier truncation candidate gets reset (to
> "zero"). If there will be later any candidate found in *this* (1st or
> 2nd) round in the sequence its index will be obviously greater.
>
> `truncate_set_in_1st' function is to remember that the truncate
> candidate was found in the 1st round (in the "hot" binlog file), but
> if the candidate has not been validated `!truncate_validated' it may
> be exacted in the 2nd round and then to an earlier transaction. So the
> flag helps to handle exception from truncate candidate monotony rule:
> e.g the hot binlog B2 contains `[g5,g6,...g_n]' and a ref to binlog
> checkpoint file B1 that contains `[G1,g2,G3,g4]'. The first round
> truncate candidate of g5 would be first exacted to `g2' before finally
> ascertained to `g4' in the 2nd round. (Notice `g2 -> g4' preserves the
> truncate index monotony).
>
> Notice that due to exacting like `g2 -> g4' in the 2nd round of the
> above example `g2' got "to be up-cased" into `G2' for committing
> (feasible with two trx:s on two different engine scenario - g2 with
> Innodb only got prepared, G4 - on Rocksdb, and got committed prior the
> crash). That's what the 3rd round is for.
>
> I hope this will be helpful.
Unfortunately, not very much. It describes how you juggle with variables
and scanning rounds. But not *what* you're trying to find. And it's kind
of difficult to reverse engineer your "how" back into "what" :(
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
[Maria-developers] MDEV-17399: JSON_TABLE: Incorrect code with table elimination
by Sergey Petrunia 15 Mar '21
by Sergey Petrunia 15 Mar '21
15 Mar '21
Hi Alexey,
> diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc
> index 3958797ec44..f2497d524ec 100644
> --- a/sql/opt_table_elimination.cc
> +++ b/sql/opt_table_elimination.cc
> @@ -637,6 +637,16 @@ void eliminate_tables(JOIN *join)
> List_iterator<Item> it(join->fields_list);
> while ((item= it++))
> used_tables |= item->used_tables();
> +
> + {
> + List_iterator<TABLE_LIST> it(*join->join_list);
> + TABLE_LIST *tbl;
> + while ((tbl= it++))
> + {
> + if (tbl->table_function)
> + used_tables|= tbl->table_function->used_tables();
> + }
> + }
>
This only walks the tables that are "at the top level" of the join. if a
JSON_TABLE(...) is inside a nested outer join, it will not be found.
Please walk select_lex->leaf_tables instead. Please add the testcase (I provide
one below).
Please also add a comment, clarifying what is being done, something like:
Table function JSON_TABLE() can have references to other tables. Do not
eliminate the tables that JSON_TABLE() refers to. Note: the JSON_TABLE itself
cannot be eliminated as it doesn't have unique keys.
== Testcase ==
create table t20 (a int not null);
create table t21 (a int not null primary key, js varchar(100));
insert into t20 select seq from seq_1_to_100;
insert into t21 select a, '{"a":100}' from t20;
create table t31(a int);
create table t32(b int);
insert into t31 values (1);
insert into t32 values (1);
explain
select
t20.a, jt1.ab
from
t20
left join t21 on t20.a=t21.a
join
(t31 left join (t32 join JSON_TABLE(t21.js,'$' COLUMNS (ab INT PATH '$.a')) AS jt1) on t31.a<3);
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
3
Re: [Maria-developers] f1db957c5da: MDEV-21469: Implement crash-safe binary logging of the user XA
by Sergei Golubchik 14 Mar '21
by Sergei Golubchik 14 Mar '21
14 Mar '21
Hi, Sujatha!
Note, this is a review of a combined diff dc92235f21e + f1db957c5da
On Mar 14, Sujatha wrote:
> revision-id: f1db957c5da (mariadb-10.5.2-247-gf1db957c5da)
> parent(s): dc92235f21e
> author: Sujatha <sujatha.sivakumar(a)mariadb.com>
> committer: Andrei Elkin <andrei.elkin(a)mariadb.com>
> timestamp: 2020-12-21 16:10:46 +0200
> message:
>
> MDEV-21469: Implement crash-safe binary logging of the user XA
>
> Make XA PREPARE, XA COMMIT and XA ROLLBACK statements
> crash-safe when --log-bin is specified.
>
> At execution of XA PREPARE, XA COMMIT and XA ROLLBACK their replication
> events are made written into the binary log prior to execution
> of the commands in storage engines.
> In case the server crashes, after writing to binary log but before all
> involved engines have processed the command, the following recovery
> will execute the command's replication events to equalize the states
> of involved engines with that of binlog.
> That applies to all XA PREPARE *group* and XA COMMIT or ROLLBACK.
>
> On the implementation level the recovery time binary log parsing
> is augmented to pay attention to
> the user XA xids to identify the XA transactions' state:s in binary log, and
> eventually match them against their states in engines, see
> MYSQL_BIN_LOG::recover_explicit_xa_prepare().
> In discrepancy cases the outdated state in the engines is corrected with
> resubmitting the transaction prepare group of events, or completion
> ones. The multi-engine partly prepared XA PREPARE case
> the XA is rolled back first.
> The fact of multiple-engine involved is registered into
> Gtid_log_event::flags2 as one bit. The boolean value is
> sufficient and precise to deal with two engines in XA transaction.
> With more than 2 recoverable engines the flag method is still correct though
> may be pessimistic, as it treats all recoverable engines as XA
> participants. So when the number of such Engines exceeds the number
> of prepared engines of the XA that XA is treated
> as partially completed, with all that ensued.
> As an optimization no new bit is allocated in flags2, instead a
> pre-existing ones (of MDEV-742) are reused, observing that
> A. XA "COMPLETE" does not require multi-engine hint for its recovery and that
> B. the MDEV-742 XA-completion bit is not anyhow used by
> XA-PREPARE and its GTID log event.
>
> Notice the multi-engine flagging is superceded by MDEV-21117 extension
> in Gtid log event so this part should be taken from there.
> diff --git a/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> new file mode 100644
> index 00000000000..d0852de2fcc
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> @@ -0,0 +1,86 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA crash recovery works fine across multiple binary logs.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate an explicit XA transaction. Using debug simulation hold the
"debug sync" != "debug sim" :)
> +# execution of XA PREPARE statement after the XA PREPARE is written to
> +# the binary log. With this the prepare will not be done in engine.
> +# 1 - By executing FLUSH LOGS generate multiple binary logs.
> +# 2 - Now make the server to disappear at this point.
> +# 3 - Restart the server. During recovery the XA PREPARE from the binary
> +# log will be read. It is cross checked with engine. Since it is not
> +# present in engine it will be executed once again.
> +# 4 - When server is up execute XA RECOVER to check that the XA is
> +# prepared in engine as well.
> +# 5 - XA COMMIT the transaction and check the validity of the data.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +#
> +
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_log_bin.inc
> +
> +RESET MASTER;
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +connect(con1,localhost,root,,);
> +XA START 'xa1';
> +INSERT INTO t1 SET a=1;
> +SET DEBUG_SYNC= "simulate_hang_after_binlog_prepare SIGNAL con1_ready WAIT_FOR con1_go";
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +XA END 'xa1';
> +--send XA PREPARE 'xa1';
> +
> +connection default;
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +FLUSH LOGS;
> +FLUSH LOGS;
> +FLUSH LOGS;
> +
> +--source include/show_binary_logs.inc
> +--let $binlog_file= master-bin.000004
> +--let $binlog_start= 4
> +--source include/show_binlog_events.inc
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
I don't see why you'd need to wait first, it'd be simpler to restart right away.
> +EOF
> +
> +--error 0,2013
> +SET DEBUG_SYNC= "now SIGNAL con1_go";
> +--source include/wait_until_disconnected.inc
> +
> +--connection con1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +
> +SELECT * FROM t1;
> +
> +# Clean up.
> +connection default;
> +DROP TABLE t1;
> +
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> new file mode 100644
> index 00000000000..e972e3f09de
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> @@ -0,0 +1,98 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The COMMIT should be executed during recovery.
> +# 3 - Check the data in table. Both rows should be present in table.
> +# 4 - Trying to commit 'xa2' should report unknown 'XA' error as COMMIT is
> +# already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
Same. Why do you wait?
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> new file mode 100644
> index 00000000000..14cebbd9b13
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> @@ -0,0 +1,119 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that if for some reason an event cannot be applied during
> +# recovery, appropriate error is reported.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. Using debug simulation point make XA COMMIT 'xa2'
> +# execution to fail. The server will resume anyway
> +# to leave the error in the errlog (see "Recovery: Error..").
> +# 3 - Work around the simulated failure with Commit once again
> +# from a connection that turns OFF binlogging.
> +# Slave must catch up with the master.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CALL mtr.add_suppression("Recovery: Error .Out of memory..");
> +CALL mtr.add_suppression("Crash recovery failed.");
> +CALL mtr.add_suppression("Can.t init tc log");
> +CALL mtr.add_suppression("Aborting");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --debug-dbug=d,trans_xa_commit_fail
> +EOF
> +
> +connection default;
> +--source include/wait_until_disconnected.inc
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--echo *** must be no 'xa2' commit seen, as it's still prepared:
> +SELECT * FROM t;
> +XA RECOVER;
> +
> +# Commit it manually now to work around the extra binlog record
> +# by turning binlogging OFF by the connection.
> +
> +SET GLOBAL DEBUG_DBUG="";
> +SET SQL_LOG_BIN=0;
> +--error 0
why error 0? What will happen if you don't disable binlog?
> +XA COMMIT 'xa2';
> +SET SQL_LOG_BIN=1;
> +
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +
> +--sync_slave_with_master
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> new file mode 100644
> index 00000000000..7b987c7f29b
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> @@ -0,0 +1,95 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA PREPARE be done in binary log and crash the
> +# server so that it is not prepared in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' is
> +# present.
> +# 4 - Commit the XA transaction and verify its correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ditto
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> new file mode 100644
> index 00000000000..9d2c5cce528
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> @@ -0,0 +1,117 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 3 explicit XA transactions. 'xa1', 'xa2' and 'xa3'.
> +# Using debug simulation hold the execution of second XA PREPARE
> +# statement after the XA PREPARE is written to the binary log.
> +# With this the prepare will not be done in engine.
> +# 1 - For 'xa3' allow the PREPARE statement to be written to binary log and
> +# simulate server crash.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2' and 'xa3'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' and 'xa3'
> +# are present along with 'xa1'.
> +# 4 - Commit all the XA transactions and verify their correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ok, in this test you actually use master2 :)
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Found 2 prepared XA transactions");
> +CALL mtr.add_suppression("Found 3 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +use test;
> +xa start 'xa2';
> +insert into t values (30);
> +xa end 'xa2';
> +SET DEBUG_SYNC="simulate_hang_after_binlog_prepare SIGNAL reached WAIT_FOR go";
> +send xa prepare 'xa2';
> +
> +--connection master2
> +let $wait_condition=
> + SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST
> + WHERE STATE like "debug sync point: simulate_hang_after_binlog_prepare%";
> +--source include/wait_condition.inc
eh? you can just 'WAIT_FOR signal', couldn't you?
> +
> +XA START 'xa3';
> +INSERT INTO t VALUES (40);
> +XA END 'xa3';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa3';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +XA COMMIT 'xa2';
> +XA COMMIT 'xa3';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> new file mode 100644
> index 00000000000..1d19f96116e
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> @@ -0,0 +1,97 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
XA ROLLBACK
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA ROLLBACK be done in binary log and crash the
> +# server so that it is not committed in engine.
not rolled back
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The ROLLBACK should be executed during recovery.
> +# 3 - Check the data in table. Only one row should be present in table.
> +# 4 - Trying to rollback 'xa2' should report unknown 'XA' error as rollback
> +# is already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
unused again
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA ROLLBACK 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA ROLLBACK 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> new file mode 100644
> index 00000000000..c7cefbda4bf
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> @@ -0,0 +1,46 @@
> +# MDEV-742, MDEV-21469 XA replication, and xa crash-safe.
> +# Tests prove xa state is recovered to a prepared or completed state
> +# upon post-crash recovery, incl a multi-engine case.
> +
> +--source include/have_rocksdb.inc
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_binlog_format_row.inc
> +
> +--connection slave
> +--source include/stop_slave.inc
> +
> +--connection master
> +CALL mtr.add_suppression("Found . prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=rocksdb;
> +CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=innodb;
> +INSERT INTO t1 SET a = 1;
> +INSERT INTO t2 SET a = 1;
> +
> +--let $xa=xa1
> +--let $when=after_binlog
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--let $xa=xa2
> +--let $when=after_first_engine
> +--let $finally_expected=$xa in the list
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +
> +--connection master
> +--sync_slave_with_master
> +let $diff_tables= master:t1, slave:t1;
> +source include/diff_tables.inc;
> +let $diff_tables= master:t2, slave:t2;
> +source include/diff_tables.inc;
> +
> +--connection master
> +SET SESSION DEBUG_DBUG="";
> +drop table t1,t2;
> +--sync_slave_with_master
> +
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> new file mode 100644
> index 00000000000..312b8b6a19e
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> @@ -0,0 +1,67 @@
> +# callee of rpl_rocksdb_xa_recover.test
> +# requires t1,t2 as defined in the caller.
> +
> +--let $at=_prepare
> +--let $finally_expected=$xa in the list
> +--let $init_val=`SELECT MAX(a) from t1 as t_1`
why as t_1 ?
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 1
> +--eval INSERT INTO t2 SET a=$init_val + 1
> +--eval XA END '$xa'
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA PREPARE '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval XA ROLLBACK '$xa'
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (rolled back)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 2
> +--eval INSERT INTO t2 SET a=$init_val + 2
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA ROLLBACK '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (committed away)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 3
> +--eval INSERT INTO t2 SET a=$init_val + 3
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA COMMIT '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as three FROM t1
> +--eval SELECT MAX(a) - $init_val as three FROM t2
> diff --git a/sql/log_event.h b/sql/log_event.h
> index 639cbfbe7aa..e1880339e85 100644
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -3614,6 +3616,12 @@ class Gtid_log_event: public Log_event
> static const uchar FL_PREPARED_XA= 64;
> /* FL_"COMMITTED or ROLLED-BACK"_XA is set for XA transaction. */
> static const uchar FL_COMPLETED_XA= 128;
> + /*
> + To mark the fact of multiple transactional engine participants
> + in the prepared XA. The FL_COMPLETED_XA bit is reused by XA_PREPARE_LOG_EVENT,
What do you mean by that? XA_prepare_log_event does not inherit from
Gtid_log_event, it cannot use Gtid_log_event flags.
And, please, explain in a comment why you can reuse a bit for two different
flags. Like "the ambiguity between FL_COMPLETED_XA and FL_MULTI_ENGINE_XA is
resolved by the flag FL_PREPARED_XA. if it's set, then 128 means
FL_MULTI_ENGINE_XA, if it's not set, then 128 means FL_COMPLETED_XA"
> + oth the XA completion events do not need such marking.
other
> + */
> + static const uchar FL_MULTI_ENGINE_XA= 128;
>
> #ifdef MYSQL_SERVER
> Gtid_log_event(THD *thd_arg, uint64 seq_no, uint32 domain_id, bool standalone,
> diff --git a/sql/handler.h b/sql/handler.h
> index 0eff7bd930d..9acdd47bfc2 100644
> --- a/sql/handler.h
> +++ b/sql/handler.h
> @@ -1880,8 +1891,18 @@ class Ha_trx_info
> m_ht= ht_arg;
> m_flags= (int) TRX_READ_ONLY; /* Assume read-only at start. */
>
> - m_next= trans->ha_list;
> - trans->ha_list= this;
> + if (likely(!past_head))
> + {
> + m_next= trans->ha_list;
> + trans->ha_list= this;
> + }
> + else
> + {
> + DBUG_ASSERT(trans->ha_list);
> +
> + m_next= trans->ha_list->m_next;
> + trans->ha_list->m_next= this;
> + }
I don't like this fake generalization. There's no point for an arbitrary
engine to be put at a specific place in the list. Only binlog is special.
At least, remove this new argument and check for binlog_hton here.
But really, think of this - binlog is so special, may be it'd be easier not
to pretend it's a storage engine at all? No need for a fake binlog_hton,
no need to register it, etc. It kind of made sense in 2005, but it doesn't
necessarily make it now.
> }
>
> /** Clear, prepare for reuse. */
> diff --git a/sql/handler.cc b/sql/handler.cc
> index ffb89b30d92..b981294f0d6 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1338,6 +1345,9 @@ int ha_prepare(THD *thd)
> handlerton *ht= ha_info->ht();
> if (ht->prepare)
> {
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_prepare",
> + if (!ha_info->next()) DBUG_SUICIDE(););
looks like not simulate_crash_after_first_engine_prepare, but
more like simulate_crash_before_last_engine_prepare.
it's the same point in time if you have only two engines,
but the code still implements "before last", not "after first"
> +
> if (unlikely(prepare_or_error(ht, thd, all)))
> {
> ha_rollback_trans(thd, all);
> @@ -1368,22 +1378,22 @@ int ha_prepare(THD *thd)
> }
>
> /*
> - Like ha_check_and_coalesce_trx_read_only to return counted number of
> - read-write transaction participants limited to two, but works in the 'all'
> - context.
> - Also returns the last found rw ha_info through the 2nd argument.
> + Returns counted number of
> + read-write recoverable transaction participants optionally limited to two.
> + Also optionally returns the last found rw ha_info through the 2nd argument.
> */
> -uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info)
> +uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info, bool count_through)
I don't understand why you need a new argument. ptr_ha_info==NULL means don't
return a value there, this is a standard convention used everywhere.
also, it's not an "count rw all" it's are there more rw-all htons than N
where N can be 1 or 2.
So:
* either take N as an argument and return as soon as rw_ha_count > N
* or don't overoptimize here and just return full count every time
> {
> unsigned rw_ha_count= 0;
>
> for (auto ha_info= thd->transaction.all.ha_list; ha_info;
> ha_info= ha_info->next())
> {
> - if (ha_info->is_trx_read_write())
> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> {
> - *ptr_ha_info= ha_info;
> - if (++rw_ha_count > 1)
> + if (ptr_ha_info)
> + *ptr_ha_info= ha_info;
> + if (++rw_ha_count > 1 && !count_through)
> break;
> }
> }
> @@ -1876,6 +1886,10 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS *trans, bool is_real_trans)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
1. same as above
2. why not to use *different* names for these two crashes?
in case, you know, you'll want to crash at a specific point in the code, not
just somewhere? And in a case you'll actually want to crash just somewhere you
can always use "+d,simulate_crash_after_first_engine_commit,simulate_crash_after_first_engine_rollback"
(and it should be "before_last", of course, not "after_first")
> +
> if ((err= ht->commit(ht, thd, all)))
> {
> my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> @@ -1987,6 +2001,10 @@ int ha_rollback_trans(THD *thd, bool all)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
ditto
> +
> if ((err= ht->rollback(ht, thd, all)))
> { // cannot happen
> my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err);
> @@ -2098,8 +2116,24 @@ int ha_commit_or_rollback_by_xid(XID *xid, bool commit)
> xaop.xid= xid;
> xaop.result= 1;
>
> + if (binlog_hton->recover)
> + {
> + /*
> + When the binlogging service is enabled complete the transaction
> + by it first.
> + */
> + if (commit)
> + binlog_hton->commit_by_xid(binlog_hton, xid);
> + else
> + binlog_hton->rollback_by_xid(binlog_hton, xid);
> + }
> plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton,
> MYSQL_STORAGE_ENGINE_PLUGIN, &xaop);
> + if (binlog_hton->recover)
> + {
> + THD *thd= current_thd;
> + thd->reset_binlog_completed_by_xid();
> + }
1. this, precisely, shows that binlog does not need a hton anymoreo
2. how can binlog_hton->recover be NULL here?
>
> return xaop.result;
> }
> @@ -2222,6 +2258,7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
>
> if (hton->recover)
> {
> + info->recover_htons++;
don't forget to subtract 1 for binlog_hton->recover,
otherwise it'll make you to rollback everything with a FL_MULTI_ENGINE_XA flag
> while ((got= hton->recover(hton, info->list, info->len)) > 0 )
> {
> sql_print_information("Found %d prepared transaction(s) in %s",
> @@ -2263,7 +2300,21 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> _db_doprnt_("ignore xid %s", xid_to_str(buf, info->list[i]));
> });
> xid_cache_insert(info->list + i);
> + XID *foreign_xid= info->list + i;
swap those two lines ^^^ then you can write
xid_cache_insert(foreign_xid);
> info->found_foreign_xids++;
> +
> + /*
> + For each foreign xid prepraed in engine, check if it is present in
> + xa_prepared_list of binlog.
> + */
> + if (info->xa_prepared_list)
for simplicity I'd rather always initialize xa_prepared_list and remove
this if()
> + {
> + struct xa_recovery_member *member= NULL;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(info->xa_prepared_list, foreign_xid->key(),
> + foreign_xid->key_length())))
> + member->in_engine_prepare++;
> + }
> continue;
> }
> if (IF_WSREP(!(wsrep_emulate_bin_log &&
> @@ -2362,7 +2424,7 @@ int ha_recover(HASH *commit_list)
> info.found_my_xids, opt_tc_log_file);
> DBUG_RETURN(1);
> }
> - if (info.commit_list)
> + if (info.commit_list && !info.found_foreign_xids)
Why? after the finished crash recovery one can still have transactions in
doubt.
> sql_print_information("Crash recovery finished.");
> DBUG_RETURN(0);
> }
> diff --git a/sql/log.cc b/sql/log.cc
> index 731bb3e98f0..c69b8518cf4 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -2019,28 +2027,49 @@ static int binlog_xa_recover_dummy(handlerton *hton __attribute__((unused)),
> return 0;
> }
>
> -
> +/*
> + The function invokes binlog_commit() and returns its result
> + when it has not yet called it already.
> + binlog_cache_mngr::completed_by_xid remembers the fact of
> + the 1st of maximum two subsequent calls.
> +*/
> static int binlog_commit_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT);
>
> - return binlog_commit(hton, thd, TRUE);
> -}
> + if (!cache_mngr->completed_by_xid)
On what code path can you have completed_by_xid == false here?
> + {
> + rc= binlog_commit(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
>
> + return rc;
> +}
>
> +/*
> + The function invokes binlog_rollback() and returns its results similarly
> + to a commit branch.
> +*/
> static int binlog_rollback_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_ROLLBACK ||
> (thd->transaction.xid_state.get_state_code() == XA_ROLLBACK_ONLY));
> - return binlog_rollback(hton, thd, TRUE);
> +
> + if (!cache_mngr->completed_by_xid)
> + {
> + rc= binlog_rollback(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
> +
> + return rc;
> }
>
>
> @@ -2195,6 +2224,12 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
> if (!all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
> + DBUG_SUICIDE(););
> + DEBUG_SYNC(thd, "simulate_hang_after_binlog_prepare");
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_prepare",
> + DBUG_SUICIDE(););
wouldn't you say there's one DBUG_EXECUTE_IF too many?
Just keep one, "simulate_crash_after_binlog_commit"
> +
> THD_STAGE_INFO(thd, org_stage);
> DBUG_RETURN(error);
> }
> @@ -2298,6 +2333,9 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
> thd->reset_binlog_for_next_statement();
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
same as above, use unique dbug keywords, like
"simulate_crash_after_binlog_rollback" (because it's rollback here)
> + DBUG_SUICIDE(););
> +
> DBUG_RETURN(error);
> }
>
> @@ -5749,6 +5794,14 @@ binlog_cache_mngr *THD::binlog_setup_trx_data()
> DBUG_RETURN(cache_mngr);
> }
>
> +/*
> + XA completion flag resetter for ha_commit_or_rollback_by_xid().
> +*/
> +void THD::reset_binlog_completed_by_xid()
> +{
> + binlog_setup_trx_data()->reset_completed_by_xid();
> +}
I don't think this makes any sense as a THD method
> +
> /*
> Function to start a statement and optionally a transaction for the
> binary log.
> @@ -10333,14 +10386,108 @@ start_binlog_background_thread()
> return 0;
> }
>
> +#ifdef HAVE_REPLICATION
> +/**
> + Auxiliary function for TC_LOG::recover().
> + @returns a successfully created and inserted @c xa_recovery_member
> + into hash @c hash_arg,
> + or NULL.
> +*/
> +static xa_recovery_member*
> +xa_member_insert(HASH *hash_arg, xid_t *xid_arg, xa_binlog_state state_arg,
> + MEM_ROOT *ptr_mem_root)
> +{
> + xa_recovery_member *member= (xa_recovery_member*)
> + alloc_root(ptr_mem_root, sizeof(xa_recovery_member));
> + if (!member)
> + return NULL;
> +
> + member->xid.set(xid_arg);
> + member->state= state_arg;
> + member->in_engine_prepare= 0;
> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> +}
> +
> +/* Inserts or updates an existing hash member with a proper state */
> +static bool xa_member_replace(HASH *hash_arg, xid_t *xid_arg, bool is_prepare,
> + MEM_ROOT *ptr_mem_root)
> +{
> + if(is_prepare)
> + {
> + if (!(xa_member_insert(hash_arg, xid_arg, XA_PREPARE, ptr_mem_root)))
> + return true;
> + }
> + else
> + {
> + /*
> + Search if XID is already present in recovery_list. If found
> + and the state is 'XA_PREPRAED' mark it as XA_COMPLETE.
> + Effectively, there won't be XA-prepare event group replay.
> + */
> + xa_recovery_member* member;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(hash_arg, xid_arg->key(), xid_arg->key_length())))
> + {
> + if (member->state == XA_PREPARE)
> + member->state= XA_COMPLETE;
> + }
> + else // We found only XA COMMIT during recovery insert to list
> + {
> + if (!(member= xa_member_insert(hash_arg,
> + xid_arg, XA_COMPLETE, ptr_mem_root)))
> + return true;
> + }
> + }
> + return false;
> +}
> +#endif
>
> +extern "C" uchar *xid_get_var_key(xid_t *entry, size_t *length,
define the first argument as const char*, then you won't need to cast in
my_hash_init, and you can be sure you pass the correct function with the
correct number of arguments, even if my_hash_get_key changes in the future.
and cast entry to xid_t* below explicitly.
> + my_bool not_used __attribute__((unused)))
> +{
> + *length= entry->key_length();
> + return (uchar*) entry->key();
> +}
> +
> +/**
> + Performs recovery based on transaction coordinator log for 2pc. At the
> + time of crash, if the binary log was in active state, then recovery for
> + "implicit" 'xid's and explicit 'XA' transactions is initiated,
> + otherwise merely the gtid binlog state is updated.
> + For 'xid' and 'XA' based recovery the following steps are performed.
> +
> + Identify the active binlog checkpoint file.
> + Scan the binary log from the beginning.
> + From GTID_LIST and GTID_EVENTs reconstruct the gtid binlog state.
> + Prepare a list of 'xid's for recovery.
> + Prepare a list of explicit 'XA' transactions for recovery.
> + Recover the 'xid' transactions.
> + The explicit 'XA' transaction recovery is initiated once all the server
> + components are initialized. Please check 'execute_xa_for_recovery()'.
why explicit XA recovery is delayed?
> +
> + Called from @c MYSQL_BIN_LOG::do_binlog_recovery()
> +
> + @param linfo Store here the found log file name and position to
> + the NEXT log file name in the index file.
> +
> + @param last_log_name Name of the last active binary log at the time of
> + crash.
> +
> + @param first_log Pointer to IO_CACHE of active binary log
> + @param fdle Format_description_log_event of active binary log
> + @param do_xa Is 2pc recovery needed for 'xid's and explicit XA
> + transactions.
> + @return indicates success or failure of recovery.
> + @retval 0 success
> + @retval 1 failure
> +
> +*/
> int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> IO_CACHE *first_log,
> Format_description_log_event *fdle, bool do_xa)
> {
> Log_event *ev= NULL;
> HASH xids;
> - MEM_ROOT mem_root;
> char binlog_checkpoint_name[FN_REFLEN];
> bool binlog_checkpoint_found;
> bool first_round;
> @@ -10533,10 +10693,22 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
>
> if (do_xa)
> {
> - if (ha_recover(&xids))
> + if (ha_recover(&xids, &xa_recover_list, &xa_recover_htons))
> goto err2;
>
> - free_root(&mem_root, MYF(0));
> + DBUG_ASSERT(!xa_recover_list.records ||
> + (binlog_checkpoint_found && binlog_checkpoint_name[0] != 0));
why is that?
> +
> + if (!xa_recover_list.records)
> + {
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> + }
> + else
> + {
> + xa_binlog_checkpoint_name= strmake_root(&mem_root, binlog_checkpoint_name,
> + strlen(binlog_checkpoint_name));
> + }
> my_hash_free(&xids);
> }
> return 0;
> @@ -10561,6 +10734,219 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> return 1;
> }
>
> +void MYSQL_BIN_LOG::execute_xa_for_recovery()
> +{
> + if (xa_recover_list.records)
> + (void) recover_explicit_xa_prepare();
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> +};
> +
> +/**
> + Performs recovery of user XA transactions.
> + 'xa_recover_list' contains the list of XA transactions to be recovered.
> + with possible replaying replication event from the binary log.
> +
> + @return indicates success or failure of recovery.
> + @retval false success
> + @retval true failure
> +
> +*/
> +bool MYSQL_BIN_LOG::recover_explicit_xa_prepare()
why is it bool, if you only use it in execute_xa_for_recovery()
and ignore the return value?
> +{
> +#ifndef HAVE_REPLICATION
> + /* Can't be supported without replication applier built in. */
> + return false;
> +#else
> + bool err= true;
> + int error=0;
> + Relay_log_info *rli= NULL;
> + rpl_group_info *rgi;
> + THD *thd= new THD(0); /* Needed by start_slave_threads */
> + thd->thread_stack= (char*) &thd;
> + thd->store_globals();
> + thd->security_ctx->skip_grants();
> + IO_CACHE log;
> + const char *errmsg;
> + File file;
> + bool enable_apply_event= false;
> + Log_event *ev = 0;
> + LOG_INFO linfo;
> + int recover_xa_count= xa_recover_list.records;
> + xa_recovery_member *member= NULL;
> +
> + if (!(rli= thd->rli_fake= new Relay_log_info(FALSE, "Recovery")))
> + {
> + my_error(ER_OUTOFMEMORY, MYF(ME_FATAL), 1);
> + goto err2;
> + }
> + rli->sql_driver_thd= thd;
> + static LEX_CSTRING connection_name= { STRING_WITH_LEN("Recovery") };
> + rli->mi= new Master_info(&connection_name, false);
> + if (!(rgi= thd->rgi_fake))
> + rgi= thd->rgi_fake= new rpl_group_info(rli);
> + rgi->thd= thd;
> + thd->system_thread_info.rpl_sql_info=
> + new rpl_sql_thread_info(rli->mi->rpl_filter);
> +
> + if (rli && !rli->relay_log.description_event_for_exec)
> + {
> + rli->relay_log.description_event_for_exec=
> + new Format_description_log_event(4);
> + }
> + if (find_log_pos(&linfo, xa_binlog_checkpoint_name, 1))
> + {
> + sql_print_error("Binlog file '%s' not found in binlog index, needed "
> + "for recovery. Aborting.", xa_binlog_checkpoint_name);
> + goto err2;
> + }
> +
> + tmp_disable_binlog(thd);
> + thd->variables.pseudo_slave_mode= TRUE;
> + for (;;)
> + {
> + if ((file= open_binlog(&log, linfo.log_file_name, &errmsg)) < 0)
> + {
> + sql_print_error("%s", errmsg);
> + goto err1;
> + }
> + while (recover_xa_count > 0 &&
> + (ev= Log_event::read_log_event(&log,
> + rli->relay_log.description_event_for_exec,
> + opt_master_verify_checksum)))
> + {
> + if (!ev->is_valid())
> + {
> + sql_print_error("Found invalid binlog query event %s"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written));
> + goto err1;
> + }
> + enum Log_event_type typ= ev->get_type_code();
> + ev->thd= thd;
> +
> + if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event= true;
> +
> + if (typ == GTID_EVENT)
> + {
> + Gtid_log_event *gev= (Gtid_log_event *)ev;
> + if (gev->flags2 &
> + (Gtid_log_event::FL_PREPARED_XA | Gtid_log_event::FL_COMPLETED_XA))
> + {
> + if ((member=
> + (xa_recovery_member*) my_hash_search(&xa_recover_list,
> + gev->xid.key(),
> + gev->xid.key_length())))
> + {
> + /*
> + When XA PREPARE group of events (as flagged so) check
> + its actual binlog state which may be COMPLETED. If the
> + state is also PREPARED then analyze through
> + in_engine_prepare whether the transaction needs replay.
> + */
> + if (gev->flags2 & Gtid_log_event::FL_PREPARED_XA)
> + {
> + if (member->state == XA_PREPARE)
> + {
> + // XA prepared is not present in (some) engine then apply it
> + if (member->in_engine_prepare == 0)
> + enable_apply_event= true;
> + else if (gev->flags2 & Gtid_log_event::FL_MULTI_ENGINE_XA &&
> + xa_recover_htons > member->in_engine_prepare)
This needs a comment, something like "FL_MULTI_ENGINE_XA says the transaction
must be prepared in more than one engine. We don't know in how many, so if it's
not prepared in *all* engines we'll replay it just in case"
I presume that (a bit silly) logic will do away when you will port this
patch to use a counter instead of a flag. Please, don't push if before
that.
> + {
> + enable_apply_event= true;
> + // partially engine-prepared XA is first cleaned out prior replay
> + thd->lex->sql_command= SQLCOM_XA_ROLLBACK;
> + ha_commit_or_rollback_by_xid(&gev->xid, 0);
> + }
> + else
> + --recover_xa_count;
> + }
> + }
> + else if (gev->flags2 & Gtid_log_event::FL_COMPLETED_XA)
> + {
> + if (member->state == XA_COMPLETE &&
> + member->in_engine_prepare > 0)
> + enable_apply_event= true;
why? you cannot replay a fully prepared and partially committed transaction
> + else
> + --recover_xa_count;
> + }
> + }
> + }
> + }
> +
> + if (enable_apply_event)
> + {
> + if ((err= ev->apply_event(rgi)))
> + {
> + sql_print_error("Failed to execute binlog query event of type: %s,"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written),
> + thd->get_stmt_da()->sql_errno(),
> + thd->get_stmt_da()->message());
> + delete ev;
> + goto err1;
> + }
> + else if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event=false;
how can that happen?
> + else if (thd->lex->sql_command == SQLCOM_XA_PREPARE ||
> + thd->lex->sql_command == SQLCOM_XA_COMMIT ||
> + thd->lex->sql_command == SQLCOM_XA_ROLLBACK)
> + {
> + --recover_xa_count;
> + enable_apply_event=false;
> +
> + sql_print_information("Binlog event %s at %s:%llu"
> + " successfully applied",
> + typ == XA_PREPARE_LOG_EVENT ?
> + static_cast<XA_prepare_log_event *>(ev)->get_query() :
> + static_cast<Query_log_event *>(ev)->query,
> + linfo.log_file_name, (ev->log_pos - ev->data_written));
> + }
> + }
> + if (typ != FORMAT_DESCRIPTION_EVENT)
> + delete ev;
> + }
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + file= -1;
> + if (unlikely((error= find_next_log(&linfo, 1))))
> + {
> + if (error != LOG_INFO_EOF)
> + sql_print_error("find_log_pos() failed (error: %d)", error);
> + else
> + break;
> + }
> + }
> +err1:
> + reenable_binlog(thd);
> + /*
> + There should be no more XA transactions to recover upon successful
> + completion.
> + */
> + if (recover_xa_count > 0)
> + goto err2;
> + sql_print_information("Crash recovery finished.");
> + err= false;
> +err2:
> + if (file >= 0)
> + {
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + }
> + thd->variables.pseudo_slave_mode= FALSE;
> + delete rli->mi;
> + delete thd->system_thread_info.rpl_sql_info;
> + rgi->slave_close_thread_tables(thd);
> + thd->reset_globals();
> + delete thd;
> +
> + return err;
> +#endif /* !HAVE_REPLICATION */
> +}
>
> int
> MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery)
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Hi Alexey,
Please find below final bits of input. After these are addressed the patch
will probably be good to push (But I'll need a final pass before giving ok to
push).
== json_table_mysql* tests ==
* Please remove one of the json_table_mysql and json_table_mysql2 tests.
* Please move the remaining test to be in the same suite as json_table.test.
== Fix json_table.test ==
The test has this somewhere in the middle:
set optimizer_switch='firstmatch=off';
Please restore the original setting.
== ha_json_table::rnd_next() returns 0 on error ==
Consider this query:
select * from
json_table(
'[
{"name": "X"},
{"name2": "Y"}
]',
'$[*]' columns
(
col1 varchar(100) path '$.name2' error on empty
)) as T;
The query produces an error.
It happens like so:
* the call to ha_json_table::rnd_next() returns 0.
* The SQL layer finds out about the error by checking thd->is_error() which returns true.
Please make ha_json_table::rnd_next() return an error.
== Rename table_function.* ==
Looking at the contets of sql/table_function.{h,cc}, one can see that there's
very little code there that is applicable to generic table function. Almost
all code is specifi to JSON_TABLE.
Because of that, please rename the files to e.g. json_table.{h,cc}.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
5
[Maria-developers] MDEV-17399: JSON_TABLE: Odd code in table.cc:create_view_field ?
by Sergey Petrunia 12 Mar '21
by Sergey Petrunia 12 Mar '21
12 Mar '21
Hi Alexey,
What does the code quoted below do? I don't recall seeing it on previous review
iterations.
In any case,
* It needs a comment about why such special handling is needed.
* It needs test coverage - I have reverted these changes and didn't see
any test to fail?
> diff --git a/sql/table.cc b/sql/table.cc
> index 4f65dbd65f4..9c205fc4be6 100644
> --- a/sql/table.cc
> +++ b/sql/table.cc
> @@ -6722,6 +6722,8 @@ Item *create_view_field(THD *thd, TABLE_LIST *view, Item **field_ref,
> LEX_CSTRING *name)
> {
> bool save_wrapper= thd->lex->first_select_lex()->no_wrap_view_item;
> + bool *wrapper_to_set= thd->lex->current_select ?
> + &thd->lex->current_select->no_wrap_view_item : &save_wrapper;
> Item *field= *field_ref;
> DBUG_ENTER("create_view_field");
>
> @@ -6737,17 +6739,17 @@ Item *create_view_field(THD *thd, TABLE_LIST *view, Item **field_ref,
> }
>
> DBUG_ASSERT(field);
> - thd->lex->current_select->no_wrap_view_item= TRUE;
> + *wrapper_to_set= TRUE;
> if (!field->is_fixed())
> {
> if (field->fix_fields(thd, field_ref))
> {
> - thd->lex->current_select->no_wrap_view_item= save_wrapper;
> + *wrapper_to_set= save_wrapper;
> DBUG_RETURN(0);
> }
> field= *field_ref;
> }
> - thd->lex->current_select->no_wrap_view_item= save_wrapper;
> + *wrapper_to_set= save_wrapper;
> if (save_wrapper)
> {
> DBUG_RETURN(field);
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
1
Hi MariaDB community!
Glad to be here! My github account is @HollowMan6. Though I'm new to MariaDB community, I'm interested in MDEV-16375 & MDEV-23143: Function to normalize a json value & missing a JSON_EQUALS function for this year's GSOC project. Here are my first thoughts on these issues:
I have checked part of the codebase and I think the two issues can be merged into one. First we can create a function named JSON_NORMALIZE to normalize the json, which automatically parses the inputed json document, recursively sorts the keys (for objects) / sorts the numbers (for arrays), removes the spaces, and then return the json document string.
Then we create a function named JSON_EQUALS, which can be used to compare 2 json documents for equality realized by first seperately normalize the two json documents using JSON_NORMALIZE, then the 2 can be compared exactly as binary strings.
I have taken some inspirations from the Item_func_json_keys and json_scan_start for parsing json documents, and I think it's possible to sort the keys using std::map in STL for objects.
That's all for my ideas so far. Please correct me if I made some mistakes, and I'm going to work on my ideas later.
Cheers!
Hollow Man
1
0