developers
Threads by month
- ----- 2025 -----
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 4 participants
- 6818 discussions
Re: [Maria-developers] f1db957c5da: MDEV-21469: Implement crash-safe binary logging of the user XA
by Sergei Golubchik 14 Mar '21
by Sergei Golubchik 14 Mar '21
14 Mar '21
Hi, Sujatha!
Note, this is a review of a combined diff dc92235f21e + f1db957c5da
On Mar 14, Sujatha wrote:
> revision-id: f1db957c5da (mariadb-10.5.2-247-gf1db957c5da)
> parent(s): dc92235f21e
> author: Sujatha <sujatha.sivakumar(a)mariadb.com>
> committer: Andrei Elkin <andrei.elkin(a)mariadb.com>
> timestamp: 2020-12-21 16:10:46 +0200
> message:
>
> MDEV-21469: Implement crash-safe binary logging of the user XA
>
> Make XA PREPARE, XA COMMIT and XA ROLLBACK statements
> crash-safe when --log-bin is specified.
>
> At execution of XA PREPARE, XA COMMIT and XA ROLLBACK their replication
> events are made written into the binary log prior to execution
> of the commands in storage engines.
> In case the server crashes, after writing to binary log but before all
> involved engines have processed the command, the following recovery
> will execute the command's replication events to equalize the states
> of involved engines with that of binlog.
> That applies to all XA PREPARE *group* and XA COMMIT or ROLLBACK.
>
> On the implementation level the recovery time binary log parsing
> is augmented to pay attention to
> the user XA xids to identify the XA transactions' state:s in binary log, and
> eventually match them against their states in engines, see
> MYSQL_BIN_LOG::recover_explicit_xa_prepare().
> In discrepancy cases the outdated state in the engines is corrected with
> resubmitting the transaction prepare group of events, or completion
> ones. The multi-engine partly prepared XA PREPARE case
> the XA is rolled back first.
> The fact of multiple-engine involved is registered into
> Gtid_log_event::flags2 as one bit. The boolean value is
> sufficient and precise to deal with two engines in XA transaction.
> With more than 2 recoverable engines the flag method is still correct though
> may be pessimistic, as it treats all recoverable engines as XA
> participants. So when the number of such Engines exceeds the number
> of prepared engines of the XA that XA is treated
> as partially completed, with all that ensued.
> As an optimization no new bit is allocated in flags2, instead a
> pre-existing ones (of MDEV-742) are reused, observing that
> A. XA "COMPLETE" does not require multi-engine hint for its recovery and that
> B. the MDEV-742 XA-completion bit is not anyhow used by
> XA-PREPARE and its GTID log event.
>
> Notice the multi-engine flagging is superceded by MDEV-21117 extension
> in Gtid log event so this part should be taken from there.
> diff --git a/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> new file mode 100644
> index 00000000000..d0852de2fcc
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> @@ -0,0 +1,86 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA crash recovery works fine across multiple binary logs.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate an explicit XA transaction. Using debug simulation hold the
"debug sync" != "debug sim" :)
> +# execution of XA PREPARE statement after the XA PREPARE is written to
> +# the binary log. With this the prepare will not be done in engine.
> +# 1 - By executing FLUSH LOGS generate multiple binary logs.
> +# 2 - Now make the server to disappear at this point.
> +# 3 - Restart the server. During recovery the XA PREPARE from the binary
> +# log will be read. It is cross checked with engine. Since it is not
> +# present in engine it will be executed once again.
> +# 4 - When server is up execute XA RECOVER to check that the XA is
> +# prepared in engine as well.
> +# 5 - XA COMMIT the transaction and check the validity of the data.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +#
> +
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_log_bin.inc
> +
> +RESET MASTER;
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +connect(con1,localhost,root,,);
> +XA START 'xa1';
> +INSERT INTO t1 SET a=1;
> +SET DEBUG_SYNC= "simulate_hang_after_binlog_prepare SIGNAL con1_ready WAIT_FOR con1_go";
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +XA END 'xa1';
> +--send XA PREPARE 'xa1';
> +
> +connection default;
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +FLUSH LOGS;
> +FLUSH LOGS;
> +FLUSH LOGS;
> +
> +--source include/show_binary_logs.inc
> +--let $binlog_file= master-bin.000004
> +--let $binlog_start= 4
> +--source include/show_binlog_events.inc
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
I don't see why you'd need to wait first, it'd be simpler to restart right away.
> +EOF
> +
> +--error 0,2013
> +SET DEBUG_SYNC= "now SIGNAL con1_go";
> +--source include/wait_until_disconnected.inc
> +
> +--connection con1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +
> +SELECT * FROM t1;
> +
> +# Clean up.
> +connection default;
> +DROP TABLE t1;
> +
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> new file mode 100644
> index 00000000000..e972e3f09de
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> @@ -0,0 +1,98 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The COMMIT should be executed during recovery.
> +# 3 - Check the data in table. Both rows should be present in table.
> +# 4 - Trying to commit 'xa2' should report unknown 'XA' error as COMMIT is
> +# already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
Same. Why do you wait?
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> new file mode 100644
> index 00000000000..14cebbd9b13
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> @@ -0,0 +1,119 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that if for some reason an event cannot be applied during
> +# recovery, appropriate error is reported.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. Using debug simulation point make XA COMMIT 'xa2'
> +# execution to fail. The server will resume anyway
> +# to leave the error in the errlog (see "Recovery: Error..").
> +# 3 - Work around the simulated failure with Commit once again
> +# from a connection that turns OFF binlogging.
> +# Slave must catch up with the master.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CALL mtr.add_suppression("Recovery: Error .Out of memory..");
> +CALL mtr.add_suppression("Crash recovery failed.");
> +CALL mtr.add_suppression("Can.t init tc log");
> +CALL mtr.add_suppression("Aborting");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --debug-dbug=d,trans_xa_commit_fail
> +EOF
> +
> +connection default;
> +--source include/wait_until_disconnected.inc
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--echo *** must be no 'xa2' commit seen, as it's still prepared:
> +SELECT * FROM t;
> +XA RECOVER;
> +
> +# Commit it manually now to work around the extra binlog record
> +# by turning binlogging OFF by the connection.
> +
> +SET GLOBAL DEBUG_DBUG="";
> +SET SQL_LOG_BIN=0;
> +--error 0
why error 0? What will happen if you don't disable binlog?
> +XA COMMIT 'xa2';
> +SET SQL_LOG_BIN=1;
> +
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +
> +--sync_slave_with_master
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> new file mode 100644
> index 00000000000..7b987c7f29b
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> @@ -0,0 +1,95 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA PREPARE be done in binary log and crash the
> +# server so that it is not prepared in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' is
> +# present.
> +# 4 - Commit the XA transaction and verify its correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ditto
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> new file mode 100644
> index 00000000000..9d2c5cce528
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> @@ -0,0 +1,117 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 3 explicit XA transactions. 'xa1', 'xa2' and 'xa3'.
> +# Using debug simulation hold the execution of second XA PREPARE
> +# statement after the XA PREPARE is written to the binary log.
> +# With this the prepare will not be done in engine.
> +# 1 - For 'xa3' allow the PREPARE statement to be written to binary log and
> +# simulate server crash.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2' and 'xa3'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' and 'xa3'
> +# are present along with 'xa1'.
> +# 4 - Commit all the XA transactions and verify their correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ok, in this test you actually use master2 :)
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Found 2 prepared XA transactions");
> +CALL mtr.add_suppression("Found 3 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +use test;
> +xa start 'xa2';
> +insert into t values (30);
> +xa end 'xa2';
> +SET DEBUG_SYNC="simulate_hang_after_binlog_prepare SIGNAL reached WAIT_FOR go";
> +send xa prepare 'xa2';
> +
> +--connection master2
> +let $wait_condition=
> + SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST
> + WHERE STATE like "debug sync point: simulate_hang_after_binlog_prepare%";
> +--source include/wait_condition.inc
eh? you can just 'WAIT_FOR signal', couldn't you?
> +
> +XA START 'xa3';
> +INSERT INTO t VALUES (40);
> +XA END 'xa3';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa3';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +XA COMMIT 'xa2';
> +XA COMMIT 'xa3';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> new file mode 100644
> index 00000000000..1d19f96116e
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> @@ -0,0 +1,97 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
XA ROLLBACK
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA ROLLBACK be done in binary log and crash the
> +# server so that it is not committed in engine.
not rolled back
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The ROLLBACK should be executed during recovery.
> +# 3 - Check the data in table. Only one row should be present in table.
> +# 4 - Trying to rollback 'xa2' should report unknown 'XA' error as rollback
> +# is already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
unused again
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA ROLLBACK 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA ROLLBACK 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> new file mode 100644
> index 00000000000..c7cefbda4bf
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> @@ -0,0 +1,46 @@
> +# MDEV-742, MDEV-21469 XA replication, and xa crash-safe.
> +# Tests prove xa state is recovered to a prepared or completed state
> +# upon post-crash recovery, incl a multi-engine case.
> +
> +--source include/have_rocksdb.inc
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_binlog_format_row.inc
> +
> +--connection slave
> +--source include/stop_slave.inc
> +
> +--connection master
> +CALL mtr.add_suppression("Found . prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=rocksdb;
> +CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=innodb;
> +INSERT INTO t1 SET a = 1;
> +INSERT INTO t2 SET a = 1;
> +
> +--let $xa=xa1
> +--let $when=after_binlog
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--let $xa=xa2
> +--let $when=after_first_engine
> +--let $finally_expected=$xa in the list
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +
> +--connection master
> +--sync_slave_with_master
> +let $diff_tables= master:t1, slave:t1;
> +source include/diff_tables.inc;
> +let $diff_tables= master:t2, slave:t2;
> +source include/diff_tables.inc;
> +
> +--connection master
> +SET SESSION DEBUG_DBUG="";
> +drop table t1,t2;
> +--sync_slave_with_master
> +
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> new file mode 100644
> index 00000000000..312b8b6a19e
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> @@ -0,0 +1,67 @@
> +# callee of rpl_rocksdb_xa_recover.test
> +# requires t1,t2 as defined in the caller.
> +
> +--let $at=_prepare
> +--let $finally_expected=$xa in the list
> +--let $init_val=`SELECT MAX(a) from t1 as t_1`
why as t_1 ?
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 1
> +--eval INSERT INTO t2 SET a=$init_val + 1
> +--eval XA END '$xa'
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA PREPARE '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval XA ROLLBACK '$xa'
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (rolled back)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 2
> +--eval INSERT INTO t2 SET a=$init_val + 2
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA ROLLBACK '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (committed away)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 3
> +--eval INSERT INTO t2 SET a=$init_val + 3
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA COMMIT '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as three FROM t1
> +--eval SELECT MAX(a) - $init_val as three FROM t2
> diff --git a/sql/log_event.h b/sql/log_event.h
> index 639cbfbe7aa..e1880339e85 100644
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -3614,6 +3616,12 @@ class Gtid_log_event: public Log_event
> static const uchar FL_PREPARED_XA= 64;
> /* FL_"COMMITTED or ROLLED-BACK"_XA is set for XA transaction. */
> static const uchar FL_COMPLETED_XA= 128;
> + /*
> + To mark the fact of multiple transactional engine participants
> + in the prepared XA. The FL_COMPLETED_XA bit is reused by XA_PREPARE_LOG_EVENT,
What do you mean by that? XA_prepare_log_event does not inherit from
Gtid_log_event, it cannot use Gtid_log_event flags.
And, please, explain in a comment why you can reuse a bit for two different
flags. Like "the ambiguity between FL_COMPLETED_XA and FL_MULTI_ENGINE_XA is
resolved by the flag FL_PREPARED_XA. if it's set, then 128 means
FL_MULTI_ENGINE_XA, if it's not set, then 128 means FL_COMPLETED_XA"
> + oth the XA completion events do not need such marking.
other
> + */
> + static const uchar FL_MULTI_ENGINE_XA= 128;
>
> #ifdef MYSQL_SERVER
> Gtid_log_event(THD *thd_arg, uint64 seq_no, uint32 domain_id, bool standalone,
> diff --git a/sql/handler.h b/sql/handler.h
> index 0eff7bd930d..9acdd47bfc2 100644
> --- a/sql/handler.h
> +++ b/sql/handler.h
> @@ -1880,8 +1891,18 @@ class Ha_trx_info
> m_ht= ht_arg;
> m_flags= (int) TRX_READ_ONLY; /* Assume read-only at start. */
>
> - m_next= trans->ha_list;
> - trans->ha_list= this;
> + if (likely(!past_head))
> + {
> + m_next= trans->ha_list;
> + trans->ha_list= this;
> + }
> + else
> + {
> + DBUG_ASSERT(trans->ha_list);
> +
> + m_next= trans->ha_list->m_next;
> + trans->ha_list->m_next= this;
> + }
I don't like this fake generalization. There's no point for an arbitrary
engine to be put at a specific place in the list. Only binlog is special.
At least, remove this new argument and check for binlog_hton here.
But really, think of this - binlog is so special, may be it'd be easier not
to pretend it's a storage engine at all? No need for a fake binlog_hton,
no need to register it, etc. It kind of made sense in 2005, but it doesn't
necessarily make it now.
> }
>
> /** Clear, prepare for reuse. */
> diff --git a/sql/handler.cc b/sql/handler.cc
> index ffb89b30d92..b981294f0d6 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1338,6 +1345,9 @@ int ha_prepare(THD *thd)
> handlerton *ht= ha_info->ht();
> if (ht->prepare)
> {
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_prepare",
> + if (!ha_info->next()) DBUG_SUICIDE(););
looks like not simulate_crash_after_first_engine_prepare, but
more like simulate_crash_before_last_engine_prepare.
it's the same point in time if you have only two engines,
but the code still implements "before last", not "after first"
> +
> if (unlikely(prepare_or_error(ht, thd, all)))
> {
> ha_rollback_trans(thd, all);
> @@ -1368,22 +1378,22 @@ int ha_prepare(THD *thd)
> }
>
> /*
> - Like ha_check_and_coalesce_trx_read_only to return counted number of
> - read-write transaction participants limited to two, but works in the 'all'
> - context.
> - Also returns the last found rw ha_info through the 2nd argument.
> + Returns counted number of
> + read-write recoverable transaction participants optionally limited to two.
> + Also optionally returns the last found rw ha_info through the 2nd argument.
> */
> -uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info)
> +uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info, bool count_through)
I don't understand why you need a new argument. ptr_ha_info==NULL means don't
return a value there, this is a standard convention used everywhere.
also, it's not an "count rw all" it's are there more rw-all htons than N
where N can be 1 or 2.
So:
* either take N as an argument and return as soon as rw_ha_count > N
* or don't overoptimize here and just return full count every time
> {
> unsigned rw_ha_count= 0;
>
> for (auto ha_info= thd->transaction.all.ha_list; ha_info;
> ha_info= ha_info->next())
> {
> - if (ha_info->is_trx_read_write())
> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> {
> - *ptr_ha_info= ha_info;
> - if (++rw_ha_count > 1)
> + if (ptr_ha_info)
> + *ptr_ha_info= ha_info;
> + if (++rw_ha_count > 1 && !count_through)
> break;
> }
> }
> @@ -1876,6 +1886,10 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS *trans, bool is_real_trans)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
1. same as above
2. why not to use *different* names for these two crashes?
in case, you know, you'll want to crash at a specific point in the code, not
just somewhere? And in a case you'll actually want to crash just somewhere you
can always use "+d,simulate_crash_after_first_engine_commit,simulate_crash_after_first_engine_rollback"
(and it should be "before_last", of course, not "after_first")
> +
> if ((err= ht->commit(ht, thd, all)))
> {
> my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> @@ -1987,6 +2001,10 @@ int ha_rollback_trans(THD *thd, bool all)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
ditto
> +
> if ((err= ht->rollback(ht, thd, all)))
> { // cannot happen
> my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err);
> @@ -2098,8 +2116,24 @@ int ha_commit_or_rollback_by_xid(XID *xid, bool commit)
> xaop.xid= xid;
> xaop.result= 1;
>
> + if (binlog_hton->recover)
> + {
> + /*
> + When the binlogging service is enabled complete the transaction
> + by it first.
> + */
> + if (commit)
> + binlog_hton->commit_by_xid(binlog_hton, xid);
> + else
> + binlog_hton->rollback_by_xid(binlog_hton, xid);
> + }
> plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton,
> MYSQL_STORAGE_ENGINE_PLUGIN, &xaop);
> + if (binlog_hton->recover)
> + {
> + THD *thd= current_thd;
> + thd->reset_binlog_completed_by_xid();
> + }
1. this, precisely, shows that binlog does not need a hton anymoreo
2. how can binlog_hton->recover be NULL here?
>
> return xaop.result;
> }
> @@ -2222,6 +2258,7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
>
> if (hton->recover)
> {
> + info->recover_htons++;
don't forget to subtract 1 for binlog_hton->recover,
otherwise it'll make you to rollback everything with a FL_MULTI_ENGINE_XA flag
> while ((got= hton->recover(hton, info->list, info->len)) > 0 )
> {
> sql_print_information("Found %d prepared transaction(s) in %s",
> @@ -2263,7 +2300,21 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> _db_doprnt_("ignore xid %s", xid_to_str(buf, info->list[i]));
> });
> xid_cache_insert(info->list + i);
> + XID *foreign_xid= info->list + i;
swap those two lines ^^^ then you can write
xid_cache_insert(foreign_xid);
> info->found_foreign_xids++;
> +
> + /*
> + For each foreign xid prepraed in engine, check if it is present in
> + xa_prepared_list of binlog.
> + */
> + if (info->xa_prepared_list)
for simplicity I'd rather always initialize xa_prepared_list and remove
this if()
> + {
> + struct xa_recovery_member *member= NULL;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(info->xa_prepared_list, foreign_xid->key(),
> + foreign_xid->key_length())))
> + member->in_engine_prepare++;
> + }
> continue;
> }
> if (IF_WSREP(!(wsrep_emulate_bin_log &&
> @@ -2362,7 +2424,7 @@ int ha_recover(HASH *commit_list)
> info.found_my_xids, opt_tc_log_file);
> DBUG_RETURN(1);
> }
> - if (info.commit_list)
> + if (info.commit_list && !info.found_foreign_xids)
Why? after the finished crash recovery one can still have transactions in
doubt.
> sql_print_information("Crash recovery finished.");
> DBUG_RETURN(0);
> }
> diff --git a/sql/log.cc b/sql/log.cc
> index 731bb3e98f0..c69b8518cf4 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -2019,28 +2027,49 @@ static int binlog_xa_recover_dummy(handlerton *hton __attribute__((unused)),
> return 0;
> }
>
> -
> +/*
> + The function invokes binlog_commit() and returns its result
> + when it has not yet called it already.
> + binlog_cache_mngr::completed_by_xid remembers the fact of
> + the 1st of maximum two subsequent calls.
> +*/
> static int binlog_commit_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT);
>
> - return binlog_commit(hton, thd, TRUE);
> -}
> + if (!cache_mngr->completed_by_xid)
On what code path can you have completed_by_xid == false here?
> + {
> + rc= binlog_commit(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
>
> + return rc;
> +}
>
> +/*
> + The function invokes binlog_rollback() and returns its results similarly
> + to a commit branch.
> +*/
> static int binlog_rollback_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_ROLLBACK ||
> (thd->transaction.xid_state.get_state_code() == XA_ROLLBACK_ONLY));
> - return binlog_rollback(hton, thd, TRUE);
> +
> + if (!cache_mngr->completed_by_xid)
> + {
> + rc= binlog_rollback(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
> +
> + return rc;
> }
>
>
> @@ -2195,6 +2224,12 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
> if (!all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
> + DBUG_SUICIDE(););
> + DEBUG_SYNC(thd, "simulate_hang_after_binlog_prepare");
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_prepare",
> + DBUG_SUICIDE(););
wouldn't you say there's one DBUG_EXECUTE_IF too many?
Just keep one, "simulate_crash_after_binlog_commit"
> +
> THD_STAGE_INFO(thd, org_stage);
> DBUG_RETURN(error);
> }
> @@ -2298,6 +2333,9 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
> thd->reset_binlog_for_next_statement();
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
same as above, use unique dbug keywords, like
"simulate_crash_after_binlog_rollback" (because it's rollback here)
> + DBUG_SUICIDE(););
> +
> DBUG_RETURN(error);
> }
>
> @@ -5749,6 +5794,14 @@ binlog_cache_mngr *THD::binlog_setup_trx_data()
> DBUG_RETURN(cache_mngr);
> }
>
> +/*
> + XA completion flag resetter for ha_commit_or_rollback_by_xid().
> +*/
> +void THD::reset_binlog_completed_by_xid()
> +{
> + binlog_setup_trx_data()->reset_completed_by_xid();
> +}
I don't think this makes any sense as a THD method
> +
> /*
> Function to start a statement and optionally a transaction for the
> binary log.
> @@ -10333,14 +10386,108 @@ start_binlog_background_thread()
> return 0;
> }
>
> +#ifdef HAVE_REPLICATION
> +/**
> + Auxiliary function for TC_LOG::recover().
> + @returns a successfully created and inserted @c xa_recovery_member
> + into hash @c hash_arg,
> + or NULL.
> +*/
> +static xa_recovery_member*
> +xa_member_insert(HASH *hash_arg, xid_t *xid_arg, xa_binlog_state state_arg,
> + MEM_ROOT *ptr_mem_root)
> +{
> + xa_recovery_member *member= (xa_recovery_member*)
> + alloc_root(ptr_mem_root, sizeof(xa_recovery_member));
> + if (!member)
> + return NULL;
> +
> + member->xid.set(xid_arg);
> + member->state= state_arg;
> + member->in_engine_prepare= 0;
> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> +}
> +
> +/* Inserts or updates an existing hash member with a proper state */
> +static bool xa_member_replace(HASH *hash_arg, xid_t *xid_arg, bool is_prepare,
> + MEM_ROOT *ptr_mem_root)
> +{
> + if(is_prepare)
> + {
> + if (!(xa_member_insert(hash_arg, xid_arg, XA_PREPARE, ptr_mem_root)))
> + return true;
> + }
> + else
> + {
> + /*
> + Search if XID is already present in recovery_list. If found
> + and the state is 'XA_PREPRAED' mark it as XA_COMPLETE.
> + Effectively, there won't be XA-prepare event group replay.
> + */
> + xa_recovery_member* member;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(hash_arg, xid_arg->key(), xid_arg->key_length())))
> + {
> + if (member->state == XA_PREPARE)
> + member->state= XA_COMPLETE;
> + }
> + else // We found only XA COMMIT during recovery insert to list
> + {
> + if (!(member= xa_member_insert(hash_arg,
> + xid_arg, XA_COMPLETE, ptr_mem_root)))
> + return true;
> + }
> + }
> + return false;
> +}
> +#endif
>
> +extern "C" uchar *xid_get_var_key(xid_t *entry, size_t *length,
define the first argument as const char*, then you won't need to cast in
my_hash_init, and you can be sure you pass the correct function with the
correct number of arguments, even if my_hash_get_key changes in the future.
and cast entry to xid_t* below explicitly.
> + my_bool not_used __attribute__((unused)))
> +{
> + *length= entry->key_length();
> + return (uchar*) entry->key();
> +}
> +
> +/**
> + Performs recovery based on transaction coordinator log for 2pc. At the
> + time of crash, if the binary log was in active state, then recovery for
> + "implicit" 'xid's and explicit 'XA' transactions is initiated,
> + otherwise merely the gtid binlog state is updated.
> + For 'xid' and 'XA' based recovery the following steps are performed.
> +
> + Identify the active binlog checkpoint file.
> + Scan the binary log from the beginning.
> + From GTID_LIST and GTID_EVENTs reconstruct the gtid binlog state.
> + Prepare a list of 'xid's for recovery.
> + Prepare a list of explicit 'XA' transactions for recovery.
> + Recover the 'xid' transactions.
> + The explicit 'XA' transaction recovery is initiated once all the server
> + components are initialized. Please check 'execute_xa_for_recovery()'.
why explicit XA recovery is delayed?
> +
> + Called from @c MYSQL_BIN_LOG::do_binlog_recovery()
> +
> + @param linfo Store here the found log file name and position to
> + the NEXT log file name in the index file.
> +
> + @param last_log_name Name of the last active binary log at the time of
> + crash.
> +
> + @param first_log Pointer to IO_CACHE of active binary log
> + @param fdle Format_description_log_event of active binary log
> + @param do_xa Is 2pc recovery needed for 'xid's and explicit XA
> + transactions.
> + @return indicates success or failure of recovery.
> + @retval 0 success
> + @retval 1 failure
> +
> +*/
> int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> IO_CACHE *first_log,
> Format_description_log_event *fdle, bool do_xa)
> {
> Log_event *ev= NULL;
> HASH xids;
> - MEM_ROOT mem_root;
> char binlog_checkpoint_name[FN_REFLEN];
> bool binlog_checkpoint_found;
> bool first_round;
> @@ -10533,10 +10693,22 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
>
> if (do_xa)
> {
> - if (ha_recover(&xids))
> + if (ha_recover(&xids, &xa_recover_list, &xa_recover_htons))
> goto err2;
>
> - free_root(&mem_root, MYF(0));
> + DBUG_ASSERT(!xa_recover_list.records ||
> + (binlog_checkpoint_found && binlog_checkpoint_name[0] != 0));
why is that?
> +
> + if (!xa_recover_list.records)
> + {
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> + }
> + else
> + {
> + xa_binlog_checkpoint_name= strmake_root(&mem_root, binlog_checkpoint_name,
> + strlen(binlog_checkpoint_name));
> + }
> my_hash_free(&xids);
> }
> return 0;
> @@ -10561,6 +10734,219 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> return 1;
> }
>
> +void MYSQL_BIN_LOG::execute_xa_for_recovery()
> +{
> + if (xa_recover_list.records)
> + (void) recover_explicit_xa_prepare();
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> +};
> +
> +/**
> + Performs recovery of user XA transactions.
> + 'xa_recover_list' contains the list of XA transactions to be recovered.
> + with possible replaying replication event from the binary log.
> +
> + @return indicates success or failure of recovery.
> + @retval false success
> + @retval true failure
> +
> +*/
> +bool MYSQL_BIN_LOG::recover_explicit_xa_prepare()
why is it bool, if you only use it in execute_xa_for_recovery()
and ignore the return value?
> +{
> +#ifndef HAVE_REPLICATION
> + /* Can't be supported without replication applier built in. */
> + return false;
> +#else
> + bool err= true;
> + int error=0;
> + Relay_log_info *rli= NULL;
> + rpl_group_info *rgi;
> + THD *thd= new THD(0); /* Needed by start_slave_threads */
> + thd->thread_stack= (char*) &thd;
> + thd->store_globals();
> + thd->security_ctx->skip_grants();
> + IO_CACHE log;
> + const char *errmsg;
> + File file;
> + bool enable_apply_event= false;
> + Log_event *ev = 0;
> + LOG_INFO linfo;
> + int recover_xa_count= xa_recover_list.records;
> + xa_recovery_member *member= NULL;
> +
> + if (!(rli= thd->rli_fake= new Relay_log_info(FALSE, "Recovery")))
> + {
> + my_error(ER_OUTOFMEMORY, MYF(ME_FATAL), 1);
> + goto err2;
> + }
> + rli->sql_driver_thd= thd;
> + static LEX_CSTRING connection_name= { STRING_WITH_LEN("Recovery") };
> + rli->mi= new Master_info(&connection_name, false);
> + if (!(rgi= thd->rgi_fake))
> + rgi= thd->rgi_fake= new rpl_group_info(rli);
> + rgi->thd= thd;
> + thd->system_thread_info.rpl_sql_info=
> + new rpl_sql_thread_info(rli->mi->rpl_filter);
> +
> + if (rli && !rli->relay_log.description_event_for_exec)
> + {
> + rli->relay_log.description_event_for_exec=
> + new Format_description_log_event(4);
> + }
> + if (find_log_pos(&linfo, xa_binlog_checkpoint_name, 1))
> + {
> + sql_print_error("Binlog file '%s' not found in binlog index, needed "
> + "for recovery. Aborting.", xa_binlog_checkpoint_name);
> + goto err2;
> + }
> +
> + tmp_disable_binlog(thd);
> + thd->variables.pseudo_slave_mode= TRUE;
> + for (;;)
> + {
> + if ((file= open_binlog(&log, linfo.log_file_name, &errmsg)) < 0)
> + {
> + sql_print_error("%s", errmsg);
> + goto err1;
> + }
> + while (recover_xa_count > 0 &&
> + (ev= Log_event::read_log_event(&log,
> + rli->relay_log.description_event_for_exec,
> + opt_master_verify_checksum)))
> + {
> + if (!ev->is_valid())
> + {
> + sql_print_error("Found invalid binlog query event %s"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written));
> + goto err1;
> + }
> + enum Log_event_type typ= ev->get_type_code();
> + ev->thd= thd;
> +
> + if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event= true;
> +
> + if (typ == GTID_EVENT)
> + {
> + Gtid_log_event *gev= (Gtid_log_event *)ev;
> + if (gev->flags2 &
> + (Gtid_log_event::FL_PREPARED_XA | Gtid_log_event::FL_COMPLETED_XA))
> + {
> + if ((member=
> + (xa_recovery_member*) my_hash_search(&xa_recover_list,
> + gev->xid.key(),
> + gev->xid.key_length())))
> + {
> + /*
> + When XA PREPARE group of events (as flagged so) check
> + its actual binlog state which may be COMPLETED. If the
> + state is also PREPARED then analyze through
> + in_engine_prepare whether the transaction needs replay.
> + */
> + if (gev->flags2 & Gtid_log_event::FL_PREPARED_XA)
> + {
> + if (member->state == XA_PREPARE)
> + {
> + // XA prepared is not present in (some) engine then apply it
> + if (member->in_engine_prepare == 0)
> + enable_apply_event= true;
> + else if (gev->flags2 & Gtid_log_event::FL_MULTI_ENGINE_XA &&
> + xa_recover_htons > member->in_engine_prepare)
This needs a comment, something like "FL_MULTI_ENGINE_XA says the transaction
must be prepared in more than one engine. We don't know in how many, so if it's
not prepared in *all* engines we'll replay it just in case"
I presume that (a bit silly) logic will do away when you will port this
patch to use a counter instead of a flag. Please, don't push if before
that.
> + {
> + enable_apply_event= true;
> + // partially engine-prepared XA is first cleaned out prior replay
> + thd->lex->sql_command= SQLCOM_XA_ROLLBACK;
> + ha_commit_or_rollback_by_xid(&gev->xid, 0);
> + }
> + else
> + --recover_xa_count;
> + }
> + }
> + else if (gev->flags2 & Gtid_log_event::FL_COMPLETED_XA)
> + {
> + if (member->state == XA_COMPLETE &&
> + member->in_engine_prepare > 0)
> + enable_apply_event= true;
why? you cannot replay a fully prepared and partially committed transaction
> + else
> + --recover_xa_count;
> + }
> + }
> + }
> + }
> +
> + if (enable_apply_event)
> + {
> + if ((err= ev->apply_event(rgi)))
> + {
> + sql_print_error("Failed to execute binlog query event of type: %s,"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written),
> + thd->get_stmt_da()->sql_errno(),
> + thd->get_stmt_da()->message());
> + delete ev;
> + goto err1;
> + }
> + else if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event=false;
how can that happen?
> + else if (thd->lex->sql_command == SQLCOM_XA_PREPARE ||
> + thd->lex->sql_command == SQLCOM_XA_COMMIT ||
> + thd->lex->sql_command == SQLCOM_XA_ROLLBACK)
> + {
> + --recover_xa_count;
> + enable_apply_event=false;
> +
> + sql_print_information("Binlog event %s at %s:%llu"
> + " successfully applied",
> + typ == XA_PREPARE_LOG_EVENT ?
> + static_cast<XA_prepare_log_event *>(ev)->get_query() :
> + static_cast<Query_log_event *>(ev)->query,
> + linfo.log_file_name, (ev->log_pos - ev->data_written));
> + }
> + }
> + if (typ != FORMAT_DESCRIPTION_EVENT)
> + delete ev;
> + }
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + file= -1;
> + if (unlikely((error= find_next_log(&linfo, 1))))
> + {
> + if (error != LOG_INFO_EOF)
> + sql_print_error("find_log_pos() failed (error: %d)", error);
> + else
> + break;
> + }
> + }
> +err1:
> + reenable_binlog(thd);
> + /*
> + There should be no more XA transactions to recover upon successful
> + completion.
> + */
> + if (recover_xa_count > 0)
> + goto err2;
> + sql_print_information("Crash recovery finished.");
> + err= false;
> +err2:
> + if (file >= 0)
> + {
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + }
> + thd->variables.pseudo_slave_mode= FALSE;
> + delete rli->mi;
> + delete thd->system_thread_info.rpl_sql_info;
> + rgi->slave_close_thread_tables(thd);
> + thd->reset_globals();
> + delete thd;
> +
> + return err;
> +#endif /* !HAVE_REPLICATION */
> +}
>
> int
> MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery)
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Hi Alexey,
Please find below final bits of input. After these are addressed the patch
will probably be good to push (But I'll need a final pass before giving ok to
push).
== json_table_mysql* tests ==
* Please remove one of the json_table_mysql and json_table_mysql2 tests.
* Please move the remaining test to be in the same suite as json_table.test.
== Fix json_table.test ==
The test has this somewhere in the middle:
set optimizer_switch='firstmatch=off';
Please restore the original setting.
== ha_json_table::rnd_next() returns 0 on error ==
Consider this query:
select * from
json_table(
'[
{"name": "X"},
{"name2": "Y"}
]',
'$[*]' columns
(
col1 varchar(100) path '$.name2' error on empty
)) as T;
The query produces an error.
It happens like so:
* the call to ha_json_table::rnd_next() returns 0.
* The SQL layer finds out about the error by checking thd->is_error() which returns true.
Please make ha_json_table::rnd_next() return an error.
== Rename table_function.* ==
Looking at the contets of sql/table_function.{h,cc}, one can see that there's
very little code there that is applicable to generic table function. Almost
all code is specifi to JSON_TABLE.
Because of that, please rename the files to e.g. json_table.{h,cc}.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
5
[Maria-developers] MDEV-17399: JSON_TABLE: Odd code in table.cc:create_view_field ?
by Sergey Petrunia 12 Mar '21
by Sergey Petrunia 12 Mar '21
12 Mar '21
Hi Alexey,
What does the code quoted below do? I don't recall seeing it on previous review
iterations.
In any case,
* It needs a comment about why such special handling is needed.
* It needs test coverage - I have reverted these changes and didn't see
any test to fail?
> diff --git a/sql/table.cc b/sql/table.cc
> index 4f65dbd65f4..9c205fc4be6 100644
> --- a/sql/table.cc
> +++ b/sql/table.cc
> @@ -6722,6 +6722,8 @@ Item *create_view_field(THD *thd, TABLE_LIST *view, Item **field_ref,
> LEX_CSTRING *name)
> {
> bool save_wrapper= thd->lex->first_select_lex()->no_wrap_view_item;
> + bool *wrapper_to_set= thd->lex->current_select ?
> + &thd->lex->current_select->no_wrap_view_item : &save_wrapper;
> Item *field= *field_ref;
> DBUG_ENTER("create_view_field");
>
> @@ -6737,17 +6739,17 @@ Item *create_view_field(THD *thd, TABLE_LIST *view, Item **field_ref,
> }
>
> DBUG_ASSERT(field);
> - thd->lex->current_select->no_wrap_view_item= TRUE;
> + *wrapper_to_set= TRUE;
> if (!field->is_fixed())
> {
> if (field->fix_fields(thd, field_ref))
> {
> - thd->lex->current_select->no_wrap_view_item= save_wrapper;
> + *wrapper_to_set= save_wrapper;
> DBUG_RETURN(0);
> }
> field= *field_ref;
> }
> - thd->lex->current_select->no_wrap_view_item= save_wrapper;
> + *wrapper_to_set= save_wrapper;
> if (save_wrapper)
> {
> DBUG_RETURN(field);
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
1
Hi MariaDB community!
Glad to be here! My github account is @HollowMan6. Though I'm new to MariaDB community, I'm interested in MDEV-16375 & MDEV-23143: Function to normalize a json value & missing a JSON_EQUALS function for this year's GSOC project. Here are my first thoughts on these issues:
I have checked part of the codebase and I think the two issues can be merged into one. First we can create a function named JSON_NORMALIZE to normalize the json, which automatically parses the inputed json document, recursively sorts the keys (for objects) / sorts the numbers (for arrays), removes the spaces, and then return the json document string.
Then we create a function named JSON_EQUALS, which can be used to compare 2 json documents for equality realized by first seperately normalize the two json documents using JSON_NORMALIZE, then the 2 can be compared exactly as binary strings.
I have taken some inspirations from the Item_func_json_keys and json_scan_start for parsing json documents, and I think it's possible to sort the keys using std::map in STL for objects.
That's all for my ideas so far. Please correct me if I made some mistakes, and I'm going to work on my ideas later.
Cheers!
Hollow Man
1
0
Re: [Maria-developers] MDEV-25075: Ignorable index makes the resulting CREATE TABLE invalid
by Sergey Petrunia 10 Mar '21
by Sergey Petrunia 10 Mar '21
10 Mar '21
Hi Varun,
> diff --git a/sql/lex.h b/sql/lex.h
> index 542356c0e433..5a9ec2ec1b35 100644
> --- a/sql/lex.h
> +++ b/sql/lex.h
> @@ -289,6 +289,7 @@ static SYMBOL symbols[] = {
> { "IDENTIFIED", SYM(IDENTIFIED_SYM)},
> { "IF", SYM(IF_SYM)},
> { "IGNORE", SYM(IGNORE_SYM)},
> + { "IGNORED", SYM(IGNORED_SYM)},
> { "IGNORE_DOMAIN_IDS", SYM(IGNORE_DOMAIN_IDS_SYM)},
> { "IGNORE_SERVER_IDS", SYM(IGNORE_SERVER_IDS_SYM)},
> { "IMMEDIATE", SYM(IMMEDIATE_SYM)},
Above this array one can see:
>> ...
>> NOTE!!
>> If you add or delete symbols from this file, you must also update results for
>> the perfschema.start_server_low_digest_sql_length test!
>> */
>>
>> static SYMBOL symbols[] = {
and indeed the buildbot shows failures, e.g.
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64/build…
CURRENT_TEST: perfschema.start_server_low_digest_sql_length
--- /usr/share/mysql-test/suite/perfschema/r/start_server_low_digest_sql_length.result 2021-03-10 06:23:20.000000000 +0000
+++ /dev/shm/var/4/log/start_server_low_digest_sql_length.reject 2021-03-10 07:37:16.351213429 +0000
@@ -8,5 +8,5 @@
####################################
SELECT event_name, digest, digest_text, sql_text FROM events_statements_history_long;
event_name digest digest_text sql_text
-statement/sql/select beb5bd93b7e8c45bc5cb6060804988e8 SELECT ? + ? + SELECT ...
-statement/sql/truncate faf6cefb662b443f05e97b5c5ab14a59 TRUNCATE TABLE truncat...
+statement/sql/select ade774bdfbc132a71810ede8ef469660 SELECT ? + ? + SELECT ...
+statement/sql/truncate 0f84807fb4a75d0f391f8a93e7c3c182 TRUNCATE TABLE truncat...
Please fix this.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
1
1
Hi, Rucha!
On Mar 08, Rucha Deodhar wrote:
> revision-id: bf677c554ad (mariadb-10.5.2-402-gbf677c554ad)
> parent(s): a1542f8a573
> author: Rucha Deodhar <rucha.deodhar(a)mariadb.com>
> committer: Rucha Deodhar <rucha.deodhar(a)mariadb.com>
> timestamp: 2021-02-22 00:37:18 +0530
> message:
>
> UTF8 and utf8
Two main comments:
* your global search-and-replace utf8 to utf8mb3 everywhere has replaced
way too much. Please, revert it.
* why did you have so many changes in tests with utf8mb4? You set the
old-mode to be utf8_is_utf8mb3 by default. It should've applied to
tests too, right? So why tests have switched to utf8mb4?
Other comments below.
> diff --git a/debian/additions/innotop/innotop b/debian/additions/innotop/innotop
> index 19229a57a82..4f1e76ea497 100644
> --- a/debian/additions/innotop/innotop
> +++ b/debian/additions/innotop/innotop
> @@ -264,9 +264,9 @@ sub get_dbh {
> MKDEBUG && _d("$dbh: $sql");
> $dbh->do($sql);
> MKDEBUG && _d('Enabling charset for STDOUT');
> - if ( $charset eq 'utf8' ) {
> - binmode(STDOUT, ':utf8')
> - or die "Can't binmode(STDOUT, ':utf8'): $OS_ERROR";
> + if ( $charset eq 'utf8mb3' ) {
> + binmode(STDOUT, ':utf8mb3')
> + or die "Can't binmode(STDOUT, ':utf8mb3'): $OS_ERROR";
It seems you performed search-and-replace over the whole source tree.
I don't think it was particularly meaningful.
Here, for example, it was wrong, it's perl code and perl charser utf8,
there's no utf8mb3 in perl.
also all internal variables you've renamed, like below:
> }
> else {
> binmode(STDOUT) or die "Can't binmode(STDOUT): $OS_ERROR";
> diff --git a/sql/field.h b/sql/field.h
> index 4a4f7cee2a5..f672ae7e315 100644
> --- a/sql/field.h
> +++ b/sql/field.h
> @@ -582,7 +582,7 @@ class Virtual_column_info: public Sql_alloc,
> public:
> /* Flag indicating that the field is physically stored in the database */
> bool stored_in_db;
> - bool utf8; /* Already in utf8 */
> + bool utf8mb3; /* Already in utf8 */
I mean, here ^^^
> bool automatic_name;
> Item *expr;
> Lex_ident name; /* Name of constraint */
> diff --git a/client/mysql.cc b/client/mysql.cc
> index f2938d4c824..b0ee239c0bc 100644
> --- a/client/mysql.cc
> +++ b/client/mysql.cc
> @@ -1918,6 +1918,10 @@ static int get_options(int argc, char **argv)
>
> if ((ho_error=handle_options(&argc, &argv, my_long_options, get_one_option)))
> return(ho_error);
> + if (!strcasecmp("utf8",default_charset))
> + {
> + default_charset=(char*)"utf8mb3";
> + }
why is that? (and the same everywhere else)
>
> *mysql_params->p_max_allowed_packet= opt_max_allowed_packet;
> *mysql_params->p_net_buffer_length= opt_net_buffer_length;
> diff --git a/client/mysqldump.c b/client/mysqldump.c
> index 7c363973da2..ad69618fbd1 100644
> --- a/client/mysqldump.c
> +++ b/client/mysqldump.c
> @@ -3160,7 +3160,7 @@ static uint get_table_structure(const char *table, const char *db, char *table_t
>
> fprintf(sql_file,
> "SET @saved_cs_client = @@character_set_client;\n"
> - "SET character_set_client = utf8;\n"
> + "SET character_set_client = utf8mb3;\n"
why is that?
> "/*!50001 CREATE TABLE %s (\n",
> result_table);
>
> diff --git a/include/m_ctype.h b/include/m_ctype.h
> index 5fa8f28ff7a..f1a8166d9df 100644
> --- a/include/m_ctype.h
> +++ b/include/m_ctype.h
> @@ -223,6 +223,7 @@ extern MY_UNI_CTYPE my_uni_ctype[256];
> #define MY_CS_NON1TO1 0x40000 /* Has a complex mapping from characters
> to weights, e.g. contractions, expansions,
> ignorable characters */
> +#define MY_CS_UTF8_IS_UTF8MB3 0x80000
this isn't a MY_CS_xxx flag, it doesn't belong here,
see all other MY_CS_xxx flags - they are *properties* of a charset.
They are set as
struct charset_info_st my_charset_tis620_thai_ci=
{
18,0,0, /* number */
MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_NON1TO1, /* state */
charset_name_tis620, /* cs name */
"tis620_thai_ci", /* name */
that is, when defining a charset.
See below
> #define MY_CHARSET_UNDEFINED 0
>
> /* Character repertoire flags */
> diff --git a/man/mysql.1 b/man/mysql.1
> index 03f23df3660..5ad953ba5eb 100644
> --- a/man/mysql.1
> +++ b/man/mysql.1
> @@ -401,7 +401,7 @@ Use
> as the default character set for the client and connection\&.
> .sp
> A common issue that can occur when the operating system uses
> -utf8
> +utf8mb3
this didn't need changing
> or another multi\-byte character set is that output from the
> \fBmysql\fR
> client is formatted incorrectly, due to the fact that the MariaDB client uses the
> diff --git a/mysys/charset.c b/mysys/charset.c
> index 32cfeb56e2d..9d035406e8d 100644
> --- a/mysys/charset.c
> +++ b/mysys/charset.c
> @@ -361,7 +361,7 @@ static int add_collation(struct charset_info_st *cs)
> newcs->state|= MY_CS_AVAILABLE | MY_CS_LOADED | MY_CS_NONASCII;
> #endif
> }
> - else if (!strcmp(cs->csname, "utf8") || !strcmp(cs->csname, "utf8mb3"))
> + else if (!strcmp(cs->csname, "utf8mb3") || !strcmp(cs->csname, "utf8mb3"))
one more result of your mechanical search-and-replace,
now you compare with "utf8mb3" twice.
> {
> #if defined (HAVE_CHARSET_utf8mb3) && defined(HAVE_UCA_COLLATIONS)
> copy_uca_collation(newcs, newcs->state & MY_CS_NOPAD ?
> @@ -767,7 +767,7 @@ static const char*
> get_charset_name_alias(const char *name)
> {
> if (!my_strcasecmp(&my_charset_latin1, name, "utf8mb3"))
> - return "utf8";
> + return "utf8mb3";
and again
> return NULL;
> }
>
> @@ -1004,13 +1004,19 @@ CHARSET_INFO *
> get_charset_by_csname(const char *cs_name, uint cs_flags, myf flags)
> {
> MY_CHARSET_LOADER loader;
> +
> + if (!strcasecmp(cs_name, "utf8"))
> + {
> + cs_name = (const char*)(cs_flags & MY_CS_UTF8_IS_UTF8MB3 ? "utf8mb3" : "utf8mb4");
your flag should be in `myf flags` not in `uint cs_flags`
(and defined in my_sys.h with other myf flags)
> + }
> +
> my_charset_loader_init_mysys(&loader);
> return my_charset_get_by_name(&loader, cs_name, cs_flags, flags);
> }
>
>
> /**
> - Resolve character set by the character set name (utf8, latin1, ...).
> + Resolve character set by the character set name (utf8mb3, latin1, ...).
>
> The function tries to resolve character set by the specified name. If
> there is character set with the given name, it is assigned to the "cs"
> @@ -1453,8 +1459,8 @@ static const MY_CSET_OS_NAME charsets[] =
>
> {"US-ASCII", "latin1", my_cs_approx},
>
> - {"utf8", "utf8", my_cs_exact},
> - {"utf-8", "utf8", my_cs_exact},
> + {"utf8mb3", "utf8mb3", my_cs_exact},
> + {"utf-8mb3", "utf8mb3", my_cs_exact},
utf-8mb3 really? :)
> #endif
> {NULL, NULL, 0}
> };
> diff --git a/plugin/win_auth_client/common.cc b/plugin/win_auth_client/common.cc
> index 8b7319252ac..f46b0ec9696 100644
> --- a/plugin/win_auth_client/common.cc
> +++ b/plugin/win_auth_client/common.cc
> @@ -338,7 +338,7 @@ UPN::UPN(): m_buf(NULL)
> m_buf= wchar_to_utf8(buf1, &m_len);
>
> if(!m_buf)
> - ERROR_LOG(ERROR, ("Failed to convert UPN to utf8"));
> + ERROR_LOG(ERROR, ("Failed to convert UPN to utf8mb3"));
probably wrong too (everything in this file), see the code - it uses
windows utf8 charset, not mariadb's utf8
>
> // Note: possible error would be indicated by the fact that m_buf is NULL.
> return;
> diff --git a/sql/mysqld.cc b/sql/mysqld.cc
> index 0bf21e02002..fd131497f8c 100644
> --- a/sql/mysqld.cc
> +++ b/sql/mysqld.cc
> @@ -4053,6 +4056,24 @@ static int init_common_variables()
> break;
> }
>
> + if (default_collation_name)
> + {
> + char *copy_of_name= (char*)default_collation_name;
> + char start[6];
> + strncpy(start, default_collation_name, 5);
> + char *fname= (char *)"utf8mb3_";
> + if (! strncasecmp("utf8_", start,5))
> + {
> + copy_of_name+= 5;
> + char result[64];
> + result[63]='\0';
> + strcpy(result, fname);
> + strcat(result, copy_of_name);
> + result[strlen(copy_of_name)+strlen(fname)]='\0';
> + default_collation_name= (char *) result;
> + }
> + }
1. wrong indentation
2. shouldn't it be in the get_charset_by_name ?
3. shouldn't it depend on the UTF8_IS_UTF8MB3 ?
> +
> if (default_collation_name)
> {
> CHARSET_INFO *default_collation;
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 50b746fe514..f0616b40361 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -5248,6 +5249,65 @@ class THD: public THD_count, /* this must be first */
> Item *sp_prepare_func_item(Item **it_addr, uint cols= 1);
> bool sp_eval_expr(Field *result_field, Item **expr_item_ptr);
>
> + CHARSET_INFO *get_charset_by_csname(const char *cs_name,
> + uint cs_flags,
> + myf myf_flags) const
> + {
> + return ::get_charset_by_csname(cs_name,
> + this->variables.old_behavior & OLD_MODE_UTF8_IS_UTF8MB3 ?
> + cs_flags | MY_CS_UTF8_IS_UTF8MB3 : cs_flags,
> + myf_flags);
> + }
> +
> + inline const char* get_collation_or_charset_name(const char* name)
shouldn't the name be get_collation_name_alias() ?
all other method names you've created match corresponding
charset-related function names
> + {
> + char *copy_of_name= (char*)name;
> + char start[6];
> + strncpy(start, name, 5);
> + char *fname= (char *)(this->variables.old_behavior & OLD_MODE_UTF8_IS_UTF8MB3?"utf8mb3_":"utf8mb4_");
> + if (!strncasecmp("utf8_", start,5))
> + {
> + copy_of_name+= 5;
> + char result[64];
> + result[63]='\0';
> + strcpy(result, fname);
> + strcat(result, copy_of_name);
> + result[strlen(copy_of_name)+strlen(fname)]='\0';
> + name= (const char *) result;
> + }
> + return name;
here you return a pointer to the local variable
also: indentation, too long line, end-of-line spaces
> + }
> +
> + inline CHARSET_INFO *
> + mysqld_collation_get_by_name(const char *name,
> + CHARSET_INFO *name_cs= system_charset_info)
> + {
> + name=this->get_collation_or_charset_name(name);
> + return ::mysqld_collation_get_by_name(name, name_cs);
> +}
> +
> +CHARSET_INFO *get_charset_by_name(const char *cs_name, myf flags)
> +{
> +
> + cs_name=this->get_collation_or_charset_name(cs_name);
> + return ::get_charset_by_name(cs_name,flags);
here it'd be better to pass MY_UTF8_IS_UTF8MB3 in flags
> +}
> +my_bool resolve_charset(const char *cs_name,
> + CHARSET_INFO *default_cs,
> + CHARSET_INFO **cs)
> +{
> + cs_name=this->get_collation_or_charset_name(cs_name);
> + return ::resolve_charset(cs_name,default_cs,cs);
perhaps it'd make sense to introduce flags argument here too?
> +}
> +
> +my_bool resolve_collation(const char *cl_name,
> + CHARSET_INFO *default_cl,
> + CHARSET_INFO **cl)
> +{
> + cl_name= this->get_collation_or_charset_name(cl_name);
> + return ::resolve_collation(cl_name,default_cl,cl);
> +}
please make sure the indentation is correct, I won't comment on it anymore
> +
> };
>
>
> diff --git a/sql/sql_db.cc b/sql/sql_db.cc
> index 9bf16220535..85e0e0dbd2f 100644
> --- a/sql/sql_db.cc
> +++ b/sql/sql_db.cc
> @@ -583,9 +583,13 @@ bool load_db_opt(THD *thd, const char *path, Schema_specification_st *create)
> default-collation commands.
> */
> if (!(create->default_table_charset=
> - get_charset_by_csname(pos+1, MY_CS_PRIMARY, MYF(0))) &&
> + thd->get_charset_by_csname(pos+1, thd->variables.old_behavior &
> + OLD_MODE_UTF8_IS_UTF8MB3 ?
> + MY_CS_UTF8_IS_UTF8MB3 | MY_CS_PRIMARY :
> + MY_CS_PRIMARY,
> + MYF(0))) &&
that's a bit redundant, don't you think?
you use both thd-> method that replaces utf_ with a real charset name
and *also* you set MY_CS_UTF8_IS_UTF8MB3 flag.
there are other places like that, to keep the review shorter I won't comment
on them below.
> !(create->default_table_charset=
> - get_charset_by_name(pos+1, MYF(0))))
> + thd->get_charset_by_name(pos+1, MYF(0))))
> {
> sql_print_error("Error while loading database options: '%s':",path);
> sql_print_error(ER_THD(thd, ER_UNKNOWN_CHARACTER_SET),pos+1);
> diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
> index a75066271d9..fd480bdced2 100644
> --- a/sql/sys_vars.cc
> +++ b/sql/sys_vars.cc
> @@ -3705,7 +3709,7 @@ static Sys_var_set Sys_old_behavior(
> "old_mode",
> "Used to emulate old behavior from earlier MariaDB or MySQL versions",
> SESSION_VAR(old_behavior), CMD_LINE(REQUIRED_ARG),
> - old_mode_names, DEFAULT(0));
> + old_mode_names, DEFAULT(8));
DEFAULT(OLD_MODE_UTF8_IS_UTF8MB3) please
>
> #if defined(HAVE_OPENSSL) && !defined(EMBEDDED_LIBRARY)
> #define SSL_OPT(X) CMD_LINE(REQUIRED_ARG,X)
> diff --git a/storage/connect/ha_connect.cc b/storage/connect/ha_connect.cc
> index 849cf7594eb..b6e8f3b3226 100644
> --- a/storage/connect/ha_connect.cc
> +++ b/storage/connect/ha_connect.cc
> @@ -6099,8 +6099,8 @@ static int connect_assisted_discovery(handlerton *, THD* thd,
> case FLD_NAME:
> if (ttp == TAB_PRX ||
> (ttp == TAB_CSV && topt->data_charset &&
> - (!stricmp(topt->data_charset, "UTF8") ||
> - !stricmp(topt->data_charset, "UTF-8"))))
> + (!stricmp(topt->data_charset, "UTF8MB3") ||
> + !stricmp(topt->data_charset, "UTF-8MB3"))))
UTF-8MB3 again
> cnm= crp->Kdata->GetCharValue(i);
> else
> cnm= encode(g, crp->Kdata->GetCharValue(i));
> diff --git a/storage/mroonga/vendor/groonga/CMakeLists.txt b/storage/mroonga/vendor/groonga/CMakeLists.txt
> index d271d4c4eb9..fa5d6f97ddd 100644
> --- a/storage/mroonga/vendor/groonga/CMakeLists.txt
> +++ b/storage/mroonga/vendor/groonga/CMakeLists.txt
> @@ -95,7 +95,7 @@ set(GRN_LOG_PATH
> "${CMAKE_INSTALL_PREFIX}/var/log/${GRN_PROJECT_NAME}/${GRN_PROJECT_NAME}.log"
> CACHE FILEPATH "log file path")
> set(GRN_DEFAULT_ENCODING
> - "utf8"
> + "utf8mb3"
utf8 is better here
> CACHE STRING "Groonga's default encoding")
> set(GRN_DEFAULT_MATCH_ESCALATION_THRESHOLD
> 0
> diff --git a/storage/perfschema/unittest/pfs_connect_attr-t.cc b/storage/perfschema/unittest/pfs_connect_attr-t.cc
> index b57ead3ec26..7e5b511d543 100644
> --- a/storage/perfschema/unittest/pfs_connect_attr-t.cc
> +++ b/storage/perfschema/unittest/pfs_connect_attr-t.cc
> @@ -343,7 +343,12 @@ int main(int, char **)
> {
> MY_INIT("pfs_connect_attr-t");
>
> - cs_cp1251= get_charset_by_csname("cp1251", MY_CS_PRIMARY, MYF(0));
> + cs_cp1251= thd->get_charset_by_csname("cp1251",
> + thd->variables.old_behavior &
> + OLD_MODE_UTF8_IS_UTF8M3 ?
> + MY_CS_PRIMARY | MY_CS_UTF8_IS_UTF8MB3 :
well, here there is surely no reason to use MY_CS_UTF8_IS_UTF8MB3,
because the charset is a hard-coded literal and it's not "utf8", is it?
> + MY_CS_PRIMARY,
> + MYF(0));
> if (!cs_cp1251)
> diag("skipping the cp1251 tests : missing character set");
> plan(59 + (cs_cp1251 ? 10 : 0));
> diff --git a/strings/ctype.c b/strings/ctype.c
> index 46951c3ae1f..9054cbfc3e6 100644
> --- a/strings/ctype.c
> +++ b/strings/ctype.c
> @@ -37,7 +37,7 @@
> */
>
> const char charset_name_latin2[]= "latin2";
> -const char charset_name_utf8[]= "utf8";
> +const char charset_name_utf8[]= "utf8mb3";
better rename the variable to charset_name_utf8mb3
> const char charset_name_utf16[]= "utf16";
> const char charset_name_utf32[]= "utf32";
> const char charset_name_ucs2[]= "ucs2";
> diff --git a/tests/mysql_client_test.c b/tests/mysql_client_test.c
> index 0043786d477..37da9e1f62e 100644
> --- a/tests/mysql_client_test.c
> +++ b/tests/mysql_client_test.c
> @@ -405,7 +405,7 @@ static void test_prepare_simple()
>
> /* update */
> strmov(query, "UPDATE test_prepare_simple SET id=? "
> - "WHERE id=? AND CONVERT(name USING utf8)= ?");
> + "WHERE id=? AND CONVERT(name USING utf8mb3)= ?");
I suspect all tests should better use "utf8"
> stmt= mysql_simple_prepare(mysql, query);
> check_stmt(stmt);
>
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0
Hi Varun,
I'm still looking at MDEV-8306 code. Sending the input I have so far. There's
nothing important but I thought I'd share it now so you can address it while
I'm continuing with the review.
> propagate_equal_field_for_orderby
Please use "order_by" as in all other functions, and also change "field" to
"fields". The name becomes propagate_equal_fields_for_order_by.
> void JOIN::propagate_equal_field_for_orderby()
> {
> if (!sort_nest_possible)
> return;
> ORDER *ord;
> for (ord= order; ord; ord= ord->next)
> {
> if (optimizer_flag(thd, OPTIMIZER_SWITCH_ORDERBY_EQ_PROP) && cond_equal)
> {
Is there any reason not to check optimizer_switch flag value once at the start
of the function?
> void JOIN_TAB::find_keys_that_can_achieve_ordering()
> {
> if (!join->sort_nest_possible)
> return;
>
> table->keys_with_ordering.clear_all();
> for (uint index= 0; index < table->s->keys; index++)
> {
> if (table->keys_in_use_for_query.is_set(index) &&
> test_if_order_by_key(join, join->order, table, index))
> table->keys_with_ordering.set_bit(index);
> }
> table->keys_with_ordering.intersect(table->keys_in_use_for_order_by);
> }
Is there any reason to call test_if_order_by_key() and then interset with
keys_in_use_for_order_by? Why not just check that bitmap first, like it's done
for keys_with_ordering?
...
in struct JOIN_TAB:
> void find_keys_that_can_achieve_ordering();
> bool check_if_index_satisfies_ordering(int index_used);
are "achieving ordering" and "satisfying ordering" the same thing? If yes, it's
better to use one term for this.
> Item* Item_subselect::transform(THD *thd, Item_transformer transformer,
> bool transform_subquery, uchar *arg)
We've already discussed this, but I'll mention it here to keep track:
this should ON expressions.
I would also consider adding an assert that the subquery's joins do not have
the query plans, yet.
> class Field {
> ...
> void statistics_available_via_keys();
> void statistics_available_via_stat_tables();
These names need to be better.
Function names typically start with a verb. This is especially true for
functions that return nothing.
> bool Item_func_between::predicate_selectivity_checker(void *arg)
> bool Item_func_in::predicate_selectivity_checker(void *arg)
These functions need a comment.
> bool Item_equal::predicate_selectivity_checker(void *arg)
> {
> ...
> while (it++)
> {
> Field *field= it.get_curr_field();
> field->is_range_statistics_available();
What is the above? Does the function have some side-effect? This needs to be
fixed or at least commented.
...
later in the same function:
> it.rewind();
> Item *item= (it++);
> SAME_FIELD *same_field_arg= (SAME_FIELD *) arg;
>
> if (same_field_arg->item == NULL)
> {
> item->is_resolved_by_same_column(arg);
> return false;
> }
This looks very confusing, too. Why are we checking the first element in the
Item_equal. Are we assuming it is the constant? But the execution reaches
this point even when Item_equal doesn't contain a constant?
> bool Item_equal::is_statistics_available()
Maybe, rename this to is_range_statistics_available() to make it clear with
what kind of statistics is it?
> void add_nest_tables_to_trace(Mat_join_tab_nest_info* nest_info)
rename this to trace_XXX() to be uniform with other tracing functions?
e.g. rename to trace_tables_in_sort_nest
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
1
0
Hi Alexey,
Please check out this patch:
http://lists.askmonty.org/pipermail/commits/2021-March/014492.html
Any objections to it?
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
1
0
01 Mar '21
Hi Alexey,
I was looking at Json_table_nested_path::set_position(), wondering why does
it have an assignment
np->m_null= TRUE;
but doesn't clear the NULL values and trying to come up with an example of this
going wrong when I've hit this crash:
select * from
json_table(
'[
{"name": "X",
"colors":["blue"], "sizes": [1,2,3,4], "prices" : [10,20]},
{"name": "Y",
"colors":["red"], "sizes": [10,11], "prices" : [100,200,300]}
]',
'$[*]' columns
(
seq0 for ordinality,
name varchar(4) path '$.name',
nested path '$.colors[*]' columns (
seq1 for ordinality,
color text path '$'
),
nested path '$.sizes[*]' columns (
seq2 for ordinality,
size int path '$'
),
nested path '$.prices[*]' columns (
seq3 for ordinality,
price int path '$'
)
)
) as T order by seq0, name;
Note this==NULL:
(gdb) wher
#0 0x00005555560edf72 in Json_table_nested_path::set_position (this=0x0, j_start=0x7ffeb0016e68 "[ \n {\"name\": \"X\", \n \"colors\":[\"blue\"], \"sizes\": [1,2,3,4], \"prices\" : [10,20]},\n {\"name\": \"Y\", \n \"colors\":[\"red\"], \"sizes\": [10,11], \"prices\" : [100,200,300]}\n]", j_end=0x7ffeb0016f12 "", pos=0x7ffeb0035e51 "\245\245\245\245\245\245\245\245\006") at /home/psergey/dev-git2/10.6-hf-review6/sql/table_function.cc:239
#1 0x00005555560ee12f in Json_table_nested_path::set_position (this=0x7ffeb0017060, j_start=0x7ffeb0016e68 "[ \n {\"name\": \"X\", \n \"colors\":[\"blue\"], \"sizes\": [1,2,3,4], \"prices\" : [10,20]},\n {\"name\": \"Y\", \n \"colors\":[\"red\"], \"sizes\": [10,11], \"prices\" : [100,200,300]}\n]", j_end=0x7ffeb0016f12 "", pos=0x7ffeb0035e48 "") at /home/psergey/dev-git2/10.6-hf-review6/sql/table_function.cc:262
#2 0x00005555560ee9f0 in ha_json_table::rnd_pos (this=0x7ffeb0014f00, buf=0x7ffeb0025570 "\377", pos=0x7ffeb0035e48 "") at /home/psergey/dev-git2/10.6-hf-review6/sql/table_function.cc:434
#3 0x00005555561ca6a4 in handler::ha_rnd_pos (this=0x7ffeb0014f00, buf=0x7ffeb0025570 "\377", pos=0x7ffeb0035e48 "") at /home/psergey/dev-git2/10.6-hf-review6/sql/handler.cc:3101
#4 0x00005555563852e3 in rr_from_pointers (info=0x7ffeb001f9e0) at /home/psergey/dev-git2/10.6-hf-review6/sql/records.cc:615
#5 0x0000555555da4a75 in READ_RECORD::read_record (this=0x7ffeb001f9e0) at /home/psergey/dev-git2/10.6-hf-review6/sql/records.h:81
#6 0x0000555555ee1876 in join_init_read_record (tab=0x7ffeb001f918) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:21644
#7 0x0000555555edf35a in sub_select (join=0x7ffeb001d948, join_tab=0x7ffeb001f918, end_of_records=false) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:20666
#8 0x0000555555ede8e6 in do_select (join=0x7ffeb001d948, procedure=0x0) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:20216
#9 0x0000555555eb24e7 in JOIN::exec_inner (this=0x7ffeb001d948) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:4484
#10 0x0000555555eb1613 in JOIN::exec (this=0x7ffeb001d948) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:4264
Please fix.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
2
2
Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 26 Feb '21
by Sergei Golubchik 26 Feb '21
26 Feb '21
Hi, Sujatha!
The main comment - the logic is so complex, I wasn't able to understand
it, unfortunately.
I've reviewed almost everything, see comments below.
But not the Recovery_context methods. Please explain how it works and
how all these truncate_validated, truncate_reset_done,
truncate_set_in_1st, etc all work together.
On Feb 26, Sujatha wrote:
> revision-id: ee538938345 (mariadb-10.3.26-68-gee538938345)
> parent(s): 7d04ce6a2d4
> author: Sujatha <sujatha.sivakumar(a)mariadb.com>
> committer: Andrei Elkin <andrei.elkin(a)mariadb.com>
> timestamp: 2021-02-08 17:58:03 +0200
> message:
>
> MDEV-21117: refine the server binlog-based recovery for semisync
>
this should be MDEV subject. But if you like this better
you can rename the MDEV instead
>
> Problem:
> =======
> When the semisync master is crashed and restarted as slave it could
> recover transactions that former slaves may never have seen.
> A known method existed to clear out all prepared transactions
> with --tc-heuristic-recover=rollback does not care to adjust
> binlog accordingly.
>
> Fix:
> ===
> The binlog-based recovery is made to concern of the slave semisync role of
> post-crash restarted server.
> No changes in behaviour is done to the "normal" binloggging server
> and the semisync master.
>
> When the restarted server is configured with
> --rpl-semi-sync-slave-enabled=1
> the refined recovery attempts to roll back prepared transactions
> and truncate binlog accordingly.
> In case of a partically committed (that is committed at least
>
partially
>
> in one of the engine participants) such transaction gets committed.
> It's guaranteed no (partially as well) committed transactions
> exist beyond the truncate position.
> In case there exists a non-transactional replication event
> (being in a way a committed transaction) past the
> computed truncate position the recovery ends with an error.
>
> To facilite the failover on the slave side
facilitate
> conditions to accept own events (having been discarded by the above recovery)
> are relaxed to let so for the semisync slave that connects to master
> in gtid mode. gtid_strict_mode is further recommended to secure
> from inadvertent re-applying out of order gtids in general.
> Non-gtid mode connected semisync slave would require
> --replicate-same-server-id (mind --log-slave-updates must be OFF then).
Sorry, I failed to understand this paragraph :(
>
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_active_log.test b/mysql-test/suite/binlog/t/binlog_truncate_active_log.test
> new file mode 100644
> index 00000000000..cf89525dcac
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_active_log.test
> @@ -0,0 +1,76 @@
> +# ==== Purpose ====
> +#
> +# Test verifies the truncation of single binary log file.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Create table t1 and insert/commit a row.
> +# 1 - Insert an another row such that it gets written to binlog but commit
> +# in engine fails as server crashed at this point.
> +# 2 - Restart server with --rpl-semi-sync-slave-enabled=1
> +# 3 - Upon server start 'master-bin.000001' will be truncated to contain
> +# only the first insert
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +--source include/have_innodb.inc
> +--source include/have_aria.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_binlog_format_statement.inc
have_binlog_format_statement obviously implies have_log_bin.
it's redundant to specify it explicitly.
and you don't need have_debug here
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +
> +# The following cases are tested:
> +# A. 2pc transaction is followed by a blank "zero-engines" one
> +# B. 2pc transaction follows the blank one
> +# C. Similarly to A, with the XA blank transaction
> +
> +--connection default
it's connection default by default (not surprisingly) :)
> +RESET MASTER;
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +CREATE TABLE t2 ( f INT ) ENGINE=INNODB;
> +CREATE TABLE tm ( f INT ) ENGINE=Aria;
Why Aria?
I mean, the default "standard" non-transactional engine is MyISAM.
If you use Aria, it means you thought about it and decided to deviate
intentionally from the default. What were your considerations?
> +
> +--echo # Case A.
> +# Both are doomed into truncation.
> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-6
why not simply --let SEARCH_PATTERN = Successfully truncated...etc...
> +--let $query1 = INSERT INTO t VALUES (20)
> +--let $query2 = DELETE FROM t2 WHERE f = 666 /* no such record */
> +--source binlog_truncate_active_log.inc
> +
> +--echo # Case B.
> +# The inverted sequence ends up to truncate only $query2
> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-10
> +--let $query1 = DELETE FROM t2 WHERE f = 0
why not `= 666 /* no such record */` ?
is `= 0` important here?
> +--let $query2 = INSERT INTO t VALUES (20)
> +--source binlog_truncate_active_log.inc
> +
> +
> +delimiter |;
> +CREATE PROCEDURE sp_blank_xa()
> +BEGIN
> + XA START 'blank';
> + DELETE FROM t2 WHERE f = 666 /* no such record */;
> + XA END 'blank';
> + XA PREPARE 'blank';
> +END|
> +delimiter ;|
> +
> +
> +--echo # Case C.
> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-13
> +--let $query1 = INSERT INTO t VALUES (20)
> +--let $pre_q2 = CALL sp_blank_xa
> +--let $query2 = XA COMMIT 'blank'
> +--source binlog_truncate_active_log.inc
> +DROP PROCEDURE sp_blank_xa;
> +
> +--echo # Cleanup
> +DROP TABLE t,t2,tm;
> +
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc b/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc
> new file mode 100644
> index 00000000000..b1ffaf18268
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc
> @@ -0,0 +1,70 @@
you need
source include/have_debug_sync.inc
here, as this file uses DEBUG_SYNC
> +connect(master1,localhost,root,,);
> +connect(master2,localhost,root,,);
> +connect(master3,localhost,root,,);
just FYI (no need to change anything),
you know that parentheses and extra commas are optional, right?
that is, it could be written as
connect master1,localhost,root;
connect master2,localhost,root;
connect master3,localhost,root;
> +
> +--connection default
> +
> +# First to commit few transactions
> +INSERT INTO t VALUES (10);
> +INSERT INTO tm VALUES (10);
> +
> +--connection master1
> +# Hold insert after write to binlog and before "run_commit_ordered" in engine
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master1_ready WAIT_FOR signal_never_arrives";
> +--send_eval $query1
> +
> +--connection master2
> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> +if ($pre_q2)
is $pre_q2 a flag or a query string?
it's used as a flag, it's set as a query string.
> +{
> + CALL sp_blank_xa;
> +}
> +SET DEBUG_SYNC= "commit_before_get_LOCK_after_binlog_sync SIGNAL master2_ready";
> +# To binlog non-xid transactional group which will be truncated all right
> +--send_eval $query2
> +
> +
> +--connection master3
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +SELECT @@global.gtid_binlog_pos as 'Before the crash';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
> +
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
you know that kill_mysqld both writes 'wait' to the expect file
and includes wait_until_disconnected ?
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --rpl-semi-sync-slave-enabled=1
> +EOF
why "append_file" and not "write_file" ?
and why wait, you could've written restart right away, couldn't you?
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for a successful truncate message.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
isn't it always $MYSQLTEST_VARDIR/log/mysqld.1.err?
other tests don't do it conditionally
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
if you're searching in the error log - and you are -
it's best not to set the SEARCH_RANGE at all. Then it'll search
from the last CURRENT_TEST: label. That is, it'll search in all
error log entries generated by the current test and not in the
error log that came from earlier tests.
> +--let SEARCH_PATTERN=$this_search_pattern
> +--replace_regex /FOUND [0-9]+/FOUND #/
can it be found multiple times? Why would binlog be truncated more than once?
> +--source include/search_pattern_in_file.inc
> +
> +SELECT @@global.gtid_binlog_pos as 'After the crash';
> +--echo "One row should be present in table 't'"
> +SELECT COUNT(*) as 'One' FROM t;
Do you know which one? If yes, why not `SELECT * FROM t` ?
> +
> +# Local cleanup
> +DELETE FROM t;
> +--disconnect master1
> +--disconnect master2
> +--disconnect master3
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> new file mode 100644
> index 00000000000..fe153e5703c
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> @@ -0,0 +1,78 @@
> +# ==== Purpose ====
> +#
> +# Test verifies truncation of multiple binary logs.
and "with multiple transactional storage engines" ?
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Create two tables in innodb and rocksdb engines,
> +#
> +# In loop for A,B,C cases (below) do 1-5:
there's no loop now (which is good, don't add it please :)
> +# 1 - execute FLUSH LOGS command to generate a new binary log.
> +# Start a transaction inserting rows of sufficient sizes
> +# so binary log gets rotated at commit
> +# 2 - Using debug simulation make the server crash at a point where
> +# the transaction is written to binary log *and* either of
> +# A. neither of them commits
> +# B. only one commits
> +# C. both commit
> +# 3 - print the # of binlog files before the transaction starts and after its
> +# commit is submitted
> +# 4 - Restart server with --tc-heuristic-recover=BINLOG_TRUNCATE
outdated comment?
> +# 5 - Restart normally to print post recovery status.
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +
> +--source include/have_innodb.inc
> +--source include/have_rocksdb.inc
> +--source include/have_log_bin.inc
this is redundant
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +--let $old_max_binlog_size= `select @@global.max_binlog_size`
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +--let $MYSQLD_DATADIR= `SELECT @@datadir`
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE t2 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=rocksdb;
> +
> +--let $case = A: neither engine committed => rollback & binlog truncate
> +# Hold off engine commits after write to binlog and its rotation.
> +# The transaction is killed along with the server after that.
> +--let $kill_server=1
> +--let $debug_sync_action = "commit_after_release_LOCK_log SIGNAL con1_ready WAIT_FOR signal_no_signal"
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1
> +--let $test_outcome= 1 row should be present in both tables; binlog is truncated; number of binlogs at reconnect - 3
> + --source binlog_truncate_multi_engine.inc
> +--echo Proof of the truncated binlog file is readable (two transactions must be seen):
> +--let $MYSQLD_DATADIR = `select @@datadir`
no need to set the $MYSQLD_DATADIR twice
> +--exec $MYSQL_BINLOG --short-form --skip-annotate-row-events $MYSQLD_DATADIR/master-bin.000002
> +
> +--let $case = B: one engine has committed its transaction branch
> +# Hold off after one engine has committed.
> +--let $kill_server=1
> +--let $debug_sync_action = "commit_after_run_commit_ordered SIGNAL con1_ready WAIT_FOR signal_no_signal"
> +--let $restart_simulate_partial_commit = 1
this variable seems to be unused
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1 --debug-dbug=d,binlog_truncate_partial_commit
this seems to be a rather crude way of faking a partially committed
transaction. better to crash after the first engine has committed,
that'd be much more natural.
> +--let $test_outcome= 2 rows should be present in both tables; no binlog truncation; one extra binlog file compare with A; number of binlogs at reconnect - 4
> + --source binlog_truncate_multi_engine.inc
> +
> +--let $case = C: both engines have committed its transaction branch
you didn't do --let $debug_sync_action = "reset", so the old value
is used. Intentional?
> +# Hold off after both engines have committed. The server is shut down.
> +--let $kill_server=0
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1
> +--let $test_outcome= 2 rows should be present in both tables; no binlog truncation; the same # of binlog files as in B; number of binlogs at reconnect - 4
> + --source binlog_truncate_multi_engine.inc
> +
> +
> +
> +DROP TABLE t1, t2;
> +--replace_result $old_max_binlog_size VALUE_AT_START
> +--eval SET @@global.max_binlog_size = $old_max_binlog_size
better use evalp instead of eval, then you won't need replace_result
but really I think you don't need to save/restore max_binlog_size at all,
because you restart the server anyway.
> +
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc
> new file mode 100644
> index 00000000000..c260a7987e2
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc
> @@ -0,0 +1,64 @@
> +#
> +# Loop body of binlog_truncate_multi_engine.test
> +# Parameters:
> +# $debug_sync_action describes debug-sync actions
> +# $kill_server 1 when to crash, 0 for regular restart
> +# $restart_parameters the caller may simulate partial commit at recovery
> +# $test_outcome summary of extected results
> +# $MYSQLD_DATADIR
> +
> +--echo #
> +--echo #
> +--echo # Case $case
> +--echo #
> +RESET MASTER;
> +FLUSH LOGS;
> +SET GLOBAL max_binlog_size= 4096;
> +
> +connect(con1,localhost,root,,);
> +#--connection con1
> +--echo List of binary logs before rotation
> +--source include/show_binary_logs.inc
> +INSERT INTO t1 VALUES (1, REPEAT("x", 1));
> +INSERT INTO t2 VALUES (1, REPEAT("x", 1));
> +BEGIN;
> + INSERT INTO t1 VALUES (2, REPEAT("x", 4100));
> + INSERT INTO t2 VALUES (2, REPEAT("x", 4100));
> +
> +--eval SET DEBUG_SYNC= $debug_sync_action
> +send COMMIT;
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +--echo List of binary logs after rotation
> +--source include/show_binary_logs.inc
> +
> +--echo # restart the server with $restart_parameters
> +if ($kill_server)
> +{
> + --echo # the server is crashed
> + --source include/kill_mysqld.inc
> + --source include/start_mysqld.inc
> +}
> +if (!$kill_server)
> +{
> + --echo # the server is restarted
> + --source include/restart_mysqld.inc
it'd be simpler not to use $kill_server at all. And instead write
let $shutdown_timeout=0;
instead of $kill_server=1;
and
let $shutdown_timeout=;
instead of $kill_server=0;
and above you won't need two if's, you can just do
source include/restart_mysqld.inc;
> +}
> +
> +--connection default
> +--echo #
> +--echo # *** Summary: $test_outcome:
> +--echo #
> +SELECT COUNT(*) FROM t1;
> +SELECT COUNT(*) FROM t2;
> +SELECT @@GLOBAL.gtid_binlog_state;
> +SELECT @@GLOBAL.gtid_binlog_pos;
> +--echo List of binary logs at the end of the tests
> +--source include/show_binary_logs.inc
> +--echo # ***
> +# cleanup
> +DELETE FROM t1;
> +DELETE FROM t2;
> +--disconnect con1
> +--echo #
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test
> new file mode 100644
> index 00000000000..231e90dbdc9
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test
> @@ -0,0 +1,105 @@
> +# ==== Purpose ====
> +#
> +# Test verifies truncation of multiple binary logs.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 1 - Set max_binlog_size= 4096, to help a series of inserts into a
> +# transaction table 'ti' get binlog rotated so many time while the
> +# transactions won't be committed, being stopped at
> +# a prior to commit debug_sync point
> +# 2 - kill and restart the server as semisync slave successfully to
> +# end with an expected first binlog and the gtid state.
> +#
> +# ==== References ====
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +--source include/have_innodb.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +
> +--let $old_max_binlog_size= `select @@global.max_binlog_size`
> +SET @@global.max_binlog_size= 4096;
> +
> +RESET MASTER;
> +FLUSH LOGS;
> +CREATE TABLE ti (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE tm (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=MyISAM;
> +
> +connect(master1,localhost,root,,);
> +--echo "List of binary logs before rotation"
> +--source include/show_binary_logs.inc
> +
> +# Some load to either non- and transactional egines
> +# that should not affect the following recovery:
> +INSERT INTO ti VALUES(1,"I am gonna survive");
> +INSERT INTO tm VALUES(1,"me too!");
> +
> +# hold on near engine commit
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR con1_go";
> +--send_eval INSERT INTO ti VALUES (2, REPEAT("x", 4100))
> +
> +connect(master2,localhost,root,,);
> +# The 2nd trx for recovery, it does not rotate binlog
> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master2_ready WAIT_FOR master2_go";
> +--send_eval INSERT INTO ti VALUES (3, "not gonna survive")
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +--echo "List of binary logs before crash"
> +--source include/show_binary_logs.inc
> +--echo # The gtid binlog state prior the crash will be truncated at the end of the test
> +SELECT @@global.gtid_binlog_state;
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
> +
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --rpl-semi-sync-slave-enabled=1
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for a successful truncate message.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
> +--let SEARCH_PATTERN=truncated binlog file:.*master.*000002
> +--replace_regex /FOUND [0-9]+/FOUND #/
> +--source include/search_pattern_in_file.inc
> +
> +
> +--echo "One record should be present in table"
> +SELECT count(*) FROM ti;
> +
> +--echo # The truncated gtid binlog state
> +SELECT @@global.gtid_binlog_state;
> +SELECT @@global.gtid_binlog_pos;
> +
> +--echo # Cleanup
> +--replace_result $old_max_binlog_size VALUE_AT_START
> +--eval SET @@global.max_binlog_size = $old_max_binlog_size
> +DROP TABLE ti;
> +
many of my comments above apply to this test too
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test
> new file mode 100644
> index 00000000000..6aba2935fde
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test
> @@ -0,0 +1,142 @@
> +# ==== Purpose ====
> +# The test verifies attempt to recover by the semisync slave server whose
> +# binlog is unsafe for truncation.
> +#
> +# ==== Implementation ====
> +# 2 binlog files are created with the 1st one destined to be the binlog
> +# checkpoint file for recovery.
> +# The final group of events is replication unsafe (myisam INSERT).
> +# Therefore the semisync slave recovery may not.
> +#
> +# Steps:
> +# 0 - Set max_binlog_size= 4096, to help an insert into a
> +# transaction table 'ti' get binlog rotated while the
> +# transaction won't be committed, being stopped at
> +# a prior to commit debug_sync point
> +# 1 - insert into a non-transactional 'tm' table completes with
> +# binary logging as well
> +# 2 - kill and attempt to restart the server as semisync slave that
> +# must produce an expected unsafe-to-recover error
> +# 3 - complete the test with a normal restart that successfully finds and
> +# commits the transaction in doubt.
> +#
> +# ==== References ====
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +
> +--source include/have_innodb.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +--let $old_max_binlog_size= `select @@global.max_binlog_size`
> +SET @@global.max_binlog_size= 4096;
> +
> +call mtr.add_suppression("Table '.*tm' is marked as crashed and should be repaired");
> +call mtr.add_suppression("Got an error from unknown thread");
> +call mtr.add_suppression("Checking table: '.*tm'");
> +call mtr.add_suppression("Recovering table: '.*tm'");
> +call mtr.add_suppression("Cannot trim the binary log to file");
> +call mtr.add_suppression("Crash recovery failed");
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +call mtr.add_suppression("Found 1 prepared transactions");
> +call mtr.add_suppression("mysqld: Table.*tm.*is marked as crashed");
> +call mtr.add_suppression("Checking table.*tm");
> +
> +RESET MASTER;
> +FLUSH LOGS;
> +CREATE TABLE ti (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE tm (f INT) ENGINE=MYISAM;
> +
> +--let $row_count = 5
> +--let $i = 3
> +--disable_query_log
> +while ($i)
> +{
> + --eval INSERT INTO ti VALUES ($i, REPEAT("x", 1))
> +--dec $i
> +}
> +--enable_query_log
> +INSERT INTO tm VALUES(1);
> +
> +connect(master1,localhost,root,,);
> +connect(master2,localhost,root,,);
> +connect(master3,localhost,root,,);
> +
> +--connection master1
> +
> +# The 1st trx binlogs, rotate binlog and hold on before committing at engine
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR master1_go";
you don't wait for master1_ready anywhere. intentional?
> +--send_eval INSERT INTO ti VALUES ($row_count - 1, REPEAT("x", 4100))
if you plan to insert rows 1,2,3,4,5 then your $i=3 and $row_count=5
are not independent, you should use $i=$row_count-2.
(probably $i=`select $row_count-2`)
or don't pretent this test is tunable and use literals instead of variables.
> +
> +--connection master2
> +
> +# The 2nd trx for recovery, it does not rotate binlog
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master2_ready WAIT_FOR master2_go";
> +--send_eval INSERT INTO ti VALUES ($row_count, REPEAT("x", 1))
> +
> +--connection master3
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +SET DEBUG_SYNC= "commit_before_get_LOCK_after_binlog_sync SIGNAL master3_ready";
> +--send INSERT INTO tm VALUES (2)
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR master3_ready";
> +--echo # The gtid binlog state prior the crash must be restored at the end of the testSELECT @@global.gtid_binlog_state;
eh? Was this SELECT supposed to be on a separate line?
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
> +
> +SELECT @@global.gtid_binlog_state;
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restarts
> +#
> +--echo # Failed restart as the semisync slave
> +--error 1
> +--exec $MYSQLD_LAST_CMD --rpl-semi-sync-slave-enabled=1 >> $MYSQLTEST_VARDIR/log/mysqld.1.err 2>&1
> +
> +--echo # Normal restart
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart:
Okay. *this* is a reasonable use of wait/restart in the expect file.
In other cases, I think, you can just restart, no need to wait,
as you don't do anything while the server is down.
still, here you don't need to write 'wait' to the expect file
and don't need to wait_until_disconnected (kill_mysqld is doing both)
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for correct messages.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
> +--let SEARCH_PATTERN=Cannot trim the binary log to file
> +--replace_regex /FOUND [0-9]+/FOUND #/
> +--source include/search_pattern_in_file.inc
> +
> +--echo # Proof that the in-doubt transactions are recovered by the 2nd normal server restart
> +--eval SELECT COUNT(*) = $row_count as 'True' FROM ti
> +# myisam table may require repair (which is not tested here)
> +--disable_warnings
> +SELECT COUNT(*) <= 1 FROM tm;
> +--enable_warnings
> +
> +--echo # The gtid binlog state prior the crash is restored now
> +SELECT @@GLOBAL.gtid_binlog_state;
> +SELECT @@GLOBAL.gtid_binlog_pos;
> +
> +--echo # Cleanup
> +--replace_result $old_max_binlog_size VALUE_AT_START
> +--eval SET @@global.max_binlog_size = $old_max_binlog_size
see comments elsewhere about $old_max_binlog_size, -50000, $log_error_, etc
> +DROP TABLE ti, tm;
> +--echo End of test
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf
> new file mode 100644
> index 00000000000..3518eb95b67
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf
> @@ -0,0 +1,13 @@
> +!include suite/rpl/rpl_1slave_base.cnf
> +!include include/default_client.cnf
> +
> +
> +[mysqld.1]
> +log-slave-updates
> +gtid-strict-mode=1
> +max_binlog_size= 4096
> +
> +[mysqld.2]
> +log-slave-updates
> +gtid-strict-mode=1
> +max_binlog_size= 4096
why not an .opt file?
why max_binlog_size here and not at run-time like in other tests?
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> new file mode 100644
> index 00000000000..972aaf2c8b4
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> @@ -0,0 +1,144 @@
> +# ==== Purpose ====
> +#
> +# Test verifies replication failover scenario.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Having two servers 1 and 2 enable semi-sync replication with
> +# with the master wait 'after_sync'.
> +# 1 - Insert a row. While inserting second row simulate
> +# a server crash at once the transaction is written to binlog, flushed
> +# and synced but the binlog position is not updated.
> +# 2 - Post crash-recovery on the old master execute there CHANGE MASTER
> +# TO command to connect to server id 2.
> +# 3 - The old master new slave server 1 must connect to the new
> +# master server 2.
> +# 4 - repeat the above to crash the new master and restore in role the old one
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +
> +--source include/have_innodb.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +--let $rpl_topology=1->2
> +--source include/rpl_init.inc
isn't this a normal master-slave.inc topology?
> +
> +--connection server_2
> +--source include/stop_slave.inc
> +
> +--connection server_1
> +RESET MASTER;
> +
> +--connection server_2
> +RESET MASTER;
> +set @@global.rpl_semi_sync_slave_enabled = 1;
> +set @@global.gtid_slave_pos = "";
> +CHANGE MASTER TO master_use_gtid= slave_pos;
> +--source include/start_slave.inc
> +
> +
> +--connection server_1
> +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
why?
> +set @@global.rpl_semi_sync_master_enabled = 1;
> +set @@global.rpl_semi_sync_master_wait_point=AFTER_SYNC;
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +call mtr.add_suppression("1 client is using or hasn.t closed the table properly");
> +call mtr.add_suppression("Table './mtr/test_suppressions' is marked as crashed and should be repaired");
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +INSERT INTO t1 VALUES (1, 'dummy1');
> +
> +#
> +# CRASH the original master, and FAILOVER to the new
> +#
> +
> +# value 1 for server id 1 -> 2 failover
> +--let $failover_to_slave=1
> +--let $query_to_crash= INSERT INTO t1 VALUES (2, REPEAT("x", 4100))
> +--let $log_search_pattern=truncated binlog file:.*master.*000001
> +--source rpl_semi_sync_crash.inc
> +
> +--connection server_2
> +--let $rows_so_far=3
> +--eval INSERT INTO t1 VALUES ($rows_so_far, 'dummy3')
> +--save_master_pos
> +--echo # The gtid state on current master must be equal to ...
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_1
> +--sync_with_master
> +--eval SELECT COUNT(*) = $rows_so_far as 'true' FROM t1
> +--echo # ... the gtid states on the slave:
> +SHOW VARIABLES LIKE 'gtid_slave_pos';
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_2
> +#
> +# CRASH the new master and FAILOVER back to the original
> +#
> +
> +# value 0 for the reverse server id 2 -> 1 failover
> +--let $failover_to_slave=0
> +--let $query_to_crash = INSERT INTO t1 VALUES (4, REPEAT("x", 4100))
> +--let $query2_to_crash= INSERT INTO t1 VALUES (5, REPEAT("x", 4100))
> +--let $log_search_pattern=truncated binlog file:.*slave.*000001
> +--source rpl_semi_sync_crash.inc
> +
> +--connection server_1
> +--let $rows_so_far=6
> +--eval INSERT INTO t1 VALUES ($rows_so_far, 'Done')
> +--save_master_pos
> +--echo # The gtid state on current master must be equal to ...
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_2
> +--sync_with_master
> +--eval SELECT COUNT(*) = $rows_so_far as 'true' FROM t1
> +--echo # ... the gtid states on the slave:
> +SHOW VARIABLES LIKE 'gtid_slave_pos';
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +
> +--let $diff_tables=server_1:t1, server_2:t1
> +--source include/diff_tables.inc
> +
> +#
> +--echo # Cleanup
> +#
> +--connection server_1
> +DROP TABLE t1;
> +--save_master_pos
> +
> +--connection server_2
> +--sync_with_master
> +--source include/stop_slave.inc
> +
> +--connection server_1
> +set @@global.rpl_semi_sync_master_enabled = 0;
> +set @@global.rpl_semi_sync_slave_enabled = 0;
> +set @@global.rpl_semi_sync_master_wait_point=default;
> +RESET SLAVE;
> +RESET MASTER;
> +
> +--connection server_2
> +set @@global.rpl_semi_sync_master_enabled = 0;
> +set @@global.rpl_semi_sync_slave_enabled = 0;
> +set @@global.rpl_semi_sync_master_wait_point=default;
> +
> +--replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1
> +eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_use_gtid=no;
evalp
> +--source include/start_slave.inc
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_crash.inc b/mysql-test/suite/rpl/t/rpl_semi_sync_crash.inc
> new file mode 100644
> index 00000000000..4289df9155f
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_crash.inc
> @@ -0,0 +1,87 @@
> +if ($failover_to_slave)
> +{
> + --let $server_to_crash=1
> + --let $server_to_promote=2
> + --let $new_master_port=$SERVER_MYPORT_2
> + --let $client_port=$SERVER_MYPORT_1
> +
> + --connect (conn_client,127.0.0.1,root,,test,$SERVER_MYPORT_1,)
> +}
> +if (!$failover_to_slave)
> +{
> + --let $server_to_crash=2
> + --let $server_to_promote=1
> + --let $new_master_port=$SERVER_MYPORT_1
> + --let $client_port=$SERVER_MYPORT_2
> +
> + --connect (conn_client,127.0.0.1,root,,test,$SERVER_MYPORT_2,)
> +}
> +
> +
> +# Hold insert after write to binlog and before "run_commit_ordered" in engine
> +
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL con1_ready WAIT_FOR con1_go";
> +--send_eval $query_to_crash
> +
> +# complicate recovery with an extra binlog file
> +if (!$failover_to_slave)
> +{
> + --connect (conn_client_2,127.0.0.1,root,,test,$SERVER_MYPORT_2,)
> + # use the same signal with $query_to_crash
> + SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> + SET DEBUG_SYNC= "commit_after_release_LOCK_lock SIGNAL con1_ready WAIT_FOR con2_go";
both signal con1_ready? which one will you be waiting below then?
> + --send_eval $query2_to_crash
> +}
> +
> +--connection server_$server_to_crash
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
> +
> +--connection server_$server_to_promote
> +--error 2003
> +--source include/stop_slave.inc
> +SELECT @@GLOBAL.gtid_current_pos;
> +
> +--let $_expect_file_name=$MYSQLTEST_VARDIR/tmp/mysqld.$server_to_crash.expect
wasn't $_expect_file_name already set above?
> +--let $restart_parameters=--rpl-semi-sync-slave-enabled=1
> +--let $allow_rpl_inited=1
> +--source include/start_mysqld.inc
> +
> +--connection server_$server_to_crash
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for correct messages.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.$server_to_crash.err;
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
> +--let SEARCH_PATTERN=$log_search_pattern
> +--source include/search_pattern_in_file.inc
> +
> +--disconnect conn_client
> +
> +#
> +# FAIL OVER now to new master
> +#
> +--connection server_$server_to_promote
> +set global rpl_semi_sync_master_enabled = 1;
> +set global rpl_semi_sync_master_wait_point=AFTER_SYNC;
> +
> +--connection server_$server_to_crash
> +--let $master_port=$SERVER_MYPORT_2
> +if (`select $server_to_crash = 2`)
> +{
> + --let $master_port=$SERVER_MYPORT_1
> +}
> +--replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1 $SERVER_MYPORT_2 SERVER_MYPORT_2
> +eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$new_master_port, master_user='root', master_use_gtid=SLAVE_POS;
> +set global rpl_semi_sync_slave_enabled = 1;
> +set @@global.gtid_slave_pos=@@global.gtid_binlog_pos;
> +--source include/start_slave.inc
see comments about -50000, $log_error_, wait_until_disconnected, evalp, etc
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 140394fefc1..87fa88d4c89 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -4632,7 +4632,7 @@ class THD :public Statement,
> LF_PINS *tdc_hash_pins;
> LF_PINS *xid_hash_pins;
> bool fix_xid_hash_pins();
> -
> + bool is_1pc_ro_trans;
I don't like how it looks, it's in THD (for every single connection and more)
and it's only used in some exotic cases. It's set in two places around
ha_commit_one_phase(), when it would be better to set it inside
ha_commit_one_phase(), in one place only.
But I cannot suggest how to get rid of it yet as I don't understand what it's
for, I've asked a question about it below, in Gtid_log_event::Gtid_log_event
> /* Members related to temporary tables. */
> public:
> /* Opened table states. */
> diff --git a/sql/handler.h b/sql/handler.h
> index 2a346e8d9d1..bd39f46bf1f 100644
> --- a/sql/handler.h
> +++ b/sql/handler.h
> @@ -873,6 +873,14 @@ typedef struct xid_t XID;
> /* The 'buf' has to have space for at least SQL_XIDSIZE bytes. */
> uint get_sql_xid(XID *xid, char *buf);
>
> +/* struct for heuristic binlog truncate recovery */
it's not "heuristic binlog truncate recovery" anymore,
please don't call it that, it's confusing
> +struct xid_recovery_member
> +{
> + my_xid xid;
> + uint in_engine_prepare; // number of engines that have xid prepared
> + bool decided_to_commit;
> +};
> +
> /* for recover() handlerton call */
> #define MIN_XID_LIST_SIZE 128
> #define MAX_XID_LIST_SIZE (1024*128)
> diff --git a/sql/slave.cc b/sql/slave.cc
> index 28e08e32346..9b3f73e5341 100644
> --- a/sql/slave.cc
> +++ b/sql/slave.cc
> @@ -6944,7 +6945,9 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len)
> }
> else
> if ((s_id == global_system_variables.server_id &&
> - !mi->rli.replicate_same_server_id) ||
> + (!mi->rli.replicate_same_server_id &&
> + !(semisync_recovery= (rpl_semi_sync_slave_enabled &&
> + mi->using_gtid != Master_info::USE_GTID_NO)))) ||
How can "semisync recovery" code path reach queue_event()?
> event_that_should_be_ignored(buf) ||
> /*
> the following conjunction deals with IGNORE_SERVER_IDS, if set
> diff --git a/sql/log_event.h b/sql/log_event.h
> index 8a342cb5cd3..1588ab85104 100644
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -482,6 +482,16 @@ class String;
> */
> #define LOG_EVENT_IGNORABLE_F 0x80
>
> +/**
> + @def LOG_EVENT_ACCEPT_OWN_F
> +
> + Flag sets by the gtid-mode connected semisync slave for
> + the same server_id ("own") events which the slave must not have
> + in its state. Typically such events were never committed by
> + their originator (this server) and discared at its crash recovery
sorry, I failed to understand that
> +*/
> +#define LOG_EVENT_ACCEPT_OWN_F 0x4000
> +
> /**
> @def LOG_EVENT_SKIP_REPLICATION_F
>
> diff --git a/sql/log_event.cc b/sql/log_event.cc
> index e344fc8894f..a593db10d16 100644
> --- a/sql/log_event.cc
> +++ b/sql/log_event.cc
> @@ -7918,6 +7924,7 @@ Gtid_log_event::Gtid_log_event(const char *buf, uint event_len,
> domain_id= uint4korr(buf);
> buf+= 4;
> flags2= *buf;
> + ++buf;
I'd written flags2= *buf++; :)
> if (flags2 & FL_GROUP_COMMIT_ID)
> {
> if (event_len < (uint)header_size + GTID_HEADER_LEN + 2)
> @@ -7925,9 +7932,31 @@ Gtid_log_event::Gtid_log_event(const char *buf, uint event_len,
> seq_no= 0; // So is_valid() returns false
> return;
> }
> - ++buf;
> commit_id= uint8korr(buf);
> + buf+= 8;
> + }
> + /* the extra flags check and actions */
> + if (static_cast<uint>(buf - buf_0) < event_len)
> + {
> + flags_extra= *buf;
> + ++buf;
> + /* extra flags presence is identifed by non-zero byte value at this point */
"extra engines"
> + if (flags_extra & FL_EXTRA_MULTI_ENGINE)
> + {
safety, check that buf+4-buf_0 <= event_len
> + extra_engines= uint4korr(buf);
> + buf += 4;
four bytes for the number of engines participating in a transaction is very
generous... and optimistic. I'd say one byte is more than enough.
> +
> + DBUG_ASSERT(extra_engines > 0);
> + }
> }
> + /*
> + the strict '<' part of the assert corresponds to extra zero-padded
> + trailing bytes,
> + */
> + DBUG_ASSERT(static_cast<uint>(buf - buf_0) <= event_len);
> + /* and the last of them is tested. */
> + DBUG_ASSERT(static_cast<uint>(buf - buf_0) == event_len ||
> + buf_0[event_len - 1] == 0);
> }
>
>
> @@ -7958,6 +7990,22 @@ Gtid_log_event::Gtid_log_event(THD *thd_arg, uint64 seq_no_arg,
> /* Preserve any DDL or WAITED flag in the slave's binlog. */
> if (thd_arg->rgi_slave)
> flags2|= (thd_arg->rgi_slave->gtid_ev_flags2 & (FL_DDL|FL_WAITED));
> + /* count non-zero extra recoverable engines; total = extra + 1 */
> + if (is_transactional)
> + {
> + if (has_xid)
> + {
> + extra_engines=
> + max<uint>(1, ha_count_rw(thd, thd_arg->in_multi_stmt_transaction_mode()))
1. a bit confusing to use both thd and thd_arg in the same line
2. max() is strange. it means the result is 1, when ha_count_rw() is 0.
can ha_count_rw() here even be 0?
3. also it's somewhat an unfounded assumption that only r/w engines will
have the transaction prepared. But we can make it a fact, if we won't
prepare r/o engines at all (in ha_prepare).
> + - 1;
> + }
> + else if (unlikely(thd_arg->is_1pc_ro_trans))
> + {
> + extra_engines= UINT_MAX; // neither extra nor base engine
why is that?
> + }
> + if (extra_engines > 0)
> + flags_extra|= FL_EXTRA_MULTI_ENGINE;
> + }
> }
>
>
> diff --git a/sql/handler.cc b/sql/handler.cc
> index 6792e80b8fe..247ab7267bf 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1245,6 +1245,24 @@ int ha_prepare(THD *thd)
> DBUG_RETURN(error);
> }
>
> +/*
> + Returns counted number of
> + read-write recoverable transaction participants.
> +*/
> +uint ha_count_rw(THD *thd, bool all)
the name doesn't match the implementation, please, rename
> +{
> + unsigned rw_ha_count= 0;
> + THD_TRANS *trans=all ? &thd->transaction.all : &thd->transaction.stmt;
> +
> + for (Ha_trx_info * ha_info= trans->ha_list; ha_info;
> + ha_info= ha_info->next())
> + {
> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> + ++rw_ha_count;
> + }
> + return rw_ha_count;
> +}
> +
> /**
> Check if we can skip the two-phase commit.
>
> @@ -1960,8 +1982,122 @@ struct xarecover_st
> XID *list;
> HASH *commit_list;
> bool dry_run;
> + MEM_ROOT *mem_root;
> + bool error;
> };
>
> +/**
> + Inserts a new hash member.
> +
> + returns a successfully created and inserted @c xid_recovery_member
> + into hash @c hash_arg,
> + or NULL.
> +*/
> +static xid_recovery_member*
> +xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root)
> +{
> + xid_recovery_member *member= (xid_recovery_member*)
> + alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
> + if (!member)
> + return NULL;
> +
> + member->xid= xid_arg;
> + member->in_engine_prepare= 1;
> + member->decided_to_commit= false;
> +
> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> +}
> +
> +/*
> + Inserts a new or updates an existing hash member to increment
> + the member's prepare counter.
> +
> + returns false on success,
> + true otherwise.
> +*/
> +static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
> + MEM_ROOT *ptr_mem_root)
> +{
> + xid_recovery_member* member;
> + if ((member= (xid_recovery_member *)
> + my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
> + member->in_engine_prepare++;
> + else
> + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root);
> +
> + return member == NULL;
> +}
> +
> +/*
> + Hash iterate function to complete with commit or rollback as decided
> + (typically at binlog recovery processing) in member->in_engine_prepare.
> +*/
> +static my_bool xarecover_do_commit_or_rollback(void *member_arg,
> + void *hton_arg)
> +{
> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> + handlerton *hton= (handlerton*) hton_arg;
> + xid_t x;
> + my_bool rc;
> +
> + x.set(member->xid);
> + rc= member->decided_to_commit ? hton->commit_by_xid(hton, &x) :
> + hton->rollback_by_xid(hton, &x);
> +
> + DBUG_ASSERT(rc || member->in_engine_prepare > 0);
> +
> + if (!rc)
> + {
I don't think you can trust rc on that.
if it's non-zero, it's an error all right.
but it's a bit of a stretch to presume that
nonexisting xid is always an error.
also commit_by_xid returns int, not my_bool.
> + member->in_engine_prepare--;
> + if (global_system_variables.log_warnings > 2)
> + sql_print_warning("%s transaction with xid %llu",
> + member->decided_to_commit ?
> + "Committed" : "Rolled back", (ulonglong) member->xid);
> + }
> +
> + return false;
> +}
> +
> +static my_bool xarecover_do_count_in_prepare(void *member_arg,
> + void *ptr_count)
> +{
> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> + if (member->in_engine_prepare)
> + {
> + *(uint*) ptr_count += member->in_engine_prepare;
This is a rather weird semantics.
it's kind of a number of transactions times number of engines.
if one transaction wasn't committed in two engines and another
transaction wasn't rolled back in one engine, the counter will be 3.
how are you going to explain this to users?
> + if (global_system_variables.log_warnings > 2)
> + sql_print_warning("Found prepared transaction with xid %llu",
> + (ulonglong) member->xid);
> + }
> +
> + return false;
> +}
> +
> +static my_bool xarecover_binlog_truncate_handlerton(THD *unused,
> + plugin_ref plugin,
> + void *arg)
> +{
> + handlerton *hton= plugin_hton(plugin);
> +
> + if (hton->state == SHOW_OPTION_YES && hton->recover)
> + {
> + my_hash_iterate((HASH*) arg, xarecover_do_commit_or_rollback, hton);
Why is the function called xarecover_binlog_truncate_handlerton
if it doesn't truncate anything?
> + }
> +
> + return FALSE;
> +}
> +
> +uint ha_recover_complete(HASH *commit_list)
> +{
> + uint count= 0;
> +
> + plugin_foreach(NULL, xarecover_binlog_truncate_handlerton,
> + MYSQL_STORAGE_ENGINE_PLUGIN, commit_list);
> + my_hash_iterate(commit_list, xarecover_do_count_in_prepare, &count);
> +
> + return count;
> +}
> +
> static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> void *arg)
> {
> @@ -1969,6 +2105,9 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> struct xarecover_st *info= (struct xarecover_st *) arg;
> int got;
>
> + if (info->error)
> + return TRUE;
plugin_foreach() aborts as soon as the callback returns true.
so, remove info->error, and return TRUE from xarecover_handlerton
on error instead.
> +
> if (hton->state == SHOW_OPTION_YES && hton->recover)
> {
> while ((got= hton->recover(hton, info->list, info->len)) > 0 )
> @@ -1988,7 +2127,7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> }
> #endif /* WITH_WSREP */
>
> - for (int i=0; i < got; i ++)
> + for (int i=0; i < got && !info->error; i ++)
eh? how can info->error ever be true above?
you break out of the loop when setting info->error= true,
> {
> my_xid x= IF_WSREP(WSREP_ON && wsrep_is_wsrep_xid(&info->list[i]) ?
> wsrep_xid_seqno(info->list[i]) :
> @@ -2013,7 +2152,20 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> info->found_my_xids++;
> continue;
> }
> - // recovery mode
> +
> + /*
> + Regular and semisync slave server recovery only collects
> + xids to make decisions on them later by the caller.
> + */
> + if (info->mem_root)
> + {
> + if (xid_member_replace(info->commit_list, x, info->mem_root))
> + {
> + info->error= true;
> + sql_print_error("Error in memory allocation at xarecover_handlerton");
> + break;
> + }
> + } else
add into the else branch something like
/* this branch is only for the legacy TC_LOG_MMAP */
compile_time_assert(sizeof(TC_LOG_MMAP) > 1 );
> if (IF_WSREP((wsrep_emulate_bin_log &&
> wsrep_is_wsrep_xid(info->list + i) &&
> x <= wsrep_limit), false) ||
> diff --git a/sql/log.cc b/sql/log.cc
> index 8073f09ab88..a9808ed8823 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -7977,6 +7982,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
> #endif
> }
>
> + DEBUG_SYNC(leader->thd, "commit_before_update_end_pos");
this isn't used in any tests
> /*
> update binlog_end_pos so it can be read by dump thread
> Note: must be _after_ the RUN_HOOK(after_flush) or else
> @@ -9609,6 +9618,180 @@ int TC_LOG::using_heuristic_recover()
> /****** transaction coordinator log for 2pc - binlog() based solution ******/
> #define TC_LOG_BINLOG MYSQL_BIN_LOG
>
> +/**
> + Truncates the current binlog to specified position. Removes the rest of binlogs
> + which are present after this binlog file.
> +
> + @param truncate_file Holds the binlog name to be truncated
> + @param truncate_pos Position within binlog from where it needs to
> + truncated.
> +
> + @retval true ok
> + @retval false error
> +
> +*/
> +bool MYSQL_BIN_LOG::truncate_and_remove_binlogs(const char *file_name,
> + my_off_t pos,
> + rpl_gtid *ptr_gtid,
> + enum_binlog_checksum_alg cs_alg)
> +{
> + int error= 0;
> +#ifdef HAVE_REPLICATION
> + LOG_INFO log_info;
> + THD *thd= current_thd;
> + my_off_t index_file_offset= 0;
> + File file= -1;
> + IO_CACHE cache;
> + MY_STAT s;
> +
> + if ((error= find_log_pos(&log_info, file_name, 1)))
> + {
> + sql_print_error("Failed to locate binary log file:%s."
> + "Error:%d", file_name, error);
> + goto end;
> + }
> +
> + while (!(error= find_next_log(&log_info, 1)))
> + {
> + if (!index_file_offset)
> + {
> + index_file_offset= log_info.index_file_start_offset;
> + if ((error= open_purge_index_file(TRUE)))
> + {
> + sql_print_error("Failed to open purge index "
> + "file:%s. Error:%d", purge_index_file_name, error);
> + goto end;
> + }
> + }
> + if ((error= register_purge_index_entry(log_info.log_file_name)))
> + {
> + sql_print_error("Failed to copy %s to purge index"
> + " file. Error:%d", log_info.log_file_name, error);
> + goto end;
> + }
> + }
> +
> + if (error != LOG_INFO_EOF)
> + {
> + sql_print_error("Failed to find the next binlog to "
> + "add to purge index register. Error:%d", error);
> + goto end;
> + }
> +
> + if (is_inited_purge_index_file())
> + {
> + if (!index_file_offset)
> + index_file_offset= log_info.index_file_start_offset;
> +
> + if ((error= sync_purge_index_file()))
> + {
> + sql_print_error("Failed to flush purge index "
> + "file. Error:%d", error);
> + goto end;
> + }
> +
> + // Trim index file
> + if ((error=
> + mysql_file_chsize(index_file.file, index_file_offset, '\n',
> + MYF(MY_WME))) ||
> + (error=
> + mysql_file_sync(index_file.file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Failed to trim binlog index "
> + "file:%s to offset:%llu. Error:%d", index_file_name,
> + index_file_offset, error);
> + goto end;
> + }
> +
> + /* Reset data in old index cache */
> + if ((error= reinit_io_cache(&index_file, READ_CACHE, (my_off_t) 0, 0, 1)))
> + {
> + sql_print_error("Failed to reinit binlog index "
> + "file. Error:%d", error);
> + goto end;
> + }
> +
> + /* Read each entry from purge_index_file and delete the file. */
> + if ((error= purge_index_entry(thd, NULL, TRUE)))
> + {
> + sql_print_error("Failed to process registered "
> + "files that would be purged.");
> + goto end;
> + }
> + }
> +
> + DBUG_ASSERT(pos);
> +
> + if ((file= mysql_file_open(key_file_binlog, file_name,
> + O_RDWR | O_BINARY, MYF(MY_WME))) < 0)
> + {
> + error= 1;
> + sql_print_error("Failed to open binlog file:%s for "
> + "truncation.", file_name);
> + goto end;
> + }
> + my_stat(file_name, &s, MYF(0));
> +
> + /* Change binlog file size to truncate_pos */
> + if ((error=
> + mysql_file_chsize(file, pos, 0, MYF(MY_WME))) ||
> + (error= mysql_file_sync(file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Failed to trim the "
> + "binlog file:%s to size:%llu. Error:%d",
> + file_name, pos, error);
> + goto end;
> + }
> + else
> + {
> + char buf[21];
> +
> + longlong10_to_str(ptr_gtid->seq_no, buf, 10);
> + sql_print_information("Successfully truncated binlog file:%s "
> + "to pos:%llu to remove transactions starting from "
> + "GTID %u-%u-%s", file_name, pos,
> + ptr_gtid->domain_id, ptr_gtid->server_id, buf);
> + }
> + if (!(error= init_io_cache(&cache, file, IO_SIZE, WRITE_CACHE,
> + (my_off_t) pos, 0, MYF(MY_WME|MY_NABP))))
> + {
> + /*
> + Write Stop_log_event to ensure clean end point for the binary log being
> + truncated.
> + */
I'm not sure about it. The less you touch the corrupted binlog the better.
simply truncating it would be enough, wouldn't it?
> + Stop_log_event se;
> + se.checksum_alg= cs_alg;
> + if ((error= write_event(&se, NULL, &cache)))
> + {
> + sql_print_error("Failed to write stop event to "
> + "binary log. Errno:%d",
> + (cache.error == -1) ? my_errno : error);
> + goto end;
> + }
> + clear_inuse_flag_when_closing(cache.file);
> + if ((error= flush_io_cache(&cache)) ||
> + (error= mysql_file_sync(file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Faild to write stop event to "
> + "binary log. Errno:%d",
> + (cache.error == -1) ? my_errno : error);
> + }
> + }
> + else
> + sql_print_error("Failed to initialize binary log "
> + "cache for writing stop event. Errno:%d",
> + (cache.error == -1) ? my_errno : error);
1. I don't see why you need to sync after every operation.
2. you should clear inuse flag just before closing, otherwise
flush_io_cache can overwrite it again if it'll happen to have it
in the cache.
> +
> +end:
> + if (file >= 0)
> + {
> + end_io_cache(&cache);
> + mysql_file_close(file, MYF(MY_WME));
> + }
> + error= error || close_purge_index_file();
> +#endif
> + return error > 0;
> +}
> int TC_LOG_BINLOG::open(const char *opt_name)
> {
> int error= 1;
> @@ -10040,26 +10223,515 @@ start_binlog_background_thread()
>
> return 0;
> }
> +#ifdef HAVE_REPLICATION
> +class Recovery_context
> +{
> +public:
> + my_off_t prev_event_pos;
> + rpl_gtid last_gtid;
> + bool last_gtid_standalone;
> + bool last_gtid_valid;
> + bool last_gtid_no2pc; // true when the group does not end with Xid event
> + uint last_gtid_engines;
> + std::pair<uint, my_off_t> last_gtid_coord;
may be a comment at the end // <binlog id, binlog offset> ?
> + /*
> + When true, it's semisync slave recovery mode
> + rolls back transactions in doubt and wipes them off from binlog.
> + The rest of declarations deal with this type of recovery.
> + */
> + bool do_truncate;
> + rpl_gtid binlog_unsafe_gtid, truncate_gtid;
> + char binlog_truncate_file_name[FN_REFLEN] ;
> + char binlog_unsafe_file_name[FN_REFLEN] ;
> + /*
> + When do_truncate is true, the truncate position may not be
> + found in one round when recovered transactions are multi-engine
> + or just on different engines.
> + In the single recoverable engine case `truncate_reset_done` and
> + therefore `truncate_validated` remain `false' when the last
> + binlog is the binlog-checkpoint one.
> + The meaning of `truncate_reset_done` is according to the following example:
> + Let round = 1, Binlog contains the sequence of replication event groups:
> + [g1, G2, g3]
> + where `G` (in capital) stands for committed, `g` for prepared.
> + g1 is first set as truncation candidate, then G2 reset it to indicate
> + the actual truncation is behind (to the right of) it.
> + `truncate_validated` is set to true when `binlog_truncate_pos` (as of g3)
> + won't change.
> + Observe last_gtid_valid is affected, so in the above example g1
> + would have to be discarded from the 1st round binlog state estimate which
> + is handled by the 2nd round (see below).
I failed to understand that either :(
> + */
> + bool truncate_validated; // trued when the truncate position settled
> + bool truncate_reset_done; // trued when the position is to reevaluate
Nor that :(
> + /* Flags the fact of truncate position estimation is done the 1st round */
> + bool truncate_set_in_1st;
> + /*
> + Monotonically indexes binlog files in the recovery list.
> + When the list is "likely" singleton the value is UINT_MAX.
> + Otherwise enumeration starts with zero for the first file, increments
> + by one for any next file except for the last file in the list, which
> + is also the initial binlog file for recovery,
> + that is enumberated with UINT_MAX.
> + */
> + uint id_binlog;
> + enum_binlog_checksum_alg cs_alg; // for Stop_event with do_truncate
rename to checksum_alg, please.
"charset algorithm" looks very puzzling
and "cs" almost everywhere means CHARSET_INFO.
> + bool single_binlog;
> + std::pair<uint, my_off_t> binlog_truncate_coord;
> + std::pair<uint, my_off_t> binlog_unsafe_coord;
> +
> + Recovery_context();
> + bool decide_or_assess(xid_recovery_member *member, int round,
> + Format_description_log_event *fdle,
> + LOG_INFO *linfo, my_off_t pos);
> + void process_gtid(int round, Gtid_log_event *gev, LOG_INFO *linfo);
> + int next_binlog_or_round(int& round,
> + const char *last_log_name,
> + const char *binlog_checkpoint_name,
> + LOG_INFO *linfo, MYSQL_BIN_LOG *log);
> + bool is_safe()
what do you mean "safe" ?
> + {
> + return !do_truncate ? true :
> + (truncate_gtid.seq_no == 0 || // no truncate
> + binlog_unsafe_coord < binlog_truncate_coord); // or unsafe is earlier
> + }
> + bool complete(MYSQL_BIN_LOG *log, HASH &xids);
> + void update_binlog_unsafe_coord_if_needed(LOG_INFO *linfo);
> + void reset_truncate_coord(my_off_t pos);
> + void set_truncate_coord(LOG_INFO *linfo, int round,
> + enum_binlog_checksum_alg fd_cs_alg);
> +};
>
comments about what every method does?
> +bool Recovery_context::complete(MYSQL_BIN_LOG *log, HASH &xids)
> +{
> + if (!do_truncate || is_safe())
> + {
> + uint count_in_prepare= ha_recover_complete(&xids);
> + if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)
> + {
> + sql_print_warning("Could not complete %u number of transactions in "
> + "engines.", count_in_prepare);
> + return false; // there's later dry run ha_recover() to error out
> + }
> + }
> +
> + /* Truncation is not done when there's no transaction to roll back */
> + if (do_truncate && truncate_gtid.seq_no > 0)
> + {
> + if (is_safe())
> + {
> + if (log->truncate_and_remove_binlogs(binlog_truncate_file_name,
> + binlog_truncate_coord.second,
> + &truncate_gtid,
> + cs_alg))
> + {
> + sql_print_error("Failed to trim the binary log to "
> + "file:%s pos:%llu.", binlog_truncate_file_name,
> + binlog_truncate_coord.second);
> + return true;
> + }
> + }
> + else
> + {
> + sql_print_error("Cannot trim the binary log to file:%s "
> + "pos:%llu as unsafe statement "
> + "is found at file:%s pos:%llu which is "
> + "beyond the truncation position "
> + "(the farest in binlog order only is reported); "
"farest" doesn't look like a word
> + "all transactions in doubt are left intact. ",
> + binlog_truncate_file_name, binlog_truncate_coord.second,
> + binlog_unsafe_file_name, binlog_unsafe_coord.second);
> + return true;
will this abort server startup?
> + }
> + }
> +
> + return false;
> +}
> +
> +Recovery_context::Recovery_context() :
> + prev_event_pos(0),
> + last_gtid_standalone(false), last_gtid_valid(false), last_gtid_no2pc(false),
> + last_gtid_engines(0),
> + do_truncate(rpl_semi_sync_slave_enabled),
> + truncate_validated(false), truncate_reset_done(false),
> + truncate_set_in_1st(false), id_binlog(UINT_MAX),
> + cs_alg(BINLOG_CHECKSUM_ALG_UNDEF), single_binlog(false)
> +{
> + last_gtid_coord= std::pair<uint,my_off_t>(0,0);
> + binlog_truncate_coord= std::pair<uint,my_off_t>(0,0);
> + binlog_unsafe_coord= std::pair<uint,my_off_t>(0,0);
> + binlog_truncate_file_name[0]= 0;
> + binlog_unsafe_file_name [0]= 0;
> + binlog_unsafe_gtid= truncate_gtid= rpl_gtid();
> +}
> +
> +/**
> + Is called when a committed or to-be-committed transaction is detected.
> + @c truncate_gtid is set to "nil" with its @c rpl_gtid::seq_no := 0.
> + @c truncate_reset_done remembers the fact of that has been done at least
> + once in the current round;
> + @c binlog_truncate_coord is "suggested" to a next group start to indicate
> + the actual settled value must be at most as the last suggested one.
> +*/
> +void Recovery_context::reset_truncate_coord(my_off_t pos)
> +{
> + DBUG_ASSERT(binlog_truncate_coord.second == 0 ||
> + last_gtid_coord >= binlog_truncate_coord ||
> + truncate_set_in_1st);
> +
> + binlog_truncate_coord= std::pair<uint,my_off_t>(id_binlog, pos);
> + truncate_gtid= rpl_gtid();
> + truncate_reset_done= true;
> +}
> +
> +
> +/*
> + Sets binlog_truncate_pos to the value of the current transaction's gtid.
> + In multi-engine case that might be just an assessment to be exacted
> + in the current round and confirmed in a next one.
> +*/
> +void Recovery_context::set_truncate_coord(LOG_INFO *linfo, int round,
> + enum_binlog_checksum_alg fd_cs_alg)
> +{
> + binlog_truncate_coord= last_gtid_coord;
> + strmake_buf(binlog_truncate_file_name, linfo->log_file_name);
> +
> + truncate_gtid= last_gtid;
> + cs_alg= fd_cs_alg;
> + truncate_set_in_1st= (round == 1);
> +}
> +
> +bool Recovery_context::decide_or_assess(xid_recovery_member *member, int round,
> + Format_description_log_event *fdle,
> + LOG_INFO *linfo, my_off_t pos)
> +{
> + if (member)
> + {
> + DBUG_EXECUTE_IF("binlog_truncate_partial_commit",
> + if (last_gtid_engines == 2)
> + {
> + DBUG_ASSERT(member->in_engine_prepare > 0);
> + member->in_engine_prepare= 1;
> + });
> + /*
> + xid in doubt are resolved as follows:
> + in_engine_prepare is compared agaist binlogged info to
> + yield the commit-or-rollback decision in the normal case.
> + In the semisync-slave recovery the decision may be
> + approximate to change in later rounds.
> + */
> + if (member->in_engine_prepare > last_gtid_engines)
> + {
> + char buf[21];
> + longlong10_to_str(last_gtid.seq_no, buf, 10);
> + sql_print_error("Error to recovery multi-engine transaction: "
> + "the number of engines prepared %u exceeds the "
> + "respective number %u in its GTID %u-%u-%s "
> + "located at file:%s pos:%llu",
> + member->in_engine_prepare, last_gtid_engines,
> + last_gtid.domain_id, last_gtid.server_id, buf,
> + linfo->log_file_name, last_gtid_coord.second);
> + return true;
> + }
> + else if (member->in_engine_prepare < last_gtid_engines)
> + {
> + DBUG_EXECUTE_IF("binlog_truncate_partial_commit",
> + member->in_engine_prepare= 2;);
> + DBUG_ASSERT(member->in_engine_prepare > 0);
> + /*
> + This is an "unlikely" branch of two or more engines in transaction
> + that is partially committed, so to complete.
> + */
> + member->decided_to_commit= true;
> + if (do_truncate)
> + {
> + /*
> + A validated truncate pos may not change later.
> + Any "unlikely" (two or more engines in transaction) reset
> + to not-validated yet position ensues correcting early
> + estimates in the following round(s).
> + */
> + if (!truncate_validated)
> + reset_truncate_coord(pos);
> + }
> + }
> + else // member->in_engine_prepare == last_gtid_engines
> + {
> + if (!do_truncate) // "normal" recovery
> + {
> + member->decided_to_commit= true;
> + }
> + else
> + {
> + /*
> + The first time or further estimate the truncate position
> + until validation. Already set not validated yet postion
> + may change only through previous reset
> + unless this xid in doubt is the first in the 2nd round.
> + */
> + if (!truncate_validated)
> + {
> + DBUG_ASSERT(round <= 2);
> + /*
> + Either truncate was not set or was reset, else
> + it gets incremented, otherwise it may only set to an earlier
> + offset only at the turn out of the 1st round.
sorry, I cannot parse that :(
> + */
> + DBUG_ASSERT(truncate_gtid.seq_no == 0 ||
> + last_gtid_coord >= binlog_truncate_coord ||
> + (round == 2 && truncate_set_in_1st));
> +
> + last_gtid_valid= false; // may still (see gtid) flip later
> + if (truncate_gtid.seq_no == 0 /* was reset or never set */ ||
> + (truncate_set_in_1st && round == 2 /* reevaluted at round turn */))
> + set_truncate_coord(linfo, round, fdle->checksum_alg);
> +
> + DBUG_ASSERT(member->decided_to_commit == false); // may flip later
> + }
> + else
> + {
> + DBUG_ASSERT(!truncate_reset_done); // the position was settled
> + /*
> + Correct earlier decisions of 1st and/or 2nd round to
> + rollback and invalidate last_gtid in binlog state.
> + */
> + if (!(member->decided_to_commit=
> + last_gtid_coord < binlog_truncate_coord))
> + {
> + last_gtid_valid= false; // settled
> + if (truncate_gtid.seq_no == 0)
> + truncate_gtid= last_gtid;
> + }
> + }
> + }
> + }
> + }
> + else if (do_truncate) // "0" < last_gtid_engines
> + {
> + /*
> + Similar to the multi-engine partial commit of "then" branch above.
> + The 2nd condition economizes away an extra (3rd) round in
I don't think "economizes" is a word
> + expected cases of first xid:s in the binlog checkpoint file
> + are actually of committed transactions. So fully committed
> + xid sequence is passed in this branch without any action.
> + */
> + if (!truncate_validated)
> + reset_truncate_coord(pos);
> + }
> +
> + return false;
> +}
> +
> +/*
> + Is invoked when a standalone or non-2pc group is detected.
> + Both are unsafe to truncate in the semisync-slave recovery so
> + the maximum unsafe coordinate may be updated.
> + In the non-2pc group case though, *exeptionally*,
> + the no-engine group is considered safe, to be invalidated
> + to not contribute to binlog state.
> +*/
> +void Recovery_context::update_binlog_unsafe_coord_if_needed(LOG_INFO *linfo)
> +{
> + if (!do_truncate)
> + return;
> +
> + if (truncate_gtid.seq_no > 0 && // g1,U2, *not* G1,U2
> + last_gtid_coord > binlog_truncate_coord)
> + {
> + DBUG_ASSERT(binlog_truncate_coord.second > 0);
> + /*
> + Potentially unsafe when the truncate coordinate is not determined,
> + just detected as unsafe when behind the latter.
> + */
> + if (last_gtid_engines == 0)
> + {
> + last_gtid_valid= false;
> + }
> + else
> + {
> + binlog_unsafe_gtid= last_gtid;
> + binlog_unsafe_coord= last_gtid_coord;
> + strmake_buf(binlog_unsafe_file_name, linfo->log_file_name);
> + }
> + }
> +}
> +
> +/*
> + Assigns last_gtid and assesses the maximum (in the binlog offset term)
> + unsafe gtid (group of events).
> +*/
> +void Recovery_context::process_gtid(int round, Gtid_log_event *gev,
> + LOG_INFO *linfo)
> +{
> + last_gtid.domain_id= gev->domain_id;
> + last_gtid.server_id= gev->server_id;
> + last_gtid.seq_no= gev->seq_no;
> + last_gtid_engines= gev->extra_engines != UINT_MAX ?
> + gev->extra_engines + 1 : 0;
> + last_gtid_coord= std::pair<uint,my_off_t>(id_binlog, prev_event_pos);
> + if (round == 1 || do_truncate)
> + {
> + DBUG_ASSERT(!last_gtid_valid);
> +
> + last_gtid_no2pc= false;
> + last_gtid_standalone=
> + (gev->flags2 & Gtid_log_event::FL_STANDALONE) ? true : false;
> + if (do_truncate && last_gtid_standalone)
> + update_binlog_unsafe_coord_if_needed(linfo);
> + /* Update the binlog state with any 'valid' GTID logged after Gtid_list. */
> + last_gtid_valid= true; // may flip at Xid when falls to truncate
> + }
> +}
> +
> +/*
> + At the end of processing of the current binlog compute next action.
> + When round increments in the semisync-slave recovery
> + truncate_validated, truncate_reset_done
> + gets reset/set for the next round,
> + as well as id_binlog gets reset either to zero or UINT_MAX
> + when recovery deals with the only binlog file.
> +
> + @param[in,out] round the current round that may increment here
> + @param last_log_name the recovery starting binlog file
> + @param binlog_checkpoint_name
> + binlog checkpoint file
> + @param linfo binlog file list struct for next file
> + @param log pointer to mysql_bin_log instance
> + @return 0 when rounds continue, maybe the current one remains
> + 1 when all rounds are done
> + -1 on error
> +*/
> +int Recovery_context::next_binlog_or_round(int& round,
> + const char *last_log_name,
> + const char *binlog_checkpoint_name,
> + LOG_INFO *linfo,
> + MYSQL_BIN_LOG *log)
> +{
> + if (!strcmp(linfo->log_file_name, last_log_name))
> + {
> + /* Either exit the loop now, or increment round */
> + DBUG_ASSERT(round <= 3);
> +
> + if ((round <= 2 && likely(!truncate_reset_done)) || round == 3)
> + {
> + // Exit now as the 1st round ends up with the binlog-checkpoint file
> + // is the same as the initial binlog file, so already parsed; or
> + // the 2nd round has not made own truncate_reset_done; or
> + // at the end of the 3rd "truncate" round.
> + DBUG_ASSERT(do_truncate || round == 1);
> + // The 3rd round is done only when the 2nd trued truncate_reset_done
> + // and consequently truncate_validated.
> + // No instances of truncate_reset_done in first 2 rounds allows
> + // for exiting now.
our (unwritten) coding style says that multi-line comments should use /*...*/
> + DBUG_ASSERT(round < 3 || truncate_validated);
> +
> + return true;
> + }
> + else
> + {
> + DBUG_ASSERT(do_truncate);
> + /*
> + the last binlog file, having truncate_reset_done to indicate
> + needed correction to member->decided_to_commit to reflect
> + changed binlog_truncate_pos.
> + */
> + truncate_reset_done= false;
> + truncate_validated= true;
> + rpl_global_gtid_binlog_state.reset_nolock();
> +
> + if (round > 1 &&
> + log->find_log_pos(linfo, binlog_checkpoint_name, 1))
> + {
> + sql_print_error("Binlog file '%s' not found in binlog index, needed "
> + "for recovery. Aborting.", binlog_checkpoint_name);
> + return -1;
> + }
> + id_binlog= (single_binlog= !strcmp(linfo->log_file_name, last_log_name)) ?
> + UINT_MAX : 0;
> + round++;
> + }
> + }
> + else if (round == 1)
> + {
> + if (do_truncate)
> + {
> + truncate_validated= truncate_reset_done;
> + rpl_global_gtid_binlog_state.reset_nolock();
> + truncate_reset_done= false;
> + id_binlog= 0;
> +
> + DBUG_ASSERT(!single_binlog);
> + }
> + round++;
> + }
> + else
> + {
> + /*
> + NOTE: reading other binlog's FD is necessary for finding out
> + the checksum status of the respective binlog file.
> + Round remains in this branch.
> + */
> + if (log->find_next_log(linfo, 1))
> + {
> + sql_print_error("Error reading binlog files during recovery. "
> + "Aborting.");
> + return -1;
> + }
> + if (!strcmp(linfo->log_file_name, last_log_name))
> + {
> + if (do_truncate)
> + {
> + DBUG_ASSERT(!single_binlog);
> + id_binlog= UINT_MAX;
> + }
> + else
> + return 1; // regular recovery exit
> + }
> + else if (do_truncate)
> + id_binlog++;
> +
> + DBUG_ASSERT(id_binlog <= UINT_MAX); // the assert is "practical"
> + }
> +
> + DBUG_ASSERT(!single_binlog ||
> + !strcmp(linfo->log_file_name, binlog_checkpoint_name));
> +
> + return 0;
> +}
> +#endif
>
> int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> IO_CACHE *first_log,
> - Format_description_log_event *fdle, bool do_xa)
> + Format_description_log_event *fdle_arg, bool do_xa)
> {
> Log_event *ev= NULL;
> HASH xids;
> MEM_ROOT mem_root;
> char binlog_checkpoint_name[FN_REFLEN];
> bool binlog_checkpoint_found;
> - bool first_round;
> IO_CACHE log;
> File file= -1;
> const char *errmsg;
> + Format_description_log_event *fdle= fdle_arg;
> #ifdef HAVE_REPLICATION
> - rpl_gtid last_gtid;
> - bool last_gtid_standalone= false;
> - bool last_gtid_valid= false;
> + Recovery_context ctx;
> #endif
> + /*
> + The for-loop variable is updated by the following rule set:
> + Initially set to 1.
> + After the initial binlog file is processed to identify
> + the Binlog-checkpoint file it is incremented when the latter file
> + is different from the initial one. Otherwise the only log has been
> + fully parsed and the loop exits.
> + The 2nd starts with the Binlog-checkpoint file and ends when the initial
> + binlog file is reached. It is excluded from yet another processing
> + in the "normal" non-semisync-slave configuration, and then the loop is
> + done at this point. Otherwise in the semisync slave case it may be parsed
> + over again. The 2nd round may turn to a third in "unlikely" condition of the
> + semisync-slave is being recovered having multi- or different engine
> + transactions in doubt.
> + */
I don't understand that. Why semisync-slave case is special?
> + int round;
>
> if (! fdle->is_valid() ||
> (do_xa && my_hash_init(&xids, &my_charset_bin, TC_LOG_PAGE_SIZE/3, 0,
> @@ -10072,39 +10744,63 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
>
> fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error
>
> + /* finds xids when root is not NULL */
> + if (do_xa && ha_recover(&xids, &mem_root))
> + goto err1;
> +
> /*
> Scan the binlog for XIDs that need to be committed if still in the
> prepared stage.
>
> Start with the latest binlog file, then continue with any other binlog
> files if the last found binlog checkpoint indicates it is needed.
> +
> + In case the semisync slave recovery the 2nd round may include
> + the initial file as well as cause a 3rd round when transactions
> + with multiple engines are discovered.
> + Additionally to find and decide on transactions in doubt that
> + the semisync slave may need to roll back, the binlog can be truncated
> + in the end to reflect the rolled back decisions.
> */
>
> binlog_checkpoint_found= false;
> - first_round= true;
> - for (;;)
> + for (round= 1;;)
> {
> - while ((ev= Log_event::read_log_event(first_round ? first_log : &log,
> + while ((ev= Log_event::read_log_event(round == 1 ? first_log : &log,
> fdle, opt_master_verify_checksum))
> && ev->is_valid())
> {
> enum Log_event_type typ= ev->get_type_code();
> switch (typ)
> {
> + case FORMAT_DESCRIPTION_EVENT:
> + if (round > 1)
> + {
> + if (fdle != fdle_arg)
> + delete fdle;
> + fdle= (Format_description_log_event*) ev;
why is it needed? all binlog files should have the same
Format_description_log_event anyway.
> + }
> + break;
> case XID_EVENT:
> + if (do_xa)
> {
> - if (do_xa)
> + xid_recovery_member *member=
> + (xid_recovery_member*)
> + my_hash_search(&xids, (uchar*) &static_cast<Xid_log_event*>(ev)->xid,
> + sizeof(my_xid));
> +#ifndef HAVE_REPLICATION
> {
> - Xid_log_event *xev=(Xid_log_event *)ev;
> - uchar *x= (uchar *) memdup_root(&mem_root, (uchar*) &xev->xid,
> - sizeof(xev->xid));
> - if (!x || my_hash_insert(&xids, x))
> - goto err2;
> + if (member)
> + member->decided_to_commit= true;
> }
> - break;
> +#else
> + if (ctx.decide_or_assess(member, round, fdle, linfo, ev->log_pos))
> + goto err2;
> +#endif
> }
> + break;
> case BINLOG_CHECKPOINT_EVENT:
> - if (first_round && do_xa)
> + if (round == 1 && do_xa)
> {
> size_t dir_len;
> Binlog_checkpoint_log_event *cev= (Binlog_checkpoint_log_event *)ev;
> @@ -10124,9 +10820,18 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> }
> }
> break;
> +#ifdef HAVE_REPLICATION
> case GTID_LIST_EVENT:
> - if (first_round)
> + if (round == 1 || (ctx.do_truncate &&
> + (ctx.id_binlog == 0 || ctx.single_binlog)))
> {
> + /*
> + Unlike the normal case, in do_truncate the initial state is
> + in the first binlog file of the recovery list.
> + */
don't understand that either :(
> + DBUG_ASSERT(!ctx.do_truncate || !ctx.single_binlog ||
> + ctx.id_binlog == UINT_MAX);
> +
> Gtid_list_log_event *glev= (Gtid_list_log_event *)ev;
>
> /* Initialise the binlog state from the Gtid_list event. */
> @@ -10135,19 +10840,16 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> }
> break;
>
> -#ifdef HAVE_REPLICATION
> case GTID_EVENT:
> - if (first_round)
> + ctx.process_gtid(round, (Gtid_log_event *)ev, linfo);
> + break;
> +
> + case QUERY_EVENT:
> + if (((Query_log_event *)ev)->is_commit() ||
> + ((Query_log_event *)ev)->is_rollback())
> {
> - Gtid_log_event *gev= (Gtid_log_event *)ev;
> -
> - /* Update the binlog state with any GTID logged after Gtid_list. */
> - last_gtid.domain_id= gev->domain_id;
> - last_gtid.server_id= gev->server_id;
> - last_gtid.seq_no= gev->seq_no;
> - last_gtid_standalone=
> - ((gev->flags2 & Gtid_log_event::FL_STANDALONE) ? true : false);
> - last_gtid_valid= true;
> + ctx.last_gtid_no2pc= true;
> + ctx.update_binlog_unsafe_coord_if_needed(linfo);
what about DML on non-transactonal tables? it won't have COMMIT/ROLLBACK will it?
> }
> break;
> #endif
> @@ -10217,35 +10923,43 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> file= -1;
> }
>
> - if (!strcmp(linfo->log_file_name, last_log_name))
> - break; // No more files to do
> +#ifdef HAVE_REPLICATION
> + int rc= ctx.next_binlog_or_round(round, last_log_name,
> + binlog_checkpoint_name, linfo, this);
> + if (rc == -1)
> + goto err2;
> + else if (rc == 1)
> + break; // all rounds done
> +#else
> + if (!strcmp(linfo->log_file_name, last_log_name))
> + break; // No more files to do
indentation
> +#endif
> +
> if ((file= open_binlog(&log, linfo->log_file_name, &errmsg)) < 0)
> {
> sql_print_error("%s", errmsg);
> goto err2;
> }
> - /*
> - We do not need to read the Format_description_log_event of other binlog
> - files. It is not possible for a binlog checkpoint to span multiple
> - binlog files written by different versions of the server. So we can use
> - the first one read for reading from all binlog files.
> - */
> - if (find_next_log(linfo, 1))
> - {
> - sql_print_error("Error reading binlog files during recovery. Aborting.");
> - goto err2;
> - }
> fdle->reset_crypto();
> - }
> + } // end of for
>
> if (do_xa)
> {
> - if (ha_recover(&xids))
> - goto err2;
> -
> + if (binlog_checkpoint_found)
> + {
> +#ifndef HAVE_REPLICATION
> + if (ha_recover_complete(&xids))
> +#else
> + if (ctx.complete(this, xids))
> +#endif
> + goto err2;
> + }
> free_root(&mem_root, MYF(0));
> my_hash_free(&xids);
> }
> + if (fdle != fdle_arg)
> + delete fdle;
> +
> return 0;
>
> err2:
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
1
0