Re: [Maria-developers] e92037989f7: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 25 Apr '21
Hi, Andrei!
Don't be confused by the subject, this is a review of
git diff 450c017c2d 2fa526e26e
that is, of everything combined, not just the single e92037989f7 commit.
On Apr 25, Sujatha wrote:
> revision-id: e92037989f7 (mariadb-10.3.26-128-ge92037989f7)
> parent(s): 450c017c2d9
> author: Sujatha <sujatha.sivakumar(a)mariadb.com>
> committer: Andrei Elkin <andrei.elkin(a)mariadb.com>
> timestamp: 2021-04-13 12:26:12 +0300
> message:
>
> MDEV-21117: refine the server binlog-based recovery for semisync
> diff --git a/libmariadb b/libmariadb
> index fc431a035a2..e3824422064 160000
> --- a/libmariadb
> +++ b/libmariadb
> @@ -1 +1 @@
> -Subproject commit fc431a035a21ac1d4ef25d9d3cd8c4d7e64a8ee7
> +Subproject commit e38244220646a7e95c9be22576460aa7a4eb715f
This is clearly a mistake, you erroneously checked in
an old libmariadb commit, rolling back a bunch of changes.
See your commit 4bc83b2749
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine-master.opt b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine-master.opt
> new file mode 100644
> index 00000000000..df675545bf9
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine-master.opt
please rename this file to mysql-test/include/have_rocksdb.opt
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_active_log.test b/mysql-test/suite/binlog/t/binlog_truncate_active_log.test
> new file mode 100644
> index 00000000000..2b794d02dd0
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_active_log.test
> @@ -0,0 +1,74 @@
> +# ==== Purpose ====
> +#
> +# Test verifies the truncation of single binary log file.
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +--source include/have_innodb.inc
> +--source include/have_aria.inc
> +# File: binlog_truncate_active_log.inc included in test makes use of
> +# 'debug_sync' facility.
> +--source include/have_debug_sync.inc
you wouldn't need a comment if you'd include have_debug_sync.inc
directly into binlog_truncate_active_log.inc. but ok, whatever you like
> +--source include/have_binlog_format_statement.inc
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +
> +# The following cases are tested:
> +# A. 2pc transaction is followed by a blank "zero-engines" one
> +# B. 2pc transaction follows the blank one
> +# C. Similarly to A, with the XA blank transaction
> +
> +RESET MASTER;
> +CREATE TABLE t (f INT) ENGINE=INNODB;
> +CREATE TABLE t2 (f INT) ENGINE=INNODB;
> +CREATE TABLE tm (f INT) ENGINE=Aria;
could you add a comment, explaining why you're using Aria here.
(you wrote that in an email, but please add a comment too)
> +
> +--echo # Case A.
> +# Using 'debug_sync' hold 'query1' execution after 'query1' is flushed and
> +# synced to binary log but not yet committed. In an another connection hold
> +# 'query2' execution after 'query2' is flushed and synced to binlog.
> +# Crash and restart server with --rpl-semi-sync-slave-enabled=1
> +#
> +# During recovery of the binary log, 'query1' status is checked with the InnoDB
> +# engine; it will be prepared but not yet committed. All transactions starting
> +# from 'query1' onwards will be removed from the binary log.
> +
> +--let $truncate_gtid_pos = 0-1-6
> +--let $query1 = INSERT INTO t VALUES (20)
> +--let $query2 = DELETE FROM t2 WHERE f = 0 /* no such record */
> +--source binlog_truncate_active_log.inc
> +
> +--echo # Case B.
> +# The inverted sequence ends up truncating only $query2
> +--let $truncate_gtid_pos = 0-1-10
> +--let $query1 = DELETE FROM t2 WHERE f = 0
> +--let $query2 = INSERT INTO t VALUES (20)
> +--source binlog_truncate_active_log.inc
> +
> +
> +delimiter |;
> +CREATE PROCEDURE sp_blank_xa()
> +BEGIN
> + XA START 'blank';
> + DELETE FROM t2 WHERE f = 0 /* no such record */;
> + XA END 'blank';
> + XA PREPARE 'blank';
> +END|
> +delimiter ;|
> +
> +
> +--echo # Case C.
> +--let $truncate_gtid_pos = 0-1-14
> +--let $query1 = INSERT INTO t VALUES (20)
> +--let $pre_q2 = CALL sp_blank_xa
> +--let $query2 = XA COMMIT 'blank'
> +--source binlog_truncate_active_log.inc
what was truncated here?
a comment explains it for cases A and B, but not here.
maybe it'd make sense to do show binlog events after every restart,
just to see the state of the binlog after truncation?
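something like this after each restart would make the post-truncation state
visible in the result file (a sketch, using the standard include; the default
shows the current binlog):

```
--source include/show_binlog_events.inc
```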
> +DROP PROCEDURE sp_blank_xa;
> +
> +--echo # Cleanup
> +DROP TABLE t,t2,tm;
> +
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc b/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc
> new file mode 100644
> index 00000000000..bbc464066fc
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc
> @@ -0,0 +1,55 @@
> +connect(master1,localhost,root,,);
> +connect(master2,localhost,root,,);
> +connect(master3,localhost,root,,);
> +
> +--connection default
> +
> +# First to commit few transactions
> +INSERT INTO t VALUES (10);
> +INSERT INTO tm VALUES (10);
> +
> +--connection master1
> +# Hold insert after write to binlog and before "run_commit_ordered" in engine
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master1_ready WAIT_FOR signal_never_arrives";
> +--send_eval $query1
> +
> +--connection master2
> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> +if ($pre_q2)
> +{
> + eval $pre_q2;
> +}
> +SET DEBUG_SYNC= "commit_before_get_LOCK_after_binlog_sync SIGNAL master2_ready";
> +# To binlog non-xid transactional group which will be truncated all right
> +--send_eval $query2
> +
> +--connection master3
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +SELECT @@global.gtid_binlog_pos as 'Before the crash';
> +
> +--connection default
> +--source include/kill_mysqld.inc
> +--disconnect master1
> +--disconnect master2
> +--disconnect master3
> +
> +#
> +# Server restart
> +#
> +--let $restart_parameters= --rpl-semi-sync-slave-enabled=1
> +--source include/start_mysqld.inc
> +
> +# Check error log for a successful truncate message.
> +--let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err
> +
> +--let SEARCH_FILE=$log_error_
why not set SEARCH_FILE directly? but ok, as you like
> +--let SEARCH_PATTERN=Successfully truncated.*to remove transactions starting from GTID $truncate_gtid_pos
> +--replace_regex /FOUND [0-9]+/FOUND #/
can it be found multiple times? Why would binlog be truncated more than once?
> +--source include/search_pattern_in_file.inc
> +
> +SELECT @@global.gtid_binlog_pos as 'After the crash';
> +--echo "One row should be present in table 't'"
> +SELECT * FROM t;
> +
> +# Local cleanup
> +DELETE FROM t;
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> new file mode 100644
> index 00000000000..94837e3c3ea
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> @@ -0,0 +1,56 @@
> +# ==== Purpose ====
> +#
> +# Test verifies truncation of multiple binary logs with multiple transactional
> +# storage engines
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +--source include/have_innodb.inc
> +--source include/have_rocksdb.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +--let $old_max_binlog_size= `select @@global.max_binlog_size`
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +--let $MYSQLD_DATADIR= `SELECT @@datadir`
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE t2 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=rocksdb;
> +
> +--let $case = A: neither engine committed => rollback & binlog truncate
> +# Hold off engine commits after write to binlog and its rotation.
> +# The transaction is killed along with the server after that.
> +--let $shutdown_timeout=0
> +--let $debug_sync_action = "commit_after_release_LOCK_log SIGNAL con1_ready WAIT_FOR signal_no_signal"
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1
> +--let $test_outcome= 1 row should be present in both tables; binlog is truncated; number of binlogs at reconnect - 3
> +--source binlog_truncate_multi_engine.inc
> +--echo Proof that the truncated binlog file is readable (two transactions must be seen):
> +--exec $MYSQL_BINLOG --short-form --skip-annotate-row-events $MYSQLD_DATADIR/master-bin.000002
> +
> +--let $case = B: one engine has committed its transaction branch
> +# Hold off after one engine has committed.
> +--let $shutdown_timeout=0
> +--let $debug_sync_action = "commit_after_run_commit_ordered SIGNAL con1_ready WAIT_FOR signal_no_signal"
> +# Both debug_sync and debug-dbug are required to make sure Engines remember the commit state
> +# debug_sync alone will not help.
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1 --debug-dbug=d,binlog_truncate_partial_commit
in the first review I wrote
this seems to be a rather crude way of faking a partially committed
transaction. better to crash after the first engine has committed,
that'd be much more natural.
and you replied
This simulation aimed at (allows for) more complicated recovery time
event sequences.
In this case, indeed, crashing by demand is about of the same efforts.
I can convert to that.
[x]
> +--let $test_outcome= 2 rows should be present in both tables; no binlog truncation; one extra binlog file compare with A; number of binlogs at reconnect - 4
> +--source binlog_truncate_multi_engine.inc
> +
> +--let $case = C: both engines have committed its transaction branch
> +# Hold off after both engines have committed. The server is shut down.
> +--let $shutdown_timeout=
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1
> +--let $test_outcome= 2 rows should be present in both tables; no binlog truncation; the same # of binlog files as in B; number of binlogs at reconnect - 4
> +--source binlog_truncate_multi_engine.inc
> +
> +
> +
> +DROP TABLE t1, t2;
> +
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc
> new file mode 100644
> index 00000000000..41ae856dd9d
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc
> @@ -0,0 +1,54 @@
> +#
> +# Loop body of binlog_truncate_multi_engine.test
what do you mean "loop body"?
> +# Parameters:
> +# $debug_sync_action describes debug-sync actions
> +# $kill_server 1 when to crash, 0 for regular restart
> +# $restart_parameters the caller may simulate partial commit at recovery
> +# $test_outcome summary of expected results
> +# $MYSQLD_DATADIR
> +
> +--echo #
> +--echo #
> +--echo # Case $case
> +--echo #
> +RESET MASTER;
> +FLUSH LOGS;
> +SET GLOBAL max_binlog_size= 4096;
> +
> +connect(con1,localhost,root,,);
> +--echo List of binary logs before rotation
> +--source include/show_binary_logs.inc
> +INSERT INTO t1 VALUES (1, REPEAT("x", 1));
> +INSERT INTO t2 VALUES (1, REPEAT("x", 1));
I'm not sure I understand the point of REPEAT(..., 1)
but sure, if you like it that way... :)
> +BEGIN;
> + INSERT INTO t1 VALUES (2, REPEAT("x", 4100));
> + INSERT INTO t2 VALUES (2, REPEAT("x", 4100));
> +
> +--eval SET DEBUG_SYNC= $debug_sync_action
> +send COMMIT;
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +--echo List of binary logs after rotation
> +--source include/show_binary_logs.inc
> +
> +--echo # restart the server with $restart_parameters
> +--echo # the server is restarted
> +--source include/restart_mysqld.inc
> +
> +--connection default
> +--echo #
> +--echo # *** Summary: $test_outcome:
> +--echo #
> +SELECT COUNT(*) FROM t1;
> +SELECT COUNT(*) FROM t2;
> +SELECT @@GLOBAL.gtid_binlog_state;
> +SELECT @@GLOBAL.gtid_binlog_pos;
> +--echo List of binary logs at the end of the tests
> +--source include/show_binary_logs.inc
> +--echo # ***
> +# cleanup
> +DELETE FROM t1;
> +DELETE FROM t2;
> +--disconnect con1
> +--echo #
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test
> new file mode 100644
> index 00000000000..3b557bc89b8
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test
> @@ -0,0 +1,78 @@
> +# ==== Purpose ====
> +#
> +# Test verifies truncation of multiple binary logs.
> +#
> +# ==== References ====
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +--source include/have_innodb.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +
> +SET @@global.max_binlog_size= 4096;
> +
> +RESET MASTER;
> +FLUSH LOGS;
> +CREATE TABLE ti (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE tm (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=MyISAM;
> +
> +connect(master1,localhost,root,,);
> +--echo "List of binary logs before rotation"
> +--source include/show_binary_logs.inc
> +
> +# Some load to both non-transactional and transactional engines
> +# that should not affect the following recovery:
> +INSERT INTO ti VALUES(1,"I am gonna survive");
> +INSERT INTO tm VALUES(1,"me too!");
> +
> +# hold on near engine commit
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR con1_go";
> +--send_eval INSERT INTO ti VALUES (2, REPEAT("x", 4100))
> +
> +connect(master2,localhost,root,,);
> +# The 2nd trx for recovery, it does not rotate binlog
> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master2_ready WAIT_FOR master2_go";
> +--send_eval INSERT INTO ti VALUES (3, "not gonna survive")
send_eval? what are you evaluating here?
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +--echo "List of binary logs before crash"
> +--source include/show_binary_logs.inc
> +--echo # The gtid binlog state prior to the crash will be truncated at the end of the test
> +SELECT @@global.gtid_binlog_state;
> +
> +--connection default
> +--source include/kill_mysqld.inc
> +--disconnect master1
> +--disconnect master2
> +
> +#
> +# Server restart
> +#
> +--let $restart_parameters= --rpl-semi-sync-slave-enabled=1
> +--source include/start_mysqld.inc
> +
> +# Check error log for a successful truncate message.
> +let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
> +
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_PATTERN=truncated binlog file:.*master.*000002
> +--replace_regex /FOUND [0-9]+/FOUND #/
can it be found multiple times? Why would binlog be truncated more than once?
> +--source include/search_pattern_in_file.inc
> +
> +
> +--echo "One record should be present in table"
> +SELECT * FROM ti;
> +
> +--echo # The truncated gtid binlog state
> +SELECT @@global.gtid_binlog_state;
> +SELECT @@global.gtid_binlog_pos;
> +
> +--echo # Cleanup
> +DROP TABLE ti;
> +
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test
> new file mode 100644
> index 00000000000..38a9c0832f4
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test
> @@ -0,0 +1,120 @@
> +# ==== Purpose ====
> +# The test verifies an attempt to recover by a semisync slave server whose
> +# binlog is unsafe for truncation.
> +#
> +# ==== Implementation ====
> +# 2 binlog files are created with the 1st one destined to be the binlog
> +# checkpoint file for recovery.
> +# The final group of events is replication unsafe (myisam INSERT).
> +# Therefore the semisync slave recovery may not succeed.
> +#
> +# Steps:
> +# 0 - Set max_binlog_size= 4096, to help an insert into a
> +# transaction table 'ti' get binlog rotated while the
> +# transaction won't be committed, being stopped at
> +# a prior to commit debug_sync point
> +# 1 - insert into a non-transactional 'tm' table completes with
> +# binary logging as well
> +# 2 - kill and attempt to restart the server as semisync slave that
> +# must produce an expected unsafe-to-recover error
> +# 3 - complete the test with a normal restart that successfully finds and
> +# commits the transaction in doubt.
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +#
> +
> +--source include/have_innodb.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +SET @@global.max_binlog_size= 4096;
> +
> +call mtr.add_suppression("Table '.*tm' is marked as crashed and should be repaired");
> +call mtr.add_suppression("Got an error from unknown thread");
> +call mtr.add_suppression("Checking table: '.*tm'");
> +call mtr.add_suppression("Recovering table: '.*tm'");
> +call mtr.add_suppression("Cannot trim the binary log to file");
> +call mtr.add_suppression("Crash recovery failed");
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +call mtr.add_suppression("Found 1 prepared transactions");
> +call mtr.add_suppression("mysqld: Table.*tm.*is marked as crashed");
> +call mtr.add_suppression("Checking table.*tm");
> +
> +RESET MASTER;
> +FLUSH LOGS;
> +CREATE TABLE ti (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE tm (f INT) ENGINE=MYISAM;
> +
> +--let $row_count = 5
> +--let $i = `select $row_count-2`
> +--disable_query_log
> +while ($i)
> +{
> + --eval INSERT INTO ti VALUES ($i, REPEAT("x", 1))
> + --dec $i
> +}
> +--enable_query_log
> +INSERT INTO tm VALUES(1);
> +
> +connect(master1,localhost,root,,);
> +connect(master2,localhost,root,,);
> +connect(master3,localhost,root,,);
> +
> +--connection master1
> +
> +# The 1st trx binlogs, rotate binlog and hold on before committing at engine
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR master1_go";
> +--send_eval INSERT INTO ti VALUES ($row_count - 1, REPEAT("x", 4100))
> +
> +--connection master2
> +
> +# The 2nd trx for recovery, it does not rotate binlog
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master2_ready WAIT_FOR master2_go";
> +--send_eval INSERT INTO ti VALUES ($row_count, REPEAT("x", 1))
> +
> +--connection master3
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +SET DEBUG_SYNC= "commit_before_get_LOCK_after_binlog_sync SIGNAL master3_ready";
> +--send INSERT INTO tm VALUES (2)
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR master3_ready";
> +--echo # The gtid binlog state prior to the crash must be restored at the end of the test;
> +SELECT @@global.gtid_binlog_state;
> +--source include/kill_mysqld.inc
> +
> +#
> +# Server restarts
> +#
> +--echo # Failed restart as the semisync slave
> +--error 1
> +--exec $MYSQLD_LAST_CMD --rpl-semi-sync-slave-enabled=1 >> $MYSQLTEST_VARDIR/log/mysqld.1.err 2>&1
> +
> +--echo # Normal restart
> +--source include/start_mysqld.inc
> +
> +# Check error log for correct messages.
> +let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
> +
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_PATTERN=Cannot trim the binary log to file
> +--replace_regex /FOUND [0-9]+/FOUND #/
can it be found multiple times? Why would binlog be truncated more than once?
> +--source include/search_pattern_in_file.inc
> +
> +--echo # Proof that the in-doubt transactions are recovered by the 2nd normal server restart
> +--eval SELECT COUNT(*) = $row_count as 'True' FROM ti
> +# myisam table may require repair (which is not tested here)
> +--disable_warnings
> +SELECT COUNT(*) <= 1 FROM tm;
> +--enable_warnings
> +
> +--echo # The gtid binlog state prior to the crash is restored now
> +SELECT @@GLOBAL.gtid_binlog_state;
> +SELECT @@GLOBAL.gtid_binlog_pos;
> +
> +--echo # Cleanup
> +DROP TABLE ti, tm;
> +--echo End of test
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf
> new file mode 100644
> index 00000000000..f8312bdc5b8
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf
> @@ -0,0 +1,11 @@
> +!include suite/rpl/rpl_1slave_base.cnf
> +!include include/default_client.cnf
> +
> +
> +[mysqld.1]
> +log-slave-updates
> +gtid-strict-mode=1
> +
> +[mysqld.2]
> +log-slave-updates
> +gtid-strict-mode=1
generally opt files (rpl_semi_sync_fail_over.opt in this case) are preferred,
because mtr will know what options to apply, while cnf files are more opaque
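e.g. something along these lines (a sketch; for a two-server rpl test that
would usually be the -master.opt and -slave.opt pair):

```
--log-slave-updates --gtid-strict-mode=1
```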
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> new file mode 100644
> index 00000000000..a8b40d6ed05
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> @@ -0,0 +1,143 @@
> +# ==== Purpose ====
> +#
> +# Test verifies replication failover scenario.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Having two servers 1 and 2 enable semi-sync replication with
> +# with the master wait 'after_sync'.
> +# 1 - Insert a row. While inserting second row simulate
> +# a server crash at once the transaction is written to binlog, flushed
> +# and synced but the binlog position is not updated.
> +# 2 - Post crash-recovery on the old master execute there CHANGE MASTER
> +# TO command to connect to server id 2.
> +# 3 - The old master new slave server 1 must connect to the new
> +# master server 2.
> +# 4 - repeat the above to crash the new master and restore the old one in the master role
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +
> +--source include/have_innodb.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +--let $rpl_topology=1->2
> +--source include/rpl_init.inc
why not source master-slave.inc if you're using a standard master-slave
topology anyway?
> +
> +--connection server_2
> +--source include/stop_slave.inc
> +
> +--connection server_1
> +RESET MASTER;
> +SET @@global.max_binlog_size= 4096;
> +
> +--connection server_2
> +RESET MASTER;
> +SET @@global.max_binlog_size= 4096;
> +set @@global.rpl_semi_sync_slave_enabled = 1;
> +set @@global.gtid_slave_pos = "";
> +CHANGE MASTER TO master_use_gtid= slave_pos;
> +--source include/start_slave.inc
> +
> +
> +--connection server_1
> +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
here I asked "why?" and you replied
Actually not need to. There's no crash in the middle of a slave
transaction.
So it must be a copy-paste leftover.
[x]
> +set @@global.rpl_semi_sync_master_enabled = 1;
> +set @@global.rpl_semi_sync_master_wait_point=AFTER_SYNC;
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +call mtr.add_suppression("1 client is using or hasn.t closed the table properly");
> +call mtr.add_suppression("Table './mtr/test_suppressions' is marked as crashed and should be repaired");
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +INSERT INTO t1 VALUES (1, 'dummy1');
> +
> +#
> +# CRASH the original master, and FAILOVER to the new
> +#
> +
> +# value 1 for server id 1 -> 2 failover
> +--let $failover_to_slave=1
> +--let $query_to_crash= INSERT INTO t1 VALUES (2, REPEAT("x", 4100))
> +--let $log_search_pattern=truncated binlog file:.*master.*000001
> +--source rpl_semi_sync_crash.inc
> +
> +--connection server_2
> +--let $rows_so_far=3
> +--eval INSERT INTO t1 VALUES ($rows_so_far, 'dummy3')
> +--save_master_pos
> +--echo # The gtid state on current master must be equal to ...
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_1
> +--sync_with_master
> +--eval SELECT COUNT(*) = $rows_so_far as 'true' FROM t1
> +--echo # ... the gtid states on the slave:
> +SHOW VARIABLES LIKE 'gtid_slave_pos';
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_2
> +#
> +# CRASH the new master and FAILOVER back to the original
> +#
> +
> +# value 0 for the reverse server id 2 -> 1 failover
> +--let $failover_to_slave=0
> +--let $query_to_crash = INSERT INTO t1 VALUES (4, REPEAT("x", 4100))
> +--let $query2_to_crash= INSERT INTO t1 VALUES (5, REPEAT("x", 4100))
> +--let $log_search_pattern=truncated binlog file:.*slave.*000001
> +--source rpl_semi_sync_crash.inc
> +
> +--connection server_1
> +--let $rows_so_far=6
> +--eval INSERT INTO t1 VALUES ($rows_so_far, 'Done')
> +--save_master_pos
> +--echo # The gtid state on current master must be equal to ...
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_2
> +--sync_with_master
> +--eval SELECT COUNT(*) = $rows_so_far as 'true' FROM t1
> +--echo # ... the gtid states on the slave:
> +SHOW VARIABLES LIKE 'gtid_slave_pos';
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +
> +--let $diff_tables=server_1:t1, server_2:t1
> +--source include/diff_tables.inc
> +
> +#
> +--echo # Cleanup
> +#
> +--connection server_1
> +DROP TABLE t1;
> +--save_master_pos
> +
> +--connection server_2
> +--sync_with_master
> +--source include/stop_slave.inc
> +
> +--connection server_1
> +set @@global.rpl_semi_sync_master_enabled = 0;
> +set @@global.rpl_semi_sync_slave_enabled = 0;
> +set @@global.rpl_semi_sync_master_wait_point=default;
> +RESET SLAVE;
> +RESET MASTER;
> +
> +--connection server_2
> +set @@global.rpl_semi_sync_master_enabled = 0;
> +set @@global.rpl_semi_sync_slave_enabled = 0;
> +set @@global.rpl_semi_sync_master_wait_point=default;
> +
> +evalp CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_use_gtid=no;
> +--source include/start_slave.inc
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--source include/rpl_end.inc
> diff --git a/sql/handler.h b/sql/handler.h
> index fc69d9423b4..05a62ed0021 100644
> --- a/sql/handler.h
> +++ b/sql/handler.h
> @@ -873,6 +874,15 @@ typedef struct xid_t XID;
> /* The 'buf' has to have space for at least SQL_XIDSIZE bytes. */
> uint get_sql_xid(XID *xid, char *buf);
>
> +/* struct for semisync slave binlog truncate recovery */
> +struct xid_recovery_member
> +{
> + my_xid xid;
> + uint in_engine_prepare; // number of engines that have xid prepared
> + bool decided_to_commit;
> + std::pair<uint, my_off_t> binlog_coord; // semisync recovery binlog offset
wouldn't it be clearer to have a struct with named members?
in fact, I'm somewhat surprised there's no such struct for binlog coords
already.
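something like this would make the call sites self-documenting (a sketch
with hypothetical names, keeping the same ordering semantics as std::pair):

```cpp
#include <cstdint>

// Sketch (names are hypothetical): binlog coordinates as a named struct
// instead of std::pair<uint, my_off_t>, so recovery code can say
// coord.file_no / coord.offset instead of .first / .second.
struct Binlog_coord
{
  uint32_t file_no;  // ordinal of the binlog file (e.g. from its name)
  uint64_t offset;   // byte offset within that file

  // Same lexicographic ordering that std::pair provides.
  bool operator<(const Binlog_coord &o) const
  {
    return file_no != o.file_no ? file_no < o.file_no : offset < o.offset;
  }
};
```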
> +};
> +
> /* for recover() handlerton call */
> #define MIN_XID_LIST_SIZE 128
> #define MAX_XID_LIST_SIZE (1024*128)
> @@ -4820,7 +4830,8 @@ int ha_commit_one_phase(THD *thd, bool all);
> int ha_commit_trans(THD *thd, bool all);
> int ha_rollback_trans(THD *thd, bool all);
> int ha_prepare(THD *thd);
> -int ha_recover(HASH *commit_list);
> +int ha_recover(HASH *commit_list, MEM_ROOT *mem_root= NULL);
> +uint ha_recover_complete(HASH *commit_list, std::pair<uint, my_off_t> *coord= NULL);
is coord a truncation position?
>
> /* transactions: these functions never call handlerton functions directly */
> int ha_enable_transaction(THD *thd, bool on);
> diff --git a/sql/log_event.h b/sql/log_event.h
> index 8a342cb5cd3..1036e9a44d4 100644
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -482,6 +482,16 @@ class String;
> */
> #define LOG_EVENT_IGNORABLE_F 0x80
>
> +/**
> + @def LOG_EVENT_ACCEPT_OWN_F
> +
> + Flag set by the semisync slave for accepting
> + the same server_id ("own") events which the slave must not have
> + in its state. Typically such events were never committed by
> + their originator (this server) and were discarded at its semisync-slave recovery.
> +*/
> +#define LOG_EVENT_ACCEPT_OWN_F 0x4000
maybe add an assert on all received events that such a flag is not set?
it can only be set on events in a relay log.
also, consider the case when this event is read from a relay log, applied,
and then sent to further slaves. In this case this flag must be removed
before sending, otherwise they might mistakenly apply it if the server_id
matches.
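i.e. something along these lines at the send path (a sketch; the function
name is mine, only the flag value mirrors the patch):

```cpp
#include <cstdint>

// Hypothetical sketch: a relay-log-only flag must be stripped before an
// event is forwarded to downstream slaves, so it never goes over the wire.
static const uint16_t LOG_EVENT_ACCEPT_OWN_F_SKETCH = 0x4000;

// Compute the flags word actually written into the outgoing event packet.
uint16_t flags_for_sending(uint16_t flags)
{
  return flags & static_cast<uint16_t>(~LOG_EVENT_ACCEPT_OWN_F_SKETCH);
}
```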
> +
> /**
> @def LOG_EVENT_SKIP_REPLICATION_F
>
> @@ -3357,6 +3367,12 @@ class Gtid_log_event: public Log_event
> uint64 commit_id;
> uint32 domain_id;
> uchar flags2;
> + uint flags_extra; // more flags area placed after the regular flags2's one
> + /*
> + Extra to a "base" engine recoverable engines participating
> + in the transaction. Zero, when the base engine only is present.
what's a "base engine"?
> + */
> + uint8 extra_engines;
>
> /* Flags2. */
>
> diff --git a/sql/handler.cc b/sql/handler.cc
> index c0a810a72bc..a46cef6b64c 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1637,9 +1672,17 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS *trans, bool is_real_trans)
> DEBUG_SYNC(thd, "commit_one_phase_2");
> if (ha_info)
> {
> + int err;
> +
> + if (has_binlog_hton(ha_info) &&
can you replace has_binlog_hton() with, like, if trx cache is not empty or
binlog enabled or something like that?
> + (err= binlog_commit(thd, all,
> + is_ro_1pc_trans(thd, ha_info, all, is_real_trans))))
> + {
> + my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> + error= 1;
> + }
> for (; ha_info; ha_info= ha_info_next)
> {
> - int err;
> handlerton *ht= ha_info->ht();
> if ((err= ht->commit(ht, thd, all)))
> {
> @@ -1962,8 +2008,177 @@ struct xarecover_st
> XID *list;
> HASH *commit_list;
> bool dry_run;
> + MEM_ROOT *mem_root;
> + bool error;
> };
>
> +/**
> + Inserts a new hash member.
> +
> + returns a successfully created and inserted @c xid_recovery_member
> + into hash @c hash_arg,
> + or NULL.
> +*/
> +static xid_recovery_member*
> +xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root)
> +{
> + xid_recovery_member *member= (xid_recovery_member*)
> + alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
> + if (!member)
> + return NULL;
> +
> + member->xid= xid_arg;
> + member->in_engine_prepare= 1;
> + member->decided_to_commit= false;
> +
> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> +}
> +
> +/*
> + Inserts a new or updates an existing hash member to increment
> + the member's prepare counter.
> +
> + returns false on success,
> + true otherwise.
> +*/
> +static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
> + MEM_ROOT *ptr_mem_root)
> +{
> + xid_recovery_member* member;
> + if ((member= (xid_recovery_member *)
> + my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
> + member->in_engine_prepare++;
> + else
> + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root);
> +
> + return member == NULL;
> +}
> +
> +/*
> + Decision to commit returns true, otherwise false for rollback.
> + Flagged to commit member is destined to commit. If it is in doubt in case
> + A. the caller does not specify coord_ptr (always so in the normal recovery), or
> + B. coord_ptr is not NULL (can only be so in the semisync slave case) and its
> + offset is greater than that of the member's the decision is rollback.
> + If both A,B do not hold - which is the semisync slave recovery case -
> + the decision is to rollback.
> +*/
> +static bool xarecover_decide(xid_recovery_member* member,
> + xid_t x, std::pair<uint, my_off_t> *coord_ptr)
> +{
> + return
> + member->decided_to_commit ? true :
> + !coord_ptr ? false :
> + (member->binlog_coord < *coord_ptr ? // semisync slave recovery
> + true : false);
> +}
> +
> +struct xarecover_iterate_arg
> +{
> + handlerton *hton;
> + std::pair<uint, my_off_t> *binlog_coord;
> +};
> +
> +/*
> + Hash iterate function to complete with commit or rollback as either
> + has been decided already or decide now (in the semisync recovery)
> + via comparison against passed offset.
> + Commit when the offset is greater than that of the member.
> +*/
> +static my_bool xarecover_do_commit_or_rollback(void *member_arg,
> + void *iter_arg)
> +{
> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> + handlerton *hton= ((xarecover_iterate_arg*) iter_arg)->hton;
> + std::pair<uint, my_off_t> *max_coord_ptr=
> + ((xarecover_iterate_arg*) iter_arg)->binlog_coord;
> + xid_t x;
> + my_bool rc;
> +
> + x.set(member->xid);
> +
> + rc= xarecover_decide(member, x, max_coord_ptr) ?
> + hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);
> +
> + DBUG_ASSERT(rc || member->in_engine_prepare > 0);
> +
> + if (!rc)
> + {
> + /*
> + This block relies on Engine to report XAER_NOTA at
> + "complete"_by_xid for unknown xid.
> + */
> + member->in_engine_prepare--;
> + if (global_system_variables.log_warnings > 2)
> + sql_print_warning("%s transaction with xid %llu",
Maybe not sql_print_warning, but sql_print_information?
It's just an informational message.
> + member->decided_to_commit ?
> + "Committed" : "Rolled back", (ulonglong) member->xid);
> + }
> +
> + return false;
> +}
> +
> +static my_bool xarecover_do_count_in_prepare(void *member_arg,
> + void *ptr_count)
> +{
> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> + if (member->in_engine_prepare)
> + {
> + (*(uint*) ptr_count)++;
> + if (global_system_variables.log_warnings > 2)
> + sql_print_warning("Found prepared transaction with xid %llu",
> + (ulonglong) member->xid);
> + }
> +
> + return false;
> +}
> +
> +struct xarecover_complete_arg
> +{
> + HASH *commit_list;
> + std::pair<uint, my_off_t> *binlog_coord;
> +};
> +
> +/*
> + Completes binlog recovery to invoke a decider function for
> + each transaction in doubt.
> +*/
> +static my_bool xarecover_binlog_handlerton(THD *unused,
> + plugin_ref plugin,
> + void *arg)
> +{
> + handlerton *hton= plugin_hton(plugin);
> +
> + if (hton->state == SHOW_OPTION_YES && hton->recover)
> + {
> + xarecover_iterate_arg iter_arg=
> + {
> + hton,
> + ((xarecover_complete_arg*) arg)->binlog_coord
> + };
> + my_hash_iterate(((xarecover_complete_arg*) arg)->commit_list,
> + xarecover_do_commit_or_rollback, &iter_arg);
> + }
> +
> + return FALSE;
> +}
> +
> +/*
> + Completes binlog recovery to invoke decider functions for
> + each handerton.
> + Returns the number of transactions remained doubtful.
> +*/
> +uint ha_recover_complete(HASH *commit_list, std::pair<uint, my_off_t> *coord)
> +{
> + uint count= 0;
> + xarecover_complete_arg complete_arg= { commit_list, coord };
> + plugin_foreach(NULL, xarecover_binlog_handlerton,
> + MYSQL_STORAGE_ENGINE_PLUGIN, &complete_arg);
> + my_hash_iterate(commit_list, xarecover_do_count_in_prepare, &count);
Wouldn't it be cleaner to do everything in one commit_list scan?
For every xid_recovery_member:
  run plugin_foreach, commit or rollback as needed
  increment a counter, if still in doubt
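The single-scan shape suggested above could look roughly like this (a sketch with hypothetical types, not the server's real xid_recovery_member or plugin_foreach machinery): walk the recovery map once, complete each XID in every engine, and count the still-in-doubt transactions in the same pass.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for xid_recovery_member.
struct Member { unsigned in_engine_prepare; bool decided_to_commit; };

// Stand-in for an engine's commit_by_xid/rollback_by_xid; by the same
// convention as the patch, returns false on success.
using CompleteFn = std::function<bool(uint64_t xid, bool commit)>;

unsigned recover_single_scan(std::unordered_map<uint64_t, Member> &members,
                             const std::vector<CompleteFn> &engines)
{
  unsigned in_doubt = 0;
  for (auto &it : members)            // one scan over the commit list
  {
    Member &m = it.second;
    for (const CompleteFn &complete : engines)
      if (!complete(it.first, m.decided_to_commit))
        m.in_engine_prepare--;        // engine acknowledged the XID
    if (m.in_engine_prepare)          // still prepared somewhere
      in_doubt++;
  }
  return in_doubt;
}
```

This folds the separate xarecover_do_commit_or_rollback and xarecover_do_count_in_prepare iterations into one loop body.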
> +
> + return count;
> +}
> +
> static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> void *arg)
> {
> @@ -1973,6 +2188,9 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
>
> if (hton->state == SHOW_OPTION_YES && hton->recover)
> {
> +#ifndef DBUG_OFF
> + my_xid dbug_xid_list[128] __attribute__((unused)) = {0};
> +#endif
What do you use it for?
> while ((got= hton->recover(hton, info->list, info->len)) > 0 )
> {
> sql_print_information("Found %d prepared transaction(s) in %s",
> diff --git a/sql/log.cc b/sql/log.cc
> index 8073f09ab88..a90d1e757e8 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -2050,14 +2058,17 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
> Otherwise, we accumulate the changes.
> */
> if (likely(!error) && ending_trans(thd, all))
> + {
> + cache_mngr->ro_1pc= ro_1pc;
> error= binlog_commit_flush_trx_cache(thd, all, cache_mngr);
> + cache_mngr->ro_1pc= false;
Why do you put it in cache_mngr, instead of passing it down, like you pass `all`?
> + }
>
> /*
> This is part of the stmt rollback.
> */
> if (!all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
> -
> THD_STAGE_INFO(thd, org_stage);
> DBUG_RETURN(error);
> }
> @@ -9609,6 +9626,147 @@ int TC_LOG::using_heuristic_recover()
> /****** transaction coordinator log for 2pc - binlog() based solution ******/
> #define TC_LOG_BINLOG MYSQL_BIN_LOG
>
> +/**
> + Truncates the current binlog to specified position. Removes the rest of binlogs
> + which are present after this binlog file.
> +
> + @param truncate_file Holds the binlog name to be truncated
> + @param truncate_pos Position within binlog from where it needs to
> + truncated.
> +
> + @retval true ok
> + @retval false error
> +
> +*/
> +bool MYSQL_BIN_LOG::truncate_and_remove_binlogs(const char *file_name,
> + my_off_t pos,
> + rpl_gtid *ptr_gtid)
> +{
> + int error= 0;
> +#ifdef HAVE_REPLICATION
> + LOG_INFO log_info;
> + THD *thd= current_thd;
> + my_off_t index_file_offset= 0;
> + File file= -1;
> + MY_STAT s;
> +
> + if ((error= find_log_pos(&log_info, file_name, 1)))
> + {
> + sql_print_error("Failed to locate binary log file:%s."
> + "Error:%d", file_name, error);
> + goto end;
> + }
> +
> + while (!(error= find_next_log(&log_info, 1)))
> + {
> + if (!index_file_offset)
> + {
> + index_file_offset= log_info.index_file_start_offset;
> + if ((error= open_purge_index_file(TRUE)))
> + {
> + sql_print_error("Failed to open purge index "
> + "file:%s. Error:%d", purge_index_file_name, error);
> + goto end;
> + }
> + }
> + if ((error= register_purge_index_entry(log_info.log_file_name)))
> + {
> + sql_print_error("Failed to copy %s to purge index"
> + " file. Error:%d", log_info.log_file_name, error);
> + goto end;
> + }
> + }
> +
> + if (error != LOG_INFO_EOF)
> + {
> + sql_print_error("Failed to find the next binlog to "
> + "add to purge index register. Error:%d", error);
> + goto end;
> + }
> +
> + if (is_inited_purge_index_file())
> + {
> + if (!index_file_offset)
> + index_file_offset= log_info.index_file_start_offset;
> +
> + if ((error= sync_purge_index_file()))
> + {
> + sql_print_error("Failed to flush purge index "
> + "file. Error:%d", error);
> + goto end;
> + }
> +
> + // Trim index file
> + if ((error=
> + mysql_file_chsize(index_file.file, index_file_offset, '\n',
> + MYF(MY_WME))) ||
> + (error=
> + mysql_file_sync(index_file.file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Failed to trim binlog index "
> + "file:%s to offset:%llu. Error:%d", index_file_name,
> + index_file_offset, error);
> + goto end;
> + }
> +
> + /* Reset data in old index cache */
> + if ((error= reinit_io_cache(&index_file, READ_CACHE, (my_off_t) 0, 0, 1)))
> + {
> + sql_print_error("Failed to reinit binlog index "
> + "file. Error:%d", error);
> + goto end;
> + }
> +
> + /* Read each entry from purge_index_file and delete the file. */
> + if ((error= purge_index_entry(thd, NULL, TRUE)))
> + {
> + sql_print_error("Failed to process registered "
> + "files that would be purged.");
> + goto end;
> + }
> + }
> +
> + DBUG_ASSERT(pos);
> +
> + if ((file= mysql_file_open(key_file_binlog, file_name,
> + O_RDWR | O_BINARY, MYF(MY_WME))) < 0)
> + {
> + error= 1;
> + sql_print_error("Failed to open binlog file:%s for "
> + "truncation.", file_name);
> + goto end;
> + }
> + my_stat(file_name, &s, MYF(0));
> +
> + /* Change binlog file size to truncate_pos */
> + if ((error=
> + mysql_file_chsize(file, pos, 0, MYF(MY_WME))) ||
> + (error= mysql_file_sync(file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Failed to trim the "
> + "binlog file:%s to size:%llu. Error:%d",
> + file_name, pos, error);
> + goto end;
> + }
> + else
> + {
> + char buf[21];
> +
> + longlong10_to_str(ptr_gtid->seq_no, buf, 10);
> + sql_print_information("Successfully truncated binlog file:%s "
> + "to pos:%llu to remove transactions starting from "
> + "GTID %u-%u-%s", file_name, pos,
> + ptr_gtid->domain_id, ptr_gtid->server_id, buf);
> + }
> +
> +end:
> + if (file >= 0)
> + mysql_file_close(file, MYF(MY_WME));
Why don't you clear the in-use flag here? You used to do it in the previous version of the patch.
> +
> + error= error || close_purge_index_file();
> +#endif
> + return error > 0;
> +}
> int TC_LOG_BINLOG::open(const char *opt_name)
> {
> int error= 1;
> @@ -10215,34 +10914,50 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> end_io_cache(&log);
> mysql_file_close(file, MYF(MY_WME));
> file= -1;
> + /*
> + NOTE: reading other binlog's FD is necessary for finding out
> + the checksum status of the respective binlog file.
> + */
Okay, but where do you read the other binlog's FD?
In the previous patch I reviewed you had
case FORMAT_DESCRIPTION_EVENT: read FD and replace fdle.
Now you don't have it anymore.
> + if (find_next_log(linfo, 1))
> + {
> + sql_print_error("Error reading binlog files during recovery. "
> + "Aborting.");
> + goto err2;
> + }
> }
>
> +#ifdef HAVE_REPLICATION
> + int rc= ctx.next_binlog_or_round(round, last_log_name,
> + binlog_checkpoint_name, linfo, this);
> + if (rc == -1)
> + goto err2;
> + else if (rc == 1)
> + break; // all rounds done
> +#else
> if (!strcmp(linfo->log_file_name, last_log_name))
> break; // No more files to do
> + round++;
> +#endif
> +
> if ((file= open_binlog(&log, linfo->log_file_name, &errmsg)) < 0)
> {
> sql_print_error("%s", errmsg);
> goto err2;
> }
> - /*
> - We do not need to read the Format_description_log_event of other binlog
> - files. It is not possible for a binlog checkpoint to span multiple
> - binlog files written by different versions of the server. So we can use
> - the first one read for reading from all binlog files.
> - */
> - if (find_next_log(linfo, 1))
> - {
> - sql_print_error("Error reading binlog files during recovery. Aborting.");
> - goto err2;
> - }
> fdle->reset_crypto();
> - }
> + } // end of for
>
> if (do_xa)
> {
> - if (ha_recover(&xids))
> - goto err2;
> -
> + if (binlog_checkpoint_found)
> + {
> +#ifndef HAVE_REPLICATION
> + if (ha_recover_complete(&xids))
So, ha_recover_complete() is for the no-semisync, no-replication case?
Basically it should be the old behavior, exactly as before?
Why do you need ha_recover_complete() then, if it didn't exist before?
> +#else
> + if (ctx.complete(this, xids))
> +#endif
> + goto err2;
> + }
> free_root(&mem_root, MYF(0));
> my_hash_free(&xids);
> }
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
Re: [Maria-developers] 57c19902326: MDEV-20017 Implement TO_CHAR() Oracle compatible function
by Sergei Golubchik 22 Apr '21
Hi, Monty!
On Apr 13, Michael Widenius wrote:
> commit 57c19902326
> Author: Michael Widenius <michael.widenius@gmail.com>
> Date: Sun Jan 24 23:56:43 2021 +0200
>
> diff --git a/sql/share/errmsg-utf8.txt b/sql/share/errmsg-utf8.txt
> index aecb00563f7..b23522ac830 100644
> --- a/sql/share/errmsg-utf8.txt
> +++ b/sql/share/errmsg-utf8.txt
> @@ -7975,3 +7975,5 @@ ER_PK_INDEX_CANT_BE_IGNORED
> eng "A primary key cannot be marked as IGNORE"
> ER_FUNCTION_CANNOT_BE_USED_IN_CLAUSE
> eng "Function '%s' cannot be used in the %s clause"
> +ER_ORACLE_COMPAT_FUNCTION_ERROR
> + eng "Oracle compatibility function error: %s"
Why? We normally just say "invalid argument" or something; in no other case
do we say "oracle compatibility function" or "sybase compatibility function"
or "odbc compatibility function".
> diff --git a/sql/sql_string.cc b/sql/sql_string.cc
> index 95a57017c53..9c57bb22085 100644
> --- a/sql/sql_string.cc
> +++ b/sql/sql_string.cc
> @@ -1275,3 +1275,15 @@ void Binary_string::shrink(size_t arg_length)
> }
> }
> }
> +
> +bool Binary_string::strfill(char fill, size_t len)
> +{
> + if (len)
> + {
> + if (alloc(length() + len))
> + return 1;
> + memset(Ptr + str_length, fill, len);
> + str_length+= (uint32) len;
> + }
> + return 0;
> +}
There's Binary_string::fill() already.
Better use it or, at the very least, declare
bool strfill(char fill, size_t len) { return fill(str_length + len, fill); }
in sql_string.h. And this swapped order of arguments is confusing,
please fix it too.
In fact, I think it's confusing to have both:
fill(max_len, fill_char)
strfill(fill_char, fill_length)
If you want to keep both, it'd be better to rename them to something
that somehow reflects the difference, for example:
strfill -> append_many or append_repeated
But really, I personally would just delete strfill.
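To illustrate why the pair is confusing, here are the two semantics modeled on std::string (a sketch, not the actual Binary_string code): fill() pads up *to* a target length, while strfill() appends a *count* of characters, and the argument order is swapped between them.

```cpp
#include <cassert>
#include <string>

// fill(max_len, fill_char): pad the string up to max_len characters.
void fill_to(std::string &s, size_t max_len, char fill_char)
{
  if (s.size() < max_len)
    s.append(max_len - s.size(), fill_char);
}

// strfill(fill_char, fill_length): append fill_length characters,
// i.e. equivalent to fill_to(s, s.size() + fill_length, fill_char).
void strfill(std::string &s, char fill_char, size_t fill_length)
{
  s.append(fill_length, fill_char);
}
```

Given the same arguments (5 and ' '), the first produces a string of total length 5, the second appends 5 characters regardless of the current length — exactly the trap the review warns about.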
> diff --git a/sql/item_timefunc.h b/sql/item_timefunc.h
> index af266956b05..9b78d6c159e 100644
> --- a/sql/item_timefunc.h
> +++ b/sql/item_timefunc.h
> @@ -978,6 +978,57 @@ class Item_func_time_format: public Item_func_date_format
> };
>
>
> +/* the max length of datetime format models string in Oracle is 144 */
> +#define MAX_DATETIME_FORMAT_MODEL_LEN 144
> +
> +class Item_func_tochar :public Item_str_func
> +{
> + const MY_LOCALE *locale;
> + THD *thd;
> + String warning_message;
> + bool fixed_length;
> +
> + /*
> + When datetime format models is parsed, use uint16 integers to
> + represent the format models and store in fmt_array.
> + */
> + uint16 fmt_array[MAX_DATETIME_FORMAT_MODEL_LEN+1];
> +
> + bool check_arguments() const override
> + {
> + return check_argument_types_can_return_text(1, arg_count);
> + }
> +
> +public:
> + Item_func_tochar(THD *thd, Item *a, Item *b):
> + Item_str_func(thd, a, b), locale(0)
> + {
> + /* NOTE: max length of warning message is 64 */
> + warning_message.alloc(64);
> + warning_message.length(0);
> + }
As far as I understand, this warning_message was introduced to issue
the same error for every row, even if the format string is const_item and
is parsed only once, in fix_fields.
I don't think it's worth the trouble. Two simpler approaches are:
* if the format string is invalid - parse it every time even if const, or
* if the const format string is invalid - issue the error only once.
so, please, remove warning_message and just use push_warning or my_error
where the error is discovered.
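The second simpler approach (report the error for a const format only once) can be sketched like this, with hypothetical names standing in for the real format parser and the my_error()/push_warning call:

```cpp
#include <cassert>
#include <string>

// Hypothetical per-item state for a const format argument.
struct FormatState
{
  bool parsed = false;
  bool invalid = false;
  bool error_reported = false;
};

// Returns true when the caller should raise the error now; after the
// first report, subsequent rows stay silent instead of re-issuing the
// same error per row.
bool should_report(FormatState &st, const std::string &fmt)
{
  if (!st.parsed)
  {
    st.parsed = true;
    st.invalid = fmt.empty();   // stand-in for the real format parser
  }
  if (st.invalid && !st.error_reported)
  {
    st.error_reported = true;   // my_error()/push_warning would go here
    return true;
  }
  return false;
}
```

This removes the need to keep a preformatted warning_message around in the Item.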
> + ~Item_func_tochar() { warning_message.free(); }
> + String *val_str(String *str) override;
> + LEX_CSTRING func_name_cstring() const override
> + {
> + static LEX_CSTRING name= {STRING_WITH_LEN("to_char") };
> + return name;
> + }
> + bool fix_length_and_dec() override;
> + bool parse_format_string(const String *format, uint *fmt_len);
> +
> + bool check_vcol_func_processor(void *arg) override
> + {
> + if (arg_count > 2)
> + return false;
> + return mark_unsupported_function(func_name(), "()", arg, VCOL_SESSION_FUNC);
> + }
> +
> + Item *get_copy(THD *thd) override
> + { return get_item_copy<Item_func_tochar>(thd, this); }
> +};
> +
> +
> class Item_func_from_unixtime :public Item_datetimefunc
> {
> bool check_arguments() const override
> diff --git a/sql/item_timefunc.cc b/sql/item_timefunc.cc
> index 04d913b0fca..44d2ec7912d 100644
> --- a/sql/item_timefunc.cc
> +++ b/sql/item_timefunc.cc
> @@ -1914,6 +1913,805 @@ String *Item_func_date_format::val_str(String *str)
> return 0;
> }
>
> +/*
> + Oracle has many formatting models, we list all but only part of them
> + are implemented, because some models depend on oracle functions
> + which mariadb is not supported.
> +
> + Models for datetime, used by TO_CHAR/TO_DATE. Normal format characters are
> + stored as short integer < 256, while format characters are stored as a
> + integer > 256
> +*/
> +
> +#define FMT_BASE 128
128? or 256?
> +#define FMT_AD FMT_BASE+1
> +#define FMT_AD_DOT FMT_BASE+2
> +#define FMT_AM FMT_BASE+3
> +#define FMT_AM_DOT FMT_BASE+4
> +#define FMT_BC FMT_BASE+5
> +#define FMT_BC_DOT FMT_BASE+6
> +#define FMT_CC FMT_BASE+7
> +#define FMT_SCC FMT_BASE+8
> +#define FMT_D FMT_BASE+9
> +#define FMT_DAY FMT_BASE+10
> +#define FMT_DD FMT_BASE+11
> +#define FMT_DDD FMT_BASE+12
> +#define FMT_DL FMT_BASE+13
> +#define FMT_DS FMT_BASE+14
> +#define FMT_DY FMT_BASE+15
> +#define FMT_E FMT_BASE+16
> +#define FMT_EE FMT_BASE+17
> +#define FMT_FF FMT_BASE+18
> +#define FMT_FM FMT_BASE+19
> +#define FMT_FX FMT_BASE+20
> +#define FMT_HH FMT_BASE+21
> +#define FMT_HH12 FMT_BASE+22
> +#define FMT_HH24 FMT_BASE+23
> +#define FMT_IW FMT_BASE+24
> +#define FMT_I FMT_BASE+25
> +#define FMT_IY FMT_BASE+26
> +#define FMT_IYY FMT_BASE+27
> +#define FMT_IYYY FMT_BASE+28
> +#define FMT_J FMT_BASE+29
> +#define FMT_MI FMT_BASE+30
> +#define FMT_MM FMT_BASE+31
> +#define FMT_MON FMT_BASE+32
> +#define FMT_MONTH FMT_BASE+33
> +#define FMT_PM FMT_BASE+34
> +#define FMT_PM_DOT FMT_BASE+35
> +#define FMT_RM FMT_BASE+37
> +#define FMT_RR FMT_BASE+38
> +#define FMT_RRRR FMT_BASE+39
> +#define FMT_SS FMT_BASE+40
> +#define FMT_SSSSSS FMT_BASE+41
> +#define FMT_TS FMT_BASE+42
> +#define FMT_TZD FMT_BASE+43
> +#define FMT_TZH FMT_BASE+44
> +#define FMT_TZM FMT_BASE+45
> +#define FMT_TZR FMT_BASE+46
> +#define FMT_W FMT_BASE+47
> +#define FMT_WW FMT_BASE+48
> +#define FMT_X FMT_BASE+49
> +#define FMT_Y FMT_BASE+50
> +#define FMT_YY FMT_BASE+51
> +#define FMT_YYY FMT_BASE+52
> +#define FMT_YYYY FMT_BASE+53
> +#define FMT_YYYY_COMMA FMT_BASE+54
> +#define FMT_YEAR FMT_BASE+55
> +#define FMT_SYYYY FMT_BASE+56
> +#define FMT_SYEAR FMT_BASE+57
Not an enum? Not even a safe (with parentheses) #define?
An enum would be ideal here, but at the very least make these defines safe.
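An enum version of the above might look like this (values illustrative, not the patch's actual ones). The key point is that the format codes must start above 255 so that a uint16 slot in fmt_array can distinguish a code from a literal character — which is also why "128? or 256?" matters above.

```cpp
#include <cassert>

// Format-model codes for the fmt_array slots: anything <= 255 is a
// literal character, anything above is a format code.
enum fmt_code : unsigned short
{
  FMT_CODE_BASE = 256,   // strictly above any single byte
  FMT_AD_,               // 257
  FMT_AM_,               // 258
  FMT_YYYY_              // 259
  // ... remaining models continue from here
};

inline bool is_format_code(unsigned short c) { return c >= FMT_CODE_BASE; }
```

With an enum the values are parenthesization-safe by construction, and the compiler can warn on unhandled cases in the big switch.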
> +
> +
> +/**
> + Modify the quotation flag and check whether the subsequent process is skipped
Could you reword it please?
> +
> + @param cftm Character or FMT... format descriptor
> + @param quotation_flag Points to 'true' if we are inside a quoted string
> +
> + @return true If we are inside a quoted string or if we found a '"' character
> + @return false Otherwise
> +*/
> +
> +static inline bool check_quotation(uint16 cfmt, bool *quotation_flag)
> +{
> + if (cfmt == '"')
> + {
> + *quotation_flag= !*quotation_flag;
> + return true;
> + }
> + return *quotation_flag;
> +}
> +
> +#define INVALID_CHARACTER(x) (((x) >= 'A' && (x) <= 'Z') ||((x) >= '0' && (x) <= '9') || (x) >= 127 || ((x) < 32))
Why not make this static inline too?
It would be side-effect safe, with no risk of double evaluation of x.
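A static inline equivalent of the macro, as suggested — the same predicate, but the argument is evaluated exactly once:

```cpp
#include <cassert>

// Same predicate as the INVALID_CHARACTER macro: upper-case letters,
// digits, non-printable and non-ASCII bytes are not acceptable as
// pass-through separator characters.
static inline bool invalid_character(int x)
{
  return (x >= 'A' && x <= 'Z') || (x >= '0' && x <= '9') ||
         x >= 127 || x < 32;
}
```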
> +
> +
> +/**
> + Special characters are directly output in the result
> +
> + @return 0 If found not acceptable character
> + @return # Number of copied characters
> +*/
> +
> +static uint parse_special(char cfmt, const char *ptr, const char *end,
> + uint16 *array)
> +{
> + int offset= 0;
> + char tmp1;
> +
> + /* Non-printable character and Multibyte encoded characters */
> + if (INVALID_CHARACTER(cfmt))
> + return 0;
> +
> + /*
> + * '&' with text is used for variable input, but '&' with other
> + * special charaters like '|'. '*' is used as separator
> + */
> + if (cfmt == '&' && ptr + 1 < end)
> + {
> + tmp1= my_toupper(system_charset_info, *(ptr+1));
> + if (tmp1 >= 'A' && tmp1 <= 'Z')
> + return 0;
> + }
> +
> + do {
> + /*
> + Continuously store the special characters in fmt_array until non-special
> + characters appear
> + */
> + *array++= (uint16) (uchar) *ptr++;
> + offset++;
> + if (ptr == end)
> + break;
> + tmp1= my_toupper(system_charset_info, *ptr);
> + } while (!INVALID_CHARACTER(tmp1) && tmp1 != '"');
> + return offset;
> +}
> +
> +
> +/**
> + Parse the format string, convert it to an compact array and calculate the
> + length of output string
> +
> + @param format Format string
> + @param fmt_len Function will store max length of formated date string here
> +
> + @return 0 ok. fmt_len is updated
> + @return 1 error. In this case 'warning_string' is set to error message
> +*/
> +
> +bool Item_func_tochar::parse_format_string(const String *format, uint *fmt_len)
> +{
> + const char *ptr, *end;
> + uint16 *tmp_fmt= fmt_array;
> + uint tmp_len= 0;
> + int offset= 0;
> + bool quotation_flag= false;
> +
> + ptr= format->ptr();
> + end= ptr + format->length();
> +
> + if (format->length() > MAX_DATETIME_FORMAT_MODEL_LEN)
> + {
> + warning_message.append(STRING_WITH_LEN("datetime format string is too "
> + "long"));
> + return 1;
> + }
> +
> + for (; ptr < end; ptr++, tmp_fmt++)
> + {
> + uint ulen;
> + char cfmt, next_char;
> +
> + cfmt= my_toupper(system_charset_info, *ptr);
> +
> + /*
> + Oracle datetime format support text in double quotation marks like
> + 'YYYY"abc"MM"xyz"DD', When this happens, store the text and quotation
> + marks, and use the text as a separator in make_date_time_oracle.
> +
> + NOTE: the quotation mark is not print in return value. for example:
> + select TO_CHAR(sysdate, 'YYYY"abc"MM"xyzDD"') will return 2021abc01xyz11
> + */
> + if (check_quotation(cfmt, &quotation_flag))
> + {
> + *tmp_fmt= *ptr;
> + tmp_len+= 1;
> + continue;
> + }
> +
> + switch (cfmt) {
> + case 'A': // AD/A.D./AM/A.M.
> + if (ptr+1 >= end)
> + goto error;
> + next_char= my_toupper(system_charset_info, *(ptr+1));
> + if (next_char == 'D')
> + {
> + *tmp_fmt= FMT_AD;
> + ptr+= 1;
> + tmp_len+= 2;
> + }
> + else if (next_char == 'M')
> + {
> + *tmp_fmt= FMT_AM;
> + ptr+= 1;
> + tmp_len+= 2;
> + }
> + else if (next_char == '.' && ptr+3 < end && *(ptr+3) == '.')
> + {
> + if (my_toupper(system_charset_info, *(ptr+2)) == 'D')
> + {
> + *tmp_fmt= FMT_AD_DOT;
> + ptr+= 3;
> + tmp_len+= 4;
> + }
> + else if (my_toupper(system_charset_info, *(ptr+2)) == 'M')
> + {
> + *tmp_fmt= FMT_AM_DOT;
> + ptr+= 3;
> + tmp_len+= 4;
> + }
> + else
> + goto error;
> + }
> + else
> + goto error;
> + break;
> + case 'B': // BC and B.C
> + if (ptr+1 >= end)
> + goto error;
> + next_char= my_toupper(system_charset_info, *(ptr+1));
> + if (next_char == 'C')
> + {
> + *tmp_fmt= FMT_BC;
> + ptr+= 1;
> + tmp_len+= 2;
> + }
> + else if (next_char == '.' && ptr+3 < end &&
> + my_toupper(system_charset_info, *(ptr+2)) == 'C' &&
> + *(ptr+3) == '.')
> + {
> + *tmp_fmt= FMT_BC_DOT;
> + ptr+= 3;
> + tmp_len+= 4;
> + }
> + else
> + goto error;
> + break;
> + case 'P': // PM or P.M.
> + next_char= my_toupper(system_charset_info, *(ptr+1));
> + if (next_char == 'M')
> + {
> + *tmp_fmt= FMT_PM;
> + ptr+= 1;
> + tmp_len+= 2;
> + }
> + else if (next_char == '.' &&
> + my_toupper(system_charset_info, *(ptr+2)) == 'M' &&
> + my_toupper(system_charset_info, *(ptr+3)) == '.')
> + {
> + *tmp_fmt= FMT_PM_DOT;
> + ptr+= 3;
> + tmp_len+= 4;
> + }
> + else
> + goto error;
> + break;
> + case 'Y': // Y, YY, YYY o YYYYY
> + if (ptr + 1 == end || my_toupper(system_charset_info, *(ptr+1)) != 'Y')
> + {
> + *tmp_fmt= FMT_Y;
> + tmp_len+= 1;
> + break;
> + }
> + if (ptr + 2 == end ||
> + my_toupper(system_charset_info, *(ptr+2)) != 'Y') /* YY */
> + {
> + *tmp_fmt= FMT_YY;
> + ulen= 2;
> + }
> + else
> + {
> + if (ptr + 3 < end && my_toupper(system_charset_info, *(ptr+3)) == 'Y')
> + {
> + *tmp_fmt= FMT_YYYY;
> + ulen= 4;
> + }
> + else
> + {
> + *tmp_fmt= FMT_YYY;
> + ulen= 3;
> + }
> + }
> + ptr+= ulen-1;
> + tmp_len+= ulen;
> + break;
> +
> + case 'R': // RR or RRRR
> + if (ptr + 1 == end || my_toupper(system_charset_info, *(ptr+1)) != 'R')
> + goto error;
> +
> + if (ptr + 2 == end || my_toupper(system_charset_info, *(ptr+2)) != 'R')
> + {
> + *tmp_fmt= FMT_RR;
> + ulen= 2;
> + }
> + else
> + {
> + if (ptr + 3 >= end || my_toupper(system_charset_info, *(ptr+3)) != 'R')
> + goto error;
> + *tmp_fmt= FMT_RRRR;
> + ulen= 4;
> + }
> + ptr+= ulen-1;
> + tmp_len+= ulen;
> + break;
> + case 'M':
> + {
> + char tmp1;
> + if (ptr + 1 >= end)
> + goto error;
> +
> + tmp1= my_toupper(system_charset_info, *(ptr+1));
> + if (tmp1 == 'M')
> + {
> + *tmp_fmt= FMT_MM;
> + tmp_len+= 2;
> + ptr+= 1;
> + }
> + else if (tmp1 == 'I')
> + {
> + *tmp_fmt= FMT_MI;
> + tmp_len+= 2;
> + ptr+= 1;
> + }
> + else if (tmp1 == 'O')
> + {
> + if (ptr + 2 >= end)
> + goto error;
> + char tmp2= my_toupper(system_charset_info, *(ptr+2));
> + if (tmp2 != 'N')
> + goto error;
> +
> + if (ptr + 4 >= end ||
> + my_toupper(system_charset_info, *(ptr+3)) != 'T' ||
> + my_toupper(system_charset_info, *(ptr+4)) != 'H')
> + {
> + *tmp_fmt= FMT_MON;
> + tmp_len+= 3;
> + ptr+= 2;
> + }
> + else
> + {
> + *tmp_fmt= FMT_MONTH;
> + tmp_len+= (locale->max_month_name_length *
> + my_charset_utf8mb3_bin.mbmaxlen);
> + ptr+= 4;
> + }
> + }
> + else
> + goto error;
> + }
> + break;
> + case 'D': // DD, DY, or DAY
> + {
> + if (ptr + 1 >= end)
> + goto error;
> + char tmp1= my_toupper(system_charset_info, *(ptr+1));
> +
> + if (tmp1 == 'D')
> + {
> + *tmp_fmt= FMT_DD;
> + tmp_len+= 2;
> + }
> + else if (tmp1 == 'Y')
> + {
> + *tmp_fmt= FMT_DY;
> + tmp_len+= 3;
> + }
> + else if (tmp1 == 'A') // DAY
> + {
> + if (ptr + 2 == end || my_toupper(system_charset_info, *(ptr+2)) != 'Y')
> + goto error;
> + *tmp_fmt= FMT_DAY;
> + tmp_len+= locale->max_day_name_length * my_charset_utf8mb3_bin.mbmaxlen;
> + ptr+= 1;
> + }
> + else
> + goto error;
> + ptr+= 1;
> + }
> + break;
> + case 'H': // HH, HH12 or HH23
> + {
> + char tmp1, tmp2, tmp3;
> + if (ptr + 1 >= end)
> + goto error;
> + tmp1= my_toupper(system_charset_info, *(ptr+1));
> +
> + if (tmp1 != 'H')
> + goto error;
> +
> + if (ptr+3 >= end)
> + {
> + *tmp_fmt= FMT_HH;
> + ptr+= 1;
> + }
> + else
> + {
> + tmp2= *(ptr+2);
> + tmp3= *(ptr+3);
> +
> + if (tmp2 == '1' && tmp3 == '2')
> + {
> + *tmp_fmt= FMT_HH12;
> + ptr+= 3;
> + }
> + else if (tmp2 == '2' && tmp3 == '4')
> + {
> + *tmp_fmt= FMT_HH24;
> + ptr+= 3;
> + }
> + else
> + {
> + *tmp_fmt= FMT_HH;
> + ptr+= 1;
> + }
> + }
> + tmp_len+= 2;
> + break;
> + }
> + case 'S': // SS
> + if (ptr + 1 == end || my_toupper(system_charset_info, *(ptr+1)) != 'S')
> + goto error;
> +
> + *tmp_fmt= FMT_SS;
> + tmp_len+= 2;
> + ptr+= 1;
> + break;
> + case '|':
> + /*
> + If only one '|' just ignore it, else append others, for example:
> + TO_CHAR('2000-11-05', 'YYYY|MM||||DD') --> 200011|||05
> + */
> + if (ptr + 1 == end || *(ptr+1) != '|')
> + {
> + tmp_fmt--;
> + break;
> + }
> + ptr++; // Skip first '|'
> + do
> + {
> + *tmp_fmt++= *ptr++;
> + tmp_len++;
> + } while ((ptr < end) && *ptr == '|');
> + ptr--; // Fix ptr for above for loop
> + tmp_fmt--;
> + break;
> +
> + default:
> + offset= parse_special(cfmt, ptr, end, tmp_fmt);
> + if (!offset)
> + goto error;
> + /* ptr++ is in the for loop, so we must move ptr to offset-1 */
> + ptr+= (offset-1);
> + tmp_fmt+= (offset-1);
> + tmp_len+= offset;
> + break;
> + }
> + }
> + *fmt_len= tmp_len;
> + *tmp_fmt= 0;
> + return 0;
> +
> +error:
> + warning_message.append(STRING_WITH_LEN("date format not recognized at "));
> + warning_message.append(ptr, MY_MIN(8, end- ptr));
> + return 1;
> +}
> +
> +
> +static inline bool append_val(int val, int size, String *str)
> +{
> + ulong len= 0;
> + char intbuff[15];
> +
> + len= (ulong) (int10_to_str(val, intbuff, 10) - intbuff);
> + return str->append_with_prefill(intbuff, len, size, '0');
> +}
> +
> +
> +static bool make_date_time_oracle(const uint16 *fmt_array,
> + const MYSQL_TIME *l_time,
> + const MY_LOCALE *locale,
> + String *str)
> +{
> + bool quotation_flag= false;
> + const uint16 *ptr= fmt_array;
> + uint hours_i;
> + uint weekday;
> +
> + str->length(0);
> +
> + while (*ptr)
> + {
> + if (check_quotation(*ptr, &quotation_flag))
> + {
> + /* don't display '"' in the result, so if it is '"', skip it */
> + if (*ptr != '"')
> + {
> + DBUG_ASSERT(*ptr <= 255);
> + str->append((char) *ptr);
> + }
> + ptr++;
> + continue;
> + }
> +
> + switch (*ptr) {
> +
> + case FMT_AM:
> + case FMT_PM:
> + if (l_time->hour > 11)
> + str->append("PM", 2);
> + else
> + str->append("AM", 2);
> + break;
> +
> + case FMT_AM_DOT:
> + case FMT_PM_DOT:
> + if (l_time->hour > 11)
> + str->append(STRING_WITH_LEN("P.M."));
> + else
> + str->append(STRING_WITH_LEN("A.M."));
> + break;
> +
> + case FMT_AD:
> + case FMT_BC:
> + if (l_time->year > 0)
> + str->append(STRING_WITH_LEN("AD"));
> + else
> + str->append(STRING_WITH_LEN("BC"));
> + break;
> +
> + case FMT_AD_DOT:
> + case FMT_BC_DOT:
> + if (l_time->year > 0)
> + str->append(STRING_WITH_LEN("A.D."));
> + else
> + str->append(STRING_WITH_LEN("B.C."));
> + break;
> +
> + case FMT_Y:
> + if (append_val(l_time->year%10, 1, str))
> + goto err_exit;
> + break;
> +
> + case FMT_YY:
> + case FMT_RR:
> + if (append_val(l_time->year%100, 2, str))
> + goto err_exit;
> + break;
> +
> + case FMT_YYY:
> + if (append_val(l_time->year%1000, 3, str))
> + goto err_exit;
> + break;
> +
> + case FMT_YYYY:
> + case FMT_RRRR:
> + if (append_val(l_time->year, 4, str))
> + goto err_exit;
> + break;
> +
> + case FMT_MM:
> + if (append_val(l_time->month, 2, str))
> + goto err_exit;
> + break;
> +
> + case FMT_MON:
> + {
> + if (l_time->month == 0)
> + {
> + str->append("00", 2);
> + }
> + else
> + {
> + const char *month_name= (locale->ab_month_names->
> + type_names[l_time->month-1]);
> + size_t m_len= strlen(month_name);
> + str->append(month_name, m_len, system_charset_info);
> + }
> + }
> + break;
> +
> + case FMT_MONTH:
> + {
> + if (l_time->month == 0)
> + {
> + str->append("00", 2);
> + }
> + else
> + {
> + const char *month_name= (locale->month_names->
> + type_names[l_time->month-1]);
> + size_t month_byte_len= strlen(month_name);
> + size_t month_char_len;
> + str->append(month_name, month_byte_len, system_charset_info);
> + month_char_len= my_numchars_mb(&my_charset_utf8mb3_general_ci,
> + month_name, month_name +
> + month_byte_len);
> + if (str->strfill(' ', locale->max_month_name_length - month_char_len))
> + goto err_exit;
> + }
> + }
> + break;
> +
> + case FMT_DD:
> + if (append_val(l_time->day, 2, str))
> + goto err_exit;
> + break;
> +
> + case FMT_DY:
> + {
> + if (l_time->day == 0)
> + str->append("00", 2);
> + else
> + {
> + weekday= calc_weekday(calc_daynr(l_time->year,l_time->month,
> + l_time->day), 0);
> + const char *day_name= locale->ab_day_names->type_names[weekday];
> + str->append(day_name, strlen(day_name), system_charset_info);
> + }
> + }
> + break;
> +
> + case FMT_DAY:
> + {
> + if (l_time->day == 0)
> + str->append("00", 2, system_charset_info);
> + else
> + {
> + const char *day_name;
> + size_t day_byte_len, day_char_len;
> + weekday=calc_weekday(calc_daynr(l_time->year,l_time->month,
> + l_time->day), 0);
> + day_name= locale->day_names->type_names[weekday];
> + day_byte_len= strlen(day_name);
> + str->append(day_name, day_byte_len, system_charset_info);
> + day_char_len= my_numchars_mb(&my_charset_utf8mb3_general_ci,
> + day_name, day_name + day_byte_len);
> + if (str->strfill(' ', locale->max_day_name_length - day_char_len))
> + goto err_exit;
> + }
> + }
> + break;
> +
> + case FMT_HH12:
> + case FMT_HH:
> + hours_i= (l_time->hour%24 + 11)%12+1;
> + if (append_val(hours_i, 2, str))
> + goto err_exit;
> + break;
> +
> + case FMT_HH24:
> + if (append_val(l_time->hour, 2, str))
> + goto err_exit;
> + break;
> +
> + case FMT_MI:
> + if (append_val(l_time->minute, 2, str))
> + goto err_exit;
> + break;
> +
> + case FMT_SS:
> + if (append_val(l_time->second, 2, str))
> + goto err_exit;
> + break;
> +
> + default:
> + str->append((char) *ptr);
> + }
> +
> + ptr++;
> + };
> + return false;
> +
> +err_exit:
> + return true;
> +}
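As a side note, the FMT_HH12/FMT_HH expression in the hunk above, `(l_time->hour%24 + 11)%12+1`, is the usual 24-hour to 12-hour clock mapping. A stand-alone sketch (illustration only, not the server code):

```c
#include <assert.h>

/* Stand-alone sketch of the FMT_HH12 expression from the hunk above:
   maps a 24-hour value onto the 12-hour clock, so 0 -> 12, 12 -> 12,
   13 -> 1 and 23 -> 11. */
static int to_12_hour(int hour24)
{
  return (hour24 % 24 + 11) % 12 + 1;
}
```

The `+ 11 ... % 12 + 1` dance exists only to turn hour 0 (and 12) into 12 instead of 0.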
> +
> +
> +bool Item_func_tochar::fix_length_and_dec()
> +{
> + thd= current_thd;
> + CHARSET_INFO *cs= thd->variables.collation_connection;
> + Item *arg1= args[1]->this_item();
> + my_repertoire_t repertoire= arg1->collation.repertoire;
> + StringBuffer<STRING_BUFFER_USUAL_SIZE> buffer;
> + String *str;
> +
> + locale= thd->variables.lc_time_names;
> + if (!thd->variables.lc_time_names->is_ascii)
> + repertoire|= MY_REPERTOIRE_EXTENDED;
> + collation.set(cs, arg1->collation.derivation, repertoire);
> +
> + /* first argument must be datetime or string */
> + enum_field_types arg0_mysql_type= args[0]->field_type();
> +
> + max_length= 0;
> + switch (arg0_mysql_type) {
> + case MYSQL_TYPE_TIME:
> + case MYSQL_TYPE_DATE:
> + case MYSQL_TYPE_DATETIME:
> + case MYSQL_TYPE_TIMESTAMP:
> + case MYSQL_TYPE_VARCHAR:
> + case MYSQL_TYPE_STRING:
> + break;
> + default:
> + {
> + my_printf_error(ER_ORACLE_COMPAT_FUNCTION_ERROR,
> + ER(ER_ORACLE_COMPAT_FUNCTION_ERROR),
> + MYF(0),
> + "data type of first argument must be type "
> + "date/datetime/time or string");
that's not how MariaDB works; it converts types implicitly.
In particular, using an integer 20200101 in a date context is
perfectly ok.
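For illustration only (this is not MariaDB's actual conversion routine), the kind of implicit number-to-date interpretation meant here is roughly:

```c
#include <assert.h>

/* Rough illustration (not the server's conversion code) of how an
   integer such as 20200101 can be read as a date in a date context:
   the decimal digits split as YYYYMMDD. */
static void number_to_date(long v, int *year, int *month, int *day)
{
  *day   = (int)(v % 100);
  *month = (int)(v / 100 % 100);
  *year  = (int)(v / 10000);
}
```

so rejecting non-date/non-string first arguments outright refuses inputs the rest of the server happily converts.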
> + return TRUE;
> + }
> + }
> + if (args[1]->basic_const_item() && (str= args[1]->val_str(&buffer)))
> + {
> + uint ulen;
> + fixed_length= 1;
> + if (parse_format_string(str, &ulen))
> + {
> + my_printf_error(ER_ORACLE_COMPAT_FUNCTION_ERROR,
> + ER(ER_ORACLE_COMPAT_FUNCTION_ERROR),
> + MYF(0),
> + warning_message.c_ptr());
> + return TRUE;
> + }
> + max_length= (uint32) (ulen * collation.collation->mbmaxlen);
> + }
> + else
> + {
> + fixed_length= 0;
> + max_length= (uint32) MY_MIN(arg1->max_length * 10 *
> + collation.collation->mbmaxlen,
> + MAX_BLOB_WIDTH);
> + }
> + set_maybe_null();
> + return FALSE;
> +}
> +
> +
> +String *Item_func_tochar::val_str(String* str)
> + {
> + StringBuffer<64> format_buffer;
> + String *format;
> + MYSQL_TIME l_time;
> + const MY_LOCALE *lc= locale;
> + date_conv_mode_t mode= TIME_CONV_NONE;
> + size_t max_result_length= max_length;
> +
> + if (warning_message.length())
> + goto null_date;
> +
> + if ((null_value= args[0]->get_date(thd, &l_time,
> + Temporal::Options(mode, thd))))
> + return 0;
> +
> + if (!fixed_length)
> + {
> + uint ulen;
> + if (!(format= args[1]->val_str(&format_buffer)) || !format->length() ||
> + parse_format_string(format, &ulen))
> + goto null_date;
> + max_result_length= ((size_t) ulen) * collation.collation->mbmaxlen;
> + }
> +
> + if (str->alloc(max_result_length))
> + goto null_date;
> +
> + /* Create the result string */
> + str->set_charset(collation.collation);
> + if (!make_date_time_oracle(fmt_array, &l_time, lc, str))
> + return str;
> +
> +null_date:
> +
> + if (warning_message.length())
> + {
> + push_warning_printf(thd,
> + Sql_condition::WARN_LEVEL_WARN,
> + ER_ORACLE_COMPAT_FUNCTION_ERROR,
> + ER_THD(thd, ER_ORACLE_COMPAT_FUNCTION_ERROR),
> + warning_message.c_ptr());
> + if (!fixed_length)
> + warning_message.length(0);
> + }
> +
> + null_value= 1;
> + return 0;
> +}
> +
>
> bool Item_func_from_unixtime::fix_length_and_dec()
> {
Regards,
Sergei
Re: [Maria-developers] a1440737662: MDEV-20021 sql_mode="oracle" does not support MINUS set operator
by Sergei Golubchik 20 Apr '21
Hi, Monty!
On Apr 13, Michael Widenius wrote:
> revision-id: a1440737662 (mariadb-10.5.2-553-ga1440737662)
> parent(s): 04a13e6ab8f
> author: Michael Widenius <michael.widenius(a)gmail.com>
> committer: Michael Widenius <michael.widenius(a)gmail.com>
> timestamp: 2021-03-24 19:51:22 +0200
> message:
>
> MDEV-20021 sql_mode="oracle" does not support MINUS set operator
>
> MINUS is mapped to EXCEPT
>
> diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> index 57ba9df42c0..edd2f353dd0 100644
> --- a/sql/sql_yacc.yy
> +++ b/sql/sql_yacc.yy
> @@ -16037,6 +16038,7 @@ reserved_keyword_udt_not_param_type:
> | MINUTE_MICROSECOND_SYM
> | MINUTE_SECOND_SYM
> | MIN_SYM
> + | MINUS_ORACLE_SYM
> | MODIFIES_SYM
> | MOD_SYM
> | NATURAL
this is not good. MINUS should be in reserved_keyword_udt_not_param_type
only in Oracle mode; otherwise it should be in
keyword_sp_var_and_label (or not a keyword at all, but I don't think
that's possible).
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 04a13e6ab8f: MDEV-24285 support oracle build-in function: sys_guid
by Sergei Golubchik 20 Apr '21
Hi, Monty!
On Apr 13, Michael Widenius wrote:
> revision-id: 04a13e6ab8f (mariadb-10.5.2-552-g04a13e6ab8f)
> parent(s): c76ac1e6de8
> author: Michael Widenius <michael.widenius(a)gmail.com>
> committer: Michael Widenius <michael.widenius(a)gmail.com>
> timestamp: 2021-03-24 19:42:30 +0200
> message:
>
> MDEV-24285 support oracle build-in function: sys_guid
>
> SYS_GUID() returns same as UUID(), but without any '-'
I'd rather do it as a flag in Item_func_uuid.
Less code, no need to copy-paste everything.
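At the string level the relationship is just dash-stripping (illustration only, not the proposed implementation; the sample UUID in the test is made up):

```c
#include <string.h>

/* String-level illustration of SYS_GUID() vs UUID(): the former is
   the latter with the '-' separators removed. */
static void strip_dashes(const char *uuid, char *out)
{
  while (*uuid)
  {
    if (*uuid != '-')
      *out++ = *uuid;
    uuid++;
  }
  *out = '\0';
}
```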
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] c4de76aeff8: MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
by Sergei Golubchik 18 Apr '21
Hi, Aleksey!
Note: this is a review of `git diff 82e44d60d1e 8175ce8a5f1`,
so it covers more than just commit c4de76aeff8.
On Apr 04, Aleksey Midenkov wrote:
> revision-id: c4de76aeff8 (mariadb-10.5.2-540-gc4de76aeff8)
> parent(s): 82e44d60d1e
> author: Aleksey Midenkov <midenok(a)gmail.com>
> committer: Aleksey Midenkov <midenok(a)gmail.com>
> timestamp: 2021-03-31 21:17:55 +0300
> message:
>
> MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
Overall it's very good! I have a few minor questions/comments, see below.
But it's almost done, I think.
> diff --git a/mysql-test/suite/versioning/r/delete_history.result b/mysql-test/suite/versioning/r/delete_history.result
> index cb865a835b3..90c9e4777bb 100644
> --- a/mysql-test/suite/versioning/r/delete_history.result
> +++ b/mysql-test/suite/versioning/r/delete_history.result
> @@ -154,3 +152,18 @@ select * from t1;
> a
> 1
> drop table t1;
> +#
> +# MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
> +#
> +# Don't auto-create new partition on DELETE HISTORY:
> +create or replace table t (a int) with system versioning
> +partition by system_time limit 1000 auto;
> +delete history from t;
> +show create table t;
> +Table Create Table
> +t CREATE TABLE `t` (
> + `a` int(11) DEFAULT NULL
> +) ENGINE=DEFAULT_ENGINE DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
> + PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO
> +PARTITIONS 2
Hmm, and if DELETE HISTORY were to auto-create new partitions,
what output would one expect here?
I mean, how can one tell whether the test result is correct or wrong?
> +drop table t;
> diff --git a/mysql-test/suite/versioning/t/partition.test b/mysql-test/suite/versioning/t/partition.test
> index 445f5844630..57b80ca0b47 100644
> --- a/mysql-test/suite/versioning/t/partition.test
> +++ b/mysql-test/suite/versioning/t/partition.test
> @@ -1068,11 +1078,412 @@ alter table t1 add partition partitions 2;
> --replace_result $default_engine DEFAULT_ENGINE
> show create table t1;
> alter table t1 add partition partitions 8;
> +drop table t1;
> +
> +--echo #
> +--echo # MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
> +--echo #
> +create or replace table t1 (x int) with system versioning
> +partition by system_time limit 1 auto;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +--echo # INSERT, INSERT .. SELECT don't increment partitions
it's not really "increment"; better to say "don't auto-create"
> +insert into t1 values (1);
> +
> +create or replace table t2 (y int);
> +insert into t2 values (2);
> +
> +insert into t1 select * from t2;
> +insert into t2 select * from t1;
why do you need a t2 table in this test?
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +--echo # Too many partitions error
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int) with system versioning
> +partition by system_time interval 1 hour auto;
> +set timestamp= unix_timestamp('2001-01-01 00:01:00');
> +--error ER_TOO_MANY_PARTITIONS_ERROR
> +update t1 set x= x + 1;
> +
> +--enable_info
> +--echo # Increment from 3 to 5
> +create or replace table t1 (x int) with system versioning
> +partition by system_time interval 3600 second
> +starts '2000-01-01 00:00:00' auto partitions 3;
> +
> +insert into t1 values (1);
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 02:00:00');
> +update t1 set x= x + 1;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 03:00:00');
> +update t1 set x= x + 2;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +--echo # Increment from 3 to 6, manual names, LOCK TABLES
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int) with system versioning
> +partition by system_time interval 1 hour auto (
> + partition p1 history,
> + partition p3 history,
> + partition pn current);
> +
> +insert into t1 values (1);
> --replace_result $default_engine DEFAULT_ENGINE
> show create table t1;
>
> +set timestamp= unix_timestamp('2000-01-01 02:00:00');
> +update t1 set x= x + 3;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 03:00:00');
> +update t1 set x= x + 4;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 04:00:00');
> +lock tables t1 write;
> +update t1 set x= x + 5;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +unlock tables;
> +set timestamp= default;
> +
> +--echo # Couple of more LOCK TABLES cases
> +create or replace table t1 (x int) with system versioning
> +partition by system_time limit 1000 auto;
> +lock tables t1 write;
> +insert into t1 values (1);
> +update t1 set x= x + 1;
> +unlock tables;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
what does this test case demonstrate?
> +
> +--echo # Overflow prevention under LOCK TABLES
> +create or replace table t1 (x int)
> +with system versioning partition by system_time
> +limit 10 auto;
> +
> +insert into t1 values (1), (2), (3), (4), (5), (6), (7), (8), (9);
> +update t1 set x= x + 10;
> +
> +lock tables t1 write;
> +update t1 set x= 1 where x = 11;
> +update t1 set x= 2 where x = 12;
> +update t1 set x= 3 where x = 13;
> +unlock tables;
> +
> +select count(x) from t1 partition (p0);
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +drop tables t1;
> +
> +--echo # Test VIEW, LOCK TABLES
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int) with system versioning
> +partition by system_time interval 1 hour auto partitions 3;
> +create or replace view v1 as select * from t1;
> +
> +insert into v1 values (1);
why would a view matter in this test case?
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 01:00:00');
> +lock tables t1 write;
> +update t1 set x= x + 3;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +unlock tables;
> +
> +drop view v1;
> drop tables t1;
>
> +--echo # Multiple increments in single command
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int) with system versioning
> +partition by system_time interval 1 hour auto partitions 3;
> +
> +create or replace table t2 (y int) with system versioning
> +partition by system_time interval 1 hour auto;
> +
> +insert into t1 values (1);
> +insert into t2 values (2);
> +
> +set timestamp= unix_timestamp('2000-01-01 01:00:00');
> +update t1, t2 set x= x + 1, y= y + 1;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t2;
> +
> +set timestamp= unix_timestamp('2000-01-01 02:00:00');
> +update t1, t2 set x= x + 1, y= y + 1;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t2;
> +
> +--echo # Here t2 is incremented too, but not updated
> +set timestamp= unix_timestamp('2000-01-01 03:00:00');
> +update t1, t2 set t1.x= 0 where t1.x< t2.y;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +# Multiupdate_prelocking_strategy::handle_end() is processed after table open.
> +# For PS it is possible to skip unneeded auto-creation because the above happens at
> +# prepare stage and auto-creation is done at execute stage.
> +--replace_result $default_engine DEFAULT_ENGINE 'PARTITIONS 4' 'PARTITIONS ok' 'PARTITIONS 5' 'PARTITIONS ok'
eh... I don't think this is really "ok".
As far as I remember, Multiupdate_prelocking_strategy knows what tables
should be opened for writing and what for reading. Why would a new partition
be created for t2?
> +show create table t2;
> +
> +drop tables t1, t2;
> +
> +--echo # PS, SP, LOCK TABLES
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int) with system versioning
> +partition by system_time interval 1 hour auto;
> +
> +insert into t1 values (1);
> +
> +set timestamp= unix_timestamp('2000-01-01 01:00:00');
> +execute immediate 'update t1 set x= x + 5';
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +prepare s from 'update t1 set x= x + 6';
> +set timestamp= unix_timestamp('2000-01-01 02:00:00');
> +execute s; execute s;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 03:00:00');
> +lock tables t1 write;
> +execute s; execute s;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +unlock tables;
> +drop prepare s;
> +
> +create procedure sp() update t1 set x= x + 7;
> +set timestamp= unix_timestamp('2000-01-01 04:00:00');
> +call sp; call sp;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +set timestamp= unix_timestamp('2000-01-01 05:00:00');
> +lock tables t1 write;
> +call sp; call sp;
> +--replace_result $default_engine DEFAULT_ENGINE
> +show create table t1;
> +unlock tables;
> +drop procedure sp;
> +
> +set timestamp= unix_timestamp('2001-01-01 00:00:00');
> +create or replace table t1 (i int) with system versioning
> +partition by system_time interval 1 day starts '2000-01-01 00:00:00';
> +insert into t1 values (0);
> +set timestamp= unix_timestamp('2001-01-01 00:00:01');
> +prepare s from 'update t1 set i= i + 1';
> +execute s;
> +set timestamp= unix_timestamp('2001-01-02 00:00:01');
> +execute s;
> +drop prepare s;
> +
> +if (!$MTR_COMBINATION_HEAP)
because of blobs?
> +{
> +--echo # Complex table
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (
> + x int primary key auto_increment,
...
> +--echo # Transaction
> +create or replace table t1 (x int) with system versioning engine innodb
> +partition by system_time interval 1 hour auto;
> +start transaction;
> +insert into t1 values (1);
> +select * from t1;
> +
> +--connect con1, localhost, root
> +set lock_wait_timeout= 1;
> +set innodb_lock_wait_timeout= 1;
> +--error ER_LOCK_WAIT_TIMEOUT
> +update t1 set x= x + 111;
> +select * from t1;
what do you test here?
(there is no show create table in the test)
> +
> +# cleanup
> +--disconnect con1
> +--connection default
> +drop table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int) with system versioning engine innodb
> +partition by system_time interval 1 hour auto;
> +
> +insert into t1 values (1);
> +set timestamp= unix_timestamp('2000-01-01 01:00:00');
> +start transaction;
> +update t1 set x= 0;
> +--connect con1, localhost, root
> +select * from t1;
if you add a show create table here, what would it show?
> +--connection default
> +commit;
> +show create table t1;
> +
> +set timestamp= unix_timestamp('2000-01-01 02:00:00');
> +start transaction;
> +update t1 set x= 1;
> +--connection con1
> +select * from t1;
> +--connection default
> +rollback;
> +show create table t1;
> +--disconnect con1
> +--connection default
> +drop table t1;
> +
> +--echo #
> +--echo # MENT-724 Locking timeout caused by auto-creation affects original DML
I'd rather avoid MENT references here. Let's only mention
Jira issues that users can actually see.
> +--echo #
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int primary key) with system versioning engine innodb
> +partition by system_time interval 1 hour auto;
> +
> +insert into t1 values (1);
> +start transaction;
> +insert into t1 values (2);
> +
> +--connect con1, localhost, root
> +set lock_wait_timeout= 1;
> +set innodb_lock_wait_timeout= 1;
> +set timestamp= unix_timestamp('2000-01-01 01:00:01');
> +update t1 set x= x + 122 where x = 1;
isn't this a test you already have above, with x = x + 111?
> +--disconnect con1
> +--connection default
> +select * from t1;
> +
> +# cleanup
> +drop table t1;
> +set timestamp= default;
> +
> --echo #
> --echo # End of 10.5 tests
> --echo #
> diff --git a/mysql-test/suite/versioning/r/partition.result b/mysql-test/suite/versioning/r/partition.result
> index 2a277b1c2ea..8369b40d98c 100644
> --- a/mysql-test/suite/versioning/r/partition.result
> +++ b/mysql-test/suite/versioning/r/partition.result
> @@ -1236,27 +1270,752 @@ t1 CREATE TABLE `t1` (
...
> +set timestamp= unix_timestamp('2000-01-01 03:00:00');
> +affected rows: 0
> +lock tables t1 write;
> +affected rows: 0
> +execute s;
> +affected rows: 1
> +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 0
> +execute s;
> +affected rows: 1
> +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 0
> +show create table t1;
> +Table Create Table
> +t1 CREATE TABLE `t1` (
> + `x` int(11) DEFAULT NULL
> +) ENGINE=DEFAULT_ENGINE DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
> + PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR STARTS TIMESTAMP'2000-01-01 00:00:00' AUTO
> +PARTITIONS 6
why did it add two partitions here?
> +affected rows: 1
> +unlock tables;
> +affected rows: 0
> +drop prepare s;
> +affected rows: 0
> +create procedure sp() update t1 set x= x + 7;
> +affected rows: 0
> +set timestamp= unix_timestamp('2000-01-01 04:00:00');
> +affected rows: 0
> +call sp;
> +affected rows: 1
> +call sp;
> +affected rows: 1
> +show create table t1;
> +Table Create Table
> +t1 CREATE TABLE `t1` (
> + `x` int(11) DEFAULT NULL
> +) ENGINE=DEFAULT_ENGINE DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
> + PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR STARTS TIMESTAMP'2000-01-01 00:00:00' AUTO
> +PARTITIONS 6
> +affected rows: 1
> +set timestamp= unix_timestamp('2000-01-01 05:00:00');
> +affected rows: 0
> +lock tables t1 write;
> +affected rows: 0
> +call sp;
> +affected rows: 1
> +call sp;
> +affected rows: 1
> +show create table t1;
> +Table Create Table
> +t1 CREATE TABLE `t1` (
> + `x` int(11) DEFAULT NULL
> +) ENGINE=DEFAULT_ENGINE DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
> + PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR STARTS TIMESTAMP'2000-01-01 00:00:00' AUTO
> +PARTITIONS 8
and two here
> +affected rows: 1
> +unlock tables;
> +affected rows: 0
> +drop procedure sp;
> +affected rows: 0
> +set timestamp= unix_timestamp('2001-01-01 00:00:00');
> +affected rows: 0
> +create or replace table t1 (i int) with system versioning
> +partition by system_time interval 1 day starts '2000-01-01 00:00:00';
> +affected rows: 0
> +insert into t1 values (0);
> +affected rows: 1
> +set timestamp= unix_timestamp('2001-01-01 00:00:01');
> +affected rows: 0
> +prepare s from 'update t1 set i= i + 1';
> +affected rows: 0
> +info: Statement prepared
> +execute s;
> +affected rows: 1
> +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 1
> +Warnings:
> +Warning 4114 Versioned table `test`.`t1`: last HISTORY partition (`p0`) is out of INTERVAL, need more HISTORY partitions
why?
> +set timestamp= unix_timestamp('2001-01-02 00:00:01');
> +affected rows: 0
> +execute s;
> +affected rows: 1
> +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 1
> +Warnings:
> +Warning 4114 Versioned table `test`.`t1`: last HISTORY partition (`p0`) is out of INTERVAL, need more HISTORY partitions
> +drop prepare s;
> +affected rows: 0
> +# Complex table
...
> diff --git a/mysql-test/suite/versioning/t/rpl.test b/mysql-test/suite/versioning/t/rpl.test
> index b5be68feece..54258a8bdf1 100644
> --- a/mysql-test/suite/versioning/t/rpl.test
> +++ b/mysql-test/suite/versioning/t/rpl.test
> @@ -133,4 +133,44 @@ sync_slave_with_master;
> connection master;
> drop table t1;
>
> +--echo #
> +--echo # MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
> +--echo #
> +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> +create or replace table t1 (x int) with system versioning
> +partition by system_time interval 1 hour auto;
> +insert t1 values ();
> +set timestamp= unix_timestamp('2000-01-01 01:00:00');
> +delete from t1;
> +--replace_result InnoDB ENGINE MyISAM ENGINE MEMORY ENGINE
> +show create table t1;
> +--sync_slave_with_master
> +--replace_result InnoDB ENGINE MyISAM ENGINE MEMORY ENGINE
> +show create table t1;
> +--connection master
> +drop table t1;
> +set timestamp= default;
> +
> +--echo #
> +--echo # MENT-685 DML events for auto-partitioned tables are written into binary log twice
same comment about MENT
> +--echo #
> +create table t1 (a int) with system versioning
> +partition by system_time limit 1 auto;
> +
> +insert into t1 values (1);
> +update t1 set a= a + 1;
> +update t1 set a= a + 1;
> +--replace_result InnoDB ENGINE MyISAM ENGINE MEMORY ENGINE
> +show create table t1;
> +select * from t1;
> +
> +--sync_slave_with_master
> +--replace_result InnoDB ENGINE MyISAM ENGINE MEMORY ENGINE
> +show create table t1;
> +
> +select * from t1;
> +--connection master
> +# cleanup
> +drop table t1;
maybe show binlog events?
you know, to verify that DML events aren't written into the binary log twice
> +
> --source include/rpl_end.inc
> diff --git a/sql/ha_partition.h b/sql/ha_partition.h
> index 60a2d7f6762..7ade4419c22 100644
> --- a/sql/ha_partition.h
> +++ b/sql/ha_partition.h
> @@ -1610,7 +1610,7 @@ class ha_partition :public handler
> for (; part_id < part_id_end; ++part_id)
> {
> handler *file= m_file[part_id];
> - DBUG_ASSERT(bitmap_is_set(&(m_part_info->read_partitions), part_id));
> + bitmap_set_bit(&(m_part_info->read_partitions), part_id);
Where was it set before your patch?
> file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK | HA_STATUS_OPEN);
> part_recs+= file->stats.records;
> }
> diff --git a/sql/handler.cc b/sql/handler.cc
> index b312635c8ee..6bb6c279193 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1572,7 +1572,7 @@ int ha_commit_trans(THD *thd, bool all)
> DBUG_ASSERT(thd->transaction->stmt.ha_list == NULL ||
> trans == &thd->transaction->stmt);
>
> - if (thd->in_sub_stmt)
> + if (thd->in_sub_stmt & ~SUB_STMT_AUTO_HIST)
> {
where is this ha_commit_trans(thd, false) called from? mysql_alter_table
that adds a new partition?
> DBUG_ASSERT(0);
> /*
> diff --git a/sql/partition_info.cc b/sql/partition_info.cc
> index 871411cf6c4..cf8dd140553 100644
> --- a/sql/partition_info.cc
> +++ b/sql/partition_info.cc
> @@ -816,10 +816,17 @@ bool partition_info::has_unique_name(partition_element *element)
> vers_info->interval Limit by fixed time interval
> vers_info->hist_part (out) Working history partition
> */
> -void partition_info::vers_set_hist_part(THD *thd)
> +uint partition_info::vers_set_hist_part(THD *thd, bool auto_hist)
> {
> + DBUG_ASSERT(!thd->lex->last_table() ||
> + !thd->lex->last_table()->vers_conditions.delete_history);
is that right?
Conceptually you need to test vers_conditions.delete_history for the table that
owns this partition_info. Is it always last_table()?
I'd say it would be more logical to use something like
part_field_array[0]->table->pos_in_table_list
> +
> + uint create_count= 0;
> + auto_hist= auto_hist && vers_info->auto_hist;
> +
> if (vers_info->limit)
> {
> + DBUG_ASSERT(!vers_info->interval.is_set());
> ha_partition *hp= (ha_partition*)(table->file);
> partition_element *next= NULL;
> List_iterator<partition_element> it(partitions);
> @@ -862,9 +869,183 @@ void partition_info::vers_set_hist_part(THD *thd)
> {
> vers_info->hist_part= next;
> if (next->range_value > thd->query_start())
> - return;
> + {
> + error= false;
> + break;
> + }
> + }
> + if (error)
> + {
> + if (auto_hist)
> + {
> + DBUG_ASSERT(thd->query_start() >= vers_info->hist_part->range_value);
> + my_time_t diff= thd->query_start() - (my_time_t) vers_info->hist_part->range_value;
> + if (diff > 0)
> + {
> + size_t delta= vers_info->interval.seconds();
> + create_count= (uint) (diff / delta + 1);
> + if (diff % delta)
> + create_count++;
> + }
> + else
> + create_count= 1;
> + }
> + else
> + {
> + my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
after such an overflow the table becomes essentially corrupted:
rows are in the wrong partition, so new history partitions cannot be added
anymore without reorganizing the last partition.
How is that handled now?
(it's just a question, not part of the review, as it's not something you
were changing in your patch)
> + table->s->db.str, table->s->table_name.str,
> + vers_info->hist_part->partition_name, "INTERVAL");
> + }
> }
> }
> +
> + /*
> + When hist_part is almost full LOCK TABLES my overflow the partition as we
s/my/may/
> + can't add new partitions under LOCK TABLES. Reserve one for this case.
> + */
> + if (auto_hist && create_count == 0 &&
> + thd->lex->sql_command == SQLCOM_LOCK_TABLES &&
> + vers_info->hist_part->id + 1 == vers_info->now_part->id)
> + create_count= 1;
> +
> + return create_count;
> +}
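Extracted into a stand-alone sketch (this mirrors the create_count arithmetic in the quoted hunk, with simplified names; it is not the final server code):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-alone sketch of the create_count arithmetic above: given that
   the newest history partition's range ended `diff` seconds in the
   past and each partition spans `delta` seconds, how many new history
   partitions does the auto-create path ask for? */
static unsigned partitions_needed(long diff, long delta)
{
  assert(diff >= 0 && delta > 0);
  if (diff == 0)
    return 1;                        /* exactly at the boundary */
  unsigned count = (unsigned)(diff / delta + 1);
  if (diff % delta)                  /* partial interval at the end */
    count++;
  return count;
}
```

Note the `+ 1` means the code always requests one partition beyond the gap itself, so the table ends up with spare room ahead of the current timestamp.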
> +
> +
> +/**
> + @brief Run fast_alter_partition_table() to add new history partitions
> + for tables requiring them.
> +*/
> +bool vers_add_auto_hist_parts(THD *thd, TABLE_LIST* tl, uint num_parts)
> +{
> + bool result= true;
> + HA_CREATE_INFO create_info;
> + Alter_info alter_info;
> + partition_info *save_part_info= thd->work_part_info;
> + Query_tables_list save_query_tables;
> + Reprepare_observer *save_reprepare_observer= thd->m_reprepare_observer;
> + bool save_no_write_to_binlog= thd->lex->no_write_to_binlog;
> + thd->m_reprepare_observer= NULL;
> + thd->lex->reset_n_backup_query_tables_list(&save_query_tables);
> + thd->in_sub_stmt|= SUB_STMT_AUTO_HIST;
> + thd->lex->no_write_to_binlog= true;
> + TABLE *table= tl->table;
> +
> + DBUG_ASSERT(!thd->is_error());
> +
> + {
> + DBUG_ASSERT(table->s->get_table_ref_type() == TABLE_REF_BASE_TABLE);
> + DBUG_ASSERT(table->versioned());
> + DBUG_ASSERT(table->part_info);
> + DBUG_ASSERT(table->part_info->vers_info);
> + alter_info.reset();
> + alter_info.partition_flags= ALTER_PARTITION_ADD|ALTER_PARTITION_AUTO_HIST;
> + create_info.init();
> + create_info.alter_info= &alter_info;
> + Alter_table_ctx alter_ctx(thd, tl, 1, &table->s->db, &table->s->table_name);
> +
> + MDL_REQUEST_INIT(&tl->mdl_request, MDL_key::TABLE, tl->db.str,
> + tl->table_name.str, MDL_SHARED_NO_WRITE, MDL_TRANSACTION);
> + if (thd->mdl_context.acquire_lock(&tl->mdl_request,
> + thd->variables.lock_wait_timeout))
> + goto exit;
> + table->mdl_ticket= tl->mdl_request.ticket;
> +
> + create_info.db_type= table->s->db_type();
> + create_info.options|= HA_VERSIONED_TABLE;
> + DBUG_ASSERT(create_info.db_type);
> +
> + create_info.vers_info.set_start(table->s->vers_start_field()->field_name);
> + create_info.vers_info.set_end(table->s->vers_end_field()->field_name);
> +
> + partition_info *part_info= new partition_info();
> + if (unlikely(!part_info))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(0));
> + goto exit;
> + }
> + part_info->use_default_num_partitions= false;
> + part_info->use_default_num_subpartitions= false;
> + part_info->num_parts= num_parts;
> + part_info->num_subparts= table->part_info->num_subparts;
> + part_info->subpart_type= table->part_info->subpart_type;
> + if (unlikely(part_info->vers_init_info(thd)))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(0));
> + goto exit;
> + }
> +
> + // NB: set_ok_status() requires DA_EMPTY
> + thd->get_stmt_da()->reset_diagnostics_area();
would it not be DA_EMPTY without a reset? This is at the beginning
of a statement, so I'd expect it to normally be DA_EMPTY here. What other DA
states can you get here?
> +
> + thd->work_part_info= part_info;
> + if (part_info->set_up_defaults_for_partitioning(thd, table->file, NULL,
> + table->part_info->next_part_no(num_parts)))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN,
> + ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "setting up defaults failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + bool partition_changed= false;
> + bool fast_alter_partition= false;
> + if (prep_alter_part_table(thd, table, &alter_info, &create_info,
> + &partition_changed, &fast_alter_partition))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter partitition prepare failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + if (!fast_alter_partition)
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "fast alter partitition is not possible");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + DBUG_ASSERT(partition_changed);
> + if (mysql_prepare_alter_table(thd, table, &create_info, &alter_info,
> + &alter_ctx))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter prepare failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> +
> + if (fast_alter_partition_table(thd, table, &alter_info, &create_info,
> + tl, &table->s->db, &table->s->table_name))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter partition table failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + }
> +
> + result= false;
> + // NB: we have to return DA_EMPTY for new command
maybe DBUG_ASSERT(thd->get_stmt_da()->is_ok()); ?
> + thd->get_stmt_da()->reset_diagnostics_area();
> +
> +exit:
> + thd->work_part_info= save_part_info;
> + thd->m_reprepare_observer= save_reprepare_observer;
> + thd->lex->restore_backup_query_tables_list(&save_query_tables);
> + thd->in_sub_stmt&= ~SUB_STMT_AUTO_HIST;
> + thd->lex->no_write_to_binlog= save_no_write_to_binlog;
> + return result;
> }
>
>
> diff --git a/sql/partition_info.h b/sql/partition_info.h
> index 0656238ec07..94093353d60 100644
> --- a/sql/partition_info.h
> +++ b/sql/partition_info.h
> @@ -72,9 +73,26 @@ struct Vers_part_info : public Sql_alloc
> my_time_t start;
> INTERVAL step;
> enum interval_type type;
> - bool is_set() { return type < INTERVAL_LAST; }
> + bool is_set() const { return type < INTERVAL_LAST; }
> + size_t seconds() const
> + {
> + if (step.second)
> + return step.second;
> + if (step.minute)
> + return step.minute * 60;
> + if (step.hour)
> + return step.hour * 3600;
> + if (step.day)
> + return step.day * 3600 * 24;
> + // comparison is used in rough estimates, it doesn't need to be calendar-correct
> + if (step.month)
> + return step.month * 3600 * 24 * 30;
shouldn't it be `* 28`? You need the most pessimistic estimate to make sure
you never fail to create a partition when one is needed. You can sometimes
create one when it's not needed yet, but that's less of a problem.
> + DBUG_ASSERT(step.year);
> + return step.year * 86400 * 30 * 365;
that's one `* 30` too many :)
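To make both points concrete, here is a sketch of the lower-bound estimate with the two fixes applied — months counted as 28 days and the stray `* 30` removed from the year case. The struct is an illustrative stand-in, not the server's real INTERVAL from my_time.h:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for the INTERVAL members that seconds() reads;
// field names mirror the patch but this struct is for demonstration only.
struct Interval_sketch
{
  size_t second, minute, hour, day, month, year;
};

// Pessimistic (smallest possible) length of the interval in seconds, so a
// rough comparison against it can only trigger early, never too late:
// months as 28 days (the shortest), years as 365 days (non-leap).
static size_t interval_seconds_lower_bound(const Interval_sketch &step)
{
  if (step.second) return step.second;
  if (step.minute) return step.minute * 60;
  if (step.hour)   return step.hour * 3600;
  if (step.day)    return step.day * 86400;
  if (step.month)  return step.month * 86400 * 28;  // shortest month
  assert(step.year);
  return step.year * 86400 * 365;                   // no extra * 30
}
```

With a lower bound, the rough estimate can only err on the side of rotating early, which is the safe direction here.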
> + }
> } interval;
> ulonglong limit;
> + bool auto_hist;
> partition_element *now_part;
> partition_element *hist_part;
> };
> diff --git a/sql/sql_base.cc b/sql/sql_base.cc
> index 1a1186aca73..d0e255186da 100644
> --- a/sql/sql_base.cc
> +++ b/sql/sql_base.cc
> @@ -1625,6 +1625,52 @@ bool is_locked_view(THD *thd, TABLE_LIST *t)
> }
>
>
> +bool TABLE::vers_need_hist_part(const THD *thd, const TABLE_LIST *table_list) const
> +{
> +#ifdef WITH_PARTITION_STORAGE_ENGINE
> + if (part_info && part_info->part_type == VERSIONING_PARTITION &&
> + !table_list->vers_conditions.delete_history &&
> + !thd->stmt_arena->is_stmt_prepare() &&
> + table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
> + table_list->mdl_request.type >= MDL_SHARED_WRITE &&
> + table_list->mdl_request.type < MDL_EXCLUSIVE)
> + {
> + switch (thd->lex->sql_command)
> + {
SQLCOM_LOAD with DUP_REPLACE?
> + case SQLCOM_INSERT:
> + if (thd->lex->duplicates != DUP_UPDATE)
> + break;
> + /* fall-through: */
> + case SQLCOM_LOCK_TABLES:
> + case SQLCOM_DELETE:
> + case SQLCOM_UPDATE:
> + case SQLCOM_REPLACE:
> + case SQLCOM_REPLACE_SELECT:
> + case SQLCOM_DELETE_MULTI:
> + case SQLCOM_UPDATE_MULTI:
> + return true;
> + default:;
> + break;
> + }
> + if (thd->rgi_slave && thd->rgi_slave->current_event && thd->lex->sql_command == SQLCOM_END)
I wonder, would it be possible, instead of introducing rgi_slave->current_event,
to just make row events set thd->lex->sql_command appropriately?
For example, currently row events increment thd->status_var.com_stat[]
on each event for their own SQLCOM_xxx; that wouldn't be needed if they
just set thd->lex->sql_command.
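A minimal sketch of that idea. The enum names mirror the server's, but the definitions below are illustrative stand-ins, not the real ones from log_event.h / my_sqlcommand.h:

```cpp
#include <cassert>

// Illustrative stand-ins for the server's enums; the real definitions
// differ and live in log_event.h and my_sqlcommand.h.
enum Row_event_type { WRITE_ROWS, UPDATE_ROWS, DELETE_ROWS };
enum Sql_command   { SQLCOM_INSERT, SQLCOM_UPDATE, SQLCOM_DELETE, SQLCOM_END };

// If each row event set thd->lex->sql_command like this when applied,
// vers_need_hist_part() could inspect sql_command alone and the
// rgi_slave->current_event member would not be needed.
static Sql_command sql_command_for_row_event(Row_event_type t)
{
  switch (t)
  {
  case WRITE_ROWS:  return SQLCOM_INSERT;
  case UPDATE_ROWS: return SQLCOM_UPDATE;
  case DELETE_ROWS: return SQLCOM_DELETE;
  }
  return SQLCOM_END;  // unreachable with the types above
}
```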
> + {
> + switch (thd->rgi_slave->current_event->get_type_code())
> + {
> + case UPDATE_ROWS_EVENT:
> + case UPDATE_ROWS_EVENT_V1:
> + case DELETE_ROWS_EVENT:
> + case DELETE_ROWS_EVENT_V1:
> + return true;
> + default:;
> + break;
> + }
> + }
> + }
> +#endif
> + return false;
> +}
> +
> +
> /**
> Open a base table.
>
> @@ -1777,6 +1823,11 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
> DBUG_PRINT("info",("Using locked table"));
> #ifdef WITH_PARTITION_STORAGE_ENGINE
> part_names_error= set_partitions_as_used(table_list, table);
> + if (table->vers_need_hist_part(thd, table_list))
> + {
> + /* Rotation is still needed under LOCK TABLES */
a bit confusing comment, you left out stuff that is obvious
to you but not to others. I'd suggest something like
/*
New partitions are not auto-created under LOCK TABLES (TODO: fix it)
but rotation can still happen.
*/
> + table->part_info->vers_set_hist_part(thd, false);
> + }
> #endif
> goto reset;
> }
> @@ -2032,6 +2083,20 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
> tc_add_table(thd, table);
> }
>
> + if (!ot_ctx->vers_create_count &&
what does this condition mean? When is ot_ctx->vers_create_count > 0 ?
> + table->vers_need_hist_part(thd, table_list))
> + {
> + ot_ctx->vers_create_count= table->part_info->vers_set_hist_part(thd, true);
> + if (ot_ctx->vers_create_count)
> + {
> + MYSQL_UNBIND_TABLE(table->file);
> + tc_release_table(table);
> + ot_ctx->request_backoff_action(Open_table_context::OT_ADD_HISTORY_PARTITION,
> + table_list);
> + DBUG_RETURN(TRUE);
> + }
> + }
> +
> if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) &&
> table->s->table_category < TABLE_CATEGORY_INFORMATION)
> {
> @@ -3172,10 +3239,18 @@ Open_table_context::recover_from_failed_open()
> break;
> case OT_DISCOVER:
> case OT_REPAIR:
> - if ((result= lock_table_names(m_thd, m_thd->lex->create_info,
> - m_failed_table, NULL,
> - get_timeout(), 0)))
> + case OT_ADD_HISTORY_PARTITION:
> + result= lock_table_names(m_thd, m_thd->lex->create_info, m_failed_table,
> + NULL, get_timeout(), 0);
> + if (result)
> + {
> + if (m_action == OT_ADD_HISTORY_PARTITION)
> + {
> + m_thd->clear_error();
why?
> + result= false;
> + }
> break;
> + }
>
> tdc_remove_table(m_thd, m_failed_table->db.str,
> m_failed_table->table_name.str);
> diff --git a/sql/sql_error.h b/sql/sql_error.h
> index 318d5076534..92076616adb 100644
> --- a/sql/sql_error.h
> +++ b/sql/sql_error.h
> @@ -1195,7 +1195,6 @@ class Diagnostics_area: public Sql_state_errno,
>
> void copy_non_errors_from_wi(THD *thd, const Warning_info *src_wi);
>
> -private:
It doesn't seem you're using get_warning_info() anywhere
> Warning_info *get_warning_info() { return m_wi_stack.front(); }
>
> const Warning_info *get_warning_info() const { return m_wi_stack.front(); }
> diff --git a/sql/sql_partition.cc b/sql/sql_partition.cc
> index 86ba393fb7b..f4a960afe47 100644
> --- a/sql/sql_partition.cc
> +++ b/sql/sql_partition.cc
> @@ -7237,7 +7246,8 @@ uint fast_alter_partition_table(THD *thd, TABLE *table,
> }
> else if ((alter_info->partition_flags & ALTER_PARTITION_ADD) &&
> (part_info->part_type == RANGE_PARTITION ||
> - part_info->part_type == LIST_PARTITION))
> + part_info->part_type == LIST_PARTITION ||
> + alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST))
it'd be good if adding new empty VERSIONING partitions
always went this way, auto or not. But that's a separate task.
> {
> /*
> ADD RANGE/LIST PARTITIONS
> @@ -7281,7 +7291,7 @@ uint fast_alter_partition_table(THD *thd, TABLE *table,
> write_log_add_change_partition(lpt) ||
> ERROR_INJECT_CRASH("crash_add_partition_4") ||
> ERROR_INJECT_ERROR("fail_add_partition_4") ||
> - mysql_change_partitions(lpt) ||
> + mysql_change_partitions(lpt, false) ||
this way you skip trans_commit_stmt/trans_commit_implicit for
ALTER TABLE ... ADD RANGE/LIST PARTITION.
Is that always ok?
A safer alternative would've been
mysql_change_partitions(lpt, !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST))
but it's more complex, so I'd prefer your variant, if we can be sure that it
doesn't break anything.
> ERROR_INJECT_CRASH("crash_add_partition_5") ||
> ERROR_INJECT_ERROR("fail_add_partition_5") ||
> alter_close_table(lpt) ||
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
[Maria-developers] Initial prototype for MDEV-10825 Persist exact view definitions DDL
by Vicențiu Ciorbaru 17 Apr '21
Hi Sergei!
This email is a followup to the brief discussion on Zulip here:
https://mariadb.zulipchat.com/#narrow/stream/118759-general/topic/preserve.…
You mentioned we store the view's definition (source) inside the FRM. I've
used that information to extend the I_S.views table with a source column.
The patch is very small, but I have 2 questions:
1. Is this how the feature should look? I wonder if we should
prepend *create view <view-name>* to the SOURCE column, to make it behave
very similarly to SHOW CREATE VIEW. Perhaps SOURCE is not the best-chosen
column name either.
2. I don't know if I should use:
table->field[11]->store(tables->source.str, tables->source.length,
tables->view_creation_ctx->get_client_cs());
or
table->field[11]->store(tables->source.str, tables->source.length,
cs);
when storing the source data.
Here is the patch:
https://github.com/MariaDB/server/compare/10.6-mdev-10825
As soon as we agree on the complete specs for the feature, I'll clean up
test failures in other tests, etc.
Vicențiu
Re: [Maria-developers] 7b20964dd240: MDEV-8334: Rename utf8 to utf8mb3
by Sergei Golubchik 15 Apr '21
Hi, Rucha!
On Apr 15, Rucha Deodhar wrote:
> revision-id: 7b20964dd240
> parent(s): e9a2c9e
> author: Rucha Deodhar <rucha.deodhar(a)mariadb.com>
> timestamp: 2021-03-26 00:55:56 +0530
> message:
>
> MDEV-8334: Rename utf8 to utf8mb3
>
> This patch is made as a part of MDEV-8334 to fix failing test in unit and
> main test suite so that utf8mb3 characterset is recognized. Failing tests:
> main.mysql_client_test
> main.mysql_client_test_comp
> unit.conc_basic-t
> unit.conc_charset
> unit.conc_connection
> diff --git a/libmariadb/ma_charset.c b/libmariadb/ma_charset.c
> index ee4b0f47..307cd522 100644
> --- a/libmariadb/ma_charset.c
> +++ b/libmariadb/ma_charset.c
> @@ -67,6 +67,10 @@
> #include <langinfo.h>
> #endif
>
> +#define IS_UTF8(c)\
> +(!strcasecmp((c), "utf8") || !strcasecmp((c), "utf8mb3") ||\
> + !strcasecmp((c), "utf8mb4") || !strcasecmp((c), "utf-8"))
> +
> /*
> +----------------------------------------------------------------------+
> | PHP Version 5 |
> @@ -1269,7 +1275,7 @@ struct st_madb_os_charset MADB_OS_CHARSET[]=
> {"57010", "ISCII Gujarati", NULL, NULL, MADB_CS_UNSUPPORTED},
> {"57011", "ISCII Punjabi", NULL, NULL, MADB_CS_UNSUPPORTED},
> {"65000", "utf-7 Unicode (UTF-7)", NULL, NULL, MADB_CS_UNSUPPORTED},
> - {"65001", "utf-8 Unicode (UTF-8)", "utf8", NULL, MADB_CS_EXACT},
> + {"65001", "utf-8 Unicode (UTF-8)", "utf8mb3", NULL, MADB_CS_EXACT},
No, keep this utf8, it's still a valid charset name; the server can
figure out what to map it to.
> /* non Windows */
> #else
> /* iconv encodings */
> @@ -1337,8 +1343,8 @@ struct st_madb_os_charset MADB_OS_CHARSET[]=
> {"gb2312", "GB2312", "gb2312", "GB2312", MADB_CS_EXACT},
> {"gbk", "GBK", "gbk", "GBK", MADB_CS_EXACT},
> {"georgianps", "Georgian", "geostd8", "GEORGIAN-PS", MADB_CS_EXACT},
> - {"utf8", "UTF8", "utf8", "UTF-8", MADB_CS_EXACT},
> - {"utf-8", "UTF8", "utf8", "UTF-8", MADB_CS_EXACT},
> + {"utf8mb3", "UTF8MB3", "utf8mb3", "UTF-8", MADB_CS_EXACT},
> + {"utf-8", "UTF8MB3", "utf8mb3", "UTF-8", MADB_CS_EXACT},
same here
> #endif
> {NULL, NULL, NULL, NULL, 0}
> };
> @@ -1361,8 +1367,8 @@ const char *madb_get_os_character_set()
> return MADB_DEFAULT_CHARSET_NAME;
> while (MADB_OS_CHARSET[i].identifier)
> {
> - if (MADB_OS_CHARSET[i].supported > MADB_CS_UNSUPPORTED &&
> - strcasecmp(MADB_OS_CHARSET[i].identifier, p) == 0)
> + if ((MADB_OS_CHARSET[i].supported > MADB_CS_UNSUPPORTED &&
> + strcasecmp(MADB_OS_CHARSET[i].identifier, p) == 0) || IS_UTF8(p))
why?
> return MADB_OS_CHARSET[i].charset;
> i++;
> }
> diff --git a/unittest/libmariadb/basic-t.c b/unittest/libmariadb/basic-t.c
> index c22e6c2b..e2943964 100644
> --- a/unittest/libmariadb/basic-t.c
> +++ b/unittest/libmariadb/basic-t.c
> @@ -310,7 +310,8 @@ static int use_utf8(MYSQL *my)
>
> while ((row= mysql_fetch_row(res)) != NULL)
> {
> - FAIL_IF(strcmp(row[0], "utf8"), "wrong character set");
> + FAIL_IF(strcmp(row[0], get_utf8_name(mysql_get_server_version(my),"utf8")),
> + "wrong character set");
technically, C/C is a separate project and can run on any server with any
config file. So it'd be safer to check that row[0] starts with utf8,
and not assume that it depends on the server version in a specific way.
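For example, a version-agnostic prefix check could look like this (a sketch; `starts_with_utf8` is a hypothetical helper, and adapting it to the test would mean something like `FAIL_IF(!starts_with_utf8(row[0]), "wrong character set")` with the suite's existing FAIL_IF macro):

```cpp
#include <cstring>

// Accepts utf8, utf8mb3 and utf8mb4 alike, so the unit test keeps
// passing regardless of which name the server reports.
static bool starts_with_utf8(const char *csname)
{
  return std::strncmp(csname, "utf8", 4) == 0;
}
```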
> }
> FAIL_IF(mysql_errno(my), mysql_error(my));
> mysql_free_result(res);
> diff --git a/unittest/libmariadb/charset.c b/unittest/libmariadb/charset.c
> index 898b6dad..ffa877bc 100644
> --- a/unittest/libmariadb/charset.c
> +++ b/unittest/libmariadb/charset.c
> @@ -71,14 +71,20 @@ int bug_8378(MYSQL *mysql) {
> int test_client_character_set(MYSQL *mysql)
> {
> MY_CHARSET_INFO cs;
> + char collation_name[19];
> char *csname= (char*) "utf8";
> char *csdefault= (char*)mysql_character_set_name(mysql);
>
> + strcpy(collation_name,(const char*)get_utf8_name(mysql_get_server_version(mysql),
> + "utf8_general_ci"));
> +
This one is simpler. It only tests that mysql_set_character_set() works.
Just don't use utf8, make it test on something else, e.g. on latin2.
> FAIL_IF(mysql_set_character_set(mysql, csname), mysql_error(mysql));
>
> mysql_get_character_set_info(mysql, &cs);
>
> - FAIL_IF(strcmp(cs.csname, "utf8") || strcmp(cs.name, "utf8_general_ci"), "Character set != UTF8");
> + FAIL_IF(strcmp(cs.csname, get_utf8_name(mysql_get_server_version(mysql),"utf8")) ||
> + strcmp(cs.name, collation_name),
> + "Wrong UTF8 characterset");
> FAIL_IF(mysql_set_character_set(mysql, csdefault), mysql_error(mysql));
>
> return OK;
> @@ -537,6 +544,9 @@ static int test_bug30472(MYSQL *mysql)
>
> SKIP_MAXSCALE;
>
> + strcpy(collation_name,(const char*)get_utf8_name(mysql_get_server_version(mysql),
> + "utf8_general_ci"));
> +
same here, the bug is https://bugs.mysql.com/bug.php?id=30472
"libmysql doesn't reset charset, insert_id after succ. mysql_change_user() call"
so, does not need utf8 specifically. Change it to some easier to use
charset.
> if (mysql_get_server_version(mysql) < 50100 || !is_mariadb)
> {
> diag("Test requires MySQL Server version 5.1 or above");
> diff --git a/unittest/libmariadb/connection.c b/unittest/libmariadb/connection.c
> index 70d347ce..eb9b39bb 100644
> --- a/unittest/libmariadb/connection.c
> +++ b/unittest/libmariadb/connection.c
> @@ -644,9 +644,8 @@ int test_conc26(MYSQL *unused __attribute__((unused)))
>
> FAIL_IF(my_test_connect(mysql, hostname, "notexistinguser", "password", schema, port, NULL, CLIENT_REMEMBER_OPTIONS),
> "Error expected");
> -
> - FAIL_IF(!mysql->options.charset_name || strcmp(mysql->options.charset_name, "utf8") != 0,
> - "expected charsetname=utf8");
> + FAIL_IF(!mysql->options.charset_name || strcmp(mysql->options.charset_name, "utf8") != 0,
> + "Wrong utf8 characterset for this version");
again, CONC-26 is "CLIENT_REMEMBER_OPTIONS flag missing"
it doesn't apparently need utf8 specifically, so just use a different
non-default charset there.
> mysql_close(mysql);
>
> mysql= mysql_init(NULL);
> @@ -981,7 +980,8 @@ static int test_sess_track_db(MYSQL *mysql)
> printf("# SESSION_TRACK_VARIABLES: %*.*s\n", (int)len, (int)len, data);
> } while (!mysql_session_track_get_next(mysql, SESSION_TRACK_SYSTEM_VARIABLES, &data, &len));
> diag("charset: %s", mysql->charset->csname);
> - FAIL_IF(strcmp(mysql->charset->csname, "utf8"), "Expected charset 'utf8'");
> + FAIL_IF(strcmp(mysql->charset->csname, get_utf8_name(mysql_get_server_version(mysql),"utf8")),
> + "Wrong utf8 characterset for this version");
same here
>
> rc= mysql_query(mysql, "SET NAMES latin1");
> check_mysql_rc(rc, mysql);
> diff --git a/unittest/libmariadb/my_test.h b/unittest/libmariadb/my_test.h
> index c30d1b6d..a040c3d9 100644
> --- a/unittest/libmariadb/my_test.h
> +++ b/unittest/libmariadb/my_test.h
> @@ -701,3 +701,23 @@ void run_tests(struct my_tests_st *test) {
> }
> }
>
> +static inline const char* get_utf8_name(unsigned long server_version,
> + const char* name)
> +{
> + const char *csname= server_version >= 100600 ? "utf8mb3" : "utf8";
> + char *corrected_name= malloc(19*sizeof(char));
> + corrected_name[18]='\0';
> +
> + if (!strchr(name, '_'))
> + {
> + strcpy(corrected_name,csname);
> + corrected_name[strlen(csname)]='\0';
> + }
> + else
> + {
> + strcpy(corrected_name,csname);
> + strcat(corrected_name,"_general_ci");
> + corrected_name[strlen(csname)+11]= '\0';
> + }
> + return (const char*)corrected_name;
> +}
shouldn't be needed
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 21a84e23cf3: MDEV-16437: merge 5.7 P_S replication instrumentation and tables
by Sergei Golubchik 09 Apr '21
Hi, Sujatha!
Note, it's a review of the combined `git diff e5fc78 21a84e`
On Mar 31, Sujatha wrote:
> revision-id: 21a84e23cf3 (mariadb-10.5.2-307-g21a84e23cf3)
> parent(s): 4e6de585ff7
> author: Sujatha <sujatha.sivakumar(a)mariadb.com>
> committer: Sujatha <sujatha.sivakumar(a)mariadb.com>
> timestamp: 2020-11-30 13:22:32 +0530
> message:
>
> MDEV-16437: merge 5.7 P_S replication instrumentation and tables
That was pretty good, thanks!
Just few small comments below, no big changes will be needed.
> diff --git a/mysql-test/suite/multi_source/simple.result b/mysql-test/suite/multi_source/simple.result
> index a66d49e88cb..b57e146feb5 100644
> --- a/mysql-test/suite/multi_source/simple.result
> +++ b/mysql-test/suite/multi_source/simple.result
> @@ -21,7 +21,21 @@ show all slaves status;
> Connection_name Slave_SQL_State Slave_IO_State Master_Host Master_User Master_Port Connect_Retry Master_Log_File Read_Master_Log_Pos Relay_Log_File Relay_Log_Pos Relay_Master_Log_File Slave_IO_Running Slave_SQL_Running Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table Last_Errno Last_Error Skip_Counter Exec_Master_Log_Pos Relay_Log_Space Until_Condition Until_Log_File Until_Log_Pos Master_SSL_Allowed Master_SSL_CA_File Master_SSL_CA_Path Master_SSL_Cert Master_SSL_Cipher Master_SSL_Key Seconds_Behind_Master Master_SSL_Verify_Server_Cert Last_IO_Errno Last_IO_Error Last_SQL_Errno Last_SQL_Error Replicate_Ignore_Server_Ids Master_Server_Id Master_SSL_Crl Master_SSL_Crlpath Using_Gtid Gtid_IO_Pos Replicate_Do_Domain_Ids Replicate_Ignore_Domain_Ids Parallel_Mode SQL_Delay SQL_Remaining_Delay Slave_SQL_Running_State Slave_DDL_Groups Slave_Non_Transactional_Groups Slave_Transactional_Groups Retried_transactions Max_relay_log_size Executed_log_entries Slave_received_heartbeats Slave_heartbeat_period Gtid_Slave_Pos
> slave1 Slave has read all relay log; waiting for more updates Waiting for master to send event 127.0.0.1 root MYPORT_1 60 master-bin.000001 <read_master_log_pos> mysqld-relay-bin-slave1.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> <relay_log_space1> None 0 No 0 No 0 0 1 No optimistic 0 NULL Slave has read all relay log; waiting for more updates 0 0 0 0 1073741824 7 0 60.000
> slave2 Slave has read all relay log; waiting for more updates Waiting for master to send event 127.0.0.1 root MYPORT_2 60 master-bin.000001 <read_master_log_pos> mysqld-relay-bin-slave2.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> <relay_log_space1> None 0 No 0 No 0 0 2 No optimistic 0 NULL Slave has read all relay log; waiting for more updates 0 0 0 0 1073741824 7 0 60.000
> +#
> +# MDEV:16437: merge 5.7 P_S replication instrumentation and tables
> +#
> +select * from performance_schema.replication_connection_configuration;
> +CONNECTION_NAME HOST PORT USER USING_GTID SSL_ALLOWED SSL_CA_FILE SSL_CA_PATH SSL_CERTIFICATE SSL_CIPHER SSL_KEY SSL_VERIFY_SERVER_CERTIFICATE SSL_CRL_FILE SSL_CRL_PATH CONNECTION_RETRY_INTERVAL CONNECTION_RETRY_COUNT HEARTBEAT_INTERVAL IGNORE_SERVER_IDS REPL_DO_DOMAIN_IDS REPL_IGNORE_DOMAIN_IDS
> +slave2 # # root NO NO NO 60 86400 60.000
> +slave1 # # root NO NO NO 60 86400 60.000
Isn't the host always 127.0.0.1? Why mask it?
Also, maybe it'd be easier to read the result if you printed it vertically?
It looks that CONNECTION_RETRY_COUNT is 86400.
And 86400 is clearly a timeout, not a retry count.
> start all slaves;
> +#
> +# MDEV:16437: merge 5.7 P_S replication instrumentation and tables
> +#
> +select * from performance_schema.replication_applier_status_by_coordinator;
> +CONNECTION_NAME THREAD_ID SERVICE_STATE LAST_SEEN_TRANSACTION LAST_ERROR_NUMBER LAST_ERROR_MESSAGE LAST_ERROR_TIMESTAMP LAST_TRANS_RETRY_COUNT
> +slave2 # ON 0 0000-00-00 00:00:00 0
> +slave1 # ON 0 0000-00-00 00:00:00 0
> stop slave 'slave1';
> show slave 'slave1' status;
> Slave_IO_State
> diff --git a/mysql-test/suite/rpl/include/rpl_deadlock.test b/mysql-test/suite/rpl/include/rpl_deadlock.test
> index e9191d5fcd8..bccbe044a36 100644
> --- a/mysql-test/suite/rpl/include/rpl_deadlock.test
> +++ b/mysql-test/suite/rpl/include/rpl_deadlock.test
> @@ -59,6 +59,16 @@ let $status_var_comparsion= >;
> connection slave;
> SELECT COUNT(*) FROM t2;
> COMMIT;
> +
> +--echo
> +--echo # Test that the performance schema coulumn shows > 0 values.
> +--echo
> +
> +--let $assert_text= current number of retries should be more than the value saved before deadlock.
> +--let $assert_cond= [SELECT COUNT_TRANSACTIONS_RETRIES FROM performance_schema.replication_applier_status, COUNT_TRANSACTIONS_RETRIES, 1] > "$slave_retried_transactions"
> +--source include/assert.inc
what's wrong with simple
SELECT COUNT_TRANSACTIONS_RETRIES > $slave_retried_transactions FROM performance_schema.replication_applier_status
?
> +
> +source include/check_slave_is_running.inc;
> sync_with_master;
>
> # Check the data
> diff --git a/mysql-test/suite/rpl/r/rpl_perfschema_connect_config.result b/mysql-test/suite/rpl/r/rpl_perfschema_connect_config.result
> new file mode 100644
> index 00000000000..4ace84ffac4
> --- /dev/null
> +++ b/mysql-test/suite/rpl/r/rpl_perfschema_connect_config.result
> @@ -0,0 +1,124 @@
> +include/master-slave.inc
> +[connection master]
> +# Asserted this: On master, the table should return an empty set.
> +connection slave;
> +
> +# Verify that SELECT works for every field and produces an output
> +# similar to the corresponding field in SHOW SLAVE STATUS(SSS).
> +
> +include/assert.inc [Value returned by SSS and PS table for Host should be same.]
> +include/assert.inc [Value returned by SSS and PS table for Port should be same.]
> +include/assert.inc [Value returned by SSS and PS table for User should be same.]
> +include/assert.inc [Value returned by SSS and PS table for Using_Gtid should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_Allowed should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_CA_File should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_CA_Path should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_Certificate should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_Cipher should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_Key should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_Verify_Server_Certificate should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_Crl_File should be same.]
> +include/assert.inc [Value returned by SSS and PS table for SSL_Crl_Path should be same.]
> +include/assert.inc [Value returned by SSS and PS table for Connection_Retry_Interval should be same.]
> +include/assert.inc [Value returned by PS table for Connection_Retry_Count should be 10.]
this is just unreadable
> +
> +# Heartbeat_Interval is part of I_S and P_S. We will compare the
> +# two to make sure both match.
...
> diff --git a/mysql-test/suite/rpl/t/rpl_perfschema_applier_config.test b/mysql-test/suite/rpl/t/rpl_perfschema_applier_config.test
> new file mode 100644
> index 00000000000..132d9912222
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_perfschema_applier_config.test
> @@ -0,0 +1,96 @@
> +# ==== Purpose ====
> +#
> +# This test script serves as the functionality testing for the table
> +# performance_schema.replication_applier_configuration. Test for ddl and dml
> +# operations is a part of the perfschema suite. The ddl/dml tests are named:
> +# 1) ddl_replication_applier_configuration.test and
> +# 2) dml_replication_applier_configuration.test.
> +#
> +# The follwing scenarios are tested in this script:
> +#
> +# - Verify that output is same as SSS on a fresh slave.
> +# - Verify that the value of this field is correct after STOP SLAVE.
> +# - Verify that, when desired delay is set, the value is shown corectly.
> +# - Verify that the value is preserved after STOP SLAVE.
> +# - Verify that, when desired delay is reset, the value is shown corectly.
> +#
> +# ==== Related Worklog ====
> +#
> +# MDEV-16437: merge 5.7 P_S replication instrumentation and tables
> +#
> +
> +source include/master-slave.inc;
> +source include/have_binlog_format_mixed.inc;
master-slave should be included last
> +
> +let $assert_text= On master, the table should return an empty set.;
> +let $assert_cond= count(*) = 0 from performance_schema.replication_applier_configuration;
> +source include/assert.inc;
again, what's wrong with just
select * from performance_schema.replication_applier_configuration;
or, may be, if you want to be very explicit
select count(*) as 'must be 0' from performance_schema.replication_applier_configuration;
> +
> +--connection slave
> +
> +--echo
> +--echo # Verify that SELECT works and produces an output similar to
> +--echo # the corresponding field in SHOW SLAVE STATUS(SSS) in all scenarios.
> +--echo
> +
> +--echo
> +--echo # Verify that output is same as SSS on a fresh slave.
> +--echo
> +
> +let $sss_value= query_get_value(SHOW SLAVE STATUS, SQL_Delay, 1);
> +let $ps_value= query_get_value(select Desired_Delay from performance_schema.replication_applier_configuration, Desired_Delay, 1);
> +let $assert_text= Value returned by SSS and PS table for Desired_Delay should be same.;
> +let $assert_cond= "$sss_value" = "$ps_value";
> +source include/assert.inc;
I'll stop commenting on every assert.inc, but please, please, stop overusing them.
> +
> +--echo
> +--echo # Verify that the value of this field is correct after STOP SLAVE.
> +--echo
> +
> +source include/stop_slave.inc;
> +
> +let $sss_value= query_get_value(SHOW SLAVE STATUS, SQL_Delay, 1);
> +let $ps_value= query_get_value(select Desired_Delay from performance_schema.replication_applier_configuration, Desired_Delay, 1);
> +let $assert_text= Value returned by SSS and PS table for Desired_Delay should be same.;
> +let $assert_cond= "$sss_value" = "$ps_value";
> +source include/assert.inc;
> +
> +--echo
> +--echo # Verify that, when desired delay is set, the value is shown corectly.
> +--echo
> +
> +eval change master to master_delay= 2;
> +source include/start_slave.inc;
> +
> +let $sss_value= query_get_value(SHOW SLAVE STATUS, SQL_Delay, 1);
> +let $ps_value= query_get_value(select Desired_Delay from performance_schema.replication_applier_configuration, Desired_Delay, 1);
> +let $assert_text= Value returned by SSS and PS table for Desired_Delay should be same.;
> +let $assert_cond= "$sss_value" = "$ps_value";
> +source include/assert.inc;
> +
> +--echo
> +--echo # Verify that the value is preserved after STOP SLAVE.
> +--echo
> +
> +source include/stop_slave.inc;
> +
> +let $ss_value= query_get_value(SHOW SLAVE STATUS, SQL_Delay, 1);
> +let $ps_value= query_get_value(select Desired_Delay from performance_schema.replication_applier_configuration, Desired_Delay, 1);
> +let $assert_text= Value returned by SSS and PS table for Desired_Delay should be same.;
> +let $assert_cond= "$sss_value" = "$ps_value";
> +source include/assert.inc;
> +
> +--echo
> +--echo # Verify that, when desired delay is reset, the value is shown corectly.
> +--echo
> +
> +eval change master to master_delay= 0;
> +source include/start_slave.inc;
> +
> +let $sss_value= query_get_value(SHOW SLAVE STATUS, SQL_Delay, 1);
> +let $ps_value= query_get_value(select Desired_Delay from performance_schema.replication_applier_configuration, Desired_Delay, 1);
> +let $assert_text= Value returned by SSS and PS table for Desired_Delay should be same.;
> +let $assert_cond= "$sss_value" = "$ps_value";
> +source include/assert.inc;
> +
> +source include/rpl_end.inc;
> diff --git a/mysql-test/suite/rpl/t/rpl_perfschema_applier_status.test b/mysql-test/suite/rpl/t/rpl_perfschema_applier_status.test
> new file mode 100644
> index 00000000000..52ee14cef2a
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_perfschema_applier_status.test
> @@ -0,0 +1,72 @@
> +# ==== Purpose ====
> +#
> +# This test script serves as the functionality testing for the table
> +# performance_schema.replication_applier_status. Test for ddl and dml
> +# operations is a part of the perfschema suite. The ddl/dml tests are named:
> +# 1) ddl_replication_applier_status.test and
> +# 2) dml_replication_applier_status.test.
> +#
> +# The follwing scenarios are tested in this script:
> +#
> +# - Verify that output is same as SSS on a fresh slave.
> +# - Verify that the value of this field is correct after STOP SLAVE.
> +# - Remaining delay is not tested.
> +# - Count_trnsaction is partially tested here making sure it can be queried.
> +# More testing in rpl/include/rpl_deadlock.test
> +#
> +# ==== Related Worklog ====
> +#
> +# MDEV-16437: merge 5.7 P_S replication instrumentation and tables
> +#
> +
> +source include/master-slave.inc;
> +source include/have_binlog_format_mixed.inc;
master-slave must always be last (I'll stop commenting on that too)
> +
> +let $assert_text= On master, the table should return an empty set.;
> +let $assert_cond= count(*) = 0 from performance_schema.replication_applier_status;
> +source include/assert.inc;
> +
> +--connection slave
> +
> +--echo
> +--echo # Verify that SELECT works and produces an output similar to
> +--echo # the corresponding field in SHOW SLAVE STATUS(SSS) in all scenarios.
> +--echo
> +
> +--echo
> +--echo # Verify that output is same as SSS on a fresh slave.
> +--echo
> +
> +let $sss_value= query_get_value(SHOW SLAVE STATUS, Slave_SQL_Running, 1);
> +let $ps_value= query_get_value(select Service_State from performance_schema.replication_applier_status, Service_State, 1);
> +let $assert_text= SSS shows Slave_SQL_Running as "Yes". So, Service_State from this PS table should be "ON".;
> +let $assert_cond= "$sss_value" = "Yes" AND "$ps_value"= "ON";
> +source include/assert.inc;
> +
> +let $ss_value= query_get_value(SHOW STATUS LIKE 'Slave_retried_transactions', Value, 1);
> +let $ps_value= query_get_value(select count_transactions_retries from performance_schema.replication_applier_status, count_transactions_retries, 1);
> +let $assert_text= COUNT_TRANSACTION_RETRIES should be equal to Slave_retried_transactions.;
> +let $assert_cond= "$ps_value"= "$ss_value";
> +source include/assert.inc;
> +
> +--echo
> +--echo # Verify that the fields show the correct values after STOP SLAVE.
> +--echo
> +
> +source include/stop_slave.inc;
> +
> +let $sss_value= query_get_value(SHOW SLAVE STATUS, Slave_SQL_Running, 1);
> +let $ps_value= query_get_value(select Service_State from performance_schema.replication_applier_status, Service_State, 1);
> +let $assert_text= SSS shows Slave_SQL_Running as "No". So, Service_State from this PS table should be "OFF".;
> +let $assert_cond= "$sss_value" = "No" AND "$ps_value"= "OFF";
> +source include/assert.inc;
> +
> +let $ss_value= query_get_value(SHOW STATUS LIKE 'Slave_retried_transactions', Value, 1);
> +let $ps_value= query_get_value(select count_transactions_retries from performance_schema.replication_applier_status, count_transactions_retries, 1);
> +let $assert_text= COUNT_TRANSACTION_RETRIES should be equal to Slave_retried_transactions.;
> +let $assert_cond= "$ps_value"= "$ss_value";
> +source include/assert.inc;
> +
> +source include/start_slave.inc;
> +source include/rpl_end.inc;
> +
> diff --git a/sql/rpl_mi.h b/sql/rpl_mi.h
> index 4d47689ac18..946d138d618 100644
> --- a/sql/rpl_mi.h
> +++ b/sql/rpl_mi.h
> @@ -50,13 +57,6 @@ class Domain_id_filter
> */
> DYNAMIC_ARRAY m_domain_ids[2];
>
> -public:
> - /* domain id list types */
> - enum enum_list_type {
> - DO_DOMAIN_IDS= 0,
> - IGNORE_DOMAIN_IDS
> - };
> -
Why did you move it up? Does it make any difference?
> Domain_id_filter();
>
> ~Domain_id_filter();
> diff --git a/storage/perfschema/table_replication_applier_status_by_coordinator.cc b/storage/perfschema/table_replication_applier_status_by_coordinator.cc
> index beb8620b240..30cf56ce0c2 100644
> --- a/storage/perfschema/table_replication_applier_status_by_coordinator.cc
> +++ b/storage/perfschema/table_replication_applier_status_by_coordinator.cc
> @@ -55,12 +55,14 @@ table_replication_applier_status_by_coordinator::m_share=
> sizeof(pos_t), /* ref length */
> &m_table_lock,
> { C_STRING_WITH_LEN("CREATE TABLE replication_applier_status_by_coordinator("
> - "CHANNEL_NAME CHAR(64) collate utf8_general_ci not null,"
> + "CONNECTION_NAME VARCHAR(256) collate utf8_general_ci not null,"
please, don't rename existing columns, let's keep calling it CHANNEL_NAME everywhere.
> "THREAD_ID BIGINT UNSIGNED,"
> "SERVICE_STATE ENUM('ON','OFF') not null,"
> + "LAST_SEEN_TRANSACTION CHAR(57) not null,"
I think it's better to put new columns at the end.
> "LAST_ERROR_NUMBER INTEGER not null,"
> "LAST_ERROR_MESSAGE VARCHAR(1024) not null,"
> - "LAST_ERROR_TIMESTAMP TIMESTAMP(0) not null)") },
> + "LAST_ERROR_TIMESTAMP TIMESTAMP(0) not null,"
> + "LAST_TRANS_RETRY_COUNT INTEGER not null)") },
> false /* perpetual */
> };
>
> @@ -104,15 +106,7 @@ int table_replication_applier_status_by_coordinator::rnd_next(void)
> {
> mi= (Master_info *)my_hash_element(&master_info_index->master_info_hash, m_pos.m_index);
>
> - /*
> - Construct and display SQL Thread's (Coordinator) information in
> - 'replication_applier_status_by_coordinator' table only in the case of
> - multi threaded slave mode. Code should do nothing in the case of single
> - threaded slave mode. In case of single threaded slave mode SQL Thread's
> - status will be reported as part of
> - 'replication_applier_status_by_worker' table.
> - */
is this no longer true?
> - if (mi && mi->host[0] && /*mi->rli.get_worker_count() > */ 0)
> + if (mi && mi->host[0])
> {
> make_row(mi);
> m_next_pos.set_after(&m_pos);
> @@ -147,13 +141,20 @@ int table_replication_applier_status_by_coordinator::rnd_pos(const void *pos)
> void table_replication_applier_status_by_coordinator::make_row(Master_info *mi)
> {
> m_row_exists= false;
> + rpl_gtid gtid;
> + char buf[10+1+10+1+20+1];
> + String str(buf, sizeof(buf), system_charset_info);
There's a special template for it:
StringBuffer<10+1+10+1+20+1> str;
> + bool first= true;
> +
> + str.length(0);
not needed if you use StringBuffer<>
>
> DBUG_ASSERT(mi != NULL);
>
> mysql_mutex_lock(&mi->rli.data_lock);
>
> - m_row.channel_name_length= static_cast<uint>(mi->connection_name.length);
> - memcpy(m_row.channel_name, mi->connection_name.str, m_row.channel_name_length);
> + gtid= mi->rli.last_seen_gtid;
> + m_row.connection_name_length= static_cast<uint>(mi->connection_name.length);
> + memcpy(m_row.connection_name, mi->connection_name.str, m_row.connection_name_length);
>
> if (mi->rli.slave_running)
> {
> @@ -175,6 +176,18 @@ void table_replication_applier_status_by_coordinator::make_row(Master_info *mi)
> else
> m_row.service_state= PS_RPL_NO;
>
> + if ((gtid.seq_no > 0 &&
> + !rpl_slave_state_tostring_helper(&str, &gtid, &first)))
> + {
> + strmake(m_row.last_seen_transaction,str.ptr(), str.length());
> + m_row.last_seen_transaction_length= str.length();
> + }
> + else
> + {
> + m_row.last_seen_transaction_length= 0;
> + memcpy(m_row.last_seen_transaction, "", 1);
a strange way to write m_row.last_seen_transaction[0]= 0, but ok :)
> + }
> +
> mysql_mutex_lock(&mi->rli.err_lock);
>
> m_row.last_error_number= (long int) mi->rli.last_error().number;
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
[Maria-developers] JSON_TABLE: Why one can't just use Name_resolution_context
by Sergey Petrunia 09 Apr '21
Hello,
There was this question raised:
Why can't JSON_TABLE code just use the current Name_resolution_context objects?
== Context ==
Name_resolution_context object has these two members that specify a subset of
tables to use:
TABLE_LIST *first_name_resolution_table;
TABLE_LIST *last_name_resolution_table;
JSON_TABLE code also introduces this one:
table_map ignored_tables;
With the meaning that the name resolution process will not search for the field of
interest in the tables listed in the bitmap.
== Rationale ==
For a JSON_TABLE invocation
select ... from json_table(ARG, ...) as JT ....
the ARG may refer to any table that precedes JT in the from clause. Let's call
the set of such tables ALLOWED_DEPS.
Can one take a JSON_TABLE invocation and pick
first_name_resolution_table=FIRST,
last_name_resolution_table=LAST
such that
ALLOWED_DEPS = { FIRST,
SECOND= FIRST->next_name_resolution_table,
THIRD= SECOND->next_name_resolution_table,
...
LAST
}
?
I claim that there are cases where this is not possible.
== An example ==
select * from
t1,
(t2 natural join
(json_table('{}', '$' COLUMNS (d for ordinality)) as JT
natural join
t3
)
);
Here, JT's ALLOWED_DEPS={t1, t2}.
Looking at the data structures (each Table name denotes a TABLE_LIST object
referring to the table)
select_lex.name_resolution_context.first_name_resolution_table= t1
t1->next_name_resolution_table= $join_nest1
$join_nest1->next_name_resolution_table= NULL
$join_nest1->nested_join.join_list= { $join_nest2, t2 }
t2->next_name_resolution_table= $join_nest2
$join_nest2->next_name_resolution_table= NULL
$join_nest2->nested_join.join_list= {t3, JT}
JT->next_name_resolution_table= t3
t3->next_name_resolution_table= NULL
Apparently there is no single chain that includes t1 and t2.
== Possible objections ==
"But nested (outer) joins do it"
Nested outer joins never need to pick a subset that includes only
- one of the tables in a NATURAL JOIN
- one of the tables outside that NATURAL JOIN.
so there's no comparison.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 08 Apr '21
Hi, Andrei!
On Apr 07, Andrei Elkin wrote:
> Sergei, salve.
>
> I should've asked for clarification to
>
> >>> + }
> >>> + } else
> >>
> >> add into the else branch something like
> >>
> >> /* this branch is only for the legacy TC_LOG_MMAP */
> >> compile_time_assert(sizeof(TC_LOG_MMAP) > 1 );
> >
> > [x]
>
> Actually the branch, which is
>
> } else
> if (IF_WSREP((wsrep_emulate_bin_log &&
> wsrep_is_wsrep_xid(info->list + i) &&
> x <= wsrep_limit), false) ||
> (info->commit_list ?
> my_hash_search(info->commit_list, (uchar *)&x, sizeof(x)) != 0 :
> tc_heuristic_recover == TC_HEURISTIC_RECOVER_COMMIT))
> {
> #ifndef DBUG_OFF
> int rc=
> #endif
> hton->commit_by_xid(hton, info->list+i);
>
> deals with tc_heuristic_recover.
>
> I am not sure what you meant.
I wanted to make sure all TC_LOG_MMAP code is removed when TC_LOG_MMAP
itself goes away. But apparently I overlooked the heuristic recover
path.
So this code deals with both heuristic recover and TC_LOG_MMAP.
Okay then, forget that comment.
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org