Re: dbe5c20b755: MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication
Hi Marko, I'd like to ask your opinion/review of the below patch for MDEV-31482: https://github.com/MariaDB/server/commit/dbe5c20b755b87f67d87990c3baabc86666... https://jira.mariadb.org/browse/MDEV-31482 As you know, in-order parallel replication has InnoDB report lock waits. If T1 must replicate before T2, and InnoDB reports that T1 is waiting for a lock held by T2, then T2 will be killed (and retried later) so that T1 can replicate. As an optimisation, auto-increment lock waits where _not_ reported, but this turns out to be a bug. This patch just removes the exception so that such waits are reported like other waits. My only concern is the possible performance impact of this. I'm hoping it will not be very big, because in many cases the auto-increment lock is not taken for replicated transactions? My understanding is: - It is not taken for INSERT INTO t VALUES (...) (and maybe not for multi-row "INSERT ... VALUES (...), (...), ..." ?) - SELECT ... INSERT is not safe for statement-based replication, so will be logged as row in MIXED mode and give a warning in STATEMENT. But I wanted to hear you if you know any case and/or MariaDB version where a significant amount of auto-increment lock waits could be expected in parallel replication? And if so, maybe this fix should be pushed only to later MariaDB versions? - Kristian. Kristian Nielsen <knielsen@knielsen-hq.org> writes:
revision-id: dbe5c20b755b87f67d87990c3baabc866667e41b (mariadb-10.4.30-2-gdbe5c20b755) parent(s): a6114df595eeb7aeed4b050c9a3f4640c4320b5f author: Kristian Nielsen committer: Kristian Nielsen timestamp: 2023-07-09 16:45:47 +0200 message:
MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication
Remove the exception that InnoDB does not report auto-increment locks waits to the parallel replication.
There was an assumption that these waits could not cause conflicts with in-order parallel replication and thus need not be reported. However, this assumption is wrong and it is possible to get conflicts that lead to hangs for the duration of --innodb-lock-wait-timeout. This can be seen with three transactions:
1. T1 is waiting for T3 on an autoinc lock 2. T2 is waiting for T1 to commit 3. T3 is waiting on a normal row lock held by T2
Here, T3 needs to be deadlock killed on the wait by T1.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
--- mysql-test/suite/rpl/r/rpl_parallel_autoinc.result | 95 ++++++++++++++ mysql-test/suite/rpl/t/rpl_parallel_autoinc.test | 140 +++++++++++++++++++++ sql/sql_class.cc | 6 - storage/innobase/lock/lock0lock.cc | 8 +- 4 files changed, 236 insertions(+), 13 deletions(-)
diff --git a/mysql-test/suite/rpl/r/rpl_parallel_autoinc.result b/mysql-test/suite/rpl/r/rpl_parallel_autoinc.result new file mode 100644 index 00000000000..c1829bafa1a --- /dev/null +++ b/mysql-test/suite/rpl/r/rpl_parallel_autoinc.result @@ -0,0 +1,95 @@ +include/master-slave.inc +[connection master] +MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication +include/rpl_connect.inc [creating slave2] +include/rpl_connect.inc [creating slave3] +connection master; +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB; +CREATE TABLE t1 (a INT PRIMARY KEY AUTO_INCREMENT, b INT, c INT, INDEX (c)) ENGINE=InnoDB; +INSERT INTO t1 (b,c) VALUES (0, 1), (0, 1), (0, 2), (0,3), (0, 5), (0, 7), (0, 8); +CREATE TABLE t2 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; +INSERT INTO t2 VALUES (10,1), (20,2), (30,3), (40,4), (50,5); +CREATE TABLE t3 (a VARCHAR(20) PRIMARY KEY, b INT) ENGINE=InnoDB; +INSERT INTO t3 VALUES ('row for T1', 0), ('row for T2', 0), ('row for T3', 0); +include/save_master_gtid.inc +connection slave; +include/sync_with_master_gtid.inc +include/stop_slave.inc +set @@global.slave_parallel_threads= 3; +set @@global.slave_parallel_mode= OPTIMISTIC; +set @@global.innodb_lock_wait_timeout= 20; +connection master; +BEGIN; +UPDATE t3 SET b=b+1 where a="row for T1"; +INSERT INTO t1(b, c) SELECT 1, t2.b FROM t2 WHERE a=10; +Warnings: +Note 1592 Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave +COMMIT; +DELETE FROM t1 WHERE c >= 4 and c < 6; +BEGIN; +UPDATE t3 SET b=b+1 where a="row for T3"; +INSERT INTO t1(b, c) SELECT 3, t2.b FROM t2 WHERE a >= 20 AND a <= 40; +Warnings: +Note 1592 Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave +COMMIT; +include/save_master_gtid.inc +connection slave1; +BEGIN; +SELECT * FROM t3 WHERE a="row for T1" FOR UPDATE; +a b +row for T1 0 +connection slave2; +BEGIN; +SELECT * FROM t3 WHERE a="row for T3" FOR UPDATE; +a b +row for T3 0 +connection slave3; +BEGIN; +DELETE FROM t2 WHERE a=30; +connection slave; +include/start_slave.inc +connection slave2; +ROLLBACK; +connection slave1; +ROLLBACK; +connection slave3; +ROLLBACK; +connection slave; +include/sync_with_master_gtid.inc +SELECT * FROM t1 ORDER BY a; +a b c +1 0 1 +2 0 1 +3 0 2 +4 0 3 +6 0 7 +7 0 8 +8 1 1 +9 3 2 +10 3 3 +11 3 4 +SELECT * FROM t2 ORDER BY a; +a b +10 1 +20 2 +30 3 +40 4 +50 5 +SELECT * FROM t3 ORDER BY a; +a b +row for T1 1 +row for T2 0 +row for T3 1 +connection master; +CALL mtr.add_suppression("Unsafe statement written to the binary log using statement format"); +DROP TABLE t1, t2, t3; +connection slave; +include/stop_slave.inc +SET @@global.slave_parallel_threads= 0; +SET @@global.slave_parallel_mode= conservative; +SET @@global.innodb_lock_wait_timeout= 50; +include/start_slave.inc +SELECT @@GLOBAL.innodb_autoinc_lock_mode; +@@GLOBAL.innodb_autoinc_lock_mode +1 +include/rpl_end.inc diff --git a/mysql-test/suite/rpl/t/rpl_parallel_autoinc.test b/mysql-test/suite/rpl/t/rpl_parallel_autoinc.test new file mode 100644 index 00000000000..0e96b4dfb80 --- /dev/null +++ b/mysql-test/suite/rpl/t/rpl_parallel_autoinc.test @@ -0,0 +1,140 @@ +--source include/have_binlog_format_statement.inc +--source include/have_innodb.inc +--source include/master-slave.inc + +--echo MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication + +# The scenario is transactions T1, T2, T3: +# +# T1 is waiting for T3 on an autoinc lock +# T2 is waiting for T1 to commit +# T3 is waiting on a normal row lock held by T2 +# +# This caused a hang until innodb_lock_wait_timeout, because autoinc +# locks were not reported to the in-order parallel replication, so T3 +# was not deadlock killed. + +--let $lock_wait_timeout=20 + +--let $rpl_connection_name= slave2 +--let $rpl_server_number= 2 +--source include/rpl_connect.inc + +--let $rpl_connection_name= slave3 +--let $rpl_server_number= 2 +--source include/rpl_connect.inc + +--connection master +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB; + +# A table as destination for INSERT-SELECT +CREATE TABLE t1 (a INT PRIMARY KEY AUTO_INCREMENT, b INT, c INT, INDEX (c)) ENGINE=InnoDB; +INSERT INTO t1 (b,c) VALUES (0, 1), (0, 1), (0, 2), (0,3), (0, 5), (0, 7), (0, 8); + +# A table as source for INSERT-SELECT. +CREATE TABLE t2 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; +INSERT INTO t2 VALUES (10,1), (20,2), (30,3), (40,4), (50,5); + +# A table to help order slave worker threads to setup the desired scenario. +CREATE TABLE t3 (a VARCHAR(20) PRIMARY KEY, b INT) ENGINE=InnoDB; +INSERT INTO t3 VALUES ('row for T1', 0), ('row for T2', 0), ('row for T3', 0); +--source include/save_master_gtid.inc + +--connection slave +--source include/sync_with_master_gtid.inc +--source include/stop_slave.inc +--let $save_innodb_lock_wait_timeout= `SELECT @@global.innodb_lock_wait_timeout` +--let $save_slave_parallel_threads= `SELECT @@global.slave_parallel_threads` +--let $save_slave_parallel_mode= `SELECT @@global.slave_parallel_mode` +set @@global.slave_parallel_threads= 3; +set @@global.slave_parallel_mode= OPTIMISTIC; +eval set @@global.innodb_lock_wait_timeout= $lock_wait_timeout; + +--connection master +# Transaction T1. +BEGIN; +UPDATE t3 SET b=b+1 where a="row for T1"; +INSERT INTO t1(b, c) SELECT 1, t2.b FROM t2 WHERE a=10; +COMMIT; + +# Transaction T2. +DELETE FROM t1 WHERE c >= 4 and c < 6; + +# Transaction T3. +BEGIN; +UPDATE t3 SET b=b+1 where a="row for T3"; +INSERT INTO t1(b, c) SELECT 3, t2.b FROM t2 WHERE a >= 20 AND a <= 40; +COMMIT; + +--source include/save_master_gtid.inc + +--connection slave1 +# Temporarily block T1 to create the scheduling that triggers the bug. +BEGIN; +SELECT * FROM t3 WHERE a="row for T1" FOR UPDATE; + +--connection slave2 +# Temporarily block T3 from starting (so T2 can reach commit). +BEGIN; +SELECT * FROM t3 WHERE a="row for T3" FOR UPDATE; + +--connection slave3 +# This critical step blocks T3 after it has inserted its first row, +# and thus taken the auto-increment lock, but before it has reached +# the point where it gets a row lock wait on T2. Even though +# auto-increment lock waits were not reported due to the bug, +# transitive lock waits (T1 waits on autoinc of T3 which waits on row +# on T2) _were_ reported as T1 waiting on T2, and thus a deadlock kill +# happened and the bug was not triggered. +BEGIN; +DELETE FROM t2 WHERE a=30; + +--connection slave +--source include/start_slave.inc + +# First let T2 complete until it is waiting for T1 to commit. +--let $wait_condition= SELECT count(*)=1 FROM information_schema.processlist WHERE state='Waiting for prior transaction to commit' and command LIKE 'Slave_worker'; +--source include/wait_condition.inc + +# Then let T3 reach the point where it has obtained the autoinc lock, +# but it is not yet waiting for a row lock held by T2. +--connection slave2 +ROLLBACK; +--let $wait_condition= SELECT count(*)=1 FROM information_schema.processlist WHERE state='Sending data' and info LIKE 'INSERT INTO t1(b, c) SELECT 3, t2.b%' and time_ms > 500 and command LIKE 'Slave_worker'; +--source include/wait_condition.inc + +# Now let T1 continue, while T3 is holding the autoinc lock but before +# it is waiting for T2. Wait a short while to give the hang a chance to +# happen; T1 needs to get to request the autoinc lock before we let T3 +# continue. (There's a small chance the sleep will be too small, which will +# let the test occasionally pass on non-fixed server). +--connection slave1 +ROLLBACK; +--sleep 0.5 + +# Now let T3 continue; the bug was that this lead to an undetected +# deadlock that remained until innodb lock wait timeout. +--connection slave3 +ROLLBACK; + +--connection slave +--let $slave_timeout= `SELECT $lock_wait_timeout/2` +--source include/sync_with_master_gtid.inc +--let $slave_timeout= +SELECT * FROM t1 ORDER BY a; +SELECT * FROM t2 ORDER BY a; +SELECT * FROM t3 ORDER BY a; + +# Cleanup. +--connection master +CALL mtr.add_suppression("Unsafe statement written to the binary log using statement format"); +DROP TABLE t1, t2, t3; + +--connection slave +--source include/stop_slave.inc +eval SET @@global.slave_parallel_threads= $save_slave_parallel_threads; +eval SET @@global.slave_parallel_mode= $save_slave_parallel_mode; +eval SET @@global.innodb_lock_wait_timeout= $save_innodb_lock_wait_timeout; +--source include/start_slave.inc +SELECT @@GLOBAL.innodb_autoinc_lock_mode; +--source include/rpl_end.inc diff --git a/sql/sql_class.cc b/sql/sql_class.cc index 73bb654080a..8ed3f8a9c5e 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -5119,12 +5119,6 @@ thd_need_wait_reports(const MYSQL_THD thd) deadlock with the pre-determined commit order, we kill the later transaction, and later re-try it, to resolve the deadlock.
- This call need only receive reports about waits for locks that will remain - until the holding transaction commits. InnoDB auto-increment locks, - for example, are released earlier, and so need not be reported. (Such false - positives are not harmful, but could lead to unnecessary kill and retry, so - best avoided). - Returns 1 if the OTHER_THD will be killed to resolve deadlock, 0 if not. The actual kill will happen later, asynchronously from another thread. The caller does not need to take any actions on the return value if the diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index 26388ad95e2..b1cf2152cd6 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ -6872,13 +6872,7 @@ DeadlockChecker::search() return m_start; }
- /* We do not need to report autoinc locks to the upper - layer. These locks are released before commit, so they - can not cause deadlocks with binlog-fixed commit - order. */ - if (m_report_waiters - && (lock_get_type_low(lock) != LOCK_TABLE - || lock_get_mode(lock) != LOCK_AUTO_INC)) { + if (m_report_waiters) { thd_rpl_deadlock_check(m_start->mysql_thd, lock->trx->mysql_thd); }
Hi Kristian, On Sun, Jul 9, 2023 at 6:38 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Hi Marko,
I'd like to ask your opinion/review of the below patch for MDEV-31482:
https://github.com/MariaDB/server/commit/dbe5c20b755b87f67d87990c3baabc86666... https://jira.mariadb.org/browse/MDEV-31482
Sorry for the delay. In July, I was mostly on vacation.
As an optimisation, auto-increment lock waits where _not_ reported, but this turns out to be a bug. This patch just removes the exception so that such waits are reported like other waits.
Even if the logic had been right, it could be a good idea to avoid conditional branches in modern superscalar CPUs. https://www.intel.com/content/dam/support/us/en/documents/processors/mitigat... A fairly recent example is https://jira.mariadb.org/browse/MDEV-30567 where an attempt to create "more readable" code had unnecessarily replaced simple bitwise arithmetics with conditional branches.
My only concern is the possible performance impact of this. I'm hoping it will not be very big, because in many cases the auto-increment lock is not taken for replicated transactions? My understanding is:
- It is not taken for INSERT INTO t VALUES (...) (and maybe not for multi-row "INSERT ... VALUES (...), (...), ..." ?)
- SELECT ... INSERT is not safe for statement-based replication, so will be logged as row in MIXED mode and give a warning in STATEMENT.
My understanding is that the auto-increment locks are only needed for statement-based replication and nothing else. Unlike other InnoDB locks, they are not held until transaction commit, but released at the end of each statement. Here are some related tickets: https://jira.mariadb.org/browse/MDEV-19577 Replication does not work with innodb_autoinc_lock_mode=2 https://jira.mariadb.org/browse/MDEV-27844 Set innodb_autoinc_lock_mode=2 by default, and deprecate it The purpose of innodb_autoinc_lock_mode=2 is to disable auto-increment locking. I would like to remove the auto-increment locks altogether. I understood that we could do that in statement-based replication if the binlog event group was extended to the individual AUTO_INCREMENT values that were assigned for the rows.
But I wanted to hear you if you know any case and/or MariaDB version where a significant amount of auto-increment lock waits could be expected in parallel replication? And if so, maybe this fix should be pushed only to later MariaDB versions?
I hardly know the replication in MySQL or MariaDB, and I have not been involved in discussions on performance tests that would involve replication. I think that we may need to improve on that front. Since you asked, from my point of view, it might make sense to skip this change in 10.4 and 10.5. There were some substantial changes to InnoDB locking in 10.6, and nothing since then. Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB plc
participants (2)
-
Kristian Nielsen
-
Marko Mäkelä