Re: [Maria-developers] [Commits] 7cabdc461b2: MDEV-6860 Parallel async replication hangs on a Galera node
sachin.setiya@mariadb.com writes:
revision-id: 7cabdc461b24fdebe599799d7964efa4b53815e3 (mariadb-10.1.39-91-g7cabdc461b2)
MDEV-6860 Parallel async replication hangs on a Galera node
Wait for previous commit before preparing next transaction for galera
diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
index 8fef2d66635..7d38c36b840 100644
--- a/sql/rpl_parallel.cc
+++ b/sql/rpl_parallel.cc
@@ -1181,7 +1181,7 @@ handle_rpl_parallel_thread(void *arg)
           before, then wait now for the prior transaction to complete its
           commit.
         */
-        if (rgi->speculation == rpl_group_info::SPECULATE_WAIT &&
+        if ((rgi->speculation == rpl_group_info::SPECULATE_WAIT || WSREP_ON) &&
             (err= thd->wait_for_prior_commit()))
Ouch! That's killing _all_ parallel replication when WSREP_ON :-/

Do you really need to do this? It seems quite a restriction for replicating to Galera if parallel replication is not allowed. (I wonder if this isn't just another symptom of the underlying problem that Galera has never been integrated properly into MariaDB and the group commit algorithm / transaction master?).

But if the goal is to disable parallel replication in Galera, then you shouldn't do this, it will just confuse/disappoint users, and it will be slower than just using single-threaded replication. Instead, give an error if parallel replication and Galera are enabled at the same time, so users will know of the restriction.

 - Kristian.
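(As a rough illustration of that suggestion, a guard of this kind might look like the sketch below. It is not a reviewed patch: opt_slave_parallel_threads, WSREP_ON, and my_printf_error() are assumptions about the 10.1 tree, and the error text is a placeholder.)

/*
  Sketch only: refuse a parallel slave applier on a Galera node instead of
  silently serializing it.  Intended for the START SLAVE / slave-init path;
  the globals and the error reporting below are assumptions, not the actual
  server code.
*/
static bool deny_parallel_replication_on_galera()
{
  if (WSREP_ON && opt_slave_parallel_threads > 0)
  {
    my_printf_error(ER_UNKNOWN_ERROR,
                    "Parallel replication (slave_parallel_threads > 0) is not "
                    "supported on a Galera node; set slave_parallel_threads=0",
                    MYF(0));
    return true;                 /* caller should refuse to start the slave */
  }
  return false;
}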
Hi Kristian!

On Mon, Jul 15, 2019 at 4:00 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
sachin.setiya@mariadb.com writes:
revision-id: 7cabdc461b24fdebe599799d7964efa4b53815e3 (mariadb-10.1.39-91-g7cabdc461b2)
MDEV-6860 Parallel async replication hangs on a Galera node
Wait for previous commit before preparing next transaction for galera
diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
index 8fef2d66635..7d38c36b840 100644
--- a/sql/rpl_parallel.cc
+++ b/sql/rpl_parallel.cc
@@ -1181,7 +1181,7 @@ handle_rpl_parallel_thread(void *arg)
           before, then wait now for the prior transaction to complete its
           commit.
         */
-        if (rgi->speculation == rpl_group_info::SPECULATE_WAIT &&
+        if ((rgi->speculation == rpl_group_info::SPECULATE_WAIT || WSREP_ON) &&
             (err= thd->wait_for_prior_commit()))
Ouch! That's killing _all_ parallel replication when WSREP_ON :-/
Do you really need to do this? It seems quite a restriction for replicating to Galera if parallel replication is not allowed.
Actually, this was just a temporary fix; it has not been reviewed by Andrei.
(I wonder if this isn't just another symptom of the underlying problem that Galera has never been integrated properly into MariaDB and the group commit algorithm / transaction master?)

So the actual issue is that Galera sends the event during the prepare phase, and that creates a deadlock.
For example, let us consider the replication topology A -> B <==> C (A -> B is optimistic parallel replication, B and C are Galera cluster nodes). Assume two inserts (T1 with gtid x-x-1 and T2 with gtid x-x-2) from master A arrive at slave B.

The 2nd insert prepares faster than the 1st, so it has already sent its writeset to node C and is now in the queue waiting for its turn to commit. Meanwhile the first insert does its prepare on Galera (wsrep_run_wsrep_commit), but is stuck because the T2 transaction still hasn't run post_commit on Galera, so the Galera state is still S_WAITING. T2 can't run post_commit on Galera because it is waiting for T1 to commit, and T1 can't commit because it is waiting in the prepare stage for transaction T2 to clear the Galera state.

Backtrace from gdb:

Gtid_seq_no= 2

Thread 34 (Thread 0x7fcd966d2700 (LWP 23891)):
#0  0x00007fcda6d56415 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00005569d607d380 in safe_cond_wait (cond=0x7fcd854078e8, mp=0x7fcd85407838, file=0x5569d6240360 "/home/sachin/10.1/server/include/mysql/psi/mysql_thread.h", line=1154) at /home/sachin/10.1/server/mysys/thr_mutex.c:493
#2  0x00005569d5aec4d0 in inline_mysql_cond_wait (that=0x7fcd854078e8, mutex=0x7fcd85407838, src_file=0x5569d6240cb8 "/home/sachin/10.1/server/sql/log.cc", src_line=7387) at /home/sachin/10.1/server/include/mysql/psi/mysql_thread.h:1154
#3  0x00005569d5afeee5 in MYSQL_BIN_LOG::queue_for_group_commit (this=0x5569d692d7c0 <mysql_bin_log>, orig_entry=0x7fcd966cf440) at /home/sachin/10.1/server/sql/log.cc:7387
#4  0x00005569d5aff5c9 in MYSQL_BIN_LOG::write_transaction_to_binlog_events (this=0x5569d692d7c0 <mysql_bin_log>, entry=0x7fcd966cf440) at /home/sachin/10.1/server/sql/log.cc:7607
#5  0x00005569d5afecff in MYSQL_BIN_LOG::write_transaction_to_binlog (this=0x5569d692d7c0 <mysql_bin_log>, thd=0x7fcd84c068b0, cache_mngr=0x7fcd84c72c70, end_ev=0x7fcd966cf5e0, all=true, using_stmt_cache=true, using_trx_cache=true) at /home/sachin/10.1/server/sql/log.cc:7290
#6  0x00005569d5af0ce6 in binlog_flush_cache (thd=0x7fcd84c068b0, cache_mngr=0x7fcd84c72c70, end_ev=0x7fcd966cf5e0, all=true, using_stmt=true, using_trx=true) at /home/sachin/10.1/server/sql/log.cc:1751
#7  0x00005569d5af11bb in binlog_commit_flush_xid_caches (thd=0x7fcd84c068b0, cache_mngr=0x7fcd84c72c70, all=true, xid=2) at /home/sachin/10.1/server/sql/log.cc:1859
#8  0x00005569d5b045c8 in MYSQL_BIN_LOG::log_and_order (this=0x5569d692d7c0 <mysql_bin_log>, thd=0x7fcd84c068b0, xid=2, all=true, need_prepare_ordered=false, need_commit_ordered=true) at /home/sachin/10.1/server/sql/log.cc:9575
#9  0x00005569d5a1ec0d in ha_commit_trans (thd=0x7fcd84c068b0, all=true) at /home/sachin/10.1/server/sql/handler.cc:1497
#10 0x00005569d5925e7e in trans_commit (thd=0x7fcd84c068b0) at /home/sachin/10.1/server/sql/transaction.cc:235
#11 0x00005569d5b1b1fa in Xid_log_event::do_apply_event (this=0x7fcd8542a770, rgi=0x7fcd85407800) at /home/sachin/10.1/server/sql/log_event.cc:7720
#12 0x00005569d5743fa1 in Log_event::apply_event (this=0x7fcd8542a770, rgi=0x7fcd85407800) at /home/sachin/10.1/server/sql/log_event.h:1343
#13 0x00005569d573987e in apply_event_and_update_pos_apply (ev=0x7fcd8542a770, thd=0x7fcd84c068b0, rgi=0x7fcd85407800, reason=0) at /home/sachin/10.1/server/sql/slave.cc:3479
#14 0x00005569d5739deb in apply_event_and_update_pos_for_parallel (ev=0x7fcd8542a770, thd=0x7fcd84c068b0, rgi=0x7fcd85407800) at /home/sachin/10.1/server/sql/slave.cc:3623
#15 0x00005569d597bfbe in rpt_handle_event (qev=0x7fcd85424770, rpt=0x7fcd85421c88) at /home/sachin/10.1/server/sql/rpl_parallel.cc:50
#16 0x00005569d597ed57 in handle_rpl_parallel_thread (arg=0x7fcd85421c88) at /home/sachin/10.1/server/sql/rpl_parallel.cc:1258
#17 0x00005569d5d42aa4 in pfs_spawn_thread (arg=0x7fcd85415570) at /home/sachin/10.1/server/storage/perfschema/pfs.cc:1861
#18 0x00007fcda6d5057f in start_thread () from /usr/lib/libpthread.so.0
#19 0x00007fcda5f0f0e3 in clone () from /usr/lib/libc.so.6

Gtid_seq_no= 1

Thread 33 (Thread 0x7fcd9671d700 (LWP 23890)):
#0  0x00007fcda6d56415 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007fcd9e7778ab in gu::Lock::wait (this=0x7fcd9671a0c0, cond=...) at galerautils/src/gu_mutex.hpp:40
#2  galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=this@entry=0x7fcda12d5da0, obj=...) at galera/src/monitor.hpp:124
#3  0x00007fcd9e771f28 in galera::ReplicatorSMM::pre_commit (this=0x7fcda12d5000, trx=0x7fcd8507e000, meta=<optimized out>) at galera/src/replicator_smm.cpp:796
#5  0x00005569d59864d0 in wsrep_run_wsrep_commit (thd=0x7fcd85006a70, all=true) at /home/sachin/10.1/server/sql/wsrep_hton.cc:492
#6  0x00005569d5984d6a in wsrep_prepare (hton=0x7fcda583e270, thd=0x7fcd85006a70, all=true) at /home/sachin/10.1/server/sql/wsrep_hton.cc:208
#7  0x00005569d5a1e1b0 in prepare_or_error (ht=0x7fcda583e270, thd=0x7fcd85006a70, all=true) at /home/sachin/10.1/server/sql/handler.cc:1196
#8  0x00005569d5a1ea1c in ha_commit_trans (thd=0x7fcd85006a70, all=true) at /home/sachin/10.1/server/sql/handler.cc:1475
#9  0x00005569d5925e7e in trans_commit (thd=0x7fcd85006a70) at /home/sachin/10.1/server/sql/transaction.cc:235
#10 0x00005569d5b1b1fa in Xid_log_event::do_apply_event (this=0x7fcd8542a570, rgi=0x7fcd85407000) at /home/sachin/10.1/server/sql/log_event.cc:7720
#11 0x00005569d5743fa1 in Log_event::apply_event (this=0x7fcd8542a570, rgi=0x7fcd85407000) at /home/sachin/10.1/server/sql/log_event.h:1343
#12 0x00005569d573987e in apply_event_and_update_pos_apply (ev=0x7fcd8542a570, thd=0x7fcd85006a70, rgi=0x7fcd85407000, reason=0) at /home/sachin/10.1/server/sql/slave.cc:3479
#13 0x00005569d5739deb in apply_event_and_update_pos_for_parallel (ev=0x7fcd8542a570, thd=0x7fcd85006a70, rgi=0x7fcd85407000) at /home/sachin/10.1/server/sql/slave.cc:3623
#14 0x00005569d597bfbe in rpt_handle_event (qev=0x7fcd85423870, rpt=0x7fcd85421a80) at /home/sachin/10.1/server/sql/rpl_parallel.cc:50
#15 0x00005569d597ed57 in handle_rpl_parallel_thread (arg=0x7fcd85421a80) at /home/sachin/10.1/server/sql/rpl_parallel.cc:1258
#16 0x00005569d5d42aa4 in pfs_spawn_thread (arg=0x7fcd854152f0) at /home/sachin/10.1/server/storage/perfschema/pfs.cc:1861
#17 0x00007fcda6d5057f in start_thread () from /usr/lib/libpthread.so.0
#18 0x00007fcda5f0f0e3 in clone () from /usr/lib/libc.so.6
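(To make the cycle in these backtraces easier to see, here is a small standalone C++ sketch. This is toy code, not server code: the Orderer struct and the ticket numbers are only an illustration of two commit orderers that disagree about T1 and T2.)

#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// A toy "commit orderer": transactions may enter only in ticket order, and
// the next ticket is admitted only once the previous holder has left.
struct Orderer
{
  std::mutex m;
  std::condition_variable cv;
  int admitted= 1;                    // lowest ticket currently allowed in
  bool enter(int ticket)              // false = gave up after a timeout
  {
    std::unique_lock<std::mutex> lk(m);
    return cv.wait_for(lk, std::chrono::seconds(2),
                       [&]{ return admitted == ticket; });
  }
  void leave()
  {
    { std::lock_guard<std::mutex> lk(m); ++admitted; }
    cv.notify_all();
  }
};

int main()
{
  Orderer binlog;   // slave in-order commit: T1 is ticket 1, T2 is ticket 2
  Orderer galera;   // Galera commit-order monitor: T2 replicated first, so it is ticket 1

  std::thread t2([&] {
    (void) galera.enter(1);           // T2 already holds its slot in the Galera order
    if (!binlog.enter(2))             // ...but must wait for T1 in binlog group commit
      std::puts("T2 stuck in binlog group commit, waiting for T1");
    else { binlog.leave(); galera.leave(); }   // post_commit would release T1 here
  });
  std::thread t1([&] {
    if (!galera.enter(2))             // T1's prepare waits to enter the Galera monitor
      std::puts("T1 stuck in Galera commit-order monitor, waiting for T2");
    else { galera.leave(); binlog.enter(1); binlog.leave(); }
  });

  t1.join();
  t2.join();
  return 0;
}

Run as-is, both threads time out after two seconds and report being stuck, which is the same cross-wait the gdb output above shows.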
But if the goal is to disable parallel replication in Galera, then you shouldn't do this, it will just confuse/disappoint users, and it will be slower than just using single-threaded replication.
Instead, give an error if parallel replication and Galera are enabled at the same time, so users will know of the restriction.
Agree
- Kristian.
--
Regards
Sachin Setiya
Software Engineer at MariaDB
Sachin Setiya <sachin.setiya@mariadb.com> writes:
On Mon, Jul 15, 2019 at 4:00 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
(I wonder if this isn't just another symptom of the underlying problem that Galera has never been integrated properly into MariaDB and the group commit algorithm / transaction master?).
For example, let us consider the replication topology A -> B <==> C (A -> B is optimistic parallel replication, B and C are Galera cluster nodes). Assume two inserts (T1 with gtid x-x-1 and T2 with gtid x-x-2) from master A arrive at slave B.
The 2nd insert prepares faster than the 1st, so it has already sent its writeset to node C and is now in the queue waiting for its turn to commit.
And this is the problem, IIUC. T2 has registered with the transaction coordinator that it goes after T1. Galera is not allowed to put the T2 writeset ahead of T1 since T1 is required to commit before T2.

Basically, the transaction coordinator is the one that decides the commit order. In non-Galera MariaDB, the transaction coordinator is in the binlog group commit. But Galera needs to decide the commit order itself (that's the core of its synchronous replication architecture).

So Galera needs to take over the role of the transaction coordinator, replacing the corresponding logic in the binlog group commit. This starts at TC_LOG_BINLOG::log_and_order(). There are already two alternate transaction coordinators (the other is TC_LOG_MMAP::log_and_order()). The whole system is designed so that something like Galera would implement TC_LOG_GALERA::log_and_order() and interface to the rest of MariaDB with functions like commit_ordered() and so on.

This is the right place to start fixing all these problems that Galera has shown in MariaDB over the years, with root cause in disagreement over who decides the commit order.
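(For orientation, a skeleton of such a coordinator might look roughly like the sketch below. This is only an illustration: the TC_LOG methods are assumed from sql/log.h and their exact signatures may differ, and wsrep_order_and_commit() is a hypothetical placeholder for whatever Galera call would decide the commit order and drive the engines' commit_ordered() hooks in that order.)

/*
  Sketch only, not an implementation.  A transaction coordinator in which
  Galera, rather than the binlog group commit, decides the commit order.
  TC_LOG is the existing abstract coordinator interface (sql/log.h);
  wsrep_order_and_commit() is a hypothetical helper standing in for the
  Galera call that certifies the transaction, takes its slot in the
  cluster-wide commit order, and runs commit_ordered() in that order.
  Other pure-virtual members of TC_LOG (e.g. commit_checkpoint_notify())
  would also need trivial overrides.
*/
class TC_LOG_GALERA: public TC_LOG
{
public:
  int open(const char *opt_name)              { return 0; }
  void close()                                {}

  int log_and_order(THD *thd, my_xid xid, bool all,
                    bool need_prepare_ordered, bool need_commit_ordered)
  {
    if (wsrep_order_and_commit(thd, xid, all, need_commit_ordered))
      return 0;                               /* error: transaction must roll back */
    return 1;                                 /* non-zero cookie: ordered and logged */
  }

  int unlog(ulong cookie, my_xid xid)         { return 0; }
};

Whatever the details, log_and_order() is the hook where a coordinator decides and enforces the commit order, which is the role Galera would need to take over from the binlog group commit.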
Meanwhile the first insert does its prepare on Galera (wsrep_run_wsrep_commit), but is stuck because the T2 transaction still hasn't run post_commit on Galera.
If Galera was the transaction coordinator, it could know that T1 goes before T2 in commit order, and it could have prevented T1 from getting stuck waiting for T2.

Hope this helps,

 - Kristian.