Re: [Maria-developers] [Commits] 0f97f6b8398: MDEV-17346 parallel slave start and stop races to workers disappeared
Kristian, hello. Perhaps you could review this worker pool patch though it actually follows up changes made by MDEV-9573, by Monty. Cheers, Andrei
revision-id: 0f97f6b8398054ccb0507fbacc76c9deeddd47a4 (mariadb-10.1.35-71-g0f97f6b8398) parent(s): 1fc5a6f30c3a9c047dcf9a36b00026d98f286f6b author: Andrei Elkin committer: Andrei Elkin timestamp: 2018-10-03 15:42:12 +0300 message:
MDEV-17346 parallel slave start and stop races to workers disappeared
The bug appears as a slave SQL thread hanging in rpl_parallel_thread_pool::get_thread() while there are no slave worker threads to awake it.
The hang could occur at parallel slave worker pool activation by a "new" started SQL thread when the pool was concurrently deactivated by being terminated "old" SQL thread. At reading the current pool size the SQL thread did not employ necessary protection designed by MDEV-9573. The pool can't be deactivated when there is an active slave but the "new" slave might set its active status too late while seeing the pool still non-empty. The end product of four computaional events
Old_slave:any_slave_sql_running() => 0 New_slave:slave_running= "active" New_slave:global_rpl_thread_pool.size > 0 => true Old_slave:global_rpl_thread_pool.size := 0
could led to the observed hang. The new SQL thread proceeds to scheduling having all workers gone.
Fixed with making the SQL thread at the pool activation first to grab the same lock as potential deactivator also does prior to destroy the pool.
--- sql/rpl_parallel.cc | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc index 35cddee6d4d..8fef2d66635 100644 --- a/sql/rpl_parallel.cc +++ b/sql/rpl_parallel.cc @@ -1617,13 +1617,32 @@ int rpl_parallel_resize_pool_if_no_slaves(void) }
+/** + Pool activation is preceeded by taking a "lock" of pool_mark_busy + which guarantees the number of running slaves drops to zero atomicly + with the number of pool workers. + This resolves race between the function caller thread and one + that may be attempting to deactivate the pool. +*/ int rpl_parallel_activate_pool(rpl_parallel_thread_pool *pool) { + int rc= 0; + + if ((rc= pool_mark_busy(pool, current_thd))) + return rc; // killed + if (!pool->count) - return rpl_parallel_change_thread_count(pool, opt_slave_parallel_threads, - 0); - return 0; + { + pool_mark_not_busy(pool); + rc= rpl_parallel_change_thread_count(pool, opt_slave_parallel_threads, + 0); + } + else + { + pool_mark_not_busy(pool); + } + return rc; }
_______________________________________________ commits mailing list commits@mariadb.org https://lists.askmonty.org/cgi-bin/mailman/listinfo/commits
participants (1)
-
andrei.elkin@pp.inet.fi