Hi Sergei,
QA runs by Ramesh show that the latest fix candidate, i.e. what is in bb-10.2-MDEV-25114-galera-v2, is incorrect. The problem is in wsrep_close_client_connections(): it holds LOCK_thread_count while it calls abort_replicated(), which ends up in wsrep_abort_transaction(), and there we call find_thread_by_id(), which also takes LOCK_thread_count. The same thread therefore tries to lock the mutex a second time, and safe_mutex_lock() aborts (see the backtrace below). As there is another code path involved here, the problem is not easily fixed. We can't just release LOCK_thread_count in wsrep_close_client_connections(), because we hold it while we iterate the thread list.
I must say I'm not sure what to do now.
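To make the failure mode concrete, here is a minimal stand-alone sketch of the pattern described above. It is not the server code: OwnerCheckedMutex, thread_list_lock, find_by_id and close_all_holding_lock are hypothetical names that only play the roles of safe_mutex_lock(), LOCK_thread_count, find_thread_by_id() and the connection-closing loop. The sketch reports the second lock attempt instead of aborting, so the effect can be observed.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a debug-build "safe" mutex: it remembers its
// owner, so a second lock attempt by the same thread is reported to the
// caller instead of hanging (the real server aborts at this point).
class OwnerCheckedMutex {
public:
  // Returns false, without locking, if the calling thread already owns it.
  bool lock() {
    if (owner_.load() == std::this_thread::get_id()) return false;
    m_.lock();
    owner_.store(std::this_thread::get_id());
    return true;
  }
  void unlock() {
    owner_.store(std::thread::id());  // reset to "no owner"
    m_.unlock();
  }
private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{};
};

enum class Lookup { Found, NotFound, WouldSelfDeadlock };

OwnerCheckedMutex thread_list_lock;                    // role of LOCK_thread_count
std::vector<unsigned long> thread_ids = {47, 48, 49};  // role of the thread list

// Role of find_thread_by_id(): takes the list lock on its own.
Lookup find_by_id(unsigned long id) {
  if (!thread_list_lock.lock()) return Lookup::WouldSelfDeadlock;
  bool found = std::find(thread_ids.begin(), thread_ids.end(), id) !=
               thread_ids.end();
  thread_list_lock.unlock();
  return found ? Lookup::Found : Lookup::NotFound;
}

// Role of the connection-closing loop: iterates the list with the lock held
// and, for each entry, calls into code that tries to take the same lock.
// Returns how many of those nested lock attempts were rejected.
int close_all_holding_lock() {
  int rejected = 0;
  bool locked = thread_list_lock.lock();
  assert(locked);  // the outer lock is expected to succeed
  for (unsigned long id : thread_ids) {
    if (find_by_id(id) == Lookup::WouldSelfDeadlock) ++rejected;
  }
  thread_list_lock.unlock();
  return rejected;
}
```

Calling close_all_holding_lock() reports one rejected lock attempt per list entry, while find_by_id() called on its own, without the lock held, works normally. That mirrors why the lookup is fine on other paths but fails when reached from the loop that already holds the lock.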
(gdb) bt
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1 0x000056333963a2e8 in my_write_core (sig=sig@entry=6) at /test/mtest/10.2_dbg/mysys/stacktrace.c:382
#2 0x0000563338f2993d in handle_fatal_signal (sig=6) at /test/mtest/10.2_dbg/sql/signal_handler.cc:355
#3 <signal handler called>
#4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5 0x000014b799572859 in __GI_abort () at abort.c:79
#6 0x000056333963edc4 in safe_mutex_lock (mp=0x563339e47220 <LOCK_thread_count>, my_flags=my_flags@entry=0,
file=file@entry=0x5633396d9050 "/test/mtest/10.2_dbg/sql/sql_parse.cc", line=line@entry=8902)
at /test/mtest/10.2_dbg/mysys/thr_mutex.c:264
#7 0x0000563338d1aea7 in inline_mysql_mutex_lock (src_line=8902, src_file=0x5633396d9050 "/test/mtest/10.2_dbg/sql/sql_parse.cc",
that=<optimized out>) at /test/mtest/10.2_dbg/include/mysql/psi/mysql_thread.h:688
#8 find_thread_by_id (id=id@entry=48, query_id=query_id@entry=false) at /test/mtest/10.2_dbg/sql/sql_parse.cc:8902
#9 0x0000563339134c39 in wsrep_abort_transaction (hton=<optimized out>, bf_thd=0x14b680000d90, victim_thd=<optimized out>,
signal=<optimized out>) at /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19821
#10 0x0000563338f38cbf in ha_abort_transaction (bf_thd=bf_thd@entry=0x14b680000d90, victim_thd=victim_thd@entry=0x14b680000d90,
signal=signal@entry=1 '\001') at /test/mtest/10.2_dbg/sql/handler.cc:6327
#11 0x0000563338ebed6d in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x14b680000d90, victim_thd_ptr=victim_thd_ptr@entry=0x14b680000d90,
signal=signal@entry=1 '\001') at /test/mtest/10.2_dbg/sql/wsrep_thd.cc:832
#12 0x0000563338eaa2bd in abort_replicated (thd=thd@entry=0x14b680000d90) at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2269
#13 0x0000563338eae097 in wsrep_close_client_connections (wait_to_end=wait_to_end@entry=1 '\001',
except_caller_thd=except_caller_thd@entry=0x0) at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2437
#14 0x0000563338eaedf6 in wsrep_stop_replication (thd=thd@entry=0x0) at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:962
#15 0x0000563338c543d8 in kill_server (sig_ptr=sig_ptr@entry=0x0) at /test/mtest/10.2_dbg/sql/mysqld.cc:2009
#16 0x0000563338c558d5 in kill_server_thread (arg=<optimized out>) at /test/mtest/10.2_dbg/sql/mysqld.cc:2047
#17 0x000014b799a7a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#18 0x000014b79966f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
R: Jan