ragul rangarajan <ragulrangarajan@gmail.com> writes:
> Hope my issue is more related to the issue MDEV-30780 optimistic parallel
> slave hangs after hit an error.
> Trying to reproduce with a minimal database.
>
> Attaching the gbd output

Thanks, that gdb output is really helpful!

I agree with Andrei that this rules out MDEV-30780 as the cause. Instead it looks to be caused by MDEV-29843, see also MDEV-31427:

  https://jira.mariadb.org/browse/MDEV-29843
  https://jira.mariadb.org/browse/MDEV-31427

This is seen in the stack trace, where all the other worker threads are waiting on one which is stuck inside pthread_cond_signal:

-----------------------------------------------------------------------
Thread 80 (Thread 0x7f47ad065700 (LWP 25417)):
#0  0x00007f789dca054d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f789dc9e14d in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2  0x000055de401c23cd in inline_mysql_cond_signal (that=0x7f4798006b78) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1099
#3  dec_pending_ops (state=<synthetic pointer>, this=0x7f4798006b30) at /home/buildbot/buildbot/build/sql/sql_class.h:2535
#4  thd_decrement_pending_ops (thd=0x7f47980009b8) at /home/buildbot/buildbot/build/sql/sql_class.cc:5142
#5  0x000055de407b5726 in group_commit_lock::release (this=this@entry=0x55de41f0da80 <write_lock>, num=num@entry=216757233923465) at /home/buildbot/buildbot/build/storage/innobase/log/log0sync.cc:388
#6  0x000055de407a0a3c in log_write_up_to (lsn=<optimized out>, lsn@entry=216757233923297, flush_to_disk=flush_to_disk@entry=false, rotate_key=rotate_key@entry=false, callback=<optimized out>, callback@entry=0x7f47ad064090) at /home/buildbot/buildbot/build/storage/innobase/log/log0log.cc:844
-----------------------------------------------------------------------

The pthread_cond_signal() function should normally never block, so a thread stuck inside it indicates some corruption of the underlying condition object. This object is used to asynchronously complete a query on a client connection when using the thread pool. The MDEV-29843 patch makes worker threads not use this asynchronous completion, which should eliminate the problem.
So the stack trace strongly indicates MDEV-29843 as the cause. However, the MDEV-29843 patch is supposed to be included in MariaDB 10.6.11, and you wrote:

> Environment: MariaDB 10.6.11

Can you double-check whether you are really seeing this hang in 10.6.11, or whether it could have been 10.6.10 (the only version that is supposed to be vulnerable to MDEV-29843)?

Another thing you can check is whether you are using --thread-handling=pool-of-threads, which I think is related to the MDEV-29843 issue. In MDEV-31427 I suggest --thread-handling=one-thread-per-connection as a possible work-around.

Hope this helps,

 - Kristian.
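
P.S. The two checks above can be combined into a small script. This is only a minimal sketch in Python: the function names are made up for illustration, and the set of affected versions is just the assumption stated in this mail (10.6.10 only), not an authoritative list. The two input strings would come from running SELECT VERSION() and SHOW GLOBAL VARIABLES LIKE 'thread_handling' on the replica.

```python
import re

# Assumption taken from this mail: 10.6.10 is the only release
# expected to be exposed to MDEV-29843. Adjust if that proves wrong.
ASSUMED_AFFECTED_VERSIONS = {(10, 6, 10)}


def parse_server_version(version_string):
    """Extract (major, minor, patch) from e.g. '10.6.11-MariaDB-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    return tuple(int(part) for part in match.groups())


def likely_exposed(version_string, thread_handling):
    """Return True if both conditions discussed above hold:
    an assumed-affected server version and the thread pool in use."""
    version = parse_server_version(version_string)
    uses_thread_pool = thread_handling.strip().lower() == "pool-of-threads"
    return version in ASSUMED_AFFECTED_VERSIONS and uses_thread_pool


if __name__ == "__main__":
    # Values as they would be reported by the server (hypothetical here).
    print(likely_exposed("10.6.10-MariaDB-log", "pool-of-threads"))
    print(likely_exposed("10.6.11-MariaDB-log", "pool-of-threads"))
```

If the function returns False for a server that still shows the hang, that would suggest either the version information is off or something other than MDEV-29843 is involved.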