Hi,

We're observing the following crash on the Galera and MariaDB versions as per the subject. Usually only one of the three nodes goes down, but last night two went down within minutes of each other, resulting in a fairly nasty outage. Given the frequency of these crashes (about once a month) I strongly suspect we're doing something wrong that causes us to run into this when others don't, so any advice is appreciated.

Stack trace as per the logs:

241007  1:01:25 [ERROR] mysqld got signal 11 ;
Sorry, we probably made a mistake, and this is a bug.

Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

Server version: 10.6.17-MariaDB-log source revision: 15c75ad083a55e198ae78324f22970694b72f22b
key_buffer_size=536870912
read_buffer_size=1048576
max_used_connections=522
max_threads=10002
thread_count=83
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 10498887186 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f675c000c68
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7f78056da8 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x32)[0x5632cb5f2212]
/usr/sbin/mysqld(handle_fatal_signal+0x2b3)[0x5632cb133db3]
/lib64/libc.so.6(+0x3c760)[0x7f7f7b653760]
/usr/sbin/mysqld(+0x8ee692)[0x5632cb44b692]
/usr/sbin/mysqld(+0x8ef5cd)[0x5632cb44c5cd]
/usr/sbin/mysqld(+0x8acfa5)[0x5632cb409fa5]
/usr/sbin/mysqld(+0x24e630)[0x5632cadab630]
/usr/sbin/mysqld(+0x24e6b5)[0x5632cadab6b5]
/usr/sbin/mysqld(+0x949418)[0x5632cb4a6418]
/usr/sbin/mysqld(+0x960b34)[0x5632cb4bdb34]
/usr/sbin/mysqld(+0x8b9e7c)[0x5632cb416e7c]
/usr/sbin/mysqld(_ZN7handler10ha_rnd_posEPhS0_+0x232)[0x5632cb13b022]
/usr/sbin/mysqld(_ZN14Rows_log_event8find_rowEP14rpl_group_info+0x3e4)[0x5632cb2624c4]
/usr/sbin/mysqld(_ZN21Delete_rows_log_event11do_exec_rowEP14rpl_group_info+0x142)[0x5632cb2629b2]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x35f)[0x5632cb255c9f]
/usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x1fd)[0x5632cb3def8d]
/usr/sbin/mysqld(+0x868ff0)[0x5632cb3c5ff0]
/usr/sbin/mysqld(_ZN21Wsrep_applier_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0xb5)[0x5632cb3c6ba5]
/usr/sbin/mysqld(+0xb0c921)[0x5632cb669921]
/usr/sbin/mysqld(+0xb1dfd6)[0x5632cb67afd6]
/usr/lib64/galera/libgalera_smm.so(+0x60724)[0x7f7f7b260724]
/usr/lib64/galera/libgalera_smm.so(+0x6fa67)[0x7f7f7b26fa67]
/usr/lib64/galera/libgalera_smm.so(+0x74d65)[0x7f7f7b274d65]
/usr/lib64/galera/libgalera_smm.so(+0x9fdc3)[0x7f7f7b29fdc3]
/usr/lib64/galera/libgalera_smm.so(+0xa098e)[0x7f7f7b2a098e]
/usr/lib64/galera/libgalera_smm.so(+0x75200)[0x7f7f7b275200]
/usr/lib64/galera/libgalera_smm.so(+0x4f35f)[0x7f7f7b24f35f]
/usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x5632cb67b652]
/usr/sbin/mysqld(+0x8842d1)[0x5632cb3e12d1]
/usr/sbin/mysqld(_Z15start_wsrep_THDPv+0x276)[0x5632cb3d0d16]
/usr/sbin/mysqld(+0x80e941)[0x5632cb36b941]
/lib64/libc.so.6(+0x8ad22)[0x7f7f7b6a1d22]
/lib64/libc.so.6(+0x10698c)[0x7f7f7b71d98c]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f7efdea05db): delete from `jobs` where `id` = 808933436
Connection ID (thread ID): 2
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off,cset_narrowing=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/
contains information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             1031460              1031460              processes
Max open files            91983                91983                files
Max locked memory         8388608              8388608              bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1031460              1031460              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core

Kernel version: Linux version 6.4.12-uls (root@sysrescue) (gcc (Gentoo 13.2.1_p20230826 p7) 13.2.1 20230826, GNU ld (Gentoo 2.41 p2) 2.41.0) #2 SMP PREEMPT_DYNAMIC Thu Jan 4 20:10:49 SAST 2024

I'm unable to locate the referenced core file, or I'd already have pulled it into gdb. Any advice or pointers appreciated.

I do notice there are Galera updates and a number of MariaDB upgrades available. For Galera the changes look like logging improvements, so probably not the cause of the crash. For MariaDB, in 10.6.18 these items stand out for me:

- Server crashes in JOIN_CACHE::write_record_data upon EXPLAIN with subqueries and constant tables (MDEV-21102 <https://jira.mariadb.org/browse/MDEV-21102>) - but the stack trace doesn't match.
- Server crash in Rows_log_event::update_sequence upon replaying binary log (MDEV-31779 <https://jira.mariadb.org/browse/MDEV-31779>) - this looks feasible.

If anyone can confirm my conjecture that would be great. We'd want to test 10.11 or even 11.4 in a non-production environment before making a big jump to one of those versions in production.

Kind regards,
Jaco