[MariaDB discuss] Galera Cluster (MariaDB 10.6.15) – Severe INSERT/UPDATE Delay and DB Shutdown due to dict_sys.latch

4 Jun 2025

■ Environment
●Cluster: Galera Cluster (3 nodes)
●OS: CentOS 7.4
●DBMS: MariaDB 10.6.15
●DB Uptime: 509 days

■ Issue Overview
●Time of Occurrence: Between 00:00 and 02:00
●Initial Symptom: Single-row INSERT and DELETE queries were delayed by several seconds and eventually stalled
●Around 00:34: A large burst of UPDATE queries targeting the same primary key caused exclusive (X) lock waits and an increase in active sessions
●00:35: CPU usage on DB server hit 100% and stayed at critical levels; thread count spiked
●00:41: Galera node DB01 shut down automatically

Error log excerpt:
[ERROR][FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold for dict_sys.latch was exceeded.
See : https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
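To check whether the same fatal message appears elsewhere in the error logs (for example on the other two nodes), a small script along these lines could be used. This is only a sketch: the pattern is derived from the single log line quoted above, and the function name `find_fatal_latch_waits` is a hypothetical helper, not part of any MariaDB tooling.

```python
import re

# Pattern built from the error-log line quoted above; the latch name is captured.
FATAL_WAIT = re.compile(
    r"innodb_fatal_semaphore_wait_threshold for (\S+) was exceeded"
)

def find_fatal_latch_waits(lines):
    """Return the latch name from every line that reports an exceeded threshold."""
    return [m.group(1) for line in lines if (m := FATAL_WAIT.search(line))]

# The one log line available from this incident.
sample = [
    "[ERROR][FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold "
    "for dict_sys.latch was exceeded.",
]
print(find_fatal_latch_waits(sample))  # ['dict_sys.latch']
```

In practice the lines would come from the server's error log file, whose path depends on the local configuration.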

■ Root Cause (Internal Analysis)
●A wait on dict_sys.latch exceeded innodb_fatal_semaphore_wait_threshold (default: 600 seconds)
●InnoDB then intentionally aborted the MariaDB server process
●The dict_sys.latch is a global latch for the InnoDB data dictionary, which can become a severe bottleneck under high concurrency
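As a rough sanity check on the timeline, the threshold can be subtracted from the shutdown time. The sketch below assumes that the 00:41 shutdown was the fatal abort itself and that the default threshold of 600 seconds was in effect; neither is confirmed. The date is a placeholder, since only the time of day is known.

```python
from datetime import datetime, timedelta

# Default value of innodb_fatal_semaphore_wait_threshold, as stated above.
THRESHOLD_SECONDS = 600

def latest_wait_start(abort_time, threshold_seconds=THRESHOLD_SECONDS):
    """Latest moment the blocked latch wait could have begun before the abort."""
    return abort_time - timedelta(seconds=threshold_seconds)

# Placeholder date (assumption); 00:41 is the shutdown time from the timeline.
abort = datetime(2000, 1, 1, 0, 41)
print(latest_wait_start(abort).strftime("%H:%M"))  # 00:31
```

Under those assumptions the latch wait began no later than about 00:31, which is a few minutes before the UPDATE burst observed around 00:34.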

❗ What’s Unusual:
●No clear sign of typical row-lock contention or of a massive spike in transaction volume
●Even single-row INSERT and DELETE queries were delayed by thousands of seconds, which is highly abnormal
●No obvious external factors (lock contention, CPU saturation, or connection floods) were identified
●Strong suspicion of internal engine behavior or a bug

❓ Questions and Request for Input
●Has anyone experienced a similar issue related to dict_sys.latch in Galera Cluster environments?
●Are there known bugs or release notes in MariaDB 10.6.x or Galera that mention severe delays or process termination related to this latch?
●Any known workarounds or best practices to prevent this from recurring?

Your experience and advice would be greatly appreciated.

Thank you in advance!

idstevemg＠gmail.com