We're seeing a sudden memory problem on one of our replication slaves: memory usage keeps growing until the OOM killer takes out the process, or at least it did until we dropped the buffer pool size substantially. This server has been running in production for a few years without any issues, but a recent update (Debian 12.5 repo) appears to have broken something. The instance is currently configured with innodb_buffer_pool_size = 100G, yet the process still chews up nearly 500G of RAM doing nothing but processing replication traffic (no client connections).
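In case it's useful, this is roughly how I understand the server's own memory accounting can be checked (assuming the MariaDB-specific Memory_used status counter and the MEMORY_USED column of information_schema.PROCESSLIST are the right things to watch here; I'm not sure they would even capture whatever is leaking, but they seemed like the obvious first check):

-- Global memory the server believes it has allocated (MariaDB-specific counters)
SHOW GLOBAL STATUS LIKE 'Memory_used%';

-- Per-thread memory, to see whether the replication worker threads are the ones growing
SELECT ID, USER, COMMAND, STATE, MEMORY_USED
FROM information_schema.PROCESSLIST
ORDER BY MEMORY_USED DESC;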
Server version: 10.11.6-MariaDB-0+deb12u1-log Debian 12
MariaDB [(none)]> show processlist;
+----+-------------+------+------+--------------+------+--------------------------------------------------------+------+----------+
| Id | User        | Host | db   | Command      | Time | State                                                  | Info | Progress |
+----+-------------+------+------+--------------+------+--------------------------------------------------------+------+----------+
|  5 | system user |      | NULL | Slave_IO     | 2504 | Waiting for master to send event                       | NULL |    0.000 |
|  7 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
|  9 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 10 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 11 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 12 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 13 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 14 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 15 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 16 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 17 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 18 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 19 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 20 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 21 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
| 22 | system user |      | NULL | Slave_worker |    0 | Waiting for prior transaction to start commit          | NULL |    0.000 |
|  6 | system user |      | NULL | Slave_SQL    |  213 | Slave has read all relay log; waiting for more updates | NULL |    0.000 |
....
MariaDB [(none)]> SELECT ((@@innodb_buffer_pool_size + @@innodb_log_buffer_size + @@key_buffer_size + @@query_cache_size + 12 * (@@bulk_insert_buffer_size + @@join_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size + @@tmp_table_size)) / 1024 / 1024 / 1024) AS max_memory_GB;
+------------------+
| max_memory_GB    |
+------------------+
| 124.333496093750 |
+------------------+
1 row in set (0.000 sec)
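To try to rule out InnoDB itself allocating far beyond the configured pool, I assume the relevant counters are along these lines (with the "BUFFER POOL AND MEMORY" section of the InnoDB status output showing the total allocated size):

-- Configured pool size vs. what InnoDB reports as actually holding data
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_bytes_data';

-- Full InnoDB report; see the "BUFFER POOL AND MEMORY" section
SHOW ENGINE INNODB STATUS\G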
Does anyone have an idea how to figure out what is actually consuming this memory internally, and how we could mitigate it? The system is only barely stable after lowering the buffer pool to 100G, and that isn't a workable long-term fix.
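I'm happy to turn on more instrumentation on the slave if that would help. My understanding, untested and assuming MariaDB 10.11's performance_schema exposes the same memory summary tables as MySQL 5.7 (which I believe means setting performance_schema=ON and performance-schema-instrument='memory/%=ON' in the config and restarting), is that a query along these lines would show the top internal consumers:

-- Top internal memory consumers, if the memory instruments are enabled and populated
SELECT EVENT_NAME,
       CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS current_MB
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 20;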
thanks
-C