
On Thu, 27 Mar 2025 at 20:14, Derick Turner <derick@e-learndesign.co.uk> wrote:
> We had another event today.
> Everything went from fine with respect to cache hits (99.9% open table cache hit rate) and the InnoDB buffer pool (22GB) to a 15% open table cache hit rate with 0 file opens and 3.11 misses, and an InnoDB buffer pool size of 475MB. The graphs on SSM were interesting (and that is where I got that information).
Are you saying that your buffer pool dropped from 22GB to 475MB? The only thing that can cause that is mysqld/mariadbd crashing and being restarted.

Do you have enough file handles? The defaults in the MariaDB systemd service aren't particularly generous, so it is possible that your increase of table_open_cache didn't fully take effect because you are maxed out on file handles. Run:

    systemctl edit mariadb

and add:

    [Service]
    LimitNOFILE=1048576

then:

    systemctl daemon-reload
    systemctl restart mariadb

and see if that makes a difference.

Unfortunately, it is rather difficult to guess what's going on based purely on the data points you have mentioned so far.
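To tell the two possibilities above apart on the server itself, a rough sketch like the one below could help. It assumes Linux with /proc and a known mariadbd PID, and reports three things: the process start time (derived from /proc/<pid>/stat and the btime line of /proc/stat; a start time around 17:35 would point to a crash and restart), the number of file descriptors currently open, and the soft/hard "Max open files" limits the process actually received. The function names and the 90% warning threshold are illustrative choices, not anything MariaDB defines.

```python
import os
import sys
from datetime import datetime


def _limit_value(v):
    """Convert a /proc limits value to int, or None for 'unlimited'."""
    return None if v == "unlimited" else int(v)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            soft, hard = line.split()[3:5]
            return _limit_value(soft), _limit_value(hard)
    raise ValueError("no 'Max open files' row found")


def process_start_epoch(stat_text, boot_epoch, clk_tck):
    """Process start time (Unix epoch seconds) from /proc/<pid>/stat text.

    Field 22 (starttime) is in clock ticks since boot. The command name in
    field 2 may contain spaces, so split only after its closing parenthesis.
    """
    fields_after_comm = stat_text.rsplit(")", 1)[1].split()
    starttime_ticks = int(fields_after_comm[19])  # field 3 is index 0, so field 22 is index 19
    return boot_epoch + starttime_ticks / clk_tck


def read_boot_epoch():
    """Boot time (Unix epoch seconds) from the 'btime' line of /proc/stat."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime"):
                return int(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")


def report(pid):
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_max_open_files(f.read())
    # Listing /proc/<pid>/fd needs root or the same user as mariadbd.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/stat") as f:
        started = process_start_epoch(f.read(), read_boot_epoch(), os.sysconf("SC_CLK_TCK"))
    print(f"process started (local time): {datetime.fromtimestamp(started):%Y-%m-%d %H:%M:%S}")
    print(f"open fds: {open_fds}, soft limit: {soft}, hard limit: {hard}")
    # Arbitrary 90% threshold, chosen only for illustration.
    if soft is not None and open_fds >= 0.9 * soft:
        print("WARNING: process is close to its open-file limit")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} <mariadbd-pid>", file=sys.stderr)
    else:
        report(int(sys.argv[1]))
```

It would be run as root with something like "python3 fdcheck.py $(pidof mariadbd)" (the script name is hypothetical; older installs name the binary mysqld). Comparing the reported limit with SHOW GLOBAL VARIABLES LIKE 'open_files_limit' and the configured table_open_cache, before and after the LimitNOFILE change, should show whether the limit was the constraint.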
> The only unusual entry in the error log was:
> 2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch
> (17:35 was when SSM showed everything nose-diving.)
> This wait time kept growing over the next few minutes until:
> 2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch
> I had already switched our web servers away from the stricken DB server, but everything came unstuck after that last error log entry.
> What would be causing the dict_sys.latch issue? What can be done to fix it?
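One detail in the two log lines above is worth spelling out: subtracting each reported wait from its timestamp gives 17:37:56 minus 152 s = 17:35:24 and 17:41:17 minus 354 s = 17:35:23. So both warnings, logged by different threads, appear to describe a single wait that began around 17:35:23, which lines up with when SSM showed everything nose-diving. The sketch below does that arithmetic for every matching line of an error log. It assumes the reported duration counts back from the log timestamp to when the wait started, and its regex is written only against the line format quoted here.

```python
import re
from datetime import datetime, timedelta

# Written against lines of the form quoted in this thread, e.g.
# "2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch"
LONG_WAIT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<thread>\d+) \[Warning\] "
    r"InnoDB: A long wait \((?P<secs>\d+) seconds\) was observed for (?P<latch>\S+)"
)


def implied_wait_starts(lines):
    """Yield (logged_at, thread_id, latch, seconds, wait_began) per long-wait warning.

    Assumption: wait_began = logged_at - seconds.
    """
    for line in lines:
        m = LONG_WAIT.match(line.strip())
        if not m:
            continue  # not a long-wait warning
        logged_at = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        secs = int(m["secs"])
        yield logged_at, int(m["thread"]), m["latch"], secs, logged_at - timedelta(seconds=secs)


if __name__ == "__main__":
    sample = [
        "2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch",
        "2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch",
    ]
    for logged_at, thread, latch, secs, began in implied_wait_starts(sample):
        # Prints 17:35:24 for the first line and 17:35:23 for the second.
        print(f"{logged_at:%H:%M:%S} thread {thread}: {latch} wait {secs}s, began ~{began:%H:%M:%S}")
```

Fed the whole error log instead of the sample, it would show whether later incidents also trace back to one stuck holder of the latch rather than a series of separate waits.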
There seem to be at least 13 still-open bugs (plus probably some more whose fixes have been merged for the next release) that could be causing this:
https://jira.mariadb.org/browse/MDEV-34988?jql=status%20%3D%20Open%20AND%20t...

-- 
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net