[MariaDB discuss] Re: Processes getting locked out "Opening tables"

27 Mar 2025

On Thu, 27 Mar 2025 at 20:14, Derick Turner <derick@e-learndesign.co.uk> wrote:
...
> We had another event today.
> Everything went from fine with respect to cache hits (99.9% open table
> cache) and INNODB buffer pool all good (22GB size) to 15% Open table
> cache hit with 0 file opens and 3.11 misses and INNODB buffer pool size
> of 475MB.  The graphs on SSM were interesting (and where I got that
> information)

Are you saying that your buffer pool dropped from 22GB to 475MB?
The only thing that can cause that is if mysqld/mariadbd crashed and
was restarted.

Do you have enough file handles? The defaults in the MariaDB systemd
service aren't particularly generous, so it is possible that your
increase of table_open_cache didn't fully take effect because you are
maxed out on file handles.

Do:
systemctl edit mariadb

and add:
[Service]
LimitNOFILE=1048576

then:
systemctl daemon-reload
systemctl restart mariadb

and see if that makes a difference.
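
To confirm that the new limit actually reached the running server after
the restart, something along these lines may help. It is only a sketch:
soft_nofile is a made-up helper name, it assumes Linux with /proc mounted,
and it assumes the server process is called mariadbd (older builds use
mysqld).

```shell
#!/bin/sh
# Print the soft "Max open files" limit from a /proc/<pid>/limits listing
# read on stdin. soft_nofile is a hypothetical helper, not a MariaDB tool.
soft_nofile() {
    # The relevant line looks like:
    # Max open files            1048576              1048576              files
    # Fields: $1..$3 = "Max open files", $4 = soft limit, $5 = hard limit.
    awk '/^Max open files/ { print $4 }'
}

# Usage (assumption: the process name is mariadbd):
#   soft_nofile < "/proc/$(pidof mariadbd)/limits"
# Then compare with what the server itself reports:
#   mariadb -e "SELECT @@open_files_limit, @@table_open_cache"
```

If @@open_files_limit comes back lower than the LimitNOFILE you set, or
@@table_open_cache is lower than what is in your config, the server was
most likely still capped by the file handle limit.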

Unfortunately it is rather difficult to guess what's going on based
purely on the data points you mentioned thus far.
...
> Only unusual entry in the error log was:
> 2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch
> (17:35 was when SSM was showing everything nose-diving)
> This wait time kept growing over the next few minutes till:
> 2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch
> I'd already switched our webservers off of the stricken DB server but
> everything came unstuck after that last error log entry.
> What would be causing the dict_sys.latch issue?  What can be done to fix it?

There seem to be at least 13 still-open bugs (plus probably some more
whose fixes have been merged for the next release) that could be causing this:
https://jira.mariadb.org/browse/MDEV-34988?jql=status%20%3D%20Open%20AND%20t...
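
To line up the dict_sys.latch warnings with the SSM graphs, it can help to
pull just the timestamps and wait times out of the error log. A rough
sketch follows: latch_waits is a made-up helper name, the pattern is based
only on the two log lines quoted in this thread, and the log path in the
usage comment is an assumption that will differ between installs.

```shell
#!/bin/sh
# Print "date time seconds" for each dict_sys.latch long-wait warning
# read on stdin. latch_waits is a hypothetical helper name.
latch_waits() {
    # \1 captures the leading "YYYY-MM-DD HH:MM:SS" timestamp,
    # \2 captures the number of seconds reported in the warning.
    sed -n 's/^\([0-9-]* [0-9:]*\) .*InnoDB: A long wait (\([0-9]*\) seconds) was observed for dict_sys\.latch.*/\1 \2/p'
}

# Usage (assumption: the error log lives at this path):
#   latch_waits < /var/log/mysql/error.log
```

Each output line has the date, the time, and the wait in seconds, so the
start of the stall can be estimated by subtracting the wait from the
timestamp.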

-- 
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net