Processes getting locked out "Opening tables"

Hi all,

We have been having some transient issues with processes sticking in the "Opening tables" state, which means that the total number of processes grows until max processes is hit, locking everyone out of the database.

OS: Ubuntu Server 24.04.2
MariaDB 11.4.5

It appears to be InnoDB deadlocks. In today's example, the output from SHOW ENGINE INNODB STATUS \G showed that there was a deadlock between a couple of transactions:

MariaDB [(none)]> SHOW ENGINE INNODB STATUS \G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2025-03-21 12:49:57 0x7200aeae76c0 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 49 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1541 srv_active, 247133 srv_idle
srv_master_thread log flush and writes: 248664
----------
SEMAPHORES
----------
------------------------
LATEST DETECTED DEADLOCK
------------------------
2025-03-18 15:35:00 0x72067aea16c0
*** (1) TRANSACTION:
TRANSACTION 57050747951, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
MariaDB thread id 17, OS thread handle 125372157531840, query id 101395
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
*** CONFLICTING WITH:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
*** (2) TRANSACTION:
TRANSACTION 57050747928, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MariaDB thread id 20, OS thread handle 125372148479680, query id 101369 Write_rows_log_event::write_row(-1) on table `mdl_files`
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
*** CONFLICTING WITH:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
*** WE ROLL BACK TRANSACTION (2)

I'm assuming that the rest of the processes were waiting on the rollback to happen before they did anything else?

So my questions are:

Am I right in that? Is there anything which can be done to speed up rollback? Is there anything which can be done to prevent that rollback from blocking all of the other processes, the majority of which were against completely different databases?

When I was investigating this the suggestion was to dump and reload all of the databases. We have 221 databases in the cluster which takes up 880GB of filesystem space. A chunk of that is the ibdata1 file which is 290GB on that server. (We now have autoshrink on, but only after we'd upgraded to a version which supports it. File per table is on, so much of this is unused space.)

We manage this by pointing the webservers to a different node and waiting for the deadlock to clear, but I would like to fix it properly.

Thanks in advance
Derick

-- Derick Turner - He/Him
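
For context, a minimal sketch of how lock waits like this can be inspected live, using MariaDB's InnoDB information_schema tables (illustrative only; the exact columns available can vary between versions):

```
-- Count connections currently stuck in "Opening tables":
SELECT COUNT(*) AS opening_tables_threads
  FROM information_schema.PROCESSLIST
 WHERE STATE = 'Opening tables';

-- Show which transaction is waiting on which, with the queries involved:
SELECT w.requesting_trx_id,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       w.blocking_trx_id,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
  FROM information_schema.INNODB_LOCK_WAITS w
  JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
  JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;
```

A detected deadlock is resolved by InnoDB straight away (it rolls one victim transaction back), so anything that stays stuck for long is more likely an ordinary lock wait or the "Opening tables" contention itself.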

It sounds like you need to up table_open_cache (and maybe table_definition_cache) by a significant amount.

On Fri, 21 Mar 2025, 15:35 Derick Turner via discuss, <discuss@lists.mariadb.org> wrote:
Hi all,
We have been having some transient issues with processes sticking in the "Opening tables" state, which means that the total number of processes grows until max processes is hit, locking everyone out of the database.
OS: Ubuntu Server 24.04.2
MariaDB 11.4.5
It appears to be InnoDB deadlocks. In today's example, the output from SHOW ENGINE INNODB STATUS \G showed that there was a deadlock between a couple of transactions:
[quoted SHOW ENGINE INNODB STATUS deadlock output snipped - identical to the output in the original message above]
I'm assuming that the rest of the processes were waiting on the rollback to happen before they did anything else?
So my questions are:
Am I right in that? Is there anything which can be done to speed up rollback? Is there anything which can be done to prevent that rollback from blocking all of the other processes, the majority of which were against completely different databases?
When I was investigating this the suggestion was to dump and reload all of the databases. We have 221 databases in the cluster which takes up 880GB of filesystem space. A chunk of that is the ibdata1 file which is 290GB on that server. (we now have autoshrink on but only after we'd upgraded to a version which supports it. File per table is on, so much of this is unused space.)
We manage this by pointing the webservers to a different node and waiting for the deadlock to clear but I would like to fix it properly.
Thanks in advance
Derick
-- Derick Turner - He/Him
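
Regarding the suggestion above, a sketch of how such an increase can be applied at runtime and then persisted; both variables are dynamic in MariaDB, and the values here are placeholders rather than recommendations:

```
-- Runtime change (takes effect immediately, lost on restart):
SET GLOBAL table_open_cache = 20000;
SET GLOBAL table_definition_cache = 110000;

-- Matching entries to persist in the server configuration, e.g. under [mariadb]:
--   table_open_cache       = 20000
--   table_definition_cache = 110000
```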

Thanks Gordan,

That's a good shout. The current setting for table_open_cache is the default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute, and that returned 3426. So I should definitely look at increasing that value.

table_definition_cache is currently 103784.

Is there a rubric for working out what these should be set to?

I've re-run the MySQLTuner script against the DB but it hasn't mentioned anything about adjusting table_open_cache. It gave a suggestion of a small increase to the table_definition_cache (106246).

Kind regards
Derick

On 21/03/2025 13:38, Gordan Bobic via discuss wrote:
It sounds like you need to up table_open_cache (and maybe table_definition_cache) by a significant amount.
On Fri, 21 Mar 2025, 15:35 Derick Turner via discuss, <discuss@lists.mariadb.org> wrote:
Hi all,
We have been having some transient issues with processes sticking in the "Opening tables" state, which means that the total number of processes grows until max processes is hit, locking everyone out of the database.
OS: Ubuntu Server 24.04.2
MariaDB 11.4.5
It appears to be InnoDB deadlocks. In today's example, the output from SHOW ENGINE INNODB STATUS \G showed that there was a deadlock between a couple of transactions:
[quoted SHOW ENGINE INNODB STATUS deadlock output snipped - identical to the output in the original message above]
I'm assuming that the rest of the processes were waiting on the rollback to happen before they did anything else?
So my questions are:
Am I right in that? Is there anything which can be done to speed up rollback? Is there anything which can be done to prevent that rollback from blocking all of the other processes, the majority of which were against completely different databases?
When I was investigating this the suggestion was to dump and reload all of the databases. We have 221 databases in the cluster which takes up 880GB of filesystem space. A chunk of that is the ibdata1 file which is 290GB on that server. (we now have autoshrink on but only after we'd upgraded to a version which supports it. File per table is on, so much of this is unused space.)
We manage this by pointing the webservers to a different node and waiting for the deadlock to clear but I would like to fix it properly.
Thanks in advance
Derick
-- Derick Turner - He/Him
-- Derick Turner - He/Him
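
A minimal sketch of the one-minute Opened_tables check described above (the interval and threshold are arbitrary):

```
SHOW GLOBAL STATUS LIKE 'Opened_tables';
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'Opened_tables';
-- If the counter grows by thousands per minute while table_open_cache is only
-- 2000, the cache is being churned and is a candidate for an increase.
```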

On Fri, 21 Mar 2025 at 16:19, Derick Turner <derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784
That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)
Is there a rubric for working out what these should be set to?
Ideally table_definition_cache should be set to the number of tables you have, up to about 128K, maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script.

table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out: usually at least equal to table_definition_cache, often a multiple of it. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.

A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM has graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
I've re-run the MySQLTuner script against the DB but it hasn't mentioned anything about adjusting table_open_cache. It gave a suggestion of a small increase to the table_definition_cache (106246).
MySQLTuner is, IMHO, somewhere in the grey area between useless and misleading. Several of the things it says are based on premises that are outright false.

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net
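
A sketch of how the hit rate and table count mentioned above can be read straight from the server (these use MariaDB's Table_open_cache_* status counters and information_schema; treat the exact queries as illustrative):

```
-- Table open cache hit rate since startup (or since the last FLUSH STATUS):
SELECT
  (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'Table_open_cache_hits')
  /
  ( (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
      WHERE VARIABLE_NAME = 'Table_open_cache_hits')
    + (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
        WHERE VARIABLE_NAME = 'Table_open_cache_misses') )
  AS table_open_cache_hit_rate;

-- Number of tables the definition cache would ideally cover, versus the settings:
SELECT COUNT(*) AS base_tables
  FROM information_schema.TABLES
 WHERE TABLE_TYPE = 'BASE TABLE';
SELECT @@table_definition_cache, @@table_open_cache;
```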

On 21/03/2025 14:32, Gordan Bobic via discuss wrote:
On Fri, 21 Mar 2025 at 16:19, Derick Turner<derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784

That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)

Database servers have 221 databases defined and, yes, 108080 ibd files.

Is there a rubric for working out what these should be set to?

Ideally table_definition_cache should be set to the number of tables you have, up to about 128K. Maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script. table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out. Usually at least equal to table_definition_cache, often multiples. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.

OK - Sounds like I need to do some tuning calculations then. Or look at the overall infrastructure and split the databases off onto separate sets of clusters (if we are getting into diminishing returns territory)
A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM have graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
Thanks for that - I'll look at getting SSM set up.
I've re-run the MySQLTuner script against the DB but it hasn't mentioned anything about adjusting table_open_cache. It gave a suggestion of a small increase to the table_definition_cache (106246)

MySQLTuner is, IMHO, somewhere in the grey area between useless and misleading. Several of the things it says are based on premises that are outright false.
Noted :)

-- Derick Turner - He/Him

On Fri, 21 Mar 2025 at 17:36, Derick Turner <derick@e-learndesign.co.uk> wrote:
On 21/03/2025 14:32, Gordan Bobic via discuss wrote:
On Fri, 21 Mar 2025 at 16:19, Derick Turner <derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784
That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)
Database servers have 221 databases defined and, yes, 108080 ibd files.
That is on the high side, but you are probably OK unless that grows by more than about 20% in the near future.
Is there a rubric for working out what these should be set to?
Ideally table_definition_cache should be set to the number of tables you have, up to about 128K. Maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script. table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out. Usually at least equal to table_definition_cache, often multiples. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.
OK - Sounds like I need to do some tuning calculations then. Or look at the overall infrastructure and split the databases off onto separate sets of clusters (if we are getting into diminishing returns territory)
You are probably OK for now unless you plan to add thousands more tables in the near future.
A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM have graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
Thanks for that - I'll look at getting SSM set up.
See here: https://shatteredsilicon.net/downloads/

Feel free to drop me a message off list, happy to talk you through it.

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net

Update on this one.

I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB, but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:

SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size
     + @@tmp_table_size + @@table_open_cache * @@open_files_limit
     + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size
        + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections
     AS MaxMemoryEstimate;

The value that returns is 32082607104 (29.88GB if my maths is right).

The variable is correct in the DB when I query it, but is this expected? Is it being limited by some other server/OS limit? If the memory usage remains the same I can increase it without having to add more RAM - I just don't want it to suddenly spike, run out of available resource and come up against the OOM Killer again.

Kind regards
Derick

On 21/03/2025 15:41, Gordan Bobic wrote:
On Fri, 21 Mar 2025 at 17:36, Derick Turner<derick@e-learndesign.co.uk> wrote:
On 21/03/2025 14:32, Gordan Bobic via discuss wrote:
On Fri, 21 Mar 2025 at 16:19, Derick Turner<derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784
That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)
Database servers have 221 databases defined and, yes, 108080 ibd files. That is on the high side, but you are probably OK unless that grows by more than about 20% in the near future.
Is there a rubric for working out what these should be set to?
Ideally table_definition_cache should be set to the number of tables you have, up to about 128K. Maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script. table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out. Usually at least equal to table_definition_cache, often multiples. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.
OK - Sounds like I need to do some tuning calculations then. Or look at the overall infrastructure and split the databases off onto separate sets of clusters (if we are getting into diminishing returns territory) You are probably OK for now unless you plan to add thousands more tables in the near future.
A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM have graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
Thanks for that - I'll look at getting SSM set up.

See here: https://shatteredsilicon.net/downloads/ Feel free to drop me a message off list, happy to talk you through it.
-- Derick Turner - He/Him
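
One way to sanity-check whether the larger cache is actually in use, or whether something such as the file handle limit is capping it (a sketch using standard status and variable names):

```
SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('table_open_cache', 'table_definition_cache', 'open_files_limit');
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Open_tables', 'Open_table_definitions', 'Open_files',
     'Opened_tables', 'Table_open_cache_overflows');
-- If Open_files sits near open_files_limit, or Table_open_cache_overflows keeps
-- climbing, the bigger cache is not yet fully effective.
```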

On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
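
For reference, the rough arithmetic behind the file handle point above (1KB per cached handle is the approximation from the message, not an exact figure):

```
-- ~1KB per cached table handle:
SELECT 30000 * 1024 / 1024 / 1024 AS approx_extra_MB;  -- about 29 MB for 30K handles
```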

On 24/03/2025 18:00, Gordan Bobic wrote:
On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Ah - that is excellent news. I worried that a large value would absolutely kill the memory. So, plenty of space to increase this a long way.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
Anything quick and dirty you would recommend instead? (And yes, I still need to set up the proper monitoring tool :)

-- Derick Turner - He/Him

On Mon, 24 Mar 2025 at 20:10, Derick Turner <derick@e-learndesign.co.uk> wrote:
On 24/03/2025 18:00, Gordan Bobic wrote:
On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Ah - that is excellent news. I worried that a large value would absolutely kill the memory. So, plenty of space to increase this a long way.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
Anything quick and dirty you would recommend instead (And yes I still need to set up the proper monitoring tool :)
https://shatteredsilicon.net/mariadb-performance-tuning-and-mysql-performanc...

```
SELECT (
    (
      @@binlog_file_cache_size +
      @@innodb_buffer_pool_size +
      @@innodb_log_buffer_size +
      @@key_buffer_size +
      @@query_cache_size +
      ( @@max_connections * (
          @@binlog_cache_size +
          @@binlog_stmt_cache_size +
          @@bulk_insert_buffer_size +
          @@join_buffer_size +
          @@net_buffer_length +
          @@read_buffer_size +
          @@read_rnd_buffer_size +
          @@sort_buffer_size +
          @@thread_stack
      ) ) +
      ( @@slave_parallel_threads * ( @@slave_parallel_max_queued ) ) +
      ( @@open_files_limit * 1024 )
    ) / 1024 / 1024 / 1024
) AS max_memory_GB;
```

All file handles have to fit within the open_files_limit, so there is no need to add table_open_cache, max_connections, table_definition_cache, etc., separately.

tmp_table_size is per temporary table, and there is no hard limit on how many of those could be created by a query or how many can be active in a session, so there is no reasonable way to account for it mechanistically.

max_allowed_packet could in theory be reached by each connection if you are sending queries large enough to max it out, but that is absurdly unlikely unless all of your threads are importing mysqldump-ed tables with massive multi-row INSERTs.

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net

We had another event today.

Everything went from fine with respect to cache hits (99.9% open table cache hit rate) and the InnoDB buffer pool all good (22GB in size) to a 15% open table cache hit rate with 0 file opens and 3.11 misses, and an InnoDB buffer pool size of 475MB. The graphs on SSM were interesting (and are where I got that information).

The only unusual entry in the error log was:

2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch

(17:35 was when SSM was showing everything nose-diving)

This wait time kept growing over the next few minutes till:

2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch

I'd already switched our webservers off of the stricken DB server, but everything came unstuck after that last error log entry.

What would be causing the dict_sys.latch issue? What can be done to fix it?

Kind regards
Derick

On 24/03/2025 18:20, Gordan Bobic via discuss wrote:
On Mon, 24 Mar 2025 at 20:10, Derick Turner <derick@e-learndesign.co.uk> wrote:
On 24/03/2025 18:00, Gordan Bobic wrote:
On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Ah - that is excellent news. I worried that a large value would absolutely kill the memory. So, plenty of space to increase this a long way.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
Anything quick and dirty you would recommend instead (And yes I still need to set up the proper monitoring tool :)

https://shatteredsilicon.net/mariadb-performance-tuning-and-mysql-performanc...
``` SELECT ( ( @@binlog_file_cache_size + @@innodb_buffer_pool_size + @@innodb_log_buffer_size + @@key_buffer_size + @@query_cache_size + ( @@max_connections * ( @@binlog_cache_size + @@binlog_stmt_cache_size + @@bulk_insert_buffer_size + @@join_buffer_size + @@net_buffer_length + @@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size + @@thread_stack ) ) + ( @@slave_parallel_threads * ( @@slave_parallel_max_queued ) ) + ( @@open_files_limit * 1024 ) ) / 1024 / 1024 / 1024) AS max_memory_GB; ```
All file handles have to fit within the open_files_limit, so there is no need to add table_open_cache, max_connections, table_definition_cache, etc, separately.
tmp_table_size is per temporary table, and there is no hard limit on how many of those could be created by a query or how many can be active in a session, so there is no reasonable way to account for it mechanistically.

max_allowed_packet could in theory be reached by each connection if you are sending queries large enough to max it out, but that is absurdly unlikely unless all of your threads are importing mysqldump-ed tables with massive multi-row INSERTs.
-- Derick Turner - He/Him

On Thu, 27 Mar 2025 at 20:14, Derick Turner <derick@e-learndesign.co.uk> wrote:
We had another event today.
Everything went from fine with respect to cache hits (99.9% open table cache) and INNODB buffer pool all good (22GB size) to 15% Open table cache hit with 0 file opens and 3.11 misses and INNODB buffer pool size of 475MB. The graphs on SSM were interesting (and where I got that information)
Are you saying that your buffer pool dropped from 22GB to 475MB? The only thing that can cause that is if mysqld/mariadbd crashed and was restarted.

Do you have enough file handles? The defaults in the MariaDB systemd service aren't particularly generous, it is possible your increase of table_open_cache didn't actually fully take effect because you are maxed out on file handles.

Do:

systemctl edit mariadb

and add:

[Service]
LimitNOFILE=1048576

then:

systemctl daemon-reload
systemctl restart mariadb

and see if that makes a difference.

Unfortunately it is rather difficult to guess what's going on based purely on the data points you mentioned thus far.
Only unusual entry in the error log was:
2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch
(17:35 was when SSM was showing everything nose-diving)
This wait time kept growing over the next few minutes till:
2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch
I'd already switched our webservers off of the stricken DB server but everything came unstuck after that last error log entry.
What would be causing the dict_sys.latch issue? What can be done to fix it?
There seem to be at least 13 still open bugs (plus probably some more that have been merged for the next release) that could be causing this:

https://jira.mariadb.org/browse/MDEV-34988?jql=status%20%3D%20Open%20AND%20t...

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net

Nothing in the syslog or error log to suggest that MariaDB crashed or was restarted in any way. Just what I saw in the graphs. There were some memory pressure events in the error log around that time too, reporting a number of pages being released, which could be related.

LimitNOFILE was set to 200000 (well below the table open cache) and the mysql user's ulimit for files was set to 999999. I've upped the first to 2097152 and the second to unlimited. I will apply these to the servers in turn and see if this makes a difference.

Thanks again for all your help!

Derick

On 27/03/2025 21:43, Gordan Bobic via discuss wrote:
On Thu, 27 Mar 2025 at 20:14, Derick Turner <derick@e-learndesign.co.uk> wrote:
We had another event today.
Everything went from fine with respect to cache hits (99.9% open table cache) and INNODB buffer pool all good (22GB size) to 15% Open table cache hit with 0 file opens and 3.11 misses and INNODB buffer pool size of 475MB. The graphs on SSM were interesting (and where I got that information)

Are you saying that your buffer pool dropped from 22GB to 475MB? The only thing that can cause that is if mysqld/mariadbd crashed and was restarted.
Do you have enough file handles? The defaults in the MariaDB systemd service aren't particularly generous, it is possible your increase of table_open_cache didn't actually fully take effect because you are maxed out on file handles.
Do: systemctl edit mariadb
and add: [Service] LimitNOFILE=1048576
then: systemctl daemon-reload systemctl restart mariadb
and see if that makes a difference.
Unfortunately it is rather difficult to guess what's going on based purely on the data points you mentioned thus far.
Only unusual entry in the error log was:
2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch
(17:35 was when SSM was showing everything nose-diving)
This wait time kept growing over the next few minutes till:
2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch
I'd already switched our webservers off of the stricken DB server but everything came unstuck after that last error log entry.
What would be causing the dict_sys.latch issue? What can be done to fix it?

There seem to be at least 13 still open bugs (plus probably some more that have been merged for next release) that could be causing this: https://jira.mariadb.org/browse/MDEV-34988?jql=status%20%3D%20Open%20AND%20t...
-- Derick Turner - He/Him
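
Once the servers have been restarted with the new limits, a quick sketch of how to confirm what the running server actually got and how much of it is in use:

```
-- What the running server negotiated at startup (should reflect the raised LimitNOFILE):
SELECT @@open_files_limit;

-- How many file handles are currently open:
SHOW GLOBAL STATUS LIKE 'Open_files';
```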
participants (2):
- Derick Turner
- Gordan Bobic