It sounds like you need to up table_open_cache (and maybe table_definition_cache) by a significant amount.

On Fri, 21 Mar 2025, 15:35 Derick Turner via discuss, <discuss@lists.mariadb.org> wrote:
Hi all,

We have been having some transient issues with processes sticking on the
Opening tables state, which means that the total number of processes
grows till max processes is hit/locking out the database.

OS Ubuntu sever 24.04.02

MariaDB 11.4.5

It appears to be InnoDB deadlocks.  So the example today the output from
SHOW ENGINE INNODB STATUS \G showed that there was a deadlock between a
couple of transactions:

MariaDB [(none)]> SHOW ENGINE INNODB STATUS \G
*************************** 1. row ***************************
   Type: InnoDB
   Name:
Status:
=====================================
2025-03-21 12:49:57 0x7200aeae76c0 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 49 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1541 srv_active, 247133 srv_idle
srv_master_thread log flush and writes: 248664
----------
SEMAPHORES
----------
------------------------
LATEST DETECTED DEADLOCK
------------------------
2025-03-18 15:35:00 0x72067aea16c0
*** (1) TRANSACTION:
TRANSACTION 57050747951, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 3 row lock(s), undo log
entries 1
MariaDB thread id 17, OS thread handle 125372157531840, query id 101395
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index
mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format;
info bits 0
  0: len 30; hex
393062613639666233633334373364663239333634323134316333303735; asc
90ba69fb3c3473df293642141c3075; (total 40 bytes);
  1: len 8; hex 80000000003ac8e3; asc      :  ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index
mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock
mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format;
info bits 0
  0: len 30; hex
393062613639666233633334373364663239333634323134316333303735; asc
90ba69fb3c3473df293642141c3075; (total 40 bytes);
  1: len 8; hex 80000000003ac8e3; asc      :  ;;

Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format;
info bits 32
  0: len 30; hex
393062613636353139363935316562666532646631636535653238326366; asc
90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
  1: len 8; hex 8000000000b78d51; asc        Q;;

RECORD LOCKS space id 2549335 page no 82714 n bits 320 index
mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock
mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format;
info bits 0
  0: len 30; hex
393062613639666233633334373364663239333634323134316333303735; asc
90ba69fb3c3473df293642141c3075; (total 40 bytes);
  1: len 8; hex 80000000003ac8e3; asc      :  ;;

Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format;
info bits 32
  0: len 30; hex
393062613636353139363935316562666532646631636535653238326366; asc
90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
  1: len 8; hex 8000000000b78d51; asc        Q;;


*** (2) TRANSACTION:
TRANSACTION 57050747928, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 4 row lock(s), undo log
entries 1
MariaDB thread id 20, OS thread handle 125372148479680, query id 101369
Write_rows_log_event::write_row(-1) on table `mdl_files`
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index
mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format;
info bits 0
  0: len 30; hex
393062613639666233633334373364663239333634323134316333303735; asc
90ba69fb3c3473df293642141c3075; (total 40 bytes);
  1: len 8; hex 80000000003ac8e3; asc      :  ;;

*** CONFLICTING WITH:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index
mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock
mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format;
info bits 0
  0: len 30; hex
393062613639666233633334373364663239333634323134316333303735; asc
90ba69fb3c3473df293642141c3075; (total 40 bytes);
  1: len 8; hex 80000000003ac8e3; asc      :  ;;

Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format;
info bits 32
  0: len 30; hex
393062613636353139363935316562666532646631636535653238326366; asc
90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
  1: len 8; hex 8000000000b78d51; asc        Q;;

RECORD LOCKS space id 2549335 page no 82714 n bits 320 index
mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock
mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format;
info bits 0
  0: len 30; hex
393062613639666233633334373364663239333634323134316333303735; asc
90ba69fb3c3473df293642141c3075; (total 40 bytes);
  1: len 8; hex 80000000003ac8e3; asc      :  ;;

Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format;
info bits 32
  0: len 30; hex
393062613636353139363935316562666532646631636535653238326366; asc
90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
  1: len 8; hex 8000000000b78d51; asc        Q;;

*** WE ROLL BACK TRANSACTION (2)

I'm assuming that the rest of the processes were waiting on the rollback
to happen before they did anything else?

So my questions are:

Am I right in that?
Is there anything which can be done to speed up rollback?
Is there anything which can be done to prevent that rollback from
blocking all of the other processes, the majority of which were against
completely different databases?

When I was investigating this the suggestion was to dump and reload all
of the databases. We have 221 databases in the cluster which takes up
880GB of filesystem space.  A chunk of that is the ibdata1 file which is
290GB on that server. (we now have autoshrink on but only after we'd
upgraded to a version which supports it.  File per table is on, so much
of this is unused space.)

We manage this by pointing the webservers to a different node and
waiting for the deadlock to clear but I would like to fix it properly.

Thanks in advance

Derick

--
Derick Turner - He/Him

_______________________________________________
discuss mailing list -- discuss@lists.mariadb.org
To unsubscribe send an email to discuss-leave@lists.mariadb.org