Processes getting locked out "Opening tables"

Hi all,

We have been having some transient issues with processes sticking in the "Opening tables" state, which means that the total number of processes grows until max processes is hit, locking everyone out of the database.

OS: Ubuntu Server 24.04.2
MariaDB 11.4.5

It appears to be InnoDB deadlocks. In today's example, the output from SHOW ENGINE INNODB STATUS \G showed that there was a deadlock between a couple of transactions:

MariaDB [(none)]> SHOW ENGINE INNODB STATUS \G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2025-03-21 12:49:57 0x7200aeae76c0 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 49 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1541 srv_active, 247133 srv_idle
srv_master_thread log flush and writes: 248664
----------
SEMAPHORES
----------
------------------------
LATEST DETECTED DEADLOCK
------------------------
2025-03-18 15:35:00 0x72067aea16c0
*** (1) TRANSACTION:
TRANSACTION 57050747951, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
MariaDB thread id 17, OS thread handle 125372157531840, query id 101395
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
*** CONFLICTING WITH:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
*** (2) TRANSACTION:
TRANSACTION 57050747928, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MariaDB thread id 20, OS thread handle 125372148479680, query id 101369 Write_rows_log_event::write_row(-1) on table `mdl_files`
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
*** CONFLICTING WITH:
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747928 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
RECORD LOCKS space id 2549335 page no 82714 n bits 320 index mdl_file_pat_uix of table `client1`.`mdl_files` trx id 57050747951 lock mode S
Record lock, heap no 40 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 393062613639666233633334373364663239333634323134316333303735; asc 90ba69fb3c3473df293642141c3075; (total 40 bytes);
 1: len 8; hex 80000000003ac8e3; asc : ;;
Record lock, heap no 208 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 30; hex 393062613636353139363935316562666532646631636535653238326366; asc 90ba665196951ebfe2df1ce5e282cf; (total 40 bytes);
 1: len 8; hex 8000000000b78d51; asc Q;;
*** WE ROLL BACK TRANSACTION (2)

I'm assuming that the rest of the processes were waiting on the rollback to happen before they did anything else?

So my questions are:

Am I right in that? Is there anything which can be done to speed up rollback? Is there anything which can be done to prevent that rollback from blocking all of the other processes, the majority of which were against completely different databases?

When I was investigating this the suggestion was to dump and reload all of the databases. We have 221 databases in the cluster which takes up 880GB of filesystem space. A chunk of that is the ibdata1 file which is 290GB on that server. (We now have autoshrink on, but only after we'd upgraded to a version which supports it. File per table is on, so much of this is unused space.)

We manage this by pointing the webservers to a different node and waiting for the deadlock to clear, but I would like to fix it properly.

Thanks in advance
Derick

-- Derick Turner - He/Him
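
For context, a minimal sketch of how lock waits like this can be inspected live, using MariaDB's InnoDB information_schema tables (illustrative only; the exact columns available can vary between versions):

```
-- Count connections currently stuck in "Opening tables":
SELECT COUNT(*) AS opening_tables_threads
  FROM information_schema.PROCESSLIST
 WHERE STATE = 'Opening tables';

-- Show which transaction is waiting on which, with the queries involved:
SELECT w.requesting_trx_id,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       w.blocking_trx_id,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
  FROM information_schema.INNODB_LOCK_WAITS w
  JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
  JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;
```

A detected deadlock is resolved by InnoDB straight away (it rolls one victim transaction back), so anything that stays stuck for long is more likely an ordinary lock wait or the "Opening tables" contention itself.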

It sounds like you need to up table_open_cache (and maybe table_definition_cache) by a significant amount.

On Fri, 21 Mar 2025, 15:35 Derick Turner via discuss, <discuss@lists.mariadb.org> wrote:
Hi all,
We have been having some transient issues with processes sticking in the "Opening tables" state, which means that the total number of processes grows until max processes is hit, locking everyone out of the database.
OS: Ubuntu Server 24.04.2
MariaDB 11.4.5
It appears to be InnoDB deadlocks. In today's example, the output from SHOW ENGINE INNODB STATUS \G showed that there was a deadlock between a couple of transactions:
[quoted SHOW ENGINE INNODB STATUS deadlock output snipped - identical to the output in the original message above]
I'm assuming that the rest of the processes were waiting on the rollback to happen before they did anything else?
So my questions are:
Am I right in that? Is there anything which can be done to speed up rollback? Is there anything which can be done to prevent that rollback from blocking all of the other processes, the majority of which were against completely different databases?
When I was investigating this the suggestion was to dump and reload all of the databases. We have 221 databases in the cluster which takes up 880GB of filesystem space. A chunk of that is the ibdata1 file which is 290GB on that server. (we now have autoshrink on but only after we'd upgraded to a version which supports it. File per table is on, so much of this is unused space.)
We manage this by pointing the webservers to a different node and waiting for the deadlock to clear but I would like to fix it properly.
Thanks in advance
Derick
-- Derick Turner - He/Him
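
Regarding the suggestion above, a sketch of how such an increase can be applied at runtime and then persisted; both variables are dynamic in MariaDB, and the values here are placeholders rather than recommendations:

```
-- Runtime change (takes effect immediately, lost on restart):
SET GLOBAL table_open_cache = 20000;
SET GLOBAL table_definition_cache = 110000;

-- Matching entries to persist in the server configuration, e.g. under [mariadb]:
--   table_open_cache       = 20000
--   table_definition_cache = 110000
```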

Thanks Gordan,

That's a good shout. The current setting for table_open_cache is the default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute, and that returned 3426. So I should definitely look at increasing that value.

table_definition_cache is currently 103784.

Is there a rubric for working out what these should be set to?

I've re-run the MySQLTuner script against the DB but it hasn't mentioned anything about adjusting table_open_cache. It gave a suggestion of a small increase to the table_definition_cache (106246).

Kind regards
Derick

On 21/03/2025 13:38, Gordan Bobic via discuss wrote:
It sounds like you need to up table_open_cache (and maybe table_definition_cache) by a significant amount.
On Fri, 21 Mar 2025, 15:35 Derick Turner via discuss, <discuss@lists.mariadb.org> wrote:
Hi all,
We have been having some transient issues with processes sticking in the "Opening tables" state, which means that the total number of processes grows until max processes is hit, locking everyone out of the database.
OS: Ubuntu Server 24.04.2
MariaDB 11.4.5
It appears to be InnoDB deadlocks. In today's example, the output from SHOW ENGINE INNODB STATUS \G showed that there was a deadlock between a couple of transactions:
[quoted SHOW ENGINE INNODB STATUS deadlock output snipped - identical to the output in the original message above]
I'm assuming that the rest of the processes were waiting on the rollback to happen before they did anything else?
So my questions are:
Am I right in that? Is there anything which can be done to speed up rollback? Is there anything which can be done to prevent that rollback from blocking all of the other processes, the majority of which were against completely different databases?
When I was investigating this the suggestion was to dump and reload all of the databases. We have 221 databases in the cluster which takes up 880GB of filesystem space. A chunk of that is the ibdata1 file which is 290GB on that server. (we now have autoshrink on but only after we'd upgraded to a version which supports it. File per table is on, so much of this is unused space.)
We manage this by pointing the webservers to a different node and waiting for the deadlock to clear but I would like to fix it properly.
Thanks in advance
Derick
-- Derick Turner - He/Him
-- Derick Turner - He/Him
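
A minimal sketch of the one-minute Opened_tables check described above (the interval and threshold are arbitrary):

```
SHOW GLOBAL STATUS LIKE 'Opened_tables';
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'Opened_tables';
-- If the counter grows by thousands per minute while table_open_cache is only
-- 2000, the cache is being churned and is a candidate for an increase.
```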

On Fri, 21 Mar 2025 at 16:19, Derick Turner <derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784
That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)
Is there a rubric for working out what these should be set to?
Ideally table_definition_cache should be set to the number of tables you have, up to about 128K, maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script.

table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out: usually at least equal to table_definition_cache, often a multiple of it. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.

A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM has graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
I've re-run the MySQLTuner script against the DB but it hasn't mentioned anything about adjusting table_open_cache. It gave a suggestion of a small increase to the table_definition_cache (106246).
MySQLTuner is, IMHO, somewhere in the grey area between useless and misleading. Several of the things it says are based on premises that are outright false.

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net
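
A sketch of how the hit rate and table count mentioned above can be read straight from the server (these use MariaDB's Table_open_cache_* status counters and information_schema; treat the exact queries as illustrative):

```
-- Table open cache hit rate since startup (or since the last FLUSH STATUS):
SELECT
  (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'Table_open_cache_hits')
  /
  ( (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
      WHERE VARIABLE_NAME = 'Table_open_cache_hits')
    + (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
        WHERE VARIABLE_NAME = 'Table_open_cache_misses') )
  AS table_open_cache_hit_rate;

-- Number of tables the definition cache would ideally cover, versus the settings:
SELECT COUNT(*) AS base_tables
  FROM information_schema.TABLES
 WHERE TABLE_TYPE = 'BASE TABLE';
SELECT @@table_definition_cache, @@table_open_cache;
```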

On 21/03/2025 14:32, Gordan Bobic via discuss wrote:
On Fri, 21 Mar 2025 at 16:19, Derick Turner<derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784

That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)

Database servers have 221 databases defined and, yes, 108080 ibd files.

Is there a rubric for working out what these should be set to?

Ideally table_definition_cache should be set to the number of tables you have, up to about 128K. Maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script. table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out. Usually at least equal to table_definition_cache, often multiples. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.

OK - Sounds like I need to do some tuning calculations then. Or look at the overall infrastructure and split the databases off onto separate sets of clusters (if we are getting into diminishing returns territory)
A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM have graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
Thanks for that - I'll look at getting SSM set up.
I've re-run the MySQLTuner script against the DB but it hasn't mentioned anything about adjusting table_open_cache. It gave a suggestion of a small increase to the table_definition_cache (106246)

MySQLTuner is, IMHO, somewhere in the grey area between useless and misleading. Several of the things it says are based on premises that are outright false.
Noted :)

-- Derick Turner - He/Him

On Fri, 21 Mar 2025 at 17:36, Derick Turner <derick@e-learndesign.co.uk> wrote:
On 21/03/2025 14:32, Gordan Bobic via discuss wrote:
On Fri, 21 Mar 2025 at 16:19, Derick Turner <derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784
That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)
Database servers have 221 databases defined and, yes, 108080 ibd files.
That is on the high side, but you are probably OK unless that grows by more than about 20% in the near future.
Is there a rubric for working out what these should be set to?
Ideally table_definition_cache should be set to the number of tables you have, up to about 128K. Maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script. table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out. Usually at least equal to table_definition_cache, often multiples. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.
OK - Sounds like I need to do some tuning calculations then. Or look at the overall infrastructure and split the databases off onto separate sets of clusters (if we are getting into diminishing returns territory)
You are probably OK for now unless you plan to add thousands more tables in the near future.
A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM have graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
Thanks for that - I'll look at getting SSM set up.
See here: https://shatteredsilicon.net/downloads/

Feel free to drop me a message off list, happy to talk you through it.

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net

Update on this one.

I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB, but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:

SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size
     + @@tmp_table_size + @@table_open_cache * @@open_files_limit
     + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size
        + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections
     AS MaxMemoryEstimate;

The value that returns is 32082607104 (29.88GB if my maths is right).

The variable is correct in the DB when I query it, but is this expected? Is it being limited by some other server/OS limit? If the memory usage remains the same I can increase it without having to add more RAM - I just don't want it to suddenly spike, run out of available resource and come up against the OOM Killer again.

Kind regards
Derick

On 21/03/2025 15:41, Gordan Bobic wrote:
On Fri, 21 Mar 2025 at 17:36, Derick Turner<derick@e-learndesign.co.uk> wrote:
On 21/03/2025 14:32, Gordan Bobic via discuss wrote:
On Fri, 21 Mar 2025 at 16:19, Derick Turner<derick@e-learndesign.co.uk> wrote:
Thanks Gordan,
That's a good shout. Current settings for table_open_cache is set to default of 2000. I've run a test to see what the difference in the opened_tables value is in the space of a minute and that returned 3426. So should definitely look at increasing that value.
table_definition_cache is currently 103784
That sounds very high, unless you genuinely have > 100K tables in your database (which seems rather high)
Database servers have 221 databases defined and, yes, 108080 ibd files. That is on the high side, but you are probably OK unless that grows by more than about 20% in the near future.
Is there a rubric for working out what these should be set to?
Ideally table_definition_cache should be set to the number of tables you have, up to about 128K. Maybe 256K at a push. Beyond that you will start to run into file handle limits even if you set LimitNOFILE to 1M in your service script. table_open_cache is less straightforward. Ideally you should size it so that it never gets maxed out. Usually at least equal to table_definition_cache, often multiples. If you are running into file handle limits, then you can assess whether it is passably adequate by checking the table cache hit rate and making sure it doesn't drop below 99%.
OK - Sounds like I need to do some tuning calculations then. Or look at the overall infrastructure and split the databases off onto separate sets of clusters (if we are getting into diminishing returns territory) You are probably OK for now unless you plan to add thousands more tables in the near future.
A decent monitoring tool like Percona's PMM or Shattered Silicon's SSM have graphs for that. SSM also has a tuning advisor dashboard for the most important clearly determinable tunables.
Thanks for that - I'll look at getting SSM set up.

See here: https://shatteredsilicon.net/downloads/ Feel free to drop me a message off list, happy to talk you through it.
-- Derick Turner - He/Him
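
One way to sanity-check whether the larger cache is actually in use, or whether something such as the file handle limit is capping it (a sketch using standard status and variable names):

```
SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('table_open_cache', 'table_definition_cache', 'open_files_limit');
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Open_tables', 'Open_table_definitions', 'Open_files',
     'Opened_tables', 'Table_open_cache_overflows');
-- If Open_files sits near open_files_limit, or Table_open_cache_overflows keeps
-- climbing, the bigger cache is not yet fully effective.
```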

On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
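
For reference, the rough arithmetic behind the file handle point above (1KB per cached handle is the approximation from the message, not an exact figure):

```
-- ~1KB per cached table handle:
SELECT 30000 * 1024 / 1024 / 1024 AS approx_extra_MB;  -- about 29 MB for 30K handles
```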

On 24/03/2025 18:00, Gordan Bobic wrote:
On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Ah - that is excellent news. I worried that a large value would absolutely kill the memory. So, plenty of space to increase this a long way.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
Anything quick and dirty you would recommend instead? (And yes, I still need to set up the proper monitoring tool :)

-- Derick Turner - He/Him

On Mon, 24 Mar 2025 at 20:10, Derick Turner <derick@e-learndesign.co.uk> wrote:
On 24/03/2025 18:00, Gordan Bobic wrote:
On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Ah - that is excellent news. I worried that a large value would absolutely kill the memory. So, plenty of space to increase this a long way.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
Anything quick and dirty you would recommend instead (And yes I still need to set up the proper monitoring tool :)
https://shatteredsilicon.net/mariadb-performance-tuning-and-mysql-performanc...

```
SELECT (
    (
      @@binlog_file_cache_size +
      @@innodb_buffer_pool_size +
      @@innodb_log_buffer_size +
      @@key_buffer_size +
      @@query_cache_size +
      ( @@max_connections * (
          @@binlog_cache_size +
          @@binlog_stmt_cache_size +
          @@bulk_insert_buffer_size +
          @@join_buffer_size +
          @@net_buffer_length +
          @@read_buffer_size +
          @@read_rnd_buffer_size +
          @@sort_buffer_size +
          @@thread_stack
      ) ) +
      ( @@slave_parallel_threads * ( @@slave_parallel_max_queued ) ) +
      ( @@open_files_limit * 1024 )
    ) / 1024 / 1024 / 1024
) AS max_memory_GB;
```

All file handles have to fit within the open_files_limit, so there is no need to add table_open_cache, max_connections, table_definition_cache, etc., separately.

tmp_table_size is per temporary table, and there is no hard limit on how many of those could be created by a query or how many can be active in a session, so there is no reasonable way to account for it mechanistically.

max_allowed_packet could in theory be reached by each connection if you are sending queries large enough to max it out, but that is absurdly unlikely unless all of your threads are importing mysqldump-ed tables with massive multi-row INSERTs.

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net

We had another event today.

Everything went from fine with respect to cache hits (99.9% open table cache hit rate) and the InnoDB buffer pool all good (22GB in size) to a 15% open table cache hit rate with 0 file opens and 3.11 misses, and an InnoDB buffer pool size of 475MB. The graphs on SSM were interesting (and are where I got that information).

The only unusual entry in the error log was:

2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch

(17:35 was when SSM was showing everything nose-diving)

This wait time kept growing over the next few minutes till:

2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch

I'd already switched our webservers off of the stricken DB server, but everything came unstuck after that last error log entry.

What would be causing the dict_sys.latch issue? What can be done to fix it?

Kind regards
Derick

On 24/03/2025 18:20, Gordan Bobic via discuss wrote:
On Mon, 24 Mar 2025 at 20:10, Derick Turner <derick@e-learndesign.co.uk> wrote:
On 24/03/2025 18:00, Gordan Bobic wrote:
On Mon, 24 Mar 2025, 19:54 Derick Turner, <derick@e-learndesign.co.uk> wrote:
Update on this one.
I upped table_open_cache to 30,000 to see what impact this would have. I would have expected the memory footprint of the MariaDB process to grow to about 30GB but it has stayed at ~27GB (as reported by btop). This was from the SQL calculation:
SELECT @@innodb_buffer_pool_size + @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@table_open_cache * @@open_files_limit + (@@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size) * @@max_connections AS MaxMemoryEstimate;
That makes no sense, a file handle is about 1KB, so 30K of table_open_cache won't add more than 30MB.
Ah - that is excellent news. I worried that a large value would absolutely kill the memory. So, plenty of space to increase this a long way.
Value for that returns 32082607104 (29.88GB if my maths is right).
There is a lot wrong with that equation.
Anything quick and dirty you would recommend instead (And yes I still need to set up the proper monitoring tool :)

https://shatteredsilicon.net/mariadb-performance-tuning-and-mysql-performanc...
``` SELECT ( ( @@binlog_file_cache_size + @@innodb_buffer_pool_size + @@innodb_log_buffer_size + @@key_buffer_size + @@query_cache_size + ( @@max_connections * ( @@binlog_cache_size + @@binlog_stmt_cache_size + @@bulk_insert_buffer_size + @@join_buffer_size + @@net_buffer_length + @@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size + @@thread_stack ) ) + ( @@slave_parallel_threads * ( @@slave_parallel_max_queued ) ) + ( @@open_files_limit * 1024 ) ) / 1024 / 1024 / 1024) AS max_memory_GB; ```
All file handles have to fit within the open_files_limit, so there is no need to add table_open_cache, max_connections, table_definition_cache, etc, separately.
tmp_table_size is per temporary table, and there is no hard limit on how many of those could be created by a query or how many can be active in a session, so there is no reasonable way to account for it mechanistically.

max_allowed_packet could in theory be reached by each connection if you are sending queries large enough to max it out, but that is absurdly unlikely unless all of your threads are importing mysqldump-ed tables with massive multi-row INSERTs.
-- Derick Turner - He/Him

On Thu, 27 Mar 2025 at 20:14, Derick Turner <derick@e-learndesign.co.uk> wrote:
We had another event today.
Everything went from fine with respect to cache hits (99.9% open table cache) and INNODB buffer pool all good (22GB size) to 15% Open table cache hit with 0 file opens and 3.11 misses and INNODB buffer pool size of 475MB. The graphs on SSM were interesting (and where I got that information)
Are you saying that your buffer pool dropped from 22GB to 475MB? The only thing that can cause that is if mysqld/mariadbd crashed and was restarted.

Do you have enough file handles? The defaults in the MariaDB systemd service aren't particularly generous, it is possible your increase of table_open_cache didn't actually fully take effect because you are maxed out on file handles.

Do:

systemctl edit mariadb

and add:

[Service]
LimitNOFILE=1048576

then:

systemctl daemon-reload
systemctl restart mariadb

and see if that makes a difference.

Unfortunately it is rather difficult to guess what's going on based purely on the data points you mentioned thus far.
Only unusual entry in the error log was:
2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch
(17:35 was when SSM was showing everything nose-diving)
This wait time kept growing over the next few minutes till:
2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch
I'd already switched our webservers off of the stricken DB server but everything came unstuck after that last error log entry.
What would be causing the dict_sys.latch issue? What can be done to fix it?
There seem to be at least 13 still open bugs (plus probably some more that have been merged for the next release) that could be causing this:

https://jira.mariadb.org/browse/MDEV-34988?jql=status%20%3D%20Open%20AND%20t...

--
Gordan Bobic
Database Specialist, Shattered Silicon Ltd.
https://shatteredsilicon.net

Nothing in the syslog or error log to suggest that MariaDB crashed or was restarted in any way. Just what I saw in the graphs. There were some memory pressure events in the error log around that time too, reporting a number of pages being released, which could be related.

LimitNOFILE was set to 200000 (well below the table open cache) and the mysql user's ulimit for files was set to 999999. I've upped the first to 2097152 and the second to unlimited. I will apply these to the servers in turn and see if this makes a difference.

Thanks again for all your help!

Derick

On 27/03/2025 21:43, Gordan Bobic via discuss wrote:
On Thu, 27 Mar 2025 at 20:14, Derick Turner <derick@e-learndesign.co.uk> wrote:
We had another event today.
Everything went from fine with respect to cache hits (99.9% open table cache) and INNODB buffer pool all good (22GB size) to 15% Open table cache hit with 0 file opens and 3.11 misses and INNODB buffer pool size of 475MB. The graphs on SSM were interesting (and where I got that information)

Are you saying that your buffer pool dropped from 22GB to 475MB? The only thing that can cause that is if mysqld/mariadbd crashed and was restarted.
Do you have enough file handles? The defaults in the MariaDB systemd service aren't particularly generous, it is possible your increase of table_open_cache didn't actually fully take effect because you are maxed out on file handles.
Do: systemctl edit mariadb
and add: [Service] LimitNOFILE=1048576
then: systemctl daemon-reload systemctl restart mariadb
and see if that makes a difference.
Unfortunately it is rather difficult to guess what's going on based purely on the data points you mentioned thus far.
Only unusual entry in the error log was:
2025-03-27 17:37:56 3194063 [Warning] InnoDB: A long wait (152 seconds) was observed for dict_sys.latch
(17:35 was when SSM was showing everything nose-diving)
This wait time kept growing over the next few minutes till:
2025-03-27 17:41:17 3193777 [Warning] InnoDB: A long wait (354 seconds) was observed for dict_sys.latch
I'd already switched our webservers off of the stricken DB server but everything came unstuck after that last error log entry.
What would be causing the dict_sys.latch issue? What can be done to fix it?

There seem to be at least 13 still open bugs (plus probably some more that have been merged for next release) that could be causing this: https://jira.mariadb.org/browse/MDEV-34988?jql=status%20%3D%20Open%20AND%20t...
-- Derick Turner - He/Him
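
Once the servers have been restarted with the new limits, a quick sketch of how to confirm what the running server actually got and how much of it is in use:

```
-- What the running server negotiated at startup (should reflect the raised LimitNOFILE):
SELECT @@open_files_limit;

-- How many file handles are currently open:
SHOW GLOBAL STATUS LIKE 'Open_files';
```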
participants (2):
- Derick Turner
- Gordan Bobic