We're running MariaDB 10.1.16 on Debian 7 (wheezy). We were previously running v5.5.51, but we started having occasional unrecoverable semaphore waits once or twice a month. Each time we'd have to shut down all Apache processes running PHP code, then shut down and restart MariaDB, and finally restart the Apache servers.

That got us by until one day the same thing happened every 5 minutes after we started everything back up. Almost exactly 5 minutes after the Apache services started we'd see the first symptoms (connection counts surging, CPU hitting the roof), and within two minutes after that the log would start filling with semaphore wait messages.

So I performed an emergency upgrade to 10.1. It's been about a month and I thought this had solved the problem until last Monday, when it happened again. The usual restart process fixed it and I haven't seen any more of those messages logged since.

There is a ton of information I could give but I'm not sure what will be helpful. So I'll start with the basics:

Hardware/OS:
12-core Intel Xeon @ 1.8GHz
64GB of RAM
6x Samsung 850 SSDs in software RAID6 array
Debian 7 64-bit Linux

General MariaDB info:
v10.1.16 installed from MariaDB repos.
12,000 connection limit
<1,000 typically used.
Using XtraDB on all DBs other than "mysql".
InnoDB buffer pool size 15GB
Total DB file size ~3GB
Adaptive hash index turned off
Individual InnoDB files (file per table)

Workload has heavy writes, lots of subqueries and temp tables.

I counted up the various wait messages and grouped them by source file and line # and came up with this:
lock0lock.cc:05075:     16
lock0lock.cc:06671:     2
lock0lock.cc:06822:     18602
lock0lock.cc:07078:     2
lock0lock.cc:07159:     2492
lock0lock.cc:07631:     16
lock0lock.cc:07721:     6
lock0wait.cc:00079:     34301
lock0wait.cc:00097:     9
lock0wait.cc:00247:     22996
lock0wait.cc:00291:     2
lock0wait.cc:00358:     22
lock0wait.cc:00485:     11
lock0wait.cc:00543:     14
row0ins.cc:01846:       60
row0mysql.cc:01772:     172
row0undo.cc:00298:      619
srv0srv.cc:02874:       349
srv0srv.cc:03573:       2
trx0rec.cc:01458:       97

This covers the period from when I noticed a problem up to the time I shut down MariaDB. I'm sure there are plenty of dupes in there, but I thought it might give someone an idea of where to look next.
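
In case anyone wants to produce the same kind of tally from their own error log, here is a minimal Python sketch. It assumes the wait messages have the usual InnoDB form, e.g. "--Thread <id> has waited at lock0lock.cc line 6822 for 241.00 seconds the semaphore:". The function names are made up for illustration, and this is not necessarily how the counts above were generated.

```python
import re
from collections import Counter

# Assumed message shape (standard InnoDB semaphore wait wording):
#   --Thread <id> has waited at lock0lock.cc line 6822 for 241.00 seconds the semaphore:
WAIT_RE = re.compile(r"has waited at (\S+) line (\d+)")


def tally_waits(lines):
    """Count semaphore wait messages per (source file, line number)."""
    counts = Counter()
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def format_tally(counts):
    """Render rows as 'file:line: count', sorted by file name, then line."""
    return [
        f"{name}:{lineno:05d}: {n}"
        for (name, lineno), n in sorted(counts.items())
    ]
```

To use it, pass any iterable of log lines (an open file works) to tally_waits() and print the rows from format_tally(). The same wait is reported repeatedly while it persists, so the counts show which code locations dominate the log rather than the number of distinct stalls.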

I'm not sure where to start looking so I was hoping to get pointed in the right direction. I have config files, log output and probably anything else someone might want.

Thanks in advance for any help!

THX - Jon

-- 

Jon Foster
JF Possibilities, Inc.
jon@jfpossibilities.com
541-410-2760
Making computers work for you!