Re: [Maria-discuss] Semaphore hangs
On 09/12/16 09:43, Jon Foster wrote:
On 12/07/2016 06:04 PM, Daniel Black wrote:
On 08/12/16 08:51, Jon Foster wrote:
We are having trouble with MariaDB hanging due to a "semaphore wait". We then have to shut MariaDB down, as it typically won't recover unless it restarts itself, which happens if we wait long enough. But if it's gone on long enough MariaDB won't even shut down; it hangs indefinitely waiting for some other internal service. I don't remember the exact name, and we've been fast enough lately that I haven't seen it in a while.
We've had the database on two completely different servers and still see the problem. Both servers were bought new for this project and are a year or less old. They are running all SSD drives, Debian 7 64bit with MariaDB 10.1 from the MariaDB APT repository.
Since the XtraDB engine was usually mentioned in the logged messages we switched back to the Oracle InnoDB engine. Although this seems to have reduced the frequency it didn't fix it.
Can anyone give some advice on fixing this? It really seems like a bug in MariaDB. I'll try to provide any needed info.
It's either a bug or you are running without sufficient hardware. If you rule out the latter, I suggest opening a report at jira.mariadb.org
TIA - Jon
Needed info:
* exact MariaDB version 10.1.X
* mariadb configuration
* the entire mysql log from startup until the "semaphore wait"
* how you are determining "semaphore wait"
* is a SHOW FULL PROCESSLIST possible? If so, its contents
* when in a stalled state, run on a command line (with gdb installed):
  gdb --batch --eval-command="thread apply all bt" program $(pidof mysqld)
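For the record, one way to capture that gdb output into a file you can attach to a report might look like the following; the binary path /usr/sbin/mysqld is my assumption about a typical Debian package layout, so substitute whatever "program" actually is on your host:

  # Hedged sketch: dump all thread backtraces from the running mysqld to a file.
  # /usr/sbin/mysqld is assumed; adjust if your mysqld binary lives elsewhere.
  gdb --batch --eval-command="thread apply all bt" \
      /usr/sbin/mysqld $(pidof mysqld) > /tmp/mysqld-backtrace.txt 2>&1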
Version: 10.1.19+maria-1~wheezy, but we started having problems with 5.5.x, which after a few updates prompted us to try 10.1.
Configuration:
-----------------------
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = -10
Nicing below 0 isn't that productive. -10 might be interfering with kernel IO processes that are needed to enable userspace threads to resume. No other high-IO/CPU/RAM workload should be running on the same host; however, if one is, -1 or -2 might help achieve a higher priority than the other process.
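As a quick sanity check (my suggestion, not something from the thread), you can confirm what nice value mysqld is actually running at:

  # Show PID, nice value and command name of the running mysqld
  ps -o pid,ni,comm -p $(pidof mysqld)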
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /ssd/mysql
tmpdir = /ssd/tmp
lc_messages_dir = /usr/share/mysql
lc_messages = en_US
skip-external-locking
# NOTE: these two were recently added to troubleshoot
ignore_builtin_innodb
plugin_load=innodb=ha_innodb.so
bind-address = 127.0.0.1
skip-name-resolve
max_connections = 8000
At the time of the crash you seem to have about 4k threads:

zgrep 12:35:18 /tmp/mozilla_dan0/mysql-crash-20161207.log.gz | grep ' --Thread' | cut -f 8 -d ' ' | sort -u | wc -l
4158

I suspect you don't have enough grunt in hardware to support this. I do suspect that this level is significantly above standard load. I'd restrict max_connections to something you have seen your hardware handle.
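If it helps, a generic way to watch how many threads the server is actually carrying at a given moment (assuming the mysql client can still connect with your usual credentials) is:

  # Threads_connected / Threads_running and the high-water mark since startup
  mysql -e "SHOW GLOBAL STATUS LIKE 'Threads%'"
  mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"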
connect_timeout = 5
wait_timeout = 60
max_allowed_packet = 1M
thread_cache_size = 100
sort_buffer_size = 1M
bulk_insert_buffer_size = 16M
tmp_table_size = 32M
max_heap_table_size = 10M
join_buffer_size = 1M
myisam_recover = BACKUP
key_buffer_size = 128M
open-files-limit = 65535
table_open_cache = 15000
myisam_sort_buffer_size = 128M
concurrent_insert = 2
read_buffer_size = 256K
read_rnd_buffer_size = 256K
query_cache_limit = 1M
query_cache_size = 16M
slow_query_log = 0
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 3
log_slow_verbosity = query_plan
binlog_format = ROW
expire_logs_days = 10
max_binlog_size = 100M
default_storage_engine = InnoDB
innodb_buffer_pool_size = 2600M
innodb_log_buffer_size = 256M
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 400
innodb_adaptive_hash_index = 0
# These two were added just prior to this last crash.
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_load_at_startup = 1
innodb_flush_log_at_trx_commit = 0
Setting ^ to 0 will mean less IO; however, it means you can lose the most recently committed transactions when the server crashes.
innodb_support_xa = 0
sync_binlog = 0
-----------------------
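For comparison, if durability mattered more to you than write throughput, the fully durable variant of those last few settings would look roughly like this (a sketch only, not a recommendation for this particular workload):

  # Flush the redo log and sync the binlog on every commit
  innodb_flush_log_at_trx_commit = 1
  sync_binlog = 1
  innodb_support_xa = 1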
Logs: This most recent case is about 19M uncompressed. I have hosted it on my website here: http://www.jfpossibilities.com/static/pickup/mysql-crash-20161207.log.gz If you would like it delivered some other way let me know. I have about 300M of logs from a previous crash situation. That time was particularly bad since for some reason it would hang again within ten minutes of being restarted. By limiting the connections on our webservers I was able to baby it to get it going again.
I determined "semaphore wait" from the zillions of messages in the logs that say it. We shut this one down pretty quickly. But I usually recognize the symptoms fairly early on: the number of DB connections escalates from ~1k up to the max of 8k. The number of running queries shown by "show processlist" escalates along with it, going from fewer than 10 to hundreds or thousands, depending on how long we wait.
Sounds like a max_connections of 100 might be more suitable; that matches your thread_cache_size.
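If you want to experiment without a restart, max_connections can be lowered on the fly; the value of 100 here is just an example, pick something you've actually seen the hardware cope with:

  # Takes effect immediately for new connections; make it permanent in my.cnf
  mysql -e "SET GLOBAL max_connections = 100"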
Then too the Apache connections on the web servers escalate. And of course the "load factor" on all the servers jumps up. Icinga usually gives us the heads up.
Is Apache on the same machine? If so, make sure Apache is configured so that it doesn't consume so much memory that it leaves MySQL with no headroom; of course it's better to have it on a different host. Make sure Icinga doesn't show swapping. Check your read and write latency on the SSDs.
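A couple of generic checks for the latency and swapping questions (assuming the sysstat package is installed for iostat):

  # Per-device IO latency (the await columns), refreshed every second
  iostat -x 1
  # Swap activity shows up in the si/so columns
  vmstat 1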
Obviously your last two requests will have to wait until we are having a problem again. But I have gdb installed and ready to go... I think. I don't know much about it.
SHOW ENGINE INNODB STATUS would be good too. Experiment with gdb on a test machine in the same way, even if it's against a different process.
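For reference, a simple way to capture that status output to a file while the server is stalled (assuming the mysql client can still connect) is:

  # \G gives vertical output, which is easier to read and attach to a report
  mysql -e "SHOW ENGINE INNODB STATUS\G" > /tmp/innodb-status.txt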