[Maria-discuss] MyRocks startup error in MariaDB 10.2.18
Hi all, we had a database setup running in production for quite some time (> 5 months) and it crashed during last weekend without obvious reason. MariaDB error log stops with (seemingly most useful output set to bold): 2019-01-14 9:07:47 139766003611904 [Note] RocksDB: 2 column families found 2019-01-14 9:07:47 139766003611904 [Note] RocksDB: Column Families at start: 2019-01-14 9:07:47 139766003611904 [Note] cf=default 2019-01-14 9:07:47 139766003611904 [Note] write_buffer_size=134217728 2019-01-14 9:07:47 139766003611904 [Note] target_file_size_base=268435456 2019-01-14 9:07:47 139766003611904 [Note] cf=__system__ 2019-01-14 9:07:47 139766003611904 [Note] write_buffer_size=134217728 2019-01-14 9:07:47 139766003611904 [Note] target_file_size_base=268435456 mysqld: /home/buildbot/buildbot/build/mariadb-10.2.18/storage/rocksdb/rocksdb/db/column_family.cc:1238: *rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref' failed.* 190114 9:07:58 [ERROR] mysqld got signal 6 ; ... The rest is a memory dump seemingly not too helpful(?). We tried the various recovery modes of rocksdb_wal_recovery_mode (0,2,3) but none seems to have any effect, the output in the log is exactly the same. *Version and OS infos*: MariaDB version: 10.2.18-MariaDB-10.2.18+maria~stretch-log Running on Debian Linux version 4.9.0-8-amd64 ( debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08) We use *jemalloc* as memory allocator. 64GB RAM machine with RAID1 SSD 256GB for logs and 10TB HDD for tables Here is the *rocksdb config part of the my.cnf*: plugin-load-add = ha_rocksdb.so default-storage-engine = rocksdb rocksdb_unsafe_for_binlog = 1 # enables statement based replication rocksdb_datadir = /var/local/mysql/rocksdb rocksdb_wal_dir = /var/mysql_logs/rocksdb rocksdb_tmpdir = /var/mysql_logs/rocksdb rocksdb_flush_log_at_trx_commit = 0 rocksdb_use_direct_io_for_flush_and_compaction = 0 rocksdb_use_direct_reads = 0 rocksdb_max_open_files = -1 rocksdb_max_background_jobs = 8 rocksdb_max_total_wal_size = 4G rocksdb_block_size = 64K rocksdb_block_cache_size = 16G rocksdb_table_cache_numshardbits = 6 rocksdb_new_table_reader_for_compaction_inputs = 1 rocksdb_compaction_readahead_size = 4M #rocksdb_db_write_buffer_size = 0 # max write buffer across all column families, zero = disabled #rocksdb_wal_recovery_mode = 1 rocksdb_bytes_per_sync = 4M rocksdb_wal_bytes_per_sync = 4M rocksdb_rate_limiter_bytes_per_sec = 80M # MB/s. Increase if you're running on higher spec machines rocksdb_compaction_sequential_deletes_count_sd = 1 rocksdb_compaction_sequential_deletes = 199999 rocksdb_compaction_sequential_deletes_window = 200000 #rocksdb_rpl_lookup_rows=0 rocksdb_default_cf_options=write_buffer_size=128m;max_write_buffer_number=4;target_file_size_base=256m;max_bytes_for_level_base=2560m;target_file_size_multiplier=2;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;compression=kSnappyCompression;bottommost_compression=kZlibCompression;compression_opts=-14:1:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio Does anyone have a idea what could cause this crash? Google did not return helpful results, most similar thing anyone had is this in my search: https://github.com/facebook/mysql-5.6/issues/391 Any help is greatly appreciated. Best regards Jonas
I did some more research yesterday and found out a little more about what is going on: checking the rocksdb log it seems that the recovery phase is successfully completed, yet the rocksdb engine receives a shutdown signal from the MariaDB process. The log does not show any error message just before it goes into shutdown, so my best guess is that the shutdown signal is not internally produced but comes from MariaDB). Here is the regarding log: ... Recovering log #70212 mode 1 2019/01/14-17:57:35.828286 7fd584e99900 [/home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/db_impl_open.cc:552] Recovering log #70215 mode 1 2019/01/14-17:57:35.828310 7fd584e99900 [/home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/db_impl_open.cc:552] Recovering log #70218 mode 1 2019/01/14-17:57:35.828642 7fd584e99900 [/home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/version_set.cc:2854] Creating manifest 70220 2019/01/14-17:57:35.937947 7fd584e99900 [WARN] [/home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/version_set.cc:2496] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed. 2019/01/14-17:57:35.979604 7fd584e99900 EVENT_LOG_v1 {"time_micros": 1547485055979597, "job": 1, "event": "recovery_finished"} 2019/01/14-17:57:35.992961 7fd584e99900 [/home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/db_impl_files.cc:477] [JOB 2] Delete info log file /var/local/mysql/rocksdb//LOG.old.1547441122341707 2019/01/14-17:57:36.059847 7fd584e99900 [/home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/db_impl_open.cc:1221] DB pointer 0x7fd4fea2ae00 2019/01/14-17:57:38.210653 7fd584e99900 [/home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/db_impl.cc:295] Shutdown: canceling all background work Another observation is that rocksdb created ~6600 wal files in the regarding folder, yet most of them are just empty files. Only about 50 files are filled with content up to the point in time the crash occurred. So maybe something is wrong here. We have ~2500 files in the rocksdb datadir, a couply of them are also log files which correspond to files created in the wal dir. The MariaDB process exits with a buffer overflow message. I omitted that in my initial post, however it is probably important: mysqld: /home/buildbot/buildbot/build/mariadb-10.2.21/storage/rocksdb/rocksdb/db/column_family.cc:1238: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref' failed. 190115 10:07:32 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.2.21-MariaDB-10.2.21+maria~stretch-log key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=3002 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6727382 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x30000 *** buffer overflow detected ***: /usr/sbin/mysqld terminated ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f626a34ebfb] /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f626a3d71f7] /lib/x86_64-linux-gnu/libc.so.6(+0xf7330)[0x7f626a3d5330] /lib/x86_64-linux-gnu/libc.so.6(+0xf916a)[0x7f626a3d716a] /usr/sbin/mysqld(my_addr_resolve+0xe2)[0x55c80b8c0e82] /usr/sbin/mysqld(my_print_stacktrace+0x1bb)[0x55c80b8a773b] /usr/sbin/mysqld(handle_fatal_signal+0x41d)[0x55c80b346c7d] /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f626be7c0c0] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7f626a310fff] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f626a31242a] /lib/x86_64-linux-gnu/libc.so.6(+0x2be67)[0x7f626a309e67] /lib/x86_64-linux-gnu/libc.so.6(+0x2bf12)[0x7f626a309f12] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb15ColumnFamilySetD1Ev+0x1de)[0x7f61de475b0e] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb10VersionSetD1Ev+0x4d)[0x7f61de3205cd] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb6DBImpl11CloseHelperEv+0x647)[0x7f61de260627] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb6DBImplD1Ev+0x69d)[0x7f61de26fa1d] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb6DBImplD0Ev+0x11)[0x7f61de26fa81] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb24PessimisticTransactionDBD2Ev+0x32c)[0x7f61de42e06c] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb19WriteCommittedTxnDBD0Ev+0x1f)[0x7f61de4346df] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb13TransactionDB6WrapDBEPNS_2DBERKNS_20TransactionDBOptionsERKSt6vectorImSaImEERKS6_IPNS_18ColumnFamilyHandleESaISC_EEPPS0_+0x12a)[0x7f61de430e3a] /usr/lib/mysql/plugin/ha_rocksdb.so(_ZN7rocksdb13TransactionDB4OpenERKNS_9DBOptionsERKNS_20TransactionDBOptionsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_22ColumnFamilyDescriptorESaISG_EEPSF_IPNS_18ColumnFamilyHandleESaISM_EEPPS0_+0x1341)[0x7f61de432251] /usr/lib/mysql/plugin/ha_rocksdb.so(+0x20e556)[0x7f61de201556] /usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x62)[0x55c80b3490a2] /usr/sbin/mysqld(+0x4cdf75)[0x55c80b1bbf75] /usr/sbin/mysqld(_Z11plugin_initPiPPci+0x7da)[0x55c80b1bd62a] /usr/sbin/mysqld(+0x427e7c)[0x55c80b115e7c] /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1d48)[0x55c80b11add8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f626a2fe2e1] /usr/sbin/mysqld(_start+0x2a)[0x55c80b10e40a] ======= Memory map: ======== 55c80acee000-55c80bdcb000 r-xp 00000000 09:02 8528713 /usr/sbin/mysqld 55c80bfca000-55c80c09a000 r--p 010dc000 09:02 8528713 /usr/sbin/mysqld 55c80c09a000-55c80c151000 rw-p 011ac000 09:02 8528713 /usr/sbin/mysqld ... I have the feeling the high number of files and the assertion failure might be connected, but it is just a guess. I could imagine that it is actually just a configuration problem which results in too many files being opened simultaneously leading to the assertion failure, just guessing wildly here. I increased hard and soft limits for open files to 20000 for the mysql user, however this did not solve the problem. I also tried setting a limit to rocksdb max open files, this changed the error message in the MariaDB log, however it still failed with the same assertion failure but not with a buffer overflow anymore. I upgraded MariaDB to 10.2.21 but this did not solve the problem either. I am happy to provide more details on request or to try out other things. Any support or ideas are greatly appreciated. Am Mo., 14. Jan. 2019 um 09:24 Uhr schrieb Jonas Krauss <jkrauss82@gmail.com
:
Hi all,
we had a database setup running in production for quite some time (> 5 months) and it crashed during last weekend without obvious reason. MariaDB error log stops with (seemingly most useful output set to bold):
2019-01-14 9:07:47 139766003611904 [Note] RocksDB: 2 column families found 2019-01-14 9:07:47 139766003611904 [Note] RocksDB: Column Families at start: 2019-01-14 9:07:47 139766003611904 [Note] cf=default 2019-01-14 9:07:47 139766003611904 [Note] write_buffer_size=134217728 2019-01-14 9:07:47 139766003611904 [Note] target_file_size_base=268435456 2019-01-14 9:07:47 139766003611904 [Note] cf=__system__ 2019-01-14 9:07:47 139766003611904 [Note] write_buffer_size=134217728 2019-01-14 9:07:47 139766003611904 [Note] target_file_size_base=268435456 mysqld: /home/buildbot/buildbot/build/mariadb-10.2.18/storage/rocksdb/rocksdb/db/column_family.cc:1238: *rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref' failed.* 190114 9:07:58 [ERROR] mysqld got signal 6 ; ...
The rest is a memory dump seemingly not too helpful(?). We tried the various recovery modes of rocksdb_wal_recovery_mode (0,2,3) but none seems to have any effect, the output in the log is exactly the same.
*Version and OS infos*: MariaDB version: 10.2.18-MariaDB-10.2.18+maria~stretch-log Running on Debian Linux version 4.9.0-8-amd64 ( debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08) We use *jemalloc* as memory allocator. 64GB RAM machine with RAID1 SSD 256GB for logs and 10TB HDD for tables
Here is the *rocksdb config part of the my.cnf*: plugin-load-add = ha_rocksdb.so default-storage-engine = rocksdb rocksdb_unsafe_for_binlog = 1 # enables statement based replication rocksdb_datadir = /var/local/mysql/rocksdb rocksdb_wal_dir = /var/mysql_logs/rocksdb rocksdb_tmpdir = /var/mysql_logs/rocksdb rocksdb_flush_log_at_trx_commit = 0 rocksdb_use_direct_io_for_flush_and_compaction = 0 rocksdb_use_direct_reads = 0 rocksdb_max_open_files = -1 rocksdb_max_background_jobs = 8 rocksdb_max_total_wal_size = 4G rocksdb_block_size = 64K rocksdb_block_cache_size = 16G rocksdb_table_cache_numshardbits = 6 rocksdb_new_table_reader_for_compaction_inputs = 1 rocksdb_compaction_readahead_size = 4M #rocksdb_db_write_buffer_size = 0 # max write buffer across all column families, zero = disabled #rocksdb_wal_recovery_mode = 1 rocksdb_bytes_per_sync = 4M rocksdb_wal_bytes_per_sync = 4M rocksdb_rate_limiter_bytes_per_sec = 80M # MB/s. Increase if you're running on higher spec machines rocksdb_compaction_sequential_deletes_count_sd = 1 rocksdb_compaction_sequential_deletes = 199999 rocksdb_compaction_sequential_deletes_window = 200000 #rocksdb_rpl_lookup_rows=0
rocksdb_default_cf_options=write_buffer_size=128m;max_write_buffer_number=4;target_file_size_base=256m;max_bytes_for_level_base=2560m;target_file_size_multiplier=2;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;compression=kSnappyCompression;bottommost_compression=kZlibCompression;compression_opts=-14:1:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio
Does anyone have a idea what could cause this crash? Google did not return helpful results, most similar thing anyone had is this in my search: https://github.com/facebook/mysql-5.6/issues/391
Any help is greatly appreciated.
Best regards
Jonas
participants (1)
-
Jonas Krauss