[Maria-discuss] Slave corruption on myrocks update
Hi all, I am trying to upgrade MariaDB to a more recent version which includes https://jira.mariadb.org/browse/MDEV-19795 and thus a newer rocksdb version. We are running a mixed database of innodb and rocksdb tables. We are currently using 10.2.21-MariaDB-10.2.21+maria~stretch-log / mariadb.org binary distribution (on the master). The reported rocksdb version is 5.14.0, compile date Dec 30 2018. I have set up a new slave with the database hard-copied from the master's filesystem (with mariabackup) and MariaDB versions 10.4.7 / 10.3.17 and 10.2.26. As a first thing before replication starts I of course run mysql_upgrade, which goes through just fine. Once the replication catches up (row-based), it eventually hits some kind of wall and rocksdb detects a checksum mismatch during compaction, e.g.: 2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN] [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption: block checksum mismatch: expected 1973625368, got 2324967102 in /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907 MariaDB then crashes with the dreaded exit code 6 (see log below). The database is unusable afterwards and does not start up anymore. The same procedure for setting up a slave and replicating works fine when staying at 10.2.21. I suspect there might be an incompatibility between rocksdb 5.14.0 and 6.2.0 (version used in the newer MariaDBs). However, I have read through the changelog at https://github.com/facebook/rocksdb/releases and have found no clear candidate for that. Any help or ideas are greatly appreciated. I can also provide more logs on request. Thanks Jonas -------------- *RocksDB options from my.cnf:* plugin-load-add = ha_rocksdb.so default-storage-engine = rocksdb default-tmp-storage-engine = MyISAM transaction-isolation = READ-COMMITTED rocksdb_unsafe_for_binlog = 1 # enables statement based replication rocksdb_datadir = /data/mysql/rocksdb rocksdb_wal_dir = /var/mysql/rocksdb rocksdb_tmpdir = /var/mysql/rocksdb rocksdb_flush_log_at_trx_commit = 0 rocksdb_use_direct_io_for_flush_and_compaction = 0 rocksdb_use_direct_reads = 0 rocksdb_max_open_files = -1 rocksdb_max_background_jobs = 1 rocksdb_max_total_wal_size = 5G rocksdb_block_size = 64K rocksdb_block_cache_size = 16G rocksdb_table_cache_numshardbits = 6 rocksdb_new_table_reader_for_compaction_inputs = 1 rocksdb_compaction_readahead_size = 4M # rate limiter rocksdb_bytes_per_sync = 4M rocksdb_wal_bytes_per_sync = 4M rocksdb_rate_limiter_bytes_per_sec = 80M # MB/s. Increase if you're running on higher spec machines # triggering compaction if there are many sequential deletes rocksdb_compaction_sequential_deletes_count_sd = 1 rocksdb_compaction_sequential_deletes = 199999 rocksdb_compaction_sequential_deletes_window = 200000 rocksdb_default_cf_options=write_buffer_size=128m;max_write_buffer_number=4;target_file_size_base=256m;max_bytes_for_level_base=2560m;target_file_size_multiplier=2;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;compression=kLZ4Compression;bottommost_compression=kZlibCompression;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio --------------------------- *MariaDB error log:* 190919 16:15:31 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.2.27-MariaDB-1:10.2.27+maria~bionic-log key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=20 max_threads=3002 thread_count=37 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6727453 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x30000 *** buffer overflow detected ***: /usr/sbin/mysqld terminated terminate called after throwing an instance of 'std::system_error' what(): Resource deadlock avoided 2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: There was a corruption detected in RockDB files. Check error log emitted earlier for more details. 2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: The server will exit normally and stop restart attempts. Remove /data/mysql/rocksdb/ROCKSDB_CORRUPTED file from data directory and start mysqld manually. -------------------- *RocksDB error log (last 10 lines):* 2019/09/19-16:15:22.696889 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902522696836, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 181274, "file_size": 180342221, "table_properties": {"data_size": 170893376, "index_size": 333943, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 9317061, "raw_key_size": 388460606, "raw_average_key_size": 52, "raw_value_size": 191015485, "raw_average_value_size": 25, "num_data_blocks": 5571, "num_entries": 7455469, "num_deletions": 585078, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4", "compression_options": "window_bits=-14; level=6; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902510, "__indexstats__": "[...37 records...]"}} 2019/09/19-16:15:25.772863 7f7ae33ff700 [compaction/compaction_job.cc:1332] [default] [JOB 41] Generated table #181275: 770219 keys, 129144877 bytes 2019/09/19-16:15:25.772927 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902525772887, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 181275, "file_size": 129144877, "table_properties": {"data_size": 128134168, "index_size": 86084, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 962885, "raw_key_size": 15404380, "raw_average_key_size": 20, "raw_value_size": 167199905, "raw_average_value_size": 217, "num_data_blocks": 2752, "num_entries": 770219, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4", "compression_options": "window_bits=-14; level=6; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902522, "__indexstats__": "[...1 records...]"}} 2019/09/19-16:15:25.883030 7f7ae33ff700 [compaction/compaction_job.cc:1395] [default] [JOB 41] Compacted 1@3 + 7@4 files to L4 => 1889443497 bytes 2019/09/19-16:15:26.573155 7f7ae33ff700 (Original Log Time 2019/09/19-16:15:26.571946) [compaction/compaction_job.cc:763] [default] compacted to: base level 3 level multiplier 10.00 max bytes base 2684354560 files[0 0 0 9 23 139 1185] max score 1.00 (2 files need compaction), MB/sec: 19.1 rd, 18.6 wr, level 4, files in(1, 7) out(9) MB in(278.5, 1579.4) out(1801.9), read-write-amplify(13.1) write-amplify(6.5) OK, records in: 67833049, records dropped: 256224 output_compression: LZ4 2019/09/19-16:15:26.573173 7f7ae33ff700 (Original Log Time 2019/09/19-16:15:26.572001) EVENT_LOG_v1 {"time_micros": 1568902526571979, "job": 41, "event": "compaction_finished", "compaction_time_micros": 101734971, "compaction_time_cpu_micros": 48728676, "output_level": 4, "num_output_files": 9, "total_output_size": 1889443497, "num_input_records": 67741208, "num_output_records": 67484984, "num_subcompactions": 1, "output_compression": "LZ4", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 9, 23, 139, 1185]} 2019/09/19-16:15:26.573342 7f7ae33ff700 [compaction/compaction_job.cc:1645] [default] [JOB 42] Compacting 1@4 + 15@5 files to L5, score 0.00 2019/09/19-16:15:26.573366 7f7ae33ff700 [compaction/compaction_job.cc:1649] [default] Compaction start summary: Base version 44 Base level 4, inputs: [181272(167MB)], [180583(535MB) 180788(537MB) 180963(12MB) 180964(513MB) 181094(21MB) 181095(567MB) 181096(556MB) 181097(353MB) 181098(367MB) 181099(465MB) 181101(559MB) 181102(390MB) 181103(579MB) 181104(353MB) 181105(576MB)] 2019/09/19-16:15:26.573462 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902526573441, "job": 42, "event": "compaction_started", "compaction_reason": "ManualCompaction", "files_L4": [181272], "files_L5": [180583, 180788, 180963, 180964, 181094, 181095, 181096, 181097, 181098, 181099, 181101, 181102, 181103, 181104, 181105], "score": 0, "input_data_size": 6878343156} 2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN] [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption: block checksum mismatch: expected 1973625368, got 2324967102 in /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907
Update: I was looking through the open tickets on JIRA, this could be a match: https://jira.mariadb.org/browse/MDEV-20456 Am Do., 19. Sept. 2019 um 17:07 Uhr schrieb Jonas Krauss < jkrauss82@gmail.com>:
Hi all,
I am trying to upgrade MariaDB to a more recent version which includes https://jira.mariadb.org/browse/MDEV-19795 and thus a newer rocksdb version. We are running a mixed database of innodb and rocksdb tables.
We are currently using 10.2.21-MariaDB-10.2.21+maria~stretch-log / mariadb.org binary distribution (on the master). The reported rocksdb version is 5.14.0, compile date Dec 30 2018.
I have set up a new slave with the database hard-copied from the master's filesystem (with mariabackup) and MariaDB versions 10.4.7 / 10.3.17 and 10.2.26. As a first thing before replication starts I of course run mysql_upgrade, which goes through just fine. Once the replication catches up (row-based), it eventually hits some kind of wall and rocksdb detects a checksum mismatch during compaction, e.g.:
2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN] [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption: block checksum mismatch: expected 1973625368, got 2324967102 in /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907
MariaDB then crashes with the dreaded exit code 6 (see log below). The database is unusable afterwards and does not start up anymore.
The same procedure for setting up a slave and replicating works fine when staying at 10.2.21.
I suspect there might be an incompatibility between rocksdb 5.14.0 and 6.2.0 (version used in the newer MariaDBs). However, I have read through the changelog at https://github.com/facebook/rocksdb/releases and have found no clear candidate for that.
Any help or ideas are greatly appreciated. I can also provide more logs on request.
Thanks
Jonas
-------------- *RocksDB options from my.cnf:*
plugin-load-add = ha_rocksdb.so default-storage-engine = rocksdb default-tmp-storage-engine = MyISAM transaction-isolation = READ-COMMITTED rocksdb_unsafe_for_binlog = 1 # enables statement based replication rocksdb_datadir = /data/mysql/rocksdb rocksdb_wal_dir = /var/mysql/rocksdb rocksdb_tmpdir = /var/mysql/rocksdb
rocksdb_flush_log_at_trx_commit = 0 rocksdb_use_direct_io_for_flush_and_compaction = 0 rocksdb_use_direct_reads = 0
rocksdb_max_open_files = -1 rocksdb_max_background_jobs = 1 rocksdb_max_total_wal_size = 5G rocksdb_block_size = 64K rocksdb_block_cache_size = 16G rocksdb_table_cache_numshardbits = 6 rocksdb_new_table_reader_for_compaction_inputs = 1 rocksdb_compaction_readahead_size = 4M
# rate limiter rocksdb_bytes_per_sync = 4M rocksdb_wal_bytes_per_sync = 4M rocksdb_rate_limiter_bytes_per_sec = 80M # MB/s. Increase if you're running on higher spec machines
# triggering compaction if there are many sequential deletes rocksdb_compaction_sequential_deletes_count_sd = 1 rocksdb_compaction_sequential_deletes = 199999 rocksdb_compaction_sequential_deletes_window = 200000
rocksdb_default_cf_options=write_buffer_size=128m;max_write_buffer_number=4;target_file_size_base=256m;max_bytes_for_level_base=2560m;target_file_size_multiplier=2;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;compression=kLZ4Compression;bottommost_compression=kZlibCompression;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio
--------------------------- *MariaDB error log:*
190919 16:15:31 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
Server version: 10.2.27-MariaDB-1:10.2.27+maria~bionic-log key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=20 max_threads=3002 thread_count=37 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6727453 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x30000 *** buffer overflow detected ***: /usr/sbin/mysqld terminated terminate called after throwing an instance of 'std::system_error' what(): Resource deadlock avoided 2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: There was a corruption detected in RockDB files. Check error log emitted earlier for more details. 2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: The server will exit normally and stop restart attempts. Remove /data/mysql/rocksdb/ROCKSDB_CORRUPTED file from data directory and start mysqld manually.
-------------------- *RocksDB error log (last 10 lines):*
2019/09/19-16:15:22.696889 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902522696836, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 181274, "file_size": 180342221, "table_properties": {"data_size": 170893376, "index_size": 333943, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 9317061, "raw_key_size": 388460606, "raw_average_key_size": 52, "raw_value_size": 191015485, "raw_average_value_size": 25, "num_data_blocks": 5571, "num_entries": 7455469, "num_deletions": 585078, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4", "compression_options": "window_bits=-14; level=6; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902510, "__indexstats__": "[...37 records...]"}} 2019/09/19-16:15:25.772863 7f7ae33ff700 [compaction/compaction_job.cc:1332] [default] [JOB 41] Generated table #181275: 770219 keys, 129144877 bytes 2019/09/19-16:15:25.772927 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902525772887, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 181275, "file_size": 129144877, "table_properties": {"data_size": 128134168, "index_size": 86084, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 962885, "raw_key_size": 15404380, "raw_average_key_size": 20, "raw_value_size": 167199905, "raw_average_value_size": 217, "num_data_blocks": 2752, "num_entries": 770219, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4", "compression_options": "window_bits=-14; level=6; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902522, "__indexstats__": "[...1 records...]"}} 2019/09/19-16:15:25.883030 7f7ae33ff700 [compaction/compaction_job.cc:1395] [default] [JOB 41] Compacted 1@3 + 7@4 files to L4 => 1889443497 bytes 2019/09/19-16:15:26.573155 7f7ae33ff700 (Original Log Time 2019/09/19-16:15:26.571946) [compaction/compaction_job.cc:763] [default] compacted to: base level 3 level multiplier 10.00 max bytes base 2684354560 files[0 0 0 9 23 139 1185] max score 1.00 (2 files need compaction), MB/sec: 19.1 rd, 18.6 wr, level 4, files in(1, 7) out(9) MB in(278.5, 1579.4) out(1801.9), read-write-amplify(13.1) write-amplify(6.5) OK, records in: 67833049, records dropped: 256224 output_compression: LZ4 2019/09/19-16:15:26.573173 7f7ae33ff700 (Original Log Time 2019/09/19-16:15:26.572001) EVENT_LOG_v1 {"time_micros": 1568902526571979, "job": 41, "event": "compaction_finished", "compaction_time_micros": 101734971, "compaction_time_cpu_micros": 48728676, "output_level": 4, "num_output_files": 9, "total_output_size": 1889443497, "num_input_records": 67741208, "num_output_records": 67484984, "num_subcompactions": 1, "output_compression": "LZ4", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 9, 23, 139, 1185]} 2019/09/19-16:15:26.573342 7f7ae33ff700 [compaction/compaction_job.cc:1645] [default] [JOB 42] Compacting 1@4 + 15@5 files to L5, score 0.00 2019/09/19-16:15:26.573366 7f7ae33ff700 [compaction/compaction_job.cc:1649] [default] Compaction start summary: Base version 44 Base level 4, inputs: [181272(167MB)], [180583(535MB) 180788(537MB) 180963(12MB) 180964(513MB) 181094(21MB) 181095(567MB) 181096(556MB) 181097(353MB) 181098(367MB) 181099(465MB) 181101(559MB) 181102(390MB) 181103(579MB) 181104(353MB) 181105(576MB)] 2019/09/19-16:15:26.573462 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902526573441, "job": 42, "event": "compaction_started", "compaction_reason": "ManualCompaction", "files_L4": [181272], "files_L5": [180583, 180788, 180963, 180964, 181094, 181095, 181096, 181097, 181098, 181099, 181101, 181102, 181103, 181104, 181105], "score": 0, "input_data_size": 6878343156} 2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN] [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption: block checksum mismatch: expected 1973625368, got 2324967102 in /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907
participants (1)
-
Jonas Krauss