Hi all,

I am trying to upgrade MariaDB to a more recent version which includes https://jira.mariadb.org/browse/MDEV-19795 and thus a newer rocksdb version. We are running a mixed database of innodb and rocksdb tables.

We are currently using 10.2.21-MariaDB-10.2.21+maria~stretch-log / mariadb.org binary distribution (on the master). The reported rocksdb version is 5.14.0, compile date Dec 30 2018.

I have set up a new slave with the database hard-copied from the master's filesystem (with mariabackup) and MariaDB versions 10.4.7 / 10.3.17 and 10.2.26. As a first thing before replication starts I of course run mysql_upgrade, which goes through just fine. Once the replication catches up (row-based), it eventually hits some kind of wall and rocksdb detects a checksum mismatch during compaction, e.g.:

2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN] [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption: block checksum mismatch: expected 1973625368, got 2324967102  in /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907

MariaDB then crashes with the dreaded exit code 6 (see log below). The database is unusable afterwards and does not start up anymore.

The same procedure for setting up a slave and replicating works fine when staying at 10.2.21.

I suspect there might be an incompatibility between rocksdb 5.14.0 and 6.2.0 (version used in the newer MariaDBs). However, I have read through the changelog at https://github.com/facebook/rocksdb/releases and have found no clear candidate for that.

Any help or ideas are greatly appreciated. I can also provide more logs on request.

Thanks 

Jonas

--------------
RocksDB options from my.cnf:

plugin-load-add = ha_rocksdb.so
default-storage-engine = rocksdb
default-tmp-storage-engine = MyISAM
transaction-isolation = READ-COMMITTED
rocksdb_unsafe_for_binlog = 1 # enables statement based replication
rocksdb_datadir = /data/mysql/rocksdb
rocksdb_wal_dir = /var/mysql/rocksdb
rocksdb_tmpdir = /var/mysql/rocksdb

rocksdb_flush_log_at_trx_commit = 0
rocksdb_use_direct_io_for_flush_and_compaction = 0
rocksdb_use_direct_reads = 0

rocksdb_max_open_files = -1
rocksdb_max_background_jobs = 1
rocksdb_max_total_wal_size = 5G
rocksdb_block_size = 64K
rocksdb_block_cache_size = 16G
rocksdb_table_cache_numshardbits = 6
rocksdb_new_table_reader_for_compaction_inputs = 1
rocksdb_compaction_readahead_size = 4M

# rate limiter
rocksdb_bytes_per_sync = 4M
rocksdb_wal_bytes_per_sync = 4M
rocksdb_rate_limiter_bytes_per_sec = 80M # MB/s. Increase if you're running on higher spec machines

# triggering compaction if there are many sequential deletes
rocksdb_compaction_sequential_deletes_count_sd = 1
rocksdb_compaction_sequential_deletes = 199999
rocksdb_compaction_sequential_deletes_window = 200000

rocksdb_default_cf_options=write_buffer_size=128m;max_write_buffer_number=4;target_file_size_base=256m;max_bytes_for_level_base=2560m;target_file_size_multiplier=2;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;compression=kLZ4Compression;bottommost_compression=kZlibCompression;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio

---------------------------
MariaDB error log:

190919 16:15:31 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.2.27-MariaDB-1:10.2.27+maria~bionic-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=20
max_threads=3002
thread_count=37
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6727453 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
terminate called after throwing an instance of 'std::system_error'
  what():  Resource deadlock avoided
2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: There was a corruption detected in RockDB files. Check error log emitted earlier for more details.
2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: The server will exit normally and stop restart attempts. Remove /data/mysql/rocksdb/ROCKSDB_CORRUPTED file from data directory and start mysqld manually.


--------------------
RocksDB error log (last 10 lines):

2019/09/19-16:15:22.696889 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902522696836, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 181274, "file_size": 180342221, "table_properties": {"data_size": 170893376, "index_size": 333943, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 9317061, "raw_key_size": 388460606, "raw_average_key_size": 52, "raw_value_size": 191015485, "raw_average_value_size": 25, "num_data_blocks": 5571, "num_entries": 7455469, "num_deletions": 585078, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4", "compression_options": "window_bits=-14; level=6; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902510, "__indexstats__": "[...37  records...]"}}
2019/09/19-16:15:25.772863 7f7ae33ff700 [compaction/compaction_job.cc:1332] [default] [JOB 41] Generated table #181275: 770219 keys, 129144877 bytes
2019/09/19-16:15:25.772927 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902525772887, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 181275, "file_size": 129144877, "table_properties": {"data_size": 128134168, "index_size": 86084, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 962885, "raw_key_size": 15404380, "raw_average_key_size": 20, "raw_value_size": 167199905, "raw_average_value_size": 217, "num_data_blocks": 2752, "num_entries": 770219, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4", "compression_options": "window_bits=-14; level=6; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902522, "__indexstats__": "[...1  records...]"}}
2019/09/19-16:15:25.883030 7f7ae33ff700 [compaction/compaction_job.cc:1395] [default] [JOB 41] Compacted 1@3 + 7@4 files to L4 => 1889443497 bytes
2019/09/19-16:15:26.573155 7f7ae33ff700 (Original Log Time 2019/09/19-16:15:26.571946) [compaction/compaction_job.cc:763] [default] compacted to: base level 3 level multiplier 10.00 max bytes base 2684354560 files[0 0 0 9 23 139 1185] max score 1.00 (2 files need compaction), MB/sec: 19.1 rd, 18.6 wr, level 4, files in(1, 7) out(9) MB in(278.5, 1579.4) out(1801.9), read-write-amplify(13.1) write-amplify(6.5) OK, records in: 67833049, records dropped: 256224 output_compression: LZ4
2019/09/19-16:15:26.573173 7f7ae33ff700 (Original Log Time 2019/09/19-16:15:26.572001) EVENT_LOG_v1 {"time_micros": 1568902526571979, "job": 41, "event": "compaction_finished", "compaction_time_micros": 101734971, "compaction_time_cpu_micros": 48728676, "output_level": 4, "num_output_files": 9, "total_output_size": 1889443497, "num_input_records": 67741208, "num_output_records": 67484984, "num_subcompactions": 1, "output_compression": "LZ4", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 9, 23, 139, 1185]}
2019/09/19-16:15:26.573342 7f7ae33ff700 [compaction/compaction_job.cc:1645] [default] [JOB 42] Compacting 1@4 + 15@5 files to L5, score 0.00
2019/09/19-16:15:26.573366 7f7ae33ff700 [compaction/compaction_job.cc:1649] [default] Compaction start summary: Base version 44 Base level 4, inputs: [181272(167MB)], [180583(535MB) 180788(537MB) 180963(12MB) 180964(513MB) 181094(21MB) 181095(567MB) 181096(556MB) 181097(353MB) 181098(367MB) 181099(465MB) 181101(559MB) 181102(390MB) 181103(579MB) 181104(353MB) 181105(576MB)]
2019/09/19-16:15:26.573462 7f7ae33ff700 EVENT_LOG_v1 {"time_micros": 1568902526573441, "job": 42, "event": "compaction_started", "compaction_reason": "ManualCompaction", "files_L4": [181272], "files_L5": [180583, 180788, 180963, 180964, 181094, 181095, 181096, 181097, 181098, 181099, 181101, 181102, 181103, 181104, 181105], "score": 0, "input_data_size": 6878343156}
2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN] [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption: block checksum mismatch: expected 1973625368, got 2324967102  in /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907