Jan Lindström <jplindst@mariadb.org> writes:
Correct log file attached.
==00:00:02:33.708 25289== Use of uninitialised value of size 8 ==00:00:02:33.708 25289== at 0x5EF865B: _itoa_word (_itoa.c:179) ==00:00:02:33.708 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) ==00:00:02:33.708 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) ==00:00:02:33.708 25289== by 0x5F03141: snprintf (snprintf.c:34) ==00:00:02:33.708 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.708 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.708 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) ==00:00:02:33.709 25289== ==00:00:02:33.709 25289== Conditional jump or move depends on uninitialised value(s) ==00:00:02:33.709 25289== at 0x5EF8665: _itoa_word (_itoa.c:179) ==00:00:02:33.709 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) ==00:00:02:33.709 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) ==00:00:02:33.709 25289== by 0x5F03141: snprintf (snprintf.c:34) ==00:00:02:33.709 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.709 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.709 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) ==00:00:02:33.709 25289== 130920 13:53:04 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 892097568 current_pos 892132560 request_pos 892132664 write_pos 303074896
So this is the explanation for the strange numbers in your printouts, like 892097568. Those values are uninitialised memory (probably from uninitialised stack variables?). Since the assertion you get is exactly about the values of those IO_CACHE values being wrong, this looks like something to investigate. I am however curious about why there is only a single Valgrind error about this? As there are many of those unexpected numbers in your output. However, most of the Valgrind errors seem to be in Galera code. It is hard to say for me if they are related. Seppo, can you look into that from your side? Normally, we strive _very_ hard to make MariaDB code clean in Valgrind...
It did produce the output, not yet found the reason, some of the output just before corruption
MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event: 0x7f2a200bc800 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr-> pending(): 0x0 Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248 length: 10 Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248 Bytes: (10) F8 01 00 00 00 01 74 02 77 65 my_realloc: my: ptr: 0x0 size: 1024 my_flags: 80 my_malloc: my: size: 1024 my_flags: 80 my_malloc: exit: ptr: 0x7f2a200895e0
I need to see the *complete* log to be able to help. Yes, I know it can be hundreds of megabyte maybe, but it should compress well. Upload it somewhere I can access it and I can take a look. Using this log file, we can deduce exactly which paths the code was taking as it got to that assert. I'm wondering if the ROLLBACK happens early and ends up executing some binlog code at a point where transaction start has not properly initialised the necessary data structures. So we need the dbug output from the entire execution since transaction start. (If you can get the error log/Valgrind output in the same log file as the dbug, that would make things much easier. But if not, I can still try to decipher it...) - Kristian.