Hi,

Full unedited log using --valgrind and --mysqld=--debug=+d at location: https://www.dropbox.com/s/dcufp7l1cch8dfs/mysql.err.gz

In below lines I have:

static int binlog_commit(handlerton *hton, THD *thd, bool all)
{
  int error= 0;
  DBUG_ENTER("binlog_commit");
  binlog_cache_mngr *const cache_mngr=
    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
#ifdef WITH_WSREP
  if (!cache_mngr) DBUG_RETURN(0);
#endif /* WITH_WSREP */

    WSREP_ERROR("::jan_BINLOG_COMMIT:: TRX: Pending %p pos %llu, pos_in_file %llu current_pos %llu request_pos %llu write_pos %llu", cache_mngr->trx_cache.pending(), cache_mngr->trx_cache.cache_log.pos_in_file, cache_mngr->trx_cache.cache_log.current_pos, cache_mngr->trx_cache.cache_log.request_pos, cache_mngr->trx_cache.cache_log.write_pos);
    WSREP_ERROR("::jan_COMMIT:: STMT: STMT: Pending %p pos %llu, pos_in_file %llu current_pos %llu request_pos %llu write_pos %llu", cache_mngr->stmt_cache.pending(), cache_mngr->stmt_cache.cache_log.pos_in_file, cache_mngr->stmt_cache.cache_log.current_pos, cache_mngr->stmt_cache.cache_log.request_pos, cache_mngr->stmt_cache.cache_log.write_pos);


And that macro is

// MySQL logging functions don't seem to understand long long length modifer.
// This is a workaround. It also prefixes all messages with "WSREP"
#define WSREP_LOG(fun, ...)                                       \
    {                                                             \
        char msg[1024] = {'\0'};                                  \
        snprintf(msg, sizeof(msg) - 1, ## __VA_ARGS__);           \
        fun("WSREP: %s", msg);                                    \
    }

#define WSREP_DEBUG(...)                                                \
    if (wsrep_debug)     WSREP_LOG(sql_print_information, ##__VA_ARGS__)
#define WSREP_INFO(...)  WSREP_LOG(sql_print_information, ##__VA_ARGS__)
#define WSREP_WARN(...)  WSREP_LOG(sql_print_warning,     ##__VA_ARGS__)
#define WSREP_ERROR(...) WSREP_LOG(sql_print_error,       ##__VA_ARGS__)

Thus , it seems that trx_cache and stmt_cache are not initialized, where that should happen ?

Jan Lindström <jplindst@mariadb.org> writes:

Correct log file attached.

      
==00:00:02:33.708 25289== Use of uninitialised value of size 8
==00:00:02:33.708 25289==    at 0x5EF865B: _itoa_word (_itoa.c:179)
==00:00:02:33.708 25289==    by 0x5EFCB91: vfprintf (vfprintf.c:1654)
==00:00:02:33.708 25289==    by 0x5F22654: vsnprintf (vsnprintf.c:119)
==00:00:02:33.708 25289==    by 0x5F03141: snprintf (snprintf.c:34)
==00:00:02:33.708 25289==    by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054)
==00:00:02:33.708 25289==    by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516)
==00:00:02:33.708 25289==    by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436)
==00:00:02:33.709 25289==    by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363)
==00:00:02:33.709 25289==    by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373)
==00:00:02:33.709 25289==    by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820)
==00:00:02:33.709 25289==    by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651)
==00:00:02:33.709 25289==    by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454)
==00:00:02:33.709 25289== 
==00:00:02:33.709 25289== Conditional jump or move depends on uninitialised value(s)
==00:00:02:33.709 25289==    at 0x5EF8665: _itoa_word (_itoa.c:179)
==00:00:02:33.709 25289==    by 0x5EFCB91: vfprintf (vfprintf.c:1654)
==00:00:02:33.709 25289==    by 0x5F22654: vsnprintf (vsnprintf.c:119)
==00:00:02:33.709 25289==    by 0x5F03141: snprintf (snprintf.c:34)
==00:00:02:33.709 25289==    by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054)
==00:00:02:33.709 25289==    by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516)
==00:00:02:33.709 25289==    by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436)
==00:00:02:33.709 25289==    by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363)
==00:00:02:33.709 25289==    by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373)
==00:00:02:33.709 25289==    by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820)
==00:00:02:33.709 25289==    by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651)
==00:00:02:33.709 25289==    by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454)
==00:00:02:33.709 25289== 
130920 13:53:04 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 892097568 current_pos 892132560 request_pos 892132664 write_pos 303074896
So this is the explanation for the strange numbers in your printouts, like
892097568. Those values are uninitialised memory (probably from uninitialised
stack variables?).

Since the assertion you get is exactly about the values of those IO_CACHE
values being wrong, this looks like something to investigate.

I am however curious about why there is only a single Valgrind error about
this? As there are many of those unexpected numbers in your output.

However, most of the Valgrind errors seem to be in Galera code. It is hard to
say for me if they are related. Seppo, can you look into that from your side?
Normally, we strive _very_ hard to make MariaDB code clean in Valgrind...

It did produce the output, not yet found the reason, some of the output just
before corruption

MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event:
0x7f2a200bc800
MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr->
pending(): 0x0
Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248  length: 10
Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248  Bytes: (10)
F8 01 00 00 00 01 74 02 77 65
my_realloc: my: ptr: 0x0  size: 1024  my_flags: 80
my_malloc: my: size: 1024  my_flags: 80
my_malloc: exit: ptr: 0x7f2a200895e0
I need to see the *complete* log to be able to help. Yes, I know it can be
hundreds of megabyte maybe, but it should compress well. Upload it somewhere
I can access it and I can take a look.

Using this log file, we can deduce exactly which paths the code was taking as
it got to that assert. I'm wondering if the ROLLBACK happens early and ends up
executing some binlog code at a point where transaction start has not properly
initialised the necessary data structures. So we need the dbug output from the
entire execution since transaction start.

(If you can get the error log/Valgrind output in the same log file as the
dbug, that would make things much easier. But if not, I can still try to
decipher it...)

 - Kristian.

R:

--

Jan Lindström
Principal Engineer

MariaDB | MaxScale | skype: jan_p_lindstrom

www.skysql.com

Twitter Blog Facebook LinkedIn Google+