On Fri, Aug 30, 2013 at 1:44 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Pavel Ivanov <pivanof@google.com> writes:
The attached patch doesn't work properly because the format description event it saves is different from what master has sent. It has different number_of_event_types and potentially it has garbage (or incomplete data) in post_header_len.
Aha, so it's because of this, in Format_description_log_event::write() ?
uchar buff[FORMAT_DESCRIPTION_HEADER_LEN + BINLOG_CHECKSUM_ALG_DESC_LEN]; size_t rec_size= sizeof(buff); memcpy((char*) buff+ST_COMMON_HEADER_LEN_OFFSET + 1, (uchar*) post_header_len, LOG_EVENT_TYPES); ret= (write_header(file, rec_size) || wrapper_my_b_safe_write(file, buff, rec_size) ||
So it uses the static size of own server version, not the one of the event that was read. And we get the effect you describe.
So then it seems the fix is to modify the created and/or artificial (log_pos) fields manually in the buffer containing the master's description event, fix the checksum, and then write it out normally (not skip)? Or do you have a better suggestion?
Why not change Format_description_log_event::write()? It seems actually that it will be more correct if we change it. I made the following change (line numbers may be off) and it worked: @@ -4922,16 +4947,15 @@ bool Format_description_log_event::write(IO_CACHE* file) We don't call Start_log_event_v3::write() because this would make 2 my_b_safe_write(). */ - uchar buff[FORMAT_DESCRIPTION_HEADER_LEN + BINLOG_CHECKSUM_ALG_DESC_LEN]; - size_t rec_size= sizeof(buff); + uchar buff[START_V3_HEADER_LEN+1]; + size_t rec_size= sizeof(buff) + BINLOG_CHECKSUM_ALG_DESC_LEN + + number_of_event_types; int2store(buff + ST_BINLOG_VER_OFFSET,binlog_version); memcpy((char*) buff + ST_SERVER_VER_OFFSET,server_version,ST_SERVER_VER_LEN); if (!dont_set_created) created= get_time(); int4store(buff + ST_CREATED_OFFSET,created); buff[ST_COMMON_HEADER_LEN_OFFSET]= common_header_len; - memcpy((char*) buff+ST_COMMON_HEADER_LEN_OFFSET + 1, (uchar*) post_header_len, - LOG_EVENT_TYPES); /* if checksum is requested record the checksum-algorithm descriptor next to @@ -4944,7 +4968,7 @@ bool Format_description_log_event::write(IO_CACHE* file) #ifndef DBUG_OFF data_written= 0; // to prepare for need_checksum assert #endif - buff[FORMAT_DESCRIPTION_HEADER_LEN]= need_checksum() ? + uchar checksum_byte= need_checksum() ? checksum_alg : (uint8) BINLOG_CHECKSUM_ALG_OFF; /* FD of checksum-aware server is always checksum-equipped, (V) is in, @@ -4964,7 +4988,9 @@ bool Format_description_log_event::write(IO_CACHE* file) checksum_alg= BINLOG_CHECKSUM_ALG_CRC32; // Forcing (V) room to fill anyway } ret= (write_header(file, rec_size) || - wrapper_my_b_safe_write(file, buff, rec_size) || + wrapper_my_b_safe_write(file, buff, sizeof(buff)) || + wrapper_my_b_safe_write(file, (uchar*) post_header_len, number_of_event_types) || + wrapper_my_b_safe_write(file, &checksum_byte, sizeof(checksum_byte)) || write_footer(file)); if (no_checksum) checksum_alg= BINLOG_CHECKSUM_ALG_OFF;
What I do not understand is why we do not have the exact same bug when the slave rotates the relay log due to max_relay_log_size? It seems to me it does the exact same thing? It writes out description_event_for_queue which can have different number_of_event_types.
Probably I just didn't test this properly yet. Looking at the code, yes it shouldn't work, but the write() fix above should fix both cases.
quite well at the moment to do that. So apparently all peek*() functions should have Format_description added as parameter and it's easy to pass one from queue_event(), but then e.g. mysql_binlog_send() will need description event too to pass it to send_event_to_slave() and is_until_reached() so that those could pass it to peek*(). For correctness this format description should be read from binlog, right? That would be pretty non-trivial to do.
Isn't it just a matter of reading in each format_description_log_event as it is seen and holding on to it in some description_event_for_binlog_send? Should be easy, right?
Well, mysql_binlog_send() doesn't start to read the binlog file from the beginning. And if slave connects not using GTID but by binlog position no one reads the beginning of the binlog file before mysql_binlog_send() too. So that means that we have to add scanning of binlogs from the beginning inside mysql_binlog_send(). Of course, it will have an additional benefit of issuing sane error message if the binlog position slave requests is in the middle of event. But it's still a non-trivial change...
If you are interested in helping with testing this, I can write a patch later, but I need a way to test it without having to spend unreasonable amounts of time on it.
Sure, I'll be happy to test. Pavel