Sergei, hello.
Hi, andrei.elkin!
On May 30, andrei.elkin@pp.inet.fi wrote:
revision-id: d9913834cebe4ed8494b3d76762cd882788a8fd5 (mariadb-10.1.33-28-gd9913834ceb) parent(s): c1698e8dc50f0c342c1a6886426ba1d43396cd1e author: Andrei Elkin committer: Andrei Elkin timestamp: 2018-05-30 20:06:44 +0300 message:
MDEV-14014 Multi-Slave Replication Fail: bogus data in log event
MDEV-7257 made a dump thread to read from binlog concurrently with writers as long as the read bytes are below a water-mark (MYSQL_BIN_LOG::binlog_end_pos). However it appeared to be possible a dump thread reader reach out for bytes past the water mark through a feature of IO_CACHE that fills in the internal buffer and while doing so it could read what the reader is not supposed to see (the bytes above MYSQL_BIN_LOG::binlog_end_pos).
The issue is fixed with constraining the IO_CACHE buffer fill to respect the watermark. An added test simulates potentially unconstrained buffer fill and an assert guards this is not the case anymore.
What about a much simpler one-liner fix:
--- a/sql/sql_repl.cc +++ b/sql/sql_repl.cc @@ -2549,6 +2549,7 @@ static int send_events(binlog_send_info *info, IO_CACHE* return 1;
info->last_pos= linfo->pos; + log->end_of_file= end_pos;
That's nice! I could not guess out that so need parameter was always at my disposal.
error= Log_event::read_log_event(log, packet, info->fdev, opt_master_verify_checksum ? info->current_checksum_alg : BINLOG_CHECKSUM_ALG_OFF);
It doesn't change IO_CACHE structure and doesn't add any overhead to the IO_CACHE.
Very true, and even more - 'end_of_file' of the read cache seems to have found usability beyond being a constant.
Did you already test that your commit actually fixes the issue?
Like I said, I did not try after we found it's NFS binlog. I've done it right now, but my attempt lasted minutes while in the user's case apart of anything specific they needed hours sometimes. I am committing a new patch which sustains the simulation test of the old one. Cheers, Andrei