Michael Widenius via developers <developers@lists.mariadb.org> writes:
DBUG_RETURN(my_b_copy_to_cache(from_cache, to_cache, SIZE_T_MAX));
You could use from_cache->end_of_file instead of SIZE_T_MAX
Fixed.
uchar checksum_opt;
Wouldn't it be better to have this as an "enum_binlog_checksum_alg" to avoid some casts?
Fixed.
  else if (data)
    return (enum enum_binlog_checksum_alg)data->checksum_opt;
  else
    return BINLOG_CHECKSUM_ALG_OFF;
}
You can remove the else's
Fixed.
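(For illustration, here is a self-contained sketch of the two points above: typing the member as the enum removes the casts at every read site, and the early returns make the else's redundant. All names are stand-ins modeled on the thread, not MariaDB's actual code; the enum values are illustrative.)

```cpp
#include <cassert>

// Stand-in for MariaDB's enum_binlog_checksum_alg (values illustrative).
enum enum_binlog_checksum_alg
{
  BINLOG_CHECKSUM_ALG_OFF= 0,
  BINLOG_CHECKSUM_ALG_CRC32= 1
};

// Before: a raw "uchar checksum_opt" member forces a cast on every read:
//   return (enum enum_binlog_checksum_alg)data->checksum_opt;
// After: typing the member as the enum makes reads cast-free.
struct binlog_cache_data_sketch
{
  enum_binlog_checksum_alg checksum_opt;
};

enum_binlog_checksum_alg
get_checksum_alg(const binlog_cache_data_sketch *data)
{
  if (data)
    return data->checksum_opt;        // no cast needed anymore
  return BINLOG_CHECKSUM_ALG_OFF;     // no "else" needed after a return
}
```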
Fix (after discussions on IRC): Remove the following lines in MYSQL_BIN_LOG::write_event(Log_event *ev, binlog_cache_data *cache_data,

  if (cache_data)
    cache_data->add_status(ev->logged_status());
Done, good catch!
Don't attempt to precompute checksums if:
- Disabled by user request, --binlog-legacy-event-pos
- Binlog is encrypted, cannot use precomputed checksums
- WSREP/Galera.
Why Galera? Would be good to explain this in the commit comment.
Commit message extended.
In current write_cache() code, group is a bad name. Please rename 'group' to 'log_file_pos'.
Agree, fixed.
if (ev_len > old_checksum_len)
{
  uint bytes_to_skip= old_checksum_len -
    std::min(old_checksum_len, ev_len - chunk);
  if (writer.write_data(cache->read_pos, chunk - bytes_to_skip))
    goto error_in_write;
}
As the checksum is last in the event, why not just do 'ev_len-= old_checksum_len' before the loop that copies the event, and then disregard 'old_checksum_len' bytes from the cache at the end of the loop?
Agree, that's *much* better, thanks for the suggestion. I removed this complex logic and just did a my_b_read(cache, buf, old_checksum_len) at the end.
The above is likely wrong as for long events we would execute the inner part multiple times, while there is only one checksum.
The complex std::min() expression will be equal to 0 (old_checksum_len - old_checksum_len) except at the end of the loop. But this is absolutely not clear, and I'm much happier with the fixed code.
Anyway, to find bugs like this, we need to have a test case with events that are bigger than the IO_CACHE size for cache. Setting binlog_file_cache_size to 8192 (min value) should make it easy to test this.
There is a test for this: mysql-test/suite/rpl/t/rpl_checksum_cache.test It sets the --binlog-cache-size to 4096 and tests with large events and such.
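(To make the suggested simplification concrete, here is a self-contained sketch with a plain byte vector standing in for the IO_CACHE. All names are made up for the example; only the idea follows the thread: trim the trailing checksum length from ev_len before the chunked copy, then skip those bytes at the end.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static const size_t CHUNK= 8;   // stand-in for the IO_CACHE read block size

// Copies an event of ev_len bytes starting at *pos out of 'cache',
// omitting the trailing old_checksum_len bytes, and advances *pos past
// the whole event including the old checksum.
std::vector<unsigned char>
copy_event_without_old_checksum(const std::vector<unsigned char> &cache,
                                size_t *pos, size_t ev_len,
                                size_t old_checksum_len)
{
  std::vector<unsigned char> out;
  ev_len-= old_checksum_len;          // checksum is last: trim it up front
  while (ev_len > 0)                  // plain chunked copy, no special cases
  {
    size_t chunk= ev_len < CHUNK ? ev_len : CHUNK;
    out.insert(out.end(), cache.begin() + *pos, cache.begin() + *pos + chunk);
    *pos+= chunk;
    ev_len-= chunk;
  }
  *pos+= old_checksum_len;            // disregard the old checksum bytes
  return out;
}
```

Because the checksum length is subtracted once before the loop, events spanning many read chunks need no per-chunk std::min() arithmetic, which is exactly what made the original version hard to read.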
binlog_checksum_options= value;
my_atomic_storeul_explicit(&binlog_checksum_options, value, MY_MEMORY_ORDER_RELAXED);

Atomics are not needed as we have a lock on the binlog.
As we discussed, I removed the atomics changes for binlog_checksum_options. The option is accessed only under LOCK_log, except to get the value of how to pre-compute the checksums. And this value is re-checked again (now under LOCK_log) in MYSQL_BIN_LOG::write_cache().
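(A minimal sketch of the pattern described above, with simplified stand-in names: the option is read outside the lock only as a hint for pre-computing checksums, while every write and every authoritative read happens under LOCK_log. std::mutex stands in for the server's mysql_mutex_t.)

```cpp
#include <cassert>
#include <mutex>

struct binlog_sketch
{
  std::mutex LOCK_log;                  // stand-in for LOCK_log
  unsigned long binlog_checksum_options= 0;

  // Outside the lock: only a hint (may be stale in a concurrent run);
  // used to decide whether pre-computing checksums is worth attempting.
  unsigned long checksum_hint() { return binlog_checksum_options; }

  // Under the lock: the value that actually decides what is written,
  // re-checked in the equivalent of MYSQL_BIN_LOG::write_cache().
  unsigned long checksum_for_write()
  {
    std::lock_guard<std::mutex> guard(LOCK_log);
    return binlog_checksum_options;
  }

  void set_checksum_option(unsigned long value)
  {
    std::lock_guard<std::mutex> guard(LOCK_log);
    binlog_checksum_options= value;     // all writes hold the lock
  }
};
```

The design point is that a stale hint is harmless: at worst a checksum is pre-computed and then discarded, because the value re-read under the lock is the one that governs the actual write.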
I noticed in MYSQL_BIN_LOG::write_cache():
group= (size_t)my_b_tell(&log_file);
val= uint4korr(header + LOG_POS_OFFSET) + group + end_log_pos_inc;
int4store(header + LOG_POS_OFFSET, val);
The first log entry is probably ok, as we have done a rotate() event before. However, the cache can hold a lot of log events (as part of a transaction), so the end_log_pos after the first event can be wrong, as it may not fit into 4 bytes. My understanding is that this is not a problem anymore, as we are no longer using end_log_pos. However, I still wonder if rotate() should not only consider the current log file size but also the size of all events we plan to write to the log, and do a rotate if the total new log file size > 4G.
I agree, this is a valid concern and I'm not sure exactly what will happen. Something that would be good to look into, maybe someone in the mariadb.com test team? - Kristian.
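(To illustrate the 4-byte concern above: a self-contained sketch with portable stand-ins for int4store()/uint4korr() showing that a file position past 4GiB silently wraps when squeezed into the 4-byte end_log_pos header field. The real MariaDB macros are explicitly little-endian; the memcpy round-trip here is endian-agnostic but shows the same truncation.)

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-ins for MariaDB's int4store()/uint4korr() (4-byte store/fetch).
static void int4store_sketch(unsigned char *p, uint32_t v)
{
  std::memcpy(p, &v, 4);
}
static uint32_t uint4korr_sketch(const unsigned char *p)
{
  uint32_t v;
  std::memcpy(&v, p, 4);
  return v;
}

// Returns the end_log_pos as it would appear in the 4-byte header field
// for a given true file position: values >= 4GiB wrap modulo 2^32.
uint32_t stored_end_log_pos(uint64_t true_pos)
{
  unsigned char header[4];
  int4store_sketch(header, (uint32_t)true_pos);  // truncating cast
  return uint4korr_sketch(header);
}
```

This is why rotating based on the projected size (current file size plus all events queued in the cache), rather than the current size alone, could matter once a single transaction pushes the file past 4G.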