Re: Review of MDEV-31273: Eliminate Log_event::checksum_alg
Hi Monty, Lots of good comments, thanks! Replies inline: Michael Widenius <michael.widenius@gmail.com> writes:
enum enum_binlog_checksum_alg
In c++ we can drop the first 'enum' You can fix that by doing:
Ack, will fix.
event.select_checksum_alg()
This is called a lot. Would it not be easier to store the checksum_alg in the event when it is created and access the variable directly? On the other hand, the number of calls to event.select_checksum_alg() may not change as we will only create one
Yes, this is called only once per event, when writing the event into the binlog or the event cache. So it seems unnecessary to use extra memory to store it. And we don't necessarily even have binlog_cache_data when constructing the event to pass to select_checksum_alg(), so I think it is correct to select the checksum algorithm to use only when we write the event.
@@ -1258,11 +1258,10 @@ Log_event* Log_event::read_log_event(const uchar *buf, uint event_len,
if (ev) { - ev->checksum_alg= alg; #ifdef MYSQL_CLIENT if (ev->checksum_alg != BINLOG_CHECKSUM_ALG_OFF && ev->checksum_alg != BINLOG_CHECKSUM_ALG_UNDEF) ev->crc= uint4korr(buf + (event_len)); + ev->read_checksum_alg= alg;
Why move the setting of read_checksum_alg inside MYQL_CLIENT?
It is because with this patch, ev->checksum_alg is no longer needed in the server (the class member is removed). It is needed in the client (mysqlbinlog), so I added a new member read_checksum_alg, which is the algorithm that was used to read the event. This is needed so that mysqlbinlog can output the checksum (if any) in textual form. But it is not needed in the server
To ensure that this is not wrongly used, please add #ifdef MYSQL_CLIENT
here:
class Log_event:
#ifdef MYSQL_CLIENT + enum enum_binlog_checksum_alg read_checksum_alg; #endif
Yes, agree, that is the intention - the read_checksum_alg is only present in the class in the client, not in the server (and we want to avoid using memory for it in the server). This code is already in the #else branch of #ifdef MYSQL_SERVER: #ifdef MYSQL_SERVER ... #else /* The checksum algorithm used (if any) when the event was read. */ enum enum_binlog_checksum_alg read_checksum_alg; I thought this would have the same effect as your suggestion with #ifdef MYSQL_CLIENT. It probably does, but it seems somewhat complex when MYSQL_CLIENT and MYSQL_SERVER are defined. Maybe they can even both be defined, though probably not where log_event.h is included? It's a bit confusing that we have both MYSQL_CLIENT and MYSQL_SERVER used for conditional compilation in log_event.h. But probably not something that should be cleaned up in this patch. Do you think I should still add the #ifdef MYSQL_CLIENT as you suggested (even though it should be redundant in the #else branch of #ifdef MYSQL_SERVER)? It can't hurt after all, even if we always have MYSQL_CLIENT == !MYSQL_SERVER.
in Format_description_log_event(const uchar
We have:
else { checksum_alg= BINLOG_CHECKSUM_ALG_UNDEF; used_checksum_alg= BINLOG_CHECKSUM_ALG_OFF; }
The used_checksum_alg is not needed as you set it at function start.
Sorry, I don't understand? At the start we set it to "undef": + used_checksum_alg= BINLOG_CHECKSUM_ALG_UNDEF; This value will remain if there's an error return; else the else branch will change it to "off". Did you miss that it's set differently at the start of the function and in this else { } branch? Or did I miss something?
sql/log_event.h
enum enum_binlog_checksum_alg set_checksum_alg(enum enum_binlog_checksum_alg alg)
As we are just copying and restoring checksum_length, why not just do
Format_description_log_event::write(Log_event_writer *writer)
org_checksum_length= checksum_length... .... checksum_length= org_checksum_length;
instead of calling set_checksum_alg() again ? (The last set we can even do undonditionally which saves us an if)
Agree, that code looks silly, I don't know why I did it that way. I'll fix it, remove the function set_checksum_alg() and just do: org_checksum_length= writer->checksum_length; writer->checksum_length= BINLOG_CHECKSUM_LEN; ... writer->checksum_length= org_checksum_length;
log_event_server.cc
enum enum_binlog_checksum_alg Log_event::select_checksum_alg()
Much simpler, but we do not check if cache_type is correct
DBUG_ASSERT(((get_type_code() != ROTATE_EVENT && get_type_code() != STOP_EVENT) || get_type_code() != FORMAT_DESCRIPTION_EVENT) || cache_type == Log_event::EVENT_NO_CACHE);
Do you think this is not needed? (This could be added just before:
return BINLOG_CHECKSUM_ALG_OFF;
Agree, I was happy to remove a lot of complexity and the need for a number of asserts, but this assertion could be kept. I'll add it back.
Anyway, please remove the else after:
if (cache_type == Log_event::EVENT_NO_CACHE) return (enum_binlog_checksum_alg)binlog_checksum_options;
Ack will do.
Format_description_log_event::write(Log_event_writer *writer)
uint8 checksum_byte= (uint8) (used_checksum_alg != BINLOG_CHECKSUM_ALG_UNDEF ? used_checksum_alg : BINLOG_CHECKSUM_ALG_OFF);
Is it possible the alg can be UNDEF? You have several asserts in the code that this should not happen.
Yeah, honestly, I'm not sure. It's one of the annoying things in the code that I did not get to fix, that we have both the UNDEF and the OFF state for the checksum algorithm. Format_description code is not performance critical (only once per binlog file), so I suggest I leave the extra check for UNDEF as defensive code, but add an assertion that it is not UNDEF and see if it passes the testsuite. Does that sound ok?
slave.cc
@@ -1870,10 +1873,8 @@ static int get_master_version_and_clock(MYSQL* mysql, Master_info* mi) until it has received a new FD_m. */ mi->rli.relay_log.description_event_for_queue->checksum_alg= mi->rli.relay_log.relay_log_checksum_alg;
Not sure why this is removed.
Are these guaranteed to be the same here? If yes, why not add an assert for this?
This patch removes Log_event::checksum_alg and instead adds a field Format_description_log_event::used_checksum_alg which gets set in the Format_description_log_event constructor. So this assignment - mi->rli.relay_log.description_event_for_queue->checksum_alg= - mi->rli.relay_log.relay_log_checksum_alg; is replaced by this code just above which serves the same purpose: - Format_description_log_event(4, mysql->server_version); + Format_description_log_event(4, mysql->server_version, + mi->rli.relay_log.relay_log_checksum_alg); So I think adding an assert DBUG_ASSERT(mi->rli.relay_log.description_event_for_queue->checksum_alg == mi->rli.relay_log.relay_log_checksum_alg); is somewhat redundant, but let me know if you still think it will be useful and I'll add it. - Kristian.
Hi! <cut>
To ensure that this is not wrongly used, please add #ifdef MYSQL_CLIENT
here:
class Log_event:
#ifdef MYSQL_CLIENT + enum enum_binlog_checksum_alg read_checksum_alg; #endif
Yes, agree, that is the intention - the read_checksum_alg is only present in the class in the client, not in the server (and we want to avoid using memory for it in the server).
This code is already in the #else branch of #ifdef MYSQL_SERVER:
#ifdef MYSQL_SERVER ... #else /* The checksum algorithm used (if any) when the event was read. */ enum enum_binlog_checksum_alg read_checksum_alg;
Sorry, I missed the above #ifdef when reading the patch (for this case I didn't look at the source. Please ignore the wrong suggestion. <cut>
in Format_description_log_event(const uchar
We have:
else { checksum_alg= BINLOG_CHECKSUM_ALG_UNDEF; used_checksum_alg= BINLOG_CHECKSUM_ALG_OFF; }
The used_checksum_alg is not needed as you set it at function start.
Sorry, I don't understand? At the start we set it to "undef":
+ used_checksum_alg= BINLOG_CHECKSUM_ALG_UNDEF;
This value will remain if there's an error return; else the else branch will change it to "off".
Did you miss that it's set differently at the start of the function and in this else { } branch? Or did I miss something?
Another mistake from my side. I missed that you had removed checksum_alg= BINLOG_CHECKSUM_ALG_UNDEF; <cut>
I'll fix it, remove the function set_checksum_alg() and just do:
org_checksum_length= writer->checksum_length; writer->checksum_length= BINLOG_CHECKSUM_LEN; ... writer->checksum_length= org_checksum_length;
Great! <cut>
Format_description_log_event::write(Log_event_writer *writer)
uint8 checksum_byte= (uint8) (used_checksum_alg != BINLOG_CHECKSUM_ALG_UNDEF ? used_checksum_alg : BINLOG_CHECKSUM_ALG_OFF);
Is it possible the alg can be UNDEF? You have several asserts in the code that this should not happen.
Yeah, honestly, I'm not sure. It's one of the annoying things in the code that I did not get to fix, that we have both the UNDEF and the OFF state for the checksum algorithm.
Format_description code is not performance critical (only once per binlog file), so I suggest I leave the extra check for UNDEF as defensive code, but add an assertion that it is not UNDEF and see if it passes the testsuite. Does that sound ok?
Yes, adding an assert would be good. If we never hit it for some time, we can then later remove the test for BINLOG_CHECKSUM_ALG_UNDEF. <cut>
slave.cc
@@ -1870,10 +1873,8 @@ static int get_master_version_and_clock(MYSQL* mysql, Master_info* mi) until it has received a new FD_m. */ mi->rli.relay_log.description_event_for_queue->checksum_alg= mi->rli.relay_log.relay_log_checksum_alg;
Not sure why this is removed. ... is replaced by this code just above which serves the same purpose:
- Format_description_log_event(4, mysql->server_version); + Format_description_log_event(4, mysql->server_version, + mi->rli.relay_log.relay_log_checksum_alg);
So I think adding an assert
DBUG_ASSERT(mi->rli.relay_log.description_event_for_queue->checksum_alg == mi->rli.relay_log.relay_log_checksum_alg);
is somewhat redundant, but let me know if you still think it will be useful and I'll add it.
Not needed, thanks for the explanation. Regards, Monty
participants (2)
-
Kristian Nielsen
-
Michael Widenius