Re: [Maria-developers] [MariaDB/server] Restrict the speed of reading binlog from Master (#246)
GCSAdmin <notifications@github.com> writes:
In some case, the speed of reading binlog from master is high, especially when doing a new replica. It would bring the high traffic in master. So We introduce a new variable "read_binlog_speed_limit" to control the binlog reading rate for IO thread to solve the problem.
It can work when slave_compressed_protocol is on. But it maybe doesn't work well when the binlog event is very big. You can view, comment on, or merge this pull request online at:
-- Patch Links --
https://github.com/MariaDB/server/pull/246.patch https://github.com/MariaDB/server/pull/246.diff
Overall this looks clean and simple. There is one problem. The patch adds a field real_network_read_len to the NET structure. This will break the client library ABI, because NET is a part of MYSQL. So this would cause client programs to crash if they are linked with a different version of the client library. So this needs to be changed (if I understand correctly). One option might be to introduce new functions like cli_safe_read_reallen() and my_net_read_packet_reallen(), which return in addition the actual amount of bytes read from the server. The old cli_safe_read() and my_net_read_packet() could then become simple wrapper functions around those. And cli_safe_read_reallen() can be used in read_event() in sql/slave.cc. A smaller issue is that in case of a large packet, a large my_sleep() may be invoked, which will cause STOP SLAVE to hang. I think this can be solved simply by calling slave_sleep() instead, it handles terminating the wait early if interrupted by STOP SLAVE. Detailed comments on the patch below. I rebased the series against latest 10.2 to get a clean diff (the pull request includes a couple merges against the main 10.2 tree, these changes are unrelated to the patch). The rebase is in https://github.com/knielsen/server/tree/GCSAdmin-10.2-binlog-speed-limit-2
diff --git a/include/mysql.h.pp b/include/mysql.h.pp index 857f5b9..1da038c 100644 --- a/include/mysql.h.pp +++ b/include/mysql.h.pp @@ -35,6 +35,7 @@ typedef struct st_net { my_bool thread_specific_malloc; unsigned char compress; my_bool unused3; + unsigned long real_network_read_len;
As explained above, I believe this would break the ABI (that's the purpose of mysql.h.pp, to catch such problems).
diff --git a/sql/slave.cc b/sql/slave.cc index 20bf68e..52bb668 100644 --- a/sql/slave.cc +++ b/sql/slave.cc
@@ -3307,13 +3308,14 @@ static int request_dump(THD *thd, MYSQL* mysql, Master_info* mi, try a reconnect. We do not want to print anything to the error log in this case because this a anormal event in an idle server. + network_read_len get the real network read length in VIO, especially using compressed protocol
RETURN VALUES 'packet_error' Error number Length of packet */
-static ulong read_event(MYSQL* mysql, Master_info *mi, bool* suppress_warnings) +static ulong read_event(MYSQL* mysql, Master_info *mi, bool* suppress_warnings, ulong* network_read_len)
Generally, lines longer than 80 characters should be avoided (coding style).
@@ -4473,6 +4479,34 @@ Stopping slave I/O thread due to out-of-memory error from master"); goto err; }
+ /* Control the binlog read speed of master when read_binlog_speed_limit is non-zero + */ + ulonglong read_binlog_speed_limit_in_bytes = opt_read_binlog_speed_limit * 1024; + if (read_binlog_speed_limit_in_bytes) + { + /* prevent the tokenamount become a large value, + for example, the IO thread doesn't work for a long time + */ + if (tokenamount > read_binlog_speed_limit_in_bytes * 2) + { + lastchecktime = my_hrtime().val; + tokenamount = read_binlog_speed_limit_in_bytes * 2; + } + + do + { + ulonglong currenttime = my_hrtime().val; + tokenamount += (currenttime - lastchecktime) * read_binlog_speed_limit_in_bytes / (1000*1000); + lastchecktime = currenttime; + if(tokenamount < network_read_len) + { + ulonglong micro_sleeptime = 1000*1000 * (network_read_len - tokenamount) / read_binlog_speed_limit_in_bytes ; + my_sleep(micro_sleeptime > 1000 ? micro_sleeptime : 1000); // at least sleep 1000 micro second + } + }while(tokenamount < network_read_len); + tokenamount -= network_read_len; + } +
As explained above, probably better to use slave_sleep() here to allow STOP SLAVE to interrupt a long sleep. Would it make sense to do this wait after calling queue_event()? This way the SQL thread can start applying the event immediately, reducing slave lag. What do you think? Thanks, - Kristian.
participants (1)
-
Kristian Nielsen