Hi Monty, here is my review of the patch for MDEV-31404:
commit d51086a9f1b1b6a66c7ba8eb6a6876b433fa118f (origin/bb-11.4-MDEV-31404) Author: Monty <monty@mariadb.org> Date: Sun Dec 3 21:42:44 2023 +0200
MDEV-31404 Implement binlog_space_limit
binlog_space_limit is a variable in Percona server used to limit the total size of all binary logs.
This implementation is based on code from Percona server 5.7.
General comment: The --slave-connections-needed-for-purge defaults to 1. I think this means that --log-expire-days will have no effect on systems that have binlog enabled but no slaves configured. This is something to be aware of for backwards compatibility. What I do not like about this is the consequence it will have for all systems that have binlog enabled with some --expire-log-days set but no slaves configured. After upgrade, these systems will silently start accumulating binlogs without limit, possibly filling up disk eventually. On the other hand, it seems good to have --slave-connections-needed-for-purge enabled for the new --max-binlog-total-size option; as you pointed out, otherwise a single large transaction could immediately cause deletion of all binlogs when no slaves are connected. And having --slave-connections-needed-for-purge affect purge from --max-binlog-total-size but not for --expire-log-days seems inconsistent. In the end, maybe it isn't too serious. Many small and default-configured systems will probably be able to run for ages without binlogs getting excessive due to low traffic; and it will be easy for DBAs to just delete the binlogs if needed, or google the --slave-connections-needed-for-purge if they care enough. Two suggestions to make it easier for the DBAs to know how to handle this: 1. How about emitting a note or warning in the error log, if purge from --log-expire-days=N gets delayed for more than 2*N because no slaves are connected and --slave-connections-needed-for-purge=1 ? Note: Purge of binlogs prevented for <seconds> because no slaves are connected. To allow purge of binlogs with no slaves connected, set --slave-connections-needed-for-purge=0 2. Mention in the changelogs for 11.4 that binlogs will by default no longer be purged when no slaves are configured, and point to some documentation about how to handle this for the DBA. I think with those two suggestions it should be fine. Detailed comments on the patch below. - Kristian.
diff --git a/mysql-test/suite/binlog_encryption/binlog_index-master.opt b/mysql-test/suite/binlog_encryption/binlog_index-master.opt new file mode 100644 index 00000000000..f2673d08020 --- /dev/null +++ b/mysql-test/suite/binlog_encryption/binlog_index-master.opt @@ -0,0 +1 @@ +--slave_connections_needed_for_purge=0
Is this needed? If so, would it be better to put this in a general my.cnf for the entire binlog_encryption suite, like you did for mysql-test/suite/binlog/my.cnf ? So that a similar .opt will not need to be added manually for each new binlog_encryption test that will need it.
--- a/mysql-test/suite/perfschema/t/show_sanity.test +++ b/mysql-test/suite/perfschema/t/show_sanity.test @@ -533,6 +533,7 @@ insert into test.sanity values ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_CACHE_SIZE"), ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_SIZE"), ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_STMT_CACHE_SIZE"), + ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOGS_SIZE"),
I think this is a left-over from previous naming, should probably be "MAX_BINLOG_TOTAL_SIZE ?
+FLUSH LOGS; +FLUSH LOGS; +FLUSH LOGS; +source include/show_binary_logs.inc; +show status like "binlog_disk_use";
I think you will need --source include/wait_for_binlog_checkpoint.inc after each of these FLUSH LOGS. Otherwise random delay of binlog checkpoint could cause the binlog_disk_use value to fluctuate and get rare random test failures. (Or alternatively consider if you could omit the select of binlog_disk_use completely, since it will probably differ between versions as event formats change.) Adding a small sleep in binlog_background_thread() just before calling mysql_bin_log.mark_xid_done() can usually reproduce the random failures from delayed binlog checkpoint.
+INSERT INTO t1 VALUES (0,repeat("a",3000)); +show status like "binlog_disk_use"; +Variable_name Value +Binlog_disk_use 3848 +# First binary should be binary.000004 +show binary logs; +Log_name File_size +binary.000004 # +INSERT INTO t1 VALUES (2,repeat("b",10)); +# First binary should be binary.000004 +show binary logs; +Log_name File_size +binary.000004 # +binary.000005 #
Here, I don't understand why binary.000004 is not purged. The binary.000004 contains the large transaction 'repeat("a", 3000)', so the binlog total size is too large. And the transaction 'repeat("n",10)' caused a binlog rotation. So why does that not cause binary.000004 to be purged due to --max-binlog-total-size=1500 ?
+show status like "binlog_disk_use"; +Variable_name Value +Binlog_disk_use 3647 +INSERT INTO t1 VALUES (7,repeat("g",3000)); +# binary.000001 should be deleted now +show binary logs; +Log_name File_size +binary.000002 # +binary.000003 #
Similarly here, why binary.000002 is not purged?
+INSERT INTO t1 VALUES (4,repeat("d",3000)); +--echo # First binary should be binary.000011 +source include/show_binary_logs.inc; + +RESET MASTER;
Idea: Here you could (or ask one of the other devs) add a test that the SHOW STATUS LIKE "binlog_disk_use" is equal to the size of the files on disk, to check that the running calculation of disk use doesn't miss the odd byte or two.
+ if ((longlong) (binlog_space_total + binlog_pos - found_space <= + binlog_space_limit)) + break; // Found enough
The (longlong) cast here looks misplaced. It's casting the result of the (X <= Y) comparison. I think the cast can just be removed, all the variables seem to be ulonglong type.
+static bool waiting_for_slave_to_change_binlog= 0; +static ulonglong purge_sending_new_binlog_file= 0;
I think a short comment here on these variables would be good, explaining their purpose. And same on the declaration of Atomic_counter<ulonglong> sending_new_binlog_file.
+ DBUG_ASSERT(!is_relay_log || binlog_xid_count_list.is_empty()); + if (!is_relay_log) { - I_List_iterator<xid_count_per_binlog> it(binlog_xid_count_list); - while ((b= it++) && - 0 != strncmp(log_file_name_arg+dirname_length(log_file_name_arg), - b->binlog_name, b->binlog_name_len)) - ; + xid_count_per_binlog *b; + mysql_mutex_lock(&LOCK_xid_list);
Good catch that we were uselessly doing xid_count operations on the relay log, thanks!
+int MYSQL_BIN_LOG::count_binlog_space() +{ + int error; + LOG_INFO log_info; + DBUG_ENTER("count_binlog_space"); + if (is_relay_log || !binlog_space_limit) + DBUG_RETURN(0);
Maybe this if (is_relay_log || !binlog_space_limit) could be an assert? We probably should not be calling this function at all in that case. It looks like that binlog_space_limit is already checked in all callers, not sure about relay log. (A small point, just something to consider as you like).
@@ -7608,7 +7807,8 @@ int MYSQL_BIN_LOG::rotate(bool force_rotate, bool* check_purge) @retval nonzero - error in rotating routine. */ -void MYSQL_BIN_LOG::purge() + +void MYSQL_BIN_LOG::purge(bool all)
Would be good with a small comment here explaining the meaning of the new parameter `all`. Minor stuff: Commit message:
Binlog file file sizes are checked on startup
Typo (s/file file/file/).
when updating the binary log on on binary log rotation
Typo (s/on on/and on/ ?)
Without this option max-binlog-total-size would potentially deleted binlogs needed by slaves on server startup or when a slave disconnects
Typo s/deleted/delete/.
+static Sys_var_ulonglong Sys_max_binlog_total_size( + "max_binlog_total_size", + "Maximum space to use for all binary logs. Extra logs are deleted on " + "server start, log rotation, FLUSH LOGGS or when writing to binlog. " + "Default is 0, which means no size restrictions." + "See also slave_connections_needed_for_purge",
Missing space at end of line (s/restrictions."/restrictions. "/).
+static Sys_var_ulong Sys_slave_connections_needed_for_purge( + "slave_connections_needed_for_purge", + "Number of minimum connected slaves needed for automatic binary " + "log purge with max_binlog_total_size, binlog_expire_logs_seconds " + "or binlog_expire_logs_days.",
Suggestion for better wording: "Minimum number of connected slaves required for automatic binary " "log purge with max_binlog_total_size, binlog_expire_logs_seconds " "or binlog_expire_logs_days."
Hi! On Fri, Dec 8, 2023 at 10:06 PM Kristian Nielsen via developers <developers@lists.mariadb.org> wrote: <cut>
The --slave-connections-needed-for-purge defaults to 1. I think this means that --log-expire-days will have no effect on systems that have binlog enabled but no slaves configured. This is something to be aware of for backwards compatibility. <cut> In the end, maybe it isn't too serious. Many small and default-configured systems will probably be able to run for ages without binlogs getting excessive due to low traffic; and it will be easy for DBAs to just delete the binlogs if needed, or google the --slave-connections-needed-for-purge if they care enough.
I agree with both the comments and the conclusion. It it is better to keep things consistent than do things that are hard to explain except for 'it has always been that way'.
Two suggestions to make it easier for the DBAs to know how to handle this:
1. How about emitting a note or warning in the error log, if purge from --log-expire-days=N gets delayed for more than 2*N because no slaves are connected and --slave-connections-needed-for-purge=1 ?
Note: Purge of binlogs prevented for <seconds> because no slaves are connected. To allow purge of binlogs with no slaves connected, set --slave-connections-needed-for-purge=0
Unfortunately I do not think many persons are reading the logs. I hope that having this in release logs (under behavior changes) will be enough for people to notice this change. If not, I will add the warning.
2. Mention in the changelogs for 11.4 that binlogs will by default no longer be purged when no slaves are configured, and point to some documentation about how to handle this for the DBA.
Yes, that has to be done. I will do this as soon as the patch is pushed. <cut>
diff --git a/mysql-test/suite/binlog_encryption/binlog_index-master.opt b/mysql-test/suite/binlog_encryption/binlog_index-master.opt new file mode 100644 index 00000000000..f2673d08020 --- /dev/null +++ b/mysql-test/suite/binlog_encryption/binlog_index-master.opt @@ -0,0 +1 @@ +--slave_connections_needed_for_purge=0
Is this needed? If so, would it be better to put this in a general my.cnf for the entire binlog_encryption suite, like you did for mysql-test/suite/binlog/my.cnf ? So that a similar .opt will not need to be added manually for each new binlog_encryption test that will need it.
In binlog tests there are a LOT of tests that checks binlog files. So there it is needed. In binlog_encryption it was only this single test that checked binlog rotation. As this is only one test, better to just disable the option in that one test.
--- a/mysql-test/suite/perfschema/t/show_sanity.test +++ b/mysql-test/suite/perfschema/t/show_sanity.test @@ -533,6 +533,7 @@ insert into test.sanity values ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_CACHE_SIZE"), ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_SIZE"), ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_STMT_CACHE_SIZE"), + ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOGS_SIZE"),
I think this is a left-over from previous naming, should probably be "MAX_BINLOG_TOTAL_SIZE ?
Fixed. The reason I did not notice this is because this test is permanently disabled.
+FLUSH LOGS; +FLUSH LOGS; +FLUSH LOGS; +source include/show_binary_logs.inc; +show status like "binlog_disk_use";
I think you will need --source include/wait_for_binlog_checkpoint.inc after each of these FLUSH LOGS. Otherwise random delay of binlog checkpoint could cause the binlog_disk_use value to fluctuate and get rare random test failures. (Or alternatively consider if you could omit the select of binlog_disk_use completely, since it will probably differ between versions as event formats change.)
FLUSH LOGS is safe as it forces a rotation, which makes the values stable. Yes, binlog_disk will change if we change things in the binary log format. However, changing the binary log format can also cause this test to fail as we get rotations in different places. Anyway, I do not expect more than one or two binary log format changes in older versions, which should be trivial to merge.
Adding a small sleep in binlog_background_thread() just before calling mysql_bin_log.mark_xid_done() can usually reproduce the random failures from delayed binlog checkpoint.
I added a sleep(1) at the above place but the max_binlog_total_size result did not change. As far as I can see in the binary log, a checkpoint happens always as part of rotate. I assume this means that after FLUSH LOGS the value of binlog_disk_use will be stable? What determine when we do a new checkpoint? I assume it is size or time of last checkpoint. In both case, should we not be safe as we are doing only one or two inserts per binary log and then do a rotate + checkpoint before we use binlog_disk_use? Anyway, this is academic as the test is using MyISAM tables and there can not be any random checkpoints initiated by the storage engine.
+INSERT INTO t1 VALUES (0,repeat("a",3000)); +show status like "binlog_disk_use"; +Variable_name Value +Binlog_disk_use 3848 +# First binary should be binary.000004 +show binary logs; +Log_name File_size +binary.000004 # +INSERT INTO t1 VALUES (2,repeat("b",10)); +# First binary should be binary.000004 +show binary logs; +Log_name File_size +binary.000004 # +binary.000005 #
Here, I don't understand why binary.000004 is not purged. The binary.000004 contains the large transaction 'repeat("a", 3000)', so the binlog total size is too large. And the transaction 'repeat("n",10)' caused a binlog rotation. So why does that not cause binary.000004 to be purged due to --max-binlog-total-size=1500 ?
Here is what happens. - We rotate binary logs when we notice that the binary log file > max_binlog_size. - After the INSERT INTO t1 VALUES (0,repeat("a",3000)) the binlog size is of binlog.00004 is 3848. It will not be purged as it is in use. - INSERT INTO t1 VALUES (2,repeat("b",10)) will be added to the binary log and it will now be 4139 bytes, causing the binary log to rotate. - End of rotate() calls purge_logs_by_size(). - MYSQL_BIN_LOG::can_purge_log ("binary.000004") is called. - This returns 'false' as there is an entry in the LOCK_xid_list that points to binary.00004 The next insert or flush will be able to do the purge as part of rotate, there will be a checkpoint entry (as part of the rotate) that will clear the LOCK_xid_list. In other words, as part of rotate, the purge of the binary logs will be after the next write.
+show status like "binlog_disk_use"; +Variable_name Value +Binlog_disk_use 3647 +INSERT INTO t1 VALUES (7,repeat("g",3000)); +# binary.000001 should be deleted now +show binary logs; +Log_name File_size +binary.000002 # +binary.000003 #
Similarly here, why binary.000002 is not purged?
Same reason as above.
+INSERT INTO t1 VALUES (4,repeat("d",3000)); +--echo # First binary should be binary.000011 +source include/show_binary_logs.inc; + +RESET MASTER;
Idea: Here you could (or ask one of the other devs) add a test that the SHOW STATUS LIKE "binlog_disk_use" is equal to the size of the files on disk, to check that the running calculation of disk use doesn't miss the odd byte or two.
I have checked this manually by removing "-- replace_column 2 #" from show_binary_logs.inc Wonder how to do that easily. We have 'show binary logs', but how to sum a column over all rows? For now, I have replaced the last show_binary_logs.inc with 'show binary logs', which allow us to get the binary logs in the output, which makes it trivial to check that binlog_disk_use is correct.
+ if ((longlong) (binlog_space_total + binlog_pos - found_space <= + binlog_space_limit)) + break; // Found enough
The (longlong) cast here looks misplaced. It's casting the result of the (X <= Y) comparison. I think the cast can just be removed, all the variables seem to be ulonglong type.
Cast removed
+static bool waiting_for_slave_to_change_binlog= 0; +static ulonglong purge_sending_new_binlog_file= 0;
I think a short comment here on these variables would be good, explaining their purpose. And same on the declaration of Atomic_counter<ulonglong> sending_new_binlog_file.
Comment added: /* The following variables are here to allows us to quickly check if the can_purge_log(log_file_name_arg) name will fail in the 'log_in_use' call. waiting_for_slave_to_change_binlog is 1 if last log_in_use failed. purge_binlog_name is the last failed log_file_name_arg. sending_new_binlog_file, cached in purge_sending_new_binlog_file, is incremented every time a slave changes to use a new binary log. */ <cut>
+int MYSQL_BIN_LOG::count_binlog_space() +{ + int error; + LOG_INFO log_info; + DBUG_ENTER("count_binlog_space"); + if (is_relay_log || !binlog_space_limit) + DBUG_RETURN(0);
Maybe this if (is_relay_log || !binlog_space_limit) could be an assert? We probably should not be calling this function at all in that case. It looks like that binlog_space_limit is already checked in all callers, not sure about relay log. (A small point, just something to consider as you like).
Should be fixed when we introduce MYSQL_RELAY_LOG. Anyway, I will change this to an assert for now.
@@ -7608,7 +7807,8 @@ int MYSQL_BIN_LOG::rotate(bool force_rotate, bool* check_purge) @retval nonzero - error in rotating routine. */ -void MYSQL_BIN_LOG::purge() + +void MYSQL_BIN_LOG::purge(bool all)
Would be good with a small comment here explaining the meaning of the new parameter `all`.
Comment added <cut>
Commit message: All types fixed
+static Sys_var_ulonglong Sys_max_binlog_total_size( + "max_binlog_total_size", + "Maximum space to use for all binary logs. Extra logs are deleted on " + "server start, log rotation, FLUSH LOGGS or when writing to binlog. " + "Default is 0, which means no size restrictions." + "See also slave_connections_needed_for_purge",
Missing space at end of line (s/restrictions."/restrictions. "/).
+static Sys_var_ulong Sys_slave_connections_needed_for_purge( + "slave_connections_needed_for_purge", + "Number of minimum connected slaves needed for automatic binary " + "log purge with max_binlog_total_size, binlog_expire_logs_seconds " + "or binlog_expire_logs_days.",
Suggestion for better wording:
"Minimum number of connected slaves required for automatic binary " "log purge with max_binlog_total_size, binlog_expire_logs_seconds " "or binlog_expire_logs_days."
Both fixed. Thanks a lot for the review! Regards, Monty
Hi Monty, Michael Widenius <michael.widenius@gmail.com> writes:
Two suggestions to make it easier for the DBAs to know how to handle this:
1. How about emitting a note or warning in the error log, if purge from --log-expire-days=N gets delayed for more than 2*N because no slaves are connected and --slave-connections-needed-for-purge=1 ?
Unfortunately I do not think many persons are reading the logs. I hope that having this in release logs (under behavior changes) will be enough for people to notice this change. If not, I will add the warning.
Ok.
Is this needed? If so, would it be better to put this in a general my.cnf for the entire binlog_encryption suite, like you did for mysql-test/suite/binlog/my.cnf ? So that a similar .opt will not need to be added manually for each new binlog_encryption test that will need it.
In binlog tests there are a LOT of tests that checks binlog files. So there it is needed. In binlog_encryption it was only this single test that checked binlog rotation. As this is only one test, better to just disable the option in that one test.
Ok, that's fine, not a big thing.
I think you will need --source include/wait_for_binlog_checkpoint.inc after each of these FLUSH LOGS. Otherwise random delay of binlog checkpoint could cause the binlog_disk_use value to fluctuate and get rare random test failures.
Anyway, this is academic as the test is using MyISAM tables and there can not be any random checkpoints initiated by the storage engine.
Right, I missed that. I guess comes from fixing so many parallel replication test cases, starting to forget that not everything is InnoDB ;-).
Here is what happens.
<snip>
In other words, as part of rotate, the purge of the binary logs will be after the next write.
Thanks for the explanation. Good to know why this happens. The behaviour is fine, and normally it's not relevant, since --max-binlog-total-size will be much larger than a single binlog file.
Wonder how to do that easily. We have 'show binary logs', but how to sum a column over all rows?
For now, I have replaced the last show_binary_logs.inc with 'show binary logs', which allow us to get the binary logs in the output, which makes it trivial to check that binlog_disk_use is correct.
I was thinking of a small --perl section that did a stat() on each binlog file. But I think your way is fine as well.
Maybe this if (is_relay_log || !binlog_space_limit) could be an assert?
Should be fixed when we introduce MYSQL_RELAY_LOG.
Agree. - Kristian.
participants (2)
-
Kristian Nielsen
-
Michael Widenius