Hi Monty, here is my review of the patch for MDEV-31404:
commit d51086a9f1b1b6a66c7ba8eb6a6876b433fa118f (origin/bb-11.4-MDEV-31404) Author: Monty <monty@mariadb.org> Date: Sun Dec 3 21:42:44 2023 +0200
MDEV-31404 Implement binlog_space_limit
binlog_space_limit is a variable in Percona server used to limit the total size of all binary logs.
This implementation is based on code from Percona server 5.7.
General comment: The --slave-connections-needed-for-purge defaults to 1. I think this means that --log-expire-days will have no effect on systems that have binlog enabled but no slaves configured. This is something to be aware of for backwards compatibility. What I do not like about this is the consequence it will have for all systems that have binlog enabled with some --expire-log-days set but no slaves configured. After upgrade, these systems will silently start accumulating binlogs without limit, possibly filling up disk eventually. On the other hand, it seems good to have --slave-connections-needed-for-purge enabled for the new --max-binlog-total-size option; as you pointed out, otherwise a single large transaction could immediately cause deletion of all binlogs when no slaves are connected. And having --slave-connections-needed-for-purge affect purge from --max-binlog-total-size but not for --expire-log-days seems inconsistent. In the end, maybe it isn't too serious. Many small and default-configured systems will probably be able to run for ages without binlogs getting excessive due to low traffic; and it will be easy for DBAs to just delete the binlogs if needed, or google the --slave-connections-needed-for-purge if they care enough. Two suggestions to make it easier for the DBAs to know how to handle this: 1. How about emitting a note or warning in the error log, if purge from --log-expire-days=N gets delayed for more than 2*N because no slaves are connected and --slave-connections-needed-for-purge=1 ? Note: Purge of binlogs prevented for <seconds> because no slaves are connected. To allow purge of binlogs with no slaves connected, set --slave-connections-needed-for-purge=0 2. Mention in the changelogs for 11.4 that binlogs will by default no longer be purged when no slaves are configured, and point to some documentation about how to handle this for the DBA. I think with those two suggestions it should be fine. Detailed comments on the patch below. - Kristian.
diff --git a/mysql-test/suite/binlog_encryption/binlog_index-master.opt b/mysql-test/suite/binlog_encryption/binlog_index-master.opt new file mode 100644 index 00000000000..f2673d08020 --- /dev/null +++ b/mysql-test/suite/binlog_encryption/binlog_index-master.opt @@ -0,0 +1 @@ +--slave_connections_needed_for_purge=0
Is this needed? If so, would it be better to put this in a general my.cnf for the entire binlog_encryption suite, like you did for mysql-test/suite/binlog/my.cnf ? So that a similar .opt will not need to be added manually for each new binlog_encryption test that will need it.
--- a/mysql-test/suite/perfschema/t/show_sanity.test +++ b/mysql-test/suite/perfschema/t/show_sanity.test @@ -533,6 +533,7 @@ insert into test.sanity values ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_CACHE_SIZE"), ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_SIZE"), ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOG_STMT_CACHE_SIZE"), + ("JUNK: GLOBAL-ONLY", "I_S.SESSION_VARIABLES", "MAX_BINLOGS_SIZE"),
I think this is a left-over from previous naming, should probably be "MAX_BINLOG_TOTAL_SIZE ?
+FLUSH LOGS; +FLUSH LOGS; +FLUSH LOGS; +source include/show_binary_logs.inc; +show status like "binlog_disk_use";
I think you will need --source include/wait_for_binlog_checkpoint.inc after each of these FLUSH LOGS. Otherwise random delay of binlog checkpoint could cause the binlog_disk_use value to fluctuate and get rare random test failures. (Or alternatively consider if you could omit the select of binlog_disk_use completely, since it will probably differ between versions as event formats change.) Adding a small sleep in binlog_background_thread() just before calling mysql_bin_log.mark_xid_done() can usually reproduce the random failures from delayed binlog checkpoint.
+INSERT INTO t1 VALUES (0,repeat("a",3000)); +show status like "binlog_disk_use"; +Variable_name Value +Binlog_disk_use 3848 +# First binary should be binary.000004 +show binary logs; +Log_name File_size +binary.000004 # +INSERT INTO t1 VALUES (2,repeat("b",10)); +# First binary should be binary.000004 +show binary logs; +Log_name File_size +binary.000004 # +binary.000005 #
Here, I don't understand why binary.000004 is not purged. The binary.000004 contains the large transaction 'repeat("a", 3000)', so the binlog total size is too large. And the transaction 'repeat("n",10)' caused a binlog rotation. So why does that not cause binary.000004 to be purged due to --max-binlog-total-size=1500 ?
+show status like "binlog_disk_use"; +Variable_name Value +Binlog_disk_use 3647 +INSERT INTO t1 VALUES (7,repeat("g",3000)); +# binary.000001 should be deleted now +show binary logs; +Log_name File_size +binary.000002 # +binary.000003 #
Similarly here, why binary.000002 is not purged?
+INSERT INTO t1 VALUES (4,repeat("d",3000)); +--echo # First binary should be binary.000011 +source include/show_binary_logs.inc; + +RESET MASTER;
Idea: Here you could (or ask one of the other devs) add a test that the SHOW STATUS LIKE "binlog_disk_use" is equal to the size of the files on disk, to check that the running calculation of disk use doesn't miss the odd byte or two.
+ if ((longlong) (binlog_space_total + binlog_pos - found_space <= + binlog_space_limit)) + break; // Found enough
The (longlong) cast here looks misplaced. It's casting the result of the (X <= Y) comparison. I think the cast can just be removed, all the variables seem to be ulonglong type.
+static bool waiting_for_slave_to_change_binlog= 0; +static ulonglong purge_sending_new_binlog_file= 0;
I think a short comment here on these variables would be good, explaining their purpose. And same on the declaration of Atomic_counter<ulonglong> sending_new_binlog_file.
+ DBUG_ASSERT(!is_relay_log || binlog_xid_count_list.is_empty()); + if (!is_relay_log) { - I_List_iterator<xid_count_per_binlog> it(binlog_xid_count_list); - while ((b= it++) && - 0 != strncmp(log_file_name_arg+dirname_length(log_file_name_arg), - b->binlog_name, b->binlog_name_len)) - ; + xid_count_per_binlog *b; + mysql_mutex_lock(&LOCK_xid_list);
Good catch that we were uselessly doing xid_count operations on the relay log, thanks!
+int MYSQL_BIN_LOG::count_binlog_space() +{ + int error; + LOG_INFO log_info; + DBUG_ENTER("count_binlog_space"); + if (is_relay_log || !binlog_space_limit) + DBUG_RETURN(0);
Maybe this if (is_relay_log || !binlog_space_limit) could be an assert? We probably should not be calling this function at all in that case. It looks like that binlog_space_limit is already checked in all callers, not sure about relay log. (A small point, just something to consider as you like).
@@ -7608,7 +7807,8 @@ int MYSQL_BIN_LOG::rotate(bool force_rotate, bool* check_purge) @retval nonzero - error in rotating routine. */ -void MYSQL_BIN_LOG::purge() + +void MYSQL_BIN_LOG::purge(bool all)
Would be good with a small comment here explaining the meaning of the new parameter `all`. Minor stuff: Commit message:
Binlog file file sizes are checked on startup
Typo (s/file file/file/).
when updating the binary log on on binary log rotation
Typo (s/on on/and on/ ?)
Without this option max-binlog-total-size would potentially deleted binlogs needed by slaves on server startup or when a slave disconnects
Typo s/deleted/delete/.
+static Sys_var_ulonglong Sys_max_binlog_total_size( + "max_binlog_total_size", + "Maximum space to use for all binary logs. Extra logs are deleted on " + "server start, log rotation, FLUSH LOGGS or when writing to binlog. " + "Default is 0, which means no size restrictions." + "See also slave_connections_needed_for_purge",
Missing space at end of line (s/restrictions."/restrictions. "/).
+static Sys_var_ulong Sys_slave_connections_needed_for_purge( + "slave_connections_needed_for_purge", + "Number of minimum connected slaves needed for automatic binary " + "log purge with max_binlog_total_size, binlog_expire_logs_seconds " + "or binlog_expire_logs_days.",
Suggestion for better wording: "Minimum number of connected slaves required for automatic binary " "log purge with max_binlog_total_size, binlog_expire_logs_seconds " "or binlog_expire_logs_days."