commit 7a50135f3f908380e2dfb45e3f95c9fb10c01d62 Author: Andrei <andrei.elkin@mariadb.com> Date: Mon Jul 15 17:50:37 2024 +0300
MDEV-15393 part 2. Fixes to 'gtid_slave_pos duplicate key errors after mysqldump restore' MDEV-34615 mysqldump wipes off pre-existing gtid slave state
Hi Andrei, here my review of the patch, with a particular important issue to not break existing use cases and backwards compatibility, as well as some general issues/comments.
When mysqldump run on mysql system database it generates inserts sql commands into mysql.gtid_slave_pos. After running the backup script those
Here and below, I suggest better wording for text that is hard to understand or using wrong/unusual grammar. "When mysqldump is run to dump the `mysql` system database, it generates INSERT statements into the table mysql.gtid_slave_pos"
inserts did not produce an expected gtid state on slave.
"did not produce the exspected gtid state"
In particular the maximum of mysql.gtid_slave_pos.sub_id did not make into rpl_global_gtid_slave_state.last_sub_id a memory object that the table persists.
I think you mean here "an in-memory object that is supposed to match the current state of the table".
And that was regardless of whether --gtid option in or out.
"whether the --gtid option was specified by the user or not".
The fixes ensure the insert block of the dump script is followed with a "summing-up" SET @global.gtid_slave_pos assignment. To address MDEV-34615 the assignment makes sure a possibly pre-existing gtid state (e.g reflecting another domain) remains as a sub-state upon the dump installation.
This change is not appropriate, as it breaks backwards compatibility and breaks a common use case. Certainly not in a GA release such as 10.5! The most common case is the provisioning of a slave with a complete dump from a master. In this case, it is required that the GTID position is set correctly across all domains. Merging the existing and new GTID position as attempted in this patch will break the provisioning of the slave. The existing supported way of not overwriting the slave's old @@GLOBAL.gtid_slave_pos is to use --master-data=2 --gtid. This will output the GTID position in the dump commented out, and the user can themselves (manually or through a script of their own) merge the positions as required for their setup and application. If you think it is important to have a facility in mysqldump to do merging automatically, then you should implement this as a new option. Such new option would need to go to the next development version (11.6 or 11.7 I suppose?). It could be --master-data = 5 or 6 for example (idea being bit 2 means "merge old and new position), or maybe some other way that doesn't require "magic" constants. I am commenting below on some problems with the implementation of the merging of old and new position. But note that these comments apply to a different patch that implement this as a new option. These changes must not be made to the existing default behaviour.
diff --git a/client/mysqldump.c b/client/mysqldump.c index 1c9d92d6e09..0be890e2528 100644 --- a/client/mysqldump.c +++ b/client/mysqldump.c @@ -5985,8 +5985,13 @@ static int dump_selected_tables(char *db, char **table_names, int tables) } /* dump_selected_tables */
+const char fmt_gtid_pos[] = "%sSET GLOBAL gtid_slave_pos=concat('%s', " + "if(length(@@global.gtid_slave_pos) = 0, '', ','), " + "@@global.gtid_slave_pos);\n";
Two problems with this: 1. This will result in a syntax error if the new position is empty and the old position is not (eg. SET GLOBAL gtid_slave_pos=',0-1-2'). 2. Unless I'm missing something obvious, this doesn't actually work to merge two different GTIDs with the same domain! It produces an error ER_DUPLICATE_GTID_DOMAIN. It seems there is a lack of coverage in the associated test case, if the actual merging of domains isn't even tested? SELECT @@GLOBAL.gtid_slave_pos; @@GLOBAL.gtid_slave_pos 0-1-2 SET GLOBAL gtid_slave_pos= CONCAT('0-1-10', ',', @@GLOBAL.gtid_slave_pos); rpl.tmp2 'mix' [ fail ] mysqltest: At line 12: query 'SET GLOBAL gtid_slave_pos= CONCAT('0-1-10', ',', @@GLOBAL.gtid_slave_pos)' failed: ER_DUPLICATE_GTID_DOMAIN (1943): GTID 0-1-2 and 0-1-10 conflict (duplicate domain id 0)
@@ -6054,8 +6059,11 @@ static int do_show_master_status(MYSQL *mysql_con, int consistent_binlog_pos, fprintf(md_result_file, "%sCHANGE MASTER TO MASTER_USE_GTID=slave_pos;\n", comment_prefix); - fprintf(md_result_file, - "%sSET GLOBAL gtid_slave_pos='%s';\n", + /* + Defer print of SET gtid_slave_pos until after its placeholder + table is guaranteed to have been dumped. + */ + sprintf(set_gtid_pos, fmt_gtid_pos,
set_gtid_pos is a fixed size buffer, if I understand the patch correctly. This seems to mean that you have a buffer overflow here if the GTID position has too many domains? Note that it is not appropriate to simply use snprintf() instead, as this will merely truncate the output and give silent data corruption (or in this case probably a strange syntax error). Instead at least a proper error should be given (better to support arbitrary GTID position of course, but that might be a separate task if this is already an existing limitation of mysqldump, I did not check).
@@ -6181,10 +6189,8 @@ static int do_show_slave_status(MYSQL *mysql_con, int use_gtid, mysql_free_result(slave); return 1; } - print_comment(md_result_file, 0, - "-- GTID position to start replication:\n"); - fprintf(md_result_file, "%sSET GLOBAL gtid_slave_pos='%s';\n", - gtid_comment_prefix, gtid_pos); + /* similarly to do_show_master_status */ + sprintf(set_gtid_pos, fmt_gtid_pos, gtid_comment_prefix, gtid_pos);
I assume this is also a potential buffer overflow, didn't check closely.
@@ -6913,6 +6932,11 @@ int main(int argc, char **argv) int exit_code; int consistent_binlog_pos= 0; int have_mariadb_gtid= 0; + /* + to hold SET @@global.gtid_slave_pos which is deferred to print + until the function epilogue. + */ + char set_gtid_pos[3 + sizeof(fmt_gtid_pos) + MAX_GTID_LENGTH]= {0};
- Kristian.