
Hi Monty,
From: Monty <monty@mariadb.org>
The purpose of this commit is to ensure that creation and changes of temporary tables are properly and predicable logged to the binary log. It also fixes some bugs where ROW logging was used in MIXED mode, when STATEMENT would be a better (and expected) choice.
Here is my review of the patch for MDEV-36099 predictable binlog format for temporary tables: Overall this seems like solid work, thanks! As Elena also remarked, this does significant changes to a very tricky and bug-ridden area of replication, that of temporary tables, this is scary stuff. But the patch does look good and relatively simple. A main change with this patch is that now all use of temporary tables will by default be binlogged in row mode. I think this is a reasonable change. There are still remaining problems with temporary tables in statement-based mode; and it also helps with parallel replication, which cannot run in parallel statement-logged temporary tables. And you provided --create-temporary-table-binlog-formats as a way to enable backwards-compatible behaviour. I think it is important to mention this prominently in the release notes. There will invariably be some users that manipulate a large number of rows with temporary tables, eg. BEGIN; CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM large_table1 JOIN large_table2 ON ... INSERT INTO other_table SELECT b, c FROM tmp WHERE a <100; DROP TEMPORARY TABLE tmp; COMMIT; This will change from binlogging a few hundred bytes of Query events to potentially many megabytes of row events, which could cause problems in some cases. I think the change is still justified from the benefits it gives and the relative few cases where this will cause serious problems, but it is still important to try to document it as well as possible. I read the code part of the patch through in detail, but I did not spot anything in it that looks wrong. There may be things missing from the patch that are hard to spot in a review, I think that's unavoidable in a change like this. But I did try to think of possible issues without finding anything, and the patch does look quite simple compared to what it is doing. So looks good I think. Just a few questions on details I did not understand, and some minor suggestions for changes. Questions:
diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc index c2f842aea56..3d1df5ac42f 100644 --- a/sql/sql_parse.cc +++ b/sql/sql_parse.cc @@ -531,6 +531,7 @@ void init_update_queries(void) CF_SCHEMA_CHANGE; sql_command_flags[SQLCOM_CREATE_SEQUENCE]= (CF_CHANGES_DATA | CF_REEXECUTION_FRAGILE | + CF_FORCE_ORIGINAL_BINLOG_FORMAT | CF_AUTO_COMMIT_TRANS | CF_SCHEMA_CHANGE);
I did not get the reason for this change. I assume some bug fix found during test case writing?
--- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -817,6 +817,7 @@ typedef struct system_variables /* Flags for slow log filtering */ ulong log_slow_rate_limit; ulong binlog_format; ///< binlog format for this thd (see enum_binlog_format) + ulong create_temporary_table_binlog_formats;
Is there any way to make this take less space than 8 bytes (on 64-bit)? It goes into every THD, and only two bits are used.
+ + Temporary tables will be logged only on CREATE in STMT format + or on INSERT if all changes to the table is in the binlog. */ if ((WSREP_EMULATE_BINLOG(thd) || mysql_bin_log.is_open()) && - (likely(!error) || thd->transaction->stmt.modified_non_trans_table || - thd->log_current_statement())) + (table->s->using_binlog() || + ((in_create_table && + (!table->s->tmp_table || thd->binlog_create_tmp_table())))) && + (likely(!error) || + (!in_create_table && + (thd->transaction->stmt.modified_non_trans_table || + thd->log_current_statement()))))
This conditional is now very hard to understand. I think I managed to understand all parts of it, except the part "thd->log_current_statement()", but that is unchanged with your patch. This if () would now benefit from being split into multiple parts, each of which explained in a comment.
diff --git a/mysql-test/main/statistics.result b/mysql-test/main/statistics.result index bb6469c46da..ebd9cb7f36e 100644 --- a/mysql-test/main/statistics.result +++ b/mysql-test/main/statistics.result @@ -1278,7 +1278,7 @@ Table Op Msg_type Msg_text mysql.column_stats analyze error Invalid argument ANALYZE TABLE mysql.column_stats; Table Op Msg_type Msg_text -mysql.column_stats analyze status OK +mysql.column_stats analyze status Table is already up to date SELECT * FROM mysql.table_stats; db_name table_name cardinality SELECT * FROM mysql.column_stats;
I did not understand why this .result change happens after your patch? Minor things:
diff --git a/mysql-test/suite/rpl/t/create_or_replace.inc b/mysql-test/suite/rpl/t/create_or_replace.inc index e8fa95cb94a..b4e91d7b412 100644 --- a/mysql-test/suite/rpl/t/create_or_replace.inc +++ b/mysql-test/suite/rpl/t/create_or_replace.inc @@ -4,7 +4,24 @@ --let $rpl_topology=1->2 --source include/rpl_init.inc
-# Create help tables +select @@binlog_format, @@create_temporary_table_binlog_formats; + +# Copy create_temporary_table_binlog_formats from master to slave +# This is done to get same results as with older versions of MariaDB. +# The slave will work even if this is not done. However in mixed +# format on slave temporary tables would not be logged to in the +# slaves binary log +let $format=`select @@create_temporary_table_binlog_formats`; +connection server_2; +--eval set @@global.create_temporary_table_binlog_formats='$format' +# Ensure that the slave threads uses the new values. +stop slave; +start slave;
Here, use --source include/stop_slave.inc and --source include/start_slave.inc. For consistent with other test cases.
- Rename of not binlogged temporary tables where binlogged to binary log which caused replication to stop.
s/where/were/
+ /* + Mark if query was logged as statement. Used to check it tmp table changes + are logged. + */ + bool query_binlogged_as_stmt;
s/check it table/check if table/ + /* + This is set for all normal tables and for temporary tables that has + the CREATE binary logged. + + 0 table table is not in binary log (not logged temporary table) + 1 table create was logged (normal table or logged temp table) + 2 table create was logged but not all changes are in the binary log. + ROW LOGGING will be used for the table. s/that has the CREATE/that have the CREATE/ s/table table is not/table is not/
@@ -794,6 +821,20 @@ void THD::mark_tmp_table_as_free_for_reuse(TABLE *table)
DBUG_ASSERT(table->s->tmp_table);
+ /* + Ensure that table changes where either binary logged or the table + is marked as not up to date. + */
s/where/were/ Thanks, Monty! Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org> - Kristian.