Re: [Maria-developers] 12d5c4f2813: Fixed problems with S3 after "DROP TABLE FORCE" changes
Hi, Monty! On Oct 16, Michael Widenius wrote:
revision-id: 12d5c4f2813 (mariadb-10.5.4-190-g12d5c4f2813) parent(s): afa5f735f30 author: Michael Widenius <michael.widenius@gmail.com> committer: Michael Widenius <michael.widenius@gmail.com> timestamp: 2020-09-13 17:48:15 +0300 message:
Fixed problems with S3 after "DROP TABLE FORCE" changes
MDEV-23691 S3 storage engine: delayed slave can drop the table
please put the MDEV on the first line of the commit comment
This commit also fixes all failing replication S3 tests.
A slave is delayed if it is trying to execute replicated queries on a table that is already converted to S3 by the master.
What do you mean, a slave is delayed?
Fixes for replication events on S3 tables for delayed slaves: - INSERT and INSERT ... SELECT and CREATE TABLE are ignored but written to the binary log. UPDATE & DELETE will be fixed in a future commit.
Other things: - On slaves with --s3-slave-ignore-updates set, allow S3 tables to be opened in read-write mode. This was done to be able to ignore-but-replicate queries like insert. Without this change any open of an S3 table failed with 'Table is read only' which is too early to be able to replicate the original query. - Errors are now printed if handler::extra() call fails in wait_while_tables_are_used() - Error message for row changes are changed from HA_ERR_WRONG_COMMAND to HA_ERR_TABLE_READONLY - Disable some maria_extra() calls for S3 tables. This could cause S3 tables to fail in some cases. - Added missing thr_lock_delete() to ma_open() in case of failure
diff --git a/sql/log.cc b/sql/log.cc index 2a887a68606..9c656387e90 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -2151,6 +2151,12 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
DBUG_RETURN(0); } + /* + This is true if we are doing an alter table that is replicated as + CREATE TABLE ... SELECT + */ + if (thd->variables.option_bits & OPTION_BIN_COMMIT_OFF) + DBUG_RETURN(0);
why binlog_commit is even called here?
DBUG_PRINT("debug", ("all: %d, in_transaction: %s, all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s", diff --git a/sql/sql_class.cc b/sql/sql_class.cc index f4cbda5490c..7c103756566 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -7476,6 +7476,34 @@ int THD::binlog_query(THD::enum_binlog_query_type qtype, char const *query_arg, DBUG_RETURN(0); }
+ +/** + Binlog current query as a statement if binary log is open + + @retval false ok + @retval true error +*/ + + +bool THD::binlog_current_query() +{ + if (!mysql_bin_log.is_open()) + return 0; + + /* + The following is needed if the query was "ignored" by a shared distributed + engine, in which case decide_logging_format() will set the binlog filter + */ + reset_binlog_local_stmt_filter(); + clear_binlog_local_stmt_filter(); + return binlog_query(THD::STMT_QUERY_TYPE, query(), query_length(), + /* is_trans */ FALSE, + /* direct */ FALSE, + /* suppress_use */ FALSE, + /* Error */ 0) > 0; +}
you created a generally looking function with a rather special behavior. Please rename it to make sure it's not a general purpose helper. For example, call it binlog_current_query_unfiltered() or force_binlog_current_query(). And the function comment should say Force-binlog current query as a statement ignoring the binlog filter setting
+ + void THD::wait_for_wakeup_ready() { diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc index 4ad4c478937..5f59e064a35 100644 --- a/sql/sql_insert.cc +++ b/sql/sql_insert.cc @@ -724,6 +724,7 @@ bool mysql_insert(THD *thd, TABLE_LIST *table_list, Item *unused_conds= 0; DBUG_ENTER("mysql_insert");
+ bzero((char*) &info,sizeof(info));
why did you move it up?
create_explain_query(thd->lex, thd->mem_root); /* Upgrade lock type if the requested lock is incompatible with @@ -764,16 +765,27 @@ bool mysql_insert(THD *thd, TABLE_LIST *table_list, DBUG_RETURN(TRUE); value_count= values->elements;
- if (mysql_prepare_insert(thd, table_list, table, fields, values, - update_fields, update_values, duplic, - &unused_conds, FALSE)) + if ((res= mysql_prepare_insert(thd, table_list, table, fields, values, + update_fields, update_values, duplic, + &unused_conds, FALSE))) + { + retval= thd->is_error(); + if (res < 0) + { + /* + Insert should be ignored but we have to log the query in statement + format in the binary log + */ + res= thd->binlog_current_query();
abort label doesn't look at `res` it returns `retval`. So I don't think you need `retval= thd->is_error()` at all, and instead you need if (res < 0) retval= thd->binlog_current_query();
+ } goto abort; + } + /* mysql_prepare_insert sets table_list->table if it was not set */ + table= table_list->table;
/* Prepares LEX::returing_list if it is not empty */ if (returning) result->prepare(returning->item_list, NULL); - /* mysql_prepare_insert sets table_list->table if it was not set */ - table= table_list->table;
why did you move this up?
context= &thd->lex->first_select_lex()->context; /* @@ -1575,21 +1587,29 @@ bool mysql_prepare_insert(THD *thd, TABLE_LIST *table_list, DBUG_ASSERT (!select_insert || !values);
if (mysql_handle_derived(thd->lex, DT_INIT)) DBUG_RETURN(1); if (table_list->handle_derived(thd->lex, DT_MERGE_FOR_INSERT)) DBUG_RETURN(1); if (thd->lex->handle_list_of_derived(table_list, DT_PREPARE)) DBUG_RETURN(1);
if (duplic == DUP_UPDATE) { /* it should be allocated before Item::fix_fields() */ if (table_list->set_insert_values(thd->mem_root)) DBUG_RETURN(1); + } + + if (!table) + table= table_list->table;
As far as I can see, table in mysql_prepare_insert is *always* either table_list->table or NULL. At least in 10.5. So I'd rather remove the table argument completely and let mysql_prepare_insert always use table_list->table.
+ + if (table->file->check_if_updates_are_ignored("INSERT")) + { + DBUG_RETURN(-1); }
if (mysql_prepare_insert_check_table(thd, table_list, fields, select_insert)) DBUG_RETURN(1);
/* Prepare the fields in the statement. */ if (values) diff --git a/sql/sql_table.cc b/sql/sql_table.cc index 15d190c3139..2843496f783 100644 --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -5206,7 +5215,19 @@ int create_table_impl(THD *thd, const LEX_CSTRING &orig_db, goto err; } else if (options.if_not_exists()) + { + /* + We never come here as part of normal create table as table existance + is checked in open_and_lock_tables(). We may come here as part of + ALTER TABLE + */
How so? There is no ALTER IF NOT EXISTS. And this is the old code, so it seems to be for CREATE TABLE IF NOT EXISTS.
+ + /* Log CREATE IF NOT EXISTS on slave for distributed engines */ + if (thd->slave_thread && (exists_hton && exists_hton->flags & + HTON_IGNORE_UPDATES)) + thd->log_current_statement= 1; goto warn;
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi! On Sun, Oct 18, 2020 at 3:38 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Monty!
Fixed problems with S3 after "DROP TABLE FORCE" changes
MDEV-23691 S3 storage engine: delayed slave can drop the table
please put the MDEV on the first line of the commit comment
I prefer to have the first row readable and understandable for all possible readers, which the MDEV doesn't do. I can move the MDEV to the first row in this case, but if the commit touches several MDEV's, I will list these separately and keep the first row as a summary of what the commit is for.
This commit also fixes all failing replication S3 tests.
A slave is delayed if it is trying to execute replicated queries on a table that is already converted to S3 by the master.
What do you mean, a slave is delayed?
The above explains it, as those the MDEV. It means that the slave is not up to date with the master (either intentionally with stop slave or because the slave still has local not-s3 tables as it has not yet seen ALTER TABLE ... engine=s3). The assumption is also that anyone reading the commit knows what 'slave' and 'master' and 'delayed slave' means <cut>
diff --git a/sql/log.cc b/sql/log.cc index 2a887a68606..9c656387e90 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -2151,6 +2151,12 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
DBUG_RETURN(0); } + /* + This is true if we are doing an alter table that is replicated as + CREATE TABLE ... SELECT + */ + if (thd->variables.option_bits & OPTION_BIN_COMMIT_OFF) + DBUG_RETURN(0);
why binlog_commit is even called here?
Binlog_drop_table() and binlog_create_table() calls binlog_query(), which always does a commit for any query events. <cut>
+ The following is needed if the query was "ignored" by a shared distributed + engine, in which case decide_logging_format() will set the binlog filter + */ + reset_binlog_local_stmt_filter(); + clear_binlog_local_stmt_filter(); + return binlog_query(THD::STMT_QUERY_TYPE, query(), query_length(), + /* is_trans */ FALSE, + /* direct */ FALSE, + /* suppress_use */ FALSE, + /* Error */ 0) > 0; +}
you created a generally looking function with a rather special behavior. Please rename it to make sure it's not a general purpose helper. For example, call it binlog_current_query_unfiltered() or force_binlog_current_query(). And the function comment should say
Force-binlog current query as a statement ignoring the binlog filter setting
First a note. The function can be used anywhere where one would binlog the current query. The ..._local_stmt_filter() functions are only there to force the query to be logged statement based even if decide_logging_format(), used by some statement, would have wanted to do things differently. I have updated the function comment. I am not sure how many will understand the word 'unfiltered' I have expanded the function comment to explain this in more detail and also renamed the function binlog_current_query_unfiltered().
void THD::wait_for_wakeup_ready() { diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc index 4ad4c478937..5f59e064a35 100644 --- a/sql/sql_insert.cc +++ b/sql/sql_insert.cc @@ -724,6 +724,7 @@ bool mysql_insert(THD *thd, TABLE_LIST *table_list, Item *unused_conds= 0; DBUG_ENTER("mysql_insert");
+ bzero((char*) &info,sizeof(info));
why did you move it up?
Proper place to have it at start. Not directly related to this patch.
create_explain_query(thd->lex, thd->mem_root); /* Upgrade lock type if the requested lock is incompatible with @@ -764,16 +765,27 @@ bool mysql_insert(THD *thd, TABLE_LIST *table_list, DBUG_RETURN(TRUE); value_count= values->elements;
- if (mysql_prepare_insert(thd, table_list, table, fields, values, - update_fields, update_values, duplic, - &unused_conds, FALSE)) + if ((res= mysql_prepare_insert(thd, table_list, table, fields, values, + update_fields, update_values, duplic, + &unused_conds, FALSE))) + { + retval= thd->is_error(); + if (res < 0) + { + /* + Insert should be ignored but we have to log the query in statement + format in the binary log + */ + res= thd->binlog_current_query();
abort label doesn't look at `res` it returns `retval`.
Yes, this is correct. This is why 'retval' is set to is_error() above. res is only used to know if the binlog
So I don't think you need `retval= thd->is_error()` at all, and instead you need
if (res < 0) retval= thd->binlog_current_query();
Actually this is also wrong as it would reset 'result' if it was set by is_error() ;) I have changed it to: if (thd->binlog_current_query_unfiltered()) retval= 1;
+ } goto abort; + } + /* mysql_prepare_insert sets table_list->table if it was not set */ + table= table_list->table;
/* Prepares LEX::returing_list if it is not empty */ if (returning) result->prepare(returning->item_list, NULL); - /* mysql_prepare_insert sets table_list->table if it was not set */ - table= table_list->table;
why did you move this up?
I got crashes is result->prepared and got confused in the debugger that 'table' was not set. Better to set the table early instead of having it undefined for a longer amount of time.
context= &thd->lex->first_select_lex()->context; /* @@ -1575,21 +1587,29 @@ bool mysql_prepare_insert(THD *thd, TABLE_LIST *table_list, DBUG_ASSERT (!select_insert || !values);
if (mysql_handle_derived(thd->lex, DT_INIT)) DBUG_RETURN(1); if (table_list->handle_derived(thd->lex, DT_MERGE_FOR_INSERT)) DBUG_RETURN(1); if (thd->lex->handle_list_of_derived(table_list, DT_PREPARE)) DBUG_RETURN(1);
if (duplic == DUP_UPDATE) { /* it should be allocated before Item::fix_fields() */ if (table_list->set_insert_values(thd->mem_root)) DBUG_RETURN(1); + } + + if (!table) + table= table_list->table;
As far as I can see, table in mysql_prepare_insert is *always* either table_list->table or NULL. At least in 10.5.
So I'd rather remove the table argument completely and let mysql_prepare_insert always use table_list->table.
Agree that it's safe to remove the table argument. Will do that.
--- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -5206,7 +5215,19 @@ int create_table_impl(THD *thd, const LEX_CSTRING &orig_db, goto err; } else if (options.if_not_exists()) + { + /* + We never come here as part of normal create table as table existance + is checked in open_and_lock_tables(). We may come here as part of + ALTER TABLE + */
How so? There is no ALTER IF NOT EXISTS. And this is the old code, so it seems to be for CREATE TABLE IF NOT EXISTS.
Alter table s3_table engine=innodb table, could come here. I added the comment to clarify this. Regards, Monty
participants (2)
-
Michael Widenius
-
Sergei Golubchik