Hi! On Wed, May 12, 2021 at 3:18 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Michael!
On May 12, Michael Widenius wrote:
revision-id: c80f867b0df (mariadb-10.6.0-78-gc80f867b0df) parent(s): eebebe8ef75 author: Michael Widenius <michael.widenius@gmail.com> committer: Michael Widenius <michael.widenius@gmail.com> timestamp: 2021-05-04 21:09:11 +0300 message:
MDEV-23844 Atomic DROP TABLE ... diff --git a/sql/sql_view.cc b/sql/sql_view.cc index e6d726b30d7..ff8aa55adef 100644 --- a/sql/sql_view.cc +++ b/sql/sql_view.cc @@ -1861,8 +1868,18 @@ bool mysql_drop_view(THD *thd, TABLE_LIST *views, enum_drop_mode drop_mode) not_exists_count++; continue; } + if (!first_table++)
yuck. so your "first_table" actually means "not_first_view", because when it's FALSE, it's the first view.
why not to rename it to "not_first_view" ?
I have renamed it to view_count. I prefer to use a counter as it is easer to set and test at the same time. Also, when debugging it also helps to see where one is.
+ { + if (ddl_log_drop_view_init(thd, &ddl_log_state)) + DBUG_RETURN(TRUE);
what about a crash here? the first entry is written, the second is not.
The init creates the bases for a set of views to drop. We do init once and then we only need to do ddl_log_drop_view for the rest. Most ddl_log operations has only one call. Only those that can have multiple operations, like drop or rename has an init call.
+ } + if (ddl_log_drop_view(thd, &ddl_log_state, &cpath, &view->db, + &view->table_name)) + DBUG_RETURN(TRUE);
<cut>
diff --git a/sql/log.cc b/sql/log.cc index 6e0a9706ec8..dba999054e4 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -10471,16 +10483,16 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, bool last_gtid_standalone= false; bool last_gtid_valid= false; #endif + DBUG_ENTER("TC_LOG_BINLOG::recover");
if (! fdle->is_valid() || - (do_xa && my_hash_init(key_memory_binlog_recover_exec, &xids, - &my_charset_bin, TC_LOG_PAGE_SIZE/3, 0, - sizeof(my_xid), 0, 0, MYF(0)))) + (my_hash_init(key_memory_binlog_recover_exec, &xids, + &my_charset_bin, TC_LOG_PAGE_SIZE/3, 0, + sizeof(my_xid), 0, 0, MYF(0)))) goto err1;
- if (do_xa) - init_alloc_root(key_memory_binlog_recover_exec, &mem_root, - TC_LOG_PAGE_SIZE, TC_LOG_PAGE_SIZE, MYF(0)); + init_alloc_root(key_memory_binlog_recover_exec, &mem_root, + TC_LOG_PAGE_SIZE, TC_LOG_PAGE_SIZE, MYF(0));
I don't think these two changes are correct. do_xa is TRUE on crash recovery and FALSE on gtid state recovery on 5.5->10.0 upgade or if gtid state file was deleted.
The problem is that do_xa is not always true on crash recovery. For example if the following is not true in do_binlog_recovery: if (ev->flags & LOG_EVENT_BINLOG_IN_USE_F) It will execute binglog recovery later with do_xa_recovery == false trough the following code inTC_LOG_BINLOG::open(const char *opt_name) if (!binlog_state_recover_done) { binlog_state_recover_done= true; if (do_binlog_recovery(opt_bin_logname, false)) DBUG_RETURN(1); } Here is a trace: #0 MYSQL_BIN_LOG::recover (this=0x2783d80 <mysql_bin_log>, linfo=0x7fffffff8590, last_log_name=0x7fffffff8100 "./master-bin.000001", first_log=0x7fffffff8410, fdle=0x35685a8, do_xa=false) at /my/maria-10.6/sql/log.cc:10474 #1 0x0000000000dc4c10 in MYSQL_BIN_LOG::do_binlog_recovery (this=0x2783d80 <mysql_bin_log>, opt_name=0x319d098 "master-bin", do_xa_recovery=false) at /my/maria-10.6/sql/log.cc:10779 #2 0x0000000000db0c90 in MYSQL_BIN_LOG::open (this=0x2783d80 <mysql_bin_log>, log_name=0x319d098 "master-bin", new_name=0x0, next_log_number=0, io_cache_type_arg=WRITE_CACHE, max_size_arg=1073741824, null_created_arg=false, need_mutex=true) at /my/maria-10.6/sql/log.cc:3617 #3 0x00000000007c0d24 in init_server_components () at We also have to be able to collect xa events if someone has deleted the gtid state file for ddl_recovery to work.
There is no point to initialize the xid hash or mem_root, as it's not a crash recovery and there can be no incomplete transactions or ddl statements.
Yes, I've seen the commit comment:
- TC_LOG_BINLOG::recover() was changed to always collect Xid's from the binary log and always call ddl_log_close_binlogged_events(). This was needed to be able to collect DROP TABLE events with embedded Xid's (used by ddl log).
but I cannot understand how it applies here.
The issue is that because of ddl log, ::recover must ALWAYS collect binlog 'xid's to be able to do ddl log recovery at all. You can verify this by reversing the patch and do: mtr atomic.create_view This will fail with a different result that shows that ddl recovery was reverting things wrongly because it could not find the xid's in the binlog (as they where not collected)
diff --git a/sql/sql_table.cc b/sql/sql_table.cc
index 7a88ffdfd1b..0a83396ae14 100644 --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -1363,6 +1373,17 @@ int mysql_rm_table_no_locks(THD *thd, TABLE_LIST *tables, bool if_exists,
thd->replication_flags= 0; was_view= table_type == TABLE_TYPE_VIEW; + + if (!first_table++)
first_table -> not_first_table
changed to table_count
+ { + LEX_CSTRING comment= {comment_start, (size_t) comment_len}; + if (ddl_log_drop_table_init(thd, &ddl_log_state, &comment)) + { + error= 1; + goto err; + } + } + if ((table_type == TABLE_TYPE_UNKNOWN) || (was_view && !drop_view) || (table_type != TABLE_TYPE_SEQUENCE && drop_sequence)) { @@ -1565,6 +1611,7 @@ int mysql_rm_table_no_locks(THD *thd, TABLE_LIST *tables, bool if_exists, table_name.str, (uint)table_name.length); mysql_audit_drop_table(thd, table); } + ddl_log_update_phase(&ddl_log_state, DDL_DROP_PHASE_COLLECT);
better call it DDL_DROP_PHASE_BINLOG. It would've saved me ~10 minutes reading the code back and forth, as ""DDL_DROP_PHASE_BINLOG" immediately explains what this phase is for.
Will do. Not sure where PHASE_COLLECT come from...
if (!dont_log_query && (!error || table_dropped || non_existing_table_error(error))) diff --git a/sql/ddl_log.h b/sql/ddl_log.h index 316e6708f22..b843404fc3c 100644 --- a/sql/ddl_log.h +++ b/sql/ddl_log.h @@ -97,6 +101,14 @@ enum enum_ddl_log_rename_table_phase { DDL_RENAME_PHASE_TABLE, };
+enum enum_ddl_log_drop_table_phase { + DDL_DROP_PHASE_TABLE=0, + DDL_DROP_PHASE_TRIGGER,
no phase for deleting data from stat tables?
+ DDL_DROP_PHASE_COLLECT, + DDL_DROP_PHASE_RESET, /* Reset found list of dropped tables */
this one is unused (also unused in further patches, I've checked)
No it isn't. case DDL_DROP_PHASE_COLLECT: .... if (increment_phase(entry_pos)) case DDL_DROP_PHASE_RESET: The increment_phase changes current phase on disk to the next one that is PHASE_RESET and it will be used if there is a crash in the next phase.
+ DDL_DROP_PHASE_END +}; + /* Setting ddl_log_entry.phase to this has the same effect as setting action_type to DDL_IGNORE_LOG_ENTRY_CODE diff --git a/sql/ddl_log.cc b/sql/ddl_log.cc index 777dfdddbc7..6145b2f7318 100644 --- a/sql/ddl_log.cc +++ b/sql/ddl_log.cc @@ -109,6 +112,7 @@ struct st_global_ddl_log };
st_global_ddl_log global_ddl_log; +String ddl_drop_query; // Used during startup recovery
static
Fixed in later commits where all ddl log variables are moved to recovery_state <cut>
+ case DDL_LOG_DROP_TABLE_ACTION: + { + LEX_CSTRING db, table, path; + db= ddl_log_entry->db; + table= ddl_log_entry->name; + /* Note that path is without .frm extension */ + path= ddl_log_entry->tmp_name; + + switch (ddl_log_entry->phase) { + case DDL_DROP_PHASE_TABLE: + if (hton) + { + if ((error= hton->drop_table(hton, path.str))) + { + if (!non_existing_table_error(error)) + break; + error= -1; + } + } + else + error= ha_delete_table_force(thd, path.str, &db, &table); + if (error <= 0) + { + /* Not found or already deleted. Delete .frm if it exists */ + strxnmov(to_path, sizeof(to_path)-1, path.str, reg_ext, NullS); + mysql_file_delete(key_file_frm, to_path, MYF(MY_WME|MY_IGNORE_ENOENT));
this should rather be inside the previous if(). no need to do it for ha_delete_table_force(), as it doesn't leave any traces and removes all remnants of a table.
ha_delete_table_force does not delete the .frm file.
+ } + if (ddl_log_increment_phase_no_lock(entry_pos)) + break; + (void) ddl_log_sync_no_lock();
No need to, ddl_log_increment_phase_no_lock() syncs internally
Fixed in later commits.
+ /* Fall through */ + case DDL_DROP_PHASE_TRIGGER: + Table_triggers_list::drop_all_triggers(thd, &db, &table, + MYF(MY_WME | MY_IGNORE_ENOENT)); + if (ddl_log_increment_phase_no_lock(entry_pos)) + break; + (void) ddl_log_sync_no_lock();
same. and in other places too, I'd expect.
All fixed in later commits
+ /* Fall through */ + + case DDL_DROP_PHASE_COLLECT:
indentation?
Fixed by later commits.
+ append_identifier(thd, &ddl_drop_query, &db); + ddl_drop_query.append('.'); + append_identifier(thd, &ddl_drop_query, &table); + ddl_drop_query.append(','); + /* We don't increment phase as we want to retry this in case of crash */ + + if (!ddl_log_entry->next_entry && mysql_bin_log.is_open()) + { + /* Last drop table. Write query to binlog */ + LEX_CSTRING end_comment= + { STRING_WITH_LEN(" /* generated by ddl log */")};
"generated by ddl log" ? a "log" cannot "generate" a statement, may be "generated during ddl recovery" ?
I was thinking about ddl_log.cc here. But agree that ddl recovery is more clear. Fixed. <cut>
default: DBUG_ASSERT(0); break; @@ -1682,6 +1823,7 @@ void ddl_log_release_entries(DDL_LOG_STATE *ddl_log_state) next= log_entry->next_active_log_entry; ddl_log_release_memory_entry(log_entry); } + ddl_log_state->list= 0;
may be you should also clear ddl_log_state->execute_entry below?
Has never been a problem in the code as everything has been procteced by ddl_log_state->list. But agree it is a good idea, just in case. Will add it.
if (ddl_log_state->execute_entry) ddl_log_release_memory_entry(ddl_log_state->execute_entry); @@ -1776,6 +1918,33 @@ bool ddl_log_update_xid(DDL_LOG_STATE *state, ulonglong xid) }
+/* + Write ddl_log_entry and write or update ddl_execute_entry +*/ + +static bool ddl_log_write(DDL_LOG_STATE *ddl_state, + DDL_LOG_ENTRY *ddl_log_entry)
could you use a more descriptive name, please?
Not for this one. I would have to go and change every atomic commit for this to work and I don't have time for it now. Can be done later as it has to be a separate commit anyway because of the above reason. I also didn't come up with a very good name for it :( ddl_log_finalize ? ddl_log_write_entry_and_exceute_entry? Neither much better... Regards, Monty