Re: 1c9edc72fc2: MDEV-32830 I. refactor XA binlogging for better integration with BGC/replication/recovery
Hi, Andrei, On Jan 05, Andrei wrote:
commit 1c9edc72fc2 Author: Andrei <andrei.elkin@mariadb.com> Date: Wed Nov 1 20:32:45 2023 +0200
MDEV-32830 I. refactor XA binlogging for better integration with BGC/replication/recovery
This commit is the part I in the series of three that addresses MDEV-31949.
This part improves upon MDEV-742 design of the XA binlogging. When binlog is ON, to execute XA-prepare first in engines to write the xa-prepare replication event last. MYSQL_BIN_LOG::run_ordered_commit() is made to execute XA commit and rollback in engines, as well as it completes with the XA transaction state, so that its XID has become available for next use, e.g by another slave parallel worker, that e.g could be waiting for its order to commit a being prepared XID transaction.
Is it intentional that line length in the paragraph above varies randomly from line to line? And I don't understand why MYSQL_BIN_LOG::run_ordered_commit() has to do the complete XA commit in all engines? it's not its job. And it won't make XID available sooner, it'll only make XID available in a different function.
Notable changes:
sql/handler.cc - ha_prepare() now is ensured to execute binlog_hton::prepare() as the last branch to prepare; recovery concerns are addressed separately (not by MDEV-21469, but rather along the plan of bug#76233) - read-only cases are handled more uniformly with extending has_binlog_hton() that registers the binlog branch at ha_prepare() when necessary - ha_rollback_trans() executes binlog_hton::rollback() as first branch, as a part of the 76233 planned recovery;
of <what> planned recovery?
- ditto to the external completion of XA via ha_commit_or_rollback_by_xid(). sql/log.cc - binlog_commit,rollback() simplified/clarified on the condition of an empty statement/trx - MYSQL_BIN_LOG::run_commit_ordered() is extended to cover XA-{commit, rollback} respective actions in Engines - XID unlogging by XA is converted to employ the standard binlog-checkpoint mechanism (cache_mngr->using_xa= TRUE) sql/xa.cc - trans_xa_commit,rollback() are refactored to invoke a common function for external (connection) completion - adapting the above two and slave_applier_reset_xa_trans() to possible (normal to slave) XID release, inside run_ordered_commit().
diff --git a/sql/handler.cc b/sql/handler.cc --- a/sql/handler.cc +++ b/sql/handler.cc @@ -1472,6 +1472,48 @@ static int prepare_or_error(handlerton *ht, THD *thd, bool all) return err; }
+/** + Search in @c ha_info transaction list branches for binlog. + When the 2nd argument is not NULL and the modified engine number + is non-zero the binlog branch is activated. + + @param ha_info The first node in the transaction descriptor list. + @param thd The client thread that executes the transaction. + + @return true when the binlog branch is registered or set up for transaction, + false otherwise. +*/ +static bool has_binlog_hton(Ha_trx_info *ha_info, THD* thd= NULL) +{ + bool rc; + uint rw_count; + + for (rc= false, rw_count= 0; ha_info && !rc; ha_info= ha_info->next()) + { + rc= ha_info->ht() == binlog_hton; + rw_count += (ha_info->is_trx_read_write()); + } + + if (!rc && thd && !thd->is_current_stmt_binlog_disabled() && + unlikely(rw_count > 0)) + { + /* + In case an engine is modified but binlog is not part of xa-prepared + transaction (thd is non-NULL) set out binlogging of an empty xa-prepare. + */
when can we land here? non-trans (or non-XA) table modified?
+ DBUG_ASSERT(thd->transaction->xid_state.get_state_code() == XA_IDLE); + + THD_TRANS *ptr_trans= &thd->transaction->all; + thd->ha_data[binlog_hton->slot].ha_info[1].register_ha(ptr_trans, + binlog_hton); + thd->ha_data[binlog_hton->slot].ha_info[1].set_trx_read_write(); + thd->binlog_setup_trx_data(); + + rc= true; + } + + return rc; +}
/** @retval @@ -1488,9 +1530,16 @@ int ha_prepare(THD *thd)
if (ha_info) { + int err; + bool has_binlog= has_binlog_hton(ha_info, thd); + for (; ha_info; ha_info= ha_info->next()) { handlerton *ht= ha_info->ht(); + + if (ht == binlog_hton) + continue;
Does binlog need a handlerton anymore? It was an easy way to get it registered for participating in 2pc. But if you always filter it out, then there's no need to add it to ha_info list in the first place, is there?
+ if (ht->prepare) { if (unlikely(prepare_or_error(ht, thd, all))) diff --git a/sql/log.cc b/sql/log.cc --- a/sql/log.cc +++ b/sql/log.cc @@ -1895,6 +1896,11 @@ binlog_rollback_flush_trx_cache(THD *thd, bool all, if (thd->transaction->xid_state.get_state_code() == XA_PREPARED) buflen= serialize_with_xid(thd->transaction->xid_state.get_xid(), buf, query, q_len); + cache_mngr->using_xa= TRUE; +#ifndef DBUG_OFF + if (thd->killed != NOT_KILLED) + thd->lex->sql_command= SQLCOM_XA_ROLLBACK; // for run_commit_ord asserts
1. don't abbrev, commits are for others not for you 2. what other values can sql_command have here?
+#endif } Query_log_event end_evt(thd, buf, buflen, TRUE, TRUE, TRUE, 0);
@@ -2186,16 +2200,13 @@ int binlog_commit(THD *thd, bool all, bool ro_1pc) }
if (cache_mngr->trx_cache.empty() && - (thd->transaction->xid_state.get_state_code() != XA_PREPARED || - !(thd->ha_data[binlog_hton->slot].ha_info[1].is_started() && - thd->ha_data[binlog_hton->slot].ha_info[1].is_trx_read_write()))) + (!(is_preparing_xa(thd) || is_prepared_xa(thd)) || + !is_binlog_read_write(thd))) { /* - This is an empty transaction commit (both the regular and xa), - or such transaction xa-prepare or - either one's statement having no effect on the transactional cache - as any prior to it. - The empty xa-prepare sinks in only when binlog is read-only. + This is an empty transaction/statement commit (both the regular and xa), + or the xa transaction is running xa-prepare or xa-commit while + its binlog branch is not read-write.
weird. how and why do you get into binlog_commit on XA PREPARE?
*/ cache_mngr->reset(false, true); THD_STAGE_INFO(thd, org_stage); @@ -7752,7 +7763,7 @@ MYSQL_BIN_LOG::write_transaction_to_binlog(THD *thd, entry.all= all; entry.using_stmt_cache= using_stmt_cache; entry.using_trx_cache= using_trx_cache; - entry.need_unlog= is_preparing_xa(thd); + entry.need_unlog= 0;
is `need_unlog` always 0 now?
ha_info= all ? thd->transaction->all.ha_list : thd->transaction->stmt.ha_list; entry.ro_1pc= is_ro_1pc; entry.end_event= end_ev; @@ -9365,18 +9381,102 @@ TC_LOG::run_commit_ordered(THD *thd, bool all) all ? thd->transaction->all.ha_list : thd->transaction->stmt.ha_list;
mysql_mutex_assert_owner(&LOCK_commit_ordered); + + if (!ha_info) + { + // slave or master's xa-"complete" external
what does it mean?
+ DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT || + thd->lex->sql_command == SQLCOM_XA_ROLLBACK); + + XID *xid= thd->lex->xid; + bool commit= thd->lex->sql_command == SQLCOM_XA_COMMIT; + struct xahton_st xaop= { xid, 1 }; + + plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton, + MYSQL_STORAGE_ENGINE_PLUGIN, &xaop);
is it for commit/rollback by xid?
+ xid_cache_delete(thd); + + DBUG_ASSERT(!thd->transaction->xid_state.is_explicit_XA()); + + return; + } + + // else + if (is_preparing_xa(thd)) + { + for (; ha_info; ha_info= ha_info->next()) + { + DBUG_ASSERT(all); + DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_PREPARE); + DBUG_ASSERT(thd->transaction->xid_state.is_explicit_XA()); + + handlerton *ht= ha_info->ht(); + if (!ht->prepare) + { + push_warning_printf(thd, Sql_condition::WARN_LEVEL_WARN, + ER_GET_ERRNO, ER_THD(thd, ER_GET_ERRNO), + HA_ERR_WRONG_COMMAND, + ha_resolve_storage_engine_name(ht)); + continue; + } + else if (ht != binlog_hton) + { + /* leader only */ + //if (thd->binlog_setup_trx_data()->last_commit_pos_offset == 0) + // ht->prepare(ht, thd, all);
wait, what? why do you need a loop over all engines if you don't do anything there?
+ } + } + if (thd->rgi_slave && thd->rgi_slave->is_parallel_exec) + { + thd->transaction->xid_state.set_state_code(XA_PREPARED); + thd->transaction->xid_state.set_cache_element_to_recovered(); + thd->transaction->xid_state.xid_cache_element= 0;
when are you here?
+ } + } + else if (thd->transaction->xid_state.is_explicit_XA()) + { + for (; ha_info; ha_info= ha_info->next()) + { + DBUG_ASSERT(all); + DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT || + thd->lex->sql_command == SQLCOM_XA_ROLLBACK); + DBUG_ASSERT(is_prepared_xa(thd) || + ((thd->lex->sql_command == SQLCOM_XA_ROLLBACK && + thd->transaction->xid_state.get_state_code() == XA_IDLE) || + thd->lex->xa_opt == XA_ONE_PHASE)); + DBUG_ASSERT(!thd->transaction->xid_state.is_recovered()); + + handlerton *ht= ha_info->ht(); + if (thd->lex->sql_command == SQLCOM_XA_COMMIT) + { + if (ht->commit_ordered) + ht->commit_ordered(ht, thd, all); + } + else if (ht != binlog_hton)
why not?
+ { + ht->rollback(ht, thd, all); + } + } + xid_cache_delete(thd); // trx is completed now + + DBUG_ASSERT(!thd->transaction->xid_state.is_explicit_XA()); + } + else + {
when are you here?
for (; ha_info; ha_info= ha_info->next()) { handlerton *ht= ha_info->ht(); if (!ht->commit_ordered) continue; ht->commit_ordered(ht, thd, all); DBUG_EXECUTE_IF("enable_log_write_upto_crash", { DBUG_SET_INITIAL("+d,crash_after_log_write_upto"); sleep(1000); }); DEBUG_SYNC(thd, "commit_after_run_commit_ordered"); } + } }
diff --git a/sql/xa.cc b/sql/xa.cc --- a/sql/xa.cc +++ b/sql/xa.cc @@ -319,9 +342,16 @@ static void xid_cache_delete(THD *thd, XID_cache_element *&element)
void xid_cache_delete(THD *thd, XID_STATE *xid_state) { - DBUG_ASSERT(xid_state->is_explicit_XA()); - xid_cache_delete(thd, xid_state->xid_cache_element); - xid_state->xid_cache_element= 0; + if (xid_state->is_explicit_XA())
why it should be done here and not in the caller?
+ { + xid_cache_delete(thd, xid_state->xid_cache_element); + xid_state->xid_cache_element= 0; + } +} + +void xid_cache_delete(THD *thd) +{ + xid_cache_delete(thd, &thd->transaction->xid_state); }
@@ -547,7 +577,14 @@ bool trans_xa_prepare(THD *thd) } else { - thd->transaction->xid_state.xid_cache_element->xa_state= XA_PREPARED; + if (thd->transaction->xid_state.xid_cache_element) + { + thd->transaction->xid_state.xid_cache_element->xa_state= XA_PREPARED; + } + else + { + DBUG_ASSERT(thd->rgi_slave->is_parallel_exec);
what do you mean by that? That parallel slave doesn't add XA transactions to the xid_cache?
+ } MYSQL_SET_TRANSACTION_XA_STATE(thd->m_transaction_psi, XA_PREPARED); res= thd->variables.pseudo_slave_mode || thd->slave_thread ? slave_applier_reset_xa_trans(thd) : 0;
Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org
participants (1)
-
Sergei Golubchik