Re: [PATCH] MDEV-34481 optimize away waiting for owned by prepared xa non-unique index record
From: Andrei <andrei.elkin@mariadb.com>
This work partly implements a ROW binlog format MDEV-33455 part that is makes non-unique-index-only table XA replication safe in RBR. Existence of at least one non-null unique index has always guaranteed safety (no hang to error).
Hi Andrei, A couple early comments on your MDEV-34481 patch, about things that affect non-XA parts of the code and will need to be changed:
diff --git a/sql/sql_class.cc b/sql/sql_class.cc index 6419a58fbe4..b2321cfd865 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc
@@ -5444,20 +5449,33 @@ thd_need_wait_reports(const MYSQL_THD thd) transaction. */ extern "C" int -thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd) +thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd, + bool is_other_prepared, bool is_index_unique, + uint *flagged)
+ + /* + Return to the caller the fact of the non-unique index wait lock + conflicts with one of a prepared state transaction. + */ + if (!is_index_unique && rgi && rgi->is_row_event_execution()) { + if (is_other_prepared && !other_thd) + { + rgi->exec_flags |= 1 << rpl_group_info::HIT_BUSY_INDEX; + (*flagged)++; + } + } +
No, thd_rpl_deadlock_check() has a specific purpose related to handling deadlock between active transactions. Don't add this XA-specific and unrelated logic to this function. Instead, add a separate function that InnoDB can call in the specific case you need. That function then only needs the parameters `thd` and `flagged`, the others are not needed: extern "C" bool thd_xa_skip_lock_check(MYSQL_THD thd, uint *flagged) { return (thd->rgi_slave && thd->rgi_slave->us_row_event_execution()); } Then in InnoDB, call something like this: if (lock->trx->state == TRX_STATE_PREPARED && (!lock->index->is_unique() || lock->index->n_nullable)) count_non_unique_index_hit++; About the HIT_BUSY_INDEX, see here:
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index f3d01b7d9ed..166295ed263 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc
@@ -8149,11 +8154,20 @@ int Rows_log_event::find_row(rpl_group_info *rgi) HA_READ_KEY_EXACT)))) { DBUG_PRINT("info",("no record matching the key found in the table")); - if (error == HA_ERR_KEY_NOT_FOUND) - error= row_not_found_error(rgi); - table->file->print_error(error, MYF(0)); - table->file->ha_index_end(); - goto end; + if (error == HA_ERR_LOCK_WAIT_TIMEOUT && !is_index_unique && + (rgi->exec_flags & (1 << rpl_group_info::HIT_BUSY_INDEX))) + { + rgi->exec_flags &= ~(1 << rpl_group_info::HIT_BUSY_INDEX); + skip_first_compare= true; + }
Here, you are effectively making a decision in the storage engine to skip this record which is locked by an XA PREPAREd transaction. But you are communicating this back to the server in a very convoluted way, by introducing the rgi->exec_flags and setting HIT_BUSY_INDEX. This is not the way to communicate back from engine to server. Instead, simply return a different error code than HA_ERR_LOCK_WAIT_TIMEOUT from InnoDB that you can check for here. You can maybe use one of the existing HA_ERR_* from include/my_base.h, or add a new one if necessary. if (error == HA_ERR_<something>) skip_first_compare= true;
diff --git a/sql/rpl_rli.h b/sql/rpl_rli.h index 3f24481839b..a9ec44b58cb 100644 --- a/sql/rpl_rli.h +++ b/sql/rpl_rli.h @@ -833,6 +833,15 @@ struct rpl_group_info RETRY_KILL_KILLED }; uchar killed_for_retry; + enum enum_exec_flags { + HIT_BUSY_INDEX= 0 + }; + /* + Being executed replication event's context. It could contain + notification from engine such as attempts to lock already acquired + index record. + */ + uint32 exec_flags;
... and then you don't need to add this. Also, are you aware that when you skip a record like this, you may be skipping the very record that needs to be changed? For example a case like this, with three transactions in order: XA START 'T1'; UPDATE my_table SET value=10 WHERE key=2 XA END 'T1'; XA PREPARE 'T1'; XA COMMIT 'T1'; XA START 'T2'; UPDATE my_table SET value=20 WHERE key=2; Transaction 'T2' may run speculatively in optimistic parallel replication after prepare of T1, but before commit of T1. Then it will skip the record key=2, IIUC. I didn't see any explanation of this case in the patch commit message or comments. Also I don't see this case tested in the test case, that needs to be added I believe. - Kristian.
participants (1)
-
Kristian Nielsen