Hi, Nikita, On Aug 08, Nikita Malyavin wrote:
yes, extra safety can be worth it. I'd also add an assertion on top of that:
--- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + DBUG_ASSERT(thd->lex->sql_command == SQLCOM_ALTER_TABLE); + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage {
That is, if we'll catch it while testing, we'll fix it. And users will be protected form the possible corner cases we didn't think of
Well, that's not exactly what I meant. I meant that row event changing thd->lex->sql_command might be intentional and not something that needs to be fixed. So just reset it back to SQLCOM_ALTER_TABLE in copy_data_between_tables() and that's it. Without assert and without avoiding SQLCOM_REPLACE.
On Tue, 8 Aug 2023 at 17:55, Sergei Golubchik wrote:
Hi, Nikita,
On Aug 08, Nikita Malyavin wrote:
revision-id: 88a77860a8b (mariadb-11.0.1-194-g88a77860a8b) parent(s): 73a677f0306 author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2023-08-07 22:45:11 +0400 message:
MDEV-31804 Assertion `thd->m_transaction_psi == __null' fails
... upon replicating online ALTER
When an online event is applied and slave_exec_mode is idempotent, Write_rows_log_event::do_before_row_operations had reset thd->lex->sql_command to SQLCOM_REPLACE.
This led to that a statement was detected as a row-type during binlogging, and was logged as not standalone.
So the corresponding Gtid_log_event, when applied on replica, did not exit early and created a new PSI transaction. Hence the difference with non-online ALTER.
Nice find.
I vaguely believe I've seen in one of the commits I've reviewed a change that makes every row event set thd->lex->sql_command to the corresponding SQLCOM_INSERT/UPDATE/DELETE. But be it was in an unpushed commit.
Anyway, I think it's a bit risky to assume that row events never change thd->lex->sql_command (and never will). Perhaps you should simply restore it after the applying part? Like
--- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage {
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org