Re: 88a77860a8b: MDEV-31804 Assertion `thd->m_transaction_psi == __null' fails
Hi, Nikita, On Aug 08, Nikita Malyavin wrote:
revision-id: 88a77860a8b (mariadb-11.0.1-194-g88a77860a8b) parent(s): 73a677f0306 author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2023-08-07 22:45:11 +0400 message:
MDEV-31804 Assertion `thd->m_transaction_psi == __null' fails
... upon replicating online ALTER
When an online event is applied and slave_exec_mode is idempotent, Write_rows_log_event::do_before_row_operations had reset thd->lex->sql_command to SQLCOM_REPLACE.
This led to that a statement was detected as a row-type during binlogging, and was logged as not standalone.
So the corresponding Gtid_log_event, when applied on replica, did not exit early and created a new PSI transaction. Hence the difference with non-online ALTER.
Nice find. I vaguely believe I've seen in one of the commits I've reviewed a change that makes every row event set thd->lex->sql_command to the corresponding SQLCOM_INSERT/UPDATE/DELETE. But be it was in an unpushed commit. Anyway, I think it's a bit risky to assume that row events never change thd->lex->sql_command (and never will). Perhaps you should simply restore it after the applying part? Like --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage { Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
yes, extra safety can be worth it. I'd also add an assertion on top of that: --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + DBUG_ASSERT(thd->lex->sql_command == SQLCOM_ALTER_TABLE); + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage { That is, if we'll catch it while testing, we'll fix it. And users will be protected form the possible corner cases we didn't think of On Tue, 8 Aug 2023 at 17:55, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nikita,
On Aug 08, Nikita Malyavin wrote:
revision-id: 88a77860a8b (mariadb-11.0.1-194-g88a77860a8b) parent(s): 73a677f0306 author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2023-08-07 22:45:11 +0400 message:
MDEV-31804 Assertion `thd->m_transaction_psi == __null' fails
... upon replicating online ALTER
When an online event is applied and slave_exec_mode is idempotent, Write_rows_log_event::do_before_row_operations had reset thd->lex->sql_command to SQLCOM_REPLACE.
This led to that a statement was detected as a row-type during binlogging, and was logged as not standalone.
So the corresponding Gtid_log_event, when applied on replica, did not exit early and created a new PSI transaction. Hence the difference with non-online ALTER.
Nice find.
I vaguely believe I've seen in one of the commits I've reviewed a change that makes every row event set thd->lex->sql_command to the corresponding SQLCOM_INSERT/UPDATE/DELETE. But be it was in an unpushed commit.
Anyway, I think it's a bit risky to assume that row events never change thd->lex->sql_command (and never will). Perhaps you should simply restore it after the applying part? Like
--- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage {
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
-- Yours truly, Nikita Malyavin
Hi, Nikita, On Aug 08, Nikita Malyavin wrote:
yes, extra safety can be worth it. I'd also add an assertion on top of that:
--- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + DBUG_ASSERT(thd->lex->sql_command == SQLCOM_ALTER_TABLE); + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage {
That is, if we'll catch it while testing, we'll fix it. And users will be protected form the possible corner cases we didn't think of
Well, that's not exactly what I meant. I meant that row event changing thd->lex->sql_command might be intentional and not something that needs to be fixed. So just reset it back to SQLCOM_ALTER_TABLE in copy_data_between_tables() and that's it. Without assert and without avoiding SQLCOM_REPLACE.
On Tue, 8 Aug 2023 at 17:55, Sergei Golubchik wrote:
Hi, Nikita,
On Aug 08, Nikita Malyavin wrote:
revision-id: 88a77860a8b (mariadb-11.0.1-194-g88a77860a8b) parent(s): 73a677f0306 author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2023-08-07 22:45:11 +0400 message:
MDEV-31804 Assertion `thd->m_transaction_psi == __null' fails
... upon replicating online ALTER
When an online event is applied and slave_exec_mode is idempotent, Write_rows_log_event::do_before_row_operations had reset thd->lex->sql_command to SQLCOM_REPLACE.
This led to that a statement was detected as a row-type during binlogging, and was logged as not standalone.
So the corresponding Gtid_log_event, when applied on replica, did not exit early and created a new PSI transaction. Hence the difference with non-online ALTER.
Nice find.
I vaguely believe I've seen in one of the commits I've reviewed a change that makes every row event set thd->lex->sql_command to the corresponding SQLCOM_INSERT/UPDATE/DELETE. But be it was in an unpushed commit.
Anyway, I think it's a bit risky to assume that row events never change thd->lex->sql_command (and never will). Perhaps you should simply restore it after the applying part? Like
--- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage {
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
On Tue, 8 Aug 2023 at 21:09, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nikita,
On Aug 08, Nikita Malyavin wrote:
yes, extra safety can be worth it. I'd also add an assertion on top of that:
--- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -12181,6 +12181,7 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *> if (error) from->s->tdc->flush_unused(1); // to free the binlog to->pos_in_table_list= NULL; // Safety + DBUG_ASSERT(thd->lex->sql_command == SQLCOM_ALTER_TABLE); + thd->lex->sql_command= SQLCOM_ALTER_TABLE; } else if (online) // error was on copy stage {
That is, if we'll catch it while testing, we'll fix it. And users will be protected form the possible corner cases we didn't think of
Well, that's not exactly what I meant. I meant that row event changing thd->lex->sql_command might be intentional and not something that needs to be fixed.
So just reset it back to SQLCOM_ALTER_TABLE in copy_data_between_tables() and that's it. Without assert and without avoiding SQLCOM_REPLACE.
It doesn't just avoid sql_command reset: look through the slave_exec_mode usage -- it does a few things that can have side-effects. Better to avoid it altogether and keep our execution flow away from interactions with replication switches. The assertion I've added doesn't fail, so no, only IDEMPOTENT inserts do this. -- Yours truly, Nikita Malyavin
Hi, Nikita, On Aug 08, Nikita Malyavin wrote:
It doesn't just avoid sql_command reset: look through the slave_exec_mode usage -- it does a few things that can have side-effects. Better to avoid it altogether and keep our execution flow away from interactions with replication switches.
ok
The assertion I've added doesn't fail, so no, only IDEMPOTENT inserts do this.
Yes, I didn't mean other events do it. I only said that it's fragile to assume they won't do it in the future (and that I've seen an unpushed commit that changes it). But ok, if it'll change in the future your assert will catch it. By the way, should be it SQLCOM_ALTER_TABLE? Or safer to save the old value and restore it? Can OPTIMIZE or something come down this code path? Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
On Wed, 9 Aug 2023 at 15:28, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nikita,
On Aug 08, Nikita Malyavin wrote:
It doesn't just avoid sql_command reset: look through the slave_exec_mode usage -- it does a few things that can have side-effects. Better to avoid it altogether and keep our execution flow away from interactions with replication switches.
ok
The assertion I've added doesn't fail, so no, only IDEMPOTENT inserts do this.
Yes, I didn't mean other events do it. I only said that it's fragile to assume they won't do it in the future (and that I've seen an unpushed commit that changes it).
Maybe there's a way to avoid it? In system versioning we first used sql_command to determine ha_delete_row role, but recently we began using trg_event_map for that, which also simplified the code. Maybe some detection improvement is also possible in the case you mention?
But ok, if it'll change in the future your assert will catch it.
By the way, should be it SQLCOM_ALTER_TABLE? Or safer to save the old value and restore it? Can OPTIMIZE or something come down this code path?
Thanks for the pointer! I'll better write it in a safer way. Online is disabled for OPTIMIZE at this moment. Does it make sense otherwise? I know nothing about it. And I'm not aware of any other command that can go there.
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
-- Yours truly, Nikita Malyavin
participants (2)
-
Nikita Malyavin
-
Sergei Golubchik