Re: [Maria-developers] 3e2d297830b: MDEV-29013 ER_KEY_NOT_FOUND/lock timeout upon online alter with long unique
Hi, Nikita, On Jul 08, Nikita Malyavin wrote:
revision-id: 3e2d297830b (mariadb-10.6.1-511-g3e2d297830b) parent(s): c1feb8c3c32 author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2022-07-05 15:20:04 +0300 message:
MDEV-29013 ER_KEY_NOT_FOUND/lock timeout upon online alter with long unique
1. ER_KEY_NOT_FOUND Some virtual columns were not updated because were not included in read_set to the moment of calculation.
In Row_logs_event::do_apply_event some fields are excluded based on m_cols value, the number of replicated rows.
the number of replicated *columns*
We can't rely on this. Basically, we'd be satisfied, if all columns were just set, at least for now.
2. ER_LOCK_WAIT_TIMEOUT This is a long unique specific problem.
Sometimes, lookup_handler is created for to->file. To properly free it, ha_reset should be called. It is usually done by calling close_thread_table, but ALTER TABLE makes it differently. Hence, a single ha_reset call is added to mysql_alter_table.
Also, the lifetime of lookup_handler is corrected: it's been allocated on thd->mem_root, which in case of online alter is event's mem_root, and can vary other ways. It is changed to table->mem_root, which is more appropriate: ha_reset is called only when table is closed, or in a few additional cases, which correspond to a statement end. So memory leaks are unlikely here.
diff --git a/sql/handler.cc b/sql/handler.cc index d71c0fc83a3..fbebfd27203 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ -3212,7 +3212,7 @@ int handler::create_lookup_handler() handler *tmp; if (lookup_handler != this) return 0; - if (!(tmp= clone(table->s->normalized_path.str, table->in_use->mem_root))) + if (!(tmp= clone(table->s->normalized_path.str, &table->mem_root)))
that's wrong, the lifetime of a lookup handler is one statement, it's destroyed at the end. You cannot keep allocating new handlers on the table->mem_root for every statement.
return 1; lookup_handler= tmp; return lookup_handler->ha_external_lock(table->in_use, F_RDLCK); diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 89ca2a41937..2c19bf67586 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -6007,6 +6007,11 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) if (m_width == table->s->fields && bitmap_is_set_all(&m_cols)) set_flags(COMPLETE_ROWS_F);
+ Rpl_table_data rpl_data{}; + if (rgi) rgi->get_table_data(table, &rpl_data); + + if (!rpl_data.copy_fields) + { /* Set tables write and read sets.
@@ -6027,17 +6027,19 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) bitmap_set_all(table->read_set); if (get_general_type_code() == DELETE_ROWS_EVENT || get_general_type_code() == UPDATE_ROWS_EVENT) bitmap_intersect(table->read_set,&m_cols);
bitmap_set_all(table->write_set);
/* WRITE ROWS EVENTS store the bitmap in m_cols instead of m_cols_ai */ MY_BITMAP *after_image= ((get_general_type_code() == UPDATE_ROWS_EVENT) ? &m_cols_ai : &m_cols); bitmap_intersect(table->write_set, after_image);
this->slave_exec_mode= slave_exec_mode_options; // fix the mode + }
I'm sorry, I don't quite understand. "some fields are excluded based on m_cols value" - what columns were excluded? What was the value of m_cols here?
table->rpl_write_set= table->write_set;
// Do event specific preparations error= do_before_row_operations(rli); @@ -6045,8 +6052,6 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) extra columns on the slave. In that case, do not force MODE_NO_AUTO_VALUE_ON_ZERO. */ - Rpl_table_data rpl_data{}; - if (rgi) rgi->get_table_data(table, &rpl_data); sql_mode_t saved_sql_mode= thd->variables.sql_mode; if (!is_auto_inc_in_extra_columns()) thd->variables.sql_mode= (rpl_data.copy_fields ? saved_sql_mode : 0) diff --git a/sql/sql_table.cc b/sql/sql_table.cc index 29c4d09cd39..a3efcdd9785 100644 --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -11011,6 +11011,9 @@ do_continue:; !(table->file->ha_table_flags() & HA_REUSES_FILE_NAMES) && !(new_table->file->ha_table_flags() & HA_REUSES_FILE_NAMES)); + + // Close lookup_handler. + new_table->file->ha_reset();
please, put it not here, but inside the following thd->drop_temporary_table() call. It'll also fix Sanja's bug MDEV-26433. And please coordinate with him to put it on the same place so that git would detect a conflict and we could merge it to have only one reset call in that method, not two.
/* Close the intermediate table that will be the new table, but do not delete it! Even though MERGE tables do not have their children @@ -11854,8 +11857,11 @@ copy_data_between_tables(THD *thd, TABLE *from, TABLE *to,
// We'll be filling from->record[0] from row events bitmap_set_all(from->write_set); - // We restore bitmaps, because update event is going to mess up with them. + // Use all columns. UPDATE/DELETE events reset read_set and write_set to + // def_*_set after each row operation, so using all_set won't work. to->default_column_bitmaps(); + bitmap_set_all(&to->def_read_set); + bitmap_set_all(&to->def_write_set);
end_read_record(&info); init_read_record_done= false;
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Sergei! Thanks for the review. I have added some changes according to your notes. Also, few answers to you below On Fri, 8 Jul 2022 at 01:50, Sergei Golubchik <serg@mariadb.org> wrote:
@@ -3212,7 +3212,7 @@ int handler::create_lookup_handler() handler *tmp; if (lookup_handler != this) return 0; - if (!(tmp= clone(table->s->normalized_path.str, table->in_use->mem_root))) + if (!(tmp= clone(table->s->normalized_path.str, &table->mem_root)))
that's wrong, the lifetime of a lookup handler is one statement, it's destroyed at the end. You cannot keep allocating new handlers on the table->mem_root for every statement.
Ouch! I have confused these two. Yes, I needed the statement lifetime. But the problem is I was setting a new mem_root for each event, just in case. That was a mistake. First, I had no case when something was allocated there. Second, per-event data should be anyway allocated differently, not on thd->mem_root. So I removed event_mem_root completely. There was nothing similar in replication code.
--- a/sql/log_event_server.cc
+++ b/sql/log_event_server.cc @@ -6007,6 +6007,11 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) if (m_width == table->s->fields && bitmap_is_set_all(&m_cols)) set_flags(COMPLETE_ROWS_F);
+ Rpl_table_data rpl_data{}; + if (rgi) rgi->get_table_data(table, &rpl_data); + + if (!rpl_data.copy_fields) + { /* Set tables write and read sets.
@@ -6027,17 +6027,19 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) bitmap_set_all(table->read_set); if (get_general_type_code() == DELETE_ROWS_EVENT || get_general_type_code() == UPDATE_ROWS_EVENT) bitmap_intersect(table->read_set,&m_cols);
bitmap_set_all(table->write_set);
/* WRITE ROWS EVENTS store the bitmap in m_cols instead of m_cols_ai */ MY_BITMAP *after_image= ((get_general_type_code() == UPDATE_ROWS_EVENT) ? &m_cols_ai : &m_cols); bitmap_intersect(table->write_set, after_image);
this->slave_exec_mode= slave_exec_mode_options; // fix the mode + }
I'm sorry, I don't quite understand. "some fields are excluded based on m_cols value" - what columns were excluded? What was the value of m_cols here?
Yes, it was a replicated columns bitmap, not value here:) -- Yours truly, Nikita Malyavin
participants (2)
-
Nikita Malyavin
-
Sergei Golubchik