Hi Sergei, On Thu, Apr 8, 2021 at 3:30 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Aleksey!
On Apr 07, Aleksey Midenkov wrote:
+ +--echo # INSERT, INSERT .. SELECT don't increment partitions
it's not really "increment", better say "don't auto-create"
Actually I like "increment" more. "Auto-create" overcomplicates phrases:
--echo # Increment from 3 to 5 --echo # Increment from 3 to 6, manual names, LOCK TABLES --echo # Multiple increments in single command
Besides "increment" is correct because PARTITIONS number is incremented.
Sure. "Increment the number of partitions", this is fine. "Auto create partitions" is also fine. "Increment partitions" is meaningless.
It is obvious from the context that we are talking about the number, not partitions themselves. Treat "partitions" as PARTITIONS keyword and the increment is attached to a number right next to it. That's quite a sense, isn't it?
+insert into t1 values (1); + +create or replace table t2 (y int); +insert into t2 values (2); + +insert into t1 select * from t2; +insert into t2 select * from t1;
why do you need a t2 table in this test?
t1 is not incremented in case of
insert into t2 select * from t1;
add a comment please
Added.
+--echo # Here t2 is incremented too, but not updated +set timestamp= unix_timestamp('2000-01-01 03:00:00'); +update t1, t2 set t1.x= 0 where t1.x< t2.y; +--replace_result $default_engine DEFAULT_ENGINE +show create table t1; +# Multiupdate_prelocking_strategy::handle_end() is processed after table open. +# For PS it is possible to skip unneeded auto-creation because the above happens at +# prepare stage and auto-creation is done at execute stage. +--replace_result $default_engine DEFAULT_ENGINE 'PARTITIONS 4' 'PARTITIONS ok' 'PARTITIONS 5' 'PARTITIONS ok'
eh... I don't think this is really "ok". As far as I remember, Multiupdate_prelocking_strategy knows what tables should be opened for writing and what for reading. Why would a new partition be created for t2?
It knows this after tables are opened. Look at handle_end(), specifically mysql_handle_derived(), handle_derived(), setup_fields_with_no_wrap() and check_fields(). I believe all these calls are required to get proper get_table_map().
To get this working properly there must be 2-staged open tables, something like PS does.
You mean, a new partition is created for t2 in normal mode and not created in --ps?
Yes.
What if you'll also add `&& table_list->updating` to your condition in TABLE::vers_need_hist_part() ?
If `Multiupdate_prelocking_strategy::handle_end()` was not yet executed how `updating` can help? No, it doesn't help.
diff --git a/mysql-test/suite/versioning/r/partition.result b/mysql-test/suite/versioning/r/partition.result index 2a277b1c2ea..8369b40d98c 100644 --- a/mysql-test/suite/versioning/r/partition.result +++ b/mysql-test/suite/versioning/r/partition.result @@ -1236,27 +1270,752 @@ t1 CREATE TABLE `t1` ( ... +set timestamp= unix_timestamp('2000-01-01 03:00:00'); +affected rows: 0 +lock tables t1 write; +affected rows: 0 +execute s; +affected rows: 1 +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 0 +execute s; +affected rows: 1 +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 0 +show create table t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `x` int(11) DEFAULT NULL +) ENGINE=DEFAULT_ENGINE DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING + PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR STARTS TIMESTAMP'2000-01-01 00:00:00' AUTO +PARTITIONS 6
why did it add two partitions here?
Because of this condition
/* When hist_part is almost full LOCK TABLES my overflow the partition as we can't add new partitions under LOCK TABLES. Reserve one more for this case. */ if (auto_hist && create_count < 2 && thd->lex->sql_command == SQLCOM_LOCK_TABLES && vers_info->hist_part->id + 1 == vers_info->now_part->id) create_count++;
Hmm. I thought the logic is - normally for UPDATE/DELETE you create a new history partition when you need to write into it. And under LOCK TABLES you create the next partition, just in case, even if the current partition is still fine. Which kind of makes sense, at least until we'll fix that MDEV about creating partitions under LOCK TABLES.
But this doesn't mean LOCK TABLES should create two partitions, I think. If UPDATE/DELETE needs to write into a new partition - you create it, LOCK TABLES or not. But I'd rather say that if it's just created, then it's not quite "almost full" (it's more like "almost empty", even "completely empty"). So there's no need to create a second partition at the same time.
Right. Fixed. Actually it was bigger bug: it created 1 partition and when reopened created 1 more partition.
+ --source include/rpl_end.inc diff --git a/sql/ha_partition.h b/sql/ha_partition.h index 60a2d7f6762..7ade4419c22 100644 --- a/sql/ha_partition.h +++ b/sql/ha_partition.h @@ -1610,7 +1610,7 @@ class ha_partition :public handler for (; part_id < part_id_end; ++part_id) { handler *file= m_file[part_id]; - DBUG_ASSERT(bitmap_is_set(&(m_part_info->read_partitions), part_id)); + bitmap_set_bit(&(m_part_info->read_partitions), part_id);
Where was it set before your patch?
#1 in partition_info::set_partition_bitmaps () at ../src/sql/partition_info.cc:274 #2 in ha_partition::change_partitions_to_open () at ../src/sql/ha_partition.cc:8644 #3 in set_partitions_as_used () at ../src/sql/sql_base.cc:1562 #4 in open_table () at ../src/sql/sql_base.cc:1990 #5 in open_and_process_table () at ../src/sql/sql_base.cc:3801 #6 in open_tables () at ../src/sql/sql_base.cc:4275
okay. Why is it not happening now?
Because it is happening at line 1990 in open_table() and vers_set_hist_part() is happening earlier.
after such an overflow a table becomes essentially corrupted, rows are in a wrong partition. So new history partitions cannot be added anymore without reorganizing the last partition.
How is it handled now? (it's just a question, not part of the review, as it's not something you were changing in your patch)
You've seemed to miss this question.
Sorry. They are added via reorganize. I.e. manual ADD PARTITIONS does data copy. Added test "Partition overflow error and manual fix" and found pruning boundary bug in 10.3 (see FIXME).
+bool TABLE::vers_need_hist_part(const THD *thd, const TABLE_LIST *table_list) const +{ +#ifdef WITH_PARTITION_STORAGE_ENGINE + if (part_info && part_info->part_type == VERSIONING_PARTITION && + !table_list->vers_conditions.delete_history && + !thd->stmt_arena->is_stmt_prepare() && + table_list->lock_type >= TL_WRITE_ALLOW_WRITE && + table_list->mdl_request.type >= MDL_SHARED_WRITE && + table_list->mdl_request.type < MDL_EXCLUSIVE) + { + switch (thd->lex->sql_command) + {
SQLCOM_LOAD with DUP_REPLACE?
Added.
With a test, I hope :)
Yes.
+ if (thd->rgi_slave && thd->rgi_slave->current_event && thd->lex->sql_command == SQLCOM_END)
I wonder, would it be possible, instead of introducing rgi_slave->current_event, just make row events set thd->lex->sql_command appropriately? For example, currently row events increment thd->status_var.com_stat[] each event for its own SQLCOM_xxx, it won't be needed if they'll just set thd->lex->sql_command.
Do you think this is a quick refactoring? I had the same idea when I did this. And probably I tried to do that quickly already. Added TODO comment.
Right, may be not quick enough for now :(
@@ -2032,6 +2083,20 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx) tc_add_table(thd, table); }
+ if (!ot_ctx->vers_create_count &&
what does this condition mean? When is ot_ctx->vers_create_count > 0 ?
When we already requested backoff action.
Could you describe an example?
I thought if there are two tables in a list and both need to add a new partition (like in a multi-update), then backoff will happen on the first one, so open_table() won't reach the second with ot_ctx->vers_create_count > 0.
open_table() is called again from recover_from_failed_open(). vers_create_count is reset in recover_from_failed_open(): 3306 case OT_ADD_HISTORY_PARTITION: 3307 { 3308 result= false; 3309 TABLE *table= open_ltable(m_thd, m_failed_table, TL_WRITE, 3310 MYSQL_OPEN_HAS_MDL_LOCK | MYSQL_OPEN_IGNORE_LOGGING_FORMAT); .... 3323 vers_create_count= 0; 3324 break; 3325 } 3326 case OT_BACKOFF_AND_RETRY: 3327 case OT_REOPEN_TABLES: 3328 case OT_NO_ACTION: 3329 In our scheme open_table() is called 3 times: 1. before partitions increment; 2. inside OT_ADD_HISTORY_PARTITION; 3. after partitions increment.
+ table->vers_need_hist_part(thd, table_list)) + { + ot_ctx->vers_create_count= table->part_info->vers_set_hist_part(thd, true); + if (ot_ctx->vers_create_count) + { + MYSQL_UNBIND_TABLE(table->file); + tc_release_table(table); + ot_ctx->request_backoff_action(Open_table_context::OT_ADD_HISTORY_PARTITION, + table_list); + DBUG_RETURN(TRUE); + } + } + if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) && table->s->table_category < TABLE_CATEGORY_INFORMATION) { @@ -3172,10 +3239,18 @@ Open_table_context::recover_from_failed_open() break; case OT_DISCOVER: case OT_REPAIR: - if ((result= lock_table_names(m_thd, m_thd->lex->create_info, - m_failed_table, NULL, - get_timeout(), 0))) + case OT_ADD_HISTORY_PARTITION: + result= lock_table_names(m_thd, m_thd->lex->create_info, m_failed_table, + NULL, get_timeout(), 0); + if (result) + { + if (m_action == OT_ADD_HISTORY_PARTITION) + { + m_thd->clear_error();
why?
Because of MDEV-23642.
I see. I generally prefer to avoid thd->clear_error() because of its non discriminative nature.
It might be ok here, but then, please add an assert before lock_table_names:
DBUG_ASSERT(!m_thd->get_stmt_da()->is_set());
Added.
+ result= false; + } @@ -7281,7 +7291,7 @@ uint fast_alter_partition_table(THD *thd, TABLE *table, write_log_add_change_partition(lpt) || ERROR_INJECT_CRASH("crash_add_partition_4") || ERROR_INJECT_ERROR("fail_add_partition_4") || - mysql_change_partitions(lpt) || + mysql_change_partitions(lpt, false) ||
this way you skip trans_commit_stmt/trans_commit_implicit for ALTER TABLE ... ADD RANGE/LIST partition. is it always ok?
A safer alternative would've been
mysql_change_partitions(lpt, !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST))
but it's more complex, so I'd prefer your variant, if we can be sure that it doesn't break anything.
The condition is
if (copy_data && mysql_trans_commit_alter_copy_data(thd))
Data is not needed to be copied in ADD RANGE/LIST partition, is it? We can be sure only from testing period until GA.
well, we can never be sure after any amount of testing.
You're right, the data, indeed, won't be copied in ADD RANGE/LIST partition. But perhaps InnoDB can still open a transaction there? As far as I understand a transaction is started on ha_innobase::external_lock(). Will it happen here? Or start_stmt() under LOCK TABLES. Is that possible?
mysql_execute_command() does trans_commit_stmt() and trans_commit_implicit().
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
-- All the best, Aleksey Midenkov @midenok