Re: [Maria-developers] c4de76aeff8: MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT

Hi, Aleksey! Note, it's a review of the `git diff 82e44d60d1e 8175ce8a5f1` so it's not only commit c4de76aeff8 On Apr 04, Aleksey Midenkov wrote:
Overall it's very good! I have few minor questions/comments, see below. But it's almost done, I think.
Hmm, and if DELETE HISTORY would auto-create new partitions, what output would one expect here? I mean, how one can see whether the test result is correct or wrong?
it's not really "increment", better say "don't auto-create"
why do you need a t2 table in this test?
what does this test case demonstrate?
why would a view matter in this test case?
eh... I don't think this is really "ok". As far as I remember, Multiupdate_prelocking_strategy knows what tables should be opened for writing and what for reading. Why would a new partition be created for t2?
because of blobs?
what do you test here? (there is no show create table in the test)
if you add a show create table here, what would it show?
I'd better avoid MENT references here. Let's only mention Jira issues that users can actually see
isn't it a test that you already have above? with x = x + 111
why did it add two partitions here?
and two here
why?
same comment about MENT
may be show binlog events? you know, to verify that DML events aren't written into binary log twice
Where was it set before your patch?
where is this ha_commit_trans(thd, false) called from? mysql_alter_table that adds a new partition?
is that right? Conceptually you need to test vers_conditions.delete_history for the table that owns this partition_info. Is it always last_table() ? I'd say it'd be more logical to do, like part_field_array[0]->table->pos_in_table_list
after such an overflow a table becomes essentially corrupted, rows are in a wrong partition. So new history partitions cannot be added anymore without reorganizing the last partition. How is it handled now? (it's just a question, not part of the review, as it's not something you were changing in your patch)
s/my/may/
would it not be DA_EMPTY without a reset? this is at the beginning of a statement, I'd expect it normally be DA_EMPTY here. What other DA states can you get here?
may be DBUG_ASSERT(thd->get_stmt_da()->is_ok());
shouldn't it be `* 28` ? you need a most pessimistic estimate to make sure you never miss to create a partition when one is needed. You can sometimes create one when it's not needed yet, but it's less of a problem.
+ DBUG_ASSERT(step.year); + return step.year * 86400 * 30 * 365;
that's one `* 30` too many :)
SQLCOM_LOAD with DUP_REPLACE?
I wonder, would it be possible, instead of introducing rgi_slave->current_event, just make row events set thd->lex->sql_command appropriately? For example, currently row events increment thd->status_var.com_stat[] each event for its own SQLCOM_xxx, it won't be needed if they'll just set thd->lex->sql_command.
a bit confusing comment, you left out stuff that are obvious to you but not to others. I'd suggest something like /* New partitions are not auto-created under LOCK TABLES (TODO: fix it) but rotation can still happen. */
what does this condition mean? When is ot_ctx->vers_create_count > 0 ?
why?
It doesn't seem you're using get_warning_info() anywhere
it'd be good it adding new empty VERSIONING partitions would always go this way, auto or not. but it's a separate task.
this way you skip trans_commit_stmt/trans_commit_implicit for ALTER TABLE ... ADD RANGE/LIST partition. is it always ok? A safer alternative would've been mysql_change_partitions(lpt, !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST)) but it's more complex, so I'd prefer your variant, if we can be sure that it doesn't break anything.
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi Sergei, On Sun, Apr 4, 2021 at 2:48 PM Sergei Golubchik <serg@mariadb.org> wrote:
By PARTITIONS value. Stale test case, fixed.
Actually I like "increment" more. "Auto-create" overcomplicates phrases: --echo # Increment from 3 to 5 --echo # Increment from 3 to 6, manual names, LOCK TABLES --echo # Multiple increments in single command Besides "increment" is correct because PARTITIONS number is incremented.
t1 is not incremented in case of insert into t2 select * from t1;
Fixed stale test case.
Made it matter.
It knows this after tables are opened. Look at handle_end(), specifically mysql_handle_derived(), handle_derived(), setup_fields_with_no_wrap() and check_fields(). I believe all these calls are required to get proper get_table_map(). To get this working properly there must be 2-staged open tables, something like PS does.
Because of blobs.
This was MENT-675 Assertion `thd->transaction.stmt.is_empty() || thd->in_sub_stmt' failed I moved it to MDEV-25339 and added a comment.
Added. +show create table t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `x` int(11) DEFAULT NULL +) ENGINE=DEFAULT_ENGINE DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING + PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR STARTS TIMESTAMP'2000-01-01 00:00:00' AUTO +PARTITIONS 3
Fixed.
Not fully. That was the original bug posted by Elena (MDEV-25339). And this modification is posted by me (MDEV-23642).
Because of this condition /* When hist_part is almost full LOCK TABLES my overflow the partition as we can't add new partitions under LOCK TABLES. Reserve one more for this case. */ if (auto_hist && create_count < 2 && thd->lex->sql_command == SQLCOM_LOCK_TABLES && vers_info->hist_part->id + 1 == vers_info->now_part->id) create_count++;
Same here.
Actually this is a typo in the test case: year 2000 vs 2001. Fixed.
Fixed.
Added.
#1 0x0000000000a99d04 in partition_info::set_partition_bitmaps (this=0x7fc214b7af80, partition_names=0x0 ) at ../src/sql/partition_info.cc:274 #2 0x0000000001087da4 in ha_partition::change_partitions_to_open (this=0x7fc214b787a0, partition_names=0 x0) at ../src/sql/ha_partition.cc:8644 #3 0x0000000000822c01 in set_partitions_as_used (tl=0x7fc214015988, t=0x7fc214a61fb8) at ../src/sql/sql_ base.cc:1562 #4 0x00000000008222a7 in open_table (thd=0x7fc214000d48, table_list=0x7fc214015988, ot_ctx=0x7fc23c0a19c 8) at ../src/sql/sql_base.cc:1990 #5 0x0000000000827497 in open_and_process_table (thd=0x7fc214000d48, tables=0x7fc214015988, counter=0x7f c23c0a1acc, flags=0, prelocking_strategy=0x7fc23c0a1b40, has_prelocking_list=false, ot_ctx=0x7fc23c0a19c8 ) at ../src/sql/sql_base.cc:3801 #6 0x0000000000825de0 in open_tables (thd=0x7fc214000d48, options=..., start=0x7fc23c0a1ae0, counter=0x7 fc23c0a1acc, flags=0, prelocking_strategy=0x7fc23c0a1b40) at ../src/sql/sql_base.cc:4275
Looks like garbage. Removed SUB_STMT_AUTO_HIST.
It is always last_table() because DELETE HISTORY works with one table.
I'd say it'd be more logical to do, like
part_field_array[0]->table->pos_in_table_list
Fixed.
Fixed.
You are right. Removed.
Added.
Good catch. Made it calendar-correct.
Added.
Do you think this is a quick refactoring? I had the same idea when I did this. And probably I tried to do that quickly already. Added TODO comment.
Fixed.
When we already requested backoff action.
Because of MDEV-23642.
Reverted.
Added comment.
The condition is if (copy_data && mysql_trans_commit_alter_copy_data(thd)) Data is not needed to be copied in ADD RANGE/LIST partition, is it? We can be sure only from testing period until GA.
-- All the best, Aleksey Midenkov @midenok

Hi, Aleksey! On Apr 07, Aleksey Midenkov wrote:
Sure. "Increment the number of partitions", this is fine. "Auto create partitions" is also fine. "Increment partitions" is meaningless.
add a comment please
You mean, a new partition is created for t2 in normal mode and not created in --ps? What if you'll also add `&& table_list->updating` to your condition in TABLE::vers_need_hist_part() ?
Hmm. I thought the logic is - normally for UPDATE/DELETE you create a new history partition when you need to write into it. And under LOCK TABLES you create the next partition, just in case, even if the current partition is still fine. Which kind of makes sense, at least until we'll fix that MDEV about creating partitions under LOCK TABLES. But this doesn't mean LOCK TABLES should create two partitions, I think. If UPDATE/DELETE needs to write into a new partition - you create it, LOCK TABLES or not. But I'd rather say that if it's just created, then it's not quite "almost full" (it's more like "almost empty", even "completely empty"). So there's no need to create a second partition at the same time.
okay. Why is it not happening now?
You've seemed to miss this question.
With a test, I hope :)
Right, may be not quick enough for now :(
Could you describe an example? I thought if there are two tables in a list and both need to add a new partition (like in a multi-update), then backoff will happen on the first one, so open_table() won't reach the second with ot_ctx->vers_create_count > 0.
I see. I generally prefer to avoid thd->clear_error() because of its non discriminative nature. It might be ok here, but then, please add an assert before lock_table_names: DBUG_ASSERT(!m_thd->get_stmt_da()->is_set());
well, we can never be sure after any amount of testing. You're right, the data, indeed, won't be copied in ADD RANGE/LIST partition. But perhaps InnoDB can still open a transaction there? As far as I understand a transaction is started on ha_innobase::external_lock(). Will it happen here? Or start_stmt() under LOCK TABLES. Is that possible? Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi Sergei, On Thu, Apr 8, 2021 at 3:30 PM Sergei Golubchik <serg@mariadb.org> wrote:
It is obvious from the context that we are talking about the number, not partitions themselves. Treat "partitions" as PARTITIONS keyword and the increment is attached to a number right next to it. That's quite a sense, isn't it?
Added.
Yes.
What if you'll also add `&& table_list->updating` to your condition in TABLE::vers_need_hist_part() ?
If `Multiupdate_prelocking_strategy::handle_end()` was not yet executed how `updating` can help? No, it doesn't help.
Right. Fixed. Actually it was bigger bug: it created 1 partition and when reopened created 1 more partition.
Because it is happening at line 1990 in open_table() and vers_set_hist_part() is happening earlier.
Sorry. They are added via reorganize. I.e. manual ADD PARTITIONS does data copy. Added test "Partition overflow error and manual fix" and found pruning boundary bug in 10.3 (see FIXME).
Yes.
open_table() is called again from recover_from_failed_open(). vers_create_count is reset in recover_from_failed_open(): 3306 case OT_ADD_HISTORY_PARTITION: 3307 { 3308 result= false; 3309 TABLE *table= open_ltable(m_thd, m_failed_table, TL_WRITE, 3310 MYSQL_OPEN_HAS_MDL_LOCK | MYSQL_OPEN_IGNORE_LOGGING_FORMAT); .... 3323 vers_create_count= 0; 3324 break; 3325 } 3326 case OT_BACKOFF_AND_RETRY: 3327 case OT_REOPEN_TABLES: 3328 case OT_NO_ACTION: 3329 In our scheme open_table() is called 3 times: 1. before partitions increment; 2. inside OT_ADD_HISTORY_PARTITION; 3. after partitions increment.
Added.
mysql_execute_command() does trans_commit_stmt() and trans_commit_implicit().
-- All the best, Aleksey Midenkov @midenok

Hi, Aleksey! On Apr 09, Aleksey Midenkov wrote:
No, I think it's a meaningless combination of words. But let's ask native speakers, shall we?
Okay. I think I know how to fix it, but let's do it after the main thing is pushed. Users are rather sensitive to cases when a non-updatable table in multi-update gets treated like updatable, we've had a bunch of bug reports about it over the years. Wrong privileges (UPDATE privilege should not be checked), wrong locks (the table should never be write-locked, a concurrent write lock should not block multi-update), triggers (should not be opened, so any problems with triggers, like privileges or missing tables, should not affect multi-update), and so on. So I suppose we'll have to fix this one too. This is how it could be done - vers_need_hist_part should check for table_list->updating. It's always FALSE in multi-update, so it won't be auto-creating new partitions at all for multi-update. But! after handle_end() (or, better, in handle_end()) we'll have another pass over tables and will implement a fallback-and-retry from handle_end(). Let's do it as a separate commit after the feature is pushed.
I'd suggest to rewrite as if (ot_ctx->vers_create_count) /* already tried to add a partition to this table and failed (because of e.g. lock conflict). Don't try again */ else if (table->vers_need_hist_part(thd, table_list)) ... and here your code ... Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi Sergei, On Sat, Apr 10, 2021 at 11:12 PM Sergei Golubchik <serg@mariadb.org> wrote:
The ability to imply things and make special terms is the freedom of any language I suppose. Slang (or "special language") helps us to describe technical entities more concisely.
The comment is wrong. Added correct comment.
Sorry, I had to revert that back. The original form was selected not by coincidence. This data in TABLE is not yet initialized. I also have to revert DBUG_ASSERT() you suggested which fails drop_table_force on winx64-debug and add error code condition instead: @@ -3268,12 +3271,12 @@ Open_table_context::recover_from_failed_open() case OT_DISCOVER: case OT_REPAIR: case OT_ADD_HISTORY_PARTITION: - DBUG_ASSERT(!m_thd->get_stmt_da()->is_set()); result= lock_table_names(m_thd, m_thd->lex->create_info, m_failed_table, NULL, get_timeout(), 0); if (result) { - if (m_action == OT_ADD_HISTORY_PARTITION) + if (m_action == OT_ADD_HISTORY_PARTITION && + m_thd->get_stmt_da()->sql_errno() == ER_LOCK_WAIT_TIMEOUT) { // MDEV-23642 Locking timeout caused by auto-creation affects original DML m_thd->clear_error();
-- All the best, Aleksey Midenkov @midenok

Hi, Aleksey! On Apr 11, Aleksey Midenkov wrote:
:) I feel it's just incorrect use of words and I shudder every time I see it. True, I know that my feeling of the language is not perfect, I'm not a native speaker. But it works both ways, you're not a native speaker either. So, if you don't want to change to correct English, there's no point in arguing, let's just ask a native speaker.
The comment describes what I've seen in gdb, looking how ot_ctx->vers_create_count could be non-zero. It was your update t1 set x= x + 122 where x = 1 test case, "Locking timeout caused by auto-creation affects original DML" You try to auto-create, it fails because of the lock timeout, you clear_error(), the table is reopened. But because vers_create_count is not reset, it does not try to auto-create again, thus avoiding an infinite loop. What do you thing was wrong there?
What happens on winx64-debug? Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi Sergei, On Sun, Apr 11, 2021 at 7:21 PM Sergei Golubchik <serg@mariadb.org> wrote:
It was a conscious choice. Quantitative characteristic is implied.
Okay, it is partly true. Failing case is the secondary one.
Please look yourself at http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/24345/ste...
-- All the best, Aleksey Midenkov @midenok

Hi, Aleksey! On Apr 11, Aleksey Midenkov wrote:
It was a conscious choice. Quantitative characteristic is implied.
This isn't going anywhere. I've asked Ian (KB technical writer) for his opinion and we'll do what he'll say.
May be, but it's the only case where ot_ctx->vers_create_count>0 there. So if there's a "primary" case too, your tests never trigger it. Add a test for it, please.
Thanks. But either I don't know how to read it correctly or it doesn't have enough info. Could you explain what happens on this builder? What error do you see in stmt_da in recover_from_failed_open() that triggers the assert? Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi Sergei, On Mon, Apr 12, 2021 at 3:14 PM Sergei Golubchik <serg@mariadb.org> wrote:
Of course. And because of that I prefer to write shorter comments.
I've asked Ian (KB technical writer) for his opinion and we'll do what he'll say.
So he says it is not meaningless. I am and was upset by your disagreement with my taste. If you are totally against the form I use I'm going to remove the comments. But when we are talking about public documents of course I'm going to make it better understandable, and he is right regarding that particular place.
Sorry, my bad, vers_create_count was not passed to open_ltable(), there is a different ot_ctx. Added your comment as is.
Why are you asking me to do investigations on your hypotheses out of the task scope? It was just a hypothesis after all! Though you can figure out error code (ER_NO_SUCH_TABLE) from the test case: create table t2(a int not null) engine=archive; flush tables; --error 1 --remove_file $DATADIR/test/t2.frm select * from t2; flush tables; --remove_file $DATADIR/test/t2.ARZ --error ER_NO_SUCH_TABLE select * from t2; There are several tests failing though...
-- All the best, Aleksey Midenkov @midenok

Hi, Aleksey! On Apr 12, Aleksey Midenkov wrote:
I perceived it simply as incorrect, broken English. So I asked Ian's opinion. Also I believe we should avoid using different terminology internally and externally unless there's a good reason to do it. That's all, not a question of a taste.
I meant that you do thd->clear_error(). It's non-discriminative, it clears all errors, no matter what was the error and where it was created. This is not always correct - you, probably, remember a bug with virtual columns when a vcol expression was parsed in the error handling branch and clear_error() removed the original error. Generally, one should avoid thd->clear_error() and push an error handler into the thd. But my hypothesis was that what you're doing here is safe, because there cannot be any error at this point. If my hypothesis is wrong, then you cannot use thd->clear_error(). Or, perhaps, my hypothesis is correct and there should be no error here, meaning some other code is wrong. I asked to be able to differentiate between these two possibilities. If my hypothesis is wrong, you cannot use thd->clear_error(). If my hypothesis is correct, some other code needs fixing.
This is, I suppose, archive discovery. And it's not Win64-specific. Ok, I see now that my hypothesis was indeed wrong, and this method indeed expects an error and has a thd->clear_error() on pretty much every code path. It'd be easier to read if it'd had one single thd->clear_error() at the beginning, but it's beyond the scope of MDEV-17554. Sorry for confusion, this change of yours is fine. I believe that was my last comment on these commits, please tell me when the new branch is ready. This converges rather quickly now, good! Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi, Sergei! On Mon, Apr 12, 2021 at 8:46 PM Sergei Golubchik <serg@mariadb.org> wrote:
I believe you are right in your motivation. But the emphasis in the test is on increment: we are checking the number in PARTITIONS clause. I removed all the comments regarding "increment" from the test.
Yes, I should remember there are no other debug buildbot nodes for a developer. Though there should be.
The new branch is ready.
-- All the best, Aleksey Midenkov @midenok

Responding to the portion of the thread extracted below, "INSERT, INSERT .. SELECT don't increment partitions" is not meaningless, but both "INSERT, INSERT .. SELECT don't increment the number of partitions" and "INSERT, INSERT .. SELECT don't auto-create partitions" are better. It comes down to emphasis. Is the emphasis on the automatic creation, or on the increase in the number? From what I can understand of the topic, the emphasis most rests on automatic creation, so I'd go with "INSERT, INSERT .. SELECT don't auto-create partitions"
participants (3)
-
Aleksey Midenkov
-
Ian Gilfillan
-
Sergei Golubchik