Re: [Maria-developers] 9bf4b92cbc5: MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT

Hi, Aleksey! On Jun 29, Aleksey Midenkov wrote:
I still think that AUTO_INCREMENT here looks rather out-of-place, I don't know how to explain to users why AUTO_INCREMENT is allowed here. But if we'll have it, we'll have to support it here for a long time for compatibility reasons. "Because somebody might be already using it" I'd suggest to support only AUTO here. Or, if the parser can do it easily, a much more readable syntax would be create table t1 (x int) with system versioning auto partition by system_time interval 1 hour; this would be the best, easy to read, matches the documentation (where it could be called "auto partitioning"). If it won't cause any difficult parser problems, I'd prefer us to use that.
INSERT ... ON DUPLICATE KEY UPDATE ?
Why is that? I'd expect your auto-adding of partitions to happen in the context of a session, using session time zone. Is it for replication?
good riddance, it didn't make much sense anyway
could you also, please, fix (in a separate commit) ha_partition::part_records() to take a proper partition_element* argument, not void* ?
let's not all it auto_inc. it'll add noise to every grep for auto_inc issues. What about auto_add ?
Questionable. What does it solve?
but here you don't "reserve at least one history partition" Why INTERVAL is different?
Is it needed? You've just started using a new Diagnostics_area, the status should be DA_EMPTY here, nothing to overwrite.
Is it needed? You've just started using a new Diagnostics_area, the status should be DA_EMPTY here.
better use StringBuffer<...> query; because you're doing lots of reallocs now.
this is wrong, the table and db names can have backticks inside. use append_identifier() instead.
Should it be binlogged at all? May be just leave it to the slave to auto-add the partition as needed? Looks a bit suspicious now, you log an ALTER TABLE possibly in the middle of a transaction. It will be replayed differently, not as auto-adding, in particular, it will commit. So, possibly different results on the slave, perhaps different gtid. Do clear it out with Andrei please. Or don't binlog auto-adding at all.
1. why would this operation register a transaction? 2. you amended a couple of checks like - if (thd->in_sub_stmt) + if (thd->in_sub_stmt & ~SUB_STMT_AUTO_HIST) but what will happen if it's, for example, an insert inside a trigger? So a real sub_stmt and SUB_STMT_AUTO_HIST?
Are you sure the approximate value is enough here? You use it to estimate how many partitions to create. It's not a big deal if you create more than necessary (although it's not nice and we'll definitely get bug reports if that will happen). But you surely don't want to create less.
lt() and ge() don't seem to be used anywhere.
ALTER TABLE works under LOCK TABLES, so this should be possible too, but let's have it as a separate task.
this is quite complex condition, better not to duplicate it. May be something like bool need_set_hist_part(TABLE_LIST *table_list, enum_sql_command sql_command) { if (table_list->table->part_info && table_list->table->part_info->part_type == VERSIONING_PARTITION && !table_list->vers_conditions.delete_history && table_list->lock_type >= TL_WRITE_ALLOW_WRITE && table_list->mdl_request.type >= MDL_SHARED_WRITE && table_list->mdl_request.type < MDL_EXCLUSIVE) { switch(sql_command) { case SQLCOM_LOCK_TABLES: case SQLCOM_DELETE: case SQLCOM_UPDATE: case SQLCOM_REPLACE: case SQLCOM_REPLACE_SELECT: case SQLCOM_DELETE_MULTI: case SQLCOM_UPDATE_MULTI: return true; } default:; } return false; }
There's no MYSQL_UNBIND_TABLE/tc_release_table after other request_backoff_action invocations. Why this one is special?
this doesn't seem to be used anywhere
I'm confused. I thought that ALTER_PARTITION_AUTO_HIST is what you set in vers_add_auto_parts() to mark auto-adding of a new partition. But here you set it in the parser, when a user specifies AUTO in partition specifications. So it seems you're using this flag for two very different purposes. How do you distinguish between them? And why did you decide to do it this way?
can it even happen? Or should it be DBUG_ASSERT(0) here?
this !ALTER_PARTITION_AUTO_HIST - do you mean not vers_add_auto_parts() or not sql statement that uses AUTO ?
Sorry, I don't understand that if() at all. What does it do?
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi Sergei! On Wed, Jul 1, 2020 at 12:26 AM Sergei Golubchik <serg@mariadb.org> wrote:
history partitioned by INTERVAL/LIMIT
Sorry, that was just the wrong message. I already removed the AUTO_INCREMENT keyword.
Besides the problems of implementation this probably is not correct logically. AUTO keyword is applicable only to SYSTEM_TIME partitioning so it should go after SYSTEM_TIME keyword, but is also applicable only to rotated partitions so it should go after rotation keywords as well.
Added.
This is for the "Concurrent DML" test case. New connections do not get current session time_zone.
Is it for replication?
select ifnull(max(transaction_id), 0) into @start_trx_id from
mysql.transaction_registry; partition (`p1`) is out of LIMIT, need more HISTORY partitions
Fixed.
part_id)); partition_info::has_unique_name(partition_element *element)
Since auto-partitioning is already there (HA_USE_AUTO_PARTITION) auto_add or auto_hist seem to be the good choice. I prefer auto_hist for more information.
This is for the LOCK TABLES when the history partition is almost full: -- source include/have_partition.inc create or replace table t1 (x int) with system versioning partition by system_time limit 10 auto; insert into t1 values (1), (2), (3), (4), (5), (6), (7), (8), (9); update t1 set x= x + 10; lock tables t1 write; update t1 set x= 1 where x = 11; update t1 set x= 2 where x = 12; update t1 set x= 3 where x = 13; unlock tables; select count(x) from t1 partition (p0); show create table t1; drop tables t1; Imagine 1000000 instead of 10. Locked commands operate on a small subset of records. Nevertheless they are condemned to end in the wrong partition if we didn't reserve one at the LOCK TABLES (we cannot add new partitions while the table is locked). That of course works only if LOCK TABLES is short enough to not fill the whole partition (which is expected to be a normal scenario). We will not need this when we auto-add under LOCK TABLES.
Now as you asked I understand that INTERVAL is not different. Remade this condition for both clauses and limited it to LOCK TABLES-only.
vers_info->hist_part->range_value); thd->m_reprepare_observer;
I rethought my idea of not failing the main command and removed new_stmt_da. Better and simpler is to fail the command if alter fails.
TABLE_REF_BASE_TABLE); thd->variables.lock_wait_timeout))
Reworked according to the above.
+
NULL, table->part_info->next_part_no(num_parts)))
This is for RBR where auto-add didn't work. Reworked via solution of fixing the slave side.
&create_info,
Transaction is registered here: #0 0x0000000000c75288 in Ha_trx_info::register_ha (this=0x7ff8040034b8, trans=0x7ff804004320, ht_arg=0x2d99778) at /home/midenok/src/mariadb/10.5/src/sql/handler.h:1918 #1 0x0000000000c5975a in trans_register_ha (thd=0x7ff804000d48, all=false, ht_arg=0x2d99778, trxid=422179462451680) at /home/midenok/src/mariadb/10.5/src/sql/handler.cc:1322 #2 0x0000000001111d44 in innobase_register_trx (hton=0x2d99778, thd=0x7ff804000d48, trx=0x7ff850e411e0) at /home/midenok/src/mariadb/10.5/src/storage/innobase/handler/ha_innodb.cc:2608 #3 0x000000000112f9fd in ha_innobase::external_lock (this=0x7ff804b243c0, thd=0x7ff804000d48, lock_type=1) at /home/midenok/src/mariadb/10.5/src/storage/innobase/handler/ha_innodb.cc:15524 #4 0x0000000000c5f38e in handler::ha_external_lock (this=0x7ff804b243c0, thd=0x7ff804000d48, lock_type=1) at /home/midenok/src/mariadb/10.5/src/sql/handler.cc:6660 #5 0x000000000104f19f in ha_partition::external_lock (this=0x7ff804ada670, thd=0x7ff804000d48, lock_type=1) at /home/midenok/src/mariadb/10.5/src/sql/ha_partition.cc:4026 #6 0x0000000000c5f32c in handler::ha_external_lock (this=0x7ff804ada670, thd=0x7ff804000d48, lock_type=1) at /home/midenok/src/mariadb/10.5/src/sql/handler.cc:6660 #7 0x0000000000de5b91 in lock_external (thd=0x7ff804000d48, tables=0x7ff8040160a0, count=1) at /home/midenok/src/mariadb/10.5/src/sql/lock.cc:393 #8 0x0000000000de56c3 in mysql_lock_tables (thd=0x7ff804000d48, sql_lock=0x7ff804016080, flags=131104) at /home/midenok/src/mariadb/10.5/src/sql/lock.cc:338 #9 0x0000000000de47b2 in mysql_lock_tables (thd=0x7ff804000d48, tables=0x7ff8040158e8, count=1, flags=131104) at /home/midenok/src/mariadb/10.5/src/sql/lock.cc:301 #10 0x00000000007d487e in open_ltable (thd=0x7ff804000d48, table_list=0x7ff8040157f0, lock_type=TL_WRITE, lock_flags=131104) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:5187 #11 0x00000000007d3349 in Open_table_context::recover_from_failed_open (this=0x7ff84805a6e8) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:3258 #12 0x00000000007d5623 in open_tables (thd=0x7ff804000d48, options=..., start=0x7ff84805b2a8, counter=0x7ff84805b25c, flags=0, prelocking_strategy=0x7ff84805a7a8) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:4367 #13 0x00000000007c7ac6 in open_tables (thd=0x7ff804000d48, tables=0x7ff84805b2a8, counter=0x7ff84805b25c, flags=0) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.h:477 #14 0x00000000009d9c13 in mysql_update (thd=0x7ff804000d48, table_list=0x7ff804014b68, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7ff84805c178, updated_return=0x7ff84805c170) at /home/midenok/src/mariadb/10.5/src/sql/sql_update.cc:401 #15 0x000000000089bc0a in mysql_execute_command (thd=0x7ff804000d48) at /home/midenok/src/mariadb/10.5/src/sql/sql_parse.cc:4247 #16 0x0000000000892f56 in mysql_parse (thd=0x7ff804000d48, rawbuf=0x7ff804014a90 "update t1 set x= x + 1", length=22, parser_state=0x7ff84805d598) at /home/midenok/src/mariadb/10.5/src/sql/sql_parse.cc:7837
This is LTM_PRELOCKED mode where we can't handle recover_from_failed_open() yet (MDEV-23639).
So a real sub_stmt and SUB_STMT_AUTO_HIST?
This condition will pass.
new_stmt_da.get_warning_info());
True and that's why I made 30 days a month and 365 days a year: lower value makes more partitions. Probably a student task to make calendar-correct calculations or utilize libc functions.
Removed.
*table_list, Open_table_context *ot_ctx)
Ok, I'd wish it was doable. MDEV-23639 is the task.
*table_list, Open_table_context *ot_ctx)
+ ot_ctx->vers_create_count=
Fixed. table->part_info->vers_set_hist_part(thd, true);
It might be different by the TABLE object state. There is no dependence for request_backoff_action() rather it is needed for further tdc_remove_table() and handler::rebind_psi(). Info: hang in wait_for_refs() If we don't do tc_release_table() on request_backoff_action(). #0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffc4b0f638) at ../sysdeps/unix/sysv/linux/futex-internal.h:80 #1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fffc4b0f588, cond=0x7fffc4b0f610) at pthread_cond_wait.c:508 #2 __pthread_cond_wait (cond=0x7fffc4b0f610, mutex=0x7fffc4b0f588) at pthread_cond_wait.c:638 #3 0x00000000016ac979 in safe_cond_wait (cond=0x7fffc4b0f610, mp=0x7fffc4b0f560, file=0x175364a "/home/midenok/src/mariadb/10.5/src/include/mysql/psi/mysql_thread.h", line=1220) at /home/midenok/src/mariadb/10.5/src/mysys/thr_mutex.c:492 #4 0x0000000000b62814 in inline_mysql_cond_wait (that=0x7fffc4b0f610, mutex=0x7fffc4b0f560, src_file=0x17cc940 "/home/midenok/src/mariadb/10.5/src/sql/table_cache.cc", src_line=1236) at /home/midenok/src/mariadb/10.5/src/include/mysql/psi/mysql_thread.h:1220 #5 0x0000000000b67b56 in TDC_element::wait_for_refs (this=0x7fffc4b0f3c8, my_refs=1) at /home/midenok/src/mariadb/10.5/src/sql/table_cache.cc:1236 #6 0x0000000000b65a50 in tdc_remove_referenced_share (thd=0x7fffc4000d48, share=0x7fffc4ac5550) at /home/midenok/src/mariadb/10.5/src/sql/table_cache.cc:1006 #7 0x0000000000b65f8e in tdc_remove_table (thd=0x7fffc4000d48, db=0x7fffc4015238 "test", table_name=0x7fffc4014b30 "t1") at /home/midenok/src/mariadb/10.5/src/sql/table_cache.cc:1061 #8 0x00000000007d30b1 in Open_table_context::recover_from_failed_open (this=0x7ffff06666e8) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:3235 #9 0x00000000007d55d3 in open_tables (thd=0x7fffc4000d48, options=..., start=0x7ffff06672a8, counter=0x7ffff066725c, flags=0, prelocking_strategy=0x7ffff06667a8) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:4378 #10 0x00000000007c7a36 in open_tables (thd=0x7fffc4000d48, tables=0x7ffff06672a8, counter=0x7ffff066725c, flags=0) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.h:477 #11 0x00000000009d9bc3 in mysql_update (thd=0x7fffc4000d48, table_list=0x7fffc4014b68, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7ffff0668178, updated_return=0x7ffff0668170) at /home/midenok/src/mariadb/10.5/src/sql/sql_update.cc:401 Without MYSQL_UNBIND_TABLE(): #7 0x00007fa00aa23006 in __GI___assert_fail (assertion=0x196de1f "pfs->m_thread_owner == __null", file=0x196d990 "/home/midenok/src/mariadb/10.5/src/storage/perfschema/pfs.cc", line=2007, function=0x196de3d "PSI_table *pfs_rebind_table_v1(PSI_table_share *, const void *, PSI_table *)") at assert.c:101 #8 0x000000000106ec18 in pfs_rebind_table_v1 (share=0x343d540, identity=0x7f9fc8c62e70, table=0x7f9fc83a7700) at /home/midenok/src/mariadb/10.5/src/storage/perfschema/pfs.cc:2007 #9 0x0000000000c5f3ce in handler::rebind_psi (this=0x7f9fc8c62e70) at /home/midenok/src/mariadb/10.5/src/sql/handler.cc:2918 #10 0x00000000007d0d95 in open_table (thd=0x7f9fb8000d48, table_list=0x7f9fb8011508, ot_ctx=0x7f9fffd656e8) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:2019 #11 0x00000000007d6be7 in open_and_process_table (thd=0x7f9fb8000d48, tables=0x7f9fb8011508, counter=0x7f9fffd6625c, flags=0, prelocking_strategy=0x7f9fffd657a8, has_prelocking_list=false, ot_ctx=0x7f9fffd656e8) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:3876 #12 0x00000000007d5530 in open_tables (thd=0x7f9fb8000d48, options=..., start=0x7f9fffd662a8, counter=0x7f9fffd6625c, flags=0, prelocking_strategy=0x7f9fffd657a8) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.cc:4348 #13 0x00000000007c7a36 in open_tables (thd=0x7f9fb8000d48, tables=0x7f9fffd662a8, counter=0x7f9fffd6625c, flags=0) at /home/midenok/src/mariadb/10.5/src/sql/sql_base.h:477 #14 0x00000000009d9bd3 in mysql_update (thd=0x7f9fb8000d48, table_list=0x7f9fb8011508, fields=..., values=..., conds=0x7f9fb80121f8, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7f9fffd67178, updated_return=0x7f9fffd67170) at /home/midenok/src/mariadb/10.5/src/sql/sql_update.cc:401 #15 0x000000000089bbca in mysql_execute_command (thd=0x7f9fb8000d48) at /home/midenok/src/mariadb/10.5/src/sql/sql_parse.cc:4247 #16 0x0000000000892f16 in mysql_parse (thd=0x7f9fb8000d48, rawbuf=0x7f9fb8011410 "update t1 set x= x + 122 where x = 1", length=36, parser_state=0x7f9fffd68598) at /home/midenok/src/mariadb/10.5/src/sql/sql_parse.cc:7837 There is also tc_release_table(): if (table->file->discover_check_version()) { tc_release_table(table); (void) ot_ctx->request_backoff_action(Open_table_context::OT_DISCOVER, table_list); DBUG_RETURN(TRUE); }
Removed.
This is auto-add via ALTER command for RBR replication. Removed as part of RBR rework.
*table, Alter_info *alter_info,
This was for the above sql_yacc changes accordingly. Removed this as well.
tab_part_info->vers_info->interval.is_set())
This is for not auto-adding so it is for both. Removed the second so it is for the first.
{ partition_element *hist_part=
tab_part_info->vers_info->hist_part;
This was for the second only (sql statement) so removed this as well.
-- All the best, Aleksey Midenkov @midenok

Hi, Aleksey! just one comment below, no comments on all your other replies. The issue is in review, shall I take another look or it's on you now? On Sep 01, Aleksey Midenkov wrote:
I think you could just say that "new partitions aren't added under LOCK TABLES" and that's all. A documented limitation for now, lifted later. I don't think there will be a lot of users using LOCK TABLES with system versioned tables and auto-adding of partitions. Probably the main (if not the only) use case would be loading the dump created by mysqldump. And it loads the _whole table_ under LOCK TABLES, so your trick won't help anyway. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi, Aleksey! On Dec 01, Sergei Golubchik wrote:
I think I've never got a reply to this question. Shall I take another look at the bb-10.6-midenok-MDEV-17554 branch or not yet?
On Sep 01, Aleksey Midenkov wrote:
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org

Hi, Sergei! On Wed, Mar 31, 2021 at 5:52 PM Sergei Golubchik <serg@mariadb.org> wrote:
I've updated the branch on top of the latest 10.6. Please go ahead and take a look.
-- All the best, Aleksey Midenkov @midenok
participants (2)
-
Aleksey Midenkov
-
Sergei Golubchik