Hi Sergei! The updated branch is cabd0530 bb-10.7-midenok-MDEV-22166 On Wed, Aug 25, 2021 at 9:33 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Aleksey!
It's great that you made it atomic. Frankly, I didn't think of it, MDEV defines the feature as being exactly equivalent to the 3-command sequence, so non-atomic. But, of course, making it atomic makes perfect sense.
See other comments below.
On Aug 23, Aleksey Midenkov wrote:
commit 6768e3a2830 Author: Aleksey Midenkov <midenok@gmail.com> Date: Thu Jul 29 22:54:12 2021 +0300
diff --git a/sql/lex.h b/sql/lex.h index cbf9d9d51b2..55eefc39a7e 100644 --- a/sql/lex.h +++ b/sql/lex.h @@ -240,6 +240,7 @@ SYMBOL symbols[] = { { "EXPLAIN", SYM(DESCRIBE)}, { "EXTENDED", SYM(EXTENDED_SYM)}, { "EXTENT_SIZE", SYM(EXTENT_SIZE_SYM)}, + { "EXTRACT", SYM(EXTRACT_SYM)},
So, you did EXTRACT after all, not MIGRATE?
I did like the original description said but I agree that this should be changed to something better. Moreover EXTRACT parsing is now incorrect, see git show 3b693cd3d9 mysql-test/main/parser.result
ADD/EXTRACT aren't opposites in SQL.
It would be rather consistent to have
ALTER ... ADD PARTITION (...) FROM table_name
and
ALTER ... DROP PARTITION ... TO table_name
except that DROP...TO looks kind of strange, and ADD...FROM does not convey the meaning that the table disappears.
The second issue also applies to IMPORT/EXPORT.
Another option, also kind of strange, but also consistent and makes it clear that the table/partition disappears:
ALTER ... MOVE TABLE table_name TO PARTITION (...) ALTER ... MOVE PARTITION ... TO TABLE table_name
or the same with CHANGE instead of MOVE to reuse an existing keyword.
There is no MOVE symbol and we should not add a new one because of the same compatibility reason: it will not be possible to create table "move" (as well as to work with existing one I suspect). That's why we found MIGRATE as a good candidate And the reasons to have some special keyword, not just "ADD PARTITION ... FROM ..." are: 1. it is not clear whether the source table stays or not (that was the discussion in MDEV-22165); 2. for keyword like MIGRATE it is clear how it works in both directions and no need to remember or guess the opposite keyword; 3. (and most important) is scalability: we may want to expand the syntax for copying the table instead of moving it so instead of MIGRATE we put COPY and that's it: same formula, clear how it works, easy to interchange between MIGRATE and COPY. I've already thought about MOVE as a better keyword but sadly enough it is not reserved. So how about the proposed syntax with MIGRATE?
I'm not saying anything of the above is perfect, and better variants are very much welcome. But pretty much anything is better than ADD/EXTRACT.
{ "FALSE", SYM(FALSE_SYM)}, { "FAST", SYM(FAST_SYM)}, { "FAULTS", SYM(FAULTS_SYM)}, diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy index dc2eb4425f0..a41ea7d3e4a 100644 --- a/sql/sql_yacc.yy +++ b/sql/sql_yacc.yy @@ -7605,6 +7605,25 @@ alter_commands: if (Lex->stmt_alter_table_exchange_partition($6)) MYSQL_YYABORT; } + | EXTRACT_SYM PARTITION_SYM alt_part_name_item + AS TABLE_SYM table_ident have_partitioning + { + LEX *lex= Lex; + lex->first_select_lex()->db= $6->db; + if (lex->first_select_lex()->db.str == NULL && + lex->copy_db_to(&lex->first_select_lex()->db)) + MYSQL_YYABORT; + if (unlikely(check_table_name($6->table.str,$6->table.length, + FALSE)) || + ($6->db.str && unlikely(check_db_name((LEX_STRING*) &$6->db)))) + my_yyabort_error((ER_WRONG_TABLE_NAME, MYF(0), $6->table.str));
good point, LEX::stmt_alter_table_exchange_partition() doesn't seem to do it
+ lex->name= $6->table; + + lex->m_sql_cmd= new (thd->mem_root) Sql_cmd_alter_table(); + if (unlikely(lex->m_sql_cmd == NULL)) + MYSQL_YYABORT; + lex->alter_info.partition_flags|= ALTER_PARTITION_EXTRACT;
why wouldn't you combine the common code from here and EXCHANGE PARTITION? less code duplication and as a bonus EXCHANGE will get proper table/db name checks.
Updated.
+ } ;
remove_partitioning: diff --git a/sql/sql_alter.h b/sql/sql_alter.h index a0f89c28d2a..6aa65d232ca 100644 --- a/sql/sql_alter.h +++ b/sql/sql_alter.h @@ -345,6 +348,8 @@ class Alter_table_ctx /** Indicates that we are altering temporary table */ bool tmp_table;
+ DDL_LOG_STATE *ddl_log_state;
What do you need this for?
Cleaned up this.
+ private: char new_filename[FN_REFLEN + 1]; char new_alias_buff[NAME_LEN + 1]; diff --git a/sql/sql_table.cc b/sql/sql_table.cc index 779d125af70..033d77a827d 100644 --- a/sql/sql_table.cc +++ b/sql/sql_table.cc @@ -10016,7 +10146,7 @@ bool mysql_alter_table(THD *thd, const LEX_CSTRING *new_db, }
// In-place execution of ALTER TABLE for partitioning. - DBUG_RETURN(fast_alter_partition_table(thd, table, alter_info, + DBUG_RETURN(fast_alter_partition_table(thd, table, alter_info, &alter_ctx, create_info, table_list, &alter_ctx.db, &alter_ctx.table_name));
you no longer need to pass alter_ctx.db and alter_ctx.table_name, if you pass the whole alter_ctx down.
Cleaned up this.
diff --git a/include/my_dbug.h b/include/my_dbug.h index e25bfcf28a7..cf969ce5750 100644 --- a/include/my_dbug.h +++ b/include/my_dbug.h @@ -104,6 +104,8 @@ extern int (*dbug_sanity)(void); (_db_keyword_(0,(keyword), 0) ? (a1) : (a2)) #define DBUG_EVALUATE_IF(keyword,a1,a2) \ (_db_keyword_(0,(keyword), 1) ? (a1) : (a2)) +#define DBUG_TRUE_IF(keyword) DBUG_EVALUATE_IF(keyword, 1, 0) +#define DBUG_FALSE_IF(keyword) DBUG_EVALUATE_IF(keyword, 0, 1)
hmm. DBUG_FALSE_IF is not used anywhere, DBUG_TRUE_IF is used once as
(1):
if (DBUG_TRUE_IF("error_extract_partition_00") || unlikely(error= file->ha_rename_table(from_name, to_name)))
and this is typically written as
(2):
if (DBUG_EVALUATE_IF("error_extract_partition_00", 1, error= file->ha_rename_table(from_name, to_name)))
so I see no need for either DBUG_TRUE_IF or DBUG_FALSE_IF, please, remove both
Please, (1) looks much better! Considering that there are no complex expressions for a1 in the code but only 1 or 0, we should use DBUG_TRUE_IF/DBUG_FALSE_IF instead. I can change all entries of DBUG_EVALUATE_IF for that. This is supposed to be an independent refactoring.
#define DBUG_PRINT(keyword,arglist) \ do if (_db_pargs_(__LINE__,keyword)) _db_doprnt_ arglist; while(0)
diff --git a/sql/sql_partition.cc b/sql/sql_partition.cc index 85d880cbbb4..4172a61812e 100644 --- a/sql/sql_partition.cc +++ b/sql/sql_partition.cc @@ -5432,7 +5432,7 @@ uint prep_alter_part_table(THD *thd, TABLE *table, Alter_info *alter_info, } while (++part_count < tab_part_info->num_parts); if (num_parts_found != num_parts_dropped) { - my_error(ER_DROP_PARTITION_NON_EXISTENT, MYF(0), "DROP"); + my_error(ER_DROP_PARTITION_NON_EXISTENT, MYF(0), cmd);
Not part of your patch, I know, but the error message here is
"Error in list of partitions to %-.64s"
shouldn't it say "Partition does not exist" or something?
Why does DROP use such wording then? I don't think here should be different error code, but the code name itself should be something like ER_PARTITION_NOT_EXISTS.
goto err; } if (table->file->is_fk_defined_on_table_or_index(MAX_KEY)) @@ -7268,6 +7420,48 @@ uint fast_alter_partition_table(THD *thd, TABLE *table, if (alter_partition_lock_handling(lpt)) goto err; } + else if (alter_info->partition_flags & ALTER_PARTITION_EXTRACT) + { + if (mysql_write_frm(lpt, WFRM_WRITE_EXTRACTED) || + write_log_drop_shadow_frm(lpt) || + ERROR_INJECT_CRASH("crash_extract_partition_1") || + ERROR_INJECT_ERROR("fail_extract_partition_1") || + mysql_write_frm(lpt, WFRM_WRITE_SHADOW) || + ERROR_INJECT_CRASH("crash_extract_partition_2") || + ERROR_INJECT_ERROR("fail_extract_partition_2") || + wait_while_table_is_used(thd, table, HA_EXTRA_NOT_USED) || + ERROR_INJECT_CRASH("crash_extract_partition_3") || + ERROR_INJECT_ERROR("fail_extract_partition_3") || + write_log_extract_partition(lpt) || + ERROR_INJECT_CRASH("crash_extract_partition_4") || + ERROR_INJECT_ERROR("fail_extract_partition_4") || + alter_close_table(lpt) || + ERROR_INJECT_CRASH("crash_extract_partition_5") || + ERROR_INJECT_ERROR("fail_extract_partition_5") || + alter_partition_extract(lpt) || + ERROR_INJECT_CRASH("crash_extract_partition_7") || + ERROR_INJECT_ERROR("fail_extract_partition_7") || + (frm_install= TRUE, FALSE) || + mysql_write_frm(lpt, WFRM_INSTALL_SHADOW|WFRM_BACKUP_ORIGINAL) || + log_partition_alter_to_ddl_log(lpt) || + (frm_install= FALSE, FALSE) || + ERROR_INJECT_CRASH("crash_extract_partition_8") || + ERROR_INJECT_ERROR("fail_extract_partition_8") || + (ddl_log_complete(lpt->part_info), false) ||
as far as I understand this ^^^ should be done *after* binlogging. I suspect that if you crash here, blnlog won't get this ALTER TABLE.
Right, I fixed that by reordering calls and using XID.
+ ((!thd->lex->no_write_to_binlog) && + (write_bin_log(thd, FALSE, + thd->query(), thd->query_length()), FALSE)) ||
if you crash here, what will roll forward on recovery and drop the backup? The log is already closed here, isn't it?
I don't understand why you need a backup. Why not to install shadow after the binlog was written?
If we install shadow after binlog was written and before ddl_log_complete() and it crashes after binlog was written and before ddl_log_complete() the chain will be closed via XID and we are left with (or without) original frm and without installed frm. Let's imagine we do not close chain via XID, then it is replayed back and we are left with master-slave inconsistency. There are more examples of how it can crash and how we can try to recover, but the simplest and most reliable implementation is via WFRM_BACKUP_ORIGINAL. If you want to discuss more, let's do that in chat.
+ mysql_write_frm(lpt, WFRM_DROP_BACKUP) || + ERROR_INJECT_CRASH("crash_extract_partition_9") || + ERROR_INJECT_ERROR("fail_extract_partition_9")) + { + (void) ddl_log_revert(thd, lpt->part_info); + handle_alter_part_error(lpt, true, true, frm_install); + goto err; + } + if (alter_partition_lock_handling(lpt)) + goto err; + } else if ((alter_info->partition_flags & ALTER_PARTITION_ADD) && (part_info->part_type == RANGE_PARTITION || part_info->part_type == LIST_PARTITION)) diff --git a/mysql-test/include/lcase_names.inc b/mysql-test/include/lcase_names.inc new file mode 100644 index 00000000000..e69de29bb2d
better put here some comment, like in other dummy *.inc files see e.g. platform.inc or protocol.inc
Done.
diff --git a/mysql-test/suite/atomic/alter_partition,aria.rdiff b/mysql-test/suite/atomic/alter_partition,aria.rdiff new file mode 100644 index 00000000000..20c0510c5ed --- /dev/null +++ b/mysql-test/suite/atomic/alter_partition,aria.rdiff @@ -0,0 +1,219 @@ +--- alter_partition.result ++++ alter_partition,aria.reject +@@ -1,4 +1,4 @@ +-set @@default_storage_engine=MyISAM; ++set @@default_storage_engine=Aria; + # Crash recovery + create or replace procedure prepare_table() + begin +@@ -13,12 +13,12 @@ + end $ + # QUERY: ALTER TABLE t1 EXTRACT PARTITION p1 AS TABLE tp1 + # CRASH POINT: ddl_log_create_before_create_frm +-t1#P#p0.MYD +-t1#P#p0.MYI +-t1#P#p1.MYD
may be better `replace_result MAI MYI MAD MYD` ? if that's the only difference. To avoid big rdiff files.
Done.
+-t1#P#p1.MYI +-t1#P#pn.MYD +-t1#P#pn.MYI ++t1#P#p0.MAD ++t1#P#p0.MAI
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
-- All the best, Aleksey Midenkov @midenok