Hi Oleg, Please find the review below.
commit 20609d374b3a82c46291ee5e87cfadd26de47817 Author: Oleg Smirnov <olernov@gmail.com> Date: Thu Feb 17 22:53:37 2022 +0700
MDEV-26278 Elimination: consider GROUP BY as unique key
Please provide a more verbose description. We should aim for it to provide sufficient info for one to understand what the optimization does and how that is achieved. Some general comments: The patch uses this formatting for multi-line comments:
+ /* GROUP BY expression is considered as a synthetic "unique key" + for the derived table. Add this key as a dependency */
While the coding style and actual MariaDB code would use this: /* GROUP BY expression is considered as a synthetic "unique key" for the derived table. Add this key as a dependency */ Please use this style for all comments. Also, the patch uses the term "synthetic unique key". I think the term is not very good: It can be confused with "synthetic primary key", also I read "synthetic" as something that actually exists, while in our case the key doesn't exist. Would using "Pseudo-key" be better?
diff --git a/sql/table.h b/sql/table.h index 497502b2d06..fd598556cc3 100644 --- a/sql/table.h +++ b/sql/table.h @@ -2872,6 +2872,9 @@ struct TABLE_LIST } }
+ void mark_derived_as_eliminated() { m_is_derived_eliminated= true; } + bool is_derived_eliminated() const { return m_is_derived_eliminated; } + So, TABLE_LIST objects are created non-eliminated and then they are eliminated, and that is forever. TABLE_LIST object has the same lifetime as the statement, that is, it will survive multiple executions of a prepared statement.
On the other hand, Table Elimination optimization is done in eliminate_tables() call which is invoked for every statement execution. This looks like a dangerous mismatch. I'll continue below
private: bool prep_check_option(THD *thd, uint8 check_opt_type); bool prep_where(THD *thd, Item **conds, bool no_where_clause); diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc index 3bf6d9abf57..02ef5cd7ad7 100644 --- a/sql/sql_lex.cc +++ b/sql/sql_lex.cc @@ -11786,7 +11786,7 @@ bool SELECT_LEX_UNIT::explainable() const EXPLAIN/ANALYZE unit, when: (1) if it's a subquery - it's not part of eliminated WHERE/ON clause. (2) if it's a CTE - it's not hanging (needed for execution) - (3) if it's a derived - it's not merged + (3) if it's a derived - it's not merged or eliminated if it's not 1/2/3 - it's some weird internal thing, ignore it */ return item ? @@ -11795,6 +11795,7 @@ bool SELECT_LEX_UNIT::explainable() const derived && derived->derived_result && !with_element->is_hanging_recursive(): // (2) derived ? - derived->is_materialized_derived() : // (3) + derived->is_materialized_derived() && // (3) + !derived->is_derived_eliminated() : false; } I think, there's no need for m_is_derived_eliminated. You can have this check in is_derived_eliminated():
this->derived->table->map & this->outer_select()->join->eliminated_tables Please try doing this.
diff --git a/mysql-test/main/table_elim.result b/mysql-test/main/table_elim.result index deff0623370..842330f3713 100644 --- a/mysql-test/main/table_elim.result +++ b/mysql-test/main/table_elim.result @@ -704,3 +704,252 @@ LIMIT 1; PostID Voted 1 NULL DROP TABLE t1,t2; +# +# MDEV-26278: Table elimination does not work across derived tables +# +create table t1 (a int, b int); +insert into t1 select seq, seq+10 from seq_1_to_10; +create table t11 ( +a int not null, +b int, +key(a) +); +insert into t11 select A.seq, A.seq+B.seq +from +seq_1_to_100 A, +seq_1_to_1000 B;
Do you really need 100K rows table? I think 10K rows will be just as good.
diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc index 8c4720bdec4..fe2155191ca 100644 --- a/sql/opt_table_elimination.cc +++ b/sql/opt_table_elimination.cc ... @@ -278,6 +286,8 @@ class Dep_value_field : public Dep_value Dep_module_key *key_dep; /* Otherwise, this and advance */ uint equality_no; + + Dep_module_synthetic_key *synth_key_dep;
Please add a comment for the above that we should try that one, too. @@ -447,6 +463,55 @@ const size_t Dep_module::iterator_size=
MY_MAX(Dep_module_expr::iterator_size, Dep_module_key::iterator_size);
+/* A synthetic unique key module for a derived table. + For example, a derived table + SELECT t11.a, count(*) from t1 LEFT JOIN t2 ON t1.id = t2.fk GROUP BY t11.a + has unique values in its first field (t11.a) due to GROUP BY expression + so this can be considered as a unique key for this derived table +*/ + +class Dep_module_synthetic_key : public Dep_module +{ +public: + Dep_module_synthetic_key(Dep_value_table *table_arg, + std::vector<field_index_t>&& field_indexes) + : table(table_arg), derived_table_field_indexes(field_indexes) + { + unbound_args= static_cast<uint>(field_indexes.size()); + } + + Dep_value_table * table;
Coding style: "*table". ...
@@ -1603,7 +1674,73 @@ Dep_value_table *Dep_analysis_context::create_table_value(TABLE *table) key_list= &(key_dep->next_table_key); } } - return table_deps[table->tablenr]= tbl_dep; + + auto select_unit= table_list->get_unit(); + SELECT_LEX *first_select= nullptr; + if (select_unit)
Please move the code that creates the pseudo-key into a separate function. ...
+ +int Dep_analysis_context::find_field_in_list(List<Item> &fields_list, + Item *field)
Please add a one-line comment describing what function does.
@@ -1786,6 +1941,18 @@ Dep_module* Dep_value_field::get_next_unbound_module(Dep_analysis_context *dac, } else di->key_dep= NULL; + + Dep_module_synthetic_key *synth_key_dep= di->synth_key_dep; + if (synth_key_dep && !synth_key_dep->is_applicable() && + std::find(synth_key_dep->derived_table_field_indexes.begin(), + synth_key_dep->derived_table_field_indexes.end(), + field->field_index) != std::end(synth_key_dep->derived_table_field_indexes))
Please introduce a method like bool Dep_module_synthetic_key::covers_field(int field_index); Will this also allow to make Dep_module_synthetic_key::derived_table_field_indexes a private member? BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net