New subject: [Maria-developers] Review #2 for: MDEV-26278 Elimination: consider GROUP BY as unique key

19 May 2022

Hi Oleg,

Please find the review below.
...
commit 20609d374b3a82c46291ee5e87cfadd26de47817
Author: Oleg Smirnov <olernov@gmail.com>
Date:   Thu Feb 17 22:53:37 2022 +0700
MDEV-26278 Elimination: consider GROUP BY as unique key
Please provide a more verbose description. We should aim for it to provide
sufficient info for one to understand what the optimization does and how
that is achieved.


Some general comments:

The patch uses this formatting for multi-line comments:
...
+  /* GROUP BY expression is considered as a synthetic "unique key"
+    for the derived table. Add this key as a dependency */
While the coding style and actual MariaDB code would use this:

  /*
    GROUP BY expression is considered as a synthetic "unique key"
    for the derived table. Add this key as a dependency
  */

Please use this style for all comments.


Also, the patch uses the term "synthetic unique key". I think the term
is not very good: It can be confused with "synthetic primary key", also
I read "synthetic" as something that actually exists, while in our case
the key doesn't exist.
Would using "Pseudo-key" be better?
...

diff --git a/sql/table.h b/sql/table.h
index 497502b2d06..fd598556cc3 100644
--- a/sql/table.h
+++ b/sql/table.h
@@ -2872,6 +2872,9 @@ struct TABLE_LIST
     }
   }
+  void mark_derived_as_eliminated() { m_is_derived_eliminated= true; }
+  bool is_derived_eliminated() const { return m_is_derived_eliminated; }
+
So, TABLE_LIST objects are created non-eliminated and then they are
eliminated, and that is forever. TABLE_LIST object has the same lifetime
as the statement, that is, it will survive multiple executions of a prepared
statement.
On the other hand, Table Elimination optimization is done in eliminate_tables() 
call which is invoked for every statement execution.
This looks like a dangerous mismatch. I'll continue below
...
private:
   bool prep_check_option(THD *thd, uint8 check_opt_type);
   bool prep_where(THD *thd, Item **conds, bool no_where_clause);
diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc
index 3bf6d9abf57..02ef5cd7ad7 100644
--- a/sql/sql_lex.cc
+++ b/sql/sql_lex.cc
@@ -11786,7 +11786,7 @@ bool SELECT_LEX_UNIT::explainable() const
     EXPLAIN/ANALYZE unit, when:
     (1) if it's a subquery - it's not part of eliminated WHERE/ON clause.
     (2) if it's a CTE - it's not hanging (needed for execution)
-    (3) if it's a derived - it's not merged
+    (3) if it's a derived - it's not merged or eliminated
     if it's not 1/2/3 - it's some weird internal thing, ignore it
   */
   return item ?
@@ -11795,6 +11795,7 @@ bool SELECT_LEX_UNIT::explainable() const
              derived && derived->derived_result &&
                !with_element->is_hanging_recursive(): // (2)
              derived ?
-               derived->is_materialized_derived() :   // (3)
+               derived->is_materialized_derived() && // (3)
+                 !derived->is_derived_eliminated() :
                false;
 }
I think, there's no need for m_is_derived_eliminated.
You can have this check in is_derived_eliminated():
this->derived->table->map & this->outer_select()->join->eliminated_tables


Please try doing this.
...
diff --git a/mysql-test/main/table_elim.result b/mysql-test/main/table_elim.result
index deff0623370..842330f3713 100644
--- a/mysql-test/main/table_elim.result
+++ b/mysql-test/main/table_elim.result
@@ -704,3 +704,252 @@ LIMIT 1;
 PostID	Voted
 1	NULL
 DROP TABLE t1,t2;
+#
+# MDEV-26278: Table elimination does not work across derived tables
+#
+create table t1 (a int, b int);
+insert into t1 select seq, seq+10 from seq_1_to_10;
+create table t11 (
+a int not null,
+b int,
+key(a)
+);
+insert into t11 select A.seq, A.seq+B.seq
+from
+seq_1_to_100 A,
+seq_1_to_1000 B;
Do you really need 100K rows table? I think 10K rows will be just as good.
...
diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc
index 8c4720bdec4..fe2155191ca 100644
--- a/sql/opt_table_elimination.cc
+++ b/sql/opt_table_elimination.cc
...
@@ -278,6 +286,8 @@ class Dep_value_field : public Dep_value
     Dep_module_key *key_dep;
     /* Otherwise, this and advance */
     uint equality_no;
+
+    Dep_module_synthetic_key *synth_key_dep;
Please add a comment for the above that we should try that one, too.

@@ -447,6 +463,55 @@ const size_t Dep_module::iterator_size=
...
MY_MAX(Dep_module_expr::iterator_size, Dep_module_key::iterator_size);
+/* A synthetic unique key module for a derived table.
+  For example, a derived table
+  SELECT t11.a, count(*) from t1 LEFT JOIN t2 ON t1.id = t2.fk GROUP BY t11.a
+  has unique values in its first field (t11.a) due to GROUP BY expression
+  so this can be considered as a unique key for this derived table
+*/
+
+class Dep_module_synthetic_key : public Dep_module
+{
+public:
+  Dep_module_synthetic_key(Dep_value_table *table_arg,
+                           std::vector<field_index_t>&& field_indexes)
+      : table(table_arg), derived_table_field_indexes(field_indexes)
+  {
+    unbound_args= static_cast<uint>(field_indexes.size());
+  }
+
+  Dep_value_table * table;
Coding style: "*table".

...
...
@@ -1603,7 +1674,73 @@ Dep_value_table *Dep_analysis_context::create_table_value(TABLE *table)
       key_list= &(key_dep->next_table_key);
     }
   }
-  return table_deps[table->tablenr]= tbl_dep;
+
+  auto select_unit= table_list->get_unit();
+  SELECT_LEX *first_select= nullptr;
+  if (select_unit)
Please move the code that creates the pseudo-key into a separate function.

...
...
+
+int Dep_analysis_context::find_field_in_list(List<Item> &fields_list,
+                                             Item *field)
Please add a one-line comment describing what function does.
...
@@ -1786,6 +1941,18 @@ Dep_module* Dep_value_field::get_next_unbound_module(Dep_analysis_context *dac,
   }
   else 
     di->key_dep= NULL;
+
+  Dep_module_synthetic_key *synth_key_dep= di->synth_key_dep;
+  if (synth_key_dep && !synth_key_dep->is_applicable() &&
+    std::find(synth_key_dep->derived_table_field_indexes.begin(),
+      synth_key_dep->derived_table_field_indexes.end(),
+      field->field_index) != std::end(synth_key_dep->derived_table_field_indexes))
Please introduce a method like 

  bool Dep_module_synthetic_key::covers_field(int field_index);

Will this also allow to make
Dep_module_synthetic_key::derived_table_field_indexes a private member?


BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net

    

Re: [Maria-developers] MDEV-26278 Elimination: consider GROUP BY as unique key

Sergey Petrunia

Sergey Petrunia

tags

participants (1)