developers
Threads by month
- ----- 2025 -----
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 6 participants
- 6824 discussions
[Maria-developers] Progress (by Knielsen): improving mysqlbinlog output and doing rename (39)
by worklog-noreply@askmonty.org 17 Aug '09
by worklog-noreply@askmonty.org 17 Aug '09
17 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: improving mysqlbinlog output and doing rename
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 29
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Mon, 17 Aug 2009, 11:13)=-=-
Research and architecture review.
Worked 12 hours and estimate 0 hours remain (original estimate increased by 12 hours).
-=-=(Knielsen - Mon, 17 Aug 2009, 11:13)=-=-
Reported zero hours worked. Estimate unchanged.
-=-=(Psergey - Sun, 16 Aug 2009, 12:07)=-=-
Dependency created: 39 now depends on 49
-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 47
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 45
-=-=(Guest - Fri, 14 Aug 2009, 15:52)=-=-
Title modified.
--- /tmp/wklog.39.old.11123 2009-08-14 15:52:29.000000000 +0300
+++ /tmp/wklog.39.new.11123 2009-08-14 15:52:29.000000000 +0300
@@ -1 +1 @@
-Replication tasks
+improving mysqlbinlog output and doing rename
-=-=(Guest - Mon, 10 Aug 2009, 16:32)=-=-
Adding 1 hour for Monty's initial work on starting the architecture review.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
------------------------------------------------------------
-=-=(View All Progress Notes, 16 total)=-=-
http://askmonty.org/worklog/index.pl?tid=39&nolimit=1
DESCRIPTION:
A combine task for all replication tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): improving mysqlbinlog output and doing rename (39)
by worklog-noreply@askmonty.org 17 Aug '09
by worklog-noreply@askmonty.org 17 Aug '09
17 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: improving mysqlbinlog output and doing rename
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 17
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Mon, 17 Aug 2009, 11:13)=-=-
Reported zero hours worked. Estimate unchanged.
-=-=(Psergey - Sun, 16 Aug 2009, 12:07)=-=-
Dependency created: 39 now depends on 49
-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 47
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 45
-=-=(Guest - Fri, 14 Aug 2009, 15:52)=-=-
Title modified.
--- /tmp/wklog.39.old.11123 2009-08-14 15:52:29.000000000 +0300
+++ /tmp/wklog.39.new.11123 2009-08-14 15:52:29.000000000 +0300
@@ -1 +1 @@
-Replication tasks
+improving mysqlbinlog output and doing rename
-=-=(Guest - Mon, 10 Aug 2009, 16:32)=-=-
Adding 1 hour for Monty's initial work on starting the architecture review.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
-=-=(Psergey - Mon, 10 Aug 2009, 15:59)=-=-
Re-searched and added subtasks.
Worked 16 hours and estimate 0 hours remain (original estimate increased by 16 hours).
------------------------------------------------------------
-=-=(View All Progress Notes, 15 total)=-=-
http://askmonty.org/worklog/index.pl?tid=39&nolimit=1
DESCRIPTION:
A combine task for all replication tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Progress (by Knielsen): improving mysqlbinlog output and doing rename (39)
by worklog-noreply@askmonty.org 17 Aug '09
by worklog-noreply@askmonty.org 17 Aug '09
17 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: improving mysqlbinlog output and doing rename
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 17
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Mon, 17 Aug 2009, 11:13)=-=-
Reported zero hours worked. Estimate unchanged.
-=-=(Psergey - Sun, 16 Aug 2009, 12:07)=-=-
Dependency created: 39 now depends on 49
-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 47
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46
-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 45
-=-=(Guest - Fri, 14 Aug 2009, 15:52)=-=-
Title modified.
--- /tmp/wklog.39.old.11123 2009-08-14 15:52:29.000000000 +0300
+++ /tmp/wklog.39.new.11123 2009-08-14 15:52:29.000000000 +0300
@@ -1 +1 @@
-Replication tasks
+improving mysqlbinlog output and doing rename
-=-=(Guest - Mon, 10 Aug 2009, 16:32)=-=-
Adding 1 hour for Monty's initial work on starting the architecture review.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
-=-=(Psergey - Mon, 10 Aug 2009, 15:59)=-=-
Re-searched and added subtasks.
Worked 16 hours and estimate 0 hours remain (original estimate increased by 16 hours).
------------------------------------------------------------
-=-=(View All Progress Notes, 15 total)=-=-
http://askmonty.org/worklog/index.pl?tid=39&nolimit=1
DESCRIPTION:
A combine task for all replication tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Rev 2730: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 16 Aug '09
by Sergey Petrunya 16 Aug '09
16 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r10/
------------------------------------------------------------
revno: 2730
revision-id: psergey(a)askmonty.org-20090816180159-z3lfkjpjfsm7zbp0
parent: psergey(a)askmonty.org-20090816143547-16hyle50tbt31xen
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sun 2009-08-16 21:01:59 +0300
message:
MWL#17: Table elimination
- More comments
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-16 14:35:47 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-16 18:01:59 +0000
@@ -58,7 +58,7 @@
Table elimination is redone on every PS re-execution.
- IMPLEMENTATION
+ TABLE ELIMINATION ALGORITHM
As said above, we can remove inner side of an outer join if it is
@@ -67,23 +67,59 @@
We check #1 by doing a recursive descent down the join->join_list while
maintaining a union of used_tables() attribute of all expressions we've seen
- "elsewhere". When we encounter an outer join, we check if the bitmap of
- tables on its inner side has intersection with tables that are used
- elsewhere. No intersection means that inner side of the outer join could
+ "elsewhere". When we encounter an outer join, we check if the bitmap of
+ tables on its inner side has an intersection with tables that are used
+ elsewhere. No intersection means that inner side of the outer join could
potentially be eliminated.
- #2 is checked using a concept of values and modules that indicate
- dependencies between them.
-
- We start with
- of certain values that functional dependencies between
- them. There are two kinds of values:
-*/
-
-/*
- A value
-
- functional dependencies between two kinds of entities:
+ In order to check #2, one needs to prove that inner side of an outer join
+ is functionally dependent on the outside. We prove dependency by proving
+ functional dependency of intermediate objects:
+
+ - Inner side of outer join is functionally dependent when each of its tables
+ are functionally dependent. (We assume a table is functionally dependent
+ when its dependencies allow to uniquely identify one table record, or no
+ records).
+
+ - Table is functionally dependent when it has got a unique key whose columns
+ are functionally dependent.
+
+ - A column is functionally dependent when we could locate an AND-part of a
+ certain ON clause in form
+
+ tblX.columnY= expr
+
+ where expr is functionally-depdendent.
+
+ Apparently the above rules can be applied recursively. Also, certain entities
+ depend on multiple other entities. We model this by a bipartite graph which
+ has two kinds of nodes:
+
+ Value nodes:
+ - Table column values (each is a value of tblX.columnY)
+ - Table nodes (each node represents a table inside an eliminable join nest).
+ each value is either bound (i.e. functionally dependent) or not.
+
+ Module nodes:
+ - Nodes representing tblX.colY=expr equalities. Equality node has
+ = incoming edges from columns used in expr
+ = outgoing edge to tblX.colY column.
+ - Nodes representing unique keys. Unique key has
+ = incoming edges from key component value nodes
+ = outgoing edge to key's table node
+ - Inner side of outer join node. Outer join node has
+ = incoming edges from table value nodes
+ = No outgoing edges. Once we reach it, we know we can eliminate the
+ outer join.
+ A module may depend on multiple values, and hence its primary attribute is
+ the number of its depedencies that are not bound.
+
+ The algorithm starts with equality nodes that don't have any incoming edges
+ (their expressions are either constant or depend only on tables that are
+ outside of any outer joins) and proceeds to traverse dependency->dependant
+ edges until we've other traversed everything (TODO rephrase elaborate), or
+ we've reached the point where all outer join modules have zero unsatisfied
+ dependencies.
*/
class Value_dep;
1
0
Hello Michael,
Looks like I've overlooked this email back then. :( Peter pinged
me about Sphinx vs Maria status recently and I just found it. Well,
hopefully better late than never!
Sunday, June 7, 2009, 1:18:30 PM, you wrote:
MW> Andrew, what are the possible drawbacks you can see with having
MW> Sphinx to be a part of MairaDB for a user that is not using Sphinx?
Can't think of any. SphinxSE is a mere client and as such does not
allocate any big RAM buffers or other resources.
MW> I assume that if Sphinx is not enabled, it will not take any
MW> resources.
MW> If Sphinx is enabled but not used, what are the resorces it would use?
Pretty much none, AFAIK.
---
On an unrelated note, we're working on so called RT backend here, and
it will allow all the normal CRUD operations in run time (as opposed to
only having reads against a static fulltext index that we have now).
When it's done it'll be also technically possible to integrate it too
- and do so tighter by embedding the library instead of just talking to
Sphinx searchd over network.
Sphinx searchd can now talk MySQL protocol and supports basic SQL
syntax. So for "just" full text tasks end users don't really need
the integrated version.
However it still might possibly be useful in certain use cases.
To keep FT index in (better) sync with DB data, avoid overheads
of double network roundtrips for additional processing, avoid hassles
of keeping two connections and manually managing two open
transactions, etc.
So I wonder what'd be your opinion about the integration - whether
it seems useful at all, and if yes, whether network client or embedded
library route seems better.
--
Best regards,
Andrew mailto:shodan@shodan.ru
5
14
[Maria-developers] Rev 2729: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 16 Aug '09
by Sergey Petrunya 16 Aug '09
16 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r10/
------------------------------------------------------------
revno: 2729
revision-id: psergey(a)askmonty.org-20090816143547-16hyle50tbt31xen
parent: psergey(a)askmonty.org-20090816124331-gd53m2alc0jb3ws4
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sun 2009-08-16 17:35:47 +0300
message:
MWL#17: Table elimination
- Better comments
- More OOM checks
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-16 12:43:31 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-16 14:35:47 +0000
@@ -19,6 +19,25 @@
/*
OVERVIEW
+ This file contains table elimination module. The idea behind table
+ elimination is as follows: suppose we have a left join
+
+ SELECT * FROM t1 LEFT JOIN
+ (t2 JOIN t3) ON t3.primary_key=t1.col AND
+ t4.primary_key=t2.col
+ such that
+ * columns of the inner tables are not used anywhere ouside the outer join
+ (not in WHERE, not in GROUP/ORDER BY clause, not in select list etc etc),
+ * inner side of the outer join is guaranteed to produce at most one matching
+ record combination for each record combination of outer tables.
+
+ then the inner side of the outer join can be removed from the query, as it
+ will always produce only one record combination (either real or
+ null-complemented one) and we don't care about what that record combination
+ is.
+
+ MODULE INTERFACE
+
The module has one entry point - eliminate_tables() function, which one
needs to call (once) at some point before the join optimization.
eliminate_tables() operates over the JOIN structures. Logically, it
@@ -38,6 +57,50 @@
by EXPLAIN code to check if the subquery should be shown in EXPLAIN.
Table elimination is redone on every PS re-execution.
+
+ IMPLEMENTATION
+
+ As said above, we can remove inner side of an outer join if it is
+
+ 1. not referred to from any other parts of the query
+ 2. always produces one matching record combination.
+
+ We check #1 by doing a recursive descent down the join->join_list while
+ maintaining a union of used_tables() attribute of all expressions we've seen
+ "elsewhere". When we encounter an outer join, we check if the bitmap of
+ tables on its inner side has intersection with tables that are used
+ elsewhere. No intersection means that inner side of the outer join could
+ potentially be eliminated.
+
+ #2 is checked using a concept of values and modules that indicate
+ dependencies between them.
+
+ We start with
+ of certain values that functional dependencies between
+ them. There are two kinds of values:
+*/
+
+/*
+ A value
+
+ functional dependencies between two kinds of entities:
+*/
+
+class Value_dep;
+ class Field_value;
+ class Table_value;
+
+
+class Module_dep;
+ class Equality_module;
+ class Outer_join_module;
+ class Key_module;
+
+class Table_elimination;
+
+
+/*
+ A value.
*/
class Value_dep : public Sql_alloc
@@ -55,13 +118,9 @@
Value_dep *next;
};
-class Field_value;
-class Table_value;
-class Outer_join_module;
-class Key_module;
/*
- A table field. There is only one such object for any tblX.fieldY
+ A table field value. There is exactly only one such object for any tblX.fieldY
- the field epends on its table and equalities
- expressions that use the field are its dependencies
*/
@@ -87,7 +146,8 @@
/*
- A table.
+ A table value. There is one Table_value object for every table that can
+ potentially be eliminated.
- table depends on any of its unique keys
- has its fields and embedding outer join as dependency.
*/
@@ -221,6 +281,7 @@
MY_BITMAP expr_deps;
};
+
static
bool build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps,
uint *and_level, Item *cond,
@@ -244,6 +305,7 @@
#ifndef DBUG_OFF
static void dbug_print_deps(Table_elimination *te);
#endif
+
/*******************************************************************************************/
/*
@@ -538,8 +600,8 @@
static
bool add_eq_dep(Table_elimination *te, Equality_module **eq_dep,
- uint and_level, Item_func *cond,
- Item *left, Item *right, table_map usable_tables)
+ uint and_level, Item_func *cond, Item *left, Item *right,
+ table_map usable_tables)
{
if ((left->used_tables() & usable_tables) &&
!(right->used_tables() & RAND_TABLE_BIT) &&
@@ -565,7 +627,6 @@
}
}
- /* Store possible eq field */
(*eq_dep)->type= Module_dep::MODULE_EXPRESSION; //psergey-todo;
if (!((*eq_dep)->field= get_field_value(te, field)))
return TRUE;
@@ -651,7 +712,8 @@
table_map deps_map)
{
Outer_join_module *oj_dep;
- oj_dep= new Outer_join_module(outer_join, my_count_bits(deps_map));
+ if (!(oj_dep= new Outer_join_module(outer_join, my_count_bits(deps_map))))
+ return NULL;
te->n_outer_joins++;
/*
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-08-13 21:12:12 +0000
+++ b/sql/sql_select.cc 2009-08-16 14:35:47 +0000
@@ -8967,20 +8967,6 @@
JOIN *join= last->join;
while (last_emb)
{
- /*
- psergey-elim: (nevermind)
- new_prefix= cur_prefix & ~last;
- if (!(new_prefix & cur_table_map)) // removed last inner table
- {
- join->cur_embedding_map&= ~last_emb->nested_join->nj_map;
- }
- else (current)
- {
- // Won't hurt doing it all the time:
- join->cur_embedding_map |= ...;
- }
- else
- */
if (!(--last_emb->nested_join->counter))
join->cur_embedding_map&= ~last_emb->nested_join->nj_map;
else if (last_emb->nested_join->n_tables-1 ==
1
0
[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 16 Aug '09
by worklog-noreply@askmonty.org 16 Aug '09
16 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sun, 16 Aug 2009, 17:11)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.27162 2009-08-16 17:11:12.000000000 +0300
+++ /tmp/wklog.36.new.27162 2009-08-16 17:11:12.000000000 +0300
@@ -13,6 +13,8 @@
statement refers to tables in current database, so that changing the current
database will make the statement to work on a table in a different database).
+See also MySQL BUG#42941.
+
What we could do
----------------
-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13035 2009-08-10 15:41:51.000000000 +0300
+++ /tmp/wklog.36.new.13035 2009-08-10 15:41:51.000000000 +0300
@@ -1,5 +1,7 @@
Context
-------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
-=-=(Guest - Mon, 10 Aug 2009, 11:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.6580 2009-08-10 11:12:36.000000000 +0300
+++ /tmp/wklog.36.new.6580 2009-08-10 11:12:36.000000000 +0300
@@ -1,4 +1,3 @@
-
Context
-------
At the moment, the server has a replication slave option
@@ -67,6 +66,6 @@
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
-mysqlbinlog (adding a comment is easy and doesn't require use to parse the
-statement).
+mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
+parse the statement).
-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@
+Context
+-------
+At the moment, the server has a replication slave option
+
+ --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+ statement refers to tables in current database, so that changing the current
+ database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+ Table_map_log_event) as "use current database". Binlog slave thread
+ probably should not allow such events as it doesn't have a natural current
+ database.
+- Add a mysqlbinlog --strip-db option that would
+ = not produce any "USE dbname" statements
+ = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+ /* !database-alias(oldname,newname) */
+
+ and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+ mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on binlog producer.
If one is using statement-based replication, he can achieve this by grepping
out "USE dbname" statements out of the output of mysqlbinlog(*). With
row-based replication this is no longer possible, as database name is encoded
within the the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow to change
the names of used databases in both RBR and SBR events.
(*) this implies that all statements refer to tables in the current database,
doesn't catch updates made inside stored functions and so forth, but still
works for a practially-important subset of cases.
HIGH-LEVEL SPECIFICATION:
Context
-------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
the option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
statement refers to tables in current database, so that changing the current
database will make the statement to work on a table in a different database).
See also MySQL BUG#42941.
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as replication slave would process --replicate-rewrite-db option.
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a databasename. It is not possible to have RBR
event stream that won't mention which database the events are for. When I
tried to use debugger and specify empty database name, attempt to apply the
binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret empty database name in RBR event (i.e. in a
Table_map_log_event) as "use current database". Binlog slave thread
probably should not allow such events as it doesn't have a natural current
database.
- Add a mysqlbinlog --strip-db option that would
= not produce any "USE dbname" statements
= change databasename for all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(this will have the usual limitations that we assume that all statements in
the binlog refer to the current database).
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize special form of comments
/* !database-alias(oldname,newname) */
and save the mapping somewhere
- Put the hooks in table open and name resolution code to use the saved
mapping.
Once we've done the above, it will be easy to perform a complete,
no-compromise or restrictions database name change in binary log.
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
parse the statement).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 16 Aug '09
by worklog-noreply@askmonty.org 16 Aug '09
16 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sun, 16 Aug 2009, 17:11)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.27162 2009-08-16 17:11:12.000000000 +0300
+++ /tmp/wklog.36.new.27162 2009-08-16 17:11:12.000000000 +0300
@@ -13,6 +13,8 @@
statement refers to tables in current database, so that changing the current
database will make the statement to work on a table in a different database).
+See also MySQL BUG#42941.
+
What we could do
----------------
-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13035 2009-08-10 15:41:51.000000000 +0300
+++ /tmp/wklog.36.new.13035 2009-08-10 15:41:51.000000000 +0300
@@ -1,5 +1,7 @@
Context
-------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
-=-=(Guest - Mon, 10 Aug 2009, 11:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.6580 2009-08-10 11:12:36.000000000 +0300
+++ /tmp/wklog.36.new.6580 2009-08-10 11:12:36.000000000 +0300
@@ -1,4 +1,3 @@
-
Context
-------
At the moment, the server has a replication slave option
@@ -67,6 +66,6 @@
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
-mysqlbinlog (adding a comment is easy and doesn't require use to parse the
-statement).
+mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
+parse the statement).
-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@
+Context
+-------
+At the moment, the server has a replication slave option
+
+ --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+ statement refers to tables in current database, so that changing the current
+ database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+ Table_map_log_event) as "use current database". Binlog slave thread
+ probably should not allow such events as it doesn't have a natural current
+ database.
+- Add a mysqlbinlog --strip-db option that would
+ = not produce any "USE dbname" statements
+ = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+ /* !database-alias(oldname,newname) */
+
+ and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+ mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on binlog producer.
If one is using statement-based replication, he can achieve this by grepping
out "USE dbname" statements out of the output of mysqlbinlog(*). With
row-based replication this is no longer possible, as database name is encoded
within the the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow to change
the names of used databases in both RBR and SBR events.
(*) this implies that all statements refer to tables in the current database,
doesn't catch updates made inside stored functions and so forth, but still
works for a practially-important subset of cases.
HIGH-LEVEL SPECIFICATION:
Context
-------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
the option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
statement refers to tables in current database, so that changing the current
database will make the statement to work on a table in a different database).
See also MySQL BUG#42941.
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as replication slave would process --replicate-rewrite-db option.
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a databasename. It is not possible to have RBR
event stream that won't mention which database the events are for. When I
tried to use debugger and specify empty database name, attempt to apply the
binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret empty database name in RBR event (i.e. in a
Table_map_log_event) as "use current database". Binlog slave thread
probably should not allow such events as it doesn't have a natural current
database.
- Add a mysqlbinlog --strip-db option that would
= not produce any "USE dbname" statements
= change databasename for all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(this will have the usual limitations that we assume that all statements in
the binlog refer to the current database).
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize special form of comments
/* !database-alias(oldname,newname) */
and save the mapping somewhere
- Put the hooks in table open and name resolution code to use the saved
mapping.
Once we've done the above, it will be easy to perform a complete,
no-compromise or restrictions database name change in binary log.
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
parse the statement).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 16 Aug '09
by worklog-noreply@askmonty.org 16 Aug '09
16 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: 9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=-
Category updated.
--- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300
+++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-Sprint
-=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=-
Version updated.
--- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300
+++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Wed, 29 Jul 2009, 21:41)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.26011 2009-07-29 21:41:04.000000000 +0300
+++ /tmp/wklog.17.new.26011 2009-07-29 21:41:04.000000000 +0300
@@ -2,163 +2,146 @@
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
-1. Conditions for removal
-1.1 Quick check if there are candidates
-2. Removal operation properties
-3. Removal operation
-4. User interface
-5. Tests and benchmarks
-6. Todo, issues to resolve
-6.1 To resolve
-6.2 Resolved
-7. Additional issues
+1. Elimination criteria
+2. No outside references check
+2.1 Quick check if there are tables with no outside references
+3. One-match check
+3.1 Functional dependency source #1: Potential eq_ref access
+3.2 Functional dependency source #2: col2=func(col1)
+3.3 Functional dependency source #3: One or zero records in the table
+3.4 Functional dependency check implementation
+3.4.1 Equality collection: Option1
+3.4.2 Equality collection: Option2
+3.4.3 Functional dependency propagation - option 1
+3.4.4 Functional dependency propagation - option 2
+4. Removal operation properties
+5. Removal operation
+6. User interface
+6.1 @@optimizer_switch flag
+6.2 EXPLAIN [EXTENDED]
+7. Miscellaneous adjustments
+7.1 Fix used_tables() of aggregate functions
+7.2 Make subquery predicates collect their outer references
+8. Other concerns
+8.1 Relationship with outer->inner joins converter
+8.2 Relationship with prepared statements
+8.3 Relationship with constant table detection
+9. Tests and benchmarks
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
-1. Conditions for removal
--------------------------
-We can eliminate an inner side of outer join if:
-1. For each record combination of outer tables, it will always produce
- exactly one record.
-2. There are no references to columns of the inner tables anywhere else in
+1. Elimination criteria
+=======================
+We can eliminate inner side of an outer join nest if:
+
+1. There are no references to columns of the inner tables anywhere else in
the query.
+2. For each record combination of outer tables, it will always produce
+ exactly one matching record combination.
+
+Most of effort in this WL entry is checking these two conditions.
-#1 means that every table inside the outer join nest is:
- - is a constant table:
- = because it can be accessed via eq_ref(const) access, or
- = it is a zero-rows or one-row MyISAM-like table [MARK1]
- - has an eq_ref access method candidate.
-
-#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
- GROUP BY and HAVING do not refer to the inner tables of the outer join
- nest.
-
-1.1 Quick check if there are candidates
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Before we start to enumerate join nests, here is a quick way to check if
-there *can be* something to be removed:
+2. No outside references check
+==============================
+Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
+outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
+tables of the outer join nest we're trying to remove.
+
+For multi-table UPDATE/DELETE we also must not remove tables that we're
+updating/deleting from or tables that are used in UPDATE's SET clause.
+
+2.1 Quick check if there are tables with no outside references
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start searching for outer join nests that could be eliminated,
+we'll do a quick and cheap check if there possibly could be something that
+could be eliminated:
- if ((tables used in select_list |
+ if (there are outer joins &&
+ (tables used in select_list |
tables used in group/order by UNION |
- tables used in where) != bitmap_of_all_tables)
+ tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
-2. Removal operation properties
--------------------------------
-* There is always one way to remove (no choice to remove either this or that)
-* It is always better to remove as much tables as possible (at least within
- our cost model).
-Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
-3. Removal operation
---------------------
-* Remove the outer join nest's nested join structure (i.e. get the
- outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
- $OJ->embedding->nested_join. Update table_map's of all ancestor nested
- joins). [MARK2]
+3. One-match check
+==================
+We can eliminate inner side of outer join if it will always generate exactly
+one matching record combination.
-* Move the tables and their JOIN_TABs to front like it is done with const
- tables, with exception that if eliminated outer join nest was within
- another outer join nest, that shouldn't prevent us from moving away the
- eliminated tables.
+By definition of OUTER JOIN, a NULL-complemented record combination will be
+generated when the inner side of outer join has not produced any matches.
-* Update join->table_count and all-join-tables bitmap.
+What remains to be checked is that there is no possiblity that inner side of
+the outer join could produce more than one matching record combination.
-* That's it. Nothing else?
+We'll refer to one-match property as "functional dependency":
-4. User interface
------------------
-* We'll add an @@optimizer switch flag for table elimination. Tentative
- name: 'table_elimination'.
- (Note ^^ utility of the above questioned ^, as table elimination can never
- be worse than no elimination. We're leaning towards not adding the flag)
-
-* EXPLAIN will not show the removed tables at all. This will allow to check
- if tables were removed, and also will behave nicely with anchor model and
- VIEWs: stuff that user doesn't care about just won't be there.
+- A outer join nest is functionally dependent [wrt outer tables] if it will
+ produce one matching record combination per each record combination of
+ outer tables
-5. Tests and benchmarks
------------------------
-Create a benchmark in sql-bench which checks if the DBMS has table
-elimination.
-[According to Monty] Run
- - queries that would use elimination
- - queries that are very similar to one above (so that they would have same
- QEP, execution cost, etc) but cannot use table elimination.
-then compare run times and make a conclusion about whether dbms supports table
-elimination.
+- A table is functionally dependent wrt certain set of dependency tables, if
+ record combination of dependency tables uniquely identifies zero or one
+ matching record in the table
-6. Todo, issues to resolve
---------------------------
+- Definitions of functional dependency of keys (=column tuples) and columns are
+ apparent.
-6.1 To resolve
-~~~~~~~~~~~~~~
-- Relationship with prepared statements.
- On one hand, it's natural to desire to make table elimination a
- once-per-statement operation, like outer->inner join conversion. We'll have
- to limit the applicability by removing [MARK1] as that can change during
- lifetime of the statement.
-
- The other option is to do table elimination every time. This will require to
- rework operation [MARK2] to be undoable.
-
- I'm leaning towards doing the former. With anchor modeling, it is unlikely
- that we'll meet outer joins which have N inner tables of which some are 1-row
- MyISAM tables that do not have primary key.
-
-6.2 Resolved
-~~~~~~~~~~~~
-* outer->inner join conversion is not a problem for table elimination.
- We make outer->inner conversions based on predicates in WHERE. If the WHERE
- referred to an inner table (requirement for OJ->IJ conversion) then table
- elimination would not be applicable anyway.
-
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
+Our goal is to prove that the entire join nest is functionally-dependent.
-* Aggregate functions used to report that they depend on all tables, that is,
+Join nest is functionally dependent (on the otside tables) if each of its
+elements (those can be either base tables or join nests) is functionally
+dependent.
- item_agg_func->used_tables() == (1ULL << join->tables) - 1
+Functional dependency is transitive: if table A is f-dependent on the outer
+tables and table B is f.dependent on {A, outer_tables} then B is functionally
+dependent on the outer tables.
+
+Subsequent sections list cases when we can declare a table to be
+functionally-dependent.
+
+3.1 Functional dependency source #1: Potential eq_ref access
+------------------------------------------------------------
+This is the most practically-important case. Taking the example from the HLD
+of this WL entry:
+
+ select
+ A.colA
+ from
+ tableA A
+ left outer join
+ tableB B
+ on
+ B.id = A.id;
- always. Fixed it, now aggregate function reports it depends on
- tables that its arguments depend on. In particular, COUNT(*) reports
- that it depends on no tables (item_count_star->used_tables()==0).
- One consequence of that is that "item->used_tables()==0" is not
- equivalent to "item->const_item()==true" anymore (not sure if it's
- "anymore" or this has been already happening).
-
-* EXPLAIN EXTENDED warning text was generated after the JOIN object has
- been discarded. This didn't allow to use information about join plan
- when printing the warning. Fixed this by keeping the JOIN objects until
- we've printed the warning (have also an intent to remove the const
- tables from the join output).
-
-7. Additional issues
---------------------
-* We remove ON clauses within outer join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
- Yes. Current approach: when removing an outer join nest, walk the ON clause
- and mark subselects as eliminated. Then let EXPLAIN code check if the
- SELECT was eliminated before the printing (EXPLAIN is generated by doing
- a recursive descent, so the check will also cause children of eliminated
- selects not to be printed)
-
-* Table elimination is performed after constant table detection (but before
- the range analysis). Constant tables are technically different from
- eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
- Considering we've already done the join_read_const_table() call, is there any
- real difference between constant table and eliminated one? If there is, should
- we mark const tables also as eliminated?
- from user/EXPLAIN point of view: no. constant table is the one that we read
- one record from. eliminated table is the one that we don't acccess at all.
+and generalizing it: a table TBL is functionally-dependent if the ON
+expression allows to construct a potential eq_ref access to table TBL that
+uses only outer or functionally-dependent tables.
+
+In other words: table TBL will have one match if the ON expression can be
+converted into this form
+
+ TBL.unique_key=func(one_match_tables) AND .. remainder ...
+
+(with appropriate extension for multi-part keys), where
+
+ one_match_tables= {
+ tables that are not on the inner side of the outer join in question, and
+ functionally dependent tables
+ }
+
+Note that this will cover constant tables, except those that are constant because
+they have 0/1 record or are partitioned and have no used partitions.
+
+
+3.2 Functional dependency source #2: col2=func(col1)
+----------------------------------------------------
+This comes from the second example in the HLS:
-* What is described above will not be able to eliminate this outer join
create unique index idx on tableB (id, fromDate);
...
left outer join
@@ -169,32 +152,331 @@
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
- This is because condition "B.fromDate= func(tableB)" cannot be used.
- Reason#1: update_ref_and_keys() does not consider such conditions to
- be of any use (and indeed they are not usable for ref access)
- so they are not put into KEYUSE array.
- Reason#2: even if they were put there, we would need to be able to tell
- between predicates like
- B.fromDate= func(B.id) // guarantees only one matching row as
- // B.id is already bound by B.id=A.id
- // hence B.fromDate becomes bound too.
- and
- "B.fromDate= func(B.*)" // Can potentially have many matching
- // records.
- We need to
- - Have update_ref_and_keys() create KEYUSE elements for such equalities
- - Have eliminate_tables() and friends make a more accurate check.
- The right check is to check whether all parts of a unique key are bound.
- If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
- keypartY to be bound.
- The difficulty here is that correlated subquery predicate cannot tell what
- columns it depends on (it only remembers tables).
- Traversing the predicate is expensive and complicated.
- We're leaning towards making each subquery predicate have a List<Item> with
- items that
- - are in the current select
- - and it depends on.
- This list will be useful in certain other subquery optimizations as well,
- it is cheap to collect it in fix_fields() phase, so it will be collected
- for every subquery predicate.
+Here it is apparent that tableB can be eliminated. It is not possible to
+construct eq_ref access to tableB, though, because for the second part of the
+primary key (fromDate column) we only got a condition in this form:
+
+ B.fromDate= func(tableB)
+
+(we write "func(tableB)" because ref optimizer can only determine which tables
+the right part of the equality depends on).
+
+In general case, equality like this doesn't guarantee functional dependency.
+For example, if func() == { return fromDate;}, i.e the ON expression is
+
+ ... ON B.id = A.id and B.fromDate = B.fromDate
+
+then that would allow table B to have multiple matches per record of table A.
+
+In order to be able to distinguish between these two cases, we'll need to go
+down to column level:
+
+- A table is functionally dependent if it has a unique key that's functionally
+ dependent
+
+- A unique key is functionally dependent when all of its columns are
+ functionally dependent
+
+- A table column is functionally dependent if the ON clause allows to extract
+ an AND-part in this form:
+
+ tbl.column = f(functionally-dependent columns or columns of outer tables)
+
+3.3 Functional dependency source #3: One or zero records in the table
+---------------------------------------------------------------------
+A table with one or zero records cannot generate more than one matching
+record. This source is of lesser importance as one/zero-record tables are only
+MyISAM tables.
+
+3.4 Functional dependency check implementation
+----------------------------------------------
+As shown above, we need something similar to KEYUSE structures, but not
+exactly that (we need things that current ref optimizer considers unusable and
+don't need things that it considers usable).
+
+3.4.1 Equality collection: Option1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We could
+- extend KEYUSE structures to store all kinds of equalities we need
+- change update_ref_and_keys() and co. to collect equalities both for ref
+ access and for table elimination
+ = [possibly] Improve [eq_]ref access to be able to use equalities in
+ form keypart2=func(keypart1)
+- process the KEYUSE array both by table elimination and by ref access
+ optimizer.
+
++ This requires less effort.
+- Code will have to be changed all over sql_select.cc
+- update_ref_and_keys() and co. already do several unrelated things. Hooking
+ up table elimination will make it even worse.
+
+3.4.2 Equality collection: Option2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Alternatively, we could process the WHERE clause totally on our own.
++ Table elimination is standalone and easy to detach module.
+- Some code duplication with update_ref_and_keys() and co.
+
+Having got the equalities, we'll to propagate functional dependency property
+to unique keys, tables and, ultimately, join nests.
+
+3.4.3 Functional dependency propagation - option 1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Borrow the approach used in constant table detection code:
+
+ do
+ {
+ converted= FALSE;
+ for each table T in join nest
+ {
+ if (check_if_functionally_dependent(T))
+ converted= TRUE;
+ }
+ } while (converted == TRUE);
+
+ check_if_functionally_dependent(T)
+ {
+ if (T has eq_ref access based on func_dep_tables)
+ return TRUE;
+
+ Apply the same do-while loop-based approach to available equalities
+ T.column1=func(other columns)
+ to spread the set of functionally-dependent columns. The goal is to get
+ all columns of a certain unique key to be bound.
+ }
+
+
+3.4.4 Functional dependency propagation - option 2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Analyze the ON expression(s) and build a list of
+
+ tbl.field = expr(...)
+
+equalities. tbl here is a table that belongs to a join nest that could
+potentially be eliminated.
+
+besides those, add to the list
+ - An element for each unique key in the table that needs to be eliminated
+ - An element for each table that needs to be eliminated
+ - An element for each join nest that can be eliminated (i.e. has no
+ references from outside).
+
+Then, setup "reverse dependencies": each element should have pointers to
+elements that are functionally dependent on it:
+
+- "tbl.field=expr(...)" equality is functionally dependent on all fields that
+ are used in "expr(...)" (here we take into account only fields that belong
+ to tables that can potentially be eliminated).
+- a unique key is dependent on all of its components
+- a table is dependent on all of its unique keys
+- a join nest is dependent on all tables that it contains
+
+These pointers are stored in form of one bitmap, such that:
+
+ "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
+
+Each object also stores a number of dependencies it needs to be satisfied
+before it itself is satisfied:
+
+- "tbl.field=expr(...)" needs all its underlying fields (if a field is
+ referenced many times it is counted only once)
+
+- a unique key needs all of its key parts
+
+- a table needs only one of its unique keys
+
+- a join nest needs all of its tables
+
+(TODO: so what do we do when we've marked a table as constant? We'll need to
+update the "field=expr(....)" elements that use fields of that table. And the
+problem is that we won't know how much to decrement from the counters of those
+elements.
+
+Solution#1: switch to table_map() based approach.
+Solution#2: introduce separate elements for each involved field.
+ field will depend on its table,
+ "field=expr" will depend on fields.
+)
+
+Besides the above, let each element have a pointer to another element, so that
+we can have a linked list of elements.
+
+After the above structures have been created, we start the main algorithm.
+
+The first step is to create a list of functionally-dependent elements. We walk
+across array of dependencies and mark those elements that are already bound
+(i.e. their dependencies are satisfied). At the moment those immediately-bound
+are only "field=expr" dependencies that don't refer to any columns that are
+not bound.
+
+The second step is the loop
+
+ while (bound_list is not empty)
+ {
+ Take the first bound element F off the list.
+ Use the bitmap to find out what other elements depended on it
+ for each such element E
+ {
+ if (E becomes bound after F is bound)
+ add E to the list;
+ }
+ }
+
+The last step is to walk through elements that represent the join nests. Those
+that are bound can be eliminated.
+
+4. Removal operation properties
+===============================
+* There is always one way to remove (no choice to remove either this or that)
+* It is always better to remove as much tables as possible (at least within
+ our cost model).
+Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
+
+
+5. Removal operation
+====================
+(This depends a lot on whether we make table elimination a one-off rewrite or
+conditional)
+
+At the moment table elimination is re-done for each join re-execution, hence
+the removal operation is designed not to modify any statement's permanent
+members.
+
+* Remove the outer join nest's nested join structure (i.e. get the
+ outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
+ $OJ->embedding->nested_join. Update table_map's of all ancestor nested
+ joins). [MARK2]
+
+* Move the tables and their JOIN_TABs to the front of join order, like it is
+ done with const tables, with exception that if eliminated outer join nest
+ was within another outer join nest, that shouldn't prevent us from moving
+ away the eliminated tables.
+
+* Update join->table_count and all-join-tables bitmap.
+ ^ TODO: not true anymore ^
+
+* That's it. Nothing else?
+
+6. User interface
+=================
+
+6.1 @@optimizer_switch flag
+---------------------------
+Argument againist adding the flag:
+* It is always better to perform table elimination than not to do it.
+
+Arguments for the flag:
+* It is always theoretically possible that the new code will cause unintended
+ slowdowns.
+* Having the flag is useful for QA and comparative benchmarking.
+
+Decision so far: add the flag under #ifdef. Make the flag be present in debug
+builds.
+
+6.2 EXPLAIN [EXTENDED]
+----------------------
+There are two possible options:
+1. Show eliminated tables, like we do with const tables.
+2. Do not show eliminated tables.
+
+We chose option 2, because:
+- the table is not accessed at all (besides locking it)
+- it is more natural for anchor model user - when he's querying an anchor-
+ and attributes view, he doesn't care about the unused attributes.
+
+EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
+
+NOTE: Before this WL, the warning text was generated after all JOIN objects
+have been destroyed. This didn't allow to use information about join plan
+when printing the warning. We've fixed this by keeping the JOIN objects until
+the warning text has been generated.
+
+Table elimination removes inner sides of outer join, and logically the ON
+clause is also removed. If this clause has any subqueries, they will be
+also removed from EXPLAIN output.
+
+An exception to the above is that if we eliminate a derived table, it will
+still be shown in EXPLAIN output. This comes from the fact that the FROM
+subqueries are evaluated before table elimination is invoked.
+TODO: Is the above ok or still remove parts of FROM subqueries?
+
+7. Miscellaneous adjustments
+============================
+
+7.1 Fix used_tables() of aggregate functions
+--------------------------------------------
+Aggregate functions used to report that they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+always. Fixed it, now aggregate function reports that it depends on the
+tables that its arguments depend on. In particular, COUNT(*) reports that it
+depends on no tables (item_count_star->used_tables()==0). One consequence of
+that is that "item->used_tables()==0" is not equivalent to
+"item->const_item()==true" anymore (not sure if it's "anymore" or this has
+been already so for some items).
+
+7.2 Make subquery predicates collect their outer references
+-----------------------------------------------------------
+Per-column functional dependency analysis requires us to take a
+
+ tbl.field = func(...)
+
+equality and tell which columns of which tables are referred from func(...)
+expression. For scalar expressions, this is accomplished by Item::walk()-based
+traversal. It should be reasonably cheap (the only practical Item that can be
+expensive to traverse seems to be a special case of "col IN (const1,const2,
+...)". check if we traverse the long list for such items).
+
+For correlated subqueries, traversal can be expensive, it is cheaper to make
+each subquery item have a list of its outer references. The list can be
+collected at fix_fields() stage with very little extra cost, and then it could
+be used for other optimizations.
+
+
+8. Other concerns
+=================
+
+8.1 Relationship with outer->inner joins converter
+--------------------------------------------------
+One could suspect that outer->inner join conversion could get in the way
+of table elimination by changing outer joins (which could be eliminated)
+to inner (which we will not try to eliminate).
+This concern is not valid: we make outer->inner conversions based on
+predicates in WHERE. If the WHERE referred to an inner table (this is a
+requirement for the conversion) then table elimination would not be
+applicable anyway.
+
+8.2 Relationship with prepared statements
+-----------------------------------------
+On one hand, it's natural to desire to make table elimination a
+once-per-statement operation, like outer->inner join conversion. We'll have
+to limit the applicability by removing [MARK1] as that can change during
+lifetime of the statement.
+
+The other option is to do table elimination every time. This will require to
+rework operation [MARK2] to be undoable.
+
+
+8.3 Relationship with constant table detection
+----------------------------------------------
+Table elimination is performed after constant table detection (but before
+the range analysis). Constant tables are technically different from
+eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+Considering we've already done the join_read_const_table() call, is there any
+real difference between constant table and eliminated one? If there is, should
+we mark const tables also as eliminated?
+from user/EXPLAIN point of view: no. constant table is the one that we read
+one record from. eliminated table is the one that we don't acccess at all.
+TODO
+
+9. Tests and benchmarks
+=======================
+Create a benchmark in sql-bench which checks if the DBMS has table
+elimination.
+[According to Monty] Run
+ - query Q1 that would use elimination
+ - query Q2 that is very similar to Q1 (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether the used dbms
+supports table elimination.
-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
deukje weg
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
------------------------------------------------------------
-=-=(View All Progress Notes, 28 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Elimination criteria
2. No outside references check
2.1 Quick check if there are tables with no outside references
3. One-match check
3.1 Functional dependency source #1: Potential eq_ref access
3.2 Functional dependency source #2: col2=func(col1)
3.3 Functional dependency source #3: One or zero records in the table
3.4 Functional dependency check implementation
3.4.1 Equality collection: Option1
3.4.2 Equality collection: Option2
3.4.3 Functional dependency propagation - option 1
3.4.4 Functional dependency propagation - option 2
4. Removal operation properties
5. Removal operation
6. User interface
6.1 @@optimizer_switch flag
6.2 EXPLAIN [EXTENDED]
7. Miscellaneous adjustments
7.1 Fix used_tables() of aggregate functions
7.2 Make subquery predicates collect their outer references
8. Other concerns
8.1 Relationship with outer->inner joins converter
8.2 Relationship with prepared statements
8.3 Relationship with constant table detection
9. Tests and benchmarks
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Elimination criteria
=======================
We can eliminate inner side of an outer join nest if:
1. There are no references to columns of the inner tables anywhere else in
the query.
2. For each record combination of outer tables, it will always produce
exactly one matching record combination.
Most of effort in this WL entry is checking these two conditions.
2. No outside references check
==============================
Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
tables of the outer join nest we're trying to remove.
For multi-table UPDATE/DELETE we also must not remove tables that we're
updating/deleting from or tables that are used in UPDATE's SET clause.
2.1 Quick check if there are tables with no outside references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start searching for outer join nests that could be eliminated,
we'll do a quick and cheap check if there possibly could be something that
could be eliminated:
if (there are outer joins &&
(tables used in select_list |
tables used in group/order by UNION |
tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
3. One-match check
==================
We can eliminate inner side of outer join if it will always generate exactly
one matching record combination.
By definition of OUTER JOIN, a NULL-complemented record combination will be
generated when the inner side of outer join has not produced any matches.
What remains to be checked is that there is no possiblity that inner side of
the outer join could produce more than one matching record combination.
We'll refer to one-match property as "functional dependency":
- A outer join nest is functionally dependent [wrt outer tables] if it will
produce one matching record combination per each record combination of
outer tables
- A table is functionally dependent wrt certain set of dependency tables, if
record combination of dependency tables uniquely identifies zero or one
matching record in the table
- Definitions of functional dependency of keys (=column tuples) and columns are
apparent.
Our goal is to prove that the entire join nest is functionally-dependent.
Join nest is functionally dependent (on the otside tables) if each of its
elements (those can be either base tables or join nests) is functionally
dependent.
Functional dependency is transitive: if table A is f-dependent on the outer
tables and table B is f.dependent on {A, outer_tables} then B is functionally
dependent on the outer tables.
Subsequent sections list cases when we can declare a table to be
functionally-dependent.
3.1 Functional dependency source #1: Potential eq_ref access
------------------------------------------------------------
This is the most practically-important case. Taking the example from the HLD
of this WL entry:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
and generalizing it: a table TBL is functionally-dependent if the ON
expression allows to construct a potential eq_ref access to table TBL that
uses only outer or functionally-dependent tables.
In other words: table TBL will have one match if the ON expression can be
converted into this form
TBL.unique_key=func(one_match_tables) AND .. remainder ...
(with appropriate extension for multi-part keys), where
one_match_tables= {
tables that are not on the inner side of the outer join in question, and
functionally dependent tables
}
Note that this will cover constant tables, except those that are constant because
they have 0/1 record or are partitioned and have no used partitions.
3.2 Functional dependency source #2: col2=func(col1)
----------------------------------------------------
This comes from the second example in the HLS:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
Here it is apparent that tableB can be eliminated. It is not possible to
construct eq_ref access to tableB, though, because for the second part of the
primary key (fromDate column) we only got a condition in this form:
B.fromDate= func(tableB)
(we write "func(tableB)" because ref optimizer can only determine which tables
the right part of the equality depends on).
In general case, equality like this doesn't guarantee functional dependency.
For example, if func() == { return fromDate;}, i.e the ON expression is
... ON B.id = A.id and B.fromDate = B.fromDate
then that would allow table B to have multiple matches per record of table A.
In order to be able to distinguish between these two cases, we'll need to go
down to column level:
- A table is functionally dependent if it has a unique key that's functionally
dependent
- A unique key is functionally dependent when all of its columns are
functionally dependent
- A table column is functionally dependent if the ON clause allows to extract
an AND-part in this form:
tbl.column = f(functionally-dependent columns or columns of outer tables)
3.3 Functional dependency source #3: One or zero records in the table
---------------------------------------------------------------------
A table with one or zero records cannot generate more than one matching
record. This source is of lesser importance as one/zero-record tables are only
MyISAM tables.
3.4 Functional dependency check implementation
----------------------------------------------
As shown above, we need something similar to KEYUSE structures, but not
exactly that (we need things that current ref optimizer considers unusable and
don't need things that it considers usable).
3.4.1 Equality collection: Option1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We could
- extend KEYUSE structures to store all kinds of equalities we need
- change update_ref_and_keys() and co. to collect equalities both for ref
access and for table elimination
= [possibly] Improve [eq_]ref access to be able to use equalities in
form keypart2=func(keypart1)
- process the KEYUSE array both by table elimination and by ref access
optimizer.
+ This requires less effort.
- Code will have to be changed all over sql_select.cc
- update_ref_and_keys() and co. already do several unrelated things. Hooking
up table elimination will make it even worse.
3.4.2 Equality collection: Option2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alternatively, we could process the WHERE clause totally on our own.
+ Table elimination is standalone and easy to detach module.
- Some code duplication with update_ref_and_keys() and co.
Having got the equalities, we'll to propagate functional dependency property
to unique keys, tables and, ultimately, join nests.
3.4.3 Functional dependency propagation - option 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Borrow the approach used in constant table detection code:
do
{
converted= FALSE;
for each table T in join nest
{
if (check_if_functionally_dependent(T))
converted= TRUE;
}
} while (converted == TRUE);
check_if_functionally_dependent(T)
{
if (T has eq_ref access based on func_dep_tables)
return TRUE;
Apply the same do-while loop-based approach to available equalities
T.column1=func(other columns)
to spread the set of functionally-dependent columns. The goal is to get
all columns of a certain unique key to be bound.
}
3.4.4 Functional dependency propagation - option 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyze the ON expression(s) and build a list of
tbl.field = expr(...)
equalities. tbl here is a table that belongs to a join nest that could
potentially be eliminated.
besides those, add to the list
- An element for each unique key in the table that needs to be eliminated
- An element for each table that needs to be eliminated
- An element for each join nest that can be eliminated (i.e. has no
references from outside).
Then, setup "reverse dependencies": each element should have pointers to
elements that are functionally dependent on it:
- "tbl.field=expr(...)" equality is functionally dependent on all fields that
are used in "expr(...)" (here we take into account only fields that belong
to tables that can potentially be eliminated).
- a unique key is dependent on all of its components
- a table is dependent on all of its unique keys
- a join nest is dependent on all tables that it contains
These pointers are stored in form of one bitmap, such that:
"X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
Each object also stores a number of dependencies it needs to be satisfied
before it itself is satisfied:
- "tbl.field=expr(...)" needs all its underlying fields (if a field is
referenced many times it is counted only once)
- a unique key needs all of its key parts
- a table needs only one of its unique keys
- a join nest needs all of its tables
(TODO: so what do we do when we've marked a table as constant? We'll need to
update the "field=expr(....)" elements that use fields of that table. And the
problem is that we won't know how much to decrement from the counters of those
elements.
Solution#1: switch to table_map() based approach.
Solution#2: introduce separate elements for each involved field.
field will depend on its table,
"field=expr" will depend on fields.
)
Besides the above, let each element have a pointer to another element, so that
we can have a linked list of elements.
After the above structures have been created, we start the main algorithm.
The first step is to create a list of functionally-dependent elements. We walk
across array of dependencies and mark those elements that are already bound
(i.e. their dependencies are satisfied). At the moment those immediately-bound
are only "field=expr" dependencies that don't refer to any columns that are
not bound.
The second step is the loop
while (bound_list is not empty)
{
Take the first bound element F off the list.
Use the bitmap to find out what other elements depended on it
for each such element E
{
if (E becomes bound after F is bound)
add E to the list;
}
}
The last step is to walk through elements that represent the join nests. Those
that are bound can be eliminated.
4. Removal operation properties
===============================
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
5. Removal operation
====================
(This depends a lot on whether we make table elimination a one-off rewrite or
conditional)
At the moment table elimination is re-done for each join re-execution, hence
the removal operation is designed not to modify any statement's permanent
members.
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front of join order, like it is
done with const tables, with exception that if eliminated outer join nest
was within another outer join nest, that shouldn't prevent us from moving
away the eliminated tables.
* Update join->table_count and all-join-tables bitmap.
^ TODO: not true anymore ^
* That's it. Nothing else?
6. User interface
=================
6.1 @@optimizer_switch flag
---------------------------
Argument againist adding the flag:
* It is always better to perform table elimination than not to do it.
Arguments for the flag:
* It is always theoretically possible that the new code will cause unintended
slowdowns.
* Having the flag is useful for QA and comparative benchmarking.
Decision so far: add the flag under #ifdef. Make the flag be present in debug
builds.
6.2 EXPLAIN [EXTENDED]
----------------------
There are two possible options:
1. Show eliminated tables, like we do with const tables.
2. Do not show eliminated tables.
We chose option 2, because:
- the table is not accessed at all (besides locking it)
- it is more natural for anchor model user - when he's querying an anchor-
and attributes view, he doesn't care about the unused attributes.
EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
NOTE: Before this WL, the warning text was generated after all JOIN objects
have been destroyed. This didn't allow to use information about join plan
when printing the warning. We've fixed this by keeping the JOIN objects until
the warning text has been generated.
Table elimination removes inner sides of outer join, and logically the ON
clause is also removed. If this clause has any subqueries, they will be
also removed from EXPLAIN output.
An exception to the above is that if we eliminate a derived table, it will
still be shown in EXPLAIN output. This comes from the fact that the FROM
subqueries are evaluated before table elimination is invoked.
TODO: Is the above ok or still remove parts of FROM subqueries?
7. Miscellaneous adjustments
============================
7.1 Fix used_tables() of aggregate functions
--------------------------------------------
Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. Fixed it, now aggregate function reports that it depends on the
tables that its arguments depend on. In particular, COUNT(*) reports that it
depends on no tables (item_count_star->used_tables()==0). One consequence of
that is that "item->used_tables()==0" is not equivalent to
"item->const_item()==true" anymore (not sure if it's "anymore" or this has
been already so for some items).
7.2 Make subquery predicates collect their outer references
-----------------------------------------------------------
Per-column functional dependency analysis requires us to take a
tbl.field = func(...)
equality and tell which columns of which tables are referred from func(...)
expression. For scalar expressions, this is accomplished by Item::walk()-based
traversal. It should be reasonably cheap (the only practical Item that can be
expensive to traverse seems to be a special case of "col IN (const1,const2,
...)". check if we traverse the long list for such items).
For correlated subqueries, traversal can be expensive, it is cheaper to make
each subquery item have a list of its outer references. The list can be
collected at fix_fields() stage with very little extra cost, and then it could
be used for other optimizations.
8. Other concerns
=================
8.1 Relationship with outer->inner joins converter
--------------------------------------------------
One could suspect that outer->inner join conversion could get in the way
of table elimination by changing outer joins (which could be eliminated)
to inner (which we will not try to eliminate).
This concern is not valid: we make outer->inner conversions based on
predicates in WHERE. If the WHERE referred to an inner table (this is a
requirement for the conversion) then table elimination would not be
applicable anyway.
8.2 Relationship with prepared statements
-----------------------------------------
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require to
rework operation [MARK2] to be undoable.
8.3 Relationship with constant table detection
----------------------------------------------
Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
TODO
9. Tests and benchmarks
=======================
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- query Q1 that would use elimination
- query Q2 that is very similar to Q1 (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
then compare run times and make a conclusion about whether the used dbms
supports table elimination.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 16 Aug '09
by worklog-noreply@askmonty.org 16 Aug '09
16 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: 9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=-
Category updated.
--- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300
+++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-Sprint
-=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=-
Version updated.
--- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300
+++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Wed, 29 Jul 2009, 21:41)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.26011 2009-07-29 21:41:04.000000000 +0300
+++ /tmp/wklog.17.new.26011 2009-07-29 21:41:04.000000000 +0300
@@ -2,163 +2,146 @@
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
-1. Conditions for removal
-1.1 Quick check if there are candidates
-2. Removal operation properties
-3. Removal operation
-4. User interface
-5. Tests and benchmarks
-6. Todo, issues to resolve
-6.1 To resolve
-6.2 Resolved
-7. Additional issues
+1. Elimination criteria
+2. No outside references check
+2.1 Quick check if there are tables with no outside references
+3. One-match check
+3.1 Functional dependency source #1: Potential eq_ref access
+3.2 Functional dependency source #2: col2=func(col1)
+3.3 Functional dependency source #3: One or zero records in the table
+3.4 Functional dependency check implementation
+3.4.1 Equality collection: Option1
+3.4.2 Equality collection: Option2
+3.4.3 Functional dependency propagation - option 1
+3.4.4 Functional dependency propagation - option 2
+4. Removal operation properties
+5. Removal operation
+6. User interface
+6.1 @@optimizer_switch flag
+6.2 EXPLAIN [EXTENDED]
+7. Miscellaneous adjustments
+7.1 Fix used_tables() of aggregate functions
+7.2 Make subquery predicates collect their outer references
+8. Other concerns
+8.1 Relationship with outer->inner joins converter
+8.2 Relationship with prepared statements
+8.3 Relationship with constant table detection
+9. Tests and benchmarks
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
-1. Conditions for removal
--------------------------
-We can eliminate an inner side of outer join if:
-1. For each record combination of outer tables, it will always produce
- exactly one record.
-2. There are no references to columns of the inner tables anywhere else in
+1. Elimination criteria
+=======================
+We can eliminate inner side of an outer join nest if:
+
+1. There are no references to columns of the inner tables anywhere else in
the query.
+2. For each record combination of outer tables, it will always produce
+ exactly one matching record combination.
+
+Most of effort in this WL entry is checking these two conditions.
-#1 means that every table inside the outer join nest is:
- - is a constant table:
- = because it can be accessed via eq_ref(const) access, or
- = it is a zero-rows or one-row MyISAM-like table [MARK1]
- - has an eq_ref access method candidate.
-
-#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
- GROUP BY and HAVING do not refer to the inner tables of the outer join
- nest.
-
-1.1 Quick check if there are candidates
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Before we start to enumerate join nests, here is a quick way to check if
-there *can be* something to be removed:
+2. No outside references check
+==============================
+Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
+outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
+tables of the outer join nest we're trying to remove.
+
+For multi-table UPDATE/DELETE we also must not remove tables that we're
+updating/deleting from or tables that are used in UPDATE's SET clause.
+
+2.1 Quick check if there are tables with no outside references
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start searching for outer join nests that could be eliminated,
+we'll do a quick and cheap check if there possibly could be something that
+could be eliminated:
- if ((tables used in select_list |
+ if (there are outer joins &&
+ (tables used in select_list |
tables used in group/order by UNION |
- tables used in where) != bitmap_of_all_tables)
+ tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
-2. Removal operation properties
--------------------------------
-* There is always one way to remove (no choice to remove either this or that)
-* It is always better to remove as much tables as possible (at least within
- our cost model).
-Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
-3. Removal operation
---------------------
-* Remove the outer join nest's nested join structure (i.e. get the
- outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
- $OJ->embedding->nested_join. Update table_map's of all ancestor nested
- joins). [MARK2]
+3. One-match check
+==================
+We can eliminate inner side of outer join if it will always generate exactly
+one matching record combination.
-* Move the tables and their JOIN_TABs to front like it is done with const
- tables, with exception that if eliminated outer join nest was within
- another outer join nest, that shouldn't prevent us from moving away the
- eliminated tables.
+By definition of OUTER JOIN, a NULL-complemented record combination will be
+generated when the inner side of outer join has not produced any matches.
-* Update join->table_count and all-join-tables bitmap.
+What remains to be checked is that there is no possiblity that inner side of
+the outer join could produce more than one matching record combination.
-* That's it. Nothing else?
+We'll refer to one-match property as "functional dependency":
-4. User interface
------------------
-* We'll add an @@optimizer switch flag for table elimination. Tentative
- name: 'table_elimination'.
- (Note ^^ utility of the above questioned ^, as table elimination can never
- be worse than no elimination. We're leaning towards not adding the flag)
-
-* EXPLAIN will not show the removed tables at all. This will allow to check
- if tables were removed, and also will behave nicely with anchor model and
- VIEWs: stuff that user doesn't care about just won't be there.
+- A outer join nest is functionally dependent [wrt outer tables] if it will
+ produce one matching record combination per each record combination of
+ outer tables
-5. Tests and benchmarks
------------------------
-Create a benchmark in sql-bench which checks if the DBMS has table
-elimination.
-[According to Monty] Run
- - queries that would use elimination
- - queries that are very similar to one above (so that they would have same
- QEP, execution cost, etc) but cannot use table elimination.
-then compare run times and make a conclusion about whether dbms supports table
-elimination.
+- A table is functionally dependent wrt certain set of dependency tables, if
+ record combination of dependency tables uniquely identifies zero or one
+ matching record in the table
-6. Todo, issues to resolve
---------------------------
+- Definitions of functional dependency of keys (=column tuples) and columns are
+ apparent.
-6.1 To resolve
-~~~~~~~~~~~~~~
-- Relationship with prepared statements.
- On one hand, it's natural to desire to make table elimination a
- once-per-statement operation, like outer->inner join conversion. We'll have
- to limit the applicability by removing [MARK1] as that can change during
- lifetime of the statement.
-
- The other option is to do table elimination every time. This will require to
- rework operation [MARK2] to be undoable.
-
- I'm leaning towards doing the former. With anchor modeling, it is unlikely
- that we'll meet outer joins which have N inner tables of which some are 1-row
- MyISAM tables that do not have primary key.
-
-6.2 Resolved
-~~~~~~~~~~~~
-* outer->inner join conversion is not a problem for table elimination.
- We make outer->inner conversions based on predicates in WHERE. If the WHERE
- referred to an inner table (requirement for OJ->IJ conversion) then table
- elimination would not be applicable anyway.
-
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
+Our goal is to prove that the entire join nest is functionally-dependent.
-* Aggregate functions used to report that they depend on all tables, that is,
+Join nest is functionally dependent (on the otside tables) if each of its
+elements (those can be either base tables or join nests) is functionally
+dependent.
- item_agg_func->used_tables() == (1ULL << join->tables) - 1
+Functional dependency is transitive: if table A is f-dependent on the outer
+tables and table B is f.dependent on {A, outer_tables} then B is functionally
+dependent on the outer tables.
+
+Subsequent sections list cases when we can declare a table to be
+functionally-dependent.
+
+3.1 Functional dependency source #1: Potential eq_ref access
+------------------------------------------------------------
+This is the most practically-important case. Taking the example from the HLD
+of this WL entry:
+
+ select
+ A.colA
+ from
+ tableA A
+ left outer join
+ tableB B
+ on
+ B.id = A.id;
- always. Fixed it, now aggregate function reports it depends on
- tables that its arguments depend on. In particular, COUNT(*) reports
- that it depends on no tables (item_count_star->used_tables()==0).
- One consequence of that is that "item->used_tables()==0" is not
- equivalent to "item->const_item()==true" anymore (not sure if it's
- "anymore" or this has been already happening).
-
-* EXPLAIN EXTENDED warning text was generated after the JOIN object has
- been discarded. This didn't allow to use information about join plan
- when printing the warning. Fixed this by keeping the JOIN objects until
- we've printed the warning (have also an intent to remove the const
- tables from the join output).
-
-7. Additional issues
---------------------
-* We remove ON clauses within outer join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
- Yes. Current approach: when removing an outer join nest, walk the ON clause
- and mark subselects as eliminated. Then let EXPLAIN code check if the
- SELECT was eliminated before the printing (EXPLAIN is generated by doing
- a recursive descent, so the check will also cause children of eliminated
- selects not to be printed)
-
-* Table elimination is performed after constant table detection (but before
- the range analysis). Constant tables are technically different from
- eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
- Considering we've already done the join_read_const_table() call, is there any
- real difference between constant table and eliminated one? If there is, should
- we mark const tables also as eliminated?
- from user/EXPLAIN point of view: no. constant table is the one that we read
- one record from. eliminated table is the one that we don't acccess at all.
+and generalizing it: a table TBL is functionally-dependent if the ON
+expression allows to construct a potential eq_ref access to table TBL that
+uses only outer or functionally-dependent tables.
+
+In other words: table TBL will have one match if the ON expression can be
+converted into this form
+
+ TBL.unique_key=func(one_match_tables) AND .. remainder ...
+
+(with appropriate extension for multi-part keys), where
+
+ one_match_tables= {
+ tables that are not on the inner side of the outer join in question, and
+ functionally dependent tables
+ }
+
+Note that this will cover constant tables, except those that are constant because
+they have 0/1 record or are partitioned and have no used partitions.
+
+
+3.2 Functional dependency source #2: col2=func(col1)
+----------------------------------------------------
+This comes from the second example in the HLS:
-* What is described above will not be able to eliminate this outer join
create unique index idx on tableB (id, fromDate);
...
left outer join
@@ -169,32 +152,331 @@
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
- This is because condition "B.fromDate= func(tableB)" cannot be used.
- Reason#1: update_ref_and_keys() does not consider such conditions to
- be of any use (and indeed they are not usable for ref access)
- so they are not put into KEYUSE array.
- Reason#2: even if they were put there, we would need to be able to tell
- between predicates like
- B.fromDate= func(B.id) // guarantees only one matching row as
- // B.id is already bound by B.id=A.id
- // hence B.fromDate becomes bound too.
- and
- "B.fromDate= func(B.*)" // Can potentially have many matching
- // records.
- We need to
- - Have update_ref_and_keys() create KEYUSE elements for such equalities
- - Have eliminate_tables() and friends make a more accurate check.
- The right check is to check whether all parts of a unique key are bound.
- If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
- keypartY to be bound.
- The difficulty here is that correlated subquery predicate cannot tell what
- columns it depends on (it only remembers tables).
- Traversing the predicate is expensive and complicated.
- We're leaning towards making each subquery predicate have a List<Item> with
- items that
- - are in the current select
- - and it depends on.
- This list will be useful in certain other subquery optimizations as well,
- it is cheap to collect it in fix_fields() phase, so it will be collected
- for every subquery predicate.
+Here it is apparent that tableB can be eliminated. It is not possible to
+construct eq_ref access to tableB, though, because for the second part of the
+primary key (fromDate column) we only got a condition in this form:
+
+ B.fromDate= func(tableB)
+
+(we write "func(tableB)" because ref optimizer can only determine which tables
+the right part of the equality depends on).
+
+In general case, equality like this doesn't guarantee functional dependency.
+For example, if func() == { return fromDate;}, i.e the ON expression is
+
+ ... ON B.id = A.id and B.fromDate = B.fromDate
+
+then that would allow table B to have multiple matches per record of table A.
+
+In order to be able to distinguish between these two cases, we'll need to go
+down to column level:
+
+- A table is functionally dependent if it has a unique key that's functionally
+ dependent
+
+- A unique key is functionally dependent when all of its columns are
+ functionally dependent
+
+- A table column is functionally dependent if the ON clause allows to extract
+ an AND-part in this form:
+
+ tbl.column = f(functionally-dependent columns or columns of outer tables)
+
+3.3 Functional dependency source #3: One or zero records in the table
+---------------------------------------------------------------------
+A table with one or zero records cannot generate more than one matching
+record. This source is of lesser importance as one/zero-record tables are only
+MyISAM tables.
+
+3.4 Functional dependency check implementation
+----------------------------------------------
+As shown above, we need something similar to KEYUSE structures, but not
+exactly that (we need things that current ref optimizer considers unusable and
+don't need things that it considers usable).
+
+3.4.1 Equality collection: Option1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We could
+- extend KEYUSE structures to store all kinds of equalities we need
+- change update_ref_and_keys() and co. to collect equalities both for ref
+ access and for table elimination
+ = [possibly] Improve [eq_]ref access to be able to use equalities in
+ form keypart2=func(keypart1)
+- process the KEYUSE array both by table elimination and by ref access
+ optimizer.
+
++ This requires less effort.
+- Code will have to be changed all over sql_select.cc
+- update_ref_and_keys() and co. already do several unrelated things. Hooking
+ up table elimination will make it even worse.
+
+3.4.2 Equality collection: Option2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Alternatively, we could process the WHERE clause totally on our own.
++ Table elimination is standalone and easy to detach module.
+- Some code duplication with update_ref_and_keys() and co.
+
+Having got the equalities, we'll to propagate functional dependency property
+to unique keys, tables and, ultimately, join nests.
+
+3.4.3 Functional dependency propagation - option 1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Borrow the approach used in constant table detection code:
+
+ do
+ {
+ converted= FALSE;
+ for each table T in join nest
+ {
+ if (check_if_functionally_dependent(T))
+ converted= TRUE;
+ }
+ } while (converted == TRUE);
+
+ check_if_functionally_dependent(T)
+ {
+ if (T has eq_ref access based on func_dep_tables)
+ return TRUE;
+
+ Apply the same do-while loop-based approach to available equalities
+ T.column1=func(other columns)
+ to spread the set of functionally-dependent columns. The goal is to get
+ all columns of a certain unique key to be bound.
+ }
+
+
+3.4.4 Functional dependency propagation - option 2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Analyze the ON expression(s) and build a list of
+
+ tbl.field = expr(...)
+
+equalities. tbl here is a table that belongs to a join nest that could
+potentially be eliminated.
+
+besides those, add to the list
+ - An element for each unique key in the table that needs to be eliminated
+ - An element for each table that needs to be eliminated
+ - An element for each join nest that can be eliminated (i.e. has no
+ references from outside).
+
+Then, setup "reverse dependencies": each element should have pointers to
+elements that are functionally dependent on it:
+
+- "tbl.field=expr(...)" equality is functionally dependent on all fields that
+ are used in "expr(...)" (here we take into account only fields that belong
+ to tables that can potentially be eliminated).
+- a unique key is dependent on all of its components
+- a table is dependent on all of its unique keys
+- a join nest is dependent on all tables that it contains
+
+These pointers are stored in form of one bitmap, such that:
+
+ "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
+
+Each object also stores a number of dependencies it needs to be satisfied
+before it itself is satisfied:
+
+- "tbl.field=expr(...)" needs all its underlying fields (if a field is
+ referenced many times it is counted only once)
+
+- a unique key needs all of its key parts
+
+- a table needs only one of its unique keys
+
+- a join nest needs all of its tables
+
+(TODO: so what do we do when we've marked a table as constant? We'll need to
+update the "field=expr(....)" elements that use fields of that table. And the
+problem is that we won't know how much to decrement from the counters of those
+elements.
+
+Solution#1: switch to table_map() based approach.
+Solution#2: introduce separate elements for each involved field.
+ field will depend on its table,
+ "field=expr" will depend on fields.
+)
+
+Besides the above, let each element have a pointer to another element, so that
+we can have a linked list of elements.
+
+After the above structures have been created, we start the main algorithm.
+
+The first step is to create a list of functionally-dependent elements. We walk
+across array of dependencies and mark those elements that are already bound
+(i.e. their dependencies are satisfied). At the moment those immediately-bound
+are only "field=expr" dependencies that don't refer to any columns that are
+not bound.
+
+The second step is the loop
+
+ while (bound_list is not empty)
+ {
+ Take the first bound element F off the list.
+ Use the bitmap to find out what other elements depended on it
+ for each such element E
+ {
+ if (E becomes bound after F is bound)
+ add E to the list;
+ }
+ }
+
+The last step is to walk through elements that represent the join nests. Those
+that are bound can be eliminated.
+
+4. Removal operation properties
+===============================
+* There is always one way to remove (no choice to remove either this or that)
+* It is always better to remove as much tables as possible (at least within
+ our cost model).
+Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
+
+
+5. Removal operation
+====================
+(This depends a lot on whether we make table elimination a one-off rewrite or
+conditional)
+
+At the moment table elimination is re-done for each join re-execution, hence
+the removal operation is designed not to modify any statement's permanent
+members.
+
+* Remove the outer join nest's nested join structure (i.e. get the
+ outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
+ $OJ->embedding->nested_join. Update table_map's of all ancestor nested
+ joins). [MARK2]
+
+* Move the tables and their JOIN_TABs to the front of join order, like it is
+ done with const tables, with exception that if eliminated outer join nest
+ was within another outer join nest, that shouldn't prevent us from moving
+ away the eliminated tables.
+
+* Update join->table_count and all-join-tables bitmap.
+ ^ TODO: not true anymore ^
+
+* That's it. Nothing else?
+
+6. User interface
+=================
+
+6.1 @@optimizer_switch flag
+---------------------------
+Argument againist adding the flag:
+* It is always better to perform table elimination than not to do it.
+
+Arguments for the flag:
+* It is always theoretically possible that the new code will cause unintended
+ slowdowns.
+* Having the flag is useful for QA and comparative benchmarking.
+
+Decision so far: add the flag under #ifdef. Make the flag be present in debug
+builds.
+
+6.2 EXPLAIN [EXTENDED]
+----------------------
+There are two possible options:
+1. Show eliminated tables, like we do with const tables.
+2. Do not show eliminated tables.
+
+We chose option 2, because:
+- the table is not accessed at all (besides locking it)
+- it is more natural for anchor model user - when he's querying an anchor-
+ and attributes view, he doesn't care about the unused attributes.
+
+EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
+
+NOTE: Before this WL, the warning text was generated after all JOIN objects
+have been destroyed. This didn't allow to use information about join plan
+when printing the warning. We've fixed this by keeping the JOIN objects until
+the warning text has been generated.
+
+Table elimination removes inner sides of outer join, and logically the ON
+clause is also removed. If this clause has any subqueries, they will be
+also removed from EXPLAIN output.
+
+An exception to the above is that if we eliminate a derived table, it will
+still be shown in EXPLAIN output. This comes from the fact that the FROM
+subqueries are evaluated before table elimination is invoked.
+TODO: Is the above ok or still remove parts of FROM subqueries?
+
+7. Miscellaneous adjustments
+============================
+
+7.1 Fix used_tables() of aggregate functions
+--------------------------------------------
+Aggregate functions used to report that they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+always. Fixed it, now aggregate function reports that it depends on the
+tables that its arguments depend on. In particular, COUNT(*) reports that it
+depends on no tables (item_count_star->used_tables()==0). One consequence of
+that is that "item->used_tables()==0" is not equivalent to
+"item->const_item()==true" anymore (not sure if it's "anymore" or this has
+been already so for some items).
+
+7.2 Make subquery predicates collect their outer references
+-----------------------------------------------------------
+Per-column functional dependency analysis requires us to take a
+
+ tbl.field = func(...)
+
+equality and tell which columns of which tables are referred from func(...)
+expression. For scalar expressions, this is accomplished by Item::walk()-based
+traversal. It should be reasonably cheap (the only practical Item that can be
+expensive to traverse seems to be a special case of "col IN (const1,const2,
+...)". check if we traverse the long list for such items).
+
+For correlated subqueries, traversal can be expensive, it is cheaper to make
+each subquery item have a list of its outer references. The list can be
+collected at fix_fields() stage with very little extra cost, and then it could
+be used for other optimizations.
+
+
+8. Other concerns
+=================
+
+8.1 Relationship with outer->inner joins converter
+--------------------------------------------------
+One could suspect that outer->inner join conversion could get in the way
+of table elimination by changing outer joins (which could be eliminated)
+to inner (which we will not try to eliminate).
+This concern is not valid: we make outer->inner conversions based on
+predicates in WHERE. If the WHERE referred to an inner table (this is a
+requirement for the conversion) then table elimination would not be
+applicable anyway.
+
+8.2 Relationship with prepared statements
+-----------------------------------------
+On one hand, it's natural to desire to make table elimination a
+once-per-statement operation, like outer->inner join conversion. We'll have
+to limit the applicability by removing [MARK1] as that can change during
+lifetime of the statement.
+
+The other option is to do table elimination every time. This will require to
+rework operation [MARK2] to be undoable.
+
+
+8.3 Relationship with constant table detection
+----------------------------------------------
+Table elimination is performed after constant table detection (but before
+the range analysis). Constant tables are technically different from
+eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+Considering we've already done the join_read_const_table() call, is there any
+real difference between constant table and eliminated one? If there is, should
+we mark const tables also as eliminated?
+from user/EXPLAIN point of view: no. constant table is the one that we read
+one record from. eliminated table is the one that we don't acccess at all.
+TODO
+
+9. Tests and benchmarks
+=======================
+Create a benchmark in sql-bench which checks if the DBMS has table
+elimination.
+[According to Monty] Run
+ - query Q1 that would use elimination
+ - query Q2 that is very similar to Q1 (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether the used dbms
+supports table elimination.
-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
deukje weg
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
------------------------------------------------------------
-=-=(View All Progress Notes, 28 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Elimination criteria
2. No outside references check
2.1 Quick check if there are tables with no outside references
3. One-match check
3.1 Functional dependency source #1: Potential eq_ref access
3.2 Functional dependency source #2: col2=func(col1)
3.3 Functional dependency source #3: One or zero records in the table
3.4 Functional dependency check implementation
3.4.1 Equality collection: Option1
3.4.2 Equality collection: Option2
3.4.3 Functional dependency propagation - option 1
3.4.4 Functional dependency propagation - option 2
4. Removal operation properties
5. Removal operation
6. User interface
6.1 @@optimizer_switch flag
6.2 EXPLAIN [EXTENDED]
7. Miscellaneous adjustments
7.1 Fix used_tables() of aggregate functions
7.2 Make subquery predicates collect their outer references
8. Other concerns
8.1 Relationship with outer->inner joins converter
8.2 Relationship with prepared statements
8.3 Relationship with constant table detection
9. Tests and benchmarks
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Elimination criteria
=======================
We can eliminate inner side of an outer join nest if:
1. There are no references to columns of the inner tables anywhere else in
the query.
2. For each record combination of outer tables, it will always produce
exactly one matching record combination.
Most of effort in this WL entry is checking these two conditions.
2. No outside references check
==============================
Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
tables of the outer join nest we're trying to remove.
For multi-table UPDATE/DELETE we also must not remove tables that we're
updating/deleting from or tables that are used in UPDATE's SET clause.
2.1 Quick check if there are tables with no outside references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start searching for outer join nests that could be eliminated,
we'll do a quick and cheap check if there possibly could be something that
could be eliminated:
if (there are outer joins &&
(tables used in select_list |
tables used in group/order by UNION |
tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
3. One-match check
==================
We can eliminate inner side of outer join if it will always generate exactly
one matching record combination.
By definition of OUTER JOIN, a NULL-complemented record combination will be
generated when the inner side of outer join has not produced any matches.
What remains to be checked is that there is no possiblity that inner side of
the outer join could produce more than one matching record combination.
We'll refer to one-match property as "functional dependency":
- A outer join nest is functionally dependent [wrt outer tables] if it will
produce one matching record combination per each record combination of
outer tables
- A table is functionally dependent wrt certain set of dependency tables, if
record combination of dependency tables uniquely identifies zero or one
matching record in the table
- Definitions of functional dependency of keys (=column tuples) and columns are
apparent.
Our goal is to prove that the entire join nest is functionally-dependent.
Join nest is functionally dependent (on the otside tables) if each of its
elements (those can be either base tables or join nests) is functionally
dependent.
Functional dependency is transitive: if table A is f-dependent on the outer
tables and table B is f.dependent on {A, outer_tables} then B is functionally
dependent on the outer tables.
Subsequent sections list cases when we can declare a table to be
functionally-dependent.
3.1 Functional dependency source #1: Potential eq_ref access
------------------------------------------------------------
This is the most practically-important case. Taking the example from the HLD
of this WL entry:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
and generalizing it: a table TBL is functionally-dependent if the ON
expression allows to construct a potential eq_ref access to table TBL that
uses only outer or functionally-dependent tables.
In other words: table TBL will have one match if the ON expression can be
converted into this form
TBL.unique_key=func(one_match_tables) AND .. remainder ...
(with appropriate extension for multi-part keys), where
one_match_tables= {
tables that are not on the inner side of the outer join in question, and
functionally dependent tables
}
Note that this will cover constant tables, except those that are constant because
they have 0/1 record or are partitioned and have no used partitions.
3.2 Functional dependency source #2: col2=func(col1)
----------------------------------------------------
This comes from the second example in the HLS:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
Here it is apparent that tableB can be eliminated. It is not possible to
construct eq_ref access to tableB, though, because for the second part of the
primary key (fromDate column) we only got a condition in this form:
B.fromDate= func(tableB)
(we write "func(tableB)" because ref optimizer can only determine which tables
the right part of the equality depends on).
In general case, equality like this doesn't guarantee functional dependency.
For example, if func() == { return fromDate;}, i.e the ON expression is
... ON B.id = A.id and B.fromDate = B.fromDate
then that would allow table B to have multiple matches per record of table A.
In order to be able to distinguish between these two cases, we'll need to go
down to column level:
- A table is functionally dependent if it has a unique key that's functionally
dependent
- A unique key is functionally dependent when all of its columns are
functionally dependent
- A table column is functionally dependent if the ON clause allows to extract
an AND-part in this form:
tbl.column = f(functionally-dependent columns or columns of outer tables)
3.3 Functional dependency source #3: One or zero records in the table
---------------------------------------------------------------------
A table with one or zero records cannot generate more than one matching
record. This source is of lesser importance as one/zero-record tables are only
MyISAM tables.
3.4 Functional dependency check implementation
----------------------------------------------
As shown above, we need something similar to KEYUSE structures, but not
exactly that (we need things that current ref optimizer considers unusable and
don't need things that it considers usable).
3.4.1 Equality collection: Option1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We could
- extend KEYUSE structures to store all kinds of equalities we need
- change update_ref_and_keys() and co. to collect equalities both for ref
access and for table elimination
= [possibly] Improve [eq_]ref access to be able to use equalities in
form keypart2=func(keypart1)
- process the KEYUSE array both by table elimination and by ref access
optimizer.
+ This requires less effort.
- Code will have to be changed all over sql_select.cc
- update_ref_and_keys() and co. already do several unrelated things. Hooking
up table elimination will make it even worse.
3.4.2 Equality collection: Option2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alternatively, we could process the WHERE clause totally on our own.
+ Table elimination is standalone and easy to detach module.
- Some code duplication with update_ref_and_keys() and co.
Having got the equalities, we'll to propagate functional dependency property
to unique keys, tables and, ultimately, join nests.
3.4.3 Functional dependency propagation - option 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Borrow the approach used in constant table detection code:
do
{
converted= FALSE;
for each table T in join nest
{
if (check_if_functionally_dependent(T))
converted= TRUE;
}
} while (converted == TRUE);
check_if_functionally_dependent(T)
{
if (T has eq_ref access based on func_dep_tables)
return TRUE;
Apply the same do-while loop-based approach to available equalities
T.column1=func(other columns)
to spread the set of functionally-dependent columns. The goal is to get
all columns of a certain unique key to be bound.
}
3.4.4 Functional dependency propagation - option 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyze the ON expression(s) and build a list of
tbl.field = expr(...)
equalities. tbl here is a table that belongs to a join nest that could
potentially be eliminated.
besides those, add to the list
- An element for each unique key in the table that needs to be eliminated
- An element for each table that needs to be eliminated
- An element for each join nest that can be eliminated (i.e. has no
references from outside).
Then, setup "reverse dependencies": each element should have pointers to
elements that are functionally dependent on it:
- "tbl.field=expr(...)" equality is functionally dependent on all fields that
are used in "expr(...)" (here we take into account only fields that belong
to tables that can potentially be eliminated).
- a unique key is dependent on all of its components
- a table is dependent on all of its unique keys
- a join nest is dependent on all tables that it contains
These pointers are stored in form of one bitmap, such that:
"X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
Each object also stores a number of dependencies it needs to be satisfied
before it itself is satisfied:
- "tbl.field=expr(...)" needs all its underlying fields (if a field is
referenced many times it is counted only once)
- a unique key needs all of its key parts
- a table needs only one of its unique keys
- a join nest needs all of its tables
(TODO: so what do we do when we've marked a table as constant? We'll need to
update the "field=expr(....)" elements that use fields of that table. And the
problem is that we won't know how much to decrement from the counters of those
elements.
Solution#1: switch to table_map() based approach.
Solution#2: introduce separate elements for each involved field.
field will depend on its table,
"field=expr" will depend on fields.
)
Besides the above, let each element have a pointer to another element, so that
we can have a linked list of elements.
After the above structures have been created, we start the main algorithm.
The first step is to create a list of functionally-dependent elements. We walk
across array of dependencies and mark those elements that are already bound
(i.e. their dependencies are satisfied). At the moment those immediately-bound
are only "field=expr" dependencies that don't refer to any columns that are
not bound.
The second step is the loop
while (bound_list is not empty)
{
Take the first bound element F off the list.
Use the bitmap to find out what other elements depended on it
for each such element E
{
if (E becomes bound after F is bound)
add E to the list;
}
}
The last step is to walk through elements that represent the join nests. Those
that are bound can be eliminated.
4. Removal operation properties
===============================
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
5. Removal operation
====================
(This depends a lot on whether we make table elimination a one-off rewrite or
conditional)
At the moment table elimination is re-done for each join re-execution, hence
the removal operation is designed not to modify any statement's permanent
members.
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front of join order, like it is
done with const tables, with exception that if eliminated outer join nest
was within another outer join nest, that shouldn't prevent us from moving
away the eliminated tables.
* Update join->table_count and all-join-tables bitmap.
^ TODO: not true anymore ^
* That's it. Nothing else?
6. User interface
=================
6.1 @@optimizer_switch flag
---------------------------
Argument againist adding the flag:
* It is always better to perform table elimination than not to do it.
Arguments for the flag:
* It is always theoretically possible that the new code will cause unintended
slowdowns.
* Having the flag is useful for QA and comparative benchmarking.
Decision so far: add the flag under #ifdef. Make the flag be present in debug
builds.
6.2 EXPLAIN [EXTENDED]
----------------------
There are two possible options:
1. Show eliminated tables, like we do with const tables.
2. Do not show eliminated tables.
We chose option 2, because:
- the table is not accessed at all (besides locking it)
- it is more natural for anchor model user - when he's querying an anchor-
and attributes view, he doesn't care about the unused attributes.
EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
NOTE: Before this WL, the warning text was generated after all JOIN objects
have been destroyed. This didn't allow to use information about join plan
when printing the warning. We've fixed this by keeping the JOIN objects until
the warning text has been generated.
Table elimination removes inner sides of outer join, and logically the ON
clause is also removed. If this clause has any subqueries, they will be
also removed from EXPLAIN output.
An exception to the above is that if we eliminate a derived table, it will
still be shown in EXPLAIN output. This comes from the fact that the FROM
subqueries are evaluated before table elimination is invoked.
TODO: Is the above ok or still remove parts of FROM subqueries?
7. Miscellaneous adjustments
============================
7.1 Fix used_tables() of aggregate functions
--------------------------------------------
Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. Fixed it, now aggregate function reports that it depends on the
tables that its arguments depend on. In particular, COUNT(*) reports that it
depends on no tables (item_count_star->used_tables()==0). One consequence of
that is that "item->used_tables()==0" is not equivalent to
"item->const_item()==true" anymore (not sure if it's "anymore" or this has
been already so for some items).
7.2 Make subquery predicates collect their outer references
-----------------------------------------------------------
Per-column functional dependency analysis requires us to take a
tbl.field = func(...)
equality and tell which columns of which tables are referred from func(...)
expression. For scalar expressions, this is accomplished by Item::walk()-based
traversal. It should be reasonably cheap (the only practical Item that can be
expensive to traverse seems to be a special case of "col IN (const1,const2,
...)". check if we traverse the long list for such items).
For correlated subqueries, traversal can be expensive, it is cheaper to make
each subquery item have a list of its outer references. The list can be
collected at fix_fields() stage with very little extra cost, and then it could
be used for other optimizations.
8. Other concerns
=================
8.1 Relationship with outer->inner joins converter
--------------------------------------------------
One could suspect that outer->inner join conversion could get in the way
of table elimination by changing outer joins (which could be eliminated)
to inner (which we will not try to eliminate).
This concern is not valid: we make outer->inner conversions based on
predicates in WHERE. If the WHERE referred to an inner table (this is a
requirement for the conversion) then table elimination would not be
applicable anyway.
8.2 Relationship with prepared statements
-----------------------------------------
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require to
rework operation [MARK2] to be undoable.
8.3 Relationship with constant table detection
----------------------------------------------
Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
TODO
9. Tests and benchmarks
=======================
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- query Q1 that would use elimination
- query Q2 that is very similar to Q1 (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
then compare run times and make a conclusion about whether the used dbms
supports table elimination.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0