developers
Threads by month
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 8 participants
- 6811 discussions
[Maria-developers] New (by Monty): MERGE tables for Maria (28)
by worklog-noreply@askmonty.org 02 Jun '09
by worklog-noreply@askmonty.org 02 Jun '09
02 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: MERGE tables for Maria
CREATION DATE..: Tue, 02 Jun 2009, 01:39
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 28 (http://askmonty.org/worklog/?tid=28)
VERSION........: Server-5.2
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 80 (hours remain)
ORIG. ESTIMATE.: 80
PROGRESS NOTES:
DESCRIPTION:
Make a version of MERGE tables for Maria.
This is needed to make Maria a full MyISAM replacement.
(And yes, Partition is not a replacement for MERGE table)
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 01 Jun '09
by worklog-noreply@askmonty.org 01 Jun '09
01 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.000000000 +0300
+++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.000000000 +0300
@@ -111,3 +111,8 @@
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
+
-=-=(Guest - Tue, 26 May 2009, 21:52)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.14120 2009-05-26 21:52:06.000000000 +0300
+++ /tmp/wklog.17.new.14120 2009-05-26 21:52:06.000000000 +0300
@@ -1,11 +1,14 @@
<contents>
1. Conditions for removal
+1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
-5. Todo, issues to resolve
-5.1 To resolve
-5.2 Resolved
+5. Tests and benchmarks
+6. Todo, issues to resolve
+6.1 To resolve
+6.2 Resolved
+
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -29,6 +32,18 @@
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
+1.1 Quick check if there are candidates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start to enumerate join nests, here is a quick way to check if
+there *can be* something to be removed:
+
+ if ((tables used in select_list |
+ tables used in group/order by UNION |
+ tables used in where) != bitmap_of_all_tables)
+ {
+ attempt table elimination;
+ }
+
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
@@ -56,22 +71,24 @@
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
+ (Note ^^ utility of the above questioned ^, as table elimination can never
+ be worse than no elimination. We're leaning towards not adding the flag)
-* With EXPLAIN, there are two options:
- - Show removed tables in a way similar to const tables, with some
- indication that they are removed.
- - Do not show them altogether.
-(the second one seems to be better? We're targeting a situation with VIEWs,
-where the user would not care about what tables were added into his query
-and then discarded from it?)
+* EXPLAIN will not show the removed tables at all. This will allow to check
+ if tables were removed, and also will behave nicely with anchor model and
+ VIEWs: stuff that user doesn't care about just won't be there.
+
+5. Tests and benchmarks
+-----------------------
+Should create a benchmark in sql-bench which checks if the dbms has table
+elimination.
+TODO elaborate
-5. Todo, issues to resolve
+6. Todo, issues to resolve
--------------------------
-5.1 To resolve
+6.1 To resolve
~~~~~~~~~~~~~~
-- See EXPLAIN question in section #4.
-
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
@@ -87,7 +104,7 @@
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
-5.2 Resolved
+6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
-=-=(Guest - Fri, 22 May 2009, 17:23)=-=-
High-Level Specification modified.
--- /tmp/wklog.17.old.30851 2009-05-22 17:23:38.000000000 +0300
+++ /tmp/wklog.17.new.30851 2009-05-22 17:23:38.000000000 +0300
@@ -6,7 +6,7 @@
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
-execution plan when it is unneccessary to include them. This can, of
+execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
@@ -22,30 +22,26 @@
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
-contain a NULL value.
+still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
-A
-contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
+A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
-what
-the result will look like is to actually touch both tables during
+what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
-as
-many rows as there are in tableA, since joining with tableB cannot
+as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
-unneccessary. We can remove the whole join operation from the execution
+unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
-in
-the case described above. Let us look at a more advanced query, where
+in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30176 2009-05-22 17:00:35.000000000 +0300
+++ /tmp/wklog.17.new.30176 2009-05-22 17:00:35.000000000 +0300
@@ -1 +1 @@
-Maria-2.0
+Server-5.1
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-Maria-Sprint
+Server-Sprint
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-2.0
+9.x
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-9.x
+Maria-2.0
------------------------------------------------------------
-=-=(View All Progress Notes, 18 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest is:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by UNION |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note ^^ utility of the above questioned ^, as table elimination can never
be worse than no elimination. We're leaning towards not adding the flag)
* EXPLAIN will not show the removed tables at all. This will allow to check
if tables were removed, and also will behave nicely with anchor model and
VIEWs: stuff that user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Should create a benchmark in sql-bench which checks if the dbms has table
elimination.
TODO elaborate
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require to
rework operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
7. Additional issues
--------------------
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
* Aggregate functions report they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
* Should eliminated tables be shown in EXPLAIN EXTENDED?
- If we just ignore the question, they will be shown
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 01 Jun '09
by worklog-noreply@askmonty.org 01 Jun '09
01 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.000000000 +0300
+++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.000000000 +0300
@@ -111,3 +111,8 @@
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
+
-=-=(Guest - Tue, 26 May 2009, 21:52)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.14120 2009-05-26 21:52:06.000000000 +0300
+++ /tmp/wklog.17.new.14120 2009-05-26 21:52:06.000000000 +0300
@@ -1,11 +1,14 @@
<contents>
1. Conditions for removal
+1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
-5. Todo, issues to resolve
-5.1 To resolve
-5.2 Resolved
+5. Tests and benchmarks
+6. Todo, issues to resolve
+6.1 To resolve
+6.2 Resolved
+
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -29,6 +32,18 @@
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
+1.1 Quick check if there are candidates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start to enumerate join nests, here is a quick way to check if
+there *can be* something to be removed:
+
+ if ((tables used in select_list |
+ tables used in group/order by UNION |
+ tables used in where) != bitmap_of_all_tables)
+ {
+ attempt table elimination;
+ }
+
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
@@ -56,22 +71,24 @@
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
+ (Note ^^ utility of the above questioned ^, as table elimination can never
+ be worse than no elimination. We're leaning towards not adding the flag)
-* With EXPLAIN, there are two options:
- - Show removed tables in a way similar to const tables, with some
- indication that they are removed.
- - Do not show them altogether.
-(the second one seems to be better? We're targeting a situation with VIEWs,
-where the user would not care about what tables were added into his query
-and then discarded from it?)
+* EXPLAIN will not show the removed tables at all. This will allow to check
+ if tables were removed, and also will behave nicely with anchor model and
+ VIEWs: stuff that user doesn't care about just won't be there.
+
+5. Tests and benchmarks
+-----------------------
+Should create a benchmark in sql-bench which checks if the dbms has table
+elimination.
+TODO elaborate
-5. Todo, issues to resolve
+6. Todo, issues to resolve
--------------------------
-5.1 To resolve
+6.1 To resolve
~~~~~~~~~~~~~~
-- See EXPLAIN question in section #4.
-
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
@@ -87,7 +104,7 @@
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
-5.2 Resolved
+6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
-=-=(Guest - Fri, 22 May 2009, 17:23)=-=-
High-Level Specification modified.
--- /tmp/wklog.17.old.30851 2009-05-22 17:23:38.000000000 +0300
+++ /tmp/wklog.17.new.30851 2009-05-22 17:23:38.000000000 +0300
@@ -6,7 +6,7 @@
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
-execution plan when it is unneccessary to include them. This can, of
+execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
@@ -22,30 +22,26 @@
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
-contain a NULL value.
+still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
-A
-contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
+A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
-what
-the result will look like is to actually touch both tables during
+what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
-as
-many rows as there are in tableA, since joining with tableB cannot
+as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
-unneccessary. We can remove the whole join operation from the execution
+unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
-in
-the case described above. Let us look at a more advanced query, where
+in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30176 2009-05-22 17:00:35.000000000 +0300
+++ /tmp/wklog.17.new.30176 2009-05-22 17:00:35.000000000 +0300
@@ -1 +1 @@
-Maria-2.0
+Server-5.1
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-Maria-Sprint
+Server-Sprint
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-2.0
+9.x
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-9.x
+Maria-2.0
------------------------------------------------------------
-=-=(View All Progress Notes, 18 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest is:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by UNION |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note ^^ utility of the above questioned ^, as table elimination can never
be worse than no elimination. We're leaning towards not adding the flag)
* EXPLAIN will not show the removed tables at all. This will allow to check
if tables were removed, and also will behave nicely with anchor model and
VIEWs: stuff that user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Should create a benchmark in sql-bench which checks if the dbms has table
elimination.
TODO elaborate
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require to
rework operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
7. Additional issues
--------------------
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
* Aggregate functions report they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
* Should eliminated tables be shown in EXPLAIN EXTENDED?
- If we just ignore the question, they will be shown
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 01 Jun '09
by worklog-noreply@askmonty.org 01 Jun '09
01 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches are should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305 2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305 2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point
+where it is really necessary.
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to
+take it into consideration together with range access plans.
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+ WHERE := range_cond(key1_1) AND
+ range_cond(key2_1) AND
+ other_cond AND
+ index_merge_OR_cond1(key3_1, key3_2, ...)
+ index_merge_OR_cond2(key4_1, key4_2, ...)
+
+where
+
+ index_merge_OR_cond{N} := (range_cond(keyN_1) OR
+ range_cond(keyN_2) OR ...)
+
+
+ range_cond(keyX) := condition that allows to construct range access of keyX
+ and doesn't allow to construct range/index_merge accesses
+ for any keys of the table in question.
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+ SEL_TREE(
+ range(key1_1),
+ ...
+ range(key2_1),
+ SEL_IMERGE( (1)
+ SEL_TREE(key3_1})
+ SEL_TREE(key3_2})
+ ...
+ )
+ ...
+ )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+-----------
+This will not be a full solution in a sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in other form
+(e.g. brackets were opened).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+ index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok
+ it did but that was done for accesses to different tables). Will there be any
+ possible gotchas here?
DESCRIPTION:
Current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses in form
t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
- discard union(key2,key3) in favor of range(key1)
- consider costs of using range(key1) and discard that plan also
and the overall effect is that possible poor range access will cause possible
good index_merge access not to be considered.
This WL is to about lifting this limitation at least for some subset of WHERE
clauses.
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to do much changes in the range analyzer, but will be
able to keep potential index_merge plan just enough so that it's possible to
take it into consideration together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
WHERE := range_cond(key1_1) AND
range_cond(key2_1) AND
other_cond AND
index_merge_OR_cond1(key3_1, key3_2, ...)
index_merge_OR_cond2(key4_1, key4_2, ...)
where
index_merge_OR_cond{N} := (range_cond(keyN_1) OR
range_cond(keyN_2) OR ...)
range_cond(keyX) := condition that allows to construct range access of keyX
and doesn't allow to construct range/index_merge accesses
for any keys of the table in question.
For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
SEL_TREE(
range(key1_1),
...
range(key2_1),
SEL_IMERGE( (1)
SEL_TREE(key3_1})
SEL_TREE(key3_2})
...
)
...
)
which can be used to make a cost-based choice between range and index_merge.
Limitations
-----------
This will not be a full solution in a sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in other form
(e.g. brackets were opened).
TODO
----
* is it a problem if there are keys that are referred to both from
index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches are should be in
(google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 01 Jun '09
by worklog-noreply@askmonty.org 01 Jun '09
01 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches are should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305 2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305 2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point
+where it is really necessary.
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to
+take it into consideration together with range access plans.
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+ WHERE := range_cond(key1_1) AND
+ range_cond(key2_1) AND
+ other_cond AND
+ index_merge_OR_cond1(key3_1, key3_2, ...)
+ index_merge_OR_cond2(key4_1, key4_2, ...)
+
+where
+
+ index_merge_OR_cond{N} := (range_cond(keyN_1) OR
+ range_cond(keyN_2) OR ...)
+
+
+ range_cond(keyX) := condition that allows to construct range access of keyX
+ and doesn't allow to construct range/index_merge accesses
+ for any keys of the table in question.
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+ SEL_TREE(
+ range(key1_1),
+ ...
+ range(key2_1),
+ SEL_IMERGE( (1)
+ SEL_TREE(key3_1})
+ SEL_TREE(key3_2})
+ ...
+ )
+ ...
+ )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+-----------
+This will not be a full solution in a sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in other form
+(e.g. brackets were opened).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+ index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok
+ it did but that was done for accesses to different tables). Will there be any
+ possible gotchas here?
DESCRIPTION:
Current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses in form
t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
- discard union(key2,key3) in favor of range(key1)
- consider costs of using range(key1) and discard that plan also
and the overall effect is that possible poor range access will cause possible
good index_merge access not to be considered.
This WL is to about lifting this limitation at least for some subset of WHERE
clauses.
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to do much changes in the range analyzer, but will be
able to keep potential index_merge plan just enough so that it's possible to
take it into consideration together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
WHERE := range_cond(key1_1) AND
range_cond(key2_1) AND
other_cond AND
index_merge_OR_cond1(key3_1, key3_2, ...)
index_merge_OR_cond2(key4_1, key4_2, ...)
where
index_merge_OR_cond{N} := (range_cond(keyN_1) OR
range_cond(keyN_2) OR ...)
range_cond(keyX) := condition that allows to construct range access of keyX
and doesn't allow to construct range/index_merge accesses
for any keys of the table in question.
For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
SEL_TREE(
range(key1_1),
...
range(key2_1),
SEL_IMERGE( (1)
SEL_TREE(key3_1})
SEL_TREE(key3_2})
...
)
...
)
which can be used to make a cost-based choice between range and index_merge.
Limitations
-----------
This will not be a full solution in a sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in other form
(e.g. brackets were opened).
TODO
----
* is it a problem if there are keys that are referred to both from
index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches are should be in
(google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Psergey): Table elimination (17)
by worklog-noreply@askmonty.org 01 Jun '09
by worklog-noreply@askmonty.org 01 Jun '09
01 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.000000000 +0300
+++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.000000000 +0300
@@ -111,3 +111,8 @@
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
+
-=-=(Guest - Tue, 26 May 2009, 21:52)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.14120 2009-05-26 21:52:06.000000000 +0300
+++ /tmp/wklog.17.new.14120 2009-05-26 21:52:06.000000000 +0300
@@ -1,11 +1,14 @@
<contents>
1. Conditions for removal
+1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
-5. Todo, issues to resolve
-5.1 To resolve
-5.2 Resolved
+5. Tests and benchmarks
+6. Todo, issues to resolve
+6.1 To resolve
+6.2 Resolved
+
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -29,6 +32,18 @@
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
+1.1 Quick check if there are candidates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start to enumerate join nests, here is a quick way to check if
+there *can be* something to be removed:
+
+ if ((tables used in select_list |
+ tables used in group/order by UNION |
+ tables used in where) != bitmap_of_all_tables)
+ {
+ attempt table elimination;
+ }
+
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
@@ -56,22 +71,24 @@
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
+ (Note ^^ utility of the above questioned ^, as table elimination can never
+ be worse than no elimination. We're leaning towards not adding the flag)
-* With EXPLAIN, there are two options:
- - Show removed tables in a way similar to const tables, with some
- indication that they are removed.
- - Do not show them altogether.
-(the second one seems to be better? We're targeting a situation with VIEWs,
-where the user would not care about what tables were added into his query
-and then discarded from it?)
+* EXPLAIN will not show the removed tables at all. This will allow to check
+ if tables were removed, and also will behave nicely with anchor model and
+ VIEWs: stuff that user doesn't care about just won't be there.
+
+5. Tests and benchmarks
+-----------------------
+Should create a benchmark in sql-bench which checks if the dbms has table
+elimination.
+TODO elaborate
-5. Todo, issues to resolve
+6. Todo, issues to resolve
--------------------------
-5.1 To resolve
+6.1 To resolve
~~~~~~~~~~~~~~
-- See EXPLAIN question in section #4.
-
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
@@ -87,7 +104,7 @@
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
-5.2 Resolved
+6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
-=-=(Guest - Fri, 22 May 2009, 17:23)=-=-
High-Level Specification modified.
--- /tmp/wklog.17.old.30851 2009-05-22 17:23:38.000000000 +0300
+++ /tmp/wklog.17.new.30851 2009-05-22 17:23:38.000000000 +0300
@@ -6,7 +6,7 @@
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
-execution plan when it is unneccessary to include them. This can, of
+execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
@@ -22,30 +22,26 @@
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
-contain a NULL value.
+still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
-A
-contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
+A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
-what
-the result will look like is to actually touch both tables during
+what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
-as
-many rows as there are in tableA, since joining with tableB cannot
+as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
-unneccessary. We can remove the whole join operation from the execution
+unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
-in
-the case described above. Let us look at a more advanced query, where
+in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30176 2009-05-22 17:00:35.000000000 +0300
+++ /tmp/wklog.17.new.30176 2009-05-22 17:00:35.000000000 +0300
@@ -1 +1 @@
-Maria-2.0
+Server-5.1
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-Maria-Sprint
+Server-Sprint
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-2.0
+9.x
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-9.x
+Maria-2.0
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30122 2009-05-22 17:00:02.000000000 +0300
+++ /tmp/wklog.17.new.30122 2009-05-22 17:00:02.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Maria-Sprint
------------------------------------------------------------
-=-=(View All Progress Notes, 17 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest is:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by UNION |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note ^^ utility of the above questioned ^, as table elimination can never
be worse than no elimination. We're leaning towards not adding the flag)
* EXPLAIN will not show the removed tables at all. This will allow to check
if tables were removed, and also will behave nicely with anchor model and
VIEWs: stuff that user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Should create a benchmark in sql-bench which checks if the dbms has table
elimination.
TODO elaborate
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require to
rework operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
7. Additional issues
--------------------
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
* Aggregate functions report they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
* Should eliminated tables be shown in EXPLAIN EXTENDED?
- If we just ignore the question, they will be shown
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Psergey): Table elimination (17)
by worklog-noreply@askmonty.org 01 Jun '09
by worklog-noreply@askmonty.org 01 Jun '09
01 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.000000000 +0300
+++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.000000000 +0300
@@ -111,3 +111,8 @@
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
+
-=-=(Guest - Tue, 26 May 2009, 21:52)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.14120 2009-05-26 21:52:06.000000000 +0300
+++ /tmp/wklog.17.new.14120 2009-05-26 21:52:06.000000000 +0300
@@ -1,11 +1,14 @@
<contents>
1. Conditions for removal
+1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
-5. Todo, issues to resolve
-5.1 To resolve
-5.2 Resolved
+5. Tests and benchmarks
+6. Todo, issues to resolve
+6.1 To resolve
+6.2 Resolved
+
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -29,6 +32,18 @@
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
+1.1 Quick check if there are candidates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start to enumerate join nests, here is a quick way to check if
+there *can be* something to be removed:
+
+ if ((tables used in select_list |
+ tables used in group/order by UNION |
+ tables used in where) != bitmap_of_all_tables)
+ {
+ attempt table elimination;
+ }
+
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
@@ -56,22 +71,24 @@
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
+ (Note ^^ utility of the above questioned ^, as table elimination can never
+ be worse than no elimination. We're leaning towards not adding the flag)
-* With EXPLAIN, there are two options:
- - Show removed tables in a way similar to const tables, with some
- indication that they are removed.
- - Do not show them altogether.
-(the second one seems to be better? We're targeting a situation with VIEWs,
-where the user would not care about what tables were added into his query
-and then discarded from it?)
+* EXPLAIN will not show the removed tables at all. This will allow to check
+ if tables were removed, and also will behave nicely with anchor model and
+ VIEWs: stuff that user doesn't care about just won't be there.
+
+5. Tests and benchmarks
+-----------------------
+Should create a benchmark in sql-bench which checks if the dbms has table
+elimination.
+TODO elaborate
-5. Todo, issues to resolve
+6. Todo, issues to resolve
--------------------------
-5.1 To resolve
+6.1 To resolve
~~~~~~~~~~~~~~
-- See EXPLAIN question in section #4.
-
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
@@ -87,7 +104,7 @@
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
-5.2 Resolved
+6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
-=-=(Guest - Fri, 22 May 2009, 17:23)=-=-
High-Level Specification modified.
--- /tmp/wklog.17.old.30851 2009-05-22 17:23:38.000000000 +0300
+++ /tmp/wklog.17.new.30851 2009-05-22 17:23:38.000000000 +0300
@@ -6,7 +6,7 @@
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
-execution plan when it is unneccessary to include them. This can, of
+execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
@@ -22,30 +22,26 @@
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
-contain a NULL value.
+still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
-A
-contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
+A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
-what
-the result will look like is to actually touch both tables during
+what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
-as
-many rows as there are in tableA, since joining with tableB cannot
+as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
-unneccessary. We can remove the whole join operation from the execution
+unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
-in
-the case described above. Let us look at a more advanced query, where
+in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30176 2009-05-22 17:00:35.000000000 +0300
+++ /tmp/wklog.17.new.30176 2009-05-22 17:00:35.000000000 +0300
@@ -1 +1 @@
-Maria-2.0
+Server-5.1
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-Maria-Sprint
+Server-Sprint
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-2.0
+9.x
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-9.x
+Maria-2.0
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30122 2009-05-22 17:00:02.000000000 +0300
+++ /tmp/wklog.17.new.30122 2009-05-22 17:00:02.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Maria-Sprint
------------------------------------------------------------
-=-=(View All Progress Notes, 17 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest is:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by UNION |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note ^^ utility of the above questioned ^, as table elimination can never
be worse than no elimination. We're leaning towards not adding the flag)
* EXPLAIN will not show the removed tables at all. This will allow to check
if tables were removed, and also will behave nicely with anchor model and
VIEWs: stuff that user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Should create a benchmark in sql-bench which checks if the dbms has table
elimination.
TODO elaborate
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require to
rework operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
7. Additional issues
--------------------
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
* Aggregate functions report they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
* Should eliminated tables be shown in EXPLAIN EXTENDED?
- If we just ignore the question, they will be shown
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
I noticed a difference between Bitkeeper and Bzr merges.
In Bitkeeper, merge parents are essentially unordered. The two parents of a
merge are completely equal, and there is no way from the Bitkeeper to history
to say whether one parent was merged into the other or vise versa. The actual
order in which merge parents are listed is based on some deterministic
algorithm (most recent commit date if I recall correctly).
In particular, this means that in Bitkeeper, the result of a pull of A into B
is exactly identical to a pull of B into A. This makes it impossible to
establish the history of pushes into main trees from just the Bitkeeper
history.
In Bzr, merge parents _are_ ordered. Merging B into A gives a different result
from merging A into B. When merging B into A (cd A; bzr merge B), A is the
primary or left-hand parent. The resulting tree has one extra top-level merge
revision following the tip of A, with all merged revisions of B appearing as
sub-revisions of this.
This means that if one merges A into B giving C1, and later merges B into A
giving C2, then C1 and C2 are considered diverged and need another merge
changeset to be consolidated (in Bitkeeper C1 and C2 are identical and need no
further merge).
What this means is that in bzr is _is_ actually possible to see the history of
what was merged into what, at least in some sense.
Unfortunately, it tends to be backwards with usual MySQL working style. This
is to make a branch at some point X, make a patch, commit and get review. Then
when pushing, the main tree has gotten additional pushes by others and is now
at revision Y > X. So one does a bzr merge of the main tree into the local
clone, followed by a push to the main tree. Now the local patch becomes the
main merge parent, and all other pushes between X and Y become sub-revisions
with their revision numbers renamed. Which eventually makes primary/secondary
merge parent relationships more or less random.
So since I did not take part in the transition to bzr within Sun, I just
wanted to ask if this is something that was discussed, and if so if there were
any conclusions?
As I understand, the Drizzle people make sure that whenever they merge, they
merge into a copy of their trunk, so that all past pushes are visible as the
line of primary/left-most merge parents in the main tree. This works better
for them, as they follow the model of a single/few merge captains merging in
other peoples trees upon merge requests.
My personal opinion is that while it would be nice to have the push history
available like the Drizzle people do, the bzr support for this is not good
enough, without a very tight control on who can push to main trees that we do
not want. What one can do is when pushing, first swith to an up-to-date fresh
clone of the main tree, pull from the local branch with own changes, then push
that to the main tree. Maybe this is a good way, even though it cannot be
enforced?
Just wanted to hear if there are any other opinions on this?
- Kristian.
3
8
Hi!
I've got several questions for our instance of buildbot:
Grid display (http://askmonty.org/buildbot/grid) is very nice, but it
would be even nicer if one could display only pushes into one particular
branch, like in pushbuild. Is it difficult to achieve? If not, can I request
it from somebody or will need to spend the effort myself?
Can we have buildbot automatically re-build things, say, daily when no
other load is present? I've re-ran maria-5.1 tests on my machine several
times and it failed all three times with different number of failing tests,
which is an indication of precence of random failures. (In Sun/MySQL I used
to push fixes that touch only comments in order to trigger re-builds and
catch random failures)
Can we have buildbot also process other branches? maria trees are public-
facing trees, so I suppose that means we should not break them too often,
and one way to do that is to have staging trees where features could be
tested before they are pushed into the main tree.
One could argue that this could be achieved by autopush, but autopush only
uses one platform, doesn't save/keep track of test run results, etc. My
experience at Sun/MySQL shows that pushbuild (and I suppose buildbot) over a
separate branch is much better. It runs different tests on different
platforms, pedantically records test results (where everyone can see them),
etc etc.
Is it possible to set up buildbot to run several configurations, but one after
the other? Looking at the buildbot status page, I see that
- there are not too many hosts
- they are idle most of the time.
Suppose I want to use my 32/64 box to build 32 and 64 bit, debug and release,
using gcc and sunstudio. That's 2^3 = 8 combinations, but I don't want to
have 8 build bots running in parallel (I dont think the box has enough RAM for
that)?
p.s.
I myself have almost repaired my desktop computer, so we can expect to have a
dual 32/64-bit buildbot slave to be added as soon as I finish setting things
up (sometime this week).
BR
Sergey
--
Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
4
3
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 01 Jun '09
by worklog-noreply@askmonty.org 01 Jun '09
01 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.000000000 +0300
+++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.000000000 +0300
@@ -111,3 +111,8 @@
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
+
-=-=(Guest - Tue, 26 May 2009, 21:52)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.14120 2009-05-26 21:52:06.000000000 +0300
+++ /tmp/wklog.17.new.14120 2009-05-26 21:52:06.000000000 +0300
@@ -1,11 +1,14 @@
<contents>
1. Conditions for removal
+1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
-5. Todo, issues to resolve
-5.1 To resolve
-5.2 Resolved
+5. Tests and benchmarks
+6. Todo, issues to resolve
+6.1 To resolve
+6.2 Resolved
+
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -29,6 +32,18 @@
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
+1.1 Quick check if there are candidates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start to enumerate join nests, here is a quick way to check if
+there *can be* something to be removed:
+
+ if ((tables used in select_list |
+ tables used in group/order by UNION |
+ tables used in where) != bitmap_of_all_tables)
+ {
+ attempt table elimination;
+ }
+
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
@@ -56,22 +71,24 @@
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
+ (Note ^^ utility of the above questioned ^, as table elimination can never
+ be worse than no elimination. We're leaning towards not adding the flag)
-* With EXPLAIN, there are two options:
- - Show removed tables in a way similar to const tables, with some
- indication that they are removed.
- - Do not show them altogether.
-(the second one seems to be better? We're targeting a situation with VIEWs,
-where the user would not care about what tables were added into his query
-and then discarded from it?)
+* EXPLAIN will not show the removed tables at all. This will allow to check
+ if tables were removed, and also will behave nicely with anchor model and
+ VIEWs: stuff that user doesn't care about just won't be there.
+
+5. Tests and benchmarks
+-----------------------
+Should create a benchmark in sql-bench which checks if the dbms has table
+elimination.
+TODO elaborate
-5. Todo, issues to resolve
+6. Todo, issues to resolve
--------------------------
-5.1 To resolve
+6.1 To resolve
~~~~~~~~~~~~~~
-- See EXPLAIN question in section #4.
-
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
@@ -87,7 +104,7 @@
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
-5.2 Resolved
+6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
-=-=(Guest - Fri, 22 May 2009, 17:23)=-=-
High-Level Specification modified.
--- /tmp/wklog.17.old.30851 2009-05-22 17:23:38.000000000 +0300
+++ /tmp/wklog.17.new.30851 2009-05-22 17:23:38.000000000 +0300
@@ -6,7 +6,7 @@
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
-execution plan when it is unneccessary to include them. This can, of
+execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
@@ -22,30 +22,26 @@
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
-contain a NULL value.
+still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
-A
-contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
+A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
-what
-the result will look like is to actually touch both tables during
+what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
-as
-many rows as there are in tableA, since joining with tableB cannot
+as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
-unneccessary. We can remove the whole join operation from the execution
+unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
-in
-the case described above. Let us look at a more advanced query, where
+in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30176 2009-05-22 17:00:35.000000000 +0300
+++ /tmp/wklog.17.new.30176 2009-05-22 17:00:35.000000000 +0300
@@ -1 +1 @@
-Maria-2.0
+Server-5.1
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-Maria-Sprint
+Server-Sprint
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-2.0
+9.x
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162 2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162 2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-9.x
+Maria-2.0
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30122 2009-05-22 17:00:02.000000000 +0300
+++ /tmp/wklog.17.new.30122 2009-05-22 17:00:02.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Maria-Sprint
-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30122 2009-05-22 17:00:02.000000000 +0300
+++ /tmp/wklog.17.new.30122 2009-05-22 17:00:02.000000000 +0300
@@ -1 +1 @@
-Server-5.2
+2.0
------------------------------------------------------------
-=-=(View All Progress Notes, 16 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain it's original value. The not seen B.* row would contain all NULL:s.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest is:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by UNION |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note ^^ utility of the above questioned ^, as table elimination can never
be worse than no elimination. We're leaning towards not adding the flag)
* EXPLAIN will not show the removed tables at all. This will allow to check
if tables were removed, and also will behave nicely with anchor model and
VIEWs: stuff that user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Should create a benchmark in sql-bench which checks if the dbms has table
elimination.
TODO elaborate
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require to
rework operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
7. Additional issues
--------------------
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
* Aggregate functions report they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0