developers
Threads by month
- ----- 2025 -----
- June
- May
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 10 participants
- 6869 discussions

[Maria-developers] Updated (by Guest): Update packaging scripts for MariaDB 5.2 (88)
by worklog-noreply@askmonty.org 13 Mar '10
by worklog-noreply@askmonty.org 13 Mar '10
13 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Update packaging scripts for MariaDB 5.2
CREATION DATE..: Sat, 27 Feb 2010, 16:39
SUPERVISOR.....: Knielsen
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 88 (http://askmonty.org/worklog/?tid=88)
VERSION........: Server-5.2
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 30 (hours remain)
ORIG. ESTIMATE.: 30
PROGRESS NOTES:
-=-=(Guest - Sat, 13 Mar 2010, 08:12)=-=-
Category updated.
--- /tmp/wklog.88.old.22167 2010-03-13 08:12:01.000000000 +0000
+++ /tmp/wklog.88.new.22167 2010-03-13 08:12:01.000000000 +0000
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
DESCRIPTION:
The packaging scripts need to be updated to work for MariaDB 5.2
Currently, 5.2 package builds fail in Buildbot. The .debs are missing a
debian-5.2 subdirectory.
The .rpm also need to be checked.
Buildbot needs to be updated to do the new upgrade tests (mariadb-5.1 ->
mariadb 5.2)
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2832: 1. don't crash on failing to load a plugin with newer MYSQL_PLUGIN_INTERFACE_VERSION
by noreply@launchpad.net 12 Mar '10
by noreply@launchpad.net 12 Mar '10
12 Mar '10
------------------------------------------------------------
revno: 2832
committer: Sergei Golubchik <sergii(a)pisem.net>
branch nick: maria-5.1
timestamp: Fri 2010-03-12 20:05:21 +0100
message:
1. don't crash on failing to load a plugin with newer MYSQL_PLUGIN_INTERFACE_VERSION
2. don't copy st_mysql_plugin structure unnecessary (sizeof hasn't changed)
modified:
sql/sql_plugin.cc
sql/sql_plugin.h
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
1
0

[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2831: Fix myisam checksum patch to check for HA_OPTION_CHECKSUM after it was set, not before
by noreply@launchpad.net 12 Mar '10
by noreply@launchpad.net 12 Mar '10
12 Mar '10
------------------------------------------------------------
revno: 2831
committer: Sergei Golubchik <sergii(a)pisem.net>
branch nick: maria-5.1
timestamp: Fri 2010-03-12 20:03:37 +0100
message:
Fix myisam checksum patch to check for HA_OPTION_CHECKSUM after it was set, not before
modified:
storage/myisam/mi_create.c
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
1
0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply@askmonty.org 12 Mar '10
by worklog-noreply@askmonty.org 12 Mar '10
12 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-Sprint
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=-
Status updated.
--- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000
+++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Category updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Status updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000
+++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000
@@ -36,8 +36,8 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
-Difficulty with computing the two costs
----------------------------------------
+Difficulty with the need to run select optimization two times
+-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
@@ -46,4 +46,10 @@
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
+The problem is that once one injects "oe=ie", it can trigger some optimization
+steps that are not possible to undo.
+- Example1: outer->inner join conversion
+- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
+- ... what else ?
+
-=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000
+++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000
@@ -36,3 +36,14 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
+Difficulty with computing the two costs
+---------------------------------------
+The problem is in this scenario:
+1. We compute materialization_cost by running optimization for the original
+ subquery select.
+2. We compute exists_select_cost by running optimization for the subquery's
+ select with "oe=ie" injected into WHERE
+3. Then we find that cost #1 is less and want to execute the materialization
+ strategy.
+
+
-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+ SELECT
+ oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+ FROM outer_tbl
+ WHERE outer_cond
+
+If we use Materialization strategy, the costs will be
+
+ cost of accessing outer_tbl +
+ materialization_cost +
+ #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where
+
+ materialization_cost=
+ cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+ SELECT
+ EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+ FROM outer_tbl
+ WHERE outer_cond
+
+and the costs will be
+
+ cost of accessing outer_tbl +
+ #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+ exists_select_cost=
+ cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and materialization_cost.
-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89
DESCRIPTION:
For uncorrelated IN subqueries that can't be converted to semi-joins it is
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.
Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.
This WL is about making cost-based decision for #1.
HIGH-LEVEL SPECIFICATION:
Why need two optimizations
--------------------------
Consider a query with subquery:
SELECT
oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
FROM outer_tbl
WHERE outer_cond
If we use Materialization strategy, the costs will be
cost of accessing outer_tbl +
materialization_cost +
#records(outer_tbl w/o outer_cond) * lookup_cost
where
materialization_cost=
cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
SELECT
EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
FROM outer_tbl
WHERE outer_cond
and the costs will be
cost of accessing outer_tbl +
#records(outer_tbl w/o outer_cond) * exists_select_cost
where
exists_select_cost=
cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
So, we'll need to compute both exists_select_cost and materialization_cost.
Difficulty with the need to run select optimization two times
-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
2. We compute exists_select_cost by running optimization for the subquery's
select with "oe=ie" injected into WHERE
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
The problem is that once one injects "oe=ie", it can trigger some optimization
steps that are not possible to undo.
- Example1: outer->inner join conversion
- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
- ... what else ?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply@askmonty.org 12 Mar '10
by worklog-noreply@askmonty.org 12 Mar '10
12 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-Sprint
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=-
Status updated.
--- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000
+++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Category updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Status updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000
+++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000
@@ -36,8 +36,8 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
-Difficulty with computing the two costs
----------------------------------------
+Difficulty with the need to run select optimization two times
+-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
@@ -46,4 +46,10 @@
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
+The problem is that once one injects "oe=ie", it can trigger some optimization
+steps that are not possible to undo.
+- Example1: outer->inner join conversion
+- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
+- ... what else ?
+
-=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000
+++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000
@@ -36,3 +36,14 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
+Difficulty with computing the two costs
+---------------------------------------
+The problem is in this scenario:
+1. We compute materialization_cost by running optimization for the original
+ subquery select.
+2. We compute exists_select_cost by running optimization for the subquery's
+ select with "oe=ie" injected into WHERE
+3. Then we find that cost #1 is less and want to execute the materialization
+ strategy.
+
+
-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+ SELECT
+ oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+ FROM outer_tbl
+ WHERE outer_cond
+
+If we use Materialization strategy, the costs will be
+
+ cost of accessing outer_tbl +
+ materialization_cost +
+ #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where
+
+ materialization_cost=
+ cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+ SELECT
+ EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+ FROM outer_tbl
+ WHERE outer_cond
+
+and the costs will be
+
+ cost of accessing outer_tbl +
+ #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+ exists_select_cost=
+ cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and materialization_cost.
-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89
DESCRIPTION:
For uncorrelated IN subqueries that can't be converted to semi-joins it is
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.
Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.
This WL is about making cost-based decision for #1.
HIGH-LEVEL SPECIFICATION:
Why need two optimizations
--------------------------
Consider a query with subquery:
SELECT
oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
FROM outer_tbl
WHERE outer_cond
If we use Materialization strategy, the costs will be
cost of accessing outer_tbl +
materialization_cost +
#records(outer_tbl w/o outer_cond) * lookup_cost
where
materialization_cost=
cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
SELECT
EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
FROM outer_tbl
WHERE outer_cond
and the costs will be
cost of accessing outer_tbl +
#records(outer_tbl w/o outer_cond) * exists_select_cost
where
exists_select_cost=
cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
So, we'll need to compute both exists_select_cost and materialization_cost.
Difficulty with the need to run select optimization two times
-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
2. We compute exists_select_cost by running optimization for the subquery's
select with "oe=ie" injected into WHERE
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
The problem is that once one injects "oe=ie", it can trigger some optimization
steps that are not possible to undo.
- Example1: outer->inner join conversion
- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
- ... what else ?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply@askmonty.org 12 Mar '10
by worklog-noreply@askmonty.org 12 Mar '10
12 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-Sprint
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=-
Status updated.
--- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000
+++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Category updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Status updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000
+++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000
@@ -36,8 +36,8 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
-Difficulty with computing the two costs
----------------------------------------
+Difficulty with the need to run select optimization two times
+-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
@@ -46,4 +46,10 @@
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
+The problem is that once one injects "oe=ie", it can trigger some optimization
+steps that are not possible to undo.
+- Example1: outer->inner join conversion
+- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
+- ... what else ?
+
-=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000
+++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000
@@ -36,3 +36,14 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
+Difficulty with computing the two costs
+---------------------------------------
+The problem is in this scenario:
+1. We compute materialization_cost by running optimization for the original
+ subquery select.
+2. We compute exists_select_cost by running optimization for the subquery's
+ select with "oe=ie" injected into WHERE
+3. Then we find that cost #1 is less and want to execute the materialization
+ strategy.
+
+
-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+ SELECT
+ oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+ FROM outer_tbl
+ WHERE outer_cond
+
+If we use Materialization strategy, the costs will be
+
+ cost of accessing outer_tbl +
+ materialization_cost +
+ #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where
+
+ materialization_cost=
+ cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+ SELECT
+ EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+ FROM outer_tbl
+ WHERE outer_cond
+
+and the costs will be
+
+ cost of accessing outer_tbl +
+ #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+ exists_select_cost=
+ cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and materialization_cost.
-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89
DESCRIPTION:
For uncorrelated IN subqueries that can't be converted to semi-joins it is
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.
Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.
This WL is about making cost-based decision for #1.
HIGH-LEVEL SPECIFICATION:
Why need two optimizations
--------------------------
Consider a query with subquery:
SELECT
oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
FROM outer_tbl
WHERE outer_cond
If we use Materialization strategy, the costs will be
cost of accessing outer_tbl +
materialization_cost +
#records(outer_tbl w/o outer_cond) * lookup_cost
where
materialization_cost=
cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
SELECT
EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
FROM outer_tbl
WHERE outer_cond
and the costs will be
cost of accessing outer_tbl +
#records(outer_tbl w/o outer_cond) * exists_select_cost
where
exists_select_cost=
cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
So, we'll need to compute both exists_select_cost and materialization_cost.
Difficulty with the need to run select optimization two times
-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
2. We compute exists_select_cost by running optimization for the subquery's
select with "oe=ie" injected into WHERE
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
The problem is that once one injects "oe=ie", it can trigger some optimization
steps that are not possible to undo.
- Example1: outer->inner join conversion
- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
- ... what else ?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply@askmonty.org 12 Mar '10
by worklog-noreply@askmonty.org 12 Mar '10
12 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-Sprint
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=-
Status updated.
--- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000
+++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Category updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Status updated.
--- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000
+++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000
@@ -36,8 +36,8 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
-Difficulty with computing the two costs
----------------------------------------
+Difficulty with the need to run select optimization two times
+-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
@@ -46,4 +46,10 @@
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
+The problem is that once one injects "oe=ie", it can trigger some optimization
+steps that are not possible to undo.
+- Example1: outer->inner join conversion
+- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
+- ... what else ?
+
-=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000
+++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000
@@ -36,3 +36,14 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
+Difficulty with computing the two costs
+---------------------------------------
+The problem is in this scenario:
+1. We compute materialization_cost by running optimization for the original
+ subquery select.
+2. We compute exists_select_cost by running optimization for the subquery's
+ select with "oe=ie" injected into WHERE
+3. Then we find that cost #1 is less and want to execute the materialization
+ strategy.
+
+
-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+ SELECT
+ oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+ FROM outer_tbl
+ WHERE outer_cond
+
+If we use Materialization strategy, the costs will be
+
+ cost of accessing outer_tbl +
+ materialization_cost +
+ #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where
+
+ materialization_cost=
+ cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+ SELECT
+ EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+ FROM outer_tbl
+ WHERE outer_cond
+
+and the costs will be
+
+ cost of accessing outer_tbl +
+ #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+ exists_select_cost=
+ cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and materialization_cost.
-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89
DESCRIPTION:
For uncorrelated IN subqueries that can't be converted to semi-joins it is
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.
Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.
This WL is about making cost-based decision for #1.
HIGH-LEVEL SPECIFICATION:
Why need two optimizations
--------------------------
Consider a query with subquery:
SELECT
oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
FROM outer_tbl
WHERE outer_cond
If we use Materialization strategy, the costs will be
cost of accessing outer_tbl +
materialization_cost +
#records(outer_tbl w/o outer_cond) * lookup_cost
where
materialization_cost=
cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
SELECT
EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
FROM outer_tbl
WHERE outer_cond
and the costs will be
cost of accessing outer_tbl +
#records(outer_tbl w/o outer_cond) * exists_select_cost
where
exists_select_cost=
cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
So, we'll need to compute both exists_select_cost and materialization_cost.
Difficulty with the need to run select optimization two times
-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
2. We compute exists_select_cost by running optimization for the subquery's
select with "oe=ie" injected into WHERE
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
The problem is that once one injects "oe=ie", it can trigger some optimization
steps that are not possible to undo.
- Example1: outer->inner join conversion
- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
- ... what else ?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Updated (by Timour): Subquery optimization: Efficient NOT IN execution with NULLs (68)
by worklog-noreply@askmonty.org 12 Mar '10
by worklog-noreply@askmonty.org 12 Mar '10
12 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subquery optimization: Efficient NOT IN execution with NULLs
CREATION DATE..: Fri, 27 Nov 2009, 13:22
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 68 (http://askmonty.org/worklog/?tid=68)
VERSION........: Server-9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 68
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 68
-=-=(Psergey - Sun, 28 Feb 2010, 14:08)=-=-
Dependency created: 94 now depends on 68
-=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=-
Status updated.
No change.
-=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=-
Status updated.
--- /tmp/wklog.68.old.24229 2010-02-27 10:11:57.000000000 +0000
+++ /tmp/wklog.68.new.24229 2010-02-27 10:11:57.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress
-=-=(Timour - Mon, 22 Feb 2010, 17:39)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.17116 2010-02-22 17:39:48.000000000 +0200
+++ /tmp/wklog.68.new.17116 2010-02-22 17:39:48.000000000 +0200
@@ -233,6 +233,7 @@
1. If columns a_j1,...,a_jm do not contain null values in the temporary
table at all and v_j1,...,v_jm cannot be null, create for these columns
only one index array (and of course do not create any bitmaps for them).
+[done]
2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number
of rows, where a_i is not null and V(a_i) is the number of distinct
@@ -264,6 +265,10 @@
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
+[This is wrong, because if we don't fill the whole temp table, there may
+ be some tuple(s) that would match some outer tuple. In such cases, if we
+ stop filling the temp table, we would miss a TRUE result. Having a partial
+ match doesn't preclude us from having a complete match].
8. [timour]
Consider that due to materialization, we already have a unique index
-=-=(Timour - Tue, 19 Jan 2010, 18:44)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.22569 2010-01-19 18:44:01.000000000 +0200
+++ /tmp/wklog.68.new.22569 2010-01-19 18:44:01.000000000 +0200
@@ -132,11 +132,10 @@
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
- if (nonull_key)
- pq.insert(nonull_key)
for (i = 1; i <= n; i++)
{
+ if (vkey[i] != nonull_key)
vkey[i].lookup(outer_ref)
if (! vkey[i].is_eof())
pq.insert(i)
@@ -167,7 +166,7 @@
/* There cannot be a complete match, as we already checked for one. */
assert(matching_keys.elements < n)
}
- else if (cur_min_key == nonull_key)
+ else if (vkey[cur_min_key] == nonull_key)
{
/*
The non-NULL key has no corresponding NULL index, so we know for
@@ -183,8 +182,10 @@
/*
Check if all null_keys contain a NULL at row 'min_row'. The procedure
internally checks all keys in a special precomputed order. A prior
- procedure determines an optimal order and a mapping
- idx_no -> idx_order (encoded as an array).
+ procedure determines an optimal order and a mapping idx_no -> idx_order
+ (encoded as an array).
+
+ This procedure makes sure not to match the non-NULL column.
*/
if (test_null_row(null_keys, min_row))
return TRUE
@@ -198,6 +199,14 @@
vkey[cur_min_key].next()
if (! vkey[cur_min_key].is_eof())
pq.insert(cur_min_key)
+ else if (vkey[cur_min_key] == nonull_key)
+ {
+ /*
+ If there can't be more matches for the nonull_key, we know for sure
+ there is no match, since there is no possible NULL match.
+ */
+ return FALSE
+ }
if (pq.is_empty())
{
@@ -216,7 +225,6 @@
}
-
3. Directions for improvement
========================================================================
-=-=(Timour - Tue, 19 Jan 2010, 18:29)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.21045 2010-01-19 18:29:12.000000000 +0200
+++ /tmp/wklog.68.new.21045 2010-01-19 18:29:12.000000000 +0200
@@ -132,6 +132,8 @@
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
+ if (nonull_key)
+ pq.insert(nonull_key)
for (i = 1; i <= n; i++)
{
-=-=(Guest - Tue, 19 Jan 2010, 18:15)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.19825 2010-01-19 18:15:30.000000000 +0200
+++ /tmp/wklog.68.new.19825 2010-01-19 18:15:30.000000000 +0200
@@ -1,8 +1,16 @@
-This a copy of the initial algorithm proposed by Igor:
-======================================================
+Contents
+========================================================================
-For each left side tuple (v_1,...,v_n) we have to find the following set
-of rowids for the temp table containing N rows as the result of
+1. Initial idea as proposed by Igor
+2. Algorithm for IN execution with partial matching
+3. Directions for improvement
+
+
+1. Initial idea as proposed by Igor
+========================================================================
+
+For each left side tuple (v_1,...,v_n) we have to find the following
+set of rowids for the temp table containing N rows as the result of
materialization of the subquery:
R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs
@@ -18,38 +26,198 @@
- it requires minimum memory: not more than N*n bits in total
- search of an element in a set is extremely cheap
-Taken all above into account I could suggest the following algorithm to
-build R:
+Taken all above into account I could suggest the following algorithm
+to build R:
- Using indexes (read about them below) for each column participating in the
- intersection,
- merge ordered sets rowid{a_i=v_i} in the following manner.
+ Using indexes (read about them below) for each column participating
+ in the intersection, merge ordered sets rowid{a_i=v_i} in the
+ following manner.
If a rowid r has been encountered maximum in k sets
-rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
+ rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
then it has to be checked against all rowid{a_i=v_i} such that i is
-not in {i1,...,ik}.
+ not in {i1,...,ik}.
As soon as we fail to find r in one of these sets we discard it.
If r has been found in all of them then r belongs to the set R.
-Here we use the property (1): any r from rowid{a_i=v_i} UNION rowid{a_i
-is null} is either
+Here we use the property (1):
+any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either
belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can
-infer that for any r from R
-indexes a_i can be uniquely divided into two groups: one contains
-indexes a_i where r belongs to
-the sets rowid{a_i=v_i}, the other contains indexes a_j such that r
-belongs to rowid{a_j is null}.
-
-Now let's talk how to get elements from rowid{a_i=v_i} in a sorted order
-needed for the merge procedure. We could use BTREE indexes for temp
-table. But they are rather expensive and
-take a lot of memory as the are implemented with RB trees.
+infer that for any r from R indexes a_i can be uniquely divided into
+two groups:
+- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i},
+- the other contains indexes a_j such that r belongs to
+ rowid{a_j is null}.
+
+Now let's talk how to get elements from rowid{a_i=v_i} in a sorted
+order needed for the merge procedure. We could use BTREE indexes for
+temp table. But they are rather expensive and take a lot of memory as
+the are implemented with RB trees.
I would suggest creating for each column from the temporary table just
an array of rowids sorted by the value from column a.
Index lookup in such an array is cheap. It's also rather cheap to check
that the next rowid refers to a row with a different value in column a.
The array can be created on demand.
+2. Algorithm for IN execution with partial matching
+========================================================================
+
+2.1 Below is shown the top-level algorithm to execute an IN predicate
+with partial matching. This algorithm is essentially the implementation
+of Item_subselect:exec().
+
+int lookup_with_null_semantics(outer_ref[], mat_subquery)
+{
+ if (index_lookup(outer_ref, mat_subquery)
+ return TRUE
+ else
+ {
+ /*
+ Check if there is a partial match (UNKNOWN) or no match (NULL).
+ */
+ if (this is the first partial match)
+ {
+ vkey[] = build array of value keys for each NULL-able column
+ of mat_subquery.
+ nkey[] = build a bitmap NULL index for each column of mat_subquery
+ that contains NULLs
+ nonull_key = build a key over all non-NULL columns of mat_subquery
+ }
+ if (partial_match(outer_ref, vkey[], nkey[], nonull_key)
+ return UNKNOWN
+ else
+ return FALSE
+ }
+}
+
+2.2 The implementation of partial matching is as follows
+
+/*
+ Assumptions:
+ - It has already been checked if there is a complete match by a
+ regular index lookup, and the test failed.
+ - It has already been checked if there is a complete NULL row,
+ and if there was we wouldn't call this function. Thus we assume
+ that there is no complete NULL row.
+ - Not all vidx_i are empty, but some can be empty. If all were empty,
+ then the only possibility for a match is a complete NULL row, which
+ we already checked.
+
+ @param outer_ref - the uter (left) IN argument.
+ @param vidx[] - array of value keys
+ Ordered sequences of rowids of the corresponding columns a_i, such
+ that all rowids in idx_i are the ones where column a_i contains some
+ value or NULL. Each idx_i is derived dynamically, for each different
+ left argument of an IN predicate.
+ @param nidx[] - array of NULL keys
+ Bitmpas, one per each column, where a bit is set if the corresponding
+ row has a NULL value for the corresponding column.
+ @nonull_key - the only key over all columns of the materialized subquery
+ that do not contain NULLs
+
+ @returns
+ @retval FALSE if there is no match
+ @retval TRUE if there is a partial match
+*/
+
+Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
+{
+ /* Set of the keys (columns) that form a partial match. */
+ Set matching_keys = {}
+ /* A subset of all keys that need to be checked for NULL matches. */
+ Set null_keys = {}
+ Int min_key /* Key that contains the current minimum position. */
+ Int min_row /* Current row number of min_key. */
+ Int cur_min_key, cur_min_row
+ PriorityQueue pq
+
+ if (nonull_key && ! nonull_key->lookup(outer_ref))
+ return FALSE
+
+ for (i = 1; i <= n; i++)
+ {
+ vkey[i].lookup(outer_ref)
+ if (! vkey[i].is_eof())
+ pq.insert(i)
+ }
+ /*
+ Not all value keys are empty, thus we don't have only NULL
+ keys. If we had, the only possible match is a NULL row, and
+ we cheked there is no such row, therefore the result is known
+ to be FALSE.
+ In fact this algorithm makes sense for at least two non-NULL
+ columns.
+ */
+ assert(pq.elements > 1)
+
+ (min_key, min_row) = pq.pop()
+ matching_keys.add(min_key)
+ vkey[min_key].next()
+ if (! vkey[min_key].is_eof())
+ pq.insert(min_key)
+
+ while (TRUE)
+ {
+ (cur_min_key, cur_min_row) = pq.pop()
+
+ if (cur_min_row == min_row)
+ {
+ matching_keys.add(cur_min_key)
+ /* There cannot be a complete match, as we already checked for one. */
+ assert(matching_keys.elements < n)
+ }
+ else if (cur_min_key == nonull_key)
+ {
+ /*
+ The non-NULL key has no corresponding NULL index, so we know for
+ sure that the row 'min_row' is not a match.
+ */
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ else
+ {
+ assert(cur_min_row > min_row) /* Follows from the use of PQ. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ /*
+ Check if all null_keys contain a NULL at row 'min_row'. The procedure
+ internally checks all keys in a special precomputed order. A prior
+ procedure determines an optimal order and a mapping
+ idx_no -> idx_order (encoded as an array).
+ */
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ {
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ }
+
+ vkey[cur_min_key].next()
+ if (! vkey[cur_min_key].is_eof())
+ pq.insert(cur_min_key)
+
+ if (pq.is_empty())
+ {
+ /* Check the last row of the last column in PQ for NULL matches. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ return FALSE
+ }
+ }
+
+ /* We should never get here. */
+ assert(FALSE)
+ return FALSE
+}
+
+
+
+3. Directions for improvement
+========================================================================
+
Other consideration that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
-=-=(Timour - Sun, 06 Dec 2009, 14:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.12919 2009-12-06 14:36:18.000000000 +0200
+++ /tmp/wklog.68.new.12919 2009-12-06 14:36:18.000000000 +0200
@@ -87,3 +87,8 @@
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
+8. [timour]
+ Consider that due to materialization, we already have a unique index
+on all columns <a_1,..., a_n>. We can use the first key part of this index
+over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
+creating the index rowid{a_i=v_i}.
------------------------------------------------------------
-=-=(View All Progress Notes, 16 total)=-=-
http://askmonty.org/worklog/index.pl?tid=68&nolimit=1
DESCRIPTION:
The goal of this task is to implement efficient execution of NOT IN
subquery predicates of the form:
<oe_1,...,oe_n> NOT IN <non_correlated subquery>
when either some oe_i, or some subqury result column contains NULLs.
The problem with such predicates is that it is possible to use index
lookups only when neither argument of the predicate contains NULLs.
If some argument contains a NULL, then due to NULL semantics, it
plays the role of a wildcard. If we were to use regular index lookups,
then we would get 'no match' for some outer tuple (thus the predicate
evaluates to FALSE), while the SQL semantics means 'partial match', and
the predicate should evaluate to NULL.
This task implements an efficient algorithm to compute such 'parial
matches', where a NULL matches any value.
HIGH-LEVEL SPECIFICATION:
Contents
========================================================================
1. Initial idea as proposed by Igor
2. Algorithm for IN execution with partial matching
3. Directions for improvement
1. Initial idea as proposed by Igor
========================================================================
For each left side tuple (v_1,...,v_n) we have to find the following
set of rowids for the temp table containing N rows as the result of
materialization of the subquery:
R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs
trough all indexes from [1..n] such that v_i is not null.
Bear in mind the following specifics of this intersection:
(1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint
(2) For each i: rowid{a_i is null} is the same for each tuple,
that is, this set is independent of the left-side tuples.
Due to (2) it makes sense to build rowid{a_i is null} only once.
A good representation for such sets would be bitmaps:
- it requires minimum memory: not more than N*n bits in total
- search of an element in a set is extremely cheap
Taken all above into account I could suggest the following algorithm
to build R:
Using indexes (read about them below) for each column participating
in the intersection, merge ordered sets rowid{a_i=v_i} in the
following manner.
If a rowid r has been encountered maximum in k sets
rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
then it has to be checked against all rowid{a_i=v_i} such that i is
not in {i1,...,ik}.
As soon as we fail to find r in one of these sets we discard it.
If r has been found in all of them then r belongs to the set R.
Here we use the property (1):
any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either
belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can
infer that for any r from R indexes a_i can be uniquely divided into
two groups:
- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i},
- the other contains indexes a_j such that r belongs to
rowid{a_j is null}.
Now let's talk how to get elements from rowid{a_i=v_i} in a sorted
order needed for the merge procedure. We could use BTREE indexes for
temp table. But they are rather expensive and take a lot of memory as
the are implemented with RB trees.
I would suggest creating for each column from the temporary table just
an array of rowids sorted by the value from column a.
Index lookup in such an array is cheap. It's also rather cheap to check
that the next rowid refers to a row with a different value in column a.
The array can be created on demand.
2. Algorithm for IN execution with partial matching
========================================================================
2.1 Below is shown the top-level algorithm to execute an IN predicate
with partial matching. This algorithm is essentially the implementation
of Item_subselect:exec().
int lookup_with_null_semantics(outer_ref[], mat_subquery)
{
if (index_lookup(outer_ref, mat_subquery)
return TRUE
else
{
/*
Check if there is a partial match (UNKNOWN) or no match (NULL).
*/
if (this is the first partial match)
{
vkey[] = build array of value keys for each NULL-able column
of mat_subquery.
nkey[] = build a bitmap NULL index for each column of mat_subquery
that contains NULLs
nonull_key = build a key over all non-NULL columns of mat_subquery
}
if (partial_match(outer_ref, vkey[], nkey[], nonull_key)
return UNKNOWN
else
return FALSE
}
}
2.2 The implementation of partial matching is as follows
/*
Assumptions:
- It has already been checked if there is a complete match by a
regular index lookup, and the test failed.
- It has already been checked if there is a complete NULL row,
and if there was we wouldn't call this function. Thus we assume
that there is no complete NULL row.
- Not all vidx_i are empty, but some can be empty. If all were empty,
then the only possibility for a match is a complete NULL row, which
we already checked.
@param outer_ref - the uter (left) IN argument.
@param vidx[] - array of value keys
Ordered sequences of rowids of the corresponding columns a_i, such
that all rowids in idx_i are the ones where column a_i contains some
value or NULL. Each idx_i is derived dynamically, for each different
left argument of an IN predicate.
@param nidx[] - array of NULL keys
Bitmpas, one per each column, where a bit is set if the corresponding
row has a NULL value for the corresponding column.
@nonull_key - the only key over all columns of the materialized subquery
that do not contain NULLs
@returns
@retval FALSE if there is no match
@retval TRUE if there is a partial match
*/
Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
{
/* Set of the keys (columns) that form a partial match. */
Set matching_keys = {}
/* A subset of all keys that need to be checked for NULL matches. */
Set null_keys = {}
Int min_key /* Key that contains the current minimum position. */
Int min_row /* Current row number of min_key. */
Int cur_min_key, cur_min_row
PriorityQueue pq
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
for (i = 1; i <= n; i++)
{
if (vkey[i] != nonull_key)
vkey[i].lookup(outer_ref)
if (! vkey[i].is_eof())
pq.insert(i)
}
/*
Not all value keys are empty, thus we don't have only NULL
keys. If we had, the only possible match is a NULL row, and
we cheked there is no such row, therefore the result is known
to be FALSE.
In fact this algorithm makes sense for at least two non-NULL
columns.
*/
assert(pq.elements > 1)
(min_key, min_row) = pq.pop()
matching_keys.add(min_key)
vkey[min_key].next()
if (! vkey[min_key].is_eof())
pq.insert(min_key)
while (TRUE)
{
(cur_min_key, cur_min_row) = pq.pop()
if (cur_min_row == min_row)
{
matching_keys.add(cur_min_key)
/* There cannot be a complete match, as we already checked for one. */
assert(matching_keys.elements < n)
}
else if (vkey[cur_min_key] == nonull_key)
{
/*
The non-NULL key has no corresponding NULL index, so we know for
sure that the row 'min_row' is not a match.
*/
(min_key, min_row) = (cur_min_key, cur_min_row)
matching_keys = {min_key}
}
else
{
assert(cur_min_row > min_row) /* Follows from the use of PQ. */
null_keys = set_difference(all keys vkey[], matching_keys)
/*
Check if all null_keys contain a NULL at row 'min_row'. The procedure
internally checks all keys in a special precomputed order. A prior
procedure determines an optimal order and a mapping idx_no -> idx_order
(encoded as an array).
This procedure makes sure not to match the non-NULL column.
*/
if (test_null_row(null_keys, min_row))
return TRUE
else
{
(min_key, min_row) = (cur_min_key, cur_min_row)
matching_keys = {min_key}
}
}
vkey[cur_min_key].next()
if (! vkey[cur_min_key].is_eof())
pq.insert(cur_min_key)
else if (vkey[cur_min_key] == nonull_key)
{
/*
If there can't be more matches for the nonull_key, we know for sure
there is no match, since there is no possible NULL match.
*/
return FALSE
}
if (pq.is_empty())
{
/* Check the last row of the last column in PQ for NULL matches. */
null_keys = set_difference(all keys vkey[], matching_keys)
if (test_null_row(null_keys, min_row))
return TRUE
else
return FALSE
}
}
/* We should never get here. */
assert(FALSE)
return FALSE
}
3. Directions for improvement
========================================================================
Other consideration that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
table at all and v_j1,...,v_jm cannot be null, create for these columns
only one index array (and of course do not create any bitmaps for them).
[done]
2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number
of rows, where a_i is not null and V(a_i) is the number of distinct
values for a_i excluding nulls.
If d(a_i) is close to N'(a_i) then do not create any index array: check
whether there is a match running through the records that have been
filtered in. Anyway if d(a_i) is close to N'(a_i) then the intersection
with rowid{a_i=v_i} will not reduce the number of remaining rowids
significantly.
In other words is V(a_i) exceeds some threshold there is no sense to
create an index for a_i.
If additionally N-N'(a_i) is small do not create a bitmap for this
column either.
3. If for a column a_i d(a_i) is not close to N'(a_i), but N-N'(a_i) is
small a sorted array of rowids from the set rowid{a_i is null} can be
used instead of a bitmap.
4. We always have a match if R0= INTERSECT rowid{a_i is null} is not
empty. Here i runs through all indexes from [1..n] such that v_i is not
null. For a given subset of columns this fact has to be checked only
once. It can be easily done with bitmap intersection.
5. If v1,...,vn never can be a null, then indexes (sorted arrays) can be
created only for rows with nulls.
6. If v1,...,vn never can be a null and number of rows with nulls is
small do not create indexes and do not create bitmaps.
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
[This is wrong, because if we don't fill the whole temp table, there may
be some tuple(s) that would match some outer tuple. In such cases, if we
stop filling the temp table, we would miss a TRUE result. Having a partial
match doesn't preclude us from having a complete match].
8. [timour]
Consider that due to materialization, we already have a unique index
on all columns <a_1,..., a_n>. We can use the first key part of this index
over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
creating the index rowid{a_i=v_i}.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Updated (by Timour): Subquery optimization: Efficient NOT IN execution with NULLs (68)
by worklog-noreply@askmonty.org 12 Mar '10
by worklog-noreply@askmonty.org 12 Mar '10
12 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subquery optimization: Efficient NOT IN execution with NULLs
CREATION DATE..: Fri, 27 Nov 2009, 13:22
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 68 (http://askmonty.org/worklog/?tid=68)
VERSION........: Server-9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 68
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 68
-=-=(Psergey - Sun, 28 Feb 2010, 14:08)=-=-
Dependency created: 94 now depends on 68
-=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=-
Status updated.
No change.
-=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=-
Status updated.
--- /tmp/wklog.68.old.24229 2010-02-27 10:11:57.000000000 +0000
+++ /tmp/wklog.68.new.24229 2010-02-27 10:11:57.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress
-=-=(Timour - Mon, 22 Feb 2010, 17:39)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.17116 2010-02-22 17:39:48.000000000 +0200
+++ /tmp/wklog.68.new.17116 2010-02-22 17:39:48.000000000 +0200
@@ -233,6 +233,7 @@
1. If columns a_j1,...,a_jm do not contain null values in the temporary
table at all and v_j1,...,v_jm cannot be null, create for these columns
only one index array (and of course do not create any bitmaps for them).
+[done]
2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number
of rows, where a_i is not null and V(a_i) is the number of distinct
@@ -264,6 +265,10 @@
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
+[This is wrong, because if we don't fill the whole temp table, there may
+ be some tuple(s) that would match some outer tuple. In such cases, if we
+ stop filling the temp table, we would miss a TRUE result. Having a partial
+ match doesn't preclude us from having a complete match].
8. [timour]
Consider that due to materialization, we already have a unique index
-=-=(Timour - Tue, 19 Jan 2010, 18:44)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.22569 2010-01-19 18:44:01.000000000 +0200
+++ /tmp/wklog.68.new.22569 2010-01-19 18:44:01.000000000 +0200
@@ -132,11 +132,10 @@
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
- if (nonull_key)
- pq.insert(nonull_key)
for (i = 1; i <= n; i++)
{
+ if (vkey[i] != nonull_key)
vkey[i].lookup(outer_ref)
if (! vkey[i].is_eof())
pq.insert(i)
@@ -167,7 +166,7 @@
/* There cannot be a complete match, as we already checked for one. */
assert(matching_keys.elements < n)
}
- else if (cur_min_key == nonull_key)
+ else if (vkey[cur_min_key] == nonull_key)
{
/*
The non-NULL key has no corresponding NULL index, so we know for
@@ -183,8 +182,10 @@
/*
Check if all null_keys contain a NULL at row 'min_row'. The procedure
internally checks all keys in a special precomputed order. A prior
- procedure determines an optimal order and a mapping
- idx_no -> idx_order (encoded as an array).
+ procedure determines an optimal order and a mapping idx_no -> idx_order
+ (encoded as an array).
+
+ This procedure makes sure not to match the non-NULL column.
*/
if (test_null_row(null_keys, min_row))
return TRUE
@@ -198,6 +199,14 @@
vkey[cur_min_key].next()
if (! vkey[cur_min_key].is_eof())
pq.insert(cur_min_key)
+ else if (vkey[cur_min_key] == nonull_key)
+ {
+ /*
+ If there can't be more matches for the nonull_key, we know for sure
+ there is no match, since there is no possible NULL match.
+ */
+ return FALSE
+ }
if (pq.is_empty())
{
@@ -216,7 +225,6 @@
}
-
3. Directions for improvement
========================================================================
-=-=(Timour - Tue, 19 Jan 2010, 18:29)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.21045 2010-01-19 18:29:12.000000000 +0200
+++ /tmp/wklog.68.new.21045 2010-01-19 18:29:12.000000000 +0200
@@ -132,6 +132,8 @@
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
+ if (nonull_key)
+ pq.insert(nonull_key)
for (i = 1; i <= n; i++)
{
-=-=(Guest - Tue, 19 Jan 2010, 18:15)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.19825 2010-01-19 18:15:30.000000000 +0200
+++ /tmp/wklog.68.new.19825 2010-01-19 18:15:30.000000000 +0200
@@ -1,8 +1,16 @@
-This a copy of the initial algorithm proposed by Igor:
-======================================================
+Contents
+========================================================================
-For each left side tuple (v_1,...,v_n) we have to find the following set
-of rowids for the temp table containing N rows as the result of
+1. Initial idea as proposed by Igor
+2. Algorithm for IN execution with partial matching
+3. Directions for improvement
+
+
+1. Initial idea as proposed by Igor
+========================================================================
+
+For each left side tuple (v_1,...,v_n) we have to find the following
+set of rowids for the temp table containing N rows as the result of
materialization of the subquery:
R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs
@@ -18,38 +26,198 @@
- it requires minimum memory: not more than N*n bits in total
- search of an element in a set is extremely cheap
-Taken all above into account I could suggest the following algorithm to
-build R:
+Taken all above into account I could suggest the following algorithm
+to build R:
- Using indexes (read about them below) for each column participating in the
- intersection,
- merge ordered sets rowid{a_i=v_i} in the following manner.
+ Using indexes (read about them below) for each column participating
+ in the intersection, merge ordered sets rowid{a_i=v_i} in the
+ following manner.
If a rowid r has been encountered maximum in k sets
-rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
+ rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
then it has to be checked against all rowid{a_i=v_i} such that i is
-not in {i1,...,ik}.
+ not in {i1,...,ik}.
As soon as we fail to find r in one of these sets we discard it.
If r has been found in all of them then r belongs to the set R.
-Here we use the property (1): any r from rowid{a_i=v_i} UNION rowid{a_i
-is null} is either
+Here we use the property (1):
+any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either
belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can
-infer that for any r from R
-indexes a_i can be uniquely divided into two groups: one contains
-indexes a_i where r belongs to
-the sets rowid{a_i=v_i}, the other contains indexes a_j such that r
-belongs to rowid{a_j is null}.
-
-Now let's talk how to get elements from rowid{a_i=v_i} in a sorted order
-needed for the merge procedure. We could use BTREE indexes for temp
-table. But they are rather expensive and
-take a lot of memory as the are implemented with RB trees.
+infer that for any r from R indexes a_i can be uniquely divided into
+two groups:
+- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i},
+- the other contains indexes a_j such that r belongs to
+ rowid{a_j is null}.
+
+Now let's talk how to get elements from rowid{a_i=v_i} in a sorted
+order needed for the merge procedure. We could use BTREE indexes for
+temp table. But they are rather expensive and take a lot of memory as
+the are implemented with RB trees.
I would suggest creating for each column from the temporary table just
an array of rowids sorted by the value from column a.
Index lookup in such an array is cheap. It's also rather cheap to check
that the next rowid refers to a row with a different value in column a.
The array can be created on demand.
+2. Algorithm for IN execution with partial matching
+========================================================================
+
+2.1 Below is shown the top-level algorithm to execute an IN predicate
+with partial matching. This algorithm is essentially the implementation
+of Item_subselect:exec().
+
+int lookup_with_null_semantics(outer_ref[], mat_subquery)
+{
+ if (index_lookup(outer_ref, mat_subquery)
+ return TRUE
+ else
+ {
+ /*
+ Check if there is a partial match (UNKNOWN) or no match (NULL).
+ */
+ if (this is the first partial match)
+ {
+ vkey[] = build array of value keys for each NULL-able column
+ of mat_subquery.
+ nkey[] = build a bitmap NULL index for each column of mat_subquery
+ that contains NULLs
+ nonull_key = build a key over all non-NULL columns of mat_subquery
+ }
+ if (partial_match(outer_ref, vkey[], nkey[], nonull_key)
+ return UNKNOWN
+ else
+ return FALSE
+ }
+}
+
+2.2 The implementation of partial matching is as follows
+
+/*
+ Assumptions:
+ - It has already been checked if there is a complete match by a
+ regular index lookup, and the test failed.
+ - It has already been checked if there is a complete NULL row,
+ and if there was we wouldn't call this function. Thus we assume
+ that there is no complete NULL row.
+ - Not all vidx_i are empty, but some can be empty. If all were empty,
+ then the only possibility for a match is a complete NULL row, which
+ we already checked.
+
+ @param outer_ref - the uter (left) IN argument.
+ @param vidx[] - array of value keys
+ Ordered sequences of rowids of the corresponding columns a_i, such
+ that all rowids in idx_i are the ones where column a_i contains some
+ value or NULL. Each idx_i is derived dynamically, for each different
+ left argument of an IN predicate.
+ @param nidx[] - array of NULL keys
+ Bitmpas, one per each column, where a bit is set if the corresponding
+ row has a NULL value for the corresponding column.
+ @nonull_key - the only key over all columns of the materialized subquery
+ that do not contain NULLs
+
+ @returns
+ @retval FALSE if there is no match
+ @retval TRUE if there is a partial match
+*/
+
+Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
+{
+ /* Set of the keys (columns) that form a partial match. */
+ Set matching_keys = {}
+ /* A subset of all keys that need to be checked for NULL matches. */
+ Set null_keys = {}
+ Int min_key /* Key that contains the current minimum position. */
+ Int min_row /* Current row number of min_key. */
+ Int cur_min_key, cur_min_row
+ PriorityQueue pq
+
+ if (nonull_key && ! nonull_key->lookup(outer_ref))
+ return FALSE
+
+ for (i = 1; i <= n; i++)
+ {
+ vkey[i].lookup(outer_ref)
+ if (! vkey[i].is_eof())
+ pq.insert(i)
+ }
+ /*
+ Not all value keys are empty, thus we don't have only NULL
+ keys. If we had, the only possible match is a NULL row, and
+ we cheked there is no such row, therefore the result is known
+ to be FALSE.
+ In fact this algorithm makes sense for at least two non-NULL
+ columns.
+ */
+ assert(pq.elements > 1)
+
+ (min_key, min_row) = pq.pop()
+ matching_keys.add(min_key)
+ vkey[min_key].next()
+ if (! vkey[min_key].is_eof())
+ pq.insert(min_key)
+
+ while (TRUE)
+ {
+ (cur_min_key, cur_min_row) = pq.pop()
+
+ if (cur_min_row == min_row)
+ {
+ matching_keys.add(cur_min_key)
+ /* There cannot be a complete match, as we already checked for one. */
+ assert(matching_keys.elements < n)
+ }
+ else if (cur_min_key == nonull_key)
+ {
+ /*
+ The non-NULL key has no corresponding NULL index, so we know for
+ sure that the row 'min_row' is not a match.
+ */
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ else
+ {
+ assert(cur_min_row > min_row) /* Follows from the use of PQ. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ /*
+ Check if all null_keys contain a NULL at row 'min_row'. The procedure
+ internally checks all keys in a special precomputed order. A prior
+ procedure determines an optimal order and a mapping
+ idx_no -> idx_order (encoded as an array).
+ */
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ {
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ }
+
+ vkey[cur_min_key].next()
+ if (! vkey[cur_min_key].is_eof())
+ pq.insert(cur_min_key)
+
+ if (pq.is_empty())
+ {
+ /* Check the last row of the last column in PQ for NULL matches. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ return FALSE
+ }
+ }
+
+ /* We should never get here. */
+ assert(FALSE)
+ return FALSE
+}
+
+
+
+3. Directions for improvement
+========================================================================
+
Other consideration that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
-=-=(Timour - Sun, 06 Dec 2009, 14:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.12919 2009-12-06 14:36:18.000000000 +0200
+++ /tmp/wklog.68.new.12919 2009-12-06 14:36:18.000000000 +0200
@@ -87,3 +87,8 @@
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
+8. [timour]
+ Consider that due to materialization, we already have a unique index
+on all columns <a_1,..., a_n>. We can use the first key part of this index
+over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
+creating the index rowid{a_i=v_i}.
------------------------------------------------------------
-=-=(View All Progress Notes, 16 total)=-=-
http://askmonty.org/worklog/index.pl?tid=68&nolimit=1
DESCRIPTION:
The goal of this task is to implement efficient execution of NOT IN
subquery predicates of the form:
<oe_1,...,oe_n> NOT IN <non_correlated subquery>
when either some oe_i, or some subqury result column contains NULLs.
The problem with such predicates is that it is possible to use index
lookups only when neither argument of the predicate contains NULLs.
If some argument contains a NULL, then due to NULL semantics, it
plays the role of a wildcard. If we were to use regular index lookups,
then we would get 'no match' for some outer tuple (thus the predicate
evaluates to FALSE), while the SQL semantics means 'partial match', and
the predicate should evaluate to NULL.
This task implements an efficient algorithm to compute such 'parial
matches', where a NULL matches any value.
HIGH-LEVEL SPECIFICATION:
Contents
========================================================================
1. Initial idea as proposed by Igor
2. Algorithm for IN execution with partial matching
3. Directions for improvement
1. Initial idea as proposed by Igor
========================================================================
For each left side tuple (v_1,...,v_n) we have to find the following
set of rowids for the temp table containing N rows as the result of
materialization of the subquery:
R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs
trough all indexes from [1..n] such that v_i is not null.
Bear in mind the following specifics of this intersection:
(1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint
(2) For each i: rowid{a_i is null} is the same for each tuple,
that is, this set is independent of the left-side tuples.
Due to (2) it makes sense to build rowid{a_i is null} only once.
A good representation for such sets would be bitmaps:
- it requires minimum memory: not more than N*n bits in total
- search of an element in a set is extremely cheap
Taken all above into account I could suggest the following algorithm
to build R:
Using indexes (read about them below) for each column participating
in the intersection, merge ordered sets rowid{a_i=v_i} in the
following manner.
If a rowid r has been encountered maximum in k sets
rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
then it has to be checked against all rowid{a_i=v_i} such that i is
not in {i1,...,ik}.
As soon as we fail to find r in one of these sets we discard it.
If r has been found in all of them then r belongs to the set R.
Here we use the property (1):
any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either
belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can
infer that for any r from R indexes a_i can be uniquely divided into
two groups:
- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i},
- the other contains indexes a_j such that r belongs to
rowid{a_j is null}.
Now let's talk how to get elements from rowid{a_i=v_i} in a sorted
order needed for the merge procedure. We could use BTREE indexes for
temp table. But they are rather expensive and take a lot of memory as
the are implemented with RB trees.
I would suggest creating for each column from the temporary table just
an array of rowids sorted by the value from column a.
Index lookup in such an array is cheap. It's also rather cheap to check
that the next rowid refers to a row with a different value in column a.
The array can be created on demand.
2. Algorithm for IN execution with partial matching
========================================================================
2.1 Below is shown the top-level algorithm to execute an IN predicate
with partial matching. This algorithm is essentially the implementation
of Item_subselect:exec().
int lookup_with_null_semantics(outer_ref[], mat_subquery)
{
if (index_lookup(outer_ref, mat_subquery)
return TRUE
else
{
/*
Check if there is a partial match (UNKNOWN) or no match (NULL).
*/
if (this is the first partial match)
{
vkey[] = build array of value keys for each NULL-able column
of mat_subquery.
nkey[] = build a bitmap NULL index for each column of mat_subquery
that contains NULLs
nonull_key = build a key over all non-NULL columns of mat_subquery
}
if (partial_match(outer_ref, vkey[], nkey[], nonull_key)
return UNKNOWN
else
return FALSE
}
}
2.2 The implementation of partial matching is as follows
/*
Assumptions:
- It has already been checked if there is a complete match by a
regular index lookup, and the test failed.
- It has already been checked if there is a complete NULL row,
and if there was we wouldn't call this function. Thus we assume
that there is no complete NULL row.
- Not all vidx_i are empty, but some can be empty. If all were empty,
then the only possibility for a match is a complete NULL row, which
we already checked.
@param outer_ref - the uter (left) IN argument.
@param vidx[] - array of value keys
Ordered sequences of rowids of the corresponding columns a_i, such
that all rowids in idx_i are the ones where column a_i contains some
value or NULL. Each idx_i is derived dynamically, for each different
left argument of an IN predicate.
@param nidx[] - array of NULL keys
Bitmpas, one per each column, where a bit is set if the corresponding
row has a NULL value for the corresponding column.
@nonull_key - the only key over all columns of the materialized subquery
that do not contain NULLs
@returns
@retval FALSE if there is no match
@retval TRUE if there is a partial match
*/
Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
{
/* Set of the keys (columns) that form a partial match. */
Set matching_keys = {}
/* A subset of all keys that need to be checked for NULL matches. */
Set null_keys = {}
Int min_key /* Key that contains the current minimum position. */
Int min_row /* Current row number of min_key. */
Int cur_min_key, cur_min_row
PriorityQueue pq
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
for (i = 1; i <= n; i++)
{
if (vkey[i] != nonull_key)
vkey[i].lookup(outer_ref)
if (! vkey[i].is_eof())
pq.insert(i)
}
/*
Not all value keys are empty, thus we don't have only NULL
keys. If we had, the only possible match is a NULL row, and
we cheked there is no such row, therefore the result is known
to be FALSE.
In fact this algorithm makes sense for at least two non-NULL
columns.
*/
assert(pq.elements > 1)
(min_key, min_row) = pq.pop()
matching_keys.add(min_key)
vkey[min_key].next()
if (! vkey[min_key].is_eof())
pq.insert(min_key)
while (TRUE)
{
(cur_min_key, cur_min_row) = pq.pop()
if (cur_min_row == min_row)
{
matching_keys.add(cur_min_key)
/* There cannot be a complete match, as we already checked for one. */
assert(matching_keys.elements < n)
}
else if (vkey[cur_min_key] == nonull_key)
{
/*
The non-NULL key has no corresponding NULL index, so we know for
sure that the row 'min_row' is not a match.
*/
(min_key, min_row) = (cur_min_key, cur_min_row)
matching_keys = {min_key}
}
else
{
assert(cur_min_row > min_row) /* Follows from the use of PQ. */
null_keys = set_difference(all keys vkey[], matching_keys)
/*
Check if all null_keys contain a NULL at row 'min_row'. The procedure
internally checks all keys in a special precomputed order. A prior
procedure determines an optimal order and a mapping idx_no -> idx_order
(encoded as an array).
This procedure makes sure not to match the non-NULL column.
*/
if (test_null_row(null_keys, min_row))
return TRUE
else
{
(min_key, min_row) = (cur_min_key, cur_min_row)
matching_keys = {min_key}
}
}
vkey[cur_min_key].next()
if (! vkey[cur_min_key].is_eof())
pq.insert(cur_min_key)
else if (vkey[cur_min_key] == nonull_key)
{
/*
If there can't be more matches for the nonull_key, we know for sure
there is no match, since there is no possible NULL match.
*/
return FALSE
}
if (pq.is_empty())
{
/* Check the last row of the last column in PQ for NULL matches. */
null_keys = set_difference(all keys vkey[], matching_keys)
if (test_null_row(null_keys, min_row))
return TRUE
else
return FALSE
}
}
/* We should never get here. */
assert(FALSE)
return FALSE
}
3. Directions for improvement
========================================================================
Other consideration that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
table at all and v_j1,...,v_jm cannot be null, create for these columns
only one index array (and of course do not create any bitmaps for them).
[done]
2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number
of rows, where a_i is not null and V(a_i) is the number of distinct
values for a_i excluding nulls.
If d(a_i) is close to N'(a_i) then do not create any index array: check
whether there is a match running through the records that have been
filtered in. Anyway if d(a_i) is close to N'(a_i) then the intersection
with rowid{a_i=v_i} will not reduce the number of remaining rowids
significantly.
In other words is V(a_i) exceeds some threshold there is no sense to
create an index for a_i.
If additionally N-N'(a_i) is small do not create a bitmap for this
column either.
3. If for a column a_i d(a_i) is not close to N'(a_i), but N-N'(a_i) is
small a sorted array of rowids from the set rowid{a_i is null} can be
used instead of a bitmap.
4. We always have a match if R0= INTERSECT rowid{a_i is null} is not
empty. Here i runs through all indexes from [1..n] such that v_i is not
null. For a given subset of columns this fact has to be checked only
once. It can be easily done with bitmap intersection.
5. If v1,...,vn never can be a null, then indexes (sorted arrays) can be
created only for rows with nulls.
6. If v1,...,vn never can be a null and number of rows with nulls is
small do not create indexes and do not create bitmaps.
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
[This is wrong, because if we don't fill the whole temp table, there may
be some tuple(s) that would match some outer tuple. In such cases, if we
stop filling the temp table, we would miss a TRUE result. Having a partial
match doesn't preclude us from having a complete match].
8. [timour]
Consider that due to materialization, we already have a unique index
on all columns <a_1,..., a_n>. We can use the first key part of this index
over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
creating the index rowid{a_i=v_i}.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0

[Maria-developers] Rev 2767: MWL#68 Subquery optimization: Efficient NOT IN execution with NULLs in file:///home/tsk/mprog/src/5.3-mwl68/
by timour@askmonty.org 11 Mar '10
by timour@askmonty.org 11 Mar '10
11 Mar '10
At file:///home/tsk/mprog/src/5.3-mwl68/
------------------------------------------------------------
revno: 2767
revision-id: timour(a)askmonty.org-20100311214331-kw8ng8aiy6h60vai
parent: timour(a)askmonty.org-20100309103615-dzmm6xt7ye5xfs25
committer: timour(a)askmonty.org
branch nick: 5.3-mwl68
timestamp: Thu 2010-03-11 23:43:31 +0200
message:
MWL#68 Subquery optimization: Efficient NOT IN execution with NULLs
This patch does three things:
- It adds the possibility to force the execution of top-level [NOT] IN
subquery predicates via the IN=>EXISTS transformation. This is done by
setting both optimizer switches partial_match_rowid_merge and
partial_match_table_scan to "off".
- It adjusts all test cases where the complete optimizer_switch is
selected because now we have two more switches.
- For those test cases where the plan changes because of the new available
strategies, we switch off both partial match strategies in order to
force the "old" IN=>EXISTS strategy. This is done because most of these
test cases specifically test bugs in this strategy.
=== modified file 'mysql-test/include/mix1.inc'
--- a/mysql-test/include/mix1.inc 2009-09-15 06:08:54 +0000
+++ b/mysql-test/include/mix1.inc 2010-03-11 21:43:31 +0000
@@ -1177,8 +1177,11 @@
create table t1 (a bit(1) not null,b int) engine=myisam;
create table t2 (c int) engine=innodb;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch='partial_match_rowid_merge=off,partial_match_table_scan=off';
explain
select b from t1 where a not in (select b from t1,t2 group by a) group by a;
+set optimizer_switch=@save_optimizer_switch;
DROP TABLE t1,t2;
--echo End of 5.0 tests
=== modified file 'mysql-test/r/index_merge_myisam.result'
--- a/mysql-test/r/index_merge_myisam.result 2010-01-17 14:51:10 +0000
+++ b/mysql-test/r/index_merge_myisam.result 2010-03-11 21:43:31 +0000
@@ -1419,19 +1419,19 @@
#
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='index_merge=off,index_merge_union=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='index_merge_union=on';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=off,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=off,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,index_merge_sort_union=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=off,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=off,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch=4;
ERROR 42000: Variable 'optimizer_switch' can't be set to the value of '4'
set optimizer_switch=NULL;
@@ -1458,21 +1458,21 @@
set optimizer_switch='index_merge=off,index_merge_union=off,default';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch=default;
select @@global.optimizer_switch;
@@global.optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set @@global.optimizer_switch=default;
select @@global.optimizer_switch;
@@global.optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
#
# Check index_merge's @@optimizer_switch flags
#
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t1 (a int, b int, c int, filler char(100),
@@ -1582,5 +1582,5 @@
set optimizer_switch=default;
show variables like 'optimizer_switch';
Variable_name Value
-optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
drop table t0, t1;
=== modified file 'mysql-test/r/innodb_mysql.result'
--- a/mysql-test/r/innodb_mysql.result 2009-12-15 07:16:46 +0000
+++ b/mysql-test/r/innodb_mysql.result 2010-03-11 21:43:31 +0000
@@ -1425,12 +1425,15 @@
#
create table t1 (a bit(1) not null,b int) engine=myisam;
create table t2 (c int) engine=innodb;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch='partial_match_rowid_merge=off,partial_match_table_scan=off';
explain
select b from t1 where a not in (select b from t1,t2 group by a) group by a;
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
2 DEPENDENT SUBQUERY t1 system NULL NULL NULL NULL 0 const row not found
2 DEPENDENT SUBQUERY t2 ALL NULL NULL NULL NULL 1
+set optimizer_switch=@save_optimizer_switch;
DROP TABLE t1,t2;
End of 5.0 tests
CREATE TABLE `t2` (
=== modified file 'mysql-test/r/myisam_mrr.result'
--- a/mysql-test/r/myisam_mrr.result 2010-01-17 14:51:10 +0000
+++ b/mysql-test/r/myisam_mrr.result 2010-03-11 21:43:31 +0000
@@ -394,7 +394,7 @@
# - engine_condition_pushdown does not affect ICP
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t1 (a int, b int, key(a));
=== modified file 'mysql-test/r/ps.result'
--- a/mysql-test/r/ps.result 2009-05-27 15:19:44 +0000
+++ b/mysql-test/r/ps.result 2010-03-11 21:43:31 +0000
@@ -149,6 +149,8 @@
c32 set('monday', 'tuesday', 'wednesday')
) engine = MYISAM ;
create table t2 like t1;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
set @stmt= ' explain SELECT (SELECT SUM(c1 + c12 + 0.0) FROM t2 where (t1.c2 - 0e-3) = t2.c2 GROUP BY t1.c15 LIMIT 1) as scalar_s, exists (select 1.0e+0 from t2 where t2.c3 * 9.0000000000 = t1.c4) as exists_s, c5 * 4 in (select c6 + 0.3e+1 from t2) as in_s, (c7 - 4, c8 - 4) in (select c9 + 4.0, c10 + 40e-1 from t2) as in_row_s FROM t1, (select c25 x, c32 y from t2) tt WHERE x * 1 = c25 ' ;
prepare stmt1 from @stmt ;
execute stmt1 ;
@@ -177,6 +179,7 @@
2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
deallocate prepare stmt1;
drop tables t1,t2;
+set @@optimizer_switch=@save_optimizer_switch;
set @arg00=1;
prepare stmt1 from ' create table t1 (m int) as select 1 as m ' ;
execute stmt1 ;
=== modified file 'mysql-test/r/subselect.result'
--- a/mysql-test/r/subselect.result 2010-02-17 21:59:41 +0000
+++ b/mysql-test/r/subselect.result 2010-03-11 21:43:31 +0000
@@ -1,4 +1,6 @@
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
select (select 2);
(select 2)
2
@@ -4803,4 +4805,5 @@
1
1
DROP TABLE t1;
+set @@optimizer_switch=@save_optimizer_switch;
End of 5.1 tests.
=== modified file 'mysql-test/r/subselect3.result'
--- a/mysql-test/r/subselect3.result 2010-02-17 10:05:27 +0000
+++ b/mysql-test/r/subselect3.result 2010-03-11 21:43:31 +0000
@@ -63,12 +63,15 @@
select ' ^ This must show 11' Z;
Z
^ This must show 11
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
explain extended select a in (select max(ie) from t1 where oref=4 group by grp) from t3;
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t3 ALL NULL NULL NULL NULL 2 100.00
2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 6 100.00 Using where; Using temporary; Using filesort
Warnings:
Note 1003 select <in_optimizer>(`test`.`t3`.`a`,<exists>(select max(`test`.`t1`.`ie`) AS `max(ie)` from `test`.`t1` where (`test`.`t1`.`oref` = 4) group by `test`.`t1`.`grp` having trigcond((<cache>(`test`.`t3`.`a`) = <ref_null_helper>(max(`test`.`t1`.`ie`)))))) AS `a in (select max(ie) from t1 where oref=4 group by grp)` from `test`.`t3`
+set @@optimizer_switch=@save_optimizer_switch;
drop table t1, t2, t3;
create table t1 (a int, oref int, key(a));
insert into t1 values
@@ -692,6 +695,8 @@
2 3 h
3 4 i
DROP TABLE t1, t2;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
CREATE TABLE t1 (a int);
CREATE TABLE t2 (b int, PRIMARY KEY(b));
INSERT INTO t1 VALUES (1), (NULL), (4);
@@ -759,6 +764,7 @@
1 PRIMARY t1 ALL NULL NULL NULL NULL 4 Using where
2 DEPENDENT SUBQUERY t2 unique_subquery PRIMARY PRIMARY 4 func 1 Using index; Using where
DROP TABLE t1, t2;
+set @@optimizer_switch=@save_optimizer_switch;
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES(1);
CREATE TABLE t2 (placeholder CHAR(11));
@@ -960,7 +966,7 @@
# Baseline:
SHOW STATUS LIKE '%Handler_read_rnd_next';
Variable_name Value
-Handler_read_rnd_next 17
+Handler_read_rnd_next 18
INSERT INTO t1 VALUES (NULL, NULL);
FLUSH STATUS;
@@ -977,7 +983,7 @@
# (read record from t1, but do not read from t2)
SHOW STATUS LIKE '%Handler_read_rnd_next';
Variable_name Value
-Handler_read_rnd_next 18
+Handler_read_rnd_next 19
DROP TABLE t1,t2;
End of 5.1 tests
CREATE TABLE t1 (
=== modified file 'mysql-test/r/subselect3_jcl6.result'
--- a/mysql-test/r/subselect3_jcl6.result 2010-02-17 10:47:55 +0000
+++ b/mysql-test/r/subselect3_jcl6.result 2010-03-11 21:43:31 +0000
@@ -67,12 +67,15 @@
select ' ^ This must show 11' Z;
Z
^ This must show 11
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
explain extended select a in (select max(ie) from t1 where oref=4 group by grp) from t3;
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t3 ALL NULL NULL NULL NULL 2 100.00
2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 6 100.00 Using where; Using temporary; Using filesort
Warnings:
Note 1003 select <in_optimizer>(`test`.`t3`.`a`,<exists>(select max(`test`.`t1`.`ie`) AS `max(ie)` from `test`.`t1` where (`test`.`t1`.`oref` = 4) group by `test`.`t1`.`grp` having trigcond((<cache>(`test`.`t3`.`a`) = <ref_null_helper>(max(`test`.`t1`.`ie`)))))) AS `a in (select max(ie) from t1 where oref=4 group by grp)` from `test`.`t3`
+set @@optimizer_switch=@save_optimizer_switch;
drop table t1, t2, t3;
create table t1 (a int, oref int, key(a));
insert into t1 values
@@ -696,6 +699,8 @@
2 3 h
3 4 i
DROP TABLE t1, t2;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
CREATE TABLE t1 (a int);
CREATE TABLE t2 (b int, PRIMARY KEY(b));
INSERT INTO t1 VALUES (1), (NULL), (4);
@@ -763,6 +768,7 @@
1 PRIMARY t1 ALL NULL NULL NULL NULL 4 Using where
2 DEPENDENT SUBQUERY t2 unique_subquery PRIMARY PRIMARY 4 func 1 Using index; Using where
DROP TABLE t1, t2;
+set @@optimizer_switch=@save_optimizer_switch;
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES(1);
CREATE TABLE t2 (placeholder CHAR(11));
@@ -964,7 +970,7 @@
# Baseline:
SHOW STATUS LIKE '%Handler_read_rnd_next';
Variable_name Value
-Handler_read_rnd_next 17
+Handler_read_rnd_next 18
INSERT INTO t1 VALUES (NULL, NULL);
FLUSH STATUS;
@@ -981,7 +987,7 @@
# (read record from t1, but do not read from t2)
SHOW STATUS LIKE '%Handler_read_rnd_next';
Variable_name Value
-Handler_read_rnd_next 18
+Handler_read_rnd_next 19
DROP TABLE t1,t2;
End of 5.1 tests
CREATE TABLE t1 (
=== modified file 'mysql-test/r/subselect_no_mat.result'
--- a/mysql-test/r/subselect_no_mat.result 2010-02-21 07:33:54 +0000
+++ b/mysql-test/r/subselect_no_mat.result 2010-03-11 21:43:31 +0000
@@ -1,8 +1,10 @@
show variables like 'optimizer_switch';
Variable_name Value
-optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='materialization=off';
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
select (select 2);
(select 2)
2
@@ -4807,8 +4809,9 @@
1
1
DROP TABLE t1;
+set @@optimizer_switch=@save_optimizer_switch;
End of 5.1 tests.
set optimizer_switch=default;
show variables like 'optimizer_switch';
Variable_name Value
-optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
=== modified file 'mysql-test/r/subselect_no_opts.result'
--- a/mysql-test/r/subselect_no_opts.result 2010-02-21 07:33:54 +0000
+++ b/mysql-test/r/subselect_no_opts.result 2010-03-11 21:43:31 +0000
@@ -1,8 +1,10 @@
show variables like 'optimizer_switch';
Variable_name Value
-optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='materialization=off,semijoin=off';
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
select (select 2);
(select 2)
2
@@ -4807,8 +4809,9 @@
1
1
DROP TABLE t1;
+set @@optimizer_switch=@save_optimizer_switch;
End of 5.1 tests.
set optimizer_switch=default;
show variables like 'optimizer_switch';
Variable_name Value
-optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
=== modified file 'mysql-test/r/subselect_no_semijoin.result'
--- a/mysql-test/r/subselect_no_semijoin.result 2010-02-21 07:33:54 +0000
+++ b/mysql-test/r/subselect_no_semijoin.result 2010-03-11 21:43:31 +0000
@@ -1,8 +1,10 @@
show variables like 'optimizer_switch';
Variable_name Value
-optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='semijoin=off';
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
select (select 2);
(select 2)
2
@@ -4807,8 +4809,9 @@
1
1
DROP TABLE t1;
+set @@optimizer_switch=@save_optimizer_switch;
End of 5.1 tests.
set optimizer_switch=default;
show variables like 'optimizer_switch';
Variable_name Value
-optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
=== modified file 'mysql-test/r/subselect_sj.result'
--- a/mysql-test/r/subselect_sj.result 2010-02-24 11:33:42 +0000
+++ b/mysql-test/r/subselect_sj.result 2010-03-11 21:43:31 +0000
@@ -202,39 +202,39 @@
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,materialization=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off,materialization=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,materialization=off,semijoin=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off,materialization=off,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,materialization=off,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch=default;
drop table t0, t1, t2;
drop table t10, t11, t12;
=== modified file 'mysql-test/r/subselect_sj_jcl6.result'
--- a/mysql-test/r/subselect_sj_jcl6.result 2010-03-07 15:41:45 +0000
+++ b/mysql-test/r/subselect_sj_jcl6.result 2010-03-11 21:43:31 +0000
@@ -206,39 +206,39 @@
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,materialization=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off,materialization=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,materialization=off,semijoin=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off,materialization=off,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,semijoin=off,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch='default,materialization=off,loosescan=off';
select @@optimizer_switch;
@@optimizer_switch
-index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on
+index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on
set optimizer_switch=default;
drop table t0, t1, t2;
drop table t10, t11, t12;
=== modified file 'mysql-test/t/ps.test'
--- a/mysql-test/t/ps.test 2009-05-27 15:19:44 +0000
+++ b/mysql-test/t/ps.test 2010-03-11 21:43:31 +0000
@@ -163,6 +163,9 @@
) engine = MYISAM ;
create table t2 like t1;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
+
set @stmt= ' explain SELECT (SELECT SUM(c1 + c12 + 0.0) FROM t2 where (t1.c2 - 0e-3) = t2.c2 GROUP BY t1.c15 LIMIT 1) as scalar_s, exists (select 1.0e+0 from t2 where t2.c3 * 9.0000000000 = t1.c4) as exists_s, c5 * 4 in (select c6 + 0.3e+1 from t2) as in_s, (c7 - 4, c8 - 4) in (select c9 + 4.0, c10 + 40e-1 from t2) as in_row_s FROM t1, (select c25 x, c32 y from t2) tt WHERE x * 1 = c25 ' ;
prepare stmt1 from @stmt ;
execute stmt1 ;
@@ -171,6 +174,8 @@
deallocate prepare stmt1;
drop tables t1,t2;
+set @@optimizer_switch=@save_optimizer_switch;
+
#
# parameters from variables (for field creation)
#
=== modified file 'mysql-test/t/subselect.test'
--- a/mysql-test/t/subselect.test 2010-01-17 20:52:20 +0000
+++ b/mysql-test/t/subselect.test 2010-03-11 21:43:31 +0000
@@ -11,6 +11,9 @@
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12;
--enable_warnings
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
+
select (select 2);
explain extended select (select 2);
SELECT (SELECT 1) UNION SELECT (SELECT 2);
@@ -4061,4 +4064,6 @@
(SELECT LAST_INSERT_ID() FROM t1 ORDER BY MIN(a) ASC LIMIT 1);
DROP TABLE t1;
+set @@optimizer_switch=@save_optimizer_switch;
+
--echo End of 5.1 tests.
=== modified file 'mysql-test/t/subselect3.test'
--- a/mysql-test/t/subselect3.test 2010-01-17 14:51:10 +0000
+++ b/mysql-test/t/subselect3.test 2010-03-11 21:43:31 +0000
@@ -59,9 +59,13 @@
show status like 'Handler_read_rnd_next';
select ' ^ This must show 11' Z;
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
+
# This must show trigcond:
explain extended select a in (select max(ie) from t1 where oref=4 group by grp) from t3;
+set @@optimizer_switch=@save_optimizer_switch;
drop table t1, t2, t3;
#
@@ -529,6 +533,9 @@
DROP TABLE t1, t2;
+# The next three test cases must be executed with the IN=>EXISTS strategy
+set @save_optimizer_switch=@@optimizer_switch;
+set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off";
#
# Bug #27870: crash of an equijoin query with WHERE condition containing
@@ -588,6 +595,8 @@
DROP TABLE t1, t2;
+set @@optimizer_switch=@save_optimizer_switch;
+
#
# Bug #34763: item_subselect.cc:1235:Item_in_subselect::row_value_transformer:
# Assertion failed, unexpected error message:
=== modified file 'sql/opt_subselect.cc'
--- a/sql/opt_subselect.cc 2010-03-09 10:36:15 +0000
+++ b/sql/opt_subselect.cc 2010-03-11 21:43:31 +0000
@@ -187,7 +187,11 @@
does not call setup_subquery_materialization(). We could make
SELECT ... FROM DUAL call that function but that doesn't seem
to be the case that is worth handling.
- 4. Subquery is non-correlated
+ 4. Either the subquery predicate is a top-level predicate, or at
+ least one partial match strategy is enabled. If no partial match
+ strategy is enabled, then materialization cannot be used for
+ non-top-level queries because it cannot handle NULLs correctly.
+ 5. Subquery is non-correlated
TODO:
This is an overly restrictive condition. It can be extended to:
(Subquery is non-correlated ||
@@ -195,13 +199,13 @@
(Subquery is correlated to the immediate outer query &&
Subquery !contains {GROUP BY, ORDER BY [LIMIT],
aggregate functions}) && subquery predicate is not under "NOT IN"))
- 5. No execution method was already chosen (by a prepared statement).
+ 6. No execution method was already chosen (by a prepared statement).
(*) The subquery must be part of a SELECT statement. The current
condition also excludes multi-table update statements.
- We have to determine whether we will perform subquery materialization
- before calling the IN=>EXISTS transformation, so that we know whether to
+ Determine whether we will perform subquery materialization before
+ calling the IN=>EXISTS transformation, so that we know whether to
perform the whole transformation or only that part of it which wraps
Item_in_subselect in an Item_in_optimizer.
*/
@@ -211,11 +215,14 @@
select_lex->master_unit()->first_select()->leaf_tables && // 3
thd->lex->sql_command == SQLCOM_SELECT && // *
select_lex->outer_select()->leaf_tables && // 3A
- subquery_types_allow_materialization(in_subs))
+ subquery_types_allow_materialization(in_subs) &&
+ // psergey-todo: duplicated_subselect_card_check: where it's done?
+ (in_subs->is_top_level_item() ||
+ optimizer_flag(thd, OPTIMIZER_SWITCH_PARTIAL_MATCH_ROWID_MERGE) ||
+ optimizer_flag(thd, OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN)) &&//4
+ !in_subs->is_correlated && // 5
+ in_subs->exec_method == Item_in_subselect::NOT_TRANSFORMED) // 6
{
- // psergey-todo: duplicated_subselect_card_check: where it's done?
- if (!in_subs->is_correlated && // 4
- in_subs->exec_method == Item_in_subselect::NOT_TRANSFORMED) // 5
in_subs->exec_method= Item_in_subselect::MATERIALIZATION;
}
1
0