- developers - lists.mariadb.org

[Maria-developers] Updated (by Guest): Update packaging scripts for MariaDB 5.2 (88)
by worklog-noreply＠askmonty.org 13 Mar '10

13 Mar '10

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Update packaging scripts for MariaDB 5.2 CREATION DATE..: Sat, 27 Feb 2010, 16:39 SUPERVISOR.....: Knielsen IMPLEMENTOR....: Knielsen COPIES TO......: CATEGORY.......: Server-Sprint TASK ID........: 88 (http://askmonty.org/worklog/?tid=88) VERSION........: Server-5.2 STATUS.........: Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 30 (hours remain) ORIG. ESTIMATE.: 30 PROGRESS NOTES: -=-=(Guest - Sat, 13 Mar 2010, 08:12)=-=- Category updated. --- /tmp/wklog.88.old.22167 2010-03-13 08:12:01.000000000 +0000 +++ /tmp/wklog.88.new.22167 2010-03-13 08:12:01.000000000 +0000 @@ -1 +1 @@ -Server-RawIdeaBin +Server-Sprint DESCRIPTION: The packaging scripts need to be updated to work for MariaDB 5.2 Currently, 5.2 package builds fail in Buildbot. The .debs are missing a debian-5.2 subdirectory. The .rpm also need to be checked. Buildbot needs to be updated to do the new upgrade tests (mariadb-5.1 -> mariadb 5.2) ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2832: 1. don't crash on failing to load a plugin with newer MYSQL_PLUGIN_INTERFACE_VERSION
by noreply＠launchpad.net 12 Mar '10

12 Mar '10

------------------------------------------------------------ revno: 2832 committer: Sergei Golubchik <sergii(a)pisem.net> branch nick: maria-5.1 timestamp: Fri 2010-03-12 20:05:21 +0100 message: 1. don't crash on failing to load a plugin with newer MYSQL_PLUGIN_INTERFACE_VERSION 2. don't copy st_mysql_plugin structure unnecessary (sizeof hasn't changed) modified: sql/sql_plugin.cc sql/sql_plugin.h -- lp:maria https://code.launchpad.net/~maria-captains/maria/5.1 Your team Maria developers is subscribed to branch lp:maria. To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.

1 0

[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2831: Fix myisam checksum patch to check for HA_OPTION_CHECKSUM after it was set, not before
by noreply＠launchpad.net 12 Mar '10

12 Mar '10

------------------------------------------------------------ revno: 2831 committer: Sergei Golubchik <sergii(a)pisem.net> branch nick: maria-5.1 timestamp: Fri 2010-03-12 20:03:37 +0100 message: Fix myisam checksum patch to check for HA_OPTION_CHECKSUM after it was set, not before modified: storage/myisam/mi_create.c -- lp:maria https://code.launchpad.net/~maria-captains/maria/5.1 Your team Maria developers is subscribed to branch lp:maria. To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.

1 0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply＠askmonty.org 12 Mar '10

12 Mar '10

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS transformation CREATION DATE..: Sun, 28 Feb 2010, 13:39 SUPERVISOR.....: Monty IMPLEMENTOR....: Timour COPIES TO......: Igor, Psergey, Timour CATEGORY.......: Server-Sprint TASK ID........: 89 (http://askmonty.org/worklog/?tid=89) VERSION........: Server-5.3 STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=- Status updated. --- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000 +++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000 @@ -1 +1 @@ -Assigned +In-Progress -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Category updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Server-RawIdeaBin +Server-Sprint -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Status updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Un-Assigned +Assigned -=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000 +++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000 @@ -36,8 +36,8 @@ So, we'll need to compute both exists_select_cost and materialization_cost. -Difficulty with computing the two costs ---------------------------------------- +Difficulty with the need to run select optimization two times +------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. @@ -46,4 +46,10 @@ 3. Then we find that cost #1 is less and want to execute the materialization strategy. +The problem is that once one injects "oe=ie", it can trigger some optimization +steps that are not possible to undo. +- Example1: outer->inner join conversion +- non-Example: according to Igor, "oe=ie" won't participate in equality propagation. +- ... what else ? + -=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000 +++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000 @@ -36,3 +36,14 @@ So, we'll need to compute both exists_select_cost and materialization_cost. +Difficulty with computing the two costs +--------------------------------------- +The problem is in this scenario: +1. We compute materialization_cost by running optimization for the original + subquery select. +2. We compute exists_select_cost by running optimization for the subquery's + select with "oe=ie" injected into WHERE +3. Then we find that cost #1 is less and want to execute the materialization + strategy. + + -=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000 +++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000 @@ -1 +1,38 @@ +Why need two optimizations +-------------------------- +Consider a query with subquery: + + SELECT + oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) + FROM outer_tbl + WHERE outer_cond + +If we use Materialization strategy, the costs will be + + cost of accessing outer_tbl + + materialization_cost + + #records(outer_tbl w/o outer_cond) * lookup_cost + +where + + materialization_cost= + cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) + +On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into + + SELECT + EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + FROM outer_tbl + WHERE outer_cond + +and the costs will be + + cost of accessing outer_tbl + + #records(outer_tbl w/o outer_cond) * exists_select_cost + +where + exists_select_cost= + cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + +So, we'll need to compute both exists_select_cost and materialization_cost. -=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=- Dependency created: 91 now depends on 89 DESCRIPTION: For uncorrelated IN subqueries that can't be converted to semi-joins it is necessary to make a cost-based choice between IN->EXISTS and Materialization strategies. Both strategies handle two cases: 1. A simple case w/o NULLs handling 2. Handling NULLs. This WL is about making cost-based decision for #1. HIGH-LEVEL SPECIFICATION: Why need two optimizations -------------------------- Consider a query with subquery: SELECT oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) FROM outer_tbl WHERE outer_cond If we use Materialization strategy, the costs will be cost of accessing outer_tbl + materialization_cost + #records(outer_tbl w/o outer_cond) * lookup_cost where materialization_cost= cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into SELECT EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) FROM outer_tbl WHERE outer_cond and the costs will be cost of accessing outer_tbl + #records(outer_tbl w/o outer_cond) * exists_select_cost where exists_select_cost= cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) So, we'll need to compute both exists_select_cost and materialization_cost. Difficulty with the need to run select optimization two times ------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. 2. We compute exists_select_cost by running optimization for the subquery's select with "oe=ie" injected into WHERE 3. Then we find that cost #1 is less and want to execute the materialization strategy. The problem is that once one injects "oe=ie", it can trigger some optimization steps that are not possible to undo. - Example1: outer->inner join conversion - non-Example: according to Igor, "oe=ie" won't participate in equality propagation. - ... what else ? ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply＠askmonty.org 12 Mar '10

12 Mar '10

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS transformation CREATION DATE..: Sun, 28 Feb 2010, 13:39 SUPERVISOR.....: Monty IMPLEMENTOR....: Timour COPIES TO......: Igor, Psergey, Timour CATEGORY.......: Server-Sprint TASK ID........: 89 (http://askmonty.org/worklog/?tid=89) VERSION........: Server-5.3 STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=- Status updated. --- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000 +++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000 @@ -1 +1 @@ -Assigned +In-Progress -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Category updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Server-RawIdeaBin +Server-Sprint -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Status updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Un-Assigned +Assigned -=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000 +++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000 @@ -36,8 +36,8 @@ So, we'll need to compute both exists_select_cost and materialization_cost. -Difficulty with computing the two costs ---------------------------------------- +Difficulty with the need to run select optimization two times +------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. @@ -46,4 +46,10 @@ 3. Then we find that cost #1 is less and want to execute the materialization strategy. +The problem is that once one injects "oe=ie", it can trigger some optimization +steps that are not possible to undo. +- Example1: outer->inner join conversion +- non-Example: according to Igor, "oe=ie" won't participate in equality propagation. +- ... what else ? + -=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000 +++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000 @@ -36,3 +36,14 @@ So, we'll need to compute both exists_select_cost and materialization_cost. +Difficulty with computing the two costs +--------------------------------------- +The problem is in this scenario: +1. We compute materialization_cost by running optimization for the original + subquery select. +2. We compute exists_select_cost by running optimization for the subquery's + select with "oe=ie" injected into WHERE +3. Then we find that cost #1 is less and want to execute the materialization + strategy. + + -=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000 +++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000 @@ -1 +1,38 @@ +Why need two optimizations +-------------------------- +Consider a query with subquery: + + SELECT + oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) + FROM outer_tbl + WHERE outer_cond + +If we use Materialization strategy, the costs will be + + cost of accessing outer_tbl + + materialization_cost + + #records(outer_tbl w/o outer_cond) * lookup_cost + +where + + materialization_cost= + cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) + +On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into + + SELECT + EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + FROM outer_tbl + WHERE outer_cond + +and the costs will be + + cost of accessing outer_tbl + + #records(outer_tbl w/o outer_cond) * exists_select_cost + +where + exists_select_cost= + cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + +So, we'll need to compute both exists_select_cost and materialization_cost. -=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=- Dependency created: 91 now depends on 89 DESCRIPTION: For uncorrelated IN subqueries that can't be converted to semi-joins it is necessary to make a cost-based choice between IN->EXISTS and Materialization strategies. Both strategies handle two cases: 1. A simple case w/o NULLs handling 2. Handling NULLs. This WL is about making cost-based decision for #1. HIGH-LEVEL SPECIFICATION: Why need two optimizations -------------------------- Consider a query with subquery: SELECT oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) FROM outer_tbl WHERE outer_cond If we use Materialization strategy, the costs will be cost of accessing outer_tbl + materialization_cost + #records(outer_tbl w/o outer_cond) * lookup_cost where materialization_cost= cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into SELECT EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) FROM outer_tbl WHERE outer_cond and the costs will be cost of accessing outer_tbl + #records(outer_tbl w/o outer_cond) * exists_select_cost where exists_select_cost= cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) So, we'll need to compute both exists_select_cost and materialization_cost. Difficulty with the need to run select optimization two times ------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. 2. We compute exists_select_cost by running optimization for the subquery's select with "oe=ie" injected into WHERE 3. Then we find that cost #1 is less and want to execute the materialization strategy. The problem is that once one injects "oe=ie", it can trigger some optimization steps that are not possible to undo. - Example1: outer->inner join conversion - non-Example: according to Igor, "oe=ie" won't participate in equality propagation. - ... what else ? ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply＠askmonty.org 12 Mar '10

12 Mar '10

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS transformation CREATION DATE..: Sun, 28 Feb 2010, 13:39 SUPERVISOR.....: Monty IMPLEMENTOR....: Timour COPIES TO......: Igor, Psergey, Timour CATEGORY.......: Server-Sprint TASK ID........: 89 (http://askmonty.org/worklog/?tid=89) VERSION........: Server-5.3 STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=- Status updated. --- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000 +++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000 @@ -1 +1 @@ -Assigned +In-Progress -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Category updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Server-RawIdeaBin +Server-Sprint -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Status updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Un-Assigned +Assigned -=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000 +++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000 @@ -36,8 +36,8 @@ So, we'll need to compute both exists_select_cost and materialization_cost. -Difficulty with computing the two costs ---------------------------------------- +Difficulty with the need to run select optimization two times +------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. @@ -46,4 +46,10 @@ 3. Then we find that cost #1 is less and want to execute the materialization strategy. +The problem is that once one injects "oe=ie", it can trigger some optimization +steps that are not possible to undo. +- Example1: outer->inner join conversion +- non-Example: according to Igor, "oe=ie" won't participate in equality propagation. +- ... what else ? + -=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000 +++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000 @@ -36,3 +36,14 @@ So, we'll need to compute both exists_select_cost and materialization_cost. +Difficulty with computing the two costs +--------------------------------------- +The problem is in this scenario: +1. We compute materialization_cost by running optimization for the original + subquery select. +2. We compute exists_select_cost by running optimization for the subquery's + select with "oe=ie" injected into WHERE +3. Then we find that cost #1 is less and want to execute the materialization + strategy. + + -=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000 +++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000 @@ -1 +1,38 @@ +Why need two optimizations +-------------------------- +Consider a query with subquery: + + SELECT + oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) + FROM outer_tbl + WHERE outer_cond + +If we use Materialization strategy, the costs will be + + cost of accessing outer_tbl + + materialization_cost + + #records(outer_tbl w/o outer_cond) * lookup_cost + +where + + materialization_cost= + cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) + +On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into + + SELECT + EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + FROM outer_tbl + WHERE outer_cond + +and the costs will be + + cost of accessing outer_tbl + + #records(outer_tbl w/o outer_cond) * exists_select_cost + +where + exists_select_cost= + cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + +So, we'll need to compute both exists_select_cost and materialization_cost. -=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=- Dependency created: 91 now depends on 89 DESCRIPTION: For uncorrelated IN subqueries that can't be converted to semi-joins it is necessary to make a cost-based choice between IN->EXISTS and Materialization strategies. Both strategies handle two cases: 1. A simple case w/o NULLs handling 2. Handling NULLs. This WL is about making cost-based decision for #1. HIGH-LEVEL SPECIFICATION: Why need two optimizations -------------------------- Consider a query with subquery: SELECT oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) FROM outer_tbl WHERE outer_cond If we use Materialization strategy, the costs will be cost of accessing outer_tbl + materialization_cost + #records(outer_tbl w/o outer_cond) * lookup_cost where materialization_cost= cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into SELECT EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) FROM outer_tbl WHERE outer_cond and the costs will be cost of accessing outer_tbl + #records(outer_tbl w/o outer_cond) * exists_select_cost where exists_select_cost= cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) So, we'll need to compute both exists_select_cost and materialization_cost. Difficulty with the need to run select optimization two times ------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. 2. We compute exists_select_cost by running optimization for the subquery's select with "oe=ie" injected into WHERE 3. Then we find that cost #1 is less and want to execute the materialization strategy. The problem is that once one injects "oe=ie", it can trigger some optimization steps that are not possible to undo. - Example1: outer->inner join conversion - non-Example: according to Igor, "oe=ie" won't participate in equality propagation. - ... what else ? ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
by worklog-noreply＠askmonty.org 12 Mar '10

12 Mar '10

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS transformation CREATION DATE..: Sun, 28 Feb 2010, 13:39 SUPERVISOR.....: Monty IMPLEMENTOR....: Timour COPIES TO......: Igor, Psergey, Timour CATEGORY.......: Server-Sprint TASK ID........: 89 (http://askmonty.org/worklog/?tid=89) VERSION........: Server-5.3 STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=- Status updated. --- /tmp/wklog.89.old.13018 2010-03-12 09:17:25.000000000 +0000 +++ /tmp/wklog.89.new.13018 2010-03-12 09:17:25.000000000 +0000 @@ -1 +1 @@ -Assigned +In-Progress -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Category updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Server-RawIdeaBin +Server-Sprint -=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=- Status updated. --- /tmp/wklog.89.old.778 2010-03-10 21:48:08.000000000 +0000 +++ /tmp/wklog.89.new.778 2010-03-10 21:48:08.000000000 +0000 @@ -1 +1 @@ -Un-Assigned +Assigned -=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000 +++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000 @@ -36,8 +36,8 @@ So, we'll need to compute both exists_select_cost and materialization_cost. -Difficulty with computing the two costs ---------------------------------------- +Difficulty with the need to run select optimization two times +------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. @@ -46,4 +46,10 @@ 3. Then we find that cost #1 is less and want to execute the materialization strategy. +The problem is that once one injects "oe=ie", it can trigger some optimization +steps that are not possible to undo. +- Example1: outer->inner join conversion +- non-Example: according to Igor, "oe=ie" won't participate in equality propagation. +- ... what else ? + -=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000 +++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000 @@ -36,3 +36,14 @@ So, we'll need to compute both exists_select_cost and materialization_cost. +Difficulty with computing the two costs +--------------------------------------- +The problem is in this scenario: +1. We compute materialization_cost by running optimization for the original + subquery select. +2. We compute exists_select_cost by running optimization for the subquery's + select with "oe=ie" injected into WHERE +3. Then we find that cost #1 is less and want to execute the materialization + strategy. + + -=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000 +++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000 @@ -1 +1,38 @@ +Why need two optimizations +-------------------------- +Consider a query with subquery: + + SELECT + oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) + FROM outer_tbl + WHERE outer_cond + +If we use Materialization strategy, the costs will be + + cost of accessing outer_tbl + + materialization_cost + + #records(outer_tbl w/o outer_cond) * lookup_cost + +where + + materialization_cost= + cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) + +On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into + + SELECT + EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + FROM outer_tbl + WHERE outer_cond + +and the costs will be + + cost of accessing outer_tbl + + #records(outer_tbl w/o outer_cond) * exists_select_cost + +where + exists_select_cost= + cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + +So, we'll need to compute both exists_select_cost and materialization_cost. -=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=- Dependency created: 91 now depends on 89 DESCRIPTION: For uncorrelated IN subqueries that can't be converted to semi-joins it is necessary to make a cost-based choice between IN->EXISTS and Materialization strategies. Both strategies handle two cases: 1. A simple case w/o NULLs handling 2. Handling NULLs. This WL is about making cost-based decision for #1. HIGH-LEVEL SPECIFICATION: Why need two optimizations -------------------------- Consider a query with subquery: SELECT oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) FROM outer_tbl WHERE outer_cond If we use Materialization strategy, the costs will be cost of accessing outer_tbl + materialization_cost + #records(outer_tbl w/o outer_cond) * lookup_cost where materialization_cost= cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into SELECT EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) FROM outer_tbl WHERE outer_cond and the costs will be cost of accessing outer_tbl + #records(outer_tbl w/o outer_cond) * exists_select_cost where exists_select_cost= cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) So, we'll need to compute both exists_select_cost and materialization_cost. Difficulty with the need to run select optimization two times ------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. 2. We compute exists_select_cost by running optimization for the subquery's select with "oe=ie" injected into WHERE 3. Then we find that cost #1 is less and want to execute the materialization strategy. The problem is that once one injects "oe=ie", it can trigger some optimization steps that are not possible to undo. - Example1: outer->inner join conversion - non-Example: according to Igor, "oe=ie" won't participate in equality propagation. - ... what else ? ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Timour): Subquery optimization: Efficient NOT IN execution with NULLs (68)
by worklog-noreply＠askmonty.org 12 Mar '10

12 Mar '10

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subquery optimization: Efficient NOT IN execution with NULLs CREATION DATE..: Fri, 27 Nov 2009, 13:22 SUPERVISOR.....: Monty IMPLEMENTOR....: Timour COPIES TO......: CATEGORY.......: Server-Sprint TASK ID........: 68 (http://askmonty.org/worklog/?tid=68) VERSION........: Server-9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=- Dependency created: 91 now depends on 68 -=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=- Dependency deleted: 94 no longer depends on 68 -=-=(Psergey - Sun, 28 Feb 2010, 14:08)=-=- Dependency created: 94 now depends on 68 -=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=- Status updated. No change. -=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=- Status updated. --- /tmp/wklog.68.old.24229 2010-02-27 10:11:57.000000000 +0000 +++ /tmp/wklog.68.new.24229 2010-02-27 10:11:57.000000000 +0000 @@ -1 +1 @@ -Assigned +In-Progress -=-=(Timour - Mon, 22 Feb 2010, 17:39)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.17116 2010-02-22 17:39:48.000000000 +0200 +++ /tmp/wklog.68.new.17116 2010-02-22 17:39:48.000000000 +0200 @@ -233,6 +233,7 @@ 1. If columns a_j1,...,a_jm do not contain null values in the temporary table at all and v_j1,...,v_jm cannot be null, create for these columns only one index array (and of course do not create any bitmaps for them). +[done] 2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number of rows, where a_i is not null and V(a_i) is the number of distinct @@ -264,6 +265,10 @@ 7. If you get a row with nulls in all columns stop filling the temporary table and return UNKNOWN for any tuple <v1,...,vn>. +[This is wrong, because if we don't fill the whole temp table, there may + be some tuple(s) that would match some outer tuple. In such cases, if we + stop filling the temp table, we would miss a TRUE result. Having a partial + match doesn't preclude us from having a complete match]. 8. [timour] Consider that due to materialization, we already have a unique index -=-=(Timour - Tue, 19 Jan 2010, 18:44)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.22569 2010-01-19 18:44:01.000000000 +0200 +++ /tmp/wklog.68.new.22569 2010-01-19 18:44:01.000000000 +0200 @@ -132,11 +132,10 @@ if (nonull_key && ! nonull_key->lookup(outer_ref)) return FALSE - if (nonull_key) - pq.insert(nonull_key) for (i = 1; i <= n; i++) { + if (vkey[i] != nonull_key) vkey[i].lookup(outer_ref) if (! vkey[i].is_eof()) pq.insert(i) @@ -167,7 +166,7 @@ /* There cannot be a complete match, as we already checked for one. */ assert(matching_keys.elements < n) } - else if (cur_min_key == nonull_key) + else if (vkey[cur_min_key] == nonull_key) { /* The non-NULL key has no corresponding NULL index, so we know for @@ -183,8 +182,10 @@ /* Check if all null_keys contain a NULL at row 'min_row'. The procedure internally checks all keys in a special precomputed order. A prior - procedure determines an optimal order and a mapping - idx_no -> idx_order (encoded as an array). + procedure determines an optimal order and a mapping idx_no -> idx_order + (encoded as an array). + + This procedure makes sure not to match the non-NULL column. */ if (test_null_row(null_keys, min_row)) return TRUE @@ -198,6 +199,14 @@ vkey[cur_min_key].next() if (! vkey[cur_min_key].is_eof()) pq.insert(cur_min_key) + else if (vkey[cur_min_key] == nonull_key) + { + /* + If there can't be more matches for the nonull_key, we know for sure + there is no match, since there is no possible NULL match. + */ + return FALSE + } if (pq.is_empty()) { @@ -216,7 +225,6 @@ } - 3. Directions for improvement ======================================================================== -=-=(Timour - Tue, 19 Jan 2010, 18:29)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.21045 2010-01-19 18:29:12.000000000 +0200 +++ /tmp/wklog.68.new.21045 2010-01-19 18:29:12.000000000 +0200 @@ -132,6 +132,8 @@ if (nonull_key && ! nonull_key->lookup(outer_ref)) return FALSE + if (nonull_key) + pq.insert(nonull_key) for (i = 1; i <= n; i++) { -=-=(Guest - Tue, 19 Jan 2010, 18:15)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.19825 2010-01-19 18:15:30.000000000 +0200 +++ /tmp/wklog.68.new.19825 2010-01-19 18:15:30.000000000 +0200 @@ -1,8 +1,16 @@ -This a copy of the initial algorithm proposed by Igor: -====================================================== +Contents +======================================================================== -For each left side tuple (v_1,...,v_n) we have to find the following set -of rowids for the temp table containing N rows as the result of +1. Initial idea as proposed by Igor +2. Algorithm for IN execution with partial matching +3. Directions for improvement + + +1. Initial idea as proposed by Igor +======================================================================== + +For each left side tuple (v_1,...,v_n) we have to find the following +set of rowids for the temp table containing N rows as the result of materialization of the subquery: R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs @@ -18,38 +26,198 @@ - it requires minimum memory: not more than N*n bits in total - search of an element in a set is extremely cheap -Taken all above into account I could suggest the following algorithm to -build R: +Taken all above into account I could suggest the following algorithm +to build R: - Using indexes (read about them below) for each column participating in the - intersection, - merge ordered sets rowid{a_i=v_i} in the following manner. + Using indexes (read about them below) for each column participating + in the intersection, merge ordered sets rowid{a_i=v_i} in the + following manner. If a rowid r has been encountered maximum in k sets -rowid{a_i1=v_i1},...,rowid(a_ik=v_ik), + rowid{a_i1=v_i1},...,rowid(a_ik=v_ik), then it has to be checked against all rowid{a_i=v_i} such that i is -not in {i1,...,ik}. + not in {i1,...,ik}. As soon as we fail to find r in one of these sets we discard it. If r has been found in all of them then r belongs to the set R. -Here we use the property (1): any r from rowid{a_i=v_i} UNION rowid{a_i -is null} is either +Here we use the property (1): +any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can -infer that for any r from R -indexes a_i can be uniquely divided into two groups: one contains -indexes a_i where r belongs to -the sets rowid{a_i=v_i}, the other contains indexes a_j such that r -belongs to rowid{a_j is null}. - -Now let's talk how to get elements from rowid{a_i=v_i} in a sorted order -needed for the merge procedure. We could use BTREE indexes for temp -table. But they are rather expensive and -take a lot of memory as the are implemented with RB trees. +infer that for any r from R indexes a_i can be uniquely divided into +two groups: +- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i}, +- the other contains indexes a_j such that r belongs to + rowid{a_j is null}. + +Now let's talk how to get elements from rowid{a_i=v_i} in a sorted +order needed for the merge procedure. We could use BTREE indexes for +temp table. But they are rather expensive and take a lot of memory as +the are implemented with RB trees. I would suggest creating for each column from the temporary table just an array of rowids sorted by the value from column a. Index lookup in such an array is cheap. It's also rather cheap to check that the next rowid refers to a row with a different value in column a. The array can be created on demand. +2. Algorithm for IN execution with partial matching +======================================================================== + +2.1 Below is shown the top-level algorithm to execute an IN predicate +with partial matching. This algorithm is essentially the implementation +of Item_subselect:exec(). + +int lookup_with_null_semantics(outer_ref[], mat_subquery) +{ + if (index_lookup(outer_ref, mat_subquery) + return TRUE + else + { + /* + Check if there is a partial match (UNKNOWN) or no match (NULL). + */ + if (this is the first partial match) + { + vkey[] = build array of value keys for each NULL-able column + of mat_subquery. + nkey[] = build a bitmap NULL index for each column of mat_subquery + that contains NULLs + nonull_key = build a key over all non-NULL columns of mat_subquery + } + if (partial_match(outer_ref, vkey[], nkey[], nonull_key) + return UNKNOWN + else + return FALSE + } +} + +2.2 The implementation of partial matching is as follows + +/* + Assumptions: + - It has already been checked if there is a complete match by a + regular index lookup, and the test failed. + - It has already been checked if there is a complete NULL row, + and if there was we wouldn't call this function. Thus we assume + that there is no complete NULL row. + - Not all vidx_i are empty, but some can be empty. If all were empty, + then the only possibility for a match is a complete NULL row, which + we already checked. + + @param outer_ref - the uter (left) IN argument. + @param vidx[] - array of value keys + Ordered sequences of rowids of the corresponding columns a_i, such + that all rowids in idx_i are the ones where column a_i contains some + value or NULL. Each idx_i is derived dynamically, for each different + left argument of an IN predicate. + @param nidx[] - array of NULL keys + Bitmpas, one per each column, where a bit is set if the corresponding + row has a NULL value for the corresponding column. + @nonull_key - the only key over all columns of the materialized subquery + that do not contain NULLs + + @returns + @retval FALSE if there is no match + @retval TRUE if there is a partial match +*/ + +Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key) +{ + /* Set of the keys (columns) that form a partial match. */ + Set matching_keys = {} + /* A subset of all keys that need to be checked for NULL matches. */ + Set null_keys = {} + Int min_key /* Key that contains the current minimum position. */ + Int min_row /* Current row number of min_key. */ + Int cur_min_key, cur_min_row + PriorityQueue pq + + if (nonull_key && ! nonull_key->lookup(outer_ref)) + return FALSE + + for (i = 1; i <= n; i++) + { + vkey[i].lookup(outer_ref) + if (! vkey[i].is_eof()) + pq.insert(i) + } + /* + Not all value keys are empty, thus we don't have only NULL + keys. If we had, the only possible match is a NULL row, and + we cheked there is no such row, therefore the result is known + to be FALSE. + In fact this algorithm makes sense for at least two non-NULL + columns. + */ + assert(pq.elements > 1) + + (min_key, min_row) = pq.pop() + matching_keys.add(min_key) + vkey[min_key].next() + if (! vkey[min_key].is_eof()) + pq.insert(min_key) + + while (TRUE) + { + (cur_min_key, cur_min_row) = pq.pop() + + if (cur_min_row == min_row) + { + matching_keys.add(cur_min_key) + /* There cannot be a complete match, as we already checked for one. */ + assert(matching_keys.elements < n) + } + else if (cur_min_key == nonull_key) + { + /* + The non-NULL key has no corresponding NULL index, so we know for + sure that the row 'min_row' is not a match. + */ + (min_key, min_row) = (cur_min_key, cur_min_row) + matching_keys = {min_key} + } + else + { + assert(cur_min_row > min_row) /* Follows from the use of PQ. */ + null_keys = set_difference(all keys vkey[], matching_keys) + /* + Check if all null_keys contain a NULL at row 'min_row'. The procedure + internally checks all keys in a special precomputed order. A prior + procedure determines an optimal order and a mapping + idx_no -> idx_order (encoded as an array). + */ + if (test_null_row(null_keys, min_row)) + return TRUE + else + { + (min_key, min_row) = (cur_min_key, cur_min_row) + matching_keys = {min_key} + } + } + + vkey[cur_min_key].next() + if (! vkey[cur_min_key].is_eof()) + pq.insert(cur_min_key) + + if (pq.is_empty()) + { + /* Check the last row of the last column in PQ for NULL matches. */ + null_keys = set_difference(all keys vkey[], matching_keys) + if (test_null_row(null_keys, min_row)) + return TRUE + else + return FALSE + } + } + + /* We should never get here. */ + assert(FALSE) + return FALSE +} + + + +3. Directions for improvement +======================================================================== + Other consideration that may be taken into account: 1. If columns a_j1,...,a_jm do not contain null values in the temporary -=-=(Timour - Sun, 06 Dec 2009, 14:36)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.12919 2009-12-06 14:36:18.000000000 +0200 +++ /tmp/wklog.68.new.12919 2009-12-06 14:36:18.000000000 +0200 @@ -87,3 +87,8 @@ 7. If you get a row with nulls in all columns stop filling the temporary table and return UNKNOWN for any tuple <v1,...,vn>. +8. [timour] + Consider that due to materialization, we already have a unique index +on all columns <a_1,..., a_n>. We can use the first key part of this index +over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid +creating the index rowid{a_i=v_i}. ------------------------------------------------------------ -=-=(View All Progress Notes, 16 total)=-=- http://askmonty.org/worklog/index.pl?tid=68&nolimit=1 DESCRIPTION: The goal of this task is to implement efficient execution of NOT IN subquery predicates of the form: <oe_1,...,oe_n> NOT IN <non_correlated subquery> when either some oe_i, or some subqury result column contains NULLs. The problem with such predicates is that it is possible to use index lookups only when neither argument of the predicate contains NULLs. If some argument contains a NULL, then due to NULL semantics, it plays the role of a wildcard. If we were to use regular index lookups, then we would get 'no match' for some outer tuple (thus the predicate evaluates to FALSE), while the SQL semantics means 'partial match', and the predicate should evaluate to NULL. This task implements an efficient algorithm to compute such 'parial matches', where a NULL matches any value. HIGH-LEVEL SPECIFICATION: Contents ======================================================================== 1. Initial idea as proposed by Igor 2. Algorithm for IN execution with partial matching 3. Directions for improvement 1. Initial idea as proposed by Igor ======================================================================== For each left side tuple (v_1,...,v_n) we have to find the following set of rowids for the temp table containing N rows as the result of materialization of the subquery: R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs trough all indexes from [1..n] such that v_i is not null. Bear in mind the following specifics of this intersection: (1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint (2) For each i: rowid{a_i is null} is the same for each tuple, that is, this set is independent of the left-side tuples. Due to (2) it makes sense to build rowid{a_i is null} only once. A good representation for such sets would be bitmaps: - it requires minimum memory: not more than N*n bits in total - search of an element in a set is extremely cheap Taken all above into account I could suggest the following algorithm to build R: Using indexes (read about them below) for each column participating in the intersection, merge ordered sets rowid{a_i=v_i} in the following manner. If a rowid r has been encountered maximum in k sets rowid{a_i1=v_i1},...,rowid(a_ik=v_ik), then it has to be checked against all rowid{a_i=v_i} such that i is not in {i1,...,ik}. As soon as we fail to find r in one of these sets we discard it. If r has been found in all of them then r belongs to the set R. Here we use the property (1): any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can infer that for any r from R indexes a_i can be uniquely divided into two groups: - one contains indexes a_i where r belongs to the sets rowid{a_i=v_i}, - the other contains indexes a_j such that r belongs to rowid{a_j is null}. Now let's talk how to get elements from rowid{a_i=v_i} in a sorted order needed for the merge procedure. We could use BTREE indexes for temp table. But they are rather expensive and take a lot of memory as the are implemented with RB trees. I would suggest creating for each column from the temporary table just an array of rowids sorted by the value from column a. Index lookup in such an array is cheap. It's also rather cheap to check that the next rowid refers to a row with a different value in column a. The array can be created on demand. 2. Algorithm for IN execution with partial matching ======================================================================== 2.1 Below is shown the top-level algorithm to execute an IN predicate with partial matching. This algorithm is essentially the implementation of Item_subselect:exec(). int lookup_with_null_semantics(outer_ref[], mat_subquery) { if (index_lookup(outer_ref, mat_subquery) return TRUE else { /* Check if there is a partial match (UNKNOWN) or no match (NULL). */ if (this is the first partial match) { vkey[] = build array of value keys for each NULL-able column of mat_subquery. nkey[] = build a bitmap NULL index for each column of mat_subquery that contains NULLs nonull_key = build a key over all non-NULL columns of mat_subquery } if (partial_match(outer_ref, vkey[], nkey[], nonull_key) return UNKNOWN else return FALSE } } 2.2 The implementation of partial matching is as follows /* Assumptions: - It has already been checked if there is a complete match by a regular index lookup, and the test failed. - It has already been checked if there is a complete NULL row, and if there was we wouldn't call this function. Thus we assume that there is no complete NULL row. - Not all vidx_i are empty, but some can be empty. If all were empty, then the only possibility for a match is a complete NULL row, which we already checked. @param outer_ref - the uter (left) IN argument. @param vidx[] - array of value keys Ordered sequences of rowids of the corresponding columns a_i, such that all rowids in idx_i are the ones where column a_i contains some value or NULL. Each idx_i is derived dynamically, for each different left argument of an IN predicate. @param nidx[] - array of NULL keys Bitmpas, one per each column, where a bit is set if the corresponding row has a NULL value for the corresponding column. @nonull_key - the only key over all columns of the materialized subquery that do not contain NULLs @returns @retval FALSE if there is no match @retval TRUE if there is a partial match */ Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key) { /* Set of the keys (columns) that form a partial match. */ Set matching_keys = {} /* A subset of all keys that need to be checked for NULL matches. */ Set null_keys = {} Int min_key /* Key that contains the current minimum position. */ Int min_row /* Current row number of min_key. */ Int cur_min_key, cur_min_row PriorityQueue pq if (nonull_key && ! nonull_key->lookup(outer_ref)) return FALSE for (i = 1; i <= n; i++) { if (vkey[i] != nonull_key) vkey[i].lookup(outer_ref) if (! vkey[i].is_eof()) pq.insert(i) } /* Not all value keys are empty, thus we don't have only NULL keys. If we had, the only possible match is a NULL row, and we cheked there is no such row, therefore the result is known to be FALSE. In fact this algorithm makes sense for at least two non-NULL columns. */ assert(pq.elements > 1) (min_key, min_row) = pq.pop() matching_keys.add(min_key) vkey[min_key].next() if (! vkey[min_key].is_eof()) pq.insert(min_key) while (TRUE) { (cur_min_key, cur_min_row) = pq.pop() if (cur_min_row == min_row) { matching_keys.add(cur_min_key) /* There cannot be a complete match, as we already checked for one. */ assert(matching_keys.elements < n) } else if (vkey[cur_min_key] == nonull_key) { /* The non-NULL key has no corresponding NULL index, so we know for sure that the row 'min_row' is not a match. */ (min_key, min_row) = (cur_min_key, cur_min_row) matching_keys = {min_key} } else { assert(cur_min_row > min_row) /* Follows from the use of PQ. */ null_keys = set_difference(all keys vkey[], matching_keys) /* Check if all null_keys contain a NULL at row 'min_row'. The procedure internally checks all keys in a special precomputed order. A prior procedure determines an optimal order and a mapping idx_no -> idx_order (encoded as an array). This procedure makes sure not to match the non-NULL column. */ if (test_null_row(null_keys, min_row)) return TRUE else { (min_key, min_row) = (cur_min_key, cur_min_row) matching_keys = {min_key} } } vkey[cur_min_key].next() if (! vkey[cur_min_key].is_eof()) pq.insert(cur_min_key) else if (vkey[cur_min_key] == nonull_key) { /* If there can't be more matches for the nonull_key, we know for sure there is no match, since there is no possible NULL match. */ return FALSE } if (pq.is_empty()) { /* Check the last row of the last column in PQ for NULL matches. */ null_keys = set_difference(all keys vkey[], matching_keys) if (test_null_row(null_keys, min_row)) return TRUE else return FALSE } } /* We should never get here. */ assert(FALSE) return FALSE } 3. Directions for improvement ======================================================================== Other consideration that may be taken into account: 1. If columns a_j1,...,a_jm do not contain null values in the temporary table at all and v_j1,...,v_jm cannot be null, create for these columns only one index array (and of course do not create any bitmaps for them). [done] 2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number of rows, where a_i is not null and V(a_i) is the number of distinct values for a_i excluding nulls. If d(a_i) is close to N'(a_i) then do not create any index array: check whether there is a match running through the records that have been filtered in. Anyway if d(a_i) is close to N'(a_i) then the intersection with rowid{a_i=v_i} will not reduce the number of remaining rowids significantly. In other words is V(a_i) exceeds some threshold there is no sense to create an index for a_i. If additionally N-N'(a_i) is small do not create a bitmap for this column either. 3. If for a column a_i d(a_i) is not close to N'(a_i), but N-N'(a_i) is small a sorted array of rowids from the set rowid{a_i is null} can be used instead of a bitmap. 4. We always have a match if R0= INTERSECT rowid{a_i is null} is not empty. Here i runs through all indexes from [1..n] such that v_i is not null. For a given subset of columns this fact has to be checked only once. It can be easily done with bitmap intersection. 5. If v1,...,vn never can be a null, then indexes (sorted arrays) can be created only for rows with nulls. 6. If v1,...,vn never can be a null and number of rows with nulls is small do not create indexes and do not create bitmaps. 7. If you get a row with nulls in all columns stop filling the temporary table and return UNKNOWN for any tuple <v1,...,vn>. [This is wrong, because if we don't fill the whole temp table, there may be some tuple(s) that would match some outer tuple. In such cases, if we stop filling the temp table, we would miss a TRUE result. Having a partial match doesn't preclude us from having a complete match]. 8. [timour] Consider that due to materialization, we already have a unique index on all columns <a_1,..., a_n>. We can use the first key part of this index over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid creating the index rowid{a_i=v_i}. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Timour): Subquery optimization: Efficient NOT IN execution with NULLs (68)
by worklog-noreply＠askmonty.org 12 Mar '10

12 Mar '10

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subquery optimization: Efficient NOT IN execution with NULLs CREATION DATE..: Fri, 27 Nov 2009, 13:22 SUPERVISOR.....: Monty IMPLEMENTOR....: Timour COPIES TO......: CATEGORY.......: Server-Sprint TASK ID........: 68 (http://askmonty.org/worklog/?tid=68) VERSION........: Server-9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=- Dependency created: 91 now depends on 68 -=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=- Dependency deleted: 94 no longer depends on 68 -=-=(Psergey - Sun, 28 Feb 2010, 14:08)=-=- Dependency created: 94 now depends on 68 -=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=- Status updated. No change. -=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=- Status updated. --- /tmp/wklog.68.old.24229 2010-02-27 10:11:57.000000000 +0000 +++ /tmp/wklog.68.new.24229 2010-02-27 10:11:57.000000000 +0000 @@ -1 +1 @@ -Assigned +In-Progress -=-=(Timour - Mon, 22 Feb 2010, 17:39)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.17116 2010-02-22 17:39:48.000000000 +0200 +++ /tmp/wklog.68.new.17116 2010-02-22 17:39:48.000000000 +0200 @@ -233,6 +233,7 @@ 1. If columns a_j1,...,a_jm do not contain null values in the temporary table at all and v_j1,...,v_jm cannot be null, create for these columns only one index array (and of course do not create any bitmaps for them). +[done] 2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number of rows, where a_i is not null and V(a_i) is the number of distinct @@ -264,6 +265,10 @@ 7. If you get a row with nulls in all columns stop filling the temporary table and return UNKNOWN for any tuple <v1,...,vn>. +[This is wrong, because if we don't fill the whole temp table, there may + be some tuple(s) that would match some outer tuple. In such cases, if we + stop filling the temp table, we would miss a TRUE result. Having a partial + match doesn't preclude us from having a complete match]. 8. [timour] Consider that due to materialization, we already have a unique index -=-=(Timour - Tue, 19 Jan 2010, 18:44)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.22569 2010-01-19 18:44:01.000000000 +0200 +++ /tmp/wklog.68.new.22569 2010-01-19 18:44:01.000000000 +0200 @@ -132,11 +132,10 @@ if (nonull_key && ! nonull_key->lookup(outer_ref)) return FALSE - if (nonull_key) - pq.insert(nonull_key) for (i = 1; i <= n; i++) { + if (vkey[i] != nonull_key) vkey[i].lookup(outer_ref) if (! vkey[i].is_eof()) pq.insert(i) @@ -167,7 +166,7 @@ /* There cannot be a complete match, as we already checked for one. */ assert(matching_keys.elements < n) } - else if (cur_min_key == nonull_key) + else if (vkey[cur_min_key] == nonull_key) { /* The non-NULL key has no corresponding NULL index, so we know for @@ -183,8 +182,10 @@ /* Check if all null_keys contain a NULL at row 'min_row'. The procedure internally checks all keys in a special precomputed order. A prior - procedure determines an optimal order and a mapping - idx_no -> idx_order (encoded as an array). + procedure determines an optimal order and a mapping idx_no -> idx_order + (encoded as an array). + + This procedure makes sure not to match the non-NULL column. */ if (test_null_row(null_keys, min_row)) return TRUE @@ -198,6 +199,14 @@ vkey[cur_min_key].next() if (! vkey[cur_min_key].is_eof()) pq.insert(cur_min_key) + else if (vkey[cur_min_key] == nonull_key) + { + /* + If there can't be more matches for the nonull_key, we know for sure + there is no match, since there is no possible NULL match. + */ + return FALSE + } if (pq.is_empty()) { @@ -216,7 +225,6 @@ } - 3. Directions for improvement ======================================================================== -=-=(Timour - Tue, 19 Jan 2010, 18:29)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.21045 2010-01-19 18:29:12.000000000 +0200 +++ /tmp/wklog.68.new.21045 2010-01-19 18:29:12.000000000 +0200 @@ -132,6 +132,8 @@ if (nonull_key && ! nonull_key->lookup(outer_ref)) return FALSE + if (nonull_key) + pq.insert(nonull_key) for (i = 1; i <= n; i++) { -=-=(Guest - Tue, 19 Jan 2010, 18:15)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.19825 2010-01-19 18:15:30.000000000 +0200 +++ /tmp/wklog.68.new.19825 2010-01-19 18:15:30.000000000 +0200 @@ -1,8 +1,16 @@ -This a copy of the initial algorithm proposed by Igor: -====================================================== +Contents +======================================================================== -For each left side tuple (v_1,...,v_n) we have to find the following set -of rowids for the temp table containing N rows as the result of +1. Initial idea as proposed by Igor +2. Algorithm for IN execution with partial matching +3. Directions for improvement + + +1. Initial idea as proposed by Igor +======================================================================== + +For each left side tuple (v_1,...,v_n) we have to find the following +set of rowids for the temp table containing N rows as the result of materialization of the subquery: R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs @@ -18,38 +26,198 @@ - it requires minimum memory: not more than N*n bits in total - search of an element in a set is extremely cheap -Taken all above into account I could suggest the following algorithm to -build R: +Taken all above into account I could suggest the following algorithm +to build R: - Using indexes (read about them below) for each column participating in the - intersection, - merge ordered sets rowid{a_i=v_i} in the following manner. + Using indexes (read about them below) for each column participating + in the intersection, merge ordered sets rowid{a_i=v_i} in the + following manner. If a rowid r has been encountered maximum in k sets -rowid{a_i1=v_i1},...,rowid(a_ik=v_ik), + rowid{a_i1=v_i1},...,rowid(a_ik=v_ik), then it has to be checked against all rowid{a_i=v_i} such that i is -not in {i1,...,ik}. + not in {i1,...,ik}. As soon as we fail to find r in one of these sets we discard it. If r has been found in all of them then r belongs to the set R. -Here we use the property (1): any r from rowid{a_i=v_i} UNION rowid{a_i -is null} is either +Here we use the property (1): +any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can -infer that for any r from R -indexes a_i can be uniquely divided into two groups: one contains -indexes a_i where r belongs to -the sets rowid{a_i=v_i}, the other contains indexes a_j such that r -belongs to rowid{a_j is null}. - -Now let's talk how to get elements from rowid{a_i=v_i} in a sorted order -needed for the merge procedure. We could use BTREE indexes for temp -table. But they are rather expensive and -take a lot of memory as the are implemented with RB trees. +infer that for any r from R indexes a_i can be uniquely divided into +two groups: +- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i}, +- the other contains indexes a_j such that r belongs to + rowid{a_j is null}. + +Now let's talk how to get elements from rowid{a_i=v_i} in a sorted +order needed for the merge procedure. We could use BTREE indexes for +temp table. But they are rather expensive and take a lot of memory as +the are implemented with RB trees. I would suggest creating for each column from the temporary table just an array of rowids sorted by the value from column a. Index lookup in such an array is cheap. It's also rather cheap to check that the next rowid refers to a row with a different value in column a. The array can be created on demand. +2. Algorithm for IN execution with partial matching +======================================================================== + +2.1 Below is shown the top-level algorithm to execute an IN predicate +with partial matching. This algorithm is essentially the implementation +of Item_subselect:exec(). + +int lookup_with_null_semantics(outer_ref[], mat_subquery) +{ + if (index_lookup(outer_ref, mat_subquery) + return TRUE + else + { + /* + Check if there is a partial match (UNKNOWN) or no match (NULL). + */ + if (this is the first partial match) + { + vkey[] = build array of value keys for each NULL-able column + of mat_subquery. + nkey[] = build a bitmap NULL index for each column of mat_subquery + that contains NULLs + nonull_key = build a key over all non-NULL columns of mat_subquery + } + if (partial_match(outer_ref, vkey[], nkey[], nonull_key) + return UNKNOWN + else + return FALSE + } +} + +2.2 The implementation of partial matching is as follows + +/* + Assumptions: + - It has already been checked if there is a complete match by a + regular index lookup, and the test failed. + - It has already been checked if there is a complete NULL row, + and if there was we wouldn't call this function. Thus we assume + that there is no complete NULL row. + - Not all vidx_i are empty, but some can be empty. If all were empty, + then the only possibility for a match is a complete NULL row, which + we already checked. + + @param outer_ref - the uter (left) IN argument. + @param vidx[] - array of value keys + Ordered sequences of rowids of the corresponding columns a_i, such + that all rowids in idx_i are the ones where column a_i contains some + value or NULL. Each idx_i is derived dynamically, for each different + left argument of an IN predicate. + @param nidx[] - array of NULL keys + Bitmpas, one per each column, where a bit is set if the corresponding + row has a NULL value for the corresponding column. + @nonull_key - the only key over all columns of the materialized subquery + that do not contain NULLs + + @returns + @retval FALSE if there is no match + @retval TRUE if there is a partial match +*/ + +Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key) +{ + /* Set of the keys (columns) that form a partial match. */ + Set matching_keys = {} + /* A subset of all keys that need to be checked for NULL matches. */ + Set null_keys = {} + Int min_key /* Key that contains the current minimum position. */ + Int min_row /* Current row number of min_key. */ + Int cur_min_key, cur_min_row + PriorityQueue pq + + if (nonull_key && ! nonull_key->lookup(outer_ref)) + return FALSE + + for (i = 1; i <= n; i++) + { + vkey[i].lookup(outer_ref) + if (! vkey[i].is_eof()) + pq.insert(i) + } + /* + Not all value keys are empty, thus we don't have only NULL + keys. If we had, the only possible match is a NULL row, and + we cheked there is no such row, therefore the result is known + to be FALSE. + In fact this algorithm makes sense for at least two non-NULL + columns. + */ + assert(pq.elements > 1) + + (min_key, min_row) = pq.pop() + matching_keys.add(min_key) + vkey[min_key].next() + if (! vkey[min_key].is_eof()) + pq.insert(min_key) + + while (TRUE) + { + (cur_min_key, cur_min_row) = pq.pop() + + if (cur_min_row == min_row) + { + matching_keys.add(cur_min_key) + /* There cannot be a complete match, as we already checked for one. */ + assert(matching_keys.elements < n) + } + else if (cur_min_key == nonull_key) + { + /* + The non-NULL key has no corresponding NULL index, so we know for + sure that the row 'min_row' is not a match. + */ + (min_key, min_row) = (cur_min_key, cur_min_row) + matching_keys = {min_key} + } + else + { + assert(cur_min_row > min_row) /* Follows from the use of PQ. */ + null_keys = set_difference(all keys vkey[], matching_keys) + /* + Check if all null_keys contain a NULL at row 'min_row'. The procedure + internally checks all keys in a special precomputed order. A prior + procedure determines an optimal order and a mapping + idx_no -> idx_order (encoded as an array). + */ + if (test_null_row(null_keys, min_row)) + return TRUE + else + { + (min_key, min_row) = (cur_min_key, cur_min_row) + matching_keys = {min_key} + } + } + + vkey[cur_min_key].next() + if (! vkey[cur_min_key].is_eof()) + pq.insert(cur_min_key) + + if (pq.is_empty()) + { + /* Check the last row of the last column in PQ for NULL matches. */ + null_keys = set_difference(all keys vkey[], matching_keys) + if (test_null_row(null_keys, min_row)) + return TRUE + else + return FALSE + } + } + + /* We should never get here. */ + assert(FALSE) + return FALSE +} + + + +3. Directions for improvement +======================================================================== + Other consideration that may be taken into account: 1. If columns a_j1,...,a_jm do not contain null values in the temporary -=-=(Timour - Sun, 06 Dec 2009, 14:36)=-=- High-Level Specification modified. --- /tmp/wklog.68.old.12919 2009-12-06 14:36:18.000000000 +0200 +++ /tmp/wklog.68.new.12919 2009-12-06 14:36:18.000000000 +0200 @@ -87,3 +87,8 @@ 7. If you get a row with nulls in all columns stop filling the temporary table and return UNKNOWN for any tuple <v1,...,vn>. +8. [timour] + Consider that due to materialization, we already have a unique index +on all columns <a_1,..., a_n>. We can use the first key part of this index +over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid +creating the index rowid{a_i=v_i}. ------------------------------------------------------------ -=-=(View All Progress Notes, 16 total)=-=- http://askmonty.org/worklog/index.pl?tid=68&nolimit=1 DESCRIPTION: The goal of this task is to implement efficient execution of NOT IN subquery predicates of the form: <oe_1,...,oe_n> NOT IN <non_correlated subquery> when either some oe_i, or some subqury result column contains NULLs. The problem with such predicates is that it is possible to use index lookups only when neither argument of the predicate contains NULLs. If some argument contains a NULL, then due to NULL semantics, it plays the role of a wildcard. If we were to use regular index lookups, then we would get 'no match' for some outer tuple (thus the predicate evaluates to FALSE), while the SQL semantics means 'partial match', and the predicate should evaluate to NULL. This task implements an efficient algorithm to compute such 'parial matches', where a NULL matches any value. HIGH-LEVEL SPECIFICATION: Contents ======================================================================== 1. Initial idea as proposed by Igor 2. Algorithm for IN execution with partial matching 3. Directions for improvement 1. Initial idea as proposed by Igor ======================================================================== For each left side tuple (v_1,...,v_n) we have to find the following set of rowids for the temp table containing N rows as the result of materialization of the subquery: R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs trough all indexes from [1..n] such that v_i is not null. Bear in mind the following specifics of this intersection: (1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint (2) For each i: rowid{a_i is null} is the same for each tuple, that is, this set is independent of the left-side tuples. Due to (2) it makes sense to build rowid{a_i is null} only once. A good representation for such sets would be bitmaps: - it requires minimum memory: not more than N*n bits in total - search of an element in a set is extremely cheap Taken all above into account I could suggest the following algorithm to build R: Using indexes (read about them below) for each column participating in the intersection, merge ordered sets rowid{a_i=v_i} in the following manner. If a rowid r has been encountered maximum in k sets rowid{a_i1=v_i1},...,rowid(a_ik=v_ik), then it has to be checked against all rowid{a_i=v_i} such that i is not in {i1,...,ik}. As soon as we fail to find r in one of these sets we discard it. If r has been found in all of them then r belongs to the set R. Here we use the property (1): any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can infer that for any r from R indexes a_i can be uniquely divided into two groups: - one contains indexes a_i where r belongs to the sets rowid{a_i=v_i}, - the other contains indexes a_j such that r belongs to rowid{a_j is null}. Now let's talk how to get elements from rowid{a_i=v_i} in a sorted order needed for the merge procedure. We could use BTREE indexes for temp table. But they are rather expensive and take a lot of memory as the are implemented with RB trees. I would suggest creating for each column from the temporary table just an array of rowids sorted by the value from column a. Index lookup in such an array is cheap. It's also rather cheap to check that the next rowid refers to a row with a different value in column a. The array can be created on demand. 2. Algorithm for IN execution with partial matching ======================================================================== 2.1 Below is shown the top-level algorithm to execute an IN predicate with partial matching. This algorithm is essentially the implementation of Item_subselect:exec(). int lookup_with_null_semantics(outer_ref[], mat_subquery) { if (index_lookup(outer_ref, mat_subquery) return TRUE else { /* Check if there is a partial match (UNKNOWN) or no match (NULL). */ if (this is the first partial match) { vkey[] = build array of value keys for each NULL-able column of mat_subquery. nkey[] = build a bitmap NULL index for each column of mat_subquery that contains NULLs nonull_key = build a key over all non-NULL columns of mat_subquery } if (partial_match(outer_ref, vkey[], nkey[], nonull_key) return UNKNOWN else return FALSE } } 2.2 The implementation of partial matching is as follows /* Assumptions: - It has already been checked if there is a complete match by a regular index lookup, and the test failed. - It has already been checked if there is a complete NULL row, and if there was we wouldn't call this function. Thus we assume that there is no complete NULL row. - Not all vidx_i are empty, but some can be empty. If all were empty, then the only possibility for a match is a complete NULL row, which we already checked. @param outer_ref - the uter (left) IN argument. @param vidx[] - array of value keys Ordered sequences of rowids of the corresponding columns a_i, such that all rowids in idx_i are the ones where column a_i contains some value or NULL. Each idx_i is derived dynamically, for each different left argument of an IN predicate. @param nidx[] - array of NULL keys Bitmpas, one per each column, where a bit is set if the corresponding row has a NULL value for the corresponding column. @nonull_key - the only key over all columns of the materialized subquery that do not contain NULLs @returns @retval FALSE if there is no match @retval TRUE if there is a partial match */ Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key) { /* Set of the keys (columns) that form a partial match. */ Set matching_keys = {} /* A subset of all keys that need to be checked for NULL matches. */ Set null_keys = {} Int min_key /* Key that contains the current minimum position. */ Int min_row /* Current row number of min_key. */ Int cur_min_key, cur_min_row PriorityQueue pq if (nonull_key && ! nonull_key->lookup(outer_ref)) return FALSE for (i = 1; i <= n; i++) { if (vkey[i] != nonull_key) vkey[i].lookup(outer_ref) if (! vkey[i].is_eof()) pq.insert(i) } /* Not all value keys are empty, thus we don't have only NULL keys. If we had, the only possible match is a NULL row, and we cheked there is no such row, therefore the result is known to be FALSE. In fact this algorithm makes sense for at least two non-NULL columns. */ assert(pq.elements > 1) (min_key, min_row) = pq.pop() matching_keys.add(min_key) vkey[min_key].next() if (! vkey[min_key].is_eof()) pq.insert(min_key) while (TRUE) { (cur_min_key, cur_min_row) = pq.pop() if (cur_min_row == min_row) { matching_keys.add(cur_min_key) /* There cannot be a complete match, as we already checked for one. */ assert(matching_keys.elements < n) } else if (vkey[cur_min_key] == nonull_key) { /* The non-NULL key has no corresponding NULL index, so we know for sure that the row 'min_row' is not a match. */ (min_key, min_row) = (cur_min_key, cur_min_row) matching_keys = {min_key} } else { assert(cur_min_row > min_row) /* Follows from the use of PQ. */ null_keys = set_difference(all keys vkey[], matching_keys) /* Check if all null_keys contain a NULL at row 'min_row'. The procedure internally checks all keys in a special precomputed order. A prior procedure determines an optimal order and a mapping idx_no -> idx_order (encoded as an array). This procedure makes sure not to match the non-NULL column. */ if (test_null_row(null_keys, min_row)) return TRUE else { (min_key, min_row) = (cur_min_key, cur_min_row) matching_keys = {min_key} } } vkey[cur_min_key].next() if (! vkey[cur_min_key].is_eof()) pq.insert(cur_min_key) else if (vkey[cur_min_key] == nonull_key) { /* If there can't be more matches for the nonull_key, we know for sure there is no match, since there is no possible NULL match. */ return FALSE } if (pq.is_empty()) { /* Check the last row of the last column in PQ for NULL matches. */ null_keys = set_difference(all keys vkey[], matching_keys) if (test_null_row(null_keys, min_row)) return TRUE else return FALSE } } /* We should never get here. */ assert(FALSE) return FALSE } 3. Directions for improvement ======================================================================== Other consideration that may be taken into account: 1. If columns a_j1,...,a_jm do not contain null values in the temporary table at all and v_j1,...,v_jm cannot be null, create for these columns only one index array (and of course do not create any bitmaps for them). [done] 2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number of rows, where a_i is not null and V(a_i) is the number of distinct values for a_i excluding nulls. If d(a_i) is close to N'(a_i) then do not create any index array: check whether there is a match running through the records that have been filtered in. Anyway if d(a_i) is close to N'(a_i) then the intersection with rowid{a_i=v_i} will not reduce the number of remaining rowids significantly. In other words is V(a_i) exceeds some threshold there is no sense to create an index for a_i. If additionally N-N'(a_i) is small do not create a bitmap for this column either. 3. If for a column a_i d(a_i) is not close to N'(a_i), but N-N'(a_i) is small a sorted array of rowids from the set rowid{a_i is null} can be used instead of a bitmap. 4. We always have a match if R0= INTERSECT rowid{a_i is null} is not empty. Here i runs through all indexes from [1..n] such that v_i is not null. For a given subset of columns this fact has to be checked only once. It can be easily done with bitmap intersection. 5. If v1,...,vn never can be a null, then indexes (sorted arrays) can be created only for rows with nulls. 6. If v1,...,vn never can be a null and number of rows with nulls is small do not create indexes and do not create bitmaps. 7. If you get a row with nulls in all columns stop filling the temporary table and return UNKNOWN for any tuple <v1,...,vn>. [This is wrong, because if we don't fill the whole temp table, there may be some tuple(s) that would match some outer tuple. In such cases, if we stop filling the temp table, we would miss a TRUE result. Having a partial match doesn't preclude us from having a complete match]. 8. [timour] Consider that due to materialization, we already have a unique index on all columns <a_1,..., a_n>. We can use the first key part of this index over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid creating the index rowid{a_i=v_i}. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Rev 2767: MWL#68 Subquery optimization: Efficient NOT IN execution with NULLs in file:///home/tsk/mprog/src/5.3-mwl68/
by timour＠askmonty.org 11 Mar '10

11 Mar '10

At file:///home/tsk/mprog/src/5.3-mwl68/ ------------------------------------------------------------ revno: 2767 revision-id: timour(a)askmonty.org-20100311214331-kw8ng8aiy6h60vai parent: timour(a)askmonty.org-20100309103615-dzmm6xt7ye5xfs25 committer: timour(a)askmonty.org branch nick: 5.3-mwl68 timestamp: Thu 2010-03-11 23:43:31 +0200 message: MWL#68 Subquery optimization: Efficient NOT IN execution with NULLs This patch does three things: - It adds the possibility to force the execution of top-level [NOT] IN subquery predicates via the IN=>EXISTS transformation. This is done by setting both optimizer switches partial_match_rowid_merge and partial_match_table_scan to "off". - It adjusts all test cases where the complete optimizer_switch is selected because now we have two more switches. - For those test cases where the plan changes because of the new available strategies, we switch off both partial match strategies in order to force the "old" IN=>EXISTS strategy. This is done because most of these test cases specifically test bugs in this strategy. === modified file 'mysql-test/include/mix1.inc' --- a/mysql-test/include/mix1.inc 2009-09-15 06:08:54 +0000 +++ b/mysql-test/include/mix1.inc 2010-03-11 21:43:31 +0000 @@ -1177,8 +1177,11 @@ create table t1 (a bit(1) not null,b int) engine=myisam; create table t2 (c int) engine=innodb; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch='partial_match_rowid_merge=off,partial_match_table_scan=off'; explain select b from t1 where a not in (select b from t1,t2 group by a) group by a; +set optimizer_switch=@save_optimizer_switch; DROP TABLE t1,t2; --echo End of 5.0 tests === modified file 'mysql-test/r/index_merge_myisam.result' --- a/mysql-test/r/index_merge_myisam.result 2010-01-17 14:51:10 +0000 +++ b/mysql-test/r/index_merge_myisam.result 2010-03-11 21:43:31 +0000 @@ -1419,19 +1419,19 @@ # select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='index_merge=off,index_merge_union=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='index_merge_union=on'; select @@optimizer_switch; @@optimizer_switch -index_merge=off,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=off,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,index_merge_sort_union=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=off,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=off,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch=4; ERROR 42000: Variable 'optimizer_switch' can't be set to the value of '4' set optimizer_switch=NULL; @@ -1458,21 +1458,21 @@ set optimizer_switch='index_merge=off,index_merge_union=off,default'; select @@optimizer_switch; @@optimizer_switch -index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=off,index_merge_union=off,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch=default; select @@global.optimizer_switch; @@global.optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set @@global.optimizer_switch=default; select @@global.optimizer_switch; @@global.optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on # # Check index_merge's @@optimizer_switch flags # select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on create table t0 (a int); insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table t1 (a int, b int, c int, filler char(100), @@ -1582,5 +1582,5 @@ set optimizer_switch=default; show variables like 'optimizer_switch'; Variable_name Value -optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on drop table t0, t1; === modified file 'mysql-test/r/innodb_mysql.result' --- a/mysql-test/r/innodb_mysql.result 2009-12-15 07:16:46 +0000 +++ b/mysql-test/r/innodb_mysql.result 2010-03-11 21:43:31 +0000 @@ -1425,12 +1425,15 @@ # create table t1 (a bit(1) not null,b int) engine=myisam; create table t2 (c int) engine=innodb; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch='partial_match_rowid_merge=off,partial_match_table_scan=off'; explain select b from t1 where a not in (select b from t1,t2 group by a) group by a; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables 2 DEPENDENT SUBQUERY t1 system NULL NULL NULL NULL 0 const row not found 2 DEPENDENT SUBQUERY t2 ALL NULL NULL NULL NULL 1 +set optimizer_switch=@save_optimizer_switch; DROP TABLE t1,t2; End of 5.0 tests CREATE TABLE `t2` ( === modified file 'mysql-test/r/myisam_mrr.result' --- a/mysql-test/r/myisam_mrr.result 2010-01-17 14:51:10 +0000 +++ b/mysql-test/r/myisam_mrr.result 2010-03-11 21:43:31 +0000 @@ -394,7 +394,7 @@ # - engine_condition_pushdown does not affect ICP select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on create table t0 (a int); insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table t1 (a int, b int, key(a)); === modified file 'mysql-test/r/ps.result' --- a/mysql-test/r/ps.result 2009-05-27 15:19:44 +0000 +++ b/mysql-test/r/ps.result 2010-03-11 21:43:31 +0000 @@ -149,6 +149,8 @@ c32 set('monday', 'tuesday', 'wednesday') ) engine = MYISAM ; create table t2 like t1; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; set @stmt= ' explain SELECT (SELECT SUM(c1 + c12 + 0.0) FROM t2 where (t1.c2 - 0e-3) = t2.c2 GROUP BY t1.c15 LIMIT 1) as scalar_s, exists (select 1.0e+0 from t2 where t2.c3 * 9.0000000000 = t1.c4) as exists_s, c5 * 4 in (select c6 + 0.3e+1 from t2) as in_s, (c7 - 4, c8 - 4) in (select c9 + 4.0, c10 + 40e-1 from t2) as in_row_s FROM t1, (select c25 x, c32 y from t2) tt WHERE x * 1 = c25 ' ; prepare stmt1 from @stmt ; execute stmt1 ; @@ -177,6 +179,7 @@ 2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables deallocate prepare stmt1; drop tables t1,t2; +set @@optimizer_switch=@save_optimizer_switch; set @arg00=1; prepare stmt1 from ' create table t1 (m int) as select 1 as m ' ; execute stmt1 ; === modified file 'mysql-test/r/subselect.result' --- a/mysql-test/r/subselect.result 2010-02-17 21:59:41 +0000 +++ b/mysql-test/r/subselect.result 2010-03-11 21:43:31 +0000 @@ -1,4 +1,6 @@ drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; select (select 2); (select 2) 2 @@ -4803,4 +4805,5 @@ 1 1 DROP TABLE t1; +set @@optimizer_switch=@save_optimizer_switch; End of 5.1 tests. === modified file 'mysql-test/r/subselect3.result' --- a/mysql-test/r/subselect3.result 2010-02-17 10:05:27 +0000 +++ b/mysql-test/r/subselect3.result 2010-03-11 21:43:31 +0000 @@ -63,12 +63,15 @@ select ' ^ This must show 11' Z; Z ^ This must show 11 +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; explain extended select a in (select max(ie) from t1 where oref=4 group by grp) from t3; id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t3 ALL NULL NULL NULL NULL 2 100.00 2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 6 100.00 Using where; Using temporary; Using filesort Warnings: Note 1003 select <in_optimizer>(`test`.`t3`.`a`,<exists>(select max(`test`.`t1`.`ie`) AS `max(ie)` from `test`.`t1` where (`test`.`t1`.`oref` = 4) group by `test`.`t1`.`grp` having trigcond((<cache>(`test`.`t3`.`a`) = <ref_null_helper>(max(`test`.`t1`.`ie`)))))) AS `a in (select max(ie) from t1 where oref=4 group by grp)` from `test`.`t3` +set @@optimizer_switch=@save_optimizer_switch; drop table t1, t2, t3; create table t1 (a int, oref int, key(a)); insert into t1 values @@ -692,6 +695,8 @@ 2 3 h 3 4 i DROP TABLE t1, t2; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; CREATE TABLE t1 (a int); CREATE TABLE t2 (b int, PRIMARY KEY(b)); INSERT INTO t1 VALUES (1), (NULL), (4); @@ -759,6 +764,7 @@ 1 PRIMARY t1 ALL NULL NULL NULL NULL 4 Using where 2 DEPENDENT SUBQUERY t2 unique_subquery PRIMARY PRIMARY 4 func 1 Using index; Using where DROP TABLE t1, t2; +set @@optimizer_switch=@save_optimizer_switch; CREATE TABLE t1 (a INT); INSERT INTO t1 VALUES(1); CREATE TABLE t2 (placeholder CHAR(11)); @@ -960,7 +966,7 @@ # Baseline: SHOW STATUS LIKE '%Handler_read_rnd_next'; Variable_name Value -Handler_read_rnd_next 17 +Handler_read_rnd_next 18 INSERT INTO t1 VALUES (NULL, NULL); FLUSH STATUS; @@ -977,7 +983,7 @@ # (read record from t1, but do not read from t2) SHOW STATUS LIKE '%Handler_read_rnd_next'; Variable_name Value -Handler_read_rnd_next 18 +Handler_read_rnd_next 19 DROP TABLE t1,t2; End of 5.1 tests CREATE TABLE t1 ( === modified file 'mysql-test/r/subselect3_jcl6.result' --- a/mysql-test/r/subselect3_jcl6.result 2010-02-17 10:47:55 +0000 +++ b/mysql-test/r/subselect3_jcl6.result 2010-03-11 21:43:31 +0000 @@ -67,12 +67,15 @@ select ' ^ This must show 11' Z; Z ^ This must show 11 +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; explain extended select a in (select max(ie) from t1 where oref=4 group by grp) from t3; id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t3 ALL NULL NULL NULL NULL 2 100.00 2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 6 100.00 Using where; Using temporary; Using filesort Warnings: Note 1003 select <in_optimizer>(`test`.`t3`.`a`,<exists>(select max(`test`.`t1`.`ie`) AS `max(ie)` from `test`.`t1` where (`test`.`t1`.`oref` = 4) group by `test`.`t1`.`grp` having trigcond((<cache>(`test`.`t3`.`a`) = <ref_null_helper>(max(`test`.`t1`.`ie`)))))) AS `a in (select max(ie) from t1 where oref=4 group by grp)` from `test`.`t3` +set @@optimizer_switch=@save_optimizer_switch; drop table t1, t2, t3; create table t1 (a int, oref int, key(a)); insert into t1 values @@ -696,6 +699,8 @@ 2 3 h 3 4 i DROP TABLE t1, t2; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; CREATE TABLE t1 (a int); CREATE TABLE t2 (b int, PRIMARY KEY(b)); INSERT INTO t1 VALUES (1), (NULL), (4); @@ -763,6 +768,7 @@ 1 PRIMARY t1 ALL NULL NULL NULL NULL 4 Using where 2 DEPENDENT SUBQUERY t2 unique_subquery PRIMARY PRIMARY 4 func 1 Using index; Using where DROP TABLE t1, t2; +set @@optimizer_switch=@save_optimizer_switch; CREATE TABLE t1 (a INT); INSERT INTO t1 VALUES(1); CREATE TABLE t2 (placeholder CHAR(11)); @@ -964,7 +970,7 @@ # Baseline: SHOW STATUS LIKE '%Handler_read_rnd_next'; Variable_name Value -Handler_read_rnd_next 17 +Handler_read_rnd_next 18 INSERT INTO t1 VALUES (NULL, NULL); FLUSH STATUS; @@ -981,7 +987,7 @@ # (read record from t1, but do not read from t2) SHOW STATUS LIKE '%Handler_read_rnd_next'; Variable_name Value -Handler_read_rnd_next 18 +Handler_read_rnd_next 19 DROP TABLE t1,t2; End of 5.1 tests CREATE TABLE t1 ( === modified file 'mysql-test/r/subselect_no_mat.result' --- a/mysql-test/r/subselect_no_mat.result 2010-02-21 07:33:54 +0000 +++ b/mysql-test/r/subselect_no_mat.result 2010-03-11 21:43:31 +0000 @@ -1,8 +1,10 @@ show variables like 'optimizer_switch'; Variable_name Value -optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='materialization=off'; drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; select (select 2); (select 2) 2 @@ -4807,8 +4809,9 @@ 1 1 DROP TABLE t1; +set @@optimizer_switch=@save_optimizer_switch; End of 5.1 tests. set optimizer_switch=default; show variables like 'optimizer_switch'; Variable_name Value -optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on === modified file 'mysql-test/r/subselect_no_opts.result' --- a/mysql-test/r/subselect_no_opts.result 2010-02-21 07:33:54 +0000 +++ b/mysql-test/r/subselect_no_opts.result 2010-03-11 21:43:31 +0000 @@ -1,8 +1,10 @@ show variables like 'optimizer_switch'; Variable_name Value -optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='materialization=off,semijoin=off'; drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; select (select 2); (select 2) 2 @@ -4807,8 +4809,9 @@ 1 1 DROP TABLE t1; +set @@optimizer_switch=@save_optimizer_switch; End of 5.1 tests. set optimizer_switch=default; show variables like 'optimizer_switch'; Variable_name Value -optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on === modified file 'mysql-test/r/subselect_no_semijoin.result' --- a/mysql-test/r/subselect_no_semijoin.result 2010-02-21 07:33:54 +0000 +++ b/mysql-test/r/subselect_no_semijoin.result 2010-03-11 21:43:31 +0000 @@ -1,8 +1,10 @@ show variables like 'optimizer_switch'; Variable_name Value -optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='semijoin=off'; drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; select (select 2); (select 2) 2 @@ -4807,8 +4809,9 @@ 1 1 DROP TABLE t1; +set @@optimizer_switch=@save_optimizer_switch; End of 5.1 tests. set optimizer_switch=default; show variables like 'optimizer_switch'; Variable_name Value -optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on === modified file 'mysql-test/r/subselect_sj.result' --- a/mysql-test/r/subselect_sj.result 2010-02-24 11:33:42 +0000 +++ b/mysql-test/r/subselect_sj.result 2010-03-11 21:43:31 +0000 @@ -202,39 +202,39 @@ select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,materialization=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off,materialization=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,materialization=off,semijoin=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off,materialization=off,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,materialization=off,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch=default; drop table t0, t1, t2; drop table t10, t11, t12; === modified file 'mysql-test/r/subselect_sj_jcl6.result' --- a/mysql-test/r/subselect_sj_jcl6.result 2010-03-07 15:41:45 +0000 +++ b/mysql-test/r/subselect_sj_jcl6.result 2010-03-11 21:43:31 +0000 @@ -206,39 +206,39 @@ select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,materialization=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off,materialization=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,materialization=off,semijoin=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off,materialization=off,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,semijoin=off,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch='default,materialization=off,loosescan=off'; select @@optimizer_switch; @@optimizer_switch -index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on +index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_condition_pushdown=on,firstmatch=on,loosescan=off,materialization=off,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on set optimizer_switch=default; drop table t0, t1, t2; drop table t10, t11, t12; === modified file 'mysql-test/t/ps.test' --- a/mysql-test/t/ps.test 2009-05-27 15:19:44 +0000 +++ b/mysql-test/t/ps.test 2010-03-11 21:43:31 +0000 @@ -163,6 +163,9 @@ ) engine = MYISAM ; create table t2 like t1; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; + set @stmt= ' explain SELECT (SELECT SUM(c1 + c12 + 0.0) FROM t2 where (t1.c2 - 0e-3) = t2.c2 GROUP BY t1.c15 LIMIT 1) as scalar_s, exists (select 1.0e+0 from t2 where t2.c3 * 9.0000000000 = t1.c4) as exists_s, c5 * 4 in (select c6 + 0.3e+1 from t2) as in_s, (c7 - 4, c8 - 4) in (select c9 + 4.0, c10 + 40e-1 from t2) as in_row_s FROM t1, (select c25 x, c32 y from t2) tt WHERE x * 1 = c25 ' ; prepare stmt1 from @stmt ; execute stmt1 ; @@ -171,6 +174,8 @@ deallocate prepare stmt1; drop tables t1,t2; +set @@optimizer_switch=@save_optimizer_switch; + # # parameters from variables (for field creation) # === modified file 'mysql-test/t/subselect.test' --- a/mysql-test/t/subselect.test 2010-01-17 20:52:20 +0000 +++ b/mysql-test/t/subselect.test 2010-03-11 21:43:31 +0000 @@ -11,6 +11,9 @@ drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t11,t12; --enable_warnings +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; + select (select 2); explain extended select (select 2); SELECT (SELECT 1) UNION SELECT (SELECT 2); @@ -4061,4 +4064,6 @@ (SELECT LAST_INSERT_ID() FROM t1 ORDER BY MIN(a) ASC LIMIT 1); DROP TABLE t1; +set @@optimizer_switch=@save_optimizer_switch; + --echo End of 5.1 tests. === modified file 'mysql-test/t/subselect3.test' --- a/mysql-test/t/subselect3.test 2010-01-17 14:51:10 +0000 +++ b/mysql-test/t/subselect3.test 2010-03-11 21:43:31 +0000 @@ -59,9 +59,13 @@ show status like 'Handler_read_rnd_next'; select ' ^ This must show 11' Z; +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; + # This must show trigcond: explain extended select a in (select max(ie) from t1 where oref=4 group by grp) from t3; +set @@optimizer_switch=@save_optimizer_switch; drop table t1, t2, t3; # @@ -529,6 +533,9 @@ DROP TABLE t1, t2; +# The next three test cases must be executed with the IN=>EXISTS strategy +set @save_optimizer_switch=@@optimizer_switch; +set @@optimizer_switch="partial_match_rowid_merge=off,partial_match_table_scan=off"; # # Bug #27870: crash of an equijoin query with WHERE condition containing @@ -588,6 +595,8 @@ DROP TABLE t1, t2; +set @@optimizer_switch=@save_optimizer_switch; + # # Bug #34763: item_subselect.cc:1235:Item_in_subselect::row_value_transformer: # Assertion failed, unexpected error message: === modified file 'sql/opt_subselect.cc' --- a/sql/opt_subselect.cc 2010-03-09 10:36:15 +0000 +++ b/sql/opt_subselect.cc 2010-03-11 21:43:31 +0000 @@ -187,7 +187,11 @@ does not call setup_subquery_materialization(). We could make SELECT ... FROM DUAL call that function but that doesn't seem to be the case that is worth handling. - 4. Subquery is non-correlated + 4. Either the subquery predicate is a top-level predicate, or at + least one partial match strategy is enabled. If no partial match + strategy is enabled, then materialization cannot be used for + non-top-level queries because it cannot handle NULLs correctly. + 5. Subquery is non-correlated TODO: This is an overly restrictive condition. It can be extended to: (Subquery is non-correlated || @@ -195,13 +199,13 @@ (Subquery is correlated to the immediate outer query && Subquery !contains {GROUP BY, ORDER BY [LIMIT], aggregate functions}) && subquery predicate is not under "NOT IN")) - 5. No execution method was already chosen (by a prepared statement). + 6. No execution method was already chosen (by a prepared statement). (*) The subquery must be part of a SELECT statement. The current condition also excludes multi-table update statements. - We have to determine whether we will perform subquery materialization - before calling the IN=>EXISTS transformation, so that we know whether to + Determine whether we will perform subquery materialization before + calling the IN=>EXISTS transformation, so that we know whether to perform the whole transformation or only that part of it which wraps Item_in_subselect in an Item_in_optimizer. */ @@ -211,11 +215,14 @@ select_lex->master_unit()->first_select()->leaf_tables && // 3 thd->lex->sql_command == SQLCOM_SELECT && // * select_lex->outer_select()->leaf_tables && // 3A - subquery_types_allow_materialization(in_subs)) + subquery_types_allow_materialization(in_subs) && + // psergey-todo: duplicated_subselect_card_check: where it's done? + (in_subs->is_top_level_item() || + optimizer_flag(thd, OPTIMIZER_SWITCH_PARTIAL_MATCH_ROWID_MERGE) || + optimizer_flag(thd, OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN)) &&//4 + !in_subs->is_correlated && // 5 + in_subs->exec_method == Item_in_subselect::NOT_TRANSFORMED) // 6 { - // psergey-todo: duplicated_subselect_card_check: where it's done? - if (!in_subs->is_correlated && // 4 - in_subs->exec_method == Item_in_subselect::NOT_TRANSFORMED) // 5 in_subs->exec_method= Item_in_subselect::MATERIALIZATION; }

1 0