[Maria-developers] Updated (by Psergey): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS transformation CREATION DATE..: Sun, 28 Feb 2010, 13:39 SUPERVISOR.....: Monty IMPLEMENTOR....: COPIES TO......: Igor, Psergey, Timour CATEGORY.......: Server-RawIdeaBin TASK ID........: 89 (http://askmonty.org/worklog/?tid=89) VERSION........: Server-5.3 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000 +++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000 @@ -36,8 +36,8 @@ So, we'll need to compute both exists_select_cost and materialization_cost. -Difficulty with computing the two costs ---------------------------------------- +Difficulty with the need to run select optimization two times +------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. @@ -46,4 +46,10 @@ 3. Then we find that cost #1 is less and want to execute the materialization strategy. +The problem is that once one injects "oe=ie", it can trigger some optimization +steps that are not possible to undo. +- Example1: outer->inner join conversion +- non-Example: according to Igor, "oe=ie" won't participate in equality propagation. +- ... what else ? + -=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000 +++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000 @@ -36,3 +36,14 @@ So, we'll need to compute both exists_select_cost and materialization_cost. +Difficulty with computing the two costs +--------------------------------------- +The problem is in this scenario: +1. We compute materialization_cost by running optimization for the original + subquery select. +2. We compute exists_select_cost by running optimization for the subquery's + select with "oe=ie" injected into WHERE +3. Then we find that cost #1 is less and want to execute the materialization + strategy. + + -=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=- High-Level Specification modified. --- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000 +++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000 @@ -1 +1,38 @@ +Why need two optimizations +-------------------------- +Consider a query with subquery: + + SELECT + oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) + FROM outer_tbl + WHERE outer_cond + +If we use Materialization strategy, the costs will be + + cost of accessing outer_tbl + + materialization_cost + + #records(outer_tbl w/o outer_cond) * lookup_cost + +where + + materialization_cost= + cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) + +On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into + + SELECT + EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + FROM outer_tbl + WHERE outer_cond + +and the costs will be + + cost of accessing outer_tbl + + #records(outer_tbl w/o outer_cond) * exists_select_cost + +where + exists_select_cost= + cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) + +So, we'll need to compute both exists_select_cost and materialization_cost. -=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=- Dependency created: 91 now depends on 89 DESCRIPTION: For uncorrelated IN subqueries that can't be converted to semi-joins it is necessary to make a cost-based choice between IN->EXISTS and Materialization strategies. Both strategies handle two cases: 1. A simple case w/o NULLs handling 2. Handling NULLs. This WL is about making cost-based decision for #1. HIGH-LEVEL SPECIFICATION: Why need two optimizations -------------------------- Consider a query with subquery: SELECT oe IN (SELECT ie FROM inner_tbl WHERE inner_cond) FROM outer_tbl WHERE outer_cond If we use Materialization strategy, the costs will be cost of accessing outer_tbl + materialization_cost + #records(outer_tbl w/o outer_cond) * lookup_cost where materialization_cost= cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond) On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into SELECT EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) FROM outer_tbl WHERE outer_cond and the costs will be cost of accessing outer_tbl + #records(outer_tbl w/o outer_cond) * exists_select_cost where exists_select_cost= cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie) So, we'll need to compute both exists_select_cost and materialization_cost. Difficulty with the need to run select optimization two times ------------------------------------------------------------- The problem is in this scenario: 1. We compute materialization_cost by running optimization for the original subquery select. 2. We compute exists_select_cost by running optimization for the subquery's select with "oe=ie" injected into WHERE 3. Then we find that cost #1 is less and want to execute the materialization strategy. The problem is that once one injects "oe=ie", it can trigger some optimization steps that are not possible to undo. - Example1: outer->inner join conversion - non-Example: according to Igor, "oe=ie" won't participate in equality propagation. - ... what else ? ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)
participants (1)
-
worklog-noreply@askmonty.org