----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE CREATION DATE..: Sun, 28 Feb 2010, 13:45 SUPERVISOR.....: Monty IMPLEMENTOR....: COPIES TO......: Igor, Psergey, Timour CATEGORY.......: Server-RawIdeaBin TASK ID........: 90 (http://askmonty.org/worklog/?tid=90) VERSION........: Server-5.3 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: -1 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=- High Level Description modified. --- /tmp/wklog.90.old.23524 2010-02-28 15:37:47.000000000 +0000 +++ /tmp/wklog.90.new.23524 2010-02-28 15:37:47.000000000 +0000 @@ -15,3 +15,7 @@ Semi-join materialization supports such strategy with SJM-Scan strategy. This WL entry is about adding support for such strategies for non-semijoin subqueries. + + +Once WL#89 is done, there will be a cost-based choice between +Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies. -=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=- High-Level Specification modified. --- /tmp/wklog.90.old.23033 2010-02-28 15:22:09.000000000 +0000 +++ /tmp/wklog.90.new.23033 2010-02-28 15:22:09.000000000 +0000 @@ -1 +1,33 @@ +Basic idea on how this could be achieved: + +Pre-optimization phase +---------------------- + +The rewrite +~~~~~~~~~~~ +If we find a subquery predicate that is +- not processed by current semi-join optimizations +- is an AND-part of the WHERE/ON clause +- can be executed with Materialization + +then +- Remove the predicate from WHERE/ON clause +- Add a special JOIN_TAB object instead. + +Plan options +~~~~~~~~~~~~ +- Use the IN-equality to create KEYUSE elements. + +Optimization +------------ +- Pre-optimize the subquery so we know materialization cost +- Whenever best_access_path() encounters the "special JOIN_TAB" it should + consider two strategies: + A. Materialization and making lookups in the materialized table (if applicable) + B. Materialization and then scanning the materialized table. + + +EXPLAIN +------- +TODO how this will look in EXPLAIN output? -=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=- Dependency created: 91 now depends on 90 -=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=- Dependency deleted: 94 no longer depends on 90 -=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=- Title modified. --- /tmp/wklog.90.old.21903 2010-02-28 14:47:54.000000000 +0000 +++ /tmp/wklog.90.new.21903 2010-02-28 14:47:54.000000000 +0000 @@ -1 +1 @@ - Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE +Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE -=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=- High Level Description modified. --- /tmp/wklog.90.old.21880 2010-02-28 14:47:28.000000000 +0000 +++ /tmp/wklog.90.new.21880 2010-02-28 14:47:28.000000000 +0000 @@ -1,10 +1,17 @@ -For uncorrelated IN subqueries that can't be converted to semi-joins it is -necessary to make a cost-based choice between IN->EXISTS and Materialization -strategies. +Consider the following case: -Both strategies handle two cases: -1. A simple case w/o NULLs handling -2. Handling NULLs. +SELECT * FROM big_table +WHERE oe IN (SELECT ie FROM table_with_few_groups + WHERE ... + GROUP BY group_col) AND ... -This WL is about making cost-based decision for #1. +Here the best way to execute the query is: + Materialize the subquery; + # now run the join: + for each record R1 in materialized table + for each record R2 in big_table such that oe=R1 + pass R2 to output + +Semi-join materialization supports such strategy with SJM-Scan strategy. This WL +entry is about adding support for such strategies for non-semijoin subqueries. -=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=- Title modified. --- /tmp/wklog.90.old.21859 2010-02-28 14:47:02.000000000 +0000 +++ /tmp/wklog.90.new.21859 2010-02-28 14:47:02.000000000 +0000 @@ -1 +1 @@ -Subqueries: cost-based choice between Materialization and IN->EXISTS transformation + Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE -=-=(Psergey - Sun, 28 Feb 2010, 14:08)=-=- Dependency created: 94 now depends on 90 DESCRIPTION: Consider the following case: SELECT * FROM big_table WHERE oe IN (SELECT ie FROM table_with_few_groups WHERE ... GROUP BY group_col) AND ... Here the best way to execute the query is: Materialize the subquery; # now run the join: for each record R1 in materialized table for each record R2 in big_table such that oe=R1 pass R2 to output Semi-join materialization supports such strategy with SJM-Scan strategy. This WL entry is about adding support for such strategies for non-semijoin subqueries. Once WL#89 is done, there will be a cost-based choice between Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies. HIGH-LEVEL SPECIFICATION: Basic idea on how this could be achieved: Pre-optimization phase ---------------------- The rewrite ~~~~~~~~~~~ If we find a subquery predicate that is - not processed by current semi-join optimizations - is an AND-part of the WHERE/ON clause - can be executed with Materialization then - Remove the predicate from WHERE/ON clause - Add a special JOIN_TAB object instead. Plan options ~~~~~~~~~~~~ - Use the IN-equality to create KEYUSE elements. Optimization ------------ - Pre-optimize the subquery so we know materialization cost - Whenever best_access_path() encounters the "special JOIN_TAB" it should consider two strategies: A. Materialization and making lookups in the materialized table (if applicable) B. Materialization and then scanning the materialized table. EXPLAIN ------- TODO how this will look in EXPLAIN output? ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)