developers
Threads by month
- ----- 2025 -----
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 6826 discussions
[Maria-developers] Updated (by Psergey): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)
by worklog-noreply@askmonty.org 29 Mar '10
by worklog-noreply@askmonty.org 29 Mar '10
29 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 29 Mar 2010, 20:02)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.27113 2010-03-29 20:02:22.000000000 +0000
+++ /tmp/wklog.90.new.27113 2010-03-29 20:02:22.000000000 +0000
@@ -2,16 +2,15 @@
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
-2.2 On subquery predicate removal
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
-2.3 What is expected of the result of conversion
+2.1.1 On subquery predicate removal
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 Conversion process and its result
3. Pre-optimization steps
-3.1 Constant detection
-3.3 update_ref_and_keys
-3.4 JOIN_TAB sorting criteria
-4. Optimization
+3.1 update_ref_and_keys
+3.2 Constant detection
+4. Join optimization
5. Execution
-User interface.
+6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
@@ -19,48 +18,60 @@
1. Applicability check
======================
-The criteria for checking whether a subquery can be processed with
-jtbm-materialization can be checked at JOIN::prepare stage (like it
-happens with semi-join check)
+Applicability criteria of jtbm-materialization can be checked at
+JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
-Make it work like semi-join nests: each jtbm-predicate is converted into a
-TABLE_LIST object. This will make it
+Make it work like semi-join nests: a jtbm-predicate is converted into a
+TABLE_LIST object that is injected into select's join tree. This will make it
- - uniform with semi-joins (we've stepped on all rakes there)
- - allow to process JTBM-subqueries in ON expressions
+ - uniform with semi-join processing (and we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in LEFT JOINs' ON expressions
+ (as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
-for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
-i.e. for
+EXPLAIN EXTENDED will display the conversion result as join, that is, for
+subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
-we'll print
+it will print
- SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+ SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
-the XX part is not clear. we don't want to print 'ie' the second time here?
+After we've added a TABLE_LIST element, we'll need to remove the subquery
+predicate from the WHERE/On clause, in the same way as SJ-converter does it.
-2.2 On subquery predicate removal
----------------------------------
-Q: if we remove the subquery predicate permanently, who will run
-fix_fields() for it? For semi-joins we don't have the problem as we
-inject into ON expression (right? or not? we have sj_on_expr, too...
-(Investigation: we the the same Item* pointer both in WHERE and
-as sj_on_expr. fix_fields() is called for the WHERE part and that's
-how sj_on_expr gets fixed. This works as long as
-Item_func_eq::fix_fields() does not try to substitute itself with
-another item).
+TODO: note: we inject a new kind of base table and then next PS re-execution
+will re-run name resolution/etc on TABLE_LIST tree that includes this table.
+We'll need to make sure that materialized-table doesn't interfere with
+anything.
+
+TODO: Separation of steps that need to be done once and steps that need to be
+re-done for every PS execution.
+
+2.1.1 On subquery predicate removal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: if we remove the subquery predicate permanently, who will call
+fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
+problem as we inject into ON expression
+
+(right? or not? we have sj_on_expr, too...
+ (Investigation: we the the same Item* pointer both in WHERE and
+ as sj_on_expr. fix_fields() is called for the WHERE part and that's
+ how sj_on_expr gets fixed. This works as long as
+ Item_func_eq::fix_fields() does not try to substitute itself with
+ another item)
+)
A: ?
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
@@ -70,71 +81,96 @@
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
-2.3 What is expected of the result of conversion
-------------------------------------------------
-Join [pre]optimization relies on each optimized entity to have a bit in
-table_map.
-
-TODO: where do we check if there will be enough bits for everyone? (at the
- point where we assign them?)
+Note: if we convert that late, we could remove the subquery predicate in the
+same way as remove_eq_conds() removes known-to-be-false AND-parts.
+This operation is undoable.
+
+2.3 Conversion process and its result
+-------------------------------------
+At the moment, the following is being implemented:
+
+* convert_join_subqueries_to_semijoins() handles both conversions to
+ semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
+ convert_subq_to_jtbm()).
+ - This means the process is done in bottom-up way (we walk down into
+ semi-join subselects but not jtbm-selects), with protection against
+ exhaustion of table bits beyond #MAX_TABLES, etc.
+
+* convert_subq_to_jtbm() invokes
+
+ subq_predicate->setup_engine()
+
+ which creates the temporary table, and then we use it as the TABLE* object
+ that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
+ One of the first actions is that we assign it a bit and store it in
+ table->map.
+ (Note that this won't work when we add a choice between Materialization and
+ IN->EXISTS)
-The bit stored in join_tab->table->map, and the apparent problem is that JTBM
-join_tabs do not naturally have TABLE* object.
+ (Another option is do like done in FROM optimization: create a 'preliminary'
+ TABLE* object which has only some of the fields filled in).
-We could use the the one that will be used for Materialization, but that will
-stop working when we will have to include IN->EXISTS in the choice.
-
-Current approach: don't create a table. create a table_map element in JOIN_TAB
-instead. Evgen has probably done something like that already.
+QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
+ where needed...
3. Pre-optimization steps
=========================
-JOIN_TABs are allocated in make_join_statistics(). This where the changes will
-be needed: for JOIN_TABs that correspond to JTBM-tables:
-
-- don't set tab->table, set tab->jtbm_select (or whatever)
+JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
+need:
- run subquery's optimizer to get its output cardinality
+- todo what else?
+
+3.1 update_ref_and_keys
+-----------------------
+Since we've injected the equality into WHERE/ON and created a TABLE* with
+appropriate index no special steps are necessary. The regular procedure will
+add the needed KEYUSE elements.
-3.1 Constant detection
+3.2 Constant detection
----------------------
-What about subqueries that "are constant"?
- const_item IN (SELECT uncorrelated) -> is constant, but not something
- we would want to evaluate.
- something IN (SELECT from_constant_join) -> is constant
+* We should not let const table detection code to look at the materialized
+ table, find it to be empty and mark it as constant.
-Do we need to mark their JOIN_TABs as constant?
+* We also should disallow eq_ref(const)-based detection for materialized table
+ That is, if outer_expr is found to be constant, the code should not attempt
+ to make a eq_ref(const) lookup in the materialized table.
-3.3 update_ref_and_keys
------------------------
-* Walk through JTBM elements and inject KEYUSE elements for their
- IN-equalities.
+TODO {
+ Second: What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate at optimization phase.
+
+ something IN (SELECT from_constant_join) -> is constant
+ Do we need to mark their JOIN_TABs as constant?)
+}
-TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+4. Join optimization
+====================
+Should just work, with exception that we need to make best_access_path not
+to use table->stats.records but use JOIN_TAB::records instead (which already
+exists and is filled)
-3.4 JOIN_TAB sorting criteria
------------------------------
-Q: Where do we put JTBM's join_tab when pre-sorting records?
-A: it should sort as regular table.
-
-TODO: where do we remove the predicates from the WHERE?
- - remove them like SJ-converter does
- - remove them with optimizer (like remove_eq_conds does)
-
-4. Optimization
-===============
-Add a branch in best_access_path to account for
-- JTBM-Materialization
-- JTBM-Materialization-Scan.
+NOTE: FROM-optimization does set table->stats.records. Whose approach is
+ wrong?
5. Execution
============
-* We should be able to reuse item_subselect.cc code for lookups
-* But will have to use our own temptable scan code
+We need that:
+1. Materialization operation (i.e. execute the subquery and fill the
+ materialized temptable) is done at start of execution
+2. Materialized table is accessed during execution (either by doing lookups or
+ full table scan)
+3. IN-equality is checked at the right moment.
+
+#2 is achieved by putting the materialized table into join_tab->table.
+#3 is achieved by injecting IN-equality into the WHERE.
-TODO: is it possible to have any unification with SJ-Materialization?
+for #1, we need to place a call that fills the temptable at the right location.
-User interface
---------------
-Any @@optimizer_switch flags for all this?
+6. User interface
+=================
+No @@optimizer_switch flags are planned at the moment. JTBM-materialization
+will be controlled by existing materialization=on|off switch.
+We could also add a generic materialization_scan=on|off switch.
-=-=(Psergey - Wed, 24 Mar 2010, 14:42)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.19182 2010-03-24 14:42:54.000000000 +0000
+++ /tmp/wklog.90.new.19182 2010-03-24 14:42:54.000000000 +0000
@@ -1 +1,140 @@
+<contents>
+1. Applicability check
+2. Representation
+2.1 Option #1: Convert to TABLE_LIST
+2.2 On subquery predicate removal
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 What is expected of the result of conversion
+3. Pre-optimization steps
+3.1 Constant detection
+3.3 update_ref_and_keys
+3.4 JOIN_TAB sorting criteria
+4. Optimization
+5. Execution
+User interface.
+</contents>
+
+We'll call the new execution strategy "jtbm-materialization", for the lack of
+better name.
+
+1. Applicability check
+======================
+The criteria for checking whether a subquery can be processed with
+jtbm-materialization can be checked at JOIN::prepare stage (like it
+happens with semi-join check)
+
+2. Representation
+=================
+
+2.1 Option #1: Convert to TABLE_LIST
+------------------------------------
+Make it work like semi-join nests: each jtbm-predicate is converted into a
+TABLE_LIST object. This will make it
+
+ - uniform with semi-joins (we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in ON expressions
+
+simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
+tables.
+
+for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
+i.e. for
+
+ SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
+
+we'll print
+
+ SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+
+the XX part is not clear. we don't want to print 'ie' the second time here?
+
+2.2 On subquery predicate removal
+---------------------------------
+Q: if we remove the subquery predicate permanently, who will run
+fix_fields() for it? For semi-joins we don't have the problem as we
+inject into ON expression (right? or not? we have sj_on_expr, too...
+(Investigation: we the the same Item* pointer both in WHERE and
+as sj_on_expr. fix_fields() is called for the WHERE part and that's
+how sj_on_expr gets fixed. This works as long as
+Item_func_eq::fix_fields() does not try to substitute itself with
+another item).
+A: ?
+
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+------------------------------------------------------------
+JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
+- make conversion fully undoable
+- perform it sufficiently late in the optimization process, at the point
+ where JOIN_TABs are already allocated.
+
+Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
+it will be impossible to handle JTBM queries inside/outside of outer joins.
+
+2.3 What is expected of the result of conversion
+------------------------------------------------
+Join [pre]optimization relies on each optimized entity to have a bit in
+table_map.
+
+TODO: where do we check if there will be enough bits for everyone? (at the
+ point where we assign them?)
+
+The bit stored in join_tab->table->map, and the apparent problem is that JTBM
+join_tabs do not naturally have TABLE* object.
+
+We could use the the one that will be used for Materialization, but that will
+stop working when we will have to include IN->EXISTS in the choice.
+
+Current approach: don't create a table. create a table_map element in JOIN_TAB
+instead. Evgen has probably done something like that already.
+
+3. Pre-optimization steps
+=========================
+JOIN_TABs are allocated in make_join_statistics(). This where the changes will
+be needed: for JOIN_TABs that correspond to JTBM-tables:
+
+- don't set tab->table, set tab->jtbm_select (or whatever)
+- run subquery's optimizer to get its output cardinality
+
+3.1 Constant detection
+----------------------
+What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate.
+ something IN (SELECT from_constant_join) -> is constant
+
+Do we need to mark their JOIN_TABs as constant?
+
+3.3 update_ref_and_keys
+-----------------------
+* Walk through JTBM elements and inject KEYUSE elements for their
+ IN-equalities.
+
+TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+
+3.4 JOIN_TAB sorting criteria
+-----------------------------
+Q: Where do we put JTBM's join_tab when pre-sorting records?
+A: it should sort as regular table.
+
+TODO: where do we remove the predicates from the WHERE?
+ - remove them like SJ-converter does
+ - remove them with optimizer (like remove_eq_conds does)
+
+4. Optimization
+===============
+Add a branch in best_access_path to account for
+- JTBM-Materialization
+- JTBM-Materialization-Scan.
+
+5. Execution
+============
+* We should be able to reuse item_subselect.cc code for lookups
+* But will have to use our own temptable scan code
+
+TODO: is it possible to have any unification with SJ-Materialization?
+
+User interface
+--------------
+Any @@optimizer_switch flags for all this?
+
-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007 2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007 2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
for each record R2 in big_table such that oe=R1
pass R2 to output
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882 2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882 2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524 2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524 2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033 2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033 2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+ consider two strategies:
+ A. Materialization and making lookups in the materialized table (if applicable)
+ B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN
+-------
+TODO how this will look in EXPLAIN output?
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903 2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903 2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880 2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880 2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+ WHERE ...
+ GROUP BY group_col) AND ...
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
+ Materialize the subquery;
+ # now run the join:
+ for each record R1 in materialized table
+ for each record R2 in big_table such that oe=R1
+ pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=90&nolimit=1
DESCRIPTION:
Consider the following case:
SELECT * FROM big_table
WHERE oe IN (SELECT ie FROM table_with_few_groups
WHERE ...
GROUP BY group_col) AND ...
Here the best way to execute the query is:
Materialize the subquery;
# now run the join:
for each record R1 in materialized table
for each record R2 in big_table such that oe=R1
pass R2 to output
Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
HIGH-LEVEL SPECIFICATION:
Basic idea on how this could be achieved:
Pre-optimization phase
----------------------
The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization
then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.
Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.
Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
consider two strategies:
A. Materialization and making lookups in the materialized table (if applicable)
B. Materialization and then scanning the materialized table.
EXPLAIN
-------
TODO how this will look in EXPLAIN output?
LOW-LEVEL DESIGN:
<contents>
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
2.1.1 On subquery predicate removal
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
2.3 Conversion process and its result
3. Pre-optimization steps
3.1 update_ref_and_keys
3.2 Constant detection
4. Join optimization
5. Execution
6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
better name.
1. Applicability check
======================
Applicability criteria of jtbm-materialization can be checked at
JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
Make it work like semi-join nests: a jtbm-predicate is converted into a
TABLE_LIST object that is injected into select's join tree. This will make it
- uniform with semi-join processing (and we've stepped on all rakes there)
- allow to process JTBM-subqueries in LEFT JOINs' ON expressions
(as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
EXPLAIN EXTENDED will display the conversion result as join, that is, for
subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
it will print
SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
After we've added a TABLE_LIST element, we'll need to remove the subquery
predicate from the WHERE/On clause, in the same way as SJ-converter does it.
TODO: note: we inject a new kind of base table and then next PS re-execution
will re-run name resolution/etc on TABLE_LIST tree that includes this table.
We'll need to make sure that materialized-table doesn't interfere with
anything.
TODO: Separation of steps that need to be done once and steps that need to be
re-done for every PS execution.
2.1.1 On subquery predicate removal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: if we remove the subquery predicate permanently, who will call
fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
problem as we inject into ON expression
(right? or not? we have sj_on_expr, too...
(Investigation: we the the same Item* pointer both in WHERE and
as sj_on_expr. fix_fields() is called for the WHERE part and that's
how sj_on_expr gets fixed. This works as long as
Item_func_eq::fix_fields() does not try to substitute itself with
another item)
)
A: ?
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
- perform it sufficiently late in the optimization process, at the point
where JOIN_TABs are already allocated.
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
Note: if we convert that late, we could remove the subquery predicate in the
same way as remove_eq_conds() removes known-to-be-false AND-parts.
This operation is undoable.
2.3 Conversion process and its result
-------------------------------------
At the moment, the following is being implemented:
* convert_join_subqueries_to_semijoins() handles both conversions to
semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
convert_subq_to_jtbm()).
- This means the process is done in bottom-up way (we walk down into
semi-join subselects but not jtbm-selects), with protection against
exhaustion of table bits beyond #MAX_TABLES, etc.
* convert_subq_to_jtbm() invokes
subq_predicate->setup_engine()
which creates the temporary table, and then we use it as the TABLE* object
that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
One of the first actions is that we assign it a bit and store it in
table->map.
(Note that this won't work when we add a choice between Materialization and
IN->EXISTS)
(Another option is do like done in FROM optimization: create a 'preliminary'
TABLE* object which has only some of the fields filled in).
QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
where needed...
3. Pre-optimization steps
=========================
JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
need:
- run subquery's optimizer to get its output cardinality
- todo what else?
3.1 update_ref_and_keys
-----------------------
Since we've injected the equality into WHERE/ON and created a TABLE* with
appropriate index no special steps are necessary. The regular procedure will
add the needed KEYUSE elements.
3.2 Constant detection
----------------------
* We should not let const table detection code to look at the materialized
table, find it to be empty and mark it as constant.
* We also should disallow eq_ref(const)-based detection for materialized table
That is, if outer_expr is found to be constant, the code should not attempt
to make a eq_ref(const) lookup in the materialized table.
TODO {
Second: What about subqueries that "are constant"?
const_item IN (SELECT uncorrelated) -> is constant, but not something
we would want to evaluate at optimization phase.
something IN (SELECT from_constant_join) -> is constant
Do we need to mark their JOIN_TABs as constant?)
}
4. Join optimization
====================
Should just work, with exception that we need to make best_access_path not
to use table->stats.records but use JOIN_TAB::records instead (which already
exists and is filled)
NOTE: FROM-optimization does set table->stats.records. Whose approach is
wrong?
5. Execution
============
We need that:
1. Materialization operation (i.e. execute the subquery and fill the
materialized temptable) is done at start of execution
2. Materialized table is accessed during execution (either by doing lookups or
full table scan)
3. IN-equality is checked at the right moment.
#2 is achieved by putting the materialized table into join_tab->table.
#3 is achieved by injecting IN-equality into the WHERE.
for #1, we need to place a call that fills the temptable at the right location.
6. User interface
=================
No @@optimizer_switch flags are planned at the moment. JTBM-materialization
will be controlled by existing materialization=on|off switch.
We could also add a generic materialization_scan=on|off switch.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Psergey): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)
by worklog-noreply@askmonty.org 29 Mar '10
by worklog-noreply@askmonty.org 29 Mar '10
29 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 29 Mar 2010, 20:02)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.27113 2010-03-29 20:02:22.000000000 +0000
+++ /tmp/wklog.90.new.27113 2010-03-29 20:02:22.000000000 +0000
@@ -2,16 +2,15 @@
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
-2.2 On subquery predicate removal
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
-2.3 What is expected of the result of conversion
+2.1.1 On subquery predicate removal
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 Conversion process and its result
3. Pre-optimization steps
-3.1 Constant detection
-3.3 update_ref_and_keys
-3.4 JOIN_TAB sorting criteria
-4. Optimization
+3.1 update_ref_and_keys
+3.2 Constant detection
+4. Join optimization
5. Execution
-User interface.
+6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
@@ -19,48 +18,60 @@
1. Applicability check
======================
-The criteria for checking whether a subquery can be processed with
-jtbm-materialization can be checked at JOIN::prepare stage (like it
-happens with semi-join check)
+Applicability criteria of jtbm-materialization can be checked at
+JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
-Make it work like semi-join nests: each jtbm-predicate is converted into a
-TABLE_LIST object. This will make it
+Make it work like semi-join nests: a jtbm-predicate is converted into a
+TABLE_LIST object that is injected into select's join tree. This will make it
- - uniform with semi-joins (we've stepped on all rakes there)
- - allow to process JTBM-subqueries in ON expressions
+ - uniform with semi-join processing (and we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in LEFT JOINs' ON expressions
+ (as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
-for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
-i.e. for
+EXPLAIN EXTENDED will display the conversion result as join, that is, for
+subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
-we'll print
+it will print
- SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+ SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
-the XX part is not clear. we don't want to print 'ie' the second time here?
+After we've added a TABLE_LIST element, we'll need to remove the subquery
+predicate from the WHERE/On clause, in the same way as SJ-converter does it.
-2.2 On subquery predicate removal
----------------------------------
-Q: if we remove the subquery predicate permanently, who will run
-fix_fields() for it? For semi-joins we don't have the problem as we
-inject into ON expression (right? or not? we have sj_on_expr, too...
-(Investigation: we the the same Item* pointer both in WHERE and
-as sj_on_expr. fix_fields() is called for the WHERE part and that's
-how sj_on_expr gets fixed. This works as long as
-Item_func_eq::fix_fields() does not try to substitute itself with
-another item).
+TODO: note: we inject a new kind of base table and then next PS re-execution
+will re-run name resolution/etc on TABLE_LIST tree that includes this table.
+We'll need to make sure that materialized-table doesn't interfere with
+anything.
+
+TODO: Separation of steps that need to be done once and steps that need to be
+re-done for every PS execution.
+
+2.1.1 On subquery predicate removal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: if we remove the subquery predicate permanently, who will call
+fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
+problem as we inject into ON expression
+
+(right? or not? we have sj_on_expr, too...
+ (Investigation: we the the same Item* pointer both in WHERE and
+ as sj_on_expr. fix_fields() is called for the WHERE part and that's
+ how sj_on_expr gets fixed. This works as long as
+ Item_func_eq::fix_fields() does not try to substitute itself with
+ another item)
+)
A: ?
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
@@ -70,71 +81,96 @@
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
-2.3 What is expected of the result of conversion
-------------------------------------------------
-Join [pre]optimization relies on each optimized entity to have a bit in
-table_map.
-
-TODO: where do we check if there will be enough bits for everyone? (at the
- point where we assign them?)
+Note: if we convert that late, we could remove the subquery predicate in the
+same way as remove_eq_conds() removes known-to-be-false AND-parts.
+This operation is undoable.
+
+2.3 Conversion process and its result
+-------------------------------------
+At the moment, the following is being implemented:
+
+* convert_join_subqueries_to_semijoins() handles both conversions to
+ semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
+ convert_subq_to_jtbm()).
+ - This means the process is done in bottom-up way (we walk down into
+ semi-join subselects but not jtbm-selects), with protection against
+ exhaustion of table bits beyond #MAX_TABLES, etc.
+
+* convert_subq_to_jtbm() invokes
+
+ subq_predicate->setup_engine()
+
+ which creates the temporary table, and then we use it as the TABLE* object
+ that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
+ One of the first actions is that we assign it a bit and store it in
+ table->map.
+ (Note that this won't work when we add a choice between Materialization and
+ IN->EXISTS)
-The bit stored in join_tab->table->map, and the apparent problem is that JTBM
-join_tabs do not naturally have TABLE* object.
+ (Another option is do like done in FROM optimization: create a 'preliminary'
+ TABLE* object which has only some of the fields filled in).
-We could use the the one that will be used for Materialization, but that will
-stop working when we will have to include IN->EXISTS in the choice.
-
-Current approach: don't create a table. create a table_map element in JOIN_TAB
-instead. Evgen has probably done something like that already.
+QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
+ where needed...
3. Pre-optimization steps
=========================
-JOIN_TABs are allocated in make_join_statistics(). This where the changes will
-be needed: for JOIN_TABs that correspond to JTBM-tables:
-
-- don't set tab->table, set tab->jtbm_select (or whatever)
+JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
+need:
- run subquery's optimizer to get its output cardinality
+- todo what else?
+
+3.1 update_ref_and_keys
+-----------------------
+Since we've injected the equality into WHERE/ON and created a TABLE* with
+appropriate index no special steps are necessary. The regular procedure will
+add the needed KEYUSE elements.
-3.1 Constant detection
+3.2 Constant detection
----------------------
-What about subqueries that "are constant"?
- const_item IN (SELECT uncorrelated) -> is constant, but not something
- we would want to evaluate.
- something IN (SELECT from_constant_join) -> is constant
+* We should not let const table detection code to look at the materialized
+ table, find it to be empty and mark it as constant.
-Do we need to mark their JOIN_TABs as constant?
+* We also should disallow eq_ref(const)-based detection for materialized table
+ That is, if outer_expr is found to be constant, the code should not attempt
+ to make a eq_ref(const) lookup in the materialized table.
-3.3 update_ref_and_keys
------------------------
-* Walk through JTBM elements and inject KEYUSE elements for their
- IN-equalities.
+TODO {
+ Second: What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate at optimization phase.
+
+ something IN (SELECT from_constant_join) -> is constant
+ Do we need to mark their JOIN_TABs as constant?)
+}
-TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+4. Join optimization
+====================
+Should just work, with exception that we need to make best_access_path not
+to use table->stats.records but use JOIN_TAB::records instead (which already
+exists and is filled)
-3.4 JOIN_TAB sorting criteria
------------------------------
-Q: Where do we put JTBM's join_tab when pre-sorting records?
-A: it should sort as regular table.
-
-TODO: where do we remove the predicates from the WHERE?
- - remove them like SJ-converter does
- - remove them with optimizer (like remove_eq_conds does)
-
-4. Optimization
-===============
-Add a branch in best_access_path to account for
-- JTBM-Materialization
-- JTBM-Materialization-Scan.
+NOTE: FROM-optimization does set table->stats.records. Whose approach is
+ wrong?
5. Execution
============
-* We should be able to reuse item_subselect.cc code for lookups
-* But will have to use our own temptable scan code
+We need that:
+1. Materialization operation (i.e. execute the subquery and fill the
+ materialized temptable) is done at start of execution
+2. Materialized table is accessed during execution (either by doing lookups or
+ full table scan)
+3. IN-equality is checked at the right moment.
+
+#2 is achieved by putting the materialized table into join_tab->table.
+#3 is achieved by injecting IN-equality into the WHERE.
-TODO: is it possible to have any unification with SJ-Materialization?
+for #1, we need to place a call that fills the temptable at the right location.
-User interface
---------------
-Any @@optimizer_switch flags for all this?
+6. User interface
+=================
+No @@optimizer_switch flags are planned at the moment. JTBM-materialization
+will be controlled by existing materialization=on|off switch.
+We could also add a generic materialization_scan=on|off switch.
-=-=(Psergey - Wed, 24 Mar 2010, 14:42)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.19182 2010-03-24 14:42:54.000000000 +0000
+++ /tmp/wklog.90.new.19182 2010-03-24 14:42:54.000000000 +0000
@@ -1 +1,140 @@
+<contents>
+1. Applicability check
+2. Representation
+2.1 Option #1: Convert to TABLE_LIST
+2.2 On subquery predicate removal
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 What is expected of the result of conversion
+3. Pre-optimization steps
+3.1 Constant detection
+3.3 update_ref_and_keys
+3.4 JOIN_TAB sorting criteria
+4. Optimization
+5. Execution
+User interface.
+</contents>
+
+We'll call the new execution strategy "jtbm-materialization", for the lack of
+better name.
+
+1. Applicability check
+======================
+The criteria for checking whether a subquery can be processed with
+jtbm-materialization can be checked at JOIN::prepare stage (like it
+happens with semi-join check)
+
+2. Representation
+=================
+
+2.1 Option #1: Convert to TABLE_LIST
+------------------------------------
+Make it work like semi-join nests: each jtbm-predicate is converted into a
+TABLE_LIST object. This will make it
+
+ - uniform with semi-joins (we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in ON expressions
+
+simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
+tables.
+
+for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
+i.e. for
+
+ SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
+
+we'll print
+
+ SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+
+the XX part is not clear. we don't want to print 'ie' the second time here?
+
+2.2 On subquery predicate removal
+---------------------------------
+Q: if we remove the subquery predicate permanently, who will run
+fix_fields() for it? For semi-joins we don't have the problem as we
+inject into ON expression (right? or not? we have sj_on_expr, too...
+(Investigation: we the the same Item* pointer both in WHERE and
+as sj_on_expr. fix_fields() is called for the WHERE part and that's
+how sj_on_expr gets fixed. This works as long as
+Item_func_eq::fix_fields() does not try to substitute itself with
+another item).
+A: ?
+
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+------------------------------------------------------------
+JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
+- make conversion fully undoable
+- perform it sufficiently late in the optimization process, at the point
+ where JOIN_TABs are already allocated.
+
+Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
+it will be impossible to handle JTBM queries inside/outside of outer joins.
+
+2.3 What is expected of the result of conversion
+------------------------------------------------
+Join [pre]optimization relies on each optimized entity to have a bit in
+table_map.
+
+TODO: where do we check if there will be enough bits for everyone? (at the
+ point where we assign them?)
+
+The bit stored in join_tab->table->map, and the apparent problem is that JTBM
+join_tabs do not naturally have TABLE* object.
+
+We could use the the one that will be used for Materialization, but that will
+stop working when we will have to include IN->EXISTS in the choice.
+
+Current approach: don't create a table. create a table_map element in JOIN_TAB
+instead. Evgen has probably done something like that already.
+
+3. Pre-optimization steps
+=========================
+JOIN_TABs are allocated in make_join_statistics(). This where the changes will
+be needed: for JOIN_TABs that correspond to JTBM-tables:
+
+- don't set tab->table, set tab->jtbm_select (or whatever)
+- run subquery's optimizer to get its output cardinality
+
+3.1 Constant detection
+----------------------
+What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate.
+ something IN (SELECT from_constant_join) -> is constant
+
+Do we need to mark their JOIN_TABs as constant?
+
+3.3 update_ref_and_keys
+-----------------------
+* Walk through JTBM elements and inject KEYUSE elements for their
+ IN-equalities.
+
+TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+
+3.4 JOIN_TAB sorting criteria
+-----------------------------
+Q: Where do we put JTBM's join_tab when pre-sorting records?
+A: it should sort as regular table.
+
+TODO: where do we remove the predicates from the WHERE?
+ - remove them like SJ-converter does
+ - remove them with optimizer (like remove_eq_conds does)
+
+4. Optimization
+===============
+Add a branch in best_access_path to account for
+- JTBM-Materialization
+- JTBM-Materialization-Scan.
+
+5. Execution
+============
+* We should be able to reuse item_subselect.cc code for lookups
+* But will have to use our own temptable scan code
+
+TODO: is it possible to have any unification with SJ-Materialization?
+
+User interface
+--------------
+Any @@optimizer_switch flags for all this?
+
-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007 2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007 2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
for each record R2 in big_table such that oe=R1
pass R2 to output
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882 2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882 2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524 2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524 2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033 2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033 2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+ consider two strategies:
+ A. Materialization and making lookups in the materialized table (if applicable)
+ B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN
+-------
+TODO how this will look in EXPLAIN output?
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903 2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903 2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880 2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880 2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+ WHERE ...
+ GROUP BY group_col) AND ...
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
+ Materialize the subquery;
+ # now run the join:
+ for each record R1 in materialized table
+ for each record R2 in big_table such that oe=R1
+ pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=90&nolimit=1
DESCRIPTION:
Consider the following case:
SELECT * FROM big_table
WHERE oe IN (SELECT ie FROM table_with_few_groups
WHERE ...
GROUP BY group_col) AND ...
Here the best way to execute the query is:
Materialize the subquery;
# now run the join:
for each record R1 in materialized table
for each record R2 in big_table such that oe=R1
pass R2 to output
Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
HIGH-LEVEL SPECIFICATION:
Basic idea on how this could be achieved:
Pre-optimization phase
----------------------
The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization
then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.
Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.
Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
consider two strategies:
A. Materialization and making lookups in the materialized table (if applicable)
B. Materialization and then scanning the materialized table.
EXPLAIN
-------
TODO how this will look in EXPLAIN output?
LOW-LEVEL DESIGN:
<contents>
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
2.1.1 On subquery predicate removal
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
2.3 Conversion process and its result
3. Pre-optimization steps
3.1 update_ref_and_keys
3.2 Constant detection
4. Join optimization
5. Execution
6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
better name.
1. Applicability check
======================
Applicability criteria of jtbm-materialization can be checked at
JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
Make it work like semi-join nests: a jtbm-predicate is converted into a
TABLE_LIST object that is injected into select's join tree. This will make it
- uniform with semi-join processing (and we've stepped on all rakes there)
- allow to process JTBM-subqueries in LEFT JOINs' ON expressions
(as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
EXPLAIN EXTENDED will display the conversion result as join, that is, for
subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
it will print
SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
After we've added a TABLE_LIST element, we'll need to remove the subquery
predicate from the WHERE/On clause, in the same way as SJ-converter does it.
TODO: note: we inject a new kind of base table and then next PS re-execution
will re-run name resolution/etc on TABLE_LIST tree that includes this table.
We'll need to make sure that materialized-table doesn't interfere with
anything.
TODO: Separation of steps that need to be done once and steps that need to be
re-done for every PS execution.
2.1.1 On subquery predicate removal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: if we remove the subquery predicate permanently, who will call
fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
problem as we inject into ON expression
(right? or not? we have sj_on_expr, too...
(Investigation: we the the same Item* pointer both in WHERE and
as sj_on_expr. fix_fields() is called for the WHERE part and that's
how sj_on_expr gets fixed. This works as long as
Item_func_eq::fix_fields() does not try to substitute itself with
another item)
)
A: ?
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
- perform it sufficiently late in the optimization process, at the point
where JOIN_TABs are already allocated.
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
Note: if we convert that late, we could remove the subquery predicate in the
same way as remove_eq_conds() removes known-to-be-false AND-parts.
This operation is undoable.
2.3 Conversion process and its result
-------------------------------------
At the moment, the following is being implemented:
* convert_join_subqueries_to_semijoins() handles both conversions to
semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
convert_subq_to_jtbm()).
- This means the process is done in bottom-up way (we walk down into
semi-join subselects but not jtbm-selects), with protection against
exhaustion of table bits beyond #MAX_TABLES, etc.
* convert_subq_to_jtbm() invokes
subq_predicate->setup_engine()
which creates the temporary table, and then we use it as the TABLE* object
that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
One of the first actions is that we assign it a bit and store it in
table->map.
(Note that this won't work when we add a choice between Materialization and
IN->EXISTS)
(Another option is do like done in FROM optimization: create a 'preliminary'
TABLE* object which has only some of the fields filled in).
QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
where needed...
3. Pre-optimization steps
=========================
JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
need:
- run subquery's optimizer to get its output cardinality
- todo what else?
3.1 update_ref_and_keys
-----------------------
Since we've injected the equality into WHERE/ON and created a TABLE* with
appropriate index no special steps are necessary. The regular procedure will
add the needed KEYUSE elements.
3.2 Constant detection
----------------------
* We should not let const table detection code to look at the materialized
table, find it to be empty and mark it as constant.
* We also should disallow eq_ref(const)-based detection for materialized table
That is, if outer_expr is found to be constant, the code should not attempt
to make a eq_ref(const) lookup in the materialized table.
TODO {
Second: What about subqueries that "are constant"?
const_item IN (SELECT uncorrelated) -> is constant, but not something
we would want to evaluate at optimization phase.
something IN (SELECT from_constant_join) -> is constant
Do we need to mark their JOIN_TABs as constant?)
}
4. Join optimization
====================
Should just work, with exception that we need to make best_access_path not
to use table->stats.records but use JOIN_TAB::records instead (which already
exists and is filled)
NOTE: FROM-optimization does set table->stats.records. Whose approach is
wrong?
5. Execution
============
We need that:
1. Materialization operation (i.e. execute the subquery and fill the
materialized temptable) is done at start of execution
2. Materialized table is accessed during execution (either by doing lookups or
full table scan)
3. IN-equality is checked at the right moment.
#2 is achieved by putting the materialized table into join_tab->table.
#3 is achieved by injecting IN-equality into the WHERE.
for #1, we need to place a call that fills the temptable at the right location.
6. User interface
=================
No @@optimizer_switch flags are planned at the moment. JTBM-materialization
will be controlled by existing materialization=on|off switch.
We could also add a generic materialization_scan=on|off switch.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Psergey): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)
by worklog-noreply@askmonty.org 29 Mar '10
by worklog-noreply@askmonty.org 29 Mar '10
29 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 29 Mar 2010, 20:02)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.27113 2010-03-29 20:02:22.000000000 +0000
+++ /tmp/wklog.90.new.27113 2010-03-29 20:02:22.000000000 +0000
@@ -2,16 +2,15 @@
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
-2.2 On subquery predicate removal
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
-2.3 What is expected of the result of conversion
+2.1.1 On subquery predicate removal
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 Conversion process and its result
3. Pre-optimization steps
-3.1 Constant detection
-3.3 update_ref_and_keys
-3.4 JOIN_TAB sorting criteria
-4. Optimization
+3.1 update_ref_and_keys
+3.2 Constant detection
+4. Join optimization
5. Execution
-User interface.
+6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
@@ -19,48 +18,60 @@
1. Applicability check
======================
-The criteria for checking whether a subquery can be processed with
-jtbm-materialization can be checked at JOIN::prepare stage (like it
-happens with semi-join check)
+Applicability criteria of jtbm-materialization can be checked at
+JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
-Make it work like semi-join nests: each jtbm-predicate is converted into a
-TABLE_LIST object. This will make it
+Make it work like semi-join nests: a jtbm-predicate is converted into a
+TABLE_LIST object that is injected into select's join tree. This will make it
- - uniform with semi-joins (we've stepped on all rakes there)
- - allow to process JTBM-subqueries in ON expressions
+ - uniform with semi-join processing (and we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in LEFT JOINs' ON expressions
+ (as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
-for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
-i.e. for
+EXPLAIN EXTENDED will display the conversion result as join, that is, for
+subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
-we'll print
+it will print
- SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+ SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
-the XX part is not clear. we don't want to print 'ie' the second time here?
+After we've added a TABLE_LIST element, we'll need to remove the subquery
+predicate from the WHERE/On clause, in the same way as SJ-converter does it.
-2.2 On subquery predicate removal
----------------------------------
-Q: if we remove the subquery predicate permanently, who will run
-fix_fields() for it? For semi-joins we don't have the problem as we
-inject into ON expression (right? or not? we have sj_on_expr, too...
-(Investigation: we the the same Item* pointer both in WHERE and
-as sj_on_expr. fix_fields() is called for the WHERE part and that's
-how sj_on_expr gets fixed. This works as long as
-Item_func_eq::fix_fields() does not try to substitute itself with
-another item).
+TODO: note: we inject a new kind of base table and then next PS re-execution
+will re-run name resolution/etc on TABLE_LIST tree that includes this table.
+We'll need to make sure that materialized-table doesn't interfere with
+anything.
+
+TODO: Separation of steps that need to be done once and steps that need to be
+re-done for every PS execution.
+
+2.1.1 On subquery predicate removal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: if we remove the subquery predicate permanently, who will call
+fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
+problem as we inject into ON expression
+
+(right? or not? we have sj_on_expr, too...
+ (Investigation: we the the same Item* pointer both in WHERE and
+ as sj_on_expr. fix_fields() is called for the WHERE part and that's
+ how sj_on_expr gets fixed. This works as long as
+ Item_func_eq::fix_fields() does not try to substitute itself with
+ another item)
+)
A: ?
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
@@ -70,71 +81,96 @@
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
-2.3 What is expected of the result of conversion
-------------------------------------------------
-Join [pre]optimization relies on each optimized entity to have a bit in
-table_map.
-
-TODO: where do we check if there will be enough bits for everyone? (at the
- point where we assign them?)
+Note: if we convert that late, we could remove the subquery predicate in the
+same way as remove_eq_conds() removes known-to-be-false AND-parts.
+This operation is undoable.
+
+2.3 Conversion process and its result
+-------------------------------------
+At the moment, the following is being implemented:
+
+* convert_join_subqueries_to_semijoins() handles both conversions to
+ semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
+ convert_subq_to_jtbm()).
+ - This means the process is done in bottom-up way (we walk down into
+ semi-join subselects but not jtbm-selects), with protection against
+ exhaustion of table bits beyond #MAX_TABLES, etc.
+
+* convert_subq_to_jtbm() invokes
+
+ subq_predicate->setup_engine()
+
+ which creates the temporary table, and then we use it as the TABLE* object
+ that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
+ One of the first actions is that we assign it a bit and store it in
+ table->map.
+ (Note that this won't work when we add a choice between Materialization and
+ IN->EXISTS)
-The bit stored in join_tab->table->map, and the apparent problem is that JTBM
-join_tabs do not naturally have TABLE* object.
+ (Another option is do like done in FROM optimization: create a 'preliminary'
+ TABLE* object which has only some of the fields filled in).
-We could use the the one that will be used for Materialization, but that will
-stop working when we will have to include IN->EXISTS in the choice.
-
-Current approach: don't create a table. create a table_map element in JOIN_TAB
-instead. Evgen has probably done something like that already.
+QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
+ where needed...
3. Pre-optimization steps
=========================
-JOIN_TABs are allocated in make_join_statistics(). This where the changes will
-be needed: for JOIN_TABs that correspond to JTBM-tables:
-
-- don't set tab->table, set tab->jtbm_select (or whatever)
+JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
+need:
- run subquery's optimizer to get its output cardinality
+- todo what else?
+
+3.1 update_ref_and_keys
+-----------------------
+Since we've injected the equality into WHERE/ON and created a TABLE* with
+appropriate index no special steps are necessary. The regular procedure will
+add the needed KEYUSE elements.
-3.1 Constant detection
+3.2 Constant detection
----------------------
-What about subqueries that "are constant"?
- const_item IN (SELECT uncorrelated) -> is constant, but not something
- we would want to evaluate.
- something IN (SELECT from_constant_join) -> is constant
+* We should not let const table detection code to look at the materialized
+ table, find it to be empty and mark it as constant.
-Do we need to mark their JOIN_TABs as constant?
+* We also should disallow eq_ref(const)-based detection for materialized table
+ That is, if outer_expr is found to be constant, the code should not attempt
+ to make a eq_ref(const) lookup in the materialized table.
-3.3 update_ref_and_keys
------------------------
-* Walk through JTBM elements and inject KEYUSE elements for their
- IN-equalities.
+TODO {
+ Second: What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate at optimization phase.
+
+ something IN (SELECT from_constant_join) -> is constant
+ Do we need to mark their JOIN_TABs as constant?)
+}
-TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+4. Join optimization
+====================
+Should just work, with exception that we need to make best_access_path not
+to use table->stats.records but use JOIN_TAB::records instead (which already
+exists and is filled)
-3.4 JOIN_TAB sorting criteria
------------------------------
-Q: Where do we put JTBM's join_tab when pre-sorting records?
-A: it should sort as regular table.
-
-TODO: where do we remove the predicates from the WHERE?
- - remove them like SJ-converter does
- - remove them with optimizer (like remove_eq_conds does)
-
-4. Optimization
-===============
-Add a branch in best_access_path to account for
-- JTBM-Materialization
-- JTBM-Materialization-Scan.
+NOTE: FROM-optimization does set table->stats.records. Whose approach is
+ wrong?
5. Execution
============
-* We should be able to reuse item_subselect.cc code for lookups
-* But will have to use our own temptable scan code
+We need that:
+1. Materialization operation (i.e. execute the subquery and fill the
+ materialized temptable) is done at start of execution
+2. Materialized table is accessed during execution (either by doing lookups or
+ full table scan)
+3. IN-equality is checked at the right moment.
+
+#2 is achieved by putting the materialized table into join_tab->table.
+#3 is achieved by injecting IN-equality into the WHERE.
-TODO: is it possible to have any unification with SJ-Materialization?
+for #1, we need to place a call that fills the temptable at the right location.
-User interface
---------------
-Any @@optimizer_switch flags for all this?
+6. User interface
+=================
+No @@optimizer_switch flags are planned at the moment. JTBM-materialization
+will be controlled by existing materialization=on|off switch.
+We could also add a generic materialization_scan=on|off switch.
-=-=(Psergey - Wed, 24 Mar 2010, 14:42)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.19182 2010-03-24 14:42:54.000000000 +0000
+++ /tmp/wklog.90.new.19182 2010-03-24 14:42:54.000000000 +0000
@@ -1 +1,140 @@
+<contents>
+1. Applicability check
+2. Representation
+2.1 Option #1: Convert to TABLE_LIST
+2.2 On subquery predicate removal
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 What is expected of the result of conversion
+3. Pre-optimization steps
+3.1 Constant detection
+3.3 update_ref_and_keys
+3.4 JOIN_TAB sorting criteria
+4. Optimization
+5. Execution
+User interface.
+</contents>
+
+We'll call the new execution strategy "jtbm-materialization", for the lack of
+better name.
+
+1. Applicability check
+======================
+The criteria for checking whether a subquery can be processed with
+jtbm-materialization can be checked at JOIN::prepare stage (like it
+happens with semi-join check)
+
+2. Representation
+=================
+
+2.1 Option #1: Convert to TABLE_LIST
+------------------------------------
+Make it work like semi-join nests: each jtbm-predicate is converted into a
+TABLE_LIST object. This will make it
+
+ - uniform with semi-joins (we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in ON expressions
+
+simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
+tables.
+
+for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
+i.e. for
+
+ SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
+
+we'll print
+
+ SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+
+the XX part is not clear. we don't want to print 'ie' the second time here?
+
+2.2 On subquery predicate removal
+---------------------------------
+Q: if we remove the subquery predicate permanently, who will run
+fix_fields() for it? For semi-joins we don't have the problem as we
+inject into ON expression (right? or not? we have sj_on_expr, too...
+(Investigation: we the the same Item* pointer both in WHERE and
+as sj_on_expr. fix_fields() is called for the WHERE part and that's
+how sj_on_expr gets fixed. This works as long as
+Item_func_eq::fix_fields() does not try to substitute itself with
+another item).
+A: ?
+
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+------------------------------------------------------------
+JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
+- make conversion fully undoable
+- perform it sufficiently late in the optimization process, at the point
+ where JOIN_TABs are already allocated.
+
+Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
+it will be impossible to handle JTBM queries inside/outside of outer joins.
+
+2.3 What is expected of the result of conversion
+------------------------------------------------
+Join [pre]optimization relies on each optimized entity to have a bit in
+table_map.
+
+TODO: where do we check if there will be enough bits for everyone? (at the
+ point where we assign them?)
+
+The bit stored in join_tab->table->map, and the apparent problem is that JTBM
+join_tabs do not naturally have TABLE* object.
+
+We could use the the one that will be used for Materialization, but that will
+stop working when we will have to include IN->EXISTS in the choice.
+
+Current approach: don't create a table. create a table_map element in JOIN_TAB
+instead. Evgen has probably done something like that already.
+
+3. Pre-optimization steps
+=========================
+JOIN_TABs are allocated in make_join_statistics(). This where the changes will
+be needed: for JOIN_TABs that correspond to JTBM-tables:
+
+- don't set tab->table, set tab->jtbm_select (or whatever)
+- run subquery's optimizer to get its output cardinality
+
+3.1 Constant detection
+----------------------
+What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate.
+ something IN (SELECT from_constant_join) -> is constant
+
+Do we need to mark their JOIN_TABs as constant?
+
+3.3 update_ref_and_keys
+-----------------------
+* Walk through JTBM elements and inject KEYUSE elements for their
+ IN-equalities.
+
+TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+
+3.4 JOIN_TAB sorting criteria
+-----------------------------
+Q: Where do we put JTBM's join_tab when pre-sorting records?
+A: it should sort as regular table.
+
+TODO: where do we remove the predicates from the WHERE?
+ - remove them like SJ-converter does
+ - remove them with optimizer (like remove_eq_conds does)
+
+4. Optimization
+===============
+Add a branch in best_access_path to account for
+- JTBM-Materialization
+- JTBM-Materialization-Scan.
+
+5. Execution
+============
+* We should be able to reuse item_subselect.cc code for lookups
+* But will have to use our own temptable scan code
+
+TODO: is it possible to have any unification with SJ-Materialization?
+
+User interface
+--------------
+Any @@optimizer_switch flags for all this?
+
-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007 2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007 2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
for each record R2 in big_table such that oe=R1
pass R2 to output
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882 2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882 2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524 2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524 2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033 2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033 2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+ consider two strategies:
+ A. Materialization and making lookups in the materialized table (if applicable)
+ B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN
+-------
+TODO how this will look in EXPLAIN output?
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903 2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903 2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880 2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880 2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+ WHERE ...
+ GROUP BY group_col) AND ...
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
+ Materialize the subquery;
+ # now run the join:
+ for each record R1 in materialized table
+ for each record R2 in big_table such that oe=R1
+ pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=90&nolimit=1
DESCRIPTION:
Consider the following case:
SELECT * FROM big_table
WHERE oe IN (SELECT ie FROM table_with_few_groups
WHERE ...
GROUP BY group_col) AND ...
Here the best way to execute the query is:
Materialize the subquery;
# now run the join:
for each record R1 in materialized table
for each record R2 in big_table such that oe=R1
pass R2 to output
Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
HIGH-LEVEL SPECIFICATION:
Basic idea on how this could be achieved:
Pre-optimization phase
----------------------
The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization
then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.
Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.
Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
consider two strategies:
A. Materialization and making lookups in the materialized table (if applicable)
B. Materialization and then scanning the materialized table.
EXPLAIN
-------
TODO how this will look in EXPLAIN output?
LOW-LEVEL DESIGN:
<contents>
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
2.1.1 On subquery predicate removal
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
2.3 Conversion process and its result
3. Pre-optimization steps
3.1 update_ref_and_keys
3.2 Constant detection
4. Join optimization
5. Execution
6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
better name.
1. Applicability check
======================
Applicability criteria of jtbm-materialization can be checked at
JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
Make it work like semi-join nests: a jtbm-predicate is converted into a
TABLE_LIST object that is injected into select's join tree. This will make it
- uniform with semi-join processing (and we've stepped on all rakes there)
- allow to process JTBM-subqueries in LEFT JOINs' ON expressions
(as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
EXPLAIN EXTENDED will display the conversion result as join, that is, for
subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
it will print
SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
After we've added a TABLE_LIST element, we'll need to remove the subquery
predicate from the WHERE/On clause, in the same way as SJ-converter does it.
TODO: note: we inject a new kind of base table and then next PS re-execution
will re-run name resolution/etc on TABLE_LIST tree that includes this table.
We'll need to make sure that materialized-table doesn't interfere with
anything.
TODO: Separation of steps that need to be done once and steps that need to be
re-done for every PS execution.
2.1.1 On subquery predicate removal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: if we remove the subquery predicate permanently, who will call
fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
problem as we inject into ON expression
(right? or not? we have sj_on_expr, too...
(Investigation: we the the same Item* pointer both in WHERE and
as sj_on_expr. fix_fields() is called for the WHERE part and that's
how sj_on_expr gets fixed. This works as long as
Item_func_eq::fix_fields() does not try to substitute itself with
another item)
)
A: ?
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
- perform it sufficiently late in the optimization process, at the point
where JOIN_TABs are already allocated.
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
Note: if we convert that late, we could remove the subquery predicate in the
same way as remove_eq_conds() removes known-to-be-false AND-parts.
This operation is undoable.
2.3 Conversion process and its result
-------------------------------------
At the moment, the following is being implemented:
* convert_join_subqueries_to_semijoins() handles both conversions to
semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
convert_subq_to_jtbm()).
- This means the process is done in bottom-up way (we walk down into
semi-join subselects but not jtbm-selects), with protection against
exhaustion of table bits beyond #MAX_TABLES, etc.
* convert_subq_to_jtbm() invokes
subq_predicate->setup_engine()
which creates the temporary table, and then we use it as the TABLE* object
that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
One of the first actions is that we assign it a bit and store it in
table->map.
(Note that this won't work when we add a choice between Materialization and
IN->EXISTS)
(Another option is do like done in FROM optimization: create a 'preliminary'
TABLE* object which has only some of the fields filled in).
QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
where needed...
3. Pre-optimization steps
=========================
JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
need:
- run subquery's optimizer to get its output cardinality
- todo what else?
3.1 update_ref_and_keys
-----------------------
Since we've injected the equality into WHERE/ON and created a TABLE* with
appropriate index no special steps are necessary. The regular procedure will
add the needed KEYUSE elements.
3.2 Constant detection
----------------------
* We should not let const table detection code to look at the materialized
table, find it to be empty and mark it as constant.
* We also should disallow eq_ref(const)-based detection for materialized table
That is, if outer_expr is found to be constant, the code should not attempt
to make a eq_ref(const) lookup in the materialized table.
TODO {
Second: What about subqueries that "are constant"?
const_item IN (SELECT uncorrelated) -> is constant, but not something
we would want to evaluate at optimization phase.
something IN (SELECT from_constant_join) -> is constant
Do we need to mark their JOIN_TABs as constant?)
}
4. Join optimization
====================
Should just work, with exception that we need to make best_access_path not
to use table->stats.records but use JOIN_TAB::records instead (which already
exists and is filled)
NOTE: FROM-optimization does set table->stats.records. Whose approach is
wrong?
5. Execution
============
We need that:
1. Materialization operation (i.e. execute the subquery and fill the
materialized temptable) is done at start of execution
2. Materialized table is accessed during execution (either by doing lookups or
full table scan)
3. IN-equality is checked at the right moment.
#2 is achieved by putting the materialized table into join_tab->table.
#3 is achieved by injecting IN-equality into the WHERE.
for #1, we need to place a call that fills the temptable at the right location.
6. User interface
=================
No @@optimizer_switch flags are planned at the moment. JTBM-materialization
will be controlled by existing materialization=on|off switch.
We could also add a generic materialization_scan=on|off switch.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Psergey): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)
by worklog-noreply@askmonty.org 29 Mar '10
by worklog-noreply@askmonty.org 29 Mar '10
29 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 29 Mar 2010, 20:02)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.27113 2010-03-29 20:02:22.000000000 +0000
+++ /tmp/wklog.90.new.27113 2010-03-29 20:02:22.000000000 +0000
@@ -2,16 +2,15 @@
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
-2.2 On subquery predicate removal
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
-2.3 What is expected of the result of conversion
+2.1.1 On subquery predicate removal
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 Conversion process and its result
3. Pre-optimization steps
-3.1 Constant detection
-3.3 update_ref_and_keys
-3.4 JOIN_TAB sorting criteria
-4. Optimization
+3.1 update_ref_and_keys
+3.2 Constant detection
+4. Join optimization
5. Execution
-User interface.
+6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
@@ -19,48 +18,60 @@
1. Applicability check
======================
-The criteria for checking whether a subquery can be processed with
-jtbm-materialization can be checked at JOIN::prepare stage (like it
-happens with semi-join check)
+Applicability criteria of jtbm-materialization can be checked at
+JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
-Make it work like semi-join nests: each jtbm-predicate is converted into a
-TABLE_LIST object. This will make it
+Make it work like semi-join nests: a jtbm-predicate is converted into a
+TABLE_LIST object that is injected into select's join tree. This will make it
- - uniform with semi-joins (we've stepped on all rakes there)
- - allow to process JTBM-subqueries in ON expressions
+ - uniform with semi-join processing (and we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in LEFT JOINs' ON expressions
+ (as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
-for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
-i.e. for
+EXPLAIN EXTENDED will display the conversion result as join, that is, for
+subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
-we'll print
+it will print
- SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+ SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
-the XX part is not clear. we don't want to print 'ie' the second time here?
+After we've added a TABLE_LIST element, we'll need to remove the subquery
+predicate from the WHERE/On clause, in the same way as SJ-converter does it.
-2.2 On subquery predicate removal
----------------------------------
-Q: if we remove the subquery predicate permanently, who will run
-fix_fields() for it? For semi-joins we don't have the problem as we
-inject into ON expression (right? or not? we have sj_on_expr, too...
-(Investigation: we the the same Item* pointer both in WHERE and
-as sj_on_expr. fix_fields() is called for the WHERE part and that's
-how sj_on_expr gets fixed. This works as long as
-Item_func_eq::fix_fields() does not try to substitute itself with
-another item).
+TODO: note: we inject a new kind of base table and then next PS re-execution
+will re-run name resolution/etc on TABLE_LIST tree that includes this table.
+We'll need to make sure that materialized-table doesn't interfere with
+anything.
+
+TODO: Separation of steps that need to be done once and steps that need to be
+re-done for every PS execution.
+
+2.1.1 On subquery predicate removal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: if we remove the subquery predicate permanently, who will call
+fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
+problem as we inject into ON expression
+
+(right? or not? we have sj_on_expr, too...
+ (Investigation: we the the same Item* pointer both in WHERE and
+ as sj_on_expr. fix_fields() is called for the WHERE part and that's
+ how sj_on_expr gets fixed. This works as long as
+ Item_func_eq::fix_fields() does not try to substitute itself with
+ another item)
+)
A: ?
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
@@ -70,71 +81,96 @@
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
-2.3 What is expected of the result of conversion
-------------------------------------------------
-Join [pre]optimization relies on each optimized entity to have a bit in
-table_map.
-
-TODO: where do we check if there will be enough bits for everyone? (at the
- point where we assign them?)
+Note: if we convert that late, we could remove the subquery predicate in the
+same way as remove_eq_conds() removes known-to-be-false AND-parts.
+This operation is undoable.
+
+2.3 Conversion process and its result
+-------------------------------------
+At the moment, the following is being implemented:
+
+* convert_join_subqueries_to_semijoins() handles both conversions to
+ semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
+ convert_subq_to_jtbm()).
+ - This means the process is done in bottom-up way (we walk down into
+ semi-join subselects but not jtbm-selects), with protection against
+ exhaustion of table bits beyond #MAX_TABLES, etc.
+
+* convert_subq_to_jtbm() invokes
+
+ subq_predicate->setup_engine()
+
+ which creates the temporary table, and then we use it as the TABLE* object
+ that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
+ One of the first actions is that we assign it a bit and store it in
+ table->map.
+ (Note that this won't work when we add a choice between Materialization and
+ IN->EXISTS)
-The bit stored in join_tab->table->map, and the apparent problem is that JTBM
-join_tabs do not naturally have TABLE* object.
+ (Another option is do like done in FROM optimization: create a 'preliminary'
+ TABLE* object which has only some of the fields filled in).
-We could use the the one that will be used for Materialization, but that will
-stop working when we will have to include IN->EXISTS in the choice.
-
-Current approach: don't create a table. create a table_map element in JOIN_TAB
-instead. Evgen has probably done something like that already.
+QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
+ where needed...
3. Pre-optimization steps
=========================
-JOIN_TABs are allocated in make_join_statistics(). This where the changes will
-be needed: for JOIN_TABs that correspond to JTBM-tables:
-
-- don't set tab->table, set tab->jtbm_select (or whatever)
+JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
+need:
- run subquery's optimizer to get its output cardinality
+- todo what else?
+
+3.1 update_ref_and_keys
+-----------------------
+Since we've injected the equality into WHERE/ON and created a TABLE* with
+appropriate index no special steps are necessary. The regular procedure will
+add the needed KEYUSE elements.
-3.1 Constant detection
+3.2 Constant detection
----------------------
-What about subqueries that "are constant"?
- const_item IN (SELECT uncorrelated) -> is constant, but not something
- we would want to evaluate.
- something IN (SELECT from_constant_join) -> is constant
+* We should not let const table detection code to look at the materialized
+ table, find it to be empty and mark it as constant.
-Do we need to mark their JOIN_TABs as constant?
+* We also should disallow eq_ref(const)-based detection for materialized table
+ That is, if outer_expr is found to be constant, the code should not attempt
+ to make a eq_ref(const) lookup in the materialized table.
-3.3 update_ref_and_keys
------------------------
-* Walk through JTBM elements and inject KEYUSE elements for their
- IN-equalities.
+TODO {
+ Second: What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate at optimization phase.
+
+ something IN (SELECT from_constant_join) -> is constant
+ Do we need to mark their JOIN_TABs as constant?)
+}
-TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+4. Join optimization
+====================
+Should just work, with exception that we need to make best_access_path not
+to use table->stats.records but use JOIN_TAB::records instead (which already
+exists and is filled)
-3.4 JOIN_TAB sorting criteria
------------------------------
-Q: Where do we put JTBM's join_tab when pre-sorting records?
-A: it should sort as regular table.
-
-TODO: where do we remove the predicates from the WHERE?
- - remove them like SJ-converter does
- - remove them with optimizer (like remove_eq_conds does)
-
-4. Optimization
-===============
-Add a branch in best_access_path to account for
-- JTBM-Materialization
-- JTBM-Materialization-Scan.
+NOTE: FROM-optimization does set table->stats.records. Whose approach is
+ wrong?
5. Execution
============
-* We should be able to reuse item_subselect.cc code for lookups
-* But will have to use our own temptable scan code
+We need that:
+1. Materialization operation (i.e. execute the subquery and fill the
+ materialized temptable) is done at start of execution
+2. Materialized table is accessed during execution (either by doing lookups or
+ full table scan)
+3. IN-equality is checked at the right moment.
+
+#2 is achieved by putting the materialized table into join_tab->table.
+#3 is achieved by injecting IN-equality into the WHERE.
-TODO: is it possible to have any unification with SJ-Materialization?
+for #1, we need to place a call that fills the temptable at the right location.
-User interface
---------------
-Any @@optimizer_switch flags for all this?
+6. User interface
+=================
+No @@optimizer_switch flags are planned at the moment. JTBM-materialization
+will be controlled by existing materialization=on|off switch.
+We could also add a generic materialization_scan=on|off switch.
-=-=(Psergey - Wed, 24 Mar 2010, 14:42)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.19182 2010-03-24 14:42:54.000000000 +0000
+++ /tmp/wklog.90.new.19182 2010-03-24 14:42:54.000000000 +0000
@@ -1 +1,140 @@
+<contents>
+1. Applicability check
+2. Representation
+2.1 Option #1: Convert to TABLE_LIST
+2.2 On subquery predicate removal
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 What is expected of the result of conversion
+3. Pre-optimization steps
+3.1 Constant detection
+3.3 update_ref_and_keys
+3.4 JOIN_TAB sorting criteria
+4. Optimization
+5. Execution
+User interface.
+</contents>
+
+We'll call the new execution strategy "jtbm-materialization", for the lack of
+better name.
+
+1. Applicability check
+======================
+The criteria for checking whether a subquery can be processed with
+jtbm-materialization can be checked at JOIN::prepare stage (like it
+happens with semi-join check)
+
+2. Representation
+=================
+
+2.1 Option #1: Convert to TABLE_LIST
+------------------------------------
+Make it work like semi-join nests: each jtbm-predicate is converted into a
+TABLE_LIST object. This will make it
+
+ - uniform with semi-joins (we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in ON expressions
+
+simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
+tables.
+
+for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
+i.e. for
+
+ SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
+
+we'll print
+
+ SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+
+the XX part is not clear. we don't want to print 'ie' the second time here?
+
+2.2 On subquery predicate removal
+---------------------------------
+Q: if we remove the subquery predicate permanently, who will run
+fix_fields() for it? For semi-joins we don't have the problem as we
+inject into ON expression (right? or not? we have sj_on_expr, too...
+(Investigation: we the the same Item* pointer both in WHERE and
+as sj_on_expr. fix_fields() is called for the WHERE part and that's
+how sj_on_expr gets fixed. This works as long as
+Item_func_eq::fix_fields() does not try to substitute itself with
+another item).
+A: ?
+
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+------------------------------------------------------------
+JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
+- make conversion fully undoable
+- perform it sufficiently late in the optimization process, at the point
+ where JOIN_TABs are already allocated.
+
+Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
+it will be impossible to handle JTBM queries inside/outside of outer joins.
+
+2.3 What is expected of the result of conversion
+------------------------------------------------
+Join [pre]optimization relies on each optimized entity to have a bit in
+table_map.
+
+TODO: where do we check if there will be enough bits for everyone? (at the
+ point where we assign them?)
+
+The bit stored in join_tab->table->map, and the apparent problem is that JTBM
+join_tabs do not naturally have TABLE* object.
+
+We could use the the one that will be used for Materialization, but that will
+stop working when we will have to include IN->EXISTS in the choice.
+
+Current approach: don't create a table. create a table_map element in JOIN_TAB
+instead. Evgen has probably done something like that already.
+
+3. Pre-optimization steps
+=========================
+JOIN_TABs are allocated in make_join_statistics(). This where the changes will
+be needed: for JOIN_TABs that correspond to JTBM-tables:
+
+- don't set tab->table, set tab->jtbm_select (or whatever)
+- run subquery's optimizer to get its output cardinality
+
+3.1 Constant detection
+----------------------
+What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate.
+ something IN (SELECT from_constant_join) -> is constant
+
+Do we need to mark their JOIN_TABs as constant?
+
+3.3 update_ref_and_keys
+-----------------------
+* Walk through JTBM elements and inject KEYUSE elements for their
+ IN-equalities.
+
+TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+
+3.4 JOIN_TAB sorting criteria
+-----------------------------
+Q: Where do we put JTBM's join_tab when pre-sorting records?
+A: it should sort as regular table.
+
+TODO: where do we remove the predicates from the WHERE?
+ - remove them like SJ-converter does
+ - remove them with optimizer (like remove_eq_conds does)
+
+4. Optimization
+===============
+Add a branch in best_access_path to account for
+- JTBM-Materialization
+- JTBM-Materialization-Scan.
+
+5. Execution
+============
+* We should be able to reuse item_subselect.cc code for lookups
+* But will have to use our own temptable scan code
+
+TODO: is it possible to have any unification with SJ-Materialization?
+
+User interface
+--------------
+Any @@optimizer_switch flags for all this?
+
-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007 2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007 2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
for each record R2 in big_table such that oe=R1
pass R2 to output
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882 2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882 2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524 2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524 2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033 2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033 2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+ consider two strategies:
+ A. Materialization and making lookups in the materialized table (if applicable)
+ B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN
+-------
+TODO how this will look in EXPLAIN output?
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903 2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903 2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880 2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880 2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+ WHERE ...
+ GROUP BY group_col) AND ...
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
+ Materialize the subquery;
+ # now run the join:
+ for each record R1 in materialized table
+ for each record R2 in big_table such that oe=R1
+ pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=90&nolimit=1
DESCRIPTION:
Consider the following case:
SELECT * FROM big_table
WHERE oe IN (SELECT ie FROM table_with_few_groups
WHERE ...
GROUP BY group_col) AND ...
Here the best way to execute the query is:
Materialize the subquery;
# now run the join:
for each record R1 in materialized table
for each record R2 in big_table such that oe=R1
pass R2 to output
Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
HIGH-LEVEL SPECIFICATION:
Basic idea on how this could be achieved:
Pre-optimization phase
----------------------
The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization
then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.
Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.
Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
consider two strategies:
A. Materialization and making lookups in the materialized table (if applicable)
B. Materialization and then scanning the materialized table.
EXPLAIN
-------
TODO how this will look in EXPLAIN output?
LOW-LEVEL DESIGN:
<contents>
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
2.1.1 On subquery predicate removal
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
2.3 Conversion process and its result
3. Pre-optimization steps
3.1 update_ref_and_keys
3.2 Constant detection
4. Join optimization
5. Execution
6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
better name.
1. Applicability check
======================
Applicability criteria of jtbm-materialization can be checked at
JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
Make it work like semi-join nests: a jtbm-predicate is converted into a
TABLE_LIST object that is injected into select's join tree. This will make it
- uniform with semi-join processing (and we've stepped on all rakes there)
- allow to process JTBM-subqueries in LEFT JOINs' ON expressions
(as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
EXPLAIN EXTENDED will display the conversion result as join, that is, for
subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
it will print
SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
After we've added a TABLE_LIST element, we'll need to remove the subquery
predicate from the WHERE/On clause, in the same way as SJ-converter does it.
TODO: note: we inject a new kind of base table and then next PS re-execution
will re-run name resolution/etc on TABLE_LIST tree that includes this table.
We'll need to make sure that materialized-table doesn't interfere with
anything.
TODO: Separation of steps that need to be done once and steps that need to be
re-done for every PS execution.
2.1.1 On subquery predicate removal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: if we remove the subquery predicate permanently, who will call
fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
problem as we inject into ON expression
(right? or not? we have sj_on_expr, too...
(Investigation: we the the same Item* pointer both in WHERE and
as sj_on_expr. fix_fields() is called for the WHERE part and that's
how sj_on_expr gets fixed. This works as long as
Item_func_eq::fix_fields() does not try to substitute itself with
another item)
)
A: ?
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
- perform it sufficiently late in the optimization process, at the point
where JOIN_TABs are already allocated.
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
Note: if we convert that late, we could remove the subquery predicate in the
same way as remove_eq_conds() removes known-to-be-false AND-parts.
This operation is undoable.
2.3 Conversion process and its result
-------------------------------------
At the moment, the following is being implemented:
* convert_join_subqueries_to_semijoins() handles both conversions to
semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
convert_subq_to_jtbm()).
- This means the process is done in bottom-up way (we walk down into
semi-join subselects but not jtbm-selects), with protection against
exhaustion of table bits beyond #MAX_TABLES, etc.
* convert_subq_to_jtbm() invokes
subq_predicate->setup_engine()
which creates the temporary table, and then we use it as the TABLE* object
that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
One of the first actions is that we assign it a bit and store it in
table->map.
(Note that this won't work when we add a choice between Materialization and
IN->EXISTS)
(Another option is do like done in FROM optimization: create a 'preliminary'
TABLE* object which has only some of the fields filled in).
QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
where needed...
3. Pre-optimization steps
=========================
JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
need:
- run subquery's optimizer to get its output cardinality
- todo what else?
3.1 update_ref_and_keys
-----------------------
Since we've injected the equality into WHERE/ON and created a TABLE* with
appropriate index no special steps are necessary. The regular procedure will
add the needed KEYUSE elements.
3.2 Constant detection
----------------------
* We should not let const table detection code to look at the materialized
table, find it to be empty and mark it as constant.
* We also should disallow eq_ref(const)-based detection for materialized table
That is, if outer_expr is found to be constant, the code should not attempt
to make a eq_ref(const) lookup in the materialized table.
TODO {
Second: What about subqueries that "are constant"?
const_item IN (SELECT uncorrelated) -> is constant, but not something
we would want to evaluate at optimization phase.
something IN (SELECT from_constant_join) -> is constant
Do we need to mark their JOIN_TABs as constant?)
}
4. Join optimization
====================
Should just work, with exception that we need to make best_access_path not
to use table->stats.records but use JOIN_TAB::records instead (which already
exists and is filled)
NOTE: FROM-optimization does set table->stats.records. Whose approach is
wrong?
5. Execution
============
We need that:
1. Materialization operation (i.e. execute the subquery and fill the
materialized temptable) is done at start of execution
2. Materialized table is accessed during execution (either by doing lookups or
full table scan)
3. IN-equality is checked at the right moment.
#2 is achieved by putting the materialized table into join_tab->table.
#3 is achieved by injecting IN-equality into the WHERE.
for #1, we need to place a call that fills the temptable at the right location.
6. User interface
=================
No @@optimizer_switch flags are planned at the moment. JTBM-materialization
will be controlled by existing materialization=on|off switch.
We could also add a generic materialization_scan=on|off switch.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2835: Fixed random failure in test system
by noreply@launchpad.net 29 Mar '10
by noreply@launchpad.net 29 Mar '10
29 Mar '10
------------------------------------------------------------
revno: 2835
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: maria-5.1
timestamp: Mon 2010-03-29 22:07:45 +0300
message:
Fixed random failure in test system
Removed and suppressed compiler warnings
modified:
mysql-test/suite/parts/t/rpl_partition.test
mysql-test/suite/rpl/t/rpl_row_trig003.test
sql/item_create.cc
sql/net_serv.cc
support-files/compiler_warnings.supp
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
1
0
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (monty:2835)
by Michael Widenius 29 Mar '10
by Michael Widenius 29 Mar '10
29 Mar '10
#At lp:maria based on revid:monty@askmonty.org-20100328181000-qyehmg51jkgpht8w
2835 Michael Widenius 2010-03-29
Fixed random failure in test system
Removed and suppressed compiler warnings
modified:
mysql-test/suite/parts/t/rpl_partition.test
mysql-test/suite/rpl/t/rpl_row_trig003.test
sql/item_create.cc
sql/net_serv.cc
support-files/compiler_warnings.supp
per-file messages:
mysql-test/suite/parts/t/rpl_partition.test
Mark as big test (as test takes > 5 minutes)
mysql-test/suite/rpl/t/rpl_row_trig003.test
Fixed random failure
sql/item_create.cc
Renamed create -> create_builder() to avoid compiler warnings about function hidden by others
sql/net_serv.cc
Removed compiler warning
support-files/compiler_warnings.supp
Supress some warnings
Remove suppresstion from set_var.cc, as this has already been fixed
Added missing \ before .c
Added suppression of warnings that are due to gcc bug (.*discards qualifiers from pointer target type.*)
=== modified file 'mysql-test/suite/parts/t/rpl_partition.test'
--- a/mysql-test/suite/parts/t/rpl_partition.test 2009-02-01 12:00:48 +0000
+++ b/mysql-test/suite/parts/t/rpl_partition.test 2010-03-29 19:07:45 +0000
@@ -1,6 +1,7 @@
--source include/have_partition.inc
--source include/have_innodb.inc
--source include/master-slave.inc
+--source include/big_test.inc
--vertical_results
=== modified file 'mysql-test/suite/rpl/t/rpl_row_trig003.test'
--- a/mysql-test/suite/rpl/t/rpl_row_trig003.test 2007-12-06 15:27:10 +0000
+++ b/mysql-test/suite/rpl/t/rpl_row_trig003.test 2010-03-29 19:07:45 +0000
@@ -149,6 +149,8 @@ DROP TABLE IF EXISTS test.t2;
DROP TABLE IF EXISTS test.t3;
--enable_warnings
+sync_slave_with_master;
+
diff_files $MYSQLTEST_VARDIR/tmp/trg003_master.sql $MYSQLTEST_VARDIR/tmp/trg003_slave.sql;
# End of 5.0 test case
=== modified file 'sql/item_create.cc'
--- a/sql/item_create.cc 2010-03-28 18:10:00 +0000
+++ b/sql/item_create.cc 2010-03-29 19:07:45 +0000
@@ -76,7 +76,7 @@ public:
@param thd The current thread
@return An item representing the function call
*/
- virtual Item *create(THD *thd) = 0;
+ virtual Item *create_builder(THD *thd) = 0;
protected:
/** Constructor. */
@@ -498,7 +498,7 @@ protected:
class Create_func_connection_id : public Create_func_arg0
{
public:
- virtual Item *create(THD *thd);
+ virtual Item *create_builder(THD *thd);
static Create_func_connection_id s_singleton;
@@ -943,7 +943,7 @@ protected:
class Create_func_found_rows : public Create_func_arg0
{
public:
- virtual Item *create(THD *thd);
+ virtual Item *create_builder(THD *thd);
static Create_func_found_rows s_singleton;
@@ -1647,7 +1647,7 @@ protected:
class Create_func_pi : public Create_func_arg0
{
public:
- virtual Item *create(THD *thd);
+ virtual Item *create_builder(THD *thd);
static Create_func_pi s_singleton;
@@ -1766,7 +1766,7 @@ protected:
class Create_func_row_count : public Create_func_arg0
{
public:
- virtual Item *create(THD *thd);
+ virtual Item *create_builder(THD *thd);
static Create_func_row_count s_singleton;
@@ -2136,7 +2136,7 @@ protected:
class Create_func_uuid : public Create_func_arg0
{
public:
- virtual Item *create(THD *thd);
+ virtual Item *create_builder(THD *thd);
static Create_func_uuid s_singleton;
@@ -2149,7 +2149,7 @@ protected:
class Create_func_uuid_short : public Create_func_arg0
{
public:
- virtual Item *create(THD *thd);
+ virtual Item *create_builder(THD *thd);
static Create_func_uuid_short s_singleton;
@@ -2162,7 +2162,7 @@ protected:
class Create_func_version : public Create_func_arg0
{
public:
- virtual Item *create(THD *thd);
+ virtual Item *create_builder(THD *thd);
static Create_func_version s_singleton;
@@ -2538,7 +2538,7 @@ Create_func_arg0::create(THD *thd, LEX_S
return NULL;
}
- return create(thd);
+ return create_builder(thd);
}
@@ -2882,7 +2882,7 @@ Create_func_compress::create_1_arg(THD *
Create_func_connection_id Create_func_connection_id::s_singleton;
Item*
-Create_func_connection_id::create(THD *thd)
+Create_func_connection_id::create_builder(THD *thd)
{
thd->lex->safe_to_cache_query= 0;
return new (thd->mem_root) Item_func_connection_id();
@@ -3361,7 +3361,7 @@ Create_func_format::create_2_arg(THD *th
Create_func_found_rows Create_func_found_rows::s_singleton;
Item*
-Create_func_found_rows::create(THD *thd)
+Create_func_found_rows::create_builder(THD *thd)
{
thd->lex->set_stmt_unsafe();
thd->lex->safe_to_cache_query= 0;
@@ -4120,7 +4120,7 @@ Create_func_period_diff::create_2_arg(TH
Create_func_pi Create_func_pi::s_singleton;
Item*
-Create_func_pi::create(THD *thd)
+Create_func_pi::create_builder(THD *thd)
{
return new (thd->mem_root) Item_static_float_func("pi()", M_PI, 6, 8);
}
@@ -4274,7 +4274,7 @@ Create_func_round::create_native(THD *th
Create_func_row_count Create_func_row_count::s_singleton;
Item*
-Create_func_row_count::create(THD *thd)
+Create_func_row_count::create_builder(THD *thd)
{
thd->lex->set_stmt_unsafe();
thd->lex->safe_to_cache_query= 0;
@@ -4584,7 +4584,7 @@ Create_func_unix_timestamp::create_nativ
Create_func_uuid Create_func_uuid::s_singleton;
Item*
-Create_func_uuid::create(THD *thd)
+Create_func_uuid::create_builder(THD *thd)
{
thd->lex->set_stmt_unsafe();
thd->lex->safe_to_cache_query= 0;
@@ -4595,7 +4595,7 @@ Create_func_uuid::create(THD *thd)
Create_func_uuid_short Create_func_uuid_short::s_singleton;
Item*
-Create_func_uuid_short::create(THD *thd)
+Create_func_uuid_short::create_builder(THD *thd)
{
thd->lex->set_stmt_unsafe();
thd->lex->safe_to_cache_query= 0;
@@ -4606,7 +4606,7 @@ Create_func_uuid_short::create(THD *thd)
Create_func_version Create_func_version::s_singleton;
Item*
-Create_func_version::create(THD *thd)
+Create_func_version::create_builder(THD *thd)
{
thd->lex->set_stmt_unsafe();
return new (thd->mem_root) Item_static_string_func("version()",
=== modified file 'sql/net_serv.cc'
--- a/sql/net_serv.cc 2010-02-23 12:04:58 +0000
+++ b/sql/net_serv.cc 2010-03-29 19:07:45 +0000
@@ -279,7 +279,7 @@ static int net_data_is_ready(my_socket s
@param clear_buffer if <> 0, then clear all data from comm buff
*/
-void net_clear(NET *net, my_bool clear_buffer)
+void net_clear(NET *net, my_bool clear_buffer __attribute__((unused)))
{
#if !defined(EMBEDDED_LIBRARY) && defined(DBUG_OFF)
size_t count;
=== modified file 'support-files/compiler_warnings.supp'
--- a/support-files/compiler_warnings.supp 2010-03-28 18:10:00 +0000
+++ b/support-files/compiler_warnings.supp 2010-03-29 19:07:45 +0000
@@ -39,6 +39,7 @@ sync/sync0rw\.c: unused parameter
sync/sync0sync\.c: unused parameter
sync/sync0sync\.c: unused variable
ut/ut0ut\.c: ignoring return value of
+srv/srv0srv\.c: value computed is not used
#
# bdb is not critical to keep up to date
@@ -128,18 +129,18 @@ include/runtime.hpp: .*pure_error.*
#
listener.cc : .*conversion from 'SOCKET' to 'int'.*
net_serv.cc : .*conversion from 'SOCKET' to 'int'.*
-set_var.cc: right-hand operand of comma has no effect : 1000-1400
-
# allow a little moving space for the warning below
-mi_packrec.c : .*result of 32-bit shift implicitly converted to 64 bits.* : 560-600
-ma_packrec.c : .*result of 32-bit shift implicitly converted to 64 bits.* : 550-650
+mi_packrec\.c : .*result of 32-bit shift implicitly converted to 64 bits.* : 560-600
+ma_packrec\.c : .*result of 32-bit shift implicitly converted to 64 bits.* : 550-650
#
# Wrong compiler warnings
#
.* : .*no matching operator delete found; memory will not be freed if initialization throws an exception.*
-ctype-simple.c : .*unary minus operator applied to unsigned type, result still unsigned.*
+ctype-simple\.c : .*unary minus operator applied to unsigned type, result still unsigned.*
# Wrong warning due to GCC bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29478
-regexec\.c : .*passing argument 3 of.*matcher.* discards qualifiers from pointer target type.*
+regexec\.c : passing argument 3 of.*matcher.* discards qualifiers from pointer target type
+libmysql\.c: passing argument 2 of .*memcpy.* discards qualifiers from pointer target type : 3000-4000
+storage/xtradb/dict/dict0dict\.c: passing argument 1 of .*strcpy.* discards qualifiers from pointer target type : 2500-3500
1
0
[Maria-developers] Rev 2758: Atomic operation removed because we do not need it. in file:///home/bell/maria/bzr/work-maria-5.2-sync/
by sanja@askmonty.org 29 Mar '10
by sanja@askmonty.org 29 Mar '10
29 Mar '10
At file:///home/bell/maria/bzr/work-maria-5.2-sync/
------------------------------------------------------------
revno: 2758
revision-id: sanja(a)askmonty.org-20100329185318-s8s6lsuvm12oi8yb
parent: sergii(a)pisem.net-20100329151428-zkvrkidiyqmb19pn
committer: sanja(a)askmonty.org
branch nick: work-maria-5.2-sync
timestamp: Mon 2010-03-29 21:53:18 +0300
message:
Atomic operation removed because we do not need it.
=== modified file 'storage/maria/ma_loghandler.c'
--- a/storage/maria/ma_loghandler.c 2010-03-15 11:51:23 +0000
+++ b/storage/maria/ma_loghandler.c 2010-03-29 18:53:18 +0000
@@ -8060,18 +8060,14 @@
/* keep values for soft sync() and forced sync() actual */
{
uint32 fileno= LSN_FILE_NO(lsn);
- my_atomic_rwlock_wrlock(&soft_sync_rwl);
- my_atomic_store32(&soft_sync_min, fileno);
- my_atomic_store32(&soft_sync_max, fileno);
- my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ soft_sync_min= fileno;
+ soft_sync_max= fileno;
}
}
else
{
- my_atomic_rwlock_wrlock(&soft_sync_rwl);
- my_atomic_store32(&soft_sync_max, LSN_FILE_NO(lsn));
- my_atomic_store32(&soft_need_sync, 1);
- my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ soft_sync_max= lsn;
+ soft_need_sync= 1;
}
DBUG_ASSERT(flush_horizon <= log_descriptor.horizon);
@@ -8464,9 +8460,7 @@
translog_status == TRANSLOG_READONLY);
soft= soft_sync;
- my_atomic_rwlock_wrlock(&soft_sync_rwl);
- min_unsync= my_atomic_load32(&soft_sync_min);
- my_atomic_rwlock_wrunlock(&soft_sync_rwl);
+ min_unsync= soft_sync_min;
DBUG_PRINT("info", ("min_unsync: %lu", (ulong) min_unsync));
if (soft && min_unsync < last_need_file)
{
@@ -8748,9 +8742,7 @@
uint32 min;
DBUG_ENTER("ma_translog_sync");
- my_atomic_rwlock_rdlock(&soft_sync_rwl);
- min= my_atomic_load32(&soft_sync_min);
- my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+ min= soft_sync_min;
if (!min)
min= max;
@@ -8796,13 +8788,11 @@
ulonglong prev_loop= my_micro_time();
ulonglong time, sleep;
uint32 min, max, sync_request;
- my_atomic_rwlock_rdlock(&soft_sync_rwl);
- min= my_atomic_load32(&soft_sync_min);
- max= my_atomic_load32(&soft_sync_max);
- sync_request= my_atomic_load32(&soft_need_sync);
- my_atomic_store32(&soft_sync_min, max);
- my_atomic_store32(&soft_need_sync, 0);
- my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+ min= soft_sync_min;
+ max= soft_sync_max;
+ sync_request= soft_need_sync;
+ soft_sync_min= max;
+ soft_need_sync= 0;
sleep= group_commit_wait;
if (sync_request)
@@ -8834,15 +8824,13 @@
DBUG_ENTER("translog_soft_sync_start");
/* check and init variables */
- my_atomic_rwlock_rdlock(&soft_sync_rwl);
- min= my_atomic_load32(&soft_sync_min);
- max= my_atomic_load32(&soft_sync_max);
+ min= soft_sync_min;
+ max= soft_sync_max;
if (!max)
- my_atomic_store32(&soft_sync_max, (max= get_current_logfile()->number));
+ soft_sync_max= max= get_current_logfile()->number;
if (!min)
- my_atomic_store32(&soft_sync_min, max);
- my_atomic_store32(&soft_need_sync, 1);
- my_atomic_rwlock_rdunlock(&soft_sync_rwl);
+ soft_sync_min= max;
+ soft_need_sync= 1;
if (!(res= ma_service_thread_control_init(&soft_sync_control)))
if (!(res= pthread_create(&th, NULL, ma_soft_sync_background, NULL)))
1
0
[Maria-developers] Updated (by Knielsen): Merge OQGraph into MariaDB (112)
by worklog-noreply@askmonty.org 29 Mar '10
by worklog-noreply@askmonty.org 29 Mar '10
29 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Merge OQGraph into MariaDB
CREATION DATE..: Mon, 29 Mar 2010, 18:00
SUPERVISOR.....: Knielsen
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 112 (http://askmonty.org/worklog/?tid=112)
VERSION........: Server-5.2
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 15 (hours remain)
ORIG. ESTIMATE.: 15
PROGRESS NOTES:
-=-=(Knielsen - Mon, 29 Mar 2010, 18:09)=-=-
High-Level Specification modified.
--- /tmp/wklog.112.old.23061 2010-03-29 18:09:27.000000000 +0000
+++ /tmp/wklog.112.new.23061 2010-03-29 18:09:27.000000000 +0000
@@ -1 +1,28 @@
+Tasks:
+
+Find the latest version of OQGraph to base this on (there should be a
+Launchpad branch somewhere, match it up with what is in the OQGraph patch for
+MySQL 5.0 in the ourdelta stuff).
+
+Extract the correct version of Boost from the MySQL 5.0 ourdelta patch. This
+is a patched version of Boost fixing a bug that is supposedly fatal for
+OQGraph (details are not known at the time of writing).
+
+Document in OQGraph README the need for boost of a specific version, and point
+to where it can be obtained. Also include the patch for boost if the correct
+base version of boost to do this against can be determined.
+
+Install the patched boost in /usr/local/ on the build machines (release builds
+and selected Buildbot slaves).
+
+Fix OQGraph plug.in to detect correct version of OQGraph that makes the build
+not break. Check which version in Ubuntu starts working (I think it was
+Jaunty), and require at least that version.
+
+Setup some repository or source tarball of the patched boost
+somewhere. Preferably a Launchpad branch or similar (if upstream project can
+be found).
+
+Setup in plug.in or /configure.in appropriate --with-boost=xxx. Or in a pinch,
+we can make do with CFLAGS=-Ixxx, or even default look in /usr/local/.
DESCRIPTION:
Get the OQGraph storage engine merged into MariaDB, fixing the remaining
problems blocking the merge.
HIGH-LEVEL SPECIFICATION:
Tasks:
Find the latest version of OQGraph to base this on (there should be a
Launchpad branch somewhere, match it up with what is in the OQGraph patch for
MySQL 5.0 in the ourdelta stuff).
Extract the correct version of Boost from the MySQL 5.0 ourdelta patch. This
is a patched version of Boost fixing a bug that is supposedly fatal for
OQGraph (details are not known at the time of writing).
Document in OQGraph README the need for boost of a specific version, and point
to where it can be obtained. Also include the patch for boost if the correct
base version of boost to do this against can be determined.
Install the patched boost in /usr/local/ on the build machines (release builds
and selected Buildbot slaves).
Fix OQGraph plug.in to detect correct version of OQGraph that makes the build
not break. Check which version in Ubuntu starts working (I think it was
Jaunty), and require at least that version.
Setup some repository or source tarball of the patched boost
somewhere. Preferably a Launchpad branch or similar (if upstream project can
be found).
Setup in plug.in or /configure.in appropriate --with-boost=xxx. Or in a pinch,
we can make do with CFLAGS=-Ixxx, or even default look in /usr/local/.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] New (by Knielsen): Merge OQGraph into MariaDB (112)
by worklog-noreply@askmonty.org 29 Mar '10
by worklog-noreply@askmonty.org 29 Mar '10
29 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Merge OQGraph into MariaDB
CREATION DATE..: Mon, 29 Mar 2010, 18:00
SUPERVISOR.....: Knielsen
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 112 (http://askmonty.org/worklog/?tid=112)
VERSION........: Server-5.2
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 15 (hours remain)
ORIG. ESTIMATE.: 15
PROGRESS NOTES:
DESCRIPTION:
Get the OQGraph storage engine merged into MariaDB, fixing the remaining
problems blocking the merge.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] Updated (by Guest): Make EXPLAIN always show materialization separately (110)
by worklog-noreply@askmonty.org 29 Mar '10
by worklog-noreply@askmonty.org 29 Mar '10
29 Mar '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make EXPLAIN always show materialization separately
CREATION DATE..: Mon, 29 Mar 2010, 06:45
SUPERVISOR.....: Igor
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 110 (http://askmonty.org/worklog/?tid=110)
VERSION........: Server-5.3
STATUS.........: Complete
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 29 Mar 2010, 14:09)=-=-
Status updated.
--- /tmp/wklog.110.old.11193 2010-03-29 14:09:27.000000000 +0000
+++ /tmp/wklog.110.new.11193 2010-03-29 14:09:27.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Complete
-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Low Level Design modified.
--- /tmp/wklog.110.old.11745 2010-03-29 06:46:30.000000000 +0000
+++ /tmp/wklog.110.new.11745 2010-03-29 06:46:30.000000000 +0000
@@ -1 +1,8 @@
+For now, all changes will be in select_describe():
+- In the for-each-join-table loop, when we've reached a line where we would
+ have printed "[Start ]Materialize;" in Extra column, remember the table
+ number, and emit a materialized table access line instead
+- After the loop, do another loop over remembered materialization nests and
+ print them (a possible difficulty: do we remember what select# they are
+ from?)
-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
High-Level Specification modified.
--- /tmp/wklog.110.old.11654 2010-03-29 06:46:19.000000000 +0000
+++ /tmp/wklog.110.new.11654 2010-03-29 06:46:19.000000000 +0000
@@ -1 +1,15 @@
+Materialized table access line will look as follows:
+Table name
+----------
+* Table name will be "SUBQUERY#%d" where %d will refer to the id of first
+ select in the subquery (when the subquery is a UNION it would be better
+ to refe to the union-operation line but it has id=NULL so it's not easy
+ to refer to it)
+
+Access method
+-------------
+* SJ-Materialization-lookup will have eq_ref on 'distinct_key'
+* SJ-Materialization-scan will have access method ALL, with #rows being
+ expected number of records in the temp table (i.e. after duplicates are
+ removed)
-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Category updated.
--- /tmp/wklog.110.old.11639 2010-03-29 06:46:02.000000000 +0000
+++ /tmp/wklog.110.new.11639 2010-03-29 06:46:02.000000000 +0000
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Version updated.
--- /tmp/wklog.110.old.11639 2010-03-29 06:46:02.000000000 +0000
+++ /tmp/wklog.110.new.11639 2010-03-29 06:46:02.000000000 +0000
@@ -1 +1 @@
-Benchmarks-3.0
+9.x
-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Version updated.
--- /tmp/wklog.110.old.11639 2010-03-29 06:46:02.000000000 +0000
+++ /tmp/wklog.110.new.11639 2010-03-29 06:46:02.000000000 +0000
@@ -1 +1 @@
-9.x
+Server-5.3
DESCRIPTION:
At the moment, SJM-Materialization is shown in EXPLAIN output in this way:
MariaDB [j45]> explain select * from ot where a in (select b from it1);
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | ot | ALL | NULL | NULL | NULL | NULL | 10
| |
| 1 | PRIMARY | it1 | ALL | NULL | NULL | NULL | NULL | 10
| Materialize |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
MariaDB [j45]> explain select * from ot where a in (select it1.b from it1, it2);
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| 1 | PRIMARY | ot | ALL | NULL | NULL | NULL | NULL | 10
| |
| 1 | PRIMARY | it1 | ALL | NULL | NULL | NULL | NULL | 10
| Start materialize |
| 1 | PRIMARY | it2 | ALL | NULL | NULL | NULL | NULL | 10
| End materialize; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
This WL task is to change the output format so it will look as follows:
- Tables inside the SJM-nest are displayed as a separate select
- within the master select, there is a line that denotes SJM-lookup or
SJM-Scan.
The above examples will look as follows:
MariaDB [j45]> explain select * from ot where a in (select b from it1);
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------+
| 1 | PRIMARY | ot | ALL | NULL | NULL | NULL
| NULL | 10 | |
| 1 | PRIMARY | SUBQUERY#2 | eq_ref | distinct_key | distinct_key | 5
| j45.ot.a | 1 | |
| 2 | SUBQUERY | it1 | ALL | NULL | NULL | NULL
| NULL | 10 | |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------+
MariaDB [j45]> explain select * from ot where a in (select it1.b from it1, it2);
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------------------+
| 1 | PRIMARY | ot | ALL | NULL | NULL | NULL
| NULL | 10 | |
| 1 | PRIMARY | SUBQUERY#2 | eq_ref | distinct_key | distinct_key | 5
| j45.ot.a | 1 | |
| 2 | SUBQUERY | it1 | ALL | NULL | NULL | NULL
| NULL | 10 | |
| 2 | SUBQUERY | it2 | ALL | NULL | NULL | NULL
| NULL | 10 | Using join buffer |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------------------+
The rationale behind the change is:
- Unification of EXPLAIN output with MWL#90
- The new format is more natural representation of what is going on,
conceptually-wise (and may be soon be code-wise)
- The new format allows to display E(#records-in-temp-table) for the
SJM-Scan case (and for SJM-lookup that number doesn't matter that much)
- The new format doesn't put anything into "Extra" column and that's good
because that column is already overloaded and horizontal screen space is
precious (while vertical is not so much).
HIGH-LEVEL SPECIFICATION:
Materialized table access line will look as follows:
Table name
----------
* Table name will be "SUBQUERY#%d" where %d will refer to the id of first
select in the subquery (when the subquery is a UNION it would be better
to refe to the union-operation line but it has id=NULL so it's not easy
to refer to it)
Access method
-------------
* SJ-Materialization-lookup will have eq_ref on 'distinct_key'
* SJ-Materialization-scan will have access method ALL, with #rows being
expected number of records in the temp table (i.e. after duplicates are
removed)
LOW-LEVEL DESIGN:
For now, all changes will be in select_describe():
- In the for-each-join-table loop, when we've reached a line where we would
have printed "[Start ]Materialize;" in Extra column, remember the table
number, and emit a materialized table access line instead
- After the loop, do another loop over remembered materialization nests and
print them (a possible difficulty: do we remember what select# they are
from?)
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0