
[Maria-developers] New (by Psergey): Make EXPLAIN show where subquery predicates are in the WHERE clause (111)
by worklog-noreply＠askmonty.org 29 Mar '10

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make EXPLAIN show where subquery predicates are in the WHERE clause
CREATION DATE..: Mon, 29 Mar 2010, 07:07
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 111 (http://askmonty.org/worklog/?tid=111)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

DESCRIPTION:

Current EXPLAIN does not show where the subquery predicate is attached to.
For example, see [1], slide#5 "Straightforward subquery evaluation (contd)",
or look here:

MariaDB [test]> explain select * from ot1, ot2 where ot1.a=ot2.a and (ot2.a in (select it1.b from it1) or ot1.b<3);
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                          |
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
|  1 | PRIMARY     | ot1   | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                    |
|  1 | PRIMARY     | ot2   | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; Using join buffer |
|  2 | SUBQUERY    | it1   | ALL  | NULL          | NULL | NULL    | NULL |   20 |                                |
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
3 rows in set (0.01 sec)

Here there are two "Using where", and it is not clear where the predicate is
attached to.

If one has sufficient knowledge, they could know that
- "ot2.a in (select )" will be substituted for "ot1.a" (provided datatypes
  allow equality propagation)
- For correlated subqueries, equality propagation will not affect outside
  references (so, if the subquery in the above example was correlated, it
  would have been attached to table ot2, not ot1).

As one can see, the rules are quite complicated.

The full solution would be to show expressions behind "Using Where", but that
has additional complications due to expressions being too long for current
EXPLAIN output format.

A simplified solution would be to show "Subquery#n" in Extra column if the
clause in "Using Where" has a subquery.

[1] http://forge.mysql.com/wiki/Image:NewSubqueryOptimizationsIn6_0_UC2008.pdf

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
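To make the simplified solution concrete, here is a small standalone model of it in Python. It is only a sketch, not server code: PlanRow, its attached_subqueries field and annotate_extra() are hypothetical names invented for this illustration, and the exact position of "Subquery#n" within Extra is an assumption. It also assumes, as the description suggests for the uncorrelated case, that the subquery predicate ends up attached to ot1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlanRow:
    """One EXPLAIN line, reduced to the columns this sketch needs."""
    select_id: int
    table: str
    extra: List[str] = field(default_factory=list)
    # Hypothetical field: ids of the subquery selects referenced by the
    # condition attached to this table.
    attached_subqueries: List[int] = field(default_factory=list)


def annotate_extra(rows: List[PlanRow]) -> List[Tuple[int, str, str]]:
    """Append "Subquery#n" to Extra where the attached condition has a subquery."""
    out = []
    for row in rows:
        extra = list(row.extra)
        if "Using where" in extra:
            extra.extend(f"Subquery#{n}" for n in row.attached_subqueries)
        out.append((row.select_id, row.table, "; ".join(extra)))
    return out


# The example query from the description; attaching the subquery to ot1 is
# an assumption based on the equality propagation rule described above.
rows = [
    PlanRow(1, "ot1", ["Using where"], [2]),
    PlanRow(1, "ot2", ["Using where", "Using join buffer"]),
    PlanRow(2, "it1"),
]
annotated = annotate_extra(rows)
for line in annotated:
    print(line)
```

With such an annotation the two "Using where" lines become distinguishable: only the ot1 line would carry "Subquery#2".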

[Maria-developers] Updated (by Psergey): Make EXPLAIN always show materialization separately (110)
by worklog-noreply＠askmonty.org 29 Mar '10

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make EXPLAIN always show materialization separately
CREATION DATE..: Mon, 29 Mar 2010, 06:45
SUPERVISOR.....: Igor
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 110 (http://askmonty.org/worklog/?tid=110)
VERSION........: Server-5.3
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Low Level Design modified.
--- /tmp/wklog.110.old.11745 2010-03-29 06:46:30.000000000 +0000
+++ /tmp/wklog.110.new.11745 2010-03-29 06:46:30.000000000 +0000
@@ -1 +1,8 @@
+For now, all changes will be in select_describe():
+- In the for-each-join-table loop, when we've reached a line where we would
+  have printed "[Start ]Materialize;" in Extra column, remember the table
+  number, and emit a materialized table access line instead
+- After the loop, do another loop over remembered materialization nests and
+  print them (a possible difficulty: do we remember what select# they are
+  from?)

-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
High-Level Specification modified.
--- /tmp/wklog.110.old.11654 2010-03-29 06:46:19.000000000 +0000
+++ /tmp/wklog.110.new.11654 2010-03-29 06:46:19.000000000 +0000
@@ -1 +1,15 @@
+Materialized table access line will look as follows:
+Table name
+----------
+* Table name will be "SUBQUERY#%d" where %d will refer to the id of first
+  select in the subquery (when the subquery is a UNION it would be better
+  to refer to the union-operation line but it has id=NULL so it's not easy
+  to refer to it)
+
+Access method
+-------------
+* SJ-Materialization-lookup will have eq_ref on 'distinct_key'
+* SJ-Materialization-scan will have access method ALL, with #rows being
+  expected number of records in the temp table (i.e. after duplicates are
+  removed)

-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Category updated.
--- /tmp/wklog.110.old.11639 2010-03-29 06:46:02.000000000 +0000
+++ /tmp/wklog.110.new.11639 2010-03-29 06:46:02.000000000 +0000
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin

-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Version updated.
--- /tmp/wklog.110.old.11639 2010-03-29 06:46:02.000000000 +0000
+++ /tmp/wklog.110.new.11639 2010-03-29 06:46:02.000000000 +0000
@@ -1 +1 @@
-Benchmarks-3.0
+9.x

-=-=(Psergey - Mon, 29 Mar 2010, 06:46)=-=-
Version updated.
--- /tmp/wklog.110.old.11639 2010-03-29 06:46:02.000000000 +0000
+++ /tmp/wklog.110.new.11639 2010-03-29 06:46:02.000000000 +0000
@@ -1 +1 @@
-9.x
+Server-5.3

DESCRIPTION:

At the moment, SJ-Materialization is shown in EXPLAIN output in this way:

MariaDB [j45]> explain select * from ot where a in (select b from it1);
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY     | ot    | ALL  | NULL          | NULL | NULL    | NULL |   10 |             |
|  1 | PRIMARY     | it1   | ALL  | NULL          | NULL | NULL    | NULL |   10 | Materialize |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+

MariaDB [j45]> explain select * from ot where a in (select it1.b from it1, it2);
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
|  1 | PRIMARY     | ot    | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
|  1 | PRIMARY     | it1   | ALL  | NULL          | NULL | NULL    | NULL |   10 | Start materialize                  |
|  1 | PRIMARY     | it2   | ALL  | NULL          | NULL | NULL    | NULL |   10 | End materialize; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+

This WL task is to change the output format so it will look as follows:
- Tables inside the SJM-nest are displayed as a separate select
- Within the master select, there is a line that denotes SJM-lookup or
  SJM-Scan.

The above examples will look as follows:

MariaDB [j45]> explain select * from ot where a in (select b from it1);
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------+
| id | select_type | table      | type   | possible_keys | key          | key_len | ref      | rows | Extra |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------+
|  1 | PRIMARY     | ot         | ALL    | NULL          | NULL         | NULL    | NULL     |   10 |       |
|  1 | PRIMARY     | SUBQUERY#2 | eq_ref | distinct_key  | distinct_key | 5       | j45.ot.a |    1 |       |
|  2 | SUBQUERY    | it1        | ALL    | NULL          | NULL         | NULL    | NULL     |   10 |       |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------+

MariaDB [j45]> explain select * from ot where a in (select it1.b from it1, it2);
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------------------+
| id | select_type | table      | type   | possible_keys | key          | key_len | ref      | rows | Extra             |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------------------+
|  1 | PRIMARY     | ot         | ALL    | NULL          | NULL         | NULL    | NULL     |   10 |                   |
|  1 | PRIMARY     | SUBQUERY#2 | eq_ref | distinct_key  | distinct_key | 5       | j45.ot.a |    1 |                   |
|  2 | SUBQUERY    | it1        | ALL    | NULL          | NULL         | NULL    | NULL     |   10 |                   |
|  2 | SUBQUERY    | it2        | ALL    | NULL          | NULL         | NULL    | NULL     |   10 | Using join buffer |
+----+-------------+------------+--------+---------------+--------------+---------+----------+------+-------------------+

The rationale behind the change is:
- Unification of EXPLAIN output with MWL#90
- The new format is a more natural representation of what is going on
  conceptually (and may soon be code-wise, too)
- The new format allows displaying E(#records-in-temp-table) for the SJM-Scan
  case (for SJM-lookup that number doesn't matter that much)
- The new format doesn't put anything into the "Extra" column, which is good
  because that column is already overloaded and horizontal screen space is
  precious (while vertical is not so much).

HIGH-LEVEL SPECIFICATION:

Materialized table access line will look as follows:

Table name
----------
* Table name will be "SUBQUERY#%d" where %d will refer to the id of first
  select in the subquery (when the subquery is a UNION it would be better
  to refer to the union-operation line but it has id=NULL so it's not easy
  to refer to it)

Access method
-------------
* SJ-Materialization-lookup will have eq_ref on 'distinct_key'
* SJ-Materialization-scan will have access method ALL, with #rows being
  expected number of records in the temp table (i.e. after duplicates are
  removed)

LOW-LEVEL DESIGN:

For now, all changes will be in select_describe():
- In the for-each-join-table loop, when we've reached a line where we would
  have printed "[Start ]Materialize;" in Extra column, remember the table
  number, and emit a materialized table access line instead
- After the loop, do another loop over remembered materialization nests and
  print them (a possible difficulty: do we remember what select# they are
  from?)

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
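The two loops from the low-level design can be modelled outside the server roughly as follows. This is a Python sketch under stated assumptions, not the actual select_describe() code: JoinTab and its sjm_* fields and describe() are hypothetical names invented here, the key, key_len and ref columns are left out, and the outer select_type is hard-coded to PRIMARY. The sketch sidesteps the "what select# are they from" difficulty by assuming each table already knows the id of the first select of its nest.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (id, select_type, table, type, rows, Extra)
Line = Tuple[int, str, str, str, int, str]


@dataclass
class JoinTab:
    table: str
    rows: int
    extra: str = ""
    # Hypothetical fields: id of the first select of the SJM nest this table
    # belongs to (None when the table is outside any nest), the chosen
    # strategy, and the expected temp table size after duplicate removal.
    sjm_select_id: Optional[int] = None
    sjm_strategy: str = "lookup"  # "lookup" or "scan"
    sjm_temp_rows: int = 1


def describe(select_id: int, join_tabs: List[JoinTab]) -> List[Line]:
    lines: List[Line] = []
    nests: Dict[int, List[JoinTab]] = {}  # remembered nests, in join order

    # First loop: ordinary tables are printed as usual; the first table of a
    # nest is replaced by one materialized-table access line.
    for tab in join_tabs:
        nest_id = tab.sjm_select_id
        if nest_id is None:
            lines.append((select_id, "PRIMARY", tab.table, "ALL", tab.rows, tab.extra))
            continue
        if nest_id not in nests:
            nests[nest_id] = []
            if tab.sjm_strategy == "lookup":
                access, rows = "eq_ref", 1
            else:
                access, rows = "ALL", tab.sjm_temp_rows
            lines.append((select_id, "PRIMARY", f"SUBQUERY#{nest_id}", access, rows, ""))
        nests[nest_id].append(tab)

    # Second loop: print the remembered nests as separate selects.
    for nest_id, tabs in nests.items():
        for tab in tabs:
            lines.append((nest_id, "SUBQUERY", tab.table, "ALL", tab.rows, tab.extra))
    return lines


# Second example from the description: ot, then the nest (it1, it2).
plan = describe(1, [
    JoinTab("ot", 10),
    JoinTab("it1", 10, sjm_select_id=2),
    JoinTab("it2", 10, "Using join buffer", sjm_select_id=2),
])
for line in plan:
    print(line)
```

For the lookup case this yields the same four lines, in the same order, as the second proposed EXPLAIN output above. Switching sjm_strategy to "scan" makes the SUBQUERY#2 line use ALL with the temp table row estimate.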

[Maria-developers] New (by Psergey): Make EXPLAIN always show materialization separately (110)
by worklog-noreply＠askmonty.org 29 Mar '10
