[Maria-developers] Cost analysis: Materialization plan
Debugging with optimizer_switch='materialization=on', query: explain extended select sql_calc_found_rows s_name, s_address from nation straight_join supplier where s_suppkey in (select ps_suppkey from partsupp where ps_partkey in (select p_partkey from part where p_name like 'forest%') and ps_availqty > (select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('1994-01-01') and l_shipdate < date('1994-01-01') + interval '1' year )) and s_nationkey = n_nationkey and n_name = 'canada' order by s_name limit 10; +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ |id |select_type |table |type ||key |key_len|ref |rows|filtered|Extra | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ | 1|PRIMARY |nation |ALL ||NULL |NULL |NULL | 25| 100.00|Using where; Using temporary; Using filesort| | 1|PRIMARY |supplier |ref ||i_s_nationkey |5 |nation.n_nationkey | 180| 100.00| | | 1|PRIMARY |<subquery2>|eq_ref||distinct_key |4 |func | 1| 100.00| | | 2|MATERIALIZED |part |range ||p_name |58 |NULL |2387| 100.00|Using where; Using index | | 2|MATERIALIZED |partsupp |ref ||PRIMARY |4 |part.p_partkey | 1| 100.00|Using where | | 4|DEPENDENT SUBQUERY|lineitem |ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 4| 100.00|Using where | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ ********* SEMI-JOIN pre-optimization ************** ===== PRE-1. part === (gdb) p position->records_read $564 = 2387 (gdb) p position->read_time $565 = 497.33246183668518 (gdb) p current_record_count / TIME_FOR_COMPARE $566 = 477.39999999999998 (gdb) p current_read_time $567 = 974.73246183668516 ===== PRE-2. partsupp === (gdb) p position->records_read $570 = 1 (gdb) p position->read_time $571 = 2390.4263969377266 (gdb) p current_record_count / TIME_FOR_COMPARE $572 = 477.39999999999998 (gdb) p current_read_time $573 = 3842.5588587744119 ## (part, partsupp) is the best plan to materialize the subquery. (gdb) p join->best_read $575 = 6229.5578587744112 (gdb) p join->record_count $576 = 2387 ## Extra expenses for semi-join (gdb) print lookup_cost $582 = 0.050000000000000003 (gdb) print write_cost $584 = 0.050000000000000003 ********* MAIN join optimization ************** ===== 1. nation === (gdb) p position->records_read $587 = 25 (gdb) p position->read_time $588 = 1 (gdb) p current_record_count / TIME_FOR_COMPARE $589 = 5 (gdb) p current_read_time $590 = 6 ===== 2. supplier === (gdb) p position->records_read $593 = 180 (gdb) p position->read_time $594 = 4525 (gdb) p current_record_count / TIME_FOR_COMPARE $595 = 900 (gdb) p current_read_time $596 = 5431 ## here it considers "part, partsupp" extension, with huge cost: ## I don't skip it: === 3.A part === (gdb) p position->records_read $599 = 2387 (gdb) p position->read_time $600 = 2237996.0782650835 (gdb) p current_record_count / TIME_FOR_COMPARE $601 = 2148300 (gdb) p current_read_time $602 = 4391727.078265084 === 4.A partsupp == (gdb) p position->records_read $605 = 1 (gdb) p position->read_time $606 = 10741500 (gdb) p current_record_count / TIME_FOR_COMPARE $607 = 2148300 (gdb) p current_read_time $608 = 17281527.078265086
advance_sj_state Firstmatch_picker::check_qep ### Picks first match with (gdb) print read_time $615 = 12984927.078265084 (gdb) print rec_count $616 = 4500 < Firstmatch_picker::check_qep
Sj_materialization_picker::check_qep (gdb) p prefix_cost.total_cost() $622 = 5431 # OK, same as $596
(gdb) p mat_info->materialization_cost.total_cost() $623 = 3007.1088587744121 ## This one is weird. It should be $575 + $582 + $584= 6229.6578587744098 ## I will investigate. ## The difference is not millions, but a ~3K. (gdb) print prefix_rec_count $625 = 4500 (gdb) p mat_info->lookup_cost.total_cost() $626 = 0.050000000000000003 < Sj_materialization_picker::check_qep ## Materialization is chosen as the best strategy with (gdb) print current_record_count $634 = 4500 (gdb) print current_read_time $635 = 8663.1088587744125 < advance_sj_state === 5. Finally === We hit these lines: if (join->sort_by_table && join->sort_by_table != join->positions[join->const_tables].table->table) /* We have to make a temp table */ current_read_time+= current_record_count; and get the final query plan cost of (gdb) p join->record_count $643 = 4500 (gdb) p join->best_read $644 = 13163.107858774412 #records is the same for plan in this email and in the previous. The cost is very different, however, it is 13K vs 360K. Now, when we have the breakdown of the numbers, we can evaluate what did the optimizer get wrong. BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
On Wed, Aug 08, 2012 at 07:06:39PM +0400, Sergei Petrunia wrote:
Debugging with optimizer_switch='materialization=on', query:
explain extended select sql_calc_found_rows s_name, s_address from nation straight_join supplier where s_suppkey in (select ps_suppkey from partsupp where ps_partkey in (select p_partkey from part where p_name like 'forest%') and ps_availqty > (select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('1994-01-01') and l_shipdate < date('1994-01-01') + interval '1' year )) and s_nationkey = n_nationkey and n_name = 'canada' order by s_name limit 10; +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ |id |select_type |table |type ||key |key_len|ref |rows|filtered|Extra | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ | 1|PRIMARY |nation |ALL ||NULL |NULL |NULL | 25| 100.00|Using where; Using temporary; Using filesort| | 1|PRIMARY |supplier |ref ||i_s_nationkey |5 |nation.n_nationkey | 180| 100.00| | | 1|PRIMARY |<subquery2>|eq_ref||distinct_key |4 |func | 1| 100.00| | | 2|MATERIALIZED |part |range ||p_name |58 |NULL |2387| 100.00|Using where; Using index | | 2|MATERIALIZED |partsupp |ref ||PRIMARY |4 |part.p_partkey | 1| 100.00|Using where | | 4|DEPENDENT SUBQUERY|lineitem |ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 4| 100.00|Using where | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
- The join order in SJ-Materialization nest is (part, partsupp). - scalar subquery is correlated with partsupp, so it can only be attached to partsupp - join order of (partsupp, part) is much more expensive so we can be certain it won't be used when we start taking scalar subquery costs into account. Cost of scalar subquery execution (in choose_plan()): (gdb) p join->best_read $144 = 4.5989999999999993 We estimate that it is run 2378*1=2378 times (NOTE: in reality, it is 9555 times, see "optimizer statistics vs real dataset" email, which also explains the reason of discrepansy: ref access on table partsupp has estimate rows=1, while it is actually 4) Added subquery cost is: $SCALAR_SUBQ_COST= 4.6 * 2378 =10398.8 Now, let's walk down and add it, where appropriate:
********* SEMI-JOIN pre-optimization **************
===== PRE-1. part === (gdb) p position->records_read $564 = 2387 (gdb) p position->read_time $565 = 497.33246183668518 (gdb) p current_record_count / TIME_FOR_COMPARE $566 = 477.39999999999998 (gdb) p current_read_time $567 = 974.73246183668516
===== PRE-2. partsupp ===
(gdb) p position->records_read $570 = 1 (gdb) p position->read_time $571 = 2390.4263969377266 (gdb) p current_record_count / TIME_FOR_COMPARE $572 = 477.39999999999998 (gdb) p current_read_time $573 = 3842.5588587744119
## (part, partsupp) is the best plan to materialize the subquery. (gdb) p join->best_read $575 = 6229.5578587744112 (gdb) p join->record_count $576 = 2387
## Extra expenses for semi-join (gdb) print lookup_cost $582 = 0.050000000000000003 (gdb) print write_cost $584 = 0.050000000000000003
#### Here $SCALAR_SUBQ_COST will be added.
********* MAIN join optimization ************** ===== 1. nation === (gdb) p position->records_read $587 = 25 (gdb) p position->read_time $588 = 1 (gdb) p current_record_count / TIME_FOR_COMPARE $589 = 5 (gdb) p current_read_time $590 = 6
===== 2. supplier === (gdb) p position->records_read $593 = 180 (gdb) p position->read_time $594 = 4525 (gdb) p current_record_count / TIME_FOR_COMPARE $595 = 900 (gdb) p current_read_time $596 = 5431
## here it considers "part, partsupp" extension, with huge cost: ## I don't skip it:
=== 3.A part === (gdb) p position->records_read $599 = 2387 (gdb) p position->read_time $600 = 2237996.0782650835 (gdb) p current_record_count / TIME_FOR_COMPARE $601 = 2148300 (gdb) p current_read_time $602 = 4391727.078265084
=== 4.A partsupp == (gdb) p position->records_read $605 = 1 (gdb) p position->read_time $606 = 10741500 (gdb) p current_record_count / TIME_FOR_COMPARE $607 = 2148300 (gdb) p current_read_time $608 = 17281527.078265086
advance_sj_state Firstmatch_picker::check_qep ### Picks first match with (gdb) print read_time $615 = 12984927.078265084 (gdb) print rec_count $616 = 4500 < Firstmatch_picker::check_qep
Sj_materialization_picker::check_qep (gdb) p prefix_cost.total_cost() $622 = 5431 # OK, same as $596
(gdb) p mat_info->materialization_cost.total_cost() $623 = 3007.1088587744121 ## This one is weird. It should be $575 + $582 + $584= 6229.6578587744098 ## I will investigate. ## The difference is not millions, but a ~3K.
(gdb) print prefix_rec_count $625 = 4500 (gdb) p mat_info->lookup_cost.total_cost() $626 = 0.050000000000000003 < Sj_materialization_picker::check_qep
## Materialization is chosen as the best strategy with
(gdb) print current_record_count $634 = 4500 (gdb) print current_read_time $635 = 8663.1088587744125
Here $SCALAR_SUBQ_COST will be added.
< advance_sj_state
=== 5. Finally === We hit these lines: if (join->sort_by_table && join->sort_by_table != join->positions[join->const_tables].table->table) /* We have to make a temp table */ current_read_time+= current_record_count; and get the final query plan cost of
(gdb) p join->record_count $643 = 4500 (gdb) p join->best_read $644 = 13163.107858774412
Here we will see the added $SCALAR_SUBQ_COST, which will give the final cost of: 13163.1 + 10938.7 = 24101.8 BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
I've found an error: emails with subjects - Cost analysis: Materialization plan+scalar subquery cost, prediction (this one) - Cost analysis: FirstMatch plan+scalar subquery cost, prediction were made as replies to the out-of-date analysis, where we didn't have index on nation.n_name, and thus had #rows == 25 for table nation. We'll need to re-do the prediction. On Thu, Aug 09, 2012 at 03:27:24PM +0400, Sergei Petrunia wrote:
On Wed, Aug 08, 2012 at 07:06:39PM +0400, Sergei Petrunia wrote:
Debugging with optimizer_switch='materialization=on', query:
explain extended select sql_calc_found_rows s_name, s_address from nation straight_join supplier where s_suppkey in (select ps_suppkey from partsupp where ps_partkey in (select p_partkey from part where p_name like 'forest%') and ps_availqty > (select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('1994-01-01') and l_shipdate < date('1994-01-01') + interval '1' year )) and s_nationkey = n_nationkey and n_name = 'canada' order by s_name limit 10; +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ |id |select_type |table |type ||key |key_len|ref |rows|filtered|Extra | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ | 1|PRIMARY |nation |ALL ||NULL |NULL |NULL | 25| 100.00|Using where; Using temporary; Using filesort| | 1|PRIMARY |supplier |ref ||i_s_nationkey |5 |nation.n_nationkey | 180| 100.00| | | 1|PRIMARY |<subquery2>|eq_ref||distinct_key |4 |func | 1| 100.00| | | 2|MATERIALIZED |part |range ||p_name |58 |NULL |2387| 100.00|Using where; Using index | | 2|MATERIALIZED |partsupp |ref ||PRIMARY |4 |part.p_partkey | 1| 100.00|Using where | | 4|DEPENDENT SUBQUERY|lineitem |ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 4| 100.00|Using where | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
- The join order in SJ-Materialization nest is (part, partsupp). - scalar subquery is correlated with partsupp, so it can only be attached to partsupp - join order of (partsupp, part) is much more expensive so we can be certain it won't be used when we start taking scalar subquery costs into account.
Cost of scalar subquery execution (in choose_plan()): (gdb) p join->best_read $144 = 4.5989999999999993
We estimate that it is run 2378*1=2378 times (NOTE: in reality, it is 9555 times, see "optimizer statistics vs real dataset" email, which also explains the reason of discrepansy: ref access on table partsupp has estimate rows=1, while it is actually 4)
Added subquery cost is: $SCALAR_SUBQ_COST= 4.6 * 2378 =10398.8
Now, let's walk down and add it, where appropriate:
********* SEMI-JOIN pre-optimization **************
===== PRE-1. part === (gdb) p position->records_read $564 = 2387 (gdb) p position->read_time $565 = 497.33246183668518 (gdb) p current_record_count / TIME_FOR_COMPARE $566 = 477.39999999999998 (gdb) p current_read_time $567 = 974.73246183668516
===== PRE-2. partsupp ===
(gdb) p position->records_read $570 = 1 (gdb) p position->read_time $571 = 2390.4263969377266 (gdb) p current_record_count / TIME_FOR_COMPARE $572 = 477.39999999999998 (gdb) p current_read_time $573 = 3842.5588587744119
## (part, partsupp) is the best plan to materialize the subquery. (gdb) p join->best_read $575 = 6229.5578587744112 (gdb) p join->record_count $576 = 2387
## Extra expenses for semi-join (gdb) print lookup_cost $582 = 0.050000000000000003 (gdb) print write_cost $584 = 0.050000000000000003
#### Here $SCALAR_SUBQ_COST will be added.
********* MAIN join optimization ************** ===== 1. nation === (gdb) p position->records_read $587 = 25 (gdb) p position->read_time $588 = 1 (gdb) p current_record_count / TIME_FOR_COMPARE $589 = 5 (gdb) p current_read_time $590 = 6
===== 2. supplier === (gdb) p position->records_read $593 = 180 (gdb) p position->read_time $594 = 4525 (gdb) p current_record_count / TIME_FOR_COMPARE $595 = 900 (gdb) p current_read_time $596 = 5431
## here it considers "part, partsupp" extension, with huge cost: ## I don't skip it:
=== 3.A part === (gdb) p position->records_read $599 = 2387 (gdb) p position->read_time $600 = 2237996.0782650835 (gdb) p current_record_count / TIME_FOR_COMPARE $601 = 2148300 (gdb) p current_read_time $602 = 4391727.078265084
=== 4.A partsupp == (gdb) p position->records_read $605 = 1 (gdb) p position->read_time $606 = 10741500 (gdb) p current_record_count / TIME_FOR_COMPARE $607 = 2148300 (gdb) p current_read_time $608 = 17281527.078265086
advance_sj_state Firstmatch_picker::check_qep ### Picks first match with (gdb) print read_time $615 = 12984927.078265084 (gdb) print rec_count $616 = 4500 < Firstmatch_picker::check_qep
Sj_materialization_picker::check_qep (gdb) p prefix_cost.total_cost() $622 = 5431 # OK, same as $596
(gdb) p mat_info->materialization_cost.total_cost() $623 = 3007.1088587744121 ## This one is weird. It should be $575 + $582 + $584= 6229.6578587744098 ## I will investigate. ## The difference is not millions, but a ~3K.
(gdb) print prefix_rec_count $625 = 4500 (gdb) p mat_info->lookup_cost.total_cost() $626 = 0.050000000000000003 < Sj_materialization_picker::check_qep
## Materialization is chosen as the best strategy with
(gdb) print current_record_count $634 = 4500 (gdb) print current_read_time $635 = 8663.1088587744125
Here $SCALAR_SUBQ_COST will be added.
< advance_sj_state
=== 5. Finally === We hit these lines: if (join->sort_by_table && join->sort_by_table != join->positions[join->const_tables].table->table) /* We have to make a temp table */ current_read_time+= current_record_count; and get the final query plan cost of
(gdb) p join->record_count $643 = 4500 (gdb) p join->best_read $644 = 13163.107858774412
Here we will see the added $SCALAR_SUBQ_COST, which will give the final cost of:
13163.1 + 10938.7 = 24101.8
BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
-- BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
participants (1)
-
Sergei Petrunia