On Wed, Aug 08, 2012 at 07:04:39PM +0400, Sergei Petrunia wrote:
Please find below cost analysis for the plan with FirstMatch strategy.
Debugging with optimizer_switch='semijoin=off', query:
explain extended select sql_calc_found_rows s_name, s_address from nation straight_join supplier where s_suppkey in (select ps_suppkey from partsupp where ps_partkey in (select p_partkey from part where p_name like 'forest%') and ps_availqty > (select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('1994-01-01') and l_shipdate < date('1994-01-01') + interval '1' year )) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name limit 10;
+----+------------------+--------+------++-------------------+-------+---------------------------------------+----+--------------------------------------------+ |id |select_type |table |type ||key |key_len|ef |rows|Extra | +----+------------------+--------+------++-------------------+-------+---------------------------------------+----+--------------------------------------------+ | 1|PRIMARY |nation |ALL ||NULL |NULL |NULL | 25|Using where; Using temporary; Using filesort| | 1|PRIMARY |supplier|ref ||i_s_nationkey |5 |nation.n_nationkey | 180| | | 1|PRIMARY |partsupp|ref ||i_ps_suppkey |4 |supplier.s_suppkey | 39|Using where | | 1|PRIMARY |part |eq_ref||PRIMARY |4 |partsupp.ps_partkey | 1|Using where; FirstMatch(supplier) | | 4|DEPENDENT SUBQUERY|lineitem|ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 4|Using where | +----+------------------+--------+------++-------------------+-------+---------------------------------------+----+--------------------------------------------+
Cost of scalar subquery execution (in choose_plan()): (gdb) p join->best_read $144 = 4.5989999999999993 From the email "check if optimizer is able to produce a correct E(#scalar_subquery executions)", we know:
Let's take estimate numbers: 1 * 251 * 34 * 1 * 0.0118 = 100.7012 evaluations.
which gives: $OVERALL_SCALAR_SUBQUERY_COST= 100.7 * 4.6= 463.2
===== 1. nation === (gdb) p position->records_read $367 = 25 (gdb) p position->read_time $368 = 1 (gdb) p current_record_count / TIME_FOR_COMPARE $369 = 5 (gdb) p current_read_time $370 = 6
===== 2. supplier ===
(gdb) p position->records_read $377 = 180 (gdb) p position->read_time $378 = 4525 (gdb) p current_record_count / TIME_FOR_COMPARE $379 = 900 (gdb) p current_read_time $380 = 5431
## here it considers "part, partsupp" extension, with cost ## 12,984,927. I skip that, because that's not the optimal plan
===== 3. partsupp ===
(gdb) p position->records_read $411 = 39 (gdb) p position->read_time $412 = 180000 (gdb) p current_record_count / TIME_FOR_COMPARE $413 = 35100 (gdb) p current_read_time $414 = 220531
===== 4. part === (gdb) p position->records_read $421 = 1 (gdb) p position->read_time $422 = 175500 (gdb) p current_record_count / TIME_FOR_COMPARE $423 = 35100 (gdb) p current_read_time $424 = 431131
advance_sj_state() Firstmatch_picker::check_qep > optimize_wo_join_buffering
### It doesn't make any best_access_path() calls here, because no table is ### using join buffering.
### It forgets to add (current_record_count / TIME_FOR_COMPARE), and ends ### up with a slightly lower cost of (gdb) print cost $425 = 360931 > optimize_wo_join_buffering
Firstmatch_picker::check_qep
## FirstMatch is chosen as the way we will use to eliminate duplicates..
## At the end of advance_sj_state, we have:
(gdb) p *current_read_time $426 = 360931 (gdb) p *current_record_count $427 = 4500
advance_sj_state(). Adding $OVERALL_SCALAR_SUBQUERY_COST:
360931 + $OVERALL_SCALAR_SUBQUERY_COST = 361394.20. which is still much greater than the cost of Materialization+$SCALAR_SUBQ_COST, which is 24101.8 (see prev. email). We've got to think what else is wrong here. BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog