Debugging with optimizer_switch='materialization=on', query: explain extended select sql_calc_found_rows s_name, s_address from nation straight_join supplier where s_suppkey in (select ps_suppkey from partsupp where ps_partkey in (select p_partkey from part where p_name like 'forest%') and ps_availqty > (select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('1994-01-01') and l_shipdate < date('1994-01-01') + interval '1' year )) and s_nationkey = n_nationkey and n_name = 'canada' order by s_name limit 10; +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ |id |select_type |table |type ||key |key_len|ref |rows|filtered|Extra | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ | 1|PRIMARY |nation |ALL ||NULL |NULL |NULL | 25| 100.00|Using where; Using temporary; Using filesort| | 1|PRIMARY |supplier |ref ||i_s_nationkey |5 |nation.n_nationkey | 180| 100.00| | | 1|PRIMARY |<subquery2>|eq_ref||distinct_key |4 |func | 1| 100.00| | | 2|MATERIALIZED |part |range ||p_name |58 |NULL |2387| 100.00|Using where; Using index | | 2|MATERIALIZED |partsupp |ref ||PRIMARY |4 |part.p_partkey | 1| 100.00|Using where | | 4|DEPENDENT SUBQUERY|lineitem |ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 4| 100.00|Using where | +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+ ********* SEMI-JOIN pre-optimization ************** ===== PRE-1. part === (gdb) p position->records_read $564 = 2387 (gdb) p position->read_time $565 = 497.33246183668518 (gdb) p current_record_count / TIME_FOR_COMPARE $566 = 477.39999999999998 (gdb) p current_read_time $567 = 974.73246183668516 ===== PRE-2. partsupp === (gdb) p position->records_read $570 = 1 (gdb) p position->read_time $571 = 2390.4263969377266 (gdb) p current_record_count / TIME_FOR_COMPARE $572 = 477.39999999999998 (gdb) p current_read_time $573 = 3842.5588587744119 ## (part, partsupp) is the best plan to materialize the subquery. (gdb) p join->best_read $575 = 6229.5578587744112 (gdb) p join->record_count $576 = 2387 ## Extra expenses for semi-join (gdb) print lookup_cost $582 = 0.050000000000000003 (gdb) print write_cost $584 = 0.050000000000000003 ********* MAIN join optimization ************** ===== 1. nation === (gdb) p position->records_read $587 = 25 (gdb) p position->read_time $588 = 1 (gdb) p current_record_count / TIME_FOR_COMPARE $589 = 5 (gdb) p current_read_time $590 = 6 ===== 2. supplier === (gdb) p position->records_read $593 = 180 (gdb) p position->read_time $594 = 4525 (gdb) p current_record_count / TIME_FOR_COMPARE $595 = 900 (gdb) p current_read_time $596 = 5431 ## here it considers "part, partsupp" extension, with huge cost: ## I don't skip it: === 3.A part === (gdb) p position->records_read $599 = 2387 (gdb) p position->read_time $600 = 2237996.0782650835 (gdb) p current_record_count / TIME_FOR_COMPARE $601 = 2148300 (gdb) p current_read_time $602 = 4391727.078265084 === 4.A partsupp == (gdb) p position->records_read $605 = 1 (gdb) p position->read_time $606 = 10741500 (gdb) p current_record_count / TIME_FOR_COMPARE $607 = 2148300 (gdb) p current_read_time $608 = 17281527.078265086
advance_sj_state Firstmatch_picker::check_qep ### Picks first match with (gdb) print read_time $615 = 12984927.078265084 (gdb) print rec_count $616 = 4500 < Firstmatch_picker::check_qep
Sj_materialization_picker::check_qep (gdb) p prefix_cost.total_cost() $622 = 5431 # OK, same as $596
(gdb) p mat_info->materialization_cost.total_cost() $623 = 3007.1088587744121 ## This one is weird. It should be $575 + $582 + $584= 6229.6578587744098 ## I will investigate. ## The difference is not millions, but a ~3K. (gdb) print prefix_rec_count $625 = 4500 (gdb) p mat_info->lookup_cost.total_cost() $626 = 0.050000000000000003 < Sj_materialization_picker::check_qep ## Materialization is chosen as the best strategy with (gdb) print current_record_count $634 = 4500 (gdb) print current_read_time $635 = 8663.1088587744125 < advance_sj_state === 5. Finally === We hit these lines: if (join->sort_by_table && join->sort_by_table != join->positions[join->const_tables].table->table) /* We have to make a temp table */ current_read_time+= current_record_count; and get the final query plan cost of (gdb) p join->record_count $643 = 4500 (gdb) p join->best_read $644 = 13163.107858774412 #records is the same for plan in this email and in the previous. The cost is very different, however, it is 13K vs 360K. Now, when we have the breakdown of the numbers, we can evaluate what did the optimizer get wrong. BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog