Let's see if the optimizer is theoretically able to attach the subquery to the right table for the FirstMatch plan: On Thu, Aug 09, 2012 at 12:39:00PM +0400, Sergei Petrunia wrote:
== Modified query == ...
== Benchmarking ==
Number of times scalar-subquery was executed: Materialization: 9552 times (same as before) First-Match: 299 times (A LOT LESS!)
Execution times: Materialization: 5.30 sec (same as before) First-Match: 1.72 sec (a lot less, used to be 15 sec! Now it beats materialization)
Let's check whether the optimizer has sufficient input data to conclude that moving the scalar-subquery from table partsupp to table part is advantageous. The data about extra selectivity is basically tab->quick_condition_rows. Let's take the EXPLAIN and its analysis from the "Cost analysis: optimizer statistics vs real dataset properties" email. Relevant lines will start with 'NOTE>'. +--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+ |id|select_type |table |type ||key |key_len|ref |rows|filtered|Extra | +--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+ | 1|PRIMARY |nation |ref ||n_name |26 |const | 1| 100.00|Using where; Using index; Using temporary; Using filesort| | 1|PRIMARY |supplier|ref ||i_s_nationkey |5 |nation.n_nationkey | 251| 100.00| | | 1|PRIMARY |partsupp|ref ||i_ps_suppkey |4 |supplier.s_suppkey | 34| 100.00|Using where | | 1|PRIMARY |part |eq_ref||PRIMARY |4 |partsupp.ps_partkey | 1| 100.00|Using where; FirstMatch(supplier) | | 4|DEP. SUBQUERY|lineitem|ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 3| 100.00|Using where | +--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+ 5 rows in set, 3 warnings (0.01 sec) === nation: === like above, 1 row. OK. NOTE> - "using where" won't filter anything out. === supplier: === see above, 251 rows, ~ok 412 real... NOTE> - "using where" won't filter anything out. === partsupp: === - 800K rows and 10K distinct ps_suppkey, which gives rec_per_key=80 (EXPLAIN shows 34) - as for data that we will hit: select count(*) from nation, supplier, partsupp where s_suppkey=ps_suppkey and s_nationkey=n_nationkey and n_name='canada'; gives 32690 rows. 32690 / 412 = 80, matches rec_per_key. NOTE> "using where" won't filter anything out (NOT TAKING the scalar-subquery into account) === part: === - eq_ref, so exactly 1 match. We know DBT-3 dataset is such that it always has one. NOTE> "using where" is "p_name like forest"! it will filter stuff out! I haven't checked, but I suppose that condition "p_name like forest%'(*) is not correlated with the table access condition, 'p_partkey=partsupp_ps_partkey'. The email 'Cost analysis: Materialization plan' shows a range access on (*), with the estimate rows=2378. Total number of rows in table part is is 200K, InnoDB's estimate is 200999. This gives selectivity of 0.0118. === How many times scalar-subquery will be evaluated === Let's take estimate numbers: 1 * 251 * 34 * 1 * 0.0118 = 100.7012 evaluations. If the estimates were perfect: 1 * 400 * 80 * 1 * 0.118 = 377.6 evaluations As mentioned at the top of this email, in reality subquery is evaluated 299 times. === Conclusion#1 === quick_condition estimates provide information that will allow the optimizer to make a correct conclusion about # of times that scalar-subquery will be evaluated after table part. What remains to be checked: check that our cost formulas will make the plan of { FirstMatch, scalar-subquery attached to table part} cheaper than { FirstMatch, scalar-subquery attached to table partsupp} and cheaper than the SJ-Materialization plan. BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog