On Mon, May 14, 2012 at 11:22:52PM +0300, Timour Katchaounov wrote:
/** + Estimate the number of rows that query execution will read. + + @todo This is a very pessimistic upper bound. Use join selectivity + when available to produce a more realistic number. +*/ + +double JOIN::get_examined_rows() +{ + /* Each constant table examines one row, and the result is at most one row. */ + ha_rows examined_rows= const_tables; + uint i= const_tables; + double prev_fanout; + + if (table_count == const_tables) + return examined_rows; + + examined_rows+= join_tab[i++].get_examined_rows(); + for (; i < table_count ; i++) + { + if (join_tab[i].type == JT_EQ_REF) + prev_fanout= 1; + else + prev_fanout= best_positions[i-1].records_read;
This looks wrong. Declaration of POSITION::records_read has this comment:
/* The "fanout": number of output rows that will be produced (after pushed down selection condition is applied) per each row combination of previous tables. */
note the "PER EACH ROW COMBINATION .." part. I would expect that this function would calculate a product of records_read values.
timour:
In this function we want to estimate how many rows will be *examined*, not produced by each JOIN operator. For the estimate I assume a simple nested loops model, where the JOIN read every row of its right table as many times as many rows there are in the left operand. The partial join that serves as left operand, contains records_read rows. This is the multiplier of the number of rows that will be examined in the right table.
Of course, for each partial join the join condition will filter a subset of these rows.
I agree that blocking algorithms may examine a lot less rows, but it doesn't make sense to have a very tight bound here. We use an upper bound, because it is better to miss some constant optimizations, rather than execute very expensive subqueries here.
I was talking about something much more simpler than that. Consider this example: create table ten (a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k (a int); insert into one_k select A.a + 10*B.a + 100*C.a from ten A, ten B, ten C; create table ti1 ( a int); insert into ti1 select a from ten limit 4; alter table ti1 add b int; create table ti2 (a int primary key, b int); create table ti3 (a int primary key, b int); insert into ti2 select a, a from one_k; insert into ti3 select a, a from one_k; MariaDB [test]> explain select * from ten where 3 in (select ti2.b + ti3.b from ti1, ti2, ti3 where ti2.a=ti1.a and ti3.a=ti1.a) or a <5; +------+--------------------+-------+--------+---------------+---------+---------+------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+--------+---------------+---------+---------+------------+------+-------------+ | 1 | PRIMARY | ten | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 2 | DEPENDENT SUBQUERY | ti1 | ALL | NULL | NULL | NULL | NULL | 4 | Using where | | 2 | DEPENDENT SUBQUERY | ti3 | eq_ref | PRIMARY | PRIMARY | 4 | test.ti1.a | 1 | | | 2 | DEPENDENT SUBQUERY | ti2 | eq_ref | PRIMARY | PRIMARY | 4 | test.ti1.a | 1 | Using where | +------+--------------------+-------+--------+---------------+---------+---------+------------+------+-------------+ For this example, JOIN::get_examined_rows() produces 6. The number comes from 4 rows examined in table ti1 (correct) 1 row examined in table ti2 (incorrect, should be 4) 1 row examined in table ti3 (incorrect, should be 4) Do you agree that the number of 6 is incorrect and needs to be fixed? BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog