Re: [Maria-developers] Question about Query plan

14 May 2012


      Hi Sergei,

Thanks a lot for your answer, this was indeed extremely helpful!
...
- Do you really records_in_range() implementation like in (*)
Yes.
...
- Are estimates of 10 and 20 rows close to reality?
Not at all... The thing is: we can implement records_in_range not as a
function with constant complexity and since a full table scan is not
faster than a index range lookup in our storage engine (even if the
whole table is contained in the range), we wanted to force MySQL to
make always an index lookup. If I understand everything correctly, my
error in thinking was, that I thought MySQL uses the information
returned by records_in_range only to chose between index lookup and
fulltable scan (since on a traditional disk architecture a full table
scan will be faster than an index scan which scans the whole table).
But this is obviously not the case.
So we will have to implemented this records_in_range method. Do I
understand this correct, that a logarithmic (with respect to the table
size) run time will be ok for this method (InnoDB seems to walk two
times through the index if I get the comments in the code correct)?
Thanks again and best regards

Markus

On Mon, May 14, 2012 at 11:20 AM, Sergei Petrunia <psergey@askmonty.org> wrote:
...
On Mon, May 14, 2012 at 08:40:59AM +0200, Markus Pilman wrote:
...
Hello all,
At ETH Zurich we are working on a new storage engine, that allows us
to test several new architectures for transactional databases. So far
we worked with MySQL, but we had massive performance issues. After
some investigation we figured out, that MySQL generates different
query plans for InnoDB than for our engine. One query which killed our
performance was the following (this is a query from the TPC-W
benchmark):
SELECT ol2.ol_i_id, SUM(ol2.ol_qty) AS sum_ol
FROM order_line ol, order_line ol2, (SELECT o_id FROM orders ORDER BY
o_date DESC LIMIT 10000) AS t
WHERE ol.ol_o_id = t.o_id AND ol.ol_i_id = 10 AND ol2.ol_o_id = t.o_id
AND ol2.ol_i_id <> 10
GROUP BY ol2.ol_i_id ORDER BY sum_ol DESC LIMIT 0,5
MySQL generated the following plan for InnoDB:
+----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
| id | select_type | table      | type  | possible_keys
 | key            | key_len | ref             | rows  | Extra
                 |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL
 | NULL           | NULL    | NULL            | 10000 | Using
temporary; Using filesort |
|  1 | PRIMARY     | ol         | ref   |
orderline_o_id,orderline_i_id | orderline_o_id | 8       | t.o_id
    |     1 | Using where                     |
|  1 | PRIMARY     | ol2        | ref   |
orderline_o_id,orderline_i_id | orderline_o_id | 8       |
tpcw.ol.OL_O_ID |     1 | Using where                     |
|  2 | DERIVED     | orders     | index | NULL
 | orders_o_date  | 4       | NULL            | 10000 | Using index
                 |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
while it generated the following one for our storage engine:
 +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table      | type  | possible_keys
 | key            | key_len | ref   | rows  | Extra
                    |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
|  1 | PRIMARY     | ol         | ref   |
orderline_o_id,orderline_i_id | orderline_i_id | 5       | const |
10 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | ol2        | range |
orderline_o_id,orderline_i_id | orderline_i_id | 5       | NULL  |
20 | Using where; Using join buffer               |
|  1 | PRIMARY     | <derived2> | ALL   | NULL
 | NULL           | NULL    | NULL  | 10000 | Using where; Using join
buffer               |
|  2 | DERIVED     | orders     | index | NULL
 | orders_o_date  | 4       | NULL  | 10000 |
                    |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
It's difficult to make guesses just on EXPLAINs, but I will try: we see
{table=ol, rows=10},  {table=ol2, rows=20} -- this makes me suspect that
your storage engine has this function (I recall some skeleton storage engine
implementation had this):
 ha_rows  records_in_range(...)  { return 10; }    (*)
The WHERE clause has:
- ol.ol_i_id = 10  -- this producess ref acess on table `ol`, and the estimate
 of 10 rows comes from records_in_range() call.
- ol2.ol_i_id <> 10 -- the optimizer converts this to
  (-inf < ol2.ol_i_id <10) OR ( 10< ol2.ol_i_id < +inf)
 which is two ranges and we get an estimate of 20 rows.
Questions:
- Do you really records_in_range() implementation like in (*)
- Are estimates of 10 and 20 rows close to reality?
...
The second one is obviously a very bad one. So we decided to try with
MariaDB, which generates the following query plan:
+------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
| id   | select_type | table      | type  | possible_keys
   | key            | key_len | ref             | rows  | Extra
                   |
+------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
|    1 | PRIMARY     | ol         | ref   |
orderline_o_id,orderline_i_id | orderline_i_id | 5       | const
    |    10 | Using temporary; Using filesort |
|    1 | PRIMARY     | ol2        | ref   |
orderline_o_id,orderline_i_id | orderline_o_id | 8       |
test.ol.OL_O_ID |    11 | Using where                     |
|    1 | PRIMARY     | <derived2> | ref   | key0
   | key0           | 8       | test.ol.OL_O_ID |    10 |
                   |
|    2 | DERIVED     | orders     | index | NULL
   | orders_o_date  | 4       | NULL            | 10000 |
                   |
+------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
table=<derived2>, key=key0 - apparently, this is MariaDB 5.3/5.5, and it's
using "derived table with keys"
(http://kb.askmonty.org/en/derived-table-with-key-optimization) optimization.
...
The query plan from MariaDB looks sane to me, and the numbers approve
this (the query runs on a middle sized data set about 200 times faster
with MariaDB than with MySQL). So we will continue our work with
MariaDB. But I have a question to these query plans: why are we
getting this differences in MySQL between our storage engine and
InnoDB? Is there a feature in our storage engine missing (we first
thought we need the ability to support HA_KEYREAD_ONLY - but
implementing this feature did not change the query plan)? Or does
MySQL some kind of "cheating"? We should understand this issue to be
able to present our results we get later (may be we will compare
MariaDB and MySQL, but in a paper we would have to explain why MySQL
sucks that much).
And btw: good work with MariaDB!! The optimizer seems to do a much
better job than MySQL - even with InnoDB/XtraDB (we had to rewrite
some queries in MySQL to force it to generate sane query plans - with
MariaDB this does not seem to be necessary anymore).
BR
 Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog