[Maria-developers] Question about Query plan

14 May 2012

      Hello all,

At ETH Zurich we are working on a new storage engine, that allows us
to test several new architectures for transactional databases. So far
we worked with MySQL, but we had massive performance issues. After
some investigation we figured out, that MySQL generates different
query plans for InnoDB than for our engine. One query which killed our
performance was the following (this is a query from the TPC-W
benchmark):

SELECT ol2.ol_i_id, SUM(ol2.ol_qty) AS sum_ol
FROM order_line ol, order_line ol2, (SELECT o_id FROM orders ORDER BY
o_date DESC LIMIT 10000) AS t
WHERE ol.ol_o_id = t.o_id AND ol.ol_i_id = 10 AND ol2.ol_o_id = t.o_id
AND ol2.ol_i_id <> 10
GROUP BY ol2.ol_i_id ORDER BY sum_ol DESC LIMIT 0,5

MySQL generated the following plan for InnoDB:

+----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
| id | select_type | table      | type  | possible_keys
 | key            | key_len | ref             | rows  | Extra
                 |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL
 | NULL           | NULL    | NULL            | 10000 | Using
temporary; Using filesort |
|  1 | PRIMARY     | ol         | ref   |
orderline_o_id,orderline_i_id | orderline_o_id | 8       | t.o_id
    |     1 | Using where                     |
|  1 | PRIMARY     | ol2        | ref   |
orderline_o_id,orderline_i_id | orderline_o_id | 8       |
tpcw.ol.OL_O_ID |     1 | Using where                     |
|  2 | DERIVED     | orders     | index | NULL
 | orders_o_date  | 4       | NULL            | 10000 | Using index
                 |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+

while it generated the following one for our storage engine:

 +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table      | type  | possible_keys
 | key            | key_len | ref   | rows  | Extra
                    |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
|  1 | PRIMARY     | ol         | ref   |
orderline_o_id,orderline_i_id | orderline_i_id | 5       | const |
10 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | ol2        | range |
orderline_o_id,orderline_i_id | orderline_i_id | 5       | NULL  |
20 | Using where; Using join buffer               |
|  1 | PRIMARY     | <derived2> | ALL   | NULL
 | NULL           | NULL    | NULL  | 10000 | Using where; Using join
buffer               |
|  2 | DERIVED     | orders     | index | NULL
 | orders_o_date  | 4       | NULL  | 10000 |
                    |
+----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+

The second one is obviously a very bad one. So we decided to try with
MariaDB, which generates the following query plan:

+------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
| id   | select_type | table      | type  | possible_keys
   | key            | key_len | ref             | rows  | Extra
                   |
+------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
|    1 | PRIMARY     | ol         | ref   |
orderline_o_id,orderline_i_id | orderline_i_id | 5       | const
    |    10 | Using temporary; Using filesort |
|    1 | PRIMARY     | ol2        | ref   |
orderline_o_id,orderline_i_id | orderline_o_id | 8       |
test.ol.OL_O_ID |    11 | Using where                     |
|    1 | PRIMARY     | <derived2> | ref   | key0
   | key0           | 8       | test.ol.OL_O_ID |    10 |
                   |
|    2 | DERIVED     | orders     | index | NULL
   | orders_o_date  | 4       | NULL            | 10000 |
                   |
+------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+

The query plan from MariaDB looks sane to me, and the numbers approve
this (the query runs on a middle sized data set about 200 times faster
with MariaDB than with MySQL). So we will continue our work with
MariaDB. But I have a question to these query plans: why are we
getting this differences in MySQL between our storage engine and
InnoDB? Is there a feature in our storage engine missing (we first
thought we need the ability to support HA_KEYREAD_ONLY - but
implementing this feature did not change the query plan)? Or does
MySQL some kind of "cheating"? We should understand this issue to be
able to present our results we get later (may be we will compare
MariaDB and MySQL, but in a paper we would have to explain why MySQL
sucks that much).

And btw: good work with MariaDB!! The optimizer seems to do a much
better job than MySQL - even with InnoDB/XtraDB (we had to rewrite
some queries in MySQL to force it to generate sane query plans - with
MariaDB this does not seem to be necessary anymore).

Thanks for your help in advance and best regards

Markus

[Maria-developers] Question about Query plan

Markus Pilman