
Hello all, At ETH Zurich we are working on a new storage engine, that allows us to test several new architectures for transactional databases. So far we worked with MySQL, but we had massive performance issues. After some investigation we figured out, that MySQL generates different query plans for InnoDB than for our engine. One query which killed our performance was the following (this is a query from the TPC-W benchmark): SELECT ol2.ol_i_id, SUM(ol2.ol_qty) AS sum_ol FROM order_line ol, order_line ol2, (SELECT o_id FROM orders ORDER BY o_date DESC LIMIT 10000) AS t WHERE ol.ol_o_id = t.o_id AND ol.ol_i_id = 10 AND ol2.ol_o_id = t.o_id AND ol2.ol_i_id <> 10 GROUP BY ol2.ol_i_id ORDER BY sum_ol DESC LIMIT 0,5 MySQL generated the following plan for InnoDB: +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 10000 | Using temporary; Using filesort | | 1 | PRIMARY | ol | ref | orderline_o_id,orderline_i_id | orderline_o_id | 8 | t.o_id | 1 | Using where | | 1 | PRIMARY | ol2 | ref | orderline_o_id,orderline_i_id | orderline_o_id | 8 | tpcw.ol.OL_O_ID | 1 | Using where | | 2 | DERIVED | orders | index | NULL | orders_o_date | 4 | NULL | 10000 | Using index | +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+ while it generated the following one for our storage engine: +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+ | 1 | PRIMARY | ol | ref | orderline_o_id,orderline_i_id | orderline_i_id | 5 | const | 10 | Using where; Using temporary; Using filesort | | 1 | PRIMARY | ol2 | range | orderline_o_id,orderline_i_id | orderline_i_id | 5 | NULL | 20 | Using where; Using join buffer | | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 10000 | Using where; Using join buffer | | 2 | DERIVED | orders | index | NULL | orders_o_date | 4 | NULL | 10000 | | +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+ The second one is obviously a very bad one. So we decided to try with MariaDB, which generates the following query plan: +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+ | 1 | PRIMARY | ol | ref | orderline_o_id,orderline_i_id | orderline_i_id | 5 | const | 10 | Using temporary; Using filesort | | 1 | PRIMARY | ol2 | ref | orderline_o_id,orderline_i_id | orderline_o_id | 8 | test.ol.OL_O_ID | 11 | Using where | | 1 | PRIMARY | <derived2> | ref | key0 | key0 | 8 | test.ol.OL_O_ID | 10 | | | 2 | DERIVED | orders | index | NULL | orders_o_date | 4 | NULL | 10000 | | +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+ The query plan from MariaDB looks sane to me, and the numbers approve this (the query runs on a middle sized data set about 200 times faster with MariaDB than with MySQL). So we will continue our work with MariaDB. But I have a question to these query plans: why are we getting this differences in MySQL between our storage engine and InnoDB? Is there a feature in our storage engine missing (we first thought we need the ability to support HA_KEYREAD_ONLY - but implementing this feature did not change the query plan)? Or does MySQL some kind of "cheating"? We should understand this issue to be able to present our results we get later (may be we will compare MariaDB and MySQL, but in a paper we would have to explain why MySQL sucks that much). And btw: good work with MariaDB!! The optimizer seems to do a much better job than MySQL - even with InnoDB/XtraDB (we had to rewrite some queries in MySQL to force it to generate sane query plans - with MariaDB this does not seem to be necessary anymore). Thanks for your help in advance and best regards Markus