[Maria-developers] Issue with index condition pushdown being used with no end_range set
Hello all, I've noticed a scenario with index condition pushdown (ICP) in MariaDB that leads to really bad performance in TokuDB. I don't know if it is a bug in ICP or if TokuDB is misusing/misunderstanding the API. Here is the problem. Suppose we run the following query on the following schema: SELECT * FROM foo WHERE col1 = '7' ORDER BY b desc | foo | CREATE TABLE `foo` ( `id` int(11) NOT NULL AUTO_INCREMENT, `col1` varchar(100) COLLATE utf8_unicode_ci NOT NULL, `b` int(11) DEFAULT NULL, `c` int(11) DEFAULT NULL, `d` int(11) DEFAULT NULL, PRIMARY KEY (`id`), CLUSTERING KEY `col1_2` (`col1`,`b`) ) ENGINE=TokuDB AUTO_INCREMENT=8001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=TOKUDB_ZLIB | The output of explain is: MariaDB [test]> explain SELECT * FROM foo WHERE col1 = '7' ORDER BY b desc; +------+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | 1 | SIMPLE | foo | ref | col1_2 | col1_2 | 302 | const | 1320 | Using where | +------+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ 1 row in set (2.32 sec) The query is doing a reverse range scan, as it should. We get handler::idx_cond_push called, but end_range is not set. As a result, TokuDB thinks it can use index condition pushdown to filter rows. As we do this and get to the end, because end_range is not set, we never get a result of ICP_OUT_OF_RANGE, and always get a result of ICP_NO_MATCH. So, when we go to retrieve that first row past the end of the range, the row that will tell MySQL it should stop searching, TokuDB never finds a row and never gets ICP_NO_MATCH. It scans to the beginning of the index (because it is running in reverse order), getting ICP_NO_MATCH for every row it encounters. Although my index is clustering, I've seen this with a normal key as well. If col1 is an int instead of the funky varchar, handler::idx_cond_push is never called, so this problem does not exist. It seems to me that if end_range is not set and we cannot reliably learn when we go out of range, we should not have handler::idx_cond_push called, otherwise we can get bad performance such as the example above. Thoughts? Thanks --Zardosht
Hello, Does anyone have insight into this? Thanks -Zardosht On Sat, Dec 7, 2013 at 8:51 AM, Zardosht Kasheff <zardosht@gmail.com> wrote:
Hello all,
I've noticed a scenario with index condition pushdown (ICP) in MariaDB that leads to really bad performance in TokuDB. I don't know if it is a bug in ICP or if TokuDB is misusing/misunderstanding the API.
Here is the problem. Suppose we run the following query on the following schema: SELECT * FROM foo WHERE col1 = '7' ORDER BY b desc | foo | CREATE TABLE `foo` ( `id` int(11) NOT NULL AUTO_INCREMENT, `col1` varchar(100) COLLATE utf8_unicode_ci NOT NULL, `b` int(11) DEFAULT NULL, `c` int(11) DEFAULT NULL, `d` int(11) DEFAULT NULL, PRIMARY KEY (`id`), CLUSTERING KEY `col1_2` (`col1`,`b`) ) ENGINE=TokuDB AUTO_INCREMENT=8001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=TOKUDB_ZLIB |
The output of explain is: MariaDB [test]> explain SELECT * FROM foo WHERE col1 = '7' ORDER BY b desc; +------+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ | 1 | SIMPLE | foo | ref | col1_2 | col1_2 | 302 | const | 1320 | Using where | +------+-------------+-------+------+---------------+--------+---------+-------+------+-------------+ 1 row in set (2.32 sec)
The query is doing a reverse range scan, as it should.
We get handler::idx_cond_push called, but end_range is not set. As a result, TokuDB thinks it can use index condition pushdown to filter rows. As we do this and get to the end, because end_range is not set, we never get a result of ICP_OUT_OF_RANGE, and always get a result of ICP_NO_MATCH. So, when we go to retrieve that first row past the end of the range, the row that will tell MySQL it should stop searching, TokuDB never finds a row and never gets ICP_NO_MATCH. It scans to the beginning of the index (because it is running in reverse order), getting ICP_NO_MATCH for every row it encounters.
Although my index is clustering, I've seen this with a normal key as well.
If col1 is an int instead of the funky varchar, handler::idx_cond_push is never called, so this problem does not exist.
It seems to me that if end_range is not set and we cannot reliably learn when we go out of range, we should not have handler::idx_cond_push called, otherwise we can get bad performance such as the example above.
Thoughts?
Thanks
--Zardosht
participants (1)
-
Zardosht Kasheff