[Maria-developers] Proposals for improving BKA/MRR visibility in EXPLAIN and diagnostics
Hello, Below are some ideas on how to make DS-MRR/BKA easier to work with for the users (= those who don't run mysqld under debugger). Questions are marked with 'Q:' but any comments are welcome. Better EXPLAIN -------------- Philip has complained numerous times that it is not shown in EXPLAIN whether DS-MRR will use key sorting. Currently, MRR alone is show like this: MariaDB [test]> explain select * from t1 where key1<30; +----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+ | 1 | SIMPLE | t1 | range | key1 | key1 | 5 | NULL | 1 | Using index condition; Using MRR | +----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+ MRR with BKA are shown like this: MariaDB [test]> explain select * from t2,t1 where t2.a=t1.key1; +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------+ | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | key1 | key1 | 5 | test.t2.a | 1 | Using join buffer (flat, BKA join) | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------+ The suggestion is to - get rid of "Using MRR" word, - instead, show "Sort rowids" and/or "Sort keys". - Show the above two whenever rowid-ordered and/or key-ordered retrieval is performed. The above examples will look as follows: MariaDB [test]> explain select * from t1 where key1<30; +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ | 1 | SIMPLE | t1 | range | key1 | key1 | 5 | NULL | 1 | Using index condition; Sort keys; Sort rowids | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ MRR with BKA are shown like this: MariaDB [test]> explain select * from t2,t1 where t2.a=t1.key1; +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | key1 | key1 | 5 | test.t2.a | 1 | Using join buffer (flat, BKA join); Sort keys; Sort rowids | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ Q: any comments? Counters -------- Like it is done with other kinds of table accesses, we want to have counter-based way of analyzing of what has been happening around MRR. The most obvious are: 1. Handler_multi_range_read_init_count status variable This will tell how many multi_range_read_init() calls have been made that used non-default MRR implementation, i.e. one will be able to see how many times real MRR scans were performed. Q: this counter doesn't show how many times key sorting/rowid sorting/both strategies were used. Is that ok? One could argue that information about strategy choice is not in high demand as MRR strategy choice is based mostly on system settings and DDLs. 2. Handler_multi_range_read_next_count status variable This will tell how many records were returned by MRR to the upper layer. Q: MRR does index and rnd_pos scans under the hood, and these scans do increase counters. This means that, for a single row returned by MRR, multiple counters will be incremented. Less obvious suggestions: 3. It would be useful to have an idea about whether DS-MRR had sufficient buffer space to operate. One can get a rough picture by adding counters of key/rowid sort operations: - Handler_mrr_rowid_sort_count - Handler_mrr_key_sort_count If Handler_mrr_key_sort_count == Handler_multi_range_read_init_count then we've had enough buffer space. If key_sort_count is two times greater than the number of init operations, then one could conclude that on average two buffer refills were needed and increasing buffer size up to 2x would be beneficial. 4. Other possible things - Index condition pushdown check count, true/false ratio - Some statistics to globally check if the optimizer's guesses about required sizes for key/rowid parts of the buffer were any good. Q: do we see the need for any of the above? BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
The above examples will look as follows:
MariaDB [test]> explain select * from t1 where key1<30; +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ | 1 | SIMPLE | t1 | range | key1 | key1 | 5 | NULL | 1 | Using index condition; Sort keys; Sort rowids | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+
MRR with BKA are shown like this: MariaDB [test]> explain select * from t2,t1 where t2.a=t1.key1; +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | key1 | key1 | 5 | test.t2.a | 1 | Using join buffer (flat, BKA join); Sort keys; Sort rowids | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ Q: any comments?
That is totally fine with me.
Less obvious suggestions:
3. It would be useful to have an idea about whether DS-MRR had sufficient buffer space to operate. One can get a rough picture by adding counters of key/rowid sort operations: - Handler_mrr_rowid_sort_count - Handler_mrr_key_sort_count
If Handler_mrr_key_sort_count == Handler_multi_range_read_init_count then we've had enough buffer space. If key_sort_count is two times greater than the number of init operations, then one could conclude that on average two buffer refills were needed and increasing buffer size up to 2x would be beneficial.
I would say that instead there must be a variable that shows the number of buffer refills. This way the user gets a single variable to monitor, without having to do any math. Then, the performance advice becomes "watch this variable and try to minimize it by using a larger buffer", the way we currently say "watch the .ee.g threads created variable and try to minimize it by using a larger thread pool". Philip Stoevb
Hi!
"Philip" == Philip Stoev <pstoev@askmonty.org> writes:
<cut>
If Handler_mrr_key_sort_count == Handler_multi_range_read_init_count then we've had enough buffer space. If key_sort_count is two times greater than the number of init operations, then one could conclude that on average two buffer refills were needed and increasing buffer size up to 2x would be beneficial.
Philip> I would say that instead there must be a variable that shows the number of Philip> buffer refills. This way the user gets a single variable to monitor, without Philip> having to do any math. Then, the performance advice becomes "watch this Philip> variable and try to minimize it by using a larger buffer", the way we Philip> currently say "watch the .ee.g threads created variable and try to minimize Philip> it by using a larger thread pool". Agree that having a variable (maybe even in addition to the other suggested one) for buffer refills would be a good thing! Regards, Monty
Added feedback provided by Igor over skype: On Wed, Dec 15, 2010 at 06:31:52PM +0300, Sergey Petrunya wrote:
Below are some ideas on how to make DS-MRR/BKA easier to work with for the users (= those who don't run mysqld under debugger). Questions are marked with 'Q:' but any comments are welcome.
Better EXPLAIN -------------- Philip has complained numerous times that it is not shown in EXPLAIN whether DS-MRR will use key sorting.
Currently, MRR alone is show like this:
MariaDB [test]> explain select * from t1 where key1<30; +----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+ | 1 | SIMPLE | t1 | range | key1 | key1 | 5 | NULL | 1 | Using index condition; Using MRR | +----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+
MRR with BKA are shown like this: MariaDB [test]> explain select * from t2,t1 where t2.a=t1.key1; +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------+ | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | key1 | key1 | 5 | test.t2.a | 1 | Using join buffer (flat, BKA join) | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------+
The suggestion is to - get rid of "Using MRR" word, - instead, show "Sort rowids" and/or "Sort keys". - Show the above two whenever rowid-ordered and/or key-ordered retrieval is performed.
The above examples will look as follows:
MariaDB [test]> explain select * from t1 where key1<30; +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ | 1 | SIMPLE | t1 | range | key1 | key1 | 5 | NULL | 1 | Using index condition; Sort keys; Sort rowids | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+ The above will not have "Sort keys", key sorting is done only when working with BKA.
MRR with BKA are shown like this: MariaDB [test]> explain select * from t2,t1 where t2.a=t1.key1; +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | key1 | key1 | 5 | test.t2.a | 1 | Using join buffer (flat, BKA join); Sort keys; Sort rowids | +----+-------------+-------+------+---------------+------+---------+-----------+------+------------------------------------------------------------+ Q: any comments?
Counters -------- Like it is done with other kinds of table accesses, we want to have counter-based way of analyzing of what has been happening around MRR.
The counters should also be collected as per-table statistics (probably that happens automatically, but we'll need to check that)
The most obvious are:
1. Handler_multi_range_read_init_count status variable This will tell how many multi_range_read_init() calls have been made that used non-default MRR implementation, i.e. one will be able to see how many times real MRR scans were performed. Q: this counter doesn't show how many times key sorting/rowid sorting/both strategies were used. Is that ok? One could argue that information about strategy choice is not in high demand as MRR strategy choice is based mostly on system settings and DDLs.
The name is too long and hard to remember. Tentative suggestion: "Handler_mrr_init_count", or "Handler_mrr_refill_count".
2. Handler_multi_range_read_next_count status variable This will tell how many records were returned by MRR to the upper layer. Q: MRR does index and rnd_pos scans under the hood, and these scans do increase counters. This means that, for a single row returned by MRR, multiple counters will be incremented.
Less obvious suggestions:
3. It would be useful to have an idea about whether DS-MRR had sufficient buffer space to operate. One can get a rough picture by adding counters of key/rowid sort operations: - Handler_mrr_rowid_sort_count - Handler_mrr_key_sort_count
If Handler_mrr_key_sort_count == Handler_multi_range_read_init_count then we've had enough buffer space. If key_sort_count is two times greater than the number of init operations, then one could conclude that on average two buffer refills were needed and increasing buffer size up to 2x would be beneficial.
4. Other possible things - Index condition pushdown check count, true/false ratio - Some statistics to globally check if the optimizer's guesses about required sizes for key/rowid parts of the buffer were any good. Q: do we see the need for any of the above?
BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
participants (3)
-
Michael Widenius
-
Philip Stoev
-
Sergey Petrunya