Hi, Lichao! On Jan 20, Lichao Xie wrote:
hi all: I have a question about the optimization of count(*) in a range. Sql for mysql like:select count(*) from t where id > 10 and id < 1000000; In table t, there is an B-tree index on id field, this sql will read hundreds of thousands of records from the storage engine, that is a time-consuming operation. I am developing a storage engine, I want to know, is there a good method to caculate the records in a range, rather than read all records or keys in the range?
Yes, see how MyISAM does it - it's pretty straightforward, and only requires two index lookups.
MyISAM seemingly read all keys in the range. And I test MyISAM table use mysqlslap, the test case is:
Right. For the COUNT(*) query it does - MySQL does, there is no way around it.
the case runs too slow.... If there is a storage engine api like records_in_range() to get the exact records in the range from the storage engine, the result will be 10^4 qps for this case...
No, there is no such a API. The best you can do in your engine is to pay attention to TABLE::read_set and HA_EXTRA_KEYREAD which will allow you to reduce the amount of work you do per key. But still it will be the complete range scan - from the lower to the upper boundary or the range. Regards, Sergei