Hi, Lichao!
On Jan 10, Lichao Xie wrote:
hi all: I have a question about the optimization of count(*) in a range. Sql for mysql like:select count(*) from t where id > 10 and id < 1000000; In table t, there is an B-tree index on id field, this sql will read hundreds of thousands of records from the storage engine, that is a time-consuming operation. I am developing a storage engine, I want to know, is there a good method to caculate the records in a range, rather than read all records or keys in the range?
Yes, see how MyISAM does it - it's pretty straightforward, and only requires two index lookups.
MyISAM seemingly read all keys in the range. And I test MyISAM table use mysqlslap, the test case is: ./mysqlslap --query="select count(xn_id) as num from roman1.r_paihang where dengji > 4 and dengji < 11" --number-of-queries=10000000 --concurrency=30 -uroot --host=10.134.11.17 --port=10001 the result is : 10.134.11.17:/data/mqq_data$ mysqladmin extended-status -i 1 -r -uroot -S /tmp/mysql-tdmdb-14.sock | grep Com_select | Com_select | 41115084 | | Com_select | 9 | | Com_select | 14 | | Com_select | 23 | | Com_select | 25 | | Com_select | 36 | | Com_select | 8 | | Com_select | 26 | | Com_select | 34 | | Com_select | 24 | | Com_select | 16 | | Com_select | 10 | | Com_select | 8 | the table scheme is: CREATE TABLE `r_paihang` ( `xn_id` int(11) NOT NULL DEFAULT '0', `company` int(8) DEFAULT NULL, `dengji` int(6) DEFAULT NULL, `position` tinyint(3) DEFAULT '1', `tili` int(11) DEFAULT '0', `zhili` int(11) DEFAULT '0', `meili` int(11) DEFAULT '0', `worth` int(11) DEFAULT '0', `sex` int(1) DEFAULT '0', `online_time` int(11) DEFAULT NULL, `sign_time` datetime DEFAULT NULL, `last_time` datetime DEFAULT NULL, `yq_id` int(12) NOT NULL DEFAULT '0', PRIMARY KEY (`xn_id`), KEY `company` (`company`), KEY `worth` (`worth`), KEY `zhili` (`zhili`), KEY `position` (`position`), KEY `meili` (`meili`), KEY `tili` (`tili`), KEY `sex` (`sex`), KEY `dengji` (`dengji`), KEY `sign_time` (`sign_time`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; the case runs too slow.... If there is a storage engine api like records_in_range() to get the exact records in the range from the storage engine, the result will be 10^4 qps for this case...
Regards, Sergei
2011/1/20 Sergei Golubchik <serg@askmonty.org>:
Hi, Lichao!
Please send questions like this to maria-developers@lists.launchpad.net. In your replies make sure send replies to the list, e.g. use "Reply to All". It's a public mailing list dedicated to MySQL and MariaDB internals, source code, and related things. I am subscribed, so I'll see you mail there, and you may be sure I will, because it won't be accidentally catched by my spam filter, or sorted out in some obscure folder. Furthermore other subscribers will see your question and could reply if I will be not available (e.g. I could be travelling). Thank you.
Sarry, my fault! I am resending the email.
On Jan 20, Lichao Xie wrote:
Hi, Sergei
2011/1/20 Sergei Golubchik <serg@askmonty.org>:
On Jan 10, Lichao Xie wrote:
hi all: I have a question about the optimization of count(*) in a range. Sql for mysql like:select count(*) from t where id > 10 and id < 1000000; In table t, there is an B-tree index on id field, this sql will read hundreds of thousands of records from the storage engine, that is a time-consuming operation. I am developing a storage engine, I want to know, is there a good method to caculate the records in a range, rather than read all records or keys in the range?
Yes, see how MyISAM does it - it's pretty straightforward, and only requires two index lookups.
Regards, Sergei