Re: [Maria-developers] Optimization about count(*) of mysql 5.1.38
Hi, Lichao!
On Jan 10, Lichao Xie wrote:
hi all: I have a question about the optimization of count(*) in a range. Sql for mysql like:select count(*) from t where id > 10 and id < 1000000; In table t, there is an B-tree index on id field, this sql will read hundreds of thousands of records from the storage engine, that is a time-consuming operation. I am developing a storage engine, I want to know, is there a good method to caculate the records in a range, rather than read all records or keys in the range?
Yes, see how MyISAM does it - it's pretty straightforward, and only requires two index lookups.
MyISAM seemingly read all keys in the range. And I test MyISAM table use mysqlslap, the test case is: ./mysqlslap --query="select count(xn_id) as num from roman1.r_paihang where dengji > 4 and dengji < 11" --number-of-queries=10000000 --concurrency=30 -uroot --host=10.134.11.17 --port=10001 the result is : 10.134.11.17:/data/mqq_data$ mysqladmin extended-status -i 1 -r -uroot -S /tmp/mysql-tdmdb-14.sock | grep Com_select | Com_select | 41115084 | | Com_select | 9 | | Com_select | 14 | | Com_select | 23 | | Com_select | 25 | | Com_select | 36 | | Com_select | 8 | | Com_select | 26 | | Com_select | 34 | | Com_select | 24 | | Com_select | 16 | | Com_select | 10 | | Com_select | 8 | the table scheme is: CREATE TABLE `r_paihang` ( `xn_id` int(11) NOT NULL DEFAULT '0', `company` int(8) DEFAULT NULL, `dengji` int(6) DEFAULT NULL, `position` tinyint(3) DEFAULT '1', `tili` int(11) DEFAULT '0', `zhili` int(11) DEFAULT '0', `meili` int(11) DEFAULT '0', `worth` int(11) DEFAULT '0', `sex` int(1) DEFAULT '0', `online_time` int(11) DEFAULT NULL, `sign_time` datetime DEFAULT NULL, `last_time` datetime DEFAULT NULL, `yq_id` int(12) NOT NULL DEFAULT '0', PRIMARY KEY (`xn_id`), KEY `company` (`company`), KEY `worth` (`worth`), KEY `zhili` (`zhili`), KEY `position` (`position`), KEY `meili` (`meili`), KEY `tili` (`tili`), KEY `sex` (`sex`), KEY `dengji` (`dengji`), KEY `sign_time` (`sign_time`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; the case runs too slow.... If there is a storage engine api like records_in_range() to get the exact records in the range from the storage engine, the result will be 10^4 qps for this case...
Regards, Sergei
2011/1/20 Sergei Golubchik <serg@askmonty.org>:
Hi, Lichao!
Please send questions like this to maria-developers@lists.launchpad.net. In your replies make sure send replies to the list, e.g. use "Reply to All". It's a public mailing list dedicated to MySQL and MariaDB internals, source code, and related things. I am subscribed, so I'll see you mail there, and you may be sure I will, because it won't be accidentally catched by my spam filter, or sorted out in some obscure folder. Furthermore other subscribers will see your question and could reply if I will be not available (e.g. I could be travelling). Thank you.
Sarry, my fault! I am resending the email.
On Jan 20, Lichao Xie wrote:
Hi, Sergei
2011/1/20 Sergei Golubchik <serg@askmonty.org>:
On Jan 10, Lichao Xie wrote:
hi all: I have a question about the optimization of count(*) in a range. Sql for mysql like:select count(*) from t where id > 10 and id < 1000000; In table t, there is an B-tree index on id field, this sql will read hundreds of thousands of records from the storage engine, that is a time-consuming operation. I am developing a storage engine, I want to know, is there a good method to caculate the records in a range, rather than read all records or keys in the range?
Yes, see how MyISAM does it - it's pretty straightforward, and only requires two index lookups.
Regards, Sergei
Hi, Lichao! On Jan 20, Lichao Xie wrote:
hi all: I have a question about the optimization of count(*) in a range. Sql for mysql like:select count(*) from t where id > 10 and id < 1000000; In table t, there is an B-tree index on id field, this sql will read hundreds of thousands of records from the storage engine, that is a time-consuming operation. I am developing a storage engine, I want to know, is there a good method to caculate the records in a range, rather than read all records or keys in the range?
Yes, see how MyISAM does it - it's pretty straightforward, and only requires two index lookups.
MyISAM seemingly read all keys in the range. And I test MyISAM table use mysqlslap, the test case is:
Right. For the COUNT(*) query it does - MySQL does, there is no way around it.
the case runs too slow.... If there is a storage engine api like records_in_range() to get the exact records in the range from the storage engine, the result will be 10^4 qps for this case...
No, there is no such a API. The best you can do in your engine is to pay attention to TABLE::read_set and HA_EXTRA_KEYREAD which will allow you to reduce the amount of work you do per key. But still it will be the complete range scan - from the lower to the upper boundary or the range. Regards, Sergei
Hi Sergei 2011/1/21 Sergei Golubchik <serg@askmonty.org>:
Hi, Lichao!
On Jan 20, Lichao Xie wrote:
hi all: I have a question about the optimization of count(*) in a range. Sql for mysql like:select count(*) from t where id > 10 and id < 1000000; In table t, there is an B-tree index on id field, this sql will read hundreds of thousands of records from the storage engine, that is a time-consuming operation. I am developing a storage engine, I want to know, is there a good method to caculate the records in a range, rather than read all records or keys in the range?
Yes, see how MyISAM does it - it's pretty straightforward, and only requires two index lookups.
MyISAM seemingly read all keys in the range. And I test MyISAM table use mysqlslap, the test case is:
Right. For the COUNT(*) query it does - MySQL does, there is no way around it.
the case runs too slow.... If there is a storage engine api like records_in_range() to get the exact records in the range from the storage engine, the result will be 10^4 qps for this case...
No, there is no such a API. The best you can do in your engine is to pay attention to TABLE::read_set and HA_EXTRA_KEYREAD which will allow you to reduce the amount of work you do per key. But still it will be the complete range scan - from the lower to the upper boundary or the range.
Yes. But I want to know that with this use is based on what consideration? the lock or something else?
Regards, Sergei
Hi, Lichao! On Jan 21, Lichao Xie wrote:
No, there is no such a API. The best you can do in your engine is to pay attention to TABLE::read_set and HA_EXTRA_KEYREAD which will allow you to reduce the amount of work you do per key. But still it will be the complete range scan - from the lower to the upper boundary or the range.
Yes. But I want to know that with this use is based on what consideration? the lock or something else?
I am sorry, I cannot understand you. Could you try to rephrase your question please? Regards, Sergei
Hi, Sergei! I'm sorry, my English is not so good! There is no a API to get the exact records in a range, but the API can help to improve performance for the case: select count(*) from t where id > 10 and id < 1000000; I want to know why there is no such a API. Thank you very much! 2011/1/24 Sergei Golubchik <serg@askmonty.org>:
Hi, Lichao!
On Jan 21, Lichao Xie wrote:
No, there is no such a API. The best you can do in your engine is to pay attention to TABLE::read_set and HA_EXTRA_KEYREAD which will allow you to reduce the amount of work you do per key. But still it will be the complete range scan - from the lower to the upper boundary or the range.
Yes. But I want to know that with this use is based on what consideration? the lock or something else?
I am sorry, I cannot understand you. Could you try to rephrase your question please?
Regards, Sergei
Hi, Lichao! On Jan 24, Lichao Xie wrote:
Hi, Sergei! I'm sorry, my English is not so good! There is no a API to get the exact records in a range, but the API can help to improve performance for the case: select count(*) from t where id > 10 and id < 1000000; I want to know why there is no such a API.
Because we did not need it so far - no engine had it. And because the set of queries that it can optimize is pretty small, so it most cases this feature is not worth implementing. Regards, Sergei
Hi, Sergei! 2011/1/24 Sergei Golubchik <serg@askmonty.org>:
Hi, Lichao!
On Jan 24, Lichao Xie wrote:
Hi, Sergei! I'm sorry, my English is not so good! There is no a API to get the exact records in a range, but the API can help to improve performance for the case: select count(*) from t where id > 10 and id < 1000000; I want to know why there is no such a API.
Because we did not need it so far - no engine had it.
And because the set of queries that it can optimize is pretty small, so it most cases this feature is not worth implementing.
Well. However, we had encountered this problem. The application submits a lot of queries like this case. Then the mysql server and storage engine's loads had been very high. Although, we have used cache to reduce the number of this query, the result have been not very good. So, we have modified sql/opt_sum.cc to get the exact records in a range. Of course, it has effects only on our storage engine. Present seemed to work well.
Regards, Sergei
participants (2)
-
Lichao Xie(谢立超)
-
Sergei Golubchik