Re: [Maria-developers] [Commits] Rev 2958: better fix for MySQL bugs in http://bazaar.launchpad.net/~maria-captains/maria/5.1/
Hi, Zardosht! First: please send questions like this to the list, not only to me. I wanted to discuss this with Sergey Petrunia and had to paste this on irc. Anyway, On Oct 22, Zardosht Kasheff wrote:
Hello Sergei,
One of the problems we at Tokutek have run into with the optimizer many times is that the optimizer does not always use these functions to estimate costs. I think I have found a couple of places in the code, and want to verify that my understanding is correct.
In best_access_path in sql_select.cc, I see the following code in two places: if (table->covering_keys.is_set(key)) { /* we can use only index tree */ uint keys_per_block= table->file->stats.block_size/2/ (keyinfo->key_length+table->file->ref_length)+1; tmp= record_count*(tmp+keys_per_block-1)/keys_per_block; } else tmp= record_count*min(tmp,s->worst_seeks);
is there a reason why tmp cannot be evaluated using the functions keyread_time and read_time? If so, is there any reason why MariaDB or MySQL cannot use the functions in this place?
I cannot fix it in 5.1, unfortunately. The code should probably be rewritten as if (table->covering_keys.is_set(key)) { tmp= record_count*table->file->keyread_time(key, 1, tmp); } else tmp= record_count*table->file->read_time(key, 1, min(tmp,s->worst_seeks)); but tmp is double, while the third argument of keyread_time and read_time is ha_rows - an integer. That is, this change causes a small precision loss, and it is enough for optimizer to start generating different query plans. I've got quite a few test failures because EXPLAIN output changed. We cannot do changes that affect query plans in a stable version. One thing I've learned in MySQL - no matter how correct and good such a change is, there will always be queries that it will affect adversely. And it won't matter that the new plan is correct, and the old plan was completely illogical and caused by a bug in the optimizer or something - they'll say that their queries were faster that way. I'll see about doing this change it 5.2 though. Regards, Sergei
Thanks Sergei, I will CC maria-developers from now on. I understand why this cannot go into 5.1. I was more interested in knowing if this is logically correct, which it seems to be. I hope in the long term (MariaDB 5.3 and beyond) that the MySQL optimizer moves more and more towards getting nearly all estimates from the handler, and it is something I will continue to investigate in my off time. I realize that this is something that cannot be done easily in one step, and if it is deemed important enough to do, may need to be spread over multiple releases. In the meantime, I am going to investigate what tmp means, and why the units are different (one is a double and the other is an integer). I have not done the homework to try to understand this yet. If I have any questions, I will post them. Thanks -Zardosht On Sun, Oct 24, 2010 at 2:35 PM, Sergei Golubchik <serg@askmonty.org> wrote:
Hi, Zardosht!
First: please send questions like this to the list, not only to me. I wanted to discuss this with Sergey Petrunia and had to paste this on irc.
Anyway,
On Oct 22, Zardosht Kasheff wrote:
Hello Sergei,
One of the problems we at Tokutek have run into with the optimizer many times is that the optimizer does not always use these functions to estimate costs. I think I have found a couple of places in the code, and want to verify that my understanding is correct.
In best_access_path in sql_select.cc, I see the following code in two places: if (table->covering_keys.is_set(key)) { /* we can use only index tree */ uint keys_per_block= table->file->stats.block_size/2/ (keyinfo->key_length+table->file->ref_length)+1; tmp= record_count*(tmp+keys_per_block-1)/keys_per_block; } else tmp= record_count*min(tmp,s->worst_seeks);
is there a reason why tmp cannot be evaluated using the functions keyread_time and read_time? If so, is there any reason why MariaDB or MySQL cannot use the functions in this place?
I cannot fix it in 5.1, unfortunately. The code should probably be rewritten as
if (table->covering_keys.is_set(key)) { tmp= record_count*table->file->keyread_time(key, 1, tmp); } else tmp= record_count*table->file->read_time(key, 1, min(tmp,s->worst_seeks));
but tmp is double, while the third argument of keyread_time and read_time is ha_rows - an integer. That is, this change causes a small precision loss, and it is enough for optimizer to start generating different query plans. I've got quite a few test failures because EXPLAIN output changed. We cannot do changes that affect query plans in a stable version. One thing I've learned in MySQL - no matter how correct and good such a change is, there will always be queries that it will affect adversely. And it won't matter that the new plan is correct, and the old plan was completely illogical and caused by a bug in the optimizer or something - they'll say that their queries were faster that way.
I'll see about doing this change it 5.2 though.
Regards, Sergei
Hi, Zardosht! On Oct 24, Zardosht Kasheff wrote:
Thanks Sergei,
I will CC maria-developers from now on.
I understand why this cannot go into 5.1. I was more interested in knowing if this is logically correct, which it seems to be. I hope in the long term (MariaDB 5.3 and beyond) that the MySQL optimizer moves more and more towards getting nearly all estimates from the handler, and it is something I will continue to investigate in my off time.
See http://lists.askmonty.org/pipermail/commits/2010-October/000626.html Regards, Sergei
participants (2)
-
Sergei Golubchik
-
Zardosht Kasheff