[Maria-developers] problem with partitioning and our storage engine in 5.2
Hello all, We have been working on testing our storage engine, TokuDB, against MariaDB 5.2.3, and we have encountered a problem with partitioning that does not exist on MySQL 5.1, MySQL 5.5, and MariaDB 5.1.50. This problem also does not exist with any other storage engine that we have tried. It ONLY exists with TokuDB and MariaDB 5.2. On MariaDB 5.2, the attached test produces the attached result file, that has incorrect results. Here is what the test is doing: 1. Create a table with some partitions 2. update the table 3. run select max(f_int1) 4. run select * 5. run select max(f_int1) again. The problem is that the query results for 3 are incorrect, even though the query results for 4 and 5 are correct. MySQL 5.1, MySQL 5.5, and MariaDB 5.1 produce the correct results for 3. What makes this even stranger is that the query results for 4 and 5 are always correct. Any ideas on how we can go about figuring out what is wrong? Thanks -Zardosht
Zardosht Kasheff <zardosht@gmail.com> writes:
We have been working on testing our storage engine, TokuDB, against MariaDB 5.2.3, and we have encountered a problem with partitioning that does not exist on MySQL 5.1, MySQL 5.5, and MariaDB 5.1.50. This problem also does not exist with any other storage engine that we have tried. It ONLY exists with TokuDB and MariaDB 5.2.
Here is what the test is doing:
1. Create a table with some partitions 2. update the table 3. run select max(f_int1) 4. run select * 5. run select max(f_int1) again.
The problem is that the query results for 3 are incorrect, even though the query results for 4 and 5 are correct. MySQL 5.1, MySQL 5.5, and MariaDB 5.1 produce the correct results for 3.
What makes this even stranger is that the query results for 4 and 5 are always correct.
It's hard to say without any code. Can you post link to launchpad tree or source tarball with which to reproduce?
From the information given, I would suggest to run the test with --valgrind, in case the problem is some uninitialised memory being referenced; sometimes Valgrind can help catch this.
- Kristian.
I think we found out the problem. In ha_partition.cc, in the function ha_partition::handle_ordered_index_scan, this code was added: case partition_index_last: /* MyISAM engine can fail if we call index_last() when indexes disabled that happens if the table is empty. Here we use file->stats.records instead of file->records() because file->records() is supposed to return an EXACT count, and it can be possibly slow. We don't need an exact number, an approximate one- from the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break; We always thought that stats.records was meant to be an estimate, and that an estimate of 0 was ok even if the table was non-empty. We were reporting that stats.records was 0 even though the table was non-empty. Is this assumption wrong? -Zardosht On Wed, Dec 8, 2010 at 2:42 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Zardosht Kasheff <zardosht@gmail.com> writes:
We have been working on testing our storage engine, TokuDB, against MariaDB 5.2.3, and we have encountered a problem with partitioning that does not exist on MySQL 5.1, MySQL 5.5, and MariaDB 5.1.50. This problem also does not exist with any other storage engine that we have tried. It ONLY exists with TokuDB and MariaDB 5.2.
Here is what the test is doing:
1. Create a table with some partitions 2. update the table 3. run select max(f_int1) 4. run select * 5. run select max(f_int1) again.
The problem is that the query results for 3 are incorrect, even though the query results for 4 and 5 are correct. MySQL 5.1, MySQL 5.5, and MariaDB 5.1 produce the correct results for 3.
What makes this even stranger is that the query results for 4 and 5 are always correct.
It's hard to say without any code. Can you post link to launchpad tree or source tarball with which to reproduce?
From the information given, I would suggest to run the test with --valgrind, in case the problem is some uninitialised memory being referenced; sometimes Valgrind can help catch this.
- Kristian.
Zardosht,
I think we found out the problem. In ha_partition.cc, in the function ha_partition::handle_ordered_index_scan, this code was added: case partition_index_last: /* MyISAM engine can fail if we call index_last() when indexes disabled that happens if the table is empty. Here we use file->stats.records instead of file->records() because file->records() is supposed to return an EXACT count, and it can be possibly slow. We don't need an exact number, an approximate one- from the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break;
We always thought that stats.records was meant to be an estimate, and that an estimate of 0 was ok even if the table was non-empty. We were reporting that stats.records was 0 even though the table was non-empty. Is this assumption wrong?
The assumption is not generally wrong, however there are two kinds of storage engines in this respect. The first kind promises to provide exact count of rows, MyISAM is an example. The second kind doesn't promise exact counts, for instance InnoDB. The optimizer checks this, and if it can rely on the exact count, then it optimizes count(*) whenever possible (for count(*) queries without GROUP BY). The relevant code is in opt_sum.cc. The function opt_sum_query performs this test: ... if (!(tl->table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) || tl->schema_table) ... The question is what is your engine's implementation of your_handler_class::ha_table_flags()? If you "lie" to the optimizer that the count is exact, but then treat is an approximate value, you'll get the above wrong result for sure. If this is not the problem, then please post a reproducible example. Timour
-Zardosht
On Wed, Dec 8, 2010 at 2:42 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Zardosht Kasheff<zardosht@gmail.com> writes:
We have been working on testing our storage engine, TokuDB, against MariaDB 5.2.3, and we have encountered a problem with partitioning that does not exist on MySQL 5.1, MySQL 5.5, and MariaDB 5.1.50. This problem also does not exist with any other storage engine that we have tried. It ONLY exists with TokuDB and MariaDB 5.2.
Here is what the test is doing:
1. Create a table with some partitions 2. update the table 3. run select max(f_int1) 4. run select * 5. run select max(f_int1) again.
The problem is that the query results for 3 are incorrect, even though the query results for 4 and 5 are correct. MySQL 5.1, MySQL 5.5, and MariaDB 5.1 produce the correct results for 3.
What makes this even stranger is that the query results for 4 and 5 are always correct.
It's hard to say without any code. Can you post link to launchpad tree or source tarball with which to reproduce?
From the information given, I would suggest to run the test with --valgrind, in case the problem is some uninitialised memory being referenced; sometimes Valgrind can help catch this.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Our storage engine does not lie to the optimizer. We do not set the HA_STATS_RECORDS_IS_EXACT flag. I do not know this code well enough to understand how the relevant code in opt_sum.cc results in the partition code I describe get executed. What I do not understand is this. Why does the following code at line 4734 of ha_partition.cc (at the bottom of this message) simply assume that if file->stats.records is 0, that it can blindly return HA_ERR_END_OF_FILE? I understand if file->ha_table_flags exposes HA_STATS_RECORDS_IS_EXACT, that it can do this, but that is not checked here. -Zardosht case partition_index_last: /* MyISAM engine can fail if we call index_last() when indexes disabled that happens if the table is empty. Here we use file->stats.records instead of file->records() because file->records() is supposed to return an EXACT count, and it can be possibly slow. We don't need an exact number, an approximate one- from the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break; } error= file->ha_index_last(rec_buf_ptr); reverse_order= TRUE; break; -Zardosht On Wed, Dec 8, 2010 at 5:34 PM, Timour Katchaounov <timour@askmonty.org> wrote:
Zardosht,
I think we found out the problem. In ha_partition.cc, in the function ha_partition::handle_ordered_index_scan, this code was added: case partition_index_last: /* MyISAM engine can fail if we call index_last() when indexes disabled that happens if the table is empty. Here we use file->stats.records instead of file->records() because file->records() is supposed to return an EXACT count, and it can be possibly slow. We don't need an exact number, an approximate one- from the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break;
We always thought that stats.records was meant to be an estimate, and that an estimate of 0 was ok even if the table was non-empty. We were reporting that stats.records was 0 even though the table was non-empty. Is this assumption wrong?
The assumption is not generally wrong, however there are two kinds of storage engines in this respect. The first kind promises to provide exact count of rows, MyISAM is an example. The second kind doesn't promise exact counts, for instance InnoDB. The optimizer checks this, and if it can rely on the exact count, then it optimizes count(*) whenever possible (for count(*) queries without GROUP BY).
The relevant code is in opt_sum.cc. The function opt_sum_query performs this test: ... if (!(tl->table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) || tl->schema_table) ...
The question is what is your engine's implementation of your_handler_class::ha_table_flags()? If you "lie" to the optimizer that the count is exact, but then treat is an approximate value, you'll get the above wrong result for sure.
If this is not the problem, then please post a reproducible example.
Timour
-Zardosht
On Wed, Dec 8, 2010 at 2:42 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Zardosht Kasheff<zardosht@gmail.com> writes:
We have been working on testing our storage engine, TokuDB, against MariaDB 5.2.3, and we have encountered a problem with partitioning that does not exist on MySQL 5.1, MySQL 5.5, and MariaDB 5.1.50. This problem also does not exist with any other storage engine that we have tried. It ONLY exists with TokuDB and MariaDB 5.2.
Here is what the test is doing:
1. Create a table with some partitions 2. update the table 3. run select max(f_int1) 4. run select * 5. run select max(f_int1) again.
The problem is that the query results for 3 are incorrect, even though the query results for 4 and 5 are correct. MySQL 5.1, MySQL 5.5, and MariaDB 5.1 produce the correct results for 3.
What makes this even stranger is that the query results for 4 and 5 are always correct.
It's hard to say without any code. Can you post link to launchpad tree or source tarball with which to reproduce?
From the information given, I would suggest to run the test with --valgrind, in case the problem is some uninitialised memory being referenced; sometimes Valgrind can help catch this.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Zardosht Kasheff <zardosht@gmail.com> writes:
What I do not understand is this. Why does the following code at line 4734 of ha_partition.cc (at the bottom of this message) simply assume that if file->stats.records is 0, that it can blindly return HA_ERR_END_OF_FILE? I understand if file->ha_table_flags exposes HA_STATS_RECORDS_IS_EXACT, that it can do this, but that is not checked here.
From sql/handler.h:
class ha_statistics { ... /* The number of records in the table. 0 - means the table has exactly 0 rows other - if (table_flags() & HA_STATS_RECORDS_IS_EXACT) the value is the exact number of records in the table else it is an estimate */ ha_rows records; So 0 is special, and must not be given as an estimate, according to this comment (which is also in MySQL 5.1). - Kristian.
This code was added as a fix for MySQL bug #38005 and then removed as a fix for bug #46639 The comment for the revision that removed the code is: Bug#46639: 1030 (HY000): Got error 124 from storage engine on INSERT ... SELECT ... Problem was that when bulk insert is used on an empty table/partition, it disables the indexes for better performance, but in this specific case it also tries to read from that partition using an index, which is not possible since it has been disabled. Solution was to allow index reads on disabled indexes if there are no records. Also reverted the patch for bug#38005, since that was a workaround in the partitioning engine instead of a fix in myisam. ----- Original Message ----- From: "Zardosht Kasheff" <zardosht@gmail.com> To: "Kristian Nielsen" <knielsen@knielsen-hq.org> Cc: <maria-developers@lists.launchpad.net> Sent: Wednesday, December 08, 2010 9:47 PM Subject: Re: [Maria-developers] problem with partitioning and our storage engine in 5.2 I think we found out the problem. In ha_partition.cc, in the function ha_partition::handle_ordered_index_scan, this code was added: case partition_index_last: /* MyISAM engine can fail if we call index_last() when indexes disabled that happens if the table is empty. Here we use file->stats.records instead of file->records() because file->records() is supposed to return an EXACT count, and it can be possibly slow. We don't need an exact number, an approximate one- from the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break; We always thought that stats.records was meant to be an estimate, and that an estimate of 0 was ok even if the table was non-empty. We were reporting that stats.records was 0 even though the table was non-empty. Is this assumption wrong? -Zardosht On Wed, Dec 8, 2010 at 2:42 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Zardosht Kasheff <zardosht@gmail.com> writes:
We have been working on testing our storage engine, TokuDB, against MariaDB 5.2.3, and we have encountered a problem with partitioning that does not exist on MySQL 5.1, MySQL 5.5, and MariaDB 5.1.50. This problem also does not exist with any other storage engine that we have tried. It ONLY exists with TokuDB and MariaDB 5.2.
Here is what the test is doing:
1. Create a table with some partitions 2. update the table 3. run select max(f_int1) 4. run select * 5. run select max(f_int1) again.
The problem is that the query results for 3 are incorrect, even though the query results for 4 and 5 are correct. MySQL 5.1, MySQL 5.5, and MariaDB 5.1 produce the correct results for 3.
What makes this even stranger is that the query results for 4 and 5 are always correct.
It's hard to say without any code. Can you post link to launchpad tree or source tarball with which to reproduce?
From the information given, I would suggest to run the test with --valgrind, in case the problem is some uninitialised memory being referenced; sometimes Valgrind can help catch this.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
"Philip Stoev" <pstoev@askmonty.org> writes:
This code was added as a fix for MySQL bug #38005 and then removed as a fix for bug #46639
Yes. Turns out that the removal part of Bug#46639 was lost in MariaDB in this merge: revno: 2732 [merge] revision-id: igor@askmonty.org-20091112043128-yd6odg8zr5ribjal parent: psergey@askmonty.org-20091104224158-nk2s2luvlqwa02bl parent: igor@askmonty.org-20091110023239-vgttyweq2qmh0y25 committer: Igor Babaev <igor@askmonty.org> branch nick: maria-5.2-vcol timestamp: Wed 2009-11-11 20:31:28 -0800 message: Merge of the patch introducing virtual columns into maria-5.2 This merge gives the following conflict (and two more like it) in sql/ha_partition.cc: <<<<<<< TREE /* MyISAM engine can fail if we call index_first() when indexes disabled that happens if the table is empty. Here we use file->stats.records instead of file->records() because file->records() is supposed to return an EXACT count, and it can be possibly slow. We don't need an exact number, an approximate one- from the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break; } error= file->ha_index_first(buf); ||||||| BASE-REVISION /* MyISAM engine can fail if we call index_first() when indexes disabled */ /* that happens if the table is empty. */ /* Here we use file->stats.records instead of file->records() because */ /* file->records() is supposed to return an EXACT count, and it can be */ /* possibly slow. We don't need an exact number, an approximate one- from*/ /* the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break; } error= file->index_first(buf); ======= error= file->index_first(buf);
> MERGE-SOURCE
In MariaDB 5.2, these conflicts were resolved by keeping the lines deleted in the patch, this reverting this part of the patch for Bug#46639. Igor, did you deliberately revert the of the patch like this, or is it just a mistake during merge? If a mistake, I can push a correction for this to 5.2. If deliberate, should the reverting also be done in MariaDB 5.1, or is it only necessary for 5.2+? - Kristian. [Note that for the tokudb problem, I believe they still need to fix their engine to not return zero estimate as per comments in handler.h. This appears to be just a symptom.]
Kristian Nielsen wrote:
"Philip Stoev" <pstoev@askmonty.org> writes:
This code was added as a fix for MySQL bug #38005 and then removed as a fix for bug #46639
Yes.
Turns out that the removal part of Bug#46639 was lost in MariaDB in this merge:
revno: 2732 [merge] revision-id: igor@askmonty.org-20091112043128-yd6odg8zr5ribjal parent: psergey@askmonty.org-20091104224158-nk2s2luvlqwa02bl parent: igor@askmonty.org-20091110023239-vgttyweq2qmh0y25 committer: Igor Babaev <igor@askmonty.org> branch nick: maria-5.2-vcol timestamp: Wed 2009-11-11 20:31:28 -0800 message: Merge of the patch introducing virtual columns into maria-5.2
This merge gives the following conflict (and two more like it) in sql/ha_partition.cc:
<<<<<<< TREE /* MyISAM engine can fail if we call index_first() when indexes disabled that happens if the table is empty. Here we use file->stats.records instead of file->records() because file->records() is supposed to return an EXACT count, and it can be possibly slow. We don't need an exact number, an approximate one- from the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break; } error= file->ha_index_first(buf); ||||||| BASE-REVISION /* MyISAM engine can fail if we call index_first() when indexes disabled */ /* that happens if the table is empty. */ /* Here we use file->stats.records instead of file->records() because */ /* file->records() is supposed to return an EXACT count, and it can be */ /* possibly slow. We don't need an exact number, an approximate one- from*/ /* the last ::info() call - is sufficient. */ if (file->stats.records == 0) { error= HA_ERR_END_OF_FILE; break; } error= file->index_first(buf); ======= error= file->index_first(buf);
>> MERGE-SOURCE
In MariaDB 5.2, these conflicts were resolved by keeping the lines deleted in the patch, this reverting this part of the patch for Bug#46639.
Igor, did you deliberately revert the of the patch like this, or is it just a mistake during merge?
If a mistake, I can push a correction for this to 5.2.
If deliberate, should the reverting also be done in MariaDB 5.1, or is it only necessary for 5.2+?
Probably it was a mistake. Regards, Igor.
- Kristian.
[Note that for the tokudb problem, I believe they still need to fix their engine to not return zero estimate as per comments in handler.h. This appears to be just a symptom.]
Hi, Zardosht! On Dec 08, Zardosht Kasheff wrote:
We always thought that stats.records was meant to be an estimate, and that an estimate of 0 was ok even if the table was non-empty. We were reporting that stats.records was 0 even though the table was non-empty. Is this assumption wrong?
Yes, it is. stats.records is an estimate (unless you set HA_STATS_RECORDS_IS_EXACT) as long as it's not 0. The value of 0 is always exact, that is if stats.records is 0, MySQL considers the table to be empty, irrespectively from HA_STATS_RECORDS_IS_EXACT. If you aren't sure, return 1. Regards, Sergei
Thank you to everybody for the clarification. We will fix our engine to not set 0 for stats.records unless we are absolutely sure it is empty. -Zardosht On Thu, Dec 9, 2010 at 7:52 AM, Sergei Golubchik <serg@askmonty.org> wrote:
Hi, Zardosht!
On Dec 08, Zardosht Kasheff wrote:
We always thought that stats.records was meant to be an estimate, and that an estimate of 0 was ok even if the table was non-empty. We were reporting that stats.records was 0 even though the table was non-empty. Is this assumption wrong?
Yes, it is. stats.records is an estimate (unless you set HA_STATS_RECORDS_IS_EXACT) as long as it's not 0. The value of 0 is always exact, that is if stats.records is 0, MySQL considers the table to be empty, irrespectively from HA_STATS_RECORDS_IS_EXACT.
If you aren't sure, return 1.
Regards, Sergei
participants (6)
-
Igor Babaev
-
Kristian Nielsen
-
Philip Stoev
-
Sergei Golubchik
-
Timour Katchaounov
-
Zardosht Kasheff