Hi, Monty! Thanks for the review! See my replies below. On Jun 03, Michael Widenius wrote:
At http://bazaar.launchpad.net/~maria-captains/maria/5.2/ ------------------------------------------------------------ revno: 2802
few small MySQL bugs/issues that impact the engines, as discussed in the SE summit * remove handler::index_read_last() * create handler::keyread_read_time() (was get_index_only_read_time() in opt_range.cc) * ha_show_status() allows engine's show_status() to fail * remove HTON_FLUSH_AFTER_RENAME * fix key_cmp_if_same() to work for floats and doubles * set table->status in the server, don't force engines to do it * increment status vars in the server, don't force engines to do it
+++ b/mysql-test/r/status_user.result 2010-06-01 22:39:29 +0000 @@ -100,8 +100,8 @@ Handler_commit 19 Handler_delete 1 Handler_discover 0 Handler_prepare 18 -Handler_read_first 1 -Handler_read_key 8 +Handler_read_first 0 +Handler_read_key 3
Any explanation why this change happened (as the test didn't change and I can't understand how the values could suddently be less now).
This change is correct. Before my commit, calls were counted in the handler, say, in the index_first() and index_next(). And ha_innobase::rnd_next() is implemented by calling index_first/index_next. So, innodb was incrementing Handler_read_first and Handler_read_next for table scans (and of course it was incrementing Handler_read_rnd_next too - double counting). It was wrong - first, it was double counting. Second, Handler_* should count handler calls as done by mysql, not expose internal implementation of the engine. For example, mi_rfirst() calls mi_rnext() internally, but we don't count it as Handler_read_next. The same should be true for any engine, even if it mixes implementation levels.
By the way, it would be nice if the file comments would be part of the commit email (as I assume you documented this issue there).
I have not :( But I will, when I recommit.
+++ b/mysql-test/r/partition_pruning.result 2010-06-01 22:39:29 +0000 @@ -2373,7 +2373,7 @@ flush status; update t1 set a=100 where a+1=5+1; show status like 'Handler_read_rnd_next'; Variable_name Value -Handler_read_rnd_next 10 +Handler_read_rnd_next 19
Any explanation why this change happened (as the test didn't change) Is it because we don't anymore count rows read in 'show' commands?
This is questionable change, I wanted to discuss it. ha_partition::index_next (for example) calls underlying engine's file->ha_index_next(), not file->index_next(). After my change Handler_read_key_next is incremented for both ha_partition::index_next and file->index_next(). Double counting. Before my change when a partition was pruned, Handler_read_key* counters were not incremented at all (as ha_partition::index_read did not call file->index_read() at all). Now it is incremented - that's why the numbers are increased. Possible solutions: * do not increment Handler_read* counters for ha_partition methods, only count calls to the underlying engines. * do not increment Handler_read* counters for underlying engines - only count calls from the upper layer into the handler, this is logical but counters won't show partition pruning or handler call overhead caused by many partitions. this can be solved by adding special set of counters Handler_partition_read_* (or something).
=== modified file 'sql/handler.cc' --- a/sql/handler.cc 2010-06-01 19:52:20 +0000 +++ b/sql/handler.cc 2010-06-01 22:39:29 +0000 @@ -2131,8 +2125,6 @@ int handler::read_first_row(uchar * buf, register int error; DBUG_ENTER("handler::read_first_row");
- ha_statistic_increment(&SSV::ha_read_first_count);
The above is wrong; We are later calling 'index_first()' in this function, not ha_index_first(), so we miss one increment (which was shown in the test cases). Note that we do also call rnd_next() in this function, without any counting of rows so we need to fix other things in this function too!
That's fine, the counter is incremented in ha_read_first_row() wrapper. If anything, the old code was wrong as it was incrementing ha_read_first_count twice (once here and once in index_first).
Simplest solution is to change to call ha_index_first / ha_rnd_next() in this function. This will also fix the 'table->status' variable that your are not counting anymore. This should be ok as we very seldom use 'handler::read_first_row()'
see above. I think it's ok to just increment ha_read_first_count in the wrapper. Especially because read_first_row() is rarely used.
Note that we should do same change in other functions that are calling handler functions directly:
handler::read_range_first - This calls index_first() and index_read_map() get_auto_increment() - This calls index_last() and index_read_map() index_read_idx_map() - This calls index_read_map() - Note that we can't trivially change this to call ha_index_read_map() as we increment things statistics in ha_index_read_idx_map() - We need to update table->status in this function!
Yes. But read_range_first() for example has no dedicated counter, so either it increments Handler_read_key* counters in the default implementation, or it increments nothing at all when any engine provides its own implementation :(
+/* + Calculate cost of 'index only' scan for given index and number of records. + + SYNOPSIS + handler->keyread_read_time() + param parameters structure + records #of records to read + keynr key to read + + NOTES + It is assumed that we will read trough the whole key range and that all + key blocks are half full (normally things are much better). It is also + assumed that each time we read the next key from the index, the handler + performs a random seek, thus the cost is proportional to the number of + blocks read. +*/ + +double handler::keyread_read_time(uint index, uint ranges, ha_rows rows) +{ + double read_time; + uint keys_per_block= (stats.block_size/2/ + (table->key_info[index].key_length + ref_length) + 1); + read_time=((double) (rows+keys_per_block-1)/ (double) keys_per_block); + return read_time; +}
Do we really need the 'ranges' argument ? (It's always '1' in the current code and you are not using it)
I don't know :) I've copied it from the handler::read_time(), just to have the interface the same for consistency. After all - logically - if the read_time() may depend on the number of ranges, keyread_read_time() certainly can do too.
=== modified file 'sql/key.cc' --- a/sql/key.cc 2008-10-10 10:01:01 +0000 +++ b/sql/key.cc 2010-06-01 22:39:29 +0000 @@ -278,8 +278,10 @@ bool key_cmp_if_same(TABLE *table,const key++; store_length--; } - if (key_part->key_part_flag & (HA_BLOB_PART | HA_VAR_LENGTH_PART | - HA_BIT_PART)) + if ((key_part->key_part_flag & (HA_BLOB_PART | HA_VAR_LENGTH_PART | + HA_BIT_PART)) || + key_part->type == HA_KEYTYPE_FLOAT || + key_part->type == HA_KEYTYPE_DOUBLE) { if (key_part->field->key_cmp(key, key_part->length)) return 1;
I understand that for float and double there is some extraordinary cases where memcmp() is not same as =, but who has had a problem with this?
there was a bug report in mysql bugdb. http://bugs.mysql.com/bug.php?id=44372
As a separate note, I think it would be better to add to key_part_flag HA_NO_CMP_WITH_MEMCMP for key_parts of type FLOAT or DOUBLE when we open the table. This would simplify this test a bit.
I'll try to
=== modified file 'sql/table.h' --- a/sql/table.h 2010-06-01 19:52:20 +0000 +++ b/sql/table.h 2010-06-01 22:39:29 +0000 @@ -13,6 +13,8 @@ along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
+#ifndef SQL_TABLE_INCLUDED +#define SQL_TABLE_INCLUDED
Do we really need this one as it's automaticly included by mysql_priv.h ? Anyway, it should probably be MYSQL_TABLE_H to be similar our other defines.
This is actually unrelated change, I tried to include table.h to handler.h (to solve the problem of inline handler methods needing TABLE) and had to add include guards. later I solved the problem differently but kept the guards as they're a good thing anyway. As for the name of the guard, it's new (~1 yr old) MySQL style. As I personally don't care about the name of the guards, as long as they all use a consistent style, I use the MySQL naming style here.
=== modified file 'storage/myisam/ha_myisam.cc'
<cut>
int ha_myisam::index_next(uchar *buf) { DBUG_ASSERT(inited==INDEX); - ha_statistic_increment(&SSV::ha_read_next_count); int error=mi_rnext(file,buf,active_index); table->status=error ? STATUS_NOT_FOUND: 0;
you should probably remove the setting of table->status here
Neither updating table->status not ha_statistic_increment() can hurt here, and as you have seen I've not updated any other engine at all. I've only did it in MyISAM to check that the change works, the code compiles, test results don't change, and so on. But I'll remove table->status updates from MyISAM.
The whole function can the be changed to:
return mi_rnext(file,buf,active_index);
Same goes for all other instances of setting table->status in this file
Regards, Sergei