Re: [Maria-developers] [Commits] Rev 2994: Original idea from Zardosht Kasheff to add HA_CLUSTERED_INDEX in lp:maria/5.3
Hi, Michael! On May 18, Michael Widenius wrote:
=== modified file 'sql/handler.h' --- a/sql/handler.h 2011-05-16 11:05:45 +0000 +++ b/sql/handler.h 2011-05-18 16:26:30 +0000 @@ -161,8 +161,11 @@ */ #define HA_KEY_SCAN_NOT_ROR 128 #define HA_DO_INDEX_COND_PUSHDOWN 256 /* Supports Index Condition Pushdown */ - - +/* + Data is clustered on this key. This means that when you read the key + you also get the row data in the same block.
perhaps, better to say that "you get the row data without an extra seek (or even without an extra disk read)". A question: what about memory-based (and SSD) engines? Could/Should one consider all indexes clustered?
+*/ +#define HA_CLUSTERED_INDEX 512
/* bits in alter_table_flags: @@ -2311,9 +2314,22 @@ class handler :public Sql_alloc
/* - @retval TRUE Primary key (if there is one) is clustered - key covering all fields - @retval FALSE otherwise + Check if the primary key (if there is one) is a clustered key covering + all fields. This means:
I wouldn't say that it "checks if the primary key is clustered", but something like "historical method that returns TRUE if everything below holds:"
+ + - Data is stored together with the primary key (no secondary lookup + needed to find the row data). The optimizer uses this to find out + the cost of fetching data. + - The primary key is part of each secondary key and is used + to find the row data in the primary index when reading trough + secondary indexes. + - When doing a HA_KEYREAD_ONLY we get also all the primary key parts + into the row. This is critical property used by index_merge.
and here "all the above is usually true for engines that store the row data in the primary key index (e.g. in a b-tree), and use the primary key value as a position()"
+ + For a clustered primary key, index_flags() returns also HA_CLUSTERED_INDEX + + @retval TRUE yes + @retval FALSE No. */ virtual bool primary_key_is_clustered() { return FALSE; } virtual int cmp_ref(const uchar *ref1, const uchar *ref2)
=== modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2011-05-16 12:07:04 +0000 +++ b/sql/sql_select.cc 2011-05-18 16:26:30 +0000 @@ -16525,9 +16527,9 @@ test_if_skip_sort_order(JOIN_TAB *tab,OR */ DBUG_ASSERT (ref_key != (int) nr);
- bool is_covering= table->covering_keys.is_set(nr) || - (nr == table->s->primary_key && - table->file->primary_key_is_clustered()); + bool is_covering= (table->covering_keys.is_set(nr) || + (table->file->index_flags(nr, 0, 1) & + HA_CLUSTERED_INDEX));
See, that's what I said. It make sense to set covering_keys bitmap for all clustered indexes once, when the table is opened, and then you not only get this condition simpler, but probably will also cover other cases where table->covering_keys is checked (or may be checked in the future).
/* Don't use an index scan with ORDER BY without limit. === modified file 'sql/sql_table.cc' --- a/sql/sql_table.cc 2011-05-16 11:05:45 +0000 +++ b/sql/sql_table.cc 2011-05-18 16:26:30 +0000 @@ -7983,7 +7983,8 @@ copy_data_between_tables(TABLE *from,TAB
if (order) { - if (to->s->primary_key != MAX_KEY && to->file->primary_key_is_clustered()) + if (to->s->primary_key != MAX_KEY && + to->file->ha_table_flags() & HA_TABLE_SCAN_ON_INDEX)
Hmm, good. It took me a while to come up with a realistic use case where to->file->primary_key_is_clustered() is false, but to->file->ha_table_flags() & HA_TABLE_SCAN_ON_INDEX is true. Example: new InnoDB optimization that makes it to use internal generated primary key when user creates a PK of, say, VARCHAR(255). That is, in this optimization InnoDB treat this key as UNIQUE but not PRIMARY. Not completely unrealistic idea. In this case to->s->primary_key != MAX_KEY is true, there is a primary key. to->file->primary_key_is_clustered() is false, but to->file->ha_table_flags() & HA_TABLE_SCAN_ON_INDEX is true. Heh, this is a good example, perhaps we need to apply it as a test to other code that uses primary_key_is_clustered() and see if the condition is used correctly.
{ char warn_buff[MYSQL_ERRMSG_SIZE]; my_snprintf(warn_buff, sizeof(warn_buff),
Regards, Sergei
Hi!
"Sergei" == Sergei Golubchik
writes:
<cut>
+ Data is clustered on this key. This means that when you read the key + you also get the row data in the same block.
Sergei> perhaps, better to say that "you get the row data without an extra seek Sergei> (or even without an extra disk read)". I added the comment. Sergei> A question: what about memory-based (and SSD) engines? Could/Should one Sergei> consider all indexes clustered? No, as there is still a high cost for fetching and managing an extra disk block. SSD are faster than disk, but not unlimited fast. For SSD, we need to add a different factor for the time to fetch a block. The storage engine can already take this into account a bit by adjusting the times for scan / key reads if things are on ssd. I have been discussing with igor for making all the timing constants variables to allow us to do real time tuning with them. (Good for finding better values for the constants that we have now). <cut>
- @retval TRUE Primary key (if there is one) is clustered - key covering all fields - @retval FALSE otherwise + Check if the primary key (if there is one) is a clustered key covering + all fields. This means:
Sergei> I wouldn't say that it "checks if the primary key is clustered", but Sergei> something like "historical method that returns TRUE if everything below Sergei> holds:" I don't like to say 'historical' as there is no plans to remove this code, only rename it... I did change the start to: Check if the primary key (if there is one) is a clustered key covering all fields and that the primary key is also stored as part of all other keys and used as a reference key.
+ + - Data is stored together with the primary key (no secondary lookup + needed to find the row data). The optimizer uses this to find out + the cost of fetching data. + - The primary key is part of each secondary key and is used + to find the row data in the primary index when reading trough + secondary indexes. + - When doing a HA_KEYREAD_ONLY we get also all the primary key parts + into the row. This is critical property used by index_merge.
Sergei> and here "all the above is usually true for engines that store the row Sergei> data in the primary key index (e.g. in a b-tree), and use the primary Sergei> key value as a position()" Added <cut>
+++ b/sql/sql_select.cc 2011-05-18 16:26:30 +0000 @@ -16525,9 +16527,9 @@ test_if_skip_sort_order(JOIN_TAB *tab,OR */ DBUG_ASSERT (ref_key != (int) nr);
- bool is_covering= table->covering_keys.is_set(nr) || - (nr == table->s->primary_key && - table->file->primary_key_is_clustered()); + bool is_covering= (table->covering_keys.is_set(nr) || + (table->file->index_flags(nr, 0, 1) & + HA_CLUSTERED_INDEX));
Sergei> See, that's what I said. It make sense to set covering_keys bitmap Sergei> for all clustered indexes once, when the table is opened, and then you Sergei> not only get this condition simpler, but probably will also cover other Sergei> cases where table->covering_keys is checked (or may be checked in the Sergei> future). Looking at how covering keys is used in many places, I am not sure that change is straightforward or will not cause wrong timing results. This is because in many cases we don't want to use a clustered key when we are trying to do index-only reads as the covering key is much slower than any other key. I agree that your idea has some merits, but I would not like to try to do it just now. Something you can look at when you are ready with 5.5... Regards, Monty
Hi, Michael! On May 19, Michael Widenius wrote:
+++ b/sql/sql_select.cc 2011-05-18 16:26:30 +0000 @@ -16525,9 +16527,9 @@ test_if_skip_sort_order(JOIN_TAB *tab,OR */ DBUG_ASSERT (ref_key != (int) nr);
- bool is_covering= table->covering_keys.is_set(nr) || - (nr == table->s->primary_key && - table->file->primary_key_is_clustered()); + bool is_covering= (table->covering_keys.is_set(nr) || + (table->file->index_flags(nr, 0, 1) & + HA_CLUSTERED_INDEX));
See, that's what I said. It make sense to set covering_keys bitmap for all clustered indexes once, when the table is opened, and then you not only get this condition simpler, but probably will also cover other cases where table->covering_keys is checked (or may be checked in the future).
Looking at how covering keys is used in many places, I am not sure that change is straightforward or will not cause wrong timing results. This is because in many cases we don't want to use a clustered key when we are trying to do index-only reads as the covering key is much slower than any other key.
Precisely, it is used in many places. So if HA_CLUSTERED_INDEX will affect it, clustered keys will be taken into account more than now. We usually want to *consider* clustered keys for index-only reads. Clustered keys aren't much slower. I mean, they are not "slower, because they are clustered", they're "slower because they're long". But any long key, not only clustering, will be slower than a short one. That is, optimizer - when deciding on a keyread - needs to use the shortest possible key, and generally take key length into account. This problem has very little to do with clustering keys (although it manifests itself on clustering keys).
I agree that your idea has some merits, but I would not like to try to do it just now. Something you can look at when you are ready with 5.5...
Agree, that's what I said on the phone. Regards, Sergei
participants (2)
-
Michael Widenius
-
Sergei Golubchik