[Maria-developers] proposed patches for MWL#113: many clustered keys (not only primary)
Hello all, First off, thanks for hosting the storage engine summit last Friday. At the beginning of the day, I was encouraged to send the patches that I had for MWL#113 to the MariaDB developers list, in hopes possibly getting them into MariaDB 5.3. Here they are. They are patched off of MariaDB 5.2.3. They are all relatively simple. Here is what the patches do. - 1.diff - add an index flag HA_CLUSTERED_INDEX. If a storage engine exposes this flag for an index, then that index is clustered. This was proposed in http://bugs.mysql.com/bug.php?id=51687 - 2.diff - simple fix to get select ... order by DESC working - 3.diff - fix for index merge and clustering keys. This was inspired by Sergey Petrunya originally proposing http://lists.mysql.com/internals/37165. It has been altered a bit now that HA_CLUSTERED_INDEX is added. - 4.diff - extend fix of MySQL bug 50843 for clustering keys - http://lists.askmonty.org/pipermail/commits/2010-October/000626.html. I cannot find this checked in anywhere, but it seems to be a good patch for clustering keys. This was inspired from http://www.mail-archive.com/maria-developers@lists.launchpad.net/msg03491.ht.... We currently have a hacky workaround for this issue, but this seems like the right fix. I also have one more fix for clustering keys and joins (in make_join_readinfo), but I want to look over it and before sending it. Also, if for whatever reason all of the patches are not good enough for checking in, please consider some subset of the patches. This is by no means "all or none". That is why I broke it up into separate diff files. I would love to hear feedback on these patches. Thanks -Zardosht
Hi! Here comes the feedback, patch by patch. Those that are ok 'as such or with small modifications' I will add to the 5.3 tree at once.
"Zardosht" == Zardosht Kasheff <zardosht@gmail.com> writes:
Zardosht> Hello all, Zardosht> First off, thanks for hosting the storage engine summit last Friday. Zardosht> At the beginning of the day, I was encouraged to send the patches that Zardosht> I had for MWL#113 to the MariaDB developers list, in hopes possibly Zardosht> getting them into MariaDB 5.3. Here they are. They are patched off of Zardosht> MariaDB 5.2.3. Zardosht> They are all relatively simple. Here is what the patches do. Zardosht> - 1.diff - add an index flag HA_CLUSTERED_INDEX. If a storage engine Zardosht> exposes this flag for an index, then that index is clustered. This was Zardosht> proposed in http://bugs.mysql.com/bug.php?id=51687 1.diff
Index: sql/handler.h =================================================================== --- sql/handler.h (revision 25779) +++ sql/handler.h (revision 25780) @@ -151,6 +151,10 @@ #define HA_READ_RANGE 8 /* can find all records in a range */ #define HA_ONLY_WHOLE_INDEX 16 /* Can't use part key searches */ #define HA_KEYREAD_ONLY 64 /* Support HA_EXTRA_KEYREAD */ +// no IO if read data when scan index +// i.e index is covering +// skipping 128 because SOME product has HA_KEY_SCAN_NOT_ROR +#define HA_CLUSTERED_INDEX 256
/* bits in alter_table_flags: I changed the above to use 512, as this is the next free flag. I also added this comment: /* Data is clustered on this key. This means that when you read the key you also get the row data in the same block. */ ------------- I also added new comments in the code for how clustered data was used. I also removed some calls to table->file->primary_key_is_clustered(). The definition for primary_key_is_clustered() is after this as follows: /* Check if the primary key (if there is one) is a clustered key covering all fields. This means: - Data is stored together with the primary key (no secondary lookup needed to find the row data). The optimizer uses this to find out the cost of fetching data. - The primary key is part of each secondary key and is used to find the row data in the primary index when reading trough secondary indexes. - When doing a HA_KEYREAD_ONLY we get also all the primary key parts into the row. This is critical property used by index_merge. For a clustered primary key, index_flags() returns also HA_CLUSTERED_INDEX @retval TRUE yes @retval FALSE No. */ I also fixed the part in the code where it was possible to use the new HA_CLUSTERED_INDEX flag and changed get_best_ror_intersect() to prefer non clustered indexes before clustered index. Code is now commited to my 5.3 tree; Will be pushed shortly. ----------- Zardosht> - 2.diff - simple fix to get select ... order by DESC working My changes for 1.diff fixed this. The code is now: Here is your change:
+ bool is_covering= table->covering_keys.is_set(nr) || + (nr == table->s->primary_key && + table->file->primary_key_is_clustered()) || + test(table->file->index_flags(nr, 0, 0) & HA_CLUSTERED_INDEX);
- bool is_covering= table->covering_keys.is_set(nr) || - (nr == table->s->primary_key && - table->file->primary_key_is_clustered()); -
Here is the code after my change: bool is_covering= (table->covering_keys.is_set(nr) || (table->file->index_flags(nr, 0, 1) & HA_CLUSTERED_INDEX)); I was able to remove the test for table->file->primary_key_is_clustered() as I fixed all innodb code to set HA_CLUSTERED_INDEX for clustered keys. Zardosht> - 3.diff - fix for index merge and clustering keys. This was inspired Zardosht> by Sergey Petrunya originally proposing Zardosht> http://lists.mysql.com/internals/37165. It has been altered a bit now Zardosht> that HA_CLUSTERED_INDEX is added.
--- sql/opt_range.cc (revision 25783) +++ sql/opt_range.cc (revision 25784) @@ -4497,8 +4497,20 @@ for (idx= 0, cur_ror_scan= tree->ror_scans; idx < param->keys; idx++) { ROR_SCAN_INFO *scan; + uint keyno= param->real_keynr[idx]; + if (!tree->ror_scans_map.is_set(idx)) + { continue; + } + /* + Ignore clustering keys. + */ + if (keyno != cpk_no && test(param->table->file->index_flags(keyno,0,0) & HA_CLUSTERED_INDEX)) + { + tree->n_ror_scans--; + continue; + } if (!(scan= make_ror_scan(param, idx, tree->keys[idx]))) return NULL; if (param->real_keynr[idx] == cpk_no)
Hm.. I originally changed this code to put the clustered keys last. The idea was to make it possible to work with engines that supports multiple clustered keys but still prefer normal keys first. Have now changed the code to use your approach instead. Zardosht> - 4.diff - extend fix of MySQL bug 50843 for clustering keys Zardosht> - http://lists.askmonty.org/pipermail/commits/2010-October/000626.html. Zardosht> I cannot find this checked in anywhere, but it seems to be a good Zardosht> patch for clustering keys. This was inspired from Zardosht> http://www.mail-archive.com/maria-developers@lists.launchpad.net/msg03491.ht.... Zardosht> We currently have a hacky workaround for this issue, but this seems Zardosht> like the right fix. This part also changed in my original patch.
+ + Also not in case where the index is a clustering index */ if ((select_limit >= table_records) && (tab->type == JT_ALL && tab->join->tables > tab->join->const_tables + 1) && ((unsigned) best_key != table->s->primary_key || - !table->file->primary_key_is_clustered())) + !table->file->primary_key_is_clustered()) && + !(best_key >= 0 && test(table->file->index_flags(best_key,0,0) & HA_CLUSTERED_INDEX)) + ) DBUG_RETURN(0);
The new code looks like: if (best_key < 0 || ((select_limit >= table_records) && (tab->type == JT_ALL && tab->join->tables > tab->join->const_tables + 1) && !(table->file->index_flags(best_key, 0, 1) & HA_CLUSTERED_INDEX))) goto use_filesort; No reson to test for table->file->primary_key_is_clustered() anymore. Zardosht> I also have one more fix for clustering keys and joins (in Zardosht> make_join_readinfo), but I want to look over it and before sending it. ok. Zardosht> Also, if for whatever reason all of the patches are not good enough Zardosht> for checking in, please consider some subset of the patches. This is Zardosht> by no means "all or none". That is why I broke it up into separate Zardosht> diff files. No problem; All changes are in. I will push as soon as I have got an ok from Serg / Spetrunia. Regards, Monty
Hello Monty, Thank you for your feedback on the four diffs and for approving them. This makes getting TokuDB to work with MariaDB much simpler. I could not tell from the original email, but is the change Sergei made in http://lists.askmonty.org/pipermail/commits/2010-October/000626.html going to be committed as well? Thanks -Zardosht On Wed, May 18, 2011 at 12:25 PM, Michael Widenius <monty@askmonty.org> wrote:
Hi!
Here comes the feedback, patch by patch.
Those that are ok 'as such or with small modifications' I will add to the 5.3 tree at once.
"Zardosht" == Zardosht Kasheff <zardosht@gmail.com> writes:
Zardosht> Hello all, Zardosht> First off, thanks for hosting the storage engine summit last Friday.
Zardosht> At the beginning of the day, I was encouraged to send the patches that Zardosht> I had for MWL#113 to the MariaDB developers list, in hopes possibly Zardosht> getting them into MariaDB 5.3. Here they are. They are patched off of Zardosht> MariaDB 5.2.3.
Zardosht> They are all relatively simple. Here is what the patches do. Zardosht> - 1.diff - add an index flag HA_CLUSTERED_INDEX. If a storage engine Zardosht> exposes this flag for an index, then that index is clustered. This was Zardosht> proposed in http://bugs.mysql.com/bug.php?id=51687
1.diff
Index: sql/handler.h =================================================================== --- sql/handler.h (revision 25779) +++ sql/handler.h (revision 25780) @@ -151,6 +151,10 @@ #define HA_READ_RANGE 8 /* can find all records in a range */ #define HA_ONLY_WHOLE_INDEX 16 /* Can't use part key searches */ #define HA_KEYREAD_ONLY 64 /* Support HA_EXTRA_KEYREAD */ +// no IO if read data when scan index +// i.e index is covering +// skipping 128 because SOME product has HA_KEY_SCAN_NOT_ROR +#define HA_CLUSTERED_INDEX 256
/* bits in alter_table_flags:
I changed the above to use 512, as this is the next free flag. I also added this comment:
/* Data is clustered on this key. This means that when you read the key you also get the row data in the same block. */
------------- I also added new comments in the code for how clustered data was used. I also removed some calls to table->file->primary_key_is_clustered().
The definition for primary_key_is_clustered() is after this as follows:
/* Check if the primary key (if there is one) is a clustered key covering all fields. This means:
- Data is stored together with the primary key (no secondary lookup needed to find the row data). The optimizer uses this to find out the cost of fetching data. - The primary key is part of each secondary key and is used to find the row data in the primary index when reading trough secondary indexes. - When doing a HA_KEYREAD_ONLY we get also all the primary key parts into the row. This is critical property used by index_merge.
For a clustered primary key, index_flags() returns also HA_CLUSTERED_INDEX
@retval TRUE yes @retval FALSE No. */
I also fixed the part in the code where it was possible to use the new HA_CLUSTERED_INDEX flag and changed get_best_ror_intersect() to prefer non clustered indexes before clustered index.
Code is now commited to my 5.3 tree; Will be pushed shortly.
-----------
Zardosht> - 2.diff - simple fix to get select ... order by DESC working
My changes for 1.diff fixed this.
The code is now:
Here is your change:
+ bool is_covering= table->covering_keys.is_set(nr) || + (nr == table->s->primary_key && + table->file->primary_key_is_clustered()) || + test(table->file->index_flags(nr, 0, 0) & HA_CLUSTERED_INDEX);
- bool is_covering= table->covering_keys.is_set(nr) || - (nr == table->s->primary_key && - table->file->primary_key_is_clustered()); -
Here is the code after my change:
bool is_covering= (table->covering_keys.is_set(nr) || (table->file->index_flags(nr, 0, 1) & HA_CLUSTERED_INDEX));
I was able to remove the test for table->file->primary_key_is_clustered() as I fixed all innodb code to set HA_CLUSTERED_INDEX for clustered keys.
Zardosht> - 3.diff - fix for index merge and clustering keys. This was inspired Zardosht> by Sergey Petrunya originally proposing Zardosht> http://lists.mysql.com/internals/37165. It has been altered a bit now Zardosht> that HA_CLUSTERED_INDEX is added.
--- sql/opt_range.cc (revision 25783) +++ sql/opt_range.cc (revision 25784) @@ -4497,8 +4497,20 @@ for (idx= 0, cur_ror_scan= tree->ror_scans; idx < param->keys; idx++) { ROR_SCAN_INFO *scan; + uint keyno= param->real_keynr[idx]; + if (!tree->ror_scans_map.is_set(idx)) + { continue; + } + /* + Ignore clustering keys. + */ + if (keyno != cpk_no && test(param->table->file->index_flags(keyno,0,0) & HA_CLUSTERED_INDEX)) + { + tree->n_ror_scans--; + continue; + } if (!(scan= make_ror_scan(param, idx, tree->keys[idx]))) return NULL; if (param->real_keynr[idx] == cpk_no)
Hm.. I originally changed this code to put the clustered keys last. The idea was to make it possible to work with engines that supports multiple clustered keys but still prefer normal keys first.
Have now changed the code to use your approach instead.
Zardosht> - 4.diff - extend fix of MySQL bug 50843 for clustering keys Zardosht> - http://lists.askmonty.org/pipermail/commits/2010-October/000626.html. Zardosht> I cannot find this checked in anywhere, but it seems to be a good Zardosht> patch for clustering keys. This was inspired from Zardosht> http://www.mail-archive.com/maria-developers@lists.launchpad.net/msg03491.ht.... Zardosht> We currently have a hacky workaround for this issue, but this seems Zardosht> like the right fix.
This part also changed in my original patch.
+ + Also not in case where the index is a clustering index */ if ((select_limit >= table_records) && (tab->type == JT_ALL && tab->join->tables > tab->join->const_tables + 1) && ((unsigned) best_key != table->s->primary_key || - !table->file->primary_key_is_clustered())) + !table->file->primary_key_is_clustered()) && + !(best_key >= 0 && test(table->file->index_flags(best_key,0,0) & HA_CLUSTERED_INDEX)) + ) DBUG_RETURN(0);
The new code looks like:
if (best_key < 0 || ((select_limit >= table_records) && (tab->type == JT_ALL && tab->join->tables > tab->join->const_tables + 1) && !(table->file->index_flags(best_key, 0, 1) & HA_CLUSTERED_INDEX))) goto use_filesort;
No reson to test for table->file->primary_key_is_clustered() anymore.
Zardosht> I also have one more fix for clustering keys and joins (in Zardosht> make_join_readinfo), but I want to look over it and before sending it.
ok.
Zardosht> Also, if for whatever reason all of the patches are not good enough Zardosht> for checking in, please consider some subset of the patches. This is Zardosht> by no means "all or none". That is why I broke it up into separate Zardosht> diff files.
No problem; All changes are in. I will push as soon as I have got an ok from Serg / Spetrunia.
Regards, Monty
Hi, Zardosht! On May 18, Zardosht Kasheff wrote:
Hello Monty,
Thank you for your feedback on the four diffs and for approving them. This makes getting TokuDB to work with MariaDB much simpler.
I could not tell from the original email, but is the change Sergei made in http://lists.askmonty.org/pipermail/commits/2010-October/000626.html going to be committed as well?
As far as I can see, it's only in 5.3 tree (already there). Regards, Sergei
Hi!
"Zardosht" == Zardosht Kasheff <zardosht@gmail.com> writes:
Zardosht> Hello Monty, Zardosht> Thank you for your feedback on the four diffs and for approving them. Zardosht> This makes getting TokuDB to work with MariaDB much simpler. Zardosht> I could not tell from the original email, but is the change Sergei Zardosht> made in http://lists.askmonty.org/pipermail/commits/2010-October/000626.html Zardosht> going to be committed as well? Yes, it's already in 5.3 Regards, Monty
participants (3)
-
Michael Widenius
-
Sergei Golubchik
-
Zardosht Kasheff