----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Add DS-MRR support for clustered primary keys CREATION DATE..: Sat, 12 Jun 2010, 08:23 SUPERVISOR.....: Igor IMPLEMENTOR....: COPIES TO......: CATEGORY.......: Client-BackLog TASK ID........: 121 (http://askmonty.org/worklog/?tid=121) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Psergey - Sun, 13 Jun 2010, 11:56)=-=- High-Level Specification modified. --- /tmp/wklog.121.old.9396 2010-06-13 11:56:34.000000000 +0000 +++ /tmp/wklog.121.new.9396 2010-06-13 11:56:34.000000000 +0000 @@ -1,17 +1,55 @@ -Basic idea: DS-MRR scan should be done as follows: +1. Choices to be made +--------------------- -1. Sort incoming keys -2. Use the sorted keys to do a disk-ordered retrieval +1.1 Handling of complex ranges +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The "sort incoming keys" part is easy when we have only equality ranges. +If we allow ranges of arbitrary form (including ranges with one endpoint +being infinity and/or ranges overlapping with one another), sorting becomes +non-trivial. Do we need to support this case or support only equality ranges? -Unresolved questions: +Decision: the new code should handle only the case with equality ranges. +For non-equality ranges, the execution will proceed as before. -* The "sort incoming keys" part is trivial when we have only equality ranges. - If we allow ranges of arbitrary form ( including ranges with one endpoint - being infinity or ranges overlapping with one another), sorting becomes - non-trivial. Do we need to support this case or support only equality ranges? +1.2 Handling index prefix scans +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +What do we do if asked to do a scan on a prefix of clustered PK? -* Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered - PKs? (current decision: No) +Decision: handle this if the ranges are equality ranges. The difference from +scan on full primary key is that +- we will have to use read_range_XXX() or index_read()/index_next_same() + functions, while for full primary key value we could have used rnd_pos(). +- One equality range can produce multiple matching records. -* Do we support scanning on a prefix of clustered PK? (Yes but scan on prefix - of clustered PK will not be in disk order. We need to run it with regular mode) +1.3 Use of knowledge that primary_key==rowid +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered +PKs? +Decision: don't make this assumption. + + +2. Code-level changes overview +------------------------------ + +DsMrr_impl::choose_mrr_impl(): +Enable MRR when + - ihis is a clustered primary key + - incoming ranges are single-point (HA_MRR_SINGLE_POINT is set) + - will need to make the SQL layer to set this flag + - incoming ranges are not already sorted (HA_MRR_SORTED is not set) + +(TODO do we need new cost formula?) + +DsMrr_impl::dsmrr_init() + - different elem_size (not rowid length but key tuple length) + - don't create the secondary handler object, we won't need it. + +DsMrr_impl::dsmrr_fill_buffer(): + - need a variant of this function that would not access the index but just + fill and sort the array. + +DsMrr_impl::dsmrr_next(): + - should abstract-out: + - buffer element size + - rnd_pos/index_read call. + - Also for CPK prefix scans there can be multi -=-=(Psergey - Sun, 13 Jun 2010, 11:55)=-=- High Level Description modified. --- /tmp/wklog.121.old.9380 2010-06-13 11:55:42.000000000 +0000 +++ /tmp/wklog.121.new.9380 2010-06-13 11:55:42.000000000 +0000 @@ -1,18 +1,18 @@ -Currently, DS-MRR doesn't support operation over clustered primary keys. The -reason for this was that - - Clustered primary keys are stored in disk order and so, if the sequence of - ranges is ordered, the reads will already go in disk order (and so DS-MRR's - step of re-ordering reads is not necessary). +Currently, DS-MRR doesn't allow to do MRR scans over clustered primary keys. The +reason for this is that + - Clustered primary keys are stored in disk order, so, if the sequence of + scanned ranges is ordered, the reads will automatically happen in disk + order, and DS-MRR's step of re-ordering reads is redundant. - Within DS-MRR implementation, the "get rowids from keys" step is not necessary when using clustered primary key, because in InnoDB/XtraDB clustered primary key values are the rowids. -However, with BKA making the MRR calls, there are cases where DS-MRR over +However, when MRR calls are made by BKA, there are cases where DS-MRR over clustered primary key is beneficial: * BKA may provide lookup keys that have duplicates and/or are in arbitrary order. In that case, DS-MRR implementation may sort the key values and - order them, so that it hits the disk in key order. + order them, so that it hits the disk in key(=disk) order. * When running multi-table join with high @@join_cache_level value (and so, linked join buffers), lack of MRR implementation causes the chain of linked @@ -20,3 +20,9 @@ * TODO anything else? +This WL entry is about addressing the above by adding support of DS-MRR over +clustered primary key that would work according to this strategy: +1. Sort incoming keys +2. Use the sorted keys to do a disk-ordered retrieval. + + -=-=(Psergey - Sun, 13 Jun 2010, 09:42)=-=- High-Level Specification modified. --- /tmp/wklog.121.old.5009 2010-06-13 09:42:38.000000000 +0000 +++ /tmp/wklog.121.new.5009 2010-06-13 09:42:38.000000000 +0000 @@ -6,11 +6,12 @@ Unresolved questions: * The "sort incoming keys" part is trivial when we have only equality ranges. - If we allow ranges of arbitrary form ( ncluding ranges with one endpoint + If we allow ranges of arbitrary form ( including ranges with one endpoint being infinity or ranges overlapping with one another), sorting becomes - non-trival. Do we need to support this case or support only equality ranges? + non-trivial. Do we need to support this case or support only equality ranges? * Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered - PKs? + PKs? (current decision: No) -* Do we support scanning on a prefix of clustered PK? (seems to be yes?) +* Do we support scanning on a prefix of clustered PK? (Yes but scan on prefix + of clustered PK will not be in disk order. We need to run it with regular mode) -=-=(Psergey - Sat, 12 Jun 2010, 08:39)=-=- High-Level Specification modified. --- /tmp/wklog.121.old.538 2010-06-12 08:39:46.000000000 +0000 +++ /tmp/wklog.121.new.538 2010-06-12 08:39:46.000000000 +0000 @@ -1 +1,16 @@ +Basic idea: DS-MRR scan should be done as follows: +1. Sort incoming keys +2. Use the sorted keys to do a disk-ordered retrieval + +Unresolved questions: + +* The "sort incoming keys" part is trivial when we have only equality ranges. + If we allow ranges of arbitrary form ( ncluding ranges with one endpoint + being infinity or ranges overlapping with one another), sorting becomes + non-trival. Do we need to support this case or support only equality ranges? + +* Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered + PKs? + +* Do we support scanning on a prefix of clustered PK? (seems to be yes?) DESCRIPTION: Currently, DS-MRR doesn't allow to do MRR scans over clustered primary keys. The reason for this is that - Clustered primary keys are stored in disk order, so, if the sequence of scanned ranges is ordered, the reads will automatically happen in disk order, and DS-MRR's step of re-ordering reads is redundant. - Within DS-MRR implementation, the "get rowids from keys" step is not necessary when using clustered primary key, because in InnoDB/XtraDB clustered primary key values are the rowids. However, when MRR calls are made by BKA, there are cases where DS-MRR over clustered primary key is beneficial: * BKA may provide lookup keys that have duplicates and/or are in arbitrary order. In that case, DS-MRR implementation may sort the key values and order them, so that it hits the disk in key(=disk) order. * When running multi-table join with high @@join_cache_level value (and so, linked join buffers), lack of MRR implementation causes the chain of linked join buffers to break. (TODO and so what? Is that really a problem?) * TODO anything else? This WL entry is about addressing the above by adding support of DS-MRR over clustered primary key that would work according to this strategy: 1. Sort incoming keys 2. Use the sorted keys to do a disk-ordered retrieval. HIGH-LEVEL SPECIFICATION: 1. Choices to be made --------------------- 1.1 Handling of complex ranges ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The "sort incoming keys" part is easy when we have only equality ranges. If we allow ranges of arbitrary form (including ranges with one endpoint being infinity and/or ranges overlapping with one another), sorting becomes non-trivial. Do we need to support this case or support only equality ranges? Decision: the new code should handle only the case with equality ranges. For non-equality ranges, the execution will proceed as before. 1.2 Handling index prefix scans ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ What do we do if asked to do a scan on a prefix of clustered PK? Decision: handle this if the ranges are equality ranges. The difference from scan on full primary key is that - we will have to use read_range_XXX() or index_read()/index_next_same() functions, while for full primary key value we could have used rnd_pos(). - One equality range can produce multiple matching records. 1.3 Use of knowledge that primary_key==rowid ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Can/should we use the fact rowid=={clustered PK value} for InnoDB's clustered PKs? Decision: don't make this assumption. 2. Code-level changes overview ------------------------------ DsMrr_impl::choose_mrr_impl(): Enable MRR when - ihis is a clustered primary key - incoming ranges are single-point (HA_MRR_SINGLE_POINT is set) - will need to make the SQL layer to set this flag - incoming ranges are not already sorted (HA_MRR_SORTED is not set) (TODO do we need new cost formula?) DsMrr_impl::dsmrr_init() - different elem_size (not rowid length but key tuple length) - don't create the secondary handler object, we won't need it. DsMrr_impl::dsmrr_fill_buffer(): - need a variant of this function that would not access the index but just fill and sort the array. DsMrr_impl::dsmrr_next(): - should abstract-out: - buffer element size - rnd_pos/index_read call. - Also for CPK prefix scans there can be multi ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)