[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)

16 Aug 2009

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Sun, 16 Aug 2009, 01:03)=-=-
Low Level Design modified.

--- /tmp/wklog.24.old.20767     2009-08-16 01:03:11.000000000 +0300
+++ /tmp/wklog.24.new.20767     2009-08-16 01:03:11.000000000 +0300
@@ -18,6 +18,8 @@
   
   # a range tree has range access options, possibly for several keys
   range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+  (here range(keyi) may represent ranges not for initial keyi prefixes,
+   but ranges for any infixes for keyi)
   
   # merge tree represents several way to index_merge
   imerge_tree = imerge1 AND imerge2 AND ...
@@ -47,13 +49,13 @@
         R.add(range_union(A.range(i), B.range(i)));
 
       if (R has at least one range access)
-        return R;
+        return R; // DISCARD-IMERGE-2
       else
       {
         /* could not build any range accesses. construct index_merge */
-        remove non-ranges from A; // DISCARD-IMERGE-2
+        remove non-ranges from A; 
         remove non-ranges from B;
-        return new index_merge(A, B);
+        return new index_merge(A, B); // DISCARD-IMERGE-3
       }
     }
     else if (A is range tree and B is index_merge tree (or vice versa))
@@ -65,12 +67,12 @@
       (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
       (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
       ...
-      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND 
+      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN)  
       =
       (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
       (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
       ...
-      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N)
 
       Now each line represents an index_merge..
     }
@@ -82,18 +84,18 @@
       OR 
       imergeB1 AND imergeB2 AND ... AND imergeBN 
 
-      -> (discard all imergeA{i=2,3,...} ->           // DISCARD-IMERGE-3
+      -> (discard all imergeA{i=2,3,...} ->           // DISCARD-IMERGE-4
 
       imergeA1 
       OR 
-      imergeB1 AND imergeB2 AND ... AND imergeBN  =
+      imergeB1 =
 
-      = (combine imergeA1 with each of the imergeB{i} ) = 
+      = (combine imergeA1 with each of the range_treeB_1{i} ) = 
       
-      combine(imergeA1 OR imergeB1) AND 
-      combine(imergeA1 OR imergeB2) AND 
+      combine(imergeA1 OR range_treeB_11) AND 
+      combine(imergeA1 OR range_treeB_12) AND 
       ...                           AND 
-      combine(imergeA1 OR imergeBN)
+      combine(imergeA1 OR range_treeB_1N)
     }
   }
  
@@ -109,7 +111,7 @@
 DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
 the WHERE clause has this form (conditions t.badkey may have abritrary form):
 
- (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+ (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)
 
 DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are 
 two indexes:
@@ -123,6 +125,8 @@
 
 The optimizer will generate the plans that only use the "col1=c1" part. The
 right side of the AND will be ignored even if it has good selectivity.
+(Here an imerge  for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor
+col3=c3 represent index ranges.)
 
 
 2. New implementation

-=-=(Guest - Mon, 20 Jul 2009, 17:13)=-=-
Dependency deleted: 30 no longer depends on 24

-=-=(Guest - Sat, 20 Jun 2009, 09:34)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.21663     2009-06-20 09:34:48.000000000 +0300
+++ /tmp/wklog.24.new.21663     2009-06-20 09:34:48.000000000 +0300
@@ -4,6 +4,7 @@
 2. New implementation
 2.1 New tree_and()
 2.2 New tree_or()
+3. Testing and required coverage
 </contents>
 
 1. Current implementation overview
@@ -240,3 +241,14 @@
 In order to limit the impact of this combinatorial explosion, we will 
 introduce a rule that we won't generate more than #defined 
 MAX_IMERGE_OPTS options.
+
+3. Testing and required coverage
+================================
+So far could find the following user cases:
+
+* BUG#17259: Query optimizer chooses wrong index
+* BUG#17673: Optimizer does not use Index Merge optimization in some cases
+* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
+* BUG#30151: optimizer is very reluctant to chose index_merge algorithm
+
+

-=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.19152     2009-06-18 16:55:00.000000000 +0300
+++ /tmp/wklog.24.new.19152     2009-06-18 16:55:00.000000000 +0300
@@ -141,13 +141,15 @@
 Operations on SEL_ARG trees will be modified to produce/process the trees of
 this kind:
 
+
 2.1 New tree_and()
 ------------------
 In order not to lose plans, we'll make these changes:
 
-1. Don't remove index_merge part of the tree.
+A1. Don't remove index_merge part of the tree (this will take care of 
+     DISCARD-IMERGE-1 problem)
 
-2. Push range conditions down into index_merge trees that may support them.
+A2. Push range conditions down into index_merge trees that may support them.
    if one tree has range(key1) and the other tree has imerge(key1 OR key2)
    then perform an equvalent of this operation:
 
@@ -155,8 +157,86 @@
 
      (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
 
-3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
    concatenate them together.
 
-2.2 New tree_or()
 
+2.2 New tree_or()
+-----------------
+O1. Dont remove non-range plans:
+  Current tree_or() code will refuse to produce index_merge plans for 
+  conditions like
+  
+  "t.key1part2=const OR t.key2part1=const" 
+  
+  (this is marked as DISCARD-IMERGE-3).  This was justifed as the left part of
+  the AND condition is not usable for range access, and the operation of
+  tree_and() guaranteed that there was no way it could changed to make a
+  usable range plan. With new tree_and() and rule A2, this is no longer the 
+  case. For example for this query:
+
+    (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
+  
+  it will construct a
+
+     imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
+  
+  then tree_and() will apply rule A2 to push the range down into index merge 
+  and after that we'll have:
+  
+    range(t.key1part1=const)
+    imerge(
+        t.key1part2=const AND t.key1part1=const,
+        t.key2part1=const
+    )
+  note that imerge(...) describes a usable index_merge plan and it's possible
+  that it will be the best access path.
+
+O2. "Create index_merge accesses when possible"
+   Current tree_or() will not create index_merge access when it could create
+   non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+   in the current implementation" section).  This will be changed to work as
+   follows: we will create index_merge made for index scans that didn't have
+   their match in the other sel_tree.
+   Ilustrating it with an example:
+
+           | sel_tree_A | sel_tree_B | A or B | include in index_merge?
+     ------+------------+------------+--------+------------------------
+      key1 |    cond1   |    cond2   |  condM |  no
+      key2 |    cond3   |    cond4   |  NULL  |  no 
+      key3 |    cond5   |            |        |  yes, A-side
+      key4 |    cond6   |            |        |  yes, A-side
+      key5 |            |    cond7   |        |  yes, B-side
+      key6 |            |    cond8   |        |  yes, B-side
+
+   here we assume that 
+    - (cond1 OR cond2) did produce a combined range. Not including them in 
+                       index_merge.
+    - (cond3 OR cond4) didn't produce a usable range (e.g. they were 
+        t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
+        didn't yield any range list)
+    - All other scand didn't have their counterparts, so we'll end up with a
+      SEL_TREE of:
+
+       range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
+    .
+      
+O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is 
+that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven 
+seen any complaints that could be attributed to it.
+If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
+lift it ,and produce a cross-product:
+
+   ((key1p OR key2p) AND (key3p OR key4p))
+   OR
+   ((key5p OR key6p) AND (key7p OR key8p))
+
+  = (key1p OR key2p OR key5p OR key6p) AND  // this part is currently 
+    (key3p OR key4p OR key5p OR key6p) AND  // produced
+
+    (key1p OR key2p OR key5p OR key6p) AND  // this part will be added
+    (key3p OR key4p OR key5p OR key6p)      //. 
+
+In order to limit the impact of this combinatorial explosion, we will 
+introduce a rule that we won't generate more than #defined 
+MAX_IMERGE_OPTS options.

-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612     2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612     2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation 
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents 
+  
+  # There are sel_trees, a sel_tree is either range or merge tree  
+  sel_tree =  range_tree | imerge_tree
+  
+  # a range tree has range access options, possibly for several keys
+  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+  
+  # merge tree represents several way to index_merge
+  imerge_tree = imerge1 AND imerge2 AND ...
+  
+  # a way to do index merge == a set to use of different indexes.
+  imergeX = range_tree1 OR range_tree2 OR ..
+    where no pair of range_treeX have ranges over the same index.
+
+
+  tree_and(A, B) 
+  {
+    if (both A and B are range trees)
+      return a range_tree with computed intersection for each range;
+    if (only one of A and B is a range tree)
+      return that tree;  // DISCARD-IMERGE-1
+    // at this point both trees are index_merge trees
+    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+  }
+
+
+  tree_or(A, B)
+  {
+    if (A and B are range trees)
+    {
+      R = new range_tree;
+      for each index i
+        R.add(range_union(A.range(i), B.range(i)));
+
+      if (R has at least one range access)
+        return R;
+      else
+      {
+        /* could not build any range accesses. construct index_merge */
+        remove non-ranges from A; // DISCARD-IMERGE-2
+        remove non-ranges from B;
+        return new index_merge(A, B);
+      }
+    }
+    else if (A is range tree and B is index_merge tree (or vice versa))
+    {
+      Perform this transformation: 
+
+      range_treeA                // this is A
+      OR
+      (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+      (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND 
+      =
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+      (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+      Now each line represents an index_merge..
+    }
+    else if (both A and B are index_merge trees)
+    {
+      Perform this transformation:
+      
+      imergeA1 AND imergeA2 AND ... AND imergeAN
+      OR 
+      imergeB1 AND imergeB2 AND ... AND imergeBN 
+
+      -> (discard all imergeA{i=2,3,...} ->           // DISCARD-IMERGE-3
+
+      imergeA1 
+      OR 
+      imergeB1 AND imergeB2 AND ... AND imergeBN  =
+
+      = (combine imergeA1 with each of the imergeB{i} ) = 
+      
+      combine(imergeA1 OR imergeB1) AND 
+      combine(imergeA1 OR imergeB2) AND 
+      ...                           AND 
+      combine(imergeA1 OR imergeBN)
+    }
+  }
+ 
+1.1. Problems in the current implementation 
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are 
+two indexes:
+
+  INDEX i1(col1, col2), 
+  INDEX i2(col1, col3)
+
+and this WHERE clause:
+  
+  col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is, 
+
+  sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+   if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+   then perform an equvalent of this operation:
+
+     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+   concatenate them together.
+
+2.2 New tree_or()
 

-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24

-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580     2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580     2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
 
 * How strict is the limitation on the form of the WHERE?
 
+* Which version should this be based on? 5.1? Which patches are should be in 
+  (google's/percona's/maria/etc?) 
+
 * TODO: The optimizer didn't compare costs of index_merge and range before (ok 
   it did but that was done for accesses to different tables). Will there be any
   possible gotchas here?

-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498      2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498      2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access

-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305       2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305       2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point 
+where it is really necessary. 
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to 
+take it into consideration together with range access plans. 
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+ WHERE := range_cond(key1_1) AND 
+          range_cond(key2_1) AND
+          other_cond       AND 
+          index_merge_OR_cond1(key3_1, key3_2, ...)
+          index_merge_OR_cond2(key4_1, key4_2, ...)
+  
+where 
+  
+ index_merge_OR_cond{N} := (range_cond(keyN_1) OR 
+                            range_cond(keyN_2) OR ...)
+
+
+ range_cond(keyX) := condition that allows to construct range access of keyX
+                     and doesn't allow to construct range/index_merge accesses
+                     for any keys of the table in question. 
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+  SEL_TREE(
+    range(key1_1),
+    ...
+    range(key2_1),
+    SEL_IMERGE(                         (1)
+      SEL_TREE(key3_1})
+      SEL_TREE(key3_2})
+      ...
+    )
+    ...
+  )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+-----------
+This will not be a full solution in a sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in other form
+(e.g. brackets were opened).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+  index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok 
+  it did but that was done for accesses to different tables). Will there be any
+  possible gotchas here?
 



DESCRIPTION:

Current range optimizer will discard possible index_merge/[sort]union 
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.

A bad side effect of this is that for WHERE clauses  in form 

t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')

the optimizer will 
 - discard union(key2,key3) in favor of range(key1)
 - consider costs of using range(key1) and discard that plan also
and the overall effect is that possible poor range access will cause possible
good index_merge access not to be considered.

This WL is to about lifting this limitation at least for some subset of WHERE
clauses.


HIGH-LEVEL SPECIFICATION:



(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO

</contents>

Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point 
where it is really necessary. 

This way, we won't have to do much changes in the range analyzer, but will be
able to keep potential index_merge plan just enough so that it's possible to 
take it into consideration together with range access plans. 

Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:

 WHERE := range_cond(key1_1) AND 
          range_cond(key2_1) AND
          other_cond       AND 
          index_merge_OR_cond1(key3_1, key3_2, ...)
          index_merge_OR_cond2(key4_1, key4_2, ...)
  
where 
  
 index_merge_OR_cond{N} := (range_cond(keyN_1) OR 
                            range_cond(keyN_2) OR ...)


 range_cond(keyX) := condition that allows to construct range access of keyX
                     and doesn't allow to construct range/index_merge accesses
                     for any keys of the table in question. 


For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:

  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE(                         (1)
      SEL_TREE(key3_1})
      SEL_TREE(key3_2})
      ...
    )
    ...
  )

which can be used to make a cost-based choice between range and index_merge.

Limitations
-----------
This will not be a full solution in a sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in other form
(e.g. brackets were opened).

TODO
----
* is it a problem if there are keys that are referred to both from
  index_merge and from range access?

* How strict is the limitation on the form of the WHERE?

* Which version should this be based on? 5.1? Which patches are should be in 
  (google's/percona's/maria/etc?) 

* TODO: The optimizer didn't compare costs of index_merge and range before (ok 
  it did but that was done for accesses to different tables). Will there be any
  possible gotchas here?


LOW-LEVEL DESIGN:



<contents>
1. Current implementation overview
1.1. Problems in the current implementation 
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
3. Testing and required coverage
</contents>

1. Current implementation overview
==================================
At the moment, range analyzer works as follows:

SEL_TREE structure represents 
  
  # There are sel_trees, a sel_tree is either range or merge tree  
  sel_tree =  range_tree | imerge_tree
  
  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  (here range(keyi) may represent ranges not for initial keyi prefixes,
   but ranges for any infixes for keyi)
  
  # merge tree represents several way to index_merge
  imerge_tree = imerge1 AND imerge2 AND ...
  
  # a way to do index merge == a set to use of different indexes.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.


  tree_and(A, B) 
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree;  // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }


  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));

      if (R has at least one range access)
        return R; // DISCARD-IMERGE-2
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A; 
        remove non-ranges from B;
        return new index_merge(A, B); // DISCARD-IMERGE-3
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation: 

      range_treeA                // this is A
      OR
      (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
      (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
      ...
      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN)  
      =
      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
      (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
      ...
      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N)

      Now each line represents an index_merge..
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:
      
      imergeA1 AND imergeA2 AND ... AND imergeAN
      OR 
      imergeB1 AND imergeB2 AND ... AND imergeBN 

      -> (discard all imergeA{i=2,3,...} ->           // DISCARD-IMERGE-4

      imergeA1 
      OR 
      imergeB1 =

      = (combine imergeA1 with each of the range_treeB_1{i} ) = 
      
      combine(imergeA1 OR range_treeB_11) AND 
      combine(imergeA1 OR range_treeB_12) AND 
      ...                           AND 
      combine(imergeA1 OR range_treeB_1N)
    }
  }
 
1.1. Problems in the current implementation 
-------------------------------------------
As marked in the code above:

DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:

 (t.key1=c1 OR t.key2=c2) AND t.badkey < c3

DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (conditions t.badkey may have abritrary form):

 (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)

DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are 
two indexes:

  INDEX i1(col1, col2), 
  INDEX i2(col1, col3)

and this WHERE clause:
  
  col1=c1 AND (col2=c2 OR col3=c3)

The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
(Here an imerge  for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor
col3=c3 represent index ranges.)


2. New implementation
=====================

<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>

SEL_TREE structure will be now able to hold both index_merge and range scan
candidates at the same time. That is, 

  sel_tree2 = range_tree AND imerge_tree

where both parts are optional (i.e. can be empty)

Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:


2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:

A1. Don't remove index_merge part of the tree (this will take care of 
     DISCARD-IMERGE-1 problem)

A2. Push range conditions down into index_merge trees that may support them.
   if one tree has range(key1) and the other tree has imerge(key1 OR key2)
   then perform an equvalent of this operation:

     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =

     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))

A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
   concatenate them together.


2.2 New tree_or()
-----------------
O1. Dont remove non-range plans:
  Current tree_or() code will refuse to produce index_merge plans for 
  conditions like
  
  "t.key1part2=const OR t.key2part1=const" 
  
  (this is marked as DISCARD-IMERGE-3).  This was justifed as the left part of
  the AND condition is not usable for range access, and the operation of
  tree_and() guaranteed that there was no way it could changed to make a
  usable range plan. With new tree_and() and rule A2, this is no longer the 
  case. For example for this query:

    (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
  
  it will construct a

     imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
  
  then tree_and() will apply rule A2 to push the range down into index merge 
  and after that we'll have:
  
    range(t.key1part1=const)
    imerge(
        t.key1part2=const AND t.key1part1=const,
        t.key2part1=const
    )
  note that imerge(...) describes a usable index_merge plan and it's possible
  that it will be the best access path.

O2. "Create index_merge accesses when possible"
   Current tree_or() will not create index_merge access when it could create
   non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
   in the current implementation" section).  This will be changed to work as
   follows: we will create index_merge made for index scans that didn't have
   their match in the other sel_tree.
   Ilustrating it with an example:

           | sel_tree_A | sel_tree_B | A or B | include in index_merge?
     ------+------------+------------+--------+------------------------
      key1 |    cond1   |    cond2   |  condM |  no
      key2 |    cond3   |    cond4   |  NULL  |  no 
      key3 |    cond5   |            |        |  yes, A-side
      key4 |    cond6   |            |        |  yes, A-side
      key5 |            |    cond7   |        |  yes, B-side
      key6 |            |    cond8   |        |  yes, B-side

   here we assume that 
    - (cond1 OR cond2) did produce a combined range. Not including them in 
                       index_merge.
    - (cond3 OR cond4) didn't produce a usable range (e.g. they were 
        t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
        didn't yield any range list)
    - All other scand didn't have their counterparts, so we'll end up with a
      SEL_TREE of:

       range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
    .
      
O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is 
that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven 
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
lift it ,and produce a cross-product:

   ((key1p OR key2p) AND (key3p OR key4p))
   OR
   ((key5p OR key6p) AND (key7p OR key8p))

  = (key1p OR key2p OR key5p OR key6p) AND  // this part is currently 
    (key3p OR key4p OR key5p OR key6p) AND  // produced

    (key1p OR key2p OR key5p OR key6p) AND  // this part will be added
    (key3p OR key4p OR key5p OR key6p)      //. 

In order to limit the impact of this combinatorial explosion, we will 
introduce a rule that we won't generate more than #defined 
MAX_IMERGE_OPTS options.

3. Testing and required coverage
================================
So far could find the following user cases:

* BUG#17259: Query optimizer chooses wrong index
* BUG#17673: Optimizer does not use Index Merge optimization in some cases
* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
* BUG#30151: optimizer is very reluctant to chose index_merge algorithm



ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

    

worklog-noreply＠askmonty.org

tags

participants (1)