Re: [Maria-developers] Fwd: [Commits] Rev 3335: Fix bug lp:825075 in file:///home/tsk/mprog/src/5.3/
Hi Timour, Feedback follows skype discussion: First, here's a general comment Consider a query: SELECT b, min(a) FROM t1 GROUP BY b; Suppose there is an index (b,a). Loose scan would execute it as follows: S00 h->index_first(); S01 produce an output record of (b, a) // first value of 'a' is the minimum S02 S03 h->read_next_different_b(); // Jump to the next GROUP-BY group. S04 ... S05 If we modify the query by adding a WHERE: SELECT b, min(a) FROM t1 WHERE arbitrary_cond(a,b) GROUP BY b; then LooseScan (in its current form) will be unable to execute it. The problem is here: S00 h->index_first(); S01 produce an output record of (b, a) // first value of 'a' is the minimum ^^^ what if this record doesn't match the WHERE? We'll have to scan further within the record having b=first_record.b until we find a match. LooseScan executor doesn't support such "walk forward" action currently. If the WHERE clause has been converted into a range, such that there are no records that would be contained in the range but not match the WHERE clause (***) then LooseScan is able to handle it. Example: SELECT b, min(a) FROM t1 WHERE a>10 GROUP BY b; Execution would proceed as follows: S00 h->index_first(); S00 h->index_read(HA_READ_AFTER_KEY, {current_row.b, 'a>10'}) S01 produce an output record of (b, a) // first value of 'a>10' is the minimum Step S01 will produce correct results as long as the following holds: "The first record we've got in a range will satisfy the WHERE condition" This is not always the case. Here's a counterexample: SELECT b, min(a) FROM t1 WHERE a>10 GROUP BY b; SELECT b, min(a) FROM t1 WHERE a>10 and ((a MOD 2) = 0) GROUP BY b; -- (C1) Consider the second query and this dataset: b , a 1, 5 1, 7 1, 11 1, 13 1, 14 range optimizer would give a range of "a > 10", and the first matching record is (1,11). It doesn't satisfy the "(a MOD 2)=0" condition, neither does the next record of (1,13), and only the record (1,14) will satisfy the MOD part of the condition. The problem: current code doesn't seem to have the logic to detect such cases. The submitted patch disables LooseScan for apparently invalid cases where the WHERE condition covers a column while range condition does not, which means that condition (***) is certainly unmet. However, cases like (C1) are still not handled. Further comments are inline. On Wed, Dec 07, 2011 at 02:18:02PM +0200, Timour Katchaounov wrote:
Could you please review the below patch.
Timour
----------------------------------------------------------- revno: 3335 revision-id: timour@askmonty.org-20111207115332-3oa536gbt20r22po parent: igor@askmonty.org-20111206214218-lev0vyidyt8h3431 fixes bug(s): https://launchpad.net/bugs/825075 committer: timour@askmonty.org branch nick: 5.3 timestamp: Wed 2011-12-07 13:53:32 +0200 message: Fix bug lp:825075
Analys: The cause for the wrong result was that the optimizer incorrectly chose min/max loose scan when it is not applicable. The applicability test missed the case when a condition on the MIN/MAX argument was OR-ed with a condition on some other field. In this case, the MIN/MAX condition cannot be used for loose scan.
Solution: The function check_group_min_max_predicates() duplicated functionality performed by the range optimizer, but this duplication was incomplete. The solution is to reuse instead the analys and the ranges produced by the range optimizer. If a suitable range could not be extracted for the MIN/MAX argument, then loose scan is not applicable.
=== modified file 'mysql-test/r/group_min_max.result' --- a/mysql-test/r/group_min_max.result 2011-12-04 21:31:42 +0000 +++ b/mysql-test/r/group_min_max.result 2011-12-07 11:53:32 +0000 @@ -2103,7 +2103,7 @@ id select_type table type possible_keys explain select a1,a2,b,min(c),max(c) from t2 where (c > 'a000') and (c <= 'd999') and (c like '_8__') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 index NULL idx_t2_1 163 NULL 164 Using where; Using index +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 33 Using where; Using index for group-by explain select a1, a2, b, c, min(d), max(d) from t1 group by a1,a2,b,c; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL NULL NULL NULL NULL 128 Using temporary; Using filesort
Why there is no testcase from the bug report? I think it's better to have a testcase that - will explicitly mention BUG# - will show the wrong query result. A changed EXPLAIN in some irrelevant test is a too poor coverage.
=== modified file 'sql/opt_range.cc' --- a/sql/opt_range.cc 2011-11-18 17:35:51 +0000 +++ b/sql/opt_range.cc 2011-12-07 11:53:32 +0000 @@ -11984,164 +11996,41 @@ get_best_group_min_max(PARAM *param, SEL }
-/* - Check that the MIN/MAX attribute participates only in range predicates - with constants. - - SYNOPSIS - check_group_min_max_predicates() - cond tree (or subtree) describing all or part of the WHERE - clause being analyzed - min_max_arg_item the field referenced by the MIN/MAX function(s) - min_max_arg_part the keypart of the MIN/MAX argument if any - - DESCRIPTION - The function walks recursively over the cond tree representing a WHERE - clause, and checks condition (SA3) - if a field is referenced by a MIN/MAX - aggregate function, it is referenced only by one of the following - predicates: {=, !=, <, <=, >, >=, between, is null, is not null}. +/** + Find the range tree for the field that is an argument of MIN/MAX.
- RETURN - TRUE if cond passes the test - FALSE o/w + @param cond The WHERE clause of the query + @param min_max_arg_item The field referenced by the MIN/MAX function(s) + @param index_tree Range tree for the chosen index + + @note + The function iterates over the range trees for each keypart of the + chosen index until it finds the range for the MIN/MAX keypart. + + @retval + If there is a range for the MIN/MAX argument + @retval + NULL if there is no range */
-static bool -check_group_min_max_predicates(Item *cond, Item_field *min_max_arg_item, - Field::imagetype image_type) +static SEL_ARG * +get_min_max_range(Item *cond, Item_field *min_max_arg_item, SEL_ARG *index_tree) { - DBUG_ENTER("check_group_min_max_predicates"); - DBUG_ASSERT(cond && min_max_arg_item); - + if (!cond || !min_max_arg_item) + return NULL; cond= cond->real_item(); So we've got:
get_min_max_range(Item *cond, Item_field *min_max_arg_item, SEL_ARG *index_tree) { if (!cond || !min_max_arg_item) return NULL; cond= cond->real_item(); 'cond' is passed as parameter, checked, modified, and .. then never used in the function. Please fix this.
- Item::Type cond_type= cond->real_type(); - if (cond_type == Item::COND_ITEM) /* 'AND' or 'OR' */ - { - DBUG_PRINT("info", ("Analyzing: %s", ((Item_func*) cond)->func_name())); - List_iterator_fast<Item> li(*((Item_cond*) cond)->argument_list()); - Item *and_or_arg; - while ((and_or_arg= li++)) - { - if (!check_group_min_max_predicates(and_or_arg, min_max_arg_item, - image_type)) - DBUG_RETURN(FALSE); - } - DBUG_RETURN(TRUE); - } - - /* - Disallow loose index scan if the MIN/MAX argument field is referenced by - a subquery in the WHERE clause. - */ - - if (cond_type == Item::SUBSELECT_ITEM) - { - Item_subselect *subs_cond= (Item_subselect*) cond; - if (subs_cond->is_correlated) - { - DBUG_ASSERT(subs_cond->upper_refs.elements > 0); - List_iterator_fast<Item_subselect::Ref_to_outside> - li(subs_cond->upper_refs); - Item_subselect::Ref_to_outside *dep; - while ((dep= li++)) - { - if (dep->item->eq(min_max_arg_item, FALSE)) - DBUG_RETURN(FALSE); - } - } - DBUG_RETURN(TRUE); - } - /* - Condition of the form 'field' is equivalent to 'field <> 0' and thus - satisfies the SA3 condition. + Extract the SEL_ARG subtree that contains only ranges for the MIN/MAX + attribute. */ - if (cond_type == Item::FIELD_ITEM) + SEL_ARG *min_max_range= index_tree; + while (min_max_range) /* Find the tree for the MIN/MAX key part. */ { - DBUG_PRINT("info", ("Analyzing: %s", cond->full_name())); - DBUG_RETURN(TRUE); - } - - /* We presume that at this point there are no other Items than functions. */ - DBUG_ASSERT(cond_type == Item::FUNC_ITEM); - - /* Test if cond references only group-by or non-group fields. */ - Item_func *pred= (Item_func*) cond; - Item **arguments= pred->arguments(); - Item *cur_arg; - DBUG_PRINT("info", ("Analyzing: %s", pred->func_name())); - for (uint arg_idx= 0; arg_idx < pred->argument_count (); arg_idx++) - { - cur_arg= arguments[arg_idx]->real_item(); - DBUG_PRINT("info", ("cur_arg: %s", cur_arg->full_name())); - if (cur_arg->type() == Item::FIELD_ITEM) - { - if (min_max_arg_item->eq(cur_arg, 1)) - { - /* - If pred references the MIN/MAX argument, check whether pred is a range - condition that compares the MIN/MAX argument with a constant. - */ - Item_func::Functype pred_type= pred->functype(); - if (pred_type != Item_func::EQUAL_FUNC && - pred_type != Item_func::LT_FUNC && - pred_type != Item_func::LE_FUNC && - pred_type != Item_func::GT_FUNC && - pred_type != Item_func::GE_FUNC && - pred_type != Item_func::BETWEEN && - pred_type != Item_func::ISNULL_FUNC && - pred_type != Item_func::ISNOTNULL_FUNC && - pred_type != Item_func::EQ_FUNC && - pred_type != Item_func::NE_FUNC) - DBUG_RETURN(FALSE); - - /* Check that pred compares min_max_arg_item with a constant. */ - Item *args[3]; - bzero(args, 3 * sizeof(Item*)); - bool inv; - /* Test if this is a comparison of a field and a constant. */ - if (!simple_pred(pred, args, &inv)) - DBUG_RETURN(FALSE); - - /* Check for compatible string comparisons - similar to get_mm_leaf. */ - if (args[0] && args[1] && !args[2] && // this is a binary function - min_max_arg_item->result_type() == STRING_RESULT && - /* - Don't use an index when comparing strings of different collations. - */ - ((args[1]->result_type() == STRING_RESULT && - image_type == Field::itRAW && - ((Field_str*) min_max_arg_item->field)->charset() != - pred->compare_collation()) - || - /* - We can't always use indexes when comparing a string index to a - number. - */ - (args[1]->result_type() != STRING_RESULT && - min_max_arg_item->field->cmp_type() != args[1]->result_type()))) - DBUG_RETURN(FALSE); - } - } - else if (cur_arg->type() == Item::FUNC_ITEM) - { - if (!check_group_min_max_predicates(cur_arg, min_max_arg_item, - image_type)) - DBUG_RETURN(FALSE); - } - else if (cur_arg->const_item()) - { - /* - For predicates of the form "const OP expr" we also have to check 'expr' - to make a decision. - */ - continue; - } - else - DBUG_RETURN(FALSE); + if (min_max_range->field->eq(min_max_arg_item->field)) + break; + min_max_range= min_max_range->next_key_part; } - - DBUG_RETURN(TRUE); + return min_max_range; }
BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
Hi Timour, So, we're trying to fix this by making group-by-loose-scan access method be able to scan forwards (when evaluating MIN) and/or backwards (if evaluating MAX) until it finds a record that satisfies the WHERE condition. We still need to determine whether we need to do the scan or index jumps are sufficient. == Task definition == We're targeting cases where - the WHERE condition has references to MIN/MAX column - the range optimizer has constructed a SEL_ARG graph that refers to the MIN/MAX column. - the index is defined as (ignoring the bound columns): INDEX( group_by_columns, min_max_column) and the range optimizer has constructed a SEL_TREE in form: range_tree(group_by_columns) ---> range_tree1(min_max_column) | | | \---------> range_tree2(min_max_column) | | ... .... | \-------------------> range_treeN(min_max_column) (the edges in the picture a SEL_ARG::next_key_part edges) When the query executed, loose scan walks the index forward: - it gets to some value group of {group_by_columns} - Within that group, we jump to the first (or to the last) index record that matches SEL_ARG(min_max). The question: when can we guarantee that the first index record will match the WHERE condition (PROB1) ? == Proposed resolution == (PROB1) is true, when the WHERE condition is equivalent to "cond1 AND cond2" where cond1 - does not use min_max_column at all. cond2 - is an AND/OR tree with leaves in form "min_max_column $CMP$ const". I think the above is a sound solution. Please let me know if it is not. BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
On 02/17/2012 08:00 AM, Sergei Petrunia wrote:
Hi Timour,
So, we're trying to fix this by making group-by-loose-scan access method be able to scan forwards (when evaluating MIN) and/or backwards (if evaluating MAX) until it finds a record that satisfies the WHERE condition.
We still need to determine whether we need to do the scan or index jumps are sufficient.
== Task definition ==
We're targeting cases where
- the WHERE condition has references to MIN/MAX column - the range optimizer has constructed a SEL_ARG graph that refers to the MIN/MAX column.
- the index is defined as (ignoring the bound columns):
INDEX( group_by_columns, min_max_column)
and the range optimizer has constructed a SEL_TREE in form:
range_tree(group_by_columns) ---> range_tree1(min_max_column) | | | \---------> range_tree2(min_max_column) | | ... .... | \-------------------> range_treeN(min_max_column)
(the edges in the picture a SEL_ARG::next_key_part edges)
When the query executed, loose scan walks the index forward: - it gets to some value group of {group_by_columns} - Within that group, we jump to the first (or to the last) index record that matches SEL_ARG(min_max).
The question:
when can we guarantee that the first index record will match the WHERE condition (PROB1)
?
== Proposed resolution ==
(PROB1) is true, when the WHERE condition is equivalent to
"cond1 AND cond2"
where cond1 - does not use min_max_column at all. cond2 - is an AND/OR tree with leaves in form "min_max_column $CMP$ const".
BETWEEN, IN, IS [NOT] NULL predicates are also ok here. Hidden type conversions should be taken into account. Regards, Igor.
I think the above is a sound solution. Please let me know if it is not. BR Sergei
participants (2)
-
Igor Babaev
-
Sergei Petrunia