Re: [Maria-developers] [Commits] 0f35480f3d6: MDEV-6707: Wrong result (extra row) with group by, multi-part key
Hi Varun, Nice to see feedback for the first review fixed, but there's more to do. Please find more input below. On Fri, Jan 19, 2018 at 01:07:20AM +0530, Varun wrote:
revision-id: 0f35480f3d60fac91bf1f91f8140ac5f55724139 (mariadb-5.5.56-142-g0f35480f3d6) parent(s): fafdac3365f4943e73bcefd0e0d07d69997a9724 author: Varun Gupta committer: Varun Gupta timestamp: 2018-01-18 23:53:35 +0530 message:
MDEV-6707: Wrong result (extra row) with group by, multi-part key
This case involves using a composite key,few parts of which are involved in GROUP BY and few in the MIN/MAX functions in the select list. Ranges are created in accordance with the where condition, so during the execution of such queries, we try to find a prefix using the fields involved in GROUP BY and we then check if this newly returned prefix lies within the range we had calculated earlier.
For queries which use composite key, few parts of which are involved in GROUP BY and few in the MIN/MAX functtions in the select list, we try to find the prefix of the ranges by using the fields involved in the group by clause. We get extra rows in the output when we have same partial ranges created for the fields in the GROUP BY clause
This issue can be fixed if we compare such partial ranges and don't lookup if we see the same prefix again.
--- mysql-test/r/range.result | 14 +++++++++++++ mysql-test/r/range_mrr_icp.result | 14 +++++++++++++ mysql-test/t/range.test | 13 ++++++++++++ sql/opt_range.cc | 44 ++++++++++++++++++++++++++++++++++++--- sql/opt_range.h | 5 +++-- 5 files changed, 85 insertions(+), 5 deletions(-)
diff --git a/mysql-test/r/range.result b/mysql-test/r/range.result index 630a692cef6..72b31de4df4 100644 --- a/mysql-test/r/range.result +++ b/mysql-test/r/range.result @@ -2144,3 +2144,17 @@ value1 1000685 12345 value1 1003560 12345 value1 1004807 12345 drop table t1; +# +# MDEV-6707: Wrong result (extra row) with group by, multi-part key +# +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB; +INSERT INTO t1 VALUES +(7,'v'),(0,'s'),(9,'l'),(4,'c'); +explain +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 range f2 f2 9 NULL 2 Using where; Using index for group-by +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2; +MAX(f1) f2 +4 c +DROP TABLE t1; diff --git a/mysql-test/r/range_mrr_icp.result b/mysql-test/r/range_mrr_icp.result index 3f5de5b0189..1050bfcd887 100644 --- a/mysql-test/r/range_mrr_icp.result +++ b/mysql-test/r/range_mrr_icp.result @@ -2146,4 +2146,18 @@ value1 1000685 12345 value1 1003560 12345 value1 1004807 12345 drop table t1; +# +# MDEV-6707: Wrong result (extra row) with group by, multi-part key +# +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB; +INSERT INTO t1 VALUES +(7,'v'),(0,'s'),(9,'l'),(4,'c'); +explain +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 range f2 f2 9 NULL 2 Using where; Using index for group-by +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2; +MAX(f1) f2 +4 c +DROP TABLE t1; set optimizer_switch=@mrr_icp_extra_tmp; diff --git a/mysql-test/t/range.test b/mysql-test/t/range.test index 393ca68e945..62e907b7b4a 100644 --- a/mysql-test/t/range.test +++ b/mysql-test/t/range.test @@ -1718,3 +1718,16 @@ where (key1varchar='value1' AND (key2int <=1 OR key2int > 1)); --echo # The following must show col1=12345 for all rows: select * from t1; drop table t1; + +--echo # +--echo # MDEV-6707: Wrong result (extra row) with group by, multi-part key +--echo # + +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB; +INSERT INTO t1 VALUES +(7,'v'),(0,'s'),(9,'l'),(4,'c'); + +explain +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2; +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2; +DROP TABLE t1; diff --git a/sql/opt_range.cc b/sql/opt_range.cc index 25a9e729a8b..be959764b16 100644 --- a/sql/opt_range.cc +++ b/sql/opt_range.cc @@ -11249,6 +11249,7 @@ int QUICK_RANGE_SELECT::get_next() @param prefix_length length of cur_prefix @param group_key_parts The number of key parts in the group prefix @param cur_prefix prefix of a key to be searched for + @param save_last_range INOUT Saving the last range we encountered
Each subsequent call to the method retrieves the first record that has a prefix with length prefix_length and which is different from cur_prefix, @@ -11271,7 +11272,8 @@ int QUICK_RANGE_SELECT::get_next()
int QUICK_RANGE_SELECT::get_next_prefix(uint prefix_length, uint group_key_parts, - uchar *cur_prefix) + uchar *cur_prefix, + QUICK_RANGE **save_last_range) { DBUG_ENTER("QUICK_RANGE_SELECT::get_next_prefix"); const key_part_map keypart_map= make_prev_keypart_map(group_key_parts); @@ -11301,8 +11303,43 @@ int QUICK_RANGE_SELECT::get_next_prefix(uint prefix_length, last_range= 0; DBUG_RETURN(HA_ERR_END_OF_FILE); } + last_range= *(cur_range++);
+ /* + While calculating these prefixes we might encounter a case where there + woud be same partial ranges for multiple ranges. Typo.
+ An example would be + select max(f1), f2 from t1 where f2 ='c' AND f1 <> 9 group by f2; + so the ranges would be + (NULL,c <= f1,f2 <= c,9) + (c,9 <= f1,f2 <= c,+infinity)
This looks as if index was defined on (f1, f2) ? Then the comment doesn't make sense because loose index scan does not support queries in form: select max(key_part1) ... group by key_part2 It only can handle them when the index is on (f2,f1), but this is counterintuitive for the reader that is not familiar with the testcase for this bug. Please rephrase the comment. Please use key_part1 and key_part2 as column names. Tuple comparisons should look like this (note the braces): (a,b) < (c,d) .
+ + In this case for calculating prefixes with the group by field we take up the + partial ranges involving field f2 those would be + c<= f2 <=c + c<= f2 <=c + + So we lookup rows with the same prefix in all such ranges and + then we check for the other part(in this case f1) in ALL the ranges. + So if a record lies in a range, then it would satisfy both the partial + ranges in this case and therefore there would be multiple outputs for + the same row. + For such cases we should calculate the prefix only when we have the next + partial range different from the previous one. + */ + + + if (*save_last_range) + { + if (!key_tuple_cmp(key_part_info, (*save_last_range)->min_key, + last_range->min_key, prefix_length)) + { + last_range=NULL; + continue; + } + } + *save_last_range= last_range; key_range start_key, end_key; last_range->make_min_endpoint(&start_key, prefix_length, keypart_map); last_range->make_max_endpoint(&end_key, prefix_length, keypart_map); @@ -13315,7 +13352,7 @@ QUICK_GROUP_MIN_MAX_SELECT(TABLE *table, JOIN *join_arg, bool have_min_arg, seen_first_key(FALSE), doing_key_read(FALSE), min_max_arg_part(min_max_arg_part_arg), key_infix(key_infix_arg), key_infix_len(key_infix_len_arg), min_functions_it(NULL), max_functions_it(NULL), - is_index_scan(is_index_scan_arg) + is_index_scan(is_index_scan_arg), save_last_range(NULL)
This is not the right place for this initialization. It needs to be in QUICK_GROUP_MIN_MAX_SELECT::reset(), as the quick select may be executed multiple times. Testcase: create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); MariaDB [test]> select (SELECT MAX(f1) as MAXVAL FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2) AS SUBQ from ten; +------+ | SUBQ | +------+ | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | +------+ 10 rows in set (0.00 sec) Now, let's add a HAVING clause (which always evaluates to TRUE but forces the subquery to be recomputed: MariaDB [test]> select (SELECT MAX(f1) as MAXVAL FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2 having maxval<100+ten.a) AS SUBQ from ten; +------+ | SUBQ | +------+ | 4 | | NULL | | NULL | | NULL | | NULL | | NULL | | NULL | | NULL | | NULL | | NULL | +------+ 10 rows in set (0.00 sec) To make sure this is loose index, try with IGNORE INDEX: MariaDB [test]> select (SELECT MAX(f1) as MAXVAL FROM t1 ignore index(f2) WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2 having maxval<100+ten.a) AS SUBQ from ten; +------+ | SUBQ | +------+ | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | | 4 | +------+ 10 rows in set (0.00 sec) Please fix this and add the above into the testcase.
{ head= table; index= use_index; @@ -13968,7 +14005,8 @@ int QUICK_GROUP_MIN_MAX_SELECT::next_prefix() uchar *cur_prefix= seen_first_key ? group_prefix : NULL; if ((result= quick_prefix_select->get_next_prefix(group_prefix_len, group_key_parts, - cur_prefix))) + cur_prefix, + &save_last_range))) DBUG_RETURN(result); seen_first_key= TRUE; } diff --git a/sql/opt_range.h b/sql/opt_range.h index b8b46ae5ab1..206d65f6da4 100644 --- a/sql/opt_range.h +++ b/sql/opt_range.h @@ -477,7 +477,7 @@ class QUICK_RANGE_SELECT : public QUICK_SELECT_I int get_next(); void range_end(); int get_next_prefix(uint prefix_length, uint group_key_parts, - uchar *cur_prefix); + uchar *cur_prefix, QUICK_RANGE **save_last_range); bool reverse_sorted() { return 0; } bool unique_key_range(); int init_ror_merged_scan(bool reuse_handler, MEM_ROOT *alloc); @@ -916,7 +916,8 @@ class QUICK_GROUP_MIN_MAX_SELECT : public QUICK_SELECT_I Use index scan to get the next different key instead of jumping into it through index read */ - bool is_index_scan; + bool is_index_scan; + QUICK_RANGE *save_last_range;
Please add a comment describing the new member variable.
public: /* The following two members are public to allow easy access from
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
participants (1)
-
Sergey Petrunia