Hi Varun, On Wed, Apr 24, 2019 at 01:31:38PM +0530, Varun wrote:
revision-id: 88a80e92dc444ce30718f3e08d3ab66fb02bcea4 (mariadb-10.3.10-284-g88a80e92dc4) parent(s): 3032cd8e91f1e1ead8b6f941e75cd29e473e7eaa author: Varun Gupta committer: Varun Gupta timestamp: 2019-04-24 13:31:24 +0530 message:
MDEV-9959: A serious MariaDB server performance bug
Step #2: If any field in the select list of the derived tables is present in the group by list also , then we are again guaranteed that ref access to the derived table would always produce one row per key.
...
diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc index c52005e7683..aa645ac00ca 100644 --- a/sql/sql_lex.cc +++ b/sql/sql_lex.cc @@ -7617,6 +7617,68 @@ Item *st_select_lex::build_cond_for_grouping_fields(THD *thd, Item *cond, }
+/** + Check if any any item in the group by list is also present in the select_list + @retval true: All elements common between select and group by list +*/ + +void st_select_lex::is_group_by_prefix(KEY *keyinfo)
I think the name of the function is poor - "is_something()" hints at hints at boolean return value, and I would say it doesn't imply any side effects. Something like 'mark_derived_index_stats()' would be better, I think.
+{ + uint key_parts= keyinfo->usable_key_parts; + KEY_PART_INFO *key_part_info= keyinfo->key_part; + bool found= FALSE; + + if (key_parts < group_list.elements) + return; + + uint matched_fields=0, i, j; + Item *item; + + for (i= 0; i < key_parts; key_part_info++, i++) + { + uint fld_idx= key_part_info->fieldnr - 1; + item= join->fields_list.elem(fld_idx); + for (ORDER *order= group_list.first; order; order= order->next) + { + Item *ord_item= order->item[0]->real_item(); + Item_equal *item_equal= ord_item->get_item_equal(); + + if (item_equal) + { + Item_equal_fields_iterator it(*item_equal); + Item *equal_item; + while ((equal_item= it++)) + { + if (equal_item->eq(item, 0)) + { + matched_fields++; + found= TRUE; + break; + } + } + } + else + { + if (item->eq(ord_item, 0)) + { + matched_fields++; + found= TRUE; + } + } + if (found) + break; + } + + if (matched_fields == group_list.elements) + { + for (j=matched_fields - 1; j < key_parts; j++) + keyinfo->rec_per_key[j]= 1;
This doesn't always work. An example: create table t1 (a int, b int, c int, d int); insert into t1 select a,a,a,a from one_k; create table t2 as select * from t1; # Note the "a, a AS b" in the select list. This is intentional. explain select * from t2, (select a, a as b from t1 group by a,b) TBL where t2.a=TBL.a and t2.b=TBL.b; Here, the select output is not guaranteed to be unique (as it contains only one of two GROUP BY columns). But I see the execution to reac the above line, assigning keyinfo->rec_per_key[j]= 1.
+ return; + } + found= FALSE; + } +} + int set_statement_var_if_exists(THD *thd, const char *var_name, size_t var_name_length, ulonglong value) {
A general comment: do you think we should expose this in optimizer trace? At the moment, Derived-with-keys optimization doesn't print anything there (neither in MariaDB nor in MySQL). One can see the generated KEYUSEs when they are printed. But perhaps we could add a node that would print that a derived table was added, with this cardinality, and index with these keypart cardinalities. This would make debugging easier. Any thoughts? BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog