Hi! I can't offer any testcases but I think this patch has several issues. See below. On Tue, Oct 13, 2009 at 09:38:52AM +0000, Evgeny Potemkin wrote:
#At file:///work/bzrroot/45174-bug-azalea/ based on revid:alik@sun.com-20090702085822-8svd0aslr7qnddbb
2814 Evgeny Potemkin 2009-10-13 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join.
When a subquery is a subject to a semi-join optimization its tables are merged to the upper query and later they treated as usual tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However it can't be applied as is to any semi-join table. The semi-join materialization strategy differs from other semi-join strategies that the data from materialized semi-join tables isn't used directly but saved to a temporary table first. The materialization isn't isolated is a separate step, it is done inline within the nested loop execution. When it comes to fetch rows from the first table in the block of materialized semi-join tables they are isolated and the sub_select function is called to materialize result and save it in the semi-join result table. Materialization is done once and later data from the semi-join result table is used. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa.
Example: suppose we have a join order:
ot1 ot2 SJ-Mat(it1 it2 it3) ot3
and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col.
For a field that is not in a materialized semi-join we must pick a field that's not embedded in a materialized semi-join.
Example: suppose we have a join order:
SJ-Mat(it1 it2) ot1 ot2
and equality ot2.col = ot1.col = it2.col If we're looking for best substitute for 'ot2.col', we should pick ot1.col and not it2.col, because when we run a join between ot1 and ot2 execution of SJ-Mat(...) has already finished and we can't rely on the value of it*.*.
Now the Item_equal::get_first function accepts as a parameter a field being substituted and checks whether it belongs to a materialized semi-join. Depending on the check result a field to substitute for or NULL is returned.
The sj_strategy field is added to the st_join_table structure. It's a copy of the POSITION::sj_strategy field and is used to easy checks. @ mysql-test/r/subselect_sj.result A test case added for the bug#45174. @ mysql-test/r/subselect_sj_jcl6.result A test case added for the bug#45174. @ mysql-test/t/subselect_sj.test A test case added for the bug#45174. @ sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Now the Item_equal::get_first function accepts as a parameter a field being substituted. @ sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join.
Now the Item_equal::get_first function accepts a field being substituted and checks whether it belongs to a materialized semi-join. Depending on the check result a field to substitute for or NULL is returned. @ sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join.
Now the Item_equal::get_first function accepts as a parameter a field being substituted. @ sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. The is_sj_materialization_strategy method is added to the JOIN_TAB class to check whether JOIN_TAB belongs to a materialized semi-join. @ sql/sql_select.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join.
The sj_strategy field is added to the st_join_table structure. It's a copy of the POSITION::sj_strategy field and is used to easy checks.
modified: mysql-test/r/subselect_sj.result mysql-test/r/subselect_sj_jcl6.result mysql-test/t/subselect_sj.test sql/item.cc sql/item_cmpfunc.cc sql/item_cmpfunc.h sql/sql_select.cc sql/sql_select.h
=== modified file 'sql/item.cc' --- a/sql/item.cc 2009-06-09 16:53:34 +0000 +++ b/sql/item.cc 2009-10-13 09:38:46 +0000 @@ -4883,7 +4883,7 @@ Item *Item_field::replace_equal_field(uc return this; return const_item; } - Item_field *subst= item_equal->get_first(); + Item_field *subst= item_equal->get_first(this); if (subst && field->table != subst->field->table && !field->eq(subst->field)) return subst; }
=== modified file 'sql/item_cmpfunc.cc' --- a/sql/item_cmpfunc.cc 2009-06-09 16:53:34 +0000 +++ b/sql/item_cmpfunc.cc 2009-10-13 09:38:46 +0000 @@ -5376,7 +5376,7 @@ longlong Item_equal::val_int()
void Item_equal::fix_length_and_dec() { - Item *item= get_first(); + Item *item= get_first(NULL); eval_item= cmp_item::get_comparator(item->result_type(), item->collation.collation); } @@ -5439,3 +5439,115 @@ void Item_equal::print(String *str, enum str->append(')'); }
+ +/* + @brief Get the first equal field of multiple equality. + @param[in] field the field to get equal field to + + @details Get the first field of multiple equality that is equal to the + given field. In order to make semi-join materialization strategy work + correctly we can't propagate equal fields from upper select to a + materialized semi-join. + Thus the fields is returned according to following rules: + + 1) If the given field belongs to a semi-join then the first field in + multiple equality which belong to the same semi-join is returned. + Otherwise NULL is returned. + 2) If the given field doesn't belong to a semi-join then + the first field in the multiple equality that doesn't belong to any + semi-join is returned. + If all fields in the equality are belong to semi-join(s) then NULL + is returned. + 3) If no field is given then the first field in the multiple equality + is returned without regarding whether it belongs to a semi-join or not. + + @retval Found first field in the multiple equality. + @retval 0 if no field found. +*/ + +Item_field* Item_equal::get_first(Item_field *field) +{ + List_iterator<Item_field> it(fields); + Item_field *item; + JOIN_TAB *field_tab; + + if (!field) + return fields.head(); + /* + Of all equal fields, return the first one we can use. Normally, this is the + field which belongs to the table that is the first in the join order. + + There is one exception to this: When semi-join materialization strategy is + used, and the given field belongs to a table within the semi-join nest, we + must pick the first field in the semi-join nest. + + Example: suppose we have a join order: + + ot1 ot2 SJ-Mat(it1 it2 it3) ot3 + + and equality ot2.col = it1.col = it2.col + If we're looking for best substitute for 'it2.col', we should pick it1.col + and not ot2.col. + */ + + field_tab= field->field->table->reginfo.join_tab; + if (field_tab->sj_strategy == SJ_OPT_MATERIALIZE || + field_tab->sj_strategy == SJ_OPT_MATERIALIZE_SCAN) + { + /* + It's a field from an materialized semi-join. We can substitute it only + for a field from the same semi-join. + */ + JOIN_TAB *first; + JOIN *join= field_tab->join; + uint tab_idx= field_tab - field_tab->join->join_tab; + /* Find first table of this semi-join. */ + for (int i=tab_idx; i >= join->const_tables; i--) + { + if (join->best_positions[i].sj_strategy == SJ_OPT_MATERIALIZE || + join->best_positions[i].sj_strategy == SJ_OPT_MATERIALIZE_SCAN) + first= join->join_tab + i; + else + // Found first tab that doesn't belong to current SJ. + break; + } + /* Find an item to substitute for. */ + while ((item= it++)) + { + if (item->field->table->reginfo.join_tab >= first) + { + /* + If we found given field then return NULL to avoid unnecessary + substitution. + */ + return (item != field) ? item : NULL; + } + } + } + else + { + /* + The field is not in SJ-Materialization nest. We must return the first + field that's not embedded in a SJ-Materialization nest. + Example: suppose we have a join order: + + SJ-Mat(it1 it2) ot1 ot2 + + and equality ot2.col = ot1.col = it2.col + If we're looking for best substitute for 'ot2.col', we should pick ot1.col + and not it2.col, because when we run a join between ot1 and ot2 + execution of SJ-Mat(...) has already finished and we can't rely on the + value of it*.*. This can cause cross-join to be computed between materialization result and table it1. Actually, substitution with table it2 should be fine, as SJ-Materialization-Scan (and this example cannot be lookup) will 'unpack'
column value to it2.col when doing the scan of the materialized temptable.
I've wrote up my understanding of the problem here (with pics, so on the wiki): http://askmonty.org/wiki/EqualityPropagationAndEqualityPropagationAndSemiJoi...
+ */ + while ((item= it++)) + { + field_tab= item->field->table->reginfo.join_tab; + if (!(field_tab->sj_strategy == SJ_OPT_MATERIALIZE || + field_tab->sj_strategy == SJ_OPT_MATERIALIZE_SCAN)) This is a wrong way to check if a field is inside SJ-Materialization nest. The condition is true only for the first table in SJ-Materialization nest, while we need to catch *any* SJ-Mat-inner table.
the correct way to check this is as follows: field_tab->pos_in_table_list->embedding && field_tab->pos_in_table_list->embedding->sj_mat && field_tab->pos_in_table_list->embedding->sj_mat->is_used
+ return item; + } + } + // Shouldn't get here. + DBUG_ASSERT(0); + return NULL; +}
=== modified file 'sql/item_cmpfunc.h' --- a/sql/item_cmpfunc.h 2009-01-26 16:03:39 +0000 +++ b/sql/item_cmpfunc.h 2009-10-13 09:38:46 +0000 @@ -1592,7 +1592,7 @@ public: void add(Item_field *f); uint members(); bool contains(Field *field); - Item_field* get_first() { return fields.head(); } + Item_field* get_first(Item_field *field); void merge(Item_equal *item); void update_const(); enum Functype functype() const { return MULT_EQUAL_FUNC; }
=== modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2009-06-30 08:03:05 +0000 +++ b/sql/sql_select.cc 2009-10-13 09:38:46 +0000 @@ -7911,6 +7911,7 @@ static void fix_semijoin_strategies_for_ if (tablenr != first) pos->sj_strategy= SJ_OPT_NONE; remaining_tables |= s->table->map; + s->sj_strategy= pos->sj_strategy; } }
@@ -11706,7 +11707,7 @@ Item *eliminate_item_equal(COND *cond, C head= item_const; else { - head= item_equal->get_first(); + head= item_equal->get_first(NULL); it++; } Item_field *item_field;
=== modified file 'sql/sql_select.h' --- a/sql/sql_select.h 2009-05-07 20:48:24 +0000 +++ b/sql/sql_select.h 2009-10-13 09:38:46 +0000 @@ -274,6 +274,13 @@ typedef struct st_join_table /* NestedOuterJoins: Bitmap of nested joins this table is part of */ nested_join_map embedding_map;
+ /* + Semi-join strategy to be used for this join table. This is a copy of + POSITION::sj_strategy field. This field is set up by the + fix_semijion_strategies_for_picked_join_order. + */ + uint sj_strategy; + void cleanup(); inline bool is_using_loose_index_scan() {
BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog