Hi Yuchen, Please find below review input for
commit ba9cf8efeb0b1e1c132d565adb79c98156783f15 Author: Yuchen Pei <yuchen.pei@mariadb.com> Date: Fri May 19 20:06:31 2023 +1000
MDEV-22534 Decorrelate IN subquery
First, general questions: Q1: It seems there is some overlap between Item_in_subselect::exists2in_processor() and Item_exists_subselect::exists2in_processor() Did you consider factoring out common code rather than copying it? (if yes, I'd like to know why it couldn't be done). Q2: Where is the coe that removes the correlated equalities from the subquery's WHERE? (I assume Item_exists_subselect code does it?)
diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc index 9e6c205ca76..8d893794da3 100644 --- a/sql/item_subselect.cc +++ b/sql/item_subselect.cc @@ -3099,6 +3099,132 @@ static bool find_inner_outer_equalities(Item **conds, return TRUE; }
+/* Checks whether item tree intersects with the free list */ +static bool intersects_free_list(Item *item, THD *thd) +{ + for (const Item *to_find= thd->free_list; to_find; to_find= to_find->next) + if (item->walk(&Item::find_item_processor, 1, (void *) to_find)) + return true; + return false; +} +
Please try moving this function into a separate file. First name that came to my mind: opt_subq_decorrelate.cc Feel free to suggest better names. The idea is that we should aim for better structure. (This request is only valid if the factoring out suggested above doesn't work out).
+/* De-correlate where conditions in an IN subquery */
This needs a bigger comment describing what the rewrite does.
+bool Item_in_subselect::exists2in_processor(void *opt_arg) +{ + THD *thd= (THD *)opt_arg; + SELECT_LEX *first_select= unit->first_select(); + JOIN *join= first_select->join; + bool will_be_correlated; + Dynamic_array<EQ_FIELD_OUTER> eqs(PSI_INSTRUMENT_MEM, 5, 5); + List<Item> outer; + Query_arena *arena= NULL, backup; + int res= FALSE; + DBUG_ENTER("Item_in_subselect::exists2in_processor"); +
Please move comments out of the if statement and number the conditions. For examples, see best_access_path(), large if-statement starting with "Don't test table scan if it can't be better." or "EXISTS-to-IN coversion and ORDER BY ... LIMIT clause" in Item_exists_subselect::exists2in_processor().
+ if (!optimizer_flag(thd, OPTIMIZER_SWITCH_EXISTS_TO_IN) || + /* proceed only if I'm a toplevel IN or a toplevel NOT IN */ + !(is_top_level_item() || + (upper_not && upper_not->is_top_level_item())) ||
what is the reason behind this condition? If we're able to handle top-level NOT IN, why can't we handle subquery in any context? Can we really handle the NOT IN? (I suspect not, it will suffer from the semantics of IN subqueries with NULLs.. I can't find the name for this, it's described e.g. here: https://petrunia.net/2006/11/16/inany-subqueries-null-woes/ )
+ first_select->is_part_of_union() || /* skip if part of a union */ + first_select->group_list.elements || /* skip if with group by */ + first_select->with_sum_func || /* skip if aggregation */ + join->having || /* skip if with having */ + !join->conds || /* skip if no conds */ + /* skip if left expr is a single nonscalar subselect */ + (left_expr->type() == Item::SUBSELECT_ITEM && + !left_expr->type_handler()->is_scalar_type()))
The above seems to be some exotic SQL. Is it something like (select col1,lol2 from t1) IN (select col3,col4 from t2 where ..) and does MariaDB really support this? Please add a comment about this.
+ DBUG_RETURN(FALSE); + /* iterate over conditions, and check whether they can be moved out. */ + if (find_inner_outer_equalities(&join->conds, eqs)) + DBUG_RETURN(FALSE); + /* If we are in the execution of a prepared statement, check for + intersection with the free list to avoid segfault. Note that the + check for prepared statement execution is necessary, otherwise it + will likely always find intersection thus abort the + transformation. + + fixme: this only works when HAVE_PSI_STATEMENT_INTERFACE is + defined. It causes CI's like amd64-debian-10-debug-embedded to + fail. Are there other ways to find out we are in the execution of a + prepared statement? */
I'm looking at the code of activate_stmt_arena_if_needed(). It seems "stmt_arena->is_conventional()" is the check you're looking for.
+ if (thd->m_statement_state.m_parent_prepared_stmt) + { + for (uint i= 0; i < (uint) eqs.elements(); i++) + { + if (intersects_free_list(*eqs.at(i).eq_ref, thd)) + DBUG_RETURN(FALSE);
This check doesn't look good. As far as I understand, it is a temporary measure until related re-execution bug(s) are fixed by Igor (and/or Sanja)?
+ } + }
+ /* Determines whether the result will be correlated */ + { + List<Item> unused; + Collect_deps_prm prm= {&unused, // parameters + unit->first_select()->nest_level_base, // nest_level_base + 0, // count + unit->first_select()->nest_level, // nest_level + FALSE // collect + }; + walk(&Item::collect_outer_ref_processor, TRUE, &prm); + DBUG_ASSERT(prm.count > 0); + DBUG_ASSERT(prm.count >= (uint)eqs.elements()); + will_be_correlated= prm.count > (uint)eqs.elements(); See the example pasted in the MDEV: https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-259705
Somehow, I get here will_be_correlated==FALSE but Materialization is still not considered for the subquery?
+ } + /* Back up the free list */ + arena= thd->activate_stmt_arena_if_needed(&backup); + /* Construct the items for left_expr */ + if (left_expr->type() == Item::ROW_ITEM) + for (uint i= 0; i < left_expr->cols(); i++) + outer.push_back(left_expr->element_index(i)); + else + outer.push_back(left_expr); + const uint offset= first_select->item_list.elements; + /* Move items to outer and select item list */ + for (uint i= 0; i < (uint)eqs.elements(); i++) + { + Item **eq_ref= eqs.at(i).eq_ref; + Item_ident *local_field= eqs.at(i).local_field; + Item *outer_exp= eqs.at(i).outer_exp; + first_select->item_list.push_back(local_field, thd->mem_root); + first_select->ref_pointer_array[offset + i]= (Item *)local_field;
How do we know that ref_pointer_array has enough room for the new elements? (We can discuss this with Sanja, he's an expert on the topic). I see this check in Item_exists_subselect::exists2in_processor if ((uint)eqs.elements() > (first_select->item_list.elements + first_select->select_n_reserved)) Is it how Item_exists_subselect handles this issue?
+ outer.push_back(outer_exp); + *eq_ref= new (thd->mem_root) Item_int(thd, 1); + if((*eq_ref)->fix_fields(thd, (Item **)eq_ref)) + { + res= TRUE; + goto out; + } + } + /* Update the left_expr */ + left_expr= new (thd->mem_root) Item_row(thd, outer); + if (left_expr->fix_fields(thd, &left_expr)) + { + res= TRUE; + goto out; + } + left_expr_orig= left_expr; + is_correlated= will_be_correlated; + /* Update any Item_in_optimizer wrapper if exists */ + if (optimizer) + { + optimizer->reset_cache(); + if (optimizer->fix_left(thd)) + { + res= TRUE; + goto out; + } + } + { + OPT_TRACE_TRANSFORM(thd, trace_wrapper, trace_transform, + get_select_lex()->select_number, "IN (SELECT)", + "decorrelation"); + } +out: + /* Restores the free list */ + if (arena) + thd->restore_active_arena(arena, &backup); + DBUG_RETURN(res); +}
On Mon, Dec 20, 2021 at 09:29:47PM +0300, Sergey Petrunia wrote:
Hello Igor,
I'm trying examples with this patch:
create table t1 (a int not null, key(a)); insert into t1 select seq from seq_1_to_1000;
First, I debug this example:
explain select * from t1 where a <10 or a>=10;
The execution enters key_or(), passes through the new code this patch has added and key_or() returns NULL. Good.
But if I modify the example slightly:
explain select * from t1 where a <20 or a>=10 ;
Here, key_or() still returns a SEL_ARG object that represents an infinite range:
(gdb) fini Run till exit from #0 key_or (...) 0x0000555555f1fca2 in tree_or (...) Value returned is $24 = (SEL_ARG *) 0x7fff70064d88 (gdb) p dbug_print_sel_arg($24) $26 = 0x7fff70024390 "SEL_ARG(-inf<=a<=+inf)"
Should this case be fixed as well?
On Fri, Dec 17, 2021 at 02:11:40PM -0800, IgorBabaev wrote:
revision-id: 677643a80986107491b7886441f2828384f0494b (mariadb-10.2.31-1286-g677643a) parent(s): 8bb55633699612279744c055e22eeca8d4058273 author: Igor Babaev committer: Igor Babaev timestamp: 2021-12-17 14:11:39 -0800 message:
MDEV-27262 Unexpected index intersection with full index scan for an index
If when extracting a range condition foran index from the WHERE condition
I think the comment wording could be improved also.
Range Optimizer sees that the range condition covers the whole index then such condition should be discarded because cannot it be used in any range scan. In some cases Range Optimizer really does it, but there remained some conditions for which it was not done. As a result the optimizer could produce index merge plans with the full index scan for one of the indexes participating in the index merge. This could be observed in one of the test cases from index_merge1.inc where a plan with index_merge_sort_union was produced and in the test case reported for this bug where a plan with index_merge_sort_intersect was produced. In both cases one of two index scans participating in index merge ran over the whole index. The patch slightly changes the original above mentioned test case from index_merge1.inc to be able to produce an intended plan employing index_merge_sort_union. The original query was left to show that index merge is not used for it anymore. It should be noted that for the plan with index_merge_sort_intersect could be chosen for execution only due to a defect in the InnoDB code that returns wrong estimates for the cardinality of big ranges.
This bug led to serious problems in 10.4+ where the optimization using Rowid filters is employed (see mdev-26446).
Approved by Oleksandr Byelkin <sanja@mariadb.com>
--- mysql-test/include/index_merge1.inc | 8 +++- mysql-test/r/index_merge_myisam.result | 12 +++-- mysql-test/r/range_innodb.result | 81 ++++++++++++++++++++++++++++++++++ mysql-test/t/range_innodb.test | 78 ++++++++++++++++++++++++++++++++ sql/opt_range.cc | 7 +++ 5 files changed, 181 insertions(+), 5 deletions(-)
diff --git a/mysql-test/include/index_merge1.inc b/mysql-test/include/index_merge1.inc index b168a76..440f1f7 100644 --- a/mysql-test/include/index_merge1.inc +++ b/mysql-test/include/index_merge1.inc @@ -150,15 +150,19 @@ explain select * from t0 where (((key3 <7 and key7 < 6) or key5 < 2) and (key5 < 5 or key6 < 6));
explain select * from t0 where - ((key3 <5 or key5 < 4) and (key1 < 4 or key2 < 4)) + ((key3 < 4 or key5 < 4) and (key1 < 4 or key2 < 4)) or ((key3 >=5 or key5 < 2) and (key5 < 5 or key6 < 6));
explain select * from t0 force index(i1, i2, i3, i4, i5, i6 ) where - ((key3 <5 or key5 < 4) and (key1 < 4 or key2 < 4)) + ((key3 < 4 or key5 < 4) and (key1 < 4 or key2 < 4)) or ((key3 >=5 or key5 < 2) and (key5 < 5 or key6 < 6));
+explain select * from t0 force index(i1, i2, i3, i4, i5, i6 ) where + ((key3 < 5 or key5 < 4) and (key1 < 4 or key2 < 4)) + or + ((key3 >=5 or key5 < 2) and (key5 < 5 or key6 < 6)); # 8. Verify that "order by" after index merge uses filesort select * from t0 where key1 < 5 or key8 < 4 order by key1;
diff --git a/mysql-test/r/index_merge_myisam.result b/mysql-test/r/index_merge_myisam.result index 5a23092..a096c34 100644 --- a/mysql-test/r/index_merge_myisam.result +++ b/mysql-test/r/index_merge_myisam.result @@ -173,17 +173,23 @@ or id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t0 index_merge i1,i2,i3,i5,i6,i7 i3,i5 4,4 NULL 11 Using sort_union(i3,i5); Using where explain select * from t0 where -((key3 <5 or key5 < 4) and (key1 < 4 or key2 < 4)) +((key3 < 4 or key5 < 4) and (key1 < 4 or key2 < 4)) or ((key3 >=5 or key5 < 2) and (key5 < 5 or key6 < 6)); id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t0 ALL i1,i2,i3,i5,i6 NULL NULL NULL 1024 Using where explain select * from t0 force index(i1, i2, i3, i4, i5, i6 ) where -((key3 <5 or key5 < 4) and (key1 < 4 or key2 < 4)) +((key3 < 4 or key5 < 4) and (key1 < 4 or key2 < 4)) or ((key3 >=5 or key5 < 2) and (key5 < 5 or key6 < 6)); id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t0 index_merge i1,i2,i3,i5,i6 i3,i5 0,4 NULL 1024 Using sort_union(i3,i5); Using where +1 SIMPLE t0 index_merge i1,i2,i3,i5,i6 i3,i5 4,4 NULL 1024 Using sort_union(i3,i5); Using where +explain select * from t0 force index(i1, i2, i3, i4, i5, i6 ) where +((key3 < 5 or key5 < 4) and (key1 < 4 or key2 < 4)) +or +((key3 >=5 or key5 < 2) and (key5 < 5 or key6 < 6)); +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t0 ALL i1,i2,i3,i5,i6 NULL NULL NULL 1024 Using where select * from t0 where key1 < 5 or key8 < 4 order by key1; key1 key2 key3 key4 key5 key6 key7 key8 1 1 1 1 1 1 1 1023 diff --git a/mysql-test/r/range_innodb.result b/mysql-test/r/range_innodb.result index f2349f2..ccb6da3 100644 --- a/mysql-test/r/range_innodb.result +++ b/mysql-test/r/range_innodb.result @@ -108,3 +108,84 @@ DROP TABLE t0,t1; SET @@GLOBAL.debug_dbug = @saved_dbug; set @@optimizer_switch= @optimizer_switch_save; # End of 10.1 tests +# +# MDEV-27262: Index intersection with full scan over an index +# +CREATE TABLE t1 ( +id int(10) unsigned NOT NULL AUTO_INCREMENT, +p char(32) DEFAULT NULL, +es tinyint(3) unsigned NOT NULL DEFAULT 0, +er tinyint(3) unsigned NOT NULL DEFAULT 0, +x mediumint(8) unsigned NOT NULL DEFAULT 0, +PRIMARY KEY (id), +INDEX es (es), +INDEX x (x), +INDEX er (er,x), +INDEX p (p) +) ENGINE=InnoDB DEFAULT CHARSET=latin1; +insert into t1(es,er) select 0, 1 from seq_1_to_45; +insert into t1(es,er) select 0, 2 from seq_1_to_49; +insert into t1(es,er) select 0, 3 from seq_1_to_951; +insert into t1(es,er) select 0, 3 from seq_1_to_1054; +insert into t1(es,er) select 0, 6 from seq_1_to_25; +insert into t1(es,er) select 0, 11 from seq_1_to_1; +insert into t1(es,er) select 1, 1 from seq_1_to_45; +insert into t1(es,er) select 1, 2 from seq_1_to_16; +insert into t1(es,er) select 1, 3 from seq_1_to_511; +insert into t1(es,er) select 1, 4 from seq_1_to_687; +insert into t1(es,er) select 1, 6 from seq_1_to_50; +insert into t1(es,er) select 1, 7 from seq_1_to_4; +insert into t1(es,er) select 1, 11 from seq_1_to_1; +insert into t1(es,er) select 2, 1 from seq_1_to_82; +insert into t1(es,er) select 2, 2 from seq_1_to_82; +insert into t1(es,er) select 2, 3 from seq_1_to_1626; +insert into t1(es,er) select 2, 4 from seq_1_to_977; +insert into t1(es,er) select 2, 6 from seq_1_to_33; +insert into t1(es,er) select 2, 11 from seq_1_to_1; +insert into t1(es,er) select 3, 1 from seq_1_to_245; +insert into t1(es,er) select 3, 2 from seq_1_to_81; +insert into t1(es,er) select 3, 3 from seq_1_to_852; +insert into t1(es,er) select 3, 4 from seq_1_to_2243; +insert into t1(es,er) select 3, 6 from seq_1_to_44; +insert into t1(es,er) select 3, 11 from seq_1_to_1; +insert into t1(es,er) select 4, 1 from seq_1_to_91; +insert into t1(es,er) select 4, 2 from seq_1_to_83; +insert into t1(es,er) select 4, 3 from seq_1_to_297; +insert into t1(es,er) select 4, 4 from seq_1_to_2456; +insert into t1(es,er) select 4, 6 from seq_1_to_19; +insert into t1(es,er) select 4, 11 from seq_1_to_1; +update t1 set p='foobar'; +update t1 set x=0; +set @save_isp=@@innodb_stats_persistent; +set global innodb_stats_persistent= 1; +analyze table t1; +Table Op Msg_type Msg_text +test.t1 analyze status OK +set optimizer_switch='index_merge_sort_intersection=on'; +SELECT * FROM t1 +WHERE ((p = 'foo' AND er != 4) OR er = 4 ) AND (es >= 4) LIMIT 2; +id p es er x +14645 foobar 4 4 0 +14646 foobar 4 4 0 +EXPLAIN EXTENDED SELECT * FROM t1 +WHERE ((p = 'foo' AND er != 4) OR er = 4 ) AND (es >= 4) LIMIT 2; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t1 range es,er,p es 1 NULL # 100.00 Using index condition; Using where +Warnings: +Note 1003 select `test`.`t1`.`id` AS `id`,`test`.`t1`.`p` AS `p`,`test`.`t1`.`es` AS `es`,`test`.`t1`.`er` AS `er`,`test`.`t1`.`x` AS `x` from `test`.`t1` where (`test`.`t1`.`p` = 'foo' and `test`.`t1`.`er` <> 4 or `test`.`t1`.`er` = 4) and `test`.`t1`.`es` >= 4 limit 2 +set optimizer_switch='index_merge_sort_intersection=off'; +SELECT * FROM t1 +WHERE ((p = 'foo' AND er != 4) OR er = 4 ) AND (es >= 4) LIMIT 2; +id p es er x +14645 foobar 4 4 0 +14646 foobar 4 4 0 +EXPLAIN EXTENDED SELECT * FROM t1 +WHERE ((p = 'foo' AND er != 4) OR er = 4 ) AND (es >= 4) LIMIT 2; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t1 range es,er,p es 1 NULL # 100.00 Using index condition; Using where +Warnings: +Note 1003 select `test`.`t1`.`id` AS `id`,`test`.`t1`.`p` AS `p`,`test`.`t1`.`es` AS `es`,`test`.`t1`.`er` AS `er`,`test`.`t1`.`x` AS `x` from `test`.`t1` where (`test`.`t1`.`p` = 'foo' and `test`.`t1`.`er` <> 4 or `test`.`t1`.`er` = 4) and `test`.`t1`.`es` >= 4 limit 2 +set optimizer_switch='index_merge_sort_intersection=default'; +set global innodb_stats_persistent= @save_isp; +DROP TABLE t1; +# End of 10.2 tests diff --git a/mysql-test/t/range_innodb.test b/mysql-test/t/range_innodb.test index 428e5c2..7420e72 100644 --- a/mysql-test/t/range_innodb.test +++ b/mysql-test/t/range_innodb.test @@ -116,3 +116,81 @@ SET @@GLOBAL.debug_dbug = @saved_dbug; set @@optimizer_switch= @optimizer_switch_save;
--echo # End of 10.1 tests + +--echo # +--echo # MDEV-27262: Index intersection with full scan over an index +--echo # + +--source include/have_sequence.inc + +CREATE TABLE t1 ( + id int(10) unsigned NOT NULL AUTO_INCREMENT, + p char(32) DEFAULT NULL, + es tinyint(3) unsigned NOT NULL DEFAULT 0, + er tinyint(3) unsigned NOT NULL DEFAULT 0, + x mediumint(8) unsigned NOT NULL DEFAULT 0, + PRIMARY KEY (id), + INDEX es (es), + INDEX x (x), + INDEX er (er,x), + INDEX p (p) +) ENGINE=InnoDB DEFAULT CHARSET=latin1; + +insert into t1(es,er) select 0, 1 from seq_1_to_45; +insert into t1(es,er) select 0, 2 from seq_1_to_49; +insert into t1(es,er) select 0, 3 from seq_1_to_951; +insert into t1(es,er) select 0, 3 from seq_1_to_1054; +insert into t1(es,er) select 0, 6 from seq_1_to_25; +insert into t1(es,er) select 0, 11 from seq_1_to_1; +insert into t1(es,er) select 1, 1 from seq_1_to_45; +insert into t1(es,er) select 1, 2 from seq_1_to_16; +insert into t1(es,er) select 1, 3 from seq_1_to_511; +insert into t1(es,er) select 1, 4 from seq_1_to_687; +insert into t1(es,er) select 1, 6 from seq_1_to_50; +insert into t1(es,er) select 1, 7 from seq_1_to_4; +insert into t1(es,er) select 1, 11 from seq_1_to_1; +insert into t1(es,er) select 2, 1 from seq_1_to_82; +insert into t1(es,er) select 2, 2 from seq_1_to_82; +insert into t1(es,er) select 2, 3 from seq_1_to_1626; +insert into t1(es,er) select 2, 4 from seq_1_to_977; +insert into t1(es,er) select 2, 6 from seq_1_to_33; +insert into t1(es,er) select 2, 11 from seq_1_to_1; +insert into t1(es,er) select 3, 1 from seq_1_to_245; +insert into t1(es,er) select 3, 2 from seq_1_to_81; +insert into t1(es,er) select 3, 3 from seq_1_to_852; +insert into t1(es,er) select 3, 4 from seq_1_to_2243; +insert into t1(es,er) select 3, 6 from seq_1_to_44; +insert into t1(es,er) select 3, 11 from seq_1_to_1; +insert into t1(es,er) select 4, 1 from seq_1_to_91; +insert into t1(es,er) select 4, 2 from seq_1_to_83; +insert into t1(es,er) select 4, 3 from seq_1_to_297; +insert into t1(es,er) select 4, 4 from seq_1_to_2456; +insert into t1(es,er) select 4, 6 from seq_1_to_19; +insert into t1(es,er) select 4, 11 from seq_1_to_1; +update t1 set p='foobar'; +update t1 set x=0; +set @save_isp=@@innodb_stats_persistent; +set global innodb_stats_persistent= 1; +analyze table t1; + +let $q= +SELECT * FROM t1 + WHERE ((p = 'foo' AND er != 4) OR er = 4 ) AND (es >= 4) LIMIT 2; + +set optimizer_switch='index_merge_sort_intersection=on'; +eval $q; +--replace_column 9 # +eval EXPLAIN EXTENDED $q; + +set optimizer_switch='index_merge_sort_intersection=off'; +# execution of $q and explain for it led to an assertion failure in 10.4 +# (with the optimizer switch rowid_filter set to 'on') +eval $q; +--replace_column 9 # +eval EXPLAIN EXTENDED $q; +set optimizer_switch='index_merge_sort_intersection=default'; + +set global innodb_stats_persistent= @save_isp; +DROP TABLE t1; + +--echo # End of 10.2 tests diff --git a/sql/opt_range.cc b/sql/opt_range.cc index f3f1843..2e05b88 100644 --- a/sql/opt_range.cc +++ b/sql/opt_range.cc @@ -9413,6 +9413,13 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2) key2->copy_min(tmp); if (!(key1=key1->tree_delete(tmp))) { // Only one key in tree + if (key2->min_flag & NO_MIN_RANGE && + key2->max_flag & NO_MAX_RANGE) + { + if (key2->maybe_flag) + return new SEL_ARG(SEL_ARG::MAYBE_KEY); + return 0; // Always true OR + } key1=key2; key1->make_root(); key2=key2_next;
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
-- BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net