[Commits] 56d339111a4: MDEV-16374: Filtered shows 0 for materilization scan for a semi join, which makes optimizer always picks
revision-id: 56d339111a4e5c868ffa3b1f2538c5f101f67767 (mariadb-10.0.30-366-g56d339111a4) parent(s): a61724a3ca187f6eb44fc67948acb76a94efa783 author: Varun Gupta committer: Varun Gupta timestamp: 2018-06-04 12:26:56 +0530 message: MDEV-16374: Filtered shows 0 for materilization scan for a semi join, which makes optimizer always picks materialization scan over materialization lookup For non-mergeable semi-joins we don't store the estimates of the IN subquery in table->file->stats.records. In the function TABLE_LIST::fetch_number_of_rows, we store the number of rows in the tables (estimates in case of derived table/views). Currently we don't store the estimates for non-mergeable semi-joins, which leads to a problem of selecting materialization scan over materialization lookup. Fixed this by storing these estimated appropriately --- mysql-test/r/selectivity.result | 70 ++++++++++++++++++++++++++++++++++++++--- mysql-test/t/selectivity.test | 18 +++++++++++ sql/table.cc | 8 +++++ 3 files changed, 92 insertions(+), 4 deletions(-) diff --git a/mysql-test/r/selectivity.result b/mysql-test/r/selectivity.result index 61a77d135e7..3e8fb8e2e41 100644 --- a/mysql-test/r/selectivity.result +++ b/mysql-test/r/selectivity.result @@ -356,13 +356,13 @@ and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate; id select_type table type possible_keys key key_len ref rows filtered Extra -1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 6005 0.00 Using temporary; Using filesort -1 PRIMARY orders eq_ref PRIMARY,i_o_custkey PRIMARY 4 <subquery2>.l_orderkey 1 100.00 Using where +1 PRIMARY orders ALL PRIMARY,i_o_custkey NULL NULL NULL 1500 100.00 Using where; Using temporary; Using filesort +1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 dbt3_s001.orders.o_orderkey 1 100.00 1 PRIMARY customer eq_ref PRIMARY PRIMARY 4 dbt3_s001.orders.o_custkey 1 100.00 -1 PRIMARY lineitem ref PRIMARY,i_l_orderkey,i_l_orderkey_quantity PRIMARY 4 <subquery2>.l_orderkey 4 100.00 +1 PRIMARY lineitem ref PRIMARY,i_l_orderkey,i_l_orderkey_quantity i_l_orderkey_quantity 4 dbt3_s001.orders.o_orderkey 4 100.00 Using index 2 MATERIALIZED lineitem index NULL i_l_orderkey_quantity 13 NULL 6005 100.00 Using index Warnings: -Note 1003 select `dbt3_s001`.`customer`.`c_name` AS `c_name`,`dbt3_s001`.`customer`.`c_custkey` AS `c_custkey`,`dbt3_s001`.`orders`.`o_orderkey` AS `o_orderkey`,`dbt3_s001`.`orders`.`o_orderDATE` AS `o_orderdate`,`dbt3_s001`.`orders`.`o_totalprice` AS `o_totalprice`,sum(`dbt3_s001`.`lineitem`.`l_quantity`) AS `sum(l_quantity)` from <materialize> (select `dbt3_s001`.`lineitem`.`l_orderkey` from `dbt3_s001`.`lineitem` group by `dbt3_s001`.`lineitem`.`l_orderkey` having (sum(`dbt3_s001`.`lineitem`.`l_quantity`) > 250)) join `dbt3_s001`.`customer` join `dbt3_s001`.`orders` join `dbt3_s001`.`lineitem` where ((`dbt3_s001`.`customer`.`c_custkey` = `dbt3_s001`.`orders`.`o_custkey`) and (`dbt3_s001`.`orders`.`o_orderkey` = `<subquery2>`.`l_orderkey`) and (`dbt3_s001`.`lineitem`.`l_orderkey` = `<subquery2>`.`l_orderkey`)) group by `dbt3_s001`.`customer`.`c_name`,`dbt3_s001`.`customer`.`c_custkey`,`dbt3_s001`.`orders`.`o_orderkey`,`dbt3_s001`.`orders`.`o_orderDATE`,`dbt3_s001`.`orders `.`o_tot alprice` order by `dbt3_s001`.`orders`.`o_totalprice` desc,`dbt3_s001`.`orders`.`o_orderDATE` +Note 1003 select `dbt3_s001`.`customer`.`c_name` AS `c_name`,`dbt3_s001`.`customer`.`c_custkey` AS `c_custkey`,`dbt3_s001`.`orders`.`o_orderkey` AS `o_orderkey`,`dbt3_s001`.`orders`.`o_orderDATE` AS `o_orderdate`,`dbt3_s001`.`orders`.`o_totalprice` AS `o_totalprice`,sum(`dbt3_s001`.`lineitem`.`l_quantity`) AS `sum(l_quantity)` from <materialize> (select `dbt3_s001`.`lineitem`.`l_orderkey` from `dbt3_s001`.`lineitem` group by `dbt3_s001`.`lineitem`.`l_orderkey` having (sum(`dbt3_s001`.`lineitem`.`l_quantity`) > 250)) join `dbt3_s001`.`customer` join `dbt3_s001`.`orders` join `dbt3_s001`.`lineitem` where ((`dbt3_s001`.`customer`.`c_custkey` = `dbt3_s001`.`orders`.`o_custkey`) and (`<subquery2>`.`l_orderkey` = `dbt3_s001`.`orders`.`o_orderkey`) and (`dbt3_s001`.`lineitem`.`l_orderkey` = `dbt3_s001`.`orders`.`o_orderkey`)) group by `dbt3_s001`.`customer`.`c_name`,`dbt3_s001`.`customer`.`c_custkey`,`dbt3_s001`.`orders`.`o_orderkey`,`dbt3_s001`.`orders`.`o_orderDATE`,`dbt3_s001`. `orders` .`o_totalprice` order by `dbt3_s001`.`orders`.`o_totalprice` desc,`dbt3_s001`.`orders`.`o_orderDATE` select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem @@ -1535,6 +1535,68 @@ t 10:00:00 11:00:00 DROP TABLE t1; +# +# MDEV-16374: filtered shows 0 for materilization scan for a semi join, which makes optimizer +# always pick materialization scan over materialization lookup +# +create table t0(a int); +insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); +create table t1 (a int, b int); +insert into t1 values (0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10), +(11,11),(12,12),(13,13),(14,14),(15,15); +set @@optimizer_use_condition_selectivity=2; +explain extended select * from t1 where a in (select max(a) from t1 group by b); +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY t1 ALL NULL NULL NULL NULL 16 100.00 Using where +1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 test.t1.a 1 100.00 +2 MATERIALIZED t1 ALL NULL NULL NULL NULL 16 100.00 Using temporary +Warnings: +Note 1003 select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from <materialize> (select max(`test`.`t1`.`a`) from `test`.`t1` group by `test`.`t1`.`b`) join `test`.`t1` where (`<subquery2>`.`max(a)` = `test`.`t1`.`a`) +select * from t1 where a in (select max(a) from t1 group by b); +a b +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +set @@optimizer_use_condition_selectivity=1; +explain extended select * from t1 where a in (select max(a) from t1 group by b); +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY t1 ALL NULL NULL NULL NULL 16 100.00 Using where +1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 test.t1.a 1 100.00 +2 MATERIALIZED t1 ALL NULL NULL NULL NULL 16 100.00 Using temporary +Warnings: +Note 1003 select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from <materialize> (select max(`test`.`t1`.`a`) from `test`.`t1` group by `test`.`t1`.`b`) join `test`.`t1` where (`<subquery2>`.`max(a)` = `test`.`t1`.`a`) +select * from t1 where a in (select max(a) from t1 group by b); +a b +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +drop table t1,t0; set histogram_size=@save_histogram_size; set optimizer_use_condition_selectivity=@save_optimizer_use_condition_selectivity; set use_stat_tables=@save_use_stat_tables; diff --git a/mysql-test/t/selectivity.test b/mysql-test/t/selectivity.test index 548ef295fb2..afaa937c360 100644 --- a/mysql-test/t/selectivity.test +++ b/mysql-test/t/selectivity.test @@ -1043,6 +1043,24 @@ SELECT * FROM (SELECT t FROM t1 WHERE d IS NULL) sq; DROP TABLE t1; +--echo # +--echo # MDEV-16374: filtered shows 0 for materilization scan for a semi join, which makes optimizer +--echo # always pick materialization scan over materialization lookup +--echo # + +create table t0(a int); +insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); +create table t1 (a int, b int); +insert into t1 values (0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10), +(11,11),(12,12),(13,13),(14,14),(15,15); +set @@optimizer_use_condition_selectivity=2; +explain extended select * from t1 where a in (select max(a) from t1 group by b); +select * from t1 where a in (select max(a) from t1 group by b); +set @@optimizer_use_condition_selectivity=1; +explain extended select * from t1 where a in (select max(a) from t1 group by b); +select * from t1 where a in (select max(a) from t1 group by b); +drop table t1,t0; + set histogram_size=@save_histogram_size; set optimizer_use_condition_selectivity=@save_optimizer_use_condition_selectivity; set use_stat_tables=@save_use_stat_tables; diff --git a/sql/table.cc b/sql/table.cc index bc6e1e754ee..b5082df7076 100644 --- a/sql/table.cc +++ b/sql/table.cc @@ -7099,7 +7099,15 @@ int TABLE_LIST::fetch_number_of_rows() { int error= 0; if (jtbm_subselect) + { + if (jtbm_subselect->is_jtbm_merged) + { + table->file->stats.records= jtbm_subselect->jtbm_record_count; + set_if_bigger(table->file->stats.records, 2); + table->used_stat_records= table->file->stats.records; + } return 0; + } if (is_materialized_derived() && !fill_me) {
Hi Varun, On Mon, Jun 04, 2018 at 12:34:49PM +0530, Varun wrote:
revision-id: 56d339111a4e5c868ffa3b1f2538c5f101f67767 (mariadb-10.0.30-366-g56d339111a4) parent(s): a61724a3ca187f6eb44fc67948acb76a94efa783 author: Varun Gupta committer: Varun Gupta timestamp: 2018-06-04 12:26:56 +0530 message:
MDEV-16374: Filtered shows 0 for materilization scan for a semi join, which makes optimizer always picks materialization scan over materialization lookup
For non-mergeable semi-joins we don't store the estimates of the IN subquery in table->file->stats.records. In the function TABLE_LIST::fetch_number_of_rows, we store the number of rows in the tables (estimates in case of derived table/views). Currently we don't store the estimates for non-mergeable semi-joins, which leads to a problem of selecting materialization scan over materialization lookup. Fixed this by storing these estimated appropriately
* This is a change in the optimizer. Should we really do it in 10.0? In my opinion, this should go into 10.3 or 10.4. Let's discuss this, and I think Igor may have something to say, too. * The patch updates selectivity.test but not selectivity_innodb.result (which includes selectivity.test). * Please see my last comments at https://jira.mariadb.org/browse/MDEV-16374 . Here is an alternative patch (it passes the testsuite, and produces the same result for selectivity* tes): diff --git a/sql/sql_select.cc b/sql/sql_select.cc index 1285835..884997c 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -6602,7 +6602,7 @@ double matching_candidates_in_table(JOIN_TAB *s, bool with_found_constraint, { TABLE *table= s->table; double sel= table->cond_selectivity; - double table_records= (double)table->stat_records(); + double table_records= s->records; // psergey-10 dbl_records= table_records * sel; return dbl_records; } What do you think about it? I would like to get Igor's opinion also because I'm not certain which member variable should be changed here. (Why does derived table modify table->stats.records while JTBM semi-joins modify JOIN_TAB::records? These seem like two solutions for the same problem?) BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
participants (2)
-
Sergey Petrunia
-
varunraiko1803@gmail.com