Re: [Maria-developers] 5e16a49f5d9: MDEV-26013 distinct not work properly in some cases for spider tables
Hi, Nayuta! On Jul 21, Nayuta Yanagisawa wrote:
revision-id: 5e16a49f5d9 (mariadb-10.4.20-32-g5e16a49f5d9) parent(s): 78735dcaf75 author: Nayuta Yanagisawa committer: Nayuta Yanagisawa timestamp: 2021-07-21 09:58:34 +0000 message:
MDEV-26013 distinct not work properly in some cases for spider tables
diff --git a/sql/sql_select.cc b/sql/sql_select.cc index d8116b20cbe..cfd04d93662 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -3274,9 +3274,17 @@ bool JOIN::make_aggr_tables_info()
if (ht && ht->create_group_by) { - /* Check if the storage engine can intercept the query */ - Query query= {&all_fields, select_distinct, tables_list, conds, - group_list, order ? order : group_list, having}; + /* + Check if the storage engine can intercept the query. + + JOIN::optimize_stage2() might convert DISTINCT into GROUP BY and then + optimize away GROUP BY. In such a case, we need to notify a storage engine + supporting a group by handler of the existence of the original DISTINCT. + Thus, we set select_distinct || (no_order && group_optimized_away) to + Query::distinct. + */ + Query query= {&all_fields, select_distinct || (no_order && group_optimized_away), + tables_list, conds, group_list, order ? order : group_list, having};
If DISTINCT was coverted to a GROUP BY, why would the engine need to know whether there was DISTINCT or not originally? There is no DISTINCT on the execution plan now, that should be sufficient, shouldn't it? Why does the query fail with GROUP BY?
group_by_handler *gbh= ht->create_group_by(thd, &query);
if (gbh)
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Sergei, Thank you for your review!
If DISTINCT was coverted to a GROUP BY, why would the engine need to know whether there was DISTINCT or not originally? There is no DISTINCT on the execution plan now, that should be sufficient, shouldn't it? Why does the query fail with GROUP BY?
For the select query in the test case (SELECT distinct b FROM tbl_a WHERE b=999), the optimizer seems to convert DISTINCT to GROUP by and then optimize away GROUP BY. The, we get select_distinct = 0, no_order = 1, group_optimized_away = 1. Please see sql/sql_select.cc:2721-2781. In such a case, group_list is NULL and thus the Spider SE misunderstand that the query has neither DISTINCT and GROUP BY without my fix. Optimizing group_list away itself might be a bug but there seems no problem with other storage engines. Regards, Nayuta On July 22, 2021, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nayuta!
revision-id: 5e16a49f5d9 (mariadb-10.4.20-32-g5e16a49f5d9) parent(s): 78735dcaf75 author: Nayuta Yanagisawa committer: Nayuta Yanagisawa timestamp: 2021-07-21 09:58:34 +0000 message: MDEV-26013 distinct not work properly in some cases for spider
diff --git a/sql/sql_select.cc b/sql/sql_select.cc index d8116b20cbe..cfd04d93662 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -3274,9 +3274,17 @@ bool JOIN::make_aggr_tables_info() if (ht && ht->create_group_by) { - /* Check if the storage engine can intercept the query */ - Query query= {&all_fields, select_distinct, tables_list, conds, - group_list, order ? order : group_list, having}; + /* + Check if the storage engine can intercept the query. + + JOIN::optimize_stage2() might convert DISTINCT into GROUP BY and
On Jul 21, Nayuta Yanagisawa wrote: tables then
+ optimize away GROUP BY. In such a case, we need to notify a storage engine + supporting a group by handler of the existence of the original DISTINCT. + Thus, we set select_distinct || (no_order && group_optimized_away) to + Query::distinct. + */ + Query query= {&all_fields, select_distinct || (no_order && group_optimized_away), + tables_list, conds, group_list, order ? order : group_list, having};
If DISTINCT was coverted to a GROUP BY, why would the engine need to know whether there was DISTINCT or not originally? There is no DISTINCT on the execution plan now, that should be sufficient, shouldn't it?
Why does the query fail with GROUP BY?
group_by_handler *gbh= ht->create_group_by(thd, &query); if (gbh)
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi, Nayuta! On Jul 21, Nayuta Yanagisawa wrote:
Hi Sergei,
Thank you for your review!
If DISTINCT was coverted to a GROUP BY, why would the engine need to know whether there was DISTINCT or not originally? There is no DISTINCT on the execution plan now, that should be sufficient, shouldn't it? Why does the query fail with GROUP BY?
For the select query in the test case (SELECT distinct b FROM tbl_a WHERE b=999), the optimizer seems to convert DISTINCT to GROUP by and then optimize away GROUP BY. The, we get select_distinct = 0, no_order = 1, group_optimized_away = 1. Please see sql/sql_select.cc:2721-2781.
In such a case, group_list is NULL and thus the Spider SE misunderstand that the query has neither DISTINCT and GROUP BY without my fix.
Oh, I see. group_list is NULL, I missed that. It's select distinct b from tbl_a where b=999; see, it asks for distinct values of `b` for b=999. There can be only one row in this query. Or none. So the optimizer changes DISTINCT to an implicit GROUP BY, similar to `select count(*) from tbl_a` - such an implicit GROUP BY can return at most one row. I agree, it is wrong that implicit GROUP BY has no place in the Query structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY? If it does, it likely deducts the need for GROUP BY when seeing COUNT() or SUM() in the select list. Which doesn't work for converted DISTINCT. In that case, I think your fix is good. One question: why `no_order &&` ? Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Sergei!
I agree, it is wrong that implicit GROUP BY has no place in the Query structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
If it does, it likely deducts the need for GROUP BY when seeing COUNT() or SUM() in the select list. Which doesn't work for converted DISTINCT.
The Spider works correctly for COUNT and SUM. For example, the optimizer keeps group_list not NULL for the following query. So, it works. SELECT COUNT(*) FROM spider_tab WHERE col2=999;
In that case, I think your fix is good. One question: why `no_order &&` ?
In fact, 'no_order &&' seems to be redundant. Thank you! `select_distinct || group_optimized_away` would be enough. I removed the redundant condition from the patch and push to bb-10.4- mdev-26013. https://github.com/MariaDB/server/commit/ee59187906d87a71106080d1c5728f32cdf... Please take another look at it. Best regards, Nayuta On July 22, 2021, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nayuta!
On Jul 21, Nayuta Yanagisawa wrote:
If DISTINCT was coverted to a GROUP BY, why would the engine need to know whether there was DISTINCT or not originally? There is no DISTINCT on the execution plan now, that should be sufficient, shouldn't it? Why does the query fail with GROUP BY? For the select query in the test case (SELECT distinct b FROM tbl_a WHERE b=999),
Hi Sergei, Thank you for your review! the optimizer seems to convert DISTINCT to GROUP by and then optimize away GROUP BY. The, we get select_distinct = 0, no_order = 1, group_optimized_away = 1. Please see sql/sql_select.cc:2721-2781. In such a case, group_list is NULL and thus the Spider SE misunderstand that the query has neither DISTINCT and GROUP BY without my fix.
Oh, I see. group_list is NULL, I missed that. It's
select distinct b from tbl_a where b=999;
see, it asks for distinct values of `b` for b=999. There can be only one row in this query. Or none. So the optimizer changes DISTINCT to an implicit GROUP BY, similar to `select count(*) from tbl_a` - such an implicit GROUP BY can return at most one row.
I agree, it is wrong that implicit GROUP BY has no place in the Query structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
If it does, it likely deducts the need for GROUP BY when seeing COUNT() or SUM() in the select list. Which doesn't work for converted DISTINCT.
In that case, I think your fix is good. One question: why `no_order &&` ?
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi, Nayuta! Ok to push. Thanks! On Jul 22, Nayuta Yanagisawa wrote:
Hi Sergei!
I agree, it is wrong that implicit GROUP BY has no place in the Query structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
If it does, it likely deducts the need for GROUP BY when seeing COUNT() or SUM() in the select list. Which doesn't work for converted DISTINCT.
The Spider works correctly for COUNT and SUM. For example, the optimizer keeps group_list not NULL for the following query. So, it works.
SELECT COUNT(*) FROM spider_tab WHERE col2=999;
In that case, I think your fix is good. One question: why `no_order &&` ?
In fact, 'no_order &&' seems to be redundant. Thank you! `select_distinct || group_optimized_away` would be enough.
I removed the redundant condition from the patch and push to bb-10.4- mdev-26013. https://github.com/MariaDB/server/commit/ee59187906d87a71106080d1c5728f32cdf...
Please take another look at it.
Best regards, Nayuta
On July 22, 2021, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nayuta!
On Jul 21, Nayuta Yanagisawa wrote:
If DISTINCT was coverted to a GROUP BY, why would the engine need to know whether there was DISTINCT or not originally? There is no DISTINCT on the execution plan now, that should be sufficient, shouldn't it? Why does the query fail with GROUP BY? For the select query in the test case (SELECT distinct b FROM tbl_a WHERE b=999),
Hi Sergei, Thank you for your review! the optimizer seems to convert DISTINCT to GROUP by and then optimize away GROUP BY. The, we get select_distinct = 0, no_order = 1, group_optimized_away = 1. Please see sql/sql_select.cc:2721-2781. In such a case, group_list is NULL and thus the Spider SE misunderstand that the query has neither DISTINCT and GROUP BY without my fix.
Oh, I see. group_list is NULL, I missed that. It's
select distinct b from tbl_a where b=999;
see, it asks for distinct values of `b` for b=999. There can be only one row in this query. Or none. So the optimizer changes DISTINCT to an implicit GROUP BY, similar to `select count(*) from tbl_a` - such an implicit GROUP BY can return at most one row.
I agree, it is wrong that implicit GROUP BY has no place in the Query structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
If it does, it likely deducts the need for GROUP BY when seeing COUNT() or SUM() in the select list. Which doesn't work for converted DISTINCT.
In that case, I think your fix is good. One question: why `no_order &&` ?
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
participants (2)
-
Nayuta Yanagisawa
-
Sergei Golubchik