Timour,

It's not clear to me:

1. Why GROUP BY queries with GROUP BY col1, col2, where col1, col2 are defined over the same table, are considered possibly good for the loose scan optimization.

2. How it comes about that a unique index is considered as a candidate for a loose scan (see your test case).

(The second issue, of course, is unrelated to bug MDEV-641.)

Besides, didn't you notice that you remove duplicates in GROUP BY under any circumstances? How will that affect the following query:

  select a, sum(c) from t1 group by a, a with rollup;

In general, removing duplicates from GROUP BY and ORDER BY lists is a good optimization, but why should it be done in connection with this bug?

Regards,
Igor.

On 01/24/2013 06:38 AM, Timour Katchaounov wrote:
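[To make the ROLLUP concern above concrete, here is a minimal sketch. The table t1 and its data are made up for illustration, and the row counts assume the usual ROLLUP behaviour of emitting one super-aggregate row per prefix of the group list.]

  -- Hypothetical table and data, for illustration only.
  CREATE TABLE t1 (a INT, c INT);
  INSERT INTO t1 VALUES (1, 10), (1, 20), (2, 30);

  -- With the duplicate kept, ROLLUP rolls up each prefix of (a, a): besides
  -- the per-a group rows and the grand total, it emits one extra
  -- super-aggregate row per a group (5 rows for this data set).
  SELECT a, SUM(c) FROM t1 GROUP BY a, a WITH ROLLUP;

  -- With the duplicate removed, those extra per-group rows are gone
  -- (3 rows: one per distinct a plus the grand total).
  SELECT a, SUM(c) FROM t1 GROUP BY a WITH ROLLUP;

[So removing the duplicate unconditionally changes the result set of such a query.]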
Igor,
Could you please review the following patch (we discussed it at the team call):
------------------------------------------------------------
revno: 3616
revision-id: timour@askmonty.org-20130124143526-bpaiyushble1ieek
parent: timour@askmonty.org-20130117140805-4kyoq7azx4v2irhq
fixes bug: https://mariadb.atlassian.net/browse/MDEV-614
committer: timour@askmonty.org
branch nick: 5.3-md641
timestamp: Thu 2013-01-24 16:35:26 +0200
message:
  Fix for bug MDEV-641 LP:1002108
  Wrong result (or crash) from a query with duplicated field in the group list and a limit clause
  Bug#11761078: 53534: INCORRECT 'SELECT SQL_BIG_RESULT...' WITH GROUP BY ON DUPLICATED FIELDS
Analysis:
Range analysis discovers that the query can be executed via loose index scan for GROUP BY. Later, GROUP BY analysis fails to confirm that the GROUP operation can be computed via an index, because there is no logic to handle duplicate field references in the GROUP clause. As a result the optimizer produces an inconsistent plan: it constructs a temporary table, but the group fields are not set to point to it.
Solution:
Make ORDER BY analysis consistent with the loose scan analysis for GROUP BY. Instead of changing the logic of how indexes are analyzed for ORDER BY, the patch takes the same approach as MySQL: normalize the query by removing duplicate column references. The difference from MySQL is that we remove duplicates earlier, by reusing remove_const(), which already removes unneeded ORDER elements.
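[For reference, a minimal sketch of the class of query the fix targets. The table, data, and index below are made up; this is not the actual test case from the patch.]

  -- Hypothetical schema; the composite index on (a, b) is what makes loose
  -- index scan for GROUP BY a candidate during range analysis.
  CREATE TABLE t1 (a INT, b INT, c INT, KEY (a, b));
  INSERT INTO t1 VALUES (1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4);

  -- A duplicated column in the group list plus a LIMIT clause: range analysis
  -- may pick loose index scan for GROUP BY, while the later GROUP BY analysis
  -- has no logic for the duplicate reference, so the plan ends up inconsistent
  -- (wrong result or crash before the fix; with the fix, the duplicate is
  -- removed by remove_const() before the plan is finalized).
  SELECT a, MIN(b) FROM t1 GROUP BY a, a LIMIT 2;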