[Maria-discuss] Dynamically rewrite query
Hello list, I'm administering a MariaDB (10.0.23) backend used by some proprietary software. I'm having trouble with one specific slow query. The tables look like this (simplified for clarity): CREATE TABLE `t1` ( `i` int(11) NOT NULL, `j` int(11) NOT NULL, `l` int(11) NOT NULL, `m` int(11) DEFAULT NULL, PRIMARY KEY (`i`,`j`,`l`) ); CREATE TABLE `t2` ( `i` int(11) NOT NULL, `k` int(11) DEFAULT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; The problematic query is this: SELECT DISTINCT t2.i, t2.k, t1.j, t1.l, t1.m FROM t2 LEFT JOIN t1 ON t2.i=t1.i WHERE t2.k < 123; The query is slow because the DISTINCT clause forces the query to use a temporary table. Removing DISTINCT makes the query 100x faster. Now, since (t1.i, t1.j, t1.l, t1.m) are guaranteed to be unique (and also (t2.i, t2.k) on their own, if the left join doesn't match), I cannot see how "SELECT ..." and "SELECT DISTINCT ..." can possibly produce different output. So I mention that to the developers of said software, and they reply that they forgot why they added the DISTINCT but that they decided it's too risky to remove it (thereby admitting that they don't comment their code, and don't write unit tests for bugs they find, but I knew that already). I'm aware of "Garbage in, garbage out", but it could also be argued that MariaDB should be smart enough to detect that the DISTINCT is unnecessary and ignore it (should I create a bug report for that?). In the meantime, is there any way I can rewrite queries like this dynamically to remove the DISTINCT clause? MySQL 5.7 documentation mentions query rewrite plugins, but I don't see that mentioned in MariaDB documentation. Any other suggestions are welcome. Thanks, Erik
You can use MaxScale : https://mariadb.com/kb/en/mariadb-enterprise/mariadb-maxscale/maxscale-regex... 2016-04-20 16:09 GMT+02:00 Erik Cederstrand <erik@cederstrand.dk>:
Hello list,
I'm administering a MariaDB (10.0.23) backend used by some proprietary software. I'm having trouble with one specific slow query. The tables look like this (simplified for clarity):
CREATE TABLE `t1` ( `i` int(11) NOT NULL, `j` int(11) NOT NULL, `l` int(11) NOT NULL, `m` int(11) DEFAULT NULL, PRIMARY KEY (`i`,`j`,`l`) );
CREATE TABLE `t2` ( `i` int(11) NOT NULL, `k` int(11) DEFAULT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The problematic query is this: SELECT DISTINCT t2.i, t2.k, t1.j, t1.l, t1.m FROM t2 LEFT JOIN t1 ON t2.i=t1.i WHERE t2.k < 123;
The query is slow because the DISTINCT clause forces the query to use a temporary table. Removing DISTINCT makes the query 100x faster.
Now, since (t1.i, t1.j, t1.l, t1.m) are guaranteed to be unique (and also (t2.i, t2.k) on their own, if the left join doesn't match), I cannot see how "SELECT ..." and "SELECT DISTINCT ..." can possibly produce different output. So I mention that to the developers of said software, and they reply that they forgot why they added the DISTINCT but that they decided it's too risky to remove it (thereby admitting that they don't comment their code, and don't write unit tests for bugs they find, but I knew that already).
I'm aware of "Garbage in, garbage out", but it could also be argued that MariaDB should be smart enough to detect that the DISTINCT is unnecessary and ignore it (should I create a bug report for that?).
In the meantime, is there any way I can rewrite queries like this dynamically to remove the DISTINCT clause? MySQL 5.7 documentation mentions query rewrite plugins, but I don't see that mentioned in MariaDB documentation. Any other suggestions are welcome.
Thanks, Erik
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Den 20. apr. 2016 kl. 17.07 skrev Guillaume Lefranc <guillaume@adishatz.net>:
You can use MaxScale : https://mariadb.com/kb/en/mariadb-enterprise/mariadb-maxscale/maxscale-regex...
Ok, thanks. I'll have a look at it. Erik
On 20 Apr 2016, at 22:09, Erik Cederstrand <erik@cederstrand.dk> wrote:
In the meantime, is there any way I can rewrite queries like this dynamically to remove the DISTINCT clause? MySQL 5.7 documentation mentions query rewrite plugins, but I don't see that mentioned in MariaDB documentation. Any other suggestions are welcome.
Query rewriting is not available in MariaDB Server up to 10.1 yet, and the current solution for this would be to use MariaDB MaxScale, which will sit between your app and the server. If you feel that query rewriting should also be in the server, I suggest you start watching this issue and even voting for it: https://jira.mariadb.org/browse/MDEV-5561 -- Colin Charles, http://bytebot.net/blog/ twitter: @bytebot | skype: colincharles "First they ignore you, then they laugh at you, then they fight you, then you win." -- Mohandas Gandhi
Hi, Erik! On Apr 20, Erik Cederstrand wrote:
CREATE TABLE `t1` ( `i` int(11) NOT NULL, `j` int(11) NOT NULL, `l` int(11) NOT NULL, `m` int(11) DEFAULT NULL, PRIMARY KEY (`i`,`j`,`l`));
CREATE TABLE `t2` ( `i` int(11) NOT NULL, `k` int(11) DEFAULT NULL, PRIMARY KEY (`i`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The problematic query is this: SELECT DISTINCT t2.i, t2.k, t1.j, t1.l, t1.m FROM t2 LEFT JOIN t1 ON t2.i=t1.i WHERE t2.k < 123;
The query is slow because the DISTINCT clause forces the query to use a temporary table. Removing DISTINCT makes the query 100x faster. ... it could also be argued that MariaDB should be smart enough to detect that the DISTINCT is unnecessary and ignore it (should I create a bug report for that?).
Yes, please, do report it! On the first glance this looks like a rather simple optimization. Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Den 20. apr. 2016 kl. 20.15 skrev Sergei Golubchik <serg@mariadb.org>:
On Apr 20, Erik Cederstrand wrote:
CREATE TABLE `t1` ( `i` int(11) NOT NULL, `j` int(11) NOT NULL, `l` int(11) NOT NULL, `m` int(11) DEFAULT NULL, PRIMARY KEY (`i`,`j`,`l`));
CREATE TABLE `t2` ( `i` int(11) NOT NULL, `k` int(11) DEFAULT NULL, PRIMARY KEY (`i`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The problematic query is this: SELECT DISTINCT t2.i, t2.k, t1.j, t1.l, t1.m FROM t2 LEFT JOIN t1 ON t2.i=t1.i WHERE t2.k < 123;
The query is slow because the DISTINCT clause forces the query to use a temporary table. Removing DISTINCT makes the query 100x faster. ... it could also be argued that MariaDB should be smart enough to detect that the DISTINCT is unnecessary and ignore it (should I create a bug report for that?).
Yes, please, do report it! On the first glance this looks like a rather simple optimization.
Thanks. Reported as https://jira.mariadb.org/browse/MDEV-9964 Erik
participants (4)
-
Colin Charles
-
Erik Cederstrand
-
Guillaume Lefranc
-
Sergei Golubchik