[Maria-developers] Order by speed optimization
I was talking with a few people about a query such as select * from table order by X, Y, Z limit G;
As the "ORDER BY" for me could change to any number of 12 values in any order I am not able to use a multicolumn index as such I was looking at how to optimize in such case where sorting is dynamic. On my dataset the query takes ~7 seconds for a full table scan but only .3 seconds when using the column index and returns ~80,000 records for the query (before the LIMIT) I was wondering if there is any reason that the order by syntax doesn't use the following logic, I was thinking of looking at the code but was told such a simple optimization must have a reason that its not already in the code, so I am asking here before I spend hours being stupid. This logic block will be based on the above query. 1. If index exists for column X and there is a limit on the query then use the index and sort the list, then return the first G values, continue past G until you hit a different value for X then the value that was at G 2. Use this smaller subset of data for then ordering the data based on Y 3. Order the data again based on Z 4. Return the sorted data like normal If there is a reason this optimization isn't in the code please someone let me know, if not I think I will take a look at adding it. -Botanic
On 01/11/2013 09:33 PM, Matthew Lagoe wrote:
I was talking with a few people about a query such as
|select * from table order by X, Y, Z limit G;|
As the "ORDER BY" for me could change to any number of 12 values in any order I am not able to use a multicolumn index as such I was looking at how to optimize in such case where sorting is dynamic.
On my dataset the query takes ~7 seconds for a full table scan but only .3 seconds when using the column index and returns ~80,000 records for the query (before the LIMIT)
I was wondering if there is any reason that the order by syntax doesn't use the following logic, I was thinking of looking at the code but was told such a simple optimization must have a reason that its not already in the code, so I am asking here before I spend hours being stupid.
This logic block will be based on the above query.
1. If index exists for column X and there is a limit on the query then use the index and sort the list, then return the first G values, continue past G until you hit a different value for X then the value that was at G 2. Use this smaller subset of data for then ordering the data based on Y 3. Order the data again based on Z 4. Return the sorted data like normal
If there is a reason this optimization isn't in the code please someone let me know, if not I think I will take a look at adding it.
Mathew, It would be nice if you could support such an optimization for ORDER BY <columns> LIMIT BY N when only a prefix of the <columns> list is compatible with some index or with its major components. Regards, Igor.
-Botanic
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
On 01/11/2013 09:33 PM, Matthew Lagoe wrote:
I was talking with a few people about a query such as
|select * from table order by X, Y, Z limit G;|
As the "ORDER BY" for me could change to any number of 12 values in any order I am not able to use a multicolumn index as such I was looking at how to optimize in such case where sorting is dynamic.
On my dataset the query takes ~7 seconds for a full table scan but only .3 seconds when using the column index and returns ~80,000 records for the query (before the LIMIT)
I was wondering if there is any reason that the order by syntax doesn't use the following logic, I was thinking of looking at the code but was told such a simple optimization must have a reason that its not already in the code, so I am asking here before I spend hours being stupid.
This logic block will be based on the above query.
1. If index exists for column X and there is a limit on the query then use the index and sort the list, then return the first G values, continue past G until you hit a different value for X then the value that was at G 2. Use this smaller subset of data for then ordering the data based on Y 3. Order the data again based on Z 4. Return the sorted data like normal
If there is a reason this optimization isn't in the code please someone let me know, if not I think I will take a look at adding it.
-Botanic
Hi Matthew, I gave more thoughts to your optimization for ORDER BY with LIMIT. I find it very good for the following reasons. MariaDB 10.0 (and MySQL 5.6) support an optimization for ORDER BY with LIMIT that employes a priority queue. This optimization requires a full scan of the table. Your optimization does not require this. Suppose you have a query with ORDER BY a,b,x,y,z LIMIT N and an index on (a,b,c). In this case you could perform an index scan putting the records you read into the priority queue as the regular algorithm for ORDER BY with LIMIT does. Yet you don't have to scan all records. You can stop when you reach the keys with the values of (a,b) greater than all those stored in the priority queue (remember that our priority queue contains N elements). I believe that the change for the regular algorithm to support your optimization will not be difficult for you. When you implement such a modification you already will be able to see how big benefits you have for your application with this optimization. For this purpose it would be enough for you to force using a proper index for the ORDER BY clause. An automatic choice of a proper index will be a more complex part of the development. Hopefully I will be helpful here. Regards, Igor.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
participants (2)
-
Igor Babaev
-
Matthew Lagoe