[Maria-developers] LIMIT optimisations
Hi,

Following this discussion in 2007: http://lists.mysql.com/internals/34287

Is there any plan to implement such an optimisation in MariaDB? I think a lot of web apps using pagination could benefit from such an optimisation, although there are some workarounds to avoid a big LIMIT for pagination.

Thanks!
Jocelyn
Hi! Catching up with my old emails...
"Jocelyn" == Jocelyn Fournier <joce@doctissimo.fr> writes:
Jocelyn> Hi,
Jocelyn> Following this discussion in 2007 : http://lists.mysql.com/internals/34287
Jocelyn> Is there any plan to implement such an optimisation in MariaDB ? (I
Jocelyn> think a lot of web app using pagination could take benefit of such an
Jocelyn> optimisation, although there are some workarounds to avoid big LIMIT for
Jocelyn> pagination)

Thanks for the reference to the old discussion; I had missed the original and it was interesting reading.

The problem here is that if you do a lot of row deletes or key updates, it would be hard or impossible to efficiently store a position id for each row. However, some of the storage engines do have a direct position for rows:

- For MyISAM (and Maria) with static-length rows, you can jump directly to row 'X'.
- For Maria, each dynamic-length row has an ID that one can use for positioning. (There may be 'holes' in the IDs, but that could be taken care of.)

However, neither of the above would help if you want a position based on an index.

What exactly is the problem you want to solve? I saw your question:

  How to optimize SELECT * FROM abc LIMIT 1000,30?

Can't you use the HANDLER calls for this? (These allow you to freely read backwards/forwards on an index with a limit.) In this case you don't know exactly where you are in the table, but you can freely move 30 rows from your current position.

If you could describe your application a bit, there may be ways to easily extend the handler interface to solve your problem. For example, knowing this would help:

- Do you want rows in position order, not key order?
- Do you need to read a row from a specific position (1000), or do you need to read rows backward/forward based on an old position?
- Do you need to read rows backwards?

(By the way, I don't understand Sergei's comment about read_set in this context.)

Regards,
Monty
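[Editorial aside: Monty's point that static-length rows allow jumping directly to row 'X' can be sketched with a toy fixed-width record file. The record layout and values here are hypothetical, not MyISAM's actual on-disk format; the point is only that a fixed record size turns "row X" into a byte offset.]

```python
import io
import struct

# Hypothetical fixed-width record: a 4-byte int id and an 8-byte value.
RECORD_FMT = "<iq"
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 12 bytes per row

# Build a toy "data file" of 1000 static-length rows.
buf = io.BytesIO()
for i in range(1000):
    buf.write(struct.pack(RECORD_FMT, i, i * 10))

def read_row(f, x):
    # Direct jump: row X starts at byte X * RECORD_SIZE, so no
    # earlier rows need to be scanned.
    f.seek(x * RECORD_SIZE)
    return struct.unpack(RECORD_FMT, f.read(RECORD_SIZE))

assert read_row(buf, 500) == (500, 5000)
```

With dynamic-length rows (e.g. a TEXT column), no such arithmetic exists, which is exactly why an index-based LIMIT must walk the skipped entries.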
Hi!

In a bulletin board context, it's quite simple. Let's say we want to display a forum thread containing a lot of posts. To simplify, I have the following table 'posts', which contains:

- id_thread
- id_post
- content
- id_author

If I want to display a paginated list of posts for a given topic, with 30 posts per page, I have to do:

SELECT content, author_name FROM posts LEFT JOIN author USING (id_author) ....... WHERE id_thread=.... ORDER BY id_post ASC LIMIT x,30

I have a PK on (id_thread, id_post). If the thread has a lot of posts, I can easily end up with a big LIMIT to get the last pages of the thread, which are the most often read (and the query will be triggered quite often, especially if Google likes my bulletin board :)).

The current behaviour of MyISAM seems to be to always scan all the rows; that means with a LIMIT 12000,40, the first 12000 useless rows are scanned, which is especially bad if "content" is a TEXT field (no static-length rows here). So the more pages I have, the slower the query gets.

The only workarounds I have found so far are:

- to maintain an (id_post => position) mapping in my application, which lets me do a range search without a LIMIT clause, which is quite fast;
- to do SELECT id_post FROM posts WHERE id_thread=.... ORDER BY id_post ASC LIMIT x,1 and SELECT id_post FROM posts WHERE id_thread=... ORDER BY id_post ASC LIMIT x+31,1, and use these two positions to make a range search without a LIMIT clause (a kind of manual ICP).

HANDLER doesn't seem to solve the issue, because I cannot easily jump to row number 12000 in the index (id_thread, id_post). Moreover, I cannot do my JOIN with it.

So to sum up:

- I want rows in key order, but key order should match row position most of the time.
- I need to read a row from a specific position.
- I don't need to read rows backwards.

I don't know whether index condition pushdown is currently used to optimize the LIMIT clause, but it could indeed improve these kinds of issues a lot. MRR should not help here, since rows in the data file are usually in the same order as in this specific index.

Thanks,
Jocelyn
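[Editorial aside: Jocelyn's second workaround, probing the narrow primary-key index for the page boundary and then range-scanning without a big LIMIT, can be sketched against SQLite. The schema and data are hypothetical stand-ins for the forum tables discussed above.]

```python
import sqlite3

# Hypothetical stand-in for the forum schema from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE posts (
    id_thread INTEGER, id_post INTEGER, content TEXT,
    PRIMARY KEY (id_thread, id_post))""")
conn.executemany("INSERT INTO posts VALUES (1, ?, ?)",
                 [(i, "post %d" % i) for i in range(1, 101)])

def page_via_offset(thread, page, per_page=30):
    # Naive OFFSET pagination: the engine walks past every skipped row,
    # including the wide 'content' column.
    return conn.execute(
        "SELECT id_post, content FROM posts WHERE id_thread=? "
        "ORDER BY id_post LIMIT ? OFFSET ?",
        (thread, per_page, page * per_page)).fetchall()

def page_via_boundary(thread, page, per_page=30):
    # Workaround: probe only the narrow (id_thread, id_post) index for
    # the page's first id, then range-scan from it without a big offset.
    row = conn.execute(
        "SELECT id_post FROM posts WHERE id_thread=? "
        "ORDER BY id_post LIMIT 1 OFFSET ?",
        (thread, page * per_page)).fetchone()
    if row is None:
        return []
    return conn.execute(
        "SELECT id_post, content FROM posts WHERE id_thread=? AND id_post>=? "
        "ORDER BY id_post LIMIT ?",
        (thread, row[0], per_page)).fetchall()

assert page_via_boundary(1, 2) == page_via_offset(1, 2)
```

The boundary probe still steps over the skipped index entries, but only in the index, never touching the TEXT data; when the last id of the previous page is known, a plain `WHERE id_post > last_seen ... LIMIT 30` avoids even that.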
On Wed, Mar 10, 2010 at 4:01 PM, Jocelyn Fournier <joce@doctissimo.fr> wrote:
This is the behavior for all storage engines. I don't think you are going to get the optimization in MySQL that I think you are asking for. I have written about the performance problems of pagination and a workaround at http://www.facebook.com/note.php?note_id=206034210932.

--
Mark Callaghan
mdcallag@gmail.com
Hi Mark!

Unfortunately, if you want the ability to jump to a random page, you're stuck and cannot use this workaround! And in the case of a forum, you're most of the time jumping to the latest page of a topic.

Anyway, I'm pretty sure some optimisations could be added to improve the current behaviour (like ICP), if that's not already the case (I haven't checked this with MySQL 5.5 / Maria yet).

Thanks,
Jocelyn
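[Editorial aside: one mitigation that still allows jumping to an arbitrary page is a "deferred join": paginate over the narrow primary-key index alone, then join back to fetch the wide columns for only the selected page. This is a sketch in SQLite with a hypothetical schema, not the method from Mark's note.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE posts (
    id_thread INTEGER, id_post INTEGER, content TEXT,
    PRIMARY KEY (id_thread, id_post))""")
conn.executemany("INSERT INTO posts VALUES (1, ?, ?)",
                 [(i, "x" * 100) for i in range(1, 201)])

def page_deferred_join(thread, page, per_page=30):
    # The inner query paginates over the covering index only, so the
    # skipped rows never touch the TEXT column; the outer join then
    # fetches 'content' for just the selected page of ids.
    return conn.execute(
        """SELECT p.id_post, p.content
           FROM posts p
           JOIN (SELECT id_post FROM posts
                 WHERE id_thread=? ORDER BY id_post
                 LIMIT ? OFFSET ?) AS pg
             ON p.id_thread=? AND p.id_post = pg.id_post
           ORDER BY p.id_post""",
        (thread, per_page, page * per_page, thread)).fetchall()

rows = page_deferred_join(1, 4)   # page 4 -> offset 120 -> ids 121..150
assert rows[0][0] == 121
```

Random page jumps still pay the index walk for the offset, but the expensive per-row data fetch is limited to the 30 rows actually displayed.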
participants (3)

- Jocelyn Fournier
- MARK CALLAGHAN
- Michael Widenius