[Maria-developers] implementing index condition pushdown in MariaDB 5.5
Hello, Is there any documentation for what a storage engine needs to do to implement index condition pushdown in MariaDB 5.5? I see some related things, such as handler_index_cond_check, HA_DO_INDEX_COND_PUSHDOWN, and idx_cond_push, but I don't understand how these all interact with each other. Thanks -Zardosht
Hi Zardosht, On Tue, Feb 12, 2013 at 11:29:12PM -0500, Zardosht Kasheff wrote:
Is there any documentation for what a storage engine needs to do to implement index condition pushdown in MariaDB 5.5? I see some related things, such as handler_index_cond_check, HA_DO_INDEX_COND_PUSHDOWN, and idx_cond_push, but I don't understand how these all interact with each other.
There is no documentation that I am aware of. Basic considerations are == Interface == h->index_flags() must return HA_DO_INDEX_COND_PUSHDOWN flag. Otherwise, SQL layer will not attempt to do condition pushdown. The primary functions to overload are: Item *handler::idx_cond_push(uint keyno_arg, Item* idx_cond_arg) void handler::cancel_pushed_idx_cond() SQL layer will extract a part of WHERE condition that refers to an index (it may also have references to fields in other tables) and pass it as parameter to idx_cond_push(). idx_cond_push() returns the "remainder", i.e. a part of the passed condition that it is not able to check. In particular - returning NULL means that the condition was completely pushed down, and storage engine will only return records for which idx_cond_arg evaluates to TRUE. - returning idx_cond_arg back means that the engine was unable to push the index condition. SQL layer will need to check idx_cond_arg on its own. cancel_pushed_idx_cond() cancels the effect of idx_cond_push(). idx_cond_push() is called before index_init() call, and pushed condition should survive index_init()/index_end() calls, as long as the used index is the same as keyno_arg argument of idx_cond_push(). The SQL layer will not attempt to have multiple pushed index conditions for multiple indexes. This is it about the interface. == Implementation == ICP implementations we have at the moment are all similar, and share some code. It is useful but not mandatory. handler_index_cond_check() is an utility function that one may call from the storage engine (after having unpacked index columns to their places in table->record[0]) to check the index condition and do some related things: 1. increment ICP Handler_xxx counters 2. Check if the query was killed (useful, when the query is showeling through lots of records that don't match the index condition. It's nice to be able to KILL the query) 3. Check if we've ran out of range. #3 is critical. Suppose you're doing a range scan on a range, t.key BETWEEN 'bar' and 'foo'. MariaDB (and MySQL) will not remove the above predicate from the WHERE clause. The predicate uses index columns, so it will be pushed down as index condition. MySQL will scan the range with these calls: h->read_range_first('bar', 'foo') while (h->read_range_next() != HA_ERR...) { emit a row; } Most storage engines do not implement read_range_next(), they rely on implementation in handler::read_range_next(). Which calls h->index_next() and then returns EOF if the returned record is out of range we're scanning. But what if we make h->index_next() to perform index condition pushdown checks? The following can happen: h->index_next() sees index record 'fop' (which is greater than 'foo'). It checks index condition of "t.key BETWEEN 'bar' and 'foo'", finds it to be FALSE, and proceeds to read the next index entry. The next index entry is greater (or equal) than 'fop', so it will proceed further until the end of the index. To prevent this, handler_index_cond_check() checks whether the retrieved index record is out of range that is being scanned (and returns ICP_OUT_OF_RANGE if that is the case). These were the primary points, let me know if there is something not clear yet. BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
Hello Sergei, Thanks for the feedback. What makes this feature a little difficult for me to grasp is the requirement that the engine is responsible for doing some of the condition checking, and that MySQL is doing none of the checking. Here are the handler cursor API functions: index_first index_last index_next index_prev index_next_same index_read If we return NULL for idx_cond_push, meaning we will do all of the condition checking, does that mean ALL of these functions need to check the condition before returning something to MySQL? Or just a subset? Also, suppose we always return the full condition to MySQL, meaning the engine is responsible for none of the condition checking, but then the engine does condition checking anyway. Is that ok? Here is an example of what I mean. Suppose we have a key (a,b), and a query of select * from table where a between 5 and 10 and b=5; Would it be ok for our engine to return the full condition to MySQL, but then in subsequent calls to index_next and index_prev, choose to not return rows where b != 5? Thanks -Zardosht On Wed, Feb 13, 2013 at 5:07 AM, Sergei Petrunia <psergey@askmonty.org> wrote:
Hi Zardosht,
On Tue, Feb 12, 2013 at 11:29:12PM -0500, Zardosht Kasheff wrote:
Is there any documentation for what a storage engine needs to do to implement index condition pushdown in MariaDB 5.5? I see some related things, such as handler_index_cond_check, HA_DO_INDEX_COND_PUSHDOWN, and idx_cond_push, but I don't understand how these all interact with each other.
There is no documentation that I am aware of. Basic considerations are
== Interface == h->index_flags() must return HA_DO_INDEX_COND_PUSHDOWN flag. Otherwise, SQL layer will not attempt to do condition pushdown.
The primary functions to overload are:
Item *handler::idx_cond_push(uint keyno_arg, Item* idx_cond_arg) void handler::cancel_pushed_idx_cond()
SQL layer will extract a part of WHERE condition that refers to an index (it may also have references to fields in other tables) and pass it as parameter to idx_cond_push().
idx_cond_push() returns the "remainder", i.e. a part of the passed condition that it is not able to check. In particular
- returning NULL means that the condition was completely pushed down, and storage engine will only return records for which idx_cond_arg evaluates to TRUE.
- returning idx_cond_arg back means that the engine was unable to push the index condition. SQL layer will need to check idx_cond_arg on its own.
cancel_pushed_idx_cond() cancels the effect of idx_cond_push(). idx_cond_push() is called before index_init() call, and pushed condition should survive index_init()/index_end() calls, as long as the used index is the same as keyno_arg argument of idx_cond_push().
The SQL layer will not attempt to have multiple pushed index conditions for multiple indexes.
This is it about the interface.
== Implementation == ICP implementations we have at the moment are all similar, and share some code. It is useful but not mandatory.
handler_index_cond_check() is an utility function that one may call from the storage engine (after having unpacked index columns to their places in table->record[0]) to check the index condition and do some related things:
1. increment ICP Handler_xxx counters 2. Check if the query was killed (useful, when the query is showeling through lots of records that don't match the index condition. It's nice to be able to KILL the query) 3. Check if we've ran out of range.
#3 is critical. Suppose you're doing a range scan on a range,
t.key BETWEEN 'bar' and 'foo'.
MariaDB (and MySQL) will not remove the above predicate from the WHERE clause. The predicate uses index columns, so it will be pushed down as index condition.
MySQL will scan the range with these calls:
h->read_range_first('bar', 'foo') while (h->read_range_next() != HA_ERR...) { emit a row; }
Most storage engines do not implement read_range_next(), they rely on implementation in handler::read_range_next(). Which calls h->index_next() and then returns EOF if the returned record is out of range we're scanning.
But what if we make h->index_next() to perform index condition pushdown checks?
The following can happen: h->index_next() sees index record 'fop' (which is greater than 'foo'). It checks index condition of "t.key BETWEEN 'bar' and 'foo'", finds it to be FALSE, and proceeds to read the next index entry. The next index entry is greater (or equal) than 'fop', so it will proceed further until the end of the index.
To prevent this, handler_index_cond_check() checks whether the retrieved index record is out of range that is being scanned (and returns ICP_OUT_OF_RANGE if that is the case).
These were the primary points, let me know if there is something not clear yet.
BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
Also, can somebody please explain how handler_index_cond_check checks index conditions? The key function seems to be item->val_int. How does this get each value and check check conditions? On Thu, Feb 14, 2013 at 11:31 AM, Zardosht Kasheff <zardosht@gmail.com> wrote:
Hello Sergei,
Thanks for the feedback. What makes this feature a little difficult for me to grasp is the requirement that the engine is responsible for doing some of the condition checking, and that MySQL is doing none of the checking.
Here are the handler cursor API functions:
index_first index_last index_next index_prev index_next_same index_read
If we return NULL for idx_cond_push, meaning we will do all of the condition checking, does that mean ALL of these functions need to check the condition before returning something to MySQL? Or just a subset?
Also, suppose we always return the full condition to MySQL, meaning the engine is responsible for none of the condition checking, but then the engine does condition checking anyway. Is that ok? Here is an example of what I mean. Suppose we have a key (a,b), and a query of select * from table where a between 5 and 10 and b=5; Would it be ok for our engine to return the full condition to MySQL, but then in subsequent calls to index_next and index_prev, choose to not return rows where b != 5?
Thanks -Zardosht
On Wed, Feb 13, 2013 at 5:07 AM, Sergei Petrunia <psergey@askmonty.org> wrote:
Hi Zardosht,
On Tue, Feb 12, 2013 at 11:29:12PM -0500, Zardosht Kasheff wrote:
Is there any documentation for what a storage engine needs to do to implement index condition pushdown in MariaDB 5.5? I see some related things, such as handler_index_cond_check, HA_DO_INDEX_COND_PUSHDOWN, and idx_cond_push, but I don't understand how these all interact with each other.
There is no documentation that I am aware of. Basic considerations are
== Interface == h->index_flags() must return HA_DO_INDEX_COND_PUSHDOWN flag. Otherwise, SQL layer will not attempt to do condition pushdown.
The primary functions to overload are:
Item *handler::idx_cond_push(uint keyno_arg, Item* idx_cond_arg) void handler::cancel_pushed_idx_cond()
SQL layer will extract a part of WHERE condition that refers to an index (it may also have references to fields in other tables) and pass it as parameter to idx_cond_push().
idx_cond_push() returns the "remainder", i.e. a part of the passed condition that it is not able to check. In particular
- returning NULL means that the condition was completely pushed down, and storage engine will only return records for which idx_cond_arg evaluates to TRUE.
- returning idx_cond_arg back means that the engine was unable to push the index condition. SQL layer will need to check idx_cond_arg on its own.
cancel_pushed_idx_cond() cancels the effect of idx_cond_push(). idx_cond_push() is called before index_init() call, and pushed condition should survive index_init()/index_end() calls, as long as the used index is the same as keyno_arg argument of idx_cond_push().
The SQL layer will not attempt to have multiple pushed index conditions for multiple indexes.
This is it about the interface.
== Implementation == ICP implementations we have at the moment are all similar, and share some code. It is useful but not mandatory.
handler_index_cond_check() is an utility function that one may call from the storage engine (after having unpacked index columns to their places in table->record[0]) to check the index condition and do some related things:
1. increment ICP Handler_xxx counters 2. Check if the query was killed (useful, when the query is showeling through lots of records that don't match the index condition. It's nice to be able to KILL the query) 3. Check if we've ran out of range.
#3 is critical. Suppose you're doing a range scan on a range,
t.key BETWEEN 'bar' and 'foo'.
MariaDB (and MySQL) will not remove the above predicate from the WHERE clause. The predicate uses index columns, so it will be pushed down as index condition.
MySQL will scan the range with these calls:
h->read_range_first('bar', 'foo') while (h->read_range_next() != HA_ERR...) { emit a row; }
Most storage engines do not implement read_range_next(), they rely on implementation in handler::read_range_next(). Which calls h->index_next() and then returns EOF if the returned record is out of range we're scanning.
But what if we make h->index_next() to perform index condition pushdown checks?
The following can happen: h->index_next() sees index record 'fop' (which is greater than 'foo'). It checks index condition of "t.key BETWEEN 'bar' and 'foo'", finds it to be FALSE, and proceeds to read the next index entry. The next index entry is greater (or equal) than 'fop', so it will proceed further until the end of the index.
To prevent this, handler_index_cond_check() checks whether the retrieved index record is out of range that is being scanned (and returns ICP_OUT_OF_RANGE if that is the case).
These were the primary points, let me know if there is something not clear yet.
BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
On Thu, Feb 14, 2013 at 02:24:57PM -0500, Zardosht Kasheff wrote:
Also, can somebody please explain how handler_index_cond_check checks index conditions? The key function seems to be item->val_int. How does this get each value and check check conditions?
Function handler_index_cond_check() is invoked once for each index tuple. That is, the storage is expected to do something like: while() { read the next index tuple; unpack index columns into their places in table->record[0]; if ((res= handler_index_cond_check()) != ICP_NO_MATCH) { // return either an error or record, depending on the value of res. } } BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
Thanks a lot Sergei. This information is very helpful. I think this will be my (initial) attempt at implementing ICP for TokuDB. I will return the full pushed condition to MySQL, but I will use the information passed down to filter all index_next, index_next_same, and index_prev calls. This way, if we happen to accidentally not filter a row, MySQL will still behave correctly. On Fri, Feb 15, 2013 at 8:28 PM, Sergei Petrunia <psergey@askmonty.org> wrote:
On Thu, Feb 14, 2013 at 02:24:57PM -0500, Zardosht Kasheff wrote:
Also, can somebody please explain how handler_index_cond_check checks index conditions? The key function seems to be item->val_int. How does this get each value and check check conditions?
Function handler_index_cond_check() is invoked once for each index tuple.
That is, the storage is expected to do something like:
while() { read the next index tuple; unpack index columns into their places in table->record[0];
if ((res= handler_index_cond_check()) != ICP_NO_MATCH) { // return either an error or record, depending on the value of res. } }
BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
Thanks for the feedback. What makes this feature a little difficult for me to grasp is the requirement that the engine is responsible for doing some of the condition checking, and that MySQL is doing none of the checking.
Here are the handler cursor API functions:
index_first index_last index_next index_prev index_next_same index_read
If we return NULL for idx_cond_push, meaning we will do all of the condition checking, does that mean ALL of these functions need to check the condition before returning something to MySQL? Or just a subset? Actually, a subset. All of the functions that do forward scans must do
Hi Zardosht, On Thu, Feb 14, 2013 at 11:31:13AM -0500, Zardosht Kasheff wrote: the checking. ICP will not be used with functions that do reverse scans: index_last, index_prev.
Also, suppose we always return the full condition to MySQL, meaning the engine is responsible for none of the condition checking, but then the engine does condition checking anyway. Is that ok? Here is an example of what I mean. Suppose we have a key (a,b), and a query of select * from table where a between 5 and 10 and b=5; Would it be ok for our engine to return the full condition to MySQL, but then in subsequent calls to index_next and index_prev, choose to not return rows where b != 5?
I think it should be ok. Nothing comes to mind which could make it a problem. BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
participants (2)
-
Sergei Petrunia
-
Zardosht Kasheff