[Maria-developers] Storage Engine API changes

Stewart Smith

8 Jul 2009 8 Jul '09

9:42 a.m.

So, I've been making a fair bit of changes around bits of the storage engine API (by all accounts for the better) in Drizzle. The idea being to move the handler to be a cursor on a table, with actions not pertaining that to reside in StorageEngine (e.g. DDL). There's also the (now rather old) change to drop table return code. The next thing that will move into the StorageEngine is metadata handling with the engine being able to be responsible for its own (table) metadata. This is well and truly increasing the differences between MySQL/MariaDB and Drizzle in this area of code - increasing the work needed to port an engine (either way). I would guess it makes little sense for MySQL and MariaDB to diverge here, although I have been (and continue to be) okay with Drizzle diverging. So, is there somebody interested in working with me to have the MySQL/MariaDB API evolve in the same way? -- Stewart Smith

Show replies by date

Peter Benjamin Volk

8 Jul 8 Jul

9:53 a.m.

Hi Stewart, Stewart Smith wrote:

...

The idea being to move the handler to be a cursor on a table, with actions not pertaining that to reside in StorageEngine (e.g. DDL).

That sounds like an interesting Idea. This would though involve quite a few changes. Would the curser then still be row based? Or might you convert to a more column oriented way?

...

There's also the (now rather old) change to drop table return code.

The next thing that will move into the StorageEngine is metadata handling with the engine being able to be responsible for its own (table) metadata.

That would be wonderful. Does the Meta data include statistics on the table (e.g. selectivity estimates etc) that would be needed for the optimizers cost model?

...

So, is there somebody interested in working with me to have the MySQL/MariaDB API evolve in the same way?

Is there a targeted MySQL version for this? Cheers, Peter -- Peter Benjamin Volk

Stewart Smith

12:23 p.m.

On Wed, Jul 08, 2009 at 03:53:15AM -0400, Peter Benjamin Volk wrote:

...

Stewart Smith wrote:

...
The idea being to move the handler to be a cursor on a table, with actions not pertaining that to reside in StorageEngine (e.g. DDL).

That sounds like an interesting Idea. This would though involve quite a few changes. Would the curser then still be row based? Or might you convert to a more column oriented way?

We had a more column oriented API in NDB, and found that going to NdbRecord (very much row based) got us a pretty big performance boost. Or are you thinking for column based engines?

...

...
There's also the (now rather old) change to drop table return code.

The next thing that will move into the StorageEngine is metadata handling with the engine being able to be responsible for its own (table) metadata.

That would be wonderful. Does the Meta data include statistics on the table (e.g. selectivity estimates etc) that would be needed for the optimizers cost model?

Not yet. I have been thinking in the back of my mind how to make this better though.

...

...
So, is there somebody interested in working with me to have the MySQL/MariaDB API evolve in the same way?

Is there a targeted MySQL version for this?

Not at all up to me. I'm doing things in Drizzle.... it's really up to the MySQL folks if they want to pull in the improvements too. -- Stewart Smith

Peter Benjamin Volk

12:42 p.m.

Stewart Smith

5:33 p.m.

On Wed, Jul 08, 2009 at 06:42:30AM -0400, Peter Benjamin Volk wrote:

...

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

HTML mail isn't a good idea... not as easy to read.

...

Multiple things. At first of course at column store oriented engines. The problem with this is that the MySQL core is strongly row based and would not take any column oriented optimizations into account even if the engine is column oriented. So any column store would actually need to bring their own optimizer and partially operators with them. The Idea would be to have a curser on the table (as proposed by you) but when scrolling over the data set to have two different access methods. One column oriented and the other row oriented. With both of these options it would be possible to access complete rows or have the SE it self filter out the unnecessary columns. Maybe the access methods could be controllable by the capabilities of the SE. This would give the optimizer multiple possibilities to access the data on a row with different costs. For column oriented stores it would be cheaper to read over the column API and for row based it would read over the row API. To compensate the different APIs you can define different transformation operators. Also this way specific selections can be pushed down to the SE very easily--SELECT a,b FROM t would only read columns a and b from the table t. Instead of reading the complete row of t (maybe consisting of N+1 columns). The performance degradation of the API to NDB is possibly due to the fact that MySQL would read the complete row from a column oriented interface (speculation only. Don't know the work on this topic in NDB). There has been quite a bit of publications on the SIGMOD '09 that column stores are very suitable for analytics as well as OLTP. So maybe this might be an interesting trend? Also having the different access methods modularizes the DBMS a lot more --> more plugins :) wich is in the sense of Drizzle. <br>

the read_set and write_set parameters should take care of the only retreiving/setting certain columns. Although that still leaves optimiser things... which is another project altogether :) -- Stewart Smith

Sergei Golubchik

3:43 p.m.

Hi, Peter! On Jul 08, Peter Benjamin Volk wrote:

...

Hi Stewart,

Stewart Smith wrote:

...
The idea being to move the handler to be a cursor on a table, with actions not pertaining that to reside in StorageEngine (e.g. DDL).

So, is there somebody interested in working with me to have the MySQL/MariaDB API evolve in the same way?

Is there a targeted MySQL version for this?

it's WL#3288 (created 22 May 2006) for the API and server support and WL#4603 (17 Oct 2008) for the default implementation in the handler class. These WL's are part of the foreign key work, and will probably be released together with foreign keys (but could go out earlier too). Regards / Mit vielen Grüßen, Sergei -- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring

Konstantin Osipov

4:49 p.m.

* Sergei Golubchik <serg@mysql.com> [09/07/08 18:12]:

...

it's WL#3288 (created 22 May 2006) for the API and server support and WL#4603 (17 Oct 2008) for the default implementation in the handler class.

Taking the opportunity to remind that I still need a review on my patch for 3288 :-). Stewart, as for Drizzle, well, at the point you have a patch it will be a bit too late to decide whether we want to "pull it in". A storage engine API change that introduces cursors needs architecture work and prototyping. I fear we will end up with 3 disparate storage engine APIs in Drizzle, MySQL and MariaDB. --

Brian Aker

5:24 p.m.

Hi! On Jul 8, 2009, at 7:49 AM, Konstantin Osipov wrote:

...

3 disparate storage engine APIs in Drizzle, MySQL and MariaDB.

We have 4 :) The original MySQL one which all of the vendors use. Drizzle, the MySQL (Sun/Oracle) created one, and the MariaDB one. Most of the vendors who have ported to Drizzle haven't had issues. Depending on what the other ones turn out to be like... I somewhat suspect this won't be much of an issue. We could all simplify and pick some sort of standard, like say modeling from BDB, but... that would require a bit of working together. Cheers, -Brian BTW I suspect for Drizzle we will resurrect BDB at some point in the future mainly to provide this sort of interface. From talking to Josh Berkus, there may be a standard interface out there defined by some committee that could also be used (I believe they are looking at this for Postgres).

Greg Stark

6:19 p.m.

On Wed, Jul 8, 2009 at 4:24 PM, Brian Aker<brian@tangent.org> wrote:

...

BTW I suspect for Drizzle we will resurrect BDB at some point in the future mainly to provide this sort of interface. From talking to Josh Berkus, there may be a standard interface out there defined by some committee that could also be used (I believe they are looking at this for Postgres).

Hm, well there's SQL/MED which is support for federated tables. Ie, declaring a table which lives off in another database and accessing it requires issuing a query against that remote database. Federated tables do have some things in common with MySQL-style pluggable storage engines in that the different remote databases may support different access paths and require delegating a lot of intelligence to the remote database to be able to exploit them. You also want to be able to push joins between two remote tables to the remote database etc. It's also natural to express things like text files or spreadsheets as SQL/MED data sources. But the SQL/MED standard is just about the SQL syntax to declare these things. It doesn't really describe the internal API or get into any implementation details. -- greg http://mit.edu/~gsstark/resume.pdf

Stewart Smith

5:46 p.m.

On Wed, Jul 08, 2009 at 06:49:43PM +0400, Konstantin Osipov wrote:

...

* Sergei Golubchik <serg@mysql.com> [09/07/08 18:12]:

...
it's WL#3288 (created 22 May 2006) for the API and server support and WL#4603 (17 Oct 2008) for the default implementation in the handler class.

Taking the opportunity to remind that I still need a review on my patch for 3288 :-).

Stewart, as for Drizzle, well, at the point you have a patch it will be a bit too late to decide whether we want to "pull it in".

A storage engine API change that introduces cursors needs architecture work and prototyping. I fear we will end up with 3 disparate storage engine APIs in Drizzle, MySQL and MariaDB.

The current aim is to remove things that obviously aren't related to a cursor and move them up. e.g. create table is now in StorageEngine. at some point s/handler/cursor/ -- Stewart Smith

Konstantin Osipov

7:47 p.m.

* Stewart Smith <stewart@sun.com> [09/07/08 21:16]:

...

The current aim is to remove things that obviously aren't related to a cursor and move them up.

e.g. create table is now in StorageEngine.

at some point s/handler/cursor/

Bad idea IMHO. You don't need to keep all cursors in the table cache and look every cursor with thr_lock. You should be able to create cursors at leisure during execution, and destroy them when not needed. --

Jay Pipes

9:56 p.m.

Konstantin Osipov wrote:

...

* Stewart Smith <stewart@sun.com> [09/07/08 21:16]:

...
The current aim is to remove things that obviously aren't related to a cursor and move them up.

e.g. create table is now in StorageEngine.

at some point s/handler/cursor/

Bad idea IMHO. You don't need to keep all cursors in the table cache and look every cursor with thr_lock.

Hi! I'm not sure what you mean by "look every cursor with thr_lock"? Do you mean "lock every cursor with thr_lock"? If so, we're in the process of removing the thr_lock stuff entirely (a lot easier now that MyISAM is TEMP TABLE ONLY in Drizzle...

...

You should be able to create cursors at leisure during execution, and destroy them when not needed.

Agreed. Part of the work we are doing is reducing and simplifiying the Handler class before it gets converted to a stand-alone Cursor class that can be allocated, used, and destroyed without worrying about thr_lock mess... -jay

Konstantin Osipov

10:50 p.m.

* Jay Pipes <Jay.Pipes@Sun.COM> [09/07/09 00:02]:

...

...
...
The current aim is to remove things that obviously aren't related to a cursor and move them up. e.g. create table is now in StorageEngine. at some point s/handler/cursor/ Bad idea IMHO. You don't need to keep all cursors in the table cache and look every cursor with thr_lock.

What is the suggested way of getting hold of a cursor? open_table(), correct? But open_table() is overloaded: - it protects the table from being altered in another connection - it has bits of StorageInterface in it. - finally, it's a cursor. So the current way to read from a storage engine cursor, and you're not changing that it seems, is to get hold of a TABLE instance, a handler instance, and then do rnd_init()/rnd_next()/rnd_end() on it. If you want to read from two different indexes of the same table simultaneously, you need two TABLE and handler instances. This is at the core of the problem. Instead, one needs to be able to do: Table *table= storage_interface->get_table(); Cursor *cursor1= table->create_rnd_scan_cursor(); Cursor *cursor2= table->create_index_scan_cursor(index); cursor1->open(); cursor2->open(); cursor1->read_next(); cursor2->read_next(); printf("First row in index: %s, First row physically %s\n", cursor1->record[0].to_string(), cursor2->record[0].to_string()); delete cursor1; delete cursor2; delete table; If you just move stuff that logically belongs to StorageInterface, and then rename handler to cursor you still have to: - have an own handler instance for each independent read/scan or update. An update is also part of the cursor API, but more advanced one - class Updatable_cursor: public Cursor. - have an own TABLE instance for each independent read/scan/update. Because of table->field[], table->auto_increment* and table->record[], which is, really, not part of the table, but a structure to exchange data with the table. - use a different method depending on whether you want to read the next record from an index (index_next), or in physical order (rnd_next()) To change the server to use the new cursor API you need to change the execution layer -- JOIN::execute(), READ_RECORD, join_read* functions and sub_select code, the parser to not add another TABLE_LIST to thd->lex->query_tables for each time a table is mentioned in a statement, open_tables to open no more than one instance of the same table, the table cache, to not hold more than one instance of the same table. Put it this way: once you have a correct solution for the problem, Bug#7742 will get solved naturally.

...

Hi! I'm not sure what you mean by "look every cursor with thr_lock"? Do you mean "lock every cursor with thr_lock"? If so, we're in the process of removing the thr_lock stuff entirely (a lot easier now that MyISAM is TEMP TABLE ONLY in Drizzle...

...
You should be able to create cursors at leisure during execution, and destroy them when not needed.

Agreed. Part of the work we are doing is reducing and simplifiying the Handler class before it gets converted to a stand-alone Cursor class that can be allocated, used, and destroyed without worrying about thr_lock mess...

-jay

Jay Pipes

11:22 p.m.

Konstantin Osipov wrote:

...

* Jay Pipes <Jay.Pipes@Sun.COM> [09/07/09 00:02]:

...
...
...
The current aim is to remove things that obviously aren't related to a cursor and move them up. e.g. create table is now in StorageEngine. at some point s/handler/cursor/ Bad idea IMHO. You don't need to keep all cursors in the table cache and look every cursor with thr_lock.

What is the suggested way of getting hold of a cursor? open_table(), correct? But open_table() is overloaded: - it protects the table from being altered in another connection - it has bits of StorageInterface in it. - finally, it's a cursor.

Exactly! Too much going on in the handler class and in methods like open_table() :)

...

So the current way to read from a storage engine cursor, and you're not changing that it seems, is to get hold of a TABLE instance, a handler instance, and then do rnd_init()/rnd_next()/rnd_end() on it.

If you want to read from two different indexes of the same table simultaneously, you need two TABLE and handler instances.

This is at the core of the problem.

Instead, one needs to be able to do:

Table *table= storage_interface->get_table();

Cursor *cursor1= table->create_rnd_scan_cursor(); Cursor *cursor2= table->create_index_scan_cursor(index);

cursor1->open(); cursor2->open(); cursor1->read_next(); cursor2->read_next();

printf("First row in index: %s, First row physically %s\n", cursor1->record[0].to_string(), cursor2->record[0].to_string());

delete cursor1; delete cursor2; delete table;

This is actually very similar to the efforts underway in my local trees. Although, I have to admit, I've tried killing the term "rnd" since nobody seems to understand what the heck it means ;)

...

If you just move stuff that logically belongs to StorageInterface, and then rename handler to cursor you still have to:

- have an own handler instance for each independent read/scan or update. An update is also part of the cursor API, but more advanced one - class Updatable_cursor: public Cursor.

Sure, but first things first...get the basic stuff fine-tuned and working correctly before we try to bite off more than we can chew (or that the kernel should chew itself vs. a plugin...) :)

...

- have an own TABLE instance for each independent read/scan/update. Because of table->field[], table->auto_increment* and table->record[], which is, really, not part of the table, but a structure to exchange data with the table.

These are more appropriately handled by a TableShare with a built-in version or timestamp, no?

...

- use a different method depending on whether you want to read the next record from an index (index_next), or in physical order (rnd_next())

This is fairly simple to do with abstract base classes and subclass implementations in C++... Basically, the READ_RECORD structure currently does this work in a C-style way, no?

...

To change the server to use the new cursor API you need to change the execution layer -- JOIN::execute(), READ_RECORD, join_read* functions and sub_select code, the parser to not add another TABLE_LIST to thd->lex->query_tables for each time a table is mentioned in a statement, open_tables to open no more than one instance of the same table, the table cache, to not hold more than one instance of the same table.

Yes, exactly. This is the work we are currently doing.

...

Put it this way: once you have a correct solution for the problem, Bug#7742 will get solved naturally.

Cool :)

...

...
Hi! I'm not sure what you mean by "look every cursor with thr_lock"? Do you mean "lock every cursor with thr_lock"? If so, we're in the process of removing the thr_lock stuff entirely (a lot easier now that MyISAM is TEMP TABLE ONLY in Drizzle...

...
You should be able to create cursors at leisure during execution, and destroy them when not needed. Agreed. Part of the work we are doing is reducing and simplifiying the Handler class before it gets converted to a stand-alone Cursor class that can be allocated, used, and destroyed without worrying about thr_lock mess...

-jay

Brian Aker

10:12 p.m.

Hi! Can you explain this a bit more? I can't follow what you mean by your statement (even as to how it released to the class MySQL branch). Cheers, -Brian On Jul 8, 2009, at 10:47 AM, Konstantin Osipov wrote:

...

* Stewart Smith <stewart@sun.com> [09/07/08 21:16]:

...
The current aim is to remove things that obviously aren't related to a cursor and move them up.

e.g. create table is now in StorageEngine.

at some point s/handler/cursor/

Bad idea IMHO. You don't need to keep all cursors in the table cache and look every cursor with thr_lock. You should be able to create cursors at leisure during execution, and destroy them when not needed.

--

-- MySQL Internals Mailing List For list archives: http://lists.mysql.com/internals To unsubscribe: http://lists.mysql.com/internals?unsub=brian@tangent.org

5845

Age (days ago)

5845

Last active (days ago)

List overview

14 comments

7 participants

participants (7)

Brian Aker
Greg Stark
Jay Pipes
Konstantin Osipov
Peter Benjamin Volk
Sergei Golubchik
Stewart Smith