[Maria-developers] Storage Engine API changes
So, I've been making a fair bit of changes around bits of the storage engine API (by all accounts for the better) in Drizzle. The idea being to move the handler to be a cursor on a table, with actions not pertaining that to reside in StorageEngine (e.g. DDL). There's also the (now rather old) change to drop table return code. The next thing that will move into the StorageEngine is metadata handling with the engine being able to be responsible for its own (table) metadata. This is well and truly increasing the differences between MySQL/MariaDB and Drizzle in this area of code - increasing the work needed to port an engine (either way). I would guess it makes little sense for MySQL and MariaDB to diverge here, although I have been (and continue to be) okay with Drizzle diverging. So, is there somebody interested in working with me to have the MySQL/MariaDB API evolve in the same way? -- Stewart Smith
Hi Stewart, Stewart Smith wrote:
The idea being to move the handler to be a cursor on a table, with actions not pertaining that to reside in StorageEngine (e.g. DDL).
That sounds like an interesting Idea. This would though involve quite a few changes. Would the curser then still be row based? Or might you convert to a more column oriented way?
That would be wonderful. Does the Meta data include statistics on the table (e.g. selectivity estimates etc) that would be needed for the optimizers cost model?
So, is there somebody interested in working with me to have the MySQL/MariaDB API evolve in the same way?
Is there a targeted MySQL version for this? Cheers, Peter -- Peter Benjamin Volk
On Wed, Jul 08, 2009 at 03:53:15AM -0400, Peter Benjamin Volk wrote:
We had a more column oriented API in NDB, and found that going to NdbRecord (very much row based) got us a pretty big performance boost. Or are you thinking for column based engines?
Not yet. I have been thinking in the back of my mind how to make this better though.
Not at all up to me. I'm doing things in Drizzle.... it's really up to the MySQL folks if they want to pull in the improvements too. -- Stewart Smith
On Wed, Jul 08, 2009 at 06:42:30AM -0400, Peter Benjamin Volk wrote:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
HTML mail isn't a good idea... not as easy to read.
the read_set and write_set parameters should take care of the only retreiving/setting certain columns. Although that still leaves optimiser things... which is another project altogether :) -- Stewart Smith
Hi, Peter! On Jul 08, Peter Benjamin Volk wrote:
it's WL#3288 (created 22 May 2006) for the API and server support and WL#4603 (17 Oct 2008) for the default implementation in the handler class. These WL's are part of the foreign key work, and will probably be released together with foreign keys (but could go out earlier too). Regards / Mit vielen Grüßen, Sergei -- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@sun.com> / /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect /_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028 <___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring
* Sergei Golubchik <serg@mysql.com> [09/07/08 18:12]:
Taking the opportunity to remind that I still need a review on my patch for 3288 :-). Stewart, as for Drizzle, well, at the point you have a patch it will be a bit too late to decide whether we want to "pull it in". A storage engine API change that introduces cursors needs architecture work and prototyping. I fear we will end up with 3 disparate storage engine APIs in Drizzle, MySQL and MariaDB. --
Hi! On Jul 8, 2009, at 7:49 AM, Konstantin Osipov wrote:
3 disparate storage engine APIs in Drizzle, MySQL and MariaDB.
We have 4 :) The original MySQL one which all of the vendors use. Drizzle, the MySQL (Sun/Oracle) created one, and the MariaDB one. Most of the vendors who have ported to Drizzle haven't had issues. Depending on what the other ones turn out to be like... I somewhat suspect this won't be much of an issue. We could all simplify and pick some sort of standard, like say modeling from BDB, but... that would require a bit of working together. Cheers, -Brian BTW I suspect for Drizzle we will resurrect BDB at some point in the future mainly to provide this sort of interface. From talking to Josh Berkus, there may be a standard interface out there defined by some committee that could also be used (I believe they are looking at this for Postgres).
On Wed, Jul 8, 2009 at 4:24 PM, Brian Aker<brian@tangent.org> wrote:
Hm, well there's SQL/MED which is support for federated tables. Ie, declaring a table which lives off in another database and accessing it requires issuing a query against that remote database. Federated tables do have some things in common with MySQL-style pluggable storage engines in that the different remote databases may support different access paths and require delegating a lot of intelligence to the remote database to be able to exploit them. You also want to be able to push joins between two remote tables to the remote database etc. It's also natural to express things like text files or spreadsheets as SQL/MED data sources. But the SQL/MED standard is just about the SQL syntax to declare these things. It doesn't really describe the internal API or get into any implementation details. -- greg http://mit.edu/~gsstark/resume.pdf
Konstantin Osipov wrote:
Hi! I'm not sure what you mean by "look every cursor with thr_lock"? Do you mean "lock every cursor with thr_lock"? If so, we're in the process of removing the thr_lock stuff entirely (a lot easier now that MyISAM is TEMP TABLE ONLY in Drizzle...
Agreed. Part of the work we are doing is reducing and simplifiying the Handler class before it gets converted to a stand-alone Cursor class that can be allocated, used, and destroyed without worrying about thr_lock mess... -jay
* Jay Pipes <Jay.Pipes@Sun.COM> [09/07/09 00:02]:
What is the suggested way of getting hold of a cursor? open_table(), correct? But open_table() is overloaded: - it protects the table from being altered in another connection - it has bits of StorageInterface in it. - finally, it's a cursor. So the current way to read from a storage engine cursor, and you're not changing that it seems, is to get hold of a TABLE instance, a handler instance, and then do rnd_init()/rnd_next()/rnd_end() on it. If you want to read from two different indexes of the same table simultaneously, you need two TABLE and handler instances. This is at the core of the problem. Instead, one needs to be able to do: Table *table= storage_interface->get_table(); Cursor *cursor1= table->create_rnd_scan_cursor(); Cursor *cursor2= table->create_index_scan_cursor(index); cursor1->open(); cursor2->open(); cursor1->read_next(); cursor2->read_next(); printf("First row in index: %s, First row physically %s\n", cursor1->record[0].to_string(), cursor2->record[0].to_string()); delete cursor1; delete cursor2; delete table; If you just move stuff that logically belongs to StorageInterface, and then rename handler to cursor you still have to: - have an own handler instance for each independent read/scan or update. An update is also part of the cursor API, but more advanced one - class Updatable_cursor: public Cursor. - have an own TABLE instance for each independent read/scan/update. Because of table->field[], table->auto_increment* and table->record[], which is, really, not part of the table, but a structure to exchange data with the table. - use a different method depending on whether you want to read the next record from an index (index_next), or in physical order (rnd_next()) To change the server to use the new cursor API you need to change the execution layer -- JOIN::execute(), READ_RECORD, join_read* functions and sub_select code, the parser to not add another TABLE_LIST to thd->lex->query_tables for each time a table is mentioned in a statement, open_tables to open no more than one instance of the same table, the table cache, to not hold more than one instance of the same table. Put it this way: once you have a correct solution for the problem, Bug#7742 will get solved naturally.
--
Konstantin Osipov wrote:
Exactly! Too much going on in the handler class and in methods like open_table() :)
This is actually very similar to the efforts underway in my local trees. Although, I have to admit, I've tried killing the term "rnd" since nobody seems to understand what the heck it means ;)
Sure, but first things first...get the basic stuff fine-tuned and working correctly before we try to bite off more than we can chew (or that the kernel should chew itself vs. a plugin...) :)
These are more appropriately handled by a TableShare with a built-in version or timestamp, no?
This is fairly simple to do with abstract base classes and subclass implementations in C++... Basically, the READ_RECORD structure currently does this work in a C-style way, no?
Yes, exactly. This is the work we are currently doing.
Put it this way: once you have a correct solution for the problem, Bug#7742 will get solved naturally.
Cool :)
participants (7)
-
Brian Aker
-
Greg Stark
-
Jay Pipes
-
Konstantin Osipov
-
Peter Benjamin Volk
-
Sergei Golubchik
-
Stewart Smith