[Maria-developers] Can there be a better storage engine API?
The storage engine API has never been my favorite part of MySQL. I haven't written a storage engine but I used to maintain a custom storage engine and migrated that from 4.0 to 5.0 so I had some experience with the API. Is there a different API that would make it easier for innovation to come to MySQL/MariaDB? For example there is a lot going on in the write-optimized database space. Until recently we had nothing for MySQL open-source. Now we have open source TokuDB (great news). I haven't read their source but I have been told they use something that was derived from the BerkeleyDB API. Would the BerkeleyDB API make it easier to get new storage engines into MySQL? For example, we don't have an engine optimized for in-memory workloads. And while InnoDB will have great cache hit rates when all data is in-memory. It might write much more data to disk than required. Do we need a better API or is the current code good enough? I hope to hear from people who have done serious storage engine work. -- Mark Callaghan mdcallag@gmail.com
I've worked on the TokuDB storage engine for quite a while now. I have had many experiences over the years, so I guess it's hard to know where to begin. I guess I will start small, and if the conversation evolves, I can contribute more thoughts. I think the current API is really good, as evidenced by the fact that many storage engines have used it to plug into MySQL. The two areas that I see we can really benefit from are the following: - documentation of the API, discussing clearly what the contracts for the functions are. There are still several functions I am not fully clear on, but am glad that they "just work". This includes documentation on the locking mechanisms. For instance, clear documentation on what external_lock, store_lock, and other related functions do would be great. - general API cleanup. In too many places, the functions just assume things, such as the buffer passed into write_row is the same as table->record[0]. In other instances, the well defined way to answer certain questions is not obvious. For instance: - If I want information on the fields, do I use the table object or table_share? - There seem to be several ways to determine if a field is NULL - There is redundant information in the KEY* structure and the fields that make up the KEY. I recall when working on MySQL 5.1, this information was inconsistent and it led to bugs, such as MySQL bug 37292 - I am sure there are plenty of other instances of this that I can find. - Transactions. The API seems to not be designed with transactions taken into account. Transactions are started in some cases of external_lock, or in start_stmt. Transactions are committed either in the handlerton's commit function or in some cases in external_lock. A clear API that takes transactions into consideration would be quite helpful. Those are my early thoughts. I am happy to share more if people are interested. -Zardosht On Tue, Aug 13, 2013 at 6:41 PM, MARK CALLAGHAN <mdcallag@gmail.com> wrote:
The storage engine API has never been my favorite part of MySQL. I haven't written a storage engine but I used to maintain a custom storage engine and migrated that from 4.0 to 5.0 so I had some experience with the API.
Is there a different API that would make it easier for innovation to come to MySQL/MariaDB? For example there is a lot going on in the write-optimized database space. Until recently we had nothing for MySQL open-source. Now we have open source TokuDB (great news). I haven't read their source but I have been told they use something that was derived from the BerkeleyDB API.
Would the BerkeleyDB API make it easier to get new storage engines into MySQL? For example, we don't have an engine optimized for in-memory workloads. And while InnoDB will have great cache hit rates when all data is in-memory. It might write much more data to disk than required.
Do we need a better API or is the current code good enough? I hope to hear from people who have done serious storage engine work.
-- Mark Callaghan mdcallag@gmail.com
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hi, Zardosht! Thanks for your feedback! I'm looking at the storage engine API from the other side, but my opinion is mostly the same as yours. Current API works, but it could be simplified/clarified in few places. There are part of it that I needed to explain many times. And I was saying, like, "it's historically so, just accept the way it is" or something along these lines. In particular, external_lock/start_stmt - that's, perhaps, the worst part in this regard. There's unnecessary boilerplate code - lines that every engine has to have, and they just copy it from each other. Even if minor, like incrementing status variables, setting auto-increment and timestamp columns, table->status, SHARE's. This is getting fixed though. I don't like the fact that some methods assume that char *buf is table->record[0]. I'd rather fix it (remove either the assumption or the method's argument). It never occurred to me that one can be confuse TABLE::fields and TABLE_SHARE::fields, but now I see that you're right. TABLE_SHARE should have kind of a Field_share object, while TABLE - Field_instance. Currently Field object inclused both, so the shared part is unnecessary duplicated and the "instance" part is unused in the TABLE_SHARE::fields. Come to think of it, almost everything I had to explain about storage engine API needs fixing :) Good parts of the API just work, there's no need to explain them. So, thanks for asking - it highlights problems! Besides, we're trying to add more features too. Like engine-specific attributes in the CREATE TABLE or the new table discovery API. Regards, Sergei On Aug 16, Zardosht Kasheff wrote:
I've worked on the TokuDB storage engine for quite a while now. I have had many experiences over the years, so I guess it's hard to know where to begin. I guess I will start small, and if the conversation evolves, I can contribute more thoughts. I think the current API is really good, as evidenced by the fact that many storage engines have used it to plug into MySQL. The two areas that I see we can really benefit from are the following: - documentation of the API, discussing clearly what the contracts for the functions are. There are still several functions I am not fully clear on, but am glad that they "just work". This includes documentation on the locking mechanisms. For instance, clear documentation on what external_lock, store_lock, and other related functions do would be great. - general API cleanup. In too many places, the functions just assume things, such as the buffer passed into write_row is the same as table->record[0]. In other instances, the well defined way to answer certain questions is not obvious. For instance: - If I want information on the fields, do I use the table object or table_share? - There seem to be several ways to determine if a field is NULL - There is redundant information in the KEY* structure and the fields that make up the KEY. I recall when working on MySQL 5.1, this information was inconsistent and it led to bugs, such as MySQL bug 37292 - I am sure there are plenty of other instances of this that I can find. - Transactions. The API seems to not be designed with transactions taken into account. Transactions are started in some cases of external_lock, or in start_stmt. Transactions are committed either in the handlerton's commit function or in some cases in external_lock. A clear API that takes transactions into consideration would be quite helpful.
Those are my early thoughts. I am happy to share more if people are interested.
Thanks for your response. On Fri, Aug 16, 2013 at 11:23 AM, Zardosht Kasheff <zardosht@gmail.com>wrote:
I've worked on the TokuDB storage engine for quite a while now. I have had many experiences over the years, so I guess it's hard to know where to begin. I guess I will start small, and if the conversation evolves, I can contribute more thoughts. I think the current API is really good, as evidenced by the fact that many storage engines have used it to plug into MySQL. The two areas that I see we can really benefit from are the following:
Many were written in the long-ago past. Besides TokuDB how many new storage engines have reached GA in the past decade? I worked on a custom storage engine and I am sure others have done the same, but there hasn't been much innovation in the public. Aria is also GA, but that was written by people who know and wrote parts of the API, so it isn't a sign that the API is something people want to use. Was TokuMX easier to implement than TokuDB? -- Mark Callaghan mdcallag@gmail.com
I think PBXT is the best example till now that the SE API actually works. Paul, who created it, also was not grown in MySQL but he managed to figure out good enough to write a complete transactional general-purpose storage engine. It did not quite reach GA level. But I don't think the API is to blame. I think Paul needed to turn his efforts towards activities earning him a living (and maybe also he realized that after Oracle's aquisition of Sun that there were no chances that PBXT would be aquired). But that does not exclude that better and more detailed docs as well as 'guides' would be useful. -- Peter Laursen On Mon, Aug 19, 2013 at 5:36 PM, MARK CALLAGHAN <mdcallag@gmail.com> wrote:
Thanks for your response.
On Fri, Aug 16, 2013 at 11:23 AM, Zardosht Kasheff <zardosht@gmail.com>wrote:
I've worked on the TokuDB storage engine for quite a while now. I have had many experiences over the years, so I guess it's hard to know where to begin. I guess I will start small, and if the conversation evolves, I can contribute more thoughts. I think the current API is really good, as evidenced by the fact that many storage engines have used it to plug into MySQL. The two areas that I see we can really benefit from are the following:
Many were written in the long-ago past. Besides TokuDB how many new storage engines have reached GA in the past decade? I worked on a custom storage engine and I am sure others have done the same, but there hasn't been much innovation in the public. Aria is also GA, but that was written by people who know and wrote parts of the API, so it isn't a sign that the API is something people want to use.
Was TokuMX easier to implement than TokuDB?
-- Mark Callaghan mdcallag@gmail.com
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hi Mark, TokuMX is a quite different beast than TokuDB. First of all, we already had the experience of integrating our engine into one database product before we started. So many kinks in the TokuKV layer had already been worked out. But more importantly, TokuMX/MongoDB doesn't have a storage engine API. I think some people thought we were going to add a storage engine API to MongoDB and then plug ourselves into it. That wasn't the goal of TokuMX, the goal was simply to get our engine inside MongoDB as fast as possible, and the way to do that was to avoid thinking about what would be a good interface and instead to just do it. As everyone here I'm sure knows, making a good storage engine API is /really/ hard. Probably the hardest things in the TokuMX integration were learning how to deal with DDL (everything in MongoDB seems to use "lazy initialization"---for DDL operations at least), finding the right model within the MongoDB code to represent transactions, and reorganizing the locking. All these things were tightly coupled with the way the MongoDB storage system works (except transactions, well, because they didn't exist), but now in TokuMX they're pretty tightly coupled with the way TokuKV does things. In a way, we've created a storage API, but the API is defined by our version of db.h and nothing else implements that with the same assumptions we have, so it's probably not useful to compare the "TokuMX storage engine API" with the one in MySQL. In short, I'd say yes it was easier, but not because MongoDB has a better API (it doesn't have one), but because we had a bit of experience and because we didn't try to create or conform to a generic API. On Mon, Aug 19, 2013 at 11:36 AM, MARK CALLAGHAN <mdcallag@gmail.com> wrote:
Thanks for your response.
On Fri, Aug 16, 2013 at 11:23 AM, Zardosht Kasheff <zardosht@gmail.com>wrote:
I've worked on the TokuDB storage engine for quite a while now. I have had many experiences over the years, so I guess it's hard to know where to begin. I guess I will start small, and if the conversation evolves, I can contribute more thoughts. I think the current API is really good, as evidenced by the fact that many storage engines have used it to plug into MySQL. The two areas that I see we can really benefit from are the following:
Many were written in the long-ago past. Besides TokuDB how many new storage engines have reached GA in the past decade? I worked on a custom storage engine and I am sure others have done the same, but there hasn't been much innovation in the public. Aria is also GA, but that was written by people who know and wrote parts of the API, so it isn't a sign that the API is something people want to use.
Was TokuMX easier to implement than TokuDB?
-- Mark Callaghan mdcallag@gmail.com
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Cheers, Leif
Hi, From experience working on two storage engines on MySQL and being main developer for the second one, and one "storage engine" on big company, I would say MySQL storage engine API has some legacy problems and clearly some missing functionality. e.g. - Some part of the storage engines are always the same, parameters are not fully given to functions or you really can't return proper error messages - There is interface to get e.g. row type, but still you need to hack handler.cc/handler.h because some values are hard coded there (show create table) - Storage engine API clearly shows that transaction concept has being introduced after the first API version was designed, still you can't really return errors on commit operation - Storage engine can announce some that it can do some functionality to upper layer, but this is very limited. Storage engines should have more control what is done on MySQL parser/optimiser level and what is done on storage engine layer. E.g. for columnar storage engine, I would love to have opportunity to inform MySQL that I can do join's myself with all conditions pushed down to attributes, but that I can't do e.g. external sorting. - For in-memory storage engine there should be API where MySQL data type and in-memory data types are exactly the same to avoid conversions, similarly in-memory buffer pool should contain directly usable types for MySQL. Thus, as a conclusion, we really would need a better and cleaner API where storage engine has more control. R: Jan
The storage engine API has never been my favorite part of MySQL. I haven't written a storage engine but I used to maintain a custom storage engine and migrated that from 4.0 to 5.0 so I had some experience with the API.
Is there a different API that would make it easier for innovation to come to MySQL/MariaDB? For example there is a lot going on in the write-optimized database space. Until recently we had nothing for MySQL open-source. Now we have open source TokuDB (great news). I haven't read their source but I have been told they use something that was derived from the BerkeleyDB API.
Would the BerkeleyDB API make it easier to get new storage engines into MySQL? For example, we don't have an engine optimized for in-memory workloads. And while InnoDB will have great cache hit rates when all data is in-memory. It might write much more data to disk than required.
Do we need a better API or is the current code good enough? I hope to hear from people who have done serious storage engine work.
-- Mark Callaghan mdcallag@gmail.com <mailto:mdcallag@gmail.com>
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Welcome back to the MySQL community Jan. It is good to see SkySQL/MariaDB get more InnoDB expertise. On Mon, Aug 19, 2013 at 9:32 AM, Jan Lindström <jplindst@mariadb.org> wrote:
Hi,
From experience working on two storage engines on MySQL and being main developer for the second one, and one "storage engine" on big company, I would say MySQL storage engine API has some legacy problems and clearly some missing functionality. e.g.
- Some part of the storage engines are always the same, parameters are not fully given to functions or you really can't return proper error messages
- There is interface to get e.g. row type, but still you need to hack handler.cc/handler.h because some values are hard coded there (show create table)
- Storage engine API clearly shows that transaction concept has being introduced after the first API version was designed, still you can't really return errors on commit operation
- Storage engine can announce some that it can do some functionality to upper layer, but this is very limited. Storage engines should have more control what is done on MySQL parser/optimiser level and what is done on storage engine layer. E.g. for columnar storage engine, I would love to have opportunity to inform MySQL that I can do join's myself with all conditions pushed down to attributes, but that I can't do e.g. external sorting.
- For in-memory storage engine there should be API where MySQL data type and in-memory data types are exactly the same to avoid conversions, similarly in-memory buffer pool should contain directly usable types for MySQL.
Thus, as a conclusion, we really would need a better and cleaner API where storage engine has more control.
R: Jan
The storage engine API has never been my favorite part of MySQL. I haven't written a storage engine but I used to maintain a custom storage engine and migrated that from 4.0 to 5.0 so I had some experience with the API.
Is there a different API that would make it easier for innovation to come to MySQL/MariaDB? For example there is a lot going on in the write-optimized database space. Until recently we had nothing for MySQL open-source. Now we have open source TokuDB (great news). I haven't read their source but I have been told they use something that was derived from the BerkeleyDB API.
Would the BerkeleyDB API make it easier to get new storage engines into MySQL? For example, we don't have an engine optimized for in-memory workloads. And while InnoDB will have great cache hit rates when all data is in-memory. It might write much more data to disk than required.
Do we need a better API or is the current code good enough? I hope to hear from people who have done serious storage engine work.
-- Mark Callaghan mdcallag@gmail.com
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Mark Callaghan mdcallag@gmail.com
On 08/19/2013 07:43 PM, MARK CALLAGHAN wrote:
Welcome back to the MySQL community Jan. It is good to see SkySQL/MariaDB get more InnoDB expertise.
Thanks, I have really enjoyed back on MySQL community. Many things are changed but many are as they were when I was moved to develop other databases by IBM. R: Jan
On 8/13/13 6:41 PM, MARK CALLAGHAN wrote:
Do we need a better API or is the current code good enough? I hope to hear from people who have done serious storage engine work.
I'm another person who did a storage engine implementation for MySQL. The Akiban adapter for MySQL did make it to GA, but was not widely adopted. Our frustrations with the existing API were it's inconsistencies. Specific aggravations were: - The Create Table data didn't include the anything not supported by the MyISAM engine. Which meant digging through the Lex structure to find Foreign Keys declarations. This added a whole layer of complexity to the create and alter table processing. - Data type access. As part of our engine, we ended up translating the MySQL data and Datatypes to our own internal one. But the interface on the types was so inconsistent, we couldn't make one call to the type to extract/insert the data for each type. - No Access to the Optimizer. The Akiban engine was doing some interesting things with data layout to improve join performance. We had no way of hinting to the Optimizer to prefer x,y join over y,z join. - In addition to the begin/end transaction, we were looking for begin and end of a statement. We were doing this because the data for, what looked to MySQL as multiple tables, query may be loaded in one large chunk. Access to the list of tables used in a single query was not available easily, and the definitive start and end processing of a query to know when to stop our processing. - Returning errors is inconsistent and dependent upon the MyISAM engine assumptions of what could and could not error. So we would have problems notifying the user that their query had failed Many of the other oddities and badness we also encountered and found a work around for.
Were I interested in writing a loosely coupled storage engine for MySQL/MariaDB I like the API and docs available for PG and SQLite. We don't have docs like that and the API was much more difficult back in the day when I supported something that did this: http://www.postgresql.org/docs/9.3/static/fdwhandler.html http://www.postgresql.org/docs/9.3/static/fdw-callbacks.html http://www.sqlite.org/vtab.html On Fri, Sep 6, 2013 at 12:10 PM, Thomas Jones-Low < thomas.jones-low@foundationdb.com> wrote:
On 8/13/13 6:41 PM, MARK CALLAGHAN wrote:
Do we need a better API or is the current code good enough? I hope to hear from people who have done serious storage engine work.
I'm another person who did a storage engine implementation for MySQL. The Akiban adapter for MySQL did make it to GA, but was not widely adopted.
Our frustrations with the existing API were it's inconsistencies. Specific aggravations were:
- The Create Table data didn't include the anything not supported by the MyISAM engine. Which meant digging through the Lex structure to find Foreign Keys declarations. This added a whole layer of complexity to the create and alter table processing.
- Data type access. As part of our engine, we ended up translating the MySQL data and Datatypes to our own internal one. But the interface on the types was so inconsistent, we couldn't make one call to the type to extract/insert the data for each type.
- No Access to the Optimizer. The Akiban engine was doing some interesting things with data layout to improve join performance. We had no way of hinting to the Optimizer to prefer x,y join over y,z join.
- In addition to the begin/end transaction, we were looking for begin and end of a statement. We were doing this because the data for, what looked to MySQL as multiple tables, query may be loaded in one large chunk. Access to the list of tables used in a single query was not available easily, and the definitive start and end processing of a query to know when to stop our processing.
- Returning errors is inconsistent and dependent upon the MyISAM engine assumptions of what could and could not error. So we would have problems notifying the user that their query had failed
Many of the other oddities and badness we also encountered and found a work around for.
______________________________**_________________ Mailing list: https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers> Post to : maria-developers@lists.**launchpad.net<maria-developers@lists.launchpad.net> Unsubscribe : https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers> More help : https://help.launchpad.net/**ListHelp<https://help.launchpad.net/ListHelp>
-- Mark Callaghan mdcallag@gmail.com
participants (7)
-
Jan Lindström
-
Leif Walsh
-
MARK CALLAGHAN
-
Peter Laursen
-
Sergei Golubchik
-
Thomas Jones-Low
-
Zardosht Kasheff