[Maria-discuss] DBMS vs Storage Engine Alone
Sorry to ask this question here, since it is not strictly MariaDB-related, but I need the knowledge of people that develop with databases: is there significant overhead to using a DBMS versus a storage engine directly? I have a project for which speed / latency is very important, and I was wondering if I could make significant performance gains by using a storage engine (such as WiredTiger, RocksDB, LevelDB, etc.) directly instead of using a DBMS (like MariaDB). More concretely, are there any studies out there on this to give me some objective numbers on this? Assume it is over a local loopback connection, so pretty much only the overhead of the networking itself would apply. Thank you in advance, - Jonathan M. Wilbur
On Fri, Dec 16, 2022 at 12:44 PM Jonathan M. Wilbur <jonathan@wilbur.space> wrote:
Sorry to ask this question here, since it is not strictly MariaDB-related, but I need the knowledge of people that develop with databases: is there significant overhead to using a DBMS versus a storage engine directly?
DBMS add a lot of query functionality, enforces ACID constraints, but also tries to make generalized tradeoffs favouring read performance and consistency and doing its best to minimize the write speed impacts. So in general you're better picking a solution than rolling your own.
I have a project for which speed / latency is very important,
Read speed or write speed? What about durability? Is the read access on recent writes or archival? Are reads complex?
and I was wondering if I could make significant performance gains by using a storage engine (such as WiredTiger, RocksDB, LevelDB, etc.) directly instead of using a DBMS (like MariaDB). More concretely, are there any studies out there on this to give me some objective numbers on this?
The MariaDB layer aims to be as minimal as possible. The only test case that really matters is your workload. I'd start modeling the client as a benchmark on this and see how it goes on MariaDB. See if the numbers seem fair. Modelling an alternate implementation should be easy after this.
Assume it is over a local loopback connection, so pretty much only the overhead of the networking itself would apply.
Thank you in advance,
- Jonathan M. Wilbur _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
More concretely, are there any studies out there on this to give me some objective numbers on this?
Not sure about studies but you can read about HandlerSocket plugin [1] to get some insights and numbers (from that time). [1] http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-fo... rr
On Fri, Dec 16, 2022 at 3:46 AM Jonathan M. Wilbur <jonathan@wilbur.space> wrote:
Sorry to ask this question here, since it is not strictly MariaDB-related, but I need the knowledge of people that develop with databases: is there significant overhead to using a DBMS versus a storage engine directly?
I was thinking about this recently. For some predetermined access patterns, it might pay off to translate the SQL statements directly into executable code that would directly access the buffer pool (page cache) of the storage engine. In terms of InnoDB, the generated code could work on the level of mini-transactions (atomic access to multiple buffer pages). The "class handler" interface that HandlerSocket uses still involves conversions between row formats, for example. I did not look into details, but https://github.com/KowalskiThomas/LLVMSQLite seemed interesting to me. Of course, we have to keep in mind that SQLite is a much simpler architecture, maybe not at all suitable for high-concurrency applications, but it could make sense in the embedded space or in some read-mostly settings. In a general purpose database, some just-in-time compilation could help with more complex queries. It would be a trade-off between memory management, time to compile, and time to run the compiled vs. interpreted code path. For a stream of simple arbitrary queries, interpreting might win. Best regards, Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB Corporation
participants (4)
-
Daniel Black
-
Jonathan M. Wilbur
-
Marko Mäkelä
-
Reinis Rozitis