[Maria-discuss] Rocks, toku and some performance considerations.
Hi everyone,

I recently got interested in RocksDB and have a few questions, in case anyone else is doing or has done a migration to this engine:

1. Compared to TokuDB, the I/O overhead for range SELECTs with cold caches is much higher. Especially on VMware, I/O wait goes over 50% of available CPU with the standard cfq scheduler and 4 cores. Switching to noop helps: utilization drops roughly tenfold, to 5-10%. It doesn't matter whether we're doing a full table scan or a range scan over an index. I assumed that with an LSM there would be almost no IOPS and negligible I/O utilization, because all you need to do is read one long, contiguous block from disk (we're using SSDs). Is there a reason for this, or am I doing something incorrectly (default config + 6 GB RocksDB key cache)? And why does it work so badly with cfq and so much better with noop?

2. With keys 100% cached, RocksDB is still around 2-3 times slower than TokuDB for range scans, and even considerably slower than InnoDB. As I understand it, the RocksDB key cache stores uncompressed keys. Is there a performance issue, e.g. with copying data from the global key cache to local storage, that is going to be fixed? Or is this inherent in how RocksDB works internally, with no way of making it faster in the future?

3. In general, it was a huge surprise that TokuDB, which (at least it seems) is so complicated to implement and is based on a very complex B-tree variant, has so much better read performance and I/O characteristics than a "simple" (as I understand it) sorted list, which could probably just be read in bulk like a simple log file.

4. I found some information that RocksDB databases over 100 GB in size are not recommended (which seems tiny; we were able to work with MyISAM tables close to 2 TB). Also, merging data could make the database temporarily twice as big. Are there any plans to implement one-file-per-table as in InnoDB?

5. What's the future of TokuDB? I understand that Percona is considering dropping it. Will you take over development if that happens, or are you going to obsolete it and focus on RocksDB?

From what I saw, RocksDB and TokuDB have totally different characteristics. RocksDB is decent for point reads (it seems faster than TokuDB when the number of rows read is low) and seems to require less memory than TokuDB, but for range scans it isn't anywhere close. So the two engines may have completely different use cases (e.g. TokuDB is great for long-running statistical queries and servers with a lot of memory).

Thanks
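The cfq versus noop comparison in question 1 depends on which scheduler is actually active for the data device. On Linux the kernel exposes this in `/sys/block/<device>/queue/scheduler`, with the active entry shown in square brackets. The following is only a minimal sketch for checking it; the function names are made up for illustration and `sda` is a placeholder device name, not something taken from the original post.

```python
from pathlib import Path


def parse_active_scheduler(line: str) -> str:
    """Return the scheduler name shown in [brackets], e.g. 'cfq'."""
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Assumption: if nothing is bracketed (some devices report a bare
    # "none"), fall back to the stripped line itself.
    return line.strip()


def active_scheduler(device: str) -> str:
    """Read the active I/O scheduler for a block device such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_active_scheduler(path.read_text())


# Demo on a sample line, so it runs without access to /sys.
print(parse_active_scheduler("noop deadline [cfq]"))  # prints "cfq"
```

On a real host, `active_scheduler("sda")` would read the live value. Switching schedulers is done by writing the new name to the same file as root, and the change normally does not persist across reboots unless configured elsewhere.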
Hi!
It seems MariaDB is dropping TokuDB in 10.5. See https://jira.mariadb.org/browse/MDEV-19780
BR, Jocelyn Fournier
On 12 Sept. 2019 at 14:31, pslawek83 <pslawek83@o2.pl> wrote:
Thanks
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help : https://help.launchpad.net/ListHelp
Hello,

how did you configure MyRocks/RocksDB? Have you set up bloom filters? Maybe you could post the rocksdb options from the my.cnf in use here.

This link was helpful to me when setting up MyRocks: https://smalldatum.blogspot.com/search?q=myrocks+options

Regards
Jonas

On Thu, 12 Sept. 2019 at 14:44, jocelyn fournier <jocelyn.fournier@gmail.com> wrote:
Hi!
It seems MariaDB is dropping TokuDB in 10.5. See https://jira.mariadb.org/browse/MDEV-19780
BR, Jocelyn Fournier
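For anyone wanting to answer the request above to post the rocksdb options from my.cnf, a small helper can pull just those lines out of the file. This is a rough sketch, not an official tool: the function name is hypothetical, the sample values are illustrative only (not tuning advice), and it does not follow `!include` directives or strip trailing inline comments.

```python
def rocksdb_options(cnf_text: str) -> dict:
    """Collect rocksdb-related settings from my.cnf-style text."""
    options = {}
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, [section] headers and !include directives.
        if not line or line[0] in "#;[!":
            continue
        key, _, value = line.partition("=")
        # Option names accept '-' and '_' interchangeably; normalize to '_'.
        key = key.strip().replace("-", "_")
        # Drop the optional "loose_" prefix used for plugin options.
        if key.startswith("loose_"):
            key = key[len("loose_"):]
        if key.startswith("rocksdb"):
            options[key] = value.strip()
    return options


# Hypothetical sample config; the values are for illustration only.
SAMPLE = """
[mysqld]
innodb_buffer_pool_size = 1G
rocksdb_block_cache_size = 6G
loose-rocksdb-default-cf-options = block_based_table_factory={filter_policy=bloomfilter:10:false}
"""

for name, value in rocksdb_options(SAMPLE).items():
    print(f"{name} = {value}")
```

In practice the text would come from the server's actual option file, and the output can be pasted into a reply so that others can see whether bloom filters and cache sizes are configured.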
It seems MariaDB is dropping TokuDB in 10.5. See https://jira.mariadb.org/browse/MDEV-19780
BR, Jocelyn Fournier
Why oh why ... it's such a good engine for large / archive type data tables (great compression / no fragmentation).
All our attempts to use rocksdb have failed so far with random issues :(
rr
Hi, Reinis! On Sep 12, Reinis Rozitis wrote:
Why oh why ... it's such a good engine for large / archive type data tables (great compression / no fragmentation).
MDEV-19780 has the reason. Quoting https://www.percona.com/doc/percona-server/8.0/tokudb/tokudb_intro.html:

"TokuDB is deprecated in the 8.0 series and will be supported through the 8.0 series until further notice. This storage engine will not be included in the next major release of Percona Server for MySQL. We recommend MyRocks as a long-term migration path."

Percona is the owner and the maintainer of TokuDB. We package it, we fix MariaDB-specific bugs, but most TokuDB bugs are reported to and fixed by upstream. When Percona deprecates TokuDB, we don't have much of a choice but to do the same.
All our attempts to use rocksdb have failed so far with random issues :(
Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org
participants (5)
- jocelyn fournier
- Jonas Krauss
- pslawek83
- Reinis Rozitis
- Sergei Golubchik