[Maria-developers] MariaDB scalability: mutexes
Hi Axel,

I benchmarked various mutexes available to MariaDB. Results are here: http://svoj-db.blogspot.ru/2014/02/mariadb-mutexes-scalability.html

Among other things I noticed that normal mutexes scale better than adaptive ones. It looks like this gives a gain of over 5 000 TPS in my benchmarks:

Original
--------
64 threads, time spent: 60s, queries executed: 11500723, qps: 191678, 1 thread qps: 2994, tps: 13691
64 threads, time spent: 60s, queries executed: 10983665, qps: 183061, 1 thread qps: 2860, tps: 13075
64 threads, time spent: 60s, queries executed: 11080325, qps: 184672, 1 thread qps: 2885, tps: 13190

slow LOCK_open
--------------
64 threads, time spent: 60s, queries executed: 12948365, qps: 215806, 1 thread qps: 3371, tps: 15414
64 threads, time spent: 60s, queries executed: 13191584, qps: 219859, 1 thread qps: 3435, tps: 15704
64 threads, time spent: 60s, queries executed: 13050647, qps: 217510, 1 thread qps: 3398, tps: 15536

slow LOCK_open + THR_LOCK::mutex
--------------------------------
64 threads, time spent: 60s, queries executed: 15863174, qps: 264386, 1 thread qps: 4131, tps: 18884
64 threads, time spent: 60s, queries executed: 15761944, qps: 262699, 1 thread qps: 4104, tps: 18764
64 threads, time spent: 60s, queries executed: 15773239, qps: 262887, 1 thread qps: 4107, tps: 18777

Could you do official benchmarks (patch attached)? It would also be nice to benchmark MySQL-5.6 with the thr_lock part of the patch.

Thanks,
Sergey
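For readers who want to check the "over 5 000 TPS" figure, here is a minimal sketch that averages the three runs per configuration and compares each against the original build. The tps values are copied from the message above; the function name `summarise` and the dictionary layout are illustrative assumptions, not part of any benchmark tooling.

```python
from statistics import mean

# tps values copied from the three runs per configuration quoted above
runs = {
    "original": [13691, 13075, 13190],
    "slow LOCK_open": [15414, 15704, 15536],
    "slow LOCK_open + THR_LOCK::mutex": [18884, 18764, 18777],
}


def summarise(runs, baseline="original"):
    """Return {name: (mean tps, absolute gain, gain in percent)} vs. the baseline."""
    base = mean(runs[baseline])
    summary = {}
    for name, tps in runs.items():
        avg = mean(tps)
        summary[name] = (avg, avg - base, 100.0 * (avg - base) / base)
    return summary


for name, (avg, gain, pct) in summarise(runs).items():
    print(f"{name:34s} mean tps {avg:9.1f}  gain {gain:7.1f} ({pct:5.1f}%)")
```

With these numbers the fully patched build averages roughly 5 500 TPS more than the original, which is about a 41% increase.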
Hi Svoj,

Sergey Vojtovich wrote:
I benchmarked various mutexes available to MariaDB. Among other things I noticed that normal mutexes scale better than adaptive.
Could you do official benchmarks (patch attached)? It would be nice to also benchmark MySQL-5.6 with thr_lock part of the patch.
Here are sysbench results for 10.0.8. I added 10.0.7 and MySQL-5.6.10 from an earlier run for comparison. The test is sysbench OLTP read-only with multiple (32) tables.

The normal mutexes give significantly better throughput. The raw data also contains the performance schema snapshots with the mutex numbers. The top mutex is still LOCK_open.

I am currently running another sysbench OLTP benchmark, this time also read-write and with both lower and higher concurrency. I added 10.0.8-patched to it, but it won't finish before tomorrow.

XL
Hi Axel,

thanks for benchmarking it!

On Wed, Feb 26, 2014 at 02:30:50PM +0100, Axel Schwenke wrote:
Hi Svoj,
Sergey Vojtovich wrote:
I benchmarked various mutexes available to MariaDB. Among other things I noticed that normal mutexes scale better than adaptive.
Could you do official benchmarks (patch attached)? It would be nice to also benchmark MySQL-5.6 with thr_lock part of the patch.
Here are sysbench results for 10.0.8. I added 10.0.7 and MySQL-5.6.10 from an earlier run for comparison. The test is sysbench OLTP ro with multiple (32) tables.
The normal mutexes give significantly better throughput. The raw data contains also the snapshots from performance schema with the mutex numbers. Top mutex is still LOCK_open.

In my benchmark I got about a 40% throughput increase, whereas in your benchmark it is about 10%. Still not bad.

But you say that there are 32 tables now. That greatly offloads work from THR_LOCK::mutex (which is per-table) to LOCK_open (which is global). I believe we should see a bigger difference for a single-table load. Anyway, a 10% increase looks worthwhile. I'll try to get this patch into 10.0.9. I hope it won't hurt other loads (according to my microbenchmark it shouldn't).

I currently run another sysbench OLTP, this time also r/w and with both lower and higher concurrency. I added 10.0.8-patched to this, but it won't finish before tomorrow.

Great! Please keep me updated.
Thanks, Sergey
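The point about 32 tables shifting work from THR_LOCK::mutex to LOCK_open can be illustrated with a toy calculation. This is only a sketch under loudly stated assumptions: every statement takes the global LOCK_open once and the THR_LOCK::mutex of one table, and tables are chosen uniformly. The function name `avg_threads_per_lock` is hypothetical and nothing here is measured.

```python
def avg_threads_per_lock(threads: int, tables: int) -> dict:
    """Average number of threads sharing each lock instance.

    Toy model (an assumption, not a measurement): each statement takes the
    global LOCK_open once and the THR_LOCK::mutex of one uniformly chosen table.
    """
    return {
        "LOCK_open": float(threads),          # a single global instance
        "THR_LOCK::mutex": threads / tables,  # one instance per table
    }


# 64 threads as in the original benchmark; 1 table vs. 32 tables
for tables in (1, 32):
    print(tables, avg_threads_per_lock(64, tables))
```

In this model a single table leaves all 64 threads on one THR_LOCK::mutex, while 32 tables leave about 2 threads per table lock and make the global LOCK_open the dominant contention point. That is consistent with the expectation of a bigger difference for a single-table load.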
Hi Svoj,

you seemed to be interested in an OLTP benchmark with a single table, so here it is.

Axel Schwenke wrote:
Here are sysbench results for 10.0.8. I added 10.0.7 and MySQL-5.6.10 from an earlier run for comparison. The test is sysbench OLTP ro with multiple (32) tables.
Attached are more results. tps.dat shows numbers for the same benchmark as above, but using a single big table. Again the normal mutexes show some benefits, but only at elevated concurrency. Mutex statistics are again in the zip files.

The other benchmark (tps1.dat) is sysbench OLTP, both read-only and read-write. It also tests concurrencies from 1 to 512. It shows basically the same behavior. The benchmark was run with PFS=off, so there are no mutex stats.

BR, XL
Hi Axel,

On Thu, Feb 27, 2014 at 01:15:14PM +0100, Axel Schwenke wrote:
Hi Svoj,
you seemed to be interested in a OLTP benchmark with a single table, so here it is.

Yes, thanks a lot! Good to see that our results are almost in line. It is also good to see that we finally outperform 5.6.10 at higher thread counts. :)
Thanks, Sergey
Hi Axel,

Sergei pointed out that there is up to a 10% slowdown in the read-write benchmark at high concurrency. Could you share the benchmark details? I'd like to debug it. Just pointers to the build/run scripts on the benchmark server should be enough.

Thanks,
Sergey

On Thu, Feb 27, 2014 at 01:15:14PM +0100, Axel Schwenke wrote:
Hi Svoj,
you seemed to be interested in a OLTP benchmark with a single table, so here it is.
Axel Schwenke wrote:
Here are sysbench results for 10.0.8. I added 10.0.7 and MySQL-5.6.10 from an earlier run for comparison. The test is sysbench OLTP ro with multiple (32) tables.
Attached more results. tps.dat shows numbers for the same benchmark as above, but using a single big table. Again the normal mutexes show some benefits, but only at elevated concurrency. Mutex statistics are again in the zip files.
The other benchmark (tps1.dat) is sysbench OLTP both read-only and read-write. It also tests concurrencies from 1 to 512. This shows basically the same behavior. The benchmark was with PFS=off, so no mutex stats.
BR, XL
Moin Svoj,

Sergey Vojtovich wrote:
Sergei pointed out that there is up to 10% slowdown in read-write benchmark at high concurrency. Could you share benchmark details, I'd like to debug it. Just pointers to build/run scripts on the benchmark server should be enough.
The benchmark was run on lizard2.
Build script is ~/bin/xl_build_new
Benchmark runs under ~/benchmark/sysbench

series46 ... ro, single table, PFS=on, 32-128 threads
series47 ... ro, single table, PFS=off, 32-128 threads
series48 ... ro, multi table, PFS=on, 32-128 threads
series49 ... ro, multi table, PFS=off, 32-128 threads
series50 ... rw, multi table, PFS=off, 1-512 threads

HTH, XL
Hi Axel,

it looks like the locking profile of thr locks under a read-write load is more complex. Probably the fast mutex behaves better in this scenario. OTOH table cache lock durations do not depend on the load type.

Could you benchmark it again with the following patches (attached):
1. unpatched
2. with thr lock patched
3. with table cache lock patched
4. with thr and table cache locks patched
...so we can pick the best combination.

Thanks,
Sergey

On Mon, Mar 03, 2014 at 09:37:01AM +0100, Axel Schwenke wrote:
Moin Svoj,
Sergey Vojtovich wrote:
Sergei pointed out that there is up to 10% slowdown in read-write benchmark at high concurrency. Could you share benchmark details, I'd like to debug it. Just pointers to build/run scripts on the benchmark server should be enough.
The benchmark was run on lizard2. Build script is ~/bin/xl_build_new
Benchmark runs under ~/benchmark/sysbench
series46 ... ro, single table, PFS=on, 32-128 threads
series47 ... ro, single table, PFS=off, 32-128 threads
series48 ... ro, multi table, PFS=on, 32-128 threads
series49 ... ro, multi table, PFS=off, 32-128 threads
series50 ... rw, multi table, PFS=off, 1-512 threads
HTH, XL
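The four builds requested in the message above (unpatched, thr lock patched, table cache lock patched, both patched) form a 2x2 matrix of two independent patches. A minimal sketch of enumerating that matrix follows; the key names `thr_lock_patch` and `table_cache_patch` and the function `patch_matrix` are hypothetical labels for illustration and do not correspond to real build flags or to the scripts on the benchmark server.

```python
from itertools import product


def patch_matrix():
    """Enumerate every on/off combination of the two patches.

    The key names are illustrative labels only, not real build options.
    """
    builds = []
    for table_cache, thr_lock in product((False, True), repeat=2):
        builds.append({"thr_lock_patch": thr_lock, "table_cache_patch": table_cache})
    return builds


# Prints the builds in the same order as the numbered list in the message
for number, build in enumerate(patch_matrix(), start=1):
    print(number, build)
```

Running all four variants makes it possible to separate the effect of each patch from the effect of the pair.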
That is an excellent blog post
On Mon, Feb 24, 2014 at 6:22 AM, Sergey Vojtovich
Hi Axel,
I benchmarked various mutexes available to MariaDB. Results are here: http://svoj-db.blogspot.ru/2014/02/mariadb-mutexes-scalability.html
Among other things I noticed that normal mutexes scale better than adaptive. And it looks like it gives over 5 000 TPS in my benchmarks:
Original
--------
64 threads, time spent: 60s, queries executed: 11500723, qps: 191678, 1 thread qps: 2994, tps: 13691
64 threads, time spent: 60s, queries executed: 10983665, qps: 183061, 1 thread qps: 2860, tps: 13075
64 threads, time spent: 60s, queries executed: 11080325, qps: 184672, 1 thread qps: 2885, tps: 13190

slow LOCK_open
--------------
64 threads, time spent: 60s, queries executed: 12948365, qps: 215806, 1 thread qps: 3371, tps: 15414
64 threads, time spent: 60s, queries executed: 13191584, qps: 219859, 1 thread qps: 3435, tps: 15704
64 threads, time spent: 60s, queries executed: 13050647, qps: 217510, 1 thread qps: 3398, tps: 15536

slow LOCK_open + THR_LOCK::mutex
--------------------------------
64 threads, time spent: 60s, queries executed: 15863174, qps: 264386, 1 thread qps: 4131, tps: 18884
64 threads, time spent: 60s, queries executed: 15761944, qps: 262699, 1 thread qps: 4104, tps: 18764
64 threads, time spent: 60s, queries executed: 15773239, qps: 262887, 1 thread qps: 4107, tps: 18777
Could you do official benchmarks (patch attached)? It would be nice to also benchmark MySQL-5.6 with thr_lock part of the patch.
Thanks, Sergey
-- Mark Callaghan mdcallag@gmail.com
participants (3): Axel Schwenke, Mark Callaghan, Sergey Vojtovich