Axel Schwenke <axel@askmonty.org> writes:
The poor performance of MySQL with 1 database is expected. However MySQL is much better with 32 databases. Even better than MariaDB :(
Ok, so I looked closer at the numbers (of which there were quite a few, thanks for doing this work :-). I have some idea of what could be going on.

The short story is that for this kind of configuration, it is necessary to set @@slave_domain_parallel_threads to at most the value of @@slave_parallel_threads divided by the number of domain ids in use. E.g. with 32 databases/domains: --slave-parallel-threads=64 --slave-domain-parallel-threads=2. Otherwise the domains will starve each other for threads, leading to poor parallelism. In fact, with such high domain parallelism at 32 databases, I think just using --slave-domain-parallel-threads=1 could give much better performance.

The longer story: MariaDB parallel replication uses a single pool of worker threads, of size @@slave_parallel_threads. GTIDs are read serially from the relay log by the SQL driver thread and scheduled to worker threads. Each replication domain is scheduled independently of all other domains. Each GTID is allocated to a new worker thread if possible. If no worker thread is available (because all workers already have at least one GTID queued for them), then the SQL driver thread waits for one to become free; this stalls _all_ domains from getting more work queued.

When your benchmark starts, the SQL driver thread will quickly read 64 events from the relay logs and queue each to a separate worker thread. Then no more free worker threads are available, and the SQL thread has to wait. With 64 worker threads and 32 domains, it is quite likely that some domains will not have any work to do. At the same time, the group commit size will most likely be much less than 64, so many workers will have to wait for the previous GTID in their domain without being able to do any work, especially in the "fast" configuration where commit (which can always run in parallel) does not take much time.
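To make the suggested setup concrete, the thread limits could be put in my.cnf on the slave like this (the values are just the example from above for 32 domains, not a general recommendation):

```ini
[mysqld]
# 64 workers shared by 32 replication domains; cap each domain at
# 64/32 = 2 workers so no domain can starve the others out of the pool.
slave_parallel_threads        = 64
slave_domain_parallel_threads = 2
```

With slave_domain_parallel_threads=1 instead, each domain gets a single dedicated worker, as suggested above.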
The result will be a lot of threads waiting on each other and waking each other up, and little real work getting done. By setting @@slave_domain_parallel_threads, it is possible to limit how many worker threads a single domain is allowed to grab, thereby ensuring that worker threads remain available for all domains to be actively doing work. That limited set of threads is used round-robin when queuing work; this also reduces thread sleeps and wakeups, since with luck a worker thread that completes one commit will already have more work queued up that it can process without first returning to the thread pool and going to sleep.

In particular, with @@slave_domain_parallel_threads=1, we will have just 32 threads, one per domain, each processing events independently in parallel at full speed, which is probably the best way to handle a workload as evenly distributed across domains as this one.

It would be very useful to see new benchmarks with those settings. And if time permits, @@slave_domain_parallel_threads=2 could also be interesting; maybe there would be some extra performance due to more group commit in the "peer" configuration, or maybe the cost of repeatedly alternating between two worker threads within a domain will reduce performance.

We can actually see some evidence for this theory in the numbers. Where the number of worker threads is much larger than the number of domains (so starvation is less of a problem), MariaDB does much better, even better than MySQL 5.6 in some cases. But performance drops as the number of domains increases, possibly because starvation kicks in.

Still, even if this theory holds, we do see 5-10% slower performance for MariaDB at e.g. slave@1 compared to MySQL. This suggests some higher overhead somewhere, which I need to look into at some point, though it will be a while yet before I have time for that, I am afraid.
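As a back-of-the-envelope illustration of the starvation effect (my own sketch, not a measurement from the benchmark): if the first 64 queued events were spread uniformly at random over the 32 domains, the expected number of domains that receive no worker thread at all can be computed directly:

```python
# Expected number of "starved" domains when 64 events, drawn uniformly
# at random over 32 domains, each occupy a fresh worker thread.
# A domain is starved if none of the 64 events belong to it; for one
# domain that happens with probability ((DOMAINS - 1) / DOMAINS) ** WORKERS.
DOMAINS = 32
WORKERS = 64

p_starved = ((DOMAINS - 1) / DOMAINS) ** WORKERS
expected_starved = DOMAINS * p_starved

print(expected_starved)  # roughly 4 of the 32 domains get no worker
```

So even in this idealised uniform model, around 4 domains sit idle while all 64 workers are taken; the real relay-log event order is bursty, which should only make it worse.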
And in any case, this benchmark is highly useful as a test - as I said before, the domain_id parallelism has received very little serious load testing so far, so it is quite possible that there are simply some bugs that cause things to work poorly. The extra testing with @@slave_domain_parallel_threads=1 (and possibly 2) will be very useful to help understand this better.
I think 32 databases is a bit too much; not only does MySQL shine so brightly there, it is also unrealistic in the real world. What do you think? I can rerun the benchmark with 4 or 8 databases (it will take 4-5 hours). Quite certainly MySQL will have worse numbers then :)
Well, I think 32 databases is not that unrealistic - shared hosting, for example, could have many more. The completely even spread of the load among those databases, however, is probably unrealistic. But I think we should investigate and solve the issues for MariaDB @ 32 databases. More databases, and thus more parallelism, should be better for parallel replication; if it is worse, then that is a bug that should be fixed. Even if it is just the need to configure @@slave_domain_parallel_threads, maybe we could document that better.
I also notice that for MariaDB 10.0 the tuning of the commit-wait-* variables is critical. Settings that give many transactions per commit group tend to slow down the master a lot. The default settings give only a few transactions per commit group.
Right, this is something to be aware of. Note that for a typical web client application or similar, --commit-wait-* probably will not be much of an issue. An individual client will see slightly higher latency, which should not matter much if it is on the order of a few milliseconds. Throughput is not likely to be affected; just the number of concurrent connections will increase (at least as long as the system does not get choked). On the other hand, a load with fixed parallelism, like batch processing or your sysbench runs, can be hurt significantly on throughput by --commit-wait-*, as your tests show.

Also, I have heard from several users that the ability to throttle a busy master, in order to prevent it from running too far ahead of the slaves, is actually often seen as a benefit rather than a performance problem. Semi-sync replication is also sometimes used to achieve similar throttling. So yes, it is a tuning parameter that is unfortunately rather sensitive to the actual load and applications.
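For reference, the knobs in question are binlog_commit_wait_count and binlog_commit_wait_usec; the values below are just illustrative, not a recommendation, since as discussed the right trade-off depends entirely on the workload:

```ini
[mysqld]
# Wait up to 100 ms, or until 20 transactions are pending, before
# writing a group commit to the binlog. Larger values give bigger
# commit groups (more slave parallelism) at the cost of higher
# per-transaction commit latency on the master.
binlog_commit_wait_count = 20
binlog_commit_wait_usec  = 100000
```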
Please also check the "master" page of the comparison sheet. It shows the cost of turning on the binlog. Both MySQL and MariaDB show significant performance loss at high concurrency when the binlog is turned off. This is completely counter-intuitive. Kristian, can this be related to MDEV-5802?
It might be related. Another possibility is that this is due to InnoDB's known weaknesses at high concurrency. When the binlog is enabled, group commit will handle the commit of a lot of transactions serially, in a tight loop inside a single thread. When the binlog is disabled, each of those transactions commits in its own separate thread, with all of those threads heavily contending with each other for the hot mutexes inside InnoDB. It would need a lot of research to be sure that this is the cause, but it is a plausible guess, at least.

Thanks for your efforts so far! Overall, I think the results look pretty good, actually. We see a good speedup from MariaDB parallel replication in many of the configurations. And your comparison shows that with group commit, a large percentage of the speedup can be obtained automatically, without any need to change applications and partition the data into separate databases.

 - Kristian.