Re: [Maria-developers] FWD: Status on MDEV-4506, parallel replication

2 Sep 2013

      Michael Widenius <monty@askmonty.org> writes:
...
...
On the master, I implemented --binlog-commit-wait-count=N and
--binlog-commit-wait-usec=T. A transaction will wait at most T microseconds
for at least N transactions to queue up and be ready for group commit. This
makes it possible to deliberately delay transactions on the master in order to
get bigger group commits and thus better opportunities for parallel execution
(and again it makes testing easier).
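
As a rough illustration of the wait rule described above, here is a small
Python sketch that applies it to a list of arrival timestamps. It is not the
server code: the function name is hypothetical, and it assumes that the timer
starts when the first transaction queues up and that N is at least 1.

```python
def group_commit_release(arrivals_usec, wait_count, wait_usec):
    """Return (release_time_usec, group_size) for the first group commit.

    Assumed model (not taken from the server source): the first queued
    transaction starts a timer; the group is released as soon as
    wait_count transactions are queued, or when wait_usec has elapsed,
    whichever comes first.
    """
    if not arrivals_usec:
        raise ValueError("need at least one transaction")
    if wait_count < 1:
        raise ValueError("this sketch assumes wait_count >= 1")
    arrivals = sorted(arrivals_usec)
    deadline = arrivals[0] + wait_usec
    # Transactions that arrive no later than the deadline.
    in_window = [t for t in arrivals if t <= deadline]
    if len(in_window) >= wait_count:
        # Enough transactions queued: release when the N-th one arrives.
        return in_window[wait_count - 1], wait_count
    # Timeout: release whatever has queued up by the deadline.
    return deadline, len(in_window)


# Four transactions arrive within 100 usec; N=4, T=1000 usec.
print(group_commit_release([0, 30, 60, 90], wait_count=4, wait_usec=1000))
# A single auto-committing client; N=4, T=1000 usec.
print(group_commit_release([0], wait_count=4, wait_usec=1000))
```

In this model the busy case is released after 90 microseconds with a full
group, while the lone client pays the whole T on every commit.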
Do you think the above helps in any real world case, except testing?
Assuming we have N=4 and T=2
- We have 3 threads ready to commit
- We wait 2 milliseconds and get 2 more threads.
- We now commit 5 threads. During this time you get 3 other threads
  waiting to commit.
- We wait again...
In a scenario without waiting:
- We have 3 threads ready to commit
- We commit 3 threads. During this time we get 5 more threads waiting
to commit.
- We commit 5 threads.
In other words, as the group commit will anyway take up 50
microseconds on a hard disk, and thus automatically group things
together for the next commit, why do we ever need to wait more?
I agree that with --binlog-commit-wait-count you may get fewer sync
calls, but at the expense that a lot of threads take up to T
microseconds longer to execute.
The worst case scenario is when you have only one user doing a lot of
inserts with auto-commit. In this case using wait will slow down the
server by T microseconds for every query.
Have you been able to run any kind of benchmark where using
--binlog-commit-wait-count gives better performance?
I agree that the options are good for testing. The main question I
have is if we want to have the variables in the production server and
how we should document when and how a user should use the variables.
I agree that with respect to group commit, --binlog-commit-wait-count will in
many cases not improve performance (that is why I did not implement it
earlier).

However, for parallel replication things are a bit different. Suppose that the
application is doing C transaction commits per second, and that the disk
system is capable of F binlog fsyncs per second.

Now, if C is significantly bigger than F, then things are as you
describe. Generally several transactions will queue up while the previous
group commit runs, and there will be sufficient parallelism without using
--binlog-commit-wait-*. This will be typical for eg. a simple harddisk-based system
(F=40 commits/second perhaps) where all data is cached in the InnoDB buffer
pool (eg. C>500 transactions/second).

On the other hand, if C is smaller than F (or of similar magnitude), then
usually only few or no new transactions will have time to queue up while the
previous transaction is committing. So there will not be much parallelism for
parallel replication to exploit without using --binlog-commit-wait-*. This
will be typical for eg. a good-quality server with battery-backed RAID
controller (F > 1000 commits/second) where data is too big to fit in buffer
pool and every update requires disk access to complete (C < 500
transactions/second for example).
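
To put numbers on the two cases, here is a back-of-the-envelope sketch in
Python. The model is an assumption made only for illustration: one binlog
fsync takes 1/F seconds, about C/F new transactions arrive during that time,
and a group always holds at least one transaction. The function name is
hypothetical, and the inputs are the example figures used above.

```python
def expected_group_size(commits_per_sec, fsyncs_per_sec):
    """Rough number of transactions per group commit with no added wait.

    Assumed model: about C/F transactions queue up while the previous
    group commit is being synced, and a group is never smaller than 1.
    """
    if commits_per_sec <= 0 or fsyncs_per_sec <= 0:
        raise ValueError("rates must be positive")
    return max(1.0, commits_per_sec / fsyncs_per_sec)


# Simple hard disk, data cached in the buffer pool: C=500, F=40.
print(expected_group_size(500, 40))    # 12.5
# RAID controller with battery backup, I/O-bound updates: C=500, F=1000.
print(expected_group_size(500, 1000))  # 1.0
```

With a group size near 1 there is nothing for the slave to run in parallel,
which is the situation the wait options are meant to address.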

And in fact, it is the second case, where random I/O is the bottleneck and
multiple disk spindles are needed to improve I/O throughput, that a
single-threaded slave hurts the most, and where increasing
--binlog-commit-wait-usec can be used with least penalty.

I agree we need to document clearly the risk that --binlog-commit-wait-* will
decrease performance on the master. Basically, the user can look at the ratio
of Binlog_commits and Binlog_group_commits to check if enough transactions are
part of each group commit for parallel replication to be effective.
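
The ratio check can be sketched in a few lines of Python. The helper name is
hypothetical, the status values are assumed to have been fetched already
(for example from SHOW GLOBAL STATUS) into a dict, and the counter values in
the example are invented for illustration.

```python
def transactions_per_group_commit(status):
    """Average number of transactions per binlog group commit.

    `status` maps status variable names to their values; only
    Binlog_commits and Binlog_group_commits are used.
    """
    commits = int(status["Binlog_commits"])
    groups = int(status["Binlog_group_commits"])
    if groups == 0:
        # No group commits recorded yet, so there is no ratio to report.
        return 0.0
    return commits / groups


# Invented example values, not measurements.
sample = {"Binlog_commits": "12000", "Binlog_group_commits": "3000"}
print(transactions_per_group_commit(sample))  # 4.0
```

A value close to 1 suggests that most transactions commit alone and the slave
will see little parallelism; a larger value means more transactions per group.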
...
...
1. The existing code is not thread-safe for class Relay_log_info. This class
contains a bunch of stuff that is specific to executed transactions, not
related to relay-log at all. This needs to be moved to the new struct
rpl_group_info I introduced, and all code updated to pass around a pointer to
that struct instead. There may also be a need to add additional locking on
Relay_log_info; existing code needs review for this.
I could take a look at working on the above tomorrow and next week.
Ok, great. Ping me when you can so we can coordinate, I may have some partial
patches for this lying around.

 - Kristian.

Kristian Nielsen
