Re: [Maria-developers] More suggestions for changing option names for optimistic parallel replication

8 Dec 2014

Pavel Ivanov <pivanof@google.com> writes:
...
> Does this mean that group commit will be possible if slave is able to
> execute several transactions consecutively while previous transaction
> commits/fsyncs?
Yes. In all cases, the following transaction is allowed to start as soon as
the prior transaction reaches its COMMIT event; see my previous mail for
details.
...
> I'd suggest to name this option differently because looking just at
> the list of available values it's not quite clear what could be the
> difference between follow_master_commits and only_commits. I don't
> know yet what is the best name for this. Maybe overlap_commits?
I think overlap_commits sounds good, thanks for the suggestion.
...
...
>> --slave-parallel-domains=on|off     (default on)
>>     This replaces the "domain" option of the old --slave-parallel-mode. When
>>     enabled, parallel replication will apply transactions in parallel if
>>     their GTIDs have different domain ids (GTID mode only).
> I don't understand what would be the meaning of combining this flag
> with --slave-parallel-mode. Does it mean that when this flag is on
> transactions from different domains are executed on "all_transactions"
> level of parallelism no matter what value --slave-parallel-mode has?
> What will happen if this flag is off but
> --slave-parallel-mode=all_transactions?
These apply at two different levels. With --slave-parallel-domains=on, each
replication domain is replicated as a completely independent stream, similar
to different multi-source replication slaves. The position in each stream is
tracked by its own GTID in gtid_slave_pos, and one stream can be arbitrarily
far ahead of another.

The --slave-parallel-mode applies within each stream. Within one stream,
commits are strictly ordered, and --slave-parallel-mode specifies how much
parallelism is attempted.

The --slave-parallel-mode can be set to any value, and the server is
responsible for ensuring that replication works correctly. In contrast, with
--slave-parallel-domains, it is the user's/DBA's responsibility to ensure that
replication domains are set up correctly so that no conflicts can occur
between them.
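To make the two levels concrete, here is a rough Python model (not server code) of what --slave-parallel-domains=on means for position tracking. It assumes MariaDB's "domain-server_id-seq_no" GTID format; the helper names are made up for illustration. Each domain becomes its own strictly ordered stream, and the slave position keeps one GTID per domain, so one stream can be ahead of another:

```python
from collections import defaultdict


def parse_gtid(gtid):
    """Split a GTID "domain-server_id-seq_no" into three integers."""
    domain, server_id, seq_no = (int(part) for part in gtid.split("-"))
    return domain, server_id, seq_no


def split_into_streams(binlog_gtids):
    """Hypothetical helper: one independent stream per replication domain.

    Within each list, commit order is kept; --slave-parallel-mode would only
    decide how much parallelism is attempted inside a single list.
    """
    streams = defaultdict(list)
    for gtid in binlog_gtids:
        domain, _, _ = parse_gtid(gtid)
        streams[domain].append(gtid)
    return dict(streams)


def slave_pos(applied_gtids):
    """Keep the last applied GTID per domain, like gtid_slave_pos does."""
    last = {}
    for gtid in applied_gtids:
        domain, _, _ = parse_gtid(gtid)
        last[domain] = gtid
    return ",".join(last[d] for d in sorted(last))


binlog = ["0-1-10", "1-1-5", "0-1-11", "1-1-6", "1-1-7"]
streams = split_into_streams(binlog)
print(streams)  # {0: ['0-1-10', '0-1-11'], 1: ['1-1-5', '1-1-6', '1-1-7']}

# Domain 1 is fully applied, domain 0 only partially: the two streams
# are at different points of the master's binlog.
applied = streams[1] + streams[0][:1]
print(slave_pos(applied))  # 0-1-10,1-1-7
```

Nothing in this model checks whether the domains really are independent; that is exactly the part left to the user/DBA.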
...
> I feel like you are up to something here, but implementing it using
> this flag is not quite right.
Can you elaborate? --slave-parallel-domains controls whether we have one
stream or many. --slave-parallel-mode controls what happens inside each
stream. Any suggestions on how to clarify this?
...
> Hm... The fact that a transaction did a lock wait on master doesn't
> mean that the conflicting transaction was committed on master, or that
> both of these transactions were committed close enough to even make it
> possible to be executed in parallel on slaves, right? Are you sure
> that this flag will be useful?
Right, and no, I'm not sure. Testing will be needed to have a better idea.

If two short transactions T1 and T2 conflict on a row, T2 is quite likely to
commit just after T1, and thus likely to conflict on the slave. So there is
some rationale behind this.
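A sketch of how the slave might use such a flag purely as a scheduling hint. The attribute name waited_for_lock_on_master and the helper are hypothetical; the assumption is that a flagged transaction waits for the prior transaction to commit before it starts, while all others start optimistically. Commit order is enforced either way, so the flag can only gain or lose parallelism, not change the result:

```python
from dataclasses import dataclass


@dataclass
class Txn:
    seq_no: int
    # Hypothetical flag set by the master when this transaction had to wait
    # for a row lock held by another transaction.
    waited_for_lock_on_master: bool = False


def start_condition(txn):
    """Decide when the slave may start executing txn (commit order is always kept)."""
    if txn.waited_for_lock_on_master:
        # Likely to conflict with the transaction just before it, so run it
        # conservatively, only after the prior transaction has committed.
        return "wait_for_prior_commit"
    # Otherwise run optimistically in parallel; if it conflicts with an
    # earlier transaction, it is rolled back and retried.
    return "start_now"


for t in [Txn(1), Txn(2, waited_for_lock_on_master=True), Txn(3)]:
    print(t.seq_no, start_condition(t))
```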
...
...
>> @@SESSION.replicate_expect_conflicts=0|1   (default 0)
...
> I think this variable will be completely useless and is not worth
> implementing. How user will understand that the transaction he is
> about to execute is likely to conflict with another transactions
> committed at about the same time? I think it will be completely
> impossible to do that judgement, at the same time it will give too
> much impact on the slave's behavior into users' hands. Am I missing
> something? What kind of scenario you are envisioning this variable to
> be used in?
My main worry with optimistic parallel replication is that too many conflicts
and retries on the slave could outweigh the performance gained from
parallelism. If this does not happen, I feel it will be awesome. So I was very
focused on what to do if we _do_ get a lot of conflicts, and I wanted to give
advanced users a way to work around hotspot rows, if necessary: for example, a
single row that is updated very frequently.

I did not think that this gave users much influence over the slave's
behaviour. This option is only a heuristic: it controls how aggressively the
slave tries to parallelise, but it cannot affect correctness. And the user
already has many ways to affect parallelism in optimistic parallel
replication.

For example, imagine lots of transactions like this executed serially on the
master:

  UPDATE t1 SET a=a+1 WHERE id=0;
  UPDATE t1 SET a=a+1 WHERE id=0;
  UPDATE t1 SET a=a+1 WHERE id=0;
  ...

All of these would conflict on the slave. With
--slave-parallel-mode=all_transactions and --slave-parallel-threads=N, this
seems likely to cause O(N^2) transaction retries.

So the idea was that users can already cause trouble for parallelism on the
slave; @@replicate_expect_conflicts is intended to let the power user give the
slave a hint on how to avoid that trouble.
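To put rough numbers on this, here is a deliberately pessimistic Python model, an assumption for illustration rather than anything taken from the server: all transactions update the same hot row, they commit in list order, and whenever the oldest in-flight transaction commits, every other in-flight transaction is assumed to have conflicted and must be retried. The per-transaction flag stands in for @@replicate_expect_conflicts=1 and, as assumed here, makes the slave start that transaction only after all earlier ones have committed:

```python
def count_retries(flags, threads):
    """Pessimistic retry count for a hot-row workload.

    flags[i] is True if transaction i carries the conflict hint (an assumed
    model of replicate_expect_conflicts=1); transactions commit in list order.
    """
    queue = list(flags)
    in_flight = []  # hint flags of started but uncommitted transactions, in commit order
    retries = 0
    while queue or in_flight:
        # Start transactions while a worker thread is free.
        while queue and len(in_flight) < threads:
            if queue[0] and in_flight:
                break  # hinted transaction waits until all prior ones commit
            in_flight.append(queue.pop(0))
        # The oldest transaction commits; every other in-flight transaction
        # is assumed to have conflicted on the hot row and is retried.
        in_flight.pop(0)
        retries += len(in_flight)
    return retries


n = 8
print(count_retries([False] * n, threads=n))  # 28, i.e. n*(n-1)/2
print(count_retries([True] * n, threads=n))   # 0, each waits for the prior commit
```

For a batch of N such transactions on N threads the model gives N(N-1)/2 retries, consistent with the O(N^2) estimate; with the hint, the retries disappear at the cost of applying those transactions serially.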

But I'm open to changing it, if you think it's important. Your perspective is
rather different from my usual point of view, which is useful input.

Jonas Oreland <jonaso@google.com> writes:
...
> I still prefer "auto" as default,
Right...

I really want "normal" users to be able to just enable "auto" and have things
work reasonably.

And I really need fine-grained control to enable testing various combinations
in real life, to better understand how to implement "auto".

I need to find a good way to combine these...
...
> if you want the fine grained control, I think an optimizer_switch approach
> is better than adding X new config options, i.e
> --parallel_mode=option1=true,option3=4
> don't you think that there will be new options/variants ? i do
> don't you think you will want to remove options/variants ? i do
Good point.
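For illustration, parsing an optimizer_switch-style value such as "option1=true,option3=4" could look roughly like this Python sketch. The option names and defaults are placeholders (only option1 and option3 come from Jonas's example), and parse_switch is a hypothetical helper:

```python
def parse_switch(value, defaults):
    """Parse "name=val,name=val" into a dict of settings, starting from defaults."""
    settings = dict(defaults)
    if not value:
        return settings
    for item in value.split(","):
        name, sep, raw = item.partition("=")
        name, raw = name.strip(), raw.strip()
        if not sep or name not in defaults:
            raise ValueError(f"invalid option: {item!r}")
        default = defaults[name]
        if isinstance(default, bool):  # check bool before int: bool is an int subclass
            if raw not in ("true", "false", "on", "off"):
                raise ValueError(f"expected a boolean for {name}: {raw!r}")
            settings[name] = raw in ("true", "on")
        else:
            settings[name] = type(default)(raw)  # e.g. int("4") for numeric sub-options
    return settings


# Hypothetical sub-options; only option1/option3 appear in the example above.
DEFAULTS = {"option1": False, "option2": True, "option3": 1}
print(parse_switch("option1=true,option3=4", DEFAULTS))
# {'option1': True, 'option2': True, 'option3': 4}
```

Sub-options not mentioned keep their defaults, so new variants can be added without touching existing configurations. Unknown names raise an error here; a real server might prefer a warning so that old config files keep working after a variant is removed.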

I think --slave-parallel-domains is reasonable. This is a question of
semantics, the user must explicitly request it, as it can break replication if
not used correctly by applications.

But the other options are less clear. They should all behave equally correctly.

Maybe something like this:

1. --slave-parallel-domains is off by default, can be enabled if application
is written to guarantee no conflicts.

2. --parallel-slave=on|off. This defines if parallelism will be done within
each replication stream.

3. --slave-parallel-threads=N. This could default to 3*#CPUs or whatever if
--parallel-slave is enabled on at least one multi-master slave.

4. --slave-parallel-mode=auto | <lots of fine-grained options>.

Then a "normal" user will only need to say --parallel-slave=on.
--slave-parallel-mode defaults to "auto".

I then need to consider how this affects backwards compatibility with 10.0...
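A small Python sketch of how the proposed settings could resolve to a thread count. The option names come from the list above; the 3*#CPUs default and the "at least one connection" rule are the tentative suggestion from item 3, and the Connection class and effective_threads function are hypothetical:

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    """One multi-source slave connection (hypothetical model)."""
    name: str
    parallel_slave: bool = False    # 2. --parallel-slave=on|off
    parallel_domains: bool = False  # 1. --slave-parallel-domains, off by default
    mode: str = "auto"              # 4. --slave-parallel-mode, default "auto"


def effective_threads(connections: List[Connection],
                      explicit: Optional[int] = None,
                      cpus: Optional[int] = None) -> int:
    """3. --slave-parallel-threads: explicit value, else the tentative default."""
    if explicit is not None:
        return explicit  # an explicit setting always wins
    if any(c.parallel_slave for c in connections):
        return 3 * (cpus or os.cpu_count() or 1)
    return 0  # 0 worker threads: no parallel apply


conns = [Connection("a"), Connection("b", parallel_slave=True)]
print(effective_threads(conns, cpus=8))               # 24
print(effective_threads([Connection("a")], cpus=8))   # 0
print(effective_threads(conns, explicit=4))           # 4
```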
...
> I don't like having the word "transaction" in the names of some modes, ALL
> variants will maintain transaction semantics.
> Having "transaction" in the name of only a few sort of implies that others are
> not transactional...
> i think "optimistic" is better than "transactional" or "all_transactions".
> in the same spirit "conservative" could be "follow_master_commits"
Nice, I really like "conservative" and "optimistic". And maybe "only_commits"
isn't really useful. There seems to be little reason not to use at least the
"conservative" mode; it shouldn't cause any problems compared to "only_commits".
So slave-parallel-mode=none|conservative|optimistic, perhaps...
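Roughly, the three proposed modes differ in whether a transaction may begin executing before the previous one has committed. A hedged Python sketch: it assumes, as in MariaDB 10.0 parallel replication, that transactions which group-committed together on the master carry the same commit id in their GTID event; the Event class and may_run_in_parallel are illustrative names only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    seq_no: int
    commit_id: Optional[int]  # shared by transactions group-committed together on the master


def may_run_in_parallel(mode: str, prev: Event, cur: Event) -> bool:
    """Can cur execute while prev has not yet committed? Commit order is kept in all modes."""
    if mode == "none":
        return False  # strictly serial apply
    if mode == "conservative":
        # Only parallelise what the master showed could run concurrently.
        return prev.commit_id is not None and prev.commit_id == cur.commit_id
    if mode == "optimistic":
        return True  # try everything; conflicts are rolled back and retried
    raise ValueError(f"unknown mode: {mode}")


a, b, c = Event(1, 100), Event(2, 100), Event(3, 101)
for mode in ("none", "conservative", "optimistic"):
    print(mode, may_run_in_parallel(mode, a, b), may_run_in_parallel(mode, b, c))
```

The sketch leaves out the overlap at commit time discussed in this thread; it only shows how much each mode relies on the master's group commits.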
...
> i don't understand what "only_commits" is, how can it prepare transaction
> "queue up" for group commit
> if only 1 is prepared in parallel ?
The following transaction T2 is started as soon as T1 sees its COMMIT
event. So if T2 can reach its own commit while T1 is still waiting for
LOCK_log (or --binlog-commit-wait-usec), this is possible.
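A simplified Python model of why group commit can still happen in that mode. Assumptions: after T1 reaches its COMMIT, the group stays open for window_usec (time spent waiting for LOCK_log plus any --binlog-commit-wait-usec delay), each following transaction starts when its predecessor reaches its COMMIT event, and the execution times are made-up inputs; group_size is a hypothetical helper:

```python
def group_size(exec_usec_after_leader, window_usec):
    """Number of transactions in T1's group commit, T1 included.

    exec_usec_after_leader[i] is how long transaction T(i+2) needs, after its
    predecessor reached COMMIT, to reach its own COMMIT.
    """
    size = 1      # T1, already waiting in commit
    elapsed = 0   # time since T1 reached its COMMIT event
    for exec_usec in exec_usec_after_leader:
        elapsed += exec_usec
        if elapsed > window_usec:
            break  # too late: the group has been written/fsynced without it
        size += 1
    return size


print(group_size([100, 150, 300], window_usec=400))  # 3: T2 (100) and T3 (250) join, T4 (550) misses
print(group_size([100, 150, 300], window_usec=0))    # 1: T1 commits alone
```

So even with only one transaction executing at a time, a short chain of fast transactions can join the same group commit, as long as each one reaches COMMIT before the group is closed.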
...
> though, i think all suggestions might work if they have good defaults and a
> solid implementation
Right, I'd like to get the interface right from the start, but too much
bikeshedding is also counterproductive.

Thanks, Pavel and Jonas, for your comments!

 - Kristian.


Kristian Nielsen