On Thu, Dec 4, 2014 at 5:49 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I discussed with Monty, and we came up with some more suggested changes for the options used to configure the optimistic parallel replication feature (MDEV-6676, https://mariadb.atlassian.net/browse/MDEV-6676).
The biggest change is to split up --slave-parallel-mode into multiple options. I think that is reasonable; that option was probably doing too many things at once.
Instead, we could have the following three options:
--slave-parallel-mode=all_transactions | follow_master_commits | only_commits | none
"all_transactions" is what was called "transactional" before. The slave will try to apply all transactional DML in parallel; in case of conflicts it will roll back the later transaction and retry it.
"follow_master_commits" is the 10.0 functionality: apply in parallel those transactions that group-committed together on the master (the default).
"only_commits" was suggested to me by a user testing parallel replication. It does not attempt to apply transactions in parallel, but still runs the commit steps in parallel, making slave group commit possible and thus saving on fsyncs if durability settings are on.
Does this mean that group commit will be possible if the slave is able to execute several transactions consecutively while the previous transaction commits/fsyncs? I'd suggest naming this option differently, because looking just at the list of available values it's not quite clear what the difference between follow_master_commits and only_commits is. I don't know yet what the best name is. Maybe overlap_commits?
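To illustrate the reading I have in mind, here is a rough Python sketch. It assumes a single applier that keeps executing transactions while an fsync is in flight, and that every transaction ready to commit when the disk becomes free shares the next fsync. The function name and the timing parameters are made up for the example; this is not how the server code is structured.

```python
def count_fsyncs(n_txns, apply_ms, fsync_ms):
    """Count fsyncs when commits that queue up behind a running fsync are grouped.

    Hypothetical model: transaction i is ready to commit at apply_ms * (i + 1),
    because the applier does not wait for the previous commit to finish.
    """
    ready = [apply_ms * (i + 1) for i in range(n_txns)]
    fsyncs = 0
    disk_free_at = 0
    i = 0
    while i < n_txns:
        # The next fsync starts once a transaction is ready and the disk is free.
        start = max(ready[i], disk_free_at)
        # Every transaction that is ready by then joins the same group commit.
        while i < n_txns and ready[i] <= start:
            i += 1
        fsyncs += 1
        disk_free_at = start + fsync_ms
    return fsyncs


# Slow fsync relative to apply time: commits pile up and get grouped.
print(count_fsyncs(n_txns=10, apply_ms=1, fsync_ms=5))
# Fast fsync relative to apply time: nothing to group, one fsync per transaction.
print(count_fsyncs(n_txns=10, apply_ms=5, fsync_ms=1))
```

If that is what the mode does, the saving depends entirely on fsync being slow compared to applying a transaction, which is another reason the name should say something about overlapping commits.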
"none" means the parallel replication code is not used (same as --slave-parallel-threads=0, but now configurable per multimaster connection). (This corresponds to empty value in old --slave-parallel-mode).
--slave-parallel-domains=on|off (default on)
This replaces the "domain" option of the old --slave-parallel-mode. When enabled, parallel replication will apply in parallel transactions whose GTIDs have different domain ids (GTID mode only).
I don't understand what combining this flag with --slave-parallel-mode would mean. Does it mean that when this flag is on, transactions from different domains are executed at the "all_transactions" level of parallelism no matter what value --slave-parallel-mode has? What will happen if this flag is off but --slave-parallel-mode=all_transactions? I feel like you are on to something here, but implementing it using this flag is not quite right.
--slave-parallel-wait-if-conflict-on-master=on|off (default on)
When enabled, if a transaction had to do a row lock wait on the master, it will not be applied in parallel with any earlier transaction on the slave (the idea is that such a transaction is likely to get a conflict on the slave, causing a needless retry). (This was the "waiting" option to the old --slave-parallel-mode).
Hm... The fact that a transaction did a lock wait on master doesn't mean that the conflicting transaction was committed on master, or that both of these transactions were committed close enough to even make it possible to be executed in parallel on slaves, right? Are you sure that this flag will be useful?
These options will also be usable per multi-source master connection, like --master1.slave-parallel-mode=all_transactions. It will also be possible to change the options dynamically (with SET GLOBAL), though the associated slave threads must be stopped while changing.
Also, Monty suggested to rename @@replicate_allow_parallel to
@@SESSION.replicate_expect_conflicts=0|1 (default 0)
When this option is enabled on the master when a transaction is committed, that transaction will not be applied in parallel with earlier transactions (when --slave-parallel-mode=all_transactions). This can be used to reduce retries on the slave, if an application is about to do a transaction that is likely to cause a conflict and retry on a slave if applied in parallel with earlier transactions.
I think this variable will be completely useless and is not worth implementing. How will a user know that the transaction he is about to execute is likely to conflict with other transactions committed at about the same time? I think it will be completely impossible to make that judgement, and at the same time it puts too much control over the slave's behavior into users' hands. Am I missing something? What kind of scenario are you envisioning this variable being used in?
Let me know if there are any comments on these or suggestions for changes. It is best to get these as right as possible before release (it seems the intention is to include optimistic parallel replication in 10.1), since they are the user-visible part of the feature.
With these option names, the normal way to use optimistic parallel replication would be these two options in my.cnf:
slave_parallel_mode=all_transactions
slave_parallel_threads=20 (or whatever)
This seems reasonable, I think. None of the other options would need to be considered except in more special cases.
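To check that I read the proposal correctly, here is a rough Python sketch of the "may this transaction start before an earlier one has committed" decision as I understand it from the descriptions above. The mode and option names are the ones proposed in this thread; the Txn class, its fields, and the function are made up for illustration. Two things are assumptions on my side: that the wait-on-master hint only matters in all_transactions mode, and that follow_master_commits compares some per-transaction group commit id. --slave-parallel-domains is left out on purpose, since how it combines with the mode is exactly what I am asking about.

```python
from dataclasses import dataclass

# Mode names as proposed in this thread, not a released server API.
MODES = ("all_transactions", "follow_master_commits", "only_commits", "none")


@dataclass
class Txn:
    commit_group: int                # hypothetical id of the master group commit
    waited_on_master: bool = False   # had a row lock wait on the master
    expect_conflicts: bool = False   # replicate_expect_conflicts was 1 at commit


def may_apply_in_parallel(txn, prev, mode, wait_if_conflict_on_master=True):
    """Return True if txn may be applied in parallel with the earlier txn prev."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode in ("none", "only_commits"):
        # Neither mode applies transactions in parallel; only_commits would
        # overlap just the commit step, which this sketch does not model.
        return False
    if mode == "follow_master_commits":
        return txn.commit_group == prev.commit_group
    # all_transactions: optimistic by default, unless a hint says otherwise.
    if wait_if_conflict_on_master and txn.waited_on_master:
        return False  # assumption: the hint applies only in this mode
    if txn.expect_conflicts:
        return False
    return True


earlier = Txn(commit_group=1)
later = Txn(commit_group=2, waited_on_master=True)
print(may_apply_in_parallel(later, earlier, "all_transactions"))
print(may_apply_in_parallel(later, earlier, "all_transactions",
                            wait_if_conflict_on_master=False))
```

If this matches the intent, then with the two-line my.cnf above only the last branch ever matters, and the two hints are the only things that can turn the optimistic behavior off for a given transaction.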
Hope that helps, Pavel