Hi, Kristian!
On Oct 06, Kristian Nielsen wrote:
- Parallel replication is still a somewhat experimental feature, so it seems too risky to enable it by default. Also, it doesn't really seem possible for the server to automatically set the best number of threads to use, with current implementation (or possibly any implementation).
Increase parallelization when replication just works, and penalize it when retries happen? With an upper limit similar to (or derived from) innodb-concurrency-tickets. Just a thought.
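To make the thought a bit more concrete, here is a minimal sketch in Python rather than server code. The class name, the add-one/halve policy, and the `max_threads` cap are all assumptions for illustration; the cap only stands in for whatever limit would be derived from a setting like innodb-concurrency-tickets.

```python
class ParallelismTuner:
    """Hypothetical controller for the number of parallel applier threads.

    Policy (an assumption, not an existing server feature): add one thread
    after a batch that applied with no retries, halve the count after a
    batch that needed retries, and never go below 1 or above max_threads.
    """

    def __init__(self, max_threads, start=1):
        self.max_threads = max_threads
        self.threads = max(1, min(start, max_threads))

    def record_batch(self, retries):
        """Update and return the thread count after one applied batch."""
        if retries == 0:
            # Replication "just works": cautiously allow more parallelism.
            self.threads = min(self.max_threads, self.threads + 1)
        else:
            # Retries happened: penalize by halving, but keep one thread.
            self.threads = max(1, self.threads // 2)
        return self.threads
```

With a policy like this, a workload that conflicts constantly would settle at one thread, which is effectively non-parallel replication.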
- When replicating with non-transactional updates, or in non-gtid mode, slave state is not crash safe. This is true in non-parallel replication also, but in parallel replication, the problem seems amplified, as there may be multiple transactions in progress at the time of a crash, complicating possible manual recovery. This also suggests that parallel replication must be configurable.
Hm. From reading the MDEV, I got the impression that you won't replicate non-transactional updates concurrently (as they cannot be rolled back, so your base assumption doesn't work). Was that wrong? Will you replicate non-transactional updates concurrently?
- When using domain-based parallel replication, the user is responsible for ensuring that independent domains are non-conflicting and can be replicated out-of-order wrt. each other. So if replication domains are used, but this property is not guaranteed, then domain-based parallel replication needs to be configurable, or parallel replication cannot be used at all.
As you like. I'd simply say that in domain-based parallel replication, the user is responsible for domain independence. If he has misconfigured domains, the server is not at fault, and we should not bother covering this use case.
- The new speculative replication feature in MDEV-6676 is not always guaranteed to be a win - in some workloads, where there are many conflicts between successive transactions, excessive rollback could cause it to be less efficient than not using it. Again, this suggests it needs to be configurable.
Agree. Though if the concurrency is auto-tuned as I mentioned above, it'll disable itself automatically in this case, with no user intervention.
So given this, I came up with the following idea for syntax:
CHANGE MASTER TO PARALLEL_MODE=(domain,groupcommit,transactional,waiting)
Each of the four keywords in the parentheses is optional.
"domain" enables domain-based parallelisation, where each replication domain is treated independently.
"groupcommit" enables the non-speculative mode, where only transactions that group-committed together on the master are applied in parallel on the slave.
"transactional" enables the speculative mode, where all transactional DML is optimistically tried in parallel, and then in case of conflict a rollback and retry is done.
"groupcommit" and "transactional" are mutually exclusive, at most one of them can be specified.
Assorted thoughts in no specific order:
1. I'd rename "groupcommit" to something less technical, like "master", or "following_master", or "following" (or whatever).
2. How does it work with multi-source? The usual "CHANGE MASTER name TO"?
3. How to specify the degree of parallelization, that is, the number of threads? Still --slave-parallel-threads=N? Your syntax doesn't seem to cover that.
4. Command line? None? CHANGE MASTER specifies replication coordinates, and they change on every restart; that's why there's no command-line option for them. They're stored in master-info. But your "TO PARALLEL_MODE" only configures how to apply events, which seems like something that should rather be in my.cnf.
In view of 4) above, did you consider using system variables? Like
--slave-parallel-mode={domain|groupcommit|transactional|waiting}
and, the usual, --connection_name.slave-parallel-mode=... for multi-source. This variable can be of SET or FLAGSET type, so it could be set to a combination of values.
Regards,
Sergei
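For illustration only, a set-valued --slave-parallel-mode could be validated roughly as below. This is a Python sketch, not server code: the function name and the comma-separated value format are assumptions, while the four keywords and the rule that "groupcommit" and "transactional" exclude each other come from the proposal quoted above.

```python
VALID_KEYWORDS = {"domain", "groupcommit", "transactional", "waiting"}


def parse_slave_parallel_mode(value):
    """Parse a comma-separated mode string into a frozenset of keywords.

    Hypothetical helper: rejects unknown keywords and the combination of
    "groupcommit" with "transactional". An empty string yields an empty set.
    """
    mode = {part.strip().lower() for part in value.split(",") if part.strip()}

    unknown = mode - VALID_KEYWORDS
    if unknown:
        raise ValueError("unknown keyword(s): " + ", ".join(sorted(unknown)))

    if {"groupcommit", "transactional"} <= mode:
        raise ValueError("groupcommit and transactional are mutually exclusive")

    return frozenset(mode)
```

For example, `parse_slave_parallel_mode("domain,transactional")` returns the two-element set, while `"groupcommit,transactional"` raises `ValueError`.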