[Maria-developers] Syntax for parallel replication
Hi Serg,

Can you help me with suggestions/comments for the following proposal for how to do configuration for MDEV-6676, speculative parallel replication?

I am not too happy about how things are in 10.0. There, there is a single option --slave-parallel-threads=N. If N>0, then parallel replication is enabled, else not. One problem is that this makes it not configurable per multi-source master connection. Another problem is that there are two possible mechanisms for parallelisation, group-commit based and domain_id based; currently, one can enable none of them or both, but not just one of them. MDEV-6676 will introduce at least one other mechanism, which seems to make it essential to provide a better way to configure this.

Now, ideally, there would not be any configuration at all. The server would just run things in parallel when possible and to the degree desirable. However, I think there are some reasons that we need to allow the user this configurability:

- Parallel replication is still a somewhat experimental feature, so it seems too risky to enable it by default. Also, it doesn't really seem possible for the server to automatically set the best number of threads to use, with the current implementation (or possibly any implementation).

- When replicating with non-transactional updates, or in non-GTID mode, slave state is not crash safe. This is true in non-parallel replication also, but in parallel replication the problem seems amplified, as there may be multiple transactions in progress at the time of a crash, complicating possible manual recovery. This also suggests that parallel replication must be configurable.

- When using domain-based parallel replication, the user is responsible for ensuring that independent domains are non-conflicting and can be replicated out-of-order wrt. each other. So if replication domains are used, but this property is not guaranteed, then domain-based parallel replication needs to be configurable, or parallel replication cannot be used at all.

- The new speculative replication feature in MDEV-6676 is not always guaranteed to be a win - in some workloads, where there are many conflicts between successive transactions, excessive rollback could cause it to be less efficient than not using it. Again, this suggests it needs to be configurable.

So given this, I came up with the following idea for syntax:

  CHANGE MASTER TO PARALLEL_MODE=(domain,groupcommit,transactional,waiting)

Each of the four keywords in the parentheses is optional.

"domain" enables domain-based parallelisation, where each replication domain is treated independently.

"groupcommit" enables the non-speculative mode, where only transactions that group-committed together on the master are applied in parallel on the slave.

"transactional" enables the speculative mode, where all transactional DML is optimistically tried in parallel, and then in case of conflict a rollback and retry is done.

"groupcommit" and "transactional" are mutually exclusive; at most one of them can be specified.

The default would be (domain,groupcommit), to be backwards compatible with 10.0. If slave_parallel_threads=0, then no parallel apply will happen even if PARALLEL_MODE is non-empty. If slave_parallel_threads>0 but PARALLEL_MODE is empty (PARALLEL_MODE=()), then again no parallel apply will be done.

The "waiting" option is not essential to add, we could remove it; I put it in because there were already a number of options, so it seemed to cause no harm.
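For illustration, configuring a slave under this proposal might look like the following (hypothetical syntax - none of this is implemented yet):

  CHANGE MASTER TO PARALLEL_MODE=(domain,transactional);

or, to additionally disable the lock-wait check described below:

  CHANGE MASTER TO PARALLEL_MODE=(domain,transactional,waiting);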
The idea is that on the master, we detect if transaction T2 had to do a row lock wait on transaction T1. If so, it seems likely that a similar conflict could occur on the slave, so we will not run T2 in parallel with T1; instead we will let T2 wait for T1 to commit before T2 is started. The "waiting" option could be enabled by a user to disable this check, enabling even more aggressive parallelisation. I am not sure if it is worth it to have this configurable, though; comments welcome.

I have not checked how hard it will be to implement the new syntax. We do not have any similar multi-option CHANGE MASTER elements, as far as I know, but it is similar to ENUM system variables. And we already have an IGNORE_SERVER_IDS syntax with a comma-separated list within parentheses. So hopefully not too hard to do, and somewhat consistent with existing syntax.

So what do you think? Is it a reasonable syntax? Any comments, or suggestions for a better way to do it?

Thanks,

- Kristian.
Hi, Kristian! On Oct 06, Kristian Nielsen wrote:
- Parallel replication is still a somewhat experimental feature, so it seems too risky to enable it by default. Also, it doesn't really seem possible for the server to automatically set the best number of threads to use, with current implementation (or possibly any implementation).
Increase parallelization when replication just works, and penalize it when retries happen? With an upper limit similar to (or derived from) innodb-concurrency-tickets. Just a thought.
- When replicating with non-transactional updates, or in non-gtid mode, slave state is not crash safe. This is true in non-parallel replication also, but in parallel replication, the problem seems amplified, as there may be multiple transactions in progress at the time of a crash, complicating possible manual recovery. This also suggests that parallel replication must be configurable.
Hm. From reading the MDEV, I've got an idea that you won't replicate non-transactional updates concurrently (as they cannot be rolled back, so your base assumption doesn't work). Was it wrong - will you replicate non-transactional updates concurrently?
- When using domain-based parallel replication, the user is responsible for ensuring that independent domains are non-conflicting and can be replicated out-of-order wrt. each other. So if replication domains are used, but this property is not guaranteed, then domain-based parallel replication needs to be configurable, or parallel replication cannot be used at all.
As you like. I'd simply say that in domain-based parallel replication, the user is responsible for domain independence. If he has misconfigured domains, the server is not at fault, and we should not bother covering this use case.
- The new speculative replication feature in MDEV-6676 is not always guaranteed to be a win - in some workloads, where there are many conflicts between successive transactions, excessive rollback could cause it to be less efficient than not using it. Again, this suggests it needs to be configurable.
Agree. Though if concurrency is auto-tuned as I mentioned above, it'll auto-disable itself in this case, with no user intervention.
So given this, I came up with the following idea for syntax:
CHANGE MASTER TO PARALLEL_MODE=(domain,groupcommit,transactional,waiting)
Each of the four keywords in the parentheses is optional.
"domain" enables domain-based parallelisation, where each replication domain is treated independently.
"groupcommit" enables the non-speculative mode, where only transactions that group-committed together on the master are applied in parallel on the slave.
"transactional" enables the speculative mode, where all transactional DML is optimistically tried in parallel, and then in case of conflict a rollback and retry is done.
"groupcommit" and "transactional" are mutually exclusive, at most one of them can be specified.
Assorted thoughts in no specific order:

1. I'd rename "groupcommit" to something less technical, like "master", or "following_master", or "following" (or whatever).

2. How does it work with multi-source? The usual "CHANGE MASTER name TO"?

3. How to specify the degree of parallelization - the number of threads? Still --slave-parallel-threads=N? Your syntax doesn't seem to cover that.

4. Command line? None? CHANGE MASTER specifies replication coordinates, and they change on every restart, that's why there's no command-line option for them. They're stored in master-info. But your "TO PARALLEL_MODE" only configures how to apply events, which seems like something that should rather be in my.cnf.

In the view of 4) above, did you consider using system variables? Like

  --slave-parallel-mode={domain|groupcommit|transactional|waiting}

and, the usual, --connection_name.slave-parallel-mode=... for multi-source. This variable can be of SET or FLAGSET type, so it could be set to a combination of values.

Regards,
Sergei
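P.S. For concreteness, in my.cnf that could look something like this (just a sketch, assuming a SET/FLAGSET-type variable; "master1" is a hypothetical connection name):

  [mysqld]
  slave-parallel-mode=domain,groupcommit
  master1.slave-parallel-mode=domain,transactional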
Sergei Golubchik <serg@mariadb.org> writes:
In the view of 4) above, did you consider using system variables? Like
--slave-parallel-mode={domain|groupcommit|transactional|waiting}
and, the usual, --connection_name.slave-parallel-mode=... for multi-source. This variable can be of SET or FLAGSET type, so it could be set to a combination of values.
In fact, this is what I did first (use an enum system variable). I didn't know that it was possible to use --connection_name.XXX to configure things, which is why I thought I needed to use CHANGE MASTER instead. I can try to figure out how --connection_name.slave-parallel-mode works and change it back.

As I'm thinking more about this discussion, it seems clear that the design of parallel replication in 10.0 is rather lacking: too complex, with poor configurability. On the one hand, I would really like it to work automatically. On the other hand, there are complex issues, and fine-grained control seems to be needed for power users and for testing.

Best would be if it was just enabled by default. But I am not sure that is possible, mainly because the old-style binlog position is not transactional (relay-log.info file) and thus behaves differently if parallel is enabled. Maybe in GTID mode, we could actually have parallel be the default.

But what about Jonas' suggestion of --slave-parallel-mode=auto? Just one simple configuration option for users to enable. Once this is set, the server will do its best to replicate in parallel as well as possible, using though only methods that are safe no matter what the replication load is (DDL, non-transactional statements, and so on). (In practice, "auto" will mean the same as "transactional": InnoDB DML will be run in parallel speculatively with prior transactions, no other parallelisation will be made. Maybe some simple heuristics to turn off parallel in case of many retries, if there is time to implement it.)

What do you think? Is this the way forward?

Following some more detailed comments:
- When replicating with non-transactional updates, or in non-gtid mode, slave state is not crash safe. This is true in non-parallel replication also, but in parallel replication, the problem seems amplified, as there may be multiple transactions in progress at the time of a crash, complicating possible manual recovery. This also suggests that parallel replication must be configurable.
Hm. From reading the MDEV, I've got an idea that you won't replicate non-transactional updates concurrently (as they cannot be rolled back, so your base assumption doesn't work). Was it wrong - will you replicate non-transactional updates concurrently?
There are multiple ways that two transactions T1 and T2 can be run in parallel. Here is how it is in 10.0, when --slave-parallel-threads > 0:

1. If using GTID mode, and T1 and T2 have GTIDs with different domain_ids, then T1 and T2 will be applied in parallel without any restrictions.

2. If T1 and T2 have the same group commit id in the master binlog, then they can be applied in parallel, but their commit step is serialised to keep the same commit order.

3. If T1 and T2 do not have the same group commit id, then the commit step of T1 can run in parallel with T2, but no other part of T1.

All of these work the same for non-transactional and transactional event groups in 10.0.

With MDEV-6676, I am proposing introducing a new speculative parallel replication mode called "transactional". If enabled, it replaces (2) and (3) above with:

4. If T2 is transactional, and T1 is not DDL, T2 is allowed to run in parallel with T1, but its commit step is serialised to keep the same commit order.

In transactional mode, a non-transactional T2 is _not_ applied in parallel with a prior T1, unless point (1) with different domain ids applies. However, a non-transactional T2 can be applied in parallel with a following transactional T3.

Anyway, my point is mainly that parallel replication needs to be configurable, including the possibility to turn it off. I don't think we disagree on that. My current proposal is that (1) can be turned on and off; and independently, either (2,3) or (4) or none of them can be enabled (but not both).
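To summarise, using the rule numbers above, the proposed PARALLEL_MODE values would map like this (a sketch of the proposal, not of implemented behaviour):

  (domain,groupcommit)    -> (1) + (2,3)   (the 10.0 behaviour, proposed default)
  (domain,transactional)  -> (1) + (4)
  (groupcommit)           -> (2,3) only
  (transactional)         -> (4) only
  (domain)                -> (1) only
  ()                      -> no parallel apply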
3. How to specify the degree of parallelization - the number of threads? Still --slave-parallel-threads=N ? You syntax doesn't seem to cover that.
The degree of parallelism is controlled by three points:

--slave-parallel-threads=N specifies the number of threads that are used. This is a static pool of threads; it cannot be changed unless all multi-source slaves are stopped. The threads are shared among all multi-source slaves.

Parallel replication always tries to maximise parallelism up to the --slave-parallel-threads=N limit. Every event from the binlog is queued for a new thread, in round-robin fashion. But often, the actual parallelism will be lower than N, because of the constraints between applying particular transactions in parallel, as described above.

--slave-domain-parallel-threads optionally limits how many threads can be used to replicate a single domain in a single multi-source slave. Without this, it would be possible for one slow transaction T1 to starve other multi-source slaves: we might end up queueing T1, T2, T3, ..., TN for all the available threads; all of them will then wait for T1 to complete, and until then no other threads are available to other multi-source slaves.

The static thread pool is somewhat primitive, but it is what we have now. Probably --slave-domain-parallel-threads would be better if it was per multi-source slave, --connection_name.slave-domain-parallel-threads.
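For example, with the options as they exist today, a my.cnf might contain (the values here are made up for illustration):

  [mysqld]
  slave-parallel-threads=20         # one static pool shared by all multi-source slaves
  slave-domain-parallel-threads=4   # cap per domain, so one slow transaction cannot occupy the whole pool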
1. I'd rename "groupcommit" to something less technical, like "master", or "following_master", or "following", (or whatever)
I agree that "groupcommit" is rather too technical. However, "following_master" is too generic, it doesn't say anything.

The point is, we will run transactions in parallel on the slave if they _committed_ in parallel on the master. This is already a rather technical issue. The user needs to be aware that it is related to commit, as tuning options like --binlog-commit-wait-count may be needed on the master. I guess this technical nature of the "groupcommit" feature is part of the motivation for something better with speculative replication.

In MySQL 5.7, they call the corresponding feature:

  --slave-parallel-type=LOGICAL_CLOCK

However, we do not use the term LOGICAL_CLOCK in MariaDB, and it's hardly any less technical, so it doesn't seem a good solution.

BTW, I wonder if we should use the same option name? But I suppose not; it's better to have a different name, so users will be forced to change the config, rather than silently pick up a MySQL config option with the same name but possibly different semantics.

How about calling the option "binlog_commit" instead, to match the related --binlog-commit-wait-* options on the master?
As you like. I'd simply say that in domain-based parallel replication, the user is responsible for domain independence. If he has misconfigured domains, the server is not at fault, and we should not bother covering this use case.
Yes, this is how it is currently. However, it just seemed to me, that it would be useful if domain-based parallel replication could be turned on or off, independently of the other modes of parallel replication. There might be use cases where different domains were used, without the intention that they could be replicated out-of-order. - Kristian.
Hi, Kristian! On Oct 16, Kristian Nielsen wrote:
Sergei Golubchik <serg@mariadb.org> writes:
In the view of 4) above, did you consider using system variables? Like
--slave-parallel-mode={domain|groupcommit|transactional|waiting}
and, the usual, --connection_name.slave-parallel-mode=... for multi-source. This variable can be of SET or FLAGSET type, so it could be set to a combination of values.
In fact, this is what I did first (use an enum system variable). I didn't know that it was possible to use --connection_name.XXX to configure things, which is why I thought I needed to use CHANGE MASTER instead.
Well, "possible" can have different meanings. I meant that from the user point view --connection_name.variable_name is the normal and expected way to configure per-connection variables. Consistent with other per-connection variables and with named key caches. But it is well possible that there's no existing class that implements this functionality yet.
But what about Jonas' suggestion of --slave-parallel-mode=auto?
sounds fine
Anyway, my point is mainly that parallel replication needs to be configurable, including the possibility to turn it off. I don't think we disagree on that.
No, we don't. It should be configurable.
1. I'd rename "groupcommit" to something less technical, like "master", or "following_master", or "following", (or whatever)
I agree that "groupcommit" is rather too technical. However, "following_master" is too generic, it doesn't say anything.
It means that the degree of parallelization on the slave follows the degree of parallelization on the master. If more threads are executing (committing, strictly speaking) in parallel on the master, more threads can commit in parallel on the slave. If the master is strictly single-threaded - there's only one connection doing changes - the slave will follow that and will serialize all transactions too.
The point is, we will run transactions in parallel on the slave if they _committed_ in parallel on the master. This is already a rather technical issue. The user needs to be aware that it is related to commit, as tuning options like --binlog-commit-wait-count may be needed on the master.
"following_master_commits" if you want to be really verbose, but I think a shorter version is ok too.
I guess this technical nature of the "groupcommit" feature is part of the motivation for something better with speculative replication.
In mysql 5.7, they call the corresponding feature:
--slave-parallel-type=LOGICAL_CLOCK
This doesn't say anything either. Not until you read the manual, that is. And if you do read the manual, then XYZ or DARK_VOODOO is almost equally good.
BTW, I wonder if we should use the same option name? But I suppose not, it's better to have a different name, so users will be forced to change the config, rather than silently pick up a mysql config option with the same name but possibly different semantics.
It is different, isn't it? You call it slave-parallel-mode.
How about calling the option "binlog_commit" instead, to match the related --binlog-commit-wait-* options on the master?
but it's not only about commits; a user may want to disable all parallelization, for example. Or to make sure that non-transactional updates are not *run* in parallel with anything, not even with non-committed transactions. I like your slave-parallel-mode name.
As you like. I'd simply say that in domain-based parallel replication, the user is responsible for domain independence. If he has misconfigured domains, the server is not at fault, and we should not bother covering this use case.
Yes, this is how it is currently.
However, it just seemed to me, that it would be useful if domain-based parallel replication could be turned on or off, independently of the other modes of parallel replication. There might be use cases where different domains were used, without the intention that they could be replicated out-of-order.
Sure. As you're going to have an option for selecting slave parallel mode anyway... Regards, Sergei
Sergei Golubchik <serg@mariadb.org> writes:
I didn't know that it was possible to use --connection_name.XXX to configure things, which is why I thought I needed to use CHANGE MASTER instead.
Well, "possible" can have different meanings.
I meant that from the user point view --connection_name.variable_name is the normal and expected way to configure per-connection variables. Consistent with other per-connection variables and with named key caches.
But it is well possible that there's no existing class that implements this functionality yet.
Right, understood. I like the idea of --connection_name.XXX to configure parallel replication per-connection. I will see what it takes to implement that.
I agree that "groupcommit" is rather too technical. However, "following_master" is too generic, it doesn't say anything.
It means that the degree of parallelization on the slave follows the degree of parallelization on the master. If more threads are executing (committing, strictly speaking) in parallel on the master, more threads can commit in parallel on the slave. If the master is strictly single-threaded - there's only one connection doing changes - the slave will follow that and will serialize all transactions too.
Right. Let's go with your "following_master". Thanks, - Kristian.
On Thu, Nov 13, 2014 at 4:31 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I agree that "groupcommit" is rather too technical. However, "following_master" is too generic, it doesn't say anything.
It means that the degree of parallelization on the slave follows the degree of parallelization on the master. If more threads are executing (committing, strictly speaking) in parallel on the master, more threads can commit in parallel on the slave. If the master is strictly single-threaded - there's only one connection doing changes - the slave will follow that and will serialize all transactions too.
Right. Let's go with your "following_master".
I'd say "following_master" will be very confusing, because degree of parallelization on slave won't match degree of parallelization on master in this case. Just because some commits are in different commit groups on master doesn't mean that they were executed consecutively. They may start executing at the same time, have a long execution in parallel, but be committed a few milliseconds apart and thus end up in different groups. Such commits will be executed consecutively on slave and that will take much more time combined. So it can be hardly called "following master". "follow_commits" could be more appropriate.
Pavel Ivanov <pivanof@google.com> writes:
I'd say "following_master" will be very confusing, because degree of parallelization on slave won't match degree of parallelization on master in this case. Just because some commits are in different commit
Right, that was my original concern as well. With two of us, there will probably be other users who would find it confusing.
and that will take much more time combined. So it can hardly be called "following master". "follow_commits" could be more appropriate.
Maybe "follow_master_commits", as Serg suggested? Not _too_ long, and seems to better describe what is going on. Thanks, - Kristian.
i think follow_master_commits is better than follow_commits (and better than following_master)

/Jonas

On Fri, Nov 14, 2014 at 11:31 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Pavel Ivanov <pivanof@google.com> writes:
I'd say "following_master" will be very confusing, because degree of parallelization on slave won't match degree of parallelization on master in this case. Just because some commits are in different commit
Right, that was my original concern as well. With two of us, there will probably be other users who would find it confusing.
and that will take much more time combined. So it can hardly be called "following master". "follow_commits" could be more appropriate.
Maybe "follow_master_commits", as Serg suggested? Not _too_ long, and seems to better describe what is going on.
Thanks,
- Kristian.
Agreed, follow_master_commits sounds pretty good.

On Fri, Nov 14, 2014 at 2:46 AM, Jonas Oreland <jonaso@google.com> wrote:
i think follow_master_commits is better than follow_commits (and better than following_master)
/Jonas
On Fri, Nov 14, 2014 at 11:31 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Pavel Ivanov <pivanof@google.com> writes:
I'd say "following_master" will be very confusing, because degree of parallelization on slave won't match degree of parallelization on master in this case. Just because some commits are in different commit
Right, that was my original concern as well. With two of us, there will probably be other users who would find it confusing.
and that will take much more time combined. So it can hardly be called "following master". "follow_commits" could be more appropriate.
Maybe "follow_master_commits", as Serg suggested? Not _too_ long, and seems to better describe what is going on.
Thanks,
- Kristian.
I discussed with Monty, and we came up with some more suggested changes for the options used to configure the optimistic parallel replication feature (MDEV-6676, https://mariadb.atlassian.net/browse/MDEV-6676).

The biggest change is to split up the --slave-parallel-mode into multiple options. I think that is reasonable; probably that option was doing too many things at once. Instead, we could have the following three options:

--slave-parallel-mode=all_transactions | follow_master_commits | only_commits | none

  "all_transactions" is what was called "transactional" before. The slave will try to apply all transactional DML in parallel; in case of conflicts it will roll back the later transaction and retry it.

  "follow_master_commits" is the 10.0 functionality: apply in parallel transactions that group-committed together on the master (the default).

  "only_commits" was suggested to me by a user testing parallel replication. It does not attempt to apply transactions in parallel, but still runs the commit steps in parallel, making slave group commit possible and thus saving on fsyncs if durability settings are on.

  "none" means the parallel replication code is not used (same as --slave-parallel-threads=0, but now configurable per multi-master connection). (This corresponds to the empty value in the old --slave-parallel-mode.)

--slave-parallel-domains=on|off (default on)

  This replaces the "domain" option of the old --slave-parallel-mode. When enabled, parallel replication will apply in parallel transactions whose GTIDs have different domain ids (GTID mode only).

--slave-parallel-wait-if-conflict-on-master=on|off (default on)

  When enabled, if a transaction had to do a row lock wait on the master, it will not be applied in parallel with any earlier transaction on the slave (the idea is that such a transaction is likely to get a conflict on the slave, causing a needless retry). (This was the "waiting" option to the old --slave-parallel-mode.)

These options will also be usable per multi-source master connection, like --master1.slave-parallel-mode=all_transactions. The options will also be possible to change dynamically (with SET GLOBAL), though the associated slave threads must be stopped while changing.

Also, Monty suggested renaming @@replicate_allow_parallel to

  @@SESSION.replicate_expect_conflicts=0|1 (default 0)

When this option is enabled on the master when a transaction is committed, that transaction will not be applied in parallel with earlier transactions (when --slave-parallel-mode=all_transactions). This can be used to reduce retries on the slave, if an application is about to do a transaction that is likely to cause a conflict and retry on a slave if applied in parallel with earlier transactions.

Let me know if there are any comments on these, or suggestions for changes. It is best to get these as right as possible before release (it seems the intention is to include optimistic parallel replication in 10.1), since this is the user-visible part of the feature.

With these option names, the normal way to use optimistic parallel replication would be these two options in my.cnf:

  slave_parallel_mode=all_transactions
  slave_parallel_threads=20   (or whatever)

This seems reasonable, I think. None of the other options would need to be considered except in more special cases.

- Kristian.
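P.S. A sketch of the dynamic reconfiguration mentioned above (the exact per-connection SET syntax remains to be decided):

  STOP SLAVE;
  SET GLOBAL slave_parallel_mode = 'all_transactions';
  START SLAVE;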
On Thu, Dec 4, 2014 at 5:49 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I discussed with Monty, and we came up with some more suggested changes for the options used to configure the optimistic parallel replication feature (MDEV-6676, https://mariadb.atlassian.net/browse/MDEV-6676).
The biggest change is to split up the --slave-parallel-mode into multiple options. I think that is reasonable, probably that option was doing too many things at once.
Instead, we could have the following three options:
--slave-parallel-mode=all_transactions | follow_master_commits | only_commits | none
"all_transactions" is what was called "transactional" before. The slave will try to apply all transactional DML in parallel; in case of conflicts it will roll back the later transaction and retry it.
"follow_master_commits" is the 10.0 functionality, apply in parallel transactions that group-committed together on the master (the default).
"only_commits" was suggested to me by a user testing parallel replication. It does not attempt to apply transactions in parallel, but still runs the commit steps in parallel, making slave group commit possible and thus saving on fsyncs if durability settings are on.
Does this mean that group commit will be possible if the slave is able to execute several transactions consecutively while a previous transaction commits/fsyncs? I'd suggest naming this option differently, because looking just at the list of available values it's not quite clear what the difference between follow_master_commits and only_commits could be. I don't know yet what the best name for this is. Maybe overlap_commits?
"none" means the parallel replication code is not used (same as --slave-parallel-threads=0, but now configurable per multimaster connection). (This corresponds to empty value in old --slave-parallel-mode).
--slave-parallel-domains=on|off (default on)
"This replaces the "domain" option of old --slave-parallel-mode. When enabled, parallel replication will apply in parallel transactions whose GTID has different domain ids (GTID mode only).
I don't understand what would be the meaning of combining this flag with --slave-parallel-mode. Does it mean that when this flag is on, transactions from different domains are executed at the "all_transactions" level of parallelism no matter what value --slave-parallel-mode has? What will happen if this flag is off but --slave-parallel-mode=all_transactions? I feel like you are up to something here, but implementing it using this flag is not quite right.
--slave-parallel-wait-if-conflict-on-master=on|off (default on)
When enabled, if a transaction had to do a row lock wait on the master, it will not be applied in parallel with any earlier transaction on the slave (idea is that such transaction is likely to get a conflict on the slave, causing a needless retry). (This was the "waiting" option to old --slave-parallel-mode).
Hm... The fact that a transaction did a lock wait on master doesn't mean that the conflicting transaction was committed on master, or that both of these transactions were committed close enough to even make it possible to be executed in parallel on slaves, right? Are you sure that this flag will be useful?
These options will also be usable per multi-source master connection, like --master1.slave-parallel-mode=all_transactions. The options will be possible to change dynamically also (with SET GLOBAL), though the associated slave threads must be stopped while changing.
Also, Monty suggested to rename @@replicate_allow_parallel to
@@SESSION.replicate_expect_conflicts=0|1 (default 0)
When this option is enabled on the master when a transaction is committed, that transaction will not be applied in parallel with earlier transactions (when --slave-parallel-mode=all_transactions). This can be used to reduce retries on the slave, if an application is about to do a transaction that is likely to cause a conflict and retry on a slave if applied in parallel with earlier transactions.
I think this variable will be completely useless and is not worth implementing. How will a user understand that the transaction he is about to execute is likely to conflict with other transactions committed at about the same time? I think it will be completely impossible to make that judgement, and at the same time it will put too much control over the slave's behavior into users' hands. Am I missing something? What kind of scenario do you envision this variable being used in?
Let me know if there are any comments to these or suggestions for changes. It is best to get these as right as possible before release (seems the intention is to include optimistic parallel replication in 10.1), since it is the user-visible part of the feature.
With these option names, the normal way to use optimistic parallel replication would be these two options in my.cnf:
slave_parallel_mode=all_transactions
slave_parallel_threads=20 (or whatever)
This seems reasonable, I think. None of the other options would need to be considered except in more special cases.
Hope that helps, Pavel
I still prefer "auto" as default, if you want the fine grained control, I think an optimizer_switch approach is better than adding X new config options, i.e --parallel_mode=option1=true,option3=4 don't you think that there will be new options/variants ? i do don't you think you will want to remove options/variants ? i do I don't like having the word "transaction" in the names of some modes, ALL variants will maintain transaction semantics. Having "transaction" is name of only a few sort of implies that others are not transactional... i think "optimistic" is better than "transactional" or "all_transactions". in the same spirit "conservative" could be "follow_master_commits" i don't understand what "only_commits" is, how can it prepare transaction "queue up" for group commit if only 1 is prepared in parallel ? -- though, i think all suggestions might work if they have good defaults and a solid implementation /Jonas On Thu, Dec 4, 2014 at 9:06 PM, Pavel Ivanov <pivanof@google.com> wrote:
On Thu, Dec 4, 2014 at 5:49 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I discussed with Monty, and we came up with some more suggested changes for the options used to configure the optimistic parallel replication feature (MDEV-6676, https://mariadb.atlassian.net/browse/MDEV-6676).
The biggest change is to split up the --slave-parallel-mode into multiple options. I think that is reasonable, probably that option was doing too many things at once.
Instead, we could have the following three options:
--slave-parallel-mode=all_transactions | follow_master_commits | only_commits | none
"all_transactions" is what was called "transactional" before. The slave will try to apply all transactional DML in parallel; in case of conflicts it will roll back the later transaction and retry it.
"follow_master_commits" is the 10.0 functionality, apply in parallel transactions that group-committed together on the master (the default).
"only_commits" was suggested to me by a user testing parallel replication. It does not attempt to apply transactions in parallel, but still runs the commit steps in parallel, making slave group commit possible and thus saving on fsyncs if durability settings are on.
Does this mean that group commit will be possible if the slave is able to execute several transactions consecutively while a previous transaction commits/fsyncs? I'd suggest naming this option differently, because looking just at the list of available values it's not quite clear what the difference between follow_master_commits and only_commits could be. I don't know yet what the best name for this is. Maybe overlap_commits?
i also don't understand what only_commits means - how can transactions "queue up" for group commit if only 1 is prepared in parallel?
"none" means the parallel replication code is not used (same as --slave-parallel-threads=0, but now configurable per multimaster connection). (This corresponds to empty value in old --slave-parallel-mode).
--slave-parallel-domains=on|off (default on)
"This replaces the "domain" option of old --slave-parallel-mode. When enabled, parallel replication will apply in parallel transactions
whose
GTID has different domain ids (GTID mode only).
I don't understand what would be the meaning of combining this flag with --slave-parallel-mode. Does it mean that when this flag is on, transactions from different domains are executed at the "all_transactions" level of parallelism no matter what value --slave-parallel-mode has? What will happen if this flag is off but --slave-parallel-mode=all_transactions?
I feel like you are up to something here, but implementing it using this flag is not quite right.
--slave-parallel-wait-if-conflict-on-master=on|off (default on)
When enabled, if a transaction had to do a row lock wait on the master, it will not be applied in parallel with any earlier transaction on the slave (idea is that such transaction is likely to get a conflict on the slave, causing a needless retry). (This was the "waiting" option to old --slave-parallel-mode).
Hm... The fact that a transaction did a lock wait on master doesn't mean that the conflicting transaction was committed on master, or that both of these transactions were committed close enough to even make it possible to be executed in parallel on slaves, right? Are you sure that this flag will be useful?
These options will also be usable per multi-source master connection, like --master1.slave-parallel-mode=all_transactions. The options will be possible to change dynamically also (with SET GLOBAL), though the associated slave threads must be stopped while changing.
Also, Monty suggested to rename @@replicate_allow_parallel to
@@SESSION.replicate_expect_conflicts=0|1 (default 0)
When this option is enabled on the master when a transaction is committed, that transaction will not be applied in parallel with earlier transactions (when --slave-parallel-mode=all_transactions). This can be used to reduce retries on the slave, if an application is about to do a transaction that is likely to cause a conflict and retry on a slave if applied in parallel with earlier transactions.
I think this variable will be completely useless and is not worth implementing. How will a user understand that the transaction he is about to execute is likely to conflict with other transactions committed at about the same time? I think it will be completely impossible to make that judgement, and at the same time it will put too much control over the slave's behavior into users' hands. Am I missing something? What kind of scenario do you envision this variable being used in?
Let me know if there are any comments to these or suggestions for changes. It is best to get these as right as possible before release (seems the intention is to include optimistic parallel replication in 10.1), since it is the user-visible part of the feature.
With these option names, the normal way to use optimistic parallel replication would be these two options in my.cnf:
slave_parallel_mode=all_transactions
slave_parallel_threads=20 (or whatever)
This seems reasonable, I think. None of the other options would need to be considered except in more special cases.
Hope that helps, Pavel
i don't understand what "only_commits" is - how can transactions "queue up" for group commit if only 1 is prepared in parallel?
Right, so this is an important optimisation; let me try to explain it here, and then I'll answer the other points in a different (shorter) mail for clarity.

Parallel replication in MariaDB works by having the SQL driver thread schedule each transaction on a pool of worker threads in a round-robin fashion:

  worker1  worker2  worker3  worker4
  T1       T2       T3       T4
  T5       T6       T7       T8
  T9       ...

So in this sense, transactions are always applied in parallel threads when @@slave_parallel_threads > 0 and @@slave_parallel_mode != none. But one transaction may need to wait for another at various points to avoid conflicts. The various modes of parallel replication differ in how they do these waits.

Considering a transaction to be applied, there are three points of interest during its replication:

  T: BEGIN;       <P1: transaction start>
     INSERT ...
     UPDATE ...
                  <P2: just before commit>
     COMMIT;
                  <P3: just after commit>

The COMMIT step involves updating the slave GTID position and writing to the binlog (if --log-slave-updates), as well as the InnoDB commit proper.

So consider now transactions T1 and T2, where T1 committed first on the master:

1. If T1 and T2 have different replication domains, and --slave-parallel-domains is ON, then they are applied in parallel without any restrictions; T2 does not wait for T1 at any of the points. This is out-of-order replication; T2 is allowed to commit before T1 on the slave.

2. Otherwise, we have in-order parallel replication: T2 will always commit after T1 on the slave. Thus, during commit, T2 will wait at its point P3 for T1 to reach P3 first. This wait is done in a way such that T1 can do a group commit of both T1 and T2 simultaneously.

3. If we have --slave-parallel-mode=only_commits, then T2 will at point P1 wait for T1 to reach point P2. This means that no query of T2 can run in parallel with any query of T1. However, all of T2 can run in parallel with the COMMIT step of T1. In particular, the COMMIT step of T2 can run in parallel with the COMMIT of T1; this is what makes it possible to do group commit in MariaDB parallel replication, even without running the queries themselves in parallel.

(If there is an actual row lock conflict between T1 and T2 (maybe they update the same row), then T2 will wait for T1 inside InnoDB, and group commit will not be possible. However, as long as T1 has reached its commit step, it is safe to start T2; T2 will not be able to cause T1 to wait or fail, at most T2 will just have to wait.)

4. Consider now --slave-parallel-mode=follow_master_commits. In this case, if T1 and T2 were group-committed together on the master (same commit_id in the GTID), then at point P1 T2 will not wait for T1; T1 and T2 will be allowed to run in parallel. If we have two different group commits (T1 T2 T3) and (T4 T5) on the master, then at point P1, T4 and T5 will wait for all of T1, T2, and T3 to reach their point P2. Since T4 and T5 were group-committed together, they are conflict-free and safe to run in parallel. But we do not know if they might conflict with T1, T2, or T3, so we make sure those have reached the COMMIT step first.

5. In --slave-parallel-mode=all_transactions, we relax this even further. T2 will _not_ wait at point P1, even if it has a different commit id from T1. This is the new optimistic parallel replication mode. In this case, we might get a conflict, if first T2 modifies a row and then later T1 needs to modify this row. The result will be a deadlock: T2 gets to point P3 and waits for T1 to commit, but T1 is inside some query waiting for T2 to commit and release its row locks.
The deadlock is detected, and T2 is killed and rolled back, so that T1 can proceed. We then retry transaction T2. Before retrying T2, we wait for T1 to reach its point P3 first (the idea is that we now _know_ that T1 and T2 conflict, so there seems little point in trying to do them in parallel).

(A conflict can also be detected as a different kind of error, for example a duplicate key violation if T1 deletes a row and T2 inserts the same row. Such errors are handled the same way.)

6. If T2 is not transactional (eg. a MyISAM update), then it is not safe to attempt to run it in parallel with T1. So at point P1 in T2, we wait for T1 to reach point P3. (It would be enough to wait for all prior transactions to reach their point P2, but this would be more complex (=costly) to keep track of, and in optimistic parallel replication we would expect few MyISAM updates, if any.) Note that a following InnoDB T3 could run in parallel with both T1 and T2.

7. If T2 is DDL, then it is not safe to run it in parallel with either T1 or a following T3 (for example, T2 might ALTER from InnoDB to MyISAM a table modified by T1 or T3). In this case, T2 will at point P1 wait for all prior transactions to reach their point P2, and T3 will likewise at point P1 wait for T2 to reach its point P2.

-----------------------------------------------------------------------

So those are the details of how the different modes actually work.

The optimistic parallel replication approach looks really promising, I think. It has the potential to use _all_ the parallelism available in the replication stream, also more than what was possible on the master. There are two main limitations:

(A) Whenever parallel apply is not possible due to a conflict, we run the risk of having to roll back and retry a transaction. So if this happens too frequently, the cost of this might outweigh the benefit from parallel apply.

(B) We commit in-order. If we have 10 worker threads, this means that we cannot start T11 before T1 has completed - all the worker threads will be occupied with the transactions T1, ..., T10. Thus, while some long-running transaction T1 is being replicated, we can at most work N transactions ahead, where N is the value of --slave-parallel-threads.

What I want to achieve is that optimistic parallel replication will work well out-of-the-box in most cases, _and_ to provide some way for the DBA to tune it to overcome limitations (A) and (B) in the special cases where that will be needed. This is the motivation for the extra options (which I agree tend to look a bit out of place).

For example, an application may have a hot-spot like a single row that is updated by a lot of transactions. Such a row could become even more hot on an optimistic parallel slave, where retries will increase the contention on the row (limitation (A)). So we can try to detect such hot rows on the master (--slave-parallel-wait-if-conflict-on-master). And we can allow the application to explicitly declare such hot-spot updates (@@SESSION.replicate_expect_conflicts). Or imagine a common operation like:

  CREATE TABLE t1 ( ... );
  SET SESSION replicate_expect_conflicts = 1;
  INSERT INTO t1 VALUES (1, ...);
  INSERT INTO t1 VALUES (2, ...);
  INSERT INTO t1 VALUES (3, ...);
  ...

All the inserts are very likely to fail if attempted in parallel with the CREATE TABLE. If this becomes a bottleneck, it seems useful to have a way for the application to work around it.

For limitation (B), I provide the --slave-parallel-domains option.
If a long-running operation would block the slave for too long, it can be put into a different replication domain:

  SET SESSION gtid_domain_id=2;
  ALTER TABLE large_table ... ;

Then the operation can run freely in parallel with other operations. But then the application / DBA is responsible for making sure that the operation has completed on all slaves before attempting any following operations that might conflict with it.
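To summarise the wait rules in one place (T2 is the later transaction; P1/P2/P3 as defined above):

  only_commits:           at P1, T2 waits for T1 to reach P2 (queries serialised, commits overlap)
  follow_master_commits:  like only_commits, except no P1 wait when T1 and T2 share a commit id
  all_transactions:       no P1 wait at all; a conflict causes rollback and retry of T2
  all in-order modes:     at P3, T2 waits for T1 to reach P3 (commit order preserved)
  non-transactional T2:   at P1, T2 waits for T1 to reach P3
  DDL T2:                 at P1, T2 waits for all prior transactions to reach P2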
I still prefer "auto" as default,
Indeed, I agree, that is where I want to go as well. I guess I'm just being a bit humble here; this is new stuff, and frankly not well tested in practice yet. I feel that I need to start with something that has fine-grained control, to get some real-life experience about what works - and that will make it easier for me to understand how an "auto" mode should actually be implemented.

(But I do like your earlier suggestion of just creating the "auto" option now to mean something - and then that can be later refined. I will try to get that into the first version somehow.)

For me, I'm still at the stage where I want to just understand if the optimistic approach is even semantically correct, before being ready to truly consider the finer points of tuning. But it should work - and even if we manage to find some corner cases where it breaks, it seems it should be possible to handle them by forcing serialisation, similar to the DDL case.

MySQL/MariaDB has the property of fully supporting the 'I' in ACID, unlike some other popular databases - this is what makes statement-based replication possible. Given some arbitrary transactions that run in parallel, and assuming that they commit in the order T1, T2, T3, ..., we then know that if we were to run those transactions _serially_, we would end up with the exact same state of the database, no matter in what order the operations in T1, T2, T3, ... were originally run.

On the parallel slave, this means that as long as we commit T1, T2, T3, ... in the same order as was done on the master, we will get the same results - if the execution of the transactions is successful. If there is a conflict, we might get a failure or a deadlock, but silently wrong data is not possible. So this is what should make the optimistic approach correct, as long as we handle (with rollback and retry) any failures and deadlocks.

Thanks for reading so far ;-)

- Kristian.
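P.S. To make the serialisation argument concrete with a toy example (hypothetical table and statements): suppose T1 and T2 both update the same row and commit on the master in the order T1, T2:

  T1: UPDATE t SET a = a + 1 WHERE id = 1;
  T2: UPDATE t SET a = a * 2 WHERE id = 1;

Whatever interleaving happened on the master, isolation means the outcome is as if T1 ran entirely before T2. A slave that commits in the order T1, T2 therefore reaches the same final value of a; if it tries them in parallel, a conflict surfaces as a lock wait or deadlock, never as silently different data.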
Pavel Ivanov <pivanof@google.com> writes:
Does this mean that group commit will be possible if slave is able to execute several transactions consecutively while previous transaction commits/fsyncs?
Yes. The following transaction is in all cases allowed to start as soon as the prior transaction reaches its COMMIT event; see the previous mail for details.
I'd suggest to name this option differently because looking just at the list of available values it's not quite clear what could be the difference between follow_master_commits and only_commits. I don't know yet what is the best name for this. Maybe overlap_commits?
I think overlap_commits sounds good, thanks for the suggestion.
--slave-parallel-domains=on|off (default on)
"This replaces the "domain" option of old --slave-parallel-mode. When enabled, parallel replication will apply in parallel transactions whose GTID has different domain ids (GTID mode only).
I don't understand what would be the meaning of combining this flag with --slave-parallel-mode. Does it mean that when this flag is on transactions from different domains are executed on "all_transactions" level of parallelism no matter what value --slave-parallel-mode has? What will happen if this flag off but --slave-parallel-mode=all_transactions?
These apply on two different levels.

With --slave-parallel-domains=on, each replication domain is replicated as a completely independent stream, similar to different multi-source replication slaves. The position in each stream is tracked with its own GTID in gtid_slave_pos, and one stream can be arbitrarily ahead of another.

The --slave-parallel-mode applies within each stream. Within one stream, commits are strictly ordered, and --slave-parallel-mode specifies how much parallelism is attempted.

The --slave-parallel-mode can be set to any value, and the server is responsible for ensuring that replication works correctly. In contrast, when using --slave-parallel-domains, it is the user's/DBA's responsibility to ensure that replication domains are set up correctly so that no conflict can occur between them.
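For example, a my.cnf combining the two levels might look like this (a sketch, using the option names proposed earlier in the thread):

  [mysqld]
  slave-parallel-domains=on             # independent streams per domain_id (user guarantees no conflicts)
  slave-parallel-mode=all_transactions  # optimistic parallelism within each stream
  slave-parallel-threads=10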
I feel like you are up to something here, but implementing it using this flag is not quite right.
Can you elaborate? --slave-parallel-domains controls whether we have one stream or many. --slave-parallel-mode controls what happens inside each stream. Any suggestions for how to clarify?
Hm... The fact that a transaction did a lock wait on master doesn't mean that the conflicting transaction was committed on master, or that both of these transactions were committed close enough to even make it possible to be executed in parallel on slaves, right? Are you sure that this flag will be useful?
Right, and no, I'm not sure. Testing will be needed to have a better idea. If two short transactions T1 and T2 conflict on a row, T2 is quite likely to commit just after T1, and thus likely to conflict on the slave. So there is some rationale behind this.
@@SESSION.replicate_expect_conflicts=0|1 (default 0)
I think this variable will be completely useless and is not worth implementing. How user will understand that the transaction he is about to execute is likely to conflict with another transactions committed at about the same time? I think it will be completely impossible to do that judgement, at the same time it will give too much impact on the slave's behavior into users' hands. Am I missing something? What kind of scenario you are envisioning this variable to be used in?
My main worry with optimistic parallel replication is whether too many conflicts and retries on the slave will outweigh the performance gained from parallelism. If this does not happen, I feel it will be awesome. So I was very focused on what to do if we _do_ get a lot of conflicts. So I wanted to give advanced users the possibility to work around hotspot rows, basically, if necessary - like a single row that is updated very frequently.

I did not think that this was allowing users much impact on the slave's behaviour. This option is only a heuristic; it controls how aggressively the slave will try to parallelise, but it cannot affect correctness. And the user already has a lot of ways to affect parallelism in optimistic parallel replication. For example, imagine lots of transactions like this executed serially on the master:

  UPDATE t1 SET a=a+1 WHERE id=0;
  UPDATE t1 SET a=a+1 WHERE id=0;
  UPDATE t1 SET a=a+1 WHERE id=0;
  ...

All of these would conflict on a slave. It seems likely to cause O(N**2) transaction retries on the slave for --slave-parallel-mode=all_transactions --slave-parallel-threads=N. So the idea was that the user can already cause trouble for parallelism on the slave; @@replicate_expect_conflicts is intended for the poweruser to be able to hint the slave at how to get less trouble.

But I'm open to changing it, if you think it's important. Your perspective is rather different from my usual point of view, which is useful input.

Jonas Oreland <jonaso@google.com> writes:
I still prefer "auto" as default,
Right... I really want "normal" users to be able to just enable "auto" and have things work reasonably. And I really need fine-grained control to enable testing various combinations in real life, to better understand how to implement "auto". I need to find a good way to combine these...
if you want the fine grained control, I think an optimizer_switch approach is better than adding X new config options, i.e. --parallel_mode=option1=true,option3=4

don't you think that there will be new options/variants? i do
don't you think you will want to remove options/variants? i do
Good point.

I think --slave-parallel-domains is reasonable. This is a question of semantics; the user must explicitly request it, as it can break replication if not used correctly by applications. But the other options are less clear. They should all behave equally correctly. Maybe something like this:

1. --slave-parallel-domains is off by default, can be enabled if the application is written to guarantee no conflicts.

2. --parallel-slave=on|off. This defines if parallelism will be done within each replication stream.

3. --slave-parallel-threads=N. This could default to 3*#CPUs or whatever if --parallel-slave is enabled on at least one multi-master slave.

4. --slave-parallel-mode=auto | <lots of fine-grained options>. Defaults to "auto".

Then a "normal" user will only need to say --parallel-slave=on. I then need to consider how this affects backwards compatibility with 10.0...
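For illustration, under that scheme the typical my.cnf would reduce to just (a sketch of the proposal above, nothing is decided):

  [mysqld]
  parallel-slave=on
  # slave-parallel-mode defaults to auto; slave-parallel-threads
  # could default to e.g. 3*#CPUs when parallel-slave is enabled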
I don't like having the word "transaction" in the names of some modes, ALL variants will maintain transaction semantics. Having "transaction" is name of only a few sort of implies that others are not transactional...
i think "optimistic" is better than "transactional" or "all_transactions". in the same spirit "conservative" could be "follow_master_commits"
Nice, I really like "conservative" and "optimistic". And maybe "only_commits" isn't really useful; there seems little reason not to at least use the "conservative" mode, as it shouldn't cause any problems over "only_commits". So slave-parallel-mode = none | conservative | optimistic, perhaps...
i don't understand what "only_commits" is - how can transactions "queue up" for group commit if only 1 is prepared in parallel?
The following transaction T2 is started as soon as T1 sees its COMMIT event. So if T2 can reach its own commit while T1 is still waiting for LOCK_log (or --binlog-commit-wait-usec), this is possible.
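For example, on a slave with --log-slave-updates, one could encourage such grouping with the existing binlog options on the slave (the values are only examples):

  binlog_commit_wait_count=10     # try to group up to 10 commits together
  binlog_commit_wait_usec=100000  # but delay each commit at most 100ms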
Though, I think all suggestions might work if they have good defaults and a solid implementation.
Right, I'd like to get the interface right from the start, but too much bikeshedding is also counterproductive. Thanks, Pavel and Jonas, for your comments! - Kristian.
On Mon, Dec 8, 2014 at 6:45 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
--slave-parallel-domains=on|off (default on)
"This replaces the "domain" option of old --slave-parallel-mode. When enabled, parallel replication will apply in parallel transactions whose GTID has different domain ids (GTID mode only).
I don't understand what would be the meaning of combining this flag with --slave-parallel-mode. Does it mean that when this flag is on, transactions from different domains are executed at the "all_transactions" level of parallelism no matter what value --slave-parallel-mode has? What will happen if this flag is off but --slave-parallel-mode=all_transactions?
These apply on two different levels. With --slave-parallel-domains=on, each replication domain is replicated as a completely independent stream, similar to the streams of different multi-source replication connections. The position in each stream is tracked with its own GTID in gtid_slave_pos, and one stream can be arbitrarily ahead of another.
This is not entirely true, right? Let's say master binlog has transactions T1.1, T1.2, T1.3, T1.4, T2.1, T1.5, T2.2 (where T1.* have domain_id = 1 and T2.* have domain_id = 2) and slave has 3 parallel threads. Then as I understand threads will be assigned to execute T1.1, T1.2 and T1.3. T2.1 won't be scheduled to execute until these 3 transactions (or at least 2 of them T1.1 and T1.2) have been committed. So streams from different domains are not completely independent, right?
The --slave-parallel-mode applies within each stream. Within one stream, commits are strictly ordered, and --slave-parallel-mode specifies how much parallelism is attempted.
The --slave-parallel-mode can be set to any value, and the server is responsible for ensuring that replication works correctly. In contrast, with --slave-parallel-domains it is the user's/DBA's responsibility to ensure that replication domains are set up correctly, so that no conflict can occur between them.
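For illustration, an application might partition its writes on the master like this (just a sketch; the schema split is an example of an application-level guarantee, not something the server checks):

    -- connections that write only to schema_a:
    SET SESSION gtid_domain_id = 1;
    UPDATE schema_a.t1 SET a = a + 1 WHERE id = 7;
    -- connections that write only to schema_b:
    SET SESSION gtid_domain_id = 2;
    UPDATE schema_b.t1 SET b = b + 1 WHERE id = 9;
    -- a slave with --slave-parallel-domains=on may apply the two domains
    -- in parallel and out-of-order with respect to each other.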
I feel like you are up to something here, but implementing it using this flag is not quite right.
Can you elaborate? --slave-parallel-domains controls whether we have one stream or many. --slave-parallel-mode controls what happens inside each stream. Any suggestion how to clarify?
As I pointed out above, the streams from multiple domains are completely independent only when they are coming from multiple masters. When they come from a single master they are not completely independent, and that creates confusion (at least for me) about how these options work together in that case. I guess a big question I want to ask: why would someone want to use multiple domains together with slave-parallel-domains = off? If it's a kind of kill-switch to turn off the multi-domain feature completely if it causes trouble for some reason, then I don't think it is baked in deep enough to actually work like that. But I don't understand what else it could be used for.
Hm... The fact that a transaction did a lock wait on the master doesn't mean that the conflicting transaction was committed on the master, or that both of these transactions were committed close enough together to even make it possible for them to be executed in parallel on slaves, right? Are you sure that this flag will be useful?
Right, and no, I'm not sure. Testing will be needed to have a better idea.
If two short transactions T1 and T2 conflict on a row, T2 is quite likely to commit just after T1, and thus likely to conflict on the slave. So there is some rationale behind this.
Right. For normal slaves T2 should be committed quickly after T1; for slaves catching up from far behind, T2 should be committed in close proximity to T1 (the distance should be less than slave-parallel-threads). Both seem to be very narrow use cases to make it worth adding a flag that can significantly hurt the majority of other use cases. I think this feature will be useful only if the master will somehow leave information about which transaction T2 was in conflict with, and then the slave would make sure that T2 is not started until T1 has finished. Though this sounds over-complicated already.
@@SESSION.replicate_expect_conflicts=0|1 (default 0)
I think this variable will be completely useless and is not worth implementing. How will a user understand that the transaction he is about to execute is likely to conflict with other transactions committed at about the same time? I think it will be completely impossible to make that judgement; at the same time, it will put too much influence over the slave's behavior into users' hands. Am I missing something? What kind of scenario are you envisioning this variable to be used in?
My main worry with optimistic parallel replication is that too many conflicts and retries on the slave will outweigh the performance gained from parallelism. If this does not happen, I feel it will be awesome. So I was very focused on what to do if we _do_ get a lot of conflicts. So I wanted to give advanced users the possibility to work around hotspot rows, basically, if necessary. Like a single row that is updated very frequently.
I did not think that this was allowing users much impact on the slave's behaviour. This option is only a heuristic: it controls how aggressively the slave will try to parallelise, but it cannot affect correctness. And the user already has a lot of ways to affect parallelism in optimistic parallel replication.
For example, imagine lots of transactions like this executed serially on the master:
    UPDATE t1 SET a=a+1 WHERE id=0;
    UPDATE t1 SET a=a+1 WHERE id=0;
    UPDATE t1 SET a=a+1 WHERE id=0;
    ...
All of these would conflict on a slave. With --slave-parallel-mode=all_transactions and --slave-parallel-threads=N, this seems likely to cause O(N**2) transaction retries on the slave.
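Back-of-the-envelope: with N workers speculatively applying N consecutive conflicting updates, the transaction in position k can be rolled back and retried once for each of the k-1 transactions still ahead of it, so a window of N transactions can cost on the order of 1 + 2 + ... + (N-1) = N(N-1)/2 retries.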
So the idea was that the user can already cause trouble for parallelism on the slave; @@replicate_expect_conflicts is intended for the power user, as a way to hint the slave at how to get into less trouble.
But I'm open to change it, if you think it's important. Your perspective is rather different from my usual point of view, which is useful input.
I understand everything that you say, but I think the difference between our views is that you consider DBAs and database users to be mostly the same people or two small groups sitting in the same room and easily communicating with each other. For me that's not true. For me, DBAs are a distinct group of people who can sit in a different city from users, and who may not be able to communicate with users at all, because there are hundreds of them and it's not clear to whom some particular actions belong.

So when you say "the user already has a lot of ways to affect parallel replication" it translates to me as "there are certain workloads when parallel replication will behave slower than sequential". Yes, I agree with that. If I meet such a workload I will have to turn off the parallel replication, or I (with your help) will have to find some generic improvement to make parallel replication work better with such a workload too. And I want to underline that: the improvement should be _generic_, it should work for all users and shouldn't involve any changes on the users' side.

When you try to give users a variable that may give them control over the treatment of hot rows, for me it means you create a tool that may be misused by some users because they read something on the internet, misunderstood it, love doing stupid things, or whatever else. And so at some point we may wonder why the parallel replication that we worked so hard to set up doesn't actually work, and then find that it's just because of users' misbehavior. Besides, when such hot rows are found in production, it may not be that easy to modify users' code to add the setting of this variable, because there may be many different user groups involved, and adding the variable in just half of them won't work...

So overall I don't think this variable will be useful for large installations.

Thank you,

Pavel
Pavel Ivanov <pivanof@google.com> writes:
This is not entirely true, right? Let's say master binlog has transactions T1.1, T1.2, T1.3, T1.4, T2.1, T1.5, T2.2 (where T1.* have domain_id = 1 and T2.* have domain_id = 2) and slave has 3 parallel threads. Then as I understand threads will be assigned to execute T1.1, T1.2 and T1.3. T2.1 won't be scheduled to execute until these 3 transactions (or at least 2 of them T1.1 and T1.2) have been committed. So streams from different domains are not completely independent, right?
One can use --slave-domain-parallel-threads to limit the number of threads that one domain in one multi-source connection can reserve. By default, things work as in your example. With eg. --slave-parallel-threads=3 --slave-domain-parallel-threads=2, two threads will be assigned to run T1.1, T1.2, T1.3, and T1.4, and one free thread will remain to run T2.1 in parallel with them.
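In config-file terms, the behaviour described above would be (a sketch; the comments are my reading of the current implementation, not documentation):

    slave_parallel_threads = 3         # global worker pool
    slave_domain_parallel_threads = 2  # cap per domain within one connection
    # domain 1 can reserve at most 2 workers (T1.1..T1.4 applied in commit
    # order across them), leaving 1 worker free to run T2.1 in parallel.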
As I pointed out above, the streams from multiple domains are completely independent only when they are coming from multiple masters. When they come from a single master they are not completely independent, and that creates confusion (at least for me) about how these options work together in that case.
It's the same for multiple masters as for single masters. There is a global pool of --slave-parallel-threads=N threads. One domain in one master connection can allocate up to --slave-domain-parallel-threads=M of those to apply transactions in-order. If one master connection is using all available threads, other connections will stall :-/ So the thread management is pretty basic in the current version of parallel replication. And domain-based parallel replication is not as easy to use as I would like. Hopefully this can be improved in a later version.
I guess a big question I want to ask: why would someone want to use multiple domains together with slave-parallel-domains = off? If it's a kind of kill-switch to turn off the multi-domain feature completely if it causes trouble for some reason, then I don't think it is baked in deep enough to actually work like that. But I don't understand what else it could be used for.
The original motivation for replication domains is multi-source replication. Suppose we have M1->S1, M2->S1, S1->S2, S1->S3:

    M1 --\          /---S2
          +-- S1 ---+
    M2 --/          \---S3

Configuring different domains for M1 and M2 is necessary to be able to reconfigure the replication hierarchy, for example to M1->S2, M2->S2; or to S2->S3:

    M1 --\   /---S2
          +--+
    M2 --/   \---S1 ---S3

    M1 --\          /---S2 ---S3
          +-- S1 ---+
    M2 --/

This requires a way to track the position in the binlog streams of M1 and M2 independently, hence the need for domain_id. The domains can also be used for parallel replication; this is needed to allow S2 and S3 to have the same parallelism as S1. However, this kind of parallel replication requires support from the application to avoid conflicts. Now concurrent changes on M1 and M2 have to be conflict-free not just on S1, but on _all_ slaves in the hierarchy. I think that such a feature, which can break replication unless the user carefully designs the application to avoid it, requires a switch to turn it on or off.
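(For reference, the domain setup itself is just the standard GTID configuration; a sketch:

    # in M1's my.cnf:
    gtid_domain_id = 1
    # in M2's my.cnf:
    gtid_domain_id = 2

Each master then stamps its binlog events with its own domain, so S1, S2 and S3 can track the two streams independently.)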
Both seem to be very narrow use cases to make it worth adding a flag that can significantly hurt the majority of other use cases. I think
I see your point. Another thing that makes the use case even narrower is that it will be kind of random if we actually get the lock wait in T2 on the master. So even if delaying T2 would significantly improve performance on the slave, it is not a reliable mechanism.
this feature will be useful only if the master will somehow leave information about which transaction T2 was in conflict with, and then the slave would make sure that T2 is not started until T1 has finished. Though this sounds over-complicated already.
Yeah, it does. What I really need is to get some results from testing optimistic parallel replication, to understand how many retries will be needed in various scenarios, and if those retries are a bottleneck for performance.
I understand everything that you say, but I think the difference between our views is that you consider DBAs and database users to be mostly the same people or two small groups sitting in the same room and easily communicating with each other. For me that's not true. For
(I meant getting your input has made me think more about use cases that are different from something like eg. Facebook, with a single carefully controlled application and a team of highly skilled database developers. Which is where I come from originally, though we were a _lot_ smaller than Facebook, obviously :-) If I understand correctly, your use case is one of a team of highly skilled DBAs (for lack of a better name) managing a database service used by a lot of users, each with their own applications, and each without high database skills. Another use case is users that run a single application on their own hosting of MariaDB, but use the database as a commodity without wanting to invest many resources in acquiring detailed MariaDB skills. These use cases are probably a lot more common than "facebook-like" applications.
So when you say "the user already has a lot of ways to affect parallel replication" it translates to me as "there are certain workloads when parallel replication will behave slower than sequential". Yes, I agree with that. If I meet such a workload I will have to turn off the parallel replication, or I (with your help) will have to find some generic improvement to make parallel replication work better with such
Right, point taken.
So overall I don't think this variable will be useful for large installations.
I agree it will not be useful for you, nor for the majority of other users. The question is whether --slave-parallel-wait-if-conflict-on-master and @@replicate_expect_conflicts are sufficiently useful for a minority of users to be worth including in _some_ capacity. From the input I have gotten so far, I think I will remove them, unless someone chimes in with a different point of view (we can always add them back later in some form, if they actually turn out to be needed in real-life testing).

That would leave the following options (in one form or another):

    --master_connection.slave-parallel-mode = conservative | optimistic
    --slave-parallel-threads = N
    --master_connection.parallel-slave = on|off
    --master_connection.slave-parallel-domains = on|off

I am not yet confident enough in the code to not provide a way to disable the new optimistic mode. And the fixed-size thread pool, while limited, is what we have for now. And being able to enable/disable in-order and out-of-order parallel replication independently, on a per-master-connection basis, seems useful.

Or can we do better?

Once again, thanks for taking the time to help me improve this important aspect of the parallel replication.

Thanks,

 - Kristian.
Hi there again. I didn't read the full reply, sorry for that, but still wanted to say that even if @@replicate_expect_conflicts might only be useful to a very small minority of users, I think it might be worth implementing... if it's not too hard/time-consuming...

I would though (of course!?!) suggest a different name: @@block-parallel-slave-execution=1, which I think should be advisory, even if the name suggests otherwise. And the implementation of this is that the events are marked with such a bit... and the slave applier will obey it unless slave-parallel-mode=aggressive (and 'aggressive' could be yet another mode of slave-parallel-mode).

/Jonas, the automatic option-name-spam-bot
Jonas Oreland <jonaso@google.com> writes:
I didn't read the full reply, sorry for that, but still wanted to say that even if @@replicate_expect_conflicts might only be useful to a very small minority of users, I think it might be worth implementing... if it's not too hard/time-consuming...
It is trivial to implement (and I already did).
I would though (of course!?!) suggest a different name: @@block-parallel-slave-execution=1, which I think should be advisory, even if the name suggests otherwise.
A similar name but different wording is @@skip_parallel_replication. This is consistent with @@skip_replication, which is also advisory (a slave option controls whether it takes effect or not).
and the implementation of this is that the events are marked with such a bit... and the slave applier will obey it unless slave-parallel-mode=aggressive (and 'aggressive' could be yet another mode of slave-parallel-mode)
I like 'aggressive' to control this on the slave side. Thanks, - Kristian.
On Tue, Dec 9, 2014 at 12:17 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Pavel Ivanov <pivanof@google.com> writes:
This is not entirely true, right? Let's say master binlog has transactions T1.1, T1.2, T1.3, T1.4, T2.1, T1.5, T2.2 (where T1.* have domain_id = 1 and T2.* have domain_id = 2) and slave has 3 parallel threads. Then as I understand threads will be assigned to execute T1.1, T1.2 and T1.3. T2.1 won't be scheduled to execute until these 3 transactions (or at least 2 of them T1.1 and T1.2) have been committed. So streams from different domains are not completely independent, right?
One can use --slave-domain-parallel-threads to limit the number of threads that one domain in one multi-source connection can reserve. By default, things work as in your example. With eg. --slave-parallel-threads=3 --slave-domain-parallel-threads=2, two threads will be assigned to run T1.1, T1.2, T1.3, and T1.4, and one free thread will remain to run T2.1 in parallel with them.
So the slave coordinator (or I don't remember what you call it) reads the relay log ahead of the last executing transaction? I.e. it will read and assign to threads T1.1, T1.2, then it will read T1.3, detect that there are no threads available for execution, but according to what you said it will still put it in the queue for thread 1, right? How long can this queuing be? Does it keep all queued events in memory? Does it depend on the size of the transactions (i.e. how much memory can it consume by this queuing)?
The domains can also be used for parallel replication; this is needed to allow S2 and S3 to have the same parallelism as S1. However, this kind of parallel replication requires support from the application to avoid conflicts. Now concurrent changes on M1 and M2 have to be conflict-free not just on S1, but on _all_ slaves in the hierarchy.
I think that such a feature, which can break replication unless the user carefully designs the application to avoid it, requires a switch to turn it on or off.
Could there really be cases when multi-domain parallel application of transactions is safe on S1, but not safe on S2 or S3?
What I really need is to get some results from testing optimistic parallel replication, to understand how many retries will be needed in various scenarios, and if those retries are a bottleneck for performance.
Then I'd suggest to not add any special processing for such a use case, but to add something that will allow easy monitoring of what happens. E.g. some status variables which could be plotted over time and would show (or at least hint at) whether this is a significant bottleneck for performance or not. This could be something like: total time (in both wall time and accumulated CPU time) spent executing transactions in parallel; time spent rolling back transactions due to this lock conflict; time spent rolling back transactions because of other reasons (e.g. due to STOP SLAVE or reconnect after master crash); maybe also time spent waiting in one parallel thread while a transaction is executing in another thread; etc.

Pavel
Pavel Ivanov <pivanof@google.com> writes:
So the slave coordinator (or I don't remember what you call it) reads the relay log ahead of the last executing transaction? I.e. it will read and assign to threads T1.1, T1.2, then it will read T1.3, detect that there are no threads available for execution, but according to what you said it will still put it in the queue for thread 1, right? How long can this queuing be? Does it keep all queued events in memory? Does it depend on the size of the transactions (i.e. how much memory can it consume by this queuing)?
Right. The queueing is limited by the configuration variable --slave-parallel-max-queued, which defaults to 128KB per worker thread. It does not depend on the size of the transactions (it is possible to replicate a large transaction without keeping all of it in memory at once). It does need to keep at least one event in-memory per worker thread, of course, even if an individual event exceeds --slave-parallel-max-queued. So memory consumption for queued events is generally limited by @@slave_parallel_threads * @@slave_parallel_max_queued.
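As a back-of-the-envelope example of that bound (default values, my own arithmetic):

    slave_parallel_threads    = 20
    slave_parallel_max_queued = 131072   # 128KB per worker, the default
    # => queued-event memory is bounded by roughly 20 * 128KB = 2.5MB,
    #    plus at least one in-flight event per worker, which may exceed
    #    the per-worker cap for large individual events.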
I think that such a feature, which can break replication unless the user carefully designs the application to avoid it, requires a switch to turn it on or off.
Could there really be cases when multi-domain parallel application of transactions is safe on S1, but not safe on S2 or S3?
Such cases can definitely be constructed. For example, suppose S2 is stopped. The user does a long ALTER TABLE t1 on M1, carefully waits for that ALTER to complete on M1 and on S1, then starts doing DML against table t1 on M2. Then, when S2 is restarted, it seems likely that it will start executing the DML in domain 2 before the ALTER in domain 1 has completed, which can break the replication. I do agree that in practice, something that breaks domain-based parallel replication on S2 and S3 is likely to also be able to cause problems on S1. On the other hand, there does not seem to be much harm in providing a switch to turn domain-based parallel replication on or off. (Such a mechanism has to be implemented anyway, as it can only be used in GTID mode; in non-GTID mode, domain-based parallel replication is always off.)
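A sketch of that scenario in SQL (assuming M1 uses domain 1 and M2 domain 2, as in the diagrams above; table and column names made up):

    -- on M1 (gtid_domain_id=1), while S2 is stopped:
    ALTER TABLE t1 ADD COLUMN b INT;
    -- after the ALTER has completed on M1 and S1, on M2 (gtid_domain_id=2):
    INSERT INTO t1 (a, b) VALUES (1, 2);
    -- when S2 restarts with domain-based parallelism enabled, it may apply
    -- the domain-2 INSERT before the domain-1 ALTER, and fail on the
    -- missing column.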
What I really need is to get some results from testing optimistic parallel replication, to understand how many retries will be needed in various scenarios, and if those retries are a bottleneck for performance.
Then I'd suggest to not add any special processing for such a use case, but to add something that will allow easy monitoring of what happens. E.g. some status variables which could be plotted over time and would show (or at least hint at) whether this is a significant bottleneck for performance or not. This could be something like: total time (in both wall time and accumulated CPU time) spent executing transactions in parallel; time spent rolling back transactions due to this lock conflict; time spent rolling back transactions because of other reasons (e.g. due to STOP SLAVE or reconnect after master crash); maybe also time spent waiting in one parallel thread while a transaction is executing in another thread; etc.
Yes, I agree, we need more of this. I think the monitoring part of the feature is currently rather weak, it probably suffers from it being now a long time since I was doing operations. Hopefully this can be significantly improved in the near future. I wonder if such accumulated-time measurements can be added liberally without significantly affecting performance? - Kristian.
----- Original Message -----
Right. The queueing is limited by the configuration variable --slave-parallel-max-queued, which defaults to 128KB per worker thread. It does not depend on the size of the transactions (it is possible to replicate a large transaction without keeping all of it in memory at once). It does need to keep at least one event in-memory per worker thread, of course, even if an individual event exceeds --slave-parallel-max-queued.
See https://mariadb.atlassian.net/browse/MDEV-7202 for a patch to add a status variable...
Then I'd suggest to not add any special processing for such a use case, but to add something that will allow easy monitoring of what happens. E.g. some status variables which could be plotted over time and would show (or at least hint at) whether this is a significant bottleneck for performance or not. This could be something like: total time (in both wall time and accumulated CPU time) spent executing transactions in parallel; time spent rolling back transactions due to this lock conflict; time spent rolling back transactions because of other reasons (e.g. due to STOP SLAVE or reconnect after master crash); maybe also time spent waiting in one parallel thread while a transaction is executing in another thread; etc.
Added https://mariadb.atlassian.net/browse/MDEV-7340, quoting this.
I wonder if such accumulated-time measurements can be added liberally without significantly affecting performance?
If each thread accumulates its own, and if there needs to be a global one, it can be done like the global collation in the MDEV-7202 patch.

--
Daniel Black, Engineer @ Open Query (http://openquery.com.au)
Remote expertise & maintenance for MySQL/MariaDB server environments.
Hi again Kristian,

A slightly off-topic question that struck me last night: won't all parallel transactions conflict when updating the gtid_slave_pos table? Or is there something I missed...

/Jonas

On Thu, Dec 4, 2014 at 2:49 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
I discussed with Monty, and we came up with some more suggested changes for the options used to configure the optimistic parallel replication feature (MDEV-6676, https://mariadb.atlassian.net/browse/MDEV-6676).
The biggest change is to split up the --slave-parallel-mode into multiple options. I think that is reasonable, probably that option was doing too many things at once.
Instead, we could have the following three options:
--slave-parallel-mode=all_transactions | follow_master_commits | only_commits | none
"all_transactions" is what was called "transactional" before. The slave will try to apply all transactional DML in parallel; in case of conflicts it will roll back the later transaction and retry it.
"follow_master_commits" is the 10.0 functionality, apply in parallel transactions that group-committed together on the master (the default).
"only_commits" was suggested to me by a user testing parallel replication. It does not attempt to apply transactions in parallel, but still runs the commit steps in parallel, making slave group commit possible and thus saving on fsyncs if durability settings are on.
"none" means the parallel replication code is not used (same as --slave-parallel-threads=0, but now configurable per multimaster connection). (This corresponds to empty value in old --slave-parallel-mode).
--slave-parallel-domains=on|off (default on)
"This replaces the "domain" option of old --slave-parallel-mode. When enabled, parallel replication will apply in parallel transactions whose GTID has different domain ids (GTID mode only).
--slave-parallel-wait-if-conflict-on-master=on|off (default on)
When enabled, if a transaction had to do a row lock wait on the master, it will not be applied in parallel with any earlier transaction on the slave (the idea is that such a transaction is likely to hit a conflict on the slave, causing a needless retry). (This was the "waiting" option of the old --slave-parallel-mode.)
These options will also be usable per multi-source master connection, like --master1.slave-parallel-mode=all_transactions (see the sketch below). It will also be possible to change the options dynamically (with SET GLOBAL), though the associated slave threads must be stopped while changing.
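For example, a my.cnf fragment under this proposal might look like (a sketch of the proposed per-connection form, not final syntax):

    [mysqld]
    slave-parallel-threads = 20                     # shared worker pool
    master1.slave-parallel-mode = all_transactions  # aggressive connection
    master2.slave-parallel-mode = only_commits      # cautious connection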
Also, Monty suggested renaming @@replicate_allow_parallel to
@@SESSION.replicate_expect_conflicts=0|1 (default 0)
If this option is enabled on the master at the time a transaction is committed, that transaction will not be applied in parallel with earlier transactions on the slave (when --slave-parallel-mode=all_transactions). This can be used to reduce retries on the slave, if an application is about to do a transaction that is likely to cause a conflict, and hence a retry, on a slave if applied in parallel with earlier transactions.
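So application usage might look like this (a sketch only; the variable is just a proposal at this point, and the table is made up):

    -- on the master, around a transaction known to hit a hotspot row:
    SET SESSION replicate_expect_conflicts = 1;
    BEGIN;
    UPDATE counters SET n = n + 1 WHERE id = 0;   -- frequently-updated row
    COMMIT;
    SET SESSION replicate_expect_conflicts = 0;
    -- the slave will not attempt to apply this transaction in parallel
    -- with earlier transactions, avoiding a likely rollback and retry.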
Let me know if there are any comments on these, or suggestions for changes. It is best to get these as right as possible before release (it seems the intention is to include optimistic parallel replication in 10.1), since they are the user-visible part of the feature.
With these option names, the normal way to use optimistic parallel replication would be these two options in my.cnf:
    slave_parallel_mode=all_transactions
    slave_parallel_threads=20   (or whatever)
This seems reasonable, I think. None of the other options would need to be considered except in more special cases.
- Kristian.
Jonas Oreland <jonaso@google.com> writes:
A slightly off-topic question that struck me last night: won't all parallel transactions conflict when updating the gtid_slave_pos table?
They would, if the GTID support had not been carefully designed in anticipation of this issue. So the GTID position is updated in gtid_slave_pos with an INSERT, not an UPDATE; this way, multiple updates can be done concurrently. Each row in the gtid_slave_pos table contains an incrementing counter, and the highest value of the counter denotes the "current" row at any one time. I wrote a more detailed explanation of this here: http://kristiannielsen.livejournal.com/17008.html - Kristian.
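P.S.: conceptually, what the slave does at each commit is something like this (a sketch; the columns are those of the mysql.gtid_slave_pos table, the values are illustrative):

    -- committing GTID 0-1-100 (domain 0, server 1, seq_no 100):
    INSERT INTO mysql.gtid_slave_pos (domain_id, sub_id, server_id, seq_no)
        VALUES (0, 42, 1, 100);   -- sub_id is a strictly increasing counter
    -- the row with the highest sub_id in each domain is the current
    -- position; obsolete rows are purged in the background, so parallel
    -- committers do not contend on a single hot row.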
Ok, fair enough... and you auto-purge it every now and then... /Jonas