[Maria-developers] Group commit id in mysqlbinlog output
There was a question on the output of mysqlbinlog related to group commit. In 10.0, if two transactions group commit together on the master, their GTID event contains a "commit id" - a 64-bit number. This is used by parallel replication; if two transactions have the same commit id, they can be executed in parallel. In mysqlbinlog output, this looks like this: #140314 13:42:56 server id 1 end_log_pos 772 GTID 0-1-12 cid=180 ... #140314 13:42:56 server id 1 end_log_pos 1027 GTID 0-1-13 cid=180 But if a transaction commits alone on the master (no other transactions participate in the group commit), the mysqlbinlog outpus has no commit id: #140314 13:42:56 server id 1 end_log_pos 437 GTID 0-1-10 The question was why there is no cid=X in the second case, as it makes scripting/grepping the output harder. The reason is that there is no commit id in the event in the binlog in this case (to reduce the size of the binlog). So there is no valid number to put in there. One option could be to make the output something like this: #140314 13:42:56 server id 1 end_log_pos 437 GTID 0-1-10 cid=<none> This would allow to grep for "cid=" and catch everything. I do not have a strong opinion one way or the other, if there are people who find this useful, then let me know and I can change it. - Kristian.
Hi Kristian, First of all thanks for the great on-list explanations of your parallel replication features. It looks as if you are making good progress on a very hard problem. Second, this is slightly off-topic but can you expand somewhat on the semantics of group-committed transactions in the binlog? Here are a few questions: a.) It seems logical that transactions within a group commit should appear together in the binlog and should be serialized before and after other transactions in the binlog. Is there *any* way this ordering could be violated, for example to mix in a non-grouped transaction? b.) Is there any ordering of the transactions within the group commit in the binlog for example sorted based on the resources each uses? Or is it more or less random based on time locks are acquired, etc.? c.) How do you handle commit timestamps on group-committed transactions? Are they identical? In past MySQL releases I have found instances where timestamps can walk backwards across succeeding transactions. Such anomalies can be very troublesome for downstream consumers like data warehouses that want to create materialized, point-in-time views or partition data based on time of commit. (Ask me how I know.) Any clarifications you can offer would be most welcome. Cheers, Robert On Fri, Mar 14, 2014 at 5:51 AM, Kristian Nielsen <knielsen@knielsen-hq.org>wrote:
There was a question on the output of mysqlbinlog related to group commit.
In 10.0, if two transactions group commit together on the master, their GTID event contains a "commit id" - a 64-bit number. This is used by parallel replication; if two transactions have the same commit id, they can be executed in parallel.
In mysqlbinlog output, this looks like this:
#140314 13:42:56 server id 1 end_log_pos 772 GTID 0-1-12 cid=180 ... #140314 13:42:56 server id 1 end_log_pos 1027 GTID 0-1-13 cid=180
But if a transaction commits alone on the master (no other transactions participate in the group commit), the mysqlbinlog outpus has no commit id:
#140314 13:42:56 server id 1 end_log_pos 437 GTID 0-1-10
The question was why there is no cid=X in the second case, as it makes scripting/grepping the output harder. The reason is that there is no commit id in the event in the binlog in this case (to reduce the size of the binlog). So there is no valid number to put in there.
One option could be to make the output something like this:
#140314 13:42:56 server id 1 end_log_pos 437 GTID 0-1-10 cid=<none>
This would allow to grep for "cid=" and catch everything. I do not have a strong opinion one way or the other, if there are people who find this useful, then let me know and I can change it.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Robert Hodges <robert.hodges@continuent.com> writes:
a.) It seems logical that transactions within a group commit should appear together in the binlog and should be serialized before and after other transactions in the binlog. Is there *any* way this ordering could be violated, for example to mix in a non-grouped transaction?
No the ordering is completely fixed. All transactions that group-commit together will be written into the binlog as a unit, without any other transactions in-between being possible. The ordering will also be identical in the binlogs on any slaves (if using multiple replication domains, then the ordering is identical within a single domain).
b.) Is there any ordering of the transactions within the group commit in the binlog for example sorted based on the resources each uses? Or is it more or less random based on time locks are acquired, etc.?
There is no ordering. It is just the order in which each thread happens to reach the point in the code where it obtains the necessary lock and adds itself to the list of transactions waiting for group commit.
c.) How do you handle commit timestamps on group-committed transactions? Are they identical? In past MySQL releases I have found instances where timestamps can walk backwards across succeeding transactions. Such anomalies can be very troublesome for downstream consumers like data warehouses that want to create materialized, point-in-time views or partition data based on time of commit. (Ask me how I know.)
I am not very familiar with timestamps in binlog events. However, all the transactions in the group commit are written out one after the other, in a single thread holding the lock on the binlog. And the GTID events are generated during that loop. So I think that at least the timestamps of the GTID events in the group can never walk backwards, nor can they from one group to the next. But I am not sure if the time stamps of other events in the transaction can be earlier (maybe they were generated when the query was run, not when it was committed?
Any clarifications you can offer would be most welcome.
I hope the above helps, else please ask again. - Kristian.
Hi Kristian, Thanks for the prompt and detailed response! I’m glad you clarified that GTIDs cannot ever walk backwards. It’s really bad design if they are not monotonically increasing and comparable. This will really help restart logic in a number of places, not just for your own replication. The timestamp issue is mysterious one. I also don’t fully understand how the timestamp is generated for the event header. All I know is that it sometimes walks backwards, possibly as a result of large transactions or load data infile commands. If I can find a reproducible case I will post it, as it can cause inscrutable problems downstream when loading to other systems. (Ask me how I know…) Meanwhile, good luck on your replication work. It seems to be proceeding in a good direction. Cheers, Robert On March 14, 2014 at 8:01:04 AM, Kristian Nielsen (knielsen@knielsen-hq.org) wrote: Robert Hodges <robert.hodges@continuent.com> writes:
a.) It seems logical that transactions within a group commit should appear together in the binlog and should be serialized before and after other transactions in the binlog. Is there *any* way this ordering could be violated, for example to mix in a non-grouped transaction?
No the ordering is completely fixed. All transactions that group-commit together will be written into the binlog as a unit, without any other transactions in-between being possible. The ordering will also be identical in the binlogs on any slaves (if using multiple replication domains, then the ordering is identical within a single domain).
b.) Is there any ordering of the transactions within the group commit in the binlog for example sorted based on the resources each uses? Or is it more or less random based on time locks are acquired, etc.?
There is no ordering. It is just the order in which each thread happens to reach the point in the code where it obtains the necessary lock and adds itself to the list of transactions waiting for group commit.
c.) How do you handle commit timestamps on group-committed transactions? Are they identical? In past MySQL releases I have found instances where timestamps can walk backwards across succeeding transactions. Such anomalies can be very troublesome for downstream consumers like data warehouses that want to create materialized, point-in-time views or partition data based on time of commit. (Ask me how I know.)
I am not very familiar with timestamps in binlog events. However, all the transactions in the group commit are written out one after the other, in a single thread holding the lock on the binlog. And the GTID events are generated during that loop. So I think that at least the timestamps of the GTID events in the group can never walk backwards, nor can they from one group to the next. But I am not sure if the time stamps of other events in the transaction can be earlier (maybe they were generated when the query was run, not when it was committed?
Any clarifications you can offer would be most welcome.
I hope the above helps, else please ask again. - Kristian.
AFAIK, timestamp for a binlog event is the time when corresponding statement started to execute. The order of statements in the binlog is according to the time when statements were committed, which of course doesn't have anything to do with the time statement started to execute. That's why timestamps in the binlog can jump back and forth randomly. But I'd think that if one looks at timestamps only on GTID events then he should expect to see monotonically increasing time within each domain. Pavel On Fri, Mar 14, 2014 at 8:54 AM, Robert Hodges <robert.hodges@continuent.com
wrote:
Hi Kristian,
Thanks for the prompt and detailed response!
I’m glad you clarified that GTIDs cannot ever walk backwards. It’s really bad design if they are not monotonically increasing and comparable. This will really help restart logic in a number of places, not just for your own replication.
The timestamp issue is mysterious one. I also don’t fully understand how the timestamp is generated for the event header. All I know is that it sometimes walks backwards, possibly as a result of large transactions or load data infile commands. If I can find a reproducible case I will post it, as it can cause inscrutable problems downstream when loading to other systems. (Ask me how I know…)
Meanwhile, good luck on your replication work. It seems to be proceeding in a good direction.
Cheers, Robert
On March 14, 2014 at 8:01:04 AM, Kristian Nielsen ( knielsen@knielsen-hq.org) wrote:
Robert Hodges <robert.hodges@continuent.com> writes:
a.) It seems logical that transactions within a group commit should appear together in the binlog and should be serialized before and after other transactions in the binlog. Is there *any* way this ordering could be violated, for example to mix in a non-grouped transaction?
No the ordering is completely fixed. All transactions that group-commit together will be written into the binlog as a unit, without any other transactions in-between being possible. The ordering will also be identical in the binlogs on any slaves (if using multiple replication domains, then the ordering is identical within a single domain).
b.) Is there any ordering of the transactions within the group commit in the binlog for example sorted based on the resources each uses? Or is it more or less random based on time locks are acquired, etc.?
There is no ordering. It is just the order in which each thread happens to reach the point in the code where it obtains the necessary lock and adds itself to the list of transactions waiting for group commit.
c.) How do you handle commit timestamps on group-committed transactions? Are they identical? In past MySQL releases I have found instances where timestamps can walk backwards across succeeding transactions. Such anomalies can be very troublesome for downstream consumers like data warehouses that want to create materialized, point-in-time views or partition data based on time of commit. (Ask me how I know.)
I am not very familiar with timestamps in binlog events. However, all the transactions in the group commit are written out one after the other, in a single thread holding the lock on the binlog. And the GTID events are generated during that loop. So I think that at least the timestamps of the GTID events in the group can never walk backwards, nor can they from one group to the next. But I am not sure if the time stamps of other events in the transaction can be earlier (maybe they were generated when the query was run, not when it was committed?
Any clarifications you can offer would be most welcome.
I hope the above helps, else please ask again.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hi Pavel, On March 14, 2014 at 9:09:47 AM, Pavel Ivanov (pivanof@google.com) wrote: AFAIK, timestamp for a binlog event is the time when corresponding statement started to execute. The order of statements in the binlog is according to the time when statements were committed, which of course doesn't have anything to do with the time statement started to execute. That's why timestamps in the binlog can jump back and forth randomly. But I'd think that if one looks at timestamps only on GTID events then he should expect to see monotonically increasing time within each domain. If that’s the case the GTID support is going to be helpful in areas beyond just parallelization and restart. Thank you both for the clarifications. We’ll be adding support for the MariaDB 10.0 binlog format in Tungsten Replicator later this year. Should be a pleasure to work on… :) Cheers, Robert Pavel On Fri, Mar 14, 2014 at 8:54 AM, Robert Hodges <robert.hodges@continuent.com> wrote: Hi Kristian, Thanks for the prompt and detailed response! I’m glad you clarified that GTIDs cannot ever walk backwards. It’s really bad design if they are not monotonically increasing and comparable. This will really help restart logic in a number of places, not just for your own replication. The timestamp issue is mysterious one. I also don’t fully understand how the timestamp is generated for the event header. All I know is that it sometimes walks backwards, possibly as a result of large transactions or load data infile commands. If I can find a reproducible case I will post it, as it can cause inscrutable problems downstream when loading to other systems. (Ask me how I know…) Meanwhile, good luck on your replication work. It seems to be proceeding in a good direction. Cheers, Robert
Hi!
"Kristian" == Kristian Nielsen <knielsen@knielsen-hq.org> writes:
<cut> Kristian> I am not very familiar with timestamps in binlog events. However, all the Kristian> transactions in the group commit are written out one after the other, in a Kristian> single thread holding the lock on the binlog. And the GTID events are Kristian> generated during that loop. So I think that at least the timestamps of the Kristian> GTID events in the group can never walk backwards, nor can they from one group Kristian> to the next. But I am not sure if the time stamps of other events in the Kristian> transaction can be earlier (maybe they were generated when the query was run, Kristian> not when it was committed? Time stamps in a statement is based on when the statement started, not when it committed. This is needed to ensure that all rows has the same value f you do: update big_table set timpestamp=now(); So it's normal that in a group commit, the transactions in the same commit can have totally different timestamps. Longer running transactions will typically have older timestamps. Regards, Monty
participants (4)
-
Kristian Nielsen
-
Michael Widenius
-
Pavel Ivanov
-
Robert Hodges