Hi Kristian, 

First of all, thanks for the great on-list explanations of your parallel replication features. It looks as if you are making good progress on a very hard problem.

Second, this is slightly off-topic, but can you expand somewhat on the semantics of group-committed transactions in the binlog? Here are a few questions:

a.) It seems logical that transactions within a group commit should appear together in the binlog and should be serialized before and after other transactions in the binlog. Is there *any* way this ordering could be violated, for example by mixing in a non-grouped transaction?
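One way to probe this empirically on a given binlog is sketched below. It assumes the commit ids have already been extracted from the mysqlbinlog GTID lines in binlog order, with None for a transaction whose GTID line has no cid. The function name `interleaved_cids` is hypothetical; it reports any commit id whose transactions are split apart by some other transaction.

```python
from typing import Iterable, Optional, Set

def interleaved_cids(cids: Iterable[Optional[int]]) -> Set[int]:
    """Return commit ids whose transactions do not form one contiguous run.

    `cids` lists the commit id of each transaction in binlog order, with None
    for a transaction that committed alone (no cid in its GTID event).
    """
    finished: Set[int] = set()  # commit ids whose run has already ended
    split: Set[int] = set()     # commit ids seen again after their run ended
    prev: Optional[int] = None
    for cid in cids:
        if cid != prev:
            if prev is not None:
                finished.add(prev)
            if cid is not None and cid in finished:
                split.add(cid)
        prev = cid
    return split

print(interleaved_cids([180, 180, None, 181]))  # set()
print(interleaved_cids([180, None, 180]))       # {180}
```

An empty result on real data only shows the ordering held for that binlog; it says nothing about whether the server guarantees it.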

b.) Is there any ordering of the transactions within the group commit in the binlog, for example sorted based on the resources each uses? Or is it more or less random, depending on the order in which locks are acquired, etc.?

c.) How do you handle commit timestamps on group-committed transactions?  Are they identical? In past MySQL releases I have found instances where timestamps can walk backwards across succeeding transactions.  Such anomalies can be very troublesome for downstream consumers like data warehouses that want to create materialized, point-in-time views or partition data based on time of commit.   (Ask me how I know.) 
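To make the anomaly concrete, a downstream consumer might flag it with something like the sketch below. It assumes the input is a list of (label, timestamp) pairs in binlog order, with timestamps in the "YYMMDD HH:MM:SS" form that mysqlbinlog prints at the start of each event header. The function name `backward_steps` and the sample values are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

def backward_steps(events: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (previous, current) label pairs where the timestamp decreases.

    `events` holds (label, timestamp) pairs in binlog order, e.g.
    ("0-1-12", "140314 13:42:56").
    """
    steps: List[Tuple[str, str]] = []
    prev_label = ""
    prev_ts: Optional[datetime] = None
    for label, raw in events:
        ts = datetime.strptime(raw, "%y%m%d %H:%M:%S")
        # Equal timestamps are allowed; only a strict decrease is reported.
        if prev_ts is not None and ts < prev_ts:
            steps.append((prev_label, label))
        prev_label, prev_ts = label, ts
    return steps

sample = [("0-1-10", "140314 13:42:56"),
          ("0-1-11", "140314 13:42:55"),
          ("0-1-12", "140314 13:42:57")]
print(backward_steps(sample))  # [('0-1-10', '0-1-11')]
```

Because these header timestamps have one-second resolution, the check can only catch regressions of at least a full second.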

Any clarifications you can offer would be most welcome. 

Cheers, Robert


On Fri, Mar 14, 2014 at 5:51 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
There was a question on the output of mysqlbinlog related to group commit.

In 10.0, if two transactions group commit together on the master, their GTID
event contains a "commit id" - a 64-bit number. This is used by parallel
replication; if two transactions have the same commit id, they can be executed
in parallel.
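A minimal sketch of how a consumer might apply that rule, assuming
(hypothetically) that transactions sharing a commit id sit next to each other
in the binlog, and that the input is a list of (GTID, commit id) pairs with
None where there is no commit id. The function name is made up for
illustration; this is not how the server's own parallel applier works.

```python
from typing import List, Optional, Tuple

def parallel_batches(events: List[Tuple[str, Optional[int]]]) -> List[List[str]]:
    """Split transactions into batches whose members could run in parallel.

    Consecutive transactions with the same (non-None) commit id share a batch;
    a transaction without a commit id forms a batch of its own.
    """
    batches: List[List[str]] = []
    prev_cid: Optional[int] = None
    for gtid, cid in events:
        if cid is not None and cid == prev_cid and batches:
            batches[-1].append(gtid)
        else:
            batches.append([gtid])
        prev_cid = cid
    return batches

# Hypothetical input modeled on the GTIDs in the mysqlbinlog output below.
events = [("0-1-10", None), ("0-1-12", 180), ("0-1-13", 180), ("0-1-14", 181)]
print(parallel_batches(events))
# [['0-1-10'], ['0-1-12', '0-1-13'], ['0-1-14']]
```

Batches are still applied one after another in binlog order; only the
transactions inside one batch are candidates for parallel execution.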

In mysqlbinlog output, this looks like this:

#140314 13:42:56 server id 1  end_log_pos 772   GTID 0-1-12 cid=180
...
#140314 13:42:56 server id 1  end_log_pos 1027  GTID 0-1-13 cid=180

But if a transaction commits alone on the master (no other transactions
participate in the group commit), the mysqlbinlog output has no commit id:

#140314 13:42:56 server id 1  end_log_pos 437   GTID 0-1-10

The question was why there is no cid=X in the second case, as it makes
scripting/grepping the output harder. The reason is that there is no commit id
in the event in the binlog in this case (to reduce the size of the binlog).
So there is no valid number to put in there.
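Until then, a script can treat the field as optional rather than grepping for
it. A minimal sketch, assuming only the header shape shown in the examples
above (other mysqlbinlog versions may print extra fields, which this pattern
does not try to capture). The names GTID_HEADER, GtidHeader and
parse_gtid_header are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches GTID event header lines shaped like the examples above; the
# " cid=N" part is optional and anything after it is ignored.
GTID_HEADER = re.compile(
    r"^#(?P<date>\d{6})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"server id (?P<server_id>\d+)\s+end_log_pos (?P<end_pos>\d+)\s+"
    r"GTID (?P<gtid>\d+-\d+-\d+)(?:\s+cid=(?P<cid>\d+))?"
)

class GtidHeader(NamedTuple):
    gtid: str
    end_log_pos: int
    cid: Optional[int]  # None when the transaction committed alone

def parse_gtid_header(line: str) -> Optional[GtidHeader]:
    """Parse one mysqlbinlog line; return None if it is not a GTID header."""
    m = GTID_HEADER.match(line)
    if m is None:
        return None
    cid = m.group("cid")
    return GtidHeader(m.group("gtid"), int(m.group("end_pos")),
                      int(cid) if cid is not None else None)

print(parse_gtid_header(
    "#140314 13:42:56 server id 1  end_log_pos 772   GTID 0-1-12 cid=180"))
# GtidHeader(gtid='0-1-12', end_log_pos=772, cid=180)
```

A missing cid comes back as None, which a consumer can treat as a group of
one transaction.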

One option could be to make the output something like this:

#140314 13:42:56 server id 1  end_log_pos 437   GTID 0-1-10 cid=<none>

This would allow one to grep for "cid=" and catch everything. I do not have a
strong opinion one way or the other; if there are people who find this useful,
let me know and I can change it.
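With the proposed form, a script would need no special case for the missing
field. A minimal sketch, assuming the "cid=<none>" spelling exactly as
proposed above (it is a proposal, not current mysqlbinlog output); the helper
names format_cid and parse_cid are hypothetical.

```python
import re
from typing import Optional

# Under the proposal every GTID header carries a cid= field, so one pattern
# matches all of them; "<none>" marks a transaction that committed alone.
CID_FIELD = re.compile(r"\bcid=(\d+|<none>)")

def format_cid(cid: Optional[int]) -> str:
    """Render the field the way the proposal suggests."""
    return "cid=<none>" if cid is None else f"cid={cid}"

def parse_cid(line: str) -> Optional[int]:
    """Return the commit id, or None for cid=<none>; raise if the field is absent."""
    m = CID_FIELD.search(line)
    if m is None:
        raise ValueError(f"no cid= field in: {line!r}")
    value = m.group(1)
    return None if value == "<none>" else int(value)

print(parse_cid("#140314 13:42:56 server id 1  end_log_pos 437   GTID 0-1-10 cid=<none>"))
# None
```

The design benefit is that a missing field becomes a detectable error instead
of a silent "no group commit".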

 - Kristian.

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp