On Sun, 24 Jan 2010 14:27:05 -0800, MARK CALLAGHAN <mdcallag@gmail.com> wrote:
On Fri, Jan 22, 2010 at 6:21 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Let the discussion begin!
The global transaction ID project done by Justin at Google is worth reviewing. In addition to supporting automated slave failover it also has options to make slave state crash-proof and add binlog event checksums. I doubt the patch should be reused, as the MySQL replication interface must be improved if we are to innovate -- but the wiki has a lot of details.

* http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds
* http://code.launchpad.net/~jtolmer/mysql-server/global-trx-ids
Hi,

The global transaction ID is a cornerstone concept of any replication system which aspires to be pluggable, extensible and go beyond basic master-slave. It is hardly possible to even start designing the rest of the API without first settling on a global transaction ID. This is one of the reasons why http://forge.mysql.com/wiki/ReplicationFeatures/ReplicationInterface can be dismissed without much consideration.

What's good about http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds:

1) It introduces the concept of atomic database changesets. (Which, ironically, it calls "groups" due to the dreaded binlog heritage.)

2) It correctly identifies that (a part of) the ID should be a monotonic ordinal number.

3) It correctly identifies that the global transaction ID is generated by the redundancy service - in that case "MYSQL_LOG::write(Log_event *) (sql/log.cc, line 1708)".

What's bad about it:

1) It fails to explicitly recognize that IDs should form a continuous sequence. In the implementation they do, but this is never stated; the only explicit requirement is monotonicity. Perhaps this is a minor omission.

2) It fails to address multi-master: (server_id, group_id) is not going to work - such pairs cannot be linearly ordered and, therefore, cannot be compared. And from the perspective of the node that needs to apply the changeset - does server_id really matter? It may be good for debugging, but it can't be a part of a global transaction ID.

3) There is no general theory behind it; it is just an attempt to fix a concrete binlog implementation. In fact it is just one huge implementation detail. The inability to address the multi-master case is a direct consequence of this.

In the end it is not very useful. Whatever good points it has are trivial. Those, and more, can be reached with 15 minutes of abstract thinking. You don't need to know the MySQL binlog format for that. In fact, you should forget about it unless you want to end up with something like (server_id, group_id).
I'll take this opportunity to put forth some theory behind global transaction IDs as we see it at Codership.

1. We have an abstract set of data subject to replication/logging. It can be a whole database, a schema, a table, a row. Let's call it a Replication Set (RS).

2. The RS undergoes changes over time, which can be represented as a series of atomic changes. Let's call it the RS History. That it is a _series_ is trivial but important - otherwise we can't reproduce the historical evolution of the RS state. Each RS change is represented by a changeset. Since it is a series, RS changesets can be enumerated with a sequence of natural numbers without gaps within a given RS History. Here comes the first component of a global transaction ID: the sequence number (seqno).

3. However, there can be more than one RS. Moreover, the same RS can end up in different clusters and undergo different changes. So, to achieve truly global unambiguity, each changeset, in addition to its seqno, should be marked with an RS History ID. Obviously seqnos from different histories are logically incomparable. Therefore the RS History ID can be any globally unique identifier, with no need for < or > operations. This is the second component of the global transaction ID.

One possible implementation of this is a (UUID, long long) pair.

How the redundancy service generates those IDs is an implementation detail. For binlog/master-slave replication it is obviously trivial, even in its current state. Changing the binlog format and mapping seqnos to file offsets is no big feat.

What is not so obvious here is that, since the global transaction ID is generated by the logging/replication service, it is that service that defines the order of commits, not vice versa. As a result, a transaction should first be passed to that service and only then committed. For one-way master-slave replication the order of operations is not so important. However, for multi-master it is crucial.
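To make the scheme concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: it is not code from Codership or from the Google patch, and the names GlobalTrxId, ReplicationService, assign_id and commit are all hypothetical. It shows the (history ID, seqno) pair, refuses to order IDs from different histories, and makes the service hand out gapless seqnos before the local commit.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTrxId:
    """Hypothetical (RS History ID, seqno) pair; names are illustrative only."""
    history_id: uuid.UUID  # identifies one RS History; only == / != is meaningful
    seqno: int             # position of the changeset within that history

    def precedes(self, other: "GlobalTrxId") -> bool:
        # Seqnos from different histories are logically incomparable.
        if self.history_id != other.history_id:
            raise ValueError("cannot order IDs from different RS histories")
        return self.seqno < other.seqno


class ReplicationService:
    """Toy stand-in for the logging/replication service that issues IDs."""

    def __init__(self) -> None:
        self.history_id = uuid.uuid4()  # new history, new unique ID
        self.last_seqno = 0
        self.log = []                   # changesets in the order the service defined

    def assign_id(self, changeset: str) -> GlobalTrxId:
        # The ID is generated here, before the transaction is committed,
        # so the service defines the commit order.
        self.last_seqno += 1            # gapless: 1, 2, 3, ...
        gtid = GlobalTrxId(self.history_id, self.last_seqno)
        self.log.append((gtid, changeset))
        return gtid


def commit(service: ReplicationService, changeset: str, applied: list) -> GlobalTrxId:
    gtid = service.assign_id(changeset)  # step 1: obtain the global ID
    applied.append((gtid, changeset))    # step 2: only then commit locally
    return gtid
```

In this sketch the list "applied" plays the role of the local database: because every changeset passes through assign_id first, the local commit order always matches the order recorded by the service.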
Note that the actual replication/logging can still happen asynchronously, but the replication service must generate the transaction ID before the transaction is committed.

Thanks,
Alex

--
Alexey Yurchenko, Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011