On Tue, Sep 16, 2014 at 12:38 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Robert Hodges <robert.hodges@continuent.com> writes:

> In that case it gets complex for outsiders to figure out how to restart if
> there's a failure. Let's say my transaction fails after T1 commits but
> before T2 commits. Then on restart I have to regenerate T2 and rerun it.
>  That could be hard if T2 contains statements that came before T1 in the
> original, serialized log.

Agree, there are different approaches possible.

The facility in MariaDB discussed here is precisely to apply a serialized log
in parallel. We apply the transactions in parallel, but the commits happen in
the original serial order. This makes the parallel apply transparent to
applications, assuming MVCC.

Each transaction inserts in a table its own monotonic transaction number. So
after a restart, it is easy to find the point at which to resume - just find
the highest number in the table.
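To make the restart mechanics concrete, here is a small sketch of the idea (my own illustration using sqlite3, not MariaDB's actual implementation): each applied transaction records its monotonic number in a bookkeeping table within the same commit, so the data change and the position update are atomic, and the resume point after a restart is simply MAX(seqno). Table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applied (seqno INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")

def apply_transaction(seqno, rows):
    # The data change and the bookkeeping row commit atomically together.
    with conn:
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        conn.execute("INSERT INTO applied VALUES (?)", (seqno,))

for n in range(1, 4):
    apply_transaction(n, [(n, "row-%d" % n)])

# After a crash/restart, resume from just past the highest recorded number.
resume_after = conn.execute("SELECT MAX(seqno) FROM applied").fetchone()[0]
print(resume_after)  # 3
```

Because the seqno insert rides inside the applied transaction itself, there is no window where the data is committed but the position is lost.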

One limitation of this approach is that with N worker threads, we can never
execute more than N transactions ahead of the slowest transaction in the log,
because every thread needs to wait for the prior commit.
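The wait-for-prior-commit constraint can be sketched as a toy simulation (assumptions and names are mine): each transaction is applied by its own thread, but a condition variable forces commits to happen in log order, which is exactly why a worker can never run more than the pool size ahead of the slowest uncommitted transaction.

```python
import threading

commit_order = []   # the order in which "commits" actually happen
next_commit = 1     # seqno of the next transaction allowed to commit
cond = threading.Condition()

def worker(seqno):
    global next_commit
    # ... apply this transaction's changes in parallel here ...
    with cond:
        # Block until every earlier transaction has committed.
        while next_commit != seqno:
            cond.wait()
        commit_order.append(seqno)   # "commit"
        next_commit += 1
        cond.notify_all()

threads = [threading.Thread(target=worker, args=(n,)) for n in range(1, 9)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(commit_order)  # [1, 2, 3, 4, 5, 6, 7, 8], regardless of apply timing
```

Whatever order the apply phases finish in, the commit sequence is always serial, so one slow transaction stalls every commit behind it.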

Right.  I thought about that problem a lot when designing Tungsten's parallel apply and ended up with an approach that lets workers diverge by several minutes or more.  This enables Tungsten to maintain good throughput even with lumpy workloads whose transactions range from single inserts to updates touching hundreds of thousands or millions of rows.  Early performance work in production environments showed that wide divergence is needed to avoid serializing around the "lumps" in the load.

> Without a "group" transaction another approach is to keep separate restart
> points for each worker thread.  It works well as long as you can
> deterministically assign transactions to threads and keep those same
> threads from getting too far apart. The logic to do this is non-trivial, so
> it's not an approach for everybody.

MariaDB also supports this approach. Here, the user explicitly assigns each
transaction to a replication domain, and a restart point is kept per domain
(not per transaction). Different domains commit independently and out of order
with respect to one another. Within one domain, commit order is strictly
serialised.
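A minimal sketch of the per-domain bookkeeping this implies (my own illustration; the domain names and function are invented): each domain tracks its own last committed sequence number, and domains advance independently.

```python
restart_pos = {}  # domain -> last committed seqno within that domain

def on_commit(domain, seqno):
    # Within a domain commits are strictly ordered, so seqno only increases.
    assert restart_pos.get(domain, 0) < seqno
    restart_pos[domain] = seqno

# Two domains interleave and commit out of order with respect to each other:
for domain, seqno in [("shard_a", 1), ("shard_b", 1),
                      ("shard_b", 2), ("shard_a", 2)]:
    on_commit(domain, seqno)

print(restart_pos)  # {'shard_a': 2, 'shard_b': 2}
```

On restart, each domain resumes from its own position; no single global position has to cover both streams.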

I like the approach with replication domains. Not only does it avoid the need
for non-trivial logic to assign transactions to different threads and restart
points; it also ensures that those restart points make sense to the user/DBA.

So are replication domains "shards"?  My definition of a shard in this context is a causally independent stream of transactions, which is effectively a partial order within the fully serialized log.  That's an excellent feature.  Assuming that's what you have done, how do you handle operations like CREATE USER that are global in effect?  (Just point me to docs or your blog if you wrote it up.  I would love to learn more.)

Cheers, Robert