Hi xiaobin,

I read through your patch found here:

    http://mysql.taobao.org/index.php/RLPR_for_MariaDB

I am still a bit confused about the status of this patch. It would be useful if you could explain a bit:

 - How have you tested the patch?
 - How are you currently using the patch? Or is it still just development/test?
 - What are your current plans for the patch? (You mentioned that you have some changes you are implementing at the moment.)

Anyway, I think I have a fair understanding of the overall idea, which seems very interesting! I will try to summarise it here in a bit more detail, as well as give a few general comments on the patch. Please correct me if I have misunderstood anything.

----

The basic idea is to build a set (implemented as a hash table) of every unique key in every row event in each transaction received from the master. If two transactions have no overlapping keys, they can be applied in parallel (foreign_key_checks is set to OFF to make this work).

There is a single thread that processes each event received in sequence. It checks the key set of the new transaction against the key sets of all transactions queued for each of N worker threads:

 - If there is no conflict with any already queued transaction, the new transaction can be queued to any worker thread (the one with the fewest queued transactions is chosen).
 - If there is a conflict with just one thread, the transaction can be queued for that thread - it will then execute in correct sequence with the conflicting transactions.
 - If there is a conflict with multiple threads, we wait until the conflicting transactions have been processed. No new transactions are queued while waiting.

Then there are N (default 16) threads that apply the queued transactions. Queued transactions are processed in sequence within one thread, but transactions in different queues are processed in parallel with each other.
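To make sure we are talking about the same dispatch rules, here is a rough sketch of how I understand them. This is purely illustrative - the names (Worker, pick_worker, etc.) are mine, not from the patch:

```cpp
// Illustrative sketch of the dispatch rules described above: each
// transaction carries the set of unique-key values it touches; it may
// be queued to a worker only if its keys do not overlap the keys of
// transactions already queued there.
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using KeySet = std::set<std::string>;

struct Worker {
  KeySet pending_keys;           // keys of all transactions queued here
  std::size_t queue_len = 0;     // number of queued transactions
};

static bool overlaps(const KeySet &a, const KeySet &b) {
  for (const auto &k : a)
    if (b.count(k)) return true;
  return false;
}

// Returns the index of the worker to queue to, or -1 if the transaction
// conflicts with two or more workers (the caller must then wait until
// the conflicting transactions have drained before queuing anything).
int pick_worker(const std::vector<Worker> &workers, const KeySet &txn_keys) {
  int conflict = -1;
  int least_loaded = 0;
  for (std::size_t i = 0; i < workers.size(); i++) {
    if (overlaps(workers[i].pending_keys, txn_keys)) {
      if (conflict >= 0) return -1;      // conflicts with multiple workers
      conflict = static_cast<int>(i);
    }
    if (workers[i].queue_len < workers[least_loaded].queue_len)
      least_loaded = static_cast<int>(i);
  }
  // One conflicting worker: queue behind the conflicting transactions.
  // No conflict: balance load onto the shortest queue.
  return conflict >= 0 ? conflict : least_loaded;
}
```

If this matches the intent of the patch, the single-threaded loop over all workers' key sets is exactly where I would expect the scalability questions below to show up.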
If a transaction is received which is not using row events (statement-based replication or DDL), then we wait until all queued transactions are processed. Only after that is the new transaction queued, so that such events are processed correctly (but not in parallel, of course). Likewise, if the worker threads get too far behind, we wait before queuing any more transactions.

----

I think this is a very interesting approach, and I am happy that someone has actually tried implementing it. The biggest potential issue seems to be the performance of the thread that checks the transactions for conflicts and queues them for the appropriate thread:

 - The main issue will be huge transactions, where the hash tables of key values become too big for available memory. Maybe this could be solved by having some limit above which transactions are not tried in parallel.
 - There will be a lot of hash lookups of key values, all done in a single thread. This could become a limit, I suppose - testing will be needed to understand this fully. But I think there will be many workloads where this will be useful.

Overall, it seems to me this technique is very promising. It seems to be better at getting maximum parallelism on the slave than any of the other approaches that have been considered.

As to the patch itself, it seems a bit immature in its present form, but I understood you are still working on it. Let us discuss what the plans could be for finishing it and getting it ready for inclusion in MariaDB. The main criterion is that it must be reasonably complete - it must not crash or silently corrupt data in any usage. However, reasonable restrictions on what is supported are fine. (For example, we could say that LOAD DATA is not supported and give an error, but not crash or silently ignore such a statement.) But we can discuss this in more detail.

----

Finally, a couple of general comments on the code:

1. get_pk_value() does not seem to handle collations, eg. case insensitivity?
   It needs to understand that eg. "abab" and "ABAB" can conflict. I think this should be solved by copying in values with non-binary collations using the appropriate strxfrm() variant.

2. remote_query() retries the query in case of deadlock - but that seems wrong: doesn't it need to retry the whole _transaction_, not just the query?

3. What about other events like User_var_event? Will they be included in a later patch?

4. There is a my_sleep(10000) in the code for when a transaction conflicts with two or more workers. It would be better to replace this with a condition variable, so we can wake up as soon as the conflict is resolved. The same goes for some other sleeps, like the one in wait_for_all_dml_done().

5. Shouldn't there be some locking in dispatch_transaction(), to protect the hash lookups when checking conflicts against worker threads doing pk_hash_minus()?

6. get_pk_value(): there seems to be a buffer overflow - st_hash_item::key is fixed size, but there seems to be no check against overrunning it. One solution is to just truncate if the data does not fit - this can lead to false positives where potential parallelism is not exploited, but that is still safe and should be ok.

Thanks!

 - Kristian.