Andrei Elkin <andrei.elkin@mariadb.com> writes:
I reviewed your branch to agree with the idea of xid conflicts handling and its implementation.
I however could not understand the need of the refactoring part. In order to track the xid dependency couple functions and and rpl_parallel_entry::maybe_active_xid a sort of a sliding window - that the 2nd commit of your branch introduced must be sufficient to my analysis. Of course it needs to hold worker indexes of active xid:s.
This observation led me to create a review branch
origin/review__knielsen_xa_sched_minimal_fix
[where origin git@github.com:MariaDB/server.git] which is a somewhat light elaboration over the 2nd commit of your branch.
The reason for the refactoring part is to allow to distribute the transactions as evenly as possible over the worker threads, given the scheduling constraints imposed by XA dependencies. Let's say we have 5 worker threads, and transactions T1..T15. If there are no scheduling constraints, we can schedule evenly like this: W1: T1 T6 T11 W2: T2 T7 T12 W3: T3 T8 T13 W4: T4 T9 T14 W5: T5 T10 T15 Now suppose we need to schedule T6 on the same worker as T4, and T8 on the same worker as T3. The most even way to distribute the transactions is now this: W1: T1 T7 T12 W2: T2 T9 T14 W3: T3 T8 T13 W4: T4 T6 T11 W5: T5 T10 T15 You see, we try to always have 5 consecutive transactions T_i, T_{i+1}, ..., T+{i+4} scheduled on 5 different worker threads, except when this is not possible due to dependency restrictions. But this most-even scheduling requires to change the scheduling order of threads from the original sequential 1,2,3,4,5. You see, the last transactions T11..T15 are scheduled on worker threads in order: W4, W1, W3, W2, W5. The refactoring patch introduces thread_sched_fifo to hold the current scheduling order of worker threads. Whenever we have to schedule out-of-order due to a scheduling dependency, we put the scheduled worker at the end of this fifo, to preserve even scheduling for the next N transactions. In the original scheduling code, we never scheduled workers out-of-order, so the scheduling order was always cyclic W1, W2, W3, W4, W5, W1, ..., and a simple incrementing counter was sufficient. But this is no longer sufficient when out-of-order scheduling occurs. In the example, if we use a simple counter and try to schedule on worker ((i+1) mod N) after worker (i), then we get the following uneven scheduling: W1: T1 T11 W2: T2 T12 W3: T3 T8 T13 W4: T4 T6 T9 T14 W5: T5 T7 T10 T15 Because T6 and T8 have particular scheduling requirements, they cause the scheduling to skip workers. The result is that W4 and W5 get too many transactions, while W1 and W2 get too few, and parallelism is reduced. I hope this explain the reasoning.
Could you please have a look at it?
If I understand your patch correctly, it schedules using a simple incrementing counter when there are no dependency requirements, and so would suffer from the uneven scheduling in the above example. It also looks like there's a mistake in the code:
+ if ((idx= check_xa_xid_dependency(>id_ev->xid)) < (uint32) -1) + { + /* + A previously scheduled event group with the same XID might still be + active in a worker, so schedule this event group in the same worker + to avoid a conflict. + */ + } + else + { + /* Record this XID now active. */ + xid_active_generation *a= + (xid_active_generation *)alloc_dynamic(&maybe_active_xid); + if (!a) + return NULL; + ++idx; + if (idx >= rpl_thread_max) + idx= 0;
If check_xa_xid_dependency() returns "no dependency", then idx is set to (uint32)-1 and the else {...} branch is executed. This does idx++, which increments the (uint32)-1 to 0. So it looks to me that _all_ transactions are scheduled on worker 0, or am I missing something? This is a simple mistake, it's an easy fix to not assign idx when check_xa_xid_dependency() returns "no dependency". But the code would still suffer from uneven scheduling as in the example, IIUC. So there's a good reason to introduce the thread_sched_fifo as done in my refactoring patch. This though still leaves the limitation in my minimal patch that XA PREPARE xid1 cannot group commit with XA COMMIT xid1. The impact of this will depend on how many transactions on average appear between an XA PREPARE and its associated XA COMMIT, as well as on how expensive fsync() is relative to the cost of each transaction. - Kristian.