Hi, Kristian! Now, WL#132 - Transaction coordinator plugin
============= High-Level Specification
...
In current MariaDB, we have two different TC implementations (as well as a "dummy" empty implementation that I am not sure is ever used).
The code in mysqld.cc is

    tc_log= (total_ha_2pc > 1 ? (opt_bin_log ?
                                 (TC_LOG *) &mysql_bin_log :
                                 (TC_LOG *) &tc_log_mmap) :
             (TC_LOG *) &tc_log_dummy);

so tc_log_dummy is used when there is at most one XA-capable engine. But MySQL does not use 2PC for a transaction unless it has at least two XA-capable participants. In other words, tc_log_dummy is never actually used.
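The argument above can be checked with a small standalone model. This is a sketch under stated assumptions: Tc, choose_tc(), uses_2pc() and dummy_ever_logs() are hypothetical names, not real mysqld.cc identifiers, and the 2PC rule is reduced to "at least two XA-capable participants".

```cpp
#include <cassert>

// Hypothetical model of the tc_log selection quoted above. The enum and
// function names are illustrative only, not real mysqld.cc identifiers.
enum class Tc { BinLog, Mmap, Dummy };

// Mirrors the ternary expression: a real TC only if total_ha_2pc > 1.
Tc choose_tc(int total_ha_2pc, bool opt_bin_log) {
  if (total_ha_2pc > 1)
    return opt_bin_log ? Tc::BinLog : Tc::Mmap;
  return Tc::Dummy;
}

// Simplified 2PC rule: a transaction needs at least two XA-capable
// participants before the TC is asked to log anything.
bool uses_2pc(int xa_participants) { return xa_participants >= 2; }

// A transaction can have at most total_ha_2pc XA-capable participants, so
// check every possible participant count against the chosen TC.
bool dummy_ever_logs(int total_ha_2pc, bool opt_bin_log) {
  if (choose_tc(total_ha_2pc, opt_bin_log) != Tc::Dummy)
    return false;
  for (int n = 0; n <= total_ha_2pc; ++n)
    if (uses_2pc(n))
      return true;
  return false;
}
```

For every value of total_ha_2pc, dummy_ever_logs() returns false: whenever the dummy TC is selected, there are too few XA-capable engines for any transaction to reach the 2PC path.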
Binary log
----------
The binary log also implements a "fake" storage engine, mainly to hook into the commit (and prepare) phase of transaction processing. This is mainly used for statements in non-transactional engines, which are "committed" and written to the binary log outside of the TC and log_xid() framework.
No, this is used to make the number of XA-capable transaction participants greater than one, which forces MySQL to use 2PC.
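To make the point concrete, here is a small model of why registering the binlog as a participant matters. It assumes a simplified 2PC rule, and Participant, needs_2pc() and with_binlog() are hypothetical names: a transaction touching only InnoDB has one XA-capable participant and would skip 2PC, but once the binlog pseudo-engine is added it has two.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one transaction participant (a registered engine).
struct Participant {
  std::string name;
  bool xa_capable;  // true if the engine implements prepare()
};

// Simplified rule: 2PC is used only when every participant is XA-capable
// and there are at least two of them.
bool needs_2pc(const std::vector<Participant>& trx) {
  for (const Participant& p : trx)
    if (!p.xa_capable)
      return false;
  return trx.size() >= 2;
}

// Assumption of this model: with the binary log enabled, the binlog
// pseudo-engine registers as one more XA-capable participant.
std::vector<Participant> with_binlog(std::vector<Participant> trx) {
  trx.push_back(Participant{"binlog", true});
  return trx;
}
```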
TC interface subclasses
-----------------------
MWL#116 has two different algorithms for handling commit order and invoking the prepare_ordered() and commit_ordered() handler methods:
- One used with TC_MMAP, which does not need the commit order in the engines to correspond to the order in the TC. This uses the existing log_xid() interface.
- One used with the binary log TC, which ensures the same commit order in the engines and the binary log, and which uses a new single-threaded group_log_xid() TC interface to do group commit efficiently.
In the prototype patch for MWL#116, these two methods are intertwined in the function ha_commit_trans(), which makes the logic quite complex. The log_and_order() TC generalisation cleans this up nicely.
We implement two subclasses of the TC interface:
- One class TC_LOG_unordered for the method used with TC_MMAP. This implements the old log_xid() interface.
- One class TC_LOG_group_commit for the method used for the binary log. This implements the new group_log_xid() interface.
Each subclass implements the corresponding algorithm for invoking prepare_ordered() and commit_ordered(), using the same mechanisms as in MWL#116, but in a cleaner way. The ha_commit_trans() function then needs no knowledge of prepare_ordered() or commit_ordered(); it just calls tc_log->log_and_order(), which handles the necessary details.
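A minimal sketch of this split, heavily simplified: Trx, MemoryTC and ha_commit_trans_sketch() are hypothetical, the signatures do not match the real TC_LOG, and only the TC_LOG_unordered side is shown (TC_LOG_group_commit would wrap group_log_xid() in the same way).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for THD; only the xid matters for this sketch.
struct Trx { unsigned long long xid; };

// Simplified TC interface: non-zero means commit, zero means roll back.
class TC_LOG {
 public:
  virtual ~TC_LOG() = default;
  virtual int log_and_order(Trx* trx) = 0;
};

// TC subclass for plugins that only know how to log one xid at a time.
// The prepare_ordered()/commit_ordered() handling would live here, so a
// plugin deriving from it only implements log_xid().
class TC_LOG_unordered : public TC_LOG {
 public:
  int log_and_order(Trx* trx) override {
    // (run_prepare_ordered() would be called here, under its mutex.)
    int cookie = log_xid(trx->xid);
    // (run_commit_ordered() would be called here once cookie != 0.)
    return cookie;
  }

 protected:
  virtual int log_xid(unsigned long long xid) = 0;
};

// Toy plugin: "durably" logs xids into memory.
class MemoryTC : public TC_LOG_unordered {
 public:
  std::vector<unsigned long long> logged;

 protected:
  int log_xid(unsigned long long xid) override {
    logged.push_back(xid);
    return 1;  // non-zero: logged, so commit
  }
};

// Simplified ha_commit_trans(): no ordering details, just delegation.
int ha_commit_trans_sketch(TC_LOG* tc_log, Trx* trx) {
  // ... prepare the transaction in all engines ...
  if (!tc_log->log_and_order(trx))
    return 1;  // TC decided to roll back
  // ... commit in all engines ...
  return 0;
}
```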
Thus a simple TC plugin similar to the binary log or TC_MMAP can implement one of the simple interfaces, log_xid() or group_log_xid(), without having to worry about prepare_ordered() and commit_ordered(). But a plugin like Galera that needs to do more can implement the more general log_and_order() interface.
I still see no real value in keeping or supporting the log_xid() interface. I think we can implement only one interface, group_log_xid(), and that is enough.
============= Low-Level Design
...
log_and_order()

Requests a decision to commit (non-zero return) or roll back (zero return) the transaction. At this point, the transaction has been successfully prepared in all engines.
The method must call run_prepare_ordered() such that calls in different threads happen in the same order in which the transactions are committed. This call must be protected by the global LOCK_prepare_ordered mutex.
The method must then call run_commit_ordered(), protected by LOCK_commit_ordered, again such that calls in different threads happen in the order in which the transactions are committed.
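One way a TC could satisfy these two rules is to fix the commit order at the moment it takes LOCK_prepare_ordered, and then let threads take turns for commit_ordered() in that same order. The following is a sketch under assumptions, not the MWL#116 code: OrderedTC, the ticket counters and the condition variable are hypothetical, the engine callbacks are replaced by recording transaction ids, and durable logging is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of log_and_order() honouring the ordering rules.
// Recording a transaction id stands in for run_prepare_ordered() and
// run_commit_ordered(); durable logging is omitted.
class OrderedTC {
 public:
  std::vector<int> prepare_calls;  // order of prepare_ordered() calls
  std::vector<int> commit_calls;   // order of commit_ordered() calls

  int log_and_order(int trx_id) {
    unsigned long seq;
    {
      // Commit order is decided by the order threads get this mutex.
      std::lock_guard<std::mutex> g(LOCK_prepare_ordered);
      seq = next_seq++;
      prepare_calls.push_back(trx_id);  // run_prepare_ordered()
    }

    // ... write the xid to the log and make it durable here ...

    // Wait until every transaction ordered before us has done commit_ordered.
    std::unique_lock<std::mutex> lk(turn_mutex);
    turn_cv.wait(lk, [&] { return next_to_commit == seq; });
    {
      std::lock_guard<std::mutex> g(LOCK_commit_ordered);
      commit_calls.push_back(trx_id);  // run_commit_ordered()
    }
    ++next_to_commit;
    lk.unlock();
    turn_cv.notify_all();
    return 1;  // non-zero: commit
  }

 private:
  std::mutex LOCK_prepare_ordered;
  std::mutex LOCK_commit_ordered;
  std::mutex turn_mutex;
  std::condition_variable turn_cv;
  unsigned long next_seq = 0;
  unsigned long next_to_commit = 0;
};
```

Whatever order the threads happen to acquire LOCK_prepare_ordered in, commit_calls ends up identical to prepare_calls. Waiting on a condition variable is only the simplest turn-taking scheme; a group-commit TC might instead have a single thread run commit_ordered() for a whole queue of transactions.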
The idea with prepare_ordered() is to call it as early as possible after the commit order has been decided, for example to release locks early. Note that a transaction can still be rolled back after prepare_ordered() (for example in case of a crash). In contrast, commit_ordered() may only be called after the transaction has been durably committed in the TC.
If need_prepare_ordered or need_commit_ordered is passed as FALSE, then the corresponding call need not be made. It is safe to make it anyway; however, omitting it avoids taking a global mutex.
Why would this ever be needed? (I mean need_prepare_ordered or need_commit_ordered being FALSE.)
...
A TC based on this interface overrides group_log_xid() and xid_log_after() instead of log_and_order(), and again does not need to deal with any {prepare,commit}_ordered().
Why do you need xid_log_after here?

General comment: wouldn't it be simpler to create only the group_log_xid() interface, with no log_and_order() or log_xid()? The TC plugin gets the list in group_log_xid(); it can reorder the list any way it wants, call prepare_ordered() and commit_ordered() as needed, and so on. In this interpretation, group_log_xid() can cover all the use cases, and there is no need to create a multitude of methods that one has to become familiar with before implementing a TC plugin.

Regards,
Sergei

P.S. Minor detail: there could be helper functions like iterate_the_list_and_call_prepare_ordered() that the plugin can use.
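The single-interface alternative could look roughly like this. It is only a sketch of the proposal, with hypothetical names throughout: QueuedTrx, SingleInterfaceTC, and the two helper functions, which play the role of iterate_the_list_and_call_prepare_ordered(). The sort stands in for whatever reordering a plugin might want, and locking is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of a transaction waiting in the group commit queue.
// The callbacks stand in for the engines' prepare_ordered() and
// commit_ordered() handler methods.
struct QueuedTrx {
  int id;
  std::function<void()> prepare_ordered;
  std::function<void()> commit_ordered;
};

// Hypothetical helpers of the kind suggested in the P.S.
void run_prepare_ordered_for_list(const std::vector<QueuedTrx*>& list) {
  for (QueuedTrx* t : list) t->prepare_ordered();
}
void run_commit_ordered_for_list(const std::vector<QueuedTrx*>& list) {
  for (QueuedTrx* t : list) t->commit_ordered();
}

// A TC plugin implementing only group_log_xid(). It owns the whole group,
// so it decides the order and when to invoke the ordered callbacks.
// Locking (LOCK_prepare_ordered etc.) is omitted from this sketch.
class SingleInterfaceTC {
 public:
  std::vector<int> durable_log;  // stands in for the durable log

  void group_log_xid(std::vector<QueuedTrx*>& list) {
    // Example policy only: commit the group in ascending id order.
    std::sort(list.begin(), list.end(),
              [](const QueuedTrx* a, const QueuedTrx* b) { return a->id < b->id; });
    run_prepare_ordered_for_list(list);
    for (const QueuedTrx* t : list)
      durable_log.push_back(t->id);  // one write/sync for the whole group
    run_commit_ordered_for_list(list);
  }
};
```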