Sachin Setiya <sachin.setiya@mariadb.com> writes:
<knielsen> sachin_setiya_7: so maybe the problem is - that a node broadcasts its write set before the commit order has been determined?
I do not think this is the problem. Galera enforces the commit order. Yes, it broadcasts the write set in the prepare phase, but it also
guarantees that the t1->t2 order will be maintained on all N participating nodes.
Ok, good.
<knielsen> sachin_setiya_7: how is the galera internal transaction id allocated and broadcast?
I am assuming here that we are talking about the GTID sequence number.
Suppose our initial seqno is S, so at this point all N nodes have the same sequence number.
Some transaction T is executed at node Ni. It broadcasts the write set with
its current sequence number S.
Every node Nj (including Ni) receives this message and checks some conditions,
such as whether it is a "totally ordered action". If it is, Nj increments its sequence number by 1.
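To make the scheme above concrete, here is a toy model of it in Python. It is only a sketch of the description in this mail, not Galera code: the `Node` class, the `broadcast` helper and the `totally_ordered` flag are hypothetical names, and the group channel is simply assumed to deliver every write set to every node in the same order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node that tracks the last sequence number it saw."""
    name: str
    seqno: int
    applied: list = field(default_factory=list)

    def deliver(self, writeset: str, totally_ordered: bool = True) -> None:
        # Only totally ordered actions advance the counter (assumption
        # taken from the description above).
        if totally_ordered:
            self.seqno += 1
            self.applied.append((self.seqno, writeset))


def broadcast(nodes, writeset):
    # Assumed: the group channel delivers in the same order to every node,
    # including the node the transaction originated on.
    for node in nodes:
        node.deliver(writeset)


S = 100  # initial sequence number shared by all nodes
cluster = [Node(f"N{i}", S) for i in range(3)]
for ws in ["T1", "T2", "T3"]:
    broadcast(cluster, ws)
```

Because every node starts from the same S and sees the same delivery order, each node ends up pairing the same sequence number with the same write set without any further coordination.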
Right. I think "totally ordered action" is something like DDL, which needs special handling to ensure same commit order over the entire cluster. I am wondering if there is a slightly different method used for normal DML transactions, or if it is the same. In any case, I am guessing it must work much the same way, because after the prepare phase the transaction will be written into the binlog, and at that point it cannot be rolled back, so it should have passed certification and gotten its Galera transaction ID.
Blueprint of the task: we can do something like the Galera GTID. We will take the initial
sequence number from the server. We will add one more variable to gcs_group_t,
named s_sequence_no, and increment it at each node. We also have to
create a GTID event and append it to the message received at Nj, so that at a later stage wsrep_apply_cb() can take care of the GTID.
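The blueprint could be sketched roughly as follows. This is a Python toy, not the real gcs_group_t or wsrep_apply_cb(): `GroupState`, `on_writeset` and the dictionary used as a "message" are hypothetical, and it is an assumption of the sketch that the originating node's server_id travels with the write set. The only part taken from MariaDB itself is the GTID layout, domain_id-server_id-seq_no.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int

    def __str__(self) -> str:
        # MariaDB prints GTIDs as domain_id-server_id-seq_no.
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


class GroupState:
    """Hypothetical stand-in for the per-node group state."""

    def __init__(self, domain_id: int, initial_seq_no: int):
        self.domain_id = domain_id
        self.s_sequence_no = initial_seq_no  # taken from the server at startup

    def on_writeset(self, origin_server_id: int, payload: str) -> dict:
        # Increment locally on every delivered write set, then attach the
        # resulting GTID so a later apply stage can use it.
        self.s_sequence_no += 1
        gtid = Gtid(self.domain_id, origin_server_id, self.s_sequence_no)
        return {"gtid": gtid, "payload": payload}


# Two nodes seeded with the same position receive the same ordered messages.
messages = [(1, "T1"), (2, "T2")]
node_a = GroupState(domain_id=0, initial_seq_no=10)
node_b = GroupState(domain_id=0, initial_seq_no=10)
events_a = [node_a.on_writeset(sid, p) for sid, p in messages]
events_b = [node_b.on_writeset(sid, p) for sid, p in messages]
```

Both nodes derive identical GTIDs here only because they share the initial sequence number and the delivery order; the sketch says nothing about how that initial value is agreed on when a node joins.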
It sounds reasonable. I still don't fully understand how the Galera transaction id is generated, but my guess is that doing something similar for the MariaDB GTID sequence number should be the right way.

But one more thing is needed, which is to ensure that the transactions will be written into the binary log in the same order as the GTID sequence numbers were assigned. This is necessary for GTID to work. The slave only stores one GTID position for each replication domain. So the sequence numbers must be in the right order in the master binlog, otherwise the slave cannot determine the starting position correctly.

The original way I imagined this would be done is that Galera would take over the transaction coordinator role and implement the TC_LOG::log_and_order() virtual method. This was not done, though, and I imagine it would be a somewhat large task.

One possible simpler alternative is to use wait_for_prior_commit, similar to how parallel replication does it. The idea would be that if Galera commits transactions T1, T2, T3, ... in order, Galera would for each T_i call wait_for_commit::register_wait_for_prior_commit(T_(i-1)). This will make TC_LOG_BINLOG::log_and_order() write the transactions to the binlog in the correct order. See the comments on struct wait_for_commit in sql/sql_class.h.

Another thing that needs to be handled is what happens to transactions that enter Galera through normal replication from another cluster, using MariaDB parallel replication. In this case, the commit order and GTID sequence numbers are already decided by the replication master. Some way will be needed to force Galera to use the same commit order that replication has; it cannot invent its own order or GTIDs. I am not sure if that is even possible. So maybe it will be necessary to forbid the use of parallel replication with GTID against a Galera cluster?

- Kristian.
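PS: a toy illustration of the wait_for_prior_commit chaining idea, in Python rather than server code. The method names are borrowed from struct wait_for_commit, but the signatures and behaviour are simplified assumptions of this sketch (no THD, no error handling); `run_in_commit_order` and the list used as a "binlog" are hypothetical.

```python
import threading


class WaitForCommit:
    """Toy analogue of struct wait_for_commit; not the MariaDB implementation."""

    def __init__(self):
        self._done = threading.Event()
        self._prior = None

    def register_wait_for_prior_commit(self, prior: "WaitForCommit") -> None:
        self._prior = prior

    def wait_for_prior_commit(self) -> None:
        # Block until the transaction we were chained to has committed.
        if self._prior is not None:
            self._prior._done.wait()

    def wakeup_subsequent_commits(self) -> None:
        self._done.set()


def run_in_commit_order(seq_nos):
    """Commit each seq_no in its own thread, chained T_i -> T_(i-1)."""
    binlog = []
    prior = None
    threads = []
    for seq_no in seq_nos:
        wfc = WaitForCommit()
        if prior is not None:
            wfc.register_wait_for_prior_commit(prior)
        prior = wfc

        def commit(seq_no=seq_no, wfc=wfc):
            wfc.wait_for_prior_commit()
            binlog.append(seq_no)  # stands in for the binlog write
            wfc.wakeup_subsequent_commits()

        threads.append(threading.Thread(target=commit))

    # Start in reverse to show that thread start order does not matter.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return binlog
```

Even though the threads are started last-to-first, the chain forces the writes to land in sequence-number order, which is the property the slave needs to find its starting position.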