[MariaDB developers] Re: Re2: Starting on review of XA patches in bb-10.6-MDEV-31949

12 Feb 2024

      Howdy Kristian.

I am responding to two notes of

- innobase_commit_by_xid() block
- run_commit_ordered() 's parallel slave branch for XAC.
...
andrei.elkin@pp.inet.fi writes:
...
I see now that alternatively thd_binlog_format(thd) could be used
which will always
return BINLOG_FORMAT_UNSPEC when that's the case (or @@skip_log_bin=1).
You think it's better to covert the block to use that?
I think it's better to define a proper API for this.
It feels an overkill in the current case.
It'd be simpler to replace this block I added with, see -/+ below:

innobase_commit_by_xid(
/*===================*/
	handlerton*	hton,
	XID*		xid)	/*!< in: X/Open XA transaction identification */
{
	DBUG_ASSERT(hton == innodb_hton_ptr);

	DBUG_EXECUTE_IF("innobase_xa_fail",
			return XAER_RMFAIL;);

	if (high_level_read_only) {
		return(XAER_RMFAIL);
	}

	if (trx_t* trx = trx_get_trx_by_xid(xid)) {
-       /* my stuff and  innobase_commit_low(trx); */        
+        	innobase_commit_ordered_2(current_thd, trx)

Let me try that out.
...
We have a storage engine API. We have an internal TC interface with class
TC_LOG.
Here, the binlog TC wants to tell the engine that it need not fsync, the
commit is already made durable. Make a proper API for this. Document how
this should be implemented in another TC implementation. Document how
another engine should handle it. Look for example at the comments in
sql/handler.h on commit_checkpoint_request() which documents the similar
behaviour for normal commit. Not saying your case should be identical; my
point is that whatever method is used, it should be a well-defined API that
is designed to be well usable by different TC implementations and different
engines.
I do not see how thd_binlog_format(thd) is any better, especially since IIUC
you need to call thd_binlog_pos() anyway to record the binlog position with
the commit.
...
...
...
And why should this be needed only for parallel replication, not
for
non-parellel replication or non-replication SQL? I don't understand.
It's a kind of optimization for the slave side, in that `xid` of XAP
(Prepare) gets released for a XAC (Commit) the soonest.
I still don't understand why this cannot be the same for non-parallel
replication as for parallel replication? Unless there is a reason, it seems
better to use as similar logic as possible for the two cases to reduce risk
of different behaviour?
I see your point.

A better way to specify this run_commit_ordered() condition must be to
exchange the if() and assert() arguments:

    if (thd->rgi_slave && thd->rgi_slave->is_parallel_exec)
    {
      DBUG_ASSERT(thd->transaction->xid_state.is_dummy_XA());

That should settle this note, agreed?

With saying that I hope the dummy XA will be eventually removed by
implementing out-of-order (XAC) commit by the XA-PREPARE (XAP) worker.
...
...
...
...
If the XA COMMIT is before this binlog position, then we know the
first transaction became durable in InnoDB, so we know we have to
roll back
the current prepared transaction in the engine. Otherwise we should
commit
it. I see the recovery code already uses binlog positions to decide
the fate
of pending XA transactions in the engines, but I did not yet fully
understand what is going on there.
Another way could be to binlog an extra event before the second XA 
PREPARE
with the duplicate XID.
Well, this does not cover the case of prior to crash
XA PREPARE only made to Engine. The above consulting with Innodb
resolves the ambiguity generally.
Yes. To solve it by binlogging an extra event between the first commit and
the second prepare, we would need the commit to fsync inside InnoDB before
the binlogging of the extra event.
- Kristian.
Cheers,

Andrei

[MariaDB developers] Re: Re2: Starting on review of XA patches in bb-10.6-MDEV-31949

andrei.elkin＠pp.inet.fi