Howdy Kristian. I am responding to two notes of - innobase_commit_by_xid() block - run_commit_ordered() 's parallel slave branch for XAC.
andrei.elkin@pp.inet.fi writes:
I see now that alternatively thd_binlog_format(thd) could be used which will always
return BINLOG_FORMAT_UNSPEC when that's the case (or @@skip_log_bin=1).
You think it's better to covert the block to use that?
I think it's better to define a proper API for this.
It feels an overkill in the current case. It'd be simpler to replace this block I added with, see -/+ below: innobase_commit_by_xid( /*===================*/ handlerton* hton, XID* xid) /*!< in: X/Open XA transaction identification */ { DBUG_ASSERT(hton == innodb_hton_ptr); DBUG_EXECUTE_IF("innobase_xa_fail", return XAER_RMFAIL;); if (high_level_read_only) { return(XAER_RMFAIL); } if (trx_t* trx = trx_get_trx_by_xid(xid)) { - /* my stuff and innobase_commit_low(trx); */ + innobase_commit_ordered_2(current_thd, trx) Let me try that out.
We have a storage engine API. We have an internal TC interface with class TC_LOG. Here, the binlog TC wants to tell the engine that it need not fsync, the commit is already made durable. Make a proper API for this. Document how this should be implemented in another TC implementation. Document how another engine should handle it. Look for example at the comments in sql/handler.h on commit_checkpoint_request() which documents the similar behaviour for normal commit. Not saying your case should be identical; my point is that whatever method is used, it should be a well-defined API that is designed to be well usable by different TC implementations and different engines.
I do not see how thd_binlog_format(thd) is any better, especially since IIUC you need to call thd_binlog_pos() anyway to record the binlog position with the commit.
And why should this be needed only for parallel replication, not for non-parellel replication or non-replication SQL? I don't understand.
It's a kind of optimization for the slave side, in that `xid` of XAP (Prepare) gets released for a XAC (Commit) the soonest.
I still don't understand why this cannot be the same for non-parallel replication as for parallel replication? Unless there is a reason, it seems better to use as similar logic as possible for the two cases to reduce risk of different behaviour?
I see your point. A better way to specify this run_commit_ordered() condition must be to exchange the if() and assert() arguments: if (thd->rgi_slave && thd->rgi_slave->is_parallel_exec) { DBUG_ASSERT(thd->transaction->xid_state.is_dummy_XA()); That should settle this note, agreed? With saying that I hope the dummy XA will be eventually removed by implementing out-of-order (XAC) commit by the XA-PREPARE (XAP) worker.
If the XA COMMIT is before this binlog position, then we know the first transaction became durable in InnoDB, so we know we have to roll back the current prepared transaction in the engine. Otherwise we should commit it. I see the recovery code already uses binlog positions to decide the fate of pending XA transactions in the engines, but I did not yet fully understand what is going on there. Another way could be to binlog an extra event before the second XA PREPARE with the duplicate XID.
Well, this does not cover the case of prior to crash XA PREPARE only made to Engine. The above consulting with Innodb resolves the ambiguity generally.
Yes. To solve it by binlogging an extra event between the first commit and the second prepare, we would need the commit to fsync inside InnoDB before the binlogging of the extra event.
- Kristian.
Cheers, Andrei