Hi Sergei,
Just one question, before I could answer. What does it mean "data node is committed manually after recovery"? What exactly should the user do?
Thank you for looking into it! The xa commit sequence with crash recovery is as follows. (In this case, I am talking about 1 Spider node and 3 data nodes.) Sorry for the long explanation; the answer to "What does it mean "data node is committed manually after recovery"?" is step 3.

1. An application sends xa prepare to the Spider node.

application -> xa prepare -> Spider node -|-> xa prepare -> data node1
                                          |-> xa prepare -> data node2
                                          |-> xa prepare -> data node3
Success is returned to the application.

2. The application sends xa commit to the Spider node after data node2 has crashed.

application -> xa commit -> Spider node -|-> xa commit -> data node1
                                         |-> xa commit xx data node2
                                         |-> xa commit -> data node3
An error is returned to the application.

3. After data node2 has recovered, send xa recover and xa commit to it manually. The status of the xa transaction is recorded in the mysql.spider_xa table, so you can tell from this table whether you should commit or roll back the xa transaction. This is an operation for a human or a monitoring tool.

-> xa commit -> data node2

It would be better to be able to commit through the Spider node. Currently that is impossible, but I think it would be possible if xid_cache_delete were skipped when xa commit gets an error from a storage engine.

Could you please tell me your opinion?

Thanks,
Kentoku

2013/10/5 Sergei Golubchik <serg@mariadb.org>
Hi, Kentoku!
On Oct 05, kentoku wrote:
Anyway, Spider should be fixed to not error out in 2pc commits, because such a commit means inconsistent data; it's a bad error, it breaks ACID.
An engine is expected to check all preconditions during prepare, and if prepare succeeds, it is basically a guarantee that the commit will succeed; it is not allowed to fail anymore.
Does it mean "an engine shouldn't return an error at the 2pc commit phase"? I can't understand it clearly. Currently, Spider returns an error at the 2pc commit phase if a data node crashes between the xa prepare phase and the xa commit phase. In this case, Spider commits all living data nodes and then returns an error at the 2pc commit phase. The crashed data node is committed manually after recovery. Does it break ACID? Why?
Just one question, before I could answer. What does it mean "data node is committed manually after recovery"? What exactly should the user do?
I think an engine should return an error at the xa commit phase if some data node fails xa commit, because an application can't know about this problem if the engine doesn't return an error.
Regards, Sergei
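
For readers following the thread, the sequence discussed above (prepare on all data nodes, a crash of data node2 before commit, then a manual commit after recovery) can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not Spider's actual implementation: the class names, the "PREPARED"/"COMMIT" status strings, and the point at which the status is recorded are all hypothetical, and the xa_status dictionary merely stands in for the mysql.spider_xa table.

```python
from dataclasses import dataclass, field


class NodeDown(Exception):
    """Raised when a data node cannot be reached."""


@dataclass
class DataNode:
    # Hypothetical stand-in for a data node. The prepared set is kept
    # across a crash, as a prepared XA transaction is expected to be.
    name: str
    up: bool = True
    prepared: set = field(default_factory=set)
    committed: set = field(default_factory=set)

    def _check(self):
        if not self.up:
            raise NodeDown(self.name)

    def xa_prepare(self, xid):
        self._check()
        self.prepared.add(xid)

    def xa_recover(self):
        # Lists transactions that are prepared but not yet finished.
        self._check()
        return sorted(self.prepared)

    def xa_commit(self, xid):
        self._check()
        self.prepared.remove(xid)
        self.committed.add(xid)


class SpiderNode:
    # Hypothetical coordinator; xa_status stands in for mysql.spider_xa.
    def __init__(self, nodes):
        self.nodes = nodes
        self.xa_status = {}

    def xa_prepare(self, xid):
        for node in self.nodes:
            node.xa_prepare(xid)
        self.xa_status[xid] = "PREPARED"

    def xa_commit(self, xid):
        # Assumption: the decision is recorded before contacting the nodes.
        self.xa_status[xid] = "COMMIT"
        failed = []
        for node in self.nodes:
            try:
                node.xa_commit(xid)   # commit every living node
            except NodeDown:
                failed.append(node.name)
        return failed                 # non-empty list models the error


def run_scenario():
    nodes = [DataNode("data node1"), DataNode("data node2"), DataNode("data node3")]
    spider = SpiderNode(nodes)
    xid = "trx1"

    spider.xa_prepare(xid)            # step 1: prepare succeeds everywhere
    nodes[1].up = False               # data node2 crashes
    failed = spider.xa_commit(xid)    # step 2: commit reports data node2
    nodes[1].up = True                # data node2 recovers

    # Step 3: manual recovery, driven by the recorded status.
    for node in nodes:
        if node.name in failed and xid in node.xa_recover():
            if spider.xa_status[xid] == "COMMIT":
                node.xa_commit(xid)

    return failed, [xid in node.committed for node in nodes]
```

Calling run_scenario() walks through the three steps in order. The model makes the disputed point visible: between step 2 and step 3, data node1 and data node3 have committed while data node2 still only holds a prepared transaction.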