Hi Sergei,

yes, there are many reasons why a cohort may fail during the commit phase. Spider has a lot of reasons too. In this particular case (test case provided by Elena) it fails with the following error:

ERROR 42S02: Table 'mysql.spider_xa' doesn't exist

Anyway, it is not clear how to handle a cohort commit failure properly. Let's say we have 4 cohorts participating in an XA transaction. Cohorts 2 and 3 fail. Cohort 1 can't roll back (because it has already committed).

What should we do with cohort 4 (commit/rollback/nothing)? Should we remove this transaction from xid_cache? Should we indicate clearly which cohorts failed? Should it be an error or a warning? Should we hold the whole system (all cohorts + manager) until the failure is resolved?

Thanks,
Sergey

On Fri, Oct 04, 2013 at 06:02:51PM +0200, Sergei Golubchik wrote:
Hi, Sergey!
On Oct 04, Sergey Vojtovich wrote:
Hi Kentoku,
I just reviewed one of your revisions, specifically bzr diff -c3829 lp:~kentokushiba/maria/10.0.4-spider-3.0/
I believe things are a bit more complex: the 2PC protocol doesn't seem to permit cohorts to fail during the commit phase: http://en.wikipedia.org/wiki/Two-phase_commit_protocol#Commit_phase
<quot>
If the coordinator received an agreement message from all cohorts during the commit-request phase:
1. The coordinator sends a commit message to all the cohorts.
2. Each cohort completes the operation, and releases all the locks and resources held during the transaction.
3. Each cohort sends an acknowledgment to the coordinator.
4. The coordinator completes the transaction when all acknowledgments have been received.
</quot>
I read the above as: the only problem the coordinator may experience is a missing acknowledgement. What should the coordinator do if some cohorts acknowledged the commit, but some did not? Perhaps Spider should detect it earlier?
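To make the partial-acknowledgement case concrete, here is a minimal Python sketch of a commit phase in which some cohorts do not acknowledge. Everything in it (the Cohort class, the fail_on_commit flag, commit_phase) is hypothetical and is not MariaDB or Spider code. It also assumes one particular policy, namely that the coordinator keeps sending commit to the remaining cohorts and records who failed; whether that is the right policy is exactly the open question.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Hypothetical 2PC participant; fail_on_commit simulates a broken cohort."""
    name: str
    fail_on_commit: bool = False
    state: str = "prepared"

    def commit(self) -> bool:
        # Return True as the acknowledgement, False if the commit failed.
        if self.fail_on_commit:
            return False
        self.state = "committed"
        return True


def commit_phase(cohorts):
    """Send commit to every cohort and report who acknowledged and who did not."""
    acked, failed = [], []
    for cohort in cohorts:
        if cohort.commit():
            acked.append(cohort.name)
        else:
            failed.append(cohort.name)
    return acked, failed


# Four cohorts, all of which voted "yes" in the prepare phase; 2 and 3 then fail.
cohorts = [
    Cohort("cohort1"),
    Cohort("cohort2", fail_on_commit=True),
    Cohort("cohort3", fail_on_commit=True),
    Cohort("cohort4"),
]
acked, failed = commit_phase(cohorts)
print("acknowledged:", acked)
print("failed:", failed)
```

With this policy cohorts 1 and 4 end up committed while 2 and 3 stay prepared, so the coordinator at least knows which cohorts need recovery, but the transaction is no longer atomic.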
Sergei, what's your opinion?
Let me see if I understood the problem correctly. The crash happens because Spider uses my_error() in the 2PC commit step, and the error status is lost up the stack, so Diagnostic_area::ok() fires an assert on redefining the statement status. Is that right?
The server should know that the error has happened on commit and should not trigger an assert; it should report the error to the user. An error at the commit step should normally never happen. It means inconsistent data, because some participants might have already committed the transaction and cannot roll it back anymore. Still, the commit method *might* return an error status and we shouldn't ignore it. Hardware failures are a good example of what can cause a commit error.
Anyway, Spider should be fixed to not error out in 2PC commits, because such a failure means inconsistent data. It's a bad error, it breaks ACID. An engine is expected to check all preconditions during prepare, and if prepare succeeds, that is basically a guarantee that the commit will succeed; it is not allowed to fail anymore.
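As an illustration of "check all preconditions during prepare", here is a minimal Python sketch. The Engine class, its methods and two_phase_commit are hypothetical and do not reflect the real handlerton interface; the only detail taken from this thread is the missing mysql.spider_xa table, used here as the example precondition.

```python
class Engine:
    """Hypothetical storage engine participating in 2PC."""

    def __init__(self, available_tables):
        self.available_tables = set(available_tables)
        self.prepared = False
        self.committed = False

    def prepare(self, required_tables):
        # Verify everything the commit step will need; vote "no" here if not.
        missing = sorted(set(required_tables) - self.available_tables)
        if missing:
            raise RuntimeError(f"cannot prepare, missing: {missing}")
        self.prepared = True

    def rollback(self):
        self.prepared = False

    def commit(self):
        # After a successful prepare nothing in this step is allowed to fail.
        assert self.prepared, "commit called without a successful prepare"
        self.committed = True


def two_phase_commit(engines, required_tables):
    """Commit only if every engine prepares; otherwise roll everything back."""
    try:
        for engine in engines:
            engine.prepare(required_tables)
    except RuntimeError:
        for engine in engines:
            engine.rollback()
        return "rolled back"
    for engine in engines:
        engine.commit()
    return "committed"


healthy = Engine({"mysql.spider_xa"})
broken = Engine(set())  # simulates the missing table from this thread
result = two_phase_commit([healthy, broken], ["mysql.spider_xa"])
print(result)
```

Because the missing table is detected in prepare, no participant has committed yet and a clean rollback is still possible, which is the property lost when the same check only fails at commit time.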
Regards, Sergei