Hi! Now continuing with reviews of the parallel replication code. Will take one patch at a time with a target to have everything reviewed this week. Review of revno 3678: Parallel replication: error handling. ------------------------------------------------------------ revno: 3678 committer: knielsen@knielsen-hq.org branch nick: work-10.0-mdev4506 timestamp: Mon 2013-10-14 15:28:16 +0200 message: MDEV-4506: Parallel replication: error handling. Add an error code to the wait_for_commit facility. Now, when a transaction fails, it can signal the error to any subsequent transaction that is waiting for it to commit. The waiting transactions then receive the error code back from wait_for_prior_commit() and can handle the error appropriately. Also fix one race that could cause crash if @@slave_parallel_threads were changed several times quickly in succession. <cut> === modified file 'sql/handler.cc' --- sql/handler.cc 2013-10-13 21:24:05 +0000 +++ sql/handler.cc 2013-10-14 13:28:16 +0000 @@ -1458,10 +1458,11 @@ int ha_commit_one_phase(THD *thd, bool a transaction.all.ha_list, see why in trans_register_ha()). */ bool is_real_trans=all || thd->transaction.all.ha_list == 0; + int res; DBUG_ENTER("ha_commit_one_phase"); - if (is_real_trans) - thd->wait_for_prior_commit(); - int res= commit_one_phase_2(thd, all, trans, is_real_trans); + if (is_real_trans && (res= thd->wait_for_prior_commit())) + DBUG_RETURN(res); + res= commit_one_phase_2(thd, all, trans, is_real_trans); DBUG_RETURN(res); Another way to write the above is: if (is_real_trans && !(res= thd->wait_for_prior_commit())) res= commit_one_phase_2(thd, all, trans, is_real_trans); DBUG_RETURN(res); <cut> I was checking how ha_commit_one_phase() was used. In ha_commit_trans() or trans_xa_commit(), if ha_commit_one_phase() fails, it will not run rollback. Is this right? I would have assumed that on failure, ha_commit_one_phase() should run ha_rollback_trans() before returning. === modified file 'sql/log.cc' @@ -7844,7 +7844,8 @@ int TC_LOG_MMAP::log_and_order(THD *thd, mysql_mutex_unlock(&LOCK_prepare_ordered); } - thd->wait_for_prior_commit(); + if (thd->wait_for_prior_commit()) + return 0; What does the return 0 mean here? Is it an error? Is thd->is_fatal_error assumed to be set if this happens? A comment here would not be totally wrong... @@ -5633,7 +5634,7 @@ wait_for_commit::~wait_for_commit() void -wait_for_commit::wakeup() +wait_for_commit::wakeup(int wakeup_error) { /* We signal each waiter on their own condition and mutex (rather than using @@ -5649,6 +5650,7 @@ wait_for_commit::wakeup() */ mysql_mutex_lock(&LOCK_wait_commit); waiting_for_commit= false; + this->wakeup_error= wakeup_error; Consider adding an assert that 'this->wakeup_error' is 0. Good way to ensure we don't accidently call this function twice... (as things would be bad if we first would call it with an error and then with 0) mysql_mutex_unlock(&LOCK_wait_commit); mysql_cond_signal(&COND_wait_commit); } Regards, Monty