Re: [PATCH] Fix redundant ER_PRIOR_COMMIT_FAILED in parallel replication

Hi Monty, thanks for review! Michael Widenius <michael.widenius@gmail.com> writes:
Review of patch:
wait_for_prior_commit() can be called multiple times per event group, only do my_error() the first time the call fails.
--- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -8365,6 +8365,18 @@ wait_for_commit::wait_for_prior_commit2(THD *thd, bool allow_kill) }
+void +wait_for_commit::prior_commit_error(THD *thd) +{ + /* + Only raise a "prior commit failed" error if we didn't already raise + an error. + */ + if (!thd->get_stmt_da()->is_set()) + my_error(ER_PRIOR_COMMIT_FAILED, MYF(0)); +} + + /* Wakeup anyone waiting for us to have committed.
diff --git a/sql/sql_class.h b/sql/sql_class.h index 4cee31b296e..69b021cd41d 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -2398,8 +2398,8 @@ struct wait_for_commit return wait_for_prior_commit2(thd, allow_kill); else { - if (wakeup_error) - my_error(ER_PRIOR_COMMIT_FAILED, MYF(0)); + if (unlikely(wakeup_error)) + prior_commit_error(thd); return wakeup_error; } }
The only small objection I have to this patch is that if we get first one error and then a wait_for_prior_commit error, then we will not get the second error. I would suggest that instead of checking for if (!thd->get_stmt_da()->is_set()) instead add a new flag 'got_prior_commit_error' to check if we have already got a prior commit error for this thread. Adding bool get_prior_commit_error into THD after bool wakeup_blocked; has no storage cost.
It is actually deliberate to skip the ER_PRIOR_COMMIT_FAILED if we already gave a different error. The reason is that the ER_PRIOR_COMMIT_FAILED is not a real error - it is an internal error thrown only to make sure that a following transaction does not succeed after a prior transaction failed, which would result in inconsistent commit order. Therefore, if a worker thread already got a real error, for example duplicate key or row not found, then it seems preferable to keep that error rather than override it with a default, internal-only error. (Arguably it might make sense to not write ER_PRIOR_COMMIT_FAILED errors to the error log at all, as they do not seem very useful; they are not an error that the user should do anything about, on the contrary they indicate that the worker correctly handled an earlier error. But that would be a separate patch, probably).
It up to you to decide which solution to choose.
Ok, then I will keep this, but I will add a comment explaining why the original error is preferred over ER_PRIOR_COMMIT_FAILED.
This should be pushed to 10.6 to avoid the 63*4 warnings we can get with parallel=4.
Ok, will do then.
I will try to spend some time today to try to simulate the case where we get new calls to wait_for_prior_commit after signaling a previous failure from it.
I hope you find a way to reproduce, I did not manage to do so when I tried. It would be good to find the root cause of those repeated ER_PRIOR_COMMIT_FAILED. - Kristian.
participants (1)
-
Kristian Nielsen