Hi Sergei,
    if (victim_trx)
    {
      const trx_id_t victim_trx_id= victim_trx->id;
      const longlong victim_thread= thd_get_thread_id(victim_thd);
      /* This is necessary as correct mutexing order is
      lock_sys -> trx -> THD::LOCK_thd_data
      and below function assumes we have lock_sys and trx locked
      and takes THD::LOCK_thd_data for THD state check. */
      wsrep_thd_UNLOCK(victim_thd);
      // GAP where thd or trx is not protected
      lock_mutex_enter();
      if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true))
      {
trx_rw_is_active needs to be modified to do that, right?
No, this is the current behaviour. I did not change anything in trx_rw_is_active.
// As trx is now referenced it can't go away
Hmm. What happens if the thd that owns this transaction is killed or the user disconnects? THD gets freed. What happens to the referenced trx?
In my understanding, you can't just free the THD before its transaction is aborted or committed, right? As we hold the lock_sys mutex, no trx can commit or abort inside InnoDB, and once trx_rw_is_active has returned with the reference taken, this trx can't be deleted.
    trx_mutex_enter(victim);
    // In below we take THD::LOCK_thd_data
"we take victim->mysql_thd->LOCK_thd_data", correct?
Yes
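
To make the ordering concrete, here is a minimal, self-contained C++ sketch of the pattern discussed above: release the THD lock, take lock_sys, look the transaction up again by id while taking a reference, then lock the trx and finally the THD. Everything in it (Thd, Trx, LockSys, find_and_reference, abort_victim) is a hypothetical stand-in built on std::mutex. It is not the actual server or InnoDB code, only an illustration of the lock order lock_sys -> trx -> THD::LOCK_thd_data.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for THD; not the real server type.
struct Thd {
    std::mutex lock_thd_data;   // models THD::LOCK_thd_data
    bool killed = false;        // models the THD state that gets changed
};

// Hypothetical stand-in for trx_t.
struct Trx {
    std::uint64_t id;
    Thd* thd;                   // models trx->mysql_thd
    std::mutex mutex;           // models the trx mutex
    std::atomic<int> n_ref{0};  // models the reference taken by the lookup

    Trx(std::uint64_t id_, Thd* thd_) : id(id_), thd(thd_) {}
};

// Hypothetical stand-in for lock_sys plus the list of active rw transactions.
struct LockSys {
    std::mutex mutex;                       // models the lock_sys mutex
    std::map<std::uint64_t, Trx*> active;   // models the active trx list

    // Loosely models a lookup by id that also takes a reference.
    // The caller must already hold this->mutex.
    Trx* find_and_reference(std::uint64_t id) {
        auto it = active.find(id);
        if (it == active.end()) {
            return nullptr;     // trx ended during the unprotected gap
        }
        ++it->second->n_ref;    // referenced: must not be freed until released
        return it->second;
    }
};

// Precondition: the caller holds victim_thd->lock_thd_data.
// Returns true if the victim was still active and was marked as killed.
bool abort_victim(LockSys& lock_sys, Thd* victim_thd,
                  std::uint64_t victim_trx_id) {
    // The THD lock is last in the order, so drop it before taking lock_sys.
    victim_thd->lock_thd_data.unlock();

    // GAP: neither thd nor trx is protected here, so only the saved id
    // is used to find the transaction again.
    std::lock_guard<std::mutex> sys_guard(lock_sys.mutex);
    Trx* victim = lock_sys.find_and_reference(victim_trx_id);
    if (victim == nullptr) {
        return false;
    }
    {
        // Lock order: lock_sys (held) -> trx -> THD lock.
        std::lock_guard<std::mutex> trx_guard(victim->mutex);
        std::lock_guard<std::mutex> thd_guard(victim->thd->lock_thd_data);
        victim->thd->killed = true;
    }
    --victim->n_ref;            // release the reference taken by the lookup
    return true;
}
```

If the transaction has already ended during the gap, the lookup returns nullptr and nothing is touched, which is the case the saved victim_trx_id is meant to handle.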
What I mean is, what if KILL ignored a WSREP_TO_ISOLATION_BEGIN failure and just proceeded with the kill? Perhaps if WSREP_TO_ISOLATION_BEGIN fails, it means that there can be no BF aborts anyway? Could you try to find out?
A user KILL can happen only after the node has moved to the READY state, so at startup you can't use it before the cluster is ready to serve. We could just ignore the TOI error here, but what is the point? There are bigger problems in the cluster if TOI fails. TOI can fail only on this node, as all other nodes in the cluster will ignore the KILL command (after parsing it).

R: Jan