Hi Sergei,

>   if (victim_trx) {
>     const trx_id_t victim_trx_id= victim_trx->id;
>     const longlong victim_thread= thd_get_thread_id(victim_thd);
>     /* This is necessary as correct mutexing order is
>     lock_sys -> trx -> THD::LOCK_thd_data and below
>     function assumes we have lock_sys and trx locked
>     and takes THD::LOCK_thd_data for THD state check. */
>     wsrep_thd_UNLOCK(victim_thd);
>     // GAP where thd or trx is not protected
>     lock_mutex_enter();
>     if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {

trx_rw_is_active needs to be modified to do that, right?

No, this is the current behaviour; I did not change anything in trx_rw_is_active.

>       // As trx is now referenced it can't go away

Hmm. What happens if the thd that owns this transaction is killed or the
user disconnects? THD gets freed. What happens to the referenced trx?

As I understand it, a THD cannot be freed before its transaction has been aborted or committed, right?
Since we hold lock_sys, no trx can commit or abort inside InnoDB, and once trx_rw_is_active()
has returned the trx with a reference taken, that trx cannot be deleted.
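To make the argument above concrete, here is a minimal sketch under simplifying assumptions: lock_sys is modelled as one std::mutex guarding a map of active transactions, and the commit/rollback path must take that mutex before it may free a trx. TrxRegistry, find_and_ref, release and try_free are hypothetical names, not InnoDB's API (the real code calls trx_rw_is_active() with its last argument set to true). What it shows: once the lookup has bumped n_ref while holding lock_sys, the trx object cannot be freed until that reference is released.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical model, not real InnoDB types: one global "lock_sys" mutex
// protects the set of active transactions and every trx reference count.
struct Trx {
  std::uint64_t id;
  int n_ref = 0;  // references held by concurrent BF-abort lookups
};

struct TrxRegistry {
  std::mutex lock_sys;
  std::map<std::uint64_t, Trx*> active;

  // Caller must hold lock_sys. Returns the trx with an extra reference,
  // or nullptr if it has already committed or rolled back.
  Trx* find_and_ref(std::uint64_t id) {
    auto it = active.find(id);
    if (it == active.end()) return nullptr;
    ++it->second->n_ref;
    return it->second;
  }

  // Drop a reference taken by find_and_ref().
  void release(Trx* trx) {
    std::lock_guard<std::mutex> g(lock_sys);
    --trx->n_ref;
  }

  // Commit/rollback path: the object may only be freed when nobody holds
  // a reference. Returns false if freeing has to be deferred.
  bool try_free(std::uint64_t id) {
    std::lock_guard<std::mutex> g(lock_sys);
    auto it = active.find(id);
    if (it == active.end()) return true;
    if (it->second->n_ref > 0) return false;
    delete it->second;
    active.erase(it);
    return true;
  }
};
```

In this sketch the free is simply refused while a reference exists; whether the real commit path waits for the reference to drop or defers the free is not modelled here.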
 

>       trx_mutex_enter(victim);
>       // In below we take THD::LOCK_thd_data

"we take victim->mysql_thd->LOCK_thd_data", correct?

Yes
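A minimal sketch of the locking sequence in the quoted hunk, assuming a simplified model where each level (lock_sys, the trx mutex, THD::LOCK_thd_data) is a plain std::mutex and a global map stands in for the active-transaction lookup. Thd, Trx, active_trx and bf_abort are hypothetical names, and the killed flag is only a stand-in for the THD state check. It shows why LOCK_thd_data is dropped before lock_sys is taken (to respect the order lock_sys -> trx -> THD::LOCK_thd_data) and why the victim has to be looked up again by id afterwards: during the gap it may already have finished.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical model of the lock order lock_sys -> trx -> THD::LOCK_thd_data.
struct Thd {
  std::mutex LOCK_thd_data;  // level 3
  bool killed = false;       // stand-in for the THD state being checked
};

struct Trx {
  std::uint64_t id = 0;
  std::mutex mutex;          // level 2
  Thd* thd = nullptr;        // corresponds to victim->mysql_thd
};

std::mutex lock_sys;                       // level 1
std::map<std::uint64_t, Trx*> active_trx;  // protected by lock_sys

// Entered with victim_thd->LOCK_thd_data held; returns with it released.
bool bf_abort(Thd* victim_thd, std::uint64_t victim_trx_id) {
  // Taking lock_sys while holding the THD mutex would invert the order,
  // so release it first. From here on the trx may go away (the "GAP").
  victim_thd->LOCK_thd_data.unlock();
  std::lock_guard<std::mutex> l1(lock_sys);
  auto it = active_trx.find(victim_trx_id);
  if (it == active_trx.end()) return false;  // victim finished in the gap
  Trx* victim = it->second;
  std::lock_guard<std::mutex> l2(victim->mutex);
  std::lock_guard<std::mutex> l3(victim->thd->LOCK_thd_data);
  victim->thd->killed = true;
  return true;
}
```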
 

What I mean is: what if KILL ignored the WSREP_TO_ISOLATION_BEGIN
failure and just proceeded with the kill? Perhaps if
WSREP_TO_ISOLATION_BEGIN fails, it means that there can be no BF aborts
anyway? Could you try to find out?

A user KILL can only be issued after the node has moved to the READY state, so at startup it cannot
be used before the cluster is ready to serve. We could just ignore the TOI error here, but what would be the point?
If TOI fails, the cluster has bigger problems. TOI can fail only on this node, as all other nodes
in the cluster will ignore the KILL command (after parsing it).

R: Jan