Hi Sergei,

Answers to your questions below:

On Wed, Oct 6, 2021 at 5:03 PM Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Jan!

On Oct 06, Jan Lindström wrote:
> >
> > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void *victim_thd_ptr, my_bool signal)
> > > > > +{
> > > > > +  THD* victim_thd= (THD *) victim_thd_ptr;
> > > > > +  /* We need to lock THD::LOCK_thd_data to protect victim
> > > > > +  from concurrent usage or disconnect or delete. */
> > > >
> > > > How do you know victim_thd wasn't deleted before you locked
> > > > LOCK_thd_data below?
> > >
> > > I must say the thr_lock code is not familiar to me but there are
> > > mysql_mutex_lock() calls to lock->mutex. After code review  it is not
> > > clear to me what that mutex is.
> >
> > where are mysql_mutex_lock() calls to lock->mutex?
>
> mysys/thr_lock.c there is function thr_lock() there is call to
> mysql_mutex_lock(&lock->mutex); this is before wsrep_break_lock where we
> call wsrep_thd_abort

this is for table locks. `lock` is `data->lock` where `data` is THR_LOCK
structure somewhere in the table share. It locks tables and handlers,
not threads. And InnoDB isn't using it at all anyway.

I do not know how to change this one so that thd is protected, can I even do so as this code does not know
THD internals...
 

>
> > > > >          if (victim_trx) {
> > > > > +                wsrep_thd_UNLOCK(victim_thd);
> > > >
> > > > what keeps victim_trx from disappearing here?
> > >
> > > Nothing. Do you have suggestions ?
> >
> > A couple of thoughts:
> >
> > * Why do you unlock LOCK_thd_data here at all? I believed the whole
> > point of using TOI was to make sure that even if you lock mutexes in
> > a different order, it will not cause a deadlock.
>
> This is DDL-case when we have a MDL-conflict not user KILL, we need to
> release it in my opinion because we need to take mutexes in
> lock_sys -> trx -> THD::LOCK_thd_data order,
> if I do not release I can see easily safe_mutex warnings

I don't understand. First, lock_sys and trx mutexes are not covered by
safe_mutex, so it cannot possibly warn about them.

Second, the reason for taking mutexes always in the same order is to
avoid deadlocks. And yor TOI trick should archieve that.

 static
void
wsrep_abort_transaction(
/*====================*/
handlerton* hton,
THD *bf_thd,
THD *victim_thd,
my_bool signal)
{
  DBUG_ENTER("wsrep_abort_transaction");

  trx_t* victim_trx = thd_to_trx(victim_thd);
  trx_t* bf_trx = (bf_thd) ? thd_to_trx(bf_thd) : NULL;

  WSREP_DEBUG("abort transaction: BF: %s victim: %s victim conf: %d",
    wsrep_thd_query(bf_thd),
    wsrep_thd_query(victim_thd),
    wsrep_thd_conflict_state(victim_thd, FALSE));

  if (victim_trx) {
    const trx_id_t victim_trx_id= victim_trx->id;
    const longlong victim_thread= thd_get_thread_id(victim_thd);
    WSREP_DEBUG("wsrep_abort_transaction: Victim thread %ld "
       "transaction " TRX_ID_FMT " trx_state %d",
       thd_get_thread_id(victim_thd),
        victim_trx->id,
        victim_trx->state);
    /* This is necessary as correct mutexing order is
    lock_sys -> trx -> THD::LOCK_thd_data and below
    function assumes we have lock_sys and trx locked
    and takes THD::LOCK_thd_data for THD state check. */
    wsrep_thd_UNLOCK(victim_thd);
    // GAP where thd or trx is not protected
    DEBUG_SYNC(bf_thd, "wsrep_abort_victim_unlocked");
    DBUG_EXECUTE_IF("wsrep_abort_replicated_sleep",
        WSREP_DEBUG("wsrep_abort_transaction: sleeping "
           "for thread %ld ",
           thd_get_thread_id(victim_thd));
    lock_mutex_enter();
    if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {
      // As trx is now referenced it can't go away
      trx_mutex_enter(victim);
      ut_ad(victim->mysql_thd ?
             thd_get_thread_id(victim->mysql_thd) == victim_thread :1);
      // In below we take THD::LOCK_thd_data
      wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim, signal);
      trx_mutex_exit(victim);
      victim->release_reference();
      wsrep_srv_conc_cancel_wait(victim);
    }
    lock_mutex_exit();
    DBUG_VOID_RETURN;
  } else {
    WSREP_DEBUG("victim does not have transaction");
    wsrep_thd_LOCK(victim_thd);
    wsrep_thd_set_conflict_state(victim_thd, MUST_ABORT);
    wsrep_thd_awake(victim_thd, signal);
    wsrep_thd_UNLOCK(victim_thd);
  }
  DBUG_VOID_RETURN;
}

> > By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
> > Does it need to wait for everything else being replicated?
>
> As long as it takes to start it on all nodes in the cluster.

So, KILL basically won't work when a cluster is starting.
Can bf aborts happen during a cluster startup?

Not sure.

R: Jan