Hello Sergei!

As a maintenance sprint is over, I will return to this refactoring later, but I'll comment on everything while my memories are fresh.

On Wed, 1 May 2024 at 22:28, Sergei Golubchik <serg@mariadb.org> wrote:
> diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
> index 38e8b062f18..b5810eeaf7b 100644
> --- a/sql/log_event_server.cc
> +++ b/sql/log_event_server.cc
> @@ -5705,10 +5708,10 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi)
>                               &m_cols_ai : &m_cols);
>      bitmap_intersect(table->write_set, after_image);

> -    this->slave_exec_mode= slave_exec_mode_options; // fix the mode
> -
> +    COPY_INFO copy_info;
> +    Write_record write_record;
>      // Do event specific preparations
> -    error= do_before_row_operations(rli);
> +    error= do_before_row_operations(rgi, &copy_info, &write_record);

You create write_record on the stack and save it inside this->m_write_record
1. why not to have it permanently a part of Rows_log_event ?

Write_record takes around 64 bytes. I don't like the fact that all the data (table, thd, ...) will
be duplicated in Write_rows_event twice. Also I wanted to save the memory heap a little bit:
there can be many events simultaneously stored in memory scheduled for execution, and their number should have grown with parallel replication enabled (i believe).

However the stack is big enough for row events, so I thought about the idea to provide a stack buffer to row events to store arbitrary data. As there's no other data for the rest of the events at this moment, there's no overengineering, I just pass what I need, i.e. Write_record allocated on stack.

If this doesn't look nice to you, one compromise can be to specialize a Write_rows_event class template for IDEMPOTENT case, then at least for default (STRICT) slave_exec_mode.
 
2. if you'll keep it on the stack, please, reset m_write_record=0
before returning from ::do_apply_event(). And may be even do
m_write_record= &write_record here, not in ::do_before_row_operations()
it'll be a bit easier to see how a local variable is stored and then freed
and that it doesn't actually leave the scope.

Good point.
 
> +    copy_info->table_list= m_table->pos_in_table_list;
> +
> +    int (*callback)(void *, void*)= NULL;
> +    if (!get_flags(COMPLETE_ROWS_F))
> +    {
> +      /*
> +        If row is incomplete we will use the record found to fill
> +        missing columns.
> +      */
> +      callback= [](void *e, void* r)->int {
> +        auto rgi= static_cast<rpl_group_info*>(r);
> +        auto event= static_cast<Write_rows_log_event*>(e);
> +        return event->incomplete_record_callback(rgi);
> +      };
> +    }
> +    new (write_record) Write_record(thd, m_table, copy_info,
> +                                    m_vers_from_plain &&
> +                                    m_table->versioned(VERS_TIMESTAMP),
> +                                    m_table->triggers && do_invoke_trigger(),
> +                                    NULL, callback, this, rgi);

please, use write_record->init(...) here. And in all similar places below,

Sure, I can hide it inside init(). In  general I like constructors more that init functions: I think they are easier for data flow analysis tools (starting from simple -Wuninitialized). A good data flow analyzer should handle that well, though. 

 

> +int prepare_for_replace(TABLE *table, enum_duplicates handle_duplicates,
> +                        bool ignore)
> +{
> +  bool create_lookup_handler= handle_duplicates != DUP_ERROR;

why do you initialize create_lookup_handler here
and then immediately repeat it two lines below? :)

less confusing would be

  bool create_lookup_handler= handle_duplicates != DUP_ERROR || ignore;
  if (create_lookup_handler)
  {

yeah, thanks
 
> +  if (ignore || handle_duplicates != DUP_ERROR)
> +  {
> +    create_lookup_handler= true;
> +    table->file->extra(HA_EXTRA_IGNORE_DUP_KEY);
> +    if (table->file->ha_table_flags() & HA_DUPLICATE_POS ||
> +        table->s->long_unique_table)
> +    {
> +      if (table->file->ha_rnd_init_with_error(false))
> +        return 1;
> +    }
> +  }
> +  return table->file->prepare_for_insert(create_lookup_handler);
> +}

~~~ 
> +int Write_record::locate_dup_record()
> +{
> +  handler *h= table->file;
> +  int error= 0;
> +  if (h->ha_table_flags() & HA_DUPLICATE_POS || h->lookup_errkey != (uint)-1)
> +  {
> +    DBUG_PRINT("info", ("Locating offending record using rnd_pos()"));
> +
> +    error= h->ha_rnd_pos(table->record[1], h->dup_ref);
> +    if (unlikely(error))
> +    {
> +      DBUG_PRINT("info", ("rnd_pos() returns error %d",error));
> +      h->print_error(error, MYF(0));
> +    }
> +  }
> +  else
> +  {
> +    DBUG_PRINT("info",
> +               ("Locating offending record using ha_index_read_idx_map"));

> +    if (h->lookup_handler)
> +      h= h->lookup_handler;

Why?

Yes, as we could see in the last bugfix version, this is not applicable (anymore). I tried to use lookup_handler here in a few ways, so that we could initialize its RND in prepare_for_replace.
None worked though. To call ha_update_row, handler should be positioned to the record.

One thing I didn't try:
In Write_record::replace_row() set:
if (h->lookup_handler)
  h= h->lookup_handler;
...
then call
locate_dup_record(h);
...
and then, in Write_record::replace_row again:
h->ha_update_row

Also, TABLE::delete_row will have to accept handler then.

why -- because handler::check_duplicate_long_entry_key calls
  lookup_handler->position(table->record[0]);

I should try it in the next sprint.


> -  DBUG_EXECUTE_IF("rpl_write_record_small_sleep_gtid_100_200",
> +    key_copy(key, table->record[0], table->key_info + key_nr, 0);
> +
> +    error= h->ha_index_read_idx_map(table->record[1], key_nr, key,
> +                                    HA_WHOLE_KEY, HA_READ_KEY_EXACT);

why did you revert commit 5e345281e3599c79 here?
 
 Definitely by a mistake, but why did the rpl test I added pass?
Will take a look at that.

> +
> +    if (incomplete_records_cb && (error= incomplete_records_cb(arg1, arg2)))

This callback is strange. I'd rather inherit Write_record_rpl
from Write_record and put rpl-specific logic there.

I thought about it, I didn't want to make the virtual functions for a simple enough structure that could stay pain.

I also tried to templatize replace_record, my approach didn't work well and looked more complicated.
 

Also you could avoid a switch on INSERT/REPLACE/ODKU if you'd use
inheritance instead. But the switch is less strange than the callback :)

In comparison to rpl case, here you'd have to make a factory:

static Write_record::make_write_record(...)
{
  if (info->duplicate == DUP_REPLACE)
    return new Replace_record(....)
  ....
}

this would also complicate my intention to store the object on stack:)

Also I try to leave the code comfortable for read for our old schoolers, at least in places where
they are used to be familiar with. It doesn't work well with refactorings, but at least a callback 
is more familiar, then a factory:)


By the way, I have a fresh thought about how to abstract the code from rpl with templates:



1. Create a trait declared outside Write_record, with a callback:
template <bool rpl>
class Write_record_trait
{
bool incomplete_records_cb(){} // todo: name it pretty, maybe just operator()?
};
1. Make Write_record a template as well and embody the trait instance:
template <bool rpl=false>
struct Write_record
{
  Write_record_trait<rpl> rpl_trait; // does nothing for default case
3. Specialize a trait for rpl:
template<> class Write_record_trait<true>
{
// rpl arguments and callback goes here.
}

4. in Write_record::replace_row():
rpl_trait.incomplete_records_cb(); // or whatever we call it.

Looks more complex in general, though. And add a trait. But removes a callback and void* arguments.


> +    if (unlikely(error) && error != HA_ERR_RECORD_IS_THE_SAME)
> +    {
> +      if (info->ignore && !table->file->is_fatal_error(error, HA_CHECK_ALL))

why not is_fatal_error(error) ?

I've been messing with right making is_fatal_error arguments correct in order to pass the tests.
I'll better return to it once again later to bring a better answer.

> +bool Write_record::is_fatal_error(int error)
> +{
> +  return error == HA_ERR_LOCK_DEADLOCK ||
> +         error == HA_ERR_LOCK_WAIT_TIMEOUT ||

These errors were only in rpl code, not anywhere in sql_insert.cc.
I don't think it's wrong to check for them everywhere consistently,
but does it change the behavior? Can you create a test case for INSERT
before your changes that comes to thd->is_fatal_error() with
HA_ERR_LOCK_DEADLOCK or HA_ERR_LOCK_WAIT_TIMEOUT? What will be the
behavior before and after your change?

I added these checks here to generalize the code, only.

The trait approach may help to remove it from the default case, though.

I guess there might be no way to get an error in non-rpl case. This common sense may be  wrong easily, though.
 
> +
> +  if (unlikely(key_nr == std::numeric_limits<decltype(key_nr)>::max()))

Well, the handler returns (uint)-1, please compare with it, instead of
inventing numerous other ways of archieving the same.

Oh, how do I not like this (uint)-1! It brings many conversion problems, especially when 
someone (like me 😇) decides to store the key in differently-sized variable.

And why did you decide to use ushort for the key number?
We pretty much never use ushort, key number is uint everywhere.
I agree that ushort is enough size-wise, but let's rather be consistent.

I wanted to fit all the operational variables in a single x86 cache line.
 

> +inline
> +void Write_record::notify_non_trans_table_modified()
> +{
>    if (!table->file->has_transactions_and_rollback())
>      thd->transaction->stmt.modified_non_trans_table= TRUE;

ok, as you like.
I'm personally not a great fan of one-liners that are used only once.

Dunno, looks quite organic for me. 

> --- a/sql/sql_parse.cc
> +++ b/sql/sql_parse.cc
> @@ -4814,7 +4815,8 @@ mysql_execute_command(THD *thd)
>                                      &lex->value_list,
>                                      lex->duplicates,
>                                      lex->ignore,
> -                                    result)))
> +                                    result,
> +                                    &write)))

Why do you need Write_record here on the stack?
Wouldn't it be more logical to have it in select_insert ?
 
The problem is that select_insert is declared in sql_class, which cannot include sql_insert (because it's vice-versa). So I could not embody it in select_insert.

As everywhere before, I wanted to avoid manual memory control, though I think saving an extra memory allocation here may be not so important (but I think it is so for mysql_insert).

--
Yours truly,
Nikita Malyavin