Thanks for the review, Kristian!

I always appreciate learning the history of why some things exist as they are :) 

And thanks for the reference to the comment explaining why the master thread_id
is logged. Reading the whole comment really solidified it. I'll incorporate your
feedback.

Brandon

On Mon, Jan 29, 2024 at 2:56 PM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Oh, and another thing (which Monty pointed out in the MDEV-7850):

When GTID is binlogged on the slave (--log-slave-updates), the thread id
should be preserved from the original event (the original master's
thread_id). See this comment in log_event_server.cc:

    "To avoid this, we log the thread's thread id EXCEPT for the SQL
    slave thread for which we log the original (master's) thread id."

So this needs to be changed, so that the thread id gets set from the user query
on the original master and then propagates through the levels of the
replication topology.

This way, the value will be consistent with the Query_event (historically,
there used to be a BEGIN query event in place of the GTID event, which
provided the thread id for row events; MDEV-7850 is the desire to have that
back).

 - Kristian.

Kristian Nielsen via developers <developers@lists.mariadb.org> writes:

>> commit c37b2087b4abe576f1b0391c8d379dba6299dcb5 (origin/bb-11.4-MDEV-7850)
>> Author: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
>> Date:   Mon Jul 10 18:53:19 2023 +0300
>>
>>     MDEV-7850: Extend GTID Binlog Events with Thread Id
>>     
>>     This patch augments Gtid_log_event with the user thread-id.
>>     In particular that compensates for the loss of this info in
>>     Rows_log_events.
>
> Here's my review of this patch (which was already pushed, but there are some
> misunderstandings that should be fixed; Monty asked me about it as he had
> been told that I had reviewed the patch, but that is not the case (until
> now)):
>
>> diff --git a/sql/log_event.cc b/sql/log_event.cc
>> index 41ddc038594..f167284ad1d 100644
>> --- a/sql/log_event.cc
>> +++ b/sql/log_event.cc
>
>> @@ -1901,11 +1904,12 @@ Query_log_event::begin_event(String *packet, ulong ev_offset,
>>    /*
>>      Currently we only need to replace GTID event.
>>      The length of GTID differs depending on whether it contains commit id.
>> +    And/or thread id.
>>    */
>> -  DBUG_ASSERT(data_len == LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN ||
>> -              data_len == LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2);
>> -  if (data_len != LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN &&
>> -      data_len != LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2)
>> +  DBUG_ASSERT(data_len >= LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN &&
>> +              data_len <= LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2 + 9);
>> +  if (data_len < LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN ||
>> +      data_len > LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2 + 9)
>
> This change of the asserts/checks doesn't make any sense.
> They were there originally, because we are hand-crafting a BEGIN query
> event, and there were only two cases of the length, so two fixed-sized
> events were crafted by two different code paths. And the asserts are there
> to safe-guard if the length would change.
>
> But now this check was already wrong, as more optional info has been added
> to the GTID event without updating the code for
> Query_log_event::begin_event() (XID, flags_extra, extra_engines).
>
> But the new code in this patch is generic and can handle any length GTID
> event, so these checks no longer make any sense, and should just be removed
> (rather than attempt to correct them).
>
>> @@ -1926,9 +1930,19 @@ Query_log_event::begin_event(String *packet, ulong ev_offset,
>>    }
>>    else
>>    {
>> -    DBUG_ASSERT(data_len == LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2);
>> -    /* Put in an empty time_zone_str to take up the extra 2 bytes. */
>> -    int2store(q + Q_STATUS_VARS_LEN_OFFSET, 2);
>
>> +    DBUG_ASSERT(data_len <= LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 11);
>
> This assert should not be there. The length can be larger than 11, and the
> code below works fine for arbitrary length.
>
>> +    DBUG_ASSERT(data_len >= LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2);
>> +
>> +    /* Put in an empty time_zone_str to take up the extra 2 plus the number of
>> +       dummy_bytes. */
>> +    size_t dummy_bytes=
>> +        data_len - (LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2);
>> +
>> +    int2store(q + Q_STATUS_VARS_LEN_OFFSET, dummy_bytes + 2);
>> +    for (size_t i= 0; i < dummy_bytes; i++)
>> +      q[Q_DATA_OFFSET + i]= Q_DUMMY;
>> +    q+= dummy_bytes;
>> +
>>      q[Q_DATA_OFFSET]= Q_TIME_ZONE_CODE;
>>      q[Q_DATA_OFFSET+1]= 0;           /* Zero length for empty time_zone_str */
>>      q[Q_DATA_OFFSET+2]= 0;                  /* Zero terminator for empty db */
>
> This new code with Q_DUMMY is generic and independent of the length of the
> GTID event.
>
> That's good!
>
> But it needs a comment explaining (that the Q_DUMMY is unknown to the old
> slave, which will make the old slave just skip the rest of the user vars,
> which are just there to fill up the extra bytes).
>
> And now the code that adds the dummy empty time_zone_str is redundant, and
> should just be removed, as should the code for the case of data_len ==
> LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN. Since now any extra bytes in the
> GTID can just be replaced by Q_DUMMY.
>
> This way, the code will be fixed to work for all the possible GTID lengths,
> and be shorter and simpler (and a lot less confusing).
>
>
>> @@ -1584,6 +1584,9 @@ Query_log_event::Query_log_event(const uchar *buf, uint event_len,
>>          sa_seq_no = uint8korr(pos);
>>          pos+= 8;
>>        }
>> +    }
>> +    case Q_DUMMY:
>> +    {
>>        break;
>>      }
>>      default:
>>        /* That's why you must write status vars in growing order of code */
>>        DBUG_PRINT("info",("Query_log_event has unknown status vars (first has code: %u), skipping the rest of them", (uint) *(pos-1)));
>>        pos= (const uchar*) end;                         // Break loop
>
> There's no point in having this code. Q_DUMMY is supposed to be higher than
> any known status var type, so it will be ignored anyway in the `default` case.
> And besides, server that knows about Q_DUMMY can never receive it in the
> first place.
>
> (But if this is really wanted, then don't delete the `break` statement in
> the prior switch case, that's likely to introduce a bug the next time a case
> is added. And assign Q_DUMMY the next free value, not the value 255).
>
>
>> diff --git a/sql/log_event.h b/sql/log_event.h
>> index 473506fafed..d324642ad5e 100644
>> --- a/sql/log_event.h
>> +++ b/sql/log_event.h
>> @@ -324,6 +324,7 @@ class String;
>> 
>>  #define Q_HRNOW 128
>>  #define Q_XID   129
>> +#define Q_DUMMY 255
>
> This needs a good comment, explaining how Q_DUMMY is a code that will be
> unknown to any server version, and can be used to make the slave skip this
> (and any following) user vars.
>
>
> diff --git a/mysql-test/suite/rpl/t/rpl_gtid_thread_id.test
> b/mysql-test/suite/rpl/t/rpl_gtid_thread_id.test
> new file mode 100644
> index 00000000000..b8ed1865da9
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_gtid_thread_id.test
> @@ -0,0 +1,125 @@
> +#
> +#   Verify that GTID log events are written into the binary log along with the
> +# id of the thread which executed the transaction. On the primary, this is the
> <snip>
>
> Disclaimer: I did not review this testcase.
>
>  - Kristian.
> _______________________________________________
> developers mailing list -- developers@lists.mariadb.org
> To unsubscribe send an email to developers-leave@lists.mariadb.org