On Tue, 15 Aug 2023 at 00:00, Sergei Golubchik <serg@mariadb.org> wrote:

Hi, Nikita,

On Aug 14, Nikita Malyavin wrote:
> revision-id: 9c52eede2c8 (mariadb-11.0.1-175-g9c52eede2c8)
> parent(s): 209c894b431
> author: Nikita Malyavin
> committer: Nikita Malyavin
> timestamp: 2023-08-11 16:29:37 +0400
> message:
>
> MDEV-31646 Online alter applies binlog cache limit to cache writes
>
> 1. Make online disk writes unlimited, same as filesort does.
> 2. Make proper error handling -- in 32-bit build IO_CACHE capacity limit is
> 4GB, so it is quite possible to overfill there.
> 3. Event_log::write_cache complicated with event reparsing, and as it was
> proven by QA, contains some mistakes. Rewrite introbuce a simpler and much
> faster version, not featuring reparsing and therefore copying a whole
> buffer at once. This also disables checksums and crypto.
>
> As a result, online alter is untied of several binlog variables, which was
> a second aim of this patch.
>
> diff --git a/mysql-test/main/alter_table_online_debug.result b/mysql-test/main/alter_table_online_debug.result
> index 021267a15f5..3a615372ed5 100644
> --- a/mysql-test/main/alter_table_online_debug.result
> +++ b/mysql-test/main/alter_table_online_debug.result
> @@ -409,23 +409,23 @@ set debug_sync= 'now SIGNAL start_replication';
> set debug_sync= 'now WAIT_FOR applied';
> select stage, progress from INFORMATION_SCHEMA.PROCESSLIST where id = @con;
> stage progress
> -3 51.220
> +3 53.390

why is that?

I think it's because it doesn't write the checksum anymore. binlog->write_cache had many weirdnesses.

> set debug_sync= 'now SIGNAL proceed WAIT_FOR applied';
> select stage, progress from INFORMATION_SCHEMA.PROCESSLIST where id = @con;
> stage progress
> diff --git a/sql/log.cc b/sql/log.cc
> index d9e81700463..8c57487d57e 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -2286,13 +2286,21 @@ int binlog_log_row_online_alter(TABLE* table, const uchar *before_record,
> MY_BITMAP *old_rpl_write_set= table->rpl_write_set;
> table->rpl_write_set= &table->s->all_set;
>
> + table->online_alter_cache->store_prev_position();
> int error= (*log_func)(thd, table, table->s->online_alter_binlog,
> - table->online_alter_cache, true,
> + table->online_alter_cache,
> + table->file->has_transactions_and_rollback(),
> before_record, after_record);
>
> table->rpl_write_set= old_rpl_write_set;
>
> - return unlikely(error) ? HA_ERR_RBR_LOGGING_FAILED : 0;
> + if (unlikely(error))
> + {
> + table->online_alter_cache->restore_prev_position();

what is this for?

Sorry, the additional comment was lost during squashing 1fa0e86f3cc:

* Handle read_log_event errors correctly: error returned is -1 (eof signal

for alter table), and my_error is not called. Call my_error and always
return 1. There's no test for this, since it shouldn't happen, see the
next bullet.
* An event could be written partially in case of error, if it's bigger than
the IO_CACHE buffer. Restore the position where it was before the error
was emitted.

So, if the event is too big, it could be written partially.

As you may see by the changes in sql_table.cc, i had this caught from the both ends while testing:)

> + return HA_ERR_RBR_LOGGING_FAILED;
> + }
> +
> + return 0;
> }
>
> static void
> @@ -7748,7 +7763,7 @@ static int binlog_online_alter_end_trans(THD *thd, bool all, bool commit)
> {
> DBUG_ASSERT(cache.cache_log.type != READ_CACHE);
> mysql_mutex_lock(binlog->get_log_lock());
> - error= binlog->write_cache(thd, &cache.cache_log);
> + error= binlog->write_cache_raw(thd, &cache.cache_log);
> mysql_mutex_unlock(binlog->get_log_lock());
> }
> }

Regards,
Sergei
VP of MariaDB Server Engineering
and security@mariadb.org