Re: 9c52eede2c8: MDEV-31646 Online alter applies binlog cache limit to cache writes
Hi, Nikita, On Aug 14, Nikita Malyavin wrote:
revision-id: 9c52eede2c8 (mariadb-11.0.1-175-g9c52eede2c8) parent(s): 209c894b431 author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2023-08-11 16:29:37 +0400 message:
MDEV-31646 Online alter applies binlog cache limit to cache writes
1. Make online disk writes unlimited, same as filesort does. 2. Make proper error handling -- in 32-bit build IO_CACHE capacity limit is 4GB, so it is quite possible to overfill there. 3. Event_log::write_cache complicated with event reparsing, and as it was proven by QA, contains some mistakes. Rewrite introbuce a simpler and much faster version, not featuring reparsing and therefore copying a whole buffer at once. This also disables checksums and crypto.
As a result, online alter is untied of several binlog variables, which was a second aim of this patch.
diff --git a/mysql-test/main/alter_table_online_debug.result b/mysql-test/main/alter_table_online_debug.result index 021267a15f5..3a615372ed5 100644 --- a/mysql-test/main/alter_table_online_debug.result +++ b/mysql-test/main/alter_table_online_debug.result @@ -409,23 +409,23 @@ set debug_sync= 'now SIGNAL start_replication'; set debug_sync= 'now WAIT_FOR applied'; select stage, progress from INFORMATION_SCHEMA.PROCESSLIST where id = @con; stage progress -3 51.220 +3 53.390
why is that?
set debug_sync= 'now SIGNAL proceed WAIT_FOR applied'; select stage, progress from INFORMATION_SCHEMA.PROCESSLIST where id = @con; stage progress diff --git a/sql/log.cc b/sql/log.cc index d9e81700463..8c57487d57e 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -2286,13 +2286,21 @@ int binlog_log_row_online_alter(TABLE* table, const uchar *before_record, MY_BITMAP *old_rpl_write_set= table->rpl_write_set; table->rpl_write_set= &table->s->all_set;
+ table->online_alter_cache->store_prev_position(); int error= (*log_func)(thd, table, table->s->online_alter_binlog, - table->online_alter_cache, true, + table->online_alter_cache, + table->file->has_transactions_and_rollback(), before_record, after_record);
table->rpl_write_set= old_rpl_write_set;
- return unlikely(error) ? HA_ERR_RBR_LOGGING_FAILED : 0; + if (unlikely(error)) + { + table->online_alter_cache->restore_prev_position();
what is this for?
+ return HA_ERR_RBR_LOGGING_FAILED; + } + + return 0; }
static void @@ -7748,7 +7763,7 @@ static int binlog_online_alter_end_trans(THD *thd, bool all, bool commit) { DBUG_ASSERT(cache.cache_log.type != READ_CACHE); mysql_mutex_lock(binlog->get_log_lock()); - error= binlog->write_cache(thd, &cache.cache_log); + error= binlog->write_cache_raw(thd, &cache.cache_log); mysql_mutex_unlock(binlog->get_log_lock()); } }
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi! On Tue, 15 Aug 2023 at 00:00, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nikita,
On Aug 14, Nikita Malyavin wrote:
revision-id: 9c52eede2c8 (mariadb-11.0.1-175-g9c52eede2c8) parent(s): 209c894b431 author: Nikita Malyavin committer: Nikita Malyavin timestamp: 2023-08-11 16:29:37 +0400 message:
MDEV-31646 Online alter applies binlog cache limit to cache writes
1. Make online disk writes unlimited, same as filesort does. 2. Make proper error handling -- in 32-bit build IO_CACHE capacity limit is 4GB, so it is quite possible to overfill there. 3. Event_log::write_cache complicated with event reparsing, and as it was proven by QA, contains some mistakes. Rewrite introbuce a simpler and much faster version, not featuring reparsing and therefore copying a whole buffer at once. This also disables checksums and crypto.
As a result, online alter is untied of several binlog variables, which was a second aim of this patch.
diff --git a/mysql-test/main/alter_table_online_debug.result b/mysql-test/main/alter_table_online_debug.result index 021267a15f5..3a615372ed5 100644 --- a/mysql-test/main/alter_table_online_debug.result +++ b/mysql-test/main/alter_table_online_debug.result @@ -409,23 +409,23 @@ set debug_sync= 'now SIGNAL start_replication'; set debug_sync= 'now WAIT_FOR applied'; select stage, progress from INFORMATION_SCHEMA.PROCESSLIST where id = @con; stage progress -3 51.220 +3 53.390
why is that?
I think it's because it doesn't write the checksum anymore. binlog->write_cache had many weirdnesses.
set debug_sync= 'now SIGNAL proceed WAIT_FOR applied'; select stage, progress from INFORMATION_SCHEMA.PROCESSLIST where id = @con; stage progress diff --git a/sql/log.cc b/sql/log.cc index d9e81700463..8c57487d57e 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -2286,13 +2286,21 @@ int binlog_log_row_online_alter(TABLE* table, const uchar *before_record, MY_BITMAP *old_rpl_write_set= table->rpl_write_set; table->rpl_write_set= &table->s->all_set;
+ table->online_alter_cache->store_prev_position(); int error= (*log_func)(thd, table, table->s->online_alter_binlog, - table->online_alter_cache, true, + table->online_alter_cache, + table->file->has_transactions_and_rollback(), before_record, after_record);
table->rpl_write_set= old_rpl_write_set;
- return unlikely(error) ? HA_ERR_RBR_LOGGING_FAILED : 0; + if (unlikely(error)) + { + table->online_alter_cache->restore_prev_position();
what is this for?
Sorry, the additional comment was lost during squashing 1fa0e86f3cc: * Handle read_log_event errors correctly: error returned is -1 (eof signal for alter table), and my_error is not called. Call my_error and always return 1. There's no test for this, since it shouldn't happen, see the next bullet. * An event could be written partially in case of error, if it's bigger than the IO_CACHE buffer. Restore the position where it was before the error was emitted. So, if the event is too big, it could be written partially. As you may see by the changes in sql_table.cc, i had this caught from the both ends while testing:)
+ return HA_ERR_RBR_LOGGING_FAILED; + } + + return 0; }
static void @@ -7748,7 +7763,7 @@ static int binlog_online_alter_end_trans(THD *thd, bool all, bool commit) { DBUG_ASSERT(cache.cache_log.type != READ_CACHE); mysql_mutex_lock(binlog->get_log_lock()); - error= binlog->write_cache(thd, &cache.cache_log); + error= binlog->write_cache_raw(thd, &cache.cache_log); mysql_mutex_unlock(binlog->get_log_lock()); } }
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
-- Yours truly, Nikita Malyavin
participants (2)
-
Nikita Malyavin
-
Sergei Golubchik