Hi Serg, I came upon this commit by you:
commit 6b685ea7b0776430d45b095cb4be3ef0739a3c04 Author: Sergei Golubchik <serg@mariadb.org> Date: Wed Sep 28 18:55:15 2022 +0200
correctness assert
thd_get_ha_data() can be used without a lock, but only from the current thd thread, when calling from anoher thread it *must* be protected by thd->LOCK_thd_data
* fix group commit code to take thd->LOCK_thd_data * remove innobase_close_connection() from the innodb background thread, it's not needed after 87775402cd0c and was failing the assert with current_thd==0
@@ -8512,7 +8512,11 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader) ++num_commits; if (current->cache_mngr->using_xa && likely(!current->error) && DBUG_EVALUATE_IF("skip_commit_ordered", 0, 1)) + { + mysql_mutex_lock(¤t->thd->LOCK_thd_data); run_commit_ordered(current->thd, current->all); + mysql_mutex_unlock(¤t->thd->LOCK_thd_data); + }
This seems _really_ expensive :-( This code runs for every transaction commit in replication, and it runs under a global LOCK_commit_ordered which needs to be held for as short as possible. The commit message doesn't mention anything about what goes wrong during the group commit code. And the patch doesn't have any test case that shows the problem, it just adds an assertion that will trigger during the group commit. So what is the actual reason for this change? Note that at this point, the group commit leader thread _is_ effectively acting as if it was each of the participating threads, the other threads are blocked from running during commit_ordered. If it is just so that we can have this assertion, then that needs to be rolled back. We _really_ don't want to take/release N mutexes in release builds during LOCK_commit_ordered, just to have an assertion in debug builds. If there is some reason, then the correct fix could be to set current_thd for the duration of run_commit_ordered, which will satisfy the assertion and presumably the actual bug? Well, it depends _what_ the actual reason/bug is. - Kristian.