Hi, Kristian, On Oct 20, Kristian Nielsen wrote:
Hi Serg,
I came upon this commit by you:
commit 6b685ea7b0776430d45b095cb4be3ef0739a3c04 Author: Sergei Golubchik <serg@mariadb.org> Date: Wed Sep 28 18:55:15 2022 +0200
correctness assert
thd_get_ha_data() can be used without a lock, but only from the current thd thread, when calling from anoher thread it *must* be protected by thd->LOCK_thd_data
* fix group commit code to take thd->LOCK_thd_data * remove innobase_close_connection() from the innodb background thread, it's not needed after 87775402cd0c and was failing the assert with current_thd==0
@@ -8512,7 +8512,11 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader) ++num_commits; if (current->cache_mngr->using_xa && likely(!current->error) && DBUG_EVALUATE_IF("skip_commit_ordered", 0, 1)) + { + mysql_mutex_lock(¤t->thd->LOCK_thd_data); run_commit_ordered(current->thd, current->all); + mysql_mutex_unlock(¤t->thd->LOCK_thd_data); + }
This seems _really_ expensive :-( This code runs for every transaction commit in replication, and it runs under a global LOCK_commit_ordered which needs to be held for as short as possible.
Monty asserted many times that taking an uncontended mutex is cheap. Did you think it's expensive or did you benchmark that?
So what is the actual reason for this change?
This is just a protocol for accessing mostly-read-only thread-local variable. Either we can protect all accesses with a mutex. Which will heavily penalize the thread owner of the variable. This commit enforced a different protocol: the owner can read the value without a mutex, and only take the mutex when modifying it (which is infequent). A different thread can only read the variable and only with a mutex (this is also very infequent). I don't remember what code was violating it, likely something wsrep-specific.
Note that at this point, the group commit leader thread _is_ effectively acting as if it was each of the participating threads, the other threads are blocked from running during commit_ordered.
I'm not sure it's good enough. Practically it might work, but if one thread has modified some variable under a mutex, you cannot simply read it from another thread without a mutex, there's no guarantee that another thread will see the new value. You'll need an appropriate memory barrier for that.
If there is some reason, then the correct fix could be to set current_thd for the duration of run_commit_ordered, which will satisfy the assertion and presumably the actual bug?
Setting current_thd is cheating, it will trick the assert, but won't fix the underlying problem as outlined above. Regards, Sergei Chief Architect, MariaDB Server and security@mariadb.org