Hi Kristian, On Tue, Jul 11, 2023 at 1:35 AM Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Hi Marko,
I'd like to ask you to review the below patch for MDEV-31655.
https://jira.mariadb.org/browse/MDEV-31655 https://github.com/MariaDB/server/commit/cb06612a9da09a7981ada84768793f2ff3e...
I am sorry for the delay; last month I was mostly on vacation. I sent some minor comments that apply to the InnoDB code changes. I think that this needs to be reviewed by the replication team as well. Whoever reviews that should pay attention to the stability of the tests. We have a few replication tests that fail rather often.
This patch restores code in InnoDB that was originally part of the parallel replication implementation. It makes InnoDB choose the right victim when two transactions in parallel replication deadlock with each other. The code was erroneously removed as part of the VATS work in InnoDB. I saw that you later deprecated and eventually removed the VATS code.
As far as I can tell, innodb_lock_schedule_algorithm=VATS consistently reproduced debug assertion failures during stress testing. It was finally removed from 10.6. We might introduce something similar later, provided that it passes tests. In MySQL 8.0.20, https://dev.mysql.com/worklog/task/?id=13468 reimplemented this as CATS (Contention-Aware Transaction Scheduling).
I saw that the InnoDB deadlock code is substantially changed in 10.6, so I will need to rework the patch in that version. But I wanted to ask your opinion about this 10.4 version first.
I must trust your judgement here; I have no fundamental objection. For better testing of this, I think that we really need a 10.6 based branch that contains both this and MDEV-31482.
In the patch, note in particular the modification to has_higher_priority(), which I think has the effect of choosing which of two waiting transactions to grant a lock to. I don't remember adding this myself, so it might have been added as part of VATS work. But it sounds appropriate; if T1 and T2 are both waiting for a lock, there is no point in granting it to T2 over T1, as then T2 will just be immediately killed and rolled back by replication.
Because innodb_lock_schedule_algorithm=VATS is known to be broken even without any involvement of replication, I do not think that we have to care too much about it. Marko -- Marko Mäkelä, Lead Developer InnoDB MariaDB plc