On Wed, Dec 4, 2024 at 6:23 PM <kazuhiko@nexedi.com> wrote:
Issue 1: InnoDB history length keeps growing while running lots of DML and DQL queries in parallel.
InnoDB history length is stable around 5k on 10.4, but on 11.4 it keeps growing, reaching 100k by the end of the run. In our experience, increases in history list length are roughly correlated with poor query performance (queries having to "step over tombstones"?), and we do our best to avoid long-running transactions so that older snapshots can be freed ASAP. We are satisfied with the behaviour on 10.4, where the history list length is constant. What is the reason behind such increases on 11.4, and is there any configuration to avoid them?
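For reference, here is a minimal sketch of how the history list length can be sampled over time; it is illustrative rather than exactly how the numbers above were collected:

    -- Same figure as "History list length" in the TRANSACTIONS section
    -- of SHOW ENGINE INNODB STATUS
    SELECT NAME, `COUNT`
      FROM information_schema.INNODB_METRICS
     WHERE NAME = 'trx_rseg_history_len';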
Have you been able to measure the performance impact of this? 100K isn't usually high enough to cause a measurable performance degradation; that usually starts to happen north of 1M history length.
For the performance impact, I compared pt-query-digest of slow queries between:
* 10.4 with READ COMMITTED and REPEATABLE READ
* 11.4 with READ COMMITTED and REPEATABLE READ, with a 2GB log file size, where we had no stalls
On 10.4, all slow queries are DML. On 11.4, we have fewer DML slow queries, but we also have some DQL slow queries. The fact that DQL slow queries appear only on 11.4 is, I guess, a sign of the performance impact of the much longer history list (and the fewer DML slow queries on 11.4 should come from various improvements in recent MariaDB).
And this reference workload simulates only a small part of what we do in production, so if the history length keeps increasing like this, it will reach 1M or more.
I just tried READ COMMITTED everywhere on 11.4 and had no such consistent increase in the history length (see the attachment). But it is not an acceptable solution, and anyway we had no such issue on 10.4 with the READ COMMITTED and REPEATABLE READ combination.
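For completeness, a sketch of what that test amounts to; this is an assumption about how one would switch everything to READ COMMITTED, not necessarily the exact mechanism used here:

    [mariadb]
    # Assumed sketch: make READ COMMITTED the server-wide default isolation level
    transaction-isolation = READ-COMMITTED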
I suspect you may be out of luck on that point, then. You could try bumping innodb_purge_threads, but that will only help flush the history out faster once it is freed; if the row versions are still held by open transactions, there's not much it can do. FWIW, I am actively considering taking on post-EOL maintenance of 10.4.x and backporting applicable 10.5.x patches to it, the reason being performance anomalies similar to the ones you have bumped into.
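If you do want to try that, a minimal my.cnf sketch would look like the following; the value is purely illustrative and the right number depends on your workload and core count:

    [mariadb]
    # Illustrative only: more purge threads can drain freed undo records faster,
    # but they cannot purge row versions still needed by open read views.
    innodb_purge_threads = 8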
Issue 2: MariaDB stalls when free redo log is too small. ...

This looks like the furious flushing regression that was introduced in 10.5 and doesn't appear to have been fixed since. It is possible to tune it out to a large extent, though: https://shatteredsilicon.net/mariadb-furious-flushing-fix-checkpoint-age/
Your article looks quite interesting, and the symptom we had seems identical. I tried the same configuration as you wrote there:
innodb_adaptive_flushing = 0
innodb_adaptive_flushing_lwm = 50
innodb_max_dirty_pages_pct_lwm = 50
but the behaviour was still the same, i.e. MariaDB stalled when the free redo log was low.
I can admit that a 128MB log file is too small for a 16GB buffer pool, even though we had no issue on MariaDB 10.4; but MariaDB 11.4 still stalls with a 1GB log file as well.
Contrary to the widely held belief that the redo log size should somehow be proportional to the buffer pool size, the two have no common basis. Your redo log needs to be large enough to:
1) absorb short bursts of writes
2) avoid rolling over during your mariabackup process

The thing to look for is a sawtooth pattern in your checkpoint age (you'll need monitoring tools that show that, like PMM or a fork of it I'm working on called SSM). The settings I mentioned cure that for me, and with that cured, you won't get stalls because the log won't fill up.

If your log is filling up anyway because the burst of writes is too large (the checkpoint age graph goes flat at the top rather than showing a sawtooth pattern), you will probably need to increase some combination of:

innodb_io_capacity_max: I normally set this to between 2/3 and 3/4 of what your disks are capable of delivering with innodb_page_size random writes.
innodb_io_capacity: Half of innodb_io_capacity_max is usually a good first-pass estimate.
innodb_write_io_threads: Bump this if your IOPS are struggling to reach innodb_io_capacity_max under load and this is keeping checkpoint age high.

But if that doesn't get your checkpoint age back under control, you may just have to increase the redo log size and see if that makes the problem go away.
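To make that concrete, here is a hedged sketch: a status variable you can poll to see whether checkpoint age follows a sawtooth or goes flat at the top, and a my.cnf fragment with placeholder values only; the real numbers have to come from your own disk benchmarks and checkpoint age graphs:

    -- Poll periodically and plot the result over time
    SHOW GLOBAL STATUS LIKE 'Innodb_checkpoint_age';

    [mariadb]
    # Placeholder values, not recommendations
    innodb_io_capacity_max  = 6000   # ~2/3-3/4 of measured random-write IOPS (example figure)
    innodb_io_capacity      = 3000   # roughly half of innodb_io_capacity_max
    innodb_write_io_threads = 8      # only if IOPS can't reach innodb_io_capacity_max under load
    innodb_log_file_size    = 4G     # last resort if checkpoint age still can't be kept down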