Hi, On 2024-12-03 12:21, Gordan Bobic wrote:
To achieve a good performance as before, we now use READ COMMITTED transaction isolation for DML and use REPEATABLE READ isolation for DQL on MariaDB 11.4. This way guarantees the consistency for our application.
Have you actually verified that your application needs repeatable-read? It's possible but in my experience it is extremely rare for applications to re-read the rows they already read. Not to say it never happens, but I have seen application implementations that actually needs this maybe twice out of thousands of applications I've worked on over the last 25 years.
Our queries are dynamic and we also have queries which depends on the result of earlier queries in the same transaction. On MariaDB 10.4 with innodb-locks-unsafe-for-binlog, we use REPEATABLE READ everywhere. Having a few carefully-checked exceptions to use READ COMMITTED would be fine, but would require more work than we can afford.
Issue 1: InnoDB history length keeps growing while running lots of DML and DQL queries in parallel.
InnoDB history length is stable around 5k on 10.4, but it keeps growing on 11.4 up to 100k until the end. In our experience, increases in history list length are roughly correlated with poor query performance (causing queries to "step over tombstones" ?), and we do our best to avoid having long-running transactions in order to allow older snapshots to be freed ASAP. We are satisfied with the status on 10.4, where history list is constant. What is the reason behind such increases on 11.4 and is there any configuration to avoid such ?
Have you been able to measure the performance impact of this? 100K isn't usually high enough to start causing a measurable performance degradation, that usually starts to happen north of 1M history length.
For the performance impact, I compared pt-query-digest of slow queries between : * 10.4 with READ COMMITTED and REPEATABLE READ * 11.4 with READ COMMITTED and REPEATABLE READ, with 2GB log file size where we had no stalls On 10.4, all slow queries are DML. On 11.4, we have fewer DML slow queries but we also have some DQL slow queries. 'DQL slow queries on 11.4 only' is a sign of performance impact of much more history length, I guess. (and 'fewer DML slow queries on 11.4' should come from various improvements on recent MariaDB). And this reference workload simulates a small part of what we do in production. So if the history length keeps increasing like this, it will reach 1M or more. I just tried READ COMMITTED everywhere on 11.4 and I had no such consistent increase of the history length (see the attachment). But it is not an acceptable solution and we anyway had no such issue on 10.4 with READ COMMITTED and REPEATABLE READ combination.
Issue 2: MariaDB stalls when free redo log is too small. ... This looks like the furious flushing regression that was introduced in 10.5 and doesn't appear to have been fixed since. It is possible to tune it out to a large extent, though: https://shatteredsilicon.net/mariadb-furious-flushing-fix-checkpoint-age/
Your article looks quite interesting and the symptom we had seems identical. And I tried the same configuration as you wrote there, innodb_adaptive_flushing = 0 innodb_adaptive_flushing_lwm = 50 innodb_max_dirty_pages_pct_lwm = 50 but the behaviour was still same, i.e. MariaDB stalled when free redo log is low. I can admit 128MB log file size is too small for 16GB buffer pool even though we had no issue on MariaDB 10.4, but MariaDB 11.4 still stalls with 1GB log file as well. Best regards, Kazuhiko