On Tue, Dec 3, 2024 at 12:40 PM Kazuhiko via discuss <discuss@lists.mariadb.org> wrote:
Hello,
Our application has a sustained rate of DML queries running in parallel with DQL. We could achieve good performance using innodb-locks-unsafe-for-binlog on MariaDB 10.4, with careful consideration of its 'unsafeness'. But innodb-locks-unsafe-for-binlog was removed in MariaDB 10.5 and later, and without it our application runs much slower because of gap locking. https://mariadb.com/docs/server/ref/mdb/cli/mariadbd/innodb-locks-unsafe-for...
To achieve the same performance as before, we now use the READ COMMITTED transaction isolation level for DML and REPEATABLE READ for DQL on MariaDB 11.4. This approach guarantees consistency for our application.
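(For illustration only, a minimal sketch of the kind of per-session split described above; the statements themselves are made up, only the uid and path columns come from the table definition shown later in this thread:)

    -- DML connections: run at READ COMMITTED to avoid gap locks
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
    START TRANSACTION;
    UPDATE catalog SET path = '/new/path' WHERE uid = 42;   -- illustrative statement
    COMMIT;

    -- DQL (read-only) connections: keep a consistent snapshot with REPEATABLE READ
    SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    START TRANSACTION;
    SELECT uid, path FROM catalog WHERE path LIKE '/some/prefix/%';   -- illustrative statement
    COMMIT;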
Have you actually verified that your application needs REPEATABLE READ? It's possible, but in my experience it is extremely rare for applications to re-read rows they have already read. Not to say it never happens, but I have seen applications that actually need this maybe twice out of the thousands I've worked on over the last 25 years.
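To make that concrete: REPEATABLE READ only changes the outcome when a transaction re-reads rows after another transaction has committed a change in between. A generic two-session sketch (not taken from the application in question):

    -- Session A, under REPEATABLE READ: the second SELECT returns the same rows
    -- as the first, even though Session B committed a change in between.
    SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    START TRANSACTION;
    SELECT uid, path FROM catalog WHERE uid = 42;   -- first read
    -- (Session B meanwhile: UPDATE catalog SET path = '/changed' WHERE uid = 42; COMMIT;)
    SELECT uid, path FROM catalog WHERE uid = 42;   -- same result under REPEATABLE READ;
                                                    -- under READ COMMITTED this second read
                                                    -- would see Session B's committed change
    COMMIT;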
Our database contains 13.7G rows in total and uses 4.5TB of storage. Here is the configuration:
    innodb_buffer_pool_size = 17179869184 (16GB)
    innodb_buffer_pool_instances = 16 (no meaning in 11.4)
    innodb_log_file_size = 134217728 (128MB)
    innodb_flush_method = O_DIRECT
Please see the attached graphs showing 'InnoDB history length (right axis)' and 'free redo log in % (left axis)' (i.e. 100 - Redo Log Occupancy) while running our reference workload. https://mariadb.com/kb/en/innodb-redo-log/#determining-the-redo-log-occupanc... These tests use the same workload; only the MariaDB version and/or configuration differs. (Please disregard the 'warehouse history length' and 'activities history length' series in those plots; they are not relevant to the current issue.)
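(In case anyone wants to reproduce the measurement: a minimal sketch of how such an occupancy figure can be derived from server variables. I'm assuming the Innodb_checkpoint_age status variable is the basis, in line with the KB page linked above; the plotted 'free redo log in %' would then be 100 minus this value.)

    -- Rough redo log occupancy in percent: checkpoint age relative to the redo log size.
    SELECT ROUND(100 * s.VARIABLE_VALUE / v.VARIABLE_VALUE, 2) AS redo_log_occupancy_pct
      FROM information_schema.GLOBAL_STATUS s
     CROSS JOIN information_schema.GLOBAL_VARIABLES v
     WHERE s.VARIABLE_NAME = 'Innodb_checkpoint_age'
       AND v.VARIABLE_NAME = 'innodb_log_file_size';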
On MariaDB 10.4, the performance and behaviour with 'innodb-locks-unsafe-for-binlog' and with 'READ COMMITTED plus REPEATABLE READ' are similar. But on MariaDB 11.4, I found two serious issues.
Issue 1: InnoDB history length keeps growing while running lots of DML and DQL queries in parallel.
The InnoDB history length is stable around 5k on 10.4, but on 11.4 it keeps growing, reaching 100k by the end of the run. In our experience, increases in history list length are roughly correlated with poor query performance (queries having to "step over tombstones"?), and we do our best to avoid long-running transactions so that older snapshots can be freed as soon as possible. We are satisfied with the behaviour on 10.4, where the history list length is constant. What is the reason for the increase on 11.4, and is there any configuration to avoid it?
Have you been able to measure the performance impact of this? 100K usually isn't high enough to cause measurable performance degradation; that typically starts to happen north of 1M history length.
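For reference, the history list length can be watched without parsing SHOW ENGINE INNODB STATUS; a minimal sketch, assuming the trx_rseg_history_len counter in information_schema.INNODB_METRICS is what your monitoring reads:

    -- Number of undo log records that purge has not yet processed
    -- (the "History list length" figure from SHOW ENGINE INNODB STATUS).
    SELECT NAME, `COUNT`
      FROM information_schema.INNODB_METRICS
     WHERE NAME = 'trx_rseg_history_len';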
Issue 2: MariaDB stalls when free redo log is too small.
When MariaDB stalls, even a quite simple query such as the one below takes far too long.
(table definition)

    CREATE TABLE `catalog` (
      `uid` bigint(20) unsigned NOT NULL,
      `path` varchar(255) NOT NULL DEFAULT '',
      ...
      PRIMARY KEY (`uid`),
      KEY `Path` (`path`),
      ...
    )
(from pt-query-digest output)

    # Attribute    pct   total     min     max     avg     95%  stddev  median
    # ============ === ======= ======= ======= ======= ======= ======= =======
    # Count          2      91
    # Exec time      2    716s      1s     17s      8s     15s      5s      7s
    # Lock time      0    22ms   156us   491us   238us   366us    70us   204us
    # Rows sent      0     663       3      10    7.29    8.91    1.09    6.98
    # Rows examine   0     663       3      10    7.29    8.91    1.09    6.98
    # Rows affecte   0       0       0       0       0       0       0       0
    # Bytes sent     0  52.10k     330     749  586.24  685.39   68.46  563.87
    # Query size     0 470.83k   3.56k   5.26k   5.17k   5.20k  225.56   5.20k

    SELECT uid, path FROM catalog WHERE path in ('...', /*... omitted 80 items ...*/)\G

(here, these SELECTs are issued at the READ COMMITTED isolation level)
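(A side note not from the original message: since Rows examine stays below 10, it may still be worth confirming with EXPLAIN that the Path index is being used, so the multi-second executions can be attributed to server-side stalls rather than a plan change. A sketch with placeholder literals:)

    -- Expect range/ref access on the `Path` key with only a few rows examined;
    -- if that holds, the time is being spent waiting, not scanning.
    EXPLAIN SELECT uid, path FROM catalog WHERE path IN ('...', '...');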
Our understanding is that the stalls are caused by the undo log cleanup job and/or the InnoDB buffer pool writeback job. Increasing the log file size a lot seems to reduce the frequency of those stalls: for our reference workload, we still have stalls with a 1GB log file, but not with a 2GB or larger log file. If we need a 'big enough' log file to avoid stalls, should the ideal size be proportional to the amount of data or to the buffer pool size? We have never had such stalls on MariaDB 10.4, even with a small 128MB log file. Why does this difference happen, and is there any configuration to improve the behaviour?
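(One practical note, hedged: as far as I know the redo log can be resized online on 11.4, since this became possible around 10.9, so experimenting with a 2GB log does not require a restart. A sketch; the same value still needs to go into the config file to survive a restart:)

    -- Resize the redo log to 2GB online (value in bytes).
    SET GLOBAL innodb_log_file_size = 2147483648;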
This looks like the furious flushing regression that was introduced in 10.5 and doesn't appear to have been fixed since. It is possible to tune it out to a large extent, though: https://shatteredsilicon.net/mariadb-furious-flushing-fix-checkpoint-age/
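I won't try to reproduce the article's recommendations here; purely as an illustration of the general area, these are the standard InnoDB flushing knobs one ends up adjusting. The values below are placeholders, not recommendations for this workload:

    -- Illustrative values only; pick real ones based on the linked article,
    -- your redo log size and your storage.
    SET GLOBAL innodb_io_capacity = 1000;            -- baseline background flush rate
    SET GLOBAL innodb_io_capacity_max = 4000;        -- upper bound under checkpoint pressure
    SET GLOBAL innodb_max_dirty_pages_pct = 90;      -- dirty-page ceiling before aggressive flushing
    SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10;  -- start gentle flushing well before the ceiling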