Hi Julien,

First of all, is the version really 10.2.23, which was released in March 2019? If you are actually using the latest 10.2.33 release, you might be affected by MDEV-20638, which surprisingly introduced a performance regression. Based on benchmarks, we reverted that change from the upcoming 10.2, 10.3, and 10.4 releases (but not 10.5).

A significant source of background activity is the purge of the version history of old transactions. If SHOW ENGINE INNODB STATUS is reporting a nonzero "History list length", some purge activity will still be needed. Before MariaDB 10.5, there was also a background merge of buffered changes to secondary indexes.

I would suggest checking with http://poormansprofiler.org/ or "sudo perf top" what is causing the I/O, and posting the stack traces. You might even try attaching GDB to the server and setting a breakpoint on fsync(), with breakpoint commands "finish" and "continue", so that you can collect interesting stack traces (see the sketch in the P.S. below).

Also, I would recommend that you try a later major version. The InnoDB version in MariaDB 10.3 should scale better thanks to a lock-free hash table for maintaining the set of active transactions. MariaDB 10.5 included further improvements. But be aware that we are working on a 10.5 performance regression in page flushing, in MDEV-23399.

With best regards,

Marko Mäkelä
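P.S. In case it helps, below is a rough, untested sketch of the kind of GDB session I mean; the binary name "mysqld" is just a placeholder, so adjust it (or attach to the PID directly) for your installation. Because GDB ignores breakpoint commands that follow one which resumes execution, the sketch simply logs a backtrace and continues at every fsync() call; interactively you can of course still type "finish" and "continue" by hand when a breakpoint fires.

    # Attach GDB to the running server (placeholder binary name; adjust as needed).
    sudo gdb -p "$(pidof mysqld)"

    # Inside GDB: print a stack trace at every fsync() call and keep the server running.
    (gdb) set pagination off
    (gdb) break fsync
    (gdb) commands
    > silent
    > backtrace
    > continue
    > end
    (gdb) continue

Sorting and counting the collected backtraces, which is essentially what the poor man's profiler script automates, should quickly show where the fsync() calls are coming from.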
On Tue, Sep 15, 2020 at 1:25 PM Julien Muchembled <jm@jmuchemb.eu> wrote:

Hello,
We are evaluating new hardware by reproducing a real-life workload on real-life data, which works fine on an existing server. We copied the software over to ensure an apples-to-apples comparison; it is based on MariaDB 10.2.23 and primarily uses InnoDB tables, plus a few Mroonga tables that do not seem to be involved in the problem.
On the new hardware, we are seeing catastrophically bad performance: in particular, chains of deadlocks keep occurring and being resolved while producing virtually no useful work at our usual parallelism level of 64 active connections.
Queries only start to succeed in any meaningful proportion if we reduce the parallelism to 2 active connections.
The database was freshly restored from a mysqldump (we cannot interrupt the original database); could this have any effect on deadlocks? The main tables involved in the deadlocking queries (job queues) are initially empty in the test, and periodically empty in production, so it would seem surprising.
What should I check to debug this further?
Note that these deadlocks also happen in production, but there they are negligible.
See the attachment for the MariaDB configuration file.
Some more details: when we restore from mysqldump, we use the following extra options to speed things up, since during that period nothing of value can be lost (worst case, we restart the restore from scratch):

    innodb_flush_log_at_trx_commit = 0
    innodb_flush_method = nosync
    innodb_doublewrite = 0
    sync_frm = 0

BTW, despite these settings, we are still noticing a lot of fsyncs. Is this expected? Are we missing some other option?
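(For completeness: the fsync figures mentioned below come from OS-level monitoring. If the server's own counters are the better thing to look at, I assume sampling them a few seconds apart is roughly the right way, e.g.:

    # Sample InnoDB's own fsync counters twice, 10 seconds apart (adjust credentials/socket as needed).
    mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb%fsyncs'"
    sleep 10
    mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb%fsyncs'"

but please correct me if there is a better way to attribute the fsyncs.)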
Also, after the import and without any connection, MariaDB was still producing a non-trivial amount of activity on the machine: 5% CPU, <1 MB/s read, 10 MB/s write, 60 fsync/s. I could not identify what is causing this; where should I look? How can I tell when it will settle back to idle? When I interrupt the benchmark workload, I see similar resource usage.
I saw https://jira.mariadb.org/browse/MDEV-18698, and InnoDB seems to throttle its background activity: is there a way to tell InnoDB to perform its background tasks at maximum speed? (The goal is that, after the import, we can make a clean tarball that we extract whenever we want to launch the test case again.)
Regards,
Julien
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation