Re: [Maria-discuss] InnoDB: background jobs & fsyncs

15 Sep 2020

      Hi Julien,

First of all, is the version really 10.2.23, which was released in
March 2019? If you are actually using the latest 10.2.33 release, you
might be affected by MDEV-20638, which surprisingly introduced a
performance regression. Based on benchmarks, we reverted that change
from the upcoming 10.2, 10.3, and 10.4 releases (but not 10.5).

A significant source of background activity is the purge of version
history of old transactions. If SHOW ENGINE INNODB STATUS is reporting
a nonzero "History list length", some purge activity will be needed.
Before MariaDB 10.5, there was also a background merge of buffered
changes to secondary indexes.

I would suggest checking with http://poormansprofiler.org/ or "sudo
perf top" what is causing the I/O, and posting the stack traces. You
might even try attaching GDB to the server and setting a breakpoint on
fsync(), with breakpoint commands "finish" and "continue" so that you
can collect interesting stack traces.

Also, I would recommend you to try a later major version. The InnoDB
version in MariaDB 10.3 should scale better thanks to a lock-free hash
table for maintaining the set of active transactions. MariaDB 10.5
included further improvements. But, be aware that we are working on a
10.5 performance regression in page flushing, in MDEV-23399.

With best regards,

Marko Mäkelä

On Tue, Sep 15, 2020 at 1:25 PM Julien Muchembled <jm@jmuchemb.eu> wrote:
...
Hello,
We are evaluating new hardware by reproducing real-life workload on
real-life data which works fine on an existing server.
We copied over the software used to ensure an apple-to-apple
comparison, which is based on MariaDB 10.2.23 and uses primarily
InnoDB tables, with a few Mroonga tables which do not seem involved in
the problem.
On the new hardware, we are seeing catastrophically bad performance,
especially chain-deadlocks happening and being resolved but producing
virtually no useful work at our usual parallelism level of 64 active
connections.
If we reduce parallelism to 2 active connections, queries start to
succeed in any meaningful proportion.
The database was freshly restored from a mysqldump (as we cannot
interrupt the original database), could this have any effect on
deadlocks ?
The main tables involved in the deadlocking queries (job queues) are
initially (in the test, and periodically in reality) empty, so it would
seem surprising.
What should I check to debug further ?
Note that these deadlocks happen in production but they're negligible.
See attachment for mariadb configuration file.
Some more details:
When we restore from mysqldump, we use the following extra options to
speed things up:
  innodb_flush_log_at_trx_commit = 0
  innodb_flush_method = nosync
  innodb_doublewrite = 0
  sync_frm = 0
as during that period nothing of value can be lost (worst case we
restart the restore from scratch).
BTW, despite these settings, we are still noticing a lot of fsyncs. Is
this expected ? Are we missing some other option ?
Also, after the import and without any connection, MariaDB was still
producing a non-trivial amount of activity on the machine: 5% CPU, read
<1MB/s, write 10MB/s, 60 fsync/s.
I could not identify what is causing these, where should I look ? How
can I tell when it will stabilise back to idle ?
When I interrupt the benchmark workload, I see a similar resource usage.
I saw https://jira.mariadb.org/browse/MDEV-18698
And InnoDB seems to throttle its background activity: is there a way to
tell InnoDB to perform its background tasks at maximum speed ?
(the goal is that after the import, we can make a clean tarball that we
extract when we want to launch the test case again)
Regards,
Julien
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation