Hi Jure, maybe you can try the TokuDB engine. I've noticed that with large key caches, syncing them to disk can sometimes take up to a minute for MyISAM tables, and InnoDB is generally heavy on I/O. TokuDB, on the other hand, is heavier on CPU and memory but (in my experience) very light on I/O, so maybe that would help... RocksDB should also be light on random I/O, since it uses an LSM tree; I haven't used it in production yet and I'm not sure how mature it is, but it may work...
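
If you want to experiment, in MariaDB switching a table over looks roughly like this (a sketch: it assumes your build ships the MyRocks plugin, and "sessions" is just a placeholder table name):

INSTALL SONAME 'ha_rocksdb';
ALTER TABLE sessions ENGINE=RocksDB;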

You can also try to disable the binlog/fsyncs almost altogether if you don't care about data consistency; this is strongly discouraged. If you're using replication, skip the third line because it will break it; if you're not using replication, remove the _xxx suffix.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
SET GLOBAL sync_binlog = 0;
SET SESSION sql_log_bin_xxx = 0; -- session-scoped; log_bin itself is read-only at runtime
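
Note that SET GLOBAL changes don't survive a server restart. To keep them, you'd put the equivalent in my.cnf, something like this (a sketch, assuming the usual [mysqld] section):

[mysqld]
innodb_flush_log_at_trx_commit = 0
sync_binlog = 0
# leaving log-bin out of the config (or commented out)
# disables the binary log server-wide
#log-bin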

But I believe the SAN is really the issue. Benchmark this SAN (throughput/IOPS), for example with fio as sketched below... if that's not conclusive, try running from local SSDs in RAID 10 or something similar... RAID 5/6 works horribly with SSDs and databases in general due to TRIM support and write amplification (not sure which RAID level you're using)...
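
A run like this (the file path is just a placeholder; point it at the SAN volume, then repeat on a local disk for comparison) measures random writes with an fsync after every write, which is roughly what a busy transactional database does:

fio --name=fsync-test --filename=/san/mount/fio.test \
    --rw=randwrite --bs=16k --size=1g --runtime=60 --time_based \
    --ioengine=psync --fsync=1

Compare the reported IOPS and completion latencies between the two runs; if the SAN numbers are an order of magnitude worse, that's your answer.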


Regarding your comment on the kernel's page cache and fsync: I think it's working as expected, and what you're missing is write-behind, not the OS file cache... Without fsync it would be impossible to have the data in a consistent state after a crash, so you could lose not just a couple of seconds' worth of data but the whole table, because writes can be postponed and reordered, especially on SSDs...

For example, if you did copy-on-write to make changes atomic, the pointer to the data block could be written before the data block is (fully) on disk, and you'd get a structure pointing to non-existent or random data. To be sure there is anything that makes sense in your files, you'd need to:
1. write your data
2. ensure it's on the device in full (fsync)
3. atomically switch the data pointer from OLD to NEW, which probably relies on the assumption that block-aligned writes of 512 bytes or less are atomic, or on some log-like structure
MariaDB is probably doing it a little differently, but the concept is surely similar if you don't want your software to be bound to some single, specific, probably very expensive and proprietary mix of software and hardware that can support truly atomic I/O.
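
To make the ordering concrete, here's a minimal sketch of the three steps above in plain POSIX C; this is not MariaDB's actual code, and atomic_replace() and the ".new" suffix are made-up names:

/* Sketch of the write -> fsync -> atomic pointer switch sequence. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int atomic_replace(const char *dir, const char *path,
                   const void *buf, size_t len)
{
    char tmp[4096];
    snprintf(tmp, sizeof(tmp), "%s.new", path);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 1. write the new data to a side file */
    if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }

    /* 2. make sure it's on the device in full */
    if (fsync(fd) != 0) { close(fd); return -1; }
    close(fd);

    /* 3. atomically switch the "pointer" from OLD to NEW:
       rename(2) replaces the old file in a single atomic step */
    if (rename(tmp, path) != 0)
        return -1;

    /* ...and fsync the directory so the rename itself survives a crash */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}

With that ordering, a crash at any point leaves either the complete OLD file or the complete NEW one on disk, never a pointer to garbage.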

As for I/O over the network... I've actually had huge issues even running a simple read-only SQLite database over network-attached storage via NFS. I believe that kind of setup, when your throughput is high, will need at least a dedicated 10 Gbps network link. And even then, a SAN/NAS can have "normal" throughput on 10 Gb, but the latency will probably be much higher (so slower fsyncs). I'd try to move this SQL server out of the VM and onto local storage.

Best,
Slawomir.

On 7 September 2019 at 21:38, Jure Sah <e@juresah.si> wrote:

Hi,

On 7. 09. 19 21:03, Jan Steinman wrote:
I’m using an inexpensive Mac Mini, maxed out with RAM, and a 2GB SSD,
running NOTHING but MariaDB. I even run it headless, which means all
the UI processes stay in sleep(3). When I was having web server
performance issues, that was the one thing that improved things the
most. And that was after wasting a lot of time trying to tweak MariaDB
variables.

I work for an ISP. The system they use is connected to a dedicated SSD
RAID array on a SAN. On the particular virtual machine, MariaDB runs on
the same server as the Apache webserver. The main issue is that the
site gets a lot of traffic: it runs some CMS-based website, and the
session table is grinding away constantly.

Normally it's not a problem, but once, when a malfunction on the SAN
network caused degraded performance (not downtime, mind you!), under the
usual traffic the website was simply down with gateway timeouts from
memcached. The workaround was to temporarily move the database to a
ramdisk (tmpfs).

My employer considers stepping outside the recommendations of the open
source community to be not worth the risk, and the issue has since been
resolved. However, I know that I shouldn't have had to resort to the
ramdisk workaround, because the kernel page cache was supposed to take
care of that by itself. The reason it doesn't is the fsyncs issued by
MariaDB, which effectively disable any advantage offered by the page
cache. I think that is a great shame, so I am trying to see if there is
anything I can do about it.

Perhaps there is a better alternative out there, but nobody has yet
tried to make it work. I'm also not entirely convinced there is a
reasonable argument for being so particular about fsyncing everything on
typical server hardware (servers are usually UPS-backed).


The argument about battery-backed RAID controllers, while that is
something we have, is one I find particularly unreasonable, since as far
as I know fsync isn't even supported by most RAID controller drivers in
Linux. There's an email on the kernel mailing list saying that the
capability exists, but that Linux doesn't use it.


Disabling fsync boosts performance of a typical MySQL server
by something like a factor of 3.

Are you really sure about that? I don’t know the MySQL implementation
details, but typically, sync(2) is run from a dedicated thread per
file, and so should not block anything, unless you’ve run out of
threads or something, in which case, tuning may help.

I found the measurement in a blog post somewhere on the internet. I'm
not sure what hardware they tried it on; it seems to me it would depend.

During the aforementioned malfunction it was pretty clear that MariaDB's
performance was capped by storage. That makes sense, too: if you fsync
on write operations, they will block when the storage is slow.

I'm not excluding the possibility that there is a setting here that is
making my system behave the way it does.

LP,
Jure
