Re: [Maria-discuss] fsync alternative

12 Sep 2019


      Hi Jure, maybe you can try using toku engine. I noticed that with large keycaches syncing it to disk can sometimes take up to a minute for myisam tables. Also innodb is generally heavy on I/O. Toku on the other hand is heavier on CPU/memory but seems very light on I/O (from my experience), so maybe that would help... Rocksdb should be light on random I/O access as it's using LSM, haven't used it in production yet and not sure how mature it is but may work...   You can try to disable binlogs/fsync almost altogether if you don't care about data consistency, it's very unrecommended. And if you're using replication skip the 3'rd line because it'll break it, if you're not using replication remove _xxx.  SET GLOBAL innodb_flush_log_at_trx_commit = 0;  SET GLOBAL  sync_binlog = 0;  SET GLOBAL log_bin_xxx = 0;   But i believe SAN is really the issue. Benchmark this SAN (throughput / iops)... if not conclusive try running from local SSD on raid10 or something similar... raid5/6 works horribly with SSDs and databases in general due to trim support and write amplification (not sure what raid you're using)...   www.percona.com www.percona.com  www.percona.com www.percona.com   For kernel's pagecache/fsync comment, i think it's working as expected and you're missing write-behind not OS file cache... I guess without fsync it'd be totally impossible to have the data in consistent state after a crash, so you could not only loose couple of second's worth of data but whole table, as writes could be postponed and re-ordered, especially on SSDs...   So for example if you'd do copy-on-write to make changes atomic, the pointer to data block could be written before the data block is (fully) there on disk and you'd get a structure pointing to not-existing or random data. To be sure you're having anything which is making sense in your files you'd need to   1. write your data  2. ensure it's on the device in full (fsync)  3. atomically change the data pointer from OLD to NEW data, which is probably using assumption that block-aligned writes below 512b should be atomic, or some log-like structure  Maria is probably doing it little differently, but concept is for sure similar if you don't want your software to be bound to some single, specific, probably very expensive and propertialy mix of software and hardware which can support some basic atomic I/O.    For IO over network... actually had a huge issues even with running simple readonly sqlite over network-attached I/O via NFS. I believe such kind of setup when your throughput is high will at least need dedicated 10Gbps network link. And still SAN/NAS can have "normal" throughput on 10Gb, but latency will probably be much higher (so slower fsync's). I'd try to move this SQL server outside of VM and to local storage.   Best,  Slawomir.  Dnia 7 września 2019 21:38 Jure Sah <e@juresah.si> napisał(a):  Hi,   On 7. 09. 19 21:03, Jan Steinman wrote:  I’m using an inexpensive Mac Mini, maxed out with RAM, and a 2GB SSD,  running NOTHING but MariaDB. I even run it headless, which means all  the UI processes stay in sleep(3). When I was having web server  performance issues, that was the one thing that improved things the  most. And that was after wasting a lot of time trying to tweak MariaDB  variables.   I work for an ISP. The system they use is connected to a dedicated SSD  RAID array on SAN. On the particular virtual machine the MariaDB is on  the same server as the Apache webserver.  The main issue is that the  website gets a lot of traffic. It has some CMS-based website on it and  the session table is grinding away constantly.   Normally it's not a problem, but once when a malfunction on the SAN  network caused degraded performance (not downtime mind you!), with the  usual traffic the website was simply down with Gateway timeout from  memcached. The workaround was to temporarily move the database to a  ramdisk (tmpfs).   My employer considers stepping outside the recommendations of the open  source community to be not worth the risk, and the issue was resolved  since.  However I know that I shouldn't have had to make the workaround  with the ramdisk, because the kernel page cache was supposed to take  care of that by itself. The reason this doesn't work is the fsyncs used  by MariaDB, which effectively disables any advantage offered by the page  cache. I think that is a great shame so I am trying to see if there is  anything I can do about that.   Perhaps there is a better alternative out there but nobody has as of yet  tried to make it work. I'm also not entirely convinced there is a  reasonable argument for being so particular about fsyncing everything on  typical server hardware (servers are usually UPS backed).    The argument about battery-backed RAID controllers, while this is  something we have, is something I find particularly unreasonable, since  as far as I know fsync isn't even supported by most RAID controller  drivers in Linux. There's an email on the Kernel mailing list that it  does exist, but that Linux doesn't use it.    Disabling fsync boosts performance of a typical MySQL server  by something like a factor of 3.   Are you really sure about that? I don’t know the MySQL implementation  details, but typically, sync(2) is run from a dedicated thread per  file, and so should not block anything, unless you’ve run out of  threads or something, in which case, tuning may help.   I found the measurement by some blogger on the internet. I'm not sure  what hardware they tried this on, it seems to me it would depend.   In the aforementioned malfunction it was pretty clear that MariaDB's  performance is capped by storage. It makes sense too, if you are  fsyncing on write operations, it's going to block if the storage is slow.   I'm not excluding the possibility that there is a setting here that is  making my system behave the way that it is.   LP,  Jure   ______________________________  Mailing list:  launchpad.net launchpad.net  Post to     :   maria-discuss@lists.launchpad.  Unsubscribe :  launchpad.net launchpad.net  More help   :  help.launchpad.net help.launchpad.net

pslawek83

tags

participants (1)