Hi,

Kristian Nielsen wrote:
I have been analysing CPU bottlenecks in single-threaded sysbench read-only load. I found that icache misses is the main bottleneck, and that profile-guided compiler optimisation (PGO) with GCC gives a large speedup, 25% or more.
Here are some more results.

Benchmark 1 is good old sysbench OLTP. I tested 10.0.7 vs. 10.0.7-pgo. At low concurrency PGO gives about a 10% win; however, this is completely reversed at higher concurrency by mutex contention (the test was run with performance schema disabled, so I cannot say which mutex, probably LOCK_open).

Normally I run with preloaded tcmalloc. However, since 10.0.5(?) MariaDB uses jemalloc internally. Since this is built with MariaDB, it could benefit from PGO. However, the numbers look quite similar for tcmalloc vs. jemalloc.

The other benchmark is purely single-threaded and runs Q1 from DBT3 on in-memory data. Here I include data for many MariaDB and MySQL versions for comparison. The plot is a classical box-and-whiskers plot where the box contains 50% of the data points (25th to 75th percentile) and the whiskers mark the minimum and maximum.

This time the win is about 5% for MariaDB-10.0.8 and ~0 for MariaDB-5.5.35. However, those results should be taken with a grain of salt, as those builds were done with the older gcc-4.6.3. I'll have to re-run with gcc-4.7.2 builds (but on different hardware).

BR,
XL
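For readers who want to reproduce the box-and-whiskers summary described above from their own timings, here is a minimal sketch in Python. It is only an illustration: the function name box_summary and the sample values are hypothetical and are not taken from the benchmark runs, and the quartile convention used here (the "inclusive" method) is an assumption, since the mail does not say which one the plot uses.

```python
import statistics


def box_summary(samples):
    """Return (minimum, q1, median, q3, maximum) for a list of timings.

    The box spans q1..q3 (the middle 50% of the data points) and the
    whiskers extend to the minimum and maximum, as described in the mail.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # samples as the whole population (an assumed convention).
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return min(samples), q1, median, q3, max(samples)


# Hypothetical run times in seconds, NOT measured data.
example_runs = [3.0, 1.0, 5.0, 2.0, 4.0]
print(box_summary(example_runs))
```

Feeding each build's per-run timings through such a function gives the five numbers that one box in the plot represents.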