Hi Kristian,
Kristian Nielsen wrote:
I have been analysing CPU bottlenecks in a single-threaded sysbench read-only load. I found that icache misses are the main bottleneck, and that profile-guided compiler optimisation (PGO) with GCC gives a large speedup, 25% or more.
(More details in my blog posts:
http://kristiannielsen.livejournal.com/17676.html
http://kristiannielsen.livejournal.com/18168.html )
Wow. 25% is a lot. Have you also tried compiling MySQL 5.6 with PGO? If that gets the same improvement, we haven't gained anything in the comparison.
I played a bit with PGO back at SAP, when we worked with the Intel guys and used the Intel compiler. One of the bottlenecks we found there was the SQL parser. It was just too big to fit into the L1 cache, and also too big to be optimized at once.
Any comments? Here are some more points:
- I tested that gen_profile_load gives a good speedup of sysbench read-only (around 30%, so still very significant even though it generates a different and more varied load).
This is interesting. By definition, PGO should work best if the workload used for profiling matches the production workload. I hadn't expected a partial match to give such good results as well. This is something that needs more tests.
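For anyone following along, the basic GCC PGO cycle under discussion can be sketched roughly as below. This is only an illustration on a hypothetical single-file program: app.c, the ./app binary and the --read-only training argument are made-up placeholders, not the MariaDB build or gen_profile_load. A real server build would presumably pass the same two GCC flags through its build system instead.

```python
import subprocess


def pgo_commands(source="app.c", binary="app", training_args=()):
    """Return the three command lines of a basic GCC PGO cycle.

    source, binary and training_args are hypothetical placeholders.
    """
    # Step 1: build an instrumented binary that records execution counts.
    instrumented = ["gcc", "-O2", "-fprofile-generate", "-o", binary, source]
    # Step 2: run the training workload; on exit the instrumented binary
    # writes its profile data (.gcda files).
    training_run = ["./" + binary, *training_args]
    # Step 3: rebuild, letting GCC use the recorded profile.
    optimized = ["gcc", "-O2", "-fprofile-use", "-o", binary, source]
    return [instrumented, training_run, optimized]


def run_pgo_cycle(**kwargs):
    """Execute the three steps in order, stopping at the first failure."""
    for cmd in pgo_commands(**kwargs):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_pgo_cycle() to execute them.
    for cmd in pgo_commands(training_args=("--read-only",)):
        print(" ".join(cmd))
```

The training run in step 2 is where the workload question comes in: whatever is executed there determines the profile that step 3 optimizes for.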
- More tests would be nice, of course. Axel, would you be able to build some binaries following the above procedure, and test some different random benchmarks? Anything that is easy to run could be interesting, both to test for improvement, and to check against regressions.
Yes, I'll certainly do that. Speaking of regressions: if we plan to deliver binaries built with PGO, we must also test the influence of different architectures, i.e. how a binary built on Intel behaves when run on AMD.
- We probably need a recent GCC version to get good results. I used GCC version 4.7.2. Maybe we should install this GCC version in all the VMs we use to build binaries?
That is the gcc version installed @ lizard2. The Facebook machines still have a 4.5.x (SuSE-specific snapshot from 2010 ... WTF?). Jani: could you look into upgrading gcc on the Facebook machines?
XL