Axel Schwenke <axel@askmonty.org> writes:
> Benchmark 1 is good old sysbench OLTP. I tested 10.0.7 vs. 10.0.7-pgo. With low concurrency there is about 10% win by PGO; however this is completely reversed at higher concurrency by mutex contention (the test was with performance schema disabled, so cannot say which mutex, probably LOCK_open).
Ouch, PGO drops the throughput to 1/2! That's a pretty serious blow to the whole idea, unless there is not just a fix but also a good explanation. I will investigate this, thanks a lot for testing!

I must say it is totally unexpected. I would have expected the effect of PGO (whether positive or negative) to be most pronounced at low concurrency, since at high concurrency lock contention dominates, which mainly happens in kernel and library code. And it is strange that a performance improvement at low concurrency manifests itself as a loss at high concurrency.

Maybe PGO re-arranges data to optimise cache sharing? And this introduces more false sharing? But this needs to be checked properly.
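To make the false-sharing guess concrete, here is a minimal sketch of the layout side of the problem only. It is not a measurement and says nothing about what PGO actually did to the server binary. The 64-byte cache line size, the struct names and the helper functions are all assumptions made for illustration, and the check assumes each struct starts on a cache-line boundary.

```python
import ctypes

# Assumption: 64-byte cache lines (common on x86-64, not guaranteed elsewhere).
CACHE_LINE = 64


class PackedCounters(ctypes.Structure):
    """Hypothetical layout: two hot counters placed right next to each other."""
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("b", ctypes.c_uint64),
    ]


class PaddedCounters(ctypes.Structure):
    """Same counters, with padding so that b starts on the next cache line."""
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("_pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
        ("b", ctypes.c_uint64),
    ]


def cache_line_of(struct_type, field):
    """Index of the cache line a field starts in, relative to the struct start."""
    return getattr(struct_type, field).offset // CACHE_LINE


def may_false_share(struct_type, first, second):
    """True if both fields start in the same cache line.

    If two threads each write one of these fields, the shared line bounces
    between cores even though no data is logically shared.
    """
    return cache_line_of(struct_type, first) == cache_line_of(struct_type, second)


if __name__ == "__main__":
    for layout in (PackedCounters, PaddedCounters):
        print(layout.__name__, "a/b on same line:", may_false_share(layout, "a", "b"))
```

If an optimiser packed two independently written variables together the way `PackedCounters` does, single-threaded runs could get faster (better locality) while multi-threaded runs get slower, which would at least be consistent with the pattern seen here. Checking that would need a look at the actual symbol layout and cache-miss counters.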
> The other benchmark is purely single threaded and runs Q1 from DBT3 for memory based data. Here I include data for many MariaDB and MySQL versions for comparison. The plot is a classical box-and-whiskers plot where the box contains 50% of the data points (25-75 percentile) and the whiskers mark minimum and maximum.
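For reference, the numbers behind such a box-and-whiskers plot can be sketched as below. This is only an illustration of the plot's definition as described above (box from the 25th to the 75th percentile, whiskers at minimum and maximum). The function names are made up, the inputs are not data from this thread, and the overlap check is a rough heuristic for reading the plot, not a statistical test.

```python
from statistics import quantiles


def box_summary(samples):
    """Return the five numbers a min/max box-and-whiskers plot is drawn from."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples)
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # samples as the whole population rather than a draw from a larger one.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],    # lower whisker
        "q1": q1,             # bottom of the box (25th percentile)
        "median": median,
        "q3": q3,             # top of the box (75th percentile)
        "max": ordered[-1],   # upper whisker
    }


def boxes_overlap(first, second):
    """Rough heuristic: do the 25-75 percentile boxes of two runs overlap?"""
    return first["q1"] <= second["q3"] and second["q1"] <= first["q3"]


if __name__ == "__main__":
    # Made-up run times in seconds, purely illustrative.
    baseline = box_summary([5, 1, 4, 2, 3])
    candidate = box_summary([3, 4, 5, 6, 7])
    print(baseline)
    print("boxes overlap:", boxes_overlap(baseline, candidate))
```

When the boxes of two builds overlap heavily, the run-to-run spread is comparable to the difference between the builds, so the plot alone cannot separate them.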
If I understand correctly, the noise in those tests is really too big to tell much one way or the other, right?

Well, low-concurrency tests also show the effect on single-threaded performance just fine. But clearly, unless there is some explanation for the hit at high concurrency, the PGO idea is not looking attractive...

Again, thanks a lot for looking into this, I will try to find time soon to investigate more.

 - Kristian.