Hi, Sergey! On Sep 13, Sergey Vojtovich wrote:
Hi Sergei,
comments inline and a question: 10.0 throughput is twice lower than 5.6 in a specific case. It is known to be caused by tc_acquire_table() and tc_release_table(). Do we want to fix it? If yes - how?
How is it caused by tc_acquire_table/tc_release_table? In what specific case?
Why per-share lists are updated under the global mutex? Alas, it doesn't solve CPU cache coherence problem. It doesn't solve CPU cache coherence problem, yes. And it doesn't help if you have only one hot table. But it certainly helps if many threads access many tables. Ok, let's agree to agree: it will help in certain cases. Most probably it won't improve situation much if all threads access single table.
Of course.
We could try to ensure that per-share mutex is on the same cache line as free_tables and used_tables list heads. In this case I guess mysql_mutex_lock(&share->tdc.LOCK_table_share) will load list heads into CPU cache along with mutex structure. OTOH we still have to read per-TABLE prev/next pointers. And in 5.6 per-partition mutex should less frequently jump out of CPU cache than our per-share mutex. Worth trying?
Did you benchmark that these cache misses are a problem? What is the main problem that impacts the performance? Regards, Sergei