Hi Sergey, thanks for looking into this!

Some of my patches depend on Monty's patches that haven't actually been pushed to 10.1 yet. I'll hold those. Further comments inline.

On Wed, Nov 26, 2014 at 08:00:44PM +0100, Sergei Golubchik wrote:
Hi, Sergey
I didn't review MDEV-7200 (InnoDB) and patches that are already in 10.1.

The patch for MDEV-7200 is not good to go into 10.1 as is anyway. It is here just as a reminder that we can gain another 2% with a simple fix.
Almost everything else was fine, see a few comments below.
------------------------------------------------------------
revno: 4513
committer: Sergey Vojtovich <svoj@mariadb.org>
branch nick: 10.0-power
timestamp: Fri 2014-11-07 22:03:18 +0400
message:
  Multi-rwlock lock for table definition cache hash.
added:
modified:
diff:
=== modified file 'sql/table_cache.cc'
--- sql/table_cache.cc 2014-10-21 12:20:39 +0000
+++ sql/table_cache.cc 2014-11-07 18:03:18 +0000
@@ -53,6 +53,77 @@
 #include "table.h"
 #include "sql_base.h"
+
+/**
+  Table definition hash lock.
+
+  Lock consists of multiple rw-locks (8 by default). Lock is write locked
+  when all instances are write locked. Lock is read locked when any instance
+  is read locked.
+
+  This class was implemented to workaround scalability bottleneck of table
+  definition hash rw-lock.
+
+  To be replaced with lock-free hash when we understand how to solve the
+  following problems:
+  - lock-free hash iterator (can be workarounded by introducing global
+    all_tables list);
I don't particularly like that solution. Could you try to switch to a lock-free hash now?
I'll try to do the iterator asap.

This is the change that allowed us to break through 27kTPS to 33kTPS. I don't like it either. My raw lf-hash prototype shows similar results. I think we really need to have this bottleneck solved in 10.1 one way or another. So my question is: do you feel like we will be able to implement this all in time for 10.1?
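To make the idea concrete, here is a minimal standalone sketch of what the patch does, written with C++17 primitives rather than the server's own types (class and function names are mine, std::shared_mutex stands in for mysql_rwlock_t):

#include <shared_mutex>

class PartitionedRWLock
{
  static const unsigned N= 8;                            // 8 instances, as in the patch
  struct alignas(128) Slot { std::shared_mutex lock; };  // one rw-lock per cache line
  Slot slots[N];
public:
  // a reader only touches the instance selected by its thread id
  void rdlock(unsigned thread_id)   { slots[thread_id % N].lock.lock_shared(); }
  void rdunlock(unsigned thread_id) { slots[thread_id % N].lock.unlock_shared(); }
  // a writer takes every instance, so it excludes all readers
  void wrlock()   { for (Slot &s : slots) s.lock.lock(); }
  void wrunlock() { for (Slot &s : slots) s.lock.unlock(); }
};

Readers running on different CPUs then hit different cache lines, which is what removes the contention on the single hash rw-lock; the trade-off is that a write lock becomes N times more expensive.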
+  - lf_hash_search() may fail because of OOM. It is rather complex to handle
+    it properly.
+*/
+
+class TDC_lock
+{
+  uint m_instances;
+  bool m_write_locked;
+  char pad[128];
Interesting, why did I add the above padding... probably a leftover from an intermediate patch. Will remove it.
+  struct st_locks
+  {
+    mysql_rwlock_t lock;
+    char pad[128];
In these cases I usually do: pad[128 - sizeof(mysql_rwlock_t)]; Hmm... strange that I didn't define a macro for this, I thought I did.
Your suggestion guarantees that st_locks will be at least 128 bytes. But these 128 bytes may still be split across two cache lines. Or does malloc return a pointer aligned to the cache line size?
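To spell out the difference (a standalone sketch, not the server code; the int member is just a placeholder for mysql_rwlock_t): sizing the pad only guarantees the size of the slot, while an alignment specifier also constrains where the slot starts.

struct slot_padded                 // size is 128, start address is whatever the allocator gives
{
  int lock_placeholder;            // stands in for mysql_rwlock_t
  char pad[128 - sizeof(int)];
};

struct alignas(128) slot_aligned   // both size and start address are multiples of 128
{
  int lock_placeholder;
};

static_assert(sizeof(slot_padded) == 128, "only the size is guaranteed");
static_assert(alignof(slot_aligned) == 128, "the start is cache-line aligned too");

A heap array of the over-aligned variant still has to come from an aligned allocator (C++17 aligned operator new, posix_memalign, or similar); a plain malloc() only guarantees alignment suitable for standard types, so a 128-byte slot could indeed straddle two lines.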
+  } *locks;
+public:
+  int init(PSI_rwlock_key key, uint instances)
+  {
+    m_instances= instances;
+    m_write_locked= false;
+    locks= (st_locks *) my_malloc(sizeof(struct st_locks) * m_instances, MYF(MY_WME));
+    if (!locks)
+      return 1;
+    for (uint i= 0; i < m_instances; i++)
+      mysql_rwlock_init(key, &locks[i].lock);
+    return 0;
+  }
+  void destroy(void)
+  {
+    for (uint i= 0; i < m_instances; i++)
+      mysql_rwlock_destroy(&locks[i].lock);
+    my_free(locks);
+  }
+  void wrlock(void)
+  {
+    for (uint i= 0; i < m_instances; i++)
+      mysql_rwlock_wrlock(&locks[i].lock);
+    m_write_locked= true;
+  }
+  void rdlock(THD *thd= 0)
+  {
+    mysql_rwlock_rdlock(&locks[thd ? thd->thread_id % m_instances : 0].lock);
+  }
+  void unlock(THD *thd= 0)
+  {
+    if (m_write_locked)
+    {
+      m_write_locked= false;
+      for (uint i= 0; i < m_instances; i++)
+        mysql_rwlock_unlock(&locks[i].lock);
+    }
+    else
+      mysql_rwlock_unlock(&locks[thd ? thd->thread_id % m_instances : 0].lock);
+  }
+};
+
+
 /** Configuration. */
 ulong tdc_size; /**< Table definition cache threshold for LRU eviction. */
 ulong tc_size; /**< Table cache threshold for LRU eviction. */

------------------------------------------------------------
revno: 4512
committer: Sergey Vojtovich <svoj@mariadb.org>
branch nick: 10.0-power
timestamp: Fri 2014-10-31 15:44:36 +0400
message:
  Preallocate locks on THD mem_root to avoid expensive malloc.
modified:
diff:
=== modified file 'sql/lock.cc'
--- sql/lock.cc 2014-09-30 17:31:14 +0000
+++ sql/lock.cc 2014-10-31 11:44:36 +0000
@@ -700,7 +699,7 @@ MYSQL_LOCK *get_lock_data(THD *thd, TABL
   TABLE **to, **table_buf;
   DBUG_ENTER("get_lock_data");
-  DBUG_ASSERT((flags == GET_LOCK_UNLOCK) || (flags == GET_LOCK_STORE_LOCKS));
+  DBUG_ASSERT((flags & GET_LOCK_UNLOCK) || (flags & GET_LOCK_STORE_LOCKS));
Not the same: the old assert also verified that these flags aren't both set; the new one doesn't. But I don't think it matters much.
You're right, I'll change this: the stricter, the better.
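For the record, a sketch of the stricter form (using only the flag names from the diff; the exact final form may differ):

  /* exactly one of the two modes must be requested; other flag bits may still be set */
  DBUG_ASSERT(((flags & GET_LOCK_UNLOCK) != 0) !=
              ((flags & GET_LOCK_STORE_LOCKS) != 0));

Comparing the two boolean tests for inequality asserts "one or the other but not both", which restores what the old equality-based assert guaranteed while still allowing extra flag bits.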
DBUG_PRINT("info", ("count %d", count));
  for (i=lock_count=table_count=0 ; i < count ; i++)

------------------------------------------------------------
revno: 4508
committer: Sergey Vojtovich <svoj@mariadb.org>
branch nick: 10.0
timestamp: Tue 2014-10-21 16:20:39 +0400
message:
modified:
diff:
=== modified file 'sql/handler.cc'
--- sql/handler.cc 2014-11-13 10:01:31 +0000
+++ sql/handler.cc 2014-10-21 12:20:39 +0000
@@ -410,13 +410,16 @@ static int hton_ext_based_table_discover
 static void update_discovery_counters(handlerton *hton, int val)
 {
   if (hton->discover_table_existence == full_discover_for_existence)
-    my_atomic_add32(&need_full_discover_for_existence, val);
+    my_atomic_add32_explicit(&need_full_discover_for_existence, val,
+                             MY_MEMORY_ORDER_RELAXED);
   if (hton->discover_table_names)
-    my_atomic_add32(&engines_with_discover_table_names, val);
+    my_atomic_add32_explicit(&engines_with_discover_table_names, val,
+                             MY_MEMORY_ORDER_RELAXED);
   if (hton->discover_table)
-    my_atomic_add32(&engines_with_discover, val);
+    my_atomic_add32_explicit(&engines_with_discover, val,
+                             MY_MEMORY_ORDER_RELAXED);
I wouldn't do that. Technically there's a dependency between these counters and loaded engines, so the order, kind of, matters. I don't think there is any practical risk of bugs here, but, on the other hand, these counters are only updated when an engine is loaded or unloaded, so there's no need to bother with the relaxed memory ordering here.
Ok, I'll revert this.
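For readers following along, the distinction being discussed, restated with standard C++ atomics instead of the my_atomic wrappers (function names are mine; the counter name is taken from the diff purely for illustration):

#include <atomic>

std::atomic<int> need_full_discover_for_existence{0};

void add_seq_cst(int val)
{
  // roughly what plain my_atomic_add32() provides: the increment is ordered
  // with respect to surrounding loads and stores
  need_full_discover_for_existence.fetch_add(val);
}

void add_relaxed(int val)
{
  // roughly what my_atomic_add32_explicit(..., MY_MEMORY_ORDER_RELAXED) provides:
  // the increment is still atomic, but it may be reordered around other memory
  // operations, e.g. the engine becoming visible in the plugin list
  need_full_discover_for_existence.fetch_add(val, std::memory_order_relaxed);
}

On x86 both forms usually compile to the same lock-prefixed instruction; the relaxed form mainly pays off on weakly ordered CPUs such as POWER, and since these counters change only when an engine is installed or uninstalled, that saving is not worth the weaker ordering guarantee, which is the point above.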
}
int ha_finalize_handlerton(st_plugin_int *plugin)
=== modified file 'sql/mysqld.cc'
--- sql/mysqld.cc 2014-11-18 21:25:47 +0000
+++ sql/mysqld.cc 2014-10-21 12:20:39 +0000
@@ -3884,7 +3884,7 @@ static void my_malloc_size_cb_func(long
   }
   // workaround for gcc 4.2.4-1ubuntu4 -fPIE (from DEB_BUILD_HARDENING=1)
   int64 volatile * volatile ptr=&global_status_var.memory_used;
-  my_atomic_add64(ptr, size);
+  my_atomic_add64_explicit(ptr, size, MY_MEMORY_ORDER_RELAXED);
good!
  }
 }
=== modified file 'sql/mysqld.h'
--- sql/mysqld.h 2014-11-13 08:56:28 +0000
+++ sql/mysqld.h 2014-10-21 12:20:39 +0000
@@ -629,7 +629,7 @@ inline __attribute__((warn_unused_result
 {
   query_id_t id;
   my_atomic_rwlock_wrlock(&global_query_id_lock);
-  id= my_atomic_add64(&global_query_id, 1);
+  id= my_atomic_add64_explicit(&global_query_id, 1, MY_MEMORY_ORDER_RELAXED);
I believe this is fine
  my_atomic_rwlock_wrunlock(&global_query_id_lock);
  return (id);
}

=== modified file 'sql/table_cache.cc'
--- sql/table_cache.cc 2014-06-10 18:20:33 +0000
+++ sql/table_cache.cc 2014-10-21 12:20:39 +0000
@@ -139,7 +139,7 @@ uint tc_records(void)
 {
   uint count;
   my_atomic_rwlock_rdlock(&LOCK_tdc_atomics);
-  count= my_atomic_load32(&tc_count);
+  count= my_atomic_load32_explicit(&tc_count, MY_MEMORY_ORDER_RELAXED);
I'm not sure about this file. You should know better whether it's safe to use relaxed memory ordering here.
It should be Ok. This counter does not need strict memory ordering constraints.

Thanks,
Sergey