[Maria-developers] RFC - query cache insert
Hi guys! I was reading (again) the query cache source code of 10.0.3... please tell me if I understood the insert path correctly: when we insert into the query cache, the thread puts a lock on the query cache (nobody can get a result from the query cache), the thread reads all the data and writes it to the network (caching the result into the query cache as it goes), and after the thread really ends, the query cache is unlocked.

My doubts are:
- What happens if the query is very slow? Is the query cache disabled while the query is executing?
- What happens if two identical queries (the same SELECT) execute at the same time? Is the second one blocked until the first one finishes the query, and then it gets the result from the query cache, or does the first one lock the query cache while the second runs without the query cache?
- The last one: are there any query cache changes in mind? I was reading sql_cache.cc of old MySQL servers and there is a TODO list (very old, I think) that hasn't changed from old MySQL up to today's MariaDB 10.0.3. Is this TODO list tracked in the MySQL bug tracker, or in MariaDB Jira?

Thanks guys!
-- Roberto Spadim
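(For anyone following the thread: the caching behaviour asked about here can be watched from a second connection with the standard status counters; a minimal check, nothing 10.0-specific assumed:)

    SHOW GLOBAL STATUS LIKE 'Qcache%';
    -- Qcache_hits grows when a SELECT is answered from the cache,
    -- Qcache_inserts when a result is stored, and Qcache_not_cached
    -- when a statement is skipped (too big, uncacheable functions, etc.).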
05.08.2013 18:05, Roberto Spadim wrote:
Hi guys! I was reading (again) the query cache source code of 10.0.3... please tell me if I understood the insert path correctly:
when we insert into the query cache, the thread puts a lock on the query cache (nobody can get a result from the query cache), the thread reads all the data and writes it to the network (caching the result into the query cache), and after the thread really ends, the query cache is unlocked.
My doubts are... what happens if the query is very slow? Is the query cache disabled while the query is executing? What happens if two identical queries (the same SELECT) execute at the same time? Is the second one blocked until the first one finishes the query and then gets the result from the query cache, or does the first one lock the query cache while the second runs without the query cache?
You are wrong. The lock is taken only to register the query; then it is unlocked, so it is not tied to the time the query takes to execute. The QC can be disabled during query execution, and even the situation where two identical queries are processed at the same time was taken into consideration and works properly. We also take a lock when writing the result to the cache (when we intercept the network output), but again the lock time does not depend on the query execution time.
The last one: are there any query cache changes in mind? I was reading sql_cache.cc of old MySQL servers and there is a TODO list (very old, I think) that hasn't changed from old MySQL up to today's MariaDB 10.0.3. Is this TODO list in the MySQL bug tracker, or in MariaDB Jira?
There are some QC changes planned (to make it more lock-free for big (in terms of load) MariaDB databases); you can find the task in Jira: https://mariadb.atlassian.net/browse/MDEV-4454 [skip]
Humm, I will check the source and understand how it is done, thanks Oleksandr! But... another doubt...

Today the query cache has: query blocks (with the query and flags), table blocks (with the tables) and result blocks (with the results). Is there any way for two (or more) query blocks to link to the same result block?

For example: we execute the query "SELECT * FROM A" and we get one block of each kind (query, table, result). Now we execute the same query again, but instead of "SELECT * FROM A" we send "select * from A" (for example). We end up with two query blocks and two result blocks, but only one table block, so we waste the space of one result block.

The point is: could we cache the first query "SELECT * FROM A" together with the normalized version of the query, "SELECT ´a´.´a´,´a´.´b´ .... FROM A"? I haven't found where EXPLAIN EXTENDED builds the 'normalized' query, but it is something like the output of "EXPLAIN EXTENDED <query>; SHOW WARNINGS;". In this case we would have two query blocks (one for the raw query and one for the normalized query), one table block and one result block. We could then check the "raw" query cache and the "normalized" query cache, and when a new query hits the cache via the 'normalized' cache (after parsing the query) we only add the 'raw' query as a query block and return the results from the existing result block (instead of executing the query and spending disk I/O). For example:

1) "SELECT * FROM A" -> 1 raw query, 1 normalized query, 1 result, 1 table (this one is executed and consumes disk I/O). Here we have "SELECT * FROM A" (raw query) plus "SELECT ´a´.´a´,´a´.´b´ FROM ´database´.´a´" (normalized query).
2) "SELECT * FROm A" -> 2 raw queries, 1 normalized query, 1 result, 1 table (this one hits when we parse it and check the query cache via the normalized cache). Here we parse the raw query ("SELECT * FROm A") into "SELECT ´a´.´a´,´a´.´b´ FROM ´database´.´a´" (the normalized query), find that normalized query in the query cache, add the raw query to the query cache, link the result and table blocks to it, and return the result to the client without consuming disk I/O.
3) "SELECT * FRoM A" -> 3 raw queries, 1 normalized query, 1 result, 1 table. Same as (2).
4) "SELECT a,b FRoM A" -> 4 raw queries, 1 normalized query, 1 result, 1 table. Same as (2).

If we could do this we would reduce the memory consumption of the query cache (instead of 4 result blocks in use we have only one, in other words only 25% of the result-block memory). Another benefit is skipping the rest of execution when the query is found via the normalized query cache: we avoid the disk I/O and return the result from the query cache.

It's a raw idea and I don't know whether it can be done internally. I filed an MDEV about it some time ago but haven't started on it; I'm still studying the code and have some jobs in parallel with my "hobby" of being a MariaDB/MySQL "coder". The MDEV is here: https://mariadb.atlassian.net/browse/MDEV-4681

Thanks for your time
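(To make the 'normalized query' concrete: it is the statement EXPLAIN EXTENDED reports in its Note 1003 warning. A minimal sketch, assuming a table A(a,b) in a database named db; the exact quoting differs between versions:)

    CREATE TABLE db.A (a INT, b INT);
    EXPLAIN EXTENDED SELECT * FROM db.A;
    SHOW WARNINGS;
    -- the Note 1003 text looks roughly like:
    --   select `db`.`A`.`a` AS `a`, `db`.`A`.`b` AS `b` from `db`.`A`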
06.08.2013 20:49, Roberto Spadim wrote:
Humm, I will check the source and understand how it is done, thanks Oleksandr!
but... another doubt...
Today the query cache has: query blocks (with the query and flags), table blocks (with the tables) and result blocks (with the results). Is there any way for two (or more) query blocks to link to the same result block?
If there were an inexpensive way to detect queries with the same semantics, why not store them as one query? Why all these complications with linking to the same result (and the same tables, BTW)? But there is no inexpensive way to detect it (re-printing the query is quite expensive if we do it for every query). I hope that explains everything. [skip]
Hi Oleksandr!
Humm, I will check the source and understand how it is done,
thanks Oleksandr!
but... another doubt...
Today the query cache has: query blocks (with the query and flags), table blocks (with the tables) and result blocks (with the results). Is there any way for two (or more) query blocks to link to the same result block?
If there were an inexpensive way to detect queries with the same semantics, why not store them as one query? Why all these complications with linking to the same result (and the same tables, BTW)?
Hum, are you talking about rewriting the queries at their source (in the applications)? I think that is more complicated for some closed applications; I'm not thinking like an application developer now, I'm thinking like a DBA. About the links... that's how the query cache works today, isn't it? At least three kinds of blocks: one for queries and flags, another for tables and another for results; that's how the qc_info plugin reports it, at least. About a new kind of block: a 'raw' query will not be the same as a 'normalized' query (in most cases, at least). I think separate blocks for each kind of query would be nice, and they would allow a faster query cache hit from the parser (we spend less time on a QC hit after parsing)...
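(Side note for readers: the plugin referred to as qc_info ships with MariaDB as query_cache_info; a minimal way to inspect the cached query blocks, column names as in current releases:)

    INSTALL SONAME 'query_cache_info';
    SELECT statement_schema, statement_text, result_blocks_count, result_blocks_size
    FROM information_schema.QUERY_CACHE_INFO;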
But there is no inexpensive way to detect it (re-printing the query is quite expensive if we do it for every query). I hope that explains everything.
Yes, it is expensive, but the idea isn't to run it for every query. A "first version" would do it for every query (no problem), and after some tuning we could enable/disable this cost based on, for example: the query size (number of bytes in the query), the number of predicates (I don't remember the exact name, but the "field operator value" terms in the WHERE and HAVING clauses), the number of tables, the time spent in the parser, or an SQL_CACHE_EXTENDED option to force this cache (in which case it would only work for new applications). I haven't checked the time/CPU spent rewriting queries (I haven't found it in the source yet, but in time I will). The numbers I have are: 1) a query cache hit is < 0.1 ms, for a query cache with more than 10000 queries and a relatively small database at 300 qps (queries per second); 2) query execution is more than 1 ms on many disks (with some buffering/caching it is faster). If the rewrite is near 1 ms, I think that's OK for some kinds of workloads (for a database at 10000 qps it is maybe not good), but again, 1 ms is better than reading from disk...
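(For context, the SQL_CACHE_EXTENDED option proposed above is hypothetical; it would parallel the per-statement hints that already exist, which with query_cache_type=DEMAND control whether a result is cached at all; the table name here is just a placeholder:)

    SELECT SQL_CACHE    * FROM some_table;  -- ask for the result to be cached
    SELECT SQL_NO_CACHE * FROM some_table;  -- never cache this result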
[skip]
Thanks Oleksandr, ideas? -- Roberto Spadim
07.08.2013 18:44, Roberto Spadim wrote:
Hi Oleksandr!
Humm, I will check the source and understand how it is done,
thanks Oleksandr!
but... another doubt...
Today the query cache has: query blocks (with the query and flags), table blocks (with the tables) and result blocks (with the results). Is there any way for two (or more) query blocks to link to the same result block?
If there were an inexpensive way to detect queries with the same semantics, why not store them as one query? Why all these complications with linking to the same result (and the same tables, BTW)?
Hum, are you talking about rewriting the queries at their source (in the applications)? I think that is more complicated for some closed applications; I'm not thinking like an application developer now, I'm thinking like a DBA. About the links... that's how the query cache works today, isn't it? At least three kinds of blocks: one for queries and flags, another for tables and another for results; that's how the qc_info plugin reports it, at least. About a new kind of block: a 'raw' query will not be the same as a 'normalized' query (in most cases, at least). I think separate blocks for each kind of query would be nice, and they would allow a faster query cache hit from the parser (we spend less time on a QC hit after parsing)... Sorry, I did not understand the above, but it is not important (see below).
But there is no inexpensive way to detect it (re-printing the query is quite expensive if we do it for every query). I hope that explains everything.
Yes, it is expensive, but the idea isn't to run it for every query. A "first version" would do it for every query (no problem), and after some tuning we could enable/disable this cost based on, for example: the query size (number of bytes in the query), the number of predicates (I don't remember the exact name, but the "field operator value" terms in the WHERE and HAVING clauses), the number of tables, the time spent in the parser, or an SQL_CACHE_EXTENDED option to force this cache (in which case it would only work for new applications).
I haven't checked the time/CPU spent rewriting queries (I haven't found it in the source yet, but in time I will). The numbers I have are: 1) a query cache hit is < 0.1 ms, for a query cache with more than 10000 queries and a relatively small database at 300 qps (queries per second); 2) query execution is more than 1 ms on many disks (with some buffering/caching it is faster).
If the rewrite is near 1 ms, I think that's OK for some kinds of workloads (for a database at 10000 qps it is maybe not good), but again, 1 ms is better than reading from disk... There is another big flaw in your idea.
When we return a result from the QC we do it _BEFORE_ parsing (and that is the QC's huge advantage), so it is impossible to print the query in a unified form at that point. To do that we would have to parse it. [skip]
Hum... check whether I'm thinking wrong about the internal flow of MariaDB/MySQL...
- client sends data over the network
- server receives the data
- server processes the data and sees that it is a query command
- server checks the query cache and returns the result if found (that's how the query cache executes today, right?)
- (from here on, no query was found in the query cache)
- server parses/optimizes the query (this is where I want to include the 'normalized' query cache check)
- server executes the query
- server sends the result to the client (caching it in the query cache when possible)
- server ends the query command
Hi! 08.08.2013 08:05, Roberto Spadim wrote:
Hum... check whether I'm thinking wrong about the internal flow of MariaDB/MySQL...
- client sends data over the network
- server receives the data
- server processes the data and sees that it is a query command
- server checks the query cache and returns the result if found (that's how the query cache executes today, right?)
- (from here on, no query was found in the query cache)
- server parses/optimizes the query (this is where I want to include the 'normalized' query cache check)
- server executes the query
- server sends the result to the client (caching it in the query cache when possible)
- server ends the query command
It could be put just after parsing (you do not need the optimizer). But again, it is a huge overhead (in this scheme the query cache has to be checked twice), and one would need a very, very high hit rate of differently written but semantically identical queries to get any gain from it. So it is an idea for a very specific load which we do not meet in the "wild".
Hi Oleksandr! I understood some of the points... but let me explain better...

About the optimizer part, I was talking about constant propagation and other optimizations, like these (see the sketch after this message):
1) "select .. where 0=0 and a=1 and a=2" optimized to: "0=0 and a=1 and a=2" => "true and false" => "always false" (OK, here execution is fast anyway, but it is just an example of an optimization)
2) "select ... where a in (1,2,3,4) and a>2 AND a<4" optimized to "select ... where a=3"
With only a rewrite of the parsed query (without optimization) we only gain rewriting "FrOm" to "FROM", removing whitespace and other simple "normalizations"; that is not a good gain (and it consumes CPU/memory). Since we need to optimize the query anyway (to execute it), that is not new load on MySQL/MariaDB.

-----
(A) The 'new' load in this case is:
1) the time to write the 'normalized' query (after optimization): CPU, plus memory for writing the normalized query, plus increased lock time for MyISAM table locks and other locks
2) checking the 'normalized' query cache (after the normalized query is written): CPU to find the query in the query cache, plus query cache lock overhead, plus increased lock time, plus memory to store the normalized queries
Is there another important overhead that I don't see? For (1), could we use the time spent by EXPLAIN EXTENDED to estimate how long it takes?

-----
(B) Talking about implementation... I was thinking of something like:
1) check whether we should attempt a normalized query cache hit, considering the raw query length, the number of tables, the number of comparisons, maybe other prunes; this avoids the additional load of a normalized query write, which is the main problem of this kind of implementation, right?
2) add a write function after optimization to write the normalized query
3) check the normalized query cache; if found:
3.1) link the current raw query to the normalized query cache
3.2) return the result to the client
3.3) end the current query (remove locks, etc.)
4) when it is not found in the normalized query cache, continue normal query parsing/execution
5) link the raw query to the normalized query cache
Is my "high level" implementation as simple as I think, or is the execution a bit more complicated? Since I'm not a guru of the MySQL/MariaDB source code, are there locks and other flows that a "beginner coder" (me) doesn't see?

-----
(C) I read some (3 or 4) books about MySQL internals, but they don't explain how to change or understand the current optimizer, only how to implement a new optimizer/parser :/ All I know about MySQL optimizations comes from reading the mailing list and the source code and running a debugger. Do you know of a book that covers the internal optimizer?

Well, thanks again for your time! :)
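(A sketch of how to see such condition rewrites from the SQL level; the table and column are placeholders, and the exact folded form depends on the optimizer version:)

    CREATE TABLE t (a INT, KEY(a));
    EXPLAIN EXTENDED SELECT * FROM t WHERE 0=0 AND a=1 AND a=2;
    SHOW WARNINGS;
    -- the plan typically reports "Impossible WHERE", and the Note 1003 warning
    -- shows the statement after the optimizer has simplified the condition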
Oops... I forgot one more question, about something I don't know how it is executed today: EXPLAIN EXTENDED creates the "normalized query" after parsing/optimizing but does not execute the query. Am I thinking right, or did I miss something?
08.08.2013 09:11, Roberto Spadim wrote:
Oops... I forgot one more question, about something I don't know how it is executed today: EXPLAIN EXTENDED creates the "normalized query" after parsing/optimizing but does not execute the query. Am I thinking right, or did I miss something?
Yes; it may only execute constant expressions, and only if they are relatively cheap. But again, there are not a lot of cases that make this scheme "profitable" (actually I find it hard to believe that there is even one real-world installation that requires it, especially once you get the optimizer involved).
I ran some tests with SET profiling=1; <execute query>; SET profiling=0;
With EXPLAIN, without EXTENDED, it takes ~0.0014 s (on my test query, a UNION of two queries over two tables with a single join).
With EXTENDED it takes ~0.0019 s (an increase of 0.0005 s, ~35% overhead for the EXPLAIN).
Maybe 0.5 ms is not a bad thing for some queries. I will check more about the query rewrite done by EXPLAIN EXTENDED, but I think it is not a big problem for some workloads.
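(For reference, the measurement described above can be reproduced roughly like this; t1, t2 and the join are placeholders for the real test query:)

    SET profiling = 1;
    EXPLAIN          SELECT * FROM t1 JOIN t2 USING (id);
    EXPLAIN EXTENDED SELECT * FROM t1 JOIN t2 USING (id);
    SET profiling = 0;
    SHOW PROFILES;   -- compare the Duration column of the two EXPLAIN statements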