Re: [Maria-developers] Review request: SHOW EXPLAIN, attempt 2

8 Aug 2012

Hi Sergei,

On Tue, Aug 07, 2012 at 01:37:33PM +0200, Sergei Golubchik wrote:
...
On Jul 17, Sergei Petrunia wrote:
...
...
...
=== modified file 'sql/sql_join_cache.cc'

--- sql/sql_join_cache.cc	2012-03-24 17:21:22 +0000
+++ sql/sql_join_cache.cc	2012-05-16 20:59:03 +0000
@@ -2236,7 +2236,7 @@ enum_nested_loop_state JOIN_CACHE::join_
while (!(error= join_tab_scan->next()))   
   {
-    if (join->thd->killed)
+    if (join->thd->check_killed())
I would recommend to rename thd->killed (for example, to thd->killed_flag)
just to make sure no code uses thd->killed the old way,
and - more importantly - that we won't get new thd->killed uses in
merges from MySQL.
psergey:
I've tried renaming. This required to make changes in 160 places, and was a
1600-line patch:
http://s.petrunia.net/scratch/psergey-rename-killed-to-killed_flag.diff
Did you really intend to make such a big change?
Yes. Looking at this big patch I see that we have lots of places where
thd->killed is checked directly, instead of using thd->check_killed(),
that is, there can be long delays where APC requests won't be served
(despite the fact that KILL works there).
But if you feel uneasy doing this, let's create a separate jira task for
"refactor KILL to work via APC" and I will do it.
Filed https://mariadb.atlassian.net/browse/MDEV-442
...
...
...
...
=== modified file 'sql/sql_lex.h'
--- sql/sql_lex.h	2012-05-21 13:30:25 +0000
+++ sql/sql_lex.h	2012-07-10 17:23:00 +0000
@@ -2344,7 +2357,8 @@ struct LEX: public Query_tables_list
   char *backup_dir;				/* For RESTORE/BACKUP */
   char* to_log;                                 /* For PURGE MASTER LOGS TO */
   char* x509_subject,*x509_issuer,*ssl_cipher;
-  String *wild;
+  String *wild; /* Wildcard in SHOW {something} LIKE 'wild'*/ 
+  Item *show_explain_for_thread; /* id in SHOW EXPLAIN FOR id */
Why do extend LEX even more?
KILL, for example, uses LEX::value_list
psergey:
KILL does NOT use INFORMATION_SCHEMA. This is why it is able to use
LEX::value_list.
Could you please explain, why using INFORMATION_SCHEMA prevents you
from using LEX::value_list ?
Ok, we've figured there's no valid reason. Started to use LEX::value_list.
...
...
...
...
=== modified file 'sql/sql_parse.cc'
--- sql/sql_parse.cc	2012-05-21 18:54:41 +0000
+++ sql/sql_parse.cc	2012-07-10 17:23:00 +0000
@@ -2143,6 +2144,32 @@ mysql_execute_command(THD *thd)
     execute_show_status(thd, all_tables);
     break;
   }
+  case SQLCOM_SHOW_EXPLAIN:
+  {
+    if (!thd->security_ctx->priv_user[0] &&
+        check_global_access(thd,PROCESS_ACL))
+      break;
+
+    /*
+      The select should use only one table, it's the SHOW EXPLAIN pseudo-table
+    */
+    if (lex->sroutines.records || lex->query_tables->next_global)
+    {
+      my_error(ER_NOT_SUPPORTED_YET, MYF(0), "Usage of subqueries or stored "
+               "function calls as part of this statement");
Ok. Here I'd suggest to create a new error message
ER_MSG_SUBQUERIES_OR_ROUTINES
  "Usage of subqueries or stored function calls as part of this statement"
and use it as
my_error(ER_NOT_SUPPORTED_YET, MYF(0), ER(ER_MSG_SUBQUERIES_OR_ROUTINES));
also in SQLCOM_KILL and SQLCOM_CREATE_EVENT cases.
psergey:
I've changed SHOW EXPLAIN code not to use this error (it uses
ER_SET_CONSTANTS_ONLY now). Is that ok?
Yes, thanks.
...
...
...
=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc	2012-06-04 15:26:11 +0000
+++ sql/sql_select.cc	2012-07-05 20:28:30 +0000
@@ -928,6 +979,13 @@ bool JOIN::prepare_stage2()
 }
+int JOIN::optimize()
+{
+  int res= optimize_inner();
+  if (!res)
+    have_query_plan= QEP_AVAILABLE;
Why wouldn't you simply set have_query_plan from the old JOIN::optimize,
I mean, why did you have to move it to optimize_inner() ?
I thought it's because in some cases you need to invoke optimize_inner()
directly, without setting have_query_plan. But you don't seem to do it
anywhere.
psergey:
There are multiple exits from JOIN::optimize, and I wanted to cover them all
with one statement.
I wouldn't do it that way, but it's your task.
Okay.
...
...
...
=== modified file 'sql/sql_show.cc'
--- sql/sql_show.cc	2012-05-21 18:54:41 +0000
+++ sql/sql_show.cc	2012-07-10 17:23:00 +0000
@@ -1998,6 +1998,124 @@ void mysqld_list_processes(THD *thd,cons
   DBUG_VOID_RETURN;
 }
+
+static 
+const char *target_not_explainable_cmd="Target is not running EXPLAINable command";
+
+/*
+  Store the SHOW EXPLAIN output in the temporary table.
+*/
+
+int fill_show_explain(THD *thd, TABLE_LIST *table, COND *cond)
+{
+  const char *calling_user;
+  THD *tmp;
+  my_thread_id  thread_id;
+  DBUG_ENTER("fill_show_explain");
+
+  DBUG_ASSERT(cond==NULL);
+  thread_id= thd->lex->show_explain_for_thread->val_int();
+  calling_user= (thd->security_ctx->master_access & PROCESS_ACL) ?  NullS :
+                 thd->security_ctx->priv_user;
+  /* 
+    Find the thread we need EXPLAIN for. Thread search code was copied from
+    kill_one_thread()
Perhaps, it'd make sense to extract this code to a separate function and
reuse it?
psergey:
It's a 13-line loop, I don't think it makes sense to extract it into a
separate function. The separate function will have a complex calling
convention (it returns with acquired mutex), so I will end up adding 
more code than I have removed.
I've just tried it in your tree (5.5-show-explain).
26 lines added, 34 lines removed. It adds less code than it removes :)
But more importantly, if something will need to be changed in this loop,
only one place will need to be changed.
Here's the function that I've used (copied from kill_one_thread):
THD *find_thread_by_id(ulong id)
  {
    THD *tmp;
    mysql_mutex_lock(&LOCK_thread_count); // For unlink from list
    I_List_iterator<THD> it(threads);
    while ((tmp=it++))
    {
      if (tmp->command == COM_DAEMON)
        continue;
      if (tmp->thread_id == id)
      {
        mysql_mutex_lock(&tmp->LOCK_thd_data);    // Lock from delete
        break;
      }
    }
    mysql_mutex_unlock(&LOCK_thread_count);
    return tmp;
  }
Ok, thanks. I have put the function in.
...
...
...
...
+  */
+  mysql_mutex_lock(&LOCK_thread_count); // For unlink from list
+  I_List_iterator<THD> it(threads);
+  while ((tmp=it++))
+  {
=== added file 'sql/my_apc.cc'
--- sql/my_apc.cc	1970-01-01 00:00:00 +0000
+++ sql/my_apc.cc	2012-07-10 17:23:00 +0000
...
+/* 
+  Enter ther state where the target is available for serving APC requests
+*/
+void Apc_target::enable()
+{
+  /* Ok to do without getting/releasing the mutex: */
+  enabled++;
+}
My comment in the first review was
2. I don't see why you need enable/disable calls at all.
  You use that to disable SHOW EXPLAIN for certain parts of the code,
  but this is wrong. Apc is not SHOW EXPLAIN, you should not disable the
  whole apc mechanism only because SHOW EXPLAIN cannot be served.
  Instead the show-explain-generating callback function should
  detect when it can and cannot be used and react appropriately.
And your reply was
(as discussed on the phone) will try to
  - get the APC mechanism to work always, including ability to wake up
  threads that are sleeping/waiting.
  - EXPLAIN will check whether the APC is available in the context of the
  target thread.
psergey:
But haven't we later come to conclusion that it is not currently
possible to implement? 
...
Hence, we have decided not to implement always-wake-up system.
My recollection is that we agreed to ignore this limitation for now.
That is, in case of a THD waiting for network and a thread-pool
enabled, SHOW EXPLAIN will time out, but as SHOW EXPLAIN is useless
anyway for connections in the "idle" state, it's not a big deal.
When we'll have more APC users we'll get back to it.
But again, if you'd like, let's keep it your way, with enabled/disabled.
And move this change to the "refactor KILL to work via APC" task, that I
mentioned above.
Done, added to task.
...
...
Your review is trying to pull stuff in different directions. On one
hand, INFORMATION_SCHEMA support is added for querying EXPLAINs from
many threads.  On the other hand, you want APC system to do the basic
check of whether EXPLAIN can be produced in the context of the target
thread. This way,
SELECT * FROM information_schema.explain
will need to switch context to every thread on the server. Does this
make any sense?
But you didn't create the I_S table, right? I mean, it's marked as
hidden and can not be selected from, only SHOW command works?
Yes. The table is currently marked as hidden.
...
I was contemplating that when we need to send APC requests to many
threads at once, we could do it in parallel - post all requests, then
wait for all of them to complete. But there's no need to do it now -
it's a future extension, we'll consider it when we'll have a use case.
...
...
...
=== added file 'sql/my_apc.h'
--- sql/my_apc.h	1970-01-01 00:00:00 +0000
+++ sql/my_apc.h	2012-07-11 09:39:56 +0000
@@ -0,0 +1,134 @@
... 
I'm sorry, I do not understand the point of "Why did you not do X"
questions.  There are nearly infinitely many things I could have done,
but I needed to produce exactly one patch, and that required me to
stop at some variant.  I stopped at a C-type function call. Per your
request, I changed it to a C++ functor. What else do you need?
Does having four parameters to make_apc_call() function cause any
defect? If yes, could you please explicitly mention it?
I'm just trying to understand your logic, when it is not obvious from
the code. If I'd wanted you to change something, I'd wrote "please
change this". But here I'm merely asking, I'm not trying to imply that
there are defects of that some other way of doing this piece of code
would be better. No hidden meaning :)
BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog