[Maria-developers] Possible race condition or NULL pointer triggered by OQGraph
Hi all (Cross-posted to oqgraph-developers, maria-developers) I am trying to track down a segfault apparently triggered by concurrent execution of queries through OQGraph - for context, see: https://mariadb.atlassian.net/browse/MDEV-6282 I am however a bit confused as to what is going on, because at least some of the information leads me to suspect that there could be more going on here than just a simple race condition. --Immediate Cause-- On the surface, a segfault appears to be caused in a method in sql_class.h, Statement::check_limit_rows_examined() dereferencing a NULL pointer, `lex`. The method check_limit_rows_examined() was called from sql_class.h method handler::increment_statistics() . --Analysis-- A quick inspection reveals that a NULL value for `lex` is actually perfectly valid some of the time, although I am nowhere near familiar enough with the code to really understand when or why. Specifically, `lex` is a member of the base class Statement, which is initialised to either point to THD::main_lex via a constructor , or intentionally to NULL from construction of an object Prepared_statement() (sql_prepare.cc) To my mind, if there is going to be a possibility of a pointer being NULL, then all uses of that pointer should be checking for that. I am hoping that the bug reporter can prove or otherwise my hypothesis with his particular scenario, although I am suspicious that there may be additional issues to be dealt with; it is quite possible that lex being NULL is a red herring, being caused by a race condition memory overrun elsewhere. However, I think it is still a good idea to NULL check pointers that could be set to NULL intentionally. I can submit a pull request if the following patch is agreeable: diff --git a/sql/sql_class.h b/sql/sql_class.h index 5898d9e..957acf3 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -2475,8 +2475,10 @@ public: */ void check_limit_rows_examined() { - if (++accessed_rows_and_keys > lex->limit_rows_examined_cnt) + ++accessed_rows_and_keys; + if (lex && accessed_rows_and_keys > lex->limit_rows_examined_cnt) { killed= ABORT_QUERY; + } } --Aside-- Immediately prior to the check_limit_rows_examined() call is a call to macro status_var_increment(). Comments around the macro indicate it is often likely to be called in a mutli-threaded context, but the data is such that it is possible to disable protection with the only side effect being an occasionally 'inaccurate' result. Is it possible however, that check_limit_rows_examined() is not thread-safe and should have some protection? cheers, Andrew --A
Hi, Andrew! On Jun 02, Andrew McDonnell wrote:
Hi all
(Cross-posted to oqgraph-developers, maria-developers)
I am trying to track down a segfault apparently triggered by concurrent execution of queries through OQGraph - for context, see:
https://mariadb.atlassian.net/browse/MDEV-6282
I am however a bit confused as to what is going on, because at least some of the information leads me to suspect that there could be more going on here than just a simple race condition.
--Immediate Cause--
On the surface, a segfault appears to be caused in a method in sql_class.h, Statement::check_limit_rows_examined() dereferencing a NULL pointer, `lex`. The method check_limit_rows_examined() was called from sql_class.h method handler::increment_statistics() .
I don't see how thd->lex could be NULL there. It shouldn't be changed concurrently on anything. Can you actually repeat the crash? Regards, Sergei
Hi Sergei I cant, but Heinz has managed to (see https://mariadb.atlassian.net/browse/MDEV-6282) The value of lex at that point is NULL in the core I guess if that situation is impossible by design, this is more likely to be a memory overrun instead... In any case, I got Heinz to try my fix and the crash still happened in the same spot. So I think my next step at this point is valgrind... --A On 02/06/14 20:53, Sergei Golubchik wrote:
Hi, Andrew!
On Jun 02, Andrew McDonnell wrote:
Hi all
(Cross-posted to oqgraph-developers, maria-developers)
I am trying to track down a segfault apparently triggered by concurrent execution of queries through OQGraph - for context, see:
https://mariadb.atlassian.net/browse/MDEV-6282
I am however a bit confused as to what is going on, because at least some of the information leads me to suspect that there could be more going on here than just a simple race condition.
--Immediate Cause--
On the surface, a segfault appears to be caused in a method in sql_class.h, Statement::check_limit_rows_examined() dereferencing a NULL pointer, `lex`. The method check_limit_rows_examined() was called from sql_class.h method handler::increment_statistics() . I don't see how thd->lex could be NULL there. It shouldn't be changed concurrently on anything.
Can you actually repeat the crash?
Regards, Sergei
On Monday 02 June 2014 13:23:54 Sergei Golubchik wrote:
Hi, Andrew!
On Jun 02, Andrew McDonnell wrote:
Hi all
(Cross-posted to oqgraph-developers, maria-developers)
I am trying to track down a segfault apparently triggered by concurrent execution of queries through OQGraph - for context, see:
https://mariadb.atlassian.net/browse/MDEV-6282
I am however a bit confused as to what is going on, because at least some of the information leads me to suspect that there could be more going on here than just a simple race condition.
--Immediate Cause--
On the surface, a segfault appears to be caused in a method in sql_class.h, Statement::check_limit_rows_examined() dereferencing a NULL pointer, `lex`. The method check_limit_rows_examined() was called from sql_class.h method handler::increment_statistics() .
I don't see how thd->lex could be NULL there. It shouldn't be changed concurrently on anything.
Can you actually repeat the crash?
I haven't been able so far to come up with a nice, contained, general test case. However, I can very easily reproduce the crash, on both a CentOS 6 system using MariaDB's rpms as well as on my Slackware systems using a self- compiled version. Grs, Heinz
participants (3)
-
Andrew McDonnell
-
Heinz Wiesinger
-
Sergei Golubchik