On 11/16/2013 09:11 PM, Pavel Ivanov wrote:
On Sat, Nov 16, 2013 at 2:39 AM, Alexander Barkov <bar@mariadb.org> wrote:
I see you've included the full source of PCRE library into MariaDB tree and even made some local modifications which are not included into PCRE upstream. Could you please explain why did you do that? Why not allow to build with system standard PCRE library?
I understand that you could have needed some modifications and thus could want to include sources from PCRE trunk to allow users to build with modifications when their system-wide library is too old. But as I don't see your modifications on PCRE trunk that means you probably had something different in mind, and decided to incur the pain of constantly updating the included sources and merging security fixes. So why did you decide to do that?
There is a bug in the PCRE library. It crashes in the function compile_regex() because of the stack overrun in case when the pattern consists of a deep enough parenthesizes level like this:
SELECT 'x' RLIKE CONCAT(REPEAT('(',300), 'x', REPEAT(')',300));
I reported the problem to Philip Hazel (the author and the main maintainer of PCRE). Philip already made a fix in the PCRE sources to address this issue.
The Philip's fix is different comparing to our version, this is probably why you did not recognize it in the PCRE's trunk.
Our fix makes compile_regex() watch the available stack size through a callback function and stop the recursion when the execution is near to run out of stack. It works very well with all possible values of the thread_stack MariaDB system variable, which is quite small by default (256 Kb).
The Philip's version of the fix limits recursion depth to 250, which should be enough for any reasonable regular expression, and should *hopefully* not run out of stack. This fix will be included into the next release PCRE-8.34, which is planned around Christmas time.
After the fixed version of PCRE is released we'll check if it solves the problem on all supported platforms with the minimum possible value of the MariaDB system variable "thread_stack". In case it does, we'll add an option to compile MariaDB against the system installed PCRE library instead of the bundled version. Otherwise, we'll report to Philip again asking again to consider including our fix.
That makes sense. And I saw the commit in PCRE trunk http://vcs.pcre.org/viewvc?view=revision&revision=1389 and thought it is related. But I wonder why you don't take that change now, apply it to the local PCRE sources and test if it works for MariaDB without waiting for 8.34 release? If it works you can include sources with that change to avoid anybody's confusion, if it doesn't work Philip will be able to fix that before 8.34 release. Did you consider doing this?
Sorry, I did not. Being busy with something else now. The bundled PCRE version works fine, and we don't have any hurry to upgrade to a newer version. I'm not sure why bundling PCRE with some our fixes should be confusing. We bundle many libraries with our own fixes. Before the 10.0.5 release we've bundled the Henry Spenser's regex library with our own modifications for many many years. No one was ever confused about that :) So why do you think it's confusing?
But besides the fix for deep parenthesis there's another difference. pcreposix.h in MariaDB has macros REG_ATOI and REG_ITOA. They are used in MariaDB and they are not in PCRE 8.33 and not in PCRE trunk. Why these macros were included here and not e.g. in my_global.h or some more narrow-scoped header?
This particular change can be removed. It's not really needed anymore. I'll cleanup this at the same time with upgrading to PCRE-8.34 and adding support for an external PCRE library. Just created a task for this: https://mariadb.atlassian.net/browse/MDEV-5304
Pavel