[Maria-developers] Why local modifications to PCRE library?
Alexander, I see you've included the full source of PCRE library into MariaDB tree and even made some local modifications which are not included into PCRE upstream. Could you please explain why did you do that? Why not allow to build with system standard PCRE library? I understand that you could have needed some modifications and thus could want to include sources from PCRE trunk to allow users to build with modifications when their system-wide library is too old. But as I don't see your modifications on PCRE trunk that means you probably had something different in mind, and decided to incur the pain of constantly updating the included sources and merging security fixes. So why did you decide to do that? Thank you, Pavel
Hello Pavel, On 11/16/2013 11:11 AM, Pavel Ivanov wrote:
Alexander,
I see you've included the full source of PCRE library into MariaDB tree and even made some local modifications which are not included into PCRE upstream. Could you please explain why did you do that? Why not allow to build with system standard PCRE library?
I understand that you could have needed some modifications and thus could want to include sources from PCRE trunk to allow users to build with modifications when their system-wide library is too old. But as I don't see your modifications on PCRE trunk that means you probably had something different in mind, and decided to incur the pain of constantly updating the included sources and merging security fixes. So why did you decide to do that?
There is a bug in the PCRE library. It crashes in the function compile_regex() because of the stack overrun in case when the pattern consists of a deep enough parenthesizes level like this: SELECT 'x' RLIKE CONCAT(REPEAT('(',300), 'x', REPEAT(')',300)); I reported the problem to Philip Hazel (the author and the main maintainer of PCRE). Philip already made a fix in the PCRE sources to address this issue. The Philip's fix is different comparing to our version, this is probably why you did not recognize it in the PCRE's trunk. Our fix makes compile_regex() watch the available stack size through a callback function and stop the recursion when the execution is near to run out of stack. It works very well with all possible values of the thread_stack MariaDB system variable, which is quite small by default (256 Kb). The Philip's version of the fix limits recursion depth to 250, which should be enough for any reasonable regular expression, and should *hopefully* not run out of stack. This fix will be included into the next release PCRE-8.34, which is planned around Christmas time. After the fixed version of PCRE is released we'll check if it solves the problem on all supported platforms with the minimum possible value of the MariaDB system variable "thread_stack". In case it does, we'll add an option to compile MariaDB against the system installed PCRE library instead of the bundled version. Otherwise, we'll report to Philip again asking again to consider including our fix.
Thank you, Pavel
On Sat, Nov 16, 2013 at 2:39 AM, Alexander Barkov <bar@mariadb.org> wrote:
I see you've included the full source of PCRE library into MariaDB tree and even made some local modifications which are not included into PCRE upstream. Could you please explain why did you do that? Why not allow to build with system standard PCRE library?
I understand that you could have needed some modifications and thus could want to include sources from PCRE trunk to allow users to build with modifications when their system-wide library is too old. But as I don't see your modifications on PCRE trunk that means you probably had something different in mind, and decided to incur the pain of constantly updating the included sources and merging security fixes. So why did you decide to do that?
There is a bug in the PCRE library. It crashes in the function compile_regex() because of the stack overrun in case when the pattern consists of a deep enough parenthesizes level like this:
SELECT 'x' RLIKE CONCAT(REPEAT('(',300), 'x', REPEAT(')',300));
I reported the problem to Philip Hazel (the author and the main maintainer of PCRE). Philip already made a fix in the PCRE sources to address this issue.
The Philip's fix is different comparing to our version, this is probably why you did not recognize it in the PCRE's trunk.
Our fix makes compile_regex() watch the available stack size through a callback function and stop the recursion when the execution is near to run out of stack. It works very well with all possible values of the thread_stack MariaDB system variable, which is quite small by default (256 Kb).
The Philip's version of the fix limits recursion depth to 250, which should be enough for any reasonable regular expression, and should *hopefully* not run out of stack. This fix will be included into the next release PCRE-8.34, which is planned around Christmas time.
After the fixed version of PCRE is released we'll check if it solves the problem on all supported platforms with the minimum possible value of the MariaDB system variable "thread_stack". In case it does, we'll add an option to compile MariaDB against the system installed PCRE library instead of the bundled version. Otherwise, we'll report to Philip again asking again to consider including our fix.
That makes sense. And I saw the commit in PCRE trunk http://vcs.pcre.org/viewvc?view=revision&revision=1389 and thought it is related. But I wonder why you don't take that change now, apply it to the local PCRE sources and test if it works for MariaDB without waiting for 8.34 release? If it works you can include sources with that change to avoid anybody's confusion, if it doesn't work Philip will be able to fix that before 8.34 release. Did you consider doing this? But besides the fix for deep parenthesis there's another difference. pcreposix.h in MariaDB has macros REG_ATOI and REG_ITOA. They are used in MariaDB and they are not in PCRE 8.33 and not in PCRE trunk. Why these macros were included here and not e.g. in my_global.h or some more narrow-scoped header? Pavel
On 11/16/2013 09:11 PM, Pavel Ivanov wrote:
On Sat, Nov 16, 2013 at 2:39 AM, Alexander Barkov <bar@mariadb.org> wrote:
I see you've included the full source of PCRE library into MariaDB tree and even made some local modifications which are not included into PCRE upstream. Could you please explain why did you do that? Why not allow to build with system standard PCRE library?
I understand that you could have needed some modifications and thus could want to include sources from PCRE trunk to allow users to build with modifications when their system-wide library is too old. But as I don't see your modifications on PCRE trunk that means you probably had something different in mind, and decided to incur the pain of constantly updating the included sources and merging security fixes. So why did you decide to do that?
There is a bug in the PCRE library. It crashes in the function compile_regex() because of the stack overrun in case when the pattern consists of a deep enough parenthesizes level like this:
SELECT 'x' RLIKE CONCAT(REPEAT('(',300), 'x', REPEAT(')',300));
I reported the problem to Philip Hazel (the author and the main maintainer of PCRE). Philip already made a fix in the PCRE sources to address this issue.
The Philip's fix is different comparing to our version, this is probably why you did not recognize it in the PCRE's trunk.
Our fix makes compile_regex() watch the available stack size through a callback function and stop the recursion when the execution is near to run out of stack. It works very well with all possible values of the thread_stack MariaDB system variable, which is quite small by default (256 Kb).
The Philip's version of the fix limits recursion depth to 250, which should be enough for any reasonable regular expression, and should *hopefully* not run out of stack. This fix will be included into the next release PCRE-8.34, which is planned around Christmas time.
After the fixed version of PCRE is released we'll check if it solves the problem on all supported platforms with the minimum possible value of the MariaDB system variable "thread_stack". In case it does, we'll add an option to compile MariaDB against the system installed PCRE library instead of the bundled version. Otherwise, we'll report to Philip again asking again to consider including our fix.
That makes sense. And I saw the commit in PCRE trunk http://vcs.pcre.org/viewvc?view=revision&revision=1389 and thought it is related. But I wonder why you don't take that change now, apply it to the local PCRE sources and test if it works for MariaDB without waiting for 8.34 release? If it works you can include sources with that change to avoid anybody's confusion, if it doesn't work Philip will be able to fix that before 8.34 release. Did you consider doing this?
Sorry, I did not. Being busy with something else now. The bundled PCRE version works fine, and we don't have any hurry to upgrade to a newer version. I'm not sure why bundling PCRE with some our fixes should be confusing. We bundle many libraries with our own fixes. Before the 10.0.5 release we've bundled the Henry Spenser's regex library with our own modifications for many many years. No one was ever confused about that :) So why do you think it's confusing?
But besides the fix for deep parenthesis there's another difference. pcreposix.h in MariaDB has macros REG_ATOI and REG_ITOA. They are used in MariaDB and they are not in PCRE 8.33 and not in PCRE trunk. Why these macros were included here and not e.g. in my_global.h or some more narrow-scoped header?
This particular change can be removed. It's not really needed anymore. I'll cleanup this at the same time with upgrading to PCRE-8.34 and adding support for an external PCRE library. Just created a task for this: https://mariadb.atlassian.net/browse/MDEV-5304
Pavel
Am 16.11.2013 19:39, schrieb Alexander Barkov:
I'm not sure why bundling PCRE with some our fixes should be confusing. We bundle many libraries with our own fixes. Before the 10.0.5 release we've bundled the Henry Spenser's regex library with our own modifications for many many years. No one was ever confused about that :) So why do you think it's confusing?
it violates the concept of shared libraries and is *highly* disapproved by most Linux distributions like Fedora which are doing a hard work downstream to unbundle all this stuff which wastes a lot of energy all over the world https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries
Reindl, On 11/16/2013 10:46 PM, Reindl Harald wrote:
Am 16.11.2013 19:39, schrieb Alexander Barkov:
I'm not sure why bundling PCRE with some our fixes should be confusing. We bundle many libraries with our own fixes. Before the 10.0.5 release we've bundled the Henry Spenser's regex library with our own modifications for many many years. No one was ever confused about that :) So why do you think it's confusing?
it violates the concept of shared libraries and is *highly* disapproved by most Linux distributions like Fedora which are doing a hard work downstream to unbundle all this stuff which wastes a lot of energy all over the world
https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries
Thanks for the link. I have added it into the task description: https://mariadb.atlassian.net/browse/MDEV-5304
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Reindl, On 11/16/2013 10:46 PM, Reindl Harald wrote:
Am 16.11.2013 19:39, schrieb Alexander Barkov:
I'm not sure why bundling PCRE with some our fixes should be confusing. We bundle many libraries with our own fixes. Before the 10.0.5 release we've bundled the Henry Spenser's regex library with our own modifications for many many years. No one was ever confused about that :) So why do you think it's confusing?
it violates the concept of shared libraries and is *highly* disapproved by most Linux distributions like Fedora which are doing a hard work downstream to unbundle all this stuff which wastes a lot of energy all over the world
https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries
Thanks for the link. I have added it into the task description: https://mariadb.atlassian.net/browse/MDEV-5304 For now, we have to wait for PCRE-8.34 release anyway. Adding an option to compile against the external PCRE-8.33 is not a good idea: any MariaDB user that have a valid user and password would be able to crash the server by sending a dangerous pattern to RLIKE. We can't do that.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Am 16.11.2013 19:58, schrieb Alexander Barkov:
Reindl,
On 11/16/2013 10:46 PM, Reindl Harald wrote:
Am 16.11.2013 19:39, schrieb Alexander Barkov:
I'm not sure why bundling PCRE with some our fixes should be confusing. We bundle many libraries with our own fixes. Before the 10.0.5 release we've bundled the Henry Spenser's regex library with our own modifications for many many years. No one was ever confused about that :) So why do you think it's confusing?
it violates the concept of shared libraries and is *highly* disapproved by most Linux distributions like Fedora which are doing a hard work downstream to unbundle all this stuff which wastes a lot of energy all over the world
https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries
Thanks for the link. I have added it into the task description: https://mariadb.atlassian.net/browse/MDEV-5304
For now, we have to wait for PCRE-8.34 release anyway. Adding an option to compile against the external PCRE-8.33 is not a good idea: any MariaDB user that have a valid user and password would be able to crash the server by sending a dangerous pattern to RLIKE. We can't do that
no proble, i have my own build-environments with no rules except "it must work relieable" for servers i am responsible I thought it would be good to point out the side effects and the Fedora page is a great ressource in that context besides it is our favourite distribution for several reasons there are always pros and cons * have all bundled - you *may* be sure somehow all is fine * at the end of the day the "all" is relative there may also be removed some hacks and workarounds in other packages which are not bundled removed because the distribution updated the upstream library and a package behaves not as expected because a older version the opposite may also happen, so you not really know all possible combinations of libraries and side effects i have a strong feeling that use as less as possible downstream patches in whatever project and try to fix issues in involved upstream packages is doing a favor for all involved people and may save a lot fo time and pain over the years and if the goal having no downstream patches what Fedora has more or less is reached the result means like very high quality over the stack that said from a "only user" in context of MariaDB but with some years developer experience in other areas
participants (3)
-
Alexander Barkov
-
Pavel Ivanov
-
Reindl Harald