[Maria-developers] regexp review
Hi!

Everything is OK.
But there are some small issues:

--- mysql-test/include/ctype_utf8mb4.inc 2010-03-05 08:17:19 +0000
+++ mysql-test/include/ctype_utf8mb4.inc 2013-09-26 14:02:17 +0000
@@ -234,15 +234,15 @@ set names utf8mb4;
 set names utf8mb4;
 # This should return TRUE
-select 'вася' rlike '[[:<:]]вася[[:>:]]';
-select 'вася ' rlike '[[:<:]]вася[[:>:]]';
-select ' вася' rlike '[[:<:]]вася[[:>:]]';
-select ' вася ' rlike '[[:<:]]вася[[:>:]]';
+select 'вася' rlike '\\bвася\\b';
+select 'вася ' rlike '\\bвася\\b';
+select ' вася' rlike '\\bвася\\b';
+select ' вася ' rlike '\\bвася\\b';

Is above unsupported pattern?

=== modified file 'sql/mysqld.cc'
--- sql/mysqld.cc 2013-09-18 11:07:31 +0000
+++ sql/mysqld.cc 2013-09-26 14:02:17 +0000
@@ -1898,7 +1898,7 @@ void clean_up(bool print_message)
   delete global_rpl_filter;
   end_ssl();
   vio_end();
-  my_regex_end();
+  //my_regex_end();
 #if defined(ENABLED_DEBUG_SYNC)
   /* End the debug sync facility. See debug_sync.cc. */
   debug_sync_end();
@@ -3904,10 +3904,10 @@ static int init_common_variables()
     return 1;
   item_init();
 #ifndef EMBEDDED_LIBRARY
-  my_regex_init(&my_charset_latin1, check_enough_stack_size);
+  //my_regex_init(&my_charset_latin1, check_enough_stack_size);
   my_string_stack_guard= check_enough_stack_size;
 #else
-  my_regex_init(&my_charset_latin1, NULL);
+  //my_regex_init(&my_charset_latin1, NULL);
 #endif
   /* Process a comma-separated character set list and choose

Remove it please (I think it was just forgotten).
Hi Sanja,

Thanks for the review.

On 09/30/2013 11:00 AM, Oleksandr Byelkin wrote:
> Hi!
>
> Everything is OK.
> But there are some small issues:
>
> --- mysql-test/include/ctype_utf8mb4.inc 2010-03-05 08:17:19 +0000
> +++ mysql-test/include/ctype_utf8mb4.inc 2013-09-26 14:02:17 +0000
> @@ -234,15 +234,15 @@ set names utf8mb4;
>  set names utf8mb4;
>  # This should return TRUE
> -select 'вася' rlike '[[:<:]]вася[[:>:]]';
> -select 'вася ' rlike '[[:<:]]вася[[:>:]]';
> -select ' вася' rlike '[[:<:]]вася[[:>:]]';
> -select ' вася ' rlike '[[:<:]]вася[[:>:]]';
> +select 'вася' rlike '\\bвася\\b';
> +select 'вася ' rlike '\\bвася\\b';
> +select ' вася' rlike '\\bвася\\b';
> +select ' вася ' rlike '\\bвася\\b';
>
> Is above unsupported pattern?
The above is a non-standard pattern. It was supported by the Henry Spencer regex library, which we bundle in the /regex directory. This is what regex/regex.7 says:
There are two special cases of bracket expressions: the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at the beginning and end of a word respectively. A word is defined as a sequence of word characters which is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype(3)) or an underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems.
I added this test to MySQL a few years ago after reading regex.7 (it was not part of any bug report). This incompatibility should not be a big problem.
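To make the regex.7 definition quoted above concrete, here is a minimal C++ sketch of the two word-edge tests and of a `\b`-style boundary built from them. All function names are hypothetical and exist only for illustration; this is not code from the Henry Spencer library or from PCRE. It is also ASCII-only (it relies on ctype), so it does not cover multi-byte data such as the utf8mb4 strings in the test.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// A word character per regex.7: an alnum character (ctype) or an underscore.
static bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Position i (0..s.size()) is where `[[:<:]]` would match:
// a word character follows and no word character precedes.
static bool at_word_start(const std::string& s, std::size_t i) {
    assert(i <= s.size());
    const bool prev = i > 0 && is_word_char(s[i - 1]);
    const bool next = i < s.size() && is_word_char(s[i]);
    return !prev && next;
}

// Position i is where `[[:>:]]` would match:
// a word character precedes and no word character follows.
static bool at_word_end(const std::string& s, std::size_t i) {
    assert(i <= s.size());
    const bool prev = i > 0 && is_word_char(s[i - 1]);
    const bool next = i < s.size() && is_word_char(s[i]);
    return prev && !next;
}

// A `\b`-style boundary does not distinguish the two edges:
// it holds at either a word start or a word end.
static bool at_word_boundary(const std::string& s, std::size_t i) {
    return at_word_start(s, i) || at_word_end(s, i);
}

// Rough equivalent of `[[:<:]]word[[:>:]]`, assuming `word` consists
// of word characters only.
static bool contains_whole_word(const std::string& text,
                                const std::string& word) {
    if (word.empty()) return false;
    for (std::size_t pos = text.find(word); pos != std::string::npos;
         pos = text.find(word, pos + 1)) {
        if (at_word_start(text, pos) && at_word_end(text, pos + word.size()))
            return true;
    }
    return false;
}
```

With these helpers, `contains_whole_word(" vasya ", "vasya")` holds, while `contains_whole_word("xvasya", "vasya")` does not, which mirrors the four test queries for the ASCII case.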
> === modified file 'sql/mysqld.cc'
> --- sql/mysqld.cc 2013-09-18 11:07:31 +0000
> +++ sql/mysqld.cc 2013-09-26 14:02:17 +0000
> @@ -1898,7 +1898,7 @@ void clean_up(bool print_message)
>    delete global_rpl_filter;
>    end_ssl();
>    vio_end();
> -  my_regex_end();
> +  //my_regex_end();
>  #if defined(ENABLED_DEBUG_SYNC)
>    /* End the debug sync facility. See debug_sync.cc. */
>    debug_sync_end();
> @@ -3904,10 +3904,10 @@ static int init_common_variables()
>      return 1;
>    item_init();
>  #ifndef EMBEDDED_LIBRARY
> -  my_regex_init(&my_charset_latin1, check_enough_stack_size);
> +  //my_regex_init(&my_charset_latin1, check_enough_stack_size);
>    my_string_stack_guard= check_enough_stack_size;
>  #else
> -  my_regex_init(&my_charset_latin1, NULL);
> +  //my_regex_init(&my_charset_latin1, NULL);
>  #endif
>    /* Process a comma-separated character set list and choose
>
> Remove it please (I think it was just forgotten).
Done. Thanks for noticing this.

As discussed on IRC, there is still one issue left: a crash in pcre_compile() when using a recursive pattern with many nested parentheses, like:

SELECT 'x' RLIKE CONCAT( REPEAT('(',300), 'x', REPEAT(')',300));

I.e. when the pattern is like '((((((x))))))' but with more nesting levels. The crash happens because pcre_compile() recurses and eats up all the available stack.

I'm currently discussing this problem by email with Philip Hazel (the author of PCRE). If he does not have a quick idea for how to fix this, we'll just use the same trick that we used with the old regex library. See my_regex_enough_mem_in_stack in regex/regcomp.c.

I'll tell you what we end up with.

Greetings.
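The general shape of such a guard can be sketched as follows. This is only an illustration in C++, not the code in regex/regcomp.c and not a PCRE API: the toy grammar, `RecursionGuard`, `parse_group`, `check_pattern`, `depth_below_limit` and the limit of 250 are all hypothetical. A real guard would check the remaining stack space; a plain depth limit stands in for it here so that the example is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical guard type: returns true when it is safe to recurse
// one level deeper.
using RecursionGuard = bool (*)(int depth);

enum class ParseResult { Ok, SyntaxError, TooDeep };

// Toy grammar:  group := '(' group ')' | 'x'
// The guard is consulted before every level, so a deeply nested pattern
// is rejected with TooDeep instead of exhausting the stack.
static ParseResult parse_group(const std::string& p, std::size_t& pos,
                               int depth, RecursionGuard guard) {
    assert(depth >= 0);
    if (guard != nullptr && !guard(depth)) return ParseResult::TooDeep;
    if (pos >= p.size()) return ParseResult::SyntaxError;
    if (p[pos] == 'x') { ++pos; return ParseResult::Ok; }
    if (p[pos] != '(') return ParseResult::SyntaxError;
    ++pos;  // consume '('
    const ParseResult inner = parse_group(p, pos, depth + 1, guard);
    if (inner != ParseResult::Ok) return inner;
    if (pos >= p.size() || p[pos] != ')') return ParseResult::SyntaxError;
    ++pos;  // consume ')'
    return ParseResult::Ok;
}

// Parses the whole pattern; trailing characters are a syntax error.
static ParseResult check_pattern(const std::string& p, RecursionGuard guard) {
    std::size_t pos = 0;
    const ParseResult r = parse_group(p, pos, 0, guard);
    if (r == ParseResult::Ok && pos != p.size()) return ParseResult::SyntaxError;
    return r;
}

// Stand-in guard: a fixed depth limit instead of a real stack-size check.
static bool depth_below_limit(int depth) { return depth < 250; }

// Builds the same shape as CONCAT(REPEAT('(',n), 'x', REPEAT(')',n)).
static std::string nested_pattern(std::size_t n) {
    return std::string(n, '(') + "x" + std::string(n, ')');
}
```

Under these assumptions, `check_pattern(nested_pattern(6), depth_below_limit)` yields `ParseResult::Ok`, while the 300-level pattern from the query above yields `ParseResult::TooDeep`, which the caller can turn into an ordinary error.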
participants (2)
- Alexander Barkov
- Oleksandr Byelkin