Fwd: REGEXP_REPLACE behaviour
Hi All, I am trying to understand the behaviour of REGEXP_REPLACE when there is a backslash in the subject string. It appears that the subject needs to have \'s escaped which is not desirable (see example 1 below). I can understand having to escape \'s in the regex, but not the subject string. Any suggestions? Thanks. Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 397751 Server version: 10.11.5-MariaDB MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> SELECT REGEXP_REPLACE('Et12\4', '^Et', 'Ethernet'); +---------------------------------------------+ | REGEXP_REPLACE('Et12\4', '^Et', 'Ethernet') | +---------------------------------------------+ | Ethernet124 | +---------------------------------------------+ 1 row in set (0.000 sec) MariaDB [(none)]> SELECT REGEXP_REPLACE('Et12\\4', '^Et', 'Ethernet'); +----------------------------------------------+ | REGEXP_REPLACE('Et12\\4', '^Et', 'Ethernet') | +----------------------------------------------+ | Ethernet12\4 | +----------------------------------------------+ 1 row in set (0.000 sec) MariaDB [(none)]> SELECT REGEXP_REPLACE('Et12/4', '^Et', 'Ethernet'); +---------------------------------------------+ | REGEXP_REPLACE('Et12/4', '^Et', 'Ethernet') | +---------------------------------------------+ | Ethernet12/4 | +---------------------------------------------+ 1 row in set (0.000 sec) MariaDB [(none)]> SELECT REGEXP_REPLACE('Et12 4', '^Et', 'Ethernet'); +---------------------------------------------+ | REGEXP_REPLACE('Et12 4', '^Et', 'Ethernet') | +---------------------------------------------+ | Ethernet12 4 | +---------------------------------------------+ 1 row in set (0.000 sec) -- Richard Ellerbrock
Richard Ellerbrock via discuss <discuss@lists.mariadb.org> writes:
I am trying to understand the behaviour of REGEXP_REPLACE when there is a backslash in the subject string. It appears that the subject needs to have \'s escaped which is not desirable (see example 1 below). I can understand
This is the behaviour of string litterals in mariadb (quoted string constants), it is not related to REGEXP_REPLACE. From the documentation: https://mariadb.com/kb/en/string-literals/ "Backslash (\), if not used as an escape character, must always be escaped. When followed by a character that is not in the above table, backslashes will simply be ignored." But it's possible to set NO_BACKSLASH_ESCAPES in the @@sql_mode, then backslash needs not be escaped: MariaDB [knielsen]> select 'A\BC'; | ABC | MariaDB [knielsen]> select 'A\\BC'; | A\BC | MariaDB [knielsen]> set session sql_mode=no_backslash_escapes; MariaDB [knielsen]> select 'A\BC'; | A\BC | MariaDB [knielsen]> select 'A\\BC'; | A\\BC | Because of the need to escape backslash in the string litterals, you need double backslash (when no_backslash_escapes is not set) to escape specials in the regexp: MariaDB [knielsen]> select regexp_replace('a.b', '\.', '!'); | !!! | MariaDB [knielsen]> select regexp_replace('a.b', '\\.', '!'); | a!b | Remeber that this is the behaviour of the SQL parser when it parses a string litteral. If the subject or regexp in REGEXP_REPLACE comes from a table column or prepared-statement placeholder, or anywhere else the value is not parsed as a string litteral, the backslash should not be escaped. - Kristian.
Thank you for the detailed explanation. Much appreciated. I did the same against a database table row and I get the expected result I am after. Just to confirm then, there is no distinction between string quoting style in SQL like in other languages. So string literals enclosed in '' and "" behave the same? On Tue, 6 Feb 2024 at 23:21, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Richard Ellerbrock via discuss <discuss@lists.mariadb.org> writes:
I am trying to understand the behaviour of REGEXP_REPLACE when there is a backslash in the subject string. It appears that the subject needs to have \'s escaped which is not desirable (see example 1 below). I can understand
This is the behaviour of string litterals in mariadb (quoted string constants), it is not related to REGEXP_REPLACE. From the documentation:
https://mariadb.com/kb/en/string-literals/
"Backslash (\), if not used as an escape character, must always be escaped. When followed by a character that is not in the above table, backslashes will simply be ignored."
But it's possible to set NO_BACKSLASH_ESCAPES in the @@sql_mode, then backslash needs not be escaped:
MariaDB [knielsen]> select 'A\BC'; | ABC |
MariaDB [knielsen]> select 'A\\BC'; | A\BC |
MariaDB [knielsen]> set session sql_mode=no_backslash_escapes; MariaDB [knielsen]> select 'A\BC'; | A\BC |
MariaDB [knielsen]> select 'A\\BC'; | A\\BC |
Because of the need to escape backslash in the string litterals, you need double backslash (when no_backslash_escapes is not set) to escape specials in the regexp:
MariaDB [knielsen]> select regexp_replace('a.b', '\.', '!'); | !!! |
MariaDB [knielsen]> select regexp_replace('a.b', '\\.', '!'); | a!b |
Remeber that this is the behaviour of the SQL parser when it parses a string litteral. If the subject or regexp in REGEXP_REPLACE comes from a table column or prepared-statement placeholder, or anywhere else the value is not parsed as a string litteral, the backslash should not be escaped.
- Kristian.
-- Richard Ellerbrock
Richard Ellerbrock <ellerbrockr@gmail.com> writes:
Just to confirm then, there is no distinction between string quoting style in SQL like in other languages. So string literals enclosed in '' and "" behave the same?
Yes (except using "" is non-standard and can be disabled, but by default it's the same). - Kristian.
participants (2)
-
Kristian Nielsen
-
Richard Ellerbrock