[Maria-developers] bb-10.2-ext / rpad lpad with 2 args
Hi Alexander, This is the patch for rpad/lpad with 2 args (call of make_empty_string() in val_str() is to prepare MDEV-10574). Regard, Jérôme. PS : I share this code under terms of MCA.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mercredi 26 avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hello Jerome,
Thanks for working on this!
On 04/25/2017 11:42 PM, jerome brauge wrote:
Hello Alexander, Some times ago, we have had the below discussion. I made a patch following my first idea (it's enough from our point of view) : replace empty string by null in literals and in make_empty_string. I think this patch can be complementary to your suggestion of adding an Item_func_eq_oracle (for data inserted in default sql_mode)
This patch also contains following some functions changes: - replace ('ab','a',null) returns 'a' instead null - trim / ltrim / rtrim returns null when input only contains spaces - lpad /rpad can accept 2 args (default padding char is space) and if length equals to 0, returns null -substring : if start index is equal to 0, act as if it's equals to 1 -add a function chr() as a synonym of char() (but accept only one arg) -add a function lengthb() as a synonym of lentgh() -change the semantic of length for oracle (where length() is the character length and not the byte length)
What do you think about this patch ?
(Perhaps I have to make 2 patchs, one for empty string, and one for functions changes with separate test file for each modified function)
Right, can you please split the patch into pieces? I suggest even more pieces than two. One patch for one function. I'll create MDEVs, one MDEV per function and put examples how Oracle works.
Can we do it in the following order:
1. LPAD / RPAD - making the third parameter optional. This extension will be good for all sql_modes. I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have the third parameter to LPAD/RPAD optional.
2. CHR().
3. LENGTH() and LENGTHB().
Then we can continue with REPLACE, TRIM. SUBSTRING in no particular order, one patch per function.
Some general notes:
1. Unify *.yy files
Please put sql_mode dependent code into LEX rather than *.yy files directly. We're going to unify the two files soon. See here for details: MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
So instead of doing:
| SUBSTRING '(' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7); if ($$ == NULL) MYSQL_YYABORT; }
Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
| SUBSTRING '(' expr ',' expr ',' expr ')' { if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7))) MYSQL_YYABORT; }
where LEX::make_item_func_substr() is a new method, creating either Item_func_substr or Item_func_substr_oracle depending on sql_mode.
The same applies to all other functions, and to text_literal.
2. Don't use Item::is_null() when you call val_str() anyway.
There are two reasons that: - Performance is_null() involves the second val_str() call, which can be resource consuming in case of complex functions and subselect.
- Side effect Expressions can have some side effects. For example, in case if the argument is a stored function which counts how many times it was called and stores the counter in a user variable, or writes something to a table on every execution. If we do is_null() followed by val_str(), the side effect will done two times. We need one time only.
So instead of:
String *Item_func_replace_oracle::val_str(String *str) { String * res; if (res = replace_value(str, args[1]->is_null() ? &tmp_emtpystr : rgs[1]->val_str(&tmp_value2), args[2]->is_null() ? &tmp_emtpystr : rgs[2]->val_str(&tmp_value3))
return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Please do something like this:
String *Item_func_replace_oracle::val_str(String *str) { String * res; String *str1= args[1]->val_str(&tmp_value1); String *str2= args[2]->val_str(&tmp_value2); if (res= replace_value(str, str1 ? str1 : &tmp_emptystr, str2 ? str2 : &tmp_emptystr) return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Thanks!
Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : lundi 23 janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hi Jerome,
Hello Alexander, Sometime ago, when I ask you about plan for MDEV-10574, you replied :
The current plan is to do these transformations:
1. Transform Insert - insert values ("") -> insert values (null)
2. Transform Select
- where v=x => (v <> "" and V=X) - where v is null => (v="" or v is null)
We didn't plan to change functions yet. Thanks for bringing this up. We'll discuss this.
I've done some tests just by changing : - insert an Item_null instead of an Item_string when $1.length==0 in rule text_literal of sql_yacc_ora.yy - return null instead of an empty string in Item_str_func::make_empty_result
My first tests seem promising.
Of course this solution does not allow to "see" the records created with empty strings as null values. I don't see the importance of being able to do this in a transparent way. We can explicitly select these row by adding rtrim on these columns.
If you are interesting, I can begin to write a test to evaluate the coverage of
On 01/18/2017 01:22 PM, jerome brauge wrote: this solution.
I'm afraid this will fix the behavior only for literals.
This script:
DROP TABLE t1; CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1
VALUES
('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM t1 WHERE a=b;
returns 0 for both SELECT queries in Oracle, and returns 1 for both SELECT queries in MariaDB, Your approach will fix the first SELECT only. The second will still return 1.
For now, I'm not sure how to do this correctly. Some ways possible:
1. We could add new classes, e.g. Item_func_eq_oracle as a pair for Item_func_eq, which will handle both empty strings and NULLs inside their val_int() implementations.
2. Or there could be some after-processing on the created Item tree which will replace all things like Item_func_eq to Item_cond_and with Item_func_eq and Item_func_null_predicate passed as arguments.
Currently I'm inclined to #1, because we don't have a good mechanism to clone items to implement #2 correctly. We'd need such cloning to pass arguments both to Item_func_eq and Item_func_null_predicate correctly. Some places in the code don't expect that the same arguments can be passed to two different Item_func instances.
On the other hand, implementing #1 will need more changes in the optimizer.
We were going to think on this later.
Best regard. Jérôme.
Hi Jerome, Thanks for your contribution! The patch generally looks good. This piece of code is not fully correct though: + Item *param_3= new (thd->mem_root) Item_string(thd, + param_1->collation.collation + " ", 1); There are two problems here: 1. param_1->collation.collation is not precise at this point. It will become valid later, after param_1 calls its fix_fields(). 2. In case is @@character_set_connections is UCS2, UTF16 OR UTF32 it will not work fine, as space occupies two bytes (in UCS2 and UTF16) or four bytes (in UTF32). We need to emulate the same behavior like when the user types space as the last argument, so conversion from @@character_set_client to @@character_set_connection is needed. To make this work correctly, we need to repeat this logic from sql_yacc.yy:
text_literal: TEXT_STRING { LEX_CSTRING tmp; CHARSET_INFO *cs_con= thd->variables.collation_connection; CHARSET_INFO *cs_cli= thd->variables.character_set_client; uint repertoire= $1.repertoire(cs_cli); if (thd->charset_is_collation_connection || (repertoire == MY_REPERTOIRE_ASCII && my_charset_is_ascii_based(cs_con))) tmp= $1; else { LEX_STRING to; if (thd->convert_string(&to, cs_con, $1.str, $1.length, cs_cli)) MYSQL_YYABORT; tmp.str= to.str; tmp.length= to.length; } $$= new (thd->mem_root) Item_string(thd, tmp.str, tmp.length, cs_con, DERIVATION_COERCIBLE, repertoire); if ($$ == NULL) MYSQL_YYABORT; }
I'm going to move this code from sql_yacc.yy to a new method in THD: Item_string *THD::make_item_string(const char *str, size_t length) const; so you can reuse it in your patch. I'm starting working on it right now and will tell you when it's ready. Thanks. On 04/26/2017 11:45 AM, jerome brauge wrote:
Hi Alexander, This is the patch for rpad/lpad with 2 args (call of make_empty_string() in val_str() is to prepare MDEV-10574).
Regard, Jérôme.
PS : I share this code under terms of MCA.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mercredi 26 avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hello Jerome,
Thanks for working on this!
On 04/25/2017 11:42 PM, jerome brauge wrote:
Hello Alexander, Some times ago, we have had the below discussion. I made a patch following my first idea (it's enough from our point of view) : replace empty string by null in literals and in make_empty_string. I think this patch can be complementary to your suggestion of adding an Item_func_eq_oracle (for data inserted in default sql_mode)
This patch also contains following some functions changes: - replace ('ab','a',null) returns 'a' instead null - trim / ltrim / rtrim returns null when input only contains spaces - lpad /rpad can accept 2 args (default padding char is space) and if length equals to 0, returns null -substring : if start index is equal to 0, act as if it's equals to 1 -add a function chr() as a synonym of char() (but accept only one arg) -add a function lengthb() as a synonym of lentgh() -change the semantic of length for oracle (where length() is the character length and not the byte length)
What do you think about this patch ?
(Perhaps I have to make 2 patchs, one for empty string, and one for functions changes with separate test file for each modified function)
Right, can you please split the patch into pieces? I suggest even more pieces than two. One patch for one function. I'll create MDEVs, one MDEV per function and put examples how Oracle works.
Can we do it in the following order:
1. LPAD / RPAD - making the third parameter optional. This extension will be good for all sql_modes. I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have the third parameter to LPAD/RPAD optional.
2. CHR().
3. LENGTH() and LENGTHB().
Then we can continue with REPLACE, TRIM. SUBSTRING in no particular order, one patch per function.
Some general notes:
1. Unify *.yy files
Please put sql_mode dependent code into LEX rather than *.yy files directly. We're going to unify the two files soon. See here for details: MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
So instead of doing:
| SUBSTRING '(' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7); if ($$ == NULL) MYSQL_YYABORT; }
Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
| SUBSTRING '(' expr ',' expr ',' expr ')' { if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7))) MYSQL_YYABORT; }
where LEX::make_item_func_substr() is a new method, creating either Item_func_substr or Item_func_substr_oracle depending on sql_mode.
The same applies to all other functions, and to text_literal.
2. Don't use Item::is_null() when you call val_str() anyway.
There are two reasons that: - Performance is_null() involves the second val_str() call, which can be resource consuming in case of complex functions and subselect.
- Side effect Expressions can have some side effects. For example, in case if the argument is a stored function which counts how many times it was called and stores the counter in a user variable, or writes something to a table on every execution. If we do is_null() followed by val_str(), the side effect will done two times. We need one time only.
So instead of:
String *Item_func_replace_oracle::val_str(String *str) { String * res; if (res = replace_value(str, args[1]->is_null() ? &tmp_emtpystr : rgs[1]->val_str(&tmp_value2), args[2]->is_null() ? &tmp_emtpystr : rgs[2]->val_str(&tmp_value3))
return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Please do something like this:
String *Item_func_replace_oracle::val_str(String *str) { String * res; String *str1= args[1]->val_str(&tmp_value1); String *str2= args[2]->val_str(&tmp_value2); if (res= replace_value(str, str1 ? str1 : &tmp_emptystr, str2 ? str2 : &tmp_emptystr) return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Thanks!
Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : lundi 23 janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hi Jerome,
Hello Alexander, Sometime ago, when I ask you about plan for MDEV-10574, you replied :
The current plan is to do these transformations:
1. Transform Insert - insert values ("") -> insert values (null)
2. Transform Select
- where v=x => (v <> "" and V=X) - where v is null => (v="" or v is null)
We didn't plan to change functions yet. Thanks for bringing this up. We'll discuss this.
I've done some tests just by changing : - insert an Item_null instead of an Item_string when $1.length==0 in rule text_literal of sql_yacc_ora.yy - return null instead of an empty string in Item_str_func::make_empty_result
My first tests seem promising.
Of course this solution does not allow to "see" the records created with empty strings as null values. I don't see the importance of being able to do this in a transparent way. We can explicitly select these row by adding rtrim on these columns.
If you are interesting, I can begin to write a test to evaluate the coverage of
On 01/18/2017 01:22 PM, jerome brauge wrote: this solution.
I'm afraid this will fix the behavior only for literals.
This script:
DROP TABLE t1; CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1
VALUES
('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM t1 WHERE a=b;
returns 0 for both SELECT queries in Oracle, and returns 1 for both SELECT queries in MariaDB, Your approach will fix the first SELECT only. The second will still return 1.
For now, I'm not sure how to do this correctly. Some ways possible:
1. We could add new classes, e.g. Item_func_eq_oracle as a pair for Item_func_eq, which will handle both empty strings and NULLs inside their val_int() implementations.
2. Or there could be some after-processing on the created Item tree which will replace all things like Item_func_eq to Item_cond_and with Item_func_eq and Item_func_null_predicate passed as arguments.
Currently I'm inclined to #1, because we don't have a good mechanism to clone items to implement #2 correctly. We'd need such cloning to pass arguments both to Item_func_eq and Item_func_null_predicate correctly. Some places in the code don't expect that the same arguments can be passed to two different Item_func instances.
On the other hand, implementing #1 will need more changes in the optimizer.
We were going to think on this later.
Best regard. Jérôme.
Hello Alexander, Thanks for the review. I have already moved code of text_literal into lex in my last patch (empty_strings.diff). I will change it to use your new method. Do you have news from Monty for the patch on stacktrace as notes ? Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mardi 2 mai 2017 09:39 À : jerome brauge Cc : MariaDB Developers Objet : Re: bb-10.2-ext / rpad lpad with 2 args
Hi Jerome,
Thanks for your contribution! The patch generally looks good.
This piece of code is not fully correct though:
+ Item *param_3= new (thd->mem_root) Item_string(thd, + param_1->collation.collation + " ", 1);
There are two problems here:
1. param_1->collation.collation is not precise at this point. It will become valid later, after param_1 calls its fix_fields().
2. In case is @@character_set_connections is UCS2, UTF16 OR UTF32 it will not work fine, as space occupies two bytes (in UCS2 and UTF16) or four bytes (in UTF32). We need to emulate the same behavior like when the user types space as the last argument, so conversion from @@character_set_client to @@character_set_connection is needed.
To make this work correctly, we need to repeat this logic from sql_yacc.yy:
text_literal: TEXT_STRING { LEX_CSTRING tmp; CHARSET_INFO *cs_con= thd->variables.collation_connection; CHARSET_INFO *cs_cli= thd->variables.character_set_client; uint repertoire= $1.repertoire(cs_cli); if (thd->charset_is_collation_connection || (repertoire == MY_REPERTOIRE_ASCII && my_charset_is_ascii_based(cs_con))) tmp= $1; else { LEX_STRING to; if (thd->convert_string(&to, cs_con, $1.str, $1.length, cs_cli)) MYSQL_YYABORT; tmp.str= to.str; tmp.length= to.length; } $$= new (thd->mem_root) Item_string(thd, tmp.str, tmp.length, cs_con, DERIVATION_COERCIBLE, repertoire); if ($$ == NULL) MYSQL_YYABORT; }
I'm going to move this code from sql_yacc.yy to a new method in THD:
Item_string *THD::make_item_string(const char *str, size_t length) const;
so you can reuse it in your patch.
I'm starting working on it right now and will tell you when it's ready.
Thanks.
Hi Alexander, This is the patch for rpad/lpad with 2 args (call of make_empty_string() in val_str() is to prepare MDEV-10574).
Regard, Jérôme.
PS : I share this code under terms of MCA.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mercredi 26 avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hello Jerome,
Thanks for working on this!
On 04/25/2017 11:42 PM, jerome brauge wrote:
Hello Alexander, Some times ago, we have had the below discussion. I made a patch following my first idea (it's enough from our point of view) : replace empty string by null in literals and in make_empty_string. I think this patch can be complementary to your suggestion of adding an Item_func_eq_oracle (for data inserted in default sql_mode)
This patch also contains following some functions changes: - replace ('ab','a',null) returns 'a' instead null - trim / ltrim / rtrim returns null when input only contains spaces - lpad /rpad can accept 2 args (default padding char is space) and if length equals to 0, returns null -substring : if start index is equal to 0, act as if it's equals to 1 -add a function chr() as a synonym of char() (but accept only one arg) -add a function lengthb() as a synonym of lentgh() -change the semantic of length for oracle (where length() is the character length and not the byte length)
What do you think about this patch ?
(Perhaps I have to make 2 patchs, one for empty string, and one for functions changes with separate test file for each modified function)
Right, can you please split the patch into pieces? I suggest even more pieces than two. One patch for one function. I'll create MDEVs, one MDEV per function and put examples how Oracle works.
Can we do it in the following order:
1. LPAD / RPAD - making the third parameter optional. This extension will be good for all sql_modes. I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have the third parameter to LPAD/RPAD optional.
2. CHR().
3. LENGTH() and LENGTHB().
Then we can continue with REPLACE, TRIM. SUBSTRING in no particular order, one patch per function.
Some general notes:
1. Unify *.yy files
Please put sql_mode dependent code into LEX rather than *.yy files
On 04/26/2017 11:45 AM, jerome brauge wrote: directly.
We're going to unify the two files soon. See here for details: MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
So instead of doing:
| SUBSTRING '(' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7); if ($$ == NULL) MYSQL_YYABORT; }
Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
| SUBSTRING '(' expr ',' expr ',' expr ')' { if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7))) MYSQL_YYABORT; }
where LEX::make_item_func_substr() is a new method, creating either Item_func_substr or Item_func_substr_oracle depending on sql_mode.
The same applies to all other functions, and to text_literal.
2. Don't use Item::is_null() when you call val_str() anyway.
There are two reasons that: - Performance is_null() involves the second val_str() call, which can be resource consuming in case of complex functions and subselect.
- Side effect Expressions can have some side effects. For example, in case if the argument is a stored function which counts how many times it was called and stores the counter in a user variable, or writes something to a table on every execution. If we do is_null() followed by val_str(), the side effect will done two times. We need one time only.
So instead of:
String *Item_func_replace_oracle::val_str(String *str) { String * res; if (res = replace_value(str, args[1]->is_null() ? &tmp_emtpystr : rgs[1]->val_str(&tmp_value2), args[2]->is_null() ? &tmp_emtpystr : rgs[2]->val_str(&tmp_value3))
return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Please do something like this:
String *Item_func_replace_oracle::val_str(String *str) { String * res; String *str1= args[1]->val_str(&tmp_value1); String *str2= args[2]->val_str(&tmp_value2); if (res= replace_value(str, str1 ? str1 : &tmp_emptystr, str2 ? str2 : &tmp_emptystr) return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Thanks!
Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : lundi 23 janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hi Jerome,
On 01/18/2017 01:22 PM, jerome brauge wrote:
Hello Alexander, Sometime ago, when I ask you about plan for MDEV-10574, you
replied :
> The current plan is to do these transformations: > > 1. Transform Insert > - insert values ("") -> insert values (null) > > 2. Transform Select > > - where v=x => (v <> "" and V=X) > - where v is null => (v="" or v is null) > > We didn't plan to change functions yet. Thanks for bringing this up. > We'll discuss this.
I've done some tests just by changing : - insert an Item_null instead of an Item_string when $1.length==0 in rule text_literal of sql_yacc_ora.yy - return null instead of an empty string in Item_str_func::make_empty_result
My first tests seem promising.
Of course this solution does not allow to "see" the records created with
I don't see the importance of being able to do this in a transparent way. We can explicitly select these row by adding rtrim on these columns.
If you are interesting, I can begin to write a test to evaluate the coverage of
empty strings as null values. this solution.
I'm afraid this will fix the behavior only for literals.
This script:
DROP TABLE t1; CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1 VALUES ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM t1 WHERE a=b;
returns 0 for both SELECT queries in Oracle, and returns 1 for both SELECT queries in MariaDB, Your approach will fix the first SELECT only. The second will still return 1.
For now, I'm not sure how to do this correctly. Some ways possible:
1. We could add new classes, e.g. Item_func_eq_oracle as a pair for Item_func_eq, which will handle both empty strings and NULLs inside their val_int() implementations.
2. Or there could be some after-processing on the created Item tree which will replace all things like Item_func_eq to Item_cond_and with Item_func_eq and Item_func_null_predicate passed as arguments.
Currently I'm inclined to #1, because we don't have a good mechanism to clone items to implement #2 correctly. We'd need such cloning to pass arguments both to Item_func_eq and Item_func_null_predicate correctly. Some places in the code don't expect that the same arguments can be passed to two different Item_func instances.
On the other hand, implementing #1 will need more changes in the optimizer.
We were going to think on this later.
Best regard. Jérôme.
On 05/02/2017 11:53 AM, jerome brauge wrote:
Hello Alexander, Thanks for the review. I have already moved code of text_literal into lex in my last patch (empty_strings.diff). I will change it to use your new method.
Thanks.
Do you have news from Monty for the patch on stacktrace as notes ?
Not yet. Monty has been on a trip at the moment. I'll let you know when he's ready with review. Sorry for the delay with that.
Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mardi 2 mai 2017 09:39 À : jerome brauge Cc : MariaDB Developers Objet : Re: bb-10.2-ext / rpad lpad with 2 args
Hi Jerome,
Thanks for your contribution! The patch generally looks good.
This piece of code is not fully correct though:
+ Item *param_3= new (thd->mem_root) Item_string(thd, + param_1->collation.collation + " ", 1);
There are two problems here:
1. param_1->collation.collation is not precise at this point. It will become valid later, after param_1 calls its fix_fields().
2. In case is @@character_set_connections is UCS2, UTF16 OR UTF32 it will not work fine, as space occupies two bytes (in UCS2 and UTF16) or four bytes (in UTF32). We need to emulate the same behavior like when the user types space as the last argument, so conversion from @@character_set_client to @@character_set_connection is needed.
To make this work correctly, we need to repeat this logic from sql_yacc.yy:
text_literal: TEXT_STRING { LEX_CSTRING tmp; CHARSET_INFO *cs_con= thd->variables.collation_connection; CHARSET_INFO *cs_cli= thd->variables.character_set_client; uint repertoire= $1.repertoire(cs_cli); if (thd->charset_is_collation_connection || (repertoire == MY_REPERTOIRE_ASCII && my_charset_is_ascii_based(cs_con))) tmp= $1; else { LEX_STRING to; if (thd->convert_string(&to, cs_con, $1.str, $1.length, cs_cli)) MYSQL_YYABORT; tmp.str= to.str; tmp.length= to.length; } $$= new (thd->mem_root) Item_string(thd, tmp.str, tmp.length, cs_con, DERIVATION_COERCIBLE, repertoire); if ($$ == NULL) MYSQL_YYABORT; }
I'm going to move this code from sql_yacc.yy to a new method in THD:
Item_string *THD::make_item_string(const char *str, size_t length) const;
so you can reuse it in your patch.
I'm starting working on it right now and will tell you when it's ready.
Thanks.
Hi Alexander, This is the patch for rpad/lpad with 2 args (call of make_empty_string() in val_str() is to prepare MDEV-10574).
Regard, Jérôme.
PS : I share this code under terms of MCA.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mercredi 26 avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hello Jerome,
Thanks for working on this!
On 04/25/2017 11:42 PM, jerome brauge wrote:
Hello Alexander, Some times ago, we have had the below discussion. I made a patch following my first idea (it's enough from our point of view) : replace empty string by null in literals and in make_empty_string. I think this patch can be complementary to your suggestion of adding an Item_func_eq_oracle (for data inserted in default sql_mode)
This patch also contains following some functions changes: - replace ('ab','a',null) returns 'a' instead null - trim / ltrim / rtrim returns null when input only contains spaces - lpad /rpad can accept 2 args (default padding char is space) and if length equals to 0, returns null -substring : if start index is equal to 0, act as if it's equals to 1 -add a function chr() as a synonym of char() (but accept only one arg) -add a function lengthb() as a synonym of lentgh() -change the semantic of length for oracle (where length() is the character length and not the byte length)
What do you think about this patch ?
(Perhaps I have to make 2 patchs, one for empty string, and one for functions changes with separate test file for each modified function)
Right, can you please split the patch into pieces? I suggest even more pieces than two. One patch for one function. I'll create MDEVs, one MDEV per function and put examples how Oracle works.
Can we do it in the following order:
1. LPAD / RPAD - making the third parameter optional. This extension will be good for all sql_modes. I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have the third parameter to LPAD/RPAD optional.
2. CHR().
3. LENGTH() and LENGTHB().
Then we can continue with REPLACE, TRIM. SUBSTRING in no particular order, one patch per function.
Some general notes:
1. Unify *.yy files
Please put sql_mode dependent code into LEX rather than *.yy files
On 04/26/2017 11:45 AM, jerome brauge wrote: directly.
We're going to unify the two files soon. See here for details: MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
So instead of doing:
| SUBSTRING '(' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7); if ($$ == NULL) MYSQL_YYABORT; }
Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
| SUBSTRING '(' expr ',' expr ',' expr ')' { if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7))) MYSQL_YYABORT; }
where LEX::make_item_func_substr() is a new method, creating either Item_func_substr or Item_func_substr_oracle depending on sql_mode.
The same applies to all other functions, and to text_literal.
2. Don't use Item::is_null() when you call val_str() anyway.
There are two reasons that: - Performance is_null() involves the second val_str() call, which can be resource consuming in case of complex functions and subselect.
- Side effect Expressions can have some side effects. For example, in case if the argument is a stored function which counts how many times it was called and stores the counter in a user variable, or writes something to a table on every execution. If we do is_null() followed by val_str(), the side effect will done two times. We need one time only.
So instead of:
String *Item_func_replace_oracle::val_str(String *str) { String * res; if (res = replace_value(str, args[1]->is_null() ? &tmp_emtpystr : rgs[1]->val_str(&tmp_value2), args[2]->is_null() ? &tmp_emtpystr : rgs[2]->val_str(&tmp_value3))
return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Please do something like this:
String *Item_func_replace_oracle::val_str(String *str) { String * res; String *str1= args[1]->val_str(&tmp_value1); String *str2= args[2]->val_str(&tmp_value2); if (res= replace_value(str, str1 ? str1 : &tmp_emptystr, str2 ? str2 : &tmp_emptystr) return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Thanks!
Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : lundi 23 janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hi Jerome,
On 01/18/2017 01:22 PM, jerome brauge wrote: > Hello Alexander, > Sometime ago, when I ask you about plan for MDEV-10574, you
replied :
> >> The current plan is to do these transformations: >> >> 1. Transform Insert >> - insert values ("") -> insert values (null) >> >> 2. Transform Select >> >> - where v=x => (v <> "" and V=X) >> - where v is null => (v="" or v is null) >> >> We didn't plan to change functions yet. Thanks for bringing this up. >> We'll discuss this. > > I've done some tests just by changing : > - insert an Item_null instead of an Item_string when $1.length==0 > in rule text_literal of sql_yacc_ora.yy > - return null instead of an empty string in > Item_str_func::make_empty_result > > My first tests seem promising. > > Of course this solution does not allow to "see" the records > created with empty strings as null values. > I don't see the importance of being able to do this in a transparent way. > We can explicitly select these row by adding rtrim on these columns. > > If you are interesting, I can begin to write a test to evaluate > the coverage of this solution.
I'm afraid this will fix the behavior only for literals.
This script:
DROP TABLE t1; CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1 VALUES ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM t1 WHERE a=b;
returns 0 for both SELECT queries in Oracle, and returns 1 for both SELECT queries in MariaDB, Your approach will fix the first SELECT only. The second will still return 1.
For now, I'm not sure how to do this correctly. Some ways possible:
1. We could add new classes, e.g. Item_func_eq_oracle as a pair for Item_func_eq, which will handle both empty strings and NULLs inside their val_int() implementations.
2. Or there could be some after-processing on the created Item tree which will replace all things like Item_func_eq to Item_cond_and with Item_func_eq and Item_func_null_predicate passed as arguments.
Currently I'm inclined to #1, because we don't have a good mechanism to clone items to implement #2 correctly. We'd need such cloning to pass arguments both to Item_func_eq and Item_func_null_predicate correctly. Some places in the code don't expect that the same arguments can be passed to two different Item_func instances.
On the other hand, implementing #1 will need more changes in the optimizer.
We were going to think on this later.
> > Best regard. > Jérôme. >
Hello Jerome, On 05/02/2017 11:53 AM, jerome brauge wrote:
Hello Alexander, Thanks for the review. I have already moved code of text_literal into lex in my last patch (empty_strings.diff). I will change it to use your new method.
I have pushed a patch adding THD::make_string_literal() to bb-10.2-ext. Please pull to the latest commit. Instead of this code: Item *param_3= new (thd->mem_root) Item_string(thd, param_1->collation.collation, " ", 1); please use: Item *param_3= thd->make_string_literal(" ", 1, MY_REPERTOIRE_ASCII); After fixing this line, please create a pull request for this patch. Don't forget to mention that you share the code under either BSD or MCA. Please use this task in commit comments: MDEV-12658 Make the third parameter to LPAD and RPAD optional (I just created it) Thank you very much!
Do you have news from Monty for the patch on stacktrace as notes ?
Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mardi 2 mai 2017 09:39 À : jerome brauge Cc : MariaDB Developers Objet : Re: bb-10.2-ext / rpad lpad with 2 args
Hi Jerome,
Thanks for your contribution! The patch generally looks good.
This piece of code is not fully correct though:
+ Item *param_3= new (thd->mem_root) Item_string(thd, + param_1->collation.collation + " ", 1);
There are two problems here:
1. param_1->collation.collation is not precise at this point. It will become valid later, after param_1 calls its fix_fields().
2. In case is @@character_set_connections is UCS2, UTF16 OR UTF32 it will not work fine, as space occupies two bytes (in UCS2 and UTF16) or four bytes (in UTF32). We need to emulate the same behavior like when the user types space as the last argument, so conversion from @@character_set_client to @@character_set_connection is needed.
To make this work correctly, we need to repeat this logic from sql_yacc.yy:
text_literal: TEXT_STRING { LEX_CSTRING tmp; CHARSET_INFO *cs_con= thd->variables.collation_connection; CHARSET_INFO *cs_cli= thd->variables.character_set_client; uint repertoire= $1.repertoire(cs_cli); if (thd->charset_is_collation_connection || (repertoire == MY_REPERTOIRE_ASCII && my_charset_is_ascii_based(cs_con))) tmp= $1; else { LEX_STRING to; if (thd->convert_string(&to, cs_con, $1.str, $1.length, cs_cli)) MYSQL_YYABORT; tmp.str= to.str; tmp.length= to.length; } $$= new (thd->mem_root) Item_string(thd, tmp.str, tmp.length, cs_con, DERIVATION_COERCIBLE, repertoire); if ($$ == NULL) MYSQL_YYABORT; }
I'm going to move this code from sql_yacc.yy to a new method in THD:
Item_string *THD::make_item_string(const char *str, size_t length) const;
so you can reuse it in your patch.
I'm starting working on it right now and will tell you when it's ready.
Thanks.
Hi Alexander, This is the patch for rpad/lpad with 2 args (call of make_empty_string() in val_str() is to prepare MDEV-10574).
Regard, Jérôme.
PS : I share this code under terms of MCA.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mercredi 26 avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hello Jerome,
Thanks for working on this!
On 04/25/2017 11:42 PM, jerome brauge wrote:
Hello Alexander, Some times ago, we have had the below discussion. I made a patch following my first idea (it's enough from our point of view) : replace empty string by null in literals and in make_empty_string. I think this patch can be complementary to your suggestion of adding an Item_func_eq_oracle (for data inserted in default sql_mode)
This patch also contains following some functions changes: - replace ('ab','a',null) returns 'a' instead null - trim / ltrim / rtrim returns null when input only contains spaces - lpad /rpad can accept 2 args (default padding char is space) and if length equals to 0, returns null -substring : if start index is equal to 0, act as if it's equals to 1 -add a function chr() as a synonym of char() (but accept only one arg) -add a function lengthb() as a synonym of lentgh() -change the semantic of length for oracle (where length() is the character length and not the byte length)
What do you think about this patch ?
(Perhaps I have to make 2 patchs, one for empty string, and one for functions changes with separate test file for each modified function)
Right, can you please split the patch into pieces? I suggest even more pieces than two. One patch for one function. I'll create MDEVs, one MDEV per function and put examples how Oracle works.
Can we do it in the following order:
1. LPAD / RPAD - making the third parameter optional. This extension will be good for all sql_modes. I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have the third parameter to LPAD/RPAD optional.
2. CHR().
3. LENGTH() and LENGTHB().
Then we can continue with REPLACE, TRIM. SUBSTRING in no particular order, one patch per function.
Some general notes:
1. Unify *.yy files
Please put sql_mode dependent code into LEX rather than *.yy files
On 04/26/2017 11:45 AM, jerome brauge wrote: directly.
We're going to unify the two files soon. See here for details: MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
So instead of doing:
| SUBSTRING '(' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7); if ($$ == NULL) MYSQL_YYABORT; }
Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
| SUBSTRING '(' expr ',' expr ',' expr ')' { if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7))) MYSQL_YYABORT; }
where LEX::make_item_func_substr() is a new method, creating either Item_func_substr or Item_func_substr_oracle depending on sql_mode.
The same applies to all other functions, and to text_literal.
2. Don't use Item::is_null() when you call val_str() anyway.
There are two reasons that: - Performance is_null() involves the second val_str() call, which can be resource consuming in case of complex functions and subselect.
- Side effect Expressions can have some side effects. For example, in case if the argument is a stored function which counts how many times it was called and stores the counter in a user variable, or writes something to a table on every execution. If we do is_null() followed by val_str(), the side effect will done two times. We need one time only.
So instead of:
String *Item_func_replace_oracle::val_str(String *str) { String * res; if (res = replace_value(str, args[1]->is_null() ? &tmp_emtpystr : rgs[1]->val_str(&tmp_value2), args[2]->is_null() ? &tmp_emtpystr : rgs[2]->val_str(&tmp_value3))
return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Please do something like this:
String *Item_func_replace_oracle::val_str(String *str) { String * res; String *str1= args[1]->val_str(&tmp_value1); String *str2= args[2]->val_str(&tmp_value2); if (res= replace_value(str, str1 ? str1 : &tmp_emptystr, str2 ? str2 : &tmp_emptystr) return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Thanks!
Best regards, Jérôme.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : lundi 23 janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hi Jerome,
On 01/18/2017 01:22 PM, jerome brauge wrote: > Hello Alexander, > Sometime ago, when I ask you about plan for MDEV-10574, you
replied :
> >> The current plan is to do these transformations: >> >> 1. Transform Insert >> - insert values ("") -> insert values (null) >> >> 2. Transform Select >> >> - where v=x => (v <> "" and V=X) >> - where v is null => (v="" or v is null) >> >> We didn't plan to change functions yet. Thanks for bringing this up. >> We'll discuss this. > > I've done some tests just by changing : > - insert an Item_null instead of an Item_string when $1.length==0 > in rule text_literal of sql_yacc_ora.yy > - return null instead of an empty string in > Item_str_func::make_empty_result > > My first tests seem promising. > > Of course this solution does not allow to "see" the records > created with empty strings as null values. > I don't see the importance of being able to do this in a transparent way. > We can explicitly select these row by adding rtrim on these columns. > > If you are interesting, I can begin to write a test to evaluate > the coverage of this solution.
I'm afraid this will fix the behavior only for literals.
This script:
DROP TABLE t1; CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1 VALUES ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM t1 WHERE a=b;
returns 0 for both SELECT queries in Oracle, and returns 1 for both SELECT queries in MariaDB, Your approach will fix the first SELECT only. The second will still return 1.
For now, I'm not sure how to do this correctly. Some ways possible:
1. We could add new classes, e.g. Item_func_eq_oracle as a pair for Item_func_eq, which will handle both empty strings and NULLs inside their val_int() implementations.
2. Or there could be some after-processing on the created Item tree which will replace all things like Item_func_eq to Item_cond_and with Item_func_eq and Item_func_null_predicate passed as arguments.
Currently I'm inclined to #1, because we don't have a good mechanism to clone items to implement #2 correctly. We'd need such cloning to pass arguments both to Item_func_eq and Item_func_null_predicate correctly. Some places in the code don't expect that the same arguments can be passed to two different Item_func instances.
On the other hand, implementing #1 will need more changes in the optimizer.
We were going to think on this later.
> > Best regard. > Jérôme. >
participants (2)
-
Alexander Barkov
-
jerome brauge