Hi Jerome, On 05/04/2017 11:58 AM, jerome brauge wrote:
Hi Alexander,
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : jeudi 4 mai 2017 09:00 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility
Hi Jerome,
On 05/04/2017 10:28 AM, jerome brauge wrote:
Hello Alexander, Just 2 points: - CHR accept only one value (not a list)
I think this is not a problem to support a list.
- NCHAR_CS is a keyword, you can't specify the character set (it's value is specified at the database creation time)
Ah, I see. We don't have a configurable NCHAR yet. NCHAR always translated to utf8, for example:
CREATE TABLE t1 (a NCHAR(10)) -> CREATE TABLE t1 (a CHAR(10) CHARACTER SET utf8)
There are two options then:
1. Understand NCHAR_CS and translate it to utf8 too. This, in addition too CHR(..USING NCHAR_CS), will also make MariaDB understand this valid Oracle syntax:
DECLARE a VARCHAR(10) CHARACTER SET NCHAR_CS; BEGIN NULL; END;
Btw, do you know where I can find the full list of Oracle's grammar rules that can contain NCHAR_CS?
Unfortunately no. I only know: CHR(n USING NCHAR_CS), but there is an equivalent : NCHR TRANSLATE (char USING { CHAR_CS | NCHAR_CS } ) equivalent to TO_CHAR and TO_NCHAR
2. Don't add the USING clause in CHR for now.
It was my original choice ... We don't use NVARCHAR. For multilingual database, we use AL32UTF8 as database character set and VARCHAR2(x CHAR) for all char datatype (by default VARCHAR(x) means x bytes, not x chars). AL32UTF8 is Oracle's first recommended database character set. From my point of view, oracle nvarchar compatibility is not a priority.
Agreed. Hmm. One more things. Translating CHR() to CHAR(...USING xxx) won't be actually correct, because it will preserve the information about the character set, but loose the information about the collation and will change the collation of the result: - from @@collation_database - to the default collation of @@character_set_database This would be wrong. So in this scenario: CREATE DATABASE db1 CHARACTER SET utf8 COLLATE utf8_unicode_ci; USE db1; CREATE TABLE t1 AS SELECT CHR(100) AS c; SHOW CREATE TABLE t1; DROP TABLE t1; it must create a column of type "VARCHAR(n) CHARACTER SET utf8 COLLATE utf8_unicode_ci" on both master and slave. With translation to CHAR(...USING xxx) it would erroneously make "VARCHAR(n) CHARACTER SET utf8 COLLATE utf8_general_ci" in slave. Therefore it must be done with a new class Item_func_chr, indeed, like you did. Please wait, I'll apply your patch and add a replication test, and send you the patch back.
Regards, Jérôme
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : jeudi 4 mai 2017 06:16 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility
Hello Jerome,
Hello Alexander, This is the patch for CHR(). Function CHR(n) returns the character having the binary equivalent to n in
On 04/27/2017 08:43 AM, jerome brauge wrote: the database character set. Returns null on null input.
Looking at Oracle's manual:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions019.h
tm
I think that:
1. We should support the optional USING clause. CHR(x USING y) could be just translated to CHAR(x USING y).
2. When no USING is specified, CHR(x) could be translated to CHAR(x USING database_character_set)
MariaDB's CHAR() uses &my_charset_bin as the default character set, which means VARBINARY result. Oracle's CHR() uses the database character set as the default character set.
It seems that both can be done in sql_yacc_ora.yy, and no new class is needed.
I suggest we do the following:
1. Add this into lex.h:
{ "CHR", SYM(CHR_SYM)},
2. Add this into sql_yacc.yy and sql_yacc_ora.yy:
%token CHR_SYM
This is to make this keyword non-reserved.
3. Add CHR_SYM into sql_yacc.yy, into the keyword_sp rule.
4. Add CHR_SYM into sql_yacc_ora.yy, into the keyword_sp_not_data_type rule.
This is to make this keyword non-reserved for sql_mode=ORACLE.
5. Add this into sql_yacc_ora.yy, into function_call_keyword
CHR_SYM '(' expr_list ')' { $$= new (thd->mem_root) Item_func_char(thd, *$3, thd->variables.collation_database); if ($$ == NULL) MYSQL_YYABORT; } | CHR_SYM '(' expr_list USING charset_name ')' { $$= new (thd->mem_root) Item_func_char(thd, *$3, $5); if ($$ == NULL) MYSQL_YYABORT; }
In the meanwhile I'll fix two bugs that I found reading the code in Item_func_char:
MDEV-12680 Crash when CREATE TABLE t1 AS SELECT char(100 using filename) MDEV-12681 Wrong VIEW results for CHAR(0xDF USING latin1)
Thanks!
Regard, Jérôme.
PS : I share this code under terms of MCA.
-----Message d'origine----- De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : mercredi 26 avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
Hello Jerome,
Thanks for working on this!
On 04/25/2017 11:42 PM, jerome brauge wrote: > Hello Alexander, > Some times ago, we have had the below discussion. > I made a patch following my first idea (it's enough from our point > of
replace empty string by null in literals and in make_empty_string. > I think this patch can be complementary to your suggestion of > adding an Item_func_eq_oracle (for data inserted in default > sql_mode) > > This patch also contains following some functions changes: > - replace ('ab','a',null) returns 'a' instead null > - trim / ltrim / rtrim returns null when input only contains > spaces > - lpad /rpad can accept 2 args (default padding char is space) and > if length equals to 0, returns null -substring : if start index is > equal to 0, act as if it's equals to 1 -add a function chr() as a > synonym of > char() (but accept only one arg) -add a function lengthb() as a > synonym of lentgh() -change the semantic of length for oracle > (where > length() is the character length and not the byte length) > > What do you think about this patch ? > > (Perhaps I have to make 2 patchs, one for empty string, and one > for functions changes with separate test file for each modified > function)
Right, can you please split the patch into pieces? I suggest even more pieces than two. One patch for one function. I'll create MDEVs, one MDEV per function and put examples how Oracle works.
Can we do it in the following order:
1. LPAD / RPAD - making the third parameter optional. This extension will be good for all sql_modes. I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have the third parameter to LPAD/RPAD optional.
2. CHR().
3. LENGTH() and LENGTHB().
Then we can continue with REPLACE, TRIM. SUBSTRING in no particular order, one patch per function.
Some general notes:
1. Unify *.yy files
Please put sql_mode dependent code into LEX rather than *.yy files
view) : directly.
We're going to unify the two files soon. See here for details: MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
So instead of doing:
| SUBSTRING '(' expr ',' expr ',' expr ')' { $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7); if ($$ == NULL) MYSQL_YYABORT; }
Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
| SUBSTRING '(' expr ',' expr ',' expr ')' { if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7))) MYSQL_YYABORT; }
where LEX::make_item_func_substr() is a new method, creating either Item_func_substr or Item_func_substr_oracle depending on sql_mode.
The same applies to all other functions, and to text_literal.
2. Don't use Item::is_null() when you call val_str() anyway.
There are two reasons that: - Performance is_null() involves the second val_str() call, which can be resource consuming in case of complex functions and subselect.
- Side effect Expressions can have some side effects. For example, in case if the argument is a stored function which counts how many times it was called and stores the counter in a user variable, or writes something to a table on every execution. If we do is_null() followed by val_str(), the side effect will done two times. We need one time only.
So instead of:
String *Item_func_replace_oracle::val_str(String *str) { String * res; if (res = replace_value(str, args[1]->is_null() ? &tmp_emtpystr : rgs[1]->val_str(&tmp_value2), args[2]->is_null() ? &tmp_emtpystr : rgs[2]->val_str(&tmp_value3))
return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Please do something like this:
String *Item_func_replace_oracle::val_str(String *str) { String * res; String *str1= args[1]->val_str(&tmp_value1); String *str2= args[2]->val_str(&tmp_value2); if (res= replace_value(str, str1 ? str1 : &tmp_emptystr, str2 ? str2 : &tmp_emptystr) return (res->length() == 0) ? make_empty_result () : res; else return 0; }
Thanks!
> > Best regards, > Jérôme. > >> -----Message d'origine----- >> De : Alexander Barkov [mailto:bar@mariadb.org] Envoyé : lundi 23 >> janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet : >> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574 >> >> Hi Jerome, >> >> On 01/18/2017 01:22 PM, jerome brauge wrote: >>> Hello Alexander, >>> Sometime ago, when I ask you about plan for MDEV-10574, you replied : >>> >>>> The current plan is to do these transformations: >>>> >>>> 1. Transform Insert >>>> - insert values ("") -> insert values (null) >>>> >>>> 2. Transform Select >>>> >>>> - where v=x => (v <> "" and V=X) >>>> - where v is null => (v="" or v is null) >>>> >>>> We didn't plan to change functions yet. Thanks for bringing this up. >>>> We'll discuss this. >>> >>> I've done some tests just by changing : >>> - insert an Item_null instead of an Item_string when >>> $1.length==0 in rule text_literal of sql_yacc_ora.yy >>> - return null instead of an empty string in >>> Item_str_func::make_empty_result >>> >>> My first tests seem promising. >>> >>> Of course this solution does not allow to "see" the records >>> created with >> empty strings as null values. >>> I don't see the importance of being able to do this in a >>> transparent way. >>> We can explicitly select these row by adding rtrim on these
columns. >>>>>>> >>>>>>> If you are interesting, I can begin to write a test to evaluate >>>>>>> the coverage of >>>>>> this solution. >>>>>> >>>>>> I'm afraid this will fix the behavior only for literals. >>>>>> >>>>>> This script: >>>>>> >>>>>> DROP TABLE t1; >>>>>> CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1 >>>> VALUES >>>>>> ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM >>>>>> t1 WHERE a=b; >>>>>> >>>>>> returns 0 for both SELECT queries in Oracle, and returns 1 for >>>>>> both SELECT queries in MariaDB, Your approach will fix the first SELECT only. >>>>>> The second will still return 1. >>>>>> >>>>>> >>>>>> For now, I'm not sure how to do this correctly. Some ways possible: >>>>>> >>>>>> 1. We could add new classes, e.g. Item_func_eq_oracle as a pair >>>>>> for Item_func_eq, which will handle both empty strings and NULLs >>>>>> inside their >>>>>> val_int() implementations. >>>>>> >>>>>> 2. Or there could be some after-processing on the created Item >>>>>> tree which will replace all things like Item_func_eq to >>>>>> Item_cond_and with Item_func_eq and Item_func_null_predicate >>>>>> passed as >> arguments. >>>>>> >>>>>> Currently I'm inclined to #1, because we don't have a good >>>>>> mechanism to clone items to implement #2 correctly. We'd need >>>>>> such cloning to pass arguments both to Item_func_eq and >>>>>> Item_func_null_predicate >>>> correctly. >>>>>> Some places in the code don't expect that the same arguments can >>>>>> be passed to two different Item_func instances. >>>>>> >>>>>> On the other hand, implementing #1 will need more changes in the >>>>>> optimizer. >>>>>> >>>>>> We were going to think on this later. >>>>>> >>>>>>> >>>>>>> Best regard. >>>>>>> Jérôme. >>>>>>>