[Maria-developers] Review: [MDEV-7062] parser refactoring: don't store field properties in LEX
Hi Sergei,
Sergei Golubchik edited comment on MDEV-7062 at 11/10/14 11:40 AM: -------------------------------------------------------------------
http://lists.askmonty.org/pipermail/commits/2014-November/006908.html
{noformat} From: serg@mariadb.org To: commits@mariadb.org Subject: [Commits] 2c904b9: parser cleanup: don't store field properties in LEX, use Create_field directly Date: Mon, 10 Nov 2014 10:26:48 +0100 {noformat}
Please find my comments below:
[Commits] 2c904b9: parser cleanup: don't store field properties in LEX, use Create_field directly serg at mariadb.org serg at mariadb.org Mon Nov 10 11:26:48 EET 2014
Previous message: [Commits] Rev 4344: MDEV-6179: dynamic columns functions/cast()/convert() doesn't play nice with CREATE/ALTER TABLE in lp:~maria-captains/maria/5.5 Next message: [Commits] Rev 4345: MDEV-6862 "#error <my_config.h>" and third-party libraries in lp:~maria-captains/maria/5.5 Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
revision-id: 2c904b937b5ef7fbec60ebb9cb514a501c5bd8e4 parent(s): f59588b9112965849986c45fcd150d9abed74063 committer: Sergei Golubchik branch nick: maria timestamp: 2014-11-09 22:33:17 +0100 message:
parser cleanup: don't store field properties in LEX, use Create_field directly
length/dec/charset are still in LEX, because they're also used for CAST and dynamic columns.
also 1. fix "MDEV-7041 COLLATION(CAST('a' AS CHAR BINARY)) returns a wrong result" 2. allow BINARY modifier in stored function RETURN clause 3. allow "COLLATION without CHARSET" in SP/SF (parameters, RETURN, DECLARE) 4. print correct variable name in error messages for stored routine parameters
A very nice clean up. I only found it a little bit not easy to track that all Lex and Create_field attributes get initialized and copied to each other properly. I suggest to go a little bit further and get rid from Lex->dec, Lex->length and Lex->charset at all, and write the attributes directly to the destination as soon as they get parsed. See a proposed addon patch attached.
diff --git a/mysql-test/r/cast.result b/mysql-test/r/cast.result index c81af13..8ad4568 100644 --- a/mysql-test/r/cast.result +++ b/mysql-test/r/cast.result @@ -796,3 +796,32 @@ DATE("foo") NULL Warnings: Warning 1292 Incorrect datetime value: 'foo' +create table t1 (a int, b char(5) as (cast("a" as char(10) binary) + a) ); +show create table t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `a` int(11) DEFAULT NULL, + `b` char(5) AS (cast("a" as char(10) binary) + a) VIRTUAL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +drop table t1; +select collation(cast("a" as char(10) binary)); +collation(cast("a" as char(10) binary)) +latin1_bin +select collation(cast("a" as char(10) charset utf8 binary)); +collation(cast("a" as char(10) charset utf8 binary)) +utf8_bin +select collation(cast("a" as char(10) ascii binary)); +collation(cast("a" as char(10) ascii binary)) +latin1_bin +select collation(cast("a" as char(10) unicode binary)); +collation(cast("a" as char(10) unicode binary)) +ucs2_bin
UNICODE is an alias for UCS2, which is not always compiled. Please move ucs2 related tests to ctype_ucs.test. <skip>
diff --git a/mysql-test/t/cast.test b/mysql-test/t/cast.test index b6c37ca..aa43ee6 100644 --- a/mysql-test/t/cast.test +++ b/mysql-test/t/cast.test @@ -456,3 +456,21 @@ SELECT CAST(TIME('10:20:30') AS DATE) + INTERVAL 1 DAY; SET SQL_MODE=ALLOW_INVALID_DATES; SELECT DATE("foo");
+# +# CAST and field definition using same fields in LEX +# +create table t1 (a int, b char(5) as (cast("a" as char(10) binary) + a) ); +show create table t1; +drop table t1; + +# +# CAST (... BINARY) +# +select collation(cast("a" as char(10) binary)); +select collation(cast("a" as char(10) charset utf8 binary)); +select collation(cast("a" as char(10) ascii binary)); +select collation(cast("a" as char(10) unicode binary)); +select collation(cast("a" as char(10) binary charset utf8)); +select collation(cast("a" as char(10) binary ascii)); +select collation(cast("a" as char(10) binary unicode));
More UCS2. You removed this code in field.cc:
- if (fld_length != NULL) - { - errno= 0; - length= strtoul(fld_length, NULL, 10); - if ((errno != 0) || (length > MAX_FIELD_BLOBLENGTH)) - { - my_error(ER_TOO_BIG_DISPLAYWIDTH, MYF(0), fld_name, MAX_FIELD_BLOBLENGTH); - DBUG_RETURN(TRUE); - } - - if (length == 0) - fld_length= NULL; /* purecov: inspected */ - }
and added this in sql_yacc.yy:
+ if (length) + { + int err; + last_field->length= (ulong)my_strtoll10(length, NULL, &err); + if (err) + last_field->length= ~0UL; // safety + } + else + last_field->length= 0;
I have a feeling that on a 32bit platforms it will stop sending errors on attempt to create a blob larger that 4Gb. last_field->length is ulong, which is 32bit is a 32 bit platform, and it gets assigned to a 64 bit value.
Hi, Alexander! On Nov 13, Alexander Barkov wrote:
[Commits] 2c904b9: parser cleanup: don't store field properties in LEX, use Create_field directly serg at mariadb.org serg at mariadb.org Mon Nov 10 11:26:48 EET 2014
Previous message: [Commits] Rev 4344: MDEV-6179: dynamic columns functions/cast()/convert() doesn't play nice with CREATE/ALTER TABLE in lp:~maria-captains/maria/5.5 Next message: [Commits] Rev 4345: MDEV-6862 "#error <my_config.h>" and third-party libraries in lp:~maria-captains/maria/5.5 Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
revision-id: 2c904b937b5ef7fbec60ebb9cb514a501c5bd8e4 parent(s): f59588b9112965849986c45fcd150d9abed74063 committer: Sergei Golubchik branch nick: maria timestamp: 2014-11-09 22:33:17 +0100 message:
parser cleanup: don't store field properties in LEX, use Create_field directly
length/dec/charset are still in LEX, because they're also used for CAST and dynamic columns.
also 1. fix "MDEV-7041 COLLATION(CAST('a' AS CHAR BINARY)) returns a wrong result" 2. allow BINARY modifier in stored function RETURN clause 3. allow "COLLATION without CHARSET" in SP/SF (parameters, RETURN, DECLARE) 4. print correct variable name in error messages for stored routine parameters
A very nice clean up.
I only found it a little bit not easy to track that all Lex and Create_field attributes get initialized and copied to each other properly.
I suggest to go a little bit further and get rid from Lex->dec, Lex->length and Lex->charset at all, and write the attributes directly to the destination as soon as they get parsed. See a proposed addon patch attached.
I don't like very much all that LEX::last_field thing, but it existed already and I reused it. But I'd rather not introduce more LEX::last_attr or LEX::last_whatever members unless absolutely necessary. I generally prefer to pass data via the parser stack. That's what I did in fixing this dyncol bug. As I've found in this cleanup, and as you've seen in review, dynamic columns, CAST, and field declarations all use LEX::length/etc - so using CAST or dyncols in a virtual column will overwrite field's declaration. Recently Kolbe found this as well and reported a bug for 5.5, MDEV-6179. I've fixed it by creating LEX_TYPE structure - I didn't remove length/etc from LEX in 5.5, but if this patch will be merged in 10.1, it could easily allow to remove them from LEX. So, I would rather postpone any further LEX changes until 10.0 merge. Then I can show you the next cleanup patch. In fact, if I'd have checked Jira bugs before doing this cleanup, I would've started from 5.5 bugfix and then would've done the rest on top of it. But now I'm facing an "interesting" merge ahead :) (LEX_TYPE in 5.5 is basically your Create_type_attributes_st without m_length_was_set_explicitly - as it's not needed in 5.5 - and with enum_field_types type).
diff --git a/mysql-test/r/cast.result b/mysql-test/r/cast.result index c81af13..8ad4568 100644 --- a/mysql-test/r/cast.result +++ b/mysql-test/r/cast.result @@ -796,3 +796,32 @@ DATE("foo") NULL Warnings: Warning 1292 Incorrect datetime value: 'foo' +create table t1 (a int, b char(5) as (cast("a" as char(10) binary) + a) ); +show create table t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `a` int(11) DEFAULT NULL, + `b` char(5) AS (cast("a" as char(10) binary) + a) VIRTUAL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +drop table t1; +select collation(cast("a" as char(10) binary)); +collation(cast("a" as char(10) binary)) +latin1_bin +select collation(cast("a" as char(10) charset utf8 binary)); +collation(cast("a" as char(10) charset utf8 binary)) +utf8_bin +select collation(cast("a" as char(10) ascii binary)); +collation(cast("a" as char(10) ascii binary)) +latin1_bin +select collation(cast("a" as char(10) unicode binary)); +collation(cast("a" as char(10) unicode binary)) +ucs2_bin
UNICODE is an alias for UCS2, which is not always compiled. Please move ucs2 related tests to ctype_ucs.test.
Sure.
<skip> You removed this code in field.cc:
- if (fld_length != NULL) - { - errno= 0; - length= strtoul(fld_length, NULL, 10); - if ((errno != 0) || (length > MAX_FIELD_BLOBLENGTH)) - { - my_error(ER_TOO_BIG_DISPLAYWIDTH, MYF(0), fld_name, MAX_FIELD_BLOBLENGTH); - DBUG_RETURN(TRUE); - } - - if (length == 0) - fld_length= NULL; /* purecov: inspected */ - }
and added this in sql_yacc.yy:
+ if (length) + { + int err; + last_field->length= (ulong)my_strtoll10(length, NULL, &err); + if (err) + last_field->length= ~0UL; // safety + } + else + last_field->length= 0;
I have a feeling that on a 32bit platforms it will stop sending errors on attempt to create a blob larger that 4Gb. last_field->length is ulong, which is 32bit is a 32 bit platform, and it gets assigned to a 64 bit value.
Indeed, good catch. I'll fix it. Regards, Sergei
Hi Sergei, On 11/16/2014 02:30 AM, Sergei Golubchik wrote:
Hi, Alexander!
On Nov 13, Alexander Barkov wrote:
[Commits] 2c904b9: parser cleanup: don't store field properties in LEX, use Create_field directly serg at mariadb.org serg at mariadb.org Mon Nov 10 11:26:48 EET 2014
Previous message: [Commits] Rev 4344: MDEV-6179: dynamic columns functions/cast()/convert() doesn't play nice with CREATE/ALTER TABLE in lp:~maria-captains/maria/5.5 Next message: [Commits] Rev 4345: MDEV-6862 "#error <my_config.h>" and third-party libraries in lp:~maria-captains/maria/5.5 Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
revision-id: 2c904b937b5ef7fbec60ebb9cb514a501c5bd8e4 parent(s): f59588b9112965849986c45fcd150d9abed74063 committer: Sergei Golubchik branch nick: maria timestamp: 2014-11-09 22:33:17 +0100 message:
parser cleanup: don't store field properties in LEX, use Create_field directly
length/dec/charset are still in LEX, because they're also used for CAST and dynamic columns.
also 1. fix "MDEV-7041 COLLATION(CAST('a' AS CHAR BINARY)) returns a wrong result" 2. allow BINARY modifier in stored function RETURN clause 3. allow "COLLATION without CHARSET" in SP/SF (parameters, RETURN, DECLARE) 4. print correct variable name in error messages for stored routine parameters
A very nice clean up.
I only found it a little bit not easy to track that all Lex and Create_field attributes get initialized and copied to each other properly.
I suggest to go a little bit further and get rid from Lex->dec, Lex->length and Lex->charset at all, and write the attributes directly to the destination as soon as they get parsed. See a proposed addon patch attached.
I don't like very much all that LEX::last_field thing, but it existed already and I reused it. But I'd rather not introduce more LEX::last_attr or LEX::last_whatever members unless absolutely necessary. I generally prefer to pass data via the parser stack.
If we can move this to the %union in sql_yacc.yy, then it will be even better.
That's what I did in fixing this dyncol bug. As I've found in this cleanup, and as you've seen in review, dynamic columns, CAST, and field declarations all use LEX::length/etc - so using CAST or dyncols in a virtual column will overwrite field's declaration. Recently Kolbe found this as well and reported a bug for 5.5, MDEV-6179. I've fixed it by creating LEX_TYPE structure - I didn't remove length/etc from LEX in 5.5, but if this patch will be merged in 10.1, it could easily allow to remove them from LEX.
So, I would rather postpone any further LEX changes until 10.0 merge. Then I can show you the next cleanup patch. In fact, if I'd have checked Jira bugs before doing this cleanup, I would've started from 5.5 bugfix and then would've done the rest on top of it. But now I'm facing an "interesting" merge ahead :)
Will wait for the next patch. Can we take in account the following thing while working on this: I think that to implement the data type plugin task properly, Create_field should end up in a class hierarchy, similar to what we have in Field. The hierarchy will have one class per group of similar types, grouped by having a particular data type attribute, approximately like this: Create_field |-Create_field_num (for tinyint,smallint,int,bigint) |-Create_field_num_with scale (for float, double, decimal) |-Create_field_str (for char and varchar) |-Create_field_blob |-Create_field_enum |-Create_field_geom |-Create_field_with_scale (for temporal data types) Create_field will have a set of virtual methods: bool Create_field::set_length(uint32 length); bool Create_field::set_precision_and_scale(uint32 precision, uint32 scale); and so on. Create_field will also have: Field *Create_field::make_field(); So all data specific attributes (like intervals for enum, packlen for blob, temporal precision, etc) will be hidden inside a particular Field_xxx and its corresponding Create_field_xxx. Field_xxx and it corresponding Create_field_xxx should share the same structures to store the same attributes. And it would be nice to encapsulate attributes into methods. uint32 field->field_length() const; instead of uint32 field->field_length;
(LEX_TYPE in 5.5 is basically your Create_type_attributes_st without m_length_was_set_explicitly - as it's not needed in 5.5 - and with enum_field_types type).
diff --git a/mysql-test/r/cast.result b/mysql-test/r/cast.result index c81af13..8ad4568 100644 --- a/mysql-test/r/cast.result +++ b/mysql-test/r/cast.result @@ -796,3 +796,32 @@ DATE("foo") NULL Warnings: Warning 1292 Incorrect datetime value: 'foo' +create table t1 (a int, b char(5) as (cast("a" as char(10) binary) + a) ); +show create table t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `a` int(11) DEFAULT NULL, + `b` char(5) AS (cast("a" as char(10) binary) + a) VIRTUAL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +drop table t1; +select collation(cast("a" as char(10) binary)); +collation(cast("a" as char(10) binary)) +latin1_bin +select collation(cast("a" as char(10) charset utf8 binary)); +collation(cast("a" as char(10) charset utf8 binary)) +utf8_bin +select collation(cast("a" as char(10) ascii binary)); +collation(cast("a" as char(10) ascii binary)) +latin1_bin +select collation(cast("a" as char(10) unicode binary)); +collation(cast("a" as char(10) unicode binary)) +ucs2_bin
UNICODE is an alias for UCS2, which is not always compiled. Please move ucs2 related tests to ctype_ucs.test.
Sure.
<skip> You removed this code in field.cc:
- if (fld_length != NULL) - { - errno= 0; - length= strtoul(fld_length, NULL, 10); - if ((errno != 0) || (length > MAX_FIELD_BLOBLENGTH)) - { - my_error(ER_TOO_BIG_DISPLAYWIDTH, MYF(0), fld_name, MAX_FIELD_BLOBLENGTH); - DBUG_RETURN(TRUE); - } - - if (length == 0) - fld_length= NULL; /* purecov: inspected */ - }
and added this in sql_yacc.yy:
+ if (length) + { + int err; + last_field->length= (ulong)my_strtoll10(length, NULL, &err); + if (err) + last_field->length= ~0UL; // safety + } + else + last_field->length= 0;
I have a feeling that on a 32bit platforms it will stop sending errors on attempt to create a blob larger that 4Gb. last_field->length is ulong, which is 32bit is a 32 bit platform, and it gets assigned to a 64 bit value.
Indeed, good catch. I'll fix it.
Okey to push, with the tests and with strtoul/my_strtoll10 problem fixed.
Regards, Sergei
participants (2)
-
Alexander Barkov
-
Sergei Golubchik