Re: [Maria-developers] 2467eb2d314: MDEV-27743 Remove Lex::charset
Hi, Alexander, I had only a couple of questions/comments, see below: On Mar 11, Alexander Barkov wrote:
revision-id: 2467eb2d314 (mariadb-10.6.1-330-g2467eb2d314) parent(s): 61b72ed3dde author: Alexander Barkov committer: Alexander Barkov timestamp: 2022-02-10 10:27:55 +0400 message:
MDEV-27743 Remove Lex::charset
This patch also fixes: MDEV-27690 Crash on `CHARACTER SET csname COLLATE DEFAULT` in column definition
how do you plan to fix it in 10.2-10.8 ?
diff --git a/mysql-test/main/ctype_collate_column.result b/mysql-test/main/ctype_collate_column.result new file mode 100644 index 00000000000..b20be5d3a48 --- /dev/null +++ b/mysql-test/main/ctype_collate_column.result @@ -0,0 +1,450 @@ +# +# Start of 10.9 tests +# +# +# MDEV-27743 Remove Lex::charset
1. Jira doesn't explain what user-visible changes one can expect. If there are none, it's not clear why you needed this big test case 2. the test is totally unreadable. Could you do a normal test instead? Like, put `select` instead of `execute immediate` and convert the output to a .test file?
+# +CREATE TABLE cs (cs VARCHAR(64) NOT NULL); +INSERT INTO cs (cs) VALUES +(''), +('CHARACTER SET latin1'), +('CHARACTER SET utf8mb4'); +CREATE TABLE cl0 (cl0 VARCHAR(64) NOT NULL); +INSERT INTO cl0 (cl0) VALUES +(''), +('BINARY'), +('COLLATE DEFAULT'), +('COLLATE latin1_bin'), +('COLLATE latin1_swedish_ci'), +('COLLATE utf8mb4_bin'), +('COLLATE utf8mb4_general_ci'); +CREATE TABLE cl1 (cl1 VARCHAR(64) NOT NULL); +INSERT INTO cl1 (cl1) VALUES +(''), +('COLLATE DEFAULT'), +('COLLATE latin1_bin'), +('COLLATE latin1_swedish_ci'), +('COLLATE utf8mb4_bin'), +('COLLATE utf8mb4_general_ci'); ... diff --git a/mysql-test/main/ctype_latin1.result b/mysql-test/main/ctype_latin1.result index 0b16d1cf6f2..70577f2c22c 100644 --- a/mysql-test/main/ctype_latin1.result +++ b/mysql-test/main/ctype_latin1.result @@ -8889,6 +8889,38 @@ a b 111 111 DROP TABLE t1; # +# MDEV-27690 Crash on `CHARACTER SET csname COLLATE DEFAULT` in column definition +# +CREATE TABLE t1 (a CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT); +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `a` char(10) DEFAULT NULL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +DROP TABLE t1; +SELECT CAST('a' AS CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT); +CAST('a' AS CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT) +a +CREATE TABLE t1 AS +SELECT CAST('a' AS CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT) AS c1; +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `c1` varchar(10) DEFAULT NULL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +DROP TABLE t1; +SELECT COLUMN_GET(COLUMN_CREATE(0, 'string'),0 AS CHAR CHARACTER SET latin1 COLLATE DEFAULT) AS c1; +c1 +string +CREATE TABLE t1 AS +SELECT COLUMN_GET(COLUMN_CREATE(0, 'string'),0 AS CHAR CHARACTER SET latin1 COLLATE DEFAULT) AS c1; +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `c1` longtext DEFAULT NULL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +DROP TABLE t1;
1. why is it in "10.2 tests" ? 2. shouldn't here be at least one test that COLLATE DEFAULT is not a no-op? Like CREATE TABLE (... COLLATE DEFAULT ...) COLLATE _non_default ? Or is it in your generated ctype_collate_column.test?
+# # End of 10.2 tests # # diff --git a/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test b/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test index 9cf0d7b4aff..dac6eab21e9 100644 --- a/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test +++ b/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test @@ -36,7 +36,7 @@ set default role test_role for root@localhost; drop role test_role; drop user test_user@localhost;
-alter table user add column default_role char(80) binary default '' not null +alter table user add column default_role char(80) default '' not null COLLATE utf8_general_ci
Heh, interesting that it worked before good catch
after is_role; alter table user add max_statement_time decimal(12,6) default 0 not null diff --git a/sql/structs.h b/sql/structs.h index 9a5006e8b47..db065c49931 100644 --- a/sql/structs.h +++ b/sql/structs.h @@ -599,11 +599,140 @@ class DDL_options: public DDL_options_st ... +class Lex_collation: public Lex_collation_st +{ +public: + Lex_collation(CHARSET_INFO *collation, Type type) + { + m_collation= collation; + m_type= type; + } + static Lex_collation national(bool bin_mod) + { + if (bin_mod) + return Lex_collation(&my_charset_utf8mb3_bin, TYPE_EXPLICIT); + return Lex_collation(&my_charset_utf8mb3_general_ci, TYPE_IMPLICIT);
utf8mb3? not utf8?
+ } +}; + + struct Lex_length_and_dec_st { -private: +protected: uint32 m_length; uint8 m_dec; + uint8 m_collation_type:2; bool m_has_explicit_length:1; bool m_has_explicit_dec:1; public: diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc index ae6ae9a0cc0..6ca10267187 100644 --- a/sql/sql_lex.cc +++ b/sql/sql_lex.cc @@ -11816,3 +11784,257 @@ bool SELECT_LEX_UNIT::explainable() const ... +CHARSET_INFO * +Lex_collation_st::resolved_to_character_set(CHARSET_INFO *def) const +{ + DBUG_ASSERT(def); + if (m_type != TYPE_CONTEXTUALLY_TYPED) + { + if (!m_collation) + return def; // Empty - not typed at all + return m_collation; // Explicitly typed + } + + // Contextually typed + DBUG_ASSERT(m_collation); + + if (m_collation == &my_charset_bin) // CHAR(10) BINARY + return find_bin_collation(def); + + if (m_collation == &my_charset_latin1) // CHAR(10) COLLATE DEFAULT + return find_default_collation(def);
Uhm. Could you create two dummy CHARSET_INFO variables for that instead? Like CHARSET_INFO binary_collaton, default_collation; no need to initialize them, but to have distinct values for binary/default and not reuse my_charset_bin and my_charset_latin1.
+ + /* + Non-binary and non-default contextually typed collation. + We don't have such yet - the parser cannot produce this. + But will have soon, e.g. "uca1400_as_ci". + */ + DBUG_ASSERT(0); + return NULL; +} + + ... diff --git a/sql/field.h b/sql/field.h index 2ed02b37cfd..fe5ba435202 100644 --- a/sql/field.h +++ b/sql/field.h @@ -5493,6 +5492,22 @@ class Column_definition: public Sql_alloc, { return compression_method_ptr; }
bool check_vcol_for_key(THD *thd) const; + + void set_lex_collation(const Lex_collation_st &lc) + { + charset= lc.collation(); + if (lc.is_contextually_typed_collation()) + flags|= BINCMP_FLAG; + else + flags&= ~BINCMP_FLAG;
Isn't COLLATE DEFAULT also contextually typed?
+ } + Lex_collation lex_collation() const + { + return Lex_collation(charset, + flags & BINCMP_FLAG ? + Lex_collation_st::TYPE_CONTEXTUALLY_TYPED : + Lex_collation_st::TYPE_IMPLICIT); + } };
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi Sergei, Thanks for your review. Please see comments below: On 3/14/22 2:50 AM, Sergei Golubchik wrote:
Hi, Alexander,
I had only a couple of questions/comments, see below:
On Mar 11, Alexander Barkov wrote:
revision-id: 2467eb2d314 (mariadb-10.6.1-330-g2467eb2d314) parent(s): 61b72ed3dde author: Alexander Barkov committer: Alexander Barkov timestamp: 2022-02-10 10:27:55 +0400 message:
MDEV-27743 Remove Lex::charset
This patch also fixes: MDEV-27690 Crash on `CHARACTER SET csname COLLATE DEFAULT` in column definition
how do you plan to fix it in 10.2-10.8 ?
I will push a separate simplified patch for 10.2-10.8 fixing this crash.
diff --git a/mysql-test/main/ctype_collate_column.result b/mysql-test/main/ctype_collate_column.result new file mode 100644 index 00000000000..b20be5d3a48 --- /dev/null +++ b/mysql-test/main/ctype_collate_column.result @@ -0,0 +1,450 @@ +# +# Start of 10.9 tests +# +# +# MDEV-27743 Remove Lex::charset
1. Jira doesn't explain what user-visible changes one can expect. If there are none, it's not clear why you needed this big test case
I have reported a few other problems that get fixed by MDEV-27743: MDEV-27853 MDEV-27782 MDEV-28067 The big test covers all these cases. Also, I'm now going to add scripts from these three MDEVs as well.
2. the test is totally unreadable. Could you do a normal test instead? Like, put `select` instead of `execute immediate` and convert the output to a .test file?
Will try.
# +# MDEV-27690 Crash on `CHARACTER SET csname COLLATE DEFAULT` in column definition +# +CREATE TABLE t1 (a CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT); +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `a` char(10) DEFAULT NULL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +DROP TABLE t1; +SELECT CAST('a' AS CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT); +CAST('a' AS CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT) +a +CREATE TABLE t1 AS +SELECT CAST('a' AS CHAR(10) CHARACTER SET latin1 COLLATE DEFAULT) AS c1; +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `c1` varchar(10) DEFAULT NULL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +DROP TABLE t1; +SELECT COLUMN_GET(COLUMN_CREATE(0, 'string'),0 AS CHAR CHARACTER SET latin1 COLLATE DEFAULT) AS c1; +c1 +string +CREATE TABLE t1 AS +SELECT COLUMN_GET(COLUMN_CREATE(0, 'string'),0 AS CHAR CHARACTER SET latin1 COLLATE DEFAULT) AS c1; +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `c1` longtext DEFAULT NULL +) ENGINE=MyISAM DEFAULT CHARSET=latin1 +DROP TABLE t1;
1. why is it in "10.2 tests" ?
The crash happens in all version. I'm going to push a simple 10.2 specific fix later, so I put the test to the proper place right now. Note, unlike MDEV-27690, I'm not going to fix these problems before 10.9: MDEV-27853 MDEV-27782 MDEV-28067 So tests for these MDEVs will go to the "10.9 tests" section.
2. shouldn't here be at least one test that COLLATE DEFAULT is not a no-op? Like CREATE TABLE (... COLLATE DEFAULT ...) COLLATE _non_default ? Or is it in your generated ctype_collate_column.test?
Yes, this combination is covered by ctype_collate_column.test
+# # End of 10.2 tests # # diff --git a/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test b/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test index 9cf0d7b4aff..dac6eab21e9 100644 --- a/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test +++ b/mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test @@ -36,7 +36,7 @@ set default role test_role for root@localhost; drop role test_role; drop user test_user@localhost;
-alter table user add column default_role char(80) binary default '' not null +alter table user add column default_role char(80) default '' not null COLLATE utf8_general_ci
Heh, interesting that it worked before good catch
This combination is also covered by ctype_collate_column.test
after is_role; alter table user add max_statement_time decimal(12,6) default 0 not null diff --git a/sql/structs.h b/sql/structs.h index 9a5006e8b47..db065c49931 100644 --- a/sql/structs.h +++ b/sql/structs.h @@ -599,11 +599,140 @@ class DDL_options: public DDL_options_st ... +class Lex_collation: public Lex_collation_st +{ +public: + Lex_collation(CHARSET_INFO *collation, Type type) + { + m_collation= collation; + m_type= type; + } + static Lex_collation national(bool bin_mod) + { + if (bin_mod) + return Lex_collation(&my_charset_utf8mb3_bin, TYPE_EXPLICIT); + return Lex_collation(&my_charset_utf8mb3_general_ci, TYPE_IMPLICIT);
utf8mb3? not utf8?
Right, we need to change it eventually. - either to hard-coded utf8mb4 - or according to what UTF8_IS_UTF8MB3 says I've created a new MDEV for this: MDEV-28068 Change NATIONAL from utf8mb3 to utf8mb4
+ } +}; + + struct Lex_length_and_dec_st { -private: +protected: uint32 m_length; uint8 m_dec; + uint8 m_collation_type:2; bool m_has_explicit_length:1; bool m_has_explicit_dec:1; public: diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc index ae6ae9a0cc0..6ca10267187 100644 --- a/sql/sql_lex.cc +++ b/sql/sql_lex.cc @@ -11816,3 +11784,257 @@ bool SELECT_LEX_UNIT::explainable() const ... +CHARSET_INFO * +Lex_collation_st::resolved_to_character_set(CHARSET_INFO *def) const +{ + DBUG_ASSERT(def); + if (m_type != TYPE_CONTEXTUALLY_TYPED) + { + if (!m_collation) + return def; // Empty - not typed at all + return m_collation; // Explicitly typed + } + + // Contextually typed + DBUG_ASSERT(m_collation); + + if (m_collation == &my_charset_bin) // CHAR(10) BINARY + return find_bin_collation(def); + + if (m_collation == &my_charset_latin1) // CHAR(10) COLLATE DEFAULT + return find_default_collation(def);
Uhm. Could you create two dummy CHARSET_INFO variables for that instead? Like
CHARSET_INFO binary_collaton, default_collation;
no need to initialize them, but to have distinct values for binary/default and not reuse my_charset_bin and my_charset_latin1.
Sounds good. It'll have for sure an advantage: easier to trace in gdb.
+ + /* + Non-binary and non-default contextually typed collation. + We don't have such yet - the parser cannot produce this. + But will have soon, e.g. "uca1400_as_ci". + */ + DBUG_ASSERT(0); + return NULL; +} + + ... diff --git a/sql/field.h b/sql/field.h index 2ed02b37cfd..fe5ba435202 100644 --- a/sql/field.h +++ b/sql/field.h @@ -5493,6 +5492,22 @@ class Column_definition: public Sql_alloc, { return compression_method_ptr; }
bool check_vcol_for_key(THD *thd) const; + + void set_lex_collation(const Lex_collation_st &lc) + { + charset= lc.collation(); + if (lc.is_contextually_typed_collation()) + flags|= BINCMP_FLAG; + else + flags&= ~BINCMP_FLAG;
Isn't COLLATE DEFAULT also contextually typed?
"COLLATE DEFAULT" and "COLLATE uca1400_as_ci" also reuse this flag. It should be renamed somehow. What about CONTEXTUAL_COLLATION_FLAG ? Note, it's defined in two places: In the server: include/mysql_com.h #define BINCMP_FLAG 131072U /* Intern: Used by sql_yacc */ In the client library: libmariadb/include/mariadb_com.h #define BINCMP_FLAG 131072 The client library defines, but does not actually use it. Should we rename it in the client library? Or remove it?
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi, Alexander, I'd suggest to rename it the way you want in the server, but keep it in libmariadb as is. It's apparently part of the API and may be (just may be) some clients use it. On Mar 15, Alexander Barkov wrote:
+ + void set_lex_collation(const Lex_collation_st &lc) + { + charset= lc.collation(); + if (lc.is_contextually_typed_collation()) + flags|= BINCMP_FLAG; + else + flags&= ~BINCMP_FLAG;
Isn't COLLATE DEFAULT also contextually typed?
"COLLATE DEFAULT" and "COLLATE uca1400_as_ci" also reuse this flag. It should be renamed somehow.
What about CONTEXTUAL_COLLATION_FLAG ?
Note, it's defined in two places:
In the server:
include/mysql_com.h #define BINCMP_FLAG 131072U /* Intern: Used by sql_yacc */
In the client library:
libmariadb/include/mariadb_com.h #define BINCMP_FLAG 131072
The client library defines, but does not actually use it. Should we rename it in the client library? Or remove it?
Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
participants (2)
-
Alexander Barkov
-
Sergei Golubchik