Hello Sergei, Thanks for your review. Please find a new version here: https://github.com/MariaDB/server/commits/bb-10.4-bar-MDEV-27744-v2 I moved pure refacoting into separate patches, they are in the same branch: MDEV-31184 Remove parser tokens DECODE_MARIADB_SYM and DECODE_ORACLE_SYM MDEV-31187 Add class Sql_mode_save_for_frm_handling So now the main patch is much smaller. On 4/27/23 2:44 AM, Sergei Golubchik wrote:
Hi, Alexander,
A question: if the frm will *always* be created in the canonical, non-ORACLE mode and always parsed in the same mode, will we even need these schema-qualified native functions?
okay, the schema-qualified trick will help to avoid creating REGEX_REPLACE_ORACLE and so on. But is it needed to fix MDEV-27744?
Unfortunately I did not check how much of the patch can be removed to fix only MDEV-27744. My intent was to do everything at the same time: 1. Fix MDEV-27744 2. Prepare infrustructure to avoid adding mew XXX_ORACLE functions 3. Make the code symmetric for oracle_schema and mariadb_schema for simplicity of the patch. 4. Make it possible to call MariaDB functions from inside Oracle stored procedures 5. Make SHOW CREATE TABLE and SHOW CREATE VIEW display MariaDB versions of sql_mode-dependent functions with the mariadb_schema prefix if the current sql_mode is ORACLE. I think this is needed to avoid confusion for the user. But after that if we display MariaDB functions using mariadb_schema, we also need to parse mariadb_schema. Now that we have a customer bug "MDEV-29095 REGEXP_REPLACE ...", I think it's better not to minimize the patch to MDEV-27744. This symmetric solution is more reliable for my opinion.
More comments below
On Apr 27, Alexander Barkov wrote:
revision-id: d67c3f88883 (mariadb-10.3.33-61-gd67c3f88883) parent(s): 7355f7b1f5c author: Alexander Barkov committer: Alexander Barkov timestamp: 2022-04-22 15:35:16 +0400 message:
<cut>
diff --git a/sql/unireg.cc b/sql/unireg.cc index efbb5e773b4..3b37944afa5 100644 --- a/sql/unireg.cc +++ b/sql/unireg.cc @@ -652,6 +649,7 @@ static bool pack_expression(String *buf, Virtual_column_info *vcol, static bool pack_vcols(String *buf, List<Create_field> &create_fields, List<Virtual_column_info> *check_constraint_list) { + Sql_mode_save_for_frm_handling sql_mode_save(current_thd);
well, it was done above pack_vcols(), because the caller had access to the thd. Why not to keep it that way?
I added a THD* parameter to pack_vcols. I think when we have a code line this: change_sql_mode(); call_just_one_function(); restore_sql_mode(); then change/restore should go inside call_just_one_function(). This looks more readable for me. But I won't insist. Please suggest.
List_iterator<Create_field> it(create_fields); Create_field *field;
diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc index f41414f8ae9..0f5c7519d07 100644 --- a/sql/item_cmpfunc.cc +++ b/sql/item_cmpfunc.cc @@ -3292,9 +3292,24 @@ void Item_func_case_simple::print(String *str, enum_query_type query_type) }
+Item_func_decode_oracle * +Item_func_decode_oracle::create(THD *thd, const LEX_CSTRING &name, + List<Item> *item_list) +{ + if (unlikely(!item_list || item_list->elements < 3)) + { + wrong_param_count_error(oracle_schema_ref.name(), name); + return NULL; + } + return new (thd->mem_root) Item_func_decode_oracle(thd, *item_list); +}
why? items are normally created in the Create_xxx::create* methods, defined in item_create.cc. Why did you delegate that to static methods in Item_func_xxx ?
I thought it would be more future proof. It would allow to use very short templates for Create_func_xxx by passing Item_func_xxx as a template parameter. But you're right, this is a separate issue. I removed these Item_func_xxx::create() helper methods for now.
diff --git a/sql/mysqld.h b/sql/mysqld.h index d40e1d170d0..54069c3cd5a 100644 --- a/sql/mysqld.h +++ b/sql/mysqld.h @@ -741,12 +741,25 @@ enum enum_query_type QT_ITEM_SUBSELECT_ID_ONLY,
QT_SHOW_SELECT_NUMBER= (1<<10), + + //QT_ITEM_IDENT_DISABLE_DB_TABLE_NAMES= (1 <<11), -- this is taken in 10.5 + + /// Print sql_mode-dependent functions with the schema qualifier + /// even if the currently implied (by sql_mode) schema is equal to + /// to the function schema, e.g. mariadb_schema.concat('a'). + QT_ITEM_FUNC_FORCE_SCHEMA_NAME= (1 << 12), + + /// Print for FRM file. Focus on parse-back. + /// e.g. VIEW expressions and virtual column expressions + QT_FOR_FRM= (1 << 13), + /// This is used for EXPLAIN EXTENDED extra warnings / Be more detailed /// Be more detailed than QT_EXPLAIN. /// Perhaps we should eventually include QT_ITEM_IDENT_SKIP_CURRENT_DATABASE /// here, as it would give better readable results QT_EXPLAIN_EXTENDED= QT_TO_SYSTEM_CHARSET| - QT_SHOW_SELECT_NUMBER, + QT_SHOW_SELECT_NUMBER| + QT_ITEM_FUNC_FORCE_SCHEMA_NAME,
don't do this. let's try to keep the behavior *exactly as before* for people who don't switch between sql_modes all the time
on the second thought. why do you need QT_ITEM_FUNC_FORCE_SCHEMA_NAME at all? Besides EXPLAIN EXTENDED (where we'd better not use it), it's only used in generating view's sql for I_S. It's not supposed to be used for generating views, so why would it need QT_ITEM_FUNC_FORCE_SCHEMA_NAME? May be it can be deleted altogether?
Fixed. I removed QT_ITEM_FUNC_FORCE_SCHEMA_NAME. Now the patch is only adding QT_FOR_FRM.
// If an expression is constant, print the expression, not the value // it evaluates to. Should be used for error messages, so that they diff --git a/sql/sql_class.h b/sql/sql_class.h index 8268ebe0bcd..6b988938f0b 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -6589,6 +6589,50 @@ class Sql_mode_save sql_mode_t old_mode; // SQL mode saved at construction time. };
+ +/* + Save the current sql_mode. Switch off sql_mode flags which can prevent + normal parsing of VIEWs, expressions in generated columns. + Restore the old sql_mode on destructor. +*/ +class Sql_mode_save_for_frm_handling: public Sql_mode_save +{ +public: + Sql_mode_save_for_frm_handling(THD *thd) + :Sql_mode_save(thd) + { + /* + - MODE_REAL_AS_FLOAT affect only CREATE TABLE parsing + + MODE_PIPES_AS_CONCAT affect expression parsing + + MODE_ANSI_QUOTES affect expression parsing + + MODE_IGNORE_SPACE affect expression parsing + - MODE_IGNORE_BAD_TABLE_OPTIONS affect only CREATE/ALTER TABLE parsing + * MODE_ONLY_FULL_GROUP_BY affect execution + * MODE_NO_UNSIGNED_SUBTRACTION affect execution + - MODE_NO_DIR_IN_CREATE affect table creation only + - MODE_POSTGRESQL compounded from other modes + - MODE_ORACLE affects Item creation (e.g for CONCAT)
should be '+' for MODE_ORACLE, not '-', right?
Fixed.
diff --git a/sql/sql_schema.h b/sql/sql_schema.h index 7c8f284d526..36914e36990 100644 --- a/sql/sql_schema.h +++ b/sql/sql_schema.h @@ -33,6 +36,34 @@ class Schema { return src; } + + + /** + Find the native function builder associated with a given function name. + @param thd The current thread + @param name The native function name + @return The native function builder associated with the name, or NULL + */ + virtual Create_func *find_native_function_builder(THD *thd, + const LEX_CSTRING &name) + const; + /** + Find a native function builder, return an error if not found, + build an Item otherwise. + */ + Item *make_item_func_call_native(THD *thd, + const Lex_ident_sys &name, + List<Item> *args) const; + + // Builders for native SQL function with a special syntax in sql_yacc.yy + virtual Item *make_item_func_substr(THD *thd, + const Lex_substring_spec_st &spec) const; + + virtual Item *make_item_func_trim(THD *thd, const Lex_trim_st &spec) const; + virtual Item *make_item_func_replace(THD *thd, + Item *subj, + Item *find, + Item *replace) const;
Why no Lex_replace_spec_st ?
The intent for adding Lex_substring_spec_st and Lex_trim_spec_st was to move the grammar into separate rules: substring_operands: trim_operands: REPLACE operands on the other hand have a very simple notation: | REPLACE '(' expr ',' expr ',' expr ')' I didn't find useful to add Lex_replace_spec_st.
/* For now we have *hard-coded* compatibility schemas: schema_mariadb, schema_oracle, schema_maxdb. @@ -66,5 +97,6 @@ class Schema
extern Schema mariadb_schema; +extern const Schema &oracle_schema_ref;
??? why did you declare mariadb_schema extern, but oracle_schema_ref only via a const ref?
The internal structure of class Schema_oracle is visible only inside sql_schema.cc. I did not want to expose it to sql_schema.h. It looks like compatibility modes like ORACLE can in the future very easily go into a new type of plugins - compatibility plugins, and oracle_schema_ref plays a role of the entry point of such a plugin. So I made it this way intentionally, to avoid Schema_oracle specific methods or members to be visible outside.
#endif // SQL_SCHEMA_H_INCLUDED diff --git a/sql/item_create.h b/sql/item_create.h index 894e9777b8d..3fadaecb090 100644 --- a/sql/item_create.h +++ b/sql/item_create.h @@ -215,9 +205,48 @@ struct Native_func_registry Create_func *builder; };
+ +class Native_functions_hash: public HASH +{ +public: + Native_functions_hash() + { + bzero(this, sizeof(*this)); + } + ~Native_functions_hash() + { + /* + No automatic free. + The the upper level code should call cleanup() explicitly.
why is that?
As discussed on slack - instances of this class are global variables and they destruct at the server shutdown time only. At this point mysys can already be deinitialized.
Item *create_func_dyncol_create(THD *thd, List<DYNCALL_CREATE_DEF> &list); diff --git a/sql/sql_schema.cc b/sql/sql_schema.cc index 0bf4a63c2f8..e2993f51c6d 100644 --- a/sql/sql_schema.cc +++ b/sql/sql_schema.cc @@ -78,3 +89,86 @@ Schema *Schema::find_implied(THD *thd) return &maxdb_schema; return &mariadb_schema; } + + +Create_func * +Schema::find_native_function_builder(THD *thd, const LEX_CSTRING &name) const +{ + return native_functions_hash.find(thd, name); +} + + +Create_func * +Schema_oracle::find_native_function_builder(THD *thd, + const LEX_CSTRING &name) const +{ + Create_func *create= native_functions_hash_oracle_overrides.find(thd, name); + if (create) + return create; + return native_functions_hash.find(thd, name);
may be populate native_functions_hash_oracle will all names to do only one lookup here? this will be used often in oracle mode
Done.
+} + + +Item *Schema::make_item_func_call_native(THD *thd, + const Lex_ident_sys &name, + List<Item> *args) const +{ + Create_func *builder= find_native_function_builder(thd, name); + if (builder) + return builder->create_func(thd, &name, args); + my_error(ER_FUNCTION_NOT_DEFINED, MYF(0), name.str); + return NULL; +} + + +Item *Schema::make_item_func_trim(THD *thd, const Lex_trim_st &spec) const +{ + return spec.make_item_func_trim_std(thd);
why does trim is constructed via helper in spec, but substr and replace are not?
SUBSTR and TRIM are constructed using helpers: virtual Item *make_item_func_substr(THD *thd, const Lex_substring_spec_st &spec) const; virtual Item *make_item_func_trim(THD *thd, const Lex_trim_st &spec) const; REPLACE is not - because it does not have a helper (explained above). <cut>
diff --git a/sql/item_func.h b/sql/item_func.h index 754b1cd1eb2..52bf7f661d1 100644 --- a/sql/item_func.h +++ b/sql/item_func.h
+ void print_sql_mode_dependent_name(String *to, enum_query_type query_type, + const Schema &schema, + const char *function_name) const + { + static const LEX_CSTRING oracle= {STRING_WITH_LEN("oracle")};
can it be oracle_schema.name() instead?
Unfortunately, it can not. oracle_schema.name() is "oracle_schema", it's not "oracle". Btw, this code does not exists any more. I changed a few Item_func_xxx_oracle classes to do it like this: void print(String *str, enum_query_type query_type) { if (query_type & QT_FOR_FRM) str->append(STRING_WITH_LEN("lpad_oracle")); else print_sql_mode_qualified_name(str, query_type); print_args_parenthesized(str, query_type); } Oh, by the way, it's better to add a comment here if (query_type & QT_FOR_FRM) { // 10.3 FRM downgrade compatibility str->append(STRING_WITH_LEN("lpad_oracle")); } I find this way more clear to follow. This way we can see which functions support backward FRM compatibility, and which functions do not. Item_func_regexp_replace_oracle will not support. But there is now a similar code in here: void Item_func_trim::print(String *str, enum_query_type query_type) { LEX_CSTRING suffix= {STRING_WITH_LEN("_oracle")}; Now it handles both sql_mode=DEFAULT and sql_mode=ORACLE. Should I split out Oracle-specific code into a new overriding methods Item_func_trim_oracle::print()? So it will pass "trim_oracle" as a parameter to a shared function, while Item_func_trim::print() won't have any references to "oracle" in the body.
diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy index 3770488f3f4..e0bafc75a36 100644 --- a/sql/sql_yacc.yy +++ b/sql/sql_yacc.yy @@ -11235,6 +11236,34 @@ function_call_generic: if (unlikely(!($$= Lex->make_item_func_call_generic(thd, &$1, &$3, &$5, $7)))) MYSQL_YYABORT; } + | ident_cli '.' REPLACE '(' expr ',' expr ',' expr ')' + { + if (unlikely(!($$= Lex->make_item_func_replace(thd, $1, $3, + $5, $7, $9)))) + MYSQL_YYABORT; + } + | ident_cli '.' SUBSTRING '(' substring_operands ')' + { + if (unlikely(!($$= Lex->make_item_func_substr(thd, $1, $3, $5)))) + MYSQL_YYABORT; + } + | ident_cli '.' TRIM '(' trim_operands ')' + { + if (unlikely(!($$= $5.make_item_func_trim(thd, $1, $3)))) + MYSQL_YYABORT; + } + /* + We don't add a qualified syntax for TRIM_ORACLE here, + as this syntax is not absolutely required: + SELECT mariadb_schema.TRIM_ORACLE(..); + What absolutely required is only: + SELECT mariadb_schema.TRIM(..); + Adding a qualified syntax for TRIM_ORACLE would be tricky because + it is a non-reserved keyword. To avoid new shift/reduce conflicts + it would require grammar changes, like introducing a new rule + ident_step2_cli (which would include everything that ident_cli + includes but TRIM_ORACLE).
Did you add a qualified syntax for any other xxx_ORACLE functions?
Functions defined through "const Native_func_registry func_array[]" support qualifiers automatically. All this work: select mariadb_schema.decode_oracle(1,1,'11'); select mariadb_schema.substr_oracle('abcde',2,3); select mariadb_schema.replace_oracle('abc','a','b'); select oracle_schema.decode_oracle(1,1,'11'); select oracle_schema.substr_oracle('abcde',2,3); select oracle_schema.replace_oracle('abc','a','b'); TRIM_ORACLE is different - it's a keyword defined in lex.h. It does not support qualifiers automatically. And I don't think it needs to.
+ */ ;
fulltext_options: diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc index 7f1b362d91d..6b0a351f76e 100644 --- a/sql/sql_lex.cc +++ b/sql/sql_lex.cc @@ -2008,7 +2023,59 @@ bool Lex_input_stream::get_7bit_or_8bit_ident(THD *thd, uchar *last_char) }
+/* + Resolve special SQL functions that have a qualified syntax in sql_yacc.yy. + These functions are not listed in the native function registry + because of a special syntax, or a reserved keyword: + + mariadb_schema.SUBSTRING('a' FROM 1 FOR 2) -- Special syntax + mariadb_schema.TRIM(BOTH ' ' FROM 'a') -- Special syntax + mariadb_schema.REPLACE('a','b','c') -- Verb keyword +*/ + +int Lex_input_stream::find_keyword_qualified_special_func(Lex_ident_cli_st *str, + uint length) const +{ + /* + There are many other special functions, see the following grammar rules: + function_call_keyword + function_call_nonkeyword + Here we resolve only those that have a qualified syntax to handle + different behavior in different @@sql_mode settings. + + Other special functions do not work in qualified context: + SELECT mariadb_schema.year(now()); -- Function year is not defined + SELECT mariadb_schema.now(); -- Function now is not defined + + We don't resolve TRIM_ORACLE here, because it does not have + a qualified syntax yet. Search for "trim_operands" in sql_yacc.yy + to find more comments. + */ + static LEX_CSTRING funcs[]= + { + {STRING_WITH_LEN("SUBSTRING")}, + {STRING_WITH_LEN("SUBSTR")}, + {STRING_WITH_LEN("TRIM")}, + {STRING_WITH_LEN("REPLACE")} + }; + + int tokval= find_keyword(str, length, true); + if (!tokval) + return 0; + for (size_t i= 0; i < array_elements(funcs); i++) + { + CHARSET_INFO *cs= system_charset_info; + if (!cs->coll->strnncollsp(cs,
may be strcasecmp? or with cs=latin1? you know the repertoire is ascii, so you can cut corners here.
+ (const uchar *) m_tok_start, length, + (const uchar *) funcs[i].str, funcs[i].length)) + return tokval; + }
Fixed to additionally check length equality to avoid non-ASCII letters matching. Note, strcasecmp() cannot be used here because one side it not a 0-terminated string.
+ return 0; +} + + -int Lex_input_stream::scan_ident_sysvar(THD *thd, Lex_ident_cli_st *str) +int Lex_input_stream::scan_ident_common(THD *thd, Lex_ident_cli_st *str, + Ident_mode mode) { uchar last_char; uint length; @@ -8035,30 +8133,49 @@ bool LEX::add_grant_command(THD *thd, enum_sql_command sql_command_arg, }
+const Schema * +LEX::find_func_schema_by_name_or_error(const Lex_ident_sys &schema, + const Lex_ident_sys &func) +{ + Schema *res= Schema::find_by_name(schema); + if (res) + return res; + Database_qualified_name qname(schema, func); + my_error(ER_FUNCTION_NOT_DEFINED, MYF(0), ErrConvDQName(&qname).ptr());
eh, not quite. I can define a function with the same name as a native function in any db. E.g.
create test.replace(a int) returns int
and it'll be callable as test.replace(1) So ER_FUNCTION_NOT_DEFINED is wrong here.
But note that even though test.substr() is possible, it should not allow substr() special syntax.
Good catch, I overlooked this. From a glance, a workaround could be to use test.`substr`(). But it's better to provide backward compatibility. I don't have a quick idea how to fix this properly. If you have, please let me know. <cut>