[Maria-developers] String functions have their item->maybe_null=true unnecessarily, blocking optimizations
Hi Alexander, I was looking at https://jira.mariadb.org/browse/MDEV-11893 and found this piece of code: bool Item_str_func::fix_fields(THD *thd, Item **ref) { bool res= Item_func::fix_fields(thd, ref); /* In Item_str_func::check_well_formed_result() we may set null_value flag on the same condition as in test() below. */ maybe_null= maybe_null || thd->is_strict_mode(); return res; } This has been introduced by this cset: https://github.com/MariaDB/server/commit/af22eb35e577ef17226faf662f2cffc4705... Which says: Add Item_str_func::fix_fields() implementation, and set maybe_null to TRUE if we are in the SQL mode that requires some functions to return null even if they normally do not. The patch has only one example of CHAR() function doing that. At the same time, not having NOT-NULL attribute disallows query opimizations (see MDEV-11893 for an example). I think we should have this maybe_null= maybe_null || thd->is_strict_mode(); logic on a case-by-case basis only. I wanted to check with you - are there any other known string functions that, as the patch puts it "return null even if they normally do not"? BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Hello Sergey, On 05/15/2017 04:45 PM, Sergey Petrunia wrote:
Hi Alexander,
I was looking at
https://jira.mariadb.org/browse/MDEV-11893
and found this piece of code:
bool Item_str_func::fix_fields(THD *thd, Item **ref) { bool res= Item_func::fix_fields(thd, ref); /* In Item_str_func::check_well_formed_result() we may set null_value flag on the same condition as in test() below. */ maybe_null= maybe_null || thd->is_strict_mode(); return res; }
This has been introduced by this cset:
https://github.com/MariaDB/server/commit/af22eb35e577ef17226faf662f2cffc4705...
Which says:
Add Item_str_func::fix_fields() implementation, and set maybe_null to TRUE if we are in the SQL mode that requires some functions to return null even if they normally do not.
The patch has only one example of CHAR() function doing that.
At the same time, not having NOT-NULL attribute disallows query opimizations (see MDEV-11893 for an example). I think we should have this
maybe_null= maybe_null || thd->is_strict_mode();
logic on a case-by-case basis only. I wanted to check with you - are there any other known string functions that, as the patch puts it "return null even if they normally do not"?
The idea behind this code is to return: - A well-formed result, or NULL in strict mode. - A well-formed result, or a fixed result in non-strict mode. If val_str() sees that some bad byte sequences were fixed to question marks, then it returns NULL in strict mode, or the fixed string in non-strict mode. There are more functions except CHAR() that can return NULL or not-NULL depending on strict mode. The most important is Item_func_conv_charset. It's val_str() uses String_copier_for_item::copy_with_warn(), whose result depends on strict mode. The important thing about Item_func_conv_charset is that it can be automatically generated during fix_fields() whenever a character set conversion is needed. See Item::safe_charset_converter(). So it can appear even if it did not exists in the query typed by the user. For example: CREATE OR REPLACE TABLE t1 (a VARBINARY(1)); INSERT INTO t1 VALUES (0xFF); SELECT CONCAT(_utf8 'a' COLLATE utf8_general_ci, a) FROM t1; The SELECT query is automatically replaced to: SELECT CONCAT(_utf8 'a' COLLATE utf8_general_ci, CONVERT(a USING utf8)) FROM t1; For every Item_func_xxx it is possible to track down recursively during fix_fields() if the result is going to be well-formed for all possible argument values. In many cases it will also depend on the current @@character_set_connection value. I'm not sure how much efforts a precise solution will need. But in the example in MDEV-11893, it's clear that the result is going to be well-formed, because: - A datetime field converted to string always returns a result with ASCII repertoire, which is well-formed. - A string literal consisting of ASCII characters is also well-formed - A concatenation of two ASCII-repertoire results is always well-formed, even if character set conversion is needed. We could start with checking repertoire. If it's MY_REPERTOIRE_ASCII, then the result should be well-formed. It will at least cover the cases like MDEV-11893.
BR Sergei
participants (2)
-
Alexander Barkov
-
Sergey Petrunia