Hi Sergei, Thanks for your review. I'm sending a new version. I changed DATE_FORMAT() to determine its result character set and collation from args[1] (the "format" argument), instead of using collation_connection. I don't see anything special with DATE_FORMAT and think that we just forgot to fix it under terms of: WL#2649: Number-to-string conversions https://dev.mysql.com/worklog/task/?id=2649 The difference is very subtle and does not affect the most regular case: DATE_FORMAT(datetime_expr, 'format-string') This change allowed to reuse agg_arg_charsets_for_string_result(), which is what the other string functions with string input do in fix_length_and_dec(). It automatically makes DATE_TIME() work in a consistent way when the "format" argument is a numeric or a datetime expression. Please also see inline: On 12/09/2015 02:36 PM, Sergei Golubchik wrote:
Hi, Alexander!
On Dec 08, Alexander Barkov wrote:
Hi Sergei,
Please review a patch for MDEV-7055.
Thanks.
diff --git a/sql/item_timefunc.cc b/sql/item_timefunc.cc index 522004e..a2a6fff 100644 --- a/sql/item_timefunc.cc +++ b/sql/item_timefunc.cc @@ -447,6 +447,70 @@ static bool extract_date_time(DATE_TIME_FORMAT *format,
/** + A multi-byte safe helper class to read characters from a string. + + QQ: Serg: which file to put this new class in? + It can be helpful for some other purposes + (not only here in item_timefunc.cc) + I remember you don't like such things in sql_string.h :)
Thoughts: 1. I'd call it, like, string (or character) iterator 2. There's something like that in 5.6, you might want to see what API they use 3. Should rather be in C (inline, if possible, with a C++ wrapper, if you'd like) so that we can put it in a service along with CHARSET_INFO 4. Should rather be in CHARSET_INFO so that you could have fast bytewise iterator for simple charsets.
And yes, 3 and 4 contradict each other, to a certain extent :) That's because they're thoughts, not a complete how-to plan.
Discussed on IRC. <skip>
@@ -457,21 +521,29 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time, uint hours_i; uint weekday; ulong length; - const char *ptr, *end; + my_wc_t wc; + int chlen; + Wchar_reader reader(str->charset(), format->format.str, + format->format.length);
is str->charset() the character set of the format string?
It was not necessarily the case in the first version, with collation_connection. In the second version, with determining the result collation from args[1], it is.
- end= (ptr= format->format.str) + format->format.length; - for (; ptr != end ; ptr++) + for ( ; !reader.read(&wc, &chlen) ; )
You don't distinguish here between end of string and invalid character. Is that ok?
As agreed on IRC, I now changed it to convert bad bytes to question marks, to be inline with MDEV-6566 (which we implemented in 10.0). Note, I did not add warnings on bad bytes to avoid special code in DATE_FORMAT. Other functions, e.g. CONCAT('bad-byte-sequence') also do not warn. My patch for "MDEV-6643 Improve performance of string processing in the parser" will automatically add warnings into DATE_FORMAT(), together with all other functions.
diff --git a/sql/sql_string.cc b/sql/sql_string.cc index 885f53a..21fba79 100644 --- a/sql/sql_string.cc +++ b/sql/sql_string.cc @@ -547,8 +556,28 @@ bool String::append_with_prefill(const char *s,uint32 arg_length, t_length= full_length - arg_length; if (t_length > 0) { - bfill(Ptr+str_length, t_length, fill_char); - str_length=str_length + t_length; + if (charset()->mbminlen == 1) + { + /* + An ASCII string can be appended directly + to an ASCII-compatible string. This includes + multi-byte character sets, like utf8, sjis, etc. + */
but (mbminlen == 1) doesn't necessarily mean "ASCII-compatible", remember "filename" charset? I thought you've had an "ASCII-compatible" property somewhere in the CHARSET_INFO.
In this particular case it was not important, because the argument string can only consists of digits. But to be on the safe side I changed it to test the MY_CS_NONASCII flag instead. Thanks! <skip>