Re: [Maria-developers] Please review MDEV-7055 MySQL#74664 - InnoDB: Failing assertion ...

11 Dec 2015

Hi Sergei,

Thanks for your review. I'm sending a new version.

I changed DATE_FORMAT() to determine its result character set and 
collation from args[1] (the "format" argument), instead of using 
collation_connection.

I don't see anything special with DATE_FORMAT and think that we just
forgot to fix it under terms of:
WL#2649: Number-to-string conversions
https://dev.mysql.com/worklog/task/?id=2649

The difference is very subtle and does not affect the most regular case:
DATE_FORMAT(datetime_expr, 'format-string')

This change allowed to reuse agg_arg_charsets_for_string_result(),
which is what the other string functions with string input do in
fix_length_and_dec(). It automatically makes DATE_TIME() work in a
consistent way when the "format" argument is a numeric or a datetime
expression.


Please also see inline:

On 12/09/2015 02:36 PM, Sergei Golubchik wrote:
...
Hi, Alexander!
On Dec 08, Alexander Barkov wrote:
...
Hi Sergei,
Please review a patch for MDEV-7055.
Thanks.
...

diff --git a/sql/item_timefunc.cc b/sql/item_timefunc.cc
index 522004e..a2a6fff 100644
--- a/sql/item_timefunc.cc
+++ b/sql/item_timefunc.cc
@@ -447,6 +447,70 @@ static bool extract_date_time(DATE_TIME_FORMAT *format,
/**
+  A multi-byte safe helper class to read characters from a string.
+
+  QQ: Serg: which file to put this new class in?
+      It can be helpful for some other purposes
+      (not only here in item_timefunc.cc)
+      I remember you don't like such things in sql_string.h :)
Thoughts:
1. I'd call it, like, string (or character) iterator
2. There's something like that in 5.6, you might want to see what API
    they use
3. Should rather be in C (inline, if possible, with a C++ wrapper, if
    you'd like) so that we can put it in a service along with CHARSET_INFO
4. Should rather be in CHARSET_INFO so that you could have fast bytewise
    iterator for simple charsets.
And yes, 3 and 4 contradict each other, to a certain extent :)
That's because they're thoughts, not a complete how-to plan.
Discussed on IRC.

<skip>
...
...
@@ -457,21 +521,29 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
    uint hours_i;
    uint weekday;
    ulong length;
-  const char *ptr, *end;
+  my_wc_t wc;
+  int chlen;
+  Wchar_reader reader(str->charset(), format->format.str,
+                                      format->format.length);
is str->charset() the character set of the format string?
It was not necessarily the case in the first version,
with collation_connection.
In the second version, with determining the result collation from 
args[1], it is.
...
...
-  end= (ptr= format->format.str) + format->format.length;
-  for (; ptr != end ; ptr++)
+  for ( ; !reader.read(&wc, &chlen) ; )
You don't distinguish here between end of string and invalid character.
Is that ok?
As agreed on IRC, I now changed it to convert bad bytes to question
marks, to be inline with MDEV-6566  (which we implemented in 10.0).

Note, I did not add warnings on bad bytes to avoid special code in 
DATE_FORMAT. Other functions, e.g. CONCAT('bad-byte-sequence') also
do not warn.

My patch for "MDEV-6643 Improve performance of string processing in the 
parser" will automatically add warnings into DATE_FORMAT(), together 
with all other functions.
...
...
diff --git a/sql/sql_string.cc b/sql/sql_string.cc
index 885f53a..21fba79 100644
--- a/sql/sql_string.cc
+++ b/sql/sql_string.cc
@@ -547,8 +556,28 @@ bool String::append_with_prefill(const char *s,uint32 arg_length,
    t_length= full_length - arg_length;
    if (t_length > 0)
    {
-    bfill(Ptr+str_length, t_length, fill_char);
-    str_length=str_length + t_length;
+    if (charset()->mbminlen == 1)
+    {
+      /*
+        An ASCII string can be appended directly
+        to an ASCII-compatible string. This includes
+        multi-byte character sets, like utf8, sjis, etc.
+      */
but (mbminlen == 1) doesn't necessarily mean "ASCII-compatible",
remember "filename" charset? I thought you've had an "ASCII-compatible"
property somewhere in the CHARSET_INFO.
In this particular case it was not important, because the argument
string can only consists of digits. But to be on the safe side I
changed it to test the MY_CS_NONASCII flag instead.

Thanks!

<skip>