[Maria-developers] Sprint 10.0: MDEV-8205 timediff returns null when comparing decimal time to time string value
Hi Sergei, I'm not sure what to do with this:
MariaDB [test]> select -> timediff('2014-01-01 00:00:00' , '2014-01-01 01:00:00' ), -> timediff(20140101000000.000 , 20140101010000.000 ), -> timediff(20140101000000.000 , '2014-01-01 01:00:00' ), -> datediff('2014-01-01 00:00:00' , '2014-01-02 01:00:00' ), -> datediff(20140101000000.000 , 20140102010000.000 ), -> datediff(20140101000000.000 , '2014-01-02 01:00:00' )\G *************************** 1. row *************************** timediff('2014-01-01 00:00:00' , '2014-01-01 01:00:00' ): -01:00:00 timediff(20140101000000.000 , 20140101010000.000 ): -01:00:00.000 timediff(20140101000000.000 , '2014-01-01 01:00:00' ): NULL datediff('2014-01-01 00:00:00' , '2014-01-02 01:00:00' ): -1 datediff(20140101000000.000 , 20140102010000.000 ): -1 datediff(20140101000000.000 , '2014-01-02 01:00:00' ): -1 1 row in set (0.03 sec)
Notice, it returns NULL in one case. The problem is in this piece of the code:
if (args[0]->get_time(&l_time1) || args[1]->get_time(&l_time2) || l_time1.time_type != l_time2.time_type) return (null_value= 1);
get_time() forces decimal-to-time conversion to truncate the date part and return MYSQL_TIMESTAMP_TIME. get_time() does not force string-to-time conversion to truncate the date part. It still returns the full MYSQL_TIMESTAMP_DATETIME. I'm afraid a lot of the code rely in this behavior. If we fix string-to-time conversion, mtr will start to fail in other cases. Any ideas what to do?
Hi, Alexander! On Jun 11, Alexander Barkov wrote:
The problem is in this piece of the code:
if (args[0]->get_time(&l_time1) || args[1]->get_time(&l_time2) || l_time1.time_type != l_time2.time_type) return (null_value= 1);
get_time() forces decimal-to-time conversion to truncate the date part and return MYSQL_TIMESTAMP_TIME.
get_time() does not force string-to-time conversion to truncate the date part. It still returns the full MYSQL_TIMESTAMP_DATETIME.
I'm afraid a lot of the code rely in this behavior. If we fix string-to-time conversion, mtr will start to fail in other cases.
First, this is clearly a bug that get_time behaves differently for numbers and strings. One could get different results for WHERE time_column = 'string' and WHERE time_column = number Something needs to be fixed. We cannot fix string-to-time conversion to return MYSQL_TIMESTAMP_TIME in all cases. When a string literal is parsed we don't always know in advance whether it's a datetime or a time, so we can only indicate *preference* based on the context (preference matters, as it tells how to parse ambiguous strings like "10:10:10"). So, I've fixed number_to_time to keep the date part: --- a/sql-common/my_time.c +++ b/sql-common/my_time.c @@ -1319,9 +1319,6 @@ int number_to_time(my_bool neg, ulonglong nr, ulong sec_pa TIME_INVALID_DATES, was_cut) < 0) return -1; - ltime->year= ltime->month= ltime->day= 0; - ltime->time_type= MYSQL_TIMESTAMP_TIME; - *was_cut= MYSQL_TIME_NOTE_TRUNCATED; return 0; } The only test that changed results was type_time_hires, and I think the change was ok. Regards, Sergei
Hi Sergei, On 06/12/2015 01:00 PM, Sergei Golubchik wrote:
Hi, Alexander!
On Jun 11, Alexander Barkov wrote:
The problem is in this piece of the code:
if (args[0]->get_time(&l_time1) || args[1]->get_time(&l_time2) || l_time1.time_type != l_time2.time_type) return (null_value= 1);
get_time() forces decimal-to-time conversion to truncate the date part and return MYSQL_TIMESTAMP_TIME.
get_time() does not force string-to-time conversion to truncate the date part. It still returns the full MYSQL_TIMESTAMP_DATETIME.
I'm afraid a lot of the code rely in this behavior. If we fix string-to-time conversion, mtr will start to fail in other cases.
First, this is clearly a bug that get_time behaves differently for numbers and strings. One could get different results for
WHERE time_column = 'string'
and
WHERE time_column = number
Something needs to be fixed. We cannot fix string-to-time conversion to return MYSQL_TIMESTAMP_TIME in all cases. When a string literal is parsed we don't always know in advance whether it's a datetime or a time, so we can only indicate *preference* based on the context (preference matters, as it tells how to parse ambiguous strings like "10:10:10").
I'd say '10:10:10' should be unambiguously treated as time. Colon is never used to delimit date parts. Is it? Date parts are usually delimited by as follows: '01-01-01' '01.01.01' '01/01/01' But this is a kind of separate issue. Would you like me to create a task for this? In other cases it could still use the *preference* flag: '01#01#01' - could be either date or time, depending on the flag.
So, I've fixed number_to_time to keep the date part:
For some reasons I even didn't consider fixing number_to_time() as an option. Thanks. This fixes the bug. I pushed this change with a test added.
--- a/sql-common/my_time.c +++ b/sql-common/my_time.c @@ -1319,9 +1319,6 @@ int number_to_time(my_bool neg, ulonglong nr, ulong sec_pa TIME_INVALID_DATES, was_cut) < 0) return -1;
- ltime->year= ltime->month= ltime->day= 0; - ltime->time_type= MYSQL_TIMESTAMP_TIME; - *was_cut= MYSQL_TIME_NOTE_TRUNCATED; return 0; }
The only test that changed results was type_time_hires, and I think the change was ok.
Regards, Sergei
Hi, Alexander! On Jun 15, Alexander Barkov wrote:
(preference matters, as it tells how to parse ambiguous strings like "10:10:10").
I'd say '10:10:10' should be unambiguously treated as time. Colon is never used to delimit date parts. Is it?
yes, I believe delimiters are pretty much ignored in our code. so any delimiter can be used anywhere.
Date parts are usually delimited by as follows: '01-01-01' '01.01.01' '01/01/01'
But this is a kind of separate issue. Would you like me to create a task for this?
The way it works now - after your patch - there's no much need for a "preference" flag. The only issue I've uncovered in testing was related to parsing strings with time preference. Like in WHERE time_column > '2010-12-11' The code is my_bool str_to_time(const char *str, uint length, MYSQL_TIME *l_time, ulonglong fuzzydate, MYSQL_TIME_STATUS *status) { ... /* Check first if this is a full TIMESTAMP */ if (length >= 12) { /* Probably full timestamp */ (void) str_to_datetime(str, length, l_time, (fuzzydate & ~TIME_TIME_ONLY) | TIME_DATETIME_ONLY, status); Which is very stupid, it decides solely on the string length. That is '2010-12-11' is parsed as a time (when there's time preference), but '10:11:12.123456' is parsed as a date (but fails and falls back to time). I would suggest to get rid of this ad hoc detection code (check the length, try and fall back, etc). And use a systematic approach based on patterns. Like patterns[]= { { 'yyyy-mm-dd' , parse_date }, { 'hh:mm:ss.uuuuuu', parse_time }, ... } Note, I wrote "like". I do not mean literally these patterns or string patterns whatsoever. I'd prefer something much faster. May be some compact "signature" number that describes the format, or may be a decision tree (where the string is parsed into an array of ints and then analyzed like three numbers? first is 4 digit? etc). Regards, Sergei
Hi Sergei, I created this task based on our discussion: MDEV-8322 Distinguish between time and date strings more carefully Please also see comments below: On 06/15/2015 12:49 PM, Sergei Golubchik wrote:
Hi, Alexander!
On Jun 15, Alexander Barkov wrote:
(preference matters, as it tells how to parse ambiguous strings like "10:10:10").
I'd say '10:10:10' should be unambiguously treated as time. Colon is never used to delimit date parts. Is it?
yes, I believe delimiters are pretty much ignored in our code. so any delimiter can be used anywhere.
Right, in MariaDB they are ignored. But I mean in real life nobody uses colons to separate date parts.
Date parts are usually delimited by as follows: '01-01-01' '01.01.01' '01/01/01'
But this is a kind of separate issue. Would you like me to create a task for this?
The way it works now - after your patch - there's no much need for a "preference" flag. The only issue I've uncovered in testing was related to parsing strings with time preference. Like in
WHERE time_column > '2010-12-11'
The code is
my_bool str_to_time(const char *str, uint length, MYSQL_TIME *l_time, ulonglong fuzzydate, MYSQL_TIME_STATUS *status) { ... /* Check first if this is a full TIMESTAMP */ if (length >= 12) { /* Probably full timestamp */ (void) str_to_datetime(str, length, l_time, (fuzzydate & ~TIME_TIME_ONLY) | TIME_DATETIME_ONLY, status);
Which is very stupid, it decides solely on the string length. That is '2010-12-11' is parsed as a time (when there's time preference), but '10:11:12.123456' is parsed as a date (but fails and falls back to time).
I would suggest to get rid of this ad hoc detection code (check the length, try and fall back, etc). And use a systematic approach based on patterns. Like
patterns[]= { { 'yyyy-mm-dd' , parse_date }, { 'hh:mm:ss.uuuuuu', parse_time }, ... }
Note, I wrote "like". I do not mean literally these patterns or string patterns whatsoever. I'd prefer something much faster. May be some compact "signature" number that describes the format, or may be a decision tree (where the string is parsed into an array of ints and then analyzed like three numbers? first is 4 digit? etc).
Regards, Sergei
participants (2)
-
Alexander Barkov
-
Sergei Golubchik