[Maria-developers] Sprint 10.0: MDEV-8205 timediff returns null when comparing decimal time to time string value

newer
[Maria-developers] GSoC Progress...

older
[Maria-developers] MDEV-7502...

Alexander Barkov

11 Jun 2015 11 Jun '15

4:20 a.m.

Hi Sergei, I'm not sure what to do with this:

...

MariaDB [test]> select -> timediff('2014-01-01 00:00:00' , '2014-01-01 01:00:00' ), -> timediff(20140101000000.000 , 20140101010000.000 ), -> timediff(20140101000000.000 , '2014-01-01 01:00:00' ), -> datediff('2014-01-01 00:00:00' , '2014-01-02 01:00:00' ), -> datediff(20140101000000.000 , 20140102010000.000 ), -> datediff(20140101000000.000 , '2014-01-02 01:00:00' )\G *************************** 1. row *************************** timediff('2014-01-01 00:00:00' , '2014-01-01 01:00:00' ): -01:00:00 timediff(20140101000000.000 , 20140101010000.000 ): -01:00:00.000 timediff(20140101000000.000 , '2014-01-01 01:00:00' ): NULL datediff('2014-01-01 00:00:00' , '2014-01-02 01:00:00' ): -1 datediff(20140101000000.000 , 20140102010000.000 ): -1 datediff(20140101000000.000 , '2014-01-02 01:00:00' ): -1 1 row in set (0.03 sec)

Notice, it returns NULL in one case. The problem is in this piece of the code:

...

if (args[0]->get_time(&l_time1) || args[1]->get_time(&l_time2) || l_time1.time_type != l_time2.time_type) return (null_value= 1);

get_time() forces decimal-to-time conversion to truncate the date part and return MYSQL_TIMESTAMP_TIME. get_time() does not force string-to-time conversion to truncate the date part. It still returns the full MYSQL_TIMESTAMP_DATETIME. I'm afraid a lot of the code rely in this behavior. If we fix string-to-time conversion, mtr will start to fail in other cases. Any ideas what to do?

Show replies by date

Sergei Golubchik

12 Jun 12 Jun

9 a.m.

Hi, Alexander! On Jun 11, Alexander Barkov wrote:

...

The problem is in this piece of the code:

...
if (args[0]->get_time(&l_time1) || args[1]->get_time(&l_time2) || l_time1.time_type != l_time2.time_type) return (null_value= 1);

get_time() forces decimal-to-time conversion to truncate the date part and return MYSQL_TIMESTAMP_TIME.

get_time() does not force string-to-time conversion to truncate the date part. It still returns the full MYSQL_TIMESTAMP_DATETIME.

I'm afraid a lot of the code rely in this behavior. If we fix string-to-time conversion, mtr will start to fail in other cases.

First, this is clearly a bug that get_time behaves differently for numbers and strings. One could get different results for WHERE time_column = 'string' and WHERE time_column = number Something needs to be fixed. We cannot fix string-to-time conversion to return MYSQL_TIMESTAMP_TIME in all cases. When a string literal is parsed we don't always know in advance whether it's a datetime or a time, so we can only indicate *preference* based on the context (preference matters, as it tells how to parse ambiguous strings like "10:10:10"). So, I've fixed number_to_time to keep the date part: --- a/sql-common/my_time.c +++ b/sql-common/my_time.c @@ -1319,9 +1319,6 @@ int number_to_time(my_bool neg, ulonglong nr, ulong sec_pa TIME_INVALID_DATES, was_cut) < 0) return -1; - ltime->year= ltime->month= ltime->day= 0; - ltime->time_type= MYSQL_TIMESTAMP_TIME; - *was_cut= MYSQL_TIME_NOTE_TRUNCATED; return 0; } The only test that changed results was type_time_hires, and I think the change was ok. Regards, Sergei

Alexander Barkov

15 Jun 15 Jun

7:28 a.m.

Hi Sergei, On 06/12/2015 01:00 PM, Sergei Golubchik wrote:

...

Hi, Alexander!

On Jun 11, Alexander Barkov wrote:

...
The problem is in this piece of the code:

...
if (args[0]->get_time(&l_time1) || args[1]->get_time(&l_time2) || l_time1.time_type != l_time2.time_type) return (null_value= 1);

get_time() forces decimal-to-time conversion to truncate the date part and return MYSQL_TIMESTAMP_TIME.

get_time() does not force string-to-time conversion to truncate the date part. It still returns the full MYSQL_TIMESTAMP_DATETIME.

I'm afraid a lot of the code rely in this behavior. If we fix string-to-time conversion, mtr will start to fail in other cases.

First, this is clearly a bug that get_time behaves differently for numbers and strings. One could get different results for

WHERE time_column = 'string'

and

WHERE time_column = number

Something needs to be fixed. We cannot fix string-to-time conversion to return MYSQL_TIMESTAMP_TIME in all cases. When a string literal is parsed we don't always know in advance whether it's a datetime or a time, so we can only indicate *preference* based on the context (preference matters, as it tells how to parse ambiguous strings like "10:10:10").

I'd say '10:10:10' should be unambiguously treated as time. Colon is never used to delimit date parts. Is it? Date parts are usually delimited by as follows: '01-01-01' '01.01.01' '01/01/01' But this is a kind of separate issue. Would you like me to create a task for this? In other cases it could still use the *preference* flag: '01#01#01' - could be either date or time, depending on the flag.

...

So, I've fixed number_to_time to keep the date part:

For some reasons I even didn't consider fixing number_to_time() as an option. Thanks. This fixes the bug. I pushed this change with a test added.

...

--- a/sql-common/my_time.c +++ b/sql-common/my_time.c @@ -1319,9 +1319,6 @@ int number_to_time(my_bool neg, ulonglong nr, ulong sec_pa TIME_INVALID_DATES, was_cut) < 0) return -1;

- ltime->year= ltime->month= ltime->day= 0; - ltime->time_type= MYSQL_TIMESTAMP_TIME; - *was_cut= MYSQL_TIME_NOTE_TRUNCATED; return 0; }

The only test that changed results was type_time_hires, and I think the change was ok.

Regards, Sergei

Sergei Golubchik

8:49 a.m.

Hi, Alexander! On Jun 15, Alexander Barkov wrote:

...

...
(preference matters, as it tells how to parse ambiguous strings like "10:10:10").

I'd say '10:10:10' should be unambiguously treated as time. Colon is never used to delimit date parts. Is it?

yes, I believe delimiters are pretty much ignored in our code. so any delimiter can be used anywhere.

...

Date parts are usually delimited by as follows: '01-01-01' '01.01.01' '01/01/01'

But this is a kind of separate issue. Would you like me to create a task for this?

The way it works now - after your patch - there's no much need for a "preference" flag. The only issue I've uncovered in testing was related to parsing strings with time preference. Like in WHERE time_column > '2010-12-11' The code is my_bool str_to_time(const char *str, uint length, MYSQL_TIME *l_time, ulonglong fuzzydate, MYSQL_TIME_STATUS *status) { ... /* Check first if this is a full TIMESTAMP */ if (length >= 12) { /* Probably full timestamp */ (void) str_to_datetime(str, length, l_time, (fuzzydate & ~TIME_TIME_ONLY) | TIME_DATETIME_ONLY, status); Which is very stupid, it decides solely on the string length. That is '2010-12-11' is parsed as a time (when there's time preference), but '10:11:12.123456' is parsed as a date (but fails and falls back to time). I would suggest to get rid of this ad hoc detection code (check the length, try and fall back, etc). And use a systematic approach based on patterns. Like patterns[]= { { 'yyyy-mm-dd' , parse_date }, { 'hh:mm:ss.uuuuuu', parse_time }, ... } Note, I wrote "like". I do not mean literally these patterns or string patterns whatsoever. I'd prefer something much faster. May be some compact "signature" number that describes the format, or may be a decision tree (where the string is parsed into an array of ints and then analyzed like three numbers? first is 4 digit? etc). Regards, Sergei

Alexander Barkov

16 Jun 16 Jun

3:20 p.m.

Hi Sergei, I created this task based on our discussion: MDEV-8322 Distinguish between time and date strings more carefully Please also see comments below: On 06/15/2015 12:49 PM, Sergei Golubchik wrote:

...

Hi, Alexander!

On Jun 15, Alexander Barkov wrote:

...
...
(preference matters, as it tells how to parse ambiguous strings like "10:10:10").

I'd say '10:10:10' should be unambiguously treated as time. Colon is never used to delimit date parts. Is it?

yes, I believe delimiters are pretty much ignored in our code. so any delimiter can be used anywhere.

Right, in MariaDB they are ignored. But I mean in real life nobody uses colons to separate date parts.

...

...
Date parts are usually delimited by as follows: '01-01-01' '01.01.01' '01/01/01'

But this is a kind of separate issue. Would you like me to create a task for this?

The way it works now - after your patch - there's no much need for a "preference" flag. The only issue I've uncovered in testing was related to parsing strings with time preference. Like in

WHERE time_column > '2010-12-11'

The code is

my_bool str_to_time(const char *str, uint length, MYSQL_TIME *l_time, ulonglong fuzzydate, MYSQL_TIME_STATUS *status) { ... /* Check first if this is a full TIMESTAMP */ if (length >= 12) { /* Probably full timestamp */ (void) str_to_datetime(str, length, l_time, (fuzzydate & ~TIME_TIME_ONLY) | TIME_DATETIME_ONLY, status);

Which is very stupid, it decides solely on the string length. That is '2010-12-11' is parsed as a time (when there's time preference), but '10:11:12.123456' is parsed as a date (but fails and falls back to time).

I would suggest to get rid of this ad hoc detection code (check the length, try and fall back, etc). And use a systematic approach based on patterns. Like

patterns[]= { { 'yyyy-mm-dd' , parse_date }, { 'hh:mm:ss.uuuuuu', parse_time }, ... }

Note, I wrote "like". I do not mean literally these patterns or string patterns whatsoever. I'd prefer something much faster. May be some compact "signature" number that describes the format, or may be a decision tree (where the string is parsed into an array of ints and then analyzed like three numbers? first is 4 digit? etc).

Regards, Sergei

3670

Age (days ago)

3675

Last active (days ago)

List overview

4 comments

2 participants

participants (2)

Alexander Barkov
Sergei Golubchik