Hi!
"Sergei" == Sergei Golubchik <serg@askmonty.org> writes:
<cut>
=== modified file 'mysql-test/r/date_formats.result'
--- mysql-test/r/date_formats.result	2010-11-12 10:12:15 +0000
+++ mysql-test/r/date_formats.result	2011-03-08 18:41:58 +0000
@@ -196,16 +196,16 @@
 date	format	datetime
 0003-01-02 8:11:2.123456	%Y-%m-%d %H:%i:%S.%#	0003-01-02 08:11:02
 03-01-02 8:11:2.123456	%Y-%m-%d %H:%i:%S.%#	2003-01-02 08:11:02
 2003-01-02 10:11:12 PM	%Y-%m-%d %h:%i:%S %p	2003-01-02 22:11:12
-2003-01-02 01:11:12.12345AM	%Y-%m-%d %h:%i:%S.%f%p	2003-01-02 01:11:12.123450
-2003-01-02 02:11:12.12345AM	%Y-%m-%d %h:%i:%S.%f %p	2003-01-02 02:11:12.123450
-2003-01-02 12:11:12.12345 am	%Y-%m-%d %h:%i:%S.%f%p	2003-01-02 00:11:12.123450
+2003-01-02 01:11:12.12345AM	%Y-%m-%d %h:%i:%S.%f%p	2003-01-02 01:11:12
+2003-01-02 02:11:12.12345AM	%Y-%m-%d %h:%i:%S.%f %p	2003-01-02 02:11:12
+2003-01-02 12:11:12.12345 am	%Y-%m-%d %h:%i:%S.%f%p	2003-01-02 00:11:12
Why have the microseconds been removed from the output?
It looks like 'cast(str_to_date(date, format) as datetime)' removes all
microseconds, which is an incompatible change. In the past,
cast(..., datetime) was the same as cast(..., datetime(6)).
<cut>
I think that it's more important that cast() produces the same value as MySQL did than that create ... select cast() produces the same type as before.
With the first issue you will get lost data (without any warnings) for old
applications if they do things like:
select * from t1 where cast(char_column as time) = "12:23:24.123456";
which works with MySQL but not with this patch.
Sergei> Looking at how Oracle implements this feature I can see that cast()
Sergei> there works exactly as in my patch. If I do what you want, MariaDB won't
Sergei> be compatible with MySQL in the future.

It's clear that MySQL removes microseconds because they don't have real
support for microseconds. As we now can have microseconds in datetime, we
need to keep them. I don't think that anyone would accept that

  cast(cast(datetime as string) as datetime)

should give a loss of precision. Compare to:

  cast(cast(5.5 as char(5)) as double) -> 5.5 (not 5)

In this case I prefer to be compatible with older versions of MySQL than
with newer versions that are not yet out.
=== modified file 'mysql-test/r/func_time.result'
--- mysql-test/r/func_time.result	2010-10-31 16:04:38 +0000
+++ mysql-test/r/func_time.result	2011-03-19 14:49:36 +0000
As part of this, I noticed that from_unixtime() and unix_timestamp() are
not enhanced to support microseconds.
This needs to work:
create table t2 (a int, b timestamp(6));
insert into t2 (a) values(1);
select * from t2;

I got '1 | 2011-03-29 15:38:53.395516'
select unix_timestamp(b) from t2; -> 1301402333 (no decimals)
This needs to return sub-seconds!
Sergei> If I do that, unix_timestamp() will no longer return INT_RESULT.
Sergei> Which means it cannot be used as a partitioning function anymore.
Sergei> Few tests use it that way. I've fixed the tests by wrapping
Sergei> unix_timestamp() in cast(... as unsigned). Which failed, because
Sergei> cast cannot be used as a partitioning function either. I fixed that too.
Sergei> Now tests pass. But this is an incompatible change, it may break
Sergei> existing applications. And fixing SQL will make it backward
Sergei> incompatible - partitioning by unix_timestamp() will not work in newer
Sergei> MariaDB, partitioning by cast(unix_timestamp(...) as unsigned) will
Sergei> work, but it will not work in MySQL and older MariaDB.
Sergei> At the moment I've shelved this change and I'm not committing it, until
Sergei> we decide what to do.

How about then adding a unix_high_precision_timestamp(b) that returns a
decimal? I would be fine with that.

By the way, good that you fixed cast() so that it can be used as a
partitioning function!
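To make the suggestion concrete, here is a minimal sketch (in plain C++,
outside the server's Item framework) of how such a hypothetical
unix_high_precision_timestamp() could render its DECIMAL result from the
seconds and microseconds parts of a timestamp. The function name and the
fixed six-digit fractional part are illustrative assumptions, not taken
from the patch:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical sketch: render (seconds, microseconds) as a decimal string,
// the way a DECIMAL(..., 6) result of unix_high_precision_timestamp()
// might look.  Names and formatting are assumptions for illustration.
std::string unix_hrtimestamp_decimal(long long sec, unsigned long usec)
{
    char buf[32];
    // Always print six fractional digits so sub-second precision survives.
    std::snprintf(buf, sizeof(buf), "%lld.%06lu", sec, usec);
    return buf;
}
```

With the values from the example above, this would produce
1301402333.395516 instead of the truncated 1301402333.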
=== modified file 'mysql-test/r/ps.result'
--- mysql-test/r/ps.result	2010-10-04 08:51:26 +0000
+++ mysql-test/r/ps.result	2011-03-01 12:24:36 +0000
@@ -3040,3 +3040,18 @@ id	select_type	table	type	possible_keys
 DEALLOCATE PREPARE stmt;
 DROP TABLE t1;
 End of 5.1 tests.
+prepare stmt from "select date('2010-10-10') between '2010-09-09' and ?";
+set @a='2010-11-11';
+execute stmt using @a;
+date('2010-10-10') between '2010-09-09' and ?
+1
+execute stmt using @a;
+date('2010-10-10') between '2010-09-09' and ?
+1
+set @a='2010-08-08';
+execute stmt using @a;
+date('2010-10-10') between '2010-09-09' and ?
+0
+execute stmt using @a;
+date('2010-10-10') between '2010-09-09' and ?
+0
Please add a test that uses sub seconds (for completeness)
set @a='2010-08-08 02:03:04.567890';
execute stmt using @a;
Sergei> it's not completeness, it's bloat. First, this is a test for dates.
Sergei> second - the comment says "restoring of the Item tree in BETWEEN with
Sergei> dates". this code path does not check for second_part > 0, it is not
Sergei> affected by microsecond presence at all.

No, this test file is for prepared statements, not for dates. I would like
to be sure that we also do some tests with microseconds and prepared
statements, and this is the test file to do it in. I don't remember any
tests where you tested prepared statements and microseconds. Did you add
such a test somewhere else?

<cut>
+++ mysql-test/suite/funcs_1/r/is_columns_innodb.result	2011-03-24 14:55:52 +0000
@@ -382,333 +382,333 @@
 LOAD DATA INFILE '<MYSQLTEST_VARDIR>/std
 SELECT * FROM information_schema.columns
 WHERE table_schema LIKE 'test%'
 ORDER BY table_schema, table_name, column_name;
-TABLE_CATALOG	TABLE_SCHEMA	TABLE_NAME	COLUMN_NAME	ORDINAL_POSITION	COLUMN_DEFAULT	IS_NULLABLE	DATA_TYPE	CHARACTER_MAXIMUM_LENGTH	CHARACTER_OCTET_LENGTH	NUMERIC_PRECISION	NUMERIC_SCALE	CHARACTER_SET_NAME	COLLATION_NAME	COLUMN_TYPE	COLUMN_KEY	EXTRA	PRIVILEGES	COLUMN_COMMENT
-NULL	test	t1	f1	1	NULL	YES	char	20	20	NULL	NULL	latin1	latin1_swedish_ci	char(20)			select,insert,update,references
... lots of changes
How about changing the SELECT * FROM information_schema.columns to just
list the old columns in suite/funcs_1/datadict/columns.inc?
This gives a smaller change list and will not cause a diff if we add
another column. This change in particular caused a really huge diff and
will likely cause merge conflicts sooner or later...
Sergei> Changing will also cause merge conflicts.

This will keep the code similar to MySQL and thus give us fewer conflicts
for anything they change!

Sergei> And I think it's good to see all columns here.

Then make a separate test of that, one that is not run for every engine
and returns LOTS of rows.

<cut>
=== modified file 'sql/event_queue.cc'
--- sql/event_queue.cc	2008-05-09 07:43:02 +0000
+++ sql/event_queue.cc	2011-03-01 12:24:36 +0000
@@ -513,9 +513,10 @@ Event_queue::empty_queue()
 */
 void
+Event_queue::dbug_dump_queue(my_time_t when)
 {
 #ifndef DBUG_OFF
+  my_time_t now= when;
   Event_queue_element *et;
   uint i;
   DBUG_ENTER("Event_queue::dbug_dump_queue");
@@ -592,14 +593,13 @@ Event_queue::get_top_for_execution_if_ti
   thd->set_current_time(); /* Get current time */
   next_activation_at= top->execute_at;
+  if (next_activation_at > thd->query_start())
   {
     /*
       Not yet time for top event, wait on condition with
       time or until signaled. Release LOCK_queue while waiting.
     */
-    struct timespec top_time;
-    set_timespec(top_time, next_activation_at - thd->query_start());
+    struct timespec top_time= { next_activation_at, 0 };
I am not sure the above change is portable. That is why we have the
set_timespec() defines.
Any particular reason why you made the change?
<cut>

Sergei> I'll fix it when merging in 5.3, as there we have a different set of
Sergei> set_timespec* macros, and there I can set the wakeup time directly.

In other words, I have to do it (as I am the one to merge to 5.3) ;)
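To illustrate the portability concern with aggregate-initialising a
timespec: POSIX only guarantees that struct timespec *has* the members
tv_sec and tv_nsec, not their order or exact layout, so { sec, 0 } is
fragile. A set_timespec()-style helper that assigns the members by name is
portable. This is a minimal sketch of the idea, not the server's actual
macro:

```cpp
#include <cassert>
#include <ctime>

// Portable alternative to "struct timespec ts= { seconds, 0 };":
// assign the members by name, so member order and padding don't matter.
static inline void set_abs_timespec(struct timespec *ts, time_t abs_sec)
{
    ts->tv_sec=  abs_sec;  // absolute wakeup time in seconds
    ts->tv_nsec= 0;        // no sub-second part
}
```

Whether the condition-wait then interprets the value as absolute or
relative still depends on the macro set in use, which is exactly why the
5.3 merge is the right place to fix it.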
=== modified file 'sql/field.cc'
@@ -3151,13 +3141,9 @@ int Field_short::store(const char *from,
   error= get_int(cs, from, len, &rnd, UINT_MAX16, INT_MIN16, INT_MAX16);
   store_tmp= unsigned_flag ? (int) (ulonglong) rnd : (int) rnd;
-#ifdef WORDS_BIGENDIAN
-  if (table->s->db_low_byte_first)
-  {
+  if (ARCH_BIGENDIAN && table->s->db_low_byte_first)
     int2store(ptr, store_tmp);
-  }
<cut>
Sergei> As we agreed on IRC, I won't revert this change, but instead remove
Sergei> ARCH_BIGENDIAN and db_low_byte_first in a separate patch. Will push both
Sergei> together.

Good!
@@ -4822,136 +4682,136 @@ timestamp_auto_set_type Field_timestamp: } }
+long Field_timestamp::get_timestamp(ulong *sec_part) const
+{
+  ASSERT_COLUMN_MARKED_FOR_READ;
+  *sec_part= 0;
+  if (ARCH_BIGENDIAN && table && table->s->db_low_byte_first)
+    return sint4korr(ptr);
+  long tmp;
+  longget(tmp,ptr);
+  return tmp;
+}
Shouldn't this be of type 'my_time_t' for completeness?
Sergei> I think 'long' is kind of safer, as these types use longget() anyway,
Sergei> and being explicit about the type would help to avoid unreasonable
Sergei> assumptions and expectations.

Using long will tempt callers of this function to use long for the
variable they store the value in, instead of the my_time_t that they
should use. I don't think that any caller of get_timestamp() should know
how the timestamp is stored. If we ever change the representation, there
will be fewer changes if things are right from the start.

<cut>
-void Field_timestamp::set_time()
+int Field_timestamp::set_time()
 {
   THD *thd= table ? table->in_use : current_thd;
-  long tmp= (long) thd->query_start();
   set_notnull();
-  store_timestamp(tmp);
+  store_TIME(thd->query_start(), 0);
+  return 0;
 }
The set_time() function could be a void as it can never fail. (Yes, I
agree that this means we have to test the type of the field in the event
code, but that is something we should do at event startup anyway...)
Sergei> I may do it after a merge - I've seen that Alik added this test to
Sergei> MySQL.

In other words, I have to do this when I do the merge...
+static void store_native(ulonglong num, uchar *to, uint bytes)
+{
+  switch (bytes) {
+  case 1: *to= (uchar) num;              break;
+  case 2: shortstore(to, (ushort) num);  break;
+  case 3: int3store(to, num); /* Sic! */ break;
+  case 4: longstore(to, (ulong) num);    break;
+  case 8: longlongstore(to, num);        break;
+  default: DBUG_ASSERT(0);
+  }
+}
It would be even better if each field had pointers to C functions with the
right read and write functions; that would avoid a switch for every call...
Sergei> Mats Kindahl benchmarked and blogged about it.
Sergei> switch is about 20% slower than function pointers if the function is
Sergei> big. But it is several times faster than function pointers if the
Sergei> function (with the switch) is small and can be inlined.

If 'bytes' were a constant, I would agree with you. Here it can't be
efficiently inlined, as the caller doesn't use a constant for bytes.
Inlining any function (especially one that has jumps) that generates
60 bytes of code many times will make the code slower, not faster. (It
takes more memory in the CPU, more memory for jump prediction etc.)

Having one function that is called by many threads uses less CPU memory
and is because of this usually faster than inlining (when inlining can't
remove code through the use of constants).
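The two dispatch styles under debate can be sketched side by side. This is
a simplified stand-in using memcpy-based little-endian-agnostic stores,
not the real store_native()/shortstore() macros:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Style 1: switch dispatch.  Inlinable, but only collapses to a single
// store instruction when 'bytes' is a compile-time constant at the call
// site; with a runtime 'bytes' the jump table is executed on every call.
static void store_switch(uint64_t num, unsigned char *to, unsigned bytes)
{
    switch (bytes) {
    case 1: to[0]= (unsigned char) num;                              break;
    case 2: { uint16_t v= (uint16_t) num; std::memcpy(to, &v, 2); }  break;
    case 4: { uint32_t v= (uint32_t) num; std::memcpy(to, &v, 4); }  break;
    case 8: std::memcpy(to, &num, 8);                                break;
    }
}

// Style 2: per-width function selected once (e.g. when the Field is set
// up), then called through a pointer - one indirect call, no branching.
static void store4(uint64_t num, unsigned char *to)
{
    uint32_t v= (uint32_t) num;
    std::memcpy(to, &v, 4);
}
typedef void (*store_fn)(uint64_t, unsigned char *);
```

The trade-off is exactly as discussed: the switch wins when it can be
inlined with a constant width, the pointer wins when the width is only
known at runtime.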
+  if (!tmp)
+    return fuzzydate & TIME_NO_ZERO_DATE;
should be:
+ return (fuzzydate & TIME_NO_ZERO_DATE) != 0;
as the function is a bool function
Sergei> no need to because the function is a bool function. If it were my_bool,
Sergei> then it would be needed.

Please add it anyway, as otherwise we may get compiler warnings about loss
of precision when converting to bool.

<cut>
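The warning concern can be shown in a small sketch. The flag value below
is illustrative, not the real TIME_NO_ZERO_DATE define; the point is that
`flags & MASK` yields an unsigned value, and some compilers (notably MSVC
with C4800) warn when it is narrowed to bool implicitly, while the
explicit comparison is always warning-free:

```cpp
#include <cassert>

// Illustrative flag value, not the real server define.
static const unsigned TIME_NO_ZERO_DATE_FLAG= 4U;

// Explicit "!= 0" makes the bool conversion intentional and silences
// truncation/performance warnings that an implicit narrowing can raise.
static bool has_no_zero_date(unsigned fuzzydate)
{
    return (fuzzydate & TIME_NO_ZERO_DATE_FLAG) != 0;
}
```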
@@ -7048,6 +7101,8 @@ Item_cache* Item_cache::get_cache(const
     return new Item_cache_str(item);
   case ROW_RESULT:
     return new Item_cache_row();
+  case TIME_RESULT:
+    return new Item_cache_int(MYSQL_TYPE_DATETIME);
Shouldn't this be a double or decimal as the time can contain microseconds?
Looks like a potential bug!
Sergei> No :)
Sergei> It's an integer that stores packed datetime value.
Sergei> This is the value we pass around and compare with.

Then I assume you added a comment about this...
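The "packed datetime in an integer" idea is worth spelling out: the whole
datetime, including the microsecond part, can be folded into one int64
whose numeric order matches chronological order, so an Item_cache_int can
hold it without losing precision. The layout below is a sketch resembling
the bit-packed temporal format MySQL later adopted; the exact packing in
this patch may differ:

```cpp
#include <cassert>
#include <cstdint>

// Fold a broken-down datetime into one ordered int64: date and time go
// into the high bits, microseconds into the low 24 bits.  Comparing two
// packed values compares the datetimes, microseconds included.
static int64_t pack_datetime(int year, int month, int day,
                             int hour, int minute, int sec, int usec)
{
    int64_t ymd= (((int64_t) year * 13 + month) << 5) | day;
    int64_t hms= ((int64_t) hour << 12) | (minute << 6) | sec;
    return ((ymd << 17 | hms) << 24) + usec;
}
```

This is why Item_cache_int is the right cache type here even though the
value carries sub-second precision.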
=== modified file 'sql/item.h'
<cut>
I also noticed that you have a Field->cmp_type() but there is no Item_field->cmp_type() mapped to Field->cmp_type(). As I assume that for all fields the following should hold:
Item_field->cmp_type() == Item_field->field->cmp_type()
Sergei> Yes.

Then I assume you have now added Item_field->cmp_type() as above.
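The delegation being requested is simple to state in code. The classes
below are simplified stand-ins for Item_field and Field, just to show the
invariant: the item forwards to its field, so the two can never diverge:

```cpp
#include <cassert>

// Simplified stand-ins for the real Item_result enum and classes.
enum Item_result_t { STRING_RESULT, INT_RESULT, TIME_RESULT };

struct Field_t
{
    Item_result_t cmp_type_;
    Item_result_t cmp_type() const { return cmp_type_; }
};

struct Item_field_t
{
    Field_t *field;
    // Delegate, so Item_field_t::cmp_type() == field->cmp_type() always holds.
    Item_result_t cmp_type() const { return field->cmp_type(); }
};
```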
@@ -3031,7 +3022,7 @@ class Item_cache_int: public Item_cache
 protected:
   longlong value;
 public:
-  Item_cache_int(): Item_cache(),
+  Item_cache_int(): Item_cache(MYSQL_TYPE_LONGLONG),
     value(0) {}
   Item_cache_int(enum_field_types field_type_arg):
     Item_cache(field_type_arg), value(0) {}
@@ -3043,8 +3034,13 @@ class Item_cache_int: public Item_cache
   String* val_str(String *str);
   my_decimal *val_decimal(my_decimal *);
   enum Item_result result_type() const { return INT_RESULT; }
-  bool result_as_longlong() { return TRUE; }
   bool cache_value();
+  Item *clone_item()
+  {
+    Item_cache_int *item= new Item_cache_int(cached_field_type);
+    item->store(this, value);
+    return item;
+  }
 };
I don't understand why Item_cache_int() needs a clone_item() when
Item_cache_real() or Item_cache_decimal() doesn't have one.
If this was needed, please check if the other Item_cache_xxx() functions needs it.
Sergei> It only needed it because of this special usage, as a holder of packed
Sergei> datetime value. get_datetime_value() creates it to cache converted
Sergei> constant datetime. The existence of clone() method tells the server that
Sergei> this item is a proper constant and there's no need to optimize it
Sergei> further. Other Item_cache_* objects are never used like that.

Thanks for the explanation. Please add it as a comment to clone_item().
=== modified file 'sql/item_cmpfunc.cc'

<cut>
@@ -1026,9 +761,14 @@ Item** Arg_comparator::cache_converted_c
                                Item **cache_item, Item_result type)
 {
-  /* Don't need cache if doing context analysis only. */
+  /*
+    Don't need cache if doing context analysis only.
+    Also, get_datetime_value creates Item_cache internally.
+    Unless fixed, we should not do it here.
+  */
   if (!thd_arg->is_context_analysis_only() &&
-      (*value)->const_item() && type != (*value)->result_type())
+      (*value)->const_item() && type != (*value)->result_type() &&
+      type != TIME_RESULT)
   {
     Item_cache *cache= Item_cache::get_cache(*value, type);
     cache->setup(*value);
Why don't we cache it here instead of doing it in get_datetime_value()?
This would save us one if and three tests for every call to
get_datetime_value().
You could call get_datetime_value() here and use this value in the cache.
If this doesn't easily work (as indicated on IRC), at least expand the
comment to explain the issue so that we can later try to fix it in a
better way.
+    if (args[i]->cmp_type() == TIME_RESULT)
     {
-      datetime_found= TRUE;
+      if (args[i]->field_type() != MYSQL_TYPE_TIME ||
+          (args[i]->field_type() == MYSQL_TYPE_TIME && compare_as_dates == 0))
+        compare_as_dates= args[i];
       continue;
     }
   }
 }
I find it strange that you set compare_as_dates to the last
non-MYSQL_TYPE_TIME argument, as this is used to give warnings and the
warning may in theory come from some other field. I tried to make a test
case to show the problem but didn't succeed :(
Sergei> Yes, the warning may come from some other field. But 'compare_as_dates'
Sergei> field affects how we will interpret that other field. For example, if
Sergei> you compare a string and a time - the string will be parsed as a time
Sergei> value, and the warning will say "incorrect TIME value".
Sergei> If you compare a string and a datetime - the warning will say "incorrect
Sergei> DATETIME value". If you compare a string, a time, and a datetime (this
Sergei> is BETWEEN, remember? Three operands), the string will be parsed as
Sergei> DATETIME.

Do we have test cases for all this? If not, it would be good to have all
combinations covered!
=== modified file 'sql/item_func.cc'
--- sql/item_func.cc	2010-11-23 13:08:10 +0000
+++ sql/item_func.cc	2011-03-19 13:54:46 +0000

<cut>

+bool Item_func_min_max::get_date(MYSQL_TIME *ltime, uint fuzzy_date)
+{
   longlong UNINIT_VAR(min_max);
+  DBUG_ASSERT(fixed == 1);
+  if (!compare_as_dates)
+    return Item_func::get_date(ltime, fuzzy_date);
Add a comment about the cases in which the above may not be true.
Sergei> But it's a normal situation.
Sergei> For all val_* we consider all cases.
Sergei> You don't want me to add a similar comment for case INT_RESULT in
Sergei> val_str(), for example, do you? And for all other combinations.

Yes, but normally we do things based on result_type, not on a local
variable that is set in some unknown manner. I wanted the comment for the
above because I did not understand the code, even after looking around for
a short time. For any such code, I want a comment.
=== modified file 'sql/item_timefunc.cc'
@@ -2050,42 +1845,14 @@ bool Item_func_from_unixtime::get_date(M
 void Item_func_convert_tz::fix_length_and_dec()
 {
   collation.set(&my_charset_bin);
-  decimals= 0;
-  max_length= MAX_DATETIME_WIDTH*MY_CHARSET_BIN_MB_MAXLEN;
+  max_length= MAX_DATETIME_WIDTH;
Should probably be: max_length= MAX_DATETIME_WIDTH*MY_CHARSET_BIN_MB_MAXLEN;
This is how all the other functions are defined.
Sergei> MY_CHARSET_BIN_MB_MAXLEN is always 1. It is assumed and hardcoded in so
Sergei> many places in the code (like buffer sizes, memory allocations, using strlen,
Sergei> byte-per-byte string processing and so on) that it's practically
Sergei> impossible to change. And doing it makes no sense either.
Sergei> I've asked Bar, he said that this define (MY_CHARSET_BIN_MB_MAXLEN)
Sergei> was removed from MySQL sources some time ago. And that it's meaningless
Sergei> and should've never been added in the first place.

It's still in the current MariaDB source and I would prefer to have
everything done the same way. How things will be done in the future is
not an argument for the current code (if we are not doing the change in
all relevant places). Anyway, not critical to change.

<cut>
For some examples where this now goes strange:
dayofyear(timediff('0000-00-01 00:00:05', '0000-00-01 00:00:05')) -> NULL
dayofyear(timediff('0000-00-01 00:00:05', '0000-00-01 00:00:04')) -> 0
dayofyear(cast('0000-00-01 00:00:05' as datetime))                -> NULL
Sergei> Which means that dayofyear() needs to require both TIME_NO_ZERO_DATE and
Sergei> TIME_NO_ZERO_IN_DATE. I'll fix that.
Sergei> Alternatively, we can say that TIME_NO_ZERO_DATE implicitly includes
Sergei> TIME_NO_ZERO_IN_DATE.

The latter is probably ok.
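The "implicitly includes" alternative amounts to normalising the fuzzydate
flags once at the entry point, so every caller that rejects zero dates
automatically rejects zero parts too. A minimal sketch, with illustrative
flag values rather than the real TIME_* defines:

```cpp
#include <cassert>

// Illustrative flag values, not the real server defines.
static const unsigned F_NO_ZERO_IN_DATE= 1U;
static const unsigned F_NO_ZERO_DATE=    2U;

// If the caller asked to reject zero dates, also reject zero parts,
// so dayofyear() and friends behave consistently.
static unsigned normalize_fuzzydate(unsigned fuzzydate)
{
    if (fuzzydate & F_NO_ZERO_DATE)
        fuzzydate|= F_NO_ZERO_IN_DATE;
    return fuzzydate;
}
```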
=== modified file 'sql/log_event.cc'
@@ -1000,16 +1004,27 @@ class Log_event
   { return 0; }
   virtual bool write_data_body(IO_CACHE* file __attribute__((unused)))
   { return 0; }
-  inline time_t get_time()
+  inline my_time_t get_time()
   {
     THD *tmp_thd;
     if (when)
       return when;
     if (thd)
+    {
+      when= thd->start_time;
+      when_sec_part= thd->start_time_sec_part;
+      return when;
+    }
thd should always be true here. Can you at some point in time check when
this is not the case and, if needed, set thd for the class!
if ((tmp_thd= current_thd))
current_thd() should never fail. If it can, there should be a comment in the code about this!
Sergei> This is your code :)
Sergei> http://bazaar.launchpad.net/~maria-captains/maria/5.1/revision/2476.293.2
Sergei> And there was a comment that disappeared in that change.
Sergei> It used to be
Sergei>   THD *thd= current_thd;
Sergei>   /* thd will only be 0 here at time of log creation */
Sergei>   now= thd ? thd->start_time : my_time(0);

And in my code there was a comment, which was what I requested ;)
I assume you added it back.
=== modified file 'sql/mysql_priv.h'
<cut>
@@ -3045,10 +3045,10 @@ void sys_var_microseconds::set_default(T
 uchar *sys_var_microseconds::value_ptr(THD *thd, enum_var_type type,
                                        LEX_STRING *base)
 {
-  thd->tmp_double_value= (double) ((type == OPT_GLOBAL) ?
+  thd->sys_var_tmp.double_value= (double) ((type == OPT_GLOBAL) ?
                           global_system_variables.*offset :
                           thd->variables.*offset) / 1000000.0;
Change the 1000000.0 to the same defined constant as the 1e6.
Sergei> No. I have a constant for "precision of my_hrtime_t", it's 1000000,
Sergei> which means "microseconds", but it does not have to be. We can have
Sergei> my_hrtime_t storing, for example, hundreds of nanoseconds.
Sergei> I have a constant for "precision of second_part", it's also 1000000,
Sergei> which means second_part is microseconds, but it doesn't have to be
Sergei> either.
Sergei> But I won't introduce a constant for "number of microseconds in a
Sergei> second".

And is there any reason the above function is returning microseconds?
Shouldn't this depend on the precision that we can provide? In other
words, if we change my_hrtime_t to use nanoseconds, isn't it likely that
sys_var_microseconds needs to be renamed, and the above constant too?
=== modified file 'sql/sql_select.cc'
@@ -9455,7 +9445,7 @@ test_if_equality_guarantees_uniqueness(I
 {
   return r->const_item() &&
     /* elements must be compared as dates */
-    (Arg_comparator::can_compare_as_dates(l, r, 0) ||
+    (l->cmp_type() == TIME_RESULT ||
     /* or of the same result type */
     (r->result_type() == l->result_type() &&
     /* and must have the same collation if compared as strings */
Add a comment:
/*
  It's enough to test 'l' for TIME_RESULT as only 'l' can be a field
  in this context.
*/
Sergei> No, this was not why I changed it.
Sergei> It was, in fact, a bug. Cases where 'r' had TIME_RESULT, but 'l' did not
Sergei> have it, may cause test_if_equality_guarantees_uniqueness() to return
Sergei> false positives. Like:
Sergei>   create table t1 (a varchar(100));
Sergei>   insert t1 values ('2010-01-02'), ('2010-1-2'), ('20100102');
Sergei>   select distinct t1 from t1 where a=date('2010-01-02');
Sergei> This fails in 5.1 and works correctly in 5.1-micro.

Still, the code needs a comment on why we only test 'l' and not 'r'.

Regards,
Monty