Hi Alexander, I've reviewed your second patch. Comments inline.
+}
/*
diff --git a/sql/field.h b/sql/field.h
index 541da5a..fa84e7d 100644
--- a/sql/field.h
+++ b/sql/field.h
@@ -835,7 +835,6 @@ class Field: public Value_source
   virtual Item_result cmp_type () const { return result_type(); }
   static bool type_can_have_key_part(enum_field_types);
The field_type_merge function breaks all the other naming patterns. We have result_type, cmp_type, real_type and now field_type_merge. Wouldn't it be better to name it field_merge_type, to be consistent? Or, since this is a lookup kind of operation, perhaps name it (get|lookup)_merge_field_type? I know it's not _exactly_ like result_type and compare_type, but it is generally used in a similar context.
Don't we make our life harder with respect to merging when renaming?
Usually I try not to rename existing functions:
- There is a chance that we'll merge something from earlier versions, and renaming can cause conflicts.
- Also, developers are used to this name and can do "grep field_type_merge" or similar when searching the code.
I see your point. I am not 100% in favor of the idea of protecting ourselves (making life easier) in future merges, as it tends to keep us stuck with the same version of (often) difficult to read code. For this case, I guess we can live with it, but I'll keep on pointing out such things in future reviews. :)
static enum_field_types field_type_merge(enum_field_types,
enum_field_types);
- static Item_result result_merge_type(enum_field_types);
   virtual bool eq(Field *field)
   {
     return (ptr == field->ptr && null_ptr == field->null_ptr &&
diff --git a/sql/item.cc b/sql/item.cc
index c97f41f..c9b6155 100644
--- a/sql/item.cc
+++ b/sql/item.cc
@@ -9657,20 +9657,8 @@ Item_type_holder::Item_type_holder(THD *thd, Item *item)
   maybe_null= item->maybe_null;
   collation.set(item->collation);
   get_full_info(item);
- /**
-   Field::result_merge_type(real_field_type()) should be equal to
-   result_type(), with one exception when "this" is a Item_field for
-   a BIT field:
-   - Field_bit::result_type() returns INT_RESULT, so does its Item_field.
-   - Field::result_merge_type(MYSQL_TYPE_BIT) returns STRING_RESULT.
-   Perhaps we need a new method in Type_handler to cover these type
-   merging rules for UNION.
- */
- DBUG_ASSERT(real_field_type() == MYSQL_TYPE_BIT ||
-             Item_type_holder::result_type() ==
-             Field::result_merge_type(Item_type_holder::real_field_type()));
   /* fix variable decimals which always is NOT_FIXED_DEC */
- if (Field::result_merge_type(real_field_type()) == INT_RESULT)
Alright so this seems to be fixed here, looking at Type_handler_bit, inheriting from Type_handler_int_result. Do we test this somewhere though? I couldn't find it in the test case, perhaps you can point it out to me.
In theory, decimals is always 0 if result_type() is INT_RESULT. But I'm not fully sure that in reality none of the Items return non-zero decimals in combination with INT_RESULT. There are so many hacks in the code that this combination may be used somewhere.
I just tried to comment out these two lines:
- if (Item_type_holder::result_type() == INT_RESULT)
-   decimals= 0;
+// if (Item_type_holder::result_type() == INT_RESULT)
+//   decimals= 0;

both in the constructor and in the method join_types(), and ran the tests. Nothing failed.
So perhaps these two lines can just be replaced with:
DBUG_ASSERT(decimals == 0 || Item_type_holder::result_type() != INT_RESULT);
push, and see.
Any suggestions?
This is indeed ugly. I've tried to look into the calling places, then to backtrack from there but I gave up after about 30 minutes of seeing a never ending branching possibility of items. Please add the assert.
Let's discuss cleaning this up later. To me it feels like this Item does not really belong in the Item class and should be factored out. Probably a whole project on its own :)
I made attempts to move Item_type_holder out of the Item hierarchy in the past, but failed. It caused too much refactoring, because Item_type_holder is used with List<Item> all around the UNION and table creation code.
Perhaps we should make another attempt eventually. I suggest postponing this until at least after the main Type_handler related changes are done.
I agree.
     item_decimals= 0;
   decimals= MY_MAX(decimals, item_decimals);
 }
diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc
index 98b179b..e5e366e 100644
--- a/sql/item_cmpfunc.cc
+++ b/sql/item_cmpfunc.cc
@@ -180,32 +179,40 @@ static int agg_cmp_type(Item_result *type, Item **items, uint nitems)
   @return aggregated field type.
 */
-enum_field_types agg_field_type(Item **items, uint nitems,
-                                bool treat_bit_as_number)
Why is this function in item_cmpfunc.cc and not in sql_type.cc?
Moving this to sql_type.cc can cause additional merge conflicts.
But perhaps it's Ok to move it, as it gets changed significantly anyway (not logically, but textually).
I vote for moving it. I hate it when the implementation is all over the place.
+bool
+Type_handler_hybrid_field_type::aggregate_for_result(const char *funcname,
+                                                     Item **items, uint nitems,
+                                                     bool treat_bit_as_number)
 {

I would make uint i a variable local to the for loop. As written, this is a C-style loop. Is there a compiler on one of our builders that doesn't support this?
Done.
Also, how about size_t instead of uint? (Probably not necessary, but wlad made a point of preferring to use that for iterators and such.)
size_t is fine. But this should be done together with changing Item_args::arg_count, which is passed to this method. I suggest not making this change as part of this patch.
I agree.
<cut>
diff --git a/sql/sql_type.cc b/sql/sql_type.cc
index 8746595..397b5cf 100644
--- a/sql/sql_type.cc
+++ b/sql/sql_type.cc
@@ -54,6 +51,41 @@ static Type_handler_set type_handler_set;
 Type_handler_null type_handler_null;
 Type_handler_row type_handler_row;
 Type_handler_varchar type_handler_varchar;
+Type_handler_newdecimal type_handler_newdecimal;
+Type_handler_longlong type_handler_longlong;
+Type_handler_bit type_handler_bit;
+
+

I'm sure there's a better way to write this so that it gets initialized at compile time instead of at runtime (before main). Perhaps we can define the Type_aggregator differently. I need to look this up. For now it will work. The standard offers no guarantees regarding the order, but it shouldn't matter for us, as the address shouldn't change for global objects during initialization.
The part from Static_data_initializer should eventually be gone. We need to extend the Type_handler API first, so a data type handler (plugin) can provide an array of its aggregation rules. So in the future the server will make calls like type_aggregator_for_result.add() when loading a new data type plugin, either on startup, or during INSTALL PLUGIN.
For now, type handlers reside statically in the server anyway, so this should work fine, and I think it's 100% safe. As you noted, the addresses should not change even if they get initialized in some non-reliable order.
I chose this approach because I didn't want to expose this code to mysqld.cc now. Exposing it would be too early at this point.
I agree.
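To make the argument about addresses concrete, here is a minimal sketch. All names in it (Handler, Pair_rule, Aggregator, Static_init) are hypothetical stand-ins and not the classes from the patch. It assumes the registry stores only pointers to the global handlers and never dereferences them during static initialization, which is the property the discussion above relies on.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a type handler; not the server's class.
struct Handler
{
  const char *m_name;
  explicit Handler(const char *name) : m_name(name) {}
};

// One aggregation rule: (a, b) -> result, stored as addresses only.
struct Pair_rule
{
  const Handler *a, *b, *result;
};

// Plain aggregate without a constructor: objects with static storage are
// zero-initialized before any dynamic initialization runs, so add() can be
// called safely from another object's constructor.
struct Aggregator
{
  Pair_rule m_rules[8];
  size_t m_count;

  void add(const Handler *a, const Handler *b, const Handler *result)
  {
    assert(m_count < 8);
    m_rules[m_count++]= Pair_rule{a, b, result};
  }

  const Handler *find(const Handler *a, const Handler *b) const
  {
    for (size_t i= 0; i < m_count; i++)
    {
      const Pair_rule &r= m_rules[i];
      if ((r.a == a && r.b == b) || (r.a == b && r.b == a))
        return r.result;
    }
    return nullptr;
  }
};

static Aggregator aggregator;

// Declared here, defined further down: the initializer below runs BEFORE
// these handlers are constructed, but it only takes their addresses.
extern Handler handler_longlong;
extern Handler handler_newdecimal;

struct Static_init
{
  Static_init()
  {
    aggregator.add(&handler_longlong, &handler_newdecimal,
                   &handler_newdecimal);
  }
};
static Static_init static_init;

Handler handler_longlong("bigint");
Handler handler_newdecimal("decimal");
```

The sketch deliberately registers the handlers before their constructors have run. The lookup still works afterwards because the storage for a global is fixed before any initializer executes; only reading the handler's contents during initialization would be order-sensitive.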
I don't like this class too much. One can easily break it by either changing the LEX_CSTRING::str or LEX_CSTRING::length without changing the other one. I suggest we make the inheritance private so that the only way to access members is through the methods available.
Done: I changed it to derive privately.
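For readers following along, a minimal sketch of what private derivation buys here. The LEX_CSTRING below is a simplified stand-in for the server's struct, and the Name class is an illustration of the idea rather than the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for the server's LEX_CSTRING (pointer plus length).
struct LEX_CSTRING
{
  const char *str;
  size_t length;
};

// With private inheritance, str and length are not reachable from outside,
// so a caller cannot change one without the other.
class Name : private LEX_CSTRING
{
public:
  Name(const char *str_arg, size_t length_arg)
  {
    // Qualified names are needed: the length() method below hides the
    // inherited data member of the same name.
    LEX_CSTRING::str= str_arg;
    LEX_CSTRING::length= length_arg;
  }
  const char *ptr() const { return LEX_CSTRING::str; }
  size_t length() const { return LEX_CSTRING::length; }
};
```

With this shape, an attempt such as `name.str= "other";` from calling code fails to compile, and ptr() and length() are the only way to read the pair.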
A suggestion I have is to create a generic "String" class that has this same behaviour, without calling it Name. Afterwards typedefing it to Name.
Earlier I proposed to add similar classes to struct.h, something like this:
struct Lex_cstring_st: public LEX_CSTRING; // without initialization
class Lex_cstring: protected Lex_cstring_st; // with initialization
and to move all global functions operating on LEX_CSTRING as methods into these new struct and class.
But Monty disliked it. Monty thinks that having more globally visible classes makes the code harder to read. I think it makes the code easier to read, to use, and to reuse. We never could agree :)
So if we're adding Lex_cstring_st at this point we should be ready:
- either to convince Monty that this is good
- or to revert our changes in struct.h and move the class locally to sql_type.h again.
Let's go with Lex_cstring_st in struct.h ?
:)
I say we keep it as is for the moment. The patch changes enough as is. It can happen in a follow up, once we're done.
+ static const
+ Type_handler *aggregate_for_result_traditional(const Type_handler *h1,
+                                                const Type_handler *h2);
+

This function can return a const reference from a const static object within the class's namespace. Why create an object every time this is called? Compiler might optimize it, but why risk it? This goes for all the implementations.
In my opinion it's 100% safe. There is no risk here. It should be as safe as doing "return 10" from an "int" function.
It just reserves 16 bytes for a LEX_CSTRING on the stack and populates it, and then the caller uses this populated LEX_CSTRING to access its members through the methods ptr() or length(). Some compilers should probably even be able to use registers instead of the stack for this. But my intent was not to rely on using registers. I just found this style to be the shortest and the most readable.
As for performance, it requires the same amount of resources as, for example, passing a LEX_CSTRING by value to some function, or just creating a local LEX_CSTRING/LEX_STRING variable.
Here are some examples in the existing code:
sp_sql->append(C_STRING_WITH_LEN("CREATE "));
sp_sql->append(C_STRING_WITH_LEN("PROCEDURE "));
LEX_STRING pw= { C_STRING_WITH_LEN("password") };
Notice, we don't create a static LEX_STRING or LEX_CSTRING for every possible string we need in the server. The proposed code should be exactly as cheap as these examples.
Another approach would be to:
- have a static variable for every Type_handler name
- return this variable from the method name().
This could give slight benefits when we need ptr() without length(), or the other way around. And the caller in item.cc actually uses name().ptr() without name().length().
But since name() will be used mostly for errors and for DBUG_PRINT, I thought it would be more useful to save on the number of lines. And it's easier to read this way. You can see the name right inside the class definition, you don't have to go to sql_class.cc to see it.
The solution I propose is this:

// in sql_type.h
Type_handler
{
  const Name& name() const = 0;
}
Type_handler_xxx
{
  static const Name xxx_name; // xxx
  const Name& name() const;
}

// In sql_type.cc
Type_handler_xxx::xxx_name = Name(C_STRING_WITH_LEN("xxx"));
const Name& Type_handler_xxx::name() const
{
  return xxx_name;
}

Performance wise this is superior. Have a look at assembly for the following function calls:

// these are defined in say sql_type.cc;
Type_handler *give_random_type_handler()
{
  return new Type_handler_date();
}
bool use_random_name(const Name& name)
{
  printf("%s\n", name.ptr());
  return true;
}

Using them in sql_<something_else>.cc

////// Assembly generated by proposed patch: (GCC 6.2.0)
Type_handler *hnd = give_random_type_handler();
   1cc5c: e8 00 00 00 00          call   1cc61
const Name& n = hnd->name();
   1cc61: 48 8b 10                mov    rdx,QWORD PTR [rax]
   1cc64: 48 89 c7                mov    rdi,rax
   1cc67: ff 12                   call   QWORD PTR [rdx]
   1cc69: 48 8d bd 00 cf ff ff    lea    rdi,[rbp-0x3100]
   1cc70: 48 89 85 00 cf ff ff    mov    QWORD PTR [rbp-0x3100],rax
   1cc77: 48 89 95 08 cf ff ff    mov    QWORD PTR [rbp-0x30f8],rdx
use_random_name(n);
   1cc7e: e8 00 00 00 00          call   1cc83

///// Assembly generated by my suggestion: (GCC 6.2.0)
Type_handler *hnd = give_random_type_handler();
   1cc5c: e8 00 00 00 00          call   1cc61
const Name& n = hnd->name();
   1cc61: 48 8b 10                mov    rdx,QWORD PTR [rax]
   1cc64: 48 89 c7                mov    rdi,rax
   1cc67: ff 12                   call   QWORD PTR [rdx]
use_random_name(n);
   1cc69: 48 89 c7                mov    rdi,rax
   1cc6c: e8 00 00 00 00          call   1cc71

The difference is that the optimized version only uses registers, while the first version has to write to memory (or cache I guess). To me, readability seems the same for both implementations. I'm not going to insist on this too much, but I believe it's best practice to have this sort of code that passes a const reference, instead of always creating a new item on the stack. We can discuss more in detail on this if you'd like.

So to sum up, please add the assert you proposed and remove setting decimals to 0 as a "safety measure". I would like to make use of a static variable instead of how we now construct "Name" objects for every name() call. Ok to push otherwise.

Regards,
Vicențiu
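For anyone who wants to compile both styles and compare the generated code themselves, here is a self-contained sketch. The class names (Handler_by_value, Handler_by_ref and their date subclasses) are hypothetical and the Name class is a simplified stand-in; nothing here is taken from the patch, and no claim is made about what a particular compiler emits for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for the Name class under discussion.
class Name
{
  const char *m_str;
  size_t m_length;
public:
  Name(const char *str, size_t length) : m_str(str), m_length(length) {}
  const char *ptr() const { return m_str; }
  size_t length() const { return m_length; }
};

// Style A: name() builds and returns a Name by value on every call.
class Handler_by_value
{
public:
  virtual ~Handler_by_value() {}
  virtual Name name() const= 0;
};

class Handler_date_by_value : public Handler_by_value
{
public:
  Name name() const override { return Name("date", 4); }
};

// Style B: name() returns a reference to one static object per handler.
class Handler_by_ref
{
public:
  virtual ~Handler_by_ref() {}
  virtual const Name &name() const= 0;
};

class Handler_date_by_ref : public Handler_by_ref
{
  static const Name m_name;
public:
  const Name &name() const override { return m_name; }
};

// The single out-of-class definition that style B needs per handler.
const Name Handler_date_by_ref::m_name("date", 4);
```

The trade-off in the sketch mirrors the one debated above: style A keeps the name visible inside the class body, while style B needs one extra definition per handler but hands every caller the same object.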