Hi Alexander,

I've reviewed your second patch. Comments inline.

>> +}
>>
>>
>>  /*
>> diff --git a/sql/field.h b/sql/field.h
>> index 541da5a..fa84e7d 100644
>> --- a/sql/field.h
>> +++ b/sql/field.h
>> @@ -835,7 +835,6 @@ class Field: public Value_source
>>    virtual Item_result cmp_type () const { return result_type(); }
>>    static bool type_can_have_key_part(enum_field_types);
>>
> The field_type_merge function breaks all the other naming patterns. We
> have result_type, cmp_type real_type and now field_type_merge. Wouldn't it be
> better to name it field_merge_type, to be consistent? Or since this
> is a lookup kind of operation, perhaps name it (get|lookup)_merge_field_type?
> I know it's not _exactly_ like result_type and compare_type, but it generally
> is used in a simillar context.

Don't we make our life harder in respect of merging when renaming?

Usually I try not to rename existing functions:
- There is a chance that we'll merge something from earlier versions,
  and renaming can cause conflicts.
- Also, developers are used to this name and can do
  "grep field_type_merge" or similar when searching the code.

I see your point. I am not 100% in favor of the idea of protecting ourselves (making life easier) in future merges, as it tends to keep us stuck with the same version of (often) difficult to read code. For this case, I guess we can live with it, but I'll keep on pointing such things in future reviews. :)
 

>>
>>    static enum_field_types field_type_merge(enum_field_types, enum_field_types);
>> -  static Item_result result_merge_type(enum_field_types);
>>    virtual bool eq(Field *field)
>>    {
>>      return (ptr == field->ptr && null_ptr == field->null_ptr &&
>> diff --git a/sql/item.cc b/sql/item.cc
>> index c97f41f..c9b6155 100644
>> --- a/sql/item.cc
>> +++ b/sql/item.cc
>> @@ -9657,20 +9657,8 @@ Item_type_holder::Item_type_holder(THD *thd, Item *item)
>>    maybe_null= item->maybe_null;
>>    collation.set(item->collation);
>>    get_full_info(item);
>> -  /**
>> -    Field::result_merge_type(real_field_type()) should be equal to
>> -    result_type(), with one exception when "this" is a Item_field for
>> -    a BIT field:
>> -    - Field_bit::result_type() returns INT_RESULT, so does its Item_field.
>> -    - Field::result_merge_type(MYSQL_TYPE_BIT) returns STRING_RESULT.
>> -    Perhaps we need a new method in Type_handler to cover these type
>> -    merging rules for UNION.
>> -  */
>> -  DBUG_ASSERT(real_field_type() == MYSQL_TYPE_BIT ||
>> -              Item_type_holder::result_type()  ==
>> -              Field::result_merge_type(Item_type_holder::real_field_type()));
>>    /* fix variable decimals which always is NOT_FIXED_DEC */
>> -  if (Field::result_merge_type(real_field_type()) == INT_RESULT)
>>
> Alright so this seems to be fixed here, looking at Type_handler_bit,
> inheriting from Type_handler_int_result. Do we test this somewhere though?
> I couldn't find it in the test case, perhaps you can point it out to me.


In theory, decimals is always 0 if result_type() is INT_RESULT.
But I'm not fully sure that in reality non of the Items return
non-zero decimals in combination with INT_RESULT.
There's so many hacks in the code, so we combination can
be used somewhere.

I just tried to comment out these two lines:
> -  if (Item_type_holder::result_type() == INT_RESULT)
> -    decimals= 0;
> +//  if (Item_type_holder::result_type() == INT_RESULT)
> +//    decimals= 0;
both in the constructor and in the method join_types()
and run test. Nothing failed.

So perhaps these two lines can be just replaced to:

DBUG_ASSERT(decimals == 0 ||
            Item_type_holder::result_type() != INT_RESULT);

push, and see.

Any suggestions?

This is indeed ugly. I've tried to look into the calling places, then to backtrack from there but I gave up after about 30 minutes of seeing a never ending branching possibility of items. Please add the assert.


>
> Let's discuss about cleaning this up later. To me it feels like this Item does
> not really belong in the Item class and should be factored out. Probably
> a whole project on its own :)

I made attempts to move Item_type_holder out of the Item hierarchy in
the past, but failed. It caused too much refactoring, because
Item_type_holder is used with List<Item> all around the UNION
and table creation code.

Perhaps we should make another attempt eventually.
I suggest to postpone this at least after the main Type_handler related
changes are done.


I agree.
 
>>
>>        item_decimals= 0;
>>      decimals= MY_MAX(decimals, item_decimals);
>>    }
>> diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc
>> index 98b179b..e5e366e 100644
>> --- a/sql/item_cmpfunc.cc
>> +++ b/sql/item_cmpfunc.cc
>> @@ -180,32 +179,40 @@ static int agg_cmp_type(Item_result *type, Item **items, uint nitems)
>>    @return aggregated field type.
>>  */
>>
>> -enum_field_types agg_field_type(Item **items, uint nitems,
>> -                                bool treat_bit_as_number)
>>
> Why is this function in item_cmpfunc.cc and not in sql_type.cc?

Moving this to sql_type.cc can cause additional merge conflicts.

But perhaps it's Ok to move it, as it gets changed significantly anyway
(not logically, but textually).


I vote for moving it. I hate it when the implementation is all over the place.
 

>> +bool
>> +Type_handler_hybrid_field_type::aggregate_for_result(const char *funcname,
>> +                                                     Item **items, uint nitems,
>> +                                                     bool treat_bit_as_number)
>>  {
> I would move uint i to be a local for variable. This is a C-style loop.
> Is there a compiler that doesn't support this in one of our builders?

Done.

> Also, how about size_t instead of uint? (Probably not necessary but
> wlad made a point of prefering to use that for iterators and such).

size_t is fine. But this should be done together with changing
Item_args::arg_count, which is passed to this method.
I suggest not to do this change under terms of this patch.


I agree.
 
<cut>

>> diff --git a/sql/sql_type.cc b/sql/sql_type.cc
>> index 8746595..397b5cf 100644
>> --- a/sql/sql_type.cc
>> +++ b/sql/sql_type.cc
>> @@ -54,6 +51,41 @@ static Type_handler_set         type_handler_set;
>>  Type_handler_null        type_handler_null;
>>  Type_handler_row         type_handler_row;
>>  Type_handler_varchar     type_handler_varchar;
>> +Type_handler_newdecimal  type_handler_newdecimal;
>> +Type_handler_longlong    type_handler_longlong;
>> +Type_handler_bit         type_handler_bit;
>> +
>> +
> I'm sure there's a better way to write this so that it gets initialized at
> compile time instead of at runtime (before main).
> Perhaps we can define the Type_aggregator differently. I need to look this
> Up. For now it will work.
> Standard offers no guarantees regarding the order, but it shouldn't matter
> for us as the address shouldn't change for global objects during
> initialization.

The part from Static_data_initializer should eventually be gone.
We need to extend the Type_handler API first, so a data type handler
(plugin) can provide an array of its aggregation rules.
So in the future the server will do the calls like
type_aggregator_for_result.add() when loading a new data
type plugin, either on startup, or during INSTALL PLUGIN.

For now, type handlers reside statically in the server anyway,
so this should work fine, and I think it's 100% safe.
As you noted, the addresses should not change even if
the get initialized in some non-reliable order.


I chose this approach because I didn't want to expose this code to
mysqld.cc now. Exposing it would be too early at this point.


I agree. 
 

> I don't like this class too much. One can easily break it by either changing
> LEX_CSTRING::str or LEX_CSTRING::length without changing the other one.
> I suggest we make the inheritance private so that the only way to access the
> members is through the methods available.

Done: I changed it to derive privately.

> A suggestion I have is to create a generic "String" class that has this same
> behaviour, without calling it Name. Afterwards typedefing it to Name.

Earlier I proposed to add similar classes to struct.h,
something like this:

struct Lex_cstring_st: public LEX_CSTRING; // without initialization

class Lex_cstring: protected Lex_cstring_st; // with initialization

and to move all global functions operating
on LEX_CSTRING as methods into these new struct and class.


But Monty disliked it. Monty thinks that having more globally visible
classes makes the code harder to read.
I think it makes the code easier to read, to use, and to reuse.
We never could agree :)

So if we're adding Lex_cstring_st at this point we should be ready:
- either to convince Monty that this is good
- or to revert our changes in struct.h and move the class locally
  to sql_type.h again.

Let's go with Lex_cstring_st in struct.h ?

:)

I say we keep it as is for the moment. The patch changes enough as is. It can happen in a follow up, once we're done.
 

>> +  static const
>> +  Type_handler *aggregate_for_result_traditional(const Type_handler *h1,
>> +                                                 const Type_handler *h2);
>> +
> This function can return a const reference from a const static object
> within the class's namespace. Why create an object every time this is called?
> Compiler might optimize it, but why risk it?
> This goes for all the implementations.

For my opinion it's 100% safe. There is no any risk here.
It should be the same safe with doing "return 10" from a "int" function.

It just reserves 16 bytes for a LEX_CSTRING on the stack and populates
it, and then the caller uses this populated LEX_CSTRING to access
its members though the methods ptr() or length().
Some compilers should probably be able even to use registers instead
of stack for this. But my intent was not to rely on using registers.
I just found this style as the shortest possible and the most readable.

As for performance, it requires the same amount of resources with for
example passing LEX_CSTRING by value to some function, or just
to create a local LEX_CSTRING/LEX_STRING variable.

Here are some examples in the existing code:

sp_sql->append(C_STRING_WITH_LEN("CREATE "));
sp_sql->append(C_STRING_WITH_LEN("PROCEDURE "));
LEX_STRING pw= { C_STRING_WITH_LEN("password") };

Notice, we don't create static LEX_STRING or LEX_CSTRING for all
possible strings we need in the server.
The proposed code should be exactly the same cheap with these examples.

Another approach would be to:
- have a static variable for every Type_handler name
- return this variable from the method name().

This could give slight benefits when we need ptr() without length(),
or the other way around. And the caller in item.cc actually uses
name().ptr() without name().length().

But as name() will be used for errors and for DBUG_PRINT mostly,
so I thought it would be more useful to save the number of lines.
And it's easier to read this way.
You can see the name right inside the class definition,
you don't have to go to sql_class.cc to see it.


The solution I propose is this:
// in sql_type.h
Type_handler {
    const Name& name() const = 0;
}

Type_handler_xxx {
     static const Name xxx_name;   // xxx
     const Name& name() const;
}

// In sql_type.cc
Type_handler_xxx::xxx_name = Name(C_STRING_WITH_LEN("xxx"));
const Name& Type_handler_xxx::name() const
{
   return xxx_name;

Performance wise this is superior. Have a look at assembly for the following function calls:

// these are defined in say sql_type.cc;
Type_handler *give_random_type_handler()
{
  return new Type_handler_date();
}

bool use_random_name(const Name& name)
{
  printf("%s\n", name.ptr());
  return true;
}

Using them in sql_<something_else>.cc

////// Assembly generated by proposed patch: (GCC 6.2.0)
  Type_handler *hnd = give_random_type_handler();
   1cc5c:       e8 00 00 00 00          call   1cc61 
  const Name& n = hnd->name();
   1cc61:       48 8b 10                mov    rdx,QWORD PTR [rax]
   1cc64:       48 89 c7                mov    rdi,rax
   1cc67:       ff 12                   call   QWORD PTR [rdx]
   1cc69:       48 8d bd 00 cf ff ff    lea    rdi,[rbp-0x3100]
   1cc70:       48 89 85 00 cf ff ff    mov    QWORD PTR [rbp-0x3100],rax
   1cc77:       48 89 95 08 cf ff ff    mov    QWORD PTR [rbp-0x30f8],rdx

  use_random_name(n);
   1cc7e:       e8 00 00 00 00          call   1cc83

///// Assembly generated by my suggestion: (GCC 6.2.0)
  Type_handler *hnd = give_random_type_handler();
   1cc5c:       e8 00 00 00 00          call   1cc61
  const Name& n = hnd->name();
   1cc61:       48 8b 10                mov    rdx,QWORD PTR [rax]
   1cc64:       48 89 c7                mov    rdi,rax
   1cc67:       ff 12                   call   QWORD PTR [rdx]
  use_random_name(n);
   1cc69:       48 89 c7                mov    rdi,rax
   1cc6c:       e8 00 00 00 00          call   1cc71

The difference is that the optimized version only uses registers, while the first version has to write to memory (or cache I guess).

To me, readability seems the same for both implementations. I'm not going to insist on this too much, but I believe it's best practice to have this sort of code that passes a const reference, instead of always creating a new item on the stack. We can discuss more in detail on this if you'd like.


So to sum up, please add the assert you proposed and remove setting decimals to 0 as a "safety measure". I would like to make use of a static variable instead of how we now construct "Name" objects for every name() call.

Ok to push otherwise.

Regards,
Vicențiu