Re: [Maria-developers] PLEASE REVIEWA: pre-requisite patch for the data type plugins, also fixing a number of optimizer bugs

12 Nov 2014

      Hello Sergei,

On 11/12/2014 01:28 PM, Sergei Golubchik wrote:
...
Hi, Alexander!
On Nov 03, Alexander Barkov wrote:
...
Hello,
While working on pluggable data types, I noticed a few problems
related to optimizer:
...
MDEV-6950 Bad results with joins comparing DATE/DATETIME and INT/DECIMAL/DOUBLE/ENUM/VARCHAR columns
MDEV-6971 Bad results with joins comparing TIME and DOUBLE/DECIMAL columns
MDEV-6978 Bad results with joins comparing case insensitive VARCHAR/ENUM/SET expression to a _bin ENUM column
MDEV-6982 LEFT JOIN table elimination is not always used when it could
MDEV-6989 BINARY and COLLATE xxx_bin comparisions are not used for optimization in some cases
MDEV-6990 GROUP_MIN_MAX optimization is not applied in some cases when it could
MDEV-6991 GROUP_MIN_MAX optimization is erroneously applied in some cases
In some cases wrong result sets are returned.
So it would be nice to have this fixed in 10.0.
I made these changes as a standalone patch.
Please review.
It's a big patch that introduces quite a log of new code and new
concepts. What is a "Tool"?
It's an internal protected subclass in Field_optimizer.
Allows to implement all optimization operations
(e.g. ref access, range access, hash join, etc)
in a single virtual method can_optimize() using
a switch on operation type (Tooltype).

Another option would be to have separate 9 *virtual* methods,
one per operation. But as they looks quite similar inside
a particular Field_xxx (Field, Field_temporal, Field_longstr, 
Field_geom, Field_enum), I thought having a switch is easier.
...
I would rather have it in 10.1, not in 10.0.
Okey.
...
It was difficult to see what was wrong with the old code, that is, what
were actual bug fixes? Could you explain, please?
Generally, the problem was that the tests that checked if a particular
optimization operation can be applied (in opt_range.cc, 
opt_table_elimination.cc, sql_select.cc) were not precise.

For example, the tests for STRING_RESULT did not take into account that
the field can actually be enum or temporal and the operation behaviour
should actually be different from what is correct for a regular
character string data type.

So I ended up in 5 different implementations of can_optimize().
Field           - for numeric and bit types
Field_temporal
Field_longstr
Field_geom
Field_enum
...
...
I also noticed some other optimizer bugs but not sure how to fix them
properly:
...
MDEV-6986 Bad results with join comparing INT and VARCHAR columns
I don't see how you can fix it. The correct fix would be to disable the
index in the second query and compare as doubles. But I could only
imagine how many applications it will break.
What do you suggest? Won't fix?

An idea (but I'm not sure):

comparing INT and VARCHAR as DECIMAL should give a more precise result,
and the index should still be usable. INT linearly and distinctly maps
to DECIMAL (unlike DOUBLE).
...
...
...
MDEV-6969 Bad results with joins comparing DOUBLE to BIGINT/DECIMAL columns
This one, I think, we can safely fix.
...
...
MDEV-6993 Bad results with join comparing DECIMAL and ENUM/SET columns
And this one.
...
Also, I found some other problems (not related to optimizer):
...
MDEV-6973 XOR aggregates argument collations
MDEV-7005 NULLIF does not work as documented
Not sure which version to fix them in.
Please suggest.
10.0 looks ok.
Okey.
...
Regards,
Sergei