Hello Sergei, On 11/12/2014 01:28 PM, Sergei Golubchik wrote:
Hi, Alexander!
On Nov 03, Alexander Barkov wrote:
Hello,
While working on pluggable data types, I noticed a few problems related to optimizer:
MDEV-6950 Bad results with joins comparing DATE/DATETIME and INT/DECIMAL/DOUBLE/ENUM/VARCHAR columns MDEV-6971 Bad results with joins comparing TIME and DOUBLE/DECIMAL columns MDEV-6978 Bad results with joins comparing case insensitive VARCHAR/ENUM/SET expression to a _bin ENUM column MDEV-6982 LEFT JOIN table elimination is not always used when it could MDEV-6989 BINARY and COLLATE xxx_bin comparisions are not used for optimization in some cases MDEV-6990 GROUP_MIN_MAX optimization is not applied in some cases when it could MDEV-6991 GROUP_MIN_MAX optimization is erroneously applied in some cases
In some cases wrong result sets are returned. So it would be nice to have this fixed in 10.0.
I made these changes as a standalone patch. Please review.
It's a big patch that introduces quite a log of new code and new concepts. What is a "Tool"?
It's an internal protected subclass in Field_optimizer. Allows to implement all optimization operations (e.g. ref access, range access, hash join, etc) in a single virtual method can_optimize() using a switch on operation type (Tooltype). Another option would be to have separate 9 *virtual* methods, one per operation. But as they looks quite similar inside a particular Field_xxx (Field, Field_temporal, Field_longstr, Field_geom, Field_enum), I thought having a switch is easier.
I would rather have it in 10.1, not in 10.0.
Okey.
It was difficult to see what was wrong with the old code, that is, what were actual bug fixes? Could you explain, please?
Generally, the problem was that the tests that checked if a particular optimization operation can be applied (in opt_range.cc, opt_table_elimination.cc, sql_select.cc) were not precise. For example, the tests for STRING_RESULT did not take into account that the field can actually be enum or temporal and the operation behaviour should actually be different from what is correct for a regular character string data type. So I ended up in 5 different implementations of can_optimize(). Field - for numeric and bit types Field_temporal Field_longstr Field_geom Field_enum
I also noticed some other optimizer bugs but not sure how to fix them properly:
MDEV-6986 Bad results with join comparing INT and VARCHAR columns
I don't see how you can fix it. The correct fix would be to disable the index in the second query and compare as doubles. But I could only imagine how many applications it will break.
What do you suggest? Won't fix? An idea (but I'm not sure): comparing INT and VARCHAR as DECIMAL should give a more precise result, and the index should still be usable. INT linearly and distinctly maps to DECIMAL (unlike DOUBLE).
MDEV-6969 Bad results with joins comparing DOUBLE to BIGINT/DECIMAL columns
This one, I think, we can safely fix.
MDEV-6993 Bad results with join comparing DECIMAL and ENUM/SET columns
And this one.
Also, I found some other problems (not related to optimizer):
MDEV-6973 XOR aggregates argument collations MDEV-7005 NULLIF does not work as documented Not sure which version to fix them in. Please suggest.
10.0 looks ok.
Okey.
Regards, Sergei