On Fri, May 24, 2019 at 04:39:51PM -0700, Igor Babaev wrote:
On Fri, May 24, 2019 at 10:08:04AM -0700, Igor Babaev wrote:
On 05/24/2019 04:03 AM, Varun Gupta wrote:
Hi Igor,

After discussing with Sergey, we came up with the following conclusions as to why we used the approach of going through all the keyuses in the KEYUSE array.
Cases to consider (assume we have an index on column a):
1) a OP const, where OP is one of the operators on which we do range analysis (<, <=, =, BETWEEN, etc.). This case is already handled by the range optimizer; we do not create a NULL-rejecting predicate for such conditions.
2) eq_join conditions (tbl1.a = tbl2.col2). This is the specific case we have tried to implement in MDEV-15777: we create NULL-rejecting predicates for the keyuses of each table and then feed these predicates to the range optimizer.

There is no logical explanation for checking null-rejection of fields used in equalities with the help of the keyuse array. The logic here is as follows:
We already collect the set of KEYUSE::null_rejecting attributes, and its meaning is very close to what the patch needs. But the meaning of this attribute is to be used as a key value for ref access by an index i1, while you need range access by a different index i2. There is no logical connection between these two indexes. The usage of i1 can be blocked by an IGNORE INDEX clause, and in this case there won't be any keyuses for it. But the equality from which the null-rejection can be deduced is still there. You need a range scan to reduce the number of records produced by the left operand of a join operation, and you should not care whether it is the argument of any ref access.

Here is the situation. The field of interest for range access is t1.a, and you have two conditions:

  t1.a=t2.a with index i1 on t2.a
  t1.b=t3.b with index i2 on t3.b

The user is prohibited from using i1. t1.a apparently remains null-rejecting, yet it cannot be detected as such by a look through the array of keyuses.

On 05/24/2019 01:41 PM, Sergey Petrunia wrote:

I don't follow the logic. What MDEV-15777 does is basically to look at a KEYUSE object, which represents

  tbl.key_col = other_column

and feed this into the range optimizer:

  tbl.key_col IS NOT NULL

If the query CAN use a certain index:
- we will have a KEYUSE object for it
- we will feed the IS NOT NULL condition into the range optimizer.

If the query CANNOT use a certain index (because of IGNORE INDEX), then:
- we won't have a KEYUSE object for that index (and so won't construct an IS NOT NULL condition for it)
- the range optimizer will not build ranges for it (so it doesn't really matter whether the IS NOT NULL condition was constructed or not).

Can you provide a full example?
Because of this, the part of the patch for MDEV-15777 which analyzes the KEYUSE array is very small: 122 lines including the comment (I counted make_null_rejecting_conds() and add_cond()).
I think, the CPU overhead is small, too.
3) a < func(b,c). We do not handle this case because:
a) It is harder to handle. We would need to analyze the whole WHERE clause to infer non-nullability of the condition. Example:
(t1.f2+1 < t1.f1+t1.f3) OR ... OR t1.f1 is null
Here the left part of the OR condition doesn't allow f1 to be NULL, but the right part allows it. We would also need to take into account tables inside outer joins. We would need to go through the Item classes and add code which says "Item_xxx() will evaluate to FALSE when its argument Y is NULL" (there is item->not_null_tables() currently, but no not_null_columns()).

Have you actually looked at the implementations of the virtual function not_null_tables()? Do you need me to provide the implementation of the virtual function not_null_columns()? It will require multiple implementations. At the moment we have:
  $ nm mysqld --demangle | grep 'Item.*::not_null_tables' | wc -l
  21
Another question: do you intend to collect not_null_columns() for all columns, or just columns that are a part of some index?
What would be a good data structure to store a set of tablename.column_name pairs?
It will be just a bitmap created for each table, and it can be used for many other purposes. If you want to filter out the columns that are not part of any index, that can easily be done (even once per table share).
My point was that constructing such a bitmap is a lot heavier than the table_map computed by not_null_tables(). table.h declares a field bitmap class:

  typedef Bitmap<MAX_FIELDS> Field_map;

but it's huge:

  (gdb) p sizeof(Field_map)
  $52 = 544

and note that we will not be able to just put one Field_map into the TABLE object. If we think about how to compute not_null_columns() for Item_cond_and or Item_cond_or:

- Item_cond_and will compute a union of all its arguments' not_null_columns().
- Item_cond_or will compute an intersection of all its arguments' not_null_columns().
- Other items have different logic.

That is, we will need to keep multiple "set of fields in tables" objects around; just one bitmap attached to the table will not suffice. This is doable, but it's going to be more code and have more CPU overhead.

And I'm still not convinced why we should do this. It is not clear whether this fits the scope of MDEV-15777, and once we get it done, what's the use of it? Is this something other query optimizers do (so that users expect it)?
b) Conditions in the form of arbitrary functions are not as frequently used as ref access conditions.

To sum up a) and b): doable, but it will require a lot of effort.
Do you have any specific practically relevant example in mind that we should handle?
BR
 Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog