[Maria-developers] MDEV-83: pushdown of expensive predicates
Hi Timour, Some thoughts about MDEV-83. The jira entry says:
The main difference in these two plans is that with the first execution plan the dependent subquery: ... is evaluated before the table part is accessed while many rows where (p_name like 'forest%') is not true are filtered out and evaluation of the dependent subquery is not needed for them.
The way I read this, the only way one could figure out that the subquery should be evaluated at later phase (after table part) is to figure out that selectivity of table `part`, together with its condition of p_name like 'forest%' is less than one. That way, subquery is evaluated fewer times, and query execution will be cheaper. I don't see any index that covers p_name column. This means that current optimizer has no clue about selectivity of conditions over p_name column, and hence is unable to make the correct decision. If I add an index on p_name, then I get a different query plan (I didn't check if is slower or faster than the original). I think, one of the first things this MDEV needs is to figure out how exactly we will get selectivities for non-indexed columns. If we get them, and start to take them into account in join optimizer(*), the example provided in this MDEV/ bug may just go away :-). The general problem will remain, though. (*) { Currently, the optimizer has "quick_condition_rows", which may cause join fanout to be less than 1 (check out examples with EXPLAIN EXTENDED and filtered != 100% in the testsuite). The problem is that quick_condition_rows is not used consistently for all ways to access the table, for example it is not taken into account when considering ref access. I do not know whether Igor's work in best_access_path() includes making it consistently make use of quick_condition_rows estimates. } == Subquery predicate attachment considerations == Some thoughts about the problem. There are three factors we may want/need to take into account when attaching subquery predicate to some join_tab: 1. Fanout of join tabs 2. Cost of the subquery predicate evaluation 3. Selectivity of the subquery predicate. The example used in this MDEV item can be resolved by taking into account #1 and #2. For the first step, we only need to know - accurate join tab fanout (#1), - that subquery predicate evaluation has some cost ( #2 != 0) these two allow us to set/resolve a task of attaching the subquery predicate to the join_tab where total cost of its evaluations is minimized. The exact value of subquery evaluation cost is irrelevant. It will only become relevant when/if we start to make a choice between evaluating subquery predicate, or preforming join with the next join_tab. In order for that choice to be meaningful, we will need to have an idea about subquery predicate's selectivity. That way, we will be able to make a choice between 1. evaluate subquery, then make join with next join tab cost(subquery_pred) + selectivity(subquery_pred) * cost(join_tab) and 2. make join with next join_tab, then evaluate the subquery: cost(join_tab) + selectivity(join_tab) * cost(join_tab) BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
participants (1)
-
Sergei Petrunia