Hi! I cc:d this to the maria-developers list as there are probably others that are interested in this discussion. (I removed all references to the customer who is interested in getting this solved)
"Sergey" == Sergey Petrunya <psergey@askmonty.org> writes:
Sergey> Hi Monty, Sergey> Please find below a summary of index_merge problems and suggestions on how Sergey> we could address them, together with concerns: Sergey> 1. index_merge/intersect is not used for range access scans. Sergey> Described as WL#21. Seems to be a big task. I extended the high level definition a bit to make the text more clear. Any preliminary estimate you could do (in weeks)? Sergey> One big issue is that wether we'll be able to make the optimizer make Sergey> good choices. There are two concerns Sergey> - Bad estimates/poor cost model Sergey> - Correlation(s). When we consider index_merge/intersect for Sergey> t.key1<c1 AND t.key2<c2 Sergey> we have no way to know whether the conditions are Sergey> = always satisfied together (and thus there is no point to use index_merge) Sergey> = never satisified together Sergey> = have no correlation Sergey> this will cause our estimates be inherently poor. Will they be Sergey> satisfactory? In case they won't, should we provision for adding hints Sergey> or something else? I think that we should assume that there is always notable less rows to retrieve when you can do an intersection than if we can't, if the sizes of the sets are of same magnitude. We should always do index merge (except if disabled by a hint) if the number or rows matching the conditions are relatively small for all involved keys. After all, for normal size keys, we will get +100 keys for each key block we read. If we can eliminate a couple of rows for each block we read we will probably gain speed. I talked with Igor about having live statistics in memory for the result of index merge. If we could keep for each index combination the number of rows found for each index and the number of rows eliminated, we could quite soon know which index merge makes sense. (This is an idea for the future). I had some ideas to improve the temporary table solution; we can discuss these separately. Sergey> 2. Possible range access disables index_merge/[sort]union scans Sergey> Described as WL#24. I've posted a fix suggestion there. I'm not sure if Sergey> it will work to customers complete satisfaction: Sergey> - whether the fix will handle all kinds of WHERE clauses he needs Sergey> - what will happen when we enable cost-based choice between range access Sergey> and index_merge/intersect (will there be poor plan choices due to Sergey> wrong cost calculations?) If you can add to the worklog what kind of WHERE we will be able to optimize with your sugestions, we can then ask the customer to verify if that is enough for him as a first step. Sergey> 3. index_merge/intersect optimization is poor. Sergey> Problems described in WL#26. At the moment I have only vague idea which Sergey> direction we need to move to improve it. Sergey> We could try grabbing low-hanging fruits, like Sergey> = make sure index_merge/intersect is not picked when the range is Sergey> available. Wouldn't this cause more problems like described in #2 ? Please add some example WHERE clauses to the worklog to make it clear exactly what you mean. Sergey> = change the the process of choosing best index_merge/intersect plan so Sergey> that it doesn't construct apparently useless (e.g. redundant) plans. Do you mean disregarding some index from the index_merge plan that are covered by other index? I think this is an obvious fix to make early. Sergey> Or we could try re-working cost calculations so that the above is Sergey> automatically taken care of and doesnt happen. The problem with this is Sergey> that it's hard to estimate when we'll get at acceptable result then. Lets start with the obvious and only when needed start thinking about recalculating costs as these can easily lead to the some old working queries are suddenly slower than before... Sergey> It seems the first logical thing to do is #2. Agree. Sergey> Then we could pick between #1 and #3. #1 and #3 are related in some way as Sergey> index_merge/intersect can make more assumptions when it optimizes for ROR Sergey> scans only. On the other hand, Customer mentioned that if we fix #1, #3 will Sergey> have easier job as he'll be able to remove all the multi-column indexes he Sergey> had to create to be able to get ROR scans for every WHERE clause he has. Agree that fixing #1 is the next logical step. Lets discuss all these worklogs on IRC on Thursday and then decide in which order and how we should do things. Regards, Monty