
[Maria-developers] New (by Psergey): index_merge optimization tasks (30)
by worklog-noreply＠askmonty.org 03 Jun '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge optimization tasks
CREATION DATE..: Wed, 03 Jun 2009, 12:08
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 30 (http://askmonty.org/worklog/?tid=30)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

DESCRIPTION:

This WL entry groups all index_merge optimization improvement tasks

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] New (by Psergey): Table elimination: all tasks (29)
by worklog-noreply＠askmonty.org 03 Jun '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination: all tasks
CREATION DATE..: Wed, 03 Jun 2009, 12:07
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 29 (http://askmonty.org/worklog/?tid=29)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

DESCRIPTION:

This WL entry groups all table elimination tasks.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


Re: [Maria-developers] index_merge tasks summary
by Michael Widenius 03 Jun '09


Hi!

I cc:d this to the maria-developers list as there are probably others
that are interested in this discussion.
(I removed all references to the customer who is interested in getting
this solved)

>>>>> "Sergey" == Sergey Petrunya <psergey(a)askmonty.org> writes:

Sergey> Hi Monty,
Sergey> Please find below a summary of index_merge problems and suggestions on how
Sergey> we could address them, together with concerns:

Sergey> 1. index_merge/intersect is not used for range access scans.
Sergey>    Described as WL#21. Seems to be a big task.

I extended the high level definition a bit to make the text more
clear.

Any preliminary estimate you could do (in weeks)?

Sergey>    One big issue is whether we'll be able to make the optimizer make
Sergey>    good choices. There are two concerns
Sergey>    - Bad estimates/poor cost model
Sergey>    - Correlation(s). When we consider index_merge/intersect for
Sergey>        t.key1<c1 AND t.key2<c2
Sergey>      we have no way to know whether the conditions are
Sergey>      = always satisfied together (and thus there is no point to use index_merge)
Sergey>      = never satisfied together
Sergey>      = have no correlation
Sergey>      this will cause our estimates to be inherently poor. Will they be
Sergey>      satisfactory? In case they won't, should we provision for adding hints
Sergey>      or something else?

I think that we should assume that there are always notably fewer rows
to retrieve when we can do an intersection than when we can't, if the
sizes of the sets are of the same magnitude.

We should always do index merge (except if disabled by a hint) if the
number of rows matching the conditions is relatively small for all
involved keys.

After all, for normal size keys, we will get 100+ keys for each key
block we read. If we can eliminate a couple of rows for each block we
read we will probably gain speed.

I talked with Igor about having live statistics in memory for the
result of index merge. If we could keep for each index combination
the number of rows found for each index and the number of rows
eliminated, we could quite soon know which index merge makes sense.
(This is an idea for the future).

I had some ideas to improve the temporary table solution; we can
discuss these separately.

Sergey> 2. Possible range access disables index_merge/[sort]union scans
Sergey>    Described as WL#24. I've posted a fix suggestion there. I'm not sure if
Sergey>    it will work to the customer's complete satisfaction:
Sergey>    - whether the fix will handle all kinds of WHERE clauses he needs
Sergey>    - what will happen when we enable cost-based choice between range access
Sergey>      and index_merge/intersect (will there be poor plan choices due to
Sergey>      wrong cost calculations?)

If you can add to the worklog what kind of WHERE we will be able to
optimize with your suggestions, we can then ask the customer to verify
if that is enough for him as a first step.

Sergey> 3. index_merge/intersect optimization is poor.
Sergey>    Problems described in WL#26. At the moment I have only a vague idea which
Sergey>    direction we need to move to improve it.
Sergey>    We could try grabbing low-hanging fruits, like
Sergey>    = make sure index_merge/intersect is not picked when the range is
Sergey>      available.

Wouldn't this cause more problems like described in #2 ?
Please add some example WHERE clauses to the worklog to make it clear
exactly what you mean.

Sergey>    = change the process of choosing best index_merge/intersect plan so
Sergey>      that it doesn't construct apparently useless (e.g. redundant) plans.

Do you mean disregarding some indexes from the index_merge plan that are
covered by other indexes? I think this is an obvious fix to make
early.

Sergey>    Or we could try re-working cost calculations so that the above is
Sergey>    automatically taken care of and doesn't happen. The problem with this is
Sergey>    that it's hard to estimate when we'll get an acceptable result then.

Let's start with the obvious and only when needed start thinking about
recalculating costs, as these can easily lead to some old working
queries suddenly being slower than before...

Sergey> It seems the first logical thing to do is #2.

Agree.

Sergey> Then we could pick between #1 and #3. #1 and #3 are related in some way as
Sergey> index_merge/intersect can make more assumptions when it optimizes for ROR
Sergey> scans only. On the other hand, Customer mentioned that if we fix #1, #3 will
Sergey> have easier job as he'll be able to remove all the multi-column indexes he
Sergey> had to create to be able to get ROR scans for every WHERE clause he has.

Agree that fixing #1 is the next logical step.

Let's discuss all these worklogs on IRC on Thursday and then decide in
which order and how we should do things.

Regards,
Monty


[Maria-developers] Updated (by Guest): index_merge: non-ROR intersection (21)
by worklog-noreply＠askmonty.org 03 Jun '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: non-ROR intersection
CREATION DATE..: Thu, 21 May 2009, 21:32
SUPERVISOR.....: Knielsen
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 21 (http://askmonty.org/worklog/?tid=21)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Wed, 03 Jun 2009, 01:17)=-=-
High Level Description modified.
--- /tmp/wklog.21.old.30002     2009-06-03 01:17:32.000000000 +0300
+++ /tmp/wklog.21.new.30002     2009-06-03 01:17:32.000000000 +0300
@@ -7,13 +7,13 @@
 
 The current optimization works with:
 
-WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3
+WHERE key1_part1=1 AND key1_part2=2 AND key2_part1=3
 
 but not with:
 
-WHERE key1_part1=1 OR key2_part1=3
+WHERE key1_part1=1 AND key2_part1=3
 or
-WHERE key_part1<10 or key2_part1<100
+WHERE key_part1<10 AND key2_part1<100
 
 This WL entry is to lift this limitation by developing algorithms that do
 intersection on non-ROR (rowid ordered retrieval) scans.

-=-=(Monty - Wed, 03 Jun 2009, 01:06)=-=-
High Level Description modified.
--- /tmp/wklog.21.old.29694     2009-06-03 01:06:50.000000000 +0300
+++ /tmp/wklog.21.new.29694     2009-06-03 01:06:50.000000000 +0300
@@ -12,6 +12,8 @@
 but not with:
 
 WHERE key1_part1=1 OR key2_part1=3
+or
+WHERE key_part1<10 or key2_part1<100
 
 This WL entry is to lift this limitation by developing algorithms that do
 intersection on non-ROR (rowid ordered retrieval) scans.

-=-=(Monty - Wed, 03 Jun 2009, 01:05)=-=-
High Level Description modified.
--- /tmp/wklog.21.old.29638     2009-06-03 01:05:01.000000000 +0300
+++ /tmp/wklog.21.new.29638     2009-06-03 01:05:01.000000000 +0300
@@ -3,5 +3,15 @@
 constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2 AND
 ... ) and the equalities should cover all index components.
 
+For example, assuming that key1 has 2 parts and key2 has 1 part.
+
+The current optimization works with:
+
+WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3
+
+but not with:
+
+WHERE key1_part1=1 OR key2_part1=3
+
 This WL entry is to lift this limitation by developing algorithms that do
-intersection on non-ROR scans.
+intersection on non-ROR (rowid ordered retrieval) scans.

-=-=(Guest - Tue, 26 May 2009, 14:04)=-=-
High-Level Specification modified.
--- /tmp/wklog.21.old.1802      2009-05-26 14:04:57.000000000 +0300
+++ /tmp/wklog.21.new.1802      2009-05-26 14:04:57.000000000 +0300
@@ -1,4 +1,3 @@
-
 <contents>
 1. Execution
 1.1 Temptable
@@ -30,6 +29,8 @@
 
 1.1 Temptable
 -------------
+[ This is our strategy of choice at the moment]
+
 Use a temporary heap-grow-out-to-myisam table with a primary key:
 
 create table temp_table (
@@ -168,3 +169,8 @@
     a subset of columns covered by all other indexes.
   = (TODO any other rules?)
 
+- Correlation across selectivities. If there is a condition
+
+   "cond(key1) AND cond(key2) AND ... AND cond(keyN)",
+
+  can we consider satisfaction of AND-parts to be independent?

-=-=(Psergey - Thu, 21 May 2009, 21:33)=-=-
High-Level Specification modified.
--- /tmp/wklog.21.old.25705     2009-05-21 21:33:02.000000000 +0300
+++ /tmp/wklog.21.new.25705     2009-05-21 21:33:02.000000000 +0300
@@ -1 +1,170 @@
 
+<contents>
+1. Execution
+1.1 Temptable
+1.1.1 Improvement
+1.2 Produce/merge sorted streams
+1.3 Extend Unique class to handle intersection
+1.4 Strategies that do not seem to be useful
+1.4.1 Remove matches after having produced an ordered stream
+1.4.2 Sparse rowid bitmaps
+2. Optimization
+
+</contents>
+
+1. Execution
+============
+
+The primary task is to find means to compute an intersection of N unordered
+streams. Besides general memory/cpu cost of computation, we consider:
+
+- whether the produced rowid stream is ordered. If it is, it can be piped
+  into index_merge/intersect (as opposed to sort-intersect)
+
+- whether the strategy can take advantage of the fact that some input streams
+  are already rowid-ordered
+
+- startup cost (cost of producing the first output record)
+
+We see the following possible strategies:
+
+1.1 Temptable
+-------------
+Use a temporary heap-grow-out-to-myisam table with a primary key:
+
+create table temp_table (
+  rowid binary($rowid_size),
+  count n,
+  primary key(rowid);
+);
+
+Then use this algorithm:
+
+  i1= {index with the least E(#records)};
+
+  for each record R in range_scan(i1)
+    temp_table.insert(R.rowid, count=1);
+
+  for each index idx except i1
+  {
+    for each R record in scan(idx) // (INNER-LOOP)
+    {
+      if (temp_table has R)
+        temptable[R].count++;
+    }
+  }
+
+  // The following loop can do ordered or unordered scan
+  // if we want it to be ordered scan, we probably better arrange so that
+  // 'count' column is part of the index.
+  for each record R in temp_table
+  {
+    if (R.count == number_of_streams)
+      emit(R.rowid);
+  }
+
+The algorithm has an option to emit an ordered rowid stream.
+
+In the above form, the cost to produce the first record is high. It's easy to
+adjust the algorithm to make it low - we'll need to just start scanning all
+indexes at once, and finish as soon as we got a full match, i.e. the
+
+  temptable[R].count++
+
+operation resulted in the counter being equal to the number of merged scans.
+
+1.1.1 Improvement
+~~~~~~~~~~~~~~~~~
+When running INNER-LOOP, we could count how many times we've done the
+"count++" operation. If it has been done #records-in-temptable times, that
+means that all further records will not have matches and we can finish the
+scan, i.e. break out of the INNER-LOOP.
+
+1.2 Produce/merge sorted streams
+--------------------------------
+For each of the merged scan, use filesort-like action to end up with an
+ordered stream of rowids. Then merge the ordered streams.
+
+By filesort-like action we mean
+ - Run over index, collect rowids in a buffer.
+ - When the buffer is full, sort it and dump into a temporary file.
+After the above we'll end up with a number of sorted buffers on disk. We can
+use mergebuff() function (it is part of filesort's functions) to produce one
+ordered sequence (i.e. array, which may be partially on disk) of rowids.
+
+Merging of ordered streams with help of priority queue is already implemented
+in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the
+
+  child_quick->get_next()
+
+call with a call to read rowid from an ordered sequence.
+
+1.3 Extend Unique class to handle intersection
+----------------------------------------------
+There is no point to use Unique object as a device that accumulates rowids of
+a single scan then produces them in sorted order. One could do the same faster
+with accumulating an array of rowids and then sorting it.
+
+It's possible to use Unique object to collect/merge data from all scans though.
+The idea is as follows:
+
+- Unique should store <rowid, n_scans> pairs
+- Duplicates are pairs with the same rowid
+- Unique should try to avoid creating duplicates:
+  - don't add a duplicate into the in-memory part, instead combine two elements
+    together by adding their n_scans elements.
+  - combine duplicates when it sees them in Unique.get() call
+- The data we get from Unique.get() should be filtered, all records that have
+  n_scans != number_of_scans_being_merged should be discarded.
+
+If we're lucky to have started and finished a scan on some index (denote it
+as S) without flushing the Unique in the process, then:
+- there is no point in adding any new records into the Unique because their
+  absence in the Unique means that they don't have match in S and hence will
+  not get into the result of intersection.
+- we need to only update the counters to be able to tell if the elements that
+  are already in the Unique will have matches in all scans.
+
+1.4 Strategies that do not seem to be useful
+--------------------------------------------
+
+keeping them here so we don't consider them over and over
+
+1.4.1 Remove matches after having produced an ordered stream
+------------------------------------------------------------
+We can dump everything into a rowid stream and get it sorted. Then we read it,
+and if we see a rowid repeated $n_merged_scans times, it belongs to the
+intersection (pass to output), otherwise it doesn't (skip).
+This doesn't have any advantages over the produce/merge sorted streams
+approach.
+
+1.4.2 Sparse rowid bitmaps
+--------------------------
+Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
+bitmaps assume there will always be enough memory to accommodate them.
+
+PostgreSQL makes bitmaps "loose" when they exceed certain size by remembering
+disk pages, not ids of individual records. It's hard for us to do something
+similar because our rowids are opaque entities whose meaning depends on the
+storage engines.
+
+This seems to require too much change to be worth it.
+
+2. Optimization
+===============
+
+SEL_TREE objects already represent intersections. The problems with
+optimizations are:
+
+- Cost formula(s)
+- When N keys/conditions are present:
+
+   "cond(key1) AND cond(key2) AND ... AND cond(keyN)",
+
+  somehow avoid considering (2^n - n) possible options.
+
+- Avoid producing (or even considering) apparently suboptimal plans:
+  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
+    a subset of columns covered by all other indexes.
+  = (TODO any other rules?)
+

DESCRIPTION:

At the moment index_merge supports intersection only for rowid-ordered streams.
This translates into a limitation that index_merge/intersect can only be
constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2 AND
... ) and the equalities should cover all index components.

For example, assuming that key1 has 2 parts and key2 has 1 part.

The current optimization works with:

WHERE key1_part1=1 AND key1_part2=2 AND key2_part1=3

but not with:

WHERE key1_part1=1 AND key2_part1=3
or
WHERE key_part1<10 AND key2_part1<100

This WL entry is to lift this limitation by developing algorithms that do
intersection on non-ROR (rowid ordered retrieval) scans.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Execution
1.1 Temptable
1.1.1 Improvement
1.2 Produce/merge sorted streams
1.3 Extend Unique class to handle intersection
1.4 Strategies that do not seem to be useful
1.4.1 Remove matches after having produced an ordered stream
1.4.2 Sparse rowid bitmaps
2. Optimization

</contents>

1. Execution
============

The primary task is to find means to compute an intersection of N unordered
streams. Besides general memory/cpu cost of computation, we consider:

- whether the produced rowid stream is ordered. If it is, it can be piped
  into index_merge/intersect (as opposed to sort-intersect)

- whether the strategy can take advantage of the fact that some input streams
  are already rowid-ordered

- startup cost (cost of producing the first output record)

We see the following possible strategies:

1.1 Temptable
-------------
[ This is our strategy of choice at the moment]

Use a temporary heap-grow-out-to-myisam table with a primary key:

create table temp_table (
  rowid binary($rowid_size),
  count n,
  primary key(rowid);
);

Then use this algorithm:

  i1= {index with the least E(#records)};

  for each record R in range_scan(i1)
    temp_table.insert(R.rowid, count=1);

  for each index idx except i1
  {
    for each R record in scan(idx) // (INNER-LOOP)
    {
      if (temp_table has R)
        temptable[R].count++;
    }
  }

  // The following loop can do ordered or unordered scan
  // if we want it to be ordered scan, we probably better arrange so that
  // 'count' column is part of the index.
  for each record R in temp_table
  {
    if (R.count == number_of_streams)
      emit(R.rowid);
  }

The algorithm has an option to emit an ordered rowid stream.
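[ Editorial sketch, not part of the worklog or of the server code. The
temptable algorithm above can be modelled in a few lines of Python, with a
dict standing in for the temporary table. The function name
temptable_intersect and the representation of each scan as a list of rowids
are assumptions made for illustration only. The sketch also assumes that a
single scan never returns the same rowid twice, and it uses the actual scan
length as a stand-in for E(#records). ]

```python
def temptable_intersect(scans, ordered=False):
    """Intersect rowid streams with a counting 'temp table' (a dict here).

    scans   -- list of rowid lists, one list per index scan (hypothetical
               stand-in for range scans; rowids are unique within a scan).
    ordered -- if True, emit the result in rowid order.
    """
    if not scans:
        return []

    # i1 = index with the least E(#records); the real optimizer would use an
    # estimate, this sketch just uses the actual length of each scan.
    i1 = min(range(len(scans)), key=lambda i: len(scans[i]))

    # for each record R in range_scan(i1): temp_table.insert(R.rowid, count=1)
    temp_table = {rowid: 1 for rowid in scans[i1]}

    for i, scan in enumerate(scans):
        if i == i1:
            continue
        for rowid in scan:              # (INNER-LOOP)
            if rowid in temp_table:     # rowids absent from i1 cannot match
                temp_table[rowid] += 1

    # Final pass: emit rowids that were seen in every scan.
    rows = sorted(temp_table) if ordered else temp_table
    return [rowid for rowid in rows if temp_table[rowid] == len(scans)]


if __name__ == "__main__":
    print(temptable_intersect([[5, 1, 9, 3], [3, 5, 7], [9, 5, 3, 2]],
                              ordered=True))
```

Only rowids from the smallest scan are ever inserted, so the table never holds
more entries than that scan returns; this is the reason for picking i1 first.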

In the above form, the cost to produce the first record is high. It's easy to
adjust the algorithm to make it low - we'll need to just start scanning all
indexes at once, and finish as soon as we got a full match, i.e. the

  temptable[R].count++

operation resulted in the counter being equal to the number of merged scans.

1.1.1 Improvement
~~~~~~~~~~~~~~~~~
When running INNER-LOOP, we could count how many times we've done the
"count++" operation. If it has been done #records-in-temptable times, that
means that all further records will not have matches and we can finish the
scan, i.e. break out of the INNER-LOOP.

1.2 Produce/merge sorted streams
--------------------------------
For each of the merged scans, use filesort-like action to end up with an
ordered stream of rowids. Then merge the ordered streams.

By filesort-like action we mean
 - Run over index, collect rowids in a buffer.
 - When the buffer is full, sort it and dump into a temporary file.
After the above we'll end up with a number of sorted buffers on disk. We can
use mergebuff() function (it is part of filesort's functions) to produce one
ordered sequence (i.e. array, which may be partially on disk) of rowids.

Merging of ordered streams with help of priority queue is already implemented
in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the

  child_quick->get_next()

call with a call to read rowid from an ordered sequence.

1.3 Extend Unique class to handle intersection
----------------------------------------------
There is no point in using a Unique object as a device that accumulates rowids
of a single scan and then produces them in sorted order. One could do the same
faster by accumulating an array of rowids and then sorting it.

It's possible to use Unique object to collect/merge data from all scans though.
The idea is as follows:

- Unique should store <rowid, n_scans> pairs
- Duplicates are pairs with the same rowid
- Unique should try to avoid creating duplicates:
  - don't add a duplicate into the in-memory part, instead combine two elements
    together by adding their n_scans elements.
  - combine duplicates when it sees them in Unique.get() call
- The data we get from Unique.get() should be filtered, all records that have
  n_scans != number_of_scans_being_merged should be discarded.

If we're lucky to have started and finished a scan on some index (denote it
as S) without flushing the Unique in the process, then:
- there is no point in adding any new records into the Unique because their
  absence in the Unique means that they don't have a match in S and hence will
  not get into the result of intersection.
- we need to only update the counters to be able to tell if the elements that
  are already in the Unique will have matches in all scans.

1.4 Strategies that do not seem to be useful
--------------------------------------------

keeping them here so we don't consider them over and over

1.4.1 Remove matches after having produced an ordered stream
------------------------------------------------------------
We can dump everything into a rowid stream and get it sorted. Then we read it,
and if we see a rowid repeated $n_merged_scans times, it belongs to the
intersection (pass to output), otherwise it doesn't (skip).
This doesn't have any advantages over the produce/merge sorted streams
approach.

1.4.2 Sparse rowid bitmaps
--------------------------
Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
bitmaps assume there will always be enough memory to accommodate them.

PostgreSQL makes bitmaps "loose" when they exceed a certain size by remembering
disk pages, not ids of individual records. It's hard for us to do something
similar because our rowids are opaque entities whose meaning depends on the
storage engines.

This seems to require too much change to be worth it.

2. Optimization
===============

SEL_TREE objects already represent intersections. The problems with
optimizations are:

- Cost formula(s)
- When N keys/conditions are present:

   "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  somehow avoid considering (2^n - n) possible options.

- Avoid producing (or even considering) apparently suboptimal plans:
  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
    a subset of columns covered by all other indexes.
  = (TODO any other rules?)

- Correlation across selectivities. If there is a condition

   "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  can we consider satisfaction of AND-parts to be independent?

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
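
[ Editorial sketch, not part of the worklog or of the server code. The
"produce/merge sorted streams" strategy of section 1.2 can be modelled as
follows. Python's sorted() stands in for the filesort-like step (buffer,
sort, spill to disk) and heapq.merge() stands in for the priority-queue merge;
neither is what the server would actually call. The function name
sorted_stream_intersect is hypothetical, and each scan is again represented
as a plain list of rowids. ]

```python
import heapq
from itertools import groupby


def sorted_stream_intersect(scans):
    """Intersect rowid streams by sorting each one and merging the results.

    scans -- list of rowid lists, one per index scan (illustrative input).
    Returns the rowids present in every scan, in rowid order.
    """
    if not scans:
        return []

    # Step 1: turn every scan into an ordered stream of distinct rowids.
    # In the worklog this is the filesort-like action; here it is in-memory.
    ordered_streams = [sorted(set(scan)) for scan in scans]

    # Step 2: k-way merge of the ordered streams using a heap.
    merged = heapq.merge(*ordered_streams)

    # Step 3: a rowid belongs to the intersection if it appears once in
    # every stream, i.e. len(scans) times in a row in the merged stream.
    return [rowid for rowid, run in groupby(merged)
            if sum(1 for _ in run) == len(scans)]


if __name__ == "__main__":
    print(sorted_stream_intersect([[5, 1, 9, 3], [3, 5, 7], [9, 5, 3, 2]]))
```

The output is always rowid-ordered, which is the property section 1 asks
about; the price is that no row can be emitted before every scan has been
read and sorted.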

1 0

[Maria-developers] Updated (by Guest): index_merge: non-ROR intersection (21)
by worklog-noreply＠askmonty.org 03 Jun '09

03 Jun '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: index_merge: non-ROR intersection CREATION DATE..: Thu, 21 May 2009, 21:32 SUPERVISOR.....: Knielsen IMPLEMENTOR....: COPIES TO......: Psergey CATEGORY.......: Server-RawIdeaBin TASK ID........: 21 (http://askmonty.org/worklog/?tid=21) VERSION........: Server-9.x STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Wed, 03 Jun 2009, 01:17)=-=- High Level Description modified. --- /tmp/wklog.21.old.30002 2009-06-03 01:17:32.000000000 +0300 +++ /tmp/wklog.21.new.30002 2009-06-03 01:17:32.000000000 +0300 @@ -7,13 +7,13 @@ The current optimization works with: -WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3 +WHERE key1_part1=1 AND key1_part2=2 AND key2_part1=3 but not with: -WHERE key1_part1=1 OR key2_part1=3 +WHERE key1_part1=1 AND key2_part1=3 or -WHERE key_part1<10 or key2_part1<100 +WHERE key_part1<10 AND key2_part1<100 This WL entry is to lift this limitation by developing algorithms that do intersection on non-ROR (rowid ordered retrieval) scans. -=-=(Monty - Wed, 03 Jun 2009, 01:06)=-=- High Level Description modified. --- /tmp/wklog.21.old.29694 2009-06-03 01:06:50.000000000 +0300 +++ /tmp/wklog.21.new.29694 2009-06-03 01:06:50.000000000 +0300 @@ -12,6 +12,8 @@ but not with: WHERE key1_part1=1 OR key2_part1=3 +or +WHERE key_part1<10 or key2_part1<100 This WL entry is to lift this limitation by developing algorithms that do intersection on non-ROR (rowid ordered retrieval) scans. -=-=(Monty - Wed, 03 Jun 2009, 01:05)=-=- High Level Description modified. --- /tmp/wklog.21.old.29638 2009-06-03 01:05:01.000000000 +0300 +++ /tmp/wklog.21.new.29638 2009-06-03 01:05:01.000000000 +0300 @@ -3,5 +3,15 @@ constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2 AND ... 
) and the equalities should cover all index components. +For example, assuming that key1 has 2 parts and key2 has 1 part. + +The current optimization works with: + +WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3 + +but not with: + +WHERE key1_part1=1 OR key2_part1=3 + This WL entry is to lift this limitation by developing algorithms that do -intersection on non-ROR scans. +intersection on non-ROR (rowid ordered retrieval) scans. -=-=(Guest - Tue, 26 May 2009, 14:04)=-=- High-Level Specification modified. --- /tmp/wklog.21.old.1802 2009-05-26 14:04:57.000000000 +0300 +++ /tmp/wklog.21.new.1802 2009-05-26 14:04:57.000000000 +0300 @@ -1,4 +1,3 @@ - <contents> 1. Execution 1.1 Temptable @@ -30,6 +29,8 @@ 1.1 Temptable ------------- +[ This is our strategy of choice at the moment] + Use a temporary heap-grow-out-to-myisam table with a primary key: create table temp_table ( @@ -168,3 +169,8 @@ a subset of columns covered by all other indexes. = (TODO any other rules?) +- Correlation across selectivities. If there is a condition + + "cond(key1) AND cond(key2) AND ... AND cond(keyN)", + + can we consider satisfaction of AND-parts to be independent? -=-=(Psergey - Thu, 21 May 2009, 21:33)=-=- High-Level Specification modified. --- /tmp/wklog.21.old.25705 2009-05-21 21:33:02.000000000 +0300 +++ /tmp/wklog.21.new.25705 2009-05-21 21:33:02.000000000 +0300 @@ -1 +1,170 @@ +<contents> +1. Execution +1.1 Temptable +1.1.1 Improvement +1.2 Produce/merge sorted streams +1.3 Extend Unique class to handle intersection +1.4 Strategies that do not seem to be useful +1.4.1 Remove matches after having produced an ordered stream +1.4.2 Sparse rowid bitmaps +2. Optimization + +</contents> + +1. Execution +============ + +The primary task is to find means to compute an intersection of N unordered +streams. Besides general memory/cpu cost of computation, we consider: + +- whether the produced rowid stream is ordered. 
If it is, it can be piped + into index_merge/intersect (as opposed to sort-intersect) + +- whether the strategy can take advantage of the fact that some input streams + are already rowid-ordered + +- startup cost (cost of producing the first output record) + +We see the following possible strategies: + +1.1 Temptable +------------- +Use a temporary heap-grow-out-to-myisam table with a primary key: + +create table temp_table ( + rowid binary($rowid_size), + count n, + primary key(rowid); +); + +Then use this algorithm: + + i1= {index with the least E(#records)}; + + for each record R in range_scan(i1) + temp_table.insert(R.rowid, count=1); + + for each index idx except i1 + { + for each R record in scan(idx) // (INNER-LOOP) + { + if (temp_table has R) + temptable[R].count++; + } + } + + // The following loop can do ordered or unordered scan + // if we want it to be ordered scan, we probably better arrange so that + // 'count' column is part of the index. + for each record R in temp_table + { + if (R.count == number_of_streams) + emit(R.rowid); + } + +The algorithm has an option to emit an ordered rowid stream. + +In the above form, the cost to produce the first record is high. It's easy to +adjust the algorithm to make it low - we'll need to just start scanning all +indexes at once, and finish as soon as we got a full match, i.e. the + + temptable[R].count++ + +operation resulted in the counter being equal to the number of merged scans. + +1.1.1 Improvement +~~~~~~~~~~~~~~~~~ +When running INNER-LOOP, we could count how many times we've done the +"count++" operation. If it has been done #records-in-temptable times, that +means that all further records will not have matches and we can finish the +scan, i.e. break out of the INNER-LOOP. + +1.2 Produce/merge sorted streams +-------------------------------- +For each of the merged scan, use filesort-like action to end up with an +ordered stream of rowids. Then merge the ordered streams. 
+ +By filesort-like action we mean + - Run over index, collect rowids in a buffer. + - When the buffer is full, sort it and dump into a temporary file. +After the above we'll end up with a number of sorted buffers on disk. We can +use mergebuff() function (it is part of filesort's functions) to produce one +ordered sequence (i.e. array, which may be partially on disk) of rowids. + +Merging of ordered streams with help of priority queue is already implemented +in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the + + child_quick->get_next() + +call with a call to read rowid from an ordered sequence. + +1.3 Extend Unique class to handle intersection +---------------------------------------------- +There is no point to use Unique object as a device that accumulates rowids of +a single scan then produces them in sorted order. One could do the same faster +with accumulating an array of rowids and then sorting it. + +It's possible to use Unique object to collect/merge data from all scans though. +The idea is as follows: + +- Unique should store <rowid, n_scans> pairs +- Duplicates are pairs with the same rowid +- Unique should try to avoid creating duplicates: + - don't add a duplicate into the in-memory part, instead combine two elements + together by adding their n_scans elements. + - combine duplicates when it sees them in Unique.get() call +- The data we get from Unique.get() should be filtered, all records that have + n_scans != number_of_scans_being_merged should be discarded. + +If we're lucky to have started and finished a scan on some index (denote it +as S) without flushing the Unique in the process, then: +- there is no point in adding any new records into the Unique because their + absence in the Unique means that they don't have match in S and hence will + not get into the result of intersection. +- we need to only update the counters to be able to tell if the elements that + are already in the Unique will have matches in all scans. 
+
+1.4 Strategies that do not seem to be useful
+--------------------------------------------
+
+keeping them here so we don't consider them over and over
+
+1.4.1 Remove matches after having produced an ordered stream
+------------------------------------------------------------
+We can dump everything into a rowid stream and get it sorted. Then we read it,
+and if we see a rowid repeated $n_merged_scans times, it belongs to the
+intersection (pass to output), otherwise it doesn't (skip).
+This doesn't have any advantages over the produce/merge sorted streams
+approach.
+
+1.4.2 Sparse rowid bitmaps
+--------------------------
+Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
+bitmaps assume there will always be enough memory to accommodate them.
+
+PostgreSQL makes bitmaps "loose" when they exceed certain size by remembering
+disk pages, not ids of individual records. It's hard for us to do something
+similar because our rowids are opaque entities whose meaning depends on the
+storage engines.
+
+This seems to require too much change to be worth it.
+
+2. Optimization
+===============
+
+SEL_TREE objects already represent intersections. The problems with
+optimizations are:
+
+- Cost formula(s)
+- When N keys/conditions are present:
+
+  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",
+
+  somehow avoid considering (2^n - n) possible options.
+
+- Avoid producing (or even considering) apparently suboptimal plans:
+  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
+    a subset of columns covered by all other indexes.
+  = (TODO any other rules?)
+

DESCRIPTION:

At the moment index_merge supports intersection only for rowid-ordered streams.
This translates into a limitation that index_merge/intersect can only be
constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2
AND ... ) and the equalities should cover all index components.

For example, assuming that key1 has 2 parts and key2 has 1 part.
The current optimization works with:

  WHERE key1_part1=1 AND key1_part2=2 AND key2_part1=3

but not with:

  WHERE key1_part1=1 AND key2_part1=3

or

  WHERE key1_part1<10 AND key2_part1<100

This WL entry is to lift this limitation by developing algorithms that do
intersection on non-ROR (rowid ordered retrieval) scans.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Execution
1.1 Temptable
1.1.1 Improvement
1.2 Produce/merge sorted streams
1.3 Extend Unique class to handle intersection
1.4 Strategies that do not seem to be useful
1.4.1 Remove matches after having produced an ordered stream
1.4.2 Sparse rowid bitmaps
2. Optimization
</contents>

1. Execution
============

The primary task is to find means to compute an intersection of N unordered
streams. Besides general memory/cpu cost of computation, we consider:

- whether the produced rowid stream is ordered. If it is, it can be piped
  into index_merge/intersect (as opposed to sort-intersect)

- whether the strategy can take advantage of the fact that some input streams
  are already rowid-ordered

- startup cost (cost of producing the first output record)

We see the following possible strategies:

1.1 Temptable
-------------
[ This is our strategy of choice at the moment]

Use a temporary heap-grow-out-to-myisam table with a primary key:

create table temp_table (
  rowid binary($rowid_size),
  count int,
  primary key(rowid)
);

Then use this algorithm:

  i1= {index with the least E(#records)};

  for each record R in range_scan(i1)
    temp_table.insert(R.rowid, count=1);

  for each index idx except i1
  {
    for each record R in scan(idx)  // (INNER-LOOP)
    {
      if (temp_table has R)
        temp_table[R].count++;
    }
  }

  // The following loop can do an ordered or an unordered scan.
  // If we want an ordered scan, we should probably arrange for the
  // 'count' column to be part of the index.
  for each record R in temp_table
  {
    if (R.count == number_of_streams)
      emit(R.rowid);
  }

The algorithm has an option to emit an ordered rowid stream.
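
The counting algorithm above can be tried out in a few lines of Python. This
is only a sketch under stated assumptions, not the server implementation: a
dict stands in for temp_table, rowids are modelled as bytes values, each scan
is assumed to return a given rowid at most once, and the caller is assumed to
pass the scan with the fewest expected records first. The names
intersect_rowids, scans and ordered are made up for the illustration.

```python
def intersect_rowids(scans, ordered=False):
    """Return the rowids that occur in every scan (hypothetical helper).

    scans   -- list of iterables of rowids; scans[0] plays the role of i1
    ordered -- if True, return the result in rowid order
    """
    if not scans:
        return []

    # Seed the counting table from the first scan, count = 1 for each rowid.
    counts = {rowid: 1 for rowid in scans[0]}

    # INNER-LOOP: bump the counter of every rowid that is already present.
    # Rowids missing from the seed scan cannot be in the intersection,
    # so they are never inserted.
    for scan in scans[1:]:
        for rowid in scan:
            if rowid in counts:
                counts[rowid] += 1

    # Final pass: keep rowids that were seen in all scans.
    candidates = sorted(counts) if ordered else counts
    return [rowid for rowid in candidates if counts[rowid] == len(scans)]
```

Because only rowids from the first scan are ever stored, the size of the
table is bounded by the size of that scan, which is why the pseudocode starts
from the index with the least E(#records).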

In the above form, the cost to produce the first record is high. It's easy to
adjust the algorithm to make it low - we'll need to just start scanning all
indexes at once, and finish as soon as we get a full match, i.e. when the

  temp_table[R].count++

operation results in the counter being equal to the number of merged scans.

1.1.1 Improvement
~~~~~~~~~~~~~~~~~
When running INNER-LOOP, we could count how many times we've done the
"count++" operation. If, during the scan of one index, it has been done
#records-in-temptable times, that means that all further records of that scan
will not have matches and we can finish the scan, i.e. break out of the
INNER-LOOP.

1.2 Produce/merge sorted streams
--------------------------------
For each of the merged scans, use a filesort-like action to end up with an
ordered stream of rowids. Then merge the ordered streams.

By filesort-like action we mean
 - Run over the index, collect rowids in a buffer.
 - When the buffer is full, sort it and dump it into a temporary file.
After the above we'll end up with a number of sorted buffers on disk. We can
use mergebuff() function (it is part of filesort's functions) to produce one
ordered sequence (i.e. array, which may be partially on disk) of rowids.

Merging of ordered streams with the help of a priority queue is already
implemented in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the

  child_quick->get_next()

call with a call to read a rowid from an ordered sequence.

1.3 Extend Unique class to handle intersection
----------------------------------------------
There is no point in using a Unique object as a device that accumulates the
rowids of a single scan and then produces them in sorted order. One could do
the same faster by accumulating an array of rowids and then sorting it.

It's possible to use a Unique object to collect/merge data from all scans though.
The idea is as follows:

- Unique should store <rowid, n_scans> pairs
- Duplicates are pairs with the same rowid
- Unique should try to avoid creating duplicates:
  - don't add a duplicate into the in-memory part, instead combine the two
    elements together by adding their n_scans elements.
  - combine duplicates when it sees them in the Unique.get() call
- The data we get from Unique.get() should be filtered: all records that have
  n_scans != number_of_scans_being_merged should be discarded.

If we're lucky to have started and finished a scan on some index (denote it
as S) without flushing the Unique in the process, then:
- there is no point in adding any new records into the Unique because their
  absence in the Unique means that they don't have a match in S and hence will
  not get into the result of the intersection.
- we only need to update the counters to be able to tell if the elements that
  are already in the Unique will have matches in all scans.

1.4 Strategies that do not seem to be useful
--------------------------------------------

Keeping them here so we don't consider them over and over.

1.4.1 Remove matches after having produced an ordered stream
------------------------------------------------------------
We can dump everything into a rowid stream and get it sorted. Then we read it,
and if we see a rowid repeated $n_merged_scans times, it belongs to the
intersection (pass to output), otherwise it doesn't (skip).
This doesn't have any advantages over the produce/merge sorted streams
approach.

1.4.2 Sparse rowid bitmaps
--------------------------
Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
bitmaps assume there will always be enough memory to accommodate them.

PostgreSQL makes bitmaps "lossy" when they exceed a certain size by remembering
disk pages, not ids of individual records. It's hard for us to do something
similar because our rowids are opaque entities whose meaning depends on the
storage engines.

This seems to require too much change to be worth it.

2. Optimization
===============

SEL_TREE objects already represent intersections. The problems with
optimization are:

- Cost formula(s)
- When N keys/conditions are present:

  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  somehow avoid considering all (2^N - N - 1) combinations of two or more
  keys.

- Avoid producing (or even considering) obviously suboptimal plans:
  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
    a subset of columns covered by all other indexes.
  = (TODO any other rules?)

- Correlation across selectivities. If there is a condition

  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  can we consider satisfaction of AND-parts to be independent?

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)



[Maria-developers] Updated (by Monty): index_merge: non-ROR intersection (21)
by worklog-noreply＠askmonty.org 03 Jun '09

03 Jun '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: non-ROR intersection
CREATION DATE..: Thu, 21 May 2009, 21:32
SUPERVISOR.....: Knielsen
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 21 (http://askmonty.org/worklog/?tid=21)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Monty - Wed, 03 Jun 2009, 01:06)=-=-
High Level Description modified.
--- /tmp/wklog.21.old.29694 2009-06-03 01:06:50.000000000 +0300
+++ /tmp/wklog.21.new.29694 2009-06-03 01:06:50.000000000 +0300
@@ -12,6 +12,8 @@
 but not with:
 
 WHERE key1_part1=1 OR key2_part1=3
+or
+WHERE key_part1<10 or key2_part1<100
 
 This WL entry is to lift this limitation by developing algorithms that do
 intersection on non-ROR (rowid ordered retrieval) scans.

-=-=(Monty - Wed, 03 Jun 2009, 01:05)=-=-
High Level Description modified.
--- /tmp/wklog.21.old.29638 2009-06-03 01:05:01.000000000 +0300
+++ /tmp/wklog.21.new.29638 2009-06-03 01:05:01.000000000 +0300
@@ -3,5 +3,15 @@
 constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2
 AND ... ) and the equalities should cover all index components.
 
+For example, assuming that key1 has 2 parts and key2 has 1 part.
+
+The current optimization works with:
+
+WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3
+
+but not with:
+
+WHERE key1_part1=1 OR key2_part1=3
+
 This WL entry is to lift this limitation by developing algorithms that do
-intersection on non-ROR scans.
+intersection on non-ROR (rowid ordered retrieval) scans.

-=-=(Guest - Tue, 26 May 2009, 14:04)=-=-
High-Level Specification modified.
--- /tmp/wklog.21.old.1802 2009-05-26 14:04:57.000000000 +0300
+++ /tmp/wklog.21.new.1802 2009-05-26 14:04:57.000000000 +0300
@@ -1,4 +1,3 @@
-
 <contents>
 1. Execution
 1.1 Temptable
@@ -30,6 +29,8 @@
 
 1.1 Temptable
 -------------
+[ This is our strategy of choice at the moment]
+
 Use a temporary heap-grow-out-to-myisam table with a primary key:
 
 create table temp_table (
@@ -168,3 +169,8 @@
     a subset of columns covered by all other indexes.
   = (TODO any other rules?)
 
+- Correlation across selectivities. If there is a condition
+
+  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",
+
+  can we consider satisfaction of AND-parts to be independent?

-=-=(Psergey - Thu, 21 May 2009, 21:33)=-=-
High-Level Specification modified.
--- /tmp/wklog.21.old.25705 2009-05-21 21:33:02.000000000 +0300
+++ /tmp/wklog.21.new.25705 2009-05-21 21:33:02.000000000 +0300
@@ -1 +1,170 @@
 
+<contents>
+1. Execution
+1.1 Temptable
+1.1.1 Improvement
+1.2 Produce/merge sorted streams
+1.3 Extend Unique class to handle intersection
+1.4 Strategies that do not seem to be useful
+1.4.1 Remove matches after having produced an ordered stream
+1.4.2 Sparse rowid bitmaps
+2. Optimization
+
+</contents>
+
+1. Execution
+============
+
+The primary task is to find means to compute an intersection of N unordered
+streams. Besides general memory/cpu cost of computation, we consider:
+
+- whether the produced rowid stream is ordered. If it is, it can be piped
+  into index_merge/intersect (as opposed to sort-intersect)
+
+- whether the strategy can take advantage of the fact that some input streams
+  are already rowid-ordered
+
+- startup cost (cost of producing the first output record)
+
+We see the following possible strategies:
+
+1.1 Temptable
+-------------
+Use a temporary heap-grow-out-to-myisam table with a primary key:
+
+create table temp_table (
+  rowid binary($rowid_size),
+  count n,
+  primary key(rowid);
+);
+
+Then use this algorithm:
+
+  i1= {index with the least E(#records)};
+
+  for each record R in range_scan(i1)
+    temp_table.insert(R.rowid, count=1);
+
+  for each index idx except i1
+  {
+    for each R record in scan(idx) // (INNER-LOOP)
+    {
+      if (temp_table has R)
+        temptable[R].count++;
+    }
+  }
+
+  // The following loop can do ordered or unordered scan
+  // if we want it to be ordered scan, we probably better arrange so that
+  // 'count' column is part of the index.
+  for each record R in temp_table
+  {
+    if (R.count == number_of_streams)
+      emit(R.rowid);
+  }
+
+The algorithm has an option to emit an ordered rowid stream.
+
+In the above form, the cost to produce the first record is high. It's easy to
+adjust the algorithm to make it low - we'll need to just start scanning all
+indexes at once, and finish as soon as we got a full match, i.e. the
+
+  temptable[R].count++
+
+operation resulted in the counter being equal to the number of merged scans.
+
+1.1.1 Improvement
+~~~~~~~~~~~~~~~~~
+When running INNER-LOOP, we could count how many times we've done the
+"count++" operation. If it has been done #records-in-temptable times, that
+means that all further records will not have matches and we can finish the
+scan, i.e. break out of the INNER-LOOP.
+
+1.2 Produce/merge sorted streams
+--------------------------------
+For each of the merged scan, use filesort-like action to end up with an
+ordered stream of rowids. Then merge the ordered streams.
+
+By filesort-like action we mean
+ - Run over index, collect rowids in a buffer.
+ - When the buffer is full, sort it and dump into a temporary file.
+After the above we'll end up with a number of sorted buffers on disk. We can
+use mergebuff() function (it is part of filesort's functions) to produce one
+ordered sequence (i.e. array, which may be partially on disk) of rowids.
+
+Merging of ordered streams with help of priority queue is already implemented
+in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the
+
+  child_quick->get_next()
+
+call with a call to read rowid from an ordered sequence.
+
+1.3 Extend Unique class to handle intersection
+----------------------------------------------
+There is no point to use Unique object as a device that accumulates rowids of
+a single scan then produces them in sorted order. One could do the same faster
+with accumulating an array of rowids and then sorting it.
+
+It's possible to use Unique object to collect/merge data from all scans though.
+The idea is as follows:
+
+- Unique should store <rowid, n_scans> pairs
+- Duplicates are pairs with the same rowid
+- Unique should try to avoid creating duplicates:
+  - don't add a duplicate into the in-memory part, instead combine two elements
+    together by adding their n_scans elements.
+  - combine duplicates when it sees them in Unique.get() call
+- The data we get from Unique.get() should be filtered, all records that have
+  n_scans != number_of_scans_being_merged should be discarded.
+
+If we're lucky to have started and finished a scan on some index (denote it
+as S) without flushing the Unique in the process, then:
+- there is no point in adding any new records into the Unique because their
+  absence in the Unique means that they don't have match in S and hence will
+  not get into the result of intersection.
+- we need to only update the counters to be able to tell if the elements that
+  are already in the Unique will have matches in all scans.
+
+1.4 Strategies that do not seem to be useful
+--------------------------------------------
+
+keeping them here so we don't consider them over and over
+
+1.4.1 Remove matches after having produced an ordered stream
+------------------------------------------------------------
+We can dump everything into a rowid stream and get it sorted. Then we read it,
+and if we see a rowid repeated $n_merged_scans times, it belongs to the
+intersection (pass to output), otherwise it doesn't (skip).
+This doesn't have any advantages over the produce/merge sorted streams
+approach.
+
+1.4.2 Sparse rowid bitmaps
+--------------------------
+Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
+bitmaps assume there will always be enough memory to accommodate them.
+
+PostgreSQL makes bitmaps "loose" when they exceed certain size by remembering
+disk pages, not ids of individual records. It's hard for us to do something
+similar because our rowids are opaque entities whose meaning depends on the
+storage engines.
+
+This seems to require too much change to be worth it.
+
+2. Optimization
+===============
+
+SEL_TREE objects already represent intersections. The problems with
+optimizations are:
+
+- Cost formula(s)
+- When N keys/conditions are present:
+
+  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",
+
+  somehow avoid considering (2^n - n) possible options.
+
+- Avoid producing (or even considering) apparently suboptimal plans:
+  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
+    a subset of columns covered by all other indexes.
+  = (TODO any other rules?)
+

DESCRIPTION:

At the moment index_merge supports intersection only for rowid-ordered streams.
This translates into a limitation that index_merge/intersect can only be
constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2
AND ... ) and the equalities should cover all index components.

For example, assuming that key1 has 2 parts and key2 has 1 part.
The current optimization works with: WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3 but not with: WHERE key1_part1=1 OR key2_part1=3 or WHERE key_part1<10 or key2_part1<100 This WL entry is to lift this limitation by developing algorithms that do intersection on non-ROR (rowid ordered retrieval) scans. HIGH-LEVEL SPECIFICATION: <contents> 1. Execution 1.1 Temptable 1.1.1 Improvement 1.2 Produce/merge sorted streams 1.3 Extend Unique class to handle intersection 1.4 Strategies that do not seem to be useful 1.4.1 Remove matches after having produced an ordered stream 1.4.2 Sparse rowid bitmaps 2. Optimization </contents> 1. Execution ============ The primary task is to find means to compute an intersection of N unordered streams. Besides general memory/cpu cost of computation, we consider: - whether the produced rowid stream is ordered. If it is, it can be piped into index_merge/intersect (as opposed to sort-intersect) - whether the strategy can take advantage of the fact that some input streams are already rowid-ordered - startup cost (cost of producing the first output record) We see the following possible strategies: 1.1 Temptable ------------- [ This is our strategy of choice at the moment] Use a temporary heap-grow-out-to-myisam table with a primary key: create table temp_table ( rowid binary($rowid_size), count n, primary key(rowid); ); Then use this algorithm: i1= {index with the least E(#records)}; for each record R in range_scan(i1) temp_table.insert(R.rowid, count=1); for each index idx except i1 { for each R record in scan(idx) // (INNER-LOOP) { if (temp_table has R) temptable[R].count++; } } // The following loop can do ordered or unordered scan // if we want it to be ordered scan, we probably better arrange so that // 'count' column is part of the index. for each record R in temp_table { if (R.count == number_of_streams) emit(R.rowid); } The algorithm has an option to emit an ordered rowid stream. 
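As an illustration only, the counting algorithm above can be sketched in
Python. This is not server code: a plain dict stands in for the temporary
table, the function name intersect_via_counts is made up for the example, the
actual list length is used in place of the estimate E(#records), and each scan
is assumed to return any given rowid at most once.

```python
def intersect_via_counts(scans):
    """Intersect unordered rowid streams with a rowid -> count table.

    `scans` is a list of iterables of rowids (bytes here). The dict `counts`
    is a stand-in for temp_table; this is an assumption of the sketch.
    """
    if not scans:
        return []

    # Materialize the scans so that the smallest one can be picked first.
    # A real optimizer would use an estimate rather than the exact size.
    materialized = sorted((list(s) for s in scans), key=len)
    first, rest = materialized[0], materialized[1:]

    # for each record R in range_scan(i1): temp_table.insert(R.rowid, count=1)
    counts = {rowid: 1 for rowid in first}

    # for each index idx except i1: bump the counter of rowids already present
    for scan in rest:
        for rowid in scan:
            if rowid in counts:
                counts[rowid] += 1

    # Final pass: keep rowids matched by every scan, emitted in rowid order.
    wanted = len(materialized)
    return sorted(rowid for rowid, n in counts.items() if n == wanted)


scan_a = [b"\x05", b"\x01", b"\x09", b"\x03"]
scan_b = [b"\x03", b"\x09", b"\x07"]
scan_c = [b"\x09", b"\x03", b"\x02", b"\x05"]
print(intersect_via_counts([scan_a, scan_b, scan_c]))
```

Sorting in the final pass corresponds to the "ordered scan" option of the last
loop; dropping the sort gives the unordered variant.
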
In the above form, the cost to produce the first record is high. It's easy to
adjust the algorithm to make it low - we'll need to just start scanning all
indexes at once, and finish as soon as we got a full match, i.e. the

  temptable[R].count++

operation resulted in the counter being equal to the number of merged scans.

1.1.1 Improvement
~~~~~~~~~~~~~~~~~
When running INNER-LOOP, we could count how many times we've done the
"count++" operation. If it has been done #records-in-temptable times, that
means that all further records will not have matches and we can finish the
scan, i.e. break out of the INNER-LOOP.

1.2 Produce/merge sorted streams
--------------------------------
For each of the merged scan, use filesort-like action to end up with an
ordered stream of rowids. Then merge the ordered streams.

By filesort-like action we mean
 - Run over index, collect rowids in a buffer.
 - When the buffer is full, sort it and dump into a temporary file.
After the above we'll end up with a number of sorted buffers on disk. We can
use mergebuff() function (it is part of filesort's functions) to produce one
ordered sequence (i.e. array, which may be partially on disk) of rowids.

Merging of ordered streams with help of priority queue is already implemented
in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the

  child_quick->get_next()

call with a call to read rowid from an ordered sequence.

1.3 Extend Unique class to handle intersection
----------------------------------------------
There is no point to use Unique object as a device that accumulates rowids of
a single scan then produces them in sorted order. One could do the same faster
with accumulating an array of rowids and then sorting it.

It's possible to use Unique object to collect/merge data from all scans though.
The idea is as follows:

- Unique should store <rowid, n_scans> pairs
- Duplicates are pairs with the same rowid
- Unique should try to avoid creating duplicates:
  - don't add a duplicate into the in-memory part, instead combine two elements
    together by adding their n_scans elements.
  - combine duplicates when it sees them in Unique.get() call
- The data we get from Unique.get() should be filtered, all records that have
  n_scans != number_of_scans_being_merged should be discarded.

If we're lucky to have started and finished a scan on some index (denote it
as S) without flushing the Unique in the process, then:
- there is no point in adding any new records into the Unique because their
  absence in the Unique means that they don't have match in S and hence will
  not get into the result of intersection.
- we need to only update the counters to be able to tell if the elements that
  are already in the Unique will have matches in all scans.

1.4 Strategies that do not seem to be useful
--------------------------------------------

keeping them here so we don't consider them over and over

1.4.1 Remove matches after having produced an ordered stream
------------------------------------------------------------
We can dump everything into a rowid stream and get it sorted. Then we read it,
and if we see a rowid repeated $n_merged_scans times, it belongs to the
intersection (pass to output), otherwise it doesn't (skip).
This doesn't have any advantages over the produce/merge sorted streams
approach.

1.4.2 Sparse rowid bitmaps
--------------------------
Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
bitmaps assume there will always be enough memory to accommodate them.

PostgreSQL makes bitmaps "loose" when they exceed certain size by remembering
disk pages, not ids of individual records. It's hard for us to do something
similar because our rowids are opaque entities whose meaning depends on the
storage engines.

This seems to require too much change to be worth it.

2. Optimization
===============

SEL_TREE objects already represent intersections. The problems with
optimizations are:

- Cost formula(s)
- When N keys/conditions are present:

  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  somehow avoid considering (2^n - n) possible options.

- Avoid producing (or even considering) apparently suboptimal plans:
  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
    a subset of columns covered by all other indexes.
  = (TODO any other rules?)

- Correlation across selectivities. If there is a condition

  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  can we consider satisfaction of AND-parts to be independent?


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] Updated (by Monty): index_merge: non-ROR intersection (21)
by worklog-noreply＠askmonty.org 03 Jun '09

03 Jun '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: non-ROR intersection
CREATION DATE..: Thu, 21 May 2009, 21:32
SUPERVISOR.....: Knielsen
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 21 (http://askmonty.org/worklog/?tid=21)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Monty - Wed, 03 Jun 2009, 01:05)=-=-
High Level Description modified.
--- /tmp/wklog.21.old.29638 2009-06-03 01:05:01.000000000 +0300
+++ /tmp/wklog.21.new.29638 2009-06-03 01:05:01.000000000 +0300
@@ -3,5 +3,15 @@
 constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2 AND
 ... ) and the equalities should cover all index components.
 
+For example, assuming that key1 has 2 parts and key2 has 1 part.
+
+The current optimization works with:
+
+WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3
+
+but not with:
+
+WHERE key1_part1=1 OR key2_part1=3
+
 This WL entry is to lift this limitation by developing algorithms that do
-intersection on non-ROR scans.
+intersection on non-ROR (rowid ordered retrieval) scans.

-=-=(Guest - Tue, 26 May 2009, 14:04)=-=-
High-Level Specification modified.
--- /tmp/wklog.21.old.1802 2009-05-26 14:04:57.000000000 +0300
+++ /tmp/wklog.21.new.1802 2009-05-26 14:04:57.000000000 +0300
@@ -1,4 +1,3 @@
-
 <contents>
 1. Execution
 1.1 Temptable
@@ -30,6 +29,8 @@
 
 1.1 Temptable
 -------------
+[ This is our strategy of choice at the moment]
+
 Use a temporary heap-grow-out-to-myisam table with a primary key:
 
 create table temp_table (
@@ -168,3 +169,8 @@
    a subset of columns covered by all other indexes.
  = (TODO any other rules?)
 
+- Correlation across selectivities. If there is a condition
+
+  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",
+
+  can we consider satisfaction of AND-parts to be independent?

-=-=(Psergey - Thu, 21 May 2009, 21:33)=-=-
High-Level Specification modified.
--- /tmp/wklog.21.old.25705 2009-05-21 21:33:02.000000000 +0300
+++ /tmp/wklog.21.new.25705 2009-05-21 21:33:02.000000000 +0300
@@ -1 +1,170 @@
 
+<contents>
+1. Execution
+1.1 Temptable
+1.1.1 Improvement
+1.2 Produce/merge sorted streams
+1.3 Extend Unique class to handle intersection
+1.4 Strategies that do not seem to be useful
+1.4.1 Remove matches after having produced an ordered stream
+1.4.2 Sparse rowid bitmaps
+2. Optimization
+
+</contents>
+
+1. Execution
+============
+
+The primary task is to find means to compute an intersection of N unordered
+streams. Besides general memory/cpu cost of computation, we consider:
+
+- whether the produced rowid stream is ordered. If it is, it can be piped
+  into index_merge/intersect (as opposed to sort-intersect)
+
+- whether the strategy can take advantage of the fact that some input streams
+  are already rowid-ordered
+
+- startup cost (cost of producing the first output record)
+
+We see the following possible strategies:
+
+1.1 Temptable
+-------------
+Use a temporary heap-grow-out-to-myisam table with a primary key:
+
+create table temp_table (
+  rowid binary($rowid_size),
+  count n,
+  primary key(rowid);
+);
+
+Then use this algorithm:
+
+  i1= {index with the least E(#records)};
+
+  for each record R in range_scan(i1)
+    temp_table.insert(R.rowid, count=1);
+
+  for each index idx except i1
+  {
+    for each R record in scan(idx) // (INNER-LOOP)
+    {
+      if (temp_table has R)
+        temptable[R].count++;
+    }
+  }
+
+  // The following loop can do ordered or unordered scan
+  // if we want it to be ordered scan, we probably better arrange so that
+  // 'count' column is part of the index.
+  for each record R in temp_table
+  {
+    if (R.count == number_of_streams)
+      emit(R.rowid);
+  }
+
+The algorithm has an option to emit an ordered rowid stream.
+
+In the above form, the cost to produce the first record is high. It's easy to
+adjust the algorithm to make it low - we'll need to just start scanning all
+indexes at once, and finish as soon as we got a full match, i.e. the
+
+  temptable[R].count++
+
+operation resulted in the counter being equal to the number of merged scans.
+
+1.1.1 Improvement
+~~~~~~~~~~~~~~~~~
+When running INNER-LOOP, we could count how many times we've done the
+"count++" operation. If it has been done #records-in-temptable times, that
+means that all further records will not have matches and we can finish the
+scan, i.e. break out of the INNER-LOOP.
+
+1.2 Produce/merge sorted streams
+--------------------------------
+For each of the merged scan, use filesort-like action to end up with an
+ordered stream of rowids. Then merge the ordered streams.
+
+By filesort-like action we mean
+ - Run over index, collect rowids in a buffer.
+ - When the buffer is full, sort it and dump into a temporary file.
+After the above we'll end up with a number of sorted buffers on disk. We can
+use mergebuff() function (it is part of filesort's functions) to produce one
+ordered sequence (i.e. array, which may be partially on disk) of rowids.
+
+Merging of ordered streams with help of priority queue is already implemented
+in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the
+
+  child_quick->get_next()
+
+call with a call to read rowid from an ordered sequence.
+
+1.3 Extend Unique class to handle intersection
+----------------------------------------------
+There is no point to use Unique object as a device that accumulates rowids of
+a single scan then produces them in sorted order. One could do the same faster
+with accumulating an array of rowids and then sorting it.
+
+It's possible to use Unique object to collect/merge data from all scans though.
+The idea is as follows:
+
+- Unique should store <rowid, n_scans> pairs
+- Duplicates are pairs with the same rowid
+- Unique should try to avoid creating duplicates:
+  - don't add a duplicate into the in-memory part, instead combine two elements
+    together by adding their n_scans elements.
+  - combine duplicates when it sees them in Unique.get() call
+- The data we get from Unique.get() should be filtered, all records that have
+  n_scans != number_of_scans_being_merged should be discarded.
+
+If we're lucky to have started and finished a scan on some index (denote it
+as S) without flushing the Unique in the process, then:
+- there is no point in adding any new records into the Unique because their
+  absence in the Unique means that they don't have match in S and hence will
+  not get into the result of intersection.
+- we need to only update the counters to be able to tell if the elements that
+  are already in the Unique will have matches in all scans.
+
+1.4 Strategies that do not seem to be useful
+--------------------------------------------
+
+keeping them here so we don't consider them over and over
+
+1.4.1 Remove matches after having produced an ordered stream
+------------------------------------------------------------
+We can dump everything into a rowid stream and get it sorted. Then we read it,
+and if we see a rowid repeated $n_merged_scans times, it belongs to the
+intersection (pass to output), otherwise it doesn't (skip).
+This doesn't have any advantages over the produce/merge sorted streams
+approach.
+
+1.4.2 Sparse rowid bitmaps
+--------------------------
+Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
+bitmaps assume there will always be enough memory to accommodate them.
+
+PostgreSQL makes bitmaps "loose" when they exceed certain size by remembering
+disk pages, not ids of individual records. It's hard for us to do something
+similar because our rowids are opaque entities whose meaning depends on the
+storage engines.
+
+This seems to require too much change to be worth it.
+
+2. Optimization
+===============
+
+SEL_TREE objects already represent intersections. The problems with
+optimizations are:
+
+- Cost formula(s)
+- When N keys/conditions are present:
+
+  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",
+
+  somehow avoid considering (2^n - n) possible options.
+
+- Avoid producing (or even considering) apparently suboptimal plans:
+  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
+    a subset of columns covered by all other indexes.
+  = (TODO any other rules?)
+

DESCRIPTION:

At the moment index_merge supports intersection only for rowid-ordered streams.
This translates into a limitation that index_merge/intersect can only be
constructed for equality conditions (t.keypart1=const1 AND t.keypart2=const2 AND
... ) and the equalities should cover all index components.

For example, assuming that key1 has 2 parts and key2 has 1 part.

The current optimization works with:

WHERE key1_part1=1 AND key1_part2=2 OR key2_part1=3

but not with:

WHERE key1_part1=1 OR key2_part1=3

This WL entry is to lift this limitation by developing algorithms that do
intersection on non-ROR (rowid ordered retrieval) scans.


HIGH-LEVEL SPECIFICATION:

<contents>
1. Execution
1.1 Temptable
1.1.1 Improvement
1.2 Produce/merge sorted streams
1.3 Extend Unique class to handle intersection
1.4 Strategies that do not seem to be useful
1.4.1 Remove matches after having produced an ordered stream
1.4.2 Sparse rowid bitmaps
2. Optimization

</contents>

1. Execution
============

The primary task is to find means to compute an intersection of N unordered
streams. Besides general memory/cpu cost of computation, we consider:

- whether the produced rowid stream is ordered. If it is, it can be piped
  into index_merge/intersect (as opposed to sort-intersect)

- whether the strategy can take advantage of the fact that some input streams
  are already rowid-ordered

- startup cost (cost of producing the first output record)

We see the following possible strategies:

1.1 Temptable
-------------
[ This is our strategy of choice at the moment]

Use a temporary heap-grow-out-to-myisam table with a primary key:

create table temp_table (
  rowid binary($rowid_size),
  count n,
  primary key(rowid);
);

Then use this algorithm:

  i1= {index with the least E(#records)};

  for each record R in range_scan(i1)
    temp_table.insert(R.rowid, count=1);

  for each index idx except i1
  {
    for each R record in scan(idx) // (INNER-LOOP)
    {
      if (temp_table has R)
        temptable[R].count++;
    }
  }

  // The following loop can do ordered or unordered scan
  // if we want it to be ordered scan, we probably better arrange so that
  // 'count' column is part of the index.
  for each record R in temp_table
  {
    if (R.count == number_of_streams)
      emit(R.rowid);
  }

The algorithm has an option to emit an ordered rowid stream.

In the above form, the cost to produce the first record is high. It's easy to
adjust the algorithm to make it low - we'll need to just start scanning all
indexes at once, and finish as soon as we got a full match, i.e. the

  temptable[R].count++

operation resulted in the counter being equal to the number of merged scans.

1.1.1 Improvement
~~~~~~~~~~~~~~~~~
When running INNER-LOOP, we could count how many times we've done the
"count++" operation. If it has been done #records-in-temptable times, that
means that all further records will not have matches and we can finish the
scan, i.e. break out of the INNER-LOOP.

1.2 Produce/merge sorted streams
--------------------------------
For each of the merged scan, use filesort-like action to end up with an
ordered stream of rowids. Then merge the ordered streams.

By filesort-like action we mean
 - Run over index, collect rowids in a buffer.
 - When the buffer is full, sort it and dump into a temporary file.
After the above we'll end up with a number of sorted buffers on disk. We can
use mergebuff() function (it is part of filesort's functions) to produce one
ordered sequence (i.e. array, which may be partially on disk) of rowids.

Merging of ordered streams with help of priority queue is already implemented
in QUICK_ROR_INTERSECT_SELECT. We'll need to substitute the

  child_quick->get_next()

call with a call to read rowid from an ordered sequence.

1.3 Extend Unique class to handle intersection
----------------------------------------------
There is no point to use Unique object as a device that accumulates rowids of
a single scan then produces them in sorted order. One could do the same faster
with accumulating an array of rowids and then sorting it.

It's possible to use Unique object to collect/merge data from all scans though.
The idea is as follows:

- Unique should store <rowid, n_scans> pairs
- Duplicates are pairs with the same rowid
- Unique should try to avoid creating duplicates:
  - don't add a duplicate into the in-memory part, instead combine two elements
    together by adding their n_scans elements.
  - combine duplicates when it sees them in Unique.get() call
- The data we get from Unique.get() should be filtered, all records that have
  n_scans != number_of_scans_being_merged should be discarded.

If we're lucky to have started and finished a scan on some index (denote it
as S) without flushing the Unique in the process, then:
- there is no point in adding any new records into the Unique because their
  absence in the Unique means that they don't have match in S and hence will
  not get into the result of intersection.
- we need to only update the counters to be able to tell if the elements that
  are already in the Unique will have matches in all scans.

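A toy Python sketch of the <rowid, n_scans> idea from section 1.3 follows. It
is not the server's Unique class: the class name CountingUnique, the
max_in_memory limit and the use of in-memory lists in place of temporary files
are all assumptions made for the illustration, and each scan is assumed to
return any given rowid at most once. The shortcut for a scan that completes
without a flush is not modelled.

```python
import heapq
from itertools import groupby
from operator import itemgetter


class CountingUnique:
    """Hypothetical Unique-like container of <rowid, n_scans> pairs."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = {}   # in-memory part: rowid -> n_scans
        self.runs = []     # flushed sorted runs (lists stand in for temp files)

    def put(self, rowid):
        # Combine with an in-memory duplicate instead of adding a new element.
        if rowid in self.memory:
            self.memory[rowid] += 1
            return
        if len(self.memory) >= self.max_in_memory:
            self.flush()
        self.memory[rowid] = 1

    def flush(self):
        # Sort the in-memory part and dump it as one ordered run.
        if self.memory:
            self.runs.append(sorted(self.memory.items()))
            self.memory = {}

    def get(self, n_scans_merged):
        """Yield, in rowid order, the rowids that were seen in every scan."""
        self.flush()
        merged = heapq.merge(*self.runs)  # ordered stream of (rowid, n_scans)
        for rowid, pairs in groupby(merged, key=itemgetter(0)):
            # Combine duplicates from different runs, then filter on n_scans.
            if sum(n for _, n in pairs) == n_scans_merged:
                yield rowid


def intersect(scans, max_in_memory=4):
    unique = CountingUnique(max_in_memory)
    for scan in scans:
        for rowid in scan:
            unique.put(rowid)
    return list(unique.get(len(scans)))
```

With a small max_in_memory the same rowid ends up in several runs and is only
combined inside get(); with a large one all combining happens in memory.
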
1.4 Strategies that do not seem to be useful
--------------------------------------------

keeping them here so we don't consider them over and over

1.4.1 Remove matches after having produced an ordered stream
------------------------------------------------------------
We can dump everything into a rowid stream and get it sorted. Then we read it,
and if we see a rowid repeated $n_merged_scans times, it belongs to the
intersection (pass to output), otherwise it doesn't (skip).
This doesn't have any advantages over the produce/merge sorted streams
approach.

1.4.2 Sparse rowid bitmaps
--------------------------
Use Falcon-style rowid bitmaps. The problem with that is that Falcon's
bitmaps assume there will always be enough memory to accommodate them.

PostgreSQL makes bitmaps "loose" when they exceed certain size by remembering
disk pages, not ids of individual records. It's hard for us to do something
similar because our rowids are opaque entities whose meaning depends on the
storage engines.

This seems to require too much change to be worth it.

2. Optimization
===============

SEL_TREE objects already represent intersections. The problems with
optimizations are:

- Cost formula(s)
- When N keys/conditions are present:

  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  somehow avoid considering (2^n - n) possible options.

- Avoid producing (or even considering) apparently suboptimal plans:
  = Don't generate a merge of indexes (I_1, ... I_n) where columns of I_n are
    a subset of columns covered by all other indexes.
  = (TODO any other rules?)

- Correlation across selectivities. If there is a condition

  "cond(key1) AND cond(key2) AND ... AND cond(keyN)",

  can we consider satisfaction of AND-parts to be independent?


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
