July 2009 - developers - lists.mariadb.org

[Maria-developers] New (by Psergey): Add support for google protocol buffers (34)
by worklog-noreply＠askmonty.org 21 Jul '09

21 Jul '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Add support for google protocol buffers CREATION DATE..: Tue, 21 Jul 2009, 21:11 SUPERVISOR.....: Monty IMPLEMENTOR....: COPIES TO......: CATEGORY.......: Server-Sprint TASK ID........: 34 (http://askmonty.org/worklog/?tid=34) VERSION........: WorkLog-3.4 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: DESCRIPTION: Add support for Google Protocol Buffers (further GPB). It should be possible to have columns that store GPB-encoded data, as well as use SQL constructs to extract parts of GPB data for use in select list, for filtering, and so forth. Any support for indexing GPB data is outside of scope of this WL entry. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Progress (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 20 Jul '09

20 Jul '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.....: Monty IMPLEMENTOR....: Psergey COPIES TO......: CATEGORY.......: Client-BackLog TASK ID........: 17 (http://askmonty.org/worklog/?tid=17) VERSION........: Server-9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 1 ESTIMATE.......: 3 (hours remain) ORIG. ESTIMATE.: 3 PROGRESS NOTES: -=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=- deukje weg Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours). -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300 +++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300 @@ -1 +1 @@ -9.x +Server-9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-5.1 +9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Category updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-Sprint +Client-BackLog -=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=- Low Level Design modified. --- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300 +++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300 @@ -158,3 +158,43 @@ from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. +* What is described above will not be able to eliminate this outer join + create unique index idx on tableB (id, fromDate); + ... + left outer join + tableB B + on + B.id = A.id + and + B.fromDate = (select max(sub.fromDate) + from tableB sub where sub.id = A.id); + + This is because condition "B.fromDate= func(tableB)" cannot be used. + Reason#1: update_ref_and_keys() does not consider such conditions to + be of any use (and indeed they are not usable for ref access) + so they are not put into KEYUSE array. + Reason#2: even if they were put there, we would need to be able to tell + between predicates like + B.fromDate= func(B.id) // guarantees only one matching row as + // B.id is already bound by B.id=A.id + // hence B.fromDate becomes bound too. + and + "B.fromDate= func(B.*)" // Can potentially have many matching + // records. + We need to + - Have update_ref_and_keys() create KEYUSE elements for such equalities + - Have eliminate_tables() and friends make a more accurate check. + The right check is to check whether all parts of a unique key are bound. + If we have keypartX to be bound, then t.keypartY=func(keypartX) makes + keypartY to be bound. + The difficulty here is that correlated subquery predicate cannot tell what + columns it depends on (it only remembers tables). + Traversing the predicate is expensive and complicated. + We're leaning towards making each subquery predicate have a List<Item> with + items that + - are in the current select + - and it depends on. + This list will be useful in certain other subquery optimizations as well, + it is cheap to collect it in fix_fields() phase, so it will be collected + for every subquery predicate. + -=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300 +++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300 @@ -89,14 +89,14 @@ - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. +then compare run times and make a conclusion about whether dbms supports table +elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ -- Re-check how this works with equality propagation. - - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have @@ -141,8 +141,13 @@ 7. Additional issues -------------------- -* We remove ON clauses within semi-join nests. If these clauses contain +* We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? + Yes. Current approach: when removing an outer join nest, walk the ON clause + and mark subselects as eliminated. Then let EXPLAIN code check if the + SELECT was eliminated before the printing (EXPLAIN is generated by doing + a recursive descent, so the check will also cause children of eliminated + selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300 +++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300 @@ -83,9 +83,12 @@ 5. Tests and benchmarks ----------------------- -Should create a benchmark in sql-bench which checks if the dbms has table +Create a benchmark in sql-bench which checks if the DBMS has table elimination. -TODO elaborate +[According to Monty] Run + - queries that would use elimination + - queries that are very similar to one above (so that they would have same + QEP, execution cost, etc) but cannot use table elimination. 6. Todo, issues to resolve -------------------------- @@ -109,33 +112,37 @@ 6.2 Resolved ~~~~~~~~~~~~ -- outer->inner join conversion is not a problem for table elimination. +* outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. -7. Additional issues --------------------- -* We remove ON clauses within semi-join nests. If these clauses contain - subqueries, they probably should be gone from EXPLAIN output also? +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -* Aggregate functions report they depend on all tables, that is, +* Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 - always. If we want table elimination to work in presence of grouping, need - to devise some other way of analyzing aggregate functions. + always. Fixed it, now aggregate function reports it depends on + tables that its arguments depend on. In particular, COUNT(*) reports + that it depends on no tables (item_count_star->used_tables()==0). + One consequence of that is that "item->used_tables()==0" is not + equivalent to "item->const_item()==true" anymore (not sure if it's + "anymore" or this has been already happening). + +* EXPLAIN EXTENDED warning text was generated after the JOIN object has + been discarded. This didn't allow to use information about join plan + when printing the warning. Fixed this by keeping the JOIN objects until + we've printed the warning (have also an intent to remove the const + tables from the join output). -* Should eliminated tables be shown in EXPLAIN EXTENDED? - - If we just ignore the question, they will be shown - - this is what happens for constant tables, too. - - I don't see how showing them could be of any use. They only make it - harder to read the rewritten query. - It turns out that - - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement - lifetime) changes. - - it is hard to have it show per-execution data. This is because the warning - text is generated after the execution structures have been destroyed. +7. Additional issues +-------------------- +* We remove ON clauses within semi-join nests. If these clauses contain + subqueries, they probably should be gone from EXPLAIN output also? * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from @@ -143,8 +150,6 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + from user/EXPLAIN point of view: no. constant table is the one that we read + one record from. eliminated table is the one that we don't acccess at all. -* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - - affected tables must not be eliminated - - tables that are used on the right side of the SET x=y assignments must - not be eliminated either. -=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=- Dependency deleted: 29 no longer depends on 17 -=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=- Low Level Design modified. --- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300 +++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300 @@ -131,6 +131,11 @@ - this is what happens for constant tables, too. - I don't see how showing them could be of any use. They only make it harder to read the rewritten query. + It turns out that + - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement + lifetime) changes. + - it is hard to have it show per-execution data. This is because the warning + text is generated after the execution structures have been destroyed. * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=- Low Level Design modified. --- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300 +++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300 @@ -1,3 +1,6 @@ +The code (currently in development) is at lp: +~maria-captains/maria/maria-5.1-table-elimination tree. + <contents> 1. Conditions for removal 1.1 Quick check if there are candidates ------------------------------------------------------------ -=-=(View All Progress Notes, 23 total)=-=- http://askmonty.org/worklog/index.pl?tid=17&nolimit=1 DESCRIPTION: Eliminate not needed tables from SELECT queries.. This will speed up some views and automatically generated queries. Example: CREATE TABLE B (id int primary key); select A.colA from tableA A left outer join tableB B on B.id = A.id; In this case we can remove table B and the join from the query. HIGH-LEVEL SPECIFICATION: Here is an extended explanation of table elimination. Table elimination is a feature found in some modern query optimizers, of which Microsoft SQL Server 2005/2008 seems to have the most advanced implementation. Oracle 11g has also been confirmed to use table elimination but not to the same extent. Basically, what table elimination does, is to remove tables from the execution plan when it is unnecessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query: select A.colA from tableA A left outer join tableB B on B.id = A.id; When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met the selected column (A.colA) will still contain it's original value. The not seen B.* row would contain all NULL:s. However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution. Instead, let's say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan. Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails. select A.colA from tableA A left outer join tableB B on B.id = A.id and B.fromDate = ( select max(sub.fromDate) from tableB sub where sub.id = A.id ); In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate). Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan. SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data. LOW-LEVEL DESIGN: The code (currently in development) is at lp: ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> 1. Conditions for removal 1.1 Quick check if there are candidates 2. Removal operation properties 3. Removal operation 4. User interface 5. Tests and benchmarks 6. Todo, issues to resolve 6.1 To resolve 6.2 Resolved 7. Additional issues </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. 1. Conditions for removal ------------------------- We can eliminate an inner side of outer join if: 1. For each record combination of outer tables, it will always produce exactly one record. 2. There are no references to columns of the inner tables anywhere else in the query. #1 means that every table inside the outer join nest is: - is a constant table: = because it can be accessed via eq_ref(const) access, or = it is a zero-rows or one-row MyISAM-like table [MARK1] - has an eq_ref access method candidate. #2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer join nest. 1.1 Quick check if there are candidates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Before we start to enumerate join nests, here is a quick way to check if there *can be* something to be removed: if ((tables used in select_list | tables used in group/order by UNION | tables used in where) != bitmap_of_all_tables) { attempt table elimination; } 2. Removal operation properties ------------------------------- * There is always one way to remove (no choice to remove either this or that) * It is always better to remove as much tables as possible (at least within our cost model). Thus, no need for any cost calculations/etc. It's an unconditional rewrite. 3. Removal operation -------------------- * Remove the outer join nest's nested join structure (i.e. get the outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding, $OJ->embedding->nested_join. Update table_map's of all ancestor nested joins). [MARK2] * Move the tables and their JOIN_TABs to front like it is done with const tables, with exception that if eliminated outer join nest was within another outer join nest, that shouldn't prevent us from moving away the eliminated tables. * Update join->table_count and all-join-tables bitmap. * That's it. Nothing else? 4. User interface ----------------- * We'll add an @@optimizer switch flag for table elimination. Tentative name: 'table_elimination'. (Note ^^ utility of the above questioned ^, as table elimination can never be worse than no elimination. We're leaning towards not adding the flag) * EXPLAIN will not show the removed tables at all. This will allow to check if tables were removed, and also will behave nicely with anchor model and VIEWs: stuff that user doesn't care about just won't be there. 5. Tests and benchmarks ----------------------- Create a benchmark in sql-bench which checks if the DBMS has table elimination. [According to Monty] Run - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. then compare run times and make a conclusion about whether dbms supports table elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. I'm leaning towards doing the former. With anchor modeling, it is unlikely that we'll meet outer joins which have N inner tables of which some are 1-row MyISAM tables that do not have primary key. 6.2 Resolved ~~~~~~~~~~~~ * outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. * For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - affected tables must not be eliminated - tables that are used on the right side of the SET x=y assignments must not be eliminated either. * Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 always. Fixed it, now aggregate function reports it depends on tables that its arguments depend on. In particular, COUNT(*) reports that it depends on no tables (item_count_star->used_tables()==0). One consequence of that is that "item->used_tables()==0" is not equivalent to "item->const_item()==true" anymore (not sure if it's "anymore" or this has been already happening). * EXPLAIN EXTENDED warning text was generated after the JOIN object has been discarded. This didn't allow to use information about join plan when printing the warning. Fixed this by keeping the JOIN objects until we've printed the warning (have also an intent to remove the const tables from the join output). 7. Additional issues -------------------- * We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? Yes. Current approach: when removing an outer join nest, walk the ON clause and mark subselects as eliminated. Then let EXPLAIN code check if the SELECT was eliminated before the printing (EXPLAIN is generated by doing a recursive descent, so the check will also cause children of eliminated selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. * What is described above will not be able to eliminate this outer join create unique index idx on tableB (id, fromDate); ... left outer join tableB B on B.id = A.id and B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); This is because condition "B.fromDate= func(tableB)" cannot be used. Reason#1: update_ref_and_keys() does not consider such conditions to be of any use (and indeed they are not usable for ref access) so they are not put into KEYUSE array. Reason#2: even if they were put there, we would need to be able to tell between predicates like B.fromDate= func(B.id) // guarantees only one matching row as // B.id is already bound by B.id=A.id // hence B.fromDate becomes bound too. and "B.fromDate= func(B.*)" // Can potentially have many matching // records. We need to - Have update_ref_and_keys() create KEYUSE elements for such equalities - Have eliminate_tables() and friends make a more accurate check. The right check is to check whether all parts of a unique key are bound. If we have keypartX to be bound, then t.keypartY=func(keypartX) makes keypartY to be bound. The difficulty here is that correlated subquery predicate cannot tell what columns it depends on (it only remembers tables). Traversing the predicate is expensive and complicated. We're leaning towards making each subquery predicate have a List<Item> with items that - are in the current select - and it depends on. This list will be useful in certain other subquery optimizations as well, it is cheap to collect it in fix_fields() phase, so it will be collected for every subquery predicate. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Progress (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 20 Jul '09

20 Jul '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.....: Monty IMPLEMENTOR....: Psergey COPIES TO......: CATEGORY.......: Client-BackLog TASK ID........: 17 (http://askmonty.org/worklog/?tid=17) VERSION........: Server-9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 1 ESTIMATE.......: 3 (hours remain) ORIG. ESTIMATE.: 3 PROGRESS NOTES: -=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=- deukje weg Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours). -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300 +++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300 @@ -1 +1 @@ -9.x +Server-9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-5.1 +9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Category updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-Sprint +Client-BackLog -=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=- Low Level Design modified. --- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300 +++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300 @@ -158,3 +158,43 @@ from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. +* What is described above will not be able to eliminate this outer join + create unique index idx on tableB (id, fromDate); + ... + left outer join + tableB B + on + B.id = A.id + and + B.fromDate = (select max(sub.fromDate) + from tableB sub where sub.id = A.id); + + This is because condition "B.fromDate= func(tableB)" cannot be used. + Reason#1: update_ref_and_keys() does not consider such conditions to + be of any use (and indeed they are not usable for ref access) + so they are not put into KEYUSE array. + Reason#2: even if they were put there, we would need to be able to tell + between predicates like + B.fromDate= func(B.id) // guarantees only one matching row as + // B.id is already bound by B.id=A.id + // hence B.fromDate becomes bound too. + and + "B.fromDate= func(B.*)" // Can potentially have many matching + // records. + We need to + - Have update_ref_and_keys() create KEYUSE elements for such equalities + - Have eliminate_tables() and friends make a more accurate check. + The right check is to check whether all parts of a unique key are bound. + If we have keypartX to be bound, then t.keypartY=func(keypartX) makes + keypartY to be bound. + The difficulty here is that correlated subquery predicate cannot tell what + columns it depends on (it only remembers tables). + Traversing the predicate is expensive and complicated. + We're leaning towards making each subquery predicate have a List<Item> with + items that + - are in the current select + - and it depends on. + This list will be useful in certain other subquery optimizations as well, + it is cheap to collect it in fix_fields() phase, so it will be collected + for every subquery predicate. + -=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300 +++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300 @@ -89,14 +89,14 @@ - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. +then compare run times and make a conclusion about whether dbms supports table +elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ -- Re-check how this works with equality propagation. - - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have @@ -141,8 +141,13 @@ 7. Additional issues -------------------- -* We remove ON clauses within semi-join nests. If these clauses contain +* We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? + Yes. Current approach: when removing an outer join nest, walk the ON clause + and mark subselects as eliminated. Then let EXPLAIN code check if the + SELECT was eliminated before the printing (EXPLAIN is generated by doing + a recursive descent, so the check will also cause children of eliminated + selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300 +++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300 @@ -83,9 +83,12 @@ 5. Tests and benchmarks ----------------------- -Should create a benchmark in sql-bench which checks if the dbms has table +Create a benchmark in sql-bench which checks if the DBMS has table elimination. -TODO elaborate +[According to Monty] Run + - queries that would use elimination + - queries that are very similar to one above (so that they would have same + QEP, execution cost, etc) but cannot use table elimination. 6. Todo, issues to resolve -------------------------- @@ -109,33 +112,37 @@ 6.2 Resolved ~~~~~~~~~~~~ -- outer->inner join conversion is not a problem for table elimination. +* outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. -7. Additional issues --------------------- -* We remove ON clauses within semi-join nests. If these clauses contain - subqueries, they probably should be gone from EXPLAIN output also? +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -* Aggregate functions report they depend on all tables, that is, +* Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 - always. If we want table elimination to work in presence of grouping, need - to devise some other way of analyzing aggregate functions. + always. Fixed it, now aggregate function reports it depends on + tables that its arguments depend on. In particular, COUNT(*) reports + that it depends on no tables (item_count_star->used_tables()==0). + One consequence of that is that "item->used_tables()==0" is not + equivalent to "item->const_item()==true" anymore (not sure if it's + "anymore" or this has been already happening). + +* EXPLAIN EXTENDED warning text was generated after the JOIN object has + been discarded. This didn't allow to use information about join plan + when printing the warning. Fixed this by keeping the JOIN objects until + we've printed the warning (have also an intent to remove the const + tables from the join output). -* Should eliminated tables be shown in EXPLAIN EXTENDED? - - If we just ignore the question, they will be shown - - this is what happens for constant tables, too. - - I don't see how showing them could be of any use. They only make it - harder to read the rewritten query. - It turns out that - - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement - lifetime) changes. - - it is hard to have it show per-execution data. This is because the warning - text is generated after the execution structures have been destroyed. +7. Additional issues +-------------------- +* We remove ON clauses within semi-join nests. If these clauses contain + subqueries, they probably should be gone from EXPLAIN output also? * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from @@ -143,8 +150,6 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + from user/EXPLAIN point of view: no. constant table is the one that we read + one record from. eliminated table is the one that we don't acccess at all. -* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - - affected tables must not be eliminated - - tables that are used on the right side of the SET x=y assignments must - not be eliminated either. -=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=- Dependency deleted: 29 no longer depends on 17 -=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=- Low Level Design modified. --- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300 +++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300 @@ -131,6 +131,11 @@ - this is what happens for constant tables, too. - I don't see how showing them could be of any use. They only make it harder to read the rewritten query. + It turns out that + - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement + lifetime) changes. + - it is hard to have it show per-execution data. This is because the warning + text is generated after the execution structures have been destroyed. * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=- Low Level Design modified. --- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300 +++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300 @@ -1,3 +1,6 @@ +The code (currently in development) is at lp: +~maria-captains/maria/maria-5.1-table-elimination tree. + <contents> 1. Conditions for removal 1.1 Quick check if there are candidates ------------------------------------------------------------ -=-=(View All Progress Notes, 23 total)=-=- http://askmonty.org/worklog/index.pl?tid=17&nolimit=1 DESCRIPTION: Eliminate not needed tables from SELECT queries.. This will speed up some views and automatically generated queries. Example: CREATE TABLE B (id int primary key); select A.colA from tableA A left outer join tableB B on B.id = A.id; In this case we can remove table B and the join from the query. HIGH-LEVEL SPECIFICATION: Here is an extended explanation of table elimination. Table elimination is a feature found in some modern query optimizers, of which Microsoft SQL Server 2005/2008 seems to have the most advanced implementation. Oracle 11g has also been confirmed to use table elimination but not to the same extent. Basically, what table elimination does, is to remove tables from the execution plan when it is unnecessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query: select A.colA from tableA A left outer join tableB B on B.id = A.id; When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met the selected column (A.colA) will still contain it's original value. The not seen B.* row would contain all NULL:s. However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution. Instead, let's say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan. Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails. select A.colA from tableA A left outer join tableB B on B.id = A.id and B.fromDate = ( select max(sub.fromDate) from tableB sub where sub.id = A.id ); In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate). Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan. SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data. LOW-LEVEL DESIGN: The code (currently in development) is at lp: ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> 1. Conditions for removal 1.1 Quick check if there are candidates 2. Removal operation properties 3. Removal operation 4. User interface 5. Tests and benchmarks 6. Todo, issues to resolve 6.1 To resolve 6.2 Resolved 7. Additional issues </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. 1. Conditions for removal ------------------------- We can eliminate an inner side of outer join if: 1. For each record combination of outer tables, it will always produce exactly one record. 2. There are no references to columns of the inner tables anywhere else in the query. #1 means that every table inside the outer join nest is: - is a constant table: = because it can be accessed via eq_ref(const) access, or = it is a zero-rows or one-row MyISAM-like table [MARK1] - has an eq_ref access method candidate. #2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer join nest. 1.1 Quick check if there are candidates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Before we start to enumerate join nests, here is a quick way to check if there *can be* something to be removed: if ((tables used in select_list | tables used in group/order by UNION | tables used in where) != bitmap_of_all_tables) { attempt table elimination; } 2. Removal operation properties ------------------------------- * There is always one way to remove (no choice to remove either this or that) * It is always better to remove as much tables as possible (at least within our cost model). Thus, no need for any cost calculations/etc. It's an unconditional rewrite. 3. Removal operation -------------------- * Remove the outer join nest's nested join structure (i.e. get the outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding, $OJ->embedding->nested_join. Update table_map's of all ancestor nested joins). [MARK2] * Move the tables and their JOIN_TABs to front like it is done with const tables, with exception that if eliminated outer join nest was within another outer join nest, that shouldn't prevent us from moving away the eliminated tables. * Update join->table_count and all-join-tables bitmap. * That's it. Nothing else? 4. User interface ----------------- * We'll add an @@optimizer switch flag for table elimination. Tentative name: 'table_elimination'. (Note ^^ utility of the above questioned ^, as table elimination can never be worse than no elimination. We're leaning towards not adding the flag) * EXPLAIN will not show the removed tables at all. This will allow to check if tables were removed, and also will behave nicely with anchor model and VIEWs: stuff that user doesn't care about just won't be there. 5. Tests and benchmarks ----------------------- Create a benchmark in sql-bench which checks if the DBMS has table elimination. [According to Monty] Run - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. then compare run times and make a conclusion about whether dbms supports table elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. I'm leaning towards doing the former. With anchor modeling, it is unlikely that we'll meet outer joins which have N inner tables of which some are 1-row MyISAM tables that do not have primary key. 6.2 Resolved ~~~~~~~~~~~~ * outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. * For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - affected tables must not be eliminated - tables that are used on the right side of the SET x=y assignments must not be eliminated either. * Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 always. Fixed it, now aggregate function reports it depends on tables that its arguments depend on. In particular, COUNT(*) reports that it depends on no tables (item_count_star->used_tables()==0). One consequence of that is that "item->used_tables()==0" is not equivalent to "item->const_item()==true" anymore (not sure if it's "anymore" or this has been already happening). * EXPLAIN EXTENDED warning text was generated after the JOIN object has been discarded. This didn't allow to use information about join plan when printing the warning. Fixed this by keeping the JOIN objects until we've printed the warning (have also an intent to remove the const tables from the join output). 7. Additional issues -------------------- * We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? Yes. Current approach: when removing an outer join nest, walk the ON clause and mark subselects as eliminated. Then let EXPLAIN code check if the SELECT was eliminated before the printing (EXPLAIN is generated by doing a recursive descent, so the check will also cause children of eliminated selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. * What is described above will not be able to eliminate this outer join create unique index idx on tableB (id, fromDate); ... left outer join tableB B on B.id = A.id and B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); This is because condition "B.fromDate= func(tableB)" cannot be used. Reason#1: update_ref_and_keys() does not consider such conditions to be of any use (and indeed they are not usable for ref access) so they are not put into KEYUSE array. Reason#2: even if they were put there, we would need to be able to tell between predicates like B.fromDate= func(B.id) // guarantees only one matching row as // B.id is already bound by B.id=A.id // hence B.fromDate becomes bound too. and "B.fromDate= func(B.*)" // Can potentially have many matching // records. We need to - Have update_ref_and_keys() create KEYUSE elements for such equalities - Have eliminate_tables() and friends make a more accurate check. The right check is to check whether all parts of a unique key are bound. If we have keypartX to be bound, then t.keypartY=func(keypartX) makes keypartY to be bound. The difficulty here is that correlated subquery predicate cannot tell what columns it depends on (it only remembers tables). Traversing the predicate is expensive and complicated. We're leaning towards making each subquery predicate have a List<Item> with items that - are in the current select - and it depends on. This list will be useful in certain other subquery optimizations as well, it is cheap to collect it in fix_fields() phase, so it will be collected for every subquery predicate. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] TYPE= disappears again in MySQL 5.4.4
by Arjen Lentz 17 Jul '09

17 Jul '09

Hi all, Just a heads up, please don't merge in this change from MySQL 5.4.4 See http://openquery.com/blog/type-disappears-mysql-544 for background, also http://bugs.mysql.com/bug.php?id=17501 Sergei Golubchik is still assigned to that bug, btw ;-) Cheers, Arjen. -- Arjen Lentz, Director @ Open Query (http://openquery.com) Exceptional Services for MySQL at a fixed budget. Follow our blog at http://openquery.com/blog/ OurDelta: free enhanced builds for MySQL @ http://ourdelta.org

1 0

[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 16 Jul '09

16 Jul '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.....: Monty IMPLEMENTOR....: Psergey COPIES TO......: CATEGORY.......: Client-BackLog TASK ID........: 17 (http://askmonty.org/worklog/?tid=17) VERSION........: Server-9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300 +++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300 @@ -1 +1 @@ -9.x +Server-9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-5.1 +9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Category updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-Sprint +Client-BackLog -=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=- Low Level Design modified. --- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300 +++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300 @@ -158,3 +158,43 @@ from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. +* What is described above will not be able to eliminate this outer join + create unique index idx on tableB (id, fromDate); + ... + left outer join + tableB B + on + B.id = A.id + and + B.fromDate = (select max(sub.fromDate) + from tableB sub where sub.id = A.id); + + This is because condition "B.fromDate= func(tableB)" cannot be used. + Reason#1: update_ref_and_keys() does not consider such conditions to + be of any use (and indeed they are not usable for ref access) + so they are not put into KEYUSE array. + Reason#2: even if they were put there, we would need to be able to tell + between predicates like + B.fromDate= func(B.id) // guarantees only one matching row as + // B.id is already bound by B.id=A.id + // hence B.fromDate becomes bound too. + and + "B.fromDate= func(B.*)" // Can potentially have many matching + // records. + We need to + - Have update_ref_and_keys() create KEYUSE elements for such equalities + - Have eliminate_tables() and friends make a more accurate check. + The right check is to check whether all parts of a unique key are bound. + If we have keypartX to be bound, then t.keypartY=func(keypartX) makes + keypartY to be bound. + The difficulty here is that correlated subquery predicate cannot tell what + columns it depends on (it only remembers tables). + Traversing the predicate is expensive and complicated. + We're leaning towards making each subquery predicate have a List<Item> with + items that + - are in the current select + - and it depends on. + This list will be useful in certain other subquery optimizations as well, + it is cheap to collect it in fix_fields() phase, so it will be collected + for every subquery predicate. + -=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300 +++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300 @@ -89,14 +89,14 @@ - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. +then compare run times and make a conclusion about whether dbms supports table +elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ -- Re-check how this works with equality propagation. - - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have @@ -141,8 +141,13 @@ 7. Additional issues -------------------- -* We remove ON clauses within semi-join nests. If these clauses contain +* We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? + Yes. Current approach: when removing an outer join nest, walk the ON clause + and mark subselects as eliminated. Then let EXPLAIN code check if the + SELECT was eliminated before the printing (EXPLAIN is generated by doing + a recursive descent, so the check will also cause children of eliminated + selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300 +++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300 @@ -83,9 +83,12 @@ 5. Tests and benchmarks ----------------------- -Should create a benchmark in sql-bench which checks if the dbms has table +Create a benchmark in sql-bench which checks if the DBMS has table elimination. -TODO elaborate +[According to Monty] Run + - queries that would use elimination + - queries that are very similar to one above (so that they would have same + QEP, execution cost, etc) but cannot use table elimination. 6. Todo, issues to resolve -------------------------- @@ -109,33 +112,37 @@ 6.2 Resolved ~~~~~~~~~~~~ -- outer->inner join conversion is not a problem for table elimination. +* outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. -7. Additional issues --------------------- -* We remove ON clauses within semi-join nests. If these clauses contain - subqueries, they probably should be gone from EXPLAIN output also? +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -* Aggregate functions report they depend on all tables, that is, +* Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 - always. If we want table elimination to work in presence of grouping, need - to devise some other way of analyzing aggregate functions. + always. Fixed it, now aggregate function reports it depends on + tables that its arguments depend on. In particular, COUNT(*) reports + that it depends on no tables (item_count_star->used_tables()==0). + One consequence of that is that "item->used_tables()==0" is not + equivalent to "item->const_item()==true" anymore (not sure if it's + "anymore" or this has been already happening). + +* EXPLAIN EXTENDED warning text was generated after the JOIN object has + been discarded. This didn't allow to use information about join plan + when printing the warning. Fixed this by keeping the JOIN objects until + we've printed the warning (have also an intent to remove the const + tables from the join output). -* Should eliminated tables be shown in EXPLAIN EXTENDED? - - If we just ignore the question, they will be shown - - this is what happens for constant tables, too. - - I don't see how showing them could be of any use. They only make it - harder to read the rewritten query. - It turns out that - - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement - lifetime) changes. - - it is hard to have it show per-execution data. This is because the warning - text is generated after the execution structures have been destroyed. +7. Additional issues +-------------------- +* We remove ON clauses within semi-join nests. If these clauses contain + subqueries, they probably should be gone from EXPLAIN output also? * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from @@ -143,8 +150,6 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + from user/EXPLAIN point of view: no. constant table is the one that we read + one record from. eliminated table is the one that we don't acccess at all. -* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - - affected tables must not be eliminated - - tables that are used on the right side of the SET x=y assignments must - not be eliminated either. -=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=- Dependency deleted: 29 no longer depends on 17 -=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=- Low Level Design modified. --- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300 +++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300 @@ -131,6 +131,11 @@ - this is what happens for constant tables, too. - I don't see how showing them could be of any use. They only make it harder to read the rewritten query. + It turns out that + - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement + lifetime) changes. + - it is hard to have it show per-execution data. This is because the warning + text is generated after the execution structures have been destroyed. * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=- Low Level Design modified. --- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300 +++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300 @@ -1,3 +1,6 @@ +The code (currently in development) is at lp: +~maria-captains/maria/maria-5.1-table-elimination tree. + <contents> 1. Conditions for removal 1.1 Quick check if there are candidates -=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=- Low Level Design modified. --- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300 +++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300 @@ -135,3 +135,8 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. ------------------------------------------------------------ -=-=(View All Progress Notes, 22 total)=-=- http://askmonty.org/worklog/index.pl?tid=17&nolimit=1 DESCRIPTION: Eliminate not needed tables from SELECT queries.. This will speed up some views and automatically generated queries. Example: CREATE TABLE B (id int primary key); select A.colA from tableA A left outer join tableB B on B.id = A.id; In this case we can remove table B and the join from the query. HIGH-LEVEL SPECIFICATION: Here is an extended explanation of table elimination. Table elimination is a feature found in some modern query optimizers, of which Microsoft SQL Server 2005/2008 seems to have the most advanced implementation. Oracle 11g has also been confirmed to use table elimination but not to the same extent. Basically, what table elimination does, is to remove tables from the execution plan when it is unnecessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query: select A.colA from tableA A left outer join tableB B on B.id = A.id; When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met the selected column (A.colA) will still contain it's original value. The not seen B.* row would contain all NULL:s. However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution. Instead, let's say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan. Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails. select A.colA from tableA A left outer join tableB B on B.id = A.id and B.fromDate = ( select max(sub.fromDate) from tableB sub where sub.id = A.id ); In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate). Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan. SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data. LOW-LEVEL DESIGN: The code (currently in development) is at lp: ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> 1. Conditions for removal 1.1 Quick check if there are candidates 2. Removal operation properties 3. Removal operation 4. User interface 5. Tests and benchmarks 6. Todo, issues to resolve 6.1 To resolve 6.2 Resolved 7. Additional issues </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. 1. Conditions for removal ------------------------- We can eliminate an inner side of outer join if: 1. For each record combination of outer tables, it will always produce exactly one record. 2. There are no references to columns of the inner tables anywhere else in the query. #1 means that every table inside the outer join nest is: - is a constant table: = because it can be accessed via eq_ref(const) access, or = it is a zero-rows or one-row MyISAM-like table [MARK1] - has an eq_ref access method candidate. #2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer join nest. 1.1 Quick check if there are candidates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Before we start to enumerate join nests, here is a quick way to check if there *can be* something to be removed: if ((tables used in select_list | tables used in group/order by UNION | tables used in where) != bitmap_of_all_tables) { attempt table elimination; } 2. Removal operation properties ------------------------------- * There is always one way to remove (no choice to remove either this or that) * It is always better to remove as much tables as possible (at least within our cost model). Thus, no need for any cost calculations/etc. It's an unconditional rewrite. 3. Removal operation -------------------- * Remove the outer join nest's nested join structure (i.e. get the outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding, $OJ->embedding->nested_join. Update table_map's of all ancestor nested joins). [MARK2] * Move the tables and their JOIN_TABs to front like it is done with const tables, with exception that if eliminated outer join nest was within another outer join nest, that shouldn't prevent us from moving away the eliminated tables. * Update join->table_count and all-join-tables bitmap. * That's it. Nothing else? 4. User interface ----------------- * We'll add an @@optimizer switch flag for table elimination. Tentative name: 'table_elimination'. (Note ^^ utility of the above questioned ^, as table elimination can never be worse than no elimination. We're leaning towards not adding the flag) * EXPLAIN will not show the removed tables at all. This will allow to check if tables were removed, and also will behave nicely with anchor model and VIEWs: stuff that user doesn't care about just won't be there. 5. Tests and benchmarks ----------------------- Create a benchmark in sql-bench which checks if the DBMS has table elimination. [According to Monty] Run - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. then compare run times and make a conclusion about whether dbms supports table elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. I'm leaning towards doing the former. With anchor modeling, it is unlikely that we'll meet outer joins which have N inner tables of which some are 1-row MyISAM tables that do not have primary key. 6.2 Resolved ~~~~~~~~~~~~ * outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. * For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - affected tables must not be eliminated - tables that are used on the right side of the SET x=y assignments must not be eliminated either. * Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 always. Fixed it, now aggregate function reports it depends on tables that its arguments depend on. In particular, COUNT(*) reports that it depends on no tables (item_count_star->used_tables()==0). One consequence of that is that "item->used_tables()==0" is not equivalent to "item->const_item()==true" anymore (not sure if it's "anymore" or this has been already happening). * EXPLAIN EXTENDED warning text was generated after the JOIN object has been discarded. This didn't allow to use information about join plan when printing the warning. Fixed this by keeping the JOIN objects until we've printed the warning (have also an intent to remove the const tables from the join output). 7. Additional issues -------------------- * We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? Yes. Current approach: when removing an outer join nest, walk the ON clause and mark subselects as eliminated. Then let EXPLAIN code check if the SELECT was eliminated before the printing (EXPLAIN is generated by doing a recursive descent, so the check will also cause children of eliminated selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. * What is described above will not be able to eliminate this outer join create unique index idx on tableB (id, fromDate); ... left outer join tableB B on B.id = A.id and B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); This is because condition "B.fromDate= func(tableB)" cannot be used. Reason#1: update_ref_and_keys() does not consider such conditions to be of any use (and indeed they are not usable for ref access) so they are not put into KEYUSE array. Reason#2: even if they were put there, we would need to be able to tell between predicates like B.fromDate= func(B.id) // guarantees only one matching row as // B.id is already bound by B.id=A.id // hence B.fromDate becomes bound too. and "B.fromDate= func(B.*)" // Can potentially have many matching // records. We need to - Have update_ref_and_keys() create KEYUSE elements for such equalities - Have eliminate_tables() and friends make a more accurate check. The right check is to check whether all parts of a unique key are bound. If we have keypartX to be bound, then t.keypartY=func(keypartX) makes keypartY to be bound. The difficulty here is that correlated subquery predicate cannot tell what columns it depends on (it only remembers tables). Traversing the predicate is expensive and complicated. We're leaning towards making each subquery predicate have a List<Item> with items that - are in the current select - and it depends on. This list will be useful in certain other subquery optimizations as well, it is cheap to collect it in fix_fields() phase, so it will be collected for every subquery predicate. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 16 Jul '09

16 Jul '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.....: Monty IMPLEMENTOR....: Psergey COPIES TO......: CATEGORY.......: Client-BackLog TASK ID........: 17 (http://askmonty.org/worklog/?tid=17) VERSION........: Server-9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300 +++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300 @@ -1 +1 @@ -9.x +Server-9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-5.1 +9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Category updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-Sprint +Client-BackLog -=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=- Low Level Design modified. --- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300 +++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300 @@ -158,3 +158,43 @@ from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. +* What is described above will not be able to eliminate this outer join + create unique index idx on tableB (id, fromDate); + ... + left outer join + tableB B + on + B.id = A.id + and + B.fromDate = (select max(sub.fromDate) + from tableB sub where sub.id = A.id); + + This is because condition "B.fromDate= func(tableB)" cannot be used. + Reason#1: update_ref_and_keys() does not consider such conditions to + be of any use (and indeed they are not usable for ref access) + so they are not put into KEYUSE array. + Reason#2: even if they were put there, we would need to be able to tell + between predicates like + B.fromDate= func(B.id) // guarantees only one matching row as + // B.id is already bound by B.id=A.id + // hence B.fromDate becomes bound too. + and + "B.fromDate= func(B.*)" // Can potentially have many matching + // records. + We need to + - Have update_ref_and_keys() create KEYUSE elements for such equalities + - Have eliminate_tables() and friends make a more accurate check. + The right check is to check whether all parts of a unique key are bound. + If we have keypartX to be bound, then t.keypartY=func(keypartX) makes + keypartY to be bound. + The difficulty here is that correlated subquery predicate cannot tell what + columns it depends on (it only remembers tables). + Traversing the predicate is expensive and complicated. + We're leaning towards making each subquery predicate have a List<Item> with + items that + - are in the current select + - and it depends on. + This list will be useful in certain other subquery optimizations as well, + it is cheap to collect it in fix_fields() phase, so it will be collected + for every subquery predicate. + -=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300 +++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300 @@ -89,14 +89,14 @@ - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. +then compare run times and make a conclusion about whether dbms supports table +elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ -- Re-check how this works with equality propagation. - - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have @@ -141,8 +141,13 @@ 7. Additional issues -------------------- -* We remove ON clauses within semi-join nests. If these clauses contain +* We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? + Yes. Current approach: when removing an outer join nest, walk the ON clause + and mark subselects as eliminated. Then let EXPLAIN code check if the + SELECT was eliminated before the printing (EXPLAIN is generated by doing + a recursive descent, so the check will also cause children of eliminated + selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300 +++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300 @@ -83,9 +83,12 @@ 5. Tests and benchmarks ----------------------- -Should create a benchmark in sql-bench which checks if the dbms has table +Create a benchmark in sql-bench which checks if the DBMS has table elimination. -TODO elaborate +[According to Monty] Run + - queries that would use elimination + - queries that are very similar to one above (so that they would have same + QEP, execution cost, etc) but cannot use table elimination. 6. Todo, issues to resolve -------------------------- @@ -109,33 +112,37 @@ 6.2 Resolved ~~~~~~~~~~~~ -- outer->inner join conversion is not a problem for table elimination. +* outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. -7. Additional issues --------------------- -* We remove ON clauses within semi-join nests. If these clauses contain - subqueries, they probably should be gone from EXPLAIN output also? +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -* Aggregate functions report they depend on all tables, that is, +* Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 - always. If we want table elimination to work in presence of grouping, need - to devise some other way of analyzing aggregate functions. + always. Fixed it, now aggregate function reports it depends on + tables that its arguments depend on. In particular, COUNT(*) reports + that it depends on no tables (item_count_star->used_tables()==0). + One consequence of that is that "item->used_tables()==0" is not + equivalent to "item->const_item()==true" anymore (not sure if it's + "anymore" or this has been already happening). + +* EXPLAIN EXTENDED warning text was generated after the JOIN object has + been discarded. This didn't allow to use information about join plan + when printing the warning. Fixed this by keeping the JOIN objects until + we've printed the warning (have also an intent to remove the const + tables from the join output). -* Should eliminated tables be shown in EXPLAIN EXTENDED? - - If we just ignore the question, they will be shown - - this is what happens for constant tables, too. - - I don't see how showing them could be of any use. They only make it - harder to read the rewritten query. - It turns out that - - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement - lifetime) changes. - - it is hard to have it show per-execution data. This is because the warning - text is generated after the execution structures have been destroyed. +7. Additional issues +-------------------- +* We remove ON clauses within semi-join nests. If these clauses contain + subqueries, they probably should be gone from EXPLAIN output also? * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from @@ -143,8 +150,6 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + from user/EXPLAIN point of view: no. constant table is the one that we read + one record from. eliminated table is the one that we don't acccess at all. -* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - - affected tables must not be eliminated - - tables that are used on the right side of the SET x=y assignments must - not be eliminated either. -=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=- Dependency deleted: 29 no longer depends on 17 -=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=- Low Level Design modified. --- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300 +++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300 @@ -131,6 +131,11 @@ - this is what happens for constant tables, too. - I don't see how showing them could be of any use. They only make it harder to read the rewritten query. + It turns out that + - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement + lifetime) changes. + - it is hard to have it show per-execution data. This is because the warning + text is generated after the execution structures have been destroyed. * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=- Low Level Design modified. --- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300 +++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300 @@ -1,3 +1,6 @@ +The code (currently in development) is at lp: +~maria-captains/maria/maria-5.1-table-elimination tree. + <contents> 1. Conditions for removal 1.1 Quick check if there are candidates -=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=- Low Level Design modified. --- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300 +++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300 @@ -135,3 +135,8 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. ------------------------------------------------------------ -=-=(View All Progress Notes, 22 total)=-=- http://askmonty.org/worklog/index.pl?tid=17&nolimit=1 DESCRIPTION: Eliminate not needed tables from SELECT queries.. This will speed up some views and automatically generated queries. Example: CREATE TABLE B (id int primary key); select A.colA from tableA A left outer join tableB B on B.id = A.id; In this case we can remove table B and the join from the query. HIGH-LEVEL SPECIFICATION: Here is an extended explanation of table elimination. Table elimination is a feature found in some modern query optimizers, of which Microsoft SQL Server 2005/2008 seems to have the most advanced implementation. Oracle 11g has also been confirmed to use table elimination but not to the same extent. Basically, what table elimination does, is to remove tables from the execution plan when it is unnecessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query: select A.colA from tableA A left outer join tableB B on B.id = A.id; When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met the selected column (A.colA) will still contain it's original value. The not seen B.* row would contain all NULL:s. However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution. Instead, let's say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan. Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails. select A.colA from tableA A left outer join tableB B on B.id = A.id and B.fromDate = ( select max(sub.fromDate) from tableB sub where sub.id = A.id ); In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate). Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan. SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data. LOW-LEVEL DESIGN: The code (currently in development) is at lp: ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> 1. Conditions for removal 1.1 Quick check if there are candidates 2. Removal operation properties 3. Removal operation 4. User interface 5. Tests and benchmarks 6. Todo, issues to resolve 6.1 To resolve 6.2 Resolved 7. Additional issues </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. 1. Conditions for removal ------------------------- We can eliminate an inner side of outer join if: 1. For each record combination of outer tables, it will always produce exactly one record. 2. There are no references to columns of the inner tables anywhere else in the query. #1 means that every table inside the outer join nest is: - is a constant table: = because it can be accessed via eq_ref(const) access, or = it is a zero-rows or one-row MyISAM-like table [MARK1] - has an eq_ref access method candidate. #2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer join nest. 1.1 Quick check if there are candidates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Before we start to enumerate join nests, here is a quick way to check if there *can be* something to be removed: if ((tables used in select_list | tables used in group/order by UNION | tables used in where) != bitmap_of_all_tables) { attempt table elimination; } 2. Removal operation properties ------------------------------- * There is always one way to remove (no choice to remove either this or that) * It is always better to remove as much tables as possible (at least within our cost model). Thus, no need for any cost calculations/etc. It's an unconditional rewrite. 3. Removal operation -------------------- * Remove the outer join nest's nested join structure (i.e. get the outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding, $OJ->embedding->nested_join. Update table_map's of all ancestor nested joins). [MARK2] * Move the tables and their JOIN_TABs to front like it is done with const tables, with exception that if eliminated outer join nest was within another outer join nest, that shouldn't prevent us from moving away the eliminated tables. * Update join->table_count and all-join-tables bitmap. * That's it. Nothing else? 4. User interface ----------------- * We'll add an @@optimizer switch flag for table elimination. Tentative name: 'table_elimination'. (Note ^^ utility of the above questioned ^, as table elimination can never be worse than no elimination. We're leaning towards not adding the flag) * EXPLAIN will not show the removed tables at all. This will allow to check if tables were removed, and also will behave nicely with anchor model and VIEWs: stuff that user doesn't care about just won't be there. 5. Tests and benchmarks ----------------------- Create a benchmark in sql-bench which checks if the DBMS has table elimination. [According to Monty] Run - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. then compare run times and make a conclusion about whether dbms supports table elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. I'm leaning towards doing the former. With anchor modeling, it is unlikely that we'll meet outer joins which have N inner tables of which some are 1-row MyISAM tables that do not have primary key. 6.2 Resolved ~~~~~~~~~~~~ * outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. * For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - affected tables must not be eliminated - tables that are used on the right side of the SET x=y assignments must not be eliminated either. * Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 always. Fixed it, now aggregate function reports it depends on tables that its arguments depend on. In particular, COUNT(*) reports that it depends on no tables (item_count_star->used_tables()==0). One consequence of that is that "item->used_tables()==0" is not equivalent to "item->const_item()==true" anymore (not sure if it's "anymore" or this has been already happening). * EXPLAIN EXTENDED warning text was generated after the JOIN object has been discarded. This didn't allow to use information about join plan when printing the warning. Fixed this by keeping the JOIN objects until we've printed the warning (have also an intent to remove the const tables from the join output). 7. Additional issues -------------------- * We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? Yes. Current approach: when removing an outer join nest, walk the ON clause and mark subselects as eliminated. Then let EXPLAIN code check if the SELECT was eliminated before the printing (EXPLAIN is generated by doing a recursive descent, so the check will also cause children of eliminated selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. * What is described above will not be able to eliminate this outer join create unique index idx on tableB (id, fromDate); ... left outer join tableB B on B.id = A.id and B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); This is because condition "B.fromDate= func(tableB)" cannot be used. Reason#1: update_ref_and_keys() does not consider such conditions to be of any use (and indeed they are not usable for ref access) so they are not put into KEYUSE array. Reason#2: even if they were put there, we would need to be able to tell between predicates like B.fromDate= func(B.id) // guarantees only one matching row as // B.id is already bound by B.id=A.id // hence B.fromDate becomes bound too. and "B.fromDate= func(B.*)" // Can potentially have many matching // records. We need to - Have update_ref_and_keys() create KEYUSE elements for such equalities - Have eliminate_tables() and friends make a more accurate check. The right check is to check whether all parts of a unique key are bound. If we have keypartX to be bound, then t.keypartY=func(keypartX) makes keypartY to be bound. The difficulty here is that correlated subquery predicate cannot tell what columns it depends on (it only remembers tables). Traversing the predicate is expensive and complicated. We're leaning towards making each subquery predicate have a List<Item> with items that - are in the current select - and it depends on. This list will be useful in certain other subquery optimizations as well, it is cheap to collect it in fix_fields() phase, so it will be collected for every subquery predicate. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 16 Jul '09

16 Jul '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.....: Monty IMPLEMENTOR....: Knielsen COPIES TO......: CATEGORY.......: Client-BackLog TASK ID........: 17 (http://askmonty.org/worklog/?tid=17) VERSION........: 9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-5.1 +9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Category updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-Sprint +Client-BackLog -=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=- Low Level Design modified. --- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300 +++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300 @@ -158,3 +158,43 @@ from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. +* What is described above will not be able to eliminate this outer join + create unique index idx on tableB (id, fromDate); + ... + left outer join + tableB B + on + B.id = A.id + and + B.fromDate = (select max(sub.fromDate) + from tableB sub where sub.id = A.id); + + This is because condition "B.fromDate= func(tableB)" cannot be used. + Reason#1: update_ref_and_keys() does not consider such conditions to + be of any use (and indeed they are not usable for ref access) + so they are not put into KEYUSE array. + Reason#2: even if they were put there, we would need to be able to tell + between predicates like + B.fromDate= func(B.id) // guarantees only one matching row as + // B.id is already bound by B.id=A.id + // hence B.fromDate becomes bound too. + and + "B.fromDate= func(B.*)" // Can potentially have many matching + // records. + We need to + - Have update_ref_and_keys() create KEYUSE elements for such equalities + - Have eliminate_tables() and friends make a more accurate check. + The right check is to check whether all parts of a unique key are bound. + If we have keypartX to be bound, then t.keypartY=func(keypartX) makes + keypartY to be bound. + The difficulty here is that correlated subquery predicate cannot tell what + columns it depends on (it only remembers tables). + Traversing the predicate is expensive and complicated. + We're leaning towards making each subquery predicate have a List<Item> with + items that + - are in the current select + - and it depends on. + This list will be useful in certain other subquery optimizations as well, + it is cheap to collect it in fix_fields() phase, so it will be collected + for every subquery predicate. + -=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300 +++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300 @@ -89,14 +89,14 @@ - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. +then compare run times and make a conclusion about whether dbms supports table +elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ -- Re-check how this works with equality propagation. - - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have @@ -141,8 +141,13 @@ 7. Additional issues -------------------- -* We remove ON clauses within semi-join nests. If these clauses contain +* We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? + Yes. Current approach: when removing an outer join nest, walk the ON clause + and mark subselects as eliminated. Then let EXPLAIN code check if the + SELECT was eliminated before the printing (EXPLAIN is generated by doing + a recursive descent, so the check will also cause children of eliminated + selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300 +++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300 @@ -83,9 +83,12 @@ 5. Tests and benchmarks ----------------------- -Should create a benchmark in sql-bench which checks if the dbms has table +Create a benchmark in sql-bench which checks if the DBMS has table elimination. -TODO elaborate +[According to Monty] Run + - queries that would use elimination + - queries that are very similar to one above (so that they would have same + QEP, execution cost, etc) but cannot use table elimination. 6. Todo, issues to resolve -------------------------- @@ -109,33 +112,37 @@ 6.2 Resolved ~~~~~~~~~~~~ -- outer->inner join conversion is not a problem for table elimination. +* outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. -7. Additional issues --------------------- -* We remove ON clauses within semi-join nests. If these clauses contain - subqueries, they probably should be gone from EXPLAIN output also? +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -* Aggregate functions report they depend on all tables, that is, +* Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 - always. If we want table elimination to work in presence of grouping, need - to devise some other way of analyzing aggregate functions. + always. Fixed it, now aggregate function reports it depends on + tables that its arguments depend on. In particular, COUNT(*) reports + that it depends on no tables (item_count_star->used_tables()==0). + One consequence of that is that "item->used_tables()==0" is not + equivalent to "item->const_item()==true" anymore (not sure if it's + "anymore" or this has been already happening). + +* EXPLAIN EXTENDED warning text was generated after the JOIN object has + been discarded. This didn't allow to use information about join plan + when printing the warning. Fixed this by keeping the JOIN objects until + we've printed the warning (have also an intent to remove the const + tables from the join output). -* Should eliminated tables be shown in EXPLAIN EXTENDED? - - If we just ignore the question, they will be shown - - this is what happens for constant tables, too. - - I don't see how showing them could be of any use. They only make it - harder to read the rewritten query. - It turns out that - - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement - lifetime) changes. - - it is hard to have it show per-execution data. This is because the warning - text is generated after the execution structures have been destroyed. +7. Additional issues +-------------------- +* We remove ON clauses within semi-join nests. If these clauses contain + subqueries, they probably should be gone from EXPLAIN output also? * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from @@ -143,8 +150,6 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + from user/EXPLAIN point of view: no. constant table is the one that we read + one record from. eliminated table is the one that we don't acccess at all. -* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - - affected tables must not be eliminated - - tables that are used on the right side of the SET x=y assignments must - not be eliminated either. -=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=- Dependency deleted: 29 no longer depends on 17 -=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=- Low Level Design modified. --- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300 +++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300 @@ -131,6 +131,11 @@ - this is what happens for constant tables, too. - I don't see how showing them could be of any use. They only make it harder to read the rewritten query. + It turns out that + - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement + lifetime) changes. + - it is hard to have it show per-execution data. This is because the warning + text is generated after the execution structures have been destroyed. * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=- Low Level Design modified. --- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300 +++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300 @@ -1,3 +1,6 @@ +The code (currently in development) is at lp: +~maria-captains/maria/maria-5.1-table-elimination tree. + <contents> 1. Conditions for removal 1.1 Quick check if there are candidates -=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=- Low Level Design modified. --- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300 +++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300 @@ -135,3 +135,8 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=- Dependency created: 29 now depends on 17 ------------------------------------------------------------ -=-=(View All Progress Notes, 21 total)=-=- http://askmonty.org/worklog/index.pl?tid=17&nolimit=1 DESCRIPTION: Eliminate not needed tables from SELECT queries.. This will speed up some views and automatically generated queries. Example: CREATE TABLE B (id int primary key); select A.colA from tableA A left outer join tableB B on B.id = A.id; In this case we can remove table B and the join from the query. HIGH-LEVEL SPECIFICATION: Here is an extended explanation of table elimination. Table elimination is a feature found in some modern query optimizers, of which Microsoft SQL Server 2005/2008 seems to have the most advanced implementation. Oracle 11g has also been confirmed to use table elimination but not to the same extent. Basically, what table elimination does, is to remove tables from the execution plan when it is unnecessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query: select A.colA from tableA A left outer join tableB B on B.id = A.id; When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met the selected column (A.colA) will still contain it's original value. The not seen B.* row would contain all NULL:s. However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution. Instead, let's say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan. Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails. select A.colA from tableA A left outer join tableB B on B.id = A.id and B.fromDate = ( select max(sub.fromDate) from tableB sub where sub.id = A.id ); In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate). Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan. SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data. LOW-LEVEL DESIGN: The code (currently in development) is at lp: ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> 1. Conditions for removal 1.1 Quick check if there are candidates 2. Removal operation properties 3. Removal operation 4. User interface 5. Tests and benchmarks 6. Todo, issues to resolve 6.1 To resolve 6.2 Resolved 7. Additional issues </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. 1. Conditions for removal ------------------------- We can eliminate an inner side of outer join if: 1. For each record combination of outer tables, it will always produce exactly one record. 2. There are no references to columns of the inner tables anywhere else in the query. #1 means that every table inside the outer join nest is: - is a constant table: = because it can be accessed via eq_ref(const) access, or = it is a zero-rows or one-row MyISAM-like table [MARK1] - has an eq_ref access method candidate. #2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer join nest. 1.1 Quick check if there are candidates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Before we start to enumerate join nests, here is a quick way to check if there *can be* something to be removed: if ((tables used in select_list | tables used in group/order by UNION | tables used in where) != bitmap_of_all_tables) { attempt table elimination; } 2. Removal operation properties ------------------------------- * There is always one way to remove (no choice to remove either this or that) * It is always better to remove as much tables as possible (at least within our cost model). Thus, no need for any cost calculations/etc. It's an unconditional rewrite. 3. Removal operation -------------------- * Remove the outer join nest's nested join structure (i.e. get the outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding, $OJ->embedding->nested_join. Update table_map's of all ancestor nested joins). [MARK2] * Move the tables and their JOIN_TABs to front like it is done with const tables, with exception that if eliminated outer join nest was within another outer join nest, that shouldn't prevent us from moving away the eliminated tables. * Update join->table_count and all-join-tables bitmap. * That's it. Nothing else? 4. User interface ----------------- * We'll add an @@optimizer switch flag for table elimination. Tentative name: 'table_elimination'. (Note ^^ utility of the above questioned ^, as table elimination can never be worse than no elimination. We're leaning towards not adding the flag) * EXPLAIN will not show the removed tables at all. This will allow to check if tables were removed, and also will behave nicely with anchor model and VIEWs: stuff that user doesn't care about just won't be there. 5. Tests and benchmarks ----------------------- Create a benchmark in sql-bench which checks if the DBMS has table elimination. [According to Monty] Run - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. then compare run times and make a conclusion about whether dbms supports table elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. I'm leaning towards doing the former. With anchor modeling, it is unlikely that we'll meet outer joins which have N inner tables of which some are 1-row MyISAM tables that do not have primary key. 6.2 Resolved ~~~~~~~~~~~~ * outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. * For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - affected tables must not be eliminated - tables that are used on the right side of the SET x=y assignments must not be eliminated either. * Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 always. Fixed it, now aggregate function reports it depends on tables that its arguments depend on. In particular, COUNT(*) reports that it depends on no tables (item_count_star->used_tables()==0). One consequence of that is that "item->used_tables()==0" is not equivalent to "item->const_item()==true" anymore (not sure if it's "anymore" or this has been already happening). * EXPLAIN EXTENDED warning text was generated after the JOIN object has been discarded. This didn't allow to use information about join plan when printing the warning. Fixed this by keeping the JOIN objects until we've printed the warning (have also an intent to remove the const tables from the join output). 7. Additional issues -------------------- * We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? Yes. Current approach: when removing an outer join nest, walk the ON clause and mark subselects as eliminated. Then let EXPLAIN code check if the SELECT was eliminated before the printing (EXPLAIN is generated by doing a recursive descent, so the check will also cause children of eliminated selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. * What is described above will not be able to eliminate this outer join create unique index idx on tableB (id, fromDate); ... left outer join tableB B on B.id = A.id and B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); This is because condition "B.fromDate= func(tableB)" cannot be used. Reason#1: update_ref_and_keys() does not consider such conditions to be of any use (and indeed they are not usable for ref access) so they are not put into KEYUSE array. Reason#2: even if they were put there, we would need to be able to tell between predicates like B.fromDate= func(B.id) // guarantees only one matching row as // B.id is already bound by B.id=A.id // hence B.fromDate becomes bound too. and "B.fromDate= func(B.*)" // Can potentially have many matching // records. We need to - Have update_ref_and_keys() create KEYUSE elements for such equalities - Have eliminate_tables() and friends make a more accurate check. The right check is to check whether all parts of a unique key are bound. If we have keypartX to be bound, then t.keypartY=func(keypartX) makes keypartY to be bound. The difficulty here is that correlated subquery predicate cannot tell what columns it depends on (it only remembers tables). Traversing the predicate is expensive and complicated. We're leaning towards making each subquery predicate have a List<Item> with items that - are in the current select - and it depends on. This list will be useful in certain other subquery optimizations as well, it is cheap to collect it in fix_fields() phase, so it will be collected for every subquery predicate. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 16 Jul '09

16 Jul '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.....: Monty IMPLEMENTOR....: Knielsen COPIES TO......: CATEGORY.......: Client-BackLog TASK ID........: 17 (http://askmonty.org/worklog/?tid=17) VERSION........: 9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Version updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-5.1 +9.x -=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=- Category updated. --- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300 +++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300 @@ -1 +1 @@ -Server-Sprint +Client-BackLog -=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=- Low Level Design modified. --- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300 +++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300 @@ -158,3 +158,43 @@ from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. +* What is described above will not be able to eliminate this outer join + create unique index idx on tableB (id, fromDate); + ... + left outer join + tableB B + on + B.id = A.id + and + B.fromDate = (select max(sub.fromDate) + from tableB sub where sub.id = A.id); + + This is because condition "B.fromDate= func(tableB)" cannot be used. + Reason#1: update_ref_and_keys() does not consider such conditions to + be of any use (and indeed they are not usable for ref access) + so they are not put into KEYUSE array. + Reason#2: even if they were put there, we would need to be able to tell + between predicates like + B.fromDate= func(B.id) // guarantees only one matching row as + // B.id is already bound by B.id=A.id + // hence B.fromDate becomes bound too. + and + "B.fromDate= func(B.*)" // Can potentially have many matching + // records. + We need to + - Have update_ref_and_keys() create KEYUSE elements for such equalities + - Have eliminate_tables() and friends make a more accurate check. + The right check is to check whether all parts of a unique key are bound. + If we have keypartX to be bound, then t.keypartY=func(keypartX) makes + keypartY to be bound. + The difficulty here is that correlated subquery predicate cannot tell what + columns it depends on (it only remembers tables). + Traversing the predicate is expensive and complicated. + We're leaning towards making each subquery predicate have a List<Item> with + items that + - are in the current select + - and it depends on. + This list will be useful in certain other subquery optimizations as well, + it is cheap to collect it in fix_fields() phase, so it will be collected + for every subquery predicate. + -=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300 +++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300 @@ -89,14 +89,14 @@ - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. +then compare run times and make a conclusion about whether dbms supports table +elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ -- Re-check how this works with equality propagation. - - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have @@ -141,8 +141,13 @@ 7. Additional issues -------------------- -* We remove ON clauses within semi-join nests. If these clauses contain +* We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? + Yes. Current approach: when removing an outer join nest, walk the ON clause + and mark subselects as eliminated. Then let EXPLAIN code check if the + SELECT was eliminated before the printing (EXPLAIN is generated by doing + a recursive descent, so the check will also cause children of eliminated + selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=- Low Level Design modified. --- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300 +++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300 @@ -83,9 +83,12 @@ 5. Tests and benchmarks ----------------------- -Should create a benchmark in sql-bench which checks if the dbms has table +Create a benchmark in sql-bench which checks if the DBMS has table elimination. -TODO elaborate +[According to Monty] Run + - queries that would use elimination + - queries that are very similar to one above (so that they would have same + QEP, execution cost, etc) but cannot use table elimination. 6. Todo, issues to resolve -------------------------- @@ -109,33 +112,37 @@ 6.2 Resolved ~~~~~~~~~~~~ -- outer->inner join conversion is not a problem for table elimination. +* outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. -7. Additional issues --------------------- -* We remove ON clauses within semi-join nests. If these clauses contain - subqueries, they probably should be gone from EXPLAIN output also? +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -* Aggregate functions report they depend on all tables, that is, +* Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 - always. If we want table elimination to work in presence of grouping, need - to devise some other way of analyzing aggregate functions. + always. Fixed it, now aggregate function reports it depends on + tables that its arguments depend on. In particular, COUNT(*) reports + that it depends on no tables (item_count_star->used_tables()==0). + One consequence of that is that "item->used_tables()==0" is not + equivalent to "item->const_item()==true" anymore (not sure if it's + "anymore" or this has been already happening). + +* EXPLAIN EXTENDED warning text was generated after the JOIN object has + been discarded. This didn't allow to use information about join plan + when printing the warning. Fixed this by keeping the JOIN objects until + we've printed the warning (have also an intent to remove the const + tables from the join output). -* Should eliminated tables be shown in EXPLAIN EXTENDED? - - If we just ignore the question, they will be shown - - this is what happens for constant tables, too. - - I don't see how showing them could be of any use. They only make it - harder to read the rewritten query. - It turns out that - - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement - lifetime) changes. - - it is hard to have it show per-execution data. This is because the warning - text is generated after the execution structures have been destroyed. +7. Additional issues +-------------------- +* We remove ON clauses within semi-join nests. If these clauses contain + subqueries, they probably should be gone from EXPLAIN output also? * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from @@ -143,8 +150,6 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + from user/EXPLAIN point of view: no. constant table is the one that we read + one record from. eliminated table is the one that we don't acccess at all. -* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - - affected tables must not be eliminated - - tables that are used on the right side of the SET x=y assignments must - not be eliminated either. -=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=- Dependency deleted: 29 no longer depends on 17 -=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=- Low Level Design modified. --- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300 +++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300 @@ -131,6 +131,11 @@ - this is what happens for constant tables, too. - I don't see how showing them could be of any use. They only make it harder to read the rewritten query. + It turns out that + - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement + lifetime) changes. + - it is hard to have it show per-execution data. This is because the warning + text is generated after the execution structures have been destroyed. * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from -=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=- Low Level Design modified. --- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300 +++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300 @@ -1,3 +1,6 @@ +The code (currently in development) is at lp: +~maria-captains/maria/maria-5.1-table-elimination tree. + <contents> 1. Conditions for removal 1.1 Quick check if there are candidates -=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=- Low Level Design modified. --- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300 +++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300 @@ -135,3 +135,8 @@ Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? + +* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: + - affected tables must not be eliminated + - tables that are used on the right side of the SET x=y assignments must + not be eliminated either. -=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=- Dependency created: 29 now depends on 17 ------------------------------------------------------------ -=-=(View All Progress Notes, 21 total)=-=- http://askmonty.org/worklog/index.pl?tid=17&nolimit=1 DESCRIPTION: Eliminate not needed tables from SELECT queries.. This will speed up some views and automatically generated queries. Example: CREATE TABLE B (id int primary key); select A.colA from tableA A left outer join tableB B on B.id = A.id; In this case we can remove table B and the join from the query. HIGH-LEVEL SPECIFICATION: Here is an extended explanation of table elimination. Table elimination is a feature found in some modern query optimizers, of which Microsoft SQL Server 2005/2008 seems to have the most advanced implementation. Oracle 11g has also been confirmed to use table elimination but not to the same extent. Basically, what table elimination does, is to remove tables from the execution plan when it is unnecessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query: select A.colA from tableA A left outer join tableB B on B.id = A.id; When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met the selected column (A.colA) will still contain it's original value. The not seen B.* row would contain all NULL:s. However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution. Instead, let's say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan. Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails. select A.colA from tableA A left outer join tableB B on B.id = A.id and B.fromDate = ( select max(sub.fromDate) from tableB sub where sub.id = A.id ); In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate). Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan. SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data. LOW-LEVEL DESIGN: The code (currently in development) is at lp: ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> 1. Conditions for removal 1.1 Quick check if there are candidates 2. Removal operation properties 3. Removal operation 4. User interface 5. Tests and benchmarks 6. Todo, issues to resolve 6.1 To resolve 6.2 Resolved 7. Additional issues </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. 1. Conditions for removal ------------------------- We can eliminate an inner side of outer join if: 1. For each record combination of outer tables, it will always produce exactly one record. 2. There are no references to columns of the inner tables anywhere else in the query. #1 means that every table inside the outer join nest is: - is a constant table: = because it can be accessed via eq_ref(const) access, or = it is a zero-rows or one-row MyISAM-like table [MARK1] - has an eq_ref access method candidate. #2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer join nest. 1.1 Quick check if there are candidates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Before we start to enumerate join nests, here is a quick way to check if there *can be* something to be removed: if ((tables used in select_list | tables used in group/order by UNION | tables used in where) != bitmap_of_all_tables) { attempt table elimination; } 2. Removal operation properties ------------------------------- * There is always one way to remove (no choice to remove either this or that) * It is always better to remove as much tables as possible (at least within our cost model). Thus, no need for any cost calculations/etc. It's an unconditional rewrite. 3. Removal operation -------------------- * Remove the outer join nest's nested join structure (i.e. get the outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding, $OJ->embedding->nested_join. Update table_map's of all ancestor nested joins). [MARK2] * Move the tables and their JOIN_TABs to front like it is done with const tables, with exception that if eliminated outer join nest was within another outer join nest, that shouldn't prevent us from moving away the eliminated tables. * Update join->table_count and all-join-tables bitmap. * That's it. Nothing else? 4. User interface ----------------- * We'll add an @@optimizer switch flag for table elimination. Tentative name: 'table_elimination'. (Note ^^ utility of the above questioned ^, as table elimination can never be worse than no elimination. We're leaning towards not adding the flag) * EXPLAIN will not show the removed tables at all. This will allow to check if tables were removed, and also will behave nicely with anchor model and VIEWs: stuff that user doesn't care about just won't be there. 5. Tests and benchmarks ----------------------- Create a benchmark in sql-bench which checks if the DBMS has table elimination. [According to Monty] Run - queries that would use elimination - queries that are very similar to one above (so that they would have same QEP, execution cost, etc) but cannot use table elimination. then compare run times and make a conclusion about whether dbms supports table elimination. 6. Todo, issues to resolve -------------------------- 6.1 To resolve ~~~~~~~~~~~~~~ - Relationship with prepared statements. On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. I'm leaning towards doing the former. With anchor modeling, it is unlikely that we'll meet outer joins which have N inner tables of which some are 1-row MyISAM tables that do not have primary key. 6.2 Resolved ~~~~~~~~~~~~ * outer->inner join conversion is not a problem for table elimination. We make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (requirement for OJ->IJ conversion) then table elimination would not be applicable anyway. * For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - affected tables must not be eliminated - tables that are used on the right side of the SET x=y assignments must not be eliminated either. * Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 always. Fixed it, now aggregate function reports it depends on tables that its arguments depend on. In particular, COUNT(*) reports that it depends on no tables (item_count_star->used_tables()==0). One consequence of that is that "item->used_tables()==0" is not equivalent to "item->const_item()==true" anymore (not sure if it's "anymore" or this has been already happening). * EXPLAIN EXTENDED warning text was generated after the JOIN object has been discarded. This didn't allow to use information about join plan when printing the warning. Fixed this by keeping the JOIN objects until we've printed the warning (have also an intent to remove the const tables from the join output). 7. Additional issues -------------------- * We remove ON clauses within outer join nests. If these clauses contain subqueries, they probably should be gone from EXPLAIN output also? Yes. Current approach: when removing an outer join nest, walk the ON clause and mark subselects as eliminated. Then let EXPLAIN code check if the SELECT was eliminated before the printing (EXPLAIN is generated by doing a recursive descent, so the check will also cause children of eliminated selects not to be printed) * Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? If there is, should we mark const tables also as eliminated? from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. * What is described above will not be able to eliminate this outer join create unique index idx on tableB (id, fromDate); ... left outer join tableB B on B.id = A.id and B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); This is because condition "B.fromDate= func(tableB)" cannot be used. Reason#1: update_ref_and_keys() does not consider such conditions to be of any use (and indeed they are not usable for ref access) so they are not put into KEYUSE array. Reason#2: even if they were put there, we would need to be able to tell between predicates like B.fromDate= func(B.id) // guarantees only one matching row as // B.id is already bound by B.id=A.id // hence B.fromDate becomes bound too. and "B.fromDate= func(B.*)" // Can potentially have many matching // records. We need to - Have update_ref_and_keys() create KEYUSE elements for such equalities - Have eliminate_tables() and friends make a more accurate check. The right check is to check whether all parts of a unique key are bound. If we have keypartX to be bound, then t.keypartY=func(keypartX) makes keypartY to be bound. The difficulty here is that correlated subquery predicate cannot tell what columns it depends on (it only remembers tables). Traversing the predicate is expensive and complicated. We're leaning towards making each subquery predicate have a List<Item> with items that - are in the current select - and it depends on. This list will be useful in certain other subquery optimizations as well, it is cheap to collect it in fix_fields() phase, so it will be collected for every subquery predicate. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

Re: [Maria-developers] [Merge] lp:~maria-captains/maria/maria-xtradb into lp:maria
by Kristian Nielsen 10 Jul '09

10 Jul '09

Percona <launchpad(a)percona.com> writes: > Percona has proposed merging lp:~maria-captains/maria/maria-xtradb into lp:maria. > > Requested reviews: > Maria-captains (maria-captains) > > Proposal to merge replacement InnoDB->XtraDB Thanks a lot for your efforts in this! I branched the tree and took a look. There are a couple of issues that I think need to be resolved before we can merge it into MariaDB. I have some questions below, but please don't hesitate to ask me for any kind of help needed to move this forward. > === modified file 'storage/innobase/include/sync0rw.h' > --- storage/innobase/include/sync0rw.h 2008-02-19 16:44:09 +0000 > +++ storage/innobase/include/sync0rw.h 2009-03-31 04:19:17 +0000 > +#ifndef INNODB_RW_LOCKS_USE_ATOMICS > +#error INNODB_RW_LOCKS_USE_ATOMICS is not defined. Do you use enough new GCC or compatibles? > +#error Or do you use exact options for CFLAGS? > +#error e.g. (for x86_32): "-m32 -march=i586 -mtune=i686" > +#error e.g. (for Sparc_64): "-m64 -mcpu=v9" > +#error Otherwise, this build may be slower than normal version. > +#endif > + My attempt to build (BUILD/compile-pentium64-max) failed with this error. There were also other build errors (I assume places where the atomics-using code has not been extended with a part that works without the availability of atomics). The reason is that in the MariaDB tree, HAVE_GCC_ATOMIC_BUILTINS is disabled, which caused XtraDB to disable INNODB_RW_LOCKS_USE_ATOMICS, which triggers the above error. I think I understand why this makes sense for Percona, after all using these better synchronisation primitives is part of the reason for using the Percona server in the first place. Can you tell me if Percona has decided not to maintain XtraDB working without the availability of atomic operations? Or if it is just an oversight? I need to discuss with other MariaDB people whether XtraDB for MariaDB should be maintained working without the atomic operations (if so, we should of course be willing to do the work/effort required). So, any thoughts about the best way to deal with this? Should the above #error be removed and XtraDB extended to work without atomics in the MariaDB tree? And is this something Percona wants to do, or should I look into it? Also, Sergei Golubchik told me that HAVE_GCC_ATOMIC_BUILTINS is for my_atomic_ops, and InnoDB/XtraDB shouldn't really be using it. But I need to look more into the code to understand what the problem is, if any. > === added directory 'storage/innobase/mysql-test' > === added directory 'storage/innobase/mysql-test/patches' When I ran the test suite, I got test failures in test main.innodb. I see that the patch contains patches for main MySQL test cases in mysql-test/t/*.test, and also seems to add separate test cases in the storage/innobase/mysql-test/ directory. Do you know what the status is of these test suite modifications? Do the patches need to be applied to the existing test suite, and/or should the extra test cases be used to add to/overwrite the existing tests? We would need to get the test suite to run without problems before merging. Does Percona run the test suite with no failures? Can you suggest which directions I should work in to solve the test failures? Ie. I'm unsure to what extent the extra test cases/patches have already been applied in main MySQL sources, and whether failures are expected or are just due to not being adapted for current MariaDB source changes. Any help with the above would be great. I plan to continue working with you on this so we can get it merged without unnecessary delays. - Kristian.

3 17

[Maria-developers] Storage Engine API changes
by Stewart Smith 08 Jul '09

08 Jul '09

So, I've been making a fair bit of changes around bits of the storage engine API (by all accounts for the better) in Drizzle. The idea being to move the handler to be a cursor on a table, with actions not pertaining that to reside in StorageEngine (e.g. DDL). There's also the (now rather old) change to drop table return code. The next thing that will move into the StorageEngine is metadata handling with the engine being able to be responsible for its own (table) metadata. This is well and truly increasing the differences between MySQL/MariaDB and Drizzle in this area of code - increasing the work needed to port an engine (either way). I would guess it makes little sense for MySQL and MariaDB to diverge here, although I have been (and continue to be) okay with Drizzle diverging. So, is there somebody interested in working with me to have the MySQL/MariaDB API evolve in the same way? -- Stewart Smith

7 14