developers
[Maria-developers] Updated (by Knielsen): Add a mysqlbinlog option to filter certain kinds of statements (41)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter certain kinds of statements
CREATION DATE..: Mon, 10 Aug 2009, 15:30
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 41 (http://askmonty.org/worklog/?tid=41)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Knielsen - Fri, 14 Aug 2009, 14:17)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.6963 2009-08-14 14:17:32.000000000 +0300
+++ /tmp/wklog.41.new.6963 2009-08-14 14:17:32.000000000 +0300
@@ -1,6 +1,11 @@
The implementation will depend on design choices made in WL#40:
-- If we decide to parse the statement, SQL-verb filtering will be trivial
-- If we decide not to parse the statement, we still can reliably distinguish the
+
+Option 1:
+
+If we decide to parse the statement, SQL-verb filtering will be trivial
+
+Option 2:
+If we decide not to parse the statement, we still can reliably distinguish the
statement by matching the first characters against a set of patterns.
If we chose the second, we'll have to perform certain normalization before
-=-=(Psergey - Mon, 10 Aug 2009, 15:47)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.13282 2009-08-10 15:47:13.000000000 +0300
+++ /tmp/wklog.41.new.13282 2009-08-10 15:47:13.000000000 +0300
@@ -2,3 +2,10 @@
- If we decide to parse the statement, SQL-verb filtering will be trivial
- If we decide not to parse the statement, we still can reliably distinguish the
statement by matching the first characters against a set of patterns.
+
+If we chose the second, we'll have to perform certain normalization before
+matching the patterns:
+ - Remove all comments from the command
+ - Remove all pre-space
+ - Compare the string case-insensitively
+ - etc
-=-=(Psergey - Mon, 10 Aug 2009, 15:35)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.12689 2009-08-10 15:35:04.000000000 +0300
+++ /tmp/wklog.41.new.12689 2009-08-10 15:35:04.000000000 +0300
@@ -1 +1,4 @@
-
+The implementation will depend on design choices made in WL#40:
+- If we decide to parse the statement, SQL-verb filtering will be trivial
+- If we decide not to parse the statement, we still can reliably distinguish the
+statement by matching the first characters against a set of patterns.
-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41
DESCRIPTION:
Add a mysqlbinlog option to filter certain kinds of statements, e.g. (syntax
subject to discussion):
mysqlbinlog --exclude='alter table,drop table,alter database,...'
HIGH-LEVEL SPECIFICATION:
The implementation will depend on design choices made in WL#40:
Option 1:
If we decide to parse the statement, SQL-verb filtering will be trivial.
Option 2:
If we decide not to parse the statement, we can still reliably distinguish the
statement by matching its first characters against a set of patterns.
If we choose the second option, we'll have to perform certain normalization
before matching the patterns:
- Remove all comments from the command
- Remove all leading whitespace
- Compare the string case-insensitively
- etc
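
A minimal sketch of Option 2, for illustration only. The function names
normalize_statement and is_excluded are hypothetical and are not part of
mysqlbinlog. The sketch assumes the excluded verbs are given in upper case
with single spaces, and it treats every /* ... */ block (including MySQL's
version-specific /*! ... */ form) as an ordinary comment, which a real
implementation would probably have to handle differently:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: removes comments, collapses whitespace runs into a
// single space, drops leading whitespace and upper-cases the statement.
// String literals are not treated specially; that only affects text after
// the leading verb, which prefix matching never looks at.
std::string normalize_statement(const std::string &sql) {
  std::string out;
  bool pending_space = false;
  const std::size_t n = sql.size();
  std::size_t i = 0;
  while (i < n) {
    if (sql.compare(i, 2, "/*") == 0) {                 // block comment
      const std::size_t end = sql.find("*/", i + 2);
      i = (end == std::string::npos) ? n : end + 2;
      pending_space = true;
    } else if (sql[i] == '#' || sql.compare(i, 3, "-- ") == 0) {  // line comment
      const std::size_t end = sql.find('\n', i);
      i = (end == std::string::npos) ? n : end;
      pending_space = true;
    } else if (std::isspace(static_cast<unsigned char>(sql[i]))) {
      pending_space = true;
      ++i;
    } else {
      if (pending_space && !out.empty()) out.push_back(' ');
      pending_space = false;
      out.push_back(static_cast<char>(
          std::toupper(static_cast<unsigned char>(sql[i]))));
      ++i;
    }
  }
  return out;
}

// Hypothetical helper: true if the normalized statement starts with one of
// the excluded verbs, followed by the end of the statement or by a
// character that cannot continue a keyword.
bool is_excluded(const std::string &sql,
                 const std::vector<std::string> &excluded_verbs) {
  const std::string norm = normalize_statement(sql);
  for (const std::string &verb : excluded_verbs) {
    if (norm.compare(0, verb.size(), verb) != 0) continue;
    if (norm.size() == verb.size()) return true;
    const unsigned char next = static_cast<unsigned char>(norm[verb.size()]);
    if (!std::isalnum(next) && next != '_') return true;
  }
  return false;
}
```

With the verb list {"ALTER TABLE", "DROP TABLE"}, a statement such as
"-- note" followed on the next line by "Drop Table t1" is reported as
excluded, while "alter tablespace ts1" is not, because the match must end at a
keyword boundary.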
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 20
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 09:13)=-=-
2009-8-10: spent 3.5 hrs for analysis of the current implementation of UNION/UNION ALL
came up with the idea how to bypass temporary table when executing UNION ALL
2009-8-11: spent 6.5 hrs to prepare a hack that executed UNION ALL without temporary table
2009-8-12: spent 4 hrs more to investigate in debugger different cases with usage of union operations
(in subqueries, in queries that do not use tables)
2009-8-13: spent 6 hrs to put together and to publish an HLS document for the task
Worked 20 hours and estimate 0 hours remain (original estimate increased by 20 hours).
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. However, for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause
this is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c3 from t1 where a1=b1) union
(select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
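
The equivalences above can be checked with a small model. This is only a
sketch under stated assumptions: rows are single integers, the names Row,
RowSet, UnionOp and eval_union_sequence are hypothetical, and the sequence is
evaluated strictly left to right, with UNION removing duplicates from
everything accumulated so far:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Row = int;                     // one-column rows suffice for the illustration
using RowSet = std::vector<Row>;

enum class UnionOp { Distinct, All };  // UNION versus UNION ALL

// Evaluates operands[0] ops[0] operands[1] ops[1] ... from left to right.
// ops must hold exactly one entry fewer than operands.
RowSet eval_union_sequence(const std::vector<RowSet> &operands,
                           const std::vector<UnionOp> &ops) {
  RowSet acc = operands.at(0);
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const RowSet &rhs = operands.at(i + 1);
    acc.insert(acc.end(), rhs.begin(), rhs.end());
    if (ops[i] == UnionOp::Distinct) {
      // UNION: drop duplicates from the whole accumulated result.
      std::sort(acc.begin(), acc.end());
      acc.erase(std::unique(acc.begin(), acc.end()), acc.end());
    }
  }
  // Sort so that two results can be compared as multisets.
  std::sort(acc.begin(), acc.end());
  return acc;
}
```

For operands {1,1,2}, {2,3} and {3,4,4}, the operator sequences of queries (1)
and (4) both produce {1,2,3,4}, while query (3) produces {1,2,3,3,4,4} and
query (2) keeps all eight rows.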
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
union all
((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It also can be used as a subquery.
A union unit can be optionally followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects return
the same number of columns and that for each set of columns with the same number k
there is a type to which the types of the columns can be coerced. This type is
considered as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
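
The two type examples above can be modelled roughly as follows. This is a
deliberately simplified sketch and not the server's type aggregation code:
the names BaseType, ColumnType, aggregate_type and union_column_type are
hypothetical, only the types mentioned in the text are covered, collations
are ignored, and mixed string/numeric columns are reported as not modelled:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Numeric types are listed from narrowest to widest.
enum class BaseType { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColumnType {
  BaseType base;
  unsigned length;  // only meaningful for Varchar
};

// Folds the type of column k of one more select into the running result
// type. Returns false for combinations this sketch does not model.
bool aggregate_type(ColumnType &result, const ColumnType &next) {
  const bool result_is_str = result.base == BaseType::Varchar;
  const bool next_is_str = next.base == BaseType::Varchar;
  if (result_is_str != next_is_str) return false;  // mixed: not modelled
  if (result_is_str) {
    result.length = std::max(result.length, next.length);
  } else if (static_cast<int>(next.base) > static_cast<int>(result.base)) {
    result.base = next.base;  // widen to the wider numeric type
  }
  return true;
}

// Computes the type of column k of the union unit from the types of
// column k in each select.
bool union_column_type(const std::vector<ColumnType> &types,
                       ColumnType &result) {
  if (types.empty()) return false;
  result = types[0];
  for (std::size_t i = 1; i < types.size(); ++i)
    if (!aggregate_type(result, types[i])) return false;
  return true;
}
```

Under these assumptions, int, bigint and double aggregate to double, and
varchar(10), varchar(20), varchar(10) aggregate to varchar(20), matching the
examples in the text.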
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
duplicate-key error raised by an attempt to write a duplicate row is ignored.
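The write path described above can be sketched as follows. This is a minimal
model, not server code: TempTable, its std::set based index and the free
function send_data are hypothetical stand-ins for the temporary table, its
unique index and select_union::send_data.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for the temporary table with a unique index
// defined over all of its fields.
class TempTable {
 public:
  explicit TempTable(std::size_t columns) : columns_(columns) {}

  // Enabled before the first select, disabled after the last UNION.
  void enable_unique_index(bool on) { unique_enabled_ = on; }

  // Plays the role of ha_write_row: returns false on a duplicate key.
  bool write_row(const Row& row) {
    assert(row.size() == columns_);
    if (unique_enabled_ && !index_.insert(row).second) return false;
    rows_.push_back(row);
    return true;
  }

  const std::vector<Row>& rows() const { return rows_; }

 private:
  std::size_t columns_;
  bool unique_enabled_ = false;
  std::set<Row> index_;
  std::vector<Row> rows_;
};

// Plays the role of select_union::send_data: the row is written to the
// table and a duplicate-key failure is deliberately ignored.
void send_data(TempTable& table, const Row& row) {
  (void)table.write_row(row);
}
```

With the index enabled a repeated row is silently dropped; once the index is
disabled the same row is appended again, which is the UNION ALL behaviour.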
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then the
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. When the result set is so big
that the temporary table cannot be held in main memory the
performance gains would be significant. Besides, the client could get the first
result rows immediately, as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation bypass the temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
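One possible shape for such a per-select structure is sketched below. It is
only an illustration: build_conversion_map, ConversionMap and kMaxColumns are
hypothetical names, and column types are modelled as plain strings rather than
as the server's own type descriptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxColumns = 64;  // arbitrary limit for this sketch
using ConversionMap = std::bitset<kMaxColumns>;

// Bit k is set when column k of the select has a type that differs from the
// type of column k of the union result, so the value must go through the
// record buffer. A clear bit means the value could be sent directly.
ConversionMap build_conversion_map(const std::vector<std::string>& select_types,
                                   const std::vector<std::string>& result_types) {
  assert(select_types.size() == result_types.size());
  assert(select_types.size() <= kMaxColumns);
  ConversionMap needs_conversion;
  for (std::size_t k = 0; k < select_types.size(); ++k)
    if (select_types[k] != result_types[k]) needs_conversion.set(k);
  return needs_conversion;
}
```

A map of this kind would be computed once per select and handed to the shared
interceptor before that select is executed.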
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec will
need substantial rework.
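The split between the two interceptors can be sketched as a computation over
the operators of the unit. The model is an assumption made for illustration:
ops[i] stands for the operator between select i and select i+1 (selects are
numbered from 0), and UnionOp and first_direct_select are hypothetical names.
The unit is assumed to be at the top level and to have no ORDER BY.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Returns the index of the first select whose rows may bypass the temporary
// table. Selects before it would share the select_union interceptor, the
// remaining ones the select_union_send interceptor. A result equal to the
// number of selects (ops.size() + 1) means that no select can bypass it.
std::size_t first_direct_select(const std::vector<UnionOp>& ops) {
  std::size_t first_direct = 0;  // no UNION at all: every select is direct
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct)
      first_direct = i + 2;  // skip both operands of this UNION
  return first_direct;
}
```

For a unit of the form "s0 UNION s1 UNION ALL s2" the function returns 2, so
only the rows of the third select are candidates for direct sending.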
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send
any row received from an operand of a UNION operation directly to the output
stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 20
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 09:13)=-=-
2009-8-10: spent 3.5 hrs for analysis of the current implementation of UNION/UNION ALL
came up with the idea how to bypass temporary table when executing UNION ALL
2009-8-11: spent 6.5 hrs to prepare a hack that executed UNION ALL without temporary table
2009-8-12: spent 4 hrs more to investigate in debugger different cases with usage of union operations
(in subqueries, in queries that do not use tables)
2009-8-13: spent 6 hrs to put together and to publish an HLS document for the task
Worked 20 hours and estimate 0 hours remain (original estimate increased by 20 hours).
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. However, for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause
this is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
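This rule can be illustrated with a small sketch, assuming a union unit is
modelled only by the list of operators placed between its selects. UnionOp and
normalize_ops are hypothetical names introduced for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Returns an equivalent operator sequence in which every UNION ALL that is
// followed by a later UNION has been replaced by UNION.
std::vector<UnionOp> normalize_ops(std::vector<UnionOp> ops) {
  bool union_seen_later = false;
  // Walk from the last operator to the first one.
  for (std::size_t i = ops.size(); i-- > 0;) {
    if (ops[i] == UnionOp::Distinct)
      union_seen_later = true;
    else if (union_seen_later)
      ops[i] = UnionOp::Distinct;
  }
  return ops;
}
```

Applied to the operators of query (4) the function yields the operators of
query (1), while the operators of queries (2) and (3) are returned unchanged.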
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
union all
((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can be optionally followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects return
the same number of columns and that for each set of columns with the same number k
there is a type to which the types of the columns can be coerced. This type is
considered as the type of column k of the result set returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
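The coercion of column types in the examples above can be sketched as follows.
This is a deliberately simplified model and not the server's actual rules: Kind,
ColType and aggregate are hypothetical names, only the types mentioned in the
examples are covered, collations are left out, and mixing string and numeric
columns is out of scope.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Numeric kinds are listed in widening order so that the larger enumerator
// is the type both values can be coerced to.
enum class Kind { Int, BigInt, Double, Varchar };

struct ColType {
  Kind kind;
  unsigned length;  // meaningful only for Varchar
};

// Computes the type of column k of the union result from the types of
// column k in every select of the unit.
ColType aggregate(const std::vector<ColType>& types) {
  assert(!types.empty());
  ColType result = types.front();
  for (const ColType& t : types) {
    const bool both_strings =
        t.kind == Kind::Varchar && result.kind == Kind::Varchar;
    const bool both_numeric =
        t.kind != Kind::Varchar && result.kind != Kind::Varchar;
    assert(both_strings || both_numeric);  // mixed case not modelled here
    if (both_strings)
      result.length = std::max(result.length, t.length);
    else
      result.kind = std::max(result.kind, t.kind);
  }
  return result;
}
```

In this model int, bigint and double aggregate to double, and varchar(10),
varchar(20), varchar(10) aggregate to varchar(20), matching the examples.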
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
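The accumulation scheme described above can be illustrated with a small,
self-contained model. This is only a sketch under stated assumptions: the names
ToyTmpTable, UnionOp and exec_union_unit are hypothetical and do not exist in
the server code, rows are plain vectors of strings, and a std::set stands in for
the unique index of the temporary table.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only.
using Row = std::vector<std::string>;
enum class UnionOp { Distinct, All };  // UNION vs UNION ALL

// Toy temporary table: while the unique index is enabled, writing a
// duplicate row fails, and the caller ignores that failure.
struct ToyTmpTable {
  std::vector<Row> rows;
  std::set<Row> unique_index;
  bool index_enabled = false;

  // Returns false on a duplicate-key error.
  bool write_row(const Row &r) {
    if (index_enabled && !unique_index.insert(r).second) return false;
    rows.push_back(r);
    return true;
  }
};

// selects[i] holds the rows produced by the i-th select; ops[i] is the
// operation between select i and select i+1.
std::vector<Row> exec_union_unit(const std::vector<std::vector<Row>> &selects,
                                 const std::vector<UnionOp> &ops) {
  assert(selects.empty() ? ops.empty() : ops.size() + 1 == selects.size());

  // Index of the second operand of the last UNION; 0 if there is no UNION.
  std::size_t last_distinct = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) last_distinct = i + 1;

  ToyTmpTable tmp;
  tmp.index_enabled = last_distinct > 0;  // never used for pure UNION ALL
  for (std::size_t i = 0; i < selects.size(); ++i) {
    for (const Row &r : selects[i])
      (void)tmp.write_row(r);  // a duplicate-key error is ignored
    if (i == last_distinct) tmp.index_enabled = false;  // after the last UNION
  }
  return tmp.rows;
}
```

With three selects returning {1,1}, {1,2} and {2,2}, the sequence UNION, UNION
ALL leaves four rows in the toy table, UNION ALL, UNION leaves two, and UNION
ALL, UNION ALL leaves all six.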
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to result of the union unit then the
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. When the result set is too big for
the temporary table to be allocated in main memory the performance gains would
be significant. Besides, the client could get the first result rows at once as
it would not have to wait until all selects have been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
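To make the intended behaviour of the three methods concrete, here is a toy
model of the two interceptors. It is a sketch, not the server code: the classes
ToySelectUnion and ToySelectUnionSend, the string-based rows, and the
"HEADER"/"ROW"/"EOF" output lines are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Toy counterpart of select_union: every row ends up in a "temporary table".
class ToySelectUnion {
public:
  virtual ~ToySelectUnion() = default;
  virtual void send_fields(const Row &) {}
  virtual void send_data(const Row &row) { table.push_back(row); }
  virtual void send_eof() {}
  std::vector<Row> table;
};

// Toy counterpart of the proposed select_union_send: rows pass through a
// one-row record buffer and then go straight to the output stream.
class ToySelectUnionSend : public ToySelectUnion {
public:
  ToySelectUnionSend(std::vector<std::string> &out, std::size_t n_selects)
      : out_(out), selects_left_(n_selects) {
    assert(n_selects > 0);
  }

  void send_fields(const Row &names) override {
    if (header_sent_) return;  // the row format is announced only once
    out_.push_back("HEADER " + join(names));
    header_sent_ = true;
  }

  void send_data(const Row &row) override {
    record_ = row;  // type conversions would happen at this point
    out_.push_back("ROW " + join(record_));
  }

  void send_eof() override {
    if (selects_left_ == 0) return;
    if (--selects_left_ == 0) out_.push_back("EOF");  // after the last select
  }

private:
  static std::string join(const Row &r) {
    std::string s;
    for (std::size_t i = 0; i < r.size(); ++i) {
      if (i) s += ',';
      s += r[i];
    }
    return s;
  }

  std::vector<std::string> &out_;
  std::size_t selects_left_;
  bool header_sent_ = false;
  Row record_;
};
```

In this model every select of the unit calls send_fields, send_data and
send_eof on the shared object, yet the client sees one header, the rows in the
order they are produced, and a single end-of-rows signal.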
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
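The bitmap idea can be sketched as follows. Everything here is hypothetical:
ToySendWithBitmap, the Converter type and the string-valued fields are
illustration-only assumptions, and a std::vector<bool> plays the role of the
per-select bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Field = std::string;
using Row = std::vector<Field>;
using Converter = std::function<Field(const Field &)>;

struct ToySendWithBitmap {
  std::vector<Converter> converters;  // one per column of the result set
  Row record_buffer;                  // only converted fields pass through it
  std::size_t buffered_fields = 0;    // number of copies into the buffer
  std::vector<Row> output;            // stands in for the output stream

  // needs_conversion is the per-select bitmap: bit i is set when column i
  // of the current select must be converted to the result column type.
  void send_data(const Row &row, const std::vector<bool> &needs_conversion) {
    assert(row.size() == converters.size());
    assert(row.size() == needs_conversion.size());
    record_buffer.resize(row.size());
    Row out;
    out.reserve(row.size());
    for (std::size_t i = 0; i < row.size(); ++i) {
      if (needs_conversion[i]) {
        record_buffer[i] = converters[i](row[i]);  // convert via the buffer
        ++buffered_fields;
        out.push_back(record_buffer[i]);
      } else {
        out.push_back(row[i]);  // sent directly, no intermediate copy
      }
    }
    output.push_back(std::move(out));
  }
};
```

The counter buffered_fields exists only to make the saving visible: for a row
whose bitmap has a single bit set, only one field goes through the record
buffer.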
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; the other, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has to
undergo a serious rework.
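The split between the two interceptors depends only on the position of the last
UNION operation. A minimal sketch of that computation, with hypothetical names
(UnionOp, first_streamed_select) that are not part of the server code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };  // UNION vs UNION ALL

// ops[i] is the operation between select i and select i+1. Returns the index
// of the first select whose rows could bypass the temporary table; selects
// before it would share the select_union-style interceptor. A return value
// equal to the number of selects means that no select can be streamed.
std::size_t first_streamed_select(const std::vector<UnionOp> &ops) {
  std::size_t first = 0;  // no UNION at all: every select can be streamed
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct)
      first = i + 2;  // the select after the second operand of this UNION
  return first;
}

// Usage check on the shapes of queries (3) and (4) from section 1.1.
inline void check_first_streamed_select() {
  assert(first_streamed_select({UnionOp::Distinct, UnionOp::All}) == 2);
  assert(first_streamed_select({UnionOp::All, UnionOp::Distinct}) == 3);
}
```

For the shape of query (3) only the third select is streamed; for the shape of
query (4) the result equals the number of selects, so nothing is streamed.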
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 20
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 09:13)=-=-
2009-8-10: spent 3.5 hrs for analysis of the current implementation of UNION/UNION ALL
came up with the idea how to bypass temporary table when executing UNION ALL
2009-8-11: spent 6.5 hrs to prepare a hack that executed UNION ALL without temporary table
2009-8-12: spent 4 hrs more to investigate in debugger different cases with usage of union operations
(in subqueries, in queries that do not use tables)
2009-8-13: spent 6 hrs to put together and to publish an HLS document for the task
Worked 20 hours and estimate 0 hours remain (original estimate increased by 20 hours).
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that would not use a temporary table at all in certain, most common
cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3); (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3); (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
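The replacement rule can be written down as a short normalization pass over the
sequence of operations. This is a sketch with hypothetical names (UnionOp,
normalize_ops), not a description of what the server does:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };  // UNION vs UNION ALL

// Turns every UNION ALL that is followed, anywhere later in the sequence,
// by a UNION into a UNION. The sequence is scanned from right to left.
std::vector<UnionOp> normalize_ops(std::vector<UnionOp> ops) {
  bool union_seen_later = false;
  for (std::size_t i = ops.size(); i-- > 0;) {
    if (ops[i] == UnionOp::Distinct)
      union_seen_later = true;
    else if (union_seen_later)
      ops[i] = UnionOp::Distinct;
  }
  return ops;
}

// Usage check: the shape of query (4) normalizes to the shape of query (1),
// while the shape of query (3) is left unchanged.
inline void check_normalize_ops() {
  const std::vector<UnionOp> q1{UnionOp::Distinct, UnionOp::Distinct};
  const std::vector<UnionOp> q3{UnionOp::Distinct, UnionOp::All};
  const std::vector<UnionOp> q4{UnionOp::All, UnionOp::Distinct};
  assert(normalize_ops(q4) == q1);
  assert(normalize_ops(q3) == q3);
}
```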
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It also can be used as a subquery.
A union unit can be optionally followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. It checks that the selects return the
same number of columns and that for each set of columns with the same number k
there is a type to which the types of the columns can be coerced. This type is
considered as the type of column k of the result set returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
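The two type examples above can be reproduced with a toy aggregation function.
This is a deliberately narrow sketch: Kind, ColType, unify and result_type are
hypothetical names, only the cases mentioned in the text are modelled, and the
actual coercion rules of the server are richer than this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Numeric kinds are declared in widening order so that std::max picks the
// wider one. Varchar is handled separately.
enum class Kind { Int, BigInt, Double, Varchar };

struct ColType {
  Kind kind;
  unsigned length;  // meaningful for Varchar only
};

// Common type of two column types, or nothing for cases this sketch
// does not model (a mix of numeric and character columns).
std::optional<ColType> unify(const ColType &a, const ColType &b) {
  const bool a_str = a.kind == Kind::Varchar;
  const bool b_str = b.kind == Kind::Varchar;
  if (a_str != b_str) return std::nullopt;
  if (a_str) return ColType{Kind::Varchar, std::max(a.length, b.length)};
  return ColType{std::max(a.kind, b.kind), 0};
}

// Type of column k of the union unit, given the types of column k in each
// select of the unit.
std::optional<ColType> result_type(const std::vector<ColType> &column) {
  if (column.empty()) return std::nullopt;
  std::optional<ColType> acc = column.front();
  for (std::size_t i = 1; i < column.size() && acc; ++i)
    acc = unify(*acc, column[i]);
  return acc;
}

// Usage check with the examples from the text.
inline void check_result_type() {
  const std::vector<ColType> b{{Kind::Int, 0}, {Kind::BigInt, 0}, {Kind::Double, 0}};
  const std::vector<ColType> c{{Kind::Varchar, 10}, {Kind::Varchar, 20}, {Kind::Varchar, 10}};
  assert(result_type(b)->kind == Kind::Double);
  assert(result_type(c)->kind == Kind::Varchar && result_type(c)->length == 20);
}
```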
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
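The execution scheme of this section can be illustrated with a small toy model. This is only a sketch under stated assumptions, not server code: rows are plain vectors of strings, a select is represented by the rows it would return, and the names Row, SelectResult, UnionOp, TempTable and exec_union_unit are hypothetical. The std::set member stands in for the unique index on all fields, and a rejected write stands in for the ignored duplicate-key error.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins, not server types.
using Row = std::vector<std::string>;
using SelectResult = std::vector<Row>;  // rows one select would return

enum class UnionOp { Distinct, All };   // UNION and UNION ALL

// Hypothetical model of the temporary table and its unique index.
struct TempTable {
  std::vector<Row> rows;   // stored rows, in insertion order
  std::set<Row> index;     // models the unique index on all fields
  bool index_enabled = false;

  // Returns false on a duplicate-key "error"; callers ignore it.
  bool write_row(const Row &r) {
    if (index_enabled && !index.insert(r).second) return false;
    rows.push_back(r);
    return true;
  }
};

// ops[i] is the operation between select i and select i+1.
std::vector<Row> exec_union_unit(const std::vector<SelectResult> &selects,
                                 const std::vector<UnionOp> &ops) {
  assert(!selects.empty() && ops.size() + 1 == selects.size());

  // Index of the select that is the second operand of the last UNION;
  // 0 means the unit contains only UNION ALL.
  std::size_t last_distinct = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) last_distinct = i + 1;

  TempTable tmp;
  tmp.index_enabled = (last_distinct != 0);
  for (std::size_t i = 0; i < selects.size(); ++i) {
    for (const Row &r : selects[i]) (void)tmp.write_row(r);
    // After the last UNION operand the index is no longer needed.
    if (i == last_distinct) tmp.index_enabled = false;
  }
  return tmp.rows;  // what the final read of the table would return
}
```

In this model a unit of the form "s1 UNION ALL s2 UNION s3" keeps the index enabled for all three selects, while "s1 UNION s2 UNION ALL s3" disables it before s3 is executed.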
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from the
selects to a temporary table at all. After all needed type conversions have been
done, the row fields could be sent directly to the output stream. This would
improve the performance of UNION ALL operations, since writing to the temporary
table and reading from it would no longer be needed. When the result set is so
big that the temporary table cannot be allocated in main memory, the performance
gains would be significant. Besides, the client could get the first result rows
at once, as it would not have to wait until all selects have been executed.
To make a UNION ALL operation skip writing rows to a temporary table, we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union, derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
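The division of work between the two interceptors can be illustrated with a self-contained toy model. It is a sketch only and does not use the server classes: ToySelectUnion, ToySelectUnionSend, OutputStream and join are hypothetical names, rows are vectors of strings, a return value of false is assumed to mean success, and the counter of remaining selects is just one possible way to make send_eof fire once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for the client connection: it only records packets.
struct OutputStream {
  std::vector<std::string> packets;
};

// Joins the fields of a row with commas (illustration only).
std::string join(const Row &row) {
  std::string s;
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (i != 0) s += ',';
    s += row[i];
  }
  return s;
}

// Toy counterpart of select_union: every row ends up in the temporary table.
class ToySelectUnion {
 public:
  virtual ~ToySelectUnion() = default;

  // Returns false on success (an assumption of this sketch).
  virtual bool send_data(const Row &row) {
    fill_record(row);
    table_.push_back(record_);  // models ha_write_row
    return false;
  }
  const std::vector<Row> &table() const { return table_; }

 protected:
  // Models fill_record: type conversions would happen during this copy.
  void fill_record(const Row &row) { record_ = row; }

  Row record_;              // record buffer of the temporary table
  std::vector<Row> table_;  // rows written to the temporary table
};

// Toy counterpart of the proposed select_union_send.
class ToySelectUnionSend : public ToySelectUnion {
 public:
  ToySelectUnionSend(OutputStream &out, std::size_t select_count)
      : out_(out), selects_left_(select_count) {}

  // Sends the row format once, before the rows of the first select.
  bool send_fields(const Row &column_names) {
    if (!fields_sent_) {
      out_.packets.push_back("META:" + join(column_names));
      fields_sent_ = true;
    }
    return false;
  }

  // Record buffer first, then straight to the output stream.
  bool send_data(const Row &row) override {
    fill_record(row);
    out_.packets.push_back("ROW:" + join(record_));
    return false;
  }

  // Signals the end of rows only after the last select has finished.
  bool send_eof() {
    assert(selects_left_ > 0);
    if (--selects_left_ == 0) out_.packets.push_back("EOF");
    return false;
  }

 private:
  OutputStream &out_;
  std::size_t selects_left_;
  bool fields_sent_ = false;
};
```

Because send_data is virtual, the code that executes a select can keep calling it through a pointer to the base class and does not need to know whether rows are buffered or streamed.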
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion, it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed, one that takes as a parameter the info saying which fields are to be
stored in the record buffer.
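A possible shape of the per-select bitmap is sketched below. Everything here is an assumption made for illustration: the three-value ColType enum, the rule that a field needs conversion exactly when its type differs from the result column type, and the toy convert_to function, which only mimics what storing a value into a record-buffer field would do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ColType { Int, Double, Varchar };  // simplified type system

// One bit per column: true if this select's column must be converted
// to the type of the corresponding result column.
std::vector<bool> conversion_bitmap(const std::vector<ColType> &select_types,
                                    const std::vector<ColType> &result_types) {
  assert(select_types.size() == result_types.size());
  std::vector<bool> bits(result_types.size());
  for (std::size_t i = 0; i < bits.size(); ++i)
    bits[i] = (select_types[i] != result_types[i]);
  return bits;
}

// Toy conversion standing in for a store into the record buffer.
std::string convert_to(ColType target, const std::string &value) {
  if (target == ColType::Double && value.find('.') == std::string::npos)
    return value + ".0";
  return value;
}

struct SendResult {
  std::vector<std::string> sent;  // field values as they reach the output
  std::size_t buffered = 0;       // fields that went through the buffer
};

// Variant of send_data that consults the bitmap for every field.
SendResult send_row(const std::vector<std::string> &row,
                    const std::vector<ColType> &result_types,
                    const std::vector<bool> &needs_conversion) {
  assert(row.size() == result_types.size());
  assert(row.size() == needs_conversion.size());
  SendResult r;
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (needs_conversion[i]) {
      r.sent.push_back(convert_to(result_types[i], row[i]));
      ++r.buffered;
    } else {
      r.sent.push_back(row[i]);  // sent directly, no copy via the buffer
    }
  }
  return r;
}
```

The bitmap is computed once per select, so the per-row cost is a single bit test per field.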
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the current
implementation. More exactly, the rows from any select that follows the second
operand of the last UNION operation could be sent directly to the output
stream. In this case two interceptor objects have to be created: one, of the
type select_union, is shared by the selects for which UNION operations are
performed; another, of the type select_union_send, is shared by the remaining
selects. For this optimization the method SELECT_LEX_UNIT::exec would have to
undergo serious rework.
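The choice of interceptor for each select can be sketched as follows. The sketch assumes a top-level unit without ORDER BY and represents the unit only by its sequence of operations; UnionOp, Sink, first_streamable_select and assign_sinks are hypothetical names, not server identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };   // UNION and UNION ALL
enum class Sink { TempTable, Direct };  // select_union vs select_union_send

// ops[i] is the operation between select i and select i+1.
// Returns the index of the first select that follows the second operand
// of the last UNION; ops.size() + 1 means that no such select exists.
std::size_t first_streamable_select(const std::vector<UnionOp> &ops) {
  std::size_t first = 0;  // no UNION at all: every select qualifies
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) first = i + 2;
  return first;
}

// Assigns one of the two interceptors to every select of the unit.
std::vector<Sink> assign_sinks(const std::vector<UnionOp> &ops) {
  const std::size_t select_count = ops.size() + 1;
  const std::size_t first = first_streamable_select(ops);
  assert(first <= select_count);
  std::vector<Sink> sinks(select_count, Sink::TempTable);
  for (std::size_t i = first; i < select_count; ++i) sinks[i] = Sink::Direct;
  return sinks;
}
```

For a unit of the form "s1 UNION s2 UNION ALL s3" only s3 is routed to the direct sink, and for a unit ending with UNION none is.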
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that it
is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile, for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause
this is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);           (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);           (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);           (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);           (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
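The equivalence rule can be expressed as a normalization of the operation sequence. The following is a minimal sketch in which a union unit is reduced to its list of operations; UnionOp and normalize_ops are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// ops[i] is the operation between select i and select i+1.
// Replaces every UNION ALL that is followed later by a UNION with UNION.
std::vector<UnionOp> normalize_ops(std::vector<UnionOp> ops) {
  bool union_seen_later = false;
  // Scan from right to left so that "later" operations are seen first.
  for (std::size_t i = ops.size(); i-- > 0;) {
    if (ops[i] == UnionOp::Distinct)
      union_seen_later = true;
    else if (union_seen_later)
      ops[i] = UnionOp::Distinct;
  }
  return ops;
}
```

With this rule the operation sequence of query (4) normalizes to that of query (1), while the sequence of query (3) stays unchanged.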
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare. The method first validates each of the select
constructs of the unit and then checks that all selects are compatible: the
selects must return the same number of columns, and for each column position k
there must be a type to which the types of the k-th columns of all selects can
be coerced. This type is considered the type of column k of the result set
returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and is assigned as the collation of the third column in the result set.
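The two coercion examples can be reproduced with a deliberately simplified model. It is an assumption-laden sketch, not the server's aggregation logic: it knows only int, bigint, double and varchar(n), widens numeric types in that order, takes the maximum length for varchar, and treats a mix of strings and numbers as out of scope. Kind, ColumnType, unify and result_column_type are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Numeric kinds are ordered from narrowest to widest.
enum class Kind { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColumnType {
  Kind kind;
  unsigned length;  // meaningful for Varchar only
};

// Common type of two columns under the simplified rules of this sketch.
ColumnType unify(const ColumnType &a, const ColumnType &b) {
  const bool a_str = (a.kind == Kind::Varchar);
  const bool b_str = (b.kind == Kind::Varchar);
  assert(a_str == b_str);  // string/number mixes are not modeled here
  if (a_str) return {Kind::Varchar, std::max(a.length, b.length)};
  return {std::max(a.kind, b.kind), 0u};
}

// Type of result column k, given the k-th column type of every select.
ColumnType result_column_type(const std::vector<ColumnType> &column_k) {
  assert(!column_k.empty());
  ColumnType t = column_k.front();
  for (std::size_t i = 1; i < column_k.size(); ++i) t = unify(t, column_k[i]);
  return t;
}
```

Collation derivation is left out of the sketch entirely.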
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
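The way the unique index is switched on and off can be illustrated with a small
stand-alone model. This is only a sketch under stated assumptions: Row, SetOp
and execute_union_unit are hypothetical names, a std::set stands in for the
unique index, and nothing here is taken from the server sources.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a row is a string, a select is the list of its rows.
using Row = std::string;
enum class SetOp { Union, UnionAll };

// ops[i] is the operation between selects[i] and selects[i + 1].
std::vector<Row> execute_union_unit(const std::vector<std::vector<Row>> &selects,
                                    const std::vector<SetOp> &ops)
{
  assert(selects.empty() ? ops.empty() : ops.size() + 1 == selects.size());

  // Index of the last select that is an operand of a UNION; -1 if there is none.
  int last_distinct = -1;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union)
      last_distinct = static_cast<int>(i) + 1;

  std::vector<Row> temp_table;  // models the temporary table
  std::set<Row> unique_index;   // models the unique index on all fields

  for (std::size_t i = 0; i < selects.size(); ++i)
  {
    // Enabled from the first select up to the last UNION operand, then disabled.
    const bool index_enabled = static_cast<int>(i) <= last_distinct;
    for (const Row &row : selects[i])
    {
      if (index_enabled && !unique_index.insert(row).second)
        continue;               // duplicate-key error: the row is silently skipped
      temp_table.push_back(row);
    }
  }
  return temp_table;
}
```

With only UNION ALL operations the index is never consulted, which matches the
description above; with a trailing UNION every select is de-duplicated.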
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly to the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. When the result set is so large that
the temporary table cannot be held in main memory the performance gains would be
significant. Besides, the client could get the first result rows immediately, as
it would not have to wait until all selects have been executed.
To make a UNION ALL operation bypass the temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();  // listed in the text above among the overridden methods
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
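The division of work between the three overridden methods can be tried out in a
stand-alone model. The sketch below is an assumption-laden illustration, not
server code: Client, select_union_model and select_union_send_model are
hypothetical names, rows are plain string vectors, and the methods return false
on success only to mirror the convention of the signatures above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of the interceptor hierarchy.
using Row = std::vector<std::string>;

struct Client                       // stands in for the output stream
{
  std::vector<std::string> header;
  std::vector<Row> rows;
  bool eof = false;
};

class select_union_model            // buffers every row, like select_union
{
public:
  virtual ~select_union_model() = default;
  virtual bool send_fields(const std::vector<std::string> &) { return false; }
  virtual bool send_data(const Row &row)
  {
    temp_table.push_back(row);
    return false;
  }
  virtual bool send_eof() { return false; }
  std::vector<Row> temp_table;
};

class select_union_send_model : public select_union_model
{
public:
  explicit select_union_send_model(Client &c) : client(c) {}

  // Send the row format once, before rows of the first select arrive.
  bool send_fields(const std::vector<std::string> &names) override
  {
    if (!fields_sent)
    {
      client.header = names;
      fields_sent = true;
    }
    return false;
  }

  // Pass the row through a one-row buffer and straight on to the client.
  bool send_data(const Row &row) override
  {
    record_buffer = row;            // type conversions would happen here
    client.rows.push_back(record_buffer);
    return false;
  }

  // Signal the end of rows; the caller invokes this after the last select only.
  bool send_eof() override
  {
    client.eof = true;
    return false;
  }

private:
  Client &client;
  Row record_buffer;
  bool fields_sent = false;
};
```

Because every select of the unit shares one interceptor object, the header is
sent exactly once and the temporary table of the base class stays empty.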
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that takes as a parameter the info that says which fields are to be
stored in the record buffer.
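One possible shape for such a per-select structure is sketched below. All names
(ConversionMap, build_conversion_map, columns_to_buffer, kMaxColumns) are
hypothetical, column types are compared as plain strings, and any difference
from the result type is assumed to require a conversion; the real rules would
be finer-grained.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxColumns = 64;          // assumed limit for this sketch
using ConversionMap = std::bitset<kMaxColumns>;  // bit k set: column k is converted

// Build the map for one select by comparing its column types with the
// column types of the union result.
ConversionMap build_conversion_map(const std::vector<std::string> &select_types,
                                   const std::vector<std::string> &result_types)
{
  assert(select_types.size() == result_types.size());
  assert(select_types.size() <= kMaxColumns);
  ConversionMap map;
  for (std::size_t k = 0; k < select_types.size(); ++k)
    if (select_types[k] != result_types[k])
      map.set(k);
  return map;
}

// What a map-aware variant of fill_record would consult: the columns that
// must go through the record buffer. All other columns can be sent directly.
std::vector<std::size_t> columns_to_buffer(const ConversionMap &map,
                                           std::size_t column_count)
{
  std::vector<std::size_t> columns;
  for (std::size_t k = 0; k < column_count; ++k)
    if (map.test(k))
      columns.push_back(k);
  return columns;
}
```

The map is computed once per select, so the per-row cost in send_data is a
single bit test per column.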
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec will
need substantial rework.
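The point at which execution would switch from one interceptor to the other can
be computed from the operator sequence alone. The following is a minimal
sketch; SetOp and first_streamed_select are hypothetical names introduced for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class SetOp { Union, UnionAll };

// ops[i] is the operation between select i and select i + 1.
// Returns the index of the first select whose rows could be sent directly to
// the output stream. Selects with a smaller index still use the temporary
// table. A return value equal to the number of selects means none qualifies.
std::size_t first_streamed_select(const std::vector<SetOp> &ops)
{
  std::size_t first = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union)
      first = i + 2;  // select i + 1 is the second operand of this UNION
  return first;
}
```

For a unit of the form s0 UNION s1 UNION ALL s2 the function returns 2, so only
s2 would use the select_union_send interceptor. For a unit with UNION ALL only
it returns 0, which is the case covered in section 2.1.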
3. Other possible optimizations for union units
===============================================
The following optimizations are not intended to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. However, for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause
this is not necessary. In this case the rows could be sent directly to the
client. The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in the most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1), (2), (4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs later in the sequence.
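This replacement rule can be written as a single backward pass over the
operator sequence. The sketch below is illustrative only; SetOp and
normalize_ops are hypothetical names, not identifiers from the server.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class SetOp { Union, UnionAll };

// Rewrites every UNION ALL that is followed later in the sequence by a UNION
// into a UNION. The result describes an equivalent union unit.
std::vector<SetOp> normalize_ops(std::vector<SetOp> ops)
{
  bool union_seen_later = false;
  for (std::size_t i = ops.size(); i-- > 0; )
  {
    if (ops[i] == SetOp::Union)
      union_seen_later = true;
    else if (union_seen_later)
      ops[i] = SetOp::Union;
  }
  return ops;
}
```

Applied to query (4) the pass yields the operator sequence of query (1), while
the sequences of queries (2) and (3) are returned unchanged.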
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
union all
((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then it
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns with the same
number k there is a type to which the types of the columns can be coerced. This
type is considered as the type of column k of the result set returned by the
union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
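The column-by-column search for a common type can be sketched as a fold over
the selects. The model below is deliberately reduced to the types used in the
example; Kind, ColumnType, aggregate and result_types are hypothetical names,
the widening order int < bigint < double is an assumption of the sketch, and
collations as well as mixed numeric/string columns are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, much-reduced type model covering only the example above.
enum class Kind { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColumnType
{
  Kind kind;
  std::size_t length;  // only meaningful for Varchar
};

// Common type of two columns: numeric kinds widen in enum order,
// varchar columns take the larger length.
ColumnType aggregate(const ColumnType &a, const ColumnType &b)
{
  const bool a_str = a.kind == Kind::Varchar;
  const bool b_str = b.kind == Kind::Varchar;
  assert(a_str == b_str);  // mixed numeric/string coercion is outside this sketch
  if (a_str)
    return {Kind::Varchar, std::max(a.length, b.length)};
  return {std::max(a.kind, b.kind), 0};
}

// Result column types of a union unit, given the column types of each select.
std::vector<ColumnType>
result_types(const std::vector<std::vector<ColumnType>> &selects)
{
  assert(!selects.empty());
  std::vector<ColumnType> result = selects.front();
  for (std::size_t s = 1; s < selects.size(); ++s)
  {
    assert(selects[s].size() == result.size());  // same number of columns
    for (std::size_t k = 0; k < result.size(); ++k)
      result[k] = aggregate(result[k], selects[s][k]);
  }
  return result;
}
```

In a real implementation the two assertions would be reported to the user as
validation errors instead of aborting.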
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error caused by an attempt to write a duplicate row is
ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly to the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. When the result set is so large that
the temporary table cannot be held in main memory the performance gains would be
significant. Besides, the client could get the first result rows immediately, as
it would not have to wait until all selects have been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
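The per-select bitmap can be illustrated with a small model. This is only a
sketch under stated assumptions: rows are vectors of strings, ConversionMap,
SendStats, convert_via_record_buffer and send_row are hypothetical names that
do not exist in the server, and the "conversion" is a placeholder that merely
marks the value as having passed through the record buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical per-select descriptor: needs_conversion[i] is true when
// column i of this select has to go through the record buffer.
using ConversionMap = std::vector<bool>;

// Counters that show how many fields took each path.
struct SendStats {
  std::size_t converted = 0;
  std::size_t direct = 0;
};

// Placeholder for "store in the record buffer and read back converted".
// Here it only pretends to widen an integer literal to a double literal.
std::string convert_via_record_buffer(const std::string &value) {
  return value + ".0";
}

// Builds the row that would be written to the output stream: fields flagged
// in the map are converted, all other fields are passed through untouched.
Row send_row(const Row &row, const ConversionMap &needs_conversion,
             SendStats &stats) {
  assert(row.size() == needs_conversion.size());
  Row out;
  out.reserve(row.size());
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (needs_conversion[i]) {
      out.push_back(convert_via_record_buffer(row[i]));
      ++stats.converted;
    } else {
      out.push_back(row[i]);
      ++stats.direct;
    }
  }
  return out;
}
```

Each select would carry its own map, because a column that needs conversion
in one select may already have the result type in another.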
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the current
implementation. More exactly, the rows from any select that follows the second
operand of the last UNION operation could be sent directly to the output
stream. In this case two interceptor objects have to be created: one, of the
type select_union, is shared by the selects for which UNION operations are
performed; another, of the type select_union_send, is shared by the remaining
selects. For this optimization the method SELECT_LEX_UNIT::exec has to undergo
a serious re-work.
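The split between the two interceptor objects can be computed from the
sequence of operations alone. A minimal sketch, assuming the unit is at the
top level and has no ORDER BY (the caller is expected to have checked that);
UnionOp and first_streamable_select are illustrative names, not server code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };  // UNION versus UNION ALL

// ops[i] is the operation between select i and select i+1, so a unit of
// n selects has n-1 operations. Returns the index of the first select whose
// rows can bypass the temporary table. A return value equal to the number
// of selects means that no select can bypass it.
std::size_t first_streamable_select(const std::vector<UnionOp> &ops) {
  std::size_t first = 0;  // with no UNION at all every select can stream
  for (std::size_t i = 0; i < ops.size(); ++i) {
    // The second operand of this UNION is select i+1, so streaming may
    // start at select i+2 at the earliest.
    if (ops[i] == UnionOp::Distinct) first = i + 2;
  }
  assert(first <= ops.size() + 1);
  return first;
}
```

Selects with an index below the returned value would share the select_union
object, the rest would share the select_union_send object. For a sequence
UNION, UNION ALL over three selects the function returns 2, so only the third
select is streamed; for UNION ALL, UNION it returns 3 and nothing is streamed.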
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in the most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);   (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);   (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);   (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);   (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
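These equivalences can be checked on a toy model. A minimal sketch, assuming
that a row is a single integer and that a select is just a list of rows; Rows,
Operand, eval_union_unit and same_rows are illustrative names and are not
taken from the server code. The unit is evaluated from left to right: UNION
ALL appends the rows, UNION appends them and then removes duplicates from
everything accumulated so far.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: a row is one int, a select result is a list of rows.
using Rows = std::vector<int>;

struct Operand {
  bool distinct;  // true: attached with UNION, false: attached with UNION ALL
  Rows rows;      // result set of this select
};

// Evaluates the unit left to right. The 'distinct' flag of the first
// operand is ignored because no operation precedes it.
Rows eval_union_unit(const std::vector<Operand> &operands) {
  assert(!operands.empty());
  Rows acc = operands[0].rows;
  for (std::size_t i = 1; i < operands.size(); ++i) {
    acc.insert(acc.end(), operands[i].rows.begin(), operands[i].rows.end());
    if (operands[i].distinct) {
      // UNION removes duplicates from all rows accumulated so far.
      std::set<int> seen;
      Rows unique;
      for (int r : acc)
        if (seen.insert(r).second) unique.push_back(r);
      acc = unique;
    }
  }
  return acc;
}

// Compares two results as multisets, since row order is not significant.
bool same_rows(Rows a, Rows b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

With the inputs {1,1,2}, {2,3} and {3,4}, the patterns of queries (1) and (4)
both give the rows 1,2,3,4, while the pattern of query (3) keeps the second 3
contributed by the last select.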
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns with the same
number k there is a type to which the types of the columns can be coerced. This
type is considered as the type of column k of the result set returned by the
union unit. For example, if in the query (1) the columns b1, b2 and b3 are of
the types int, bigint and double respectively then the second column of the
union unit will be of the type double. If the types of the columns c1,c2,c3 are
specified as varchar(10), varchar(20), varchar(10) then the type of the
corresponding column of the result set will be varchar(20). If the columns have
different collations then a collation from which all these collations can be
derived is looked for and it is assigned as the collation of the third column
in the result set.
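The two examples above can be reproduced with a deliberately simplified model
of type aggregation. This is a sketch under strong assumptions: only four
types are modelled, they are assumed to be totally ordered from narrowest to
widest, and collations are left out. BaseType, ColumnType and
aggregate_column_type are hypothetical names; the rules actually applied by
the server are more elaborate.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified column descriptor. Enumerators are listed from
// the narrowest to the widest type, so the maximum is the coercion target.
enum class BaseType { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColumnType {
  BaseType base;
  unsigned length;  // meaningful for Varchar only, 0 otherwise
};

// Returns a type to which all the given column types can be coerced:
// the widest base type together with the largest declared length.
ColumnType aggregate_column_type(const std::vector<ColumnType> &types) {
  assert(!types.empty());
  ColumnType result = types[0];
  for (const ColumnType &t : types) {
    result.base = std::max(result.base, t.base);
    result.length = std::max(result.length, t.length);
  }
  return result;
}
```

Applied to int, bigint and double the function yields double; applied to
varchar(10), varchar(20) and varchar(10) it yields varchar(20).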
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate row
is ignored.
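The interplay of the unique index and the ignored duplicate-key error can be
sketched as follows. This is a toy model, not server code: TempTable,
SelectResult, send_data and exec_union_unit are hypothetical stand-ins, a row
is a single string, and the only behaviour modelled is that a write is
rejected while the unique index is enabled and the row is already present.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the temporary table of a union unit.
class TempTable {
 public:
  enum class WriteResult { Ok, DuplicateKey };

  void enable_unique_index() { unique_enabled_ = true; }
  void disable_unique_index() { unique_enabled_ = false; }

  // Plays the role of ha_write_row: a duplicate is rejected only while
  // the unique index is enabled.
  WriteResult write_row(const std::string &row) {
    if (unique_enabled_ && index_.count(row)) return WriteResult::DuplicateKey;
    index_.insert(row);
    rows_.push_back(row);
    return WriteResult::Ok;
  }

  const std::vector<std::string> &rows() const { return rows_; }

 private:
  bool unique_enabled_ = false;
  std::set<std::string> index_;
  std::vector<std::string> rows_;
};

struct SelectResult {
  bool distinct;  // true if the operation before this select is UNION
  std::vector<std::string> rows;
};

// Plays the role of select_union::send_data. A duplicate-key result is
// deliberately ignored. Returns false for "no error".
bool send_data(TempTable &table, const std::string &row) {
  (void)table.write_row(row);
  return false;
}

// Enables the index before the first select if any UNION occurs and
// disables it after the second operand of the last UNION has been executed.
std::vector<std::string> exec_union_unit(
    const std::vector<SelectResult> &selects) {
  assert(!selects.empty());
  TempTable table;
  std::size_t last_distinct = 0;  // 0 means "no UNION operation in the unit"
  for (std::size_t i = 1; i < selects.size(); ++i)
    if (selects[i].distinct) last_distinct = i;
  if (last_distinct > 0) table.enable_unique_index();
  for (std::size_t i = 0; i < selects.size(); ++i) {
    for (const std::string &row : selects[i].rows) send_data(table, row);
    if (last_distinct > 0 && i == last_distinct) table.disable_unique_index();
  }
  return table.rows();
}
```

For a unit shaped like query (3), rows of the third select are written after
the index has been disabled, so a value already present in the table is
stored a second time.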
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
big enough and the temporary table cannot be allocated in the main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object of the class that
we'll call select_union_send (by analogy with the class select_send) shall
inherit from the select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More exactly, the rows from any select that follows
after the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed, another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.
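The split between the two interceptors can be pictured with a small C++ sketch.
It is only an illustration under an assumed representation: ops[i] is taken to
be the operation between select i and select i+1, and the names SetOp and
first_streaming_select are hypothetical, not taken from the server code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: ops[i] is the operation between select i and select i+1.
enum class SetOp { Union, UnionAll };

// Returns the index of the first select whose rows could bypass the temporary
// table. Selects before that index are operands of some UNION (directly or by
// the equivalence rule) and must share the deduplicating interceptor.
// If the result equals the number of selects, no select can stream.
std::size_t first_streaming_select(const std::vector<SetOp> &ops)
{
  std::size_t first = 0;            // no UNION at all: every select can stream
  for (std::size_t i = 0; i < ops.size(); i++)
  {
    if (ops[i] == SetOp::Union)
      first = i + 2;                // skip both operands of this UNION
  }
  return first;
}
```

For query (3) the function returns 2, so only the third select would use the
direct interceptor. The rows accumulated in the temporary table by the first two
selects would presumably still have to be read back and sent to the client.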
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
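Idea 2 from the list above could look roughly like the following C++ sketch. It
is a toy model, not server code: a row is reduced to a single string, a hash set
plays the role of the lookup in the temporary table, and the class name
streaming_distinct is hypothetical.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Toy model: `seen_` stands in for the unique index of the temporary table and
// `client_` stands in for the output stream.
class streaming_distinct
{
  std::unordered_set<std::string> seen_;
  std::vector<std::string> &client_;
public:
  explicit streaming_distinct(std::vector<std::string> &client)
    : client_(client) {}

  // Forward the row at once unless an identical row has already been sent.
  void send_data(const std::string &row)
  {
    if (seen_.insert(row).second)   // true only for a row not seen before
      client_.push_back(row);
  }
};
```

The rows still have to be remembered for the duplicate check, so this variant
would save the final read pass over the temporary table, not the storage.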
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that would not use a temporary table at all in certain, most common
cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
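As an illustration only, this rule can be modelled in a few lines of C++. The
sketch assumes a hypothetical representation in which ops[i] is the operation
between select i and select i+1; the names SetOp and normalize_ops are not taken
from the server code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: ops[i] is the operation between select i and select i+1.
enum class SetOp { Union, UnionAll };

// Replace every UNION ALL that is followed by a later UNION with UNION.
// The result set does not change, because the later UNION removes duplicates
// from everything accumulated so far.
std::vector<SetOp> normalize_ops(std::vector<SetOp> ops)
{
  bool union_seen_later = false;
  for (std::size_t i = ops.size(); i-- > 0; )   // walk from the last operation
  {
    if (ops[i] == SetOp::Union)
      union_seen_later = true;
    else if (union_seen_later)
      ops[i] = SetOp::Union;
  }
  return ops;
}
```

Applied to query (4) the function yields the operation sequence of query (1),
while the sequence of query (3) is returned unchanged.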
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
union all
((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It also can be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then it
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns with the same
number k there is a type to which the types of the columns can be coerced. This
type is considered as the type of column k of the result set returned by the
union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
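The two type examples above can be reproduced with a deliberately simplified C++
model. It is not the server's coercion logic: the sketch assumes only three
numeric types that widen in the order int, bigint, double, plus varchar with a
length, and it does not model collations or numeric/string mixes. All names in
it are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified column type.
enum class Kind { Int, BigInt, Double, Varchar };   // numeric kinds in widening order

struct ColType
{
  Kind kind;
  unsigned length;   // only meaningful for Varchar
};

// Common type of two column types in this toy model.
// Returns false for combinations the sketch does not model.
bool common_type(const ColType &a, const ColType &b, ColType *out)
{
  const bool a_str = a.kind == Kind::Varchar;
  const bool b_str = b.kind == Kind::Varchar;
  if (a_str != b_str)
    return false;                                   // numeric/string mix: not modelled
  if (a_str)
    *out = ColType{Kind::Varchar, std::max(a.length, b.length)};
  else
    *out = ColType{std::max(a.kind, b.kind), 0};    // wider numeric kind wins
  return true;
}

// Type of column k of the union unit, given the type of column k in each select.
bool union_column_type(const std::vector<ColType> &cols, ColType *out)
{
  if (cols.empty())
    return false;
  ColType acc = cols[0];
  for (std::size_t i = 1; i < cols.size(); i++)
  {
    if (!common_type(acc, cols[i], &acc))
      return false;
  }
  *out = acc;
  return true;
}
```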
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it's executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. It is achieved by enabling and disabling usage of the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible error on duplicate key that occurs with an attempt to write a duplicate
row is ignored.
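The interplay of the unique index and the ignored duplicate-key error can be
sketched in C++ as follows. This is a toy model under stated assumptions, not
the server code: a row is a single string, the class TmpTable and the function
exec_union are hypothetical, and ops[i] is taken to be the operation between
select i and select i+1.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class SetOp { Union, UnionAll };

// Toy stand-in for the temporary table with a switchable unique index.
class TmpTable
{
  std::vector<std::string> rows_;
  std::set<std::string> unique_index_;
  bool index_enabled_;
public:
  TmpTable() : index_enabled_(false) {}
  void enable_index()  { index_enabled_ = true; }
  void disable_index() { index_enabled_ = false; }

  // Returns false on a duplicate-key error, true if the row was written.
  bool write_row(const std::string &row)
  {
    if (index_enabled_ && !unique_index_.insert(row).second)
      return false;
    rows_.push_back(row);
    return true;
  }
  const std::vector<std::string> &rows() const { return rows_; }
};

// Analogue of send_data: a duplicate-key error is silently ignored.
void send_data(TmpTable &table, const std::string &row)
{
  (void) table.write_row(row);
}

// Analogue of the execution loop: the index is enabled before the first
// select and disabled after the second operand of the last UNION.
std::vector<std::string> exec_union(
    const std::vector<std::vector<std::string> > &selects,
    const std::vector<SetOp> &ops)
{
  int last_distinct = -1;           // index of the last select under a UNION
  for (std::size_t i = 0; i < ops.size(); i++)
  {
    if (ops[i] == SetOp::Union)
      last_distinct = static_cast<int>(i) + 1;
  }
  TmpTable table;
  if (last_distinct >= 0)
    table.enable_index();
  for (std::size_t i = 0; i < selects.size(); i++)
  {
    for (const std::string &row : selects[i])
      send_data(table, row);
    if (static_cast<int>(i) == last_distinct)
      table.disable_index();
  }
  return table.rows();
}
```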
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to result of the union unit then the
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
big enough and the temporary table cannot be allocated in the main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object of the class that
we'll call select_union_send (by analogy with the class select_send) shall
inherit from the select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send : public select_union
  {
    ... // private structures
  public:
    select_union_send() : select_union(), ... {...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has to
undergo a serious re-work.
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain commonly used cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
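The equivalence of queries (1) and (4) can be checked with a small reference
evaluator. This is only a sketch under stated assumptions: rows are modelled as
vectors of integers, each select is a ready-made list of rows, and the names
SetOp, Row and eval_union_sequence are hypothetical and do not exist in the
server. The evaluator applies the operations left to right, so a later UNION
removes every duplicate accumulated so far, including those let through by an
earlier UNION ALL.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class SetOp { Union, UnionAll };
using Row = std::vector<int>;  // hypothetical row representation

// ops[i] combines the accumulated result with selects[i + 1].
// Assumes ops.size() == selects.size() - 1.
// The result is returned sorted so that two results compare as multisets.
std::vector<Row> eval_union_sequence(
    const std::vector<std::vector<Row>> &selects,
    const std::vector<SetOp> &ops) {
  std::vector<Row> acc;
  if (selects.empty()) return acc;
  acc = selects[0];
  for (std::size_t i = 0; i < ops.size(); ++i) {
    acc.insert(acc.end(), selects[i + 1].begin(), selects[i + 1].end());
    if (ops[i] == SetOp::Union) {
      // UNION: drop duplicates from everything accumulated so far.
      std::sort(acc.begin(), acc.end());
      acc.erase(std::unique(acc.begin(), acc.end()), acc.end());
    }
  }
  std::sort(acc.begin(), acc.end());
  return acc;
}
```

With the operator lists {Union, Union} and {UnionAll, Union} the evaluator
returns the same rows, while {Union, UnionAll} can return more rows because the
last operand is appended without duplicate removal.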
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
union all
((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It also can be used as a subquery.
A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects return
the same number of columns and that for each set of columns with the same position k
there is a type to which the types of the columns can be coerced. This type is
considered as the type of column k of the result set returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
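The two type examples above can be reproduced with a small sketch (C++17).
It covers only the cases mentioned in the text: the numeric chain int, bigint,
double and varchar columns of different lengths. The names Kind, ColumnType,
unify and result_type are hypothetical, collations are left out, and mixing
numeric and string columns is deliberately reported as "not modelled" rather
than guessed at.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Numeric kinds are listed in widening order, so the larger value wins.
enum class Kind { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColumnType {
  Kind kind;
  unsigned length;  // only meaningful for Varchar
};

// Returns the common type of two columns, or nullopt for cases
// this sketch does not model (numeric mixed with varchar).
std::optional<ColumnType> unify(const ColumnType &a, const ColumnType &b) {
  const bool a_str = a.kind == Kind::Varchar;
  const bool b_str = b.kind == Kind::Varchar;
  if (a_str != b_str) return std::nullopt;
  if (a_str) return ColumnType{Kind::Varchar, std::max(a.length, b.length)};
  return ColumnType{std::max(a.kind, b.kind), 0u};
}

// Folds unify over column k of every select in the union unit.
std::optional<ColumnType> result_type(const std::vector<ColumnType> &column_k) {
  if (column_k.empty()) return std::nullopt;
  ColumnType acc = column_k.front();
  for (std::size_t i = 1; i < column_k.size(); ++i) {
    std::optional<ColumnType> u = unify(acc, column_k[i]);
    if (!u) return std::nullopt;
    acc = *u;
  }
  return acc;
}
```

Folding pairwise works here because both rules (widest numeric kind, longest
varchar) do not depend on the order of the operands.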
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it's executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible error on duplicate key that occurs with an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then the
rows read from the temporary table have to be sorted first.
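The execution scheme described in this section can be summarized by the
following simplified model. It is a sketch, not server code: TempTable,
exec_union_unit, SetOp and Row are hypothetical names, a std::set plays the role
of the unique index, and the final read phase is reduced to returning the
accumulated rows. ORDER BY is not modelled.

```cpp
#include <cstddef>
#include <set>
#include <vector>

enum class SetOp { Union, UnionAll };
using Row = std::vector<int>;  // hypothetical row representation

class TempTable {
  std::vector<Row> rows_;
  std::set<Row> unique_index_;
  bool unique_enabled_ = false;

public:
  void enable_unique(bool on) { unique_enabled_ = on; }

  // Returns false on a duplicate-key "error"; the caller ignores it.
  bool write_row(const Row &row) {
    if (unique_enabled_ && !unique_index_.insert(row).second) return false;
    rows_.push_back(row);
    return true;
  }

  const std::vector<Row> &rows() const { return rows_; }
};

// ops[i] combines select i with select i + 1.
std::vector<Row> exec_union_unit(const std::vector<std::vector<Row>> &selects,
                                 const std::vector<SetOp> &ops) {
  const std::size_t none = ops.size();
  std::size_t last_union = none;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union) last_union = i;

  TempTable table;
  // The index is used only if at least one UNION occurs in the unit.
  table.enable_unique(last_union != none);

  for (std::size_t i = 0; i < selects.size(); ++i) {
    for (const Row &row : selects[i]) table.write_row(row);
    // Disable the index once the second operand of the last UNION is done.
    if (last_union != none && i == last_union + 1) table.enable_unique(false);
  }
  return table.rows();  // stands in for the final read from the table
}
```

Note that every row passes through the table even when no UNION is present,
which is the cost the optimizations below try to avoid.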
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
big enough and the temporary table cannot be allocated in the main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object of the class that
we'll call select_union_send (by analogy with the class select_send) shall
inherit from the select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send : public select_union
  {
    ... // private structures
  public:
    select_union_send() : select_union(), ... {...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
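The intended division of work between send_fields, send_data and send_eof can
be illustrated outside the server with a self-contained sketch. All names in it
(ResultSink, UnionSendSink, run_union_all, Row) are hypothetical stand-ins, the
"output stream" is just a vector of text lines, and type conversion through the
record buffer is omitted. In this sketch every method returns false on success.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A row is a list of field values that are already converted to text.
using Row = std::vector<std::string>;

// Plays the role of the interceptor interface.
class ResultSink {
public:
  virtual ~ResultSink() = default;
  virtual bool send_fields(const Row &column_names) = 0;
  virtual bool send_data(const Row &row) = 0;
  virtual bool send_eof() = 0;
};

// Streams rows straight to the output; shared by all selects of the unit.
class UnionSendSink : public ResultSink {
  std::vector<std::string> &out_;
  bool header_sent_ = false;

  static std::string join(const Row &r) {
    std::string s;
    for (std::size_t i = 0; i < r.size(); ++i) {
      if (i) s += ',';
      s += r[i];
    }
    return s;
  }

public:
  explicit UnionSendSink(std::vector<std::string> &out) : out_(out) {}

  // The row format is sent only once, before the first select sends rows.
  bool send_fields(const Row &column_names) override {
    if (!header_sent_) {
      out_.push_back("FIELDS " + join(column_names));
      header_sent_ = true;
    }
    return false;
  }

  bool send_data(const Row &row) override {
    out_.push_back("ROW " + join(row));
    return false;
  }

  // Called once, after the last select has finished.
  bool send_eof() override {
    out_.push_back("EOF");
    return false;
  }
};

// Hypothetical driver: each "select" is a ready-made list of rows.
inline void run_union_all(const Row &names,
                          const std::vector<std::vector<Row>> &selects,
                          ResultSink &sink) {
  for (const auto &rows : selects) {
    sink.send_fields(names);
    for (const auto &row : rows) sink.send_data(row);
  }
  sink.send_eof();
}
```

The point of the sketch is that the first row reaches the output while later
selects have not yet been executed, and that the header and the end-of-data
marker are each emitted exactly once for the whole unit.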
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More precisely, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as it has been checked by a lookup in the temporary table
that it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common
cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
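The substitution rule above can be illustrated with a small sketch. It is not
server code: the names SetOp, last_union_position and normalize are
hypothetical, and the operators of a union unit are modelled simply as a
vector, one entry per operator between consecutive selects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the operators between consecutive selects of a unit.
enum class SetOp { Union, UnionAll };

// Returns the position of the last UNION in ops, or -1 if there is none.
int last_union_position(const std::vector<SetOp> &ops) {
  int last = -1;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i] == SetOp::Union) last = static_cast<int>(i);
  }
  return last;
}

// Replaces every UNION ALL that is followed by a later UNION with UNION.
// Operators after the last UNION are left untouched.
std::vector<SetOp> normalize(std::vector<SetOp> ops) {
  const int last = last_union_position(ops);
  assert(last < static_cast<int>(ops.size()));
  for (int i = 0; i < last; ++i) {
    ops[static_cast<std::size_t>(i)] = SetOp::Union;
  }
  return ops;
}
```

Under this model the operator sequence of query (4), UNION ALL then UNION,
normalizes to that of query (1), while the sequence of query (3), UNION then
UNION ALL, stays as it is.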
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
union all
((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns at the same
position k there is a type to which the types of the columns can be coerced.
This type is considered as the type of column k of the result set returned by
the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
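The two coercion examples above can be mimicked with a toy model. This is only
a sketch under strong assumptions: it covers nothing but the int/bigint/double
and varchar(n) cases named in the text, the real server rules are much richer,
and the names NumType, common_numeric_type and common_varchar_length are
invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical ranking of the numeric types from the example:
// a wider type has a larger enumerator value.
enum class NumType { Int = 0, BigInt = 1, Double = 2 };

// The result type of column k is the widest type found at position k.
NumType common_numeric_type(const std::vector<NumType> &column_types) {
  assert(!column_types.empty());
  return *std::max_element(column_types.begin(), column_types.end());
}

// For varchar columns the result length is the largest declared length.
unsigned common_varchar_length(const std::vector<unsigned> &lengths) {
  assert(!lengths.empty());
  return *std::max_element(lengths.begin(), lengths.end());
}
```

With the types from the example, the second column comes out as double and the
third as varchar(20).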
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it's executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
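The enable/disable behaviour of the unique index can be simulated outside the
server. The sketch below is an illustration only: the temporary table is a
std::vector, the unique index is a std::set, rows are vectors of int, and the
function name run_union_unit is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = std::vector<int>;

// is_union[i] is true when the operator between select i and select i+1 is
// UNION, and false when it is UNION ALL.
std::vector<Row> run_union_unit(const std::vector<std::vector<Row>> &selects,
                                const std::vector<bool> &is_union) {
  assert(!selects.empty() && is_union.size() + 1 == selects.size());

  int last_union = -1;
  for (std::size_t i = 0; i < is_union.size(); ++i) {
    if (is_union[i]) last_union = static_cast<int>(i);
  }

  std::vector<Row> temp_table;
  std::set<Row> unique_index;
  // The index is never used when the unit contains only UNION ALL.
  bool index_enabled = last_union >= 0;

  for (std::size_t s = 0; s < selects.size(); ++s) {
    for (const Row &row : selects[s]) {
      // A duplicate-key "error" is ignored: the row is simply skipped.
      if (index_enabled && !unique_index.insert(row).second) continue;
      temp_table.push_back(row);
    }
    // Select last_union + 1 is the second operand of the last UNION;
    // once it has finished, the index is switched off.
    if (static_cast<int>(s) == last_union + 1) index_enabled = false;
  }
  return temp_table;
}
```

For three selects returning {1,1,2}, {2,3} and {3,3}, joined by UNION and then
UNION ALL, the simulated table ends up holding 1, 2, 3, 3, 3: duplicates are
removed up to the second select and kept afterwards.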
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
big enough and the temporary table cannot be allocated in the main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
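How such an interceptor could plug into the existing one can be shown with a
self-contained toy. Nothing here is server code: ToySelectUnion and
ToySelectUnionSend are hypothetical stand-ins, rows are vectors of strings, the
client is a plain vector, and returning false on success is an assumed
convention.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Toy stand-in for select_union: accumulates rows in a "temporary table".
class ToySelectUnion {
public:
  virtual ~ToySelectUnion() = default;
  virtual bool send_data(const Row &row) {
    table.push_back(row);
    return false;  // assumed convention: false means success
  }
  virtual bool send_eof() { return false; }
  std::vector<Row> table;
};

// Toy stand-in for the proposed select_union_send: the row goes through the
// record buffer only, then straight to the client; the table stays empty.
class ToySelectUnionSend : public ToySelectUnion {
public:
  explicit ToySelectUnionSend(std::vector<Row> &client) : client_(client) {}
  bool send_data(const Row &row) override {
    record_buffer_ = row;               // stand-in for fill_record
    client_.push_back(record_buffer_);  // stand-in for writing to the stream
    return false;
  }
  bool send_eof() override {
    eof_sent = true;  // signal the end of rows after the last select
    return false;
  }
  bool eof_sent = false;

private:
  std::vector<Row> &client_;
  Row record_buffer_;
};
```

Because the selects only see the base-class interface, swapping the interceptor
object is enough to change where the rows end up; the code driving the selects
does not need to know which of the two it is talking to.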
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
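A minimal sketch of the per-select bitmap idea follows. It assumes fields are
plain strings, models the bitmap as std::vector<bool>, and uses an invented
conversion (appending ".0" to imitate widening an integer to a double); the
names convert_field and send_row are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical conversion: an integer rendered as text becomes a double
// rendered as text.
std::string convert_field(const std::string &value) { return value + ".0"; }

// Sends one row to `out`. Only fields flagged in needs_conversion pass through
// the record buffer; the others are sent directly. Returns the number of
// fields that were copied into the record buffer.
std::size_t send_row(const std::vector<std::string> &fields,
                     const std::vector<bool> &needs_conversion,
                     std::vector<std::string> &out) {
  assert(fields.size() == needs_conversion.size());
  std::vector<std::string> record_buffer(fields.size());
  std::size_t copied = 0;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (needs_conversion[i]) {
      // Stand-in for the fill_record variant that stores selected fields only.
      record_buffer[i] = convert_field(fields[i]);
      out.push_back(record_buffer[i]);
      ++copied;
    } else {
      out.push_back(fields[i]);  // no conversion needed: bypass the buffer
    }
  }
  return copied;
}
```

Each select would carry its own bitmap, since a column that needs conversion in
one select may already have the result type in another.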
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More precisely, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send
any row received from an operand of a UNION operation directly to the output
stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
- (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
- (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+ (select a1,b1,c3 from t1 where a1=b1) union
+ (select a2,b2,c3 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
- (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+ (select a1,b1,c1 from t1 where a1=b1) union all
+ (select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
+
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
- ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
- ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+ ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+ union all
+ ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called 'union
unit' if it s not a part of another such sequence.
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
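The substitution rule can be sketched in a few lines of C++. This is only an
illustration under stated assumptions: the SetOp type and the normalize_ops
function are hypothetical names made up for this note and do not exist in the
server code. The vector holds the operations in the order they appear between
the selects.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch; it is not a MySQL server type.
enum class SetOp { Union, UnionAll };

// ops[i] is the operation that joins select i and select i+1.
// Returns an equivalent sequence in which every UNION ALL placed before
// the last UNION has been replaced by UNION.
std::vector<SetOp> normalize_ops(std::vector<SetOp> ops) {
  std::size_t last_union = ops.size();  // ops.size() means "no UNION found"
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union) last_union = i;
  if (last_union == ops.size()) return ops;  // only UNION ALL: nothing to change
  for (std::size_t i = 0; i < last_union; ++i) ops[i] = SetOp::Union;
  return ops;
}
```

With this sketch the operation sequence of query (4) normalizes to that of
query (1), while the sequences of queries (2) and (3) stay unchanged.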
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
union all
((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. It checks that the selects return
the same number of columns and that for each set of columns with the same number k
there is a type to which the types of the columns can be coerced. This type is
taken as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and assigned as the collation of the third column in the result set.
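The two examples above can be reproduced with a toy sketch. It is an assumption
made for illustration only: the NumType ranking and both function names are
hypothetical, cover just the types named in the examples, and say nothing about
how the server really aggregates types or collations.

```cpp
#include <algorithm>
#include <vector>

// Toy numeric ranking, wide enough only for the example in the text.
enum class NumType { Int = 0, BigInt = 1, Double = 2 };

// Picks the widest type among the columns at the same position.
// An empty input yields NumType::Int in this sketch.
NumType aggregate_numeric(const std::vector<NumType>& column_types) {
  NumType result = NumType::Int;
  for (NumType t : column_types) result = std::max(result, t);
  return result;
}

// For varchar columns the result length is the longest declared length.
unsigned aggregate_varchar_length(const std::vector<unsigned>& lengths) {
  unsigned result = 0;
  for (unsigned len : lengths) result = std::max(result, len);
  return result;
}
```

For the columns b1, b2, b3 the first function gives double, and for c1, c2, c3
the second gives 20, matching the result types described above.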
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it's executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
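The effect of enabling and disabling the unique index can be simulated with a
small, self-contained C++ sketch. Everything in it is an assumption made for
illustration: rows are modelled as strings, the temporary table as a vector,
the unique index as a std::set, and run_union_unit is a hypothetical name, not
a server function.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class SetOp { Union, UnionAll };  // hypothetical, for this sketch only
using Row = std::string;               // a whole row modelled as one string

// selects[i] holds the rows produced by select i;
// ops[i] is the operation between select i and select i+1.
std::vector<Row> run_union_unit(const std::vector<std::vector<Row>>& selects,
                                const std::vector<SetOp>& ops) {
  assert(selects.empty() ? ops.empty() : ops.size() + 1 == selects.size());
  // Number of leading selects written while the unique index is enabled:
  // everything up to and including the second operand of the last UNION.
  std::size_t distinct_selects = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union) distinct_selects = i + 2;

  std::vector<Row> temp_table;
  std::set<Row> unique_index;
  for (std::size_t s = 0; s < selects.size(); ++s) {
    const bool index_enabled = s < distinct_selects;
    for (const Row& row : selects[s]) {
      // A failed insert plays the role of the ignored duplicate-key error.
      if (index_enabled && !unique_index.insert(row).second) continue;
      temp_table.push_back(row);
    }
  }
  return temp_table;
}
```

With three selects returning {a, b}, {b, c} and {a, a}, the operation sequence
of query (3) leaves a, b, c, a, a in the table, while that of query (4) leaves
only a, b, c.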
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then the
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. When the result set is so
big that the temporary table cannot be allocated in main memory, the
performance gains would be significant. Besides, the client could get the first
result rows at once, as it would not have to wait until all selects have been
executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
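The idea of swapping the interceptor can be tried out with a mock that compiles
on its own. This is a sketch under assumptions, not server code: the classes
MockSelectUnion and MockSelectUnionSend are hypothetical stand-ins, the client
connection is modelled as a vector of rows, and returning false on success is
assumed here as the convention.

```cpp
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Stand-in for the role of select_union: collects rows in a "temporary table".
class MockSelectUnion {
 public:
  virtual ~MockSelectUnion() = default;
  // Returns false on success (assumed convention for this sketch).
  virtual bool send_data(const Row& row) {
    temp_table.push_back(row);
    return false;
  }
  std::vector<Row> temp_table;
};

// Stand-in for the proposed select_union_send: the row passes through the
// record buffer and goes straight to the client, never to the table.
class MockSelectUnionSend : public MockSelectUnion {
 public:
  explicit MockSelectUnionSend(std::vector<Row>& out) : client(out) {}
  bool send_data(const Row& row) override {
    record_buffer = row;  // the place where type conversions would happen
    client.push_back(record_buffer);
    return false;
  }

 private:
  Row record_buffer;
  std::vector<Row>& client;
};
```

Code that holds only a reference to the base class keeps calling send_data as
before; which destination the rows reach depends solely on the object that was
handed to it, which is the property the proposal relies on.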
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
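A possible shape of such a bitmap-driven send_data is sketched below. All names
are hypothetical and chosen for this note only: field values are modelled as
strings, the bitmap as std::vector<bool>, and the per-column conversion as a
std::function. The real fill_record variant would look different.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Value = std::string;
using Converter = std::function<Value(const Value&)>;

// Sends one row to `out`. Only fields whose bit is set in needs_conversion
// pass through record_buffer; the rest are appended to `out` as they are.
// Returns the number of fields that were copied to the record buffer.
std::size_t send_row(const std::vector<Value>& row,
                     const std::vector<bool>& needs_conversion,
                     const std::vector<Converter>& converters,
                     std::vector<Value>& record_buffer,
                     std::vector<Value>& out) {
  assert(row.size() == needs_conversion.size());
  assert(row.size() == converters.size());
  record_buffer.resize(row.size());
  std::size_t copied = 0;
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (needs_conversion[i]) {
      record_buffer[i] = converters[i](row[i]);  // conversion happens on store
      out.push_back(record_buffer[i]);
      ++copied;
    } else {
      out.push_back(row[i]);  // bypasses the record buffer entirely
    }
  }
  return copied;
}
```

Since different selects can need conversions for different columns, the bitmap
and the converters would be replaced before each select is executed, while the
record buffer and the output stream stay the same.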
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More precisely, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has to
undergo a serious rework.
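How the selects would be divided between the two interceptor objects can be
sketched as follows. The SetOp type and the function first_streamed_select are
hypothetical names used only for this illustration; the sketch computes the
split point and does not touch SELECT_LEX_UNIT::exec.

```cpp
#include <cstddef>
#include <vector>

enum class SetOp { Union, UnionAll };  // hypothetical, for this sketch only

// ops[i] is the operation between select i and select i+1.
// Returns the index of the first select whose rows could be sent directly
// to the output stream. Selects [0, result) would share the select_union
// object; selects [result, ops.size() + 1) the select_union_send object.
std::size_t first_streamed_select(const std::vector<SetOp>& ops) {
  std::size_t first = 0;  // no UNION at all: every select can stream
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union) first = i + 2;  // past the second operand
  return first;
}
```

For a unit of four selects joined by UNION, UNION ALL, UNION ALL the function
returns 2, so only the first two selects need the temporary table. If the
result equals the number of selects, the last operation is a UNION and nothing
can be streamed.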
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send
any row received from an operand of a UNION operation directly to the output
stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause this
is not necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain commonly used cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
Note that query (4) is equivalent to query (1), while
query (3) is not equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced with UNION if another UNION occurs later in the sequence.
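The equivalence rule above can be checked with a small model. The following is only a sketch under simplifying assumptions: a row is modelled as a string, the names Row, RowSet, UnionOp and eval_union are hypothetical and are not taken from the server code, and rows are sorted after each UNION purely to make results comparable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a row is a string, a select result is a list of rows.
using Row = std::string;
using RowSet = std::vector<Row>;

enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Evaluates selects[0] ops[0] selects[1] ops[1] ... strictly left to right.
// Sorting after a UNION is a modelling convenience only; SQL does not
// define any row order here.
RowSet eval_union(const std::vector<RowSet>& selects,
                  const std::vector<UnionOp>& ops)
{
  assert(selects.size() == ops.size() + 1);
  RowSet acc = selects[0];
  for (std::size_t i = 0; i < ops.size(); ++i) {
    acc.insert(acc.end(), selects[i + 1].begin(), selects[i + 1].end());
    if (ops[i] == UnionOp::Distinct) {
      // UNION removes duplicates from everything accumulated so far,
      // including rows added by an earlier UNION ALL.
      std::sort(acc.begin(), acc.end());
      acc.erase(std::unique(acc.begin(), acc.end()), acc.end());
    }
  }
  return acc;
}
```

With the same three operands, the operator sequences {All, Distinct} and {Distinct, Distinct} give the same rows in this model, while {Distinct, All} can keep duplicates contributed by the last operand.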
MySQL does not accept nested unions. For example, the following query, although
valid SQL, is rejected by MySQL Server as erroneous:
( (select a1,b1 from t1 where a1=b1) union
  (select a2,b2 from t2 where a2!=b2) ) union all
( (select a3,b3 from t3 where a3=b3) union
  (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible: the selects must return
the same number of columns, and for each set of columns with the same number k
there must be a type to which the types of the columns can be coerced. This type is
taken as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and is assigned as the collation of the third column in the result set.
After the compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
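The index handling described above can be illustrated with a toy model. This is a minimal sketch, not the server code: the class TempTable, its methods and the function exec_union are hypothetical stand-ins, a row is modelled as a string, and a std::set plays the role of the unique index.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::string;
enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Hypothetical stand-in for the temporary table and its unique index.
class TempTable {
  std::vector<Row> rows_;
  std::set<Row> unique_index_;
  bool index_enabled_ = false;
public:
  void enable_index() { index_enabled_ = true; }
  void disable_index() { index_enabled_ = false; }
  // Returns true on a duplicate-key error, false on success.
  bool write_row(const Row& row) {
    if (index_enabled_ && !unique_index_.insert(row).second)
      return true;
    rows_.push_back(row);
    return false;
  }
  const std::vector<Row>& rows() const { return rows_; }
};

// Fills the table select by select, as the text describes.
std::vector<Row> exec_union(const std::vector<std::vector<Row>>& selects,
                            const std::vector<UnionOp>& ops)
{
  assert(selects.size() == ops.size() + 1);
  // Position of the select that is the second operand of the last UNION,
  // or 0 if the unit contains only UNION ALL operations.
  std::size_t last_distinct = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) last_distinct = i + 1;

  TempTable table;
  if (last_distinct > 0) table.enable_index();
  for (std::size_t i = 0; i < selects.size(); ++i) {
    for (const Row& row : selects[i])
      (void) table.write_row(row);  // a duplicate-key error is ignored
    if (i == last_distinct) table.disable_index();
  }
  return table.rows();
}
```

In this model the index stays enabled up to and including the second operand of the last UNION, so duplicates coming from earlier UNION ALL operands are also rejected.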
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then the
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly to the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. When the result set is
so big that the temporary table cannot be allocated in main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once, as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
class select_union_send :public select_union
{
  ... // private structures
public:
  select_union_send() :select_union(), ...{...}
  bool send_data(List<Item> &items);
  bool send_fields(List<Item> &list, uint flags);
  bool send_eof(); // signals the end of rows after the last select
  bool create_result_table(THD *thd, List<Item> *column_types,
                           bool is_distinct, ulonglong options,
                           const char *alias);
};
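To show how the two interceptors would differ in behaviour, here is a self-contained sketch. It is an illustration under stated assumptions and not the proposed patch: the classes live in a hypothetical namespace sketch, rows are plain vectors of strings instead of List<Item>, OutputStream and exec_union_all are invented stand-ins for the client connection and for SELECT_LEX_UNIT::exec, and a return value of false is used for success.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

using Row = std::vector<std::string>;

// Stand-in for the connection to the client.
struct OutputStream {
  std::vector<std::string> header;
  std::vector<Row> rows;
  bool eof = false;
};

// Simplified stand-in for select_result_interceptor; false means success.
class select_result_interceptor {
public:
  virtual ~select_result_interceptor() = default;
  virtual bool send_fields(const std::vector<std::string>&) { return false; }
  virtual bool send_data(const Row& items) = 0;
  virtual bool send_eof() { return false; }
};

// Current behaviour: every row is written to the temporary table.
class select_union : public select_result_interceptor {
protected:
  Row record_buffer;
  std::vector<Row> temp_table;
  // Type conversions would be performed here.
  void fill_record(const Row& items) { record_buffer = items; }
public:
  bool send_data(const Row& items) override {
    fill_record(items);
    temp_table.push_back(record_buffer);
    return false;
  }
  std::size_t temp_table_rows() const { return temp_table.size(); }
};

// Proposed behaviour: rows pass through the record buffer only.
class select_union_send : public select_union {
  OutputStream& out;
public:
  explicit select_union_send(OutputStream& o) : out(o) {}
  bool send_fields(const std::vector<std::string>& names) override {
    out.header = names;  // row format goes out before the first row
    return false;
  }
  bool send_data(const Row& items) override {
    fill_record(items);
    out.rows.push_back(record_buffer);
    return false;
  }
  bool send_eof() override { out.eof = true; return false; }
};

// Driver for a unit with UNION ALL operations only.
bool exec_union_all(const std::vector<std::vector<Row>>& selects,
                    const std::vector<std::string>& names,
                    select_result_interceptor& result)
{
  assert(!selects.empty());
  if (result.send_fields(names)) return true;
  for (const auto& select : selects)
    for (const Row& row : select)
      if (result.send_data(row)) return true;
  return result.send_eof();  // once, after the last select
}

}  // namespace sketch
```

Passing a select_union_send object to the driver leaves the modelled temporary table empty, while every row still goes through the record buffer, where conversions would take place.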
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer: it can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. For this, another variant of the fill_record procedure is
needed that takes as a parameter the info that says which fields are to be
stored in the record buffer.
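A possible shape of such a bitmap-driven variant is sketched below. Everything in it is an assumption made for illustration: fields are modelled as strings, std::vector<bool> plays the role of the bitmap, convert_to_double is an invented conversion, and fill_record_partial and send_data are hypothetical free functions rather than the real server routines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical conversion: render a numeric literal as a double.
std::string convert_to_double(const std::string& value)
{
  return std::to_string(std::stod(value));
}

// Variant of fill_record that stores only the flagged fields.
// Returns the number of fields copied into the record buffer.
std::size_t fill_record_partial(const Row& items,
                                const std::vector<bool>& needs_conversion,
                                Row& record_buffer)
{
  assert(items.size() == needs_conversion.size());
  record_buffer.resize(items.size());
  std::size_t copied = 0;
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (needs_conversion[i]) {
      record_buffer[i] = convert_to_double(items[i]);
      ++copied;
    }
  }
  return copied;
}

// Builds the row that goes to the output stream: converted fields are
// taken from the record buffer, all other fields straight from the select.
Row send_data(const Row& items, const std::vector<bool>& needs_conversion,
              Row& record_buffer)
{
  fill_record_partial(items, needs_conversion, record_buffer);
  Row out;
  out.reserve(items.size());
  for (std::size_t i = 0; i < items.size(); ++i)
    out.push_back(needs_conversion[i] ? record_buffer[i] : items[i]);
  return out;
}
```

Each select would carry its own bitmap, so a column that needs conversion in one select can bypass the record buffer in another.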
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has to
undergo serious rework.
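The split between the two interceptors can be computed from the operator sequence alone. The sketch below shows one way to do it; the names UnionOp, Sink and assign_sinks are hypothetical, and the sketch says nothing about how SELECT_LEX_UNIT::exec would actually be restructured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };        // UNION and UNION ALL
enum class Sink { TempTable, OutputStream }; // select_union / select_union_send

// For operators ops[0..n-1] joining selects 0..n, decides for each select
// which interceptor receives its rows.
std::vector<Sink> assign_sinks(const std::vector<UnionOp>& ops)
{
  const std::size_t n_selects = ops.size() + 1;
  // First select that follows the second operand of the last UNION.
  // It stays 0 when the unit contains only UNION ALL operations.
  std::size_t first_streamed = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) first_streamed = i + 2;
  assert(first_streamed <= n_selects);

  std::vector<Sink> sinks(n_selects, Sink::TempTable);
  for (std::size_t i = first_streamed; i < n_selects; ++i)
    sinks[i] = Sink::OutputStream;
  return sinks;
}
```

For the operator sequence UNION, UNION ALL, UNION ALL the first two selects go to the temporary table and the last two are streamed; if the last operator is UNION, no select can be streamed.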
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send
any row received from an operand of a UNION operation directly to the output
stream as soon as a lookup in the temporary table has confirmed that
it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.
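As an illustration of item 2 of this list, a row could be passed on as soon as the duplicate check succeeds. This is only a sketch under simplifying assumptions: a row is a string, a std::set stands in for the lookup in the temporary table, and stream_distinct is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Row = std::string;

// Sends each row to the output as soon as it is known not to be a
// duplicate, instead of waiting until all selects have been executed.
std::vector<Row> stream_distinct(const std::vector<std::vector<Row>>& selects)
{
  std::set<Row> seen;        // stands in for the temporary table lookup
  std::vector<Row> output;   // stands in for the output stream
  for (const auto& select : selects)
    for (const Row& row : select)
      if (seen.insert(row).second)
        output.push_back(row);  // first occurrence: sent at once
  assert(output.size() == seen.size());
  return output;
}
```

The temporary table is still needed for the lookup in this scheme, but the client no longer has to wait for the last select before receiving the first rows.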
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in the most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                  (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                  (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                  (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                  (4)
Note that query (4) is equivalent to query (1), while query (3) is not
equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs later in the sequence.
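The replacement rule above can be sketched as a small routine. This is only an
illustration: the enum UnionOp and the function normalize_ops are hypothetical
names that do not exist in the server code, and the sequence of operators is
modelled as a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: ops[i] is the operator between select i and select i+1.
enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Replace every UNION ALL that precedes the last UNION by UNION.
// Operators after the last UNION are returned unchanged.
std::vector<UnionOp> normalize_ops(std::vector<UnionOp> ops) {
  const std::size_t none = ops.size();
  std::size_t last_distinct = none;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) last_distinct = i;
  if (last_distinct == none) return ops;  // only UNION ALL: nothing to rewrite
  for (std::size_t i = 0; i < last_distinct; ++i) ops[i] = UnionOp::Distinct;
  assert(ops[last_distinct] == UnionOp::Distinct);
  return ops;
}
```

Applied to the operators of query (4), {All, Distinct}, the routine yields the
operators of query (1), {Distinct, Distinct}. The operators of queries (2) and
(3) come back unchanged.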
MySQL does not accept nested unions. For example, the following otherwise valid
query is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union
    (select a2,b2 from t2 where a2!=b2) ) union all
  ( (select a3,b3 from t3 where a3=b3) union
    (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare. The method first validates each of the select
constructs of the unit and then checks that all selects are compatible: the
selects must return the same number of columns, and for each column position k
there must be a type to which the types of the k-th columns of all selects can
be coerced. This type is taken as the type of column k of the result set
returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and is assigned as the collation of the third column in the result set.
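A minimal sketch of the type aggregation described above, limited to the types
used in the example. The names Kind, ColType and aggregate_column are
hypothetical, and the sketch assumes that a column is either numeric in all
selects or varchar in all selects. The real server handles many more types as
well as collations.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical type model; the numeric kinds are listed from narrowest to widest.
enum class Kind { Int, BigInt, Double, Varchar };

struct ColType {
  Kind kind;
  unsigned length;  // meaningful for Varchar only
};

// Compute the result type of column k from the k-th column of every select.
ColType aggregate_column(const std::vector<ColType>& cols) {
  assert(!cols.empty());
  ColType result = cols.front();
  for (const ColType& c : cols) {
    const bool both_strings =
        result.kind == Kind::Varchar && c.kind == Kind::Varchar;
    const bool both_numbers =
        result.kind != Kind::Varchar && c.kind != Kind::Varchar;
    assert(both_strings || both_numbers);  // mixed case is outside this sketch
    if (both_strings)
      result.length = std::max(result.length, c.length);  // longest varchar
    else
      result.kind = std::max(result.kind, c.kind);  // widest numeric type
  }
  return result;
}
```

With the columns of the example, int, bigint and double aggregate to double,
and varchar(10), varchar(20), varchar(10) aggregate to varchar(20).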
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate key error that occurs on an attempt to write a duplicate
row is ignored.
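The execution scheme described above can be modelled in a few lines. This is a
simplified sketch, not the server code: TempTable, write_row and exec_union are
hypothetical stand-ins, rows are vectors of strings, and a std::set plays the
role of the unique index.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Stand-in for the temporary table and its unique index on all fields.
struct TempTable {
  std::vector<Row> rows;
  std::set<Row> unique_index;
  bool index_enabled = false;

  // Returns false on a duplicate-key error, true when the row was stored.
  bool write_row(const Row& r) {
    if (index_enabled && !unique_index.insert(r).second) return false;
    rows.push_back(r);
    return true;
  }
};

// ops[i] is the operator between selects[i] and selects[i + 1].
std::vector<Row> exec_union(const std::vector<std::vector<Row>>& selects,
                            const std::vector<UnionOp>& ops) {
  assert(!selects.empty() && ops.size() + 1 == selects.size());
  // Index of the second operand of the last UNION; 0 if there is no UNION.
  std::size_t last_distinct_operand = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) last_distinct_operand = i + 1;

  TempTable tmp;
  tmp.index_enabled = last_distinct_operand != 0;
  for (std::size_t i = 0; i < selects.size(); ++i) {
    for (const Row& r : selects[i])
      (void)tmp.write_row(r);  // a duplicate-key error is ignored
    if (i == last_distinct_operand) tmp.index_enabled = false;
  }
  return tmp.rows;
}
```

For three selects that each return the same single row, the operators
{Distinct, All} leave two rows in the table, {All, Distinct} leave one row, and
{All, All} leave three.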
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it's not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done,
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations, since writing to the temporary table
and reading from it would not be needed anymore. In the cases when the result
set is so big that the temporary table cannot be held in main memory, the
performance gains would be significant. Besides, the client could get the first
result rows immediately, as it would not have to wait until all selects have
been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
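To illustrate how the two interceptors would differ in behaviour, here is a
self-contained model. It is a sketch under stated assumptions: SelectUnion,
SelectUnionSend and OutputStream are hypothetical stand-ins written for this
example, rows are vectors of strings, and the methods return false on success.
None of this is the server's real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Stand-in for the client connection.
struct OutputStream {
  std::vector<std::string> column_names;
  std::vector<Row> rows;
  bool eof = false;
};

// Stand-in for select_union: every row goes to the temporary table.
class SelectUnion {
 public:
  virtual ~SelectUnion() = default;
  virtual bool send_data(const Row& row) {
    temp_table_.push_back(row);
    return false;
  }
  virtual bool send_eof() { return false; }
  const std::vector<Row>& temp_table() const { return temp_table_; }

 private:
  std::vector<Row> temp_table_;
};

// Stand-in for the proposed select_union_send: rows bypass the temporary table.
class SelectUnionSend : public SelectUnion {
 public:
  SelectUnionSend(OutputStream& out, std::size_t select_count)
      : out_(out), selects_left_(select_count) {}

  // Sends the row format once, before the first select produces rows.
  bool send_fields(const std::vector<std::string>& names) {
    if (!fields_sent_) {
      out_.column_names = names;
      fields_sent_ = true;
    }
    return false;
  }
  bool send_data(const Row& row) override {
    out_.rows.push_back(row);
    return false;
  }
  // Called once per select; only the last call ends the result set.
  bool send_eof() override {
    assert(selects_left_ > 0);
    if (--selects_left_ == 0) out_.eof = true;
    return false;
  }

 private:
  OutputStream& out_;
  std::size_t selects_left_;
  bool fields_sent_ = false;
};
```

In this model the client sees each row as soon as send_data is called, and the
end-of-data signal is raised only when the last select of the unit has finished.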
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion, it does not make sense to copy it to a
record buffer: it can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. This requires another variant of the fill_record procedure
that takes as a parameter the info that says which fields are to be
stored in the record buffer.
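The per-select bitmap could drive the copying roughly as follows. This is a
sketch only: send_row, convert_field and CopyStats are hypothetical names, the
bitmap is a std::vector<bool>, and the conversion shown (integer text widened to
double text) merely stands in for whatever coercion a column needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical conversion: numeric text rewritten in double format.
std::string convert_field(const std::string& value) {
  return std::to_string(std::stod(value));
}

struct CopyStats {
  std::size_t via_buffer = 0;  // fields copied through the record buffer
  std::size_t direct = 0;      // fields sent straight to the output
};

// needs_conversion[i] tells whether field i of this select must be converted.
CopyStats send_row(const Row& row, const std::vector<bool>& needs_conversion,
                   Row& record_buffer, Row& output) {
  assert(row.size() == needs_conversion.size());
  record_buffer.resize(row.size());
  CopyStats stats;
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (needs_conversion[i]) {
      record_buffer[i] = convert_field(row[i]);  // convert in the buffer
      output.push_back(record_buffer[i]);
      ++stats.via_buffer;
    } else {
      output.push_back(row[i]);  // no copy into the record buffer
      ++stats.direct;
    }
  }
  return stats;
}
```

Only the fields flagged in the bitmap touch the record buffer. A select whose
columns already have the result types would pass a bitmap of all zeros and skip
the buffer entirely.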
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; the other, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec will
have to undergo a serious re-work.
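Which select gets which interceptor follows from the position of the last UNION.
A minimal sketch, with the hypothetical names UnionOp, Interceptor and
assign_interceptors:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };              // UNION and UNION ALL
enum class Interceptor { TempTable, DirectSend };  // select_union / select_union_send

// ops[i] is the operator between select i and select i+1.
// Returns one interceptor choice per select of the unit.
std::vector<Interceptor> assign_interceptors(const std::vector<UnionOp>& ops) {
  const std::size_t select_count = ops.size() + 1;
  // First select that follows the second operand of the last UNION.
  std::size_t first_direct = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) first_direct = i + 2;
  assert(first_direct <= select_count);

  std::vector<Interceptor> result(select_count, Interceptor::DirectSend);
  for (std::size_t i = 0; i < first_direct; ++i)
    result[i] = Interceptor::TempTable;
  return result;
}
```

For the operators {Distinct, All, All} the first two selects write to the
temporary table and the last two send rows directly. A unit with only UNION ALL
operations degenerates to the case of section 2.1, where every select sends
directly.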
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
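Item 2 of the list above amounts to a check-then-forward step per row. A rough
model, in which the hypothetical class DistinctStreamer uses a std::set in place
of the lookup in the temporary table:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Forwards each row to the output the first time it is seen.
class DistinctStreamer {
 public:
  explicit DistinctStreamer(std::vector<Row>& output) : output_(output) {}

  // Returns true when the row was new and has been forwarded.
  bool offer(const Row& row) {
    const bool is_new = seen_.insert(row).second;  // lookup and insert in one step
    if (is_new) output_.push_back(row);
    return is_new;
  }

 private:
  std::set<Row> seen_;  // stands in for the temporary table with its unique index
  std::vector<Row>& output_;
};
```

The set of seen rows is still needed for duplicate detection, so this variant
saves the final read pass over the temporary table rather than the table itself.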
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common
cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                                   (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                                   (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                                   (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                                   (4)
Note that query (4) is equivalent to query (1), because the final UNION removes
all duplicates accumulated so far. At the same time query (3) is not equivalent
to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs later in the sequence.
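The equivalence claims above can be checked with a small model. The sketch
below is not server code: Row, SetOp, distinct and eval_union_unit are
hypothetical names introduced only for illustration, and the unit is assumed to
be evaluated strictly left to right.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: a row is a list of integer column values.
using Row = std::vector<int>;
enum class SetOp { Union, UnionAll };

// Drops repeated rows, keeping the first occurrence of each.
std::vector<Row> distinct(const std::vector<Row>& rows) {
  std::set<Row> seen;
  std::vector<Row> out;
  for (const Row& r : rows)
    if (seen.insert(r).second) out.push_back(r);
  return out;
}

// Evaluates operands[0] ops[0] operands[1] ops[1] ... from left to right.
std::vector<Row> eval_union_unit(const std::vector<std::vector<Row>>& operands,
                                 const std::vector<SetOp>& ops) {
  assert(!operands.empty() && ops.size() + 1 == operands.size());
  std::vector<Row> acc = operands[0];
  for (std::size_t i = 0; i < ops.size(); ++i) {
    acc.insert(acc.end(), operands[i + 1].begin(), operands[i + 1].end());
    // UNION removes every duplicate accumulated so far; UNION ALL keeps them.
    if (ops[i] == SetOp::Union) acc = distinct(acc);
  }
  return acc;
}
```

With three operands that share some rows, the operator lists {UnionAll, Union}
and {Union, Union} give the same result in this model, while {Union, UnionAll}
keeps the duplicates contributed by the last operand.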
MySQL does not accept nested unions. For example, the following query, although
valid SQL, is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union
    (select a2,b2 from t2 where a2!=b2) )
  union all
  ( (select a3,b3 from t3 where a3=b3) union
    (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare. The method validates each of the select constructs of
the unit and then checks that all selects are compatible: they must return the
same number of columns, and for each column position k there must be a type to
which the types of the k-th columns of all selects can be coerced. This type is
taken as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and is assigned as the collation of the third column in the result set.
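The two type examples above can be reproduced with a very small model of type
aggregation. This is only a sketch under simplifying assumptions: Kind,
ColumnType, merge_types and result_column_type are hypothetical names, only the
four types mentioned in the text are modelled, and collations and mixed
numeric/string columns are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type model; the numeric kinds are listed in the widening order
// used by this sketch (Int < BigInt < Double).
enum class Kind { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColumnType {
  Kind kind;
  std::size_t length;  // meaningful for Varchar only
};

// Merges two column types into one both can be coerced to.
// Returns false for combinations this sketch does not model.
bool merge_types(const ColumnType& a, const ColumnType& b, ColumnType* out) {
  const bool a_str = a.kind == Kind::Varchar;
  const bool b_str = b.kind == Kind::Varchar;
  if (a_str != b_str) return false;  // numeric vs. string: not modelled here
  if (a_str)
    *out = ColumnType{Kind::Varchar, std::max(a.length, b.length)};
  else
    *out = ColumnType{std::max(a.kind, b.kind), 0};
  return true;
}

// Folds the types of the k-th column of every select into the result type.
bool result_column_type(const std::vector<ColumnType>& column_k,
                        ColumnType* out) {
  assert(!column_k.empty() && out != nullptr);
  ColumnType acc = column_k[0];
  for (std::size_t i = 1; i < column_k.size(); ++i)
    if (!merge_types(acc, column_k[i], &acc)) return false;
  *out = acc;
  return true;
}
```

In this model int, bigint and double fold to double, and varchar(10),
varchar(20), varchar(10) fold to varchar(20), matching the examples in the
text.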
After the compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling the use of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
duplicate-key error raised by an attempt to write a duplicate row is ignored.
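The interplay between the unique index and the ignored duplicate-key error can
be modelled in a few lines. The sketch below is an illustration only:
TempTableSketch, write_row and exec_union_unit are hypothetical stand-ins, not
the server's temporary table, ha_write_row or SELECT_LEX_UNIT::exec.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = std::vector<int>;
enum class SetOp { Union, UnionAll };

// Toy stand-in for the temporary table with a unique index over all fields.
struct TempTableSketch {
  std::vector<Row> rows;
  std::set<Row> unique_index;
  bool index_enabled = false;

  // Mimics a row write: returns false on a "duplicate key" error.
  bool write_row(const Row& row) {
    if (index_enabled && !unique_index.insert(row).second) return false;
    rows.push_back(row);
    return true;
  }
};

// Runs the selects one by one and collects their rows in the toy table.
std::vector<Row> exec_union_unit(const std::vector<std::vector<Row>>& selects,
                                 const std::vector<SetOp>& ops) {
  assert(!selects.empty() && ops.size() + 1 == selects.size());
  // Position of the last UNION; ops.size() means "no UNION at all".
  std::size_t last_union = ops.size();
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union) last_union = i;

  TempTableSketch table;
  table.index_enabled = (last_union != ops.size());
  for (std::size_t s = 0; s < selects.size(); ++s) {
    for (const Row& row : selects[s])
      (void)table.write_row(row);  // a duplicate-key failure is ignored
    // The second operand of the last UNION is stored: stop deduplicating.
    if (table.index_enabled && s == last_union + 1) table.index_enabled = false;
  }
  return table.rows;
}
```

Rows written while the index is enabled are deduplicated against everything
stored so far, and rows of the selects after the last UNION are appended
unconditionally.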
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. When the result set is so big that
the temporary table cannot be kept in main memory, the performance gains would
be significant. Besides, the client could get the first result rows at once, as
it would not have to wait until all selects have been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union, derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
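The intended division of work between the three methods can be sketched
outside the server as follows. ClientLog and UnionSendSketch are hypothetical
names invented for this illustration, rows and formats are plain strings, and
the sketch assumes that the methods are called once per select in the usual
send_fields, send_data, send_eof order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the client connection: records what would go on the wire.
struct ClientLog {
  std::vector<std::string> packets;
};

class UnionSendSketch {
 public:
  UnionSendSketch(ClientLog* client, std::size_t select_count)
      : client_(client), selects_left_(select_count) {
    assert(client != nullptr && select_count > 0);
  }

  // Called for every select; the row format goes out only before the first.
  void send_fields(const std::string& format) {
    if (!fields_sent_) {
      client_->packets.push_back("FIELDS " + format);
      fields_sent_ = true;
    }
  }

  // Forwards one (already converted) row straight to the client.
  void send_data(const std::string& row) {
    client_->packets.push_back("ROW " + row);
  }

  // Called for every select; end-of-data is signalled only after the last.
  void send_eof() {
    assert(selects_left_ > 0);
    if (--selects_left_ == 0) client_->packets.push_back("EOF");
  }

 private:
  ClientLog* client_;
  std::size_t selects_left_;
  bool fields_sent_ = false;
};
```

The point of the sketch is that one object shared by all selects lets the
client see a single result set: one format description, then the rows of every
select in turn, then one end marker.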
The method create_result_table of the class select_union shall be redefined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then build the internal
structures needed for select_union_send::send_data. So the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion, it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that takes as a parameter the info that says which fields are to be
stored in the record buffer.
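A minimal sketch of the per-select bitmap idea is given below. It is an
illustration under stated assumptions, not the proposed server code:
convert_via_record_buffer and send_row are hypothetical names, field values
are plain strings, and the "conversion" merely marks the value so that the two
paths can be told apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for storing a value in the record buffer, where the type
// conversion would happen. Here it only tags the value.
std::string convert_via_record_buffer(const std::string& value) {
  return "conv(" + value + ")";
}

// needs_conversion[i] tells whether column i of the current select has to go
// through the record buffer. Returns the values as they would be written to
// the output stream and counts how many fields were copied to the buffer.
std::vector<std::string> send_row(const std::vector<std::string>& fields,
                                  const std::vector<bool>& needs_conversion,
                                  std::size_t* buffer_copies) {
  assert(fields.size() == needs_conversion.size() && buffer_copies != nullptr);
  std::vector<std::string> out;
  out.reserve(fields.size());
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (needs_conversion[i]) {
      out.push_back(convert_via_record_buffer(fields[i]));
      ++*buffer_copies;
    } else {
      out.push_back(fields[i]);  // sent as is, no copy into the buffer
    }
  }
  return out;
}
```

Because the bitmap belongs to a select rather than to the union unit, the
shared interceptor would have to be given the bitmap of the select that is
about to run, as the text describes.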
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the current
implementation. More exactly, the rows from any select that follows the second
operand of the last UNION operation could be sent directly to the output
stream. In this case two interceptor objects have to be created: one, of the
type select_union, is shared by the selects for which UNION operations are
performed; the other, of the type select_union_send, is shared by the remaining
selects. For this optimization the method SELECT_LEX_UNIT::exec has to undergo
a serious rework.
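The split of the selects between the two interceptors can be expressed as a
small function over the operator list. The names SetOp, Sink and assign_sinks
are hypothetical and exist only in this sketch; it decides which kind of
interceptor each select would get and does not model the execution itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class SetOp { Union, UnionAll };
enum class Sink { TempTable, DirectSend };

// ops[i] joins select i+1 to everything before it, so a unit with
// ops.size() operators has ops.size() + 1 selects.
std::vector<Sink> assign_sinks(const std::vector<SetOp>& ops) {
  const std::size_t select_count = ops.size() + 1;
  // Number of leading selects that take part in a UNION (0 if there is none).
  std::size_t buffered = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == SetOp::Union) buffered = i + 2;  // up to its second operand
  assert(buffered <= select_count);

  std::vector<Sink> sinks(select_count, Sink::DirectSend);
  for (std::size_t s = 0; s < buffered; ++s) sinks[s] = Sink::TempTable;
  return sinks;
}
```

For the operators UNION, UNION ALL, UNION ALL the first two selects go to the
temporary table and the last two are sent directly; if the last operator is a
UNION, every select still needs the temporary table.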
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
- 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
-==================================
+============================================
1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------------
+-----------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
@@ -77,7 +77,7 @@
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
-----------------------------------
+----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
};
2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
3. Other possible optimizations for union units
-=================================
+===============================================
The following optimizations are not supposed to be implemented in the framework
this task.
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make a UNION ALL operation not send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select, while the method send_eof shall
+signal the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_union_send::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says which fields require conversions and which don't. Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in a more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operation could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+of this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXISTS or IN subquery.
+
DESCRIPTION:
Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in the most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
============================================
1.1. Specifics of MySQL union operations
----------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
Note that query (4) is equivalent to query (1), whereas query (3) is not
equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be replaced by
UNION without changing the result if another UNION occurs later in the sequence.
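The equivalence rule can be illustrated with a small sketch. The names SetOp and normalize_union_ops are hypothetical and do not come from the server code; the function only rewrites a list of operators so that every UNION ALL standing before the last UNION becomes UNION.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names, for illustration only.
enum class SetOp { Union, UnionAll };

// ops[i] is the operator between select i and select i+1.
// Every UNION ALL that precedes the last UNION is replaced by UNION,
// because the later UNION removes duplicates from everything before it.
std::vector<SetOp> normalize_union_ops(std::vector<SetOp> ops) {
    std::size_t last_union = ops.size();  // ops.size() means "no UNION found"
    for (std::size_t i = 0; i < ops.size(); ++i)
        if (ops[i] == SetOp::Union) last_union = i;
    if (last_union == ops.size()) return ops;  // only UNION ALL: nothing to do
    for (std::size_t i = 0; i < last_union; ++i) ops[i] = SetOp::Union;
    return ops;
}
```

Applied to the operators of query (4) this yields the operators of query (1), while the operators of queries (2) and (3) are returned unchanged.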
MySQL does not accept nested unions. For example, the following otherwise valid
query is rejected by MySQL Server as erroneous:
( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) )
union all
( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
-----------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method validates each of the select constructs of the unit and then
checks that all selects are compatible: the selects must return the same number
of columns, and for each column position k there must be a type to which the
types of the k-th columns of all selects can be coerced. This type is
taken as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and is assigned as the collation of the third column in the result set.
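The two type examples above can be reproduced with a toy model. This is only a sketch under simplifying assumptions: Kind, ColType and union_column_type are hypothetical names, only the four types mentioned in the text are modelled, collations are ignored, and a mix of numeric and string columns is reported as incompatible (the text does not say how such a mix is handled).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, heavily simplified type model.
enum class Kind { Int, BigInt, Double, Varchar };

struct ColType {
    Kind kind;
    unsigned length;  // meaningful only for Varchar
};

// Position of a numeric kind in the widening order int < bigint < double.
static int numeric_rank(Kind k) {
    switch (k) {
        case Kind::Int:    return 0;
        case Kind::BigInt: return 1;
        case Kind::Double: return 2;
        default:           return -1;
    }
}

// Computes the result type for one column position from the types that the
// individual selects return at that position. Returns false when this toy
// model finds no common type; *out is then left untouched.
bool union_column_type(const std::vector<ColType>& cols, ColType* out) {
    if (cols.empty()) return false;
    ColType result = cols.front();
    for (const ColType& c : cols) {
        const bool result_is_string = result.kind == Kind::Varchar;
        const bool c_is_string = c.kind == Kind::Varchar;
        if (result_is_string != c_is_string) return false;  // not modelled here
        if (result_is_string)
            result.length = std::max(result.length, c.length);
        else if (numeric_rank(c.kind) > numeric_rank(result.kind))
            result.kind = c.kind;
    }
    *out = result;
    return true;
}
```

With the column types from the text, int, bigint and double aggregate to double, and varchar(10), varchar(20), varchar(10) aggregate to varchar(20).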
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
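The interplay between the unique index and the ignored duplicate-key error can be modelled in a few lines. This is a sketch, not server code: TempTable, write_row and execute_union are hypothetical names, rows are plain vectors of strings, and a std::set stands in for the unique index over all fields.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Toy temporary table with a unique index that can be switched on and off.
class TempTable {
public:
    void enable_unique_index() { unique_ = true; }
    void disable_unique_index() { unique_ = false; }
    // Returns false on a "duplicate key" error, true if the row was stored.
    bool write_row(const Row& row) {
        if (unique_ && index_.count(row)) return false;
        index_.insert(row);
        rows_.push_back(row);
        return true;
    }
    const std::vector<Row>& rows() const { return rows_; }

private:
    bool unique_ = false;
    std::set<Row> index_;
    std::vector<Row> rows_;
};

// is_union[i] is true when the operator between select i and select i+1 is
// UNION, false when it is UNION ALL.
std::vector<Row> execute_union(const std::vector<std::vector<Row>>& selects,
                               const std::vector<bool>& is_union) {
    assert(selects.empty() ? is_union.empty()
                           : is_union.size() + 1 == selects.size());
    // Index of the second operand of the last UNION, or -1 if there is none.
    int last_distinct_select = -1;
    for (std::size_t i = 0; i < is_union.size(); ++i)
        if (is_union[i]) last_distinct_select = static_cast<int>(i) + 1;

    TempTable table;
    if (last_distinct_select >= 0) table.enable_unique_index();
    for (std::size_t s = 0; s < selects.size(); ++s) {
        for (const Row& row : selects[s])
            table.write_row(row);  // a duplicate-key error is deliberately ignored
        if (static_cast<int>(s) == last_distinct_select)
            table.disable_unique_index();
    }
    return table.rows();
}
```

In this model a unit of the form s1 UNION s2 UNION ALL s3 keeps every row of s3, while s1 UNION ALL s2 UNION s3 ends up with distinct rows only, which matches the equivalence of queries (4) and (1).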
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
===============================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. When the result set is so big that
the temporary table cannot be allocated in main memory, the
performance gains would be significant. Besides, the client could get the first
result rows immediately, as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then build the internal
structures needed for select_union_send::send_data. So the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
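The difference between the two interceptors can be shown with a self-contained toy model. It is a sketch only: toy_select_union, toy_select_union_send and run_unit are hypothetical stand-ins, rows are vectors of strings instead of List<Item>, the client is a plain vector, and the "false means success" return convention is an assumption of this sketch.

```cpp
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Toy stand-in for select_union: every row goes through the record buffer
// into an in-memory "temporary table".
class toy_select_union {
public:
    virtual ~toy_select_union() = default;
    // Returns false on success (a convention assumed by this sketch).
    virtual bool send_data(const Row& row) {
        fill_record(row);
        temp_table.push_back(record);
        return false;
    }
    std::vector<Row> temp_table;

protected:
    // The real fill_record performs type conversions; here it only copies.
    void fill_record(const Row& row) { record = row; }
    Row record;  // record buffer of the temporary table
};

// Toy stand-in for the proposed select_union_send: same record buffer, but
// the converted row is forwarded to the client instead of being stored.
class toy_select_union_send : public toy_select_union {
public:
    explicit toy_select_union_send(std::vector<Row>& out) : client(out) {}
    bool send_data(const Row& row) override {
        fill_record(row);
        client.push_back(record);
        return false;
    }

private:
    std::vector<Row>& client;
};

// Feeds the rows of every select, in order, to the shared interceptor.
void run_unit(toy_select_union& result,
              const std::vector<std::vector<Row>>& selects) {
    for (const std::vector<Row>& select : selects)
        for (const Row& row : select)
            result.send_data(row);
}
```

Passing a toy_select_union_send to run_unit leaves temp_table empty and delivers each row to the client as soon as its select produces it, which is the behaviour the proposal aims for.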
2.2. Avoiding unnecessary copying
---------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. For this, another variant of the fill_record procedure is
needed that takes as a parameter the info that says which fields are to be
stored in the record buffer.
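A per-select conversion bitmap could be derived as in the following sketch. The function name conversion_map is hypothetical, column types are represented by their textual names, and "requires conversion" is approximated as "the select's column type differs from the result column type", which is an assumption of this sketch rather than the server's actual rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One bit per column: true means the value has to pass through the record
// buffer for conversion, false means it could go straight to the output.
std::vector<bool> conversion_map(const std::vector<std::string>& select_types,
                                 const std::vector<std::string>& result_types) {
    assert(select_types.size() == result_types.size());
    std::vector<bool> needs(select_types.size());
    for (std::size_t i = 0; i < select_types.size(); ++i)
        needs[i] = select_types[i] != result_types[i];
    return needs;
}
```

For instance, if the result columns are (int, double, varchar(20)) and a select returns (int, double, varchar(10)), only the third bit is set, so the first two fields of every row from that select could bypass the record buffer.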
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; the other, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has to
undergo a serious rework.
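Which selects would get which interceptor can be computed from the operator list alone. The sketch below uses a hypothetical helper, first_streamable_select; it is not part of the server code.

```cpp
#include <cstddef>
#include <vector>

// is_union[i] is true when the operator between select i and select i+1 is
// UNION, false when it is UNION ALL. Returns the index of the first select
// whose rows could bypass the temporary table. Selects before that index
// would share the select_union interceptor, the rest select_union_send.
// The result equals the number of selects when no select can be streamed.
std::size_t first_streamable_select(const std::vector<bool>& is_union) {
    std::size_t first = 0;
    for (std::size_t i = 0; i < is_union.size(); ++i)
        if (is_union[i]) first = i + 2;  // skip both operands of this UNION
    return first;
}
```

For s0 UNION s1 UNION ALL s2 UNION ALL s3 the function returns 2, so s2 and s3 are streamed. A unit with only UNION ALL operations yields 0, which is the case described in section 2.1.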
3. Other possible optimizations for union units
===============================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the output
stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
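Item 2 of the list can be sketched as follows. The names are hypothetical: the set seen plays the role of the temporary table with its unique index, and the vector client stands for the output stream.

```cpp
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Emits each distinct row the moment it is first seen, instead of waiting
// until all selects have finished.
std::vector<Row> stream_distinct(const std::vector<std::vector<Row>>& selects) {
    std::set<Row> seen;
    std::vector<Row> client;
    for (const std::vector<Row>& select : selects)
        for (const Row& row : select)
            if (seen.insert(row).second)  // true only for a row not seen before
                client.push_back(row);    // sent before later selects even start
    return client;
}
```

The lookup structure is still needed for duplicate detection, but the second pass that reads the temporary table back is avoided.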
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+replaced by UNION if another UNION occurs further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it is not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all selects are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec(). The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UNION ALL operations occur in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored in the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make a UNION ALL operation not send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select, while the method send_eof shall
+signal the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_union_send::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says which fields require conversions and which don't. Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in a more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operation could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+of this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXISTS or IN subquery.
+
DESCRIPTION:
Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in the most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
==================================
1.1. Specifics of MySQL union operations
------------------------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example the following valid query is
considered by MySQL Server as erroneous:
( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
) union all
( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
----------------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all the selects are compatible: they must return the same number of
columns, and for each column position k there must be a type to which the types
of the k-th columns of all selects can be coerced. This type is taken as the
type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and assigned as the collation of the third column in the result set.
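The type resolution described above can be illustrated with a small sketch. It is only a toy model: the ColumnType structure, the BaseType ordering and the function aggregate_column_type are hypothetical names introduced here, and the widening rule (int, then bigint, then double, with varchar winning over numeric types) is an assumption that ignores collations and most of the server's real coercion rules.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified column types; not the server's real type system.
// The enumerator order encodes the assumed widening order.
enum class BaseType { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColumnType {
  BaseType base;
  unsigned length;  // only meaningful for Varchar, 0 otherwise
};

// Pick a type to which column k of every select can be coerced.
// 'operands' holds the type of column k in each select of the union unit.
ColumnType aggregate_column_type(const std::vector<ColumnType> &operands) {
  assert(!operands.empty());
  ColumnType result = operands.front();
  for (const ColumnType &t : operands) {
    // Keep the widest base type seen so far.
    if (static_cast<int>(t.base) > static_cast<int>(result.base))
      result.base = t.base;
    // Keep the largest declared length.
    result.length = std::max(result.length, t.length);
  }
  if (result.base != BaseType::Varchar)
    result.length = 0;  // length is ignored for numeric types in this model
  return result;
}
```

With this model, the operands int, bigint, double resolve to double, and varchar(10), varchar(20), varchar(10) resolve to varchar(20), matching the two examples in the text.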
After the compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
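The effect of enabling and disabling the unique index can be modeled with a short simulation. This is a sketch under stated assumptions, not server code: a row is flattened to a string, a std::set stands in for the unique index, and the names UnionOperand and run_union_unit are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::string;  // one row, flattened to a string for brevity

struct UnionOperand {
  std::vector<Row> rows;     // rows produced by this select
  bool union_with_previous;  // true: joined to the previous select by UNION,
                             // false: by UNION ALL (ignored for operand 0)
};

// Returns the contents of the simulated temporary table.
std::vector<Row> run_union_unit(const std::vector<UnionOperand> &operands) {
  assert(!operands.empty());
  // Position of the second operand of the last UNION; 0 means "no UNION".
  std::size_t last_union = 0;
  for (std::size_t i = 1; i < operands.size(); ++i)
    if (operands[i].union_with_previous) last_union = i;

  std::vector<Row> temp_table;
  std::set<Row> unique_index;
  for (std::size_t i = 0; i < operands.size(); ++i) {
    // The index is on from the first select up to the last UNION operand.
    const bool index_enabled = last_union > 0 && i <= last_union;
    for (const Row &row : operands[i].rows) {
      // A duplicate-key failure is silently ignored, as described above.
      if (index_enabled && !unique_index.insert(row).second) continue;
      temp_table.push_back(row);
    }
  }
  return temp_table;
}
```

Running the model on operand patterns shaped like queries (1) to (4) shows why (4) behaves like (1): the index stays enabled through the final UNION, so every duplicate is rejected, while in (3) the rows of the last select are appended unchecked.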
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table by calling the function fill_record. All needed type conversions of the
field values are performed when they are stored in the record buffer. After this
the method select_union::send_data calls the ha_write_row handler function to
write the record from the buffer to the temporary table. A duplicate-key error
raised by an attempt to write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
=================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
------------------------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. When the result set is so big that
the temporary table cannot be kept in main memory, the performance gains would
be significant. Besides, the client could get the first result rows immediately,
as it would not have to wait until all selects have been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
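The difference between the two interceptors can be shown with a self-contained toy model. Everything in it is an assumption made for illustration: OutputStream stands in for the client connection, BufferingUnion and StreamingUnion are hypothetical stand-ins for select_union and select_union_send, and the final read of the temporary table (done by mysql_select in the server) is folded into send_eof for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for the client connection.
struct OutputStream {
  std::vector<Row> sent;
  bool eof = false;
};

class ResultInterceptor {
public:
  virtual ~ResultInterceptor() = default;
  virtual bool send_data(const Row &row) = 0;  // returns true on error
  virtual bool send_eof() = 0;                 // returns true on error
};

// Models the current behaviour: rows are collected first and reach the
// client only after every select has finished.
class BufferingUnion : public ResultInterceptor {
public:
  explicit BufferingUnion(OutputStream &out) : out_(out) {}
  bool send_data(const Row &row) override {
    assert(!out_.eof);  // no rows may arrive after end-of-data
    table_.push_back(row);
    return false;
  }
  bool send_eof() override {
    for (const Row &r : table_) out_.sent.push_back(r);
    out_.eof = true;
    return false;
  }

protected:
  OutputStream &out_;
  std::vector<Row> table_;  // stands in for the temporary table
};

// Models the proposed behaviour: each row is forwarded as soon as it arrives.
class StreamingUnion : public BufferingUnion {
public:
  using BufferingUnion::BufferingUnion;
  bool send_data(const Row &row) override {
    assert(!out_.eof);
    out_.sent.push_back(row);
    return false;
  }
  bool send_eof() override {
    out_.eof = true;
    return false;
  }
};
```

In this model a caller drives either class through the same ResultInterceptor interface, which mirrors the idea of swapping the object that JOIN::result refers to without changing the selects themselves.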
2.2. Avoiding unnecessary copying
------------------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.
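A per-select conversion bitmap could be consulted roughly as follows. This is a minimal sketch, assuming fields are plain strings; ConversionMap, convert_via_record_buffer and send_row are hypothetical names, and the "conversion" is a placeholder for the real store-into-record-buffer step.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxColumns = 64;  // arbitrary limit for this sketch
using ConversionMap = std::bitset<kMaxColumns>;

// Placeholder for storing a value in the record buffer and reading it back
// in the result column's type; here it just marks the value as converted.
std::string convert_via_record_buffer(const std::string &value) {
  return value + ".0";
}

// Builds the row as it would go to the output stream: bit k set means that
// field k of the current select needs conversion, otherwise it is passed on
// untouched.
std::vector<std::string> send_row(const std::vector<std::string> &fields,
                                  const ConversionMap &needs_conversion) {
  assert(fields.size() <= kMaxColumns);
  std::vector<std::string> wire;
  wire.reserve(fields.size());
  for (std::size_t k = 0; k < fields.size(); ++k)
    wire.push_back(needs_conversion.test(k)
                       ? convert_via_record_buffer(fields[k])
                       : fields[k]);
  return wire;
}
```

Because each select gets its own bitmap, a select whose columns already have the result types would pass an all-zero map and skip the record buffer entirely.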
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
----------------------------------------------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More precisely, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec will
have to undergo serious rework.
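Deciding which selects get which interceptor reduces to finding the last UNION in the sequence. The sketch below assumes the unit is described by a list of operators, where ops[i] joins select i and select i+1 (0-based); UnionOp, first_streamable_select and select_streams are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Index of the first select whose rows may bypass the temporary table.
// A result equal to the number of selects (ops.size() + 1) means none can.
std::size_t first_streamable_select(const std::vector<UnionOp> &ops) {
  std::size_t first = 0;  // with no UNION at all, every select can stream
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct)
      first = i + 2;  // skip past the second operand of this UNION
  return first;
}

// True if the given select would use the streaming interceptor.
bool select_streams(const std::vector<UnionOp> &ops, std::size_t select_index) {
  assert(select_index <= ops.size());  // there are ops.size() + 1 selects
  return select_index >= first_streamable_select(ops);
}
```

For a unit shaped like query (3) only the third select streams; for one shaped like query (4) none does, which is consistent with (4) being equivalent to a pure UNION sequence.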
3. Other possible optimizations for union units
=================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
==================================
1.1. Specifics of MySQL union operations
------------------------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (1)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
(select a1,b1,c1 from t1 where a1=b1) union
(select a2,b2,c2 from t2 where a2!=b2) union all
(select a3,b3,c3 from t3 where a3>b3); (3)
(select a1,b1,c1 from t1 where a1=b1) union all
(select a2,b2,c2 from t2 where a2!=b2) union
(select a3,b3,c3 from t3 where a3>b3); (4)
Note that query (4) is equivalent to query (1), whereas query (3) is not
equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be replaced by
UNION without changing the result if another UNION occurs later in the sequence.
MySQL does not accept nested unions. For example, the following query, although
valid SQL, is rejected by MySQL Server as erroneous:
( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) )
union all
( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
----------------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all the selects are compatible: they must return the same number of
columns, and for each column position k there must be a type to which the types
of the k-th columns of all selects can be coerced. This type is taken as the
type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and assigned as the collation of the third column in the result set.
After the compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table by calling the function fill_record. All needed type conversions of the
field values are performed when they are stored in the record buffer. After this
the method select_union::send_data calls the ha_write_row handler function to
write the record from the buffer to the temporary table. A duplicate-key error
raised by an attempt to write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
=================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
------------------------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. When the result set is so big that
the temporary table cannot be kept in main memory, the performance gains would
be significant. Besides, the client could get the first result rows immediately,
as it would not have to wait until all selects have been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
2.2. Avoiding unnecessary copying
------------------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data what fields should be sent to
the record buffer for type conversion and what can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
----------------------------------------------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the current
implementation. More precisely, the rows from any select that follows the
second operand of the last UNION operation could be sent directly to the
output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; the other, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has
to undergo a serious rework.
3. Other possible optimizations for union units
=================================
The following optimizations are not supposed to be implemented within the
framework of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+replaced by UNION if another UNION occurs further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it is not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all selects are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec(). The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UNION ALL operations occur in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in the fields of the record buffer of the temporary
+table. To do this the method calls the function fill_record. All needed type
+conversions of the field values are performed when they are stored in the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To keep a UNION ALL operation from sending rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select, while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_union_send::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says which fields require conversion and which do not. Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in a more efficient way than in the current
+implementation. More precisely, the rows from any select that follows the
+second operand of the last UNION operation could be sent directly to the
+output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed; the other, of the type select_union_send, is shared by the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has
+to undergo a serious rework.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented within the
+framework of this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide such an implementation of UNION ALL
operation that would not use a temporary table at all in certain, most common
cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
==================================
1.1. Specifics of MySQL union operations
------------------------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);           (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);           (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);           (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);           (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence, because that
later UNION removes duplicates from all rows accumulated so far.
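The equivalence of queries (1) and (4) can be checked on a toy model. The
following is only a sketch under stated assumptions: rows are modelled as
vectors of integers, and the names Row, Rows, UnionOp and eval_union_unit are
hypothetical and do not come from the server code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a row is a vector of ints, a select result is a list of rows.
using Row = std::vector<int>;
using Rows = std::vector<Row>;

enum class UnionOp { Distinct, All };  // UNION and UNION ALL

// Evaluates S1 op1 S2 op2 ... from left to right. ops[i] joins selects[i]
// and selects[i + 1]. A UNION removes duplicates from everything accumulated
// so far, including rows that an earlier UNION ALL let through.
Rows eval_union_unit(const std::vector<Rows> &selects,
                     const std::vector<UnionOp> &ops)
{
  assert(!selects.empty() && ops.size() + 1 == selects.size());
  Rows acc = selects[0];
  for (std::size_t i = 0; i < ops.size(); ++i)
  {
    acc.insert(acc.end(), selects[i + 1].begin(), selects[i + 1].end());
    if (ops[i] == UnionOp::Distinct)
    {
      // Sorting is only a convenient way to find duplicates in this model.
      std::sort(acc.begin(), acc.end());
      acc.erase(std::unique(acc.begin(), acc.end()), acc.end());
    }
  }
  return acc;
}
```

With three operands, the operator sequences {UNION ALL, UNION} and
{UNION, UNION} give the same rows in this model, while {UNION, UNION ALL} may
keep duplicates contributed by the last operand.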
MySQL does not accept nested unions. For example, the following valid query is
rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union
    (select a2,b2 from t2 where a2!=b2) ) union all
  ( (select a3,b3 from t3 where a3=b3) union
    (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
----------------------------------
When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects return
the same number of columns and for each set of columns with the same number k
there is a type to which the types of the columns can be coerced. This type is
considered as the type of column k of the result set returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
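The column-type rule above can be illustrated with a much reduced model. This
is a sketch only: it assumes just four types, ignores collations, and leaves
out mixed numeric/string columns. BaseType, ColumnType, merge_types and
result_types are hypothetical names, not server code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type ladder: a later numeric enumerator can hold any earlier one.
enum class BaseType { Int, BigInt, Double, Varchar };

struct ColumnType
{
  BaseType base;
  unsigned length;  // only meaningful for Varchar in this model
};

// Returns a type to which both arguments can be coerced.
ColumnType merge_types(const ColumnType &a, const ColumnType &b)
{
  const bool a_is_string = a.base == BaseType::Varchar;
  const bool b_is_string = b.base == BaseType::Varchar;
  assert(a_is_string == b_is_string);  // mixed columns are outside this sketch
  if (a_is_string)
    return ColumnType{BaseType::Varchar, std::max(a.length, b.length)};
  return ColumnType{std::max(a.base, b.base), 0u};
}

// Computes the type of every column k of the union unit's result set.
std::vector<ColumnType>
result_types(const std::vector<std::vector<ColumnType>> &selects)
{
  assert(!selects.empty());
  std::vector<ColumnType> result = selects[0];
  for (std::size_t s = 1; s < selects.size(); ++s)
  {
    assert(selects[s].size() == result.size());  // same number of columns
    for (std::size_t k = 0; k < result.size(); ++k)
      result[k] = merge_types(result[k], selects[s][k]);
  }
  return result;
}
```

For columns of the types int, bigint and double the model yields double, and
for varchar(10), varchar(20), varchar(10) it yields varchar(20), matching the
example in the text.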
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. It is achieved by enabling and disabling usage of the unique
index defined on all fields of the temporary table. The index is never used if
only UNION ALL operations occur in the unit. Otherwise it is enabled before
the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible error on duplicate key that occurs with an attempt to write a duplicate
row is ignored.
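The interplay between the unique index and the write step can be sketched as
follows. This is an illustrative model, not the server code: TempTable and
exec_union_unit are hypothetical names, rows are vectors of integers, and a
std::set stands in for the unique index on all fields.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = std::vector<int>;

// Hypothetical stand-in for the temporary table and its unique index.
class TempTable
{
  std::vector<Row> rows_;
  std::set<Row> unique_index_;
  bool index_enabled_ = false;
public:
  void enable_index() { index_enabled_ = true; }
  void disable_index() { index_enabled_ = false; }
  // Mimics the write step: a duplicate-key error is silently ignored.
  void write_row(const Row &row)
  {
    if (index_enabled_ && !unique_index_.insert(row).second)
      return;
    rows_.push_back(row);
  }
  const std::vector<Row> &rows() const { return rows_; }
};

// is_distinct[i] is true when selects[i] and selects[i + 1] are joined by
// UNION and false when they are joined by UNION ALL.
std::vector<Row> exec_union_unit(const std::vector<std::vector<Row>> &selects,
                                 const std::vector<bool> &is_distinct)
{
  assert(!selects.empty() && is_distinct.size() + 1 == selects.size());
  bool any_distinct = false;
  std::size_t last_distinct_select = 0;  // second operand of the last UNION
  for (std::size_t i = 0; i < is_distinct.size(); ++i)
    if (is_distinct[i])
    {
      any_distinct = true;
      last_distinct_select = i + 1;
    }
  TempTable table;
  if (any_distinct)
    table.enable_index();  // enabled before the first select is executed
  for (std::size_t s = 0; s < selects.size(); ++s)
  {
    for (const Row &row : selects[s])
      table.write_row(row);
    if (any_distinct && s == last_distinct_select)
      table.disable_index();  // disabled after the last UNION operation
  }
  return table.rows();
}
```

In this model a unit with only UNION ALL never touches the index, and rows of
selects placed after the last UNION are appended without a duplicate check.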
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then the
rows read from the temporary table have to be sorted first.
2. Optimizations improving performance of UNION ALL operations
=================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
------------------------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
big enough and the temporary table cannot be allocated in the main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object of the class that
we'll call select_union_send (by analogy with the class select_send) shall
inherit from the select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal about the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
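To make the intended control flow concrete, here is a self-contained toy
version of the two interceptors. It is a sketch under stated assumptions, not
the proposed server code: Row, OutputStream, select_union_model and
select_union_send_model are hypothetical stand-ins, fields are plain strings,
and a return value of false is assumed to signal success.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for the client connection.
struct OutputStream
{
  std::vector<std::string> events;
};

static std::string join(const Row &row)
{
  std::string out;
  for (std::size_t i = 0; i < row.size(); ++i)
  {
    if (i)
      out += ",";
    out += row[i];
  }
  return out;
}

// Models the current behaviour: every row ends up in the temporary table.
class select_union_model
{
protected:
  Row record_buffer_;            // record buffer of the temporary table
  std::vector<Row> temp_table_;  // rows written to the temporary table
public:
  virtual ~select_union_model() = default;
  virtual bool create_result_table(std::size_t column_count)
  {
    record_buffer_.assign(column_count, std::string());
    return false;
  }
  virtual bool send_data(const Row &items)
  {
    record_buffer_ = items;  // type conversion would happen here
    temp_table_.push_back(record_buffer_);
    return false;
  }
  std::size_t stored_rows() const { return temp_table_.size(); }
};

// Models the proposed interceptor: the record buffer is still used for type
// conversion, but the row goes to the output stream, not to the table.
class select_union_send_model : public select_union_model
{
  OutputStream &out_;
public:
  explicit select_union_send_model(OutputStream &out) : out_(out) {}
  bool send_fields(const Row &column_names)
  {
    out_.events.push_back("fields:" + join(column_names));
    return false;
  }
  bool send_data(const Row &items) override
  {
    record_buffer_ = items;
    out_.events.push_back("row:" + join(record_buffer_));
    return false;
  }
  bool send_eof()
  {
    out_.events.push_back("eof");
    return false;
  }
};
```

A caller would invoke send_fields once before the first select, send_data for
every row of every select, and send_eof once after the last select.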
2.2. Avoiding unnecessary copying
------------------------------------------
If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data what fields should be sent to
the record buffer for type conversion and what can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as parameter the info that says what fields are to be
stored in the record buffer.
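A possible shape of the per-select conversion bitmap is sketched below. All
names (Value, UnionSendModel, set_conversion_map) are hypothetical, the only
conversion modelled is integer to double, and the output stream is a plain
vector, so this shows the idea rather than the real fill_record variant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical field value: either an integer or a double.
struct Value
{
  bool is_double;
  long long int_value;
  double double_value;
};

Value make_int(long long v) { return Value{false, v, 0.0}; }
Value make_double(double v) { return Value{true, 0, v}; }

class UnionSendModel
{
  std::vector<bool> needs_conversion_;  // bitmap for the current select
  std::vector<Value> record_buffer_;
  std::vector<Value> output_;           // stands in for the output stream
  std::size_t buffered_fields_ = 0;
public:
  explicit UnionSendModel(std::size_t columns)
    : needs_conversion_(columns, false), record_buffer_(columns, make_int(0)) {}

  // Must be called before each select of the unit is executed.
  void set_conversion_map(const std::vector<bool> &map)
  {
    assert(map.size() == record_buffer_.size());
    needs_conversion_ = map;
  }

  void send_data(const std::vector<Value> &fields)
  {
    assert(fields.size() == record_buffer_.size());
    for (std::size_t k = 0; k < fields.size(); ++k)
    {
      if (needs_conversion_[k])
      {
        // Only flagged fields pass through the record buffer.
        record_buffer_[k] =
          make_double(static_cast<double>(fields[k].int_value));
        ++buffered_fields_;
        output_.push_back(record_buffer_[k]);
      }
      else
        output_.push_back(fields[k]);  // sent directly, no copy to the buffer
    }
  }

  const std::vector<Value> &output() const { return output_; }
  std::size_t buffered_fields() const { return buffered_fields_; }
};
```

Each select installs its own bitmap before it runs, so a column that needs
conversion for one select can bypass the record buffer for another.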
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
----------------------------------------------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the current
implementation. More precisely, the rows from any select that follows the
second operand of the last UNION operation could be sent directly to the
output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; the other, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has
to undergo a serious rework.
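The assignment of interceptors to selects can be sketched as below. The sketch
assumes a top-level union unit without ORDER BY; Interceptor and
assign_interceptors are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Interceptor { TempTable, DirectSend };

// is_distinct[i] is true when select i and select i + 1 are joined by UNION
// and false when they are joined by UNION ALL.
std::vector<Interceptor>
assign_interceptors(const std::vector<bool> &is_distinct)
{
  const std::size_t select_count = is_distinct.size() + 1;
  // Index of the first select whose rows can bypass the temporary table.
  // It stays 0 when the unit contains only UNION ALL operations.
  std::size_t first_direct = 0;
  for (std::size_t i = 0; i < is_distinct.size(); ++i)
    if (is_distinct[i])
      first_direct = i + 2;  // one past the second operand of this UNION
  assert(first_direct <= select_count);
  std::vector<Interceptor> plan(select_count, Interceptor::DirectSend);
  for (std::size_t s = 0; s < first_direct; ++s)
    plan[s] = Interceptor::TempTable;
  return plan;
}
```

For S1 UNION S2 UNION ALL S3 UNION ALL S4 the first two selects share the
temporary-table interceptor and the last two share the direct-send one; if the
last operation is a UNION, every select still goes through the temporary table.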
3. Other possible optimizations for union units
=================================
The following optimizations are not supposed to be implemented within the
framework of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+ 1.1. Specifics of MySQL union operations
+ 1.2 Validation of union units
+ 1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+ 2.1 Execution of UNION ALL without temporary table
+ 2.2. Avoiding unnecessary copying
+ 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (1)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+ (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union all
+ (select a3,b3,c3 from t3 where a3>b3); (3)
+ (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+ (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+replaced by UNION if another UNION occurs further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+ ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+ ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it is not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all selects are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+ class select_union_send :public select_union
+ {
+ ... // private structures
+ public:
+ select_union_send() :select_union(), ...{...}
+ bool send_data(List<Item> &items);
+ bool send_fields(List<Item> &list, uint flags);
+ bool create_result_table(THD *thd, List<Item> *column_types,
+ bool is_distinct, ulonglong options,
+ const char *alias);
+ };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common cases.
HIGH-LEVEL SPECIFICATION:
<contents>
1. Handling union operations in MySQL Server
1.1. Specifics of MySQL union operations
1.2 Validation of union units
1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
2.1 Execution of UNION ALL without temporary table
2.2. Avoiding unnecessary copying
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
3. Other possible optimizations for union units
</contents>
1. Handling union operations in MySQL Server
==================================
1.1. Specifics of MySQL union operations
------------------------------------------------------
UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (4)
Note that query (4) is equivalent to query (1), while query (3) is not
equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be replaced with
UNION without changing the result if another UNION occurs further in the
sequence, because that later UNION removes duplicates from everything
accumulated before it.
MySQL does not accept nested unions. For example, the following otherwise valid
query is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union
    (select a2,b2 from t2 where a2!=b2) )
  union all
  ( (select a3,b3 from t3 where a3=b3) union
    (select a4,b4 from t4 where a4!=b4) )
A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.
1.2 Validation of union units
----------------------------------
When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare. The method validates each of the select constructs of
the unit and then checks that all selects are compatible: they must return the
same number of columns, and for each column position k there must be a type to
which the types of the k-th columns of all selects can be coerced. This type
becomes the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and assigned as the collation of the third column in the result set.
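The two examples above can be sketched as follows. This is only an
illustration under strong assumptions: the enum NumType and both function names
are hypothetical, and the model covers just the three numeric types and the
varchar lengths mentioned in the text, not the server's full coercion rules.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified type model: the enumerators are ordered
// so that a "wider" type compares greater than a "narrower" one.
enum class NumType { Int = 0, BigInt = 1, Double = 2 };

// Result type of column k: the widest type found at position k
// across all selects of the union unit.
NumType aggregate_numeric(const std::vector<NumType>& column_types) {
    assert(!column_types.empty());
    return *std::max_element(column_types.begin(), column_types.end());
}

// Result length of a varchar column: the largest declared length.
unsigned aggregate_varchar_length(const std::vector<unsigned>& lengths) {
    assert(!lengths.empty());
    return *std::max_element(lengths.begin(), lengths.end());
}
```

With the inputs from the text, aggregate_numeric applied to int, bigint, double
yields double, and aggregate_varchar_length applied to 10, 20, 10 yields 20.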
After the compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.
1.3 Execution of union units
----------------------------------
After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize() and then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.
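The index switching described in this section can be modelled with a small
stand-alone sketch. It is not server code: Row, run_union_unit and the use of
std::set as the "unique index" are assumptions made for illustration only. It
also shows why query (4) returns the same rows as query (1) while query (3)
does not.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = std::vector<int>;            // hypothetical row representation

// ops_distinct[i] joins operands[i] and operands[i+1]:
// true stands for UNION, false for UNION ALL.
std::vector<Row> run_union_unit(const std::vector<std::vector<Row>>& operands,
                                const std::vector<bool>& ops_distinct) {
    assert(!operands.empty() && ops_distinct.size() + 1 == operands.size());

    // Second operand of the last UNION operation, if there is any UNION.
    std::size_t last_distinct_operand = 0;
    bool any_distinct = false;
    for (std::size_t i = 0; i < ops_distinct.size(); ++i)
        if (ops_distinct[i]) { last_distinct_operand = i + 1; any_distinct = true; }

    std::vector<Row> temp_table;         // stands in for the temporary table
    std::set<Row> unique_index;          // stands in for its unique index
    for (std::size_t i = 0; i < operands.size(); ++i) {
        // Enabled before the first select, disabled after the last UNION.
        bool index_enabled = any_distinct && i <= last_distinct_operand;
        for (const Row& row : operands[i]) {
            if (index_enabled && !unique_index.insert(row).second)
                continue;                // duplicate key: the row is skipped
            temp_table.push_back(row);
        }
    }
    return temp_table;
}
```

Because the index stays enabled up to the last UNION, every UNION ALL that
precedes it behaves like UNION in this model.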
2. Optimizations improving performance of UNION ALL operations
=================================================
The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
------------------------------------------------------------------
If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done,
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations, since writing to the temporary table
and reading from it would not be needed anymore. When the result set is so big
that the temporary table cannot be held in main memory, the performance gains
would be significant. Besides, the client could get the first result rows at
once, as it would not have to wait until all selects have been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
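To make the difference between the two interceptors concrete, here is a
stand-alone sketch that does not use any server classes. RowInterceptor,
BufferingInterceptor, StreamingInterceptor and OutputStream are hypothetical
names; the first class models what select_union does today and the second what
select_union_send is meant to do. The convention that a false return value
means success is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical row representation

// Hypothetical stand-in for the connection to the client.
struct OutputStream {
    std::vector<Row> rows;
    bool eof_sent = false;
};

// Simplified interceptor interface; false is assumed to mean success.
class RowInterceptor {
public:
    virtual ~RowInterceptor() = default;
    virtual bool send_data(const Row& row) = 0;
    virtual bool send_eof() = 0;
};

// Models the current behaviour: rows are accumulated and reach the client
// only after the last select has finished.
class BufferingInterceptor : public RowInterceptor {
public:
    explicit BufferingInterceptor(OutputStream& out) : out_(out) {}
    bool send_data(const Row& row) override {
        assert(!out_.eof_sent);          // no rows after end-of-rows
        temp_table_.push_back(row);
        return false;
    }
    bool send_eof() override {
        for (const Row& row : temp_table_) out_.rows.push_back(row);
        out_.eof_sent = true;
        return false;
    }
private:
    OutputStream& out_;
    std::vector<Row> temp_table_;
};

// Models the proposed behaviour: each row is forwarded as soon as it arrives.
class StreamingInterceptor : public RowInterceptor {
public:
    explicit StreamingInterceptor(OutputStream& out) : out_(out) {}
    bool send_data(const Row& row) override {
        assert(!out_.eof_sent);
        out_.rows.push_back(row);
        return false;
    }
    bool send_eof() override {
        out_.eof_sent = true;
        return false;
    }
private:
    OutputStream& out_;
};
```

In this model the client of a StreamingInterceptor sees the first row right
after the first send_data call, whereas the client of a BufferingInterceptor
sees nothing until send_eof.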
2.2. Avoiding unnecessary copying
------------------------------------------
If a field does not need type conversion, it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that takes as a parameter the info that says which fields are to be
stored in the record buffer.
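A minimal sketch of the per-select bitmap idea follows. Everything in it is an
assumption made for illustration: fields are plain integers, "conversion" means
widening to double, the record buffer is a local variable, and send_row and
SendStats are hypothetical names rather than server functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many fields had to go through the (simulated) record buffer.
struct SendStats {
    std::size_t buffer_writes = 0;
};

// needs_conversion[i] is true when column i of the current select has a type
// different from the type of column i of the result set.
std::vector<std::string> send_row(const std::vector<long long>& fields,
                                  const std::vector<bool>& needs_conversion,
                                  SendStats& stats) {
    assert(fields.size() == needs_conversion.size());
    std::vector<std::string> wire;   // what would go to the output stream
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (needs_conversion[i]) {
            // Store into the record buffer, converting the type on the way.
            double buffered = static_cast<double>(fields[i]);
            ++stats.buffer_writes;
            wire.push_back(std::to_string(buffered));
        } else {
            // No conversion needed: bypass the record buffer entirely.
            wire.push_back(std::to_string(fields[i]));
        }
    }
    return wire;
}
```

For a select whose first column already has the result type and whose second
column must be widened, only one of the two fields touches the buffer.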
2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
----------------------------------------------------------------------------------------------------------
If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; the other, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has
to undergo a serious re-work.
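The split between the two interceptors can be computed from the operator
sequence alone. The helper below is a hypothetical sketch, not part of the
server: it returns the zero-based index of the first select whose rows may
bypass the temporary table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ops_distinct[i] joins select i and select i+1:
// true stands for UNION, false for UNION ALL.
// Returns the index of the first select that could use the streaming
// interceptor; a value equal to the number of selects means "none".
std::size_t first_streamable_select(const std::vector<bool>& ops_distinct) {
    std::size_t first = 0;           // no UNION at all: every select streams
    for (std::size_t i = 0; i < ops_distinct.size(); ++i)
        if (ops_distinct[i])
            first = i + 2;           // skip both operands of this UNION
    assert(first <= ops_distinct.size() + 1);
    return first;
}
```

For query (3) the result is 2, so only the third select streams; for query (4)
the result is 3, which equals the number of selects, so nothing streams; for
query (2) the result is 0, which is the case covered by section 2.1.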
3. Other possible optimizations for union units
=================================
The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that
it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Igor): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply@askmonty.org 14 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common cases.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Rev 2720: Merge maria-5.1 -> maria-5.1-table-elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r10/
------------------------------------------------------------
revno: 2720
revision-id: psergey(a)askmonty.org-20090813211212-jghejwxsl6adtopl
parent: knielsen(a)knielsen-hq.org-20090805072137-wg97dcem1cxnzt3p
parent: psergey(a)askmonty.org-20090813204452-o8whzlbio19cgkyv
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Fri 2009-08-14 01:12:12 +0400
message:
Merge maria-5.1 -> maria-5.1-table-elimination
added:
mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
sql-bench/test-table-elimination.sh testtableelimination-20090616194329-gai92muve732qknl-1
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
modified:
.bzrignore sp1f-ignore-20001018235455-q4gxfbritt5f42nwix354ufpsvrf5ebj
libmysqld/Makefile.am sp1f-makefile.am-20010411110351-26htpk3ynkyh7pkfvnshztqrxx3few4g
mysql-test/r/mysql-bug41486.result mysqlbug41486.result-20090323135900-fobg67a3yzg0b7e8-1
mysql-test/r/ps_11bugs.result sp1f-ps_11bugs.result-20041012140047-4pktjlfeq27q6bxqfdsbcszr5nybv6zz
mysql-test/r/select.result sp1f-select.result-20010103001548-znkoalxem6wchsbxizfosjhpfmhfyxuk
mysql-test/r/subselect.result sp1f-subselect.result-20020512204640-zgegcsgavnfd7t7eyrf7ibuqomsw7uzo
mysql-test/r/union.result sp1f-unions_one.result-20010725122836-ofxtwraxeohz7whhrmfdz57sl4a5prmp
mysql-test/t/mysql-bug41486.test mysqlbug41486.test-20090323135900-fobg67a3yzg0b7e8-2
mysql-test/valgrind.supp sp1f-valgrind.supp-20050406142216-yg7xhezklqhgqlc3inx36vbghodhbovy
sql/CMakeLists.txt sp1f-cmakelists.txt-20060831175237-esoeu5kpdtwjvehkghwy6fzbleniq2wy
sql/Makefile.am sp1f-makefile.am-19700101030959-xsjdiakci3nqcdd4xl4yomwdl5eo2f3q
sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo
sql/item_subselect.h sp1f-item_subselect.h-20020512204640-qdg77wil56cxyhtc2bjjdrppxq3wqgh3
sql/item_sum.cc sp1f-item_sum.cc-19700101030959-4woo23bi3am2t2zvsddqbpxk7xbttdkm
sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl
sql/sql_bitmap.h sp1f-sql_bitmap.h-20031024204444-g4eiad7vopzqxe2trxmt3fn3xsvnomvj
sql/sql_lex.cc sp1f-sql_lex.cc-19700101030959-4pizwlu5rqkti27gcwsvxkawq6bc2kph
sql/sql_lex.h sp1f-sql_lex.h-19700101030959-sgldb2sooc7twtw5q7pgjx7qzqiaa3sn
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2707.1.27
revision-id: psergey(a)askmonty.org-20090813204452-o8whzlbio19cgkyv
parent: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Fri 2009-08-14 00:44:52 +0400
message:
MWL#17: Table elimination
- More function renames, added comments
modified:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
------------------------------------------------------------
revno: 2707.1.26
revision-id: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc
parent: psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 23:10:53 +0400
message:
MWL#17: Table elimination
- Better comments
modified:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.25
revision-id: psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq
parent: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:36:13 +0400
message:
MWL#17: Table elimination
Fixes after post-review fixes:
- Don't search for tables in JOIN_TAB array. it's not initialized yet.
use select_lex->leaf_tables instead.
modified:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
------------------------------------------------------------
revno: 2707.1.24
revision-id: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
parent: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:24:02 +0400
message:
MWL#17: Table elimination
- Post-postreview changes fix: Do set NESTED_JOIN::n_tables to number of
tables left after elimination.
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.23
revision-id: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
parent: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 04:01:43 +0400
message:
MWL#17: Table elimination
- When making inferences "field is bound" -> "key is bound", do check
that the field is part of the key
modified:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
------------------------------------------------------------
revno: 2707.1.22
revision-id: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
parent: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 03:43:02 +0400
message:
MWL#17: Table elimination
- Continue addressing review feedback: remove "unusable KEYUSEs"
extension as it is no longer needed.
modified:
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.21
revision-id: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
parent: psergey(a)askmonty.org-20090708171038-9nyc3hcg1o7h8635
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 02:34:21 +0400
message:
MWL#17: Table elimination
Address review feedback:
- Change from Wave-based approach (a-la const table detection) to
building and walking functional dependency graph.
- Change from piggy-backing on ref-access code and KEYUSE structures
to using our own expression analyzer.
modified:
sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
sql/sql_bitmap.h sp1f-sql_bitmap.h-20031024204444-g4eiad7vopzqxe2trxmt3fn3xsvnomvj
------------------------------------------------------------
revno: 2707.1.20
revision-id: psergey(a)askmonty.org-20090708171038-9nyc3hcg1o7h8635
parent: psergey(a)askmonty.org-20090630132018-8qwou8bqiq5z1qjg
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-07-08 21:10:38 +0400
message:
MWL#17: Table elimination
- When collecting Item_subselect::refers_to, put references to the correct
subselect entry.
modified:
sql/sql_lex.cc sp1f-sql_lex.cc-19700101030959-4pizwlu5rqkti27gcwsvxkawq6bc2kph
------------------------------------------------------------
revno: 2707.1.19
revision-id: psergey(a)askmonty.org-20090630132018-8qwou8bqiq5z1qjg
parent: psergey(a)askmonty.org-20090630131100-r6o8yqzse4yvny9l
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Tue 2009-06-30 17:20:18 +0400
message:
MWL#17: Table elimination
- More comments
- Remove old code
modified:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
------------------------------------------------------------
revno: 2707.1.18
revision-id: psergey(a)askmonty.org-20090630131100-r6o8yqzse4yvny9l
parent: psergey(a)askmonty.org-20090629135115-472up9wsj0dq843i
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Tue 2009-06-30 17:11:00 +0400
message:
MWL#17: Table elimination
- Last fixes
modified:
sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2707.1.17
revision-id: psergey(a)askmonty.org-20090629135115-472up9wsj0dq843i
parent: psergey(a)askmonty.org-20090625200729-u11xpwwn5ebddx09
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Mon 2009-06-29 17:51:15 +0400
message:
MWL#17: Table elimination
modified:
mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2707.1.16
revision-id: psergey(a)askmonty.org-20090625200729-u11xpwwn5ebddx09
parent: psergey(a)askmonty.org-20090625100947-mg9xwnbeyyjgzl3w
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-movearound
timestamp: Fri 2009-06-26 00:07:29 +0400
message:
MWL#17: Table elimination
- Better comments, variable/function renames
modified:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.15
revision-id: psergey(a)askmonty.org-20090625100947-mg9xwnbeyyjgzl3w
parent: psergey(a)askmonty.org-20090624224414-71xqbljy8jf4z1qs
parent: psergey(a)askmonty.org-20090625100553-j1xenbz3o5nekiu2
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Thu 2009-06-25 14:09:47 +0400
message:
Automerge
added:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
modified:
.bzrignore sp1f-ignore-20001018235455-q4gxfbritt5f42nwix354ufpsvrf5ebj
libmysqld/Makefile.am sp1f-makefile.am-20010411110351-26htpk3ynkyh7pkfvnshztqrxx3few4g
sql/CMakeLists.txt sp1f-cmakelists.txt-20060831175237-esoeu5kpdtwjvehkghwy6fzbleniq2wy
sql/Makefile.am sp1f-makefile.am-19700101030959-xsjdiakci3nqcdd4xl4yomwdl5eo2f3q
sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo
sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.3.1
revision-id: psergey(a)askmonty.org-20090625100553-j1xenbz3o5nekiu2
parent: psergey(a)askmonty.org-20090624090104-c63mp3sfxcxytk0d
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-movearound
timestamp: Thu 2009-06-25 14:05:53 +0400
message:
MWL#17: Table elimination
- Moved table elimination code to sql/opt_table_elimination.cc
- Added comments
added:
sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
modified:
.bzrignore sp1f-ignore-20001018235455-q4gxfbritt5f42nwix354ufpsvrf5ebj
libmysqld/Makefile.am sp1f-makefile.am-20010411110351-26htpk3ynkyh7pkfvnshztqrxx3few4g
sql/CMakeLists.txt sp1f-cmakelists.txt-20060831175237-esoeu5kpdtwjvehkghwy6fzbleniq2wy
sql/Makefile.am sp1f-makefile.am-19700101030959-xsjdiakci3nqcdd4xl4yomwdl5eo2f3q
sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo
sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.14
revision-id: psergey(a)askmonty.org-20090624224414-71xqbljy8jf4z1qs
parent: psergey(a)askmonty.org-20090624090104-c63mp3sfxcxytk0d
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Thu 2009-06-25 02:44:14 +0400
message:
MWL#17: Table elimination
- fix a typo bug in has_eqref_access_candidate()
- Adjust test to remove race condition
modified:
mysql-test/r/mysql-bug41486.result mysqlbug41486.result-20090323135900-fobg67a3yzg0b7e8-1
mysql-test/t/mysql-bug41486.test mysqlbug41486.test-20090323135900-fobg67a3yzg0b7e8-2
sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
------------------------------------------------------------
revno: 2707.1.13
revision-id: psergey(a)askmonty.org-20090624090104-c63mp3sfxcxytk0d
parent: psergey(a)askmonty.org-20090623200613-w9dl8g41ysf51r80
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-24 13:01:04 +0400
message:
More comments
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.12
revision-id: psergey(a)askmonty.org-20090623200613-w9dl8g41ysf51r80
parent: psergey(a)askmonty.org-20090622114631-yop0q2p8ktmfnctm
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-24 00:06:13 +0400
message:
MWL#17: Table elimination
- More testcases
- Let add_ft_key() set keyuse->usable
modified:
mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
sql-bench/test-table-elimination.sh testtableelimination-20090616194329-gai92muve732qknl-1
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.11
revision-id: psergey(a)askmonty.org-20090622114631-yop0q2p8ktmfnctm
parent: psergey(a)askmonty.org-20090617052739-37i1r8lip0m4ft9r
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Mon 2009-06-22 15:46:31 +0400
message:
MWL#17: Table elimination
- Make the elimination check able to detect cases like t.primary_key_col1=othertbl.col AND t.primary_key_col2=func(t.primary_key_col1).
These are needed to handle e.g. the case of func() being a correlated subquery that selects the latest value.
- If we've removed a condition with subquery predicate, EXPLAIN [EXTENDED] won't show the subquery anymore
modified:
sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo
sql/item_subselect.h sp1f-item_subselect.h-20020512204640-qdg77wil56cxyhtc2bjjdrppxq3wqgh3
sql/item_sum.cc sp1f-item_sum.cc-19700101030959-4woo23bi3am2t2zvsddqbpxk7xbttdkm
sql/sql_lex.cc sp1f-sql_lex.cc-19700101030959-4pizwlu5rqkti27gcwsvxkawq6bc2kph
sql/sql_lex.h sp1f-sql_lex.h-19700101030959-sgldb2sooc7twtw5q7pgjx7qzqiaa3sn
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.10
revision-id: psergey(a)askmonty.org-20090617052739-37i1r8lip0m4ft9r
parent: psergey(a)askmonty.org-20090616204358-yjkyfxczsomrn9yn
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-17 09:27:39 +0400
message:
* Use excessive parentheses to stop compiler warning
* Fix test results to account for changes in previous cset
modified:
mysql-test/r/select.result sp1f-select.result-20010103001548-znkoalxem6wchsbxizfosjhpfmhfyxuk
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.9
revision-id: psergey(a)askmonty.org-20090616204358-yjkyfxczsomrn9yn
parent: psergey(a)askmonty.org-20090616195413-rfmi9un20za8gn8g
parent: psergey(a)askmonty.org-20090615162208-p4w8s8jo06bdz1vj
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-17 00:43:58 +0400
message:
* Merge
* Change valgrind suppression to work on valgrind 3.3.0
modified:
mysql-test/valgrind.supp sp1f-valgrind.supp-20050406142216-yg7xhezklqhgqlc3inx36vbghodhbovy
------------------------------------------------------------
revno: 2707.2.1
revision-id: psergey(a)askmonty.org-20090615162208-p4w8s8jo06bdz1vj
parent: psergey(a)askmonty.org-20090614205924-1vnfwbuo4brzyfhp
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-movearound
timestamp: Mon 2009-06-15 20:22:08 +0400
message:
Fix spurious valgrind warnings in rpl_trigger.test
modified:
mysql-test/valgrind.supp sp1f-valgrind.supp-20050406142216-yg7xhezklqhgqlc3inx36vbghodhbovy
------------------------------------------------------------
revno: 2707.1.8
revision-id: psergey(a)askmonty.org-20090616195413-rfmi9un20za8gn8g
parent: psergey(a)askmonty.org-20090614205924-1vnfwbuo4brzyfhp
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Tue 2009-06-16 23:54:13 +0400
message:
MWL#17: Table elimination
- Move eliminate_tables() to before constant table detection.
- First code for benchmark
added:
sql-bench/test-table-elimination.sh testtableelimination-20090616194329-gai92muve732qknl-1
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.7
revision-id: psergey(a)askmonty.org-20090614205924-1vnfwbuo4brzyfhp
parent: psergey(a)askmonty.org-20090614123504-jf4pcb333ojwaxfy
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Mon 2009-06-15 00:59:24 +0400
message:
MWL#17: Table elimination
- Fix print_join() to work both for EXPLAIN EXTENDED (after table elimination) and for
CREATE VIEW (after join->prepare() but without any optimization).
modified:
mysql-test/r/union.result sp1f-unions_one.result-20010725122836-ofxtwraxeohz7whhrmfdz57sl4a5prmp
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.6
revision-id: psergey(a)askmonty.org-20090614123504-jf4pcb333ojwaxfy
parent: psergey(a)askmonty.org-20090614100110-u7l54gk0b6zbtj50
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Sun 2009-06-14 16:35:04 +0400
message:
MWL#17: Table elimination
- Fix the previous cset: take into account that select_lex may be printed when
1. There is no select_lex->join at all (in that case, assume that no tables were eliminated)
2. select_lex->join exists but there was no JOIN::optimize() call yet. handle this by initializing join->eliminated really early.
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.5
revision-id: psergey(a)askmonty.org-20090614100110-u7l54gk0b6zbtj50
parent: psergey(a)askmonty.org-20090609211133-wfau2tgwo2vpgc5d
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Sun 2009-06-14 14:01:10 +0400
message:
MWL#17: Table elimination
- Do not show eliminated tables in the output of EXPLAIN EXTENDED
modified:
mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2707.1.4
revision-id: psergey(a)askmonty.org-20090609211133-wfau2tgwo2vpgc5d
parent: psergey(a)askmonty.org-20090608135546-ut1yrzbah4gdw6e6
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-10 01:11:33 +0400
message:
MWL#17: Table elimination
- Make elimination work with aggregate functions. The problem was that aggregate functions
reported all table bits in used_tables(), and that prevented table elimination. Fixed by
making aggregate functions return more correct value from used_tables().
modified:
mysql-test/r/ps_11bugs.result sp1f-ps_11bugs.result-20041012140047-4pktjlfeq27q6bxqfdsbcszr5nybv6zz
mysql-test/r/subselect.result sp1f-subselect.result-20020512204640-zgegcsgavnfd7t7eyrf7ibuqomsw7uzo
mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
sql/item_sum.cc sp1f-item_sum.cc-19700101030959-4woo23bi3am2t2zvsddqbpxk7xbttdkm
sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl
------------------------------------------------------------
revno: 2707.1.3
revision-id: psergey(a)askmonty.org-20090608135546-ut1yrzbah4gdw6e6
parent: psergey(a)askmonty.org-20090607182938-ycajee5ozg33b7c8
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-fix
timestamp: Mon 2009-06-08 17:55:46 +0400
message:
Fix valgrind failure: provide an implementation of strmov_overlapp() that really can
handle overlapping.
added:
strings/strmov_overlapp.c strmov_overlapp.c-20090608135132-403c5p4dlnexqwxi-1
modified:
include/m_string.h sp1f-m_string.h-19700101030959-rraattbvw5ffkokv4sixxf3s7brqqaga
libmysql/Makefile.shared sp1f-makefile.shared-20000818182429-m3kdhxi23vorlqjct2y2hl3yw357jtxt
strings/Makefile.am sp1f-makefile.am-19700101030959-jfitkanzc3r4h2otoyaaprgqn7muf4ux
------------------------------------------------------------
revno: 2707.1.2
revision-id: psergey(a)askmonty.org-20090607182938-ycajee5ozg33b7c8
parent: psergey(a)askmonty.org-20090603182330-ll3gc91iowhtgb23
parent: psergey(a)askmonty.org-20090607182403-6sfpvdr7nkkekcy9
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1
timestamp: Sun 2009-06-07 22:29:38 +0400
message:
Merge MWL#17: Table elimination
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2705.2.2
revision-id: psergey(a)askmonty.org-20090607182403-6sfpvdr7nkkekcy9
parent: psergey(a)askmonty.org-20090603131045-c8jqhwlanli7eimv
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Sun 2009-06-07 22:24:03 +0400
message:
MWL#17: Table Elimination
- Fix trivial valgrind warning
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.1
revision-id: psergey(a)askmonty.org-20090603182330-ll3gc91iowhtgb23
parent: knielsen(a)knielsen-hq.org-20090602110359-n4q9gof38buucrny
parent: psergey(a)askmonty.org-20090603131045-c8jqhwlanli7eimv
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1
timestamp: Wed 2009-06-03 22:23:30 +0400
message:
Merge MWL#17 with maria/5.1
added:
mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2705.2.1
revision-id: psergey(a)askmonty.org-20090603131045-c8jqhwlanli7eimv
parent: knielsen(a)knielsen-hq.org-20090522175325-xpwm83ilnhqoqjz0
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-03 17:10:45 +0400
message:
MWL#17: Table elimination
- First code. Elimination works for simple cases, passes the testsuite.
- Known issues:
= No elimination is done for aggregate functions.
= EXPLAIN EXTENDED shows eliminated tables (I think it better not)
= No benchmark yet
= The code needs some polishing.
added:
mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
modified:
sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
Diff too large for email (3022 lines, the limit is 1000).
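Since the diff itself is not included, here is a minimal sketch of the inference that revno 2707.1.23 describes ("field is bound" -> "key is bound", but only when the field is part of the key). It is not the code from sql/opt_table_elimination.cc: TableNode, UniqueKey, make_table() and bind_field() are hypothetical names invented for illustration, and the sketch covers a single table only. It assumes each key lists its parts without duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model. These are NOT the classes from
// sql/opt_table_elimination.cc; all names are assumptions.
struct UniqueKey {
  std::vector<std::size_t> parts;   // indexes of the fields forming the key
  std::size_t n_missing_keyparts;   // key parts not yet known to be bound
};

struct TableNode {
  std::vector<bool> field_bound;    // one flag per table field
  std::vector<UniqueKey> keys;      // unique keys of the table
  bool bound = false;               // set once any unique key is fully bound
};

// Build a table with n_fields fields and the given unique keys.
// Each key is assumed to list distinct field indexes.
TableNode make_table(std::size_t n_fields,
                     const std::vector<std::vector<std::size_t>>& keys)
{
  TableNode t;
  t.field_bound.assign(n_fields, false);
  for (const auto& parts : keys)
    t.keys.push_back(UniqueKey{parts, parts.size()});
  return t;
}

// Record that field idx is bound, e.g. by an equality "tbl.col = expr".
// Returns true if the table is functionally dependent after this call.
bool bind_field(TableNode& t, std::size_t idx)
{
  assert(idx < t.field_bound.size());
  if (t.field_bound[idx])
    return t.bound;                 // already counted; do not decrement twice
  t.field_bound[idx] = true;
  for (auto& key : t.keys)
    for (std::size_t part : key.parts)
      // Only a field that is actually a part of the key reduces the count.
      if (part == idx && --key.n_missing_keyparts == 0)
        t.bound = true;             // a table depends on ANY of its unique keys
  return t.bound;
}
```

With a three-field table whose only unique key is (field 0, field 1), binding field 2 changes nothing, and the table becomes bound only after both field 0 and field 1 have been bound. The real graph also propagates further (a bound table binds its fields and can complete an outer join nest); that part is left out here.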
[Maria-developers] Rev 2734: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2734
revision-id: psergey(a)askmonty.org-20090813204452-o8whzlbio19cgkyv
parent: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Fri 2009-08-14 00:44:52 +0400
message:
MWL#17: Table elimination
- More function renames, added comments
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-13 19:10:53 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-13 20:44:52 +0000
@@ -93,11 +93,9 @@
/*
- A field.
- - Depends on table or equality
- - Has expressions it participates as dependencies
-
- There is no counter, bound fields are in $list, not bound are not.
+ A table field. There is only one such object for any tblX.fieldY
+ - the field depends on its table and equalities
+ - expressions that use the field are its dependencies
*/
class Field_dep : public Func_dep
{
@@ -107,19 +105,23 @@
{
type= Func_dep::FD_FIELD;
}
- /* Table we're from. It also has pointers to keys that we're part of */
- Table_dep *table;
+
+ Table_dep *table; /* Table this field is from */
Field *field;
+ /*
+ Field_deps that belong to one table form a linked list. list members are
+ ordered by field_index
+ */
Field_dep *next_table_field;
uint bitmap_offset; /* Offset of our part of the bitmap */
};
/*
- A unique key.
- - Depends on all its components
- - Has its table as dependency
+ A Unique key.
+ - Unique key depends on all of its components
+ - Key's table is its dependency
*/
class Key_dep: public Func_dep
{
@@ -133,14 +135,15 @@
Table_dep *table; /* Table this key is from */
uint keyno;
uint n_missing_keyparts;
+ /* Unique keys form a linked list, ordered by keyno */
Key_dep *next_table_key;
};
/*
- A table.
- - Depends on any of its unique keys
- - Has its fields and embedding outer join as dependency.
+ A table.
+ - table depends on any of its unique keys
+ - has its fields and embedding outer join as dependency.
*/
class Table_dep : public Func_dep
{
@@ -151,16 +154,16 @@
type= Func_dep::FD_TABLE;
}
TABLE *table;
- Field_dep *fields; /* Fields that belong to this table */
- Key_dep *keys; /* Unique keys */
- Outer_join_dep *outer_join_dep;
+ Field_dep *fields; /* Ordered list of fields that belong to this table */
+ Key_dep *keys; /* Ordered list of Unique keys in this table */
+ Outer_join_dep *outer_join_dep; /* Innermost eliminable outer join we're in */
};
/*
- An outer join nest.
- - Depends on all tables inside it.
- - (And that's it).
+ An outer join nest that is subject to elimination
+ - it depends on all tables inside it
+ - has its parent outer join as dependency
*/
class Outer_join_dep: public Func_dep
{
@@ -171,14 +174,27 @@
{
type= Func_dep::FD_OUTER_JOIN;
}
+ /*
+ Outer join we're representing. This can be a join nest or a one table that
+ is outer join'ed.
+ */
TABLE_LIST *table_list;
+
+ /*
+ Tables within this outer join (and its descendants) that are not yet known
+ to be functionally dependent.
+ */
table_map missing_tables;
+ /* All tables within this outer join and its descendants */
table_map all_tables;
+ /* Parent eliminable outer join, if any */
Outer_join_dep *parent;
};
-/* TODO need this? */
+/*
+ Table elimination context
+*/
class Table_elimination
{
public:
@@ -204,20 +220,22 @@
static
-void build_funcdeps_for_cond(Table_elimination *te, Equality_dep **fdeps,
- uint *and_level, Item *cond,
- table_map usable_tables);
+void build_eq_deps_for_cond(Table_elimination *te, Equality_dep **fdeps,
+ uint *and_level, Item *cond,
+ table_map usable_tables);
static
-void add_funcdep(Table_elimination *te,
- Equality_dep **eq_dep, uint and_level,
- Item_func *cond, Field *field,
- bool eq_func, Item **value,
- uint num_values, table_map usable_tables);
+void add_eq_dep(Table_elimination *te,
+ Equality_dep **eq_dep, uint and_level,
+ Item_func *cond, Field *field,
+ bool eq_func, Item **value,
+ uint num_values, table_map usable_tables);
static
Equality_dep *merge_func_deps(Equality_dep *start, Equality_dep *new_fields,
Equality_dep *end, uint and_level);
-Field_dep *get_field_dep(Table_elimination *te, Field *field);
+static Table_dep *get_table_dep(Table_elimination *te, TABLE *table);
+static Field_dep *get_field_dep(Table_elimination *te, Field *field);
+
void eliminate_tables(JOIN *join);
static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl);
@@ -228,24 +246,25 @@
/*******************************************************************************************/
/*
- Produce FUNC_DEP elements for the given item (i.e. condition) and add them
- to fdeps array.
+ Produce Eq_dep elements for given condition.
SYNOPSIS
- build_funcdeps_for_cond()
- fdeps INOUT Put created FUNC_DEP structures here
-
+ build_eq_deps_for_cond()
+ te Table elimination context
+ fdeps INOUT Put produced equality conditions here
+ and_level INOUT AND-level (like in add_key_fields)
+ cond Condition to process
+ usable_tables Tables which fields we're interested in. That is,
+ Equality_dep represent "tbl.col=expr" and we'll
+ produce them only if tbl is in usable_tables.
DESCRIPTION
- a
-
- SEE ALSO
- add_key_fields()
-
+ This function is modeled after add_key_fields()
*/
+
static
-void build_funcdeps_for_cond(Table_elimination *te,
- Equality_dep **fdeps, uint *and_level, Item *cond,
- table_map usable_tables)
+void build_eq_deps_for_cond(Table_elimination *te, Equality_dep **fdeps,
+ uint *and_level, Item *cond,
+ table_map usable_tables)
{
if (cond->type() == Item_func::COND_ITEM)
{
@@ -258,7 +277,7 @@
Item *item;
while ((item=li++))
{
- build_funcdeps_for_cond(te, fdeps, and_level, item, usable_tables);
+ build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables);
}
/*
TODO: inject here a "if we have {t.col=const AND t.col=smth_else}, then
@@ -270,13 +289,13 @@
else
{
(*and_level)++;
- build_funcdeps_for_cond(te, fdeps, and_level, li++, usable_tables);
+ build_eq_deps_for_cond(te, fdeps, and_level, li++, usable_tables);
Item *item;
while ((item=li++))
{
Equality_dep *start_key_fields= *fdeps;
(*and_level)++;
- build_funcdeps_for_cond(te, fdeps, and_level, item, usable_tables);
+ build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables);
*fdeps= merge_func_deps(org_key_fields, start_key_fields, *fdeps,
++(*and_level));
}
@@ -304,11 +323,11 @@
values--;
DBUG_ASSERT(cond_func->functype() != Item_func::IN_FUNC ||
cond_func->argument_count() != 2);
- add_funcdep(te, fdeps, *and_level, cond_func,
- ((Item_field*)(cond_func->key_item()->real_item()))->field,
- 0, values,
- cond_func->argument_count()-1,
- usable_tables);
+ add_eq_dep(te, fdeps, *and_level, cond_func,
+ ((Item_field*)(cond_func->key_item()->real_item()))->field,
+ 0, values,
+ cond_func->argument_count()-1,
+ usable_tables);
}
if (cond_func->functype() == Item_func::BETWEEN)
{
@@ -321,8 +340,8 @@
!(cond_func->arguments()[i]->used_tables() & OUTER_REF_TABLE_BIT))
{
field_item= (Item_field *) (cond_func->arguments()[i]->real_item());
- add_funcdep(te, fdeps, *and_level, cond_func,
- field_item->field, 0, values, 1, usable_tables);
+ add_eq_dep(te, fdeps, *and_level, cond_func,
+ field_item->field, 0, values, 1, usable_tables);
}
}
}
@@ -336,19 +355,19 @@
if (cond_func->arguments()[0]->real_item()->type() == Item::FIELD_ITEM &&
!(cond_func->arguments()[0]->used_tables() & OUTER_REF_TABLE_BIT))
{
- add_funcdep(te, fdeps, *and_level, cond_func,
- ((Item_field*)(cond_func->arguments()[0])->real_item())->field,
- equal_func,
- cond_func->arguments()+1, 1, usable_tables);
+ add_eq_dep(te, fdeps, *and_level, cond_func,
+ ((Item_field*)(cond_func->arguments()[0])->real_item())->field,
+ equal_func,
+ cond_func->arguments()+1, 1, usable_tables);
}
if (cond_func->arguments()[1]->real_item()->type() == Item::FIELD_ITEM &&
cond_func->functype() != Item_func::LIKE_FUNC &&
!(cond_func->arguments()[1]->used_tables() & OUTER_REF_TABLE_BIT))
{
- add_funcdep(te, fdeps, *and_level, cond_func,
- ((Item_field*)(cond_func->arguments()[1])->real_item())->field,
- equal_func,
- cond_func->arguments(),1,usable_tables);
+ add_eq_dep(te, fdeps, *and_level, cond_func,
+ ((Item_field*)(cond_func->arguments()[1])->real_item())->field,
+ equal_func,
+ cond_func->arguments(),1,usable_tables);
}
break;
}
@@ -360,10 +379,10 @@
Item *tmp=new Item_null;
if (unlikely(!tmp)) // Should never be true
return;
- add_funcdep(te, fdeps, *and_level, cond_func,
- ((Item_field*)(cond_func->arguments()[0])->real_item())->field,
- cond_func->functype() == Item_func::ISNULL_FUNC,
- &tmp, 1, usable_tables);
+ add_eq_dep(te, fdeps, *and_level, cond_func,
+ ((Item_field*)(cond_func->arguments()[0])->real_item())->field,
+ cond_func->functype() == Item_func::ISNULL_FUNC,
+ &tmp, 1, usable_tables);
}
break;
case Item_func::OPTIMIZE_EQUAL:
@@ -380,8 +399,8 @@
*/
while ((item= it++))
{
- add_funcdep(te, fdeps, *and_level, cond_func, item->field,
- TRUE, &const_item, 1, usable_tables);
+ add_eq_dep(te, fdeps, *and_level, cond_func, item->field,
+ TRUE, &const_item, 1, usable_tables);
}
}
else
@@ -400,8 +419,8 @@
{
if (!field->eq(item->field))
{
- add_funcdep(te, fdeps, *and_level, cond_func, field/*item*/,
- TRUE, (Item **) &item, 1, usable_tables);
+ add_eq_dep(te, fdeps, *and_level, cond_func, field,
+ TRUE, (Item **) &item, 1, usable_tables);
}
}
it.rewind();
@@ -411,15 +430,19 @@
}
}
+
/*
- Perform an OR operation on two (adjacent) FUNC_DEP arrays.
+ Perform an OR operation on two (adjacent) Equality_dep arrays.
SYNOPSIS
merge_func_deps()
+ start Start of left OR-part
+ new_fields Start of right OR-part
+ end End of right OR-part
+ and_level AND-level.
DESCRIPTION
-
- This function is invoked for two adjacent arrays of FUNC_DEP elements:
+ This function is invoked for two adjacent arrays of Equality_dep elements:
$LEFT_PART $RIGHT_PART
+-----------------------+-----------------------+
@@ -527,17 +550,18 @@
/*
- Add a funcdep for a given equality.
+ Add an Equality_dep element for a given predicate, if applicable
+
+ DESCRIPTION
+ This function is modeled after add_key_field().
*/
static
-void add_funcdep(Table_elimination *te,
- Equality_dep **eq_dep, uint and_level,
- Item_func *cond, Field *field,
- bool eq_func, Item **value,
- uint num_values, table_map usable_tables)
+void add_eq_dep(Table_elimination *te, Equality_dep **eq_dep,
+ uint and_level, Item_func *cond, Field *field,
+ bool eq_func, Item **value, uint num_values,
+ table_map usable_tables)
{
- // Field *field= item_field->field;
if (!(field->table->map & usable_tables))
return;
@@ -606,7 +630,11 @@
}
-Table_dep *get_table_dep(Table_elimination *te, TABLE *table)
+/*
+ Get a Table_dep object for the given table, creating it if necessary.
+*/
+
+static Table_dep *get_table_dep(Table_elimination *te, TABLE *table)
{
Table_dep *tbl_dep= new Table_dep(table);
Key_dep **key_list= &(tbl_dep->keys);
@@ -625,19 +653,21 @@
return te->table_deps[table->tablenr] = tbl_dep;
}
+
/*
- Given a field, get its dependency element: if it already exists, find it,
- otherwise create it.
+ Get a Field_dep object for the given field, creating it if necessary
*/
-Field_dep *get_field_dep(Table_elimination *te, Field *field)
+static Field_dep *get_field_dep(Table_elimination *te, Field *field)
{
TABLE *table= field->table;
Table_dep *tbl_dep;
+ /* First, get the table*/
if (!(tbl_dep= te->table_deps[table->tablenr]))
tbl_dep= get_table_dep(te, table);
-
+
+ /* Try finding the field in field list */
Field_dep **pfield= &(tbl_dep->fields);
while (*pfield && (*pfield)->field->field_index < field->field_index)
{
@@ -646,20 +676,34 @@
if (*pfield && (*pfield)->field->field_index == field->field_index)
return *pfield;
+ /* Create the field and insert it in the list */
Field_dep *new_field= new Field_dep(tbl_dep, field);
-
new_field->next_table_field= *pfield;
*pfield= new_field;
+
return new_field;
}
+/*
+ Create an Outer_join_dep object for the given outer join
+
+ DESCRIPTION
+ Outer_join_dep objects for children (or further descendants) are always
+ created before the parents.
+*/
+
+static
Outer_join_dep *get_outer_join_dep(Table_elimination *te,
TABLE_LIST *outer_join, table_map deps_map)
{
Outer_join_dep *oj_dep;
oj_dep= new Outer_join_dep(outer_join, deps_map);
-
+
+ /*
+ Collect a bitmap of tables that we depend on, and also set parent pointer
+ for descendant outer join elements.
+ */
Table_map_iterator it(deps_map);
int idx;
while ((idx= it.next_bit()) != Table_map_iterator::BITMAP_END)
@@ -667,6 +711,11 @@
Table_dep *table_dep;
if (!(table_dep= te->table_deps[idx]))
{
+ /*
+ We get here only when the ON expression had no references to inner
+ tables and Table_dep objects weren't created for them. This is a rare,
+ unimportant case, so a not very efficient search is acceptable.
+ */
TABLE *table= NULL;
for (TABLE_LIST *tlist= te->join->select_lex->leaf_tables; tlist;
tlist=tlist->next_leaf)
@@ -680,7 +729,13 @@
DBUG_ASSERT(table);
table_dep= get_table_dep(te, table);
}
-
+
+ /*
+ Walk from the table up through its embedding outer joins. The goal
+ is to find the outermost outer join nest that already has an
+ Outer_join_dep and set its parent pointer to the newly created
+ Outer_join_dep.
+ */
if (!table_dep->outer_join_dep)
table_dep->outer_join_dep= oj_dep;
else
@@ -690,43 +745,35 @@
oj= oj->parent;
oj->parent=oj_dep;
}
-
}
return oj_dep;
}
/*
- Perform table elimination in a given join list
+ Build functional dependency graph for elements of given join list
SYNOPSIS
collect_funcdeps_for_join_list()
- te Table elimination context.
- join_list Join list to work on
- its_outer_join TRUE <=> the join_list is an inner side of an
- outer join
- FALSE <=> otherwise (this is top-level join
- list, simplify_joins flattens out all
- other kinds of join lists)
-
- tables_in_list Bitmap of tables embedded in the join_list.
- tables_used_elsewhere Bitmap of tables that are referred to from
- somewhere outside of the join list (e.g.
- select list, HAVING, etc).
+ te Table elimination context.
+ join_list Join list to work on
+ build_eq_deps TRUE <=> build Equality_dep elements for all
+ members of the join list, even if they cannot
+ be individually eliminated
+ tables_used_elsewhere Bitmap of tables that are referred to from
+ somewhere outside of this join list (e.g.
+ select list, HAVING, ON expressions of parent
+ joins, etc).
+ eliminable_tables INOUT Tables that can potentially be eliminated
+ (needed so we know which tables to build
+ dependencies for)
+ eq_dep INOUT End of array of equality dependencies.
DESCRIPTION
- Perform table elimination for a join list.
- Try eliminating children nests first.
- The "all tables in join nest can produce only one matching record
- combination" property checking is modeled after constant table detection,
- plus we reuse info attempts to eliminate child join nests.
-
- RETURN
- Number of children left after elimination. 0 means everything was
- eliminated.
+ .
*/
-static uint
+static void
collect_funcdeps_for_join_list(Table_elimination *te,
List<TABLE_LIST> *join_list,
bool build_eq_deps,
@@ -771,7 +818,7 @@
{
// build comp_cond from ON expression
uint and_level=0;
- build_funcdeps_for_cond(te, eq_dep, &and_level, tbl->on_expr,
+ build_eq_deps_for_cond(te, eq_dep, &and_level, tbl->on_expr,
*eliminable_tables);
}
@@ -781,19 +828,13 @@
tables_used_on_left |= tbl->on_expr->used_tables();
}
}
- return 0;
+ return;
}
+
/*
- Analyze exising FUNC_DEP array and add elements for tables and uniq keys
-
- SYNOPSIS
-
- DESCRIPTION
- Add FUNC_DEP elements
-
- RETURN
- .
+ This is used to analyse expressions in "tbl.col=expr" dependencies so
+ that we can figure out which fields the expression depends on.
*/
class Field_dependency_setter : public Field_enumerator
@@ -819,20 +860,41 @@
return;
}
}
- /* We didn't find the field. Bump the dependency anyway */
+ /*
+ We get here if we didn't find this field. It's not a part of
+ a unique key, and/or there is no field=expr element for it.
+ Bump the dependency anyway, this will signal that this dependency
+ cannot be satisfied.
+ */
te->equality_deps[expr_offset].unknown_args++;
}
}
+
Table_elimination *te;
- uint expr_offset; /* Offset of the expression we're processing */
+ /* Offset of the expression we're processing in the dependency bitmap */
+ uint expr_offset;
};
+/*
+ Setup equality dependencies
+
+ SYNOPSIS
+ setup_equality_deps()
+ te Table elimination context
+ bound_deps_list OUT Start of linked list of elements that were found to
+ be bound (the caller will use this to see whether
+ further elements can be declared bound)
+*/
+
static
bool setup_equality_deps(Table_elimination *te, Func_dep **bound_deps_list)
{
DBUG_ENTER("setup_equality_deps");
+ /*
+ Count Field_dep objects and assign each of them a unique bitmap_offset.
+ */
uint offset= 0;
for (Table_dep **tbl_dep=te->table_deps;
tbl_dep < te->table_deps + MAX_TABLES;
@@ -859,7 +921,10 @@
bitmap_clear_all(&te->expr_deps);
/*
- Walk through all field=expr elements and collect all fields.
+ Analyze all "field=expr" dependencies, and have te->expr_deps encode
+ dependencies of expressions on fields.
+
+ Also collect a linked list of equalities that are bound.
*/
Func_dep *bound_dep= NULL;
Field_dependency_setter deps_setter(te);
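The find-or-create step that get_field_dep() performs in the diff above (walk a list kept sorted by field_index, return the element if present, otherwise splice a new one in) can be illustrated in isolation. This is only a minimal sketch under assumptions: FieldNode and get_or_insert are hypothetical names, and plain new stands in for the Sql_alloc-based allocation used by the server.

```cpp
#include <cassert>

// Hypothetical stand-in for Field_dep: one node per field, linked in
// ascending field_index order (like next_table_field in the diff).
struct FieldNode {
  unsigned field_index;
  FieldNode *next;
};

// Return the node for field_index, creating and linking it if it is missing.
// Walking with a pointer-to-pointer means the insert needs no special case
// for the head of the list.
FieldNode *get_or_insert(FieldNode **head, unsigned field_index) {
  FieldNode **p = head;
  while (*p && (*p)->field_index < field_index)
    p = &(*p)->next;
  if (*p && (*p)->field_index == field_index)
    return *p;                       // already present
  FieldNode *n = new FieldNode{field_index, *p};
  *p = n;                            // splice in before the first larger index
  return n;
}
```

The nodes are never freed here; in the server they would live in the statement's memory arena instead.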
[Maria-developers] Rev 2733: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2733
revision-id: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc
parent: psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 23:10:53 +0400
message:
MWL#17: Table elimination
- Better comments
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-13 09:36:13 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-13 19:10:53 +0000
@@ -20,19 +20,16 @@
OVERVIEW
The module has one entry point - eliminate_tables() function, which one
- needs to call (once) sometime after update_ref_and_keys() but before the
- join optimization.
+ needs to call (once) at some point before the join optimization.
eliminate_tables() operates over the JOIN structures. Logically, it
removes the right sides of outer join nests. Physically, it changes the
following members:
* Eliminated tables are marked as constant and moved to the front of the
join order.
+
* In addition to this, they are recorded in JOIN::eliminated_tables bitmap.
- * All join nests have their NESTED_JOIN::n_tables updated to discount
- the eliminated tables
-
* Items that became disused because they were in the ON expression of an
eliminated outer join are notified by means of the Item tree walk which
calls Item::mark_as_eliminated_processor for every item
@@ -40,26 +37,13 @@
Item_subselect with its Item_subselect::eliminated flag which is used
by EXPLAIN code to check if the subquery should be shown in EXPLAIN.
- Table elimination is redone on every PS re-execution. (TODO reasons?)
+ Table elimination is redone on every PS re-execution.
*/
+
/*
- A structure that represents a functional dependency of something over
- something else. This can be one of:
-
- 1. A "tbl.field = expr" equality. The field depends on the expression.
-
- 2. An Item_equal(...) multi-equality. Each participating field depends on
- every other participating field. (TODO???)
-
- 3. A UNIQUE_KEY(field1, field2, fieldN). The key depends on the fields that
- it is composed of.
-
- 4. A table (which is within an outer join nest). Table depends on a unique
- key (value of a unique key identifies a table record)
-
- 5. An outer join nest. It depends on all tables it contains.
-
+ An abstract structure that represents some entity that depends on
+ some other entity.
*/
class Func_dep : public Sql_alloc
@@ -73,9 +57,14 @@
FD_UNIQUE_KEY,
FD_TABLE,
FD_OUTER_JOIN
- } type;
- Func_dep *next;
- bool bound;
+ } type; /* Type of the object */
+
+ /*
+ Used to make a linked list of elements that became bound and thus can
+ make elements that depend on them bound, too.
+ */
+ Func_dep *next;
+ bool bound; /* TRUE<=> The entity is considered bound */
Func_dep() : next(NULL), bound(FALSE) {}
};
@@ -84,10 +73,10 @@
class Table_dep;
class Outer_join_dep;
+
/*
- An equality
- - Depends on multiple fields (those in its expression), unknown_args is a
- counter of unsatisfied dependencies.
+ A "tbl.column= expr" equality dependency. tbl.column depends on fields
+ used in expr.
*/
class Equality_dep : public Func_dep
{
@@ -95,8 +84,11 @@
Field_dep *field;
Item *val;
- uint level; /* Used during condition analysis only */
- uint unknown_args; /* Number of yet unknown arguments */
+ /* Used during condition analysis only, similar to KEYUSE::level */
+ uint level;
+
+ /* Number of fields referenced from *val that are not yet 'bound' */
+ uint unknown_args;
};
@@ -139,7 +131,7 @@
type= Func_dep::FD_UNIQUE_KEY;
}
Table_dep *table; /* Table this key is from */
- uint keyno; // TODO do we care about this
+ uint keyno;
uint n_missing_keyparts;
Key_dep *next_table_key;
};
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-08-13 09:24:02 +0000
+++ b/sql/sql_select.cc 2009-08-13 19:10:53 +0000
@@ -114,7 +114,7 @@
COND *conds, bool top);
static bool check_interleaving_with_nj(JOIN_TAB *next);
static void restore_prev_nj_state(JOIN_TAB *last);
-static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list);
+static uint reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list);
static uint build_bitmap_for_nested_joins(List<TABLE_LIST> *join_list,
uint first_unused);
@@ -8791,23 +8791,26 @@
tables which will be ignored.
*/
-static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list)
+static uint reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list)
{
List_iterator<TABLE_LIST> li(*join_list);
TABLE_LIST *table;
DBUG_ENTER("reset_nj_counters");
+ uint n=0;
while ((table= li++))
{
NESTED_JOIN *nested_join;
if ((nested_join= table->nested_join))
{
nested_join->counter= 0;
- nested_join->n_tables= my_count_bits(nested_join->used_tables &
- ~join->eliminated_tables);
- reset_nj_counters(join, &nested_join->join_list);
+ //nested_join->n_tables= my_count_bits(nested_join->used_tables &
+ // ~join->eliminated_tables);
+ nested_join->n_tables= reset_nj_counters(join, &nested_join->join_list);
}
+ if (table->table && (table->table->map & ~join->eliminated_tables))
+ n++;
}
- DBUG_VOID_RETURN;
+ DBUG_RETURN(n);
}
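The change to reset_nj_counters() above replaces a bit count over used_tables with a recursive count of the tables still present in each join list. A minimal sketch of that recursion follows, assuming simplified stand-in types: JoinElem and reset_counters are hypothetical names, not the server's TABLE_LIST / NESTED_JOIN API. As in the diff, only base tables that sit directly in a list are counted towards that list's total.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a join list member: either a base table
// (table_map has one bit set) or a join nest (nested is non-empty).
struct JoinElem {
  std::uint64_t table_map = 0;
  std::vector<JoinElem> nested;
  unsigned n_tables = 0;   // for nests: tables left after elimination
};

// Set n_tables for every nest in the list and return the number of
// non-eliminated base tables that are direct members of the list.
unsigned reset_counters(std::vector<JoinElem> &list, std::uint64_t eliminated) {
  unsigned n = 0;
  for (JoinElem &e : list) {
    if (!e.nested.empty())
      e.n_tables = reset_counters(e.nested, eliminated);
    if (e.table_map && (e.table_map & ~eliminated))
      n++;
  }
  return n;
}
```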
[Maria-developers] Rev 2732: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2732
revision-id: psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq
parent: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:36:13 +0400
message:
MWL#17: Table elimination
Fixes after post-review fixes:
- Don't search for tables in the JOIN_TAB array; it's not initialized yet.
Use select_lex->leaf_tables instead.
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-13 00:01:43 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-13 09:36:13 +0000
@@ -676,16 +676,12 @@
if (!(table_dep= te->table_deps[idx]))
{
TABLE *table= NULL;
- /*
- Locate and create the table. The search isnt very efficient but
- typically we won't get here as we process the ON expression first
- and that will create the Table_dep
- */
- for (uint i= 0; i < te->join->tables; i++)
+ for (TABLE_LIST *tlist= te->join->select_lex->leaf_tables; tlist;
+ tlist=tlist->next_leaf)
{
- if (te->join->join_tab[i].table->tablenr == (uint)idx)
+ if (tlist->table->tablenr == (uint)idx)
{
- table= te->join->join_tab[i].table;
+ table=tlist->table;
break;
}
}
[Maria-developers] Rev 2731: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2731
revision-id: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
parent: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:24:02 +0400
message:
MWL#17: Table elimination
- Post-postreview changes fix: Do set NESTED_JOIN::n_tables to number of
tables left after elimination.
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-08-12 23:43:02 +0000
+++ b/sql/sql_select.cc 2009-08-13 09:24:02 +0000
@@ -114,7 +114,7 @@
COND *conds, bool top);
static bool check_interleaving_with_nj(JOIN_TAB *next);
static void restore_prev_nj_state(JOIN_TAB *last);
-static void reset_nj_counters(List<TABLE_LIST> *join_list);
+static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list);
static uint build_bitmap_for_nested_joins(List<TABLE_LIST> *join_list,
uint first_unused);
@@ -1011,7 +1011,7 @@
DBUG_RETURN(1);
}
- reset_nj_counters(join_list);
+ reset_nj_counters(this, join_list);
make_outerjoin_info(this);
/*
@@ -4625,7 +4625,7 @@
DBUG_ENTER("choose_plan");
join->cur_embedding_map= 0;
- reset_nj_counters(join->join_list);
+ reset_nj_counters(join, join->join_list);
/*
if (SELECT_STRAIGHT_JOIN option is set)
reorder tables so dependent tables come after tables they depend
@@ -8791,7 +8791,7 @@
tables which will be ignored.
*/
-static void reset_nj_counters(List<TABLE_LIST> *join_list)
+static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list)
{
List_iterator<TABLE_LIST> li(*join_list);
TABLE_LIST *table;
@@ -8802,7 +8802,9 @@
if ((nested_join= table->nested_join))
{
nested_join->counter= 0;
- reset_nj_counters(&nested_join->join_list);
+ nested_join->n_tables= my_count_bits(nested_join->used_tables &
+ ~join->eliminated_tables);
+ reset_nj_counters(join, &nested_join->join_list);
}
}
DBUG_VOID_RETURN;
[Maria-developers] Rev 2730: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2730
revision-id: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
parent: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 04:01:43 +0400
message:
MWL#17: Table elimination
- When making inferences "field is bound" -> "key is bound", do check
that the field is part of the key
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-12 23:43:02 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-13 00:01:43 +0000
@@ -1043,7 +1043,8 @@
DBUG_PRINT("info", ("key %s.%s is now bound",
key_dep->table->table->alias,
key_dep->table->table->key_info[key_dep->keyno].name));
- if (!key_dep->bound)
+ if (field_dep->field->part_of_key.is_set(key_dep->keyno) &&
+ !key_dep->bound)
{
if (!--key_dep->n_missing_keyparts)
{
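The fix in this revision is that a newly bound field may only reduce a key's count of missing key parts when the field actually belongs to that key. A rough sketch of that inference step, with hypothetical simplified types (FieldDep, KeyDep and field_bound_updates_key are illustration names, and std::bitset stands in for the server's key_map):

```cpp
#include <bitset>
#include <cassert>

// Hypothetical stand-ins for Field_dep and Key_dep.
struct FieldDep {
  std::bitset<64> part_of_key;   // bit k set <=> the field is a part of key k
};
struct KeyDep {
  unsigned keyno = 0;
  unsigned n_missing_keyparts = 0;
  bool bound = false;
};

// Called when field f has become bound. Returns true if this made key k
// bound, i.e. f was the last missing part of k.
bool field_bound_updates_key(const FieldDep &f, KeyDep &k) {
  if (!f.part_of_key.test(k.keyno) || k.bound)
    return false;                // unrelated field, or nothing left to do
  if (--k.n_missing_keyparts == 0) {
    k.bound = true;
    return true;
  }
  return false;
}
```

Without the part_of_key test, any bound field of the table would decrement the counter, and a key could be declared bound although some of its own parts were still unknown.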
[Maria-developers] Rev 2729: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2729
revision-id: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
parent: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 03:43:02 +0400
message:
MWL#17: Table elimination
- Continue addressing review feedback: remove "unusable KEYUSEs"
extension as it is no longer needed.
=== modified file 'sql/item.h'
--- a/sql/item.h 2009-08-12 22:34:21 +0000
+++ b/sql/item.h 2009-08-12 23:43:02 +0000
@@ -1017,18 +1017,6 @@
bool eq_by_collation(Item *item, bool binary_cmp, CHARSET_INFO *cs);
};
-#if 0
-typedef struct
-{
- TABLE *table; /* Table of interest */
- uint keyno; /* Index of interest */
- uint forbidden_part; /* key part which one is not allowed to refer to */
- /* [Set by processor] used tables, besides the table of interest */
- table_map used_tables;
- /* [Set by processor] Parts of index of interest that expression refers to */
- uint needed_key_parts;
-} Field_processor_info;
-#endif
/* Data for Item::check_column_usage_processor */
class Field_enumerator
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-12 22:34:21 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-12 23:43:02 +0000
@@ -1119,7 +1119,6 @@
case Func_dep::FD_OUTER_JOIN:
{
Outer_join_dep *outer_join_dep= (Outer_join_dep*)bound_dep;
- /* TODO what do here? Stop if eliminated the top-level? */
mark_as_eliminated(te.join, outer_join_dep->table_list);
Outer_join_dep *parent= outer_join_dep->parent;
if (parent &&
@@ -1236,38 +1235,6 @@
#endif
-/***********************************************************************************************/
-
-#if 0
-static void dbug_print_fdep(FUNC_DEP *fd)
-{
- switch (fd->type) {
- case FUNC_DEP::FD_OUTER_JOIN:
- {
- fprintf(DBUG_FILE, "outer_join(");
- if (fd->table_list->nested_join)
- {
- bool first= TRUE;
- List_iterator<TABLE_LIST> it(fd->table_list->nested_join->join_list);
- TABLE_LIST *tbl;
- while ((tbl= it++))
- {
- fprintf(DBUG_FILE, "%s%s", first?"":" ",
- tbl->table? tbl->table->alias : "...");
- first= FALSE;
- }
- fprintf(DBUG_FILE, ")");
- }
- else
- fprintf(DBUG_FILE, "%s", fd->table_list->table->alias);
- fprintf(DBUG_FILE, ")");
- break;
- }
- }
-}
-
-#endif
-
/**
@} (end of group Table_Elimination)
*/
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-06-30 13:11:00 +0000
+++ b/sql/sql_select.cc 2009-08-12 23:43:02 +0000
@@ -2474,7 +2474,6 @@
DBUG_RETURN(HA_POS_ERROR); /* This shouldn't happend */
}
-
/*
This structure is used to collect info on potentially sargable
predicates in order to check whether they become sargable after
@@ -2762,16 +2761,14 @@
{
start_keyuse=keyuse;
key=keyuse->key;
- if (keyuse->type == KEYUSE_USABLE)
- s->keys.set_bit(key); // QQ: remove this ?
+ s->keys.set_bit(key); // QQ: remove this ?
refs=0;
const_ref.clear_all();
eq_part.clear_all();
do
{
- if (keyuse->type == KEYUSE_USABLE &&
- keyuse->val->type() != Item::NULL_ITEM && !keyuse->optimize)
+ if (keyuse->val->type() != Item::NULL_ITEM && !keyuse->optimize)
{
if (!((~found_const_table_map) & keyuse->used_tables))
const_ref.set_bit(keyuse->keypart);
@@ -2971,9 +2968,11 @@
*/
bool null_rejecting;
bool *cond_guard; /* See KEYUSE::cond_guard */
- enum keyuse_type type; /* See KEYUSE::type */
} KEY_FIELD;
+/* Values in optimize */
+#define KEY_OPTIMIZE_EXISTS 1
+#define KEY_OPTIMIZE_REF_OR_NULL 2
/**
Merge new key definitions to old ones, remove those not used in both.
@@ -3064,18 +3063,13 @@
KEY_OPTIMIZE_REF_OR_NULL));
old->null_rejecting= (old->null_rejecting &&
new_fields->null_rejecting);
- /*
- The conditions are the same, hence their usabilities should
- be, too (TODO: shouldn't that apply to the above
- null_rejecting and optimize attributes?)
- */
- DBUG_ASSERT(old->type == new_fields->type);
}
}
else if (old->eq_func && new_fields->eq_func &&
old->val->eq_by_collation(new_fields->val,
old->field->binary(),
old->field->charset()))
+
{
old->level= and_level;
old->optimize= ((old->optimize & new_fields->optimize &
@@ -3084,15 +3078,10 @@
KEY_OPTIMIZE_REF_OR_NULL));
old->null_rejecting= (old->null_rejecting &&
new_fields->null_rejecting);
- // "t.key_col=const" predicates are always usable
- DBUG_ASSERT(old->type == KEYUSE_USABLE &&
- new_fields->type == KEYUSE_USABLE);
}
else if (old->eq_func && new_fields->eq_func &&
- ((new_fields->type == KEYUSE_USABLE &&
- old->val->const_item() && old->val->is_null()) ||
- ((old->type == KEYUSE_USABLE && new_fields->val->is_null()))))
- /* TODO ^ why is the above asymmetric, why const_item()? */
+ ((old->val->const_item() && old->val->is_null()) ||
+ new_fields->val->is_null()))
{
/* field = expression OR field IS NULL */
old->level= and_level;
@@ -3163,7 +3152,6 @@
table_map usable_tables, SARGABLE_PARAM **sargables)
{
uint exists_optimize= 0;
- bool optimizable=0;
if (!(field->flags & PART_KEY_FLAG))
{
// Don't remove column IS NULL on a LEFT JOIN table
@@ -3176,12 +3164,15 @@
else
{
table_map used_tables=0;
+ bool optimizable=0;
for (uint i=0; i<num_values; i++)
{
used_tables|=(value[i])->used_tables();
if (!((value[i])->used_tables() & (field->table->map | RAND_TABLE_BIT)))
optimizable=1;
}
+ if (!optimizable)
+ return;
if (!(usable_tables & field->table->map))
{
if (!eq_func || (*value)->type() != Item::NULL_ITEM ||
@@ -3194,8 +3185,7 @@
JOIN_TAB *stat=field->table->reginfo.join_tab;
key_map possible_keys=field->key_start;
possible_keys.intersect(field->table->keys_in_use_for_query);
- if (optimizable)
- stat[0].keys.merge(possible_keys); // Add possible keys
+ stat[0].keys.merge(possible_keys); // Add possible keys
/*
Save the following cases:
@@ -3288,7 +3278,6 @@
(*key_fields)->val= *value;
(*key_fields)->level= and_level;
(*key_fields)->optimize= exists_optimize;
- (*key_fields)->type= optimizable? KEYUSE_USABLE : KEYUSE_UNKNOWN;
/*
If the condition has form "tbl.keypart = othertbl.field" and
othertbl.field can be NULL, there will be no matches if othertbl.field
@@ -3600,7 +3589,6 @@
keyuse.optimize= key_field->optimize & KEY_OPTIMIZE_REF_OR_NULL;
keyuse.null_rejecting= key_field->null_rejecting;
keyuse.cond_guard= key_field->cond_guard;
- keyuse.type= key_field->type;
VOID(insert_dynamic(keyuse_array,(uchar*) &keyuse));
}
}
@@ -3609,6 +3597,7 @@
}
+#define FT_KEYPART (MAX_REF_PARTS+10)
static void
add_ft_keys(DYNAMIC_ARRAY *keyuse_array,
@@ -3667,7 +3656,6 @@
keyuse.used_tables=cond_func->key_item()->used_tables();
keyuse.optimize= 0;
keyuse.keypart_map= 0;
- keyuse.type= KEYUSE_USABLE;
VOID(insert_dynamic(keyuse_array,(uchar*) &keyuse));
}
@@ -3682,13 +3670,6 @@
return (int) (a->key - b->key);
if (a->keypart != b->keypart)
return (int) (a->keypart - b->keypart);
-
- // Usable ones go before the unusable
- int a_ok= test(a->type == KEYUSE_USABLE);
- int b_ok= test(b->type == KEYUSE_USABLE);
- if (a_ok != b_ok)
- return a_ok? -1 : 1;
-
// Place const values before other ones
if ((res= test((a->used_tables & ~OUTER_REF_TABLE_BIT)) -
test((b->used_tables & ~OUTER_REF_TABLE_BIT))))
@@ -3899,8 +3880,7 @@
found_eq_constant=0;
for (i=0 ; i < keyuse->elements-1 ; i++,use++)
{
- if (use->type == KEYUSE_USABLE && !use->used_tables &&
- use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
+ if (!use->used_tables && use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
use->table->const_key_parts[use->key]|= use->keypart_map;
if (use->keypart != FT_KEYPART)
{
@@ -3924,8 +3904,7 @@
/* Save ptr to first use */
if (!use->table->reginfo.join_tab->keyuse)
use->table->reginfo.join_tab->keyuse=save_pos;
- if (use->type == KEYUSE_USABLE)
- use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
+ use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
save_pos++;
}
i=(uint) (save_pos-(KEYUSE*) keyuse->buffer);
@@ -3955,7 +3934,7 @@
To avoid bad matches, we don't make ref_table_rows less than 100.
*/
keyuse->ref_table_rows= ~(ha_rows) 0; // If no ref
- if (keyuse->type == KEYUSE_USABLE && keyuse->used_tables &
+ if (keyuse->used_tables &
(map= (keyuse->used_tables & ~join->const_table_map &
~OUTER_REF_TABLE_BIT)))
{
@@ -4147,8 +4126,7 @@
if 1. expression doesn't refer to forward tables
2. we won't get two ref-or-null's
*/
- if (keyuse->type == KEYUSE_USABLE &&
- !(remaining_tables & keyuse->used_tables) &&
+ if (!(remaining_tables & keyuse->used_tables) &&
!(ref_or_null_part && (keyuse->optimize &
KEY_OPTIMIZE_REF_OR_NULL)))
{
@@ -5602,8 +5580,7 @@
*/
do
{
- if (!(~used_tables & keyuse->used_tables) &&
- keyuse->type == KEYUSE_USABLE)
+ if (!(~used_tables & keyuse->used_tables))
{
if (keyparts == keyuse->keypart &&
!(found_part_ref_or_null & keyuse->optimize))
@@ -5653,11 +5630,9 @@
uint i;
for (i=0 ; i < keyparts ; keyuse++,i++)
{
- while (keyuse->keypart != i || ((~used_tables) & keyuse->used_tables) ||
- !(keyuse->type == KEYUSE_USABLE))
- {
+ while (keyuse->keypart != i ||
+ ((~used_tables) & keyuse->used_tables))
keyuse++; /* Skip other parts */
- }
uint maybe_null= test(keyinfo->key_part[i].null_bit);
j->ref.items[i]=keyuse->val; // Save for cond removal
=== modified file 'sql/sql_select.h'
--- a/sql/sql_select.h 2009-06-30 13:11:00 +0000
+++ b/sql/sql_select.h 2009-08-12 23:43:02 +0000
@@ -28,45 +28,6 @@
#include "procedure.h"
#include <myisam.h>
-#define FT_KEYPART (MAX_REF_PARTS+10)
-/* Values in optimize */
-#define KEY_OPTIMIZE_EXISTS 1
-#define KEY_OPTIMIZE_REF_OR_NULL 2
-
-/* KEYUSE element types */
-enum keyuse_type
-{
- /*
- val refers to the same table, this is either KEYUSE_BIND or KEYUSE_NO_BIND
- type, we didn't determine which one yet.
- */
- KEYUSE_UNKNOWN= 0,
- /*
- 'regular' keyuse, i.e. it represents one of the following
- * t.keyXpartY = func(constants, other-tables)
- * t.keyXpartY IS NULL
- * t.keyXpartY = func(constants, other-tables) OR t.keyXpartY IS NULL
- and can be used to construct ref acces
- */
- KEYUSE_USABLE,
- /*
- The keyuse represents a condition in form:
-
- t.uniq_keyXpartY = func(other parts of uniq_keyX)
-
- This can't be used to construct uniq_keyX but we could use it to determine
- that the table will produce at most one match.
- */
- KEYUSE_BIND,
- /*
- Keyuse that's not usable for ref access and doesn't meet the criteria of
- KEYUSE_BIND. Examples:
- t.keyXpartY = func(t.keyXpartY)
- t.keyXpartY = func(column of t that's not covered by keyX)
- */
- KEYUSE_NO_BIND
-};
-
typedef struct keyuse_t {
TABLE *table;
Item *val; /**< or value if no field */
@@ -90,15 +51,6 @@
NULL - Otherwise (the source equality can't be turned off)
*/
bool *cond_guard;
- /*
- 1 <=> This keyuse can be used to construct key access.
- 0 <=> Otherwise. Currently unusable KEYUSEs represent equalities
- where one table column refers to another one, like this:
- t.keyXpartA=func(t.keyXpartB)
- This equality cannot be used for index access but is useful
- for table elimination.
- */
- enum keyuse_type type;
} KEYUSE;
class store_key;
@@ -258,7 +210,7 @@
JOIN *join;
/** Bitmap of nested joins this table is part of */
nested_join_map embedding_map;
-
+
void cleanup();
inline bool is_using_loose_index_scan()
{
[Maria-developers] Rev 2728: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09
At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2728
revision-id: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
parent: psergey(a)askmonty.org-20090708171038-9nyc3hcg1o7h8635
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 02:34:21 +0400
message:
MWL#17: Table elimination
Address review feedback:
- Change from Wave-based approach (a-la const table detection) to
building and walking functional dependency graph.
- Change from piggy-backing on ref-access code and KEYUSE structures
to using our own expression analyzer.
Diff too large for email (1602 lines, the limit is 1000).
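Since the diff itself is not included, here is a rough sketch of what "building and walking a functional dependency graph" can look like. It is an illustration under assumptions, not the code from this revision: Node and propagate_bound are hypothetical names, and every kind of element (equality, field, unique key, table, outer join) is reduced to a node with a counter of unsatisfied dependencies, in the spirit of unknown_args and n_missing_keyparts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph node. A node becomes bound once its counter of
// unsatisfied dependencies drops to zero.
struct Node {
  unsigned unknown_args = 0;
  bool bound = false;
  std::vector<std::size_t> dependents;   // nodes that depend on this one
};

// Start from the nodes that are known to be bound and keep propagating
// until no further node can be declared bound.
void propagate_bound(std::vector<Node> &g, std::vector<std::size_t> worklist) {
  for (std::size_t n : worklist)
    g[n].bound = true;
  while (!worklist.empty()) {
    std::size_t cur = worklist.back();
    worklist.pop_back();
    for (std::size_t d : g[cur].dependents) {
      Node &dep = g[d];
      if (dep.bound)
        continue;
      if (dep.unknown_args > 0 && --dep.unknown_args == 0) {
        dep.bound = true;
        worklist.push_back(d);
      }
    }
  }
}
```

A node that needs all of its inputs (a unique key needs every key part) starts with the number of inputs as its counter; a node that needs any one of them (a table is determined by any one bound unique key) can start with 1. Each node enters the worklist at most once, so the walk is linear in the number of edges, as opposed to repeated passes over all elements in a wave-based scheme.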
I have now implemented and installed in our Buildbot enhanced facilities for
dealing with compiler warnings.
We already have a file support-files/compiler_warnings.supp, which I think is
used by PushBuild @ MySQL. The new facilities in our Buildbot use the same
file to suppress certain warnings that for some reason cannot be removed or
are not desirable to remove.
See for example:
https://askmonty.org/buildbot/waterfall?branch=5.1
https://askmonty.org/buildbot/builders/hardy-amd64-valgrind/builds/113/step…
So there are still a few warnings that need to be eliminated, patches welcome :-)
Note that old builds from earlier than today still have the old log files,
without these new warning facilities.
Would be great to get us to compile without any warnings. The Drizzle people
already compile with -pedantic -Werror, so we are trailing behind there!
- Kristian.
[Maria-developers] Updated (by Guest): options for CREATE TABLE (43)
by worklog-noreply@askmonty.org 11 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: options for CREATE TABLE
CREATION DATE..: Tue, 11 Aug 2009, 17:02
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....: Sanja
COPIES TO......: Monty
CATEGORY.......: Server-BackLog
TASK ID........: 43 (http://askmonty.org/worklog/?tid=43)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 32 (hours remain)
ORIG. ESTIMATE.: 32
PROGRESS NOTES:
-=-=(Guest - Tue, 11 Aug 2009, 19:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.43.old.31856 2009-08-11 19:57:38.000000000 +0300
+++ /tmp/wklog.43.new.31856 2009-08-11 19:57:38.000000000 +0300
@@ -5,8 +5,43 @@
key key1(field) key_opt1=kval1 key_opt2=kval2)
table_option1=tval1, table_option2=tval2;
-Exclusion should be made for old table and key (KEY_BLOCK_SIZE) options where
+Exclusion should be made for old table and key options where
'=' was not obligatory.
+Old key options:
+KEY_BLOCK_SIZE <num> -> KEY_BLOCK_SIZE=num
+WITH PARSER <name> -> PARSER=name
+
+Old table options:
+ENGINE name -> ENGINE=name
+TYPE name -> TYPE=name
+MAX_ROWS num -> MAX_ROWS=num
+MIX_ROWS num -> MIX_ROWS=num
+AVG_ROW_LENGTH num -> AVG_ROW_LENGTH=num
+PASSWORD string -> PASSWORD=string
+COMMENT string -> COMMENT=string
+AUTO_INCREMENT num -> AUTO_INCREMENT=num
+PACK_KEYS num/default -> PACK_KEYS=num/default
+CHECKSUM num -> CHECKSUM=num
+TABLE_CHECKSUM num -> TABLE_CHECKSUM=num
+PAGE_CHECKSUM num -> PAGE_CHECKSUM=num
+DELAY_KEY_WRITE num -> DELAY_KEY_WRITE=num
+ROW_FORMAT name -> ROW_FORMAT=name
+INSERT_METHOD name -> INSERT_METHOD=name
+KEY_BLOCK_SIZE num -> KEY_BLOCK_SIZE=num
+TRANSACTIONAL num -> TRANSACTIONAL=num
+
+Table options which will be left hardcoded
+UNION
+default charset
+default collation
+DATA DIRECTORY
+TABLESPACE
+STORAGE
+
For fields, options can appear alongside field attributes (NOT NULL, UNIQUE and so on)
and can be distinguished from them by the '=' sign.
+
+
+
+
-=-=(Guest - Tue, 11 Aug 2009, 19:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.43.old.30883 2009-08-11 19:36:45.000000000 +0300
+++ /tmp/wklog.43.new.30883 2009-08-11 19:36:45.000000000 +0300
@@ -1 +1,12 @@
+A table definition can look like the following:
+CREATE TABLE table
+ (field int ... field_opt1=fval1 field_opt2=fval2,
+ key key1(field) key_opt1=kval1 key_opt2=kval2)
+ table_option1=tval1, table_option2=tval2;
+
+An exception should be made for old table and key (KEY_BLOCK_SIZE) options, where
+'=' was not obligatory.
+
+For fields, options can appear alongside field attributes (NOT NULL, UNIQUE and so on)
+and can be distinguished from them by the '=' sign.
DESCRIPTION:
Add the ability to create a table with additional options that can be passed to the engine.
Also make current options such as TRANSACTIONAL work via this mechanism.
HIGH-LEVEL SPECIFICATION:
A table definition can look like the following:
CREATE TABLE table
  (field int ... field_opt1=fval1 field_opt2=fval2,
  key key1(field) key_opt1=kval1 key_opt2=kval2)
  table_option1=tval1, table_option2=tval2;

An exception should be made for old table and key options, where
'=' was not obligatory.

Old key options:
KEY_BLOCK_SIZE <num> -> KEY_BLOCK_SIZE=num
WITH PARSER <name> -> PARSER=name

Old table options:
ENGINE name -> ENGINE=name
TYPE name -> TYPE=name
MAX_ROWS num -> MAX_ROWS=num
MIN_ROWS num -> MIN_ROWS=num
AVG_ROW_LENGTH num -> AVG_ROW_LENGTH=num
PASSWORD string -> PASSWORD=string
COMMENT string -> COMMENT=string
AUTO_INCREMENT num -> AUTO_INCREMENT=num
PACK_KEYS num/default -> PACK_KEYS=num/default
CHECKSUM num -> CHECKSUM=num
TABLE_CHECKSUM num -> TABLE_CHECKSUM=num
PAGE_CHECKSUM num -> PAGE_CHECKSUM=num
DELAY_KEY_WRITE num -> DELAY_KEY_WRITE=num
ROW_FORMAT name -> ROW_FORMAT=name
INSERT_METHOD name -> INSERT_METHOD=name
KEY_BLOCK_SIZE num -> KEY_BLOCK_SIZE=num
TRANSACTIONAL num -> TRANSACTIONAL=num

Table options which will be left hardcoded:
UNION
default charset
default collation
DATA DIRECTORY
TABLESPACE
STORAGE

For fields, options can appear alongside field attributes (NOT NULL, UNIQUE and so on)
and can be distinguished from them by the '=' sign.
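The rule that new options are written as name=value while a fixed set of old options may omit the '=' can be illustrated with a small sketch. This is not the server's parser: parse_table_options and legacy_options are hypothetical names, only a few of the old options are listed, and the sketch assumes upper-case names, no spaces around '=', and values without embedded spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical subset of the old options for which '=' stays optional.
static const std::set<std::string> legacy_options= {
  "ENGINE", "MAX_ROWS", "ROW_FORMAT", "CHECKSUM", "TRANSACTIONAL"
};

// Split an option list into name/value pairs.
// "NAME=value" is always accepted; "NAME value" only for legacy options.
std::map<std::string, std::string>
parse_table_options(const std::string &text)
{
  // Commas only separate options, so treat them as whitespace.
  std::string flat= text;
  for (char &c : flat)
    if (c == ',')
      c= ' ';

  std::istringstream in(flat);
  std::vector<std::string> tokens;
  for (std::string tok; in >> tok; )
    tokens.push_back(tok);

  std::map<std::string, std::string> options;
  for (std::size_t i= 0; i < tokens.size(); i++)
  {
    const std::string &tok= tokens[i];
    const std::size_t eq= tok.find('=');
    if (eq != std::string::npos)
      options[tok.substr(0, eq)]= tok.substr(eq + 1);   // NAME=value
    else if (legacy_options.count(tok) && i + 1 < tokens.size())
      options[tok]= tokens[++i];                        // old "NAME value"
    else
      throw std::invalid_argument("option needs '=': " + tok);
  }
  return options;
}
```

With this sketch, "ENGINE MyISAM, MAX_ROWS=100, my_opt=abc" yields three pairs, while "my_opt abc" is rejected because my_opt is not one of the old options.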
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): options for CREATE TABLE (43)
by worklog-noreply@askmonty.org 11 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: options for CREATE TABLE
CREATION DATE..: Tue, 11 Aug 2009, 17:02
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....: Sanja
COPIES TO......: Monty
CATEGORY.......: Server-BackLog
TASK ID........: 43 (http://askmonty.org/worklog/?tid=43)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 32 (hours remain)
ORIG. ESTIMATE.: 32
PROGRESS NOTES:
-=-=(Guest - Tue, 11 Aug 2009, 19:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.43.old.30883 2009-08-11 19:36:45.000000000 +0300
+++ /tmp/wklog.43.new.30883 2009-08-11 19:36:45.000000000 +0300
@@ -1 +1,12 @@
+A table definition can look like the following:
+CREATE TABLE table
+ (field int ... field_opt1=fval1 field_opt2=fval2,
+ key key1(field) key_opt1=kval1 key_opt2=kval2)
+ table_option1=tval1, table_option2=tval2;
+
+An exception should be made for old table and key (KEY_BLOCK_SIZE) options, where
+'=' was not obligatory.
+
+For fields, options can appear alongside field attributes (NOT NULL, UNIQUE and so on)
+and can be distinguished from them by the '=' sign.
DESCRIPTION:
Add the ability to create a table with additional options that can be passed to the engine.
Also make current options such as TRANSACTIONAL work via this mechanism.
HIGH-LEVEL SPECIFICATION:
A table definition can look like the following:
CREATE TABLE table
  (field int ... field_opt1=fval1 field_opt2=fval2,
  key key1(field) key_opt1=kval1 key_opt2=kval2)
  table_option1=tval1, table_option2=tval2;

An exception should be made for old table and key (KEY_BLOCK_SIZE) options, where
'=' was not obligatory.

For fields, options can appear alongside field attributes (NOT NULL, UNIQUE and so on)
and can be distinguished from them by the '=' sign.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
Hi!
I am copying to maria-developers@ to ensure that everyone has a chance
to answer...
>>>>> "Patrick" == Patrick Galbraith <patg(a)patg.net> writes:
Patrick> Monty,
Patrick> I saw your message in IRC - I replied in case you don't see it. I want
Patrick> to get this into the tree soon and am only having a small problem right now:
Patrick> [08:06] <CaptTofu> montywi: I am striving to
Patrick> [08:08] <CaptTofu> montywi: I just have one issue to solve - if the
Patrick> engine is built as a plugin, how can I get the test to run? Right now,
Patrick> when it runs, it doesn't find the engine loaded, so it skips the test. I
Patrick> tried to add a 'load plugin' to the test, but it can't find the shared
Patrick> library because it expects it to be in "(errno: 2
Patrick> dlopen(/Users/patg/code_devel/federated/lib/mysql/plugin/ha_federatedx.so,
Patrick> 2): image not found)"
We should probably try to fix that for the test suite.
Kristian, do you have any ideas for this ?
Patrick> So, I'm wondering if to test properly, one needs to compile the engine
Patrick> into the server versus as a plugin?
Yes, that is what you need to do (as far as I know).
Regards,
Monty
[Maria-developers] New (by Sanja): options for CREATE TABLE (43)
by worklog-noreply@askmonty.org 11 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: options for CREATE TABLE
CREATION DATE..: Tue, 11 Aug 2009, 17:02
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....: Sanja
COPIES TO......: Monty
CATEGORY.......: Server-BackLog
TASK ID........: 43 (http://askmonty.org/worklog/?tid=43)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 32 (hours remain)
ORIG. ESTIMATE.: 32
PROGRESS NOTES:
DESCRIPTION:
Add the ability to create a table with additional options that can be passed to the engine.
Also make current options such as TRANSACTIONAL work via this mechanism.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Monty): Backporting pool of threads to MariaDB (6)
by worklog-noreply@askmonty.org 11 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Backporting pool of threads to MariaDB
CREATION DATE..: Mon, 09 Mar 2009, 17:21
SUPERVISOR.....: Monty
IMPLEMENTOR....: Monty
COPIES TO......: Monty
CATEGORY.......: Server-Sprint
TASK ID........: 6 (http://askmonty.org/worklog/?tid=6)
VERSION........: Server-5.1
STATUS.........: Complete
PRIORITY.......: 60
WORKED HOURS...: 16
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 8
PROGRESS NOTES:
-=-=(Monty - Tue, 11 Aug 2009, 16:58)=-=-
Done, ages ago
Worked 16 hours and estimate 0 hours remain (original estimate increased by 8 hours).
-=-=(Monty - Thu, 26 Mar 2009, 00:32)=-=-
Privacy level updated.
--- /tmp/wklog.6.old.6586 2009-03-26 00:32:23.000000000 +0200
+++ /tmp/wklog.6.new.6586 2009-03-26 00:32:23.000000000 +0200
@@ -1 +1 @@
-y
+n
-=-=(Monty - Thu, 26 Mar 2009, 00:31)=-=-
Supervisor updated.
--- /tmp/wklog.6.old.6580 2009-03-26 00:31:30.000000000 +0200
+++ /tmp/wklog.6.new.6580 2009-03-26 00:31:30.000000000 +0200
@@ -1 +1 @@
-Knielsen
+Monty
-=-=(Monty - Fri, 13 Mar 2009, 02:43)=-=-
Low Level Design modified.
--- /tmp/wklog.6.old.26076 2009-03-13 02:43:17.000000000 +0200
+++ /tmp/wklog.6.new.26076 2009-03-13 02:43:17.000000000 +0200
@@ -1 +1,20 @@
+To be able to work with both one-thread-per-connection and pool-of-threads at
+the same time, I added a new global scheduler variable 'extra_thread_scheduler'
+that is always using the one-thread-per-connection method.
+
+A pointer to the 'scheduler' variable that should be used for this connection
+was added to the THD structure.
+
+To make it easy to handle two connection counters and two max_connection
+variables, I added pointers to these in the scheduler variable.
+
+Other changes were:
+
+- If extra-port is <> 0, start listening on this port too
+- At connect time, set THD->scheduler to point to the given scheduler (based on
+the port that was used to connect)
+- Change some calls that were done through function pointers in the scheduler to
+instead use thd->scheduler->
+- Change max_connections to *thd->scheduler->max_connections
+- Change connection_count to *thd->scheduler->connection_count
-=-=(Monty - Fri, 13 Mar 2009, 02:29)=-=-
Version updated.
--- /tmp/wklog.6.old.25818 2009-03-13 02:29:16.000000000 +0200
+++ /tmp/wklog.6.new.25818 2009-03-13 02:29:16.000000000 +0200
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Monty - Fri, 13 Mar 2009, 02:29)=-=-
Status updated.
--- /tmp/wklog.6.old.25818 2009-03-13 02:29:16.000000000 +0200
+++ /tmp/wklog.6.new.25818 2009-03-13 02:29:16.000000000 +0200
@@ -1 +1 @@
-Assigned
+Complete
-=-=(Monty - Fri, 13 Mar 2009, 02:28)=-=-
High Level Description modified.
--- /tmp/wklog.6.old.25790 2009-03-13 02:28:25.000000000 +0200
+++ /tmp/wklog.6.new.25790 2009-03-13 02:28:25.000000000 +0200
@@ -8,3 +8,6 @@
Add option --extra-port to allow connections with old one-thread-per-connection
method. This is needed to allow root to login and kill threads if something
goes wrong.
+Add option --extra-max-connections to regulate how many connections can be made
+to 'extra-port'. This should work in a similar way as 'max-connections', in the
+way that one connection is reserved for a SUPER user.
-=-=(Knielsen - Mon, 09 Mar 2009, 19:02)=-=-
Version updated.
--- /tmp/wklog.6.old.10740 2009-03-09 19:02:38.000000000 +0200
+++ /tmp/wklog.6.new.10740 2009-03-09 19:02:38.000000000 +0200
@@ -1 +1 @@
-WorkLog-3.4
+Server-9.x
-=-=(Knielsen - Mon, 09 Mar 2009, 19:02)=-=-
Title modified.
--- /tmp/wklog.6.old.10740 2009-03-09 19:02:38.000000000 +0200
+++ /tmp/wklog.6.new.10740 2009-03-09 19:02:38.000000000 +0200
@@ -1 +1 @@
-Backporting pool of threads tro MariaDB
+Backporting pool of threads to MariaDB
DESCRIPTION:
Back porting pool of threads to MariaDB
We will use code for Maria 6.0, with the following extensions:
Add option: --test-ignore-wrong-options to ignore errors in enum values for
testing pool-of-threads. (Better than having --pool-of-threads command line
option just for testing)
Add option --extra-port to allow connections with old one-thread-per-connection
method. This is needed to allow root to login and kill threads if something
goes wrong.
Add option --extra-max-connections to regulate how many connections can be made
to 'extra-port'. This should work in a similar way as 'max-connections', in the
way that one connection is reserved for a SUPER user.
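One reading of "one connection is reserved for a SUPER user" is that ordinary users are refused once the limit is reached, while a SUPER user may still take exactly one further slot. The sketch below shows that reading only. may_connect and its parameters are hypothetical names, and connection_count is assumed to be the number of connections already open before the new one.

```cpp
#include <cassert>

// Hypothetical admission check for one listening port.
// Ordinary users get up to max_connections slots; one extra slot
// beyond that is available to a SUPER user only.
bool may_connect(unsigned connection_count, unsigned max_connections,
                 bool is_super)
{
  if (connection_count < max_connections)
    return true;                       // still room for anybody
  return is_super && connection_count == max_connections;
}
```

The same check would apply to 'extra-port' with extra-max-connections in place of max-connections.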
LOW-LEVEL DESIGN:
To be able to work with both one-thread-per-connection and pool-of-threads at
the same time, I added a new global scheduler variable 'extra_thread_scheduler'
that is always using the one-thread-per-connection method.
A pointer to the 'scheduler' variable that should be used for this connection
was added to the THD structure.
To make it easy to handle two connection counters and two max_connection
variables, I added pointers to these in the scheduler variable.
Other changes were:
- If extra-port is <> 0, start listening on this port too
- At connect time, set THD->scheduler to point to the given scheduler (based on
the port that was used to connect)
- Change some calls that were done through function pointers in the scheduler to
instead use thd->scheduler->
- Change max_connections to *thd->scheduler->max_connections
- Change connection_count to *thd->scheduler->connection_count
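The design above can be sketched in a few lines. The structures here are simplified stand-ins, not the server's definitions: only the fields named in the notes are modelled, accept_connection is a hypothetical function, and the limits are arbitrary values.

```cpp
#include <cassert>

// Simplified stand-in: a scheduler carries pointers to its own counters.
struct scheduler_functions
{
  unsigned *connection_count;   // counter used by this scheduler
  unsigned *max_connections;    // limit used by this scheduler
};

// Simplified stand-in for the per-connection THD structure.
struct THD
{
  scheduler_functions *scheduler;   // scheduler serving this connection
};

// Two independent counter/limit pairs (values are arbitrary).
static unsigned connection_count= 0, max_connections= 100;
static unsigned extra_connection_count= 0, extra_max_connections= 1;

static scheduler_functions thread_scheduler=
  { &connection_count, &max_connections };
static scheduler_functions extra_thread_scheduler=
  { &extra_connection_count, &extra_max_connections };

// Choose the scheduler from the port the client used, then check and
// bump the counters through thd->scheduler only.
// Returns false when the chosen scheduler's limit is already reached.
bool accept_connection(THD *thd, unsigned port, unsigned extra_port)
{
  thd->scheduler= (extra_port != 0 && port == extra_port)
                    ? &extra_thread_scheduler : &thread_scheduler;
  if (*thd->scheduler->connection_count >= *thd->scheduler->max_connections)
    return false;
  ++*thd->scheduler->connection_count;
  return true;
}
```

Because every check goes through thd->scheduler, code that previously read the global max_connections and connection_count does not need to know which port a connection arrived on.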
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Monty): Add Sphinx storage engine to MariaDB (42)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add Sphinx storage engine to MariaDB
CREATION DATE..: Mon, 10 Aug 2009, 23:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Maria-BackLog
TASK ID........: 42 (http://askmonty.org/worklog/?tid=42)
VERSION........: Connector/.NET-5.1
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 16 (hours remain)
ORIG. ESTIMATE.: 16
PROGRESS NOTES:
DESCRIPTION:
Add the Sphinx storage engine to the MariaDB tree
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
Hi!
For those who don't know what this is about:
- This is a fix for the case where you do a DROP TABLE of a MyISAM
table with delayed key writes: MyISAM writes all the changed key pages for the
file to disk before closing and then deleting the table.
This patch is a first attempt to fix this so that we don't write the key pages
in case of a drop.
>>>>> "Oleksandr" == Oleksandr Byelkin <Oleksandr> writes:
Oleksandr> Hi!
Oleksandr> I made a different patch from the one you suggested. Maybe I am wrong, but
Oleksandr> IMHO it is better because:
Oleksandr> 1) it uses already existing keycache calls
Oleksandr> 2) it does not require additionally finding the table's MI_INFO by name and
Oleksandr> locking around it
Oleksandr> The main idea was that if we are going to drop a table, this can be passed
Oleksandr> via the existing table descriptors to the place where we call flush, and it
Oleksandr> does not matter if other threads are trying to do something with the table:
Oleksandr> we will drop it in any case.
Oleksandr> === modified file 'sql/handler.h'
Oleksandr> --- sql/handler.h 2009-06-29 21:03:30 +0000
Oleksandr> +++ sql/handler.h 2009-08-09 19:52:01 +0000
Oleksandr> @@ -1342,6 +1342,7 @@ public:
Oleksandr> virtual void column_bitmaps_signal();
Oleksandr> uint get_index(void) const { return active_index; }
Oleksandr> virtual int close(void)=0;
Oleksandr> + virtual void prepare_for_delete() {}
Oleksandr> /**
Oleksandr> @retval 0 Bulk update used by handler
Why use prepare_for_delete() instead of adding a more general call?
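As a rough illustration of the hook under discussion, the sketch below shows how a do-nothing virtual prepare_for_delete() in a base class lets one engine skip the key page flush on close. The classes are hypothetical miniatures, not the real handler or MyISAM code, and the counters stand in for the key cache.

```cpp
#include <cassert>

// Hypothetical miniature of a storage-engine handler interface.
class handler_sketch
{
public:
  virtual ~handler_sketch() {}
  virtual void prepare_for_delete() {}   // default: engines need not care
  virtual int close()= 0;
};

// Hypothetical engine with delayed key writes.
class myisam_sketch : public handler_sketch
{
public:
  int dirty_key_pages= 0;   // key pages changed but not yet on disk
  int pages_written= 0;     // pages that close() actually wrote
  bool deleting= false;     // set when the table is about to be dropped

  void prepare_for_delete() override { deleting= true; }

  int close() override
  {
    if (!deleting)
      pages_written+= dirty_key_pages;   // normal close: write them out
    dirty_key_pages= 0;                  // either way, nothing stays cached
    return 0;
  }
};
```

A table closed normally writes its dirty pages, while one that received prepare_for_delete() first discards them, which is the saving the patch is after.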
Oleksandr> === modified file 'sql/lock.cc'
Oleksandr> --- sql/lock.cc 2009-04-25 10:05:32 +0000
Oleksandr> +++ sql/lock.cc 2009-08-09 22:08:41 +0000
Oleksandr> @@ -1049,10 +1049,12 @@ int lock_table_name(THD *thd, TABLE_LIST
Oleksandr> DBUG_RETURN(-1);
Oleksandr> table_list->table=table;
Oleksandr> + table->s->deleting= table_list->deleting;
Oleksandr> /* Return 1 if table is in use */
Oleksandr> DBUG_RETURN(test(remove_table_from_cache(thd, db, table_list->table_name,
Oleksandr> - check_in_use ? RTFC_NO_FLAG : RTFC_WAIT_OTHER_THREAD_FLAG)));
Oleksandr> + check_in_use ? RTFC_NO_FLAG : RTFC_WAIT_OTHER_THREAD_FLAG,
Oleksandr> + table_list->deleting)));
Oleksandr> }
Oleksandr> === modified file 'sql/mysql_priv.h'
Oleksandr> --- sql/mysql_priv.h 2009-04-25 10:05:32 +0000
Oleksandr> +++ sql/mysql_priv.h 2009-08-09 21:51:48 +0000
Oleksandr> @@ -1609,7 +1609,7 @@ uint prep_alter_part_table(THD *thd, TAB
Oleksandr> #define RTFC_WAIT_OTHER_THREAD_FLAG 0x0002
Oleksandr> #define RTFC_CHECK_KILLED_FLAG 0x0004
Oleksandr> bool remove_table_from_cache(THD *thd, const char *db, const char *table,
Oleksandr> - uint flags);
Oleksandr> + uint flags, my_bool deleting);
Oleksandr> #define NORMAL_PART_NAME 0
Oleksandr> #define TEMP_PART_NAME 1
Oleksandr> === modified file 'sql/sql_base.cc'
Oleksandr> --- sql/sql_base.cc 2009-05-19 09:28:05 +0000
Oleksandr> +++ sql/sql_base.cc 2009-08-09 21:54:54 +0000
Oleksandr> @@ -927,7 +927,7 @@ bool close_cached_tables(THD *thd, TABLE
Oleksandr> for (TABLE_LIST *table= tables; table; table= table->next_local)
Oleksandr> {
Oleksandr> if (remove_table_from_cache(thd, table->db, table->table_name,
Oleksandr> - RTFC_OWNED_BY_THD_FLAG))
Oleksandr> + RTFC_OWNED_BY_THD_FLAG, table->deleting))
Oleksandr> found=1;
Oleksandr> }
Oleksandr> if (!found)
Oleksandr> @@ -8395,7 +8395,7 @@ void flush_tables()
Oleksandr> */
Oleksandr> bool remove_table_from_cache(THD *thd, const char *db, const char *table_name,
Oleksandr> - uint flags)
Oleksandr> + uint flags, my_bool deleting)
Oleksandr> {
Oleksandr> char key[MAX_DBKEY_LENGTH];
Oleksandr> uint key_length;
Oleksandr> @@ -8482,7 +8482,10 @@ bool remove_table_from_cache(THD *thd, c
Oleksandr> }
Oleksandr> }
Oleksandr> while (unused_tables && !unused_tables->s->version)
Oleksandr> + {
Oleksandr> + unused_tables->s->deleting= deleting;
Oleksandr> VOID(hash_delete(&open_cache,(uchar*) unused_tables));
Oleksandr> + }
Oleksandr> DBUG_PRINT("info", ("Removing table from table_def_cache"));
Oleksandr> /* Remove table from table definition cache if it's not in use */
Oleksandr> @@ -8676,7 +8679,8 @@ int abort_and_upgrade_lock(ALTER_PARTITI
Oleksandr> /* If MERGE child, forward lock handling to parent. */
Oleksandr> mysql_lock_abort(lpt->thd, lpt->table->parent ? lpt->table->parent :
Oleksandr> lpt->table, TRUE);
Oleksandr> - VOID(remove_table_from_cache(lpt->thd, lpt->db, lpt->table_name, flags));
Oleksandr> + VOID(remove_table_from_cache(lpt->thd, lpt->db, lpt->table_name, flags,
Oleksandr> + FALSE));
Oleksandr> VOID(pthread_mutex_unlock(&LOCK_open));
Oleksandr> DBUG_RETURN(0);
Oleksandr> }
Oleksandr> @@ -8701,7 +8705,7 @@ void close_open_tables_and_downgrade(ALT
Oleksandr> {
Oleksandr> VOID(pthread_mutex_lock(&LOCK_open));
Oleksandr> remove_table_from_cache(lpt->thd, lpt->db, lpt->table_name,
Oleksandr> - RTFC_WAIT_OTHER_THREAD_FLAG);
Oleksandr> + RTFC_WAIT_OTHER_THREAD_FLAG, FALSE);
Oleksandr> VOID(pthread_mutex_unlock(&LOCK_open));
Oleksandr> /* If MERGE child, forward lock handling to parent. */
Oleksandr> mysql_lock_downgrade_write(lpt->thd, lpt->table->parent ? lpt->table->parent :
Oleksandr> === modified file 'sql/sql_table.cc'
Oleksandr> --- sql/sql_table.cc 2009-06-18 12:39:21 +0000
Oleksandr> +++ sql/sql_table.cc 2009-08-09 21:48:04 +0000
Oleksandr> @@ -1599,6 +1599,8 @@ int mysql_rm_table_part2(THD *thd, TABLE
Oleksandr> if ((share= get_cached_table_share(table->db, table->table_name)))
Oleksandr> table->db_type= share->db_type();
Oleksandr> + table->deleting= TRUE;
Oleksandr> +
Oleksandr> /* Disable drop of enabled log tables */
Oleksandr> if (share && (share->table_category == TABLE_CATEGORY_PERFORMANCE) &&
Oleksandr> check_if_log_table(table->db_length, table->db,
Oleksandr> @@ -1676,7 +1678,7 @@ int mysql_rm_table_part2(THD *thd, TABLE
Oleksandr> abort_locked_tables(thd, db, table->table_name);
Oleksandr> remove_table_from_cache(thd, db, table->table_name,
Oleksandr> RTFC_WAIT_OTHER_THREAD_FLAG |
Oleksandr> - RTFC_CHECK_KILLED_FLAG);
Oleksandr> + RTFC_CHECK_KILLED_FLAG, TRUE);
Oleksandr> /*
Oleksandr> If the table was used in lock tables, remember it so that
Oleksandr> unlock_table_names can free it
Oleksandr> @@ -3862,7 +3864,7 @@ void wait_while_table_is_used(THD *thd,T
Oleksandr> /* Wait until all there are no other threads that has this table open */
Oleksandr> remove_table_from_cache(thd, table->s->db.str,
Oleksandr> table->s->table_name.str,
Oleksandr> - RTFC_WAIT_OTHER_THREAD_FLAG);
Oleksandr> + RTFC_WAIT_OTHER_THREAD_FLAG, FALSE);
Oleksandr> /* extra() call must come only after all instances above are closed */
Oleksandr> VOID(table->file->extra(function));
Oleksandr> DBUG_VOID_RETURN;
Oleksandr> @@ -4366,7 +4368,7 @@ static bool mysql_admin_table(THD* thd,
Oleksandr> remove_table_from_cache(thd, table->table->s->db.str,
Oleksandr> table->table->s->table_name.str,
Oleksandr> RTFC_WAIT_OTHER_THREAD_FLAG |
Oleksandr> - RTFC_CHECK_KILLED_FLAG);
Oleksandr> + RTFC_CHECK_KILLED_FLAG, FALSE);
Oleksandr> thd->exit_cond(old_message);
Oleksandr> DBUG_EXECUTE_IF("wait_in_mysql_admin_table", wait_for_kill_signal(thd););
Oleksandr> if (thd->killed)
Oleksandr> @@ -4624,7 +4626,8 @@ send_result_message:
Oleksandr> {
Oleksandr> pthread_mutex_lock(&LOCK_open);
Oleksandr> remove_table_from_cache(thd, table->table->s->db.str,
Oleksandr> - table->table->s->table_name.str, RTFC_NO_FLAG);
Oleksandr> + table->table->s->table_name.str,
Oleksandr> + RTFC_NO_FLAG, FALSE);
Oleksandr> pthread_mutex_unlock(&LOCK_open);
Oleksandr> }
Oleksandr> /* May be something modified consequently we have to invalidate cache */
Oleksandr> === modified file 'sql/table.cc'
Oleksandr> --- sql/table.cc 2009-06-29 21:03:30 +0000
Oleksandr> +++ sql/table.cc 2009-08-09 20:46:07 +0000
Oleksandr> @@ -1960,7 +1960,12 @@ int closefrm(register TABLE *table, bool
Oleksandr> DBUG_PRINT("enter", ("table: 0x%lx", (long) table));
Oleksandr> if (table->db_stat)
Oleksandr> - error=table->file->close();
Oleksandr> + {
Oleksandr> + if (table->s->deleting)
Oleksandr> + table->file->prepare_for_delete();
Oleksandr> + error= table->file->close();
Oleksandr> + }
Oleksandr> +
As we have a handler here, why not instead do:
table->file->extra(HA_EXTRA_PREPARE_FOR_DROP);
There is no reason to add an extra prepare_for_delete() here.
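A minimal, self-contained sketch of what this suggestion could look like. It assumes the `deleting` check from the patch stays in place; the handler, TABLE and TABLE_SHARE types below are hypothetical stand-ins, not the server's real definitions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the server types. Only the names extra(),
// close() and HA_EXTRA_PREPARE_FOR_DROP are taken from the review.
enum ha_extra_function { HA_EXTRA_NORMAL, HA_EXTRA_PREPARE_FOR_DROP };

struct handler {
  std::vector<std::string> calls;  // records calls so the order can be checked

  int extra(ha_extra_function operation) {
    calls.push_back(operation == HA_EXTRA_PREPARE_FOR_DROP
                        ? "extra(HA_EXTRA_PREPARE_FOR_DROP)"
                        : "extra(other)");
    return 0;
  }
  int close() {
    calls.push_back("close");
    return 0;
  }
};

struct TABLE_SHARE { bool deleting; };
struct TABLE {
  TABLE_SHARE *s;
  handler *file;
  bool db_stat;  // true when the handler is open
};

// Model of the closefrm() fragment: reuse the general extra() hook
// instead of adding a dedicated prepare_for_delete() virtual.
int close_table_file(TABLE *table) {
  assert(table && table->s && table->file);
  int error = 0;
  if (table->db_stat) {
    if (table->s->deleting)
      table->file->extra(HA_EXTRA_PREPARE_FOR_DROP);
    error = table->file->close();
  }
  return error;
}
```

With this shape the storage engine only has to react to an extra() operation it already knows about, so the handler interface itself does not grow.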
Oleksandr> --- storage/myisam/ha_myisam.cc 2009-06-29 21:03:30 +0000
Oleksandr> +++ storage/myisam/ha_myisam.cc 2009-08-09 20:42:15 +0000
Oleksandr> @@ -26,7 +26,9 @@
Oleksandr> #include <myisampack.h>
Oleksandr> #include "ha_myisam.h"
Oleksandr> #include <stdarg.h>
Oleksandr> +C_MODE_START
Oleksandr> #include "myisamdef.h"
Oleksandr> +C_MODE_END
Oleksandr> #include "rt_index.h"
With my suggested change, there is no reason to make any changes in
ha_myisam.cc or ha_myisam.h.
<cut>
Oleksandr> +++ storage/myisam/mi_close.c 2009-08-09 22:01:32 +0000
Oleksandr> @@ -65,8 +65,9 @@ int mi_close(register MI_INFO *info)
Oleksandr> {
Oleksandr> if (share->kfile >= 0 &&
Oleksandr> flush_key_blocks(share->key_cache, share->kfile,
Oleksandr> - share->temporary ? FLUSH_IGNORE_CHANGED :
Oleksandr> - FLUSH_RELEASE))
Oleksandr> + (share->temporary || share->deleting) ?
Oleksandr> + FLUSH_IGNORE_CHANGED :
Oleksandr> + FLUSH_RELEASE))
Oleksandr> error=my_errno;
Oleksandr> if (share->kfile >= 0)
Oleksandr> {
No reason for the above change.
1) With my suggestion, there is no reason to do this.
2) If we implement it your way, we could reuse 'share->temporary' for
this case.
Oleksandr> === modified file 'storage/myisam/mi_locking.c'
Oleksandr> --- storage/myisam/mi_locking.c 2009-04-01 09:34:52 +0000
Oleksandr> +++ storage/myisam/mi_locking.c 2009-08-09 20:42:00 +0000
Oleksandr> @@ -68,7 +68,10 @@ int mi_lock_database(MI_INFO *info, int
Oleksandr> --share->tot_locks;
Oleksandr> if (info->lock_type == F_WRLCK && !share->w_locks &&
Oleksandr> !share->delay_key_write && flush_key_blocks(share->key_cache,
Oleksandr> - share->kfile,FLUSH_KEEP))
Oleksandr> + share->kfile,
Oleksandr> + (share->deleting ?
Oleksandr> + FLUSH_IGNORE_CHANGED :
Oleksandr> + FLUSH_KEEP)))
No reason to do the above. Reasons:
- In case of delay_key_write, the above code will not be executed.
- If delay_key_write is not set, things were flushed at the previous
statement.
Did I miss some case?
Oleksandr> --- storage/myisam/myisamdef.h 2009-04-25 09:04:38 +0000
Oleksandr> +++ storage/myisam/myisamdef.h 2009-08-09 20:41:25 +0000
Oleksandr> @@ -218,6 +218,7 @@ typedef struct st_mi_isam_share
Oleksandr> my_bool changed, /* If changed since lock */
Oleksandr> global_changed, /* If changed since open */
Oleksandr> not_flushed, temporary, delay_key_write, concurrent_insert;
Oleksandr> + my_bool deleting; /* we are going to delete this table */
Not needed.
---------------
Other fixes:
Please fix mi_extra.c as we discussed (move the #ifdef so that things
are flushed).
Also make the following fix to ma_extra.c:
if (share->kfile.file >= 0)
_ma_decrement_open_count(info);
->
if (share->kfile.file >= 0 && do_flush)
_ma_decrement_open_count(info);
The idea is that we should not decrement the open_count in case of
drop. This will ensure that if we die between flushing the key cache
and close, the index will be rechecked.
-------------
Regards,
Monty
[Maria-developers] Progress (by Guest): Replication tasks (39)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Replication tasks
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 17
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 10 Aug 2009, 16:32)=-=-
Adding 1 hour for Monty's initial work on starting the architecture review.
Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
-=-=(Psergey - Mon, 10 Aug 2009, 15:59)=-=-
Re-searched and added subtasks.
Worked 16 hours and estimate 0 hours remain (original estimate increased by 16 hours).
-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41
-=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=-
Dependency created: 39 now depends on 40
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 38
-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 37
DESCRIPTION:
A combined task for all replication tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Psergey): Replication tasks (39)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Replication tasks
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 16
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 10 Aug 2009, 15:59)=-=-
Re-searched and added subtasks.
Worked 16 hours and estimate 0 hours remain (original estimate increased by 16 hours).
-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41
-=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=-
Dependency created: 39 now depends on 40
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 38
-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 37
DESCRIPTION:
A combined task for all replication tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to filter certain kinds of statements (41)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter certain kinds of statements
CREATION DATE..: Mon, 10 Aug 2009, 15:30
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 41 (http://askmonty.org/worklog/?tid=41)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 10 Aug 2009, 15:47)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.13282 2009-08-10 15:47:13.000000000 +0300
+++ /tmp/wklog.41.new.13282 2009-08-10 15:47:13.000000000 +0300
@@ -2,3 +2,10 @@
- If we decide to parse the statement, SQL-verb filtering will be trivial
- If we decide not to parse the statement, we still can reliably distinguish the
statement by matching the first characters against a set of patterns.
+
+If we chose the second, we'll have to perform certain normalization before
+matching the patterns:
+ - Remove all comments from the command
+ - Remove all pre-space
+ - Compare the string case-insensitively
+ - etc
-=-=(Psergey - Mon, 10 Aug 2009, 15:35)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.12689 2009-08-10 15:35:04.000000000 +0300
+++ /tmp/wklog.41.new.12689 2009-08-10 15:35:04.000000000 +0300
@@ -1 +1,4 @@
-
+The implementation will depend on design choices made in WL#40:
+- If we decide to parse the statement, SQL-verb filtering will be trivial
+- If we decide not to parse the statement, we still can reliably distinguish the
+statement by matching the first characters against a set of patterns.
-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41
DESCRIPTION:
Add a mysqlbinlog option to filter out certain kinds of statements, e.g. (syntax
subject to discussion):
  mysqlbinlog --exclude='alter table,drop table,alter database,...'
HIGH-LEVEL SPECIFICATION:
The implementation will depend on design choices made in WL#40:
- If we decide to parse the statement, SQL-verb filtering will be trivial.
- If we decide not to parse the statement, we can still reliably distinguish the
  statement by matching its first characters against a set of patterns.
If we choose the second option, we'll have to normalize the statement before
matching the patterns:
- Remove all comments from the command
- Remove all leading whitespace
- Compare the string case-insensitively
- etc.
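The normalization steps could be sketched roughly as below. This is an illustration under assumptions, not a proposed implementation: normalize_statement() and is_excluded() are hypothetical names, the comment handling covers only /* */, # and "-- " comments, and /*! ... */ version comments are treated like ordinary comments.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lower-case the statement, drop comments and collapse whitespace runs.
// Quoted text is copied verbatim so that comment markers or keywords
// inside string literals are left alone.
std::string normalize_statement(const std::string &sql) {
  std::string out;
  const std::size_t n = sql.size();
  std::size_t i = 0;
  bool pending_space = false;
  while (i < n) {
    const char c = sql[i];
    const bool dash_comment =
        c == '-' && i + 2 < n && sql[i + 1] == '-' &&
        std::isspace(static_cast<unsigned char>(sql[i + 2]));
    if (c == '/' && i + 1 < n && sql[i + 1] == '*') {  // block comment
      const std::size_t end = sql.find("*/", i + 2);
      i = (end == std::string::npos) ? n : end + 2;
      pending_space = true;
    } else if (c == '#' || dash_comment) {             // comment to end of line
      const std::size_t end = sql.find('\n', i);
      i = (end == std::string::npos) ? n : end + 1;
      pending_space = true;
    } else if (std::isspace(static_cast<unsigned char>(c))) {
      pending_space = true;
      ++i;
    } else {
      if (pending_space && !out.empty()) out += ' ';
      pending_space = false;
      ++i;
      if (c == '\'' || c == '"' || c == '`') {
        out += c;
        while (i < n) {  // copy up to and including the closing quote
          const char q = sql[i++];
          out += q;
          if (q == '\\' && c != '`' && i < n) out += sql[i++];
          else if (q == c) break;
        }
      } else {
        out += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
      }
    }
  }
  return out;
}

// True if the normalized statement starts with one of the lower-case
// patterns (for example "drop table") at a word boundary.
bool is_excluded(const std::string &sql, const std::vector<std::string> &patterns) {
  const std::string norm = normalize_statement(sql);
  for (std::size_t k = 0; k < patterns.size(); ++k) {
    const std::string &p = patterns[k];
    assert(!p.empty());
    if (norm.compare(0, p.size(), p) != 0) continue;
    if (norm.size() == p.size()) return true;
    const unsigned char next = static_cast<unsigned char>(norm[p.size()]);
    if (!std::isalnum(next) && next != '_') return true;
  }
  return false;
}
```

The word-boundary check is there so that a pattern such as "drop table" does not also swallow DROP TABLESPACE.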
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13035 2009-08-10 15:41:51.000000000 +0300
+++ /tmp/wklog.36.new.13035 2009-08-10 15:41:51.000000000 +0300
@@ -1,5 +1,7 @@
Context
-------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
-=-=(Guest - Mon, 10 Aug 2009, 11:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.6580 2009-08-10 11:12:36.000000000 +0300
+++ /tmp/wklog.36.new.6580 2009-08-10 11:12:36.000000000 +0300
@@ -1,4 +1,3 @@
-
Context
-------
At the moment, the server has a replication slave option
@@ -67,6 +66,6 @@
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
-mysqlbinlog (adding a comment is easy and doesn't require use to parse the
-statement).
+mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
+parse the statement).
-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@
+Context
+-------
+At the moment, the server has a replication slave option
+
+ --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+ statement refers to tables in current database, so that changing the current
+ database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+ Table_map_log_event) as "use current database". Binlog slave thread
+ probably should not allow such events as it doesn't have a natural current
+ database.
+- Add a mysqlbinlog --strip-db option that would
+ = not produce any "USE dbname" statements
+ = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+ /* !database-alias(oldname,newname) */
+
+ and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+ mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on the binlog producer.
If one is using statement-based replication, one can achieve this by grepping
"USE dbname" statements out of the output of mysqlbinlog(*). With
row-based replication this is no longer possible, as the database name is encoded
within the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow changing
the names of the used databases in both RBR and SBR events.
(*) This assumes that all statements refer to tables in the current database and
doesn't catch updates made inside stored functions and so forth, but it still
works for a practically important subset of cases.
HIGH-LEVEL SPECIFICATION:
Context
-------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has a replication slave option
  --replicate-rewrite-db="from->to"
The option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
  statement refers to tables in the current database, so that changing the current
  database makes the statement work on a table in a different database).
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as replication slave would process --replicate-rewrite-db option.
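For illustration, handling of a "from->to" specification inside mysqlbinlog could look roughly like the sketch below. The names RewriteMap, add_rewrite_rule() and rewrite_db() are hypothetical, and trimming spaces around the names is an assumption about the desired behaviour, not a statement about the server's actual option parser.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

typedef std::map<std::string, std::string> RewriteMap;

// Remove leading and trailing spaces and tabs.
static std::string trim(const std::string &s) {
  const std::size_t first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Parse one "from->to" specification and store it in the map.
void add_rewrite_rule(RewriteMap &rules, const std::string &spec) {
  const std::size_t arrow = spec.find("->");
  if (arrow == std::string::npos)
    throw std::invalid_argument("expected from->to: " + spec);
  assert(arrow + 2 <= spec.size());  // so substr(arrow + 2) cannot throw
  const std::string from = trim(spec.substr(0, arrow));
  const std::string to = trim(spec.substr(arrow + 2));
  if (from.empty() || to.empty())
    throw std::invalid_argument("empty database name in: " + spec);
  rules[from] = to;
}

// Return the rewritten database name, or the name itself if no rule matches.
std::string rewrite_db(const RewriteMap &rules, const std::string &db) {
  const RewriteMap::const_iterator it = rules.find(db);
  return it == rules.end() ? db : it->second;
}
```

The same lookup would then be applied wherever mysqlbinlog sees a database name: in the "USE dbname" lines it prints and in the database field of Table_map_log_event.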
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a database name. It is not possible to have an RBR
event stream that doesn't mention which database the events are for. When I
tried to use a debugger to specify an empty database name, the attempt to apply the
binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret an empty database name in an RBR event (i.e. in a
  Table_map_log_event) as "use current database". The binlog slave thread
  probably should not allow such events, as it doesn't have a natural current
  database.
- Add a mysqlbinlog --strip-db option that would
  = not produce any "USE dbname" statements
  = change the database name for all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(This will have the usual limitation that we assume that all statements in
the binlog refer to the current database.)
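A rough model of Option2, split into the mysqlbinlog side and the server side. Everything here is a sketch under assumptions: TableMapEvent is a reduced, hypothetical view of Table_map_log_event, and strip_db(), is_use_statement() and resolve_db() are illustrative names only.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical, reduced view of a Table_map_log_event.
struct TableMapEvent {
  std::string db;
  std::string table;
};

// mysqlbinlog side: what --strip-db could do to each RBR event.
void strip_db(TableMapEvent &ev) { ev.db.clear(); }

// mysqlbinlog side: detect "USE dbname" lines that should not be printed.
bool is_use_statement(const std::string &line) {
  std::size_t i = 0;
  while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i]))) ++i;
  if (line.size() - i < 4) return false;
  static const char verb[] = "use";
  for (std::size_t k = 0; k < 3; ++k)
    if (std::tolower(static_cast<unsigned char>(line[i + k])) != verb[k]) return false;
  return std::isspace(static_cast<unsigned char>(line[i + 3])) != 0;
}

// Server side: an empty database name means "use current database",
// except in the slave thread, which has no natural current database.
std::string resolve_db(const TableMapEvent &ev, const std::string &current_db,
                       bool is_slave_thread) {
  if (!ev.db.empty()) return ev.db;
  if (is_slave_thread)
    throw std::runtime_error("database-agnostic event not allowed in slave thread");
  assert(!current_db.empty());  // the client session is expected to have run USE
  return current_db;
}
```

Events that still carry a database name resolve exactly as before, so only stripped binlogs take the new path.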
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize a special form of comment
    /* !database-alias(oldname,newname) */
  and save the mapping somewhere
- Put hooks in the table open and name resolution code to use the saved
  mapping.
Once we've done the above, it will be easy to perform a complete database
name change in the binary log, without compromises or restrictions.
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
parse the statement).
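To make Option3 more concrete, here is a small sketch of recognizing the alias comment and applying the saved mapping during name resolution. It is only a model: the real change would live in the server's parser and table-open code, and AliasMap, parse_alias_comments() and resolve_db_alias() are hypothetical names. Database names are assumed to contain no commas, parentheses or whitespace.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

typedef std::map<std::string, std::string> AliasMap;

// Collect every /* !database-alias(oldname,newname) */ comment in a statement.
// Returns true if at least one mapping was found.
bool parse_alias_comments(const std::string &sql, AliasMap &aliases) {
  static const std::regex re(
      R"(/\*\s*!database-alias\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)\s*\*/)");
  bool found = false;
  for (std::sregex_iterator it(sql.begin(), sql.end(), re), end; it != end; ++it) {
    assert(it->size() == 3);  // whole match plus the two capture groups
    aliases[(*it)[1].str()] = (*it)[2].str();
    found = true;
  }
  return found;
}

// Hook for name resolution: map a database name through the saved aliases.
std::string resolve_db_alias(const AliasMap &aliases, const std::string &db) {
  const AliasMap::const_iterator it = aliases.find(db);
  return it == aliases.end() ? db : it->second;
}
```

Because the mapping travels inside a comment, mysqlbinlog would only need to prepend text to each statement, which matches the point above that it does not have to parse the statement.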
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13035 2009-08-10 15:41:51.000000000 +0300
+++ /tmp/wklog.36.new.13035 2009-08-10 15:41:51.000000000 +0300
@@ -1,5 +1,7 @@
Context
-------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
-=-=(Guest - Mon, 10 Aug 2009, 11:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.6580 2009-08-10 11:12:36.000000000 +0300
+++ /tmp/wklog.36.new.6580 2009-08-10 11:12:36.000000000 +0300
@@ -1,4 +1,3 @@
-
Context
-------
At the moment, the server has a replication slave option
@@ -67,6 +66,6 @@
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
-mysqlbinlog (adding a comment is easy and doesn't require use to parse the
-statement).
+mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
+parse the statement).
-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@
+Context
+-------
+At the moment, the server has a replication slave option
+
+ --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+ statement refers to tables in current database, so that changing the current
+ database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+ Table_map_log_event) as "use current database". Binlog slave thread
+ probably should not allow such events as it doesn't have a natural current
+ database.
+- Add a mysqlbinlog --strip-db option that would
+ = not produce any "USE dbname" statements
+ = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+ /* !database-alias(oldname,newname) */
+
+ and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+ mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes one needs to take a binary log and apply it to a database whose
name differs from the name of the database on the binlog producer.
With statement-based replication, this can be achieved by filtering the
"USE dbname" statements out of the output of mysqlbinlog(*). With row-based
replication this is no longer possible, as the database name is encoded
within the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow changing
the names of the databases used in both RBR and SBR events.
(*) this assumes that all statements refer to tables in the current database,
and it doesn't catch updates made inside stored functions and so forth, but it
still works for a practically important subset of cases.
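
The grep-style workaround for statement-based logs can be sketched as a small
filter. This is only an illustration: strip_use_statements is a hypothetical
helper, and the assumption that every USE statement sits on its own line
(e.g. "use test/*!*/;") should be checked against real mysqlbinlog output.

```python
import re

# Assumption: each USE statement occupies its own line in mysqlbinlog's text
# output, e.g. "use test/*!*/;" or "USE `test`;". Real output may differ.
USE_LINE = re.compile(
    r"^\s*use\s+`?[^`\s;/]+`?\s*(/\*!\*/)?\s*;\s*$", re.IGNORECASE
)


def strip_use_statements(lines):
    """Return the lines with every standalone USE statement dropped."""
    return [line for line in lines if not USE_LINE.match(line)]


sample = [
    "use test/*!*/;",
    "INSERT INTO t1 VALUES (1)/*!*/;",
    "USE `other`;",
    "UPDATE t1 SET a = 2/*!*/;",
]
print("\n".join(strip_use_statements(sample)))
```

As the footnote says, this only helps when every statement refers to tables
in the current database, and it does nothing for BINLOG '....' statements.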
HIGH-LEVEL SPECIFICATION:
Context
-------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has a replication slave option
  --replicate-rewrite-db="from->to"
The option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
  statement refers to tables in the current database, so that changing the
  current database makes the statement work on a table in a different
  database).
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as a replication slave would.
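
The name-mapping step behind a "from->to" rule can be sketched as follows.
This is a minimal sketch, not the server's code: parse_rewrite_rule and
rewrite_db are hypothetical names, and "first matching rule wins" is an
assumption. Where the mapping gets applied (the database name in a
Table_map_log_event, or the default database of a Query_log_event) is left
out.

```python
def parse_rewrite_rule(rule):
    """Split a 'from->to' rule into a (from_db, to_db) pair."""
    from_db, sep, to_db = rule.partition("->")
    from_db, to_db = from_db.strip(), to_db.strip()
    if not sep or not from_db or not to_db:
        raise ValueError(f"bad rewrite rule: {rule!r}")
    return from_db, to_db


def rewrite_db(db_name, rules):
    """Return the rewritten name; names without a matching rule are unchanged."""
    for from_db, to_db in rules:
        if from_db == db_name:
            return to_db  # assumption: the first matching rule wins
    return db_name


rules = [parse_rewrite_rule("prod->staging")]
print(rewrite_db("prod", rules))
print(rewrite_db("other", rules))
```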
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a database name. It is not possible to have an
RBR event stream that doesn't mention which database the events are for. When
I used a debugger to specify an empty database name, the attempt to apply the
binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret an empty database name in an RBR event (i.e. in a
  Table_map_log_event) as "use the current database". The binlog slave thread
  probably should not allow such events, as it doesn't have a natural current
  database.
- Add a mysqlbinlog --strip-db option that would
  = not produce any "USE dbname" statements
  = change the database name in all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and will apply to the
current database.
(This has the usual limitation: we assume that all statements in the binlog
refer to the current database.)
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support a database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize a special form of comment
  /* !database-alias(oldname,newname) */
  and save the mapping somewhere
- Put hooks in the table open and name resolution code to use the saved
  mapping.
Once we've done the above, it will be easy to perform a complete database
name change in the binary log, without compromises or restrictions.
It will be possible to do the rewrites either on the slave
(--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
parse the statement).
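
The mysqlbinlog side of Option3 could look roughly like this. It is a sketch
only: the comment syntax is the one proposed above and is not recognized by
any existing server, and add_alias_comment / read_alias_comments are
hypothetical helpers. The point is that the statement is prefixed, never
parsed.

```python
import re

# The proposed (not yet existing) comment form: /* !database-alias(old,new) */
ALIAS_COMMENT = re.compile(r"/\*\s*!database-alias\((\w+),(\w+)\)\s*\*/")


def add_alias_comment(statement, old_db, new_db):
    """Prefix a statement with the proposed alias comment, leaving it unparsed."""
    return f"/* !database-alias({old_db},{new_db}) */ {statement}"


def read_alias_comments(statement):
    """Collect the old-name -> new-name mapping a server-side parser would save."""
    return dict(ALIAS_COMMENT.findall(statement))


tagged = add_alias_comment("UPDATE db1.t1 SET a = 1", "db1", "db2")
print(tagged)
print(read_alias_comments(tagged))
```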
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter updates to certain tables
CREATION DATE..: Mon, 10 Aug 2009, 13:25
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 40 (http://askmonty.org/worklog/?tid=40)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.12989 2009-08-10 15:41:23.000000000 +0300
+++ /tmp/wklog.40.new.12989 2009-08-10 15:41:23.000000000 +0300
@@ -1,6 +1,7 @@
-
1. Context
----------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
At the moment, the server has these replication slave options:
--replicate-do-table=db.tbl
-=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=-
Dependency created: 39 now depends on 40
-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High Level Description modified.
--- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300
+++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300
@@ -1,3 +1,4 @@
Replication slave can be set to filter updates to certain tables with
---replicate-[wild-]{do,ignore}-table options. This task is about adding similar
-functionality to mysqlbinlog.
+--replicate-[wild-]{do,ignore}-table options.
+
+This task is about adding similar functionality to mysqlbinlog.
-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300
+++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300
@@ -1 +1,73 @@
+1. Context
+----------
+At the moment, the server has these replication slave options:
+
+ --replicate-do-table=db.tbl
+ --replicate-ignore-table=db.tbl
+ --replicate-wild-do-table=pattern.pattern
+ --replicate-wild-ignore-table=pattern.pattern
+
+They affect both RBR and SBR events. SBR events are checked after the
+statement has been parsed, the server iterates over list of used tables and
+checks them againist --replicate instructions.
+
+What is interesting is that this scheme still allows to update the ignored
+table through a VIEW.
+
+2. Table filtering in mysqlbinlog
+---------------------------------
+
+Per-table filtering of RBR events is easy (as it is relatively easy to extract
+the name of the table that the event applies to).
+
+Per-table filtering of SBR events is hard, as generally it is not apparent
+which tables the statement refers to.
+
+This opens possible options:
+
+2.1 Put the parser into mysqlbinlog
+-----------------------------------
+Once we have a full parser in mysqlbinlog, we'll be able to check which tables
+are used by a statement, and will allow to show behaviour identical to those
+that one obtains when using --replicate-* slave options.
+
+(It is not clear how much effort is needed to put the parser into mysqlbinlog.
+Any guesses?)
+
+
+2.2 Use dumb regexp match
+-------------------------
+Use a really dumb approach. A query is considered to be modifying table X if
+it matches an expression
+
+CREATE TABLE $tablename
+DROP $tablename
+UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
+DELETE ...$tablename ... WHERE // same as above
+ALTER TABLE $tablename
+.. etc (go get from the grammar) ..
+
+The advantage over doing the same in awk is that mysqlbinlog will also process
+RBR statements, and together with that will provide a working solution for
+those who are careful with their table names not mixing with string constants
+and such.
+
+(TODO: string constants are of particular concern as they come from
+[potentially hostile] users, unlike e.g. table aliases which come from
+[not hostile] developers. Remove also all string constants before attempting
+to do match?)
+
+2.3 Have the master put annotations
+-----------------------------------
+We could add a master option so that it injects into query a mark that tells
+which tables the query will affect, e.g. for the query
+
+ UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
+
+
+the binlog will have
+
+ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
+
+and further processing in mysqlbinlog will be trivial.
DESCRIPTION:
Replication slave can be set to filter updates to certain tables with
--replicate-[wild-]{do,ignore}-table options.
This task is about adding similar functionality to mysqlbinlog.
HIGH-LEVEL SPECIFICATION:
1. Context
----------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has these replication slave options:
--replicate-do-table=db.tbl
--replicate-ignore-table=db.tbl
--replicate-wild-do-table=pattern.pattern
--replicate-wild-ignore-table=pattern.pattern
They affect both RBR and SBR events. SBR events are checked after the
statement has been parsed: the server iterates over the list of used tables
and checks them against the --replicate-* rules.
Interestingly, this scheme still allows the ignored table to be updated
through a VIEW.
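
For reference, the per-table decision that mysqlbinlog would have to mimic
can be modelled roughly as below. This is a simplified sketch under stated
assumptions: the rule order (do, ignore, wild-do, wild-ignore, then a default
that depends on whether any "do" rules exist) is how the slave's behaviour is
usually described, wildcard patterns are treated as LIKE-style patterns
without escape handling, and table_passes / like_to_regex are hypothetical
names.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern ('%' and '_') into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")


def table_passes(name, do=(), ignore=(), wild_do=(), wild_ignore=()):
    """Decide whether an update to 'db.tbl' should be kept (simplified model)."""
    if name in do:
        return True
    if name in ignore:
        return False
    if any(like_to_regex(p).match(name) for p in wild_do):
        return True
    if any(like_to_regex(p).match(name) for p in wild_ignore):
        return False
    # No rule matched: keep the update only if there are no "do" rules at all.
    return not (do or wild_do)


print(table_passes("db1.t1", do=["db1.t1"]))
print(table_passes("db1.log_2009", wild_ignore=["db1.log%"]))
```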
2. Table filtering in mysqlbinlog
---------------------------------
Per-table filtering of RBR events is easy (as it is relatively easy to extract
the name of the table that the event applies to).
Per-table filtering of SBR events is hard, as it is generally not apparent
which tables the statement refers to.
This leaves the following options:
2.1 Put the parser into mysqlbinlog
-----------------------------------
Once we have a full parser in mysqlbinlog, we'll be able to check which tables
are used by a statement, and to provide behaviour identical to what one gets
with the --replicate-* slave options.
(It is not clear how much effort is needed to put the parser into mysqlbinlog.
Any guesses?)
2.2 Use dumb regexp match
-------------------------
Use a really dumb approach. A query is considered to be modifying table X if
it matches one of these expressions:
CREATE TABLE $tablename
DROP $tablename
UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
DELETE ...$tablename ... WHERE // same as above
ALTER TABLE $tablename
.. etc (go get from the grammar) ..
The advantage over doing the same in awk is that mysqlbinlog will also process
RBR events, and together with that it will provide a working solution for
those who take care that their table names don't collide with string constants
and such.
(TODO: string constants are of particular concern as they come from
[potentially hostile] users, unlike e.g. table aliases, which come from
[not hostile] developers. Should we also remove all string constants before
attempting the match?)
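
The dumb match, including the string-constant removal from the TODO, could be
sketched as follows. This is an illustration under assumptions, not a
proposal for the final pattern set: modifies_table is a hypothetical name,
only the patterns listed above are covered, the DROP pattern is written as
DROP TABLE, and the DELETE pattern only checks that the table name appears
before any WHERE.

```python
import re

# Single- or double-quoted literals, allowing backslash escapes and doubled quotes.
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.|'')*'|\"(?:[^\"\\]|\\.|\"\")*\"")


def modifies_table(query, table):
    """Guess whether a statement modifies 'table' without parsing it."""
    # Blank out string constants first so user-supplied text cannot match.
    stripped = STRING_LITERAL.sub("''", query)
    name = re.escape(table)
    patterns = [
        rf"^\s*CREATE\s+TABLE\s+{name}\b",
        rf"^\s*DROP\s+TABLE\s+{name}\b",
        rf"^\s*ALTER\s+TABLE\s+{name}\b",
        # UPDATE ... name ... SET, where '...' must not contain the word SET
        rf"^\s*UPDATE\b(?:(?!\bSET\b).)*\b{name}\b(?:(?!\bSET\b).)*\bSET\b",
        # DELETE ... name, where '...' must not contain the word WHERE
        rf"^\s*DELETE\b(?:(?!\bWHERE\b).)*\b{name}\b",
    ]
    flags = re.IGNORECASE | re.DOTALL
    return any(re.search(p, stripped, flags) for p in patterns)


print(modifies_table("UPDATE t1 SET a = 1", "t1"))
print(modifies_table("UPDATE t2 JOIN t3 ON t3.tag = 'see t1' SET t2.a = 1", "t1"))
```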
2.3 Have the master put annotations
-----------------------------------
We could add a master option so that it injects into the query a mark that
tells which tables the query will affect, e.g. for the query
  UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
the binlog will have
  /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
and further processing in mysqlbinlog will be trivial.
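
To show how little work would be left for mysqlbinlog, here is a sketch that
reads such a mark. The annotation format is the one proposed above and does
not exist in any server; annotated_tables is a hypothetical helper.

```python
import re

# Proposed mark at the start of the query: /* !mysqlbinlog: updates t1,db3.t2 */
ANNOTATION = re.compile(r"^\s*/\*\s*!mysqlbinlog:\s*updates\s+([^*]*?)\s*\*/")


def annotated_tables(query):
    """Return the tables named in the leading mark, or None if there is none."""
    match = ANNOTATION.match(query)
    if match is None:
        return None
    return [t.strip() for t in match.group(1).split(",") if t.strip()]


query = "/* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN db3.t2 ON 1 SET a = 1"
print(annotated_tables(query))
```

A query without the mark returns None, so a caller can tell "no annotation"
apart from "annotation with no tables".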
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to filter certain kinds of statements (41)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter certain kinds of statements
CREATION DATE..: Mon, 10 Aug 2009, 15:30
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 41 (http://askmonty.org/worklog/?tid=41)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 10 Aug 2009, 15:35)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.12689 2009-08-10 15:35:04.000000000 +0300
+++ /tmp/wklog.41.new.12689 2009-08-10 15:35:04.000000000 +0300
@@ -1 +1,4 @@
-
+The implementation will depend on design choices made in WL#40:
+- If we decide to parse the statement, SQL-verb filtering will be trivial
+- If we decide not to parse the statement, we still can reliably distinguish the
+statement by matching the first characters against a set of patterns.
-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41
DESCRIPTION:
Add a mysqlbinlog option to filter certain kinds of statements, e.g. (syntax
subject to discussion):
  mysqlbinlog --exclude='alter table,drop table,alter database,...'
HIGH-LEVEL SPECIFICATION:
The implementation will depend on design choices made in WL#40:
- If we decide to parse the statement, SQL-verb filtering will be trivial
- If we decide not to parse the statement, we can still reliably distinguish
  the kind of statement by matching its first characters against a set of
  patterns.
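
The non-parsing variant could be sketched like this. It is only an
illustration: the --exclude syntax is still subject to discussion, and
parse_exclude / is_excluded are hypothetical names. Matching is done on the
whitespace-normalized, upper-cased start of the statement, so a statement
that begins with a comment would need extra handling.

```python
def parse_exclude(option_value):
    """Turn 'alter table,drop table' into ['ALTER TABLE', 'DROP TABLE']."""
    kinds = []
    for kind in option_value.split(","):
        words = kind.split()
        if words:
            kinds.append(" ".join(words).upper())
    return kinds


def is_excluded(query, excluded_kinds):
    """Check whether the statement starts with one of the excluded verbs."""
    # Collapse whitespace so that 'ALTER   TABLE' still matches 'ALTER TABLE'.
    head = " ".join(query.split()).upper()
    return any(head == k or head.startswith(k + " ") for k in excluded_kinds)


kinds = parse_exclude("alter table,drop table,alter database")
print(is_excluded("ALTER  TABLE t1 ADD c INT", kinds))
print(is_excluded("UPDATE t1 SET a = 1", kinds))
```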
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Add a mysqlbinlog option to filter certain kinds of statements (41)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter certain kinds of statements
CREATION DATE..: Mon, 10 Aug 2009, 15:30
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 41 (http://askmonty.org/worklog/?tid=41)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
Add a mysqlbinlog option to filter certain kinds of statements, e.g. (syntax
subject to discussion):
  mysqlbinlog --exclude='alter table,drop table,alter database,...'
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter updates to certain tables
CREATION DATE..: Mon, 10 Aug 2009, 13:25
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 40 (http://askmonty.org/worklog/?tid=40)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High Level Description modified.
--- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300
+++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300
@@ -1,3 +1,4 @@
Replication slave can be set to filter updates to certain tables with
---replicate-[wild-]{do,ignore}-table options. This task is about adding similar
-functionality to mysqlbinlog.
+--replicate-[wild-]{do,ignore}-table options.
+
+This task is about adding similar functionality to mysqlbinlog.
-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300
+++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300
@@ -1 +1,73 @@
+1. Context
+----------
+At the moment, the server has these replication slave options:
+
+ --replicate-do-table=db.tbl
+ --replicate-ignore-table=db.tbl
+ --replicate-wild-do-table=pattern.pattern
+ --replicate-wild-ignore-table=pattern.pattern
+
+They affect both RBR and SBR events. SBR events are checked after the
+statement has been parsed, the server iterates over list of used tables and
+checks them againist --replicate instructions.
+
+What is interesting is that this scheme still allows to update the ignored
+table through a VIEW.
+
+2. Table filtering in mysqlbinlog
+---------------------------------
+
+Per-table filtering of RBR events is easy (as it is relatively easy to extract
+the name of the table that the event applies to).
+
+Per-table filtering of SBR events is hard, as generally it is not apparent
+which tables the statement refers to.
+
+This opens possible options:
+
+2.1 Put the parser into mysqlbinlog
+-----------------------------------
+Once we have a full parser in mysqlbinlog, we'll be able to check which tables
+are used by a statement, and will allow to show behaviour identical to those
+that one obtains when using --replicate-* slave options.
+
+(It is not clear how much effort is needed to put the parser into mysqlbinlog.
+Any guesses?)
+
+
+2.2 Use dumb regexp match
+-------------------------
+Use a really dumb approach. A query is considered to be modifying table X if
+it matches an expression
+
+CREATE TABLE $tablename
+DROP $tablename
+UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
+DELETE ...$tablename ... WHERE // same as above
+ALTER TABLE $tablename
+.. etc (go get from the grammar) ..
+
+The advantage over doing the same in awk is that mysqlbinlog will also process
+RBR statements, and together with that will provide a working solution for
+those who are careful with their table names not mixing with string constants
+and such.
+
+(TODO: string constants are of particular concern as they come from
+[potentially hostile] users, unlike e.g. table aliases which come from
+[not hostile] developers. Remove also all string constants before attempting
+to do match?)
+
+2.3 Have the master put annotations
+-----------------------------------
+We could add a master option so that it injects into query a mark that tells
+which tables the query will affect, e.g. for the query
+
+ UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
+
+
+the binlog will have
+
+ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
+
+and further processing in mysqlbinlog will be trivial.
DESCRIPTION:
Replication slave can be set to filter updates to certain tables with
--replicate-[wild-]{do,ignore}-table options.
This task is about adding similar functionality to mysqlbinlog.
HIGH-LEVEL SPECIFICATION:
1. Context
----------
At the moment, the server has these replication slave options:
--replicate-do-table=db.tbl
--replicate-ignore-table=db.tbl
--replicate-wild-do-table=pattern.pattern
--replicate-wild-ignore-table=pattern.pattern
They affect both RBR and SBR events. SBR events are checked after the
statement has been parsed: the server iterates over the list of used tables
and checks them against the --replicate-* rules.
Interestingly, this scheme still allows the ignored table to be updated
through a VIEW.
2. Table filtering in mysqlbinlog
---------------------------------
Per-table filtering of RBR events is easy (as it is relatively easy to extract
the name of the table that the event applies to).
Per-table filtering of SBR events is hard, as generally it is not apparent
which tables the statement refers to.
This leaves several possible options:
2.1 Put the parser into mysqlbinlog
-----------------------------------
Once we have a full parser in mysqlbinlog, we'll be able to check which tables
are used by a statement, and can provide behaviour identical to what one
obtains when using the --replicate-* slave options.
(It is not clear how much effort is needed to put the parser into mysqlbinlog.
Any guesses?)
2.2 Use dumb regexp match
-------------------------
Use a really dumb approach. A query is considered to be modifying table X if
it matches an expression
CREATE TABLE $tablename
DROP $tablename
UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
DELETE ...$tablename ... WHERE // same as above
ALTER TABLE $tablename
.. etc (go get from the grammar) ..
The advantage over doing the same in awk is that mysqlbinlog will also process
RBR statements, and together with that will provide a working solution for
those who are careful with their table names not mixing with string constants
and such.
(TODO: string constants are of particular concern as they come from
[potentially hostile] users, unlike e.g. table aliases which come from
[not hostile] developers. Remove also all string constants before attempting
to do match?)
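A minimal sketch of the regexp approach in C, assuming POSIX <regex.h> is
available. The names strip_string_constants() and query_modifies_table() are
made up for this illustration and are not part of mysqlbinlog. The sketch only
covers statements where the table name directly follows the keyword, strips
string constants first (as the TODO above suggests), and assumes the table name
consists of letters, digits and underscores only.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copy `in` to `out`, dropping the contents of '...' and "..." literals so
   that user-supplied text cannot fake a table name. The quotes are kept. */
static void strip_string_constants(const char *in, char *out, size_t outsz)
{
  size_t o = 0;
  char quote = 0;

  if (outsz == 0)
    return;
  for (; *in && o + 1 < outsz; in++) {
    if (quote) {
      if (*in == '\\' && in[1])
        in++;                          /* skip the escaped character */
      else if (*in == quote) {
        quote = 0;                     /* closing quote */
        out[o++] = *in;
      }
    } else {
      if (*in == '\'' || *in == '"')
        quote = *in;                   /* opening quote */
      out[o++] = *in;
    }
  }
  out[o] = '\0';
}

/* Return 1 if `query` looks like it modifies `table`, 0 if it does not,
   -1 on error. Hypothetical helper; the keyword list is far from complete. */
static int query_modifies_table(const char *query, const char *table)
{
  static const char *keywords[] = {
    "CREATE[[:space:]]+TABLE", "DROP[[:space:]]+TABLE",
    "ALTER[[:space:]]+TABLE", "DELETE[[:space:]]+FROM", "UPDATE"
  };
  char stripped[1024], re[256];
  size_t i;

  assert(query != NULL && table != NULL);
  strip_string_constants(query, stripped, sizeof stripped);
  for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
    regex_t rx;
    int rc;
    int n = snprintf(re, sizeof re,
                     "^[[:space:]]*%s[[:space:]]+%s([^[:alnum:]_]|$)",
                     keywords[i], table);

    if (n < 0 || (size_t) n >= sizeof re)
      return -1;
    if (regcomp(&rx, re, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
      return -1;
    rc = regexec(&rx, stripped, 0, NULL, 0);
    regfree(&rx);
    if (rc == 0)
      return 1;
  }
  return 0;
}
```

Multi-table UPDATE/DELETE, quoted identifiers and db.table names are
deliberately left out; they are where this approach starts to need real
parsing.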
2.3 Have the master put annotations
-----------------------------------
We could add a master option that makes it inject into the query a marker
telling which tables the query will affect, e.g. for the query
UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
the binlog will have
/* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
and further processing in mysqlbinlog will be trivial.
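To illustrate how little work is left for mysqlbinlog in this variant, here is
a rough C sketch that checks whether a given table is listed in such a
leading comment. The function name annotation_lists_table() is hypothetical,
and the exact comment syntax is taken from the example above, not from any
existing server feature.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `query` starts with a "!mysqlbinlog: updates a,b" comment
   that lists `table`, 0 otherwise. Names are compared exactly. */
static int annotation_lists_table(const char *query, const char *table)
{
  static const char prefix[] = "/* !mysqlbinlog: updates ";
  size_t tlen;
  const char *p, *end;

  assert(query != NULL && table != NULL);
  tlen = strlen(table);
  while (*query == ' ')
    query++;                                  /* allow leading blanks */
  if (strncmp(query, prefix, sizeof prefix - 1) != 0)
    return 0;                                 /* no annotation present */
  p = query + sizeof prefix - 1;
  end = strstr(p, "*/");
  if (end == NULL)
    return 0;                                 /* unterminated comment */

  while (p < end) {
    const char *item_end = memchr(p, ',', (size_t) (end - p));
    size_t len;

    if (item_end == NULL)
      item_end = end;                         /* last item in the list */
    while (p < item_end && *p == ' ')
      p++;                                    /* trim leading blanks */
    len = (size_t) (item_end - p);
    while (len > 0 && p[len - 1] == ' ')
      len--;                                  /* trim trailing blanks */
    if (len == tlen && memcmp(p, table, tlen) == 0)
      return 1;
    p = item_end + 1;
  }
  return 0;
}
```

A statement without the annotation returns 0 here; whether such statements
should be kept or dropped would be a separate policy decision.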
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter updates to certain tables
CREATION DATE..: Mon, 10 Aug 2009, 13:25
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 40 (http://askmonty.org/worklog/?tid=40)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High Level Description modified.
--- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300
+++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300
@@ -1,3 +1,4 @@
Replication slave can be set to filter updates to certain tables with
---replicate-[wild-]{do,ignore}-table options. This task is about adding similar
-functionality to mysqlbinlog.
+--replicate-[wild-]{do,ignore}-table options.
+
+This task is about adding similar functionality to mysqlbinlog.
-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300
+++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300
@@ -1 +1,73 @@
+1. Context
+----------
+At the moment, the server has these replication slave options:
+
+ --replicate-do-table=db.tbl
+ --replicate-ignore-table=db.tbl
+ --replicate-wild-do-table=pattern.pattern
+ --replicate-wild-ignore-table=pattern.pattern
+
+They affect both RBR and SBR events. SBR events are checked after the
+statement has been parsed: the server iterates over the list of used tables
+and checks them against the --replicate-* rules.
+
+Interestingly, this scheme still allows the ignored table to be updated
+through a VIEW.
+
+2. Table filtering in mysqlbinlog
+---------------------------------
+
+Per-table filtering of RBR events is easy (as it is relatively easy to extract
+the name of the table that the event applies to).
+
+Per-table filtering of SBR events is hard, as generally it is not apparent
+which tables the statement refers to.
+
+This leaves several possible options:
+
+2.1 Put the parser into mysqlbinlog
+-----------------------------------
+Once we have a full parser in mysqlbinlog, we'll be able to check which tables
+are used by a statement, and can provide behaviour identical to what one
+obtains when using the --replicate-* slave options.
+
+(It is not clear how much effort is needed to put the parser into mysqlbinlog.
+Any guesses?)
+
+
+2.2 Use dumb regexp match
+-------------------------
+Use a really dumb approach. A query is considered to be modifying table X if
+it matches an expression
+
+CREATE TABLE $tablename
+DROP $tablename
+UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
+DELETE ...$tablename ... WHERE // same as above
+ALTER TABLE $tablename
+.. etc (go get from the grammar) ..
+
+The advantage over doing the same in awk is that mysqlbinlog will also process
+RBR statements, and together with that will provide a working solution for
+those who are careful with their table names not mixing with string constants
+and such.
+
+(TODO: string constants are of particular concern as they come from
+[potentially hostile] users, unlike e.g. table aliases which come from
+[not hostile] developers. Remove also all string constants before attempting
+to do match?)
+
+2.3 Have the master put annotations
+-----------------------------------
+We could add a master option that makes it inject into the query a marker
+telling which tables the query will affect, e.g. for the query
+
+ UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
+
+
+the binlog will have
+
+ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
+
+and further processing in mysqlbinlog will be trivial.
DESCRIPTION:
A replication slave can be set to filter updates to certain tables with the
--replicate-[wild-]{do,ignore}-table options.
This task is about adding similar functionality to mysqlbinlog.
HIGH-LEVEL SPECIFICATION:
1. Context
----------
At the moment, the server has these replication slave options:
--replicate-do-table=db.tbl
--replicate-ignore-table=db.tbl
--replicate-wild-do-table=pattern.pattern
--replicate-wild-ignore-table=pattern.pattern
They affect both RBR and SBR events. SBR events are checked after the
statement has been parsed: the server iterates over the list of used tables
and checks them against the --replicate-* rules.
Interestingly, this scheme still allows the ignored table to be updated
through a VIEW.
2. Table filtering in mysqlbinlog
---------------------------------
Per-table filtering of RBR events is easy (as it is relatively easy to extract
the name of the table that the event applies to).
Per-table filtering of SBR events is hard, as generally it is not apparent
which tables the statement refers to.
This leaves several possible options:
2.1 Put the parser into mysqlbinlog
-----------------------------------
Once we have a full parser in mysqlbinlog, we'll be able to check which tables
are used by a statement, and can provide behaviour identical to what one
obtains when using the --replicate-* slave options.
(It is not clear how much effort is needed to put the parser into mysqlbinlog.
Any guesses?)
2.2 Use dumb regexp match
-------------------------
Use a really dumb approach. A query is considered to be modifying table X if
it matches an expression
CREATE TABLE $tablename
DROP $tablename
UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
DELETE ...$tablename ... WHERE // same as above
ALTER TABLE $tablename
.. etc (go get from the grammar) ..
The advantage over doing the same in awk is that mysqlbinlog will also process
RBR statements, and together with that will provide a working solution for
those who are careful with their table names not mixing with string constants
and such.
(TODO: string constants are of particular concern as they come from
[potentially hostile] users, unlike e.g. table aliases which come from
[not hostile] developers. Remove also all string constants before attempting
to do match?)
2.3 Have the master put annotations
-----------------------------------
We could add a master option that makes it inject into the query a marker
telling which tables the query will affect, e.g. for the query
UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
the binlog will have
/* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
and further processing in mysqlbinlog will be trivial.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter updates to certain tables
CREATION DATE..: Mon, 10 Aug 2009, 13:25
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 40 (http://askmonty.org/worklog/?tid=40)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300
+++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300
@@ -1 +1,73 @@
+1. Context
+----------
+At the moment, the server has these replication slave options:
+
+ --replicate-do-table=db.tbl
+ --replicate-ignore-table=db.tbl
+ --replicate-wild-do-table=pattern.pattern
+ --replicate-wild-ignore-table=pattern.pattern
+
+They affect both RBR and SBR events. SBR events are checked after the
+statement has been parsed: the server iterates over the list of used tables
+and checks them against the --replicate-* rules.
+
+Interestingly, this scheme still allows the ignored table to be updated
+through a VIEW.
+
+2. Table filtering in mysqlbinlog
+---------------------------------
+
+Per-table filtering of RBR events is easy (as it is relatively easy to extract
+the name of the table that the event applies to).
+
+Per-table filtering of SBR events is hard, as generally it is not apparent
+which tables the statement refers to.
+
+This leaves several possible options:
+
+2.1 Put the parser into mysqlbinlog
+-----------------------------------
+Once we have a full parser in mysqlbinlog, we'll be able to check which tables
+are used by a statement, and can provide behaviour identical to what one
+obtains when using the --replicate-* slave options.
+
+(It is not clear how much effort is needed to put the parser into mysqlbinlog.
+Any guesses?)
+
+
+2.2 Use dumb regexp match
+-------------------------
+Use a really dumb approach. A query is considered to be modifying table X if
+it matches an expression
+
+CREATE TABLE $tablename
+DROP $tablename
+UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
+DELETE ...$tablename ... WHERE // same as above
+ALTER TABLE $tablename
+.. etc (go get from the grammar) ..
+
+The advantage over doing the same in awk is that mysqlbinlog will also process
+RBR statements, and together with that will provide a working solution for
+those who are careful with their table names not mixing with string constants
+and such.
+
+(TODO: string constants are of particular concern as they come from
+[potentially hostile] users, unlike e.g. table aliases which come from
+[not hostile] developers. Remove also all string constants before attempting
+to do match?)
+
+2.3 Have the master put annotations
+-----------------------------------
+We could add a master option that makes it inject into the query a marker
+telling which tables the query will affect, e.g. for the query
+
+ UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
+
+
+the binlog will have
+
+ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
+
+and further processing in mysqlbinlog will be trivial.
DESCRIPTION:
A replication slave can be set to filter updates to certain tables with the
--replicate-[wild-]{do,ignore}-table options. This task is about adding similar
functionality to mysqlbinlog.
HIGH-LEVEL SPECIFICATION:
1. Context
----------
At the moment, the server has these replication slave options:
--replicate-do-table=db.tbl
--replicate-ignore-table=db.tbl
--replicate-wild-do-table=pattern.pattern
--replicate-wild-ignore-table=pattern.pattern
They affect both RBR and SBR events. SBR events are checked after the
statement has been parsed: the server iterates over the list of used tables
and checks them against the --replicate-* rules.
Interestingly, this scheme still allows the ignored table to be updated
through a VIEW.
2. Table filtering in mysqlbinlog
---------------------------------
Per-table filtering of RBR events is easy (as it is relatively easy to extract
the name of the table that the event applies to).
Per-table filtering of SBR events is hard, as generally it is not apparent
which tables the statement refers to.
This leaves several possible options:
2.1 Put the parser into mysqlbinlog
-----------------------------------
Once we have a full parser in mysqlbinlog, we'll be able to check which tables
are used by a statement, and can provide behaviour identical to what one
obtains when using the --replicate-* slave options.
(It is not clear how much effort is needed to put the parser into mysqlbinlog.
Any guesses?)
2.2 Use dumb regexp match
-------------------------
Use a really dumb approach. A query is considered to be modifying table X if
it matches an expression
CREATE TABLE $tablename
DROP $tablename
UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
DELETE ...$tablename ... WHERE // same as above
ALTER TABLE $tablename
.. etc (go get from the grammar) ..
The advantage over doing the same in awk is that mysqlbinlog will also process
RBR statements, and together with that will provide a working solution for
those who are careful with their table names not mixing with string constants
and such.
(TODO: string constants are of particular concern as they come from
[potentially hostile] users, unlike e.g. table aliases which come from
[not hostile] developers. Remove also all string constants before attempting
to do match?)
2.3 Have the master put annotations
-----------------------------------
We could add a master option that makes it inject into the query a marker
telling which tables the query will affect, e.g. for the query
UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
the binlog will have
/* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
and further processing in mysqlbinlog will be trivial.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Knielsen): Using the Valgrind API in mysqld (23)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Using the Valgrind API in mysqld
CREATION DATE..: Fri, 22 May 2009, 11:43
SUPERVISOR.....: Monty
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 23 (http://askmonty.org/worklog/?tid=23)
VERSION........: Server-5.1
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 40 (hours remain)
ORIG. ESTIMATE.: 40
PROGRESS NOTES:
-=-=(Knielsen - Mon, 10 Aug 2009, 14:27)=-=-
Low Level Design modified.
--- /tmp/wklog.23.old.16018 2009-08-10 14:27:09.000000000 +0300
+++ /tmp/wklog.23.new.16018 2009-08-10 14:27:09.000000000 +0300
@@ -5,3 +5,5 @@
- sql/item_strfunc.cc (Item_func_compress).
+Another good place is in the TRASH_MEM macro.
+
-=-=(Knielsen - Wed, 24 Jun 2009, 15:55)=-=-
Supervisor updated.
--- /tmp/wklog.23.old.944 2009-06-24 15:55:57.000000000 +0300
+++ /tmp/wklog.23.new.944 2009-06-24 15:55:57.000000000 +0300
@@ -1 +1 @@
-Knielsen
+Monty
-=-=(Knielsen - Wed, 24 Jun 2009, 15:53)=-=-
Version updated.
--- /tmp/wklog.23.old.911 2009-06-24 15:53:32.000000000 +0300
+++ /tmp/wklog.23.new.911 2009-06-24 15:53:32.000000000 +0300
@@ -1 +1 @@
-Maria-1.0
+Server-5.1
-=-=(Knielsen - Wed, 24 Jun 2009, 15:52)=-=-
Version updated.
--- /tmp/wklog.23.old.897 2009-06-24 15:52:43.000000000 +0300
+++ /tmp/wklog.23.new.897 2009-06-24 15:52:43.000000000 +0300
@@ -1 +1 @@
-Connector/.NET-2.1
+Maria-1.0
-=-=(Knielsen - Wed, 24 Jun 2009, 15:52)=-=-
Version updated.
--- /tmp/wklog.23.old.895 2009-06-24 15:52:28.000000000 +0300
+++ /tmp/wklog.23.new.895 2009-06-24 15:52:28.000000000 +0300
@@ -1 +1 @@
-Maria-1.0
+Connector/.NET-2.1
-=-=(Knielsen - Wed, 24 Jun 2009, 15:35)=-=-
Version updated.
--- /tmp/wklog.23.old.32742 2009-06-24 15:35:48.000000000 +0300
+++ /tmp/wklog.23.new.32742 2009-06-24 15:35:48.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+Maria-1.0
-=-=(Knielsen - Fri, 22 May 2009, 14:31)=-=-
Low Level Design modified.
--- /tmp/wklog.23.old.24587 2009-05-22 14:31:52.000000000 +0300
+++ /tmp/wklog.23.new.24587 2009-05-22 14:31:52.000000000 +0300
@@ -1 +1,7 @@
+Two places where we call into libz, and where checking for defined parameters
+would be good:
+
+ - mysys/my_compress.c
+
+ - sql/item_strfunc.cc (Item_func_compress).
-=-=(Guest - Fri, 22 May 2009, 12:04)=-=-
High-Level Specification modified.
--- /tmp/wklog.23.old.18061 2009-05-22 12:04:05.000000000 +0300
+++ /tmp/wklog.23.new.18061 2009-05-22 12:04:05.000000000 +0300
@@ -26,3 +26,5 @@
initialised, it is possible to detect problems earlier, speeding up debugging.
Such code can be added in more places over time as development and debugging
goes on.
+
+See also a patch here: http://bugs.mysql.com/bug.php?id=44582
-=-=(Knielsen - Fri, 22 May 2009, 11:52)=-=-
High-Level Specification modified.
--- /tmp/wklog.23.old.17628 2009-05-22 11:52:33.000000000 +0300
+++ /tmp/wklog.23.new.17628 2009-05-22 11:52:33.000000000 +0300
@@ -1 +1,28 @@
+With custom memory allocators, using the Valgrind APIs we can tell Valgrind when
+a memory block is allocated (so that data read from memory is marked as undefined
+instead of being defined or not at random depending on prior use); and when a
+memory block is freed (so that use after freeing can be reported as an error).
+In some cases checking for leaks may also be appropriate.
+
+Another possibility is to add an explicit check for whether memory is defined.
+
+One place this would be useful is when calling libz. Due to the design of that
+library, Valgrind produces lots of false alarms about using undefined values
+(I think the issue is that it runs a few bytes off of initialized memory to
+reduce boundary checks in each loop iteration, then after the loop has checks to
+avoid using the undefined part of the result). This means we have lots of libz
+Valgrind suppressions and continue to add more as new warnings surface. So we
+might easily miss a real problem in this area. This could be improved by adding
+explicit checks at the call to libz functions that the passed memory is properly
+defined.
+
+Another use is to improve debugging. When debugging a warning about use of
+uninitialised memory, the detection often happens long after the real
+problem, the uninitialised value having been passed along through the code
+for a long time before being detected. This makes debugging the problem slow.
+
+By adding in strategic places code that asserts that a specific value must be
+initialised, it is possible to detect problems earlier, speeding up debugging.
+Such code can be added in more places over time as development and debugging
+goes on.
DESCRIPTION:
Valgrind (the memcheck tool) has some very useful APIs that can be used in mysqld
when testing with Valgrind to improve testing and/or debugging:
file:///usr/share/doc/valgrind/html/mc-manual.html#mc-manual.clientreqs
file:///usr/share/doc/valgrind/html/mc-manual.html#mc-manual.mempools
This worklog is about adding configure checks and headers so that these can be
used in a way that continues to work on machines where the Valgrind headers or
functionality are missing.
It also includes adding some basic Valgrind enhancements:
- Adding Valgrind annotations to custom memory allocators so that Valgrind can
detect leaks, use-before-init, and use-after-free problems also for these
allocators.
- Adding checks for definedness in appropriate places (e.g. when calling libz).
HIGH-LEVEL SPECIFICATION:
With custom memory allocators, using the Valgrind APIs we can tell Valgrind when
a memory block is allocated (so that data read from memory is marked as undefined
instead of being defined or not at random depending on prior use); and when a
memory block is freed (so that use after freeing can be reported as an error).
In some cases checking for leaks may also be appropriate.
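As a rough illustration of the allocator annotations, here is a toy arena
allocator in C. VALGRIND_MAKE_MEM_UNDEFINED and VALGRIND_MAKE_MEM_NOACCESS
are real memcheck client requests; everything else is assumed for the sketch:
HAVE_VALGRIND stands for whatever macro the configure check ends up defining,
and struct arena with arena_init()/arena_alloc()/arena_reset() is not an
existing mysys allocator.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_VALGRIND                  /* hypothetical configure result */
#include <valgrind/memcheck.h>
#define MEM_UNDEFINED(addr, len) ((void) VALGRIND_MAKE_MEM_UNDEFINED(addr, len))
#define MEM_NOACCESS(addr, len)  ((void) VALGRIND_MAKE_MEM_NOACCESS(addr, len))
#else
/* Without the Valgrind headers the wrappers compile to nothing. */
#define MEM_UNDEFINED(addr, len) ((void) (addr), (void) (len))
#define MEM_NOACCESS(addr, len)  ((void) (addr), (void) (len))
#endif

/* Toy bump allocator standing in for a custom memory allocator. */
struct arena {
  unsigned char buf[4096];
  size_t used;
};

static void arena_init(struct arena *a)
{
  assert(a != NULL);
  a->used = 0;
  MEM_NOACCESS(a->buf, sizeof a->buf);   /* nothing handed out yet */
}

static void *arena_alloc(struct arena *a, size_t size)
{
  void *p;

  if (size > sizeof a->buf)
    return NULL;
  size = (size + 7) & ~(size_t) 7;       /* round up to 8 bytes */
  if (size > sizeof a->buf - a->used)
    return NULL;
  p = a->buf + a->used;
  a->used += size;
  /* Addressable again, but contents are undefined regardless of prior use. */
  MEM_UNDEFINED(p, size);
  return p;
}

static void arena_reset(struct arena *a)
{
  a->used = 0;
  /* Everything is "freed": any later access is reported as an error. */
  MEM_NOACCESS(a->buf, sizeof a->buf);
}
```

Leak detection for such allocators would need the mempool client requests
instead; that part is not shown here.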
Another possibility is to add an explicit check for whether memory is defined.
One place this would be useful is when calling libz. Due to the design of that
library, Valgrind produces lots of false alarms about using undefined values
(I think the issue is that it runs a few bytes off of initialized memory to
reduce boundary checks in each loop iteration, then after the loop has checks to
avoid using the undefined part of the result). This means we have lots of libz
Valgrind suppressions and continue to add more as new warnings surface. So we
might easily miss a real problem in this area. This could be improved by adding
explicit checks at the call to libz functions that the passed memory is properly
defined.
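A sketch of such an explicit check in C, under the same assumptions as before:
HAVE_VALGRIND is a placeholder for the configure result, and checksum_bytes()
is only a stand-in for a real libz entry point so that the example does not
depend on zlib. VALGRIND_CHECK_MEM_IS_DEFINED is the actual memcheck client
request.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_VALGRIND                  /* hypothetical configure result */
#include <valgrind/memcheck.h>
#define MEM_CHECK_DEFINED(addr, len) \
  ((void) VALGRIND_CHECK_MEM_IS_DEFINED(addr, len))
#else
#define MEM_CHECK_DEFINED(addr, len) ((void) (addr), (void) (len))
#endif

/* Stand-in for a library call whose internals are covered by suppressions. */
static unsigned long checksum_bytes(const unsigned char *buf, size_t len)
{
  unsigned long sum = 0;
  size_t i;

  for (i = 0; i < len; i++)
    sum += buf[i];
  return sum;
}

/* Verify the input at the call boundary, where a report is still meaningful,
   before handing it to code whose own warnings are suppressed. */
static unsigned long checked_checksum(const unsigned char *buf, size_t len)
{
  assert(buf != NULL || len == 0);
  MEM_CHECK_DEFINED(buf, len);
  return checksum_bytes(buf, len);
}
```

Under Valgrind, passing a partly uninitialised buffer to checked_checksum()
is reported at the wrapper; in a normal build the check costs nothing.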
Another use is to improve debugging. When debugging a warning about use of
uninitialised memory, the detection often happens long after the real
problem, the uninitialised value having been passed along through the code
for a long time before being detected. This makes debugging the problem slow.
By adding in strategic places code that asserts that a specific value must be
initialised, it is possible to detect problems earlier, speeding up debugging.
Such code can be added in more places over time as development and debugging
goes on.
See also a patch here: http://bugs.mysql.com/bug.php?id=44582
LOW-LEVEL DESIGN:
Two places where we call into libz, and where checking for defined parameters
would be good:
- mysys/my_compress.c
- sql/item_strfunc.cc (Item_func_compress).
Another good place is in the TRASH_MEM macro.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Knielsen): Using the Valgrind API in mysqld (23)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Using the Valgrind API in mysqld
CREATION DATE..: Fri, 22 May 2009, 11:43
SUPERVISOR.....: Monty
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 23 (http://askmonty.org/worklog/?tid=23)
VERSION........: Server-5.1
STATUS.........: Code-Review
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 40 (hours remain)
ORIG. ESTIMATE.: 40
PROGRESS NOTES:
-=-=(Knielsen - Mon, 10 Aug 2009, 14:27)=-=-
Low Level Design modified.
--- /tmp/wklog.23.old.16018 2009-08-10 14:27:09.000000000 +0300
+++ /tmp/wklog.23.new.16018 2009-08-10 14:27:09.000000000 +0300
@@ -5,3 +5,5 @@
- sql/item_strfunc.cc (Item_func_compress).
+Another good place is in the TRASH_MEM macro.
+
-=-=(Knielsen - Wed, 24 Jun 2009, 15:55)=-=-
Supervisor updated.
--- /tmp/wklog.23.old.944 2009-06-24 15:55:57.000000000 +0300
+++ /tmp/wklog.23.new.944 2009-06-24 15:55:57.000000000 +0300
@@ -1 +1 @@
-Knielsen
+Monty
-=-=(Knielsen - Wed, 24 Jun 2009, 15:53)=-=-
Version updated.
--- /tmp/wklog.23.old.911 2009-06-24 15:53:32.000000000 +0300
+++ /tmp/wklog.23.new.911 2009-06-24 15:53:32.000000000 +0300
@@ -1 +1 @@
-Maria-1.0
+Server-5.1
-=-=(Knielsen - Wed, 24 Jun 2009, 15:52)=-=-
Version updated.
--- /tmp/wklog.23.old.897 2009-06-24 15:52:43.000000000 +0300
+++ /tmp/wklog.23.new.897 2009-06-24 15:52:43.000000000 +0300
@@ -1 +1 @@
-Connector/.NET-2.1
+Maria-1.0
-=-=(Knielsen - Wed, 24 Jun 2009, 15:52)=-=-
Version updated.
--- /tmp/wklog.23.old.895 2009-06-24 15:52:28.000000000 +0300
+++ /tmp/wklog.23.new.895 2009-06-24 15:52:28.000000000 +0300
@@ -1 +1 @@
-Maria-1.0
+Connector/.NET-2.1
-=-=(Knielsen - Wed, 24 Jun 2009, 15:35)=-=-
Version updated.
--- /tmp/wklog.23.old.32742 2009-06-24 15:35:48.000000000 +0300
+++ /tmp/wklog.23.new.32742 2009-06-24 15:35:48.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+Maria-1.0
-=-=(Knielsen - Fri, 22 May 2009, 14:31)=-=-
Low Level Design modified.
--- /tmp/wklog.23.old.24587 2009-05-22 14:31:52.000000000 +0300
+++ /tmp/wklog.23.new.24587 2009-05-22 14:31:52.000000000 +0300
@@ -1 +1,7 @@
+Two places where we call into libz, and where checking for defined parameters
+would be good:
+
+ - mysys/my_compress.c
+
+ - sql/item_strfunc.cc (Item_func_compress).
-=-=(Guest - Fri, 22 May 2009, 12:04)=-=-
High-Level Specification modified.
--- /tmp/wklog.23.old.18061 2009-05-22 12:04:05.000000000 +0300
+++ /tmp/wklog.23.new.18061 2009-05-22 12:04:05.000000000 +0300
@@ -26,3 +26,5 @@
initialised, it is possible to detect problems earlier, speeding up debugging.
Such code can be added in more places over time as development and debugging
goes on.
+
+See also a patch here: http://bugs.mysql.com/bug.php?id=44582
-=-=(Knielsen - Fri, 22 May 2009, 11:52)=-=-
High-Level Specification modified.
--- /tmp/wklog.23.old.17628 2009-05-22 11:52:33.000000000 +0300
+++ /tmp/wklog.23.new.17628 2009-05-22 11:52:33.000000000 +0300
@@ -1 +1,28 @@
+With custom memory allocators, using the Valgrind APIs we can tell Valgrind when
+a memory block is allocated (so that data read from memory is marked as undefined
+instead of being defined or not at random depending on prior use); and when a
+memory block is freed (so that use after freeing can be reported as an error).
+In some cases cheking for leaks may also be appropriate.
+
+Another possibility is to add an explicit check for whether memory is defined.
+
+One place this would be useful is when calling libz. Due to the design of that
+library, Valgrind produces lots of false alarms about using undefined values
+(I think the issue is that it runs a few bytes off of initialized memory to
+reduce boundary checks in each loop iteration, then after the loop has checks to
+avoid using the undefined part of the result). This means we have lots of libz
+Valgrind suppressions and continue to add more as new warnings surface. So we
+might easily miss a real problem in this area. This could be improved by adding
+explicit checks at the call to libz functions that the passed memory is properly
+defined.
+
+Another use is to improve debugging. It is often the case when debugging a
+warning about using un-initialised memory that the detection happens long after
+the real problem, the un-initialized value being passed along through the code
+for a long time before being detected. This makes debugging the problem slow.
+
+By adding in strategic places code that asserts that a specific value must be
+initialised, it is possible to detect problems earlier, speeding up debugging.
+Such code can be added in more places over time as development and debugging
+goes on.
DESCRIPTION:
Valgrind (the memcheck tool) has some very useful APIs that can be used in mysqld
when testing with Valgrind to improve testing and/or debugging:
file:///usr/share/doc/valgrind/html/mc-manual.html#mc-manual.clientreqs
file:///usr/share/doc/valgrind/html/mc-manual.html#mc-manual.mempools
This worklog is about adding configure checks and headers so that these APIs
can be used in a way that continues to work on machines where the Valgrind
headers or functionality are missing.
It also includes adding some basic Valgrind enhancements:
- Adding Valgrind annotations to custom memory allocators so that Valgrind can
  detect leaks, use-before-init, and use-after-free problems for these
  allocators as well.
- Adding checks for definedness in appropriate places (e.g. when calling libz).
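
A minimal sketch of what such a header could look like, assuming a configure
check defines a macro when <valgrind/memcheck.h> is present. The names
HAVE_VALGRIND_MEMCHECK_H and checked_copy() are hypothetical and not taken from
the server tree; the VALGRIND_* client requests are the ones memcheck.h
provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HAVE_VALGRIND_MEMCHECK_H is hypothetical: a configure check would define
   it when <valgrind/memcheck.h> can be included on the build machine. */
#ifdef HAVE_VALGRIND_MEMCHECK_H
#include <valgrind/memcheck.h>
#else
/* No Valgrind headers: the client requests become no-ops that only
   evaluate their arguments. */
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, len)   ((void)(addr), (void)(len))
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len)    ((void)(addr), (void)(len))
#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, len) ((void)(addr), (void)(len))
#endif

/* Hypothetical helper: copy len bytes, asking memcheck (when running under
   Valgrind) to report an error right here if any source byte is undefined. */
static void *checked_copy(void *dst, const void *src, size_t len)
{
  assert(dst != NULL && src != NULL);
  (void) VALGRIND_CHECK_MEM_IS_DEFINED(src, len);
  return memcpy(dst, src, len);
}
```

Built without the macro, checked_copy() behaves like plain memcpy(), so callers
do not need their own #ifdef.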
HIGH-LEVEL SPECIFICATION:
With custom memory allocators, we can use the Valgrind APIs to tell Valgrind
when a memory block is allocated (so that data read from it is marked as
undefined, instead of being defined or not at random depending on prior use),
and when a memory block is freed (so that use after free can be reported as an
error). In some cases checking for leaks may also be appropriate.
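
As an illustration only, here is how a very small custom allocator might be
annotated. The arena_t type and its functions are hypothetical and do not
correspond to any allocator in the server; HAVE_VALGRIND_MEMCHECK_H is again an
assumed configure macro. Alignment is ignored to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_VALGRIND_MEMCHECK_H   /* hypothetical configure-defined macro */
#include <valgrind/memcheck.h>
#else
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len))
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)(addr), (void)(len))
#endif

/* A hypothetical bump allocator handing out pieces of one fixed buffer. */
typedef struct
{
  unsigned char buf[1024];
  size_t used;
} arena_t;

static void arena_init(arena_t *a)
{
  assert(a != NULL);
  a->used = 0;
  /* Nothing has been handed out yet, so any access to buf is an error. */
  (void) VALGRIND_MAKE_MEM_NOACCESS(a->buf, sizeof a->buf);
}

static void *arena_alloc(arena_t *a, size_t len)
{
  void *p;
  assert(a != NULL);
  if (len > sizeof a->buf - a->used)
    return NULL;
  p = a->buf + a->used;
  a->used += len;
  /* Addressable from now on, but undefined whatever the block held before. */
  (void) VALGRIND_MAKE_MEM_UNDEFINED(p, len);
  return p;
}

static void arena_reset(arena_t *a)
{
  assert(a != NULL);
  /* Everything counts as freed: later reads or writes get reported. */
  (void) VALGRIND_MAKE_MEM_NOACCESS(a->buf, sizeof a->buf);
  a->used = 0;
}
```

The mempool client requests linked from the description are another way to
describe such an allocator to Valgrind; they are not shown here.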
Another possibility is to add an explicit check for whether memory is defined.
One place this would be useful is when calling libz. Due to the design of that
library, Valgrind produces lots of false alarms about using undefined values
(I think the issue is that it runs a few bytes past the end of initialized
memory to reduce boundary checks in each loop iteration, and then has checks
after the loop to avoid using the undefined part of the result). This means we
have lots of libz Valgrind suppressions and keep adding more as new warnings
surface, so we might easily miss a real problem in this area. This could be
improved by adding explicit checks at each call to a libz function that the
memory passed in is properly defined.
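
The check at the call site could look roughly like this. The sketch
deliberately avoids linking against libz: compress_fn mirrors the shape of
zlib's compress() using plain C types, and store_only() is a stand-in used only
to exercise the wrapper. checked_compress(), compress_fn, store_only() and
HAVE_VALGRIND_MEMCHECK_H are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_VALGRIND_MEMCHECK_H   /* hypothetical configure-defined macro */
#include <valgrind/memcheck.h>
#else
#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, len) ((void)(addr), (void)(len))
#endif

/* Same parameter order as zlib's compress(), written with plain C types. */
typedef int (*compress_fn)(unsigned char *dest, unsigned long *dest_len,
                           const unsigned char *src, unsigned long src_len);

/* Verify the input before the library sees it, so that an undefined byte is
   reported at this call rather than hidden by a libz suppression. */
static int checked_compress(compress_fn fn,
                            unsigned char *dest, unsigned long *dest_len,
                            const unsigned char *src, unsigned long src_len)
{
  assert(fn != NULL && dest_len != NULL);
  (void) VALGRIND_CHECK_MEM_IS_DEFINED(src, src_len);
  (void) VALGRIND_CHECK_MEM_IS_DEFINED(dest_len, sizeof *dest_len);
  return fn(dest, dest_len, src, src_len);
}

/* Stand-in "compressor" for the sketch: stores the input unchanged. */
static int store_only(unsigned char *dest, unsigned long *dest_len,
                      const unsigned char *src, unsigned long src_len)
{
  if (*dest_len < src_len)
    return -1;
  memcpy(dest, src, src_len);
  *dest_len = src_len;
  return 0;
}
```

In real code the function pointer would presumably go away and the wrapper
would call the libz function directly.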
Another use is to improve debugging. When debugging a warning about use of
uninitialized memory, it is often the case that the detection happens long
after the real problem, the uninitialized value having been passed along
through the code for a long time before being detected. This makes debugging
the problem slow.
By adding code in strategic places that asserts that a specific value must be
initialized, it is possible to detect problems earlier, speeding up debugging.
Such code can be added in more places over time as development and debugging
go on.
See also a patch here: http://bugs.mysql.com/bug.php?id=44582
LOW-LEVEL DESIGN:
Two places where we call into libz, and where checking for defined parameters
would be good:
- mysys/my_compress.c
- sql/item_strfunc.cc (Item_func_compress).
Another good place is in the TRASH_MEM macro.
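
One possible reading of the TRASH_MEM idea, sketched under assumptions: after
overwriting a block with a filler byte, also tell Valgrind that its contents
are undefined, so that a later read is reported instead of silently seeing the
filler. SKETCH_TRASH_MEM, sketch_trash_mem(), the 0x8F filler value and
HAVE_VALGRIND_MEMCHECK_H are illustrative choices here, not the server's
actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_VALGRIND_MEMCHECK_H   /* hypothetical configure-defined macro */
#include <valgrind/memcheck.h>
#else
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len))
#endif

/* Overwrite the block so stale data cannot be used by accident, then mark it
   undefined so memcheck flags any read that happens before a new write. */
static void sketch_trash_mem(void *ptr, size_t len)
{
  assert(ptr != NULL || len == 0);
  if (len == 0)
    return;
  memset(ptr, 0x8F, len);
  (void) VALGRIND_MAKE_MEM_UNDEFINED(ptr, len);
}

#define SKETCH_TRASH_MEM(ptr, len) sketch_trash_mem((ptr), (len))
```

Outside Valgrind the block simply holds the filler byte; under Valgrind,
branching on its contents before rewriting it would be reported.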
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter updates to certain tables
CREATION DATE..: Mon, 10 Aug 2009, 13:25
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 40 (http://askmonty.org/worklog/?tid=40)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
A replication slave can be configured to filter updates to certain tables with
the --replicate-[wild-]{do,ignore}-table options. This task is about adding
similar functionality to mysqlbinlog.
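
To make the intended behaviour concrete, here is a rough sketch of wildcard
table filtering as mysqlbinlog might apply it per event. It assumes the
LIKE-style wildcards ('%' and '_') used by the slave's wild options; the
function names are hypothetical, escaping and case folding are left out, and
the do-before-ignore order in keep_table() is my reading of how the slave
evaluates wild rules, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Match str against a pattern where '%' matches any run of characters
   (including none) and '_' matches exactly one character.
   Simplifications: no escape character, comparison is case-sensitive. */
static int wild_match(const char *pat, const char *str)
{
  assert(pat != NULL && str != NULL);
  for (; *pat != '\0'; pat++, str++)
  {
    if (*pat == '%')
    {
      while (*pat == '%')           /* collapse runs of '%' */
        pat++;
      if (*pat == '\0')
        return 1;                   /* trailing '%' matches the rest */
      for (; *str != '\0'; str++)   /* try every possible split point */
        if (wild_match(pat, str))
          return 1;
      return 0;
    }
    if (*str == '\0' || (*pat != '_' && *pat != *str))
      return 0;
  }
  return *str == '\0';
}

/* Decide whether an event for "db.table" should be printed.
   Either pattern may be NULL, meaning that option was not given. */
static int keep_table(const char *wild_do, const char *wild_ignore,
                      const char *db_dot_table)
{
  if (wild_do != NULL && wild_match(wild_do, db_dot_table))
    return 1;
  if (wild_ignore != NULL && wild_match(wild_ignore, db_dot_table))
    return 0;
  /* With a do-pattern given, anything it did not match is dropped. */
  return wild_do == NULL;
}
```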
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 10 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Mon, 10 Aug 2009, 11:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.6580 2009-08-10 11:12:36.000000000 +0300
+++ /tmp/wklog.36.new.6580 2009-08-10 11:12:36.000000000 +0300
@@ -1,4 +1,3 @@
-
Context
-------
At the moment, the server has a replication slave option
@@ -67,6 +66,6 @@
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
-mysqlbinlog (adding a comment is easy and doesn't require use to parse the
-statement).
+mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
+parse the statement).
-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@
+Context
+-------
+At the moment, the server has a replication slave option
+
+ --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+ statement refers to tables in current database, so that changing the current
+ database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+ Table_map_log_event) as "use current database". Binlog slave thread
+ probably should not allow such events as it doesn't have a natural current
+ database.
+- Add a mysqlbinlog --strip-db option that would
+ = not produce any "USE dbname" statements
+ = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+ /* !database-alias(oldname,newname) */
+
+ and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+ mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on the binlog producer.
With statement-based replication, one can achieve this by grepping "USE dbname"
statements out of the output of mysqlbinlog(*). With row-based replication this
is no longer possible, as the database name is encoded within the
BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow changing
the names of the databases used in both RBR and SBR events.
(*) this implies that all statements refer to tables in the current database
and doesn't catch updates made inside stored functions and so forth, but it
still works for a practically important subset of cases.
HIGH-LEVEL SPECIFICATION:
Context
-------
At the moment, the server has a replication slave option
  --replicate-rewrite-db="from->to"
The option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
  statement refers to tables in the current database, so that changing the
  current database makes the statement work on a table in a different database).
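
For illustration, the "from->to" argument format and the rewrite it implies can
be sketched as follows. parse_rewrite_db() and rewrite_db() are hypothetical
helpers written for this note, not the server's implementation; whitespace
trimming and multiple rules are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "from->to" into two NUL-terminated names.
   cap is the size of each output buffer. Returns 0 on success, -1 if the
   arrow is missing, a name is empty, or a name does not fit. */
static int parse_rewrite_db(const char *arg, char *from, char *to, size_t cap)
{
  const char *arrow;
  size_t from_len, to_len;

  assert(arg != NULL && from != NULL && to != NULL);
  arrow = strstr(arg, "->");
  if (arrow == NULL)
    return -1;
  from_len = (size_t) (arrow - arg);
  to_len = strlen(arrow + 2);
  if (from_len == 0 || to_len == 0 || from_len >= cap || to_len >= cap)
    return -1;
  memcpy(from, arg, from_len);
  from[from_len] = '\0';
  memcpy(to, arrow + 2, to_len + 1);   /* includes the terminating NUL */
  return 0;
}

/* Apply one rule: return the replacement if db matches, else db itself. */
static const char *rewrite_db(const char *db, const char *from, const char *to)
{
  return strcmp(db, from) == 0 ? to : db;
}
```

The same lookup would apply wherever a database name is read from an event,
which is why the option can cover all three event types listed above.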
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as a replication slave would process the --replicate-rewrite-db
option.
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a database name. It is not possible to have an RBR
event stream that doesn't mention which database the events are for. When I
tried to use a debugger to specify an empty database name, the attempt to apply
the binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret an empty database name in an RBR event (i.e. in a
  Table_map_log_event) as "use current database". The binlog slave thread
  probably should not allow such events, as it doesn't have a natural current
  database.
- Add a mysqlbinlog --strip-db option that would
  = not produce any "USE dbname" statements
  = change the database name for all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(This will have the usual limitation that we assume that all statements in
the binlog refer to the current database.)
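
The statement-based half of --strip-db amounts to what grepping out "USE"
lines does today. A rough sketch over mysqlbinlog's text output, assuming one
statement per line; is_use_statement() and strip_use_lines() are hypothetical
names, and the RBR half (emptying the database name inside Table_map events)
is not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if the line starts (after blanks) with the keyword USE, in any case,
   followed by whitespace. "user_table" is therefore not a match. */
static int is_use_statement(const char *line)
{
  assert(line != NULL);
  while (*line == ' ' || *line == '\t')
    line++;
  return tolower((unsigned char) line[0]) == 'u' &&
         tolower((unsigned char) line[1]) == 's' &&
         tolower((unsigned char) line[2]) == 'e' &&
         isspace((unsigned char) line[3]);
}

/* Copy in to out line by line, leaving out USE statements.
   Lines that would overflow out (cap bytes) are dropped as well.
   Returns the number of bytes written, not counting the final NUL. */
static size_t strip_use_lines(const char *in, char *out, size_t cap)
{
  size_t n = 0;

  assert(in != NULL && out != NULL && cap > 0);
  while (*in != '\0')
  {
    const char *eol = strchr(in, '\n');
    size_t len = eol != NULL ? (size_t) (eol - in) + 1 : strlen(in);

    if (!is_use_statement(in) && n + len < cap)
    {
      memcpy(out + n, in, len);
      n += len;
    }
    in += len;
  }
  out[n] = '\0';
  return n;
}
```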
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize a special form of comment
  /* !database-alias(oldname,newname) */
  and save the mapping somewhere
- Put hooks in the table open and name resolution code to use the saved
  mapping.
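
A sketch of the two pieces, assuming the comment syntax exactly as written
above with no spaces inside the parentheses. parse_db_alias() and resolve_db()
are hypothetical names; a real implementation would live in the server's lexer
and table-open code rather than use sscanf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Extract oldname and newname from the alias comment form proposed in the
   text. Both output buffers must hold at least 64 bytes.
   Returns 0 on success, -1 if the text is not such a comment. */
static int parse_db_alias(const char *comment, char *oldname, char *newname)
{
  char tail = '\0';
  int fields;

  assert(comment != NULL && oldname != NULL && newname != NULL);
  /* The final %c must read the closing '/' for the match to count. */
  fields = sscanf(comment, " /* !database-alias(%63[^,)],%63[^,)]) *%c",
                  oldname, newname, &tail);
  return (fields == 3 && tail == '/') ? 0 : -1;
}

/* The "hook": map a database name through the saved alias, if it applies. */
static const char *resolve_db(const char *db,
                              const char *oldname, const char *newname)
{
  return strcmp(db, oldname) == 0 ? newname : db;
}
```

Because the mapping is applied at name resolution, it would also cover
statements that spell out dbname.tablename, which the other options cannot
rewrite without parsing the statement.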
Once we've done the above, it will be easy to perform a complete database
name change in the binary log, with no compromises or restrictions.
It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
parse the statement).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@
+Context
+-------
+At the moment, the server has a replication slave option
+
+ --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+ statement refers to tables in current database, so that changing the current
+ database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+ Table_map_log_event) as "use current database". Binlog slave thread
+ probably should not allow such events as it doesn't have a natural current
+ database.
+- Add a mysqlbinlog --strip-db option that would
+ = not produce any "USE dbname" statements
+ = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+ /* !database-alias(oldname,newname) */
+
+ and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+ mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on the binlog producer.
With statement-based replication, one can achieve this by grepping "USE dbname"
statements out of the output of mysqlbinlog(*). With row-based replication this
is no longer possible, as the database name is encoded within the
BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow changing
the names of the databases used in both RBR and SBR events.
(*) this implies that all statements refer to tables in the current database
and doesn't catch updates made inside stored functions and so forth, but it
still works for a practically important subset of cases.
HIGH-LEVEL SPECIFICATION:
Context
-------
At the moment, the server has a replication slave option
  --replicate-rewrite-db="from->to"
The option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
  statement refers to tables in the current database, so that changing the
  current database makes the statement work on a table in a different database).
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as a replication slave would process the --replicate-rewrite-db
option.
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a database name. It is not possible to have an RBR
event stream that doesn't mention which database the events are for. When I
tried to use a debugger to specify an empty database name, the attempt to apply
the binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret an empty database name in an RBR event (i.e. in a
  Table_map_log_event) as "use current database". The binlog slave thread
  probably should not allow such events, as it doesn't have a natural current
  database.
- Add a mysqlbinlog --strip-db option that would
  = not produce any "USE dbname" statements
  = change the database name for all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(This will have the usual limitation that we assume that all statements in
the binlog refer to the current database.)
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize a special form of comment
  /* !database-alias(oldname,newname) */
  and save the mapping somewhere
- Put hooks in the table open and name resolution code to use the saved
  mapping.
Once we've done the above, it will be easy to perform a complete database name
change in the binary log, with no compromises or restrictions.
It will be possible to do the rewrites either on the slave
(--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require us to parse the
statement).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@
+Context
+-------
+At the moment, the server has a replication slave option
+
+ --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+ statement refers to tables in current database, so that changing the current
+ database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+ Table_map_log_event) as "use current database". Binlog slave thread
+ probably should not allow such events as it doesn't have a natural current
+ database.
+- Add a mysqlbinlog --strip-db option that would
+ = not produce any "USE dbname" statements
+ = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+ /* !database-alias(oldname,newname) */
+
+ and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+ mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database whose
name differs from the name of the database on the binlog producer.
With statement-based replication, one can achieve this by filtering the
"USE dbname" statements out of the mysqlbinlog output(*). With
row-based replication this is no longer possible, as the database name is
encoded within the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow changing
the names of the databases used in both RBR and SBR events.
(*) this assumes that all statements refer to tables in the current database
and does not catch updates made inside stored functions and so forth, but it
still works for a practically important subset of cases.
HIGH-LEVEL SPECIFICATION:
Context
-------
At the moment, the server has a replication slave option
--replicate-rewrite-db="from->to"
the option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
statement refers to tables in current database, so that changing the current
database will make the statement work on a table in a different database).
What we could do
----------------
Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as replication slave would process --replicate-rewrite-db option.
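As a rough illustration of what Option1 would involve on the mysqlbinlog side, the sketch below parses one "from->to" rule and applies a set of rules to a database name. The struct, the function names and the DB_NAME_MAX limit are hypothetical and are not taken from the server source; unlike a real option parser, this sketch does not trim whitespace around the names.

```c
#include <assert.h>
#include <string.h>

#define DB_NAME_MAX 64  /* assumed limit for this sketch only */

/* One "from->to" rewrite rule (illustrative type, not a server struct). */
struct rewrite_rule {
    char from[DB_NAME_MAX + 1];
    char to[DB_NAME_MAX + 1];
};

/* Parse "from->to" into *rule. Returns 0 on success, -1 on malformed input. */
int parse_rewrite_rule(const char *arg, struct rewrite_rule *rule)
{
    assert(arg != NULL && rule != NULL);
    const char *arrow = strstr(arg, "->");
    if (arrow == NULL)
        return -1;
    size_t from_len = (size_t)(arrow - arg);
    size_t to_len = strlen(arrow + 2);
    if (from_len == 0 || to_len == 0 ||
        from_len > DB_NAME_MAX || to_len > DB_NAME_MAX)
        return -1;
    memcpy(rule->from, arg, from_len);
    rule->from[from_len] = '\0';
    memcpy(rule->to, arrow + 2, to_len + 1);  /* copies the trailing NUL too */
    return 0;
}

/* Return the rewritten name if db matches a rule, otherwise db itself. */
const char *rewrite_db(const struct rewrite_rule *rules, size_t n,
                       const char *db)
{
    assert(db != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(rules[i].from, db) == 0)
            return rules[i].to;
    return db;
}
```

The same lookup would have to be applied wherever a database name is emitted: in "USE dbname" output and in the database name carried by each Table_map_log_event.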
Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a database name. It is not possible to have an RBR
event stream that does not mention which database the events are for. When I
used a debugger to specify an empty database name, the attempt to apply the
binlog resulted in this error:
090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,
We could do as follows:
- Make the server interpret empty database name in RBR event (i.e. in a
Table_map_log_event) as "use current database". Binlog slave thread
probably should not allow such events as it doesn't have a natural current
database.
- Add a mysqlbinlog --strip-db option that would
= not produce any "USE dbname" statements
= change databasename for all RBR events to be empty
That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(This has the usual limitation: we assume that all statements in
the binlog refer to the current database.)
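The first half of the proposed --strip-db behaviour (not producing "USE dbname" statements) can be sketched as a line filter over the generated SQL text. This is only an illustration with hypothetical function names. It does not cover the second half, emptying the database name inside RBR events, which would have to happen before the event is encoded into the BINLOG statement.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if the line starts with the keyword USE (any case, leading
   blanks allowed) followed by whitespace, otherwise 0. */
int is_use_statement(const char *line)
{
    assert(line != NULL);
    while (*line == ' ' || *line == '\t')
        line++;
    static const char kw[] = "use";
    for (size_t i = 0; i < 3; i++)
        if (tolower((unsigned char)line[i]) != kw[i])
            return 0;  /* also stops safely at the terminating NUL */
    /* Require a blank after the keyword so "user ..." does not match. */
    return line[3] == ' ' || line[3] == '\t';
}

/* Copy text to out, omitting every USE line.
   out must have room for strlen(text) + 1 bytes. */
void strip_use_statements(const char *text, char *out)
{
    assert(text != NULL && out != NULL);
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) + 1 : strlen(text);
        if (!is_use_statement(text)) {
            memcpy(out, text, len);
            out += len;
        }
        text += len;
    }
    *out = '\0';
}
```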
Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:
- Make the server's parser recognize special form of comments
/* !database-alias(oldname,newname) */
and save the mapping somewhere
- Put the hooks in table open and name resolution code to use the saved
mapping.
Once we've done the above, it will be easy to perform a complete database name
change in the binary log, with no compromises or restrictions.
It will be possible to do the rewrites either on the slave
(--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require us to parse the
statement).
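To show why the mysqlbinlog side of Option3 needs no SQL parsing, here is a minimal sketch that prepends the proposed alias comment to a statement. The comment syntax is the one proposed above and is not recognized by any existing server; the function name and the return convention are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Prefix stmt with the proposed database-alias comment.
// Names that contain the comment terminator are rejected, because they
// would end the comment early and change the meaning of the statement.
// Returns the number of bytes written (excluding the NUL), or -1 if a
// name is unsafe or the output buffer is too small.
int add_alias_comment(char *out, size_t out_size, const char *old_db,
                      const char *new_db, const char *stmt)
{
    assert(out != NULL && out_size > 0);
    assert(old_db != NULL && new_db != NULL && stmt != NULL);
    if (strstr(old_db, "*/") != NULL || strstr(new_db, "*/") != NULL)
        return -1;
    int n = snprintf(out, out_size, "/* !database-alias(%s,%s) */ %s",
                     old_db, new_db, stmt);
    return (n >= 0 && (size_t)n < out_size) ? n : -1;
}
```

The statement text is passed through untouched; all name resolution is left to the server-side hooks described in the list above.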
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add an option to mysqlbinlog to produce SQL script with fewer roundtrips (37)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add an option to mysqlbinlog to produce SQL script with fewer
roundtrips
CREATION DATE..: Fri, 07 Aug 2009, 17:14
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 37 (http://askmonty.org/worklog/?tid=37)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Sun, 09 Aug 2009, 12:56)=-=-
High-Level Specification modified.
--- /tmp/wklog.37.old.22083 2009-08-09 12:56:36.000000000 +0300
+++ /tmp/wklog.37.new.22083 2009-08-09 12:56:36.000000000 +0300
@@ -11,3 +11,16 @@
if (my_b_tell(&cache) != 0)
my_b_write(&cache,";;",2);
+Note: mysqlbinlog already uses
+
+ DELIMITER /*!*/;
+
+so that it can process "multi-statements" like
+
+ CREATE PROCEDURE ... BEGIN stmt1; stmt2; ... END
+
+what remains to be done is to print the /*!*/; only when we're about to exceed
+$args[combine-statements] bytes. In all other cases, delimit statements with
+regular semicolon.
+
+
-=-=(Psergey - Sun, 09 Aug 2009, 12:30)=-=-
High Level Description modified.
--- /tmp/wklog.37.old.21090 2009-08-09 12:30:26.000000000 +0300
+++ /tmp/wklog.37.new.21090 2009-08-09 12:30:26.000000000 +0300
@@ -1,6 +1,6 @@
SQL scripts generated by mysqlbinlog can be slow to load because they have many
small queries, hence applying the script against a remote server requires a lot
-of roundtrips, and they become a bottleneck.
+of roundtrips, and the network roundtrips become the bottleneck.
This bottleneck can be addressed by having mysqlbinlog combine multiple
statements into one:
@@ -14,7 +14,7 @@
loading such sql script will require fewer roundtrips.
-The behavior can be controlled using a command line option
+The behaviour can be controlled using a command line option
mysqlbinlog --combine-statements=#
-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 37
-=-=(Psergey - Fri, 07 Aug 2009, 17:16)=-=-
High-Level Specification modified.
--- /tmp/wklog.37.old.20454 2009-08-07 17:16:54.000000000 +0300
+++ /tmp/wklog.37.new.20454 2009-08-07 17:16:54.000000000 +0300
@@ -1 +1,13 @@
+Implementation overview:
+
+- At start, print "--delimiter=;;"
+- Modify the start of each print functions as follows
+
+ if (my_b_tell(&cache) - my_start_of_combine_statement) +
+ estimiated_size_of_log_event) > combine_statement_size)
+ my_b_write(&cache,";;",2);
+
+- And we should end mysqlbinlog with;
+ if (my_b_tell(&cache) != 0)
+ my_b_write(&cache,";;",2);
DESCRIPTION:
SQL scripts generated by mysqlbinlog can be slow to load because they have many
small queries, hence applying the script against a remote server requires a lot
of roundtrips, and the network roundtrips become the bottleneck.
This bottleneck can be addressed by having mysqlbinlog combine multiple
statements into one:
+delimiter //
binlog statement1;
binlog statement2;
binlog statement3;
+//
binlog statement4;
Loading such an SQL script will require fewer roundtrips.
The behaviour can be controlled using a command line option
mysqlbinlog --combine-statements=#
where # is the maximum allowed packet length.
HIGH-LEVEL SPECIFICATION:
Implementation overview:
- At start, print "--delimiter=;;"
- Modify the start of each print function as follows:
    if ((my_b_tell(&cache) - my_start_of_combine_statement) +
        estimated_size_of_log_event > combine_statement_size)
      my_b_write(&cache,";;",2);
- And we should end mysqlbinlog with:
    if (my_b_tell(&cache) != 0)
      my_b_write(&cache,";;",2);
Note: mysqlbinlog already uses
DELIMITER /*!*/;
so that it can process "multi-statements" like
CREATE PROCEDURE ... BEGIN stmt1; stmt2; ... END
What remains to be done is to print the /*!*/; only when we're about to exceed
$args[combine-statements] bytes. In all other cases, delimit statements with
a regular semicolon.
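The batching rule described above can be sketched independently of the mysqlbinlog I/O cache. The struct and function below are hypothetical; they only track how many bytes the current batch holds and report when the batch has to be closed with the batch delimiter before the next event is printed. The bytes of the delimiters themselves are ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* State of the statement combiner (illustrative, not mysqlbinlog code). */
struct combiner {
    size_t limit;      /* value of --combine-statements, in bytes */
    size_t batch_len;  /* bytes already placed in the current batch */
};

/* Call before printing an event of event_len bytes.
   Returns 1 if the current batch must first be closed with the batch
   delimiter, 0 if the event can be appended after a plain semicolon.
   An event larger than the limit still forms a batch of its own. */
int must_close_batch(struct combiner *c, size_t event_len)
{
    assert(c != NULL && c->limit > 0);
    int close = c->batch_len > 0 && c->batch_len + event_len > c->limit;
    if (close)
        c->batch_len = 0;
    c->batch_len += event_len;
    return close;
}
```

At end of output a non-empty batch still needs a final delimiter, which corresponds to the last step of the implementation overview.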
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add an option to mysqlbinlog to produce SQL script with fewer roundtrips (37)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add an option to mysqlbinlog to produce SQL script with fewer
roundtrips
CREATION DATE..: Fri, 07 Aug 2009, 17:14
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 37 (http://askmonty.org/worklog/?tid=37)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Sun, 09 Aug 2009, 12:30)=-=-
High Level Description modified.
--- /tmp/wklog.37.old.21090 2009-08-09 12:30:26.000000000 +0300
+++ /tmp/wklog.37.new.21090 2009-08-09 12:30:26.000000000 +0300
@@ -1,6 +1,6 @@
SQL scripts generated by mysqlbinlog can be slow to load because they have many
small queries, hence applying the script against a remote server requires a lot
-of roundtrips, and they become a bottleneck.
+of roundtrips, and the network roundtrips become the bottleneck.
This bottleneck can be addressed by having mysqlbinlog combine multiple
statements into one:
@@ -14,7 +14,7 @@
loading such sql script will require fewer roundtrips.
-The behavior can be controlled using a command line option
+The behaviour can be controlled using a command line option
mysqlbinlog --combine-statements=#
-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 37
-=-=(Psergey - Fri, 07 Aug 2009, 17:16)=-=-
High-Level Specification modified.
--- /tmp/wklog.37.old.20454 2009-08-07 17:16:54.000000000 +0300
+++ /tmp/wklog.37.new.20454 2009-08-07 17:16:54.000000000 +0300
@@ -1 +1,13 @@
+Implementation overview:
+
+- At start, print "--delimiter=;;"
+- Modify the start of each print functions as follows
+
+ if (my_b_tell(&cache) - my_start_of_combine_statement) +
+ estimiated_size_of_log_event) > combine_statement_size)
+ my_b_write(&cache,";;",2);
+
+- And we should end mysqlbinlog with;
+ if (my_b_tell(&cache) != 0)
+ my_b_write(&cache,";;",2);
DESCRIPTION:
SQL scripts generated by mysqlbinlog can be slow to load because they have many
small queries, hence applying the script against a remote server requires a lot
of roundtrips, and the network roundtrips become the bottleneck.
This bottleneck can be addressed by having mysqlbinlog combine multiple
statements into one:
+delimiter //
binlog statement1;
binlog statement2;
binlog statement3;
+//
binlog statement4;
Loading such an SQL script will require fewer roundtrips.
The behaviour can be controlled using a command line option
mysqlbinlog --combine-statements=#
where # is the maximum allowed packet length.
HIGH-LEVEL SPECIFICATION:
Implementation overview:
- At start, print "--delimiter=;;"
- Modify the start of each print function as follows:
    if ((my_b_tell(&cache) - my_start_of_combine_statement) +
        estimated_size_of_log_event > combine_statement_size)
      my_b_write(&cache,";;",2);
- And we should end mysqlbinlog with:
    if (my_b_tell(&cache) != 0)
      my_b_write(&cache,";;",2);
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Replication tasks (39)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Replication tasks
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
A combined task for all replication tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Make mysqlbinlog not to output unneeded statements when using --database (38)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make mysqlbinlog not to output unneeded statements when using
--database
CREATION DATE..: Sat, 08 Aug 2009, 12:40
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 38 (http://askmonty.org/worklog/?tid=38)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sun, 09 Aug 2009, 12:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.38.old.20756 2009-08-09 12:22:52.000000000 +0300
+++ /tmp/wklog.38.new.20756 2009-08-09 12:22:52.000000000 +0300
@@ -1 +1,18 @@
+Monty's suggestions for fix:
+
+A way to fix this for 'most' cases are:
+
+If we do filtering (like mysqlbinlog --database='xxx') then:
+
+- In mysql_bin_log(), do a flush of the Log_event::cache() between
+ each statement.
+- Log on statement.
+- If the statement was ignored (we need a flag for this) and
+ there is something in the cache and the file position didn't change
+ (the cache didn't overflow), then reset the cache.
+
+Bug #23890 mysqlbinlog outputs COMMIT unnecessarily when single
+database is used
+- Could be fixed by having a flag to mark if something was printed
+ to the log since last commit.
-=-=(Guest - Sun, 09 Aug 2009, 12:20)=-=-
High Level Description modified.
--- /tmp/wklog.38.old.20618 2009-08-09 12:20:16.000000000 +0300
+++ /tmp/wklog.38.new.20618 2009-08-09 12:20:16.000000000 +0300
@@ -1,10 +1,24 @@
-This comes from MySQL BUG#23890:
+This comes from MySQL BUG#23890 and BUG#23894: when one runs
- mysqlbinlog --database=bar N-bin.000003
+ mysqlbinlog --database=bar binlog_file
-will output all the COMMIT statements in the binary log even if it didn't print
-any statements between the COMMITs (because all the statements that were there
-were for the other databases).
+the produced SQL may contain useless sequences of commands like:
-The fix is trivial: in mysqlbinlog, check if we've printed anything after we've
-printed the previous commit statement.
+COMMIT;
+COMMIT;
+COMMIT;
+...
+
+or
+
+SET INSERT_ID=val1;
+SET INSERT_ID=val2;
+SET INSERT_ID=val3;
+...
+
+This happens because the statements between COMMIT (or SET) statements had no
+effect on the specified database and so were filtered out. COMMIT and SET
+statements themselves are not associated with any database and were left in.
+
+Presence of redundant COMMIT or SET statements makes binlog SQL script
+unnecessarily big and it will take more client<->server roundtrips to apply it.
-=-=(Guest - Sun, 09 Aug 2009, 12:19)=-=-
Title modified.
--- /tmp/wklog.38.old.20544 2009-08-09 12:19:22.000000000 +0300
+++ /tmp/wklog.38.new.20544 2009-08-09 12:19:22.000000000 +0300
@@ -1 +1 @@
-Make mysqlbinlog not to output unneeded COMMIT statements
+Make mysqlbinlog not to output unneeded statements when using --database
DESCRIPTION:
This comes from MySQL BUG#23890 and BUG#23894: when one runs
mysqlbinlog --database=bar binlog_file
the produced SQL may contain useless sequences of commands like:
COMMIT;
COMMIT;
COMMIT;
...
or
SET INSERT_ID=val1;
SET INSERT_ID=val2;
SET INSERT_ID=val3;
...
This happens because the statements between COMMIT (or SET) statements had no
effect on the specified database and so were filtered out. COMMIT and SET
statements themselves are not associated with any database and were left in.
The presence of redundant COMMIT or SET statements makes the binlog SQL script
unnecessarily big, so applying it takes more client<->server roundtrips.
HIGH-LEVEL SPECIFICATION:
Monty's suggestions for fix:
A way to fix this for 'most' cases is:
If we do filtering (like mysqlbinlog --database='xxx') then:
- In mysql_bin_log(), do a flush of the Log_event::cache() between
each statement.
- Log on statement.
- If the statement was ignored (we need a flag for this) and
there is something in the cache and the file position didn't change
(the cache didn't overflow), then reset the cache.
Bug #23890 mysqlbinlog outputs COMMIT unnecessarily when single
database is used
- Could be fixed by having a flag to mark if something was printed
to the log since last commit.
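The flag suggested for Bug #23890 can be sketched as a small filter. The names are hypothetical and the model is deliberately simplified: every statement other than COMMIT that survives the --database filter counts as real output, so database-agnostic statements such as BEGIN or SET INSERT_ID would need extra handling in a real fix.

```c
#include <assert.h>
#include <string.h>

/* Tracks whether anything was printed since the last COMMIT
   (illustrative state, not a mysqlbinlog structure). */
struct commit_filter {
    int printed_since_commit;
};

/* Decide whether stmt should be printed.
   filtered_out is non-zero when the --database filter dropped the
   statement. A COMMIT is printed only if something was printed since
   the previous COMMIT. Returns 1 to print, 0 to skip. */
int should_print(struct commit_filter *f, const char *stmt, int filtered_out)
{
    assert(f != NULL && stmt != NULL);
    if (filtered_out)
        return 0;
    if (strcmp(stmt, "COMMIT") == 0) {
        int keep = f->printed_since_commit;
        f->printed_since_commit = 0;
        return keep;
    }
    f->printed_since_commit = 1;
    return 1;
}
```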
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Make mysqlbinlog not to output unneeded statements when using --database (38)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make mysqlbinlog not to output unneeded statements when using
--database
CREATION DATE..: Sat, 08 Aug 2009, 12:40
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 38 (http://askmonty.org/worklog/?tid=38)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sun, 09 Aug 2009, 12:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.38.old.20756 2009-08-09 12:22:52.000000000 +0300
+++ /tmp/wklog.38.new.20756 2009-08-09 12:22:52.000000000 +0300
@@ -1 +1,18 @@
+Monty's suggestions for fix:
+
+A way to fix this for 'most' cases are:
+
+If we do filtering (like mysqlbinlog --database='xxx') then:
+
+- In mysql_bin_log(), do a flush of the Log_event::cache() between
+ each statement.
+- Log on statement.
+- If the statement was ignored (we need a flag for this) and
+ there is something in the cache and the file position didn't change
+ (the cache didn't overflow), then reset the cache.
+
+Bug #23890 mysqlbinlog outputs COMMIT unnecessarily when single
+database is used
+- Could be fixed by having a flag to mark if something was printed
+ to the log since last commit.
-=-=(Guest - Sun, 09 Aug 2009, 12:20)=-=-
High Level Description modified.
--- /tmp/wklog.38.old.20618 2009-08-09 12:20:16.000000000 +0300
+++ /tmp/wklog.38.new.20618 2009-08-09 12:20:16.000000000 +0300
@@ -1,10 +1,24 @@
-This comes from MySQL BUG#23890:
+This comes from MySQL BUG#23890 and BUG#23894: when one runs
- mysqlbinlog --database=bar N-bin.000003
+ mysqlbinlog --database=bar binlog_file
-will output all the COMMIT statements in the binary log even if it didn't print
-any statements between the COMMITs (because all the statements that were there
-were for the other databases).
+the produced SQL may contain useless sequences of commands like:
-The fix is trivial: in mysqlbinlog, check if we've printed anything after we've
-printed the previous commit statement.
+COMMIT;
+COMMIT;
+COMMIT;
+...
+
+or
+
+SET INSERT_ID=val1;
+SET INSERT_ID=val2;
+SET INSERT_ID=val3;
+...
+
+This happens because the statements between COMMIT (or SET) statements had no
+effect on the specified database and so were filtered out. COMMIT and SET
+statements themselves are not associated with any database and were left in.
+
+Presence of redundant COMMIT or SET statements makes binlog SQL script
+unnecessarily big and it will take more client<->server roundtrips to apply it.
-=-=(Guest - Sun, 09 Aug 2009, 12:19)=-=-
Title modified.
--- /tmp/wklog.38.old.20544 2009-08-09 12:19:22.000000000 +0300
+++ /tmp/wklog.38.new.20544 2009-08-09 12:19:22.000000000 +0300
@@ -1 +1 @@
-Make mysqlbinlog not to output unneeded COMMIT statements
+Make mysqlbinlog not to output unneeded statements when using --database
DESCRIPTION:
This comes from MySQL BUG#23890 and BUG#23894: when one runs
mysqlbinlog --database=bar binlog_file
the produced SQL may contain useless sequences of commands like:
COMMIT;
COMMIT;
COMMIT;
...
or
SET INSERT_ID=val1;
SET INSERT_ID=val2;
SET INSERT_ID=val3;
...
This happens because the statements between COMMIT (or SET) statements had no
effect on the specified database and so were filtered out. COMMIT and SET
statements themselves are not associated with any database and were left in.
The presence of redundant COMMIT or SET statements makes the binlog SQL script
unnecessarily big, and applying it takes more client<->server roundtrips.
HIGH-LEVEL SPECIFICATION:
Monty's suggestions for fix:
A way to fix this for 'most' cases is:
If we do filtering (like mysqlbinlog --database='xxx') then:
- In mysql_bin_log(), do a flush of the Log_event::cache() between
each statement.
- Log one statement.
- If the statement was ignored (we need a flag for this) and
there is something in the cache and the file position didn't change
(the cache didn't overflow), then reset the cache.
Bug #23890 mysqlbinlog outputs COMMIT unnecessarily when single
database is used
- Could be fixed by having a flag to mark if something was printed
to the log since last commit.
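The cache-reset idea above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not mysqlbinlog code: binlog_printer,
begin_statement(), print_text() and end_statement() are hypothetical names, and
an in-memory buffer stands in for the output file so that the "file position"
can be compared before and after a statement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { CACHE_SIZE = 64, OUT_SIZE = 1024 };

/* Hypothetical stand-in for the print cache plus the output file. */
typedef struct {
  char   cache[CACHE_SIZE]; /* per-statement buffer */
  size_t cache_len;
  char   out[OUT_SIZE];     /* stands in for the output file */
  size_t out_pos;           /* the "file position" */
  size_t stmt_start_pos;    /* out_pos when the current statement began */
} binlog_printer;

static void printer_init(binlog_printer *p) { memset(p, 0, sizeof(*p)); }

/* Append bytes to the output; the sketch simply truncates when out[] is full. */
static void out_append(binlog_printer *p, const char *s, size_t n)
{
  if (n > OUT_SIZE - p->out_pos) n = OUT_SIZE - p->out_pos;
  memcpy(p->out + p->out_pos, s, n);
  p->out_pos += n;
}

/* Move everything buffered in the cache to the output. */
static void flush_cache(binlog_printer *p)
{
  out_append(p, p->cache, p->cache_len);
  p->cache_len = 0;
}

/* Flush between statements and remember where this statement starts. */
static void begin_statement(binlog_printer *p)
{
  flush_cache(p);
  p->stmt_start_pos = p->out_pos;
}

/* Buffer text for the current statement; spill to the output on overflow. */
static void print_text(binlog_printer *p, const char *s)
{
  size_t n = strlen(s);
  if (n > CACHE_SIZE - p->cache_len) flush_cache(p);
  if (n > CACHE_SIZE) { out_append(p, s, n); return; }
  memcpy(p->cache + p->cache_len, s, n);
  p->cache_len += n;
}

/* Returns false when the buffered output was discarded, true when flushed. */
static bool end_statement(binlog_printer *p, bool ignored)
{
  if (ignored && p->cache_len > 0 && p->out_pos == p->stmt_start_pos) {
    p->cache_len = 0;       /* nothing reached the file yet: reset the cache */
    return false;
  }
  flush_cache(p);           /* kept, or already partly written: flush the rest */
  return true;
}
```

If the cache overflowed while printing an ignored statement, part of it is
already in the output and cannot be taken back, which is presumably why the
suggestion only claims to cover 'most' cases.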
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Make mysqlbinlog not to output unneeded statements when using --database (38)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make mysqlbinlog not to output unneeded statements when using
--database
CREATION DATE..: Sat, 08 Aug 2009, 12:40
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 38 (http://askmonty.org/worklog/?tid=38)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sun, 09 Aug 2009, 12:20)=-=-
High Level Description modified.
--- /tmp/wklog.38.old.20618 2009-08-09 12:20:16.000000000 +0300
+++ /tmp/wklog.38.new.20618 2009-08-09 12:20:16.000000000 +0300
@@ -1,10 +1,24 @@
-This comes from MySQL BUG#23890:
+This comes from MySQL BUG#23890 and BUG#23894: when one runs
- mysqlbinlog --database=bar N-bin.000003
+ mysqlbinlog --database=bar binlog_file
-will output all the COMMIT statements in the binary log even if it didn't print
-any statements between the COMMITs (because all the statements that were there
-were for the other databases).
+the produced SQL may contain useless sequences of commands like:
-The fix is trivial: in mysqlbinlog, check if we've printed anything after we've
-printed the previous commit statement.
+COMMIT;
+COMMIT;
+COMMIT;
+...
+
+or
+
+SET INSERT_ID=val1;
+SET INSERT_ID=val2;
+SET INSERT_ID=val3;
+...
+
+This happens because the statements between COMMIT (or SET) statements had no
+effect on the specified database and so were filtered out. COMMIT and SET
+statements themselves are not associated with any database and were left in.
+
+Presence of redundant COMMIT or SET statements makes binlog SQL script
+unnecessarily big and it will take more client<->server roundtrips to apply it.
-=-=(Guest - Sun, 09 Aug 2009, 12:19)=-=-
Title modified.
--- /tmp/wklog.38.old.20544 2009-08-09 12:19:22.000000000 +0300
+++ /tmp/wklog.38.new.20544 2009-08-09 12:19:22.000000000 +0300
@@ -1 +1 @@
-Make mysqlbinlog not to output unneeded COMMIT statements
+Make mysqlbinlog not to output unneeded statements when using --database
DESCRIPTION:
This comes from MySQL BUG#23890 and BUG#23894: when one runs
mysqlbinlog --database=bar binlog_file
the produced SQL may contain useless sequences of commands like:
COMMIT;
COMMIT;
COMMIT;
...
or
SET INSERT_ID=val1;
SET INSERT_ID=val2;
SET INSERT_ID=val3;
...
This happens because the statements between COMMIT (or SET) statements had no
effect on the specified database and so were filtered out. COMMIT and SET
statements themselves are not associated with any database and were left in.
The presence of redundant COMMIT or SET statements makes the binlog SQL script
unnecessarily big, and applying it takes more client<->server roundtrips.
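To make "redundant" concrete, here is a small sketch of a post-filter over the
events that remain after --database filtering. It is an illustration only, not
the fix proposed for mysqlbinlog: the event type and drop_redundant() are
hypothetical, and it assumes a SET INSERT_ID only matters for the next surviving
statement and a COMMIT only matters if something was printed since the previous
COMMIT.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event model: EV_STATEMENT means "survived the database filter". */
typedef enum { EV_SET_INSERT_ID, EV_STATEMENT, EV_COMMIT } event_kind;

typedef struct {
  event_kind  kind;
  const char *sql;
} event;

/* Copy into out[] only the events that still matter; out[] must hold n items.
   Returns the number of events kept. */
static size_t drop_redundant(const event *in, size_t n, event *out)
{
  size_t kept = 0;
  const event *pending_set = NULL; /* latest SET not yet followed by a statement */
  int printed_since_commit = 0;    /* flag: anything printed since last COMMIT? */

  for (size_t i = 0; i < n; i++) {
    switch (in[i].kind) {
    case EV_SET_INSERT_ID:
      pending_set = &in[i];        /* a later SET replaces an unused earlier one */
      break;
    case EV_STATEMENT:
      if (pending_set) {
        out[kept++] = *pending_set;
        pending_set = NULL;
      }
      out[kept++] = in[i];
      printed_since_commit = 1;
      break;
    case EV_COMMIT:
      pending_set = NULL;          /* a SET with no statement after it is useless */
      if (printed_since_commit) {
        out[kept++] = in[i];
        printed_since_commit = 0;
      }
      break;
    }
  }
  return kept;
}
```

With the stream SET, COMMIT, SET, SET, INSERT, COMMIT, COMMIT this keeps only
the last SET, the INSERT and one COMMIT.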
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Make mysqlbinlog not to output unneeded statements when using --database (38)
by worklog-noreply@askmonty.org 09 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make mysqlbinlog not to output unneeded statements when using
--database
CREATION DATE..: Sat, 08 Aug 2009, 12:40
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 38 (http://askmonty.org/worklog/?tid=38)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sun, 09 Aug 2009, 12:19)=-=-
Title modified.
--- /tmp/wklog.38.old.20544 2009-08-09 12:19:22.000000000 +0300
+++ /tmp/wklog.38.new.20544 2009-08-09 12:19:22.000000000 +0300
@@ -1 +1 @@
-Make mysqlbinlog not to output unneeded COMMIT statements
+Make mysqlbinlog not to output unneeded statements when using --database
DESCRIPTION:
This comes from MySQL BUG#23890:
mysqlbinlog --database=bar N-bin.000003
will output all the COMMIT statements in the binary log even if it didn't print
any statements between the COMMITs (because all the statements that were there
were for the other databases).
The fix is trivial: in mysqlbinlog, check if we've printed anything after we've
printed the previous commit statement.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Make mysqlbinlog not to output unneeded COMMIT statements (38)
by worklog-noreply@askmonty.org 08 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make mysqlbinlog not to output unneeded COMMIT statements
CREATION DATE..: Sat, 08 Aug 2009, 12:40
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 38 (http://askmonty.org/worklog/?tid=38)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
This comes from MySQL BUG#23890:
mysqlbinlog --database=bar N-bin.000003
will output all the COMMIT statements in the binary log even if it didn't print
any statements between the COMMITs (because all the statements that were there
were for the other databases).
The fix is trivial: in mysqlbinlog, check if we've printed anything after we've
printed the previous commit statement.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add an option to mysqlbinlog to produce SQL script with fewer roundtrips (37)
by worklog-noreply@askmonty.org 07 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add an option to mysqlbinlog to produce SQL script with fewer
roundtrips
CREATION DATE..: Fri, 07 Aug 2009, 17:14
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 37 (http://askmonty.org/worklog/?tid=37)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Fri, 07 Aug 2009, 17:16)=-=-
High-Level Specification modified.
--- /tmp/wklog.37.old.20454 2009-08-07 17:16:54.000000000 +0300
+++ /tmp/wklog.37.new.20454 2009-08-07 17:16:54.000000000 +0300
@@ -1 +1,13 @@
+Implementation overview:
+
+- At start, print "--delimiter=;;"
+- Modify the start of each print functions as follows
+
+ if (my_b_tell(&cache) - my_start_of_combine_statement) +
+ estimiated_size_of_log_event) > combine_statement_size)
+ my_b_write(&cache,";;",2);
+
+- And we should end mysqlbinlog with;
+ if (my_b_tell(&cache) != 0)
+ my_b_write(&cache,";;",2);
DESCRIPTION:
SQL scripts generated by mysqlbinlog can be slow to load because they consist of
many small queries: applying such a script against a remote server requires a
lot of roundtrips, which become the bottleneck.
This bottleneck can be addressed by having mysqlbinlog combine multiple
statements into one:
+delimiter //
binlog statement1;
binlog statement2;
binlog statement3;
+//
binlog statement4;
Loading such an SQL script will require fewer roundtrips.
The behavior can be controlled using a command line option
mysqlbinlog --combine-statements=#
where # is the maximum allowed packet length.
HIGH-LEVEL SPECIFICATION:
Implementation overview:
- At start, print "--delimiter=;;"
- Modify the start of each print function as follows:
if ((my_b_tell(&cache) - my_start_of_combine_statement) +
estimated_size_of_log_event > combine_statement_size)
my_b_write(&cache,";;",2);
- And we should end mysqlbinlog with:
if (my_b_tell(&cache) != 0)
my_b_write(&cache,";;",2);
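The batching rule in the overview can be tried out in isolation with a sketch
like the one below. It makes several assumptions: combiner, add_statement() and
combiner_finish() are hypothetical names, an in-memory buffer replaces the
IO_CACHE, the actual statement length is used in place of an estimated event
size, and the opening line is written as the mysql client command
"DELIMITER ;;" rather than the literal "--delimiter=;;" from the notes.

```c
#include <stddef.h>
#include <string.h>

enum { OUT_CAP = 4096 };

/* Hypothetical stand-in for the output cache and the combine bookkeeping. */
typedef struct {
  char   out[OUT_CAP];  /* generated script, always NUL-terminated */
  size_t len;
  size_t batch_start;   /* offset in out[] where the current batch began */
  size_t max_batch;     /* plays the role of combine_statement_size */
} combiner;

/* Append a string; the sketch truncates silently if out[] is full. */
static void emit(combiner *c, const char *s)
{
  size_t n = strlen(s);
  if (n > OUT_CAP - 1 - c->len) n = OUT_CAP - 1 - c->len;
  memcpy(c->out + c->len, s, n);
  c->len += n;
  c->out[c->len] = '\0';
}

/* Print the delimiter switch once, then start the first (empty) batch. */
static void combiner_init(combiner *c, size_t max_batch)
{
  c->len = 0;
  c->out[0] = '\0';
  c->max_batch = max_batch;
  emit(c, "DELIMITER ;;\n");
  c->batch_start = c->len;
}

/* Close the current batch first if adding stmt would exceed the limit. */
static void add_statement(combiner *c, const char *stmt)
{
  size_t pending = c->len - c->batch_start;
  if (pending > 0 && pending + strlen(stmt) > c->max_batch) {
    emit(c, ";;\n");
    c->batch_start = c->len;
  }
  emit(c, stmt);
}

/* Terminate the last batch if it contains anything. */
static void combiner_finish(combiner *c)
{
  if (c->len != c->batch_start) {
    emit(c, ";;\n");
    c->batch_start = c->len;
  }
}
```

With a limit of 20 bytes and three 7-byte statements, the first two statements
share one batch and the third starts a new one.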
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Add an option to mysqlbinlog to produce SQL script with fewer roundtrips (37)
by worklog-noreply@askmonty.org 07 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add an option to mysqlbinlog to produce SQL script with fewer
roundtrips
CREATION DATE..: Fri, 07 Aug 2009, 17:14
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 37 (http://askmonty.org/worklog/?tid=37)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
SQL scripts generated by mysqlbinlog can be slow to load because they consist of
many small queries: applying such a script against a remote server requires a
lot of roundtrips, which become the bottleneck.
This bottleneck can be addressed by having mysqlbinlog combine multiple
statements into one:
+delimiter //
binlog statement1;
binlog statement2;
binlog statement3;
+//
binlog statement4;
Loading such an SQL script will require fewer roundtrips.
The behavior can be controlled using a command line option
mysqlbinlog --combine-statements=#
where # is the maximum allowed packet length.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply@askmonty.org 07 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on the binlog producer.
If one is using statement-based replication, one can achieve this by grepping
out "USE dbname" statements from the output of mysqlbinlog(*). With
row-based replication this is no longer possible, as the database name is encoded
within the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow changing
the names of used databases in both RBR and SBR events.
(*) this implies that all statements refer to tables in the current database,
doesn't catch updates made inside stored functions and so forth, but still
works for a practically important subset of cases.
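For the statement-based half only, the textual workaround can be sketched as a
line filter. Note that this variant rewrites the USE line instead of removing
it as described above, and it rests on assumptions: rewrite_use_line() is a
hypothetical helper, and the exact form of the USE lines printed by mysqlbinlog
(quoting, trailing delimiter) should be checked against real output. It does
nothing for row-based events, where the name sits inside the BINLOG '....'
payload.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* If line is "use old_db" (optionally backtick-quoted, any letter case for
   "use"), write the same line with new_db into out and return true.
   Otherwise copy the line unchanged into out and return false. */
static bool rewrite_use_line(const char *line, const char *old_db,
                             const char *new_db, char *out, size_t cap)
{
  const char *p = line;
  size_t old_len = strlen(old_db);
  bool quoted;

  if (tolower((unsigned char)p[0]) != 'u' ||
      tolower((unsigned char)p[1]) != 's' ||
      tolower((unsigned char)p[2]) != 'e' || p[3] != ' ')
    goto copy;
  p += 4;

  quoted = (*p == '`');
  if (quoted) p++;
  if (strncmp(p, old_db, old_len) != 0) goto copy;
  p += old_len;

  if (quoted) {
    if (*p != '`') goto copy;      /* a longer quoted name, not old_db */
    p++;
  } else if (isalnum((unsigned char)*p) || *p == '_' || *p == '$') {
    goto copy;                     /* old_db is only a prefix of the name */
  }

  /* Keep whatever followed the name (for example the statement delimiter). */
  snprintf(out, cap, "use `%s`%s", new_db, p);
  return true;

copy:
  snprintf(out, cap, "%s", line);
  return false;
}
```

The footnote's caveat still applies: statements that name another database
explicitly, or updates made inside stored functions, are not caught by a filter
like this.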
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Add a mysqlbinlog option to change the database (36)
by worklog-noreply@askmonty.org 07 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on the binlog producer.
If one is using statement-based replication, one can achieve this by grepping
out "USE dbname" statements from the output of mysqlbinlog(*). With
row-based replication this is no longer possible, as the database name is encoded
within the BINLOG '....' statement.
This task is about adding an option to mysqlbinlog that would allow changing
the names of used databases in both RBR and SBR events.
(*) this implies that all statements refer to tables in the current database,
doesn't catch updates made inside stored functions and so forth, but still
works for a practically important subset of cases.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Hingo): Test task for using worklog time track features (35)
by worklog-noreply@askmonty.org 07 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Test task for using worklog time track features
CREATION DATE..: Fri, 07 Aug 2009, 09:28
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Other
TASK ID........: 35 (http://askmonty.org/worklog/?tid=35)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 9 (hours remain)
ORIG. ESTIMATE.: 10
PROGRESS NOTES:
-=-=(Hingo - Fri, 07 Aug 2009, 09:30)=-=-
Adding first hour worked
Worked 1 hour and estimate 9 hours remain (original estimate unchanged).
DESCRIPTION:
Test task for testing time tracking features.
Marking as private. What does that mean?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Hingo): Test task for using worklog time track features (35)
by worklog-noreply@askmonty.org 07 Aug '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Test task for using worklog time track features
CREATION DATE..: Fri, 07 Aug 2009, 09:28
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Other
TASK ID........: 35 (http://askmonty.org/worklog/?tid=35)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 10 (hours remain)
ORIG. ESTIMATE.: 10
PROGRESS NOTES:
DESCRIPTION:
Test task for testing time tracking features.
Marking as private. What does that mean?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
Vadim Tkachenko <vadim(a)percona.com> writes:
> Kristian,
>
> Now lp:percona-xtradb/release-6 and lp:percona-xtradb
> should be synchronized and be up-to-date.
I still do not see that this is the case :-(
- Last push to lp:percona-xtradb is from June 26, and does not appear
up-to-date with latest XtraDB6 release.
- lp:percona-xtradb/release-6 does not exist. Maybe a typo for
lp:~percona-dev/percona-xtradb/release-6 ?
- I diffed lp:~percona-dev/percona-xtradb/release-6 against the XtraDB source
tarball on http://www.percona.com/mysql/xtradb/5.1.36-6/source/. They do
not match. The bzr branch is missing two bug fixes, on the other hand the
tarball has #define PERCONA_INNODB_VERSION 5a which seems to be wrong (diff
appended at the end of the mail).
Ok. To proceed, I have now committed the two missing bugfixes and pushed here:
lp:~maria-captains/percona-xtradb/release-6-fixed
I will use this as the basis for my merge. Maybe you can pull that into
lp:~percona-dev/percona-xtradb/release-6, or I will handle the conflict if
necessary in the next merge (should be no problem).
> Please let me know if there something else from our side.
Only that I feel a bit stupid having these difficulties finding out what tree
to merge. Seems to me there is something I do not understand correctly... is
there some mailing list or IRC channel or something I should follow to better
keep track of XtraDB development?
- Kristian.
-----------------------------------------------------------------------
$ diff -u --recursive . ../mysql-5.1.36-xtradb6/storage/innobase
Only in .: .bzr
diff -u --recursive ./dict/dict0dict.c ../mysql-5.1.36-xtradb6/storage/innobase/dict/dict0dict.c
--- ./dict/dict0dict.c 2009-08-03 08:30:02.000000000 +0200
+++ ../mysql-5.1.36-xtradb6/storage/innobase/dict/dict0dict.c 2009-07-22 20:07:56.000000000 +0200
@@ -3049,7 +3049,7 @@
} else if (quote) {
/* Within quotes: do not look for
starting quotes or comments. */
- } else if (*sptr == '"' || *sptr == '`') {
+ } else if (*sptr == '"' || *sptr == '`' || *sptr == '\'') {
/* Starting quote: remember the quote character. */
quote = *sptr;
} else if (*sptr == '#'
diff -u --recursive ./handler/ha_innodb.cc ../mysql-5.1.36-xtradb6/storage/innobase/handler/ha_innodb.cc
--- ./handler/ha_innodb.cc 2009-08-03 08:30:02.000000000 +0200
+++ ../mysql-5.1.36-xtradb6/storage/innobase/handler/ha_innodb.cc 2009-07-22 20:07:56.000000000 +0200
@@ -9319,7 +9319,8 @@
/* Check that row format didn't change */
if ((info->used_fields & HA_CREATE_USED_ROW_FORMAT) &&
- get_row_type() != info->row_type) {
+ get_row_type() != ((info->row_type == ROW_TYPE_DEFAULT)
+ ? ROW_TYPE_COMPACT : info->row_type)) {
return(COMPATIBLE_DATA_NO);
}
Only in ../mysql-5.1.36-xtradb6/storage/innobase/handler: ha_innodb.cc.orig
Only in ../mysql-5.1.36-xtradb6/storage/innobase/handler: innodb_patch_info.h.orig
Only in ../mysql-5.1.36-xtradb6/storage/innobase/handler: i_s.cc.orig
diff -u --recursive ./include/univ.i ../mysql-5.1.36-xtradb6/storage/innobase/include/univ.i
--- ./include/univ.i 2009-08-03 08:30:02.000000000 +0200
+++ ../mysql-5.1.36-xtradb6/storage/innobase/include/univ.i 2009-07-22 20:07:56.000000000 +0200
@@ -35,7 +35,7 @@
#define INNODB_VERSION_MAJOR 1
#define INNODB_VERSION_MINOR 0
#define INNODB_VERSION_BUGFIX 3
-#define PERCONA_INNODB_VERSION 6a
+#define PERCONA_INNODB_VERSION 5a
/* The following is the InnoDB version as shown in
SELECT plugin_version FROM information_schema.plugins;
Only in ./mysql-test/patches: mysqlbinlog_row_big.diff
Only in ./mysql-test/patches: variables-big.diff
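A comparison like the `diff -u --recursive` run above can also be summarised
programmatically. The sketch below uses Python's standard `filecmp.dircmp`;
the function name `tree_differences` and the default of skipping `.bzr` are
assumptions made for illustration. Note that `dircmp` compares files by their
stat signature first, so it only lists which files differ and is not a
substitute for reading the actual diff.

```python
import filecmp
import os


def tree_differences(left, right, ignore=(".bzr",)):
    """Recursively compare two directory trees.

    Returns (only_left, only_right, differing): three sorted lists of
    paths relative to the tree roots.
    """
    only_left, only_right, differing = [], [], []

    def walk(cmp, prefix):
        only_left.extend(os.path.join(prefix, n) for n in cmp.left_only)
        only_right.extend(os.path.join(prefix, n) for n in cmp.right_only)
        differing.extend(os.path.join(prefix, n) for n in cmp.diff_files)
        # Descend into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=list(ignore)), "")
    return sorted(only_left), sorted(only_right), sorted(differing)
```

Called on the bzr checkout and the unpacked tarball's storage/innobase
directory, this would list the `.orig` files and the test patches as
one-sided entries and the three changed source files as differing.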
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2719: Add a new variant of dlclose() Valgrind suppressions to fix a Buildbot issue.
by noreply@launchpad.net 05 Aug '09
------------------------------------------------------------
revno: 2719
committer: knielsen(a)knielsen-hq.org
branch nick: mariadb-5.1
timestamp: Wed 2009-08-05 09:21:37 +0200
message:
Add a new variant of dlclose() Valgrind suppressions to fix a Buildbot issue.
modified:
mysql-test/valgrind.supp
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2719)
by knielsen@knielsen-hq.org 05 Aug '09
#At lp:maria
2719 knielsen(a)knielsen-hq.org 2009-08-05
Add a new variant of dlclose() Valgrind suppressions to fix a Buildbot issue.
modified:
mysql-test/valgrind.supp
=== modified file 'mysql-test/valgrind.supp'
--- a/mysql-test/valgrind.supp 2009-08-04 14:09:08 +0000
+++ b/mysql-test/valgrind.supp 2009-08-05 07:21:37 +0000
@@ -415,20 +415,6 @@
}
{
- dlclose memory loss from plugin variant 4, seen on Ubuntu Jaunty i686
- Memcheck:Leak
- fun:malloc
- fun:_dl_close_worker
- fun:_dl_close
- fun:dlclose_doit
- fun:_dl_catch_error
- fun:_dlerror_run
- fun:dlclose
- fun:_Z15free_plugin_memP12st_plugin_dl
- fun:_Z13plugin_dl_delPK19st_mysql_lex_string
-}
-
-{
dlclose memory loss from plugin variant 4
Memcheck:Leak
fun:malloc
@@ -455,6 +441,35 @@
}
{
+ dlclose memory loss from plugin variant 6, seen on Ubuntu Jaunty i686
+ Memcheck:Leak
+ fun:malloc
+ fun:_dl_scope_free
+ fun:_dl_close_worker
+ fun:_dl_close
+ fun:dlclose_doit
+ fun:_dl_catch_error
+ fun:_dlerror_run
+ fun:dlclose
+ fun:_ZL15free_plugin_memP12st_plugin_dl
+ fun:_ZL13plugin_dl_delPK19st_mysql_lex_string
+}
+
+{
+ dlclose memory loss from plugin variant 7, seen on Ubuntu Jaunty i686
+ Memcheck:Leak
+ fun:malloc
+ fun:_dl_close_worker
+ fun:_dl_close
+ fun:dlclose_doit
+ fun:_dl_catch_error
+ fun:_dlerror_run
+ fun:dlclose
+ fun:_ZL15free_plugin_memP12st_plugin_dl
+ fun:_ZL13plugin_dl_delPK19st_mysql_lex_string
+}
+
+{
dlopen / ptread_cancel_init memory loss on Suse Linux 10.3 32/64 bit ver 1
Memcheck:Leak
fun:*alloc
Just a remark about adding Valgrind suppressions for problems in system
libraries that we cannot fix.
I see in mysql-test/valgrind.supp a number of suppressions like this:
{
dlclose memory loss from plugin variant 4
Memcheck:Leak
fun:malloc
obj:/lib*/ld-*.so
obj:/lib*/ld-*.so
obj:/lib*/ld-*.so
obj:/lib*/libdl-*.so
fun:dlclose
fun:_ZL15free_plugin_memP12st_plugin_dl
fun:_ZL13plugin_dl_delPK19st_mysql_lex_string
}
where these "obj:/lib*/ld-*.so" entries are caused by lack of debugging
information, so Valgrind cannot provide proper stack traces.
Please do not add any more suppressions like this. Instead, install debugging
versions of the system libraries (this can be done without any loss of
efficiency, as the debug libraries are only used when explicitly requested,
e.g. by `mtr --valgrind`).
On Ubuntu/Debian:
sudo apt-get install libc6-dbg
On Suse:
Enable the `debuginfo' repository (if not already enabled)
Install the package `glibc-debuginfo'
Then the needed suppression will look much nicer:
{
dlclose memory loss from plugin variant 2
Memcheck:Leak
fun:malloc
fun:_dl_close_worker
fun:_dl_close
fun:_dl_catch_error
fun:_dlerror_run
fun:dlclose
fun:_ZL15free_plugin_memP12st_plugin_dl
fun:_ZL13plugin_dl_delPK19st_mysql_lex_string
}
This way, we get fewer suppressions to maintain, and those we do need to
maintain are much easier to read.
- Kristian.
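To find existing suppressions of the kind the mail above asks people to avoid,
one could scan the suppression file for entries that contain `obj:` frames.
The following is a minimal sketch under stated assumptions: the function name
`suppressions_with_obj_frames` is hypothetical, and it assumes the usual
layout of one `{ ... }` block per suppression with the name on the first line
and `#` comment lines.

```python
def suppressions_with_obj_frames(supp_text):
    """Return the names of suppressions that contain at least one obj: frame."""
    flagged = []
    block = None  # lines of the suppression being read, None outside braces
    for raw in supp_text.splitlines():
        line = raw.strip()
        if line == "{":
            block = []
        elif line == "}" and block is not None:
            # block[0] is the suppression name; the rest are kind and frames.
            if block and any(entry.startswith("obj:") for entry in block[1:]):
                flagged.append(block[0])
            block = None
        elif block is not None and line and not line.startswith("#"):
            block.append(line)
    return flagged
```

Run over the text of mysql-test/valgrind.supp, this would list the candidates
worth regenerating on a machine with the libc debug symbols installed.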
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2718: Add a new variant of dlclose() Valgrind suppressions to fix a Buildbot issue.
by noreply@launchpad.net 04 Aug '09
------------------------------------------------------------
revno: 2718
committer: knielsen(a)knielsen-hq.org
branch nick: mariadb-5.1
timestamp: Tue 2009-08-04 16:09:08 +0200
message:
Add a new variant of dlclose() Valgrind suppressions to fix a Buildbot issue.
modified:
mysql-test/valgrind.supp
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2718)
by knielsen@knielsen-hq.org 04 Aug '09
#At lp:maria
2718 knielsen(a)knielsen-hq.org 2009-08-04
Add a new variant of dlclose() Valgrind suppressions to fix a Buildbot issue.
modified:
mysql-test/valgrind.supp
=== modified file 'mysql-test/valgrind.supp'
--- a/mysql-test/valgrind.supp 2009-06-05 20:46:23 +0000
+++ b/mysql-test/valgrind.supp 2009-08-04 14:09:08 +0000
@@ -415,6 +415,20 @@
}
{
+ dlclose memory loss from plugin variant 4, seen on Ubuntu Jaunty i686
+ Memcheck:Leak
+ fun:malloc
+ fun:_dl_close_worker
+ fun:_dl_close
+ fun:dlclose_doit
+ fun:_dl_catch_error
+ fun:_dlerror_run
+ fun:dlclose
+ fun:_Z15free_plugin_memP12st_plugin_dl
+ fun:_Z13plugin_dl_delPK19st_mysql_lex_string
+}
+
+{
dlclose memory loss from plugin variant 4
Memcheck:Leak
fun:malloc
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2717: Merge XtraDB 6 with latest MariaDB 5.1
by noreply@launchpad.net 04 Aug '09
Merge authors:
akuzminsky <akuzminsky(a)localhost.localdomain>
akuzminsky <akuzminsky@sm1u02>
Kristian Nielsen (knielsen)
Vadim Tkachenko (vadim-tk)
------------------------------------------------------------
revno: 2717 [merge]
committer: knielsen(a)knielsen-hq.org
branch nick: mariadb-5.1
timestamp: Mon 2009-08-03 22:19:12 +0200
message:
Merge XtraDB 6 with latest MariaDB 5.1
removed:
mysql-test/include/ctype_innodb_like.inc
mysql-test/include/have_innodb.inc
mysql-test/include/innodb_trx_weight.inc
mysql-test/r/innodb-autoinc.result
mysql-test/r/innodb-lock.result
mysql-test/r/innodb-replace.result
mysql-test/r/innodb-semi-consistent.result
mysql-test/r/innodb.result
mysql-test/r/innodb_bug34053.result
mysql-test/r/innodb_bug34300.result
mysql-test/r/innodb_bug35220.result
mysql-test/r/innodb_trx_weight.result
mysql-test/t/innodb-autoinc.test
mysql-test/t/innodb-lock.test
mysql-test/t/innodb-master.opt
mysql-test/t/innodb-replace.test
mysql-test/t/innodb-semi-consistent-master.opt
mysql-test/t/innodb-semi-consistent.test
mysql-test/t/innodb.test
mysql-test/t/innodb_bug34053.test
mysql-test/t/innodb_bug34300.test
mysql-test/t/innodb_bug35220.test
mysql-test/t/innodb_trx_weight.test
added:
BUILD/compile-innodb
BUILD/compile-innodb-debug
mysql-test/include/ctype_innodb_like.inc
mysql-test/include/have_innodb.inc
mysql-test/include/innodb-index.inc
mysql-test/include/innodb_trx_weight.inc
mysql-test/r/innodb-analyze.result
mysql-test/r/innodb-autoinc.result
mysql-test/r/innodb-index.result
mysql-test/r/innodb-index_ucs2.result
mysql-test/r/innodb-lock.result
mysql-test/r/innodb-replace.result
mysql-test/r/innodb-semi-consistent.result
mysql-test/r/innodb-timeout.result
mysql-test/r/innodb-use-sys-malloc.result
mysql-test/r/innodb-zip.result
mysql-test/r/innodb.result
mysql-test/r/innodb_bug34053.result
mysql-test/r/innodb_bug34300.result
mysql-test/r/innodb_bug35220.result
mysql-test/r/innodb_bug36169.result
mysql-test/r/innodb_bug36172.result
mysql-test/r/innodb_bug40360.result
mysql-test/r/innodb_bug41904.result
mysql-test/r/innodb_information_schema.result
mysql-test/r/innodb_trx_weight.result
mysql-test/r/innodb_xtradb_bug317074.result
mysql-test/t/innodb-analyze.test
mysql-test/t/innodb-autoinc.test
mysql-test/t/innodb-index.test
mysql-test/t/innodb-index_ucs2.test
mysql-test/t/innodb-lock.test
mysql-test/t/innodb-master.opt
mysql-test/t/innodb-replace.test
mysql-test/t/innodb-semi-consistent-master.opt
mysql-test/t/innodb-semi-consistent.test
mysql-test/t/innodb-timeout.test
mysql-test/t/innodb-use-sys-malloc-master.opt
mysql-test/t/innodb-use-sys-malloc.test
mysql-test/t/innodb-zip.test
mysql-test/t/innodb.test
mysql-test/t/innodb_bug34053.test
mysql-test/t/innodb_bug34300.test
mysql-test/t/innodb_bug35220.test
mysql-test/t/innodb_bug36169.test
mysql-test/t/innodb_bug36172.test
mysql-test/t/innodb_bug40360.test
mysql-test/t/innodb_bug41904.test
mysql-test/t/innodb_information_schema.test
mysql-test/t/innodb_trx_weight.test
mysql-test/t/innodb_xtradb_bug317074.test
storage/xtradb/
storage/xtradb/CMakeLists.txt
storage/xtradb/COPYING.Google
storage/xtradb/ChangeLog
storage/xtradb/Makefile.am
storage/xtradb/btr/
storage/xtradb/btr/btr0btr.c
storage/xtradb/btr/btr0cur.c
storage/xtradb/btr/btr0pcur.c
storage/xtradb/btr/btr0sea.c
storage/xtradb/buf/
storage/xtradb/buf/buf0buddy.c
storage/xtradb/buf/buf0buf.c
storage/xtradb/buf/buf0flu.c
storage/xtradb/buf/buf0lru.c
storage/xtradb/buf/buf0rea.c
storage/xtradb/data/
storage/xtradb/data/data0data.c
storage/xtradb/data/data0type.c
storage/xtradb/dict/
storage/xtradb/dict/dict0boot.c
storage/xtradb/dict/dict0crea.c
storage/xtradb/dict/dict0dict.c
storage/xtradb/dict/dict0load.c
storage/xtradb/dict/dict0mem.c
storage/xtradb/dyn/
storage/xtradb/dyn/dyn0dyn.c
storage/xtradb/eval/
storage/xtradb/eval/eval0eval.c
storage/xtradb/eval/eval0proc.c
storage/xtradb/fil/
storage/xtradb/fil/fil0fil.c
storage/xtradb/fsp/
storage/xtradb/fsp/fsp0fsp.c
storage/xtradb/fut/
storage/xtradb/fut/fut0fut.c
storage/xtradb/fut/fut0lst.c
storage/xtradb/ha/
storage/xtradb/ha/ha0ha.c
storage/xtradb/ha/ha0storage.c
storage/xtradb/ha/hash0hash.c
storage/xtradb/ha_innodb.def
storage/xtradb/handler/
storage/xtradb/handler/ha_innodb.cc
storage/xtradb/handler/ha_innodb.h
storage/xtradb/handler/handler0alter.cc
storage/xtradb/handler/handler0vars.h
storage/xtradb/handler/i_s.cc
storage/xtradb/handler/i_s.h
storage/xtradb/handler/innodb_patch_info.h
storage/xtradb/handler/mysql_addons.cc
storage/xtradb/handler/win_delay_loader.cc
storage/xtradb/ibuf/
storage/xtradb/ibuf/ibuf0ibuf.c
storage/xtradb/include/
storage/xtradb/include/btr0btr.h
storage/xtradb/include/btr0btr.ic
storage/xtradb/include/btr0cur.h
storage/xtradb/include/btr0cur.ic
storage/xtradb/include/btr0pcur.h
storage/xtradb/include/btr0pcur.ic
storage/xtradb/include/btr0sea.h
storage/xtradb/include/btr0sea.ic
storage/xtradb/include/btr0types.h
storage/xtradb/include/buf0buddy.h
storage/xtradb/include/buf0buddy.ic
storage/xtradb/include/buf0buf.h
storage/xtradb/include/buf0buf.ic
storage/xtradb/include/buf0flu.h
storage/xtradb/include/buf0flu.ic
storage/xtradb/include/buf0lru.h
storage/xtradb/include/buf0lru.ic
storage/xtradb/include/buf0rea.h
storage/xtradb/include/buf0types.h
storage/xtradb/include/data0data.h
storage/xtradb/include/data0data.ic
storage/xtradb/include/data0type.h
storage/xtradb/include/data0type.ic
storage/xtradb/include/data0types.h
storage/xtradb/include/db0err.h
storage/xtradb/include/dict0boot.h
storage/xtradb/include/dict0boot.ic
storage/xtradb/include/dict0crea.h
storage/xtradb/include/dict0crea.ic
storage/xtradb/include/dict0dict.h
storage/xtradb/include/dict0dict.ic
storage/xtradb/include/dict0load.h
storage/xtradb/include/dict0load.ic
storage/xtradb/include/dict0mem.h
storage/xtradb/include/dict0mem.ic
storage/xtradb/include/dict0types.h
storage/xtradb/include/dyn0dyn.h
storage/xtradb/include/dyn0dyn.ic
storage/xtradb/include/eval0eval.h
storage/xtradb/include/eval0eval.ic
storage/xtradb/include/eval0proc.h
storage/xtradb/include/eval0proc.ic
storage/xtradb/include/fil0fil.h
storage/xtradb/include/fsp0fsp.h
storage/xtradb/include/fsp0fsp.ic
storage/xtradb/include/fut0fut.h
storage/xtradb/include/fut0fut.ic
storage/xtradb/include/fut0lst.h
storage/xtradb/include/fut0lst.ic
storage/xtradb/include/ha0ha.h
storage/xtradb/include/ha0ha.ic
storage/xtradb/include/ha0storage.h
storage/xtradb/include/ha0storage.ic
storage/xtradb/include/ha_prototypes.h
storage/xtradb/include/handler0alter.h
storage/xtradb/include/hash0hash.h
storage/xtradb/include/hash0hash.ic
storage/xtradb/include/ibuf0ibuf.h
storage/xtradb/include/ibuf0ibuf.ic
storage/xtradb/include/ibuf0types.h
storage/xtradb/include/lock0iter.h
storage/xtradb/include/lock0lock.h
storage/xtradb/include/lock0lock.ic
storage/xtradb/include/lock0priv.h
storage/xtradb/include/lock0priv.ic
storage/xtradb/include/lock0types.h
storage/xtradb/include/log0log.h
storage/xtradb/include/log0log.ic
storage/xtradb/include/log0recv.h
storage/xtradb/include/log0recv.ic
storage/xtradb/include/mach0data.h
storage/xtradb/include/mach0data.ic
storage/xtradb/include/mem0dbg.h
storage/xtradb/include/mem0dbg.ic
storage/xtradb/include/mem0mem.h
storage/xtradb/include/mem0mem.ic
storage/xtradb/include/mem0pool.h
storage/xtradb/include/mem0pool.ic
storage/xtradb/include/mtr0log.h
storage/xtradb/include/mtr0log.ic
storage/xtradb/include/mtr0mtr.h
storage/xtradb/include/mtr0mtr.ic
storage/xtradb/include/mtr0types.h
storage/xtradb/include/mysql_addons.h
storage/xtradb/include/os0file.h
storage/xtradb/include/os0proc.h
storage/xtradb/include/os0proc.ic
storage/xtradb/include/os0sync.h
storage/xtradb/include/os0sync.ic
storage/xtradb/include/os0thread.h
storage/xtradb/include/os0thread.ic
storage/xtradb/include/page0cur.h
storage/xtradb/include/page0cur.ic
storage/xtradb/include/page0page.h
storage/xtradb/include/page0page.ic
storage/xtradb/include/page0types.h
storage/xtradb/include/page0zip.h
storage/xtradb/include/page0zip.ic
storage/xtradb/include/pars0grm.h
storage/xtradb/include/pars0opt.h
storage/xtradb/include/pars0opt.ic
storage/xtradb/include/pars0pars.h
storage/xtradb/include/pars0pars.ic
storage/xtradb/include/pars0sym.h
storage/xtradb/include/pars0sym.ic
storage/xtradb/include/pars0types.h
storage/xtradb/include/que0que.h
storage/xtradb/include/que0que.ic
storage/xtradb/include/que0types.h
storage/xtradb/include/read0read.h
storage/xtradb/include/read0read.ic
storage/xtradb/include/read0types.h
storage/xtradb/include/rem0cmp.h
storage/xtradb/include/rem0cmp.ic
storage/xtradb/include/rem0rec.h
storage/xtradb/include/rem0rec.ic
storage/xtradb/include/rem0types.h
storage/xtradb/include/row0ext.h
storage/xtradb/include/row0ext.ic
storage/xtradb/include/row0ins.h
storage/xtradb/include/row0ins.ic
storage/xtradb/include/row0merge.h
storage/xtradb/include/row0mysql.h
storage/xtradb/include/row0mysql.ic
storage/xtradb/include/row0purge.h
storage/xtradb/include/row0purge.ic
storage/xtradb/include/row0row.h
storage/xtradb/include/row0row.ic
storage/xtradb/include/row0sel.h
storage/xtradb/include/row0sel.ic
storage/xtradb/include/row0types.h
storage/xtradb/include/row0uins.h
storage/xtradb/include/row0uins.ic
storage/xtradb/include/row0umod.h
storage/xtradb/include/row0umod.ic
storage/xtradb/include/row0undo.h
storage/xtradb/include/row0undo.ic
storage/xtradb/include/row0upd.h
storage/xtradb/include/row0upd.ic
storage/xtradb/include/row0vers.h
storage/xtradb/include/row0vers.ic
storage/xtradb/include/srv0que.h
storage/xtradb/include/srv0srv.h
storage/xtradb/include/srv0srv.ic
storage/xtradb/include/srv0start.h
storage/xtradb/include/sync0arr.h
storage/xtradb/include/sync0arr.ic
storage/xtradb/include/sync0rw.h
storage/xtradb/include/sync0rw.ic
storage/xtradb/include/sync0sync.h
storage/xtradb/include/sync0sync.ic
storage/xtradb/include/sync0types.h
storage/xtradb/include/thr0loc.h
storage/xtradb/include/thr0loc.ic
storage/xtradb/include/trx0i_s.h
storage/xtradb/include/trx0purge.h
storage/xtradb/include/trx0purge.ic
storage/xtradb/include/trx0rec.h
storage/xtradb/include/trx0rec.ic
storage/xtradb/include/trx0roll.h
storage/xtradb/include/trx0roll.ic
storage/xtradb/include/trx0rseg.h
storage/xtradb/include/trx0rseg.ic
storage/xtradb/include/trx0sys.h
storage/xtradb/include/trx0sys.ic
storage/xtradb/include/trx0trx.h
storage/xtradb/include/trx0trx.ic
storage/xtradb/include/trx0types.h
storage/xtradb/include/trx0undo.h
storage/xtradb/include/trx0undo.ic
storage/xtradb/include/trx0xa.h
storage/xtradb/include/univ.i
storage/xtradb/include/usr0sess.h
storage/xtradb/include/usr0sess.ic
storage/xtradb/include/usr0types.h
storage/xtradb/include/ut0auxconf.h
storage/xtradb/include/ut0byte.h
storage/xtradb/include/ut0byte.ic
storage/xtradb/include/ut0dbg.h
storage/xtradb/include/ut0list.h
storage/xtradb/include/ut0list.ic
storage/xtradb/include/ut0lst.h
storage/xtradb/include/ut0mem.h
storage/xtradb/include/ut0mem.ic
storage/xtradb/include/ut0rnd.h
storage/xtradb/include/ut0rnd.ic
storage/xtradb/include/ut0sort.h
storage/xtradb/include/ut0ut.h
storage/xtradb/include/ut0ut.ic
storage/xtradb/include/ut0vec.h
storage/xtradb/include/ut0vec.ic
storage/xtradb/include/ut0wqueue.h
storage/xtradb/lock/
storage/xtradb/lock/lock0iter.c
storage/xtradb/lock/lock0lock.c
storage/xtradb/log/
storage/xtradb/log/log0log.c
storage/xtradb/log/log0recv.c
storage/xtradb/mach/
storage/xtradb/mach/mach0data.c
storage/xtradb/mem/
storage/xtradb/mem/mem0dbg.c
storage/xtradb/mem/mem0mem.c
storage/xtradb/mem/mem0pool.c
storage/xtradb/mtr/
storage/xtradb/mtr/mtr0log.c
storage/xtradb/mtr/mtr0mtr.c
storage/xtradb/os/
storage/xtradb/os/os0file.c
storage/xtradb/os/os0proc.c
storage/xtradb/os/os0sync.c
storage/xtradb/os/os0thread.c
storage/xtradb/page/
storage/xtradb/page/page0cur.c
storage/xtradb/page/page0page.c
storage/xtradb/page/page0zip.c
storage/xtradb/pars/
storage/xtradb/pars/lexyy.c
storage/xtradb/pars/make_bison.sh
storage/xtradb/pars/make_flex.sh
storage/xtradb/pars/pars0grm.c
storage/xtradb/pars/pars0grm.y
storage/xtradb/pars/pars0lex.l
storage/xtradb/pars/pars0opt.c
storage/xtradb/pars/pars0pars.c
storage/xtradb/pars/pars0sym.c
storage/xtradb/plug.in
storage/xtradb/que/
storage/xtradb/que/que0que.c
storage/xtradb/read/
storage/xtradb/read/read0read.c
storage/xtradb/rem/
storage/xtradb/rem/rem0cmp.c
storage/xtradb/rem/rem0rec.c
storage/xtradb/row/
storage/xtradb/row/row0ext.c
storage/xtradb/row/row0ins.c
storage/xtradb/row/row0merge.c
storage/xtradb/row/row0mysql.c
storage/xtradb/row/row0purge.c
storage/xtradb/row/row0row.c
storage/xtradb/row/row0sel.c
storage/xtradb/row/row0uins.c
storage/xtradb/row/row0umod.c
storage/xtradb/row/row0undo.c
storage/xtradb/row/row0upd.c
storage/xtradb/row/row0vers.c
storage/xtradb/scripts/
storage/xtradb/scripts/install_innodb_plugins.sql
storage/xtradb/scripts/install_innodb_plugins_win.sql
storage/xtradb/srv/
storage/xtradb/srv/srv0que.c
storage/xtradb/srv/srv0srv.c
storage/xtradb/srv/srv0start.c
storage/xtradb/sync/
storage/xtradb/sync/sync0arr.c
storage/xtradb/sync/sync0rw.c
storage/xtradb/sync/sync0sync.c
storage/xtradb/thr/
storage/xtradb/thr/thr0loc.c
storage/xtradb/trx/
storage/xtradb/trx/trx0i_s.c
storage/xtradb/trx/trx0purge.c
storage/xtradb/trx/trx0rec.c
storage/xtradb/trx/trx0roll.c
storage/xtradb/trx/trx0rseg.c
storage/xtradb/trx/trx0sys.c
storage/xtradb/trx/trx0trx.c
storage/xtradb/trx/trx0undo.c
storage/xtradb/usr/
storage/xtradb/usr/usr0sess.c
storage/xtradb/ut/
storage/xtradb/ut/ut0auxconf.c
storage/xtradb/ut/ut0byte.c
storage/xtradb/ut/ut0dbg.c
storage/xtradb/ut/ut0list.c
storage/xtradb/ut/ut0mem.c
storage/xtradb/ut/ut0rnd.c
storage/xtradb/ut/ut0ut.c
storage/xtradb/ut/ut0vec.c
storage/xtradb/ut/ut0wqueue.c
storage/xtradb/win-plugin/
storage/xtradb/win-plugin/README
storage/xtradb/win-plugin/win-plugin.diff
strings/strmov_overlapp.c
renamed:
storage/innobase/plug.in => storage/innobase/plug.in.disabled
modified:
.bzrignore
CMakeLists.txt
configure.in
include/atomic/nolock.h
include/m_string.h
include/my_sys.h
libmysql/Makefile.shared
libmysqld/CMakeLists.txt
mysql-test/include/mtr_check.sql
mysql-test/include/varchar.inc
mysql-test/lib/mtr_cases.pm
mysql-test/mysql-test-run.pl
mysql-test/r/events_stress.result
mysql-test/r/index_merge_innodb.result
mysql-test/r/information_schema.result
mysql-test/r/information_schema_all_engines.result
mysql-test/r/mysqlbinlog_row_big.result
mysql-test/r/row-checksum-old.result
mysql-test/r/row-checksum.result
mysql-test/r/variables-big.result
mysql-test/t/events_stress.test
mysql-test/t/information_schema.test
mysql-test/t/mysqlbinlog_row_big.test
mysql-test/t/partition_innodb.test
mysql-test/t/type_bit_innodb.test
mysql-test/t/variables-big.test
mysys/mf_iocache2.c
mysys/thr_mutex.c
sql-common/client.c
sql/log_event.cc
sql/log_event.h
sql/rpl_mi.cc
sql/rpl_rli.cc
sql/slave.cc
sql/sql_table.cc
strings/Makefile.am
The size of the diff (225341 lines) is larger than your specified limit of 1000 lines
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Hi Sergey,
I was looking at the many failures we have in the jaunty-x86-valgrind host in
Buildbot ...
I was wondering if you could install the `libc6-dbg' package?
sudo apt-get install libc6-dbg
Assuming it works as in Ubuntu Hardy, this should give proper symbols in the
Valgrind stack traces, so that the bogus warnings from libc internals are
suppressed (or we can add meaningful suppressions for any remaining problems).
- Kristian.
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2717)
by knielsen@knielsen-hq.org 03 Aug '09
#At lp:maria
2717 knielsen(a)knielsen-hq.org 2009-08-03 [merge]
Merge XtraDB 6 into MariaDB.
modified:
mysql-test/r/events_stress.result
mysql-test/r/information_schema.result
mysql-test/r/information_schema_all_engines.result
mysql-test/r/innodb_bug36169.result
mysql-test/r/innodb_xtradb_bug317074.result
mysql-test/t/events_stress.test
mysql-test/t/innodb-analyze.test
mysql-test/t/innodb_bug36169.test
mysql-test/t/innodb_bug36172.test
mysql-test/t/innodb_xtradb_bug317074.test
storage/xtradb/btr/btr0cur.c
storage/xtradb/btr/btr0sea.c
storage/xtradb/buf/buf0buddy.c
storage/xtradb/buf/buf0buf.c
storage/xtradb/buf/buf0flu.c
storage/xtradb/buf/buf0lru.c
storage/xtradb/buf/buf0rea.c
storage/xtradb/dict/dict0boot.c
storage/xtradb/dict/dict0crea.c
storage/xtradb/dict/dict0dict.c
storage/xtradb/dict/dict0load.c
storage/xtradb/fil/fil0fil.c
storage/xtradb/handler/ha_innodb.cc
storage/xtradb/handler/i_s.cc
storage/xtradb/handler/i_s.h
storage/xtradb/handler/innodb_patch_info.h
storage/xtradb/ibuf/ibuf0ibuf.c
storage/xtradb/include/buf0buddy.h
storage/xtradb/include/buf0buddy.ic
storage/xtradb/include/buf0buf.h
storage/xtradb/include/buf0buf.ic
storage/xtradb/include/buf0flu.ic
storage/xtradb/include/buf0lru.h
storage/xtradb/include/dict0dict.h
storage/xtradb/include/dict0dict.ic
storage/xtradb/include/log0log.h
storage/xtradb/include/rem0cmp.h
storage/xtradb/include/rem0cmp.ic
storage/xtradb/include/srv0srv.h
storage/xtradb/include/sync0sync.h
storage/xtradb/include/univ.i
storage/xtradb/include/ut0auxconf.h
storage/xtradb/log/log0log.c
storage/xtradb/log/log0recv.c
storage/xtradb/mtr/mtr0mtr.c
storage/xtradb/os/os0file.c
storage/xtradb/rem/rem0cmp.c
storage/xtradb/row/row0mysql.c
storage/xtradb/scripts/install_innodb_plugins.sql
storage/xtradb/srv/srv0srv.c
storage/xtradb/srv/srv0start.c
storage/xtradb/sync/sync0sync.c
storage/xtradb/ut/ut0ut.c
=== modified file 'mysql-test/r/events_stress.result'
--- a/mysql-test/r/events_stress.result 2006-09-01 11:08:44 +0000
+++ b/mysql-test/r/events_stress.result 2009-08-03 20:09:53 +0000
@@ -32,6 +32,7 @@ USE events_conn1_test2;
SELECT COUNT(*) FROM INFORMATION_SCHEMA.EVENTS WHERE EVENT_SCHEMA='events_conn1_test2';
COUNT(*)
50
+SET @old_event_scheduler=@@event_scheduler;
SET GLOBAL event_scheduler=on;
DROP DATABASE events_conn1_test2;
SET GLOBAL event_scheduler=off;
@@ -63,3 +64,4 @@ DROP TABLE fill_it1;
DROP TABLE fill_it2;
DROP TABLE fill_it3;
DROP DATABASE events_test;
+SET GLOBAL event_scheduler=@old_event_scheduler;
=== modified file 'mysql-test/r/information_schema.result'
--- a/mysql-test/r/information_schema.result 2009-06-11 17:49:51 +0000
+++ b/mysql-test/r/information_schema.result 2009-08-03 20:09:53 +0000
@@ -61,9 +61,11 @@ INNODB_CMP
INNODB_CMPMEM
INNODB_CMPMEM_RESET
INNODB_CMP_RESET
+INNODB_INDEX_STATS
INNODB_LOCKS
INNODB_LOCK_WAITS
INNODB_RSEG
+INNODB_TABLE_STATS
INNODB_TRX
KEY_COLUMN_USAGE
PARTITIONS
@@ -863,6 +865,8 @@ TABLE_CONSTRAINTS TABLE_NAME select
TABLE_PRIVILEGES TABLE_NAME select
VIEWS TABLE_NAME select
INNODB_BUFFER_POOL_PAGES_INDEX table_name select
+INNODB_INDEX_STATS table_name select
+INNODB_TABLE_STATS table_name select
delete from mysql.user where user='mysqltest_4';
delete from mysql.db where user='mysqltest_4';
flush privileges;
=== modified file 'mysql-test/r/information_schema_all_engines.result'
--- a/mysql-test/r/information_schema_all_engines.result 2009-06-11 12:53:26 +0000
+++ b/mysql-test/r/information_schema_all_engines.result 2009-08-03 20:09:53 +0000
@@ -35,13 +35,15 @@ INNODB_CMP
INNODB_RSEG
XTRADB_ENHANCEMENTS
INNODB_BUFFER_POOL_PAGES_INDEX
-INNODB_BUFFER_POOL_PAGES_BLOB
+INNODB_INDEX_STATS
INNODB_TRX
INNODB_CMP_RESET
INNODB_LOCK_WAITS
INNODB_CMPMEM_RESET
INNODB_LOCKS
INNODB_CMPMEM
+INNODB_TABLE_STATS
+INNODB_BUFFER_POOL_PAGES_BLOB
SELECT t.table_name, c1.column_name
FROM information_schema.tables t
INNER JOIN
@@ -91,13 +93,15 @@ INNODB_CMP page_size
INNODB_RSEG rseg_id
XTRADB_ENHANCEMENTS name
INNODB_BUFFER_POOL_PAGES_INDEX schema_name
-INNODB_BUFFER_POOL_PAGES_BLOB space_id
+INNODB_INDEX_STATS table_name
INNODB_TRX trx_id
INNODB_CMP_RESET page_size
INNODB_LOCK_WAITS requesting_trx_id
INNODB_CMPMEM_RESET page_size
INNODB_LOCKS lock_id
INNODB_CMPMEM page_size
+INNODB_TABLE_STATS table_name
+INNODB_BUFFER_POOL_PAGES_BLOB space_id
SELECT t.table_name, c1.column_name
FROM information_schema.tables t
INNER JOIN
@@ -147,13 +151,15 @@ INNODB_CMP page_size
INNODB_RSEG rseg_id
XTRADB_ENHANCEMENTS name
INNODB_BUFFER_POOL_PAGES_INDEX schema_name
-INNODB_BUFFER_POOL_PAGES_BLOB space_id
+INNODB_INDEX_STATS table_name
INNODB_TRX trx_id
INNODB_CMP_RESET page_size
INNODB_LOCK_WAITS requesting_trx_id
INNODB_CMPMEM_RESET page_size
INNODB_LOCKS lock_id
INNODB_CMPMEM page_size
+INNODB_TABLE_STATS table_name
+INNODB_BUFFER_POOL_PAGES_BLOB space_id
select 1 as f1 from information_schema.tables where "CHARACTER_SETS"=
(select cast(table_name as char) from information_schema.tables
order by table_name limit 1) limit 1;
@@ -192,9 +198,11 @@ INNODB_CMP information_schema.INNODB_CMP
INNODB_CMPMEM information_schema.INNODB_CMPMEM 1
INNODB_CMPMEM_RESET information_schema.INNODB_CMPMEM_RESET 1
INNODB_CMP_RESET information_schema.INNODB_CMP_RESET 1
+INNODB_INDEX_STATS information_schema.INNODB_INDEX_STATS 1
INNODB_LOCKS information_schema.INNODB_LOCKS 1
INNODB_LOCK_WAITS information_schema.INNODB_LOCK_WAITS 1
INNODB_RSEG information_schema.INNODB_RSEG 1
+INNODB_TABLE_STATS information_schema.INNODB_TABLE_STATS 1
INNODB_TRX information_schema.INNODB_TRX 1
KEY_COLUMN_USAGE information_schema.KEY_COLUMN_USAGE 1
PARTITIONS information_schema.PARTITIONS 1
@@ -254,13 +262,15 @@ Database: information_schema
| INNODB_RSEG |
| XTRADB_ENHANCEMENTS |
| INNODB_BUFFER_POOL_PAGES_INDEX |
-| INNODB_BUFFER_POOL_PAGES_BLOB |
+| INNODB_INDEX_STATS |
| INNODB_TRX |
| INNODB_CMP_RESET |
| INNODB_LOCK_WAITS |
| INNODB_CMPMEM_RESET |
| INNODB_LOCKS |
| INNODB_CMPMEM |
+| INNODB_TABLE_STATS |
+| INNODB_BUFFER_POOL_PAGES_BLOB |
+---------------------------------------+
Database: INFORMATION_SCHEMA
+---------------------------------------+
@@ -300,13 +310,15 @@ Database: INFORMATION_SCHEMA
| INNODB_RSEG |
| XTRADB_ENHANCEMENTS |
| INNODB_BUFFER_POOL_PAGES_INDEX |
-| INNODB_BUFFER_POOL_PAGES_BLOB |
+| INNODB_INDEX_STATS |
| INNODB_TRX |
| INNODB_CMP_RESET |
| INNODB_LOCK_WAITS |
| INNODB_CMPMEM_RESET |
| INNODB_LOCKS |
| INNODB_CMPMEM |
+| INNODB_TABLE_STATS |
+| INNODB_BUFFER_POOL_PAGES_BLOB |
+---------------------------------------+
Wildcard: inf_rmation_schema
+--------------------+
@@ -316,5 +328,5 @@ Wildcard: inf_rmation_schema
+--------------------+
SELECT table_schema, count(*) FROM information_schema.TABLES WHERE table_schema IN ('mysql', 'INFORMATION_SCHEMA', 'test', 'mysqltest') AND table_name<>'ndb_binlog_index' AND table_name<>'ndb_apply_status' GROUP BY TABLE_SCHEMA;
table_schema count(*)
-information_schema 41
+information_schema 43
mysql 22
=== modified file 'mysql-test/r/innodb_bug36169.result'
--- a/mysql-test/r/innodb_bug36169.result 2009-06-11 12:53:26 +0000
+++ b/mysql-test/r/innodb_bug36169.result 2009-08-03 20:09:53 +0000
@@ -1,5 +1,5 @@
-SET @save_innodb_file_format=@@global.innodb_file_format;
-SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
-SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
+set @old_innodb_file_per_table=@@innodb_file_per_table;
+set @old_innodb_file_format=@@innodb_file_format;
+set @old_innodb_file_format_check=@@innodb_file_format_check;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
=== modified file 'mysql-test/r/innodb_xtradb_bug317074.result'
--- a/mysql-test/r/innodb_xtradb_bug317074.result 2009-06-11 12:53:26 +0000
+++ b/mysql-test/r/innodb_xtradb_bug317074.result 2009-08-03 20:09:53 +0000
@@ -1,5 +1,5 @@
-SET @save_innodb_file_format=@@global.innodb_file_format;
-SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
-SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
+SET @old_innodb_file_format=@@innodb_file_format;
+SET @old_innodb_file_per_table=@@innodb_file_per_table;
+SET @old_innodb_file_format_check=@@innodb_file_format_check;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
=== modified file 'mysql-test/t/events_stress.test'
--- a/mysql-test/t/events_stress.test 2007-05-26 14:36:38 +0000
+++ b/mysql-test/t/events_stress.test 2009-08-03 20:09:53 +0000
@@ -61,6 +61,7 @@ while ($1)
}
--enable_query_log
SELECT COUNT(*) FROM INFORMATION_SCHEMA.EVENTS WHERE EVENT_SCHEMA='events_conn1_test2';
+SET @old_event_scheduler=@@event_scheduler;
SET GLOBAL event_scheduler=on;
--sleep 2.5
DROP DATABASE events_conn1_test2;
@@ -135,3 +136,6 @@ DROP USER event_user3@localhost;
#
DROP DATABASE events_test;
+
+# Cleanup
+SET GLOBAL event_scheduler=@old_event_scheduler;
=== modified file 'mysql-test/t/innodb-analyze.test'
--- a/mysql-test/t/innodb-analyze.test 2009-06-09 15:08:46 +0000
+++ b/mysql-test/t/innodb-analyze.test 2009-08-03 20:09:53 +0000
@@ -11,7 +11,7 @@
-- disable_result_log
-- enable_warnings
-SET @save_innodb_stats_sample_pages=@@innodb_stats_sample_pages;
+SET @old_innodb_stats_sample_pages=@@innodb_stats_sample_pages;
SET GLOBAL innodb_stats_sample_pages=0;
# check that the value has been adjusted to 1
@@ -61,5 +61,5 @@ ANALYZE TABLE innodb_analyze;
SET GLOBAL innodb_stats_sample_pages=16;
ANALYZE TABLE innodb_analyze;
-SET GLOBAL innodb_stats_sample_pages=@save_innodb_stats_sample_pages;
DROP TABLE innodb_analyze;
+SET GLOBAL innodb_stats_sample_pages=@old_innodb_stats_sample_pages;
=== modified file 'mysql-test/t/innodb_bug36169.test'
--- a/mysql-test/t/innodb_bug36169.test 2009-06-11 12:53:26 +0000
+++ b/mysql-test/t/innodb_bug36169.test 2009-08-03 20:09:53 +0000
@@ -4,10 +4,10 @@
#
-- source include/have_innodb.inc
+set @old_innodb_file_per_table=@@innodb_file_per_table;
+set @old_innodb_file_format=@@innodb_file_format;
+set @old_innodb_file_format_check=@@innodb_file_format_check;
-SET @save_innodb_file_format=@@global.innodb_file_format;
-SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
-SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
@@ -1148,10 +1148,6 @@ KEY `idx44` (`col176`(100),`col42`,`col7
KEY `idx45` (`col2`(27),`col27`(116))
)engine=innodb ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
-SET GLOBAL innodb_file_format=@save_innodb_file_format;
-SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
-SET GLOBAL innodb_file_per_table=@save_innodb_file_per_table;
-
DROP TABLE IF EXISTS table0;
DROP TABLE IF EXISTS table1;
DROP TABLE IF EXISTS table2;
@@ -1160,3 +1156,7 @@ DROP TABLE IF EXISTS table4;
DROP TABLE IF EXISTS table5;
DROP TABLE IF EXISTS table6;
+set global innodb_file_per_table=@old_innodb_file_per_table;
+set global innodb_file_format=@old_innodb_file_format;
+set global innodb_file_format_check=@old_innodb_file_format_check;
+
=== modified file 'mysql-test/t/innodb_bug36172.test'
--- a/mysql-test/t/innodb_bug36172.test 2009-06-11 12:53:26 +0000
+++ b/mysql-test/t/innodb_bug36172.test 2009-08-03 20:09:53 +0000
@@ -13,10 +13,10 @@ SET storage_engine=InnoDB;
-- disable_query_log
-- disable_result_log
+set @old_innodb_file_per_table=@@innodb_file_per_table;
+set @old_innodb_file_format=@@innodb_file_format;
+set @old_innodb_file_format_check=@@innodb_file_format_check;
-SET @save_innodb_file_format=@@global.innodb_file_format;
-SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
-SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=on;
@@ -27,7 +27,8 @@ CHECK TABLE table0 EXTENDED;
INSERT IGNORE INTO `table0` SET `col19` = '19940127002709', `col20` = 2383927.9055146948, `col21` = 4293243420.5621204000, `col22` = '20511211123705', `col23` = 4289899778.6573381000, `col24` = 4293449279.0540481000, `col25` = 'emphysemic', `col26` = 'dentally', `col27` = '2347406', `col28` = 'eruct', `col30` = 1222, `col31` = 4294372994.9941406000, `col32` = 4291385574.1173744000, `col33` = 'borrowing\'s', `col34` = 'septics', `col35` = 'ratter\'s', `col36` = 'Kaye', `col37` = 'Florentia', `col38` = 'allium', `col39` = 'barkeep', `col40` = '19510407003441', `col41` = 4293559200.4215522000, `col42` = 22482, `col43` = 'decussate', `col44` = 'Brom\'s', `col45` = 'violated', `col46` = 4925506.4635456400, `col47` = 930549, `col48` = '51296066', `col49` = 'voluminously', `col50` = '29306676', `col51` = -88, `col52` = -2153690, `col53` = 4290250202.1464887000, `col54` = 'expropriation', `col55` = 'Aberdeen\'s', `col56` = 20343, `col58` = '19640415171532', `col59` = 'extern', `col60` = 'Ubana', `col61` = 4290487961.8539081000, `col62` = '2147', `col63` = -24271, `col64` = '20750801194548', `col65` = 'Cunaxa\'s', `col66` = 'pasticcio', `col67` = 2795817, `col68` = 'Indore\'s', `col70` = 6864127, `col71` = '1817832', `col72` = '20540506114211', `col73` = '20040101012300', `col74` = 'rationalized', `col75` = '45522', `col76` = 'indene', `col77` = -6964559, `col78` = 4247535.5266884370, `col79` = '20720416124357', `col80` = '2143', `col81` = 4292060102.4466386000, `col82` = 'striving', `col83` = 'boneblack\'s', `col84` = 'redolent', `col85` = 6489697.9009369183, `col86` = 4287473465.9731131000, `col87` = 7726015, `col88` = 'perplexed', `col89` = '17153791', `col90` = 5478587.1108127078, `col91` = 4287091404.7004304000, `col92` = 'Boulez\'s', `col93` = '2931278';
CHECK TABLE table0 EXTENDED;
-SET GLOBAL innodb_file_format=@save_innodb_file_format;
-SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
-SET GLOBAL innodb_file_per_table=@save_innodb_file_per_table;
DROP TABLE table0;
+set global innodb_file_per_table=@old_innodb_file_per_table;
+set global innodb_file_format=@old_innodb_file_format;
+set global innodb_file_format_check=@old_innodb_file_format_check;
+
=== modified file 'mysql-test/t/innodb_xtradb_bug317074.test'
--- a/mysql-test/t/innodb_xtradb_bug317074.test 2009-06-11 12:53:26 +0000
+++ b/mysql-test/t/innodb_xtradb_bug317074.test 2009-08-03 20:09:53 +0000
@@ -1,8 +1,8 @@
-- source include/have_innodb.inc
-SET @save_innodb_file_format=@@global.innodb_file_format;
-SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
-SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
+SET @old_innodb_file_format=@@innodb_file_format;
+SET @old_innodb_file_per_table=@@innodb_file_per_table;
+SET @old_innodb_file_format_check=@@innodb_file_format_check;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
@@ -38,8 +38,7 @@ DROP PROCEDURE insert_many;
# The bug is hangup at the following statement
ALTER TABLE test1 ENGINE=MyISAM;
-SET GLOBAL innodb_file_format=@save_innodb_file_format;
-SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
-SET GLOBAL innodb_file_per_table=@save_innodb_file_per_table;
-
DROP TABLE test1;
+SET GLOBAL innodb_file_format=@old_innodb_file_format;
+SET GLOBAL innodb_file_per_table=@old_innodb_file_per_table;
+SET GLOBAL innodb_file_format_check=@old_innodb_file_format_check;
=== modified file 'storage/xtradb/btr/btr0cur.c'
--- a/storage/xtradb/btr/btr0cur.c 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/btr/btr0cur.c 2009-06-25 01:43:25 +0000
@@ -3202,7 +3202,9 @@ btr_estimate_number_of_different_key_val
ulint n_cols;
ulint matched_fields;
ulint matched_bytes;
+ ib_int64_t n_recs = 0;
ib_int64_t* n_diff;
+ ib_int64_t* n_not_nulls;
ullint n_sample_pages; /* number of pages to sample */
ulint not_empty_flag = 0;
ulint total_external_size = 0;
@@ -3215,6 +3217,7 @@ btr_estimate_number_of_different_key_val
ulint offsets_next_rec_[REC_OFFS_NORMAL_SIZE];
ulint* offsets_rec = offsets_rec_;
ulint* offsets_next_rec= offsets_next_rec_;
+ ulint stats_method = srv_stats_method;
rec_offs_init(offsets_rec_);
rec_offs_init(offsets_next_rec_);
@@ -3222,6 +3225,10 @@ btr_estimate_number_of_different_key_val
n_diff = mem_zalloc((n_cols + 1) * sizeof(ib_int64_t));
+ if (stats_method == SRV_STATS_METHOD_IGNORE_NULLS) {
+ n_not_nulls = mem_zalloc((n_cols + 1) * sizeof(ib_int64_t));
+ }
+
/* It makes no sense to test more pages than are contained
in the index, thus we lower the number if it is too high */
if (srv_stats_sample_pages > index->stat_index_size) {
@@ -3260,6 +3267,20 @@ btr_estimate_number_of_different_key_val
}
while (rec != supremum) {
+ /* count recs */
+ if (stats_method == SRV_STATS_METHOD_IGNORE_NULLS) {
+ n_recs++;
+ for (j = 0; j <= n_cols; j++) {
+ ulint f_len;
+ rec_get_nth_field(rec, offsets_rec,
+ j, &f_len);
+ if (f_len == UNIV_SQL_NULL)
+ break;
+
+ n_not_nulls[j]++;
+ }
+ }
+
rec_t* next_rec = page_rec_get_next(rec);
if (next_rec == supremum) {
break;
@@ -3274,7 +3295,7 @@ btr_estimate_number_of_different_key_val
cmp_rec_rec_with_match(rec, next_rec,
offsets_rec, offsets_next_rec,
index, &matched_fields,
- &matched_bytes);
+ &matched_bytes, srv_stats_method);
for (j = matched_fields + 1; j <= n_cols; j++) {
/* We add one if this index record has
@@ -3359,9 +3380,21 @@ btr_estimate_number_of_different_key_val
}
index->stat_n_diff_key_vals[j] += add_on;
+
+ /* revision for 'nulls_ignored' */
+ if (stats_method == SRV_STATS_METHOD_IGNORE_NULLS) {
+ if (!n_not_nulls[j])
+ n_not_nulls[j] = 1;
+ index->stat_n_diff_key_vals[j] =
+ index->stat_n_diff_key_vals[j] * n_recs
+ / n_not_nulls[j];
+ }
}
mem_free(n_diff);
+ if (stats_method == SRV_STATS_METHOD_IGNORE_NULLS) {
+ mem_free(n_not_nulls);
+ }
if (UNIV_LIKELY_NULL(heap)) {
mem_heap_free(heap);
}
@@ -3733,7 +3766,8 @@ btr_blob_free(
mtr_commit(mtr);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
mutex_enter(&block->mutex);
/* Only free the block if it is still allocated to
@@ -3744,17 +3778,22 @@ btr_blob_free(
&& buf_block_get_space(block) == space
&& buf_block_get_page_no(block) == page_no) {
- if (buf_LRU_free_block(&block->page, all, NULL)
+ if (buf_LRU_free_block(&block->page, all, NULL, TRUE)
!= BUF_LRU_FREED
- && all && block->page.zip.data) {
+ && all && block->page.zip.data
+ /* Now, buf_LRU_free_block() may release mutex temporarily */
+ && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE
+ && buf_block_get_space(block) == space
+ && buf_block_get_page_no(block) == page_no) {
/* Attempt to deallocate the uncompressed page
if the whole block cannot be deallocted. */
- buf_LRU_free_block(&block->page, FALSE, NULL);
+ buf_LRU_free_block(&block->page, FALSE, NULL, TRUE);
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
mutex_exit(&block->mutex);
}
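Note on the 'nulls_ignored' revision in the btr0cur.c hunk above: when stats_method is SRV_STATS_METHOD_IGNORE_NULLS, each per-column estimate is multiplied by n_recs and divided by n_not_nulls[j], with 1 substituted when that counter is zero. The following is only a minimal sketch of that arithmetic, not code from the patch. The function name is hypothetical, and int64_t stands in for ib_int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): rescale a sampled
distinct-key estimate the way the 'nulls_ignored' branch does. */
static int64_t
scale_n_diff_ignore_nulls(
	int64_t	n_diff,		/* estimate before the revision */
	int64_t	n_recs,		/* records counted while sampling */
	int64_t	n_not_nulls)	/* records counted as non-NULL for this column */
{
	assert(n_diff >= 0 && n_recs >= 0 && n_not_nulls >= 0);

	/* The patch substitutes 1 when the counter is zero, which also
	avoids a division by zero. */
	if (n_not_nulls == 0) {
		n_not_nulls = 1;
	}

	/* Integer arithmetic: multiply first, then divide, as in the
	patch; the result is truncated. */
	return(n_diff * n_recs / n_not_nulls);
}
```

With an estimate of 10, 100 sampled records and 50 non-NULL records, the sketch returns 20. Multiplying before dividing keeps more precision than dividing first, at the cost of a larger intermediate value.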
=== modified file 'storage/xtradb/btr/btr0sea.c'
--- a/storage/xtradb/btr/btr0sea.c 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/btr/btr0sea.c 2009-06-25 01:43:25 +0000
@@ -1731,7 +1731,8 @@ btr_search_validate(void)
rec_offs_init(offsets_);
rw_lock_x_lock(&btr_search_latch);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_x_lock(&page_hash_latch);
cell_count = hash_get_n_cells(btr_search_sys->hash_index);
@@ -1739,11 +1740,13 @@ btr_search_validate(void)
/* We release btr_search_latch every once in a while to
give other queries a chance to run. */
if ((i != 0) && ((i % chunk_size) == 0)) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_x_unlock(&page_hash_latch);
rw_lock_x_unlock(&btr_search_latch);
os_thread_yield();
rw_lock_x_lock(&btr_search_latch);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_x_lock(&page_hash_latch);
}
node = hash_get_nth_cell(btr_search_sys->hash_index, i)->node;
@@ -1850,11 +1853,13 @@ btr_search_validate(void)
/* We release btr_search_latch every once in a while to
give other queries a chance to run. */
if (i != 0) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_x_unlock(&page_hash_latch);
rw_lock_x_unlock(&btr_search_latch);
os_thread_yield();
rw_lock_x_lock(&btr_search_latch);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_x_lock(&page_hash_latch);
}
if (!ha_validate(btr_search_sys->hash_index, i, end_index)) {
@@ -1862,7 +1867,8 @@ btr_search_validate(void)
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_x_unlock(&page_hash_latch);
rw_lock_x_unlock(&btr_search_latch);
if (UNIV_LIKELY_NULL(heap)) {
mem_heap_free(heap);
=== modified file 'storage/xtradb/buf/buf0buddy.c'
--- a/storage/xtradb/buf/buf0buddy.c 2009-05-04 04:32:30 +0000
+++ b/storage/xtradb/buf/buf0buddy.c 2009-06-25 01:43:25 +0000
@@ -82,7 +82,7 @@ buf_buddy_add_to_free(
#endif /* UNIV_DEBUG_VALGRIND */
ut_ad(buf_pool->zip_free[i].start != bpage);
- UT_LIST_ADD_FIRST(list, buf_pool->zip_free[i], bpage);
+ UT_LIST_ADD_FIRST(zip_list, buf_pool->zip_free[i], bpage);
#ifdef UNIV_DEBUG_VALGRIND
if (b) UNIV_MEM_FREE(b, BUF_BUDDY_LOW << i);
@@ -100,8 +100,8 @@ buf_buddy_remove_from_free(
ulint i) /* in: index of buf_pool->zip_free[] */
{
#ifdef UNIV_DEBUG_VALGRIND
- buf_page_t* prev = UT_LIST_GET_PREV(list, bpage);
- buf_page_t* next = UT_LIST_GET_NEXT(list, bpage);
+ buf_page_t* prev = UT_LIST_GET_PREV(zip_list, bpage);
+ buf_page_t* next = UT_LIST_GET_NEXT(zip_list, bpage);
if (prev) UNIV_MEM_VALID(prev, BUF_BUDDY_LOW << i);
if (next) UNIV_MEM_VALID(next, BUF_BUDDY_LOW << i);
@@ -111,7 +111,7 @@ buf_buddy_remove_from_free(
#endif /* UNIV_DEBUG_VALGRIND */
ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_FREE);
- UT_LIST_REMOVE(list, buf_pool->zip_free[i], bpage);
+ UT_LIST_REMOVE(zip_list, buf_pool->zip_free[i], bpage);
#ifdef UNIV_DEBUG_VALGRIND
if (prev) UNIV_MEM_FREE(prev, BUF_BUDDY_LOW << i);
@@ -131,12 +131,13 @@ buf_buddy_alloc_zip(
{
buf_page_t* bpage;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&zip_free_mutex));
ut_a(i < BUF_BUDDY_SIZES);
#if defined UNIV_DEBUG && !defined UNIV_DEBUG_VALGRIND
/* Valgrind would complain about accessing free memory. */
- UT_LIST_VALIDATE(list, buf_page_t, buf_pool->zip_free[i]);
+ UT_LIST_VALIDATE(zip_list, buf_page_t, buf_pool->zip_free[i]);
#endif /* UNIV_DEBUG && !UNIV_DEBUG_VALGRIND */
bpage = UT_LIST_GET_LAST(buf_pool->zip_free[i]);
@@ -177,16 +178,19 @@ static
void
buf_buddy_block_free(
/*=================*/
- void* buf) /* in: buffer frame to deallocate */
+ void* buf, /* in: buffer frame to deallocate */
+ ibool have_page_hash_mutex)
{
const ulint fold = BUF_POOL_ZIP_FOLD_PTR(buf);
buf_page_t* bpage;
buf_block_t* block;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(!mutex_own(&buf_pool_zip_mutex));
ut_a(!ut_align_offset(buf, UNIV_PAGE_SIZE));
+ mutex_enter(&zip_hash_mutex);
+
HASH_SEARCH(hash, buf_pool->zip_hash, fold, buf_page_t*, bpage,
ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_MEMORY
&& bpage->in_zip_hash && !bpage->in_page_hash),
@@ -198,12 +202,14 @@ buf_buddy_block_free(
ut_d(bpage->in_zip_hash = FALSE);
HASH_DELETE(buf_page_t, hash, buf_pool->zip_hash, fold, bpage);
+ mutex_exit(&zip_hash_mutex);
+
ut_d(memset(buf, 0, UNIV_PAGE_SIZE));
UNIV_MEM_INVALID(buf, UNIV_PAGE_SIZE);
block = (buf_block_t*) bpage;
mutex_enter(&block->mutex);
- buf_LRU_block_free_non_file_page(block);
+ buf_LRU_block_free_non_file_page(block, have_page_hash_mutex);
mutex_exit(&block->mutex);
ut_ad(buf_buddy_n_frames > 0);
@@ -219,7 +225,7 @@ buf_buddy_block_register(
buf_block_t* block) /* in: buffer frame to allocate */
{
const ulint fold = BUF_POOL_ZIP_FOLD(block);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(!mutex_own(&buf_pool_zip_mutex));
buf_block_set_state(block, BUF_BLOCK_MEMORY);
@@ -230,7 +236,10 @@ buf_buddy_block_register(
ut_ad(!block->page.in_page_hash);
ut_ad(!block->page.in_zip_hash);
ut_d(block->page.in_zip_hash = TRUE);
+
+ mutex_enter(&zip_hash_mutex);
HASH_INSERT(buf_page_t, hash, buf_pool->zip_hash, fold, &block->page);
+ mutex_exit(&zip_hash_mutex);
ut_d(buf_buddy_n_frames++);
}
@@ -264,7 +273,7 @@ buf_buddy_alloc_from(
bpage->state = BUF_BLOCK_ZIP_FREE;
#if defined UNIV_DEBUG && !defined UNIV_DEBUG_VALGRIND
/* Valgrind would complain about accessing free memory. */
- UT_LIST_VALIDATE(list, buf_page_t, buf_pool->zip_free[j]);
+ UT_LIST_VALIDATE(zip_list, buf_page_t, buf_pool->zip_free[j]);
#endif /* UNIV_DEBUG && !UNIV_DEBUG_VALGRIND */
buf_buddy_add_to_free(bpage, j);
}
@@ -284,24 +293,28 @@ buf_buddy_alloc_low(
possibly NULL if lru==NULL */
ulint i, /* in: index of buf_pool->zip_free[],
or BUF_BUDDY_SIZES */
- ibool* lru) /* in: pointer to a variable that will be assigned
+ ibool* lru, /* in: pointer to a variable that will be assigned
TRUE if storage was allocated from the LRU list
and buf_pool_mutex was temporarily released,
or NULL if the LRU list should not be used */
+ ibool have_page_hash_mutex)
{
buf_block_t* block;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(!mutex_own(&buf_pool_zip_mutex));
if (i < BUF_BUDDY_SIZES) {
/* Try to allocate from the buddy system. */
+ mutex_enter(&zip_free_mutex);
block = buf_buddy_alloc_zip(i);
if (block) {
goto func_exit;
}
+
+ mutex_exit(&zip_free_mutex);
}
/* Try allocating from the buf_pool->free list. */
@@ -318,18 +331,29 @@ buf_buddy_alloc_low(
}
/* Try replacing an uncompressed page in the buffer pool. */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ if (have_page_hash_mutex) {
+ rw_lock_x_unlock(&page_hash_latch);
+ }
block = buf_LRU_get_free_block(0);
*lru = TRUE;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ if (have_page_hash_mutex) {
+ rw_lock_x_lock(&page_hash_latch);
+ }
alloc_big:
buf_buddy_block_register(block);
+ mutex_enter(&zip_free_mutex);
block = buf_buddy_alloc_from(block->frame, i, BUF_BUDDY_SIZES);
func_exit:
buf_buddy_stat[i].used++;
+ mutex_exit(&zip_free_mutex);
+
return(block);
}
@@ -345,7 +369,10 @@ buf_buddy_relocate_block(
{
buf_page_t* b;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+#ifdef UNIV_SYNC_DEBUG
+ ut_ad(rw_lock_own(&page_hash_latch, RW_LOCK_EX));
+#endif
switch (buf_page_get_state(bpage)) {
case BUF_BLOCK_ZIP_FREE:
@@ -354,7 +381,7 @@ buf_buddy_relocate_block(
case BUF_BLOCK_FILE_PAGE:
case BUF_BLOCK_MEMORY:
case BUF_BLOCK_REMOVE_HASH:
- ut_error;
+ /* ut_error; */ /* optimistic */
case BUF_BLOCK_ZIP_DIRTY:
/* Cannot relocate dirty pages. */
return(FALSE);
@@ -364,9 +391,17 @@ buf_buddy_relocate_block(
}
mutex_enter(&buf_pool_zip_mutex);
+ mutex_enter(&zip_free_mutex);
if (!buf_page_can_relocate(bpage)) {
mutex_exit(&buf_pool_zip_mutex);
+ mutex_exit(&zip_free_mutex);
+ return(FALSE);
+ }
+
+ if (bpage != buf_page_hash_get(bpage->space, bpage->offset)) {
+ mutex_exit(&buf_pool_zip_mutex);
+ mutex_exit(&zip_free_mutex);
return(FALSE);
}
@@ -374,16 +409,19 @@ buf_buddy_relocate_block(
ut_d(bpage->state = BUF_BLOCK_ZIP_FREE);
/* relocate buf_pool->zip_clean */
- b = UT_LIST_GET_PREV(list, dpage);
- UT_LIST_REMOVE(list, buf_pool->zip_clean, dpage);
+ mutex_enter(&flush_list_mutex);
+ b = UT_LIST_GET_PREV(zip_list, dpage);
+ UT_LIST_REMOVE(zip_list, buf_pool->zip_clean, dpage);
if (b) {
- UT_LIST_INSERT_AFTER(list, buf_pool->zip_clean, b, dpage);
+ UT_LIST_INSERT_AFTER(zip_list, buf_pool->zip_clean, b, dpage);
} else {
- UT_LIST_ADD_FIRST(list, buf_pool->zip_clean, dpage);
+ UT_LIST_ADD_FIRST(zip_list, buf_pool->zip_clean, dpage);
}
+ mutex_exit(&flush_list_mutex);
mutex_exit(&buf_pool_zip_mutex);
+ mutex_exit(&zip_free_mutex);
return(TRUE);
}
@@ -396,13 +434,15 @@ buf_buddy_relocate(
/* out: TRUE if relocated */
void* src, /* in: block to relocate */
void* dst, /* in: free block to relocate to */
- ulint i) /* in: index of buf_pool->zip_free[] */
+ ulint i, /* in: index of buf_pool->zip_free[] */
+ ibool have_page_hash_mutex)
{
buf_page_t* bpage;
const ulint size = BUF_BUDDY_LOW << i;
ullint usec = ut_time_us(NULL);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&zip_free_mutex));
ut_ad(!mutex_own(&buf_pool_zip_mutex));
ut_ad(!ut_align_offset(src, size));
ut_ad(!ut_align_offset(dst, size));
@@ -421,9 +461,16 @@ buf_buddy_relocate(
actually is a properly initialized buf_page_t object. */
if (size >= PAGE_ZIP_MIN_SIZE) {
+ if (!have_page_hash_mutex)
+ mutex_exit(&zip_free_mutex);
+
/* This is a compressed page. */
mutex_t* mutex;
+ if (!have_page_hash_mutex) {
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
+ }
/* The src block may be split into smaller blocks,
some of which may be free. Thus, the
mach_read_from_4() calls below may attempt to read
@@ -444,6 +491,11 @@ buf_buddy_relocate(
added to buf_pool->page_hash yet. Obviously,
it cannot be relocated. */
+ if (!have_page_hash_mutex) {
+ mutex_enter(&zip_free_mutex);
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
return(FALSE);
}
@@ -453,16 +505,32 @@ buf_buddy_relocate(
For the sake of simplicity, give up. */
ut_ad(page_zip_get_size(&bpage->zip) < size);
+ if (!have_page_hash_mutex) {
+ mutex_enter(&zip_free_mutex);
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
return(FALSE);
}
+ /* To keep latch order */
+ if (have_page_hash_mutex)
+ mutex_exit(&zip_free_mutex);
+
/* The block must have been allocated, but it may
contain uninitialized data. */
UNIV_MEM_ASSERT_W(src, size);
mutex = buf_page_get_mutex(bpage);
+retry_lock:
mutex_enter(mutex);
+ if (mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(mutex);
+ mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
+ mutex_enter(&zip_free_mutex);
if (buf_page_can_relocate(bpage)) {
/* Relocate the compressed page. */
@@ -479,17 +547,48 @@ success:
buddy_stat->relocated_usec
+= ut_time_us(NULL) - usec;
}
+
+ if (!have_page_hash_mutex) {
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
return(TRUE);
}
+ if (!have_page_hash_mutex) {
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
+
mutex_exit(mutex);
} else if (i == buf_buddy_get_slot(sizeof(buf_page_t))) {
/* This must be a buf_page_t object. */
UNIV_MEM_ASSERT_RW(src, size);
+
+ mutex_exit(&zip_free_mutex);
+
+ if (!have_page_hash_mutex) {
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
+ }
+
if (buf_buddy_relocate_block(src, dst)) {
+ mutex_enter(&zip_free_mutex);
+
+ if (!have_page_hash_mutex) {
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
goto success;
}
+
+ mutex_enter(&zip_free_mutex);
+
+ if (!have_page_hash_mutex) {
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
}
return(FALSE);
@@ -503,12 +602,14 @@ buf_buddy_free_low(
/*===============*/
void* buf, /* in: block to be freed, must not be
pointed to by the buffer pool */
- ulint i) /* in: index of buf_pool->zip_free[] */
+ ulint i, /* in: index of buf_pool->zip_free[] */
+ ibool have_page_hash_mutex)
{
buf_page_t* bpage;
buf_page_t* buddy;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&zip_free_mutex));
ut_ad(!mutex_own(&buf_pool_zip_mutex));
ut_ad(i <= BUF_BUDDY_SIZES);
ut_ad(buf_buddy_stat[i].used > 0);
@@ -519,7 +620,9 @@ recombine:
ut_d(((buf_page_t*) buf)->state = BUF_BLOCK_ZIP_FREE);
if (i == BUF_BUDDY_SIZES) {
- buf_buddy_block_free(buf);
+ mutex_exit(&zip_free_mutex);
+ buf_buddy_block_free(buf, have_page_hash_mutex);
+ mutex_enter(&zip_free_mutex);
return;
}
@@ -564,7 +667,7 @@ buddy_free2:
ut_a(bpage != buf);
{
- buf_page_t* next = UT_LIST_GET_NEXT(list, bpage);
+ buf_page_t* next = UT_LIST_GET_NEXT(zip_list, bpage);
UNIV_MEM_ASSERT_AND_FREE(bpage, BUF_BUDDY_LOW << i);
bpage = next;
}
@@ -573,11 +676,11 @@ buddy_free2:
#ifndef UNIV_DEBUG_VALGRIND
buddy_nonfree:
/* Valgrind would complain about accessing free memory. */
- ut_d(UT_LIST_VALIDATE(list, buf_page_t, buf_pool->zip_free[i]));
+ ut_d(UT_LIST_VALIDATE(zip_list, buf_page_t, buf_pool->zip_free[i]));
#endif /* UNIV_DEBUG_VALGRIND */
/* The buddy is not free. Is there a free block of this size? */
- bpage = UT_LIST_GET_FIRST(buf_pool->zip_free[i]);
+ bpage = UT_LIST_GET_LAST(buf_pool->zip_free[i]);
if (bpage) {
/* Remove the block from the free list, because a successful
@@ -587,7 +690,7 @@ buddy_nonfree:
buf_buddy_remove_from_free(bpage, i);
/* Try to relocate the buddy of buf to the free block. */
- if (buf_buddy_relocate(buddy, bpage, i)) {
+ if (buf_buddy_relocate(buddy, bpage, i, have_page_hash_mutex)) {
ut_d(buddy->state = BUF_BLOCK_ZIP_FREE);
goto buddy_free2;
@@ -608,14 +711,14 @@ buddy_nonfree:
(Parts of the buddy can be free in
buf_pool->zip_free[j] with j < i.)*/
for (b = UT_LIST_GET_FIRST(buf_pool->zip_free[i]);
- b; b = UT_LIST_GET_NEXT(list, b)) {
+ b; b = UT_LIST_GET_NEXT(zip_list, b)) {
ut_a(b != buddy);
}
}
#endif /* UNIV_DEBUG && !UNIV_DEBUG_VALGRIND */
- if (buf_buddy_relocate(buddy, buf, i)) {
+ if (buf_buddy_relocate(buddy, buf, i, have_page_hash_mutex)) {
buf = bpage;
UNIV_MEM_VALID(bpage, BUF_BUDDY_LOW << i);
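Note on the retry_lock loop added to buf_buddy_relocate() above: the page's mutex is looked up without protection, locked, and then looked up again, because the page may have changed its protecting mutex in between. The following is only a sketch of that lock-then-recheck idiom, using POSIX threads and C11 atomics instead of the InnoDB primitives. The names page_sketch and page_lock_current_owner are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a page whose protecting mutex can change
over time (compare buf_page_get_mutex() in the patch). */
struct page_sketch {
	_Atomic(pthread_mutex_t*)	owner;
};

/* Lock whichever mutex currently protects the page. Returns the locked
mutex, or NULL if locking failed. */
static pthread_mutex_t*
page_lock_current_owner(struct page_sketch* page)
{
	pthread_mutex_t*	m;

	assert(page != NULL);
	m = atomic_load(&page->owner);

	for (;;) {
		pthread_mutex_t*	now;

		if (pthread_mutex_lock(m) != 0) {
			return(NULL);
		}

		/* Re-read the owner while holding the lock. */
		now = atomic_load(&page->owner);
		if (now == m) {
			return(m);	/* still the right mutex */
		}

		/* The owner changed while we waited: drop the stale
		mutex and retry with the new one. */
		pthread_mutex_unlock(m);
		m = now;
	}
}
```

The caller unlocks the returned mutex when done. The sketch assumes, as the patch appears to, that the owner pointer only changes while the old owner mutex is held, so the recheck under the lock is stable.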
=== modified file 'storage/xtradb/buf/buf0buf.c'
--- a/storage/xtradb/buf/buf0buf.c 2009-05-04 04:32:30 +0000
+++ b/storage/xtradb/buf/buf0buf.c 2009-06-25 01:43:25 +0000
@@ -244,6 +244,12 @@ UNIV_INTERN buf_pool_t* buf_pool = NULL;
/* mutex protecting the buffer pool struct and control blocks, except the
read-write lock in them */
UNIV_INTERN mutex_t buf_pool_mutex;
+UNIV_INTERN mutex_t LRU_list_mutex;
+UNIV_INTERN mutex_t flush_list_mutex;
+UNIV_INTERN rw_lock_t page_hash_latch;
+UNIV_INTERN mutex_t free_list_mutex;
+UNIV_INTERN mutex_t zip_free_mutex;
+UNIV_INTERN mutex_t zip_hash_mutex;
/* mutex protecting the control blocks of compressed-only pages
(of type buf_page_t, not buf_block_t) */
UNIV_INTERN mutex_t buf_pool_zip_mutex;
@@ -664,9 +670,9 @@ buf_block_init(
block->page.in_zip_hash = FALSE;
block->page.in_flush_list = FALSE;
block->page.in_free_list = FALSE;
- block->in_unzip_LRU_list = FALSE;
#endif /* UNIV_DEBUG */
block->page.in_LRU_list = FALSE;
+ block->in_unzip_LRU_list = FALSE;
#if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
block->n_pointers = 0;
#endif /* UNIV_AHI_DEBUG || UNIV_DEBUG */
@@ -751,8 +757,10 @@ buf_chunk_init(
memset(block->frame, '\0', UNIV_PAGE_SIZE);
#endif
/* Add the block to the free list */
- UT_LIST_ADD_LAST(list, buf_pool->free, (&block->page));
+ mutex_enter(&free_list_mutex);
+ UT_LIST_ADD_LAST(free, buf_pool->free, (&block->page));
ut_d(block->page.in_free_list = TRUE);
+ mutex_exit(&free_list_mutex);
block++;
frame += UNIV_PAGE_SIZE;
@@ -778,7 +786,7 @@ buf_chunk_contains_zip(
ulint i;
ut_ad(buf_pool);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
block = chunk->blocks;
@@ -832,7 +840,7 @@ buf_chunk_not_freed(
ulint i;
ut_ad(buf_pool);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own()); /*optimistic...*/
block = chunk->blocks;
@@ -865,7 +873,7 @@ buf_chunk_all_free(
ulint i;
ut_ad(buf_pool);
- ut_ad(buf_pool_mutex_own());
+ ut_ad(buf_pool_mutex_own()); /* but we need all mutex here */
block = chunk->blocks;
@@ -891,7 +899,7 @@ buf_chunk_free(
buf_block_t* block;
const buf_block_t* block_end;
- ut_ad(buf_pool_mutex_own());
+ ut_ad(buf_pool_mutex_own()); /* but we need all mutex here */
block_end = chunk->blocks + chunk->size;
@@ -903,8 +911,10 @@ buf_chunk_free(
ut_ad(!block->in_unzip_LRU_list);
ut_ad(!block->page.in_flush_list);
/* Remove the block from the free list. */
+ mutex_enter(&free_list_mutex);
ut_ad(block->page.in_free_list);
- UT_LIST_REMOVE(list, buf_pool->free, (&block->page));
+ UT_LIST_REMOVE(free, buf_pool->free, (&block->page));
+ mutex_exit(&free_list_mutex);
/* Free the latches. */
mutex_free(&block->mutex);
@@ -935,8 +945,17 @@ buf_pool_init(void)
/* 1. Initialize general fields
------------------------------- */
mutex_create(&buf_pool_mutex, SYNC_BUF_POOL);
+ mutex_create(&LRU_list_mutex, SYNC_BUF_LRU_LIST);
+ mutex_create(&flush_list_mutex, SYNC_BUF_FLUSH_LIST);
+ rw_lock_create(&page_hash_latch, SYNC_BUF_PAGE_HASH);
+ mutex_create(&free_list_mutex, SYNC_BUF_FREE_LIST);
+ mutex_create(&zip_free_mutex, SYNC_BUF_ZIP_FREE);
+ mutex_create(&zip_hash_mutex, SYNC_BUF_ZIP_HASH);
+
mutex_create(&buf_pool_zip_mutex, SYNC_BUF_BLOCK);
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
buf_pool_mutex_enter();
buf_pool->n_chunks = 1;
@@ -973,6 +992,8 @@ buf_pool_init(void)
--------------------------- */
/* All fields are initialized by mem_zalloc(). */
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
buf_pool_mutex_exit();
btr_search_sys_create(buf_pool->curr_size
@@ -1105,7 +1126,11 @@ buf_relocate(
buf_page_t* b;
ulint fold;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
+#ifdef UNIV_SYNC_DEBUG
+ ut_ad(rw_lock_own(&page_hash_latch, RW_LOCK_EX));
+#endif
ut_ad(mutex_own(buf_page_get_mutex(bpage)));
ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
ut_a(bpage->buf_fix_count == 0);
@@ -1186,7 +1211,8 @@ buf_pool_shrink(
try_again:
btr_search_disable(); /* Empty the adaptive hash index again */
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
shrink_again:
if (buf_pool->n_chunks <= 1) {
@@ -1257,7 +1283,7 @@ shrink_again:
buf_LRU_make_block_old(&block->page);
dirty++;
- } else if (buf_LRU_free_block(&block->page, TRUE, NULL)
+ } else if (buf_LRU_free_block(&block->page, TRUE, NULL, FALSE)
!= BUF_LRU_FREED) {
nonfree++;
}
@@ -1265,7 +1291,8 @@ shrink_again:
mutex_exit(&block->mutex);
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
/* Request for a flush of the chunk if it helps.
Do not flush if there are non-free blocks, since
@@ -1314,7 +1341,8 @@ shrink_again:
func_done:
srv_buf_pool_old_size = srv_buf_pool_size;
func_exit:
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
btr_search_enable();
}
@@ -1332,7 +1360,11 @@ buf_pool_page_hash_rebuild(void)
hash_table_t* zip_hash;
buf_page_t* b;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
+ mutex_enter(&flush_list_mutex);
+
/* Free, create, and populate the hash table. */
hash_table_free(buf_pool->page_hash);
@@ -1374,7 +1406,7 @@ buf_pool_page_hash_rebuild(void)
in buf_pool->flush_list. */
for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
- b = UT_LIST_GET_NEXT(list, b)) {
+ b = UT_LIST_GET_NEXT(zip_list, b)) {
ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
ut_ad(!b->in_flush_list);
ut_ad(b->in_LRU_list);
@@ -1386,7 +1418,7 @@ buf_pool_page_hash_rebuild(void)
}
for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
- b = UT_LIST_GET_NEXT(list, b)) {
+ b = UT_LIST_GET_NEXT(flush_list, b)) {
ut_ad(b->in_flush_list);
ut_ad(b->in_LRU_list);
ut_ad(b->in_page_hash);
@@ -1412,7 +1444,10 @@ buf_pool_page_hash_rebuild(void)
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ mutex_exit(&flush_list_mutex);
}
/************************************************************************
@@ -1422,17 +1457,20 @@ void
buf_pool_resize(void)
/*=================*/
{
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
if (srv_buf_pool_old_size == srv_buf_pool_size) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
return;
}
if (srv_buf_pool_curr_size + 1048576 > srv_buf_pool_size) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
/* Disable adaptive hash indexes and empty the index
in order to free up memory in the buffer pool chunks. */
@@ -1466,7 +1504,8 @@ buf_pool_resize(void)
}
srv_buf_pool_old_size = srv_buf_pool_size;
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
}
buf_pool_page_hash_rebuild();
@@ -1488,12 +1527,14 @@ buf_block_make_young(
if (buf_page_peek_if_too_old(bpage)) {
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
/* There has been freeing activity in the LRU list:
best to move to the head of the LRU list */
buf_LRU_make_block_young(bpage);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
}
}
@@ -1507,13 +1548,15 @@ buf_page_make_young(
/*================*/
buf_page_t* bpage) /* in: buffer block of a file page */
{
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
ut_a(buf_page_in_file(bpage));
buf_LRU_make_block_young(bpage);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
}
/************************************************************************
@@ -1528,7 +1571,8 @@ buf_reset_check_index_page_at_flush(
{
buf_block_t* block;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
block = (buf_block_t*) buf_page_hash_get(space, offset);
@@ -1536,7 +1580,8 @@ buf_reset_check_index_page_at_flush(
block->check_index_page_at_flush = FALSE;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
}
/************************************************************************
@@ -1555,7 +1600,8 @@ buf_page_peek_if_search_hashed(
buf_block_t* block;
ibool is_hashed;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
block = (buf_block_t*) buf_page_hash_get(space, offset);
@@ -1565,7 +1611,8 @@ buf_page_peek_if_search_hashed(
is_hashed = block->is_hashed;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(is_hashed);
}
@@ -1587,7 +1634,8 @@ buf_page_set_file_page_was_freed(
{
buf_page_t* bpage;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
bpage = buf_page_hash_get(space, offset);
@@ -1595,7 +1643,8 @@ buf_page_set_file_page_was_freed(
bpage->file_page_was_freed = TRUE;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(bpage);
}
@@ -1616,7 +1665,8 @@ buf_page_reset_file_page_was_freed(
{
buf_page_t* bpage;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
bpage = buf_page_hash_get(space, offset);
@@ -1624,7 +1674,8 @@ buf_page_reset_file_page_was_freed(
bpage->file_page_was_freed = FALSE;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(bpage);
}
@@ -1657,8 +1708,9 @@ buf_page_get_zip(
buf_pool->n_page_gets++;
for (;;) {
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
lookup:
+ rw_lock_s_lock(&page_hash_latch);
bpage = buf_page_hash_get(space, offset);
if (bpage) {
break;
@@ -1666,7 +1718,8 @@ lookup:
/* Page not in buf_pool: needs to be read from file */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
buf_read_page(space, zip_size, offset);
@@ -1677,12 +1730,21 @@ lookup:
if (UNIV_UNLIKELY(!bpage->zip.data)) {
/* There is no compressed page. */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(NULL);
}
block_mutex = buf_page_get_mutex(bpage);
+retry_lock:
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
+
+ rw_lock_s_unlock(&page_hash_latch);
switch (buf_page_get_state(bpage)) {
case BUF_BLOCK_NOT_USED:
@@ -1698,7 +1760,7 @@ lookup:
break;
case BUF_BLOCK_FILE_PAGE:
/* Discard the uncompressed page frame if possible. */
- if (buf_LRU_free_block(bpage, FALSE, NULL)
+ if (buf_LRU_free_block(bpage, FALSE, NULL, FALSE)
== BUF_LRU_FREED) {
mutex_exit(block_mutex);
@@ -1712,7 +1774,7 @@ lookup:
must_read = buf_page_get_io_fix(bpage) == BUF_IO_READ;
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
buf_page_set_accessed(bpage, TRUE);
@@ -1943,7 +2005,7 @@ buf_block_is_uncompressed(
const buf_chunk_t* chunk = buf_pool->chunks;
const buf_chunk_t* const echunk = chunk + buf_pool->n_chunks;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
if (UNIV_UNLIKELY((((ulint) block) % sizeof *block) != 0)) {
/* The pointer should be aligned. */
@@ -1986,6 +2048,7 @@ buf_page_get_gen(
ibool accessed;
ulint fix_type;
ibool must_read;
+ mutex_t* block_mutex;
ut_ad(mtr);
ut_ad((rw_latch == RW_S_LATCH)
@@ -2001,9 +2064,18 @@ buf_page_get_gen(
buf_pool->n_page_gets++;
loop:
block = guess;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
if (block) {
+ block_mutex = buf_page_get_mutex((buf_page_t*)block);
+retry_lock_1:
+ mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex((buf_page_t*)block)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex((buf_page_t*)block);
+ goto retry_lock_1;
+ }
+
/* If the guess is a compressed page descriptor that
has been allocated by buf_buddy_alloc(), it may have
been invalidated by buf_buddy_relocate(). In that
@@ -2017,6 +2089,8 @@ loop:
|| space != block->page.space
|| buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {
+ mutex_exit(block_mutex);
+
block = guess = NULL;
} else {
ut_ad(!block->page.in_zip_hash);
@@ -2025,14 +2099,26 @@ loop:
}
if (block == NULL) {
+ rw_lock_s_lock(&page_hash_latch);
block = (buf_block_t*) buf_page_hash_get(space, offset);
+ if (block) {
+ block_mutex = buf_page_get_mutex((buf_page_t*)block);
+retry_lock_2:
+ mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex((buf_page_t*)block)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex((buf_page_t*)block);
+ goto retry_lock_2;
+ }
+ }
+ rw_lock_s_unlock(&page_hash_latch);
}
loop2:
if (block == NULL) {
/* Page not in buf_pool: needs to be read from file */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
if (mode == BUF_GET_IF_IN_POOL) {
@@ -2053,7 +2139,8 @@ loop2:
if (must_read && mode == BUF_GET_IF_IN_POOL) {
/* The page is only being read to buffer */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(block_mutex);
return(NULL);
}
@@ -2063,10 +2150,16 @@ loop2:
ibool success;
case BUF_BLOCK_FILE_PAGE:
+ if (block_mutex == &buf_pool_zip_mutex) {
+ /* it is the wrong mutex... */
+ mutex_exit(block_mutex);
+ goto loop;
+ }
break;
case BUF_BLOCK_ZIP_PAGE:
case BUF_BLOCK_ZIP_DIRTY:
+ ut_ad(block_mutex == &buf_pool_zip_mutex);
bpage = &block->page;
if (bpage->buf_fix_count
@@ -2077,20 +2170,25 @@ loop2:
wait_until_unfixed:
/* The block is buffer-fixed or I/O-fixed.
Try again later. */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(block_mutex);
os_thread_sleep(WAIT_FOR_READ);
goto loop;
}
/* Allocate an uncompressed page. */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(block_mutex);
block = buf_LRU_get_free_block(0);
ut_a(block);
+ block_mutex = &block->mutex;
- buf_pool_mutex_enter();
- mutex_enter(&block->mutex);
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
+ mutex_enter(block_mutex);
{
buf_page_t* hash_bpage
@@ -2101,35 +2199,55 @@ wait_until_unfixed:
while buf_pool_mutex was released.
Free the block that was allocated. */
- buf_LRU_block_free_non_file_page(block);
- mutex_exit(&block->mutex);
+ buf_LRU_block_free_non_file_page(block, TRUE);
+ mutex_exit(block_mutex);
block = (buf_block_t*) hash_bpage;
+ if (block) {
+ block_mutex = buf_page_get_mutex((buf_page_t*)block);
+retry_lock_3:
+ mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex((buf_page_t*)block)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex((buf_page_t*)block);
+ goto retry_lock_3;
+ }
+ }
+ rw_lock_x_unlock(&page_hash_latch);
+ mutex_exit(&LRU_list_mutex);
goto loop2;
}
}
+ mutex_enter(&buf_pool_zip_mutex);
+
if (UNIV_UNLIKELY
(bpage->buf_fix_count
|| buf_page_get_io_fix(bpage) != BUF_IO_NONE)) {
+ mutex_exit(&buf_pool_zip_mutex);
/* The block was buffer-fixed or I/O-fixed
while buf_pool_mutex was not held by this thread.
Free the block that was allocated and try again.
This should be extremely unlikely. */
- buf_LRU_block_free_non_file_page(block);
- mutex_exit(&block->mutex);
+ buf_LRU_block_free_non_file_page(block, TRUE);
+ //mutex_exit(&block->mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ mutex_exit(&LRU_list_mutex);
goto wait_until_unfixed;
}
/* Move the compressed page from bpage to block,
and uncompress it. */
- mutex_enter(&buf_pool_zip_mutex);
+ mutex_enter(&flush_list_mutex);
buf_relocate(bpage, &block->page);
+
+ rw_lock_x_unlock(&page_hash_latch);
+
buf_block_init_low(block);
block->lock_hash_val = lock_rec_hash(space, offset);
@@ -2138,29 +2256,31 @@ wait_until_unfixed:
if (buf_page_get_state(&block->page)
== BUF_BLOCK_ZIP_PAGE) {
- UT_LIST_REMOVE(list, buf_pool->zip_clean,
+ UT_LIST_REMOVE(zip_list, buf_pool->zip_clean,
&block->page);
ut_ad(!block->page.in_flush_list);
} else {
/* Relocate buf_pool->flush_list. */
buf_page_t* b;
- b = UT_LIST_GET_PREV(list, &block->page);
+ b = UT_LIST_GET_PREV(flush_list, &block->page);
ut_ad(block->page.in_flush_list);
- UT_LIST_REMOVE(list, buf_pool->flush_list,
+ UT_LIST_REMOVE(flush_list, buf_pool->flush_list,
&block->page);
if (b) {
UT_LIST_INSERT_AFTER(
- list, buf_pool->flush_list, b,
+ flush_list, buf_pool->flush_list, b,
&block->page);
} else {
UT_LIST_ADD_FIRST(
- list, buf_pool->flush_list,
+ flush_list, buf_pool->flush_list,
&block->page);
}
}
+ mutex_exit(&flush_list_mutex);
+
/* Buffer-fix, I/O-fix, and X-latch the block
for the duration of the decompression.
Also add the block to the unzip_LRU list. */
@@ -2169,16 +2289,22 @@ wait_until_unfixed:
/* Insert at the front of unzip_LRU list */
buf_unzip_LRU_add_block(block, FALSE);
+ mutex_exit(&LRU_list_mutex);
+
block->page.buf_fix_count = 1;
buf_block_set_io_fix(block, BUF_IO_READ);
+
+ mutex_enter(&buf_pool_mutex);
buf_pool->n_pend_unzip++;
+ mutex_exit(&buf_pool_mutex);
+
rw_lock_x_lock(&block->lock);
- mutex_exit(&block->mutex);
+ mutex_exit(block_mutex);
mutex_exit(&buf_pool_zip_mutex);
- buf_buddy_free(bpage, sizeof *bpage);
+ buf_buddy_free(bpage, sizeof *bpage, FALSE);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
/* Decompress the page and apply buffered operations
while not holding buf_pool_mutex or block->mutex. */
@@ -2190,17 +2316,21 @@ wait_until_unfixed:
}
/* Unfix and unlatch the block. */
- buf_pool_mutex_enter();
- mutex_enter(&block->mutex);
+ //buf_pool_mutex_enter();
+ block_mutex = &block->mutex;
+ mutex_enter(block_mutex);
+ mutex_enter(&buf_pool_mutex);
buf_pool->n_pend_unzip--;
+ mutex_exit(&buf_pool_mutex);
block->page.buf_fix_count--;
buf_block_set_io_fix(block, BUF_IO_NONE);
- mutex_exit(&block->mutex);
+ //mutex_exit(&block->mutex);
rw_lock_x_unlock(&block->lock);
if (UNIV_UNLIKELY(!success)) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(block_mutex);
return(NULL);
}
@@ -2217,11 +2347,11 @@ wait_until_unfixed:
ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
- mutex_enter(&block->mutex);
+ //mutex_enter(&block->mutex);
UNIV_MEM_ASSERT_RW(&block->page, sizeof block->page);
buf_block_buf_fix_inc(block, file, line);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
/* Check if this is the first access to the page */
@@ -2229,7 +2359,7 @@ wait_until_unfixed:
buf_page_set_accessed(&block->page, TRUE);
- mutex_exit(&block->mutex);
+ mutex_exit(block_mutex);
buf_block_make_young(&block->page);
@@ -2515,16 +2645,19 @@ buf_page_try_get_func(
ibool success;
ulint fix_type;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
block = buf_block_hash_get(space_id, page_no);
if (!block) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(NULL);
}
mutex_enter(&block->mutex);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
@@ -2644,7 +2777,10 @@ buf_page_init(
{
buf_page_t* hash_page;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+#ifdef UNIV_SYNC_DEBUG
+ ut_ad(rw_lock_own(&page_hash_latch, RW_LOCK_EX));
+#endif
ut_ad(mutex_own(&(block->mutex)));
ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);
@@ -2677,7 +2813,8 @@ buf_page_init(
(const void*) hash_page, (const void*) block);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
mutex_exit(&block->mutex);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_x_unlock(&page_hash_latch);
buf_print();
buf_LRU_print();
buf_validate();
@@ -2756,16 +2893,24 @@ buf_page_init_for_read(
ut_ad(block);
}
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
if (buf_page_hash_get(space, offset)) {
/* The page is already in the buffer pool. */
err_exit:
if (block) {
mutex_enter(&block->mutex);
- buf_LRU_block_free_non_file_page(block);
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ buf_LRU_block_free_non_file_page(block, FALSE);
mutex_exit(&block->mutex);
}
+ else {
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
bpage = NULL;
goto func_exit;
@@ -2785,6 +2930,8 @@ err_exit:
mutex_enter(&block->mutex);
buf_page_init(space, offset, block);
+ rw_lock_x_unlock(&page_hash_latch);
+
/* The block must be put to the LRU list, to the old blocks */
buf_LRU_add_block(bpage, TRUE/* to old blocks */);
@@ -2812,7 +2959,7 @@ err_exit:
been added to buf_pool->LRU and
buf_pool->page_hash. */
mutex_exit(&block->mutex);
- data = buf_buddy_alloc(zip_size, &lru);
+ data = buf_buddy_alloc(zip_size, &lru, FALSE);
mutex_enter(&block->mutex);
block->page.zip.data = data;
@@ -2825,6 +2972,7 @@ err_exit:
buf_unzip_LRU_add_block(block, TRUE);
}
+ mutex_exit(&LRU_list_mutex);
mutex_exit(&block->mutex);
} else {
/* Defer buf_buddy_alloc() until after the block has
@@ -2836,8 +2984,8 @@ err_exit:
control block (bpage), in order to avoid the
invocation of buf_buddy_relocate_block() on
uninitialized data. */
- data = buf_buddy_alloc(zip_size, &lru);
- bpage = buf_buddy_alloc(sizeof *bpage, &lru);
+ data = buf_buddy_alloc(zip_size, &lru, TRUE);
+ bpage = buf_buddy_alloc(sizeof *bpage, &lru, TRUE);
/* If buf_buddy_alloc() allocated storage from the LRU list,
it released and reacquired buf_pool_mutex. Thus, we must
@@ -2846,8 +2994,11 @@ err_exit:
&& UNIV_LIKELY_NULL(buf_page_hash_get(space, offset))) {
/* The block was added by some other thread. */
- buf_buddy_free(bpage, sizeof *bpage);
- buf_buddy_free(data, zip_size);
+ buf_buddy_free(bpage, sizeof *bpage, TRUE);
+ buf_buddy_free(data, zip_size, TRUE);
+
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
bpage = NULL;
goto func_exit;
@@ -2877,18 +3028,26 @@ err_exit:
HASH_INSERT(buf_page_t, hash, buf_pool->page_hash,
buf_page_address_fold(space, offset), bpage);
+ rw_lock_x_unlock(&page_hash_latch);
+
/* The block must be put to the LRU list, to the old blocks */
buf_LRU_add_block(bpage, TRUE/* to old blocks */);
+ mutex_enter(&flush_list_mutex);
buf_LRU_insert_zip_clean(bpage);
+ mutex_exit(&flush_list_mutex);
+
+ mutex_exit(&LRU_list_mutex);
buf_page_set_io_fix(bpage, BUF_IO_READ);
mutex_exit(&buf_pool_zip_mutex);
}
+ mutex_enter(&buf_pool_mutex);
buf_pool->n_pend_reads++;
+ mutex_exit(&buf_pool_mutex);
func_exit:
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
if (mode == BUF_READ_IBUF_PAGES_ONLY) {
@@ -2924,7 +3083,9 @@ buf_page_create(
free_block = buf_LRU_get_free_block(0);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
block = (buf_block_t*) buf_page_hash_get(space, offset);
@@ -2937,7 +3098,9 @@ buf_page_create(
#endif /* UNIV_DEBUG_FILE_ACCESSES */
/* Page can be found in buf_pool */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
buf_block_free(free_block);
@@ -2959,6 +3122,7 @@ buf_page_create(
mutex_enter(&block->mutex);
buf_page_init(space, offset, block);
+ rw_lock_x_unlock(&page_hash_latch);
/* The block must be put to the LRU list */
buf_LRU_add_block(&block->page, FALSE);
@@ -2985,7 +3149,7 @@ buf_page_create(
the reacquisition of buf_pool_mutex. We also must
defer this operation until after the block descriptor
has been added to buf_pool->LRU and buf_pool->page_hash. */
- data = buf_buddy_alloc(zip_size, &lru);
+ data = buf_buddy_alloc(zip_size, &lru, FALSE);
mutex_enter(&block->mutex);
block->page.zip.data = data;
@@ -3001,7 +3165,8 @@ buf_page_create(
rw_lock_x_unlock(&block->lock);
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX);
@@ -3053,6 +3218,8 @@ buf_page_io_complete(
enum buf_io_fix io_type;
const ibool uncompressed = (buf_page_get_state(bpage)
== BUF_BLOCK_FILE_PAGE);
+ enum buf_flush flush_type;
+ mutex_t* block_mutex;
ut_a(buf_page_in_file(bpage));
@@ -3187,8 +3354,23 @@ corrupt:
}
}
- buf_pool_mutex_enter();
- mutex_enter(buf_page_get_mutex(bpage));
+ //buf_pool_mutex_enter();
+ if (io_type == BUF_IO_WRITE) {
+ flush_type = buf_page_get_flush_type(bpage);
+ /* to keep consistency at buf_LRU_insert_zip_clean() */
+ //if (flush_type == BUF_FLUSH_LRU) { /* optimistic! */
+ mutex_enter(&LRU_list_mutex);
+ //}
+ }
+ block_mutex = buf_page_get_mutex(bpage);
+retry_lock:
+ mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
+ mutex_enter(&buf_pool_mutex);
#ifdef UNIV_IBUF_COUNT_DEBUG
if (io_type == BUF_IO_WRITE || uncompressed) {
@@ -3228,6 +3410,11 @@ corrupt:
buf_flush_write_complete(bpage);
+ /* to keep consistency at buf_LRU_insert_zip_clean() */
+ //if (flush_type == BUF_FLUSH_LRU) { /* optimistic! */
+ mutex_exit(&LRU_list_mutex);
+ //}
+
if (uncompressed) {
rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock,
BUF_IO_WRITE);
@@ -3250,8 +3437,9 @@ corrupt:
}
#endif /* UNIV_DEBUG */
- mutex_exit(buf_page_get_mutex(bpage));
- buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
+ mutex_exit(block_mutex);
+ //buf_pool_mutex_exit();
}
/*************************************************************************
@@ -3273,12 +3461,14 @@ buf_pool_invalidate(void)
freed = buf_LRU_search_and_free_block(100);
}
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);
ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
}
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
@@ -3302,7 +3492,10 @@ buf_validate(void)
ut_ad(buf_pool);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
+ /* to keep the new latch order, it cannot validate correctly... */
chunk = buf_pool->chunks;
@@ -3401,7 +3594,7 @@ buf_validate(void)
/* Check clean compressed-only blocks. */
for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
- b = UT_LIST_GET_NEXT(list, b)) {
+ b = UT_LIST_GET_NEXT(zip_list, b)) {
ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
switch (buf_page_get_io_fix(b)) {
case BUF_IO_NONE:
@@ -3426,8 +3619,9 @@ buf_validate(void)
/* Check dirty compressed-only blocks. */
+ mutex_enter(&flush_list_mutex);
for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
- b = UT_LIST_GET_NEXT(list, b)) {
+ b = UT_LIST_GET_NEXT(flush_list, b)) {
ut_ad(b->in_flush_list);
switch (buf_page_get_state(b)) {
@@ -3472,6 +3666,7 @@ buf_validate(void)
}
ut_a(buf_page_hash_get(b->space, b->offset) == b);
}
+ mutex_exit(&flush_list_mutex);
mutex_exit(&buf_pool_zip_mutex);
@@ -3483,19 +3678,27 @@ buf_validate(void)
}
ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru);
+ /* because of latching order with block->mutex, we cannot get free_list_mutex before that */
+/*
if (UT_LIST_GET_LEN(buf_pool->free) != n_free) {
fprintf(stderr, "Free list len %lu, free blocks %lu\n",
(ulong) UT_LIST_GET_LEN(buf_pool->free),
(ulong) n_free);
ut_error;
}
+*/
+ /* because of latching order with block->mutex, we cannot get flush_list_mutex before that */
+/*
ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == n_flush);
ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_single_flush);
ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);
+*/
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
ut_a(buf_LRU_validate());
ut_a(buf_flush_validate());
@@ -3529,7 +3732,10 @@ buf_print(void)
index_ids = mem_alloc(sizeof(dulint) * size);
counts = mem_alloc(sizeof(ulint) * size);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ mutex_enter(&free_list_mutex);
+ mutex_enter(&flush_list_mutex);
fprintf(stderr,
"buf_pool size %lu\n"
@@ -3592,7 +3798,10 @@ buf_print(void)
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ mutex_exit(&free_list_mutex);
+ mutex_exit(&flush_list_mutex);
for (i = 0; i < n_found; i++) {
index = dict_index_get_if_in_cache(index_ids[i]);
@@ -3630,7 +3839,7 @@ buf_get_latched_pages_number(void)
ulint i;
ulint fixed_pages_number = 0;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
chunk = buf_pool->chunks;
@@ -3664,7 +3873,7 @@ buf_get_latched_pages_number(void)
/* Traverse the lists of clean and dirty compressed-only blocks. */
for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
- b = UT_LIST_GET_NEXT(list, b)) {
+ b = UT_LIST_GET_NEXT(zip_list, b)) {
ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
ut_a(buf_page_get_io_fix(b) != BUF_IO_WRITE);
@@ -3674,8 +3883,9 @@ buf_get_latched_pages_number(void)
}
}
+ mutex_enter(&flush_list_mutex);
for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
- b = UT_LIST_GET_NEXT(list, b)) {
+ b = UT_LIST_GET_NEXT(flush_list, b)) {
ut_ad(b->in_flush_list);
switch (buf_page_get_state(b)) {
@@ -3698,9 +3908,10 @@ buf_get_latched_pages_number(void)
break;
}
}
+ mutex_exit(&flush_list_mutex);
mutex_exit(&buf_pool_zip_mutex);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
return(fixed_pages_number);
}
@@ -3757,7 +3968,11 @@ buf_print_io(
ut_ad(buf_pool);
size = buf_pool->curr_size;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ mutex_enter(&free_list_mutex);
+ mutex_enter(&buf_pool_mutex);
+ mutex_enter(&flush_list_mutex);
fprintf(file,
"Buffer pool size %lu\n"
@@ -3824,7 +4039,11 @@ buf_print_io(
buf_LRU_stat_sum.io, buf_LRU_stat_cur.io,
buf_LRU_stat_sum.unzip, buf_LRU_stat_cur.unzip);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ mutex_exit(&free_list_mutex);
+ mutex_exit(&buf_pool_mutex);
+ mutex_exit(&flush_list_mutex);
}
/**************************************************************************
@@ -3853,7 +4072,7 @@ buf_all_freed(void)
ut_ad(buf_pool);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter(); /* optimistic */
chunk = buf_pool->chunks;
@@ -3870,7 +4089,7 @@ buf_all_freed(void)
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit(); /* optimistic */
return(TRUE);
}
@@ -3886,7 +4105,8 @@ buf_pool_check_no_pending_io(void)
{
ibool ret;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
if (buf_pool->n_pend_reads + buf_pool->n_flush[BUF_FLUSH_LRU]
+ buf_pool->n_flush[BUF_FLUSH_LIST]
@@ -3896,7 +4116,8 @@ buf_pool_check_no_pending_io(void)
ret = TRUE;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
return(ret);
}
@@ -3910,11 +4131,13 @@ buf_get_free_list_len(void)
{
ulint len;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&free_list_mutex);
len = UT_LIST_GET_LEN(buf_pool->free);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&free_list_mutex);
return(len);
}
=== modified file 'storage/xtradb/buf/buf0flu.c'
--- a/storage/xtradb/buf/buf0flu.c 2009-05-04 04:32:30 +0000
+++ b/storage/xtradb/buf/buf0flu.c 2009-06-25 01:43:25 +0000
@@ -61,7 +61,9 @@ buf_flush_insert_into_flush_list(
/*=============================*/
buf_block_t* block) /* in/out: block which is modified */
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&block->mutex));
+ ut_ad(mutex_own(&flush_list_mutex));
ut_ad((UT_LIST_GET_FIRST(buf_pool->flush_list) == NULL)
|| (UT_LIST_GET_FIRST(buf_pool->flush_list)->oldest_modification
<= block->page.oldest_modification));
@@ -72,7 +74,7 @@ buf_flush_insert_into_flush_list(
ut_ad(!block->page.in_zip_hash);
ut_ad(!block->page.in_flush_list);
ut_d(block->page.in_flush_list = TRUE);
- UT_LIST_ADD_FIRST(list, buf_pool->flush_list, &block->page);
+ UT_LIST_ADD_FIRST(flush_list, buf_pool->flush_list, &block->page);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
ut_a(buf_flush_validate_low());
@@ -92,7 +94,9 @@ buf_flush_insert_sorted_into_flush_list(
buf_page_t* prev_b;
buf_page_t* b;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&block->mutex));
+ ut_ad(mutex_own(&flush_list_mutex));
ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
ut_ad(block->page.in_LRU_list);
@@ -107,13 +111,13 @@ buf_flush_insert_sorted_into_flush_list(
while (b && b->oldest_modification > block->page.oldest_modification) {
ut_ad(b->in_flush_list);
prev_b = b;
- b = UT_LIST_GET_NEXT(list, b);
+ b = UT_LIST_GET_NEXT(flush_list, b);
}
if (prev_b == NULL) {
- UT_LIST_ADD_FIRST(list, buf_pool->flush_list, &block->page);
+ UT_LIST_ADD_FIRST(flush_list, buf_pool->flush_list, &block->page);
} else {
- UT_LIST_INSERT_AFTER(list, buf_pool->flush_list,
+ UT_LIST_INSERT_AFTER(flush_list, buf_pool->flush_list,
prev_b, &block->page);
}
@@ -134,7 +138,7 @@ buf_flush_ready_for_replace(
buf_page_in_file(bpage) and in the LRU list */
{
//ut_ad(buf_pool_mutex_own());
- //ut_ad(mutex_own(buf_page_get_mutex(bpage)));
+ ut_ad(mutex_own(buf_page_get_mutex(bpage)));
//ut_ad(bpage->in_LRU_list); /* optimistic use */
if (UNIV_LIKELY(bpage->in_LRU_list && buf_page_in_file(bpage))) {
@@ -169,12 +173,12 @@ buf_flush_ready_for_flush(
buf_page_in_file(bpage) */
enum buf_flush flush_type)/* in: BUF_FLUSH_LRU or BUF_FLUSH_LIST */
{
- ut_a(buf_page_in_file(bpage));
- ut_ad(buf_pool_mutex_own());
+ //ut_a(buf_page_in_file(bpage));
+ //ut_ad(buf_pool_mutex_own()); /*optimistic...*/
ut_ad(mutex_own(buf_page_get_mutex(bpage)));
ut_ad(flush_type == BUF_FLUSH_LRU || BUF_FLUSH_LIST);
- if (bpage->oldest_modification != 0
+ if (buf_page_in_file(bpage) && bpage->oldest_modification != 0
&& buf_page_get_io_fix(bpage) == BUF_IO_NONE) {
ut_ad(bpage->in_flush_list);
@@ -203,8 +207,11 @@ buf_flush_remove(
/*=============*/
buf_page_t* bpage) /* in: pointer to the block in question */
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(mutex_own(buf_page_get_mutex(bpage)));
+
+ mutex_enter(&flush_list_mutex);
+
ut_ad(bpage->in_flush_list);
ut_d(bpage->in_flush_list = FALSE);
@@ -216,21 +223,23 @@ buf_flush_remove(
case BUF_BLOCK_READY_FOR_USE:
case BUF_BLOCK_MEMORY:
case BUF_BLOCK_REMOVE_HASH:
+ mutex_exit(&flush_list_mutex);
ut_error;
return;
case BUF_BLOCK_ZIP_DIRTY:
buf_page_set_state(bpage, BUF_BLOCK_ZIP_PAGE);
- UT_LIST_REMOVE(list, buf_pool->flush_list, bpage);
+ UT_LIST_REMOVE(flush_list, buf_pool->flush_list, bpage);
buf_LRU_insert_zip_clean(bpage);
break;
case BUF_BLOCK_FILE_PAGE:
- UT_LIST_REMOVE(list, buf_pool->flush_list, bpage);
+ UT_LIST_REMOVE(flush_list, buf_pool->flush_list, bpage);
break;
}
bpage->oldest_modification = 0;
- ut_d(UT_LIST_VALIDATE(list, buf_page_t, buf_pool->flush_list));
+ ut_d(UT_LIST_VALIDATE(flush_list, buf_page_t, buf_pool->flush_list));
+ mutex_exit(&flush_list_mutex);
}
/************************************************************************
@@ -678,7 +687,9 @@ buf_flush_write_block_low(
io_fixed and oldest_modification != 0. Thus, it cannot be
relocated in the buffer pool or removed from flush_list or
LRU_list. */
- ut_ad(!buf_pool_mutex_own());
+ //ut_ad(!buf_pool_mutex_own());
+ ut_ad(!mutex_own(&LRU_list_mutex));
+ ut_ad(!mutex_own(&flush_list_mutex));
ut_ad(!mutex_own(buf_page_get_mutex(bpage)));
ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_WRITE);
ut_ad(bpage->oldest_modification != 0);
@@ -762,12 +773,19 @@ buf_flush_page(
ibool is_uncompressed;
ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+#ifdef UNIV_SYNC_DEBUG
+ ut_ad(rw_lock_own(&page_hash_latch, RW_LOCK_EX)
+ || rw_lock_own(&page_hash_latch, RW_LOCK_SHARED));
+#endif
ut_ad(buf_page_in_file(bpage));
block_mutex = buf_page_get_mutex(bpage);
ut_ad(mutex_own(block_mutex));
+ mutex_enter(&buf_pool_mutex);
+ rw_lock_s_unlock(&page_hash_latch);
+
ut_ad(buf_flush_ready_for_flush(bpage, flush_type));
buf_page_set_io_fix(bpage, BUF_IO_WRITE);
@@ -798,7 +816,8 @@ buf_flush_page(
}
mutex_exit(block_mutex);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
/* Even though bpage is not protected by any mutex at
this point, it is safe to access bpage, because it is
@@ -835,7 +854,8 @@ buf_flush_page(
immediately. */
mutex_exit(block_mutex);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
break;
default:
@@ -899,7 +919,8 @@ buf_flush_try_neighbors(
high = fil_space_get_size(space);
}
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
for (i = low; i < high; i++) {
@@ -920,7 +941,13 @@ buf_flush_try_neighbors(
|| buf_page_is_old(bpage)) {
mutex_t* block_mutex = buf_page_get_mutex(bpage);
+retry_lock:
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
if (buf_flush_ready_for_flush(bpage, flush_type)
&& (i == offset || !bpage->buf_fix_count)) {
@@ -936,14 +963,16 @@ buf_flush_try_neighbors(
ut_ad(!mutex_own(block_mutex));
count++;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
} else {
mutex_exit(block_mutex);
}
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(count);
}
@@ -980,6 +1009,7 @@ buf_flush_batch(
ulint old_page_count;
ulint space;
ulint offset;
+ ulint remaining = 0;
ut_ad((flush_type == BUF_FLUSH_LRU)
|| (flush_type == BUF_FLUSH_LIST));
@@ -987,20 +1017,28 @@ buf_flush_batch(
ut_ad((flush_type != BUF_FLUSH_LIST)
|| sync_thread_levels_empty_gen(TRUE));
#endif /* UNIV_SYNC_DEBUG */
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
if ((buf_pool->n_flush[flush_type] > 0)
|| (buf_pool->init_flush[flush_type] == TRUE)) {
/* There is already a flush batch of the same type running */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
return(ULINT_UNDEFINED);
}
buf_pool->init_flush[flush_type] = TRUE;
+ mutex_exit(&buf_pool_mutex);
+
+ if (flush_type == BUF_FLUSH_LRU) {
+ mutex_enter(&LRU_list_mutex);
+ }
+
for (;;) {
flush_next:
/* If we have flushed enough, leave the loop */
@@ -1017,7 +1055,10 @@ flush_next:
} else {
ut_ad(flush_type == BUF_FLUSH_LIST);
+ mutex_enter(&flush_list_mutex);
+ remaining = UT_LIST_GET_LEN(buf_pool->flush_list);
bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
+ mutex_exit(&flush_list_mutex);
if (!bpage
|| bpage->oldest_modification >= lsn_limit) {
/* We have flushed enough */
@@ -1037,9 +1078,15 @@ flush_next:
mutex_t*block_mutex = buf_page_get_mutex(bpage);
ibool ready;
+retry_lock_1:
ut_a(buf_page_in_file(bpage));
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock_1;
+ }
ready = buf_flush_ready_for_flush(bpage, flush_type);
mutex_exit(block_mutex);
@@ -1047,7 +1094,10 @@ flush_next:
space = buf_page_get_space(bpage);
offset = buf_page_get_page_no(bpage);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ if (flush_type == BUF_FLUSH_LRU) {
+ mutex_exit(&LRU_list_mutex);
+ }
old_page_count = page_count;
@@ -1057,10 +1107,17 @@ flush_next:
space, offset, flush_type);
} else {
/* Try to flush the page only */
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
mutex_t* block_mutex = buf_page_get_mutex(bpage);
+retry_lock_2:
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock_2;
+ }
buf_page_t* bpage_tmp = buf_page_hash_get(space, offset);
if (bpage_tmp) {
@@ -1073,7 +1130,10 @@ flush_next:
flush_type, offset,
page_count - old_page_count); */
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ if (flush_type == BUF_FLUSH_LRU) {
+ mutex_enter(&LRU_list_mutex);
+ }
goto flush_next;
} else if (flush_type == BUF_FLUSH_LRU) {
@@ -1081,16 +1141,28 @@ flush_next:
} else {
ut_ad(flush_type == BUF_FLUSH_LIST);
- bpage = UT_LIST_GET_PREV(list, bpage);
- ut_ad(!bpage || bpage->in_flush_list);
+ mutex_enter(&flush_list_mutex);
+ bpage = UT_LIST_GET_PREV(flush_list, bpage);
+ //ut_ad(!bpage || bpage->in_flush_list); /* optimistic */
+ mutex_exit(&flush_list_mutex);
+ remaining--;
}
} while (bpage != NULL);
+ if (remaining)
+ goto flush_next;
+
/* If we could not find anything to flush, leave the loop */
break;
}
+ if (flush_type == BUF_FLUSH_LRU) {
+ mutex_exit(&LRU_list_mutex);
+ }
+
+ mutex_enter(&buf_pool_mutex);
+
buf_pool->init_flush[flush_type] = FALSE;
if (buf_pool->n_flush[flush_type] == 0) {
@@ -1100,7 +1172,8 @@ flush_next:
os_event_set(buf_pool->no_flush[flush_type]);
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
buf_flush_buffered_writes();
@@ -1154,7 +1227,7 @@ buf_flush_LRU_recommendation(void)
//buf_pool_mutex_enter();
if (have_LRU_mutex)
- buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
n_replaceable = UT_LIST_GET_LEN(buf_pool->free);
@@ -1173,7 +1246,13 @@ buf_flush_LRU_recommendation(void)
mutex_t* block_mutex = buf_page_get_mutex(bpage);
+retry_lock:
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
if (buf_flush_ready_for_replace(bpage)) {
n_replaceable++;
@@ -1188,7 +1267,7 @@ buf_flush_LRU_recommendation(void)
//buf_pool_mutex_exit();
if (have_LRU_mutex)
- buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
if (n_replaceable >= BUF_FLUSH_FREE_BLOCK_MARGIN) {
@@ -1238,17 +1317,17 @@ buf_flush_validate_low(void)
{
buf_page_t* bpage;
- UT_LIST_VALIDATE(list, buf_page_t, buf_pool->flush_list);
+ UT_LIST_VALIDATE(flush_list, buf_page_t, buf_pool->flush_list);
bpage = UT_LIST_GET_FIRST(buf_pool->flush_list);
while (bpage != NULL) {
const ib_uint64_t om = bpage->oldest_modification;
ut_ad(bpage->in_flush_list);
- ut_a(buf_page_in_file(bpage));
+ //ut_a(buf_page_in_file(bpage)); /* optimistic */
ut_a(om > 0);
- bpage = UT_LIST_GET_NEXT(list, bpage);
+ bpage = UT_LIST_GET_NEXT(flush_list, bpage);
ut_a(!bpage || om >= bpage->oldest_modification);
}
@@ -1266,11 +1345,13 @@ buf_flush_validate(void)
{
ibool ret;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&flush_list_mutex);
ret = buf_flush_validate_low();
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&flush_list_mutex);
return(ret);
}
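
Note on the retry_lock loops in the buf0flu.c hunks above: buf_page_get_mutex(bpage) is read before the mutex is taken, so by the time the lock is acquired the page may be guarded by a different mutex. The patch therefore re-reads the mutex after locking and retries on a mismatch. The following is a minimal standalone sketch of that idea, assuming POSIX threads and C11 atomics in place of InnoDB's mutex_t. The names page_t and page_lock() are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for buf_page_t: the mutex guarding a page may
   change while no lock is held (e.g. when the page changes state). */
typedef struct page {
	_Atomic(pthread_mutex_t *) mutex; /* mutex currently guarding the page */
	int dirty;
} page_t;

/* Lock whichever mutex currently guards p and return it.
   Mirrors the retry_lock loops: lock, re-read, retry on mismatch. */
static pthread_mutex_t *page_lock(page_t *p)
{
	pthread_mutex_t *m = atomic_load(&p->mutex); /* may already be stale */

	for (;;) {
		int err = pthread_mutex_lock(m);
		assert(err == 0);
		(void) err;

		if (m == atomic_load(&p->mutex)) {
			return m; /* still the right mutex: caller owns the page */
		}

		/* The page switched mutexes while we were waiting: retry. */
		pthread_mutex_unlock(m);
		m = atomic_load(&p->mutex);
	}
}
```

The caller unlocks the returned mutex rather than re-reading p->mutex, so the unlock always matches the lock that was actually taken.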
=== modified file 'storage/xtradb/buf/buf0lru.c'
--- a/storage/xtradb/buf/buf0lru.c 2009-05-04 04:32:30 +0000
+++ b/storage/xtradb/buf/buf0lru.c 2009-06-25 01:43:25 +0000
@@ -129,25 +129,31 @@ static
void
buf_LRU_block_free_hashed_page(
/*===========================*/
- buf_block_t* block); /* in: block, must contain a file page and
+ buf_block_t* block, /* in: block, must contain a file page and
be in a state where it can be freed */
+ ibool have_page_hash_mutex);
/**********************************************************************
Determines if the unzip_LRU list should be used for evicting a victim
instead of the general LRU list. */
UNIV_INLINE
ibool
-buf_LRU_evict_from_unzip_LRU(void)
+buf_LRU_evict_from_unzip_LRU(
+ ibool have_LRU_mutex)
/*==============================*/
/* out: TRUE if should use unzip_LRU */
{
ulint io_avg;
ulint unzip_avg;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ if (!have_LRU_mutex)
+ mutex_enter(&LRU_list_mutex);
/* If the unzip_LRU list is empty, we can only use the LRU. */
if (UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0) {
+ if (!have_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
return(FALSE);
}
@@ -156,14 +162,20 @@ buf_LRU_evict_from_unzip_LRU(void)
decompressed pages in the buffer pool. */
if (UT_LIST_GET_LEN(buf_pool->unzip_LRU)
<= UT_LIST_GET_LEN(buf_pool->LRU) / 10) {
+ if (!have_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
return(FALSE);
}
/* If eviction hasn't started yet, we assume by default
that a workload is disk bound. */
if (buf_pool->freed_page_clock == 0) {
+ if (!have_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
return(TRUE);
}
+ if (!have_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
/* Calculate the average over past intervals, and add the values
of the current interval. */
@@ -229,7 +241,8 @@ buf_LRU_drop_page_hash_for_tablespace(
page_arr = ut_malloc(sizeof(ulint)
* BUF_LRU_DROP_SEARCH_HASH_SIZE);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
scan_again:
num_entries = 0;
@@ -239,7 +252,13 @@ scan_again:
mutex_t* block_mutex = buf_page_get_mutex(bpage);
buf_page_t* prev_bpage;
+retry_lock:
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
prev_bpage = UT_LIST_GET_PREV(LRU, bpage);
ut_a(buf_page_in_file(bpage));
@@ -269,12 +288,14 @@ scan_again:
}
/* Array full. We release the buf_pool_mutex to
obey the latching order. */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
buf_LRU_drop_page_hash_batch(id, zip_size, page_arr,
num_entries);
num_entries = 0;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
} else {
mutex_exit(block_mutex);
}
@@ -299,7 +320,8 @@ next_page:
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
/* Drop any remaining batch of search hashed pages. */
buf_LRU_drop_page_hash_batch(id, zip_size, page_arr, num_entries);
@@ -327,7 +349,9 @@ buf_LRU_invalidate_tablespace(
buf_LRU_drop_page_hash_for_tablespace(id);
scan_again:
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
all_freed = TRUE;
@@ -339,7 +363,13 @@ scan_again:
ut_a(buf_page_in_file(bpage));
+retry_lock:
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
prev_bpage = UT_LIST_GET_PREV(LRU, bpage);
if (buf_page_get_space(bpage) == id) {
@@ -369,7 +399,9 @@ scan_again:
ulint page_no;
ulint zip_size;
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
zip_size = buf_page_get_zip_size(bpage);
page_no = buf_page_get_page_no(bpage);
@@ -393,7 +425,7 @@ scan_again:
if (buf_LRU_block_remove_hashed_page(bpage, TRUE)
!= BUF_BLOCK_ZIP_FREE) {
buf_LRU_block_free_hashed_page((buf_block_t*)
- bpage);
+ bpage, TRUE);
} else {
/* The block_mutex should have been
released by buf_LRU_block_remove_hashed_page()
@@ -416,7 +448,9 @@ next_page:
bpage = prev_bpage;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
if (!all_freed) {
os_thread_sleep(20000);
@@ -439,14 +473,16 @@ buf_LRU_get_recent_limit(void)
ulint len;
ulint limit;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
len = UT_LIST_GET_LEN(buf_pool->LRU);
if (len < BUF_LRU_OLD_MIN_LEN) {
/* The LRU list is too short to do read-ahead */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
return(0);
}
@@ -455,7 +491,8 @@ buf_LRU_get_recent_limit(void)
limit = buf_page_get_LRU_position(bpage) - len / BUF_LRU_INITIAL_RATIO;
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
return(limit);
}
@@ -470,7 +507,9 @@ buf_LRU_insert_zip_clean(
{
buf_page_t* b;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
+ ut_ad(mutex_own(&flush_list_mutex));
ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE);
/* Find the first successor of bpage in the LRU list
@@ -478,17 +517,17 @@ buf_LRU_insert_zip_clean(
b = bpage;
do {
b = UT_LIST_GET_NEXT(LRU, b);
- } while (b && buf_page_get_state(b) != BUF_BLOCK_ZIP_PAGE);
+ } while (b && (buf_page_get_state(b) != BUF_BLOCK_ZIP_PAGE || !b->in_LRU_list));
/* Insert bpage before b, i.e., after the predecessor of b. */
if (b) {
- b = UT_LIST_GET_PREV(list, b);
+ b = UT_LIST_GET_PREV(zip_list, b);
}
if (b) {
- UT_LIST_INSERT_AFTER(list, buf_pool->zip_clean, b, bpage);
+ UT_LIST_INSERT_AFTER(zip_list, buf_pool->zip_clean, b, bpage);
} else {
- UT_LIST_ADD_FIRST(list, buf_pool->zip_clean, bpage);
+ UT_LIST_ADD_FIRST(zip_list, buf_pool->zip_clean, bpage);
}
}
@@ -500,16 +539,17 @@ ibool
buf_LRU_free_from_unzip_LRU_list(
/*=============================*/
/* out: TRUE if freed */
- ulint n_iterations) /* in: how many times this has been called
+ ulint n_iterations, /* in: how many times this has been called
repeatedly without result: a high value means
that we should search farther; we will search
n_iterations / 5 of the unzip_LRU list,
or nothing if n_iterations >= 5 */
+ ibool have_LRU_mutex)
{
buf_block_t* block;
ulint distance;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own()); /* optimistic */
/* Theoratically it should be much easier to find a victim
from unzip_LRU as we can choose even a dirty block (as we'll
@@ -519,7 +559,7 @@ buf_LRU_free_from_unzip_LRU_list(
if we have done five iterations so far. */
if (UNIV_UNLIKELY(n_iterations >= 5)
- || !buf_LRU_evict_from_unzip_LRU()) {
+ || !buf_LRU_evict_from_unzip_LRU(have_LRU_mutex)) {
return(FALSE);
}
@@ -527,18 +567,25 @@ buf_LRU_free_from_unzip_LRU_list(
distance = 100 + (n_iterations
* UT_LIST_GET_LEN(buf_pool->unzip_LRU)) / 5;
+restart:
for (block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);
UNIV_LIKELY(block != NULL) && UNIV_LIKELY(distance > 0);
block = UT_LIST_GET_PREV(unzip_LRU, block), distance--) {
enum buf_lru_free_block_status freed;
+ mutex_enter(&block->mutex);
+ if (!block->in_unzip_LRU_list || !block->page.in_LRU_list
+ || buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {
+ mutex_exit(&block->mutex);
+ goto restart;
+ }
+
ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
ut_ad(block->in_unzip_LRU_list);
ut_ad(block->page.in_LRU_list);
- mutex_enter(&block->mutex);
- freed = buf_LRU_free_block(&block->page, FALSE, NULL);
+ freed = buf_LRU_free_block(&block->page, FALSE, NULL, have_LRU_mutex);
mutex_exit(&block->mutex);
switch (freed) {
@@ -571,20 +618,22 @@ ibool
buf_LRU_free_from_common_LRU_list(
/*==============================*/
/* out: TRUE if freed */
- ulint n_iterations) /* in: how many times this has been called
+ ulint n_iterations, /* in: how many times this has been called
repeatedly without result: a high value means
that we should search farther; if
n_iterations < 10, then we search
n_iterations / 10 * buf_pool->curr_size
pages from the end of the LRU list */
+ ibool have_LRU_mutex)
{
buf_page_t* bpage;
ulint distance;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own()); /* optimistic */
distance = 100 + (n_iterations * buf_pool->curr_size) / 10;
+restart:
for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);
UNIV_LIKELY(bpage != NULL) && UNIV_LIKELY(distance > 0);
bpage = UT_LIST_GET_PREV(LRU, bpage), distance--) {
@@ -593,11 +642,25 @@ buf_LRU_free_from_common_LRU_list(
mutex_t* block_mutex
= buf_page_get_mutex(bpage);
+retry_lock:
+ mutex_enter(block_mutex);
+
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
+
+ if (!bpage->in_LRU_list
+ || !buf_page_in_file(bpage)) {
+ mutex_exit(block_mutex);
+ goto restart;
+ }
+
ut_ad(buf_page_in_file(bpage));
ut_ad(bpage->in_LRU_list);
- mutex_enter(block_mutex);
- freed = buf_LRU_free_block(bpage, TRUE, NULL);
+ freed = buf_LRU_free_block(bpage, TRUE, NULL, have_LRU_mutex);
mutex_exit(block_mutex);
switch (freed) {
@@ -640,22 +703,33 @@ buf_LRU_search_and_free_block(
n_iterations / 5 of the unzip_LRU list. */
{
ibool freed = FALSE;
+ ibool have_LRU_mutex = FALSE;
+
+ if (UT_LIST_GET_LEN(buf_pool->unzip_LRU))
+ have_LRU_mutex = TRUE;
- buf_pool_mutex_enter();
+ /* optimistic search... */
+ //buf_pool_mutex_enter();
+ if (have_LRU_mutex)
+ mutex_enter(&LRU_list_mutex);
- freed = buf_LRU_free_from_unzip_LRU_list(n_iterations);
+ freed = buf_LRU_free_from_unzip_LRU_list(n_iterations, have_LRU_mutex);
if (!freed) {
- freed = buf_LRU_free_from_common_LRU_list(n_iterations);
+ freed = buf_LRU_free_from_common_LRU_list(n_iterations, have_LRU_mutex);
}
+ mutex_enter(&buf_pool_mutex);
if (!freed) {
buf_pool->LRU_flush_ended = 0;
} else if (buf_pool->LRU_flush_ended > 0) {
buf_pool->LRU_flush_ended--;
}
+ mutex_exit(&buf_pool_mutex);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ if (have_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
return(freed);
}
@@ -673,18 +747,22 @@ void
buf_LRU_try_free_flushed_blocks(void)
/*=================================*/
{
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
while (buf_pool->LRU_flush_ended > 0) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
buf_LRU_search_and_free_block(1);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
}
/**********************************************************************
@@ -700,7 +778,9 @@ buf_LRU_buf_pool_running_out(void)
{
ibool ret = FALSE;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
+ mutex_enter(&free_list_mutex);
if (!recv_recovery_on && UT_LIST_GET_LEN(buf_pool->free)
+ UT_LIST_GET_LEN(buf_pool->LRU) < buf_pool->curr_size / 4) {
@@ -708,7 +788,9 @@ buf_LRU_buf_pool_running_out(void)
ret = TRUE;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ mutex_exit(&free_list_mutex);
return(ret);
}
@@ -725,9 +807,10 @@ buf_LRU_get_free_only(void)
{
buf_block_t* block;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
- block = (buf_block_t*) UT_LIST_GET_FIRST(buf_pool->free);
+ mutex_enter(&free_list_mutex);
+ block = (buf_block_t*) UT_LIST_GET_LAST(buf_pool->free);
if (block) {
ut_ad(block->page.in_free_list);
@@ -735,7 +818,9 @@ buf_LRU_get_free_only(void)
ut_ad(!block->page.in_flush_list);
ut_ad(!block->page.in_LRU_list);
ut_a(!buf_page_in_file(&block->page));
- UT_LIST_REMOVE(list, buf_pool->free, (&block->page));
+ UT_LIST_REMOVE(free, buf_pool->free, (&block->page));
+
+ mutex_exit(&free_list_mutex);
mutex_enter(&block->mutex);
@@ -743,6 +828,8 @@ buf_LRU_get_free_only(void)
UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE);
mutex_exit(&block->mutex);
+ } else {
+ mutex_exit(&free_list_mutex);
}
return(block);
@@ -767,7 +854,7 @@ buf_LRU_get_free_block(
ibool mon_value_was = FALSE;
ibool started_monitor = FALSE;
loop:
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
if (!recv_recovery_on && UT_LIST_GET_LEN(buf_pool->free)
+ UT_LIST_GET_LEN(buf_pool->LRU) < buf_pool->curr_size / 20) {
@@ -847,14 +934,16 @@ loop:
if (UNIV_UNLIKELY(zip_size)) {
ibool lru;
page_zip_set_size(&block->page.zip, zip_size);
- block->page.zip.data = buf_buddy_alloc(zip_size, &lru);
+ mutex_enter(&LRU_list_mutex);
+ block->page.zip.data = buf_buddy_alloc(zip_size, &lru, FALSE);
+ mutex_exit(&LRU_list_mutex);
UNIV_MEM_DESC(block->page.zip.data, zip_size, block);
} else {
page_zip_set_size(&block->page.zip, 0);
block->page.zip.data = NULL;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
if (started_monitor) {
srv_print_innodb_monitor = mon_value_was;
@@ -866,7 +955,7 @@ loop:
/* If no block was in the free list, search from the end of the LRU
list and try to free a block there */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
freed = buf_LRU_search_and_free_block(n_iterations);
@@ -915,18 +1004,21 @@ loop:
os_aio_simulated_wake_handler_threads();
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
if (buf_pool->LRU_flush_ended > 0) {
/* We have written pages in an LRU flush. To make the insert
buffer more efficient, we try to move these pages to the free
list. */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
buf_LRU_try_free_flushed_blocks();
} else {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
}
if (n_iterations > 10) {
@@ -951,7 +1043,8 @@ buf_LRU_old_adjust_len(void)
ulint new_len;
ut_a(buf_pool->LRU_old);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
#if 3 * (BUF_LRU_OLD_MIN_LEN / 8) <= BUF_LRU_OLD_TOLERANCE + 5
# error "3 * (BUF_LRU_OLD_MIN_LEN / 8) <= BUF_LRU_OLD_TOLERANCE + 5"
#endif
@@ -1009,7 +1102,8 @@ buf_LRU_old_init(void)
{
buf_page_t* bpage;
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == BUF_LRU_OLD_MIN_LEN);
/* We first initialize all blocks in the LRU list as old and then use
@@ -1041,13 +1135,14 @@ buf_unzip_LRU_remove_block_if_needed(
ut_ad(buf_pool);
ut_ad(bpage);
ut_ad(buf_page_in_file(bpage));
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
if (buf_page_belongs_to_unzip_LRU(bpage)) {
buf_block_t* block = (buf_block_t*) bpage;
ut_ad(block->in_unzip_LRU_list);
- ut_d(block->in_unzip_LRU_list = FALSE);
+ block->in_unzip_LRU_list = FALSE;
UT_LIST_REMOVE(unzip_LRU, buf_pool->unzip_LRU, block);
}
@@ -1063,7 +1158,8 @@ buf_LRU_remove_block(
{
ut_ad(buf_pool);
ut_ad(bpage);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
ut_a(buf_page_in_file(bpage));
@@ -1126,12 +1222,13 @@ buf_unzip_LRU_add_block(
{
ut_ad(buf_pool);
ut_ad(block);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
ut_a(buf_page_belongs_to_unzip_LRU(&block->page));
ut_ad(!block->in_unzip_LRU_list);
- ut_d(block->in_unzip_LRU_list = TRUE);
+ block->in_unzip_LRU_list = TRUE;
if (old) {
UT_LIST_ADD_LAST(unzip_LRU, buf_pool->unzip_LRU, block);
@@ -1152,7 +1249,8 @@ buf_LRU_add_block_to_end_low(
ut_ad(buf_pool);
ut_ad(bpage);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
ut_a(buf_page_in_file(bpage));
@@ -1212,7 +1310,8 @@ buf_LRU_add_block_low(
{
ut_ad(buf_pool);
ut_ad(bpage);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
ut_a(buf_page_in_file(bpage));
ut_ad(!bpage->in_LRU_list);
@@ -1331,22 +1430,23 @@ buf_LRU_free_block(
buf_page_t* bpage, /* in: block to be freed */
ibool zip, /* in: TRUE if should remove also the
compressed page of an uncompressed page */
- ibool* buf_pool_mutex_released)
+ ibool* buf_pool_mutex_released,
/* in: pointer to a variable that will
be assigned TRUE if buf_pool_mutex
was temporarily released, or NULL */
+ ibool have_LRU_mutex)
{
buf_page_t* b = NULL;
mutex_t* block_mutex = buf_page_get_mutex(bpage);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(mutex_own(block_mutex));
ut_ad(buf_page_in_file(bpage));
- ut_ad(bpage->in_LRU_list);
+ //ut_ad(bpage->in_LRU_list);
ut_ad(!bpage->in_flush_list == !bpage->oldest_modification);
UNIV_MEM_ASSERT_RW(bpage, sizeof *bpage);
- if (!buf_page_can_relocate(bpage)) {
+ if (!bpage->in_LRU_list || !block_mutex || !buf_page_can_relocate(bpage)) {
/* Do not free buffer-fixed or I/O-fixed blocks. */
return(BUF_LRU_NOT_FREED);
@@ -1378,15 +1478,15 @@ buf_LRU_free_block(
If it cannot be allocated (without freeing a block
from the LRU list), refuse to free bpage. */
alloc:
- buf_pool_mutex_exit_forbid();
- b = buf_buddy_alloc(sizeof *b, NULL);
- buf_pool_mutex_exit_allow();
+ //buf_pool_mutex_exit_forbid();
+ b = buf_buddy_alloc(sizeof *b, NULL, FALSE);
+ //buf_pool_mutex_exit_allow();
if (UNIV_UNLIKELY(!b)) {
return(BUF_LRU_CANNOT_RELOCATE);
}
- memcpy(b, bpage, sizeof *b);
+ //memcpy(b, bpage, sizeof *b);
}
#ifdef UNIV_DEBUG
@@ -1397,6 +1497,39 @@ alloc:
}
#endif /* UNIV_DEBUG */
+ /* not to break latch order, must re-enter block_mutex */
+ mutex_exit(block_mutex);
+
+ if (!have_LRU_mutex)
+ mutex_enter(&LRU_list_mutex); /* optimistic */
+ rw_lock_x_lock(&page_hash_latch);
+ mutex_enter(block_mutex);
+
+ /* recheck states of block */
+ if (!bpage->in_LRU_list || block_mutex != buf_page_get_mutex(bpage)
+ || !buf_page_can_relocate(bpage)) {
+not_freed:
+ if (b) {
+ buf_buddy_free(b, sizeof *b, TRUE);
+ }
+ if (!have_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ return(BUF_LRU_NOT_FREED);
+ } else if (zip || !bpage->zip.data) {
+ if (bpage->oldest_modification)
+ goto not_freed;
+ } else if (bpage->oldest_modification) {
+ if (buf_page_get_state(bpage) != BUF_BLOCK_FILE_PAGE) {
+ ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_DIRTY);
+ goto not_freed;
+ }
+ }
+
+ if (b) {
+ memcpy(b, bpage, sizeof *b);
+ }
+
if (buf_LRU_block_remove_hashed_page(bpage, zip)
!= BUF_BLOCK_ZIP_FREE) {
ut_a(bpage->buf_fix_count == 0);
@@ -1408,6 +1541,10 @@ alloc:
ut_a(!buf_page_hash_get(bpage->space, bpage->offset));
+ while (prev_b && !prev_b->in_LRU_list) {
+ prev_b = UT_LIST_GET_PREV(LRU, prev_b);
+ }
+
b->state = b->oldest_modification
? BUF_BLOCK_ZIP_DIRTY
: BUF_BLOCK_ZIP_PAGE;
@@ -1482,6 +1619,7 @@ alloc:
buf_LRU_add_block_low(b, buf_page_is_old(b));
}
+ mutex_enter(&flush_list_mutex);
if (b->state == BUF_BLOCK_ZIP_PAGE) {
buf_LRU_insert_zip_clean(b);
} else {
@@ -1490,22 +1628,23 @@ alloc:
ut_ad(b->in_flush_list);
ut_d(bpage->in_flush_list = FALSE);
- prev = UT_LIST_GET_PREV(list, b);
- UT_LIST_REMOVE(list, buf_pool->flush_list, b);
+ prev = UT_LIST_GET_PREV(flush_list, b);
+ UT_LIST_REMOVE(flush_list, buf_pool->flush_list, b);
if (prev) {
ut_ad(prev->in_flush_list);
UT_LIST_INSERT_AFTER(
- list,
+ flush_list,
buf_pool->flush_list,
prev, b);
} else {
UT_LIST_ADD_FIRST(
- list,
+ flush_list,
buf_pool->flush_list,
b);
}
}
+ mutex_exit(&flush_list_mutex);
bpage->zip.data = NULL;
page_zip_set_size(&bpage->zip, 0);
@@ -1521,7 +1660,9 @@ alloc:
*buf_pool_mutex_released = TRUE;
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
mutex_exit(block_mutex);
/* Remove possible adaptive hash index on the page.
@@ -1553,7 +1694,9 @@ alloc:
: BUF_NO_CHECKSUM_MAGIC);
}
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ if (have_LRU_mutex)
+ mutex_enter(&LRU_list_mutex);
mutex_enter(block_mutex);
if (b) {
@@ -1563,13 +1706,17 @@ alloc:
mutex_exit(&buf_pool_zip_mutex);
}
- buf_LRU_block_free_hashed_page((buf_block_t*) bpage);
+ buf_LRU_block_free_hashed_page((buf_block_t*) bpage, FALSE);
} else {
/* The block_mutex should have been released by
buf_LRU_block_remove_hashed_page() when it returns
BUF_BLOCK_ZIP_FREE. */
ut_ad(block_mutex == &buf_pool_zip_mutex);
mutex_enter(block_mutex);
+
+ if (!have_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
}
return(BUF_LRU_FREED);
@@ -1581,12 +1728,13 @@ UNIV_INTERN
void
buf_LRU_block_free_non_file_page(
/*=============================*/
- buf_block_t* block) /* in: block, must not contain a file page */
+ buf_block_t* block, /* in: block, must not contain a file page */
+ ibool have_page_hash_mutex)
{
void* data;
ut_ad(block);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(mutex_own(&block->mutex));
switch (buf_block_get_state(block)) {
@@ -1620,15 +1768,17 @@ buf_LRU_block_free_non_file_page(
if (data) {
block->page.zip.data = NULL;
mutex_exit(&block->mutex);
- buf_pool_mutex_exit_forbid();
- buf_buddy_free(data, page_zip_get_size(&block->page.zip));
- buf_pool_mutex_exit_allow();
+ //buf_pool_mutex_exit_forbid();
+ buf_buddy_free(data, page_zip_get_size(&block->page.zip), have_page_hash_mutex);
+ //buf_pool_mutex_exit_allow();
mutex_enter(&block->mutex);
page_zip_set_size(&block->page.zip, 0);
}
- UT_LIST_ADD_FIRST(list, buf_pool->free, (&block->page));
+ mutex_enter(&free_list_mutex);
+ UT_LIST_ADD_FIRST(free, buf_pool->free, (&block->page));
ut_d(block->page.in_free_list = TRUE);
+ mutex_exit(&free_list_mutex);
UNIV_MEM_ASSERT_AND_FREE(block->frame, UNIV_PAGE_SIZE);
}
@@ -1657,7 +1807,11 @@ buf_LRU_block_remove_hashed_page(
{
const buf_page_t* hashed_bpage;
ut_ad(bpage);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
+#ifdef UNIV_SYNC_DEBUG
+ ut_ad(rw_lock_own(&page_hash_latch, RW_LOCK_EX));
+#endif
ut_ad(mutex_own(buf_page_get_mutex(bpage)));
ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
@@ -1758,7 +1912,9 @@ buf_LRU_block_remove_hashed_page(
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
mutex_exit(buf_page_get_mutex(bpage));
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
buf_print();
buf_LRU_print();
buf_validate();
@@ -1781,14 +1937,14 @@ buf_LRU_block_remove_hashed_page(
ut_a(bpage->zip.data);
ut_a(buf_page_get_zip_size(bpage));
- UT_LIST_REMOVE(list, buf_pool->zip_clean, bpage);
+ UT_LIST_REMOVE(zip_list, buf_pool->zip_clean, bpage);
mutex_exit(&buf_pool_zip_mutex);
- buf_pool_mutex_exit_forbid();
+ //buf_pool_mutex_exit_forbid();
buf_buddy_free(bpage->zip.data,
- page_zip_get_size(&bpage->zip));
- buf_buddy_free(bpage, sizeof(*bpage));
- buf_pool_mutex_exit_allow();
+ page_zip_get_size(&bpage->zip), TRUE);
+ buf_buddy_free(bpage, sizeof(*bpage), TRUE);
+ //buf_pool_mutex_exit_allow();
UNIV_MEM_UNDESC(bpage);
return(BUF_BLOCK_ZIP_FREE);
@@ -1807,9 +1963,9 @@ buf_LRU_block_remove_hashed_page(
bpage->zip.data = NULL;
mutex_exit(&((buf_block_t*) bpage)->mutex);
- buf_pool_mutex_exit_forbid();
- buf_buddy_free(data, page_zip_get_size(&bpage->zip));
- buf_pool_mutex_exit_allow();
+ //buf_pool_mutex_exit_forbid();
+ buf_buddy_free(data, page_zip_get_size(&bpage->zip), TRUE);
+ //buf_pool_mutex_exit_allow();
mutex_enter(&((buf_block_t*) bpage)->mutex);
page_zip_set_size(&bpage->zip, 0);
}
@@ -1835,15 +1991,16 @@ static
void
buf_LRU_block_free_hashed_page(
/*===========================*/
- buf_block_t* block) /* in: block, must contain a file page and
+ buf_block_t* block, /* in: block, must contain a file page and
be in a state where it can be freed */
+ ibool have_page_hash_mutex)
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(mutex_own(&block->mutex));
buf_block_set_state(block, BUF_BLOCK_MEMORY);
- buf_LRU_block_free_non_file_page(block);
+ buf_LRU_block_free_non_file_page(block, have_page_hash_mutex);
}
/************************************************************************
@@ -1861,7 +2018,8 @@ buf_LRU_stat_update(void)
goto func_exit;
}
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
/* Update the index. */
item = &buf_LRU_stat_arr[buf_LRU_stat_arr_ind];
@@ -1875,7 +2033,8 @@ buf_LRU_stat_update(void)
/* Put current entry in the array. */
memcpy(item, &buf_LRU_stat_cur, sizeof *item);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
func_exit:
/* Clear the current entry. */
@@ -1897,7 +2056,8 @@ buf_LRU_validate(void)
ulint LRU_pos;
ut_ad(buf_pool);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
if (UT_LIST_GET_LEN(buf_pool->LRU) >= BUF_LRU_OLD_MIN_LEN) {
@@ -1956,15 +2116,21 @@ buf_LRU_validate(void)
ut_a(buf_pool->LRU_old_len == old_len);
}
- UT_LIST_VALIDATE(list, buf_page_t, buf_pool->free);
+ mutex_exit(&LRU_list_mutex);
+ mutex_enter(&free_list_mutex);
+
+ UT_LIST_VALIDATE(free, buf_page_t, buf_pool->free);
for (bpage = UT_LIST_GET_FIRST(buf_pool->free);
bpage != NULL;
- bpage = UT_LIST_GET_NEXT(list, bpage)) {
+ bpage = UT_LIST_GET_NEXT(free, bpage)) {
ut_a(buf_page_get_state(bpage) == BUF_BLOCK_NOT_USED);
}
+ mutex_exit(&free_list_mutex);
+ mutex_enter(&LRU_list_mutex);
+
UT_LIST_VALIDATE(unzip_LRU, buf_block_t, buf_pool->unzip_LRU);
for (block = UT_LIST_GET_FIRST(buf_pool->unzip_LRU);
@@ -1976,7 +2142,8 @@ buf_LRU_validate(void)
ut_a(buf_page_belongs_to_unzip_LRU(&block->page));
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
return(TRUE);
}
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
@@ -1992,7 +2159,8 @@ buf_LRU_print(void)
const buf_page_t* bpage;
ut_ad(buf_pool);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&LRU_list_mutex);
fprintf(stderr, "Pool ulint clock %lu\n",
(ulong) buf_pool->ulint_clock);
@@ -2055,6 +2223,7 @@ buf_LRU_print(void)
bpage = UT_LIST_GET_NEXT(LRU, bpage);
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&LRU_list_mutex);
}
#endif /* UNIV_DEBUG_PRINT || UNIV_DEBUG || UNIV_BUF_DEBUG */
=== modified file 'storage/xtradb/buf/buf0rea.c'
--- a/storage/xtradb/buf/buf0rea.c 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/buf/buf0rea.c 2009-07-06 05:47:15 +0000
@@ -134,6 +134,46 @@ buf_read_page_low(
bpage = buf_page_init_for_read(err, mode, space, zip_size, unzip,
tablespace_version, offset);
if (bpage == NULL) {
+ /* bugfix: http://bugs.mysql.com/bug.php?id=43948 */
+ if (recv_recovery_is_on() && *err == DB_TABLESPACE_DELETED) {
+ /* hashed log recs must be treated here */
+ recv_addr_t* recv_addr;
+
+ mutex_enter(&(recv_sys->mutex));
+
+ if (recv_sys->apply_log_recs == FALSE) {
+ mutex_exit(&(recv_sys->mutex));
+ goto not_to_recover;
+ }
+
+ /* recv_get_fil_addr_struct() */
+ recv_addr = HASH_GET_FIRST(recv_sys->addr_hash,
+ hash_calc_hash(ut_fold_ulint_pair(space, offset),
+ recv_sys->addr_hash));
+ while (recv_addr) {
+ if ((recv_addr->space == space)
+ && (recv_addr->page_no == offset)) {
+ break;
+ }
+ recv_addr = HASH_GET_NEXT(addr_hash, recv_addr);
+ }
+
+ if ((recv_addr == NULL)
+ || (recv_addr->state == RECV_BEING_PROCESSED)
+ || (recv_addr->state == RECV_PROCESSED)) {
+ mutex_exit(&(recv_sys->mutex));
+ goto not_to_recover;
+ }
+
+ fprintf(stderr, " (cannot find space: %lu)", (ulong) space);
+ recv_addr->state = RECV_PROCESSED;
+
+ ut_a(recv_sys->n_addrs);
+ recv_sys->n_addrs--;
+
+ mutex_exit(&(recv_sys->mutex));
+ }
+not_to_recover:
return(0);
}
@@ -246,18 +286,22 @@ buf_read_ahead_random(
LRU_recent_limit = buf_LRU_get_recent_limit();
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
if (buf_pool->n_pend_reads
> buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
return(0);
}
+ mutex_exit(&buf_pool_mutex);
/* Count how many blocks in the area have been recently accessed,
that is, reside near the start of the LRU list. */
+ rw_lock_s_lock(&page_hash_latch);
for (i = low; i < high; i++) {
const buf_page_t* bpage = buf_page_hash_get(space, i);
@@ -269,13 +313,15 @@ buf_read_ahead_random(
if (recent_blocks >= BUF_READ_AHEAD_RANDOM_THRESHOLD) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
goto read_ahead;
}
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
/* Do nothing */
return(0);
@@ -469,10 +515,12 @@ buf_read_ahead_linear(
tablespace_version = fil_space_get_version(space);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&buf_pool_mutex);
if (high > fil_space_get_size(space)) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
/* The area is not whole, return */
return(0);
@@ -480,10 +528,12 @@ buf_read_ahead_linear(
if (buf_pool->n_pend_reads
> buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&buf_pool_mutex);
return(0);
}
+ mutex_exit(&buf_pool_mutex);
/* Check that almost all pages in the area have been accessed; if
offset == low, the accesses must be in a descending order, otherwise,
@@ -497,6 +547,7 @@ buf_read_ahead_linear(
fail_count = 0;
+ rw_lock_s_lock(&page_hash_latch);
for (i = low; i < high; i++) {
bpage = buf_page_hash_get(space, i);
@@ -520,7 +571,8 @@ buf_read_ahead_linear(
* LINEAR_AREA_THRESHOLD_COEF) {
/* Too many failures: return */
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(0);
}
@@ -531,7 +583,8 @@ buf_read_ahead_linear(
bpage = buf_page_hash_get(space, offset);
if (bpage == NULL) {
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(0);
}
@@ -557,7 +610,8 @@ buf_read_ahead_linear(
pred_offset = fil_page_get_prev(frame);
succ_offset = fil_page_get_next(frame);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
if ((offset == low) && (succ_offset == offset + 1)) {
@@ -770,11 +824,11 @@ buf_read_recv_pages(
while (buf_pool->n_pend_reads >= recv_n_pool_free_frames / 2) {
os_aio_simulated_wake_handler_threads();
- os_thread_sleep(500000);
+ os_thread_sleep(10000);
count++;
- if (count > 100) {
+ if (count > 5000) {
fprintf(stderr,
"InnoDB: Error: InnoDB has waited for"
" 50 seconds for pending\n"
=== modified file 'storage/xtradb/dict/dict0boot.c'
--- a/storage/xtradb/dict/dict0boot.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/dict/dict0boot.c 2009-06-25 01:43:25 +0000
@@ -265,6 +265,7 @@ dict_boot(void)
system tables */
/*-------------------------*/
table = dict_mem_table_create("SYS_TABLES", DICT_HDR_SPACE, 8, 0);
+ table->n_mysql_handles_opened = 1; /* for pin */
dict_mem_table_add_col(table, heap, "NAME", DATA_BINARY, 0, 0);
dict_mem_table_add_col(table, heap, "ID", DATA_BINARY, 0, 0);
@@ -314,6 +315,7 @@ dict_boot(void)
/*-------------------------*/
table = dict_mem_table_create("SYS_COLUMNS", DICT_HDR_SPACE, 7, 0);
+ table->n_mysql_handles_opened = 1; /* for pin */
dict_mem_table_add_col(table, heap, "TABLE_ID", DATA_BINARY, 0, 0);
dict_mem_table_add_col(table, heap, "POS", DATA_INT, 0, 4);
@@ -346,6 +348,7 @@ dict_boot(void)
/*-------------------------*/
table = dict_mem_table_create("SYS_INDEXES", DICT_HDR_SPACE, 7, 0);
+ table->n_mysql_handles_opened = 1; /* for pin */
dict_mem_table_add_col(table, heap, "TABLE_ID", DATA_BINARY, 0, 0);
dict_mem_table_add_col(table, heap, "ID", DATA_BINARY, 0, 0);
@@ -388,6 +391,7 @@ dict_boot(void)
/*-------------------------*/
table = dict_mem_table_create("SYS_FIELDS", DICT_HDR_SPACE, 3, 0);
+ table->n_mysql_handles_opened = 1; /* for pin */
dict_mem_table_add_col(table, heap, "INDEX_ID", DATA_BINARY, 0, 0);
dict_mem_table_add_col(table, heap, "POS", DATA_INT, 0, 4);
=== modified file 'storage/xtradb/dict/dict0crea.c'
--- a/storage/xtradb/dict/dict0crea.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/dict/dict0crea.c 2009-06-25 01:43:25 +0000
@@ -1184,6 +1184,9 @@ dict_create_or_check_foreign_constraint_
/* Foreign constraint system tables have already been
created, and they are ok */
+ table1->n_mysql_handles_opened = 1; /* for pin */
+ table2->n_mysql_handles_opened = 1; /* for pin */
+
mutex_exit(&(dict_sys->mutex));
return(DB_SUCCESS);
@@ -1265,6 +1268,11 @@ dict_create_or_check_foreign_constraint_
trx_commit_for_mysql(trx);
+ table1 = dict_table_get_low("SYS_FOREIGN");
+ table2 = dict_table_get_low("SYS_FOREIGN_COLS");
+ table1->n_mysql_handles_opened = 1; /* for pin */
+ table2->n_mysql_handles_opened = 1; /* for pin */
+
row_mysql_unlock_data_dictionary(trx);
trx_free_for_mysql(trx);
=== modified file 'storage/xtradb/dict/dict0dict.c'
--- a/storage/xtradb/dict/dict0dict.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/dict/dict0dict.c 2009-08-03 07:14:02 +0000
@@ -545,6 +545,8 @@ dict_table_get_on_id(
table = dict_table_get_on_id_low(table_id);
+ dict_table_LRU_trim(table);
+
mutex_exit(&(dict_sys->mutex));
return(table);
@@ -659,6 +661,8 @@ dict_table_get(
table->n_mysql_handles_opened++;
}
+ dict_table_LRU_trim(table);
+
mutex_exit(&(dict_sys->mutex));
if (table != NULL) {
@@ -1153,6 +1157,64 @@ dict_table_remove_from_cache(
dict_mem_table_free(table);
}
+/**************************************************************************
+Frees tables from the end of table_LRU if the dictionary cache occupies
+too much space. */
+UNIV_INTERN
+void
+dict_table_LRU_trim(
+/*================*/
+ dict_table_t* self)
+{
+ dict_table_t* table;
+ dict_table_t* prev_table;
+ dict_foreign_t* foreign;
+ ulint n_removed;
+ ulint n_have_parent;
+ ulint cached_foreign_tables;
+
+#ifdef UNIV_SYNC_DEBUG
+ ut_ad(mutex_own(&(dict_sys->mutex)));
+#endif /* UNIV_SYNC_DEBUG */
+
+retry:
+ n_removed = n_have_parent = 0;
+ table = UT_LIST_GET_LAST(dict_sys->table_LRU);
+
+ while ( srv_dict_size_limit && table
+ && ((dict_sys->table_hash->n_cells
+ + dict_sys->table_id_hash->n_cells) * sizeof(hash_cell_t)
+ + dict_sys->size) > srv_dict_size_limit ) {
+ prev_table = UT_LIST_GET_PREV(table_LRU, table);
+
+ if (table == self || table->n_mysql_handles_opened)
+ goto next_loop;
+
+ cached_foreign_tables = 0;
+ foreign = UT_LIST_GET_FIRST(table->foreign_list);
+ while (foreign != NULL) {
+ if (foreign->referenced_table)
+ cached_foreign_tables++;
+ foreign = UT_LIST_GET_NEXT(foreign_list, foreign);
+ }
+
+ if (cached_foreign_tables == 0) {
+ dict_table_remove_from_cache(table);
+ n_removed++;
+ } else {
+ n_have_parent++;
+ }
+next_loop:
+ table = prev_table;
+ }
+
+ if ( srv_dict_size_limit && n_have_parent && n_removed
+ && ((dict_sys->table_hash->n_cells
+ + dict_sys->table_id_hash->n_cells) * sizeof(hash_cell_t)
+ + dict_sys->size) > srv_dict_size_limit )
+ goto retry;
+}
+
/********************************************************************
If the given column name is reserved for InnoDB system columns, return
TRUE. */
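
A simplified, standalone sketch of the trimming idea in dict_table_LRU_trim() above: walk the LRU list from its least recently used end and evict unpinned entries until the cache fits under the limit. The types and functions here (entry_t, cache_t, cache_add, cache_remove, cache_trim) are hypothetical illustrations, not XtraDB's API, and the foreign-key check and retry pass of the real function are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry; `pins` plays the role of
n_mysql_handles_opened in the patch. */
typedef struct entry {
	struct entry*	prev;	/* towards the most recently used end */
	struct entry*	next;	/* towards the least recently used end */
	size_t		size;	/* memory charged to this entry */
	unsigned	pins;	/* non-zero: must not be evicted */
} entry_t;

typedef struct {
	entry_t*	first;	/* most recently used */
	entry_t*	last;	/* least recently used */
	size_t		total;	/* sum of all entry sizes */
} cache_t;

/* Add a new entry at the most recently used end; NULL on allocation
failure. */
static entry_t*
cache_add(cache_t* c, size_t size, unsigned pins)
{
	entry_t*	e = malloc(sizeof *e);

	if (e == NULL) {
		return(NULL);
	}
	e->prev = NULL;
	e->next = c->first;
	e->size = size;
	e->pins = pins;
	if (c->first != NULL) {
		c->first->prev = e;
	} else {
		c->last = e;
	}
	c->first = e;
	c->total += size;
	return(e);
}

/* Unlink one entry from the list and free it. */
static void
cache_remove(cache_t* c, entry_t* e)
{
	if (e->prev != NULL) {
		e->prev->next = e->next;
	} else {
		c->first = e->next;
	}
	if (e->next != NULL) {
		e->next->prev = e->prev;
	} else {
		c->last = e->prev;
	}
	c->total -= e->size;
	free(e);
}

/* Evict from the tail until total <= limit. A limit of 0 means
unlimited, as with srv_dict_size_limit. Pinned entries and `self`
are skipped. Returns the number of evicted entries. */
static size_t
cache_trim(cache_t* c, size_t limit, const entry_t* self)
{
	size_t		n_removed = 0;
	entry_t*	e;

	assert(c != NULL);
	e = c->last;

	while (limit != 0 && e != NULL && c->total > limit) {
		entry_t*	prev = e->prev;	/* save: e may be freed */

		if (e != self && e->pins == 0) {
			cache_remove(c, e);
			n_removed++;
		}
		e = prev;
	}
	return(n_removed);
}
```

As in the patch, the predecessor pointer is saved before the current entry can be freed, and a single pass may end with the cache still over the limit when the remaining entries are pinned.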
@@ -2987,7 +3049,7 @@ scan_more:
} else if (quote) {
/* Within quotes: do not look for
starting quotes or comments. */
- } else if (*sptr == '"' || *sptr == '`') {
+ } else if (*sptr == '"' || *sptr == '`' || *sptr == '\'') {
/* Starting quote: remember the quote character. */
quote = *sptr;
} else if (*sptr == '#'
@@ -4276,7 +4338,8 @@ dict_table_print_low(
ut_ad(mutex_own(&(dict_sys->mutex)));
- dict_update_statistics_low(table, TRUE);
+ if (srv_stats_auto_update)
+ dict_update_statistics_low(table, TRUE);
fprintf(stderr,
"--------------------------------------\n"
=== modified file 'storage/xtradb/dict/dict0load.c'
--- a/storage/xtradb/dict/dict0load.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/dict/dict0load.c 2009-06-25 01:43:25 +0000
@@ -223,7 +223,7 @@ loop:
/* The table definition was corrupt if there
is no index */
- if (dict_table_get_first_index(table)) {
+ if (srv_stats_auto_update && dict_table_get_first_index(table)) {
dict_update_statistics_low(table, TRUE);
}
=== modified file 'storage/xtradb/fil/fil0fil.c'
--- a/storage/xtradb/fil/fil0fil.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/fil/fil0fil.c 2009-06-25 01:43:25 +0000
@@ -42,6 +42,10 @@ Created 10/25/1995 Heikki Tuuri
#include "mtr0log.h"
#include "dict0dict.h"
#include "page0zip.h"
+#include "trx0trx.h"
+#include "trx0sys.h"
+#include "pars0pars.h"
+#include "row0mysql.h"
/*
@@ -2977,7 +2981,7 @@ fil_open_single_table_tablespace(
ut_a(flags != DICT_TF_COMPACT);
file = os_file_create_simple_no_error_handling(
- filepath, OS_FILE_OPEN, OS_FILE_READ_ONLY, &success);
+ filepath, OS_FILE_OPEN, OS_FILE_READ_WRITE, &success);
if (!success) {
/* The following call prints an error message */
os_file_get_last_error(TRUE);
@@ -3025,6 +3029,275 @@ fil_open_single_table_tablespace(
space_id = fsp_header_get_space_id(page);
space_flags = fsp_header_get_flags(page);
+ if (srv_expand_import && (space_id != id || space_flags != flags)) {
+ dulint old_id[31];
+ dulint new_id[31];
+ ulint root_page[31];
+ ulint n_index;
+ os_file_t info_file = -1;
+ char* info_file_path;
+ ulint i;
+ int len;
+ ib_uint64_t current_lsn;
+
+ current_lsn = log_get_lsn();
+
+ /* overwrite fsp header */
+ fsp_header_init_fields(page, id, flags);
+ mach_write_to_4(page + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID, id);
+ space_id = id;
+ space_flags = flags;
+ if (mach_read_ull(page + FIL_PAGE_FILE_FLUSH_LSN) > current_lsn)
+ mach_write_ull(page + FIL_PAGE_FILE_FLUSH_LSN, current_lsn);
+ mach_write_to_4(page + FIL_PAGE_SPACE_OR_CHKSUM,
+ srv_use_checksums
+ ? buf_calc_page_new_checksum(page)
+ : BUF_NO_CHECKSUM_MAGIC);
+ mach_write_to_4(page + UNIV_PAGE_SIZE - FIL_PAGE_END_LSN_OLD_CHKSUM,
+ srv_use_checksums
+ ? buf_calc_page_old_checksum(page)
+ : BUF_NO_CHECKSUM_MAGIC);
+ success = os_file_write(filepath, file, page, 0, 0, UNIV_PAGE_SIZE);
+
+ /* get file size */
+ ulint size_low, size_high, size;
+ ib_int64_t size_bytes;
+ os_file_get_size(file, &size_low, &size_high);
+ size_bytes = (((ib_int64_t)size_high) << 32)
+ + (ib_int64_t)size_low;
+
+ /* get cluster index information */
+ dict_table_t* table;
+ dict_index_t* index;
+ table = dict_table_get_low(name);
+ index = dict_table_get_first_index(table);
+ ut_a(index->page==3);
+
+
+ /* read metadata from .exp file */
+ n_index = 0;
+ bzero(old_id, sizeof(old_id));
+ bzero(new_id, sizeof(new_id));
+ bzero(root_page, sizeof(root_page));
+
+ info_file_path = fil_make_ibd_name(name, FALSE);
+ len = strlen(info_file_path);
+ info_file_path[len - 3] = 'e';
+ info_file_path[len - 2] = 'x';
+ info_file_path[len - 1] = 'p';
+
+ info_file = os_file_create_simple_no_error_handling(
+ info_file_path, OS_FILE_OPEN, OS_FILE_READ_ONLY, &success);
+ if (!success) {
+ fprintf(stderr, "InnoDB: cannot open %s\n", info_file_path);
+ goto skip_info;
+ }
+ success = os_file_read(info_file, page, 0, 0, UNIV_PAGE_SIZE);
+ if (!success) {
+ fprintf(stderr, "InnoDB: cannot read %s\n", info_file_path);
+ goto skip_info;
+ }
+ if (mach_read_from_4(page) != 0x78706f72UL
+ || mach_read_from_4(page + 4) != 0x74696e66UL) {
+ fprintf(stderr, "InnoDB: %s seems not to be a correct .exp file\n", info_file_path);
+ goto skip_info;
+ }
+
+ fprintf(stderr, "InnoDB: import: extended import of %s is started.\n", name);
+
+ n_index = mach_read_from_4(page + 8);
+ fprintf(stderr, "InnoDB: import: %lu indexes are detected.\n", (ulong)n_index);
+ for (i = 0; i < n_index; i++) {
+ new_id[i] =
+ dict_table_get_index_on_name(table,
+ (page + (i + 1) * 512 + 12))->id;
+ old_id[i] = mach_read_from_8(page + (i + 1) * 512);
+ root_page[i] = mach_read_from_4(page + (i + 1) * 512 + 8);
+ }
+
+skip_info:
+ if (info_file != -1)
+ os_file_close(info_file);
+
+ /*
+ if (size_bytes >= 1024 * 1024) {
+ size_bytes = ut_2pow_round(size_bytes, 1024 * 1024);
+ }
+ */
+ if (!(flags & DICT_TF_ZSSIZE_MASK)) {
+ mem_heap_t* heap = NULL;
+ ulint offsets_[REC_OFFS_NORMAL_SIZE];
+ ulint* offsets = offsets_;
+ size = (ulint) (size_bytes / UNIV_PAGE_SIZE);
+ /* overwrite space id of all pages */
+ ib_int64_t offset;
+
+ rec_offs_init(offsets_);
+
+ fprintf(stderr, "InnoDB: Progress in %%:");
+
+ for (offset = 0; offset < size_bytes; offset += UNIV_PAGE_SIZE) {
+ success = os_file_read(file, page,
+ (ulint)(offset & 0xFFFFFFFFUL),
+ (ulint)(offset >> 32), UNIV_PAGE_SIZE);
+ if (mach_read_from_4(page + FIL_PAGE_OFFSET) || !offset) {
+ mach_write_to_4(page + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID, id);
+
+ for (i = 0; i < n_index; i++) {
+ if (offset / UNIV_PAGE_SIZE == root_page[i]) {
+ /* this is index root page */
+ mach_write_to_4(page + FIL_PAGE_DATA + PAGE_BTR_SEG_LEAF
+ + FSEG_HDR_SPACE, id);
+ mach_write_to_4(page + FIL_PAGE_DATA + PAGE_BTR_SEG_TOP
+ + FSEG_HDR_SPACE, id);
+ break;
+ }
+ }
+
+ if (fil_page_get_type(page) == FIL_PAGE_INDEX) {
+ dulint tmp = mach_read_from_8(page + (PAGE_HEADER + PAGE_INDEX_ID));
+
+ if (mach_read_from_2(page + PAGE_HEADER + PAGE_LEVEL) == 0
+ && ut_dulint_cmp(old_id[0], tmp) == 0) {
+ /* leaf page of cluster index, reset trx_id of records */
+ rec_t* rec;
+ rec_t* supremum;
+ ulint n_recs;
+
+ supremum = page_get_supremum_rec(page);
+ rec = page_rec_get_next(page_get_infimum_rec(page));
+ n_recs = page_get_n_recs(page);
+
+ while (rec && rec != supremum && n_recs > 0) {
+ ulint offset = index->trx_id_offset;
+ if (!offset) {
+ offsets = rec_get_offsets(rec, index, offsets,
+ ULINT_UNDEFINED, &heap);
+ offset = row_get_trx_id_offset(rec, index, offsets);
+ }
+ trx_write_trx_id(rec + offset, ut_dulint_create(0, 1));
+ rec = page_rec_get_next(rec);
+ n_recs--;
+ }
+ }
+
+ for (i = 0; i < n_index; i++) {
+ if (ut_dulint_cmp(old_id[i], tmp) == 0) {
+ mach_write_to_8(page + (PAGE_HEADER + PAGE_INDEX_ID), new_id[i]);
+ break;
+ }
+ }
+ }
+
+ if (mach_read_ull(page + FIL_PAGE_LSN) > current_lsn) {
+ mach_write_ull(page + FIL_PAGE_LSN, current_lsn);
+ mach_write_ull(page + UNIV_PAGE_SIZE - FIL_PAGE_END_LSN_OLD_CHKSUM,
+ current_lsn);
+ }
+
+ mach_write_to_4(page + FIL_PAGE_SPACE_OR_CHKSUM,
+ srv_use_checksums
+ ? buf_calc_page_new_checksum(page)
+ : BUF_NO_CHECKSUM_MAGIC);
+ mach_write_to_4(page + UNIV_PAGE_SIZE - FIL_PAGE_END_LSN_OLD_CHKSUM,
+ srv_use_checksums
+ ? buf_calc_page_old_checksum(page)
+ : BUF_NO_CHECKSUM_MAGIC);
+
+ success = os_file_write(filepath, file, page,
+ (ulint)(offset & 0xFFFFFFFFUL),
+ (ulint)(offset >> 32), UNIV_PAGE_SIZE);
+ }
+
+ if (size_bytes
+ && ((ib_int64_t)((offset + UNIV_PAGE_SIZE) * 100) / size_bytes)
+ != ((offset * 100) / size_bytes)) {
+ fprintf(stderr, " %lu",
+ (ulong)((ib_int64_t)((offset + UNIV_PAGE_SIZE) * 100) / size_bytes));
+ }
+ }
+
+ fprintf(stderr, " done.\n");
+
+ /* update SYS_INDEXES set root page */
+ index = dict_table_get_first_index(table);
+ while (index) {
+ for (i = 0; i < n_index; i++) {
+ if (ut_dulint_cmp(new_id[i], index->id) == 0) {
+ break;
+ }
+ }
+
+ if (i != n_index
+ && root_page[i] != index->page) {
+ /* must update */
+ ulint error;
+ trx_t* trx;
+ pars_info_t* info = NULL;
+
+ trx = trx_allocate_for_mysql();
+ trx->op_info = "extended import";
+
+ info = pars_info_create();
+
+ pars_info_add_dulint_literal(info, "indexid", new_id[i]);
+ pars_info_add_int4_literal(info, "new_page", (lint) root_page[i]);
+
+ error = que_eval_sql(info,
+ "PROCEDURE UPDATE_INDEX_PAGE () IS\n"
+ "BEGIN\n"
+ "UPDATE SYS_INDEXES"
+ " SET PAGE_NO = :new_page"
+ " WHERE ID = :indexid;\n"
+ "COMMIT WORK;\n"
+ "END;\n",
+ FALSE, trx);
+
+ if (error != DB_SUCCESS) {
+ fprintf(stderr, "InnoDB: failed to update SYS_INDEXES\n");
+ }
+
+ trx_commit_for_mysql(trx);
+
+ trx_free_for_mysql(trx);
+
+ index->page = root_page[i];
+ }
+
+ index = dict_table_get_next_index(index);
+ }
+ if (UNIV_LIKELY_NULL(heap)) {
+ mem_heap_free(heap);
+ }
+ } else {
+ /* zip page? */
+ size = (ulint)
+ (size_bytes
+ / dict_table_flags_to_zip_size(flags));
+ fprintf(stderr, "InnoDB: import: table %s seems to be in a newer format."
+ " It may not be possible to handle it for now.\n", name);
+ }
+ /* .exp file should be removed */
+ success = os_file_delete(info_file_path);
+ if (!success) {
+ success = os_file_delete_if_exists(info_file_path);
+ }
+ mem_free(info_file_path);
+
+ fil_system_t* system = fil_system;
+ mutex_enter(&(system->mutex));
+ fil_node_t* node = NULL;
+ fil_space_t* space;
+ space = fil_space_get_by_id(id);
+ if (space)
+ node = UT_LIST_GET_FIRST(space->chain);
+ if (node && node->size < size) {
+ space->size += (size - node->size);
+ node->size = size;
+ }
+ mutex_exit(&(system->mutex));
+ }
+
ut_free(buf2);
if (UNIV_UNLIKELY(space_id != id || space_flags != flags)) {
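
The .exp handling in the hunk above checks two 4-byte magic words (0x78706f72 and 0x74696e66, which spell "xportinf" in ASCII), reads an index count at byte 8, and then reads one 512-byte slot per index starting at byte 512: an 8-byte old index id, a 4-byte root page number, and the index name at offset 12. The following is a rough standalone sketch of decoding that layout as read off the patch. All helper names (read_be32, read_be64, exp_magic_ok, exp_index_count, exp_read_index) are hypothetical, and it assumes the big-endian byte order that mach_read_from_4() uses. The sketch also rejects counts above 31, the size of the old_id/new_id/root_page arrays; the patch itself has no such check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXP_SLOT_SIZE	512	/* bytes per index slot, as in the patch */
#define EXP_MAX_INDEXES	31	/* size of the id arrays in the patch */

/* Read a 32-bit big-endian value. */
static uint32_t
read_be32(const unsigned char* p)
{
	return(((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
	       | ((uint32_t) p[2] << 8) | (uint32_t) p[3]);
}

/* Read a 64-bit big-endian value. */
static uint64_t
read_be64(const unsigned char* p)
{
	return(((uint64_t) read_be32(p) << 32) | read_be32(p + 4));
}

/* Returns 1 if the page starts with the "xportinf" magic. */
static int
exp_magic_ok(const unsigned char* page)
{
	return(memcmp(page, "xportinf", 8) == 0);
}

/* Returns the index count stored at byte 8, or -1 if it exceeds
EXP_MAX_INDEXES (a check added in this sketch only). */
static int
exp_index_count(const unsigned char* page)
{
	uint32_t	n = read_be32(page + 8);

	return(n > EXP_MAX_INDEXES ? -1 : (int) n);
}

typedef struct {
	uint64_t	old_id;		/* index id in the exporting server */
	uint32_t	root_page;	/* root page number of the index */
	const char*	name;		/* points into the page buffer */
} exp_index_t;

/* Decode slot i (0-based); slot 0 starts at byte 512. */
static exp_index_t
exp_read_index(const unsigned char* page, size_t i)
{
	const unsigned char*	slot = page + (i + 1) * EXP_SLOT_SIZE;
	exp_index_t		out;

	out.old_id = read_be64(slot);
	out.root_page = read_be32(slot + 8);
	out.name = (const char*) (slot + 12);
	return(out);
}
```

With 512-byte slots and the first slot reserved for the header, a 16 KiB page holds at most 31 index slots, which is consistent with the array sizes used in the patch.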
=== modified file 'storage/xtradb/handler/ha_innodb.cc'
--- a/storage/xtradb/handler/ha_innodb.cc 2009-06-18 12:39:21 +0000
+++ b/storage/xtradb/handler/ha_innodb.cc 2009-08-03 20:09:53 +0000
@@ -157,6 +157,7 @@ static long innobase_mirrored_log_groups
innobase_autoinc_lock_mode;
static unsigned long innobase_read_io_threads, innobase_write_io_threads;
+static my_bool innobase_thread_concurrency_timer_based;
static long long innobase_buffer_pool_size, innobase_log_file_size;
/* The default values for the following char* start-up parameters
@@ -488,6 +489,8 @@ static SHOW_VAR innodb_status_variables[
(char*) &export_vars.innodb_dblwr_pages_written, SHOW_LONG},
{"dblwr_writes",
(char*) &export_vars.innodb_dblwr_writes, SHOW_LONG},
+ {"dict_tables",
+ (char*) &export_vars.innodb_dict_tables, SHOW_LONG},
{"have_atomic_builtins",
(char*) &export_vars.innodb_have_atomic_builtins, SHOW_BOOL},
{"log_waits",
@@ -2100,77 +2103,6 @@ mem_free_and_error:
goto error;
}
-#ifdef HAVE_REPLICATION
-#ifdef MYSQL_SERVER
- if(innobase_overwrite_relay_log_info) {
- /* If InnoDB progressed from relay-log.info, overwrite it */
- if (fname[0] == '\0') {
- fprintf(stderr,
- "InnoDB: something wrong with relay-info.log. InnoDB will not overwrite it.\n");
- } else if (0 != strcmp(fname, trx_sys_mysql_master_log_name)
- || pos != trx_sys_mysql_master_log_pos) {
- /* Overwrite relay-log.info */
- bzero((char*) &info_file, sizeof(info_file));
- fn_format(fname, relay_log_info_file, mysql_data_home, "", 4+32);
-
- int error = 0;
-
- if (!access(fname,F_OK)) {
- /* exist */
- if ((info_fd = my_open(fname, O_RDWR|O_BINARY, MYF(MY_WME))) < 0) {
- error = 1;
- } else if (init_io_cache(&info_file, info_fd, IO_SIZE*2,
- WRITE_CACHE, 0L, 0, MYF(MY_WME))) {
- error = 1;
- }
-
- if (error) {
- if (info_fd >= 0)
- my_close(info_fd, MYF(0));
- goto skip_overwrite;
- }
- } else {
- error = 1;
- goto skip_overwrite;
- }
-
- char buff[FN_REFLEN*2+22*2+4], *pos;
-
- my_b_seek(&info_file, 0L);
- pos=strmov(buff, trx_sys_mysql_relay_log_name);
- *pos++='\n';
- pos=longlong2str(trx_sys_mysql_relay_log_pos, pos, 10);
- *pos++='\n';
- pos=strmov(pos, trx_sys_mysql_master_log_name);
- *pos++='\n';
- pos=longlong2str(trx_sys_mysql_master_log_pos, pos, 10);
- *pos='\n';
-
- if (my_b_write(&info_file, (uchar*) buff, (size_t) (pos-buff)+1))
- error = 1;
- if (flush_io_cache(&info_file))
- error = 1;
-
- end_io_cache(&info_file);
- if (info_fd >= 0)
- my_close(info_fd, MYF(0));
-skip_overwrite:
- if (error) {
- fprintf(stderr,
- "InnoDB: ERROR: error occured during overwriting relay-log.info.\n");
- } else {
- fprintf(stderr,
- "InnoDB: relay-log.info was overwritten.\n");
- }
- } else {
- fprintf(stderr,
- "InnoDB: InnoDB and relay-log.info are synchronized. InnoDB will not overwrite it.\n");
- }
- }
-#endif /* MYSQL_SERVER */
-#endif /* HAVE_REPLICATION */
-
-
srv_extra_undoslots = (ibool) innobase_extra_undoslots;
/* -------------- Log files ---------------------------*/
@@ -2266,6 +2198,9 @@ skip_overwrite:
srv_n_log_files = (ulint) innobase_log_files_in_group;
srv_log_file_size = (ulint) innobase_log_file_size;
+ srv_thread_concurrency_timer_based =
+ (ibool) innobase_thread_concurrency_timer_based;
+
#ifdef UNIV_LOG_ARCHIVE
srv_log_archive_on = (ulint) innobase_log_archive;
#endif /* UNIV_LOG_ARCHIVE */
@@ -2280,6 +2215,7 @@ skip_overwrite:
srv_n_write_io_threads = (ulint) innobase_write_io_threads;
srv_read_ahead &= 3;
+ srv_adaptive_checkpoint %= 3;
srv_force_recovery = (ulint) innobase_force_recovery;
@@ -2329,6 +2265,76 @@ skip_overwrite:
goto mem_free_and_error;
}
+#ifdef HAVE_REPLICATION
+#ifdef MYSQL_SERVER
+ if(innobase_overwrite_relay_log_info) {
+ /* If InnoDB progressed from relay-log.info, overwrite it */
+ if (fname[0] == '\0') {
+ fprintf(stderr,
+ "InnoDB: something wrong with relay-info.log. InnoDB will not overwrite it.\n");
+ } else if (0 != strcmp(fname, trx_sys_mysql_master_log_name)
+ || pos != trx_sys_mysql_master_log_pos) {
+ /* Overwrite relay-log.info */
+ bzero((char*) &info_file, sizeof(info_file));
+ fn_format(fname, relay_log_info_file, mysql_data_home, "", 4+32);
+
+ int error = 0;
+
+ if (!access(fname,F_OK)) {
+ /* exist */
+ if ((info_fd = my_open(fname, O_RDWR|O_BINARY, MYF(MY_WME))) < 0) {
+ error = 1;
+ } else if (init_io_cache(&info_file, info_fd, IO_SIZE*2,
+ WRITE_CACHE, 0L, 0, MYF(MY_WME))) {
+ error = 1;
+ }
+
+ if (error) {
+ if (info_fd >= 0)
+ my_close(info_fd, MYF(0));
+ goto skip_overwrite;
+ }
+ } else {
+ error = 1;
+ goto skip_overwrite;
+ }
+
+ char buff[FN_REFLEN*2+22*2+4], *pos;
+
+ my_b_seek(&info_file, 0L);
+ pos=strmov(buff, trx_sys_mysql_relay_log_name);
+ *pos++='\n';
+ pos=longlong2str(trx_sys_mysql_relay_log_pos, pos, 10);
+ *pos++='\n';
+ pos=strmov(pos, trx_sys_mysql_master_log_name);
+ *pos++='\n';
+ pos=longlong2str(trx_sys_mysql_master_log_pos, pos, 10);
+ *pos='\n';
+
+ if (my_b_write(&info_file, (uchar*) buff, (size_t) (pos-buff)+1))
+ error = 1;
+ if (flush_io_cache(&info_file))
+ error = 1;
+
+ end_io_cache(&info_file);
+ if (info_fd >= 0)
+ my_close(info_fd, MYF(0));
+skip_overwrite:
+ if (error) {
+ fprintf(stderr,
+ "InnoDB: ERROR: error occured during overwriting relay-log.info.\n");
+ } else {
+ fprintf(stderr,
+ "InnoDB: relay-log.info was overwritten.\n");
+ }
+ } else {
+ fprintf(stderr,
+ "InnoDB: InnoDB and relay-log.info are synchronized. InnoDB will not overwrite it.\n");
+ }
+ }
+#endif /* MYSQL_SERVER */
+#endif /* HAVE_REPLICATION */
+
innobase_open_tables = hash_create(200);
pthread_mutex_init(&innobase_share_mutex, MY_MUTEX_INIT_FAST);
pthread_mutex_init(&prepare_commit_mutex, MY_MUTEX_INIT_FAST);
@@ -7079,7 +7085,9 @@ ha_innobase::info(
ib_table = prebuilt->table;
if (flag & HA_STATUS_TIME) {
- if (innobase_stats_on_metadata) {
+ if (innobase_stats_on_metadata
+ && (thd_sql_command(user_thd) == SQLCOM_ANALYZE
+ || srv_stats_auto_update)) {
/* In sql_show we call with this flag: update
then statistics so that they are up-to-date */
@@ -9321,7 +9329,8 @@ ha_innobase::check_if_incompatible_data(
if (info_row_type == ROW_TYPE_DEFAULT)
info_row_type = ROW_TYPE_COMPACT;
if ((info->used_fields & HA_CREATE_USED_ROW_FORMAT) &&
- row_type != info_row_type) {
+ get_row_type() != ((info->row_type == ROW_TYPE_DEFAULT)
+ ? ROW_TYPE_COMPACT : info->row_type)) {
DBUG_PRINT("info", ("get_row_type()=%d != info->row_type=%d -> "
"COMPATIBLE_DATA_NO",
@@ -9830,6 +9839,31 @@ static MYSQL_SYSVAR_ULONGLONG(stats_samp
"The number of index pages to sample when calculating statistics (default 8)",
NULL, NULL, 8, 1, ~0ULL, 0);
+const char *innobase_stats_method_names[]=
+{
+ "nulls_equal",
+ "nulls_unequal",
+ "nulls_ignored",
+ NullS
+};
+TYPELIB innobase_stats_method_typelib=
+{
+ array_elements(innobase_stats_method_names) - 1, "innobase_stats_method_typelib",
+ innobase_stats_method_names, NULL
+};
+static MYSQL_SYSVAR_ENUM(stats_method, srv_stats_method,
+ PLUGIN_VAR_RQCMDARG,
+ "Specifies how InnoDB index statistics collection code should treat NULLs. "
+ "Possible values are the same as for 'myisam_stats_method'. "
+ "This is a startup parameter.",
+ NULL, NULL, 0, &innobase_stats_method_typelib);
+
+static MYSQL_SYSVAR_ULONG(stats_auto_update, srv_stats_auto_update,
+ PLUGIN_VAR_RQCMDARG,
+ "Enable/Disable InnoDB's auto update statistics of indexes. "
+ "(except for ANALYZE TABLE command) 0:disable 1:enable",
+ NULL, NULL, 1, 0, 1, 0);
+
static MYSQL_SYSVAR_BOOL(adaptive_hash_index, btr_search_enabled,
PLUGIN_VAR_OPCMDARG,
"Enable InnoDB adaptive hash index (enabled by default). "
@@ -9907,6 +9941,12 @@ static MYSQL_SYSVAR_ULONG(sync_spin_loop
"Count of spin-loop rounds in InnoDB mutexes",
NULL, NULL, 20L, 0L, ~0L, 0);
+static MYSQL_SYSVAR_BOOL(thread_concurrency_timer_based,
+ innobase_thread_concurrency_timer_based,
+ PLUGIN_VAR_NOCMDARG | PLUGIN_VAR_READONLY,
+ "Use InnoDB timer based concurrency throttling. ",
+ NULL, NULL, FALSE);
+
static MYSQL_SYSVAR_ULONG(thread_concurrency, srv_thread_concurrency,
PLUGIN_VAR_RQCMDARG,
"Helps in performance tuning in heavily concurrent environments. Sets the maximum number of threads allowed inside InnoDB. Value 0 will disable the thread throttling.",
@@ -9953,7 +9993,7 @@ static MYSQL_SYSVAR_STR(change_buffering
static MYSQL_SYSVAR_ULONG(io_capacity, srv_io_capacity,
PLUGIN_VAR_RQCMDARG,
"Number of IO operations per second the server can do. Tunes background IO rate.",
- NULL, NULL, 100, 100, 999999999, 0);
+ NULL, NULL, 200, 100, 999999999, 0);
static MYSQL_SYSVAR_LONGLONG(ibuf_max_size, srv_ibuf_max_size,
PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
@@ -10008,10 +10048,36 @@ static MYSQL_SYSVAR_ENUM(read_ahead, srv
"Control read ahead activity. (none, random, linear, [both])",
NULL, innodb_read_ahead_update, 3, &read_ahead_typelib);
-static MYSQL_SYSVAR_ULONG(adaptive_checkpoint, srv_adaptive_checkpoint,
+static
+void
+innodb_adaptive_checkpoint_update(
+ THD* thd,
+ struct st_mysql_sys_var* var,
+ void* var_ptr,
+ const void* save)
+{
+ *(long *)var_ptr= (*(long *)save) % 3;
+}
+const char *adaptive_checkpoint_names[]=
+{
+ "none", /* 0 */
+ "reflex", /* 1 */
+ "estimate", /* 2 */
+ /* For compatibility of the older patch */
+ "0", /* 3 ("none" + 3) */
+ "1", /* 4 ("reflex" + 3) */
+ "2", /* 5 ("estimate" + 3) */
+ NullS
+};
+TYPELIB adaptive_checkpoint_typelib=
+{
+ array_elements(adaptive_checkpoint_names) - 1, "adaptive_checkpoint_typelib",
+ adaptive_checkpoint_names, NULL
+};
+static MYSQL_SYSVAR_ENUM(adaptive_checkpoint, srv_adaptive_checkpoint,
PLUGIN_VAR_RQCMDARG,
- "Enable/Disable flushing along modified age. 0:disable 1:enable",
- NULL, NULL, 0, 0, 1, 0);
+ "Enable/Disable flushing along modified age. ([none], reflex, estimate)",
+ NULL, innodb_adaptive_checkpoint_update, 0, &adaptive_checkpoint_typelib);
static MYSQL_SYSVAR_ULONG(enable_unsafe_group_commit, srv_enable_unsafe_group_commit,
PLUGIN_VAR_RQCMDARG,
@@ -10021,18 +10087,28 @@ static MYSQL_SYSVAR_ULONG(enable_unsafe_
static MYSQL_SYSVAR_ULONG(read_io_threads, innobase_read_io_threads,
PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
"Number of background read I/O threads in InnoDB.",
- NULL, NULL, 1, 1, 64, 0);
+ NULL, NULL, 8, 1, 64, 0);
static MYSQL_SYSVAR_ULONG(write_io_threads, innobase_write_io_threads,
PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
"Number of background write I/O threads in InnoDB.",
- NULL, NULL, 1, 1, 64, 0);
+ NULL, NULL, 8, 1, 64, 0);
+
+static MYSQL_SYSVAR_ULONG(expand_import, srv_expand_import,
+ PLUGIN_VAR_RQCMDARG,
+ "Enable/Disable automatic conversion of *.ibd files when importing a tablespace.",
+ NULL, NULL, 0, 0, 1, 0);
static MYSQL_SYSVAR_ULONG(extra_rsegments, srv_extra_rsegments,
PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
"Number of extra user rollback segments when create new database.",
NULL, NULL, 0, 0, 127, 0);
+static MYSQL_SYSVAR_ULONG(dict_size_limit, srv_dict_size_limit,
+ PLUGIN_VAR_RQCMDARG,
+ "Limit the allocated memory for dictionary cache. (0: unlimited)",
+ NULL, NULL, 0, 0, LONG_MAX, 0);
+
static struct st_mysql_sys_var* innobase_system_variables[]= {
MYSQL_SYSVAR(additional_mem_pool_size),
MYSQL_SYSVAR(autoextend_increment),
@@ -10069,6 +10145,8 @@ static struct st_mysql_sys_var* innobase
MYSQL_SYSVAR(overwrite_relay_log_info),
MYSQL_SYSVAR(rollback_on_timeout),
MYSQL_SYSVAR(stats_on_metadata),
+ MYSQL_SYSVAR(stats_method),
+ MYSQL_SYSVAR(stats_auto_update),
MYSQL_SYSVAR(stats_sample_pages),
MYSQL_SYSVAR(adaptive_hash_index),
MYSQL_SYSVAR(replication_delay),
@@ -10078,6 +10156,7 @@ static struct st_mysql_sys_var* innobase
MYSQL_SYSVAR(sync_spin_loops),
MYSQL_SYSVAR(table_locks),
MYSQL_SYSVAR(thread_concurrency),
+ MYSQL_SYSVAR(thread_concurrency_timer_based),
MYSQL_SYSVAR(thread_sleep_delay),
MYSQL_SYSVAR(autoinc_lock_mode),
MYSQL_SYSVAR(show_verbose_locks),
@@ -10093,7 +10172,9 @@ static struct st_mysql_sys_var* innobase
MYSQL_SYSVAR(enable_unsafe_group_commit),
MYSQL_SYSVAR(read_io_threads),
MYSQL_SYSVAR(write_io_threads),
+ MYSQL_SYSVAR(expand_import),
MYSQL_SYSVAR(extra_rsegments),
+ MYSQL_SYSVAR(dict_size_limit),
MYSQL_SYSVAR(use_sys_malloc),
MYSQL_SYSVAR(change_buffering),
NULL
@@ -10287,6 +10368,8 @@ i_s_innodb_cmp,
i_s_innodb_cmp_reset,
i_s_innodb_cmpmem,
i_s_innodb_cmpmem_reset,
+i_s_innodb_table_stats,
+i_s_innodb_index_stats,
i_s_innodb_patches
mysql_declare_plugin_end;
=== modified file 'storage/xtradb/handler/i_s.cc'
--- a/storage/xtradb/handler/i_s.cc 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/handler/i_s.cc 2009-06-25 01:43:25 +0000
@@ -45,6 +45,7 @@ extern "C" {
#include "dict0dict.h" /* for dict_index_get_if_in_cache */
#include "trx0rseg.h" /* for trx_rseg_struct */
#include "trx0sys.h" /* for trx_sys */
+#include "dict0dict.h" /* for dict_sys */
/* from buf0buf.c */
struct buf_chunk_struct{
ulint mem_size; /* allocated size of the chunk */
@@ -2282,7 +2283,8 @@ i_s_cmpmem_fill_low(
RETURN_IF_INNODB_NOT_STARTED(tables->schema_table_name);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ mutex_enter(&zip_free_mutex);
for (uint x = 0; x <= BUF_BUDDY_SIZES; x++) {
buf_buddy_stat_t* buddy_stat = &buf_buddy_stat[x];
@@ -2308,7 +2310,8 @@ i_s_cmpmem_fill_low(
}
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&zip_free_mutex);
DBUG_RETURN(status);
}
@@ -2653,3 +2656,299 @@ UNIV_INTERN struct st_mysql_plugin i_s_i
/* void* */
STRUCT_FLD(__reserved1, NULL)
};
+
+/***********************************************************************
+*/
+static ST_FIELD_INFO i_s_innodb_table_stats_info[] =
+{
+ {STRUCT_FLD(field_name, "table_name"),
+ STRUCT_FLD(field_length, NAME_LEN),
+ STRUCT_FLD(field_type, MYSQL_TYPE_STRING),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, 0),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "rows"),
+ STRUCT_FLD(field_length, MY_INT64_NUM_DECIMAL_DIGITS),
+ STRUCT_FLD(field_type, MYSQL_TYPE_LONGLONG),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, MY_I_S_UNSIGNED),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "clust_size"),
+ STRUCT_FLD(field_length, MY_INT64_NUM_DECIMAL_DIGITS),
+ STRUCT_FLD(field_type, MYSQL_TYPE_LONGLONG),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, MY_I_S_UNSIGNED),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "other_size"),
+ STRUCT_FLD(field_length, MY_INT64_NUM_DECIMAL_DIGITS),
+ STRUCT_FLD(field_type, MYSQL_TYPE_LONGLONG),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, MY_I_S_UNSIGNED),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "modified"),
+ STRUCT_FLD(field_length, MY_INT64_NUM_DECIMAL_DIGITS),
+ STRUCT_FLD(field_type, MYSQL_TYPE_LONGLONG),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, MY_I_S_UNSIGNED),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ END_OF_ST_FIELD_INFO
+};
+
+static ST_FIELD_INFO i_s_innodb_index_stats_info[] =
+{
+ {STRUCT_FLD(field_name, "table_name"),
+ STRUCT_FLD(field_length, NAME_LEN),
+ STRUCT_FLD(field_type, MYSQL_TYPE_STRING),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, 0),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "index_name"),
+ STRUCT_FLD(field_length, NAME_LEN),
+ STRUCT_FLD(field_type, MYSQL_TYPE_STRING),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, 0),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "fields"),
+ STRUCT_FLD(field_length, MY_INT64_NUM_DECIMAL_DIGITS),
+ STRUCT_FLD(field_type, MYSQL_TYPE_LONGLONG),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, MY_I_S_UNSIGNED),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "row_per_keys"),
+ STRUCT_FLD(field_length, 256),
+ STRUCT_FLD(field_type, MYSQL_TYPE_STRING),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, 0),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "index_size"),
+ STRUCT_FLD(field_length, MY_INT64_NUM_DECIMAL_DIGITS),
+ STRUCT_FLD(field_type, MYSQL_TYPE_LONGLONG),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, MY_I_S_UNSIGNED),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ {STRUCT_FLD(field_name, "leaf_pages"),
+ STRUCT_FLD(field_length, MY_INT64_NUM_DECIMAL_DIGITS),
+ STRUCT_FLD(field_type, MYSQL_TYPE_LONGLONG),
+ STRUCT_FLD(value, 0),
+ STRUCT_FLD(field_flags, MY_I_S_UNSIGNED),
+ STRUCT_FLD(old_name, ""),
+ STRUCT_FLD(open_method, SKIP_OPEN_TABLE)},
+
+ END_OF_ST_FIELD_INFO
+};
+
+static
+int
+i_s_innodb_table_stats_fill(
+/*========================*/
+ THD* thd,
+ TABLE_LIST* tables,
+ COND* cond)
+{
+ TABLE* i_s_table = (TABLE *) tables->table;
+ int status = 0;
+ dict_table_t* table;
+
+ DBUG_ENTER("i_s_innodb_table_stats_fill");
+
+ /* deny access to users without the PROCESS privilege */
+ if (check_global_access(thd, PROCESS_ACL)) {
+ DBUG_RETURN(0);
+ }
+
+ mutex_enter(&(dict_sys->mutex));
+
+ table = UT_LIST_GET_FIRST(dict_sys->table_LRU);
+
+ while (table) {
+ if (table->stat_clustered_index_size == 0) {
+ table = UT_LIST_GET_NEXT(table_LRU, table);
+ continue;
+ }
+
+ field_store_string(i_s_table->field[0], table->name);
+ i_s_table->field[1]->store(table->stat_n_rows);
+ i_s_table->field[2]->store(table->stat_clustered_index_size);
+ i_s_table->field[3]->store(table->stat_sum_of_other_index_sizes);
+ i_s_table->field[4]->store(table->stat_modified_counter);
+
+ if (schema_table_store_record(thd, i_s_table)) {
+ status = 1;
+ break;
+ }
+
+ table = UT_LIST_GET_NEXT(table_LRU, table);
+ }
+
+ mutex_exit(&(dict_sys->mutex));
+
+ DBUG_RETURN(status);
+}
+
+static
+int
+i_s_innodb_index_stats_fill(
+/*========================*/
+ THD* thd,
+ TABLE_LIST* tables,
+ COND* cond)
+{
+ TABLE* i_s_table = (TABLE *) tables->table;
+ int status = 0;
+ dict_table_t* table;
+ dict_index_t* index;
+
+ DBUG_ENTER("i_s_innodb_index_stats_fill");
+
+ /* deny access to users without the PROCESS privilege */
+ if (check_global_access(thd, PROCESS_ACL)) {
+ DBUG_RETURN(0);
+ }
+
+ mutex_enter(&(dict_sys->mutex));
+
+ table = UT_LIST_GET_FIRST(dict_sys->table_LRU);
+
+ while (table) {
+ if (table->stat_clustered_index_size == 0) {
+ table = UT_LIST_GET_NEXT(table_LRU, table);
+ continue;
+ }
+
+ ib_int64_t n_rows = table->stat_n_rows;
+
+ if (n_rows < 0) {
+ n_rows = 0;
+ }
+
+ index = dict_table_get_first_index(table);
+
+ while (index) {
+ char buff[256+1];
+ char row_per_keys[256+1];
+ ulint i;
+
+ field_store_string(i_s_table->field[0], table->name);
+ field_store_string(i_s_table->field[1], index->name);
+ i_s_table->field[2]->store(index->n_uniq);
+
+ row_per_keys[0] = '\0';
+ if (index->stat_n_diff_key_vals) {
+ for (i = 1; i <= index->n_uniq; i++) {
+ ib_int64_t rec_per_key;
+ if (index->stat_n_diff_key_vals[i]) {
+ rec_per_key = n_rows / index->stat_n_diff_key_vals[i];
+ } else {
+ rec_per_key = n_rows;
+ }
+ snprintf(buff, 256, (i == index->n_uniq)?"%llu":"%llu, ",
+ rec_per_key);
+ strncat(row_per_keys, buff, 256 - strlen(row_per_keys));
+ }
+ }
+ field_store_string(i_s_table->field[3], row_per_keys);
+
+ i_s_table->field[4]->store(index->stat_index_size);
+ i_s_table->field[5]->store(index->stat_n_leaf_pages);
+
+ if (schema_table_store_record(thd, i_s_table)) {
+ status = 1;
+ break;
+ }
+
+ index = dict_table_get_next_index(index);
+ }
+
+ if (status == 1) {
+ break;
+ }
+
+ table = UT_LIST_GET_NEXT(table_LRU, table);
+ }
+
+ mutex_exit(&(dict_sys->mutex));
+
+ DBUG_RETURN(status);
+}
+
+static
+int
+i_s_innodb_table_stats_init(
+/*========================*/
+ void* p)
+{
+ DBUG_ENTER("i_s_innodb_table_stats_init");
+ ST_SCHEMA_TABLE* schema = (ST_SCHEMA_TABLE*) p;
+
+ schema->fields_info = i_s_innodb_table_stats_info;
+ schema->fill_table = i_s_innodb_table_stats_fill;
+
+ DBUG_RETURN(0);
+}
+
+static
+int
+i_s_innodb_index_stats_init(
+/*========================*/
+ void* p)
+{
+ DBUG_ENTER("i_s_innodb_index_stats_init");
+ ST_SCHEMA_TABLE* schema = (ST_SCHEMA_TABLE*) p;
+
+ schema->fields_info = i_s_innodb_index_stats_info;
+ schema->fill_table = i_s_innodb_index_stats_fill;
+
+ DBUG_RETURN(0);
+}
+
+UNIV_INTERN struct st_mysql_plugin i_s_innodb_table_stats =
+{
+ STRUCT_FLD(type, MYSQL_INFORMATION_SCHEMA_PLUGIN),
+ STRUCT_FLD(info, &i_s_info),
+ STRUCT_FLD(name, "INNODB_TABLE_STATS"),
+ STRUCT_FLD(author, plugin_author),
+ STRUCT_FLD(descr, "InnoDB table statistics in memory"),
+ STRUCT_FLD(license, PLUGIN_LICENSE_GPL),
+ STRUCT_FLD(init, i_s_innodb_table_stats_init),
+ STRUCT_FLD(deinit, i_s_common_deinit),
+ STRUCT_FLD(version, 0x0100 /* 1.0 */),
+ STRUCT_FLD(status_vars, NULL),
+ STRUCT_FLD(system_vars, NULL),
+ STRUCT_FLD(__reserved1, NULL)
+};
+
+UNIV_INTERN struct st_mysql_plugin i_s_innodb_index_stats =
+{
+ STRUCT_FLD(type, MYSQL_INFORMATION_SCHEMA_PLUGIN),
+ STRUCT_FLD(info, &i_s_info),
+ STRUCT_FLD(name, "INNODB_INDEX_STATS"),
+ STRUCT_FLD(author, plugin_author),
+ STRUCT_FLD(descr, "InnoDB index statistics in memory"),
+ STRUCT_FLD(license, PLUGIN_LICENSE_GPL),
+ STRUCT_FLD(init, i_s_innodb_index_stats_init),
+ STRUCT_FLD(deinit, i_s_common_deinit),
+ STRUCT_FLD(version, 0x0100 /* 1.0 */),
+ STRUCT_FLD(status_vars, NULL),
+ STRUCT_FLD(system_vars, NULL),
+ STRUCT_FLD(__reserved1, NULL)
+};
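
Note on the row_per_keys column: i_s_innodb_index_stats_fill() above builds it by dividing the table row estimate by each stat_n_diff_key_vals[i] and joining the results with ", ". The following is a minimal standalone sketch of that loop, not code from the patch. The helper name format_rows_per_key and its parameters are hypothetical, and it uses a signed format specifier because the values are signed 64-bit integers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): writes n_rows / n_diff[i]
   for i = 1..n_uniq into out as a comma-separated list. n_diff[0] is
   unused, matching the 1-based loop in i_s_innodb_index_stats_fill().
   A zero n_diff[i] falls back to n_rows, as the patch does. */
static void
format_rows_per_key(char *out, size_t out_size, long long n_rows,
                    const long long *n_diff, unsigned n_uniq)
{
    size_t   used = 0;
    unsigned i;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (i = 1; i <= n_uniq; i++) {
        long long rec_per_key = n_diff[i] ? n_rows / n_diff[i] : n_rows;
        int n = snprintf(out + used, out_size - used,
                         (i == n_uniq) ? "%lld" : "%lld, ", rec_per_key);

        if (n < 0 || (size_t) n >= out_size - used) {
            /* Output was truncated (still NUL-terminated); stop here. */
            break;
        }
        used += (size_t) n;
    }
}
```

Writing directly into the output buffer with a running offset avoids the intermediate buff[] and the strncat() length arithmetic, which may be easier to review.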
=== modified file 'storage/xtradb/handler/i_s.h'
--- a/storage/xtradb/handler/i_s.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/handler/i_s.h 2009-06-25 01:43:25 +0000
@@ -37,5 +37,7 @@ extern struct st_mysql_plugin i_s_innodb
extern struct st_mysql_plugin i_s_innodb_cmpmem_reset;
extern struct st_mysql_plugin i_s_innodb_patches;
extern struct st_mysql_plugin i_s_innodb_rseg;
+extern struct st_mysql_plugin i_s_innodb_table_stats;
+extern struct st_mysql_plugin i_s_innodb_index_stats;
#endif /* i_s_h */
=== modified file 'storage/xtradb/handler/innodb_patch_info.h'
--- a/storage/xtradb/handler/innodb_patch_info.h 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/handler/innodb_patch_info.h 2009-07-06 05:47:15 +0000
@@ -31,5 +31,12 @@ struct innodb_enhancement {
{"innodb_expand_undo_slots","expandable maximum number of undo slots","from 1024 (default) to about 4000","http://www.percona.com/docs/wiki/percona-xtradb"},
{"innodb_extra_rseg","allow to create extra rollback segments","When create new db, the new parameter allows to create more rollback segments","http://www.percona.com/docs/wiki/percona-xtradb"},
{"innodb_overwrite_relay_log_info","overwrite relay-log.info when slave recovery","Building as plugin, it is not used.","http://www.percona.com/docs/wiki/percona-xtradb:innodb_overwrite_relay_log_…"},
+{"innodb_pause_in_spin","use 'pause' instruction during spin loop for x86 (gcc)","","http://www.percona.com/docs/wiki/percona-xtradb"},
+{"innodb_thread_concurrency_timer_based","use InnoDB timer based concurrency throttling (backport from MySQL 5.4.0)","",""},
+{"innodb_expand_import","convert .ibd file automatically when import tablespace","the files are generated by xtrabackup export mode.","http://www.percona.com/docs/wiki/percona-xtradb"},
+{"innodb_dict_size_limit","Limit dictionary cache size","Variable innodb_dict_size_limit in bytes","http://www.percona.com/docs/wiki/percona-xtradb"},
+{"innodb_split_buf_pool_mutex","More fixes for buffer_pool mutex","Splitting buf_pool_mutex and optimizing based on innodb_opt_lru_count","http://www.percona.com/docs/wiki/percona-xtradb"},
+{"innodb_stats","Additional features about InnoDB statistics/optimizer","","http://www.percona.com/docs/wiki/percona-xtradb"},
+{"innodb_recovery_patches","Bugfixes and adjustments about recovery process","","http://www.percona.com/docs/wiki/percona-xtradb"},
{NULL, NULL, NULL, NULL}
};
=== modified file 'storage/xtradb/ibuf/ibuf0ibuf.c'
--- a/storage/xtradb/ibuf/ibuf0ibuf.c 2009-06-22 08:06:35 +0000
+++ b/storage/xtradb/ibuf/ibuf0ibuf.c 2009-08-03 20:09:53 +0000
@@ -472,6 +472,7 @@ ibuf_init_at_db_start(void)
/* Use old-style record format for the insert buffer. */
table = dict_mem_table_create(IBUF_TABLE_NAME, IBUF_SPACE_ID, 1, 0);
+ table->n_mysql_handles_opened = 1; /* for pin */
dict_mem_table_add_col(table, heap, "DUMMY_COLUMN", DATA_BINARY, 0, 0);
=== modified file 'storage/xtradb/include/buf0buddy.h'
--- a/storage/xtradb/include/buf0buddy.h 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/include/buf0buddy.h 2009-06-25 01:43:25 +0000
@@ -49,10 +49,11 @@ buf_buddy_alloc(
/* out: allocated block,
possibly NULL if lru == NULL */
ulint size, /* in: block size, up to UNIV_PAGE_SIZE */
- ibool* lru) /* in: pointer to a variable that will be assigned
+ ibool* lru, /* in: pointer to a variable that will be assigned
TRUE if storage was allocated from the LRU list
and buf_pool_mutex was temporarily released,
or NULL if the LRU list should not be used */
+ ibool have_page_hash_mutex)
__attribute__((malloc));
/**************************************************************************
@@ -63,7 +64,8 @@ buf_buddy_free(
/*===========*/
void* buf, /* in: block to be freed, must not be
pointed to by the buffer pool */
- ulint size) /* in: block size, up to UNIV_PAGE_SIZE */
+ ulint size, /* in: block size, up to UNIV_PAGE_SIZE */
+ ibool have_page_hash_mutex)
__attribute__((nonnull));
/** Statistics of buddy blocks of a given size. */
=== modified file 'storage/xtradb/include/buf0buddy.ic'
--- a/storage/xtradb/include/buf0buddy.ic 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/include/buf0buddy.ic 2009-06-25 01:43:25 +0000
@@ -44,10 +44,11 @@ buf_buddy_alloc_low(
possibly NULL if lru==NULL */
ulint i, /* in: index of buf_pool->zip_free[],
or BUF_BUDDY_SIZES */
- ibool* lru) /* in: pointer to a variable that will be assigned
+ ibool* lru, /* in: pointer to a variable that will be assigned
TRUE if storage was allocated from the LRU list
and buf_pool_mutex was temporarily released,
or NULL if the LRU list should not be used */
+ ibool have_page_hash_mutex)
__attribute__((malloc));
/**************************************************************************
@@ -58,8 +59,9 @@ buf_buddy_free_low(
/*===============*/
void* buf, /* in: block to be freed, must not be
pointed to by the buffer pool */
- ulint i) /* in: index of buf_pool->zip_free[],
+ ulint i, /* in: index of buf_pool->zip_free[],
or BUF_BUDDY_SIZES */
+ ibool have_page_hash_mutex)
__attribute__((nonnull));
/**************************************************************************
@@ -98,14 +100,15 @@ buf_buddy_alloc(
/* out: allocated block,
possibly NULL if lru == NULL */
ulint size, /* in: block size, up to UNIV_PAGE_SIZE */
- ibool* lru) /* in: pointer to a variable that will be assigned
+ ibool* lru, /* in: pointer to a variable that will be assigned
TRUE if storage was allocated from the LRU list
and buf_pool_mutex was temporarily released,
or NULL if the LRU list should not be used */
+ ibool have_page_hash_mutex)
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
- return(buf_buddy_alloc_low(buf_buddy_get_slot(size), lru));
+ return(buf_buddy_alloc_low(buf_buddy_get_slot(size), lru, have_page_hash_mutex));
}
/**************************************************************************
@@ -116,11 +119,24 @@ buf_buddy_free(
/*===========*/
void* buf, /* in: block to be freed, must not be
pointed to by the buffer pool */
- ulint size) /* in: block size, up to UNIV_PAGE_SIZE */
+ ulint size, /* in: block size, up to UNIV_PAGE_SIZE */
+ ibool have_page_hash_mutex)
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
- buf_buddy_free_low(buf, buf_buddy_get_slot(size));
+ if (!have_page_hash_mutex) {
+ mutex_enter(&LRU_list_mutex);
+ rw_lock_x_lock(&page_hash_latch);
+ }
+
+ mutex_enter(&zip_free_mutex);
+ buf_buddy_free_low(buf, buf_buddy_get_slot(size), TRUE);
+ mutex_exit(&zip_free_mutex);
+
+ if (!have_page_hash_mutex) {
+ mutex_exit(&LRU_list_mutex);
+ rw_lock_x_unlock(&page_hash_latch);
+ }
}
#ifdef UNIV_MATERIALIZE
=== modified file 'storage/xtradb/include/buf0buf.h'
--- a/storage/xtradb/include/buf0buf.h 2009-05-04 04:32:30 +0000
+++ b/storage/xtradb/include/buf0buf.h 2009-06-25 01:43:25 +0000
@@ -1024,7 +1024,7 @@ struct buf_page_struct{
/* 2. Page flushing fields; protected by buf_pool_mutex */
- UT_LIST_NODE_T(buf_page_t) list;
+ /* UT_LIST_NODE_T(buf_page_t) list; */
/* based on state, this is a list
node in one of the following lists
in buf_pool:
@@ -1034,6 +1034,10 @@ struct buf_page_struct{
BUF_BLOCK_ZIP_DIRTY: flush_list
BUF_BLOCK_ZIP_PAGE: zip_clean
BUF_BLOCK_ZIP_FREE: zip_free[] */
+ /* resplit for optimistic use */
+ UT_LIST_NODE_T(buf_page_t) free;
+ UT_LIST_NODE_T(buf_page_t) flush_list;
+ UT_LIST_NODE_T(buf_page_t) zip_list; /* zip_clean or zip_free[] */
#ifdef UNIV_DEBUG
ibool in_flush_list; /* TRUE if in buf_pool->flush_list;
when buf_pool_mutex is free, the
@@ -1104,11 +1108,11 @@ struct buf_block_struct{
a block is in the unzip_LRU list
if page.state == BUF_BLOCK_FILE_PAGE
and page.zip.data != NULL */
-#ifdef UNIV_DEBUG
+//#ifdef UNIV_DEBUG
ibool in_unzip_LRU_list;/* TRUE if the page is in the
decompressed LRU list;
used in debugging */
-#endif /* UNIV_DEBUG */
+//#endif /* UNIV_DEBUG */
byte* frame; /* pointer to buffer frame which
is of size UNIV_PAGE_SIZE, and
aligned to an address divisible by
@@ -1316,6 +1320,12 @@ struct buf_pool_struct{
/* mutex protecting the buffer pool struct and control blocks, except the
read-write lock in them */
extern mutex_t buf_pool_mutex;
+extern mutex_t LRU_list_mutex;
+extern mutex_t flush_list_mutex;
+extern rw_lock_t page_hash_latch;
+extern mutex_t free_list_mutex;
+extern mutex_t zip_free_mutex;
+extern mutex_t zip_hash_mutex;
/* mutex protecting the control blocks of compressed-only pages
(of type buf_page_t, not buf_block_t) */
extern mutex_t buf_pool_zip_mutex;
=== modified file 'storage/xtradb/include/buf0buf.ic'
--- a/storage/xtradb/include/buf0buf.ic 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/include/buf0buf.ic 2009-06-25 01:43:25 +0000
@@ -100,7 +100,9 @@ buf_pool_get_oldest_modification(void)
buf_page_t* bpage;
ib_uint64_t lsn;
- buf_pool_mutex_enter();
+try_again:
+ //buf_pool_mutex_enter();
+ mutex_enter(&flush_list_mutex);
bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
@@ -109,9 +111,14 @@ buf_pool_get_oldest_modification(void)
} else {
ut_ad(bpage->in_flush_list);
lsn = bpage->oldest_modification;
+ if (lsn == 0) {
+ mutex_exit(&flush_list_mutex);
+ goto try_again;
+ }
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(&flush_list_mutex);
/* The returned answer may be out of date: the flush_list can
change after the mutex has been released. */
@@ -128,7 +135,8 @@ buf_pool_clock_tic(void)
/*====================*/
/* out: new clock value */
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
buf_pool->ulint_clock++;
@@ -246,7 +254,7 @@ buf_page_in_file(
case BUF_BLOCK_ZIP_FREE:
/* This is a free page in buf_pool->zip_free[].
Such pages should only be accessed by the buddy allocator. */
- ut_error;
+ /* ut_error; */ /* optimistic */
break;
case BUF_BLOCK_ZIP_PAGE:
case BUF_BLOCK_ZIP_DIRTY:
@@ -288,7 +296,7 @@ buf_page_get_LRU_position(
const buf_page_t* bpage) /* in: control block */
{
ut_ad(buf_page_in_file(bpage));
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own()); /* This is used in optimistic */
return(bpage->LRU_position);
}
@@ -305,7 +313,7 @@ buf_page_get_mutex(
{
switch (buf_page_get_state(bpage)) {
case BUF_BLOCK_ZIP_FREE:
- ut_error;
+ /* ut_error; */ /* optimistic */
return(NULL);
case BUF_BLOCK_ZIP_PAGE:
case BUF_BLOCK_ZIP_DIRTY:
@@ -410,7 +418,7 @@ buf_page_set_io_fix(
buf_page_t* bpage, /* in/out: control block */
enum buf_io_fix io_fix) /* in: io_fix state */
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(mutex_own(buf_page_get_mutex(bpage)));
bpage->io_fix = io_fix;
@@ -438,12 +446,13 @@ buf_page_can_relocate(
/*==================*/
const buf_page_t* bpage) /* control block being relocated */
{
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(mutex_own(buf_page_get_mutex(bpage)));
ut_ad(buf_page_in_file(bpage));
- ut_ad(bpage->in_LRU_list);
+ /* optimistic */
+ //ut_ad(bpage->in_LRU_list);
- return(buf_page_get_io_fix(bpage) == BUF_IO_NONE
+ return(bpage->in_LRU_list && bpage->io_fix == BUF_IO_NONE
&& bpage->buf_fix_count == 0);
}
@@ -457,7 +466,7 @@ buf_page_is_old(
const buf_page_t* bpage) /* in: control block */
{
ut_ad(buf_page_in_file(bpage));
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own()); /* This is used in optimistic */
return(bpage->old);
}
@@ -472,7 +481,8 @@ buf_page_set_old(
ibool old) /* in: old */
{
ut_a(buf_page_in_file(bpage));
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+ ut_ad(mutex_own(&LRU_list_mutex));
ut_ad(bpage->in_LRU_list);
#ifdef UNIV_LRU_DEBUG
@@ -728,17 +738,17 @@ buf_block_free(
/*===========*/
buf_block_t* block) /* in, own: block to be freed */
{
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
mutex_enter(&block->mutex);
ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);
- buf_LRU_block_free_non_file_page(block);
+ buf_LRU_block_free_non_file_page(block, FALSE);
mutex_exit(&block->mutex);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
}
/*************************************************************************
@@ -783,14 +793,23 @@ buf_page_io_query(
buf_page_t* bpage) /* in: buf_pool block, must be bufferfixed */
{
ibool io_fixed;
+ mutex_t* block_mutex = buf_page_get_mutex(bpage);
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+retry_lock:
+ mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
ut_ad(buf_page_in_file(bpage));
ut_ad(bpage->buf_fix_count > 0);
io_fixed = buf_page_get_io_fix(bpage) != BUF_IO_NONE;
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ mutex_exit(block_mutex);
return(io_fixed);
}
@@ -809,7 +828,13 @@ buf_page_get_newest_modification(
ib_uint64_t lsn;
mutex_t* block_mutex = buf_page_get_mutex(bpage);
+retry_lock:
mutex_enter(block_mutex);
+ if (block_mutex != buf_page_get_mutex(bpage)) {
+ mutex_exit(block_mutex);
+ block_mutex = buf_page_get_mutex(bpage);
+ goto retry_lock;
+ }
if (buf_page_in_file(bpage)) {
lsn = bpage->newest_modification;
@@ -833,7 +858,7 @@ buf_block_modify_clock_inc(
buf_block_t* block) /* in: block */
{
#ifdef UNIV_SYNC_DEBUG
- ut_ad((buf_pool_mutex_own()
+ ut_ad((mutex_own(&LRU_list_mutex)
&& (block->page.buf_fix_count == 0))
|| rw_lock_own(&(block->lock), RW_LOCK_EXCLUSIVE));
#endif /* UNIV_SYNC_DEBUG */
@@ -917,7 +942,11 @@ buf_page_hash_get(
ulint fold;
ut_ad(buf_pool);
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
+#ifdef UNIV_SYNC_DEBUG
+ ut_ad(rw_lock_own(&page_hash_latch, RW_LOCK_EX)
+ || rw_lock_own(&page_hash_latch, RW_LOCK_SHARED));
+#endif
/* Look for the page in the hash table */
@@ -966,11 +995,13 @@ buf_page_peek(
{
const buf_page_t* bpage;
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
+ rw_lock_s_lock(&page_hash_latch);
bpage = buf_page_hash_get(space, offset);
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ rw_lock_s_unlock(&page_hash_latch);
return(bpage != NULL);
}
@@ -1032,11 +1063,14 @@ buf_page_release(
ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
ut_a(block->page.buf_fix_count > 0);
+ /* buf_flush_note_modification() should be called before this function. */
+/*
if (rw_latch == RW_X_LATCH && mtr->modifications) {
buf_pool_mutex_enter();
buf_flush_note_modification(block, mtr);
buf_pool_mutex_exit();
}
+*/
mutex_enter(&block->mutex);
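
Note on the retry_lock pattern in buf_page_io_query() and buf_page_get_newest_modification() above: without buf_pool_mutex, the mutex that protects a page can change between looking it up and acquiring it, so the code re-checks after locking and retries. Below is a minimal sketch of the same idea using POSIX mutexes. The struct page type and page_lock() are hypothetical names for illustration only, and the sketch assumes the pointer read is atomic on the target platform.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for buf_page_t: the protecting mutex may be
   swapped by another thread (e.g. when the page is relocated). */
struct page {
    pthread_mutex_t *mutex;  /* mutex that currently protects the page */
    int              io_fix;
};

/* Lock whichever mutex currently protects p and return it.
   Assumption: reading p->mutex without a lock is an atomic pointer
   read; production code would need to guarantee that. */
static pthread_mutex_t *
page_lock(struct page *p)
{
    pthread_mutex_t *m;

    assert(p != NULL);
    m = p->mutex;

    for (;;) {
        pthread_mutex_lock(m);
        if (m == p->mutex) {
            return m;            /* still the right mutex: done */
        }
        /* The page changed owners while we waited: retry. */
        pthread_mutex_unlock(m);
        m = p->mutex;
    }
}
```

The caller unlocks the returned mutex when done. Note that in the patch buf_page_get_mutex() can now return NULL for BUF_BLOCK_ZIP_FREE; this sketch does not model that case.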
=== modified file 'storage/xtradb/include/buf0flu.ic'
--- a/storage/xtradb/include/buf0flu.ic 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/include/buf0flu.ic 2009-06-25 01:43:25 +0000
@@ -53,13 +53,23 @@ buf_flush_note_modification(
buf_block_t* block, /* in: block which is modified */
mtr_t* mtr) /* in: mtr */
{
+ ibool use_LRU_mutex = FALSE;
+
+ if (UT_LIST_GET_LEN(buf_pool->unzip_LRU))
+ use_LRU_mutex = TRUE;
+
+ if (use_LRU_mutex)
+ mutex_enter(&LRU_list_mutex);
+
+ mutex_enter(&block->mutex);
+
ut_ad(block);
ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
ut_ad(block->page.buf_fix_count > 0);
#ifdef UNIV_SYNC_DEBUG
ut_ad(rw_lock_own(&(block->lock), RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */
- ut_ad(buf_pool_mutex_own());
+ //ut_ad(buf_pool_mutex_own());
ut_ad(mtr->start_lsn != 0);
ut_ad(mtr->modifications);
@@ -68,16 +78,23 @@ buf_flush_note_modification(
block->page.newest_modification = mtr->end_lsn;
if (!block->page.oldest_modification) {
+ mutex_enter(&flush_list_mutex);
block->page.oldest_modification = mtr->start_lsn;
ut_ad(block->page.oldest_modification != 0);
buf_flush_insert_into_flush_list(block);
+ mutex_exit(&flush_list_mutex);
} else {
ut_ad(block->page.oldest_modification <= mtr->start_lsn);
}
+ mutex_exit(&block->mutex);
+
++srv_buf_pool_write_requests;
+
+ if (use_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
}
/************************************************************************
@@ -92,6 +109,16 @@ buf_flush_recv_note_modification(
ib_uint64_t end_lsn) /* in: end lsn of the last mtr in the
set of mtr's */
{
+ ibool use_LRU_mutex = FALSE;
+
+ if(UT_LIST_GET_LEN(buf_pool->unzip_LRU))
+ use_LRU_mutex = TRUE;
+
+ if (use_LRU_mutex)
+ mutex_enter(&LRU_list_mutex);
+
+ mutex_enter(&(block->mutex));
+
ut_ad(block);
ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
ut_ad(block->page.buf_fix_count > 0);
@@ -99,22 +126,27 @@ buf_flush_recv_note_modification(
ut_ad(rw_lock_own(&(block->lock), RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */
- buf_pool_mutex_enter();
+ //buf_pool_mutex_enter();
ut_ad(block->page.newest_modification <= end_lsn);
block->page.newest_modification = end_lsn;
if (!block->page.oldest_modification) {
+ mutex_enter(&flush_list_mutex);
block->page.oldest_modification = start_lsn;
ut_ad(block->page.oldest_modification != 0);
buf_flush_insert_sorted_into_flush_list(block);
+ mutex_exit(&flush_list_mutex);
} else {
ut_ad(block->page.oldest_modification <= start_lsn);
}
- buf_pool_mutex_exit();
+ //buf_pool_mutex_exit();
+ if (use_LRU_mutex)
+ mutex_exit(&LRU_list_mutex);
+ mutex_exit(&(block->mutex));
}
=== modified file 'storage/xtradb/include/buf0lru.h'
--- a/storage/xtradb/include/buf0lru.h 2009-05-04 02:45:47 +0000
+++ b/storage/xtradb/include/buf0lru.h 2009-06-25 01:43:25 +0000
@@ -122,10 +122,11 @@ buf_LRU_free_block(
buf_page_t* bpage, /* in: block to be freed */
ibool zip, /* in: TRUE if should remove also the
compressed page of an uncompressed page */
- ibool* buf_pool_mutex_released);
+ ibool* buf_pool_mutex_released,
/* in: pointer to a variable that will
be assigned TRUE if buf_pool_mutex
was temporarily released, or NULL */
+ ibool have_LRU_mutex);
/**********************************************************************
Try to free a replaceable block. */
UNIV_INTERN
@@ -169,7 +170,8 @@ UNIV_INTERN
void
buf_LRU_block_free_non_file_page(
/*=============================*/
- buf_block_t* block); /* in: block, must not contain a file page */
+ buf_block_t* block, /* in: block, must not contain a file page */
+ ibool have_page_hash_mutex);
/**********************************************************************
Adds a block to the LRU list. */
UNIV_INTERN
=== modified file 'storage/xtradb/include/dict0dict.h'
--- a/storage/xtradb/include/dict0dict.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/dict0dict.h 2009-06-25 01:43:25 +0000
@@ -1102,6 +1102,12 @@ dict_table_get_index_on_name_and_min_id(
/* out: index, NULL if does not exist */
dict_table_t* table, /* in: table */
const char* name); /* in: name of the index to find */
+
+UNIV_INTERN
+void
+dict_table_LRU_trim(
+/*================*/
+ dict_table_t* self);
/* Buffers for storing detailed information about the latest foreign key
and unique key errors */
extern FILE* dict_foreign_err_file;
=== modified file 'storage/xtradb/include/dict0dict.ic'
--- a/storage/xtradb/include/dict0dict.ic 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/dict0dict.ic 2009-06-25 01:43:25 +0000
@@ -723,6 +723,13 @@ dict_table_check_if_in_cache_low(
HASH_SEARCH(name_hash, dict_sys->table_hash, table_fold,
dict_table_t*, table, ut_ad(table->cached),
!strcmp(table->name, table_name));
+
+ /* make young in table_LRU */
+ if (table) {
+ UT_LIST_REMOVE(table_LRU, dict_sys->table_LRU, table);
+ UT_LIST_ADD_FIRST(table_LRU, dict_sys->table_LRU, table);
+ }
+
return(table);
}
@@ -776,6 +783,12 @@ dict_table_get_on_id_low(
table = dict_load_table_on_id(table_id);
}
+ /* make young in table_LRU */
+ if (table) {
+ UT_LIST_REMOVE(table_LRU, dict_sys->table_LRU, table);
+ UT_LIST_ADD_FIRST(table_LRU, dict_sys->table_LRU, table);
+ }
+
ut_ad(!table || table->cached);
/* TODO: should get the type information from MySQL */
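
Note on the two "make young in table_LRU" hunks above: each successful lookup removes the table from dict_sys->table_LRU and re-adds it at the head, so the tail holds the least recently used tables (presumably the candidates for dict_table_LRU_trim()). A minimal sketch of that move-to-front step on a plain doubly linked list follows. The struct and function names are hypothetical and do not use the UT_LIST macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list types standing in for UT_LIST_BASE_NODE_T /
   UT_LIST_NODE_T; not the InnoDB definitions. */
struct node { struct node *prev, *next; int id; };
struct list { struct node *first, *last; };

static void
list_add_first(struct list *l, struct node *n)
{
    n->prev = NULL;
    n->next = l->first;
    if (l->first != NULL) {
        l->first->prev = n;
    } else {
        l->last = n;             /* list was empty */
    }
    l->first = n;
}

static void
list_remove(struct list *l, struct node *n)
{
    if (n->prev != NULL) n->prev->next = n->next; else l->first = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else l->last  = n->prev;
    n->prev = n->next = NULL;
}

/* Equivalent of UT_LIST_REMOVE followed by UT_LIST_ADD_FIRST. */
static void
make_young(struct list *l, struct node *n)
{
    assert(l != NULL && n != NULL);
    list_remove(l, n);
    list_add_first(l, n);
}
```

Because the list is modified on every lookup, callers must hold the mutex that protects it (dict_sys->mutex in the patch) even on read paths.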
=== modified file 'storage/xtradb/include/log0log.h'
--- a/storage/xtradb/include/log0log.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/log0log.h 2009-06-25 01:43:25 +0000
@@ -186,6 +186,13 @@ void
log_buffer_flush_to_disk(void);
/*==========================*/
/********************************************************************
+Flushes the log buffer. Forces it to disk depending on the value of
+the configuration parameter innodb_flush_log_at_trx_commit. */
+UNIV_INTERN
+void
+log_buffer_flush_maybe_sync(void);
+/*=============================*/
+/********************************************************************
Advances the smallest lsn for which there are unflushed dirty blocks in the
buffer pool and also may make a new checkpoint. NOTE: this function may only
be called if the calling thread owns no synchronization objects! */
=== modified file 'storage/xtradb/include/rem0cmp.h'
--- a/storage/xtradb/include/rem0cmp.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/rem0cmp.h 2009-06-25 01:43:25 +0000
@@ -177,10 +177,11 @@ cmp_rec_rec_with_match(
matched fields; when the function returns,
contains the value the for current
comparison */
- ulint* matched_bytes);/* in/out: number of already matched
+ ulint* matched_bytes, /* in/out: number of already matched
bytes within the first field not completely
matched; when the function returns, contains
the value for the current comparison */
+ ulint stats_method);
/*****************************************************************
This function is used to compare two physical records. Only the common
first fields are compared. */
=== modified file 'storage/xtradb/include/rem0cmp.ic'
--- a/storage/xtradb/include/rem0cmp.ic 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/rem0cmp.ic 2009-06-25 01:43:25 +0000
@@ -88,5 +88,5 @@ cmp_rec_rec(
ulint match_b = 0;
return(cmp_rec_rec_with_match(rec1, rec2, offsets1, offsets2, index,
- &match_f, &match_b));
+ &match_f, &match_b, 0));
}
=== modified file 'storage/xtradb/include/srv0srv.h'
--- a/storage/xtradb/include/srv0srv.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/srv0srv.h 2009-06-25 01:43:25 +0000
@@ -127,6 +127,8 @@ extern ulint srv_buf_pool_curr_size; /*
extern ulint srv_mem_pool_size;
extern ulint srv_lock_table_size;
+extern ibool srv_thread_concurrency_timer_based;
+
extern ulint srv_n_file_io_threads;
extern ulint srv_n_read_io_threads;
extern ulint srv_n_write_io_threads;
@@ -163,6 +165,11 @@ extern ulint srv_fast_shutdown; /* If t
extern ibool srv_innodb_status;
extern unsigned long long srv_stats_sample_pages;
+extern ulint srv_stats_method;
+#define SRV_STATS_METHOD_NULLS_EQUAL 0
+#define SRV_STATS_METHOD_NULLS_NOT_EQUAL 1
+#define SRV_STATS_METHOD_IGNORE_NULLS 2
+extern ulint srv_stats_auto_update;
extern ibool srv_use_doublewrite_buf;
extern ibool srv_use_checksums;
@@ -184,8 +191,10 @@ extern ulint srv_enable_unsafe_group_com
extern ulint srv_read_ahead;
extern ulint srv_adaptive_checkpoint;
-extern ulint srv_extra_rsegments;
+extern ulint srv_expand_import;
+extern ulint srv_extra_rsegments;
+extern ulint srv_dict_size_limit;
/*-------------------------------------------*/
extern ulint srv_n_rows_inserted;
@@ -552,6 +561,7 @@ struct export_var_struct{
ulint innodb_data_writes;
ulint innodb_data_written;
ulint innodb_data_reads;
+ ulint innodb_dict_tables;
ulint innodb_buffer_pool_pages_total;
ulint innodb_buffer_pool_pages_data;
ulint innodb_buffer_pool_pages_dirty;
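
Note on srv_stats_method: the three SRV_STATS_METHOD_* values above are not explained in this part of the patch. Going by their names, and by analogy with MyISAM's myisam_stats_method, they presumably control how NULLs are counted when estimating the number of distinct key values. The sketch below illustrates that assumed meaning on a sorted array; it is not the cmp_rec_rec_with_match() logic, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric values as the SRV_STATS_METHOD_* defines in the patch;
   the semantics implemented below are an assumption. */
enum { NULLS_EQUAL = 0, NULLS_NOT_EQUAL = 1, IGNORE_NULLS = 2 };

struct key { int is_null; int value; };

/* Count distinct keys in an array sorted so that equal values (and
   NULLs) are adjacent. */
static size_t
count_distinct(const struct key *keys, size_t n, int method)
{
    const struct key *prev = NULL;  /* last key that was not skipped */
    size_t n_diff = 0;
    size_t i;

    assert(keys != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct key *k = &keys[i];

        if (k->is_null) {
            if (method == IGNORE_NULLS) {
                continue;           /* NULLs do not count at all */
            }
            /* NULLS_NOT_EQUAL: every NULL is its own value.
               NULLS_EQUAL: all NULLs form a single group. */
            if (method == NULLS_NOT_EQUAL
                || prev == NULL || !prev->is_null) {
                n_diff++;
            }
        } else if (prev == NULL || prev->is_null
                   || prev->value != k->value) {
            n_diff++;
        }
        prev = k;
    }
    return n_diff;
}
```

For the keys (NULL, NULL, 1, 1, 2) this yields 3, 4 and 2 distinct values for the three methods respectively, which shows how the choice changes the rows-per-key estimate on columns with many NULLs.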
=== modified file 'storage/xtradb/include/sync0sync.h'
--- a/storage/xtradb/include/sync0sync.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/sync0sync.h 2009-06-25 01:43:25 +0000
@@ -464,8 +464,14 @@ or row lock! */
SYNC_SEARCH_SYS, as memory allocation
can call routines there! Otherwise
the level is SYNC_MEM_HASH. */
+#define SYNC_BUF_LRU_LIST 157
+#define SYNC_BUF_PAGE_HASH 156
+#define SYNC_BUF_BLOCK 155
+#define SYNC_BUF_FREE_LIST 153
+#define SYNC_BUF_ZIP_FREE 152
+#define SYNC_BUF_ZIP_HASH 151
#define SYNC_BUF_POOL 150
-#define SYNC_BUF_BLOCK 149
+#define SYNC_BUF_FLUSH_LIST 149
#define SYNC_DOUBLEWRITE 140
#define SYNC_ANY_LATCH 135
#define SYNC_THR_LOCAL 133
=== modified file 'storage/xtradb/include/univ.i'
--- a/storage/xtradb/include/univ.i 2009-06-18 12:39:21 +0000
+++ b/storage/xtradb/include/univ.i 2009-08-03 20:09:53 +0000
@@ -35,7 +35,7 @@ Created 1/20/1994 Heikki Tuuri
#define INNODB_VERSION_MAJOR 1
#define INNODB_VERSION_MINOR 0
#define INNODB_VERSION_BUGFIX 3
-#define PERCONA_INNODB_VERSION 5a
+#define PERCONA_INNODB_VERSION 6a
/* The following is the InnoDB version as shown in
SELECT plugin_version FROM information_schema.plugins;
=== modified file 'storage/xtradb/include/ut0auxconf.h'
--- a/storage/xtradb/include/ut0auxconf.h 2009-04-27 04:54:14 +0000
+++ b/storage/xtradb/include/ut0auxconf.h 2009-06-25 01:43:25 +0000
@@ -12,3 +12,8 @@ If by any chance Makefile.in and ./confi
the hack from Makefile.in wiped away then the "real" check from plug.in
will take over.
*/
+/* This is a temporary fix for http://bugs.mysql.com/43740 */
+/* force to enable */
+#ifdef HAVE_GCC_ATOMIC_BUILTINS
+#define HAVE_ATOMIC_PTHREAD_T
+#endif
=== modified file 'storage/xtradb/log/log0log.c'
--- a/storage/xtradb/log/log0log.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/log/log0log.c 2009-06-25 01:43:25 +0000
@@ -1526,6 +1526,26 @@ log_buffer_flush_to_disk(void)
}
/********************************************************************
+Flush the log buffer. Force it to disk depending on the value of
+innodb_flush_log_at_trx_commit. */
+UNIV_INTERN
+void
+log_buffer_flush_maybe_sync(void)
+/*=============================*/
+{
+ ib_uint64_t lsn;
+
+ mutex_enter(&(log_sys->mutex));
+
+ lsn = log_sys->lsn;
+
+ mutex_exit(&(log_sys->mutex));
+
+ /* Force log buffer to disk when innodb_flush_log_at_trx_commit = 1. */
+ log_write_up_to(lsn, LOG_WAIT_ALL_GROUPS,
+ srv_flush_log_at_trx_commit == 1 ? TRUE : FALSE);
+}
+/********************************************************************
Tries to establish a big enough margin of free space in the log buffer, such
that a new log entry can be catenated without an immediate need for a flush. */
static
=== modified file 'storage/xtradb/log/log0recv.c'
--- a/storage/xtradb/log/log0recv.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/log/log0recv.c 2009-07-06 05:47:15 +0000
@@ -110,7 +110,7 @@ the log and store the scanned log record
use these free frames to read in pages when we start applying the
log records to the database. */
-UNIV_INTERN ulint recv_n_pool_free_frames = 256;
+UNIV_INTERN ulint recv_n_pool_free_frames = 1024;
/* The maximum lsn we see for a page during the recovery process. If this
is bigger than the lsn we are able to scan up to, that is an indication that
@@ -1225,6 +1225,8 @@ recv_recover_page(
buf_block_get_page_no(block));
if ((recv_addr == NULL)
+ /* bugfix: http://bugs.mysql.com/bug.php?id=44140 */
+ || (recv_addr->state == RECV_BEING_READ && !just_read_in)
|| (recv_addr->state == RECV_BEING_PROCESSED)
|| (recv_addr->state == RECV_PROCESSED)) {
=== modified file 'storage/xtradb/mtr/mtr0mtr.c'
--- a/storage/xtradb/mtr/mtr0mtr.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/mtr/mtr0mtr.c 2009-06-25 01:43:25 +0000
@@ -102,6 +102,38 @@ mtr_memo_pop_all(
}
}
+UNIV_INLINE
+void
+mtr_memo_note_modification_all(
+/*===========================*/
+ mtr_t* mtr) /* in: mtr */
+{
+ mtr_memo_slot_t* slot;
+ dyn_array_t* memo;
+ ulint offset;
+
+ ut_ad(mtr);
+ ut_ad(mtr->magic_n == MTR_MAGIC_N);
+ ut_ad(mtr->state == MTR_COMMITTING); /* Currently only used in
+ commit */
+ ut_ad(mtr->modifications);
+
+ memo = &(mtr->memo);
+
+ offset = dyn_array_get_data_size(memo);
+
+ while (offset > 0) {
+ offset -= sizeof(mtr_memo_slot_t);
+ slot = dyn_array_get_element(memo, offset);
+
+ if (UNIV_LIKELY(slot->object != NULL) &&
+ slot->type == MTR_MEMO_PAGE_X_FIX) {
+ buf_flush_note_modification(
+ (buf_block_t*)slot->object, mtr);
+ }
+ }
+}
+
/****************************************************************
Writes the contents of a mini-transaction log, if any, to the database log. */
static
@@ -180,6 +212,8 @@ mtr_commit(
if (write_log) {
mtr_log_reserve_and_write(mtr);
+
+ mtr_memo_note_modification_all(mtr);
}
/* We first update the modification info to buffer pages, and only
@@ -190,12 +224,13 @@ mtr_commit(
required when we insert modified buffer pages in to the flush list
which must be sorted on oldest_modification. */
- mtr_memo_pop_all(mtr);
-
if (write_log) {
log_release();
}
+ /* All unlocking has been moved here, after log_sys mutex release. */
+ mtr_memo_pop_all(mtr);
+
ut_d(mtr->state = MTR_COMMITTED);
dyn_array_free(&(mtr->memo));
dyn_array_free(&(mtr->log));
@@ -263,6 +298,12 @@ mtr_memo_release(
slot = dyn_array_get_element(memo, offset);
if ((object == slot->object) && (type == slot->type)) {
+ if (mtr->modifications &&
+ UNIV_LIKELY(slot->object != NULL) &&
+ slot->type == MTR_MEMO_PAGE_X_FIX) {
+ buf_flush_note_modification(
+ (buf_block_t*)slot->object, mtr);
+ }
mtr_memo_slot_release(mtr, slot);
=== modified file 'storage/xtradb/os/os0file.c'
--- a/storage/xtradb/os/os0file.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/os/os0file.c 2009-06-25 01:43:25 +0000
@@ -73,6 +73,28 @@ UNIV_INTERN ibool os_aio_use_native_aio
UNIV_INTERN ibool os_aio_print_debug = FALSE;
+/* Status of an IO request in simulated AIO.
+ Protocol for simulated aio:
+ client requests IO: find slot with reserved = FALSE. Add entry with
+ status = OS_AIO_NOT_ISSUED.
+ IO thread wakes: find adjacent slots with reserved = TRUE and status =
+ OS_AIO_NOT_ISSUED. Change status for slots to
+ OS_AIO_ISSUED.
+ IO operation completes: set status for slots to OS_AIO_DONE. set status
+ for the first slot to OS_AIO_CLAIMED and return
+ result for that slot.
+ When there are multiple read and write threads, they all compete to execute
+ the requests in the array (os_aio_array_t). This avoids the need to load
+ balance requests at the time the request is made at the cost of waking all
+ threads when a request is available.
+*/
+typedef enum {
+ OS_AIO_NOT_ISSUED, /* Available to be processed by an IO thread. */
+ OS_AIO_ISSUED, /* Being processed by an IO thread. */
+ OS_AIO_DONE, /* Request processed. */
+ OS_AIO_CLAIMED /* Result being returned to client. */
+} os_aio_status;
+
/* The aio array slot structure */
typedef struct os_aio_slot_struct os_aio_slot_t;
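
Note on the hunk above: the protocol comment describes a small state machine per slot. The following is a minimal standalone sketch of those transitions, not the patch's code. The names slot_t, slot_reserve, slot_issue, slot_complete, slot_claim and slot_free are hypothetical, and the array mutex that the real code holds around every transition is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the os_aio_status values added by the patch. */
typedef enum {
	SLOT_NOT_ISSUED,	/* available to be picked up by an IO thread */
	SLOT_ISSUED,		/* being processed by an IO thread */
	SLOT_DONE,		/* request processed */
	SLOT_CLAIMED		/* result being returned to the client */
} slot_status;

typedef struct {
	bool		reserved;
	slot_status	status;	/* meaningful only while reserved is true */
} slot_t;

/* Client requests IO: take a free slot and mark it as not yet issued. */
static bool slot_reserve(slot_t *s)
{
	if (s->reserved) {
		return false;
	}
	s->reserved = true;
	s->status = SLOT_NOT_ISSUED;
	return true;
}

/* IO thread picks the request up. Returns false when another thread
already took it, which can happen once all threads scan the same array. */
static bool slot_issue(slot_t *s)
{
	if (!s->reserved || s->status != SLOT_NOT_ISSUED) {
		return false;
	}
	s->status = SLOT_ISSUED;
	return true;
}

/* Physical IO finished. */
static void slot_complete(slot_t *s)
{
	assert(s->reserved && s->status == SLOT_ISSUED);
	s->status = SLOT_DONE;
}

/* Result is being handed back to the requester. */
static void slot_claim(slot_t *s)
{
	assert(s->reserved && s->status == SLOT_DONE);
	s->status = SLOT_CLAIMED;
}

/* Slot is released; status is reset as in os_aio_array_free_slot(). */
static void slot_free(slot_t *s)
{
	assert(s->reserved);
	s->reserved = false;
	s->status = SLOT_NOT_ISSUED;
}
```

The point of the extra ISSUED and CLAIMED states, compared with the old io_already_done flag, appears to be that several handler threads can scan one shared array without picking the same request twice.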
@@ -81,6 +103,8 @@ struct os_aio_slot_struct{
ulint pos; /* index of the slot in the aio
array */
ibool reserved; /* TRUE if this slot is reserved */
+ os_aio_status status; /* Status for current request. Valid when reserved
+ is TRUE. Used only in simulated aio. */
time_t reservation_time;/* time when reserved */
ulint len; /* length of the block to read or
write */
@@ -91,11 +115,11 @@ struct os_aio_slot_struct{
ulint offset_high; /* 32 high bits of file offset */
os_file_t file; /* file where to read or write */
const char* name; /* file name or path */
- ibool io_already_done;/* used only in simulated aio:
- TRUE if the physical i/o already
- made and only the slot message
- needs to be passed to the caller
- of os_aio_simulated_handle */
+// ibool io_already_done;/* used only in simulated aio:
+// TRUE if the physical i/o already
+// made and only the slot message
+// needs to be passed to the caller
+// of os_aio_simulated_handle */
fil_node_t* message1; /* message which is given by the */
void* message2; /* the requester of an aio operation
and which can be used to identify
@@ -141,6 +165,13 @@ struct os_aio_array_struct{
/* Array of events used in simulated aio */
static os_event_t* os_aio_segment_wait_events = NULL;
+/* Number for the first global segment for reading. */
+const ulint os_aio_first_read_segment = 2;
+
+/* Number for the first global segment for writing. Set to
+2 + os_aio_read_write_threads. */
+ulint os_aio_first_write_segment = 0;
+
/* The aio arrays for non-ibuf i/o and ibuf i/o, as well as sync aio. These
are NULL when the module has not yet been initialized. */
static os_aio_array_t* os_aio_read_array = NULL;
@@ -149,11 +180,17 @@ static os_aio_array_t* os_aio_ibuf_array
static os_aio_array_t* os_aio_log_array = NULL;
static os_aio_array_t* os_aio_sync_array = NULL;
+/* Per thread buffer used for merged IO requests. Used by
+os_aio_simulated_handle so that a buffer doesn't have to be allocated
+for each request. */
+static char* os_aio_thread_buffer[SRV_MAX_N_IO_THREADS];
+static ulint os_aio_thread_buffer_size[SRV_MAX_N_IO_THREADS];
+
static ulint os_aio_n_segments = ULINT_UNDEFINED;
/* If the following is TRUE, read i/o handler threads try to
wait until a batch of new read requests have been posted */
-static ibool os_aio_recommend_sleep_for_read_threads = FALSE;
+static volatile ibool os_aio_recommend_sleep_for_read_threads = FALSE;
UNIV_INTERN ulint os_n_file_reads = 0;
UNIV_INTERN ulint os_bytes_read_since_printout = 0;
@@ -2956,6 +2993,8 @@ os_aio_init(
for (i = 0; i < n_segments; i++) {
srv_set_io_thread_op_info(i, "not started yet");
+ os_aio_thread_buffer[i] = 0;
+ os_aio_thread_buffer_size[i] = 0;
}
n_per_seg = n / n_segments;
@@ -2964,6 +3003,7 @@ os_aio_init(
/* fprintf(stderr, "Array n per seg %lu\n", n_per_seg); */
+ os_aio_first_write_segment = os_aio_first_read_segment + n_read_threads;
os_aio_ibuf_array = os_aio_array_create(n_per_seg, 1);
srv_io_thread_function[0] = "insert buffer thread";
@@ -2972,14 +3012,14 @@ os_aio_init(
srv_io_thread_function[1] = "log thread";
- os_aio_read_array = os_aio_array_create(n_read_segs * n_per_seg,
+ os_aio_read_array = os_aio_array_create(n_per_seg,
n_read_segs);
for (i = 2; i < 2 + n_read_segs; i++) {
ut_a(i < SRV_MAX_N_IO_THREADS);
srv_io_thread_function[i] = "read thread";
}
- os_aio_write_array = os_aio_array_create(n_write_segs * n_per_seg,
+ os_aio_write_array = os_aio_array_create(n_per_seg,
n_write_segs);
for (i = 2 + n_read_segs; i < n_segments; i++) {
ut_a(i < SRV_MAX_N_IO_THREADS);
@@ -3225,7 +3265,8 @@ loop:
slot->buf = buf;
slot->offset = offset;
slot->offset_high = offset_high;
- slot->io_already_done = FALSE;
+// slot->io_already_done = FALSE;
+ slot->status = OS_AIO_NOT_ISSUED;
#ifdef WIN_ASYNC_IO
control = &(slot->control);
@@ -3256,6 +3297,7 @@ os_aio_array_free_slot(
ut_ad(slot->reserved);
slot->reserved = FALSE;
+ slot->status = OS_AIO_NOT_ISSUED;
array->n_reserved--;
@@ -3292,16 +3334,18 @@ os_aio_simulated_wake_handler_thread(
segment = os_aio_get_array_and_local_segment(&array, global_segment);
- n = array->n_slots / array->n_segments;
+ n = array->n_slots;
/* Look through n slots after the segment * n'th slot */
os_mutex_enter(array->mutex);
for (i = 0; i < n; i++) {
- slot = os_aio_array_get_nth_slot(array, i + segment * n);
+ slot = os_aio_array_get_nth_slot(array, i);
- if (slot->reserved) {
+ if (slot->reserved &&
+ (slot->status == OS_AIO_NOT_ISSUED ||
+ slot->status == OS_AIO_DONE)) {
/* Found an i/o request */
break;
@@ -3311,7 +3355,25 @@ os_aio_simulated_wake_handler_thread(
os_mutex_exit(array->mutex);
if (i < n) {
- os_event_set(os_aio_segment_wait_events[global_segment]);
+ if (array == os_aio_ibuf_array) {
+ os_event_set(os_aio_segment_wait_events[0]);
+
+ } else if (array == os_aio_log_array) {
+ os_event_set(os_aio_segment_wait_events[1]);
+
+ } else if (array == os_aio_read_array) {
+ ulint x;
+ for (x = os_aio_first_read_segment; x < os_aio_first_write_segment; x++)
+ os_event_set(os_aio_segment_wait_events[x]);
+
+ } else if (array == os_aio_write_array) {
+ ulint x;
+ for (x = os_aio_first_write_segment; x < os_aio_n_segments; x++)
+ os_event_set(os_aio_segment_wait_events[x]);
+
+ } else {
+ ut_a(0);
+ }
}
}
@@ -3322,8 +3384,6 @@ void
os_aio_simulated_wake_handler_threads(void)
/*=======================================*/
{
- ulint i;
-
if (os_aio_use_native_aio) {
/* We do not use simulated aio: do nothing */
@@ -3332,9 +3392,10 @@ os_aio_simulated_wake_handler_threads(vo
os_aio_recommend_sleep_for_read_threads = FALSE;
- for (i = 0; i < os_aio_n_segments; i++) {
- os_aio_simulated_wake_handler_thread(i);
- }
+ os_aio_simulated_wake_handler_thread(0);
+ os_aio_simulated_wake_handler_thread(1);
+ os_aio_simulated_wake_handler_thread(os_aio_first_read_segment);
+ os_aio_simulated_wake_handler_thread(os_aio_first_write_segment);
}
/**************************************************************************
@@ -3606,7 +3667,7 @@ os_aio_windows_handle(
ut_ad(os_aio_validate());
ut_ad(segment < array->n_segments);
- n = array->n_slots / array->n_segments;
+ n = array->n_slots;
if (array == os_aio_sync_array) {
os_event_wait(os_aio_array_get_nth_slot(array, pos)->event);
@@ -3615,12 +3676,12 @@ os_aio_windows_handle(
srv_set_io_thread_op_info(orig_seg, "wait Windows aio");
i = os_event_wait_multiple(n,
(array->native_events)
- + segment * n);
+ );
}
os_mutex_enter(array->mutex);
- slot = os_aio_array_get_nth_slot(array, i + segment * n);
+ slot = os_aio_array_get_nth_slot(array, i);
ut_a(slot->reserved);
@@ -3685,10 +3746,13 @@ os_aio_simulated_handle(
os_aio_slot_t* slot;
os_aio_slot_t* slot2;
os_aio_slot_t* consecutive_ios[OS_AIO_MERGE_N_CONSECUTIVE];
+ os_aio_slot_t* lowest_request;
+ os_aio_slot_t* oldest_request;
ulint n_consecutive;
ulint total_len;
ulint offs;
ulint lowest_offset;
+ ulint oldest_offset;
ulint biggest_age;
ulint age;
byte* combined_buf;
@@ -3696,6 +3760,7 @@ os_aio_simulated_handle(
ibool ret;
ulint n;
ulint i;
+ time_t now;
segment = os_aio_get_array_and_local_segment(&array, global_segment);
@@ -3708,7 +3773,7 @@ restart:
ut_ad(os_aio_validate());
ut_ad(segment < array->n_segments);
- n = array->n_slots / array->n_segments;
+ n = array->n_slots;
/* Look through n slots after the segment * n'th slot */
@@ -3730,9 +3795,9 @@ restart:
done */
for (i = 0; i < n; i++) {
- slot = os_aio_array_get_nth_slot(array, i + segment * n);
+ slot = os_aio_array_get_nth_slot(array, i);
- if (slot->reserved && slot->io_already_done) {
+ if (slot->reserved && slot->status == OS_AIO_DONE) {
if (os_aio_print_debug) {
fprintf(stderr,
@@ -3754,67 +3819,57 @@ restart:
then pick the one at the lowest offset. */
biggest_age = 0;
- lowest_offset = ULINT_MAX;
+ now = time(NULL);
+ oldest_request = lowest_request = NULL;
+ oldest_offset = lowest_offset = ULINT_MAX;
+ /* Find the oldest request and the request with the smallest offset */
for (i = 0; i < n; i++) {
- slot = os_aio_array_get_nth_slot(array, i + segment * n);
+ slot = os_aio_array_get_nth_slot(array, i);
- if (slot->reserved) {
- age = (ulint)difftime(time(NULL),
- slot->reservation_time);
+ if (slot->reserved && slot->status == OS_AIO_NOT_ISSUED) {
+ age = (ulint)difftime(now, slot->reservation_time);
if ((age >= 2 && age > biggest_age)
|| (age >= 2 && age == biggest_age
- && slot->offset < lowest_offset)) {
+ && slot->offset < oldest_offset)) {
/* Found an i/o request */
- consecutive_ios[0] = slot;
-
- n_consecutive = 1;
-
biggest_age = age;
- lowest_offset = slot->offset;
+ oldest_request = slot;
+ oldest_offset = slot->offset;
}
- }
- }
-
- if (n_consecutive == 0) {
- /* There were no old requests. Look for an i/o request at the
- lowest offset in the array (we ignore the high 32 bits of the
- offset in these heuristics) */
-
- lowest_offset = ULINT_MAX;
-
- for (i = 0; i < n; i++) {
- slot = os_aio_array_get_nth_slot(array,
- i + segment * n);
-
- if (slot->reserved && slot->offset < lowest_offset) {
+ /* Look for an i/o request at the lowest offset in the array
+ * (we ignore the high 32 bits of the offset) */
+ if (slot->offset < lowest_offset) {
/* Found an i/o request */
- consecutive_ios[0] = slot;
-
- n_consecutive = 1;
-
+ lowest_request = slot;
lowest_offset = slot->offset;
}
}
}
- if (n_consecutive == 0) {
+ if (!lowest_request && !oldest_request) {
/* No i/o requested at the moment */
goto wait_for_io;
}
- slot = consecutive_ios[0];
+ if (oldest_request) {
+ slot = oldest_request;
+ } else {
+ slot = lowest_request;
+ }
+ consecutive_ios[0] = slot;
+ n_consecutive = 1;
/* Check if there are several consecutive blocks to read or write */
consecutive_loop:
for (i = 0; i < n; i++) {
- slot2 = os_aio_array_get_nth_slot(array, i + segment * n);
+ slot2 = os_aio_array_get_nth_slot(array, i);
if (slot2->reserved && slot2 != slot
&& slot2->offset == slot->offset + slot->len
@@ -3822,7 +3877,8 @@ consecutive_loop:
&& slot->offset + slot->len > slot->offset
&& slot2->offset_high == slot->offset_high
&& slot2->type == slot->type
- && slot2->file == slot->file) {
+ && slot2->file == slot->file
+ && slot2->status == OS_AIO_NOT_ISSUED) {
/* Found a consecutive i/o request */
@@ -3851,6 +3907,8 @@ consecutive_loop:
for (i = 0; i < n_consecutive; i++) {
total_len += consecutive_ios[i]->len;
+ ut_a(consecutive_ios[i]->status == OS_AIO_NOT_ISSUED);
+ consecutive_ios[i]->status = OS_AIO_ISSUED;
}
if (n_consecutive == 1) {
@@ -3858,7 +3916,14 @@ consecutive_loop:
combined_buf = slot->buf;
combined_buf2 = NULL;
} else {
- combined_buf2 = ut_malloc(total_len + UNIV_PAGE_SIZE);
+ if ((total_len + UNIV_PAGE_SIZE) > os_aio_thread_buffer_size[global_segment]) {
+ if (os_aio_thread_buffer[global_segment])
+ ut_free(os_aio_thread_buffer[global_segment]);
+
+ os_aio_thread_buffer[global_segment] = ut_malloc(total_len + UNIV_PAGE_SIZE);
+ os_aio_thread_buffer_size[global_segment] = total_len + UNIV_PAGE_SIZE;
+ }
+ combined_buf2 = os_aio_thread_buffer[global_segment];
ut_a(combined_buf2);
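
Note on the hunk above: the merged-IO buffer is now kept per IO thread and only reallocated when a request needs more room than the thread already has. A minimal sketch of that grow-only pattern follows, using plain malloc/free instead of ut_malloc/ut_free. The names scratch_t and scratch_get are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-thread scratch buffer; one instance per IO thread. */
typedef struct {
	char	*buf;
	size_t	size;
} scratch_t;

/* Return a buffer of at least `needed` bytes. The old buffer is reused
when it is already large enough; otherwise it is freed and a larger one
is allocated. Contents are not preserved across a regrow. */
static char *scratch_get(scratch_t *s, size_t needed)
{
	assert(s != NULL);

	if (needed > s->size) {
		free(s->buf);	/* free(NULL) is a no-op */
		s->buf = malloc(needed);
		s->size = s->buf ? needed : 0;
	}
	return s->buf;
}
```

Because each buffer belongs to exactly one thread, no locking is needed around it. The trade-off is that the memory stays allocated at its high-water mark for the life of the thread.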
@@ -3869,6 +3934,9 @@ consecutive_loop:
this assumes that there is just one i/o-handler thread serving
a single segment of slots! */
+ ut_a(slot->reserved);
+ ut_a(slot->status == OS_AIO_ISSUED);
+
os_mutex_exit(array->mutex);
if (slot->type == OS_FILE_WRITE && n_consecutive > 1) {
@@ -3924,16 +3992,13 @@ consecutive_loop:
}
}
- if (combined_buf2) {
- ut_free(combined_buf2);
- }
-
os_mutex_enter(array->mutex);
/* Mark the i/os done in slots */
for (i = 0; i < n_consecutive; i++) {
- consecutive_ios[i]->io_already_done = TRUE;
+ ut_a(consecutive_ios[i]->status == OS_AIO_ISSUED);
+ consecutive_ios[i]->status = OS_AIO_DONE;
}
/* We return the messages for the first slot now, and if there were
@@ -3943,6 +4008,8 @@ consecutive_loop:
slot_io_done:
ut_a(slot->reserved);
+ ut_a(slot->status == OS_AIO_DONE);
+ slot->status = OS_AIO_CLAIMED;
*message1 = slot->message1;
*message2 = slot->message2;
=== modified file 'storage/xtradb/rem/rem0cmp.c'
--- a/storage/xtradb/rem/rem0cmp.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/rem/rem0cmp.c 2009-06-25 01:43:25 +0000
@@ -892,10 +892,11 @@ cmp_rec_rec_with_match(
matched fields; when the function returns,
contains the value the for current
comparison */
- ulint* matched_bytes) /* in/out: number of already matched
+ ulint* matched_bytes, /* in/out: number of already matched
bytes within the first field not completely
matched; when the function returns, contains
the value for the current comparison */
+ ulint stats_method)
{
#ifndef UNIV_HOTBACKUP
ulint rec1_n_fields; /* the number of fields in rec */
@@ -989,7 +990,11 @@ cmp_rec_rec_with_match(
if (rec1_f_len == rec2_f_len) {
- goto next_field;
+ if (stats_method == SRV_STATS_METHOD_NULLS_EQUAL) {
+ goto next_field;
+ } else {
+ ret = -1;
+ }
} else if (rec2_f_len == UNIV_SQL_NULL) {
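
Note on the hunk above: with the new stats_method argument, two NULL fields compare as equal only under SRV_STATS_METHOD_NULLS_EQUAL. The effect on a distinct-key estimate can be illustrated with a standalone sketch. It uses hypothetical names (fields_match, count_distinct), models a nullable key as a pointer to int, and does not cover SRV_STATS_METHOD_IGNORE_NULLS, which is not handled in this hunk.

```c
#include <assert.h>
#include <stddef.h>

#define STATS_NULLS_EQUAL	0	/* same value as SRV_STATS_METHOD_NULLS_EQUAL */
#define STATS_NULLS_NOT_EQUAL	1	/* same value as SRV_STATS_METHOD_NULLS_NOT_EQUAL */

/* Return 1 when two nullable key values count as the same key.
A NULL pointer stands for SQL NULL. */
static int fields_match(const int *a, const int *b, unsigned method)
{
	if (a == NULL && b == NULL) {
		return method == STATS_NULLS_EQUAL;
	}
	if (a == NULL || b == NULL) {
		return 0;
	}
	return *a == *b;
}

/* Count distinct keys in a sorted sequence by comparing neighbours. */
static size_t count_distinct(const int *const *keys, size_t n,
			     unsigned method)
{
	size_t	distinct = n ? 1 : 0;
	size_t	i;

	assert(keys != NULL || n == 0);

	for (i = 1; i < n; i++) {
		if (!fields_match(keys[i - 1], keys[i], method)) {
			distinct++;
		}
	}
	return distinct;
}
```

For the sorted keys NULL, NULL, NULL, 1, 1, 2 the sketch counts 3 distinct keys when NULLs are equal and 5 when they are not, which is why the setting changes index cardinality estimates on columns with many NULLs.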
=== modified file 'storage/xtradb/row/row0mysql.c'
--- a/storage/xtradb/row/row0mysql.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/row/row0mysql.c 2009-06-25 01:43:25 +0000
@@ -854,6 +854,9 @@ row_update_statistics_if_needed(
table->stat_modified_counter = counter + 1;
+ if (!srv_stats_auto_update)
+ return;
+
/* Calculate new statistics if 1 / 16 of table has been modified
since the last time a statistics batch was run, or if
stat_modified_counter > 2 000 000 000 (to avoid wrap-around).
=== modified file 'storage/xtradb/scripts/install_innodb_plugins.sql'
--- a/storage/xtradb/scripts/install_innodb_plugins.sql 2009-01-29 16:54:13 +0000
+++ b/storage/xtradb/scripts/install_innodb_plugins.sql 2009-06-25 01:43:25 +0000
@@ -12,3 +12,5 @@ INSTALL PLUGIN INNODB_BUFFER_POOL_PAGES
INSTALL PLUGIN INNODB_BUFFER_POOL_PAGES_BLOB SONAME 'ha_innodb.so';
INSTALL PLUGIN INNODB_BUFFER_POOL_PAGES_INDEX SONAME 'ha_innodb.so';
INSTALL PLUGIN innodb_rseg SONAME 'ha_innodb.so';
+INSTALL PLUGIN innodb_table_stats SONAME 'ha_innodb.so';
+INSTALL PLUGIN innodb_index_stats SONAME 'ha_innodb.so';
=== modified file 'storage/xtradb/srv/srv0srv.c'
--- a/storage/xtradb/srv/srv0srv.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/srv/srv0srv.c 2009-07-06 05:47:15 +0000
@@ -285,6 +285,7 @@ Value 10 should be good if there are les
computer. Bigger computers need bigger values. Value 0 will disable the
concurrency check. */
+UNIV_INTERN ibool srv_thread_concurrency_timer_based = FALSE;
UNIV_INTERN ulong srv_thread_concurrency = 0;
UNIV_INTERN ulong srv_commit_concurrency = 0;
@@ -336,6 +337,8 @@ UNIV_INTERN ibool srv_innodb_status = FA
/* When estimating number of different key values in an index, sample
this many index pages */
UNIV_INTERN unsigned long long srv_stats_sample_pages = 8;
+UNIV_INTERN ulint srv_stats_method = 0;
+UNIV_INTERN ulint srv_stats_auto_update = 1;
UNIV_INTERN ibool srv_use_doublewrite_buf = TRUE;
UNIV_INTERN ibool srv_use_checksums = TRUE;
@@ -361,14 +364,18 @@ UNIV_INTERN ulint srv_flush_neighbor_pag
UNIV_INTERN ulint srv_enable_unsafe_group_commit = 0; /* 0:disable 1:enable */
UNIV_INTERN ulint srv_read_ahead = 3; /* 1: random 2: linear 3: Both */
-UNIV_INTERN ulint srv_adaptive_checkpoint = 0; /* 0:disable 1:enable */
+UNIV_INTERN ulint srv_adaptive_checkpoint = 0; /* 0: none 1: reflex 2: estimate */
+
+UNIV_INTERN ulint srv_expand_import = 0; /* 0:disable 1:enable */
UNIV_INTERN ulint srv_extra_rsegments = 0; /* extra rseg for users */
+UNIV_INTERN ulint srv_dict_size_limit = 0;
/*-------------------------------------------*/
UNIV_INTERN ulong srv_n_spin_wait_rounds = 20;
UNIV_INTERN ulong srv_n_free_tickets_to_enter = 500;
UNIV_INTERN ulong srv_thread_sleep_delay = 10000;
UNIV_INTERN ulint srv_spin_wait_delay = 5;
+UNIV_INTERN ulint srv_spins_microsec = 50;
UNIV_INTERN ibool srv_priority_boost = TRUE;
#ifdef UNIV_DEBUG
@@ -657,6 +664,47 @@ are indexed by the type of the thread. *
UNIV_INTERN ulint srv_n_threads_active[SRV_MASTER + 1];
UNIV_INTERN ulint srv_n_threads[SRV_MASTER + 1];
+static
+void
+srv_align_spins_microsec(void)
+{
+ ulint start_sec, end_sec;
+ ulint start_usec, end_usec;
+ ib_uint64_t usecs;
+
+ /* change temporarily */
+ srv_spins_microsec = 1;
+
+ if (ut_usectime(&start_sec, &start_usec)) {
+ srv_spins_microsec = 50;
+ goto end;
+ }
+
+ ut_delay(100000);
+
+ if (ut_usectime(&end_sec, &end_usec)) {
+ srv_spins_microsec = 50;
+ goto end;
+ }
+
+ usecs = (end_sec - start_sec) * 1000000LL + (end_usec - start_usec);
+
+ if (usecs) {
+ srv_spins_microsec = 100000 / usecs;
+ if (srv_spins_microsec == 0)
+ srv_spins_microsec = 1;
+ if (srv_spins_microsec > 50)
+ srv_spins_microsec = 50;
+ } else {
+ srv_spins_microsec = 50;
+ }
+end:
+ if (srv_spins_microsec != 50)
+ fprintf(stderr,
+ "InnoDB: unit of spin count at ut_delay() is aligned to %lu\n",
+ srv_spins_microsec);
+}
+
/*************************************************************************
Sets the info describing an i/o thread current state. */
UNIV_INTERN
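
Note on the hunk above: srv_align_spins_microsec() times a ut_delay(100000) call with the multiplier set to 1 and derives the multiplier from the elapsed time, clamped to the range 1..50. The arithmetic can be sketched as two pure functions. The names elapsed_usec and spins_per_usec are hypothetical, and signed arithmetic is used for the time difference here, which is an assumption and differs from the ulint subtraction in the patch.

```c
#include <assert.h>

/* Elapsed microseconds between two (sec, usec) timestamps.
Returns 0 if the clock appears to have gone backwards. */
static unsigned long long elapsed_usec(long start_sec, long start_usec,
				       long end_sec, long end_usec)
{
	long long d = (long long) (end_sec - start_sec) * 1000000LL
		+ (end_usec - start_usec);

	return d > 0 ? (unsigned long long) d : 0;
}

/* Derive the spin multiplier from the time a 100000-unit delay took.
Mirrors the clamping in the patch: default 50, minimum 1, maximum 50. */
static unsigned long spins_per_usec(unsigned long long usecs)
{
	unsigned long long spins;

	if (usecs == 0) {
		return 50;	/* could not measure: keep the default */
	}

	spins = 100000ULL / usecs;

	if (spins == 0) {
		spins = 1;
	}
	if (spins > 50) {
		spins = 50;
	}
	assert(spins >= 1 && spins <= 50);
	return (unsigned long) spins;
}
```

For example, if the calibration delay takes 10000 microseconds the multiplier becomes 10, and anything faster than 2000 microseconds leaves it at the default of 50.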
@@ -889,6 +937,8 @@ srv_init(void)
dict_table_t* table;
ulint i;
+ srv_align_spins_microsec();
+
srv_sys = mem_alloc(sizeof(srv_sys_t));
kernel_mutex_temp = mem_alloc(sizeof(mutex_t));
@@ -1009,6 +1059,75 @@ UNIV_INTERN ulong srv_max_purge_lag = 0
/*************************************************************************
Puts an OS thread to wait if there are too many concurrent threads
(>= srv_thread_concurrency) inside InnoDB. The threads wait in a FIFO queue. */
+
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
+static void
+enter_innodb_with_tickets(trx_t* trx)
+{
+ trx->declared_to_be_inside_innodb = TRUE;
+ trx->n_tickets_to_enter_innodb = SRV_FREE_TICKETS_TO_ENTER;
+ return;
+}
+
+static void
+srv_conc_enter_innodb_timer_based(trx_t* trx)
+{
+ lint conc_n_threads;
+ ibool has_yielded = FALSE;
+ ulint has_slept = 0;
+
+ if (trx->declared_to_be_inside_innodb) {
+ ut_print_timestamp(stderr);
+ fputs(
+" InnoDB: Error: trying to declare trx to enter InnoDB, but\n"
+"InnoDB: it already is declared.\n", stderr);
+ trx_print(stderr, trx, 0);
+ putc('\n', stderr);
+ }
+retry:
+ if (srv_conc_n_threads < (lint) srv_thread_concurrency) {
+ conc_n_threads = __sync_add_and_fetch(&srv_conc_n_threads, 1);
+ if (conc_n_threads <= (lint) srv_thread_concurrency) {
+ enter_innodb_with_tickets(trx);
+ return;
+ }
+ __sync_add_and_fetch(&srv_conc_n_threads, -1);
+ }
+ if (!has_yielded)
+ {
+ has_yielded = TRUE;
+ os_thread_yield();
+ goto retry;
+ }
+ if (trx->has_search_latch
+ || NULL != UT_LIST_GET_FIRST(trx->trx_locks)) {
+
+ conc_n_threads = __sync_add_and_fetch(&srv_conc_n_threads, 1);
+ enter_innodb_with_tickets(trx);
+ return;
+ }
+ if (has_slept < 2)
+ {
+ trx->op_info = "sleeping before entering InnoDB";
+ os_thread_sleep(10000);
+ trx->op_info = "";
+ has_slept++;
+ }
+ conc_n_threads = __sync_add_and_fetch(&srv_conc_n_threads, 1);
+ enter_innodb_with_tickets(trx);
+ return;
+}
+
+static void
+srv_conc_exit_innodb_timer_based(trx_t* trx)
+{
+ __sync_add_and_fetch(&srv_conc_n_threads, -1);
+ trx->declared_to_be_inside_innodb = FALSE;
+ trx->n_tickets_to_enter_innodb = 0;
+ return;
+}
+#endif
+
UNIV_INTERN
void
srv_conc_enter_innodb(
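
Note on the hunk above: the timer-based path replaces the srv_conc_mutex FIFO queue with an optimistic atomic counter. A thread increments the counter, checks whether it is still within the limit, and backs the increment out if not. As written in the patch, a thread that fails yields once and retries, then (unless it already holds latches or locks) sleeps once for 10000 microseconds and enters regardless of the limit. A minimal sketch of the counter part follows, assuming GCC-style __sync builtins. The names try_enter, force_enter and leave are hypothetical.

```c
#include <assert.h>

/* Try to enter while keeping the number of threads inside <= limit.
Returns 1 on success, 0 if the caller should back off and retry. */
static int try_enter(long *n_inside, long limit)
{
	/* Cheap unlocked pre-check; the atomic add below is what decides. */
	if (*n_inside < limit) {
		long n = __sync_add_and_fetch(n_inside, 1);

		if (n <= limit) {
			return 1;
		}
		/* Lost the race: undo our increment. */
		__sync_add_and_fetch(n_inside, -1);
	}
	return 0;
}

/* Enter unconditionally, as the patch does after the single sleep or
when the transaction already holds latches or locks. */
static void force_enter(long *n_inside)
{
	__sync_add_and_fetch(n_inside, 1);
}

/* Leave; the counter must never become negative. */
static void leave(long *n_inside)
{
	long n = __sync_add_and_fetch(n_inside, -1);

	assert(n >= 0);
}
```

Because entry is forced after a bounded wait, the limit acts as a soft target in this mode and the counter can briefly exceed srv_thread_concurrency.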
@@ -1039,6 +1158,13 @@ srv_conc_enter_innodb(
return;
}
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
+ if (srv_thread_concurrency_timer_based) {
+ srv_conc_enter_innodb_timer_based(trx);
+ return;
+ }
+#endif
+
os_fast_mutex_lock(&srv_conc_mutex);
retry:
if (trx->declared_to_be_inside_innodb) {
@@ -1182,6 +1308,14 @@ srv_conc_force_enter_innodb(
}
ut_ad(srv_conc_n_threads >= 0);
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
+ if (srv_thread_concurrency_timer_based) {
+ __sync_add_and_fetch(&srv_conc_n_threads, 1);
+ trx->declared_to_be_inside_innodb = TRUE;
+ trx->n_tickets_to_enter_innodb = 1;
+ return;
+ }
+#endif
os_fast_mutex_lock(&srv_conc_mutex);
@@ -1215,6 +1349,13 @@ srv_conc_force_exit_innodb(
return;
}
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
+ if (srv_thread_concurrency_timer_based) {
+ srv_conc_exit_innodb_timer_based(trx);
+ return;
+ }
+#endif
+
os_fast_mutex_lock(&srv_conc_mutex);
ut_ad(srv_conc_n_threads > 0);
@@ -1934,6 +2075,7 @@ srv_export_innodb_status(void)
export_vars.innodb_data_reads = os_n_file_reads;
export_vars.innodb_data_writes = os_n_file_writes;
export_vars.innodb_data_written = srv_data_written;
+ export_vars.innodb_dict_tables= (dict_sys ? UT_LIST_GET_LEN(dict_sys->table_LRU) : 0);
export_vars.innodb_buffer_pool_read_requests = buf_pool->n_page_gets;
export_vars.innodb_buffer_pool_write_requests
= srv_buf_pool_write_requests;
@@ -2348,6 +2490,8 @@ srv_master_thread(
ibool skip_sleep = FALSE;
ulint i;
+ ib_uint64_t lsn_old;
+
ib_uint64_t oldest_lsn;
#ifdef UNIV_DEBUG_THREAD_CREATION
@@ -2365,6 +2509,9 @@ srv_master_thread(
mutex_exit(&kernel_mutex);
+ mutex_enter(&(log_sys->mutex));
+ lsn_old = log_sys->lsn;
+ mutex_exit(&(log_sys->mutex));
loop:
/*****************************************************************/
/* ---- When there is database activity by users, we cycle in this
@@ -2399,6 +2546,19 @@ loop:
if (!skip_sleep) {
os_thread_sleep(1000000);
+
+ /*
+ mutex_enter(&(log_sys->mutex));
+ oldest_lsn = buf_pool_get_oldest_modification();
+ ib_uint64_t lsn = log_sys->lsn;
+ mutex_exit(&(log_sys->mutex));
+
+ if(oldest_lsn)
+ fprintf(stderr,
+ "InnoDB flush: age pct: %lu, lsn progress: %lu\n",
+ (lsn - oldest_lsn) * 100 / log_sys->max_checkpoint_age,
+ lsn - lsn_old);
+ */
}
skip_sleep = FALSE;
@@ -2437,14 +2597,15 @@ loop:
+ log_sys->n_pending_writes;
n_ios = log_sys->n_log_ios + buf_pool->n_pages_read
+ buf_pool->n_pages_written;
- if (n_pend_ios < 3 && (n_ios - n_ios_old < PCT_IO(5))) {
+ if (n_pend_ios < PCT_IO(3) && (n_ios - n_ios_old < PCT_IO(5))) {
srv_main_thread_op_info = "doing insert buffer merge";
ibuf_contract_for_n_pages(
TRUE, PCT_IBUF_IO((srv_insert_buffer_batch_size / 4)));
srv_main_thread_op_info = "flushing log";
- log_buffer_flush_to_disk();
+ /* No fsync when srv_flush_log_at_trx_commit != 1 */
+ log_buffer_flush_maybe_sync();
}
if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
@@ -2462,13 +2623,16 @@ loop:
iteration of this loop. */
skip_sleep = TRUE;
- } else if (srv_adaptive_checkpoint) {
+ mutex_enter(&(log_sys->mutex));
+ lsn_old = log_sys->lsn;
+ mutex_exit(&(log_sys->mutex));
+ } else if (srv_adaptive_checkpoint == 1) {
/* Try to keep modified age not to exceed
max_checkpoint_age * 7/8 line */
mutex_enter(&(log_sys->mutex));
-
+ lsn_old = log_sys->lsn;
oldest_lsn = buf_pool_get_oldest_modification();
if (oldest_lsn == 0) {
@@ -2504,7 +2668,93 @@ loop:
mutex_exit(&(log_sys->mutex));
}
}
+ } else if (srv_adaptive_checkpoint == 2) {
+ /* Try to keep modified age not to exceed
+ max_checkpoint_age * 7/8 line */
+
+ mutex_enter(&(log_sys->mutex));
+
+ oldest_lsn = buf_pool_get_oldest_modification();
+ if (oldest_lsn == 0) {
+ lsn_old = log_sys->lsn;
+ mutex_exit(&(log_sys->mutex));
+
+ } else {
+ if ((log_sys->lsn - oldest_lsn)
+ > (log_sys->max_checkpoint_age) - ((log_sys->max_checkpoint_age) / 8)) {
+ /* LOG_POOL_PREFLUSH_RATIO_ASYNC is exceeded. */
+ /* We should not flush from here. */
+ lsn_old = log_sys->lsn;
+ mutex_exit(&(log_sys->mutex));
+ } else if ((log_sys->lsn - oldest_lsn)
+ > (log_sys->max_checkpoint_age)/2 ) {
+
+ /* defence line (max_checkpoint_age * 1/2) */
+ ib_uint64_t lsn = log_sys->lsn;
+
+ mutex_exit(&(log_sys->mutex));
+
+ ib_uint64_t level, bpl;
+ buf_page_t* bpage;
+
+ mutex_enter(&flush_list_mutex);
+
+ level = 0;
+ bpage = UT_LIST_GET_FIRST(buf_pool->flush_list);
+
+ while (bpage != NULL) {
+ ib_uint64_t oldest_modification = bpage->oldest_modification;
+ if (oldest_modification != 0) {
+ level += log_sys->max_checkpoint_age
+ - (lsn - oldest_modification);
+ }
+ bpage = UT_LIST_GET_NEXT(flush_list, bpage);
+ }
+
+ if (level) {
+ bpl = ((ib_uint64_t) UT_LIST_GET_LEN(buf_pool->flush_list)
+ * UT_LIST_GET_LEN(buf_pool->flush_list)
+ * (lsn - lsn_old)) / level;
+ } else {
+ bpl = 0;
+ }
+
+ mutex_exit(&flush_list_mutex);
+
+ if (!srv_use_doublewrite_buf) {
+ /* flush is faster than when doublewrite */
+ bpl = (bpl * 3) / 4;
+ }
+
+ if (bpl) {
+retry_flush_batch:
+ n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST,
+ bpl,
+ oldest_lsn + (lsn - lsn_old));
+ if (n_pages_flushed == ULINT_UNDEFINED) {
+ os_thread_sleep(5000);
+ goto retry_flush_batch;
+ }
+ }
+
+ lsn_old = lsn;
+ /*
+ fprintf(stderr,
+ "InnoDB flush: age pct: %lu, lsn progress: %lu, blocks to flush:%llu\n",
+ (lsn - oldest_lsn) * 100 / log_sys->max_checkpoint_age,
+ lsn - lsn_old, bpl);
+ */
+ } else {
+ lsn_old = log_sys->lsn;
+ mutex_exit(&(log_sys->mutex));
+ }
+ }
+
+ } else {
+ mutex_enter(&(log_sys->mutex));
+ lsn_old = log_sys->lsn;
+ mutex_exit(&(log_sys->mutex));
}
if (srv_activity_count == old_activity_count) {
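
Note on the hunk above: with srv_adaptive_checkpoint == 2 ("estimate"), the number of blocks to flush is computed from the flush list and the LSN progress since the previous iteration. The sketch below restates that arithmetic as a standalone function. The name blocks_to_flush and the array representation of the flush list are hypothetical, and n here is the array length, standing in for UT_LIST_GET_LEN(buf_pool->flush_list).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Estimate how many blocks to flush in this iteration.
oldest_mod[i] is the oldest_modification LSN of the i-th page in the
flush list (0 means the page is skipped in the sum). */
static uint64_t blocks_to_flush(const uint64_t *oldest_mod, size_t n,
				uint64_t lsn, uint64_t lsn_old,
				uint64_t max_checkpoint_age,
				int use_doublewrite)
{
	uint64_t	level = 0;
	uint64_t	bpl;
	size_t		i;

	assert(lsn >= lsn_old);

	/* Sum of the remaining headroom of every dirty page. In the patch
	this branch runs only while the oldest page is younger than 7/8 of
	max_checkpoint_age, so each term is positive. */
	for (i = 0; i < n; i++) {
		if (oldest_mod[i] != 0) {
			level += max_checkpoint_age - (lsn - oldest_mod[i]);
		}
	}

	if (level == 0) {
		return 0;
	}

	/* n * n * progress / level, i.e. n pages scaled by the ratio of
	LSN progress to the average headroom (level / n). */
	bpl = ((uint64_t) n * n * (lsn - lsn_old)) / level;

	if (!use_doublewrite) {
		/* Flushing is faster without the doublewrite buffer. */
		bpl = (bpl * 3) / 4;
	}
	return bpl;
}
```

As a worked example under these assumptions: with max_checkpoint_age = 1000, lsn = 10000, lsn_old = 9000 and two dirty pages at LSN 9400 and 9500, level is 400 + 500 = 900 and the estimate is 4 * 1000 / 900 = 4 blocks, or 3 when doublewrite is disabled.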
@@ -2537,7 +2787,8 @@ loop:
buf_flush_batch(BUF_FLUSH_LIST, PCT_IO(100), IB_ULONGLONG_MAX);
srv_main_thread_op_info = "flushing log";
- log_buffer_flush_to_disk();
+ /* No fsync when srv_flush_log_at_trx_commit != 1 */
+ log_buffer_flush_maybe_sync();
}
/* We run a batch of insert buffer merge every 10 seconds,
@@ -2547,7 +2798,8 @@ loop:
ibuf_contract_for_n_pages(TRUE, PCT_IBUF_IO((srv_insert_buffer_batch_size / 4)));
srv_main_thread_op_info = "flushing log";
- log_buffer_flush_to_disk();
+ /* No fsync when srv_flush_log_at_trx_commit != 1 */
+ log_buffer_flush_maybe_sync();
/* We run a full purge every 10 seconds, even if the server
were active */
@@ -2718,7 +2970,14 @@ flush_loop:
srv_main_thread_op_info = "flushing log";
- log_buffer_flush_to_disk();
+ current_time = time(NULL);
+ if (difftime(current_time, last_flush_time) > 1) {
+ log_buffer_flush_to_disk();
+ last_flush_time = current_time;
+ } else {
+ /* No fsync when srv_flush_log_at_trx_commit != 1 */
+ log_buffer_flush_maybe_sync();
+ }
srv_main_thread_op_info = "making checkpoint";
=== modified file 'storage/xtradb/srv/srv0start.c'
--- a/storage/xtradb/srv/srv0start.c 2009-06-09 15:08:46 +0000
+++ b/storage/xtradb/srv/srv0start.c 2009-08-03 20:09:53 +0000
@@ -1269,7 +1269,7 @@ innobase_start_or_create_for_mysql(void)
os_aio_init(8 * SRV_N_PENDING_IOS_PER_THREAD
* srv_n_file_io_threads,
srv_n_read_io_threads, srv_n_write_io_threads,
- SRV_MAX_N_PENDING_SYNC_IOS * 8);
+ SRV_MAX_N_PENDING_SYNC_IOS);
} else {
os_aio_init(SRV_N_PENDING_IOS_PER_THREAD
* srv_n_file_io_threads,
=== modified file 'storage/xtradb/sync/sync0sync.c'
--- a/storage/xtradb/sync/sync0sync.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/sync/sync0sync.c 2009-06-25 01:43:25 +0000
@@ -1081,6 +1081,12 @@ sync_thread_add_level(
case SYNC_TRX_SYS_HEADER:
case SYNC_FILE_FORMAT_TAG:
case SYNC_DOUBLEWRITE:
+ case SYNC_BUF_LRU_LIST:
+ case SYNC_BUF_FLUSH_LIST:
+ case SYNC_BUF_PAGE_HASH:
+ case SYNC_BUF_FREE_LIST:
+ case SYNC_BUF_ZIP_FREE:
+ case SYNC_BUF_ZIP_HASH:
case SYNC_BUF_POOL:
case SYNC_SEARCH_SYS:
case SYNC_SEARCH_SYS_CONF:
@@ -1107,7 +1113,7 @@ sync_thread_add_level(
/* Either the thread must own the buffer pool mutex
(buf_pool_mutex), or it is allowed to latch only ONE
buffer block (block->mutex or buf_pool_zip_mutex). */
- ut_a((sync_thread_levels_contain(array, SYNC_BUF_POOL)
+ ut_a((sync_thread_levels_contain(array, SYNC_BUF_LRU_LIST)
&& sync_thread_levels_g(array, SYNC_BUF_BLOCK - 1))
|| sync_thread_levels_g(array, SYNC_BUF_BLOCK));
break;
=== modified file 'storage/xtradb/ut/ut0ut.c'
--- a/storage/xtradb/ut/ut0ut.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/ut/ut0ut.c 2009-06-25 01:43:25 +0000
@@ -372,6 +372,8 @@ ut_get_year_month_day(
/*****************************************************************
Runs an idle loop on CPU. The argument gives the desired delay
in microseconds on 100 MHz Pentium + Visual C++. */
+extern ulint srv_spins_microsec;
+
UNIV_INTERN
ulint
ut_delay(
@@ -383,7 +385,11 @@ ut_delay(
j = 0;
- for (i = 0; i < delay * 50; i++) {
+ for (i = 0; i < delay * srv_spins_microsec; i++) {
+#if (defined (__i386__) || defined (__x86_64__)) && defined (__GNUC__)
+ /* it is equal to the instruction 'pause' */
+ __asm__ __volatile__ ("rep; nop");
+#endif
j += i;
}
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 29 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Guest - Wed, 29 Jul 2009, 21:41)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.26011 2009-07-29 21:41:04.000000000 +0300
+++ /tmp/wklog.17.new.26011 2009-07-29 21:41:04.000000000 +0300
@@ -2,163 +2,146 @@
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
-1. Conditions for removal
-1.1 Quick check if there are candidates
-2. Removal operation properties
-3. Removal operation
-4. User interface
-5. Tests and benchmarks
-6. Todo, issues to resolve
-6.1 To resolve
-6.2 Resolved
-7. Additional issues
+1. Elimination criteria
+2. No outside references check
+2.1 Quick check if there are tables with no outside references
+3. One-match check
+3.1 Functional dependency source #1: Potential eq_ref access
+3.2 Functional dependency source #2: col2=func(col1)
+3.3 Functional dependency source #3: One or zero records in the table
+3.4 Functional dependency check implementation
+3.4.1 Equality collection: Option1
+3.4.2 Equality collection: Option2
+3.4.3 Functional dependency propagation - option 1
+3.4.4 Functional dependency propagation - option 2
+4. Removal operation properties
+5. Removal operation
+6. User interface
+6.1 @@optimizer_switch flag
+6.2 EXPLAIN [EXTENDED]
+7. Miscellaneous adjustments
+7.1 Fix used_tables() of aggregate functions
+7.2 Make subquery predicates collect their outer references
+8. Other concerns
+8.1 Relationship with outer->inner joins converter
+8.2 Relationship with prepared statements
+8.3 Relationship with constant table detection
+9. Tests and benchmarks
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
-1. Conditions for removal
--------------------------
-We can eliminate an inner side of outer join if:
-1. For each record combination of outer tables, it will always produce
- exactly one record.
-2. There are no references to columns of the inner tables anywhere else in
+1. Elimination criteria
+=======================
+We can eliminate the inner side of an outer join nest if:
+
+1. There are no references to columns of the inner tables anywhere else in
the query.
+2. For each record combination of outer tables, it will always produce
+ exactly one matching record combination.
+
+Most of the effort in this WL entry goes into checking these two conditions.
-#1 means that every table inside the outer join nest is:
- - is a constant table:
- = because it can be accessed via eq_ref(const) access, or
- = it is a zero-rows or one-row MyISAM-like table [MARK1]
- - has an eq_ref access method candidate.
-
-#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
- GROUP BY and HAVING do not refer to the inner tables of the outer join
- nest.
-
-1.1 Quick check if there are candidates
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Before we start to enumerate join nests, here is a quick way to check if
-there *can be* something to be removed:
+2. No outside references check
+==============================
+Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
+outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
+tables of the outer join nest we're trying to remove.
+
+For multi-table UPDATE/DELETE we also must not remove tables that we're
+updating/deleting from or tables that are used in UPDATE's SET clause.
+
+2.1 Quick check if there are tables with no outside references
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start searching for outer join nests that could be eliminated,
+we'll do a quick and cheap check of whether there could possibly be anything
+to eliminate:
- if ((tables used in select_list |
+ if (there are outer joins &&
+ (tables used in select_list |
tables used in group/order by UNION |
- tables used in where) != bitmap_of_all_tables)
+ tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
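
A minimal C sketch of this quick check, assuming each clause's referenced
tables are already available as a 64-bit mask. The type table_map_t and the
function name may_have_eliminable_tables are hypothetical and are not taken
from the server source. The sketch masks the union with the set of join
tables before comparing, which is slightly stricter than the plain
inequality in the pseudocode:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the server's table bitmap type. */
typedef uint64_t table_map_t;

/*
  Cheap pre-check: elimination is worth attempting only if the query has
  outer joins and at least one join table is referenced by none of the
  select list, GROUP BY/ORDER BY, or WHERE.
*/
static bool may_have_eliminable_tables(bool has_outer_joins,
                                       table_map_t used_in_select_list,
                                       table_map_t used_in_group_order,
                                       table_map_t used_in_where,
                                       table_map_t all_join_tables)
{
  table_map_t used = used_in_select_list | used_in_group_order |
                     used_in_where;

  /* Some join table is missing from the union of referenced tables. */
  return has_outer_joins && (used & all_join_tables) != all_join_tables;
}
```

A "true" result only means that a candidate may exist; the one-match check
still has to succeed before anything is removed.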
-2. Removal operation properties
--------------------------------
-* There is always one way to remove (no choice to remove either this or that)
-* It is always better to remove as much tables as possible (at least within
- our cost model).
-Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
-3. Removal operation
---------------------
-* Remove the outer join nest's nested join structure (i.e. get the
- outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
- $OJ->embedding->nested_join. Update table_map's of all ancestor nested
- joins). [MARK2]
+3. One-match check
+==================
+We can eliminate inner side of outer join if it will always generate exactly
+one matching record combination.
-* Move the tables and their JOIN_TABs to front like it is done with const
- tables, with exception that if eliminated outer join nest was within
- another outer join nest, that shouldn't prevent us from moving away the
- eliminated tables.
+By definition of OUTER JOIN, a NULL-complemented record combination will be
+generated when the inner side of outer join has not produced any matches.
-* Update join->table_count and all-join-tables bitmap.
+What remains to be checked is that there is no possibility that the inner side of
+the outer join could produce more than one matching record combination.
-* That's it. Nothing else?
+We'll refer to one-match property as "functional dependency":
-4. User interface
------------------
-* We'll add an @@optimizer switch flag for table elimination. Tentative
- name: 'table_elimination'.
- (Note ^^ utility of the above questioned ^, as table elimination can never
- be worse than no elimination. We're leaning towards not adding the flag)
-
-* EXPLAIN will not show the removed tables at all. This will allow to check
- if tables were removed, and also will behave nicely with anchor model and
- VIEWs: stuff that user doesn't care about just won't be there.
+- An outer join nest is functionally dependent [wrt outer tables] if it will
+ produce one matching record combination per each record combination of
+ outer tables
-5. Tests and benchmarks
------------------------
-Create a benchmark in sql-bench which checks if the DBMS has table
-elimination.
-[According to Monty] Run
- - queries that would use elimination
- - queries that are very similar to one above (so that they would have same
- QEP, execution cost, etc) but cannot use table elimination.
-then compare run times and make a conclusion about whether dbms supports table
-elimination.
+- A table is functionally dependent wrt certain set of dependency tables, if
+ record combination of dependency tables uniquely identifies zero or one
+ matching record in the table
-6. Todo, issues to resolve
---------------------------
+- Definitions of functional dependency of keys (=column tuples) and columns are
+ apparent.
-6.1 To resolve
-~~~~~~~~~~~~~~
-- Relationship with prepared statements.
- On one hand, it's natural to desire to make table elimination a
- once-per-statement operation, like outer->inner join conversion. We'll have
- to limit the applicability by removing [MARK1] as that can change during
- lifetime of the statement.
-
- The other option is to do table elimination every time. This will require to
- rework operation [MARK2] to be undoable.
-
- I'm leaning towards doing the former. With anchor modeling, it is unlikely
- that we'll meet outer joins which have N inner tables of which some are 1-row
- MyISAM tables that do not have primary key.
-
-6.2 Resolved
-~~~~~~~~~~~~
-* outer->inner join conversion is not a problem for table elimination.
- We make outer->inner conversions based on predicates in WHERE. If the WHERE
- referred to an inner table (requirement for OJ->IJ conversion) then table
- elimination would not be applicable anyway.
-
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
+Our goal is to prove that the entire join nest is functionally-dependent.
-* Aggregate functions used to report that they depend on all tables, that is,
+A join nest is functionally dependent (on the outside tables) if each of its
+elements (those can be either base tables or join nests) is functionally
+dependent.
- item_agg_func->used_tables() == (1ULL << join->tables) - 1
+Functional dependency is transitive: if table A is f-dependent on the outer
+tables and table B is f-dependent on {A, outer_tables} then B is functionally
+dependent on the outer tables.
+
+Subsequent sections list cases when we can declare a table to be
+functionally-dependent.
+
+3.1 Functional dependency source #1: Potential eq_ref access
+------------------------------------------------------------
+This is the most practically-important case. Taking the example from the HLD
+of this WL entry:
+
+ select
+ A.colA
+ from
+ tableA A
+ left outer join
+ tableB B
+ on
+ B.id = A.id;
- always. Fixed it, now aggregate function reports it depends on
- tables that its arguments depend on. In particular, COUNT(*) reports
- that it depends on no tables (item_count_star->used_tables()==0).
- One consequence of that is that "item->used_tables()==0" is not
- equivalent to "item->const_item()==true" anymore (not sure if it's
- "anymore" or this has been already happening).
-
-* EXPLAIN EXTENDED warning text was generated after the JOIN object has
- been discarded. This didn't allow to use information about join plan
- when printing the warning. Fixed this by keeping the JOIN objects until
- we've printed the warning (have also an intent to remove the const
- tables from the join output).
-
-7. Additional issues
---------------------
-* We remove ON clauses within outer join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
- Yes. Current approach: when removing an outer join nest, walk the ON clause
- and mark subselects as eliminated. Then let EXPLAIN code check if the
- SELECT was eliminated before the printing (EXPLAIN is generated by doing
- a recursive descent, so the check will also cause children of eliminated
- selects not to be printed)
-
-* Table elimination is performed after constant table detection (but before
- the range analysis). Constant tables are technically different from
- eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
- Considering we've already done the join_read_const_table() call, is there any
- real difference between constant table and eliminated one? If there is, should
- we mark const tables also as eliminated?
- from user/EXPLAIN point of view: no. constant table is the one that we read
- one record from. eliminated table is the one that we don't access at all.
+and generalizing it: a table TBL is functionally-dependent if the ON
+expression allows constructing a potential eq_ref access to table TBL that
+uses only outer or functionally-dependent tables.
+
+In other words: table TBL will have one match if the ON expression can be
+converted into this form
+
+ TBL.unique_key=func(one_match_tables) AND .. remainder ...
+
+(with appropriate extension for multi-part keys), where
+
+ one_match_tables= {
+ tables that are not on the inner side of the outer join in question, and
+ functionally dependent tables
+ }
+
+Note that this will cover constant tables, except those that are constant because
+they have 0/1 record or are partitioned and have no used partitions.
+
+
+3.2 Functional dependency source #2: col2=func(col1)
+----------------------------------------------------
+This comes from the second example in the HLD:
-* What is described above will not be able to eliminate this outer join
create unique index idx on tableB (id, fromDate);
...
left outer join
@@ -169,32 +152,331 @@
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
- This is because condition "B.fromDate= func(tableB)" cannot be used.
- Reason#1: update_ref_and_keys() does not consider such conditions to
- be of any use (and indeed they are not usable for ref access)
- so they are not put into KEYUSE array.
- Reason#2: even if they were put there, we would need to be able to tell
- between predicates like
- B.fromDate= func(B.id) // guarantees only one matching row as
- // B.id is already bound by B.id=A.id
- // hence B.fromDate becomes bound too.
- and
- "B.fromDate= func(B.*)" // Can potentially have many matching
- // records.
- We need to
- - Have update_ref_and_keys() create KEYUSE elements for such equalities
- - Have eliminate_tables() and friends make a more accurate check.
- The right check is to check whether all parts of a unique key are bound.
- If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
- keypartY to be bound.
- The difficulty here is that correlated subquery predicate cannot tell what
- columns it depends on (it only remembers tables).
- Traversing the predicate is expensive and complicated.
- We're leaning towards making each subquery predicate have a List<Item> with
- items that
- - are in the current select
- - and it depends on.
- This list will be useful in certain other subquery optimizations as well,
- it is cheap to collect it in fix_fields() phase, so it will be collected
- for every subquery predicate.
+Here it is apparent that tableB can be eliminated. It is not possible to
+construct eq_ref access to tableB, though, because for the second part of the
+unique key (fromDate column) we only got a condition in this form:
+
+ B.fromDate= func(tableB)
+
+(we write "func(tableB)" because ref optimizer can only determine which tables
+the right part of the equality depends on).
+
+In the general case, an equality like this doesn't guarantee functional dependency.
+For example, if func() == { return fromDate;}, i.e. the ON expression is
+
+ ... ON B.id = A.id and B.fromDate = B.fromDate
+
+then that would allow table B to have multiple matches per record of table A.
+
+In order to be able to distinguish between these two cases, we'll need to go
+down to column level:
+
+- A table is functionally dependent if it has a unique key that's functionally
+ dependent
+
+- A unique key is functionally dependent when all of its columns are
+ functionally dependent
+
+- A table column is functionally dependent if the ON clause allows to extract
+ an AND-part in this form:
+
+ tbl.column = f(functionally-dependent columns or columns of outer tables)
+
+3.3 Functional dependency source #3: One or zero records in the table
+---------------------------------------------------------------------
+A table with one or zero records cannot generate more than one matching
+record. This source is of lesser importance as one/zero-record tables are only
+MyISAM tables.
+
+3.4 Functional dependency check implementation
+----------------------------------------------
+As shown above, we need something similar to KEYUSE structures, but not
+exactly that (we need things that current ref optimizer considers unusable and
+don't need things that it considers usable).
+
+3.4.1 Equality collection: Option1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We could
+- extend KEYUSE structures to store all kinds of equalities we need
+- change update_ref_and_keys() and co. to collect equalities both for ref
+ access and for table elimination
+ = [possibly] Improve [eq_]ref access to be able to use equalities in
+ form keypart2=func(keypart1)
+- process the KEYUSE array both by table elimination and by ref access
+ optimizer.
+
++ This requires less effort.
+- Code will have to be changed all over sql_select.cc
+- update_ref_and_keys() and co. already do several unrelated things. Hooking
+ up table elimination will make it even worse.
+
+3.4.2 Equality collection: Option2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Alternatively, we could process the WHERE clause totally on our own.
++ Table elimination is standalone and easy to detach module.
+- Some code duplication with update_ref_and_keys() and co.
+
+Having got the equalities, we'll need to propagate the functional dependency
+property to unique keys, tables and, ultimately, join nests.
+
+3.4.3 Functional dependency propagation - option 1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Borrow the approach used in constant table detection code:
+
+ do
+ {
+ converted= FALSE;
+ for each table T in join nest
+ {
+ if (check_if_functionally_dependent(T))
+ converted= TRUE;
+ }
+ } while (converted == TRUE);
+
+ check_if_functionally_dependent(T)
+ {
+ if (T has eq_ref access based on func_dep_tables)
+ return TRUE;
+
+ Apply the same do-while loop-based approach to available equalities
+ T.column1=func(other columns)
+ to spread the set of functionally-dependent columns. The goal is to get
+ all columns of a certain unique key to be bound.
+ }
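
The do-while loop above can be sketched in C at table level. This is only an
illustration under simplifying assumptions: every inner table carries a list
of eq_ref candidates, each described by the bitmap of tables its key values
are computed from, and column-level equalities are left out. The names
inner_table, key_deps and find_dependent_tables are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t table_map_t;   /* hypothetical table bitmap type */

#define MAX_KEYS 4

/* One inner table of the join nest (simplified, hypothetical model). */
struct inner_table {
  table_map_t bit;                 /* this table's bit in the bitmap */
  int n_keys;                      /* number of eq_ref candidates */
  table_map_t key_deps[MAX_KEYS];  /* tables each candidate depends on */
};

/*
  Repeat passes over the tables until a pass marks nothing new.
  Returns the bitmap of inner tables proved functionally dependent.
*/
static table_map_t find_dependent_tables(const struct inner_table *tables,
                                         int n_tables,
                                         table_map_t outer_tables)
{
  table_map_t bound = outer_tables;
  bool converted;

  do {
    converted = false;
    for (int i = 0; i < n_tables; i++) {
      if (bound & tables[i].bit)
        continue;                  /* already proved dependent */
      for (int k = 0; k < tables[i].n_keys; k++) {
        /* A candidate is usable if it needs only bound tables. */
        if ((tables[i].key_deps[k] & ~bound) == 0) {
          bound |= tables[i].bit;
          converted = true;
          break;
        }
      }
    }
  } while (converted);

  return bound & ~outer_tables;
}
```

Under this model the nest can be eliminated when the returned bitmap covers
every inner table of the nest.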
+
+
+3.4.4 Functional dependency propagation - option 2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Analyze the ON expression(s) and build a list of
+
+ tbl.field = expr(...)
+
+equalities. tbl here is a table that belongs to a join nest that could
+potentially be eliminated.
+
+Besides those, add to the list:
+ - An element for each unique key in the table that needs to be eliminated
+ - An element for each table that needs to be eliminated
+ - An element for each join nest that can be eliminated (i.e. has no
+ references from outside).
+
+Then, set up "reverse dependencies": each element should have pointers to
+elements that are functionally dependent on it:
+
+- "tbl.field=expr(...)" equality is functionally dependent on all fields that
+ are used in "expr(...)" (here we take into account only fields that belong
+ to tables that can potentially be eliminated).
+- a unique key is dependent on all of its components
+- a table is dependent on all of its unique keys
+- a join nest is dependent on all tables that it contains
+
+These pointers are stored in form of one bitmap, such that:
+
+ "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
+
+Each object also stores a number of dependencies it needs to be satisfied
+before it itself is satisfied:
+
+- "tbl.field=expr(...)" needs all its underlying fields (if a field is
+ referenced many times it is counted only once)
+
+- a unique key needs all of its key parts
+
+- a table needs only one of its unique keys
+
+- a join nest needs all of its tables
+
+(TODO: so what do we do when we've marked a table as constant? We'll need to
+update the "field=expr(....)" elements that use fields of that table. And the
+problem is that we won't know how much to decrement from the counters of those
+elements.
+
+Solution#1: switch to table_map() based approach.
+Solution#2: introduce separate elements for each involved field.
+ field will depend on its table,
+ "field=expr" will depend on fields.
+)
+
+Besides the above, let each element have a pointer to another element, so that
+we can have a linked list of elements.
+
+After the above structures have been created, we start the main algorithm.
+
+The first step is to create a list of functionally-dependent elements. We walk
+across array of dependencies and mark those elements that are already bound
+(i.e. their dependencies are satisfied). At the moment those immediately-bound
+are only "field=expr" dependencies that don't refer to any columns that are
+not bound.
+
+The second step is the loop
+
+ while (bound_list is not empty)
+ {
+ Take the first bound element F off the list.
+ Use the bitmap to find out what other elements depended on it
+ for each such element E
+ {
+ if (E becomes bound after F is bound)
+ add E to the list;
+ }
+ }
+
+The last step is to walk through elements that represent the join nests. Those
+that are bound can be eliminated.
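
A possible C sketch of this propagation, under stated assumptions: the
dependency bitmap is kept as a plain bool array rather than packed bits, the
bound list is an index queue rather than a linked list, and fields get their
own elements (Solution#2 from the TODO above). The names dep_graph,
unsatisfied and propagate are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ELEMS 32

/* Equalities, fields, unique keys, tables and join nests are all elements. */
struct dep_graph {
  int n;                                /* number of elements in use */
  bool depends[MAX_ELEMS * MAX_ELEMS];  /* depends[x*n + y]: x depends on y */
  int unsatisfied[MAX_ELEMS];           /* dependencies x still waits for */
  bool bound[MAX_ELEMS];                /* filled in by propagate() */
};

/*
  The caller sets unsatisfied[x] to the number of dependencies x needs:
  all of them for equalities, keys and join nests, but only 1 for a table
  (any one unique key is enough). A table without unique keys must get a
  non-zero count so that it is never treated as immediately bound.
*/
static void propagate(struct dep_graph *g)
{
  int queue[MAX_ELEMS];  /* each element is enqueued at most once */
  int head = 0, tail = 0;

  /* Step 1: elements that wait for nothing are bound right away. */
  for (int x = 0; x < g->n; x++) {
    g->bound[x] = (g->unsatisfied[x] == 0);
    if (g->bound[x])
      queue[tail++] = x;
  }

  /* Step 2: binding f may satisfy the elements that depend on it. */
  while (head < tail) {
    int f = queue[head++];
    for (int e = 0; e < g->n; e++) {
      if (g->bound[e] || !g->depends[e * g->n + f])
        continue;
      if (--g->unsatisfied[e] <= 0) {
        g->bound[e] = true;
        queue[tail++] = e;
      }
    }
  }
}
```

After propagate() returns, the join nest elements whose bound flag is set
correspond to nests that can be eliminated.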
+
+4. Removal operation properties
+===============================
+* There is always one way to remove (no choice to remove either this or that)
+* It is always better to remove as many tables as possible (at least within
+ our cost model).
+Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
+
+
+5. Removal operation
+====================
+(This depends a lot on whether we make table elimination a one-off rewrite or
+conditional)
+
+At the moment table elimination is re-done for each join re-execution, hence
+the removal operation is designed not to modify any statement's permanent
+members.
+
+* Remove the outer join nest's nested join structure (i.e. get the
+ outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
+ $OJ->embedding->nested_join. Update table_map's of all ancestor nested
+ joins). [MARK2]
+
+* Move the tables and their JOIN_TABs to the front of join order, like it is
+ done with const tables, with exception that if eliminated outer join nest
+ was within another outer join nest, that shouldn't prevent us from moving
+ away the eliminated tables.
+
+* Update join->table_count and all-join-tables bitmap.
+ ^ TODO: not true anymore ^
+
+* That's it. Nothing else?
+
+6. User interface
+=================
+
+6.1 @@optimizer_switch flag
+---------------------------
+Argument against adding the flag:
+* It is always better to perform table elimination than not to do it.
+
+Arguments for the flag:
+* It is always theoretically possible that the new code will cause unintended
+ slowdowns.
+* Having the flag is useful for QA and comparative benchmarking.
+
+Decision so far: add the flag under #ifdef. Make the flag be present in debug
+builds.
+
+6.2 EXPLAIN [EXTENDED]
+----------------------
+There are two possible options:
+1. Show eliminated tables, like we do with const tables.
+2. Do not show eliminated tables.
+
+We chose option 2, because:
+- the table is not accessed at all (besides locking it)
+- it is more natural for anchor model user - when he's querying an anchor-
+ and attributes view, he doesn't care about the unused attributes.
+
+EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
+
+NOTE: Before this WL, the warning text was generated after all JOIN objects
+had been destroyed, so information about the join plan could not be used
+when printing the warning. We've fixed this by keeping the JOIN objects until
+the warning text has been generated.
+
+Table elimination removes inner sides of outer join, and logically the ON
+clause is also removed. If this clause has any subqueries, they will be
+also removed from EXPLAIN output.
+
+An exception to the above is that if we eliminate a derived table, it will
+still be shown in EXPLAIN output. This comes from the fact that the FROM
+subqueries are evaluated before table elimination is invoked.
+TODO: Is the above ok or still remove parts of FROM subqueries?
+
+7. Miscellaneous adjustments
+============================
+
+7.1 Fix used_tables() of aggregate functions
+--------------------------------------------
+Aggregate functions used to report that they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+always. Fixed it, now aggregate function reports that it depends on the
+tables that its arguments depend on. In particular, COUNT(*) reports that it
+depends on no tables (item_count_star->used_tables()==0). One consequence of
+that is that "item->used_tables()==0" is not equivalent to
+"item->const_item()==true" anymore (not sure if it's "anymore" or this has
+been already so for some items).
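
The difference between the two behaviours can be shown with a small C sketch.
It assumes a 64-bit table bitmap; the function names agg_used_tables_old and
agg_used_tables_new are hypothetical and only mirror the description above:

```c
#include <stdint.h>

typedef uint64_t table_map_t;   /* hypothetical table bitmap type */

/* Old behaviour: an aggregate claims to use every table in the join. */
static table_map_t agg_used_tables_old(unsigned n_tables)
{
  /* Guard the shift: shifting a 64-bit value by 64 is undefined in C. */
  return n_tables >= 64 ? ~(table_map_t) 0
                        : (((table_map_t) 1 << n_tables) - 1);
}

/*
  New behaviour: the union of the tables used by the arguments.
  With no arguments, as for COUNT(*), the result is 0.
*/
static table_map_t agg_used_tables_new(const table_map_t *arg_used_tables,
                                       int n_args)
{
  table_map_t used = 0;

  for (int i = 0; i < n_args; i++)
    used |= arg_used_tables[i];
  return used;
}
```

With the old mask every aggregate appeared to reference the inner tables, so
the "no outside references" criterion could never hold for a query that
contained one.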
+
+7.2 Make subquery predicates collect their outer references
+-----------------------------------------------------------
+Per-column functional dependency analysis requires us to take a
+
+ tbl.field = func(...)
+
+equality and tell which columns of which tables are referred from func(...)
+expression. For scalar expressions, this is accomplished by Item::walk()-based
+traversal. It should be reasonably cheap (the only practical Item that can be
+expensive to traverse seems to be a special case of "col IN (const1,const2,
+...)". TODO: check whether we traverse the long list for such items).
+
+For correlated subqueries, traversal can be expensive; it is cheaper to make
+each subquery item have a list of its outer references. The list can be
+collected at fix_fields() stage with very little extra cost, and then it could
+be used for other optimizations.
+
+
+8. Other concerns
+=================
+
+8.1 Relationship with outer->inner joins converter
+--------------------------------------------------
+One could suspect that outer->inner join conversion could get in the way
+of table elimination by changing outer joins (which could be eliminated)
+to inner (which we will not try to eliminate).
+This concern is not valid: we make outer->inner conversions based on
+predicates in WHERE. If the WHERE referred to an inner table (this is a
+requirement for the conversion) then table elimination would not be
+applicable anyway.
+
+8.2 Relationship with prepared statements
+-----------------------------------------
+On one hand, it's natural to desire to make table elimination a
+once-per-statement operation, like outer->inner join conversion. We'll have
+to limit the applicability by removing [MARK1] as that can change during
+lifetime of the statement.
+
+The other option is to do table elimination every time. This will require to
+rework operation [MARK2] to be undoable.
+
+
+8.3 Relationship with constant table detection
+----------------------------------------------
+Table elimination is performed after constant table detection (but before
+the range analysis). Constant tables are technically different from
+eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+Considering we've already done the join_read_const_table() call, is there any
+real difference between constant table and eliminated one? If there is, should
+we mark const tables also as eliminated?
+From the user/EXPLAIN point of view: no. A constant table is one that we read
+one record from. An eliminated table is one that we don't access at all.
+TODO
+
+9. Tests and benchmarks
+=======================
+Create a benchmark in sql-bench which checks if the DBMS has table
+elimination.
+[According to Monty] Run
+ - query Q1 that would use elimination
+ - query Q2 that is very similar to Q1 (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether the used dbms
+supports table elimination.
-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
Small dent gone
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't access at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
------------------------------------------------------------
-=-=(View All Progress Notes, 26 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Table elimination removes tables from the execution plan when it is
unnecessary to include them. This can, of course, only happen under the
right circumstances. Consider, for example, the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met, the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB has a unique constraint on the column
B.id, for example (as is often the case) a primary key. In this situation
we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
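The row-count argument above can be checked with a small simulation. This is
only an illustrative sketch, not server code: left_join_row_count is a
hypothetical helper, and each table is reduced to the list of its id values.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: number of rows produced by
//   tableA A LEFT OUTER JOIN tableB B ON B.id = A.id
// when each table is modelled as a list of id values.
std::size_t left_join_row_count(const std::vector<int>& a_ids,
                                const std::vector<int>& b_ids)
{
  std::unordered_map<int, std::size_t> b_matches;  // id -> number of B rows
  for (int id : b_ids)
    ++b_matches[id];

  std::size_t rows= 0;
  for (int id : a_ids)
  {
    auto it= b_matches.find(id);
    // No match: one NULL-complemented row. Otherwise: one row per match.
    rows+= (it == b_matches.end()) ? 1 : it->second;
  }
  return rows;
}
```

When the values in b_ids are unique, the result always equals a_ids.size(),
which is why the join cannot change the row count and can be dropped if no
column of tableB is selected.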
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is in the
lp:~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Elimination criteria
2. No outside references check
2.1 Quick check if there are tables with no outside references
3. One-match check
3.1 Functional dependency source #1: Potential eq_ref access
3.2 Functional dependency source #2: col2=func(col1)
3.3 Functional dependency source #3: One or zero records in the table
3.4 Functional dependency check implementation
3.4.1 Equality collection: Option1
3.4.2 Equality collection: Option2
3.4.3 Functional dependency propagation - option 1
3.4.4 Functional dependency propagation - option 2
4. Removal operation properties
5. Removal operation
6. User interface
6.1 @@optimizer_switch flag
6.2 EXPLAIN [EXTENDED]
7. Miscellaneous adjustments
7.1 Fix used_tables() of aggregate functions
7.2 Make subquery predicates collect their outer references
8. Other concerns
8.1 Relationship with outer->inner joins converter
8.2 Relationship with prepared statements
8.3 Relationship with constant table detection
9. Tests and benchmarks
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Elimination criteria
=======================
We can eliminate the inner side of an outer join nest if:
1. There are no references to columns of the inner tables anywhere else in
the query.
2. For each record combination of outer tables, it will always produce
exactly one matching record combination.
Most of the effort in this WL entry goes into checking these two conditions.
2. No outside references check
==============================
Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
tables of the outer join nest we're trying to remove.
For multi-table UPDATE/DELETE we also must not remove tables that we're
updating/deleting from or tables that are used in UPDATE's SET clause.
2.1 Quick check if there are tables with no outside references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start searching for outer join nests that could be eliminated,
we'll do a quick and cheap check of whether there is anything that could
possibly be eliminated:
if (there are outer joins &&
(tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
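The quick check above can be sketched with table bitmaps. This is a minimal
sketch: the function and parameter names are illustrative, and table_map is
modelled here as a plain 64-bit integer with one bit per join table.

```cpp
#include <cstdint>

typedef std::uint64_t table_map;  // assumption: one bit per table in the join

// Illustrative sketch of the quick check; not the server's actual function.
bool may_attempt_table_elimination(bool has_outer_joins,
                                   table_map used_in_select_list,
                                   table_map used_in_group_order_by,
                                   table_map used_in_where,
                                   table_map all_join_tables)
{
  table_map referenced= used_in_select_list |
                        used_in_group_order_by |
                        used_in_where;
  // If every table is referenced outside the ON clauses, there is nothing
  // that could be eliminated and the detailed analysis can be skipped.
  return has_outer_joins && referenced != all_join_tables;
}
```

The check is deliberately one-sided: a true result only means that the more
expensive per-nest analysis is worth running.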
3. One-match check
==================
We can eliminate the inner side of an outer join if it will always generate exactly
one matching record combination.
By definition of OUTER JOIN, a NULL-complemented record combination will be
generated when the inner side of outer join has not produced any matches.
What remains to be checked is that there is no possibility that the inner side of
the outer join could produce more than one matching record combination.
We'll refer to one-match property as "functional dependency":
- An outer join nest is functionally dependent [wrt outer tables] if it will
produce one matching record combination per each record combination of
outer tables
- A table is functionally dependent wrt certain set of dependency tables, if
record combination of dependency tables uniquely identifies zero or one
matching record in the table
- Definitions of functional dependency of keys (=column tuples) and columns are
apparent.
Our goal is to prove that the entire join nest is functionally-dependent.
A join nest is functionally dependent (on the outside tables) if each of its
elements (those can be either base tables or join nests) is functionally
dependent.
Functional dependency is transitive: if table A is f-dependent on the outer
tables and table B is f-dependent on {A, outer_tables} then B is functionally
dependent on the outer tables.
Subsequent sections list cases when we can declare a table to be
functionally-dependent.
3.1 Functional dependency source #1: Potential eq_ref access
------------------------------------------------------------
This is the most practically-important case. Taking the example from the HLS
of this WL entry:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
and generalizing it: a table TBL is functionally-dependent if the ON
expression allows constructing a potential eq_ref access to table TBL that
uses only outer or functionally-dependent tables.
In other words: table TBL will have one match if the ON expression can be
converted into this form
TBL.unique_key=func(one_match_tables) AND .. remainder ...
(with appropriate extension for multi-part keys), where
one_match_tables= {
tables that are not on the inner side of the outer join in question, and
functionally dependent tables
}
Note that this will cover constant tables, except those that are constant because
they have 0/1 record or are partitioned and have no used partitions.
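The criterion above can be sketched at table level as follows. This is a
simplified model under stated assumptions: KeyPartEqualities and
has_potential_eq_ref are hypothetical names, table_map is modelled as a 64-bit
integer, and each ON equality is reduced to the set of tables its right-hand
side uses.

```cpp
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;  // assumption: one bit per table in the join

// One part of a unique key of TBL, together with the tables used by the
// right-hand sides of the ON equalities "TBL.keypart = func(...)".
struct KeyPartEqualities
{
  std::vector<table_map> rhs_tables;  // one entry per equality found
};

// True if every part of the unique key has at least one equality whose
// right-hand side uses only tables from one_match_tables.
bool has_potential_eq_ref(const std::vector<KeyPartEqualities>& unique_key,
                          table_map one_match_tables)
{
  if (unique_key.empty())
    return false;
  for (const KeyPartEqualities& part : unique_key)
  {
    bool bound= false;
    for (table_map rhs : part.rhs_tables)
    {
      if ((rhs & ~one_match_tables) == 0)
      {
        bound= true;
        break;
      }
    }
    if (!bound)
      return false;  // this key part cannot be bound
  }
  return true;
}
```

Because the sketch only knows which tables a right-hand side uses, an equality
whose right side refers to TBL itself is never accepted.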
3.2 Functional dependency source #2: col2=func(col1)
----------------------------------------------------
This comes from the second example in the HLS:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
Here it is apparent that tableB can be eliminated. It is not possible to
construct eq_ref access to tableB, though, because for the second part of the
primary key (fromDate column) we only have a condition in this form:
B.fromDate= func(tableB)
(we write "func(tableB)" because ref optimizer can only determine which tables
the right part of the equality depends on).
In the general case, an equality like this doesn't guarantee functional dependency.
For example, if func() == { return fromDate;}, i.e. the ON expression is
... ON B.id = A.id and B.fromDate = B.fromDate
then that would allow table B to have multiple matches per record of table A.
In order to be able to distinguish between these two cases, we'll need to go
down to column level:
- A table is functionally dependent if it has a unique key that's functionally
dependent
- A unique key is functionally dependent when all of its columns are
functionally dependent
- A table column is functionally dependent if the ON clause allows extracting
an AND-part in this form:
tbl.column = f(functionally-dependent columns or columns of outer tables)
3.3 Functional dependency source #3: One or zero records in the table
---------------------------------------------------------------------
A table with one or zero records cannot generate more than one matching
record. This source is of lesser importance because the one/zero-record case
applies only to MyISAM tables.
3.4 Functional dependency check implementation
----------------------------------------------
As shown above, we need something similar to KEYUSE structures, but not
exactly that (we need things that the current ref optimizer considers unusable
and don't need things that it considers usable).
3.4.1 Equality collection: Option1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We could
- extend KEYUSE structures to store all kinds of equalities we need
- change update_ref_and_keys() and co. to collect equalities both for ref
access and for table elimination
= [possibly] Improve [eq_]ref access to be able to use equalities in
form keypart2=func(keypart1)
- process the KEYUSE array both by table elimination and by ref access
optimizer.
+ This requires less effort.
- Code will have to be changed all over sql_select.cc
- update_ref_and_keys() and co. already do several unrelated things. Hooking
up table elimination will make it even worse.
3.4.2 Equality collection: Option2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alternatively, we could process the WHERE clause totally on our own.
+ Table elimination is a standalone module that is easy to detach.
- Some code duplication with update_ref_and_keys() and co.
Having got the equalities, we'll need to propagate the functional dependency property
to unique keys, tables and, ultimately, join nests.
3.4.3 Functional dependency propagation - option 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Borrow the approach used in constant table detection code:
do
{
converted= FALSE;
for each table T in join nest
{
if (check_if_functionally_dependent(T))
converted= TRUE;
}
} while (converted == TRUE);
check_if_functionally_dependent(T)
{
if (T has eq_ref access based on func_dep_tables)
return TRUE;
Apply the same do-while loop-based approach to available equalities
T.column1=func(other columns)
to spread the set of functionally-dependent columns. The goal is to get
all columns of a certain unique key to be bound.
}
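The column-level part of the loop above can be sketched as a fixed-point
computation. This is a minimal sketch under simplifying assumptions: columns
are identified by small integers, Equality and unique_key_is_bound are
hypothetical names, and the columns of outer tables are passed in as the
initial bound set.

```cpp
#include <set>
#include <vector>

// An ON-clause equality "column = func(depends...)" in the simplified model.
struct Equality
{
  int column;                // left side: a column of an inner table
  std::vector<int> depends;  // columns referenced by the right side
};

// Returns true if all columns of unique_key become bound. The set 'bound'
// initially holds the columns of the outer tables.
bool unique_key_is_bound(const std::vector<int>& unique_key,
                         const std::vector<Equality>& equalities,
                         std::set<int> bound)
{
  bool converted;
  do
  {
    converted= false;
    for (const Equality& eq : equalities)
    {
      if (bound.count(eq.column))
        continue;  // already known to be bound
      bool all_bound= true;
      for (int dep : eq.depends)
      {
        if (!bound.count(dep))
        {
          all_bound= false;
          break;
        }
      }
      if (all_bound)
      {
        bound.insert(eq.column);  // column is now functionally dependent
        converted= true;
      }
    }
  } while (converted);

  for (int col : unique_key)
    if (!bound.count(col))
      return false;
  return true;
}
```

With A.id as the only outer column, the equalities B.id=A.id and
B.fromDate=func(A.id) bind the key (id, fromDate), while
B.fromDate=B.fromDate leaves fromDate unbound, matching the distinction made
in section 3.2.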
3.4.4 Functional dependency propagation - option 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyze the ON expression(s) and build a list of
tbl.field = expr(...)
equalities. tbl here is a table that belongs to a join nest that could
potentially be eliminated.
Besides those, add to the list
- An element for each unique key in the table that needs to be eliminated
- An element for each table that needs to be eliminated
- An element for each join nest that can be eliminated (i.e. has no
references from outside).
Then, set up "reverse dependencies": each element should have pointers to
elements that are functionally dependent on it:
- "tbl.field=expr(...)" equality is functionally dependent on all fields that
are used in "expr(...)" (here we take into account only fields that belong
to tables that can potentially be eliminated).
- a unique key is dependent on all of its components
- a table is dependent on all of its unique keys
- a join nest is dependent on all tables that it contains
These pointers are stored in form of one bitmap, such that:
"X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
Each object also stores a number of dependencies it needs to be satisfied
before it itself is satisfied:
- "tbl.field=expr(...)" needs all its underlying fields (if a field is
referenced many times it is counted only once)
- a unique key needs all of its key parts
- a table needs only one of its unique keys
- a join nest needs all of its tables
(TODO: so what do we do when we've marked a table as constant? We'll need to
update the "field=expr(....)" elements that use fields of that table. And the
problem is that we won't know how much to decrement from the counters of those
elements.
Solution#1: switch to table_map() based approach.
Solution#2: introduce separate elements for each involved field.
field will depend on its table,
"field=expr" will depend on fields.
)
Besides the above, let each element have a pointer to another element, so that
we can have a linked list of elements.
After the above structures have been created, we start the main algorithm.
The first step is to create a list of functionally-dependent elements. We walk
across the array of dependencies and mark those elements that are already bound
(i.e. their dependencies are satisfied). At the moment, the only
immediately-bound elements are "field=expr" dependencies that don't refer to
any columns that are not bound.
The second step is the loop
while (bound_list is not empty)
{
Take the first bound element F off the list.
Use the bitmap to find out what other elements depended on it
for each such element E
{
if (E becomes bound after F is bound)
add E to the list;
}
}
The last step is to walk through elements that represent the join nests. Those
that are bound can be eliminated.
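The bitmap, the per-element counters and the main loop can be sketched
together as follows. This is an illustrative model only: DependencyGraph and
propagate_bound are hypothetical names, elements are plain indexes, and the
caller is assumed to fill in the counters according to the rules listed above
(for example, 1 for a table regardless of how many unique keys it has).

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Illustrative model of the element graph; all names are hypothetical.
struct DependencyGraph
{
  std::size_t n_objects;
  std::vector<bool> bitmap;         // "X depends on Y" == bitmap[X*n_objects + Y]
  std::vector<std::size_t> needed;  // dependencies still to be satisfied
  std::vector<bool> bound;

  explicit DependencyGraph(std::size_t n)
    : n_objects(n), bitmap(n * n, false), needed(n, 0), bound(n, false) {}

  void add_dependency(std::size_t x, std::size_t y)
  {
    bitmap[x * n_objects + y]= true;
  }
};

void propagate_bound(DependencyGraph& g)
{
  // Step 1: elements with no unsatisfied dependencies are bound immediately.
  std::deque<std::size_t> bound_list;
  for (std::size_t i= 0; i < g.n_objects; i++)
  {
    if (g.needed[i] == 0)
    {
      g.bound[i]= true;
      bound_list.push_back(i);
    }
  }
  // Step 2: take bound elements off the list and update their dependents.
  while (!bound_list.empty())
  {
    std::size_t f= bound_list.front();
    bound_list.pop_front();
    for (std::size_t e= 0; e < g.n_objects; e++)
    {
      if (!g.bitmap[e * g.n_objects + f] || g.bound[e])
        continue;  // e does not depend on f, or is already bound
      if (--g.needed[e] == 0)
      {
        g.bound[e]= true;
        bound_list.push_back(e);
      }
    }
  }
}
```

After propagate_bound() returns, the last step is to inspect the bound flag of
the elements that represent join nests. Scanning a whole bitmap column per
bound element keeps the sketch short; it says nothing about how the real
implementation iterates over dependents.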
4. Removal operation properties
===============================
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
5. Removal operation
====================
(This depends a lot on whether we make table elimination a one-off rewrite or
conditional)
At the moment table elimination is re-done for each join re-execution, hence
the removal operation is designed not to modify any statement's permanent
members.
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front of join order, like it is
done with const tables, with exception that if eliminated outer join nest
was within another outer join nest, that shouldn't prevent us from moving
away the eliminated tables.
* Update join->table_count and all-join-tables bitmap.
^ TODO: not true anymore ^
* That's it. Nothing else?
6. User interface
=================
6.1 @@optimizer_switch flag
---------------------------
Argument against adding the flag:
* It is always better to perform table elimination than not to do it.
Arguments for the flag:
* It is always theoretically possible that the new code will cause unintended
slowdowns.
* Having the flag is useful for QA and comparative benchmarking.
Decision so far: add the flag under #ifdef. Make the flag be present in debug
builds.
6.2 EXPLAIN [EXTENDED]
----------------------
There are two possible options:
1. Show eliminated tables, like we do with const tables.
2. Do not show eliminated tables.
We chose option 2, because:
- the table is not accessed at all (besides locking it)
- it is more natural for an anchor model user: when querying an
anchor-and-attributes view, they don't care about the unused attributes.
EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
NOTE: Before this WL, the warning text was generated after all JOIN objects
had been destroyed. This made it impossible to use information about the join
plan when printing the warning. We've fixed this by keeping the JOIN objects
until the warning text has been generated.
Table elimination removes inner sides of outer join, and logically the ON
clause is also removed. If this clause has any subqueries, they will be
also removed from EXPLAIN output.
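The progress notes for this task describe the approach as marking subselects
in a removed ON clause as eliminated and letting the recursive EXPLAIN descent
skip them. A minimal sketch of that idea, with a hypothetical SelectNode type
standing in for the server's select structures:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a SELECT in the EXPLAIN tree.
struct SelectNode
{
  std::string name;
  bool eliminated;
  std::vector<SelectNode> children;
};

// Recursive descent: an eliminated select is not printed, and because we
// return before recursing, none of its children are printed either.
void explain_select(const SelectNode& sel, std::vector<std::string>& out)
{
  if (sel.eliminated)
    return;
  out.push_back(sel.name);
  for (const SelectNode& child : sel.children)
    explain_select(child, out);
}
```

Checking the flag at the top of the descent is what makes children of
eliminated selects disappear without marking each of them individually.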
An exception to the above is that if we eliminate a derived table, it will
still be shown in EXPLAIN output. This comes from the fact that the FROM
subqueries are evaluated before table elimination is invoked.
TODO: Is the above ok or still remove parts of FROM subqueries?
7. Miscellaneous adjustments
============================
7.1 Fix used_tables() of aggregate functions
--------------------------------------------
Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This has been fixed: an aggregate function now reports that it depends
on the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0). One
consequence is that "item->used_tables()==0" is no longer equivalent to
"item->const_item()==true" (it is not clear whether this is new or has
already been the case for some items).
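The difference between the old and the new behaviour can be sketched with
plain bitmaps. The function names are hypothetical, table_map is modelled as a
64-bit integer, and the old formula assumes fewer than 64 tables.

```cpp
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;  // assumption: one bit per table in the join

// Old behaviour: an aggregate claims to depend on all tables of the join.
table_map agg_used_tables_old(unsigned n_tables)
{
  return (1ULL << n_tables) - 1;
}

// New behaviour: the union of the tables used by the arguments.
// COUNT(*) has no arguments, so the result is 0.
table_map agg_used_tables_new(const std::vector<table_map>& arg_used_tables)
{
  table_map used= 0;
  for (table_map arg : arg_used_tables)
    used|= arg;
  return used;
}
```

Note that a zero result for COUNT(*) does not make it a constant: its value
still depends on the number of rows, which is the reason for the remark about
const_item() above.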
7.2 Make subquery predicates collect their outer references
-----------------------------------------------------------
Per-column functional dependency analysis requires us to take a
tbl.field = func(...)
equality and tell which columns of which tables are referred to from the
func(...) expression. For scalar expressions, this is accomplished by
Item::walk()-based traversal. It should be reasonably cheap (the only
practical Item that can be expensive to traverse seems to be the special case
of "col IN (const1,const2,...)"; TODO: check whether we traverse the long list
for such items).
For correlated subqueries, traversal can be expensive; it is cheaper to make
each subquery item have a list of its outer references. The list can be
collected at fix_fields() stage with very little extra cost, and then it could
be used for other optimizations.
8. Other concerns
=================
8.1 Relationship with outer->inner joins converter
--------------------------------------------------
One could suspect that outer->inner join conversion could get in the way
of table elimination by changing outer joins (which could be eliminated)
to inner (which we will not try to eliminate).
This concern is not valid: we make outer->inner conversions based on
predicates in WHERE. If the WHERE referred to an inner table (this is a
requirement for the conversion) then table elimination would not be
applicable anyway.
8.2 Relationship with prepared statements
-----------------------------------------
On the one hand, it is natural to want table elimination to be a
once-per-statement operation, like outer->inner join conversion. We would then
have to limit its applicability by dropping [MARK1] (the zero-rows or one-row
table case, see section 3.3), as that can change during the lifetime of the
statement.
The other option is to do table elimination on every execution. This will
require reworking operation [MARK2] (see section 5) to be undoable.
8.3 Relationship with constant table detection
----------------------------------------------
Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
TODO
9. Tests and benchmarks
=======================
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- query Q1 that would use elimination
- query Q2 that is very similar to Q1 (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
then compare run times and conclude whether the DBMS in use
supports table elimination.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 29 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Guest - Wed, 29 Jul 2009, 21:41)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.26011 2009-07-29 21:41:04.000000000 +0300
+++ /tmp/wklog.17.new.26011 2009-07-29 21:41:04.000000000 +0300
@@ -2,163 +2,146 @@
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
-1. Conditions for removal
-1.1 Quick check if there are candidates
-2. Removal operation properties
-3. Removal operation
-4. User interface
-5. Tests and benchmarks
-6. Todo, issues to resolve
-6.1 To resolve
-6.2 Resolved
-7. Additional issues
+1. Elimination criteria
+2. No outside references check
+2.1 Quick check if there are tables with no outside references
+3. One-match check
+3.1 Functional dependency source #1: Potential eq_ref access
+3.2 Functional dependency source #2: col2=func(col1)
+3.3 Functional dependency source #3: One or zero records in the table
+3.4 Functional dependency check implementation
+3.4.1 Equality collection: Option1
+3.4.2 Equality collection: Option2
+3.4.3 Functional dependency propagation - option 1
+3.4.4 Functional dependency propagation - option 2
+4. Removal operation properties
+5. Removal operation
+6. User interface
+6.1 @@optimizer_switch flag
+6.2 EXPLAIN [EXTENDED]
+7. Miscellaneous adjustments
+7.1 Fix used_tables() of aggregate functions
+7.2 Make subquery predicates collect their outer references
+8. Other concerns
+8.1 Relationship with outer->inner joins converter
+8.2 Relationship with prepared statements
+8.3 Relationship with constant table detection
+9. Tests and benchmarks
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
-1. Conditions for removal
--------------------------
-We can eliminate an inner side of outer join if:
-1. For each record combination of outer tables, it will always produce
- exactly one record.
-2. There are no references to columns of the inner tables anywhere else in
+1. Elimination criteria
+=======================
+We can eliminate inner side of an outer join nest if:
+
+1. There are no references to columns of the inner tables anywhere else in
the query.
+2. For each record combination of outer tables, it will always produce
+ exactly one matching record combination.
+
+Most of effort in this WL entry is checking these two conditions.
-#1 means that every table inside the outer join nest is:
- - is a constant table:
- = because it can be accessed via eq_ref(const) access, or
- = it is a zero-rows or one-row MyISAM-like table [MARK1]
- - has an eq_ref access method candidate.
-
-#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
- GROUP BY and HAVING do not refer to the inner tables of the outer join
- nest.
-
-1.1 Quick check if there are candidates
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Before we start to enumerate join nests, here is a quick way to check if
-there *can be* something to be removed:
+2. No outside references check
+==============================
+Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
+outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
+tables of the outer join nest we're trying to remove.
+
+For multi-table UPDATE/DELETE we also must not remove tables that we're
+updating/deleting from or tables that are used in UPDATE's SET clause.
+
+2.1 Quick check if there are tables with no outside references
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start searching for outer join nests that could be eliminated,
+we'll do a quick and cheap check of whether there could possibly be anything
+to eliminate:
- if ((tables used in select_list |
+ if (there are outer joins &&
+ (tables used in select_list |
tables used in group/order by UNION |
- tables used in where) != bitmap_of_all_tables)
+ tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
-2. Removal operation properties
--------------------------------
-* There is always one way to remove (no choice to remove either this or that)
-* It is always better to remove as much tables as possible (at least within
- our cost model).
-Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
-3. Removal operation
---------------------
-* Remove the outer join nest's nested join structure (i.e. get the
- outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
- $OJ->embedding->nested_join. Update table_map's of all ancestor nested
- joins). [MARK2]
+3. One-match check
+==================
+We can eliminate the inner side of an outer join if it will always generate
+exactly one matching record combination.
-* Move the tables and their JOIN_TABs to front like it is done with const
- tables, with exception that if eliminated outer join nest was within
- another outer join nest, that shouldn't prevent us from moving away the
- eliminated tables.
+By definition of OUTER JOIN, a NULL-complemented record combination will be
+generated when the inner side of outer join has not produced any matches.
-* Update join->table_count and all-join-tables bitmap.
+What remains to be checked is that there is no possibility that the inner side
+of the outer join could produce more than one matching record combination.
-* That's it. Nothing else?
+We'll refer to the one-match property as "functional dependency":
-4. User interface
------------------
-* We'll add an @@optimizer switch flag for table elimination. Tentative
- name: 'table_elimination'.
- (Note ^^ utility of the above questioned ^, as table elimination can never
- be worse than no elimination. We're leaning towards not adding the flag)
-
-* EXPLAIN will not show the removed tables at all. This will allow to check
- if tables were removed, and also will behave nicely with anchor model and
- VIEWs: stuff that user doesn't care about just won't be there.
+- An outer join nest is functionally dependent [wrt outer tables] if it will
+ produce one matching record combination for each record combination of
+ outer tables
-5. Tests and benchmarks
------------------------
-Create a benchmark in sql-bench which checks if the DBMS has table
-elimination.
-[According to Monty] Run
- - queries that would use elimination
- - queries that are very similar to one above (so that they would have same
- QEP, execution cost, etc) but cannot use table elimination.
-then compare run times and make a conclusion about whether dbms supports table
-elimination.
+- A table is functionally dependent wrt a certain set of dependency tables if
+ each record combination of the dependency tables uniquely identifies zero or
+ one matching record in the table
-6. Todo, issues to resolve
---------------------------
+- Definitions of functional dependency of keys (=column tuples) and columns are
+ apparent.
-6.1 To resolve
-~~~~~~~~~~~~~~
-- Relationship with prepared statements.
- On one hand, it's natural to desire to make table elimination a
- once-per-statement operation, like outer->inner join conversion. We'll have
- to limit the applicability by removing [MARK1] as that can change during
- lifetime of the statement.
-
- The other option is to do table elimination every time. This will require to
- rework operation [MARK2] to be undoable.
-
- I'm leaning towards doing the former. With anchor modeling, it is unlikely
- that we'll meet outer joins which have N inner tables of which some are 1-row
- MyISAM tables that do not have primary key.
-
-6.2 Resolved
-~~~~~~~~~~~~
-* outer->inner join conversion is not a problem for table elimination.
- We make outer->inner conversions based on predicates in WHERE. If the WHERE
- referred to an inner table (requirement for OJ->IJ conversion) then table
- elimination would not be applicable anyway.
-
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
+Our goal is to prove that the entire join nest is functionally-dependent.
-* Aggregate functions used to report that they depend on all tables, that is,
+A join nest is functionally dependent (on the outside tables) if each of its
+elements (those can be either base tables or join nests) is functionally
+dependent.
- item_agg_func->used_tables() == (1ULL << join->tables) - 1
+Functional dependency is transitive: if table A is f-dependent on the outer
+tables and table B is f-dependent on {A, outer_tables} then B is functionally
+dependent on the outer tables.
+
+Subsequent sections list cases when we can declare a table to be
+functionally-dependent.
+
+3.1 Functional dependency source #1: Potential eq_ref access
+------------------------------------------------------------
+This is the most important case in practice. Taking the example from the HLS
+of this WL entry:
+
+ select
+ A.colA
+ from
+ tableA A
+ left outer join
+ tableB B
+ on
+ B.id = A.id;
- always. Fixed it, now aggregate function reports it depends on
- tables that its arguments depend on. In particular, COUNT(*) reports
- that it depends on no tables (item_count_star->used_tables()==0).
- One consequence of that is that "item->used_tables()==0" is not
- equivalent to "item->const_item()==true" anymore (not sure if it's
- "anymore" or this has been already happening).
-
-* EXPLAIN EXTENDED warning text was generated after the JOIN object has
- been discarded. This didn't allow to use information about join plan
- when printing the warning. Fixed this by keeping the JOIN objects until
- we've printed the warning (have also an intent to remove the const
- tables from the join output).
-
-7. Additional issues
---------------------
-* We remove ON clauses within outer join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
- Yes. Current approach: when removing an outer join nest, walk the ON clause
- and mark subselects as eliminated. Then let EXPLAIN code check if the
- SELECT was eliminated before the printing (EXPLAIN is generated by doing
- a recursive descent, so the check will also cause children of eliminated
- selects not to be printed)
-
-* Table elimination is performed after constant table detection (but before
- the range analysis). Constant tables are technically different from
- eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
- Considering we've already done the join_read_const_table() call, is there any
- real difference between constant table and eliminated one? If there is, should
- we mark const tables also as eliminated?
- from user/EXPLAIN point of view: no. constant table is the one that we read
- one record from. eliminated table is the one that we don't access at all.
+and generalizing it: a table TBL is functionally-dependent if the ON
+expression allows us to construct a potential eq_ref access to table TBL that
+uses only outer or functionally-dependent tables.
+
+In other words: table TBL will have one match if the ON expression can be
+converted into this form
+
+ TBL.unique_key=func(one_match_tables) AND .. remainder ...
+
+(with appropriate extension for multi-part keys), where
+
+ one_match_tables= {
+ tables that are not on the inner side of the outer join in question, and
+ functionally dependent tables
+ }
+
+Note that this will cover constant tables, except those that are constant because
+they have 0/1 record or are partitioned and have no used partitions.
+
+
+3.2 Functional dependency source #2: col2=func(col1)
+----------------------------------------------------
+This comes from the second example in the HLS:
-* What is described above will not be able to eliminate this outer join
create unique index idx on tableB (id, fromDate);
...
left outer join
@@ -169,32 +152,331 @@
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
- This is because condition "B.fromDate= func(tableB)" cannot be used.
- Reason#1: update_ref_and_keys() does not consider such conditions to
- be of any use (and indeed they are not usable for ref access)
- so they are not put into KEYUSE array.
- Reason#2: even if they were put there, we would need to be able to tell
- between predicates like
- B.fromDate= func(B.id) // guarantees only one matching row as
- // B.id is already bound by B.id=A.id
- // hence B.fromDate becomes bound too.
- and
- "B.fromDate= func(B.*)" // Can potentially have many matching
- // records.
- We need to
- - Have update_ref_and_keys() create KEYUSE elements for such equalities
- - Have eliminate_tables() and friends make a more accurate check.
- The right check is to check whether all parts of a unique key are bound.
- If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
- keypartY to be bound.
- The difficulty here is that correlated subquery predicate cannot tell what
- columns it depends on (it only remembers tables).
- Traversing the predicate is expensive and complicated.
- We're leaning towards making each subquery predicate have a List<Item> with
- items that
- - are in the current select
- - and it depends on.
- This list will be useful in certain other subquery optimizations as well,
- it is cheap to collect it in fix_fields() phase, so it will be collected
- for every subquery predicate.
+Here it is apparent that tableB can be eliminated. It is not possible to
+construct eq_ref access to tableB, though, because for the second part of the
+unique key (fromDate column) we only have a condition in this form:
+
+ B.fromDate= func(tableB)
+
+(we write "func(tableB)" because ref optimizer can only determine which tables
+the right part of the equality depends on).
+
+In the general case, an equality like this doesn't guarantee functional
+dependency. For example, if func() == { return fromDate;}, i.e. the ON
+expression is
+
+ ... ON B.id = A.id and B.fromDate = B.fromDate
+
+then that would allow table B to have multiple matches per record of table A.
+
+In order to be able to distinguish between these two cases, we'll need to go
+down to column level:
+
+- A table is functionally dependent if it has a unique key that's functionally
+ dependent
+
+- A unique key is functionally dependent when all of its columns are
+ functionally dependent
+
+- A table column is functionally dependent if the ON clause allows us to
+ extract an AND-part in this form:
+
+ tbl.column = f(functionally-dependent columns or columns of outer tables)
+
+3.3 Functional dependency source #3: One or zero records in the table
+---------------------------------------------------------------------
+A table with one or zero records cannot generate more than one matching
+record. This source is of lesser importance as one/zero-record tables are only
+MyISAM tables.
+
+3.4 Functional dependency check implementation
+----------------------------------------------
+As shown above, we need something similar to KEYUSE structures, but not
+exactly that (we need things that current ref optimizer considers unusable and
+don't need things that it considers usable).
+
+3.4.1 Equality collection: Option1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We could
+- extend KEYUSE structures to store all kinds of equalities we need
+- change update_ref_and_keys() and co. to collect equalities both for ref
+ access and for table elimination
+ = [possibly] Improve [eq_]ref access to be able to use equalities in
+ form keypart2=func(keypart1)
+- process the KEYUSE array both by table elimination and by ref access
+ optimizer.
+
++ This requires less effort.
+- Code will have to be changed all over sql_select.cc
+- update_ref_and_keys() and co. already do several unrelated things. Hooking
+ up table elimination will make it even worse.
+
+3.4.2 Equality collection: Option2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Alternatively, we could process the WHERE clause totally on our own.
++ Table elimination is standalone and easy to detach module.
+- Some code duplication with update_ref_and_keys() and co.
+
+Having got the equalities, we'll need to propagate the functional dependency
+property to unique keys, tables and, ultimately, join nests.
+
+3.4.3 Functional dependency propagation - option 1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Borrow the approach used in constant table detection code:
+
+ do
+ {
+ converted= FALSE;
+ for each table T in join nest
+ {
+ if (check_if_functionally_dependent(T))
+ converted= TRUE;
+ }
+ } while (converted == TRUE);
+
+ check_if_functionally_dependent(T)
+ {
+ if (T has eq_ref access based on func_dep_tables)
+ return TRUE;
+
+ Apply the same do-while loop-based approach to available equalities
+ T.column1=func(other columns)
+ to spread the set of functionally-dependent columns. The goal is to get
+ all columns of a certain unique key to be bound.
+ }
+
+
+3.4.4 Functional dependency propagation - option 2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Analyze the ON expression(s) and build a list of
+
+ tbl.field = expr(...)
+
+equalities. tbl here is a table that belongs to a join nest that could
+potentially be eliminated.
+
+besides those, add to the list
+ - An element for each unique key in the table that needs to be eliminated
+ - An element for each table that needs to be eliminated
+ - An element for each join nest that can be eliminated (i.e. has no
+ references from outside).
+
+Then, setup "reverse dependencies": each element should have pointers to
+elements that are functionally dependent on it:
+
+- "tbl.field=expr(...)" equality is functionally dependent on all fields that
+ are used in "expr(...)" (here we take into account only fields that belong
+ to tables that can potentially be eliminated).
+- a unique key is dependent on all of its components
+- a table is dependent on all of its unique keys
+- a join nest is dependent on all tables that it contains
+
+These pointers are stored in form of one bitmap, such that:
+
+ "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
+
+Each object also stores a number of dependencies it needs to be satisfied
+before it itself is satisfied:
+
+- "tbl.field=expr(...)" needs all its underlying fields (if a field is
+ referenced many times it is counted only once)
+
+- a unique key needs all of its key parts
+
+- a table needs only one of its unique keys
+
+- a join nest needs all of its tables
+
+(TODO: so what do we do when we've marked a table as constant? We'll need to
+update the "field=expr(....)" elements that use fields of that table. And the
+problem is that we won't know how much to decrement from the counters of those
+elements.
+
+Solution#1: switch to table_map() based approach.
+Solution#2: introduce separate elements for each involved field.
+ field will depend on its table,
+ "field=expr" will depend on fields.
+)
+
+Besides the above, let each element have a pointer to another element, so that
+we can have a linked list of elements.
+
+After the above structures have been created, we start the main algorithm.
+
+The first step is to create a list of functionally-dependent elements. We walk
+across array of dependencies and mark those elements that are already bound
+(i.e. their dependencies are satisfied). At the moment those immediately-bound
+are only "field=expr" dependencies that don't refer to any columns that are
+not bound.
+
+The second step is the loop
+
+ while (bound_list is not empty)
+ {
+ Take the first bound element F off the list.
+ Use the bitmap to find out what other elements depended on it
+ for each such element E
+ {
+ if (E becomes bound after F is bound)
+ add E to the list;
+ }
+ }
+
+The last step is to walk through elements that represent the join nests. Those
+that are bound can be eliminated.
+
+4. Removal operation properties
+===============================
+* There is always one way to remove (no choice to remove either this or that)
+* It is always better to remove as many tables as possible (at least within
+ our cost model).
+Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
+
+
+5. Removal operation
+====================
+(This depends a lot on whether we make table elimination a one-off rewrite or
+conditional)
+
+At the moment table elimination is re-done for each join re-execution, hence
+the removal operation is designed not to modify any statement's permanent
+members.
+
+* Remove the outer join nest's nested join structure (i.e. get the
+ outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
+ $OJ->embedding->nested_join. Update table_map's of all ancestor nested
+ joins). [MARK2]
+
+* Move the tables and their JOIN_TABs to the front of join order, like it is
+ done with const tables, with exception that if eliminated outer join nest
+ was within another outer join nest, that shouldn't prevent us from moving
+ away the eliminated tables.
+
+* Update join->table_count and all-join-tables bitmap.
+ ^ TODO: not true anymore ^
+
+* That's it. Nothing else?
+
+6. User interface
+=================
+
+6.1 @@optimizer_switch flag
+---------------------------
+Argument against adding the flag:
+* It is always better to perform table elimination than not to do it.
+
+Arguments for the flag:
+* It is always theoretically possible that the new code will cause unintended
+ slowdowns.
+* Having the flag is useful for QA and comparative benchmarking.
+
+Decision so far: add the flag under #ifdef. Make the flag be present in debug
+builds.
+
+6.2 EXPLAIN [EXTENDED]
+----------------------
+There are two possible options:
+1. Show eliminated tables, like we do with const tables.
+2. Do not show eliminated tables.
+
+We chose option 2, because:
+- the table is not accessed at all (besides locking it)
+- it is more natural for anchor model user - when he's querying an anchor-
+ and attributes view, he doesn't care about the unused attributes.
+
+EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
+
+NOTE: Before this WL, the warning text was generated after all JOIN objects
+had been destroyed, so information about the join plan could not be used
+when printing the warning. We've fixed this by keeping the JOIN objects until
+the warning text has been generated.
+
+Table elimination removes inner sides of outer join, and logically the ON
+clause is also removed. If this clause has any subqueries, they will be
+also removed from EXPLAIN output.
+
+An exception to the above is that if we eliminate a derived table, it will
+still be shown in EXPLAIN output. This comes from the fact that the FROM
+subqueries are evaluated before table elimination is invoked.
+TODO: Is the above ok or still remove parts of FROM subqueries?
+
+7. Miscellaneous adjustments
+============================
+
+7.1 Fix used_tables() of aggregate functions
+--------------------------------------------
+Aggregate functions used to report that they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+always. This has been fixed: an aggregate function now reports that it depends
+on the tables that its arguments depend on. In particular, COUNT(*) reports
+that it depends on no tables (item_count_star->used_tables()==0). One
+consequence is that "item->used_tables()==0" is no longer equivalent to
+"item->const_item()==true" (it is not certain whether this is new or has
+already been the case for some items).
+
+7.2 Make subquery predicates collect their outer references
+-----------------------------------------------------------
+Per-column functional dependency analysis requires us to take a
+
+ tbl.field = func(...)
+
+equality and tell which columns of which tables are referred from func(...)
+expression. For scalar expressions, this is accomplished by Item::walk()-based
+traversal. It should be reasonably cheap (the only practical Item that can be
+expensive to traverse seems to be a special case of "col IN (const1,const2,
+...)". check if we traverse the long list for such items).
+
+For correlated subqueries, traversal can be expensive, so it is cheaper to make
+each subquery item keep a list of its outer references. The list can be
+collected at the fix_fields() stage with very little extra cost, and it could
+then be used for other optimizations.
+
+
+8. Other concerns
+=================
+
+8.1 Relationship with outer->inner joins converter
+--------------------------------------------------
+One could suspect that outer->inner join conversion could get in the way
+of table elimination by changing outer joins (which could be eliminated)
+to inner (which we will not try to eliminate).
+This concern is not valid: we make outer->inner conversions based on
+predicates in WHERE. If the WHERE referred to an inner table (this is a
+requirement for the conversion) then table elimination would not be
+applicable anyway.
+
+8.2 Relationship with prepared statements
+-----------------------------------------
+On one hand, it's natural to desire to make table elimination a
+once-per-statement operation, like outer->inner join conversion. We'll have
+to limit the applicability by removing [MARK1] as that can change during
+lifetime of the statement.
+
+The other option is to do table elimination every time. This will require
+reworking operation [MARK2] to be undoable.
+
+
+8.3 Relationship with constant table detection
+----------------------------------------------
+Table elimination is performed after constant table detection (but before
+the range analysis). Constant tables are technically different from
+eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+Considering we've already done the join_read_const_table() call, is there any
+real difference between constant table and eliminated one? If there is, should
+we mark const tables also as eliminated?
+From the user/EXPLAIN point of view: no. A constant table is one that we read
+one record from. An eliminated table is one that we don't access at all.
+TODO
+
+9. Tests and benchmarks
+=======================
+Create a benchmark in sql-bench which checks if the DBMS has table
+elimination.
+[According to Monty] Run
+ - query Q1 that would use elimination
+ - query Q2 that is very similar to Q1 (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether the used dbms
+supports table elimination.
-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
Small dent removed
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
------------------------------------------------------------
-=-=(View All Progress Notes, 26 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, table elimination removes tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen under the right circumstances. Let us, for example,
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
Using A as the left table ensures that the query returns at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met, the selected column (A.colA) still
contains its original value, and the unseen B.* columns are all NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB has a unique constraint on the column
B.id, for example (and as is often the case) a primary key. In this
situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
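The row-count argument above can be checked on toy data. The following is only a sketch: in-memory lists stand in for tableA and tableB, and the left_join_col_a helper is hypothetical, written just to emulate the example query.

```python
def left_join_col_a(table_a, table_b):
    """Emulate: SELECT A.colA FROM tableA A LEFT OUTER JOIN tableB B ON B.id = A.id."""
    result = []
    for a_id, col_a in table_a:
        matches = [b for b in table_b if b[0] == a_id]
        # A LEFT JOIN emits one NULL-complemented row when nothing matches.
        for _ in (matches or [None]):
            result.append(col_a)
    return result


table_a = [(1, "val1"), (2, "val2")]

# B.id has duplicates: the join multiplies rows, so tableB must be read.
b_with_duplicates = [(1, "other1a"), (1, "other1b")]
# B.id is unique: each row of tableA yields exactly one output row.
b_unique = [(1, "other1a")]

rows_dup = left_join_col_a(table_a, b_with_duplicates)
rows_unique = left_join_col_a(table_a, b_unique)
# What the query returns if tableB is never touched.
rows_without_b = [col_a for _, col_a in table_a]
```

With a unique B.id, rows_unique equals rows_without_b, which is why the join can be dropped; with duplicates, rows_dup contains "val1" twice.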
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is in the
lp:~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Elimination criteria
2. No outside references check
2.1 Quick check if there are tables with no outside references
3. One-match check
3.1 Functional dependency source #1: Potential eq_ref access
3.2 Functional dependency source #2: col2=func(col1)
3.3 Functional dependency source #3: One or zero records in the table
3.4 Functional dependency check implementation
3.4.1 Equality collection: Option1
3.4.2 Equality collection: Option2
3.4.3 Functional dependency propagation - option 1
3.4.4 Functional dependency propagation - option 2
4. Removal operation properties
5. Removal operation
6. User interface
6.1 @@optimizer_switch flag
6.2 EXPLAIN [EXTENDED]
7. Miscellaneous adjustments
7.1 Fix used_tables() of aggregate functions
7.2 Make subquery predicates collect their outer references
8. Other concerns
8.1 Relationship with outer->inner joins converter
8.2 Relationship with prepared statements
8.3 Relationship with constant table detection
9. Tests and benchmarks
</contents>
Strictly speaking, this is not about eliminating tables but about eliminating
inner sides of outer joins.
1. Elimination criteria
=======================
We can eliminate the inner side of an outer join nest if:
1. There are no references to columns of the inner tables anywhere else in
the query.
2. For each record combination of the outer tables, the inner side always
produces exactly one matching record combination.
Most of the effort in this WL entry goes into checking these two conditions.
2. No outside references check
==============================
Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
tables of the outer join nest we're trying to remove.
For multi-table UPDATE/DELETE we also must not remove tables that we're
updating/deleting from or tables that are used in UPDATE's SET clause.
2.1 Quick check if there are tables with no outside references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start searching for outer join nests that could be eliminated,
we do a quick and cheap check of whether anything could possibly be
eliminated:
if (there are outer joins &&
(tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_join_tables)
{
attempt table elimination;
}
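A minimal sketch of this quick check, assuming table sets are kept as integer bitmaps with one bit per join table. The function and parameter names are hypothetical and do not come from the server code.

```python
def may_attempt_elimination(has_outer_joins, select_list_tables,
                            group_order_tables, where_tables,
                            all_join_tables):
    """Return True if some join table is referenced nowhere outside ON clauses."""
    used = select_list_tables | group_order_tables | where_tables
    return bool(has_outer_joins) and used != all_join_tables


# Hypothetical bit assignment for the two tables of the example query.
TABLE_A = 0b01
TABLE_B = 0b10
ALL_TABLES = TABLE_A | TABLE_B

# SELECT A.colA FROM tableA A LEFT JOIN tableB B ON B.id = A.id
worth_trying = may_attempt_elimination(True, TABLE_A, 0, 0, ALL_TABLES)

# Same query with "WHERE B.id > 5": every table is referenced, so skip.
not_worth_trying = may_attempt_elimination(True, TABLE_A, 0, TABLE_B, ALL_TABLES)
```

The check is only a filter: a True result means elimination is worth attempting, not that it will succeed.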
3. One-match check
==================
We can eliminate the inner side of an outer join if it will always generate
exactly one matching record combination.
By definition of OUTER JOIN, a NULL-complemented record combination will be
generated when the inner side of the outer join has not produced any matches.
What remains to be checked is that there is no possibility that the inner side
of the outer join could produce more than one matching record combination.
We'll refer to the one-match property as "functional dependency":
- An outer join nest is functionally dependent [wrt outer tables] if it will
produce one matching record combination for each record combination of
outer tables
- A table is functionally dependent wrt certain set of dependency tables, if
record combination of dependency tables uniquely identifies zero or one
matching record in the table
- Definitions of functional dependency of keys (=column tuples) and columns are
apparent.
Our goal is to prove that the entire join nest is functionally dependent.
A join nest is functionally dependent (on the outside tables) if each of its
elements (those can be either base tables or join nests) is functionally
dependent.
Functional dependency is transitive: if table A is f-dependent on the outer
tables and table B is f-dependent on {A, outer_tables}, then B is functionally
dependent on the outer tables.
Subsequent sections list cases when we can declare a table to be
functionally-dependent.
3.1 Functional dependency source #1: Potential eq_ref access
------------------------------------------------------------
This is the most practically-important case. Taking the example from the HLS
of this WL entry:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
and generalizing it: a table TBL is functionally dependent if the ON
expression allows constructing a potential eq_ref access to table TBL that
uses only outer or functionally-dependent tables.
In other words: table TBL will have one match if the ON expression can be
converted into this form
TBL.unique_key=func(one_match_tables) AND .. remainder ...
(with appropriate extension for multi-part keys), where
one_match_tables= {
tables that are not on the inner side of the outer join in question, and
functionally dependent tables
}
Note that this will cover constant tables, except those that are constant because
they have 0/1 record or are partitioned and have no used partitions.
3.2 Functional dependency source #2: col2=func(col1)
----------------------------------------------------
This comes from the second example in the HLS:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
Here it is apparent that tableB can be eliminated. It is not possible to
construct eq_ref access to tableB, though, because for the second part of the
primary key (fromDate column) we only got a condition in this form:
B.fromDate= func(tableB)
(we write "func(tableB)" because ref optimizer can only determine which tables
the right part of the equality depends on).
In the general case, an equality like this doesn't guarantee functional
dependency. For example, if func() == { return fromDate;}, i.e. the ON expression is
... ON B.id = A.id and B.fromDate = B.fromDate
then that would allow table B to have multiple matches per record of table A.
In order to be able to distinguish between these two cases, we'll need to go
down to column level:
- A table is functionally dependent if it has a unique key that's functionally
dependent
- A unique key is functionally dependent when all of its columns are
functionally dependent
- A table column is functionally dependent if the ON clause allows extracting
an AND-part in this form:
tbl.column = f(functionally-dependent columns or columns of outer tables)
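The three column-level rules can be illustrated with a small fixed-point computation. This is a sketch under stated assumptions, not the server implementation: columns are plain strings, each equality is a hypothetical (column, source_columns) pair, and the subquery in the HLS example is assumed to have A.id as its only outer reference.

```python
def table_is_functionally_dependent(unique_keys, equalities, outer_columns):
    """Apply the three rules until no more columns become bound."""
    bound = set(outer_columns)
    changed = True
    while changed:
        changed = False
        for column, sources in equalities:
            # Rule 3: tbl.column = f(bound columns) binds tbl.column.
            if column not in bound and set(sources) <= bound:
                bound.add(column)
                changed = True
    # Rules 1 and 2: some unique key has all of its columns bound.
    return any(set(key) <= bound for key in unique_keys)


unique_keys = [("B.id", "B.fromDate")]
outer_columns = {"A.id"}

# ON B.id = A.id AND B.fromDate = (select max(...) ... where sub.id = A.id)
good_on = [("B.id", ["A.id"]), ("B.fromDate", ["A.id"])]
# ON B.id = A.id AND B.fromDate = B.fromDate
bad_on = [("B.id", ["A.id"]), ("B.fromDate", ["B.fromDate"])]

can_eliminate = table_is_functionally_dependent(unique_keys, good_on, outer_columns)
cannot_eliminate = table_is_functionally_dependent(unique_keys, bad_on, outer_columns)
```

In the second case B.fromDate depends only on itself, so it never becomes bound and the unique key stays incomplete.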
3.3 Functional dependency source #3: One or zero records in the table
---------------------------------------------------------------------
A table with one or zero records cannot generate more than one matching
record. This source is of lesser importance as one/zero-record tables are only
MyISAM tables.
3.4 Functional dependency check implementation
----------------------------------------------
As shown above, we need something similar to KEYUSE structures, but not
exactly that (we need things that current ref optimizer considers unusable and
don't need things that it considers usable).
3.4.1 Equality collection: Option1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We could
- extend KEYUSE structures to store all kinds of equalities we need
- change update_ref_and_keys() and co. to collect equalities both for ref
access and for table elimination
= [possibly] Improve [eq_]ref access to be able to use equalities in
form keypart2=func(keypart1)
- process the KEYUSE array both by table elimination and by ref access
optimizer.
+ This requires less effort.
- Code will have to be changed all over sql_select.cc
- update_ref_and_keys() and co. already do several unrelated things. Hooking
up table elimination will make it even worse.
3.4.2 Equality collection: Option2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alternatively, we could process the WHERE clause totally on our own.
+ Table elimination is a standalone module that is easy to detach.
- Some code duplication with update_ref_and_keys() and co.
Having got the equalities, we'll need to propagate the functional dependency
property to unique keys, tables and, ultimately, join nests.
3.4.3 Functional dependency propagation - option 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Borrow the approach used in constant table detection code:
do
{
converted= FALSE;
for each table T in join nest
{
if (check_if_functionally_dependent(T))
converted= TRUE;
}
} while (converted == TRUE);
check_if_functionally_dependent(T)
{
if (T has eq_ref access based on func_dep_tables)
return TRUE;
Apply the same do-while loop-based approach to available equalities
T.column1=func(other columns)
to spread the set of functionally-dependent columns. The goal is to get
all columns of a certain unique key to be bound.
}
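The table-level part of this loop can be sketched as follows. The sketch assumes that the potential eq_ref accesses are already known as a hypothetical mapping from each table to the sets of tables such an access would read from. It also skips tables that are already marked, so that the loop terminates.

```python
def find_functionally_dependent_tables(nest_tables, eq_ref_sources, outer_tables):
    """Repeat passes over the nest until no new table gets marked."""
    func_dep = set(outer_tables)
    converted = True
    while converted:
        converted = False
        for table in nest_tables:
            if table in func_dep:
                continue  # already marked; do not count it as progress
            sources = eq_ref_sources.get(table, [])
            if any(set(src) <= func_dep for src in sources):
                func_dep.add(table)
                converted = True
    return func_dep - set(outer_tables)


# Hypothetical nest: C is reachable via B, B via the outer table A,
# and D has no usable eq_ref access at all.
eq_ref_sources = {
    "B": [{"A"}],
    "C": [{"B"}],
}
dependent = find_functionally_dependent_tables(["C", "B", "D"], eq_ref_sources, {"A"})
```

Listing C before B forces a second pass, which is the transitive case described in section 3.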
3.4.4 Functional dependency propagation - option 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyze the ON expression(s) and build a list of
tbl.field = expr(...)
equalities. tbl here is a table that belongs to a join nest that could
potentially be eliminated.
Besides those, add to the list
- An element for each unique key in the table that needs to be eliminated
- An element for each table that needs to be eliminated
- An element for each join nest that can be eliminated (i.e. has no
references from outside).
Then, setup "reverse dependencies": each element should have pointers to
elements that are functionally dependent on it:
- "tbl.field=expr(...)" equality is functionally dependent on all fields that
are used in "expr(...)" (here we take into account only fields that belong
to tables that can potentially be eliminated).
- a unique key is dependent on all of its components
- a table is dependent on all of its unique keys
- a join nest is dependent on all tables that it contains
These pointers are stored in form of one bitmap, such that:
"X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
Each object also stores the number of dependencies that must be satisfied
before it is itself satisfied:
- "tbl.field=expr(...)" needs all its underlying fields (if a field is
referenced many times it is counted only once)
- a unique key needs all of its key parts
- a table needs only one of its unique keys
- a join nest needs all of its tables
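The single-bitmap storage of the "X depends on Y" relation described above could look like this. It is a sketch only: the class name, the method names and the element numbering are hypothetical.

```python
class DependencyBitmap:
    """n_objects x n_objects bit matrix stored as one flat bitmap."""

    def __init__(self, n_objects):
        self.n_objects = n_objects
        self.bits = bytearray((n_objects * n_objects + 7) // 8)

    def _bit(self, x, y):
        # Same indexing as in the text: (X's number) * n_objects + (Y's number).
        return x * self.n_objects + y

    def set_depends(self, x, y):
        i = self._bit(x, y)
        self.bits[i >> 3] |= 1 << (i & 7)

    def depends(self, x, y):
        i = self._bit(x, y)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def dependents_of(self, y):
        """All elements X such that "X depends on Y"."""
        return [x for x in range(self.n_objects) if self.depends(x, y)]


# Hypothetical numbering: 0 = field B.id, 1 = unique key on B.id, 2 = table B.
deps = DependencyBitmap(3)
deps.set_depends(1, 0)  # the key depends on its key part
deps.set_depends(2, 1)  # the table depends on its unique key
```

Here dependents_of() scans one column of the matrix, which is the lookup the propagation step needs when an element becomes bound.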
(TODO: so what do we do when we've marked a table as constant? We'll need to
update the "field=expr(....)" elements that use fields of that table. And the
problem is that we won't know how much to decrement from the counters of those
elements.
Solution#1: switch to table_map() based approach.
Solution#2: introduce separate elements for each involved field.
field will depend on its table,
"field=expr" will depend on fields.
)
Besides the above, let each element have a pointer to another element, so that
we can have a linked list of elements.
After the above structures have been created, we start the main algorithm.
The first step is to create a list of functionally-dependent elements. We walk
across the array of dependencies and mark those elements that are already bound
(i.e. their dependencies are satisfied). At the moment, the only
immediately-bound elements are "field=expr" dependencies that don't refer to
any columns that are not bound.
The second step is the loop
while (bound_list is not empty)
{
Take the first bound element F off the list.
Use the bitmap to find out what other elements depended on it
for each such element E
{
if (E becomes bound after F is bound)
add E to the list;
}
}
The last step is to walk through elements that represent the join nests. Those
that are bound can be eliminated.
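The whole option-2 procedure can be sketched as a worklist algorithm. The sketch makes two assumptions that the text leaves open: reverse dependencies are kept in a plain dictionary rather than the bitmap, and each field gets its own element (Solution#2 in the TODO above). All element names are hypothetical.

```python
from collections import deque


def propagate_bound(needed, dependents, initially_bound):
    """Return the set of elements that end up bound.

    needed[e]:     how many dependencies e needs before it is bound
    dependents[f]: elements that depend on f (the reverse dependencies)
    """
    remaining = dict(needed)
    bound = set(initially_bound)
    queue = deque(initially_bound)
    while queue:
        f = queue.popleft()
        for e in dependents.get(f, ()):
            if e in bound:
                continue  # e.g. a table already bound via another unique key
            remaining[e] -= 1
            if remaining[e] <= 0:
                bound.add(e)
                queue.append(e)
    return bound


# Elements for "... LEFT JOIN tableB B ON B.id = A.id", B.id being the primary key.
needed = {
    "eq:B.id=A.id": 0,   # refers only to outer-table columns
    "field:B.id": 1,     # one binding equality is enough
    "key:B.PRIMARY": 1,  # all key parts; here just B.id
    "table:B": 1,        # only one of its unique keys
    "nest:B": 1,         # all of its tables
}
dependents = {
    "eq:B.id=A.id": ["field:B.id"],
    "field:B.id": ["key:B.PRIMARY"],
    "key:B.PRIMARY": ["table:B"],
    "table:B": ["nest:B"],
}
bound = propagate_bound(needed, dependents, ["eq:B.id=A.id"])
eliminable = sorted(e for e in bound if e.startswith("nest:"))
```

Each element enters the queue at most once, so the loop does work proportional to the number of dependency edges.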
4. Removal operation properties
===============================
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
5. Removal operation
====================
(This depends a lot on whether we make table elimination a one-off rewrite or
conditional)
At the moment table elimination is re-done for each join re-execution, hence
the removal operation is designed not to modify any statement's permanent
members.
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front of join order, like it is
done with const tables, with exception that if eliminated outer join nest
was within another outer join nest, that shouldn't prevent us from moving
away the eliminated tables.
* Update join->table_count and all-join-tables bitmap.
^ TODO: not true anymore ^
* That's it. Nothing else?
6. User interface
=================
6.1 @@optimizer_switch flag
---------------------------
Argument against adding the flag:
* It is always better to perform table elimination than not to do it.
Arguments for the flag:
* It is always theoretically possible that the new code will cause unintended
slowdowns.
* Having the flag is useful for QA and comparative benchmarking.
Decision so far: add the flag under #ifdef. Make the flag be present in debug
builds.
6.2 EXPLAIN [EXTENDED]
----------------------
There are two possible options:
1. Show eliminated tables, like we do with const tables.
2. Do not show eliminated tables.
We chose option 2, because:
- the table is not accessed at all (besides locking it)
- it is more natural for an anchor model user: when querying an
anchor-and-attributes view, they don't care about the unused attributes.
EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
NOTE: Before this WL, the warning text was generated after all JOIN objects
had been destroyed. This made it impossible to use information about the join
plan when printing the warning. We've fixed this by keeping the JOIN objects
until the warning text has been generated.
Table elimination removes inner sides of outer join, and logically the ON
clause is also removed. If this clause has any subqueries, they will be
also removed from EXPLAIN output.
An exception to the above is that if we eliminate a derived table, it will
still be shown in EXPLAIN output. This comes from the fact that the FROM
subqueries are evaluated before table elimination is invoked.
TODO: Is the above ok or still remove parts of FROM subqueries?
7. Miscellaneous adjustments
============================
7.1 Fix used_tables() of aggregate functions
--------------------------------------------
Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. Fixed it, now aggregate function reports that it depends on the
tables that its arguments depend on. In particular, COUNT(*) reports that it
depends on no tables (item_count_star->used_tables()==0). One consequence of
that is that "item->used_tables()==0" is not equivalent to
"item->const_item()==true" anymore (not sure if it's "anymore" or this has
been already so for some items).
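A toy model of the changed behaviour, assuming table maps are integer bitmaps. The Column and Aggregate classes are hypothetical stand-ins for the server's Item classes and only mirror the two methods discussed here.

```python
class Column:
    """A column reference; depends on exactly one table."""

    def __init__(self, table_bit):
        self.table_bit = table_bit

    def used_tables(self):
        return self.table_bit

    def const_item(self):
        return False


class Aggregate:
    """An aggregate function over zero or more argument items."""

    def __init__(self, *args):
        self.args = args

    def used_tables(self):
        # New behaviour: the union of the arguments' table maps.
        mask = 0
        for arg in self.args:
            mask |= arg.used_tables()
        return mask

    def const_item(self):
        # An aggregate varies per group even when it names no table.
        return False


def old_used_tables(n_tables):
    """Old behaviour: every aggregate claimed to depend on all tables."""
    return (1 << n_tables) - 1


T1, T2 = 0b01, 0b10
count_star = Aggregate()        # models COUNT(*)
max_t2 = Aggregate(Column(T2))  # models MAX(t2.col)
```

count_star reports no used tables yet is not constant, which is the case where used_tables()==0 and const_item() diverge.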
7.2 Make subquery predicates collect their outer references
-----------------------------------------------------------
Per-column functional dependency analysis requires us to take a
tbl.field = func(...)
equality and tell which columns of which tables are referred from func(...)
expression. For scalar expressions, this is accomplished by Item::walk()-based
traversal. It should be reasonably cheap (the only practical Item that can be
expensive to traverse seems to be a special case of "col IN (const1,const2,
...)"; TODO: check whether we traverse the long list for such items).
For correlated subqueries, traversal can be expensive; it is cheaper to make
each subquery item have a list of its outer references. The list can be
collected at fix_fields() stage with very little extra cost, and then it could
be used for other optimizations.
8. Other concerns
=================
8.1 Relationship with outer->inner joins converter
--------------------------------------------------
One could suspect that outer->inner join conversion could get in the way
of table elimination by changing outer joins (which could be eliminated)
to inner (which we will not try to eliminate).
This concern is not valid: we make outer->inner conversions based on
predicates in WHERE. If the WHERE referred to an inner table (this is a
requirement for the conversion) then table elimination would not be
applicable anyway.
8.2 Relationship with prepared statements
-----------------------------------------
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
8.3 Relationship with constant table detection
----------------------------------------------
Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
TODO
9. Tests and benchmarks
=======================
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- query Q1 that would use elimination
- query Q2 that is very similar to Q1 (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
then compare run times and draw a conclusion about whether the DBMS used
supports table elimination.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Rev 2820: Apply Evgen's fix: in file:///home/psergey/dev/mysql-next-fix-subq-r2/
by Sergey Petrunya 29 Jul '09
At file:///home/psergey/dev/mysql-next-fix-subq-r2/
------------------------------------------------------------
revno: 2820
revision-id: psergey(a)askmonty.org-20090729161849-ynumr03ety244ueu
parent: psergey(a)askmonty.org-20090708174703-dz9uf5b0m6pcvtl6
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next-fix-subq-r2
timestamp: Wed 2009-07-29 20:18:49 +0400
message:
Apply Evgen's fix:
Bug#45174: Incorrectly applied equality propagation caused wrong result
on a query with a materialized semi-join.
Equality propagation is done after the query execution plan is chosen. It
replaces fields from tables being retrieved later with fields from tables
being retrieved earlier. Materialized semi-joins are an exception to this rule.
For a field that belongs to a table within a materialized semi-join, we can
only pick fields from the same semi-join.
Example: suppose we have a join order:
ot1 ot2 SJ-Mat(it1 it2 it3) ot3
and equality ot2.col = it1.col = it2.col
If we're looking for best substitute for 'it2.col', we should pick it1.col
and not ot2.col.
For a field that is not in a materialized semi-join we must pick a field
that's not embedded in a materialized semi-join.
Example: suppose we have a join order:
SJ-Mat(it1 it2) ot1 ot2
and equality ot2.col = ot1.col = it2.col
If we're looking for best substitute for 'ot2.col', we should pick ot1.col
and not it2.col, because when we run a join between ot1 and ot2
execution of SJ-Mat(...) has already finished and we can't rely on the value
of it*.*.
Now the Item_equal::get_first function accepts the field being
substituted as a parameter and checks whether it belongs to a materialized
semi-join. Depending on the result of the check, it returns either the field to
use as the substitute or NULL.
The is_sj_materialization_strategy method is added to the JOIN_TAB class to
check whether a JOIN_TAB belongs to a materialized semi-join.
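The selection rule from the two examples above can be sketched as follows. This is an illustration only, not the patched Item_equal::get_first: it assumes the equal fields are listed in join order and that each one is tagged with a hypothetical semi-join nest id (None for fields outside any materialized semi-join).

```python
def get_first(equal_fields, target):
    """Pick the earliest equal field that lives in the same scope as target.

    equal_fields: list of (field_name, sj_mat_nest) pairs in join order
    target:       name of the field being substituted
    """
    target_nest = dict(equal_fields)[target]
    for name, nest in equal_fields:
        # Same materialized semi-join, or both outside any of them.
        if nest == target_nest:
            return name
    return None


# Join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3; ot2.col = it1.col = it2.col
example1 = [("ot2.col", None), ("it1.col", "sj1"), ("it2.col", "sj1")]
# Join order: SJ-Mat(it1 it2) ot1 ot2; ot2.col = ot1.col = it2.col
example2 = [("it2.col", "sj1"), ("ot1.col", None), ("ot2.col", None)]

subst_for_it2 = get_first(example1, "it2.col")
subst_for_ot2 = get_first(example2, "ot2.col")
```

In both examples the field that comes first in the join order is skipped because it lies on the other side of the materialization boundary.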
=== modified file 'mysql-test/r/subselect3.result'
--- a/mysql-test/r/subselect3.result 2009-04-30 19:37:21 +0000
+++ b/mysql-test/r/subselect3.result 2009-07-29 16:18:49 +0000
@@ -1081,8 +1081,8 @@
insert into t3 select A.a + 10*B.a, 'filler' from t0 A, t0 B;
explain select * from t3 where a in (select a from t2) and (a > 5 or a < 10);
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY t2 ALL NULL NULL NULL NULL 2 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t2.a 1
+1 PRIMARY t2 ALL NULL NULL NULL NULL 2 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t2.a 1 Using index condition
select * from t3 where a in (select a from t2);
a filler
1 filler
@@ -1129,8 +1129,8 @@
explain select * from t1, t3 where t3.a in (select a from t2) and (t3.a < 10 or t3.a >30) and t1.a =3;
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 10 Using where
-1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t2.a 10
+1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t2.a 10 Using index condition
explain select straight_join * from t1 A, t1 B where A.a in (select a from t2);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY A ALL NULL NULL NULL NULL 10 Using where
@@ -1158,14 +1158,14 @@
explain select * from t0, t3 where t3.a in (select a from t2) and (t3.a < 10 or t3.a >30);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t0 system NULL NULL NULL NULL 1
-1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t2.a 10
+1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t2.a 10 Using index condition
create table t4 as select a as x, a as y from t1;
explain select * from t0, t3 where (t3.a, t3.b) in (select x,y from t4) and (t3.a < 10 or t3.a >30);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t0 system NULL NULL NULL NULL 1
-1 PRIMARY t4 ALL NULL NULL NULL NULL 10 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t4.x 10 Using where
+1 PRIMARY t4 ALL NULL NULL NULL NULL 10 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t4.x 10 Using index condition; Using where
drop table t0,t1,t2,t3,t4;
create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
=== modified file 'mysql-test/r/subselect3_jcl6.result'
--- a/mysql-test/r/subselect3_jcl6.result 2009-04-30 19:37:21 +0000
+++ b/mysql-test/r/subselect3_jcl6.result 2009-07-29 16:18:49 +0000
@@ -1086,8 +1086,8 @@
insert into t3 select A.a + 10*B.a, 'filler' from t0 A, t0 B;
explain select * from t3 where a in (select a from t2) and (a > 5 or a < 10);
id select_type table type possible_keys key key_len ref rows Extra
-1 PRIMARY t2 ALL NULL NULL NULL NULL 2 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t2.a 1 Using join buffer
+1 PRIMARY t2 ALL NULL NULL NULL NULL 2 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t2.a 1 Using index condition; Using join buffer
select * from t3 where a in (select a from t2);
a filler
1 filler
@@ -1134,8 +1134,8 @@
explain select * from t1, t3 where t3.a in (select a from t2) and (t3.a < 10 or t3.a >30) and t1.a =3;
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 10 Using where
-1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t2.a 10 Using join buffer
+1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t2.a 10 Using index condition; Using join buffer
explain select straight_join * from t1 A, t1 B where A.a in (select a from t2);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY A ALL NULL NULL NULL NULL 10 Using where
@@ -1163,14 +1163,14 @@
explain select * from t0, t3 where t3.a in (select a from t2) and (t3.a < 10 or t3.a >30);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t0 system NULL NULL NULL NULL 1
-1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t2.a 10 Using join buffer
+1 PRIMARY t2 ALL NULL NULL NULL NULL 10 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t2.a 10 Using index condition; Using join buffer
create table t4 as select a as x, a as y from t1;
explain select * from t0, t3 where (t3.a, t3.b) in (select x,y from t4) and (t3.a < 10 or t3.a >30);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t0 system NULL NULL NULL NULL 1
-1 PRIMARY t4 ALL NULL NULL NULL NULL 10 Using where; Materialize; Scan
-1 PRIMARY t3 ref a a 5 test.t4.x 10 Using where; Using join buffer
+1 PRIMARY t4 ALL NULL NULL NULL NULL 10 Materialize; Scan
+1 PRIMARY t3 ref a a 5 test.t4.x 10 Using index condition; Using where; Using join buffer
drop table t0,t1,t2,t3,t4;
create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
=== modified file 'mysql-test/r/subselect_sj.result'
--- a/mysql-test/r/subselect_sj.result 2009-07-06 07:57:39 +0000
+++ b/mysql-test/r/subselect_sj.result 2009-07-29 16:18:49 +0000
@@ -372,3 +372,39 @@
3
2
drop table t1, t2, t3;
+#
+# Bug#45174: Incorrectly applied equality propagation caused wrong
+# result on a query with a materialized semi-join.
+#
+CREATE TABLE `CC` (
+`pk` int(11) NOT NULL AUTO_INCREMENT,
+`varchar_key` varchar(1) NOT NULL,
+`varchar_nokey` varchar(1) NOT NULL,
+PRIMARY KEY (`pk`),
+KEY `varchar_key` (`varchar_key`)
+);
+INSERT INTO `CC` VALUES (11,'m','m'),(12,'j','j'),(13,'z','z'),(14,'a','a'),(15,'',''),(16,'e','e'),(17,'t','t'),(19,'b','b'),(20,'w','w'),(21,'m','m'),(23,'',''),(24,'w','w'),(26,'e','e'),(27,'e','e'),(28,'p','p');
+CREATE TABLE `C` (
+`varchar_nokey` varchar(1) NOT NULL
+);
+INSERT INTO `C` VALUES ('v'),('u'),('n'),('l'),('h'),('u'),('n'),('j'),('k'),('e'),('i'),('u'),('n'),('b'),('x'),(''),('q'),('u');
+EXPLAIN EXTENDED SELECT varchar_nokey
+FROM C
+WHERE ( `varchar_nokey` , `varchar_nokey` ) IN (
+SELECT `varchar_key` , `varchar_nokey`
+FROM CC
+WHERE `varchar_nokey` < 'n' XOR `pk` ) ;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY C ALL NULL NULL NULL NULL 18 100.00
+1 PRIMARY CC ALL varchar_key NULL NULL NULL 15 100.00 Using where; Materialize
+Warnings:
+Note 1003 select `test`.`C`.`varchar_nokey` AS `varchar_nokey` from `test`.`C` semi join (`test`.`CC`) where ((`test`.`CC`.`varchar_key` = `test`.`C`.`varchar_nokey`) and (`test`.`CC`.`varchar_nokey` = `test`.`CC`.`varchar_key`) and ((`test`.`CC`.`varchar_nokey` < 'n') xor `test`.`CC`.`pk`))
+SELECT varchar_nokey
+FROM C
+WHERE ( `varchar_nokey` , `varchar_nokey` ) IN (
+SELECT `varchar_key` , `varchar_nokey`
+FROM CC
+WHERE `varchar_nokey` < 'n' XOR `pk` ) ;
+varchar_nokey
+DROP TABLE CC, C;
+# End of the test for bug#45174.
=== modified file 'mysql-test/r/subselect_sj_jcl6.result'
--- a/mysql-test/r/subselect_sj_jcl6.result 2009-07-06 07:57:39 +0000
+++ b/mysql-test/r/subselect_sj_jcl6.result 2009-07-29 16:18:49 +0000
@@ -376,6 +376,42 @@
3
2
drop table t1, t2, t3;
+#
+# Bug#45174: Incorrectly applied equality propagation caused wrong
+# result on a query with a materialized semi-join.
+#
+CREATE TABLE `CC` (
+`pk` int(11) NOT NULL AUTO_INCREMENT,
+`varchar_key` varchar(1) NOT NULL,
+`varchar_nokey` varchar(1) NOT NULL,
+PRIMARY KEY (`pk`),
+KEY `varchar_key` (`varchar_key`)
+);
+INSERT INTO `CC` VALUES (11,'m','m'),(12,'j','j'),(13,'z','z'),(14,'a','a'),(15,'',''),(16,'e','e'),(17,'t','t'),(19,'b','b'),(20,'w','w'),(21,'m','m'),(23,'',''),(24,'w','w'),(26,'e','e'),(27,'e','e'),(28,'p','p');
+CREATE TABLE `C` (
+`varchar_nokey` varchar(1) NOT NULL
+);
+INSERT INTO `C` VALUES ('v'),('u'),('n'),('l'),('h'),('u'),('n'),('j'),('k'),('e'),('i'),('u'),('n'),('b'),('x'),(''),('q'),('u');
+EXPLAIN EXTENDED SELECT varchar_nokey
+FROM C
+WHERE ( `varchar_nokey` , `varchar_nokey` ) IN (
+SELECT `varchar_key` , `varchar_nokey`
+FROM CC
+WHERE `varchar_nokey` < 'n' XOR `pk` ) ;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY C ALL NULL NULL NULL NULL 18 100.00
+1 PRIMARY CC ALL varchar_key NULL NULL NULL 15 100.00 Using where; Materialize
+Warnings:
+Note 1003 select `test`.`C`.`varchar_nokey` AS `varchar_nokey` from `test`.`C` semi join (`test`.`CC`) where ((`test`.`CC`.`varchar_key` = `test`.`C`.`varchar_nokey`) and (`test`.`CC`.`varchar_nokey` = `test`.`CC`.`varchar_key`) and ((`test`.`CC`.`varchar_nokey` < 'n') xor `test`.`CC`.`pk`))
+SELECT varchar_nokey
+FROM C
+WHERE ( `varchar_nokey` , `varchar_nokey` ) IN (
+SELECT `varchar_key` , `varchar_nokey`
+FROM CC
+WHERE `varchar_nokey` < 'n' XOR `pk` ) ;
+varchar_nokey
+DROP TABLE CC, C;
+# End of the test for bug#45174.
set join_cache_level=default;
show variables like 'join_cache_level';
Variable_name Value
=== modified file 'mysql-test/t/subselect_sj.test'
--- a/mysql-test/t/subselect_sj.test 2009-07-06 07:57:39 +0000
+++ b/mysql-test/t/subselect_sj.test 2009-07-29 16:18:49 +0000
@@ -22,7 +22,6 @@
create table t12 like t10;
insert into t12 select * from t10;
-
--echo Flattened because of dependency, t10=func(t1)
explain select * from t1 where a in (select pk from t10);
select * from t1 where a in (select pk from t10);
@@ -252,3 +251,43 @@
where a in (select c from t2 where d >= some(select e from t3 where b=e));
drop table t1, t2, t3;
+
+--echo #
+--echo # Bug#45174: Incorrectly applied equality propagation caused wrong
+--echo # result on a query with a materialized semi-join.
+--echo #
+
+CREATE TABLE `CC` (
+ `pk` int(11) NOT NULL AUTO_INCREMENT,
+ `varchar_key` varchar(1) NOT NULL,
+ `varchar_nokey` varchar(1) NOT NULL,
+ PRIMARY KEY (`pk`),
+ KEY `varchar_key` (`varchar_key`)
+);
+
+INSERT INTO `CC` VALUES (11,'m','m'),(12,'j','j'),(13,'z','z'),(14,'a','a'),(15,'',''),(16,'e','e'),(17,'t','t'),(19,'b','b'),(20,'w','w'),(21,'m','m'),(23,'',''),(24,'w','w'),(26,'e','e'),(27,'e','e'),(28,'p','p');
+
+CREATE TABLE `C` (
+ `varchar_nokey` varchar(1) NOT NULL
+);
+
+INSERT INTO `C` VALUES ('v'),('u'),('n'),('l'),('h'),('u'),('n'),('j'),('k'),('e'),('i'),('u'),('n'),('b'),('x'),(''),('q'),('u');
+
+EXPLAIN EXTENDED SELECT varchar_nokey
+FROM C
+WHERE ( `varchar_nokey` , `varchar_nokey` ) IN (
+SELECT `varchar_key` , `varchar_nokey`
+FROM CC
+WHERE `varchar_nokey` < 'n' XOR `pk` ) ;
+
+SELECT varchar_nokey
+FROM C
+WHERE ( `varchar_nokey` , `varchar_nokey` ) IN (
+SELECT `varchar_key` , `varchar_nokey`
+FROM CC
+WHERE `varchar_nokey` < 'n' XOR `pk` ) ;
+
+DROP TABLE CC, C;
+
+--echo # End of the test for bug#45174.
+
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2009-07-06 07:57:39 +0000
+++ b/sql/item.cc 2009-07-29 16:18:49 +0000
@@ -4895,7 +4895,7 @@
return this;
return const_item;
}
- Item_field *subst= item_equal->get_first();
+ Item_field *subst= item_equal->get_first(this);
if (subst && field->table != subst->field->table && !field->eq(subst->field))
return subst;
}
=== modified file 'sql/item_cmpfunc.cc'
--- a/sql/item_cmpfunc.cc 2009-07-06 07:57:39 +0000
+++ b/sql/item_cmpfunc.cc 2009-07-29 16:18:49 +0000
@@ -5377,7 +5377,7 @@
void Item_equal::fix_length_and_dec()
{
- Item *item= get_first();
+ Item *item= get_first(NULL);
eval_item= cmp_item::get_comparator(item->result_type(),
item->collation.collation);
}
@@ -5440,3 +5440,107 @@
str->append(')');
}
+
+/*
+ @brief Get the first field of multiple equality.
+ @param[in] field the field to get equal field to
+
+ @details Get the first field of multiple equality that is equal to the
+ given field. In order to make semi-join materialization strategy work
+ correctly we can't propagate equal fields from upper select to the semi-join.
+ Thus the field is returned according to the following rules:
+
+ 1) If the given field belongs to a semi-join then the first field in
+ multiple equality which belongs to the same semi-join is returned.
+ Otherwise NULL is returned.
+ 2) If no field is given or the field doesn't belong to a semi-join then
+ the first field in the multiple equality is returned.
+
+ @retval Found first field in the multiple equality.
+ @retval 0 if no field found.
+*/
+
+Item_field* Item_equal::get_first(Item_field *field)
+{
+ List_iterator<Item_field> it(fields);
+ Item_field *item;
+ JOIN_TAB *field_tab;
+
+ if (!field)
+ return fields.head();
+ /*
+ Of all equal fields, return the first one we can use. Normally, this is the
+ field which belongs to the table that is the first in the join order.
+
+ There is one exception to this: When semi-join materialization strategy is
+ used, and the given field belongs to a table within the semi-join nest, we
+ must pick the first field in the semi-join nest.
+
+ Example: suppose we have a join order:
+
+ ot1 ot2 SJ-Mat(it1 it2 it3) ot3
+
+ and equality ot2.col = it1.col = it2.col
+ If we're looking for best substitute for 'it2.col', we should pick it1.col
+ and not ot2.col.
+ */
+
+ field_tab= field->field->table->reginfo.join_tab;
+ if (field_tab->is_sj_materialization_strategy())
+ {
+ /*
+ It's a field from a materialized semi-join. We can substitute it only
+ for a field from the same semi-join.
+ */
+ JOIN_TAB *first;
+ JOIN *join= field_tab->join;
+ uint tab_idx= field_tab - field_tab->join->join_tab;
+ /* Find first table of this semi-join. */
+ for (int i=tab_idx; i >= join->const_tables; i--)
+ {
+ if (join->best_positions[i].sj_strategy == SJ_OPT_MATERIALIZE ||
+ join->best_positions[i].sj_strategy == SJ_OPT_MATERIALIZE_SCAN)
+ first= join->join_tab + i;
+ else
+ // Found first tab that doesn't belong to current SJ.
+ break;
+ }
+ /* Find an item to substitute for. */
+ while ((item= it++))
+ {
+ if (item->field->table->reginfo.join_tab >= first)
+ {
+ /*
+ If we found given field then return NULL to avoid unnecessary
+ substitution.
+ */
+ return (item != field) ? item : NULL;
+ }
+ }
+ }
+ else
+ {
+ /*
+ The field is not in SJ-Materialization nest. We must return the first field
+ that's not embedded in a SJ-Materialization nest.
+ Example: suppose we have a join order:
+
+ SJ-Mat(it1 it2) ot1 ot2
+
+ and equality ot2.col = ot1.col = it2.col
+ If we're looking for best substitute for 'ot2.col', we should pick ot1.col
+ and not it2.col, because when we run a join between ot1 and ot2
+ execution of SJ-Mat(...) has already finished and we can't rely on the
+ value of it*.*.
+ */
+ while ((item= it++))
+ {
+ field_tab= item->field->table->reginfo.join_tab;
+ if (!field_tab->is_sj_materialization_strategy())
+ return item;
+ }
+ }
+ // Shouldn't get here.
+ DBUG_ASSERT(0);
+ return NULL;
+}
=== modified file 'sql/item_cmpfunc.h'
--- a/sql/item_cmpfunc.h 2009-07-06 07:57:39 +0000
+++ b/sql/item_cmpfunc.h 2009-07-29 16:18:49 +0000
@@ -1593,7 +1593,7 @@
void add(Item_field *f);
uint members();
bool contains(Field *field);
- Item_field* get_first() { return fields.head(); }
+ Item_field* get_first(Item_field *field);
void merge(Item_equal *item);
void update_const();
enum Functype functype() const { return MULT_EQUAL_FUNC; }
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-07-06 14:33:29 +0000
+++ b/sql/sql_select.cc 2009-07-29 16:18:49 +0000
@@ -10379,6 +10379,21 @@
/**
+ Check whether the JOIN_TAB belongs to a materialized semi-join.
+*/
+
+bool JOIN_TAB::is_sj_materialization_strategy()
+{
+ uint tab_idx= this - join->join_tab;
+
+ return (emb_sj_nest &&
+ ((join->best_positions[tab_idx].sj_strategy == SJ_OPT_MATERIALIZE ||
+ join->best_positions[tab_idx].sj_strategy == SJ_OPT_MATERIALIZE_SCAN)));
+
+}
+
+
+/**
Partially cleanup JOIN after it has executed: close index or rnd read
(table cursors), free quick selects.
@@ -11720,7 +11735,7 @@
head= item_const;
else
{
- head= item_equal->get_first();
+ head= item_equal->get_first(NULL);
it++;
}
Item_field *item_field;
=== modified file 'sql/sql_select.h'
--- a/sql/sql_select.h 2009-05-07 20:48:24 +0000
+++ b/sql/sql_select.h 2009-07-29 16:18:49 +0000
@@ -332,6 +332,7 @@
return first_inner;
return first_sj_inner_tab;
}
+ bool is_sj_materialization_strategy();
} JOIN_TAB;
/*
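The substitution rule that the patch adds to Item_equal::get_first() can be modelled outside the server. The following is only a minimal sketch under stated assumptions, not the server code: Tab, pick_substitute and the integer join-order positions are hypothetical stand-ins for JOIN_TAB, get_first() and Item_field, the fields of the multiple equality are assumed to be listed in join order, and adjacent materialized tables are treated as one nest (as the backward scan over best_positions does).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a JOIN_TAB: a table name plus a flag
// saying whether the table sits inside an SJ-Materialization nest.
struct Tab {
  std::string name;
  bool in_sj_mat;
};

// Returns the join-order position of the table whose field should replace
// the field of table `field_pos`, or -1 for "no substitution".
// `equal` holds the join-order positions of the tables that take part in the
// multiple equality, sorted in join order.
int pick_substitute(const std::vector<Tab>& order,
                    const std::vector<int>& equal, int field_pos) {
  assert(field_pos >= 0 && field_pos < static_cast<int>(order.size()));
  if (order[field_pos].in_sj_mat) {
    // Walk back to the first table of the nest that contains the field.
    int first = field_pos;
    while (first > 0 && order[first - 1].in_sj_mat) --first;
    // Take the first equal field at or after the start of the nest.
    for (int pos : equal)
      if (pos >= first) return pos != field_pos ? pos : -1;
    return -1;
  }
  // Field outside any materialized nest: take the first equal field that is
  // also outside such a nest (this may be the field itself).
  for (int pos : equal)
    if (!order[pos].in_sj_mat) return pos;
  return -1;
}
```

With the join order ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and the equality ot2.col = it1.col = it2.col, asking for a substitute for it2 gives it1, which matches the first example in the patch comment.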
Hi all,
Last week I pushed a new report for buildbot live and after a few tweaks wanted to send
out an email on it.
The report is located here:
http://askmonty.org/buildbot/reports/
At the moment we only have the "cross reference" report; more reports will be added in the
future. The cross reference report lists all test failures matching your search (to view
all data, just click "search" with no parameters). You can click on a row to display the
full test failure text.
Currently there are only a few test failures listed but all future failures will appear on
this report.
If you have any questions or comments feel free to email the list, me directly or hop on
#maria
Best Regards,
--
Bryan Alsdorf, Lead Web Developer
Monty Program, AB. http://askmonty.org
[Maria-developers] Rev 2717: WL#4800: Optimizer trace in file:///home/psergey/dev/maria-5.1-opt-trace/
by Sergey Petrunya 23 Jul '09
At file:///home/psergey/dev/maria-5.1-opt-trace/
------------------------------------------------------------
revno: 2717
revision-id: psergey(a)askmonty.org-20090723174522-99j6if4ay9r341qg
parent: knielsen(a)knielsen-hq.org-20090707111924-e44ycwmckomk13qz
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-opt-trace
timestamp: Thu 2009-07-23 21:45:22 +0400
message:
WL#4800: Optimizer trace
- Port current state to MariaDB
Diff too large for email (1909 lines, the limit is 1000).
[Maria-developers] Rev 2717: WL#4800: Optimizer trace in file:///home/psergey/dev/maria-5.1-opt-trace/
by Sergey Petrunya 23 Jul '09
At file:///home/psergey/dev/maria-5.1-opt-trace/
------------------------------------------------------------
revno: 2717
revision-id: psergey(a)askmonty.org-20090723174047-982pmyty704c5bgu
parent: knielsen(a)knielsen-hq.org-20090707111924-e44ycwmckomk13qz
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-opt-trace
timestamp: Thu 2009-07-23 21:40:47 +0400
message:
WL#4800: Optimizer trace
- Port of current state to mariadb-5.1
Diff too large for email (1354 lines, the limit is 1000).
[Maria-developers] Updated (by Guest): Table elimination: all tasks (29)
by worklog-noreply@askmonty.org 23 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination: all tasks
CREATION DATE..: Wed, 03 Jun 2009, 12:07
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 29 (http://askmonty.org/worklog/?tid=29)
VERSION........: Server-5.1
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 23 Jul 2009, 20:13)=-=-
Version updated.
--- /tmp/wklog.29.old.17550 2009-07-23 20:13:44.000000000 +0300
+++ /tmp/wklog.29.new.17550 2009-07-23 20:13:44.000000000 +0300
@@ -1 +1 @@
-Server-4.0
+Server-5.1
-=-=(Guest - Thu, 23 Jul 2009, 20:09)=-=-
Version updated.
--- /tmp/wklog.29.old.17326 2009-07-23 20:09:38.000000000 +0300
+++ /tmp/wklog.29.new.17326 2009-07-23 20:09:38.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-4.0
-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 16 Jun 2009, 17:03)=-=-
Dependency deleted: 29 no longer depends on 20
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 20
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
DESCRIPTION:
This WL entry groups all table elimination tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination: all tasks (29)
by worklog-noreply@askmonty.org 23 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination: all tasks
CREATION DATE..: Wed, 03 Jun 2009, 12:07
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 29 (https://askmonty.org/worklog/?tid=29)
VERSION........: Server-4.0
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 23 Jul 2009, 20:09)=-=-
Version updated.
--- /tmp/wklog.29.old.17326 2009-07-23 20:09:38.000000000 +0300
+++ /tmp/wklog.29.new.17326 2009-07-23 20:09:38.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-4.0
-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 16 Jun 2009, 17:03)=-=-
Dependency deleted: 29 no longer depends on 20
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 20
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
DESCRIPTION:
This WL entry groups all table elimination tasks.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Monty): Table elimination (17)
by worklog-noreply@askmonty.org 23 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
Small dent removed.
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
 one record from. eliminated table is the one that we don't access at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
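The "are all parts of a unique key bound?" check described in this note can be sketched as a small fixed-point computation. This is only an illustration under assumptions: ColMask, Equality and unique_key_is_bound are hypothetical names, each column of the inner table is one bit, and each equality "col = expr" is reduced to the set of inner-table columns that expr reads (empty when expr depends only on outer tables or constants).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using ColMask = std::uint64_t;  // one bit per column of the inner table

// Hypothetical description of a predicate "target = expr".
struct Equality {
  ColMask target;   // exactly one bit: the column on the left-hand side
  ColMask sources;  // inner-table columns that expr reads (0 if none)
};

// A column becomes bound when it is equated to an expression whose
// inner-table inputs are all bound already. The unique key guarantees at
// most one matching row once every key part is bound.
bool unique_key_is_bound(ColMask key_parts, const std::vector<Equality>& eqs) {
  ColMask bound = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Equality& e : eqs) {
      assert(e.target != 0 && (e.target & (e.target - 1)) == 0);
      const bool already_bound = (bound & e.target) != 0;
      const bool inputs_bound = (e.sources & ~bound) == 0;
      if (!already_bound && inputs_bound) {
        bound |= e.target;
        changed = true;
      }
    }
  }
  return (key_parts & ~bound) == 0;
}
```

In this model "B.fromDate = func(B.id)" binds fromDate once B.id is bound, while "B.fromDate = func(B.*)" never does, because it also reads columns that stay unbound.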
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't access at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
------------------------------------------------------------
-=-=(View All Progress Notes, 24 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, table elimination removes tables from the execution plan when
including them is unnecessary. This can, of course, only happen if the
right circumstances arise. Let us, for example, look at the following
query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example (as is often
the case) a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
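
The row-count argument above can be checked with a small simulation. This is a minimal sketch with a hypothetical helper, left_join_row_count, where each table is reduced to the list of its id values; it only counts result rows and is not related to any server code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the rows produced by "A LEFT OUTER JOIN B ON B.id = A.id",
// with each table reduced to its list of id values.
std::size_t left_join_row_count(const std::vector<int>& a_ids,
                                const std::vector<int>& b_ids) {
  std::size_t rows = 0;
  for (int a : a_ids) {
    std::size_t matches = 0;
    for (int b : b_ids)
      if (a == b) ++matches;
    // A row of A without a match is still returned once, NULL-extended.
    rows += (matches == 0) ? 1 : matches;
  }
  // A left join never returns fewer rows than the left table.
  assert(rows >= a_ids.size());
  return rows;
}
```

When the ids in B are unique the count always equals the number of rows in A; as soon as B holds a duplicate id that also occurs in A, the count grows, which is why the join cannot be dropped in that case.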
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
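
The "at most one row will match" claim can be illustrated the same way. The sketch below is an assumption-laden model, not server code: Row and latest_version_matches are hypothetical names, tableB is a list of (id, fromDate) pairs, and the pairs are assumed to be unique, as the primary key (B.id, B.fromDate) requires.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::pair<int, int>;  // (id, fromDate)

// Number of rows of B that satisfy
//   B.id = a_id AND B.fromDate = (select max(sub.fromDate) ... sub.id = a_id)
std::size_t latest_version_matches(const std::vector<Row>& b, int a_id) {
  bool found = false;
  int latest = 0;
  for (const Row& r : b) {
    if (r.first == a_id && (!found || r.second > latest)) {
      latest = r.second;
      found = true;
    }
  }
  if (!found) return 0;  // the subselect yields NULL, so nothing matches
  std::size_t matches = 0;
  for (const Row& r : b)
    if (r.first == a_id && r.second == latest) ++matches;
  return matches;
}
```

As long as (id, fromDate) is unique, the result is 0 or 1 for every a_id, so the join cannot change the number of rows coming from tableA.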
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is in the
lp:~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= because it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
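
The two conditions can be written down as a single predicate. This is a rough sketch only: InnerTable and can_eliminate are hypothetical names, table_map is modelled as a plain 64-bit bitmap, and the flags are assumed to have been computed elsewhere by the optimizer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using table_map = std::uint64_t;  // one bit per table in the join

// Hypothetical per-table summary of what the optimizer already knows.
struct InnerTable {
  bool is_const;              // const table (eq_ref(const) or 0/1-row table)
  bool has_eq_ref_candidate;  // some unique key is fully bound
};

bool can_eliminate(const std::vector<InnerTable>& inner,
                   table_map inner_tables, table_map used_elsewhere) {
  // Condition 2: nothing outside the nest's ON clause refers to its tables.
  if ((used_elsewhere & inner_tables) != 0) return false;
  // Condition 1: every inner table yields exactly one record combination.
  for (const InnerTable& t : inner)
    if (!t.is_const && !t.has_eq_ref_candidate) return false;
  return true;
}
```

Here used_elsewhere stands for the union of the tables referenced by the select list, WHERE, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING.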
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
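
As a concrete rendering of the quick check, the bitmaps can be plain 64-bit integers. The names below (all_tables_bitmap, elimination_worth_trying) are hypothetical and the three input bitmaps are assumed to have been collected already; this is a sketch, not the code in the tree.

```cpp
#include <cassert>
#include <cstdint>

using table_map = std::uint64_t;  // one bit per table in the join

// Bitmap with the lowest `n_tables` bits set, i.e. (1ULL << n) - 1.
table_map all_tables_bitmap(unsigned n_tables) {
  assert(n_tables < 64);  // shifting a 64-bit value by 64 is undefined
  return (table_map{1} << n_tables) - 1;
}

// True if at least one table is referenced by none of the listed clauses,
// so there may be something to eliminate.
bool elimination_worth_trying(table_map select_list, table_map group_order_by,
                              table_map where, table_map all_tables) {
  return (select_list | group_order_by | where) != all_tables;
}
```

The check is only a filter: a true result means the full analysis of outer join nests should run, not that a table will actually be removed.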
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front, as is done with const
tables, with the exception that if the eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
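As a rough illustration of the first and third steps, here is a toy Python model of detaching a nest and updating the table maps of its ancestors. The `Nest` class is a hypothetical stand-in for the server's nested join structures, not a description of TABLE_LIST, and it assumes each table bit belongs to exactly one nest.

```python
# Toy model only: `Nest` is a hypothetical stand-in for a nested-join node.

class Nest:
    def __init__(self, name, tables, parent=None):
        self.name = name
        self.used_tables = tables      # bitmap of tables in this nest
        self.children = []
        self.parent = parent           # plays the role of "embedding"
        if parent is not None:
            parent.children.append(self)
        # Every ancestor's map also covers the tables of this nest.
        node = parent
        while node is not None:
            node.used_tables |= tables
            node = node.parent


def eliminate(nest):
    """Detach `nest` from its embedding nest, clear its bits from all
    ancestor table maps, and return the bitmap of removed tables."""
    removed = nest.used_tables
    nest.parent.children.remove(nest)
    node = nest.parent
    while node is not None:
        node.used_tables &= ~removed
        node = node.parent
    nest.parent = None
    return removed


# A (bit 0) at the top, B (bit 1) in an outer join nest, C (bit 2) nested in it.
top = Nest("top", 0b001)
oj1 = Nest("oj1", 0b010, parent=top)
oj2 = Nest("oj2", 0b100, parent=oj1)

removed = eliminate(oj2)
table_count = bin(top.used_tables).count("1")
print(bin(removed), bin(top.used_tables), table_count)  # 0b100 0b11 2
```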
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of this flag has been questioned, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and it also behaves nicely with the anchor
model and VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc.) but cannot use table elimination.
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
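One possible shape for such a comparison, sketched in Python rather than in sql-bench. Everything here is an assumption for illustration: `run_query` is any callable that executes a query against the DBMS under test, and the 0.5 ratio is an arbitrary threshold that would need tuning.

```python
import statistics
import time


def median_runtime(run_query, query, repeats=5):
    """Median wall-clock time of run_query(query) over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def looks_like_elimination(t_candidate, t_control, ratio=0.5):
    """Heuristic verdict: the query that could use elimination ran in at
    most `ratio` of the time of the similar query that cannot."""
    return t_candidate <= ratio * t_control


# Usage with made-up timings (seconds); real numbers would come from
# median_runtime() called on the two query variants.
print(looks_like_elimination(0.8, 4.0))   # True
print(looks_like_elimination(3.9, 4.0))   # False
```

Using the median rather than the mean keeps a single slow run (cold cache, background load) from deciding the verdict.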
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is fixed: an aggregate function now reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence is that "item->used_tables()==0" is no longer
equivalent to "item->const_item()==true" (not sure whether this is new
or has been happening already).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded. This made it impossible to use information about the join
plan when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
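The change to aggregate functions can be illustrated with a small Python sketch of the bitmap arithmetic. The function names are made up for this example; only the expression (1 << n) - 1 and the "union of the arguments' tables" rule come from the text above.

```python
# Illustrative sketch: table maps as ints, one bit per table.

def all_tables_map(table_count):
    """Old behaviour: an aggregate claimed to depend on every table."""
    return (1 << table_count) - 1


def agg_used_tables(argument_maps):
    """New behaviour: union of the table maps of the aggregate's arguments."""
    result = 0
    for table_map in argument_maps:
        result |= table_map
    return result


TABLE_A, TABLE_B, TABLE_C = 0b001, 0b010, 0b100

print(bin(all_tables_map(3)))               # 0b111
print(bin(agg_used_tables([TABLE_B])))      # 0b10, e.g. SUM(B.x)
print(agg_used_tables([]))                  # 0, e.g. COUNT(*)
# Note: COUNT(*) has an empty table map but is not a constant, which is
# why "used_tables()==0" can no longer be read as "const_item()".
```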
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is to check whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that a correlated subquery predicate cannot tell which
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
items that
- are in the current select
- and it depends on.
This list will be useful in certain other subquery optimizations as well,
it is cheap to collect it in fix_fields() phase, so it will be collected
for every subquery predicate.
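The "all parts of a unique key are bound" check can be sketched as a small fixpoint computation. This is a Python illustration under stated assumptions, not the planned eliminate_tables() code: columns are plain strings, and `dependencies` maps a column to the set of columns that the right-hand side of its equality depends on (the information the proposed List<Item> would provide).

```python
# Illustrative sketch of the bound-key-parts check; all names are made up.

def bound_columns(initially_bound, dependencies):
    """Propagate bindings: a column `c` with an equality c = func(deps)
    becomes bound once every column in `deps` is bound."""
    bound = set(initially_bound)
    changed = True
    while changed:
        changed = False
        for column, deps in dependencies.items():
            if column not in bound and deps <= bound:
                bound.add(column)
                changed = True
    return bound


def unique_key_is_bound(key_parts, initially_bound, dependencies):
    """True if every part of the unique key ends up bound."""
    return set(key_parts) <= bound_columns(initially_bound, dependencies)


key = ["id", "fromDate"]

# B.id = A.id binds id; B.fromDate = func(B.id) then binds fromDate too.
print(unique_key_is_bound(key, {"id"}, {"fromDate": {"id"}}))            # True

# B.fromDate = func(B.*) also depends on an unbound column, so it stays unbound.
print(unique_key_is_bound(key, {"id"}, {"fromDate": {"id", "other"}}))   # False
```

Columns of outer tables (such as A.id in the example query) would be treated as initially bound in this model.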
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Monty): Table elimination (17)
by worklog-noreply@askmonty.org 23 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
deukje weg
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
------------------------------------------------------------
-=-=(View All Progress Notes, 24 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate tables that are not needed from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, table elimination removes tables from the execution plan when
it is unnecessary to include them. This can, of course, only happen if the
right circumstances arise. Let us for example look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example (as is often
the case) a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
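The row-count argument can be checked with a small Python script. SQLite (through the standard sqlite3 module) is used here only as a convenient stand-in for comparing result sets; it says nothing about how MariaDB, SQL Server or Oracle plan the query. The column `other` and the sample values follow the rows used in the explanation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, colA TEXT);
    CREATE TABLE tableB (id INTEGER, other TEXT);   -- no unique constraint yet
    INSERT INTO tableA VALUES (1, 'val1');
    INSERT INTO tableB VALUES (1, 'other1a'), (1, 'other1b');
""")

JOIN_QUERY = """
    SELECT A.colA
    FROM tableA A
    LEFT OUTER JOIN tableB B ON B.id = A.id
"""

# Duplicates of B.id multiply the rows coming from tableA.
rows_with_duplicates = conn.execute(JOIN_QUERY).fetchall()

# Recreate tableB with a primary key on id: at most one match per A row.
conn.executescript("""
    DROP TABLE tableB;
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, other TEXT);
    INSERT INTO tableB VALUES (1, 'other1a');
""")
rows_with_unique_key = conn.execute(JOIN_QUERY).fetchall()
rows_without_join = conn.execute("SELECT A.colA FROM tableA A").fetchall()

print(len(rows_with_duplicates))                   # 2
print(rows_with_unique_key == rows_without_join)   # True
```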
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
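The same kind of check works for the versioned case. Again this is only a sketch using SQLite from Python to compare result sets; the column `val` and the sample dates are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, colA TEXT);
    CREATE TABLE tableB (id INTEGER, fromDate TEXT, val TEXT,
                         PRIMARY KEY (id, fromDate));
    INSERT INTO tableA VALUES (1, 'val1'), (2, 'val2');
    -- Two versions for id 1, none for id 2.
    INSERT INTO tableB VALUES (1, '2009-01-01', 'old'),
                              (1, '2009-06-01', 'new');
""")

joined = conn.execute("""
    SELECT A.colA
    FROM tableA A
    LEFT OUTER JOIN tableB B
      ON B.id = A.id
     AND B.fromDate = (SELECT max(sub.fromDate)
                       FROM tableB sub
                       WHERE sub.id = A.id)
    ORDER BY A.id
""").fetchall()

plain = conn.execute("SELECT A.colA FROM tableA A ORDER BY A.id").fetchall()

# The join neither adds nor removes rows, so both queries agree.
print(joined == plain)   # True
```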
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have
a denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is in the
lp:~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= because it is a zero-row or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front, as is done with const
tables, with the exception that if the eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of this flag has been questioned, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and it also behaves nicely with the anchor
model and VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc.) but cannot use table elimination.
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is fixed: an aggregate function now reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence is that "item->used_tables()==0" is no longer
equivalent to "item->const_item()==true" (not sure whether this is new
or has been happening already).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded. This made it impossible to use information about the join
plan when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is to check whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that a correlated subquery predicate cannot tell which
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
items that
- are in the current select
- and it depends on.
This list will be useful in certain other subquery optimizations as well,
it is cheap to collect it in fix_fields() phase, so it will be collected
for every subquery predicate.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
Hi,
I am in the process of setting up Eventum as the bug tracker at Monty Program AB to track
bugs in MariaDB. I would like input on the values we need for the following fields:
* Status
* Category
* Operating Systems
Also please suggest any additional fields you think should be in this system. The current
fields I have are:
* Summary
* Severity
* Category
* Status
* Operating System
* Description
* How to Repeat
* Suggested Fix
* Product
* Product Version
* BZR Tree (URL to relevant tree with bug fix or that illustrates the problem)
Some of these fields (and my current values for them) are inspired by bugs.mysql.com.
However, since we are setting up a brand new system we should try to make it fit our needs
instead of doing what others are doing.
Any suggestions or comments are appreciated. Once I finish the initial setup I will get a
test site up for everyone to look over.
Best Regards,
--
Bryan Alsdorf, Lead Web Developer
Monty Program, AB. http://askmonty.org
[Maria-developers] Updated (by Psergey): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 21 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Tue, 21 Jul 2009, 21:13)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.6462 2009-07-21 21:13:13.000000000 +0300
+++ /tmp/wklog.34.new.6462 2009-07-21 21:13:13.000000000 +0300
@@ -1 +1,4 @@
+* GPB tarball contains a protocol definition for .proto file structure itself
+ and a parser for text form of .proto file which then exposes the parsed
+ file via standard GPB message navigation API.
-=-=(Psergey - Tue, 21 Jul 2009, 21:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.6399 2009-07-21 21:12:23.000000000 +0300
+++ /tmp/wklog.34.new.6399 2009-07-21 21:12:23.000000000 +0300
@@ -1 +1,78 @@
+<contents>
+1. GPB Encoding overview
+2. GPB in an SQL database
+2.1 Informing server about GPB field names and types
+2.2 Addressing GPB fields
+2.2.1 Option1: SQL Function
+2.2.2 Option2: SQL columns
+</contents>
+
+
+1. GPB Encoding overview
+========================
+
+GPB is a compact encoding for structured and typed data. A unit of GPB data
+(called a message) is only partially self-describing: it's possible to
+iterate over its parts, but, quoting the spec
+
+http://code.google.com/apis/protocolbuffers/docs/encoding.html:
+ " the name and declared type for each field can only be determined on the
+ decoding end by referencing the message type's definition (i.e. the .proto
+ file). "
+
+2. GPB in an SQL database
+=========================
+
+It is possible to store GPB data in MariaDB today - one can declare a binary
+blob column and use it to store GPB messages. Storing and retrieving entire
+messages will be the only available operations, though, as the server has no
+idea about the GPB format.
+It is apparent that ability to peek inside GPB data from SQL layer would be of
+great advantage: one would be able to
+- select only certain fields or parts of GPB messages
+- filter records based on the values of GPB fields
+- etc
+performing such operations at SQL layer will allow to reduce client<->server
+traffic right away, and will open path to getting the best possible
+performance.
+
+2.1 Informing server about GPB field names and types
+----------------------------------------------------
+User-friendly/meaningful access to GPB fields requires knowledge of GPB field
+names and types, which are not available from GPB message itself (see "GPB
+encoding overview" section).
+
+So the first issue to be addressed is to get the server to know the definition
+of stored messages. We intend to assume that all records have GPB messages
+that conform to a certain single definition, which gives one definition per
+GPB field.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify either the
+definition itself or path to .proto file with the definition.
+
+2.2 Addressing GPB fields
+-------------------------
+We'll need to provide a way to access GPB fields. This can be complicated as
+structures that are encoded in GPB message can be nested and recursive.
+
+2.2.1 Option1: SQL Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Introduce an SQL function GPB_FIELD(path) which will return contents of the
+field.
+- Return type of the function will be determined from GPB message definition.
+- For path, we can use XPath selector (a subset of XPath) syntax.
+
+(TODO ^ the above needs to be specified in more detail. is the selector as
+simple as filesystem path or we allow quantifiers (with predicates?)?)
+
+2.2.2 Option2: SQL columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make GPB columns to be accessible as SQL columns.
+This approach has problems:
+- It might be hard to implement code-wise
+ - (TODO will Virtual columns patch help??)
+- It is not clear how to access fields from nested structures. Should we allow
+ quoted names like `foo/bar[2]/baz' ?
+
DESCRIPTION:
Add support for Google Protocol Buffers (hereafter GPB). It should be possible
to have columns that store GPB-encoded data, and to use SQL constructs to
extract parts of GPB data for use in the select list, for filtering, and so
forth. Any support for indexing GPB data is outside the scope of this WL entry.
HIGH-LEVEL SPECIFICATION:
<contents>
1. GPB Encoding overview
2. GPB in an SQL database
2.1 Informing server about GPB field names and types
2.2 Addressing GPB fields
2.2.1 Option1: SQL Function
2.2.2 Option2: SQL columns
</contents>
1. GPB Encoding overview
========================
GPB is a compact encoding for structured, typed data. A unit of GPB data
(called a message) is only partially self-describing: it is possible to
iterate over its parts, but, quoting the spec
(http://code.google.com/apis/protocolbuffers/docs/encoding.html):
  "the name and declared type for each field can only be determined on the
  decoding end by referencing the message type's definition (i.e. the .proto
  file)."
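To illustrate the "partially self-describing" point, here is a minimal Python
sketch that walks the top-level fields of an encoded message without any .proto
definition. It recovers only field numbers, wire types and raw values; names
and declared types stay unknown. The function names (read_varint, iter_fields)
are illustrative and not part of any GPB library, and the deprecated group wire
types (3 and 4) are not handled.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, raw_value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:          # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:          # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:          # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        yield field_number, wire_type, value


# Field 1 as varint 150, then field 2 as the length-delimited bytes "testing".
message = bytes([0x08, 0x96, 0x01, 0x12, 0x07]) + b"testing"
fields = list(iter_fields(message))
print(fields)
```

Whether field 2 is a string, a byte array or a nested message cannot be decided
from the bytes alone, which is the limitation the quote describes.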
2. GPB in an SQL database
=========================
It is possible to store GPB data in MariaDB today: one can declare a binary
blob column and use it to store GPB messages. Storing and retrieving entire
messages are the only available operations, though, because the server knows
nothing about the GPB format.

The ability to look inside GPB data from the SQL layer would be a great
advantage. One would be able to
- select only certain fields or parts of GPB messages
- filter records based on the values of GPB fields
- etc.
Performing such operations at the SQL layer reduces client<->server traffic
right away and opens a path to getting the best possible performance.
2.1 Informing server about GPB field names and types
----------------------------------------------------
User-friendly, meaningful access to GPB fields requires knowledge of GPB field
names and types, which are not available from the GPB message itself (see the
"GPB Encoding overview" section).

So the first issue to address is letting the server know the definition of
the stored messages. We intend to assume that all records hold GPB messages
that conform to a single definition, that is, one definition per GPB column.

DecisionToMake: How do we pass the GPB definition to the server?
First idea: add a CREATE TABLE parameter that specifies either the
definition itself or the path to a .proto file containing the definition.
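A sketch of what the server gains once it knows the definition: raw
(field number, wire type, value) triples can be turned into named, typed
values. The definition below is a hand-written Python dict standing in for a
parsed .proto file; the message layout (id, name) and the function name
apply_definition are invented for illustration.

```python
# Hypothetical definition, equivalent to:
#   message Person { optional int32 id = 1; optional string name = 2; }
PERSON_DEFINITION = {
    1: ("id", "int32"),
    2: ("name", "string"),
}


def apply_definition(raw_fields, definition):
    """Map (field_number, wire_type, raw_value) triples to {name: value}."""
    named = {}
    for field_number, _wire_type, raw in raw_fields:
        if field_number not in definition:
            continue  # unknown fields are skipped in this sketch
        name, declared_type = definition[field_number]
        if declared_type == "string":
            named[name] = raw.decode("utf-8")
        else:
            # Varints are passed through unchanged; this sketch does not
            # handle negative int32 values.
            named[name] = raw
    return named


# Triples as a schema-less decoder would produce them.
raw_fields = [(1, 0, 150), (2, 2, b"testing")]
record = apply_definition(raw_fields, PERSON_DEFINITION)
print(record)
```

With a single definition per column, a lookup table like this only has to be
built once per table rather than once per record.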
2.2 Addressing GPB fields
-------------------------
We need to provide a way to access GPB fields. This can be complicated, because
structures encoded in a GPB message can be nested and recursive.
2.2.1 Option1: SQL Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduce an SQL function GPB_FIELD(path) that returns the contents of the
field.
- The return type of the function will be determined from the GPB message
  definition.
- For the path, we can use XPath selector syntax (a subset of XPath).

(TODO ^ the above needs to be specified in more detail. Is the selector as
simple as a filesystem path, or do we allow quantifiers (with predicates?)?)
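To make the TODO concrete, here is a sketch of the simplest variant: a
filesystem-like path with optional positional steps, evaluated over an
already-decoded message represented as nested Python dicts and lists. The path
grammar, the 1-based positions (borrowed from XPath) and the Python function
name gpb_field are assumptions for illustration, not a specification.

```python
import re

# One path step: a field name, optionally followed by a [position].
_STEP = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(?:\[(\d+)\])?$")


def gpb_field(message, path):
    """Resolve a path such as 'foo/bar[2]/baz'; return None if absent."""
    current = message
    for step in path.split("/"):
        match = _STEP.match(step)
        if match is None:
            raise ValueError("bad path step: %r" % step)
        name, index = match.group(1), match.group(2)
        if not isinstance(current, dict) or name not in current:
            return None
        current = current[name]
        if index is not None:
            position = int(index) - 1  # positions are 1-based, as in XPath
            if not isinstance(current, list) or not 0 <= position < len(current):
                return None
            current = current[position]
    return current


decoded = {"foo": {"bar": [{"baz": 10}, {"baz": 20}]}}
print(gpb_field(decoded, "foo/bar[2]/baz"))
print(gpb_field(decoded, "foo/bar[3]/baz"))
```

Returning None for a missing step mirrors SQL NULL; predicates and quantifiers
are deliberately left out of this sketch.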
2.2.2 Option2: SQL columns
~~~~~~~~~~~~~~~~~~~~~~~~~~
Make GPB fields accessible as SQL columns.
This approach has problems:
- It might be hard to implement code-wise
  - (TODO: will the Virtual columns patch help?)
- It is not clear how to access fields in nested structures. Should we allow
  quoted names like `foo/bar[2]/baz`?
LOW-LEVEL DESIGN:
* The GPB tarball contains a protocol definition for the .proto file structure
  itself and a parser for the text form of a .proto file, which then exposes
  the parsed file via the standard GPB message navigation API.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Psergey): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 21 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Tue, 21 Jul 2009, 21:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.6399 2009-07-21 21:12:23.000000000 +0300
+++ /tmp/wklog.34.new.6399 2009-07-21 21:12:23.000000000 +0300
@@ -1 +1,78 @@
+<contents>
+1. GPB Encoding overview
+2. GPB in an SQL database
+2.1 Informing server about GPB field names and types
+2.2 Addressing GPB fields
+2.2.1 Option1: SQL Function
+2.2.2 Option2: SQL columns
+</contents>
+
+
+1. GPB Encoding overview
+========================
+
+GPB is a compact encoding for structured and typed data. A unit of GPB data
+(it is called message) is only partially self-describing: it's possible to
+iterate over its parts, but, quoting the spec
+
+http://code.google.com/apis/protocolbuffers/docs/encoding.html:
+ " the name and declared type for each field can only be determined on the
+ decoding end by referencing the message type's definition (i.e. the .proto
+ file). "
+
+2. GPB in an SQL database
+=========================
+
+It is possible to store GPB data in MariaDB today - one can declare a binary
+blob column and use it to store GPB messages. Storing and retrieving entire
+messages will be the only available operations, though, as the server has no
+idea about the GPB format.
+It is apparent that ability to peek inside GPB data from SQL layer would be of
+great advantage: one would be able to
+- select only certain fields or parts of GPB messages
+- filter records based on the values of GPB fields
+- etc
+performing such operations at SQL layer will allow to reduce client<->server
+traffic right away, and will open path to getting the best possible
+performance.
+
+2.1 Informing server about GPB field names and types
+----------------------------------------------------
+User-friendly/meaningful access to GPB fields requires knowledge of GPB field
+names and types, which are not available from GPB message itself (see "GPB
+encoding overview" section).
+
+So the first issue to be addressed is to get the server to know the definition
+of stored messages. We intend to assume that all records have GPB messages
+that conform to a certain single definition, which gives one definition per
+GPB field.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify either the
+definition itself or path to .proto file with the definition.
+
+2.2 Addressing GPB fields
+-------------------------
+We'll need to provide a way to access GPB fields. This can be complicated as
+structures that are encoded in GPB message can be nested and recursive.
+
+2.2.1 Option1: SQL Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Introduce an SQL function GPB_FIELD(path) which will return contents of the
+field.
+- Return type of the function will be determined from GPB message definition.
+- For path, we can use XPath selector (a subset of XPath) syntax.
+
+(TODO ^ the above needs to be specified in more detail. is the selector as
+simple as filesystem path or we allow quantifiers (with predicates?)?)
+
+2.2.2 Option2: SQL columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make GPB columns to be accessible as SQL columns.
+This approach has problems:
+- It might be hard to implement code-wise
+ - (TODO will Virtual columns patch help??)
+- It is not clear how to access fields from nested structures. Should we allow
+ quoted names like `foo/bar[2]/baz' ?
+
DESCRIPTION:
Add support for Google Protocol Buffers (hereafter GPB). It should be possible
to have columns that store GPB-encoded data, and to use SQL constructs to
extract parts of GPB data for use in the select list, for filtering, and so
forth. Any support for indexing GPB data is outside the scope of this WL entry.
HIGH-LEVEL SPECIFICATION:
<contents>
1. GPB Encoding overview
2. GPB in an SQL database
2.1 Informing server about GPB field names and types
2.2 Addressing GPB fields
2.2.1 Option1: SQL Function
2.2.2 Option2: SQL columns
</contents>
1. GPB Encoding overview
========================
GPB is a compact encoding for structured, typed data. A unit of GPB data
(called a message) is only partially self-describing: it is possible to
iterate over its parts, but, quoting the spec
(http://code.google.com/apis/protocolbuffers/docs/encoding.html):
  "the name and declared type for each field can only be determined on the
  decoding end by referencing the message type's definition (i.e. the .proto
  file)."
2. GPB in an SQL database
=========================
It is possible to store GPB data in MariaDB today: one can declare a binary
blob column and use it to store GPB messages. Storing and retrieving entire
messages are the only available operations, though, because the server knows
nothing about the GPB format.

The ability to look inside GPB data from the SQL layer would be a great
advantage. One would be able to
- select only certain fields or parts of GPB messages
- filter records based on the values of GPB fields
- etc.
Performing such operations at the SQL layer reduces client<->server traffic
right away and opens a path to getting the best possible performance.
2.1 Informing server about GPB field names and types
----------------------------------------------------
User-friendly, meaningful access to GPB fields requires knowledge of GPB field
names and types, which are not available from the GPB message itself (see the
"GPB Encoding overview" section).

So the first issue to address is letting the server know the definition of
the stored messages. We intend to assume that all records hold GPB messages
that conform to a single definition, that is, one definition per GPB column.

DecisionToMake: How do we pass the GPB definition to the server?
First idea: add a CREATE TABLE parameter that specifies either the
definition itself or the path to a .proto file containing the definition.
2.2 Addressing GPB fields
-------------------------
We need to provide a way to access GPB fields. This can be complicated, because
structures encoded in a GPB message can be nested and recursive.
2.2.1 Option1: SQL Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduce an SQL function GPB_FIELD(path) that returns the contents of the
field.
- The return type of the function will be determined from the GPB message
  definition.
- For the path, we can use XPath selector syntax (a subset of XPath).

(TODO ^ the above needs to be specified in more detail. Is the selector as
simple as a filesystem path, or do we allow quantifiers (with predicates?)?)
2.2.2 Option2: SQL columns
~~~~~~~~~~~~~~~~~~~~~~~~~~
Make GPB fields accessible as SQL columns.
This approach has problems:
- It might be hard to implement code-wise
  - (TODO: will the Virtual columns patch help?)
- It is not clear how to access fields in nested structures. Should we allow
  quoted names like `foo/bar[2]/baz`?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Psergey): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 21 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
Add support for Google Protocol Buffers (hereafter GPB). It should be possible
to have columns that store GPB-encoded data, and to use SQL constructs to
extract parts of GPB data for use in the select list, for filtering, and so
forth. Any support for indexing GPB data is outside the scope of this WL entry.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
1
0
[Maria-developers] New (by Psergey): Add support for google protocol buffers (34)
by worklog-noreply@askmonty.org 21 Jul '09
by worklog-noreply@askmonty.org 21 Jul '09
21 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
Add support for Google Protocol Buffers (further GPB). It should be possible
to have columns that store GPB-encoded data, as well as use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.
Any support for indexing GPB data is outside of scope of this WL entry.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 20 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
Little dent gone
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't access at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't access at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
------------------------------------------------------------
-=-=(View All Progress Notes, 23 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, table elimination removes tables from the execution plan when
it is unnecessary to include them. This can, of course, only happen under
the right circumstances. Consider, for example, the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, suppose that the rows in tableB make it possible to place a
unique constraint on the column B.id; often this constraint is the
primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
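The row-count argument above can be checked on a small data set. What follows
is a minimal sketch that uses Python's built-in sqlite3 module instead of
MariaDB; the row values and the colB column are made up for illustration, and
the helper name row_counts is hypothetical. It demonstrates only the query
semantics (when the join can or cannot change the number of rows), not the
optimizer rewrite itself.

```python
import sqlite3


def row_counts(b_rows, unique_id):
    """Return (rows in tableA, rows produced by the left outer join)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, colA TEXT)")
    # unique_id decides whether tableB.id carries a uniqueness constraint.
    id_def = "id INTEGER PRIMARY KEY" if unique_id else "id INTEGER"
    con.execute(f"CREATE TABLE tableB ({id_def}, colB TEXT)")
    con.executemany("INSERT INTO tableA VALUES (?, ?)",
                    [(1, "val1"), (2, "val2")])
    con.executemany("INSERT INTO tableB VALUES (?, ?)", b_rows)
    joined = con.execute(
        "SELECT A.colA FROM tableA A "
        "LEFT OUTER JOIN tableB B ON B.id = A.id"
    ).fetchall()
    n_a = con.execute("SELECT COUNT(*) FROM tableA").fetchone()[0]
    con.close()
    return n_a, len(joined)


# Duplicates of B.id: the join produces more rows than tableA holds.
dup = row_counts([(1, "other1a"), (1, "other1b")], unique_id=False)
# Unique B.id: the join produces exactly as many rows as tableA holds.
uniq = row_counts([(1, "other1a")], unique_id=True)
print(dup, uniq)
```

With a unique B.id the two counts agree, which is the property that makes it
safe to drop the join when no column of tableB is selected.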
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
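The same kind of check can be made for the query with the subselect. This is
again only a sketch on Python's built-in sqlite3 module with made-up data; the
ORDER BY clause is an addition that makes the two result lists directly
comparable and is not part of the original query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY, colA TEXT);
CREATE TABLE tableB (id INTEGER, fromDate TEXT, PRIMARY KEY (id, fromDate));
INSERT INTO tableA VALUES (1, 'val1'), (2, 'val2');
-- Two versions of the row with id = 1, none for id = 2.
INSERT INTO tableB VALUES (1, '2009-01-01'), (1, '2009-06-01');
""")

# The query from the text: only the latest version of each B row can match.
with_join = con.execute("""
SELECT A.colA
FROM tableA A
LEFT OUTER JOIN tableB B
  ON B.id = A.id
 AND B.fromDate = (SELECT max(sub.fromDate)
                   FROM tableB sub WHERE sub.id = A.id)
ORDER BY A.id
""").fetchall()

# The same query with the outer join removed by hand.
without_join = con.execute(
    "SELECT A.colA FROM tableA A ORDER BY A.id"
).fetchall()

print(with_join == without_join)
```

Both queries return one row per row of tableA, even though tableB holds two
rows for id = 1.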
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate the inner side of an outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
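The quick check above could be sketched as follows, with Python integers
standing in for the table bitmaps. The function and parameter names are
hypothetical and are not taken from the server source; the sketch mirrors only
the three bitmaps named in the pseudocode.

```python
def may_have_candidates(select_list_tables, group_order_tables,
                        where_tables, n_tables):
    """Return True if at least one table is not referenced at all.

    Each *_tables argument is a bitmap with bit i set when table number i
    is used in that part of the query.
    """
    bitmap_of_all_tables = (1 << n_tables) - 1
    used = select_list_tables | group_order_tables | where_tables
    return used != bitmap_of_all_tables


# Two tables: A is bit 0, B is bit 1.
A, B = 1 << 0, 1 << 1

# "select A.colA from A left join B on ...": B is never referenced.
print(may_have_candidates(A, 0, 0, n_tables=2))
# Same query with "where B.x = 1": every table is referenced.
print(may_have_candidates(A, 0, B, n_tables=2))
```

A True result only means that elimination is worth attempting; the conditions
in section 1 still have to be checked for each outer join nest.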
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer_switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note ^^ utility of the above questioned ^, as table elimination can never
be worse than no elimination. We're leaning towards not adding the flag)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and will also behave nicely with anchor
modeling and VIEWs: things the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc.) but cannot use table elimination,
then compare run times and conclude whether the DBMS supports table
elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is fixed: an aggregate function now reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence is that "item->used_tables()==0" is no longer
equivalent to "item->const_item()==true" (it is not clear whether this is
new or had already been the case).
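The changed behaviour of aggregate functions can be illustrated with a toy
model. The classes below are hypothetical stand-ins written in Python, not the
server's Item classes; they only mimic the two methods named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    table_bit: int  # number of the table this column belongs to

    def used_tables(self):
        return 1 << self.table_bit

    def const_item(self):
        return False


@dataclass
class Aggregate:
    # An empty argument list models COUNT(*).
    args: list = field(default_factory=list)

    def used_tables(self):
        # Depends only on the tables that the arguments depend on.
        tables = 0
        for arg in self.args:
            tables |= arg.used_tables()
        return tables

    def const_item(self):
        # The value still depends on the rows of the group.
        return False


sum_like = Aggregate([Column(0), Column(2)])
count_star = Aggregate()
print(bin(sum_like.used_tables()), count_star.used_tables(),
      count_star.const_item())
```

COUNT(*) is the case to watch: its table bitmap is empty, yet it is not a
constant, so code that treats "uses no tables" as "is constant" needs
reviewing.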
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded. This made it impossible to use information about the join
plan when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
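The marking approach described above could be sketched like this. The Select
class, its eliminated flag, and the explain function are hypothetical names
used for illustration in Python; the real EXPLAIN code is organized
differently.

```python
from dataclasses import dataclass, field


@dataclass
class Select:
    name: str
    children: list = field(default_factory=list)
    eliminated: bool = False


def explain(select, out=None):
    """Collect the names of the selects that EXPLAIN would print."""
    if out is None:
        out = []
    if select.eliminated:
        # Returning here also skips every child of the eliminated select.
        return out
    out.append(select.name)
    for child in select.children:
        explain(child, out)
    return out


# sub1 sits in the ON clause of a removed outer join nest; sub2 is nested in it.
sub2 = Select("sub2")
sub1 = Select("sub1", children=[sub2], eliminated=True)
sub3 = Select("sub3")
top = Select("top", children=[sub1, sub3])
print(explain(top))
```

Only sub1 carries the flag, but sub2 disappears from the output as well
because the descent never reaches it.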
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from; an eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because the condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into the KEYUSE array.
Reason#2: even if they were put there, we would need to be able to
distinguish between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that a correlated subquery predicate cannot tell which
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate keep a List<Item> of
items that
- are in the current select
- and that the predicate depends on.
This list will be useful in certain other subquery optimizations as well, and
it is cheap to collect in the fix_fields() phase, so it will be collected
for every subquery predicate.
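The "all parts of a unique key are bound" check could be sketched as a small
fixed-point computation. This is an illustration in Python under assumed data
structures: each equality is given as a pair (column, set of columns of the
same table that the right-hand side depends on), and the function name is
hypothetical. It is not the eliminate_tables() code.

```python
def unique_key_is_bound(key_parts, equalities):
    """Return True if every part of the unique key ends up bound.

    equalities: list of (column, deps) pairs for predicates of the form
    "t.column = expr". deps is the set of columns of t that expr depends on;
    an empty set means expr uses only outer tables and constants.
    """
    bound = set()
    changed = True
    while changed:
        changed = False
        for column, deps in equalities:
            # A column becomes bound once everything it depends on is bound.
            if column not in bound and deps <= bound:
                bound.add(column)
                changed = True
    return all(part in bound for part in key_parts)


key = ["id", "fromDate"]

# B.id = A.id and B.fromDate = func(B.id): both key parts become bound.
ok = unique_key_is_bound(key, [("id", set()), ("fromDate", {"id"})])

# B.fromDate = func(B.*): the right side depends on every column of B,
# including fromDate itself, so fromDate never becomes bound.
not_ok = unique_key_is_bound(
    key, [("id", set()), ("fromDate", {"id", "fromDate", "other"})])
print(ok, not_ok)
```

The per-predicate list of depended-on items proposed above is what would
supply the deps sets without traversing the predicate each time.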
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Progress (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 20 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3
PROGRESS NOTES:
-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
Little dent gone
Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't access at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't access at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
------------------------------------------------------------
-=-=(View All Progress Notes, 23 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met, the selected column (A.colA) will
still contain its original value, and the unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that the rows in tableB make it possible to place a
unique constraint on the column B.id, most commonly a primary key.
In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
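The argument above can be checked on a toy data set. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in (it is not MariaDB and says nothing about what either optimizer does internally); the table contents and the extra table name tableDup are made up for illustration. It only compares result sets: with a primary key on tableB.id the join adds no rows, without a unique constraint it can.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, colA TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, colB TEXT);
    -- hypothetical table without a unique constraint on id
    CREATE TABLE tableDup (id INTEGER, colB TEXT);
    INSERT INTO tableA VALUES (1, 'val1'), (2, 'val2'), (3, 'val3');
    INSERT INTO tableB VALUES (1, 'other1'), (3, 'other3');
    INSERT INTO tableDup VALUES (1, 'other1a'), (1, 'other1b');
""")

# Query with the outer join on a unique column, no B columns selected.
with_join = conn.execute("""
    SELECT A.colA FROM tableA A
    LEFT OUTER JOIN tableB B ON B.id = A.id
    ORDER BY A.id
""").fetchall()

# The same query with the join removed by hand.
without_join = conn.execute(
    "SELECT A.colA FROM tableA A ORDER BY A.id"
).fetchall()

# Joining a table with duplicate ids changes the row count,
# so that join could not be dropped.
dup_join = conn.execute("""
    SELECT A.colA FROM tableA A
    LEFT OUTER JOIN tableDup B ON B.id = A.id
    ORDER BY A.id
""").fetchall()

print(with_join == without_join, len(without_join), len(dup_join))
```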
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have
a denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
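The historized-data case can be checked the same way. This is only a sketch under assumptions: SQLite in memory stands in for the real server, and the rows and the colB column are invented. It shows that with primary key (id, fromDate) and the max(fromDate) subselect in the ON clause, the joined query returns exactly the rows of tableA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, colA TEXT);
    CREATE TABLE tableB (
        id INTEGER,
        fromDate TEXT,
        colB TEXT,
        PRIMARY KEY (id, fromDate)
    );
    INSERT INTO tableA VALUES (1, 'val1'), (2, 'val2');
    -- two versions for id 1, none for id 2
    INSERT INTO tableB VALUES (1, '2009-01-01', 'old'),
                              (1, '2009-06-01', 'new');
""")

# Outer join that picks only the latest version of each tableB row.
joined = conn.execute("""
    SELECT A.colA
    FROM tableA A
    LEFT OUTER JOIN tableB B
      ON B.id = A.id
     AND B.fromDate = (SELECT max(sub.fromDate)
                       FROM tableB sub
                       WHERE sub.id = A.id)
    ORDER BY A.id
""").fetchall()

# The same query with the join removed by hand.
plain = conn.execute(
    "SELECT A.colA FROM tableA A ORDER BY A.id"
).fetchall()

print(joined == plain)
```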
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= because it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
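The two conditions can be sketched as a small predicate. This is an illustrative model only: InnerTable, its fields and can_eliminate() are hypothetical names, not server structures, and the set of "tables referenced elsewhere" is assumed to have been collected already from the select list, WHERE, embedding ON clauses, ORDER BY, GROUP BY and HAVING.

```python
from dataclasses import dataclass


@dataclass
class InnerTable:
    """Hypothetical stand-in for one table inside an outer join nest."""
    name: str
    is_const: bool = False    # eq_ref(const) access, or a 0/1-row table
    has_eq_ref: bool = False  # an eq_ref access method candidate exists


def can_eliminate(nest, referenced_elsewhere):
    """Return True if the inner side `nest` satisfies conditions #1 and #2."""
    # Condition 1: every inner table yields exactly one record per outer row.
    one_record = all(t.is_const or t.has_eq_ref for t in nest)
    # Condition 2: no inner table is referenced outside the nest's own ON clause.
    unreferenced = all(t.name not in referenced_elsewhere for t in nest)
    return one_record and unreferenced


nest = [InnerTable("B", has_eq_ref=True)]
print(can_eliminate(nest, {"A"}))       # B is unused elsewhere
print(can_eliminate(nest, {"A", "B"}))  # B is referenced, e.g. in WHERE
```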
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
     tables used in group/order by |
     tables used in where) != bitmap_of_all_tables)
{
  attempt table elimination;
}
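The quick check amounts to a few bitmap operations. A minimal sketch, assuming each table is assigned one bit (the bit assignments and the function name may_have_candidates are made up here):

```python
def may_have_candidates(select_list_map, group_order_map, where_map,
                        all_tables_map):
    """Return True if some table is not referenced by the select list,
    GROUP BY/ORDER BY or WHERE, i.e. elimination is worth attempting."""
    used = select_list_map | group_order_map | where_map
    return used != all_tables_map


TABLE_A = 1 << 0  # hypothetical bit for tableA
TABLE_B = 1 << 1  # hypothetical bit for tableB
ALL_TABLES = TABLE_A | TABLE_B

# SELECT A.colA FROM tableA A LEFT JOIN tableB B ON B.id = A.id
# Only A is used outside the ON clause, so there may be a candidate.
print(may_have_candidates(TABLE_A, 0, 0, ALL_TABLES))

# Adding WHERE B.col IS NULL makes every table used, so nothing to remove.
print(may_have_candidates(TABLE_A, 0, TABLE_B, ALL_TABLES))
```

The check is only a necessary condition: a True result means elimination should be attempted, not that it will succeed.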
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
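The first step (detach the nest and fix up ancestor table maps) can be modelled as follows. This is a toy model, not the server's TABLE_LIST/nested_join code: JoinNest and eliminate_nest() are hypothetical, and moving JOIN_TABs to the front and updating join->table_count are left out.

```python
class JoinNest:
    """Toy join nest: knows its embedding nest and the tables below it."""

    def __init__(self, name, tables=0, embedding=None):
        self.name = name
        self.table_map = tables
        self.embedding = embedding
        self.children = []
        if embedding is not None:
            embedding.children.append(self)
        # Propagate this nest's tables into every ancestor's map.
        node = embedding
        while node is not None:
            node.table_map |= tables
            node = node.embedding


def eliminate_nest(oj):
    """Detach `oj` from its embedding nest and clear its tables
    from the table maps of all ancestors. Returns the removed map."""
    removed = oj.table_map
    oj.embedding.children.remove(oj)
    node = oj.embedding
    while node is not None:
        node.table_map &= ~removed
        node = node.embedding
    oj.embedding = None
    return removed


top = JoinNest("top", tables=0b001)                  # table A
oj = JoinNest("oj", tables=0b010, embedding=top)     # table B
inner = JoinNest("inner", tables=0b100, embedding=oj)  # table C, nested in oj

removed = eliminate_nest(oj)
print(bin(removed), bin(top.table_map), top.children)
```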
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of the above flag is questionable, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and also behaves nicely with the anchor
model and VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
then compare run times and conclude whether the DBMS supports table
elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This would require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. Fixed it, now aggregate function reports it depends on
tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence of that is that "item->used_tables()==0" is not
equivalent to "item->const_item()==true" anymore (not sure if it's
"anymore" or this has been already happening).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded, so information about the join plan could not be used
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
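The aggregate-function item above can be illustrated with a toy model of used_tables(). The classes below are hypothetical and only mimic the described behaviour: the old result was always the all-tables bitmap, the fixed result is the union of the arguments' maps, and COUNT(*) reports no tables while still not being a constant.

```python
def all_tables_map(n_tables):
    """Old behaviour: (1ULL << join->tables) - 1, i.e. all tables."""
    return (1 << n_tables) - 1


class Field:
    """Column reference; depends on exactly one table."""

    def __init__(self, table_no):
        self.table_no = table_no

    def used_tables(self):
        return 1 << self.table_no

    def const_item(self):
        return False


class Sum:
    """Aggregate with arguments; depends on what its arguments depend on."""

    def __init__(self, *args):
        self.args = args

    def used_tables(self):
        table_map = 0
        for arg in self.args:
            table_map |= arg.used_tables()
        return table_map

    def const_item(self):
        return False


class CountStar:
    """COUNT(*): depends on no tables, yet its value is not a constant."""

    def used_tables(self):
        return 0

    def const_item(self):
        return False


print(all_tables_map(3), Sum(Field(1)).used_tables(), CountStar().used_tables())
```

The last class is the case where used_tables()==0 no longer implies const_item()==true.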
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
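A sketch of that approach, under the assumption that selects form a simple tree (Select, mark_eliminated() and explain() are illustrative names, not the server's): marking one subselect is enough, because the recursive descent never reaches its children.

```python
class Select:
    """Toy SELECT node with an id and nested subselects."""

    def __init__(self, select_id, children=()):
        self.select_id = select_id
        self.children = list(children)
        self.eliminated = False


def mark_eliminated(on_clause_subselects):
    """Called when an outer join nest is removed: flag the subselects
    found while walking its ON clause."""
    for sub in on_clause_subselects:
        sub.eliminated = True


def explain(select, out=None):
    """Recursive descent; an eliminated select and its children are skipped."""
    if out is None:
        out = []
    if select.eliminated:
        return out
    out.append(select.select_id)
    for child in select.children:
        explain(child, out)
    return out


nested = Select(3)
on_clause_sub = Select(2, [nested])  # subquery inside the removed ON clause
other = Select(4)
top = Select(1, [on_clause_sub, other])

mark_eliminated([on_clause_sub])
printed = explain(top)
print(printed)
```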
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from; an eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that correlated subquery predicate cannot tell what
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
items that
- are in the current select
- and it depends on.
This list will be useful in certain other subquery optimizations as well,
it is cheap to collect it in fix_fields() phase, so it will be collected
for every subquery predicate.
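The "all parts of a unique key are bound" check can be sketched as a small fixpoint computation. This is an assumption-laden model: each equality is given as (column, set of columns its right-hand side depends on), which is exactly the per-predicate dependency list proposed above; the function names are hypothetical.

```python
def bound_columns(initially_bound, equalities):
    """Compute the set of bound columns.

    `equalities` is a list of (column, depends_on) pairs meaning
    column = func(depends_on). A column becomes bound once everything
    its right-hand side depends on is bound."""
    bound = set(initially_bound)
    changed = True
    while changed:
        changed = False
        for column, depends_on in equalities:
            if column not in bound and set(depends_on) <= bound:
                bound.add(column)
                changed = True
    return bound


def unique_key_is_bound(key_parts, initially_bound, equalities):
    """True if every part of the unique key ends up bound."""
    return set(key_parts) <= bound_columns(initially_bound, equalities)


key = ["B.id", "B.fromDate"]
outer = {"A.id"}  # columns of outer tables are bound for each outer row

# B.id = A.id AND B.fromDate = (subquery depending only on A.id)
ok = unique_key_is_bound(
    key, outer, [("B.id", {"A.id"}), ("B.fromDate", {"A.id"})])

# B.fromDate = func(B.*): depends on an unbound column of B
not_ok = unique_key_is_bound(
    key, outer, [("B.id", {"A.id"}), ("B.fromDate", {"B.other"})])

print(ok, not_ok)
```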
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
Hi all,
Just a heads up, please don't merge in this change from MySQL 5.4.4
See http://openquery.com/blog/type-disappears-mysql-544 for
background, also http://bugs.mysql.com/bug.php?id=17501
Sergei Golubchik is still assigned to that bug, btw ;-)
Cheers,
Arjen.
--
Arjen Lentz, Director @ Open Query (http://openquery.com)
Exceptional Services for MySQL at a fixed budget.
Follow our blog at http://openquery.com/blog/
OurDelta: free enhanced builds for MySQL @ http://ourdelta.org
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 17 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
------------------------------------------------------------
-=-=(View All Progress Notes, 22 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met, the selected column (A.colA) will
still contain its original value, and the unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that the rows in tableB make it possible to place a
unique constraint on the column B.id, most commonly a primary key.
In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have
a denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= because it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
     tables used in group/order by |
     tables used in where) != bitmap_of_all_tables)
{
  attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of the above flag is questionable, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and also behaves nicely with the anchor
model and VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
then compare run times and conclude whether the DBMS supports table
elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This would require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For multi-table UPDATEs/DELETEs, we also need to analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is fixed: an aggregate function now reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence is that "item->used_tables()==0" is no longer
equivalent to "item->const_item()==true" (not sure whether this is new or
has already been the case).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded, so information about the join plan could not be used
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let the EXPLAIN code check whether a
SELECT was eliminated before printing it (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed).
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between a constant table and an eliminated one? If there is,
should we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that a correlated subquery predicate cannot tell what
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
the items that
- are in the current select
- and that it depends on.
This list will be useful in certain other subquery optimizations as well, and
it is cheap to collect in the fix_fields() phase, so it will be collected
for every subquery predicate.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 17 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
------------------------------------------------------------
-=-=(View All Progress Notes, 22 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, table elimination removes tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us, for example,
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met, the selected column (A.colA) will
still contain its original value. The unseen B.* row would contain all NULLs.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example (as is often the
case) a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If, furthermore, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
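The argument above can be checked on toy data. The following is only a sketch under stated assumptions, not server code: RowA, RowB, left_join_project_colA() and project_colA() are hypothetical names made up for this illustration, and the join is a naive nested loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory rows for tableA(id, colA) and tableB(id, other).
struct RowA { int id; std::string colA; };
struct RowB { int id; std::string other; };

// Naive nested-loop evaluation of
//   SELECT A.colA FROM tableA A LEFT OUTER JOIN tableB B ON B.id = A.id
std::vector<std::string> left_join_project_colA(const std::vector<RowA>& a,
                                                const std::vector<RowB>& b)
{
  std::vector<std::string> out;
  for (const RowA& ra : a) {
    std::size_t matches = 0;
    for (const RowB& rb : b) {
      if (rb.id == ra.id) {
        out.push_back(ra.colA);   // one output row per matching B row
        ++matches;
      }
    }
    if (matches == 0)
      out.push_back(ra.colA);     // NULL-complemented row: A.colA is kept
  }
  return out;
}

// The same query after tableB and the join have been removed.
std::vector<std::string> project_colA(const std::vector<RowA>& a)
{
  std::vector<std::string> out;
  for (const RowA& ra : a)
    out.push_back(ra.colA);
  return out;
}
```

With a tableB whose id values are unique, both functions return the same rows in the same order. With A = [1, "val1"] and B = [1, "other1a"], [1, "other1b"], the join returns "val1" twice, which is why elimination is only valid when B.id is unique.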
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have
a denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of an outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
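The quick check can be sketched in C++ as follows. This is a minimal illustration, assuming a 64-bit bitmap with one bit per table; the table_map alias here and the helper names all_tables_bitmap() and may_have_elimination_candidates() are hypothetical and not taken from the server source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a table bitmap: bit i is set when table
// number i is referenced.
using table_map = std::uint64_t;

// Bitmap with the lowest n_tables bits set.
inline table_map all_tables_bitmap(unsigned n_tables)
{
  assert(n_tables < 64);  // shifting by 64 or more would be undefined
  return (table_map{1} << n_tables) - 1;
}

// Returns true when at least one table of the join is referenced by none of
// the listed clauses, i.e. when there *can be* something to eliminate.
inline bool may_have_elimination_candidates(table_map select_list_tables,
                                            table_map group_order_tables,
                                            table_map where_tables,
                                            unsigned n_tables)
{
  const table_map used =
      select_list_tables | group_order_tables | where_tables;
  return used != all_tables_bitmap(n_tables);
}
```

For a two-table join where only table 0 is referenced, may_have_elimination_candidates(0x1, 0x0, 0x1, 2) is true, so the more expensive enumeration of join nests is worth attempting. A false result means every table is referenced somewhere and elimination can be skipped.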
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, there is no need for any cost calculations etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer_switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of this flag has been questioned, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and also behaves nicely with anchor models
and VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have
the same QEP, execution cost, etc.) but cannot use table elimination,
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to want to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1], as that can change during
the lifetime of the statement.
The other option is to do table elimination on every execution. This will
require reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For multi-table UPDATEs/DELETEs, we also need to analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is fixed: an aggregate function now reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence is that "item->used_tables()==0" is no longer
equivalent to "item->const_item()==true" (not sure whether this is new or
has already been the case).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded, so information about the join plan could not be used
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
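The aggregate function item above can be illustrated with a toy model. This is a sketch under assumptions, not the server's Item hierarchy: the Item struct, its fields and the helpers column_of_table(), count_star() and sum_of() are hypothetical, and the rule that an aggregate is never constant is an assumption of the model.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using table_map = std::uint64_t;  // hypothetical: one bit per table

// Toy expression item: a column reference, or an aggregate over arguments.
struct Item {
  table_map own_tables;    // tables referenced directly by this item
  bool is_aggregate;       // true for SUM(...), COUNT(*), ...
  std::vector<Item> args;  // argument items

  // An item depends on its own tables plus the tables of its arguments.
  table_map used_tables() const {
    table_map map = own_tables;
    for (const Item& arg : args)
      map |= arg.used_tables();
    return map;
  }

  // Model assumption: an aggregate is never constant, because its value
  // depends on the rows of the group, even if it names no table.
  bool const_item() const {
    if (is_aggregate)
      return false;
    for (const Item& arg : args)
      if (!arg.const_item())
        return false;
    return own_tables == 0;
  }
};

inline Item column_of_table(unsigned table_no) {
  return Item{table_map{1} << table_no, false, {}};
}

inline Item count_star() { return Item{0, true, {}}; }

inline Item sum_of(Item arg) {
  Item sum{0, true, {}};
  sum.args.push_back(std::move(arg));
  return sum;
}
```

In this model sum_of(column_of_table(1)).used_tables() is 0x2, so a SUM over a column of table 1 no longer pins every other table of the join. count_star().used_tables() is 0 while count_star().const_item() is false, which is the non-equivalence noted above.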
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let the EXPLAIN code check whether a
SELECT was eliminated before printing it (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed).
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between a constant table and an eliminated one? If there is,
should we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that a correlated subquery predicate cannot tell what
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
the items that
- are in the current select
- and that it depends on.
This list will be useful in certain other subquery optimizations as well, and
it is cheap to collect in the fix_fields() phase, so it will be collected
for every subquery predicate.
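The "all parts of a unique key are bound" check can be sketched as a fixed-point computation. This is only an illustration under assumptions: the Equality struct and unique_key_is_bound() are hypothetical names, columns are identified by plain strings, and each equality is assumed to already carry the list of inner-table columns its right side reads (the per-predicate list proposed above). References to outer tables are treated as always bound and need no entry.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of an ON-clause equality "t.column = func(...)".
struct Equality {
  std::string column;                // inner-table column on the left side
  std::set<std::string> depends_on;  // inner-table columns read by func()
};

// Returns true when every part of the unique key becomes bound, applying
// "if keypartX is bound, t.keypartY=func(keypartX) binds keypartY"
// repeatedly until nothing changes.
bool unique_key_is_bound(const std::vector<std::string>& key_parts,
                         const std::vector<Equality>& equalities)
{
  std::set<std::string> bound;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Equality& eq : equalities) {
      if (bound.count(eq.column))
        continue;                    // already known to be bound
      bool deps_bound = true;
      for (const std::string& dep : eq.depends_on)
        if (!bound.count(dep)) { deps_bound = false; break; }
      if (deps_bound) {
        bound.insert(eq.column);
        changed = true;
      }
    }
  }
  for (const std::string& part : key_parts)
    if (!bound.count(part))
      return false;
  return true;
}
```

For the unique key (id, fromDate), the equalities id = A.id (no inner dependencies) and fromDate = func(id) bind both key parts, so at most one row can match. If fromDate instead depends on an inner column that is never bound, the function returns false and the outer join must be kept.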
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 17 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: 9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
------------------------------------------------------------
-=-=(View All Progress Notes, 21 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Table elimination removes tables from the execution plan when including
them is unnecessary. This is, of course, only possible under the right
circumstances. Consider, for example, the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
Using A as the left table ensures that the query returns at least as
many rows as there are in that table. For rows where the join condition
(B.id = A.id) is not met, the selected column (A.colA) still contains
its original value, and the unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Now suppose instead that a unique constraint, most often a primary key,
is placed on the column B.id. In this situation we know that we will get
exactly as many rows as there are in tableA, since joining with tableB
cannot introduce any duplicates. If, furthermore, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
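The row-count argument above can be checked with a small simulation. This is only an illustrative sketch: left_join_row_count() is a hypothetical helper, and tables are reduced to lists of id values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of rows produced by
//   tableA A LEFT OUTER JOIN tableB B ON B.id = A.id
// where each table is represented only by its id column.
inline std::size_t left_join_row_count(const std::vector<int> &a_ids,
                                       const std::vector<int> &b_ids) {
  std::size_t rows = 0;
  for (int a : a_ids) {
    std::size_t matches = 0;
    for (int b : b_ids)
      if (b == a) ++matches;
    // An unmatched row of A still yields one NULL-complemented row.
    rows += (matches == 0) ? 1 : matches;
  }
  return rows;
}
```

With duplicate ids in B the count can exceed the number of rows in A; with unique ids in B it always equals the number of rows in A, which is why the join can be dropped when no B column is selected.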
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate the inner side of an outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= because it is a zero-row or one-row MyISAM-like table [MARK1], or
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
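The two conditions can be combined into one predicate over table bitmaps. This is a sketch only: OuterJoinNestSketch and can_eliminate() are hypothetical names, and condition #1 is assumed to have been computed elsewhere and passed in as a flag.

```cpp
#include <cassert>
#include <cstdint>

// One bit per table in the join.
using table_map = std::uint64_t;

// Hypothetical summary of an outer join nest.
struct OuterJoinNestSketch {
  table_map inner_tables;        // tables on the inner side of the outer join
  bool every_table_single_row;   // condition #1: each inner table is constant
                                 // or has an eq_ref access method candidate
};

// tables_used_elsewhere: tables referenced by the select list, WHERE,
// ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING.
inline bool can_eliminate(const OuterJoinNestSketch &nest,
                          table_map tables_used_elsewhere) {
  // Condition #2: nothing outside the nest refers to its inner tables.
  bool no_outside_refs = (nest.inner_tables & tables_used_elsewhere) == 0;
  return nest.every_table_single_row && no_outside_refs;
}
```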
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
     tables used in group/order by |
     tables used in where) != bitmap_of_all_tables)
{
  attempt table elimination;
}
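The quick check could look roughly like this in C++. It is a sketch under assumptions: QueryTableUsage and may_have_elimination_candidates() are hypothetical names, and the three bitmaps are assumed to be already collected.

```cpp
#include <cassert>
#include <cstdint>

// One bit per table in the join.
using table_map = std::uint64_t;

// Hypothetical summary of which tables each part of the query refers to.
struct QueryTableUsage {
  table_map select_list;
  table_map group_order_by;
  table_map where;
};

// Bitmap with the lowest table_count bits set.
inline table_map all_tables_bitmap(unsigned table_count) {
  return table_count >= 64 ? ~table_map{0}
                           : (table_map{1} << table_count) - 1;
}

// True if at least one table is not referenced by the select list,
// GROUP BY/ORDER BY or WHERE, i.e. elimination is worth attempting.
inline bool may_have_elimination_candidates(const QueryTableUsage &usage,
                                            unsigned table_count) {
  table_map used = usage.select_list | usage.group_order_by | usage.where;
  return used != all_tables_bitmap(table_count);
}
```

The check is deliberately cheap: it only tells us that some table is unreferenced, not that the table satisfies the conditions for removal.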
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of this flag has been questioned, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and will also behave nicely with the anchor
model and VIEWs: stuff that the user doesn't care about just won't be there.
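A minimal sketch of how EXPLAIN could skip eliminated parts of the query, assuming the approach noted under "Additional issues": subselects inside a removed ON clause are marked as eliminated, and the recursive descent checks the mark before printing. SelectSketch and explain_select() are hypothetical names for illustration, not server code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a SELECT in the query tree.
struct SelectSketch {
  std::string name;                    // label printed by EXPLAIN
  bool eliminated;                     // set when its ON clause was removed
  std::vector<SelectSketch> children;  // subselects nested inside this one
};

// Recursive descent: an eliminated select is not printed, and because we
// return before recursing, none of its children are printed either.
inline void explain_select(const SelectSketch &select,
                           std::vector<std::string> &output) {
  if (select.eliminated) return;
  output.push_back(select.name);
  for (const SelectSketch &child : select.children)
    explain_select(child, output);
}
```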
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc) but cannot use table elimination.
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is now fixed: an aggregate function reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence of that is that "item->used_tables()==0" is not
equivalent to "item->const_item()==true" anymore (not sure if it's
"anymore" or this has been already happening).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded. This made it impossible to use information about the join
plan when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
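The changed behaviour of aggregate functions described in the list above can be illustrated as follows. AggregateSketch and old_used_tables() are hypothetical stand-ins, not the server's item classes; only the bitmap arithmetic is taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per table in the join.
using table_map = std::uint64_t;

// Old behaviour: an aggregate claimed to depend on every table in the join.
inline table_map old_used_tables(unsigned join_tables) {
  return (table_map{1} << join_tables) - 1;
}

// Hypothetical stand-in for an aggregate function item.
struct AggregateSketch {
  std::vector<table_map> arg_used_tables;  // used_tables() of each argument

  // New behaviour: the union of what the arguments depend on.
  table_map used_tables() const {
    table_map result = 0;
    for (table_map arg : arg_used_tables) result |= arg;
    return result;
  }
};
```

An aggregate with no arguments, such as COUNT(*), ends up with used_tables() == 0 even though it is not a constant, which is the used_tables()/const_item() mismatch mentioned above.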
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is to check whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that a correlated subquery predicate cannot tell what
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
items that
- are in the current select
- and that it depends on.
This list will be useful in certain other subquery optimizations as well.
It is cheap to collect in the fix_fields() phase, so it will be collected
for every subquery predicate.
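The "all parts of a unique key are bound" check can be sketched as a fixpoint computation. This is an illustration under assumptions: EqualitySketch and unique_key_is_bound() are hypothetical names, and each equality is assumed to already know which columns of the inner table its right-hand side depends on (the per-predicate list discussed above).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of an equality "column = func(...)".
struct EqualitySketch {
  std::string column;                   // inner-table column being bound
  std::vector<std::string> depends_on;  // inner-table columns used on the
                                        // right side; empty if it refers only
                                        // to outer tables or constants
};

// Returns true if every part of the unique key becomes bound.
inline bool unique_key_is_bound(const std::vector<std::string> &key_parts,
                                const std::vector<EqualitySketch> &equalities) {
  std::set<std::string> bound;
  bool changed = true;
  while (changed) {  // repeat until no new column becomes bound
    changed = false;
    for (const EqualitySketch &eq : equalities) {
      if (bound.count(eq.column)) continue;
      bool deps_bound = true;
      for (const std::string &dep : eq.depends_on)
        if (!bound.count(dep)) { deps_bound = false; break; }
      if (deps_bound) {
        bound.insert(eq.column);
        changed = true;
      }
    }
  }
  for (const std::string &part : key_parts)
    if (!bound.count(part)) return false;
  return true;
}
```

For the unique index (id, fromDate), "B.id = A.id" binds id, after which "B.fromDate = func(B.id)" binds fromDate; an equality that depends on an unbound column of B leaves the key unbound.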
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 17 Jul '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: 9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x
-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
------------------------------------------------------------
-=-=(View All Progress Notes, 21 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Table elimination removes tables from the execution plan when including
them is unnecessary. This is, of course, only possible under the right
circumstances. Consider, for example, the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
Using A as the left table ensures that the query returns at least as
many rows as there are in that table. For rows where the join condition
(B.id = A.id) is not met, the selected column (A.colA) still contains
its original value, and the unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Now suppose instead that a unique constraint, most often a primary key,
is placed on the column B.id. In this situation we know that we will get
exactly as many rows as there are in tableA, since joining with tableB
cannot introduce any duplicates. If, furthermore, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= because it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
     tables used in group/order by |
     tables used in where) != bitmap_of_all_tables)
{
  attempt table elimination;
}
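The quick check above can be sketched with plain bitmaps. This is only an
illustration under stated assumptions, not the server code: table_map is
modeled as a 64-bit integer with one bit per table, and the names
all_tables_map() and may_have_elimination_candidates() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the server's table_map: one bit per join table.
using table_map = std::uint64_t;

// Bitmap with the lowest n_tables bits set.
inline table_map all_tables_map(unsigned n_tables)
{
  assert(n_tables > 0 && n_tables < 64);  // keep the shift well defined
  return (table_map{1} << n_tables) - 1;
}

// True if at least one table is referenced by none of the listed clauses,
// i.e. there may be something to eliminate.
inline bool may_have_elimination_candidates(table_map select_list_tables,
                                            table_map group_order_tables,
                                            table_map where_tables,
                                            unsigned n_tables)
{
  const table_map used = select_list_tables | group_order_tables | where_tables;
  return used != all_tables_map(n_tables);
}
```

If the check returns false, every table is referenced somewhere and the join
nests need not be enumerated at all.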
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front, as is done with const
tables, with the exception that if the eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of the above flag is questioned, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and it also behaves nicely with anchor model
and VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc.) but cannot use table elimination,
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. Fixed it, now aggregate function reports it depends on
tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence of that is that "item->used_tables()==0" is not
equivalent to "item->const_item()==true" anymore (not sure if it's
"anymore" or this has been already happening).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded. This made it impossible to use information about the join plan
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
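The aggregate-function item in the list above can be illustrated with a small
model. This is a sketch under assumptions, not the server's Item_sum code:
AggregateSketch and old_used_tables() are hypothetical names, table_map is
modeled as a 64-bit integer, and the model simply treats an aggregate as never
constant.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the server's table_map: one bit per join table.
using table_map = std::uint64_t;

// Much simplified model of an aggregate function item.
struct AggregateSketch
{
  std::vector<table_map> arg_used_tables;  // used_tables() of each argument

  // Behaviour after the fix: the union of the tables the arguments use.
  // With no arguments (as for COUNT(*)) the result is 0.
  table_map used_tables() const
  {
    table_map map = 0;
    for (table_map arg : arg_used_tables)
      map |= arg;
    return map;
  }

  // Assumption of this model: the value depends on the rows of the group,
  // so the item is not a constant even when it uses no table columns.
  bool const_item() const { return false; }
};

// Behaviour before the fix: every aggregate claimed all join tables.
inline table_map old_used_tables(unsigned n_tables)
{
  assert(n_tables > 0 && n_tables < 64);  // keep the shift well defined
  return (table_map{1} << n_tables) - 1;
}
```

A default-constructed AggregateSketch plays the role of COUNT(*): it reports
used_tables()==0 while const_item() is false, which is the mismatch noted
above.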
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is to check whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that a correlated subquery predicate cannot tell which
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
the items that
- are in the current select
- and that the predicate depends on.
This list will be useful in certain other subquery optimizations as well, and
it is cheap to collect it in the fix_fields() phase, so it will be collected
for every subquery predicate.
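The "all parts of a unique key are bound" check described above can be
sketched as a fixed-point propagation. This is a minimal illustration under
assumptions and is not what eliminate_tables() does: the Equality struct and
unique_key_is_bound() are hypothetical, and columns are identified by plain
strings rather than Item objects.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one equality "column = func(depends_on...)".
// depends_on lists only columns of the inner table itself; an empty list
// means the right-hand side uses nothing but outer tables or constants.
struct Equality
{
  std::string column;
  std::vector<std::string> depends_on;
};

// Returns true if every part of unique_key ends up bound.
inline bool unique_key_is_bound(const std::vector<std::string> &unique_key,
                                const std::vector<Equality> &equalities)
{
  assert(!unique_key.empty());
  std::set<std::string> bound;
  bool changed = true;
  while (changed)  // repeat until no new column becomes bound
  {
    changed = false;
    for (const Equality &eq : equalities)
    {
      if (bound.count(eq.column))
        continue;
      bool deps_bound = true;
      for (const std::string &dep : eq.depends_on)
      {
        if (!bound.count(dep))
        {
          deps_bound = false;
          break;
        }
      }
      if (deps_bound)
      {
        bound.insert(eq.column);
        changed = true;
      }
    }
  }
  for (const std::string &part : unique_key)
  {
    if (!bound.count(part))
      return false;
  }
  return true;
}
```

For the unique key (id, fromDate), the equalities id = <outer expression> and
fromDate = func(id) bind both parts, whereas fromDate = func(<some unbound
column>) leaves fromDate unbound and the nest cannot be eliminated.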
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
Re: [Maria-developers] [Merge] lp:~maria-captains/maria/maria-xtradb into lp:maria
by Kristian Nielsen 10 Jul '09
Percona <launchpad(a)percona.com> writes:
> Percona has proposed merging lp:~maria-captains/maria/maria-xtradb into lp:maria.
>
> Requested reviews:
> Maria-captains (maria-captains)
>
> Proposal to merge replacement InnoDB->XtraDB
Thanks a lot for your efforts in this!
I branched the tree and took a look. There are a couple of issues that I think
need to be resolved before we can merge it into MariaDB. I have some questions
below, but please don't hesitate to ask me for any kind of help needed to move
this forward.
> === modified file 'storage/innobase/include/sync0rw.h'
> --- storage/innobase/include/sync0rw.h 2008-02-19 16:44:09 +0000
> +++ storage/innobase/include/sync0rw.h 2009-03-31 04:19:17 +0000
> +#ifndef INNODB_RW_LOCKS_USE_ATOMICS
> +#error INNODB_RW_LOCKS_USE_ATOMICS is not defined. Do you use enough new GCC or compatibles?
> +#error Or do you use exact options for CFLAGS?
> +#error e.g. (for x86_32): "-m32 -march=i586 -mtune=i686"
> +#error e.g. (for Sparc_64): "-m64 -mcpu=v9"
> +#error Otherwise, this build may be slower than normal version.
> +#endif
> +
My attempt to build (BUILD/compile-pentium64-max) failed with this
error. There were also other build errors (I assume places where the
atomics-using code has not been extended with a part that works without the
availability of atomics).
The reason is that in the MariaDB tree, HAVE_GCC_ATOMIC_BUILTINS is disabled,
which caused XtraDB to disable INNODB_RW_LOCKS_USE_ATOMICS, which triggers the
above error.
I think I understand why this makes sense for Percona, after all using these
better synchronisation primitives is part of the reason for using the Percona
server in the first place.
Can you tell me if Percona has decided not to maintain XtraDB working without
the availability of atomic operations? Or if it is just an oversight?
I need to discuss with other MariaDB people whether XtraDB for MariaDB should
be maintained working without the atomic operations (if so, we should of
course be willing to do the work/effort required).
So, any thoughts about the best way to deal with this? Should the above #error
be removed and XtraDB extended to work without atomics in the MariaDB tree?
And is this something Percona wants to do, or should I look into it?
Also, Sergei Golubchik told me that HAVE_GCC_ATOMIC_BUILTINS is for
my_atomic_ops, and InnoDB/XtraDB shouldn't really be using it. But I need to
look more into the code to understand what the problem is, if any.
> === added directory 'storage/innobase/mysql-test'
> === added directory 'storage/innobase/mysql-test/patches'
When I ran the test suite, I got test failures in test main.innodb.
I see that the patch contains patches for main MySQL test cases in
mysql-test/t/*.test, and also seems to add separate test cases in the
storage/innobase/mysql-test/ directory.
Do you know what the status is of these test suite modifications? Do the
patches need to be applied to the existing test suite, and/or should the extra
test cases be used to add to/overwrite the existing tests?
We would need to get the test suite to run without problems before
merging. Does Percona run the test suite with no failures? Can you suggest
which directions I should work in to solve the test failures? I.e. I'm unsure
to what extent the extra test cases/patches have already been applied in main
MySQL sources, and whether failures are expected or are just due to not being
adapted for current MariaDB source changes.
Any help with the above would be great. I plan to continue working with you on
this so we can get it merged without unnecessary delays.
- Kristian.
So, I've been making a fair bit of changes around bits of the storage
engine API (by all accounts for the better) in Drizzle.
The idea is to move the handler to be a cursor on a table, with
actions not pertaining to that residing in StorageEngine (e.g. DDL).
There's also the (now rather old) change to drop table return code.
The next thing that will move into the StorageEngine is metadata
handling with the engine being able to be responsible for its own
(table) metadata.
This is well and truly increasing the differences between MySQL/MariaDB
and Drizzle in this area of code - increasing the work needed to port an
engine (either way).
I would guess it makes little sense for MySQL and MariaDB to diverge
here, although I have been (and continue to be) okay with Drizzle
diverging.
So, is there somebody interested in working with me to have the
MySQL/MariaDB API evolve in the same way?
--
Stewart Smith
[Maria-developers] Rev 2819: BUG#31480: Incorrect result for nested subquery when executed via semi join in file:///home/psergey/dev/mysql-next-fix-subq/
by Sergey Petrunya 08 Jul '09
At file:///home/psergey/dev/mysql-next-fix-subq/
------------------------------------------------------------
revno: 2819
revision-id: psergey(a)askmonty.org-20090708174703-dz9uf5b0m6pcvtl6
parent: psergey(a)askmonty.org-20090708095341-9i08n2r8igulpxzz
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next-fix-subq
timestamp: Wed 2009-07-08 21:47:03 +0400
message:
BUG#31480: Incorrect result for nested subquery when executed via semi join
Make the fix work with prepared statements:
- in previous cset changed calloc to alloc, forgot to add bzero.
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-07-08 09:53:41 +0000
+++ b/sql/item_subselect.cc 2009-07-08 17:47:03 +0000
@@ -180,10 +180,11 @@
if (!ancestor_used_tables)
{
set_depth();
- if (!(ancestor_used_tables=
- (table_map*)alloc_root(thd->stmt_arena->mem_root,
- (1+depth)*sizeof(table_map))))
+ size_t size= (1+depth) * sizeof(table_map);
+ if (!(ancestor_used_tables= (table_map*)
+ alloc_root(thd->stmt_arena->mem_root, size)))
return TRUE;
+ bzero(ancestor_used_tables, size);
furthest_correlated_ancestor= 0;
inside_first_fix_fields= TRUE;
}
@@ -258,7 +259,7 @@
is_correlated= TRUE;
furthest_correlated_ancestor= max(furthest_correlated_ancestor, n_levels);
if (n_levels > 1)
- ancestor_used_tables[n_levels - 2]= dep_map;
+ ancestor_used_tables[n_levels - 2] |= dep_map;
}
}
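The reason the bzero() matters once the assignment becomes |= can be shown in
isolation. This is a sketch under assumptions, not the patched server code:
std::malloc stands in for alloc_root() (neither zeroes the memory it returns),
and alloc_ancestor_maps() and mark_dependency() are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the server's table_map: one bit per join table.
using table_map = std::uint64_t;

// Allocate (1 + depth) maps and zero them explicitly. Without the memset the
// array would hold arbitrary bytes and "|=" would accumulate onto garbage.
inline table_map *alloc_ancestor_maps(unsigned depth)
{
  const std::size_t size = (1 + depth) * sizeof(table_map);
  void *mem = std::malloc(size);
  if (!mem)
    return nullptr;
  std::memset(mem, 0, size);
  return static_cast<table_map *>(mem);
}

// Accumulate the tables referenced n_levels up, as the "|=" in the diff does.
inline void mark_dependency(table_map *maps, unsigned n_levels,
                            table_map dep_map)
{
  assert(maps != nullptr);
  if (n_levels > 1)
    maps[n_levels - 2] |= dep_map;
}
```

With plain assignment a second outer reference at the same level would
overwrite the first; with |= both are kept, provided the array started out
zeroed.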
[Maria-developers] Rev 2727: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim/
by Sergey Petrunya 08 Jul '09
At file:///home/psergey/dev/maria-5.1-table-elim/
------------------------------------------------------------
revno: 2727
revision-id: psergey(a)askmonty.org-20090708171038-9nyc3hcg1o7h8635
parent: psergey(a)askmonty.org-20090630132018-8qwou8bqiq5z1qjg
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-07-08 21:10:38 +0400
message:
MWL#17: Table elimination
- When collecting Item_subselect::refers_to, put references to the correct
subselect entry.
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2009-06-22 11:46:31 +0000
+++ b/sql/sql_lex.cc 2009-07-08 17:10:38 +0000
@@ -1780,6 +1780,7 @@
void st_select_lex::mark_as_dependent(st_select_lex *last, Item *dependency)
{
+ SELECT_LEX *next_to_last;
/*
Mark all selects from resolved to 1 before select where was
found table as depended (of select where was found table)
@@ -1787,6 +1788,7 @@
for (SELECT_LEX *s= this;
s && s != last;
s= s->outer_select())
+ {
if (!(s->uncacheable & UNCACHEABLE_DEPENDENT))
{
// Select is dependent of outer select
@@ -1802,10 +1804,12 @@
sl->uncacheable|= UNCACHEABLE_UNITED;
}
}
+ next_to_last= s;
+ }
is_correlated= TRUE;
this->master_unit()->item->is_correlated= TRUE;
if (dependency)
- this->master_unit()->item->refers_to.push_back(dependency);
+ next_to_last->master_unit()->item->refers_to.push_back(dependency);
}
bool st_select_lex_node::set_braces(bool value) { return 1; }
[Maria-developers] Rev 2818: BUG#31480: Incorrect result for nested subquery when executed via semi join in file:///home/psergey/dev/mysql-next-fix-subq/
by Sergey Petrunya 08 Jul '09
At file:///home/psergey/dev/mysql-next-fix-subq/
------------------------------------------------------------
revno: 2818
revision-id: psergey(a)askmonty.org-20090708095341-9i08n2r8igulpxzz
parent: psergey(a)askmonty.org-20090706143329-72s3e73rov2f5tml
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next-fix-subq
timestamp: Wed 2009-07-08 13:53:41 +0400
message:
BUG#31480: Incorrect result for nested subquery when executed via semi join
Make the fix work with prepared statements:
- collect/save ancestor_used_tables and furthest_correlated_ancestor only at
PREPARE phase (at execute() we are unable to tell what table_map the outer
reference would tell. since it would be the same anyway, we save it at
PREPARE phase)
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-07-06 14:26:03 +0000
+++ b/sql/item_subselect.cc 2009-07-08 09:53:41 +0000
@@ -39,8 +39,8 @@
Item_subselect::Item_subselect():
Item_result_field(), value_assigned(0), thd(0), substitution(0),
engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0),
- const_item_cache(1), inside_fix_fields(0), engine_changed(0), changed(0),
- is_correlated(FALSE)
+ const_item_cache(1), inside_first_fix_fields(0), ancestor_used_tables(0),
+ engine_changed(0), changed(0), is_correlated(FALSE)
{
with_subselect= 1;
reset();
@@ -158,6 +158,7 @@
DBUG_RETURN(RES_OK);
}
+
void Item_subselect::set_depth()
{
uint n= 0;
@@ -166,6 +167,7 @@
this->depth= n - 1;
}
+
bool Item_subselect::fix_fields(THD *thd_param, Item **ref)
{
char const *save_where= thd_param->where;
@@ -175,23 +177,25 @@
DBUG_ASSERT(fixed == 0);
engine->set_thd((thd= thd_param));
- if (!inside_fix_fields)
+ if (!ancestor_used_tables)
{
set_depth();
- if (!(ancestor_used_tables= (table_map*)thd->calloc((1+depth) *
- sizeof(table_map))))
+ if (!(ancestor_used_tables=
+ (table_map*)alloc_root(thd->stmt_arena->mem_root,
+ (1+depth)*sizeof(table_map))))
return TRUE;
furthest_correlated_ancestor= 0;
+ inside_first_fix_fields= TRUE;
}
if (check_stack_overrun(thd, STACK_MIN_SIZE, (uchar*)&res))
return TRUE;
- inside_fix_fields++;
res= engine->prepare();
-
+
// all transformation is done (used by prepared statements)
changed= 1;
+ inside_first_fix_fields= FALSE;
if (!res)
{
@@ -220,14 +224,12 @@
if (!(*ref)->fixed)
ret= (*ref)->fix_fields(thd, ref);
thd->where= save_where;
- inside_fix_fields--;
return ret;
}
// Is it one field subselect?
if (engine->cols() > max_columns)
{
my_error(ER_OPERAND_COLUMNS, MYF(0), 1);
- inside_fix_fields--;
return TRUE;
}
fix_length_and_dec();
@@ -244,12 +246,23 @@
fixed= 1;
err:
- inside_fix_fields--;
thd->where= save_where;
return res;
}
+void Item_subselect::mark_as_dependent(uint n_levels, table_map dep_map)
+{
+ if (inside_first_fix_fields)
+ {
+ is_correlated= TRUE;
+ furthest_correlated_ancestor= max(furthest_correlated_ancestor, n_levels);
+ if (n_levels > 1)
+ ancestor_used_tables[n_levels - 2]= dep_map;
+ }
+}
+
+
/*
Adjust attributes after our parent select has been merged into grandparent
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2009-07-06 07:57:39 +0000
+++ b/sql/item_subselect.h 2009-07-08 09:53:41 +0000
@@ -68,7 +68,7 @@
/* cache of constant state */
bool const_item_cache;
- int inside_fix_fields;
+ int inside_first_fix_fields;
public:
/*
Depth of the subquery predicate.
@@ -140,6 +140,7 @@
return null_value;
}
bool fix_fields(THD *thd, Item **ref);
+ void mark_as_dependent(uint n_levels, table_map dep_map);
void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
Item **ref);
virtual bool exec();
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2009-07-06 07:57:39 +0000
+++ b/sql/sql_lex.cc 2009-07-08 09:53:41 +0000
@@ -1929,13 +1929,7 @@
}
Item_subselect *subquery_predicate= s->master_unit()->item;
if (subquery_predicate)
- {
- subquery_predicate->is_correlated= TRUE;
- subquery_predicate->furthest_correlated_ancestor=
- max(subquery_predicate->furthest_correlated_ancestor, n_levels);
- if (n_levels > 1)
- subquery_predicate->ancestor_used_tables[n_levels - 2]= dep_map;
- }
+ subquery_predicate->mark_as_dependent(n_levels, dep_map);
n_levels--;
}
}
07 Jul '09
Toby Thain has proposed merging lp:~qu1j0t3/maria/solaris10-port into lp:maria.
Requested reviews:
Kristian Nielsen (knielsen)
Added build scripts for 32 bit x86 architecture on Solaris. Renamed some scripts for consistency. Changed to dynamic linking of libgcc.
--
https://code.launchpad.net/~qu1j0t3/maria/solaris10-port/+merge/6999
Your team Maria developers is subscribed to branch lp:maria.
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2716: Solaris 10 build script fixes by Toby Thain.
by noreply@launchpad.net 07 Jul '09
------------------------------------------------------------
revno: 2716
committer: knielsen(a)knielsen-hq.org
branch nick: mariadb-solaris10-port-merge
timestamp: Tue 2009-07-07 13:19:24 +0200
message:
Solaris 10 build script fixes by Toby Thain.
Added build scripts for 32 bit x86 architecture on Solaris.
Renamed some scripts for consistency.
Changed to dynamic linking of libgcc.
removed:
BUILD/compile-solaris-amd64-forte-debug
added:
BUILD/compile-solaris-amd64-debug-forte
BUILD/compile-solaris-x86-32
BUILD/compile-solaris-x86-32-debug
BUILD/compile-solaris-x86-32-debug-forte
BUILD/compile-solaris-x86-forte-32
modified:
BUILD/compile-solaris-amd64
BUILD/compile-solaris-amd64-debug
=== modified file 'BUILD/compile-solaris-amd64'
--- BUILD/compile-solaris-amd64 2009-05-09 04:01:53 +0000
+++ BUILD/compile-solaris-amd64 2009-07-07 11:19:24 +0000
@@ -26,7 +26,7 @@
extra_flags="$amd64_cflags -D__sun -m64 -mtune=athlon64"
extra_configs="$amd64_configs $max_configs --with-libevent"
-LDFLAGS="-lmtmalloc -static-libgcc"
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib/64"
export LDFLAGS
. "$path/FINISH.sh"
=== modified file 'BUILD/compile-solaris-amd64-debug'
--- BUILD/compile-solaris-amd64-debug 2009-05-09 04:01:53 +0000
+++ BUILD/compile-solaris-amd64-debug 2009-07-07 11:19:24 +0000
@@ -5,7 +5,7 @@
extra_flags="$amd64_cflags -D__sun -m64 -mtune=athlon64 $debug_cflags"
extra_configs="$amd64_configs $debug_configs $max_configs --with-libevent"
-LDFLAGS="-lmtmalloc -static-libgcc"
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib/64"
export LDFLAGS
. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-amd64-debug-forte'
--- BUILD/compile-solaris-amd64-debug-forte 1970-01-01 00:00:00 +0000
+++ BUILD/compile-solaris-amd64-debug-forte 2009-07-07 11:19:24 +0000
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+
+# Take only #define options - the others are gcc specific.
+# (real fix is for SETUP.sh not to put gcc specific options in $debug_cflags)
+DEFS=""
+for F in $debug_cflags ; do
+ expr "$F" : "^-D" && DEFS="$DEFS $F"
+done
+debug_cflags="-O0 -g $DEFS"
+
+extra_flags="-m64 -mt -D_FORTEC_ -xlibmopt -fns=no $debug_cflags"
+extra_configs="$max_configs --with-libevent $debug_configs"
+
+warnings=""
+c_warnings=""
+cxx_warnings=""
+base_cxxflags="-noex"
+
+CC=cc
+CFLAGS="-xstrconst"
+CXX=CC
+LDFLAGS="-lmtmalloc"
+
+. "$path/FINISH.sh"
=== removed file 'BUILD/compile-solaris-amd64-forte-debug'
--- BUILD/compile-solaris-amd64-forte-debug 2009-05-09 04:01:53 +0000
+++ BUILD/compile-solaris-amd64-forte-debug 1970-01-01 00:00:00 +0000
@@ -1,27 +0,0 @@
-#!/bin/sh
-
-path=`dirname $0`
-. "$path/SETUP.sh"
-
-# Take only #define options - the others are gcc specific.
-# (real fix is for SETUP.sh not to put gcc specific options in $debug_cflags)
-DEFS=""
-for F in $debug_cflags ; do
- expr "$F" : "^-D" && DEFS="$DEFS $F"
-done
-debug_cflags="-O0 -g $DEFS"
-
-extra_flags="-m64 -mt -D_FORTEC_ -xlibmopt -fns=no $debug_cflags"
-extra_configs="$max_configs --with-libevent $debug_configs"
-
-warnings=""
-c_warnings=""
-cxx_warnings=""
-base_cxxflags="-noex"
-
-CC=cc
-CFLAGS="-xstrconst"
-CXX=CC
-LDFLAGS="-lmtmalloc"
-
-. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-32'
--- BUILD/compile-solaris-x86-32 1970-01-01 00:00:00 +0000
+++ BUILD/compile-solaris-x86-32 2009-07-07 11:19:24 +0000
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+extra_flags="-D__sun -m32"
+extra_configs="$max_configs --with-libevent"
+
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib"
+export LDFLAGS
+
+. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-32-debug'
--- BUILD/compile-solaris-x86-32-debug 1970-01-01 00:00:00 +0000
+++ BUILD/compile-solaris-x86-32-debug 2009-07-07 11:19:24 +0000
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+extra_flags="-D__sun -m32 $debug_cflags"
+extra_configs="$max_configs --with-libevent $debug_configs"
+
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib"
+export LDFLAGS
+
+. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-32-debug-forte'
--- BUILD/compile-solaris-x86-32-debug-forte 1970-01-01 00:00:00 +0000
+++ BUILD/compile-solaris-x86-32-debug-forte 2009-07-07 11:19:24 +0000
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+
+# Take only #define options - the others are gcc specific.
+# (real fix is for SETUP.sh not to put gcc specific options in $debug_cflags)
+DEFS=""
+for F in $debug_cflags ; do
+ expr "$F" : "^-D" && DEFS="$DEFS $F"
+done
+debug_cflags="-O0 -g $DEFS"
+
+extra_flags="-m32 -mt -D_FORTEC_ -xbuiltin=%all -xlibmil -xlibmopt -fns=no -xprefetch=auto -xprefetch_level=3 $debug_cflags"
+extra_configs="$max_configs --with-libevent $debug_configs"
+
+warnings=""
+c_warnings=""
+cxx_warnings=""
+base_cxxflags="-noex"
+
+CC=cc
+CFLAGS="-xstrconst"
+CXX=CC
+LDFLAGS="-lmtmalloc"
+
+. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-forte-32'
--- BUILD/compile-solaris-x86-forte-32 1970-01-01 00:00:00 +0000
+++ BUILD/compile-solaris-x86-forte-32 2009-07-07 11:19:24 +0000
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+
+extra_flags="-m32 -mt -D_FORTEC_ -xbuiltin=%all -xlibmil -xlibmopt -fns=no -xprefetch=auto -xprefetch_level=3"
+extra_configs="$max_configs --with-libevent"
+
+warnings=""
+c_warnings=""
+cxx_warnings=""
+base_cxxflags="-noex"
+
+CC=cc
+CFLAGS="-xstrconst"
+CXX=CC
+LDFLAGS="-lmtmalloc"
+
+. "$path/FINISH.sh"
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2716)
by knielsen@knielsen-hq.org 07 Jul '09
#At lp:maria
2716 knielsen(a)knielsen-hq.org 2009-07-07
Solaris 10 build script fixes by Toby Thain.
Added build scripts for 32 bit x86 architecture on Solaris.
Renamed some scripts for consistency.
Changed to dynamic linking of libgcc.
removed:
BUILD/compile-solaris-amd64-forte-debug
added:
BUILD/compile-solaris-amd64-debug-forte
BUILD/compile-solaris-x86-32
BUILD/compile-solaris-x86-32-debug
BUILD/compile-solaris-x86-32-debug-forte
BUILD/compile-solaris-x86-forte-32
modified:
BUILD/compile-solaris-amd64
BUILD/compile-solaris-amd64-debug
per-file messages:
BUILD/compile-solaris-amd64
Changed to dynamic linking of libgcc.
The -static-libgcc was a legacy of the original build scripts. -R
(analogous to the -L link-time search path) is a Solaris mechanism to
ensure a needed lib directory is searched at runtime.
In Solaris 10, gcc comes bundled under /usr/sfw, so it can be used without
creating dependency problems. This allows, e.g., benefiting from ordinary system
patch maintenance.
BUILD/compile-solaris-amd64-debug
Changed to dynamic linking of libgcc.
The -static-libgcc was a legacy of the original build scripts. -R
(analogous to the -L link-time search path) is a Solaris mechanism to
ensure a needed lib directory is searched at runtime.
In Solaris 10, gcc comes bundled under /usr/sfw, so it can be used without
creating dependency problems. This allows, e.g., benefiting from ordinary system
patch maintenance.
=== modified file 'BUILD/compile-solaris-amd64'
--- a/BUILD/compile-solaris-amd64 2009-05-09 04:01:53 +0000
+++ b/BUILD/compile-solaris-amd64 2009-07-07 11:19:24 +0000
@@ -26,7 +26,7 @@ path=`dirname $0`
extra_flags="$amd64_cflags -D__sun -m64 -mtune=athlon64"
extra_configs="$amd64_configs $max_configs --with-libevent"
-LDFLAGS="-lmtmalloc -static-libgcc"
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib/64"
export LDFLAGS
. "$path/FINISH.sh"
=== modified file 'BUILD/compile-solaris-amd64-debug'
--- a/BUILD/compile-solaris-amd64-debug 2009-05-09 04:01:53 +0000
+++ b/BUILD/compile-solaris-amd64-debug 2009-07-07 11:19:24 +0000
@@ -5,7 +5,7 @@ path=`dirname $0`
extra_flags="$amd64_cflags -D__sun -m64 -mtune=athlon64 $debug_cflags"
extra_configs="$amd64_configs $debug_configs $max_configs --with-libevent"
-LDFLAGS="-lmtmalloc -static-libgcc"
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib/64"
export LDFLAGS
. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-amd64-debug-forte'
--- a/BUILD/compile-solaris-amd64-debug-forte 1970-01-01 00:00:00 +0000
+++ b/BUILD/compile-solaris-amd64-debug-forte 2009-07-07 11:19:24 +0000
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+
+# Take only #define options - the others are gcc specific.
+# (real fix is for SETUP.sh not to put gcc specific options in $debug_cflags)
+DEFS=""
+for F in $debug_cflags ; do
+ expr "$F" : "^-D" && DEFS="$DEFS $F"
+done
+debug_cflags="-O0 -g $DEFS"
+
+extra_flags="-m64 -mt -D_FORTEC_ -xlibmopt -fns=no $debug_cflags"
+extra_configs="$max_configs --with-libevent $debug_configs"
+
+warnings=""
+c_warnings=""
+cxx_warnings=""
+base_cxxflags="-noex"
+
+CC=cc
+CFLAGS="-xstrconst"
+CXX=CC
+LDFLAGS="-lmtmalloc"
+
+. "$path/FINISH.sh"
=== removed file 'BUILD/compile-solaris-amd64-forte-debug'
--- a/BUILD/compile-solaris-amd64-forte-debug 2009-05-09 04:01:53 +0000
+++ b/BUILD/compile-solaris-amd64-forte-debug 1970-01-01 00:00:00 +0000
@@ -1,27 +0,0 @@
-#!/bin/sh
-
-path=`dirname $0`
-. "$path/SETUP.sh"
-
-# Take only #define options - the others are gcc specific.
-# (real fix is for SETUP.sh not to put gcc specific options in $debug_cflags)
-DEFS=""
-for F in $debug_cflags ; do
- expr "$F" : "^-D" && DEFS="$DEFS $F"
-done
-debug_cflags="-O0 -g $DEFS"
-
-extra_flags="-m64 -mt -D_FORTEC_ -xlibmopt -fns=no $debug_cflags"
-extra_configs="$max_configs --with-libevent $debug_configs"
-
-warnings=""
-c_warnings=""
-cxx_warnings=""
-base_cxxflags="-noex"
-
-CC=cc
-CFLAGS="-xstrconst"
-CXX=CC
-LDFLAGS="-lmtmalloc"
-
-. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-32'
--- a/BUILD/compile-solaris-x86-32 1970-01-01 00:00:00 +0000
+++ b/BUILD/compile-solaris-x86-32 2009-07-07 11:19:24 +0000
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+extra_flags="-D__sun -m32"
+extra_configs="$max_configs --with-libevent"
+
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib"
+export LDFLAGS
+
+. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-32-debug'
--- a/BUILD/compile-solaris-x86-32-debug 1970-01-01 00:00:00 +0000
+++ b/BUILD/compile-solaris-x86-32-debug 2009-07-07 11:19:24 +0000
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+extra_flags="-D__sun -m32 $debug_cflags"
+extra_configs="$max_configs --with-libevent $debug_configs"
+
+LDFLAGS="-lmtmalloc -R/usr/sfw/lib"
+export LDFLAGS
+
+. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-32-debug-forte'
--- a/BUILD/compile-solaris-x86-32-debug-forte 1970-01-01 00:00:00 +0000
+++ b/BUILD/compile-solaris-x86-32-debug-forte 2009-07-07 11:19:24 +0000
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+
+# Take only #define options - the others are gcc specific.
+# (real fix is for SETUP.sh not to put gcc specific options in $debug_cflags)
+DEFS=""
+for F in $debug_cflags ; do
+ expr "$F" : "^-D" && DEFS="$DEFS $F"
+done
+debug_cflags="-O0 -g $DEFS"
+
+extra_flags="-m32 -mt -D_FORTEC_ -xbuiltin=%all -xlibmil -xlibmopt -fns=no -xprefetch=auto -xprefetch_level=3 $debug_cflags"
+extra_configs="$max_configs --with-libevent $debug_configs"
+
+warnings=""
+c_warnings=""
+cxx_warnings=""
+base_cxxflags="-noex"
+
+CC=cc
+CFLAGS="-xstrconst"
+CXX=CC
+LDFLAGS="-lmtmalloc"
+
+. "$path/FINISH.sh"
=== added file 'BUILD/compile-solaris-x86-forte-32'
--- a/BUILD/compile-solaris-x86-forte-32 1970-01-01 00:00:00 +0000
+++ b/BUILD/compile-solaris-x86-forte-32 2009-07-07 11:19:24 +0000
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+path=`dirname $0`
+. "$path/SETUP.sh"
+
+extra_flags="-m32 -mt -D_FORTEC_ -xbuiltin=%all -xlibmil -xlibmopt -fns=no -xprefetch=auto -xprefetch_level=3"
+extra_configs="$max_configs --with-libevent"
+
+warnings=""
+c_warnings=""
+cxx_warnings=""
+base_cxxflags="-noex"
+
+CC=cc
+CFLAGS="-xstrconst"
+CXX=CC
+LDFLAGS="-lmtmalloc"
+
+. "$path/FINISH.sh"
[Maria-developers] Rev 2817: BUG#42742: crash in setup_sj_materialization, Copy_field::set in file:///home/psergey/dev/mysql-next-fix-subq/
by Sergey Petrunya 06 Jul '09
At file:///home/psergey/dev/mysql-next-fix-subq/
------------------------------------------------------------
revno: 2817
revision-id: psergey(a)askmonty.org-20090706143329-72s3e73rov2f5tml
parent: psergey(a)askmonty.org-20090706142603-z3z8ku4fdah6ntwv
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next-fix-subq
timestamp: Mon 2009-07-06 18:33:29 +0400
message:
BUG#42742: crash in setup_sj_materialization, Copy_field::set
- If a semi-join strategy covers a certain [first_table; last_table]
range in the join order, reset the sj_strategy member for all tables
within the range except the first one.
Failure to do so caused the EXPLAIN/execution code to try applying two
strategies at once, which caused all kinds of undesired effects.
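The rule in the message above can be sketched in isolation as follows. This is a minimal illustration, not the server code: `Position`, `SJ_OPT_EXAMPLE_A`, `SJ_OPT_EXAMPLE_B` and `reset_strategy_in_range` are hypothetical names, and only `SJ_OPT_NONE`, `sj_strategy` and `n_sj_tables` are taken from the patch below.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the server's types. SJ_OPT_NONE appears in the
// patch; the two EXAMPLE values are placeholders for real strategies.
enum SjStrategy { SJ_OPT_NONE, SJ_OPT_EXAMPLE_A, SJ_OPT_EXAMPLE_B };

struct Position {
  SjStrategy sj_strategy;
  unsigned n_sj_tables;  // meaningful only at the first table of a range
};

// Keep the strategy on the first table of [first, first + n_sj_tables) and
// clear it on every other table in that range, so that only one strategy is
// ever recorded for the range.
void reset_strategy_in_range(std::vector<Position> &positions,
                             std::size_t first) {
  std::size_t end = first + positions[first].n_sj_tables;
  for (std::size_t i = first + 1; i < end && i < positions.size(); i++)
    positions[i].sj_strategy = SJ_OPT_NONE;
}
```

Called on a range of three tables starting at index 0, this would leave the strategy at index 0 intact, clear indexes 1 and 2, and leave any table past the range untouched.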
=== modified file 'mysql-test/r/subselect_sj2.result'
--- a/mysql-test/r/subselect_sj2.result 2009-03-21 15:31:38 +0000
+++ b/mysql-test/r/subselect_sj2.result 2009-07-06 14:33:29 +0000
@@ -689,3 +689,19 @@
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
drop table t1, t2;
+#
+# BUG#42742: crash in setup_sj_materialization, Copy_field::set
+#
+create table t3 ( c1 year) engine=innodb;
+insert into t3 values (2135),(2142);
+create table t2 (c1 tinytext,c2 text,c6 timestamp) engine=innodb;
+# The following must not crash, EXPLAIN should show one SJ strategy, not a mix:
+explain select 1 from t2 where
+c2 in (select 1 from t3, t2) and
+c1 in (select convert(c6,char(1)) from t2);
+id select_type table type possible_keys key key_len ref rows Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 1 Using where
+1 PRIMARY t2 ALL NULL NULL NULL NULL 1
+1 PRIMARY t2 ALL NULL NULL NULL NULL 1 Using where; Using join buffer
+1 PRIMARY t3 ALL NULL NULL NULL NULL 2 FirstMatch(t2); Using join buffer
+drop table t2, t3;
=== modified file 'mysql-test/r/subselect_sj2_jcl6.result'
--- a/mysql-test/r/subselect_sj2_jcl6.result 2009-06-19 09:12:06 +0000
+++ b/mysql-test/r/subselect_sj2_jcl6.result 2009-07-06 14:33:29 +0000
@@ -693,6 +693,22 @@
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
drop table t1, t2;
+#
+# BUG#42742: crash in setup_sj_materialization, Copy_field::set
+#
+create table t3 ( c1 year) engine=innodb;
+insert into t3 values (2135),(2142);
+create table t2 (c1 tinytext,c2 text,c6 timestamp) engine=innodb;
+# The following must not crash, EXPLAIN should show one SJ strategy, not a mix:
+explain select 1 from t2 where
+c2 in (select 1 from t3, t2) and
+c1 in (select convert(c6,char(1)) from t2);
+id select_type table type possible_keys key key_len ref rows Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 1 Using where
+1 PRIMARY t2 ALL NULL NULL NULL NULL 1 Using join buffer
+1 PRIMARY t2 ALL NULL NULL NULL NULL 1 Using where; Using join buffer
+1 PRIMARY t3 ALL NULL NULL NULL NULL 2 FirstMatch(t2); Using join buffer
+drop table t2, t3;
set join_cache_level=default;
show variables like 'join_cache_level';
Variable_name Value
=== modified file 'mysql-test/t/subselect_sj2.test'
--- a/mysql-test/t/subselect_sj2.test 2009-03-21 15:31:38 +0000
+++ b/mysql-test/t/subselect_sj2.test 2009-07-06 14:33:29 +0000
@@ -872,3 +872,15 @@
explain select 1 from t2 where c2 = any (select log10(null) from t1 where c6 <null) ;
drop table t1, t2;
+--echo #
+--echo # BUG#42742: crash in setup_sj_materialization, Copy_field::set
+--echo #
+create table t3 ( c1 year) engine=innodb;
+insert into t3 values (2135),(2142);
+create table t2 (c1 tinytext,c2 text,c6 timestamp) engine=innodb;
+-- echo # The following must not crash, EXPLAIN should show one SJ strategy, not a mix:
+explain select 1 from t2 where
+ c2 in (select 1 from t3, t2) and
+ c1 in (select convert(c6,char(1)) from t2);
+drop table t2, t3;
+
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-07-06 07:57:39 +0000
+++ b/sql/sql_select.cc 2009-07-06 14:33:29 +0000
@@ -7916,7 +7916,11 @@
uint i_end= first + join->best_positions[first].n_sj_tables;
for (uint i= first; i < i_end; i++)
+ {
+ if (i != first)
+ join->best_positions[i].sj_strategy= SJ_OPT_NONE;
handled_tabs |= join->best_positions[i].table->table->map;
+ }
if (tablenr != first)
pos->sj_strategy= SJ_OPT_NONE;
[Maria-developers] Rev 2816: BUG#31480: Incorrect result for nested subquery when executed via semi join in file:///home/psergey/dev/mysql-next/
by Sergey Petrunya 06 Jul '09
At file:///home/psergey/dev/mysql-next/
------------------------------------------------------------
revno: 2816
revision-id: psergey(a)askmonty.org-20090706142603-z3z8ku4fdah6ntwv
parent: psergey(a)askmonty.org-20090706075739-ay9m392esf31wx0s
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next
timestamp: Mon 2009-07-06 18:26:03 +0400
message:
BUG#31480: Incorrect result for nested subquery when executed via semi join
- Post-push valgrind fix
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-07-06 07:57:39 +0000
+++ b/sql/item_subselect.cc 2009-07-06 14:26:03 +0000
@@ -289,8 +289,12 @@
used_tables_cache &= ~OUTER_REF_TABLE_BIT;
if (furthest_correlated_ancestor > 1)
used_tables_cache |= OUTER_REF_TABLE_BIT;
- const_item_cache &= test(!(used_tables_cache &
- ~new_parent->join->const_table_map));
+
+ /*
+ Don't update const_item_cache yet as we don't yet know which of the
+ parent's tables are constant. Parent will call update_used_tables() anyway,
+ and that will be our chance to update.
+ */
}
[Maria-developers] Rev 2816: BUG#31480: Incorrect result for nested subquery when executed via semi join in file:///home/psergey/dev/mysql-next-look-vg/
by Sergey Petrunya 06 Jul '09
At file:///home/psergey/dev/mysql-next-look-vg/
------------------------------------------------------------
revno: 2816
revision-id: psergey(a)askmonty.org-20090706141824-4u0m7arubaadks6w
parent: psergey(a)askmonty.org-20090706081826-4bvmp429ikj9aptw
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next-look-vg
timestamp: Mon 2009-07-06 18:18:24 +0400
message:
BUG#31480: Incorrect result for nested subquery when executed via semi join
- Post-push valgrind fix
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-07-06 08:18:26 +0000
+++ b/sql/item_subselect.cc 2009-07-06 14:18:24 +0000
@@ -289,8 +289,12 @@
used_tables_cache &= ~OUTER_REF_TABLE_BIT;
if (furthest_correlated_ancestor > 1)
used_tables_cache |= OUTER_REF_TABLE_BIT;
- const_item_cache &= test(!(used_tables_cache &
- ~new_parent->join->const_table_map));
+
+ /*
+ Don't update const_item_cache yet as we don't yet know which of the
+ parent's tables are constant. Parent will call update_used_tables() anyway,
+ and that will be our chance to update.
+ */
}
[Maria-developers] Rev 2815: BUG#31480: Incorrect result for nested subquery when executed via semi join in file:///home/psergey/dev/mysql-next-fix-subq/
by Sergey Petrunya 06 Jul '09
At file:///home/psergey/dev/mysql-next-fix-subq/
------------------------------------------------------------
revno: 2815
revision-id: psergey(a)askmonty.org-20090706081826-4bvmp429ikj9aptw
parent: psergey(a)askmonty.org-20090704004450-4pqbx9pm50bzky0l
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next-fix-subq
timestamp: Mon 2009-07-06 12:18:26 +0400
message:
BUG#31480: Incorrect result for nested subquery when executed via semi join
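As a cross-check of the result this revision's test case expects, the nested query from the test can be evaluated by brute force outside the server. The sketch below is only a reference evaluation under the assumption that no column is NULL (all test columns are declared NOT NULL); `reference_result` is a hypothetical helper and has nothing to do with how the optimizer executes the semi-join.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Brute-force reference evaluation of
//   select a from t1
//   where a in (select c from t2 where d >= some(select e from t3 where b=e));
// t1 rows are (a, b), t2 rows are (c, d), t3 rows are e.
std::vector<int> reference_result(const std::vector<std::pair<int, int>> &t1,
                                  const std::vector<std::pair<int, int>> &t2,
                                  const std::vector<int> &t3) {
  std::vector<int> out;
  for (const auto &r1 : t1) {
    bool matched = false;
    for (const auto &r2 : t2) {
      if (r2.first != r1.first)  // the IN predicate: a = c
        continue;
      for (int e : t3) {
        // innermost subquery: e = b; outer comparison: d >= e
        if (e == r1.second && r2.second >= e) {
          matched = true;
          break;
        }
      }
      if (matched)
        break;  // semi-join: each t1 row is emitted at most once
    }
    if (matched)
      out.push_back(r1.first);
  }
  return out;
}
```

Fed the rows inserted by the test, this returns 2, 2, 3, 2 in t1 insertion order, which agrees with the result file below.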
=== modified file 'mysql-test/r/subselect_sj.result'
--- a/mysql-test/r/subselect_sj.result 2009-03-19 17:03:58 +0000
+++ b/mysql-test/r/subselect_sj.result 2009-07-06 08:18:26 +0000
@@ -327,3 +327,48 @@
HAVING X > '2012-12-12';
X
drop table t1, t2;
+#
+# BUG#31480: Incorrect result for nested subquery when executed via semi join
+#
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 6 100.00 Start temporary
+1 PRIMARY t1 ALL NULL NULL NULL NULL 7 100.00 Using where; End temporary; Using join buffer
+3 DEPENDENT SUBQUERY t3 ALL NULL NULL NULL NULL 4 100.00 Using where
+Warnings:
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+show warnings;
+Level Code Message
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+a
+2
+2
+3
+2
+drop table t1, t2, t3;
=== modified file 'mysql-test/r/subselect_sj_jcl6.result'
--- a/mysql-test/r/subselect_sj_jcl6.result 2009-03-19 17:03:58 +0000
+++ b/mysql-test/r/subselect_sj_jcl6.result 2009-07-06 08:18:26 +0000
@@ -331,6 +331,51 @@
HAVING X > '2012-12-12';
X
drop table t1, t2;
+#
+# BUG#31480: Incorrect result for nested subquery when executed via semi join
+#
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 6 100.00 Start temporary
+1 PRIMARY t1 ALL NULL NULL NULL NULL 7 100.00 Using where; End temporary; Using join buffer
+3 DEPENDENT SUBQUERY t3 ALL NULL NULL NULL NULL 4 100.00 Using where
+Warnings:
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+show warnings;
+Level Code Message
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+a
+2
+2
+3
+2
+drop table t1, t2, t3;
set join_cache_level=default;
show variables like 'join_cache_level';
Variable_name Value
=== modified file 'mysql-test/t/subselect_sj.test'
--- a/mysql-test/t/subselect_sj.test 2009-03-19 17:03:58 +0000
+++ b/mysql-test/t/subselect_sj.test 2009-07-06 08:18:26 +0000
@@ -216,4 +216,39 @@
HAVING X > '2012-12-12';
drop table t1, t2;
-
+--echo #
+--echo # BUG#31480: Incorrect result for nested subquery when executed via semi join
+--echo #
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+show warnings;
+
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+
+drop table t1, t2, t3;
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item.cc 2009-07-06 08:18:26 +0000
@@ -2212,7 +2212,8 @@
}
-void Item_field::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_field::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
if (new_parent == depended_from)
depended_from= NULL;
@@ -3797,16 +3798,17 @@
static void mark_as_dependent(THD *thd, SELECT_LEX *last, SELECT_LEX *current,
Item_ident *resolved_item,
- Item_ident *mark_item)
+ Item_ident *mark_item, table_map dep_map)
{
const char *db_name= (resolved_item->db_name ?
resolved_item->db_name : "");
const char *table_name= (resolved_item->table_name ?
resolved_item->table_name : "");
+ //table_map dep_map = resolved_item->used_tables();
/* store pointer on SELECT_LEX from which item is dependent */
if (mark_item)
mark_item->depended_from= last;
- current->mark_as_dependent(last);
+ current->mark_as_dependent(last, dep_map);
if (thd->lex->describe & DESCRIBE_EXTENDED)
{
push_warning_printf(thd, MYSQL_ERROR::WARN_LEVEL_NOTE,
@@ -3864,21 +3866,26 @@
Item_subselect *prev_subselect_item=
previous_select->master_unit()->item;
Item_ident *dependent= resolved_item;
+ table_map found_used_tables;
if (found_field == view_ref_found)
{
Item::Type type= found_item->type();
+ found_used_tables= found_item->used_tables();
prev_subselect_item->used_tables_cache|=
- found_item->used_tables();
+ found_used_tables;
dependent= ((type == Item::REF_ITEM || type == Item::FIELD_ITEM) ?
(Item_ident*) found_item :
0);
}
else
+ {
+ found_used_tables= found_field->table->map;
prev_subselect_item->used_tables_cache|=
found_field->table->map;
+ }
prev_subselect_item->const_item_cache= 0;
mark_as_dependent(thd, last_select, current_sel, resolved_item,
- dependent);
+ dependent, found_used_tables);
}
}
@@ -4159,6 +4166,7 @@
SELECT_LEX *current_sel= (SELECT_LEX *) thd->lex->current_select;
Name_resolution_context *outer_context= 0;
SELECT_LEX *select= 0;
+ uint n_levels= 0;
/* Currently derived tables cannot be correlated */
if (current_sel->master_unit()->first_select()->linkage !=
DERIVED_TABLE_TYPE)
@@ -4251,7 +4259,8 @@
context->select_lex, this,
((ref_type == REF_ITEM ||
ref_type == FIELD_ITEM) ?
- (Item_ident*) (*reference) : 0));
+ (Item_ident*) (*reference) : 0),
+ (*from_field)->table->map);
return 0;
}
}
@@ -4266,7 +4275,8 @@
context->select_lex, this,
((ref_type == REF_ITEM || ref_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
- 0));
+ 0),
+ (*reference)->used_tables());
/*
A reference to a view field had been found and we
substituted it instead of this Item (find_field_in_tables
@@ -4300,6 +4310,7 @@
*/
prev_subselect_item->used_tables_cache|= OUTER_REF_TABLE_BIT;
prev_subselect_item->const_item_cache= 0;
+ n_levels++;
}
DBUG_ASSERT(ref != 0);
@@ -4367,14 +4378,15 @@
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex, this,
- rf);
+ rf, rf->used_tables());
return 0;
}
else
{
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex,
- this, (Item_ident*)*reference);
+ this, (Item_ident*)*reference,
+ (*reference)->used_tables());
if (last_checked_context->select_lex->having_fix_field)
{
Item_ref *rf;
@@ -6084,7 +6096,8 @@
((refer_type == REF_ITEM ||
refer_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
- 0));
+ 0),
+ (*reference)->used_tables());
/*
view reference found, we substituted it instead of this
Item, so can quit
@@ -6134,7 +6147,8 @@
goto error;
thd->change_item_tree(reference, fld);
mark_as_dependent(thd, last_checked_context->select_lex,
- thd->lex->current_select, this, fld);
+ thd->lex->current_select, this, fld,
+ from_field->table->map);
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
@@ -6157,7 +6171,8 @@
/* Should be checked in resolve_ref_in_select_and_group(). */
DBUG_ASSERT(*ref && (*ref)->fixed);
mark_as_dependent(thd, last_checked_context->select_lex,
- context->select_lex, this, this);
+ context->select_lex, this, this,
+ (*ref)->used_tables());
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
@@ -6568,20 +6583,22 @@
return err;
}
-void Item_outer_ref::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_outer_ref::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
if (depended_from == new_parent)
{
*ref= outer_ref;
- outer_ref->fix_after_pullout(new_parent, ref);
+ outer_ref->fix_after_pullout(new_parent, parent_tables, ref);
}
}
-void Item_ref::fix_after_pullout(st_select_lex *new_parent, Item **refptr)
+void Item_ref::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **refptr)
{
if (depended_from == new_parent)
{
- (*ref)->fix_after_pullout(new_parent, ref);
+ (*ref)->fix_after_pullout(new_parent, parent_tables, ref);
depended_from= NULL;
}
}
=== modified file 'sql/item.h'
--- a/sql/item.h 2009-05-25 10:10:18 +0000
+++ b/sql/item.h 2009-07-06 08:18:26 +0000
@@ -557,7 +557,8 @@
Fix after some tables has been pulled out. Basically re-calculate all
attributes that are dependent on the tables.
*/
- virtual void fix_after_pullout(st_select_lex *new_parent, Item **ref) {};
+ virtual void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref) {};
/*
should be used in case where we are sure that we do not need
@@ -1486,7 +1487,8 @@
bool send(Protocol *protocol, String *str_arg);
void reset_field(Field *f);
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
void make_field(Send_field *tmp_field);
int save_in_field(Field *field,bool no_conversions);
void save_org_in_field(Field *field);
@@ -2278,7 +2280,8 @@
bool send(Protocol *prot, String *tmp);
void make_field(Send_field *field);
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
int save_in_field(Field *field, bool no_conversions);
void save_org_in_field(Field *field);
enum Item_result result_type () const { return (*ref)->result_type(); }
@@ -2448,7 +2451,8 @@
outer_ref->save_org_in_field(result_field);
}
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
table_map used_tables() const
{
return (*ref)->const_item() ? 0 : OUTER_REF_TABLE_BIT;
=== modified file 'sql/item_cmpfunc.cc'
--- a/sql/item_cmpfunc.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item_cmpfunc.cc 2009-07-06 08:18:26 +0000
@@ -4004,7 +4004,8 @@
}
-void Item_cond::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_cond::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
List_iterator<Item> li(list);
Item *item;
@@ -4018,7 +4019,7 @@
while ((item=li++))
{
table_map tmp_table_map;
- item->fix_after_pullout(new_parent, li.ref());
+ item->fix_after_pullout(new_parent, parent_tables, li.ref());
item= *li.ref();
used_tables_cache|= item->used_tables();
const_item_cache&= item->const_item();
=== modified file 'sql/item_cmpfunc.h'
--- a/sql/item_cmpfunc.h 2009-01-26 16:03:39 +0000
+++ b/sql/item_cmpfunc.h 2009-07-06 08:18:26 +0000
@@ -1475,7 +1475,8 @@
bool add_at_head(Item *item) { return list.push_front(item); }
void add_at_head(List<Item> *nlist) { list.prepand(nlist); }
bool fix_fields(THD *, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
enum Type type() const { return COND_ITEM; }
List<Item>* argument_list() { return &list; }
=== modified file 'sql/item_func.cc'
--- a/sql/item_func.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item_func.cc 2009-07-06 08:18:26 +0000
@@ -206,7 +206,8 @@
}
-void Item_func::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_func::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
Item **arg,**arg_end;
@@ -217,7 +218,7 @@
{
for (arg=args, arg_end=args+arg_count; arg != arg_end ; arg++)
{
- (*arg)->fix_after_pullout(new_parent, arg);
+ (*arg)->fix_after_pullout(new_parent, parent_tables, arg);
Item *item= *arg;
used_tables_cache|= item->used_tables();
=== modified file 'sql/item_func.h'
--- a/sql/item_func.h 2009-05-21 20:27:17 +0000
+++ b/sql/item_func.h 2009-07-06 08:18:26 +0000
@@ -117,7 +117,8 @@
// Constructor used for Item_cond_and/or (see Item comment)
Item_func(THD *thd, Item_func *item);
bool fix_fields(THD *, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
table_map used_tables() const;
table_map not_null_tables() const;
void update_used_tables();
=== modified file 'sql/item_row.cc'
--- a/sql/item_row.cc 2008-02-22 11:11:25 +0000
+++ b/sql/item_row.cc 2009-07-06 08:18:26 +0000
@@ -124,13 +124,14 @@
}
}
-void Item_row::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_row::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
used_tables_cache= 0;
const_item_cache= 1;
for (uint i= 0; i < arg_count; i++)
{
- items[i]->fix_after_pullout(new_parent, &items[i]);
+ items[i]->fix_after_pullout(new_parent, parent_tables, &items[i]);
used_tables_cache|= items[i]->used_tables();
const_item_cache&= items[i]->const_item();
}
=== modified file 'sql/item_row.h'
--- a/sql/item_row.h 2008-02-22 11:11:25 +0000
+++ b/sql/item_row.h 2009-07-06 08:18:26 +0000
@@ -59,7 +59,8 @@
return 0;
};
bool fix_fields(THD *thd, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
void cleanup();
void split_sum_func(THD *thd, Item **ref_pointer_array, List<Item> &fields);
table_map used_tables() const { return used_tables_cache; };
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-06-30 08:03:05 +0000
+++ b/sql/item_subselect.cc 2009-07-06 08:18:26 +0000
@@ -39,7 +39,7 @@
Item_subselect::Item_subselect():
Item_result_field(), value_assigned(0), thd(0), substitution(0),
engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0),
- const_item_cache(1), engine_changed(0), changed(0),
+ const_item_cache(1), inside_fix_fields(0), engine_changed(0), changed(0),
is_correlated(FALSE)
{
with_subselect= 1;
@@ -158,6 +158,13 @@
DBUG_RETURN(RES_OK);
}
+void Item_subselect::set_depth()
+{
+ uint n= 0;
+ for (SELECT_LEX *s= unit->first_select(); s; s= s->outer_select())
+ n++;
+ this->depth= n - 1;
+}
bool Item_subselect::fix_fields(THD *thd_param, Item **ref)
{
@@ -168,9 +175,19 @@
DBUG_ASSERT(fixed == 0);
engine->set_thd((thd= thd_param));
+ if (!inside_fix_fields)
+ {
+ set_depth();
+ if (!(ancestor_used_tables= (table_map*)thd->calloc((1+depth) *
+ sizeof(table_map))))
+ return TRUE;
+ furthest_correlated_ancestor= 0;
+ }
+
if (check_stack_overrun(thd, STACK_MIN_SIZE, (uchar*)&res))
return TRUE;
+ inside_fix_fields++;
res= engine->prepare();
// all transformation is done (used by prepared statements)
@@ -203,12 +220,14 @@
if (!(*ref)->fixed)
ret= (*ref)->fix_fields(thd, ref);
thd->where= save_where;
+ inside_fix_fields--;
return ret;
}
// Is it one field subselect?
if (engine->cols() > max_columns)
{
my_error(ER_OPERAND_COLUMNS, MYF(0), 1);
+ inside_fix_fields--;
return TRUE;
}
fix_length_and_dec();
@@ -225,11 +244,56 @@
fixed= 1;
err:
+ inside_fix_fields--;
thd->where= save_where;
return res;
}
+/*
+ Adjust attributes after our parent select has been merged into grandparent
+
+ DESCRIPTION
+ Subquery is a composite object which may be correlated, that is, it may
+ have
+ 1. references to tables of the parent select (i.e. one that has the clause
+ with the subquery predicate)
+ 2. references to tables of the grandparent select
+ 3. references to tables of further ancestors.
+
+ Before the pullout, this item indicates:
+ - #1 with table bits in used_tables()
+ - #2 and #3 with OUTER_REF_TABLE_BIT.
+
+ After parent has been merged with grandparent:
+ - references to parent and grandparent tables should be indicated with
+ table bits.
+ - references to greatgrandparent and further ancestors - with
+ OUTER_REF_TABLE_BIT.
+
+ This is exactly what this function does, based on pre-collected info in
+ ancestor_used_tables and furthest_correlated_ancestor.
+*/
+
+void Item_subselect::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
+{
+ used_tables_cache= (used_tables_cache << parent_tables) |
+ ancestor_used_tables[0];
+ for (uint i=0; i < depth; i++)
+ ancestor_used_tables[i]= ancestor_used_tables[i+1];
+ depth--;
+
+ if (furthest_correlated_ancestor)
+ furthest_correlated_ancestor--;
+ used_tables_cache &= ~OUTER_REF_TABLE_BIT;
+ if (furthest_correlated_ancestor > 1)
+ used_tables_cache |= OUTER_REF_TABLE_BIT;
+ const_item_cache &= test(!(used_tables_cache &
+ ~new_parent->join->const_table_map));
+}
+
+
bool Item_subselect::walk(Item_processor processor, bool walk_subquery,
uchar *argument)
{
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2008-11-10 18:36:50 +0000
+++ b/sql/item_subselect.h 2009-07-06 08:18:26 +0000
@@ -66,9 +66,39 @@
/* work with 'substitution' */
bool have_to_be_excluded;
/* cache of constant state */
+
bool const_item_cache;
+ int inside_fix_fields;
+public:
+ /*
+ Depth of the subquery predicate.
+ If the subquery predicate is attached to some clause of the top-level
+ select, depth will be 1.
+ If it is attached to a clause in a subquery of the top-level select, depth
+ will be 2 and so forth.
+ */
+ uint depth;
+
+ /*
+ Maximum correlation level of the select
+ - select that has no references to outside will have 0,
+ - select that references tables in the select it is located will have 1,
+ - select that has references to tables of its parent select will have 2,
+ - select that has references to tables of grandparent will have 3
+ and so forth.
+ */
+ uint furthest_correlated_ancestor;
+ /*
+ This is used_tables() for non-direct ancestors. That is,
+ - used_tables() shows which tables of the parent select are referred to
+ from within the subquery,
+ - ancestor_used_tables[0] shows which tables of the grandparent select are
+ referred to from within the subquery,
+ - ancestor_used_tables[1] shows which tables of the great grand parent
+ select... and so forth.
+ */
+ table_map *ancestor_used_tables;
-public:
/* changed engine indicator */
bool engine_changed;
/* subquery is transformed */
@@ -84,6 +114,7 @@
Item_subselect();
virtual subs_type substype() { return UNKNOWN_SUBS; }
+ void set_depth();
/*
We need this method, because some compilers do not allow 'this'
@@ -109,6 +140,8 @@
return null_value;
}
bool fix_fields(THD *thd, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
virtual bool exec();
virtual void fix_length_and_dec();
table_map used_tables() const;
=== modified file 'sql/item_sum.cc'
--- a/sql/item_sum.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item_sum.cc 2009-07-06 08:18:26 +0000
@@ -350,7 +350,7 @@
sl= sl->master_unit()->outer_select() )
sl->master_unit()->item->with_sum_func= 1;
}
- thd->lex->current_select->mark_as_dependent(aggr_sel);
+ thd->lex->current_select->mark_as_dependent(aggr_sel, 0);
return FALSE;
}
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2009-06-04 06:27:44 +0000
+++ b/sql/sql_lex.cc 2009-07-06 08:18:26 +0000
@@ -1901,8 +1901,9 @@
'last' should be reachable from this st_select_lex_node
*/
-void st_select_lex::mark_as_dependent(st_select_lex *last)
+void st_select_lex::mark_as_dependent(st_select_lex *last, table_map dep_map)
{
+ uint n_levels= master_unit()->item->depth;
/*
Mark all selects from resolved to 1 before select where was
found table as depended (of select where was found table)
@@ -1928,7 +1929,14 @@
}
Item_subselect *subquery_predicate= s->master_unit()->item;
if (subquery_predicate)
+ {
subquery_predicate->is_correlated= TRUE;
+ subquery_predicate->furthest_correlated_ancestor=
+ max(subquery_predicate->furthest_correlated_ancestor, n_levels);
+ if (n_levels > 1)
+ subquery_predicate->ancestor_used_tables[n_levels - 2]= dep_map;
+ }
+ n_levels--;
}
}
=== modified file 'sql/sql_lex.h'
--- a/sql/sql_lex.h 2009-06-12 02:01:08 +0000
+++ b/sql/sql_lex.h 2009-07-06 08:18:26 +0000
@@ -755,7 +755,7 @@
return master_unit()->return_after_parsing();
}
- void mark_as_dependent(st_select_lex *last);
+ void mark_as_dependent(st_select_lex *last, table_map dep_map);
bool set_braces(bool value);
bool inc_in_sum_expr();
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-07-04 00:44:50 +0000
+++ b/sql/sql_select.cc 2009-07-06 08:18:26 +0000
@@ -3122,16 +3122,23 @@
}
-void fix_list_after_tbl_changes(SELECT_LEX *new_parent, List<TABLE_LIST> *tlist)
+void fix_list_after_tbl_changes(SELECT_LEX *new_parent, uint parent_tables,
+ List<TABLE_LIST> *tlist)
{
List_iterator<TABLE_LIST> it(*tlist);
TABLE_LIST *table;
while ((table= it++))
{
if (table->on_expr)
- table->on_expr->fix_after_pullout(new_parent, &table->on_expr);
+ {
+ table->on_expr->fix_after_pullout(new_parent, parent_tables,
+ &table->on_expr);
+ }
if (table->nested_join)
- fix_list_after_tbl_changes(new_parent, &table->nested_join->join_list);
+ {
+ fix_list_after_tbl_changes(new_parent, parent_tables,
+ &table->nested_join->join_list);
+ }
}
}
@@ -3334,6 +3341,7 @@
/*TODO: also reset the 'with_subselect' there. */
/* n. Adjust the parent_join->tables counter */
+ uint parent_tables= parent_join->tables;
uint table_no= parent_join->tables;
/* n. Walk through child's tables and adjust table->map */
for (tl= subq_lex->leaf_tables; tl; tl= tl->next_leaf, table_no++)
@@ -3410,8 +3418,10 @@
Fix attributes (mainly item->table_map()) for sj-nest's WHERE and ON
expressions.
*/
- sj_nest->sj_on_expr->fix_after_pullout(parent_lex, &sj_nest->sj_on_expr);
- fix_list_after_tbl_changes(parent_lex, &sj_nest->nested_join->join_list);
+ sj_nest->sj_on_expr->fix_after_pullout(parent_lex, parent_join->tables,
+ &sj_nest->sj_on_expr);
+ fix_list_after_tbl_changes(parent_lex, parent_join->tables,
+ &sj_nest->nested_join->join_list);
/* Unlink the child select_lex so it doesn't show up in EXPLAIN: */
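
Note on the sql_lex.cc hunk above: the new code in st_select_lex::mark_as_dependent() records, for each subquery predicate that an outer reference passes through, which ancestor's tables are referenced. A minimal standalone sketch of that bookkeeping follows. PredicateModel and model_mark_as_dependent are hypothetical names, not server code. The sketch indexes by distance to the resolving select and accumulates with |=, both of which are choices of the sketch and not necessarily what the patch does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;

// Hypothetical model of the per-predicate data the patch collects.
struct PredicateModel
{
  table_map used_tables;                 // tables of the parent select
  std::vector<table_map> ancestor_used;  // [0] = grandparent, [1] = next one out, ...
  unsigned furthest;                     // 0 = uncorrelated, 1 = parent, 2 = grandparent, ...
  PredicateModel() : used_tables(0), furthest(0) {}
};

// preds[k] models the subquery predicate that owns the select at nesting
// depth k+1 (depth 0 is the top-level select).
// A reference made in the select at depth from_depth was resolved against
// tables dep_map of the select at depth to_depth (to_depth < from_depth).
// Precondition: preds.size() >= from_depth.
void model_mark_as_dependent(std::vector<PredicateModel> &preds,
                             unsigned from_depth, unsigned to_depth,
                             table_map dep_map)
{
  // Walk outwards over every predicate the reference crosses.
  for (unsigned depth = from_depth; depth > to_depth; depth--)
  {
    PredicateModel &p = preds[depth - 1];
    unsigned distance = depth - to_depth;  // 1 = parent, 2 = grandparent, ...
    p.furthest = std::max(p.furthest, distance);
    if (distance == 1)
      p.used_tables |= dep_map;
    else
    {
      if (p.ancestor_used.size() < distance - 1)
        p.ancestor_used.resize(distance - 1, 0);
      p.ancestor_used[distance - 2] |= dep_map;
    }
  }
}
```

For the BUG#31480 test query, the reference to t1.b in SELECT #3 would be modelled as model_mark_as_dependent(preds, 2, 0, 1), assuming t1 has table bit 1 in the top-level select.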
[Maria-developers] Rev 2815: BUG#31480: Incorrect result for nested subquery when executed via semi join in file:///home/psergey/dev/mysql-next-fix-subq/
by Sergey Petrunya 06 Jul '09
At file:///home/psergey/dev/mysql-next-fix-subq/
------------------------------------------------------------
revno: 2815
revision-id: psergey(a)askmonty.org-20090706075739-ay9m392esf31wx0s
parent: psergey(a)askmonty.org-20090704004450-4pqbx9pm50bzky0l
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next-fix-subq
timestamp: Mon 2009-07-06 11:57:39 +0400
message:
BUG#31480: Incorrect result for nested subquery when executed via semi join
=== modified file 'mysql-test/r/subselect_sj.result'
--- a/mysql-test/r/subselect_sj.result 2009-03-19 17:03:58 +0000
+++ b/mysql-test/r/subselect_sj.result 2009-07-06 07:57:39 +0000
@@ -327,3 +327,48 @@
HAVING X > '2012-12-12';
X
drop table t1, t2;
+#
+# BUG#31480: Incorrect result for nested subquery when executed via semi join
+#
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 6 100.00 Start temporary
+1 PRIMARY t1 ALL NULL NULL NULL NULL 7 100.00 Using where; End temporary; Using join buffer
+3 DEPENDENT SUBQUERY t3 ALL NULL NULL NULL NULL 4 100.00 Using where
+Warnings:
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+show warnings;
+Level Code Message
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+a
+2
+2
+3
+2
+drop table t1, t2, t3;
=== modified file 'mysql-test/r/subselect_sj_jcl6.result'
--- a/mysql-test/r/subselect_sj_jcl6.result 2009-03-19 17:03:58 +0000
+++ b/mysql-test/r/subselect_sj_jcl6.result 2009-07-06 07:57:39 +0000
@@ -331,6 +331,51 @@
HAVING X > '2012-12-12';
X
drop table t1, t2;
+#
+# BUG#31480: Incorrect result for nested subquery when executed via semi join
+#
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 6 100.00 Start temporary
+1 PRIMARY t1 ALL NULL NULL NULL NULL 7 100.00 Using where; End temporary; Using join buffer
+3 DEPENDENT SUBQUERY t3 ALL NULL NULL NULL NULL 4 100.00 Using where
+Warnings:
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+show warnings;
+Level Code Message
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+a
+2
+2
+3
+2
+drop table t1, t2, t3;
set join_cache_level=default;
show variables like 'join_cache_level';
Variable_name Value
=== modified file 'mysql-test/t/subselect_sj.test'
--- a/mysql-test/t/subselect_sj.test 2009-03-19 17:03:58 +0000
+++ b/mysql-test/t/subselect_sj.test 2009-07-06 07:57:39 +0000
@@ -216,4 +216,39 @@
HAVING X > '2012-12-12';
drop table t1, t2;
-
+--echo #
+--echo # BUG#31480: Incorrect result for nested subquery when executed via semi join
+--echo #
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+show warnings;
+
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+
+drop table t1, t2, t3;
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item.cc 2009-07-06 07:57:39 +0000
@@ -2212,7 +2212,8 @@
}
-void Item_field::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_field::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
if (new_parent == depended_from)
depended_from= NULL;
@@ -3797,16 +3798,17 @@
static void mark_as_dependent(THD *thd, SELECT_LEX *last, SELECT_LEX *current,
Item_ident *resolved_item,
- Item_ident *mark_item)
+ Item_ident *mark_item, table_map dep_map)
{
const char *db_name= (resolved_item->db_name ?
resolved_item->db_name : "");
const char *table_name= (resolved_item->table_name ?
resolved_item->table_name : "");
+ //table_map dep_map = resolved_item->used_tables();
/* store pointer on SELECT_LEX from which item is dependent */
if (mark_item)
mark_item->depended_from= last;
- current->mark_as_dependent(last);
+ current->mark_as_dependent(last, dep_map);
if (thd->lex->describe & DESCRIBE_EXTENDED)
{
push_warning_printf(thd, MYSQL_ERROR::WARN_LEVEL_NOTE,
@@ -3864,21 +3866,26 @@
Item_subselect *prev_subselect_item=
previous_select->master_unit()->item;
Item_ident *dependent= resolved_item;
+ table_map found_used_tables;
if (found_field == view_ref_found)
{
Item::Type type= found_item->type();
+ found_used_tables= found_item->used_tables();
prev_subselect_item->used_tables_cache|=
- found_item->used_tables();
+ found_used_tables;
dependent= ((type == Item::REF_ITEM || type == Item::FIELD_ITEM) ?
(Item_ident*) found_item :
0);
}
else
+ {
+ found_used_tables= found_field->table->map;
prev_subselect_item->used_tables_cache|=
found_field->table->map;
+ }
prev_subselect_item->const_item_cache= 0;
mark_as_dependent(thd, last_select, current_sel, resolved_item,
- dependent);
+ dependent, found_used_tables);
}
}
@@ -4159,6 +4166,7 @@
SELECT_LEX *current_sel= (SELECT_LEX *) thd->lex->current_select;
Name_resolution_context *outer_context= 0;
SELECT_LEX *select= 0;
+ uint n_levels= 0;
/* Currently derived tables cannot be correlated */
if (current_sel->master_unit()->first_select()->linkage !=
DERIVED_TABLE_TYPE)
@@ -4251,7 +4259,8 @@
context->select_lex, this,
((ref_type == REF_ITEM ||
ref_type == FIELD_ITEM) ?
- (Item_ident*) (*reference) : 0));
+ (Item_ident*) (*reference) : 0),
+ (*from_field)->table->map);
return 0;
}
}
@@ -4266,7 +4275,8 @@
context->select_lex, this,
((ref_type == REF_ITEM || ref_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
- 0));
+ 0),
+ (*reference)->used_tables());
/*
A reference to a view field had been found and we
substituted it instead of this Item (find_field_in_tables
@@ -4300,6 +4310,7 @@
*/
prev_subselect_item->used_tables_cache|= OUTER_REF_TABLE_BIT;
prev_subselect_item->const_item_cache= 0;
+ n_levels++;
}
DBUG_ASSERT(ref != 0);
@@ -4367,14 +4378,15 @@
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex, this,
- rf);
+ rf, rf->used_tables());
return 0;
}
else
{
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex,
- this, (Item_ident*)*reference);
+ this, (Item_ident*)*reference,
+ (*reference)->used_tables());
if (last_checked_context->select_lex->having_fix_field)
{
Item_ref *rf;
@@ -6084,7 +6096,8 @@
((refer_type == REF_ITEM ||
refer_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
- 0));
+ 0),
+ (*reference)->used_tables());
/*
view reference found, we substituted it instead of this
Item, so can quit
@@ -6134,7 +6147,8 @@
goto error;
thd->change_item_tree(reference, fld);
mark_as_dependent(thd, last_checked_context->select_lex,
- thd->lex->current_select, this, fld);
+ thd->lex->current_select, this, fld,
+ from_field->table->map);
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
@@ -6157,7 +6171,8 @@
/* Should be checked in resolve_ref_in_select_and_group(). */
DBUG_ASSERT(*ref && (*ref)->fixed);
mark_as_dependent(thd, last_checked_context->select_lex,
- context->select_lex, this, this);
+ context->select_lex, this, this,
+ (*ref)->used_tables());
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
@@ -6568,20 +6583,22 @@
return err;
}
-void Item_outer_ref::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_outer_ref::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
if (depended_from == new_parent)
{
*ref= outer_ref;
- outer_ref->fix_after_pullout(new_parent, ref);
+ outer_ref->fix_after_pullout(new_parent, parent_tables, ref);
}
}
-void Item_ref::fix_after_pullout(st_select_lex *new_parent, Item **refptr)
+void Item_ref::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **refptr)
{
if (depended_from == new_parent)
{
- (*ref)->fix_after_pullout(new_parent, ref);
+ (*ref)->fix_after_pullout(new_parent, parent_tables, ref);
depended_from= NULL;
}
}
=== modified file 'sql/item.h'
--- a/sql/item.h 2009-05-25 10:10:18 +0000
+++ b/sql/item.h 2009-07-06 07:57:39 +0000
@@ -557,7 +557,8 @@
Fix after some tables has been pulled out. Basically re-calculate all
attributes that are dependent on the tables.
*/
- virtual void fix_after_pullout(st_select_lex *new_parent, Item **ref) {};
+ virtual void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref) {};
/*
should be used in case where we are sure that we do not need
@@ -1486,7 +1487,8 @@
bool send(Protocol *protocol, String *str_arg);
void reset_field(Field *f);
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
void make_field(Send_field *tmp_field);
int save_in_field(Field *field,bool no_conversions);
void save_org_in_field(Field *field);
@@ -2278,7 +2280,8 @@
bool send(Protocol *prot, String *tmp);
void make_field(Send_field *field);
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
int save_in_field(Field *field, bool no_conversions);
void save_org_in_field(Field *field);
enum Item_result result_type () const { return (*ref)->result_type(); }
@@ -2448,7 +2451,8 @@
outer_ref->save_org_in_field(result_field);
}
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
table_map used_tables() const
{
return (*ref)->const_item() ? 0 : OUTER_REF_TABLE_BIT;
=== modified file 'sql/item_cmpfunc.cc'
--- a/sql/item_cmpfunc.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item_cmpfunc.cc 2009-07-06 07:57:39 +0000
@@ -4004,7 +4004,8 @@
}
-void Item_cond::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_cond::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
List_iterator<Item> li(list);
Item *item;
@@ -4018,7 +4019,7 @@
while ((item=li++))
{
table_map tmp_table_map;
- item->fix_after_pullout(new_parent, li.ref());
+ item->fix_after_pullout(new_parent, parent_tables, li.ref());
item= *li.ref();
used_tables_cache|= item->used_tables();
const_item_cache&= item->const_item();
=== modified file 'sql/item_cmpfunc.h'
--- a/sql/item_cmpfunc.h 2009-01-26 16:03:39 +0000
+++ b/sql/item_cmpfunc.h 2009-07-06 07:57:39 +0000
@@ -1475,7 +1475,8 @@
bool add_at_head(Item *item) { return list.push_front(item); }
void add_at_head(List<Item> *nlist) { list.prepand(nlist); }
bool fix_fields(THD *, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
enum Type type() const { return COND_ITEM; }
List<Item>* argument_list() { return &list; }
=== modified file 'sql/item_func.cc'
--- a/sql/item_func.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item_func.cc 2009-07-06 07:57:39 +0000
@@ -206,7 +206,8 @@
}
-void Item_func::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_func::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
Item **arg,**arg_end;
@@ -217,7 +218,7 @@
{
for (arg=args, arg_end=args+arg_count; arg != arg_end ; arg++)
{
- (*arg)->fix_after_pullout(new_parent, arg);
+ (*arg)->fix_after_pullout(new_parent, parent_tables, arg);
Item *item= *arg;
used_tables_cache|= item->used_tables();
=== modified file 'sql/item_func.h'
--- a/sql/item_func.h 2009-05-21 20:27:17 +0000
+++ b/sql/item_func.h 2009-07-06 07:57:39 +0000
@@ -117,7 +117,8 @@
// Constructor used for Item_cond_and/or (see Item comment)
Item_func(THD *thd, Item_func *item);
bool fix_fields(THD *, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
table_map used_tables() const;
table_map not_null_tables() const;
void update_used_tables();
=== modified file 'sql/item_row.cc'
--- a/sql/item_row.cc 2008-02-22 11:11:25 +0000
+++ b/sql/item_row.cc 2009-07-06 07:57:39 +0000
@@ -124,13 +124,14 @@
}
}
-void Item_row::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_row::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
used_tables_cache= 0;
const_item_cache= 1;
for (uint i= 0; i < arg_count; i++)
{
- items[i]->fix_after_pullout(new_parent, &items[i]);
+ items[i]->fix_after_pullout(new_parent, parent_tables, &items[i]);
used_tables_cache|= items[i]->used_tables();
const_item_cache&= items[i]->const_item();
}
=== modified file 'sql/item_row.h'
--- a/sql/item_row.h 2008-02-22 11:11:25 +0000
+++ b/sql/item_row.h 2009-07-06 07:57:39 +0000
@@ -59,7 +59,8 @@
return 0;
};
bool fix_fields(THD *thd, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
void cleanup();
void split_sum_func(THD *thd, Item **ref_pointer_array, List<Item> &fields);
table_map used_tables() const { return used_tables_cache; };
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-06-30 08:03:05 +0000
+++ b/sql/item_subselect.cc 2009-07-06 07:57:39 +0000
@@ -39,7 +39,7 @@
Item_subselect::Item_subselect():
Item_result_field(), value_assigned(0), thd(0), substitution(0),
engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0),
- const_item_cache(1), engine_changed(0), changed(0),
+ const_item_cache(1), inside_fix_fields(0), engine_changed(0), changed(0),
is_correlated(FALSE)
{
with_subselect= 1;
@@ -158,6 +158,13 @@
DBUG_RETURN(RES_OK);
}
+void Item_subselect::set_depth()
+{
+ uint n= 0;
+ for (SELECT_LEX *s= unit->first_select(); s; s= s->outer_select())
+ n++;
+ this->depth= n - 1;
+}
bool Item_subselect::fix_fields(THD *thd_param, Item **ref)
{
@@ -168,9 +175,19 @@
DBUG_ASSERT(fixed == 0);
engine->set_thd((thd= thd_param));
+ if (!inside_fix_fields)
+ {
+ set_depth();
+ if (!(ancestor_used_tables= (table_map*)thd->calloc((1+depth) *
+ sizeof(table_map))))
+ return TRUE;
+ furthest_correlated_ancestor= 0;
+ }
+
if (check_stack_overrun(thd, STACK_MIN_SIZE, (uchar*)&res))
return TRUE;
+ inside_fix_fields++;
res= engine->prepare();
// all transformation is done (used by prepared statements)
@@ -203,12 +220,14 @@
if (!(*ref)->fixed)
ret= (*ref)->fix_fields(thd, ref);
thd->where= save_where;
+ inside_fix_fields--;
return ret;
}
// Is it one field subselect?
if (engine->cols() > max_columns)
{
my_error(ER_OPERAND_COLUMNS, MYF(0), 1);
+ inside_fix_fields--;
return TRUE;
}
fix_length_and_dec();
@@ -225,11 +244,56 @@
fixed= 1;
err:
+ inside_fix_fields--;
thd->where= save_where;
return res;
}
+/*
+ Adjust attributes after our parent select has been merged into grandparent
+
+ DESCRIPTION
+ Subquery is a composite object which may be correlated, that is, it may
+ have
+ 1. references to tables of the parent select (i.e. one that has the clause
+ with the subquery predicate)
+ 2. references to tables of the grandparent select
+ 3. references to tables of further ancestors.
+
+ Before the pullout, this item indicates:
+ - #1 with table bits in used_tables()
+ - #2 and #3 with OUTER_REF_TABLE_BIT.
+
+ After parent has been merged with grandparent:
+ - references to parent and grandparent tables should be indicated with
+ table bits.
+ - references to greatgrandparent and further ancestors - with
+ OUTER_REF_TABLE_BIT.
+
+ This is exactly what this function does, based on pre-collected info in
+ ancestor_used_tables and furthest_correlated_ancestor.
+*/
+
+void Item_subselect::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
+{
+ used_tables_cache= (used_tables_cache << parent_tables) |
+ ancestor_used_tables[0];
+ for (uint i=0; i < depth; i++)
+ ancestor_used_tables[i]= ancestor_used_tables[i+1];
+ depth--;
+
+ if (furthest_correlated_ancestor)
+ furthest_correlated_ancestor--;
+ used_tables_cache &= ~OUTER_REF_TABLE_BIT;
+ if (furthest_correlated_ancestor > 1)
+ used_tables_cache |= OUTER_REF_TABLE_BIT;
+ const_item_cache &= test(!(used_tables_cache &
+ ~new_parent->join->const_table_map));
+}
+
+
bool Item_subselect::walk(Item_processor processor, bool walk_subquery,
uchar *argument)
{
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2008-11-10 18:36:50 +0000
+++ b/sql/item_subselect.h 2009-07-06 07:57:39 +0000
@@ -66,9 +66,39 @@
/* work with 'substitution' */
bool have_to_be_excluded;
/* cache of constant state */
+
bool const_item_cache;
+ int inside_fix_fields;
+public:
+ /*
+ Depth of the subquery predicate.
+ If the subquery predicate is attached to some clause of the top-level
+ select, depth will be 1
+ If it is attached to a clause in a subquery of the top-level select, depth
+ will be 2 and so forth.
+ */
+ uint depth;
+
+ /*
+ Maximum correlation level of the select
+ - select that has no references to outside will have 0,
+ - select that references tables in the select it is located will have 1,
+ - select that has references to tables of its parent select will have 2,
+ - select that has references to tables of grandparent will have 3
+ and so forth.
+ */
+ uint furthest_correlated_ancestor;
+ /*
+ This is used_tables() for non-direct ancestors. That is,
+ - used_tables() shows which tables of the parent select are referred to
+ from within the subquery,
+ - ancestor_used_tables[0] shows which tables of the grandparent select are
+ referred to from within the subquery,
+ - ancestor_used_tables[1] shows which tables of the great grand parent
+ select... and so forth.
+ */
+ table_map *ancestor_used_tables;
-public:
/* changed engine indicator */
bool engine_changed;
/* subquery is transformed */
@@ -84,6 +114,7 @@
Item_subselect();
virtual subs_type substype() { return UNKNOWN_SUBS; }
+ void set_depth();
/*
We need this method, because some compilers do not allow 'this'
@@ -109,6 +140,8 @@
return null_value;
}
bool fix_fields(THD *thd, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
virtual bool exec();
virtual void fix_length_and_dec();
table_map used_tables() const;
=== modified file 'sql/item_sum.cc'
--- a/sql/item_sum.cc 2009-06-09 16:53:34 +0000
+++ b/sql/item_sum.cc 2009-07-06 07:57:39 +0000
@@ -350,7 +350,7 @@
sl= sl->master_unit()->outer_select() )
sl->master_unit()->item->with_sum_func= 1;
}
- thd->lex->current_select->mark_as_dependent(aggr_sel);
+ thd->lex->current_select->mark_as_dependent(aggr_sel, NULL);
return FALSE;
}
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2009-06-04 06:27:44 +0000
+++ b/sql/sql_lex.cc 2009-07-06 07:57:39 +0000
@@ -1901,8 +1901,9 @@
'last' should be reachable from this st_select_lex_node
*/
-void st_select_lex::mark_as_dependent(st_select_lex *last)
+void st_select_lex::mark_as_dependent(st_select_lex *last, table_map dep_map)
{
+ uint n_levels= master_unit()->item->depth;
/*
Mark all selects from resolved to 1 before select where was
found table as depended (of select where was found table)
@@ -1928,7 +1929,14 @@
}
Item_subselect *subquery_predicate= s->master_unit()->item;
if (subquery_predicate)
+ {
subquery_predicate->is_correlated= TRUE;
+ subquery_predicate->furthest_correlated_ancestor=
+ max(subquery_predicate->furthest_correlated_ancestor, n_levels);
+ if (n_levels > 1)
+ subquery_predicate->ancestor_used_tables[n_levels - 2]= dep_map;
+ }
+ n_levels--;
}
}
=== modified file 'sql/sql_lex.h'
--- a/sql/sql_lex.h 2009-06-12 02:01:08 +0000
+++ b/sql/sql_lex.h 2009-07-06 07:57:39 +0000
@@ -755,7 +755,7 @@
return master_unit()->return_after_parsing();
}
- void mark_as_dependent(st_select_lex *last);
+ void mark_as_dependent(st_select_lex *last, table_map dep_map);
bool set_braces(bool value);
bool inc_in_sum_expr();
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-07-04 00:44:50 +0000
+++ b/sql/sql_select.cc 2009-07-06 07:57:39 +0000
@@ -3122,16 +3122,23 @@
}
-void fix_list_after_tbl_changes(SELECT_LEX *new_parent, List<TABLE_LIST> *tlist)
+void fix_list_after_tbl_changes(SELECT_LEX *new_parent, uint parent_tables,
+ List<TABLE_LIST> *tlist)
{
List_iterator<TABLE_LIST> it(*tlist);
TABLE_LIST *table;
while ((table= it++))
{
if (table->on_expr)
- table->on_expr->fix_after_pullout(new_parent, &table->on_expr);
+ {
+ table->on_expr->fix_after_pullout(new_parent, parent_tables,
+ &table->on_expr);
+ }
if (table->nested_join)
- fix_list_after_tbl_changes(new_parent, &table->nested_join->join_list);
+ {
+ fix_list_after_tbl_changes(new_parent, parent_tables,
+ &table->nested_join->join_list);
+ }
}
}
@@ -3334,6 +3341,7 @@
/*TODO: also reset the 'with_subselect' there. */
/* n. Adjust the parent_join->tables counter */
+ uint parent_tables= parent_join->tables;
uint table_no= parent_join->tables;
/* n. Walk through child's tables and adjust table->map */
for (tl= subq_lex->leaf_tables; tl; tl= tl->next_leaf, table_no++)
@@ -3410,8 +3418,10 @@
Fix attributes (mainly item->table_map()) for sj-nest's WHERE and ON
expressions.
*/
- sj_nest->sj_on_expr->fix_after_pullout(parent_lex, &sj_nest->sj_on_expr);
- fix_list_after_tbl_changes(parent_lex, &sj_nest->nested_join->join_list);
+ sj_nest->sj_on_expr->fix_after_pullout(parent_lex, parent_join->tables,
+ &sj_nest->sj_on_expr);
+ fix_list_after_tbl_changes(parent_lex, parent_join->tables,
+ &sj_nest->nested_join->join_list);
/* Unlink the child select_lex so it doesn't show up in EXPLAIN: */
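
Note on Item_subselect::fix_after_pullout() above: its comment says that after the parent select is merged into the grandparent, references to parent and grandparent tables should both appear as ordinary table bits, and only references further out keep OUTER_REF_TABLE_BIT. A minimal standalone sketch of that bitmap arithmetic follows. SubqueryModel, model_fix_after_pullout and OUTER_REF_BIT are hypothetical stand-ins, not the server's definitions. The sketch derives the outer-reference flag from the remaining ancestor maps instead of from furthest_correlated_ancestor, which is a simplification.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;

// Hypothetical stand-in for the server's OUTER_REF_TABLE_BIT pseudo-table bit.
const table_map OUTER_REF_BIT = table_map(1) << 62;

struct SubqueryModel
{
  table_map used_tables;                 // parent-select tables, possibly plus OUTER_REF_BIT
  std::vector<table_map> ancestor_used;  // [0] = grandparent, [1] = great-grandparent, ...
};

// Model of merging the parent select into the grandparent, which already
// holds parent_tables tables (so the parent's tables are renumbered to
// start at bit parent_tables).
void model_fix_after_pullout(SubqueryModel &s, unsigned parent_tables)
{
  // Shift only real table bits; the pseudo bit is recomputed below.
  table_map real_bits = s.used_tables & ~OUTER_REF_BIT;
  table_map from_grandparent = s.ancestor_used.empty() ? 0 : s.ancestor_used[0];
  s.used_tables = (real_bits << parent_tables) | from_grandparent;

  // Every remaining ancestor is now one level closer.
  if (!s.ancestor_used.empty())
    s.ancestor_used.erase(s.ancestor_used.begin());

  // Keep the outer-reference flag only if some further ancestor is still used.
  for (std::size_t i = 0; i < s.ancestor_used.size(); i++)
  {
    if (s.ancestor_used[i])
    {
      s.used_tables |= OUTER_REF_BIT;
      break;
    }
  }
}
```

In terms of the test query: if the innermost predicate uses t2 (bit 1 in its parent select) and t1 (bit 1 in the top-level select), then after t2 is pulled into the top-level join with parent_tables = 1 the model yields table bits for both t1 and t2 and no outer-reference flag.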
[Maria-developers] Rev 2697: BUG#31480: Incorrect result for nested subquery when executed via semi join in file:///home/psergey/dev/mysql-6.0-look/
by Sergey Petrunya 04 Jul '09
At file:///home/psergey/dev/mysql-6.0-look/
------------------------------------------------------------
revno: 2697
revision-id: psergey(a)askmonty.org-20090704040131-bzcjcds3siutn6sc
parent: jperkin(a)sun.com-20090423215644-h7ssug9w1hdgzn39
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-6.0-look
timestamp: Sat 2009-07-04 08:01:31 +0400
message:
BUG#31480: Incorrect result for nested subquery when executed via semi join
=== modified file 'mysql-test/r/subselect_sj.result'
--- a/mysql-test/r/subselect_sj.result 2009-03-19 17:03:58 +0000
+++ b/mysql-test/r/subselect_sj.result 2009-07-04 04:01:31 +0000
@@ -327,3 +327,48 @@
HAVING X > '2012-12-12';
X
drop table t1, t2;
+#
+# BUG#31480: Incorrect result for nested subquery when executed via semi join
+#
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 6 100.00 Start temporary
+1 PRIMARY t1 ALL NULL NULL NULL NULL 7 100.00 Using where; End temporary; Using join buffer
+3 DEPENDENT SUBQUERY t3 ALL NULL NULL NULL NULL 4 100.00 Using where
+Warnings:
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+show warnings;
+Level Code Message
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+a
+2
+2
+3
+2
+drop table t1, t2, t3;
=== modified file 'mysql-test/r/subselect_sj_jcl6.result'
--- a/mysql-test/r/subselect_sj_jcl6.result 2009-03-19 17:03:58 +0000
+++ b/mysql-test/r/subselect_sj_jcl6.result 2009-07-04 04:01:31 +0000
@@ -331,6 +331,51 @@
HAVING X > '2012-12-12';
X
drop table t1, t2;
+#
+# BUG#31480: Incorrect result for nested subquery when executed via semi join
+#
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY t2 ALL NULL NULL NULL NULL 6 100.00 Start temporary
+1 PRIMARY t1 ALL NULL NULL NULL NULL 7 100.00 Using where; End temporary; Using join buffer
+3 DEPENDENT SUBQUERY t3 ALL NULL NULL NULL NULL 4 100.00 Using where
+Warnings:
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+show warnings;
+Level Code Message
+Note 1276 Field or reference 'test.t1.b' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`c`) and <nop>(<in_optimizer>(`test`.`t2`.`d`,<exists>(select 1 AS `Not_used` from `test`.`t3` where ((`test`.`t1`.`b` = `test`.`t3`.`e`) and (<cache>(`test`.`t2`.`d`) >= `test`.`t3`.`e`))))))
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+a
+2
+2
+3
+2
+drop table t1, t2, t3;
set join_cache_level=default;
show variables like 'join_cache_level';
Variable_name Value
=== modified file 'mysql-test/t/subselect_sj.test'
--- a/mysql-test/t/subselect_sj.test 2009-03-19 17:03:58 +0000
+++ b/mysql-test/t/subselect_sj.test 2009-07-04 04:01:31 +0000
@@ -216,4 +216,39 @@
HAVING X > '2012-12-12';
drop table t1, t2;
-
+--echo #
+--echo # BUG#31480: Incorrect result for nested subquery when executed via semi join
+--echo #
+create table t1 (a int not null, b int not null);
+create table t2 (c int not null, d int not null);
+create table t3 (e int not null);
+
+insert into t1 values (1,10);
+insert into t1 values (2,10);
+insert into t1 values (1,20);
+insert into t1 values (2,20);
+insert into t1 values (3,20);
+insert into t1 values (2,30);
+insert into t1 values (4,40);
+
+insert into t2 values (2,10);
+insert into t2 values (2,20);
+insert into t2 values (4,10);
+insert into t2 values (5,10);
+insert into t2 values (3,20);
+insert into t2 values (2,40);
+
+insert into t3 values (10);
+insert into t3 values (30);
+insert into t3 values (10);
+insert into t3 values (20);
+
+explain extended
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+show warnings;
+
+select a from t1
+where a in (select c from t2 where d >= some(select e from t3 where b=e));
+
+drop table t1, t2, t3;
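
The expected rows in the .result file can be cross-checked without the optimizer. The following is a minimal sketch, not part of the patch, that evaluates the test query with plain nested loops. The names T1Row, T2Row, eval_nested_subquery and eval_bug31480_test_data are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

struct T1Row { int a, b; };
struct T2Row { int c, d; };

/*
  Reference evaluation of

    select a from t1
    where a in (select c from t2 where d >= some(select e from t3 where b=e));

  with plain nested loops. All columns are NOT NULL in the test, so no
  three-valued logic is needed here.
*/
std::vector<int> eval_nested_subquery(const std::vector<T1Row> &t1,
                                      const std::vector<T2Row> &t2,
                                      const std::vector<int> &t3)
{
  std::vector<int> result;
  for (const T1Row &r1 : t1)
  {
    bool in_subquery= false;
    for (const T2Row &r2 : t2)
    {
      if (r2.c != r1.a)
        continue;                       // "a in (select c ...)" needs c = a
      for (int e : t3)
      {
        // "d >= some(select e from t3 where b=e)"
        if (e == r1.b && r2.d >= e)
        {
          in_subquery= true;
          break;
        }
      }
      if (in_subquery)
        break;
    }
    if (in_subquery)
      result.push_back(r1.a);           // one output row per matching t1 row
  }
  return result;
}

/* Runs the reference evaluation on the rows inserted by the test case. */
std::vector<int> eval_bug31480_test_data()
{
  const std::vector<T1Row> t1= {{1,10}, {2,10}, {1,20}, {2,20},
                                {3,20}, {2,30}, {4,40}};
  const std::vector<T2Row> t2= {{2,10}, {2,20}, {4,10}, {5,10},
                                {3,20}, {2,40}};
  const std::vector<int> t3= {10, 30, 10, 20};
  assert(t1.size() == 7 && t2.size() == 6 && t3.size() == 4);
  return eval_nested_subquery(t1, t2, t3);
}
```

On the test data this yields 2, 2, 3, 2 in t1 scan order, which is the same set of rows the .result file records. Without an ORDER BY the server is free to return them in another order.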
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2009-04-03 15:14:49 +0000
+++ b/sql/item.cc 2009-07-04 04:01:31 +0000
@@ -2174,7 +2174,8 @@
}
-void Item_field::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_field::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
if (new_parent == depended_from)
depended_from= NULL;
@@ -3559,16 +3560,17 @@
static void mark_as_dependent(THD *thd, SELECT_LEX *last, SELECT_LEX *current,
Item_ident *resolved_item,
- Item_ident *mark_item)
+ Item_ident *mark_item, table_map dep_map)
{
const char *db_name= (resolved_item->db_name ?
resolved_item->db_name : "");
const char *table_name= (resolved_item->table_name ?
resolved_item->table_name : "");
+ //table_map dep_map = resolved_item->used_tables();
/* store pointer on SELECT_LEX from which item is dependent */
if (mark_item)
mark_item->depended_from= last;
- current->mark_as_dependent(last);
+ current->mark_as_dependent(last, dep_map);
if (thd->lex->describe & DESCRIBE_EXTENDED)
{
char warn_buff[MYSQL_ERRMSG_SIZE];
@@ -3628,21 +3630,26 @@
Item_subselect *prev_subselect_item=
previous_select->master_unit()->item;
Item_ident *dependent= resolved_item;
+ table_map found_used_tables;
if (found_field == view_ref_found)
{
Item::Type type= found_item->type();
+ found_used_tables= found_item->used_tables();
prev_subselect_item->used_tables_cache|=
- found_item->used_tables();
+ found_used_tables;
dependent= ((type == Item::REF_ITEM || type == Item::FIELD_ITEM) ?
(Item_ident*) found_item :
0);
}
else
+ {
+ found_used_tables= found_field->table->map;
prev_subselect_item->used_tables_cache|=
found_field->table->map;
+ }
prev_subselect_item->const_item_cache= 0;
mark_as_dependent(thd, last_select, current_sel, resolved_item,
- dependent);
+ dependent, found_used_tables);
}
}
@@ -3923,6 +3930,7 @@
SELECT_LEX *current_sel= (SELECT_LEX *) thd->lex->current_select;
Name_resolution_context *outer_context= 0;
SELECT_LEX *select= 0;
+ uint n_levels= 0;
/* Currently derived tables cannot be correlated */
if (current_sel->master_unit()->first_select()->linkage !=
DERIVED_TABLE_TYPE)
@@ -4015,7 +4023,8 @@
context->select_lex, this,
((ref_type == REF_ITEM ||
ref_type == FIELD_ITEM) ?
- (Item_ident*) (*reference) : 0));
+ (Item_ident*) (*reference) : 0),
+ (*from_field)->table->map);
return 0;
}
}
@@ -4030,7 +4039,8 @@
context->select_lex, this,
((ref_type == REF_ITEM || ref_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
- 0));
+ 0),
+ (*reference)->used_tables());
/*
A reference to a view field had been found and we
substituted it instead of this Item (find_field_in_tables
@@ -4064,6 +4074,7 @@
*/
prev_subselect_item->used_tables_cache|= OUTER_REF_TABLE_BIT;
prev_subselect_item->const_item_cache= 0;
+ n_levels++;
}
DBUG_ASSERT(ref != 0);
@@ -4131,14 +4142,15 @@
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex, this,
- rf);
+ rf, rf->used_tables());
return 0;
}
else
{
mark_as_dependent(thd, last_checked_context->select_lex,
context->select_lex,
- this, (Item_ident*)*reference);
+ this, (Item_ident*)*reference,
+ (*reference)->used_tables());
if (last_checked_context->select_lex->having_fix_field)
{
Item_ref *rf;
@@ -5840,7 +5852,8 @@
((refer_type == REF_ITEM ||
refer_type == FIELD_ITEM) ?
(Item_ident*) (*reference) :
- 0));
+ 0),
+ (*reference)->used_tables());
/*
view reference found, we substituted it instead of this
Item, so can quit
@@ -5890,7 +5903,8 @@
goto error;
thd->change_item_tree(reference, fld);
mark_as_dependent(thd, last_checked_context->select_lex,
- thd->lex->current_select, this, fld);
+ thd->lex->current_select, this, fld,
+ from_field->table->map);
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
@@ -5913,7 +5927,8 @@
/* Should be checked in resolve_ref_in_select_and_group(). */
DBUG_ASSERT(*ref && (*ref)->fixed);
mark_as_dependent(thd, last_checked_context->select_lex,
- context->select_lex, this, this);
+ context->select_lex, this, this,
+ (*ref)->used_tables());
/*
A reference is resolved to a nest level that's outer or the same as
the nest level of the enclosing set function : adjust the value of
@@ -6323,20 +6338,22 @@
return err;
}
-void Item_outer_ref::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_outer_ref::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
if (depended_from == new_parent)
{
*ref= outer_ref;
- outer_ref->fix_after_pullout(new_parent, ref);
+ outer_ref->fix_after_pullout(new_parent, parent_tables, ref);
}
}
-void Item_ref::fix_after_pullout(st_select_lex *new_parent, Item **refptr)
+void Item_ref::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **refptr)
{
if (depended_from == new_parent)
{
- (*ref)->fix_after_pullout(new_parent, ref);
+ (*ref)->fix_after_pullout(new_parent, parent_tables, ref);
depended_from= NULL;
}
}
=== modified file 'sql/item.h'
--- a/sql/item.h 2009-04-03 15:14:49 +0000
+++ b/sql/item.h 2009-07-04 04:01:31 +0000
@@ -557,7 +557,8 @@
Fix after some tables has been pulled out. Basically re-calculate all
attributes that are dependent on the tables.
*/
- virtual void fix_after_pullout(st_select_lex *new_parent, Item **ref) {};
+ virtual void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref) {};
/*
should be used in case where we are sure that we do not need
@@ -1486,7 +1487,8 @@
bool send(Protocol *protocol, String *str_arg);
void reset_field(Field *f);
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
void make_field(Send_field *tmp_field);
int save_in_field(Field *field,bool no_conversions);
void save_org_in_field(Field *field);
@@ -2278,7 +2280,8 @@
bool send(Protocol *prot, String *tmp);
void make_field(Send_field *field);
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
int save_in_field(Field *field, bool no_conversions);
void save_org_in_field(Field *field);
enum Item_result result_type () const { return (*ref)->result_type(); }
@@ -2448,7 +2451,8 @@
outer_ref->save_org_in_field(result_field);
}
bool fix_fields(THD *, Item **);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
table_map used_tables() const
{
return (*ref)->const_item() ? 0 : OUTER_REF_TABLE_BIT;
=== modified file 'sql/item_cmpfunc.cc'
--- a/sql/item_cmpfunc.cc 2009-04-01 21:36:07 +0000
+++ b/sql/item_cmpfunc.cc 2009-07-04 04:01:31 +0000
@@ -4013,7 +4013,8 @@
}
-void Item_cond::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_cond::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
List_iterator<Item> li(list);
Item *item;
@@ -4027,7 +4028,7 @@
while ((item=li++))
{
table_map tmp_table_map;
- item->fix_after_pullout(new_parent, li.ref());
+ item->fix_after_pullout(new_parent, parent_tables, li.ref());
item= *li.ref();
used_tables_cache|= item->used_tables();
const_item_cache&= item->const_item();
=== modified file 'sql/item_cmpfunc.h'
--- a/sql/item_cmpfunc.h 2009-01-26 16:03:39 +0000
+++ b/sql/item_cmpfunc.h 2009-07-04 04:01:31 +0000
@@ -1475,7 +1475,8 @@
bool add_at_head(Item *item) { return list.push_front(item); }
void add_at_head(List<Item> *nlist) { list.prepand(nlist); }
bool fix_fields(THD *, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
enum Type type() const { return COND_ITEM; }
List<Item>* argument_list() { return &list; }
=== modified file 'sql/item_func.cc'
--- a/sql/item_func.cc 2009-04-13 13:24:28 +0000
+++ b/sql/item_func.cc 2009-07-04 04:01:31 +0000
@@ -206,7 +206,8 @@
}
-void Item_func::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_func::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
Item **arg,**arg_end;
@@ -217,7 +218,7 @@
{
for (arg=args, arg_end=args+arg_count; arg != arg_end ; arg++)
{
- (*arg)->fix_after_pullout(new_parent, arg);
+ (*arg)->fix_after_pullout(new_parent, parent_tables, arg);
Item *item= *arg;
used_tables_cache|= item->used_tables();
=== modified file 'sql/item_func.h'
--- a/sql/item_func.h 2009-02-13 16:30:54 +0000
+++ b/sql/item_func.h 2009-07-04 04:01:31 +0000
@@ -117,7 +117,8 @@
// Constructor used for Item_cond_and/or (see Item comment)
Item_func(THD *thd, Item_func *item);
bool fix_fields(THD *, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
table_map used_tables() const;
table_map not_null_tables() const;
void update_used_tables();
=== modified file 'sql/item_row.cc'
--- a/sql/item_row.cc 2008-02-22 11:11:25 +0000
+++ b/sql/item_row.cc 2009-07-04 04:01:31 +0000
@@ -124,13 +124,14 @@
}
}
-void Item_row::fix_after_pullout(st_select_lex *new_parent, Item **ref)
+void Item_row::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
{
used_tables_cache= 0;
const_item_cache= 1;
for (uint i= 0; i < arg_count; i++)
{
- items[i]->fix_after_pullout(new_parent, &items[i]);
+ items[i]->fix_after_pullout(new_parent, parent_tables, &items[i]);
used_tables_cache|= items[i]->used_tables();
const_item_cache&= items[i]->const_item();
}
=== modified file 'sql/item_row.h'
--- a/sql/item_row.h 2008-02-22 11:11:25 +0000
+++ b/sql/item_row.h 2009-07-04 04:01:31 +0000
@@ -59,7 +59,8 @@
return 0;
};
bool fix_fields(THD *thd, Item **ref);
- void fix_after_pullout(st_select_lex *new_parent, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
void cleanup();
void split_sum_func(THD *thd, Item **ref_pointer_array, List<Item> &fields);
table_map used_tables() const { return used_tables_cache; };
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-01-08 19:06:44 +0000
+++ b/sql/item_subselect.cc 2009-07-04 04:01:31 +0000
@@ -39,7 +39,7 @@
Item_subselect::Item_subselect():
Item_result_field(), value_assigned(0), thd(0), substitution(0),
engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0),
- const_item_cache(1), engine_changed(0), changed(0),
+ const_item_cache(1), inside_fix_fields(0), engine_changed(0), changed(0),
is_correlated(FALSE)
{
with_subselect= 1;
@@ -158,6 +158,13 @@
DBUG_RETURN(RES_OK);
}
+void Item_subselect::set_depth()
+{
+ uint n= 0;
+ for (SELECT_LEX *s= unit->first_select(); s; s= s->outer_select())
+ n++;
+ this->depth= n - 1;
+}
bool Item_subselect::fix_fields(THD *thd_param, Item **ref)
{
@@ -168,9 +175,19 @@
DBUG_ASSERT(fixed == 0);
engine->set_thd((thd= thd_param));
+ if (!inside_fix_fields)
+ {
+ set_depth();
+ if (!(ancestor_used_tables= (table_map*)thd->calloc((1+depth) *
+ sizeof(table_map))))
+ return TRUE;
+ furthest_correlated_ancestor= 0;
+ }
+
if (check_stack_overrun(thd, STACK_MIN_SIZE, (uchar*)&res))
return TRUE;
+ inside_fix_fields++;
res= engine->prepare();
// all transformation is done (used by prepared statements)
@@ -203,12 +220,14 @@
if (!(*ref)->fixed)
ret= (*ref)->fix_fields(thd, ref);
thd->where= save_where;
+ inside_fix_fields--;
return ret;
}
// Is it one field subselect?
if (engine->cols() > max_columns)
{
my_error(ER_OPERAND_COLUMNS, MYF(0), 1);
+ inside_fix_fields--;
return TRUE;
}
fix_length_and_dec();
@@ -225,11 +244,56 @@
fixed= 1;
err:
+ inside_fix_fields--;
thd->where= save_where;
return res;
}
+/*
+ Adjust attributes after our parent select has been merged into grandparent
+
+ DESCRIPTION
+ Subquery is a composite object which may be correlated, that is, it may
+ have
+ 1. references to tables of the parent select (i.e. one that has the clause
+ with the subquery predicate)
+ 2. references to tables of the grandparent select
+ 3. references to tables of further ancestors.
+
+ Before the pullout, this item indicates:
+ - #1 with table bits in used_tables()
+ - #2 and #3 with OUTER_REF_TABLE_BIT.
+
+ After parent has been merged with grandparent:
+ - references to parent and grandparent tables should be indicated with
+ table bits.
+ - references to greatgrandparent and further ancestors - with
+ OUTER_REF_TABLE_BIT.
+
+ This is exactly what this function does, based on pre-collected info in
+ ancestor_used_tables and furthest_correlated_ancestor.
+*/
+
+void Item_subselect::fix_after_pullout(st_select_lex *new_parent,
+ uint parent_tables, Item **ref)
+{
+ used_tables_cache= (used_tables_cache << parent_tables) |
+ ancestor_used_tables[0];
+ for (uint i=0; i < depth; i++)
+ ancestor_used_tables[i]= ancestor_used_tables[i+1];
+ depth--;
+
+ if (furthest_correlated_ancestor)
+ furthest_correlated_ancestor--;
+ used_tables_cache &= ~OUTER_REF_TABLE_BIT;
+ if (furthest_correlated_ancestor > 1)
+ used_tables_cache |= OUTER_REF_TABLE_BIT;
+ const_item_cache &= test(!(used_tables_cache &
+ ~new_parent->join->const_table_map));
+}
+
+
bool Item_subselect::walk(Item_processor processor, bool walk_subquery,
uchar *argument)
{
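
The adjustment described in the comment of Item_subselect::fix_after_pullout() can be illustrated on a small standalone model. This is a sketch under assumptions, not the server code: SubqueryDeps, adjust_after_pullout and OUTER_REF_BIT are hypothetical stand-ins, and the shift amount is taken to be the number of tables the new parent had before the merge, which is how the call site appears to use parent_tables.

```cpp
#include <cassert>
#include <stdint.h>
#include <vector>

typedef uint64_t table_map;

/* Assumption: stand-in for the server's OUTER_REF_TABLE_BIT pseudo-table bit. */
static const table_map OUTER_REF_BIT= table_map(1) << 62;

/* Simplified, hypothetical model of the correlation info of one subquery. */
struct SubqueryDeps
{
  table_map used_tables;                        // parent table bits, plus OUTER_REF_BIT
  std::vector<table_map> ancestor_used_tables;  // [0]= grandparent, [1]= next one up, ...
  unsigned furthest_correlated_ancestor;        // 0= none, 1= parent, 2= grandparent, ...
};

/*
  Adjust the model after the parent select has been merged into the
  grandparent. grandparent_tables is the number of tables the grandparent had
  before the merge: the parent's tables are renumbered to come after them.
*/
void adjust_after_pullout(SubqueryDeps *s, unsigned grandparent_tables)
{
  assert(!s->ancestor_used_tables.empty());
  /* Drop the pseudo-bit before shifting; it is recomputed below. */
  table_map parent_bits= s->used_tables & ~OUTER_REF_BIT;
  s->used_tables= (parent_bits << grandparent_tables) |
                  s->ancestor_used_tables[0];
  /* Every remaining ancestor is now one level closer. */
  s->ancestor_used_tables.erase(s->ancestor_used_tables.begin());
  if (s->furthest_correlated_ancestor)
    s->furthest_correlated_ancestor--;
  /* References beyond the new parent are still "outer" references. */
  if (s->furthest_correlated_ancestor > 1)
    s->used_tables|= OUTER_REF_BIT;
}
```

For the BUG#31480 query, the innermost subquery refers to t2.d (parent, bit 0) and t1.b (grandparent, bit 0). With one grandparent table, the model ends up with both real table bits set and no outer-reference bit.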
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2008-11-10 18:36:50 +0000
+++ b/sql/item_subselect.h 2009-07-04 04:01:31 +0000
@@ -66,9 +66,39 @@
/* work with 'substitution' */
bool have_to_be_excluded;
/* cache of constant state */
+
bool const_item_cache;
+ int inside_fix_fields;
+public:
+ /*
+ Depth of the subquery predicate.
+ If the subquery predicate is attached to some clause of the top-level
+ select, depth will be 1.
+ If it is attached to a clause in a subquery of the top-level select, depth
+ will be 2 and so forth.
+ */
+ uint depth;
+
+ /*
+ Maximum correlation level of the select:
+ - a select that has no outside references will have 0,
+ - a select that references tables of the select it is located in will have 1,
+ - a select that references tables of its parent select will have 2,
+ - a select that references tables of the grandparent will have 3,
+ and so forth.
+ */
+ uint furthest_correlated_ancestor;
+ /*
+ This is used_tables() for non-direct ancestors. That is,
+ - used_tables() shows which tables of the parent select are referred to
+ from within the subquery,
+ - ancestor_used_tables[0] shows which tables of the grandparent select are
+ referred to from within the subquery,
+ - ancestor_used_tables[1] shows which tables of the great grand parent
+ select... and so forth.
+ */
+ table_map *ancestor_used_tables;
-public:
/* changed engine indicator */
bool engine_changed;
/* subquery is transformed */
@@ -84,6 +114,7 @@
Item_subselect();
virtual subs_type substype() { return UNKNOWN_SUBS; }
+ void set_depth();
/*
We need this method, because some compilers do not allow 'this'
@@ -109,6 +140,8 @@
return null_value;
}
bool fix_fields(THD *thd, Item **ref);
+ void fix_after_pullout(st_select_lex *new_parent, uint parent_tables,
+ Item **ref);
virtual bool exec();
virtual void fix_length_and_dec();
table_map used_tables() const;
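
The depth numbering documented above follows from counting the chain of enclosing selects, as set_depth() does. A minimal sketch of that counting, assuming a hypothetical SelectNode type that stands in for SELECT_LEX and keeps only the link to the outer select:

```cpp
#include <cassert>
#include <stddef.h>

/* Hypothetical stand-in for SELECT_LEX: only the link to the outer select. */
struct SelectNode
{
  const SelectNode *outer;   // NULL for the top-level select
};

/*
  Depth of a subquery predicate, given the first select of its unit:
  count the selects on the path to the top-level select, then subtract
  the subquery's own select.
*/
unsigned subquery_depth(const SelectNode *first_select)
{
  assert(first_select != NULL);
  unsigned n= 0;
  for (const SelectNode *s= first_select; s; s= s->outer)
    n++;
  return n - 1;
}
```

A subquery attached to the top-level select gets depth 1 and a subquery nested inside it gets depth 2, matching the comment on the depth member.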
=== modified file 'sql/item_sum.cc'
--- a/sql/item_sum.cc 2009-03-11 12:52:04 +0000
+++ b/sql/item_sum.cc 2009-07-04 04:01:31 +0000
@@ -350,7 +350,7 @@
sl= sl->master_unit()->outer_select() )
sl->master_unit()->item->with_sum_func= 1;
}
- thd->lex->current_select->mark_as_dependent(aggr_sel);
+ thd->lex->current_select->mark_as_dependent(aggr_sel, NULL);
return FALSE;
}
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2009-04-01 09:34:34 +0000
+++ b/sql/sql_lex.cc 2009-07-04 04:01:31 +0000
@@ -1835,8 +1835,9 @@
'last' should be reachable from this st_select_lex_node
*/
-void st_select_lex::mark_as_dependent(st_select_lex *last)
+void st_select_lex::mark_as_dependent(st_select_lex *last, table_map dep_map)
{
+ uint n_levels= master_unit()->item->depth;
/*
Mark all selects from resolved to 1 before select where was
found table as depended (of select where was found table)
@@ -1862,7 +1863,14 @@
}
Item_subselect *subquery_predicate= s->master_unit()->item;
if (subquery_predicate)
+ {
subquery_predicate->is_correlated= TRUE;
+ subquery_predicate->furthest_correlated_ancestor=
+ max(subquery_predicate->furthest_correlated_ancestor, n_levels);
+ if (n_levels > 1)
+ subquery_predicate->ancestor_used_tables[n_levels - 2]= dep_map;
+ }
+ n_levels--;
}
}
=== modified file 'sql/sql_lex.h'
--- a/sql/sql_lex.h 2009-03-19 16:42:23 +0000
+++ b/sql/sql_lex.h 2009-07-04 04:01:31 +0000
@@ -754,7 +754,7 @@
return master_unit()->return_after_parsing();
}
- void mark_as_dependent(st_select_lex *last);
+ void mark_as_dependent(st_select_lex *last, table_map dep_map);
bool set_braces(bool value);
bool inc_in_sum_expr();
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-04-13 09:51:24 +0000
+++ b/sql/sql_select.cc 2009-07-04 04:01:31 +0000
@@ -3119,16 +3119,23 @@
}
-void fix_list_after_tbl_changes(SELECT_LEX *new_parent, List<TABLE_LIST> *tlist)
+void fix_list_after_tbl_changes(SELECT_LEX *new_parent, uint parent_tables,
+ List<TABLE_LIST> *tlist)
{
List_iterator<TABLE_LIST> it(*tlist);
TABLE_LIST *table;
while ((table= it++))
{
if (table->on_expr)
- table->on_expr->fix_after_pullout(new_parent, &table->on_expr);
+ {
+ table->on_expr->fix_after_pullout(new_parent, parent_tables,
+ &table->on_expr);
+ }
if (table->nested_join)
- fix_list_after_tbl_changes(new_parent, &table->nested_join->join_list);
+ {
+ fix_list_after_tbl_changes(new_parent, parent_tables,
+ &table->nested_join->join_list);
+ }
}
}
@@ -3331,6 +3338,7 @@
/*TODO: also reset the 'with_subselect' there. */
/* n. Adjust the parent_join->tables counter */
+ uint parent_tables= parent_join->tables;
uint table_no= parent_join->tables;
/* n. Walk through child's tables and adjust table->map */
for (tl= subq_lex->leaf_tables; tl; tl= tl->next_leaf, table_no++)
@@ -3407,8 +3415,10 @@
Walk through sj nest's WHERE and ON expressions and call
item->fix_table_changes() for all items.
*/
- sj_nest->sj_on_expr->fix_after_pullout(parent_lex, &sj_nest->sj_on_expr);
- fix_list_after_tbl_changes(parent_lex, &sj_nest->nested_join->join_list);
+ sj_nest->sj_on_expr->fix_after_pullout(parent_lex, parent_join->tables,
+ &sj_nest->sj_on_expr);
+ fix_list_after_tbl_changes(parent_lex, parent_join->tables,
+ &sj_nest->nested_join->join_list);
/* Unlink the child select_lex so it doesn't show up in EXPLAIN: */
[Maria-developers] Rev 2814: Better comments in file:///home/psergey/dev/mysql-next/
by Sergey Petrunya 04 Jul '09
At file:///home/psergey/dev/mysql-next/
------------------------------------------------------------
revno: 2814
revision-id: psergey(a)askmonty.org-20090704004450-4pqbx9pm50bzky0l
parent: alik(a)sun.com-20090702085822-8svd0aslr7qnddbb
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: mysql-next
timestamp: Sat 2009-07-04 04:44:50 +0400
message:
Better comments
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-06-30 08:03:05 +0000
+++ b/sql/sql_select.cc 2009-07-04 00:44:50 +0000
@@ -3407,8 +3407,8 @@
sj_nest->sj_on_expr->fix_fields(parent_join->thd, &sj_nest->sj_on_expr);
/*
- Walk through sj nest's WHERE and ON expressions and call
- item->fix_table_changes() for all items.
+ Fix attributes (mainly item->table_map()) for sj-nest's WHERE and ON
+ expressions.
*/
sj_nest->sj_on_expr->fix_after_pullout(parent_lex, &sj_nest->sj_on_expr);
fix_list_after_tbl_changes(parent_lex, &sj_nest->nested_join->join_list);
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2715: Added MY_CS_NONASCII marker for character sets that are not compatible with latin1 for characters...
by noreply@launchpad.net 02 Jul '09
------------------------------------------------------------
revno: 2715
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: mysql-maria
timestamp: Thu 2009-07-02 13:15:33 +0300
message:
Added MY_CS_NONASCII marker for character sets that are not compatible with latin1 for characters 0x00-0x7f.
This allows us to skip, and thereby speed up, some very common character set conversions that MySQL does when sending data to the client,
which gives a nice speed increase for most queries that use only characters in the range 0x00-0x7f.
This code is based on code that Alexander Barkov wrote for MySQL 6.0.
modified:
include/m_ctype.h
libmysqld/lib_sql.cc
mysys/charset.c
scripts/mysql_install_db.sh
sql/protocol.cc
sql/protocol.h
sql/sql_string.cc
strings/conf_to_src.c
strings/ctype-extra.c
strings/ctype-sjis.c
strings/ctype-uca.c
strings/ctype-ucs2.c
strings/ctype-utf8.c
strings/ctype.c
=== modified file 'include/m_ctype.h'
--- include/m_ctype.h 2008-12-23 14:21:01 +0000
+++ include/m_ctype.h 2009-07-02 10:15:33 +0000
@@ -87,6 +87,7 @@
#define MY_CS_CSSORT 1024 /* if case sensitive sort order */
#define MY_CS_HIDDEN 2048 /* don't display in SHOW */
#define MY_CS_PUREASCII 4096 /* if a charset is pure ascii */
+#define MY_CS_NONASCII 8192 /* if not ASCII-compatible */
#define MY_CHARSET_UNDEFINED 0
/* Character repertoire flags */
@@ -517,6 +518,7 @@
#define my_strcasecmp(s, a, b) ((s)->coll->strcasecmp((s), (a), (b)))
#define my_charpos(cs, b, e, num) (cs)->cset->charpos((cs), (const char*) (b), (const char *)(e), (num))
+my_bool my_charset_is_ascii_compatible(CHARSET_INFO *cs);
#define use_mb(s) ((s)->cset->ismbchar != NULL)
#define my_ismbchar(s, a, b) ((s)->cset->ismbchar((s), (a), (b)))
=== modified file 'libmysqld/lib_sql.cc'
--- libmysqld/lib_sql.cc 2009-02-24 11:29:49 +0000
+++ libmysqld/lib_sql.cc 2009-07-02 10:15:33 +0000
@@ -1124,6 +1124,7 @@
return false;
}
+
bool Protocol::net_store_data(const uchar *from, size_t length)
{
char *field_buf;
@@ -1143,6 +1144,30 @@
return FALSE;
}
+
+bool Protocol::net_store_data(const uchar *from, size_t length,
+ CHARSET_INFO *from_cs, CHARSET_INFO *to_cs)
+{
+ uint conv_length= to_cs->mbmaxlen * length / from_cs->mbminlen;
+ uint dummy_error;
+ char *field_buf;
+ if (!thd->mysql) // bootstrap file handling
+ return false;
+
+ if (!(field_buf= (char*) alloc_root(alloc, conv_length + sizeof(uint) + 1)))
+ return true;
+ *next_field= field_buf + sizeof(uint);
+ length= copy_and_convert(*next_field, conv_length, to_cs,
+ (const char*) from, length, from_cs, &dummy_error);
+ *(uint *) field_buf= length;
+ (*next_field)[length]= 0;
+ if (next_mysql_field->max_length < length)
+ next_mysql_field->max_length= length;
+ ++next_field;
+ ++next_mysql_field;
+ return false;
+}
+
#if defined(_MSC_VER) && _MSC_VER < 1400
#define vsnprintf _vsnprintf
#endif
=== modified file 'mysys/charset.c'
--- mysys/charset.c 2009-02-13 16:41:47 +0000
+++ mysys/charset.c 2009-07-02 10:15:33 +0000
@@ -248,6 +248,7 @@
{
#if defined(HAVE_CHARSET_ucs2) && defined(HAVE_UCA_COLLATIONS)
copy_uca_collation(newcs, &my_charset_ucs2_unicode_ci);
+ newcs->state|= MY_CS_AVAILABLE | MY_CS_LOADED | MY_CS_NONASCII;
#endif
}
else if (!strcmp(cs->csname, "utf8"))
@@ -280,6 +281,8 @@
if (my_charset_is_8bit_pure_ascii(all_charsets[cs->number]))
all_charsets[cs->number]->state|= MY_CS_PUREASCII;
+ if (!my_charset_is_ascii_compatible(cs))
+ all_charsets[cs->number]->state|= MY_CS_NONASCII;
}
}
else
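
The body of my_charset_is_ascii_compatible() is not shown in this part of the diff. As a rough illustration of what such a check could look like for an 8-bit character set, the sketch below tests whether a byte-to-Unicode table maps 0x00-0x7f onto themselves. The function name and the table layout are assumptions made for this example, not the server's API.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

/*
  Returns true if bytes 0x00-0x7f map to the same Unicode code points,
  i.e. the character set agrees with ASCII in that range.
  to_unicode is a hypothetical byte -> code point table.
*/
bool table_is_ascii_compatible(const uint16_t *to_unicode, size_t n_entries)
{
  assert(to_unicode != NULL && n_entries >= 128);
  for (size_t i= 0; i < 128; i++)
    if (to_unicode[i] != i)
      return false;
  return true;
}
```

A character set that fails this kind of test would be the one flagged with MY_CS_NONASCII, so that the conversion shortcut is not taken for it.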
=== modified file 'scripts/mysql_install_db.sh'
--- scripts/mysql_install_db.sh 2009-01-06 15:08:15 +0000
+++ scripts/mysql_install_db.sh 2009-07-02 10:15:33 +0000
@@ -1,5 +1,5 @@
#!/bin/sh
-# Copyright (C) 2002-2003 MySQL AB
+# Copyright (C) 2002-2003 MySQL AB & Monty Program Ab
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -14,7 +14,7 @@
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-# This scripts creates the MySQL Server system tables
+# This script creates the MariaDB Server system tables
#
# All unrecognized arguments to this script are passed to mysqld.
@@ -38,26 +38,27 @@
{
cat <<EOF
Usage: $0 [OPTIONS]
- --basedir=path The path to the MySQL installation directory.
+ --basedir=path The path to the MariaDB installation directory.
--builddir=path If using --srcdir with out-of-directory builds, you
will need to set this to the location of the build
directory where built files reside.
- --cross-bootstrap For internal use. Used when building the MySQL system
+ --cross-bootstrap For internal use. Used when building the MariaDB system
tables on a different host than the target.
- --datadir=path The path to the MySQL data directory.
+ --datadir=path The path to the MariaDB data directory.
--force Causes mysql_install_db to run even if DNS does not
work. In that case, grant table entries that normally
use hostnames will use IP addresses.
- --ldata=path The path to the MySQL data directory. Same as --datadir.
+ --ldata=path The path to the MariaDB data directory. Same as
+ --datadir.
--rpm For internal use. This option is used by RPM files
- during the MySQL installation process.
+ during the MariaDB installation process.
--skip-name-resolve Use IP addresses rather than hostnames when creating
grant table entries. This option can be useful if
your DNS does not work.
- --srcdir=path The path to the MySQL source directory. This option
+ --srcdir=path The path to the MariaDB source directory. This option
uses the compiled binaries and support files within the
source tree, useful for if you don't want to install
- MySQL yet and just want to create the system tables.
+ MariaDB yet and just want to create the system tables.
--user=user_name The login username to use for running mysqld. Files
and directories created by mysqld will be owned by this
user. You must be root to use this option. By default
@@ -116,7 +117,7 @@
defaults="$arg" ;;
--cross-bootstrap|--windows)
- # Used when building the MySQL system tables on a different host than
+ # Used when building the MariaDB system tables on a different host than
# the target. The platform-independent files that are created in
# --datadir on the host can be copied to the target system.
#
@@ -338,10 +339,10 @@
fi
echo "WARNING: The host '$hostname' could not be looked up with resolveip."
echo "This probably means that your libc libraries are not 100 % compatible"
- echo "with this binary MySQL version. The MySQL daemon, mysqld, should work"
+ echo "with this binary MariaDB version. The MariaDB daemon, mysqld, should work"
echo "normally with the exception that host name resolving will not work."
echo "This means that you should use IP addresses instead of hostnames"
- echo "when specifying MySQL privileges !"
+ echo "when specifying MariaDB privileges !"
fi
fi
@@ -388,7 +389,7 @@
--net_buffer_length=16K"
# Create the system and help tables by passing them to "mysqld --bootstrap"
-s_echo "Installing MySQL system tables..."
+s_echo "Installing MariaDB/MySQL system tables..."
if { echo "use mysql;"; cat $create_system_tables $fill_system_tables; } | eval "$filter_cmd_line" | $mysqld_install_cmd_line > /dev/null
then
s_echo "OK"
@@ -410,14 +411,16 @@
echo "Try 'mysqld --help' if you have problems with paths. Using --log"
echo "gives you a log in $ldata that may be helpful."
echo
- echo "The latest information about MySQL is available on the web at"
- echo "http://www.mysql.com/. Please consult the MySQL manual section"
+ echo "The latest information about MariaDB is available on the web at"
+ echo "http://askmonty.org/wiki/index.php/MariaDB".
+ echo "If you have a problem, you can consult the MySQL manual section"
echo "'Problems running mysql_install_db', and the manual section that"
- echo "describes problems on your OS. Another information source are the"
- echo "MySQL email archives available at http://lists.mysql.com/."
+ echo "describes problems on your OS at http://dev.mysql.com/doc/"
+ echo "MariaDB is hosted on Launchpad; you can find the latest source and"
+ echo "email lists at http://launchpad.net/maria"
echo
echo "Please check all of the above before mailing us! And remember, if"
- echo "you do mail us, you MUST use the $scriptdir/mysqlbug script!"
+ echo "you do mail us, you should use the $scriptdir/mysqlbug script!"
echo
exit 1
fi
@@ -442,7 +445,7 @@
s_echo "support-files/mysql.server to the right place for your system"
echo
- echo "PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !"
+ echo "PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !"
echo "To do so, start the server, then issue the following commands:"
echo
echo "$bindir/mysqladmin -u root password 'new-password'"
@@ -455,23 +458,28 @@
echo "databases and anonymous user created by default. This is"
echo "strongly recommended for production servers."
echo
- echo "See the manual for more instructions."
+ echo "See the MySQL manual for more instructions."
if test "$in_rpm" -eq 0
then
echo
- echo "You can start the MySQL daemon with:"
+ echo "You can start the MariaDB daemon with:"
echo "cd $basedir ; $bindir/mysqld_safe &"
echo
- echo "You can test the MySQL daemon with mysql-test-run.pl"
+ echo "You can test the MariaDB daemon with mysql-test-run.pl"
echo "cd $basedir/mysql-test ; perl mysql-test-run.pl"
fi
echo
echo "Please report any problems with the $scriptdir/mysqlbug script!"
echo
- echo "The latest information about MySQL is available at http://www.mysql.com/"
- echo "Support MySQL by buying support/licenses from http://shop.mysql.com/"
+ echo "The latest information about MariaDB is available at http://www.askmonty.org/."
+ echo "You can find additional information about the MySQL part at:"
+ echo "http://dev.mysql.com"
+ echo "Support MariaDB development by buying support/new features from"
+ echo "Monty Program Ab. You can contact us about this at sales(a)askmonty.org".
+ echo "Alternatively consider joining our community based development effort:"
+ echo "http://askmonty.org/wiki/index.php/MariaDB#How_can_I_participate_in_the_dev…"
echo
fi
=== modified file 'sql/protocol.cc'
--- sql/protocol.cc 2009-04-25 10:05:32 +0000
+++ sql/protocol.cc 2009-07-02 10:15:33 +0000
@@ -58,6 +58,65 @@
}
+/*
+ net_store_data() - extended version with character set conversion.
+
+ It is optimized for short strings whose length after
+ conversion is guaranteed to be less than 251, so that the length
+ occupies exactly one byte. This avoids using
+ the "convert" member as a temporary buffer: the conversion
+ is done directly into the "packet" member.
+ The limit of 251 is good enough to optimize send_fields()
+ because column, table and database names fit within this limit.
+*/
+
+#ifndef EMBEDDED_LIBRARY
+bool Protocol::net_store_data(const uchar *from, size_t length,
+ CHARSET_INFO *from_cs, CHARSET_INFO *to_cs)
+{
+ uint dummy_errors;
+ /* Calculate maximum possible result length */
+ size_t conv_length= to_cs->mbmaxlen * length / from_cs->mbminlen;
+ ulong packet_length, new_length;
+ char *length_pos, *to;
+
+ if (conv_length > 250)
+ {
+ /*
+ For strings with conv_length greater than 250 bytes
+ we don't know how many bytes we will need to store length: one or two,
+ because we don't know result length until conversion is done.
+ For example, when converting from utf8 (mbmaxlen=3) to latin1,
+ conv_length=300 means that the result length can vary between 100 to 300.
+ length=100 needs one byte, length=300 needs two bytes.
+
+ Thus conversion directly to "packet" is not worthwhile.
+ Let's use "convert" as a temporary buffer.
+ */
+ return (convert->copy((const char*) from, length, from_cs, to_cs,
+ &dummy_errors) ||
+ net_store_data((const uchar*) convert->ptr(), convert->length()));
+ }
+
+ packet_length= packet->length();
+ new_length= packet_length + conv_length + 1;
+
+ if (new_length > packet->alloced_length() && packet->realloc(new_length))
+ return 1;
+
+ length_pos= (char*) packet->ptr() + packet_length;
+ to= length_pos + 1;
+
+ to+= copy_and_convert(to, conv_length, to_cs,
+ (const char*) from, length, from_cs, &dummy_errors);
+
+ net_store_length((uchar*) length_pos, to - length_pos - 1);
+ packet->length((uint) (to - packet->ptr()));
+ return 0;
+}
+#endif
+
+
/**
Send a error string to client.
@@ -773,10 +832,10 @@
fromcs != &my_charset_bin &&
tocs != &my_charset_bin)
{
- uint dummy_errors;
- return (convert->copy(from, length, fromcs, tocs, &dummy_errors) ||
- net_store_data((uchar*) convert->ptr(), convert->length()));
+ /* Store with conversion */
+ return net_store_data((uchar*) from, length, fromcs, tocs);
}
+ /* Store without conversion */
return net_store_data((uchar*) from, length);
}
@@ -802,7 +861,7 @@
{
CHARSET_INFO *tocs= this->thd->variables.character_set_results;
#ifndef DBUG_OFF
- DBUG_PRINT("info", ("Protocol_text::store field %u (%u): %*s", field_pos,
+ DBUG_PRINT("info", ("Protocol_text::store field %u (%u): %.*s", field_pos,
field_count, (int) length, from));
DBUG_ASSERT(field_pos < field_count);
DBUG_ASSERT(field_types == 0 ||
=== modified file 'sql/protocol.h'
--- sql/protocol.h 2007-12-20 21:11:37 +0000
+++ sql/protocol.h 2009-07-02 10:15:33 +0000
@@ -42,6 +42,8 @@
MYSQL_FIELD *next_mysql_field;
MEM_ROOT *alloc;
#endif
+ bool net_store_data(const uchar *from, size_t length,
+ CHARSET_INFO *fromcs, CHARSET_INFO *tocs);
bool store_string_aux(const char *from, size_t length,
CHARSET_INFO *fromcs, CHARSET_INFO *tocs);
public:
=== modified file 'sql/sql_string.cc'
--- sql/sql_string.cc 2009-04-25 10:05:32 +0000
+++ sql/sql_string.cc 2009-07-02 10:15:33 +0000
@@ -782,10 +782,11 @@
*/
-uint32
-copy_and_convert(char *to, uint32 to_length, CHARSET_INFO *to_cs,
- const char *from, uint32 from_length, CHARSET_INFO *from_cs,
- uint *errors)
+static uint32
+copy_and_convert_extended(char *to, uint32 to_length, CHARSET_INFO *to_cs,
+ const char *from, uint32 from_length,
+ CHARSET_INFO *from_cs,
+ uint *errors)
{
int cnvres;
my_wc_t wc;
@@ -900,6 +901,65 @@
}
/*
+ Optimized for quick copying of ASCII characters in the range 0x00..0x7F.
+*/
+uint32
+copy_and_convert(char *to, uint32 to_length, CHARSET_INFO *to_cs,
+ const char *from, uint32 from_length, CHARSET_INFO *from_cs,
+ uint *errors)
+{
+ /*
+ If any of the character sets is not ASCII compatible,
+ immediately switch to slow mb_wc->wc_mb method.
+ */
+ if ((to_cs->state | from_cs->state) & MY_CS_NONASCII)
+ return copy_and_convert_extended(to, to_length, to_cs,
+ from, from_length, from_cs, errors);
+
+ uint32 length= min(to_length, from_length), length2= length;
+
+#if defined(__i386__)
+ /*
+ Special loop for i386: it can refer to a
+ non-aligned memory block as UINT32, which makes
+ it possible to copy four bytes at once. This
+ gives about a 10% performance improvement compared
+ to a byte-by-byte loop.
+ */
+ for ( ; length >= 4; length-= 4, from+= 4, to+= 4)
+ {
+ if ((*(uint32*)from) & 0x80808080)
+ break;
+ *((uint32*) to)= *((const uint32*) from);
+ }
+#endif
+
+ for (; ; *to++= *from++, length--)
+ {
+ if (!length)
+ {
+ *errors= 0;
+ return length2;
+ }
+ if (*((unsigned char*) from) > 0x7F) /* A non-ASCII character */
+ {
+ uint32 copied_length= length2 - length;
+ to_length-= copied_length;
+ from_length-= copied_length;
+ return copied_length + copy_and_convert_extended(to, to_length,
+ to_cs,
+ from, from_length,
+ from_cs,
+ errors);
+ }
+ }
+
+ DBUG_ASSERT(FALSE); // Should never get to here
+ return 0; // Make compiler happy
+}
+
+
+/*
copy a string,
with optional character set conversion,
with optional left padding (for binary -> UCS2 conversion)
=== modified file 'strings/conf_to_src.c'
--- strings/conf_to_src.c 2008-11-14 16:29:38 +0000
+++ strings/conf_to_src.c 2009-07-02 10:15:33 +0000
@@ -184,11 +184,12 @@
{
fprintf(f,"{\n");
fprintf(f," %d,%d,%d,\n",cs->number,0,0);
- fprintf(f," MY_CS_COMPILED%s%s%s%s,\n",
+ fprintf(f," MY_CS_COMPILED%s%s%s%s%s,\n",
cs->state & MY_CS_BINSORT ? "|MY_CS_BINSORT" : "",
cs->state & MY_CS_PRIMARY ? "|MY_CS_PRIMARY" : "",
is_case_sensitive(cs) ? "|MY_CS_CSSORT" : "",
- my_charset_is_8bit_pure_ascii(cs) ? "|MY_CS_PUREASCII" : "");
+ my_charset_is_8bit_pure_ascii(cs) ? "|MY_CS_PUREASCII" : "",
+ !my_charset_is_ascii_compatible(cs) ? "|MY_CS_NONASCII": "");
if (cs->name)
{
=== modified file 'strings/ctype-extra.c'
--- strings/ctype-extra.c 2007-08-20 11:47:31 +0000
+++ strings/ctype-extra.c 2009-07-02 10:15:33 +0000
@@ -6804,7 +6804,7 @@
#ifdef HAVE_CHARSET_swe7
{
10,0,0,
- MY_CS_COMPILED|MY_CS_PRIMARY,
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_NONASCII,
"swe7", /* cset name */
"swe7_swedish_ci", /* coll name */
"", /* comment */
@@ -8454,7 +8454,7 @@
#ifdef HAVE_CHARSET_swe7
{
82,0,0,
- MY_CS_COMPILED|MY_CS_BINSORT,
+ MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_NONASCII,
"swe7", /* cset name */
"swe7_bin", /* coll name */
"", /* comment */
@@ -8550,72 +8550,6 @@
}
,
#endif
-#ifdef HAVE_CHARSET_geostd8
-{
- 92,0,0,
- MY_CS_COMPILED|MY_CS_PRIMARY,
- "geostd8", /* cset name */
- "geostd8_general_ci", /* coll name */
- "", /* comment */
- NULL, /* tailoring */
- ctype_geostd8_general_ci, /* ctype */
- to_lower_geostd8_general_ci, /* lower */
- to_upper_geostd8_general_ci, /* upper */
- sort_order_geostd8_general_ci, /* sort_order */
- NULL, /* contractions */
- NULL, /* sort_order_big*/
- to_uni_geostd8_general_ci, /* to_uni */
- NULL, /* from_uni */
- my_unicase_default, /* caseinfo */
- NULL, /* state map */
- NULL, /* ident map */
- 1, /* strxfrm_multiply*/
- 1, /* caseup_multiply*/
- 1, /* casedn_multiply*/
- 1, /* mbminlen */
- 1, /* mbmaxlen */
- 0, /* min_sort_char */
- 255, /* max_sort_char */
- ' ', /* pad_char */
- 0, /* escape_with_backslash_is_dangerous */
- &my_charset_8bit_handler,
- &my_collation_8bit_simple_ci_handler,
-}
-,
-#endif
-#ifdef HAVE_CHARSET_geostd8
-{
- 93,0,0,
- MY_CS_COMPILED|MY_CS_BINSORT,
- "geostd8", /* cset name */
- "geostd8_bin", /* coll name */
- "", /* comment */
- NULL, /* tailoring */
- ctype_geostd8_bin, /* ctype */
- to_lower_geostd8_bin, /* lower */
- to_upper_geostd8_bin, /* upper */
- NULL, /* sort_order */
- NULL, /* contractions */
- NULL, /* sort_order_big*/
- to_uni_geostd8_bin, /* to_uni */
- NULL, /* from_uni */
- my_unicase_default, /* caseinfo */
- NULL, /* state map */
- NULL, /* ident map */
- 1, /* strxfrm_multiply*/
- 1, /* caseup_multiply*/
- 1, /* casedn_multiply*/
- 1, /* mbminlen */
- 1, /* mbmaxlen */
- 0, /* min_sort_char */
- 255, /* max_sort_char */
- ' ', /* pad_char */
- 0, /* escape_with_backslash_is_dangerous */
- &my_charset_8bit_handler,
- &my_collation_8bit_bin_handler,
-}
-,
-#endif
#ifdef HAVE_CHARSET_latin1
{
94,0,0,
=== modified file 'strings/ctype-sjis.c'
--- strings/ctype-sjis.c 2007-10-04 07:10:15 +0000
+++ strings/ctype-sjis.c 2009-07-02 10:15:33 +0000
@@ -4672,7 +4672,7 @@
CHARSET_INFO my_charset_sjis_japanese_ci=
{
13,0,0, /* number */
- MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM, /* state */
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_NONASCII, /* state */
"sjis", /* cs name */
"sjis_japanese_ci", /* name */
"", /* comment */
@@ -4704,7 +4704,7 @@
CHARSET_INFO my_charset_sjis_bin=
{
88,0,0, /* number */
- MY_CS_COMPILED|MY_CS_BINSORT, /* state */
+ MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_NONASCII, /* state */
"sjis", /* cs name */
"sjis_bin", /* name */
"", /* comment */
=== modified file 'strings/ctype-uca.c'
--- strings/ctype-uca.c 2007-07-03 09:06:57 +0000
+++ strings/ctype-uca.c 2009-07-02 10:15:33 +0000
@@ -8086,7 +8086,7 @@
CHARSET_INFO my_charset_ucs2_unicode_ci=
{
128,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_unicode_ci", /* name */
"", /* comment */
@@ -8118,7 +8118,7 @@
CHARSET_INFO my_charset_ucs2_icelandic_uca_ci=
{
129,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_icelandic_ci",/* name */
"", /* comment */
@@ -8150,7 +8150,7 @@
CHARSET_INFO my_charset_ucs2_latvian_uca_ci=
{
130,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_latvian_ci", /* name */
"", /* comment */
@@ -8182,7 +8182,7 @@
CHARSET_INFO my_charset_ucs2_romanian_uca_ci=
{
131,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_romanian_ci", /* name */
"", /* comment */
@@ -8214,7 +8214,7 @@
CHARSET_INFO my_charset_ucs2_slovenian_uca_ci=
{
132,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_slovenian_ci",/* name */
"", /* comment */
@@ -8246,7 +8246,7 @@
CHARSET_INFO my_charset_ucs2_polish_uca_ci=
{
133,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_polish_ci", /* name */
"", /* comment */
@@ -8278,7 +8278,7 @@
CHARSET_INFO my_charset_ucs2_estonian_uca_ci=
{
134,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_estonian_ci", /* name */
"", /* comment */
@@ -8310,7 +8310,7 @@
CHARSET_INFO my_charset_ucs2_spanish_uca_ci=
{
135,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_spanish_ci", /* name */
"", /* comment */
@@ -8342,7 +8342,7 @@
CHARSET_INFO my_charset_ucs2_swedish_uca_ci=
{
136,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_swedish_ci", /* name */
"", /* comment */
@@ -8374,7 +8374,7 @@
CHARSET_INFO my_charset_ucs2_turkish_uca_ci=
{
137,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_turkish_ci", /* name */
"", /* comment */
@@ -8406,7 +8406,7 @@
CHARSET_INFO my_charset_ucs2_czech_uca_ci=
{
138,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_czech_ci", /* name */
"", /* comment */
@@ -8439,7 +8439,7 @@
CHARSET_INFO my_charset_ucs2_danish_uca_ci=
{
139,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_danish_ci", /* name */
"", /* comment */
@@ -8471,7 +8471,7 @@
CHARSET_INFO my_charset_ucs2_lithuanian_uca_ci=
{
140,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_lithuanian_ci",/* name */
"", /* comment */
@@ -8503,7 +8503,7 @@
CHARSET_INFO my_charset_ucs2_slovak_uca_ci=
{
141,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_slovak_ci", /* name */
"", /* comment */
@@ -8535,7 +8535,7 @@
CHARSET_INFO my_charset_ucs2_spanish2_uca_ci=
{
142,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_spanish2_ci", /* name */
"", /* comment */
@@ -8568,7 +8568,7 @@
CHARSET_INFO my_charset_ucs2_roman_uca_ci=
{
143,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_roman_ci", /* name */
"", /* comment */
@@ -8601,7 +8601,7 @@
CHARSET_INFO my_charset_ucs2_persian_uca_ci=
{
144,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_persian_ci", /* name */
"", /* comment */
@@ -8634,7 +8634,7 @@
CHARSET_INFO my_charset_ucs2_esperanto_uca_ci=
{
145,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_esperanto_ci",/* name */
"", /* comment */
@@ -8667,7 +8667,7 @@
CHARSET_INFO my_charset_ucs2_hungarian_uca_ci=
{
146,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_hungarian_ci",/* name */
"", /* comment */
=== modified file 'strings/ctype-ucs2.c'
--- strings/ctype-ucs2.c 2009-02-13 16:41:47 +0000
+++ strings/ctype-ucs2.c 2009-07-02 10:15:33 +0000
@@ -1717,7 +1717,7 @@
CHARSET_INFO my_charset_ucs2_general_ci=
{
35,0,0, /* number */
- MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_general_ci", /* name */
"", /* comment */
@@ -1749,7 +1749,7 @@
CHARSET_INFO my_charset_ucs2_bin=
{
90,0,0, /* number */
- MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_bin", /* name */
"", /* comment */
=== modified file 'strings/ctype-utf8.c'
--- strings/ctype-utf8.c 2008-02-11 12:28:33 +0000
+++ strings/ctype-utf8.c 2009-07-02 10:15:33 +0000
@@ -4204,7 +4204,7 @@
CHARSET_INFO my_charset_filename=
{
17,0,0, /* number */
- MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_HIDDEN,
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_HIDDEN|MY_CS_NONASCII,
"filename", /* cs name */
"filename", /* name */
"", /* comment */
=== modified file 'strings/ctype.c'
--- strings/ctype.c 2009-04-25 10:05:32 +0000
+++ strings/ctype.c 2009-07-02 10:15:33 +0000
@@ -405,3 +405,23 @@
}
return 1;
}
+
+
+/*
+ Shared function between conf_to_src and mysys.
+ Check if an 8-bit character set is compatible with
+ ASCII in the range 0x00..0x7F.
+*/
+my_bool
+my_charset_is_ascii_compatible(CHARSET_INFO *cs)
+{
+ uint i;
+ if (!cs->tab_to_uni)
+ return 1;
+ for (i= 0; i < 128; i++)
+ {
+ if (cs->tab_to_uni[i] != i)
+ return 0;
+ }
+ return 1;
+}
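
The check added to strings/ctype.c can be tried outside the server with a small stand-alone sketch. It assumes a plain 256-entry uint16_t table in place of the tab_to_uni member of CHARSET_INFO; the names table_is_ascii_compatible and fill_identity are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
  Hypothetical stand-in for the tab_to_uni member of CHARSET_INFO:
  256 entries mapping each byte of an 8-bit character set to a code point.
  A NULL table is treated as compatible, mirroring the early return
  in the patch.
*/
static int table_is_ascii_compatible(const uint16_t *tab_to_uni)
{
  size_t i;
  if (tab_to_uni == NULL)
    return 1;
  for (i = 0; i < 128; i++)
  {
    /* Any byte below 0x80 that does not map to itself breaks compatibility */
    if (tab_to_uni[i] != i)
      return 0;
  }
  return 1;
}

/* Fill a 256-entry table with the identity mapping (latin1-like). */
static void fill_identity(uint16_t *tab)
{
  size_t i;
  assert(tab != NULL);
  for (i = 0; i < 256; i++)
    tab[i] = (uint16_t) i;
}
```

An identity table passes the check. Remapping a single position below 0x80, as a 7-bit national variant such as swe7 does, makes it fail, and that is the case the patch marks with MY_CS_NONASCII.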
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (monty:2715)
by Michael Widenius 02 Jul '09
#At lp:maria based on revid:monty@askmonty.org-20090630120129-6gan4k9dyjxj83e4
2715 Michael Widenius 2009-07-02
Added MY_CS_NONASCII marker for character sets that are not compatible with latin1 for characters 0x00-0x7f.
This allows us to skip or speed up some very common character set conversions that MySQL does when sending data to the client,
which gives a nice speed increase for most queries that use only characters in the range 0x00-0x7f.
This code is based on code that Alexander Barkov wrote for MySQL 6.0.
modified:
include/m_ctype.h
libmysqld/lib_sql.cc
mysys/charset.c
scripts/mysql_install_db.sh
sql/protocol.cc
sql/protocol.h
sql/sql_string.cc
strings/conf_to_src.c
strings/ctype-extra.c
strings/ctype-sjis.c
strings/ctype-uca.c
strings/ctype-ucs2.c
strings/ctype-utf8.c
strings/ctype.c
per-file messages:
include/m_ctype.h
Added MY_CS_NONASCII marker
libmysqld/lib_sql.cc
Added function net_store_data(...) that takes to and from CHARSET_INFO * as arguments
mysys/charset.c
Mark character sets with MY_CS_NONASCII
scripts/mysql_install_db.sh
Fixed messages to refer to MariaDB instead of MySQL
sql/protocol.cc
Added function net_store_data(...) that takes to and from CHARSET_INFO * as arguments
sql/protocol.h
Added function net_store_data(...) that takes to and from CHARSET_INFO * as arguments
sql/sql_string.cc
Quicker copy of strings with no characters above 0x7f
strings/conf_to_src.c
Added printing of MY_CS_NONASCII
strings/ctype-extra.c
Mark incompatible character sets with MY_CS_NONASCII
Removed duplicated character set geostd
strings/ctype-sjis.c
Mark incompatible character sets with MY_CS_NONASCII
strings/ctype-uca.c
Mark incompatible character sets with MY_CS_NONASCII
strings/ctype-ucs2.c
Mark incompatible character sets with MY_CS_NONASCII
strings/ctype-utf8.c
Mark incompatible character sets with MY_CS_NONASCII
strings/ctype.c
Added function to check if character set is compatible with latin1 in ranges 0x00-0x7f
=== modified file 'include/m_ctype.h'
--- a/include/m_ctype.h 2008-12-23 14:21:01 +0000
+++ b/include/m_ctype.h 2009-07-02 10:15:33 +0000
@@ -87,6 +87,7 @@ extern MY_UNI_CTYPE my_uni_ctype[256];
#define MY_CS_CSSORT 1024 /* if case sensitive sort order */
#define MY_CS_HIDDEN 2048 /* don't display in SHOW */
#define MY_CS_PUREASCII 4096 /* if a charset is pure ascii */
+#define MY_CS_NONASCII 8192 /* if not ASCII-compatible */
#define MY_CHARSET_UNDEFINED 0
/* Character repertoire flags */
@@ -517,6 +518,7 @@ uint my_charset_repertoire(CHARSET_INFO
#define my_strcasecmp(s, a, b) ((s)->coll->strcasecmp((s), (a), (b)))
#define my_charpos(cs, b, e, num) (cs)->cset->charpos((cs), (const char*) (b), (const char *)(e), (num))
+my_bool my_charset_is_ascii_compatible(CHARSET_INFO *cs);
#define use_mb(s) ((s)->cset->ismbchar != NULL)
#define my_ismbchar(s, a, b) ((s)->cset->ismbchar((s), (a), (b)))
=== modified file 'libmysqld/lib_sql.cc'
--- a/libmysqld/lib_sql.cc 2009-02-24 11:29:49 +0000
+++ b/libmysqld/lib_sql.cc 2009-07-02 10:15:33 +0000
@@ -1124,6 +1124,7 @@ bool Protocol_text::store_null()
return false;
}
+
bool Protocol::net_store_data(const uchar *from, size_t length)
{
char *field_buf;
@@ -1143,6 +1144,30 @@ bool Protocol::net_store_data(const ucha
return FALSE;
}
+
+bool Protocol::net_store_data(const uchar *from, size_t length,
+ CHARSET_INFO *from_cs, CHARSET_INFO *to_cs)
+{
+ uint conv_length= to_cs->mbmaxlen * length / from_cs->mbminlen;
+ uint dummy_error;
+ char *field_buf;
+ if (!thd->mysql) // bootstrap file handling
+ return false;
+
+ if (!(field_buf= (char*) alloc_root(alloc, conv_length + sizeof(uint) + 1)))
+ return true;
+ *next_field= field_buf + sizeof(uint);
+ length= copy_and_convert(*next_field, conv_length, to_cs,
+ (const char*) from, length, from_cs, &dummy_error);
+ *(uint *) field_buf= length;
+ (*next_field)[length]= 0;
+ if (next_mysql_field->max_length < length)
+ next_mysql_field->max_length= length;
+ ++next_field;
+ ++next_mysql_field;
+ return false;
+}
+
#if defined(_MSC_VER) && _MSC_VER < 1400
#define vsnprintf _vsnprintf
#endif
=== modified file 'mysys/charset.c'
--- a/mysys/charset.c 2009-02-13 16:41:47 +0000
+++ b/mysys/charset.c 2009-07-02 10:15:33 +0000
@@ -248,6 +248,7 @@ static int add_collation(CHARSET_INFO *c
{
#if defined(HAVE_CHARSET_ucs2) && defined(HAVE_UCA_COLLATIONS)
copy_uca_collation(newcs, &my_charset_ucs2_unicode_ci);
+ newcs->state|= MY_CS_AVAILABLE | MY_CS_LOADED | MY_CS_NONASCII;
#endif
}
else if (!strcmp(cs->csname, "utf8"))
@@ -280,6 +281,8 @@ static int add_collation(CHARSET_INFO *c
if (my_charset_is_8bit_pure_ascii(all_charsets[cs->number]))
all_charsets[cs->number]->state|= MY_CS_PUREASCII;
+ if (!my_charset_is_ascii_compatible(cs))
+ all_charsets[cs->number]->state|= MY_CS_NONASCII;
}
}
else
=== modified file 'scripts/mysql_install_db.sh'
--- a/scripts/mysql_install_db.sh 2009-01-06 15:08:15 +0000
+++ b/scripts/mysql_install_db.sh 2009-07-02 10:15:33 +0000
@@ -1,5 +1,5 @@
#!/bin/sh
-# Copyright (C) 2002-2003 MySQL AB
+# Copyright (C) 2002-2003 MySQL AB & Monty Program Ab
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -14,7 +14,7 @@
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-# This scripts creates the MySQL Server system tables
+# This script creates the MariaDB Server system tables
#
# All unrecognized arguments to this script are passed to mysqld.
@@ -38,26 +38,27 @@ usage()
{
cat <<EOF
Usage: $0 [OPTIONS]
- --basedir=path The path to the MySQL installation directory.
+ --basedir=path The path to the MariaDB installation directory.
--builddir=path If using --srcdir with out-of-directory builds, you
will need to set this to the location of the build
directory where built files reside.
- --cross-bootstrap For internal use. Used when building the MySQL system
+ --cross-bootstrap For internal use. Used when building the MariaDB system
tables on a different host than the target.
- --datadir=path The path to the MySQL data directory.
+ --datadir=path The path to the MariaDB data directory.
--force Causes mysql_install_db to run even if DNS does not
work. In that case, grant table entries that normally
use hostnames will use IP addresses.
- --ldata=path The path to the MySQL data directory. Same as --datadir.
+ --ldata=path The path to the MariaDB data directory. Same as
+ --datadir.
--rpm For internal use. This option is used by RPM files
- during the MySQL installation process.
+ during the MariaDB installation process.
--skip-name-resolve Use IP addresses rather than hostnames when creating
grant table entries. This option can be useful if
your DNS does not work.
- --srcdir=path The path to the MySQL source directory. This option
+ --srcdir=path The path to the MariaDB source directory. This option
uses the compiled binaries and support files within the
source tree, useful for if you don't want to install
- MySQL yet and just want to create the system tables.
+ MariaDB yet and just want to create the system tables.
--user=user_name The login username to use for running mysqld. Files
and directories created by mysqld will be owned by this
user. You must be root to use this option. By default
@@ -116,7 +117,7 @@ parse_arguments()
defaults="$arg" ;;
--cross-bootstrap|--windows)
- # Used when building the MySQL system tables on a different host than
+ # Used when building the MariaDB system tables on a different host than
# the target. The platform-independent files that are created in
# --datadir on the host can be copied to the target system.
#
@@ -338,10 +339,10 @@ then
fi
echo "WARNING: The host '$hostname' could not be looked up with resolveip."
echo "This probably means that your libc libraries are not 100 % compatible"
- echo "with this binary MySQL version. The MySQL daemon, mysqld, should work"
+ echo "with this binary MariaDB version. The MariaDB daemon, mysqld, should work"
echo "normally with the exception that host name resolving will not work."
echo "This means that you should use IP addresses instead of hostnames"
- echo "when specifying MySQL privileges !"
+ echo "when specifying MariaDB privileges !"
fi
fi
@@ -388,7 +389,7 @@ mysqld_install_cmd_line="$mysqld_bootstr
--net_buffer_length=16K"
# Create the system and help tables by passing them to "mysqld --bootstrap"
-s_echo "Installing MySQL system tables..."
+s_echo "Installing MariaDB/MySQL system tables..."
if { echo "use mysql;"; cat $create_system_tables $fill_system_tables; } | eval "$filter_cmd_line" | $mysqld_install_cmd_line > /dev/null
then
s_echo "OK"
@@ -410,14 +411,16 @@ else
echo "Try 'mysqld --help' if you have problems with paths. Using --log"
echo "gives you a log in $ldata that may be helpful."
echo
- echo "The latest information about MySQL is available on the web at"
- echo "http://www.mysql.com/. Please consult the MySQL manual section"
+ echo "The latest information about MariaDB is available on the web at"
+ echo "http://askmonty.org/wiki/index.php/MariaDB".
+ echo "If you have a problem, you can consult the MySQL manual section"
echo "'Problems running mysql_install_db', and the manual section that"
- echo "describes problems on your OS. Another information source are the"
- echo "MySQL email archives available at http://lists.mysql.com/."
+ echo "describes problems on your OS at http://dev.mysql.com/doc/"
+ echo "MariaDB is hosted on launchpad; You can find the latest source and"
+ echo "email lists at http://launchpad.net/maria"
echo
echo "Please check all of the above before mailing us! And remember, if"
- echo "you do mail us, you MUST use the $scriptdir/mysqlbug script!"
+ echo "you do mail us, you should use the $scriptdir/mysqlbug script!"
echo
exit 1
fi
@@ -442,7 +445,7 @@ then
s_echo "support-files/mysql.server to the right place for your system"
echo
- echo "PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !"
+ echo "PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !"
echo "To do so, start the server, then issue the following commands:"
echo
echo "$bindir/mysqladmin -u root password 'new-password'"
@@ -455,23 +458,28 @@ then
echo "databases and anonymous user created by default. This is"
echo "strongly recommended for production servers."
echo
- echo "See the manual for more instructions."
+ echo "See the MySQL manual for more instructions."
if test "$in_rpm" -eq 0
then
echo
- echo "You can start the MySQL daemon with:"
+ echo "You can start the MariaDB daemon with:"
echo "cd $basedir ; $bindir/mysqld_safe &"
echo
- echo "You can test the MySQL daemon with mysql-test-run.pl"
+ echo "You can test the MariaDB daemon with mysql-test-run.pl"
echo "cd $basedir/mysql-test ; perl mysql-test-run.pl"
fi
echo
echo "Please report any problems with the $scriptdir/mysqlbug script!"
echo
- echo "The latest information about MySQL is available at http://www.mysql.com/"
- echo "Support MySQL by buying support/licenses from http://shop.mysql.com/"
+ echo "The latest information about MariaDB is available at http://www.askmonty.org/."
+ echo "You can find additional information about the MySQL part at:"
+ echo "http://dev.mysql.com"
+ echo "Support MariaDB development by buying support/new features from"
+ echo "Monty Program Ab. You can contact us about this at sales(a)askmonty.org".
+ echo "Alternatively consider joining our community based development effort:"
+ echo "http://askmonty.org/wiki/index.php/MariaDB#How_can_I_participate_in_the_dev…"
echo
fi
=== modified file 'sql/protocol.cc'
--- a/sql/protocol.cc 2009-04-25 10:05:32 +0000
+++ b/sql/protocol.cc 2009-07-02 10:15:33 +0000
@@ -58,6 +58,65 @@ bool Protocol_binary::net_store_data(con
}
+/*
+ net_store_data() - extended version with character set conversion.
+
+ It is optimized for short strings whose length after
+ conversion is guaranteed to be less than 251, which occupies
+ exactly one byte to store length. This avoids using
+ the "convert" member as a temporary buffer; conversion
+ is done directly to the "packet" member.
+ The limit 251 is good enough to optimize send_fields()
+ because column, table, database names fit into this limit.
+*/
+
+#ifndef EMBEDDED_LIBRARY
+bool Protocol::net_store_data(const uchar *from, size_t length,
+ CHARSET_INFO *from_cs, CHARSET_INFO *to_cs)
+{
+ uint dummy_errors;
+ /* Calculate maximum possible result length */
+ size_t conv_length= to_cs->mbmaxlen * length / from_cs->mbminlen;
+ ulong packet_length, new_length;
+ char *length_pos, *to;
+
+ if (conv_length > 250)
+ {
+ /*
+ For strings with conv_length greater than 250 bytes
+ we don't know how many bytes we will need to store length: one or two,
+ because we don't know result length until conversion is done.
+ For example, when converting from utf8 (mbmaxlen=3) to latin1,
+ conv_length=300 means that the result length can vary between 100 to 300.
+ length=100 needs one byte, length=300 needs two bytes.
+
+ Thus conversion directly to "packet" is not worthwhile.
+ Let's use "convert" as a temporary buffer.
+ */
+ return (convert->copy((const char*) from, length, from_cs, to_cs,
+ &dummy_errors) ||
+ net_store_data((const uchar*) convert->ptr(), convert->length()));
+ }
+
+ packet_length= packet->length();
+ new_length= packet_length + conv_length + 1;
+
+ if (new_length > packet->alloced_length() && packet->realloc(new_length))
+ return 1;
+
+ length_pos= (char*) packet->ptr() + packet_length;
+ to= length_pos + 1;
+
+ to+= copy_and_convert(to, conv_length, to_cs,
+ (const char*) from, length, from_cs, &dummy_errors);
+
+ net_store_length((uchar*) length_pos, to - length_pos - 1);
+ packet->length((uint) (to - packet->ptr()));
+ return 0;
+}
+#endif
+
+
/**
Send a error string to client.
@@ -773,10 +832,10 @@ bool Protocol::store_string_aux(const ch
fromcs != &my_charset_bin &&
tocs != &my_charset_bin)
{
- uint dummy_errors;
- return (convert->copy(from, length, fromcs, tocs, &dummy_errors) ||
- net_store_data((uchar*) convert->ptr(), convert->length()));
+ /* Store with conversion */
+ return net_store_data((uchar*) from, length, fromcs, tocs);
}
+ /* Store without conversion */
return net_store_data((uchar*) from, length);
}
@@ -802,7 +861,7 @@ bool Protocol_text::store(const char *fr
{
CHARSET_INFO *tocs= this->thd->variables.character_set_results;
#ifndef DBUG_OFF
- DBUG_PRINT("info", ("Protocol_text::store field %u (%u): %*s", field_pos,
+ DBUG_PRINT("info", ("Protocol_text::store field %u (%u): %.*s", field_pos,
field_count, (int) length, from));
DBUG_ASSERT(field_pos < field_count);
DBUG_ASSERT(field_types == 0 ||
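
The decision at the top of the new net_store_data() comes down to a little arithmetic, sketched below. This is only an illustration: the helper names max_converted_length and fits_one_byte_prefix are hypothetical, and the mbmaxlen and mbminlen values are passed as plain integers rather than read from CHARSET_INFO.

```c
#include <assert.h>
#include <stddef.h>

/*
  Upper bound on the converted length, computed the same way as
  conv_length in the patch: to_cs->mbmaxlen * length / from_cs->mbminlen.
*/
static size_t max_converted_length(size_t length, unsigned to_mbmaxlen,
                                   unsigned from_mbminlen)
{
  assert(from_mbminlen > 0);
  return (size_t) to_mbmaxlen * length / from_mbminlen;
}

/*
  Non-zero when the result is guaranteed to stay below 251 bytes, so its
  length prefix takes exactly one byte and the conversion can be written
  directly into the packet buffer.
*/
static int fits_one_byte_prefix(size_t conv_length)
{
  return conv_length <= 250;
}
```

With a 1-byte source character set and a target whose mbmaxlen is 3, a 64-byte name has an upper bound of 192 bytes and takes the direct path, while a 100-byte string has an upper bound of 300 and falls back to the temporary "convert" buffer.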
=== modified file 'sql/protocol.h'
--- a/sql/protocol.h 2007-12-20 21:11:37 +0000
+++ b/sql/protocol.h 2009-07-02 10:15:33 +0000
@@ -42,6 +42,8 @@ protected:
MYSQL_FIELD *next_mysql_field;
MEM_ROOT *alloc;
#endif
+ bool net_store_data(const uchar *from, size_t length,
+ CHARSET_INFO *fromcs, CHARSET_INFO *tocs);
bool store_string_aux(const char *from, size_t length,
CHARSET_INFO *fromcs, CHARSET_INFO *tocs);
public:
=== modified file 'sql/sql_string.cc'
--- a/sql/sql_string.cc 2009-04-25 10:05:32 +0000
+++ b/sql/sql_string.cc 2009-07-02 10:15:33 +0000
@@ -782,10 +782,11 @@ String *copy_if_not_alloced(String *to,S
*/
-uint32
-copy_and_convert(char *to, uint32 to_length, CHARSET_INFO *to_cs,
- const char *from, uint32 from_length, CHARSET_INFO *from_cs,
- uint *errors)
+static uint32
+copy_and_convert_extended(char *to, uint32 to_length, CHARSET_INFO *to_cs,
+ const char *from, uint32 from_length,
+ CHARSET_INFO *from_cs,
+ uint *errors)
{
int cnvres;
my_wc_t wc;
@@ -900,6 +901,65 @@ my_copy_with_hex_escaping(CHARSET_INFO *
}
/*
+ Optimized for quick copying of ASCII characters in the range 0x00..0x7F.
+*/
+uint32
+copy_and_convert(char *to, uint32 to_length, CHARSET_INFO *to_cs,
+ const char *from, uint32 from_length, CHARSET_INFO *from_cs,
+ uint *errors)
+{
+ /*
+ If any of the character sets is not ASCII compatible,
+ immediately switch to slow mb_wc->wc_mb method.
+ */
+ if ((to_cs->state | from_cs->state) & MY_CS_NONASCII)
+ return copy_and_convert_extended(to, to_length, to_cs,
+ from, from_length, from_cs, errors);
+
+ uint32 length= min(to_length, from_length), length2= length;
+
+#if defined(__i386__)
+ /*
+ Special loop for i386, it allows to refer to a
+ non-aligned memory block as UINT32, which makes
+ it possible to copy four bytes at once. This
+ gives about 10% performance improvement comparing
+ to byte-by-byte loop.
+ */
+ for ( ; length >= 4; length-= 4, from+= 4, to+= 4)
+ {
+ if ((*(uint32*)from) & 0x80808080)
+ break;
+ *((uint32*) to)= *((const uint32*) from);
+ }
+#endif
+
+ for (; ; *to++= *from++, length--)
+ {
+ if (!length)
+ {
+ *errors= 0;
+ return length2;
+ }
+ if (*((unsigned char*) from) > 0x7F) /* A non-ASCII character */
+ {
+ uint32 copied_length= length2 - length;
+ to_length-= copied_length;
+ from_length-= copied_length;
+ return copied_length + copy_and_convert_extended(to, to_length,
+ to_cs,
+ from, from_length,
+ from_cs,
+ errors);
+ }
+ }
+
+ DBUG_ASSERT(FALSE); // Should never get to here
+ return 0; // Make compiler happy
+}
+
+
+/*
copy a string,
with optional character set conversion,
with optional left padding (for binary -> UCS2 conversion)
=== modified file 'strings/conf_to_src.c'
--- a/strings/conf_to_src.c 2008-11-14 16:29:38 +0000
+++ b/strings/conf_to_src.c 2009-07-02 10:15:33 +0000
@@ -184,11 +184,12 @@ void dispcset(FILE *f,CHARSET_INFO *cs)
{
fprintf(f,"{\n");
fprintf(f," %d,%d,%d,\n",cs->number,0,0);
- fprintf(f," MY_CS_COMPILED%s%s%s%s,\n",
+ fprintf(f," MY_CS_COMPILED%s%s%s%s%s,\n",
cs->state & MY_CS_BINSORT ? "|MY_CS_BINSORT" : "",
cs->state & MY_CS_PRIMARY ? "|MY_CS_PRIMARY" : "",
is_case_sensitive(cs) ? "|MY_CS_CSSORT" : "",
- my_charset_is_8bit_pure_ascii(cs) ? "|MY_CS_PUREASCII" : "");
+ my_charset_is_8bit_pure_ascii(cs) ? "|MY_CS_PUREASCII" : "",
+ !my_charset_is_ascii_compatible(cs) ? "|MY_CS_NONASCII": "");
if (cs->name)
{
=== modified file 'strings/ctype-extra.c'
--- a/strings/ctype-extra.c 2007-08-20 11:47:31 +0000
+++ b/strings/ctype-extra.c 2009-07-02 10:15:33 +0000
@@ -6804,7 +6804,7 @@ CHARSET_INFO compiled_charsets[] = {
#ifdef HAVE_CHARSET_swe7
{
10,0,0,
- MY_CS_COMPILED|MY_CS_PRIMARY,
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_NONASCII,
"swe7", /* cset name */
"swe7_swedish_ci", /* coll name */
"", /* comment */
@@ -8454,7 +8454,7 @@ CHARSET_INFO compiled_charsets[] = {
#ifdef HAVE_CHARSET_swe7
{
82,0,0,
- MY_CS_COMPILED|MY_CS_BINSORT,
+ MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_NONASCII,
"swe7", /* cset name */
"swe7_bin", /* coll name */
"", /* comment */
@@ -8550,72 +8550,6 @@ CHARSET_INFO compiled_charsets[] = {
}
,
#endif
-#ifdef HAVE_CHARSET_geostd8
-{
- 92,0,0,
- MY_CS_COMPILED|MY_CS_PRIMARY,
- "geostd8", /* cset name */
- "geostd8_general_ci", /* coll name */
- "", /* comment */
- NULL, /* tailoring */
- ctype_geostd8_general_ci, /* ctype */
- to_lower_geostd8_general_ci, /* lower */
- to_upper_geostd8_general_ci, /* upper */
- sort_order_geostd8_general_ci, /* sort_order */
- NULL, /* contractions */
- NULL, /* sort_order_big*/
- to_uni_geostd8_general_ci, /* to_uni */
- NULL, /* from_uni */
- my_unicase_default, /* caseinfo */
- NULL, /* state map */
- NULL, /* ident map */
- 1, /* strxfrm_multiply*/
- 1, /* caseup_multiply*/
- 1, /* casedn_multiply*/
- 1, /* mbminlen */
- 1, /* mbmaxlen */
- 0, /* min_sort_char */
- 255, /* max_sort_char */
- ' ', /* pad_char */
- 0, /* escape_with_backslash_is_dangerous */
- &my_charset_8bit_handler,
- &my_collation_8bit_simple_ci_handler,
-}
-,
-#endif
-#ifdef HAVE_CHARSET_geostd8
-{
- 93,0,0,
- MY_CS_COMPILED|MY_CS_BINSORT,
- "geostd8", /* cset name */
- "geostd8_bin", /* coll name */
- "", /* comment */
- NULL, /* tailoring */
- ctype_geostd8_bin, /* ctype */
- to_lower_geostd8_bin, /* lower */
- to_upper_geostd8_bin, /* upper */
- NULL, /* sort_order */
- NULL, /* contractions */
- NULL, /* sort_order_big*/
- to_uni_geostd8_bin, /* to_uni */
- NULL, /* from_uni */
- my_unicase_default, /* caseinfo */
- NULL, /* state map */
- NULL, /* ident map */
- 1, /* strxfrm_multiply*/
- 1, /* caseup_multiply*/
- 1, /* casedn_multiply*/
- 1, /* mbminlen */
- 1, /* mbmaxlen */
- 0, /* min_sort_char */
- 255, /* max_sort_char */
- ' ', /* pad_char */
- 0, /* escape_with_backslash_is_dangerous */
- &my_charset_8bit_handler,
- &my_collation_8bit_bin_handler,
-}
-,
-#endif
#ifdef HAVE_CHARSET_latin1
{
94,0,0,
=== modified file 'strings/ctype-sjis.c'
--- a/strings/ctype-sjis.c 2007-10-04 07:10:15 +0000
+++ b/strings/ctype-sjis.c 2009-07-02 10:15:33 +0000
@@ -4672,7 +4672,7 @@ static MY_CHARSET_HANDLER my_charset_han
CHARSET_INFO my_charset_sjis_japanese_ci=
{
13,0,0, /* number */
- MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM, /* state */
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_NONASCII, /* state */
"sjis", /* cs name */
"sjis_japanese_ci", /* name */
"", /* comment */
@@ -4704,7 +4704,7 @@ CHARSET_INFO my_charset_sjis_japanese_ci
CHARSET_INFO my_charset_sjis_bin=
{
88,0,0, /* number */
- MY_CS_COMPILED|MY_CS_BINSORT, /* state */
+ MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_NONASCII, /* state */
"sjis", /* cs name */
"sjis_bin", /* name */
"", /* comment */
=== modified file 'strings/ctype-uca.c'
--- a/strings/ctype-uca.c 2007-07-03 09:06:57 +0000
+++ b/strings/ctype-uca.c 2009-07-02 10:15:33 +0000
@@ -8086,7 +8086,7 @@ MY_COLLATION_HANDLER my_collation_ucs2_u
CHARSET_INFO my_charset_ucs2_unicode_ci=
{
128,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_unicode_ci", /* name */
"", /* comment */
@@ -8118,7 +8118,7 @@ CHARSET_INFO my_charset_ucs2_unicode_ci=
CHARSET_INFO my_charset_ucs2_icelandic_uca_ci=
{
129,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_icelandic_ci",/* name */
"", /* comment */
@@ -8150,7 +8150,7 @@ CHARSET_INFO my_charset_ucs2_icelandic_u
CHARSET_INFO my_charset_ucs2_latvian_uca_ci=
{
130,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_latvian_ci", /* name */
"", /* comment */
@@ -8182,7 +8182,7 @@ CHARSET_INFO my_charset_ucs2_latvian_uca
CHARSET_INFO my_charset_ucs2_romanian_uca_ci=
{
131,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_romanian_ci", /* name */
"", /* comment */
@@ -8214,7 +8214,7 @@ CHARSET_INFO my_charset_ucs2_romanian_uc
CHARSET_INFO my_charset_ucs2_slovenian_uca_ci=
{
132,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_slovenian_ci",/* name */
"", /* comment */
@@ -8246,7 +8246,7 @@ CHARSET_INFO my_charset_ucs2_slovenian_u
CHARSET_INFO my_charset_ucs2_polish_uca_ci=
{
133,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_polish_ci", /* name */
"", /* comment */
@@ -8278,7 +8278,7 @@ CHARSET_INFO my_charset_ucs2_polish_uca_
CHARSET_INFO my_charset_ucs2_estonian_uca_ci=
{
134,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_estonian_ci", /* name */
"", /* comment */
@@ -8310,7 +8310,7 @@ CHARSET_INFO my_charset_ucs2_estonian_uc
CHARSET_INFO my_charset_ucs2_spanish_uca_ci=
{
135,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_spanish_ci", /* name */
"", /* comment */
@@ -8342,7 +8342,7 @@ CHARSET_INFO my_charset_ucs2_spanish_uca
CHARSET_INFO my_charset_ucs2_swedish_uca_ci=
{
136,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_swedish_ci", /* name */
"", /* comment */
@@ -8374,7 +8374,7 @@ CHARSET_INFO my_charset_ucs2_swedish_uca
CHARSET_INFO my_charset_ucs2_turkish_uca_ci=
{
137,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_turkish_ci", /* name */
"", /* comment */
@@ -8406,7 +8406,7 @@ CHARSET_INFO my_charset_ucs2_turkish_uca
CHARSET_INFO my_charset_ucs2_czech_uca_ci=
{
138,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_czech_ci", /* name */
"", /* comment */
@@ -8439,7 +8439,7 @@ CHARSET_INFO my_charset_ucs2_czech_uca_c
CHARSET_INFO my_charset_ucs2_danish_uca_ci=
{
139,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_danish_ci", /* name */
"", /* comment */
@@ -8471,7 +8471,7 @@ CHARSET_INFO my_charset_ucs2_danish_uca_
CHARSET_INFO my_charset_ucs2_lithuanian_uca_ci=
{
140,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_lithuanian_ci",/* name */
"", /* comment */
@@ -8503,7 +8503,7 @@ CHARSET_INFO my_charset_ucs2_lithuanian_
CHARSET_INFO my_charset_ucs2_slovak_uca_ci=
{
141,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_slovak_ci", /* name */
"", /* comment */
@@ -8535,7 +8535,7 @@ CHARSET_INFO my_charset_ucs2_slovak_uca_
CHARSET_INFO my_charset_ucs2_spanish2_uca_ci=
{
142,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_spanish2_ci", /* name */
"", /* comment */
@@ -8568,7 +8568,7 @@ CHARSET_INFO my_charset_ucs2_spanish2_uc
CHARSET_INFO my_charset_ucs2_roman_uca_ci=
{
143,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_roman_ci", /* name */
"", /* comment */
@@ -8601,7 +8601,7 @@ CHARSET_INFO my_charset_ucs2_roman_uca_c
CHARSET_INFO my_charset_ucs2_persian_uca_ci=
{
144,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_persian_ci", /* name */
"", /* comment */
@@ -8634,7 +8634,7 @@ CHARSET_INFO my_charset_ucs2_persian_uca
CHARSET_INFO my_charset_ucs2_esperanto_uca_ci=
{
145,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_esperanto_ci",/* name */
"", /* comment */
@@ -8667,7 +8667,7 @@ CHARSET_INFO my_charset_ucs2_esperanto_u
CHARSET_INFO my_charset_ucs2_hungarian_uca_ci=
{
146,0,0, /* number */
- MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_hungarian_ci",/* name */
"", /* comment */
=== modified file 'strings/ctype-ucs2.c'
--- a/strings/ctype-ucs2.c 2009-02-13 16:41:47 +0000
+++ b/strings/ctype-ucs2.c 2009-07-02 10:15:33 +0000
@@ -1717,7 +1717,7 @@ MY_CHARSET_HANDLER my_charset_ucs2_handl
CHARSET_INFO my_charset_ucs2_general_ci=
{
35,0,0, /* number */
- MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_general_ci", /* name */
"", /* comment */
@@ -1749,7 +1749,7 @@ CHARSET_INFO my_charset_ucs2_general_ci=
CHARSET_INFO my_charset_ucs2_bin=
{
90,0,0, /* number */
- MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_UNICODE,
+ MY_CS_COMPILED|MY_CS_BINSORT|MY_CS_UNICODE|MY_CS_NONASCII,
"ucs2", /* cs name */
"ucs2_bin", /* name */
"", /* comment */
=== modified file 'strings/ctype-utf8.c'
--- a/strings/ctype-utf8.c 2008-02-11 12:28:33 +0000
+++ b/strings/ctype-utf8.c 2009-07-02 10:15:33 +0000
@@ -4204,7 +4204,7 @@ static MY_CHARSET_HANDLER my_charset_fil
CHARSET_INFO my_charset_filename=
{
17,0,0, /* number */
- MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_HIDDEN,
+ MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_HIDDEN|MY_CS_NONASCII,
"filename", /* cs name */
"filename", /* name */
"", /* comment */
=== modified file 'strings/ctype.c'
--- a/strings/ctype.c 2009-04-25 10:05:32 +0000
+++ b/strings/ctype.c 2009-07-02 10:15:33 +0000
@@ -405,3 +405,23 @@ my_charset_is_8bit_pure_ascii(CHARSET_IN
}
return 1;
}
+
+
+/*
+ Shared function between conf_to_src and mysys.
+ Check if a 8bit character set is compatible with
+ ascii on the range 0x00..0x7F.
+*/
+my_bool
+my_charset_is_ascii_compatible(CHARSET_INFO *cs)
+{
+ uint i;
+ if (!cs->tab_to_uni)
+ return 1;
+ for (i= 0; i < 128; i++)
+ {
+ if (cs->tab_to_uni[i] != i)
+ return 0;
+ }
+ return 1;
+}
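
As a reading aid for the copy_and_convert() change above, here is a minimal standalone sketch of the same idea: copy the leading run of 7-bit ASCII bytes verbatim and stop at the first byte above 0x7F, where a slow converter would take over. The names copy_ascii_prefix and table_is_ascii_compatible are hypothetical and not part of the patch. The sketch uses memcpy for the four-byte step instead of the patch's i386-only pointer cast, and it assumes a bare 128-entry to-Unicode table rather than a CHARSET_INFO.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
  Hypothetical helper (not in the patch): returns 1 when the first 128
  entries of a to-Unicode table map every byte to itself, which is the
  condition the patch checks before the fast path may be used.
  A missing table is treated as compatible, as in the patch.
*/
static int table_is_ascii_compatible(const uint16_t *tab_to_uni)
{
  unsigned i;
  if (!tab_to_uni)
    return 1;
  for (i = 0; i < 128; i++)
  {
    if (tab_to_uni[i] != i)
      return 0;
  }
  return 1;
}

/*
  Hypothetical helper (not in the patch): copy the leading run of
  bytes in the range 0x00..0x7F and return how many bytes were copied.
  Copying stops at the first non-ASCII byte or when either buffer ends.
*/
static size_t copy_ascii_prefix(char *to, size_t to_length,
                                const char *from, size_t from_length)
{
  size_t length = to_length < from_length ? to_length : from_length;
  size_t copied = 0;

  /* Four bytes at a time; memcpy keeps this safe on any alignment. */
  while (length - copied >= 4)
  {
    uint32_t word;
    memcpy(&word, from + copied, sizeof(word));
    if (word & 0x80808080u)          /* some byte has its high bit set */
      break;
    memcpy(to + copied, &word, sizeof(word));
    copied += 4;
  }

  /* Byte-by-byte for the tail and for the bytes before a non-ASCII one. */
  while (copied < length && (unsigned char) from[copied] <= 0x7F)
  {
    to[copied] = from[copied];
    copied++;
  }
  return copied;
}
```

A caller would advance both pointers by the returned count and pass the remainder to the full conversion routine, which is what the patch does through copy_and_convert_extended().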
[Maria-developers] Updated (by Guest): dynamic versionning of query plan for performance metric and downgrade . (33)
by worklog-noreply@askmonty.org 30 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: dynamic versionning of query plan for performance metric and downgrade
.
CREATION DATE..: Tue, 30 Jun 2009, 21:37
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 33 (http://askmonty.org/worklog/?tid=33)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 30
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Wed, 01 Jul 2009, 00:27)=-=-
High Level Description modified.
--- /tmp/wklog.33.old.20217 2009-07-01 00:27:24.000000000 +0300
+++ /tmp/wklog.33.new.20217 2009-07-01 00:27:24.000000000 +0300
@@ -1,6 +1,6 @@
Just for comparing apple and oranges ,
-A lot of internal SUN/ORACLE benchmarks are reporting performance improvements.
+A lot of SUN/ORACLE benchmarks are reporting performance improvements.
But they are only tested on specific workload and predefined scenario like DBT2.
MariaDb could provide dynamic variable QP_vesion = 41|50|51 ...
-=-=(Fromdual - Tue, 30 Jun 2009, 21:53)=-=-
High Level Description modified.
--- /tmp/wklog.33.old.14359 2009-06-30 21:53:31.000000000 +0300
+++ /tmp/wklog.33.new.14359 2009-06-30 21:53:31.000000000 +0300
@@ -3,11 +3,13 @@
A lot of internal SUN/ORACLE benchmarks are reporting performance improvements.
But they are only tested on specific workload and predefined scenario like DBT2.
-MariaDb should provide dynamique QP and provide a ratio of efficiency in regard
-with the number of handler operations, each user would so on, be able to found
-out, if an improvement or a bug in the data acess path match is wokload . With
-such feature 5.0 to 5.1 would have found an inconsistant ratio of 2 to 1/1000
-with a serious fluctuation on time depending on closing , reopening , reclosing
-and reopening bugs like
+MariaDb could provide dynamic variable QP_vesion = 41|50|51 ...
+
+providing benchmarks with a ratio of efficiency in regard with the number of
+handler operations, each user would so on, be able to found if an improvement or
+a bug in the data acess path match is wokload .
+With such feature 5.0 to 5.1 migration would have provide an inconsistant metric
+of 2 to 1/1000 with a serious fluctuation on time depending on closing ,
+reopening , reclosing and reopening bugs like
http://bugs.mysql.com/bug.php?id=28404
-=-=(Fromdual - Tue, 30 Jun 2009, 21:44)=-=-
Title modified.
--- /tmp/wklog.33.old.13936 2009-06-30 21:44:50.000000000 +0300
+++ /tmp/wklog.33.new.13936 2009-06-30 21:44:50.000000000 +0300
@@ -1 +1 @@
-Dynamique versionning of query plan for performance metric and downgrade .
+dynamic versionning of query plan for performance metric and downgrade .
DESCRIPTION:
Just for comparing apples and oranges:
A lot of SUN/ORACLE benchmarks are reporting performance improvements,
but they are only tested on specific workloads and predefined scenarios like DBT2.
MariaDB could provide a dynamic variable QP_vesion = 41|50|51 ...
providing benchmarks with a ratio of efficiency with regard to the number of
handler operations. Each user would then be able to find out whether an
improvement or a bug in the data access path matches their workload.
With such a feature, the 5.0 to 5.1 migration would have shown an inconsistent
metric of 2 to 1/1000, with serious fluctuation over time depending on closing,
reopening, reclosing and reopening bugs like
http://bugs.mysql.com/bug.php?id=28404
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Fromdual): dynamic versionning of query plan for performance metric and downgrade . (33)
by worklog-noreply@askmonty.org 30 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: dynamic versionning of query plan for performance metric and downgrade
.
CREATION DATE..: Tue, 30 Jun 2009, 21:37
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 33 (http://askmonty.org/worklog/?tid=33)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 30
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Fromdual - Tue, 30 Jun 2009, 21:53)=-=-
High Level Description modified.
--- /tmp/wklog.33.old.14359 2009-06-30 21:53:31.000000000 +0300
+++ /tmp/wklog.33.new.14359 2009-06-30 21:53:31.000000000 +0300
@@ -3,11 +3,13 @@
A lot of internal SUN/ORACLE benchmarks are reporting performance improvements.
But they are only tested on specific workload and predefined scenario like DBT2.
-MariaDb should provide dynamique QP and provide a ratio of efficiency in regard
-with the number of handler operations, each user would so on, be able to found
-out, if an improvement or a bug in the data acess path match is wokload . With
-such feature 5.0 to 5.1 would have found an inconsistant ratio of 2 to 1/1000
-with a serious fluctuation on time depending on closing , reopening , reclosing
-and reopening bugs like
+MariaDb could provide dynamic variable QP_vesion = 41|50|51 ...
+
+providing benchmarks with a ratio of efficiency in regard with the number of
+handler operations, each user would so on, be able to found if an improvement or
+a bug in the data acess path match is wokload .
+With such feature 5.0 to 5.1 migration would have provide an inconsistant metric
+of 2 to 1/1000 with a serious fluctuation on time depending on closing ,
+reopening , reclosing and reopening bugs like
http://bugs.mysql.com/bug.php?id=28404
-=-=(Fromdual - Tue, 30 Jun 2009, 21:44)=-=-
Title modified.
--- /tmp/wklog.33.old.13936 2009-06-30 21:44:50.000000000 +0300
+++ /tmp/wklog.33.new.13936 2009-06-30 21:44:50.000000000 +0300
@@ -1 +1 @@
-Dynamique versionning of query plan for performance metric and downgrade .
+dynamic versionning of query plan for performance metric and downgrade .
DESCRIPTION:
Just for comparing apples and oranges:
A lot of internal SUN/ORACLE benchmarks are reporting performance improvements,
but they are only tested on specific workloads and predefined scenarios like DBT2.
MariaDB could provide a dynamic variable QP_vesion = 41|50|51 ...
providing benchmarks with a ratio of efficiency with regard to the number of
handler operations. Each user would then be able to find out whether an
improvement or a bug in the data access path matches their workload.
With such a feature, the 5.0 to 5.1 migration would have shown an inconsistent
metric of 2 to 1/1000, with serious fluctuation over time depending on closing,
reopening, reclosing and reopening bugs like
http://bugs.mysql.com/bug.php?id=28404
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Fromdual): dynamic versionning of query plan for performance metric and downgrade . (33)
by worklog-noreply@askmonty.org 30 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: dynamic versionning of query plan for performance metric and downgrade
.
CREATION DATE..: Tue, 30 Jun 2009, 21:37
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 33 (http://askmonty.org/worklog/?tid=33)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 30
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Fromdual - Tue, 30 Jun 2009, 21:44)=-=-
Title modified.
--- /tmp/wklog.33.old.13936 2009-06-30 21:44:50.000000000 +0300
+++ /tmp/wklog.33.new.13936 2009-06-30 21:44:50.000000000 +0300
@@ -1 +1 @@
-Dynamique versionning of query plan for performance metric and downgrade .
+dynamic versionning of query plan for performance metric and downgrade .
DESCRIPTION:
Just for comparing apples and oranges:
A lot of internal SUN/ORACLE benchmarks are reporting performance improvements,
but they are only tested on specific workloads and predefined scenarios like DBT2.
MariaDB should provide a dynamic QP and a ratio of efficiency with regard to
the number of handler operations, so that each user would be able to find
out whether an improvement or a bug in the data access path matches their
workload. With such a feature, 5.0 to 5.1 would have shown an inconsistent
ratio of 2 to 1/1000, with serious fluctuation over time depending on closing,
reopening, reclosing and reopening bugs like
http://bugs.mysql.com/bug.php?id=28404
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] New (by Fromdual): Dynamique versionning of query plan for performance metric and downgrade . (33)
by worklog-noreply@askmonty.org 30 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Dynamique versionning of query plan for performance metric and
downgrade .
CREATION DATE..: Tue, 30 Jun 2009, 21:37
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 33 (http://askmonty.org/worklog/?tid=33)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 30
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
DESCRIPTION:
Just for comparing apples and oranges:
A lot of internal SUN/ORACLE benchmarks are reporting performance improvements,
but they are only tested on specific workloads and predefined scenarios like DBT2.
MariaDB should provide a dynamic QP and a ratio of efficiency with regard to
the number of handler operations, so that each user would be able to find
out whether an improvement or a bug in the data access path matches their
workload. With such a feature, 5.0 to 5.1 would have shown an inconsistent
ratio of 2 to 1/1000, with serious fluctuation over time depending on closing,
reopening, reclosing and reopening bugs like
http://bugs.mysql.com/bug.php?id=28404
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Rev 2730: TEst commits 3 in file:///home/psergey/dev/maria-5.1-table-elim-emailcommittests/
by Sergey Petrunya 30 Jun '09
At file:///home/psergey/dev/maria-5.1-table-elim-emailcommittests/
------------------------------------------------------------
revno: 2730
revision-id: psergey(a)askmonty.org-20090630181749-29kxcglcbfaiyygp
parent: psergey(a)askmonty.org-20090630180521-32redd6z13g9tluc
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-emailcommittests
timestamp: Tue 2009-06-30 22:17:49 +0400
message:
TEst commits 3
=== modified file 'sql/opt_sum.cc'
--- a/sql/opt_sum.cc 2009-04-25 10:05:32 +0000
+++ b/sql/opt_sum.cc 2009-06-30 18:17:49 +0000
@@ -13,7 +13,7 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
-
+# error Test commits 3
/**
@file
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (psergey:2708)
by Sergey Petrunia 30 Jun '09
#At lp:maria based on revid:knielsen@knielsen-hq.org-20090602110359-n4q9gof38buucrny
2708 Sergey Petrunia 2009-06-30
MWL#17: Table elimination
- RC0 code
added:
mysql-test/r/table_elim.result
mysql-test/t/table_elim.test
sql-bench/test-table-elimination.sh
sql/opt_table_elimination.cc
modified:
libmysqld/Makefile.am
mysql-test/r/ps_11bugs.result
mysql-test/r/select.result
mysql-test/r/subselect.result
mysql-test/r/union.result
sql/CMakeLists.txt
sql/Makefile.am
sql/item.cc
sql/item.h
sql/item_subselect.cc
sql/item_subselect.h
sql/item_sum.cc
sql/item_sum.h
sql/sql_lex.cc
sql/sql_lex.h
sql/sql_select.cc
sql/sql_select.h
sql/table.h
per-file messages:
libmysqld/Makefile.am
MWL#17: Table elimination
- add opt_table_elimination.cc
mysql-test/r/ps_11bugs.result
MWL#17: Table elimination
- Update test results (the difference is because
we now recognize Item_ref(const_item) as const)
mysql-test/r/select.result
MWL#17: Table elimination
- Update test results
mysql-test/r/subselect.result
MWL#17: Table elimination
- Update test results (the difference is because
we now recognize Item_ref(const_item) as const)
mysql-test/r/table_elim.result
MWL#17: Table elimination
- Testcases
mysql-test/r/union.result
MWL#17: Table elimination
- Update test results (the difference is because
we now recognize Item_ref(const_item) as const)
mysql-test/t/table_elim.test
MWL#17: Table elimination
- Testcases
sql-bench/test-table-elimination.sh
MWL#17: Table elimination
- Benchmark which compares table elimination queries with no-table-elimination queries
sql/CMakeLists.txt
MWL#17: Table elimination
- add opt_table_elimination.cc
sql/Makefile.am
MWL#17: Table elimination
- add opt_table_elimination.cc
sql/item.cc
MWL#17: Table elimination
- Add Item_field::check_column_usage_processor
sql/item.h
MWL#17: Table elimination
- Add check_column_usage_processor()
sql/item_subselect.cc
MWL#17: Table elimination
- Make Item_subselect able to
= tell which particular items are referred to from inside the select
= tell whether it was eliminated
sql/item_subselect.h
MWL#17: Table elimination
- Make Item_subselect able to
= tell which particular items are referred to from inside the select
= tell whether it was eliminated
sql/item_sum.cc
MWL#17: Table elimination
- Fix Item_sum_sum::used_tables() to report tables whose columns it really needs
sql/item_sum.h
MWL#17: Table elimination
- Fix Item_sum_sum::used_tables() to report tables whose columns it really needs
sql/opt_table_elimination.cc
MWL#17: Table elimination
- Table elimination Module
sql/sql_lex.cc
MWL#17: Table elimination
- Collect Item_subselect::refers_to attribute
sql/sql_lex.h
MWL#17: Table elimination
- Collect Item_subselect::refers_to attribute
sql/sql_select.cc
MWL#17: Table elimination
- Make KEYUSE array code also collect/process "binding" equalities of the form
t.keyXpartY= func(t.keyXpartZ,...)
- Call table elimination function
- Make EXPLAIN not show eliminated tables/selects
- Added more comments
- Move definitions of FT_KEYPART, KEY_OPTIMIZE_* into sql_select.h as they are now
used in opt_table_elimination.cc
sql/sql_select.h
MWL#17: Table elimination
- Make KEYUSE array code also collect/process "binding" equalities of the form
t.keyXpartY= func(t.keyXpartZ,...)
- Call table elimination function
- Make EXPLAIN not show eliminated tables/selects
- Added more comments
- Move definitions of FT_KEYPART, KEY_OPTIMIZE_* into sql_select.h as they are now
used in opt_table_elimination.cc
sql/table.h
MWL#17: Table elimination
- More comments
- Add NESTED_JOIN::n_tables
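
The test results in this commit suggest the rule of thumb behind table elimination: the inner table of a LEFT JOIN can be dropped when the ON condition binds it through a unique key and none of its columns are needed anywhere else. The following is only a toy model of that condition, with a hypothetical struct and function name; the real logic lives in sql/opt_table_elimination.cc and handles far more cases.

```c
#include <stdbool.h>

/*
  Hypothetical, heavily simplified description of the inner table of a
  LEFT JOIN. Neither the struct nor its fields exist in the server code.
*/
struct inner_table_info
{
  /* The ON condition binds every part of a unique or primary key. */
  bool joined_on_unique_key;
  /* Some column is used in the select list, WHERE, GROUP BY or ORDER BY. */
  bool columns_used_outside_on;
};

/*
  Toy version of the elimination test: a unique-key match means the
  inner table can contribute at most one row per outer row, so if no
  column is read outside the ON condition the join cannot change the result.
*/
static bool can_eliminate_inner_table(const struct inner_table_info *t)
{
  return t->joined_on_unique_key && !t->columns_used_outside_on;
}
```

Under this model, "select t1.a from t1 left join t2 on t2.a=t1.a" qualifies because t2.a is a primary key and only t1.a is selected, while "select * ..." and the variants with t2.b in ORDER BY or GROUP BY do not.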
=== modified file 'libmysqld/Makefile.am'
--- a/libmysqld/Makefile.am 2009-03-12 22:27:35 +0000
+++ b/libmysqld/Makefile.am 2009-06-30 15:09:36 +0000
@@ -76,7 +76,7 @@ sqlsources = derror.cc field.cc field_co
rpl_filter.cc sql_partition.cc sql_builtin.cc sql_plugin.cc \
sql_tablespace.cc \
rpl_injector.cc my_user.c partition_info.cc \
- sql_servers.cc event_parse_data.cc
+ sql_servers.cc event_parse_data.cc opt_table_elimination.cc
libmysqld_int_a_SOURCES= $(libmysqld_sources)
nodist_libmysqld_int_a_SOURCES= $(libmysqlsources) $(sqlsources)
=== modified file 'mysql-test/r/ps_11bugs.result'
--- a/mysql-test/r/ps_11bugs.result 2008-10-08 11:23:53 +0000
+++ b/mysql-test/r/ps_11bugs.result 2009-06-30 15:09:36 +0000
@@ -121,8 +121,8 @@ insert into t1 values (1);
explain select * from t1 where 3 in (select (1+1) union select 1);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
-2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL No tables used
-3 DEPENDENT UNION NULL NULL NULL NULL NULL NULL NULL No tables used
+2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible HAVING
+3 DEPENDENT UNION NULL NULL NULL NULL NULL NULL NULL Impossible HAVING
NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL
select * from t1 where 3 in (select (1+1) union select 1);
a
=== modified file 'mysql-test/r/select.result'
--- a/mysql-test/r/select.result 2009-03-16 05:02:10 +0000
+++ b/mysql-test/r/select.result 2009-06-30 15:09:36 +0000
@@ -3585,7 +3585,6 @@ INSERT INTO t2 VALUES (1,'a'),(2,'b'),(3
EXPLAIN SELECT t1.a FROM t1 LEFT JOIN t2 ON t2.b=t1.b WHERE t1.a=3;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 const PRIMARY PRIMARY 4 const 1
-1 SIMPLE t2 const b b 22 const 1 Using index
DROP TABLE t1,t2;
CREATE TABLE t1(id int PRIMARY KEY, b int, e int);
CREATE TABLE t2(i int, a int, INDEX si(i), INDEX ai(a));
=== modified file 'mysql-test/r/subselect.result'
--- a/mysql-test/r/subselect.result 2009-04-25 09:04:38 +0000
+++ b/mysql-test/r/subselect.result 2009-06-30 15:09:36 +0000
@@ -4353,13 +4353,13 @@ id select_type table type possible_keys
1 PRIMARY t1 ALL NULL NULL NULL NULL 2 100.00
2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 2 100.00 Using temporary; Using filesort
Warnings:
-Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from `test`.`t1` group by `test`.`t1`.`a` having (<cache>(1) = <ref_null_helper>(1))))
+Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from `test`.`t1` group by `test`.`t1`.`a` having 1))
EXPLAIN EXTENDED SELECT 1 FROM t1 WHERE 1 IN (SELECT 1 FROM t1 WHERE a > 3 GROUP BY a);
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables
2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 2 100.00 Using where; Using temporary; Using filesort
Warnings:
-Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from `test`.`t1` where (`test`.`t1`.`a` > 3) group by `test`.`t1`.`a` having (<cache>(1) = <ref_null_helper>(1))))
+Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from `test`.`t1` where (`test`.`t1`.`a` > 3) group by `test`.`t1`.`a` having 1))
DROP TABLE t1;
End of 5.0 tests.
CREATE TABLE t1 (a INT, b INT);
=== added file 'mysql-test/r/table_elim.result'
--- a/mysql-test/r/table_elim.result 1970-01-01 00:00:00 +0000
+++ b/mysql-test/r/table_elim.result 2009-06-30 15:09:36 +0000
@@ -0,0 +1,204 @@
+drop table if exists t0, t1, t2, t3;
+drop view if exists v1, v2;
+create table t1 (a int);
+insert into t1 values (0),(1),(2),(3);
+create table t0 as select * from t1;
+create table t2 (a int primary key, b int)
+as select a, a as b from t1 where a in (1,2);
+create table t3 (a int primary key, b int)
+as select a, a as b from t1 where a in (1,3);
+# This will be eliminated:
+explain select t1.a from t1 left join t2 on t2.a=t1.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+explain extended select t1.a from t1 left join t2 on t2.a=t1.a;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4 100.00
+Warnings:
+Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` where 1
+select t1.a from t1 left join t2 on t2.a=t1.a;
+a
+0
+1
+2
+3
+# This will not be eliminated as t2.b is in in select list:
+explain select * from t1 left join t2 on t2.a=t1.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1
+# This will not be eliminated as t2.b is in in order list:
+explain select t1.a from t1 left join t2 on t2.a=t1.a order by t2.b;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort
+1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1
+# This will not be eliminated as t2.b is in group list:
+explain select t1.a from t1 left join t2 on t2.a=t1.a group by t2.b;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort
+1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1
+# This will not be eliminated as t2.b is in the WHERE
+explain select t1.a from t1 left join t2 on t2.a=t1.a where t2.b < 3 or t2.b is null;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1 Using where
+# Elimination of multiple tables:
+explain select t1.a from t1 left join (t2 join t3) on t2.a=t1.a and t3.a=t1.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+# Elimination of multiple tables (2):
+explain select t1.a from t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and t3.a=t1.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+# Elimination when done within an outer join nest:
+explain extended
+select t0.*
+from
+t0 left join (t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and
+t3.a=t1.a) on t0.a=t1.a;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 SIMPLE t0 ALL NULL NULL NULL NULL 4 100.00
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4 100.00
+Warnings:
+Note 1003 select `test`.`t0`.`a` AS `a` from `test`.`t0` left join (`test`.`t1`) on((`test`.`t0`.`a` = `test`.`t1`.`a`)) where 1
+# Elimination with aggregate functions
+explain select count(*) from t1 left join t2 on t2.a=t1.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+explain select count(1) from t1 left join t2 on t2.a=t1.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+explain select count(1) from t1 left join t2 on t2.a=t1.a group by t1.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort
+This must not use elimination:
+explain select count(1) from t1 left join t2 on t2.a=t1.a group by t2.a;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort
+1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1 Using index
+drop table t0, t1, t2, t3;
+create table t0 ( id integer, primary key (id));
+create table t1 (
+id integer,
+attr1 integer,
+primary key (id),
+key (attr1)
+);
+create table t2 (
+id integer,
+attr2 integer,
+fromdate date,
+primary key (id, fromdate),
+key (attr2,fromdate)
+);
+insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
+insert into t0 select A.id + 10*B.id from t0 A, t0 B where B.id > 0;
+insert into t1 select id, id from t0;
+insert into t2 select id, id, date_add('2009-06-22', interval id day) from t0;
+insert into t2 select id, id+1, date_add('2008-06-22', interval id day) from t0;
+create view v1 as
+select
+F.id, A1.attr1, A2.attr2
+from
+t0 F
+left join t1 A1 on A1.id=F.id
+left join t2 A2 on A2.id=F.id and
+A2.fromdate=(select MAX(fromdate) from
+t2 where id=A2.id);
+create view v2 as
+select
+F.id, A1.attr1, A2.attr2
+from
+t0 F
+left join t1 A1 on A1.id=F.id
+left join t2 A2 on A2.id=F.id and
+A2.fromdate=(select MAX(fromdate) from
+t2 where id=F.id);
+This should use one table:
+explain select id from v1 where id=2;
+id select_type table type possible_keys key key_len ref rows Extra
+1 PRIMARY F const PRIMARY PRIMARY 4 const 1 Using index
+This should use one table:
+explain extended select id from v1 where id in (1,2,3,4);
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY F range PRIMARY PRIMARY 4 NULL 4 100.00 Using where; Using index
+Warnings:
+Note 1276 Field or reference 'test.A2.id' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` where (`F`.`id` in (1,2,3,4))
+This should use facts and A1 tables:
+explain extended select id from v1 where attr1 between 12 and 14;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY A1 range PRIMARY,attr1 attr1 5 NULL 2 100.00 Using where
+1 PRIMARY F eq_ref PRIMARY PRIMARY 4 test.A1.id 1 100.00 Using index
+Warnings:
+Note 1276 Field or reference 'test.A2.id' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t1` `A1` where ((`F`.`id` = `A1`.`id`) and (`A1`.`attr1` between 12 and 14))
+This should use facts, A2 and its subquery:
+explain extended select id from v1 where attr2 between 12 and 14;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY A2 range PRIMARY,attr2 attr2 5 NULL 5 100.00 Using where
+1 PRIMARY F eq_ref PRIMARY PRIMARY 4 test.A2.id 1 100.00 Using index
+3 DEPENDENT SUBQUERY t2 ref PRIMARY PRIMARY 4 test.A2.id 2 100.00 Using index
+Warnings:
+Note 1276 Field or reference 'test.A2.id' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t2` `A2` where ((`F`.`id` = `A2`.`id`) and (`A2`.`attr2` between 12 and 14) and (`A2`.`fromdate` = (select max(`test`.`t2`.`fromdate`) AS `MAX(fromdate)` from `test`.`t2` where (`test`.`t2`.`id` = `A2`.`id`))))
+This should use one table:
+explain select id from v2 where id=2;
+id select_type table type possible_keys key key_len ref rows Extra
+1 PRIMARY F const PRIMARY PRIMARY 4 const 1 Using index
+This should use one table:
+explain extended select id from v2 where id in (1,2,3,4);
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY F range PRIMARY PRIMARY 4 NULL 4 100.00 Using where; Using index
+Warnings:
+Note 1276 Field or reference 'test.F.id' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` where (`F`.`id` in (1,2,3,4))
+This should use facts and A1 tables:
+explain extended select id from v2 where attr1 between 12 and 14;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY A1 range PRIMARY,attr1 attr1 5 NULL 2 100.00 Using where
+1 PRIMARY F eq_ref PRIMARY PRIMARY 4 test.A1.id 1 100.00 Using index
+Warnings:
+Note 1276 Field or reference 'test.F.id' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t1` `A1` where ((`F`.`id` = `A1`.`id`) and (`A1`.`attr1` between 12 and 14))
+This should use facts, A2 and its subquery:
+explain extended select id from v2 where attr2 between 12 and 14;
+id select_type table type possible_keys key key_len ref rows filtered Extra
+1 PRIMARY A2 range PRIMARY,attr2 attr2 5 NULL 5 100.00 Using where
+1 PRIMARY F eq_ref PRIMARY PRIMARY 4 test.A2.id 1 100.00 Using where; Using index
+3 DEPENDENT SUBQUERY t2 ref PRIMARY PRIMARY 4 test.F.id 2 100.00 Using index
+Warnings:
+Note 1276 Field or reference 'test.F.id' of SELECT #3 was resolved in SELECT #1
+Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t2` `A2` where ((`F`.`id` = `A2`.`id`) and (`A2`.`attr2` between 12 and 14) and (`A2`.`fromdate` = (select max(`test`.`t2`.`fromdate`) AS `MAX(fromdate)` from `test`.`t2` where (`test`.`t2`.`id` = `F`.`id`))))
+drop view v1, v2;
+drop table t0, t1, t2;
+create table t1 (a int);
+insert into t1 values (0),(1),(2),(3);
+create table t2 (pk1 int, pk2 int, pk3 int, col int, primary key(pk1, pk2, pk3));
+insert into t2 select a,a,a,a from t1;
+This must use only t1:
+explain select t1.* from t1 left join t2 on t2.pk1=t1.a and
+t2.pk2=t2.pk1+1 and
+t2.pk3=t2.pk2+1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+This must use only t1:
+explain select t1.* from t1 left join t2 on t2.pk1=t1.a and
+t2.pk3=t2.pk1+1 and
+t2.pk2=t2.pk3+1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+This must use both:
+explain select t1.* from t1 left join t2 on t2.pk1=t1.a and
+t2.pk3=t2.pk1+1 and
+t2.pk2=t2.pk3+t2.col;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+1 SIMPLE t2 ref PRIMARY PRIMARY 4 test.t1.a 1
+This must use only t1:
+explain select t1.* from t1 left join t2 on t2.pk2=t1.a and
+t2.pk1=t2.pk2+1 and
+t2.pk3=t2.pk1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 4
+drop table t1, t2;
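
The EXPLAIN outputs in table_elim.result suggest a two-part rule: the inner table of a LEFT JOIN is dropped only when none of its columns is referenced in the select list, WHERE, GROUP BY or ORDER BY, and the ON expression pins down a unique key of that table. The following C++ sketch models that rule with plain bitmaps. It is a simplified illustration, not the code in opt_table_elimination.cc; QueryModel, TableMap, table_bit and can_eliminate are hypothetical names, and the "ON binds a unique key" test is reduced to a single flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the server's table_map: one bit per table.
using TableMap = std::uint64_t;

// Simplified model of "outer LEFT JOIN inner ON ..." inside one SELECT.
struct QueryModel {
  TableMap select_list;      // tables referenced in the select list
  TableMap where_clause;     // tables referenced in WHERE
  TableMap group_by;         // tables referenced in GROUP BY
  TableMap order_by;         // tables referenced in ORDER BY
  bool on_binds_unique_key;  // ON fixes every part of a unique key of inner
};

// Bit that represents table number table_no in a TableMap.
inline TableMap table_bit(unsigned table_no) {
  assert(table_no < 64);
  return TableMap{1} << table_no;
}

// True when the inner table can be removed without changing the result.
bool can_eliminate(const QueryModel &q, unsigned inner_table_no) {
  const TableMap used_outside_on =
      q.select_list | q.where_clause | q.group_by | q.order_by;
  if (used_outside_on & table_bit(inner_table_no))
    return false;  // some clause still needs a column of the inner table
  return q.on_binds_unique_key;  // at most one match per outer row
}
```

With t1 as table 0 and t2 as table 1, the query "select t1.a from t1 left join t2 on t2.a=t1.a" passes both checks, while the variant with "order by t2.b" fails the first one.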
=== modified file 'mysql-test/r/union.result'
--- a/mysql-test/r/union.result 2009-03-19 10:18:52 +0000
+++ b/mysql-test/r/union.result 2009-06-30 15:09:36 +0000
@@ -522,7 +522,7 @@ id select_type table type possible_keys
2 UNION t2 const PRIMARY PRIMARY 4 const 1 100.00
NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL NULL
Warnings:
-Note 1003 (select '1' AS `a`,'1' AS `b` from `test`.`t1` where ('1' = 1)) union (select '1' AS `a`,'10' AS `b` from `test`.`t2` where ('1' = 1))
+Note 1003 (select '1' AS `a`,'1' AS `b` from `test`.`t1` where 1) union (select '1' AS `a`,'10' AS `b` from `test`.`t2` where 1)
(select * from t1 where a=5) union (select * from t2 where a=1);
a b
1 10
=== added file 'mysql-test/t/table_elim.test'
--- a/mysql-test/t/table_elim.test 1970-01-01 00:00:00 +0000
+++ b/mysql-test/t/table_elim.test 2009-06-30 15:09:36 +0000
@@ -0,0 +1,160 @@
+#
+# Table elimination (MWL#17) tests
+#
+--disable_warnings
+drop table if exists t0, t1, t2, t3;
+drop view if exists v1, v2;
+--enable_warnings
+
+create table t1 (a int);
+insert into t1 values (0),(1),(2),(3);
+create table t0 as select * from t1;
+
+create table t2 (a int primary key, b int)
+ as select a, a as b from t1 where a in (1,2);
+
+create table t3 (a int primary key, b int)
+ as select a, a as b from t1 where a in (1,3);
+
+--echo # This will be eliminated:
+explain select t1.a from t1 left join t2 on t2.a=t1.a;
+explain extended select t1.a from t1 left join t2 on t2.a=t1.a;
+
+select t1.a from t1 left join t2 on t2.a=t1.a;
+
+--echo # This will not be eliminated as t2.b is in in select list:
+explain select * from t1 left join t2 on t2.a=t1.a;
+
+--echo # This will not be eliminated as t2.b is in in order list:
+explain select t1.a from t1 left join t2 on t2.a=t1.a order by t2.b;
+
+--echo # This will not be eliminated as t2.b is in group list:
+explain select t1.a from t1 left join t2 on t2.a=t1.a group by t2.b;
+
+--echo # This will not be eliminated as t2.b is in the WHERE
+explain select t1.a from t1 left join t2 on t2.a=t1.a where t2.b < 3 or t2.b is null;
+
+--echo # Elimination of multiple tables:
+explain select t1.a from t1 left join (t2 join t3) on t2.a=t1.a and t3.a=t1.a;
+
+--echo # Elimination of multiple tables (2):
+explain select t1.a from t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and t3.a=t1.a;
+
+--echo # Elimination when done within an outer join nest:
+explain extended
+select t0.*
+from
+ t0 left join (t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and
+ t3.a=t1.a) on t0.a=t1.a;
+
+--echo # Elimination with aggregate functions
+explain select count(*) from t1 left join t2 on t2.a=t1.a;
+explain select count(1) from t1 left join t2 on t2.a=t1.a;
+explain select count(1) from t1 left join t2 on t2.a=t1.a group by t1.a;
+
+--echo This must not use elimination:
+explain select count(1) from t1 left join t2 on t2.a=t1.a group by t2.a;
+
+drop table t0, t1, t2, t3;
+
+# This will stand for elim_facts
+create table t0 ( id integer, primary key (id));
+
+# Attribute1, non-versioned
+create table t1 (
+ id integer,
+ attr1 integer,
+ primary key (id),
+ key (attr1)
+);
+
+# Attribute2, time-versioned
+create table t2 (
+ id integer,
+ attr2 integer,
+ fromdate date,
+ primary key (id, fromdate),
+ key (attr2,fromdate)
+);
+
+insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
+insert into t0 select A.id + 10*B.id from t0 A, t0 B where B.id > 0;
+
+insert into t1 select id, id from t0;
+insert into t2 select id, id, date_add('2009-06-22', interval id day) from t0;
+insert into t2 select id, id+1, date_add('2008-06-22', interval id day) from t0;
+
+create view v1 as
+select
+ F.id, A1.attr1, A2.attr2
+from
+ t0 F
+ left join t1 A1 on A1.id=F.id
+ left join t2 A2 on A2.id=F.id and
+ A2.fromdate=(select MAX(fromdate) from
+ t2 where id=A2.id);
+create view v2 as
+select
+ F.id, A1.attr1, A2.attr2
+from
+ t0 F
+ left join t1 A1 on A1.id=F.id
+ left join t2 A2 on A2.id=F.id and
+ A2.fromdate=(select MAX(fromdate) from
+ t2 where id=F.id);
+
+--echo This should use one table:
+explain select id from v1 where id=2;
+--echo This should use one table:
+explain extended select id from v1 where id in (1,2,3,4);
+--echo This should use facts and A1 tables:
+explain extended select id from v1 where attr1 between 12 and 14;
+--echo This should use facts, A2 and its subquery:
+explain extended select id from v1 where attr2 between 12 and 14;
+
+# Repeat for v2:
+
+--echo This should use one table:
+explain select id from v2 where id=2;
+--echo This should use one table:
+explain extended select id from v2 where id in (1,2,3,4);
+--echo This should use facts and A1 tables:
+explain extended select id from v2 where attr1 between 12 and 14;
+--echo This should use facts, A2 and its subquery:
+explain extended select id from v2 where attr2 between 12 and 14;
+
+drop view v1, v2;
+drop table t0, t1, t2;
+
+#
+# Tests for the code that uses t.keypartX=func(t.keypartY) equalities to
+# make table elimination inferences
+#
+create table t1 (a int);
+insert into t1 values (0),(1),(2),(3);
+
+create table t2 (pk1 int, pk2 int, pk3 int, col int, primary key(pk1, pk2, pk3));
+insert into t2 select a,a,a,a from t1;
+
+--echo This must use only t1:
+explain select t1.* from t1 left join t2 on t2.pk1=t1.a and
+ t2.pk2=t2.pk1+1 and
+ t2.pk3=t2.pk2+1;
+
+--echo This must use only t1:
+explain select t1.* from t1 left join t2 on t2.pk1=t1.a and
+ t2.pk3=t2.pk1+1 and
+ t2.pk2=t2.pk3+1;
+
+--echo This must use both:
+explain select t1.* from t1 left join t2 on t2.pk1=t1.a and
+ t2.pk3=t2.pk1+1 and
+ t2.pk2=t2.pk3+t2.col;
+
+--echo This must use only t1:
+explain select t1.* from t1 left join t2 on t2.pk2=t1.a and
+ t2.pk1=t2.pk2+1 and
+ t2.pk3=t2.pk1;
+
+drop table t1, t2;
+
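
The last group of tests in table_elim.test exercises inferences from equalities of the form t.keypartX=func(t.keypartY). One way to picture that inference is a fixed-point iteration: a key part becomes bound once every key part its expression reads is already bound, and the table is eliminable when all parts end up bound. The sketch below is an assumption about the shape of the reasoning, not the patch's implementation; KeyPartMap, KeyPartEquality and all_key_parts_bound are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bitmap with one bit per key part (hypothetical stand-in for key_part_map).
using KeyPartMap = std::uint32_t;

// One ON-clause conjunct of the form  t.keypart[target_part] = expr.
struct KeyPartEquality {
  unsigned target_part;      // index of the key part being assigned
  KeyPartMap needed_parts;   // key parts of the same table that expr reads
  bool uses_non_key_column;  // expr reads a column of t that is not in the key
};

// Returns true when every key part can be derived, directly or transitively,
// without reading non-key columns of the table itself.
bool all_key_parts_bound(unsigned n_parts,
                         const std::vector<KeyPartEquality> &eqs) {
  assert(n_parts > 0 && n_parts < 32);
  const KeyPartMap all_parts = (KeyPartMap{1} << n_parts) - 1;
  KeyPartMap bound = 0;
  bool progress = true;
  // Fixed-point iteration: keep binding parts until nothing changes.
  while (progress && bound != all_parts) {
    progress = false;
    for (const KeyPartEquality &eq : eqs) {
      assert(eq.target_part < n_parts);
      const KeyPartMap target = KeyPartMap{1} << eq.target_part;
      if (bound & target) continue;            // already bound
      if (eq.uses_non_key_column) continue;    // depends on the row itself
      if (eq.needed_parts & ~bound) continue;  // some input is not bound yet
      bound |= target;
      progress = true;
    }
  }
  return bound == all_parts;
}
```

Encoding pk1, pk2, pk3 as parts 0, 1, 2, the three "must use only t1" queries bind all parts regardless of the order in which the equalities are written, while the query with t2.pk2=t2.pk3+t2.col leaves pk2 unbound because its expression reads a non-key column.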
=== added file 'sql-bench/test-table-elimination.sh'
--- a/sql-bench/test-table-elimination.sh 1970-01-01 00:00:00 +0000
+++ b/sql-bench/test-table-elimination.sh 2009-06-30 15:09:36 +0000
@@ -0,0 +1,320 @@
+#!@PERL@
+# Test of table elimination feature
+
+use Cwd;
+use DBI;
+use Getopt::Long;
+use Benchmark;
+
+$opt_loop_count=100000;
+$opt_medium_loop_count=10000;
+$opt_small_loop_count=100;
+
+$pwd = cwd(); $pwd = "." if ($pwd eq '');
+require "$pwd/bench-init.pl" || die "Can't read Configuration file: $!\n";
+
+if ($opt_small_test)
+{
+ $opt_loop_count/=10;
+ $opt_medium_loop_count/=10;
+ $opt_small_loop_count/=10;
+}
+
+print "Testing table elimination feature\n";
+print "The test table has $opt_loop_count rows.\n\n";
+
+# A query to get the recent versions of all attributes:
+$select_current_full_facts="
+ select
+ F.id, A1.attr1, A2.attr2
+ from
+ elim_facts F
+ left join elim_attr1 A1 on A1.id=F.id
+ left join elim_attr2 A2 on A2.id=F.id and
+ A2.fromdate=(select MAX(fromdate) from
+ elim_attr2 where id=A2.id);
+";
+$select_current_full_facts="
+ select
+ F.id, A1.attr1, A2.attr2
+ from
+ elim_facts F
+ left join elim_attr1 A1 on A1.id=F.id
+ left join elim_attr2 A2 on A2.id=F.id and
+ A2.fromdate=(select MAX(fromdate) from
+ elim_attr2 where id=F.id);
+";
+# TODO: same as above but for some given date also?
+# TODO:
+
+
+####
+#### Connect and start timing
+####
+
+$dbh = $server->connect();
+$start_time=new Benchmark;
+
+####
+#### Create needed tables
+####
+
+goto select_test if ($opt_skip_create);
+
+print "Creating tables\n";
+$dbh->do("drop table elim_facts" . $server->{'drop_attr'});
+$dbh->do("drop table elim_attr1" . $server->{'drop_attr'});
+$dbh->do("drop table elim_attr2" . $server->{'drop_attr'});
+
+# The facts table
+do_many($dbh,$server->create("elim_facts",
+ ["id integer"],
+ ["primary key (id)"]));
+
+# Attribute1, non-versioned
+do_many($dbh,$server->create("elim_attr1",
+ ["id integer",
+ "attr1 integer"],
+ ["primary key (id)",
+ "key (attr1)"]));
+
+# Attribute2, time-versioned
+do_many($dbh,$server->create("elim_attr2",
+ ["id integer",
+ "attr2 integer",
+ "fromdate date"],
+ ["primary key (id, fromdate)",
+ "key (attr2,fromdate)"]));
+
+#NOTE: ignoring: if ($limits->{'views'})
+$dbh->do("drop view elim_current_facts");
+$dbh->do("create view elim_current_facts as $select_current_full_facts");
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"LOCK TABLES elim_facts, elim_attr1, elim_attr2 WRITE");
+}
+
+if ($opt_fast && defined($server->{vacuum}))
+{
+ $server->vacuum(1,\$dbh);
+}
+
+####
+#### Fill the facts table
+####
+$n_facts= $opt_loop_count;
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->{AutoCommit} = 0;
+}
+
+print "Inserting $n_facts rows into facts table\n";
+$loop_time=new Benchmark;
+
+$query="insert into elim_facts values (";
+for ($id=0; $id < $n_facts ; $id++)
+{
+ do_query($dbh,"$query $id)");
+}
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->commit;
+ $dbh->{AutoCommit} = 1;
+}
+
+$end_time=new Benchmark;
+print "Time to insert ($n_facts): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+
+####
+#### Fill attr1 table
+####
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->{AutoCommit} = 0;
+}
+
+print "Inserting $n_facts rows into attr1 table\n";
+$loop_time=new Benchmark;
+
+$query="insert into elim_attr1 values (";
+for ($id=0; $id < $n_facts ; $id++)
+{
+ $attr1= int(rand($n_facts));
+ do_query($dbh,"$query $id, $attr1)");
+}
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->commit;
+ $dbh->{AutoCommit} = 1;
+}
+
+$end_time=new Benchmark;
+print "Time to insert ($n_facts): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+
+####
+#### Fill attr2 table
+####
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->{AutoCommit} = 0;
+}
+
+print "Inserting $n_facts rows into attr2 table\n";
+$loop_time=new Benchmark;
+
+for ($id=0; $id < $n_facts ; $id++)
+{
+ # Two values for each $id - current one and obsolete one.
+ $attr1= int(rand($n_facts));
+ $query="insert into elim_attr2 values ($id, $attr1, now())";
+ do_query($dbh,$query);
+ $query="insert into elim_attr2 values ($id, $attr1, '2009-01-01')";
+ do_query($dbh,$query);
+}
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->commit;
+ $dbh->{AutoCommit} = 1;
+}
+
+$end_time=new Benchmark;
+print "Time to insert ($n_facts): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+
+####
+#### Finalize the database population
+####
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"UNLOCK TABLES");
+}
+
+if ($opt_fast && defined($server->{vacuum}))
+{
+ $server->vacuum(0,\$dbh,["elim_facts", "elim_attr1", "elim_attr2"]);
+}
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"LOCK TABLES elim_facts, elim_attr1, elim_attr2 WRITE");
+}
+
+####
+#### Do some selects on the table
+####
+
+select_test:
+
+#
+# The selects will be:
+# - N pk-lookups with all attributes
+# - pk-attribute-based lookup
+# - latest-attribute value based lookup.
+
+
+###
+### Bare facts select:
+###
+print "testing bare facts facts table\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= int(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select * from elim_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_bare_facts ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+
+###
+### Full facts select, no elimination:
+###
+print "testing full facts facts table\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= int(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select * from elim_current_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_two_attributes ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+###
+### Now with elimination: select only one attribute (attr1)
+###
+print "testing selection of one attribute\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= int(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select id, attr1 from elim_current_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_one_attribute ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+###
+### Now with elimination: select only one attribute (attr2)
+###
+print "testing selection of one attribute\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= int(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select id, attr2 from elim_current_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_one_attribute ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+
+###
+### TODO...
+###
+
+;
+
+####
+#### End of benchmark
+####
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"UNLOCK TABLES");
+}
+if (!$opt_skip_delete)
+{
+ do_query($dbh,"drop table elim_facts, elim_attr1, elim_attr2" . $server->{'drop_attr'});
+}
+
+if ($opt_fast && defined($server->{vacuum}))
+{
+ $server->vacuum(0,\$dbh);
+}
+
+$dbh->disconnect; # close connection
+
+end_benchmark($start_time);
+
=== modified file 'sql/CMakeLists.txt'
--- a/sql/CMakeLists.txt 2008-11-21 14:21:50 +0000
+++ b/sql/CMakeLists.txt 2009-06-30 15:09:36 +0000
@@ -73,7 +73,7 @@ ADD_EXECUTABLE(mysqld
partition_info.cc rpl_utility.cc rpl_injector.cc sql_locale.cc
rpl_rli.cc rpl_mi.cc sql_servers.cc
sql_connect.cc scheduler.cc
- sql_profile.cc event_parse_data.cc
+ sql_profile.cc event_parse_data.cc opt_table_elimination.cc
${PROJECT_SOURCE_DIR}/sql/sql_yacc.cc
${PROJECT_SOURCE_DIR}/sql/sql_yacc.h
${PROJECT_SOURCE_DIR}/include/mysqld_error.h
=== modified file 'sql/Makefile.am'
--- a/sql/Makefile.am 2009-03-12 22:27:35 +0000
+++ b/sql/Makefile.am 2009-06-30 15:09:36 +0000
@@ -121,7 +121,8 @@ mysqld_SOURCES = sql_lex.cc sql_handler.
event_queue.cc event_db_repository.cc events.cc \
sql_plugin.cc sql_binlog.cc \
sql_builtin.cc sql_tablespace.cc partition_info.cc \
- sql_servers.cc event_parse_data.cc
+ sql_servers.cc event_parse_data.cc \
+ opt_table_elimination.cc
nodist_mysqld_SOURCES = mini_client_errors.c pack.c client.c my_time.c my_user.c
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2009-04-25 10:05:32 +0000
+++ b/sql/item.cc 2009-06-30 15:09:36 +0000
@@ -1915,6 +1915,37 @@ void Item_field::reset_field(Field *f)
name= (char*) f->field_name;
}
+
+bool Item_field::check_column_usage_processor(uchar *arg)
+{
+ Field_processor_info* info=(Field_processor_info*)arg;
+
+ if (field->table == info->table)
+ {
+ /* It is not ok to use columns that are not part of the key of interest: */
+ if (!(field->part_of_key.is_set(info->keyno)))
+ return TRUE;
+
+ /* Find which key part we're using and mark it in needed_key_parts */
+ KEY *key= &field->table->key_info[info->keyno];
+ for (uint part= 0; part < key->key_parts; part++)
+ {
+ if (field->field_index == key->key_part[part].field->field_index)
+ {
+ if (part == info->forbidden_part)
+ return TRUE;
+ info->needed_key_parts |= key_part_map(1) << part;
+ break;
+ }
+ }
+ return FALSE;
+ }
+ else
+ info->used_tables |= this->used_tables();
+ return FALSE;
+}
+
+
const char *Item_ident::full_name() const
{
char *tmp;
@@ -3380,7 +3411,7 @@ static void mark_as_dependent(THD *thd,
/* store pointer on SELECT_LEX from which item is dependent */
if (mark_item)
mark_item->depended_from= last;
- current->mark_as_dependent(last);
+ current->mark_as_dependent(last, resolved_item);
if (thd->lex->describe & DESCRIBE_EXTENDED)
{
char warn_buff[MYSQL_ERRMSG_SIZE];
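
The new Item_field::check_column_usage_processor() in item.cc can be read as a per-column classifier: a column of another table is recorded in used_tables, a column of the table of interest is accepted only if it belongs to the index of interest and is not the forbidden key part, and accepted key parts are accumulated in needed_key_parts. The standalone sketch below restates that logic with simplified stand-in types so it can be compiled outside the server. ColumnRef, UsageInfo and check_column_usage are hypothetical names; the real code works on Field, KEY and Field_processor_info.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a resolved column reference (Item_field).
struct ColumnRef {
  unsigned table_no;     // which table the column belongs to
  unsigned field_index;  // position of the column within that table
};

// Simplified stand-in for Field_processor_info.
struct UsageInfo {
  unsigned table_no;                 // table of interest
  std::vector<unsigned> key_fields;  // field_index of each key part, in order
  unsigned forbidden_part;           // key part the expression must not read
  std::uint64_t used_tables;         // [out] other tables the expression reads
  std::uint32_t needed_key_parts;    // [out] key parts the expression reads
};

// Returns true to abort the walk (expression unusable) and false to continue,
// following the convention of the Item processors in the patch.
bool check_column_usage(const ColumnRef &col, UsageInfo &info) {
  if (col.table_no != info.table_no) {
    assert(col.table_no < 64);
    info.used_tables |= std::uint64_t{1} << col.table_no;
    return false;
  }
  for (unsigned part = 0; part < info.key_fields.size(); ++part) {
    if (info.key_fields[part] != col.field_index) continue;
    if (part == info.forbidden_part) return true;  // reads the forbidden part
    assert(part < 32);
    info.needed_key_parts |= std::uint32_t{1} << part;
    return false;
  }
  return true;  // column of the table of interest that is not in the key
}
```

For an ON conjunct such as t2.pk2=t2.pk1+1, the right-hand side would be walked with forbidden_part set to the index of pk2, leaving bit 0 (pk1) set in needed_key_parts.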
=== modified file 'sql/item.h'
--- a/sql/item.h 2009-04-25 10:05:32 +0000
+++ b/sql/item.h 2009-06-30 15:09:36 +0000
@@ -731,7 +731,11 @@ public:
virtual bool val_bool_result() { return val_bool(); }
virtual bool is_null_result() { return is_null(); }
- /* bit map of tables used by item */
+ /*
+ Bitmap of tables used by item
+ (note: if you need to check dependencies on individual columns, check out
+ check_column_usage_processor)
+ */
virtual table_map used_tables() const { return (table_map) 0L; }
/*
Return table map of tables that can't be NULL tables (tables that are
@@ -888,6 +892,8 @@ public:
virtual bool reset_query_id_processor(uchar *query_id_arg) { return 0; }
virtual bool is_expensive_processor(uchar *arg) { return 0; }
virtual bool register_field_in_read_map(uchar *arg) { return 0; }
+ virtual bool check_column_usage_processor(uchar *arg) { return 0; }
+ virtual bool mark_as_eliminated_processor(uchar *arg) { return 0; }
/*
Check if a partition function is allowed
SYNOPSIS
@@ -1011,6 +1017,18 @@ public:
bool eq_by_collation(Item *item, bool binary_cmp, CHARSET_INFO *cs);
};
+/* Data for Item::check_column_usage_processor */
+typedef struct
+{
+ TABLE *table; /* Table of interest */
+ uint keyno; /* Index of interest */
+ uint forbidden_part; /* Key part that the expression must not refer to */
+ /* [Set by processor] used tables, besides the table of interest */
+ table_map used_tables;
+ /* [Set by processor] Parts of index of interest that expression refers to */
+ uint needed_key_parts;
+} Field_processor_info;
+
class sp_head;
@@ -1477,6 +1495,7 @@ public:
bool find_item_in_field_list_processor(uchar *arg);
bool register_field_in_read_map(uchar *arg);
bool check_partition_func_processor(uchar *int_arg) {return FALSE;}
+ bool check_column_usage_processor(uchar *arg);
void cleanup();
bool result_as_longlong()
{
@@ -2203,6 +2222,10 @@ public:
if (!depended_from)
(*ref)->update_used_tables();
}
+ bool const_item() const
+ {
+ return (*ref)->const_item();
+ }
table_map not_null_tables() const { return (*ref)->not_null_tables(); }
void set_result_field(Field *field) { result_field= field; }
bool is_result_field() { return 1; }
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-01-31 21:22:44 +0000
+++ b/sql/item_subselect.cc 2009-06-30 15:09:36 +0000
@@ -39,7 +39,7 @@ inline Item * and_items(Item* cond, Item
Item_subselect::Item_subselect():
Item_result_field(), value_assigned(0), thd(0), substitution(0),
engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0),
- const_item_cache(1), engine_changed(0), changed(0), is_correlated(FALSE)
+ const_item_cache(1), in_fix_fields(0), engine_changed(0), changed(0), is_correlated(FALSE)
{
with_subselect= 1;
reset();
@@ -151,10 +151,14 @@ bool Item_subselect::fix_fields(THD *thd
DBUG_ASSERT(fixed == 0);
engine->set_thd((thd= thd_param));
+ if (!in_fix_fields)
+ refers_to.empty();
+ eliminated= FALSE;
if (check_stack_overrun(thd, STACK_MIN_SIZE, (uchar*)&res))
return TRUE;
-
+
+ in_fix_fields++;
res= engine->prepare();
// all transformation is done (used by prepared statements)
@@ -181,12 +185,14 @@ bool Item_subselect::fix_fields(THD *thd
if (!(*ref)->fixed)
ret= (*ref)->fix_fields(thd, ref);
thd->where= save_where;
+ in_fix_fields--;
return ret;
}
// Is it one field subselect?
if (engine->cols() > max_columns)
{
my_error(ER_OPERAND_COLUMNS, MYF(0), 1);
+ in_fix_fields--;
return TRUE;
}
fix_length_and_dec();
@@ -203,11 +209,30 @@ bool Item_subselect::fix_fields(THD *thd
fixed= 1;
err:
+ in_fix_fields--;
thd->where= save_where;
return res;
}
+bool Item_subselect::check_column_usage_processor(uchar *arg)
+{
+ List_iterator<Item> it(refers_to);
+ Item *item;
+ while ((item= it++))
+ {
+ if (item->walk(&Item::check_column_usage_processor,FALSE, arg))
+ return TRUE;
+ }
+ return FALSE;
+}
+
+bool Item_subselect::mark_as_eliminated_processor(uchar *arg)
+{
+ eliminated= TRUE;
+ return FALSE;
+}
+
bool Item_subselect::walk(Item_processor processor, bool walk_subquery,
uchar *argument)
{
@@ -225,6 +250,7 @@ bool Item_subselect::walk(Item_processor
if (lex->having && (lex->having)->walk(processor, walk_subquery,
argument))
return 1;
+ /* TODO: why does this walk WHERE/HAVING but not ON expressions of outer joins? */
while ((item=li++))
{
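
In the item_subselect.cc hunks, fix_fields() increments in_fix_fields once and then decrements it by hand on each exit path, and refers_to is cleared only when the counter is zero, that is, on the outermost call. As a design illustration only, the same bookkeeping can be expressed with a scope guard so that no return path can miss the decrement. DepthGuard and SubselectModel are hypothetical names, the vector stands in for List<Item>, and nothing here is part of the patch.

```cpp
#include <cassert>
#include <vector>

// Increments a counter on construction and decrements it on destruction.
class DepthGuard {
 public:
  explicit DepthGuard(int &depth) : depth_(depth) { ++depth_; }
  ~DepthGuard() { --depth_; }
  DepthGuard(const DepthGuard &) = delete;
  DepthGuard &operator=(const DepthGuard &) = delete;

 private:
  int &depth_;
};

// Hypothetical stand-in for Item_subselect; only the bookkeeping is modelled.
struct SubselectModel {
  int in_fix_fields = 0;
  std::vector<int> refers_to;  // stand-in for List<Item>

  // `nested` simulates fix_fields() being re-entered during prepare();
  // `fail` simulates an error return on the innermost call.
  bool fix_fields(int nested, bool fail) {
    if (in_fix_fields == 0) refers_to.clear();  // reset on outermost call only
    DepthGuard guard(in_fix_fields);
    assert(in_fix_fields > 0);
    refers_to.push_back(in_fix_fields);  // pretend a reference was resolved
    if (nested > 0 && fix_fields(nested - 1, fail)) return true;
    if (fail) return true;
    return false;
  }
};
```

Whether the call succeeds or fails, in_fix_fields is back at zero afterwards, and references collected by nested calls survive until the next outermost call.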
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2008-02-22 10:30:33 +0000
+++ b/sql/item_subselect.h 2009-06-30 15:09:36 +0000
@@ -52,8 +52,16 @@ protected:
bool have_to_be_excluded;
/* cache of constant state */
bool const_item_cache;
-
+
public:
+ /*
+ References from inside the subquery to the select that this predicate is
+ in. References to parent selects are not included.
+ */
+ List<Item> refers_to;
+ int in_fix_fields;
+ bool eliminated;
+
/* changed engine indicator */
bool engine_changed;
/* subquery is transformed */
@@ -126,6 +134,8 @@ public:
virtual void reset_value_registration() {}
enum_parsing_place place() { return parsing_place; }
bool walk(Item_processor processor, bool walk_subquery, uchar *arg);
+ bool mark_as_eliminated_processor(uchar *arg);
+ bool check_column_usage_processor(uchar *arg);
/**
Get the SELECT_LEX structure associated with this Item.
=== modified file 'sql/item_sum.cc'
--- a/sql/item_sum.cc 2009-04-25 09:04:38 +0000
+++ b/sql/item_sum.cc 2009-06-30 15:09:36 +0000
@@ -350,7 +350,7 @@ bool Item_sum::register_sum_func(THD *th
sl= sl->master_unit()->outer_select() )
sl->master_unit()->item->with_sum_func= 1;
}
- thd->lex->current_select->mark_as_dependent(aggr_sel);
+ thd->lex->current_select->mark_as_dependent(aggr_sel, NULL);
return FALSE;
}
@@ -542,11 +542,6 @@ void Item_sum::update_used_tables ()
args[i]->update_used_tables();
used_tables_cache|= args[i]->used_tables();
}
-
- used_tables_cache&= PSEUDO_TABLE_BITS;
-
- /* the aggregate function is aggregated into its local context */
- used_tables_cache |= (1 << aggr_sel->join->tables) - 1;
}
}
=== modified file 'sql/item_sum.h'
--- a/sql/item_sum.h 2008-12-09 19:43:10 +0000
+++ b/sql/item_sum.h 2009-06-30 15:09:36 +0000
@@ -255,6 +255,12 @@ protected:
*/
Item **orig_args, *tmp_orig_args[2];
table_map used_tables_cache;
+
+ /*
+ TRUE <=> We've managed to calculate the value of this Item in
+ opt_sum_query(), hence it can be considered constant at all subsequent
+ steps.
+ */
bool forced_const;
public:
@@ -341,6 +347,15 @@ public:
virtual const char *func_name() const= 0;
virtual Item *result_item(Field *field)
{ return new Item_field(field); }
+ /*
+ Return bitmap of tables that are needed to evaluate the item.
+
+ The implementation takes into account the used strategy: items resolved
+ at optimization phase will report 0.
+ Items that depend on the number of join output records, but not columns
+ of any particular table (like COUNT(*)) will report 0 from used_tables(),
+ but will still return false from const_item().
+ */
table_map used_tables() const { return used_tables_cache; }
void update_used_tables ();
void cleanup()
=== added file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 1970-01-01 00:00:00 +0000
+++ b/sql/opt_table_elimination.cc 2009-06-30 15:09:36 +0000
@@ -0,0 +1,494 @@
+/**
+ @file
+
+ @brief
+ Table Elimination Module
+
+ @defgroup Table_Elimination Table Elimination Module
+ @{
+*/
+
+#ifdef USE_PRAGMA_IMPLEMENTATION
+#pragma implementation // gcc: Class implementation
+#endif
+
+#include "mysql_priv.h"
+#include "sql_select.h"
+
+/*
+ OVERVIEW
+
+ The module has one entry point - eliminate_tables() function, which one
+ needs to call (once) sometime after update_ref_and_keys() but before the
+ join optimization.
+ eliminate_tables() operates over the JOIN structures. Logically, it
+ removes the right sides of outer join nests. Physically, it changes the
+ following members:
+
+ * Eliminated tables are marked as constant and moved to the front of the
+ join order.
+ * In addition to this, they are recorded in JOIN::eliminated_tables bitmap.
+
+ * All join nests have their NESTED_JOIN::n_tables updated to discount
+ the eliminated tables
+
+ * Items that became disused because they were in the ON expression of an
+ eliminated outer join are notified by means of the Item tree walk which
+ calls Item::mark_as_eliminated_processor for every item
+ - At the moment the only Item that cares is Item_subselect with its
+ Item_subselect::eliminated flag which is used by EXPLAIN code to
+ check if the subquery should be shown in EXPLAIN.
+
+ Table elimination is redone on every PS re-execution.
+*/
+
+static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl);
+static bool table_has_one_match(TABLE *table, table_map bound_tables,
+ bool *multiple_matches);
+static uint
+eliminate_tables_for_list(JOIN *join, TABLE **leaves_arr,
+ List<TABLE_LIST> *join_list,
+ bool its_outer_join,
+ table_map tables_in_list,
+ table_map tables_used_elsewhere,
+ bool *multiple_matches);
+static bool
+extra_keyuses_bind_all_keyparts(table_map bound_tables, TABLE *table,
+ KEYUSE *key_start, KEYUSE *key_end,
+ uint n_keyuses, table_map bound_parts);
+
+/*
+ Perform table elimination
+
+ SYNOPSIS
+ eliminate_tables()
+ join INOUT Join to work on. For each eliminated table,
+ join->const_tables is incremented and the table
+ is added to join->const_table_map and
+ join->eliminated_tables.
+
+ DESCRIPTION
+ This function is the entry point for table elimination.
+ The idea behind table elimination is that if we have an outer join:
+
+ SELECT * FROM t1 LEFT JOIN
+ (t2 JOIN t3) ON t3.primary_key=t1.col AND
+ t4.primary_key=t2.col
+ such that
+
+ 1. columns of the inner tables are not used anywhere outside the outer
+ join (not in WHERE, not in GROUP/ORDER BY clause, not in select list,
+ etc.), and
+ 2. inner side of the outer join is guaranteed to produce at most one
+ record combination for each record combination of outer tables.
+
+ then the inner side of the outer join can be removed from the query.
+ This is because it will always produce one matching record (either a
+ real match or a NULL-complemented record combination), and since there
+ are no references to columns of the inner tables anywhere, it doesn't
+ matter which record combination it was.
+
+ This function primarily handles checking #1. It collects a bitmap of
+ tables that are not used in select list/GROUP BY/ORDER BY/HAVING/etc and
+ thus can possibly be eliminated.
+
+ SIDE EFFECTS
+ See the OVERVIEW section at the top of this file.
+
+*/
+
+void eliminate_tables(JOIN *join)
+{
+ Item *item;
+ table_map used_tables;
+ DBUG_ENTER("eliminate_tables");
+
+ DBUG_ASSERT(join->eliminated_tables == 0);
+
+ /* If there are no outer joins, we have nothing to eliminate: */
+ if (!join->outer_join)
+ DBUG_VOID_RETURN;
+
+ /* Find the tables that are referred to from WHERE/HAVING */
+ used_tables= (join->conds? join->conds->used_tables() : 0) |
+ (join->having? join->having->used_tables() : 0);
+
+ /* Add tables referred to from the select list */
+ List_iterator<Item> it(join->fields_list);
+ while ((item= it++))
+ used_tables |= item->used_tables();
+
+ /* Add tables referred to from ORDER BY and GROUP BY lists */
+ ORDER *all_lists[]= { join->order, join->group_list};
+ for (int i=0; i < 2; i++)
+ {
+ for (ORDER *cur_list= all_lists[i]; cur_list; cur_list= cur_list->next)
+ used_tables |= (*(cur_list->item))->used_tables();
+ }
+
+ THD* thd= join->thd;
+ if (join->select_lex == &thd->lex->select_lex)
+ {
+ /* Multi-table UPDATE and DELETE: don't eliminate the tables we modify: */
+ used_tables |= thd->table_map_for_update;
+
+ /* Multi-table UPDATE: don't eliminate tables referred to from SET clause */
+ if (thd->lex->sql_command == SQLCOM_UPDATE_MULTI)
+ {
+ List_iterator<Item> it2(thd->lex->value_list);
+ while ((item= it2++))
+ used_tables |= item->used_tables();
+ }
+ }
+
+ table_map all_tables= join->all_tables_map();
+ if (all_tables & ~used_tables)
+ {
+ /* There are some tables that we probably could eliminate. Try it. */
+ TABLE *leaves_array[MAX_TABLES];
+ bool multiple_matches= FALSE;
+ eliminate_tables_for_list(join, leaves_array, join->join_list, FALSE,
+ all_tables, used_tables, &multiple_matches);
+ }
+ DBUG_VOID_RETURN;
+}
+
+/*
+ Perform table elimination in a given join list
+
+ SYNOPSIS
+ eliminate_tables_for_list()
+ join The join
+ leaves_arr OUT Store here an array of leaf (base) tables that
+ are descendants of the join_list, and increment
+ the pointer to point right above the array.
+ join_list Join list to work on
+ its_outer_join TRUE <=> join_list is an inner side of an outer
+ join
+ FALSE <=> otherwise (this is top-level join list)
+ tables_in_list Bitmap of tables embedded in the join_list.
+ tables_used_elsewhere Bitmap of tables that are referred to from
+ somewhere outside of the join list (e.g.
+ select list, HAVING, etc).
+
+ DESCRIPTION
+ Perform table elimination for a join list.
+ Try eliminating children nests first.
+ The "all tables in join nest can produce only one matching record
+ combination" property checking is modeled after constant table detection,
+ plus we reuse info from attempts to eliminate child join nests.
+
+ RETURN
+ Number of children left after elimination. 0 means everything was
+ eliminated.
+*/
+static uint
+eliminate_tables_for_list(JOIN *join, TABLE **leaves_arr,
+ List<TABLE_LIST> *join_list,
+ bool its_outer_join,
+ table_map tables_in_list,
+ table_map tables_used_elsewhere,
+ bool *multiple_matches)
+{
+ TABLE_LIST *tbl;
+ List_iterator<TABLE_LIST> it(*join_list);
+ table_map tables_used_on_left= 0;
+ TABLE **cur_table= leaves_arr;
+ bool children_have_multiple_matches= FALSE;
+ uint remaining_children= 0;
+
+ while ((tbl= it++))
+ {
+ if (tbl->on_expr)
+ {
+ table_map outside_used_tables= tables_used_elsewhere |
+ tables_used_on_left;
+ bool multiple_matches= FALSE;
+ if (tbl->nested_join)
+ {
+ /* This is "... LEFT JOIN (join_nest) ON cond" */
+ uint n;
+ if (!(n= eliminate_tables_for_list(join, cur_table,
+ &tbl->nested_join->join_list, TRUE,
+ tbl->nested_join->used_tables,
+ outside_used_tables,
+ &multiple_matches)))
+ {
+ mark_as_eliminated(join, tbl);
+ }
+ else
+ remaining_children++;
+ tbl->nested_join->n_tables= n;
+ }
+ else
+ {
+ /* This is "... LEFT JOIN tbl ON cond" */
+ if (!(tbl->table->map & outside_used_tables) &&
+ table_has_one_match(tbl->table, join->all_tables_map(),
+ &multiple_matches))
+ {
+ mark_as_eliminated(join, tbl);
+ }
+ else
+ remaining_children++;
+ }
+ tables_used_on_left |= tbl->on_expr->used_tables();
+ children_have_multiple_matches= children_have_multiple_matches ||
+ multiple_matches;
+ }
+ else
+ {
+ DBUG_ASSERT(!tbl->nested_join);
+ remaining_children++;
+ }
+
+ if (tbl->table)
+ *(cur_table++)= tbl->table;
+ }
+
+ *multiple_matches |= children_have_multiple_matches;
+
+ /* Try eliminating the nest we're called for */
+ if (its_outer_join && !children_have_multiple_matches &&
+ !(tables_in_list & tables_used_elsewhere))
+ {
+ table_map bound_tables= join->const_table_map | (join->all_tables_map() &
+ ~tables_in_list);
+ table_map old_bound_tables;
+ TABLE **leaves_end= cur_table;
+ /*
+ Do the same as the const table search: try to expand the set of bound
+ tables until it covers all tables in the join_list
+ */
+ do
+ {
+ old_bound_tables= bound_tables;
+ for (cur_table= leaves_arr; cur_table != leaves_end; cur_table++)
+ {
+ if (!((*cur_table)->map & join->eliminated_tables) &&
+ table_has_one_match(*cur_table, bound_tables, multiple_matches))
+ {
+ bound_tables |= (*cur_table)->map;
+ }
+ }
+ } while (old_bound_tables != bound_tables);
+
+ if (!(tables_in_list & ~bound_tables))
+ {
+ /*
+ This join_list can be eliminated. Signal about this to the caller by
+ returning number of tables.
+ */
+ remaining_children= 0;
+ }
+ }
+ return remaining_children;
+}
+
+
+/*
+ Check if the table will produce at most one matching record
+
+ SYNOPSIS
+ table_has_one_match()
+ table The [base] table being checked
+ bound_tables Tables that should be considered bound.
+ multiple_matches OUT Set to TRUE when there is no way we could
+ find a limitation that would give us one-match
+ property.
+
+ DESCRIPTION
+ Check if table will produce at most one matching record for each record
+ combination of tables in bound_tables bitmap.
+
+ The check is based on ref analysis data, KEYUSE structures. We're
+ handling two cases:
+
+ 1. Table has a UNIQUE KEY(uk_col_1, ... uk_col_N), and for each uk_col_i
+ there is a KEYUSE that represents a limitation in form
+
+ table.uk_col_i = func(bound_tables) (X)
+
+ 2. Same as above but we also handle limitations in form
+
+ table.uk_col_i = func(bound_tables, uk_col_j1, ... uk_col_j2) (XX)
+
+ where values of uk_col_jN are known to be bound because for them we
+ have an equality of form (X) or (XX).
+
+ RETURN
+ TRUE Yes, at most one match
+ FALSE No
+*/
+
+static bool table_has_one_match(TABLE *table, table_map bound_tables,
+ bool *multiple_matches)
+{
+ KEYUSE *keyuse= table->reginfo.join_tab->keyuse;
+ if (keyuse)
+ {
+ while (keyuse->table == table)
+ {
+ uint key= keyuse->key;
+ key_part_map bound_parts=0;
+ uint n_unusable=0;
+ bool ft_key= test(keyuse->keypart == FT_KEYPART);
+ KEY *keyinfo= table->key_info + key;
+ KEYUSE *key_start = keyuse;
+
+ do /* For each keypart and each way to read it */
+ {
+ if (keyuse->type == KEYUSE_USABLE)
+ {
+ if(!(keyuse->used_tables & ~bound_tables) &&
+ !(keyuse->optimize & KEY_OPTIMIZE_REF_OR_NULL))
+ {
+ bound_parts |= keyuse->keypart_map;
+ }
+ }
+ else
+ n_unusable++;
+ keyuse++;
+ } while (keyuse->table == table && keyuse->key == key);
+
+ if (ft_key || ((keyinfo->flags & (HA_NOSAME | HA_NULL_PART_KEY))
+ != HA_NOSAME))
+ {
+ continue;
+ }
+
+ if (bound_parts == PREV_BITS(key_part_map, keyinfo->key_parts) ||
+ extra_keyuses_bind_all_keyparts(bound_tables, table, key_start,
+ keyuse, n_unusable, bound_parts))
+ {
+ return TRUE;
+ }
+ }
+ }
+ return FALSE;
+}
+
+
+/*
+ Check if KEYUSE elements with unusable==TRUE bind all parts of the key
+
+ SYNOPSIS
+
+ extra_keyuses_bind_all_keyparts()
+ bound_tables Tables which can be considered constants
+ table Table we're examining
+ key_start Start of KEYUSE array with elements describing the key
+ of interest
+ key_end End of the array + 1
+ n_keyuses Number of elements in the array that have unusable==TRUE
+ bound_parts Key parts whose values are known to be bound.
+
+ DESCRIPTION
+ Check if unusable KEYUSE elements cause all parts of key to be bound. An
+ unusable keyuse element makes a keypart bound when it
+ represents the following:
+
+ keyXpartY=func(bound_columns, preceding_tables)
+
+ RETURN
+ TRUE Yes, all parts of the key are bound
+ FALSE No
+*/
+
+static bool
+extra_keyuses_bind_all_keyparts(table_map bound_tables, TABLE *table,
+ KEYUSE *key_start, KEYUSE *key_end,
+ uint n_keyuses, table_map bound_parts)
+{
+ /*
+ We need
+ - some 'unusable' KEYUSE elements to work on
+ - some keyparts to be already bound to start inferences:
+ */
+ if (n_keyuses && bound_parts)
+ {
+ KEY *keyinfo= table->key_info + key_start->key;
+ bool bound_more_parts;
+ do
+ {
+ bound_more_parts= FALSE;
+ for (KEYUSE *k= key_start; k!=key_end; k++)
+ {
+ if (k->type == KEYUSE_UNKNOWN)
+ {
+ Field_processor_info fp= {table, k->key, k->keypart, 0, 0};
+ if (k->val->walk(&Item::check_column_usage_processor, FALSE,
+ (uchar*)&fp))
+ k->type= KEYUSE_NO_BIND;
+ else
+ {
+ k->used_tables= fp.used_tables;
+ k->keypart_map= fp.needed_key_parts;
+ k->type= KEYUSE_BIND;
+ }
+ }
+
+ if (k->type == KEYUSE_BIND)
+ {
+ /*
+ If this is a binding keyuse, such that
+ - all tables it refers to are bound,
+ - all parts it refers to are bound
+ - but the key part it binds is not itself bound
+ */
+ if (!(k->used_tables & ~bound_tables) &&
+ !(k->keypart_map & ~bound_parts) &&
+ !(bound_parts & key_part_map(1) << k->keypart))
+ {
+ bound_parts|= key_part_map(1) << k->keypart;
+ if (bound_parts == PREV_BITS(key_part_map, keyinfo->key_parts))
+ return TRUE;
+ bound_more_parts= TRUE;
+ }
+ }
+ }
+ } while (bound_more_parts);
+ }
+ return FALSE;
+}
+
+
+/*
+ Mark one table or the whole join nest as eliminated.
+*/
+static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl)
+{
+ TABLE *table;
+ /*
+ NOTE: there are TABLE_LIST objects that have
+ tbl->table!= NULL && tbl->nested_join!=NULL and
+ tbl->table == tbl->nested_join->join_list->element(..)->table
+ */
+ if (tbl->nested_join)
+ {
+ TABLE_LIST *child;
+ List_iterator<TABLE_LIST> it(tbl->nested_join->join_list);
+ while ((child= it++))
+ mark_as_eliminated(join, child);
+ }
+ else if ((table= tbl->table))
+ {
+ JOIN_TAB *tab= tbl->table->reginfo.join_tab;
+ if (!(join->const_table_map & tab->table->map))
+ {
+ DBUG_PRINT("info", ("Eliminated table %s", table->alias));
+ tab->type= JT_CONST;
+ join->eliminated_tables |= table->map;
+ join->const_table_map|= table->map;
+ set_position(join, join->const_tables++, tab, (KEYUSE*)0);
+ }
+ }
+
+ if (tbl->on_expr)
+ tbl->on_expr->walk(&Item::mark_as_eliminated_processor, FALSE, NULL);
+}
+
+/**
+ @} (end of group Table_Elimination)
+*/
+
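For readers following the patch, the fixed-point search done by eliminate_tables_for_list() together with table_has_one_match() can be modelled outside the server. The following is only a simplified standalone sketch, not the patch's code: TableInfo, make_table() and nest_is_fully_bound() are hypothetical names, table maps are plain 64-bit masks, each table has a single unique key, and only equalities of form (X) are handled (one equality per key part).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;   // one bit per table

// Hypothetical stand-in for a table with one UNIQUE key.
// part_used_tables[i] is the set of tables referenced by the expression
// in "table.uk_col_i = func(...)".
struct TableInfo
{
  table_map map;
  std::vector<table_map> part_used_tables;
};

// Hypothetical helper to build a TableInfo.
static TableInfo make_table(table_map map,
                            const std::vector<table_map> &part_used_tables)
{
  TableInfo t;
  t.map= map;
  t.part_used_tables= part_used_tables;
  return t;
}

// At most one match: every key part is compared with an expression that
// refers to bound tables only.
static bool table_has_one_match(const TableInfo &t, table_map bound_tables)
{
  if (t.part_used_tables.empty())
    return false;
  for (std::size_t i= 0; i < t.part_used_tables.size(); i++)
  {
    if (t.part_used_tables[i] & ~bound_tables)
      return false;
  }
  return true;
}

// Expand the set of bound tables until it stops growing, then check that
// it covers every table of the nest.
static bool nest_is_fully_bound(const std::vector<TableInfo> &nest,
                                table_map outer_tables)
{
  table_map tables_in_list= 0;
  for (std::size_t i= 0; i < nest.size(); i++)
    tables_in_list|= nest[i].map;

  table_map bound_tables= outer_tables;
  table_map old_bound_tables;
  do
  {
    old_bound_tables= bound_tables;
    for (std::size_t i= 0; i < nest.size(); i++)
    {
      if (table_has_one_match(nest[i], bound_tables))
        bound_tables|= nest[i].map;
    }
  } while (old_bound_tables != bound_tables);

  return !(tables_in_list & ~bound_tables);
}
```

With t1 as the outer table and a nest (t2 JOIN t3) where t2's key is bound by t1 and t3's key is bound by t2, the loop needs two passes to bind both tables. If t3's key is compared only with a column of t3 itself, the nest never becomes fully bound.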
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2009-04-25 10:05:32 +0000
+++ b/sql/sql_lex.cc 2009-06-30 15:09:36 +0000
@@ -1778,7 +1778,7 @@ void st_select_lex_unit::exclude_tree()
'last' should be reachable from this st_select_lex_node
*/
-void st_select_lex::mark_as_dependent(st_select_lex *last)
+void st_select_lex::mark_as_dependent(st_select_lex *last, Item *dependency)
{
/*
Mark all selects from resolved to 1 before select where was
@@ -1804,6 +1804,8 @@ void st_select_lex::mark_as_dependent(st
}
is_correlated= TRUE;
this->master_unit()->item->is_correlated= TRUE;
+ if (dependency)
+ this->master_unit()->item->refers_to.push_back(dependency);
}
bool st_select_lex_node::set_braces(bool value) { return 1; }
=== modified file 'sql/sql_lex.h'
--- a/sql/sql_lex.h 2009-03-17 20:29:24 +0000
+++ b/sql/sql_lex.h 2009-06-30 15:09:36 +0000
@@ -743,7 +743,7 @@ public:
return master_unit()->return_after_parsing();
}
- void mark_as_dependent(st_select_lex *last);
+ void mark_as_dependent(st_select_lex *last, Item *dependency);
bool set_braces(bool value);
bool inc_in_sum_expr();
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-05-19 09:28:05 +0000
+++ b/sql/sql_select.cc 2009-06-30 15:09:36 +0000
@@ -60,7 +60,6 @@ static bool update_ref_and_keys(THD *thd
table_map table_map, SELECT_LEX *select_lex,
st_sargable_param **sargables);
static int sort_keyuse(KEYUSE *a,KEYUSE *b);
-static void set_position(JOIN *join,uint index,JOIN_TAB *table,KEYUSE *key);
static bool create_ref_for_key(JOIN *join, JOIN_TAB *j, KEYUSE *org_keyuse,
table_map used_tables);
static bool choose_plan(JOIN *join,table_map join_tables);
@@ -2381,6 +2380,13 @@ mysql_select(THD *thd, Item ***rref_poin
}
else
{
+ /*
+ When in EXPLAIN, delay deleting the joins so that they are still
+ available when we're producing EXPLAIN EXTENDED warning text.
+ */
+ if (select_options & SELECT_DESCRIBE)
+ free_join= 0;
+
if (!(join= new JOIN(thd, fields, select_options, result)))
DBUG_RETURN(TRUE);
thd_proc_info(thd, "init");
@@ -2468,6 +2474,7 @@ static ha_rows get_quick_record_count(TH
DBUG_RETURN(HA_POS_ERROR); /* This shouldn't happend */
}
+
/*
This structure is used to collect info on potentially sargable
predicates in order to check whether they become sargable after
@@ -2646,24 +2653,31 @@ make_join_statistics(JOIN *join, TABLE_L
~outer_join, join->select_lex, &sargables))
goto error;
- /* Read tables with 0 or 1 rows (system tables) */
join->const_table_map= 0;
+ join->const_tables= const_count;
+ eliminate_tables(join);
+ const_count= join->const_tables;
+ found_const_table_map= join->const_table_map;
+ /* Read tables with 0 or 1 rows (system tables) */
for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count;
p_pos < p_end ;
p_pos++)
{
- int tmp;
s= p_pos->table;
- s->type=JT_SYSTEM;
- join->const_table_map|=s->table->map;
- if ((tmp=join_read_const_table(s, p_pos)))
+ if (! (s->table->map & join->eliminated_tables))
{
- if (tmp > 0)
- goto error; // Fatal error
+ int tmp;
+ s->type=JT_SYSTEM;
+ join->const_table_map|=s->table->map;
+ if ((tmp=join_read_const_table(s, p_pos)))
+ {
+ if (tmp > 0)
+ goto error; // Fatal error
+ }
+ else
+ found_const_table_map|= s->table->map;
}
- else
- found_const_table_map|= s->table->map;
}
/* loop until no more const tables are found */
@@ -2688,7 +2702,8 @@ make_join_statistics(JOIN *join, TABLE_L
substitution of a const table the key value happens to be null
then we can state that there are no matches for this equi-join.
*/
- if ((keyuse= s->keyuse) && *s->on_expr_ref && !s->embedding_map)
+ if ((keyuse= s->keyuse) && *s->on_expr_ref && !s->embedding_map &&
+ !(table->map & join->eliminated_tables))
{
/*
When performing an outer join operation if there are no matching rows
@@ -2747,14 +2762,16 @@ make_join_statistics(JOIN *join, TABLE_L
{
start_keyuse=keyuse;
key=keyuse->key;
- s->keys.set_bit(key); // QQ: remove this ?
+ if (keyuse->type == KEYUSE_USABLE)
+ s->keys.set_bit(key); // QQ: remove this ?
refs=0;
const_ref.clear_all();
eq_part.clear_all();
do
{
- if (keyuse->val->type() != Item::NULL_ITEM && !keyuse->optimize)
+ if (keyuse->type == KEYUSE_USABLE &&
+ keyuse->val->type() != Item::NULL_ITEM && !keyuse->optimize)
{
if (!((~found_const_table_map) & keyuse->used_tables))
const_ref.set_bit(keyuse->keypart);
@@ -2954,17 +2971,35 @@ typedef struct key_field_t {
*/
bool null_rejecting;
bool *cond_guard; /* See KEYUSE::cond_guard */
+ enum keyuse_type type; /* See KEYUSE::type */
} KEY_FIELD;
-/* Values in optimize */
-#define KEY_OPTIMIZE_EXISTS 1
-#define KEY_OPTIMIZE_REF_OR_NULL 2
/**
Merge new key definitions to old ones, remove those not used in both.
This is called for OR between different levels.
+ That is, the function operates on an array of KEY_FIELD elements which has
+ two parts:
+
+ $LEFT_PART $RIGHT_PART
+ +-----------------------+-----------------------+
+ start new_fields end
+
+ $LEFT_PART and $RIGHT_PART are arrays that have KEY_FIELD elements for two
+ parts of the OR condition. Our task is to produce an array of KEY_FIELD
+ elements that would correspond to "$LEFT_PART OR $RIGHT_PART".
+
+ The rules for combining elements are as follows:
+ (keyfieldA1 AND keyfieldA2 AND ...) OR (keyfieldB1 AND keyfieldB2 AND ...)=
+ AND_ij (keyfieldA_i OR keyfieldB_j)
+
+ We discard all (keyfieldA_i OR keyfieldB_j) that refer to different
+ fields. For those referring to the same field, the logic is as follows:
+
+ t.keycol=
+
To be able to do 'ref_or_null' we merge a comparison of a column
and 'column IS NULL' to one test. This is useful for sub select queries
that are internally transformed to something like:.
@@ -3029,13 +3064,18 @@ merge_key_fields(KEY_FIELD *start,KEY_FI
KEY_OPTIMIZE_REF_OR_NULL));
old->null_rejecting= (old->null_rejecting &&
new_fields->null_rejecting);
+ /*
+ The conditions are the same, hence their usabilities should
+ be, too (TODO: shouldn't that apply to the above
+ null_rejecting and optimize attributes?)
+ */
+ DBUG_ASSERT(old->type == new_fields->type);
}
}
else if (old->eq_func && new_fields->eq_func &&
old->val->eq_by_collation(new_fields->val,
old->field->binary(),
old->field->charset()))
-
{
old->level= and_level;
old->optimize= ((old->optimize & new_fields->optimize &
@@ -3044,10 +3084,15 @@ merge_key_fields(KEY_FIELD *start,KEY_FI
KEY_OPTIMIZE_REF_OR_NULL));
old->null_rejecting= (old->null_rejecting &&
new_fields->null_rejecting);
+ // "t.key_col=const" predicates are always usable
+ DBUG_ASSERT(old->type == KEYUSE_USABLE &&
+ new_fields->type == KEYUSE_USABLE);
}
else if (old->eq_func && new_fields->eq_func &&
- ((old->val->const_item() && old->val->is_null()) ||
- new_fields->val->is_null()))
+ ((new_fields->type == KEYUSE_USABLE &&
+ old->val->const_item() && old->val->is_null()) ||
+ ((old->type == KEYUSE_USABLE && new_fields->val->is_null()))))
+ /* TODO ^ why is the above asymmetric, why const_item()? */
{
/* field = expression OR field IS NULL */
old->level= and_level;
@@ -3118,6 +3163,7 @@ add_key_field(KEY_FIELD **key_fields,uin
table_map usable_tables, SARGABLE_PARAM **sargables)
{
uint exists_optimize= 0;
+ bool optimizable=0;
if (!(field->flags & PART_KEY_FLAG))
{
// Don't remove column IS NULL on a LEFT JOIN table
@@ -3130,15 +3176,12 @@ add_key_field(KEY_FIELD **key_fields,uin
else
{
table_map used_tables=0;
- bool optimizable=0;
for (uint i=0; i<num_values; i++)
{
used_tables|=(value[i])->used_tables();
if (!((value[i])->used_tables() & (field->table->map | RAND_TABLE_BIT)))
optimizable=1;
}
- if (!optimizable)
- return;
if (!(usable_tables & field->table->map))
{
if (!eq_func || (*value)->type() != Item::NULL_ITEM ||
@@ -3151,7 +3194,8 @@ add_key_field(KEY_FIELD **key_fields,uin
JOIN_TAB *stat=field->table->reginfo.join_tab;
key_map possible_keys=field->key_start;
possible_keys.intersect(field->table->keys_in_use_for_query);
- stat[0].keys.merge(possible_keys); // Add possible keys
+ if (optimizable)
+ stat[0].keys.merge(possible_keys); // Add possible keys
/*
Save the following cases:
@@ -3244,6 +3288,7 @@ add_key_field(KEY_FIELD **key_fields,uin
(*key_fields)->val= *value;
(*key_fields)->level= and_level;
(*key_fields)->optimize= exists_optimize;
+ (*key_fields)->type= optimizable? KEYUSE_USABLE : KEYUSE_UNKNOWN;
/*
If the condition has form "tbl.keypart = othertbl.field" and
othertbl.field can be NULL, there will be no matches if othertbl.field
@@ -3555,6 +3600,7 @@ add_key_part(DYNAMIC_ARRAY *keyuse_array
keyuse.optimize= key_field->optimize & KEY_OPTIMIZE_REF_OR_NULL;
keyuse.null_rejecting= key_field->null_rejecting;
keyuse.cond_guard= key_field->cond_guard;
+ keyuse.type= key_field->type;
VOID(insert_dynamic(keyuse_array,(uchar*) &keyuse));
}
}
@@ -3563,7 +3609,6 @@ add_key_part(DYNAMIC_ARRAY *keyuse_array
}
-#define FT_KEYPART (MAX_REF_PARTS+10)
static void
add_ft_keys(DYNAMIC_ARRAY *keyuse_array,
@@ -3622,6 +3667,7 @@ add_ft_keys(DYNAMIC_ARRAY *keyuse_array,
keyuse.used_tables=cond_func->key_item()->used_tables();
keyuse.optimize= 0;
keyuse.keypart_map= 0;
+ keyuse.type= KEYUSE_USABLE;
VOID(insert_dynamic(keyuse_array,(uchar*) &keyuse));
}
@@ -3636,6 +3682,13 @@ sort_keyuse(KEYUSE *a,KEYUSE *b)
return (int) (a->key - b->key);
if (a->keypart != b->keypart)
return (int) (a->keypart - b->keypart);
+
+ // Usable ones go before the unusable
+ int a_ok= test(a->type == KEYUSE_USABLE);
+ int b_ok= test(b->type == KEYUSE_USABLE);
+ if (a_ok != b_ok)
+ return a_ok? -1 : 1;
+
// Place const values before other ones
if ((res= test((a->used_tables & ~OUTER_REF_TABLE_BIT)) -
test((b->used_tables & ~OUTER_REF_TABLE_BIT))))
@@ -3846,7 +3899,8 @@ update_ref_and_keys(THD *thd, DYNAMIC_AR
found_eq_constant=0;
for (i=0 ; i < keyuse->elements-1 ; i++,use++)
{
- if (!use->used_tables && use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
+ if (use->type == KEYUSE_USABLE && !use->used_tables &&
+ use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
use->table->const_key_parts[use->key]|= use->keypart_map;
if (use->keypart != FT_KEYPART)
{
@@ -3870,7 +3924,8 @@ update_ref_and_keys(THD *thd, DYNAMIC_AR
/* Save ptr to first use */
if (!use->table->reginfo.join_tab->keyuse)
use->table->reginfo.join_tab->keyuse=save_pos;
- use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
+ if (use->type == KEYUSE_USABLE)
+ use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
save_pos++;
}
i=(uint) (save_pos-(KEYUSE*) keyuse->buffer);
@@ -3900,7 +3955,7 @@ static void optimize_keyuse(JOIN *join,
To avoid bad matches, we don't make ref_table_rows less than 100.
*/
keyuse->ref_table_rows= ~(ha_rows) 0; // If no ref
- if (keyuse->used_tables &
+ if (keyuse->type == KEYUSE_USABLE && keyuse->used_tables &
(map= (keyuse->used_tables & ~join->const_table_map &
~OUTER_REF_TABLE_BIT)))
{
@@ -3990,8 +4045,7 @@ add_group_and_distinct_keys(JOIN *join,
/** Save const tables first as used tables. */
-static void
-set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key)
+void set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key)
{
join->positions[idx].table= table;
join->positions[idx].key=key;
@@ -4093,7 +4147,8 @@ best_access_path(JOIN *join,
if 1. expression doesn't refer to forward tables
2. we won't get two ref-or-null's
*/
- if (!(remaining_tables & keyuse->used_tables) &&
+ if (keyuse->type == KEYUSE_USABLE &&
+ !(remaining_tables & keyuse->used_tables) &&
!(ref_or_null_part && (keyuse->optimize &
KEY_OPTIMIZE_REF_OR_NULL)))
{
@@ -5547,7 +5602,8 @@ static bool create_ref_for_key(JOIN *joi
*/
do
{
- if (!(~used_tables & keyuse->used_tables))
+ if (!(~used_tables & keyuse->used_tables) &&
+ keyuse->type == KEYUSE_USABLE)
{
if (keyparts == keyuse->keypart &&
!(found_part_ref_or_null & keyuse->optimize))
@@ -5597,9 +5653,11 @@ static bool create_ref_for_key(JOIN *joi
uint i;
for (i=0 ; i < keyparts ; keyuse++,i++)
{
- while (keyuse->keypart != i ||
- ((~used_tables) & keyuse->used_tables))
+ while (keyuse->keypart != i || ((~used_tables) & keyuse->used_tables) ||
+ !(keyuse->type == KEYUSE_USABLE))
+ {
keyuse++; /* Skip other parts */
+ }
uint maybe_null= test(keyinfo->key_part[i].null_bit);
j->ref.items[i]=keyuse->val; // Save for cond removal
@@ -5757,6 +5815,7 @@ JOIN::make_simple_join(JOIN *parent, TAB
tables= 1;
const_tables= 0;
const_table_map= 0;
+ eliminated_tables= 0;
tmp_table_param.field_count= tmp_table_param.sum_func_count=
tmp_table_param.func_count= 0;
tmp_table_param.copy_field= tmp_table_param.copy_field_end=0;
@@ -6021,7 +6080,7 @@ make_outerjoin_info(JOIN *join)
}
if (!tab->first_inner)
tab->first_inner= nested_join->first_nested;
- if (++nested_join->counter < nested_join->join_list.elements)
+ if (++nested_join->counter < nested_join->n_tables)
break;
/* Table tab is the last inner table for nested join. */
nested_join->first_nested->last_inner= tab;
@@ -8575,6 +8634,8 @@ simplify_joins(JOIN *join, List<TABLE_LI
conds= simplify_joins(join, &nested_join->join_list, conds, top);
used_tables= nested_join->used_tables;
not_null_tables= nested_join->not_null_tables;
+ /* The following two might become unequal after table elimination: */
+ nested_join->n_tables= nested_join->join_list.elements;
}
else
{
@@ -8733,7 +8794,7 @@ static uint build_bitmap_for_nested_join
with anything)
2. we could run out bits in nested_join_map otherwise.
*/
- if (nested_join->join_list.elements != 1)
+ if (nested_join->n_tables != 1)
{
nested_join->nj_map= (nested_join_map) 1 << first_unused++;
first_unused= build_bitmap_for_nested_joins(&nested_join->join_list,
@@ -8894,7 +8955,7 @@ static bool check_interleaving_with_nj(J
join->cur_embedding_map |= next_emb->nested_join->nj_map;
}
- if (next_emb->nested_join->join_list.elements !=
+ if (next_emb->nested_join->n_tables !=
next_emb->nested_join->counter)
break;
@@ -8926,9 +8987,23 @@ static void restore_prev_nj_state(JOIN_T
JOIN *join= last->join;
while (last_emb)
{
+ /*
+ psergey-elim: (nevermind)
+ new_prefix= cur_prefix & ~last;
+ if (!(new_prefix & cur_table_map)) // removed last inner table
+ {
+ join->cur_embedding_map&= ~last_emb->nested_join->nj_map;
+ }
+ else (current)
+ {
+ // Won't hurt doing it all the time:
+ join->cur_embedding_map |= ...;
+ }
+ else
+ */
if (!(--last_emb->nested_join->counter))
join->cur_embedding_map&= ~last_emb->nested_join->nj_map;
- else if (last_emb->nested_join->join_list.elements-1 ==
+ else if (last_emb->nested_join->n_tables-1 ==
last_emb->nested_join->counter)
join->cur_embedding_map|= last_emb->nested_join->nj_map;
else
@@ -16202,6 +16277,14 @@ static void select_describe(JOIN *join,
tmp3.length(0);
quick_type= -1;
+
+ /* Don't show eliminated tables */
+ if (table->map & join->eliminated_tables)
+ {
+ used_tables|=table->map;
+ continue;
+ }
+
item_list.empty();
/* id */
item_list.push_back(new Item_uint((uint32)
@@ -16524,8 +16607,11 @@ static void select_describe(JOIN *join,
unit;
unit= unit->next_unit())
{
- if (mysql_explain_union(thd, unit, result))
- DBUG_VOID_RETURN;
+ if (!(unit->item && unit->item->eliminated))
+ {
+ if (mysql_explain_union(thd, unit, result))
+ DBUG_VOID_RETURN;
+ }
}
DBUG_VOID_RETURN;
}
@@ -16566,7 +16652,6 @@ bool mysql_explain_union(THD *thd, SELEC
unit->fake_select_lex->options|= SELECT_DESCRIBE;
if (!(res= unit->prepare(thd, result, SELECT_NO_UNLOCK | SELECT_DESCRIBE)))
res= unit->exec();
- res|= unit->cleanup();
}
else
{
@@ -16599,6 +16684,7 @@ bool mysql_explain_union(THD *thd, SELEC
*/
static void print_join(THD *thd,
+ table_map eliminated_tables,
String *str,
List<TABLE_LIST> *tables,
enum_query_type query_type)
@@ -16614,12 +16700,33 @@ static void print_join(THD *thd,
*t= ti++;
DBUG_ASSERT(tables->elements >= 1);
- (*table)->print(thd, str, query_type);
+ /*
+ Assert that the first table in the list isn't eliminated. This comes from
+ the fact that the first table can't be inner table of an outer join.
+ */
+ DBUG_ASSERT(!eliminated_tables ||
+ !(((*table)->table && ((*table)->table->map & eliminated_tables)) ||
+ ((*table)->nested_join && !((*table)->nested_join->used_tables &
+ ~eliminated_tables))));
+ (*table)->print(thd, eliminated_tables, str, query_type);
TABLE_LIST **end= table + tables->elements;
for (TABLE_LIST **tbl= table + 1; tbl < end; tbl++)
{
TABLE_LIST *curr= *tbl;
+ /*
+ The "eliminated_tables &&" check guards against the case of
+ printing the query for CREATE VIEW. We do that without having run
+ JOIN::optimize() and so will have nested_join->used_tables==0.
+ */
+ if (eliminated_tables &&
+ ((curr->table && (curr->table->map & eliminated_tables)) ||
+ (curr->nested_join && !(curr->nested_join->used_tables &
+ ~eliminated_tables))))
+ {
+ continue;
+ }
+
if (curr->outer_join)
{
/* MySQL converts right to left joins */
@@ -16629,7 +16736,7 @@ static void print_join(THD *thd,
str->append(STRING_WITH_LEN(" straight_join "));
else
str->append(STRING_WITH_LEN(" join "));
- curr->print(thd, str, query_type);
+ curr->print(thd, eliminated_tables, str, query_type);
if (curr->on_expr)
{
str->append(STRING_WITH_LEN(" on("));
@@ -16683,12 +16790,13 @@ Index_hint::print(THD *thd, String *str)
@param str string where table should be printed
*/
-void TABLE_LIST::print(THD *thd, String *str, enum_query_type query_type)
+void TABLE_LIST::print(THD *thd, table_map eliminated_tables, String *str,
+ enum_query_type query_type)
{
if (nested_join)
{
str->append('(');
- print_join(thd, str, &nested_join->join_list, query_type);
+ print_join(thd, eliminated_tables, str, &nested_join->join_list, query_type);
str->append(')');
}
else
@@ -16830,7 +16938,7 @@ void st_select_lex::print(THD *thd, Stri
{
str->append(STRING_WITH_LEN(" from "));
/* go through join tree */
- print_join(thd, str, &top_join_list, query_type);
+ print_join(thd, join? join->eliminated_tables: 0, str, &top_join_list, query_type);
}
else if (where)
{
=== modified file 'sql/sql_select.h'
--- a/sql/sql_select.h 2009-04-25 10:05:32 +0000
+++ b/sql/sql_select.h 2009-06-30 15:09:36 +0000
@@ -28,6 +28,45 @@
#include "procedure.h"
#include <myisam.h>
+#define FT_KEYPART (MAX_REF_PARTS+10)
+/* Values in optimize */
+#define KEY_OPTIMIZE_EXISTS 1
+#define KEY_OPTIMIZE_REF_OR_NULL 2
+
+/* KEYUSE element types */
+enum keyuse_type
+{
+ /*
+ val refers to the same table, this is either KEYUSE_BIND or KEYUSE_NO_BIND
+ type, we didn't determine which one yet.
+ */
+ KEYUSE_UNKNOWN= 0,
+ /*
+ 'regular' keyuse, i.e. it represents one of the following
+ * t.keyXpartY = func(constants, other-tables)
+ * t.keyXpartY IS NULL
+ * t.keyXpartY = func(constants, other-tables) OR t.keyXpartY IS NULL
+ and can be used to construct ref acces
+ */
+ KEYUSE_USABLE,
+ /*
+ The keyuse represents a condition in form:
+
+ t.uniq_keyXpartY = func(other parts of uniq_keyX)
+
+ This can't be used to construct uniq_keyX but we could use it to determine
+ that the table will produce at most one match.
+ */
+ KEYUSE_BIND,
+ /*
+ Keyuse that's not usable for ref access and doesn't meet the criteria of
+ KEYUSE_BIND. Examples:
+ t.keyXpartY = func(t.keyXpartY)
+ t.keyXpartY = func(column of t that's not covered by keyX)
+ */
+ KEYUSE_NO_BIND
+};
+
typedef struct keyuse_t {
TABLE *table;
Item *val; /**< or value if no field */
@@ -51,6 +90,15 @@ typedef struct keyuse_t {
NULL - Otherwise (the source equality can't be turned off)
*/
bool *cond_guard;
+ /*
+ 1 <=> This keyuse can be used to construct key access.
+ 0 <=> Otherwise. Currently unusable KEYUSEs represent equalities
+ where one table column refers to another one, like this:
+ t.keyXpartA=func(t.keyXpartB)
+ This equality cannot be used for index access but is useful
+ for table elimination.
+ */
+ enum keyuse_type type;
} KEYUSE;
class store_key;
@@ -210,7 +258,7 @@ typedef struct st_join_table {
JOIN *join;
/** Bitmap of nested joins this table is part of */
nested_join_map embedding_map;
-
+
void cleanup();
inline bool is_using_loose_index_scan()
{
@@ -285,7 +333,15 @@ public:
fetching data from a cursor
*/
bool resume_nested_loop;
- table_map const_table_map,found_const_table_map;
+ table_map const_table_map;
+ /*
+ Constant tables for which we have found a row (as opposed to those for
+ which we didn't).
+ */
+ table_map found_const_table_map;
+
+ /* Tables removed by table elimination. Set to 0 before the elimination. */
+ table_map eliminated_tables;
/*
Bitmap of all inner tables from outer joins
*/
@@ -425,6 +481,7 @@ public:
table= 0;
tables= 0;
const_tables= 0;
+ eliminated_tables= 0;
join_list= 0;
sort_and_group= 0;
first_record= 0;
@@ -530,6 +587,10 @@ public:
return (unit == &thd->lex->unit && (unit->fake_select_lex == 0 ||
select_lex == unit->fake_select_lex));
}
+ inline table_map all_tables_map()
+ {
+ return (table_map(1) << tables) - 1;
+ }
private:
bool make_simple_join(JOIN *join, TABLE *tmp_table);
};
@@ -730,9 +791,12 @@ bool error_if_full_join(JOIN *join);
int report_error(TABLE *table, int error);
int safe_index_read(JOIN_TAB *tab);
COND *remove_eq_conds(THD *thd, COND *cond, Item::cond_result *cond_value);
+void set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key);
inline bool optimizer_flag(THD *thd, uint flag)
{
return (thd->variables.optimizer_switch & flag);
}
+void eliminate_tables(JOIN *join);
+
=== modified file 'sql/table.h'
--- a/sql/table.h 2009-02-19 09:01:25 +0000
+++ b/sql/table.h 2009-06-30 15:09:36 +0000
@@ -1366,7 +1366,8 @@ struct TABLE_LIST
return (derived || view || schema_table || (create && !table->db_stat) ||
!table);
}
- void print(THD *thd, String *str, enum_query_type query_type);
+ void print(THD *thd, table_map eliminated_tables, String *str,
+ enum_query_type query_type);
bool check_single_table(TABLE_LIST **table, table_map map,
TABLE_LIST *view);
bool set_insert_values(MEM_ROOT *mem_root);
@@ -1615,7 +1616,11 @@ public:
typedef struct st_nested_join
{
List<TABLE_LIST> join_list; /* list of elements in the nested join */
- table_map used_tables; /* bitmap of tables in the nested join */
+ /*
+ Bitmap of tables within this nested join (including those embedded within
+ its children), including tables removed by table elimination.
+ */
+ table_map used_tables;
table_map not_null_tables; /* tables that rejects nulls */
struct st_join_table *first_nested;/* the first nested table in the plan */
/*
@@ -1626,6 +1631,11 @@ typedef struct st_nested_join
Before each use the counters are zeroed by reset_nj_counters.
*/
uint counter;
+ /*
+ Number of elements in join_list that were not (or contain table(s) that
+ weren't) removed by table elimination.
+ */
+ uint n_tables;
nested_join_map nj_map; /* Bit used to identify this nested join*/
} NESTED_JOIN;
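
The bitmap tests this patch adds to print_join() can be tried in isolation. The following is a minimal sketch, not code from the tree: table_map is replaced by a plain 64-bit integer, and the helper names base_table_eliminated(), nested_join_eliminated() and skip_when_printing() are hypothetical. Only all_tables_map() mirrors a function that the patch itself adds to JOIN.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the server's table_map type (assumption: a 64-bit bitmap,
// one bit per table in the join).
typedef std::uint64_t table_map;

// Mirrors JOIN::all_tables_map() from the patch: a mask with the low
// `tables` bits set. The shift is only defined for tables < 64.
static table_map all_tables_map(unsigned tables)
{
  assert(tables < 64);
  return (table_map(1) << tables) - 1;
}

// Hypothetical helper: a base table is eliminated when its bit is set
// in eliminated_tables.
static bool base_table_eliminated(table_map map, table_map eliminated_tables)
{
  return (map & eliminated_tables) != 0;
}

// Hypothetical helper: a nested join is eliminated when none of its
// tables survive, i.e. used_tables has no bit outside eliminated_tables.
static bool nested_join_eliminated(table_map used_tables,
                                   table_map eliminated_tables)
{
  return (used_tables & ~eliminated_tables) == 0;
}

// Hypothetical helper combining both tests the way the loop in
// print_join() does. The leading check on eliminated_tables matters:
// when the optimizer has not run, used_tables is 0 and the nested-join
// test alone would report every nested join as eliminated.
static bool skip_when_printing(bool is_nested_join, table_map map_or_used,
                               table_map eliminated_tables)
{
  if (!eliminated_tables)
    return false;
  return is_nested_join
           ? nested_join_eliminated(map_or_used, eliminated_tables)
           : base_table_eliminated(map_or_used, eliminated_tables);
}
```

With three tables, all_tables_map(3) yields the mask 0x7. A nested join with used_tables of 0x6 is skipped only if both of its tables are in eliminated_tables, while a nested join with used_tables of 0 and eliminated_tables of 0 (the CREATE VIEW case mentioned in the comment above) is still printed.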
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (psergey:2724)
by Sergey Petrunia 30 Jun '09
#At lp:maria based on revid:psergey@askmonty.org-20090625200729-u11xpwwn5ebddx09
2724 Sergey Petrunia 2009-06-30
Testing commit email
modified:
sql/sql_select.cc
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-06-25 20:07:29 +0000
+++ b/sql/sql_select.cc 2009-06-30 15:02:15 +0000
@@ -24,6 +24,8 @@
@{
*/
+#error Testing commit mails
+
#ifdef USE_PRAGMA_IMPLEMENTATION
#pragma implementation // gcc: Class implementation
#endif
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2714: Changed default thread stack to 288K to get better memory missalignment between stacks of differe...
by noreply@launchpad.net 30 Jun '09
------------------------------------------------------------
revno: 2714
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: mysql-maria
timestamp: Tue 2009-06-30 15:01:29 +0300
message:
Changed default thread stack to 288K to get better memory missalignment between stacks of different threads (should speed up things) and to get a bit extra safety.
In maria_open(), don't allocate big arrays on stack as this may lead to stack overflow.
This fixes a valgrind warning detected by buildbot
modified:
include/my_pthread.h
storage/maria/ma_open.c
=== modified file 'include/my_pthread.h'
--- include/my_pthread.h 2009-06-29 21:03:30 +0000
+++ include/my_pthread.h 2009-06-30 12:01:29 +0000
@@ -691,7 +691,7 @@
We need to have at least 256K stack to handle calls to myisamchk_init()
with the current number of keys and key parts.
*/
-#define DEFAULT_THREAD_STACK (256*1024L)
+#define DEFAULT_THREAD_STACK (288*1024L)
#endif
#define MY_PTHREAD_LOCK_READ 0
=== modified file 'storage/maria/ma_open.c'
--- storage/maria/ma_open.c 2009-02-19 09:01:25 +0000
+++ storage/maria/ma_open.c 2009-06-30 12:01:29 +0000
@@ -256,15 +256,16 @@
MARIA_HA *maria_open(const char *name, int mode, uint open_flags)
{
int kfile,open_mode,save_errno;
- uint i,j,len,errpos,head_length,base_pos,info_length,keys, realpath_err,
+ uint i,j,len,errpos,head_length,base_pos,keys, realpath_err,
key_parts,unique_key_parts,fulltext_keys,uniques;
+ size_t info_length;
char name_buff[FN_REFLEN], org_name[FN_REFLEN], index_name[FN_REFLEN],
data_name[FN_REFLEN];
uchar *disk_cache, *disk_pos, *end_pos;
MARIA_HA info,*m_info,*old_info;
MARIA_SHARE share_buff,*share;
- double rec_per_key_part[HA_MAX_POSSIBLE_KEY*HA_MAX_KEY_SEG];
- ulong nulls_per_key_part[HA_MAX_POSSIBLE_KEY*HA_MAX_KEY_SEG];
+ double *rec_per_key_part;
+ ulong *nulls_per_key_part;
my_off_t key_root[HA_MAX_POSSIBLE_KEY];
ulonglong max_key_file_length, max_data_file_length;
my_bool versioning= 1;
@@ -294,8 +295,6 @@
{
share= &share_buff;
bzero((uchar*) &share_buff,sizeof(share_buff));
- share_buff.state.rec_per_key_part= rec_per_key_part;
- share_buff.state.nulls_per_key_part= nulls_per_key_part;
share_buff.state.key_root=key_root;
share_buff.pagecache= multi_pagecache_search((uchar*) name_buff,
(uint) strlen(name_buff),
@@ -360,11 +359,27 @@
info_length=mi_uint2korr(share->state.header.header_length);
base_pos= mi_uint2korr(share->state.header.base_pos);
- if (!(disk_cache= (uchar*) my_alloca(info_length+128)))
+
+ /*
+ Allocate space for header information and for data that is too
+ big to keep on stack
+ */
+ if (!my_multi_malloc(MY_WME,
+ &disk_cache, info_length+128,
+ &rec_per_key_part,
+ (sizeof(*rec_per_key_part) * HA_MAX_POSSIBLE_KEY *
+ HA_MAX_KEY_SEG),
+ &nulls_per_key_part,
+ (sizeof(*nulls_per_key_part) * HA_MAX_POSSIBLE_KEY *
+ HA_MAX_KEY_SEG),
+ NullS))
{
my_errno=ENOMEM;
goto err;
}
+ share_buff.state.rec_per_key_part= rec_per_key_part;
+ share_buff.state.nulls_per_key_part= nulls_per_key_part;
+
end_pos=disk_cache+info_length;
errpos= 3;
if (my_pread(kfile, disk_cache, info_length, 0L, MYF(MY_NABP)))
@@ -783,7 +798,7 @@
(keys ? MARIA_INDEX_BLOCK_MARGIN *
share->block_size * keys : 0));
share->block_size= share->base.block_size;
- my_afree(disk_cache);
+ my_free(disk_cache, MYF(0));
_ma_setup_functions(share);
if ((*share->once_init)(share, info.dfile.file))
goto err;
@@ -926,9 +941,7 @@
my_free(share,MYF(0));
/* fall through */
case 3:
- /* fall through */
- case 2:
- my_afree(disk_cache);
+ my_free(disk_cache, MYF(0));
/* fall through */
case 1:
VOID(my_close(kfile,MYF(0)));
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (monty:2714)
by Michael Widenius 30 Jun '09
#At lp:maria based on revid:monty@askmonty.org-20090629210330-rkb15fyk4bonimqw
2714 Michael Widenius 2009-06-30
Changed default thread stack to 288K to get better memory missalignment between stacks of different threads (should speed up things) and to get a bit extra safety.
In maria_open(), don't allocate big arrays on stack as this may lead to stack overflow.
This fixes a valgrind warning detected by buildbot
modified:
include/my_pthread.h
storage/maria/ma_open.c
per-file messages:
include/my_pthread.h
Changed default thread stack to 288K to get better memory missalignment between stacks of different threads (should speed up things) and to get a bit extra safety.
storage/maria/ma_open.c
In maria_open(), don't allocate big arrays on stack as this may lead to stack overflow.
=== modified file 'include/my_pthread.h'
--- a/include/my_pthread.h 2009-06-29 21:03:30 +0000
+++ b/include/my_pthread.h 2009-06-30 12:01:29 +0000
@@ -691,7 +691,7 @@ extern void my_mutex_end();
We need to have at least 256K stack to handle calls to myisamchk_init()
with the current number of keys and key parts.
*/
-#define DEFAULT_THREAD_STACK (256*1024L)
+#define DEFAULT_THREAD_STACK (288*1024L)
#endif
#define MY_PTHREAD_LOCK_READ 0
=== modified file 'storage/maria/ma_open.c'
--- a/storage/maria/ma_open.c 2009-02-19 09:01:25 +0000
+++ b/storage/maria/ma_open.c 2009-06-30 12:01:29 +0000
@@ -256,15 +256,16 @@ MARIA_HA *maria_clone(MARIA_SHARE *share
MARIA_HA *maria_open(const char *name, int mode, uint open_flags)
{
int kfile,open_mode,save_errno;
- uint i,j,len,errpos,head_length,base_pos,info_length,keys, realpath_err,
+ uint i,j,len,errpos,head_length,base_pos,keys, realpath_err,
key_parts,unique_key_parts,fulltext_keys,uniques;
+ size_t info_length;
char name_buff[FN_REFLEN], org_name[FN_REFLEN], index_name[FN_REFLEN],
data_name[FN_REFLEN];
uchar *disk_cache, *disk_pos, *end_pos;
MARIA_HA info,*m_info,*old_info;
MARIA_SHARE share_buff,*share;
- double rec_per_key_part[HA_MAX_POSSIBLE_KEY*HA_MAX_KEY_SEG];
- ulong nulls_per_key_part[HA_MAX_POSSIBLE_KEY*HA_MAX_KEY_SEG];
+ double *rec_per_key_part;
+ ulong *nulls_per_key_part;
my_off_t key_root[HA_MAX_POSSIBLE_KEY];
ulonglong max_key_file_length, max_data_file_length;
my_bool versioning= 1;
@@ -294,8 +295,6 @@ MARIA_HA *maria_open(const char *name, i
{
share= &share_buff;
bzero((uchar*) &share_buff,sizeof(share_buff));
- share_buff.state.rec_per_key_part= rec_per_key_part;
- share_buff.state.nulls_per_key_part= nulls_per_key_part;
share_buff.state.key_root=key_root;
share_buff.pagecache= multi_pagecache_search((uchar*) name_buff,
(uint) strlen(name_buff),
@@ -360,11 +359,27 @@ MARIA_HA *maria_open(const char *name, i
info_length=mi_uint2korr(share->state.header.header_length);
base_pos= mi_uint2korr(share->state.header.base_pos);
- if (!(disk_cache= (uchar*) my_alloca(info_length+128)))
+
+ /*
+ Allocate space for header information and for data that is too
+ big to keep on stack
+ */
+ if (!my_multi_malloc(MY_WME,
+ &disk_cache, info_length+128,
+ &rec_per_key_part,
+ (sizeof(*rec_per_key_part) * HA_MAX_POSSIBLE_KEY *
+ HA_MAX_KEY_SEG),
+ &nulls_per_key_part,
+ (sizeof(*nulls_per_key_part) * HA_MAX_POSSIBLE_KEY *
+ HA_MAX_KEY_SEG),
+ NullS))
{
my_errno=ENOMEM;
goto err;
}
+ share_buff.state.rec_per_key_part= rec_per_key_part;
+ share_buff.state.nulls_per_key_part= nulls_per_key_part;
+
end_pos=disk_cache+info_length;
errpos= 3;
if (my_pread(kfile, disk_cache, info_length, 0L, MYF(MY_NABP)))
@@ -783,7 +798,7 @@ MARIA_HA *maria_open(const char *name, i
(keys ? MARIA_INDEX_BLOCK_MARGIN *
share->block_size * keys : 0));
share->block_size= share->base.block_size;
- my_afree(disk_cache);
+ my_free(disk_cache, MYF(0));
_ma_setup_functions(share);
if ((*share->once_init)(share, info.dfile.file))
goto err;
@@ -926,9 +941,7 @@ err:
my_free(share,MYF(0));
/* fall through */
case 3:
- /* fall through */
- case 2:
- my_afree(disk_cache);
+ my_free(disk_cache, MYF(0));
/* fall through */
case 1:
VOID(my_close(kfile,MYF(0)));
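
The allocation change in maria_open() replaces two large stack arrays and an alloca() buffer with one heap block that is released through its first pointer. The sketch below illustrates that pattern under stated assumptions: it uses plain malloc()/free() instead of my_multi_malloc()/my_free(), the names OpenBuffers, alloc_open_buffers() and free_open_buffers() are hypothetical, and the value 255 for HA_MAX_POSSIBLE_KEY is an assumption (32 for HA_MAX_KEY_SEG comes from revision 2713).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Illustration values: MAX_SEGS follows HA_MAX_KEY_SEG after revision 2713;
// MAX_KEYS = 255 is an assumed value for HA_MAX_POSSIBLE_KEY.
static const std::size_t MAX_KEYS = 255;
static const std::size_t MAX_SEGS = 32;

// Round n up so that the next sub-buffer starts suitably aligned.
static std::size_t align_up(std::size_t n)
{
  const std::size_t a = alignof(std::max_align_t);
  return (n + a - 1) / a * a;
}

// Hypothetical holder for the three buffers that share one allocation.
struct OpenBuffers
{
  unsigned char *disk_cache;
  double        *rec_per_key_part;
  unsigned long *nulls_per_key_part;
};

// One malloc() carved into three regions; returns false if out of memory.
static bool alloc_open_buffers(std::size_t info_length, OpenBuffers *out)
{
  assert(out != nullptr);
  const std::size_t cache_size= align_up(info_length + 128);
  const std::size_t rec_size=   align_up(sizeof(double) * MAX_KEYS * MAX_SEGS);
  const std::size_t nulls_size= align_up(sizeof(unsigned long) *
                                         MAX_KEYS * MAX_SEGS);
  unsigned char *block= static_cast<unsigned char*>(
      std::malloc(cache_size + rec_size + nulls_size));
  if (!block)
    return false;
  out->disk_cache=         block;
  out->rec_per_key_part=   reinterpret_cast<double*>(block + cache_size);
  out->nulls_per_key_part= reinterpret_cast<unsigned long*>(block +
                                                            cache_size +
                                                            rec_size);
  return true;
}

// Freeing the first pointer releases all three regions at once.
static void free_open_buffers(OpenBuffers *b)
{
  std::free(b->disk_cache);
  b->disk_cache= nullptr;
  b->rec_per_key_part= nullptr;
  b->nulls_per_key_part= nullptr;
}
```

Under these assumptions, on a platform where double and ulong are both 8 bytes the two arrays take 2 * 255 * 32 * 8 = 130560 bytes, about half of a 256K thread stack, and twice what they took with 16 key segments. That is consistent with the stack overflow concern in the commit message, although the commit itself does not give these figures.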
Hi!
I've got this test failure in the maria-5.1-table-elimination tree:
main.mysql-bug41486 [ fail ]
http://askmonty.org/buildbot/builders/jaunty-amd64-rel/builds/54/steps/test…
CURRENT_TEST: main.mysql-bug41486
--- .../r/mysql-bug41486.result
+++ .../r/mysql-bug41486.reject
@@ -8,6 +8,5 @@
SET @@global.general_log = @old_general_log;
SELECT LENGTH(data) FROM t1;
LENGTH(data)
-2097152
DROP TABLE t1;
SET @@global.max_allowed_packet = @old_max_allowed_packet;
mysqltest: Result length mismatch
Here's the relevant part of the .test file:
CREATE TABLE t1(data LONGBLOB);
INSERT INTO t1 SELECT REPEAT('1', 2*1024*1024);
let $outfile= $MYSQLTEST_VARDIR/tmp/bug41486.sql;
--error 0,1
remove_file $outfile;
--exec $MYSQL_DUMP test t1 > $outfile
SET @old_general_log = @@global.general_log;
SET @@global.general_log = 0;
# Check that the mysql client does not insert extra newlines when loading
# strings longer than client's max_allowed_packet
--exec $MYSQL --max_allowed_packet=1M test < $outfile 2>&1
SET @@global.general_log = @old_general_log;
SELECT LENGTH(data) FROM t1;
My analysis revealed that this part of the test
INSERT INTO t1 SELECT REPEAT('1', 2*1024*1024);
let $outfile= $MYSQLTEST_VARDIR/tmp/bug41486.sql;
--error 0,1
remove_file $outfile;
--exec $MYSQL_DUMP test t1 > $outfile
gets executed as follows: when $MYSQL_DUMP runs the
SELECT /*!40001 SQL_NO_CACHE */ * FROM `t1`
statement to get the table data, the select produces nothing, even though
the INSERT statement has already finished by that time (at least from the
client's point of view).
The select produces nothing because the optimizer identifies table t1 as
constant (it has one or zero rows), then tries to get the record with a
handler->read_first_row() call and gets HA_ERR_END_OF_FILE.
So far I've fixed the test case by adding SELECT COUNT(*) FROM t1 (as an
arbitrary select statement involving t1) after the INSERT.
The questions are:
- Is the above behavior expected of MyISAM? (I suppose it is but I'm not
sure)
- Any ideas why this suddenly shows up when I make totally unrelated
changes in the table elimination code? The changed part of the code is never
executed by the test...
BR
Sergey
--
Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
Re: [Maria-developers] feedback/review requested for fix to MySQL bug #45759
by Michael Widenius 30 Jun '09
Hi!
>>>>> "Zardosht" == Zardosht Kasheff <zardosht(a)gmail.com> writes:
Zardosht> Hello Monty,
Zardosht> Thank you for your feedback and modifications.
Zardosht> Where can I find the revised patches? I would like to attach them to
Zardosht> the bug report.
Zardosht> I have tried looking here, https://code.launchpad.net/maria, but have
Zardosht> not been able to find them. It is possible that I am missing something
Zardosht> obvious, as I am just starting to familiarize myself with launchpad.
Zardosht> Thanks
Zardosht> -Zardosht
I have now pushed all your proposed changes + more to the current
MariaDB 5.1 tree. (changeset 2713)
You can get them by either getting the latest MariaDB code from
launchpad or looking at the patch
https://lists.launchpad.net/maria-developers/msg00463.html
It took a little bit longer than expected as I had to fix several test
cases as warnings and EXPLAIN changed a bit thanks to the extra key
segments.
When it comes to attaching the code to the bug database, the easiest
way would of course be to just refer to the above link.
All of the above changes are of course available to everyone under the
GPL.
Regards,
Monty
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2713: Added some changes inspired by Zardosht Kasheff:
by noreply@launchpad.net 29 Jun '09
------------------------------------------------------------
revno: 2713
committer: Michael Widenius <monty(a)askmonty.org>
branch nick: mysql-maria
timestamp: Tue 2009-06-30 00:03:30 +0300
message:
Added some changes inspired by Zardosht Kasheff:
- Added a handler call (prepare_index_scan()) to inform storage engines that an index scan is about to take place.
- Extended the maximun key parts for an index from 16 to 32
- Extended MyISAM and Maria engines to support up to 32 parts
Added checks for return value from ha_index_init()
modified:
include/my_handler.h
include/my_pthread.h
mysql-test/r/create.result
mysql-test/r/myisam.result
mysql-test/r/ps_1general.result
mysql-test/r/ps_2myisam.result
mysql-test/r/ps_3innodb.result
mysql-test/r/ps_4heap.result
mysql-test/r/ps_5merge.result
mysql-test/suite/maria/r/maria.result
mysql-test/suite/maria/r/maria3.result
mysql-test/suite/maria/r/ps_maria.result
mysql-test/t/create.test
mysql-test/t/myisam.test
sql/handler.cc
sql/handler.h
sql/sql_select.cc
sql/table.cc
sql/unireg.h
storage/maria/ha_maria.cc
storage/myisam/ha_myisam.cc
storage/myisam/mi_check.c
tests/mysql_client_test.c
=== modified file 'include/my_handler.h'
--- include/my_handler.h 2008-10-10 15:28:41 +0000
+++ include/my_handler.h 2009-06-29 21:03:30 +0000
@@ -41,7 +41,7 @@
*/
#define HA_MAX_KEY_LENGTH 1000 /* Max length in bytes */
-#define HA_MAX_KEY_SEG 16 /* Max segments for key */
+#define HA_MAX_KEY_SEG 32 /* Max segments for key */
#define HA_MAX_POSSIBLE_KEY_BUFF (HA_MAX_KEY_LENGTH + 24+ 6+6)
#define HA_MAX_KEY_BUFF (HA_MAX_KEY_LENGTH+HA_MAX_KEY_SEG*6+8+8)
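
As a worked check of the HA_MAX_KEY_BUFF formula in the hunk above, the sketch below evaluates it for the old and new segment limits. The constexpr function max_key_buff() is a hypothetical stand-in for the macro. The results, 1112 and 1208, match the "max key length" values that change in maria.result later in this commit; that the Maria limit is taken from HA_MAX_KEY_BUFF is inferred from the matching numbers and is not stated in the commit message.

```cpp
#include <cassert>

// Copied from the include/my_handler.h hunk above.
constexpr unsigned HA_MAX_KEY_LENGTH= 1000;   // max key length in bytes

// Hypothetical stand-in for the HA_MAX_KEY_BUFF macro, parameterised by
// the segment limit so that both revisions can be compared.
constexpr unsigned max_key_buff(unsigned max_key_seg)
{
  return HA_MAX_KEY_LENGTH + max_key_seg * 6 + 8 + 8;
}

// 1000 + 16*6 + 8 + 8 = 1112 with the old limit of 16 segments.
static_assert(max_key_buff(16) == 1112, "old limit: 16 key segments");
// 1000 + 32*6 + 8 + 8 = 1208 with the new limit of 32 segments.
static_assert(max_key_buff(32) == 1208, "new limit: 32 key segments");
```

Each extra key segment adds 6 bytes to the buffer, so doubling the limit from 16 to 32 segments adds 96 bytes.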
=== modified file 'include/my_pthread.h'
--- include/my_pthread.h 2009-02-19 09:01:25 +0000
+++ include/my_pthread.h 2009-06-29 21:03:30 +0000
@@ -687,15 +687,11 @@
#define THREAD_NAME_SIZE 10
#ifndef DEFAULT_THREAD_STACK
-#if SIZEOF_CHARP > 4
/*
- MySQL can survive with 32K, but some glibc libraries require > 128K stack
- To resolve hostnames. Also recursive stored procedures needs stack.
+ We need to have at least 256K stack to handle calls to myisamchk_init()
+ with the current number of keys and key parts.
*/
#define DEFAULT_THREAD_STACK (256*1024L)
-#else
-#define DEFAULT_THREAD_STACK (192*1024)
-#endif
#endif
#define MY_PTHREAD_LOCK_READ 0
=== modified file 'mysql-test/r/create.result'
--- mysql-test/r/create.result 2009-04-25 10:05:32 +0000
+++ mysql-test/r/create.result 2009-06-29 21:03:30 +0000
@@ -1487,10 +1487,10 @@
drop table t1;
create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int,
c8 int, c9 int, c10 int, c11 int, c12 int, c13 int, c14 int, c15 int,
-c16 int, c17 int);
+c16 int, c17 int, c18 int,c19 int,c20 int,c21 int,c22 int,c23 int,c24 int,c25 int,c26 int,c27 int,c28 int,c29 int,c30 int,c31 int,c32 int, c33 int);
alter table t1 add key i1 (
-c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17);
-ERROR 42000: Too many key parts specified; max 16 parts allowed
+c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33);
+ERROR 42000: Too many key parts specified; max 32 parts allowed
alter table t1 add key
a001_long_123456789_123456789_123456789_123456789_123456789_12345 (c1);
ERROR 42000: Identifier name 'a001_long_123456789_123456789_123456789_123456789_123456789_12345' is too long
@@ -1513,7 +1513,23 @@
`c14` int(11) DEFAULT NULL,
`c15` int(11) DEFAULT NULL,
`c16` int(11) DEFAULT NULL,
- `c17` int(11) DEFAULT NULL
+ `c17` int(11) DEFAULT NULL,
+ `c18` int(11) DEFAULT NULL,
+ `c19` int(11) DEFAULT NULL,
+ `c20` int(11) DEFAULT NULL,
+ `c21` int(11) DEFAULT NULL,
+ `c22` int(11) DEFAULT NULL,
+ `c23` int(11) DEFAULT NULL,
+ `c24` int(11) DEFAULT NULL,
+ `c25` int(11) DEFAULT NULL,
+ `c26` int(11) DEFAULT NULL,
+ `c27` int(11) DEFAULT NULL,
+ `c28` int(11) DEFAULT NULL,
+ `c29` int(11) DEFAULT NULL,
+ `c30` int(11) DEFAULT NULL,
+ `c31` int(11) DEFAULT NULL,
+ `c32` int(11) DEFAULT NULL,
+ `c33` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
drop table t1;
=== modified file 'mysql-test/r/myisam.result'
--- mysql-test/r/myisam.result 2009-02-19 09:01:25 +0000
+++ mysql-test/r/myisam.result 2009-06-29 21:03:30 +0000
@@ -2251,4 +2251,10 @@
Table Checksum
test.t3 326284887
drop table t1,t2,t3;
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32)) engine=myisam;
+drop table t1;
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int, a33 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33)) engine=myisam;
+ERROR 42000: Too many key parts specified; max 32 parts allowed
End of 5.1 tests
=== modified file 'mysql-test/r/ps_1general.result'
--- mysql-test/r/ps_1general.result 2008-09-06 00:51:17 +0000
+++ mysql-test/r/ps_1general.result 2009-06-29 21:03:30 +0000
@@ -447,7 +447,7 @@
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 14 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
@@ -463,7 +463,7 @@
def possible_keys 253 4096 7 Y 0 31 8
def key 253 64 7 Y 0 31 8
def key_len 253 4096 1 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 27 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_2myisam.result'
--- mysql-test/r/ps_2myisam.result 2009-03-11 15:32:42 +0000
+++ mysql-test/r/ps_2myisam.result 2009-06-29 21:03:30 +0000
@@ -1159,7 +1159,7 @@
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_3innodb.result'
--- mysql-test/r/ps_3innodb.result 2009-03-11 15:32:42 +0000
+++ mysql-test/r/ps_3innodb.result 2009-06-29 21:03:30 +0000
@@ -1159,7 +1159,7 @@
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_4heap.result'
--- mysql-test/r/ps_4heap.result 2009-03-11 15:32:42 +0000
+++ mysql-test/r/ps_4heap.result 2009-06-29 21:03:30 +0000
@@ -1160,7 +1160,7 @@
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_5merge.result'
--- mysql-test/r/ps_5merge.result 2009-03-11 15:32:42 +0000
+++ mysql-test/r/ps_5merge.result 2009-06-29 21:03:30 +0000
@@ -1202,7 +1202,7 @@
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
@@ -4224,7 +4224,7 @@
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/suite/maria/r/maria.result'
--- mysql-test/suite/maria/r/maria.result 2009-02-19 09:01:25 +0000
+++ mysql-test/suite/maria/r/maria.result 2009-06-29 21:03:30 +0000
@@ -340,14 +340,14 @@
test.t1 check status OK
drop table t1;
CREATE TABLE t1 (a varchar(255), b varchar(255), c varchar(255), d varchar(255), e varchar(255), KEY t1 (a, b, c, d, e));
-ERROR 42000: Specified key was too long; max key length is 1112 bytes
+ERROR 42000: Specified key was too long; max key length is 1208 bytes
CREATE TABLE t1 (a varchar(32000), unique key(a));
-ERROR 42000: Specified key was too long; max key length is 1112 bytes
+ERROR 42000: Specified key was too long; max key length is 1208 bytes
CREATE TABLE t1 (a varchar(1), b varchar(1), key (a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b));
-ERROR 42000: Too many key parts specified; max 16 parts allowed
+ERROR 42000: Too many key parts specified; max 32 parts allowed
CREATE TABLE t1 (a varchar(255), b varchar(255), c varchar(255), d varchar(255), e varchar(255));
ALTER TABLE t1 ADD INDEX t1 (a, b, c, d, e);
-ERROR 42000: Specified key was too long; max key length is 1112 bytes
+ERROR 42000: Specified key was too long; max key length is 1208 bytes
DROP TABLE t1;
CREATE TABLE t1 (a int not null, b int, c int, key(b), key(c), key(a,b), key(c,a));
INSERT into t1 values (0, null, 0), (0, null, 1), (0, null, 2), (0, null,3), (1,1,4);
@@ -1551,7 +1551,7 @@
drop table t1;
create table t1 (v varchar(65530), key(v));
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
drop table if exists t1;
create table t1 (v varchar(65536));
Warnings:
@@ -1789,34 +1789,34 @@
drop table t1;
create table t1 (a varchar(2048), key `a` (a));
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112))
+ KEY `a` (`a`(1208))
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a varchar(2048), key `a` (a) key_block_size=1024);
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `a` (`a`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, b varchar(2048), key (a), key(b)) key_block_size=1024;
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=1024
alter table t1 key_block_size=2048;
show create table t1;
@@ -1825,7 +1825,7 @@
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=2048
alter table t1 add c int, add key (c);
show create table t1;
@@ -1835,7 +1835,7 @@
`b` varchar(2048) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192,
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192,
KEY `c` (`c`) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=2048
alter table t1 key_block_size=0;
@@ -1848,33 +1848,33 @@
`c` int(11) DEFAULT NULL,
`d` int(11) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192,
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192,
KEY `c` (`c`) KEY_BLOCK_SIZE=8192,
KEY `d` (`d`)
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, b varchar(2048), key (a), key(b)) key_block_size=8192;
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`),
- KEY `b` (`b`(1112))
+ KEY `b` (`b`(1208))
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=8192
drop table t1;
create table t1 (a int not null, b varchar(2048), key (a) key_block_size=1024, key(b)) key_block_size=8192;
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`),
- KEY `b` (`b`(1112))
+ KEY `b` (`b`(1208))
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=8192
drop table t1;
create table t1 (a int not null, b int, key (a) key_block_size=1024, key(b) key_block_size=8192) key_block_size=16384;
@@ -1897,12 +1897,12 @@
drop table t1;
create table t1 (a varchar(2048), key `a` (a) key_block_size=1000000000000000000);
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `a` (`a`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, key `a` (a) key_block_size=1025);
=== modified file 'mysql-test/suite/maria/r/maria3.result'
--- mysql-test/suite/maria/r/maria3.result 2009-06-02 09:58:27 +0000
+++ mysql-test/suite/maria/r/maria3.result 2009-06-29 21:03:30 +0000
@@ -17,12 +17,12 @@
drop table t1;
create table t1 (a varchar(2048), key `a` (a) key_block_size=1000000000000000000);
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `a` (`a`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, key `a` (a) key_block_size=1025);
=== modified file 'mysql-test/suite/maria/r/ps_maria.result'
--- mysql-test/suite/maria/r/ps_maria.result 2009-03-11 15:32:42 +0000
+++ mysql-test/suite/maria/r/ps_maria.result 2009-06-29 21:03:30 +0000
@@ -1159,7 +1159,7 @@
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/t/create.test'
--- mysql-test/t/create.test 2009-02-19 09:01:25 +0000
+++ mysql-test/t/create.test 2009-06-29 21:03:30 +0000
@@ -1106,12 +1106,12 @@
create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int,
c8 int, c9 int, c10 int, c11 int, c12 int, c13 int, c14 int, c15 int,
-c16 int, c17 int);
+c16 int, c17 int, c18 int,c19 int,c20 int,c21 int,c22 int,c23 int,c24 int,c25 int,c26 int,c27 int,c28 int,c29 int,c30 int,c31 int,c32 int, c33 int);
# Get error for max key parts
--error 1070
alter table t1 add key i1 (
- c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17);
+ c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33);
# Get error for max key-name length
--error 1059
=== modified file 'mysql-test/t/myisam.test'
--- mysql-test/t/myisam.test 2009-02-19 09:01:25 +0000
+++ mysql-test/t/myisam.test 2009-06-29 21:03:30 +0000
@@ -1498,4 +1498,17 @@
CREATE TABLE t3 select * from t1;
checksum table t3;
drop table t1,t2,t3;
+
+#
+# Test number of supported key parts (32 is max)
+#
+
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32)) engine=myisam;
+drop table t1;
+
+--error 1070
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int, a33 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33)) engine=myisam;
+
--echo End of 5.1 tests
=== modified file 'sql/handler.cc'
--- sql/handler.cc 2009-05-22 12:38:50 +0000
+++ sql/handler.cc 2009-06-29 21:03:30 +0000
@@ -2102,8 +2102,8 @@
else
{
/* Find the first row through the primary key */
- (void) ha_index_init(primary_key, 0);
- error=index_first(buf);
+ if (!(error = ha_index_init(primary_key, 0)))
+ error= index_first(buf);
(void) ha_index_end();
}
DBUG_RETURN(error);
=== modified file 'sql/handler.h'
--- sql/handler.h 2009-02-19 09:01:25 +0000
+++ sql/handler.h 2009-06-29 21:03:30 +0000
@@ -1184,6 +1184,8 @@
inited=NONE;
DBUG_RETURN(index_end());
}
+ /* This is called after index_init() if we need to do an index scan */
+ virtual int prepare_index_scan() { return 0; }
int ha_rnd_init(bool scan)
{
int result;
=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc 2009-05-19 09:28:05 +0000
+++ sql/sql_select.cc 2009-06-29 21:03:30 +0000
@@ -11019,7 +11019,14 @@
empty_record(table);
if (table->group && join->tmp_table_param.sum_func_count &&
table->s->keys && !table->file->inited)
- table->file->ha_index_init(0, 0);
+ {
+ int tmp_error;
+ if ((tmp_error= table->file->ha_index_init(0, 0)))
+ {
+ table->file->print_error(tmp_error, MYF(0)); /* purecov: inspected */
+ DBUG_RETURN(-1); /* purecov: inspected */
+ }
+ }
}
/* Set up select_end */
Next_select_func end_select= setup_end_select_func(join);
@@ -11810,7 +11817,11 @@
if (!table->file->inited)
{
- table->file->ha_index_init(tab->ref.key, tab->sorted);
+ if ((error= table->file->ha_index_init(tab->ref.key, tab->sorted)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ return 1; /* purecov: inspected */
+ }
}
if (cmp_buffer_with_ref(tab) ||
(table->status & (STATUS_GARBAGE | STATUS_NO_PARENT | STATUS_NULL_ROW)))
@@ -11859,8 +11870,14 @@
/* Initialize the index first */
if (!table->file->inited)
- table->file->ha_index_init(tab->ref.key, tab->sorted);
-
+ {
+ if ((error= table->file->ha_index_init(tab->ref.key, tab->sorted)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ return(1); /* purecov: inspected */
+ }
+ }
+
/* Perform "Late NULLs Filtering" (see internals manual for explanations) */
for (uint i= 0 ; i < tab->ref.key_parts ; i++)
{
@@ -11895,7 +11912,13 @@
TABLE *table= tab->table;
if (!table->file->inited)
- table->file->ha_index_init(tab->ref.key, tab->sorted);
+ {
+ if ((error= table->file->ha_index_init(tab->ref.key, tab->sorted)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ return(1); /* purecov: inspected */
+ }
+ }
if (cp_buffer_from_ref(tab->join->thd, table, &tab->ref))
return -1;
if ((error=table->file->index_read_last_map(table->record[0],
@@ -11999,7 +12022,7 @@
static int
join_read_first(JOIN_TAB *tab)
{
- int error;
+ int error= 0;
TABLE *table=tab->table;
if (!table->key_read && table->covering_keys.is_set(tab->index) &&
!table->no_keyread)
@@ -12014,8 +12037,10 @@
tab->read_record.index=tab->index;
tab->read_record.record=table->record[0];
if (!table->file->inited)
- table->file->ha_index_init(tab->index, tab->sorted);
- if ((error=tab->table->file->index_first(tab->table->record[0])))
+ error= table->file->ha_index_init(tab->index, tab->sorted);
+ if (!error)
+ error= table->file->prepare_index_scan();
+ if (error || (error=tab->table->file->index_first(tab->table->record[0])))
{
if (error != HA_ERR_KEY_NOT_FOUND && error != HA_ERR_END_OF_FILE)
report_error(table, error);
@@ -12039,7 +12064,7 @@
join_read_last(JOIN_TAB *tab)
{
TABLE *table=tab->table;
- int error;
+ int error= 0;
if (!table->key_read && table->covering_keys.is_set(tab->index) &&
!table->no_keyread)
{
@@ -12053,8 +12078,10 @@
tab->read_record.index=tab->index;
tab->read_record.record=table->record[0];
if (!table->file->inited)
- table->file->ha_index_init(tab->index, 1);
- if ((error= tab->table->file->index_last(tab->table->record[0])))
+ error= table->file->ha_index_init(tab->index, 1);
+ if (!error)
+ error= table->file->prepare_index_scan();
+ if (error || (error= tab->table->file->index_last(tab->table->record[0])))
return report_error(table, error);
return 0;
}
@@ -12076,8 +12103,12 @@
int error;
TABLE *table= tab->table;
- if (!table->file->inited)
- table->file->ha_index_init(tab->ref.key, 1);
+ if (!table->file->inited &&
+ (error= table->file->ha_index_init(tab->ref.key, 1)))
+ {
+ table->file->print_error(error, MYF(0)); /* purecov: inspected */
+ return(1); /* purecov: inspected */
+ }
#if NOT_USED_YET
/* as ft-key doesn't use store_key's, see also FT_SELECT::init() */
if (cp_buffer_from_ref(tab->join->thd, table, &tab->ref))
@@ -12474,11 +12505,16 @@
copy_funcs(join->tmp_table_param.items_to_copy);
if ((error=table->file->ha_write_row(table->record[0])))
{
- if (create_internal_tmp_table_from_heap(join->thd, table, &join->tmp_table_param,
- error, 0))
+ if (create_internal_tmp_table_from_heap(join->thd, table,
+ &join->tmp_table_param,
+ error, 0))
DBUG_RETURN(NESTED_LOOP_ERROR); // Not a table_is_full error
/* Change method to update rows */
- table->file->ha_index_init(0, 0);
+ if ((error= table->file->ha_index_init(0, 0)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ DBUG_RETURN(NESTED_LOOP_ERROR); /* purecov: inspected */
+ }
join->join_tab[join->tables-1].next_select=end_unique_update;
}
join->send_records++;
=== modified file 'sql/table.cc'
--- sql/table.cc 2009-05-19 09:28:05 +0000
+++ sql/table.cc 2009-06-29 21:03:30 +0000
@@ -2448,7 +2448,7 @@
if ((file= my_create(name, CREATE_MODE, create_flags, MYF(0))) >= 0)
{
- uint key_length, tmp_key_length;
+ ulong key_length, tmp_key_length;
uint tmp;
bzero((char*) fileinfo,64);
/* header */
=== modified file 'sql/unireg.h'
--- sql/unireg.h 2009-01-15 18:11:25 +0000
+++ sql/unireg.h 2009-06-29 21:03:30 +0000
@@ -51,7 +51,7 @@
#define MAX_FIELD_NAME 34 /* Max colum name length +2 */
#define MAX_SYS_VAR_LENGTH 32
#define MAX_KEY MAX_INDEXES /* Max used keys */
-#define MAX_REF_PARTS 16 /* Max parts used as ref */
+#define MAX_REF_PARTS 32 /* Max parts used as ref */
#define MAX_KEY_LENGTH 3072 /* max possible key */
#if SIZEOF_OFF_T > 4
#define MAX_REFLENGTH 8 /* Max length for record ref */
=== modified file 'storage/maria/ha_maria.cc'
--- storage/maria/ha_maria.cc 2009-05-19 09:28:05 +0000
+++ storage/maria/ha_maria.cc 2009-06-29 21:03:30 +0000
@@ -1130,14 +1130,21 @@
err:
{
- HA_CHECK param;
- maria_chk_init(&param);
- param.thd= thd;
- param.op_name= "restore";
- param.db_name= table->s->db.str;
- param.table_name= table->s->table_name.str;
- param.testflag= 0;
- _ma_check_print_error(&param, errmsg, my_errno);
+ /*
+ Don't allocate param on stack here as this may be huge and it's
+ also allocated by repair()
+ */
+ HA_CHECK *param;
+ if (!(param= (HA_CHECK*) my_malloc(sizeof(*param), MYF(MY_WME | MY_FAE))))
+ DBUG_RETURN(error);
+ maria_chk_init(param);
+ param->thd= thd;
+ param->op_name= "restore";
+ param->db_name= table->s->db.str;
+ param->table_name= table->s->table_name.str;
+ param->testflag= 0;
+ _ma_check_print_error(param, errmsg, my_errno);
+ my_free(param, MYF(0));
DBUG_RETURN(error);
}
}
=== modified file 'storage/myisam/ha_myisam.cc'
--- storage/myisam/ha_myisam.cc 2009-04-25 10:05:32 +0000
+++ storage/myisam/ha_myisam.cc 2009-06-29 21:03:30 +0000
@@ -910,14 +910,21 @@
err:
{
- HA_CHECK param;
- myisamchk_init(&param);
- param.thd= thd;
- param.op_name= "restore";
- param.db_name= table->s->db.str;
- param.table_name= table->s->table_name.str;
- param.testflag= 0;
- mi_check_print_error(&param, errmsg, my_errno);
+ /*
+ Don't allocate param on stack here as this may be huge and it's
+ also allocated by repair()
+ */
+ HA_CHECK *param;
+ if (!(param= (HA_CHECK*) my_malloc(sizeof(*param), MYF(MY_WME | MY_FAE))))
+ DBUG_RETURN(error);
+ myisamchk_init(param);
+ param->thd= thd;
+ param->op_name= "restore";
+ param->db_name= table->s->db.str;
+ param->table_name= table->s->table_name.str;
+ param->testflag= 0;
+ mi_check_print_error(param, errmsg, my_errno);
+ my_free(param, MYF(0));
DBUG_RETURN(error);
}
}
=== modified file 'storage/myisam/mi_check.c'
--- storage/myisam/mi_check.c 2009-05-19 09:28:05 +0000
+++ storage/myisam/mi_check.c 2009-06-29 21:03:30 +0000
@@ -4629,8 +4629,9 @@
let's ensure it is not
*/
set_if_bigger(tmp,1);
- if (tmp >= (ulonglong) ~(ulong) 0)
- tmp=(ulonglong) ~(ulong) 0;
+ /* Keys are stored as 32 bit ints; Ensure we don't get an overflow */
+ if (tmp >= (ulonglong) ~(uint32) 0)
+ tmp=(ulonglong) ~(uint32) 0;
*rec_per_key_part=(ulong) tmp;
rec_per_key_part++;
=== modified file 'tests/mysql_client_test.c'
--- tests/mysql_client_test.c 2009-04-25 10:05:32 +0000
+++ tests/mysql_client_test.c 2009-06-29 21:03:30 +0000
@@ -33,6 +33,7 @@
#include <my_getopt.h>
#include <m_string.h>
#include <mysqld_error.h>
+#include <my_handler.h>
#define VER "2.1"
#define MAX_TEST_QUERY_LENGTH 300 /* MAX QUERY BUFFER LENGTH */
@@ -789,8 +790,10 @@
*/
if (length && (field->length != expected_field_length))
{
+ fflush(stdout);
fprintf(stderr, "Expected field length: %llu, got length: %lu\n",
expected_field_length, field->length);
+ fflush(stderr);
DIE_UNLESS(field->length == expected_field_length);
}
if (def)
@@ -7809,8 +7812,9 @@
"", "", NAME_CHAR_LEN*MAX_KEY, 0);
}
+ /* The length of this may vary between MariaDB versions (1024 / 2048) */
verify_prepare_field(result, 7, "ref", "", MYSQL_TYPE_VAR_STRING,
- "", "", "", NAME_CHAR_LEN*16, 0);
+ "", "", "", NAME_CHAR_LEN * HA_MAX_KEY_SEG, 0);
verify_prepare_field(result, 8, "rows", "", MYSQL_TYPE_LONGLONG,
"", "", "", 10, 0);
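A note on the mi_check.c hunk above: the old test compared against ~(ulong) 0, which on a platform with a 64-bit ulong is the largest 64-bit value, so the clamp could effectively never limit anything before the value was narrowed. The new test clamps to the 32-bit maximum instead. The following is a minimal standalone sketch of that clamping idea. The helper name clamp_rec_per_key() is hypothetical and does not exist in mi_check.c; the lower bound of 1 mirrors the set_if_bigger(tmp,1) line in the hunk.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of mi_check.c): clamp a 64-bit estimate
// into the range [1, UINT32_MAX] before storing it in a 32-bit slot.
inline std::uint32_t clamp_rec_per_key(std::uint64_t estimate) {
  const std::uint64_t max32 = std::numeric_limits<std::uint32_t>::max();
  if (estimate < 1)        // same effect as set_if_bigger(tmp, 1)
    estimate = 1;
  if (estimate > max32)    // saturate instead of silently truncating
    estimate = max32;
  return static_cast<std::uint32_t>(estimate);
}
```

Saturating rather than truncating keeps a very large estimate large; a plain cast of such a value to 32 bits could wrap to a small number.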
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (monty:2713)
by Michael Widenius 29 Jun '09
#At lp:maria based on revid:sanja@askmonty.org-20090624222220-vxslzi477ui4rrpk
2713 Michael Widenius 2009-06-30
Added some changes inspired by Zardosht Kasheff:
- Added a handler call (prepare_index_scan()) to inform storage engines that an index scan is about to take place.
- Extended the maximum number of key parts for an index from 16 to 32
- Extended MyISAM and Maria engines to support up to 32 key parts
Added checks for return value from ha_index_init()
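
The two ideas in this summary (an optional prepare_index_scan() hook and checking the result of index initialisation) can be sketched outside the server as follows. This is a toy illustration only: ToyHandler, BrokenInitHandler, PrefetchHandler and read_first() are hypothetical names, not the real handler class or sql_select.cc functions, and the error code 42 is arbitrary. The control flow is modelled on the patched join_read_first(), where each step runs only if the previous one returned 0.

```cpp
// Toy stand-in for the server's handler class; NOT the real MariaDB API.
class ToyHandler {
public:
  virtual ~ToyHandler() {}
  // Returns 0 on success, a non-zero engine error code otherwise.
  virtual int index_init(unsigned idx) { active_index = idx; return 0; }
  // Default hook: nothing to prepare. Engines may override it.
  virtual int prepare_index_scan() { return 0; }
  // Pretend to read the first row through the index.
  virtual int index_first(int *row) { *row = 1; return 0; }
  unsigned active_index = 0;
};

// Hypothetical engine whose index initialisation fails.
class BrokenInitHandler : public ToyHandler {
public:
  int index_init(unsigned) override { return 42; }
  int index_first(int *row) override {
    first_called = true;
    *row = 1;
    return 0;
  }
  bool first_called = false;
};

// Hypothetical engine that uses the hook to get ready for a scan.
class PrefetchHandler : public ToyHandler {
public:
  int prepare_index_scan() override { prepared = true; return 0; }
  bool prepared = false;
};

// Each step is attempted only if the previous step succeeded, so a
// failed index_init() is reported instead of being ignored.
int read_first(ToyHandler &h, unsigned idx, int *row) {
  int error = h.index_init(idx);
  if (!error)
    error = h.prepare_index_scan();
  if (!error)
    error = h.index_first(row);
  return error;
}
```

With this shape, read_first() on a BrokenInitHandler returns the init error and never touches the row buffer, which is the behaviour the added return-value checks aim for.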
modified:
include/my_handler.h
include/my_pthread.h
mysql-test/r/create.result
mysql-test/r/myisam.result
mysql-test/r/ps_1general.result
mysql-test/r/ps_2myisam.result
mysql-test/r/ps_3innodb.result
mysql-test/r/ps_4heap.result
mysql-test/r/ps_5merge.result
mysql-test/suite/maria/r/maria.result
mysql-test/suite/maria/r/maria3.result
mysql-test/suite/maria/r/ps_maria.result
mysql-test/t/create.test
mysql-test/t/myisam.test
sql/handler.cc
sql/handler.h
sql/sql_select.cc
sql/table.cc
sql/unireg.h
storage/maria/ha_maria.cc
storage/myisam/ha_myisam.cc
storage/myisam/mi_check.c
tests/mysql_client_test.c
per-file messages:
include/my_handler.h
Extended number of key parts for MyISAM and Maria from 16 to 32
include/my_pthread.h
Ensure we always have 256K of stack.
(Required to be able to handle the current number of keys and key parts in MyISAM)
mysql-test/r/create.result
Extended to test for 32 key parts
mysql-test/r/myisam.result
Test that we can create 32 but not 33 key parts
mysql-test/r/ps_1general.result
Length of ref is now 2048 as we can have more key parts
mysql-test/r/ps_2myisam.result
Length of ref is now 2048 as we can have more key parts
mysql-test/r/ps_3innodb.result
Length of ref is now 2048 as we can have more key parts
mysql-test/r/ps_4heap.result
Length of ref is now 2048 as we can have more key parts
mysql-test/r/ps_5merge.result
Length of ref is now 2048 as we can have more key parts
mysql-test/suite/maria/r/maria.result
Max key length is now 1208 bytes
mysql-test/suite/maria/r/maria3.result
Max key length is now 1208 bytes
mysql-test/suite/maria/r/ps_maria.result
Length of ref is now 2048 as we can have more key parts
mysql-test/t/create.test
Extended to test for 32 key parts
mysql-test/t/myisam.test
Test that we can create 32 but not 33 key parts
sql/handler.cc
Check return value from ha_index_init()
sql/handler.h
Added a handler call (prepare_index_scan()) to inform storage engines that an index scan is about to take place.
sql/sql_select.cc
Check all return values from ha_index_init()
Call prepare_index_scan() to inform storage engines that an index scan is about to take place.
Fixed indentation
sql/table.cc
Fixed wrong types for key_length (rest of code assumed this was 32 bit)
sql/unireg.h
Extended the maximum number of key parts for an index from 16 to 32
storage/maria/ha_maria.cc
Don't allocate HA_CHECK on the stack in functions where we call repair() as HA_CHECK is HUGE and will overflow stack
storage/myisam/ha_myisam.cc
Don't allocate HA_CHECK on the stack in functions where we call repair() as HA_CHECK is HUGE and will overflow stack
storage/myisam/mi_check.c
Fixed wrong check for value overflow (values are stored as 32 bit integers)
tests/mysql_client_test.c
Added fflush() to fix output in case of error
Fixed wrong check of 'ref' length in EXPLAIN
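
The changed numbers in the test results can be reproduced from the constants in the diff. The sketch below is a reading of the patch, not something the commit states: it assumes the Maria "max key length" in the messages follows the HA_MAX_KEY_BUFF formula from include/my_handler.h (the old and new values match it), and it assumes NAME_CHAR_LEN is 64, inferred from the old expectation NAME_CHAR_LEN*16 giving 1024. The function names are hypothetical.

```cpp
// Value copied from the include/my_handler.h hunk.
constexpr unsigned HA_MAX_KEY_LENGTH = 1000;

// Same formula as the HA_MAX_KEY_BUFF macro, with the number of key
// segments passed in so the old (16) and new (32) limits can be compared.
constexpr unsigned max_key_buff(unsigned max_key_seg) {
  return HA_MAX_KEY_LENGTH + max_key_seg * 6 + 8 + 8;
}

// Assumption: NAME_CHAR_LEN is 64 (inferred from 1024 / 16).
constexpr unsigned NAME_CHAR_LEN = 64;

// Length of the EXPLAIN 'ref' column as checked in mysql_client_test.c.
constexpr unsigned ref_column_length(unsigned max_key_seg) {
  return NAME_CHAR_LEN * max_key_seg;
}

static_assert(max_key_buff(16) == 1112, "old limit in maria.result");
static_assert(max_key_buff(32) == 1208, "new limit in maria.result");
static_assert(ref_column_length(16) == 1024, "old 'ref' length");
static_assert(ref_column_length(32) == 2048, "new 'ref' length");
```

Each extra key segment adds 6 bytes to the buffer formula, so going from 16 to 32 segments adds 96 bytes, which is the difference between 1112 and 1208.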
=== modified file 'include/my_handler.h'
--- a/include/my_handler.h 2008-10-10 15:28:41 +0000
+++ b/include/my_handler.h 2009-06-29 21:03:30 +0000
@@ -41,7 +41,7 @@ extern "C" {
*/
#define HA_MAX_KEY_LENGTH 1000 /* Max length in bytes */
-#define HA_MAX_KEY_SEG 16 /* Max segments for key */
+#define HA_MAX_KEY_SEG 32 /* Max segments for key */
#define HA_MAX_POSSIBLE_KEY_BUFF (HA_MAX_KEY_LENGTH + 24+ 6+6)
#define HA_MAX_KEY_BUFF (HA_MAX_KEY_LENGTH+HA_MAX_KEY_SEG*6+8+8)
=== modified file 'include/my_pthread.h'
--- a/include/my_pthread.h 2009-02-19 09:01:25 +0000
+++ b/include/my_pthread.h 2009-06-29 21:03:30 +0000
@@ -687,15 +687,11 @@ extern void my_mutex_end();
#define THREAD_NAME_SIZE 10
#ifndef DEFAULT_THREAD_STACK
-#if SIZEOF_CHARP > 4
/*
- MySQL can survive with 32K, but some glibc libraries require > 128K stack
- To resolve hostnames. Also recursive stored procedures needs stack.
+ We need to have at least 256K stack to handle calls to myisamchk_init()
+ with the current number of keys and key parts.
*/
#define DEFAULT_THREAD_STACK (256*1024L)
-#else
-#define DEFAULT_THREAD_STACK (192*1024)
-#endif
#endif
#define MY_PTHREAD_LOCK_READ 0
=== modified file 'mysql-test/r/create.result'
--- a/mysql-test/r/create.result 2009-04-25 10:05:32 +0000
+++ b/mysql-test/r/create.result 2009-06-29 21:03:30 +0000
@@ -1487,10 +1487,10 @@ ERROR 42000: Too many keys specified; ma
drop table t1;
create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int,
c8 int, c9 int, c10 int, c11 int, c12 int, c13 int, c14 int, c15 int,
-c16 int, c17 int);
+c16 int, c17 int, c18 int,c19 int,c20 int,c21 int,c22 int,c23 int,c24 int,c25 int,c26 int,c27 int,c28 int,c29 int,c30 int,c31 int,c32 int, c33 int);
alter table t1 add key i1 (
-c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17);
-ERROR 42000: Too many key parts specified; max 16 parts allowed
+c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33);
+ERROR 42000: Too many key parts specified; max 32 parts allowed
alter table t1 add key
a001_long_123456789_123456789_123456789_123456789_123456789_12345 (c1);
ERROR 42000: Identifier name 'a001_long_123456789_123456789_123456789_123456789_123456789_12345' is too long
@@ -1513,7 +1513,23 @@ t1 CREATE TABLE `t1` (
`c14` int(11) DEFAULT NULL,
`c15` int(11) DEFAULT NULL,
`c16` int(11) DEFAULT NULL,
- `c17` int(11) DEFAULT NULL
+ `c17` int(11) DEFAULT NULL,
+ `c18` int(11) DEFAULT NULL,
+ `c19` int(11) DEFAULT NULL,
+ `c20` int(11) DEFAULT NULL,
+ `c21` int(11) DEFAULT NULL,
+ `c22` int(11) DEFAULT NULL,
+ `c23` int(11) DEFAULT NULL,
+ `c24` int(11) DEFAULT NULL,
+ `c25` int(11) DEFAULT NULL,
+ `c26` int(11) DEFAULT NULL,
+ `c27` int(11) DEFAULT NULL,
+ `c28` int(11) DEFAULT NULL,
+ `c29` int(11) DEFAULT NULL,
+ `c30` int(11) DEFAULT NULL,
+ `c31` int(11) DEFAULT NULL,
+ `c32` int(11) DEFAULT NULL,
+ `c33` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
drop table t1;
=== modified file 'mysql-test/r/myisam.result'
--- a/mysql-test/r/myisam.result 2009-02-19 09:01:25 +0000
+++ b/mysql-test/r/myisam.result 2009-06-29 21:03:30 +0000
@@ -2251,4 +2251,10 @@ checksum table t3;
Table Checksum
test.t3 326284887
drop table t1,t2,t3;
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32)) engine=myisam;
+drop table t1;
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int, a33 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33)) engine=myisam;
+ERROR 42000: Too many key parts specified; max 32 parts allowed
End of 5.1 tests
=== modified file 'mysql-test/r/ps_1general.result'
--- a/mysql-test/r/ps_1general.result 2008-09-06 00:51:17 +0000
+++ b/mysql-test/r/ps_1general.result 2009-06-29 21:03:30 +0000
@@ -447,7 +447,7 @@ def type 253 10 3 Y 0 31 8
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 14 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
@@ -463,7 +463,7 @@ def type 253 10 5 Y 0 31 8
def possible_keys 253 4096 7 Y 0 31 8
def key 253 64 7 Y 0 31 8
def key_len 253 4096 1 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 27 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_2myisam.result'
--- a/mysql-test/r/ps_2myisam.result 2009-03-11 15:32:42 +0000
+++ b/mysql-test/r/ps_2myisam.result 2009-06-29 21:03:30 +0000
@@ -1159,7 +1159,7 @@ def type 253 10 3 Y 0 31 8
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_3innodb.result'
--- a/mysql-test/r/ps_3innodb.result 2009-03-11 15:32:42 +0000
+++ b/mysql-test/r/ps_3innodb.result 2009-06-29 21:03:30 +0000
@@ -1159,7 +1159,7 @@ def type 253 10 3 Y 0 31 8
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_4heap.result'
--- a/mysql-test/r/ps_4heap.result 2009-03-11 15:32:42 +0000
+++ b/mysql-test/r/ps_4heap.result 2009-06-29 21:03:30 +0000
@@ -1160,7 +1160,7 @@ def type 253 10 3 Y 0 31 8
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/r/ps_5merge.result'
--- a/mysql-test/r/ps_5merge.result 2009-03-11 15:32:42 +0000
+++ b/mysql-test/r/ps_5merge.result 2009-06-29 21:03:30 +0000
@@ -1202,7 +1202,7 @@ def type 253 10 3 Y 0 31 8
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
@@ -4224,7 +4224,7 @@ def type 253 10 3 Y 0 31 8
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/suite/maria/r/maria.result'
--- a/mysql-test/suite/maria/r/maria.result 2009-02-19 09:01:25 +0000
+++ b/mysql-test/suite/maria/r/maria.result 2009-06-29 21:03:30 +0000
@@ -340,14 +340,14 @@ Table Op Msg_type Msg_text
test.t1 check status OK
drop table t1;
CREATE TABLE t1 (a varchar(255), b varchar(255), c varchar(255), d varchar(255), e varchar(255), KEY t1 (a, b, c, d, e));
-ERROR 42000: Specified key was too long; max key length is 1112 bytes
+ERROR 42000: Specified key was too long; max key length is 1208 bytes
CREATE TABLE t1 (a varchar(32000), unique key(a));
-ERROR 42000: Specified key was too long; max key length is 1112 bytes
+ERROR 42000: Specified key was too long; max key length is 1208 bytes
CREATE TABLE t1 (a varchar(1), b varchar(1), key (a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b));
-ERROR 42000: Too many key parts specified; max 16 parts allowed
+ERROR 42000: Too many key parts specified; max 32 parts allowed
CREATE TABLE t1 (a varchar(255), b varchar(255), c varchar(255), d varchar(255), e varchar(255));
ALTER TABLE t1 ADD INDEX t1 (a, b, c, d, e);
-ERROR 42000: Specified key was too long; max key length is 1112 bytes
+ERROR 42000: Specified key was too long; max key length is 1208 bytes
DROP TABLE t1;
CREATE TABLE t1 (a int not null, b int, c int, key(b), key(c), key(a,b), key(c,a));
INSERT into t1 values (0, null, 0), (0, null, 1), (0, null, 2), (0, null,3), (1,1,4);
@@ -1551,7 +1551,7 @@ a b
drop table t1;
create table t1 (v varchar(65530), key(v));
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
drop table if exists t1;
create table t1 (v varchar(65536));
Warnings:
@@ -1789,34 +1789,34 @@ t1 CREATE TABLE `t1` (
drop table t1;
create table t1 (a varchar(2048), key `a` (a));
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112))
+ KEY `a` (`a`(1208))
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a varchar(2048), key `a` (a) key_block_size=1024);
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `a` (`a`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, b varchar(2048), key (a), key(b)) key_block_size=1024;
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=1024
alter table t1 key_block_size=2048;
show create table t1;
@@ -1825,7 +1825,7 @@ t1 CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=2048
alter table t1 add c int, add key (c);
show create table t1;
@@ -1835,7 +1835,7 @@ t1 CREATE TABLE `t1` (
`b` varchar(2048) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192,
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192,
KEY `c` (`c`) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=2048
alter table t1 key_block_size=0;
@@ -1848,33 +1848,33 @@ t1 CREATE TABLE `t1` (
`c` int(11) DEFAULT NULL,
`d` int(11) DEFAULT NULL,
KEY `a` (`a`) KEY_BLOCK_SIZE=8192,
- KEY `b` (`b`(1112)) KEY_BLOCK_SIZE=8192,
+ KEY `b` (`b`(1208)) KEY_BLOCK_SIZE=8192,
KEY `c` (`c`) KEY_BLOCK_SIZE=8192,
KEY `d` (`d`)
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, b varchar(2048), key (a), key(b)) key_block_size=8192;
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`),
- KEY `b` (`b`(1112))
+ KEY `b` (`b`(1208))
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=8192
drop table t1;
create table t1 (a int not null, b varchar(2048), key (a) key_block_size=1024, key(b)) key_block_size=8192;
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` varchar(2048) DEFAULT NULL,
KEY `a` (`a`),
- KEY `b` (`b`(1112))
+ KEY `b` (`b`(1208))
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0 KEY_BLOCK_SIZE=8192
drop table t1;
create table t1 (a int not null, b int, key (a) key_block_size=1024, key(b) key_block_size=8192) key_block_size=16384;
@@ -1897,12 +1897,12 @@ t1 CREATE TABLE `t1` (
drop table t1;
create table t1 (a varchar(2048), key `a` (a) key_block_size=1000000000000000000);
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `a` (`a`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, key `a` (a) key_block_size=1025);
=== modified file 'mysql-test/suite/maria/r/maria3.result'
--- a/mysql-test/suite/maria/r/maria3.result 2009-06-02 09:58:27 +0000
+++ b/mysql-test/suite/maria/r/maria3.result 2009-06-29 21:03:30 +0000
@@ -17,12 +17,12 @@ t1 CREATE TABLE `t1` (
drop table t1;
create table t1 (a varchar(2048), key `a` (a) key_block_size=1000000000000000000);
Warnings:
-Warning 1071 Specified key was too long; max key length is 1112 bytes
+Warning 1071 Specified key was too long; max key length is 1208 bytes
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` varchar(2048) DEFAULT NULL,
- KEY `a` (`a`(1112)) KEY_BLOCK_SIZE=8192
+ KEY `a` (`a`(1208)) KEY_BLOCK_SIZE=8192
) ENGINE=MARIA DEFAULT CHARSET=latin1 PAGE_CHECKSUM=0
drop table t1;
create table t1 (a int not null, key `a` (a) key_block_size=1025);
=== modified file 'mysql-test/suite/maria/r/ps_maria.result'
--- a/mysql-test/suite/maria/r/ps_maria.result 2009-03-11 15:32:42 +0000
+++ b/mysql-test/suite/maria/r/ps_maria.result 2009-06-29 21:03:30 +0000
@@ -1159,7 +1159,7 @@ def type 253 10 3 Y 0 31 8
def possible_keys 253 4096 0 Y 0 31 8
def key 253 64 0 Y 0 31 8
def key_len 253 4096 0 Y 0 31 8
-def ref 253 1024 0 Y 0 31 8
+def ref 253 2048 0 Y 0 31 8
def rows 8 10 1 Y 32928 0 63
def Extra 253 255 0 N 1 31 8
id select_type table type possible_keys key key_len ref rows Extra
=== modified file 'mysql-test/t/create.test'
--- a/mysql-test/t/create.test 2009-02-19 09:01:25 +0000
+++ b/mysql-test/t/create.test 2009-06-29 21:03:30 +0000
@@ -1106,12 +1106,12 @@ drop table t1;
create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int,
c8 int, c9 int, c10 int, c11 int, c12 int, c13 int, c14 int, c15 int,
-c16 int, c17 int);
+c16 int, c17 int, c18 int,c19 int,c20 int,c21 int,c22 int,c23 int,c24 int,c25 int,c26 int,c27 int,c28 int,c29 int,c30 int,c31 int,c32 int, c33 int);
# Get error for max key parts
--error 1070
alter table t1 add key i1 (
- c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17);
+ c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16, c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33);
# Get error for max key-name length
--error 1059
=== modified file 'mysql-test/t/myisam.test'
--- a/mysql-test/t/myisam.test 2009-02-19 09:01:25 +0000
+++ b/mysql-test/t/myisam.test 2009-06-29 21:03:30 +0000
@@ -1498,4 +1498,17 @@ checksum table t2;
CREATE TABLE t3 select * from t1;
checksum table t3;
drop table t1,t2,t3;
+
+#
+# Test number of supported key parts (32 is max)
+#
+
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32)) engine=myisam;
+drop table t1;
+
+--error 1070
+create table t1 (a1 int,a2 int,a3 int,a4 int,a5 int,a6 int,a7 int,a8 int,a9 int,a10 int,a11 int,a12 int,a13 int,a14 int,a15 int,a16 int,a17 int,a18 int,a19 int,a20 int,a21 int,a22 int,a23 int,a24 int,a25 int,a26 int,a27 int,a28 int,a29 int,a30 int,a31 int,a32 int, a33 int,
+key(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33)) engine=myisam;
+
--echo End of 5.1 tests
=== modified file 'sql/handler.cc'
--- a/sql/handler.cc 2009-05-22 12:38:50 +0000
+++ b/sql/handler.cc 2009-06-29 21:03:30 +0000
@@ -2102,8 +2102,8 @@ int handler::read_first_row(uchar * buf,
else
{
/* Find the first row through the primary key */
- (void) ha_index_init(primary_key, 0);
- error=index_first(buf);
+ if (!(error = ha_index_init(primary_key, 0)))
+ error= index_first(buf);
(void) ha_index_end();
}
DBUG_RETURN(error);
=== modified file 'sql/handler.h'
--- a/sql/handler.h 2009-02-19 09:01:25 +0000
+++ b/sql/handler.h 2009-06-29 21:03:30 +0000
@@ -1184,6 +1184,8 @@ public:
inited=NONE;
DBUG_RETURN(index_end());
}
+ /* This is called after index_init() if we need to do a index scan */
+ virtual int prepare_index_scan() { return 0; }
int ha_rnd_init(bool scan)
{
int result;
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-05-19 09:28:05 +0000
+++ b/sql/sql_select.cc 2009-06-29 21:03:30 +0000
@@ -11019,7 +11019,14 @@ do_select(JOIN *join,List<Item> *fields,
empty_record(table);
if (table->group && join->tmp_table_param.sum_func_count &&
table->s->keys && !table->file->inited)
- table->file->ha_index_init(0, 0);
+ {
+ int tmp_error;
+ if ((tmp_error= table->file->ha_index_init(0, 0)))
+ {
+ table->file->print_error(tmp_error, MYF(0)); /* purecov: inspected */
+ DBUG_RETURN(-1); /* purecov: inspected */
+ }
+ }
}
/* Set up select_end */
Next_select_func end_select= setup_end_select_func(join);
@@ -11810,7 +11817,11 @@ join_read_key(JOIN_TAB *tab)
if (!table->file->inited)
{
- table->file->ha_index_init(tab->ref.key, tab->sorted);
+ if ((error= table->file->ha_index_init(tab->ref.key, tab->sorted)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ return 1; /* purecov: inspected */
+ }
}
if (cmp_buffer_with_ref(tab) ||
(table->status & (STATUS_GARBAGE | STATUS_NO_PARENT | STATUS_NULL_ROW)))
@@ -11859,8 +11870,14 @@ join_read_always_key(JOIN_TAB *tab)
/* Initialize the index first */
if (!table->file->inited)
- table->file->ha_index_init(tab->ref.key, tab->sorted);
-
+ {
+ if ((error= table->file->ha_index_init(tab->ref.key, tab->sorted)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ return(1); /* purecov: inspected */
+ }
+ }
+
/* Perform "Late NULLs Filtering" (see internals manual for explanations) */
for (uint i= 0 ; i < tab->ref.key_parts ; i++)
{
@@ -11895,7 +11912,13 @@ join_read_last_key(JOIN_TAB *tab)
TABLE *table= tab->table;
if (!table->file->inited)
- table->file->ha_index_init(tab->ref.key, tab->sorted);
+ {
+ if ((error= table->file->ha_index_init(tab->ref.key, tab->sorted)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ return(1); /* purecov: inspected */
+ }
+ }
if (cp_buffer_from_ref(tab->join->thd, table, &tab->ref))
return -1;
if ((error=table->file->index_read_last_map(table->record[0],
@@ -11999,7 +12022,7 @@ join_init_read_record(JOIN_TAB *tab)
static int
join_read_first(JOIN_TAB *tab)
{
- int error;
+ int error= 0;
TABLE *table=tab->table;
if (!table->key_read && table->covering_keys.is_set(tab->index) &&
!table->no_keyread)
@@ -12014,8 +12037,10 @@ join_read_first(JOIN_TAB *tab)
tab->read_record.index=tab->index;
tab->read_record.record=table->record[0];
if (!table->file->inited)
- table->file->ha_index_init(tab->index, tab->sorted);
- if ((error=tab->table->file->index_first(tab->table->record[0])))
+ error= table->file->ha_index_init(tab->index, tab->sorted);
+ if (!error)
+ error= table->file->prepare_index_scan();
+ if (error || (error=tab->table->file->index_first(tab->table->record[0])))
{
if (error != HA_ERR_KEY_NOT_FOUND && error != HA_ERR_END_OF_FILE)
report_error(table, error);
@@ -12039,7 +12064,7 @@ static int
join_read_last(JOIN_TAB *tab)
{
TABLE *table=tab->table;
- int error;
+ int error= 0;
if (!table->key_read && table->covering_keys.is_set(tab->index) &&
!table->no_keyread)
{
@@ -12053,8 +12078,10 @@ join_read_last(JOIN_TAB *tab)
tab->read_record.index=tab->index;
tab->read_record.record=table->record[0];
if (!table->file->inited)
- table->file->ha_index_init(tab->index, 1);
- if ((error= tab->table->file->index_last(tab->table->record[0])))
+ error= table->file->ha_index_init(tab->index, 1);
+ if (!error)
+ error= table->file->prepare_index_scan();
+ if (error || (error= tab->table->file->index_last(tab->table->record[0])))
return report_error(table, error);
return 0;
}
@@ -12076,8 +12103,12 @@ join_ft_read_first(JOIN_TAB *tab)
int error;
TABLE *table= tab->table;
- if (!table->file->inited)
- table->file->ha_index_init(tab->ref.key, 1);
+ if (!table->file->inited &&
+ (error= table->file->ha_index_init(tab->ref.key, 1)))
+ {
+ table->file->print_error(error, MYF(0)); /* purecov: inspected */
+ return(1); /* purecov: inspected */
+ }
#if NOT_USED_YET
/* as ft-key doesn't use store_key's, see also FT_SELECT::init() */
if (cp_buffer_from_ref(tab->join->thd, table, &tab->ref))
@@ -12474,11 +12505,16 @@ end_update(JOIN *join, JOIN_TAB *join_ta
copy_funcs(join->tmp_table_param.items_to_copy);
if ((error=table->file->ha_write_row(table->record[0])))
{
- if (create_internal_tmp_table_from_heap(join->thd, table, &join->tmp_table_param,
- error, 0))
+ if (create_internal_tmp_table_from_heap(join->thd, table,
+ &join->tmp_table_param,
+ error, 0))
DBUG_RETURN(NESTED_LOOP_ERROR); // Not a table_is_full error
/* Change method to update rows */
- table->file->ha_index_init(0, 0);
+ if ((error= table->file->ha_index_init(0, 0)))
+ {
+ table->file->print_error(error, MYF(0));/* purecov: inspected */
+ DBUG_RETURN(NESTED_LOOP_ERROR); /* purecov: inspected */
+ }
join->join_tab[join->tables-1].next_select=end_unique_update;
}
join->send_records++;
=== modified file 'sql/table.cc'
--- a/sql/table.cc 2009-05-19 09:28:05 +0000
+++ b/sql/table.cc 2009-06-29 21:03:30 +0000
@@ -2448,7 +2448,7 @@ File create_frm(THD *thd, const char *na
if ((file= my_create(name, CREATE_MODE, create_flags, MYF(0))) >= 0)
{
- uint key_length, tmp_key_length;
+ ulong key_length, tmp_key_length;
uint tmp;
bzero((char*) fileinfo,64);
/* header */
=== modified file 'sql/unireg.h'
--- a/sql/unireg.h 2009-01-15 18:11:25 +0000
+++ b/sql/unireg.h 2009-06-29 21:03:30 +0000
@@ -51,7 +51,7 @@
#define MAX_FIELD_NAME 34 /* Max colum name length +2 */
#define MAX_SYS_VAR_LENGTH 32
#define MAX_KEY MAX_INDEXES /* Max used keys */
-#define MAX_REF_PARTS 16 /* Max parts used as ref */
+#define MAX_REF_PARTS 32 /* Max parts used as ref */
#define MAX_KEY_LENGTH 3072 /* max possible key */
#if SIZEOF_OFF_T > 4
#define MAX_REFLENGTH 8 /* Max length for record ref */
=== modified file 'storage/maria/ha_maria.cc'
--- a/storage/maria/ha_maria.cc 2009-05-19 09:28:05 +0000
+++ b/storage/maria/ha_maria.cc 2009-06-29 21:03:30 +0000
@@ -1130,14 +1130,21 @@ int ha_maria::restore(THD * thd, HA_CHEC
err:
{
- HA_CHECK param;
- maria_chk_init(&param);
- param.thd= thd;
- param.op_name= "restore";
- param.db_name= table->s->db.str;
- param.table_name= table->s->table_name.str;
- param.testflag= 0;
- _ma_check_print_error(&param, errmsg, my_errno);
+ /*
+ Don't allocate param on stack here as this may be huge and it's
+ also allocated by repair()
+ */
+ HA_CHECK *param;
+ if (!(param= (HA_CHECK*) my_malloc(sizeof(*param), MYF(MY_WME | MY_FAE))))
+ DBUG_RETURN(error);
+ maria_chk_init(param);
+ param->thd= thd;
+ param->op_name= "restore";
+ param->db_name= table->s->db.str;
+ param->table_name= table->s->table_name.str;
+ param->testflag= 0;
+ _ma_check_print_error(param, errmsg, my_errno);
+ my_free(param, MYF(0));
DBUG_RETURN(error);
}
}
=== modified file 'storage/myisam/ha_myisam.cc'
--- a/storage/myisam/ha_myisam.cc 2009-04-25 10:05:32 +0000
+++ b/storage/myisam/ha_myisam.cc 2009-06-29 21:03:30 +0000
@@ -910,14 +910,21 @@ int ha_myisam::restore(THD* thd, HA_CHEC
err:
{
- HA_CHECK param;
- myisamchk_init(&param);
- param.thd= thd;
- param.op_name= "restore";
- param.db_name= table->s->db.str;
- param.table_name= table->s->table_name.str;
- param.testflag= 0;
- mi_check_print_error(&param, errmsg, my_errno);
+ /*
+ Don't allocate param on stack here as this may be huge and it's
+ also allocated by repair()
+ */
+ HA_CHECK *param;
+ if (!(param= (HA_CHECK*) my_malloc(sizeof(*param), MYF(MY_WME | MY_FAE))))
+ DBUG_RETURN(error);
+ myisamchk_init(param);
+ param->thd= thd;
+ param->op_name= "restore";
+ param->db_name= table->s->db.str;
+ param->table_name= table->s->table_name.str;
+ param->testflag= 0;
+ mi_check_print_error(param, errmsg, my_errno);
+ my_free(param, MYF(0));
DBUG_RETURN(error);
}
}
=== modified file 'storage/myisam/mi_check.c'
--- a/storage/myisam/mi_check.c 2009-05-19 09:28:05 +0000
+++ b/storage/myisam/mi_check.c 2009-06-29 21:03:30 +0000
@@ -4629,8 +4629,9 @@ void update_key_parts(MI_KEYDEF *keyinfo
let's ensure it is not
*/
set_if_bigger(tmp,1);
- if (tmp >= (ulonglong) ~(ulong) 0)
- tmp=(ulonglong) ~(ulong) 0;
+ /* Keys are stored as 32 byte int's; Ensure we don't get an overflow */
+ if (tmp >= (ulonglong) ~(uint32) 0)
+ tmp=(ulonglong) ~(uint32) 0;
*rec_per_key_part=(ulong) tmp;
rec_per_key_part++;
=== modified file 'tests/mysql_client_test.c'
--- a/tests/mysql_client_test.c 2009-04-25 10:05:32 +0000
+++ b/tests/mysql_client_test.c 2009-06-29 21:03:30 +0000
@@ -33,6 +33,7 @@
#include <my_getopt.h>
#include <m_string.h>
#include <mysqld_error.h>
+#include <my_handler.h>
#define VER "2.1"
#define MAX_TEST_QUERY_LENGTH 300 /* MAX QUERY BUFFER LENGTH */
@@ -789,8 +790,10 @@ static void do_verify_prepare_field(MYSQ
*/
if (length && (field->length != expected_field_length))
{
+ fflush(stdout);
fprintf(stderr, "Expected field length: %llu, got length: %lu\n",
expected_field_length, field->length);
+ fflush(stderr);
DIE_UNLESS(field->length == expected_field_length);
}
if (def)
@@ -7809,8 +7812,9 @@ static void test_explain_bug()
"", "", NAME_CHAR_LEN*MAX_KEY, 0);
}
+ /* The length of this may verify between MariaDB versions (1024 / 2048) */
verify_prepare_field(result, 7, "ref", "", MYSQL_TYPE_VAR_STRING,
- "", "", "", NAME_CHAR_LEN*16, 0);
+ "", "", "", NAME_CHAR_LEN * HA_MAX_KEY_SEG, 0);
verify_prepare_field(result, 8, "rows", "", MYSQL_TYPE_LONGLONG,
"", "", "", 10, 0);
Re: [Maria-developers] feedback/review requested for fix to MySQL bug #45759
by Michael Widenius 28 Jun '09
Hi!
>>>>> "Zardosht" == Zardosht Kasheff <zardosht(a)gmail.com> writes:
Zardosht> Hello,
Zardosht> I am interested to hear feedback on a patch I have submitted for a
Zardosht> feature request, bug #45759 (http://bugs.mysql.com/bug.php?id=45759)
Zardosht> This patch notifies the storage engine of when a full index scan is
Zardosht> about to take place (e.g. "select count(*) from foo use_index(a);").
Zardosht> By default (in handler.h), the storage engine does nothing. Individual
Zardosht> storage engines can choose to do whatever they want in this case to
Zardosht> make full index scans faster.
Zardosht> The purpose of this patch is to give storage engines an opportunity to
Zardosht> be aware of full index scans, and in turn, make optimizations should
Zardosht> they choose to.
I have applied the patch to MariaDB 5.1, with some minor modifications:
I noticed that the old code in MySQL didn't properly check the return
value from ha_index_init() and I have now fixed this in MariaDB.
Regards,
Monty
PS: Your previous patch with MAX_KEY_PARTS didn't work right when it
comes to mysql_client_test. I have now fixed that too in MariaDB.
I also added a mysqltest to test that code.
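The error-checking pattern applied in the sql_select.cc changes can be sketched in isolation roughly as follows. This is only an illustration: `DemoHandler` and `read_first` are hypothetical stand-ins, not the real `handler` class or `join_read_first()`; only the method names `ha_index_init`, `prepare_index_scan` and `index_first` are taken from the patch.

```cpp
// Hypothetical stand-in for a storage engine handler; NOT the real
// MariaDB `handler` class. Only the method names mirror the patch.
struct DemoHandler {
  int init_result = 0;     // value ha_index_init() will return
  int prepare_result = 0;  // value prepare_index_scan() will return
  bool inited = false;
  int scans_started = 0;

  int ha_index_init(unsigned /*idx*/, bool /*sorted*/) {
    if (init_result == 0) inited = true;
    return init_result;
  }
  // Default hook: an engine that does not care about full index
  // scans simply returns 0.
  int prepare_index_scan() { return prepare_result; }
  int index_first() { ++scans_started; return 0; }
};

// Same control flow as the patched join_read_first(): stop at the
// first non-zero error and never start the scan after a failure.
int read_first(DemoHandler &h, unsigned idx) {
  int error = 0;
  if (!h.inited)
    error = h.ha_index_init(idx, true);
  if (!error)
    error = h.prepare_index_scan();
  if (error || (error = h.index_first()))
    return error;
  return 0;
}
```

With the old code, a failing `ha_index_init()` was ignored and the scan was started anyway; in this sketch `scans_started` stays at 0 whenever initialisation or the prepare hook fails.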
Re: [Maria-developers] feedback/review requested for fix to MySQL bug #45458
by Michael Widenius 28 Jun '09
Hi!
>>>>> "Zardosht" == Zardosht Kasheff <zardosht(a)gmail.com> writes:
Zardosht> Hello,
Zardosht> This is a feature request that adds grammar for "clustering" indexes.
Zardosht> Users can define an index to be clustering (include all of the columns
Zardosht> in the index), and as a result, a flag is passed into the handler via
Zardosht> a flag. It is up to the storage engine to properly implement it.
Zardosht> Storage engines that choose to not implement it can simply ignore the
Zardosht> flag.
Zardosht> I am interested to hear feedback on this feature and patch. Here is the link:
Zardosht> http://bugs.mysql.com/bug.php?id=45458
In MariaDB we have a patch that allows one to use a list of keywords
for indexes and for the whole table. These keywords will be passed to
the engine so that it can take care of those that it knows about.
MariaDB is maintaining these extra keywords and will ensure that they
are kept around even if you do an ALTER TABLE to and from another
engine that doesn't support the same keywords.
This is a much better approach than having to add new code to support
any possible keyword/option that an engine may want to support.
Regards,
Monty
For information about MariaDB, the community developed server based on
source code from MySQL, check out www.askmonty.org
PS: I just checked and the patch is not yet pushed. I will talk with
Sanja tomorrow and ask what the status is and when he can push
this one.
[Maria-developers] Updated (by Guest): Implement mysql-test output parser for Buildbot (22)
by worklog-noreply@askmonty.org 26 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement mysql-test output parser for Buildbot
CREATION DATE..: Thu, 21 May 2009, 22:19
SUPERVISOR.....: Knielsen
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Other
TASK ID........: 22 (http://askmonty.org/worklog/?tid=22)
VERSION........:
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 30 (hours remain)
ORIG. ESTIMATE.: 30
PROGRESS NOTES:
-=-=(Guest - Fri, 26 Jun 2009, 11:34)=-=-
Status updated.
--- /tmp/wklog.22.old.29004 2009-06-26 11:34:41.000000000 +0300
+++ /tmp/wklog.22.new.29004 2009-06-26 11:34:41.000000000 +0300
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Knielsen - Thu, 21 May 2009, 22:33)=-=-
High Level Description modified.
--- /tmp/wklog.22.old.27679 2009-05-21 22:33:35.000000000 +0300
+++ /tmp/wklog.22.new.27679 2009-05-21 22:33:35.000000000 +0300
@@ -9,3 +9,6 @@
http://djmitche.github.com/buildbot/docs/0.7.10/#Writing-New-BuildSteps
+Later, once we get the infrastructure to write Buildbot results into a MySQL
+database, we want to extend this to also insert into the database all test
+failures and the mysqltest failure output (for cross-reference search).
DESCRIPTION:
Like in Pushbuild at MySQL AB, we want to have buildbot parse the output of
mysql-test-run for test errors so we can display on the front page the name
of any tests that failed.
The parser can also count the number of tests completed so far so Buildbot can
provide more accurate completion estimates.
Buildbot already has support for plugging in such modules. See eg.
http://djmitche.github.com/buildbot/docs/0.7.10/#Writing-New-BuildSteps
Later, once we get the infrastructure to write Buildbot results into a MySQL
database, we want to extend this to also insert into the database all test
failures and the mysqltest failure output (for cross-reference search).
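A minimal sketch of the parsing step described above, written in C++ only to match the rest of the code in this archive (a real Buildbot step would be a Python class). It assumes that each finished test produces one line starting with the test name and containing either `[ pass ]` or `[ fail ]`; that line format, `MtrSummary` and `parse_mtr_output` are assumptions made for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Result of scanning a test-run log: how many tests finished and
// which ones failed.
struct MtrSummary {
  int completed = 0;
  std::vector<std::string> failed;
};

// Assumed line format: "<test name> ... [ pass ] ..." or
// "<test name> ... [ fail ] ...". All other lines are ignored.
MtrSummary parse_mtr_output(const std::string &log) {
  MtrSummary summary;
  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    const bool pass = line.find("[ pass ]") != std::string::npos;
    const bool fail = line.find("[ fail ]") != std::string::npos;
    if (!pass && !fail)
      continue;
    ++summary.completed;  // usable for completion estimates
    if (fail) {
      std::istringstream words(line);
      std::string name;
      words >> name;  // first whitespace-delimited token
      summary.failed.push_back(name);
    }
  }
  return summary;
}
```

The `completed` counter corresponds to the progress estimate mentioned in the description, and `failed` to the list of test names to show on the front page.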
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] [Branch ~maria-captains/maria/5.1] Rev 2712: Fix of BUG#45632 (http://bugs.mysql.com/bug.php?id=45632) - sharing non default debug settings be...
by noreply@launchpad.net 25 Jun '09
------------------------------------------------------------
revno: 2712
committer: sanja(a)askmonty.org
branch nick: work-maria-5.1
timestamp: Thu 2009-06-25 01:22:20 +0300
message:
Fix of BUG#45632 (http://bugs.mysql.com/bug.php?id=45632) - sharing non default debug settings between sessions. This bugfix proposed by Monty.
modified:
mysql-test/r/variables_debug.result
mysql-test/t/variables_debug.test
sql/set_var.cc
=== modified file 'mysql-test/r/variables_debug.result'
--- mysql-test/r/variables_debug.result 2008-02-26 15:03:59 +0000
+++ mysql-test/r/variables_debug.result 2009-06-24 22:22:20 +0000
@@ -10,3 +10,18 @@
select @@debug;
@@debug
T
+set session debug="t";
+show session variables like 'debug';
+Variable_name Value
+debug t
+set session debug="t";
+show session variables like 'debug';
+Variable_name Value
+debug t
+set session debug="d:t";
+show session variables like 'debug';
+Variable_name Value
+debug d:t
+show session variables like 'debug';
+Variable_name Value
+debug t
=== modified file 'mysql-test/t/variables_debug.test'
--- mysql-test/t/variables_debug.test 2008-02-26 15:03:59 +0000
+++ mysql-test/t/variables_debug.test 2009-06-24 22:22:20 +0000
@@ -10,3 +10,31 @@
select @@debug;
set debug= '-P';
select @@debug;
+
+#
+# Checks that assigning variable 'debug' in one session has no influence on
+# other session. (BUG#45632 of bugs.mysql.com)
+#
+connect(con1,localhost,root,,test,,);
+connect(con2,localhost,root,,test,,);
+
+# makes output independant of current debug status
+connection con1;
+set session debug="t";
+show session variables like 'debug';
+connection con2;
+set session debug="t";
+show session variables like 'debug';
+
+# checks influence one session debug variable on another
+connection con1;
+set session debug="d:t";
+show session variables like 'debug';
+connection con2;
+show session variables like 'debug';
+
+disconnect con1;
+disconnect con2;
+
+connection default;
+
=== modified file 'sql/set_var.cc'
--- sql/set_var.cc 2009-05-19 09:28:05 +0000
+++ sql/set_var.cc 2009-06-24 22:22:20 +0000
@@ -4231,11 +4231,28 @@
bool sys_var_thd_dbug::update(THD *thd, set_var *var)
{
+#ifndef DBUG_OFF
+ const char *command= var ? var->value->str_value.c_ptr() : "";
+
if (var->type == OPT_GLOBAL)
- DBUG_SET_INITIAL(var ? var->value->str_value.c_ptr() : "");
+ DBUG_SET_INITIAL(command);
else
- DBUG_SET(var ? var->value->str_value.c_ptr() : "");
-
+ {
+ if (_db_is_pushed_())
+ {
+ /* We have already a local state done with DBUG_PUSH; Modify the state */
+ DBUG_SET(command);
+ }
+ else
+ {
+ /*
+ We are sharing the state with the global state;
+ Create a local state for this thread.
+ */
+ DBUG_PUSH(command);
+ }
+ }
+#endif
return 0;
}
--
lp:maria
https://code.launchpad.net/~maria-captains/maria/5.1
Your team Maria developers is subscribed to branch lp:maria.
To unsubscribe from this branch go to https://code.launchpad.net/~maria-captains/maria/5.1/+edit-subscription.
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (sanja:2712) Bug#45632
by sanja@askmonty.org 25 Jun '09
#At lp:maria
2712 sanja(a)askmonty.org 2009-06-25
Fix of BUG#45632 (http://bugs.mysql.com/bug.php?id=45632) - sharing non default debug settings between sessions. This bugfix proposed by Monty.
modified:
mysql-test/r/variables_debug.result
mysql-test/t/variables_debug.test
sql/set_var.cc
per-file messages:
mysql-test/r/variables_debug.result
Test that sessions do not share the same session debug variable.
mysql-test/t/variables_debug.test
Test that sessions do not share the same session debug variable.
sql/set_var.cc
Because the default settings are shared between sessions, we should push the dbug state before changing the debug settings for the first time.
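The push-before-first-change idea can be modelled with a toy settings stack. The sketch below is not the dbug library: `GlobalDebug`, `Session` and their members are hypothetical, and only the decision between "push" and "set" follows the patch to sys_var_thd_dbug::update().

```cpp
#include <string>
#include <vector>

// Toy model of shared debug settings; not the real dbug library.
struct GlobalDebug {
  std::string settings;
};

struct Session {
  GlobalDebug *global;
  // Empty stack: the session still shares the global state.
  std::vector<std::string> local_stack;

  explicit Session(GlobalDebug *g) : global(g) {}

  bool is_pushed() const { return !local_stack.empty(); }

  // Mirrors the session branch of the patched update().
  void set_session_debug(const std::string &command) {
    if (is_pushed())
      local_stack.back() = command;    // like DBUG_SET: modify local state
    else
      local_stack.push_back(command);  // like DBUG_PUSH: create local state
  }

  const std::string &current() const {
    return is_pushed() ? local_stack.back() : global->settings;
  }
};
```

In this model, a change made in one session never writes to `GlobalDebug`, so a second session that has not touched its settings keeps seeing the shared defaults, which is the behaviour the new test case checks.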
=== modified file 'mysql-test/r/variables_debug.result'
--- a/mysql-test/r/variables_debug.result 2008-02-26 15:03:59 +0000
+++ b/mysql-test/r/variables_debug.result 2009-06-24 22:22:20 +0000
@@ -10,3 +10,18 @@ set debug= '-P';
select @@debug;
@@debug
T
+set session debug="t";
+show session variables like 'debug';
+Variable_name Value
+debug t
+set session debug="t";
+show session variables like 'debug';
+Variable_name Value
+debug t
+set session debug="d:t";
+show session variables like 'debug';
+Variable_name Value
+debug d:t
+show session variables like 'debug';
+Variable_name Value
+debug t
=== modified file 'mysql-test/t/variables_debug.test'
--- a/mysql-test/t/variables_debug.test 2008-02-26 15:03:59 +0000
+++ b/mysql-test/t/variables_debug.test 2009-06-24 22:22:20 +0000
@@ -10,3 +10,31 @@ set debug= '+P';
select @@debug;
set debug= '-P';
select @@debug;
+
+#
+# Checks that assigning variable 'debug' in one session has no influence on
+# other session. (BUG#45632 of bugs.mysql.com)
+#
+connect(con1,localhost,root,,test,,);
+connect(con2,localhost,root,,test,,);
+
+# makes output independant of current debug status
+connection con1;
+set session debug="t";
+show session variables like 'debug';
+connection con2;
+set session debug="t";
+show session variables like 'debug';
+
+# checks influence one session debug variable on another
+connection con1;
+set session debug="d:t";
+show session variables like 'debug';
+connection con2;
+show session variables like 'debug';
+
+disconnect con1;
+disconnect con2;
+
+connection default;
+
=== modified file 'sql/set_var.cc'
--- a/sql/set_var.cc 2009-05-19 09:28:05 +0000
+++ b/sql/set_var.cc 2009-06-24 22:22:20 +0000
@@ -4231,11 +4231,28 @@ bool sys_var_thd_dbug::check(THD *thd, s
bool sys_var_thd_dbug::update(THD *thd, set_var *var)
{
+#ifndef DBUG_OFF
+ const char *command= var ? var->value->str_value.c_ptr() : "";
+
if (var->type == OPT_GLOBAL)
- DBUG_SET_INITIAL(var ? var->value->str_value.c_ptr() : "");
+ DBUG_SET_INITIAL(command);
else
- DBUG_SET(var ? var->value->str_value.c_ptr() : "");
-
+ {
+ if (_db_is_pushed_())
+ {
+ /* We have already a local state done with DBUG_PUSH; Modify the state */
+ DBUG_SET(command);
+ }
+ else
+ {
+ /*
+ We are sharing the state with the global state;
+ Create a local state for this thread.
+ */
+ DBUG_PUSH(command);
+ }
+ }
+#endif
return 0;
}
2
1
Hi All,
This is a heads-up that we are actively testing Tungsten Replicator against the latest MariaDB build and hope to complete initial certification this week. So far there are no problems. We are also going to add a feature to use Maria for our catalog tables instead of InnoDB. Tungsten Replicator builds are GPL V2, by the way.
This brings up a question for the Maria dev team: what are your plans, if any, for replication support in MariaDB? In particular, are there any plans that would affect binlog formats?
Thanks, Robert
P.S. I don't see a location on AskMonty.org to post builds for related products. Any hints where to go? We would like to post our open source offerings in a location that is easy for MariaDB users to find.
--
Robert Hodges, CTO, Continuent, Inc.
Email: robert.hodges(a)continuent.com
Mobile: +1-510-501-3728 Skype: hodgesrm
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2716)
by knielsen@knielsen-hq.org 23 Jun '09
#At lp:maria
2716 knielsen(a)knielsen-hq.org 2009-06-23
Fix memory leak in mysql_ssl_set() when called more than once.
Fix sleep() synchronisation in innodb_information_schema test case.
modified:
mysql-test/t/innodb_information_schema.test
sql-common/client.c
per-file messages:
mysql-test/t/innodb_information_schema.test
Using sleep for synchronisation does not work!!!
Replace by looping until the required condition is met.
sql-common/client.c
mysql_ssl_set() did not free old pointers before overwriting with new ones (happens when
mysql_ssl_set() is called twice without calling mysql_close() in-between).
This sometimes caused memory leaks in the slave depending on exact timing of
master/slave shutdown.
Fixed by freeing old pointers before installing new ones in mysql_ssl_set(), just like
mysql_options() does.
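The leak and its fix can be reproduced in a small stand-alone sketch. `DemoSslOptions`, `dup_if_not_null`, `demo_ssl_set` and `demo_ssl_close` are hypothetical names; the real code uses my_free() and strdup_if_not_null() on the MYSQL options struct.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the SSL part of the client options.
struct DemoSslOptions {
  char *ssl_key = nullptr;
  char *ssl_cert = nullptr;
};

// Returns a heap copy of s, or nullptr when s is nullptr.
char *dup_if_not_null(const char *s) {
  if (!s) return nullptr;
  const std::size_t n = std::strlen(s) + 1;
  char *copy = static_cast<char *>(std::malloc(n));
  if (copy) std::memcpy(copy, s, n);
  return copy;
}

// Safe to call repeatedly: the previous strings are released before
// the pointers are overwritten. free(nullptr) is a no-op.
void demo_ssl_set(DemoSslOptions &opt, const char *key, const char *cert) {
  std::free(opt.ssl_key);
  std::free(opt.ssl_cert);
  opt.ssl_key = dup_if_not_null(key);
  opt.ssl_cert = dup_if_not_null(cert);
}

// Releases everything, as a close function would.
void demo_ssl_close(DemoSslOptions &opt) {
  std::free(opt.ssl_key);
  std::free(opt.ssl_cert);
  opt.ssl_key = nullptr;
  opt.ssl_cert = nullptr;
}
```

Without the two `std::free()` calls in `demo_ssl_set`, a second call would overwrite the only pointers to the first copies, which is the leak described above.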
=== modified file 'mysql-test/t/innodb_information_schema.test'
--- a/mysql-test/t/innodb_information_schema.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb_information_schema.test 2009-06-23 12:00:24 +0000
@@ -109,14 +109,19 @@ SELECT * FROM ```t'\"_str` WHERE c1 = '3
-- send
SELECT * FROM ```t'\"_str` WHERE c1 = '4' FOR UPDATE;
-# Give time to the above 2 queries to execute before continuing.
-# Without this sleep it sometimes happens that the SELECT from innodb_locks
+-- connection con_verify_innodb_locks
+
+# Loop, giving time for the above 2 queries to execute before continuing.
+# Without this, it sometimes happens that the SELECT FROM innodb_locks
# executes before some of them, resulting in less than expected number
# of rows being selected from innodb_locks.
--- sleep 0.1
+SET @counter := 0;
+while (`SELECT (@counter := @counter + 1) <= 50 AND COUNT(*) != 14 FROM INFORMATION_SCHEMA.INNODB_LOCKS`)
+{
+ sleep 0.1;
+}
-- enable_result_log
--- connection con_verify_innodb_locks
SELECT lock_mode, lock_type, lock_table, lock_index, lock_rec, lock_data
FROM INFORMATION_SCHEMA.INNODB_LOCKS ORDER BY lock_data;
=== modified file 'sql-common/client.c'
--- a/sql-common/client.c 2009-04-25 10:05:32 +0000
+++ b/sql-common/client.c 2009-06-23 12:00:24 +0000
@@ -1585,6 +1585,11 @@ mysql_ssl_set(MYSQL *mysql __attribute__
{
DBUG_ENTER("mysql_ssl_set");
#ifdef HAVE_OPENSSL
+ my_free(mysql->options.ssl_key, MYF(MY_ALLOW_ZERO_PTR));
+ my_free(mysql->options.ssl_cert, MYF(MY_ALLOW_ZERO_PTR));
+ my_free(mysql->options.ssl_ca, MYF(MY_ALLOW_ZERO_PTR));
+ my_free(mysql->options.ssl_capath, MYF(MY_ALLOW_ZERO_PTR));
+ my_free(mysql->options.ssl_cipher, MYF(MY_ALLOW_ZERO_PTR));
mysql->options.ssl_key= strdup_if_not_null(key);
mysql->options.ssl_cert= strdup_if_not_null(cert);
mysql->options.ssl_ca= strdup_if_not_null(ca);
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2703)
by knielsen@knielsen-hq.org 22 Jun '09
#At lp:maria
2703 knielsen(a)knielsen-hq.org 2009-06-10
Fix XtraDB to build with atomic operations, for good performance.
The root of the problem is that ./configure mixed together two different things. One is the
availability of GCC atomic operation intrinsics. The other is the selection of which
primitives to use for my_atomic implementation.
Then at some point a hack was made to not use GCC intrinsics in my_atomic to work around
some test failures. But because the two things are mixed in ./configure, this as a side
effect also makes GCC intrinsics unavailable for XtraDB.
Fixed by splitting this in two in configure, so that we have HAVE_GCC_ATOMIC_BUILTINS for
GCC intrinsics availability and MY_ATOMIC_MODE_GCC_BUILTINS for use in my_atomic.
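The effect of splitting the two macros can be illustrated with a small preprocessor sketch. The macro values are set by hand here to model "builtins are available, but my_atomic does not use them"; in a real build they would come from ./configure. The two functions and the strings they return are hypothetical and only show which macro each consumer looks at.

```cpp
#include <string>

// Set by hand for this sketch; normally generated by ./configure.
#define HAVE_GCC_ATOMIC_BUILTINS 1
// MY_ATOMIC_MODE_GCC_BUILTINS is deliberately left undefined.

// A consumer such as XtraDB only asks whether the compiler
// provides the intrinsics at all.
std::string engine_atomics_choice() {
#ifdef HAVE_GCC_ATOMIC_BUILTINS
  return "gcc builtins";
#else
  return "fallback";
#endif
}

// my_atomic asks a different question: which implementation was
// selected for it.
std::string my_atomic_choice() {
#ifdef MY_ATOMIC_MODE_GCC_BUILTINS
  return "gcc builtins";
#else
  return "other implementation";
#endif
}
```

Before the split, both questions were answered by the same macro, so disabling the builtins for my_atomic also hid them from every other consumer.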
modified:
configure.in
include/atomic/nolock.h
=== modified file 'configure.in'
--- a/configure.in 2009-04-23 13:06:16 +0000
+++ b/configure.in 2009-06-10 09:13:53 +0000
@@ -1726,6 +1726,30 @@ then
fi
fi
+AC_CACHE_CHECK([whether the compiler provides atomic builtins],
+ [mysql_cv_gcc_atomic_builtins], [AC_TRY_RUN([
+ int main()
+ {
+ int foo= -10; int bar= 10;
+ if (!__sync_fetch_and_add(&foo, bar) || foo)
+ return -1;
+ bar= __sync_lock_test_and_set(&foo, bar);
+ if (bar || foo != 10)
+ return -1;
+ bar= __sync_val_compare_and_swap(&bar, foo, 15);
+ if (bar)
+ return -1;
+ return 0;
+ }
+], [mysql_cv_gcc_atomic_builtins=yes],
+ [mysql_cv_gcc_atomic_builtins=no],
+ [mysql_cv_gcc_atomic_builtins=no])])
+
+if test "x$mysql_cv_gcc_atomic_builtins" = xyes; then
+ AC_DEFINE(HAVE_GCC_ATOMIC_BUILTINS, 1,
+ [Define to 1 if compiler provides atomic builtins.])
+fi
+
AC_ARG_WITH([atomic-ops],
AC_HELP_STRING([--with-atomic-ops=rwlocks|smp|up],
[Implement atomic operations using pthread rwlocks or atomic CPU
@@ -1739,28 +1763,9 @@ case "$with_atomic_ops" in
[Use pthread rwlocks for atomic ops]) ;;
"smp") ;;
"")
- AC_CACHE_CHECK([whether the compiler provides atomic builtins],
- [mysql_cv_gcc_atomic_builtins], [AC_TRY_RUN([
- int main()
- {
- int foo= -10; int bar= 10;
- if (!__sync_fetch_and_add(&foo, bar) || foo)
- return -1;
- bar= __sync_lock_test_and_set(&foo, bar);
- if (bar || foo != 10)
- return -1;
- bar= __sync_val_compare_and_swap(&bar, foo, 15);
- if (bar)
- return -1;
- return 0;
- }
- ], [mysql_cv_gcc_atomic_builtins=yes_but_disabled],
- [mysql_cv_gcc_atomic_builtins=no],
- [mysql_cv_gcc_atomic_builtins=no])])
-
- if test "x$mysql_cv_gcc_atomic_builtins" = xyes; then
- AC_DEFINE(HAVE_GCC_ATOMIC_BUILTINS, 1,
- [Define to 1 if compiler provides atomic builtins.])
+ if test "x$mysql_cv_gcc_atomic_builtins" = xyes_but_disabled; then
+ AC_DEFINE([MY_ATOMIC_MODE_GCC_BUILTINS], [1],
+ [Use GCC atomic builtins for atomic ops])
fi
;;
*) AC_MSG_ERROR(["$with_atomic_ops" is not a valid value for --with-atomic-ops]) ;;
=== modified file 'include/atomic/nolock.h'
--- a/include/atomic/nolock.h 2008-02-05 15:47:11 +0000
+++ b/include/atomic/nolock.h 2009-06-10 09:13:53 +0000
@@ -14,7 +14,7 @@
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
#if defined(__i386__) || defined(_MSC_VER) || \
- defined(__x86_64__) || defined(HAVE_GCC_ATOMIC_BUILTINS)
+ defined(__x86_64__) || defined(MY_ATOMIC_MODE_GCC_BUILTINS)
# ifdef MY_ATOMIC_MODE_DUMMY
# define LOCK_prefix ""
@@ -22,7 +22,7 @@
# define LOCK_prefix "lock"
# endif
-# ifdef HAVE_GCC_ATOMIC_BUILTINS
+# ifdef MY_ATOMIC_MODE_GCC_BUILTINS
# include "gcc_builtins.h"
# elif __GNUC__
# include "x86-gcc.h"
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (psergey:2718)
by Sergey Petrunia 22 Jun '09
#At lp:maria based on revid:psergey@askmonty.org-20090617052739-37i1r8lip0m4ft9r
2718 Sergey Petrunia 2009-06-22
MWL#17: Table elimination
- Make the elimination check able to detect cases like t.primary_key_col1=othertbl.col AND t.primary_key_col2=func(t.primary_key_col1).
These are needed to handle e.g. the case of func() being a correlated subquery that selects the latest value.
- If we've removed a condition with subquery predicate, EXPLAIN [EXTENDED] won't show the subquery anymore
modified:
sql/item.cc
sql/item.h
sql/item_subselect.cc
sql/item_subselect.h
sql/item_sum.cc
sql/sql_lex.cc
sql/sql_lex.h
sql/sql_select.cc
sql/sql_select.h
per-file messages:
sql/item.cc
MWL#17: Table elimination
- Add Item_field::check_column_usage_processor(). It allows checking which key parts a condition depends on.
sql/item.h
MWL#17: Table elimination
- Add Item_field::check_column_usage_processor(). It allows checking which key parts a condition depends on.
sql/item_subselect.cc
MWL#17: Table elimination
- Item_subselect got 'eliminated' attribute. It is used only to determine if the subselect should be printed by EXPLAIN.
- Item_subselect got List<Item> refers_to - a list of items in the current select that are referred to from within the subselect.
- Added Item_*::check_column_usage_processor(). It allows checking which key parts a condition depends on.
- Added a comment about possible problem in Item_subselect::walk
sql/item_subselect.h
MWL#17: Table elimination
- Item_subselect got an 'eliminated' attribute. It is used only to determine whether the subselect should be printed by EXPLAIN.
- Item_subselect got List<Item> refers_to, a list of items in the current select that are referred to from within the subselect.
- Added Item_*::check_column_usage_processor(). It allows checking which key parts a condition depends on.
sql/item_sum.cc
MWL#17: Table elimination
sql/sql_lex.cc
MWL#17: Table elimination
sql/sql_lex.h
MWL#17: Table elimination
sql/sql_select.cc
MWL#17: Table elimination
- Make the elimination check able to detect cases like t.primary_key_col1=othertbl.col AND t.primary_key_col2=func(t.primary_key_col1).
These are needed to handle e.g. the case of func() being a correlated subquery that selects the latest value.
- If we've removed a condition with a subquery predicate, EXPLAIN [EXTENDED] no longer shows the subquery.
sql/sql_select.h
MWL#17: Table elimination
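The per-file notes above describe check_column_usage_processor() as a way to find out which key parts a condition depends on. As a rough illustration only, the idea can be shown on a toy expression tree; Expr, column, func and collect_needed_key_parts are hypothetical names, the allowed-tables check is left out, and the sketch returns true on success rather than following the server's processor convention.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified expression node: a column reference or a
// function call over child expressions.
struct Expr {
  int table;               // table id of a column reference; -1 for a function
  int key_part;            // position of the column in the key; -1 if none
  std::vector<Expr> args;  // operands when the node is a function call
};

Expr column(int table, int key_part) { return Expr{table, key_part, {}}; }
Expr func(std::vector<Expr> args) { return Expr{-1, -1, std::move(args)}; }

// Walks the expression. Returns false if it reads a column of target_table
// that is not part of the key; otherwise ORs the key parts it reads into
// `needed` and returns true.
bool collect_needed_key_parts(const Expr &e, int target_table,
                              std::uint32_t &needed) {
  if (e.table == target_table) {
    if (e.key_part < 0)
      return false;  // depends on a non-key column of the target table
    needed |= std::uint32_t(1) << e.key_part;
  }
  for (const Expr &arg : e.args)
    if (!collect_needed_key_parts(arg, target_table, needed))
      return false;
  return true;
}
```

Columns of other tables are ignored here, so func(t.key_part0, other.col) reports only key part 0 of t as needed.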
=== modified file 'sql/item.cc'
--- a/sql/item.cc 2009-04-25 10:05:32 +0000
+++ b/sql/item.cc 2009-06-22 11:46:31 +0000
@@ -1915,6 +1915,30 @@ void Item_field::reset_field(Field *f)
name= (char*) f->field_name;
}
+bool Item_field::check_column_usage_processor(uchar *arg)
+{
+ Field_processor_info* info=(Field_processor_info*)arg;
+ if (used_tables() & ~info->allowed_tables)
+ return FALSE;
+
+ if (field->table == info->table)
+ {
+ if (!(field->part_of_key.is_set(info->keyno)))
+ return TRUE;
+
+ KEY *key= &field->table->key_info[info->keyno];
+ for (uint part= 0; part < key->key_parts; part++)
+ {
+ if (field->field_index == key->key_part[part].field->field_index)
+ {
+ info->needed_key_parts |= key_part_map(1) << part;
+ break;
+ }
+ }
+ }
+ return FALSE;
+}
+
const char *Item_ident::full_name() const
{
char *tmp;
@@ -3380,7 +3404,7 @@ static void mark_as_dependent(THD *thd,
/* store pointer on SELECT_LEX from which item is dependent */
if (mark_item)
mark_item->depended_from= last;
- current->mark_as_dependent(last);
+ current->mark_as_dependent(last, resolved_item);
if (thd->lex->describe & DESCRIBE_EXTENDED)
{
char warn_buff[MYSQL_ERRMSG_SIZE];
=== modified file 'sql/item.h'
--- a/sql/item.h 2009-06-09 21:11:33 +0000
+++ b/sql/item.h 2009-06-22 11:46:31 +0000
@@ -888,6 +888,8 @@ public:
virtual bool reset_query_id_processor(uchar *query_id_arg) { return 0; }
virtual bool is_expensive_processor(uchar *arg) { return 0; }
virtual bool register_field_in_read_map(uchar *arg) { return 0; }
+ virtual bool check_column_usage_processor(uchar *arg) { return 0; }
+ virtual bool mark_as_eliminated_processor(uchar *arg) { return 0; }
/*
Check if a partition function is allowed
SYNOPSIS
@@ -1012,6 +1014,14 @@ public:
};
+typedef struct
+{
+ table_map allowed_tables;
+ TABLE *table;
+ uint keyno;
+ uint needed_key_parts;
+} Field_processor_info;
+
class sp_head;
@@ -1477,6 +1487,7 @@ public:
bool find_item_in_field_list_processor(uchar *arg);
bool register_field_in_read_map(uchar *arg);
bool check_partition_func_processor(uchar *int_arg) {return FALSE;}
+ bool check_column_usage_processor(uchar *arg);
void cleanup();
bool result_as_longlong()
{
=== modified file 'sql/item_subselect.cc'
--- a/sql/item_subselect.cc 2009-01-31 21:22:44 +0000
+++ b/sql/item_subselect.cc 2009-06-22 11:46:31 +0000
@@ -39,7 +39,7 @@ inline Item * and_items(Item* cond, Item
Item_subselect::Item_subselect():
Item_result_field(), value_assigned(0), thd(0), substitution(0),
engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0),
- const_item_cache(1), engine_changed(0), changed(0), is_correlated(FALSE)
+ const_item_cache(1), in_fix_fields(0), engine_changed(0), changed(0), is_correlated(FALSE)
{
with_subselect= 1;
reset();
@@ -151,10 +151,14 @@ bool Item_subselect::fix_fields(THD *thd
DBUG_ASSERT(fixed == 0);
engine->set_thd((thd= thd_param));
+ if (!in_fix_fields)
+ refers_to.empty();
+ eliminated= FALSE;
if (check_stack_overrun(thd, STACK_MIN_SIZE, (uchar*)&res))
return TRUE;
-
+
+ in_fix_fields++;
res= engine->prepare();
// all transformation is done (used by prepared statements)
@@ -181,12 +185,14 @@ bool Item_subselect::fix_fields(THD *thd
if (!(*ref)->fixed)
ret= (*ref)->fix_fields(thd, ref);
thd->where= save_where;
+ in_fix_fields--;
return ret;
}
// Is it one field subselect?
if (engine->cols() > max_columns)
{
my_error(ER_OPERAND_COLUMNS, MYF(0), 1);
+ in_fix_fields--;
return TRUE;
}
fix_length_and_dec();
@@ -203,11 +209,30 @@ bool Item_subselect::fix_fields(THD *thd
fixed= 1;
err:
+ in_fix_fields--;
thd->where= save_where;
return res;
}
+bool Item_subselect::check_column_usage_processor(uchar *arg)
+{
+ List_iterator<Item> it(refers_to);
+ Item *item;
+ while ((item= it++))
+ {
+ if (item->walk(&Item::check_column_usage_processor,FALSE, arg))
+ return TRUE;
+ }
+ return FALSE;
+}
+
+bool Item_subselect::mark_as_eliminated_processor(uchar *arg)
+{
+ eliminated= TRUE;
+ return FALSE;
+}
+
bool Item_subselect::walk(Item_processor processor, bool walk_subquery,
uchar *argument)
{
@@ -225,6 +250,7 @@ bool Item_subselect::walk(Item_processor
if (lex->having && (lex->having)->walk(processor, walk_subquery,
argument))
return 1;
+ /* TODO: why doesn't this walk the OUTER JOINs' ON expressions */
while ((item=li++))
{
=== modified file 'sql/item_subselect.h'
--- a/sql/item_subselect.h 2008-02-22 10:30:33 +0000
+++ b/sql/item_subselect.h 2009-06-22 11:46:31 +0000
@@ -52,8 +52,16 @@ protected:
bool have_to_be_excluded;
/* cache of constant state */
bool const_item_cache;
-
+
public:
+ /*
+ References from inside the subquery to the select that this predicate is
+ in. References to parent selects not included.
+ */
+ List<Item> refers_to;
+ int in_fix_fields;
+ bool eliminated;
+
/* changed engine indicator */
bool engine_changed;
/* subquery is transformed */
@@ -126,6 +134,8 @@ public:
virtual void reset_value_registration() {}
enum_parsing_place place() { return parsing_place; }
bool walk(Item_processor processor, bool walk_subquery, uchar *arg);
+ bool mark_as_eliminated_processor(uchar *arg);
+ bool check_column_usage_processor(uchar *arg);
/**
Get the SELECT_LEX structure associated with this Item.
=== modified file 'sql/item_sum.cc'
--- a/sql/item_sum.cc 2009-06-09 21:11:33 +0000
+++ b/sql/item_sum.cc 2009-06-22 11:46:31 +0000
@@ -350,7 +350,7 @@ bool Item_sum::register_sum_func(THD *th
sl= sl->master_unit()->outer_select() )
sl->master_unit()->item->with_sum_func= 1;
}
- thd->lex->current_select->mark_as_dependent(aggr_sel);
+ thd->lex->current_select->mark_as_dependent(aggr_sel, NULL);
return FALSE;
}
=== modified file 'sql/sql_lex.cc'
--- a/sql/sql_lex.cc 2009-04-25 10:05:32 +0000
+++ b/sql/sql_lex.cc 2009-06-22 11:46:31 +0000
@@ -1778,7 +1778,7 @@ void st_select_lex_unit::exclude_tree()
'last' should be reachable from this st_select_lex_node
*/
-void st_select_lex::mark_as_dependent(st_select_lex *last)
+void st_select_lex::mark_as_dependent(st_select_lex *last, Item *dependency)
{
/*
Mark all selects from resolved to 1 before select where was
@@ -1804,6 +1804,8 @@ void st_select_lex::mark_as_dependent(st
}
is_correlated= TRUE;
this->master_unit()->item->is_correlated= TRUE;
+ if (dependency)
+ this->master_unit()->item->refers_to.push_back(dependency);
}
bool st_select_lex_node::set_braces(bool value) { return 1; }
=== modified file 'sql/sql_lex.h'
--- a/sql/sql_lex.h 2009-03-17 20:29:24 +0000
+++ b/sql/sql_lex.h 2009-06-22 11:46:31 +0000
@@ -743,7 +743,7 @@ public:
return master_unit()->return_after_parsing();
}
- void mark_as_dependent(st_select_lex *last);
+ void mark_as_dependent(st_select_lex *last, Item *dependency);
bool set_braces(bool value);
bool inc_in_sum_expr();
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-06-17 05:27:39 +0000
+++ b/sql/sql_select.cc 2009-06-22 11:46:31 +0000
@@ -2478,6 +2478,14 @@ static ha_rows get_quick_record_count(TH
}
+typedef struct st_keyuse_w_needed_reg
+{
+ KEYUSE *first;
+ key_part_map second;
+
+} Keyuse_w_needed_reg;
+
+static
bool has_eq_ref_access_candidate(TABLE *table, table_map can_refer_to_these)
{
KEYUSE *keyuse= table->reginfo.join_tab->keyuse;
@@ -2494,24 +2502,85 @@ bool has_eq_ref_access_candidate(TABLE *
{
uint key= keyuse->key;
key_part_map bound_parts=0;
- bool ft_key= test(keyuse->keypart == FT_KEYPART);
-
+ uint n_unusable=0;
+ bool ft_key= test(keyuse->keypart == FT_KEYPART);
+ KEY *keyinfo= table->key_info + key;
+ KEYUSE *key_start = keyuse;
+
do /* For each keypart and each way to read it */
{
- if (!(keyuse->used_tables & ~can_refer_to_these) &&
- !(keyuse->optimize & KEY_OPTIMIZE_REF_OR_NULL))
+ if (keyuse->usable)
{
- bound_parts |= keyuse->keypart_map;
+ if(!(keyuse->used_tables & ~can_refer_to_these) &&
+ !(keyuse->optimize & KEY_OPTIMIZE_REF_OR_NULL))
+ {
+ bound_parts |= keyuse->keypart_map;
+ }
}
+ else
+ n_unusable++;
keyuse++;
- } while (keyuse->table && keyuse->key == key);
+ } while (keyuse->table == table && keyuse->key == key);
+
+ if (ft_key || ((keyinfo->flags & (HA_NOSAME | HA_NULL_PART_KEY))
+ != HA_NOSAME))
+ {
+ continue;
+ }
- KEY *keyinfo= table->key_info + key;
- if (!ft_key &&
- ((keyinfo->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME) &&
- bound_parts == PREV_BITS(key_part_map, keyinfo->key_parts))
- {
+ if (bound_parts == PREV_BITS(key_part_map, keyinfo->key_parts))
return TRUE;
+ /*
+ Ok, usable keyuse elements didn't help us. Try making use of
+ unusable KEYUSEs (psergey-todo: sane comments:)
+ */
+ if (n_unusable && bound_parts)
+ {
+ /*
+ Check if unusable KEYUSE elements cause all parts of key to be
+ bound. An unusable keyuse element makes a key part bound when it
+ represents the following:
+
+ keyXpartY=func(bound_columns, preceding_tables)
+
+ .
+ */
+ Keyuse_w_needed_reg *uses;
+ if (!(uses= (Keyuse_w_needed_reg*)my_alloca(sizeof(Keyuse_w_needed_reg)*n_unusable)))
+ return FALSE;
+ uint n_uses=0;
+ for (KEYUSE *k= key_start; k!=keyuse; k++)
+ {
+ if (!k->usable && !(k->used_tables & ~can_refer_to_these))
+ {
+ //Walk k->val and check which key parts it depends on.
+ Field_processor_info fp= {can_refer_to_these, table, k->key, 0};
+ if (!k->val->walk(&Item::check_column_usage_processor, FALSE,
+ (uchar*)&fp))
+ {
+ uses[n_uses].first= k;
+ uses[n_uses].second= fp.needed_key_parts;
+ n_uses++;
+ }
+ }
+ }
+ /* Now compute transitive closure */
+ uint n_bounded;
+ do
+ {
+ n_bounded= 0;
+ for (uint i=0; i< n_uses; i++)
+ {
+ /* needed_parts is covered by what is already bound*/
+ if (!(uses[i].second & ~bound_parts))
+ {
+ bound_parts|= key_part_map(1) << uses[i].first->keypart;
+ n_bounded++;
+ }
+ if (bound_parts == PREV_BITS(key_part_map, keyinfo->key_parts))
+ return TRUE;
+ }
+ } while (n_bounded != 0);
}
}
}
@@ -2657,6 +2726,7 @@ eliminate_tables_for_join_list(JOIN *joi
eliminated += tbl->nested_join->join_list.elements;
//psergey-todo: do we need to do anything about removing the join
//nest?
+ tbl->on_expr->walk(&Item::mark_as_eliminated_processor, FALSE, NULL);
}
else
{
@@ -2673,6 +2743,7 @@ eliminate_tables_for_join_list(JOIN *joi
{
mark_table_as_eliminated(join, tbl->table, const_tbl_count,
const_tables);
+ tbl->on_expr->walk(&Item::mark_as_eliminated_processor, FALSE, NULL);
eliminated += 1;
}
}
@@ -3065,14 +3136,16 @@ make_join_statistics(JOIN *join, TABLE_L
{
start_keyuse=keyuse;
key=keyuse->key;
- s->keys.set_bit(key); // QQ: remove this ?
+ if (keyuse->usable)
+ s->keys.set_bit(key); // QQ: remove this ?
refs=0;
const_ref.clear_all();
eq_part.clear_all();
do
{
- if (keyuse->val->type() != Item::NULL_ITEM && !keyuse->optimize)
+ if (keyuse->usable && keyuse->val->type() != Item::NULL_ITEM &&
+ !keyuse->optimize)
{
if (!((~found_const_table_map) & keyuse->used_tables))
const_ref.set_bit(keyuse->keypart);
@@ -3276,6 +3349,7 @@ typedef struct key_field_t {
*/
bool null_rejecting;
bool *cond_guard; /* See KEYUSE::cond_guard */
+ bool usable;
} KEY_FIELD;
@@ -3284,6 +3358,26 @@ typedef struct key_field_t {
This is called for OR between different levels.
+ That is, the function operates on an array of KEY_FIELD elements which has
+ two parts:
+
+ $LEFT_PART $RIGHT_PART
+ +-----------------------+-----------------------+
+ start new_fields end
+
+ $LEFT_PART and $RIGHT_PART are arrays that have KEY_FIELD elements for two
+ parts of the OR condition. Our task is to produce an array of KEY_FIELD
+ elements that would correspond to "$LEFT_PART OR $RIGHT_PART".
+
+ The rules for combining elements are as follows:
+ (keyfieldA1 AND keyfieldA2 AND ...) OR (keyfieldB1 AND keyfieldB2 AND ...)=
+ AND_ij (keyfieldA_i OR keyfieldB_j)
+
+ We discard all (keyfieldA_i OR keyfieldB_j) that refer to different
+ fields. For those referring to the same field, the logic is as follows:
+
+ t.keycol=
+
To be able to do 'ref_or_null' we merge a comparison of a column
and 'column IS NULL' to one test. This is useful for sub select queries
that are internally transformed to something like:.
@@ -3348,13 +3442,18 @@ merge_key_fields(KEY_FIELD *start,KEY_FI
KEY_OPTIMIZE_REF_OR_NULL));
old->null_rejecting= (old->null_rejecting &&
new_fields->null_rejecting);
+ /*
+ The conditions are the same, hence their usabilities should
+ be, too (TODO: shouldn't that apply to the above
+ null_rejecting and optimize attributes?)
+ */
+ DBUG_ASSERT(old->usable == new_fields->usable);
}
}
else if (old->eq_func && new_fields->eq_func &&
old->val->eq_by_collation(new_fields->val,
old->field->binary(),
old->field->charset()))
-
{
old->level= and_level;
old->optimize= ((old->optimize & new_fields->optimize &
@@ -3363,10 +3462,14 @@ merge_key_fields(KEY_FIELD *start,KEY_FI
KEY_OPTIMIZE_REF_OR_NULL));
old->null_rejecting= (old->null_rejecting &&
new_fields->null_rejecting);
+ // "t.key_col=const" predicates are always usable
+ DBUG_ASSERT(old->usable && new_fields->usable);
}
else if (old->eq_func && new_fields->eq_func &&
- ((old->val->const_item() && old->val->is_null()) ||
- new_fields->val->is_null()))
+ ((new_fields->usable && old->val->const_item() &&
+ old->val->is_null()) ||
+ ((old->usable && new_fields->val->is_null()))))
+ /* TODO ^ why is the above asymmetric, why const_item()? */
{
/* field = expression OR field IS NULL */
old->level= and_level;
@@ -3437,6 +3540,7 @@ add_key_field(KEY_FIELD **key_fields,uin
table_map usable_tables, SARGABLE_PARAM **sargables)
{
uint exists_optimize= 0;
+ bool optimizable=0;
if (!(field->flags & PART_KEY_FLAG))
{
// Don't remove column IS NULL on a LEFT JOIN table
@@ -3449,15 +3553,15 @@ add_key_field(KEY_FIELD **key_fields,uin
else
{
table_map used_tables=0;
- bool optimizable=0;
for (uint i=0; i<num_values; i++)
{
used_tables|=(value[i])->used_tables();
if (!((value[i])->used_tables() & (field->table->map | RAND_TABLE_BIT)))
optimizable=1;
}
- if (!optimizable)
- return;
+ // psergey-tbl-elim:
+ // if (!optimizable)
+ // return;
if (!(usable_tables & field->table->map))
{
if (!eq_func || (*value)->type() != Item::NULL_ITEM ||
@@ -3470,7 +3574,8 @@ add_key_field(KEY_FIELD **key_fields,uin
JOIN_TAB *stat=field->table->reginfo.join_tab;
key_map possible_keys=field->key_start;
possible_keys.intersect(field->table->keys_in_use_for_query);
- stat[0].keys.merge(possible_keys); // Add possible keys
+ if (optimizable)
+ stat[0].keys.merge(possible_keys); // Add possible keys
/*
Save the following cases:
@@ -3563,6 +3668,7 @@ add_key_field(KEY_FIELD **key_fields,uin
(*key_fields)->val= *value;
(*key_fields)->level= and_level;
(*key_fields)->optimize= exists_optimize;
+ (*key_fields)->usable= optimizable;
/*
If the condition has form "tbl.keypart = othertbl.field" and
othertbl.field can be NULL, there will be no matches if othertbl.field
@@ -3874,6 +3980,7 @@ add_key_part(DYNAMIC_ARRAY *keyuse_array
keyuse.optimize= key_field->optimize & KEY_OPTIMIZE_REF_OR_NULL;
keyuse.null_rejecting= key_field->null_rejecting;
keyuse.cond_guard= key_field->cond_guard;
+ keyuse.usable= key_field->usable;
VOID(insert_dynamic(keyuse_array,(uchar*) &keyuse));
}
}
@@ -3954,6 +4061,11 @@ sort_keyuse(KEYUSE *a,KEYUSE *b)
return (int) (a->key - b->key);
if (a->keypart != b->keypart)
return (int) (a->keypart - b->keypart);
+
+ // Usable ones go before the unusable
+ if (a->usable != b->usable)
+ return (int)a->usable - (int)b->usable;
+
// Place const values before other ones
if ((res= test((a->used_tables & ~OUTER_REF_TABLE_BIT)) -
test((b->used_tables & ~OUTER_REF_TABLE_BIT))))
@@ -4164,7 +4276,8 @@ update_ref_and_keys(THD *thd, DYNAMIC_AR
found_eq_constant=0;
for (i=0 ; i < keyuse->elements-1 ; i++,use++)
{
- if (!use->used_tables && use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
+ if (use->usable && !use->used_tables &&
+ use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
use->table->const_key_parts[use->key]|= use->keypart_map;
if (use->keypart != FT_KEYPART)
{
@@ -4188,7 +4301,8 @@ update_ref_and_keys(THD *thd, DYNAMIC_AR
/* Save ptr to first use */
if (!use->table->reginfo.join_tab->keyuse)
use->table->reginfo.join_tab->keyuse=save_pos;
- use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
+ if (use->usable)
+ use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
save_pos++;
}
i=(uint) (save_pos-(KEYUSE*) keyuse->buffer);
@@ -4218,7 +4332,7 @@ static void optimize_keyuse(JOIN *join,
To avoid bad matches, we don't make ref_table_rows less than 100.
*/
keyuse->ref_table_rows= ~(ha_rows) 0; // If no ref
- if (keyuse->used_tables &
+ if (keyuse->usable && keyuse->used_tables &
(map= (keyuse->used_tables & ~join->const_table_map &
~OUTER_REF_TABLE_BIT)))
{
@@ -4411,7 +4525,8 @@ best_access_path(JOIN *join,
if 1. expression doesn't refer to forward tables
2. we won't get two ref-or-null's
*/
- if (!(remaining_tables & keyuse->used_tables) &&
+ if (keyuse->usable &&
+ !(remaining_tables & keyuse->used_tables) &&
!(ref_or_null_part && (keyuse->optimize &
KEY_OPTIMIZE_REF_OR_NULL)))
{
@@ -5915,9 +6030,11 @@ static bool create_ref_for_key(JOIN *joi
uint i;
for (i=0 ; i < keyparts ; keyuse++,i++)
{
- while (keyuse->keypart != i ||
- ((~used_tables) & keyuse->used_tables))
+ while (keyuse->keypart != i || ((~used_tables) & keyuse->used_tables) ||
+ !keyuse->usable)
+ {
keyuse++; /* Skip other parts */
+ }
uint maybe_null= test(keyinfo->key_part[i].null_bit);
j->ref.items[i]=keyuse->val; // Save for cond removal
@@ -16853,8 +16970,11 @@ static void select_describe(JOIN *join,
unit;
unit= unit->next_unit())
{
- if (mysql_explain_union(thd, unit, result))
- DBUG_VOID_RETURN;
+ if (!(unit->item && unit->item->eliminated))
+ {
+ if (mysql_explain_union(thd, unit, result))
+ DBUG_VOID_RETURN;
+ }
}
DBUG_VOID_RETURN;
}
=== modified file 'sql/sql_select.h'
--- a/sql/sql_select.h 2009-06-14 12:35:04 +0000
+++ b/sql/sql_select.h 2009-06-22 11:46:31 +0000
@@ -51,6 +51,7 @@ typedef struct keyuse_t {
NULL - Otherwise (the source equality can't be turned off)
*/
bool *cond_guard;
+ bool usable;
} KEYUSE;
class store_key;
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2715)
by knielsen@knielsen-hq.org 22 Jun '09
#At lp:maria
2715 knielsen(a)knielsen-hq.org 2009-06-22
More XtraDB after-merge fixes following review and buildbot runs:
- Better fix for --innodb-use-sys-malloc causing Valgrind warnings.
- Different fix for INNODB_IBUF_MAX_SIZE variable changing default value.
- Fix some problems with the safe mutex lazy init patch.
modified:
mysql-test/include/mtr_check.sql
mysql-test/lib/mtr_cases.pm
mysql-test/mysql-test-run.pl
mysys/thr_mutex.c
storage/xtradb/ibuf/ibuf0ibuf.c
per-file messages:
mysql-test/include/mtr_check.sql
Do not check INNODB_IBUF_MAX_SIZE for changes. It is not a dynamic variable, so cannot
be changed by a test case anyway, and the value may vary slightly from one start of the
server to the next.
mysql-test/lib/mtr_cases.pm
Even just starting and stopping the server with --innodb-use-sys-malloc to check for
disabled test case under valgrind will cause valgrind leak warnings. So add not_valgrind
to the list of conditions also tested for directly in mysql-test-run.pl.
mysql-test/mysql-test-run.pl
Even just starting and stopping the server with --innodb-use-sys-malloc to check for
disabled test case under valgrind will cause valgrind leak warnings. So add not_valgrind
to the list of conditions also tested for directly in mysql-test-run.pl.
mysys/thr_mutex.c
Fix a few problems found during review of the lazy init safe mutex patch.
storage/xtradb/ibuf/ibuf0ibuf.c
Revert previous fix of INNODB_IBUF_MAX_SIZE default varying slightly between server starts.
(Fixed instead by ignoring that variable in the test suite).
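To see why INNODB_IBUF_MAX_SIZE can differ between server starts, here is a small sketch of the arithmetic, assuming a 16 KiB page and a pool-to-ibuf ratio of 2. Both values are assumptions chosen for illustration, and ibuf_max_size_bytes is a hypothetical helper, not a function in XtraDB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reported limit in bytes: the smaller of (pool pages / ratio) and the
// requested limit in pages, converted back to bytes.
std::uint64_t ibuf_max_size_bytes(std::uint64_t pool_bytes,
                                  std::uint64_t requested_max_bytes,
                                  std::uint64_t page_size = 16384,
                                  std::uint64_t pool_per_max = 2) {
  assert(page_size > 0 && pool_per_max > 0);
  const std::uint64_t from_pool = pool_bytes / page_size / pool_per_max;
  const std::uint64_t from_user = requested_max_bytes / page_size;
  return std::min(from_pool, from_user) * page_size;
}
```

Under these assumptions a 128 MiB pool gives 67108864 bytes, while a pool that is one page smaller gives 67092480 bytes, so a one-page alignment difference in the actual pool size is enough to change the reported value.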
=== modified file 'mysql-test/include/mtr_check.sql'
--- a/mysql-test/include/mtr_check.sql 2009-02-19 09:01:25 +0000
+++ b/mysql-test/include/mtr_check.sql 2009-06-22 08:06:35 +0000
@@ -12,7 +12,9 @@ BEGIN
-- Dump all global variables except those
-- that are supposed to change
SELECT * FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
- WHERE variable_name != 'timestamp' and variable_name != "debug" order by variable_name;
+ WHERE variable_name != 'timestamp' AND variable_name != "debug"
+ AND variable_name != 'INNODB_IBUF_MAX_SIZE'
+ ORDER BY variable_name;
-- Dump all databases, there should be none
-- except those that was created during bootstrap
=== modified file 'mysql-test/lib/mtr_cases.pm'
--- a/mysql-test/lib/mtr_cases.pm 2009-03-20 14:39:37 +0000
+++ b/mysql-test/lib/mtr_cases.pm 2009-06-22 08:06:35 +0000
@@ -970,6 +970,16 @@ sub collect_one_test_case {
}
}
+ if ( $tinfo->{'not_valgrind'} )
+ {
+ if ( $::opt_valgrind_mysqld )
+ {
+ $tinfo->{'skip'}= 1;
+ $tinfo->{'comment'}= "Not compatible with Valgrind testing";
+ return $tinfo;
+ }
+ }
+
# ----------------------------------------------------------------------
# Find config file to use if not already selected in <testname>.opt file
# ----------------------------------------------------------------------
@@ -1050,6 +1060,7 @@ my @tags=
["include/ndb_master-slave.inc", "ndb_test", 1],
["federated.inc", "federated_test", 1],
["include/not_embedded.inc", "not_embedded", 1],
+ ["include/not_valgrind.inc", "not_valgrind", 1],
);
=== modified file 'mysql-test/mysql-test-run.pl'
--- a/mysql-test/mysql-test-run.pl 2009-06-18 12:39:21 +0000
+++ b/mysql-test/mysql-test-run.pl 2009-06-22 08:06:35 +0000
@@ -224,7 +224,7 @@ my $opt_strace_client;
our $opt_user = "root";
my $opt_valgrind= 0;
-my $opt_valgrind_mysqld= 0;
+our $opt_valgrind_mysqld= 0;
my $opt_valgrind_mysqltest= 0;
my @default_valgrind_args= ("--show-reachable=yes");
my @valgrind_args;
=== modified file 'mysys/thr_mutex.c'
--- a/mysys/thr_mutex.c 2009-06-09 15:08:46 +0000
+++ b/mysys/thr_mutex.c 2009-06-22 08:06:35 +0000
@@ -160,6 +160,9 @@ static int safe_mutex_lazy_init_deadlock
&mp->locked_mutex, sizeof(*mp->locked_mutex),
&mp->used_mutex, sizeof(*mp->used_mutex), NullS))
{
+ /* Disable deadlock handling for this mutex */
+ mp->create_flags|= MYF_NO_DEADLOCK_DETECTION;
+ mp->active_flags|= MYF_NO_DEADLOCK_DETECTION;
return 1; /* Error */
}
@@ -196,6 +199,9 @@ int safe_mutex_init(safe_mutex_t *mp,
mp->line= line;
/* Skip the very common '&' prefix from the autogenerated name */
mp->name= name[0] == '&' ? name + 1 : name;
+
+ if (!safe_mutex_deadlock_detector)
+ my_flags|= MYF_NO_DEADLOCK_DETECTION;
/* Deadlock detection is initialised only lazily, on first use. */
mp->create_flags= my_flags;
=== modified file 'storage/xtradb/ibuf/ibuf0ibuf.c'
--- a/storage/xtradb/ibuf/ibuf0ibuf.c 2009-06-09 15:08:46 +0000
+++ b/storage/xtradb/ibuf/ibuf0ibuf.c 2009-06-22 08:06:35 +0000
@@ -422,12 +422,7 @@ ibuf_init_at_db_start(void)
grow in size, as the references on the upper levels of the tree can
change */
- /* The default for ibuf_max_size is calculated from the requested
- buffer pool size srv_buf_pool_size, not the actual size as returned
- by buf_pool_get_curr_size(). The latter can differ from the former
- by one page due to alignment requirements, and we do not want a
- user-visible variable like INNODB_IBUF_MAX_SIZE to vary at random. */
- ibuf->max_size = ut_min( srv_buf_pool_size / UNIV_PAGE_SIZE
+ ibuf->max_size = ut_min( buf_pool_get_curr_size() / UNIV_PAGE_SIZE
/ IBUF_POOL_SIZE_PER_MAX_SIZE, (ulint) srv_ibuf_max_size / UNIV_PAGE_SIZE);
srv_ibuf_max_size = (long long) ibuf->max_size * UNIV_PAGE_SIZE;
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (sanja:2712) Bug#41098
by sanja@askmonty.org 20 Jun '09
#At lp:maria
2712 sanja(a)askmonty.org 2009-06-11
Real fix for Bug#41098 (http://bugs.mysql.com/bug.php?id=41098): invalidate tables changed by an insert after unlocking tables, when the result of the insert becomes really visible.
modified:
sql/handler.h
sql/mysql_priv.h
sql/sql_base.cc
sql/sql_cache.cc
sql/sql_cache.h
sql/sql_delete.cc
sql/sql_insert.cc
sql/sql_load.cc
sql/sql_parse.cc
sql/sql_partition.cc
sql/sql_rename.cc
sql/sql_table.cc
sql/sql_update.cc
sql/sql_view.cc
sql/table.h
storage/maria/ha_maria.h
storage/myisam/ha_myisam.h
per-file messages:
sql/handler.h
Table type for non-transactional tables whose insert results become visible to other threads only on unlock.
sql/mysql_priv.h
Invalidate call changed.
sql/sql_base.cc
Invalidation of marked tables.
sql/sql_cache.cc
Marking tables for on-unlock-invalidation added.
sql/sql_cache.h
Invalidate call changed.
sql/sql_delete.cc
Invalidate call changed.
sql/sql_insert.cc
Invalidate call changed.
sql/sql_load.cc
Invalidate call changed.
sql/sql_parse.cc
Invalidate call changed.
sql/sql_partition.cc
Invalidate call changed.
sql/sql_rename.cc
Invalidate call changed.
sql/sql_table.cc
Invalidate call changed.
sql/sql_update.cc
Invalidate call changed.
sql/sql_view.cc
Invalidate call changed.
sql/table.h
Mark for tables that need query cache invalidation on unlock.
storage/maria/ha_maria.h
MyISAM and maria 1.5 use the new type of table for query cache.
storage/myisam/ha_myisam.h
MyISAM and maria 1.5 use the new type of table for query cache.
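The per-file notes above boil down to a three-way decision at invalidation time plus a second invalidation when the table is closed. A minimal sketch of that flow, assuming this reading of the patch; CacheType, Action, Table, invalidate and close_table are hypothetical stand-ins, not the server's types.

```cpp
// Hypothetical mirror of the HA_CACHE_TBL_* table cache types.
enum class CacheType { NoCache, AskTransact, Transact, NonTransInsertToUnlock };

enum class Action { DeferToCommit, InvalidateNow };

struct Table {
  CacheType cache_type;
  bool changed_in_insert;  // set when a later on-unlock invalidation is due
};

// Decides what to do with cached queries that use table t.
Action invalidate(Table &t, bool using_transactions, bool is_insert) {
  if (using_transactions && t.cache_type == CacheType::Transact)
    return Action::DeferToCommit;  // handled when the transaction ends
  if (is_insert && t.cache_type == CacheType::NonTransInsertToUnlock)
    t.changed_in_insert = true;  // other threads see the rows only on unlock
  return Action::InvalidateNow;  // the current thread invalidates right away
}

// Called when the table is closed; returns true if the cache must be
// invalidated once more because an insert was marked earlier.
bool close_table(Table &t) {
  const bool invalidate_again = t.changed_in_insert;
  t.changed_in_insert = false;
  return invalidate_again;
}
```

The second invalidation matters because a query cached between the insert and the unlock could otherwise keep serving a result that predates the newly visible rows.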
=== modified file 'sql/handler.h'
--- a/sql/handler.h 2009-02-19 09:01:25 +0000
+++ b/sql/handler.h 2009-06-11 12:45:53 +0000
@@ -247,6 +247,11 @@
#define HA_CACHE_TBL_NOCACHE 1
#define HA_CACHE_TBL_ASKTRANSACT 2
#define HA_CACHE_TBL_TRANSACT 4
+/**
+ Non transactional table but insert results visible for other threads
+ only on unlock
+*/
+#define HA_CACHE_TBL_NTRNS_INS2LOCK 8
/* Options of START TRANSACTION statement (and later of SET TRANSACTION stmt) */
#define MYSQL_START_TRANS_OPT_WITH_CONS_SNAPSHOT 1
=== modified file 'sql/mysql_priv.h'
--- a/sql/mysql_priv.h 2009-04-25 10:05:32 +0000
+++ b/sql/mysql_priv.h 2009-06-11 12:45:53 +0000
@@ -893,7 +893,7 @@ struct Query_cache_query_flags
#define query_cache_init() query_cache.init()
#define query_cache_resize(A) query_cache.resize(A)
#define query_cache_set_min_res_unit(A) query_cache.set_min_res_unit(A)
-#define query_cache_invalidate3(A, B, C) query_cache.invalidate(A, B, C)
+#define query_cache_invalidate4(A, B, C, D) query_cache.invalidate(A, B, C, D)
#define query_cache_invalidate1(A) query_cache.invalidate(A)
#define query_cache_send_result_to_client(A, B, C) \
query_cache.send_result_to_client(A, B, C)
@@ -912,7 +912,7 @@ struct Query_cache_query_flags
#define query_cache_init()
#define query_cache_resize(A)
#define query_cache_set_min_res_unit(A)
-#define query_cache_invalidate3(A, B, C)
+#define query_cache_invalidate4(A, B, C, D)
#define query_cache_invalidate1(A)
#define query_cache_send_result_to_client(A, B, C) 0
#define query_cache_invalidate_by_MyISAM_filename_ref NULL
=== modified file 'sql/sql_base.cc'
--- a/sql/sql_base.cc 2009-05-19 09:28:05 +0000
+++ b/sql/sql_base.cc 2009-06-11 12:45:53 +0000
@@ -1373,6 +1373,15 @@ bool close_thread_table(THD *thd, TABLE
table->s->table_name.str, (long) table));
*table_ptr=table->next;
+
+ /* Invalidate if it has mark about changing in insert
+ (not all tables has such marks */
+ if (table->changed_in_insert)
+ {
+ table->changed_in_insert= FALSE;
+ query_cache_invalidate4(thd, table, FALSE, FALSE);
+ }
+
/*
When closing a MERGE parent or child table, detach the children first.
Clear child table references to force new assignment at next open.
=== modified file 'sql/sql_cache.cc'
--- a/sql/sql_cache.cc 2009-04-25 10:05:32 +0000
+++ b/sql/sql_cache.cc 2009-06-11 12:45:53 +0000
@@ -1542,12 +1542,19 @@ err:
}
-/*
+/**
Remove all cached queries that uses any of the tables in the list
+
+ @param thd Thread handler
+ @param tables_used List of tables used in this operation
+ @param using_transactions Not in autocommit mode
+ @param insert It is insert operation
+
*/
void Query_cache::invalidate(THD *thd, TABLE_LIST *tables_used,
- my_bool using_transactions)
+ my_bool using_transactions,
+ my_bool insert)
{
DBUG_ENTER("Query_cache::invalidate (table list)");
@@ -1567,6 +1574,15 @@ void Query_cache::invalidate(THD *thd, T
force transaction finish.
*/
thd->add_changed_table(tables_used->table);
+ else if (insert &&
+ (tables_used->table->file->table_cache_type() ==
+ HA_CACHE_TBL_NTRNS_INS2LOCK))
+ {
+ /* for other threads */
+ tables_used->table->changed_in_insert= TRUE;
+ /* for this thread */
+ invalidate_table(thd, tables_used);
+ }
else
invalidate_table(thd, tables_used);
}
@@ -1619,12 +1635,18 @@ void Query_cache::invalidate_locked_for_
DBUG_VOID_RETURN;
}
-/*
+/**
Remove all cached queries that uses the given table
+
+ @param thd Thread handler
+ @param table TABLE descriptor
+ @param using_transactions Not in autocommit mode
+ @param insert It is insert operation
*/
-void Query_cache::invalidate(THD *thd, TABLE *table,
- my_bool using_transactions)
+void Query_cache::invalidate(THD *thd, TABLE *table,
+ my_bool using_transactions,
+ my_bool insert)
{
DBUG_ENTER("Query_cache::invalidate (table)");
@@ -1633,13 +1655,22 @@ void Query_cache::invalidate(THD *thd, T
if (using_transactions &&
(table->file->table_cache_type() == HA_CACHE_TBL_TRANSACT))
thd->add_changed_table(table);
+ else if (insert &&
+ (table->file->table_cache_type() ==
+ HA_CACHE_TBL_NTRNS_INS2LOCK))
+ {
+ /* for other threads */
+ table->changed_in_insert= TRUE;
+ /* for this thread */
+ invalidate_table(thd, table);
+ }
else
invalidate_table(thd, table);
-
DBUG_VOID_RETURN;
}
+
void Query_cache::invalidate(THD *thd, const char *key, uint32 key_length,
my_bool using_transactions)
{
=== modified file 'sql/sql_cache.h'
--- a/sql/sql_cache.h 2008-07-24 13:41:55 +0000
+++ b/sql/sql_cache.h 2009-06-11 12:45:53 +0000
@@ -445,10 +445,11 @@ protected:
/* Remove all queries that uses any of the listed following tables */
void invalidate(THD* thd, TABLE_LIST *tables_used,
- my_bool using_transactions);
+ my_bool using_transactions, my_bool insert);
void invalidate(CHANGED_TABLE_LIST *tables_used);
void invalidate_locked_for_write(TABLE_LIST *tables_used);
- void invalidate(THD* thd, TABLE *table, my_bool using_transactions);
+ void invalidate(THD* thd, TABLE *table,
+ my_bool using_transactions, my_bool insert);
void invalidate(THD *thd, const char *key, uint32 key_length,
my_bool using_transactions);
=== modified file 'sql/sql_delete.cc'
--- a/sql/sql_delete.cc 2009-04-25 09:04:38 +0000
+++ b/sql/sql_delete.cc 2009-06-11 12:45:53 +0000
@@ -370,7 +370,7 @@ cleanup:
*/
if (deleted)
{
- query_cache_invalidate3(thd, table_list, 1);
+ query_cache_invalidate4(thd, table_list, TRUE, FALSE);
}
delete select;
@@ -783,7 +783,7 @@ void multi_delete::abort()
/* Something already deleted so we have to invalidate cache */
if (deleted)
- query_cache_invalidate3(thd, delete_tables, 1);
+ query_cache_invalidate4(thd, delete_tables, TRUE, FALSE);
/*
If rows from the first table only has been deleted and it is
@@ -933,7 +933,7 @@ bool multi_delete::send_eof()
*/
if (deleted)
{
- query_cache_invalidate3(thd, delete_tables, 1);
+ query_cache_invalidate4(thd, delete_tables, TRUE, FALSE);
}
if ((local_error == 0) || thd->transaction.stmt.modified_non_trans_table)
{
@@ -1074,7 +1074,7 @@ bool mysql_truncate(THD *thd, TABLE_LIST
error= ha_create_table(thd, path, table_list->db, table_list->table_name,
&create_info, 1);
VOID(pthread_mutex_unlock(&LOCK_open));
- query_cache_invalidate3(thd, table_list, 0);
+ query_cache_invalidate4(thd, table_list, FALSE, FALSE);
end:
if (!dont_send_ok)
=== modified file 'sql/sql_insert.cc'
--- a/sql/sql_insert.cc 2009-04-25 10:05:32 +0000
+++ b/sql/sql_insert.cc 2009-06-11 12:45:53 +0000
@@ -880,7 +880,7 @@ bool mysql_insert(THD *thd,TABLE_LIST *t
For the transactional algorithm to work the invalidation must be
before binlog writing and ha_autocommit_or_rollback
*/
- query_cache_invalidate3(thd, table_list, 1);
+ query_cache_invalidate4(thd, table_list, TRUE, TRUE);
}
if ((changed && error <= 0) ||
thd->transaction.stmt.modified_non_trans_table ||
@@ -2743,7 +2743,7 @@ bool Delayed_insert::handle_inserts(void
DBUG_PRINT("error", ("HA_EXTRA_NO_CACHE failed in loop"));
goto err;
}
- query_cache_invalidate3(&thd, table, 1);
+ query_cache_invalidate4(&thd, table, TRUE, TRUE);
if (thr_reschedule_write_lock(*thd.lock->locks))
{
/* This is not known to happen. */
@@ -2785,7 +2785,7 @@ bool Delayed_insert::handle_inserts(void
DBUG_PRINT("error", ("HA_EXTRA_NO_CACHE failed after loop"));
goto err;
}
- query_cache_invalidate3(&thd, table, 1);
+ query_cache_invalidate4(&thd, table, TRUE, TRUE);
pthread_mutex_lock(&mutex);
DBUG_RETURN(0);
@@ -3208,7 +3208,7 @@ bool select_insert::send_eof()
We must invalidate the table in the query cache before binlog writing
and ha_autocommit_or_rollback.
*/
- query_cache_invalidate3(thd, table, 1);
+ query_cache_invalidate4(thd, table, TRUE, TRUE);
if (thd->transaction.stmt.modified_non_trans_table)
thd->transaction.all.modified_non_trans_table= TRUE;
}
@@ -3299,7 +3299,7 @@ void select_insert::abort() {
if (!thd->current_stmt_binlog_row_based && !can_rollback_data())
thd->transaction.all.modified_non_trans_table= TRUE;
if (changed)
- query_cache_invalidate3(thd, table, 1);
+ query_cache_invalidate4(thd, table, TRUE, TRUE);
}
DBUG_ASSERT(transactional_table || !changed ||
thd->transaction.stmt.modified_non_trans_table);
=== modified file 'sql/sql_load.cc'
--- a/sql/sql_load.cc 2009-05-19 09:28:05 +0000
+++ b/sql/sql_load.cc 2009-06-11 12:45:53 +0000
@@ -446,7 +446,7 @@ int mysql_load(THD *thd,sql_exchange *ex
We must invalidate the table in query cache before binlog writing and
ha_autocommit_...
*/
- query_cache_invalidate3(thd, table_list, 0);
+ query_cache_invalidate4(thd, table_list, FALSE, FALSE);
if (error)
{
if (read_file_from_client)
=== modified file 'sql/sql_parse.cc'
--- a/sql/sql_parse.cc 2009-04-25 10:05:32 +0000
+++ b/sql/sql_parse.cc 2009-06-11 12:45:53 +0000
@@ -3211,7 +3211,7 @@ end_with_restore_list:
/* INSERT ... SELECT should invalidate only the very first table */
TABLE_LIST *save_table= first_table->next_local;
first_table->next_local= 0;
- query_cache_invalidate3(thd, first_table, 1);
+ query_cache_invalidate4(thd, first_table, TRUE, FALSE);
first_table->next_local= save_table;
}
delete sel_result;
=== modified file 'sql/sql_partition.cc'
--- a/sql/sql_partition.cc 2009-02-15 10:58:34 +0000
+++ b/sql/sql_partition.cc 2009-06-11 12:45:53 +0000
@@ -4012,7 +4012,7 @@ static int fast_end_partition(THD *thd,
thd->proc_info="end";
if (!is_empty)
- query_cache_invalidate3(thd, table_list, 0);
+ query_cache_invalidate4(thd, table_list, FALSE, FALSE);
error= ha_autocommit_or_rollback(thd, 0);
if (end_active_trans(thd))
=== modified file 'sql/sql_rename.cc'
--- a/sql/sql_rename.cc 2008-02-19 12:45:21 +0000
+++ b/sql/sql_rename.cc 2009-06-11 12:45:53 +0000
@@ -182,7 +182,7 @@ bool mysql_rename_tables(THD *thd, TABLE
}
if (!error)
- query_cache_invalidate3(thd, table_list, 0);
+ query_cache_invalidate4(thd, table_list, FALSE, FALSE);
pthread_mutex_lock(&LOCK_open);
unlock_table_names(thd, table_list, (TABLE_LIST*) 0);
=== modified file 'sql/sql_table.cc'
--- a/sql/sql_table.cc 2009-06-02 09:58:27 +0000
+++ b/sql/sql_table.cc 2009-06-11 12:45:53 +0000
@@ -1775,7 +1775,7 @@ int mysql_rm_table_part2(THD *thd, TABLE
if (some_tables_deleted || tmp_table_deleted || !error)
{
- query_cache_invalidate3(thd, tables, 0);
+ query_cache_invalidate4(thd, tables, FALSE, FALSE);
if (!dont_log_query)
{
if (!thd->current_stmt_binlog_row_based ||
@@ -4372,7 +4372,7 @@ static bool mysql_admin_table(THD* thd,
if (thd->killed)
goto err;
/* Flush entries in the query cache involving this table. */
- query_cache_invalidate3(thd, table->table, 0);
+ query_cache_invalidate4(thd, table->table, FALSE, FALSE);
open_for_modify= 0;
}
@@ -4628,7 +4628,7 @@ send_result_message:
pthread_mutex_unlock(&LOCK_open);
}
/* May be something modified consequently we have to invalidate cache */
- query_cache_invalidate3(thd, table->table, 0);
+ query_cache_invalidate4(thd, table->table, FALSE, FALSE);
}
}
ha_autocommit_or_rollback(thd, 0);
@@ -5163,7 +5163,7 @@ mysql_discard_or_import_tablespace(THD *
The 0 in the call below means 'not in a transaction', which means
immediate invalidation; that is probably what we wish here
*/
- query_cache_invalidate3(thd, table_list, 0);
+ query_cache_invalidate4(thd, table_list, FALSE, FALSE);
/* The ALTER TABLE is always in its own transaction */
error = ha_autocommit_or_rollback(thd, 0);
@@ -6433,7 +6433,7 @@ view_err:
unlink_open_table(thd, name_lock, FALSE);
VOID(pthread_mutex_unlock(&LOCK_open));
table_list->table= NULL; // For query cache
- query_cache_invalidate3(thd, table_list, 0);
+ query_cache_invalidate4(thd, table_list, FALSE, FALSE);
DBUG_RETURN(error);
}
@@ -7097,7 +7097,7 @@ view_err:
ha_flush_logs(old_db_type);
}
table_list->table=0; // For query cache
- query_cache_invalidate3(thd, table_list, 0);
+ query_cache_invalidate4(thd, table_list, FALSE, FALSE);
if (thd->locked_tables && (new_name != table_name || new_db != db))
{
=== modified file 'sql/sql_update.cc'
--- a/sql/sql_update.cc 2009-04-25 09:04:38 +0000
+++ b/sql/sql_update.cc 2009-06-11 12:45:53 +0000
@@ -779,7 +779,7 @@ int mysql_update(THD *thd,
*/
if (updated)
{
- query_cache_invalidate3(thd, table_list, 1);
+ query_cache_invalidate4(thd, table_list, TRUE, FALSE);
}
/*
@@ -1780,7 +1780,7 @@ void multi_update::abort()
/* Something already updated so we have to invalidate cache */
if (updated)
- query_cache_invalidate3(thd, update_tables, 1);
+ query_cache_invalidate4(thd, update_tables, TRUE, FALSE);
/*
If all tables that has been updated are trans safe then just do rollback.
If not attempt to do remaining updates.
@@ -2023,7 +2023,7 @@ bool multi_update::send_eof()
if (updated)
{
- query_cache_invalidate3(thd, update_tables, 1);
+ query_cache_invalidate4(thd, update_tables, TRUE, FALSE);
}
/*
Write the SQL statement to the binlog if we updated
=== modified file 'sql/sql_view.cc'
--- a/sql/sql_view.cc 2009-04-25 10:05:32 +0000
+++ b/sql/sql_view.cc 2009-06-11 12:45:53 +0000
@@ -667,7 +667,7 @@ bool mysql_create_view(THD *thd, TABLE_L
VOID(pthread_mutex_unlock(&LOCK_open));
if (mode != VIEW_CREATE_NEW)
- query_cache_invalidate3(thd, view, 0);
+ query_cache_invalidate4(thd, view, FALSE, FALSE);
start_waiting_global_read_lock(thd);
if (res)
goto err;
@@ -1631,7 +1631,7 @@ bool mysql_drop_view(THD *thd, TABLE_LIS
pthread_mutex_unlock(&share->mutex);
release_table_share(share, RELEASE_WAIT_FOR_DROP);
}
- query_cache_invalidate3(thd, view, 0);
+ query_cache_invalidate4(thd, view, FALSE, FALSE);
sp_cache_invalidate();
}
@@ -1983,7 +1983,7 @@ mysql_rename_view(THD *thd,
DBUG_RETURN(1);
/* remove cache entries */
- query_cache_invalidate3(thd, view, 0);
+ query_cache_invalidate4(thd, view, FALSE, FALSE);
sp_cache_invalidate();
error= FALSE;
=== modified file 'sql/table.h'
--- a/sql/table.h 2009-02-19 09:01:25 +0000
+++ b/sql/table.h 2009-06-11 12:45:53 +0000
@@ -790,6 +790,7 @@ struct st_table {
my_bool get_fields_in_item_tree; /* Signal to fix_field */
/* If MERGE children attached to parent. See top comment in ha_myisammrg.cc */
my_bool children_attached;
+ my_bool changed_in_insert; /* Have been changed in insert since last lock */
REGINFO reginfo; /* field connections */
MEM_ROOT mem_root;
=== modified file 'storage/maria/ha_maria.h'
--- a/storage/maria/ha_maria.h 2008-12-02 22:02:52 +0000
+++ b/storage/maria/ha_maria.h 2009-06-11 12:45:53 +0000
@@ -164,4 +164,6 @@ public:
return file;
}
static int implicit_commit(THD *thd, bool new_trn);
+ /** Type of table for caching query */
+ virtual uint8 table_cache_type() { return HA_CACHE_TBL_NTRNS_INS2LOCK; }
};
=== modified file 'storage/myisam/ha_myisam.h'
--- a/storage/myisam/ha_myisam.h 2008-06-28 12:45:15 +0000
+++ b/storage/myisam/ha_myisam.h 2009-06-11 12:45:53 +0000
@@ -147,4 +147,6 @@ class ha_myisam: public handler
{
return file;
}
+ /** Type of table for caching query */
+ virtual uint8 table_cache_type() { return HA_CACHE_TBL_NTRNS_INS2LOCK; }
};
[Maria-developers] Updated (by Guest): Using the Valgrind API in mysqld (23)
by worklog-noreply@askmonty.org 20 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Using the Valgrind API in mysqld
CREATION DATE..: Fri, 22 May 2009, 11:43
SUPERVISOR.....: Psergey
IMPLEMENTOR....: Knielsen
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 23 (http://askmonty.org/worklog/?tid=23)
VERSION........: Server-9.x
STATUS.........: Complete
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 40 (hours remain)
ORIG. ESTIMATE.: 40
PROGRESS NOTES:
-=-=(Guest - Sat, 20 Jun 2009, 21:35)=-=-
Version updated.
--- /tmp/wklog.23.old.7421 2009-06-20 21:35:40.000000000 +0300
+++ /tmp/wklog.23.new.7421 2009-06-20 21:35:40.000000000 +0300
@@ -1 +1 @@
-Connector/J-3.2
+Server-9.x
-=-=(Guest - Fri, 19 Jun 2009, 20:48)=-=-
Version updated.
--- /tmp/wklog.23.old.1520 2009-06-19 20:48:56.000000000 +0300
+++ /tmp/wklog.23.new.1520 2009-06-19 20:48:56.000000000 +0300
@@ -1 +1 @@
-GUI-Tools-3.0
+Connector/J-3.2
-=-=(Guest - Fri, 19 Jun 2009, 20:45)=-=-
Version updated.
--- /tmp/wklog.23.old.1480 2009-06-19 20:45:53.000000000 +0300
+++ /tmp/wklog.23.new.1480 2009-06-19 20:45:53.000000000 +0300
@@ -1 +1 @@
-Connector/.NET-2.0
+GUI-Tools-3.0
-=-=(Guest - Fri, 19 Jun 2009, 20:42)=-=-
Version updated.
--- /tmp/wklog.23.old.1373 2009-06-19 20:42:48.000000000 +0300
+++ /tmp/wklog.23.new.1373 2009-06-19 20:42:48.000000000 +0300
@@ -1 +1 @@
-WorkLog-3.4
+Connector/.NET-2.0
-=-=(Guest - Fri, 19 Jun 2009, 20:39)=-=-
Version updated.
--- /tmp/wklog.23.old.1347 2009-06-19 20:39:44.000000000 +0300
+++ /tmp/wklog.23.new.1347 2009-06-19 20:39:44.000000000 +0300
@@ -1 +1 @@
-Server-6.0
+WorkLog-3.4
-=-=(Guest - Fri, 19 Jun 2009, 20:36)=-=-
Version updated.
--- /tmp/wklog.23.old.1240 2009-06-19 20:36:40.000000000 +0300
+++ /tmp/wklog.23.new.1240 2009-06-19 20:36:40.000000000 +0300
@@ -1 +1 @@
-Connector/J-5.2
+Server-6.0
-=-=(Guest - Fri, 19 Jun 2009, 20:33)=-=-
Version updated.
--- /tmp/wklog.23.old.1136 2009-06-19 20:33:36.000000000 +0300
+++ /tmp/wklog.23.new.1136 2009-06-19 20:33:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+Connector/J-5.2
-=-=(Guest - Fri, 19 Jun 2009, 20:30)=-=-
Version updated.
--- /tmp/wklog.23.old.1109 2009-06-19 20:30:32.000000000 +0300
+++ /tmp/wklog.23.new.1109 2009-06-19 20:30:32.000000000 +0300
@@ -1 +1 @@
-Maria-1.0
+Server-5.1
-=-=(Guest - Fri, 19 Jun 2009, 20:27)=-=-
Version updated.
--- /tmp/wklog.23.old.1004 2009-06-19 20:27:28.000000000 +0300
+++ /tmp/wklog.23.new.1004 2009-06-19 20:27:28.000000000 +0300
@@ -1 +1 @@
-Connector/J-3.1
+Maria-1.0
-=-=(Guest - Fri, 19 Jun 2009, 20:24)=-=-
Version updated.
--- /tmp/wklog.23.old.907 2009-06-19 20:24:24.000000000 +0300
+++ /tmp/wklog.23.new.907 2009-06-19 20:24:24.000000000 +0300
@@ -1 +1 @@
-Maria-1.1
+Connector/J-3.1
------------------------------------------------------------
-=-=(View All Progress Notes, 173 total)=-=-
http://askmonty.org/worklog/index.pl?tid=23&nolimit=1
DESCRIPTION:
Valgrind (the memcheck tool) has some very useful APIs that can be used in mysqld
when testing with Valgrind to improve testing and/or debugging:
file:///usr/share/doc/valgrind/html/mc-manual.html#mc-manual.clientreqs
file:///usr/share/doc/valgrind/html/mc-manual.html#mc-manual.mempools
This worklog is about adding configure checks and headers so that these APIs can
be used in a way that continues to work on machines where the Valgrind headers
or functionality are missing.
It also includes adding some basic Valgrind enhancements:
- Adding Valgrind annotations to custom memory allocators so that Valgrind can
detect leaks, use-before-init, and use-after-free problems also for these
allocators.
- Adding checks for definedness in appropriate places (e.g. when calling libz).
HIGH-LEVEL SPECIFICATION:
With custom memory allocators, using the Valgrind APIs we can tell Valgrind when
a memory block is allocated (so that data read from memory is marked as undefined
instead of being defined or not at random depending on prior use); and when a
memory block is freed (so that use after freeing can be reported as an error).
In some cases checking for leaks may also be appropriate.
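The allocator annotations described above can be sketched roughly as follows. This is only an illustration, not the worklog's implementation: the Arena type, its functions, and the HAVE_VALGRIND_MEMCHECK_H configure macro are hypothetical names. Only VALGRIND_MAKE_MEM_UNDEFINED and VALGRIND_MAKE_MEM_NOACCESS are real Memcheck client requests, and they fall back to no-ops when the headers are missing. Alignment is ignored to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

#if defined(HAVE_VALGRIND_MEMCHECK_H)  /* hypothetical macro from a configure check */
#include <valgrind/memcheck.h>
#else
/* Fallbacks so the code still builds where the Valgrind headers are missing. */
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void) 0)
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len) ((void) 0)
#endif

/* Hypothetical fixed-size arena, standing in for a custom allocator. */
struct Arena
{
  unsigned char buf[1024];
  size_t used;
};

void arena_init(Arena *a)
{
  a->used= 0;
  /* Nothing has been handed out yet: any access to the buffer is an error. */
  VALGRIND_MAKE_MEM_NOACCESS(a->buf, sizeof(a->buf));
}

void *arena_alloc(Arena *a, size_t len)
{
  if (len > sizeof(a->buf) - a->used)
    return NULL;
  void *p= a->buf + a->used;
  a->used+= len;
  /* Addressable but undefined: a read before the first write is reported. */
  VALGRIND_MAKE_MEM_UNDEFINED(p, len);
  return p;
}

void arena_reset(Arena *a)
{
  /* Everything handed out so far counts as freed; later use is reported. */
  VALGRIND_MAKE_MEM_NOACCESS(a->buf, a->used);
  a->used= 0;
}
```

Outside Valgrind, or without the headers, the macros cost nothing, so the same allocator code can be compiled everywhere.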
Another possibility is to add an explicit check for whether memory is defined.
One place this would be useful is when calling libz. Due to the design of that
library, Valgrind produces lots of false alarms about using undefined values
(I think the issue is that it runs a few bytes off of initialized memory to
reduce boundary checks in each loop iteration, then after the loop has checks to
avoid using the undefined part of the result). This means we have lots of libz
Valgrind suppressions and continue to add more as new warnings surface. So we
might easily miss a real problem in this area. This could be improved by adding
explicit checks at the call to libz functions that the passed memory is properly
defined.
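A minimal sketch of such an explicit check, under stated assumptions: fake_compress() is a hypothetical stand-in for a libz entry point (it only copies bytes, so the example does not need to link against libz), checked_compress() is a hypothetical wrapper name, and HAVE_VALGRIND_MEMCHECK_H is a hypothetical configure macro. VALGRIND_CHECK_MEM_IS_DEFINED is the real Memcheck client request; it is replaced by a no-op when the headers are unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(HAVE_VALGRIND_MEMCHECK_H)  /* hypothetical macro from a configure check */
#include <valgrind/memcheck.h>
#else
/* No-op fallback for machines without the Valgrind headers. */
#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, len) ((void) 0)
#endif

/* Hypothetical stand-in for a libz call; it just copies the input. */
static int fake_compress(unsigned char *dest, size_t *dest_len,
                         const unsigned char *src, size_t src_len)
{
  if (*dest_len < src_len)
    return -1;
  memcpy(dest, src, src_len);
  *dest_len= src_len;
  return 0;
}

/* Check the input for definedness at the call boundary, then call through. */
int checked_compress(unsigned char *dest, size_t *dest_len,
                     const unsigned char *src, size_t src_len)
{
  /* Under Valgrind, undefined bytes in src are reported here, at the caller. */
  VALGRIND_CHECK_MEM_IS_DEFINED(src, src_len);
  return fake_compress(dest, dest_len, src, src_len);
}
```

With the input verified at the boundary, a report from inside the library can more safely be treated as a false alarm, and a genuinely undefined input is reported at the call site rather than hidden by a suppression.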
Another use is to improve debugging. When debugging a warning about using
uninitialised memory, the detection often happens long after the real problem,
because the uninitialised value is passed along through the code for a long
time before being detected. This makes debugging the problem slow.
By adding in strategic places code that asserts that a specific value must be
initialised, it is possible to detect problems earlier, speeding up debugging.
Such code can be added in more places over time as development and debugging
goes on.
See also a patch here: http://bugs.mysql.com/bug.php?id=44582
LOW-LEVEL DESIGN:
Two places where we call into libz, and where checking for defined parameters
would be good:
- mysys/my_compress.c
- sql/item_strfunc.cc (Item_func_compress).
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Better choice between range and index_merge/intersection options (26)
by worklog-noreply@askmonty.org 20 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Better choice between range and index_merge/intersection options
CREATION DATE..: Wed, 27 May 2009, 15:20
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 26 (http://askmonty.org/worklog/?tid=26)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sat, 20 Jun 2009, 09:39)=-=-
High-Level Specification modified.
--- /tmp/wklog.26.old.21828 2009-06-20 09:39:25.000000000 +0300
+++ /tmp/wklog.26.new.21828 2009-06-20 09:39:25.000000000 +0300
@@ -1 +1,9 @@
+* Not a spec but rather preliminary notes*
+User cases
+----------
+
+* BUG#32254: Index merge used unnecessarily
+* BUG#34869: Almost 300% regression in index merge intersect performance
+* BUG#40051: needed way to prevent optimizer from using index_merge on useless
+keys (no testcase)
-=-=(Guest - Sat, 13 Jun 2009, 06:51)=-=-
Category updated.
--- /tmp/wklog.26.old.26624 2009-06-13 06:51:56.000000000 +0300
+++ /tmp/wklog.26.new.26624 2009-06-13 06:51:56.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint
-=-=(Guest - Sat, 13 Jun 2009, 06:06)=-=-
Category updated.
--- /tmp/wklog.26.old.24784 2009-06-13 06:06:28.000000000 +0300
+++ /tmp/wklog.26.new.24784 2009-06-13 06:06:28.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog
-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 26
DESCRIPTION:
The optimizer makes a cost-based choice between possible range and
index_merge/intersection scans. This choice has some issues:
- index_merge/intersection gets chosen even when there is a single
multi-part index that covers all keys. Measurements show that this is
a poor choice.
- The picked index_merge/intersection can use a redundant set of indexes:
it will be intersect(idx1, ..., idxN) where all columns in idxN are covered
by other used indexes.
This WL is to fix these limitations.
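The second issue (a redundant index in the intersection) can be illustrated with a small sketch. It is not the optimizer's code: the Columns type and is_redundant() are hypothetical, and an index is modelled simply as the set of column names it covers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

/* Hypothetical model: an index is the set of columns it covers. */
typedef std::set<std::string> Columns;

/*
  Return true if every column of indexes[n] is also covered by at least
  one of the other indexes in the intersection.
*/
bool is_redundant(const std::vector<Columns> &indexes, size_t n)
{
  Columns others;
  for (size_t i= 0; i < indexes.size(); i++)
    if (i != n)
      others.insert(indexes[i].begin(), indexes[i].end());

  for (Columns::const_iterator c= indexes[n].begin();
       c != indexes[n].end(); ++c)
    if (others.count(*c) == 0)
      return false;                     /* this column is covered only by n */
  return true;
}
```

For intersect(idx(a,b), idx(c), idx(b,c)) the last index adds no new columns, so is_redundant() returns true for it. Whether dropping such an index is actually cheaper is a cost question that this sketch does not address.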
HIGH-LEVEL SPECIFICATION:
* Not a spec but rather preliminary notes*
User cases
----------
* BUG#32254: Index merge used unnecessarily
* BUG#34869: Almost 300% regression in index merge intersect performance
* BUG#40051: needed way to prevent optimizer from using index_merge on useless
keys (no testcase)
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 20 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sat, 20 Jun 2009, 09:34)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.21663 2009-06-20 09:34:48.000000000 +0300
+++ /tmp/wklog.24.new.21663 2009-06-20 09:34:48.000000000 +0300
@@ -4,6 +4,7 @@
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
+3. Testing and required coverage
</contents>
1. Current implementation overview
@@ -240,3 +241,14 @@
In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.
+
+3. Testing and required coverage
+================================
+So far we could find the following user cases:
+
+* BUG#17259: Query optimizer chooses wrong index
+* BUG#17673: Optimizer does not use Index Merge optimization in some cases
+* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
+* BUG#30151: optimizer is very reluctant to chose index_merge algorithm
+
+
-=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.19152 2009-06-18 16:55:00.000000000 +0300
+++ /tmp/wklog.24.new.19152 2009-06-18 16:55:00.000000000 +0300
@@ -141,13 +141,15 @@
Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:
+
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
-1. Don't remove index_merge part of the tree.
+A1. Don't remove index_merge part of the tree (this will take care of
+ DISCARD-IMERGE-1 problem)
-2. Push range conditions down into index_merge trees that may support them.
+A2. Push range conditions down into index_merge trees that may support them.
if one tree has range(key1) and the other tree has imerge(key1 OR key2)
then perform an equivalent of this operation:
@@ -155,8 +157,86 @@
(rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
-3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
concatenate them together.
-2.2 New tree_or()
+2.2 New tree_or()
+-----------------
+O1. Don't remove non-range plans:
+ Current tree_or() code will refuse to produce index_merge plans for
+ conditions like
+
+ "t.key1part2=const OR t.key2part1=const"
+
+ (this is marked as DISCARD-IMERGE-3). This was justified as the left part of
+ the OR condition is not usable for range access, and the operation of
+ tree_and() guaranteed that there was no way it could be changed to make a
+ usable range plan. With new tree_and() and rule A2, this is no longer the
+ case. For example for this query:
+
+ (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
+
+ it will construct a
+
+ imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
+
+ then tree_and() will apply rule A2 to push the range down into index merge
+ and after that we'll have:
+
+ range(t.key1part1=const)
+ imerge(
+ t.key1part2=const AND t.key1part1=const,
+ t.key2part1=const
+ )
+ note that imerge(...) describes a usable index_merge plan and it's possible
+ that it will be the best access path.
+
+O2. "Create index_merge accesses when possible"
+ Current tree_or() will not create index_merge access when it could create
+ non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+ in the current implementation" section). This will be changed to work as
+ follows: we will create an index_merge from the index scans that don't have
+ a match in the other sel_tree.
+ Illustrating it with an example:
+
+ | sel_tree_A | sel_tree_B | A or B | include in index_merge?
+ ------+------------+------------+--------+------------------------
+ key1 | cond1 | cond2 | condM | no
+ key2 | cond3 | cond4 | NULL | no
+ key3 | cond5 | | | yes, A-side
+ key4 | cond6 | | | yes, A-side
+ key5 | | cond7 | | yes, B-side
+ key6 | | cond8 | | yes, B-side
+
+ here we assume that
+ - (cond1 OR cond2) did produce a combined range. Not including them in
+ index_merge.
+ - (cond3 OR cond4) didn't produce a usable range (e.g. they were
+ t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
+ didn't yield any range list)
+ - All other scans didn't have their counterparts, so we'll end up with a
+ SEL_TREE of:
+
+ range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
+ .
+
+O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
+that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven't
+seen any complaints that could be attributed to it.
+If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
+lift it and produce a cross-product:
+
+ ((key1p OR key2p) AND (key3p OR key4p))
+ OR
+ ((key5p OR key6p) AND (key7p OR key8p))
+
+ = (key1p OR key2p OR key5p OR key6p) AND // this part is currently
+ (key3p OR key4p OR key5p OR key6p) AND // produced
+
+ (key1p OR key2p OR key7p OR key8p) AND // this part will be added
+ (key3p OR key4p OR key7p OR key8p) //.
+
+In order to limit the impact of this combinatorial explosion, we will
+introduce a rule that we won't generate more than #defined
+MAX_IMERGE_OPTS options.
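The capped cross-product from the note above can be sketched as follows. This is an illustration under assumptions, not the planned implementation: an index_merge option is modelled as a set of predicate names (an OR-list), a list of options as an AND of such OR-lists, imerge_list_or() is a hypothetical name, and the value chosen for MAX_IMERGE_OPTS is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> Imerge;     /* one OR-list of key predicates */
typedef std::vector<Imerge> ImergeList;   /* AND of OR-lists */

const size_t MAX_IMERGE_OPTS= 10;         /* the value is an assumption */

/*
  (a1 AND a2 AND ...) OR (b1 AND b2 AND ...) distributes into the AND of
  every pairwise (ai OR bj). Stop once MAX_IMERGE_OPTS options exist.
*/
ImergeList imerge_list_or(const ImergeList &a, const ImergeList &b)
{
  ImergeList result;
  for (size_t i= 0; i < a.size(); i++)
    for (size_t j= 0; j < b.size(); j++)
    {
      if (result.size() >= MAX_IMERGE_OPTS)
        return result;                    /* cap the combinatorial growth */
      Imerge merged= a[i];
      merged.insert(b[j].begin(), b[j].end());
      result.push_back(merged);
    }
  return result;
}
```

Stopping early only drops conjuncts from the AND-list, so the remaining options are still implied by the original condition; the cost of the cap is some lost plans, not incorrect ones.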
-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612 2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612 2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+ # There are sel_trees, a sel_tree is either range or merge tree
+ sel_tree = range_tree | imerge_tree
+
+ # a range tree has range access options, possibly for several keys
+ range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+ # merge tree represents several way to index_merge
+ imerge_tree = imerge1 AND imerge2 AND ...
+
+ # a way to do index merge == a set to use of different indexes.
+ imergeX = range_tree1 OR range_tree2 OR ..
+ where no pair of range_treeX have ranges over the same index.
+
+
+ tree_and(A, B)
+ {
+ if (both A and B are range trees)
+ return a range_tree with computed intersection for each range;
+ if (only one of A and B is a range tree)
+ return that tree; // DISCARD-IMERGE-1
+ // at this point both trees are index_merge trees
+ return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+ }
+
+
+ tree_or(A, B)
+ {
+ if (A and B are range trees)
+ {
+ R = new range_tree;
+ for each index i
+ R.add(range_union(A.range(i), B.range(i)));
+
+ if (R has at least one range access)
+ return R;
+ else
+ {
+ /* could not build any range accesses. construct index_merge */
+ remove non-ranges from A; // DISCARD-IMERGE-2
+ remove non-ranges from B;
+ return new index_merge(A, B);
+ }
+ }
+ else if (A is range tree and B is index_merge tree (or vice versa))
+ {
+ Perform this transformation:
+
+ range_treeA // this is A
+ OR
+ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+ (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+ =
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+ (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+ Now each line represents an index_merge..
+ }
+ else if (both A and B are index_merge trees)
+ {
+ Perform this transformation:
+
+ imergeA1 AND imergeA2 AND ... AND imergeAN
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN
+
+ -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3
+
+ imergeA1
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+ = (combine imergeA1 with each of the imergeB{i} ) =
+
+ combine(imergeA1 OR imergeB1) AND
+ combine(imergeA1 OR imergeB2) AND
+ ... AND
+ combine(imergeA1 OR imergeBN)
+ }
+ }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+ INDEX i1(col1, col2),
+ INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+ col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+ sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+ if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+ then perform an equvalent of this operation:
+
+ rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+ concatenate them together.
+
+2.2 New tree_or()
-=-=(Guest - Sat, 13 Jun 2009, 06:29)=-=-
Category updated.
--- /tmp/wklog.24.old.25753 2009-06-13 06:29:10.000000000 +0300
+++ /tmp/wklog.24.new.25753 2009-06-13 06:29:10.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint
-=-=(Guest - Sat, 13 Jun 2009, 06:14)=-=-
Category updated.
--- /tmp/wklog.24.old.24991 2009-06-13 06:14:03.000000000 +0300
+++ /tmp/wklog.24.new.24991 2009-06-13 06:14:03.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog
-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches are should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=24&nolimit=1
DESCRIPTION:
Current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses of the form
  t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
- discard union(key2,key3) in favor of range(key1)
- consider the cost of using range(key1) and discard that plan also
The overall effect is that a possibly poor range access causes a possibly good
index_merge access not to be considered.
This WL is about lifting this limitation, at least for some subset of WHERE
clauses.
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to make many changes in the range analyzer, but will be
able to keep a potential index_merge plan long enough that it can be taken
into consideration together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)
where
  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)
  range_cond(keyX) := a condition that allows range access on keyX to be
                      constructed and doesn't allow range/index_merge
                      accesses to be constructed for any other keys of the
                      table in question.
For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE(            (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )
which can be used to make a cost-based choice between range and index_merge.
Limitations
-----------
This will not be a full solution, in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another form
(e.g. brackets were opened).
TODO
----
* is it a problem if there are keys that are referred to both from
index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches should be in
  (google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
LOW-LEVEL DESIGN:
<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
3. Testing and required coverage
</contents>
1. Current implementation overview
==================================
At the moment, the range analyzer works as follows:

The SEL_TREE structure represents the following:

  # There are sel_trees, a sel_tree is either a range tree or a merge tree
  sel_tree = range_tree | imerge_tree

  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);

  # a merge tree represents several ways to do index_merge
  imerge_tree = imerge1 AND imerge2 AND ...

  # a way to do index merge == a set of different indexes to use.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.
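
The grammar above can be mirrored in a small data model. This is only an
illustrative sketch, not the server's actual SEL_TREE/SEL_IMERGE classes: the
class names, the use of plain strings for range conditions, and the is_valid()
helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class RangeTree:
    # key name -> range condition (kept as text here);
    # stands for range(key1) AND range(key2) AND ... AND range(keyN)
    ranges: dict = field(default_factory=dict)


@dataclass
class IMerge:
    # one way to do index_merge: range_tree1 OR range_tree2 OR ...
    alternatives: list

    def is_valid(self) -> bool:
        # "no pair of range_treeX have ranges over the same index"
        seen = set()
        for tree in self.alternatives:
            keys = set(tree.ranges)
            if seen & keys:
                return False
            seen |= keys
        return True


@dataclass
class IMergeTree:
    # imerge1 AND imerge2 AND ...
    imerges: list


# sel_tree = range_tree | imerge_tree
SelTree = Union[RangeTree, IMergeTree]
```

For instance, an IMerge over key1 and key2 is valid, while one whose two
alternatives both scan key1 is not.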

  tree_and(A, B)
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree;  // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }
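
A runnable sketch of the tree_and() pseudocode above may help to see where the
index_merge option is lost. The representation is an assumption made for the
example: a tree is a ("range", {key: (lo, hi)}) or ("imerge", [imerge, ...])
tuple, and intervals are closed (lo, hi) pairs; the real code operates on
SEL_ARG graphs.

```python
def intersect(a, b):
    """Intersect two closed intervals; None stands for an impossible range."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def tree_and_current(a, b):
    kind_a, kind_b = a[0], b[0]
    if kind_a == "range" and kind_b == "range":
        # intersect the ranges key by key
        out = dict(a[1])
        for key, rng in b[1].items():
            out[key] = intersect(out[key], rng) if key in out else rng
        return ("range", out)
    if kind_a == "range":
        return a  # DISCARD-IMERGE-1: the index_merge tree b is dropped
    if kind_b == "range":
        return b  # DISCARD-IMERGE-1: the index_merge tree a is dropped
    # both trees are index_merge trees: concatenate their imerge lists
    return ("imerge", a[1] + b[1])
```

With (t.key1=c1 OR t.key2=c2) AND t.badkey < c3, the second and third branches
are the ones that return only the range tree.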

  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));

      if (R has at least one range access)
        return R;
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A;                // DISCARD-IMERGE-2
        remove non-ranges from B;
        return new index_merge(A, B);
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation:

        range_treeA    // this is A
        OR
        (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
        (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
        ...
        (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
        =
        (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
        (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
        ...
        (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)

      Now each line represents an index_merge.
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:

        imergeA1 AND imergeA2 AND ... AND imergeAN
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN

        -> (discard all imergeA{i=2,3,...}) ->   // DISCARD-IMERGE-3

        imergeA1
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN =

        = (combine imergeA1 with each of the imergeB{i}) =

        combine(imergeA1 OR imergeB1) AND
        combine(imergeA1 OR imergeB2) AND
        ... AND
        combine(imergeA1 OR imergeBN)
    }
  }
1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:
DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:
(t.key1=c1 OR t.key2=c2) AND t.badkey < c3
DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (the conditions on t.badkey may have arbitrary form):
(t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:
INDEX i1(col1, col2),
INDEX i2(col1, col3)
and this WHERE clause:
col1=c1 AND (col2=c2 OR col3=c3)
The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
2. New implementation
=====================
<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>
The SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,

  sel_tree2 = range_tree AND imerge_tree

where both parts are optional (i.e. can be empty).
Operations on SEL_ARG trees will be modified to produce/process trees of
this kind:
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
A1. Don't remove the index_merge part of the tree (this will take care of
    the DISCARD-IMERGE-1 problem).

A2. Push range conditions down into index_merge trees that may support them.
    If one tree has range(key1) and the other tree has imerge(key1 OR key2),
    then perform an equivalent of this operation:

      rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =

      (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))

A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
    concatenate them together.
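
Rules A1-A3 can be sketched as follows. This is a hedged illustration under
assumed data shapes, not the planned implementation: a sel_tree2 is a dict with
"ranges" ({key: (lo, hi)}) and "imerges" (a list of imerges, each a list of
{key: (lo, hi)} alternatives). An alternative over a different key is left
untouched by A2, because a range on key1 cannot narrow a scan of key2.

```python
def intersect(a, b):
    """Intersect two closed intervals; None stands for an impossible range."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def tree_and_new(a, b):
    # range part: intersect ranges key by key, as before
    ranges = dict(a["ranges"])
    for key, rng in b["ranges"].items():
        ranges[key] = intersect(ranges[key], rng) if key in ranges else rng

    # A1 + A3: keep the index_merge parts and concatenate them
    imerges = a["imerges"] + b["imerges"]

    # A2: push each range condition into alternatives over the same key
    pushed = [
        [
            {k: intersect(r, ranges[k]) if k in ranges else r
             for k, r in alt.items()}
            for alt in imerge
        ]
        for imerge in imerges
    ]
    return {"ranges": ranges, "imerges": pushed}
```

For example, AND-ing a tree holding range(key1) = (0, 10) with a tree holding
imerge(key1 in (5, 20) OR key2 = 7) keeps both parts and narrows the key1
alternative to (5, 10).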
2.2 New tree_or()
-----------------
O1. Don't remove non-range plans:
  Current tree_or() code will refuse to produce index_merge plans for
  conditions like

    "t.key1part2=const OR t.key2part1=const"

  (this is marked as DISCARD-IMERGE-3). This was justified because the left part
  of the AND condition is not usable for range access, and the operation of
  tree_and() guaranteed that there was no way it could be changed to make a
  usable range plan. With the new tree_and() and rule A2, this is no longer the
  case. For example, for this query:

    (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const

  it will construct

    imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)

  then tree_and() will apply rule A2 to push the range down into the index
  merge, and after that we'll have:

    range(t.key1part1=const)
    imerge(
      t.key1part2=const AND t.key1part1=const,
      t.key2part1=const
    )

  Note that imerge(...) describes a usable index_merge plan and it's possible
  that it will be the best access path.
O2. "Create index_merge accesses when possible"
  Current tree_or() will not create index_merge access when it could create
  non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
  in the current implementation" section). This will be changed to work as
  follows: we will create an index_merge from the index scans that didn't have
  a match in the other sel_tree.
  Illustrating it with an example:

     key  | sel_tree_A | sel_tree_B | A or B | include in index_merge?
    ------+------------+------------+--------+------------------------
     key1 | cond1      | cond2      | condM  | no
     key2 | cond3      | cond4      | NULL   | no
     key3 | cond5      |            |        | yes, A-side
     key4 | cond6      |            |        | yes, A-side
     key5 |            | cond7      |        | yes, B-side
     key6 |            | cond8      |        | yes, B-side

  Here we assume that
  - (cond1 OR cond2) did produce a combined range, so they are not included in
    the index_merge.
  - (cond3 OR cond4) didn't produce a usable range (e.g. they were
    t.key1part1=c1 and t.key1part2=c1, respectively, and combining them
    didn't yield any range list)
  - All other scans didn't have counterparts, so we'll end up with a
    SEL_TREE of:

      range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
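
The table above can be reproduced with a short sketch. The function name, the
dict-based trees, and the range_union callback (returning None when two
conditions do not combine into a usable range) are assumptions for
illustration. Building the index_merge only when both sides have unmatched
scans is also an assumption of this sketch: a union needs a scan covering each
side of the OR.

```python
def tree_or_o2(a, b, range_union):
    """a, b: {key: condition} for sel_tree_A and sel_tree_B."""
    # keys present on both sides: keep only usable combined ranges
    merged = {}
    for key in a.keys() & b.keys():
        combined = range_union(a[key], b[key])
        if combined is not None:
            merged[key] = combined

    # scans without a counterpart in the other sel_tree
    a_only = {k: c for k, c in a.items() if k not in b}
    b_only = {k: c for k, c in b.items() if k not in a}
    imerge = [a_only, b_only] if a_only and b_only else []
    return merged, imerge


# Hypothetical range_union that only knows how to combine cond1 and cond2.
def demo_union(x, y):
    return "condM" if {x, y} == {"cond1", "cond2"} else None


tree_a = {"key1": "cond1", "key2": "cond3", "key3": "cond5", "key4": "cond6"}
tree_b = {"key1": "cond2", "key2": "cond4", "key5": "cond7", "key6": "cond8"}
ranges, imerge = tree_or_o2(tree_a, tree_b, demo_union)
```

Here ranges holds only key1 with condM, and imerge holds the A-side scans
(key3, key4) and the B-side scans (key5, key6), matching
range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8)).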
O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven't
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
lift it and produce a cross-product:

  ((key1p OR key2p) AND (key3p OR key4p))
  OR
  ((key5p OR key6p) AND (key7p OR key8p))

  = (key1p OR key2p OR key5p OR key6p) AND  // this part is currently
    (key3p OR key4p OR key5p OR key6p) AND  // produced

    (key1p OR key2p OR key7p OR key8p) AND  // this part will be added
    (key3p OR key4p OR key7p OR key8p)      //.

In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.
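
A minimal sketch of such a capped cross-product, assuming each imerge is a
Python set of per-key predicates and each tree is a list of imerges (an AND of
ORs). Distributing OR over AND gives one conjunct (imergeA_i OR imergeB_j) per
pair. The value of MAX_IMERGE_OPTS below is hypothetical, since the text does
not fix it. Dropping conjuncts beyond the cap only weakens the condition, so
the remaining plans still read a superset of the matching rows.

```python
from itertools import islice, product

MAX_IMERGE_OPTS = 3  # hypothetical value; the worklog does not specify one


def or_imerge_trees(a, b, limit=MAX_IMERGE_OPTS):
    """OR two imerge trees, producing at most `limit` combined imerges."""
    pairs = product(a, b)  # lazily enumerates (imergeA_i, imergeB_j)
    return [ai | bj for ai, bj in islice(pairs, limit)]


tree_a = [{"key1p", "key2p"}, {"key3p", "key4p"}]
tree_b = [{"key5p", "key6p"}, {"key7p", "key8p"}]
full = or_imerge_trees(tree_a, tree_b, limit=4)
capped = or_imerge_trees(tree_a, tree_b)
```

With limit=4 all four conjuncts of the expansion are generated; with the
default cap only the first three are kept.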
3. Testing and required coverage
================================
So far we could find the following user cases:
* BUG#17259: Query optimizer chooses wrong index
* BUG#17673: Optimizer does not use Index Merge optimization in some cases
* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
* BUG#30151: optimizer is very reluctant to chose index_merge algorithm
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 20 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sat, 20 Jun 2009, 09:34)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.21663 2009-06-20 09:34:48.000000000 +0300
+++ /tmp/wklog.24.new.21663 2009-06-20 09:34:48.000000000 +0300
@@ -4,6 +4,7 @@
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
+3. Testing and required coverage
</contents>
1. Current implementation overview
@@ -240,3 +241,14 @@
In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.
+
+3. Testing and required coverage
+================================
+So far could find the following user cases:
+
+* BUG#17259: Query optimizer chooses wrong index
+* BUG#17673: Optimizer does not use Index Merge optimization in some cases
+* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
+* BUG#30151: optimizer is very reluctant to chose index_merge algorithm
+
+
-=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.19152 2009-06-18 16:55:00.000000000 +0300
+++ /tmp/wklog.24.new.19152 2009-06-18 16:55:00.000000000 +0300
@@ -141,13 +141,15 @@
Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:
+
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
-1. Don't remove index_merge part of the tree.
+A1. Don't remove index_merge part of the tree (this will take care of
+ DISCARD-IMERGE-1 problem)
-2. Push range conditions down into index_merge trees that may support them.
+A2. Push range conditions down into index_merge trees that may support them.
if one tree has range(key1) and the other tree has imerge(key1 OR key2)
then perform an equvalent of this operation:
@@ -155,8 +157,86 @@
(rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
-3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
concatenate them together.
-2.2 New tree_or()
+2.2 New tree_or()
+-----------------
+O1. Dont remove non-range plans:
+ Current tree_or() code will refuse to produce index_merge plans for
+ conditions like
+
+ "t.key1part2=const OR t.key2part1=const"
+
+ (this is marked as DISCARD-IMERGE-3). This was justifed as the left part of
+ the AND condition is not usable for range access, and the operation of
+ tree_and() guaranteed that there was no way it could changed to make a
+ usable range plan. With new tree_and() and rule A2, this is no longer the
+ case. For example for this query:
+
+ (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
+
+ it will construct a
+
+ imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
+
+ then tree_and() will apply rule A2 to push the range down into index merge
+ and after that we'll have:
+
+ range(t.key1part1=const)
+ imerge(
+ t.key1part2=const AND t.key1part1=const,
+ t.key2part1=const
+ )
+ note that imerge(...) describes a usable index_merge plan and it's possible
+ that it will be the best access path.
+
+O2. "Create index_merge accesses when possible"
+ Current tree_or() will not create index_merge access when it could create
+ non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+ in the current implementation" section). This will be changed to work as
+ follows: we will create index_merge made for index scans that didn't have
+ their match in the other sel_tree.
+ Ilustrating it with an example:
+
+ | sel_tree_A | sel_tree_B | A or B | include in index_merge?
+ ------+------------+------------+--------+------------------------
+ key1 | cond1 | cond2 | condM | no
+ key2 | cond3 | cond4 | NULL | no
+ key3 | cond5 | | | yes, A-side
+ key4 | cond6 | | | yes, A-side
+ key5 | | cond7 | | yes, B-side
+ key6 | | cond8 | | yes, B-side
+
+ here we assume that
+ - (cond1 OR cond2) did produce a combined range. Not including them in
+ index_merge.
+ - (cond3 OR cond4) didn't produce a usable range (e.g. they were
+ t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
+ didn't yield any range list)
+ - All other scand didn't have their counterparts, so we'll end up with a
+ SEL_TREE of:
+
+ range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
+ .
+
+O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
+that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven
+seen any complaints that could be attributed to it.
+If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
+lift it ,and produce a cross-product:
+
+ ((key1p OR key2p) AND (key3p OR key4p))
+ OR
+ ((key5p OR key6p) AND (key7p OR key8p))
+
+ = (key1p OR key2p OR key5p OR key6p) AND // this part is currently
+ (key3p OR key4p OR key5p OR key6p) AND // produced
+
+ (key1p OR key2p OR key5p OR key6p) AND // this part will be added
+ (key3p OR key4p OR key5p OR key6p) //.
+
+In order to limit the impact of this combinatorial explosion, we will
+introduce a rule that we won't generate more than #defined
+MAX_IMERGE_OPTS options.
-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612 2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612 2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+ # There are sel_trees, a sel_tree is either range or merge tree
+ sel_tree = range_tree | imerge_tree
+
+ # a range tree has range access options, possibly for several keys
+ range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+ # merge tree represents several way to index_merge
+ imerge_tree = imerge1 AND imerge2 AND ...
+
+ # a way to do index merge == a set to use of different indexes.
+ imergeX = range_tree1 OR range_tree2 OR ..
+ where no pair of range_treeX have ranges over the same index.
+
+
+ tree_and(A, B)
+ {
+ if (both A and B are range trees)
+ return a range_tree with computed intersection for each range;
+ if (only one of A and B is a range tree)
+ return that tree; // DISCARD-IMERGE-1
+ // at this point both trees are index_merge trees
+ return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+ }
+
+
+ tree_or(A, B)
+ {
+ if (A and B are range trees)
+ {
+ R = new range_tree;
+ for each index i
+ R.add(range_union(A.range(i), B.range(i)));
+
+ if (R has at least one range access)
+ return R;
+ else
+ {
+ /* could not build any range accesses. construct index_merge */
+ remove non-ranges from A; // DISCARD-IMERGE-2
+ remove non-ranges from B;
+ return new index_merge(A, B);
+ }
+ }
+ else if (A is range tree and B is index_merge tree (or vice versa))
+ {
+ Perform this transformation:
+
+ range_treeA // this is A
+ OR
+ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+ (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+ =
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+ (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+ Now each line represents an index_merge..
+ }
+ else if (both A and B are index_merge trees)
+ {
+ Perform this transformation:
+
+ imergeA1 AND imergeA2 AND ... AND imergeAN
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN
+
+ -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3
+
+ imergeA1
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+ = (combine imergeA1 with each of the imergeB{i} ) =
+
+ combine(imergeA1 OR imergeB1) AND
+ combine(imergeA1 OR imergeB2) AND
+ ... AND
+ combine(imergeA1 OR imergeBN)
+ }
+ }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+ INDEX i1(col1, col2),
+ INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+ col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+ sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+ if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+ then perform an equvalent of this operation:
+
+ rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+ concatenate them together.
+
+2.2 New tree_or()
-=-=(Guest - Sat, 13 Jun 2009, 06:29)=-=-
Category updated.
--- /tmp/wklog.24.old.25753 2009-06-13 06:29:10.000000000 +0300
+++ /tmp/wklog.24.new.25753 2009-06-13 06:29:10.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint
-=-=(Guest - Sat, 13 Jun 2009, 06:14)=-=-
Category updated.
--- /tmp/wklog.24.old.24991 2009-06-13 06:14:03.000000000 +0300
+++ /tmp/wklog.24.new.24991 2009-06-13 06:14:03.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog
-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches are should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=24&nolimit=1
DESCRIPTION:
Current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses of the form
  t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
- discard union(key2,key3) in favor of range(key1)
- consider the cost of using range(key1) and discard that plan also
The overall effect is that a possibly poor range access causes a possibly good
index_merge access not to be considered.
This WL is about lifting this limitation, at least for some subset of WHERE
clauses.
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to make many changes in the range analyzer, but will be
able to keep a potential index_merge plan long enough that it can be taken
into consideration together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)
where
  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)
  range_cond(keyX) := a condition that allows range access on keyX to be
                      constructed and doesn't allow range/index_merge
                      accesses to be constructed for any other keys of the
                      table in question.
For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE(            (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )
which can be used to make a cost-based choice between range and index_merge.
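As a rough illustration of that cost-based choice, here is a minimal Python sketch. It is not the server's code: the `SelTree` class, the `best_plan` helper and the row estimates are all hypothetical, and the cost model (a union costs the sum of its branches) is a deliberate simplification.

```python
from dataclasses import dataclass, field

@dataclass
class SelTree:
    # key name -> estimated rows read by a range scan on that key (made-up numbers)
    ranges: dict = field(default_factory=dict)
    # each index_merge candidate is a list of (key, estimated_rows) branches
    imerges: list = field(default_factory=list)

def best_plan(tree):
    """Return (description, estimated_rows) of the cheapest candidate, or None."""
    candidates = [(f"range({key})", rows) for key, rows in tree.ranges.items()]
    for branches in tree.imerges:
        keys = ",".join(key for key, _ in branches)
        # Simplified cost: a union has to read every one of its branches.
        candidates.append((f"union({keys})", sum(rows for _, rows in branches)))
    return min(candidates, key=lambda c: c[1], default=None)

# key1 = 'very-frequent-value' AND (key2 = 'rare-value1' OR key3 = 'rare-value2')
tree = SelTree(
    ranges={"key1": 500_000},
    imerges=[[("key2", 10), ("key3", 20)]],
)
print(best_plan(tree))
```

Because both kinds of candidates survive in the same structure, the poor range(key1) plan no longer hides the cheap union(key2,key3) plan.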
Limitations
-----------
This will not be a full solution, in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another form
(e.g. with the parentheses expanded).
TODO
----
* is it a problem if there are keys that are referred to both from
index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches should be in
(google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
LOW-LEVEL DESIGN:
<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
3. Testing and required coverage
</contents>
1. Current implementation overview
==================================
At the moment, the range analyzer works as follows:
The SEL_TREE structure represents the following:
  # There are sel_trees; a sel_tree is either a range tree or a merge tree
  sel_tree = range_tree | imerge_tree
  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  # a merge tree represents several ways to do index_merge
  imerge_tree = imerge1 AND imerge2 AND ...
  # a way to do index merge == a set of different indexes to use.
  imergeX = range_tree1 OR range_tree2 OR ..
  where no pair of range_treeX have ranges over the same index.
tree_and(A, B)
{
  if (both A and B are range trees)
    return a range_tree with computed intersection for each range;
  if (only one of A and B is a range tree)
    return that tree; // DISCARD-IMERGE-1
  // at this point both trees are index_merge trees
  return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
}
tree_or(A, B)
{
  if (A and B are range trees)
  {
    R = new range_tree;
    for each index i
      R.add(range_union(A.range(i), B.range(i)));
    if (R has at least one range access)
      return R;
    else
    {
      /* could not build any range accesses. construct index_merge */
      remove non-ranges from A; // DISCARD-IMERGE-2
      remove non-ranges from B;
      return new index_merge(A, B);
    }
  }
  else if (A is range tree and B is index_merge tree (or vice versa))
  {
    Perform this transformation:
      range_treeA // this is A
      OR
      (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
      (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
      ...
      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
      =
      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
      (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
      ...
      (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)
    Now each line represents an index_merge.
  }
  else if (both A and B are index_merge trees)
  {
    Perform this transformation:
      imergeA1 AND imergeA2 AND ... AND imergeAN
      OR
      imergeB1 AND imergeB2 AND ... AND imergeBN
      -> (discard all imergeA{i=2,3,...}) -> // DISCARD-IMERGE-3
      imergeA1
      OR
      imergeB1 AND imergeB2 AND ... AND imergeBN
      = (combine imergeA1 with each of the imergeB{i}) =
      combine(imergeA1 OR imergeB1) AND
      combine(imergeA1 OR imergeB2) AND
      ... AND
      combine(imergeA1 OR imergeBN)
  }
}
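To make the DISCARD-IMERGE-1 step concrete, here is a small Python sketch of the tree_and() logic described above. It is only a model: the tuple-based tree representation and the name `tree_and_old` are assumptions made for this illustration, and a "range" is represented symbolically as a set of predicate strings that are ANDed together (so intersecting two ranges on the same key is modelled as taking the union of their predicates).

```python
# A tree is either
#   ("range",  {key: frozenset_of_predicates})         or
#   ("imerge", [imerge, ...]), each imerge being a list of {key: predicates} dicts.

def tree_and_old(a, b):
    """Model of the current tree_and(): the index_merge side is lost
    whenever exactly one operand is a range tree."""
    kind_a, body_a = a
    kind_b, body_b = b
    if kind_a == "range" and kind_b == "range":
        merged = dict(body_a)
        for key, preds in body_b.items():
            # ANDing predicates on one key stands in for range intersection.
            merged[key] = merged.get(key, frozenset()) | preds
        return ("range", merged)
    if kind_a == "range":
        return a  # DISCARD-IMERGE-1: b's index_merge options are dropped
    if kind_b == "range":
        return b  # DISCARD-IMERGE-1: a's index_merge options are dropped
    # Both are index_merge trees: concatenate their imerge lists.
    return ("imerge", body_a + body_b)

# (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
imerge = ("imerge", [[{"key1": frozenset({"key1=c1"})},
                      {"key2": frozenset({"key2=c2"})}]])
badkey = ("range", {"badkey": frozenset({"badkey<c3"})})
result = tree_and_old(imerge, badkey)
print(result[0])
```

The result is the plain range tree on badkey; the union over key1 and key2 is gone before any cost comparison can take place.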
1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:
DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:
(t.key1=c1 OR t.key2=c2) AND t.badkey < c3
DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (the conditions on t.badkey may have arbitrary form):
(t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:
INDEX i1(col1, col2),
INDEX i2(col1, col3)
and this WHERE clause:
col1=c1 AND (col2=c2 OR col3=c3)
The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
2. New implementation
=====================
<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>
The SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,
  sel_tree2 = range_tree AND imerge_tree
where both parts are optional (i.e. can be empty).
Operations on SEL_ARG trees will be modified to produce/process trees of
this kind:
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
A1. Don't remove index_merge part of the tree (this will take care of
DISCARD-IMERGE-1 problem)
A2. Push range conditions down into index_merge trees that may support them.
If one tree has range(key1) and the other tree has imerge(key1 OR key2),
then perform the equivalent of this operation:
rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
(rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
concatenate them together.
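Rules A1-A3 can be sketched in Python as follows. This is an illustrative model under assumptions, not the planned implementation: `SelTree2`, `and_ranges`, `push_down` and `tree_and_new` are hypothetical names, and a range on a key is again modelled as a set of ANDed predicate strings. The sample input is the query used for rule O1 in section 2.2.

```python
from dataclasses import dataclass, field

@dataclass
class SelTree2:
    # key -> frozenset of predicates (ANDed); models the range part
    ranges: dict = field(default_factory=dict)
    # list of imerges (ANDed); an imerge is a list of {key: predicates} branches (ORed)
    imerges: list = field(default_factory=list)

def and_ranges(a, b):
    """AND two range parts key by key."""
    merged = dict(a)
    for key, preds in b.items():
        merged[key] = merged.get(key, frozenset()) | preds
    return merged

def push_down(imerge, ranges):
    """Rule A2: add a range condition to every branch that already uses its key."""
    return [
        {key: preds | ranges.get(key, frozenset()) for key, preds in branch.items()}
        for branch in imerge
    ]

def tree_and_new(a, b):
    return SelTree2(
        ranges=and_ranges(a.ranges, b.ranges),                  # range AND range
        imerges=[push_down(m, b.ranges) for m in a.imerges]     # A1: keep imerges
              + [push_down(m, a.ranges) for m in b.imerges],    # A3: concatenate
    )

# (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
a = SelTree2(imerges=[[{"key1": frozenset({"key1part2=const"})},
                       {"key2": frozenset({"key2part1=const"})}]])
b = SelTree2(ranges={"key1": frozenset({"key1part1=const"})})
t = tree_and_new(a, b)
print(t.ranges)
print(t.imerges)
```

In this model the key1 branch of the index_merge picks up key1part1=const, while the key2 branch is left untouched and the range on key1 is still available on its own.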
2.2 New tree_or()
-----------------
O1. Don't remove non-range plans:
Current tree_or() code will refuse to produce index_merge plans for
conditions like
"t.key1part2=const OR t.key2part1=const"
(this is marked as DISCARD-IMERGE-3). This was justified, as the left part of
the AND condition is not usable for range access, and the operation of
tree_and() guaranteed that there was no way it could be changed to make a
usable range plan. With the new tree_and() and rule A2, this is no longer the
case. For example, for this query:
(t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
it will construct a
imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
then tree_and() will apply rule A2 to push the range down into index merge
and after that we'll have:
range(t.key1part1=const)
imerge(
t.key1part2=const AND t.key1part1=const,
t.key2part1=const
)
note that imerge(...) describes a usable index_merge plan and it's possible
that it will be the best access path.
O2. "Create index_merge accesses when possible"
Current tree_or() will not create index_merge access when it could create
non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
in the current implementation" section). This will be changed to work as
follows: we will create an index_merge from the index scans that don't have
a match in the other sel_tree.
Illustrating it with an example:
      | sel_tree_A | sel_tree_B | A or B | include in index_merge?
------+------------+------------+--------+------------------------
key1  | cond1      | cond2      | condM  | no
key2  | cond3      | cond4      | NULL   | no
key3  | cond5      |            |        | yes, A-side
key4  | cond6      |            |        | yes, A-side
key5  |            | cond7      |        | yes, B-side
key6  |            | cond8      |        | yes, B-side
here we assume that
- (cond1 OR cond2) did produce a combined range. Not including them in
  index_merge.
- (cond3 OR cond4) didn't produce a usable range (e.g. they were
  t.key1part1=c1 and t.key1part2=c1, respectively, and combining them
  didn't yield any range list)
- All other scans didn't have counterparts, so we'll end up with a
  SEL_TREE of:
  range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
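The table can be replayed with a short Python sketch. The function `tree_or_ranges` and the `range_union` callback are hypothetical names for this illustration. Requiring unmatched scans on both sides before building an index_merge is an assumption of this sketch, not something the text above states.

```python
def tree_or_ranges(a, b, range_union):
    """OR two range trees given as {key: cond} dicts.

    Returns (ranges, imerge): keys present on both sides go through
    range_union(); keys present on only one side become index_merge branches.
    """
    ranges = {}
    for key in sorted(a.keys() & b.keys()):
        combined = range_union(a[key], b[key])
        if combined is not None:        # a NULL result is simply not kept
            ranges[key] = combined
    a_only = {k: v for k, v in a.items() if k not in b}
    b_only = {k: v for k, v in b.items() if k not in a}
    # Assumption: an index_merge needs an unmatched scan on each side of the OR.
    imerge = [a_only, b_only] if a_only and b_only else None
    return ranges, imerge

sel_tree_a = {"key1": "cond1", "key2": "cond3", "key3": "cond5", "key4": "cond6"}
sel_tree_b = {"key1": "cond2", "key2": "cond4", "key5": "cond7", "key6": "cond8"}

# Stand-in for the real range union: only cond1 OR cond2 yields a range (condM).
known_unions = {("cond1", "cond2"): "condM"}
ranges, imerge = tree_or_ranges(
    sel_tree_a, sel_tree_b, lambda x, y: known_unions.get((x, y))
)
print(ranges)
print(imerge)
```

This yields a range part with condM on key1 and an index_merge whose A-side branch holds cond5 and cond6 and whose B-side branch holds cond7 and cond8, matching the last column of the table.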
O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven't
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
lift it and produce a cross-product:
((key1p OR key2p) AND (key3p OR key4p))
OR
((key5p OR key6p) AND (key7p OR key8p))
= (key1p OR key2p OR key5p OR key6p) AND // this part is currently
(key3p OR key4p OR key5p OR key6p) AND // produced
(key1p OR key2p OR key7p OR key8p) AND // this part will be added
(key3p OR key4p OR key7p OR key8p) //.
In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.
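A capped cross-product of this kind could look like the Python sketch below. The function name `or_imerge_lists` and the value given to `MAX_IMERGE_OPTS` are placeholders chosen for the example; the text above does not fix a value. Each imerge is modelled as a frozenset of key names that are ORed together, and each list of imerges is ANDed.

```python
from itertools import islice, product

MAX_IMERGE_OPTS = 8  # hypothetical value, for illustration only

def or_imerge_lists(a, b, max_opts=MAX_IMERGE_OPTS):
    """OR two ANDed lists of imerges by distributing OR over AND,
    generating at most max_opts combined imerges."""
    return [x | y for x, y in islice(product(a, b), max_opts)]

a = [frozenset({"key1p", "key2p"}), frozenset({"key3p", "key4p"})]
b = [frozenset({"key5p", "key6p"}), frozenset({"key7p", "key8p"})]

full = or_imerge_lists(a, b)
capped = or_imerge_lists(a, b, max_opts=2)
print(len(full), len(capped))
```

Stopping early drops conjuncts from an AND, so each remaining imerge is still a condition implied by the original WHERE clause; the cap can lose plans but should not make a generated plan incorrect.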
3. Testing and required coverage
================================
So far we could find the following user cases:
* BUG#17259: Query optimizer chooses wrong index
* BUG#17673: Optimizer does not use Index Merge optimization in some cases
* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
* BUG#30151: optimizer is very reluctant to chose index_merge algorithm
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2702)
by knielsen@knielsen-hq.org 19 Jun '09
#At lp:maria
2702 knielsen(a)knielsen-hq.org 2009-06-09
XtraDB after-merge fixes.
Fixes to get the test suite to run without failures.
removed:
storage/xtradb/setup.sh
modified:
mysql-test/r/information_schema.result
mysql-test/r/information_schema_all_engines.result
mysql-test/r/innodb-autoinc.result
mysql-test/r/innodb-index.result
mysql-test/r/innodb-zip.result
mysql-test/r/innodb.result
mysql-test/r/innodb_bug36169.result
mysql-test/r/innodb_xtradb_bug317074.result
mysql-test/r/row-checksum-old.result
mysql-test/r/row-checksum.result
mysql-test/t/information_schema.test
mysql-test/t/innodb-analyze.test
mysql-test/t/innodb-autoinc.test
mysql-test/t/innodb-index.test
mysql-test/t/innodb-zip.test
mysql-test/t/innodb.test
mysql-test/t/innodb_bug34300.test
mysql-test/t/innodb_bug36169.test
mysql-test/t/innodb_bug36172.test
mysql-test/t/innodb_xtradb_bug317074.test
mysql-test/t/partition_innodb.test
mysys/thr_mutex.c
storage/xtradb/ibuf/ibuf0ibuf.c
storage/xtradb/include/sync0rw.h
storage/xtradb/include/sync0rw.ic
storage/xtradb/include/univ.i
storage/xtradb/srv/srv0start.c
per-file messages:
mysql-test/r/information_schema.result
Additional variables available now.
Sort output to avoid depending on engine order.
mysql-test/r/information_schema_all_engines.result
More variables now.
mysql-test/r/innodb-autoinc.result
Avoid picking up pbxt variables in result
mysql-test/r/innodb-index.result
Save state to not corrupt following testcases.
Suppress an expected warning.
mysql-test/r/innodb-zip.result
Work around a problem with dependency on zlib version
mysql-test/r/innodb.result
Checksums have changed in Maria.
Save and restore server state to not corrupt following testcases.
mysql-test/r/innodb_bug36169.result
Save and restore server state to not corrupt following testcases.
mysql-test/r/innodb_xtradb_bug317074.result
Save and restore server state to not corrupt following testcases.
mysql-test/r/row-checksum-old.result
Update result file
mysql-test/r/row-checksum.result
Update result file
mysql-test/t/information_schema.test
Sort output to avoid depending on engine order.
mysql-test/t/innodb-analyze.test
Save and restore server state to not corrupt following testcases.
mysql-test/t/innodb-autoinc.test
Save and restore server state to not corrupt following testcases.
mysql-test/t/innodb-index.test
Save state to not corrupt following testcases.
Suppress an expected warning.
mysql-test/t/innodb-zip.test
Work around a problem with dependency on zlib version
mysql-test/t/innodb.test
Save and restore server state to not corrupt following testcases.
Update --replace statements for new mysql-test-run
mysql-test/t/innodb_bug34300.test
Save and restore server state to not corrupt following testcases.
mysql-test/t/innodb_bug36169.test
Save and restore server state to not corrupt following testcases.
mysql-test/t/innodb_bug36172.test
Save and restore server state to not corrupt following testcases.
mysql-test/t/innodb_xtradb_bug317074.test
Save and restore server state to not corrupt following testcases.
mysql-test/t/partition_innodb.test
Fix regexps to work with new SHOW INNODB STATUS output.
mysys/thr_mutex.c
Initialize mutex deadlock detection lazily.
This allows testing XtraDB, which initializes a huge number of mutexes while using only a few of them.
storage/xtradb/ibuf/ibuf0ibuf.c
Fix problem where value of INNODB_IBUF_MAX_SIZE would depend on the alignment of memory
allocated by the buffer pool.
storage/xtradb/include/sync0rw.h
Fix XtraDB to compile without GCC atomic operation intrinsics (performance may suffer
when they are not available though).
storage/xtradb/include/sync0rw.ic
Fix XtraDB to compile without GCC atomic operation intrinsics (performance may suffer
when they are not available though).
storage/xtradb/include/univ.i
Fix for MariaDB
storage/xtradb/setup.sh
Remove no longer needed file from XtraDB.
storage/xtradb/srv/srv0start.c
Fix for MariaDB
=== modified file 'mysql-test/r/information_schema.result'
--- a/mysql-test/r/information_schema.result 2009-04-08 16:55:26 +0000
+++ b/mysql-test/r/information_schema.result 2009-06-09 15:08:46 +0000
@@ -42,7 +42,7 @@ WHERE table_schema IN ('mysql', 'INFORMA
table_name<>'ndb_binlog_index' AND
table_name<>'ndb_apply_status' AND
NOT (table_schema = 'INFORMATION_SCHEMA' AND table_name LIKE 'PBXT_%');
-select * from v1;
+select * from v1 ORDER BY c COLLATE utf8_bin;
c
CHARACTER_SETS
COLLATIONS
@@ -54,6 +54,17 @@ EVENTS
FILES
GLOBAL_STATUS
GLOBAL_VARIABLES
+INNODB_BUFFER_POOL_PAGES
+INNODB_BUFFER_POOL_PAGES_BLOB
+INNODB_BUFFER_POOL_PAGES_INDEX
+INNODB_CMP
+INNODB_CMPMEM
+INNODB_CMPMEM_RESET
+INNODB_CMP_RESET
+INNODB_LOCKS
+INNODB_LOCK_WAITS
+INNODB_RSEG
+INNODB_TRX
KEY_COLUMN_USAGE
PARTITIONS
PLUGINS
@@ -72,6 +83,7 @@ TABLE_PRIVILEGES
TRIGGERS
USER_PRIVILEGES
VIEWS
+XTRADB_ENHANCEMENTS
columns_priv
db
event
@@ -87,6 +99,11 @@ proc
procs_priv
servers
slow_log
+t1
+t2
+t3
+t4
+t5
tables_priv
time_zone
time_zone_leap_second
@@ -94,11 +111,6 @@ time_zone_name
time_zone_transition
time_zone_transition_type
user
-t1
-t4
-t2
-t3
-t5
v1
select c,table_name from v1
inner join information_schema.TABLES v2 on (v1.c=v2.table_name)
@@ -800,6 +812,8 @@ TABLES CREATE_TIME datetime
TABLES UPDATE_TIME datetime
TABLES CHECK_TIME datetime
TRIGGERS CREATED datetime
+INNODB_TRX trx_started datetime
+INNODB_TRX trx_wait_started datetime
event execute_at datetime
event last_executed datetime
event starts datetime
@@ -848,6 +862,7 @@ TABLES TABLE_NAME select
TABLE_CONSTRAINTS TABLE_NAME select
TABLE_PRIVILEGES TABLE_NAME select
VIEWS TABLE_NAME select
+INNODB_BUFFER_POOL_PAGES_INDEX table_name select
delete from mysql.user where user='mysqltest_4';
delete from mysql.db where user='mysqltest_4';
flush privileges;
@@ -1223,12 +1238,12 @@ DROP PROCEDURE p1;
DROP USER mysql_bug20230@localhost;
SELECT MAX(table_name) FROM information_schema.tables WHERE table_schema IN ('mysql', 'INFORMATION_SCHEMA', 'test');
MAX(table_name)
-VIEWS
+XTRADB_ENHANCEMENTS
SELECT table_name from information_schema.tables
WHERE table_name=(SELECT MAX(table_name)
FROM information_schema.tables WHERE table_schema IN ('mysql', 'INFORMATION_SCHEMA', 'test'));
table_name
-VIEWS
+XTRADB_ENHANCEMENTS
DROP TABLE IF EXISTS bug23037;
DROP FUNCTION IF EXISTS get_value;
SELECT COLUMN_NAME, MD5(COLUMN_DEFAULT), LENGTH(COLUMN_DEFAULT) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='bug23037';
=== modified file 'mysql-test/r/information_schema_all_engines.result'
--- a/mysql-test/r/information_schema_all_engines.result 2009-04-08 16:55:26 +0000
+++ b/mysql-test/r/information_schema_all_engines.result 2009-06-09 15:08:46 +0000
@@ -29,6 +29,18 @@ TABLE_PRIVILEGES
TRIGGERS
USER_PRIVILEGES
VIEWS
+INNODB_BUFFER_POOL_PAGES_INDEX
+INNODB_RSEG
+INNODB_LOCKS
+INNODB_BUFFER_POOL_PAGES
+XTRADB_ENHANCEMENTS
+INNODB_TRX
+INNODB_BUFFER_POOL_PAGES_BLOB
+INNODB_LOCK_WAITS
+INNODB_CMP_RESET
+INNODB_CMP
+INNODB_CMPMEM_RESET
+INNODB_CMPMEM
PBXT_STATISTICS
SELECT t.table_name, c1.column_name
FROM information_schema.tables t
@@ -73,6 +85,18 @@ TABLE_PRIVILEGES TABLE_SCHEMA
TRIGGERS TRIGGER_SCHEMA
USER_PRIVILEGES GRANTEE
VIEWS TABLE_SCHEMA
+INNODB_BUFFER_POOL_PAGES_INDEX schema_name
+INNODB_RSEG rseg_id
+INNODB_LOCKS lock_id
+INNODB_BUFFER_POOL_PAGES page_type
+XTRADB_ENHANCEMENTS name
+INNODB_TRX trx_id
+INNODB_BUFFER_POOL_PAGES_BLOB space_id
+INNODB_LOCK_WAITS requesting_trx_id
+INNODB_CMP_RESET page_size
+INNODB_CMP page_size
+INNODB_CMPMEM_RESET page_size
+INNODB_CMPMEM page_size
PBXT_STATISTICS ID
SELECT t.table_name, c1.column_name
FROM information_schema.tables t
@@ -117,6 +141,18 @@ TABLE_PRIVILEGES TABLE_SCHEMA
TRIGGERS TRIGGER_SCHEMA
USER_PRIVILEGES GRANTEE
VIEWS TABLE_SCHEMA
+INNODB_BUFFER_POOL_PAGES_INDEX schema_name
+INNODB_RSEG rseg_id
+INNODB_LOCKS lock_id
+INNODB_BUFFER_POOL_PAGES page_type
+XTRADB_ENHANCEMENTS name
+INNODB_TRX trx_id
+INNODB_BUFFER_POOL_PAGES_BLOB space_id
+INNODB_LOCK_WAITS requesting_trx_id
+INNODB_CMP_RESET page_size
+INNODB_CMP page_size
+INNODB_CMPMEM_RESET page_size
+INNODB_CMPMEM page_size
PBXT_STATISTICS ID
select 1 as f1 from information_schema.tables where "CHARACTER_SETS"=
(select cast(table_name as char) from information_schema.tables
@@ -149,6 +185,17 @@ EVENTS information_schema.EVENTS 1
FILES information_schema.FILES 1
GLOBAL_STATUS information_schema.GLOBAL_STATUS 1
GLOBAL_VARIABLES information_schema.GLOBAL_VARIABLES 1
+INNODB_BUFFER_POOL_PAGES information_schema.INNODB_BUFFER_POOL_PAGES 1
+INNODB_BUFFER_POOL_PAGES_BLOB information_schema.INNODB_BUFFER_POOL_PAGES_BLOB 1
+INNODB_BUFFER_POOL_PAGES_INDEX information_schema.INNODB_BUFFER_POOL_PAGES_INDEX 1
+INNODB_CMP information_schema.INNODB_CMP 1
+INNODB_CMPMEM information_schema.INNODB_CMPMEM 1
+INNODB_CMPMEM_RESET information_schema.INNODB_CMPMEM_RESET 1
+INNODB_CMP_RESET information_schema.INNODB_CMP_RESET 1
+INNODB_LOCKS information_schema.INNODB_LOCKS 1
+INNODB_LOCK_WAITS information_schema.INNODB_LOCK_WAITS 1
+INNODB_RSEG information_schema.INNODB_RSEG 1
+INNODB_TRX information_schema.INNODB_TRX 1
KEY_COLUMN_USAGE information_schema.KEY_COLUMN_USAGE 1
PARTITIONS information_schema.PARTITIONS 1
PBXT_STATISTICS information_schema.PBXT_STATISTICS 1
@@ -168,6 +215,7 @@ TABLE_PRIVILEGES information_schema.TABL
TRIGGERS information_schema.TRIGGERS 1
USER_PRIVILEGES information_schema.USER_PRIVILEGES 1
VIEWS information_schema.VIEWS 1
+XTRADB_ENHANCEMENTS information_schema.XTRADB_ENHANCEMENTS 1
Database: information_schema
+---------------------------------------+
| Tables |
@@ -200,6 +248,18 @@ Database: information_schema
| TRIGGERS |
| USER_PRIVILEGES |
| VIEWS |
+| INNODB_BUFFER_POOL_PAGES_INDEX |
+| INNODB_RSEG |
+| INNODB_LOCKS |
+| INNODB_BUFFER_POOL_PAGES |
+| XTRADB_ENHANCEMENTS |
+| INNODB_TRX |
+| INNODB_BUFFER_POOL_PAGES_BLOB |
+| INNODB_LOCK_WAITS |
+| INNODB_CMP_RESET |
+| INNODB_CMP |
+| INNODB_CMPMEM_RESET |
+| INNODB_CMPMEM |
| PBXT_STATISTICS |
+---------------------------------------+
Database: INFORMATION_SCHEMA
@@ -234,6 +294,18 @@ Database: INFORMATION_SCHEMA
| TRIGGERS |
| USER_PRIVILEGES |
| VIEWS |
+| INNODB_BUFFER_POOL_PAGES_INDEX |
+| INNODB_RSEG |
+| INNODB_LOCKS |
+| INNODB_BUFFER_POOL_PAGES |
+| XTRADB_ENHANCEMENTS |
+| INNODB_TRX |
+| INNODB_BUFFER_POOL_PAGES_BLOB |
+| INNODB_LOCK_WAITS |
+| INNODB_CMP_RESET |
+| INNODB_CMP |
+| INNODB_CMPMEM_RESET |
+| INNODB_CMPMEM |
| PBXT_STATISTICS |
+---------------------------------------+
Wildcard: inf_rmation_schema
@@ -244,5 +316,5 @@ Wildcard: inf_rmation_schema
+--------------------+
SELECT table_schema, count(*) FROM information_schema.TABLES WHERE table_schema IN ('mysql', 'INFORMATION_SCHEMA', 'test', 'mysqltest') AND table_name<>'ndb_binlog_index' AND table_name<>'ndb_apply_status' GROUP BY TABLE_SCHEMA;
table_schema count(*)
-information_schema 29
+information_schema 41
mysql 22
=== modified file 'mysql-test/r/innodb-autoinc.result'
--- a/mysql-test/r/innodb-autoinc.result 2009-06-09 13:19:13 +0000
+++ b/mysql-test/r/innodb-autoinc.result 2009-06-09 15:08:46 +0000
@@ -197,7 +197,7 @@ c1 c2
5 9
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 100
auto_increment_offset 10
@@ -230,7 +230,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -269,7 +269,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -282,7 +282,7 @@ SELECT * FROM t1;
c1
-1
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 100
auto_increment_offset 10
@@ -315,7 +315,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -330,7 +330,7 @@ SELECT * FROM t1;
c1
1
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 100
auto_increment_offset 10
@@ -370,7 +370,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -385,7 +385,7 @@ SELECT * FROM t1;
c1
1
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 100
auto_increment_offset 10
@@ -419,7 +419,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -434,7 +434,7 @@ c1
1
9223372036854775794
SET @@SESSION.AUTO_INCREMENT_INCREMENT=2, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 2
auto_increment_offset 10
@@ -452,7 +452,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -467,7 +467,7 @@ c1
1
18446744073709551603
SET @@SESSION.AUTO_INCREMENT_INCREMENT=2, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 2
auto_increment_offset 10
@@ -485,7 +485,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -500,7 +500,7 @@ c1
1
18446744073709551603
SET @@SESSION.AUTO_INCREMENT_INCREMENT=5, @@SESSION.AUTO_INCREMENT_OFFSET=7;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 5
auto_increment_offset 7
@@ -514,7 +514,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -533,7 +533,7 @@ c1
-9223372036854775806
1
SET @@SESSION.AUTO_INCREMENT_INCREMENT=3, @@SESSION.AUTO_INCREMENT_OFFSET=3;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 3
auto_increment_offset 3
@@ -550,7 +550,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
@@ -568,7 +568,7 @@ SET @@SESSION.AUTO_INCREMENT_INCREMENT=1
Warnings:
Warning 1292 Truncated incorrect auto_increment_increment value: '1152921504606846976'
Warning 1292 Truncated incorrect auto_increment_offset value: '1152921504606846976'
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 65535
auto_increment_offset 65535
@@ -581,7 +581,7 @@ c1
DROP TABLE t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
=== modified file 'mysql-test/r/innodb-index.result'
--- a/mysql-test/r/innodb-index.result 2009-06-09 13:19:13 +0000
+++ b/mysql-test/r/innodb-index.result 2009-06-09 15:08:46 +0000
@@ -1,3 +1,4 @@
+SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
create table t1(a int not null, b int, c char(10) not null, d varchar(20)) engine = innodb;
insert into t1 values (5,5,'oo','oo'),(4,4,'tr','tr'),(3,4,'ad','ad'),(2,3,'ak','ak');
commit;
@@ -47,6 +48,7 @@ t1 CREATE TABLE `t1` (
KEY `b` (`b`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `t1#1`(a INT PRIMARY KEY) ENGINE=InnoDB;
+call mtr.add_suppression(" table `test`\\.`t1#[12]` already exists in InnoDB internal");
alter table t1 add unique index (c), add index (d);
ERROR HY000: Table 'test.t1#1' already exists
rename table `t1#1` to `t1#2`;
@@ -1132,3 +1134,4 @@ t2 CREATE TABLE `t2` (
) ENGINE=InnoDB DEFAULT CHARSET=latin1
DROP TABLE t2;
DROP TABLE t1;
+SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
=== modified file 'mysql-test/r/innodb-zip.result'
--- a/mysql-test/r/innodb-zip.result 2009-06-09 13:19:13 +0000
+++ b/mysql-test/r/innodb-zip.result 2009-06-09 15:08:46 +0000
@@ -141,7 +141,7 @@ drop table t1;
CREATE TABLE t1(c TEXT, PRIMARY KEY (c(440)))
ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1 CHARSET=ASCII;
ERROR 42000: Row size too large. The maximum row size for the used table type, not counting BLOBs, is 8126. You have to change some columns to TEXT or BLOBs
-CREATE TABLE t1(c TEXT, PRIMARY KEY (c(439)))
+CREATE TABLE t1(c TEXT, PRIMARY KEY (c(438)))
ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1 CHARSET=ASCII;
INSERT INTO t1 VALUES(REPEAT('A',512)),(REPEAT('B',512));
DROP TABLE t1;
=== modified file 'mysql-test/r/innodb.result'
--- a/mysql-test/r/innodb.result 2009-06-09 13:19:13 +0000
+++ b/mysql-test/r/innodb.result 2009-06-09 15:08:46 +0000
@@ -1433,7 +1433,7 @@ insert t2 select * from t1;
insert t3 select * from t1;
checksum table t1, t2, t3, t4 quick;
Table Checksum
-test.t1 2948697075
+test.t1 3442722830
test.t2 NULL
test.t3 NULL
test.t4 NULL
@@ -1441,17 +1441,17 @@ Warnings:
Error 1146 Table 'test.t4' doesn't exist
checksum table t1, t2, t3, t4;
Table Checksum
-test.t1 2948697075
-test.t2 2948697075
-test.t3 2948697075
+test.t1 3442722830
+test.t2 3442722830
+test.t3 3442722830
test.t4 NULL
Warnings:
Error 1146 Table 'test.t4' doesn't exist
checksum table t1, t2, t3, t4 extended;
Table Checksum
-test.t1 2948697075
-test.t2 2948697075
-test.t3 2948697075
+test.t1 3442722830
+test.t2 3442722830
+test.t3 3442722830
test.t4 NULL
Warnings:
Error 1146 Table 'test.t4' doesn't exist
@@ -1781,6 +1781,7 @@ set global innodb_sync_spin_loops=20;
show variables like "innodb_sync_spin_loops";
Variable_name Value
innodb_sync_spin_loops 20
+SET @old_innodb_thread_concurrency= @@global.innodb_thread_concurrency;
show variables like "innodb_thread_concurrency";
Variable_name Value
innodb_thread_concurrency 0
@@ -1798,6 +1799,7 @@ set global innodb_thread_concurrency=16;
show variables like "innodb_thread_concurrency";
Variable_name Value
innodb_thread_concurrency 16
+SET @@global.innodb_thread_concurrency= @old_innodb_thread_concurrency;
show variables like "innodb_concurrency_tickets";
Variable_name Value
innodb_concurrency_tickets 500
=== modified file 'mysql-test/r/innodb_bug36169.result'
--- a/mysql-test/r/innodb_bug36169.result 2009-06-09 13:19:13 +0000
+++ b/mysql-test/r/innodb_bug36169.result 2009-06-09 15:08:46 +0000
@@ -1,2 +1,5 @@
+SET @save_innodb_file_format=@@global.innodb_file_format;
+SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
+SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
=== modified file 'mysql-test/r/innodb_xtradb_bug317074.result'
--- a/mysql-test/r/innodb_xtradb_bug317074.result 2009-06-09 13:19:13 +0000
+++ b/mysql-test/r/innodb_xtradb_bug317074.result 2009-06-09 15:08:46 +0000
@@ -1,2 +1,5 @@
+SET @save_innodb_file_format=@@global.innodb_file_format;
+SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
+SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
=== modified file 'mysql-test/r/row-checksum-old.result'
--- a/mysql-test/r/row-checksum-old.result 2008-06-28 12:45:15 +0000
+++ b/mysql-test/r/row-checksum-old.result 2009-06-09 15:08:46 +0000
@@ -72,6 +72,8 @@ Table Checksum
test.t1 4108368782
drop table if exists t1;
create table t1 (a int null, v varchar(100)) engine=innodb checksum=0 row_format=fixed;
+Warnings:
+Warning 1478 InnoDB: assuming ROW_FORMAT=COMPACT.
insert into t1 values(null, null), (1, "hello");
checksum table t1;
Table Checksum
=== modified file 'mysql-test/r/row-checksum.result'
--- a/mysql-test/r/row-checksum.result 2008-06-28 12:45:15 +0000
+++ b/mysql-test/r/row-checksum.result 2009-06-09 15:08:46 +0000
@@ -72,6 +72,8 @@ Table Checksum
test.t1 3885665021
drop table if exists t1;
create table t1 (a int null, v varchar(100)) engine=innodb checksum=0 row_format=fixed;
+Warnings:
+Warning 1478 InnoDB: assuming ROW_FORMAT=COMPACT.
insert into t1 values(null, null), (1, "hello");
checksum table t1;
Table Checksum
=== modified file 'mysql-test/t/information_schema.test'
--- a/mysql-test/t/information_schema.test 2009-04-08 16:55:26 +0000
+++ b/mysql-test/t/information_schema.test 2009-06-09 15:08:46 +0000
@@ -43,7 +43,7 @@ create view v1 (c) as
table_name<>'ndb_binlog_index' AND
table_name<>'ndb_apply_status' AND
NOT (table_schema = 'INFORMATION_SCHEMA' AND table_name LIKE 'PBXT_%');
-select * from v1;
+select * from v1 ORDER BY c COLLATE utf8_bin;
select c,table_name from v1
inner join information_schema.TABLES v2 on (v1.c=v2.table_name)
=== modified file 'mysql-test/t/innodb-analyze.test'
--- a/mysql-test/t/innodb-analyze.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb-analyze.test 2009-06-09 15:08:46 +0000
@@ -11,6 +11,7 @@
-- disable_result_log
-- enable_warnings
+SET @save_innodb_stats_sample_pages=@@innodb_stats_sample_pages;
SET GLOBAL innodb_stats_sample_pages=0;
# check that the value has been adjusted to 1
@@ -60,4 +61,5 @@ ANALYZE TABLE innodb_analyze;
SET GLOBAL innodb_stats_sample_pages=16;
ANALYZE TABLE innodb_analyze;
+SET GLOBAL innodb_stats_sample_pages=@save_innodb_stats_sample_pages;
DROP TABLE innodb_analyze;
=== modified file 'mysql-test/t/innodb-autoinc.test'
--- a/mysql-test/t/innodb-autoinc.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb-autoinc.test 2009-06-09 15:08:46 +0000
@@ -156,7 +156,7 @@ DROP TABLE t1;
#
# Test changes to AUTOINC next value calculation
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (NULL),(5),(NULL);
@@ -173,7 +173,7 @@ DROP TABLE t1;
# Reset the AUTOINC session variables
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
INSERT INTO t1 VALUES(0);
@@ -193,13 +193,13 @@ DROP TABLE t1;
# Reset the AUTOINC session variables
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
INSERT INTO t1 VALUES(-1);
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
INSERT INTO t1 VALUES (-2), (NULL),(2),(NULL);
INSERT INTO t1 VALUES (250),(NULL);
SELECT * FROM t1;
@@ -214,13 +214,13 @@ DROP TABLE t1;
# Reset the AUTOINC session variables
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT UNSIGNED AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
INSERT INTO t1 VALUES(-1);
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
INSERT INTO t1 VALUES (-2);
INSERT INTO t1 VALUES (NULL);
INSERT INTO t1 VALUES (2);
@@ -240,13 +240,13 @@ DROP TABLE t1;
# Reset the AUTOINC session variables
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT UNSIGNED AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
INSERT INTO t1 VALUES(-1);
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=100, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
INSERT INTO t1 VALUES (-2),(NULL),(2),(NULL);
INSERT INTO t1 VALUES (250),(NULL);
SELECT * FROM t1;
@@ -262,7 +262,7 @@ DROP TABLE t1;
# Check for overflow handling when increment is > 1
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 BIGINT AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
# TODO: Fix the autoinc init code
@@ -271,7 +271,7 @@ INSERT INTO t1 VALUES(NULL);
INSERT INTO t1 VALUES (9223372036854775794); #-- 2^63 - 14
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=2, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
# This should just fit
INSERT INTO t1 VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL);
SELECT * FROM t1;
@@ -281,7 +281,7 @@ DROP TABLE t1;
# Check for overflow handling when increment and offser are > 1
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 BIGINT UNSIGNED AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
# TODO: Fix the autoinc init code
@@ -290,7 +290,7 @@ INSERT INTO t1 VALUES(NULL);
INSERT INTO t1 VALUES (18446744073709551603); #-- 2^64 - 13
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=2, @@SESSION.AUTO_INCREMENT_OFFSET=10;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
# This should fail because of overflow but it doesn't, it seems to be
# a MySQL server bug. It wraps around to 0 for the last value.
# See MySQL Bug# 39828
@@ -313,7 +313,7 @@ DROP TABLE t1;
# Check for overflow handling when increment and offset are odd numbers
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 BIGINT UNSIGNED AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
# TODO: Fix the autoinc init code
@@ -322,7 +322,7 @@ INSERT INTO t1 VALUES(NULL);
INSERT INTO t1 VALUES (18446744073709551603); #-- 2^64 - 13
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=5, @@SESSION.AUTO_INCREMENT_OFFSET=7;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
# This should fail because of overflow but it doesn't. It fails with
# a duplicate entry message because of a MySQL server bug, it wraps
# around. See MySQL Bug# 39828, once MySQL fix the bug we can replace
@@ -344,7 +344,7 @@ DROP TABLE t1;
# and check for large -ve numbers
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 BIGINT AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
# TODO: Fix the autoinc init code
@@ -355,7 +355,7 @@ INSERT INTO t1 VALUES(-92233720368547758
INSERT INTO t1 VALUES(-9223372036854775808); #-- -2^63
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=3, @@SESSION.AUTO_INCREMENT_OFFSET=3;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
INSERT INTO t1 VALUES (NULL),(NULL), (NULL);
SELECT * FROM t1;
DROP TABLE t1;
@@ -364,7 +364,7 @@ DROP TABLE t1;
# large numbers 2^60
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 BIGINT UNSIGNED AUTO_INCREMENT, PRIMARY KEY(c1)) ENGINE=InnoDB;
# TODO: Fix the autoinc init code
@@ -373,7 +373,7 @@ INSERT INTO t1 VALUES(NULL);
INSERT INTO t1 VALUES (18446744073709551610); #-- 2^64 - 2
SELECT * FROM t1;
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1152921504606846976, @@SESSION.AUTO_INCREMENT_OFFSET=1152921504606846976;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
# This should fail because of overflow but it doesn't. It wraps around
# and the autoinc values look bogus too.
# See MySQL Bug# 39828, once MySQL fix the bug we can enable the error
@@ -396,7 +396,7 @@ DROP TABLE t1;
#
SET @@SESSION.AUTO_INCREMENT_INCREMENT=1, @@SESSION.AUTO_INCREMENT_OFFSET=1;
SET @@INSERT_ID=1;
-SHOW VARIABLES LIKE "%auto_inc%";
+SHOW VARIABLES LIKE "auto_inc%";
CREATE TABLE t1 (c1 DOUBLE NOT NULL AUTO_INCREMENT, c2 INT, PRIMARY KEY (c1)) ENGINE=InnoDB;
INSERT INTO t1 VALUES(NULL, 1);
INSERT INTO t1 VALUES(NULL, 2);
=== modified file 'mysql-test/t/innodb-index.test'
--- a/mysql-test/t/innodb-index.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb-index.test 2009-06-09 15:08:46 +0000
@@ -1,5 +1,7 @@
-- source include/have_innodb.inc
+SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
+
create table t1(a int not null, b int, c char(10) not null, d varchar(20)) engine = innodb;
insert into t1 values (5,5,'oo','oo'),(4,4,'tr','tr'),(3,4,'ad','ad'),(2,3,'ak','ak');
commit;
@@ -20,6 +22,8 @@ show create table t1;
# Check how existing tables interfere with temporary tables.
CREATE TABLE `t1#1`(a INT PRIMARY KEY) ENGINE=InnoDB;
+call mtr.add_suppression(" table `test`\\.`t1#[12]` already exists in InnoDB internal");
+
--error 156
alter table t1 add unique index (c), add index (d);
rename table `t1#1` to `t1#2`;
@@ -509,3 +513,4 @@ SHOW CREATE TABLE t2;
DROP TABLE t2;
DROP TABLE t1;
+SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
=== modified file 'mysql-test/t/innodb-zip.test'
--- a/mysql-test/t/innodb-zip.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb-zip.test 2009-06-09 15:08:46 +0000
@@ -105,7 +105,11 @@ drop table t1;
--error ER_TOO_BIG_ROWSIZE
CREATE TABLE t1(c TEXT, PRIMARY KEY (c(440)))
ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1 CHARSET=ASCII;
-CREATE TABLE t1(c TEXT, PRIMARY KEY (c(439)))
+# The maximum key size for a compressed row actually depends on the
+# version of libz used, as account must be taken for the maximum
+# compressed size of a key, and this differs between libz
+# versions. Some libz versions allow a size of 439, some only 438.
+CREATE TABLE t1(c TEXT, PRIMARY KEY (c(438)))
ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1 CHARSET=ASCII;
INSERT INTO t1 VALUES(REPEAT('A',512)),(REPEAT('B',512));
DROP TABLE t1;
=== modified file 'mysql-test/t/innodb.test'
--- a/mysql-test/t/innodb.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb.test 2009-06-09 15:08:46 +0000
@@ -1163,7 +1163,7 @@ drop table t2;
# Test error handling
# Embedded server doesn't chdir to data directory
---replace_result $MYSQLTEST_VARDIR . master-data/ ''
+--replace_result $MYSQLTEST_VARDIR . mysqld.1/data/ ''
--error ER_WRONG_FK_DEF
create table t2 (id int(11) not null, id2 int(11) not null, constraint t1_id_fk foreign key (id2,id) references t1 (id)) engine = innodb;
@@ -1318,6 +1318,7 @@ set global innodb_sync_spin_loops=20;
show variables like "innodb_sync_spin_loops";
# Test for innodb_thread_concurrency variable
+SET @old_innodb_thread_concurrency= @@global.innodb_thread_concurrency;
show variables like "innodb_thread_concurrency";
set global innodb_thread_concurrency=1001;
show variables like "innodb_thread_concurrency";
@@ -1325,6 +1326,7 @@ set global innodb_thread_concurrency=0;
show variables like "innodb_thread_concurrency";
set global innodb_thread_concurrency=16;
show variables like "innodb_thread_concurrency";
+SET @@global.innodb_thread_concurrency= @old_innodb_thread_concurrency;
# Test for innodb_concurrency_tickets variable
show variables like "innodb_concurrency_tickets";
@@ -1357,7 +1359,7 @@ source include/varchar.inc;
#
# Embedded server doesn't chdir to data directory
---replace_result $MYSQLTEST_VARDIR . master-data/ ''
+--replace_result $MYSQLTEST_VARDIR . mysqld.1/data/ ''
create table t1 (v varchar(65530), key(v));
drop table t1;
create table t1 (v varchar(65536));
@@ -1632,7 +1634,7 @@ disconnect b;
set foreign_key_checks=0;
create table t2 (a int primary key, b int, foreign key (b) references t1(a)) engine = innodb;
# Embedded server doesn't chdir to data directory
---replace_result $MYSQLTEST_VARDIR . master-data/ ''
+--replace_result $MYSQLTEST_VARDIR . mysqld.1/data/ ''
-- error 1005
create table t1(a char(10) primary key, b varchar(20)) engine = innodb;
set foreign_key_checks=1;
@@ -1644,7 +1646,7 @@ drop table t2;
set foreign_key_checks=0;
create table t1(a varchar(10) primary key) engine = innodb DEFAULT CHARSET=latin1;
# Embedded server doesn't chdir to data directory
---replace_result $MYSQLTEST_VARDIR . master-data/ ''
+--replace_result $MYSQLTEST_VARDIR . mysqld.1/data/ ''
-- error 1005
create table t2 (a varchar(10), foreign key (a) references t1(a)) engine = innodb DEFAULT CHARSET=utf8;
set foreign_key_checks=1;
@@ -1675,7 +1677,7 @@ set foreign_key_checks=0;
create table t2 (a varchar(10), foreign key (a) references t1(a)) engine = innodb DEFAULT CHARSET=latin1;
create table t3(a varchar(10) primary key) engine = innodb DEFAULT CHARSET=utf8;
# Embedded server doesn't chdir to data directory
---replace_result $MYSQLTEST_VARDIR . master-data/ ''
+--replace_result $MYSQLTEST_VARDIR . mysqld.1/data/ ''
-- error 1025
rename table t3 to t1;
set foreign_key_checks=1;
@@ -2315,7 +2317,7 @@ ALTER TABLE t2 ADD FOREIGN KEY (a) REFER
# mysqltest first does replace_regex, then replace_result
--replace_regex /'[^']*test\/#sql-[0-9a-f_]*'/'#sql-temporary'/
# Embedded server doesn't chdir to data directory
---replace_result $MYSQLTEST_VARDIR . master-data/ ''
+--replace_result $MYSQLTEST_VARDIR . mysqld.1/data/ ''
--error 1025
ALTER TABLE t2 MODIFY a INT NOT NULL;
DELETE FROM t1;
=== modified file 'mysql-test/t/innodb_bug34300.test'
--- a/mysql-test/t/innodb_bug34300.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb_bug34300.test 2009-06-09 15:08:46 +0000
@@ -9,6 +9,7 @@
-- disable_result_log
# set packet size and reconnect
+SET @save_max_allowed_packet=@@global.max_allowed_packet;
SET @@global.max_allowed_packet=16777216;
--connect (newconn, localhost, root,,)
@@ -30,3 +31,6 @@ ALTER TABLE bug34300 ADD COLUMN (f10 INT
SELECT f4, f8 FROM bug34300;
DROP TABLE bug34300;
+disconnect newconn;
+connection default;
+SET @@global.max_allowed_packet=@save_max_allowed_packet;
=== modified file 'mysql-test/t/innodb_bug36169.test'
--- a/mysql-test/t/innodb_bug36169.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb_bug36169.test 2009-06-09 15:08:46 +0000
@@ -5,6 +5,9 @@
-- source include/have_innodb.inc
+SET @save_innodb_file_format=@@global.innodb_file_format;
+SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
+SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
@@ -1145,6 +1148,10 @@ KEY `idx44` (`col176`(100),`col42`,`col7
KEY `idx45` (`col2`(27),`col27`(116))
)engine=innodb ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
+SET GLOBAL innodb_file_format=@save_innodb_file_format;
+SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
+SET GLOBAL innodb_file_per_table=@save_innodb_file_per_table;
+
DROP TABLE IF EXISTS table0;
DROP TABLE IF EXISTS table1;
DROP TABLE IF EXISTS table2;
=== modified file 'mysql-test/t/innodb_bug36172.test'
--- a/mysql-test/t/innodb_bug36172.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb_bug36172.test 2009-06-09 15:08:46 +0000
@@ -14,6 +14,9 @@ SET storage_engine=InnoDB;
-- disable_query_log
-- disable_result_log
+SET @save_innodb_file_format=@@global.innodb_file_format;
+SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
+SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=on;
@@ -23,4 +26,8 @@ insert ignore into `table0` set `col23`
CHECK TABLE table0 EXTENDED;
INSERT IGNORE INTO `table0` SET `col19` = '19940127002709', `col20` = 2383927.9055146948, `col21` = 4293243420.5621204000, `col22` = '20511211123705', `col23` = 4289899778.6573381000, `col24` = 4293449279.0540481000, `col25` = 'emphysemic', `col26` = 'dentally', `col27` = '2347406', `col28` = 'eruct', `col30` = 1222, `col31` = 4294372994.9941406000, `col32` = 4291385574.1173744000, `col33` = 'borrowing\'s', `col34` = 'septics', `col35` = 'ratter\'s', `col36` = 'Kaye', `col37` = 'Florentia', `col38` = 'allium', `col39` = 'barkeep', `col40` = '19510407003441', `col41` = 4293559200.4215522000, `col42` = 22482, `col43` = 'decussate', `col44` = 'Brom\'s', `col45` = 'violated', `col46` = 4925506.4635456400, `col47` = 930549, `col48` = '51296066', `col49` = 'voluminously', `col50` = '29306676', `col51` = -88, `col52` = -2153690, `col53` = 4290250202.1464887000, `col54` = 'expropriation', `col55` = 'Aberdeen\'s', `col56` = 20343, `col58` = '19640415171532', `col59` = 'extern', `col60` = 'Ubana', `col61` = 4290487961.8539081000, `col62` = '2147', `col63` = -24271, `col64` = '20750801194548', `col65` = 'Cunaxa\'s', `col66` = 'pasticcio', `col67` = 2795817, `col68` = 'Indore\'s', `col70` = 6864127, `col71` = '1817832', `col72` = '20540506114211', `col73` = '20040101012300', `col74` = 'rationalized', `col75` = '45522', `col76` = 'indene', `col77` = -6964559, `col78` = 4247535.5266884370, `col79` = '20720416124357', `col80` = '2143', `col81` = 4292060102.4466386000, `col82` = 'striving', `col83` = 'boneblack\'s', `col84` = 'redolent', `col85` = 6489697.9009369183, `col86` = 4287473465.9731131000, `col87` = 7726015, `col88` = 'perplexed', `col89` = '17153791', `col90` = 5478587.1108127078, `col91` = 4287091404.7004304000, `col92` = 'Boulez\'s', `col93` = '2931278';
CHECK TABLE table0 EXTENDED;
+
+SET GLOBAL innodb_file_format=@save_innodb_file_format;
+SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
+SET GLOBAL innodb_file_per_table=@save_innodb_file_per_table;
DROP TABLE table0;
=== modified file 'mysql-test/t/innodb_xtradb_bug317074.test'
--- a/mysql-test/t/innodb_xtradb_bug317074.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb_xtradb_bug317074.test 2009-06-09 15:08:46 +0000
@@ -1,5 +1,8 @@
-- source include/have_innodb.inc
+SET @save_innodb_file_format=@@global.innodb_file_format;
+SET @save_innodb_file_format_check=@@global.innodb_file_format_check;
+SET @save_innodb_file_per_table=@@global.innodb_file_per_table;
SET GLOBAL innodb_file_format='Barracuda';
SET GLOBAL innodb_file_per_table=ON;
@@ -35,4 +38,8 @@ DROP PROCEDURE insert_many;
# The bug is hangup at the following statement
ALTER TABLE test1 ENGINE=MyISAM;
+SET GLOBAL innodb_file_format=@save_innodb_file_format;
+SET GLOBAL innodb_file_format_check=@save_innodb_file_format_check;
+SET GLOBAL innodb_file_per_table=@save_innodb_file_per_table;
+
DROP TABLE test1;
=== modified file 'mysql-test/t/partition_innodb.test'
--- a/mysql-test/t/partition_innodb.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/partition_innodb.test 2009-06-09 15:08:46 +0000
@@ -27,14 +27,14 @@ UPDATE t1 SET DATA = data*2 WHERE id = 3
# grouping/referencing in replace_regex is very slow on long strings,
# removing all before/after the interesting row before grouping/referencing
---replace_regex /.*---TRANSACTION [0-9A-F]+, .*, OS thread id [0-9]+// /MySQL thread id [0-9]+, query id [0-9]+ .*// /.*([0-9]+ lock struct\(s\)), heap size [0-9]+, ([0-9]+ row lock\(s\)).*/\1 \2/
+--replace_regex /.*LIST OF TRANSACTIONS FOR EACH SESSION:// /MySQL thread id [0-9]+, query id [0-9]+ .*// /.*([0-9]+ lock struct\(s\)), heap size [0-9]+, ([0-9]+ row lock\(s\)).*/\1 \2/
SHOW ENGINE InnoDB STATUS;
UPDATE t1 SET data = data*2 WHERE data = 2;
# grouping/referencing in replace_regex is very slow on long strings,
# removing all before/after the interesting row before grouping/referencing
---replace_regex /.*---TRANSACTION [0-9A-F]+, .*, OS thread id [0-9]+// /MySQL thread id [0-9]+, query id [0-9]+ .*// /.*([0-9]+ lock struct\(s\)), heap size [0-9]+, ([0-9]+ row lock\(s\)).*/\1 \2/
+--replace_regex /.*LIST OF TRANSACTIONS FOR EACH SESSION:// /MySQL thread id [0-9]+, query id [0-9]+ .*// /.*([0-9]+ lock struct\(s\)), heap size [0-9]+, ([0-9]+ row lock\(s\)).*/\1 \2/
SHOW ENGINE InnoDB STATUS;
SET @@session.tx_isolation = @old_tx_isolation;
=== modified file 'mysys/thr_mutex.c'
--- a/mysys/thr_mutex.c 2009-02-19 09:01:25 +0000
+++ b/mysys/thr_mutex.c 2009-06-09 15:08:46 +0000
@@ -149,6 +149,35 @@ static inline void remove_from_active_li
mp->prev= mp->next= 0;
}
+/*
+ We initialise the hashes for deadlock detection lazily.
+ This greatly helps with performance when lots of mutexes are initialised but
+ only a few of them are actually used (eg. XtraDB).
+*/
+static int safe_mutex_lazy_init_deadlock_detection(safe_mutex_t *mp)
+{
+ if (!my_multi_malloc(MY_FAE | MY_WME,
+ &mp->locked_mutex, sizeof(*mp->locked_mutex),
+ &mp->used_mutex, sizeof(*mp->used_mutex), NullS))
+ {
+ return 1; /* Error */
+ }
+
+ pthread_mutex_lock(&THR_LOCK_mutex);
+ mp->id= ++safe_mutex_id;
+ pthread_mutex_unlock(&THR_LOCK_mutex);
+ hash_init(mp->locked_mutex, &my_charset_bin,
+ 1000,
+ offsetof(safe_mutex_deadlock_t, id),
+ sizeof(mp->id),
+ 0, 0, HASH_UNIQUE);
+ hash_init(mp->used_mutex, &my_charset_bin,
+ 1000,
+ offsetof(safe_mutex_t, id),
+ sizeof(mp->id),
+ 0, 0, HASH_UNIQUE);
+ return 0;
+}
int safe_mutex_init(safe_mutex_t *mp,
const pthread_mutexattr_t *attr __attribute__((unused)),
@@ -167,35 +196,8 @@ int safe_mutex_init(safe_mutex_t *mp,
mp->line= line;
/* Skip the very common '&' prefix from the autogenerated name */
mp->name= name[0] == '&' ? name + 1 : name;
+ /* Deadlock detection is initialised only lazily, on first use. */
- if (safe_mutex_deadlock_detector && !( my_flags & MYF_NO_DEADLOCK_DETECTION))
- {
- if (!my_multi_malloc(MY_FAE | MY_WME,
- &mp->locked_mutex, sizeof(*mp->locked_mutex),
- &mp->used_mutex, sizeof(*mp->used_mutex), NullS))
- {
- /* Disable deadlock handling for this mutex */
- my_flags|= MYF_NO_DEADLOCK_DETECTION;
- }
- else
- {
- pthread_mutex_lock(&THR_LOCK_mutex);
- mp->id= ++safe_mutex_id;
- pthread_mutex_unlock(&THR_LOCK_mutex);
- hash_init(mp->locked_mutex, &my_charset_bin,
- 1000,
- offsetof(safe_mutex_deadlock_t, id),
- sizeof(mp->id),
- 0, 0, HASH_UNIQUE);
- hash_init(mp->used_mutex, &my_charset_bin,
- 1000,
- offsetof(safe_mutex_t, id),
- sizeof(mp->id),
- 0, 0, HASH_UNIQUE);
- }
- }
- else
- my_flags|= MYF_NO_DEADLOCK_DETECTION;
mp->create_flags= my_flags;
#ifdef SAFE_MUTEX_DETECT_DESTROY
@@ -310,7 +312,8 @@ int safe_mutex_lock(safe_mutex_t *mp, my
/* Deadlock detection */
mp->prev= mp->next= 0;
- if (!(mp->active_flags & (MYF_TRY_LOCK | MYF_NO_DEADLOCK_DETECTION)))
+ if (!(mp->active_flags & (MYF_TRY_LOCK | MYF_NO_DEADLOCK_DETECTION)) &&
+ (mp->used_mutex != NULL || !safe_mutex_lazy_init_deadlock_detection(mp)))
{
safe_mutex_t **mutex_in_use= my_thread_var_mutex_in_use();
@@ -643,7 +646,7 @@ int safe_mutex_destroy(safe_mutex_t *mp,
void safe_mutex_free_deadlock_data(safe_mutex_t *mp)
{
/* Free all entries that points to this one */
- if (!(mp->create_flags & MYF_NO_DEADLOCK_DETECTION))
+ if (!(mp->create_flags & MYF_NO_DEADLOCK_DETECTION) && mp->used_mutex != NULL)
{
pthread_mutex_lock(&THR_LOCK_mutex);
my_hash_iterate(mp->used_mutex,
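The lazy initialisation in the thr_mutex.c hunks above can be illustrated in isolation. What follows is a minimal sketch, not the mysys code: `lazy_mutex`, `lazy_tables_allocated`, and every `lazy_*` function are hypothetical names made up for the illustration, and a plain `int` array stands in for the two hashes. Only the shape of the condition in `lazy_mutex_lock` and the extra NULL check in `lazy_mutex_destroy` are taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for safe_mutex_t; only the fields needed here. */
typedef struct lazy_mutex {
  int  no_detection;  /* plays the role of MYF_NO_DEADLOCK_DETECTION */
  int *used_table;    /* stays NULL until the mutex is first locked */
} lazy_mutex;

/* Counts how many mutexes actually paid for their bookkeeping table. */
static int lazy_tables_allocated = 0;

/* Cheap init: nothing is allocated here. */
static void lazy_mutex_init(lazy_mutex *m, int no_detection)
{
  m->no_detection = no_detection;
  m->used_table = NULL;
}

/* Returns 0 on success and 1 on error, the same convention as the patch. */
static int lazy_init_detection(lazy_mutex *m)
{
  m->used_table = calloc(16, sizeof *m->used_table);
  if (m->used_table == NULL)
    return 1;
  lazy_tables_allocated++;
  return 0;
}

/* Returns 1 if deadlock bookkeeping ran for this lock call, 0 otherwise. */
static int lazy_mutex_lock(lazy_mutex *m)
{
  if (!m->no_detection &&
      (m->used_table != NULL || !lazy_init_detection(m)))
  {
    m->used_table[0]++;  /* placeholder for the real bookkeeping */
    return 1;
  }
  return 0;
}

/* The table may never have been created, so check before freeing. */
static void lazy_mutex_destroy(lazy_mutex *m)
{
  if (!m->no_detection && m->used_table != NULL)
  {
    free(m->used_table);
    m->used_table = NULL;
  }
}
```

With two mutexes initialised and only one of them ever locked, `lazy_tables_allocated` ends up at 1. That is the saving the patch comment describes for engines that create many mutexes but use few of them.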
=== modified file 'storage/xtradb/ibuf/ibuf0ibuf.c'
--- a/storage/xtradb/ibuf/ibuf0ibuf.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/ibuf/ibuf0ibuf.c 2009-06-09 15:08:46 +0000
@@ -422,7 +422,12 @@ ibuf_init_at_db_start(void)
grow in size, as the references on the upper levels of the tree can
change */
- ibuf->max_size = ut_min( buf_pool_get_curr_size() / UNIV_PAGE_SIZE
+ /* The default for ibuf_max_size is calculated from the requested
+ buffer pool size srv_buf_pool_size, not the actual size as returned
+ by buf_pool_get_curr_size(). The latter can differ from the former
+ by one page due to alignment requirements, and we do not want a
+ user-visible variable like INNODB_IBUF_MAX_SIZE to vary at random. */
+ ibuf->max_size = ut_min( srv_buf_pool_size / UNIV_PAGE_SIZE
/ IBUF_POOL_SIZE_PER_MAX_SIZE, (ulint) srv_ibuf_max_size / UNIV_PAGE_SIZE);
srv_ibuf_max_size = (long long) ibuf->max_size * UNIV_PAGE_SIZE;
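The effect of a one-page difference on the default can be checked with a little arithmetic. The sketch below only mirrors the shape of the expression in the hunk above. The page size of 16384 bytes and the divisor of 2 are assumptions made for the illustration, and `sketch_ibuf_max_pages` is a hypothetical helper, not an XtraDB function.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE    16384ULL  /* assumed value of UNIV_PAGE_SIZE */
#define SKETCH_POOL_PER_MAX 2ULL      /* assumed value of IBUF_POOL_SIZE_PER_MAX_SIZE */

/* Same shape as the patched expression: the smaller of the pool-derived
   limit and the configured limit, both expressed in pages. */
static unsigned long long
sketch_ibuf_max_pages(unsigned long long pool_bytes,
                      unsigned long long configured_max_bytes)
{
  unsigned long long from_pool =
    pool_bytes / SKETCH_PAGE_SIZE / SKETCH_POOL_PER_MAX;
  unsigned long long from_config =
    configured_max_bytes / SKETCH_PAGE_SIZE;
  return from_pool < from_config ? from_pool : from_config;
}
```

Under these assumptions a requested pool of 512 pages gives a limit of 256 pages, while a pool that came out one page smaller (511 pages) gives 255. A difference of that kind is what would make the user-visible value vary between runs.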
=== modified file 'storage/xtradb/include/sync0rw.h'
--- a/storage/xtradb/include/sync0rw.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/sync0rw.h 2009-06-09 15:08:46 +0000
@@ -357,6 +357,8 @@ rw_lock_get_x_lock_count(
rw_lock_t* lock); /* in: rw-lock */
/************************************************************************
Accessor functions for rw lock. */
+
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
UNIV_INLINE
ulint
rw_lock_get_s_waiters(
@@ -372,6 +374,14 @@ ulint
rw_lock_get_wx_waiters(
/*================*/
rw_lock_t* lock);
+#else /* !INNODB_RW_LOCKS_USE_ATOMICS */
+UNIV_INLINE
+ulint
+rw_lock_get_waiters(
+/*==================*/
+ rw_lock_t* lock);
+#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
+
UNIV_INLINE
ulint
rw_lock_get_writer(
@@ -488,6 +498,7 @@ rw_lock_debug_print(
rw_lock_debug_t* info); /* in: debug struct */
#endif /* UNIV_SYNC_DEBUG */
+/*
#ifndef INNODB_RW_LOCKS_USE_ATOMICS
#error INNODB_RW_LOCKS_USE_ATOMICS is not defined. Do you use enough new GCC or compatibles?
#error Or do you use exact options for CFLAGS?
@@ -495,6 +506,7 @@ rw_lock_debug_print(
#error e.g. (for Sparc_64): "-m64 -mcpu=v9"
#error Otherwise, this build may be slower than normal version.
#endif
+*/
/* NOTE! The structure appears here only for the compiler to know its size.
Do not use its fields directly! The structure used in the spin lock
=== modified file 'storage/xtradb/include/sync0rw.ic'
--- a/storage/xtradb/include/sync0rw.ic 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/sync0rw.ic 2009-06-09 15:08:46 +0000
@@ -68,6 +68,8 @@ rw_lock_remove_debug_info(
/************************************************************************
Accessor functions for rw lock. */
+
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
UNIV_INLINE
ulint
rw_lock_get_s_waiters(
@@ -93,23 +95,32 @@ rw_lock_get_wx_waiters(
{
return(lock->wait_ex_waiters);
}
+#else /* !INNODB_RW_LOCKS_USE_ATOMICS */
+UNIV_INLINE
+ulint
+rw_lock_get_waiters(
+/*================*/
+ /* out: 1 if waiters, 0 otherwise */
+ rw_lock_t* lock) /* in: rw-lock */
+{
+ return(lock->waiters);
+}
+#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
+
/************************************************************************
Sets lock->waiters to 1. It is not an error if lock->waiters is already
1. On platforms where ATOMIC builtins are used this function enforces a
memory barrier. */
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
UNIV_INLINE
void
rw_lock_set_s_waiter_flag(
/*====================*/
rw_lock_t* lock) /* in: rw-lock */
{
-#ifdef INNODB_RW_LOCKS_USE_ATOMICS
// os_compare_and_swap(&lock->s_waiters, 0, 1);
__sync_lock_test_and_set(&lock->s_waiters, 1);
-#else /* INNODB_RW_LOCKS_USE_ATOMICS */
- lock->s_waiters = 1;
-#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
}
UNIV_INLINE
void
@@ -117,12 +128,8 @@ rw_lock_set_x_waiter_flag(
/*====================*/
rw_lock_t* lock) /* in: rw-lock */
{
-#ifdef INNODB_RW_LOCKS_USE_ATOMICS
// os_compare_and_swap(&lock->x_waiters, 0, 1);
__sync_lock_test_and_set(&lock->x_waiters, 1);
-#else /* INNODB_RW_LOCKS_USE_ATOMICS */
- lock->x_waiters = 1;
-#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
}
UNIV_INLINE
void
@@ -130,30 +137,34 @@ rw_lock_set_wx_waiter_flag(
/*====================*/
rw_lock_t* lock) /* in: rw-lock */
{
-#ifdef INNODB_RW_LOCKS_USE_ATOMICS
// os_compare_and_swap(&lock->wait_ex_waiters, 0, 1);
__sync_lock_test_and_set(&lock->wait_ex_waiters, 1);
-#else /* INNODB_RW_LOCKS_USE_ATOMICS */
- lock->wait_ex_waiters = 1;
-#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
}
+#else /* !INNODB_RW_LOCKS_USE_ATOMICS */
+UNIV_INLINE
+void
+rw_lock_set_waiter_flag(
+/*====================*/
+ rw_lock_t* lock) /* in: rw-lock */
+{
+ lock->waiters = 1;
+}
+#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
/************************************************************************
Resets lock->waiters to 0. It is not an error if lock->waiters is already
0. On platforms where ATOMIC builtins are used this function enforces a
memory barrier. */
+#ifdef INNODB_RW_LOCKS_USE_ATOMICS
+
UNIV_INLINE
void
rw_lock_reset_s_waiter_flag(
/*======================*/
rw_lock_t* lock) /* in: rw-lock */
{
-#ifdef INNODB_RW_LOCKS_USE_ATOMICS
// os_compare_and_swap(&lock->s_waiters, 1, 0);
__sync_lock_test_and_set(&lock->s_waiters, 0);
-#else /* INNODB_RW_LOCKS_USE_ATOMICS */
- lock->s_waiters = 0;
-#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
}
UNIV_INLINE
void
@@ -161,12 +172,8 @@ rw_lock_reset_x_waiter_flag(
/*======================*/
rw_lock_t* lock) /* in: rw-lock */
{
-#ifdef INNODB_RW_LOCKS_USE_ATOMICS
// os_compare_and_swap(&lock->x_waiters, 1, 0);
__sync_lock_test_and_set(&lock->x_waiters, 0);
-#else /* INNODB_RW_LOCKS_USE_ATOMICS */
- lock->x_waiters = 0;
-#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
}
UNIV_INLINE
void
@@ -174,13 +181,19 @@ rw_lock_reset_wx_waiter_flag(
/*======================*/
rw_lock_t* lock) /* in: rw-lock */
{
-#ifdef INNODB_RW_LOCKS_USE_ATOMICS
// os_compare_and_swap(&lock->wait_ex_waiters, 1, 0);
__sync_lock_test_and_set(&lock->wait_ex_waiters, 0);
-#else /* INNODB_RW_LOCKS_USE_ATOMICS */
- lock->wait_ex_waiters = 0;
-#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
}
+#else /* !INNODB_RW_LOCKS_USE_ATOMICS */
+UNIV_INLINE
+void
+rw_lock_reset_waiter_flag(
+/*======================*/
+ rw_lock_t* lock) /* in: rw-lock */
+{
+ lock->waiters = 0;
+}
+#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
/**********************************************************************
Returns the write-status of the lock - this function made more sense
=== modified file 'storage/xtradb/include/univ.i'
--- a/storage/xtradb/include/univ.i 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/univ.i 2009-06-09 15:08:46 +0000
@@ -210,7 +210,7 @@ operations (very slow); also UNIV_DEBUG
#define UNIV_BTR_DEBUG /* check B-tree links */
#define UNIV_LIGHT_MEM_DEBUG /* light memory debugging */
-#ifdef HAVE_purify
+#ifdef HAVE_valgrind
/* The following sets all new allocated memory to zero before use:
this can be used to eliminate unnecessary Purify warnings, but note that
it also masks many bugs Purify could detect. For detailed Purify analysis it
=== removed file 'storage/xtradb/setup.sh'
--- a/storage/xtradb/setup.sh 2009-06-09 11:16:11 +0000
+++ b/storage/xtradb/setup.sh 1970-01-01 00:00:00 +0000
@@ -1,47 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 1995, 2009, Innobase Oy. All Rights Reserved.
-#
-# This program is free software; you can redistribute it and/or modify it under
-# the terms of the GNU General Public License as published by the Free Software
-# Foundation; version 2 of the License.
-#
-# This program is distributed in the hope that it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License along with
-# this program; if not, write to the Free Software Foundation, Inc., 59 Temple
-# Place, Suite 330, Boston, MA 02111-1307 USA
-#
-# Prepare the MySQL source code tree for building
-# with checked-out InnoDB Subversion directory.
-
-# This script assumes that the current directory is storage/innobase.
-
-set -eu
-
-TARGETDIR=../storage/innobase
-
-# link the build scripts
-BUILDSCRIPTS="compile-innodb compile-innodb-debug"
-for script in $BUILDSCRIPTS ; do
- ln -sf $TARGETDIR/$script ../../BUILD/
-done
-
-cd ../../mysql-test/t
-ln -sf ../$TARGETDIR/mysql-test/*.test ../$TARGETDIR/mysql-test/*.opt .
-cd ../r
-ln -sf ../$TARGETDIR/mysql-test/*.result .
-cd ../include
-ln -sf ../$TARGETDIR/mysql-test/*.inc .
-
-# Apply any patches that are needed to make the mysql-test suite successful.
-# These patches are usually needed because of deviations of behavior between
-# the stock InnoDB and the InnoDB Plugin.
-cd ../..
-for patch in storage/innobase/mysql-test/patches/*.diff ; do
- if [ "${patch}" != "storage/innobase/mysql-test/patches/*.diff" ] ; then
- patch -p0 < ${patch}
- fi
-done
=== modified file 'storage/xtradb/srv/srv0start.c'
--- a/storage/xtradb/srv/srv0start.c 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/srv/srv0start.c 2009-06-09 15:08:46 +0000
@@ -124,7 +124,7 @@ static char* srv_monitor_file_name;
/* Avoid warnings when using purify */
-#ifdef HAVE_purify
+#ifdef HAVE_valgrind
static int inno_bcmp(register const char *s1, register const char *s2,
register uint len)
{
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2714)
by knielsen@knielsen-hq.org 19 Jun '09
#At lp:maria
2714 knielsen(a)knielsen-hq.org 2009-06-18
Fix test cases after merge of XtraDB into MariaDB.
Manually merge some InnoDB changes into XtraDB.
Fix ALTER TABLE bug in XtraDB with wrong comparison of row type.
modified:
mysql-test/include/varchar.inc
mysql-test/mysql-test-run.pl
mysql-test/r/innodb.result
mysql-test/t/innodb-use-sys-malloc.test
sql/sql_table.cc
storage/xtradb/handler/ha_innodb.cc
storage/xtradb/include/pars0pars.h
storage/xtradb/include/univ.i
per-file messages:
mysql-test/include/varchar.inc
Fix the test case: which of several duplicate keys triggers the error is not
deterministic.
mysql-test/mysql-test-run.pl
InnoDB does not free resources individually during shutdown; because it uses its own
memory tracking, it can nevertheless free everything at exit. XtraDB adds an option,
on by default, that skips this extra tracking. That causes many Valgrind warnings, so
the option needs to be disabled for Valgrind testing.
mysql-test/r/innodb.result
Fix the test case: which of several duplicate keys triggers the error is not
deterministic.
mysql-test/t/innodb-use-sys-malloc.test
InnoDB does not free resources individually during shutdown; because it uses its own
memory tracking, it can nevertheless free everything at exit. XtraDB adds an option,
on by default, that skips this extra tracking. That causes many Valgrind warnings, so
the option needs to be disabled for Valgrind testing.
sql/sql_table.cc
Add some useful DBUG while debugging alter table.
storage/xtradb/handler/ha_innodb.cc
Fix check_if_incompatible_data, which did not realise that ROW_TYPE_DEFAULT is
identical to the default row format ROW_TYPE_COMPACT, causing excessive table
copying in ALTER TABLE.
Add some useful DBUG while debugging alter table.
Manually merge into XtraDB a few small changes for InnoDB from upstream MySQL.
storage/xtradb/include/pars0pars.h
Manually merge into XtraDB a few small changes for InnoDB from upstream MySQL.
storage/xtradb/include/univ.i
Manually merge a MariaDB fix in InnoDB into XtraDB.
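The row-format fix described above for ha_innodb.cc can be shown in isolation. This is only a sketch in Python, not the C++ from the patch: the enum members and the function name row_format_changed are invented for illustration, and the only rule taken from the commit message is that ROW_TYPE_DEFAULT means the same thing as ROW_TYPE_COMPACT.

```python
from enum import Enum


class RowType(Enum):
    # Hypothetical stand-ins for the server's row_type values.
    DEFAULT = 0
    REDUNDANT = 1
    COMPACT = 2


def row_format_changed(current, requested, row_format_specified):
    """Return True if ALTER TABLE has to treat the row format as changed."""
    # DEFAULT means "use the engine default", which is COMPACT here,
    # so normalise it before comparing.
    if requested is RowType.DEFAULT:
        requested = RowType.COMPACT
    return row_format_specified and current is not requested


# A COMPACT table altered with ROW_FORMAT=DEFAULT needs no table copy.
print(row_format_changed(RowType.COMPACT, RowType.DEFAULT, True))
```

Without the normalisation step the comparison would see COMPACT != DEFAULT and report incompatible data, which is the excessive table copying the commit message mentions.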
=== modified file 'mysql-test/include/varchar.inc'
--- a/mysql-test/include/varchar.inc 2008-05-14 06:50:16 +0000
+++ b/mysql-test/include/varchar.inc 2009-06-18 12:39:21 +0000
@@ -86,6 +86,8 @@ explain select count(*) from t1 where v
--replace_column 9 #
explain select count(*) from t1 where v between 'a' and 'a ' and v between 'a ' and 'b\n';
+# Which duplicate entry triggers error is not deterministic.
+--replace_regex /Duplicate entry '[^']+' for key/Duplicate entry '{ ' for key/
--error ER_DUP_ENTRY
alter table t1 add unique(v);
alter table t1 add key(v);
=== modified file 'mysql-test/mysql-test-run.pl'
--- a/mysql-test/mysql-test-run.pl 2009-06-05 15:35:22 +0000
+++ b/mysql-test/mysql-test-run.pl 2009-06-18 12:39:21 +0000
@@ -1356,6 +1356,18 @@ sub command_line_setup {
join(" ", @valgrind_args), "\"");
}
+ # InnoDB does not bother to do individual de-allocations at exit. Instead it
+ # relies on a custom allocator to track every allocation, and frees all at
+ # once during exit.
+ # In XtraDB, an option use-sys-malloc is introduced (and on by default) to
+ # disable this (for performance). But this exposes Valgrind to all the
+ # missing de-allocations, so we need to disable it to at least get
+ # meaningful leak checking for the rest of the server.
+ if ($opt_valgrind_mysqld)
+ {
+ push(@opt_extra_mysqld_opt, "--loose-skip-innodb-use-sys-malloc");
+ }
+
mtr_report("Checking supported features...");
check_ndbcluster_support(\%mysqld_variables);
=== modified file 'mysql-test/r/innodb.result'
--- a/mysql-test/r/innodb.result 2009-06-09 15:08:46 +0000
+++ b/mysql-test/r/innodb.result 2009-06-18 12:39:21 +0000
@@ -1970,7 +1970,7 @@ explain select count(*) from t1 where v
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 ref v v 13 const # Using where; Using index
alter table t1 add unique(v);
-ERROR 23000: Duplicate entry 'v' for key 'v_2'
+ERROR 23000: Duplicate entry '{ ' for key 'v_2'
alter table t1 add key(v);
select concat('*',v,'*',c,'*',t,'*') as qq from t1 where v='a';
qq
=== modified file 'mysql-test/t/innodb-use-sys-malloc.test'
--- a/mysql-test/t/innodb-use-sys-malloc.test 2009-06-09 13:19:13 +0000
+++ b/mysql-test/t/innodb-use-sys-malloc.test 2009-06-18 12:39:21 +0000
@@ -1,4 +1,7 @@
--source include/have_innodb.inc
+# XtraDB has lots of memory leak warnings at shutdown when
+# --innodb-use-sys-malloc
+--source include/not_valgrind.inc
#display current value of innodb_use_sys_malloc
SELECT @@GLOBAL.innodb_use_sys_malloc;
=== modified file 'sql/sql_table.cc'
--- a/sql/sql_table.cc 2009-06-02 09:58:27 +0000
+++ b/sql/sql_table.cc 2009-06-18 12:39:21 +0000
@@ -5333,6 +5333,7 @@ compare_tables(TABLE *table,
!table->s->mysql_version ||
(table->s->frm_version < FRM_VER_TRUE_VARCHAR && varchar))
{
+ DBUG_PRINT("info", ("Basic checks -> ALTER_TABLE_DATA_CHANGED"));
*need_copy_table= ALTER_TABLE_DATA_CHANGED;
DBUG_RETURN(0);
}
@@ -5361,6 +5362,8 @@ compare_tables(TABLE *table,
if ((tmp_new_field->flags & NOT_NULL_FLAG) !=
(uint) (field->flags & NOT_NULL_FLAG))
{
+ DBUG_PRINT("info", ("NULL behaviour difference in field '%s' -> "
+ "ALTER_TABLE_DATA_CHANGED", new_field->field_name));
*need_copy_table= ALTER_TABLE_DATA_CHANGED;
DBUG_RETURN(0);
}
@@ -5382,6 +5385,8 @@ compare_tables(TABLE *table,
/* Evaluate changes bitmap and send to check_if_incompatible_data() */
if (!(tmp= field->is_equal(tmp_new_field)))
{
+ DBUG_PRINT("info", ("!field_is_equal('%s') -> ALTER_TABLE_DATA_CHANGED",
+ new_field->field_name));
*need_copy_table= ALTER_TABLE_DATA_CHANGED;
DBUG_RETURN(0);
}
@@ -5515,16 +5520,22 @@ compare_tables(TABLE *table,
/* Check if changes are compatible with current handler without a copy */
if (table->file->check_if_incompatible_data(create_info, changes))
{
+ DBUG_PRINT("info", ("check_if_incompatible_data() -> "
+ "ALTER_TABLE_DATA_CHANGED"));
*need_copy_table= ALTER_TABLE_DATA_CHANGED;
DBUG_RETURN(0);
}
if (*index_drop_count || *index_add_count)
{
+ DBUG_PRINT("info", ("Index dropped=%u added=%u -> "
+ "ALTER_TABLE_INDEX_CHANGED",
+ *index_drop_count, *index_add_count));
*need_copy_table= ALTER_TABLE_INDEX_CHANGED;
DBUG_RETURN(0);
}
+ DBUG_PRINT("info", (" -> ALTER_TABLE_METADATA_ONLY"));
*need_copy_table= ALTER_TABLE_METADATA_ONLY; // Tables are compatible
DBUG_RETURN(0);
}
=== modified file 'storage/xtradb/handler/ha_innodb.cc'
--- a/storage/xtradb/handler/ha_innodb.cc 2009-06-11 12:53:26 +0000
+++ b/storage/xtradb/handler/ha_innodb.cc 2009-06-18 12:39:21 +0000
@@ -5018,7 +5018,6 @@ convert_search_mode_to_innobase(
case HA_READ_MBR_WITHIN:
case HA_READ_MBR_DISJOINT:
case HA_READ_MBR_EQUAL:
- my_error(ER_TABLE_CANT_HANDLE_SPKEYS, MYF(0));
return(PAGE_CUR_UNSUPP);
/* do not use "default:" in order to produce a gcc warning:
enumeration value '...' not handled in switch
@@ -6720,6 +6719,7 @@ innobase_rename_table(
int error;
char* norm_to;
char* norm_from;
+ DBUG_ENTER("innobase_rename_table");
if (lower_case_table_names) {
srv_lower_case_table_names = TRUE;
@@ -6747,6 +6747,7 @@ innobase_rename_table(
if (error != DB_SUCCESS) {
FILE* ef = dict_foreign_err_file;
+ DBUG_PRINT("info", ("rename failed: %d", error));
fputs("InnoDB: Renaming table ", ef);
ut_print_name(ef, trx, TRUE, norm_from);
fputs(" to ", ef);
@@ -6767,7 +6768,7 @@ innobase_rename_table(
my_free(norm_to, MYF(0));
my_free(norm_from, MYF(0));
- return error;
+ DBUG_RETURN(error);
}
/*************************************************************************
Renames an InnoDB table. */
@@ -6900,7 +6901,7 @@ ha_innobase::records_in_range(
mode2);
} else {
- n_rows = 0;
+ n_rows = HA_POS_ERROR;
}
mem_heap_free(heap);
@@ -7614,7 +7615,7 @@ ha_innobase::get_foreign_key_list(THD *t
f_key_info.referenced_key_name = thd_make_lex_string(
thd, f_key_info.referenced_key_name,
foreign->referenced_index->name,
- strlen(foreign->referenced_index->name), 1);
+ (uint) strlen(foreign->referenced_index->name), 1);
}
else
f_key_info.referenced_key_name= 0;
@@ -8227,7 +8228,7 @@ innodb_show_status(
bool result = FALSE;
- if (stat_print(thd, innobase_hton_name, strlen(innobase_hton_name),
+ if (stat_print(thd, innobase_hton_name, (uint) strlen(innobase_hton_name),
STRING_WITH_LEN(""), str, flen)) {
result= TRUE;
}
@@ -8258,7 +8259,7 @@ innodb_mutex_show_status(
ulint rw_lock_count_os_yield= 0;
ulonglong rw_lock_wait_time= 0;
#endif /* UNIV_DEBUG */
- uint hton_name_len= strlen(innobase_hton_name), buf1len, buf2len;
+ uint hton_name_len= (uint) strlen(innobase_hton_name), buf1len, buf2len;
DBUG_ENTER("innodb_mutex_show_status");
DBUG_ASSERT(hton == innodb_hton_ptr);
@@ -8302,9 +8303,9 @@ innodb_mutex_show_status(
rw_lock_wait_time += mutex->lspent_time;
}
#else /* UNIV_DEBUG */
- buf1len= my_snprintf(buf1, sizeof(buf1), "%s:%lu",
+ buf1len= (uint) my_snprintf(buf1, sizeof(buf1), "%s:%lu",
mutex->cfile_name, (ulong) mutex->cline);
- buf2len= my_snprintf(buf2, sizeof(buf2), "os_waits=%lu",
+ buf2len= (uint) my_snprintf(buf2, sizeof(buf2), "os_waits=%lu",
mutex->count_os_wait);
if (stat_print(thd, innobase_hton_name,
@@ -8860,7 +8861,7 @@ ha_innobase::get_error_message(int error
{
trx_t* trx = check_trx_exists(ha_thd());
- buf->copy(trx->detailed_error, strlen(trx->detailed_error),
+ buf->copy(trx->detailed_error, (uint) strlen(trx->detailed_error),
system_charset_info);
return(FALSE);
@@ -9294,31 +9295,49 @@ ha_innobase::check_if_incompatible_data(
HA_CREATE_INFO* info,
uint table_changes)
{
+ enum row_type row_type, info_row_type;
+ DBUG_ENTER("ha_innobase::check_if_incompatible_data");
+
if (table_changes != IS_EQUAL_YES) {
- return(COMPATIBLE_DATA_NO);
+ DBUG_PRINT("info", ("table_changes != IS_EQUAL_YES "
+ "-> COMPATIBLE_DATA_NO"));
+ DBUG_RETURN(COMPATIBLE_DATA_NO);
}
/* Check that auto_increment value was not changed */
if ((info->used_fields & HA_CREATE_USED_AUTO) &&
info->auto_increment_value != 0) {
- return(COMPATIBLE_DATA_NO);
+ DBUG_PRINT("info", ("auto_increment_value changed -> "
+ "COMPATIBLE_DATA_NO"));
+ DBUG_RETURN(COMPATIBLE_DATA_NO);
}
/* Check that row format didn't change */
+ row_type = get_row_type();
+ info_row_type = info->row_type;
+ /* Default is compact. */
+ if (info_row_type == ROW_TYPE_DEFAULT)
+ info_row_type = ROW_TYPE_COMPACT;
if ((info->used_fields & HA_CREATE_USED_ROW_FORMAT) &&
- get_row_type() != info->row_type) {
+ row_type != info_row_type) {
- return(COMPATIBLE_DATA_NO);
+ DBUG_PRINT("info", ("get_row_type()=%d != info->row_type=%d -> "
+ "COMPATIBLE_DATA_NO",
+ row_type, info->row_type));
+ DBUG_RETURN(COMPATIBLE_DATA_NO);
}
/* Specifying KEY_BLOCK_SIZE requests a rebuild of the table. */
if (info->used_fields & HA_CREATE_USED_KEY_BLOCK_SIZE) {
- return(COMPATIBLE_DATA_NO);
+ DBUG_PRINT("info", ("HA_CREATE_USED_KEY_BLOCK_SIZE -> "
+ "COMPATIBLE_DATA_NO"));
+ DBUG_RETURN(COMPATIBLE_DATA_NO);
}
- return(COMPATIBLE_DATA_YES);
+ DBUG_PRINT("info", (" -> COMPATIBLE_DATA_YES"));
+ DBUG_RETURN(COMPATIBLE_DATA_YES);
}
/****************************************************************
=== modified file 'storage/xtradb/include/pars0pars.h'
--- a/storage/xtradb/include/pars0pars.h 2009-03-26 06:11:11 +0000
+++ b/storage/xtradb/include/pars0pars.h 2009-06-18 12:39:21 +0000
@@ -700,7 +700,7 @@ struct for_node_struct{
definition */
que_node_t* loop_start_limit;/* initial value of loop variable */
que_node_t* loop_end_limit; /* end value of loop variable */
- int loop_end_value; /* evaluated value for the end value:
+ lint loop_end_value; /* evaluated value for the end value:
it is calculated only when the loop
is entered, and will not change within
the loop */
=== modified file 'storage/xtradb/include/univ.i'
--- a/storage/xtradb/include/univ.i 2009-06-11 12:53:26 +0000
+++ b/storage/xtradb/include/univ.i 2009-06-18 12:39:21 +0000
@@ -133,10 +133,10 @@ from Makefile.in->ut0auxconf.h */
# endif /* HAVE_ATOMIC_PTHREAD_T */
#endif /* HAVE_GCC_ATOMIC_BUILTINS */
-/* We only try to do explicit inlining of functions with gcc and
-Microsoft Visual C++ */
+/* Enable explicit inlining of functions only for compilers known to
+support it. */
-# if !defined(__GNUC__)
+# if !defined(__GNUC__) && !defined(__SUNPRO_C)
# undef UNIV_MUST_NOT_INLINE /* Remove compiler warning */
# define UNIV_MUST_NOT_INLINE
# endif
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.19152 2009-06-18 16:55:00.000000000 +0300
+++ /tmp/wklog.24.new.19152 2009-06-18 16:55:00.000000000 +0300
@@ -141,13 +141,15 @@
Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:
+
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
-1. Don't remove index_merge part of the tree.
+A1. Don't remove index_merge part of the tree (this will take care of
+ DISCARD-IMERGE-1 problem)
-2. Push range conditions down into index_merge trees that may support them.
+A2. Push range conditions down into index_merge trees that may support them.
if one tree has range(key1) and the other tree has imerge(key1 OR key2)
then perform an equvalent of this operation:
@@ -155,8 +157,86 @@
(rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
-3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
concatenate them together.
-2.2 New tree_or()
+2.2 New tree_or()
+-----------------
+O1. Dont remove non-range plans:
+ Current tree_or() code will refuse to produce index_merge plans for
+ conditions like
+
+ "t.key1part2=const OR t.key2part1=const"
+
+ (this is marked as DISCARD-IMERGE-3). This was justifed as the left part of
+ the AND condition is not usable for range access, and the operation of
+ tree_and() guaranteed that there was no way it could changed to make a
+ usable range plan. With new tree_and() and rule A2, this is no longer the
+ case. For example for this query:
+
+ (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
+
+ it will construct a
+
+ imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
+
+ then tree_and() will apply rule A2 to push the range down into index merge
+ and after that we'll have:
+
+ range(t.key1part1=const)
+ imerge(
+ t.key1part2=const AND t.key1part1=const,
+ t.key2part1=const
+ )
+ note that imerge(...) describes a usable index_merge plan and it's possible
+ that it will be the best access path.
+
+O2. "Create index_merge accesses when possible"
+ Current tree_or() will not create index_merge access when it could create
+ non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+ in the current implementation" section). This will be changed to work as
+ follows: we will create index_merge made for index scans that didn't have
+ their match in the other sel_tree.
+ Ilustrating it with an example:
+
+ | sel_tree_A | sel_tree_B | A or B | include in index_merge?
+ ------+------------+------------+--------+------------------------
+ key1 | cond1 | cond2 | condM | no
+ key2 | cond3 | cond4 | NULL | no
+ key3 | cond5 | | | yes, A-side
+ key4 | cond6 | | | yes, A-side
+ key5 | | cond7 | | yes, B-side
+ key6 | | cond8 | | yes, B-side
+
+ here we assume that
+ - (cond1 OR cond2) did produce a combined range. Not including them in
+ index_merge.
+ - (cond3 OR cond4) didn't produce a usable range (e.g. they were
+ t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
+ didn't yield any range list)
+ - All other scand didn't have their counterparts, so we'll end up with a
+ SEL_TREE of:
+
+ range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
+ .
+
+O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
+that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven
+seen any complaints that could be attributed to it.
+If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
+lift it ,and produce a cross-product:
+
+ ((key1p OR key2p) AND (key3p OR key4p))
+ OR
+ ((key5p OR key6p) AND (key7p OR key8p))
+
+ = (key1p OR key2p OR key5p OR key6p) AND // this part is currently
+ (key3p OR key4p OR key5p OR key6p) AND // produced
+
+ (key1p OR key2p OR key5p OR key6p) AND // this part will be added
+ (key3p OR key4p OR key5p OR key6p) //.
+
+In order to limit the impact of this combinatorial explosion, we will
+introduce a rule that we won't generate more than #defined
+MAX_IMERGE_OPTS options.
-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612 2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612 2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+ # There are sel_trees, a sel_tree is either range or merge tree
+ sel_tree = range_tree | imerge_tree
+
+ # a range tree has range access options, possibly for several keys
+ range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+ # merge tree represents several way to index_merge
+ imerge_tree = imerge1 AND imerge2 AND ...
+
+ # a way to do index merge == a set to use of different indexes.
+ imergeX = range_tree1 OR range_tree2 OR ..
+ where no pair of range_treeX have ranges over the same index.
+
+
+ tree_and(A, B)
+ {
+ if (both A and B are range trees)
+ return a range_tree with computed intersection for each range;
+ if (only one of A and B is a range tree)
+ return that tree; // DISCARD-IMERGE-1
+ // at this point both trees are index_merge trees
+ return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+ }
+
+
+ tree_or(A, B)
+ {
+ if (A and B are range trees)
+ {
+ R = new range_tree;
+ for each index i
+ R.add(range_union(A.range(i), B.range(i)));
+
+ if (R has at least one range access)
+ return R;
+ else
+ {
+ /* could not build any range accesses. construct index_merge */
+ remove non-ranges from A; // DISCARD-IMERGE-2
+ remove non-ranges from B;
+ return new index_merge(A, B);
+ }
+ }
+ else if (A is range tree and B is index_merge tree (or vice versa))
+ {
+ Perform this transformation:
+
+ range_treeA // this is A
+ OR
+ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+ (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+ =
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+ (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+ Now each line represents an index_merge..
+ }
+ else if (both A and B are index_merge trees)
+ {
+ Perform this transformation:
+
+ imergeA1 AND imergeA2 AND ... AND imergeAN
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN
+
+ -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3
+
+ imergeA1
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+ = (combine imergeA1 with each of the imergeB{i} ) =
+
+ combine(imergeA1 OR imergeB1) AND
+ combine(imergeA1 OR imergeB2) AND
+ ... AND
+ combine(imergeA1 OR imergeBN)
+ }
+ }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+ INDEX i1(col1, col2),
+ INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+ col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+ sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+ if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+ then perform an equvalent of this operation:
+
+ rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+ concatenate them together.
+
+2.2 New tree_or()
-=-=(Guest - Sat, 13 Jun 2009, 06:29)=-=-
Category updated.
--- /tmp/wklog.24.old.25753 2009-06-13 06:29:10.000000000 +0300
+++ /tmp/wklog.24.new.25753 2009-06-13 06:29:10.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint
-=-=(Guest - Sat, 13 Jun 2009, 06:14)=-=-
Category updated.
--- /tmp/wklog.24.old.24991 2009-06-13 06:14:03.000000000 +0300
+++ /tmp/wklog.24.new.24991 2009-06-13 06:14:03.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog
-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches are should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
------------------------------------------------------------
-=-=(View All Progress Notes, 11 total)=-=-
http://askmonty.org/worklog/index.pl?tid=24&nolimit=1
DESCRIPTION:
Current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses of the form
t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
- discard union(key2,key3) in favor of range(key1)
- consider costs of using range(key1) and discard that plan also
and the overall effect is that a possibly poor range access will cause a
possibly good index_merge access not to be considered.
This WL is about lifting this limitation at least for some subset of WHERE
clauses.
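To make the effect concrete, here is a toy cost comparison in Python. It is only an illustration: the plan names mirror the example above, but the cost numbers and the pick_plan helper are made up and do not come from the optimizer.

```python
def pick_plan(candidates, full_scan_cost):
    """Return the name of the cheapest plan, falling back to a full scan."""
    best_name, best_cost = "full_scan", full_scan_cost
    for name, cost in candidates.items():
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# Hypothetical costs: key1 matches a very frequent value, key2/key3 rare ones.
plans = {"range(key1)": 900_000, "union(key2,key3)": 40}
full_scan = 500_000

# Current behaviour: the union plan is dropped before costs are compared.
current = {k: v for k, v in plans.items() if not k.startswith("union")}

print(pick_plan(current, full_scan))  # range(key1) loses to the full scan
print(pick_plan(plans, full_scan))    # with the union kept, it wins
```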
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to make many changes in the range analyzer, but will be
able to keep a potential index_merge plan just long enough so that it's possible
to take it into consideration together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
WHERE := range_cond(key1_1) AND
range_cond(key2_1) AND
other_cond AND
index_merge_OR_cond1(key3_1, key3_2, ...) AND
index_merge_OR_cond2(key4_1, key4_2, ...)
where
index_merge_OR_cond{N} := (range_cond(keyN_1) OR
range_cond(keyN_2) OR ...)
range_cond(keyX) := condition that allows constructing range access on keyX
and doesn't allow constructing range/index_merge accesses
for any other keys of the table in question.
For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
SEL_TREE(
range(key1_1),
...
range(key2_1),
SEL_IMERGE( (1)
SEL_TREE(key3_1)
SEL_TREE(key3_2)
...
)
...
)
which can be used to make a cost-based choice between range and index_merge.
Limitations
-----------
This will not be a full solution in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another form
(e.g. if the brackets were expanded).
TODO
----
* Is it a problem if there are keys that are referred to both from
index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches should be in
(google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
LOW-LEVEL DESIGN:
<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
</contents>
1. Current implementation overview
==================================
At the moment, range analyzer works as follows:
SEL_TREE structure represents
# There are sel_trees, a sel_tree is either range or merge tree
sel_tree = range_tree | imerge_tree
# a range tree has range access options, possibly for several keys
range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
# merge tree represents several ways to do index_merge
imerge_tree = imerge1 AND imerge2 AND ...
# a way to do index merge == a set of different indexes to use.
imergeX = range_tree1 OR range_tree2 OR ..
where no pair of range_treeX have ranges over the same index.
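The grammar above can be written down as a small data model. The following Python sketch is only a reading aid: the class names RangeTree, Imerge and ImergeTree are chosen here and are not the server's SEL_TREE/SEL_IMERGE classes, and conditions are kept as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class RangeTree:
    # key name -> range condition on that key; all entries are AND-ed.
    ranges: dict = field(default_factory=dict)


@dataclass
class Imerge:
    # One way to do index_merge: an OR of range trees.
    alternatives: list = field(default_factory=list)

    def is_valid(self):
        """Check that no two alternatives have ranges over the same index."""
        seen = set()
        for tree in self.alternatives:
            keys = set(tree.ranges)
            if seen & keys:
                return False
            seen |= keys
        return True


@dataclass
class ImergeTree:
    # Several ways to do index_merge, AND-ed together.
    imerges: list = field(default_factory=list)


ok = Imerge([RangeTree({"key1": "key1=c1"}), RangeTree({"key2": "key2=c2"})])
bad = Imerge([RangeTree({"key1": "key1<c1"}), RangeTree({"key1": "key1>c2"})])
print(ok.is_valid(), bad.is_valid())
```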
tree_and(A, B)
{
if (both A and B are range trees)
return a range_tree with computed intersection for each range;
if (only one of A and B is a range tree)
return that tree; // DISCARD-IMERGE-1
// at this point both trees are index_merge trees
return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
}
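A runnable model of the tree_and() pseudocode above, as a hedged Python sketch. The tuple encoding ("range", {...}) / ("imerge", [...]) and the closed-interval intersect helper are assumptions made for this example; empty intersections are not handled.

```python
def intersect(r1, r2):
    """Intersect two closed intervals given as (low, high) tuples."""
    return (max(r1[0], r2[0]), min(r1[1], r2[1]))


def tree_and(a, b):
    """Model of the current tree_and().

    A tree is ("range", {key: (low, high)}) or ("imerge", [imerge, ...]).
    """
    kind_a, body_a = a
    kind_b, body_b = b
    if kind_a == "range" and kind_b == "range":
        merged = dict(body_a)
        for key, rng in body_b.items():
            merged[key] = intersect(merged[key], rng) if key in merged else rng
        return ("range", merged)
    if kind_a == "range":
        return a  # DISCARD-IMERGE-1: the index_merge side is dropped
    if kind_b == "range":
        return b  # DISCARD-IMERGE-1
    # Both are index_merge trees: concatenate the lists.
    return ("imerge", body_a + body_b)


imerge = ("imerge", [["key1=c1", "key2=c2"]])
bad_range = ("range", {"badkey": (0, 3)})
print(tree_and(imerge, bad_range))
```

The last call returns only the range tree on badkey, which is the lost-plan case marked DISCARD-IMERGE-1.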
tree_or(A, B)
{
if (A and B are range trees)
{
R = new range_tree;
for each index i
R.add(range_union(A.range(i), B.range(i)));
if (R has at least one range access)
return R;
else
{
/* could not build any range accesses. construct index_merge */
remove non-ranges from A; // DISCARD-IMERGE-2
remove non-ranges from B;
return new index_merge(A, B);
}
}
else if (A is range tree and B is index_merge tree (or vice versa))
{
Perform this transformation:
range_treeA // this is A
OR
(range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
(range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
...
(range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
=
(range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
(range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
...
(range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)
Now each line represents an index_merge.
}
else if (both A and B are index_merge trees)
{
Perform this transformation:
imergeA1 AND imergeA2 AND ... AND imergeAN
OR
imergeB1 AND imergeB2 AND ... AND imergeBN
-> (discard all imergeA{i=2,3,...}) -> // DISCARD-IMERGE-3
imergeA1
OR
imergeB1 AND imergeB2 AND ... AND imergeBN =
= (combine imergeA1 with each of the imergeB{i} ) =
combine(imergeA1 OR imergeB1) AND
combine(imergeA1 OR imergeB2) AND
... AND
combine(imergeA1 OR imergeBN)
}
}
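The range-tree OR index_merge-tree case in tree_or() relies on distributing OR over AND. A brute-force check of that identity in Python, treating every range tree as an opaque boolean (the helper names lhs, rhs and equivalent are invented for this sketch):

```python
from itertools import product


def lhs(a, b):
    """A OR ((B_11 OR ...) AND (B_21 OR ...) AND ...)."""
    return a or all(any(row) for row in b)


def rhs(a, b):
    """(A OR B_11 OR ...) AND (A OR B_21 OR ...) AND ..."""
    return all(a or any(row) for row in b)


def equivalent(k, n):
    """Compare both forms for K rows of N alternatives, over all truth values."""
    for bits in product([False, True], repeat=1 + k * n):
        a = bits[0]
        b = [bits[1 + i * n: 1 + (i + 1) * n] for i in range(k)]
        if lhs(a, b) != rhs(a, b):
            return False
    return True


print(equivalent(2, 2), equivalent(3, 2))
```

This only confirms the boolean identity; it says nothing about whether each resulting row is a usable index_merge.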
1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:
DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:
(t.key1=c1 OR t.key2=c2) AND t.badkey < c3
DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (conditions on t.badkey may have arbitrary form):
(t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:
INDEX i1(col1, col2),
INDEX i2(col1, col3)
and this WHERE clause:
col1=c1 AND (col2=c2 OR col3=c3)
The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
2. New implementation
=====================
<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>
SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,
sel_tree2 = range_tree AND imerge_tree
where both parts are optional (i.e. can be empty)
Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
A1. Don't remove index_merge part of the tree (this will take care of
DISCARD-IMERGE-1 problem)
A2. Push range conditions down into index_merge trees that may support them.
if one tree has range(key1) and the other tree has imerge(key1 OR key2)
then perform an equivalent of this operation:
rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
(rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
concatenate them together.
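Rules A1-A3 can be sketched as follows. This is a Python illustration under stated assumptions, not the planned implementation: a sel_tree2 is modelled as a dict with a "ranges" part and an "imerges" part, conditions are sets of strings that are AND-ed, and, following the example given under O1, a range is pushed only into index_merge alternatives over the same key.

```python
def and_ranges(r1, r2):
    """AND two {key: set_of_conditions} maps, key by key."""
    out = {k: set(v) for k, v in r1.items()}
    for key, conds in r2.items():
        out.setdefault(key, set()).update(conds)
    return out


def push_down(ranges, imerge):
    """Rule A2: AND a range into every alternative that uses the same key."""
    pushed = []
    for alt in imerge:
        new_alt = {}
        for key, conds in alt.items():
            new_alt[key] = set(conds) | set(ranges.get(key, ()))
        pushed.append(new_alt)
    return pushed


def tree_and(a, b):
    """New tree_and(): keep both the range part and the index_merge part."""
    ranges = and_ranges(a["ranges"], b["ranges"])
    imerges = a["imerges"] + b["imerges"]  # A1 (keep) and A3 (concatenate)
    return {"ranges": ranges,
            "imerges": [push_down(ranges, im) for im in imerges]}


# (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
a = {"ranges": {},
     "imerges": [[{"key1": {"key1part2=const"}}, {"key2": {"key2part1=const"}}]]}
b = {"ranges": {"key1": {"key1part1=const"}}, "imerges": []}
result = tree_and(a, b)
print(result)
```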
2.2 New tree_or()
-----------------
O1. Don't remove non-range plans:
Current tree_or() code will refuse to produce index_merge plans for
conditions like
"t.key1part2=const OR t.key2part1=const"
(this is marked as DISCARD-IMERGE-3). This was justified as the left part of
the AND condition is not usable for range access, and the operation of
tree_and() guaranteed that there was no way it could be changed to make a
usable range plan. With new tree_and() and rule A2, this is no longer the
case. For example for this query:
(t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
it will construct a
imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
then tree_and() will apply rule A2 to push the range down into index merge
and after that we'll have:
range(t.key1part1=const)
imerge(
t.key1part2=const AND t.key1part1=const,
t.key2part1=const
)
note that imerge(...) describes a usable index_merge plan and it's possible
that it will be the best access path.
O2. "Create index_merge accesses when possible"
Current tree_or() will not create index_merge access when it could create
non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
in the current implementation" section). This will be changed to work as
follows: we will create an index_merge from the index scans that have no
match in the other sel_tree.
Illustrating it with an example:
| sel_tree_A | sel_tree_B | A or B | include in index_merge?
------+------------+------------+--------+------------------------
key1 | cond1 | cond2 | condM | no
key2 | cond3 | cond4 | NULL | no
key3 | cond5 | | | yes, A-side
key4 | cond6 | | | yes, A-side
key5 | | cond7 | | yes, B-side
key6 | | cond8 | | yes, B-side
here we assume that
- (cond1 OR cond2) did produce a combined range. Not including them in
index_merge.
- (cond3 OR cond4) didn't produce a usable range (e.g. they were
t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
didn't yield any range list)
- All other scans didn't have counterparts, so we'll end up with a
SEL_TREE of:
range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven't
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
lift it and produce a cross-product:
  ((key1p OR key2p) AND (key3p OR key4p))
  OR
  ((key5p OR key6p) AND (key7p OR key8p))
  = (key1p OR key2p OR key5p OR key6p) AND // this part is currently
    (key1p OR key2p OR key7p OR key8p) AND // produced
    (key3p OR key4p OR key5p OR key6p) AND // this part will be added
    (key3p OR key4p OR key7p OR key8p) //.
In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.19152 2009-06-18 16:55:00.000000000 +0300
+++ /tmp/wklog.24.new.19152 2009-06-18 16:55:00.000000000 +0300
@@ -141,13 +141,15 @@
Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:
+
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
-1. Don't remove index_merge part of the tree.
+A1. Don't remove index_merge part of the tree (this will take care of
+ DISCARD-IMERGE-1 problem)
-2. Push range conditions down into index_merge trees that may support them.
+A2. Push range conditions down into index_merge trees that may support them.
if one tree has range(key1) and the other tree has imerge(key1 OR key2)
then perform an equvalent of this operation:
@@ -155,8 +157,86 @@
(rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
-3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
concatenate them together.
-2.2 New tree_or()
+2.2 New tree_or()
+-----------------
+O1. Dont remove non-range plans:
+ Current tree_or() code will refuse to produce index_merge plans for
+ conditions like
+
+ "t.key1part2=const OR t.key2part1=const"
+
+ (this is marked as DISCARD-IMERGE-3). This was justifed as the left part of
+ the AND condition is not usable for range access, and the operation of
+ tree_and() guaranteed that there was no way it could changed to make a
+ usable range plan. With new tree_and() and rule A2, this is no longer the
+ case. For example for this query:
+
+ (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
+
+ it will construct a
+
+ imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
+
+ then tree_and() will apply rule A2 to push the range down into index merge
+ and after that we'll have:
+
+ range(t.key1part1=const)
+ imerge(
+ t.key1part2=const AND t.key1part1=const,
+ t.key2part1=const
+ )
+ note that imerge(...) describes a usable index_merge plan and it's possible
+ that it will be the best access path.
+
+O2. "Create index_merge accesses when possible"
+ Current tree_or() will not create index_merge access when it could create
+ non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+ in the current implementation" section). This will be changed to work as
+ follows: we will create index_merge made for index scans that didn't have
+ their match in the other sel_tree.
+ Ilustrating it with an example:
+
+ | sel_tree_A | sel_tree_B | A or B | include in index_merge?
+ ------+------------+------------+--------+------------------------
+ key1 | cond1 | cond2 | condM | no
+ key2 | cond3 | cond4 | NULL | no
+ key3 | cond5 | | | yes, A-side
+ key4 | cond6 | | | yes, A-side
+ key5 | | cond7 | | yes, B-side
+ key6 | | cond8 | | yes, B-side
+
+ here we assume that
+ - (cond1 OR cond2) did produce a combined range. Not including them in
+ index_merge.
+ - (cond3 OR cond4) didn't produce a usable range (e.g. they were
+ t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
+ didn't yield any range list)
+ - All other scand didn't have their counterparts, so we'll end up with a
+ SEL_TREE of:
+
+ range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
+ .
+
+O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
+that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven
+seen any complaints that could be attributed to it.
+If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
+lift it ,and produce a cross-product:
+
+ ((key1p OR key2p) AND (key3p OR key4p))
+ OR
+ ((key5p OR key6p) AND (key7p OR key8p))
+
+ = (key1p OR key2p OR key5p OR key6p) AND // this part is currently
+ (key3p OR key4p OR key5p OR key6p) AND // produced
+
+ (key1p OR key2p OR key5p OR key6p) AND // this part will be added
+ (key3p OR key4p OR key5p OR key6p) //.
+
+In order to limit the impact of this combinatorial explosion, we will
+introduce a rule that we won't generate more than #defined
+MAX_IMERGE_OPTS options.
-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612 2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612 2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+ # There are sel_trees, a sel_tree is either range or merge tree
+ sel_tree = range_tree | imerge_tree
+
+ # a range tree has range access options, possibly for several keys
+ range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+ # merge tree represents several way to index_merge
+ imerge_tree = imerge1 AND imerge2 AND ...
+
+ # a way to do index merge == a set to use of different indexes.
+ imergeX = range_tree1 OR range_tree2 OR ..
+ where no pair of range_treeX have ranges over the same index.
+
+
+ tree_and(A, B)
+ {
+ if (both A and B are range trees)
+ return a range_tree with computed intersection for each range;
+ if (only one of A and B is a range tree)
+ return that tree; // DISCARD-IMERGE-1
+ // at this point both trees are index_merge trees
+ return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+ }
+
+
+ tree_or(A, B)
+ {
+ if (A and B are range trees)
+ {
+ R = new range_tree;
+ for each index i
+ R.add(range_union(A.range(i), B.range(i)));
+
+ if (R has at least one range access)
+ return R;
+ else
+ {
+ /* could not build any range accesses. construct index_merge */
+ remove non-ranges from A; // DISCARD-IMERGE-2
+ remove non-ranges from B;
+ return new index_merge(A, B);
+ }
+ }
+ else if (A is range tree and B is index_merge tree (or vice versa))
+ {
+ Perform this transformation:
+
+ range_treeA // this is A
+ OR
+ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+ (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+ =
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+ (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+ Now each line represents an index_merge..
+ }
+ else if (both A and B are index_merge trees)
+ {
+ Perform this transformation:
+
+ imergeA1 AND imergeA2 AND ... AND imergeAN
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN
+
+ -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3
+
+ imergeA1
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+ = (combine imergeA1 with each of the imergeB{i} ) =
+
+ combine(imergeA1 OR imergeB1) AND
+ combine(imergeA1 OR imergeB2) AND
+ ... AND
+ combine(imergeA1 OR imergeBN)
+ }
+ }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+ INDEX i1(col1, col2),
+ INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+ col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+ sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+ if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+ then perform an equvalent of this operation:
+
+ rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+ concatenate them together.
+
+2.2 New tree_or()
-=-=(Guest - Sat, 13 Jun 2009, 06:29)=-=-
Category updated.
--- /tmp/wklog.24.old.25753 2009-06-13 06:29:10.000000000 +0300
+++ /tmp/wklog.24.new.25753 2009-06-13 06:29:10.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint
-=-=(Guest - Sat, 13 Jun 2009, 06:14)=-=-
Category updated.
--- /tmp/wklog.24.old.24991 2009-06-13 06:14:03.000000000 +0300
+++ /tmp/wklog.24.new.24991 2009-06-13 06:14:03.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog
-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches are should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
------------------------------------------------------------
-=-=(View All Progress Notes, 11 total)=-=-
http://askmonty.org/worklog/index.pl?tid=24&nolimit=1
DESCRIPTION:
Current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses of the form
t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
- discard union(key2,key3) in favor of range(key1)
- consider costs of using range(key1) and discard that plan also
and the overall effect is that a possibly poor range access will cause a
possibly good index_merge access not to be considered.
This WL is about lifting this limitation for at least some subset of WHERE
clauses.
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to make many changes in the range analyzer, but will be
able to keep a potential index_merge plan just long enough that it's possible to
take it into consideration together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)
where
  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)
  range_cond(keyX) := a condition that allows constructing range access over keyX
                      and doesn't allow constructing range/index_merge accesses
                      for any other keys of the table in question.
For such WHERE clauses, the range analyzer will produce a SEL_TREE of this form:
  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE(                      (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )
which can be used to make a cost-based choice between range and index_merge.
Limitations
-----------
This will not be a full solution in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another form
(e.g. the brackets were opened).
TODO
----
* is it a problem if there are keys that are referred to both from
index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches should be in
(google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
LOW-LEVEL DESIGN:
<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
</contents>
1. Current implementation overview
==================================
At the moment, the range analyzer works as follows:
The SEL_TREE structure represents:
  # There are sel_trees; a sel_tree is either a range tree or a merge tree
  sel_tree = range_tree | imerge_tree
  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  # a merge tree represents several ways to do index_merge
  imerge_tree = imerge1 AND imerge2 AND ...
  # a way to do index_merge == a set of different indexes to use
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.
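As an illustration only, the grammar above could be modelled like this. The class names RangeTree, IMerge and IMergeTree, and the use of plain strings for range conditions, are assumptions made for the sketch; they are not the server's actual SEL_TREE/SEL_IMERGE classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RangeTree:
    # range(key1) AND range(key2) AND ...: index name -> range condition (as text)
    ranges: Dict[str, str] = field(default_factory=dict)


@dataclass
class IMerge:
    # One way to do index_merge: range_tree1 OR range_tree2 OR ...
    alternatives: List[RangeTree] = field(default_factory=list)

    def is_valid(self) -> bool:
        # "no pair of range_treeX have ranges over the same index"
        seen = set()
        for tree in self.alternatives:
            for key in tree.ranges:
                if key in seen:
                    return False
                seen.add(key)
        return True


@dataclass
class IMergeTree:
    # imerge1 AND imerge2 AND ...: several alternative index_merge plans
    imerges: List[IMerge] = field(default_factory=list)
```

An IMerge whose alternatives all touch distinct indexes passes is_valid(); one that lists the same index twice does not.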
  tree_and(A, B)
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return the range tree; // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }
  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));
      if (R has at least one range access)
        return R;
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A; // DISCARD-IMERGE-2
        remove non-ranges from B;
        return new index_merge(A, B);
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation:
        range_treeA // this is A
        OR
        (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
        (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
        ...
        (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
        =
        (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
        (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
        ...
        (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)
      Now each line represents an index_merge.
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:
        imergeA1 AND imergeA2 AND ... AND imergeAN
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN
        -> (discard all imergeA{i=2,3,...}) -> // DISCARD-IMERGE-3
        imergeA1
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN
        = (combine imergeA1 with each of the imergeB{i}) =
        combine(imergeA1 OR imergeB1) AND
        combine(imergeA1 OR imergeB2) AND
        ... AND
        combine(imergeA1 OR imergeBN)
    }
  }
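To make the DISCARD-IMERGE-1 step concrete, here is a toy rendering of the current tree_and() in Python. It is a sketch under simplifying assumptions: a tree is a ("range", {index: set of allowed values}) or ("imerge", [list of imerges]) tuple, and range intersection is plain set intersection. None of this mirrors the real data structures.

```python
def tree_and(a, b):
    """Toy model of the current tree_and(); trees are (kind, body) tuples."""
    kind_a, body_a = a
    kind_b, body_b = b
    if kind_a == "range" and kind_b == "range":
        # Intersect the ranges index by index.
        merged = dict(body_a)
        for key, values in body_b.items():
            merged[key] = merged[key] & values if key in merged else values
        return ("range", merged)
    if kind_a == "range":
        return a  # DISCARD-IMERGE-1: the index_merge tree b is dropped
    if kind_b == "range":
        return b  # DISCARD-IMERGE-1: the index_merge tree a is dropped
    # Both are index_merge trees: concatenate the lists of imerges.
    return ("imerge", body_a + body_b)


# (t.key1=c1 OR t.key2=c2) AND t.badkey < c3, with toy value sets
imerge = ("imerge", [[{"key1": {1}}, {"key2": {2}}]])
bad_range = ("range", {"badkey": {0, 1, 2}})
result = tree_and(imerge, bad_range)
```

Here result is the range tree alone, so the index_merge over key1 and key2 never reaches the cost comparison.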
1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:
DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:
(t.key1=c1 OR t.key2=c2) AND t.badkey < c3
DISCARD-IMERGE-2 step will cause the index_merge option to be discarded when
the WHERE clause has this form (the conditions on t.badkey may have arbitrary form):
(t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:
INDEX i1(col1, col2),
INDEX i2(col1, col3)
and this WHERE clause:
col1=c1 AND (col2=c2 OR col3=c3)
The optimizer will generate only plans that use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
2. New implementation
=====================
<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>
The SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,
sel_tree2 = range_tree AND imerge_tree
where both parts are optional (i.e. can be empty)
Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
A1. Don't remove index_merge part of the tree (this will take care of
DISCARD-IMERGE-1 problem)
A2. Push range conditions down into index_merge trees that may support them.
if one tree has range(key1) and the other tree has imerge(key1 OR key2)
then perform an equivalent of this operation:
rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
(rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
concatenate them together.
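A minimal sketch of rule A2 follows, assuming an imerge is a list of {index: [predicates]} dicts and the range part is a dict of the same shape. The function name push_range_into_imerge and this representation are hypothetical. The range predicate is attached only to alternatives over the same index; for the other alternatives it stays a filter that the index scan itself cannot use.

```python
def push_range_into_imerge(range_part, imerge):
    """Return a copy of imerge with range predicates ANDed into matching indexes."""
    pushed = []
    for alternative in imerge:
        new_alt = {key: list(preds) for key, preds in alternative.items()}
        for key, preds in range_part.items():
            if key in new_alt:
                # rangeB(key) AND rangeA(key) on the same index
                new_alt[key].extend(preds)
        pushed.append(new_alt)
    return pushed


range_a = {"key1": ["rangeA(key1)"]}
imerge_b = [{"key1": ["rangeB(key1)"]}, {"key2": ["rangeB(key2)"]}]
pushed = push_range_into_imerge(range_a, imerge_b)
```

After the call, the key1 alternative carries both predicates while the key2 alternative is unchanged, and the input imerge is left untouched.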
2.2 New tree_or()
-----------------
O1. Don't remove non-range plans:
Current tree_or() code will refuse to produce index_merge plans for
conditions like
"t.key1part2=const OR t.key2part1=const"
(this is marked as DISCARD-IMERGE-3). This was justified because the left part of
the AND condition is not usable for range access, and the operation of
tree_and() guaranteed that there was no way it could be changed to make a
usable range plan. With the new tree_and() and rule A2, this is no longer the
case. For example, for this query:
(t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
it will construct a
imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
then tree_and() will apply rule A2 to push the range down into index merge
and after that we'll have:
range(t.key1part1=const)
imerge(
t.key1part2=const AND t.key1part1=const,
t.key2part1=const
)
note that imerge(...) describes a usable index_merge plan and it's possible
that it will be the best access path.
O2. "Create index_merge accesses when possible"
Current tree_or() will not create index_merge access when it could create
non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
in the current implementation" section). This will be changed to work as
follows: we will create index_merge made for index scans that didn't have
their match in the other sel_tree.
Illustrating it with an example:
     | sel_tree_A | sel_tree_B | A or B | include in index_merge?
-----+------------+------------+--------+------------------------
key1 | cond1      | cond2      | condM  | no
key2 | cond3      | cond4      | NULL   | no
key3 | cond5      |            |        | yes, A-side
key4 | cond6      |            |        | yes, A-side
key5 |            | cond7      |        | yes, B-side
key6 |            | cond8      |        | yes, B-side
here we assume that
- (cond1 OR cond2) did produce a combined range. Not including them in
index_merge.
- (cond3 OR cond4) didn't produce a usable range (e.g. they were
t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
didn't yield any range list)
- All other scans didn't have counterparts, so we'll end up with a
SEL_TREE of:
range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
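The matching step in the table can be sketched as follows. This is an illustration under assumptions: each sel_tree is a dict from index name to condition, and range_union is a caller-supplied stand-in that returns a combined condition or None when no usable range results. Both the function name new_tree_or and that callback are hypothetical.

```python
def new_tree_or(tree_a, tree_b, range_union):
    """Split A OR B into a range part and the two sides of an index_merge."""
    ranges = {}
    for key in tree_a.keys() & tree_b.keys():
        combined = range_union(tree_a[key], tree_b[key])
        if combined is not None:
            ranges[key] = combined  # e.g. key1 -> condM
        # else: no usable range (the key2 row), nothing is kept for this key
    # Scans without a counterpart in the other tree go into the index_merge.
    a_side = {k: c for k, c in tree_a.items() if k not in tree_b}
    b_side = {k: c for k, c in tree_b.items() if k not in tree_a}
    return ranges, a_side, b_side


def toy_union(cond_a, cond_b):
    # Stand-in: only cond1 OR cond2 yields a combined range.
    return "condM" if (cond_a, cond_b) == ("cond1", "cond2") else None


tree_a = {"key1": "cond1", "key2": "cond3", "key3": "cond5", "key4": "cond6"}
tree_b = {"key1": "cond2", "key2": "cond4", "key5": "cond7", "key6": "cond8"}
ranges, a_side, b_side = new_tree_or(tree_a, tree_b, toy_union)
```

With the table's inputs this yields range(condM) on key1, an A-side of cond5 and cond6, and a B-side of cond7 and cond8.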
O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven't
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
lift it and produce a cross-product:
  ((key1p OR key2p) AND (key3p OR key4p))
  OR
  ((key5p OR key6p) AND (key7p OR key8p))
  = (key1p OR key2p OR key5p OR key6p) AND // this part is currently
    (key1p OR key2p OR key7p OR key8p) AND // produced
    (key3p OR key4p OR key5p OR key6p) AND // this part will be added
    (key3p OR key4p OR key7p OR key8p) //.
In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.
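A hedged sketch of the capped cross-product: each side is a list of imerges (an AND-list), each imerge is a list of per-key predicates (an OR-list), and every pair of one imerge from each side is joined into a wider OR. The value 3 for MAX_IMERGE_OPTS and the function name imerge_or are assumptions; the text only says the limit will be a #define.

```python
from itertools import product

MAX_IMERGE_OPTS = 3  # assumed value, for illustration only


def imerge_or(a_imerges, b_imerges, limit=MAX_IMERGE_OPTS):
    """OR two AND-lists of imerges, keeping at most `limit` options."""
    options = []
    for a, b in product(a_imerges, b_imerges):
        if len(options) >= limit:
            break  # stop generating once the cap is reached
        options.append(a + b)
    return options


side_a = [["key1p", "key2p"], ["key3p", "key4p"]]
side_b = [["key5p", "key6p"], ["key7p", "key8p"]]
all_options = imerge_or(side_a, side_b, limit=4)
capped = imerge_or(side_a, side_b)
```

Dropping options past the cap loses candidate plans but not correctness in this model, since each remaining imerge is still implied by the original condition.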
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612 2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612 2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+ # There are sel_trees, a sel_tree is either range or merge tree
+ sel_tree = range_tree | imerge_tree
+
+ # a range tree has range access options, possibly for several keys
+ range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+ # merge tree represents several way to index_merge
+ imerge_tree = imerge1 AND imerge2 AND ...
+
+ # a way to do index merge == a set to use of different indexes.
+ imergeX = range_tree1 OR range_tree2 OR ..
+ where no pair of range_treeX have ranges over the same index.
+
+
+ tree_and(A, B)
+ {
+ if (both A and B are range trees)
+ return a range_tree with computed intersection for each range;
+ if (only one of A and B is a range tree)
+ return that tree; // DISCARD-IMERGE-1
+ // at this point both trees are index_merge trees
+ return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+ }
+
+
+ tree_or(A, B)
+ {
+ if (A and B are range trees)
+ {
+ R = new range_tree;
+ for each index i
+ R.add(range_union(A.range(i), B.range(i)));
+
+ if (R has at least one range access)
+ return R;
+ else
+ {
+ /* could not build any range accesses. construct index_merge */
+ remove non-ranges from A; // DISCARD-IMERGE-2
+ remove non-ranges from B;
+ return new index_merge(A, B);
+ }
+ }
+ else if (A is range tree and B is index_merge tree (or vice versa))
+ {
+ Perform this transformation:
+
+ range_treeA // this is A
+ OR
+ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+ (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+ =
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+ (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+ Now each line represents an index_merge..
+ }
+ else if (both A and B are index_merge trees)
+ {
+ Perform this transformation:
+
+ imergeA1 AND imergeA2 AND ... AND imergeAN
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN
+
+ -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3
+
+ imergeA1
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+ = (combine imergeA1 with each of the imergeB{i} ) =
+
+ combine(imergeA1 OR imergeB1) AND
+ combine(imergeA1 OR imergeB2) AND
+ ... AND
+ combine(imergeA1 OR imergeBN)
+ }
+ }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+ INDEX i1(col1, col2),
+ INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+ col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+ sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+ if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+ then perform an equvalent of this operation:
+
+ rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+ concatenate them together.
+
+2.2 New tree_or()
-=-=(Guest - Sat, 13 Jun 2009, 06:29)=-=-
Category updated.
--- /tmp/wklog.24.old.25753 2009-06-13 06:29:10.000000000 +0300
+++ /tmp/wklog.24.new.25753 2009-06-13 06:29:10.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint
-=-=(Guest - Sat, 13 Jun 2009, 06:14)=-=-
Category updated.
--- /tmp/wklog.24.old.24991 2009-06-13 06:14:03.000000000 +0300
+++ /tmp/wklog.24.new.24991 2009-06-13 06:14:03.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog
-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches are should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305 2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305 2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point
+where it is really necessary.
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to
+take it into consideration together with range access plans.
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+ WHERE := range_cond(key1_1) AND
+ range_cond(key2_1) AND
+ other_cond AND
+ index_merge_OR_cond1(key3_1, key3_2, ...)
+ index_merge_OR_cond2(key4_1, key4_2, ...)
+
+where
+
+ index_merge_OR_cond{N} := (range_cond(keyN_1) OR
+ range_cond(keyN_2) OR ...)
+
+
+ range_cond(keyX) := condition that allows to construct range access of keyX
+ and doesn't allow to construct range/index_merge accesses
+ for any keys of the table in question.
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+ SEL_TREE(
+ range(key1_1),
+ ...
+ range(key2_1),
+ SEL_IMERGE( (1)
+ SEL_TREE(key3_1)
+ SEL_TREE(key3_2)
+ ...
+ )
+ ...
+ )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+-----------
+This will not be a full solution in the sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in another form
+(e.g. if the brackets were expanded).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+ index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok
+ it did but that was done for accesses to different tables). Will there be any
+ possible gotchas here?
DESCRIPTION:
The current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is part of the
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses of the form
  t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
 - discard union(key2,key3) in favor of range(key1)
 - consider costs of using range(key1) and discard that plan also
and the overall effect is that a possibly poor range access will cause a possibly
good index_merge access not to be considered.
This WL is about lifting this limitation at least for some subset of WHERE
clauses.
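To make the side effect concrete, here is a toy model in Python. It is only a
sketch: the cost numbers, the plan labels and the function names
current_choice/fair_choice are invented for illustration and are not taken from
the server code. A plan's cost is just a number and the cheapest candidate wins.

```python
# Toy model of the plan choice described above.
# All costs are hypothetical; none of these names exist in the server.
FULL_SCAN_COST = 1_000_000


def current_choice(range_cost, union_cost):
    """Union is dropped as soon as a range plan exists (current behaviour)."""
    candidates = {"full_scan": FULL_SCAN_COST}
    if range_cost is not None:
        # union(key2,key3) is discarded here, before any costs are compared
        candidates["range(key1)"] = range_cost
    elif union_cost is not None:
        candidates["union(key2,key3)"] = union_cost
    return min(candidates, key=candidates.get)


def fair_choice(range_cost, union_cost):
    """Both range and union stay in the candidate set until costs are compared."""
    candidates = {"full_scan": FULL_SCAN_COST}
    if range_cost is not None:
        candidates["range(key1)"] = range_cost
    if union_cost is not None:
        candidates["union(key2,key3)"] = union_cost
    return min(candidates, key=candidates.get)


# A poor range plan (costlier than a full scan) hides a cheap union plan.
print(current_choice(range_cost=1_500_000, union_cost=40))
print(fair_choice(range_cost=1_500_000, union_cost=40))
```

With these made-up numbers the first call falls back to the full scan, while
the second one picks the union plan.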
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to make many changes in the range analyzer, but will be
able to keep potential index_merge plans around just long enough that they can
be considered together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)
where
  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)
  range_cond(keyX) := a condition that allows constructing range access on keyX
                      and doesn't allow constructing range/index_merge accesses
                      for any other keys of the table in question.
For such WHERE clauses, the range analyzer will produce a SEL_TREE of this form:
  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE( (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )
which can be used to make a cost-based choice between range and index_merge.
Limitations
-----------
This will not be a full solution in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another form
(e.g. if the brackets were expanded).
TODO
----
* Is it a problem if there are keys that are referred to both from
  index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches should be in
  (google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
  it did but that was done for accesses to different tables). Will there be any
  possible gotchas here?
LOW-LEVEL DESIGN:
<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
</contents>
1. Current implementation overview
==================================
At the moment, the range analyzer works as follows:
The SEL_TREE structure represents the following:
  # There are sel_trees, a sel_tree is either a range tree or a merge tree
  sel_tree = range_tree | imerge_tree
  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  # a merge tree represents several ways to do index_merge
  imerge_tree = imerge1 AND imerge2 AND ...
  # a way to do index merge == a set of different indexes to use.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.

  tree_and(A, B)
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree; // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }

  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));
      if (R has at least one range access)
        return R;
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A; // DISCARD-IMERGE-2
        remove non-ranges from B;
        return new index_merge(A, B);
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation:
        range_treeA // this is A
        OR
        (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
        (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
        ...
        (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
        =
        (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
        (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
        ...
        (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)
      Now each line represents an index_merge.
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:
        imergeA1 AND imergeA2 AND ... AND imergeAN
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN
        -> (discard all imergeA{i=2,3,...}) -> // DISCARD-IMERGE-3
        imergeA1
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN =
        = (combine imergeA1 with each of the imergeB{i} ) =
        combine(imergeA1 OR imergeB1) AND
        combine(imergeA1 OR imergeB2) AND
        ... AND
        combine(imergeA1 OR imergeBN)
    }
  }
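The tree_and() pseudocode above can be tried out with a small Python model.
This is a sketch under assumptions, not the server's data structures: the
SelTree class, the use of closed (low, high) tuples for ranges and the helper
names are all hypothetical, and empty intersections are not treated specially.

```python
from dataclasses import dataclass, field


@dataclass
class SelTree:
    # Hypothetical model: a tree is either a range tree (ranges non-empty)
    # or an imerge tree (imerges non-empty), as in the current implementation.
    ranges: dict = field(default_factory=dict)   # key name -> (low, high)
    imerges: list = field(default_factory=list)  # each imerge: list of SelTree, OR-ed

    def is_range_tree(self):
        return bool(self.ranges)


def intersect(a, b):
    """Intersection of two closed intervals (may come out empty if low > high)."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def tree_and_current(a, b):
    if a.is_range_tree() and b.is_range_tree():
        out = dict(a.ranges)
        for key, rng in b.ranges.items():
            out[key] = intersect(out[key], rng) if key in out else rng
        return SelTree(ranges=out)
    if a.is_range_tree() != b.is_range_tree():
        # DISCARD-IMERGE-1: the index_merge side is silently dropped
        return a if a.is_range_tree() else b
    # both are index_merge trees: concatenate the imerge lists
    return SelTree(imerges=a.imerges + b.imerges)


# (t.key1=1 OR t.key2=2) AND t.badkey BETWEEN 0 AND 9
imerge = SelTree(imerges=[[SelTree(ranges={"key1": (1, 1)}),
                           SelTree(ranges={"key2": (2, 2)})]])
bad = SelTree(ranges={"badkey": (0, 9)})
result = tree_and_current(imerge, bad)
print(result.ranges, result.imerges)
```

In this model the result carries only the badkey range and an empty imerge
list, which is the loss marked as DISCARD-IMERGE-1.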
1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:
The DISCARD-IMERGE-1 step will cause the index_merge option to be discarded when
the WHERE clause has this form:
  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
The DISCARD-IMERGE-2 step will cause the index_merge option to be discarded when
the WHERE clause has this form (the conditions on t.badkey may have arbitrary form):
  (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:
  INDEX i1(col1, col2),
  INDEX i2(col1, col3)
and this WHERE clause:
  col1=c1 AND (col2=c2 OR col3=c3)
The optimizer will generate only plans that use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
2. New implementation
=====================
<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>
The SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,
  sel_tree2 = range_tree AND imerge_tree
where both parts are optional (i.e. can be empty).
Operations on SEL_ARG trees will be modified to produce/process trees of
this kind:
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
1. Don't remove the index_merge part of the tree.
2. Push range conditions down into index_merge trees that may support them.
   If one tree has range(key1) and the other tree has imerge(key1 OR key2)
   then perform an equivalent of this operation:
     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
   concatenate them together.
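The three changes listed for the new tree_and() can be sketched in Python as
follows. This is an illustration only: SelTree2, and_ranges and tree_and_new
are hypothetical names, ranges are modelled as closed (low, high) tuples, and
the sketch applies the distributive law literally to every alternative of an
imerge. Whether the real implementation keeps the pushed-down range in
alternatives over a different key is not stated in this task.

```python
from dataclasses import dataclass, field


@dataclass
class SelTree2:
    # Hypothetical model of "sel_tree2 = range_tree AND imerge_tree";
    # either part may be empty.
    ranges: dict = field(default_factory=dict)   # key name -> (low, high)
    imerges: list = field(default_factory=list)  # imerge = list of {key: (low, high)}, OR-ed


def intersect(a, b):
    """Intersection of two closed intervals."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def and_ranges(x, y):
    """AND two range trees: intersect shared keys, keep the rest."""
    out = dict(x)
    for key, rng in y.items():
        out[key] = intersect(out[key], rng) if key in out else rng
    return out


def tree_and_new(a, b):
    # 1. keep the range part and the index_merge part side by side
    ranges = and_ranges(a.ranges, b.ranges)
    # 3. concatenate the imerge lists of both trees ...
    # 2. ... and push the range conditions into every alternative
    imerges = [[and_ranges(alt, ranges) for alt in imerge]
               for imerge in a.imerges + b.imerges]
    return SelTree2(ranges=ranges, imerges=imerges)


# rangeA(key1) AND (rangeB(key1) OR rangeB(key2))
a = SelTree2(ranges={"key1": (0, 10)})
b = SelTree2(imerges=[[{"key1": (5, 20)}, {"key2": (7, 7)}]])
r = tree_and_new(a, b)
print(r.ranges)
print(r.imerges)
```

Unlike the current tree_and(), the result keeps range(key1) as a candidate and
also keeps the imerge, with the key1 alternative narrowed to the intersection
of both key1 ranges.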
2.2 New tree_or()
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612 2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612 2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+The SEL_TREE structure represents the following:
+
+ # There are sel_trees, a sel_tree is either range or merge tree
+ sel_tree = range_tree | imerge_tree
+
+ # a range tree has range access options, possibly for several keys
+ range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+ # a merge tree represents several ways to do index_merge
+ imerge_tree = imerge1 AND imerge2 AND ...
+
+ # a way to do index merge == a set of different indexes to use.
+ imergeX = range_tree1 OR range_tree2 OR ..
+ where no pair of range_treeX have ranges over the same index.
+
+
+ tree_and(A, B)
+ {
+ if (both A and B are range trees)
+ return a range_tree with computed intersection for each range;
+ if (only one of A and B is a range tree)
+ return that tree; // DISCARD-IMERGE-1
+ // at this point both trees are index_merge trees
+ return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+ }
+
+
+ tree_or(A, B)
+ {
+ if (A and B are range trees)
+ {
+ R = new range_tree;
+ for each index i
+ R.add(range_union(A.range(i), B.range(i)));
+
+ if (R has at least one range access)
+ return R;
+ else
+ {
+ /* could not build any range accesses. construct index_merge */
+ remove non-ranges from A; // DISCARD-IMERGE-2
+ remove non-ranges from B;
+ return new index_merge(A, B);
+ }
+ }
+ else if (A is range tree and B is index_merge tree (or vice versa))
+ {
+ Perform this transformation:
+
+ range_treeA // this is A
+ OR
+ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+ (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
+ =
+ (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+ (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+ ...
+ (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)
+
+ Now each line represents an index_merge..
+ }
+ else if (both A and B are index_merge trees)
+ {
+ Perform this transformation:
+
+ imergeA1 AND imergeA2 AND ... AND imergeAN
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN
+
+ -> (discard all imergeA{i=2,3,...}) -> // DISCARD-IMERGE-3
+
+ imergeA1
+ OR
+ imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+ = (combine imergeA1 with each of the imergeB{i} ) =
+
+ combine(imergeA1 OR imergeB1) AND
+ combine(imergeA1 OR imergeB2) AND
+ ... AND
+ combine(imergeA1 OR imergeBN)
+ }
+ }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (the conditions on t.badkey may have arbitrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+ INDEX i1(col1, col2),
+ INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+ col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+The SEL_TREE structure will now be able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+ sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+ if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+ then perform an equivalent of this operation:
+
+ rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+ concatenate them together.
+
+2.2 New tree_or()
-=-=(Guest - Sat, 13 Jun 2009, 06:29)=-=-
Category updated.
--- /tmp/wklog.24.old.25753 2009-06-13 06:29:10.000000000 +0300
+++ /tmp/wklog.24.new.25753 2009-06-13 06:29:10.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint
-=-=(Guest - Sat, 13 Jun 2009, 06:14)=-=-
Category updated.
--- /tmp/wklog.24.old.24991 2009-06-13 06:14:03.000000000 +0300
+++ /tmp/wklog.24.new.24991 2009-06-13 06:14:03.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog
-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24
-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
* How strict is the limitation on the form of the WHERE?
+* Which version should this be based on? 5.1? Which patches should be in
+ (google's/percona's/maria/etc?)
+
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
it did but that was done for accesses to different tables). Will there be any
possible gotchas here?
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin
-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414 2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414 2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access
-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305 2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305 2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point
+where it is really necessary.
+
+This way, we won't have to make many changes in the range analyzer, but will be
+able to keep potential index_merge plans around just long enough that they can
+be considered together with range access plans.
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+ WHERE := range_cond(key1_1) AND
+ range_cond(key2_1) AND
+ other_cond AND
+ index_merge_OR_cond1(key3_1, key3_2, ...) AND
+ index_merge_OR_cond2(key4_1, key4_2, ...)
+
+where
+
+ index_merge_OR_cond{N} := (range_cond(keyN_1) OR
+ range_cond(keyN_2) OR ...)
+
+
+ range_cond(keyX) := condition that allows to construct range access of keyX
+ and doesn't allow to construct range/index_merge accesses
+ for any keys of the table in question.
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+ SEL_TREE(
+ range(key1_1),
+ ...
+ range(key2_1),
+ SEL_IMERGE( (1)
+ SEL_TREE(key3_1)
+ SEL_TREE(key3_2)
+ ...
+ )
+ ...
+ )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+-----------
+This will not be a full solution in the sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in another form
+(e.g. if the brackets were expanded).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+ index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok
+ it did but that was done for accesses to different tables). Will there be any
+ possible gotchas here?
DESCRIPTION:
The current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is part of the
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.
A bad side effect of this is that for WHERE clauses of the form
  t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
the optimizer will
 - discard union(key2,key3) in favor of range(key1)
 - consider costs of using range(key1) and discard that plan also
and the overall effect is that a possibly poor range access will cause a possibly
good index_merge access not to be considered.
This WL is about lifting this limitation at least for some subset of WHERE
clauses.
HIGH-LEVEL SPECIFICATION:
(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO
</contents>
Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.
This way, we won't have to make many changes in the range analyzer, but will be
able to keep potential index_merge plans around just long enough that they can
be considered together with range access plans.
Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:
  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)
where
  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)
  range_cond(keyX) := a condition that allows constructing range access on keyX
                      and doesn't allow constructing range/index_merge accesses
                      for any other keys of the table in question.
For such WHERE clauses, the range analyzer will produce a SEL_TREE of this form:
  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE( (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )
which can be used to make a cost-based choice between range and index_merge.
Limitations
-----------
This will not be a full solution in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another form
(e.g. if the brackets were expanded).
TODO
----
* Is it a problem if there are keys that are referred to both from
  index_merge and from range access?
* How strict is the limitation on the form of the WHERE?
* Which version should this be based on? 5.1? Which patches should be in
  (google's/percona's/maria/etc?)
* TODO: The optimizer didn't compare costs of index_merge and range before (ok
  it did but that was done for accesses to different tables). Will there be any
  possible gotchas here?
LOW-LEVEL DESIGN:
<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
</contents>
1. Current implementation overview
==================================
At the moment, the range analyzer works as follows:
The SEL_TREE structure represents the following:
  # There are sel_trees, a sel_tree is either a range tree or a merge tree
  sel_tree = range_tree | imerge_tree
  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  # a merge tree represents several ways to do index_merge
  imerge_tree = imerge1 AND imerge2 AND ...
  # a way to do index merge == a set of different indexes to use.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.

  tree_and(A, B)
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree; // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }

  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));
      if (R has at least one range access)
        return R;
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A; // DISCARD-IMERGE-2
        remove non-ranges from B;
        return new index_merge(A, B);
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation:
        range_treeA // this is A
        OR
        (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
        (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
        ...
        (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
        =
        (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
        (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
        ...
        (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)
      Now each line represents an index_merge.
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:
        imergeA1 AND imergeA2 AND ... AND imergeAN
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN
        -> (discard all imergeA{i=2,3,...}) -> // DISCARD-IMERGE-3
        imergeA1
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN =
        = (combine imergeA1 with each of the imergeB{i} ) =
        combine(imergeA1 OR imergeB1) AND
        combine(imergeA1 OR imergeB2) AND
        ... AND
        combine(imergeA1 OR imergeBN)
    }
  }
1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:
The DISCARD-IMERGE-1 step will cause the index_merge option to be discarded when
the WHERE clause has this form:
  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
The DISCARD-IMERGE-2 step will cause the index_merge option to be discarded when
the WHERE clause has this form (the conditions on t.badkey may have arbitrary form):
  (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:
  INDEX i1(col1, col2),
  INDEX i2(col1, col3)
and this WHERE clause:
  col1=c1 AND (col2=c2 OR col3=c3)
The optimizer will generate only plans that use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
2. New implementation
=====================
<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>
The SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,
  sel_tree2 = range_tree AND imerge_tree
where both parts are optional (i.e. can be empty).
Operations on SEL_ARG trees will be modified to produce/process trees of
this kind:
2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:
1. Don't remove the index_merge part of the tree.
2. Push range conditions down into index_merge trees that may support them.
   If one tree has range(key1) and the other tree has imerge(key1 OR key2)
   then perform an equivalent of this operation:
     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
   concatenate them together.
2.2 New tree_or()
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't access at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If keypartX is bound, then t.keypartY=func(keypartX) makes
+ keypartY bound too.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
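The "all parts of a unique key are bound" check described in the note above can
be sketched as a small fixpoint computation. This is only an illustration in
Python under stated assumptions: the function names, the (column, dependencies)
representation of equalities and the column names are hypothetical, and an
equality against columns of other tables (such as B.id = A.id) is modelled as
having no dependencies inside the table being checked.

```python
def bound_columns(equalities):
    """Return the set of columns that end up bound.

    equalities: list of (target_column, set of same-table columns that the
    right-hand side depends on). A target becomes bound once everything its
    right-hand side depends on is already bound.
    """
    bound = set()
    changed = True
    while changed:
        changed = False
        for target, deps in equalities:
            if target not in bound and deps <= bound:
                bound.add(target)
                changed = True
    return bound


def unique_key_is_bound(key_parts, equalities):
    """True if every part of the unique key is bound by the equalities."""
    return set(key_parts) <= bound_columns(equalities)


# B.id = A.id and B.fromDate = func(B.id): both key parts become bound.
ok = unique_key_is_bound(
    ["id", "fromDate"],
    [("id", set()), ("fromDate", {"id"})],
)

# B.fromDate = func(B.*): the right-hand side needs fromDate itself, so it
# never becomes bound and many rows may match.
not_ok = unique_key_is_bound(
    ["id", "fromDate"],
    [("id", set()), ("fromDate", {"id", "fromDate", "other_col"})],
)
print(ok, not_ok)
```

The per-predicate dependency sets used here play the role of the List<Item>
of depended-on items that the note proposes to collect for each subquery
predicate.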
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object had
+ been discarded. This made it impossible to use information about the join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (we also intend to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
------------------------------------------------------------
-=-=(View All Progress Notes, 26 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Table elimination removes tables from the execution plan when it is
unnecessary to include them. This can, of course, only happen if the
right circumstances arise. Let us, for example, look at the following
query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB has a unique constraint on the column
B.id, for example (as is often the case) a primary key. In this situation
we know that we will get exactly as many rows as there are in tableA,
since joining with tableB cannot introduce any duplicates. If,
furthermore, as in the example query, we do not select any columns from
tableB, touching that table during execution is unnecessary. We can
remove the whole join operation from the execution plan.
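The row-count argument above can be checked with a small simulation. This
is only an illustrative C++ sketch, not server code; the helper name
left_join_count and the sample ids are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the server): counts the rows produced by
//   tableA A LEFT OUTER JOIN tableB B ON B.id = A.id
// given only the id columns of both tables.
std::size_t left_join_count(const std::vector<int>& a_ids,
                            const std::vector<int>& b_ids)
{
  std::size_t rows = 0;
  for (int a : a_ids) {
    std::size_t matches = 0;
    for (int b : b_ids)
      if (b == a)
        ++matches;
    // A left join emits one NULL-complemented row when nothing matches.
    rows += (matches == 0) ? 1 : matches;
  }
  return rows;
}
```

When b_ids holds no duplicates the result always equals a_ids.size(), which
is why the join cannot change the row count. A duplicate in b_ids makes the
result grow, so the table has to be read.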
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
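The two conditions can be sketched as a pair of checks over table bitmaps.
This is a simplified illustration under stated assumptions: the InnerTable
struct and the function names are hypothetical, and the flags is_const and
has_eq_ref stand in for analysis the optimizer would have to do itself.

```cpp
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;   // one bit per table

// Hypothetical, simplified description of one table inside an outer join nest.
struct InnerTable {
  table_map bit;     // this table's bit in the table bitmap
  bool is_const;     // constant table (eq_ref(const), or a 0/1-row table)
  bool has_eq_ref;   // has an eq_ref access method candidate
};

// Condition #1: no table in the nest can multiply the outer rows.
bool nest_is_single_row(const std::vector<InnerTable>& nest)
{
  for (const InnerTable& t : nest)
    if (!t.is_const && !t.has_eq_ref)
      return false;
  return true;
}

// Condition #2: nothing outside the nest's own ON clause refers to its tables.
// 'used_elsewhere' is the bitmap of tables referenced by WHERE, embedding ON
// clauses, ORDER BY, GROUP BY, HAVING and the select list.
bool nest_is_unreferenced(const std::vector<InnerTable>& nest,
                          table_map used_elsewhere)
{
  table_map nest_tables = 0;
  for (const InnerTable& t : nest)
    nest_tables |= t.bit;
  return (nest_tables & used_elsewhere) == 0;
}

bool can_eliminate(const std::vector<InnerTable>& nest,
                   table_map used_elsewhere)
{
  return nest_is_single_row(nest) && nest_is_unreferenced(nest, used_elsewhere);
}
```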
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
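A minimal C++ rendering of the quick check, assuming the three "tables
used in ..." bitmaps have already been collected. The function and
parameter names are hypothetical.

```cpp
#include <cstdint>

typedef std::uint64_t table_map;   // one bit per table

// Returns true if at least one table is not referenced by the select list,
// GROUP BY/ORDER BY or WHERE, i.e. there may be something to eliminate.
bool elimination_worth_trying(table_map select_list_tables,
                              table_map group_order_tables,
                              table_map where_tables,
                              unsigned table_count)
{
  // Bitmap with one bit set per join table; assumes table_count < 64.
  const table_map all_tables = (table_map(1) << table_count) - 1;
  const table_map used = select_list_tables | group_order_tables | where_tables;
  return used != all_tables;
}
```

The check is only a filter: a true result means elimination should be
attempted, not that it will succeed.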
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
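The first step (detaching the nest and updating the ancestors' table maps)
might look roughly like the sketch below. The Nest struct is a hypothetical
stand-in for TABLE_LIST/NESTED_JOIN, which carry much more state, and
moving the JOIN_TABs to the front is not modelled at all.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;   // one bit per table

// Hypothetical stand-in for a join nest.
struct Nest {
  table_map tables;              // bitmap of tables inside this nest
  Nest* embedding;               // parent nest, or nullptr at the top level
  std::vector<Nest*> children;   // nested joins directly inside this nest
};

// Detach 'oj' from its parent and clear its tables from every ancestor.
void remove_nest(Nest* oj)
{
  Nest* parent = oj->embedding;
  if (parent) {
    std::vector<Nest*>& c = parent->children;
    c.erase(std::remove(c.begin(), c.end(), oj), c.end());
  }
  for (Nest* n = parent; n != nullptr; n = n->embedding)
    n->tables &= ~oj->tables;
  oj->embedding = nullptr;
}
```

Note that this version is destructive, which is exactly the concern raised
for prepared statements under [MARK2].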
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of this flag is questioned, as table elimination can never
be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check whether tables were removed, and also behaves nicely with anchor models
and VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc) but cannot use table elimination.
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is fixed: an aggregate function now reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence is that "item->used_tables()==0" is not
equivalent to "item->const_item()==true" anymore (not sure if it's
"anymore" or this has been already happening).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded. This made it impossible to use information about the join plan
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
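The aggregate-function change listed above can be illustrated with a toy
model. The Arg and Aggregate types below are hypothetical miniatures, not
the server's Item classes, and const_item() returning false is a
simplifying assumption of this model.

```cpp
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;   // one bit per table

// Hypothetical miniature of an aggregate's argument.
struct Arg {
  table_map used_tables;   // tables this argument refers to
};

struct Aggregate {
  std::vector<Arg> args;   // empty for COUNT(*)

  // New behaviour: depend only on the tables the arguments depend on.
  table_map used_tables() const
  {
    table_map map = 0;
    for (const Arg& a : args)
      map |= a.used_tables;
    return map;
  }

  // Assumption of this model: the value depends on the rows of the group,
  // so the aggregate is not treated as a constant.
  bool const_item() const { return false; }
};
```

For COUNT(*) the argument list is empty, so used_tables() is 0 while
const_item() is still false, which is the non-equivalence noted above.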
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
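The marking approach can be sketched as follows. The Select struct and the
explain() function are hypothetical and only show why one check per node is
enough when the output is produced by recursive descent.

```cpp
#include <vector>

// Hypothetical stand-in for a SELECT in the statement's tree of selects.
struct Select {
  int id;                               // the id EXPLAIN would print
  bool eliminated;                      // set when its ON clause is removed
  std::vector<const Select*> children;  // subselects nested in this select
};

// Collects the ids EXPLAIN would print, by recursive descent.
void explain(const Select& s, std::vector<int>& out)
{
  if (s.eliminated)
    return;                             // the whole subtree is skipped too
  out.push_back(s.id);
  for (const Select* c : s.children)
    explain(*c, out);
}
```

A child of an eliminated select is never visited, so it does not need to
be marked itself.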
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is to check whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes
keypartY bound as well.
The difficulty here is that correlated subquery predicate cannot tell what
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
items that
- are in the current select
- and that the predicate depends on.
This list will be useful in certain other subquery optimizations as well,
it is cheap to collect it in fix_fields() phase, so it will be collected
for every subquery predicate.
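The "all parts of a unique key are bound" check could be computed as a
fixed point over the available equalities. The sketch below assumes each
equality already knows which columns of the inner table its right-hand side
depends on (the per-predicate list proposed above); the Equality struct and
the function name are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of an equality "column = func(depends_on...)".
// An empty depends_on list means the right side uses only outer tables.
struct Equality {
  std::string column;
  std::vector<std::string> depends_on;  // columns of the same inner table
};

// Returns true if every part of the unique key ends up bound.
bool unique_key_is_bound(const std::vector<std::string>& key_parts,
                         const std::vector<Equality>& equalities)
{
  std::set<std::string> bound;
  bool changed = true;
  while (changed) {                     // iterate until nothing new is bound
    changed = false;
    for (const Equality& eq : equalities) {
      if (bound.count(eq.column))
        continue;
      bool deps_bound = true;
      for (const std::string& d : eq.depends_on)
        if (!bound.count(d))
          deps_bound = false;
      if (deps_bound) {
        bound.insert(eq.column);
        changed = true;
      }
    }
  }
  for (const std::string& k : key_parts)
    if (!bound.count(k))
      return false;
  return true;
}
```

With the key (id, fromDate), the equalities "id = <outer>" and
"fromDate = func(id)" bind both parts, while "fromDate = func(other)" with
an unbound column leaves fromDate unbound.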
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
from user/EXPLAIN point of view: no. constant table is the one that we read
one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+ create unique index idx on tableB (id, fromDate);
+ ...
+ left outer join
+ tableB B
+ on
+ B.id = A.id
+ and
+ B.fromDate = (select max(sub.fromDate)
+ from tableB sub where sub.id = A.id);
+
+ This is because condition "B.fromDate= func(tableB)" cannot be used.
+ Reason#1: update_ref_and_keys() does not consider such conditions to
+ be of any use (and indeed they are not usable for ref access)
+ so they are not put into KEYUSE array.
+ Reason#2: even if they were put there, we would need to be able to tell
+ between predicates like
+ B.fromDate= func(B.id) // guarantees only one matching row as
+ // B.id is already bound by B.id=A.id
+ // hence B.fromDate becomes bound too.
+ and
+ "B.fromDate= func(B.*)" // Can potentially have many matching
+ // records.
+ We need to
+ - Have update_ref_and_keys() create KEYUSE elements for such equalities
+ - Have eliminate_tables() and friends make a more accurate check.
+ The right check is to check whether all parts of a unique key are bound.
+ If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+ keypartY to be bound.
+ The difficulty here is that correlated subquery predicate cannot tell what
+ columns it depends on (it only remembers tables).
+ Traversing the predicate is expensive and complicated.
+ We're leaning towards making each subquery predicate have a List<Item> with
+ items that
+ - are in the current select
+ - and it depends on.
+ This list will be useful in certain other subquery optimizations as well,
+ it is cheap to collect it in fix_fields() phase, so it will be collected
+ for every subquery predicate.
+
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
------------------------------------------------------------
-=-=(View All Progress Notes, 26 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Table elimination removes tables from the execution plan when it is
unnecessary to include them. This can, of course, only happen if the
right circumstances arise. Let us, for example, look at the following
query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB has a unique constraint on the column
B.id, for example (as is often the case) a primary key. In this situation
we know that we will get exactly as many rows as there are in tableA,
since joining with tableB cannot introduce any duplicates. If,
furthermore, as in the example query, we do not select any columns from
tableB, touching that table during execution is unnecessary. We can
remove the whole join operation from the execution plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate the inner side of an outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
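The two conditions can be sketched in C++ as follows. This is only an illustration under simplifying assumptions: `InnerTable` and `can_eliminate_nest` are hypothetical names, and each table is represented by one bit of a 64-bit map in the style of used_tables().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using table_map = std::uint64_t;  // one bit per table

// Hypothetical description of one inner table of an outer join nest.
struct InnerTable {
  table_map bit;              // this table's bit in the join's table bitmap
  bool is_const;              // constant table (condition #1, first case)
  bool has_eq_ref_candidate;  // eq_ref access possible (condition #1, second case)
};

// Returns true when both conditions hold for the nest.
// `used_elsewhere` is the union of tables referenced by the select list,
// WHERE, ON clauses of embedding outer joins, ORDER BY, GROUP BY and HAVING.
bool can_eliminate_nest(const std::vector<InnerTable>& nest,
                        table_map used_elsewhere) {
  for (const InnerTable& t : nest) {
    assert(t.bit != 0);  // every table must own a bit
    if (!t.is_const && !t.has_eq_ref_candidate) return false;  // condition #1
    if (used_elsewhere & t.bit) return false;                  // condition #2
  }
  return true;
}
```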
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
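A stand-alone C++ rendering of the quick check, as a sketch only. The function name and parameters are hypothetical; the maps are assumed to be 64-bit table bitmaps computed elsewhere.

```cpp
#include <cassert>
#include <cstdint>

using table_map = std::uint64_t;  // one bit per table

// Hypothetical version of the quick check: if every table of the join is
// referenced somewhere, no outer join nest can be a removal candidate.
bool may_have_elimination_candidates(table_map select_list_tables,
                                     table_map group_order_tables,
                                     table_map where_tables,
                                     table_map all_tables) {
  const table_map referenced =
      select_list_tables | group_order_tables | where_tables;
  // Referenced tables are expected to be a subset of the join's tables.
  assert((referenced & ~all_tables) == 0);
  return referenced != all_tables;
}
```

The check is a necessary condition only: a table that passes it still has to satisfy the conditions of section 1 before it can be removed.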
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front, as is done with const
tables, with the exception that if the eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
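The bitmap bookkeeping in the steps above might look roughly like the sketch below. It is heavily simplified and all names are hypothetical: `JoinNest` stands in for the TABLE_LIST / nested_join structures, and JOIN_TAB reordering and join->table_count are left out.

```cpp
#include <cassert>
#include <cstdint>

using table_map = std::uint64_t;  // one bit per table

// Hypothetical, much simplified stand-in for a nested join node.
struct JoinNest {
  table_map used_tables;  // bits of all tables inside this nest
  JoinNest* embedding;    // enclosing nest, or nullptr at the top level
};

// Clears the eliminated nest's tables from every ancestor, detaches the nest
// and returns the updated all-join-tables bitmap.
table_map eliminate_nest(JoinNest& oj, table_map all_tables) {
  const table_map removed = oj.used_tables;
  assert((removed & all_tables) == removed);  // nest tables belong to the join
  for (JoinNest* anc = oj.embedding; anc != nullptr; anc = anc->embedding) {
    anc->used_tables &= ~removed;
  }
  oj.embedding = nullptr;
  return all_tables & ~removed;
}
```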
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of the above flag is questioned, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to check
if tables were removed, and also will behave nicely with anchor model and
VIEWs: stuff that user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc.) but cannot use table elimination,
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. Fixed it, now aggregate function reports it depends on
tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence of that is that "item->used_tables()==0" is not
equivalent to "item->const_item()==true" anymore (not sure if it's
"anymore" or this has been already happening).
* EXPLAIN EXTENDED warning text was generated after the JOIN object has
been discarded. This made it impossible to use information about the join plan
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (have also an intent to remove the const
tables from the join output).
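The aggregate-function change listed among the resolved items can be pictured with the following C++ sketch. The function names are hypothetical and the code only models the bitmaps, assuming fewer than 64 tables in the join.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using table_map = std::uint64_t;  // one bit per table

// Old behaviour: an aggregate claimed to depend on every table of the join.
table_map agg_used_tables_old(unsigned n_tables) {
  assert(n_tables < 64);  // shifting by 64 or more would be undefined
  return (1ULL << n_tables) - 1;
}

// New behaviour: the union of the table maps of the arguments.
// COUNT(*) has no arguments, so it depends on no tables.
table_map agg_used_tables_new(const std::vector<table_map>& arg_maps) {
  table_map result = 0;
  for (table_map m : arg_maps) result |= m;
  return result;
}
```

With the old behaviour any aggregate in the select list would mark every inner table as referenced, so no outer join nest could pass condition #2.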
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
* What is described above will not be able to eliminate this outer join:
create unique index idx on tableB (id, fromDate);
...
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (select max(sub.fromDate)
from tableB sub where sub.id = A.id);
This is because condition "B.fromDate= func(tableB)" cannot be used.
Reason#1: update_ref_and_keys() does not consider such conditions to
be of any use (and indeed they are not usable for ref access)
so they are not put into KEYUSE array.
Reason#2: even if they were put there, we would need to be able to tell
between predicates like
B.fromDate= func(B.id) // guarantees only one matching row as
// B.id is already bound by B.id=A.id
// hence B.fromDate becomes bound too.
and
"B.fromDate= func(B.*)" // Can potentially have many matching
// records.
We need to
- Have update_ref_and_keys() create KEYUSE elements for such equalities
- Have eliminate_tables() and friends make a more accurate check.
The right check is to check whether all parts of a unique key are bound.
If keypartX is bound, then t.keypartY=func(keypartX) makes keypartY bound
as well.
The difficulty here is that a correlated subquery predicate cannot tell which
columns it depends on (it only remembers tables).
Traversing the predicate is expensive and complicated.
We're leaning towards making each subquery predicate have a List<Item> with
the items that
- are in the current select
- and that the predicate depends on.
This list will be useful in certain other subquery optimizations as well,
it is cheap to collect it in fix_fields() phase, so it will be collected
for every subquery predicate.
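The "all parts of a unique key are bound" check described above can be sketched as a fixed-point computation. This is an illustration only: `Equality` and `unique_key_is_bound` are hypothetical names, columns are modelled as strings, and only dependencies on columns of the inner table are tracked (columns of outer tables are treated as always bound).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an ON-clause equality "target = func(depends_on...)".
struct Equality {
  std::string target;                   // column of the inner table
  std::vector<std::string> depends_on;  // inner-table columns the right side reads
};

// Returns true when every part of `unique_key` becomes bound.
// A column is bound when some equality assigns it from already-bound columns;
// an equality with no inner-table dependencies (e.g. B.id = A.id) binds at once.
bool unique_key_is_bound(const std::vector<std::string>& unique_key,
                         const std::vector<Equality>& equalities) {
  assert(!unique_key.empty());  // a key has at least one part
  std::set<std::string> bound;
  bool changed = true;
  while (changed) {  // iterate until no new column becomes bound
    changed = false;
    for (const Equality& eq : equalities) {
      if (bound.count(eq.target)) continue;
      bool deps_bound = true;
      for (const std::string& dep : eq.depends_on) {
        if (!bound.count(dep)) { deps_bound = false; break; }
      }
      if (deps_bound) {
        bound.insert(eq.target);
        changed = true;
      }
    }
  }
  for (const std::string& part : unique_key) {
    if (!bound.count(part)) return false;
  }
  return true;
}
```

In this model, B.fromDate = func(B.id) binds fromDate once B.id = A.id has bound id, while an equality whose right side reads some other, unbound column of B leaves fromDate unbound, matching the distinction drawn in Reason#2.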
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
------------------------------------------------------------
-=-=(View All Progress Notes, 25 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE B (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does is remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met, the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example (as is often the
case) a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate the inner side of an outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= it is a zero-rows or one-row MyISAM-like table [MARK1]
- has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
if ((tables used in select_list |
tables used in group/order by |
tables used in where) != bitmap_of_all_tables)
{
attempt table elimination;
}
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front, as is done with const
tables, with the exception that if the eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of the above flag is questioned, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to check
if tables were removed, and also will behave nicely with anchor model and
VIEWs: stuff that user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc.) but cannot use table elimination,
then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. Fixed it, now aggregate function reports it depends on
tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence of that is that "item->used_tables()==0" is not
equivalent to "item->const_item()==true" anymore (not sure if it's
"anymore" or this has been already happening).
* EXPLAIN EXTENDED warning text was generated after the JOIN object has
been discarded. This made it impossible to use information about the join plan
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (have also an intent to remove the const
tables from the join output).
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 02:48)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27792 2009-06-18 02:48:45.000000000 +0300
+++ /tmp/wklog.17.new.27792 2009-06-18 02:48:45.000000000 +0300
@@ -89,14 +89,14 @@
- queries that would use elimination
- queries that are very similar to one above (so that they would have same
QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether dbms supports table
+elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
-- Re-check how this works with equality propagation.
-
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
@@ -141,8 +141,13 @@
7. Additional issues
--------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
+* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+ Yes. Current approach: when removing an outer join nest, walk the ON clause
+ and mark subselects as eliminated. Then let EXPLAIN code check if the
+ SELECT was eliminated before the printing (EXPLAIN is generated by doing
+ a recursive descent, so the check will also cause children of eliminated
+ selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
------------------------------------------------------------
-=-=(View All Progress Notes, 25 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate tables that are not needed from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that the rows in tableB make it possible to place a
unique constraint on the column B.id, most commonly a primary key. In this
situation we know that we will get exactly as many rows as there are in
tableA, since joining with tableB cannot introduce any duplicates. If,
furthermore, as in the example query, we do not select any columns from
tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
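The row-count argument above can be checked with a small sketch. The function
below is not part of the server; it is a hypothetical nested-loop model of
"A LEFT OUTER JOIN B ON B.id = A.id" that looks only at the id columns and
counts result rows.

```cpp
#include <cstddef>
#include <vector>

// Counts the rows produced by "A LEFT OUTER JOIN B ON B.id = A.id",
// given only the id columns of both tables (naive nested-loop evaluation).
inline std::size_t left_join_row_count(const std::vector<int>& a_ids,
                                       const std::vector<int>& b_ids) {
  std::size_t rows = 0;
  for (int a : a_ids) {
    std::size_t matches = 0;
    for (int b : b_ids) {
      if (a == b) ++matches;
    }
    // An unmatched outer row is still emitted once, NULL-complemented.
    rows += (matches == 0) ? 1 : matches;
  }
  return rows;
}
```

With duplicate ids in B the count can exceed the number of rows in A; with
unique ids in B it always equals the number of rows in A, which is what makes
the join removable when no B columns are selected.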
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate the inner side of an outer join if:
1. For each record combination of outer tables, it will always produce
exactly one record.
2. There are no references to columns of the inner tables anywhere else in
the query.
#1 means that every table inside the outer join nest either:
- is a constant table:
= because it can be accessed via eq_ref(const) access, or
= because it is a zero-rows or one-row MyISAM-like table [MARK1]
- or has an eq_ref access method candidate.
#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
GROUP BY and HAVING do not refer to the inner tables of the outer join
nest.
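As a rough illustration of the two conditions, here is a minimal sketch. The
types and names (InnerTable, can_eliminate_nest, the table_map alias) are
assumptions made for this example and do not correspond to actual server
structures.

```cpp
#include <cstdint>
#include <vector>

using table_map = std::uint64_t;  // hypothetical: bit i set <=> table i

// Hypothetical summary of what the optimizer knows about one inner table.
struct InnerTable {
  bool is_const;              // constant table (eq_ref(const) or 0/1-row table)
  bool has_eq_ref_candidate;  // a unique key is fully bound by other tables
};

// Condition #1: every inner table yields at most one match per outer row.
inline bool produces_one_record(const std::vector<InnerTable>& nest) {
  for (const InnerTable& t : nest) {
    if (!t.is_const && !t.has_eq_ref_candidate) return false;
  }
  return true;
}

// Condition #2: nothing outside the nest's own ON clause refers to its tables.
inline bool can_eliminate_nest(const std::vector<InnerTable>& nest,
                               table_map nest_tables,
                               table_map tables_used_elsewhere) {
  return produces_one_record(nest) &&
         (nest_tables & tables_used_elsewhere) == 0;
}
```

Here tables_used_elsewhere stands for the union of tables referenced from the
select list, WHERE, ON clauses of embedding outer joins, ORDER BY, GROUP BY
and HAVING.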
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
  if ((tables used in select_list |
       tables used in group/order by |
       tables used in where) != bitmap_of_all_tables)
  {
    attempt table elimination;
  }
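The quick check can be written out with plain bitmaps. This is a sketch under
assumptions: table_map is modelled as a 64-bit integer and the helper names
are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the server's table bitmap: bit i is set when
// table number i is referenced.
using table_map = std::uint64_t;

// Bitmap with the lowest `table_count` bits set, i.e. "all tables in the join".
inline table_map all_tables_map(unsigned table_count) {
  assert(table_count <= 64);
  return table_count == 64 ? ~table_map{0}
                           : (table_map{1} << table_count) - 1;
}

// Returns true if at least one table is not referenced from the select list,
// GROUP BY/ORDER BY or WHERE, so enumerating outer join nests is worthwhile.
inline bool may_have_elimination_candidates(table_map select_list_tables,
                                            table_map group_order_tables,
                                            table_map where_tables,
                                            unsigned table_count) {
  const table_map used =
      select_list_tables | group_order_tables | where_tables;
  return used != all_tables_map(table_count);
}
```

For the two-table example query, only table A (bit 0) is used in the select
list, so the union differs from the all-tables bitmap and elimination is
attempted.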
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to front like it is done with const
tables, with exception that if eliminated outer join nest was within
another outer join nest, that shouldn't prevent us from moving away the
eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
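The "update table_map's of all ancestor nested joins" step might look roughly
like the sketch below. NestedJoin here is a hypothetical two-field structure
for illustration only, not the server's TABLE_LIST or nested join type.

```cpp
#include <cstdint>

using table_map = std::uint64_t;  // hypothetical: bit i set <=> table i

// Hypothetical, simplified nested-join node with a link to its parent nest.
struct NestedJoin {
  table_map used_tables = 0;
  NestedJoin* embedding = nullptr;
};

// Clears the eliminated tables' bits from `parent` and every nest above it.
inline void remove_from_ancestors(NestedJoin* parent, table_map eliminated) {
  for (NestedJoin* n = parent; n != nullptr; n = n->embedding) {
    n->used_tables &= ~eliminated;
  }
}
```

Making this step undoable, as discussed for prepared statements, would
additionally require remembering the cleared bits; the sketch does not do that.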
4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
name: 'table_elimination'.
(Note: the utility of this flag is questionable, as table elimination can
never be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
check if tables were removed, and also behaves nicely with anchor model and
VIEWs: stuff that the user doesn't care about just won't be there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
same QEP, execution cost, etc) but cannot use table elimination.
Then compare run times and draw a conclusion about whether the DBMS supports
table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Relationship with prepared statements.
On one hand, it's natural to desire to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1] as that can change during
lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
always. This is fixed: an aggregate function now reports that it depends on
the tables that its arguments depend on. In particular, COUNT(*) reports
that it depends on no tables (item_count_star->used_tables()==0).
One consequence is that "item->used_tables()==0" is no longer
equivalent to "item->const_item()==true" (not sure whether this is new
or has already been the case).
* EXPLAIN EXTENDED warning text was generated after the JOIN object had
been discarded, so information about the join plan could not be used
when printing the warning. Fixed this by keeping the JOIN objects until
we've printed the warning (we also intend to remove the const
tables from the join output).
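The change to aggregate-function dependencies listed above can be illustrated
with a small sketch. The free functions below are hypothetical; they only
model the old and new results of used_tables() with a 64-bit bitmap.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using table_map = std::uint64_t;  // hypothetical: bit i set <=> table i

// Old behaviour: an aggregate claims to depend on every table in the join.
inline table_map agg_used_tables_old(unsigned join_tables) {
  assert(join_tables < 64);
  return (table_map{1} << join_tables) - 1;
}

// New behaviour: the union of the tables the arguments depend on.
// COUNT(*) has no arguments, so it depends on no tables.
inline table_map agg_used_tables_new(const std::vector<table_map>& arg_maps) {
  table_map used = 0;
  for (table_map m : arg_maps) used |= m;
  return used;
}
```

Note that a zero result for COUNT(*) does not make the item constant, which is
the used_tables()/const_item() mismatch mentioned in the list.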
7. Additional issues
--------------------
* We remove ON clauses within outer join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
Yes. Current approach: when removing an outer join nest, walk the ON clause
and mark subselects as eliminated. Then let EXPLAIN code check if the
SELECT was eliminated before the printing (EXPLAIN is generated by doing
a recursive descent, so the check will also cause children of eliminated
selects not to be printed)
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from; an eliminated table is one that we don't access at all.
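The EXPLAIN handling of eliminated subselects described in this section can be
sketched as a recursive descent with an early return. SelectNode and both
functions are hypothetical simplifications, not the server's SELECT_LEX or
EXPLAIN code.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a SELECT in the query tree.
struct SelectNode {
  std::string name;
  bool eliminated = false;
  std::vector<SelectNode> children;  // subselects of this SELECT
};

// Called for each subselect found while walking a removed ON clause.
inline void mark_eliminated(SelectNode& node) { node.eliminated = true; }

// Recursive descent: an eliminated node returns before visiting its children,
// so the whole subtree is absent from the output.
inline void explain(const SelectNode& node, std::vector<std::string>& out) {
  if (node.eliminated) return;
  out.push_back(node.name);
  for (const SelectNode& child : node.children) explain(child, out);
}
```

Only the top of an eliminated subtree needs the flag; children are skipped
because the descent never reaches them.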
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.000000000 +0300
+++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.000000000 +0300
@@ -111,3 +111,8 @@
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
+
------------------------------------------------------------
-=-=(View All Progress Notes, 24 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate tables that are not needed from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* columns would all be NULL.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that the rows in tableB make it possible to place a
unique constraint on the column B.id, most commonly a primary key. In this
situation we know that we will get exactly as many rows as there are in
tableA, since joining with tableB cannot introduce any duplicates. If,
furthermore, as in the example query, we do not select any columns from
tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate the inner side of an outer join if:
1. For each record combination of the outer tables, it will always produce
   exactly one record.
2. There are no references to columns of the inner tables anywhere else in
   the query.
#1 means that every table inside the outer join nest either:
 - is a constant table:
   = because it can be accessed via eq_ref(const) access, or
   = because it is a zero-rows or one-row MyISAM-like table [MARK1]
 - or has an eq_ref access method candidate.
#2 means that the WHERE clause, the ON clauses of embedding outer joins,
ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer
join nest.
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
  if ((tables used in select_list |
       tables used in group/order by |
       tables used in where) != bitmap_of_all_tables)
  {
    attempt table elimination;
  }
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
  our cost model).
Thus, there is no need for any cost calculations etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front, as is done with const
  tables, with the exception that if the eliminated outer join nest was within
  another outer join nest, that shouldn't prevent us from moving away the
  eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer_switch flag for table elimination. Tentative
  name: 'table_elimination'.
  (Note: the utility of this flag is questioned, as table elimination can never
  be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
  check whether tables were removed, and it will also behave nicely with the
  anchor model and VIEWs: stuff that the user doesn't care about just won't be
  there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
  same QEP, execution cost, etc.) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
On one hand, it's natural to want to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1], as that can change during
the lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
    item_agg_func->used_tables() == (1ULL << join->tables) - 1
  always. This is fixed: an aggregate function now reports that it depends on
  the tables that its arguments depend on. In particular, COUNT(*) reports
  that it depends on no tables (item_count_star->used_tables()==0).
  One consequence of that is that "item->used_tables()==0" is not
  equivalent to "item->const_item()==true" anymore (not sure if it's
  "anymore" or this has been already happening).
* The EXPLAIN EXTENDED warning text was generated after the JOIN object had
  been discarded. This made it impossible to use information about the join
  plan when printing the warning. Fixed this by keeping the JOIN objects until
  we've printed the warning (there is also an intent to remove the const
  tables from the join output).
7. Additional issues
--------------------
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
  From the user/EXPLAIN point of view: no. A constant table is one that we read
  one record from. An eliminated table is one that we don't access at all.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply@askmonty.org 18 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Thu, 18 Jun 2009, 02:24)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27162 2009-06-18 02:24:14.000000000 +0300
+++ /tmp/wklog.17.new.27162 2009-06-18 02:24:14.000000000 +0300
@@ -83,9 +83,12 @@
5. Tests and benchmarks
-----------------------
-Should create a benchmark in sql-bench which checks if the dbms has table
+Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
-TODO elaborate
+[According to Monty] Run
+ - queries that would use elimination
+ - queries that are very similar to one above (so that they would have same
+ QEP, execution cost, etc) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
@@ -109,33 +112,37 @@
6.2 Resolved
~~~~~~~~~~~~
-- outer->inner join conversion is not a problem for table elimination.
+* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
-7. Additional issues
---------------------
-* We remove ON clauses within semi-join nests. If these clauses contain
- subqueries, they probably should be gone from EXPLAIN output also?
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-* Aggregate functions report they depend on all tables, that is,
+* Aggregate functions used to report that they depend on all tables, that is,
item_agg_func->used_tables() == (1ULL << join->tables) - 1
- always. If we want table elimination to work in presence of grouping, need
- to devise some other way of analyzing aggregate functions.
+ always. Fixed it, now aggregate function reports it depends on
+ tables that its arguments depend on. In particular, COUNT(*) reports
+ that it depends on no tables (item_count_star->used_tables()==0).
+ One consequence of that is that "item->used_tables()==0" is not
+ equivalent to "item->const_item()==true" anymore (not sure if it's
+ "anymore" or this has been already happening).
+
+* EXPLAIN EXTENDED warning text was generated after the JOIN object has
+ been discarded. This didn't allow to use information about join plan
+ when printing the warning. Fixed this by keeping the JOIN objects until
+ we've printed the warning (have also an intent to remove the const
+ tables from the join output).
-* Should eliminated tables be shown in EXPLAIN EXTENDED?
- - If we just ignore the question, they will be shown
- - this is what happens for constant tables, too.
- - I don't see how showing them could be of any use. They only make it
- harder to read the rewritten query.
- It turns out that
- - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
- lifetime) changes.
- - it is hard to have it show per-execution data. This is because the warning
- text is generated after the execution structures have been destroyed.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
@@ -143,8 +150,6 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+ from user/EXPLAIN point of view: no. constant table is the one that we read
+ one record from. eliminated table is the one that we don't acccess at all.
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- - affected tables must not be eliminated
- - tables that are used on the right side of the SET x=y assignments must
- not be eliminated either.
-=-=(Guest - Tue, 16 Jun 2009, 17:01)=-=-
Dependency deleted: 29 no longer depends on 17
-=-=(Guest - Wed, 10 Jun 2009, 01:23)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1842 2009-06-10 01:23:42.000000000 +0300
+++ /tmp/wklog.17.new.1842 2009-06-10 01:23:42.000000000 +0300
@@ -131,6 +131,11 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+ It turns out that
+ - it is easy to have EXPLAIN EXTENDED show permanent (once-per-statement
+ lifetime) changes.
+ - it is hard to have it show per-execution data. This is because the warning
+ text is generated after the execution structures have been destroyed.
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
-=-=(Guest - Wed, 03 Jun 2009, 22:01)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.21801 2009-06-03 22:01:34.000000000 +0300
+++ /tmp/wklog.17.new.21801 2009-06-03 22:01:34.000000000 +0300
@@ -1,3 +1,6 @@
+The code (currently in development) is at lp:
+~maria-captains/maria/maria-5.1-table-elimination tree.
+
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
-=-=(Guest - Wed, 03 Jun 2009, 15:04)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.20378 2009-06-03 15:04:54.000000000 +0300
+++ /tmp/wklog.17.new.20378 2009-06-03 15:04:54.000000000 +0300
@@ -135,3 +135,8 @@
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
+
+* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
+ - affected tables must not be eliminated
+ - tables that are used on the right side of the SET x=y assignments must
+ not be eliminated either.
-=-=(Psergey - Wed, 03 Jun 2009, 12:07)=-=-
Dependency created: 29 now depends on 17
-=-=(Guest - Tue, 02 Jun 2009, 00:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.23548 2009-06-02 00:54:13.000000000 +0300
+++ /tmp/wklog.17.new.23548 2009-06-02 00:54:13.000000000 +0300
@@ -128,3 +128,10 @@
- this is what happens for constant tables, too.
- I don't see how showing them could be of any use. They only make it
harder to read the rewritten query.
+
+* Table elimination is performed after constant table detection (but before
+ the range analysis). Constant tables are technically different from
+ eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+ Considering we've already done the join_read_const_table() call, is there any
+ real difference between constant table and eliminated one? If there is, should
+ we mark const tables also as eliminated?
-=-=(Psergey - Mon, 01 Jun 2009, 20:46)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.17448 2009-06-01 20:46:40.000000000 +0300
+++ /tmp/wklog.17.new.17448 2009-06-01 20:46:40.000000000 +0300
@@ -122,3 +122,9 @@
always. If we want table elimination to work in presence of grouping, need
to devise some other way of analyzing aggregate functions.
+
+* Should eliminated tables be shown in EXPLAIN EXTENDED?
+ - If we just ignore the question, they will be shown
+ - this is what happens for constant tables, too.
+ - I don't see how showing them could be of any use. They only make it
+ harder to read the rewritten query.
-=-=(Guest - Mon, 01 Jun 2009, 12:49)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.32202 2009-06-01 12:49:15.000000000 +0300
+++ /tmp/wklog.17.new.32202 2009-06-01 12:49:15.000000000 +0300
@@ -8,7 +8,7 @@
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
-
+7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
@@ -116,3 +116,9 @@
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
+* Aggregate functions report they depend on all tables, that is,
+
+ item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+ always. If we want table elimination to work in presence of grouping, need
+ to devise some other way of analyzing aggregate functions.
-=-=(Guest - Fri, 29 May 2009, 00:45)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.1348 2009-05-29 00:45:21.000000000 +0300
+++ /tmp/wklog.17.new.1348 2009-05-29 00:45:21.000000000 +0300
@@ -111,3 +111,8 @@
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
+7. Additional issues
+--------------------
+* We remove ON clauses within semi-join nests. If these clauses contain
+ subqueries, they probably should be gone from EXPLAIN output also?
+
------------------------------------------------------------
-=-=(View All Progress Notes, 24 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1
DESCRIPTION:
Eliminate unneeded tables from SELECT queries.
This will speed up some views and automatically generated queries.
Example:
CREATE TABLE tableB (id int primary key);
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
In this case we can remove table B and the join from the query.
HIGH-LEVEL SPECIFICATION:
Here is an extended explanation of table elimination.
Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.
Basically, what table elimination does, is to remove tables from the
execution plan when it is unnecessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id;
When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
still contain its original value. The unseen B.* row would contain all NULLs.
However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what the result will look like is to actually touch both tables during
execution.
Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example (as is often the
case) a primary key. In this situation we know that we will get exactly
as many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unnecessary. We can remove the whole join operation from the execution
plan.
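To make the row-count argument concrete, here is a minimal C++ sketch that
emulates the example query on small in-memory tables. It is not server code:
RowA, RowB and left_join_colA are hypothetical names used only for
illustration, and the nested loop stands in for whatever join method the
server would really choose.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory rows; these types are illustrative only.
struct RowA { int id; std::string colA; };
struct RowB { int id; std::string other; };

// Emulates:
//   select A.colA from tableA A left outer join tableB B on B.id = A.id
std::vector<std::string> left_join_colA(const std::vector<RowA>& a,
                                        const std::vector<RowB>& b) {
  std::vector<std::string> out;
  for (const RowA& ra : a) {
    bool matched = false;
    for (const RowB& rb : b) {
      if (rb.id == ra.id) {
        out.push_back(ra.colA);  // one output row per matching B row
        matched = true;
      }
    }
    if (!matched) {
      out.push_back(ra.colA);    // NULL-complemented row: A.colA is kept
    }
  }
  return out;
}
```

With a unique B.id the result always has exactly one entry per row of A, so
the inner loop contributes nothing and could be skipped. With duplicate B.id
values (the [1, "other1a"], [1, "other1b"] case) the result grows, which is
why the uniqueness guarantee is needed before the join can be dropped.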
Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in the case described above. Let us look at a more advanced query, where
Oracle fails.
select
A.colA
from
tableA A
left outer join
tableB B
on
B.id = A.id
and
B.fromDate = (
select
max(sub.fromDate)
from
tableB sub
where
sub.id = A.id
);
In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).
Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.
SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.
LOW-LEVEL DESIGN:
The code (currently in development) is at lp:
~maria-captains/maria/maria-5.1-table-elimination tree.
<contents>
1. Conditions for removal
1.1 Quick check if there are candidates
2. Removal operation properties
3. Removal operation
4. User interface
5. Tests and benchmarks
6. Todo, issues to resolve
6.1 To resolve
6.2 Resolved
7. Additional issues
</contents>
It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.
1. Conditions for removal
-------------------------
We can eliminate the inner side of an outer join if:
1. For each record combination of the outer tables, it will always produce
   exactly one record.
2. There are no references to columns of the inner tables anywhere else in
   the query.
#1 means that every table inside the outer join nest either:
 - is a constant table:
   = because it can be accessed via eq_ref(const) access, or
   = because it is a zero-rows or one-row MyISAM-like table [MARK1]
 - or has an eq_ref access method candidate.
#2 means that the WHERE clause, the ON clauses of embedding outer joins,
ORDER BY, GROUP BY and HAVING do not refer to the inner tables of the outer
join nest.
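The two conditions can be read as a predicate over one outer join nest. The
C++ sketch below is illustrative only: InnerTable, can_eliminate_nest and
their fields are hypothetical names, not server structures, and it assumes the
per-table facts (constant table, eq_ref candidate) have already been computed
elsewhere.

```cpp
#include <cstdint>
#include <vector>

// Assumption: one bit per table in the join, as with the server's table_map.
using table_map = std::uint64_t;

// Hypothetical summary of one table on the inner side of an outer join.
struct InnerTable {
  table_map map;    // the bit that identifies this table
  bool is_const;    // constant table (eq_ref(const) or zero/one-row table)
  bool has_eq_ref;  // an eq_ref access method candidate exists
};

// referenced_outside_nest: tables referred to by WHERE, ON clauses of
// embedding outer joins, ORDER BY, GROUP BY and HAVING.
bool can_eliminate_nest(const std::vector<InnerTable>& nest,
                        table_map referenced_outside_nest) {
  for (const InnerTable& t : nest) {
    if (!(t.is_const || t.has_eq_ref)) {
      return false;  // condition #1 fails: may produce more than one record
    }
    if (t.map & referenced_outside_nest) {
      return false;  // condition #2 fails: the table is used elsewhere
    }
  }
  return true;
}
```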
1.1 Quick check if there are candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start to enumerate join nests, here is a quick way to check if
there *can be* something to be removed:
  if ((tables used in select_list |
       tables used in group/order by |
       tables used in where) != bitmap_of_all_tables)
  {
    attempt table elimination;
  }
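The quick check above can be written out as a small C++ function. This is a
sketch, not the code in the tree: the function name and parameters are
hypothetical, and it assumes each set of tables is available as a bitmap with
one bit per table.

```cpp
#include <cstdint>

// Assumption: one bit per table in the join, as with the server's table_map.
using table_map = std::uint64_t;

// Returns true if some table is referenced neither in the select list, nor in
// GROUP BY/ORDER BY, nor in WHERE, i.e. elimination is worth attempting.
bool may_have_elimination_candidates(table_map used_in_select_list,
                                     table_map used_in_group_order_by,
                                     table_map used_in_where,
                                     table_map all_tables) {
  const table_map referenced =
      used_in_select_list | used_in_group_order_by | used_in_where;
  return referenced != all_tables;
}
```

For the example query with tableA as bit 0 and tableB as bit 1, only bit 0 is
referenced while the bitmap of all tables is 0b11, so the check succeeds and
the more expensive per-nest analysis is run.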
2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
  our cost model).
Thus, there is no need for any cost calculations etc. It's an unconditional rewrite.
3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
$OJ->embedding->nested_join. Update table_map's of all ancestor nested
joins). [MARK2]
* Move the tables and their JOIN_TABs to the front, as is done with const
  tables, with the exception that if the eliminated outer join nest was within
  another outer join nest, that shouldn't prevent us from moving away the
  eliminated tables.
* Update join->table_count and all-join-tables bitmap.
* That's it. Nothing else?
4. User interface
-----------------
* We'll add an @@optimizer_switch flag for table elimination. Tentative
  name: 'table_elimination'.
  (Note: the utility of this flag is questioned, as table elimination can never
  be worse than no elimination. We're leaning towards not adding the flag.)
* EXPLAIN will not show the removed tables at all. This makes it possible to
  check whether tables were removed, and it will also behave nicely with the
  anchor model and VIEWs: stuff that the user doesn't care about just won't be
  there.
5. Tests and benchmarks
-----------------------
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
- queries that would use elimination
- queries that are very similar to the ones above (so that they would have the
  same QEP, execution cost, etc.) but cannot use table elimination.
6. Todo, issues to resolve
--------------------------
6.1 To resolve
~~~~~~~~~~~~~~
- Re-check how this works with equality propagation.
- Relationship with prepared statements.
On one hand, it's natural to want to make table elimination a
once-per-statement operation, like outer->inner join conversion. We'll have
to limit the applicability by removing [MARK1], as that can change during
the lifetime of the statement.
The other option is to do table elimination every time. This will require
reworking operation [MARK2] to be undoable.
I'm leaning towards doing the former. With anchor modeling, it is unlikely
that we'll meet outer joins which have N inner tables of which some are 1-row
MyISAM tables that do not have a primary key.
6.2 Resolved
~~~~~~~~~~~~
* outer->inner join conversion is not a problem for table elimination.
We make outer->inner conversions based on predicates in WHERE. If the WHERE
referred to an inner table (requirement for OJ->IJ conversion) then table
elimination would not be applicable anyway.
* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
- affected tables must not be eliminated
- tables that are used on the right side of the SET x=y assignments must
not be eliminated either.
* Aggregate functions used to report that they depend on all tables, that is,
    item_agg_func->used_tables() == (1ULL << join->tables) - 1
  always. This is fixed: an aggregate function now reports that it depends on
  the tables that its arguments depend on. In particular, COUNT(*) reports
  that it depends on no tables (item_count_star->used_tables()==0).
  One consequence of that is that "item->used_tables()==0" is not
  equivalent to "item->const_item()==true" anymore (not sure if it's
  "anymore" or this has been already happening).
* The EXPLAIN EXTENDED warning text was generated after the JOIN object had
  been discarded. This made it impossible to use information about the join
  plan when printing the warning. Fixed this by keeping the JOIN objects until
  we've printed the warning (there is also an intent to remove the const
  tables from the join output).
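The aggregate-function change in 6.2 can be sketched as follows. This is a
simplified model in C++, not the Item class hierarchy: both function names are
hypothetical, and the old formula is only valid for fewer than 64 tables.

```cpp
#include <cstdint>
#include <vector>

// Assumption: one bit per table in the join, as with the server's table_map.
using table_map = std::uint64_t;

// Old behaviour: every aggregate claims to depend on all tables in the join.
// Only meaningful for n_tables < 64 (a 64-bit shift would be undefined).
table_map agg_used_tables_old(unsigned n_tables) {
  return (1ULL << n_tables) - 1;
}

// New behaviour: the union of the tables used by the aggregate's arguments.
// COUNT(*) has no arguments, so it depends on no tables and yields 0.
table_map agg_used_tables_new(const std::vector<table_map>& arg_used_tables) {
  table_map used = 0;
  for (table_map m : arg_used_tables) {
    used |= m;
  }
  return used;
}
```

With the old formula an aggregate such as MAX(A.colA) would pin every table in
the join, including the inner side of the outer join, so condition #2 could
never hold in a query with grouping. With the new one only tableA is marked.
The COUNT(*) case also shows why used_tables()==0 no longer implies a constant
item: its value still varies with the rows being counted.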
7. Additional issues
--------------------
* We remove ON clauses within semi-join nests. If these clauses contain
subqueries, they probably should be gone from EXPLAIN output also?
* Table elimination is performed after constant table detection (but before
the range analysis). Constant tables are technically different from
eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
Considering we've already done the join_read_const_table() call, is there any
real difference between constant table and eliminated one? If there is, should
we mark const tables also as eliminated?
  From the user/EXPLAIN point of view: no. A constant table is one that we read
  one record from. An eliminated table is one that we don't access at all.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (psergey:2717)
by Sergey Petrunia 17 Jun '09
#At lp:maria based on revid:psergey@askmonty.org-20090616204358-yjkyfxczsomrn9yn
2717 Sergey Petrunia 2009-06-17
* Use excessive parentheses to stop compiler warning
* Fix test results to account for changes in previous cset
modified:
mysql-test/r/select.result
sql/sql_select.cc
per-file messages:
mysql-test/r/select.result
* Use excessive parentheses to stop compiler warning
* Fix test results to account for changes in previous cset
sql/sql_select.cc
* Use excessive parentheses to stop compiler warning
* Fix test results to account for changes in previous cset
=== modified file 'mysql-test/r/select.result'
--- a/mysql-test/r/select.result 2009-03-16 05:02:10 +0000
+++ b/mysql-test/r/select.result 2009-06-17 05:27:39 +0000
@@ -3585,7 +3585,6 @@ INSERT INTO t2 VALUES (1,'a'),(2,'b'),(3
EXPLAIN SELECT t1.a FROM t1 LEFT JOIN t2 ON t2.b=t1.b WHERE t1.a=3;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 const PRIMARY PRIMARY 4 const 1
-1 SIMPLE t2 const b b 22 const 1 Using index
DROP TABLE t1,t2;
CREATE TABLE t1(id int PRIMARY KEY, b int, e int);
CREATE TABLE t2(i int, a int, INDEX si(i), INDEX ai(a));
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-06-16 19:54:13 +0000
+++ b/sql/sql_select.cc 2009-06-17 05:27:39 +0000
@@ -16963,9 +16963,9 @@ static void print_join(THD *thd,
CREATE VIEW. There we'll have nested_join->used_tables==0.
*/
if (eliminated_tables && // (*)
- (curr->table && (curr->table->map & eliminated_tables) ||
- curr->nested_join && !(curr->nested_join->used_tables &
- ~eliminated_tables)))
+ ((curr->table && (curr->table->map & eliminated_tables)) ||
+ (curr->nested_join && !(curr->nested_join->used_tables &
+ ~eliminated_tables))))
{
continue;
}
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (psergey:2716)
by Sergey Petrunia 16 Jun '09
#At lp:maria based on revid:psergey@askmonty.org-20090616195413-rfmi9un20za8gn8g
2716 Sergey Petrunia 2009-06-17 [merge]
* Merge
* Change valgrind suppression to work on valgrind 3.3.0
modified:
mysql-test/valgrind.supp
per-file messages:
mysql-test/valgrind.supp
* Merge
* Change valgrind suppression to work on valgrind 3.3.0
=== modified file 'mysql-test/valgrind.supp'
--- a/mysql-test/valgrind.supp 2009-05-22 12:38:50 +0000
+++ b/mysql-test/valgrind.supp 2009-06-16 20:43:58 +0000
@@ -631,3 +631,73 @@
fun:malloc
fun:inet_ntoa
}
+
+
+#
+# Some problem inside glibc on Ubuntu 9.04, x86 (but not amd64):
+#
+# ==5985== 19 bytes in 1 blocks are still reachable in loss record 1 of 6
+# ==5985== at 0x7AF3FDE: malloc (vg_replace_malloc.c:207)
+# ... 11,12, or 13 functions w/o symbols ...
+# ==5985== by 0x8717185: nptl_pthread_exit_hack_handler (my_thr_init.c:55)
+#
+# Since valgrind 3.3.0 doesn't support '...' multi-function pattern, using
+# multiple suppressions:
+#
+{
+ Mem loss inside nptl_pthread_exit_hack_handler
+ Memcheck:Leak
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:nptl_pthread_exit_hack_handler
+}
+
+{
+ Mem loss inside nptl_pthread_exit_hack_handler
+ Memcheck:Leak
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:nptl_pthread_exit_hack_handler
+}
+
+{
+ Mem loss inside nptl_pthread_exit_hack_handler
+ Memcheck:Leak
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:*
+ fun:nptl_pthread_exit_hack_handler
+}
+
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (psergey:2715)
by Sergey Petrunia 16 Jun '09
#At lp:maria based on revid:psergey@askmonty.org-20090614205924-1vnfwbuo4brzyfhp
2715 Sergey Petrunia 2009-06-16
MWL#17: Table elimination
- Move eliminate_tables() to before constant table detection.
- First code for benchmark
added:
sql-bench/test-table-elimination.sh
modified:
sql/sql_select.cc
per-file messages:
sql-bench/test-table-elimination.sh
MWL#17: Table elimination
- sql-bench "Benchmark", incomplete
sql/sql_select.cc
MWL#17: Table elimination
- Move eliminate_tables() to before constant table detection, this will allow
to spare const table reads (at a cost of not being able to take advantage of
tables that are constant because they have no records, but this case is of
lesser importance)
=== added file 'sql-bench/test-table-elimination.sh'
--- a/sql-bench/test-table-elimination.sh 1970-01-01 00:00:00 +0000
+++ b/sql-bench/test-table-elimination.sh 2009-06-16 19:54:13 +0000
@@ -0,0 +1,320 @@
+#!@PERL@
+# Test of table elimination feature
+
+use Cwd;
+use DBI;
+use Getopt::Long;
+use Benchmark;
+
+$opt_loop_count=100000;
+$opt_medium_loop_count=10000;
+$opt_small_loop_count=100;
+
+$pwd = cwd(); $pwd = "." if ($pwd eq '');
+require "$pwd/bench-init.pl" || die "Can't read Configuration file: $!\n";
+
+if ($opt_small_test)
+{
+ $opt_loop_count/=10;
+ $opt_medium_loop_count/=10;
+ $opt_small_loop_count/=10;
+}
+
+print "Testing table elimination feature\n";
+print "The test table has $opt_loop_count rows.\n\n";
+
+# A query to get the recent versions of all attributes:
+$select_current_full_facts="
+ select
+ F.id, A1.attr1, A2.attr2
+ from
+ elim_facts F
+ left join elim_attr1 A1 on A1.id=F.id
+ left join elim_attr2 A2 on A2.id=F.id and
+ A2.fromdate=(select MAX(fromdate) from
+ elim_attr2 where id=A2.id);
+";
+$select_current_full_facts="
+ select
+ F.id, A1.attr1, A2.attr2
+ from
+ elim_facts F
+ left join elim_attr1 A1 on A1.id=F.id
+ left join elim_attr2 A2 on A2.id=F.id and
+ A2.fromdate=(select MAX(fromdate) from
+ elim_attr2 where id=F.id);
+";
+# TODO: same as above but for some given date also?
+# TODO:
+
+
+####
+#### Connect and start timeing
+####
+
+$dbh = $server->connect();
+$start_time=new Benchmark;
+
+####
+#### Create needed tables
+####
+
+goto select_test if ($opt_skip_create);
+
+print "Creating tables\n";
+$dbh->do("drop table elim_facts" . $server->{'drop_attr'});
+$dbh->do("drop table elim_attr1" . $server->{'drop_attr'});
+$dbh->do("drop table elim_attr2" . $server->{'drop_attr'});
+
+# The facts table
+do_many($dbh,$server->create("elim_facts",
+ ["id integer"],
+ ["primary key (id)"]));
+
+# Attribute1, non-versioned
+do_many($dbh,$server->create("elim_attr1",
+ ["id integer",
+ "attr1 integer"],
+ ["primary key (id)",
+ "key (attr1)"]));
+
+# Attribute1, time-versioned
+do_many($dbh,$server->create("elim_attr2",
+ ["id integer",
+ "attr2 integer",
+ "fromdate date"],
+ ["primary key (id, fromdate)",
+ "key (attr2,fromdate)"]));
+
+#NOTE: ignoring: if ($limits->{'views'})
+$dbh->do("drop view elim_current_facts");
+$dbh->do("create view elim_current_facts as $select_current_full_facts");
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"LOCK TABLES elim_facts, elim_attr1, elim_attr2 WRITE");
+}
+
+if ($opt_fast && defined($server->{vacuum}))
+{
+ $server->vacuum(1,\$dbh);
+}
+
+####
+#### Fill the facts table
+####
+$n_facts= $opt_loop_count;
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->{AutoCommit} = 0;
+}
+
+print "Inserting $n_facts rows into facts table\n";
+$loop_time=new Benchmark;
+
+$query="insert into elim_facts values (";
+for ($id=0; $id < $n_facts ; $id++)
+{
+ do_query($dbh,"$query $id)");
+}
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->commit;
+ $dbh->{AutoCommit} = 1;
+}
+
+$end_time=new Benchmark;
+print "Time to insert ($n_facts): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+
+####
+#### Fill attr1 table
+####
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->{AutoCommit} = 0;
+}
+
+print "Inserting $n_facts rows into attr1 table\n";
+$loop_time=new Benchmark;
+
+$query="insert into elim_attr1 values (";
+for ($id=0; $id < $n_facts ; $id++)
+{
+ $attr1= ceil(rand($n_facts));
+ do_query($dbh,"$query $id, $attr1)");
+}
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->commit;
+ $dbh->{AutoCommit} = 1;
+}
+
+$end_time=new Benchmark;
+print "Time to insert ($n_facts): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+
+####
+#### Fill attr2 table
+####
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->{AutoCommit} = 0;
+}
+
+print "Inserting $n_facts rows into attr2 table\n";
+$loop_time=new Benchmark;
+
+for ($id=0; $id < $n_facts ; $id++)
+{
+ # Two values for each $id - current one and obsolete one.
+ $attr1= ceil(rand($n_facts));
+ $query="insert into elim_attr2 values ($id, $attr1, now())";
+ do_query($dbh,$query);
+ $query="insert into elim_attr2 values ($id, $attr1, '2009-01-01')";
+ do_query($dbh,$query);
+}
+
+if ($opt_fast && $server->{transactions})
+{
+ $dbh->commit;
+ $dbh->{AutoCommit} = 1;
+}
+
+$end_time=new Benchmark;
+print "Time to insert ($n_facts): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n\n";
+
+####
+#### Finalize the database population
+####
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"UNLOCK TABLES");
+}
+
+if ($opt_fast && defined($server->{vacuum}))
+{
+ $server->vacuum(0,\$dbh,["elim_facts", "elim_attr1", "elim_attr2"]);
+}
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"LOCK TABLES elim_facts, elim_attr1, elim_attr2 WRITE");
+}
+
+####
+#### Do some selects on the table
+####
+
+select_test:
+
+#
+# The selects will be:
+# - N pk-lookups with all attributes
+# - pk-attribute-based lookup
+# - latest-attribute value based lookup.
+
+
+###
+### Bare facts select:
+###
+print "testing bare facts table\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= ceil(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select * from elim_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_bare_facts ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+
+###
+### Full facts select, no elimination:
+###
+print "testing full facts view\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= int(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select * from elim_current_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_two_attributes ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+###
+### Now with elimination: select only one attribute (attr1)
+###
+print "testing selection of one attribute\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= int(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select id, attr1 from elim_current_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_one_attribute ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+###
+### Now with elimination: select only one attribute (attr2)
+###
+print "testing selection of one attribute\n";
+$loop_time=new Benchmark;
+$rows=0;
+for ($i=0 ; $i < $opt_medium_loop_count ; $i++)
+{
+ $val= int(rand($n_facts));
+ $rows+=fetch_all_rows($dbh,"select id, attr2 from elim_current_facts where id=$val");
+}
+$count=$i;
+
+$end_time=new Benchmark;
+print "time for select_one_attribute ($count:$rows): " .
+ timestr(timediff($end_time, $loop_time),"all") . "\n";
+
+
+###
+### TODO...
+###
+
+;
+
+####
+#### End of benchmark
+####
+
+if ($opt_lock_tables)
+{
+ do_query($dbh,"UNLOCK TABLES");
+}
+if (!$opt_skip_delete)
+{
+ do_query($dbh,"drop table elim_facts, elim_attr1, elim_attr2" . $server->{'drop_attr'});
+}
+
+if ($opt_fast && defined($server->{vacuum}))
+{
+ $server->vacuum(0,\$dbh);
+}
+
+$dbh->disconnect; # close connection
+
+end_benchmark($start_time);
+
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-06-14 20:59:24 +0000
+++ b/sql/sql_select.cc 2009-06-16 19:54:13 +0000
@@ -2959,22 +2959,28 @@ make_join_statistics(JOIN *join, TABLE_L
/* Read tables with 0 or 1 rows (system tables) */
join->const_table_map= 0;
+
+ eliminate_tables(join, &const_count, &found_const_table_map);
+ join->const_table_map= found_const_table_map;
for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count;
p_pos < p_end ;
p_pos++)
{
- int tmp;
s= p_pos->table;
- s->type=JT_SYSTEM;
- join->const_table_map|=s->table->map;
- if ((tmp=join_read_const_table(s, p_pos)))
+ if (! (s->table->map & join->eliminated_tables))
{
- if (tmp > 0)
- goto error; // Fatal error
+ int tmp;
+ s->type=JT_SYSTEM;
+ join->const_table_map|=s->table->map;
+ if ((tmp=join_read_const_table(s, p_pos)))
+ {
+ if (tmp > 0)
+ goto error; // Fatal error
+ }
+ else
+ found_const_table_map|= s->table->map;
}
- else
- found_const_table_map|= s->table->map;
}
/* loop until no more const tables are found */
@@ -2999,7 +3005,8 @@ make_join_statistics(JOIN *join, TABLE_L
substitution of a const table the key value happens to be null
then we can state that there are no matches for this equi-join.
*/
- if ((keyuse= s->keyuse) && *s->on_expr_ref && !s->embedding_map)
+ if ((keyuse= s->keyuse) && *s->on_expr_ref && !s->embedding_map &&
+ !(table->map & join->eliminated_tables))
{
/*
When performing an outer join operation if there are no matching rows
@@ -3135,7 +3142,7 @@ make_join_statistics(JOIN *join, TABLE_L
}
//psergey-todo: table elimination
- eliminate_tables(join, &const_count, &found_const_table_map);
+ //eliminate_tables(join, &const_count, &found_const_table_map);
//:psergey-todo
/* Calc how many (possible) matched records in each table */
@@ -16517,7 +16524,7 @@ static void select_describe(JOIN *join,
quick_type= -1;
- //psergey-todo:
+ /* Don't show eliminated tables */
if (table->map & join->eliminated_tables)
{
used_tables|=table->map;
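The checks added in this patch all have the form `table->map & join->eliminated_tables`. A minimal sketch of that bitmap test follows, assuming one bit per table in a 64-bit map. Every name in it (`table_map_sketch`, `table_bit`, `is_eliminated`, `count_visible_tables`) is hypothetical and only illustrates the idea; it is not the server's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-join table bitmap: one bit per table. */
typedef uint64_t table_map_sketch;

/* Bit that represents table number `tablenr` in the map. */
static table_map_sketch table_bit(unsigned int tablenr)
{
  assert(tablenr < 64);
  return (table_map_sketch)1 << tablenr;
}

/* Non-zero if the table's bit is set in the eliminated set. */
static int is_eliminated(table_map_sketch table,
                         table_map_sketch eliminated_tables)
{
  return (table & eliminated_tables) != 0;
}

/* Count the tables that would still be listed once eliminated ones are skipped. */
static unsigned int count_visible_tables(unsigned int n_tables,
                                         table_map_sketch eliminated_tables)
{
  unsigned int i, visible = 0;
  for (i = 0; i < n_tables; i++)
  {
    if (!is_eliminated(table_bit(i), eliminated_tables))
      visible++;
  }
  return visible;
}
```

With three tables and only the third one marked as eliminated, `count_visible_tables(3, table_bit(2))` counts two tables, which mirrors the "Don't show eliminated tables" branch in `select_describe()`.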
[Maria-developers] Updated (by Guest): Backporting pool of threads to MariaDB (6)
by worklog-noreply@askmonty.org 15 Jun '09
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Backporting pool of threads to MariaDB
CREATION DATE..: Mon, 09 Mar 2009, 17:21
SUPERVISOR.....: Monty
IMPLEMENTOR....: Monty
COPIES TO......: Monty
CATEGORY.......: Server-Sprint
TASK ID........: 6 (http://askmonty.org/worklog/?tid=6)
VERSION........: Server-9.x
STATUS.........: Complete
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 8 (hours remain)
ORIG. ESTIMATE.: 8
PROGRESS NOTES:
-=-=(Guest - Mon, 15 Jun 2009, 22:06)=-=-
Version updated.
--- /tmp/wklog.6.old.487 2009-06-15 22:06:59.000000000 +0300
+++ /tmp/wklog.6.new.487 2009-06-15 22:06:59.000000000 +0300
@@ -1 +1 @@
-WorkLog-3.4
+Server-9.x
-=-=(Guest - Tue, 21 Apr 2009, 16:39)=-=-
Version updated.
--- /tmp/wklog.6.old.24673 2009-04-21 16:39:20.000000000 +0300
+++ /tmp/wklog.6.new.24673 2009-04-21 16:39:20.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+WorkLog-3.4
-=-=(Monty - Thu, 26 Mar 2009, 00:32)=-=-
Privacy level updated.
--- /tmp/wklog.6.old.6586 2009-03-26 00:32:23.000000000 +0200
+++ /tmp/wklog.6.new.6586 2009-03-26 00:32:23.000000000 +0200
@@ -1 +1 @@
-y
+n
-=-=(Monty - Thu, 26 Mar 2009, 00:31)=-=-
Supervisor updated.
--- /tmp/wklog.6.old.6580 2009-03-26 00:31:30.000000000 +0200
+++ /tmp/wklog.6.new.6580 2009-03-26 00:31:30.000000000 +0200
@@ -1 +1 @@
-Knielsen
+Monty
-=-=(Monty - Fri, 13 Mar 2009, 02:43)=-=-
Low Level Design modified.
--- /tmp/wklog.6.old.26076 2009-03-13 02:43:17.000000000 +0200
+++ /tmp/wklog.6.new.26076 2009-03-13 02:43:17.000000000 +0200
@@ -1 +1,20 @@
+To be able to work with both one-thread-per-connection and pool-of-threads at
+the same time, I added a new global scheduler variable 'extra_thread_scheduler'
+that always uses the one-thread-per-connection method.
+
+A pointer to the 'scheduler' variable that should be used for the connection
+was added to the THD structure.
+
+To make it easy to handle two connection counters and two max_connections
+variables, I added pointers to them in the scheduler variable.
+
+Other changes were:
+
+- If extra-port is <> 0, start listening on this port too
+- At connect time, set THD->scheduler to point to the given scheduler (based on
+the port that was used to connect)
+- Change some calls that were made through function pointers in the scheduler to
+instead use thd->scheduler->
+- Change max_connections to *thd->scheduler->max_connections
+- Change connection_count to *thd->scheduler->connection_count
-=-=(Monty - Fri, 13 Mar 2009, 02:29)=-=-
Version updated.
--- /tmp/wklog.6.old.25818 2009-03-13 02:29:16.000000000 +0200
+++ /tmp/wklog.6.new.25818 2009-03-13 02:29:16.000000000 +0200
@@ -1 +1 @@
-Server-9.x
+Server-5.1
-=-=(Monty - Fri, 13 Mar 2009, 02:29)=-=-
Status updated.
--- /tmp/wklog.6.old.25818 2009-03-13 02:29:16.000000000 +0200
+++ /tmp/wklog.6.new.25818 2009-03-13 02:29:16.000000000 +0200
@@ -1 +1 @@
-Assigned
+Complete
-=-=(Monty - Fri, 13 Mar 2009, 02:28)=-=-
High Level Description modified.
--- /tmp/wklog.6.old.25790 2009-03-13 02:28:25.000000000 +0200
+++ /tmp/wklog.6.new.25790 2009-03-13 02:28:25.000000000 +0200
@@ -8,3 +8,6 @@
Add option --extra-port to allow connections with old one-thread-per-connection
method. This is needed to allow root to login and kill threads if something
goes wrong.
+Add option --extra-max-connections to regulate how many connections can be made
+to 'extra-port'. This should work in a similar way as 'max-connections', in the
+way that one connection is reserved for a SUPER user.
-=-=(Knielsen - Mon, 09 Mar 2009, 19:02)=-=-
Version updated.
--- /tmp/wklog.6.old.10740 2009-03-09 19:02:38.000000000 +0200
+++ /tmp/wklog.6.new.10740 2009-03-09 19:02:38.000000000 +0200
@@ -1 +1 @@
-WorkLog-3.4
+Server-9.x
-=-=(Knielsen - Mon, 09 Mar 2009, 19:02)=-=-
Title modified.
--- /tmp/wklog.6.old.10740 2009-03-09 19:02:38.000000000 +0200
+++ /tmp/wklog.6.new.10740 2009-03-09 19:02:38.000000000 +0200
@@ -1 +1 @@
-Backporting pool of threads tro MariaDB
+Backporting pool of threads to MariaDB
DESCRIPTION:
Back porting pool of threads to MariaDB
We will use code for Maria 6.0, with the following extensions:
Add option: --test-ignore-wrong-options to ignore errors in enum values for
testing pool-of-threads. (Better than having --pool-of-threads command line
option just for testing)
Add option --extra-port to allow connections with old one-thread-per-connection
method. This is needed to allow root to login and kill threads if something
goes wrong.
Add option --extra-max-connections to regulate how many connections can be made
to 'extra-port'. This should work like 'max-connections', in that one
connection is reserved for a SUPER user.
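The reservation rule for a SUPER user can be sketched as a small admission check. This is only an illustration under the assumption that the reserved slot is one connection beyond the configured limit; the function name `may_connect` and its parameters are hypothetical and do not come from the server source.

```c
#include <assert.h>

/*
  Hypothetical admission check: ordinary users are admitted while the
  number of open connections is below the limit; a SUPER user may take
  exactly one additional slot once the limit has been reached.
*/
static int may_connect(unsigned long current, unsigned long limit, int is_super)
{
  assert(limit > 0);
  if (current < limit)
    return 1;                          /* room for anybody */
  return is_super && current == limit; /* the one reserved slot */
}
```

With a limit of 1, `may_connect(1, 1, 0)` refuses an ordinary user while `may_connect(1, 1, 1)` still admits a SUPER user, so an administrator can log in and kill threads.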
LOW-LEVEL DESIGN:
To be able to work with both one-thread-per-connection and pool-of-threads at
the same time, I added a new global scheduler variable 'extra_thread_scheduler'
that always uses the one-thread-per-connection method.
A pointer to the 'scheduler' variable that should be used for the connection
was added to the THD structure.
To make it easy to handle two connection counters and two max_connections
variables, I added pointers to them in the scheduler variable.
Other changes were:
- If extra-port is <> 0, start listening on this port too
- At connect time, set THD->scheduler to point to the given scheduler (based on
the port that was used to connect)
- Change some calls that were made through function pointers in the scheduler to
instead use thd->scheduler->
- Change max_connections to *thd->scheduler->max_connections
- Change connection_count to *thd->scheduler->connection_count
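The design above can be sketched in a few lines of C. This is a simplified illustration, not the actual patch: the struct layouts, the helper functions `thd_init_scheduler` and `try_register_connection`, and the limit values are assumptions made for the example. Only the names `extra_thread_scheduler`, `max_connections`, `connection_count` and the `thd->scheduler` pointer come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the server's structures. */
typedef struct scheduler_sketch
{
  const char *name;
  unsigned long *max_connections;   /* limit that applies to this scheduler */
  unsigned int *connection_count;   /* live counter that belongs to it */
} scheduler_sketch;

typedef struct thd_sketch
{
  scheduler_sketch *scheduler;      /* chosen once, at connect time */
} thd_sketch;

/* Two independent limit/counter pairs (values are arbitrary). */
static unsigned long max_connections = 100, extra_max_connections = 1;
static unsigned int connection_count = 0, extra_connection_count = 0;
static unsigned int extra_port = 0; /* 0 means the extra port is disabled */

static scheduler_sketch thread_scheduler =
  { "pool-of-threads", &max_connections, &connection_count };
static scheduler_sketch extra_thread_scheduler =
  { "one-thread-per-connection", &extra_max_connections,
    &extra_connection_count };

/* Pick the scheduler based on the port the client connected to. */
static void thd_init_scheduler(thd_sketch *thd, unsigned int port)
{
  assert(thd != NULL);
  thd->scheduler = (extra_port != 0 && port == extra_port)
                     ? &extra_thread_scheduler : &thread_scheduler;
}

/* Returns 1 and bumps the right counter if the scheduler has room. */
static int try_register_connection(thd_sketch *thd)
{
  scheduler_sketch *s = thd->scheduler;
  if (*s->connection_count >= *s->max_connections)
    return 0;
  ++*s->connection_count;
  return 1;
}
```

Because the limit and the counter are reached through `thd->scheduler`, the same admission code serves both ports, and filling one scheduler's quota leaves the other untouched.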
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (psergey:2715)
by Sergey Petrunia 15 Jun '09
#At lp:maria based on revid:psergey@askmonty.org-20090614205924-1vnfwbuo4brzyfhp
2715 Sergey Petrunia 2009-06-15
Fix spurious valgrind warnings in rpl_trigger.test
modified:
mysql-test/valgrind.supp
per-file messages:
mysql-test/valgrind.supp
Fix spurious valgrind warnings in rpl_trigger.test
=== modified file 'mysql-test/valgrind.supp'
--- a/mysql-test/valgrind.supp 2009-05-22 12:38:50 +0000
+++ b/mysql-test/valgrind.supp 2009-06-15 16:22:08 +0000
@@ -631,3 +631,13 @@
fun:malloc
fun:inet_ntoa
}
+
+#
+# Some problem inside glibc on Ubuntu 9.04, x86 (but not amd64)
+#
+{
+ Mem loss inside nptl_pthread_exit_hack_handler
+ Memcheck:Leak
+ ...
+ fun:nptl_pthread_exit_hack_handler
+}
[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2713)
by knielsen@knielsen-hq.org 15 Jun '09
#At lp:maria
2713 knielsen(a)knielsen-hq.org 2009-06-15
Cherry-pick revid:psergey@askmonty.org-20090608135546-ut1yrzbah4gdw6e6
from Sergey's table-elimination branch to get a clean Valgrind.
added:
strings/strmov_overlapp.c
modified:
include/m_string.h
libmysql/Makefile.shared
strings/Makefile.am
=== modified file 'include/m_string.h'
--- a/include/m_string.h 2009-05-06 12:03:24 +0000
+++ b/include/m_string.h 2009-06-15 11:01:35 +0000
@@ -98,7 +98,8 @@ extern const double log_10[309];
#ifdef BAD_STRING_COMPILER
#define strmov(A,B) (memccpy(A,B,0,INT_MAX)-1)
#else
-#define strmov_overlapp(A,B) strmov(A,B)
+extern char *strmov_overlapp(char *dest, const char *src);
+/* Warning: the following is likely not to work: */
#define strmake_overlapp(A,B,C) strmake(A,B,C)
#endif
=== modified file 'libmysql/Makefile.shared'
--- a/libmysql/Makefile.shared 2008-04-28 16:24:05 +0000
+++ b/libmysql/Makefile.shared 2009-06-15 11:01:35 +0000
@@ -46,7 +46,8 @@ mystringsobjects = strmov.lo strxmov.lo
ctype-win1250ch.lo ctype-utf8.lo ctype-extra.lo \
ctype-ucs2.lo ctype-gb2312.lo ctype-gbk.lo \
ctype-sjis.lo ctype-tis620.lo ctype-ujis.lo \
- ctype-uca.lo xml.lo my_strtoll10.lo str_alloc.lo
+ ctype-uca.lo xml.lo my_strtoll10.lo str_alloc.lo \
+ strmov_overlapp.lo
mystringsextra= strto.c
dbugobjects = dbug.lo # IT IS IN SAFEMALLOC.C sanity.lo
=== modified file 'strings/Makefile.am'
--- a/strings/Makefile.am 2009-03-24 13:58:52 +0000
+++ b/strings/Makefile.am 2009-06-15 11:01:35 +0000
@@ -21,19 +21,19 @@ pkglib_LIBRARIES = libmystrings.a
# Exact one of ASSEMBLER_X
if ASSEMBLER_x86
ASRCS = strings-x86.s longlong2str-x86.s my_strtoll10-x86.s
-CSRCS = bfill.c bmove.c bmove512.c bchange.c strxnmov.c int2str.c str2int.c r_strinstr.c strtod.c bcmp.c strtol.c strtoul.c strtoll.c strtoull.c llstr.c strnlen.c ctype.c ctype-simple.c ctype-mb.c ctype-big5.c ctype-cp932.c ctype-czech.c ctype-eucjpms.c ctype-euc_kr.c ctype-gb2312.c ctype-gbk.c ctype-sjis.c ctype-tis620.c ctype-ujis.c ctype-utf8.c ctype-ucs2.c ctype-uca.c ctype-win1250ch.c ctype-bin.c ctype-latin1.c my_vsnprintf.c xml.c decimal.c ctype-extra.c str_alloc.c longlong2str_asm.c my_strchr.c
+CSRCS = bfill.c bmove.c bmove512.c bchange.c strxnmov.c int2str.c str2int.c r_strinstr.c strtod.c bcmp.c strtol.c strtoul.c strtoll.c strtoull.c llstr.c strnlen.c ctype.c ctype-simple.c ctype-mb.c ctype-big5.c ctype-cp932.c ctype-czech.c ctype-eucjpms.c ctype-euc_kr.c ctype-gb2312.c ctype-gbk.c ctype-sjis.c ctype-tis620.c ctype-ujis.c ctype-utf8.c ctype-ucs2.c ctype-uca.c ctype-win1250ch.c ctype-bin.c ctype-latin1.c my_vsnprintf.c xml.c decimal.c ctype-extra.c str_alloc.c longlong2str_asm.c my_strchr.c strmov_overlapp.c
else
if ASSEMBLER_sparc32
# These file MUST all be on the same line!! Otherwise automake
# generats a very broken makefile
ASRCS = bmove_upp-sparc.s strappend-sparc.s strend-sparc.s strinstr-sparc.s strmake-sparc.s strmov-sparc.s strnmov-sparc.s strstr-sparc.s
-CSRCS = strcont.c strfill.c strcend.c is_prefix.c longlong2str.c bfill.c bmove.c bmove512.c bchange.c strxnmov.c int2str.c str2int.c r_strinstr.c strtod.c bcmp.c strtol.c strtoul.c strtoll.c strtoull.c llstr.c strnlen.c strxmov.c ctype.c ctype-simple.c ctype-mb.c ctype-big5.c ctype-cp932.c ctype-czech.c ctype-eucjpms.c ctype-euc_kr.c ctype-gb2312.c ctype-gbk.c ctype-sjis.c ctype-tis620.c ctype-ujis.c ctype-utf8.c ctype-ucs2.c ctype-uca.c ctype-win1250ch.c ctype-bin.c ctype-latin1.c my_vsnprintf.c xml.c decimal.c ctype-extra.c my_strtoll10.c str_alloc.c my_strchr.c
+CSRCS = strcont.c strfill.c strcend.c is_prefix.c longlong2str.c bfill.c bmove.c bmove512.c bchange.c strxnmov.c int2str.c str2int.c r_strinstr.c strtod.c bcmp.c strtol.c strtoul.c strtoll.c strtoull.c llstr.c strnlen.c strxmov.c ctype.c ctype-simple.c ctype-mb.c ctype-big5.c ctype-cp932.c ctype-czech.c ctype-eucjpms.c ctype-euc_kr.c ctype-gb2312.c ctype-gbk.c ctype-sjis.c ctype-tis620.c ctype-ujis.c ctype-utf8.c ctype-ucs2.c ctype-uca.c ctype-win1250ch.c ctype-bin.c ctype-latin1.c my_vsnprintf.c xml.c decimal.c ctype-extra.c my_strtoll10.c str_alloc.c my_strchr.c strmov_overlapp.c
else
#no assembler
ASRCS =
# These file MUST all be on the same line!! Otherwise automake
# generats a very broken makefile
-CSRCS = strxmov.c bmove_upp.c strappend.c strcont.c strend.c strfill.c strcend.c is_prefix.c strstr.c strinstr.c strmake.c strnmov.c strmov.c longlong2str.c bfill.c bmove.c bmove512.c bchange.c strxnmov.c int2str.c str2int.c r_strinstr.c strtod.c bcmp.c strtol.c strtoul.c strtoll.c strtoull.c llstr.c strnlen.c ctype.c ctype-simple.c ctype-mb.c ctype-big5.c ctype-cp932.c ctype-czech.c ctype-eucjpms.c ctype-euc_kr.c ctype-gb2312.c ctype-gbk.c ctype-sjis.c ctype-tis620.c ctype-ujis.c ctype-utf8.c ctype-ucs2.c ctype-uca.c ctype-win1250ch.c ctype-bin.c ctype-latin1.c my_vsnprintf.c xml.c decimal.c ctype-extra.c my_strtoll10.c str_alloc.c my_strchr.c
+CSRCS = strxmov.c bmove_upp.c strappend.c strcont.c strend.c strfill.c strcend.c is_prefix.c strstr.c strinstr.c strmake.c strnmov.c strmov.c longlong2str.c bfill.c bmove.c bmove512.c bchange.c strxnmov.c int2str.c str2int.c r_strinstr.c strtod.c bcmp.c strtol.c strtoul.c strtoll.c strtoull.c llstr.c strnlen.c ctype.c ctype-simple.c ctype-mb.c ctype-big5.c ctype-cp932.c ctype-czech.c ctype-eucjpms.c ctype-euc_kr.c ctype-gb2312.c ctype-gbk.c ctype-sjis.c ctype-tis620.c ctype-ujis.c ctype-utf8.c ctype-ucs2.c ctype-uca.c ctype-win1250ch.c ctype-bin.c ctype-latin1.c my_vsnprintf.c xml.c decimal.c ctype-extra.c my_strtoll10.c str_alloc.c my_strchr.c strmov_overlapp.c
endif
endif
@@ -54,7 +54,7 @@ EXTRA_DIST = ctype-big5.c ctype-cp932.c
strinstr-sparc.s strmake-sparc.s strmov-sparc.s \
strnmov-sparc.s strstr-sparc.s strxmov-sparc.s \
t_ctype.h my_strchr.c CMakeLists.txt \
- CHARSET_INFO.txt
+ CHARSET_INFO.txt strmov_overlapp.c
libmystrings_a_LIBADD=
conf_to_src_SOURCES = conf_to_src.c xml.c ctype.c bcmp.c
=== added file 'strings/strmov_overlapp.c'
--- a/strings/strmov_overlapp.c 1970-01-01 00:00:00 +0000
+++ b/strings/strmov_overlapp.c 2009-06-15 11:01:35 +0000
@@ -0,0 +1,26 @@
+/* Copyright (C) 2000 MySQL AB
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; version 2 of the License.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
+
+#include <my_global.h>
+#include "m_string.h"
+
+/* A trivial implementation */
+char *strmov_overlapp(char *dst, const char *src)
+{
+ size_t len= strlen(src);
+ memmove(dst, src, len+1);
+ return dst+len;
+}
+
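A short usage sketch may help show why a separate function is useful here. Copying a string onto an overlapping region of the same buffer is undefined for `strcpy`-style copies, whereas `memmove` is specified to handle overlap. The example below repeats the same logic under the hypothetical name `strmov_overlapp_demo` so that it compiles on its own, and adds an illustrative helper `strip_prefix` that is not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Same logic as the patch, renamed so the sketch is self-contained. */
static char *strmov_overlapp_demo(char *dst, const char *src)
{
  size_t len;
  assert(dst != NULL && src != NULL);
  len = strlen(src);
  memmove(dst, src, len + 1);   /* +1 also copies the terminating '\0' */
  return dst + len;             /* points at the new terminator */
}

/* Hypothetical helper: drop the first n characters of buf in place. */
static char *strip_prefix(char *buf, size_t n)
{
  assert(n <= strlen(buf));
  /* Source and destination overlap, so memmove semantics are required. */
  return strmov_overlapp_demo(buf, buf + n);
}
```

Like `strmov`, the function returns a pointer to the terminating null byte of the destination, so callers can keep appending without another `strlen`.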